A Self-boosting Framework for Automated Radiographic Report Generation
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Automated radiographic report generation is challenging because it requires generating paragraphs that describe fine-grained visual differences between cases, especially between diseased and healthy ones. Existing image captioning methods commonly target generic images and lack a mechanism to meet this requirement. To bridge this gap, we propose a self-boosting framework that improves radiographic report generation through the cooperation of the main task of report generation and an auxiliary task of image-text matching. The two tasks are built as two branches of a network model and influence each other cooperatively. On the one hand, the image-text matching branch helps learn highly text-correlated visual features so that the report generation branch can output high-quality reports. On the other hand, the improved reports produced by the report generation branch provide additional, harder samples for the image-text matching branch and force it to improve by learning better visual and text feature representations. This, in turn, helps improve the report generation branch again. The two branches are jointly trained to improve each other iteratively and progressively, so that the whole model is self-boosted without requiring external resources. Experimental results on two public datasets demonstrate the effectiveness of our method, showing superior performance over multiple state-of-the-art image captioning and medical report generation methods.
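The interaction between the two branches can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the matching branch uses a hinge ranking loss over cosine similarities and that a generated report serves as an extra hard negative, neither of which the abstract specifies. All names (`cosine`, `matching_loss`, `joint_loss`, `margin`, `weight`) and the toy two-dimensional embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def matching_loss(image, true_report, negatives, margin=0.2):
    """Hinge ranking loss against the hardest (most similar) negative.

    Assumption: the image-text matching branch is trained this way.
    """
    positive = cosine(image, true_report)
    hardest = max(cosine(image, n) for n in negatives)
    return max(0.0, margin - positive + hardest)


def joint_loss(generation_loss, image, true_report, negatives, weight=1.0):
    """Weighted sum of the two branch losses (assumed form of joint training)."""
    return generation_loss + weight * matching_loss(image, true_report, negatives)


# Toy embeddings standing in for learned image and report features.
image = [1.0, 0.0]
true_report = [1.0, 0.0]
other_report = [0.0, 1.0]      # report of an unrelated case: an easy negative
generated_report = [0.9, 0.1]  # close to the true report: a hard negative

easy = matching_loss(image, true_report, [other_report])
hard = matching_loss(image, true_report, [other_report, generated_report])
print(easy, hard)
```

In this toy setting the easy negative alone yields zero matching loss, so the matching branch receives no training signal. Adding the generated report as a negative makes the loss positive, which is the sense in which better generated reports give the matching branch harder samples to learn from.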
Open Access Status
This publication is not available as open access
National Natural Science Foundation of China