MICCAI 2021 Daily – Wednesday

“ This is a high-level representation that captures what the data sample is really about. For example, in the visual part, we have the image. I represented the image by using visual features from the CNN, but also, I used a special visual token which encapsulates the information from all other local visual features by using the self-attention. I applied the same idea to the language part. Every word has its own embedding. I used a special language token to learn the holistic representation of each sequence of embeddings in the report. Then the alignment of these different multimodal features is basically the alignment of the holistic characteristics of the data. ” This model is framed in a probabilistic manner and based on a variational inference paradigm . Probabilistic modeling has a strong mathematical foundation and is more flexible, so it offers a way to model the uncertainty and ambiguities present in the report generation process. Other models are usually based on deterministic encoder-decoder models , which produce reports that are near copies of the ground truth. They struggle to provide new sentences or new combinations of words. By using latent topics, the probabilistic model can offer a few variations of reports per image with slightly novel sentences. This better reflects what radiologists, who all have different experience and expertise, would produce. The work uses two datasets: the Indiana Open-I dataset from Indiana University and the MIMIC-CXR dataset , which is the largest dataset of chest X-ray and report pairs. It includes the whole report that describes the diagnosis of the patient. As long as the training data is available, there is no reason why the method could not be applied to other medical datasets. 6 DAILY MICCAI Wednesday Oral Presentation

RkJQdWJsaXNoZXIy NTc3NzU=