ICCV Daily 2023 - Thursday

LIMITR

“The second thing we did was leverage the known structure,” Gefen adds. “We know that in the images, there’s a structure. The heart is always in the middle of the image. We use positional encoding to leverage this known structure in the entire dataset. Thirdly, we know that the interesting differences between the images and the reports lie in small areas of the image or a few words in the reports. Our model learns to weigh each of the words in the report or the regions in the images and gives more weight to areas that represent those pathologies or abnormalities.”

Outside of this work, the pair think their method could be helpful in other fields and datasets. “One of our colleagues worked on archaeology, and he had similar challenges with small datasets,” Gefen continues. “You can’t always use the big models to solve your problems.”

Elad agrees: “If you have a small dataset without an ability to segment the regions and match image regions to text, but you know that there’s a connection between the words and the image, you can utilize or easily adapt our method.”

While their immediate plans do not involve a direct continuation of this work, Gefen and Elad are sticking with the field but exploring new horizons. One promising direction is the generation of reports from images, a task similar to this one. They are also investigating the phrase-grounding task, which visually connects words or phrases in a report to specific regions in an image and can potentially benefit both trainee and expert radiologists.

We were intrigued to know what led to Gefen and Elad’s decision to submit their work to ICCV rather than MICCAI. Although it is easy to see why their backgrounds in computer vision led them to ICCV,
