ICCV Daily, Thursday: LIMITR

Firstly, chest X-rays have a unique structure influenced by the layout of the human body. Secondly, the pathologies within medical images are typically confined to small regions, unlike natural images, where differences are often pronounced and readily discernible. Furthermore, the medical reports accompanying these images tend to focus heavily on normal observations, with relatively brief pathology descriptions. Thirdly, the studies used in this work typically contain a medical report and one or more images.

“One of the images is always the frontal image, and sometimes, in the studies, they provide an additional view, which is the lateral view,” Gefen tells us. “Most of the works that were done before us ignored the lateral view, even though it is mentioned in some of the reports, and the information it contains is helpful for the radiologist to understand the pathology and the conditions of the subject in that examination.”

The team’s primary goal was to create a shared space where text and image data could coexist. They envisioned a system where an image representing a specific pathology would align closely with text describing the same condition. This alignment can support various applications, including retrieval, where a text query is used to find the images that most closely match its description. They also introduced attention maps that link phrases to the corresponding areas in the image. This technique, known as phrase grounding, demonstrates the quality of the local alignment.

“The end goal is to help new radiologists and those who are not yet experts and give them the option to retrieve similar studies,” Elad points out. “Say they have an image and are not sure what happens in that image. They can easily retrieve and get similar reports and then see if they missed something or can learn from it. That’s why we need this representation space that captures these similarities.”
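To make the retrieval idea concrete, here is a minimal sketch of how a shared representation space can be queried. It is not the team’s implementation: the three-dimensional vectors, the study names, and the `cosine` and `retrieve` helpers are all made up for illustration, and cosine similarity is simply an assumed choice of similarity measure. In a real system, the embeddings would come from trained image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query, gallery, k=2):
    """Return the k gallery ids most similar to the query embedding."""
    ranked = sorted(
        gallery,
        key=lambda study_id: cosine(query, gallery[study_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical image embeddings; real ones would be encoder outputs.
image_embeddings = {
    "study_a": [0.9, 0.1, 0.0],
    "study_b": [0.1, 0.9, 0.1],
    "study_c": [0.8, 0.2, 0.1],
}

# Hypothetical embedding of a report phrase, living in the same space.
text_query = [1.0, 0.0, 0.0]

top = retrieve(text_query, image_embeddings, k=2)
print(top)
```

The same ranking step works in the other direction: starting from an image embedding and ranking report embeddings would return similar reports, which is the use case Elad describes for less experienced radiologists.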