When asked about the origin of their work, Gefen, the first author, shared that her background in biomedical engineering provided a starting point; her collaboration with Elad, whose prior work involved images and text, then led them in this innovative direction. However, given their engineering backgrounds, they both acknowledge the significant challenge they faced in becoming familiar with the medical field.

“We’re engineers, not doctors,” Elad affirms. “We didn’t know how to read chest X-rays, but we had to learn. It’s very important to verify the system and find its failures. We had to observe the data and try to understand what was right and wrong. In natural images, it’s easy: you see that there’s a dog, and if the system says it’s a cat, you try to understand what caused the failure. If there’s a lung lesion or an opacity, and you don’t know either of them, it’s much more difficult.”

After overcoming this initial hurdle, they devoted their efforts to understanding the unique properties and challenges of the field and to tailoring their solution accordingly. That solution revolves around several key principles: handling multiple images, leveraging a known structure, and weighted learning. They developed a flexible approach that accommodates cases with an additional lateral image as well as cases with only a single frontal view. Here, they recognized that a radiologist naturally examines the lateral image when it is available but not otherwise, so it was essential to mimic that behavior. Their model learned to match words in the report to the relevant images, or to regions within the images.

18 DAILY ICCV Thursday Poster Presentation
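To make the last two ideas concrete, here is a minimal, hypothetical sketch (not the authors’ actual code, and all names here are illustrative): region features from a frontal view, plus a lateral view only when one exists, are pooled together, and each report word attends over that pool via scaled dot-product attention, so words end up weighted toward the regions they describe.

```python
# Illustrative sketch only: variable-view pooling plus word-to-region attention.
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def encode_views(frontal_regions, lateral_regions=None):
    """Mimic the radiologist: include lateral-view regions only if they exist."""
    regions = list(frontal_regions)
    if lateral_regions is not None:
        regions += list(lateral_regions)
    return regions

def attend(word_vec, region_feats):
    """Scaled dot-product attention: one report word over all image regions.

    Returns a word-conditioned context vector, i.e. the attention-weighted
    sum of region features.
    """
    d = len(word_vec)
    weights = softmax([dot(word_vec, r) / math.sqrt(d) for r in region_feats])
    dim = len(region_feats[0])
    return [sum(w * r[i] for w, r in zip(weights, region_feats))
            for i in range(dim)]

# Toy usage: the same word attends over frontal-only vs frontal + lateral.
frontal = [[1.0, 0.0], [0.0, 1.0]]  # two toy region features
lateral = [[0.5, 0.5]]              # one toy region feature
word = [1.0, 0.0]                   # toy word embedding

ctx_single = attend(word, encode_views(frontal))           # frontal only
ctx_both = attend(word, encode_views(frontal, lateral))    # both views
```

Because `encode_views` simply concatenates whatever regions exist, the same attention code serves studies with one view or two, matching the flexibility described above.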