Computer Vision News
MICCAI 2024 Best Paper Runner-Up

“The first experiment we did was to find out if they understood things out of the box. Could we upload pictures from the OR and have them tell us what we’re talking about?”

However, the results were not very successful. Rather than producing valuable insights, the model would often generate nonsensical outputs, such as claiming that a surgeon was performing an action on a piece of furniture instead of a patient. The team realized a specialized network needed to be built on top of these powerful models.

“The knowledge integration we do relies on being able to tell the model what it should look for,” Chantal points out. “For this, we still believe these large pretrained models are very beneficial.”

The work features several key innovations, including an image pooler that compresses information from multiple views in the OR into a single, fixed-size representation using a transformer network, ensuring the model receives a comprehensive visual summary of the scene.

A question that often arises for the team is why scene graphs are so critical to their research. “From the simplest end, OR light makers come to us and say, ‘We’d like to automatically adjust the lights function to what’s going on in the surgery,’” Ege reveals. “Ideally, in the future, a digital OR would have something like an API that you can query and get information about this.”

Another practical use case is more directly tied to improving surgical outcomes. The World Health Organization has shown that