ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling

Chantal Pellegrini and Ege Özsoy are PhD students in Nassir Navab's lab at the Technical University of Munich. They speak to us after winning the Best Paper Runner-Up Award at MICCAI 2024 for their work on achieving a holistic, adaptable, real-time understanding of surgical environments.

In this paper, Chantal and Ege use external cameras to capture the entire surgical process and comprehensively understand what happens in the operating room (OR) second by second. "We use scene graphs, where nodes represent people or objects in the OR, and edges the relations between them," Chantal explains. "If you imagine the head surgeon is doing something – drilling in the patient's knee, for example – then this is one relation in our scene graph."

Summarizing everything happening in a scene in a structured representation has been the focus of the team's research for the past few years. Specifically, this paper aims to enable knowledge guidance and a scalable, adaptable approach to the real world. "What we mean by adaptability is that every OR looks different," Chantal continues. "In every hospital in every country, you have different ORs with different people and tools. It's not feasible to train a new model every time. Therefore, we want adaptability at test time, where we can tell our model, 'Today, this tool looks like this,' or, 'This action looks like this.'"

The team had the idea that large language models (LLMs) could offer an advantage here by leveraging their knowledge of the world. "We wanted to see if they could help us understand the OR better," Ege tells us.
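To make the scene-graph idea concrete, here is a minimal sketch (not the authors' code) of the representation Chantal describes: people and objects in the OR are nodes, and each labeled edge is one relation, such as the head surgeon drilling in the patient's knee. The entity and relation names are illustrative assumptions.

```python
# A minimal sketch of an OR scene graph as a set of
# (subject, predicate, object) triplets; names are illustrative.

from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    subject: str    # node: a person or object in the OR
    predicate: str  # labeled edge: the relation between the two nodes
    obj: str        # node: a person or object in the OR

# One timepoint of the surgery summarized as a structured scene graph.
scene_graph = {
    Relation("head_surgeon", "drilling", "patient"),
    Relation("head_surgeon", "holding", "drill"),
    Relation("assistant", "assisting", "head_surgeon"),
}

for r in scene_graph:
    print(f"{r.subject} --{r.predicate}--> {r.obj}")
```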
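The test-time adaptability Chantal mentions can be sketched the same way: rather than retraining for every hospital, textual descriptions of today's tools or actions are handed to the model alongside the camera input. The prompt format and descriptor strings below are assumptions for illustration, not the paper's exact interface.

```python
# A hedged sketch of test-time knowledge guidance: descriptions of
# what a tool or action looks like today are prepended to the model
# prompt. Format and wording here are illustrative assumptions.

def build_prompt(image_tokens: str, knowledge: dict[str, str]) -> str:
    # Each entry tells the model what a tool or action looks like today.
    lines = [f"{name}: {desc}" for name, desc in knowledge.items()]
    return (
        "Known tools and actions:\n"
        + "\n".join(lines)
        + f"\n<image>{image_tokens}</image>\n"
        + "Predict the scene graph triplets."
    )

prompt = build_prompt(
    image_tokens="...",  # placeholder for the encoded OR camera views
    knowledge={
        "drill": "orange handheld power tool with a thin metal bit",
        "drilling": "surgeon holds the drill against the patient's knee",
    },
)
print(prompt)
```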