Computer Vision News

fly by adding and defining new symbols without retraining. "It's trivial to add a 13th letter, define what it is, and suddenly you can predict something that you've never seen during training," Ege notes. "For us, the entire intuition behind using LLMs was never just to be state of the art but to allow this adaptability at inference time, and these textual and visual descriptors, or prompts, were the key to making that work!"

To further enhance adaptability, they needed to address a limitation in their dataset, which included recordings of only 10 simulated surgeries. "You often saw the same things," Chantal recalls. "For our model to learn how to use these descriptions and link them to objects in the scene, we needed to increase the variability in our data."

To solve this, they used Stable Diffusion to generate synthetic images of surgical tools in varying colors, shapes, and sizes. This allowed the model to practice identifying new objects by linking detailed textual descriptions to these synthetic images.

With all this innovation and the work's several major contributions to surgical data science, it is no surprise that it was selected as Best Paper Runner-Up at MICCAI 2024. Chantal and Ege both express their gratitude and surprise at the honor. "We knew we were on the shortlist, but of course, you never expect something like this," Ege says.
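The "13th letter" remark describes open-vocabulary prediction: each class is defined by a textual descriptor, so a new class can be added at inference time simply by embedding its description, with no retraining. The article does not detail the model's internals, so the sketch below only illustrates the general idea with a toy embedding space and cosine similarity; all vectors, dimensions, and class names are made up for illustration.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def predict(image_emb, class_embs):
    """Return the class whose descriptor embedding is closest to the image."""
    return max(class_embs, key=lambda name: cosine(image_emb, class_embs[name]))

random.seed(0)

# Toy descriptor embeddings for 12 known classes
# (stand-ins for real text-encoder outputs).
class_embs = {f"class_{i}": [random.gauss(0, 1) for _ in range(8)]
              for i in range(1, 13)}

# "Add a 13th letter": define a new class at inference time by its descriptor.
new_emb = [random.gauss(0, 1) for _ in range(8)]
class_embs["class_13"] = new_emb

# An image resembling the new class is now predictable without any retraining.
image_emb = [x + 0.01 * random.gauss(0, 1) for x in new_emb]
print(predict(image_emb, class_embs))  # class_13
```

In a real system the descriptor embeddings would come from a text encoder and the image embedding from a vision backbone, but the inference-time extensibility works the same way: adding a class is just adding one more entry to the dictionary.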
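The article says synthetic variability came from generating surgical tools in varying colors, shapes, and sizes with Stable Diffusion. One simple way to drive such a text-to-image pipeline is to enumerate prompt variations over attribute lists; the attribute values and prompt template below are illustrative assumptions, not taken from the paper.

```python
from itertools import product

# Hypothetical attribute lists for prompt variation (illustrative only).
COLORS = ["silver", "blue", "black"]
SHAPES = ["curved", "straight", "serrated"]
TOOLS = ["forceps", "scissors", "needle holder"]

def build_prompts(colors=COLORS, shapes=SHAPES, tools=TOOLS):
    """Return one text-to-image prompt per attribute combination."""
    return [
        f"a photo of a {color} {shape} surgical {tool} on an operating table"
        for color, shape, tool in product(colors, shapes, tools)
    ]

prompts = build_prompts()
print(len(prompts))  # 3 * 3 * 3 = 27 prompt variations
```

Each prompt could then be passed to a text-to-image model (for example, a Stable Diffusion pipeline) and paired with its own description, giving the detector varied training pairs that link detailed text to synthetic tool images.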