following a surgical checklist reduces the likelihood of errors and adverse outcomes. However, overwhelmed clinical staff do not always have the capacity to keep track of these steps during a procedure, which is where an automatic system could step in to enhance patient safety.

Looking further into the future, Ege envisions a time when robots might assist with, or even replace, specific roles in the OR, such as the circulating nurse. “It’s not enough to only replace their dexterity or spatial awareness,” he says. “You need to replace contextual awareness – the entire brain of that person!” This level of understanding would require AI systems to perceive the surgery much more deeply, recognizing and interpreting all the interactions between people, tools, and tasks. Scene graphs representing these relationships are vital for achieving such a complex system.

One challenge the team faced was making scene graphs and large language models (LLMs) more compatible. To address this, they devised a method to represent scene graphs as sequences, encoding each graph as a list of triplets of the form <subject, object, predicate>. “It becomes semantically the same thing,” Ege explains. Initially, they considered allowing the LLM to work with human-readable names, such as <head surgeon, patient, suturing>, but found that this led to overfitting during training: the model would focus on familiar terms rather than adapting to new situations.

To overcome this, they introduced a symbolic representation. Instead of using terms like ‘head surgeon’ or ‘patient,’ they encoded everything with symbols (e.g., A, B, C, or Alpha, Beta, Gamma) and provided corresponding descriptions in the prompts. This change forced the model to read and analyze the descriptions instead of overfitting to labels. “We switch the meaning of the symbols every sample,” Chantal reveals.
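The symbol-swapping idea can be sketched in a few lines of Python. This is an illustrative sketch only, not the team's implementation: the function name `encode_sample`, the symbol pool, and the example descriptions are all assumptions made here for illustration.

```python
import random

# Hypothetical symbol pool; the article mentions A, B, C or Alpha, Beta, Gamma.
SYMBOLS = ["Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta"]


def encode_sample(triplets, descriptions, rng):
    """Encode one scene graph with a freshly randomized symbol assignment.

    triplets: list of (subject, object, predicate) names.
    descriptions: dict mapping each name to a short text description.
    Returns (legend, sequence, mapping).
    """
    names = sorted({name for triplet in triplets for name in triplet})
    if len(names) > len(SYMBOLS):
        raise ValueError("not enough symbols for this sample")
    # Draw a new name-to-symbol assignment for every sample, so a symbol
    # carries no fixed meaning across the training set.
    mapping = dict(zip(names, rng.sample(SYMBOLS, len(names))))
    # The legend goes into the prompt; it is the only place meaning lives.
    legend = "\n".join(f"{mapping[n]}: {descriptions[n]}" for n in names)
    # The scene graph itself is written purely in symbols.
    sequence = "; ".join(
        f"<{mapping[s]}, {mapping[o]}, {mapping[p]}>" for s, o, p in triplets
    )
    return legend, sequence, mapping


# Assumed example data, following the <subject, object, predicate> order.
triplets = [("head surgeon", "patient", "suturing")]
descriptions = {
    "head surgeon": "person leading the operation",
    "patient": "person being operated on",
    "suturing": "closing a wound with stitches",
}

legend, sequence, mapping = encode_sample(triplets, descriptions, random.Random(0))
print(legend)
print(sequence)
```

Because the assignment is redrawn for each sample, the only way to recover what a symbol means is to read the legend, which is the behavior the symbolic representation is meant to encourage.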
“In one sample, Alpha is ‘drilling’; in the next, it’s Beta, Gamma, etc.” Because the symbols are randomized, the model has to interpret the descriptions rather than simply remembering class names. This approach allows the system to handle unseen actions or tools on the