
that it’s a more abstract version of the information compared to the raw camera input. For example, RGB-D point clouds. It’s possible to abstract our output like that.”

Looking at the bigger picture, Yiqing is curious about how this research could intersect with multimodal large language models. “For LLMs, the multimodal side still has a lot to explore,” she points out. “People are interested in how to encode visual information more efficiently, and how to let it interact more with textual information.”

More than anything, what excites Yiqing is this model’s generalizability. “It’s really cool how general it is!” she says with a smile. “We’ve tested it on out-of-domain datasets – real-world, high-motion scenes – and it still works!”

To learn more about Yiqing’s work, visit Oral Session 5C: Visual and Spatial Computing (Davidson Ballroom) this morning from 09:00 to 10:15 [Oral 4] and Poster Session 5 (ExHall D) from 10:30 to 12:30 [Poster 165].
