CVPR Daily - Sunday

Oral & Award Candidate

Zero-Shot Monocular Scene Flow Estimation in the Wild

Yiqing Liang is a PhD student in Computer Science at Brown University. Her recent paper on scene flow estimation, developed during a summer internship at NVIDIA Research, has been accepted for an oral presentation at CVPR 2025 and nominated for a coveted Best Paper award. Ahead of her presentation this morning, Yiqing tells us more about her fascinating work.

In this paper, Yiqing introduces a novel generalizable foundation model for estimating both geometry and motion in dynamic scenes using just a pair of image frames as input. This task, known as scene flow estimation, has long been a challenge in computer vision and is crucial for applications such as robotics, augmented reality, and autonomous driving, where understanding 3D motion is essential.

Yiqing likens the task to a first-person video game: “Your head is always in the center,” she explains. “You see the wall move, people walk around, and objects change shape. Our model can predict the geometry and motion of all of it!”

The timing of this work was key. Monocular scene flow was proposed about five years ago, but it hit a wall: there was not enough compute, data, or pretrained weights to make it work. Now, that has all changed.

“We benefited from advancements in 3D over the last year,” she reveals. “People found that if you scale up training for 3D geometry prediction, you can get feed-forward methods that predict 3D geometry from 2D information. We go one step further than that and ask: can we also add motion?”

The answer, it turns out, is yes, but the biggest challenge was not the model architecture – it was the data.

“I’m probably giving an answer that my fellow researchers working in the field are very familiar with!” she laughs. “The coding part is fairly easy, but having enough data to formulate the problem properly takes…”
