Computer Vision News - June 2024

ICLR Outstanding Paper

Learning Interactive Real-World Simulators

Sherry Yang is a Research Scientist at Google DeepMind and recently graduated from UC Berkeley. She talks to us about her work on learning a realistic interactive simulator through video generation, which earned a super selective oral presentation slot at ICLR 2024 and scooped a coveted Outstanding Paper Award.

A huge amount of video data is available on the internet showing humans performing activities, from cooking to assembling furniture. Typically, this content is consumed passively, but Sherry proposes to train a model that can absorb these videos and allow users to control the action with language instructions.

“Starting from a particular frame – for example, me facing my laptop with my hands in the air – I can give an instruction saying, ‘Touch the screen of the laptop,’ and then generate the video of my hand reaching out and touching the screen of the laptop,” she explains. “This is simulated because a user is not
