representation to produce a more consistent result more efficiently. I think that's the main difference compared with other video generation methods you might see."

The novelty of this approach has not gone unnoticed: the work was picked as a top-rated paper at this year's CVPR, given a coveted oral presentation slot, and recognized as one of only 24 papers in line for a best paper award. If we were placing bets on the winners, this work, with its stellar team of authors, would be our hot tip.

What does Zhengqi believe are the magic ingredients that have earned it such honors? "There are a few thousand papers on video generation, and they all have similar ideas," he responds. "They predict the raw pixels, while we're going in a completely different direction, predicting the underlying motion. That's something the research community appreciates because it's unique. I guess they believe this might be an interesting future research direction for people to explore because, in generative AI, people are mostly focused on how to scale big models trained on 10 billion data points, while we're trying to use a different representation that we can train more efficiently to get even better results. That's a completely different angle, and the awards committee might like those very different, unique, special angles."

However, the road to this point was not without its challenges. Collecting sufficient data to train the model was a significant hurdle the team had to overcome. They searched the Internet and internal Google video resources and even captured their

"If you don't have data, you can't train your model to get good results!"