CVPR Daily - Thursday
The team found it difficult to take inspiration from other video understanding models, which did not help them understand the notion of activities in this data set. In the end, they looked to language models. That is apparent in the type of teacher-forcing training used and in the types of losses relied upon, which in language models are used to understand, in an online fashion, long texts that interleave stories. The notion of weaving stories was inspired by how people write novels and bring different characters and narrative threads into play.

Dima tells us the team is keen to see what others can do with this model. They have ideas for new directions, including understanding other types of goals from different footage.

“Kitchen footage tends to be fairly exciting because we go to kitchens with a goal in mind,” she points out. “You don’t stick around the kitchen not doing anything. But this type of footage, which has formed most of the data sets, is not what most of life is. Most of life is monotonous and boring! We’re trying to shift our focus into how UnweaveNet can understand the slower, more repetitive aspects of daily life.”

The team has been part of an extensive collaboration that has collected the Ego4D data set, also being presented at CVPR this year, featuring a wealth of egocentric footage recorded by participants worldwide.

“I’m hoping more and more we will have data-inspired research as opposed to benchmark and numbers-inspired research,” Dima says. “The community has moved in the direction of getting better numbers, and at times this comes at the risk of not solving the more critical problems that exist purely by watching footage.”

To learn more about Dima and the team’s work [ID 1811], come along to poster session 3.2 today at 14:30.

Dima Damen and Team