CVPR Daily - Thursday

EPIC-KITCHENS is a large-scale egocentric data set in which videos capture what people do in their kitchens. This paper explores the way humans perceive such footage by watching it online from beginning to end, rather than offline, where they can take in the data all at once. An observer understands the goal, then notices the switches, which the team calls new threads, and watches people resuming previously paused threads or activities.

Dima Damen reveals that the existence of this data set, which the team could see needed a new model, was the real inspiration behind the work. The team had been working with egocentric data for a while, and the footage prompted several research questions that had not been asked before, because most video data sets rely on short clips.

“Once we formed this problem, we were looking for a model that could learn and perceive this type of footage,” Dima recalls. “UnweaveNet is capable of having a memory, which we call a thread bank, that explains the video up to the current point. You have an input clip, a short clip of the actual moment in time, and then a controller answers what to do with it. Is it a new thread? Is it continuing an ongoing thread or resuming a previously paused thread? Based on the controller’s decision, the thread bank gets updated.”

The team built on its knowledge of transformers in modeling the controller. It has embeddings and the ability to jointly understand the relationship between the
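
The loop Dima describes (a clip comes in, a controller decides, the thread bank is updated) can be illustrated with a toy sketch. This is not the team's model: the learned transformer controller is replaced here by a simple cosine-similarity rule, and the names `ThreadBank`, `update`, and the `threshold` value are all hypothetical, chosen only to make the three decisions concrete.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two clip embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ThreadBank:
    """Toy memory: each thread is the list of clip embeddings assigned to it."""
    threads: List[List[List[float]]] = field(default_factory=list)
    active: Optional[int] = None  # index of the thread currently ongoing

    def update(self, clip: List[float], threshold: float = 0.8) -> str:
        # Stand-in controller (an assumption, not the transformer from the
        # article): compare the clip with the latest clip of every thread.
        scores = [cosine(clip, thread[-1]) for thread in self.threads]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)

        # No thread is similar enough: start a new thread.
        if best is None or scores[best] < threshold:
            self.threads.append([clip])
            self.active = len(self.threads) - 1
            return "new"

        # Otherwise the clip either continues the ongoing thread
        # or resumes one that was paused earlier.
        decision = "continue" if best == self.active else "resume"
        self.threads[best].append(clip)
        self.active = best
        return decision


# Hypothetical 2-D embeddings: two activities, with a return to the first.
bank = ThreadBank()
clips = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [1.0, 0.0]]
decisions = [bank.update(clip) for clip in clips]
print(decisions)
```

In this sketch the bank is updated one clip at a time, in viewing order, which mirrors the online setting the article describes; a learned controller would replace the fixed threshold with a decision made jointly over the clip and the bank contents.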
