CVPR Daily - Thursday

Best Paper Award Candidate

The model begins by processing each detected feature track separately with a feature network. This network consists of common fully convolutional layers, a correlation layer, and a recurrent layer. The correlation layer correlates a grayscale image patch with event sequences, while the recurrent layer exploits the temporal information in the events, which inherently contain motion.

“Then we add a frame attention module on top of the feature network,” Nico explains. “We have feature networks for each track in one frame, and then we want to share the global information inside one frame, so you have multiple tracks in one frame. The frame attention module relies on a multi-head attention layer to combine the information from the different tracks in one frame and then output the final displacement vector.”
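To make the idea of sharing information across tracks more concrete, here is a minimal NumPy sketch of a multi-head self-attention step over the per-track features of one frame, followed by a linear head that outputs one 2D displacement per track. It is an illustration only, not the authors' implementation: the class name `FrameAttentionSketch`, the feature dimension, the number of heads, the residual connection, the linear output head, and the random (untrained) weights are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class FrameAttentionSketch:
    """Hypothetical frame attention block: multi-head self-attention across
    the feature tracks of one frame, then a linear displacement head.
    Weights are random placeholders, not trained parameters."""

    def __init__(self, dim=32, num_heads=4, seed=0):
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.dim, self.num_heads = dim, num_heads
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        self.w_o = rng.normal(0.0, scale, (dim, dim))
        self.w_disp = rng.normal(0.0, scale, (dim, 2))  # assumed output head

    def _split_heads(self, x):
        # (num_tracks, dim) -> (num_heads, num_tracks, head_dim)
        n = x.shape[0]
        head_dim = self.dim // self.num_heads
        return x.reshape(n, self.num_heads, head_dim).transpose(1, 0, 2)

    def attention_weights(self, track_feats):
        """Per-head attention of every track over all tracks in the frame."""
        q = self._split_heads(track_feats @ self.w_q)
        k = self._split_heads(track_feats @ self.w_k)
        head_dim = self.dim // self.num_heads
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
        return softmax(scores, axis=-1)  # (num_heads, num_tracks, num_tracks)

    def __call__(self, track_feats):
        """track_feats: (num_tracks, dim) -> displacements: (num_tracks, 2)."""
        n = track_feats.shape[0]
        v = self._split_heads(track_feats @ self.w_v)
        mixed = self.attention_weights(track_feats) @ v
        mixed = mixed.transpose(1, 0, 2).reshape(n, self.dim)
        fused = track_feats + mixed @ self.w_o  # residual connection (assumed)
        return fused @ self.w_disp


# Usage: five tracks in one frame, each with a 32-dimensional feature vector
# standing in for the output of the per-track feature network.
feats = np.random.default_rng(1).normal(size=(5, 32))
displacements = FrameAttentionSketch()(feats)
print(displacements.shape)
```

Because every track attends to every other track in the frame, changing the features of one track also changes the displacement predicted for the others, which is the sharing of global information described in the quote.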