CVPR Daily - Tuesday

Andrea says they started from Excitation Backprop, a CNN-based top-down saliency framework devised for image understanding, and extended the approach to video understanding with recurrent models (RNNs). The difficulty with video is that you must consider temporal information as well, not only the spatial information of the input. The hardest part, he says, was extending the method through the LSTM, whose many gates make its inner workings difficult to trace. Their approach is probabilistic: it backpropagates a probability through the network to reveal which excitatory connections the model used to classify the label.

Andrea says: “Actually, the trickiest part was to extend this method to the LSTM architecture. We solved it by normalising these backpropagated probabilities in time. You have to preserve the probabilities that you are backpropagating over time, because the LSTM also considers time. The most challenging part was to normalise this probability over all the time-steps in the clip. Basically, we have added this temporal normalisation in the backpropagation.” A rough sketch of this idea in code follows below.

Sarah adds: “We already know the next steps for this work. In this work we focus on one form of grounding: identifying evidence of a prediction within the image or video input. However, grounding can take other forms, including identifying evidence within the model, or identifying evidence within the training corpus.”

Andrea tells us that he has enjoyed working with Sarah on this project and feels that their skills are complementary. He is enthusiastic about their collaboration, and that of their two institutions, and hopes it will continue.

To find out more about their work, please visit Andrea and Sarah’s poster [D22] today, 12:30-2:50, in Halls C-E.
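To make the temporal normalisation concrete, here is a minimal NumPy sketch of how backpropagated excitation probabilities might be redistributed through one layer and then normalised over the time-steps of a clip. This is an illustration of the general idea only, not Andrea and Sarah’s actual implementation: the function names, array shapes, and the simple sum-to-one normalisation are all assumptions.

```python
import numpy as np

def excitation_backprop_layer(prob_out, activations, weights):
    """Redistribute winning probabilities one layer down, following the
    excitation backprop rule: only excitatory (positive) weights pass mass.

    prob_out    : (out_dim,)         winning probabilities at the upper layer
    activations : (in_dim,)          non-negative activations of the lower layer
    weights     : (out_dim, in_dim)  layer weights
    """
    w_pos = np.clip(weights, 0.0, None)         # keep excitatory weights only
    contrib = w_pos * activations[None, :]      # a_i * w_ji^+
    denom = contrib.sum(axis=1, keepdims=True) + 1e-12
    cond = contrib / denom                      # P(a_i | a_j), rows sum to 1
    return cond.T @ prob_out                    # P(a_i) = sum_j P(a_i|a_j) P(a_j)

def temporal_normalisation(prob_maps):
    """Normalise backpropagated probabilities over the T time-steps of a clip
    so the total probability mass is preserved across time (one simple way to
    realise the normalisation Andrea describes; the exact scheme is assumed)."""
    total = sum(p.sum() for p in prob_maps) + 1e-12
    return [p / total for p in prob_maps]

# Toy usage: push a class probability of 1.0 back through one layer at each
# of T time-steps, then normalise the resulting maps over time.
rng = np.random.default_rng(0)
T, out_dim, in_dim = 4, 1, 8
weights = rng.standard_normal((out_dim, in_dim))
maps = [excitation_backprop_layer(np.ones(out_dim), rng.random(in_dim), weights)
        for _ in range(T)]
maps = temporal_normalisation(maps)
assert np.isclose(sum(m.sum() for m in maps), 1.0)
```

In this sketch the per-time-step maps are rescaled jointly rather than independently, so evidence is comparable across the clip; that is the intent of the normalisation described in the interview, though their method’s exact formulation may differ.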
