CVPR Daily - Tuesday

He adds that most of the related work on this subject relies on pose information, that is, skeleton data provided by software, but in their case they wanted a model that can run using only RGB data to understand fine-grained human actions.

Fabien Baradel explains further: “We wanted our method to be fully differentiable, to use backpropagation in order to learn our weights, so we have used a spatial transformer to do some cropping in the video in a differentiable way. This extracts a patch in the video, because we believe that extracting local information is important for the whole task.”

He tells us that they have been focusing on trimmed video, which is already a difficult enough task. In terms of next steps, the biggest challenge, and one which would have a lot of applications in the real world, would be to extend this work to untrimmed video: understanding when no action is happening in the video and when an action starts and ends.

If you want more information about Fabien’s work, you can visit the project page. There is also a GitHub repository where they will be releasing the code this week, so you will be able to train and evaluate their work. You can also learn more and ask Fabien any questions about this work by coming along to his poster [H2] today at 10:10-12:30 in Halls C-E.
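To make the idea of cropping “in a differentiable way” more concrete, here is a minimal sketch of the bilinear sampling that spatial transformers rely on. It is not Fabien’s implementation: the function names `bilinear_sample` and `soft_crop`, the single-channel image, and the crop parameters (`cx`, `cy`, `scale`) are all assumptions made for illustration, and a real model would use an automatic differentiation framework rather than plain Python.

```python
import math


def bilinear_sample(image, x, y):
    """Sample a 2D image (list of rows) at continuous pixel coordinates (x, y)."""
    h, w = len(image), len(image[0])
    # Clamp so that the four neighbouring pixels always lie inside the image.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    # Blend the four neighbours; the result varies smoothly with x and y.
    top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
    bottom = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
    return (1 - ay) * top + ay * bottom


def soft_crop(image, cx, cy, scale, out_size):
    """Extract an out_size x out_size patch (out_size >= 2).

    (cx, cy) is the patch centre and scale its half-extent, both expressed
    in normalised coordinates where the full image spans [-1, 1].
    These parameter names are hypothetical, chosen for this sketch.
    """
    h, w = len(image), len(image[0])
    patch = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            # Regular output grid in [-1, 1].
            u = -1.0 + 2.0 * j / (out_size - 1)
            v = -1.0 + 2.0 * i / (out_size - 1)
            # Affine map from output grid to source coordinates.
            xs = scale * u + cx
            ys = scale * v + cy
            # Convert normalised coordinates to pixel coordinates.
            px = (xs + 1.0) * 0.5 * (w - 1)
            py = (ys + 1.0) * 0.5 * (h - 1)
            row.append(bilinear_sample(image, px, py))
        patch.append(row)
    return patch


# Toy 8x8 "frame" whose intensity is a linear ramp.
frame = [[float(x + 10 * y) for x in range(8)] for y in range(8)]
patch = soft_crop(frame, cx=0.0, cy=0.0, scale=0.5, out_size=2)

# Finite-difference estimate of how one patch value changes with the crop centre.
eps = 1e-3
shifted = soft_crop(frame, cx=eps, cy=0.0, scale=0.5, out_size=2)
grad_cx = (shifted[0][0] - patch[0][0]) / eps
```

Because each output value is a weighted blend of neighbouring pixels, a small shift of the crop centre produces a small, well-defined change in the patch (here `grad_cx` is non-zero). This is what allows the crop location itself to be learned with backpropagation, which a hard integer crop would not permit.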
