ECCV 2018 Daily - Monday

Presentation

Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions

Hector Basevi is a research fellow at the University of Birmingham in the United Kingdom. He speaks to us about the poster he presented yesterday as part of the workshop on Visual Learning and Embodied Agents in Simulation Environments (VLEASE).

Hector’s work combines question answering systems, which are very good at describing spatial dependencies and semantic relations, with a physics prediction model, which allows the prediction of future temporal dependencies in a way that is meaningful to non-experts.

Hector tells us that the main novelty of the work is that they have applied QA models to a time prediction problem. They have also integrated a model-based physics prediction engine into the system, which allows it to learn from small amounts of data, without needing the deep learning components to learn physics as well as language. With a physics engine in the middle of the problem, they can break it down into two smaller sub-problems, each of which is easy to learn.

One challenge he encountered was coming up with a final architecture, because a monolithic end-to-end approach does not work, and you need to decompose the problem into meaningful representations. He says another challenge was that in simulation, it’s very easy to generate videos of scenes, but hard to find enough humans to look at them and describe things. Thankfully, they got there in the end.

“We want to put this system onto an actual robot!”
