ECCV 2022 - Wednesday

self-driving. Using stochastic latent variables, we can model different modalities correctly, and our future predictions can cover several modalities rather than only one.”

The network must understand the temporal dynamics in the video to predict where each object is and where it will be in the next few seconds. “Much of the work so far has been in the pixel space, but we’re using something more compact because self-driving vehicles have multiple cameras,” Fatma explains. “We’re summarising the information from six cameras into this compact bird’s-eye view representation and trying to predict the future instances in that.”

Alex Kendall gave a fascinating talk at the Workshop on Uncertainty Quantification for Computer Vision earlier this week on the Foundation Model he’s building for autonomous driving and how to make an end-to-end learning approach safe.

Fatma also points to Yann LeCun’s recent position piece on world models as a great way to approach self-driving. “Once you have the world model, you can generate arbitrary length futures,” she tells us. “You can ask, given this state and this action, what’s going to happen next? With that kind of model, which can accurately predict the future, you can learn how to act based on its predictions. You don’t need a simulator. You don’t need something like CARLA to give you the next thing because your model can do that for you. It opens up many possibilities in terms of action.”

Adil Kaan Akan