ICCV Daily 2021 - Wednesday

Workshop Preview

Closing the Loop Between Vision and Language

Anna Rohrbach is a research scientist at UC Berkeley. Mohamed Elhoseiny is an Assistant Professor at KAUST, the King Abdullah University of Science and Technology. They are co-organizers of the 4th Workshop on Closing the Loop Between Vision and Language at ICCV this year, together with Andrew Brown, Xin Eric Wang, and Marcus Rohrbach. They speak to us ahead of the main event on Sunday.

Bose et al. address a new task of emotion prediction from multi-modal data, namely images and associated subjective captions.

This workshop sits at the intersection of computer vision and natural language processing (NLP). The field has developed rapidly in recent years, with new and exciting tasks and tools appearing. With the recent, renewed interest in language-supervised vision learning, it has really opened up and demonstrated the benefit of the enormous amount of multi-modal data now available.

“We have always strongly believed in the value of multi-modality,” Anna tells us. “We always stood by vision and language, even when it was still a niche area and there were many sceptics. There are still some. Ultimately, we believe that both modalities have to be analyzed and reasoned about jointly. Vision can benefit language and language can benefit vision, but studying each in isolation is probably suboptimal. Transformers and attention mechanisms both came into vision from NLP, for example. I would