Computer Vision News - May 2022

Congrats, Doctor!

(Jacob) Zhiyuan Fang recently completed his PhD at the Active Perception Group, Arizona State University. His research interests span the intersection of Computer Vision and Natural Language Processing, with a focus on building efficient and strong Vision-Language models from weak supervision. He developed Vision-Language learning algorithms from the perspective of knowledge distillation and implicit supervision on efficient deep neural architectures. During his PhD, he also spent time at Microsoft Azure AI as a research intern, and he is joining Amazon Alexa AI-Lab126 as a research scientist in June 2022.

An important objective of AI is to understand real-world observations and build interactive communication with people. The ability to interpret and react to what is perceived makes it essential to develop such systems across both the Vision (V) and Language (L) modalities. Although there have been massive efforts on various VL tasks, e.g., Image/Video Captioning, Visual Question Answering, and Textual Grounding, very few of them focus on building VL models that remain efficient under real-world scenarios. The focus of my research is to comprehensively investigate the largely uncharted area of efficient VL learning, aiming to build lightweight, data-efficient, and real-world-applicable VL models. The studies proposed in my research take three primary aspects of efficient VL into account:

1) Data Efficiency: collecting task-specific annotations is prohibitively expensive, and manual labeling is not always attainable (see Figure 1). Techniques are developed to assist VL learning from implicit supervision, i.e., to learn the associations between visual concepts and semantics in a weakly supervised or unsupervised fashion. For example, given an image, learn which specific region best corresponds to the textual query “A man in red topping” when only an image-level description is available, without knowing which region corresponds to which visual concept (a minimal sketch of this setup follows the article). Part of my work aims to build textual grounding in images and language-based moment localization in videos in a weakly supervised fashion.

2) Efficient representation learning with increased scalability: It is challenging …

Congrats, Doctor Jacob!
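To make the weak-supervision idea in the Data Efficiency item concrete, here is a minimal sketch of a multiple-instance grounding model, assuming PyTorch, precomputed region features from a pretrained detector, and a query embedding from a text encoder. The class name, projection dimensions, max-pooling aggregation, and hinge loss are illustrative assumptions, not the specific formulation from the thesis.

```python
import torch
import torch.nn.functional as F

class WeaklySupervisedGrounding(torch.nn.Module):
    """Scores region proposals against a text query, trained from
    image-level (image, caption) pairs only: no region annotations."""

    def __init__(self, region_dim: int, text_dim: int, joint_dim: int = 256):
        super().__init__()
        # Project both modalities into a shared joint embedding space.
        self.region_proj = torch.nn.Linear(region_dim, joint_dim)
        self.text_proj = torch.nn.Linear(text_dim, joint_dim)

    def region_scores(self, regions: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # regions: (num_regions, region_dim), query: (text_dim,)
        r = F.normalize(self.region_proj(regions), dim=-1)
        q = F.normalize(self.text_proj(query), dim=-1)
        return r @ q  # cosine similarity of each region to the query

    def image_score(self, regions: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # Multiple-instance assumption: the image matches the caption if
        # at least one region does, so max-pool over region scores.
        return self.region_scores(regions, query).max()

def hinge_loss(model, regions, query, negative_queries, margin=0.2):
    """Image-level supervision only: the true caption must outscore
    mismatched captions by a margin; no region labels are used."""
    pos = model.image_score(regions, query)
    negs = torch.stack([model.image_score(regions, q) for q in negative_queries])
    return F.relu(margin - pos + negs).mean()
```

At inference, `region_scores` is evaluated once and the highest-scoring proposal is returned as the grounding for the query; the same multiple-instance idea carries over to moment localization in videos by scoring temporal segments instead of spatial regions.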
