Computer Vision News - July 2024

Highlight Presentation

Aniruddha Kembhavi (top left) is a Senior Director at the Allen Institute for AI (AI2), leading the Perceptual Reasoning and Interaction Research (PRIOR) team, where Christopher Clark (center) and Jiasen Lu (top right) are Research Scientists, Sangho Lee (bottom left) is a Postdoctoral Researcher, and Zichen “Charles” Zhang (bottom right) is a Predoctoral Young Investigator. They spoke to us about their highlight paper proposing Unified-IO 2, a versatile autoregressive multimodal model.

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Unified-IO 2 is the first autoregressive multimodal model capable of understanding and generating images, text, audio, and action. It can handle multiple input and output modalities and incorporates a wide range of tasks from vision research. Unlike traditional models with specialized components for different tasks, it uses a single encoder-decoder transformer model to handle all tasks, with a unified loss function and pretraining objective.
