DAILY CVPR — Friday

The development of Unified-IO 2 has been a collaborative effort involving the four first authors: Jiasen, Christopher, Sangho, and Charles. Aniruddha is keen to ensure they get the recognition they deserve for the feat they have pulled off.

“This project is a Herculean effort by these four people,” he points out. “Usually, people will take a large language model, then put a vision backbone, and then finetune that on some computer vision tasks. In this model, the language model is also trained from scratch. Think of large companies with hundreds of researchers trying to train a language model. Contrast that with this paper, which has four first authors trying to train a model that does everything. These four gentlemen have toiled night and day for many, many months. I can testify to that.”

Everything about Unified-IO 2 is open source. If you visit the team’s poster today, you can feel safe knowing they are willing to share every aspect of the project.

“We’ve released all the data, the training recipes, the challenges, especially in stabilizing the model training, and all the evaluation pipelines,” Sangho confirms. “If you come to our poster booth, we’ll be very happy to share all the recipes and know-how for training this special kind of multimodal foundation model.”