Computer Vision News - July 2024

Highlight Presentation

Unified-IO 2 builds on the foundations laid by its predecessor, Unified-IO, aiming to create a model that can truly input and output anything. Training such a comprehensive model, especially with limited resources, has been incredibly tough. The team’s first major challenge was collecting the pretraining and instruction tuning data. The second was training a multimodal model from scratch rather than adapting existing unimodal models. “We tried a few months of tricks to stabilize the model and make it train better,” Jiasen recalls. “We figured out a few key recipes that were used by later papers and shown to be very effective, even in other things like image generation. We’re training on a relatively large scale with 7B models and over 1 trillion data. More than 230 tasks were involved in training these giant models.”
