CVPR Daily - Wednesday

Ivana Balažević is a Research Scientist at Google DeepMind. She spoke to us at CVPR 2024 on Monday, right after her talk and panel at the Prompting in Vision workshop.

Where does Balažević come from?

Croatia.

I am not very far away. I am Italian.

Ah, okay, so we’re neighbors! [she laughs]

What is your work about?

Well, various different things. I have mainly, in the past couple of years, worked on multimodal image and video understanding. As of recently, I moved into Gemini, working more on language. But, yeah, Gemini, super secret, can’t talk about it – you know how it is!

Is convergence of multimodalities really happening? Text with vision, video, audio, all these things together?

I think it is, especially in the past couple of years. I finished my PhD in 2021, and there were just these small models doing various different tasks. Everyone was working on their little model in their PhD or in whichever company, and now suddenly, everything is converging into one big model, which can do these various things, which is exciting but also maybe a bit scary. I don’t know. Mainly exciting, I would say.

Why exciting, and why scary?

That’s a very good question! In my mind, exciting because it unlocks a whole world of possibilities for what we can possibly do with these models in some possibly distant future. I don’t know because I didn’t think we’d be where we are now, but we would maybe be able to learn from these models or learn something new that we don’t know. These models might be able to make some sort of inferences, like combining various modalities to teach us things. How amazing would it be if we had a model that would be able to read scientific papers and come up with a new paper that is actually correct and that teaches us something new? Or a model that takes all our knowledge about medicine and biology and chemistry and finds a cure for cancer or something like that? Some of these things would be really, really amazing. Scary because, well, as with anything, people can abuse these sorts of models.

And you cannot control the person who wants to abuse.

Exactly. Any tool probably in human history – not any, but a lot of them – can be used for good and for bad things. You have a kitchen knife that you can cut vegetables with or something…

Or cut the neighbor!

Yeah, exactly! [both laugh] It’s the same with the technology nowadays.
