ICCV Daily 2021 - Wednesday

Cadigan et al. explore entity linking and cross-domain local similarity scaling to address image-text verification in news media.

and say, ‘What is it? What am I looking at? What does it say here?’” Anna points out. “I don’t think we are doing a great job with that in-the-wild setting right now. There are still challenges. Most of the data we are seeing in these benchmarks is captured by sighted people who see what they’re photographing, so there’s still a long way to go.”

Another area that has seen a lot of interest is multi-modal pre-training. This is pre-training that exploits both modalities and benefits downstream tasks in either one. The grounding acquired during pre-training benefits both modalities.

The workshop features a stellar roster of speakers, including Cordelia Schmid from INRIA, Lisa Anne Hendricks from DeepMind, Bryan Plummer from Boston University, Mohit Bansal from UNC Chapel Hill, and Alec Radford from OpenAI.

“We’re really excited about the panel that we hope to have towards the end of the workshop,” Anna reveals. “It’s always fun to see what these great minds have to say when we throw difficult questions at them!” she laughs.

There are also two exciting new challenges this year. One is from Oxford University on movie understanding, and the other is a broad effort from many institutions called VALUE, which brings together a number of video-and-language datasets.

“We’re keen to see this sort of broader picture of how we’re doing if we try to evaluate across many benchmarks,” Anna adds. “That should be very cool!”