
for us humans, and combining linguistic information with the scene text and the image itself is challenging. The team now has a model that can reason over vision and a model that can read, but not a single model that does both. Combining the capabilities of past models with this new model, to produce one that knows both how to read and how to perform high-quality visual reasoning, is an essential next step for this work.

"We've pinpointed the problem that there is some sort of bias, and obviously, you have to fix it," Ali points out. "The first option is to ask humans for more data, because we need to collect more data that requires vision to be taken into account to train better models. That is a valid strategy, but it's faulty reasoning because we already collected data from humans, and we're living in a biased world, so it will just open up new biases. The second option is to still use this data but somehow make sure the models use the visual features no matter what."

He leaves us with a final anecdote: "Mark Twain said, 'History doesn't repeat itself, but it often rhymes.' I think we've found that is true, right? History rhymed again in that we are trying to make V matter again in VQA, in a sense, in this paper, and that's kind of funny to me."

To learn more about Ali's work [ID 1446], come along to oral session 4.1.3 today at 08:30 and poster session 4.1.
