Computer Vision News - April 2018

The attention weight per word of the question is shown by the degree of pink or red shading on that word (top), followed by the answers produced by DRAU (middle) and MCB (bottom). Recurrence does indeed appear to help the model attend to multiple targets, as the difference between the two models' attention maps in the image below shows: (1) DRAU attends to the correct kind of object, as in the bear image; (2) it attends to the location required to answer the question, as in the second column; and (3) it handles complex relational questions, as in the racket question.

Results: The authors propose an RNN-based network that produces parallel visual and textual attention; the new VQA-focused attention unit is termed the Recurrent Attention Unit (RAU). The attention module helps the network focus on the relevant textual and visual locations, allowing it to connect and better "understand" the relations between parts of the question and areas of the image. The authors present quantitative and qualitative results demonstrating the model's effectiveness. DRAU achieved better accuracy than other models, particularly on tasks involving complex or sequential reasoning, such as counting the number of objects in an image. The proposed model outperforms the top performer on the VQA-1.0 benchmark and achieves results comparable to the VQA-2.0 state of the art. All this with a single model, without ensembling, while the top VQA-2.0 performers relied on ensembles of 20 or more models.
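The article does not include code, but to make the idea concrete, here is a minimal PyTorch-style sketch of a recurrent attention unit in the spirit of DRAU: an LSTM runs over a sequence of features and emits one attention weight per step, so each weight can be conditioned on what was attended previously. All names, dimensions, and fusion choices here are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a Recurrent Attention Unit (RAU); NOT the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentAttentionUnit(nn.Module):
    """Runs an LSTM over a sequence of features (image regions or question
    words) and produces one attention weight per step, letting the recurrence
    condition each weight on previously attended items."""
    def __init__(self, feat_dim, ctx_dim, hidden_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim + ctx_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)  # scalar attention logit per step

    def forward(self, feats, ctx):
        # feats: (B, N, feat_dim)  sequence of region/word features
        # ctx:   (B, ctx_dim)      fused question/image context vector
        B, N, _ = feats.shape
        ctx_rep = ctx.unsqueeze(1).expand(B, N, -1)
        h, _ = self.lstm(torch.cat([feats, ctx_rep], dim=-1))
        alpha = F.softmax(self.score(h).squeeze(-1), dim=1)  # (B, N) weights
        attended = torch.bmm(alpha.unsqueeze(1), feats).squeeze(1)  # (B, feat_dim)
        return attended, alpha
```

In a DRAU-style model, two such units would run in parallel, one over the question words (textual attention) and one over the image regions (visual attention), with their attended outputs fused to predict the answer.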
