Computer Vision News - April 2018
Dataset: Several datasets have been published for VQA (see "Visual Question Answering: A Survey of Methods and Datasets" for an overview). Since its introduction in 2015, the VQA dataset of Antol et al. has been the de facto benchmark. It has two versions: VQA-1.0 from 2015 and VQA-2.0 from 2017; VQA-2.0 was built to reduce the language priors and bias inherent in VQA-1.0.

Results: The full results table for VQA-1.0 is not included here due to space constraints and can be viewed in the original article. The authors' model, DRAU, achieved performance comparable to state-of-the-art models: Kim et al.'s MLB ("Hadamard Product for Low-rank Bilinear Pooling") achieved the best published result of 65.07% on the test-std split, while DRAU achieved 65.03%.

Turning to the VQA-2.0 results on the validation split: RVAU shows a significant accuracy improvement over the LSTM baseline. However, when RVAU is used as a multi-label classifier, the results deteriorate drastically. Several causes are possible; the most likely is that with several answers per question, the task becomes much harder. The authors also present a number of additional variations on the model, with the best overall accuracy achieved by the DRAU model with high final dropout.
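To see why having several answers per question complicates the task, recall how the VQA benchmark scores predictions: each question comes with ten human answers, and a predicted answer receives credit min(matches / 3, 1), so it counts as fully correct only if at least three annotators gave it. A minimal sketch of that scoring rule (the function name is our own, not from the paper):

```python
def vqa_accuracy(predicted, human_answers):
    """Score one prediction under the standard VQA metric.

    predicted: the model's answer string.
    human_answers: the ten annotator answers for this question.
    Returns min(#annotators who gave this answer / 3, 1).
    """
    matches = sum(answer == predicted for answer in human_answers)
    return min(matches / 3.0, 1.0)


# A question where annotators disagree: "cat" earns only partial credit.
humans = ["cat", "cat", "kitten", "cat", "dog",
          "cat", "kitten", "cat", "dog", "cat"]
print(vqa_accuracy("cat", humans))     # full credit: 6 annotators agree
print(vqa_accuracy("kitten", humans))  # partial credit: only 2 agree
```

(In the official evaluation, answers are also normalized, e.g. lowercased and stripped of articles, before matching; that step is omitted here for brevity.) Because annotators often disagree, a question may have no single answer that earns full credit, which is one reason the multi-label formulation is harder than picking a single most-frequent answer.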