Computer Vision News

Results

Audio and video classification pre-fusion performance. The results are percentages with standard deviation. The video results are per frame.

The table shows the performance of the audio and video classifiers separately, without classifier fusion. Images without a classification decision, such as when the person is not speaking or facial recognition failed, were excluded for the purpose of this evaluation. The table presents precision and the F1 rate, F1 = 2PR / (P + R) (where P is the precision and R is the recall), for the four emotional state categories. Compared to the best performance previously achieved on the benchmark, these results are already impressive (for instance, the best precision for predicting ‘Arousal’ was 61%).

The next two tables present performance for the four categories with uni-modal classifier fusion, that is, fusion of the audio base classifiers separately and of the video base classifiers separately.

Audio classification performance after fusion using Kalman Filter:

Video classification performance after fusion using Kalman Filter. The table below presents video-channel results. Here, all categories show improvement compared to the pre-fusion results. Note, however, that Arousal achieved better results using the audio channel:

Research: Kalman Filter Based Classifier…
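The evaluation described in the Results text (per-frame scoring, frames without a classification decision excluded, precision and F1 reported) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name `precision_recall_f1`, the binary 0/1 encoding of a category, the use of `None` for "no decision", and the toy labels are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int],
    y_pred: Sequence[Optional[int]],
    positive: int = 1,
) -> Tuple[float, float, float]:
    """Per-frame precision, recall and F1 for one category.

    Assumption: a prediction of None marks a frame with no classification
    decision (e.g. no speech, or face not found); such frames are skipped.
    """
    # Keep only frames for which the classifier produced a decision.
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]

    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 = 2PR / (P + R), defined as 0 when both P and R are 0.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (hypothetical): six frames, one without a decision.
labels = [1, 1, 0, 0, 1, 0]
predictions = [1, 0, 1, None, 1, 0]
p, r, f1 = precision_recall_f1(labels, predictions)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

In this toy case the fourth frame is dropped before counting, so the scores are computed over five frames only. This mirrors the exclusion rule stated in the text, but it also means the reported figures say nothing about how often a classifier abstains.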

Computer Vision News - March 2019