Computer Vision News - March 2019
Reject indicates the percentage of decisions rejected; q and r correspond to the process noise and the observation noise, respectively. q is set to 0.5, and the observation noise is set to a higher value than the one originally assigned to r. At first glance, comparing the results of the second table ("audio fusion") to the first table (no classifier fusion), it may seem that only Valence and Arousal show improvement. However, the most important improvement is that there is now a classification decision for every video frame.

The table below presents results for the fusion of audio and video channel data. This multi-modal fusion improves performance for Power and Expectancy. However, Arousal performance was best when fusing only the audio data, and Valence performance was best when fusing only the video data. The lowered Arousal performance is likely caused by an imbalance between the two channels: audio channel decisions are not always available, while video channel decisions are almost always available. In the case of Valence, the lowered performance can be attributed to the relatively low F1 score of the audio channel.

Conclusion
The paper presents an implementation of Kalman Filters for fusing the decisions of a number of classifiers, and the authors show the feasibility of fusing multi-modal classifier outputs. The model fuses a large number of measurements and can handle missing values by increasing the noise level for the classifier whose decision is absent. The authors used the Audio/Visual Emotion Challenge (AVEC) 2011 dataset to evaluate the performance of the fused classifier. Fusion clearly improved performance for all four emotional state categories measured in the challenge. Moreover, using the Kalman Filter enabled the system to estimate the missing classification results, so it was able to classify all data (every video frame). Even though this Kalman Filter can be considered the simplest instance of a time-series model (with no control matrix and identity dynamics), the results presented in the paper are excellent.
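To make the fusion mechanism concrete, here is a minimal sketch of how per-frame classifier decisions can be fused with a scalar Kalman Filter under identity dynamics, with missing decisions handled by inflating their observation noise. This is not the authors' implementation: the function name fuse_decisions, the default observation noise r, and the r_missing inflation value are illustrative assumptions; only the process noise q = 0.5 comes from the description above.

```python
import numpy as np

def fuse_decisions(decisions, q=0.5, r=1.0, r_missing=1e6):
    """Fuse per-frame decisions from several classifiers with a 1-D Kalman Filter.

    decisions : array of shape (n_frames, n_classifiers); np.nan marks a
                missing decision for that classifier at that frame.
    q         : process noise variance (0.5, as in the table above).
    r         : assumed observation noise variance of an available decision.
    r_missing : inflated noise assigned to a missing decision, so it barely
                influences the estimate but a fused output still exists.
    """
    n_frames, n_clf = decisions.shape
    x, p = 0.0, 1.0                      # state estimate and its variance
    fused = np.empty(n_frames)

    for t in range(n_frames):
        # Predict: identity dynamics, so only the uncertainty grows.
        p += q

        # Update: treat each classifier's decision as a noisy measurement.
        for c in range(n_clf):
            z = decisions[t, c]
            r_c = r_missing if np.isnan(z) else r
            z = x if np.isnan(z) else z  # a missing value adds ~no information
            k = p / (p + r_c)            # Kalman gain
            x += k * (z - x)
            p *= (1.0 - k)

        fused[t] = x                     # a decision exists for every frame
    return fused

# Hypothetical example: three classifiers, one decision missing at frame 2.
d = np.array([[0.8, 0.7, 0.9],
              [0.6, np.nan, 0.7]])
print(fuse_decisions(d))
```

Because a missing decision is simply a measurement with very large noise, the filter still produces an estimate for that frame, which is the property that lets the fused system classify every video frame.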