Computer Vision News - March 2019
emotional states. The data are audio and video recordings of a human communicating with a virtual agent while trying to stimulate one of four emotional dimensions: Arousal, Expectancy, Power and Valence. Ground truth was determined by having eight human evaluators rate each recording on a continuous scale; the final binary labels were obtained by thresholding the average rating.

The Base Classifiers

Below are the features used as input by the base classifiers. Each classifier was trained on either audio or video recordings, and optimized through cross-validation together with the other classifiers for the same recording type.

Audio: To obtain a fixed-length input vector from a long, continuously varying signal, an HMM-based transformation is used. Classification is done by five bags of random forests; the final decision is determined by averaging over the five bags, with the standard deviation across them used to compute the measure of confidence. Audio classification is conducted per word using three bag-of-words representations composed of the following features:
● Fundamental frequency, energy and linear predictive coding (LPC)
● Mel frequency cepstral coefficients (MFCC)
● Relative spectral transform - perceptual linear prediction (RASTA-PLP)

Video: Video channel features were acquired from the Computer Expression Recognition Toolbox (CERT), dedicated to facial expression recognition. Four models from the CERT toolbox were used: Basic Smile Detector, Unilaterals, FACS 4.4 and Emotions 4.4.3. The outputs of all four models were concatenated to create a length-36 vector for every video frame. The overall classification decisions and confidence measures were determined as for audio, using five bags of random forests. In 8% of cases facial recognition failed, leaving missing results for the base classifier, which the Kalman filter handled. The hyperparameters of all base classifiers were optimized using the training and validation datasets.
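The decision rule used by both the audio and video channels (average the five bags' scores, and derive confidence from their spread) can be sketched as follows. This is a minimal illustration, not the authors' code; the helper name, the threshold and the mapping from standard deviation to confidence are assumptions.

```python
import statistics

def fuse_bags(bag_scores, threshold=0.5):
    """Hypothetical helper: fuse the scores of five random-forest bags.

    bag_scores: one score in [0, 1] per bag.
    Returns (binary decision, confidence). The decision thresholds the
    mean score; confidence is derived from the standard deviation across
    bags, so low spread (high agreement) gives high confidence.
    """
    mean = statistics.fmean(bag_scores)
    spread = statistics.pstdev(bag_scores)
    decision = int(mean > threshold)
    confidence = 1.0 - spread  # simple monotone mapping; the exact mapping is not given in the article
    return decision, confidence

# Example: five bags that largely agree on the positive class
decision, confidence = fuse_bags([0.8, 0.75, 0.82, 0.78, 0.8])
```

In this example the bags agree closely, so the spread is small and the resulting confidence is high; a disagreeing set of bags would yield a lower confidence, which the fusion stage can then discount.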
To optimize the hyperparameters of the classifier fusion algorithm (the Kalman filter), the same training and validation datasets were used, this time treating the classification decisions and confidence measures of each base classifier as features. The model fuses a large number of measurements and can handle missing values by increasing the noise level for the corresponding classifier.
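The missing-value strategy described above can be sketched with a one-dimensional Kalman update: when a classifier's output is absent (e.g. facial recognition failed on a frame), its measurement noise is inflated so it contributes almost nothing to the fused estimate. This is an illustrative sketch under assumed noise values, not the paper's actual filter.

```python
def kalman_fuse(x, P, measurements, base_R=0.1, missing_R=1e6):
    """One scalar Kalman update pass over several classifier outputs.

    x, P: current fused score estimate and its variance.
    measurements: list of classifier scores, with None marking a
    classifier that produced no output. A missing measurement is
    replaced by the current estimate with a huge noise R, so the
    gain for it is near zero and the estimate is left untouched.
    (All names and noise values here are assumptions for illustration.)
    """
    for z in measurements:
        if z is None:
            z, R = x, missing_R  # uninformative measurement: inflated noise
        else:
            R = base_R
        K = P / (P + R)          # Kalman gain
        x = x + K * (z - x)      # state (fused score) update
        P = (1 - K) * P          # variance update
    return x, P

# Example: the second classifier failed, so its slot is None
x, P = kalman_fuse(x=0.5, P=1.0, measurements=[0.9, None, 0.8])
```

The two available measurements pull the estimate toward ~0.83 and shrink its variance, while the missing one is effectively skipped rather than treated as a zero, which is what makes the fusion robust to base-classifier failures.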