Computer Vision News - February 2018

Dataset: The following datasets were used for evaluation:

1. MNIST - a benchmark for the 10-class handwritten-digit classification task, comprising 60,000 grayscale training images and 10,000 test images. For training, Gaussian noise was added to the data (see the augmentation sketch at the end of this section).

2. CIFAR-10 - a benchmark dataset for the 10-class object classification task, comprising 50,000 colored 32 × 32 pixel images for training and 10,000 for testing. For training, horizontal reflection and jitter were randomly applied to the data (see the augmentation sketch at the end of this section).

3. Facial Expression Recognition - the classification challenge held at the ICML 2013 Representation Learning Workshop at the University of Montreal. The dataset consisted of 28,709 images (48 × 48 pixels) of faces labeled with 7 expression types for training, and 3,589 test images.

Networks used: The network used for MNIST consisted of 2 fully-connected hidden layers of 512 units each, followed by either the softmax or the L2-SVM loss function. The training data was divided into 300 mini-batches of 200 samples each, the learning rate was decayed linearly from 0.1 to 0.0, and the L2 weight cost on the softmax was set to 0.001.

The two loss functions compare as follows:

| | Softmax | L2-SVM |
|---|---|---|
| Derivative | $\frac{\partial \ell}{\partial a_y} = -1 + \frac{\exp(a_y)}{\sum_j \exp(a_j)}$ | $\frac{\partial \ell}{\partial h_n} = -2\, t_n \max(1 - h_n t_n, 0)$ |
| Objective | Minimizes cross-entropy, i.e. maximizes the log-likelihood | Maximum margin between data points of different classes (one-vs-rest approach) |
| Loss function property | Gives a probability of correct classification (never 100%); as a consequence, the learning process could theoretically continue indefinitely | Returns 0 when correctly classified; doesn't differentiate between small-margin and large-margin correct classification |
| Inter-class loss | Cross-entropy | One-vs-rest |
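For reference, here are the two losses whose derivatives appear in the table, written out in full. This is a reconstruction assuming the standard definitions: class scores $a_i$ with true label $y$ for the softmax, and per-class scores $h_n$ with one-vs-rest targets $t_n \in \{-1, +1\}$ for the L2-SVM:

$$\ell_{\text{softmax}} = -\log \frac{\exp(a_y)}{\sum_j \exp(a_j)}, \qquad \frac{\partial \ell_{\text{softmax}}}{\partial a_y} = -1 + \frac{\exp(a_y)}{\sum_j \exp(a_j)}$$

$$\ell_{\text{L2-SVM}} = \sum_n \max(1 - h_n t_n,\, 0)^2, \qquad \frac{\partial \ell_{\text{L2-SVM}}}{\partial h_n} = -2\, t_n \max(1 - h_n t_n,\, 0)$$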
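To make the MNIST setup concrete, here is a minimal sketch in PyTorch. The architecture (2 × 512 fully-connected hidden layers), the batch size of 200, the linear learning-rate decay from 0.1 to 0.0, and the 0.001 L2 weight cost on the output layer come from the article; the epoch count and the choice of ReLU activations are assumptions.

```python
import torch
import torch.nn as nn

class MNISTNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Two fully-connected hidden layers of 512 units each, as in the article.
        self.hidden = nn.Sequential(
            nn.Linear(784, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
        )
        self.out = nn.Linear(512, num_classes)  # raw class scores h(x)

    def forward(self, x):
        return self.out(self.hidden(x.view(x.size(0), -1)))

def l2_svm_loss(scores, labels):
    """Squared hinge (L2-SVM) loss, one-vs-rest over the 10 classes."""
    # Targets t are +1 for the true class and -1 for every other class.
    t = -torch.ones_like(scores)
    t.scatter_(1, labels.unsqueeze(1), 1.0)
    margins = torch.clamp(1.0 - t * scores, min=0.0)
    return (margins ** 2).sum(dim=1).mean()

model = MNISTNet()
# L2 weight cost of 0.001 on the output layer only, as stated in the article.
optimizer = torch.optim.SGD([
    {"params": model.hidden.parameters()},
    {"params": model.out.parameters(), "weight_decay": 0.001},
], lr=0.1)
num_epochs = 50  # assumption: the article does not state the epoch count
# Linear decay of the learning rate from 0.1 down to 0.0 over training.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: max(0.0, 1.0 - epoch / num_epochs))
```

The softmax variant simply swaps l2_svm_loss for nn.CrossEntropyLoss(); everything else stays the same.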
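Finally, the training-time augmentations mentioned in the dataset list above can be sketched with torchvision. The article only states that Gaussian noise (MNIST) and random horizontal reflection plus jitter (CIFAR-10) were applied; the noise standard deviation and the jitter magnitude below are assumptions.

```python
import torch
from torchvision import transforms

# MNIST: additive Gaussian noise on each training image.
mnist_train_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x + 0.1 * torch.randn_like(x)),  # std = 0.1 assumed
])

# CIFAR-10: random horizontal reflection plus small translation jitter.
cifar_train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=2),  # 2-pixel jitter assumed
    transforms.ToTensor(),
])
```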
