Computer Vision News - March 2018
Results:

The method trains a CNN (AlexNet) to maximize the likelihood of the labels. Similarly to the idea of transfer learning, it uses the convolutional layers to compute the feature vectors of the input images, and then trains a CRF to maximize the local likelihood approximation:

$p(y_{i,t} \mid x_i, y_{i,1}, \dots, y_{i,t-1}) \approx p(y_{i,t} \mid h_{i,t}, y_{i,t-1})$

$p(y_{i,t} \mid h_{i,t}, y_{i,t-1}) = \frac{1}{Z} \exp\big( (Q\, y_{i,t-1})^\top R\, y_{i,t} + h_{i,t}^\top U\, y_{i,t} + b^\top y_{i,t} \big)$

where:
- $x_{i,t}$ is the t-th image in sequence i.
- $y_{i,t}$ is the (one-hot) label of the t-th image in sequence i.
- $h_{i,t}$ is the AlexNet output for image $x_{i,t}$.
- Q and R form the pairwise potential: Q is the low-dimensional embedding matrix that encodes the one-hot label vector of an image, and the columns of R are the embeddings of the classes of the left-side (neighboring) object.
- U is the unary potential matrix and b is the label bias.

Some more details: the AlexNet hidden-layer size is s = 2048, the number of classes in the dataset is m = 972, and the class-embedding dimensionality is d = 32. The model's parameters are therefore the class embedding matrix $Q \in \mathbb{R}^{d \times m}$, the neighbor embedding matrix $R \in \mathbb{R}^{d \times m}$, the unary potential matrix $U \in \mathbb{R}^{s \times m}$, and the bias vector $b \in \mathbb{R}^{m}$. The authors found that training each part of the model separately was sufficient for achieving optimal results.

Test (inference) algorithm: given a sequence of observations $[x_1, \dots, x_T]$, predict the sequence of target labels $[y_1, \dots, y_T]$:
1. Use AlexNet to extract the image features $[h_1, \dots, h_T]$, one for each observation.
2. Find the labels of the object sequence by applying the Viterbi algorithm to the CRF output.
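The inference step above can be sketched in code. This is a minimal NumPy sketch, not the authors' implementation: it assumes the AlexNet features have already been extracted into an array `H`, and the function name `viterbi_decode` and all variable names are our own. The pairwise score between consecutive labels is taken to be $Q^\top R$, matching the CRF potential defined earlier.

```python
import numpy as np

def viterbi_decode(H, Q, R, U, b):
    """Viterbi decoding for the linear-chain CRF described above.

    H : (T, s) array of per-image feature vectors (e.g. AlexNet outputs).
    Q : (d, m) class embedding matrix.
    R : (d, m) neighbor (left-side object) embedding matrix.
    U : (s, m) unary potential matrix.
    b : (m,)   label bias vector.
    Returns the most likely label sequence as a list of T class indices.
    """
    T, m = H.shape[0], U.shape[1]
    pairwise = Q.T @ R      # (m, m): score of label j following label i
    unary = H @ U + b       # (T, m): per-image label scores
    # score[t, j] = best log-potential of any path ending in label j at step t
    score = np.empty((T, m))
    back = np.zeros((T, m), dtype=int)
    score[0] = unary[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + pairwise + unary[t][None, :]
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0)
    # trace the best path backwards from the highest-scoring final label
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Because the normalizer Z is shared across label sequences of the same length, maximizing the sum of unnormalized log-potentials (as done here) gives the same argmax as maximizing the likelihood itself.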