Computer Vision News - March 2018

Main ideas and advantages:

The central insight is to model the series of cropped single-product images with a linear-chain CRF and to learn the model's parameters from two sources: the local image features and the labels of nearby products. The local image features are computed using an AlexNet CNN, and the dependence on neighboring product labels is learned by factoring the CRF pairwise potential matrix.

Main benefits:
• Handling large datasets by combining deep class embedding into a CRF.
• Introducing batch normalization in the training process.
• Computational efficiency, achieved by using the approximated CRF likelihood.

Method:

The input data is a series of pre-cropped images, each containing a single product, ordered according to their sequence on the shelf. For each position t, the model considers the t-th image in the sequence and the label of that image. The model is demonstrated in the figure below.

The method can be thought of as two branches: 1) the first extracts image features using AlexNet (top of figure); 2) the second models the contextual information from the preceding image's label using one-hot encoding (bottom of figure). The outputs of the two branches are the input to the CRF, which learns three matrices encoding the products' contextual relations (to be detailed below), and the final labeling is determined by softmax.
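To make the two-branch idea concrete, here is a minimal sketch of how per-image scores and a factored pairwise term could be combined before a softmax. It is an illustration under stated assumptions, not the authors' implementation: the names `left_emb`, `right_emb`, `pairwise_row` and `label_sequence` are hypothetical, the pairwise matrix is assumed to factor into a product of two small embedding matrices (the article speaks of three learned matrices, which are not reproduced here), the per-image scores stand in for the AlexNet branch, and greedy left-to-right decoding is used in place of full CRF inference.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_row(prev_label, left_emb, right_emb):
    """Row `prev_label` of the factored pairwise matrix left_emb * right_emb^T.

    Selecting a row of left_emb is equivalent to multiplying it by the
    one-hot encoding of the preceding label.
    """
    u = left_emb[prev_label]
    return [sum(a * b for a, b in zip(u, v)) for v in right_emb]


def label_sequence(unary_scores, left_emb, right_emb):
    """Greedy labeling: image score plus context from the previous label."""
    labels = []
    prev = None
    for scores in unary_scores:
        if prev is not None:
            context = pairwise_row(prev, left_emb, right_emb)
            scores = [s + c for s, c in zip(scores, context)]
        probs = softmax(scores)
        prev = max(range(len(probs)), key=probs.__getitem__)
        labels.append(prev)
    return labels


# Toy data (made up): 3 product classes, 2-dimensional class embeddings.
unary_scores = [
    [2.0, 0.0, 0.0],  # image 0: clearly class 0
    [0.0, 0.1, 0.0],  # image 1: ambiguous on its own
    [0.0, 3.0, 0.0],  # image 2: clearly class 1
]
left_emb = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
right_emb = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]

predicted = label_sequence(unary_scores, left_emb, right_emb)
print(predicted)  # [0, 2, 1]
```

In this toy run the second image would be labeled class 1 from its own scores alone, but the context term for "class 0 followed by class 2" shifts it to class 2. Factoring the pairwise matrix this way stores two K-by-d matrices instead of one K-by-K matrix, which is the kind of saving that matters when the number of product classes K is large.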
