Computer Vision News - March 2018
Large-Scale Classification of Structured Objects using a CRF with Deep Class Embedding

Every month, Computer Vision News reviews a research paper from our field. This month we have chosen to review Large-Scale Classification of Structured Objects using a CRF with Deep Class Embedding. We are indebted to the authors, Eran Goldman and Jacob Goldberger from the Engineering Faculty, Bar-Ilan University in Ramat Gan (Israel), for allowing us to use images from the paper to illustrate this review. Their work is here. Credit is given also to supervisor Prof. Jacob Goldberger, Bar-Ilan University, and Trax Image Recognition.

Introduction: This paper introduces a novel deep learning architecture aimed at classifying structured objects in datasets with a large number of visually similar categories. The authors use CNN features and a CRF on a huge dataset of images of retail-store product displays, taken in varying settings and from varying viewpoints. Using an approximated CRF likelihood, coupled with batch normalization, the authors demonstrate significantly improved results compared to linear CRF modeling and unnormalized likelihood optimization.

Aim, Motivation, Challenge: The article proposes a method for labeling and precisely characterizing (from a retail perspective) products on the shelf from a single image. The idea is demonstrated in the image below: the first 7 Red Bull cans have been labeled, and the eighth needs to be labeled. As you can see, the products are very similar to each other and can sometimes be told apart only by minute details. Moreover, some of them are partially occluded or visually distorted by reflections and illumination, and some may be out of focus.
Due to these conditions, the authors conclude that contextual relationships are the best basis for identifying the products precisely. To model this contextual information, they propose a novel deep learning architecture that classifies structured objects in datasets with a large number of visually similar categories.

Research by Assaf Spanier
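
To make the role of context concrete, here is a minimal sketch of how neighboring labels can settle an ambiguous product. It is not the authors' model (which relies on a deep class embedding and an approximated CRF likelihood). It is a generic linear-chain CRF decoded with the Viterbi algorithm, and the two classes, the per-product scores and the pairwise scores are all made-up assumptions for illustration.

```python
import numpy as np

def viterbi_decode(unary, pairwise):
    """Most likely label sequence for a linear-chain CRF.

    unary:    (T, K) per-position class scores, e.g. CNN log-probabilities.
    pairwise: (K, K) score for label i being followed by label j.
    """
    T, K = unary.shape
    score = unary[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # candidate[i, j]: best score ending in label i at t-1, then label j at t
        candidate = score[:, None] + pairwise
        backptr[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0) + unary[t]
    # Trace the best path backwards from the best final label.
    labels = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        labels.append(int(backptr[t, labels[-1]]))
    return labels[::-1]

# Hypothetical shelf row: 8 cans, 2 visually similar classes (0 and 1).
# The first 7 cans are confidently class 0; the eighth is ambiguous
# (say, partially occluded) and its appearance alone slightly favors class 1.
probs = np.array([[0.9, 0.1]] * 7 + [[0.45, 0.55]])
unary = np.log(probs)

# Assumed pairwise scores: adjacent products tend to share a class.
pairwise = np.array([[1.0, 0.0],
                     [0.0, 1.0]])

independent = unary.argmax(axis=1).tolist()   # each can classified on its own
contextual = viterbi_decode(unary, pairwise)  # cans classified jointly

print("independent:", independent)
print("contextual: ", contextual)
```

With these assumed scores, classifying each can on its own assigns the eighth can to class 1, while joint decoding assigns it to class 0, in agreement with its neighbors.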