Computer Vision News - March 2017
The following table summarizes recent developments in R-CNNs:

Year | Author          | Paper Title                                                                       | Model
2014 | Girshick et al. | Rich feature hierarchies for accurate object detection and semantic segmentation | R-CNN
2015 | Girshick et al. | Fast R-CNN                                                                        | Fast R-CNN
2016 | Ren et al.      | Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks   | Faster R-CNN
2017 | Redmon et al.   | YOLO9000: Better, Faster, Stronger                                                | YOLO9000

R-CNN: The first R-CNN network was developed in 2014 by Girshick et al. and had the following structure, with four main parts:
(1) It receives an input image.
(2) Using selective search, it extracts approximately 2000 proposed regions as candidates for objects in the image.
(3) Using AlexNet, it extracts a 4096-dimensional feature vector from each of the 2000 region proposals.
    (a) The network uses the five convolutional layers and two fully connected layers of AlexNet.
    (b) Since AlexNet requires 227x227 input images, the network warps all pixels in a tight bounding box around each region proposal to the required size.
(4) Finally, it classifies each region and refines its bounding box, using a per-category SVM and a linear regression model, respectively (see the code sketch below).

This first model, proposed in 2014, showed a significant improvement in performance (roughly 30% better than existing methods at the time), but suffered from significant drawbacks: the network's training is performed in separate stages (stages 3 and 4 above). First the ConvNet is fine-tuned using log loss, then the extracted features are saved to disk and used to train an SVM and a linear regression model per category. This multi-stage process is slow and wastes disk space: for the PASCAL dataset, the cached features take up about 200GB.
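To make the pipeline concrete, here is a minimal inference-time sketch of the steps described above. It is not the authors' original Caffe implementation: it assumes torchvision's pretrained AlexNet as a stand-in feature extractor, OpenCV-contrib's selective search for region proposals, and already-trained per-class scikit-learn SVMs passed in by the caller.

```python
# Sketch of the R-CNN inference pipeline, under the assumptions stated above.
# `image` is assumed to be an RGB numpy array of shape (H, W, 3).
import cv2
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as T

# (3) Feature extractor: AlexNet conv layers + first two FC layers -> 4096-d vector
alexnet = models.alexnet(pretrained=True).eval()
fc7 = torch.nn.Sequential(*list(alexnet.classifier.children())[:-1])  # drop final 1000-way FC

preprocess = T.Compose([
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def extract_features(image, boxes):
    """Warp each proposal to 227x227 and return one 4096-d feature per box."""
    feats = []
    for (x, y, w, h) in boxes:
        crop = image[y:y + h, x:x + w]
        warped = cv2.resize(crop, (227, 227))           # (3b) warp to AlexNet's input size
        tensor = preprocess(warped).unsqueeze(0)
        with torch.no_grad():
            conv = alexnet.features(tensor)
            conv = alexnet.avgpool(conv).flatten(1)
            feats.append(fc7(conv).squeeze(0).numpy())  # 4096-d descriptor
    return np.stack(feats)

def detect(image, class_svms):
    """class_svms: dict mapping class name -> trained sklearn LinearSVC (assumed given)."""
    # (2) ~2000 region proposals from selective search
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image)
    ss.switchToSelectiveSearchFast()
    boxes = ss.process()[:2000]

    feats = extract_features(image, boxes)

    # (4) Score every proposal with each per-class SVM
    detections = []
    for name, svm in class_svms.items():
        scores = svm.decision_function(feats)
        for box, score in zip(boxes, scores):
            if score > 0:
                detections.append((name, tuple(box), float(score)))
    return detections
```

For brevity, the sketch omits two steps the paper applies to the scored proposals: per-class non-maximum suppression and the trained bounding-box regressor that refines each box.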