Computer Vision News - March 2017
Fast R-CNN: The next model was developed in 2015, also by Girshick et al. In this model, the order of the 'selective search' stage and the convolutional feature layers is reversed: the image first enters the ConvNet layers, and only afterwards are the ROIs around the various objects extracted. The figure below shows the network's structure:

(1) Fast R-CNN receives an input image and region proposals extracted using, for instance, selective search.
(2) The network first processes the entire image, down to the Conv5 layer (of AlexNet).
(3) Then, the ROI pooling layer samples each region proposal into a fixed-size feature chunk that is fed into the fully-connected (FC) layers.
(4) Finally, the data is handed off to two output layers: a softmax classifier giving the probability of each object category, and a per-class bounding-box regressor producing refinement offsets.

Comparing the original R-CNN and Fast R-CNN, we observe that this network is indeed significantly faster (about 150 times!), but (yes, there is a but...) we are left with a bottleneck at the region proposal stage, which 'costs' us about 2 seconds per image, thus reducing the speed-up to 'only' 25 times faster than the original.

                                            R-CNN     Fast R-CNN
Test time per image                         47 sec    0.32 sec
Network speedup                             1x        146x
Test time per image with selective search   50 sec    2 sec
Speedup including selective search          1x        25x
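To make the ROI pooling step concrete, here is a minimal single-channel sketch in NumPy. It is not the paper's implementation, just an illustration of the idea: an arbitrary-sized region of the conv feature map is divided into a fixed grid of bins (here 2x2), and each bin is max-pooled, so every proposal yields an output of the same size regardless of its original dimensions. The function name, coordinate convention, and grid size are illustrative assumptions.

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool an arbitrary-sized ROI into a fixed-size grid.

    feature_map: (H, W) array, one channel of a conv feature map.
    roi: (x0, y0, x1, y1) integer coordinates on the feature map
         (illustrative convention: x1/y1 are exclusive).
    output_size: fixed (rows, cols) of the pooled output.
    """
    x0, y0, x1, y1 = roi
    region = feature_map[y0:y1, x0:x1]
    h, w = region.shape
    out_h, out_w = output_size
    pooled = np.zeros(output_size, dtype=feature_map.dtype)
    # Split the region into a roughly even out_h x out_w grid of bins.
    row_edges = np.linspace(0, h, out_h + 1).astype(int)
    col_edges = np.linspace(0, w, out_w + 1).astype(int)
    for i in range(out_h):
        for j in range(out_w):
            bin_ = region[row_edges[i]:row_edges[i + 1],
                          col_edges[j]:col_edges[j + 1]]
            pooled[i, j] = bin_.max()  # max-pool within the bin
    return pooled
```

Whatever the proposal's shape, the output is always `output_size`, which is what lets the subsequent FC layers accept features from proposals of any size.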