Computer Vision News

MXNet is an open-source, scalable deep learning framework developed at CMU. It lets programmers define, build and assemble deep neural networks in a wide range of environments, from cloud infrastructure to mobile devices, and it supports a variety of programming languages. At first glance, MXNet may look like yet another software toolbox that helps programmers build deep learning models. However, MXNet has a number of advantages worth noting: (1) MXNet supports an unusually large number of languages, including C++, Python, R, Scala, Matlab and JavaScript; (2) MXNet’s scaling capability stands out as nearly linear: for every 100 GPUs added to the cluster, its throughput rose by a factor of 85 (a better ratio than that of any other library).

It must be noted that MXNet is currently not as widely used in the general deep learning community as other toolboxes, such as TensorFlow, but perhaps this will change now that MXNet has become part of the Apache Software Foundation.

"MXNet is perhaps the closest system in design to TensorFlow. It uses a dataflow graph to represent the computation at each worker, and uses a parameter server to scale training across multiple machines." [From The Morning Paper]

We’ll use this demonstration of MXNet to talk about Region-based Convolutional Neural Networks (R-CNN): networks that both identify the category of an object in the image (such as cat, dog, etc.) and locate it, giving the 4 coordinates that define its rectangular ROI. An R-CNN is made up of three main parts. The first part extracts region proposals. The second extracts features for each region; this part is performed by convolutional layers taken from a standard ConvNet (such as AlexNet, ResNet or VGGNet). The third part uses the extracted features to predict the category and the region, using SVM and LR, respectively.
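To make the three-part structure described above concrete, here is a minimal sketch in plain Python of how data might flow through such a pipeline. It is not MXNet code and not the actual R-CNN implementation: sliding windows stand in for a real region-proposal method, two simple intensity statistics stand in for ConvNet features, and a hand-set linear scorer stands in for a trained SVM. The bounding-box regression (LR) step is left out. All names (`Box`, `propose_regions`, `extract_features`, `classify`, `detect`) and all numbers are assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Box(NamedTuple):
    """Rectangular ROI given by 4 coordinates (pixels; x2 and y2 are exclusive)."""
    x1: int
    y1: int
    x2: int
    y2: int


Image = List[List[float]]                    # grayscale image as rows of intensities
LinearModel = Tuple[Sequence[float], float]  # (weights, bias) for one category


def propose_regions(image: Image, size: int, stride: int) -> List[Box]:
    """Part 1 (toy stand-in): square sliding windows instead of real proposals."""
    height, width = len(image), len(image[0])
    return [
        Box(x, y, x + size, y + size)
        for y in range(0, height - size + 1, stride)
        for x in range(0, width - size + 1, stride)
    ]


def extract_features(image: Image, box: Box) -> List[float]:
    """Part 2 (toy stand-in): mean and max intensity instead of ConvNet features."""
    pixels = [image[y][x]
              for y in range(box.y1, box.y2)
              for x in range(box.x1, box.x2)]
    return [sum(pixels) / len(pixels), max(pixels)]


def classify(features: Sequence[float],
             models: Dict[str, LinearModel]) -> Tuple[str, float]:
    """Part 3 (toy stand-in): one linear scorer per category; best score wins."""
    scores = {
        category: sum(w * f for w, f in zip(weights, features)) + bias
        for category, (weights, bias) in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


def detect(image: Image, models: Dict[str, LinearModel],
           size: int = 2, stride: int = 2,
           threshold: float = 0.0) -> List[Tuple[str, float, Box]]:
    """Run the three parts in order and keep regions scoring above the threshold."""
    detections = []
    for box in propose_regions(image, size, stride):
        category, score = classify(extract_features(image, box), models)
        if score > threshold:
            detections.append((category, score, box))
    return detections


# Hypothetical 4x4 image with a bright patch in the top-right corner.
image = [
    [0.0, 0.0, 0.9, 1.0],
    [0.0, 0.0, 0.8, 0.9],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
# Hand-set (not trained) weights and bias for a single made-up category.
models = {"object": ([1.0, 1.0], -1.0)}

detections = detect(image, models)
for category, score, box in detections:
    print(category, round(score, 2), box)
```

In this toy image only the top-right window scores above the threshold, so `detect` returns a single "object" detection together with that window's 4 coordinates. A real R-CNN would replace each stand-in with its learned counterpart while keeping the same order of steps.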
The figure below gives the network’s structure:

[Figure: R-CNN network structure]

The training dataset for R-CNNs is a set of images in which every object is annotated with its category and the 4 coordinates defining its ROI. Next, we’ll review three significant recent developments in R-CNNs, all of which are implemented in MXNet; and finally, a few words about YOLO9000, which we will review in greater detail in one of our coming issues, so stay tuned.

MXNet - A Scalable Deep Learning Framework with emphasis on R-CNN

Computer Vision News - March 2017