Computer Vision News - June 2018
The GGNN network element is used to generate refined polygons at a much higher resolution. GGNNs have proven effective for semantic segmentation when applied at the pixel level. The authors take the polygon predicted by the RNN decoder (orange vertices in the left figure) and, at a higher resolution, add midpoints (in blue) between every pair of consecutive orange vertices. The GGNN uses three types of edges (red, blue, green) to arrive at improved predictions of the relative location of each node (vertex), shown as black dashed arrows in the zoomed-in middle figure. The right figure shows the high-resolution polygon output by the GGNN.

Dataset and Evaluation: The authors use the Cityscapes dataset, which to this date is one of the most comprehensive benchmarks for instance segmentation. It contains 2975 training, 500 validation and 1525 test images with 8 semantic classes. The ground-truth polygons are pre-processed according to depth ordering, to obtain polygons for only the visible regions of each instance. Two evaluation metrics were used: 1) Automatic Mode: the Intersection over Union (IoU) metric, for evaluating the quality of the generated polygons; and 2) Interactive Mode: the average number of annotator clicks required to correct the model's predictions.

Results: Automatic Mode Evaluation. The model is compared against: (1) SquareBox, which takes the provided bounding box as the prediction; (2) Dilation10; (3) DeepMask; (4) SharpMask; (5) Polygon-RNN, considered the state-of-the-art baseline; and (6) an ablation study. The full model outperforms all other methods by almost 10% IoU, and it also achieves the best performance for each class. The goal of the Interactive Mode evaluation is to minimize annotation time while obtaining high-quality annotations.
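The midpoint-insertion step that feeds the GGNN can be sketched in a few lines. This is not the authors' code, just a minimal illustration (with a hypothetical `upsample_polygon` helper) of doubling a closed polygon's resolution by placing a midpoint between every pair of consecutive vertices:

```python
import numpy as np

def upsample_polygon(vertices):
    """Insert a midpoint between every pair of consecutive vertices,
    doubling the resolution of a closed polygon.

    vertices: (N, 2) array of (x, y) coordinates; the polygon is
    closed, so the last vertex connects back to the first.
    Returns a (2N, 2) array: v0, m0, v1, m1, ...
    """
    vertices = np.asarray(vertices, dtype=float)
    # Midpoint of each edge (v_i, v_{i+1}), wrapping around at the end.
    midpoints = (vertices + np.roll(vertices, -1, axis=0)) / 2.0
    out = np.empty((2 * len(vertices), 2))
    out[0::2] = vertices   # original (orange) vertices
    out[1::2] = midpoints  # inserted (blue) midpoints
    return out
```

For a unit square `[[0, 0], [1, 0], [1, 1], [0, 1]]`, this yields 8 nodes, with midpoints such as `(0.5, 0)` on the bottom edge. In the paper, these 2N nodes become the GGNN's graph, and the network then predicts a relative offset for each node.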
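The IoU metric used in the Automatic Mode evaluation is standard and easy to state in code. A minimal sketch over boolean instance masks (the function name is ours, not from the paper):

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection over Union between two boolean instance masks.

    Both inputs are arrays of the same shape; True marks pixels
    belonging to the instance.
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0
```

For instance, a prediction covering 6 of 10 pixels against a ground truth covering 7, with 3 pixels shared, gives IoU = 3/10 = 0.3. The predicted polygons are rasterized to masks before this comparison.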