Computer Vision News - June 2018

the GGNN, while the 28x28 orange tensor is the input for the First Vertex network.

2. First Vertex: A separate network is used to predict the first vertex. It adds two DxD layers to the 28x28 orange tensor (in the above figure). The first layer predicts edges, while the second predicts the vertices of the polygon. The first vertex is sampled from this final, vertex-predicting layer.

3. RNN Decoder: A two-layer ConvLSTM is used, with 64 kernels in the first layer and 16 in the second, each with a kernel size of 3x3. The output at each time step t is a DxD matrix of zeros and ones, where 1 indicates a vertex and 0 otherwise. D is the resolution used by the system for rough polygon prediction (the authors used D=28). When the polygon is closed, an end-of-sequence token is signaled.

Network loss: Polygon prediction is formulated as a reinforcement learning problem. The policy (denoted by $p_\theta$) for selecting the next vertex $v_t$ is trained to maximize the reward r, defined as the IoU between the mask enclosed by the generated polygon and the ground-truth mask m. To maximize the expected reward, the loss function becomes

$$L(\theta) = -\mathbb{E}_{v^s \sim p_\theta}\left[ r(v^s, m) \right]$$

where
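To make the reward concrete, here is a minimal sketch of how an IoU reward and a sample-based estimate of the loss could be evaluated on a D=28 grid. The helper names (`polygon_to_mask`, `iou`, `expected_reward_loss`), the even-odd rasterization at pixel centers, and the toy polygons are all assumptions made for illustration; the article does not describe how the authors rasterize polygons, and this sketch only evaluates the objective, not its gradient.

```python
D = 28  # grid resolution quoted in the article


def polygon_to_mask(vertices, size=D):
    """Rasterize a polygon given as (x, y) vertices into a size x size
    boolean mask, testing each pixel center with the even-odd rule.
    (Hypothetical helper, not taken from the paper.)"""
    mask = [[False] * size for _ in range(size)]
    n = len(vertices)
    for row in range(size):
        py = row + 0.5
        for col in range(size):
            px = col + 0.5
            inside = False
            for i in range(n):
                x0, y0 = vertices[i]
                x1, y1 = vertices[(i + 1) % n]
                # The edge straddles the horizontal line through the pixel center.
                if (y0 > py) != (y1 > py):
                    x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
                    if px < x_cross:
                        inside = not inside
            mask[row][col] = inside
    return mask


def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks of equal size."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a and b
            union += a or b
    return inter / union if union else 0.0


def expected_reward_loss(sampled_polygons, gt_mask):
    """Estimate -E[r(v^s, m)] by averaging the IoU reward over sampled polygons."""
    rewards = [
        iou(polygon_to_mask(poly, len(gt_mask)), gt_mask)
        for poly in sampled_polygons
    ]
    return -sum(rewards) / len(rewards)


# Toy data: the ground truth is an 8x8 square. One sample matches it exactly,
# the other covers only its left half.
gt_mask = polygon_to_mask([(4, 4), (12, 4), (12, 12), (4, 12)])
samples = [
    [(4, 4), (12, 4), (12, 12), (4, 12)],
    [(4, 4), (8, 4), (8, 12), (4, 12)],
]
print(expected_reward_loss(samples, gt_mask))
```

A polygon that overlaps the ground truth better yields a higher reward and therefore a lower loss, which is the behaviour the objective is meant to encourage.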
