Computer Vision News - February 2021

Remarkably, the final architecture and training choices are the result of detailed ablation experiments, each isolating a single component to make sure the best setting was adopted. The need for a learned update operator is justified by comparing a GRU block against a stack of 3 convolutional layers with ReLU activations, and tying the weights across all update iterations is found to give better convergence than leaving them untied. Adding the context encoder also improves performance.

Features are extracted at a single resolution only, since a multi-scale variant over-complicates the architecture without performing significantly better. Pooling the correlation volume, on the other hand, allows both large and small displacements to be captured. Computing the correlation over all pairs of pixels proves more convenient and more accurate than restricting it to a local neighbourhood, and the correlation volume is also found superior to a layer that warps the features of the second frame onto the first using the current optical flow estimate. The learned upsampling module outperforms plain bilinear upsampling. Finally, a total of 32 inference updates is chosen on the basis of the corresponding experiments.
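
To make the all-pairs correlation idea above more concrete, here is a minimal sketch of how such a volume can be built from two feature maps and pooled into a small pyramid. It is an illustration rather than the authors' released code: the function name, the (B, D, H, W) tensor layout and the normalisation by the feature dimension are our own choices.

```python
import torch
import torch.nn.functional as F

def all_pairs_correlation(fmap1, fmap2, num_levels=4):
    """Build an all-pairs correlation volume and pool it into a pyramid.

    fmap1, fmap2: feature maps of shape (B, D, H, W) from the two frames.
    Returns a list of volumes of shape (B, H, W, H / 2**l, W / 2**l).
    """
    b, d, h, w = fmap1.shape
    f1 = fmap1.view(b, d, h * w)                      # (B, D, H*W)
    f2 = fmap2.view(b, d, h * w)                      # (B, D, H*W)

    # Dot product between every pair of feature vectors from the two frames.
    corr = torch.matmul(f1.transpose(1, 2), f2)       # (B, H*W, H*W)
    corr = corr.view(b, h, w, h, w) / d ** 0.5        # scale by feature dimension

    # Pool only the last two dimensions, so the volume stays at full
    # resolution in the first image while coarser levels cover larger motions.
    pyramid = [corr]
    vol = corr.view(b * h * w, 1, h, w)
    for _ in range(num_levels - 1):
        vol = F.avg_pool2d(vol, kernel_size=2, stride=2)
        pyramid.append(vol.view(b, h, w, *vol.shape[-2:]))
    return pyramid
```

Because only the last two dimensions are pooled, the finest level keeps the detail needed for small displacements while the coarser levels make large displacements cheap to look up.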

CONCLUSION

RAFT (ECCV 2020 Best Paper) was evaluated on three datasets: Sintel, KITTI and DAVIS. Examples of the results obtained on them are shown below. The images speak for themselves! The top row includes comparisons with state-of-the-art methods similar to RAFT, and the differences in performance are noticeable, especially at the boundaries of fine details. This is due to the convex upsampling module in the architecture, which improves accuracy near motion boundaries and also allows RAFT to recover the flow of small, fast-moving objects such as the birds shown in the figure.
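
The convex upsampling module mentioned above lends itself to a similarly short sketch. The snippet below is an illustrative reconstruction, not the official implementation: it assumes the network predicts, for each coarse pixel, 9 weights for every fine pixel in an 8x8 block (a tensor of shape (B, 9*64, H, W)), and the function and argument names are our own.

```python
import torch
import torch.nn.functional as F

def convex_upsample(flow, mask, factor=8):
    """Upsample a coarse flow field by `factor` using predicted weights.

    flow: (B, 2, H, W) coarse flow field.
    mask: (B, 9 * factor**2, H, W) weights predicted by the network.
    Returns flow of shape (B, 2, factor*H, factor*W).
    """
    b, _, h, w = flow.shape
    mask = mask.view(b, 1, 9, factor, factor, h, w)
    mask = torch.softmax(mask, dim=2)        # convex weights over the 3x3 neighbourhood

    # Gather the 3x3 coarse neighbourhood around each pixel; flow values are
    # scaled because the grid becomes `factor` times finer.
    up = F.unfold(factor * flow, kernel_size=3, padding=1)   # (B, 2*9, H*W)
    up = up.view(b, 2, 9, 1, 1, h, w)

    # Each fine pixel is a convex combination of its coarse neighbours.
    up = torch.sum(mask * up, dim=2)          # (B, 2, factor, factor, H, W)
    up = up.permute(0, 1, 4, 2, 5, 3)         # (B, 2, H, factor, W, factor)
    return up.reshape(b, 2, factor * h, factor * w)
```

Since the weights go through a softmax, each full-resolution flow vector is a convex combination of its 3x3 coarse neighbours, which is what keeps motion boundaries sharp instead of blurring them as bilinear upsampling does.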