Computer Vision News - August 2018

Target and source domains are illustrated in the figure below.

Given the affinity matrix, the problem becomes finding a transfer policy that maximizes performance across all tasks while minimizing the supervision used (the number of networks trained from scratch). This can be formulated as subgraph selection, where tasks are nodes and transfers are edges. The authors solved it with Boolean Integer Programming (BIP). The parameters of the problem are: the supervision budget (the number of networks trained from scratch), a measure of performance on a target from each of its transfers (i.e. the affinities), and the number of targets allowed.

Datasets: There is currently no dataset suitable for mapping all 26 tasks. The authors therefore produced a dataset of 4 million indoor images, taken in about 600 buildings, in which every image is annotated for every task. Where possible, the ground truth was generated automatically by registering and aligning the images with building-wide meshes. For the other tasks, the authors took a knowledge distillation approach to labelling, using the output of a large, state-of-the-art, fully supervised network.

Task Dictionary: With 26 tasks in the dictionary (4 of them source-only), the authors needed to train 26 fully supervised task-specific networks, 22 × 25 transfer networks in 1st order, and 22 × C(25, k) for k-th order (one per target and per choice of k sources among the other 25 tasks), from which they sampled according to the procedure detailed in the paper. The total number of transfer functions trained was close to 3,000, which took 47,886 GPU hours of training on the cloud.

Results: In the figure on the next page, the columns represent the maximum Supervision Budget (the number of networks that you are willing to train from scratch on a large dataset): the first column allows 2 networks, the next 8, and so on. The rows represent the Transfer Order, that is, the number of pre-trained networks used concurrently as sources for transfer learning.
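
To make the subgraph selection described above more concrete, here is a minimal sketch in Python. It is not the authors' BIP formulation: it swaps in exhaustive enumeration, which is only practical for a handful of tasks, and it assumes first-order transfers only, equal importance for every task, and no source-only tasks. The task names, the affinity values and the function `select_sources` are all hypothetical and chosen purely for illustration.

```python
from itertools import combinations

# Hypothetical affinities (made-up numbers, not from the paper):
# AFFINITY[target][source] is how well `target` is solved when transferring
# from `source`. A task used as its own source means "trained from scratch".
AFFINITY = {
    "depth":        {"depth": 1.0, "edges": 0.3, "normals": 0.8, "segmentation": 0.2},
    "edges":        {"depth": 0.4, "edges": 1.0, "normals": 0.6, "segmentation": 0.3},
    "normals":      {"depth": 0.7, "edges": 0.4, "normals": 1.0, "segmentation": 0.2},
    "segmentation": {"depth": 0.3, "edges": 0.2, "normals": 0.4, "segmentation": 1.0},
}


def select_sources(affinity, budget):
    """Pick at most `budget` source tasks that maximize total performance.

    Every target is served by its best available source (first-order
    transfer only). Brute force stands in for the BIP solver here.
    """
    tasks = sorted(affinity)
    best_score, best_sources = float("-inf"), ()
    for size in range(1, budget + 1):
        for sources in combinations(tasks, size):
            # Each target uses the single best source in the candidate set.
            score = sum(max(affinity[t][s] for s in sources) for t in tasks)
            if score > best_score:
                best_score, best_sources = score, sources
    # The selected edges of the subgraph: one incoming transfer per target.
    assignment = {
        t: max(best_sources, key=lambda s: affinity[t][s]) for t in tasks
    }
    return best_sources, assignment, best_score
```

Calling `select_sources(AFFINITY, 2)` returns the chosen source tasks, the source assigned to each target, and the total score. Raising the budget can only keep or improve the score, which mirrors the trade-off between supervision and performance that the columns of the results figure explore.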