Computer Vision News - November 2019

3 Summary Generating High Fidelity I ages ... 5 To solve the challenges mentioned above, the authors suggest a model that contains three networks: the first is a decoder on a small size, low depth (depth in terms of RGB bits) image slices subsampled at every n pixels from the original full resolution image. The second is a size upscaling decoder that generates large size low depth image, conditioned on the small size low depth image. The third is depth upscaling decoder that generates a large size high depth images conditioned on the large size, low depth image. You can see an illustration scheme in the below figure: To address the difficulty of training the second and third decoders, the authors suggest the Subscale Pixel Network (SPN) architecture . This architecture divides the image of size NxN into sub-images of size N/SxN/S sliced out at interleaving positions. The NxN image is then generated by one slice at a time conditioned on the previously generated slice. The SPN consists of two networks: a conditioning network that embeds the previous slices and a decoder that predict the next slice given the previous slice embedding. A demonstration of this scheme can be seen below: