Computer Vision News - January 2022

Fabian Mentzer

compressing an image of a landscape, we do not have to store exactly how each blade of grass is oriented; it is fine to get the overall amount, type, and color of grass right. The idea underlying this argument is the "rate-distortion-realism" trade-off. "Rate" refers to the bitrate, that is, how many bits we need to store an image. "Distortion" refers to how close the reconstruction is to the input on a per-pixel level (for example, we want grass blades in the reconstruction where there were grass blades in the input, not a house). "Realism" refers to how realistic an image looks: can a human tell that the image is compressed, or does it look like a "real" image taken with a camera? This is visualized in the figure, where we see an input image followed by reconstructions from a model trained only for rate and distortion (r + λd), for rate and realism (r + βR), or for all three. The last picture looks close to the input and is realistic, but if you look closely, you see that not all blades are exactly where they should be. That image is a reconstruction from "High-Fidelity Generative Image Compression", a paper in which the above trade-off was optimized using GANs to achieve 2x lower rates than the previous best algorithm (see here for a demo). Other publications from the thesis are available here.
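To make the three objectives concrete, here is a minimal Python sketch of how such a weighted sum could be evaluated. It is an illustration under stated assumptions, not the code from the paper: the helper names `mse` and `rdr_loss` are hypothetical, the combined form r + λd + βR for "all three" is assumed from the two partial objectives, mean squared error stands in for the distortion d, and the rate and realism values are made-up numbers (in practice the realism term would come from something like a GAN discriminator).

```python
def mse(original, reconstruction):
    """Distortion d: mean squared error between two equal-length pixel lists."""
    if len(original) != len(reconstruction):
        raise ValueError("images must have the same number of pixels")
    squared_errors = [(a - b) ** 2 for a, b in zip(original, reconstruction)]
    return sum(squared_errors) / len(original)


def rdr_loss(rate, distortion, realism_penalty, lam=0.0, beta=0.0):
    """Weighted sum r + lam * d + beta * R (hypothetical helper, assumed form)."""
    return rate + lam * distortion + beta * realism_penalty


# Toy 4-pixel "image" and a reconstruction that gets one pixel wrong.
original = [0.0, 0.5, 1.0, 0.5]
reconstruction = [0.0, 0.5, 0.5, 0.5]

d = mse(original, reconstruction)  # distortion measured per pixel
r = 2.0    # made-up rate in bits per pixel
R = 0.25   # made-up realism penalty (lower means more realistic)

# The three training objectives compared in the figure.
rate_distortion = rdr_loss(r, d, R, lam=4.0)        # r + lambda * d
rate_realism = rdr_loss(r, d, R, beta=2.0)          # r + beta * R
all_three = rdr_loss(r, d, R, lam=4.0, beta=2.0)    # r + lambda * d + beta * R

print(rate_distortion, rate_realism, all_three)
```

Setting a weight to zero switches the corresponding term off, which is how a single function can express all three training regimes; raising λ pushes the model toward per-pixel fidelity, while raising β pushes it toward realistic-looking output.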