Computer Vision News - October 2022
Best Paper SIGGRAPH

“We want the model to draw an input image and maintain a connection to that image,” Yael says. “If I give it an image of my cat, and I want it to draw it, I don’t want it to draw any cat; I want it to draw my cat. It must bear some relation to the input image, which is challenging to do with CLIP because it wasn’t meant for this task. It tends to lose the geometric relation to the input image. How do we combine the strong semantic features we get from CLIP while maintaining some relation to the structure of the input subject?”

This paper proposes two approaches to solve this issue. The first is inspired by a method called LPIPS. When comparing two images, you can use something simple, like a pixel-based loss, or something more advanced, called a perceptual loss. Instead of comparing pixels, a perceptual loss compares the activations of a pre-trained neural network, so the comparison reflects features the network has already learned about the images. “Inspired by that, we proposed to use CLIP as a perceptual loss,” Yael continues. “We use the intermediate-level activations of CLIP to define a perceptual loss between the input image and the output sketch. That way, we can use the huge power of CLIP to compare the two images.”

Secondly, the sketches consist of a limited number of strokes, and the implementation directly optimizes the strokes’ parameters. This process is highly unstable because it optimizes the stroke parameters directly rather than training a network, so initialization matters a great deal here.

One image shows more examples of progressive abstraction, and the other one demonstrates the robustness of the method to unique categories that we don’t usually see in common sketch datasets.
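The CLIP-based perceptual loss described above can be sketched in a few lines of PyTorch. This is not the paper's implementation: a tiny random frozen CNN stands in for CLIP's image encoder, and the layer choice is illustrative — the point is only that the loss compares intermediate activations rather than pixels.

```python
import torch
import torch.nn as nn

class PerceptualLoss(nn.Module):
    """Perceptual loss over the intermediate activations of a frozen
    encoder (a stand-in here for CLIP's image encoder)."""
    def __init__(self, encoder_layers):
        super().__init__()
        self.layers = nn.ModuleList(encoder_layers)
        for p in self.parameters():          # encoder stays frozen
            p.requires_grad_(False)

    def forward(self, sketch, target):
        loss = 0.0
        x, y = sketch, target
        for layer in self.layers:
            x, y = layer(x), layer(y)        # activations at each depth
            loss = loss + (x - y).pow(2).mean()  # L2 on activations
        return loss

torch.manual_seed(0)
# Tiny random conv stack as the "pretrained" encoder stand-in.
layers = [nn.Sequential(nn.Conv2d(3, 8, 3, 2, 1), nn.ReLU()),
          nn.Sequential(nn.Conv2d(8, 16, 3, 2, 1), nn.ReLU())]
ploss = PerceptualLoss(layers)

img = torch.rand(1, 3, 64, 64)
same = ploss(img, img).item()                 # identical inputs -> 0
diff = ploss(img, torch.rand(1, 3, 64, 64)).item()
```

With a real CLIP encoder, the sketch and the photo are both fed through the frozen image encoder, and the distance between their intermediate features supplies the geometric grounding that CLIP's final embedding alone loses.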
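The second idea — optimizing the stroke parameters themselves instead of training a network — can be illustrated with a toy snippet. The real method renders Bézier strokes through a differentiable rasterizer and backpropagates the CLIP-based loss; here that whole pipeline is replaced by a hypothetical `target` tensor, so only the optimization pattern is shown.

```python
import torch

torch.manual_seed(0)

# Each "stroke" is the 4 control points of a cubic Bezier in [0,1]^2.
n_strokes = 4
points = torch.rand(n_strokes, 4, 2, requires_grad=True)

# Stand-in for the image-driven loss; not part of the actual method.
target = torch.rand(n_strokes, 4, 2)

# Note: the optimizer receives the stroke parameters directly --
# there are no network weights to train.
opt = torch.optim.Adam([points], lr=0.1)
for step in range(200):
    opt.zero_grad()
    loss = (points - target).pow(2).mean()
    loss.backward()
    opt.step()

final = (points - target).pow(2).mean().item()
```

Because the only trainable state is this small set of points, a bad starting configuration can stall the optimization in a poor local minimum — which is why initialization matters so much in the actual system.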