Computer Vision News - October 2022

CLIPasso with Yael Vinker

“I also think there is a less practical and more philosophical point, from a research perspective,” Yael points out. “These days, we have large new models combining text and images, which have been proven to be very strong. By defining challenging tasks like ours, performing this progressive abstraction, I think we’ve shed light on the advancement of this field and how far it can go. If we can take it one step further and solve it with computational methods, it’s really exciting research-wise.”

This work uses a popular model from OpenAI called CLIP. CLIP was originally trained to match pairs of images and text. Being trained on a vast dataset of text-image pairs, CLIP is primarily semantic in nature and, until now, has been used mainly to generate images from text. This work, however, is interested only in visuals; it does not use text at all.

Another important concept in the paper is that the sketches are semantically aware. When asked to draw something, we first analyze and understand what we see. If we were drawing a cat, we would know that cats have triangular ears and whiskers. These are probably the first features we would draw, even if they do not necessarily appear in the particular image of the cat we are sketching. This work takes inspiration from this prior semantic knowledge and tries to mimic it computationally to make the sketches more human-like.

Abstracting visuals is a core concept in art and design, but performing this abstraction is a non-trivial task. Making the right choices about what to emphasize in a piece depends on your context, goals, personal taste, and customers’ taste. Performing this task in a visually pleasing way is therefore highly challenging, and a tool that computationally performs abstractions would be helpful for designers and artists alike.

Results of the method.
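To make the idea of a purely visual, CLIP-driven objective concrete, here is a minimal sketch of the kind of semantic loss such a method can optimize: one minus the cosine similarity between the embeddings of a rendered sketch and the target photo. This is an illustrative toy, not the paper's implementation; the real CLIP encoder is replaced by a fixed random projection (so the snippet runs without downloading model weights), the images are stand-in feature vectors, and the stroke renderer and gradient-based optimizer of the actual method are replaced by simple random-search hill climbing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for CLIP's image encoder: the real model maps an
# image to a ~512-d embedding; here a fixed linear projection plus L2
# normalization plays that role so the example is self-contained.
W = rng.standard_normal((512, 64))

def encode(image_vec):
    z = W @ image_vec
    return z / np.linalg.norm(z)

def semantic_loss(sketch_vec, target_vec):
    # 1 - cosine similarity between the embeddings of the rendered
    # sketch and the target photo (both already unit-normalized).
    return 1.0 - float(encode(sketch_vec) @ encode(target_vec))

# Stand-ins for the target photo and the current sketch rendering.
target = rng.standard_normal(64)
sketch = rng.standard_normal(64)

initial_loss = semantic_loss(sketch, target)

# Toy refinement loop: accept random perturbations of the sketch only
# when they reduce the semantic loss (the paper instead backpropagates
# through a differentiable rasterizer to update stroke parameters).
for _ in range(200):
    proposal = sketch + 0.1 * rng.standard_normal(64)
    if semantic_loss(proposal, target) < semantic_loss(sketch, target):
        sketch = proposal

final_loss = semantic_loss(sketch, target)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

The key design point this illustrates is that no text enters the objective anywhere: the sketch is pulled toward the photo purely in embedding space, which is what lets the abstraction stay semantically aware of the subject rather than just tracing pixels.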
