ECCV 2020 Daily - Thursday
Spotlight Presentation

Dario Pavllo is an Italian PhD student at ETH Zurich, under the supervision of Professor Thomas Hofmann and Senior Researcher Aurelien Lucchi. His paper is about generative models and he speaks to us ahead of his spotlight presentation today.

The goal of this work is to improve the control of generative models. For downstream applications, it is important to think about what you want to use a generative model for. If you want to generate a dog and use that in a video game or animation, you must tell the model that. While some recent work generates high-quality, almost photorealistic results, it offers very little high-level control. This model complements those other works but is about improving that control.

“Specifically, what we do is generate complex scenes,” Dario tells us. “Not a single object, but multiple objects in a way that we can control the style of those objects. Say you want to generate a car near a tree, for example, but you also want to specify the colour of the car and the style of each object in the scene. We call these attributes. Another thing we want to do is use text to control the style of the scene. You can generate a landscape, for instance, and then describe the weather with text.”

This approach is weakly supervised and does not require hand-labeled data. It contributes a scheme that can be attached to other architectures, and explicit control over style has not been explored so extensively before.

Recent models for generation of complex scenes use a mask, or a semantic image that describes what is present at every position of the image, to improve the results. This model does not require anything to be hand-labeled by humans. Instead, it uses another model that was trained for object detection to prepare the labels.

Dario tells us they have observed some limitations of mask-based approaches, including this model, and the problem is that the generated results stick too closely to the mask.
Controlling Style and Semantics in Weakly-Supervised Image Generation

“Everyone should jump on the 3D train!”

“Right now, if you want to