Computer Vision News - March 2024

In this paper, Yumeng Li and the rest of the all-female team behind this work explore image generation from layouts such as semantic label maps. The idea is based on diffusion models. Previous training pipelines for these models relied on a mean squared error (MSE) reconstruction loss, which does not explicitly account for the layout condition. Furthermore, generating an image with a diffusion model requires an iterative denoising process, yet earlier training schemes considered only a single denoising step and so ignored the iterative nature of sampling.

“Before, people tried to improve the architecture of the networks,” Yumeng tells us. “They didn’t pay too much attention to the training pipeline, like the training loss and objectives. They introduced ControlNet. Basically, you add another branch for incorporating the conditions, but they used exactly the same training objective as the previous diffusion models.”

To address these issues, the team proposes integrating adversarial supervision into the training pipeline of layout-to-image (L2I) diffusion models, using the layout condition itself to encourage alignment between the generated image and its layout. A segmentation network-based discriminator guides the diffusion model training, using semantic label maps as a supervision signal. The diffusion model generator is thereby encouraged to follow the label map explicitly.
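
For readers who like to see the idea in code, the combination of a reconstruction loss with a segmentation-based supervision signal can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the team's implementation: the function names, the flat-list data layout, the `adv_weight` value and the plain per-pixel cross-entropy are all hypothetical choices made for this sketch, and the discriminator is represented only by the class scores it would output.

```python
import math


def mse_loss(pred_noise, true_noise):
    """Mean squared error between predicted and true noise (flat lists)."""
    return sum((p - t) ** 2 for p, t in zip(pred_noise, true_noise)) / len(true_noise)


def log_softmax(logits):
    """Numerically stable log-softmax over one pixel's class scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def layout_alignment_loss(pixel_logits, label_map):
    """Per-pixel cross-entropy against the layout.

    pixel_logits[i] holds the class scores a (hypothetical) segmentation
    discriminator assigns to pixel i of the generated image; label_map[i]
    is the class index the layout requires at that pixel.
    """
    total = 0.0
    for logits, label in zip(pixel_logits, label_map):
        total -= log_softmax(logits)[label]
    return total / len(label_map)


def generator_loss(pred_noise, true_noise, pixel_logits, label_map, adv_weight=0.1):
    """Reconstruction term plus a weighted layout term (adv_weight is an assumed value)."""
    return mse_loss(pred_noise, true_noise) + adv_weight * layout_alignment_loss(
        pixel_logits, label_map
    )


# Toy data: two pixels, two classes, identical noise-prediction error in both cases.
label_map = [0, 1]
pred_noise, true_noise = [0.1, -0.2], [0.0, 0.0]
aligned_logits = [[4.0, 0.0], [0.0, 4.0]]      # discriminator sees the requested classes
misaligned_logits = [[0.0, 4.0], [4.0, 0.0]]   # discriminator sees the wrong classes

loss_aligned = generator_loss(pred_noise, true_noise, aligned_logits, label_map)
loss_misaligned = generator_loss(pred_noise, true_noise, misaligned_logits, label_map)
print(loss_aligned < loss_misaligned)
```

In this toy, the two cases have the same MSE term, so the only difference in the total comes from the layout term: an image whose pixels the discriminator can segment according to the label map is penalized less than one that ignores the layout.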
