Computer Vision News

To the best of her knowledge, this is the first work to address the trade-offs between prompt fidelity, subject fidelity, and diversity in text-to-image personalization by combining the benefits of early and late checkpoints during image generation.

“We identified the phenomenon of catastrophic attention collapse,” Shwetha explains, “and we also proposed a method to be able to mitigate that using this cross attention guidance. This has improved results upon existing state-of-the-art fine-tuning techniques for text-to-image personalization, which is the novelty of this work!”

This work is a step towards improving the quality of text-to-image personalization. Hopefully, it will open the door for more people to think of further ways to address this problem. In the meantime, can Shwetha explain the problem of text-to-image personalization, in case some readers don't know?

“Okay!” Shwetha accepts the challenge. “Basically, if you take a pre-trained model like Stable Diffusion and ask it to generate a photo of Barack Obama in outer space, it

The numerical models are just too computationally expensive, and that’s where deep learning methods from computer vision can help!

WACV Poster Presentation

Across various subjects and prompts, DreamBlend successfully preserves the layout of the reference underfit image as well as the identity of the input subject.