Computer Vision News - April 2025

WACV Poster Presentation
DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models

Shwetha Ram is an Applied Scientist at Amazon, working on Rufus, Amazon’s conversational shopping assistant. She is also the first author of a wonderful paper that was accepted as a poster at WACV 2025.

The motivation for the work came from noticing that underfit checkpoints retain more prompt fidelity and diversity, while overfit checkpoints achieve better subject fidelity. If the two could be combined to get the benefits of both, that would be awesome. Starting from this idea, the team tried different approaches, such as using masks, but these came with their own challenges, like blending artifacts at the mask boundaries. Over time, they carried out a more thorough analysis, observed a phenomenon of attention collapse, and realized that cross-attention guidance using the attention maps might be the way to correct for it. That is when they decided to try this regularization using the cross-attention maps, which gave good results.

The paper addresses a fundamental trade-off between prompt fidelity, subject fidelity, and diversity that arises when fine-tuning text-to-image models for personalization. Everybody who fine-tunes these pre-trained text-to-image models for personalization will face this problem, as conversations with other paper authors at WACV confirmed. Because of this trade-off, it is unclear which training step or checkpoint one should pick during fine-tuning. Shwetha’s paper is a step towards addressing and improving this trade-off.
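The article mentions regularization with cross-attention maps but does not give the formulation. As a rough illustration only, the sketch below computes cross-attention maps the standard way, softmax(QKᵀ/√d) between image-patch queries and prompt-token keys, and then an assumed mean-squared penalty between the maps produced by two checkpoints (for example, an earlier underfit one acting as a guide and a later overfit one). The function names, the choice of a mean-squared penalty, and the token-by-token comparison between checkpoints are all hypothetical and chosen for illustration; this is not DreamBlend’s published method.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_maps(image_queries, text_keys):
    """Standard scaled dot-product cross-attention probabilities.

    image_queries: (num_patches, d) queries computed from the image latent
    text_keys:     (num_tokens, d) keys computed from the prompt embedding
    Returns a (num_patches, num_tokens) array; each row sums to 1 over tokens,
    so column j is the spatial attention map of prompt token j.
    """
    d = image_queries.shape[-1]
    scores = image_queries @ text_keys.T / np.sqrt(d)
    return softmax(scores, axis=-1)


def attention_guidance_loss(guide_maps, current_maps, token_ids):
    """Hypothetical penalty: mean squared difference between the spatial maps
    of the selected prompt tokens in two checkpoints."""
    diff = guide_maps[:, token_ids] - current_maps[:, token_ids]
    return float(np.mean(diff ** 2))


# Usage with random stand-ins for the two checkpoints' queries (illustrative only).
rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 8))          # 4 prompt tokens, dimension 8
q_underfit = rng.normal(size=(16, 8))   # 16 image patches from an "early" checkpoint
q_overfit = q_underfit + 0.5 * rng.normal(size=(16, 8))  # perturbed "late" checkpoint

maps_guide = cross_attention_maps(q_underfit, keys)
maps_current = cross_attention_maps(q_overfit, keys)
print(attention_guidance_loss(maps_guide, maps_current, token_ids=[1, 2]))
```

In an actual guidance scheme, a penalty of this kind would be minimized with respect to the model or the latent during training or sampling; the sketch only shows how the maps are formed and compared.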
