Computer Vision News - April 2025

… can probably do that because it knows who Barack Obama is. On the other hand, if I have a less famous concept, like myself, my dog, or my teddy bear, and I say generate a photo of my teddy bear with a blue house in the background, it doesn't know what my teddy bear is. The problem, then, is how you teach these concepts to a pre-trained model, and one of the successful ways to do that is by fine-tuning this pre-trained text-to-image model with the new concept. There is this concept called my teddy bear, and once the model learns it, I can use the fine-tuned model at inference time to generate images like a photo of my teddy bear with a blue house in the background, and so on.”

However, while doing this, there is a trade-off. In the early fine-tuning steps, you have good prompt fidelity and diversity, which come from the world knowledge of the pre-trained model, but you don't have subject fidelity because fine-tuning is still early and the subject has not been learned yet. In the later steps of fine-tuning, we start to see overfitting and catastrophic forgetting: you start losing prompt fidelity and diversity, but now you have good subject fidelity because you have already overfitted.

Shwetha’s method, DreamBlend, tries to get the best of both worlds: it is designed to achieve both prompt fidelity and diversity as well as subject fidelity by combining the benefits of the early and late fine-tuning checkpoints during inference.

What’s next? Shwetha is working on a few different things at the moment, and hopefully she will share them with the community soon!

[Figure: … there is a trade-off!]
[Figure: DreamBlend]
[Figure: DreamBlend applied on different backbones, different fine-tuning techniques, real image editing]
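The article says the benefits of an early checkpoint (prompt fidelity, diversity) and a late checkpoint (subject fidelity) are combined during inference, but does not spell out the mechanism. One minimal way to picture such a combination is to blend the two checkpoints' noise predictions inside each denoising step. The sketch below is purely illustrative: the helper names (`blend_predictions`, `ddim_step`), the scalar mixing weight `alpha`, and the simplified deterministic DDIM update are all assumptions, not DreamBlend's actual algorithm.

```python
import math

def blend_predictions(eps_early, eps_late, alpha):
    """Convex combination of two checkpoints' noise predictions.

    eps_early / eps_late are flat lists of floats (the epsilon outputs of
    the early and late fine-tuning checkpoints for the same latent and
    timestep). alpha=0 uses only the early checkpoint, alpha=1 only the
    late one. This elementwise blend is a hypothetical simplification.
    """
    return [alpha * l + (1.0 - alpha) * e
            for e, l in zip(eps_early, eps_late)]

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update using a (blended) noise estimate.

    Predicts x0 from the current noisy latent x_t, then steps toward the
    previous timestep. alpha_bar_* are the cumulative noise-schedule
    products at the current and previous timesteps.
    """
    x0 = [(x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
          for x, e in zip(x_t, eps)]
    return [math.sqrt(alpha_bar_prev) * x + math.sqrt(1.0 - alpha_bar_prev) * e
            for x, e in zip(x0, eps)]

# Inside a sampling loop, one would blend the two models' predictions
# before each update, e.g.:
#   eps = blend_predictions(model_early(x_t, t), model_late(x_t, t), alpha)
#   x_t = ddim_step(x_t, eps, alpha_bar[t], alpha_bar[t - 1])
```

In this toy picture, `alpha` trades subject fidelity (late checkpoint) against prompt fidelity and diversity (early checkpoint); the real method may combine the checkpoints very differently, e.g. per-layer or per-timestep.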
