…denoising process at inference time, single-timestep supervision may not be sufficient in many cases, such as text-to-3D or text-to-video diffusion models. It’s crucial to apply multiple timesteps during training.”

Originally from Tianjin, China, Yumeng has so far explored generative models, from her first PhD project on generative adversarial networks (GANs) to her current focus on diffusion models.

Beyond the technical advances, this work aims to bridge the gap between generative models and real-world applications. Synthetic data generated with the proposed method yields notable improvements in tasks such as semantic segmentation. “ControlNet can be controlled by text information; for example, we can generate snowy scenes and rainy scenes, and it’s quite flexible,” Yumeng adds. “We generated diverse images and then applied this to the semantic segmentation task, showing that it can significantly boost the generalization of the segmenter. People put a lot of focus on generative models, but quite often, they just show beautiful images without mentioning how to use this synthetic data for real-life applications. You can have some fancy images just for fun, but eventually, we want to use them to improve downstream models. That’s quite important.”

Yumeng Li
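For readers who want to try the idea, the sketch below shows roughly how text-conditioned ControlNet generation can turn one segmentation map into several weather-varied training images using the Hugging Face diffusers library. It is only an illustration, not Yumeng’s actual pipeline: the checkpoint names, the segmentation-map path, and the prompts are assumptions, and the paper’s own training changes (such as multi-timestep supervision) are not reflected here.

```python
# Minimal sketch (not the authors' exact method): text-conditioned ControlNet
# generation of weather-varied street scenes from a segmentation map, assuming
# the public Hugging Face checkpoints named below.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Segmentation-conditioned ControlNet plus a Stable Diffusion backbone (assumed checkpoints).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# A color-coded semantic segmentation map of a street scene (hypothetical file).
seg_map = load_image("street_scene_segmentation.png")

# Vary only the text prompt to diversify conditions while the layout stays fixed,
# e.g. snowy and rainy variants of the same scene as extra segmentation training data.
prompts = [
    "a city street on a snowy winter day, photorealistic",
    "a city street in heavy rain at dusk, photorealistic",
]
for i, prompt in enumerate(prompts):
    image = pipe(prompt, image=seg_map, num_inference_steps=30).images[0]
    image.save(f"synthetic_scene_{i}.png")  # paired with the original label map
```

Each generated image shares the layout of the original label map, so it can be paired with that map directly when augmenting a segmentation training set.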