Computer Vision News

Another challenge stemmed from the computational demands of training the model. Training iterations were long, and Soon had neither a big budget nor a large GPU, so limited resources called for innovative solutions.

“The obvious choice is you get a bigger GPU, but we didn’t have a bigger GPU, so I had to be smart,” he recalls. “I modified the model a little to make it a bit smaller. Also, we used a 3D body model, unlike conventional methods that use the whole 3D body map. If you have to map out every single point on your body surface, that takes up a lot of memory, but we used the SMPL model, which contains 10 parameters for body shape and 72 for the rotation of your joints. Instead of trying to run every single thing, you just need to run: What is the angle of this rotation? We were able to reduce the computational requirements a lot!”

Extracting 3D pose information from the given images involved using a library to estimate 3D body shapes and poses. Deep learning methods, including segmentation, were used to predict silhouette masks, enabling precise positioning of elements within the generated images. Conventional approaches require fine-grained semantic segmentation, in which every pixel must be assigned to a specific body part.

“The off-the-shelf semantic segmentation models were not very good,” Soon explains. “It’s very difficult to get the segmentation correct, especially if a person faces backward. Our methods don’t use semantic segmentation, so we’re able to avoid that problem. We just need a very simple silhouette mask.”

The integration of visual prompts adds another layer of sophistication

UPGPT: Universal Diffusion Model for …
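
To put rough numbers on the memory argument Soon makes about SMPL, the sketch below compares the 82 parameters he mentions (10 for shape, 72 for joint rotations) with storing an explicit 3D position for every surface point. It is an illustration only, not code from the UPGPT project. The dense representation is assumed here to be the 6,890 vertices of the standard SMPL mesh; the "whole 3D body map" used by conventional methods may be laid out differently, and the function names are hypothetical.

```python
# Illustrative comparison only; not the UPGPT authors' code.
# Assumption: the dense alternative is modeled as one (x, y, z) position
# per vertex of the standard SMPL mesh, stored as 32-bit floats.

SMPL_SHAPE_PARAMS = 10   # body shape coefficients (figure from the article)
SMPL_POSE_PARAMS = 72    # joint rotation values (figure from the article)
SMPL_VERTICES = 6890     # vertex count of the standard SMPL mesh
BYTES_PER_FLOAT32 = 4


def smpl_param_count() -> int:
    """Number of values needed for the compact SMPL description."""
    return SMPL_SHAPE_PARAMS + SMPL_POSE_PARAMS


def dense_surface_count(num_vertices: int = SMPL_VERTICES) -> int:
    """Number of values needed to store (x, y, z) for every surface point."""
    return num_vertices * 3


def to_bytes(num_floats: int) -> int:
    """Storage in bytes when each value is a 32-bit float."""
    return num_floats * BYTES_PER_FLOAT32


if __name__ == "__main__":
    compact = smpl_param_count()
    dense = dense_surface_count()
    print(f"SMPL parameters: {compact} values, {to_bytes(compact)} bytes")
    print(f"Dense surface:   {dense} values, {to_bytes(dense)} bytes")
    print(f"Dense / compact: {dense / compact:.1f}x")
```

Under these assumptions the compact description needs 82 values per person against 20,670 for the explicit surface. The real saving in training depends on how the conditioning is fed to the model, which the article does not detail.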