Computer Vision News - June 2020

In the case of modelling the pose phase, my intuition is that fitting a GAN might be an overcomplicated model compared to training a GAN for images. For images, you need to generate a lot of detail, and it’s difficult to fit an easier parametric model here.”

Michael points out that we should distinguish between a full GAN, which is really about a generative model which has been sampled from, and discriminative training, which is more what we’re talking about here: “You have an adversary, which you would have in a GAN, but you don’t have the generation process because what you’re trying to do is go from pixels and regress to body parameters. It’s not like you’re sampling necessarily. Our task is a regression task, and having an adversary is one way to learn about human poses.”

They have also explored CycleGANs to try to perform the task in a fully unsupervised way. “Trying to learn 3D human pose from images is a super hard problem without training data,” Michael explains. So how can you solve this problem without it?

“We actually had a fancy GAN which had multiple steps to it. It went from images to learn a segmentation of the body and then back from that segmentation to produce images of people. You have discriminators in both sides saying, is it a reasonable segmentation? You can get lots of segmentations of people as training data so that completes that cycle. Then you can go from segmentations to 3D pose. That’s something Gerard did in his work. But you can also do that in a GAN style. You can go to 3D poses of people and then back from 3D poses to 2D segmentations. We treat the 2D segmentations as an intermediate representation and then have multiple CycleGANs to try and learn this.”

Wrapping up, we ask the team what they find most fascinating about this task. “I’ve been working on this for several years and it’s exciting trying to
