Computer Vision News - November 2020

can't, because there's a lot of it. It's also because a lot of these things you can't describe in words; the whole point is that they're subconscious to us. A lot of these things we can't actually talk about. There is no image that we could annotate to say exactly where it's happening. Annotation is not going to work in a general sense. I think that the most important and exciting question is: how do you set this up so that you can still learn stuff, but in an unsupervised, self-supervised, or weakly supervised way, so that you can bypass the need for supervision?

Are the metrics difficult?

Yes, the metrics are difficult. Once you have data and you figure out how to learn, it turns out that the metrics are the most depressing part. You ask yourself, “How will I know that this succeeded?” There's always this part of the project where you say, “Oh, no! This is a bad direction. There's no way to evaluate.” Up until now, in everything I've done, there has always been a way to at least figure out something that you're doing right. The metric may not always measure exactly what your goal is, but you can build a scaffolding of metrics in all kinds of directions to check that you're heading the right way. It turns out that this requires most of your thinking, but it always works out in the end.

The first time I heard about you was in relation to a dance project. Can you share more with our readers?

Yes, there's a lot to talk about! The dance project was a very interesting one. Basically, the idea came from my co-author Tinghui Zhou, who pointed out that there is pix2pix, which does image-to-image translation. He asked, “What would happen if we did this with video?” His dream was to dance like Michael Jackson. [laughs] I really liked this project. It was technically very simple, since pix2pix already existed. We actually had Caroline Chan, who is now a PhD student at MIT. During her time as an undergrad, she had done very well in a computational photography class.

We said, “Okay, this is your project. You can do it.” It worked very well! I think the goal there was very interesting: trying to model something very fine-grained, with tiny little details. What we were interested in was taking a subject that we want to animate into dancing and learning everything about them. We would record a 10- or 20-minute video of this person doing all kinds of random motions. What we wanted to figure out was how this person looks from all directions and in all poses, what their body is like, and how they hold themselves in space. Are they symmetric? Did they fall down when they were two years old, in a way that now gives their posture something distinctive, something specifically them? Then the motion transfer part was basically pix2pix, with a little bit of normalization between one person and the other to align their limbs exactly.

Shiry Ginosar
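The interview mentions the normalization between the two people only in passing. As a rough illustration, and not the authors' actual method, here is a minimal Python sketch of one simple way to align two people's poses: scale the source skeleton to the target's typical height and translate it so the source's feet land where the target's feet usually are. It assumes 2D keypoints from some pose estimator stored as (x, y) pixel arrays with y growing downward; the helper names `pose_stats` and `normalize_pose` and the median-based statistics are hypothetical choices made for this sketch.

```python
import numpy as np


def pose_stats(frames):
    """Summarize one person's video (hypothetical helper).

    frames: array of shape (T, K, 2) holding K (x, y) keypoints per frame.
    Returns the median body height in pixels and the median position of the
    lowest keypoint (treated here as the feet; y grows downward in images).
    """
    frames = np.asarray(frames, dtype=float)
    ys = frames[..., 1]                                  # (T, K) y-coordinates
    heights = ys.max(axis=1) - ys.min(axis=1)            # per-frame pixel height
    lowest = ys.argmax(axis=1)                           # index of lowest keypoint per frame
    feet = frames[np.arange(frames.shape[0]), lowest]    # (T, 2) foot positions
    return {"height": float(np.median(heights)), "foot": np.median(feet, axis=0)}


def normalize_pose(src_kpts, src_stats, tgt_stats):
    """Map one frame of source keypoints into the target person's frame.

    Scales about the source foot position so the heights match, then moves
    the source foot position onto the target foot position.
    """
    scale = tgt_stats["height"] / src_stats["height"]
    src_foot = np.asarray(src_stats["foot"], dtype=float)
    tgt_foot = np.asarray(tgt_stats["foot"], dtype=float)
    return (np.asarray(src_kpts, dtype=float) - src_foot) * scale + tgt_foot


if __name__ == "__main__":
    # Toy source video: one frame with two keypoints (head, foot).
    src_video = np.array([[[100.0, 50.0], [100.0, 250.0]]])
    # Assumed statistics for a taller target standing elsewhere in the frame.
    tgt_stats = {"height": 300.0, "foot": (320.0, 400.0)}
    print(normalize_pose(src_video[0], pose_stats(src_video), tgt_stats))
```

In a pipeline like the one described, the normalized source poses would presumably then serve as the input to an image-to-image model trained on the target person. A single global scale and offset is the simplest possible alignment; it ignores camera perspective and differences in limb proportions.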

RkJQdWJsaXNoZXIy NTc3NzU=