Computer Vision News - April 2023
SinMDM: Single Motion Diffusion

“Diffusion models for human motion modeling are a new area,” points out Sigal, who recently took the relatively unusual path of returning to academia after many years working in industry. “I was part of one of the pioneering works, which showed that, unlike image diffusion models that use the famous U-Net architecture, it’s better to use transformers for human motion diffusion. However, standard transformers have a receptive field that contains the whole motion. If we run a network with that, we get an overfitting problem. The output will be the same motion, again and again, just replicating the input, when we want diversity.”

To run a transformer that uses an attention mechanism on motion, you can employ local attention in non-overlapping windows, similar to ViT (Vision Transformers). Smaller windows are necessary to reduce the size of the receptive field, yet non-overlapping windows tend to constrain cross-window interaction, adversely affecting model performance. The QnA (query-and-attend) algorithm provides a solution to this problem.

“We can take a single animation – for example, a breakdancing dragon – and create as many different animations as we want,” Sigal tells us. “All these different animations are faithful to the motion motifs of the original, but no motion is the same as another. The dragons dance together but not in the same order. Only the semantics are the same!”

The new algorithm has significantly improved the speed and memory requirements for generating animations by harnessing the power of diffusion models and a denoising network. Traditionally, diffusion models were believed to require extensive datasets, but this model can run on just one input. The resulting animations are superior to previous work based on GANs, which led to more complicated, slow, and memory-intensive algorithms.
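To make the receptive-field idea concrete, here is a minimal NumPy sketch of self-attention restricted to non-overlapping windows over a motion sequence. This is only an illustration of the general windowing idea discussed above, not the paper's actual QnA layer; the random projection matrices stand in for learned Q/K/V weights, and the frame/feature sizes are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def windowed_self_attention(x, window):
    """Local self-attention in non-overlapping windows.

    x: (T, D) motion sequence of T frames with D features per frame.
    Each block of `window` frames attends only to itself, so the
    receptive field is `window` frames instead of the whole motion.
    """
    T, D = x.shape
    assert T % window == 0, "pad the sequence so T is a multiple of window"
    # Hypothetical fixed random projections stand in for learned Q/K/V weights.
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    out = np.empty_like(x)
    for start in range(0, T, window):
        w = x[start:start + window]           # frames in this window only
        q, k, v = w @ Wq, w @ Wk, w @ Wv      # per-window queries/keys/values
        attn = softmax(q @ k.T / np.sqrt(D))  # (window, window) attention scores
        out[start:start + window] = attn @ v  # no interaction across windows
    return out

motion = np.random.default_rng(1).standard_normal((16, 8))  # 16 frames, 8 features
y = windowed_self_attention(motion, window=4)
print(y.shape)  # (16, 8)
```

Because the windows do not overlap, perturbing a frame in one window leaves every other window's output unchanged — exactly the limited cross-window interaction the article says QnA was designed to overcome.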