Computer Vision News - February 2022

Tomas Jakab

Figure 1 illustrates our conditional autoencoder for self-supervised object keypoint discovery [1]. The decoder is tasked with reconstructing the input image, but it can only see 2D points extracted from the input through a 2D keypoint bottleneck. A second image containing the same object in a different pose supplies the appearance information. The network is trained with an image reconstruction loss alone, and it learns to use the 2D points as object keypoints that encode pose by dedicating each point to a semantically consistent location on the object. Our follow-up work [2] allows a prior over the keypoints so that the discovered keypoints are human-interpretable.

The idea of conditional generation was later also leveraged in our method for self-supervised shape deformation controlled by 3D keypoints [3]. The autoencoding framework can also be adapted to more complex structural representations such as 3D meshes. Figure 2 shows our photo-geometric autoencoder with a bottleneck that disentangles the input image into a rigid pose, a canonical 3D shape, articulation, and texture [4]. A differentiable renderer acting as the decoder reconstructs the input image. The model is trained end-to-end with reconstruction losses, without any explicit 3D supervision. For more information, see our project website.
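The 2D keypoint bottleneck described for Figure 1 stays differentiable because each keypoint is typically read out as the expected position under a softmax-normalized heatmap (a spatial soft-argmax). The sketch below is a minimal NumPy illustration of that read-out, not the authors' implementation; in the actual method the heatmaps are produced by a learned CNN, and the decoder then reconstructs the image from these points plus the appearance image.

```python
import numpy as np

def soft_argmax(heatmap):
    """Differentiable keypoint read-out: softmax the heatmap into a
    probability map, then take the expected (x, y) coordinate."""
    h, w = heatmap.shape
    p = np.exp(heatmap - heatmap.max())  # numerically stable softmax
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return float((p * xs).sum()), float((p * ys).sum())

# A sharp peak at row 2, column 3 yields a keypoint near (x=3, y=2).
hm = np.zeros((5, 5))
hm[2, 3] = 50.0
x, y = soft_argmax(hm)
```

Because the expectation is a smooth function of the heatmap values, reconstruction-loss gradients can flow back through the keypoint coordinates into the encoder.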

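The disentangled bottleneck of Figure 2 can be pictured as explicit factors that are recombined before rendering. The toy sketch below only illustrates the idea, with hypothetical names and the differentiable renderer omitted: a rigid pose is applied to a canonical shape, and training compares the render against the input with a pixel-wise loss, with no 3D supervision.

```python
import numpy as np

def apply_rigid_pose(canonical_shape, R, t):
    """Pose factor: rotate and translate the canonical 3D vertices.
    canonical_shape is (N, 3); R is a 3x3 rotation; t is a 3-vector."""
    return canonical_shape @ R.T + t

def reconstruction_loss(image, rendered):
    """End-to-end training signal: pixel-wise L2 between the input image
    and the (differentiable) render."""
    return float(np.mean((image - rendered) ** 2))

# Identity pose leaves the canonical shape unchanged.
verts = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 1.0]])
posed = apply_rigid_pose(verts, np.eye(3), np.zeros(3))
```

In the full model, articulation deforms the canonical shape before posing and the texture factor colors the rendered mesh, but the supervision remains the reconstruction loss alone.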