Computer Vision News - June 2022

Kaichun Mo

Learning Compositional Visual Representations for 3D shapes

In PartNet [1] (Figure 1, left), we introduced a large-scale dataset that provides fine-grained part annotations over ShapeNet [2] models and set up several 3D part segmentation benchmarks. We also proposed 3D deep learning methods that segment 3D shape inputs into semantic, instance-level, or hierarchical part instances; a minimal sketch of this per-point labeling setup appears below. While PartNet is concerned with 3D shape part segmentation, StructureNet [3] (Figure 1, right) investigates the inverse problem: synthesizing novel 3D shapes by composing 3D parts. We proposed a part-based, structure-aware 3D shape generative model that not only generates high-fidelity part geometry assembled into a complete shape but also captures the rich relationships and structural constraints among the parts (see the second sketch below). We refer readers to the papers for the technical methods.

Learning Actionable Visual Representations for 3D shapes

In Where2Act [4] (Figure 2, left), we proposed a general, self-supervised framework for learning actionable, task-specific visual representations for manipulating 3D articulated objects. Our method leverages scalable and inexpensive interaction data collected in the SAPIEN physical simulator [5] to automate robot-object affordance learning for various manipulation tasks, e.g., estimating where to push or pull a drawer or door on a cabinet; the third sketch below illustrates this supervision signal. O2O-Afford [6] (Figure 2, right) extends the system to object-object interaction scenarios, such as fitting a bucket inside a cabinet. The critical challenge there is to unify the task specification across a diverse set of downstream tasks, including fitting, placement, and stacking, so that the proposed method is generally applicable. Please check the papers for more details.

Figure 2
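The following is a minimal sketch, in PyTorch, of the kind of per-point semantic part labeling that the PartNet segmentation benchmarks evaluate. Everything here is an illustrative assumption rather than the authors' code: the stand-in MLP encoder replaces a real point-cloud backbone, and NUM_PART_CLASSES is a hypothetical label count.

import torch
import torch.nn as nn

NUM_PART_CLASSES = 24  # hypothetical count of fine-grained part labels

class PartSegNet(nn.Module):
    # Per-point classifier: assigns each point of a shape a part label.
    def __init__(self, num_classes: int = NUM_PART_CLASSES, feat_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(          # stand-in per-point encoder
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, points):
        # points: (B, N, 3) xyz -> logits: (B, N, num_classes)
        return self.classifier(self.encoder(points))

model = PartSegNet()
cloud = torch.rand(4, 2048, 3)                           # batch of point clouds
labels = torch.randint(0, NUM_PART_CLASSES, (4, 2048))   # per-point part labels
logits = model(cloud)
loss = nn.functional.cross_entropy(logits.flatten(0, 1), labels.flatten())
print(logits.shape, loss.item())

Semantic labeling is the simplest of the three benchmark settings; the instance-level and hierarchical variants additionally group points into part instances and organize the labels in a tree.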
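The second sketch shows one core idea behind structure-aware generation: representing a shape as a hierarchy of parts and recursively encoding it into a single latent code. The Part dataclass, the box parameterization, and the pairwise merge network are simplifying assumptions made for this sketch; StructureNet itself encodes and decodes n-ary hierarchies of part graphs whose edges capture relationships such as symmetry and adjacency.

import torch
import torch.nn as nn
from dataclasses import dataclass, field
from typing import List

BOX_DIM = 10  # hypothetical box parameterization of a part (center, size, rotation)

@dataclass
class Part:
    box: torch.Tensor                          # (BOX_DIM,) geometry proxy
    children: List["Part"] = field(default_factory=list)

class TreeEncoder(nn.Module):
    def __init__(self, latent: int = 64):
        super().__init__()
        self.leaf = nn.Sequential(nn.Linear(BOX_DIM, latent), nn.ReLU())
        self.merge = nn.Sequential(nn.Linear(2 * latent, latent), nn.ReLU())

    def forward(self, part):
        code = self.leaf(part.box)
        # Fold each child's code into the parent's. This is order-dependent,
        # a simplification; a real model aggregates children jointly.
        for child in part.children:
            code = self.merge(torch.cat([code, self.forward(child)], dim=-1))
        return code

chair = Part(torch.rand(BOX_DIM), children=[
    Part(torch.rand(BOX_DIM)),                                      # back
    Part(torch.rand(BOX_DIM),
         children=[Part(torch.rand(BOX_DIM)) for _ in range(4)]),   # base + legs
])
print(TreeEncoder()(chair).shape)  # torch.Size([64])

Pairing such an encoder with a mirror-image decoder that emits part boxes and child codes yields a generative model whose samples are assemblies of parts rather than monolithic shapes.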
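The third sketch illustrates the self-supervised affordance signal behind Where2Act: score every point for how likely a given interaction primitive (say, pulling) is to succeed there, and supervise the scores with binary outcomes of interactions tried in simulation. The encoder and the random stand-in outcomes are placeholder assumptions; the actual system executes action primitives in SAPIEN and records their success or failure.

import torch
import torch.nn as nn

class ActionabilityNet(nn.Module):
    # Per-point "where to act" scorer for one interaction primitive.
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(          # stand-in per-point encoder
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.head = nn.Linear(feat_dim, 1)     # per-point success logit

    def forward(self, points):
        # points: (B, N, 3) -> per-point success logits: (B, N)
        return self.head(self.encoder(points)).squeeze(-1)

model = ActionabilityNet()
cloud = torch.rand(2, 1024, 3)
outcomes = torch.randint(0, 2, (2, 1024)).float()  # placeholder simulator feedback
logits = model(cloud)
loss = nn.functional.binary_cross_entropy_with_logits(logits, outcomes)
print(torch.sigmoid(logits).shape, loss.item())    # (2, 1024) affordance map

Because the labels come from simulated trials rather than human annotation, data collection scales cheaply, which is what makes the framework self-supervised.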
