1IIIT Hyderabad, India
2Fujitsu Research India
* Equal Contribution
Recently, the fundamental problem of unsupervised domain adaptation (UDA) on 3D point clouds has been motivated by a wide variety of applications in robotics, virtual reality, and scene understanding, to name a few. The point cloud data acquisition procedures manifest themselves as significant domain discrepancies and geometric variations among both similar and dissimilar classes. The standard domain adaptation methods developed for images do not directly translate to point cloud data because of their complex geometric nature. To address this challenge, we leverage the idea of multimodality and alignment between distributions. We propose a new UDA architecture for point cloud classification that benefits from multimodal contrastive learning to get better class separation in both domains individually. Further, the use of optimal transport (OT) aims at learning source and target data distributions jointly to reduce the cross-domain shift and provide a better alignment. We conduct a comprehensive empirical study on PointDA-10 and GraspNetPC-10 and show that our method achieves state-of-the-art performance on GraspNetPC-10 (with ≈ 4-12% margin) and best average performance on PointDA-10. Our ablation studies and decision boundary analysis also validate the significance of our contrastive learning module and OT alignment.
Method. Given a batch of source and target point clouds, each point cloud is first randomly transformed twice using a pre-defined set of transformations. These transformed and original point clouds are then passed through a 3D encoder to obtain their respective embeddings. A contrastive loss is applied between the embeddings of the two transformed point clouds to establish coarse class clusters. A contrastive loss is also applied between the embeddings of the 2D renderings and the latent 3D representation to establish 3D-2D correspondence understanding. In the final stage, the original point cloud embeddings are passed to the OT loss to establish cross-domain alignment.
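To make the two loss components concrete, the following is a minimal, dependency-free sketch (not the paper's implementation): an NT-Xent-style contrastive loss over paired embeddings of the two augmented views, and a Sinkhorn iteration that computes an entropy-regularized OT plan between source and target embedding batches. Function names, the temperature `tau`, and the regularization `eps` are illustrative assumptions, not values from the paper.

```python
import math

def nt_xent(z1, z2, tau=0.1):
    """NT-Xent contrastive loss between two lists of L2-normalized
    embedding vectors; z1[i] and z2[i] are the two augmented views
    of the same point cloud (illustrative temperature tau)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    z = z1 + z2
    n = len(z1)
    loss = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive pair
        denom = sum(math.exp(dot(z[i], z[k]) / tau)
                    for k in range(2 * n) if k != i)
        pos = math.exp(dot(z[i], z[j]) / tau)
        loss += -math.log(pos / denom)
    return loss / (2 * n)

def sinkhorn_plan(cost, eps=0.1, iters=100):
    """Entropy-regularized OT plan between uniform source/target
    marginals, given a pairwise cost matrix (list of lists).
    The OT loss would be sum_ij plan[i][j] * cost[i][j]."""
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):  # alternating marginal projections
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

In practice both losses would be computed on mini-batch embeddings from the 3D encoder and backpropagated jointly; the pure-Python loops above only illustrate the arithmetic of each term.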
We perform an interesting experiment visualizing the decision boundary. Early (top-row) and final (bottom-row) epoch decision boundaries on target samples for One-vs-Rest (Monitor class) for S → M: (a), (e) only PCM (without adaptation); (b), (f) contrastive learning with PCM; (c), (g) optimal transport and contrastive learning with PCM (our COT); and (d), (h) our COT fine-tuned with SPST. We observe that, as we include our proposed components, the representations are enhanced and the decision boundary becomes tighter.