Synergizing Contrastive Learning and Optimal Transport for 3D Point Cloud Domain Adaptation

Siddharth Katageri¹* Arkadipta De²* Chaitanya Devaguptapu²* VSSV Prasad² Charu Sharma¹ Manohar Kaul²

¹IIIT Hyderabad, India

²Fujitsu Research India

* Equal Contribution

WACV 2024 (Oral presentation)

The video above gives an overview of our method for unsupervised domain adaptation (UDA). Contrastive learning (CL) and optimal transport (OT) are designed to complement each other: CL establishes coarse class clusters in both domains, while OT aligns classes across domains. The colors of the data points denote different classes.

Abstract

Recently, the fundamental problem of unsupervised domain adaptation (UDA) on 3D point clouds has been motivated by a wide variety of applications in robotics, virtual reality, and scene understanding, to name a few. The point cloud data acquisition procedures manifest themselves as significant domain discrepancies and geometric variations among both similar and dissimilar classes. The standard domain adaptation methods developed for images do not directly translate to point cloud data because of their complex geometric nature. To address this challenge, we leverage the idea of multimodality and alignment between distributions. We propose a new UDA architecture for point cloud classification that benefits from multimodal contrastive learning to get better class separation in both domains individually. Further, the use of optimal transport (OT) aims at learning source and target data distributions jointly to reduce the cross-domain shift and provide a better alignment. We conduct a comprehensive empirical study on PointDA-10 and GraspNetPC-10 and show that our method achieves state-of-the-art performance on GraspNetPC-10 (with ≈ 4-12% margin) and best average performance on PointDA-10. Our ablation studies and decision boundary analysis also validate the significance of our contrastive learning module and OT alignment.

Method

Method. Given a batch of source and target point clouds, each point cloud is first randomly transformed twice using a pre-defined set of transformations. The transformed and original point clouds are then passed through a 3D encoder to obtain their respective embeddings. A contrastive loss is applied between the embeddings of the two transformed point clouds to establish coarse class clusters. A contrastive loss is also applied between the embeddings of the 2D renderings and the latent 3D representation to establish 3D-2D correspondence. In the final stage, the original point cloud embeddings are passed to the OT loss to align the source and target domains.
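
The two loss terms described above can be illustrated with a minimal NumPy sketch. This is not the released implementation: it assumes an InfoNCE-style contrastive loss and an entropy-regularized OT cost solved with Sinkhorn iterations, and the exact formulation in the paper may differ. The function names (`info_nce`, `sinkhorn_plan`, `ot_alignment_loss`) and the `temperature`, `reg`, and `n_iters` values are hypothetical choices made for illustration, and the embeddings are treated as plain arrays rather than outputs of a trained encoder.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss between two aligned batches of embeddings.

    Row i of z_a and row i of z_b are treated as a positive pair
    (e.g. two transformed views of the same point cloud); all other
    rows of z_b act as negatives. Assumed form, for illustration only.
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature              # (n, n) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropy-regularized transport plan between two uniform distributions."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)                         # uniform source weights
    b = np.full(m, 1.0 / m)                         # uniform target weights
    kernel = np.exp(-cost / reg)
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)                      # rescale to match column marginals
        u = a / (kernel @ v)                        # rescale to match row marginals
    return u[:, None] * kernel * v[None, :]


def ot_alignment_loss(z_src, z_tgt, reg=0.1):
    """Transport cost between source and target embeddings (cosine distance)."""
    z_src, z_tgt = l2_normalize(z_src), l2_normalize(z_tgt)
    cost = 1.0 - z_src @ z_tgt.T                    # pairwise cosine distance
    plan = sinkhorn_plan(cost, reg=reg)
    return float(np.sum(plan * cost)), plan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 16))                  # stand-in for encoder outputs
    view_1 = emb + 0.01 * rng.normal(size=emb.shape)
    view_2 = emb + 0.01 * rng.normal(size=emb.shape)
    target = rng.normal(size=(8, 16))               # stand-in for target-domain embeddings
    print("contrastive loss:", info_nce(view_1, view_2))
    print("OT alignment loss:", ot_alignment_loss(emb, target)[0])
```

In this sketch the same `info_nce` function could serve both contrastive terms: once with embeddings of the two transformed views, and once with 3D embeddings paired against embeddings of the 2D renderings. In training, the losses would be computed in an automatic-differentiation framework so that gradients reach the encoder; NumPy is used here only to keep the example self-contained.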

Decision Boundary Visualization

We visualize the decision boundaries on target samples for One-vs-Rest (Monitor class) on S → M, at early (top row) and final (bottom row) epochs. (a), (e) Only PCM (without adaptation); (b), (f) contrastive learning with PCM; (c), (g) optimal transport and contrastive learning with PCM (our COT); (d), (h) our COT fine-tuned with SPST. As we include our proposed components, the representations are enhanced and the decision boundary becomes tighter.