Conference

Optimizing Monocular 3D Object Detection on KITTI: Harnessing Power of Right Images

Abstract

Monocular 3D object detection is an important yet challenging problem in computer vision, with applications such as autonomous driving. A key limitation in advancing this field is the scarcity of annotated training data, an issue exacerbated in benchmarks like KITTI, which provide only around 7,000 labeled images. Prior work has developed techniques to improve monocular 3D detection but often relies on external sources of data, such as LiDAR, to supplement the limited training images. In this work, we propose a pre-training strategy that addresses the limited-data issue by leveraging unlabeled right-camera images available within the KITTI dataset itself. We pre-train a model initialized for 3D detection using “right” views before fine-tuning on only “left” images. Our experiments validate that this strategic pre-training with readily available “right” images yields significant improvements over models trained from scratch on “left” images alone. We observe consistent gains in 3D detection performance when leveraging “right”-image pre-training, without requiring any external LiDAR data. Our method provides evidence that mining unlabeled or weakly labeled in-domain data can effectively remedy the pervasive challenge of limited training data for monocular 3D object detection, and offers a practical, plug-and-play strategy to better use available datasets like KITTI and reduce overfitting.
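The abstract describes a two-stage schedule: pre-train the detector on KITTI's right-camera images, then fine-tune the resulting weights on the labeled left-camera images only. The paper's actual architecture, losses, and label handling are not given here, so the following is only a toy, framework-free sketch of the schedule itself; `train`, `right_images`, and `left_images` are illustrative placeholders, not the authors' code.

```python
def train(model, images, steps, lr):
    """Toy stand-in for a training loop: nudges a scalar 'model' toward
    the mean of the data, standing in for gradient-descent updates."""
    for _ in range(steps):
        grad = model - sum(images) / len(images)  # placeholder "gradient"
        model -= lr * grad
    return model

# Stage 1: pre-train from scratch on right-camera views (placeholder data).
right_images = [0.9, 1.1, 1.0]
model = train(model=0.0, images=right_images, steps=50, lr=0.5)

# Stage 2: fine-tune the pre-trained weights on left-camera images only.
left_images = [1.9, 2.1, 2.0]
model = train(model=model, images=left_images, steps=50, lr=0.5)
```

The key point the sketch preserves is that stage 2 starts from the stage-1 weights rather than from a fresh initialization, which is where the reported gains over training from scratch on left images alone come from.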

Authors

Bakhtiarian A; Karimi N; Samavi S

Pagination

pp. 1-6

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publication Date

February 22, 2024

DOI

10.1109/aisp61396.2024.10475203

Name of conference

2024 20th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP)