Benchmarking Robustness-Efficiency Trade-offs of Camera-LiDAR Fusion Strategies for 3D Object Detection Under Environmental Corruptions
Keywords:
multi-modal sensor fusion, robustness benchmarking, 3D object detection, autonomous driving perception
Abstract
Multi-modal sensor fusion combining camera and LiDAR data has become the dominant paradigm for 3D object detection in autonomous driving. The selection among early fusion, late fusion, and deep fusion strategies involves complex trade-offs between detection accuracy, computational efficiency, and robustness under adverse conditions. This paper presents a systematic benchmarking study that quantitatively evaluates these trade-offs on the nuScenes dataset under diverse environmental corruptions and calibration perturbations. Six representative fusion algorithms spanning three fusion categories are evaluated across 10 corruption types at three severity levels, with simultaneous measurement of detection accuracy (mAP, NDS), computational resource consumption (latency, GPU memory), and degradation patterns under spatial misalignment and temporal desynchronization. The results reveal that deep fusion approaches achieve 2.8%–5.1% higher NDS than early fusion under clean conditions, while late fusion strategies demonstrate 12.3%–18.7% lower mean Corruption Error under LiDAR-degraded scenarios. Calibration perturbation analysis shows that soft-association mechanisms reduce mAP degradation by 41.2% compared to hard-association approaches at 0.5-meter spatial misalignment. These findings provide evidence-based guidance for engineering teams selecting sensor fusion configurations under real-world deployment constraints.
References
1. H. Caesar, V. Bankiti, A. H. Lang, S. Vora, V. E. Liong, Q. Xu, A. Krishnan, Y. Pan, G. Baldan, and O. Beijbom, "nuScenes: A multimodal dataset for autonomous driving," in *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2020, pp. 11621–11631.
2. P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine, V. Vasudevan, W. Han, J. Ngiam, H. Zhao, A. Timofeev, S. Ettinger, M. Krivokon, A. Gao, A. Joshi, ... D. Anguelov, "Scalability in perception for autonomous driving: Waymo open dataset," in *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2020, pp. 2443–2451.
3. D. Feng, C. Haase-Schütz, L. Rosenbaum, H. Hertlein, C. Glaeser, F. Timm, W. Wiesbeck, and K. Dietmayer, "Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges," *IEEE Transactions on Intelligent Transportation Systems*, vol. 22, no. 3, pp. 1341–1360, 2021.
4. Y. Dong, C. Kang, J. Zhang, Z. Zhu, Y. Wang, X. Yang, H. Su, X. Wei, and J. Zhu, "Benchmarking robustness of 3D object detection to common corruptions in autonomous driving," in *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2023, pp. 1024–1034.
5. L. Kong, Y. Liu, X. Li, R. Chen, W. Zhang, J. Ren, L. Pan, K. Chen, and Z. Liu, "Robo3D: Towards robust and reliable 3D perception against corruptions," in *Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)*, 2023, pp. 19994–20006.
6. S. Vora, A. H. Lang, B. Helou, and O. Beijbom, "PointPainting: Sequential fusion for 3D object detection," in *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2020, pp. 4604–4612.
7. T. Yin, X. Zhou, and P. Krähenbühl, "Center-based 3D object detection and tracking," in *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2021, pp. 11784–11793.
8. X. Bai, Z. Hu, X. Zhu, Q. Huang, Y. Chen, H. Fu, and C. L. Tai, "TransFusion: Robust LiDAR-camera fusion for 3D object detection with transformers," in *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2022, pp. 1080–1089.
9. Y. Li, A. W. Yu, T. Meng, B. Caine, J. Ngiam, D. Peng, J. Shen, Y. Lu, D. Zhou, Q. V. Le, A. Yuille, and M. Tan, "DeepFusion: Lidar-camera deep fusion for multi-modal 3D object detection," in *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2022, pp. 17182–17191.
10. Z. Liu, H. Tang, A. Amini, X. Yang, H. Mao, D. Rus, and S. Han, "BEVFusion: Multi-task multi-sensor fusion with unified bird's-eye view representation," in *Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)*, 2023, pp. 2774–2781.
11. T. Liang, H. Xie, K. Yu, Z. Xia, Z. Lin, Y. Wang, T. Tang, B. Wang, and Z. Tang, "BEVFusion: A simple and robust LiDAR-camera fusion framework," in *Advances in Neural Information Processing Systems (NeurIPS)*, vol. 35, 2022, pp. 10421–10434.
12. S. Xie, L. Kong, W. Zhang, J. Ren, L. Pan, K. Chen, and Z. Liu, "RoboBEV: Towards robust bird's eye view perception under corruptions," *IEEE Transactions on Pattern Analysis and Machine Intelligence*, vol. 47, no. 1, pp. 1–18, 2025.
13. K. Yu, T. Tao, H. Xie, Z. Lin, T. Liang, B. Wang, P. Chen, D. Hao, Y. Wang, and X. Liang, "Benchmarking the robustness of LiDAR-camera fusion for 3D object detection," in *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)*, 2023, pp. 3187–3197.
14. T. Beemelmanns, Q. Zhang, C. Geller, and L. Eckstein, "MultiCorrupt: A multi-modal robustness dataset and benchmark of LiDAR-camera fusion for 3D object detection," in *Proceedings of the IEEE Intelligent Vehicles Symposium (IV)*, 2024, pp. 1–8.
15. J. Yan, Y. Liu, J. Sun, F. Jia, S. Li, T. Wang, and X. Zhang, "Cross modal transformer: Towards fast and robust 3D object detection," in *Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)*, 2023, pp. 18268–18278.
16. Y. Kim, J. Shin, S. Kim, I. J. Lee, J. W. Choi, and D. Kum, "CRN: Camera radar net for accurate, robust, efficient 3D perception," in *Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)*, 2023, pp. 17615–17626.
17. X. Chen, T. Zhang, Y. Wang, Y. Wang, and H. Zhao, "FUTR3D: A unified sensor fusion framework for 3D detection," in *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)*, 2023, pp. 172–181.
18. M. Bijelic, T. Gruber, F. Mannan, F. Kraus, W. Ritter, K. Dietmayer, and F. Heide, "Seeing through fog without seeing fog: Deep multimodal sensor fusion in unseen adverse weather," in *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2020, pp. 11682–11692.
19. J. Wang, Q. Meng, G. Liu, L. Yan, K. Wang, M. M. Cheng, and Q. Hou, "Towards stable 3D object detection," in *Proceedings of the European Conference on Computer Vision (ECCV)*, 2024, pp. 1–17.
20. S. Jin, J. Park, J. Lee, H. Lee, and S. Lee, "Run your 3D object detector on NVIDIA Jetson platforms: A benchmark analysis," *Sensors*, vol. 23, no. 8, p. 4005, 2023.
21. K. Qian, S. Zhu, X. Zhang, and L. E. Li, "Robust multimodal vehicle detection in foggy weather using complementary lidar and radar signals," in *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)*, 2021, pp. 444–453.
22. Y. Ma, T. Wang, X. Bai, H. Yang, Y. Hou, Y. Wang, Y. Qiao, R. Yang, D. Meng, and Z. Li, "Vision-centric BEV perception: A survey," *IEEE Transactions on Pattern Analysis and Machine Intelligence*, vol. 46, no. 12, pp. 10978–10997, 2024.

