A survey of 3D object detection algorithms

Zhe HUANG (黄哲); Yongcai WANG (王永才); Deying LI (李德英)

doi:10.11959/j.issn.2096-6652.202312


Review and Outlook | Updated: 2024-06-05

- A survey of 3D object detection algorithms
- Chinese Journal of Intelligent Science and Technology (智能科学与技术学报), 2023, Vol. 5, No. 1, pp. 7-31
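- Author affiliations:
  
  1. School of Information, Renmin University of China, Beijing 100872, China
  2. Key Laboratory of Police Internet of Things Application Technology, Ministry of Public Security, Beijing 100048, China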
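- Author biographies:
  
  Zhe HUANG (1996- ), female, is a Ph.D. candidate at the School of Information, Renmin University of China. Her main research interests are computer vision and 3D object detection.
  Yongcai WANG (1978- ), male, Ph.D., is an associate professor and doctoral supervisor at the School of Information, Renmin University of China. His main research interests are the Internet of Things, intelligent sensing, network localization, visual perception, and inertial-navigation fusion localization.
  Deying LI (1965- ), female, is a professor and doctoral supervisor at the School of Information, Renmin University of China. Her main research interests are the Internet of Things and intelligent network algorithms and analysis.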
- Funding:
  
  The National Natural Science Foundation of China (61972404, 12071478, 61732006)
- Chinese Library Classification (CLC) number: TP183
- Online publication date: 2023-03
  
  Print publication date: 2023-03-15
Zhe HUANG, Yongcai WANG, Deying LI. A survey of 3D object detection algorithms[J]. Chinese journal of intelligent science and technology, 2023, 5(1): 7-31. DOI: 10.11959/j.issn.2096-6652.202312.

Abstract

3D object detection is a fundamental problem in autonomous driving, virtual reality, robotics, and other applications. Its goal is to extract, from unordered point clouds, the 3D boxes that most accurately describe the targets of interest, such as the tightest 3D box enclosing the points of a pedestrian or vehicle, and to output the location, size, and orientation of each box. Currently, there are two primary approaches to 3D object detection: (1) pure point-cloud-based 3D object detection, in which the point clouds are created by binocular (stereo) vision, RGB-D cameras, or lidar; (2) fusion-based 3D object detection, which fuses image and point cloud information. The various representations of 3D point clouds and their feature extraction methods were first introduced. Representative methods were then presented progressively at three levels: traditional machine learning techniques, non-fusion deep learning algorithms, and multimodal fusion-based deep learning algorithms. Methods within and across each category were examined and compared, and the differences and connections between them were analyzed in depth. Finally, the remaining problems and possible research directions of 3D object detection were discussed, and the primary datasets and evaluation metrics used in 3D object detection research were summarized.
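The abstract describes each detection as a 3D box, given by its location, size, and orientation, that tightly encloses a target's points. As a minimal sketch of that output format (not the authors' code), the Python below assumes one common 7-parameter convention: z points up, (x, y, z) is the geometric centre of the box, l, w, and h are its length, width, and height, and yaw is the heading angle about the vertical axis in radians. Conventions differ between datasets (KITTI, for instance, labels objects in the camera frame with the box location at the bottom face), so both the convention and the class name `Box3D` are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Box3D:
    """Hypothetical 7-parameter box: location (x, y, z), size (l, w, h), orientation (yaw).

    Assumed convention: z is up, (x, y, z) is the box centre, l lies along the
    heading direction, w across it, h vertically; yaw rotates about z (radians).
    """
    x: float
    y: float
    z: float
    l: float
    w: float
    h: float
    yaw: float

    def _to_local(self, p: Point) -> Point:
        # Translate the point into the box frame, then undo the yaw rotation.
        dx, dy, dz = p[0] - self.x, p[1] - self.y, p[2] - self.z
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        return (c * dx + s * dy, -s * dx + c * dy, dz)

    def corners(self) -> List[Point]:
        """Return the 8 corners of the box in world coordinates."""
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        out: List[Point] = []
        for sx in (0.5, -0.5):
            for sy in (0.5, -0.5):
                for sz in (0.5, -0.5):
                    lx, ly, lz = sx * self.l, sy * self.w, sz * self.h
                    # Rotate the local offset by yaw, then shift to the centre.
                    out.append((self.x + c * lx - s * ly,
                                self.y + s * lx + c * ly,
                                self.z + lz))
        return out

    def contains(self, p: Point, eps: float = 1e-9) -> bool:
        """True if point p lies inside or on the surface of the box."""
        lx, ly, lz = self._to_local(p)
        return (abs(lx) <= self.l / 2 + eps
                and abs(ly) <= self.w / 2 + eps
                and abs(lz) <= self.h / 2 + eps)


if __name__ == "__main__":
    car = Box3D(10.0, 2.0, -0.8, 4.0, 1.8, 1.5, math.pi / 2)
    print(car.contains((10.0, 3.9, -0.8)), car.contains((11.0, 2.0, -0.8)))
```

With `yaw = math.pi / 2` the 4.0 m length of the example car runs along the y axis, so a point 1.9 m from the centre along y falls inside the box while a point 1.0 m away along x does not; dropping the orientation (yaw = 0) gives the opposite answers, which is why detectors must regress the heading as well as location and size.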
