Point Cloud Registration using Graph Neural Networks
A six-month survey of how graph neural networks are reshaping point cloud registration for autonomous vehicle localization, mapping the landscape from classical ICP to modern deep learning approaches across 50+ papers and 7 benchmark datasets.
The Question
Self-driving cars need to know exactly where they are. GPS alone is not precise enough. The standard approach: take a 3D LiDAR scan of the surroundings and align it against a pre-built HD map. This alignment is point cloud registration, a fundamental problem in computer vision: finding the rigid transformation (rotation and translation) that maps one 3D point cloud onto another.
Classical methods like Iterative Closest Point (ICP) work, but have a fundamental weakness: they get stuck in local minima and require a good initial alignment. GPS typically supplies that starting guess, and when the guess is off, as it often is in urban canyons, ICP fails. The question becomes whether graph neural networks offer a better path. By modeling point clouds as graphs and learning both vertex features and edge relationships, GNNs can capture geometric structure that purely point-based or iterative methods miss.
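To make that failure mode concrete, the core ICP loop is short. Below is a minimal numpy sketch (function names and parameters are illustrative, not from any surveyed implementation): alternate brute-force nearest-neighbour matching with the closed-form Kabsch update until the error stops improving.

```python
import numpy as np

def kabsch(src, dst):
    """Closed-form rigid transform (R, t) minimizing ||R @ src_i + t - dst_i||."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)          # cross-covariance of centered clouds
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T)) # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=50, tol=1e-8):
    """Vanilla ICP: alternate nearest-neighbour matching and Kabsch updates."""
    R_total, t_total = np.eye(3), np.zeros(3)
    cur, prev_err = src.copy(), np.inf
    for _ in range(iters):
        # brute-force nearest neighbours (a KD-tree would be used in practice)
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        nn = d2.argmin(axis=1)
        R, t = kabsch(cur, dst[nn])
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        err = d2[np.arange(len(cur)), nn].mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total
```

Because the correspondences are recomputed from whatever the current alignment happens to be, a poor initial guess locks the loop onto wrong matches, which is exactly the local-minimum failure described above.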
The Approach
Reviewed 50+ papers to build a comprehensive taxonomy of point cloud registration methods. The field splits into two broad categories: same-source registration (both point clouds from the same sensor type) and cross-source registration (different sensors or modalities). Same-source methods further divide into optimization-based approaches (ICP, graph matching, Gaussian mixture models, semi-definite programming), feature-learning methods (volumetric and point cloud descriptors), and end-to-end learning (regression and neural-network-guided optimization).
Key methods analyzed in depth: ICP and its variants, Coherent Point Drift (CPD), graph matching approaches (RGM), Gaussian mixture models (DeepGMR), Deep Global Registration (DGR), and correspondence-less methods (DeepCLR). Each represents a distinct philosophy for solving the alignment problem, from iterative point matching to probabilistic distribution alignment to learned geometric descriptors.
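The probabilistic family (CPD, DeepGMR) replaces hard matches with distribution alignment. The toy sketch below illustrates that idea rather than either published algorithm: Gaussian responsibilities act as soft correspondences, a weighted Kabsch step updates the pose, and annealing the bandwidth gradually hardens the assignment (all names, parameters, and the annealing schedule are assumptions).

```python
import numpy as np

def soft_assign(src, dst, sigma):
    """E-step: Gaussian responsibility of each dst point for each src point."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma**2))
    return w / w.sum(axis=1, keepdims=True)   # rows sum to 1

def soft_kabsch(src, dst, P):
    """M-step: rigid transform aligning src to its soft targets P @ dst."""
    tgt = P @ dst                             # expected correspondence per point
    cs, ct = src.mean(axis=0), tgt.mean(axis=0)
    H = (src - cs).T @ (tgt - ct)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, ct - R @ cs

def gmm_align(src, dst, iters=40, sigma=0.3, anneal=0.85):
    """Alternate soft assignment and pose update while shrinking the bandwidth."""
    R_tot, t_tot, cur = np.eye(3), np.zeros(3), src.copy()
    for _ in range(iters):
        P = soft_assign(cur, dst, sigma)
        R, t = soft_kabsch(cur, dst, P)
        cur = cur @ R.T + t
        R_tot, t_tot = R @ R_tot, R @ t_tot + t
        sigma = max(sigma * anneal, 0.02)     # floor keeps weights well-defined
    return R_tot, t_tot
```

Averaging over all plausible matches instead of committing to the nearest point is what buys these methods their robustness to noise and outliers, at the computational cost noted in the results table.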
The second phase narrowed focus to how point cloud registration applies to vehicle localization in autonomous driving. The driving pipeline has distinct stages: sensing, perception, prediction, planning, and control, with map creation and localization running as parallel components. For localization specifically, the available sensor modalities include cameras, IMUs (prone to drift over time), GNSS (requires clear sky, limited in urban canyons), and LiDAR (most precise for 3D spatial data). Practical systems combine multiple sources: visual-inertial, visual-laser, and radar-inertial fusion.
Analyzed seven major autonomous driving benchmark datasets across their sensor coverage, annotation quality, and adoption. Cityscapes leads in total citations (7,840+), but KITTI remains the standard benchmark for point cloud registration and odometry tasks. Across the datasets surveyed, 79% include point cloud maps and 71% include odometry information, confirming that LiDAR-based 3D data is central to the field. The top-performing methods on the KITTI odometry leaderboard (MULLS, TVL-SLAM+, CT-ICP) all operate on Velodyne LiDAR data.
The final phase involved working with the Robot Operating System (ROS) in the context of motion planning. Learned ROS from scratch to gain practical experience with the middleware that connects perception, localization, and planning in real robotic systems. This grounded the theoretical survey in the engineering realities of deploying point cloud registration on actual autonomous platforms.
Results
Localization methods comparison
| Method | Approach | Key Limitation |
|---|---|---|
| ICP | Iterative closest point matching | Local minima, needs good initialization |
| CPD | Probabilistic point drift | High computational cost |
| Graph Matching | Vertex and edge correspondences | Optimization complexity |
| DeepGMR | Gaussian mixture alignment | Fixed number of Gaussians |
| DGR | Learned geometric descriptors | Training data dependency |
| DeepCLR | Correspondence-less end-to-end | Architecture complexity |
Methods span the spectrum from classical optimization to fully learned approaches. ICP remains widely used despite its initialization sensitivity.
Benchmark dataset citations
Dataset feature coverage
| Feature | Datasets With | Datasets Without |
|---|---|---|
| Point Cloud Maps | 79% | 21% |
| Odometry | 71% | 29% |
Across seven major autonomous driving datasets. The high prevalence of point cloud data confirms LiDAR as the primary modality for 3D localization research.
Autonomous driving localization pipeline
Key Findings
- Classical ICP methods remain widely used but are fundamentally limited by sensitivity to initial alignment, precisely the scenario where GPS-based initialization is imprecise and urban environments create signal occlusion.
- Graph neural networks capture both vertex features and edge relationships in point clouds, offering structural understanding that purely point-based methods miss. This graph-level reasoning is well-suited to the inherently sparse, irregular geometry of LiDAR scans.
- Feature-learning approaches represent the most active research direction: DGR extracts local geometric descriptors and matches them into correspondences before estimating the rigid transformation, while DeepCLR skips explicit correspondences and regresses the transformation end to end. Decoupling feature extraction from pose estimation improves modularity and generalization.
- GMM-based methods like DeepGMR provide noise and outlier robustness by aligning probability distributions rather than individual points, an important property for real-world LiDAR data with occlusions and varying point densities.
- KITTI remains the dominant benchmark for vehicle localization, but Cityscapes leads in total citations (7,840+). Across the seven major datasets surveyed, 79% include point cloud maps, confirming LiDAR as the central modality for 3D localization research.
- The best-performing methods on the KITTI odometry leaderboard (MULLS, TVL-SLAM+, CT-ICP) all use Velodyne LiDAR data, reinforcing that precise 3D localization in autonomous driving is currently a LiDAR-first problem.
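For the GNN methods discussed above, the usual first step is turning a raw scan into a graph. Below is a hedged sketch of k-NN graph construction with relative-displacement edge features, the typical input representation; the function name and the choice of edge feature are illustrative, not taken from any surveyed method.

```python
import numpy as np

def knn_graph(points, k=8):
    """Build a directed k-nearest-neighbour graph over a point cloud.

    Returns (edges, edge_feats): edges is an (N*k, 2) array of [src, dst]
    index pairs; edge_feats holds the relative displacement vector per edge,
    the kind of geometric edge attribute a GNN aggregates alongside vertex
    features.
    """
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)            # exclude self-loops
    nbrs = np.argsort(d2, axis=1)[:, :k]    # k nearest neighbours per point
    src = np.repeat(np.arange(n), k)
    dst = nbrs.ravel()
    edges = np.stack([src, dst], axis=1)
    edge_feats = points[dst] - points[src]  # relative geometry per edge
    return edges, edge_feats
```

In practice a KD-tree replaces the O(N²) distance matrix, and edge features are often enriched with distances or estimated normals, but the graph structure itself is this simple: sparse, irregular, and local, matching the geometry of a LiDAR scan.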