
Private 5G Networks for Real-Time Object Detection in Autonomous Systems

Master's Thesis · Technical University of Munich · July 2024

Benchmarking private 5G against Wi-Fi for real-time object detection on an autonomous forklift. 63,780 data points later, the faster network wasn't the more reliable one.


The Question

Autonomous systems in industrial settings need real-time video and object detection. They currently rely on Wi-Fi, but private 5G promises dedicated bandwidth, lower interference, and industrial-grade reliability. The question: does 5G actually deliver for real-time computer vision workloads?

This thesis benchmarks a private 5G network (Ericsson infrastructure at Telefonica Germany) against enterprise Wi-Fi for live video streaming with object detection, tested on an autonomous forklift in TUM's industrial testing hall. The results were presented to Telefonica's supervisory board.


The System

Streaming Pipeline

A camera on the forklift captures live video, streams it via RTMP to a media server (Oven Media Engine), which converts it to WebRTC for sub-second delivery. The browser-based interface receives the stream and runs object detection locally, eliminating a server round-trip. Five detection models were tested: COCO-SSD, YOLOv5s, YOLOv8s, YOLOv8x, and YOLOv9c.

S21 Camera → RTMP → 5G / Wi-Fi → Media Server (OME) → WebRTC → Browser UI
Object detection runs in the browser, no round-trip

Two Phases: Benchmark + Live Demo

Phase one: a controlled benchmark. 232 videos streamed through both networks, processed by all five models, producing 63,780 data points over 63 hours and 57 minutes. Every frame measured for latency, jitter, throughput, packet loss, and detection accuracy.

Phase two: a real-world demonstration. An autonomous forklift with a 360-degree camera at TUM's testing hall, remotely operated from the O2 Tower (Telefonica HQ) using a VR headset and Xbox controller, with live object detection over private 5G.


The Benchmark

Scale: 232 videos from the YouTube-BoundingBoxes dataset, combined into 6 hours and 23 minutes of 1080p footage. Each video streamed through both 5G and Wi-Fi, analyzed by five detection models. 31,920 Wi-Fi data points and 31,860 5G data points across 21 object classes.

Infrastructure: Ericsson private 5G (Radio Dot 4479, IRU 8846, Router 6675) at two locations: the O2 Tower and TUM's testing hall. Samsung Galaxy S21 as streaming source. Metrics captured per frame: jitter, latency, throughput, packet loss, IoU, confidence score, and inference time.

Models: COCO-SSD, YOLOv5s, YOLOv8s, YOLOv8x, and YOLOv9c, ranging from lightweight to state-of-the-art. All running in-browser via TensorFlow.js and ONNX Runtime Web.
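
IoU (intersection over union), one of the per-frame metrics listed above, scores how well a predicted bounding box overlaps the ground-truth box: 0 means no overlap and 1 means a perfect match. A minimal Python sketch of the standard definition follows. The `Box` type and the corner-coordinate layout are assumptions for illustration and are not taken from the thesis implementation.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box given by its corners (hypothetical layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def box_area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero area."""
    width = max(0.0, box.x_max - box.x_min)
    height = max(0.0, box.y_max - box.y_min)
    return width * height


def iou(predicted: Box, truth: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap rectangle, clamped at zero
    # so that disjoint boxes produce an empty intersection.
    inter_w = max(0.0, min(predicted.x_max, truth.x_max) - max(predicted.x_min, truth.x_min))
    inter_h = max(0.0, min(predicted.y_max, truth.y_max) - max(predicted.y_min, truth.y_min))
    intersection = inter_w * inter_h
    union = box_area(predicted) + box_area(truth) - intersection
    return intersection / union if union > 0 else 0.0
```

Two 2x2 boxes offset by one unit in each direction share an area of 1 out of a union of 7, so `iou(Box(0, 0, 2, 2), Box(1, 1, 3, 3))` is 1/7, roughly 0.14.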


Results

Both networks achieved statistically identical detection accuracy: mean IoU of 0.646 (Wi-Fi) vs 0.645 (5G), with no significant difference (p > 0.05). Zero packet loss on both networks across the entire 63-hour test. The difference wasn't in accuracy. It was in consistency.

Metric              | Wi-Fi     | 5G
Mean Latency        | 3.6 ms    | 13.5 ms
Max Latency         | 160.5 ms  | 29.5 ms
Mean Jitter         | 9.0 ms    | 15.7 ms
Max Jitter          | 53.0 ms   | 27.0 ms
Peak Throughput     | 12.9 Mbps | 23.5 Mbps
Packet Loss         | 0.0%      | 0.0%
Mean IoU            | 0.646     | 0.645
Mean Inference Time | 31.7 ms   | 31.7 ms

Network comparison across 63,780 data points. 5G outperforms Wi-Fi on max latency, max jitter, and peak throughput.
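
This summary does not name the statistical test behind the p > 0.05 result for IoU. As one illustrative way to check whether two sets of per-frame IoU values differ in their means, the sketch below runs a two-sided permutation test using only the standard library. The function name, the resample count, and the sample values are all hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the share of random relabelings whose absolute mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Made-up IoU samples, not benchmark data.
wifi_iou = [0.62, 0.66, 0.65, 0.64, 0.67, 0.63]
fiveg_iou = [0.63, 0.65, 0.66, 0.64, 0.66, 0.62]
p_value = permutation_p_value(wifi_iou, fiveg_iou)
```

A p-value above the chosen threshold (0.05 in the results above) means the data give no evidence that the two means differ, which is the pattern reported for Wi-Fi and 5G.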

Latency consistency

Wi-Fi: avg 3.6 ms, max 160.5 ms
5G: avg 13.5 ms, max 29.5 ms

Shaded region shows the full latency range. Vertical line marks the average. Wi-Fi averages lower but spikes to 160 ms. 5G stays within a narrow 22 ms window.
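
To make the distinction between average and worst case concrete, the sketch below summarizes a list of latency samples by mean, maximum, range, and the share of samples over a deadline. The sample values and the 50 ms budget are made up for illustration and are not measurements from the benchmark.

```python
from statistics import mean


def latency_summary(samples_ms, budget_ms=50.0):
    """Summarize latency samples; budget_ms is a hypothetical deadline."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    over_budget = sum(1 for s in samples_ms if s > budget_ms)
    return {
        "mean_ms": mean(samples_ms),
        "max_ms": max(samples_ms),
        "range_ms": max(samples_ms) - min(samples_ms),
        "over_budget": over_budget / len(samples_ms),
    }


# Hypothetical traces: one mostly fast with a rare spike, one slower but steady.
spiky = [1.0] * 9 + [100.0]
steady = [12.0, 13.0, 14.0, 13.0]

spiky_stats = latency_summary(spiky)
steady_stats = latency_summary(steady)
```

In this toy data the spiky trace has the lower mean but a far wider range and a nonzero share of missed deadlines, which mirrors the Wi-Fi versus 5G pattern described above.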

Detection model performance (both networks combined)

YOLOv5s: F1 0.858
YOLOv8s: F1 0.857
COCO-SSD: F1 0.856
YOLOv9c: F1 0.852
YOLOv8x: F1 0.849

Sorted by F1 score. YOLOv5s (highlighted), the simplest YOLO variant, outperformed the larger YOLOv8x and newer YOLOv9c.
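
For reference, F1 is the harmonic mean of precision and recall. The sketch below computes all three from true-positive, false-positive, and false-negative counts. How the thesis decided whether a detection counts as a true positive (for example, via an IoU threshold) is not stated here, so the function name and any counts passed to it are placeholders.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from detection counts.

    tp: detections matched to a ground-truth object
    fp: detections with no matching ground-truth object
    fn: ground-truth objects that were not detected
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With 8 true positives, 2 false positives, and 2 false negatives, precision and recall are both 0.8, so F1 is also 0.8.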


Key Findings

  1. Private 5G and Wi-Fi achieve statistically identical detection accuracy, but 5G is far more consistent: its latency stays within a 22 ms range while Wi-Fi spikes to over 160 ms.
  2. For autonomous systems, predictability matters more than averages. A forklift can handle a steady 13 ms delay; it cannot handle unpredictable 160 ms spikes.
  3. 5G delivers about 1.8 times the peak throughput of Wi-Fi (23.5 vs 12.9 Mbps), making it better suited for data-intensive workloads with unpredictable bandwidth demands.
  4. Both networks achieved zero packet loss across 63+ hours of continuous streaming, confirming that modern wireless infrastructure is reliable at the transport layer.
  5. YOLOv5s, the simplest YOLO variant tested, outperformed all others including the 5x larger YOLOv8x. In real-time streaming, simpler models win because they maintain higher frame rates and avoid inference bottlenecks.
  6. Running object detection in the browser eliminates a server round-trip, reducing end-to-end latency. This architectural choice matters more than the network choice for overall system responsiveness.

Throughput has a significant positive effect on detection accuracy (p = 0.0025), and latency has a significant negative effect on confidence scores (p = 0.0005). But neither network's latency or jitter alone significantly affects IoU. The detection models are more resilient to network conditions than expected.