
Convolutional Neural Networks and Extreme Learning Machines on Mobile Devices

Bachelor's Thesis · Technical University of Munich · May 2021

Comparing three machine learning approaches for image classification on a Raspberry Pi. The simplest method sometimes wins.


The Question

Neural networks typically train on powerful GPUs. But what about devices at the edge: IoT sensors, embedded systems, single-board computers? Training on-device eliminates cloud dependency, reduces latency, and keeps data private.

This thesis asks whether three fundamentally different ML architectures can train on a Raspberry Pi 4 (a $35 ARM-based computer with 2GB of RAM and no GPU) and compares their accuracy-vs-speed tradeoffs on two image classification benchmarks.


Three Approaches

CNN: The Standard

Two convolutional layers with 5x5 filters extract spatial features, followed by max pooling, a fully connected layer with 128 neurons, and softmax output. The entire network is trained end-to-end through backpropagation: iterative gradient descent that adjusts all 105,634 parameters over multiple passes through the data.

[Architecture diagram: 28×28 input → Conv 1 → Conv 2 → Max Pool → Flatten → Dense (128) → Softmax (10)]
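The 105,634 figure can be checked by hand. One configuration consistent with the total is 8 filters in each convolutional layer with 'valid' 5×5 convolutions; the filter counts are an assumption, as the summary above does not state them:

```python
# Sanity check of the CNN parameter count, assuming 8 filters per
# convolutional layer (not stated above) and 'valid' 5x5 convolutions.

def conv_params(k, c_in, c_out):
    """Weights plus biases for a k x k convolution layer."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases for a fully connected layer."""
    return n_in * n_out + n_out

conv1 = conv_params(5, 1, 8)     # 28x28x1 -> 24x24x8
conv2 = conv_params(5, 8, 8)     # 24x24x8 -> 20x20x8
# 2x2 max pooling: 20x20x8 -> 10x10x8, no trainable parameters
flat  = 10 * 10 * 8              # 800 features after flattening
fc    = dense_params(flat, 128)  # dense layer, 128 neurons
out   = dense_params(128, 10)    # softmax output, 10 classes

total = conv1 + conv2 + fc + out
print(total)  # 105634
```

Note that the fully connected layer alone accounts for over 97% of the parameters, which is why replacing it (as the hybrid below does) changes the training cost so much.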
ELM: The Underdog

Extreme Learning Machines take a counterintuitive approach: the hidden layer weights are assigned randomly and never updated. Only the output weights are learned, not through iteration, but by solving a single linear equation (Moore-Penrose pseudoinverse). No backpropagation, no gradient descent, no training epochs. The entire "training" step is one matrix operation. Tested with both single-hidden-layer (up to 8,192 neurons) and two-hidden-layer variants.

[Architecture diagram: 784 input → hidden layer (random weights, never updated) → output (analytical solve, one matrix operation) → 10 classes]
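The entire training step fits in a few lines of NumPy. This is a minimal sketch (the function names, the tanh activation, and the tiny synthetic dataset are illustrative, not taken from the thesis):

```python
import numpy as np

rng = np.random.default_rng(42)

def elm_train(X, Y, n_hidden=4096):
    """One-shot ELM training: random hidden layer, analytical output solve."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # random, never updated
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)           # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y     # Moore-Penrose pseudoinverse: one solve
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Tiny demo: 20 random 784-dim "images", 10 classes, one-hot targets.
X = rng.standard_normal((20, 784))
labels = rng.integers(0, 10, size=20)
Y = np.eye(10)[labels]

W, b, beta = elm_train(X, Y, n_hidden=64)
pred = elm_predict(X, W, b, beta).argmax(axis=1)
```

With more hidden neurons than training samples, the pseudoinverse interpolates the targets exactly; on real data, regularizing the solve (e.g. ridge regression instead of a plain pseudoinverse) is a common refinement.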
Hybrid: Best of Both?

Uses CNN convolutional layers for feature extraction (they excel at finding spatial patterns in images), then replaces the fully connected classification layers with an ELM. This combines CNN's learned features with ELM's fast analytical training, eliminating backpropagation from the classification stage.

[Architecture diagram: input → Conv 1 → Conv 2 → Pool → Flatten (CNN feature extractor) → ELM analytical classifier → 10 classes]

The Setup

Platforms: A desktop computer (Intel i7, Ubuntu) as baseline. A Raspberry Pi 4 Model B with 2GB RAM, ARM Cortex-A72 at 1.5 GHz, no GPU. And the same Pi overclocked to 2.0 GHz with a heatsink and fan. To fit larger models in 2GB of RAM, a 4GB ZRAM swap device (compressed in RAM using LZ4) was configured.
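A typical way to set up such a swap device on Raspberry Pi OS looks like the following; the exact commands and priority value are illustrative, since the summary only states the 4GB size and LZ4 algorithm:

```shell
# Create a 4 GB compressed swap device backed by RAM (ZRAM).
sudo modprobe zram
sudo zramctl --find --size 4G --algorithm lz4   # allocates e.g. /dev/zram0
sudo mkswap /dev/zram0
sudo swapon --priority 100 /dev/zram0           # prefer ZRAM over disk swap
swapon --show                                   # verify the device is active
```

Because LZ4 trades compression ratio for speed, swapping to ZRAM is far cheaper than swapping to the Pi's SD card, which is what makes the 8,192-neuron ELM feasible in 2GB of physical RAM.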

Datasets: MNIST (handwritten digits, 70K images, the classic benchmark) and Fashion MNIST (clothing items from Zalando, same dimensions, significantly harder). MNIST is nearly solved by modern methods; Fashion MNIST tests whether results hold on a more challenging task.

Tools: Python, TensorFlow 2.2, Keras, NumPy, SciPy, Scikit-learn. All training on CPU to ensure a fair comparison across platforms.


Results

On Fashion MNIST, the harder dataset, a single-layer ELM with 4,096 hidden neurons achieved 86.2% accuracy in 63 seconds. The CNN, after 4 training epochs, reached 85.7% in 77 seconds. The method with no backpropagation was both faster and more accurate.

Method                               Accuracy   Training Time
CNN (2 epochs, 128 neurons)          85.08%     44 s
CNN (4 epochs, 128 neurons)          85.72%     77 s
ELM, single layer (4,096 neurons)    86.17%     63 s
ELM, single layer (8,192 neurons)    87.23%     252 s
Hybrid: CNN + single-layer ELM       86.11%     95 s
Hybrid: CNN + two-layer ELM          84.63%     222 s

Fashion MNIST, desktop platform; each figure is the mean of 5 consecutive runs.

[Chart: training time comparison on Fashion MNIST, sorted ascending: CNN 2 epochs (44s, 85.1%), ELM 4,096 (63s, 86.2%), CNN 4 epochs (77s, 85.7%), Hybrid (95s, 86.1%), ELM 8,192 (252s, 87.2%). The highlighted ELM with 4,096 neurons achieves the best accuracy-to-speed ratio.]

On MNIST, the CNN held its accuracy advantage (97.9% vs 96.5% for ELM with 8,192 neurons), though ELM still trained faster for comparable configurations.

On the Raspberry Pi, training times increased roughly 3x across all methods, while accuracy remained within 1% of desktop results. Overclocking the CPU from 1.5 to 2.0 GHz reduced training times by approximately 17% with no effect on accuracy.
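On the Pi 4, such an overclock is typically applied via /boot/config.txt. The values below are the commonly used settings for 2.0 GHz, assumed rather than taken from the thesis:

```ini
# /boot/config.txt -- overclock a Raspberry Pi 4 to 2.0 GHz.
# Requires adequate cooling (heatsink and fan) and recent firmware;
# the firmware throttles the clock back if the SoC exceeds 80-85 C.
over_voltage=6      # raise the core voltage to keep the higher clock stable
arm_freq=2000       # CPU clock in MHz (stock is 1500)
```

A reboot is required for the settings to take effect; the ~17% speedup reported above roughly tracks the fraction of training time that is CPU-bound.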


Key Findings

  1. Single-layer ELMs are a viable, faster alternative to CNNs for image classification, particularly on more complex datasets like Fashion MNIST.
  2. The absence of backpropagation makes ELMs especially attractive for resource-constrained devices where training time and power consumption matter.
  3. CNN, ELM, and hybrid all successfully train and classify on a Raspberry Pi. Edge ML is feasible without a GPU.
  4. Overclocking provides meaningful speedup (~17%) with no accuracy cost, and ZRAM enables larger models within tight memory constraints.
  5. Two-layer ELMs and the two-layer hybrid showed diminishing returns: higher complexity for worse accuracy-per-second performance.

CNNs benefit from years of optimization work in frameworks like TensorFlow. ELMs, with comparatively little framework support, still competed on accuracy while training faster. With similar optimization investment, ELMs could yield even stronger results on edge devices.