Convolutional Neural Networks and Extreme Learning Machines on Mobile Devices
Comparing three machine learning approaches for image classification on a Raspberry Pi. The simplest method sometimes wins.
The Question
Neural networks typically train on powerful GPUs. But what about devices at the edge? IoT sensors, embedded systems, single-board computers. Training on-device eliminates cloud dependency, reduces latency, and keeps data private.
This thesis asks whether three fundamentally different ML architectures can train on a Raspberry Pi 4 (a $35 ARM-based computer with 2GB of RAM and no GPU) and compares their accuracy-vs-speed tradeoffs on two image classification benchmarks.
Three Approaches
The convolutional neural network (CNN) stacks two convolutional layers with 5x5 filters to extract spatial features, followed by max pooling, a fully connected layer with 128 neurons, and a softmax output. The entire network is trained end-to-end through backpropagation: iterative gradient descent that adjusts all 105,634 parameters over multiple passes through the data.
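The architecture can be sketched in Keras. The per-layer filter counts are not stated above, so the 8 filters per layer here are an assumption; with that choice, the parameter total matches the quoted 105,634.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Sketch of the described architecture. The filter counts (8 per conv
# layer) are an assumption; with them the parameters sum to 105,634:
# 208 (conv1) + 1,608 (conv2) + 102,528 (dense) + 1,290 (softmax).
model = models.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(8, (5, 5), activation="relu"),   # 28x28 -> 24x24
    layers.Conv2D(8, (5, 5), activation="relu"),   # 24x24 -> 20x20
    layers.MaxPooling2D((2, 2)),                   # 20x20 -> 10x10
    layers.Flatten(),                              # 800 features
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
print(model.count_params())  # 105634 with this filter configuration
```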
Extreme Learning Machines take a counterintuitive approach: the hidden layer weights are assigned randomly and never updated. Only the output weights are learned, not through iteration, but by solving a single linear equation (Moore-Penrose pseudoinverse). No backpropagation, no gradient descent, no training epochs. The entire "training" step is one matrix operation. Tested with both single-hidden-layer (up to 8,192 neurons) and two-hidden-layer variants.
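The whole training procedure fits in a few lines of NumPy. This is a minimal sketch (the tanh activation and Gaussian initialization are assumptions, and it runs on a small synthetic problem rather than MNIST), just to show that "training" really is one matrix operation:

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, Y, n_hidden):
    """Single-hidden-layer ELM: hidden weights stay random; only the
    output weights are computed, via one pseudoinverse solve."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # never updated
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)               # hidden-layer activations
    beta = np.linalg.pinv(H) @ Y         # Moore-Penrose "training" step
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Synthetic stand-in for flattened images: 200 samples, 20 features,
# two linearly separable classes (real use: 784-d MNIST vectors,
# one-hot labels, thousands of hidden neurons).
X = rng.standard_normal((200, 20))
labels = (X[:, 0] > 0).astype(int)
Y = np.eye(2)[labels]
W, b, beta = elm_train(X, Y, n_hidden=256)
pred = elm_predict(X, W, b, beta)
acc = (pred.argmax(axis=1) == labels).mean()
```

With more hidden neurons than training samples, the least-squares solution can fit the training set exactly; generalization depends on how the hidden-layer size relates to the data, which is why the thesis sweeps sizes up to 8,192.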
The hybrid approach uses the CNN's convolutional layers for feature extraction (they excel at finding spatial patterns in images), then replaces the fully connected classification layers with an ELM. This combines the CNN's learned features with the ELM's fast analytical training, eliminating backpropagation from the classification stage.
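A rough sketch of the hybrid wiring. The layer shapes and hidden-neuron count are illustrative assumptions, and an untrained extractor with random data stands in for the real pipeline, purely to show the data flow; in the thesis the conv layers come from a trained CNN whose dense head is discarded.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

rng = np.random.default_rng(0)

# Convolutional feature extractor (untrained here; the thesis reuses
# a trained CNN's conv layers and drops its dense classification head).
extractor = models.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(8, (5, 5), activation="relu"),
    layers.Conv2D(8, (5, 5), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
])

X = rng.standard_normal((100, 28, 28, 1)).astype("float32")
Y = np.eye(10)[rng.integers(0, 10, 100)]     # one-hot labels

# ELM head in place of the dense layers: random hidden projection,
# output weights from a single pseudoinverse solve.
F = extractor.predict(X, verbose=0)          # conv features, (100, 800)
F = F / (np.abs(F).max() + 1e-8)             # keep tanh out of saturation
W = rng.standard_normal((F.shape[1], 512))
b = rng.standard_normal(512)
beta = np.linalg.pinv(np.tanh(F @ W + b)) @ Y
pred = np.tanh(F @ W + b) @ beta
```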
The Setup
Platforms: A desktop computer (Intel i7, Ubuntu) as baseline. A Raspberry Pi 4 Model B with 2GB RAM, ARM Cortex-A72 at 1.5 GHz, no GPU. And the same Pi overclocked to 2.0 GHz with a heatsink and fan. To fit larger models in 2GB of RAM, a 4GB ZRAM swap partition was configured using LZ4 compression.
Datasets: MNIST (handwritten digits, 70K images, the classic benchmark) and Fashion MNIST (clothing items from Zalando, same dimensions, significantly harder). MNIST is nearly solved by modern methods; Fashion MNIST tests whether results hold on a more challenging task.
Tools: Python, TensorFlow 2.2, Keras, NumPy, SciPy, Scikit-learn. All training on CPU to ensure a fair comparison across platforms.
Results
On Fashion MNIST, the harder dataset, a single-layer ELM with 4,096 hidden neurons achieved 86.2% accuracy in 63 seconds. The CNN, after 4 training epochs, reached 85.7% in 77 seconds. The method with no backpropagation was both faster and more accurate.
| Method | Accuracy | Training Time |
|---|---|---|
| CNN (2 epochs, 128 neurons) | 85.08% | 44s |
| CNN (4 epochs, 128 neurons) | 85.72% | 77s |
| ELM single-layer (4,096) | 86.17% | 63s |
| ELM single-layer (8,192) | 87.23% | 252s |
| Hybrid CNN + single-layer ELM | 86.11% | 95s |
| Hybrid CNN + two-layer ELM | 84.63% | 222s |
Fashion MNIST, desktop platform, mean of 5 consecutive experiments.
[Figure: training time comparison, Fashion MNIST]
On MNIST, the CNN held its accuracy advantage (97.9% vs 96.5% for ELM with 8,192 neurons), though ELM still trained faster for comparable configurations.
On the Raspberry Pi, training times increased roughly 3x across all methods, while accuracy remained within 1% of desktop results. Overclocking the CPU from 1.5 to 2.0 GHz reduced training times by approximately 17% with no effect on accuracy.
Key Findings
- Single-layer ELMs are a viable, faster alternative to CNNs for image classification, particularly on more complex datasets like Fashion MNIST.
- The absence of backpropagation makes ELMs especially attractive for resource-constrained devices where training time and power consumption matter.
- CNN, ELM, and hybrid all successfully train and classify on a Raspberry Pi. Edge ML is feasible without a GPU.
- Overclocking provides meaningful speedup (~17%) with no accuracy cost, and ZRAM enables larger models within tight memory constraints.
- Two-layer ELMs and the two-layer hybrid showed diminishing returns: higher complexity for worse accuracy-per-second performance.
CNN models benefit from highly optimized frameworks built by the TensorFlow team. ELMs, with comparatively little framework support, still competed on accuracy while training faster. With similar optimization investment, ELMs could yield even stronger results on edge devices.