Vehicle Classification under Extreme Imbalance: A Comparative Study of Ensemble Learning and CNNs
This work addresses the degraded performance that class imbalance causes in intelligent transportation systems, but its contribution is incremental: it primarily benchmarks existing methods on a new dataset.
The study tackles vehicle classification under severe class imbalance by comparing ensemble learning methods with CNNs on a curated 16-class dataset. A CNN achieves up to 81.25% accuracy on unseen data, though rare classes such as Barge remain problematic.
Accurate vehicle type recognition underpins intelligent transportation and logistics, but severe class imbalance in public datasets suppresses performance on rare categories. We curate a 16-class corpus (~47k images) by merging Kaggle, ImageNet, and web-crawled data, and create six balanced variants via SMOTE oversampling and targeted undersampling. Lightweight ensembles, such as Random Forest, AdaBoost, and a soft-voting combiner built on MobileNet-V2 features, are benchmarked against a configurable ResNet-style CNN trained with strong augmentation and label smoothing. The best ensemble (SMOTE-combined) attains 74.8% test accuracy, while the CNN achieves 79.19% on the full test set and 81.25% on an unseen inference batch, confirming the advantage of deep models. Nonetheless, the most under-represented class (Barge) remains a failure mode, highlighting the limits of rebalancing alone. The results suggest prioritizing additional minority-class data collection and cost-sensitive objectives (e.g., focal loss), and exploring hybrid ensemble-CNN pipelines that combine interpretability with representational power.
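The abstract names focal loss as one cost-sensitive objective worth trying for rare classes such as Barge. The sketch below illustrates the general idea only; it is not the authors' implementation, and the study itself does not report focal-loss results. It uses the standard form FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The function names, the default gamma = 2.0, and the optional per-class alpha weights are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for a single example.

    logits: list of raw class scores
    target: index of the true class
    gamma:  focusing parameter (assumed default 2.0); gamma = 0 gives cross-entropy
    alpha:  optional list of per-class weights (hypothetical, e.g. larger for rare classes)
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    weight = 1.0 if alpha is None else alpha[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(logits, target):
    """Plain cross-entropy, expressed as the gamma = 0 case of focal loss."""
    return focal_loss(logits, target, gamma=0.0)


if __name__ == "__main__":
    easy = [4.0, 0.0, 0.0]  # true class 0 is predicted confidently
    hard = [0.0, 1.0, 0.0]  # true class 0 is predicted poorly
    for name, logits in (("easy", easy), ("hard", hard)):
        ce = cross_entropy(logits, 0)
        fl = focal_loss(logits, 0)
        print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f} ratio={fl / ce:.4f}")
```

Because the factor (1 - p_t)^gamma shrinks toward zero for confidently correct predictions, well-classified examples from frequent classes contribute little to the total loss, while misclassified examples keep most of their weight. In a real training loop this would normally be written with the deep learning framework's tensor operations rather than pure Python.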