Thiwanka Pathirana

8.1 · LG · May 20

Memory-Efficient Partitioned DNN Inference on Resource-Constrained Android Crowds

Lakshani Manamperi, Disumi Pathirana, Thiwanka Pathirana et al.

Deploying large deep neural networks on memory-constrained mobile devices is a central challenge in edge ML. Although compression, pruning, and quantization reduce model footprint, transformer-based models remain too large for the 3.3–7.4 GB RAM envelope of commodity Android handsets. We present the DNN pipeline scheduling subsystem of CROWDio, which achieves practical ONNX inference across resource-constrained Android workers without modifying the model. It does so by distributing memory pressure across devices through five mechanisms: JIT deferred partition loading, a single-partition-resident constraint, a 4-tier affinity scheduler, zlib-compressed tensor transport, and a streaming 1:1 dependency model. Evaluated on DistilBERT (Sanh et al., 2019; approximately 67M parameters) on SST-2 across five Android handsets over ten runs, our system keeps peak per-device RSS at 43 ± 2 MB and limits battery draw to 50 ± 3 mAh per run, while streaming concurrency reduces batch latency by 34% relative to barrier synchronisation.
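The zlib-compressed tensor transport named above can be illustrated with a minimal Python sketch, assuming that activations travel between partitions as row-major float32 arrays. The wire format (a little-endian uint32 rank, then the dimensions, then a zlib-compressed float32 payload) and the helper names pack_tensor and unpack_tensor are hypothetical illustrations, not CROWDio's actual protocol; only the standard-library struct, zlib, and math modules are used.

```python
import math
import struct
import zlib


def pack_tensor(shape, values, level=6):
    """Encode a float32 tensor as a small header plus a zlib-compressed payload.

    Hypothetical wire format (an assumption for illustration only):
      little-endian uint32 ndim, then ndim little-endian uint32 dimensions,
      then zlib(little-endian float32 values in row-major order).
    """
    count = math.prod(shape)
    if count != len(values):
        raise ValueError(f"shape {tuple(shape)} implies {count} values, got {len(values)}")
    header = struct.pack(f"<I{len(shape)}I", len(shape), *shape)
    raw = struct.pack(f"<{count}f", *values)  # values are rounded to float32 here
    return header + zlib.compress(raw, level)


def unpack_tensor(blob):
    """Invert pack_tensor and return (shape, values) with values as Python floats."""
    (ndim,) = struct.unpack_from("<I", blob, 0)
    shape = tuple(struct.unpack_from(f"<{ndim}I", blob, 4))
    raw = zlib.decompress(blob[4 + 4 * ndim:])
    count = len(raw) // 4
    # Reject payloads whose size disagrees with the header shape.
    if len(raw) % 4 or count != math.prod(shape):
        raise ValueError("payload size does not match header shape")
    return shape, list(struct.unpack(f"<{count}f", raw))
```

For example, pack_tensor((2, 3), [0.0, 1.5, -2.25, 0.0, 0.0, 3.0]) round-trips exactly through unpack_tensor because every value is representable in float32. How much zlib saves depends on the data: long runs of zeros shrink dramatically, whereas dense float32 hidden states usually compress much less, since their low-order mantissa bits look close to random.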

0.7 · DC · Apr 21

CROWDio: A Practical Mobile Crowd Computing Framework with Developer-Oriented Design, Adaptive Scheduling, and Fault Resilience

Lakshani Manamperi, Disumi Pathirana, Thiwanka Pathirana et al.

Mobile Crowd Computing (MCdC) leverages the idle computational capacity of consumer smartphones to enable distributed task processing at scale; however, real-world adoption remains limited by the absence of developer-oriented frameworks that transparently manage device heterogeneity, fault tolerance, and connectivity volatility. This paper introduces CROWDio, a centralized MCdC platform comprising three tightly integrated subsystems: (i) a declarative SDK that reduces distributed execution to a single function annotation, eliminating the need for explicit parallelism management; (ii) a tiered checkpointing mechanism that enables fault-tolerant task resumption within the memory and execution constraints of mobile runtimes; and (iii) a pluggable multi-criteria scheduling framework, driven by continuous live device telemetry, that supports interchangeable decision strategies without modifying the dispatch core. Empirical evaluation on six heterogeneous Android devices across CPU-bound, AI/NLP inference, and data-parallel workloads shows that capability-aware adaptive scheduling reduces total execution time by up to 56.9% relative to naive round-robin dispatch, while the checkpointing subsystem incurs a bounded overhead of only 2–3 s per task regardless of checkpoint frequency. A system-wide Jain's Fairness Index of 0.889 indicates equitable and stable workload distribution across heterogeneous worker devices.
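Jain's Fairness Index, used above to summarise workload distribution, is J(x_1, ..., x_n) = (Σ x_i)^2 / (n · Σ x_i^2); it ranges from 1/n (one device receives everything) to 1 (perfectly equal shares). The minimal Python sketch below computes it. What x_i measures in the evaluation (for example, tasks completed or busy time per device) is not stated in the abstract, so treating it as a per-device count is an assumption, as is returning 1.0 for an all-zero allocation.

```python
def jains_fairness_index(allocations):
    """Jain's Fairness Index J = (sum x_i)^2 / (n * sum x_i^2).

    J is 1.0 when every device receives an equal share and 1/n when a
    single device receives everything.
    """
    xs = list(allocations)
    if not xs:
        raise ValueError("need at least one allocation")
    if any(x < 0 for x in xs):
        raise ValueError("allocations must be non-negative")
    total = sum(xs)
    sum_sq = sum(x * x for x in xs)
    if sum_sq == 0:
        # All-zero allocation: treated as perfectly equal (an assumption).
        return 1.0
    return total * total / (len(xs) * sum_sq)
```

For instance, per-device counts of [4, 2] give 36 / (2 · 20) = 0.9. Because a round-robin dispatcher hands out nearly equal task counts, it can score close to 1 on a count-based index even while fast devices sit idle, so the choice of x_i matters when comparing schedulers by fairness.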
