LG · Apr 8

Multi-Turn Reasoning LLMs for Task Offloading in Mobile Edge Computing

Ning Yang, Chuangxin Cheng, Haijun Zhang

arXiv:2604.07148 · 46.9

AI Analysis

This work addresses dynamic task offloading for resource-constrained mobile devices in MEC. Its solution improves on existing methods, although combining LLMs with simulation mechanisms is an incremental step.

The paper tackles the design of effective task offloading policies in Mobile Edge Computing (MEC). It proposes COMLLM, a generative framework that integrates Group Relative Policy Optimization (GRPO) with a Look-Ahead Collaborative Simulation (LACS) mechanism to enable foresighted decision-making. COMLLM achieves near-optimal latency and improved load-balancing fairness. It also scales zero-shot to larger network topologies without retraining.

Emerging computation-intensive applications impose stringent latency requirements on resource-constrained mobile devices. Mobile Edge Computing (MEC) addresses this challenge through task offloading. However, designing effective policies remains difficult due to dynamic task arrivals, time-varying channels, and the spatio-temporal coupling of server queues. Conventional heuristics lack adaptability, while Deep Reinforcement Learning (DRL) suffers from limited generalization and architectural rigidity, requiring retraining when network topology changes. Although Large Language Models (LLMs) offer semantic reasoning capabilities, standard Supervised Fine-Tuning (SFT) yields myopic policies that greedily minimize immediate latency without accounting for long-term system evolution. To address these limitations, we propose COMLLM, a generative framework that enables foresighted decision-making in MEC systems. COMLLM integrates Group Relative Policy Optimization (GRPO) with a Look-Ahead Collaborative Simulation (LACS) mechanism, which performs multi-step Monte Carlo rollouts while jointly modeling server queue dynamics. By incorporating these rollouts into the reward design, the framework captures the long-term impact of current decisions on future system states. Experimental results demonstrate that COMLLM achieves near-optimal latency and improved load-balancing fairness. Notably, it exhibits zero-shot topological scalability, allowing a model trained on small-scale networks to generalize to larger, unseen topologies without retraining, outperforming SFT, DRL, and heuristic baselines.
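The abstract combines two ideas. First, each candidate offloading decision is scored with multi-step Monte Carlo rollouts over server queues (LACS). Second, a group of such scores is turned into GRPO advantages. The sketch below is a toy illustration under stated assumptions and is not COMLLM itself. The following are all hypothetical:

- the system model: one device, FIFO server queues measured in seconds of pending work, and fixed CPU and uplink rates;
- the greedy base policy used inside rollouts;
- every name and parameter (`lookahead_reward`, `SLOT`, `horizon`, `rollouts`, etc.).

A fixed list of candidate actions stands in for the LLM policy.

```python
import random
import statistics
from dataclasses import dataclass

# Hypothetical MEC parameters (assumed; not the paper's system model).
SLOT = 1.0         # seconds of queued work each server drains per decision step
LOCAL_RATE = 1.0   # device CPU speed, Gcycles/s
SERVER_RATE = 4.0  # edge server CPU speed, Gcycles/s
UPLINK = 2.0       # uplink rate, MB/s


@dataclass
class Task:
    data_mb: float  # input data to upload if offloaded
    gcycles: float  # computation demand


def latency(task, action, queues):
    """Action 0 runs locally; action k >= 1 offloads to server k-1 (upload + queue wait + compute)."""
    if action == 0:
        return task.gcycles / LOCAL_RATE
    return task.data_mb / UPLINK + queues[action - 1] + task.gcycles / SERVER_RATE


def step(task, action, queues):
    """Add the task's work to the chosen server queue, then drain every queue by one slot."""
    q = list(queues)
    if action > 0:
        q[action - 1] += task.gcycles / SERVER_RATE
    return [max(0.0, b - SLOT) for b in q]


def greedy(task, queues):
    """Myopic base policy used inside rollouts: lowest immediate latency."""
    return min(range(len(queues) + 1), key=lambda a: latency(task, a, queues))


def lookahead_reward(task, action, queues, seed, horizon=3, rollouts=8):
    """Negative of immediate latency plus the mean latency of `horizon` simulated future tasks."""
    rng = random.Random(seed)  # same seed per candidate => same sampled futures
    immediate = latency(task, action, queues)
    start = step(task, action, queues)
    future = []
    for _ in range(rollouts):
        q, total = start, 0.0
        for _ in range(horizon):
            t = Task(rng.uniform(0.5, 2.0), rng.uniform(1.0, 6.0))  # hypothetical arrival model
            a = greedy(t, q)
            total += latency(t, a, q)
            q = step(t, a, q)
        future.append(total)
    return -(immediate + statistics.mean(future))


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: each reward normalized by its group's mean and std."""
    mu = statistics.mean(rewards)
    sd = statistics.pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


# Usage: score a group of candidate actions for one task, as a policy update would.
queues = [3.0, 0.5]                   # server 1 congested, server 2 nearly idle
task = Task(data_mb=1.0, gcycles=4.0)
group = [0, 1, 2]                     # candidates, e.g. sampled from the LLM policy
rewards = [lookahead_reward(task, a, queues, seed=0) for a in group]
advantages = group_relative_advantages(rewards)
```

Every candidate reuses the same seed (common random numbers), so all actions are compared against identical sampled futures. Differences in reward therefore reflect the decision rather than sampling noise. Setting `horizon=0` recovers a purely myopic reward, which plays the role of the SFT-style baseline the abstract contrasts against.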

