Steve Liu

DC
h-index: 17
Papers: 8
Citations: 65
Novelty: 48%
AI Score: 41

8 Papers

AI · Jun 23, 2023
CeBed: A Benchmark for Deep Data-Driven OFDM Channel Estimation

Amal Feriani, Di Wu, Steve Liu et al.

Deep learning has been extensively used in wireless communication problems, including channel estimation. Although several data-driven approaches exist, a fair and realistic comparison between them is difficult due to inconsistencies in the experimental conditions and the lack of a standardized experimental design. In addition, the performance of data-driven approaches is often compared based on empirical analysis. The lack of reproducibility and of standardized evaluation tools (e.g., datasets, codebases) hinders the development and progress of data-driven methods for channel estimation and wireless communication in general. In this work, we introduce an initiative to build benchmarks that unify several data-driven OFDM channel estimation approaches. Specifically, we present CeBed (a testbed for channel estimation) including different datasets covering various system models and propagation conditions along with the implementation of ten deep and traditional baselines. This benchmark considers different practical aspects such as the robustness of the data-driven models, the number and the arrangement of pilots, and the number of receive antennas. This work offers a comprehensive and unified framework to help researchers evaluate and design data-driven channel estimation algorithms.

NI · Mar 22, 2023
Communication Load Balancing via Efficient Inverse Reinforcement Learning

Abhisek Konar, Di Wu, Yi Tian Xu et al.

Communication load balancing aims to balance the load between different available resources, and thus improve the quality of service for network systems. After formulating load balancing (LB) as a Markov decision process problem, reinforcement learning (RL) has recently proven effective in addressing the LB problem. To leverage the benefits of classical RL for load balancing, however, we need an explicit reward definition. Engineering this reward function is challenging, because it requires expert knowledge and there is no general consensus on the form of an optimal reward function. In this work, we tackle the communication load balancing problem with an inverse reinforcement learning (IRL) approach. To the best of our knowledge, this is the first time IRL has been successfully applied in the field of communication load balancing. Specifically, we first infer a reward function from a set of demonstrations, and then learn a reinforcement learning load balancing policy with the inferred reward function. Compared to classical RL-based solutions, the proposed solution can be more general and more suitable for real-world scenarios. Experimental evaluations implemented on different simulated traffic scenarios have shown our method to be effective and better than other baselines by a considerable margin.

AI · Nov 1, 2023
SAGE: Smart home Agent with Grounded Execution

Dmitriy Rivkin, Francois Hogan, Amal Feriani et al.

The common sense reasoning abilities and vast general knowledge of Large Language Models (LLMs) make them a natural fit for interpreting user requests in a Smart Home assistant context. However, their lack of specific knowledge about the user and their home limits their potential impact. SAGE (Smart Home Agent with Grounded Execution) overcomes these and other limitations by using a scheme in which a user request triggers an LLM-controlled sequence of discrete actions. These actions can be used to retrieve information, interact with the user, or manipulate device states. SAGE controls this process through a dynamically constructed tree of LLM prompts, which help it decide which action to take next, whether an action was successful, and when to terminate the process. The SAGE action set augments an LLM's capabilities to support some of the most critical requirements for a Smart Home assistant. These include: flexible and scalable user preference management ("is my team playing tonight?"), access to any smart device's full functionality without device-specific code via API reading ("turn down the screen brightness on my dryer"), persistent device state monitoring ("remind me to throw out the milk when I open the fridge"), natural device references using only a photo of the room ("turn on the light on the dresser"), and more. We introduce a benchmark of 50 new and challenging smart home tasks where SAGE achieves a 75% success rate, significantly outperforming existing LLM-enabled baselines (30% success rate).

DC · Apr 7, 2025 · Code
Prima.cpp: Fast 30-70B LLM Inference on Heterogeneous and Low-Resource Home Clusters

Zonghang Li, Tao Li, Wenjiao Feng et al.

On-device inference offers privacy, offline use, and instant response, but consumer hardware restricts large language models (LLMs) to low throughput and capability. To overcome this challenge, we present prima.cpp, a distributed on-device inference system that runs 30-70B LLMs on consumer home clusters with mixed CPUs/GPUs, insufficient RAM/VRAM, slow disks, Wi-Fi links, and heterogeneous OSs. We introduce pipelined-ring parallelism (PRP) to overlap disk I/O with compute and communication, and address the prefetch-release conflict in mmap-based offloading. We further propose Halda, a heterogeneity-aware scheduler that co-optimizes per-device CPU/GPU workloads and device selection under RAM/VRAM constraints. On four consumer home devices, a 70B model reaches 674 ms/token TPOT with <6% memory pressure, and a 32B model with speculative decoding achieves 26 tokens/s. Compared with llama.cpp, exo, and dllama, our proposed prima.cpp achieves 5-17x lower TPOT, supports fine-grained model sizes from 8B to 70B, ensures broader cross-OS and quantization compatibility, and remains OOM-free, while also being Wi-Fi tolerant, privacy-preserving, and hardware-independent. The code is available at https://gitee.com/zonghang-li/prima.cpp.
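The abstract reports one result as time per output token (TPOT, in ms/token) and another as throughput (tokens/s). A minimal sketch of the conversion between the two is shown below so the figures can be read on the same scale. It assumes steady-state, single-stream decoding, where throughput is simply the reciprocal of TPOT; the function names are illustrative and not part of prima.cpp.

```python
def tpot_ms_to_tokens_per_s(tpot_ms: float) -> float:
    """Convert time per output token (ms/token) to throughput (tokens/s)."""
    return 1000.0 / tpot_ms


def tokens_per_s_to_tpot_ms(tokens_per_s: float) -> float:
    """Convert throughput (tokens/s) to time per output token (ms/token)."""
    return 1000.0 / tokens_per_s


# Figures quoted in the abstract.
rate_70b = tpot_ms_to_tokens_per_s(674.0)  # 70B model, 674 ms/token
tpot_32b = tokens_per_s_to_tpot_ms(26.0)   # 32B model with speculative decoding

print(f"70B: {rate_70b:.2f} tokens/s")  # 70B: 1.48 tokens/s
print(f"32B: {tpot_32b:.1f} ms/token")  # 32B: 38.5 ms/token
```

With speculative decoding the tokens/s figure is an average over accepted tokens, so the converted 38.5 ms/token should be read as a mean rather than a per-step latency.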

DC · Apr 29
COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training

Akhmed Sakip, Erland Hilman Fuadi, Omar Sayedelahl et al.

Training large language models requires jointly configuring two interdependent aspects of the system: the global batch size, which governs statistical efficiency, and the 3D parallelism strategy, which governs hardware throughput. Existing approaches make these decisions independently: optimization work adapts the batch size to track the evolving critical batch size while keeping parallelism fixed, and systems work selects the fastest parallelism for a given fixed batch size without anticipating that the optimal batch size could change. We show that these decisions are tightly coupled: the throughput-optimal parallelism strategy may shift as the global batch size changes, so any method that fixes one while adapting the other operates with a suboptimal configuration for part of the training run. We present COPUS, a system that adaptively tunes the global batch size, parallelism strategy, and micro-batch size as training evolves. COPUS is guided by Goodput, the product of throughput and statistical efficiency, which models both hardware and statistical effects jointly and directly measures useful convergence per unit of wall-clock time. The system combines online gradient noise scale estimation under 3D parallelism with throughput-aware evaluation of candidate configurations, and supports efficient reconfiguration of both batch size and parallelism during training. We evaluate COPUS on LLM pre-training workloads across 1-4 nodes of 8xH100 and 8xMI210 GPUs and model sizes from 3B to 32B parameters, demonstrating average time-to-convergence speedups of 3.9-8.0% over the fastest baseline across four configurations, with peak gains up to 11.1%, including system overheads.
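The coupling described in the abstract can be illustrated with a small sketch that scores candidate configurations by Goodput, defined in the abstract as throughput times statistical efficiency. The statistical-efficiency formula below is an assumption (a gradient-noise-scale form used in earlier goodput-based work), not a formula confirmed for COPUS, and the candidate layouts and throughput numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    global_batch: int   # global batch size, in samples
    parallelism: str    # hypothetical label for a 3D parallelism layout on 8 GPUs
    throughput: float   # samples per second (made-up measurement)


def statistical_efficiency(global_batch: int, noise_scale: float, base_batch: int) -> float:
    # Assumed model: training progress per sample relative to base_batch,
    # given the current gradient noise scale. Not taken from the COPUS paper.
    return (noise_scale + base_batch) / (noise_scale + global_batch)


def goodput(c: Candidate, noise_scale: float, base_batch: int) -> float:
    # Goodput = throughput x statistical efficiency (definition from the abstract).
    return c.throughput * statistical_efficiency(c.global_batch, noise_scale, base_batch)


def best_candidate(candidates, noise_scale: float, base_batch: int) -> Candidate:
    return max(candidates, key=lambda c: goodput(c, noise_scale, base_batch))


BASE_BATCH = 256
CANDIDATES = [
    Candidate(256, "tp8-pp1-dp1", 1000.0),
    Candidate(1024, "tp4-pp1-dp2", 1800.0),
    Candidate(4096, "tp2-pp1-dp4", 2400.0),
]

# Small noise scale (early training) versus large noise scale (later training).
early = best_candidate(CANDIDATES, noise_scale=200.0, base_batch=BASE_BATCH)
late = best_candidate(CANDIDATES, noise_scale=20000.0, base_batch=BASE_BATCH)
print(early.global_batch, early.parallelism)
print(late.global_batch, late.parallelism)
```

Under these made-up numbers, the small batch with the tensor-parallel-heavy layout wins while the noise scale is small, and the large batch with the data-parallel-heavy layout wins once the noise scale grows. This is the kind of shift that motivates choosing batch size and parallelism jointly.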

SP · Dec 26, 2023
Device-Free Human State Estimation using UWB Multi-Static Radios

Saria Al Laham, Bobak H. Baghi, Pierre-Yves Lajoie et al.

We present a human state estimation framework that allows us to estimate the location, and even the activities, of people in an indoor environment without requiring them to carry a specific device. To achieve this "device free" localization we use a small number of low-cost Ultra-Wide Band (UWB) sensors distributed across the environment of interest. To achieve high quality estimation from the UWB signals merely reflected off people in the environment, we exploit a deep network that can learn to make inferences. The hardware setup consists of commercial off-the-shelf (COTS) single antenna UWB modules for sensing, paired with Raspberry Pi units for computational processing and data transfer. We make use of the channel impulse response (CIR) measurements from the UWB sensors to estimate the human state (location and activity) in a given area. Additionally, we can also estimate the number of humans that occupy this region of interest. In our approach, we first pre-process the CIR data, which involves meticulous aggregation of measurements and extraction of key statistics. Afterwards, we leverage a convolutional deep neural network to map the CIRs into precise location estimates with sub-30 cm accuracy. Similarly, we achieve accurate human activity recognition and occupancy counting results. We show that we can quickly fine-tune our model for new out-of-distribution users, a process that requires only a few minutes of data and a few epochs of training. Our results show that UWB is a promising solution for adaptable smart-home localization and activity recognition problems.

DC · May 19, 2025
Learning In Chaos: Efficient Autoscaling and Self-Healing for Multi-Party Distributed Training

Wenjiao Feng, Rongxing Xiao, Zonghang Li et al.

Node and link churn in multi-party, cross-region clusters over wide-area networks (WANs) often disrupts distributed training. However, checkpoint-based recovery and cloud-centric autoscaling react slowly and assume centralized control, which is misaligned with the self-governed setup where institutions can freely join and leave. This paper proposes Chaos, a multi-party distributed training system with self-healing and autoscaling, enabling robust and elastic training under churn. It speeds up autoscaling via multi-neighbor state replication and model sharding. We formalize the sharding and assignment as a MINLP that captures WAN heterogeneity, and reduce it to a tractable MILP by analyzing its monotonicity on a divisibility chain. By establishing an equivalence, we derive a greedy algorithm that follows optimality rules and yields the optimal solution in polynomial time. Chaos uses a cluster monitor to track resource and topology changes, and handles scaling events through peer negotiation protocols, enabling fully self-governed autoscaling among institutions. Experiments show that Chaos has substantially lower scale-out delay than Pollux, Elan, and Autoscaling, and handles scale-in, connect-link, and disconnect-link events within 20ms. It also delivers the lowest idle time, showing superior resource use and scalability as the cluster grows.

CL · Jan 16, 2024
Hallucination Detection and Hallucination Mitigation: An Investigation

Junliang Luo, Tianyu Li, Di Wu et al.

Large language models (LLMs), including ChatGPT, Bard, and Llama, have achieved remarkable successes over the last two years in a range of different applications. In spite of these successes, there exist concerns that limit the wide application of LLMs. A key problem is hallucination: in addition to correct responses, LLMs can also generate seemingly correct but factually incorrect responses. This report aims to present a comprehensive review of the current literature on both hallucination detection and hallucination mitigation. We hope that this report can serve as a good reference for both engineers and researchers who are interested in LLMs and applying them to real-world tasks.