DC · Jun 3, 2022
Multi-user Co-inference with Batch Processing Capable Edge Server
Wenqi Shi, Sheng Zhou, Zhisheng Niu et al.
Graphics processing units (GPUs) can improve deep neural network inference throughput via batch processing, in which multiple tasks are processed concurrently. We focus on a novel scenario in which energy-constrained mobile devices offload inference tasks to an edge server with a GPU. The inference task is partitioned into sub-tasks for finer-grained offloading and scheduling, and we investigate the problem of minimizing user energy consumption under inference latency constraints. To deal with the coupling between offloading and scheduling introduced by concurrent batch processing, we first consider an offline problem with a constant edge inference latency and an identical latency constraint for all users. We prove that optimizing each user's offloading policy independently and aggregating all identical sub-tasks into one batch is optimal, which motivates the independent partitioning and same sub-task aggregating (IP-SSA) algorithm. Further, the optimal grouping (OG) algorithm is proposed to optimally group tasks when the latency constraints differ. Finally, when future task arrivals cannot be precisely predicted, a deep deterministic policy gradient (DDPG) agent is trained to call OG. Experiments show that IP-SSA reduces user energy consumption by up to 94.9% in the offline setting, while DDPG-OG outperforms DDPG-IP-SSA by up to 8.92% in the online setting.
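The "same sub-task aggregating" rule can be illustrated with a short sketch. This is not the authors' implementation: it assumes the task is a chain of `num_subtasks` sub-tasks, that each user runs a prefix locally (its hypothetical `partition_points` entry) and offloads the rest, and it uses a hypothetical linear batch-latency model (`fixed`, `per_item`) purely for illustration. Per the abstract, this aggregation is proven optimal only in the offline case with constant edge latency and a common latency constraint.

```python
from typing import Dict, List


def aggregate_same_subtasks(partition_points: Dict[str, int], num_subtasks: int) -> Dict[int, List[str]]:
    """Put every offloaded copy of sub-task j into one edge batch.

    partition_points[u] (hypothetical) is the number of leading sub-tasks user u
    runs locally (0..num_subtasks); sub-tasks partition_points[u]+1 .. num_subtasks
    are offloaded to the edge GPU.
    """
    batches: Dict[int, List[str]] = {}
    for j in range(1, num_subtasks + 1):
        # Users that have not finished sub-task j locally need it run at the edge.
        users = sorted(u for u, p in partition_points.items() if p < j)
        if users:
            batches[j] = users
    return batches


def edge_latency(batches: Dict[int, List[str]], fixed: Dict[int, float], per_item: Dict[int, float]) -> float:
    """Hypothetical latency model: batch j costs fixed[j] + per_item[j] * batch size,
    and batches run one after another in sub-task order."""
    return sum(fixed[j] + per_item[j] * len(users) for j, users in sorted(batches.items()))


batches = aggregate_same_subtasks({"a": 1, "b": 3, "c": 4}, num_subtasks=4)
print(batches)  # {2: ['a'], 3: ['a'], 4: ['a', 'b']}
```

Because one batch per sub-task index serves every user who offloaded it, the fixed per-batch cost is paid once rather than once per user.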
CV · Feb 26
MovieTeller: Tool-augmented Movie Synopsis with ID Consistent Progressive Abstraction
Yizhi Li, Xiaohan Chen, Miao Jiang et al.
With the explosive growth of digital entertainment, automated video summarization has become indispensable for applications such as content indexing, personalized recommendation, and efficient media archiving. Automatic synopsis generation for long-form videos, such as movies and TV series, presents a significant challenge for existing Vision-Language Models (VLMs). Although proficient at single-image captioning, these general-purpose models often fail in long-duration contexts, primarily through a lack of ID-consistent character identification and fractured narrative coherence. To overcome these limitations, we propose MovieTeller, a novel framework for generating movie synopses via tool-augmented progressive abstraction. Our core contribution is a training-free, tool-augmented, fact-grounded generation process. Instead of requiring costly model fine-tuning, our framework directly leverages off-the-shelf models in a plug-and-play manner. We first invoke a specialized face recognition model as an external "tool" to establish Factual Groundings: precise character identities and their corresponding bounding boxes. These groundings are then injected into the prompt to steer the VLM's reasoning, ensuring that the generated scene descriptions are anchored to verifiable facts. Furthermore, our progressive abstraction pipeline decomposes the summarization of a full-length movie into a multi-stage process, effectively mitigating the context length limitations of current VLMs. Experiments demonstrate that our approach yields significant improvements in factual accuracy, character consistency, and overall narrative coherence compared to end-to-end baselines.
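The two mechanisms described above, injecting tool outputs into the prompt and summarizing in stages, can be sketched in plain Python. This is a hypothetical illustration, not MovieTeller's actual interface: the prompt wording, the `(x1, y1, x2, y2)` box format, and the `summarize` callable (a stand-in for a VLM or LLM call) are all assumptions.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # assumed (x1, y1, x2, y2) pixel format


def grounded_prompt(scene_id: str, faces: List[Tuple[str, Box]]) -> str:
    """Inject face-recognition results ("factual groundings") into a prompt."""
    lines = [f"Scene {scene_id}. Known characters and bounding boxes:"]
    for name, (x1, y1, x2, y2) in faces:
        lines.append(f"- {name}: ({x1}, {y1}, {x2}, {y2})")
    lines.append("Describe the scene, referring to characters only by the names above.")
    return "\n".join(lines)


def progressive_abstraction(
    texts: List[str], summarize: Callable[[List[str]], str], group_size: int = 4
) -> str:
    """Summarize fixed-size groups level by level until one synopsis remains,
    so no single call has to see more than `group_size` inputs."""
    if not texts:
        raise ValueError("need at least one scene description")
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    level = list(texts)
    while len(level) > 1:
        level = [summarize(level[i:i + group_size]) for i in range(0, len(level), group_size)]
    return level[0]
```

With a toy `summarize = lambda chunk: "(" + "+".join(chunk) + ")"` and `group_size=2`, the inputs `["a", "b", "c", "d", "e"]` collapse to `"(((a+b)+(c+d))+((e)))"`, showing the tree of intermediate summaries.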
AI · Aug 18, 2025
GridCodex: A RAG-Driven AI Framework for Power Grid Code Reasoning and Compliance
Jinquan Shi, Yingying Cheng, Fan Zhang et al.
The global shift toward renewable energy presents unprecedented challenges for the electricity industry, making regulatory reasoning and compliance increasingly vital. Grid codes, the regulations governing grid operations, are complex and often lack automated interpretation solutions, which hinders industry expansion and undermines the profitability of electricity companies. We introduce GridCodex, an end-to-end framework for grid code reasoning and compliance that leverages large language models and retrieval-augmented generation (RAG). Our framework advances conventional RAG workflows through multi-stage query refinement and enhanced retrieval with RAPTOR. We validate the effectiveness of GridCodex with comprehensive benchmarks, including automated answer assessment across multiple dimensions and regulatory agencies. Experimental results show a 26.4% improvement in answer quality and a more than 10-fold increase in recall rate. An ablation study further examines the impact of base model selection.
CV · Sep 7, 2021
Hierarchical Graph Convolutional Skeleton Transformer for Action Recognition
Ruwen Bai, Min Li, Bo Meng et al.
Graph convolutional networks (GCNs) have emerged as the dominant methods for skeleton-based action recognition. However, they still suffer from two problems: neighborhood constraints and entangled spatiotemporal feature representations. Most studies have focused on improving the design of graph topology to solve the first problem, but the second remains underexplored. In this work, we design a disentangled spatiotemporal transformer (DSTT) block to overcome these limitations of GCNs in three steps: (i) feature disentanglement for spatiotemporal decomposition; (ii) global spatiotemporal attention for capturing correlations in the global context; and (iii) local information enhancement for utilizing more local information. On this basis, we propose a novel architecture, the Hierarchical Graph Convolutional skeleton Transformer (HGCT), which combines the complementary advantages of GCNs (i.e., local topology, temporal dynamics, and hierarchy) and Transformers (i.e., global context and dynamic attention). HGCT is lightweight and computationally efficient. Quantitative analysis demonstrates HGCT's superior performance and good interpretability.
IT · Jul 14, 2020
Joint Device Scheduling and Resource Allocation for Latency Constrained Wireless Federated Learning
Wenqi Shi, Sheng Zhou, Zhisheng Niu et al.
In federated learning (FL), devices contribute to the global training by uploading their local model updates via wireless channels. Due to limited computation and communication resources, device scheduling is crucial to the convergence rate of FL. In this paper, we propose a joint device scheduling and resource allocation policy to maximize the model accuracy within a given total training time budget for latency constrained wireless FL. A lower bound on the reciprocal of the training performance loss, in terms of the number of training rounds and the number of scheduled devices per round, is derived. Based on the bound, the accuracy maximization problem is solved by decoupling it into two sub-problems. First, given the scheduled devices, the optimal bandwidth allocation suggests allocating more bandwidth to the devices with worse channel conditions or weaker computation capabilities. Then, a greedy device scheduling algorithm is introduced, which in each step selects the device consuming the least updating time obtained by the optimal bandwidth allocation, until the lower bound begins to increase, meaning that scheduling more devices will degrade the model accuracy. Experiments show that the proposed policy outperforms state-of-the-art scheduling policies across a wide range of data distributions and cell radii.
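The two sub-problems can be sketched under a simplified, assumed system model (not the paper's): device i needs local compute time `c` and uploads `bits` at rate bandwidth share times spectral efficiency `r`, and bandwidth is split so that all scheduled devices finish a round at the same time, which naturally gives more bandwidth to devices with worse channels or slower computation. The paper's actual bound is not reproduced here; `objective(rounds, k)` is a hypothetical stand-in that the greedy loop treats as a quantity to minimize, stopping once it starts to increase, mirroring the stopping rule described above.

```python
import math
from typing import Callable, List, Tuple

Device = Tuple[float, float]  # assumed (compute time in s, spectral efficiency in bit/s/Hz)


def min_round_time(devices: List[Device], bits: float, bandwidth: float) -> float:
    """Smallest round time T when bandwidth is split so all devices finish together.

    Finishing at T requires b_i = bits / (r_i * (T - c_i)); bisect on T until
    sum(b_i) equals the available bandwidth.
    """
    def needed(t: float) -> float:
        return sum(bits / (r * (t - c)) for c, r in devices)

    lo = max(c for c, _ in devices)  # T must exceed every compute time
    hi = lo + 1.0
    while needed(hi) > bandwidth:  # widen the bracket until the allocation fits
        hi = lo + 2 * (hi - lo)
    for _ in range(200):
        mid = (lo + hi) / 2
        if needed(mid) > bandwidth:
            lo = mid
        else:
            hi = mid
    return hi


def greedy_schedule(
    devices: List[Device],
    bits: float,
    bandwidth: float,
    time_budget: float,
    objective: Callable[[int, int], float],  # hypothetical bound: (rounds, num_devices) -> value
) -> List[Device]:
    """Repeatedly add the device yielding the shortest round time; stop once the
    objective would increase."""
    scheduled: List[Device] = []
    remaining = list(devices)
    best = math.inf
    while remaining:
        cand = min(remaining, key=lambda d: min_round_time(scheduled + [d], bits, bandwidth))
        rounds = math.floor(time_budget / min_round_time(scheduled + [cand], bits, bandwidth))
        value = objective(rounds, len(scheduled) + 1)
        if value > best:
            break
        best = value
        scheduled.append(cand)
        remaining.remove(cand)
    return scheduled
```

Bisection is used because the total bandwidth required is strictly decreasing in T, so the equal-finish time is unique.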
LG · Jun 1, 2019
Achieving Fairness in Determining Medicaid Eligibility through Fairgroup Construction
Boli Fang, Miao Jiang, Jerry Shen
As effective complements to human judgment, artificial intelligence techniques have begun to aid human decisions on complicated social problems around the world. In the United States, for instance, automated ML/DL classification models complement human decisions in determining Medicaid eligibility. However, given limitations in ML/DL model design, these algorithms may fail to account for various factors in decision making, resulting in improper decisions that allocate resources to individuals who may not be most in need. To address this issue, we propose fairgroup construction, a method based on the legal doctrine of disparate impact, to improve the fairness of regressive classifiers. Experiments on the American Community Survey dataset demonstrate that our method can be easily adapted to a variety of regressive classification models to improve their fairness in determining Medicaid eligibility while maintaining high classification accuracy.
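The fairgroup construction procedure itself is not specified in this abstract, so the sketch below does not attempt it. It shows only the standard disparate-impact ratio through which the doctrine is commonly operationalized (selection rate of a protected group divided by that of a reference group), with the 0.8 threshold from the US "four-fifths" guideline. Group labels and data are hypothetical.

```python
from typing import Hashable, Sequence


def selection_rate(predictions: Sequence[int], groups: Sequence[Hashable], group: Hashable) -> float:
    """Fraction of members of `group` that receive a positive (1) prediction."""
    chosen = [p for p, g in zip(predictions, groups) if g == group]
    if not chosen:
        raise ValueError(f"no samples for group {group!r}")
    return sum(chosen) / len(chosen)


def disparate_impact_ratio(
    predictions: Sequence[int], groups: Sequence[Hashable], protected: Hashable, reference: Hashable
) -> float:
    """Selection rate of the protected group divided by that of the reference group."""
    ref_rate = selection_rate(predictions, groups, reference)
    if ref_rate == 0:
        raise ValueError("reference group has no positive predictions")
    return selection_rate(predictions, groups, protected) / ref_rate


def passes_four_fifths(ratio: float, threshold: float = 0.8) -> bool:
    """Flag potential disparate impact when the ratio falls below the threshold."""
    return ratio >= threshold


preds = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
ratio = disparate_impact_ratio(preds, groups, protected="b", reference="a")
print(round(ratio, 3), passes_four_fifths(ratio))  # 0.333 False
```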