MAJun 20, 2022Code
From Multi-agent to Multi-robot: A Scalable Training and Evaluation Platform for Multi-robot Reinforcement LearningZhiuxan Liang, Jiannong Cao, Shan Jiang et al.
Multi-agent reinforcement learning (MARL) has been gaining extensive attention from academia and industries in the past few decades. One of the fundamental problems in MARL is how to evaluate different approaches comprehensively. Most existing MARL methods are evaluated in either video games or simplistic simulated scenarios. It remains unknown how these methods perform in real-world scenarios, especially multi-robot systems. This paper introduces a scalable emulation platform for multi-robot reinforcement learning (MRRL) called SMART to meet this need. Precisely, SMART consists of two components: 1) a simulation environment that provides a variety of complex interaction scenarios for training and 2) a real-world multi-robot system for realistic performance evaluation. Besides, SMART offers agent-environment APIs that are plug-and-play for algorithm implementation. To illustrate the practicality of our platform, we conduct a case study on the cooperative driving lane change scenario. Building off the case study, we summarize several unique challenges of MRRL, which are rarely considered previously. Finally, we open-source the simulation environments, associated benchmark tasks, and state-of-the-art baselines to encourage and empower MRRL research.
LGJul 22, 2022
AdaptCL: Adaptive Continual Learning for Tackling Heterogeneity in Sequential DatasetsYuqing Zhao, Divya Saxena, Jiannong Cao
Managing heterogeneous datasets that vary in complexity, size, and similarity in continual learning presents a significant challenge. Task-agnostic continual learning is necessary to address this challenge, as datasets with varying similarity pose difficulties in distinguishing task boundaries. Conventional task-agnostic continual learning practices typically rely on rehearsal or regularization techniques. However, rehearsal methods may struggle with varying dataset sizes and regulating the importance of old and new data due to rigid buffer sizes. Meanwhile, regularization methods apply generic constraints to promote generalization but can hinder performance when dealing with dissimilar datasets lacking shared features, necessitating a more adaptive approach. In this paper, we propose AdaptCL, a novel adaptive continual learning method to tackle heterogeneity in sequential datasets. AdaptCL employs fine-grained data-driven pruning to adapt to variations in data complexity and dataset size. It also utilizes task-agnostic parameter isolation to mitigate the impact of varying degrees of catastrophic forgetting caused by differences in data similarity. Through a two-pronged case study approach, we evaluate AdaptCL on both datasets of MNIST Variants and DomainNet, as well as datasets from different domains. The latter include both large-scale, diverse binary-class datasets and few-shot, multi-class datasets. Across all these scenarios, AdaptCL consistently exhibits robust performance, demonstrating its flexibility and general applicability in handling heterogeneous datasets.
MAJun 25, 2022
Hierarchical Reinforcement Learning with Opponent Modeling for Distributed Multi-agent CooperationZhixuan Liang, Jiannong Cao, Shan Jiang et al.
Many real-world applications can be formulated as multi-agent cooperation problems, such as network packet routing and coordination of autonomous vehicles. The emergence of deep reinforcement learning (DRL) provides a promising approach for multi-agent cooperation through the interaction of the agents and environments. However, traditional DRL solutions suffer from the high dimensions of multiple agents with continuous action space during policy search. Besides, the dynamicity of agents' policies makes the training non-stationary. To tackle the issues, we propose a hierarchical reinforcement learning approach with high-level decision-making and low-level individual control for efficient policy search. In particular, the cooperation of multiple agents can be learned in high-level discrete action space efficiently. At the same time, the low-level individual control can be reduced to single-agent reinforcement learning. In addition to hierarchical reinforcement learning, we propose an opponent modeling network to model other agents' policies during the learning process. In contrast to end-to-end DRL approaches, our approach reduces the learning complexity by decomposing the overall task into sub-tasks in a hierarchical way. To evaluate the efficiency of our approach, we conduct a real-world case study in the cooperative lane change scenario. Both simulation and real-world experiments show the superiority of our approach in the collision rate and convergence speed.
LGOct 23, 2023Code
MGAS: Multi-Granularity Architecture Search for Trade-Off Between Model Effectiveness and EfficiencyXiaoyun Liu, Divya Saxena, Jiannong Cao et al.
Neural architecture search (NAS) has gained significant traction in automating the design of neural networks. To reduce search time, differentiable architecture search (DAS) reframes the traditional paradigm of discrete candidate sampling and evaluation into a differentiable optimization over a super-net, followed by discretization. However, most existing DAS methods primarily focus on optimizing the coarse-grained operation-level topology, while neglecting finer-grained structures such as filter-level and weight-level patterns. This limits their ability to balance model performance with model size. Additionally, many methods compromise search quality to save memory during the search process. To tackle these issues, we propose Multi-Granularity Differentiable Architecture Search (MG-DARTS), a unified framework which aims to discover both effective and efficient architectures from scratch by comprehensively yet memory-efficiently exploring a multi-granularity search space. Specifically, we improve the existing DAS methods in two aspects. First, we adaptively adjust the retention ratios of searchable units across different granularity levels through adaptive pruning, which is achieved by learning granularity-specific discretization functions along with the evolving architecture. Second, we decompose the super-net optimization and discretization into multiple stages, each operating on a sub-net, and introduce progressive re-evaluation to enable re-pruning and regrowth of previous units, thereby mitigating potential bias. Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet demonstrate that MG-DARTS outperforms other state-of-the-art methods in achieving a better trade-off between model accuracy and parameter efficiency. Codes are available at https://github.com/lxy12357/MG_DARTS.
LGSep 20, 2024
Inductive Spatial Temporal Prediction Under Data Drift with Informative Graph Neural NetworkJialun Zheng, Divya Saxena, Jiannong Cao et al.
Inductive spatial temporal prediction can generalize historical data to predict unseen data, crucial for highly dynamic scenarios (e.g., traffic systems, stock markets). However, external events (e.g., urban structural growth, market crash) and emerging new entities (e.g., locations, stocks) can undermine prediction accuracy by inducing data drift over time. Most existing studies extract invariant patterns to counter data drift but ignore pattern diversity, exhibiting poor generalization to unseen entities. To address this issue, we design an Informative Graph Neural Network (INF-GNN) to distill diversified invariant patterns and improve prediction accuracy under data drift. Firstly, we build an informative subgraph with a uniquely designed metric, Relation Importance (RI), that can effectively select stable entities and distinct spatial relationships. This subgraph further generalizes new entities' data via neighbors merging. Secondly, we propose an informative temporal memory buffer to help the model emphasize valuable timestamps extracted using influence functions within time intervals. This memory buffer allows INF-GNN to discern influential temporal patterns. Finally, RI loss optimization is designed for pattern consolidation. Extensive experiments on real-world dataset under substantial data drift demonstrate that INF-GNN significantly outperforms existing alternatives.
LGMar 12
GPrune-LLM: Generalization-Aware Structured Pruning for Large Language ModelsXiaoyun Liu, Divya Saxena, Jiannong Cao et al.
Structured pruning is widely used to compress large language models (LLMs), yet its effectiveness depends heavily on neuron importance estimation. Most existing methods estimate neuron importance from activation statistics on a single calibration dataset, which introduces calibration bias and degrades downstream cross-task generalization. We observe that neurons exhibit heterogeneous distribution sensitivity, with distribution-robust neurons maintaining consistent rankings across datasets and distribution-sensitive neurons showing high cross-dataset ranking variance. Based on this, we identify two structural limitations in existing methods. First, ranking all neurons within a shared space causes distribution-sensitive neurons that strongly activate on calibration inputs to dominate, crowding out distribution-robust neurons critical for out-of-distribution tasks. Second, applying activation-based importance metrics uniformly can be unreliable. Distribution-sensitive neurons that infrequently activate on calibration data receive insufficient activation signal for accurate local ranking. To address these limitations, we propose GPrune-LLM, a generalization-aware structured pruning framework that explicitly accounts for neuron differences in cross-distribution behavior. We first partition neurons into behavior-consistent modules to localize ranking competition, then evaluate activation-based metric reliability per module according to distribution sensitivity and score magnitude. For modules where activation-based scoring is unreliable, we switch to an activation-independent metric. Finally, we adaptively learn module-wise sparsity. Extensive experiments across multiple downstream tasks demonstrate GPrune-LLM's consistent improvements in post-compression generalization, particularly at high sparsity, and reduced dependence on importance metric choice.
LGAug 20, 2024
Overcoming Growth-Induced Forgetting in Task-Agnostic Continual LearningYuqing Zhao, Jiannong Cao, Divya Saxena et al.
In continual learning (CL), model growth enhances adaptability to new data. However, when model growth is applied improperly, especially in task-agnostic CL, where the entire grown model is used for inference, it can lead to severe degradation of learned knowledge, a problem we term growth-induced forgetting. Most existing methods that adopt model growth to improve adaptability often overlook the forgetting issue, resulting in compromised knowledge retention, making them unsuitable for task-agnostic settings. To promote both adaptability and knowledge retention with model growth, we identify the key: gradient and parameter sparsity. Introducing SparseGrow, which increases gradient sparsity through layer expansion and gradient gating to enable focused updates on parameters while preserving critical parameters, thus inhibiting forgetting. Moreover, it promotes parameter sparsity with sparse initialization and training, aiming at better control of model plasticity, improving adaptability over new data. Extensive experiments across diverse datasets, task-agnostic settings, and a large number of tasks demonstrate the necessity of controlled layer expansion and validate the effectiveness of SparseGrow in achieving high adaptability while minimizing forgetting in continual learning. By enabling model growth with sparsified gradients and parameters, SparseGrow paves the way for building scalable lifelong learning systems capable of continual adaptation with better knowledge retention.
LGApr 14, 2024
FedDistill: Global Model Distillation for Local Model De-Biasing in Non-IID Federated LearningChanglin Song, Divya Saxena, Jiannong Cao et al.
Federated Learning (FL) is a novel approach that allows for collaborative machine learning while preserving data privacy by leveraging models trained on decentralized devices. However, FL faces challenges due to non-uniformly distributed (non-iid) data across clients, which impacts model performance and its generalization capabilities. To tackle the non-iid issue, recent efforts have utilized the global model as a teaching mechanism for local models. However, our pilot study shows that their effectiveness is constrained by imbalanced data distribution, which induces biases in local models and leads to a 'local forgetting' phenomenon, where the ability of models to generalize degrades over time, particularly for underrepresented classes. This paper introduces FedDistill, a framework enhancing the knowledge transfer from the global model to local models, focusing on the issue of imbalanced class distribution. Specifically, FedDistill employs group distillation, segmenting classes based on their frequency in local datasets to facilitate a focused distillation process to classes with fewer samples. Additionally, FedDistill dissects the global model into a feature extractor and a classifier. This separation empowers local models with more generalized data representation capabilities and ensures more accurate classification across all classes. FedDistill mitigates the adverse effects of data imbalance, ensuring that local models do not forget underrepresented classes but instead become more adept at recognizing and classifying them accurately. Our comprehensive experiments demonstrate FedDistill's effectiveness, surpassing existing baselines in accuracy and convergence speed across several benchmark datasets.
CVNov 28, 2025
Geometry-Consistent 4D Gaussian Splatting for Sparse-Input Dynamic View SynthesisYiwei Li, Jiannong Cao, Penghui Ruan et al.
Gaussian Splatting has been considered as a novel way for view synthesis of dynamic scenes, which shows great potential in AIoT applications such as digital twins. However, recent dynamic Gaussian Splatting methods significantly degrade when only sparse input views are available, limiting their applicability in practice. The issue arises from the incoherent learning of 4D geometry as input views decrease. This paper presents GC-4DGS, a novel framework that infuses geometric consistency into 4D Gaussian Splatting (4DGS), offering real-time and high-quality dynamic scene rendering from sparse input views. While learning-based Multi-View Stereo (MVS) and monocular depth estimators (MDEs) provide geometry priors, directly integrating these with 4DGS yields suboptimal results due to the ill-posed nature of sparse-input 4D geometric optimization. To address these problems, we introduce a dynamic consistency checking strategy to reduce estimation uncertainties of MVS across spacetime. Furthermore, we propose a global-local depth regularization approach to distill spatiotemporal-consistent geometric information from monocular depths, thereby enhancing the coherent geometry and appearance learning within the 4D volume. Extensive experiments on the popular N3DV and Technicolor datasets validate the effectiveness of GC-4DGS in rendering quality without sacrificing efficiency. Notably, our method outperforms RF-DeRF, the latest dynamic radiance field tailored for sparse-input dynamic view synthesis, and the original 4DGS by 2.62dB and 1.58dB in PSNR, respectively, with seamless deployability on resource-constrained IoT edge devices.
CVOct 31, 2024
Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and ConditioningPenghui Ruan, Pichao Wang, Divya Saxena et al.
Despite advancements in Text-to-Video (T2V) generation, producing videos with realistic motion remains challenging. Current models often yield static or minimally dynamic outputs, failing to capture complex motions described by text. This issue stems from the internal biases in text encoding, which overlooks motions, and inadequate conditioning mechanisms in T2V generation models. To address this, we propose a novel framework called DEcomposed MOtion (DEMO), which enhances motion synthesis in T2V generation by decomposing both text encoding and conditioning into content and motion components. Our method includes a content encoder for static elements and a motion encoder for temporal dynamics, alongside separate content and motion conditioning mechanisms. Crucially, we introduce text-motion and video-motion supervision to improve the model's understanding and generation of motion. Evaluations on benchmarks such as MSR-VTT, UCF-101, WebVid-10M, EvalCrafter, and VBench demonstrate DEMO's superior ability to produce videos with enhanced motion dynamics while maintaining high visual quality. Our approach significantly advances T2V generation by integrating comprehensive motion understanding directly from textual descriptions. Project page: https://PR-Ryan.github.io/DEMO-project/
LGOct 14, 2021
Time Series Clustering for Human Behavior Pattern MiningRohan Kabra, Divya Saxena, Dhaval Patel et al.
Human behavior modeling deals with learning and understanding behavior patterns inherent in humans' daily routines. Existing pattern mining techniques either assume human dynamics is strictly periodic, or require the number of modes as input, or do not consider uncertainty in the sensor data. To handle these issues, in this paper, we propose a novel clustering approach for modeling human behavior (named, MTpattern) from time-series data. For mining frequent human behavior patterns effectively, we utilize a three-stage pipeline: (1) represent time series data into a sequence of regularly sampled equal-sized unit time intervals for better analysis, (2) a new distance measure scheme is proposed to cluster similar sequences which can handle temporal variation and uncertainty in the data, and (3) exploit an exemplar-based clustering mechanism and fine-tune its parameters to output minimum number of clusters with given permissible distance constraints and without knowing the number of modes present in the data. Then, the average of all sequences in a cluster is considered as a human behavior pattern. Empirical studies on two real-world datasets and a simulated dataset demonstrate the effectiveness of MTpattern with respect to internal and external measures of clustering.
LGApr 30, 2020
Generative Adversarial Networks (GANs Survey): Challenges, Solutions, and Future DirectionsDivya Saxena, Jiannong Cao
Generative Adversarial Networks (GANs) is a novel class of deep generative models which has recently gained significant attention. GANs learns complex and high-dimensional distributions implicitly over images, audio, and data. However, there exists major challenges in training of GANs, i.e., mode collapse, non-convergence and instability, due to inappropriate design of network architecture, use of objective function and selection of optimization algorithm. Recently, to address these challenges, several solutions for better design and optimization of GANs have been investigated based on techniques of re-engineered network architectures, new objective functions and alternative optimization algorithms. To the best of our knowledge, there is no existing survey that has particularly focused on broad and systematic developments of these solutions. In this study, we perform a comprehensive survey of the advancements in GANs design and optimization solutions proposed to handle GANs challenges. We first identify key research issues within each design and optimization technique and then propose a new taxonomy to structure solutions by key research issues. In accordance with the taxonomy, we provide a detailed discussion on different GANs variants proposed within each solution and their relationships. Finally, based on the insights gained, we present the promising research directions in this rapidly growing field.
LGJul 19, 2019
D-GAN: Deep Generative Adversarial Nets for Spatio-Temporal PredictionDivya Saxena, Jiannong Cao
Spatio-temporal (ST) data for urban applications, such as taxi demand, traffic flow, regional rainfall is inherently stochastic and unpredictable. Recently, deep learning based ST prediction models are proposed to learn the ST characteristics of data. However, it is still very challenging (1) to adequately learn the complex and non-linear ST relationships; (2) to model the high variations in the ST data volumes as it is inherently dynamic, changing over time (i.e., irregular) and highly influenced by many external factors, such as adverse weather, accidents, traffic control, PoI, etc.; and (3) as there can be many complicated external factors that can affect the accuracy and it is impossible to list them explicitly. To handle the aforementioned issues, in this paper, we propose a novel deep generative adversarial network based model (named, D-GAN) for more accurate ST prediction by implicitly learning ST feature representations in an unsupervised manner. D-GAN adopts a GAN-based structure and jointly learns generation and variational inference of data. More specifically, D-GAN consists of two major parts: (1) a deep ST feature learning network to model the ST correlations and semantic variations, and underlying factors of variations and irregularity in the data through the implicit distribution modelling; (2) a fusion module to incorporate external factors for reaching a better inference. To the best our knowledge, no prior work studies ST prediction problem via deep implicit generative model and in an unsupervised manner. Extensive experiments performed on two real-world datasets show that D-GAN achieves more accurate results than traditional as well as deep learning based ST prediction methods.