LG · Mar 26
Maximum Entropy Behavior Exploration for Sim2Real Zero-Shot Reinforcement Learning
Jiajun Hu, Nuria Armengol Urpi, Jin Cheng et al.
Zero-shot reinforcement learning (RL) algorithms aim to learn a family of policies from a reward-free dataset, and recover optimal policies for any reward function directly at test time. Naturally, the quality of the pretraining dataset determines the performance of the recovered policies across tasks. However, pre-collecting a relevant, diverse dataset without prior knowledge of the downstream tasks of interest remains a challenge. In this work, we study $\textit{online}$ zero-shot RL for quadrupedal control on real robotic systems, building upon the Forward-Backward (FB) algorithm. We observe that undirected exploration yields low-diversity data, leading to poor downstream performance and rendering policies impractical for direct hardware deployment. Therefore, we introduce FB-MEBE, an online zero-shot RL algorithm that combines an unsupervised behavior exploration strategy with a regularization critic. FB-MEBE promotes exploration by maximizing the entropy of the achieved behavior distribution. Additionally, a regularization critic shapes the recovered policies toward more natural and physically plausible behaviors. We empirically demonstrate that FB-MEBE achieves improved performance compared to other exploration strategies in a range of simulated downstream tasks, and that it renders natural policies that can be seamlessly deployed to hardware without further finetuning. Videos and code are available on our website.
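To illustrate what "entropy of the achieved behavior distribution" can mean in practice, the following is a minimal sketch, not the FB-MEBE estimator. It assumes behaviors are low-dimensional vectors (for example, planar base velocities) and estimates Shannon entropy from a simple histogram; the function name `behavior_entropy`, the `bin_width` parameter, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def behavior_entropy(behaviors, bin_width=0.5):
    """Shannon entropy (in nats) of behaviors discretized into square bins.

    Assumption: a histogram estimate is used purely for illustration; the
    abstract does not state how the entropy is estimated.
    """
    bins = Counter(
        tuple(math.floor(x / bin_width) for x in b) for b in behaviors
    )
    total = sum(bins.values())
    return -sum((c / total) * math.log(c / total) for c in bins.values())


# Hypothetical achieved behaviors: one clustered set, one spread-out set.
narrow = [(0.1, 0.0), (0.2, 0.1), (0.15, 0.05), (0.1, 0.1)]
diverse = [(0.1, 0.0), (1.2, -0.7), (-0.9, 0.6), (0.6, 1.4)]

print(behavior_entropy(narrow))   # all samples share one bin: zero entropy
print(behavior_entropy(diverse))  # four distinct bins: log(4) nats
```

Under this toy estimate, data gathered by undirected exploration that stays in one region of behavior space scores zero, while data covering four distinct bins scores log(4); an exploration objective of this kind rewards the second dataset.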
CV · Jul 21, 2024
Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization
Jiajun Hu, Jian Zhang, Lei Qi et al.
Domain generalization (DG) aims to avoid the performance degradation of a model when a distribution shift occurs between the limited training data and unseen test data. Recently, foundation models with enormous numbers of parameters have been pre-trained on huge datasets, demonstrating strong generalization ability and showing a promising direction for solving the DG problem. However, full Fine-Tuning (FT) of the foundation models results in unsatisfactory out-of-distribution accuracy because it destroys the pre-trained generalized features. Parameter-Efficient Fine-Tuning (PEFT) alleviates this problem by fine-tuning a small portion of the model parameters while keeping the rest frozen, which achieves better generalization performance than FT. Nevertheless, PEFT still suffers from overfitting to the training domains. To address this issue, we propose Parameter-Efficient Group with Orthogonal regularization (PEGO) for vision transformers, which effectively preserves the generalization ability of the pre-trained network and learns more diverse knowledge compared with conventional PEFT. Specifically, we inject a group of trainable Low-Rank Adaptation (LoRA) modules into the pre-trained model and propose an orthogonal regularization loss to enhance the generalization ability of the model. Our framework achieves SOTA performance on five DG benchmarks, while only requiring training a small number of parameters without adding additional testing cost.
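The abstract does not give the form of the orthogonal regularization loss. As a rough sketch of one way such a penalty could be written, the code below assumes each LoRA module has a down-projection matrix A_i and penalizes the squared Frobenius norm of A_i A_j^T for every pair of distinct modules, so that different modules are pushed toward orthogonal subspaces. The function names and the toy matrices are hypothetical, and this is not claimed to be PEGO's exact loss.

```python
def matmul_t(a, b):
    """Return a @ b^T for matrices stored as nested lists of rows."""
    return [[sum(x * y for x, y in zip(ra, rb)) for rb in b] for ra in a]


def orthogonal_penalty(lora_down):
    """Sum over pairs i < j of ||A_i A_j^T||_F^2.

    Assumption: orthogonality is enforced between the down-projection
    matrices of different LoRA modules; the penalty is zero exactly when
    every pair of modules spans mutually orthogonal row spaces.
    """
    total = 0.0
    for i in range(len(lora_down)):
        for j in range(i + 1, len(lora_down)):
            cross = matmul_t(lora_down[i], lora_down[j])
            total += sum(v * v for row in cross for v in row)
    return total


# Hypothetical rank-1 down-projections over a 4-dimensional input.
a1 = [[1.0, 0.0, 0.0, 0.0]]
a2 = [[0.0, 1.0, 0.0, 0.0]]
a3 = [[1.0, 1.0, 0.0, 0.0]]

print(orthogonal_penalty([a1, a2]))      # orthogonal rows: 0.0
print(orthogonal_penalty([a1, a2, a3]))  # a3 overlaps both: 2.0
```

In training, a term like this would be added to the task loss with a weighting coefficient, which is likewise an assumption here.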
LG · Nov 4, 2023
Successive Model-Agnostic Meta-Learning for Few-Shot Fault Time Series Prognosis
Hai Su, Jiajun Hu, Songsen Yu
Meta-learning is a promising technique for solving few-shot fault prediction problems, which have attracted the attention of many researchers in recent years. Existing meta-learning methods for time series prediction, which predominantly rely on random and similarity matching-based task partitioning, face three major limitations: (1) feature exploitation inefficiency; (2) suboptimal task data allocation; and (3) limited robustness with small samples. To overcome these limitations, we introduce a novel 'pseudo meta-task' partitioning scheme that treats a continuous time period of a time series as a meta-task, composed of multiple successive short time periods. Employing continuous time series as pseudo meta-tasks allows our method to extract more comprehensive features and relationships from the data, resulting in more accurate predictions. Moreover, we introduce a differential algorithm to enhance the robustness of our method across different datasets. Through extensive experiments on several fault and time series prediction datasets, we demonstrate that our approach substantially enhances prediction performance and generalization capability under both few-shot and general conditions.
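The 'pseudo meta-task' partitioning described above can be sketched as follows. This is an illustrative reading of the abstract, assuming non-overlapping contiguous segments and non-overlapping successive windows inside each segment; the function name, `task_len`, and `window_len` are hypothetical, and the paper may use overlapping windows or a different split.

```python
def pseudo_meta_tasks(series, task_len, window_len):
    """Split a series into contiguous meta-tasks of successive short windows.

    Each meta-task is one continuous stretch of `task_len` points, itself
    divided into consecutive windows of `window_len` points. Trailing
    points that do not fill a whole task or window are dropped.
    """
    tasks = []
    for start in range(0, len(series) - task_len + 1, task_len):
        segment = series[start:start + task_len]
        windows = [
            segment[i:i + window_len]
            for i in range(0, task_len - window_len + 1, window_len)
        ]
        tasks.append(windows)
    return tasks


series = list(range(12))  # stand-in for a sensor or degradation signal
tasks = pseudo_meta_tasks(series, task_len=6, window_len=2)
print(tasks)
# [[[0, 1], [2, 3], [4, 5]], [[6, 7], [8, 9], [10, 11]]]
```

Unlike random or similarity-based partitioning, every window in a task here is temporally adjacent to the next, which preserves the ordering the abstract relies on.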
FLU-DYN · Jun 30, 2024
Generative prediction of flow fields around an obstacle using the diffusion model
Jiajun Hu, Zhen Lu, Yue Yang
We propose a geometry-to-flow diffusion model that utilizes obstacle shape as input to predict a flow field around an obstacle. The model is based on a learnable Markov transition kernel to recover the data distribution from the Gaussian distribution. The Markov process is conditioned on the obstacle geometry, estimating the noise to be removed at each step, implemented via a U-Net. A cross-attention mechanism incorporates the geometry as a prompt. We train the geometry-to-flow diffusion model using a dataset of flows around simple obstacles, including circles, ellipses, rectangles, and triangles. For comparison, two CNN-based models and a VAE model are trained on the same dataset. Tests are carried out on flows around obstacles with simple and complex geometries, representing interpolation and generalization on the geometry condition, respectively. To evaluate performance under demanding conditions, the test set incorporates scenarios including crosses and the characters 'PKU'. Generated flow fields show that the geometry-to-flow diffusion model is superior to the CNN-based models and the VAE model in predicting instantaneous flow fields and handling complex geometries. Quantitative analysis of the accuracy and divergence demonstrates the model's robustness.
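The divergence mentioned in the quantitative analysis can be checked on a generated field with a few lines of code. The sketch below assumes a 2D velocity field (u, v) sampled on a uniform grid and uses central differences on interior points; the function name, grid layout, and test fields are hypothetical, and the abstract does not state how the authors compute divergence. For an incompressible flow, which is also an assumption here, values near zero indicate a physically consistent prediction.

```python
def divergence(u, v, dx, dy):
    """Central-difference divergence du/dx + dv/dy on interior grid points.

    u[j][i] and v[j][i] hold the velocity components at x index i and
    y index j; the result omits the one-cell boundary layer.
    """
    ny, nx = len(u), len(u[0])
    div = [[0.0] * (nx - 2) for _ in range(ny - 2)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            dudx = (u[j][i + 1] - u[j][i - 1]) / (2 * dx)
            dvdy = (v[j + 1][i] - v[j - 1][i]) / (2 * dy)
            div[j - 1][i - 1] = dudx + dvdy
    return div


n = 4
# Divergence-free test field: u = x, v = -y.
u = [[float(i) for i in range(n)] for _ in range(n)]
v = [[-float(j) for _ in range(n)] for j in range(n)]
print(divergence(u, v, dx=1.0, dy=1.0))  # [[0.0, 0.0], [0.0, 0.0]]
```

Taking the mean absolute value of this array over a predicted field gives a single scalar that can be compared across models.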