CLMay 25, 2022
RLPrompt: Optimizing Discrete Text Prompts with Reinforcement LearningMingkai Deng, Jianyu Wang, Cheng-Ping Hsieh et al.
Prompting has shown impressive success in enabling large pretrained language models (LMs) to perform diverse NLP tasks, especially when only few downstream data are available. Automatically finding the optimal prompt for each task, however, is challenging. Most existing work resorts to tuning soft prompt (e.g., embeddings) which falls short of interpretability, reusability across LMs, and applicability when gradients are not accessible. Discrete prompt, on the other hand, is difficult to optimize, and is often created by "enumeration (e.g., paraphrasing)-then-selection" heuristics that do not explore the prompt space systematically. This paper proposes RLPrompt, an efficient discrete prompt optimization approach with reinforcement learning (RL). RLPrompt formulates a parameter-efficient policy network that generates the desired discrete prompt after training with reward. To overcome the complexity and stochasticity of reward signals by the large LM environment, we incorporate effective reward stabilization that substantially enhances the training efficiency. RLPrompt is flexibly applicable to different types of LMs, such as masked (e.g., BERT) and left-to-right models (e.g., GPTs), for both classification and generation tasks. Experiments on few-shot classification and unsupervised text style transfer show superior performance over a wide range of existing finetuning or prompting methods. Interestingly, the resulting optimized prompts are often ungrammatical gibberish text; and surprisingly, those gibberish prompts are transferrable between different LMs to retain significant performance, indicating LM prompting may not follow human language patterns.
CVSep 13, 2024
CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian SplattingRunze Chen, Mingyu Xiao, Haiyong Luo et al.
We introduce Crowd-Sourced Splatting (CSS), a novel 3D Gaussian Splatting (3DGS) pipeline designed to overcome the challenges of pose-free scene reconstruction using crowd-sourced imagery. The dream of reconstructing historically significant but inaccessible scenes from collections of photographs has long captivated researchers. However, traditional 3D techniques struggle with missing camera poses, limited viewpoints, and inconsistent lighting. CSS addresses these challenges through robust geometric priors and advanced illumination modeling, enabling high-quality novel view synthesis under complex, real-world conditions. Our method demonstrates clear improvements over existing approaches, paving the way for more accurate and flexible applications in AR, VR, and large-scale 3D reconstruction.
SYApr 7
Exergy Battery Modeling and P2P Trading Based Optimal Operation of Virtual Energy StationMeng Song, Xinyi Jing, Jianyong Ding et al.
Virtual energy stations (VESs) work as retailers to provide electricity and natural gas sale services for integrated energy systems (IESs), and guide IESs energy consumption behaviors to tackle the varying market prices via integrated demand response (IDR). However, IES customers are risk averse and show low enthusiasm in responding to the IDR incentive signals. To address this problem, exergy is utilized to unify different energies and allowed to be virtually stored and withdrawn for arbitrage by IESs. The whole incentive mechanism operating process is innovatively characterized by a virtual exergy battery. Peer to peer (P2P) exergy trading based on shared exergy storage is also developed to reduce the energy cost of IESs without any extra transmission fee. In this way, IES can reduce the economic loss risk caused by the market price fluctuation via the different time (time dimension), multiple energy conversion (energy dimension), and P2P exergy trading (space dimension) arbitrage. Moreover, the optimal scheduling of VES and IESs is modeled by a bilevel optimization model. The consensus based alternating direction method of multipliers (CADMM) algorithm is utilized to solve this problem in a distributed way. Simulation results validate the effectiveness of the proposed incentive mechanism and show that the shared exergy storage can enhance the benefits of different type IESs by 18.96%, 3.49%, and 3.15 %, respectively.
GNMar 31
GenoBERT: A Language Model for Accurate Genotype ImputationLei Huang, Chuan Qiu, Kuan-Jui Su et al.
Genotype imputation enables dense variant coverage for genome-wide association and risk-prediction studies, yet conventional reference-panel methods remain limited by ancestry bias and reduced rare-variant accuracy. We present Genotype Bidirectional Encoder Representations from Transformers (GenoBERT), a transformer-based, reference-free framework that tokenizes phased genotypes and uses a self-attention mechanism to capture both short- and long-range linkage disequilibrium (LD) dependencies. Benchmarking on two independent datasets including the Louisiana Osteoporosis Study (LOS) and the 1000 Genomes Project (1KGP) across ancestry groups and multiple genotype missingness levels (5-50%) shows that GenoBERT achieves the highest overall accuracy compared to four baseline methods (Beagle5.4, SCDA, BiU-Net, and STICI). At practical sparsity levels (up to 25% missing), GenoBERT attains high overall imputation accuracy ($r^2 approx 0.98$) across datasets, and maintains robust performance ($r^2 > 0.90$) even at 50% missingness. Experimental results across different ancestries confirm consistent gains across datasets, with resilience to small sample sizes and weak LD. A 128-SNP (single-nucleotide polymorphism) context window (approximately 100 Kb) is validated through LD-decay analyses as sufficient to capture local correlation structures. By eliminating reference-panel dependence while preserving high accuracy, GenoBERT provides a scalable and robust solution for genotype imputation and a foundation for downstream genomic modeling.
LGMar 19, 2025
Good Actions Succeed, Bad Actions Generalize: A Case Study on Why RL Generalizes BetterMeng Song
Supervised learning (SL) and reinforcement learning (RL) are both widely used to train general-purpose agents for complex tasks, yet their generalization capabilities and underlying mechanisms are not yet fully understood. In this paper, we provide a direct comparison between SL and RL in terms of zero-shot generalization. Using the Habitat visual navigation task as a testbed, we evaluate Proximal Policy Optimization (PPO) and Behavior Cloning (BC) agents across two levels of generalization: state-goal pair generalization within seen environments and generalization to unseen environments. Our experiments show that PPO consistently outperforms BC across both zero-shot settings and performance metrics-success rate and SPL. Interestingly, even though additional optimal training data enables BC to match PPO's zero-shot performance in SPL, it still falls significantly behind in success rate. We attribute this to a fundamental difference in how models trained by these algorithms generalize: BC-trained models generalize by imitating successful trajectories, whereas TD-based RL-trained models generalize through combinatorial experience stitching-leveraging fragments of past trajectories (mostly failed ones) to construct solutions for new tasks. This allows RL to efficiently find solutions in vast state space and discover novel strategies beyond the scope of human knowledge. Besides providing empirical evidence and understanding, we also propose practical guidelines for improving the generalization capabilities of RL and SL through algorithm design.
LGMay 9, 2024
A Minimalist Prompt for Zero-Shot Policy LearningMeng Song, Xuezhi Wang, Tanay Biradar et al.
Transformer-based methods have exhibited significant generalization ability when prompted with target-domain demonstrations or example solutions during inference. Although demonstrations, as a way of task specification, can capture rich information that may be hard to specify by language, it remains unclear what information is extracted from the demonstrations to help generalization. Moreover, assuming access to demonstrations of an unseen task is impractical or unreasonable in many real-world scenarios, especially in robotics applications. These questions motivate us to explore what the minimally sufficient prompt could be to elicit the same level of generalization ability as the demonstrations. We study this problem in the contextural RL setting which allows for quantitative measurement of generalization and is commonly adopted by meta-RL and multi-task RL benchmarks. In this setting, the training and test Markov Decision Processes (MDPs) only differ in certain properties, which we refer to as task parameters. We show that conditioning a decision transformer on these task parameters alone can enable zero-shot generalization on par with or better than its demonstration-conditioned counterpart. This suggests that task parameters are essential for the generalization and DT models are trying to recover it from the demonstration prompt. To extract the remaining generalizable information from the supervision, we introduce an additional learnable prompt which is demonstrated to further boost zero-shot generalization across a range of robotic control, manipulation, and navigation benchmark tasks.
LGMar 16, 2024
Probabilistic World Modeling with Asymmetric Distance MeasureMeng Song
Representation learning is a fundamental task in machine learning, aiming at uncovering structures from data to facilitate subsequent tasks. However, what is a good representation for planning and reasoning in a stochastic world remains an open problem. In this work, we posit that learning a distance function is essential to allow planning and reasoning in the representation space. We show that a geometric abstraction of the probabilistic world dynamics can be embedded into the representation space through asymmetric contrastive learning. Unlike previous approaches that focus on learning mutual similarity or compatibility measures, we instead learn an asymmetric similarity function that reflects the state reachability and allows multi-way probabilistic inference. Moreover, by conditioning on a common reference state (e.g. the observer's current state), the learned representation space allows us to discover the geometrically salient states that only a handful of paths can lead through. These states can naturally serve as subgoals to break down long-horizon planning tasks. We evaluate our method in gridworld environments with various layouts and demonstrate its effectiveness in discovering the subgoals.
CVJul 25, 2020
OpenRooms: An End-to-End Open Framework for Photorealistic Indoor Scene DatasetsZhengqin Li, Ting-Wei Yu, Shen Sang et al.
We propose a novel framework for creating large-scale photorealistic datasets of indoor scenes, with ground truth geometry, material, lighting and semantics. Our goal is to make the dataset creation process widely accessible, transforming scans into photorealistic datasets with high-quality ground truth for appearance, layout, semantic labels, high quality spatially-varying BRDF and complex lighting, including direct, indirect and visibility components. This enables important applications in inverse rendering, scene understanding and robotics. We show that deep networks trained on the proposed dataset achieve competitive performance for shape, material and lighting estimation on real images, enabling photorealistic augmented reality applications, such as object insertion and material editing. We also show our semantic labels may be used for segmentation and multi-task learning. Finally, we demonstrate that our framework may also be integrated with physics engines, to create virtual robotics environments with unique ground truth such as friction coefficients and correspondence to real scenes. The dataset and all the tools to create such datasets will be made publicly available.
ROOct 31, 2019
S4G: Amodal Single-view Single-Shot SE(3) Grasp Detection in Cluttered ScenesYuzhe Qin, Rui Chen, Hao Zhu et al.
Grasping is among the most fundamental and long-lasting problems in robotics study. This paper studies the problem of 6-DoF(degree of freedom) grasping by a parallel gripper in a cluttered scene captured using a commodity depth sensor from a single viewpoint. We address the problem in a learning-based framework. At the high level, we rely on a single-shot grasp proposal network, trained with synthetic data and tested in real-world scenarios. Our single-shot neural network architecture can predict amodal grasp proposal efficiently and effectively. Our training data synthesis pipeline can generate scenes of complex object configuration and leverage an innovative gripper contact model to create dense and high-quality grasp annotations. Experiments in synthetic and real environments have demonstrated that the proposed approach can outperform state-of-the-arts by a large margin.