LG Jan 29
Latent Adversarial Regularization for Offline Preference Optimization
Enyi Jiang, Yibo Jacky Zhang, Yinglun Xu et al.
Learning from human feedback typically relies on preference optimization that constrains policy updates through token-level regularization. However, preference optimization for language models is particularly challenging because token-space similarity does not imply semantic or behavioral similarity. To address this challenge, we leverage latent-space regularization for language model preference optimization. We introduce GANPO, which achieves latent-space regularization by penalizing divergence between the internal representations of a policy model and a reference model. Given that latent representations are not associated with explicit probability densities, we adopt an adversarial approach inspired by GANs to minimize latent-space divergence. We integrate GANPO as a regularizer into existing offline preference optimization objectives. Experiments across multiple model architectures and tasks show consistent improvements from latent-space regularization. Further, by comparing GANPO-induced inferential biases with those from token-level regularization, we find that GANPO provides more robust structural feedback under distributional shift and noise while maintaining comparable downstream performance with minor computational overhead.
HC Mar 27
Uncovering Patterns of Brain Activity from EEG Data Consistently Associated with Cybersickness Using Neural Network Interpretability Maps
Jacqueline Yau, Katherine J. Mimnaugh, Evan G. Center et al.
Cybersickness poses a serious challenge for users of virtual reality (VR) technology. Consequently, there has been significant effort to track its occurrence during VR use with passive measures like brain activity recorded through electroencephalogram (EEG). To classify cybersickness accurately, including in real time, machine learning algorithms that can extract meaningful signals from the rest of the brain data will be required. However, EEG datasets are typically very small and highly variable between participants, which makes building effective models extremely challenging. To address these concerns, we first introduce a framework for neural networks which has subject-adaptive training with calibration and interpretation for classification given limited and imbalanced EEG data. The features that the models find most useful can be visualized by plotting interpretability maps from integrated gradients and class activation. The framework is demonstrated here with convolutional neural networks and transformer models. Using a set of brain data recorded with EEG while participants viewed a stimulus in VR designed to elicit cybersickness, we show which spatio-temporal EEG features (from electrodes and time steps) were most important for discomfort classification. Across 12 runs of our framework with three different neural networks over multiple random seeds, the models consistently pointed to the same scalp locations as having patterns of brain data that were the most helpful in determining whether or not a sample of EEG data belonged to someone who was experiencing cybersickness. These results help clarify a hidden pattern in other related research and can be used as tagged features for better real-time cybersickness classification with EEG in the future. We provide our code at [anonymized] to enable feature interpretation across different neural network architectures.
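The abstract above names integrated gradients as one source of its interpretability maps. As a rough illustration of that general technique (not the authors' implementation), the sketch below attributes the output of a toy scalar function to its inputs by averaging gradients along a straight path from a baseline to the input. Everything here is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a classifier output, the all-zeros baseline is a common but arbitrary choice, and gradients are taken by finite differences, where a real CNN or transformer would use automatic differentiation.

```python
from typing import Callable, List

Vector = List[float]


def numerical_gradient(f: Callable[[Vector], float], x: Vector, eps: float = 1e-5) -> Vector:
    """Central-difference gradient of f at x (stand-in for autograd)."""
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def integrated_gradients(
    f: Callable[[Vector], float], x: Vector, baseline: Vector, steps: int = 50
) -> Vector:
    """Approximate integrated gradients with a midpoint Riemann sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = numerical_gradient(f, point)
        for i, g in enumerate(grad):
            total[i] += g
    # Scale the path-averaged gradient by the input-minus-baseline difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def toy_model(x: Vector) -> float:
    """Hypothetical scalar 'logit'; not a model from the paper."""
    return 2.0 * x[0] + x[1] * x[1] - 0.5 * x[0] * x[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_model, x, baseline)
# Completeness check: attributions should sum to f(x) - f(baseline).
completeness_gap = sum(attributions) - (toy_model(x) - toy_model(baseline))
```

For EEG input shaped as electrodes by time steps, the same per-feature attributions would be arranged back into that grid to form a map; how the paper aggregates them across runs is not specified in the abstract.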
RO Oct 8, 2016
Proceedings of the 1st International Workshop on Robot Learning and Planning (RLP 2016)
Nancy Amato, Charles Anderson, Gregory Chirikjian et al.