LG · Jul 26, 2022
Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning
Zeren Huang, Wenhao Chen, Weinan Zhang et al.
Deriving a good variable selection strategy in branch-and-bound is essential for the efficiency of modern mixed-integer programming (MIP) solvers. Learning-to-branch methods, trained on MIP branching data collected during previous solution processes, have recently outperformed heuristics. Because branch-and-bound is naturally a sequential decision-making task, one should learn to optimize the utility of the whole MIP solving process instead of being myopic at each step. In this work, we formulate learning to branch as an offline reinforcement learning (RL) problem and propose a long-sighted hybrid search scheme to construct the offline MIP dataset, which values the long-term utility of branching decisions. During policy training, we deploy a ranking-based reward assignment scheme to distinguish promising samples from either the long-term or the short-term view, and we train the branching model, named Branch Ranking, via offline policy learning. Experiments on synthetic MIP benchmarks and real-world tasks demonstrate that Branch Ranking is more efficient and robust, and generalizes better to larger MIP instances, than both widely used heuristics and state-of-the-art learning-based branching models.
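The abstract does not spell out how the ranking-based reward assignment works. The sketch below shows one plausible reading. It assumes each offline sample carries a scalar long-term utility score, for example one derived from the hybrid search, and that the top-ranked fraction of samples is labeled as promising. The `BranchSample` type, its fields, and the `top_fraction` parameter are hypothetical and are not the authors' code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BranchSample:
    node_id: int    # hypothetical identifier of the branch-and-bound node
    action: int     # index of the branching variable that was chosen
    utility: float  # assumed long-term utility score (higher is better)


def assign_rank_rewards(samples: List[BranchSample], top_fraction: float = 0.2) -> List[float]:
    """Assign reward 1.0 to the top `top_fraction` of samples ranked by utility, 0.0 otherwise.

    This is an illustrative ranking rule, not necessarily the paper's exact scheme.
    """
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    n = len(samples)
    if n == 0:
        return []
    # Always mark at least one sample as promising.
    k = max(1, int(round(top_fraction * n)))
    # Sample indices ordered from highest to lowest utility.
    order = sorted(range(n), key=lambda i: samples[i].utility, reverse=True)
    rewards = [0.0] * n
    for i in order[:k]:
        rewards[i] = 1.0
    return rewards
```

The resulting 0/1 rewards could then weight a per-sample imitation loss. The policy would then be trained mainly on the samples ranked as promising.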
91.7 · RO · May 26
HyperSim: A Holistic Sim-To-Real Framework For Robust Robotic Manipulation
Junyi Dong, Haotian Luo, Ziwei Xu et al.
Scaling data volume and diversity is critical for generalizing embodied intelligence. While synthetic data generation offers a scalable alternative to expensive physical data acquisition, transferring robotic manipulation policies from simulation to the real world (sim-to-real) remains a formidable challenge due to the domain gap. This paper presents HyperSim, a holistic framework spanning synthetic data generation, policy training, and seamless real-world deployment. To systematically bridge the sim-to-real gap, HyperSim is built on three core pillars: high-fidelity environment synthesis, adversarial trajectory generation, and sim-and-real co-training. Together, these modules address domain discrepancies by enhancing visual fidelity, expanding data coverage, and enforcing domain-invariant representations. We rigorously validate HyperSim through a large-scale empirical study involving 400 real-world task executions across two representative manipulation models. Assessed across three fine-grained metrics, our complete pipeline achieves sim-to-real success rates of 80% and 95% with ACT and π₀, respectively. Furthermore, policies trained on our adversarial trajectories exhibit significantly enhanced robustness against dynamic uncertainties, achieving a 35% higher completion rate under physical perturbations.
72.3 · AI · May 12 · Code
Toward Stable Value Alignment: Introducing Independent Modules for Consistent Value Guidance
Wenhao Chen, Sirui Sun, Shengyuan Bai et al.
Aligning large language models (LLMs) with human values typically relies on post-training or inference-time steering that directly manipulates the backbone's parameters or representation space. However, a critical gap exists: the model's residual stream is highly dynamic, and values within it exist only as fragile, low-dimensional properties, which is inherently incompatible with the stability required for consistent value expression. In this paper, we propose the Stable Value Guidance Transformer (SVGT), which addresses this gap through an independent value module incorporating two key designs: (1) independent value modeling, which maintains normative representations in a dedicated value space isolated from the backbone, and (2) explicit behavioral guidance, which transduces these stable signals into learnable latent Bridge Tokens. These tokens serve as dynamic value anchors that explicitly steer the generative trajectory, ensuring robust adherence across diverse contexts without disrupting the backbone's internal representations. Experiments across multiple backbones and safety benchmarks show that SVGT generally reduces harmful scores by over 70% while maintaining generation fluency, demonstrating the efficacy of architecturally grounded value modeling. Our code is available at https://github.com/Clervils/SVGT.git.
86.5 · RO · May 11 · Code
HiDrive: A Closed-Loop Benchmark for High-Level Autonomous Driving
Zhongyu Xia, Guanyu Zhu, Guo Tang et al.
End-to-end autonomous driving has witnessed rapid progress, yet existing benchmarks are increasingly saturated, with state-of-the-art models achieving near-perfect scores on widely used open-loop and closed-loop benchmarks. This saturation does not mean that the problem has been solved; instead, it reveals that current benchmarks remain limited in scenario diversity, object variety, and the breadth of driving capabilities they evaluate. In particular, they lack sufficient long-tail scenarios involving rare but safety-critical objects and fail to assess advanced decision-making such as legal compliance, ethical reasoning, and emergency response. To address these gaps, we propose HiDrive, a new closed-loop benchmark for end-to-end autonomous driving that emphasizes long-tail scenarios and a richer evaluation of driving capabilities. HiDrive introduces a diverse set of rare objects and uncommon traffic situations, and expands evaluation from basic driving skills to more advanced capabilities, including rule compliance, moral reasoning, and context-dependent emergency maneuvers. Correspondingly, we extend previous collision-avoidance-centered metrics into a comprehensive evaluation system that encompasses collision and braking, traffic-rule compliance, and moral-reasoning indicators. Built on a more advanced physics engine, HiDrive provides physically realistic lighting and high-fidelity visual rendering, offering a more challenging and realistic testbed for assessing whether autonomous driving systems can handle the complexity of real-world deployment. The HiDrive software, source code, digital assets, and documentation are available at https://github.com/VDIGPKU/HiDrive.
LG · Dec 10, 2024 · Code
A New Federated Learning Framework Against Gradient Inversion Attacks
Pengxin Guo, Shuang Zeng, Wenhao Chen et al.
Federated Learning (FL) aims to protect data privacy by enabling clients to collectively train machine learning models without sharing their raw data. However, recent studies demonstrate that information exchanged during FL is vulnerable to Gradient Inversion Attacks (GIA). Consequently, a variety of privacy-preserving methods, such as Secure Multi-party Computation (SMC), Homomorphic Encryption (HE), and Differential Privacy (DP), have been integrated into FL to thwart such attacks. Despite their ability to protect data privacy, these approaches inherently involve substantial privacy-utility trade-offs. The key to privacy exposure in FL under GIA is the frequent sharing of model gradients that contain private data. Revisiting this point, we take a new perspective and design a novel privacy-preserving FL framework that effectively "breaks the direct connection" between the shared parameters and the local private data to defend against GIA. Specifically, we propose a Hypernetwork Federated Learning (HyperFL) framework that uses hypernetworks to generate the parameters of the local models, and only the hypernetwork parameters are uploaded to the server for aggregation. Theoretical analysis establishes the convergence rate of HyperFL, while extensive experiments demonstrate its privacy-preserving capability and comparable performance. Code is available at https://github.com/Pengxin-Guo/HyperFL.
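The central idea of HyperFL is that clients upload only hypernetwork parameters while the generated local model stays private. The toy sketch below illustrates this idea under two assumptions. First, the hypernetwork is a single linear map from a private client embedding to the flattened local-model weights. Second, the server aggregates with FedAvg-style weighted averaging. Both choices, and the names `generate_local_weights` and `aggregate_hypernetworks`, are illustrative and do not reflect the paper's actual architecture.

```python
from typing import Dict, List, Tuple

import numpy as np

Params = Dict[str, np.ndarray]


def generate_local_weights(hyper: Params, client_embedding: np.ndarray,
                           target_shape: Tuple[int, ...]) -> np.ndarray:
    """Assumed linear hypernetwork: private embedding -> local-model weight tensor.

    The embedding and the generated weights stay on the client.
    """
    flat = hyper["W"] @ client_embedding + hyper["b"]
    return flat.reshape(target_shape)


def aggregate_hypernetworks(client_hypers: List[Params], client_sizes: List[int]) -> Params:
    """FedAvg-style weighted average over hypernetwork parameters only."""
    total = float(sum(client_sizes))
    return {
        name: sum((n / total) * h[name] for h, n in zip(client_hypers, client_sizes))
        for name in client_hypers[0]
    }
```

In each round, the server would call `aggregate_hypernetworks` on the uploaded hypernetwork parameters and broadcast the result. Each client would then call `generate_local_weights` with its own private embedding. The parameters the server sees are therefore one step removed from any client's local model.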
RO · Dec 23, 2025
KnowVal: A Knowledge-Augmented and Value-Guided Autonomous Driving System
Zhongyu Xia, Wenhao Chen, Yongtao Wang et al.
Visual-language reasoning, driving knowledge, and value alignment are essential for advanced autonomous driving systems. However, existing approaches largely rely on data-driven learning, making it difficult to capture the complex logic underlying decision-making through imitation or limited reinforcement rewards. To address this, we propose KnowVal, a new autonomous driving system that enables visual-language reasoning through the synergistic integration of open-world perception and knowledge retrieval. Specifically, we construct a comprehensive driving knowledge graph that encodes traffic laws, defensive driving principles, and ethical norms, complemented by an efficient LLM-based retrieval mechanism tailored for driving scenarios. Furthermore, we develop a human-preference dataset and train a Value Model to guide interpretable, value-aligned trajectory assessment. Experimental results show that our method substantially improves planning performance while remaining compatible with existing architectures. Notably, KnowVal achieves the lowest collision rate on nuScenes and state-of-the-art results on Bench2Drive.
76.2 · SP · Mar 11
Spyglass: Directional Spectrum Sensing with Single-shot AoA Estimation and Virtual Arrays
Raghav Subbaraman, Akshit Agarwal, Wenhao Chen et al.
In this paper, we introduce Spyglass, a spectrum sensor designed to address the challenges of effective spectrum usage in dense wireless environments. Spyglass can observe a frequency band and accurately estimate the Angle of Arrival (AoA) of any signal within a single transmission. It also captures additional signal context such as center frequency, bandwidth, and I/Q samples. We overcome challenges such as the clutter of fleeting transmissions in common bands, the high cost of array processing for AoA estimation, and the difficulty of detecting and estimating channels for unknown signals. Our first contribution is Searchlite, a protocol-agnostic signal detection and separation algorithm. We use a switched array to reduce cost and processing complexity, and we develop SSFP, a Fourier-transform-based signal processing technique synchronized to the switching boundaries. Spyglass performs multi-channel blind AoA estimation synchronized with the array. Implemented on commercially available hardware, Spyglass achieves a median AoA accuracy of 1.4° and can separate simultaneous signals from multiple devices in an unconstrained RF environment, providing valuable tools for large-scale RF data collection and analysis.
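The abstract does not describe Spyglass's own pipeline (Searchlite, SSFP, switched-array synchronization) in enough detail to reproduce. As background, here is a sketch of the textbook phase-difference AoA estimator for a two-element array. For element spacing d and wavelength λ, a plane wave arriving at angle θ from broadside produces a phase difference Δφ = 2πd sin θ / λ, so θ = arcsin(λΔφ / (2πd)). The sketch assumes both antennas are sampled simultaneously, unlike in a switched array. It also assumes d ≤ λ/2 so that the phase difference is unambiguous. `aoa_from_phase` is a hypothetical helper.

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def aoa_from_phase(iq_a: np.ndarray, iq_b: np.ndarray, freq_hz: float, spacing_m: float) -> float:
    """Estimate the angle of arrival in degrees from broadside for a two-element array.

    Assumes simultaneously sampled, narrowband I/Q at both antennas and spacing_m <= wavelength / 2.
    """
    wavelength = SPEED_OF_LIGHT / freq_hz
    # np.vdot conjugates its first argument: sum(conj(a) * b). The angle of this
    # correlation is the average phase of antenna b relative to antenna a.
    dphi = np.angle(np.vdot(iq_a, iq_b))
    s = dphi * wavelength / (2 * np.pi * spacing_m)
    # Guard against |s| slightly exceeding 1 because of noise.
    s = np.clip(s, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

Averaging the correlation over all samples, rather than using one sample pair, makes the phase estimate less sensitive to noise.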
CV · Feb 1
Unveiling the Cognitive Compass: Theory-of-Mind-Guided Multimodal Emotion Reasoning
Meng Luo, Bobo Li, Shanqing Xu et al.
Despite rapid progress in multimodal large language models (MLLMs), their capability for deep emotional understanding remains limited. We argue that genuine affective intelligence requires explicit modeling of Theory of Mind (ToM), the cognitive substrate from which emotions arise. To this end, we first introduce HitEmotion, a ToM-grounded hierarchical benchmark that diagnoses capability breakpoints across increasing levels of cognitive depth. Second, we propose a ToM-guided reasoning chain that tracks mental states and calibrates cross-modal evidence to achieve faithful emotional reasoning. We further introduce TMPO, a reinforcement learning method that uses intermediate mental states as process-level supervision to guide and strengthen model reasoning. Extensive experiments show that HitEmotion exposes deep emotional reasoning deficits in state-of-the-art models, especially on cognitively demanding tasks. In evaluation, the ToM-guided reasoning chain and TMPO improve end-task accuracy and yield more faithful and more coherent rationales. In conclusion, our work provides the research community with a practical toolkit for evaluating and enhancing the cognition-based emotional understanding capabilities of MLLMs. Our dataset and code are available at: https://HitEmotion.github.io/.
IV · Apr 19, 2019
StegoAppDB: a Steganography Apps Forensics Image Database
Jennifer Newman, Li Lin, Wenhao Chen et al.
In this paper, we present a new reference dataset simulating digital evidence for image steganography. Steganography detection is a digital image forensics topic that is relatively unknown in practical forensics, although stego app use in the wild is on the rise. This paper introduces the first database consisting of mobile phone photographs and stego images produced by mobile stego apps. It includes a rich set of side information and offers simulated digital evidence. StegoAppDB, a steganography apps forensics image database, contains over 810,000 innocent and stego images from 24 distinct devices covering at least 10 different phone models. The images come with detailed provenance data spanning a wide range of ISO and exposure settings, EXIF data, message information, embedding rates, etc. We developed a camera app, Cameraw, specifically for data acquisition. It captures multiple images per scene and saves each one simultaneously in both DNG and high-quality JPEG formats. Stego images are created from these original images using selected mobile stego apps through a careful process of reverse engineering. StegoAppDB contains cover-stego image pairs, including pairs for apps that resize the stego image dimensions. We retain the original devices and continue to enlarge the database, and we encourage the image forensics community to use StegoAppDB. Although the database is designed for steganography, we also discuss its uses for other digital image forensics topics.
CR · Aug 1, 2018
Tackling Android Stego Apps in the Wild
Wenhao Chen, Li Lin, Min Wu et al.
Digital image forensics is a young but maturing field, encompassing key areas such as camera identification, detection of forged images, and steganalysis. However, large gaps exist between academic results and the applications used by practicing forensic analysts. To move academic discoveries closer to real-world implementations, it is important to use data that represent "in the wild" scenarios. For detecting stego images created by steganography apps, images generated by those apps are ideal. In this paper, we present our work on stego detection for images from mobile apps using two different approaches: "signature" detection and machine learning (ML) methods. A principal challenge of the ML task is creating a large number of stego images from different apps at specified embedding rates. One of our main contributions is a procedure for generating a large image database using Android emulators and reverse engineering techniques. We develop algorithms and tools for signature detection on stego apps, and we provide solutions to issues encountered when creating ML classifiers.
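The sketch below shows byte-level signature detection in its simplest form. It assumes each stego app leaves a fixed byte string somewhere in its output files. The app names and signature values are hypothetical placeholders, not signatures from the paper. Real app signatures would have to be identified through the kind of reverse engineering the abstract describes.

```python
from typing import Dict, List, Tuple


def find_signatures(data: bytes, signatures: Dict[str, bytes]) -> List[Tuple[str, int]]:
    """Return (app_name, byte_offset) for every occurrence of a known signature in `data`."""
    hits: List[Tuple[str, int]] = []
    for app, sig in signatures.items():
        if not sig:
            continue  # an empty pattern would match everywhere
        start = data.find(sig)
        while start != -1:
            hits.append((app, start))
            start = data.find(sig, start + 1)
    # Report matches in file order.
    return sorted(hits, key=lambda h: h[1])


def scan_file(path: str, signatures: Dict[str, bytes]) -> List[Tuple[str, int]]:
    """Read an image file as raw bytes and search it for the given signatures."""
    with open(path, "rb") as fh:
        return find_signatures(fh.read(), signatures)


# Hypothetical signature table, for illustration only.
EXAMPLE_SIGNATURES = {"HypotheticalAppA": b"HSTEGO01"}
```

Searching the whole file keeps the sketch simple. A practical detector might restrict the search to specific regions, such as bytes appended after the JPEG end-of-image marker, to reduce false matches.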