Bohan Yang

CV
h-index11
9papers
44citations
Novelty49%
AI Score53

9 Papers

50.7AIMay 31
TriLens: Per-Layer Logit-Lens Entropy for White-Box Hallucination Detection

Bohan Yang, Yijun Gong, Zhi Zhang et al.

When a language model hallucinates, the final answer is wrong, but the mistake is not necessarily invisible inside the model. Different internal pathways may remain uncertain, disagree in how quickly they sharpen, or commit to competing continuations before the output is produced. We introduce TriLens, a white-box detector that turns this intuition into a compact representation: at every layer, it reads the multi-head self-attention output, the feed-forward output, and the residual stream through the model's own logit lens, then records only the entropy of each readout. The resulting 3L-dimensional trajectory describes how certainty forms across depth and across modules, without storing high-dimensional hidden states or sampling multiple generations. This simple signal yields a strong detector across instruction-tuned LLMs and QA benchmarks, and our analyses show that the three module-wise entropy trajectories provide complementary evidence. TriLens suggests that hallucination detection can benefit from tracking how internal computation settles, not only what the final layer predicts.

72.6ROMar 13
A Human-in-the-Loop Confidence-Aware Failure Recovery Framework for Modular Robot Policies

Rohan Banerjee, Krishna Palempalli, Bohan Yang et al.

Robots operating in unstructured human environments inevitably encounter failures, especially in robot caregiving scenarios. While humans can often help robots recover, excessive or poorly targeted queries impose unnecessary cognitive and physical workload on the human partner. We present a human-in-the-loop failure-recovery framework for modular robotic policies, where a policy is composed of distinct modules such as perception, planning, and control, any of which may fail and often require different forms of human feedback. Our framework integrates calibrated estimates of module-level uncertainty with models of human intervention cost to decide which module to query and when to query the human. It separates these two decisions: a module selector identifies the module most likely responsible for failure, and a querying algorithm determines whether to solicit human input or act autonomously. We evaluate several module-selection strategies and querying algorithms in controlled synthetic experiments, revealing trade-offs between recovery efficiency, robustness to system and user variables, and user workload. Finally, we deploy the framework on a robot-assisted bite acquisition system and demonstrate, in studies involving individuals with both emulated and real mobility limitations, that it improves recovery success while reducing the workload imposed on users. Our results highlight how explicitly reasoning about both robot uncertainty and human effort can enable more efficient and user-centered failure recovery in collaborative robots. Supplementary materials and videos can be found at: http://emprise.cs.cornell.edu/modularhil

62.8CVMay 6Code
QuadBox: Accelerating 3D Gaussian Splatting with Geometry-Aware Boxes

Xinze Li, Bohan Yang, Pengxu Chen et al.

3D Gaussian Splatting (3DGS) has emerged as an advanced technique for real-time novel view synthesis by representing scene geometry and appearance using differentiable Gaussian primitives. However, efficiently computing precise Gaussian-tile intersections remains a critical task in the rasterization pipeline. To this end, we propose QuadBox, a method that leverages four axis-aligned bounding boxes to tightly encapsulate projected Gaussians in a discrete manner. First, we derive a geometry-aware stretching factor that enables the construction of a tile-aligned QuadBox, which covers the elliptical projection and largely excludes irrelevant tiles. Second, we introduce QPass, a single-pass tile traversal algorithm that exhaustively exploits the discrete nature of QuadBox, ensuring that the tile intersection check is performed with simple interval tests. Experiments on public datasets show that our method accelerates the rendering speed of 3DGS by 1.85$\times$. Code is available at \href{https://github.com/Powertony102/QuadBox}{https://github.com/Powertony102/QuadBox}.

50.5ROMar 16
A Unified Calibration Framework for Coordinate and Kinematic Parameters in Dual-Arm Robots

Tianyu Huang, Bohan Yang, Bin Li et al.

Precise collaboration in vision-based dual-arm robot systems requires accurate system calibration. Recent dual-robot calibration methods have achieved strong performance by simultaneously solving multiple coordinate transformations. However, these methods either treat kinematic errors as implicit noise or handle them through separated error modeling, resulting in non-negligible accumulated errors. In this paper, we present a novel framework for unified calibration of the coordinate transformations and kinematic parameters in both robot arms. Our key idea is to unify all the tightly coupled parameters within a single Lie-algebraic formulation. To this end, we construct a consolidated error model grounded in the product-of-exponentials formula, which naturally integrates the coordinate and kinematic parameters in twist forms. Our model introduces no artificial error separation and thus greatly mitigates the error propagation. In addition, we derive a closed-form analytical Jacobian from this model using Lie derivatives. By exploring the Jacobian rank property, we analyze the identifiability of all calibration parameters and show that our joint optimization is well-posed under mild conditions. This enables off-the-shelf iterative solvers to stably optimize these parameters on the manifold space. Besides, to ensure robust convergence of our joint optimization, we develop a certifiably correct algorithm for initializing the unknown coordinates. Relying on semidefinite relaxation, our algorithm can yield a reliable estimate whose near-global optimality can be verified a posteriori. Extensive experiments validate the superior accuracy of our approach over previous baselines under identical visual measurements. Meanwhile, our certifiable initialization consistently outperforms several coordinate-only baselines, proving its reliability as a starting point for joint optimization.

97.8LGMar 24
Off-Policy Value-Based Reinforcement Learning for Large Language Models

Peng-Yuan Wang, Ziniu Li, Tian Xu et al.

Improving data utilization efficiency is critical for scaling reinforcement learning (RL) for long-horizon tasks where generating trajectories is expensive. However, the dominant RL methods for LLMs are largely on-policy: they update each batch of data only once, discard it, and then collect fresh samples, resulting in poor sample efficiency. In this work, we explore an alternative value-based RL framework for LLMs that naturally enables off-policy learning. We propose ReVal, a Bellman-update-based method that combines stepwise signals capturing internal consistency with trajectory-level signals derived from outcome verification. ReVal naturally supports replay-buffer-based training, allowing efficient reuse of past trajectories. Experiments on standard mathematical reasoning benchmarks show that ReVal not only converges faster but also outperforms GRPO in final performance. On DeepSeek-R1-Distill-1.5B, ReVal improves training efficiency and achieves improvement of 2.7% in AIME24 and 4.5% in out-of-domain benchmark GPQA over GRPO. These results suggest that value-based RL is a practical alternative to policy-based methods for LLM training.

LGAug 10, 2022
Machine Learning-based EEG Applications and Markets

Weiqing Gu, Bohan Yang, Ryan Chang

This paper addresses both the various EEG applications and the current EEG market ecosystem propelled by machine learning. Increasingly available open medical and health datasets using EEG encourage data-driven research with a promise of improving neurology for patient care through knowledge discovery and machine learning data science algorithm development. This effort leads to various kinds of EEG developments and currently forms a new EEG market. This paper attempts to do a comprehensive survey on the EEG market and covers the six significant applications of EEG, including diagnosis/screening, drug development, neuromarketing, daily health, metaverse, and age/disability assistance. The highlight of this survey is on the compare and contrast between the research field and the business market. Our survey points out the current limitations of EEG and indicates the future direction of research and business opportunity for every EEG application listed above. Based on our survey, more research on machine learning-based EEG applications will lead to a more robust EEG-related market. More companies will use the research technology and apply it to real-life settings. As the EEG-related market grows, the EEG-related devices will collect more EEG data, and there will be more EEG data available for researchers to use in their study, coming back as a virtuous cycle. Our market analysis indicates that research related to the use of EEG data and machine learning in the six applications listed above points toward a clear trend in the growth and development of the EEG ecosystem and machine learning world.

CVOct 21, 2024
Leveraging CORAL-Correlation Consistency Network for Semi-Supervised Left Atrium MRI Segmentation

Xinze Li, Runlin Huang, Zhenghao Wu et al.

Semi-supervised learning (SSL) has been widely used to learn from both a few labeled images and many unlabeled images to overcome the scarcity of labeled samples in medical image segmentation. Most current SSL-based segmentation methods use pixel values directly to identify similar features in labeled and unlabeled data. They usually fail to accurately capture the intricate attachment structures in the left atrium, such as the areas of inconsistent density or exhibit outward curvatures, adding to the complexity of the task. In this paper, we delve into this issue and introduce an effective solution, CORAL(Correlation-Aligned)-Correlation Consistency Network (CORN), to capture the global structure shape and local details of Left Atrium. Diverging from previous methods focused on each local pixel value, the CORAL-Correlation Consistency Module (CCM) in the CORN leverages second-order statistical information to capture global structural features by minimizing the distribution discrepancy between labeled and unlabeled samples in feature space. Yet, direct construction of features from unlabeled data frequently results in ``Sample Selection Bias'', leading to flawed supervision. We thus further propose the Dynamic Feature Pool (DFP) for the CCM, which utilizes a confidence-based filtering strategy to remove incorrectly selected features and regularize both teacher and student models by constraining the similarity matrix to be consistent. Extensive experiments on the Left Atrium dataset have shown that the proposed CORN outperforms previous state-of-the-art semi-supervised learning methods.

MED-PHJun 18, 2025
Unsupervised deep learning model for fast energy layer pre-selection of delivery-efficient proton arc therapy plan optimization of nasopharyngeal carcinoma

Bohan Yang, Gang Liu, Yang Zhong et al.

Proton arc therapy (PAT) is an emerging and promising modality in radiotherapy, offering improved dose distribution and treatment robustness over intensity-modulated proton therapy. Yet, identifying the optimal energy layer (EL) sequence remains challenging due to the intensive computational demand and prolonged treatment delivery time. This study proposes an unsupervised deep learning model for fast EL pre-selection that minimizes EL switch (ELS) time while maintaining high plan quality. We introduce a novel data representation method, spot-count representation, which encodes the number of proton spots intersecting the target and organs at risk (OAR) in a matrix structured by sorted gantry angles and energy layers. This representation serves as the input of an U-Net style architecture, SPArc_dl, which is trained using a tri-objective function: maximizing spot-counts on target, minimizing spot-counts on OAR, and reducing ELS time. The model is evaluated on 35 nasopharyngeal cancer cases, and its performance is compared to SPArc_particle_swarm (SPArc_ps). SPArc_dl produces EL pre-selection that significantly improves both plan quality and delivery efficiency. Compared to SPArc_ps, it enhances the conformity index by 0.1 (p<0.01), reduces the homogeneity index by 0.71 (p<0.01), lowers the brainstem mean dose by 0.25 (p<0.01), and shortens the ELS time by 37.2% (p < 0.01). The results unintentionally reveal employing unchanged ELS is more time-wise efficient than descended ELS. SPArc_dl's inference time is within 1 second. However, SPArc_dl plan demonstrates limitation in robustness. The proposed spot-count representation lays a foundation for incorporating unsupervised deep learning approaches into EL pre-selection task. SPArc_dl is a fast tool for generating high-quality PAT plans by strategically pre-selecting EL to reduce delivery time while maintaining excellent dosimetric performance.

CVOct 8, 2021
Stereo Dense Scene Reconstruction and Accurate Localization for Learning-Based Navigation of Laparoscope in Minimally Invasive Surgery

Ruofeng Wei, Bin Li, Hangjie Mo et al.

Objective: The computation of anatomical information and laparoscope position is a fundamental block of surgical navigation in Minimally Invasive Surgery (MIS). Recovering a dense 3D structure of surgical scene using visual cues remains a challenge, and the online laparoscopic tracking primarily relies on external sensors, which increases system complexity. Methods: Here, we propose a learning-driven framework, in which an image-guided laparoscopic localization with 3D reconstructions of complex anatomical structures is obtained. To reconstruct the 3D structure of the whole surgical environment, we first fine-tune a learning-based stereoscopic depth perception method, which is robust to the texture-less and variant soft tissues, for depth estimation. Then, we develop a dense visual reconstruction algorithm to represent the scene by surfels, estimate the laparoscope poses and fuse the depth maps into a unified reference coordinate for tissue reconstruction. To estimate poses of new laparoscope views, we achieve a coarse-to-fine localization method, which incorporates our reconstructed 3D model. Results: We evaluate the reconstruction method and the localization module on three datasets, namely, the stereo correspondence and reconstruction of endoscopic data (SCARED), the ex-vivo phantom and tissue data collected with Universal Robot (UR) and Karl Storz Laparoscope, and the in-vivo DaVinci robotic surgery dataset, where the reconstructed 3D structures have rich details of surface texture with an accuracy error under 1.71 mm and the localization module can accurately track the laparoscope with only images as input. Conclusions: Experimental results demonstrate the superior performance of the proposed method in 3D anatomy reconstruction and laparoscopic localization. Significance: The proposed framework can be potentially extended to the current surgical navigation system.