Yiheng Li

h-index23

7papers

106citations

Novelty44%

AI Score39

Ranked #78,472 of 194,257 authors (top 40%)#26,556 in CV (top 45%)

7 Papers

6.2CVNov 13, 2025

Explicit Temporal-Semantic Modeling for Dense Video Captioning via Context-Aware Cross-Modal Interaction

Mingda Jia, Weiliang Meng, Zenghuang Fu et al.

Dense video captioning jointly localizes and captions salient events in untrimmed videos. Recent methods primarily focus on leveraging additional prior knowledge and advanced multi-task architectures to achieve competitive performance. However, these pipelines rely on implicit modeling that uses frame-level or fragmented video features, failing to capture the temporal coherence across event sequences and comprehensive semantics within visual contexts. To address this, we propose an explicit temporal-semantic modeling framework called Context-Aware Cross-Modal Interaction (CACMI), which leverages both latent temporal characteristics within videos and linguistic semantics from text corpus. Specifically, our model consists of two core components: Cross-modal Frame Aggregation aggregates relevant frames to extract temporally coherent, event-aligned textual features through cross-modal retrieval; and Context-aware Feature Enhancement utilizes query-guided attention to integrate visual dynamics with pseudo-event semantics. Extensive experiments on the ActivityNet Captions and YouCook2 datasets demonstrate that CACMI achieves the state-of-the-art performance on dense video captioning task.

19.0CVJul 12, 2025Code

MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models

Qiyan Zhao, Xiaofeng Zhang, Yiheng Li et al.

Hallucinations pose a significant challenge in Large Vision Language Models (LVLMs), with misalignment between multimodal features identified as a key contributing factor. This paper reveals the negative impact of the long-term decay in Rotary Position Encoding (RoPE), used for positional modeling in LVLMs, on multimodal alignment. Concretely, under long-term decay, instruction tokens exhibit uneven perception of image tokens located at different positions within the two-dimensional space: prioritizing image tokens from the bottom-right region since in the one-dimensional sequence, these tokens are positionally closer to the instruction tokens. This biased perception leads to insufficient image-instruction interaction and suboptimal multimodal alignment. We refer to this phenomenon as image alignment bias. To enhance instruction's perception of image tokens at different spatial locations, we propose MCA-LLaVA, based on Manhattan distance, which extends the long-term decay to a two-dimensional, multi-directional spatial decay. MCA-LLaVA integrates the one-dimensional sequence order and two-dimensional spatial position of image tokens for positional modeling, mitigating hallucinations by alleviating image alignment bias. Experimental results of MCA-LLaVA across various hallucination and general benchmarks demonstrate its effectiveness and generality. The code can be accessed in https://github.com/ErikZ719/MCA-LLaVA.

5.1IVMay 28, 2025

Comparative Analysis of Machine Learning Models for Lung Cancer Mutation Detection and Staging Using 3D CT Scans

Yiheng Li, Francisco Carrillo-Perez, Mohammed Alawad et al.

Lung cancer is the leading cause of cancer mortality worldwide, and non-invasive methods for detecting key mutations and staging are essential for improving patient outcomes. Here, we compare the performance of two machine learning models - FMCIB+XGBoost, a supervised model with domain-specific pretraining, and Dinov2+ABMIL, a self-supervised model with attention-based multiple-instance learning - on 3D lung nodule data from the Stanford Radiogenomics and Lung-CT-PT-Dx cohorts. In the task of KRAS and EGFR mutation detection, FMCIB+XGBoost consistently outperformed Dinov2+ABMIL, achieving accuracies of 0.846 and 0.883 for KRAS and EGFR mutations, respectively. In cancer staging, Dinov2+ABMIL demonstrated competitive generalization, achieving an accuracy of 0.797 for T-stage prediction in the Lung-CT-PT-Dx cohort, suggesting SSL's adaptability across diverse datasets. Our results emphasize the clinical utility of supervised models in mutation detection and highlight the potential of SSL to improve staging generalization, while identifying areas for enhancement in mutation sensitivity.

5.1APAug 7, 2021

Kinematics clustering enables head impact subtyping for better traumatic brain injury prediction

Xianghao Zhan, Yiheng Li, Yuzhe Liu et al.

Traumatic brain injury can be caused by various types of head impacts. However, due to different kinematic characteristics, many brain injury risk estimation models are not generalizable across the variety of impacts that humans may sustain. The current definitions of head impact subtypes are based on impact sources (e.g., football, traffic accident), which may not reflect the intrinsic kinematic similarities of impacts across the impact sources. To investigate the potential new definitions of impact subtypes based on kinematics, 3,161 head impacts from various sources including simulation, college football, mixed martial arts, and car racing were collected. We applied the K-means clustering to cluster the impacts on 16 standardized temporal features from head rotation kinematics. Then, we developed subtype-specific ridge regression models for cumulative strain damage (using the threshold of 15%), which significantly improved the estimation accuracy compared with the baseline method which mixed impacts from different sources and developed one model (R^2 from 0.7 to 0.9). To investigate the effect of kinematic features, we presented the top three critical features (maximum resultant angular acceleration, maximum angular acceleration along the z-axis, maximum linear acceleration along the y-axis) based on regression accuracy and used logistic regression to find the critical points for each feature that partitioned the subtypes. This study enables researchers to define head impact subtypes in a data-driven manner, which leads to more generalizable brain injury risk estimation.

5.1QMApr 19, 2021

Machine-learning-based head impact subtyping based on the spectral densities of the measurable head kinematics

Xianghao Zhan, Yiheng Li, Yuzhe Liu et al.

Objective: Traumatic brain injury can be caused by head impacts, but many brain injury risk estimation models are not equally accurate across the variety of impacts that patients may undergo and the characteristics of different types of impacts are not well studied. We investigated the spectral characteristics of different head impact types with kinematics classification. Methods: Data was analyzed from 3,262 head impacts from lab reconstruction, American football, mixed martial arts, and publicly available car crash data. A random forest classifier with spectral densities of linear acceleration and angular velocity was built to classify head impact types (e.g., football, car crash, mixed martial arts). To test the classifier robustness, another 271 lab-reconstructed impacts were obtained from 5 other instrumented mouthguards. Finally, with the classifier, type-specific, nearest-neighbor regression models were built for brain strain. Results: The classifier reached a median accuracy of 96% over 1,000 random partitions of training and test sets. The most important features in the classification included both low-frequency and high-frequency features, both linear acceleration features and angular velocity features. Different head impact types had different distributions of spectral densities in low-frequency and high-frequency ranges (e.g., the spectral densities of MMA impacts were higher in high-frequency range than in the low-frequency range). The type-specific regression showed a generally higher R^2-value than baseline models without classification. Conclusion: The machine-learning-based classifier enables a better understanding of the impact kinematics spectral density in different sports, and it can be applied to evaluate the quality of impact-simulation systems and on-field data augmentation.

8.0BIO-PHFeb 9, 2021

Predictive Factors of Kinematics in Traumatic Brain Injury from Head Impacts Based on Statistical Interpretation

Xianghao Zhan, Yiheng Li, Yuzhe Liu et al.

Brain tissue deformation resulting from head impacts is primarily caused by rotation and can lead to traumatic brain injury. To quantify brain injury risk based on measurements of kinematics on the head, finite element (FE) models and various brain injury criteria based on different factors of these kinematics have been developed, but the contribution of different kinematic factors has not been comprehensively analyzed across different types of head impacts in a data-driven manner. To better design brain injury criteria, the predictive power of rotational kinematics factors, which are different in 1) the derivative order (angular velocity, angular acceleration, angular jerk), 2) the direction and 3) the power (e.g., square-rooted, squared, cubic) of the angular velocity, were analyzed based on different datasets including laboratory impacts, American football, mixed martial arts (MMA), NHTSA automobile crashworthiness tests and NASCAR crash events. Ordinary least squares regressions were built from kinematics factors to the 95\% maximum principal strain (MPS95), and we compared zero-order correlation coefficients, structure coefficients, commonality analysis, and dominance analysis. The angular acceleration, the magnitude, and the first power factors showed the highest predictive power for the majority of impacts including laboratory impacts, American football impacts, with few exceptions (angular velocity for MMA and NASCAR impacts). The predictive power of rotational kinematics in three directions (x: posterior-to-anterior, y: left-to-right, z: superior-to-inferior) of kinematics varied with different sports and types of head impacts.

8.0TODec 18, 2020

Relationship between brain injury criteria and brain strain across different types of head impacts can be different

Xianghao Zhan, Yiheng Li, Yuzhe Liu et al.

Multiple brain injury criteria (BIC) are developed to quickly quantify brain injury risks after head impacts. These BIC originated from different types of head impacts (e.g., sports and car crashes) are widely used in risk evaluation. However, the accuracy of using the BIC on brain injury risk estimation across different types of head impacts has not been evaluated. Physiologically, brain strain is often considered the key parameter of brain injury. To evaluate the BIC's risk estimation accuracy across five datasets comprising different head impact types, linear regression was used to model 95% maximum principal strain, 95% maximum principal strain at the corpus callosum, and cumulative strain damage (15%) on each of 18 BIC respectively. The results show a significant difference in the relationship between BIC and brain strain across datasets, indicating the same BIC value may suggest different brain strain in different head impact types. The accuracy of brain strain regression is generally decreasing if the BIC regression models are fit on a dataset with a different type of head impact rather than on the dataset with the same type. Given this finding, this study raises concerns for applying BIC to estimate the brain injury risks for head impacts different from the head impacts on which the BIC was developed.