Kangning Li

CV
h-index10
5papers
10citations
Novelty47%
AI Score30

5 Papers

CVJan 11, 2023
Object Detection in 3D Point Clouds via Local Correlation-Aware Point Embedding

Chengzhi Wu, Julius Pfrommer, Jürgen Beyerer et al.

We present an improved approach for 3D object detection in point cloud data based on the Frustum PointNet (F-PointNet). Compared to the original F-PointNet, our newly proposed method considers the point neighborhood when computing point features. The newly introduced local neighborhood embedding operation mimics the convolutional operations in 2D neural networks. Thus features of each point are not only computed with the features of its own or of the whole point cloud but also computed especially with respect to the features of its neighbors. Experiments show that our proposed method achieves better performance than the F-Pointnet baseline on 3D object detection tasks.

CVMar 22, 2024
An Integrated Neighborhood and Scale Information Network for Open-Pit Mine Change Detection in High-Resolution Remote Sensing Images

Zilin Xie, Kangning Li, Jinbao Jiang et al.

Open-pit mine change detection (CD) in high-resolution (HR) remote sensing images plays a crucial role in mineral development and environmental protection. Significant progress has been made in this field in recent years, largely due to the advancement of deep learning techniques. However, existing deep-learning-based CD methods encounter challenges in effectively integrating neighborhood and scale information, resulting in suboptimal performance. Therefore, by exploring the influence patterns of neighborhood and scale information, this paper proposes an Integrated Neighborhood and Scale Information Network (INSINet) for open-pit mine CD in HR remote sensing images. Specifically, INSINet introduces 8-neighborhood-image information to acquire a larger receptive field, improving the recognition of center image boundary regions. Drawing on techniques of skip connection, deep supervision, and attention mechanism, the multi-path deep supervised attention (MDSA) module is designed to enhance multi-scale information fusion and change feature extraction. Experimental analysis reveals that incorporating neighborhood and scale information enhances the F1 score of INSINet by 6.40%, with improvements of 3.08% and 3.32% respectively. INSINet outperforms existing methods with an Overall Accuracy of 97.69%, Intersection over Union of 71.26%, and F1 score of 83.22%. INSINet shows significance for open-pit mine CD in HR remote sensing images.

CLJun 24, 2025
Doc2SAR: A Synergistic Framework for High-Fidelity Extraction of Structure-Activity Relationships from Scientific Documents

Jiaxi Zhuang, Kangning Li, Jue Hou et al.

Extracting molecular structure-activity relationships (SARs) from scientific literature and patents is essential for drug discovery and materials research. However, this task remains challenging due to heterogeneous document formats and limitations of existing methods. Specifically, rule-based approaches relying on rigid templates fail to generalize across diverse document layouts, while general-purpose multimodal large language models (MLLMs) lack sufficient accuracy and reliability for specialized tasks, such as layout detection and optical chemical structure recognition (OCSR). To address these challenges, we introduce DocSAR-200, a rigorously annotated benchmark of 200 scientific documents designed specifically for evaluating SAR extraction methods. Additionally, we propose Doc2SAR, a novel synergistic framework that integrates domain-specific tools with MLLMs enhanced via supervised fine-tuning (SFT). Extensive experiments demonstrate that Doc2SAR achieves state-of-the-art performance across various document types, significantly outperforming leading end-to-end baselines. Specifically, Doc2SAR attains an overall Table Recall of 80.78% on DocSAR-200, exceeding end2end GPT-4o by 51.48%. Furthermore, Doc2SAR demonstrates practical usability through efficient inference and is accompanied by a web app.

HCJan 15, 2025
Alleviating Seasickness through Brain-Computer Interface-based Attention Shift

Xiaoyu Bao, Kailin Xu, Jiawei Zhu et al.

Seasickness poses a widespread problem that adversely impacts both passenger comfort and the operational efficiency of maritime crews. Although attention shift has been proposed as a potential method to alleviate symptoms of motion sickness, its efficacy remains to be rigorously validated, especially in maritime environments. In this study, we develop an AI-driven brain-computer interface (BCI) to realize sustained and practical attention shift by incorporating tasks such as breath counting. Forty-three participants completed a real-world nautical experiment consisting of a real-feedback session, a resting session, and a pseudo-feedback session. Notably, 81.39\% of the participants reported that the BCI intervention was effective. EEG analysis revealed that the proposed system can effectively regulate motion sickness EEG signatures, such as an decrease in total band power, along with an increase in theta relative power and a decrease in beta relative power. Furthermore, an indicator of attentional focus, the theta/beta ratio, exhibited a significant reduction during the real-feedback session, providing further evidence to support the effectiveness of the BCI in shifting attention. Collectively, this study presents a novel nonpharmacological, portable, and effective approach for seasickness intervention, which has the potential to open up a brand-new application domain for BCIs.

CVDec 19, 2024
Movie2Story: A framework for understanding videos and telling stories in the form of novel text

Kangning Li, Zheyang Jia, Anyu Ying

In recent years, large-scale models have achieved significant advancements, accompanied by the emergence of numerous high-quality benchmarks for evaluating various aspects of their comprehension abilities. However, most existing benchmarks primarily focus on spatial understanding in static image tasks. While some benchmarks extend evaluations to temporal tasks, they fall short in assessing text generation under complex contexts involving long videos and rich auxiliary information. To address this limitation, we propose a novel benchmark: the Multi-modal Story Generation Benchmark (MSBench), designed to evaluate text generation capabilities in scenarios enriched with auxiliary information. Our work introduces an innovative automatic dataset generation method to ensure the availability of accurate auxiliary information. On one hand, we leverage existing datasets and apply automated processes to generate new evaluation datasets, significantly reducing manual efforts. On the other hand, we refine auxiliary data through systematic filtering and utilize state-of-the-art models to ensure the fairness and accuracy of the ground-truth datasets. Our experiments reveal that current Multi-modal Large Language Models (MLLMs) perform suboptimally under the proposed evaluation metrics, highlighting significant gaps in their capabilities. To address these challenges, we propose a novel model architecture and methodology to better handle the overall process, demonstrating improvements on our benchmark.