Juan Li

CL
h-index25
20papers
1,306citations
Novelty42%
AI Score56

20 Papers

CLOct 19, 2023Code
Reliable Academic Conference Question Answering: A Study Based on Large Language Model

Zhiwei Huang, Juan Li, Long Jin et al.

As the development of academic conferences fosters global scholarly communication, researchers consistently need to obtain accurate and up-to-date information about academic conferences. Since the information is scattered, using an intelligent question-answering system to efficiently handle researchers' queries and ensure awareness of the latest advancements is necessary. Recently, Large Language Models (LLMs) have demonstrated impressive capabilities in question answering, and have been enhanced by retrieving external knowledge to deal with outdated knowledge. However, these methods fail to work due to the lack of the latest conference knowledge. To address this challenge, we develop the ConferenceQA dataset, consisting of seven diverse academic conferences. Specifically, for each conference, we first organize academic conference data in a tree-structured format through a semi-automated method. Then we annotate question-answer pairs and classify the pairs into four different types to better distinguish their difficulty. With the constructed dataset, we further propose a novel method STAR (STructure-Aware Retrieval) to improve the question-answering abilities of LLMs, leveraging inherent structural information during the retrieval process. Experimental results on the ConferenceQA dataset show the effectiveness of our retrieval method. The dataset and code are available at https://github.com/zjukg/ConferenceQA.

FLU-DYNMay 10, 2022
Flow Completion Network: Inferring the Fluid Dynamics from Incomplete Flow Information using Graph Neural Networks

Xiaodong He, Yinan Wang, Juan Li

This paper introduces a novel neural network - flow completion network (FCN) - to infer the fluid dynamics, includ-ing the flow field and the force acting on the body, from the incomplete data based on Graph Convolution AttentionNetwork. The FCN is composed of several graph convolution layers and spatial attention layers. It is designed to inferthe velocity field and the vortex force contribution of the flow field when combined with the vortex force map (VFM)method. Compared with other neural networks adopted in fluid dynamics, the FCN is capable of dealing with bothstructured data and unstructured data. The performance of the proposed FCN is assessed by the computational fluiddynamics (CFD) data on the flow field around a circular cylinder. The force coefficients predicted by our model arevalidated against those obtained directly from CFD. Moreover, it is shown that our model effectively utilizes the exist-ing flow field information and the gradient information simultaneously, giving a better performance than the traditionalconvolution neural network (CNN)-based and deep neural network (DNN)-based models. Specifically, among all thecases of different Reynolds numbers and different proportions of the training dataset, the results show that the proposedFCN achieves a maximum norm mean square error of 5.86% in the test dataset, which is much lower than those of thetraditional CNN-based and DNN-based models (42.32% and 15.63% respectively).

CVNov 6, 2025Code
BoRe-Depth: Self-supervised Monocular Depth Estimation with Boundary Refinement for Embedded Systems

Chang Liu, Juan Li, Sheng Zhang et al.

Depth estimation is one of the key technologies for realizing 3D perception in unmanned systems. Monocular depth estimation has been widely researched because of its low-cost advantage, but the existing methods face the challenges of poor depth estimation performance and blurred object boundaries on embedded systems. In this paper, we propose a novel monocular depth estimation model, BoRe-Depth, which contains only 8.7M parameters. It can accurately estimate depth maps on embedded systems and significantly improves boundary quality. Firstly, we design an Enhanced Feature Adaptive Fusion Module (EFAF) which adaptively fuses depth features to enhance boundary detail representation. Secondly, we integrate semantic knowledge into the encoder to improve the object recognition and boundary perception capabilities. Finally, BoRe-Depth is deployed on NVIDIA Jetson Orin, and runs efficiently at 50.7 FPS. We demonstrate that the proposed model significantly outperforms previous lightweight models on multiple challenging datasets, and we provide detailed ablation studies for the proposed methods. The code is available at https://github.com/liangxiansheng093/BoRe-Depth.

CLNov 11, 2025
Self-Correction Distillation for Structured Data Question Answering

Yushan Zhu, Wen Zhang, Long Jin et al.

Structured data question answering (QA), including table QA, Knowledge Graph (KG) QA, and temporal KG QA, is a pivotal research area. Advances in large language models (LLMs) have driven significant progress in unified structural QA frameworks like TrustUQA. However, these frameworks face challenges when applied to small-scale LLMs since small-scale LLMs are prone to errors in generating structured queries. To improve the structured data QA ability of small-scale LLMs, we propose a self-correction distillation (SCD) method. In SCD, an error prompt mechanism (EPM) is designed to detect errors and provide customized error messages during inference, and a two-stage distillation strategy is designed to transfer large-scale LLMs' query-generation and error-correction capabilities to small-scale LLM. Experiments across 5 benchmarks with 3 structured data types demonstrate that our SCD achieves the best performance and superior generalization on small-scale LLM (8B) compared to other distillation methods, and closely approaches the performance of GPT4 on some datasets. Furthermore, large-scale LLMs equipped with EPM surpass the state-of-the-art results on most datasets.

CVAug 27, 2024
SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding

Chuanghao Ding, Xuejing Liu, Wei Tang et al.

This paper introduces SynthDoc, a novel synthetic document generation pipeline designed to enhance Visual Document Understanding (VDU) by generating high-quality, diverse datasets that include text, images, tables, and charts. Addressing the challenges of data acquisition and the limitations of existing datasets, SynthDoc leverages publicly available corpora and advanced rendering tools to create a comprehensive and versatile dataset. Our experiments, conducted using the Donut model, demonstrate that models trained with SynthDoc's data achieve superior performance in pre-training read tasks and maintain robustness in downstream tasks, despite language inconsistencies. The release of a benchmark dataset comprising 5,000 image-text pairs not only showcases the pipeline's capabilities but also provides a valuable resource for the VDU community to advance research and development in document image recognition. This work significantly contributes to the field by offering a scalable solution to data scarcity and by validating the efficacy of end-to-end models in parsing complex, real-world documents.

CLSep 4, 2025Code
RTQA : Recursive Thinking for Complex Temporal Knowledge Graph Question Answering with Large Language Models

Zhaoyan Gong, Juan Li, Zhiqiang Liu et al.

Current temporal knowledge graph question answering (TKGQA) methods primarily focus on implicit temporal constraints, lacking the capability of handling more complex temporal queries, and struggle with limited reasoning abilities and error propagation in decomposition frameworks. We propose RTQA, a novel framework to address these challenges by enhancing reasoning over TKGs without requiring training. Following recursive thinking, RTQA recursively decomposes questions into sub-problems, solves them bottom-up using LLMs and TKG knowledge, and employs multi-path answer aggregation to improve fault tolerance. RTQA consists of three core components: the Temporal Question Decomposer, the Recursive Solver, and the Answer Aggregator. Experiments on MultiTQ and TimelineKGQA benchmarks demonstrate significant Hits@1 improvements in "Multiple" and "Complex" categories, outperforming state-of-the-art methods. Our code and data are available at https://github.com/zjukg/RTQA.

SPAug 28, 2023
Assessing cognitive function among older adults using machine learning and wearable device data: a feasibility study

Collin Sakal, Tingyou Li, Juan Li et al.

Timely implementation of interventions to slow cognitive decline among older adults requires accurate monitoring to detect changes in cognitive function. Data gathered using wearable devices that can continuously monitor factors known to be associated with cognition could be used to train machine learning models and develop wearable-based cognitive monitoring systems. Using data from over 2,400 older adults in the National Health and Nutrition Examination Survey (NHANES) we developed prediction models to differentiate older adults with normal cognition from those with poor cognition based on outcomes from three cognitive tests measuring different domains of cognitive function. During repeated cross-validation, CatBoost, XGBoost, and Random Forest models performed best when predicting cognition based on processing speed, working memory, and attention (median AUCs >0.82) compared to immediate and delayed recall (median AUCs >0.72) and categorical verbal fluency (median AUC >0.68). Activity and sleep parameters were also more strongly associated with processing speed, working memory, and attention compared to other cognitive subdomains. Our work provides proof of concept that wearable-based cognitive monitoring systems may be a viable alternative to traditional methods for monitoring processing speeds, working memory, and attention. We further identified novel metrics that could be targets in future causal studies seeking to better understand how sleep and activity parameters influence cognitive function among older adults.

HCMar 19
CyberJustice Tutor: An Agentic AI Framework for Cybersecurity Learning via Think-Plan-Act Reasoning and Pedagogical Scaffolding

Baiqiang Wang, Yan Bai, Juan Li

The integration of Large Language Models (LLMs) into cybersecurity education for criminal justice professionals is currently hindered by the "statelessness" of reactive chatbots and the risk of hallucinations in high-stakes legal contexts. To address these limitations, we propose the CyberJustice Tutor, an educational dialogue system powered by an Agentic AI framework. Unlike reactive chatbots, our system employs a "Think-Plan-Act" cognitive cycle, enabling autonomous goal decomposition, longitudinal planning, and dynamic context maintenance. We integrate a Pedagogical Scaffolding Layer grounded in Vygotsky's Zone of Proximal Development (ZPD), which dynamically adapts instructional support based on the learner's real-time progress. Furthermore, an Adaptive Retrieval Augmented Generation (RAG) core anchors the agent's reasoning in verified curriculum materials to ensure legal and technical accuracy. A comprehensive user study with 123 participants, including students, educators, and active law enforcement officers, validated the system's efficacy. Quantitative results demonstrate high user acceptance for Response Speed (4.7/5), Ease of Use (4.4/5), and Accuracy (4.3/5). Qualitative feedback indicates that the agentic architecture is perceived as highly effective in guiding learners through personalized paths, demonstrating the feasibility and usability of agentic AI for specialized professional education.

CLJun 3, 2025Code
A Multimodal, Multilingual, and Multidimensional Pipeline for Fine-grained Crowdsourcing Earthquake Damage Evaluation

Zihui Ma, Lingyao Li, Juan Li et al.

Rapid, fine-grained disaster damage assessment is essential for effective emergency response, yet remains challenging due to limited ground sensors and delays in official reporting. Social media provides a rich, real-time source of human-centric observations, but its multimodal and unstructured nature presents challenges for traditional analytical methods. In this study, we propose a structured Multimodal, Multilingual, and Multidimensional (3M) pipeline that leverages multimodal large language models (MLLMs) to assess disaster impacts. We evaluate three foundation models across two major earthquake events using both macro- and micro-level analyses. Results show that MLLMs effectively integrate image-text signals and demonstrate a strong correlation with ground-truth seismic data. However, performance varies with language, epicentral distance, and input modality. This work highlights the potential of MLLMs for disaster assessment and provides a foundation for future research in applying MLLMs to real-time crisis contexts. The code and data are released at: https://github.com/missa7481/EMNLP25_earthquake

AIJun 29, 2021Code
Benchmarking Knowledge-driven Zero-shot Learning

Yuxia Geng, Jiaoyan Chen, Xiang Zhuang et al.

External knowledge (a.k.a. side information) plays a critical role in zero-shot learning (ZSL) which aims to predict with unseen classes that have never appeared in training data. Several kinds of external knowledge, such as text and attribute, have been widely investigated, but they alone are limited with incomplete semantics. Some very recent studies thus propose to use Knowledge Graph (KG) due to its high expressivity and compatibility for representing kinds of knowledge. However, the ZSL community is still in short of standard benchmarks for studying and comparing different external knowledge settings and different KG-based ZSL methods. In this paper, we proposed six resources covering three tasks, i.e., zero-shot image classification (ZS-IMGC), zero-shot relation extraction (ZS-RE), and zero-shot KG completion (ZS-KGC). Each resource has a normal ZSL benchmark and a KG containing semantics ranging from text to attribute, from relational knowledge to logical expressions. We have clearly presented these resources including their construction, statistics, data formats and usage cases w.r.t. different ZSL methods. More importantly, we have conducted a comprehensive benchmarking study, with two general and state-of-the-art methods, two setting-specific methods and one interpretable method. We discussed and compared different ZSL paradigms w.r.t. different external knowledge settings, and found that our resources have great potential for developing more advanced ZSL methods and more solutions for applying KGs for augmenting machine learning. All the resources are available at https://github.com/China-UK-ZSL/Resources_for_KZSL.

CLNov 24, 2023
CMed-GPT: Prompt Tuning for Entity-Aware Chinese Medical Dialogue Generation

Zhijie Qu, Juan Li, Zerui Ma et al.

Medical dialogue generation relies on natural language generation techniques to enable online medical consultations. Recently, the widespread adoption of large-scale models in the field of natural language processing has facilitated rapid advancements in this technology. Existing medical dialogue models are mostly based on BERT and pre-trained on English corpora, but there is a lack of high-performing models on the task of Chinese medical dialogue generation. To solve the above problem, this paper proposes CMed-GPT, which is the GPT pre-training language model based on Chinese medical domain text. The model is available in two versions, namely, base and large, with corresponding perplexity values of 8.64 and 8.01. Additionally, we incorporate lexical and entity embeddings into the dialogue text in a uniform manner to meet the requirements of downstream dialogue generation tasks. By applying both fine-tuning and p-tuning to CMed-GPT, we lowered the PPL from 8.44 to 7.35. This study not only confirms the exceptional performance of the CMed-GPT model in generating Chinese biomedical text but also highlights the advantages of p-tuning over traditional fine-tuning with prefix prompts. Furthermore, we validate the significance of incorporating external information in medical dialogue generation, which enhances the quality of dialogue generation.

CLOct 30, 2025
Evontree: Ontology Rule-Guided Self-Evolution of Large Language Models

Mingchen Tu, Zhiqiang Liu, Juan Li et al.

Large language models (LLMs) have demonstrated exceptional capabilities across multiple domains by leveraging massive pre-training and curated fine-tuning data. However, in data-sensitive fields such as healthcare, the lack of high-quality, domain-specific training corpus hinders LLMs' adaptation for specialized applications. Meanwhile, domain experts have distilled domain wisdom into ontology rules, which formalize relationships among concepts and ensure the integrity of knowledge management repositories. Viewing LLMs as implicit repositories of human knowledge, we propose Evontree, a novel framework that leverages a small set of high-quality ontology rules to systematically extract, validate, and enhance domain knowledge within LLMs, without requiring extensive external datasets. Specifically, Evontree extracts domain ontology from raw models, detects inconsistencies using two core ontology rules, and reinforces the refined knowledge via self-distilled fine-tuning. Extensive experiments on medical QA benchmarks with Llama3-8B-Instruct and Med42-v2 demonstrate consistent outperformance over both unmodified models and leading supervised baselines, achieving up to a 3.7% improvement in accuracy. These results confirm the effectiveness, efficiency, and robustness of our approach for low-resource domain adaptation of LLMs.

CVApr 23
MiMIC: Mitigating Visual Modality Collapse in Universal Multimodal Retrieval While Avoiding Semantic Misalignment

Juan Li, Chuanghao Ding, Xujie Zhang et al.

Universal Multimodal Retrieval (UMR) aims to map different modalities (e.g., visual and textual) into a shared embedding space for multi-modal retrieval. Existing UMR methods can be broadly divided into two categories: early-fusion approaches, such as Marvel, which projects visual features into the language model (LM) space for integrating with text modality, and late-fusion approaches, such as UniVL-DR, which encode visual and textual inputs using separate encoders and obtain fused embeddings through addition. Our pilot study reveals that Marvel exhibits visual modality collapse, which is characterized by the model's tendency to disregard visual features while depending excessively on textual cues. In contrast, although UniVL-DR is less affected by this issue, it is more susceptible to semantic misalignment, where semantically related content is positioned far apart in the embedding space. To address these challenges, we propose MiMIC, which introduces two key innovations: (1) a fusion-in-decoder architecture for effective multimodal integration, and (2) robust training through single modality mixin and random caption dropout. Experiments on the WebQA+ and EVQA+ datasets, where image in documents or queries might lack captions, indicate that MiMIC consistently outperforms both early- and late-fusion baselines.

CLNov 18, 2025
A Specialized Large Language Model for Clinical Reasoning and Diagnosis in Rare Diseases

Tao Yang, Dandan Huang, Yunting Lin et al.

Rare diseases affect hundreds of millions worldwide, yet diagnosis often spans years. Convectional pipelines decouple noisy evidence extraction from downstream inferential diagnosis, and general/medical large language models (LLMs) face scarce real world electronic health records (EHRs), stale domain knowledge, and hallucinations. We assemble a large, domain specialized clinical corpus and a clinician validated reasoning set, and develop RareSeek R1 via staged instruction tuning, chain of thought learning, and graph grounded retrieval. Across multicenter EHR narratives and public benchmarks, RareSeek R1 attains state of the art accuracy, robust generalization, and stability under noisy or overlapping phenotypes. Augmented retrieval yields the largest gains when narratives pair with prioritized variants by resolving ambiguity and aligning candidates to mechanisms. Human studies show performance on par with experienced physicians and consistent gains in assistive use. Notably, transparent reasoning highlights decisive non phenotypic evidence (median 23.1%, such as imaging, interventions, functional tests) underpinning many correct diagnoses. This work advances a narrative first, knowledge integrated reasoning paradigm that shortens the diagnostic odyssey and enables auditable, clinically translatable decision support.

AIFeb 15, 2022
Knowledge Graph Reasoning with Logics and Embeddings: Survey and Perspective

Wen Zhang, Jiaoyan Chen, Juan Li et al.

Knowledge graph (KG) reasoning is becoming increasingly popular in both academia and industry. Conventional KG reasoning based on symbolic logic is deterministic, with reasoning results being explainable, while modern embedding-based reasoning can deal with uncertainty and predict plausible knowledge, often with high efficiency via vector computation. A promising direction is to integrate both logic-based and embedding-based methods, with the vision to have advantages of both. It has attracted wide research attention with more and more works published in recent years. In this paper, we comprehensively survey these works, focusing on how logics and embeddings are integrated. We first briefly introduce preliminaries, then systematically categorize and discuss works of logic and embedding-aware KG reasoning from different perspectives, and finally conclude and discuss the challenges and further directions.

GTNov 20, 2021
Incentive Mechanisms for Federated Learning: From Economic and Game Theoretic Perspective

Xuezhen Tu, Kun Zhu, Nguyen Cong Luong et al.

Federated learning (FL) becomes popular and has shown great potentials in training large-scale machine learning (ML) models without exposing the owners' raw data. In FL, the data owners can train ML models based on their local data and only send the model updates rather than raw data to the model owner for aggregation. To improve learning performance in terms of model accuracy and training completion time, it is essential to recruit sufficient participants. Meanwhile, the data owners are rational and may be unwilling to participate in the collaborative learning process due to the resource consumption. To address the issues, there have been various works recently proposed to motivate the data owners to contribute their resources. In this paper, we provide a comprehensive review for the economic and game theoretic approaches proposed in the literature to design various schemes for stimulating data owners to participate in FL training process. In particular, we first present the fundamentals and background of FL, economic theories commonly used in incentive mechanism design. Then, we review applications of game theory and economic approaches applied for incentive mechanisms design of FL. Finally, we highlight some open issues and future research directions concerning incentive mechanism design of FL.

NEMay 11, 2021
A Hybrid Decomposition-based Multi-objective Evolutionary Algorithm for the Multi-Point Dynamic Aggregation Problem

Guanqiang Gao, Bin Xin, Yi Mei et al.

An emerging optimisation problem from the real-world applications, named the multi-point dynamic aggregation (MPDA) problem, has become one of the active research topics of the multi-robot system. This paper focuses on a multi-objective MPDA problem which is to design an execution plan of the robots to minimise the number of robots and the maximal completion time of all the tasks. The strongly-coupled relationships among robots and tasks, the redundancy of the MPDA encoding, and the variable-size decision space of the MO-MPDA problem posed extra challenges for addressing the problem effectively. To address the above issues, we develop a hybrid decomposition-based multi-objective evolutionary algorithm (HDMOEA) using $ \varepsilon $-constraint method. It selects the maximal completion time of all tasks as the main objective, and converted the other objective into constraints. HDMOEA decomposes a MO-MPDA problem into a series of scalar constrained optimization subproblems by assigning each subproblem with an upper bound robot number. All the subproblems are optimized simultaneously with the transferring knowledge from other subproblems. Besides, we develop a hybrid population initialisation mechanism to enhance the quality of initial solutions, and a reproduction mechanism to transmit effective information and tackle the encoding redundancy. Experimental results show that the proposed HDMOEA method significantly outperforms the state-of-the-art methods in terms of several most-used metrics.

CVApr 21, 2021
Machine vision detection to daily facial fatigue with a nonlocal 3D attention network

Zeyu Chen, Xinhang Zhang, Juan Li et al.

Fatigue detection is valued for people to keep mental health and prevent safety accidents. However, detecting facial fatigue, especially mild fatigue in the real world via machine vision is still a challenging issue due to lack of non-lab dataset and well-defined algorithms. In order to improve the detection capability on facial fatigue that can be used widely in daily life, this paper provided an audiovisual dataset named DLFD (daily-life fatigue dataset) which reflected people's facial fatigue state in the wild. A framework using 3D-ResNet along with non-local attention mechanism was training for extraction of local and long-range features in spatial and temporal dimensions. Then, a compacted loss function combining mean squared error and cross-entropy was designed to predict both continuous and categorical fatigue degrees. Our proposed framework has reached an average accuracy of 90.8% on validation set and 72.5% on test set for binary classification, standing a good position compared to other state-of-the-art methods. The analysis of feature map visualization revealed that our framework captured facial dynamics and attempted to build a connection with fatigue state. Our experimental results in multiple metrics proved that our framework captured some typical, micro and dynamic facial features along spatiotemporal dimensions, contributing to the mild fatigue detection in the wild.

CLOct 30, 2020
Logic-guided Semantic Representation Learning for Zero-Shot Relation Classification

Juan Li, Ruoxu Wang, Ningyu Zhang et al.

Relation classification aims to extract semantic relations between entity pairs from the sentences. However, most existing methods can only identify seen relation classes that occurred during training. To recognize unseen relations at test time, we explore the problem of zero-shot relation classification. Previous work regards the problem as reading comprehension or textual entailment, which have to rely on artificial descriptive information to improve the understandability of relation types. Thus, rich semantic knowledge of the relation labels is ignored. In this paper, we propose a novel logic-guided semantic representation learning model for zero-shot relation classification. Our approach builds connections between seen and unseen relations via implicit and explicit semantic representations with knowledge graph embeddings and logic rules. Extensive experimental results demonstrate that our method can generalize to unseen relation types and achieve promising improvements.

MMSep 29, 2019
MG-VAE: Deep Chinese Folk Songs Generation with Specific Regional Style

Jing Luo, Xinyu Yang, Shulei Ji et al.

Regional style in Chinese folk songs is a rich treasure that can be used for ethnic music creation and folk culture research. In this paper, we propose MG-VAE, a music generative model based on VAE (Variational Auto-Encoder) that is capable of capturing specific music style and generating novel tunes for Chinese folk songs (Min Ge) in a manipulatable way. Specifically, we disentangle the latent space of VAE into four parts in an adversarial training way to control the information of pitch and rhythm sequence, as well as of music style and content. In detail, two classifiers are used to separate style and content latent space, and temporal supervision is utilized to disentangle the pitch and rhythm sequence. The experimental results show that the disentanglement is successful and our model is able to create novel folk songs with controllable regional styles. To our best knowledge, this is the first study on applying deep generative model and adversarial training for Chinese music generation.