Darian Li

CL
h-index10
4papers
31citations
Novelty38%
AI Score38

4 Papers

CLFeb 16, 2025Code
Exposing Numeracy Gaps: A Benchmark to Evaluate Fundamental Numerical Abilities in Large Language Models

Haoyang Li, Xuejia Chen, Zhanchao XU et al.

Large Language Models (LLMs) have demonstrated impressive capabilities in natural language processing tasks, such as text generation and semantic understanding. However, their performance on numerical reasoning tasks, such as basic arithmetic, numerical retrieval, and magnitude comparison, remains surprisingly poor. This gap arises from their reliance on surface-level statistical patterns rather than understanding numbers as continuous magnitudes. Existing benchmarks primarily focus on either linguistic competence or structured mathematical problem-solving, neglecting fundamental numerical reasoning required in real-world scenarios. To bridge this gap, we propose NumericBench, a comprehensive benchmark to evaluate six fundamental numerical capabilities: number recognition, arithmetic operations, contextual retrieval, comparison, summary, and logical reasoning. NumericBench includes datasets ranging from synthetic number lists to the crawled real-world data, addressing challenges like long contexts, noise, and multi-step reasoning. Extensive experiments on state-of-the-art LLMs, including GPT-4 and DeepSeek, reveal persistent weaknesses in numerical reasoning, highlighting the urgent need to improve numerically-aware language modeling. The benchmark is released in: https://github.com/TreeAI-Lab/NumericBench.

CLApr 3, 2025
A Survey of Large Language Models in Mental Health Disorder Detection on Social Media

Zhuohan Ge, Nicole Hu, Darian Li et al.

The detection and intervention of mental health issues represent a critical global research focus, and social media data has been recognized as an important resource for mental health research. However, how to utilize Large Language Models (LLMs) for mental health problem detection on social media poses significant challenges. Hence, this paper aims to explore the potential of LLM applications in social media data analysis, focusing not only on the most common psychological disorders such as depression and anxiety but also incorporating psychotic disorders and externalizing disorders, summarizing the application methods of LLM from different dimensions, such as text data analysis and detection of mental disorders, and revealing the major challenges and shortcomings of current research. In addition, the paper provides an overview of popular datasets, and evaluation metrics. The survey in this paper provides a comprehensive frame of reference for researchers in the field of mental health, while demonstrating the great potential of LLMs in mental health detection to facilitate the further application of LLMs in future mental health interventions.

CLJul 18, 2025
LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues

Haoyang Li, Zhanchao Xu, Yiming Li et al.

Multi-turn dialogues are essential in many real-world applications of large language models, such as chatbots and virtual assistants. As conversation histories become longer, existing large language models face increasing computational and memory challenges, which hinder their ability to provide efficient and responsive interactions. Most current acceleration methods either compress the context or optimize key value caching, but they often rely on fixed or position-based heuristics that do not adapt well to the dynamic and unpredictable patterns found in actual multi-turn conversations. As a result, these models cannot accurately identify and prioritize the most relevant context, leading to degraded response quality. In this paper, we present LoopServe, an adaptive dual-phase inference acceleration framework for large language models in multi-turn dialogues. LoopServe introduces two main innovations. First, it performs online sparsification during the prefilling phase by dynamically selecting the most important parts of the attention matrix for each new input. Second, it uses progressive key value compression during decoding by adaptively maintaining a relevant and efficient cache based on the most recently generated output tokens. We also propose a new benchmark with eleven multi-turn datasets that reflect realistic query positions and conversational dependencies. Extensive experiments demonstrate that LoopServe consistently achieves superior effectiveness compared to existing baselines and significantly accelerates LLM inference across a wide range of long-context dialogue tasks.

AIJul 18, 2025
When Speed meets Accuracy: an Efficient and Effective Graph Model for Temporal Link Prediction

Haoyang Li, Yuming Xu, Yiming Li et al.

Temporal link prediction in dynamic graphs is a critical task with applications in diverse domains such as social networks, recommendation systems, and e-commerce platforms. While existing Temporal Graph Neural Networks (T-GNNs) have achieved notable success by leveraging complex architectures to model temporal and structural dependencies, they often suffer from scalability and efficiency challenges due to high computational overhead. In this paper, we propose EAGLE, a lightweight framework that integrates short-term temporal recency and long-term global structural patterns. EAGLE consists of a time-aware module that aggregates information from a node's most recent neighbors to reflect its immediate preferences, and a structure-aware module that leverages temporal personalized PageRank to capture the influence of globally important nodes. To balance these attributes, EAGLE employs an adaptive weighting mechanism to dynamically adjust their contributions based on data characteristics. Also, EAGLE eliminates the need for complex multi-hop message passing or memory-intensive mechanisms, enabling significant improvements in efficiency. Extensive experiments on seven real-world temporal graphs demonstrate that EAGLE consistently achieves superior performance against state-of-the-art T-GNNs in both effectiveness and efficiency, delivering more than a 50x speedup over effective transformer-based T-GNNs.