Haotian Li

HC
h-index28
39papers
705citations
Novelty50%
AI Score57

39 Papers

ROJan 12, 2023Code
ImMesh: An Immediate LiDAR Localization and Meshing Framework

Jiarong Lin, Chongjiang Yuan, Yixi Cai et al.

In this paper, we propose a novel LiDAR(-inertial) odometry and mapping framework to achieve the goal of simultaneous localization and meshing in real-time. This proposed framework termed ImMesh comprises four tightly-coupled modules: receiver, localization, meshing, and broadcaster. The localization module utilizes the prepossessed sensor data from the receiver, estimates the sensor pose online by registering LiDAR scans to maps, and dynamically grows the map. Then, our meshing module takes the registered LiDAR scan for incrementally reconstructing the triangle mesh on the fly. Finally, the real-time odometry, map, and mesh are published via our broadcaster. The key contribution of this work is the meshing module, which represents a scene by an efficient hierarchical voxels structure, performs fast finding of voxels observed by new scans, and reconstructs triangle facets in each voxel in an incremental manner. This voxel-wise meshing operation is delicately designed for the purpose of efficiency; it first performs a dimension reduction by projecting 3D points to a 2D local plane contained in the voxel, and then executes the meshing operation with pull, commit and push steps for incremental reconstruction of triangle facets. To the best of our knowledge, this is the first work in literature that can reconstruct online the triangle mesh of large-scale scenes, just relying on a standard CPU without GPU acceleration. To share our findings and make contributions to the community, we make our code publicly available on our GitHub: https://github.com/hku-mars/ImMesh.

HCApr 17, 2023
Why is AI not a Panacea for Data Workers? An Interview Study on Human-AI Collaboration in Data Storytelling

Haotian Li, Yun Wang, Q. Vera Liao et al.

Data storytelling plays an important role in data workers' daily jobs since it boosts team collaboration and public communication. However, to make an appealing data story, data workers spend tremendous efforts on various tasks, including outlining and styling the story. Recently, a growing research trend has been exploring how to assist data storytelling with advanced artificial intelligence (AI). However, existing studies may focus on individual tasks in the workflow of data storytelling and do not reveal a complete picture of humans' preference for collaborating with AI. To better understand real-world needs, we interviewed eighteen data workers from both industry and academia to learn where and how they would like to collaborate with AI. Surprisingly, though the participants showed excitement about collaborating with AI, many of them also expressed reluctance and pointed out nuanced reasons. Based on their responses, we first characterize stages and tasks in the practical data storytelling workflows and the desired roles of AI. Then the preferred collaboration patterns in different tasks are identified. Next, we summarize the interviewees' reasons why and why not they would like to collaborate with AI. Finally, we provide suggestions for human-AI collaborative data storytelling to hopefully shed light on future related research.

HCSep 27, 2023
Where Are We So Far? Understanding Data Storytelling Tools from the Perspective of Human-AI Collaboration

Haotian Li, Yun Wang, Huamin Qu

Data storytelling is powerful for communicating data insights, but it requires diverse skills and considerable effort from human creators. Recent research has widely explored the potential for artificial intelligence (AI) to support and augment humans in data storytelling. However, there lacks a systematic review to understand data storytelling tools from the perspective of human-AI collaboration, which hinders researchers from reflecting on the existing collaborative tool designs that promote humans' and AI's advantages and mitigate their shortcomings. This paper investigated existing tools with a framework from two perspectives: the stages in the storytelling workflow where a tool serves, including analysis, planning, implementation, and communication, and the roles of humans and AI in each stage, such as creators, assistants, optimizers, and reviewers. Through our analysis, we recognize the common collaboration patterns in existing tools, summarize lessons learned from these patterns, and further illustrate research opportunities for human-AI collaboration in data storytelling.

SPMay 13, 2022
A microstructure estimation Transformer inspired by sparse representation for diffusion MRI

Tianshu Zheng, Cong Sun, Weihao Zheng et al.

Diffusion magnetic resonance imaging (dMRI) is an important tool in characterizing tissue microstructure based on biophysical models, which are complex and highly non-linear. Resolving microstructures with optimization techniques is prone to estimation errors and requires dense sampling in the q-space. Deep learning based approaches have been proposed to overcome these limitations. Motivated by the superior performance of the Transformer, in this work, we present a learning-based framework based on Transformer, namely, a Microstructure Estimation Transformer with Sparse Coding (METSC) for dMRI-based microstructure estimation with downsampled q-space data. To take advantage of the Transformer while addressing its limitation in large training data requirements, we explicitly introduce an inductive bias - model bias into the Transformer using a sparse coding technique to facilitate the training process. Thus, the METSC is composed with three stages, an embedding stage, a sparse representation stage, and a mapping stage. The embedding stage is a Transformer-based structure that encodes the signal to ensure the voxel is represented effectively. In the sparse representation stage, a dictionary is constructed by solving a sparse reconstruction problem that unfolds the Iterative Hard Thresholding (IHT) process. The mapping stage is essentially a decoder that computes the microstructural parameters from the output of the second stage, based on the weighted sum of normalized dictionary coefficients where the weights are also learned. We tested our framework on two dMRI models with downsampled q-space data, including the intravoxel incoherent motion (IVIM) model and the neurite orientation dispersion and density imaging (NODDI) model. The proposed method achieved up to 11.25 folds of acceleration in scan time and outperformed the other state-of-the-art learning-based methods.

HCJul 3, 2024
JailbreakHunter: A Visual Analytics Approach for Jailbreak Prompts Discovery from Large-Scale Human-LLM Conversational Datasets

Zhihua Jin, Shiyi Liu, Haotian Li et al.

Large Language Models (LLMs) have gained significant attention but also raised concerns due to the risk of misuse. Jailbreak prompts, a popular type of adversarial attack towards LLMs, have appeared and constantly evolved to breach the safety protocols of LLMs. To address this issue, LLMs are regularly updated with safety patches based on reported jailbreak prompts. However, malicious users often keep their successful jailbreak prompts private to exploit LLMs. To uncover these private jailbreak prompts, extensive analysis of large-scale conversational datasets is necessary to identify prompts that still manage to bypass the system's defenses. This task is highly challenging due to the immense volume of conversation data, diverse characteristics of jailbreak prompts, and their presence in complex multi-turn conversations. To tackle these challenges, we introduce JailbreakHunter, a visual analytics approach for identifying jailbreak prompts in large-scale human-LLM conversational datasets. We have designed a workflow with three analysis levels: group-level, conversation-level, and turn-level. Group-level analysis enables users to grasp the distribution of conversations and identify suspicious conversations using multiple criteria, such as similarity with reported jailbreak prompts in previous research and attack success rates. Conversation-level analysis facilitates the understanding of the progress of conversations and helps discover jailbreak prompts within their conversation contexts. Turn-level analysis allows users to explore the semantic similarity and token overlap between a singleturn prompt and the reported jailbreak prompts, aiding in the identification of new jailbreak strategies. The effectiveness and usability of the system were verified through multiple case studies and expert interviews.

AIFeb 11
To Think or Not To Think, That is The Question for Large Reasoning Models in Theory of Mind Tasks

Nanxu Gong, Haotian Li, Sixun Dong et al.

Theory of Mind (ToM) assesses whether models can infer hidden mental states such as beliefs, desires, and intentions, which is essential for natural social interaction. Although recent progress in Large Reasoning Models (LRMs) has boosted step-by-step inference in mathematics and coding, it is still underexplored whether this benefit transfers to socio-cognitive skills. We present a systematic study of nine advanced Large Language Models (LLMs), comparing reasoning models with non-reasoning models on three representative ToM benchmarks. The results show that reasoning models do not consistently outperform non-reasoning models and sometimes perform worse. A fine-grained analysis reveals three insights. First, slow thinking collapses: accuracy significantly drops as responses grow longer, and larger reasoning budgets hurt performance. Second, moderate and adaptive reasoning benefits performance: constraining reasoning length mitigates failure, while distinct success patterns demonstrate the necessity of dynamic adaptation. Third, option matching shortcut: when multiple choice options are removed, reasoning models improve markedly, indicating reliance on option matching rather than genuine deduction. We also design two intervention approaches: Slow-to-Fast (S2F) adaptive reasoning and Think-to-Match (T2M) shortcut prevention to further verify and mitigate the problems. With all results, our study highlights the advancement of LRMs in formal reasoning (e.g., math, code) cannot be fully transferred to ToM, a typical task in social reasoning. We conclude that achieving robust ToM requires developing unique capabilities beyond existing reasoning methods.

AIJan 26Code
Yunjue Agent Tech Report: A Fully Reproducible, Zero-Start In-Situ Self-Evolving Agent System for Open-Ended Tasks

Haotian Li, Shijun Yang, Weizhen Qi et al.

Conventional agent systems often struggle in open-ended environments where task distributions continuously drift and external supervision is scarce. Their reliance on static toolsets or offline training lags behind these dynamics, leaving the system's capability boundaries rigid and unknown. To address this, we propose the In-Situ Self-Evolving paradigm. This approach treats sequential task interactions as a continuous stream of experience, enabling the system to distill short-term execution feedback into long-term, reusable capabilities without access to ground-truth labels. Within this framework, we identify tool evolution as the critical pathway for capability expansion, which provides verifiable, binary feedback signals. Within this framework, we develop Yunjue Agent, a system that iteratively synthesizes, optimizes, and reuses tools to navigate emerging challenges. To optimize evolutionary efficiency, we further introduce a Parallel Batch Evolution strategy. Empirical evaluations across five diverse benchmarks under a zero-start setting demonstrate significant performance gains over proprietary baselines. Additionally, complementary warm-start evaluations confirm that the accumulated general knowledge can be seamlessly transferred to novel domains. Finally, we propose a novel metric to monitor evolution convergence, serving as a function analogous to training loss in conventional optimization. We open-source our codebase, system traces, and evolved tools to facilitate future research in resilient, self-evolving intelligence.

CLSep 26, 2023
KERMIT: Knowledge Graph Completion of Enhanced Relation Modeling with Inverse Transformation

Haotian Li, Bin Yu, Yuliang Wei et al.

Knowledge graph completion (KGC) revolves around populating missing triples in a knowledge graph using available information. Text-based methods, which depend on textual descriptions of triples, often encounter difficulties when these descriptions lack sufficient information for accurate prediction-an issue inherent to the datasets and not easily resolved through modeling alone. To address this and ensure data consistency, we first use large language models (LLMs) to generate coherent descriptions, bridging the semantic gap between queries and answers. Secondly, we utilize inverse relations to create a symmetric graph, thereby providing augmented training samples for KGC. Additionally, we employ the label information inherent in knowledge graphs (KGs) to enhance the existing contrastive framework, making it fully supervised. These efforts have led to significant performance improvements on the WN18RR and FB15k-237 datasets. According to standard evaluation metrics, our approach achieves a 4.2% improvement in Hit@1 on WN18RR and a 3.4% improvement in Hit@3 on FB15k-237, demonstrating superior performance.

92.6CLMay 18
Multi-agent AI systems outperform human teams in creativity

Tiancheng Hu, Yixuan Jiang, Haotian Li et al.

Although artificial intelligence (AI) now matches or exceeds human performance across numerous cognitive tasks, creativity remains a highly contested frontier. As AI systems based on large language models (LLMs) are increasingly adopted in research and innovation, it is essential to understand and augment their creativity. Here we demonstrate that multi-agent LLM teams not only surpass single agents, but also substantially outperform human teams in creativity (Cohen's d=1.50) across 4,541 multi-agent LLM ideas and 341 human-team ideas on six diverse problem-solving tasks. This advantage is driven by novelty while maintaining comparable usefulness. To investigate the generative processes in both groups, we represent conversations as paths through semantic space using neural language model representations. Both LLM and human teams produce more creative ideas when conversations range widely rather than staying centered on a single theme (low global coherence). However, the additional patterns that predict creativity differ: LLM teams benefit from efficient exploration (high semantic spread, shorter paths), while human teams benefit from maintaining smooth conversational flow (high local coherence, frequent pivots). Additionally, we identify model choice and discussion structure as orthogonal design levers that together explain 26.8% of variance in LLM conversational dynamics, paving the way for systematic approaches to developing multi-agent systems with augmented creative capabilities.

89.3AIMar 10
Social-R1: Towards Human-like Social Reasoning in LLMs

Jincenzi Wu, Yuxuan Lei, Jianxun Lian et al.

While large language models demonstrate remarkable capabilities across numerous domains, social intelligence - the capacity to perceive social cues, infer mental states, and generate appropriate responses - remains a critical challenge, particularly for enabling effective human-AI collaboration and developing AI that truly serves human needs. Current models often rely on superficial patterns rather than genuine social reasoning. We argue that cultivating human-like social intelligence requires training with challenging cases that resist shortcut solutions. To this end, we introduce ToMBench-Hard, an adversarial benchmark designed to provide hard training examples for social reasoning. Building on this, we propose Social-R1, a reinforcement learning framework that aligns model reasoning with human cognition through multi-dimensional rewards. Unlike outcome-based RL, Social-R1 supervises the entire reasoning process, enforcing structural alignment, logical integrity, and information density. Results show that our approach enables a 4B parameter model to surpass much larger counterparts and generalize robustly across eight diverse benchmarks. These findings demonstrate that challenging training cases with trajectory-level alignment offer a path toward efficient and reliable social intelligence.

CVMay 20, 2025Code
MORALISE: A Structured Benchmark for Moral Alignment in Visual Language Models

Xiao Lin, Zhining Liu, Ze Yang et al.

Warning: This paper contains examples of harmful language and images. Reader discretion is advised. Recently, vision-language models have demonstrated increasing influence in morally sensitive domains such as autonomous driving and medical analysis, owing to their powerful multimodal reasoning capabilities. As these models are deployed in high-stakes real-world applications, it is of paramount importance to ensure that their outputs align with human moral values and remain within moral boundaries. However, existing work on moral alignment either focuses solely on textual modalities or relies heavily on AI-generated images, leading to distributional biases and reduced realism. To overcome these limitations, we introduce MORALISE, a comprehensive benchmark for evaluating the moral alignment of vision-language models (VLMs) using diverse, expert-verified real-world data. We begin by proposing a comprehensive taxonomy of 13 moral topics grounded in Turiel's Domain Theory, spanning the personal, interpersonal, and societal moral domains encountered in everyday life. Built on this framework, we manually curate 2,481 high-quality image-text pairs, each annotated with two fine-grained labels: (1) topic annotation, identifying the violated moral topic(s), and (2) modality annotation, indicating whether the violation arises from the image or the text. For evaluation, we encompass two tasks, \textit{moral judgment} and \textit{moral norm attribution}, to assess models' awareness of moral violations and their reasoning ability on morally salient content. Extensive experiments on 19 popular open- and closed-source VLMs show that MORALISE poses a significant challenge, revealing persistent moral limitations in current state-of-the-art models. The full benchmark is publicly available at https://huggingface.co/datasets/Ze1025/MORALISE.

LGMay 29, 2025Code
Two Is Better Than One: Rotations Scale LoRAs

Hongcan Guo, Guoshun Nan, Yuan Yang et al.

Scaling Low-Rank Adaptation (LoRA)-based Mixture-of-Experts (MoE) facilitates large language models (LLMs) to efficiently adapt to diverse tasks. However, traditional gating mechanisms that route inputs to the best experts may fundamentally hinder LLMs' scalability, leading to poor generalization and underfitting issues. We identify that the root cause lies in the restricted expressiveness of existing weighted-sum mechanisms, both within and outside the convex cone of LoRA representations. This motivates us to propose RadarGate, a novel geometrically inspired gating method that introduces rotational operations of LoRAs representations to boost the expressiveness and facilitate richer feature interactions among multiple LoRAs for scalable LLMs. Specifically, we first fuse each LoRA representation to other LoRAs using a learnable component and then feed the output to a rotation matrix. This matrix involves learnable parameters that define the relative angular relationship between LoRA representations. Such a simple yet effective mechanism provides an extra degree of freedom, facilitating the learning of cross-LoRA synergies and properly tracking the challenging poor generalization and underfitting issues as the number of LoRA grows. Extensive experiments on 6 public benchmarks across 21 tasks show the effectiveness of our RadarGate for scaling LoRAs. We also provide valuable insights, revealing that the rotations to each pair of representations are contrastive, encouraging closer alignment of semantically similar representations during geometrical transformation while pushing distance ones further apart. We will release our code to the community.

AIMay 18, 2025Code
Enhancing Knowledge Graph Completion with GNN Distillation and Probabilistic Interaction Modeling

Lingzhi Wang, Pengcheng Huang, Haotian Li et al.

Knowledge graphs (KGs) serve as fundamental structures for organizing interconnected data across diverse domains. However, most KGs remain incomplete, limiting their effectiveness in downstream applications. Knowledge graph completion (KGC) aims to address this issue by inferring missing links, but existing methods face critical challenges: deep graph neural networks (GNNs) suffer from over-smoothing, while embedding-based models fail to capture abstract relational features. This study aims to overcome these limitations by proposing a unified framework that integrates GNN distillation and abstract probabilistic interaction modeling (APIM). GNN distillation approach introduces an iterative message-feature filtering process to mitigate over-smoothing, preserving the discriminative power of node representations. APIM module complements this by learning structured, abstract interaction patterns through probabilistic signatures and transition matrices, allowing for a richer, more flexible representation of entity and relation interactions. We apply these methods to GNN-based models and the APIM to embedding-based KGC models, conducting extensive evaluations on the widely used WN18RR and FB15K-237 datasets. Our results demonstrate significant performance gains over baseline models, showcasing the effectiveness of the proposed techniques. The findings highlight the importance of both controlling information propagation and leveraging structured probabilistic modeling, offering new avenues for advancing knowledge graph completion. And our codes are available at https://anonymous.4open.science/r/APIM_and_GNN-Distillation-461C.

CLMay 6, 2025
Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models

Bin Yu, Hang Yuan, Haotian Li et al.

Recent advances in large language models have demonstrated that Supervised Fine-Tuning (SFT) with Chain-of-Thought (CoT) reasoning data distilled from large reasoning models (e.g., DeepSeek R1) can effectively transfer reasoning capabilities to non-reasoning models. However, models fine-tuned with this approach inherit the "overthinking" problem from teacher models, producing verbose and redundant reasoning chains during inference. To address this challenge, we propose Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT), which combines long CoT reasoning dataset with their short counterparts obtained through structure-preserved rewriting. Our experiments demonstrate that models trained using the LS-Mixture SFT method, compared to those trained with direct SFT, achieved an average accuracy improvement of 2.3% across various benchmarks while substantially reducing model response length by approximately 47.61%. This work offers an approach to endow non-reasoning models with reasoning capabilities through supervised fine-tuning while avoiding the inherent overthinking problems inherited from teacher models, thereby enabling efficient reasoning in the fine-tuned models.

62.7AIApr 28
Does Theory of Mind Improvement Really Benefit Human-AI Interactions? Empirical Findings from Interactive Evaluations

Nanxu Gong, Zixin Chen, Haotian Li et al.

Improving the Theory of Mind (ToM) capability of Large Language Models (LLMs) is crucial for effective social interactions between these AI models and humans. However, the existing benchmarks often measure ToM capability improvement through story-reading, multiple-choice questions from a third-person perspective, while ignoring the first-person, dynamic, and open-ended nature of human-AI (HAI) interactions. To directly examine how ToM improvement techniques benefit HAI interactions, we first proposed the new paradigm of interactive ToM evaluation with both perspective and metric shifts. Next, following the paradigm, we conducted a systematic study of four representative ToM enhancement techniques using both four real-world datasets and a user study, covering both goal-oriented tasks (e.g., coding, math) and experience-oriented tasks (e.g., counseling). Our findings reveal that improvements on static benchmarks do not always translate to better performance in dynamic HAI interactions. This paper offers critical insights into ToM evaluation, showing the necessity of interaction-based assessments in developing next-generation, socially aware LLMs for HAI symbiosis.

CLMay 23, 2025
Not All Tokens Are What You Need In Thinking

Hang Yuan, Bin Yu, Haotian Li et al.

Modern reasoning models, such as OpenAI's o1 and DeepSeek-R1, exhibit impressive problem-solving capabilities but suffer from critical inefficiencies: high inference latency, excessive computational resource consumption, and a tendency toward overthinking -- generating verbose chains of thought (CoT) laden with redundant tokens that contribute minimally to the final answer. To address these issues, we propose Conditional Token Selection (CTS), a token-level compression framework with a flexible and variable compression ratio that identifies and preserves only the most essential tokens in CoT. CTS evaluates each token's contribution to deriving correct answers using conditional importance scoring, then trains models on compressed CoT. Extensive experiments demonstrate that CTS effectively compresses long CoT while maintaining strong reasoning performance. Notably, on the GPQA benchmark, Qwen2.5-14B-Instruct trained with CTS achieves a 9.1% accuracy improvement with 13.2% fewer reasoning tokens (13% training token reduction). Further reducing training tokens by 42% incurs only a marginal 5% accuracy drop while yielding a 75.8% reduction in reasoning tokens, highlighting the prevalence of redundancy in existing CoT.

96.8HCApr 9
Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery

Yifang Wang, Rui Sheng, Erzhuo Shao et al.

Large language models (LLMs) are transforming scientific workflows, not only through their generative capabilities but also through their emerging ability to use tools, reason about data, and coordinate complex analytical tasks. Yet in most human-AI collaborations, the primary outputs, figures, are still treated as static visual summaries: once rendered, they are handled by both humans and multimodal LLMs as images to be re-interpreted from pixels or captions. The emergent capabilities of LLMs open an opportunity to fundamentally rethink this paradigm. In this paper, we introduce the concept of LLM-native figures: data-driven artifacts that are simultaneously human-legible and machine-addressable. Unlike traditional plots, each artifact embeds complete provenance: the data subset, analytical operations and code, and visualization specification used to generate it. As a result, an LLM can "see through" the figure--tracing selections back to their sources, generating code to extend analyses, and orchestrating new visualizations through natural-language instructions or direct manipulation. We implement this concept through a hybrid language-visual interface that integrates LLM agents with a bidirectional mapping between figures and underlying data. Using the science of science domain as a testbed, we demonstrate that LLM-native figures can accelerate discovery, improve reproducibility, and make reasoning transparent across agents and users. More broadly, this work establishes a general framework for embedding provenance, interactivity, and explainability into the artifacts of modern research, redefining the figure not as an end product, but as an interface for discovery. For more details, please refer to the demo video available at www.llm-native-figure.com.

AIJan 19
SCULPT: Constraint-Guided Pruned MCTS that Carves Efficient Paths for Mathematical Reasoning

Qitong Fang, Haotian Li, Xu Wang

Automated agent workflows can enhance the problem-solving ability of large language models (LLMs), but common search strategies rely on stochastic exploration and often traverse implausible branches. This occurs because current pipelines sample candidate steps from generic prompts or learned policies with weak domain priors, yielding near-random walks over operators, units, and formats. To promote ordered exploration, this paper introduces SCULPT, a constraint-guided approach for Monte Carlo Tree Search (MCTS) that integrates domain-aware scoring into selection, expansion, simulation, and backpropagation. SCULPT scores and prunes actions using a combination of symbolic checks (dimensional consistency, type compatibility, magnitude sanity, depth control, and diversity) and structural pattern guidance, thereby steering the search toward plausible reasoning paths. Under matched LLM configurations, SCULPT yields stable improvements on multiple datasets; additional results with GPT-5.2 assess executor transferability and performance on frontier reasoning models. Overall, domain-aware constraints can improve accuracy while maintaining efficiency and reasoning stability.

HCMar 7
From Passive Consumption to Active Interaction: Exploring Interactive LLM Scaffolding to Support Learning Engagement

Zixin Chen, Haotian Li, Zhe Liu et al.

Large Language Models (LLMs) are increasingly used as learning companions, providing scaffolded explanations, hints, or step-by-step guidance. However, in current LLM-based learning scenarios, scaffolded content is primarily consumed passively, offering limited support for active learner engagement. Learning science research suggests that effective educational scaffolding depends not only on what support is provided, but also on how learners engage with it. In this work, we explore whether embedding lightweight interactive components into LLM-generated scaffolding responses can promote learning-oriented engagement and improve short-term learning outcomes. We evaluated this approach through a within-subjects laboratory study (N=8). Results provide initial evidence that interactive scaffolding increases learners' perceived engagement and attentional focus, while supporting short-term learning performance. We conclude with design implications for integrating interaction into LLM-generated scaffolding to support active learning engagement.

LGFeb 12
RAM-Net: Expressive Linear Attention with Selectively Addressable Memory

Kaicheng Xiao, Haotian Li, Liran Dong et al.

While linear attention architectures offer efficient inference, compressing unbounded history into a fixed-size memory inherently limits expressivity and causes information loss. To address this limitation, we introduce Random Access Memory Network (RAM-Net), a novel architecture designed to bridge the gap between the representational capacity of full attention and the memory efficiency of linear models. The core of RAM-Net maps inputs to high-dimensional sparse vectors serving as explicit addresses, allowing the model to selectively access a massive memory state. This design enables exponential state size scaling without additional parameters, which significantly mitigates signal interference and enhances retrieval fidelity. Moreover, the inherent sparsity ensures exceptional computational efficiency, as state updates are confined to minimal entries. Extensive experiments demonstrate that RAM-Net consistently surpasses state-of-the-art baselines in fine-grained long-range retrieval tasks and achieves competitive performance in standard language modeling and zero-shot commonsense reasoning benchmarks, validating its superior capability to capture complex dependencies with significantly reduced computational overhead.

CLOct 18, 2025
TrajSelector: Harnessing Latent Representations for Efficient and Effective Best-of-N in Large Reasoning Model

Bin Yu, Xinming Wang, Shijie Lian et al.

Large language models (LLMs) have shown remarkable progress in complex reasoning tasks, largely enabled by test-time scaling (TTS) paradigms that allocate additional compute during inference. Among these, external TTS (particularly the Best-of-N selection paradigm) yields scalable performance improvements by selecting from multiple independently generated reasoning trajectories. However, this approach faces key limitations: (i) the high computational overhead of deploying process reward models, (ii) the underutilization of the LLM's intrinsic latent representations. We introduce TrajSelector, an efficient and effective Best-of-N framework that exploit the hidden states in the sampler LLM for process-level scoring. A lightweight verifier (with only 0.6B parameters) evaluates the quality of step-wise trajectory, and then aggregates these scores to identify the optimal reasoning trajectory. Our framework employs a fully data-driven, end-to-end training recipe that eliminates reliance on massive step-level annotations. Experiential results across five benchmarks demonstrate that TrajSelector delivers consistent performance gains. In Best-of-32 settings, it surpasses majority voting by 4.61% accuracy and outperforms existing process reward models by 4.31% to 12.21%, all while maintaining lower inference costs.

IVSep 29, 2025
Non-Invasive Detection of PROState Cancer with Novel Time-Dependent Diffusion MRI and AI-Enhanced Quantitative Radiological Interpretation: PROS-TD-AI

Baltasar Ramos, Cristian Garrido, Paulette Narv'aez et al.

Prostate cancer (PCa) is the most frequently diagnosed malignancy in men and the eighth leading cause of cancer death worldwide. Multiparametric MRI (mpMRI) has become central to the diagnostic pathway for men at intermediate risk, improving de-tection of clinically significant PCa (csPCa) while reducing unnecessary biopsies and over-diagnosis. However, mpMRI remains limited by false positives, false negatives, and moderate to substantial interobserver agreement. Time-dependent diffusion (TDD) MRI, a novel sequence that enables tissue microstructure characterization, has shown encouraging preclinical performance in distinguishing clinically significant from insignificant PCa. Combining TDD-derived metrics with machine learning may provide robust, zone-specific risk prediction with less dependence on reader training and improved accuracy compared to current standard-of-care. This study protocol out-lines the rationale and describes the prospective evaluation of a home-developed AI-enhanced TDD-MRI software (PROSTDAI) in routine diagnostic care, assessing its added value against PI-RADS v2.1 and validating results against MRI-guided prostate biopsy.

CVSep 17, 2025
An Exploratory Study on Abstract Images and Visual Representations Learned from Them

Haotian Li, Jianbo Jiao

Imagine living in a world composed solely of primitive shapes, could you still recognise familiar objects? Recent studies have shown that abstract images-constructed by primitive shapes-can indeed convey visual semantic information to deep learning models. However, representations obtained from such images often fall short compared to those derived from traditional raster images. In this paper, we study the reasons behind this performance gap and investigate how much high-level semantic content can be captured at different abstraction levels. To this end, we introduce the Hierarchical Abstraction Image Dataset (HAID), a novel data collection that comprises abstract images generated from normal raster images at multiple levels of abstraction. We then train and evaluate conventional vision systems on HAID across various tasks including classification, segmentation, and object detection, providing a comprehensive study between rasterised and abstract image representations. We also discuss if the abstract image can be considered as a potentially effective format for conveying visual semantic information and contributing to vision tasks.

CVSep 16, 2025
SHREC 2025: Protein surface shape retrieval including electrostatic potential

Taher Yacoub, Camille Depenveiller, Atsushi Tatsuma et al.

This SHREC 2025 track dedicated to protein surface shape retrieval involved 9 participating teams. We evaluated the performance in retrieval of 15 proposed methods on a large dataset of 11,555 protein surfaces with calculated electrostatic potential (a key molecular surface descriptor). The performance in retrieval of the proposed methods was evaluated through different metrics (Accuracy, Balanced accuracy, F1 score, Precision and Recall). The best retrieval performance was achieved by the proposed methods that used the electrostatic potential complementary to molecular surface shape. This observation was also valid for classes with limited data which highlights the importance of taking into account additional molecular surface descriptors.

HCMar 4, 2025
Reflection on Data Storytelling Tools in the Generative AI Era from the Human-AI Collaboration Perspective

Haotian Li, Yun Wang, Huamin Qu

Human-AI collaborative tools attract attentions from the data storytelling community to lower the expertise barrier and streamline the workflow. The recent advance in large-scale generative AI techniques, e.g., large language models (LLMs) and text-to-image models, has the potential to enhance data storytelling with their power in visual and narration generation. After two years since these techniques were publicly available, it is important to reflect our progress of applying them and have an outlook for future opportunities. To achieve the goal, we compare the collaboration patterns of the latest tools with those of earlier ones using a dedicated framework for understanding human-AI collaboration in data storytelling. Through comparison, we identify consistently widely studied patterns, e.g., human-creator + AI-assistant, and newly explored or emerging ones, e.g., AI-creator + human-reviewer. The benefits of these AI techniques and implications to human-AI collaboration are also revealed. We further propose future directions to hopefully ignite innovations.

LGDec 20, 2024
Self-supervised Spatial-Temporal Learner for Precipitation Nowcasting

Haotian Li, Arno Siebes, Siamak Mehrkanoon

Nowcasting, the short-term prediction of weather, is essential for making timely and weather-dependent decisions. Specifically, precipitation nowcasting aims to predict precipitation at a local level within a 6-hour time frame. This task can be framed as a spatial-temporal sequence forecasting problem, where deep learning methods have been particularly effective. However, despite advancements in self-supervised learning, most successful methods for nowcasting remain fully supervised. Self-supervised learning is advantageous for pretraining models to learn representations without requiring extensive labeled data. In this work, we leverage the benefits of self-supervised learning and integrate it with spatial-temporal learning to develop a novel model, SpaT-SparK. SpaT-SparK comprises a CNN-based encoder-decoder structure pretrained with a masked image modeling (MIM) task and a translation network that captures temporal relationships among past and future precipitation maps in downstream tasks. We conducted experiments on the NL-50 dataset to evaluate the performance of SpaT-SparK. The results demonstrate that SpaT-SparK outperforms existing baseline supervised models, such as SmaAt-UNet, providing more accurate nowcasting predictions.

CLNov 24, 2024
Deep Sparse Latent Feature Models for Knowledge Graph Completion

Haotian Li, Rui Zhang, Lingzhi Wang et al.

Recent advances in knowledge graph completion (KGC) have emphasized text-based approaches to navigate the inherent complexities of large-scale knowledge graphs (KGs). While these methods have achieved notable progress, they frequently struggle to fully incorporate the global structural properties of the graph. Stochastic blockmodels (SBMs), especially the latent feature relational model (LFRM), offer robust probabilistic frameworks for identifying latent community structures and improving link prediction. This paper presents a novel probabilistic KGC framework utilizing sparse latent feature models, optimized via a deep variational autoencoder (VAE). Our proposed method dynamically integrates global clustering information with local textual features to effectively complete missing triples, while also providing enhanced interpretability of the underlying latent structures. Extensive experiments on four benchmark datasets with varying scales demonstrate the significant performance gains achieved by our method.

HCFeb 12, 2022
Structure-aware Visualization Retrieval

Haotian Li, Yong Wang, Aoyu Wu et al.

With the wide usage of data visualizations, a huge number of Scalable Vector Graphic (SVG)-based visualizations have been created and shared online. Accordingly, there has been an increasing interest in exploring how to retrieve perceptually similar visualizations from a large corpus, since it can benefit various downstream applications such as visualization recommendation. Existing methods mainly focus on the visual appearance of visualizations by regarding them as bitmap images. However, the structural information intrinsically existing in SVG-based visualizations is ignored. Such structural information can delineate the spatial and hierarchical relationship among visual elements, and characterize visualizations thoroughly from a new perspective. This paper presents a structure-aware method to advance the performance of visualization retrieval by collectively considering both the visual and structural information. We extensively evaluated our approach through quantitative comparisons, a user study and case studies. The results demonstrate the effectiveness of our approach and its advantages over existing methods.

LGDec 18, 2021
DegreEmbed: incorporating entity embedding into logic rule learning for knowledge graph reasoning

Haotian Li, Hongri Liu, Yao Wang et al.

Knowledge graphs (KGs), as structured representations of real world facts, are intelligent databases incorporating human knowledge that can help machine imitate the way of human problem solving. However, KGs are usually huge and there are inevitably missing facts in KGs, thus undermining applications such as question answering and recommender systems that are based on knowledge graph reasoning. Link prediction for knowledge graphs is the task aiming to complete missing facts by reasoning based on the existing knowledge. Two main streams of research are widely studied: one learns low-dimensional embeddings for entities and relations that can explore latent patterns, and the other gains good interpretability by mining logical rules. Unfortunately, the heterogeneity of modern KGs that involve entities and relations of various types is not well considered in the previous studies. In this paper, we propose DegreEmbed, a model that combines embedding-based learning and logic rule mining for inferring on KGs. Specifically, we study the problem of predicting missing links in heterogeneous KGs from the perspective of the degree of nodes. Experimentally, we demonstrate that our DegreEmbed model outperforms the state-of-the-art methods on real world datasets and the rules mined by our model are of high quality and interpretability.

AIDec 12, 2021
An original model for multi-target learning of logical rules for knowledge graph reasoning

Yuliang Wei, Haotian Li, Guodong Xin et al.

Large-scale knowledge graphs provide structured representations of human knowledge. However, as it is impossible to collect all knowledge, knowledge graphs are usually incomplete. Reasoning based on existing facts paves a way to discover missing facts. In this paper, we study the problem of learning logical rules for reasoning on knowledge graphs for completing missing factual triplets. Learning logical rules equips a model with strong interpretability as well as the ability to generalize to similar tasks. We propose a model able to fully use training data which also considers multi-target scenarios. In addition, considering the deficiency in evaluating the performance of models and the quality of mined rules, we further propose two novel indicators to help with the problem. Experimental results empirically demonstrate that our model outperforms state-of-the-art methods on five benchmark datasets. The results also prove the effectiveness of the indicators.

HCJul 27, 2021
KG4Vis: A Knowledge Graph-Based Approach for Visualization Recommendation

Haotian Li, Yong Wang, Songheng Zhang et al.

Visualization recommendation or automatic visualization generation can significantly lower the barriers for general users to rapidly create effective data visualizations, especially for those users without a background in data visualizations. However, existing rule-based approaches require tedious manual specifications of visualization rules by visualization experts. Other machine learning-based approaches often work like black-box and are difficult to understand why a specific visualization is recommended, limiting the wider adoption of these approaches. This paper fills the gap by presenting KG4Vis, a knowledge graph (KG)-based approach for visualization recommendation. It does not require manual specifications of visualization rules and can also guarantee good explainability. Specifically, we propose a framework for building knowledge graphs, consisting of three types of entities (i.e., data features, data columns and visualization design choices) and the relations between them, to model the mapping rules between data and effective visualizations. A TransE-based embedding technique is employed to learn the embeddings of both entities and relations of the knowledge graph from existing dataset-visualization pairs. Such embeddings intrinsically model the desirable visualization rules. Then, given a new dataset, effective visualizations can be inferred from the knowledge graph with semantically meaningful rules. We conducted extensive evaluations to assess the proposed approach, including quantitative comparisons, case studies and expert interviews. The results demonstrate the effectiveness of our approach.

HCMar 1, 2021
Deep Colormap Extraction from Visualizations

Lin-Ping Yuan, Wei Zeng, Siwei Fu et al.

This work presents a new approach based on deep learning to automatically extract colormaps from visualizations. After summarizing colors in an input visualization image as a Lab color histogram, we pass the histogram to a pre-trained deep neural network, which learns to predict the colormap that produces the visualization. To train the network, we create a new dataset of 64K visualizations that cover a wide variety of data distributions, chart types, and colormaps. The network adopts an atrous spatial pyramid pooling module to capture color features at multiple scales in the input color histograms. We then classify the predicted colormap as discrete or continuous and refine the predicted colormap based on its color histogram. Quantitative comparisons to existing methods show the superior performance of our approach on both synthetic and real-world visualizations. We further demonstrate the utility of our method with two use cases,i.e., color transfer and color remapping.

HCJan 20, 2021
A Visual Analytics Approach to Facilitate the Proctoring of Online Exams

Haotian Li, Min Xu, Yong Wang et al.

Online exams have become widely used to evaluate students' performance in mastering knowledge in recent years, especially during the pandemic of COVID-19. However, it is challenging to conduct proctoring for online exams due to the lack of face-to-face interaction. Also, prior research has shown that online exams are more vulnerable to various cheating behaviors, which can damage their credibility. This paper presents a novel visual analytics approach to facilitate the proctoring of online exams by analyzing the exam video records and mouse movement data of each student. Specifically, we detect and visualize suspected head and mouse movements of students in three levels of detail, which provides course instructors and teachers with convenient, efficient and reliable proctoring for online exams. Our extensive evaluations, including usage scenarios, a carefully-designed user study and expert interviews, demonstrate the effectiveness and usability of our approach.

CVJan 3, 2021
A Switched View of Retinex: Deep Self-Regularized Low-Light Image Enhancement

Zhuqing Jiang, Haotian Li, Liangjie Liu et al.

Self-regularized low-light image enhancement does not require any normal-light image in training, thereby freeing from the chains on paired or unpaired low-/normal-images. However, existing methods suffer color deviation and fail to generalize to various lighting conditions. This paper presents a novel self-regularized method based on Retinex, which, inspired by HSV, preserves all colors (Hue, Saturation) and only integrates Retinex theory into brightness (Value). We build a reflectance estimation network by restricting the consistency of reflectances embedded in both the original and a novel random disturbed form of the brightness of the same scene. The generated reflectance, which is assumed to be irrelevant of illumination by Retinex, is treated as enhanced brightness. Our method is efficient as a low-light image is decoupled into two subspaces, color and brightness, for better preservation and enhancement. Extensive experiments demonstrate that our method outperforms multiple state-of-the-art algorithms qualitatively and quantitatively and adapts to more lighting conditions.

HCAug 26, 2020
TradAO: A Visual Analytics System for Trading Algorithm Optimization

Ka Wing Tsang, Haotian Li, Fuk Ming Lam et al.

With the wide applications of algorithmic trading, it has become critical for traders to build a winning trading algorithm to beat the market. However, due to the lack of efficient tools, traders mainly rely on their memory to manually compare the algorithm instances of a trading algorithm and further select the best trading algorithm instance for the real trading deployment. We work closely with industry practitioners to discover and consolidate user requirements and develop an interactive visual analytics system for trading algorithm optimization. Structured expert interviews are conducted to evaluateTradAOand a representative case study is documented for illustrating the system effectiveness. To the best of our knowledge, previous financial data visual analyses have mainly aimed to assist investment managers in investment portfolio analysis but have neglected the need of traders in developing trading algorithms for portfolio execution.TradAOis the first visual analytics system that assists users in comprehensively exploring the performances of a trading algorithm with different parameter settings.

LGAug 4, 2020
Peer-inspired Student Performance Prediction in Interactive Online Question Pools with Graph Neural Network

Haotian Li, Huan Wei, Yong Wang et al.

Student performance prediction is critical to online education. It can benefit many downstream tasks on online learning platforms, such as estimating dropout rates, facilitating strategic intervention, and enabling adaptive online learning. Interactive online question pools provide students with interesting interactive questions to practice their knowledge in online education. However, little research has been done on student performance prediction in interactive online question pools. Existing work on student performance prediction targets at online learning platforms with predefined course curriculum and accurate knowledge labels like MOOC platforms, but they are not able to fully model knowledge evolution of students in interactive online question pools. In this paper, we propose a novel approach using Graph Neural Networks (GNNs) to achieve better student performance prediction in interactive online question pools. Specifically, we model the relationship between students and questions using student interactions to construct the student-interaction-question network and further present a new GNN model, called R^2GCN, which intrinsically works for the heterogeneous networks, to achieve generalizable student performance prediction in interactive online question pools. We evaluate the effectiveness of our approach on a real-world dataset consisting of 104,113 mouse trajectories generated in the problem-solving process of over 4000 students on 1631 questions. The experiment results show that our approach can achieve a much higher accuracy of student performance prediction than both traditional machine learning approaches and GNN models.

HCJul 31, 2020
Topology Density Map for Urban Data Visualization and Analysis

Zezheng Feng, Haotian Li, Wei Zeng et al.

Density map is an effective visualization technique for depicting the scalar field distribution in 2D space. Conventional methods for constructing density maps are mainly based on Euclidean distance, limiting their applicability in urban analysis that shall consider road network and urban traffic. In this work, we propose a new method named Topology Density Map, targeting for accurate and intuitive density maps in the context of urban environment. Based on the various constraints of road connections and traffic conditions, the method first constructs a directed acyclic graph (DAG) that propagates nonlinear scalar fields along 1D road networks. Next, the method extends the scalar fields to a 2D space by identifying key intersecting points in the DAG, dividing the underlying territory into planar regions using a weighted Voronoi diagram, and calculating the scalar fields for every point. Two case studies demonstrate that the Topology Density Map supplies accurate information to users and provides an intuitive visualization for decision making. An interview with domain experts demonstrates the feasibility, usability, and effectiveness of our method.

HCJan 9, 2020
Predicting Student Performance in Interactive Online Question Pools Using Mouse Interaction Features

Huan Wei, Haotian Li, Meng Xia et al.

Modeling student learning and further predicting the performance is a well-established task in online learning and is crucial to personalized education by recommending different learning resources to different students based on their needs. Interactive online question pools (e.g., educational game platforms), an important component of online education, have become increasingly popular in recent years. However, most existing work on student performance prediction targets at online learning platforms with a well-structured curriculum, predefined question order and accurate knowledge tags provided by domain experts. It remains unclear how to conduct student performance prediction in interactive online question pools without such well-organized question orders or knowledge tags by experts. In this paper, we propose a novel approach to boost student performance prediction in interactive online question pools by further considering student interaction features and the similarity between questions. Specifically, we introduce new features (e.g., think time, first attempt, and first drag-and-drop) based on student mouse movement trajectories to delineate students' problem-solving details. In addition, heterogeneous information network is applied to integrating students' historical problem-solving information on similar questions, enhancing student performance predictions on a new question. We evaluate the proposed approach on the dataset from a real-world interactive question pool using four typical machine learning models.

CVNov 17, 2018
VommaNet: an End-to-End Network for Disparity Estimation from Reflective and Texture-less Light Field Images

Haoxin Ma, Haotian Li, Zhiwen Qian et al.

The precise combination of image sensor and micro-lens array enables lenslet light field cameras to record both angular and spatial information of incoming light, therefore, one can calculate disparity and depth from light field images. In turn, 3D models of the recorded objects can be recovered, which is a great advantage over other imaging system. However, reflective and texture-less areas in light field images have complicated conditions, making it hard to correctly calculate disparity with existing algorithms. To tackle this problem, we introduce a novel end-to-end network VommaNet to retrieve multi-scale features from reflective and texture-less regions for accurate disparity estimation. Meanwhile, our network has achieved similar or better performance in other regions for both synthetic light field images and real-world data compared to the state-of-the-art algorithms. Currently, we achieve the best score for mean squared error (MSE) on HCI 4D Light Field Benchmark.