Dan Luo

CL
h-index98
18papers
177citations
Novelty46%
AI Score54

18 Papers

LGMay 31
CryoProt: A Protein Pretraining Framework with Cross-Box Interactions on Cryo-EM Density Maps

Dan Luo, Xuan Lin, Peng Zhou et al.

Despite the growing availability of cryo-electron microscopy (cryo-EM) density maps, effectively leveraging them for protein representation remains challenging. First, current methods lack a general-purpose protein pretraining framework tailored for cryo-EM density maps, designed for protein-related property prediction. Second, existing approaches typically partition density maps into local box regions and model them independently, overlooking interactions across boxes which are essential for capturing global structural context in cryo-EM density map. To address these challenges, we propose CryoProt, a protein pretraining framework designed for cryo-EM density maps. CryoProt introduces a Map Encoder based on multi-head latent attention (MLA), where box-level representations interact through a shared latent space, enabling explicit modeling of cross-box dependencies within the density map. Furthermore, we adopt a multi-task pretraining strategy to learn generalizable representations that can be effectively transferred to diverse downstream tasks, such as protein flexibility prediction, where cryo-EM density maps are not required and can be inferred implicitly by the pretrained model. Experimental results demonstrate that CryoProt consistently outperforms existing state-of-the-art methods across multiple benchmarks, achieving up to 12% improvement over the best-performing baselines, highlighting the importance of modeling cross-box interactions in cryo-EM data. The source code is publicly available at https://anonymous.4open.science/r/CryoProt.

CLJul 2, 2024Code
Efficient Sparse Attention needs Adaptive Token Release

Chaoran Zhang, Lixin Zou, Dan Luo et al.

In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide array of text-centric tasks. However, their `large' scale introduces significant computational and storage challenges, particularly in managing the key-value states of the transformer, which limits their wider applicability. Therefore, we propose to adaptively release resources from caches and rebuild the necessary key-value states. Particularly, we accomplish this by a lightweight controller module to approximate an ideal top-$K$ sparse attention. This module retains the tokens with the highest top-$K$ attention weights and simultaneously rebuilds the discarded but necessary tokens, which may become essential for future decoding. Comprehensive experiments in natural language generation and modeling reveal that our method is not only competitive with full attention in terms of performance but also achieves a significant throughput improvement of up to 221.8%. The code for replication is available on the https://github.com/WHUIR/ADORE.

AIAug 29, 2023
Enhancing Psychological Counseling with Large Language Model: A Multifaceted Decision-Support System for Non-Professionals

Guanghui Fu, Qing Zhao, Jianqiang Li et al.

In the contemporary landscape of social media, an alarming number of users express negative emotions, some of which manifest as strong suicidal intentions. This situation underscores a profound need for trained psychological counselors who can enact effective mental interventions. However, the development of these professionals is often an imperative but time-consuming task. Consequently, the mobilization of non-professionals or volunteers in this capacity emerges as a pressing concern. Leveraging the capabilities of artificial intelligence, and in particular, the recent advances in large language models, offers a viable solution to this challenge. This paper introduces a novel model constructed on the foundation of large language models to fully assist non-professionals in providing psychological interventions on online user discourses. This framework makes it plausible to harness the power of non-professional counselors in a meaningful way. A comprehensive study was conducted involving ten professional psychological counselors of varying expertise, evaluating the system across five critical dimensions. The findings affirm that our system is capable of analyzing patients' issues with relative accuracy and proffering professional-level strategies recommendations, thereby enhancing support for non-professionals. This research serves as a compelling validation of the application of large language models in the field of psychology and lays the groundwork for a new paradigm of community-based mental health support.

LGMay 28
A Triple-Modal Contrastive Learning Framework with Sequence, Graph, and 3D Features for Drug-Target Interaction Prediction

Le Xu, Xi Zhang, Dan Luo et al.

Accurate prediction of drug-target interactions (DTI) is critical for drug discovery. Existing methods often rely on single-modal representations (e.g., sequences or graphs) or combine only two modalities, overlooking 3D structural features. To address this challenge, we propose TriMod-DTI, a triple-modal contrastive learning framework that incorporates 1D sequences, 2D graphs, and 3D structures of drugs and proteins, obtaining the universal and complementary feature representations for DTI prediction. We design a Feature Extractor to capture drug and target features across the three modalities, thereby enriching their representations. We further propose a triple-modal contrastive learning strategy to align different modal representations of the same drug or protein in the latent space. By constructing cross-modal positive and negative sample pairs, this approach enhances the model's discriminative ability. Experiments on three benchmark datasets demonstrate that TriMod-DTI outperforms state-of-the-art methods. The ablation studies validate the contributions of each modality. Moreover, case studies highlight its practical potential for DTI prediction and drug discovery.

CLSep 7, 2023
Supervised Learning and Large Language Model Benchmarks on Mental Health Datasets: Cognitive Distortions and Suicidal Risks in Chinese Social Media

Hongzhi Qi, Qing Zhao, Jianqiang Li et al.

On social media, users often express their personal feelings, which may exhibit cognitive distortions or even suicidal tendencies on certain specific topics. Early recognition of these signs is critical for effective psychological intervention. In this paper, we introduce two novel datasets from Chinese social media: SOS-HL-1K for suicidal risk classification and SocialCD-3K for cognitive distortions detection. The SOS-HL-1K dataset contained 1,249 posts and SocialCD-3K dataset was a multi-label classification dataset that containing 3,407 posts. We propose a comprehensive evaluation using two supervised learning methods and eight large language models (LLMs) on the proposed datasets. From the prompt engineering perspective, we experimented with two types of prompt strategies, including four zero-shot and five few-shot strategies. We also evaluated the performance of the LLMs after fine-tuning on the proposed tasks. The experimental results show that there is still a huge gap between LLMs relying only on prompt engineering and supervised learning. In the suicide classification task, this gap is 6.95% points in F1-score, while in the cognitive distortion task, the gap is even more pronounced, reaching 31.53% points in F1-score. However, after fine-tuning, this difference is significantly reduced. In the suicide and cognitive distortion classification tasks, the gap decreases to 4.31% and 3.14%, respectively. This research highlights the potential of LLMs in psychological contexts, but supervised learning remains necessary for more challenging tasks. All datasets and code are made available.

LGApr 8, 2023
Interpretable machine learning-accelerated seed treatment by nanomaterials for environmental stress alleviation

Hengjie Yu, Dan Luo, Sam F. Y. Li et al.

Crops are constantly challenged by different environmental conditions. Seed treatment by nanomaterials is a cost-effective and environmentally-friendly solution for environmental stress mitigation in crop plants. Here, 56 seed nanopriming treatments are used to alleviate environmental stresses in maize. Seven selected nanopriming treatments significantly increase the stress resistance index (SRI) by 13.9% and 12.6% under salinity stress and combined heat-drought stress, respectively. Metabolomics data reveals that ZnO nanopriming treatment, with the highest SRI value, mainly regulates the pathways of amino acid metabolism, secondary metabolite synthesis, carbohydrate metabolism, and translation. Understanding the mechanism of seed nanopriming is still difficult due to the variety of nanomaterials and the complexity of interactions between nanomaterials and plants. Using the nanopriming data, we present an interpretable structure-activity relationship (ISAR) approach based on interpretable machine learning for predicting and understanding its stress mitigation effects. The post hoc and model-based interpretation approaches of machine learning are combined to provide complementary benefits and give researchers or policymakers more illuminating or trustworthy results. The concentration, size, and zeta potential of nanoparticles are identified as dominant factors for correlating root dry weight under salinity stress, and their effects and interactions are explained. Additionally, a web-based interactive tool is developed for offering prediction-level interpretation and gathering more details about specific nanopriming treatments. This work offers a promising framework for accelerating the agricultural applications of nanomaterials and may profoundly contribute to nanosafety assessment.

CVFeb 5, 2022Code
Memory Defense: More Robust Classification via a Memory-Masking Autoencoder

Eashan Adhikarla, Dan Luo, Brian D. Davison

Many deep neural networks are susceptible to minute perturbations of images that have been carefully crafted to cause misclassification. Ideally, a robust classifier would be immune to small variations in input images, and a number of defensive approaches have been created as a result. One method would be to discern a latent representation which could ignore small changes to the input. However, typical autoencoders easily mingle inter-class latent representations when there are strong similarities between classes, making it harder for a decoder to accurately project the image back to the original high-dimensional space. We propose a novel framework, Memory Defense, an augmented classifier with a memory-masking autoencoder to counter this challenge. By masking other classes, the autoencoder learns class-specific independent latent representations. We test the model's robustness against four widely used attacks. Experiments on the Fashion-MNIST & CIFAR-10 datasets demonstrate the superiority of our model. We make available our source code at GitHub repository: https://github.com/eashanadhikarla/MemDefense

AINov 5, 2025
Adobe Summit Concierge Evaluation with Human in the Loop

Yiru Chen, Sally Fang, Sai Sree Harsha et al.

Generative AI assistants offer significant potential to enhance productivity, streamline information access, and improve user experience in enterprise contexts. In this work, we present Summit Concierge, a domain-specific AI assistant developed for Adobe Summit. The assistant handles a wide range of event-related queries and operates under real-world constraints such as data sparsity, quality assurance, and rapid deployment. To address these challenges, we adopt a human-in-the-loop development workflow that combines prompt engineering, retrieval grounding, and lightweight human validation. We describe the system architecture, development process, and real-world deployment outcomes. Our experience shows that agile, feedback-driven development enables scalable and reliable AI assistants, even in cold-start scenarios.

CVJun 3, 2025
NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results

Xiaohong Liu, Xiongkuo Min, Qiang Hu et al.

This paper reports on the NTIRE 2025 XGC Quality Assessment Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2025. This challenge is to address a major challenge in the field of video and talking head processing. The challenge is divided into three tracks, including user generated video, AI generated video and talking head. The user-generated video track uses the FineVD-GC, which contains 6,284 user generated videos. The user-generated video track has a total of 125 registered participants. A total of 242 submissions are received in the development phase, and 136 submissions are received in the test phase. Finally, 5 participating teams submitted their models and fact sheets. The AI generated video track uses the Q-Eval-Video, which contains 34,029 AI-Generated Videos (AIGVs) generated by 11 popular Text-to-Video (T2V) models. A total of 133 participants have registered in this track. A total of 396 submissions are received in the development phase, and 226 submissions are received in the test phase. Finally, 6 participating teams submitted their models and fact sheets. The talking head track uses the THQA-NTIRE, which contains 12,247 2D and 3D talking heads. A total of 89 participants have registered in this track. A total of 225 submissions are received in the development phase, and 118 submissions are received in the test phase. Finally, 8 participating teams submitted their models and fact sheets. Each participating team in every track has proposed a method that outperforms the baseline, which has contributed to the development of fields in three tracks.

CLJan 25, 2025
Federated Retrieval Augmented Generation for Multi-Product Question Answering

Parshin Shojaee, Sai Sree Harsha, Dan Luo et al.

Recent advancements in Large Language Models and Retrieval-Augmented Generation have boosted interest in domain-specific question-answering for enterprise products. However, AI Assistants often face challenges in multi-product QA settings, requiring accurate responses across diverse domains. Existing multi-domain RAG-QA approaches either query all domains indiscriminately, increasing computational costs and LLM hallucinations, or rely on rigid resource selection, which can limit search results. We introduce MKP-QA, a novel multi-product knowledge-augmented QA framework with probabilistic federated search across domains and relevant knowledge. This method enhances multi-domain search quality by aggregating query-domain and query-passage probabilistic relevance. To address the lack of suitable benchmarks for multi-product QAs, we also present new datasets focused on three Adobe products: Adobe Experience Platform, Target, and Customer Journey Analytics. Our experiments show that MKP-QA significantly boosts multi-product RAG-QA performance in terms of both retrieval accuracy and response quality.

SDApr 14, 2025
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis

Dan Luo, Chengyuan Ma, Weiqin Li et al.

With the advancement of speech synthesis technology, users have higher expectations for the naturalness and expressiveness of synthesized speech. But previous research ignores the importance of prompt selection. This study proposes a text-to-speech (TTS) framework based on Retrieval-Augmented Generation (RAG) technology, which can dynamically adjust the speech style according to the text content to achieve more natural and vivid communication effects. We have constructed a speech style knowledge database containing high-quality speech samples in various contexts and developed a style matching scheme. This scheme uses embeddings, extracted by Llama, PER-LLM-Embedder,and Moka, to match with samples in the knowledge database, selecting the most appropriate speech style for synthesis. Furthermore, our empirical research validates the effectiveness of the proposed method. Our demo can be viewed at: https://thuhcsi.github.io/icme2025-AutoStyle-TTS

AIOct 14, 2024
From Anchors to Answers: A Novel Node Tokenizer for Integrating Graph Structure into Large Language Models

Yanbiao Ji, Chang Liu, Xin Chen et al.

Enabling large language models (LLMs) to effectively process and reason with graph-structured data remains a significant challenge despite their remarkable success in natural language tasks. Current approaches either convert graph structures into verbose textual descriptions, consuming substantial computational resources, or employ complex graph neural networks as tokenizers, which introduce significant training overhead. To bridge this gap, we present NT-LLM, a novel framework with an anchor-based positional encoding scheme for graph representation. Our approach strategically selects reference nodes as anchors and encodes each node's position relative to these anchors, capturing essential topological information without the computational burden of existing methods. Notably, we identify and address a fundamental issue: the inherent misalignment between discrete hop-based distances in graphs and continuous distances in embedding spaces. By implementing a rank-preserving objective for positional encoding pretraining, NT-LLM achieves superior performance across diverse graph tasks ranging from basic structural analysis to complex reasoning scenarios. Our comprehensive evaluation demonstrates that this lightweight yet powerful approach effectively enhances LLMs' ability to understand and reason with graph-structured information, offering an efficient solution for graph-based applications of language models.

CLApr 17, 2024
AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts

Meng Jiang, Yi Jing Yu, Qing Zhao et al.

Cognitive Behavioral Therapy (CBT) is an effective technique for addressing the irrational thoughts stemming from mental illnesses, but it necessitates precise identification of cognitive pathways to be successfully implemented in patient care. In current society, individuals frequently express negative emotions on social media on specific topics, often exhibiting cognitive distortions, including suicidal behaviors in extreme cases. Yet, there is a notable absence of methodologies for analyzing cognitive pathways that could aid psychotherapists in conducting effective interventions online. In this study, we gathered data from social media and established the task of extracting cognitive pathways, annotating the data based on a cognitive theoretical framework. We initially categorized the task of extracting cognitive pathways as a hierarchical text classification with four main categories and nineteen subcategories. Following this, we structured a text summarization task to help psychotherapists quickly grasp the essential information. Our experiments evaluate the performance of deep learning and large language models (LLMs) on these tasks. The results demonstrate that our deep learning method achieved a micro-F1 score of 62.34% in the hierarchical text classification task. Meanwhile, in the text summarization task, GPT-4 attained a Rouge-1 score of 54.92 and a Rouge-2 score of 30.86, surpassing the experimental deep learning model's performance. However, it may suffer from an issue of hallucination. We have made all models and codes publicly available to support further research in this field.

ROMay 13, 2025
DynamicDTA: Drug-Target Binding Affinity Prediction Using Dynamic Descriptors and Graph Representation

Dan Luo, Jinyu Zhou, Le Xu et al.

Predicting drug-target binding affinity (DTA) is essential for identifying potential therapeutic candidates in drug discovery. However, most existing models rely heavily on static protein structures, often overlooking the dynamic nature of proteins, which is crucial for capturing conformational flexibility that will be beneficial for protein binding interactions. We introduce DynamicDTA, an innovative deep learning framework that incorporates static and dynamic protein features to enhance DTA prediction. The proposed DynamicDTA takes three types of inputs, including drug sequence, protein sequence, and dynamic descriptors. A molecular graph representation of the drug sequence is generated and subsequently processed through graph convolutional network, while the protein sequence is encoded using dilated convolutions. Dynamic descriptors, such as root mean square fluctuation, are processed through a multi-layer perceptron. These embedding features are fused with static protein features using cross-attention, and a tensor fusion network integrates all three modalities for DTA prediction. Extensive experiments on three datasets demonstrate that DynamicDTA achieves by at least 3.4% improvement in RMSE score with comparison to seven state-of-the-art baseline methods. Additionally, predicting novel drugs for Human Immunodeficiency Virus Type 1 and visualizing the docking complexes further demonstrates the reliability and biological relevance of DynamicDTA.

CLApr 19, 2024
SOS-1K: A Fine-grained Suicide Risk Classification Dataset for Chinese Social Media Analysis

Hongzhi Qi, Hanfei Liu, Jianqiang Li et al.

In the social media, users frequently express personal emotions, a subset of which may indicate potential suicidal tendencies. The implicit and varied forms of expression in internet language complicate accurate and rapid identification of suicidal intent on social media, thus creating challenges for timely intervention efforts. The development of deep learning models for suicide risk detection is a promising solution, but there is a notable lack of relevant datasets, especially in the Chinese context. To address this gap, this study presents a Chinese social media dataset designed for fine-grained suicide risk classification, focusing on indicators such as expressions of suicide intent, methods of suicide, and urgency of timing. Seven pre-trained models were evaluated in two tasks: high and low suicide risk, and fine-grained suicide risk classification on a level of 0 to 10. In our experiments, deep learning models show good performance in distinguishing between high and low suicide risk, with the best model achieving an F1 score of 88.39%. However, the results for fine-grained suicide risk classification were still unsatisfactory, with an weighted F1 score of 50.89%. To address the issues of data imbalance and limited dataset size, we investigated both traditional and advanced, large language model based data augmentation techniques, demonstrating that data augmentation can enhance model performance by up to 4.65% points in F1-score. Notably, the Chinese MentalBERT model, which was pre-trained on psychological domain data, shows superior performance in both tasks. This study provides valuable insights for automatic identification of suicidal individuals, facilitating timely psychological intervention on social media platforms. The source code and data are publicly available.

CLAug 27, 2025
LFD: Layer Fused Decoding to Exploit External Knowledge in Retrieval-Augmented Generation

Yang Sun, Zhiyong Xie, Dan Luo et al.

Retrieval-augmented generation (RAG) incorporates external knowledge into large language models (LLMs), improving their adaptability to downstream tasks and enabling information updates. Surprisingly, recent empirical evidence demonstrates that injecting noise into retrieved relevant documents paradoxically facilitates exploitation of external knowledge and improves generation quality. Although counterintuitive and challenging to apply in practice, this phenomenon enables granular control and rigorous analysis of how LLMs integrate external knowledge. Therefore, in this paper, we intervene on noise injection and establish a layer-specific functional demarcation within the LLM: shallow layers specialize in local context modeling, intermediate layers focus on integrating long-range external factual knowledge, and deeper layers primarily rely on parametric internal knowledge. Building on this insight, we propose Layer Fused Decoding (LFD), a simple decoding strategy that directly combines representations from an intermediate layer with final-layer decoding outputs to fully exploit the external factual knowledge. To identify the optimal intermediate layer, we introduce an internal knowledge score (IKS) criterion that selects the layer with the lowest IKS value in the latter half of layers. Experimental results across multiple benchmarks demonstrate that LFD helps RAG systems more effectively surface retrieved context knowledge with minimal cost.

SDAug 8, 2025
DAFMSVC: One-Shot Singing Voice Conversion with Dual Attention Mechanism and Flow Matching

Wei Chen, Binzhu Sha, Dan Luo et al.

Singing Voice Conversion (SVC) transfers a source singer's timbre to a target while keeping melody and lyrics. The key challenge in any-to-any SVC is adapting unseen speaker timbres to source audio without quality degradation. Existing methods either face timbre leakage or fail to achieve satisfactory timbre similarity and quality in the generated audio. To address these challenges, we propose DAFMSVC, where the self-supervised learning (SSL) features from the source audio are replaced with the most similar SSL features from the target audio to prevent timbre leakage. It also incorporates a dual cross-attention mechanism for the adaptive fusion of speaker embeddings, melody, and linguistic content. Additionally, we introduce a flow matching module for high quality audio generation from the fused features. Experimental results show that DAFMSVC significantly enhances timbre similarity and naturalness, outperforming state-of-the-art methods in both subjective and objective evaluations.

CVMay 2, 2023
Cross-view Action Recognition via Contrastive View-invariant Representation

Yuexi Zhang, Dan Luo, Balaji Sundareshan et al.

Cross view action recognition (CVAR) seeks to recognize a human action when observed from a previously unseen viewpoint. This is a challenging problem since the appearance of an action changes significantly with the viewpoint. Applications of CVAR include surveillance and monitoring of assisted living facilities where is not practical or feasible to collect large amounts of training data when adding a new camera. We present a simple yet efficient CVAR framework to learn invariant features from either RGB videos, 3D skeleton data, or both. The proposed approach outperforms the current state-of-the-art achieving similar levels of performance across input modalities: 99.4% (RGB) and 99.9% (3D skeletons), 99.4% (RGB) and 99.9% (3D Skeletons), 97.3% (RGB), and 99.2% (3D skeletons), and 84.4%(RGB) for the N-UCLA, NTU-RGB+D 60, NTU-RGB+D 120, and UWA3DII datasets, respectively.