Jin Xiao

CV
h-index: 22
13 papers
396 citations
Novelty: 49%
AI Score: 56

13 Papers

CL Sep 17, 2023 Code
Can Large Language Models Understand Real-World Complex Instructions?

Qianyu He, Jie Zeng, Wenhao Huang et al.

Large language models (LLMs) can understand human instructions, showing their potential for pragmatic applications beyond traditional NLP tasks. However, they still struggle with complex instructions, which can be either complex task descriptions that require multiple tasks and constraints, or complex input that contains long context, noise, heterogeneous information and multi-turn format. Due to these features, LLMs often ignore semantic constraints from task descriptions, generate incorrect formats, violate length or sample count constraints, and are unfaithful to the input text. Existing benchmarks are insufficient to assess LLMs' ability to understand complex instructions, as they are close-ended and simple. To bridge this gap, we propose CELLO, a benchmark for evaluating LLMs' ability to follow complex instructions systematically. We design eight features for complex instructions and construct a comprehensive evaluation dataset from real-world scenarios. We also establish four criteria and develop corresponding metrics, as current ones are inadequate, biased or too strict and coarse-grained. We compare the performance of representative Chinese-oriented and English-oriented models in following complex instructions through extensive experiments. Resources of CELLO are publicly available at https://github.com/Abbey4799/CELLO.

QM Jul 4, 2022
Accurate RNA 3D structure prediction using a language model-based deep learning approach

Tao Shen, Zhihang Hu, Siqi Sun et al.

Accurate prediction of RNA three-dimensional (3D) structure remains an unsolved challenge. Determining RNA 3D structures is crucial for understanding their functions and informing RNA-targeting drug development and synthetic biology design. The structural flexibility of RNA, which leads to scarcity of experimentally determined data, complicates computational prediction efforts. Here, we present RhoFold+, an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences. By integrating an RNA language model pre-trained on ~23.7 million RNA sequences and leveraging techniques to address data scarcity, RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction. Retrospective evaluations on RNA-Puzzles and CASP15 natural RNA targets demonstrate RhoFold+'s superiority over existing methods, including human expert groups. Its efficacy and generalizability are further validated through cross-family and cross-type assessments, as well as time-censored benchmarks. Additionally, RhoFold+ predicts RNA secondary structures and inter-helical angles, providing empirically verifiable features that broaden its applicability to RNA structure and function studies.

CV Jun 18, 2025 Code
Enhancing point cloud analysis via neighbor aggregation correction based on cross-stage structure correlation

Jiaqi Shi, Jin Xiao, Xiaoguang Hu et al.

Point cloud analysis is the cornerstone of many downstream tasks, among which aggregating local structures is the basis for understanding point cloud data. While numerous works aggregate neighbors using three-dimensional relative coordinates, irrelevant-point interference and feature hierarchy gaps arise from the limitations of local coordinates. Although some works address this limitation by refining the spatial description through explicit modeling of cross-stage structure, these enhancement methods based on direct geometric structure encoding suffer from high computational overhead and noise sensitivity. To overcome these problems, we propose the Point Distribution Set Abstraction module (PDSA), which utilizes correlation in the high-dimensional space to correct the feature distribution during aggregation, improving computational efficiency and robustness. PDSA distinguishes point correlation based on a lightweight cross-stage structural descriptor, and enhances structural homogeneity by reducing the variance of the neighbor feature matrix and increasing class separability through long-distance modeling. Additionally, we introduce a key point mechanism to reduce the computational overhead. The experimental results on semantic segmentation and classification tasks based on different baselines verify the generalization of the proposed method, which achieves significant performance improvement at a lower parameter cost. The corresponding ablation and visualization results demonstrate the effectiveness and rationality of our method. The code and training weights are available at: https://github.com/AGENT9717/PointDistribution

LG May 27, 2025 Code
ChemHAS: Hierarchical Agent Stacking for Enhancing Chemistry Tools

Zhucong Li, Bowei Zhang, Jin Xiao et al.

Large Language Model (LLM)-based agents have demonstrated the ability to improve performance in chemistry-related tasks by selecting appropriate tools. However, their effectiveness remains limited by the inherent prediction errors of chemistry tools. In this paper, we take a step further by exploring how LLM-based agents can, in turn, be leveraged to reduce prediction errors of the tools. To this end, we propose ChemHAS (Chemical Hierarchical Agent Stacking), a simple yet effective method that enhances chemistry tools through optimizing agent-stacking structures from limited data. ChemHAS achieves state-of-the-art performance across four fundamental chemistry tasks, demonstrating that our method can effectively compensate for prediction errors of the tools. Furthermore, we identify and characterize four distinct agent-stacking behaviors, potentially improving interpretability and revealing new possibilities for AI agent applications in scientific research. Our code and dataset are publicly available at https://anonymous.4open.science/r/ChemHAS-01E4/README.md.

CL Nov 6, 2024 Code
QUILL: Quotation Generation Enhancement of Large Language Models

Jin Xiao, Bowei Zhang, Qianyu He et al.

While large language models (LLMs) have become excellent writing assistants, they still struggle with quotation generation. This is because they either hallucinate when providing factual quotations or fail to provide quotes that exceed human expectations. To bridge the gap, we systematically study how to evaluate and improve LLMs' performance in quotation generation tasks. We first establish a holistic and automatic evaluation system for the quotation generation task, which consists of five criteria, each with a corresponding automatic metric. To improve the LLMs' quotation generation abilities, we construct a bilingual knowledge base that is broad in scope and rich in dimensions, containing up to 32,022 quotes. Moreover, guided by our criteria, we further design a quotation-specific metric to rerank the retrieved quotations from the knowledge base. Extensive experiments show that our metrics strongly correlate with human preferences. Existing LLMs struggle to generate desired quotes, but our quotation knowledge base and reranking metric help narrow this gap. Our dataset and code are publicly available at https://github.com/GraceXiaoo/QUILL.

CV May 22, 2023 Code
A bioinspired three-stage model for camouflaged object detection

Tianyou Chen, Jin Xiao, Xiaoguang Hu et al.

Camouflaged objects are typically assimilated into their backgrounds and exhibit fuzzy boundaries. The complex environmental conditions and the high intrinsic similarity between camouflaged targets and their surroundings pose significant challenges in accurately locating and segmenting these objects in their entirety. While existing methods have demonstrated remarkable performance in various real-world scenarios, they still face limitations when confronted with difficult cases, such as small targets, thin structures, and indistinct boundaries. Drawing inspiration from human visual perception when observing images containing camouflaged objects, we propose a three-stage model that enables coarse-to-fine segmentation in a single iteration. Specifically, our model employs three decoders to sequentially process subsampled features, cropped features, and high-resolution original features. This proposed approach not only reduces computational overhead but also mitigates interference caused by background noise. Furthermore, considering the significance of multi-scale information, we have designed a multi-scale feature enhancement module that enlarges the receptive field while preserving detailed structural cues. Additionally, a boundary enhancement module has been developed to enhance performance by leveraging boundary information. Subsequently, a mask-guided fusion module is proposed to generate fine-grained results by integrating coarse prediction maps with high-resolution feature maps. Our network surpasses state-of-the-art CNN-based counterparts without unnecessary complexities. Upon acceptance of the paper, the source code will be made publicly available at https://github.com/clelouch/BTSNet.

SE Jul 20, 2021 Code
Mono2Micro: A Practical and Effective Tool for Decomposing Monolithic Java Applications to Microservices

Anup Kalia, Jin Xiao, Rahul Krishna et al.

In migrating production workloads to the cloud, enterprises often face the daunting task of evolving monolithic applications toward a microservice architecture. At IBM, we developed a tool called Mono2Micro to assist with this challenging task. Mono2Micro performs spatio-temporal decomposition, leveraging well-defined business use cases and runtime call relations to create functionally cohesive partitioning of application classes. Our preliminary evaluation of Mono2Micro showed promising results. How well does Mono2Micro perform against other decomposition techniques, and how do practitioners perceive the tool? This paper describes the technical foundations of Mono2Micro and presents results to answer these two questions. To answer the first question, we evaluated Mono2Micro against four existing techniques on a set of open-source and proprietary Java applications, using different metrics to assess the quality of the decomposition and the tool's efficiency. Our results show that Mono2Micro significantly outperforms state-of-the-art baselines in specific metrics well-defined for the problem domain. To answer the second question, we conducted a survey of twenty-one practitioners in various industry roles who have used Mono2Micro. This study highlights several benefits of the tool, interesting practitioner perceptions, and scope for further improvements. Overall, these results show that Mono2Micro can provide a valuable aid to practitioners in creating functionally cohesive and explainable microservice decompositions.

SE Jun 12, 2021 Code
Lessons learned from hyper-parameter tuning for microservice candidate identification

Rahul Yedida, Rahul Krishna, Anup Kalia et al.

When optimizing software for the cloud, monolithic applications need to be partitioned into many smaller microservices. While many tools have been proposed for this task, we warn that the evaluation of those approaches has been incomplete; e.g. minimal prior exploration of hyperparameter optimization. Using a set of open source Java EE applications, we show here that (a) such optimization can significantly improve microservice partitioning; and that (b) an open issue for future work is how to find which optimizer works best for different problems. To facilitate that future work, see https://github.com/yrahul3910/ase-tuned-mono2micro for a reproduction package for this research.

CV May 4
Channel-Level Relation to Attentive Aggregation with Neighborhood-Homogeneity Constraint for Point Cloud Analysis

Jiaqi Shi, Jin Xiao, Xiaoguang Hu et al.

In 3D point cloud understanding, the core challenge lies in accurately capturing discriminative features within complex neighborhoods, which directly affects the execution precision of downstream tasks such as embodied AI and autonomous driving. Existing methods explore feature correlation discrimination but are limited to point-level spatial distribution or channel responses, enabling only coarse-grained evaluation. For modern multi-scale point cloud networks, such coarse-grained metrics inevitably incur significant information loss in deeper layers. To address this issue, we propose a novel network equipped with a channel-level metric-based enhancement mechanism, termed the PointCRA network. Our core idea is to introduce temporal trend variation as a new evaluation dimension to avoid the information loss caused by weight dimension collapse in existing spatial and channel attention mechanisms. On this basis, we construct a multi-level calibration framework guided by neighborhood homogeneity for weight calibration, and design a dedicated loss function to enhance channel discriminability. The module effectively leverages the intrinsic feature priors of deep networks to adaptively correct the feature aggregation process, offering strong interpretability with low parameter overhead. Furthermore, our proposed method exhibits strong transferability, interpretability, and parameter efficiency. We validate the proposed method's effectiveness on diverse datasets and benchmark models, and further demonstrate its rationality through extensive analytical experiments. Our PointCRA achieves 77.5% mIoU on the S3DIS dataset, 90.4% OA on the ScanObjectNN dataset, and 87.4% instance mIoU on the ShapeNetPart dataset. The code and pretrained weights are publicly available on GitHub:
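
For readers unfamiliar with the metrics quoted in this abstract, the sketch below shows how overall accuracy (OA) and class-averaged mean IoU (mIoU) are commonly computed from a confusion matrix. It is a generic illustration, not the PointCRA evaluation code: the function names and toy labels are made up for this example, and the instance mIoU reported for ShapeNetPart is usually averaged per shape rather than per class, which is not shown here.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """Fraction of points whose predicted label equals the true label."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class otherwise
        denom = tp + fp + fn
        if denom:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy labels for six points and three classes (hypothetical data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
oa = overall_accuracy(cm)
miou = mean_iou(cm)
```

On this toy input four of six points are correct, and the per-class IoUs are 1/3, 2/3, and 1/2, which shows why mIoU is typically lower than OA when errors are spread across classes.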

IR Dec 15, 2025
What Makes an Ideal Quote? Recommending "Unexpected yet Rational" Quotations via Novelty

Bowei Zhang, Jin Xiao, Guanglei Yue et al.

Quotation recommendation aims to enrich writing by suggesting quotes that complement a given context, yet existing systems mostly optimize surface-level topical relevance and ignore the deeper semantic and aesthetic properties that make quotations memorable. We start from two empirical observations. First, a systematic user study shows that people consistently prefer quotations that are "unexpected yet rational" in context, identifying novelty as a key desideratum. Second, we find that strong existing models struggle to fully understand the deep meanings of quotations. Inspired by defamiliarization theory, we therefore formalize quote recommendation as choosing contextually novel but semantically coherent quotations. We operationalize this objective with NovelQR, a novelty-driven quotation recommendation framework. A generative label agent first interprets each quotation and its surrounding context into multi-dimensional deep-meaning labels, enabling label-enhanced retrieval. A token-level novelty estimator then reranks candidates while mitigating auto-regressive continuation bias. Experiments on bilingual datasets spanning diverse real-world domains show that our system recommends quotations that human judges rate as more appropriate, more novel, and more engaging than other baselines, while matching or surpassing existing methods in novelty estimation.
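
The retrieve-then-rerank idea in this abstract can be illustrated with a small sketch. This is not NovelQR's estimator: it assumes novelty is approximated by mean token surprisal (negative log-probability), that the per-token probabilities come from some language model conditioned on the writing context, and that a relevance score is already available from retrieval. All names, the threshold, and the toy numbers are hypothetical, and the continuation-bias mitigation mentioned in the abstract is not modeled.

```python
import math


def mean_surprisal(token_probs):
    """Average negative log-probability (in nats) over a candidate's tokens.

    Probabilities must be strictly positive; higher values mean the quote
    is less predictable given the context.
    """
    return sum(-math.log(p) for p in token_probs) / len(token_probs)


def rerank(candidates, min_relevance=0.5):
    """Keep coherent candidates, then order them from most to least novel.

    candidates: list of (quote_id, relevance, token_probs) tuples.
    min_relevance: hypothetical cutoff standing in for the "rational" part.
    """
    coherent = [c for c in candidates if c[1] >= min_relevance]
    ranked = sorted(coherent, key=lambda c: mean_surprisal(c[2]), reverse=True)
    return [quote_id for quote_id, _, _ in ranked]


# Toy candidates (made-up scores): "c" is surprising but off-topic.
candidates = [
    ("a", 0.9, [0.9, 0.8]),    # relevant and predictable
    ("b", 0.8, [0.2, 0.1]),    # relevant and unexpected
    ("c", 0.1, [0.01, 0.01]),  # unexpected but irrelevant
]
order = rerank(candidates)
```

Filtering before sorting keeps an off-topic but highly surprising quote from winning on novelty alone, which is one simple way to read "unexpected yet rational".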

CV Sep 17, 2025
Task-Aware Image Signal Processor for Advanced Visual Perception

Kai Chen, Jin Xiao, Leheng Zhang et al.

In recent years, there has been a growing trend in computer vision towards exploiting RAW sensor data, which preserves richer information compared to conventional low-bit RGB images. Early studies mainly focused on enhancing visual quality, while more recent efforts aim to leverage the abundant information in RAW data to improve the performance of visual perception tasks such as object detection and segmentation. However, existing approaches still face two key limitations: large-scale ISP networks impose heavy computational overhead, while methods based on tuning traditional ISP pipelines are restricted by limited representational capacity. To address these issues, we propose Task-Aware Image Signal Processing (TA-ISP), a compact RAW-to-RGB framework that produces task-oriented representations for pretrained vision models. Instead of heavy dense convolutional pipelines, TA-ISP predicts a small set of lightweight, multi-scale modulation operators that act at global, regional, and pixel scales to reshape image statistics across different spatial extents. This factorized control significantly expands the range of spatially varying transforms that can be represented while keeping memory usage, computation, and latency tightly constrained. Evaluated on several RAW-domain detection and segmentation benchmarks under both daytime and nighttime conditions, TA-ISP consistently improves downstream accuracy while markedly reducing parameter count and inference time, making it well suited for deployment on resource-constrained devices.

LG Sep 29, 2021
An Expert System for Redesigning Software for Cloud Applications

Rahul Yedida, Rahul Krishna, Anup Kalia et al.

Cloud-based software has many advantages. When services are divided into many independent components, they are easier to update. Also, during peak demand, it is easier to scale cloud services (just hire more CPUs). Hence, many organizations are partitioning their monolithic enterprise applications into cloud-based microservices. Recently there has been much work using machine learning to simplify this partitioning task. Despite much research, no single partitioning method can be recommended as generally useful. More specifically, those prior solutions are "brittle"; i.e. if they work well for one kind of goal in one dataset, then they can be sub-optimal if applied to many datasets and multiple goals. In order to find a generally useful partitioning method, we propose DEEPLY. This new algorithm extends the CO-GCN deep learning partition generator with (a) a novel loss function and (b) some hyper-parameter optimization. As shown by our experiments, DEEPLY generally outperforms prior work (including CO-GCN, and others) across multiple datasets and goals. To the best of our knowledge, this is the first report in SE of such stable hyper-parameter optimization. To aid reuse of this work, DEEPLY is available on-line at https://bit.ly/2WhfFlB.

CV Aug 17, 2020
A Smartphone-based System for Real-time Early Childhood Caries Diagnosis

Yipeng Zhang, Haofu Liao, Jin Xiao et al.

Early childhood caries (ECC) is the most common, yet preventable, chronic disease in children under the age of 6. Treatments for severe ECC are extremely expensive and unaffordable for socioeconomically disadvantaged families. The identification of ECC at an early stage usually requires expertise in the field, and hence is often ignored by parents. Therefore, early prevention strategies and easy-to-adopt diagnosis techniques are desired. In this study, we propose a multistage deep learning-based system for cavity detection. We create a dataset containing RGB oral images labeled manually by dental practitioners. We then investigate the effectiveness of different deep learning models on the dataset. Furthermore, we integrate the deep learning system into an easy-to-use mobile application that can diagnose ECC at an early stage and provide real-time results to untrained users.