Tongtong Xu

SE
h-index6
4papers
22citations
Novelty56%
AI Score41

4 Papers

DCJan 30
AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided Transcompilation

Zhongzhen Wen, Shudi Shao, Zhong Li et al.

The performance of deep learning models critically depends on efficient kernel implementations, yet developing high-performance kernels for specialized accelerators remains time-consuming and expertise-intensive. While recent work demonstrates that large language models (LLMs) can generate correct and performant GPU kernels, kernel generation for neural processing units (NPUs) remains largely underexplored due to domain-specific programming models, limited public examples, and sparse documentation. Consequently, directly generating AscendC kernels with LLMs yields extremely low correctness, highlighting a substantial gap between GPU and NPU kernel generation. We present AscendCraft, a DSL-guided approach for automatic AscendC kernel generation. AscendCraft introduces a lightweight DSL that abstracts non-essential complexity while explicitly modeling Ascend-specific execution semantics. Kernels are first generated in the DSL using category-specific expert examples and then transcompiled into AscendC through structured, constraint-driven LLM lowering passes. Evaluated on MultiKernelBench across seven operator categories, AscendCraft achieves 98.1% compilation success and 90.4% functional correctness. Moreover, 46.2% of generated kernels match or exceed PyTorch eager execution performance, demonstrating that DSL-guided transcompilation can enable LLMs to generate both correct and competitive NPU kernels. Beyond benchmarks, AscendCraft further demonstrates its generality by successfully generating two correct kernels for newly proposed mHC architecture, achieving performance that substantially surpasses PyTorch eager execution.

SESep 17, 2025
Reasoning Efficiently Through Adaptive Chain-of-Thought Compression: A Self-Optimizing Framework

Kerui Huang, Shuhan Liu, Xing Hu et al.

Chain-of-Thought (CoT) reasoning enhances Large Language Models (LLMs) by prompting intermediate steps, improving accuracy and robustness in arithmetic, logic, and commonsense tasks. However, this benefit comes with high computational costs: longer outputs increase latency, memory usage, and KV-cache demands. These issues are especially critical in software engineering tasks where concise and deterministic outputs are required. To investigate these trade-offs, we conduct an empirical study based on code generation benchmarks. The results reveal that longer CoT does not always help. Excessive reasoning often causes truncation, accuracy drops, and latency up to five times higher, with failed outputs consistently longer than successful ones. These findings challenge the assumption that longer reasoning is inherently better and highlight the need for adaptive CoT control. Motivated by this, we propose SEER (Self-Enhancing Efficient Reasoning), an adaptive framework that compresses CoT while preserving accuracy. SEER combines Best-of-N sampling with task-aware adaptive filtering, dynamically adjusting thresholds based on pre-inference outputs to reduce verbosity and computational overhead. We then evaluate SEER on three software engineering tasks and one math task. On average, SEER shortens CoT by 42.1%, improves accuracy by reducing truncation, and eliminates most infinite loops. These results demonstrate SEER as a practical method to make CoT-enhanced LLMs more efficient and robust, even under resource constraints.

LGMay 15, 2023
Private Training Set Inspection in MLaaS

Mingxue Xu, Tongtong Xu, Po-Yu Chen

Machine Learning as a Service (MLaaS) is a popular cloud-based solution for customers who aim to use an ML model but lack training data, computation resources, or expertise in ML. In this case, the training datasets are typically a private possession of the ML or data companies and are inaccessible to the customers, but the customers still need an approach to confirm that the training datasets meet their expectations and fulfil regulatory measures like fairness. However, no existing work addresses the above customers' concerns. This work is the first attempt to solve this problem, taking data origin as an entry point. We first define origin membership measurement and based on this, we then define diversity and fairness metrics to address customers' concerns. We then propose a strategy to estimate the values of these two metrics in the inaccessible training dataset, combining shadow training techniques from membership inference and an efficient featurization scheme in multiple instance learning. The evaluation contains an application of text review polarity classification applications based on the language BERT model. Experimental results show that our solution can achieve up to 0.87 accuracy for membership inspection and up to 99.3% confidence in inspecting diversity and fairness distribution.

SEJun 5, 2019
RESTORE: Retrospective Fault Localization Enhancing Automated Program Repair

Tongtong Xu, Liushan Chen, Yu Pei et al.

Fault localization is a crucial step of automated program repair, because accurately identifying program locations that are most closely implicated with a fault greatly affects the effectiveness of the patching process. An ideal fault localization technique would provide precise information while requiring moderate computational resources---to best support an efficient search for correct fixes. In contrast, most automated program repair tools use standard fault localization techniques---which are not tightly integrated with the overall program repair process, and hence deliver only subpar efficiency. In this paper, we present retrospective fault localization: a novel fault localization technique geared to the requirements of automated program repair. A key idea of retrospective fault localization is to reuse the outcome of failed patch validation to support mutation-based dynamic analysis---providing accurate fault localization information without incurring onerous computational costs. We implemented retrospective fault localization in a tool called RESTORE---based on the JAID Java program repair system. Experiments involving faults from the Defects4J standard benchmark indicate that retrospective fault localization can boost automated program repair: RESTORE efficiently explores a large fix space, delivering state-of-the-art effectiveness (41 Defects4J bugs correctly fixed, 8 more than any other automated repair tools for Java) while simultaneously boosting performance (speedup over 3 compared to JAID). Retrospective fault localization is applicable to any automated program repair techniques that rely on fault localization and dynamic validation of patches.