Tong Ye

h-index42

9papers

59citations

Novelty51%

AI Score31

Ranked #129,822 of 194,257 authors (top 67%)#23,569 in CL (top 77%)

9 Papers

0.5CLMar 15, 2023

On the Calibration and Uncertainty with Pólya-Gamma Augmentation for Dialog Retrieval Models

Tong Ye, Shijing Si, Jianzong Wang et al.

Deep neural retrieval models have amply demonstrated their power but estimating the reliability of their predictions remains challenging. Most dialog response retrieval models output a single score for a response on how relevant it is to a given question. However, the bad calibration of deep neural network results in various uncertainty for the single score such that the unreliable predictions always misinform user decisions. To investigate these issues, we present an efficient calibration and uncertainty estimation framework PG-DRR for dialog response retrieval models which adds a Gaussian Process layer to a deterministic deep neural network and recovers conjugacy for tractable posterior inference by Pólya-Gamma augmentation. Finally, PG-DRR achieves the lowest empirical calibration error (ECE) in the in-domain datasets and the distributional shift task while keeping $R_{10}@1$ and MAP performance.

1.3CLMar 15, 2023

Efficient Uncertainty Estimation with Gaussian Process for Reliable Dialog Response Retrieval

Tong Ye, Zhitao Li, Jianzong Wang et al.

Deep neural networks have achieved remarkable performance in retrieval-based dialogue systems, but they are shown to be ill calibrated. Though basic calibration methods like Monte Carlo Dropout and Ensemble can calibrate well, these methods are time-consuming in the training or inference stages. To tackle these challenges, we propose an efficient uncertainty calibration framework GPF-BERT for BERT-based conversational search, which employs a Gaussian Process layer and the focal loss on top of the BERT architecture to achieve a high-quality neural ranker. Extensive experiments are conducted to verify the effectiveness of our method. In comparison with basic calibration methods, GPF-BERT achieves the lowest empirical calibration error (ECE) in three in-domain datasets and the distributional shift tasks, while yielding the highest $R_{10}@1$ and MAP performance on most cases. In terms of time consumption, our GPF-BERT has an 8$\times$ speedup.

5.7SDJun 27, 2022Code

Uncertainty Calibration for Deep Audio Classifiers

Tong Ye, Shijing Si, Jianzong Wang et al.

Although deep Neural Networks (DNNs) have achieved tremendous success in audio classification tasks, their uncertainty calibration are still under-explored. A well-calibrated model should be accurate when it is certain about its prediction and indicate high uncertainty when it is likely to be inaccurate. In this work, we investigate the uncertainty calibration for deep audio classifiers. In particular, we empirically study the performance of popular calibration methods: (i) Monte Carlo Dropout, (ii) ensemble, (iii) focal loss, and (iv) spectral-normalized Gaussian process (SNGP), on audio classification datasets. To this end, we evaluate (i-iv) for the tasks of environment sound and music genre classification. Results indicate that uncalibrated deep audio classifiers may be over-confident, and SNGP performs the best and is very efficient on the two datasets of this paper.

3.0IVNov 30, 2023

Quantification of cardiac capillarization in single-immunostained myocardial slices using weakly supervised instance segmentation

Zhao Zhang, Xiwen Chen, William Richardson et al.

Decreased myocardial capillary density has been reported as an important histopathological feature associated with various heart disorders. Quantitative assessment of cardiac capillarization typically involves double immunostaining of cardiomyocytes (CMs) and capillaries in myocardial slices. In contrast, single immunostaining of basement membrane components is a straightforward approach to simultaneously label CMs and capillaries, presenting fewer challenges in background staining. However, subsequent image analysis always requires manual work in identifying and segmenting CMs and capillaries. Here, we developed an image analysis tool, AutoQC, to automatically identify and segment CMs and capillaries in immunofluorescence images of collagen type IV, a predominant basement membrane protein within the myocardium. In addition, commonly used capillarization-related measurements can be derived from segmentation masks. AutoQC features a weakly supervised instance segmentation algorithm by leveraging the power of a pre-trained segmentation model via prompt engineering. AutoQC outperformed YOLOv8-Seg, a state-of-the-art instance segmentation model, in both instance segmentation and capillarization assessment. Furthermore, the training of AutoQC required only a small dataset with bounding box annotations instead of pixel-wise annotations, leading to a reduced workload during network training. AutoQC provides an automated solution for quantifying cardiac capillarization in basement-membrane-immunostained myocardial slices, eliminating the need for manual image analysis once it is trained.

15.9SEFeb 17, 2025

LLM4EFFI: Leveraging Large Language Models to Enhance Code Efficiency and Correctness

Tong Ye, Weigang Huang, Xuhong Zhang et al.

Large Language Models (LLMs), particularly Code LLMs, have demonstrated impressive performance in code generation. Current research primarily focuses on the correctness of generated code, while efficiency remains less explored. Recent works have focused on modifying the initial version of the code to improve its efficiency. However, such refinements are limited by the algorithmic design and overall logic of the initial code, resulting in only incremental improvements. In contrast, when human developers write high-quality code, they typically begin by designing several potential solutions at the logical level, evaluating various algorithms and their complexities, and then proceeding to implement and optimize the solution. In this study, we introduce \tool: \uline{L}arge \uline{L}anguage \uline{M}odel for Code \uline{Effi}ciency, a novel framework that enables LLMs to generate code that balances both efficiency and correctness. Specifically, \tool divides the efficiency optimization process into two domains: algorithmic exploration in the logic domain and implementation optimization in the code domain. The correctness of the code is then guaranteed through a synthetic test case refinement process. This approach, which prioritizes efficiency before ensuring correctness, offers a new paradigm for efficient code generation. Experiments demonstrate that \tool consistently improves both efficiency and correctness, achieving new state-of-the-art performance in code efficiency benchmarks across various LLM backbones.

5.2CVJan 11, 2024

Enhancing Digital Hologram Reconstruction Using Reverse-Attention Loss for Untrained Physics-Driven Deep Learning Models with Uncertain Distance

Xiwen Chen, Hao Wang, Zhao Zhang et al.

Untrained Physics-based Deep Learning (DL) methods for digital holography have gained significant attention due to their benefits, such as not requiring an annotated training dataset, and providing interpretability since utilizing the governing laws of hologram formation. However, they are sensitive to the hard-to-obtain precise object distance from the imaging plane, posing the $\textit{Autofocusing}$ challenge. Conventional solutions involve reconstructing image stacks for different potential distances and applying focus metrics to select the best results, which apparently is computationally inefficient. In contrast, recently developed DL-based methods treat it as a supervised task, which again needs annotated data and lacks generalizability. To address this issue, we propose $\textit{reverse-attention loss}$, a weighted sum of losses for all possible candidates with learnable weights. This is a pioneering approach to addressing the Autofocusing challenge in untrained deep-learning methods. Both theoretical analysis and experiments demonstrate its superiority in efficiency and accuracy. Interestingly, our method presents a significant reconstruction performance over rival methods (i.e. alternating descent-like optimization, non-weighted loss integration, and random distance assignment) and even is almost equal to that achieved with a precisely known object distance. For example, the difference is less than 1dB in PSNR and 0.002 in SSIM for the target sample in our experiment.

5.9PLJun 17, 2024

A Problem-Oriented Perspective and Anchor Verification for Code Optimization

Tong Ye, Tengfei Ma, Xuhong Zhang et al.

Large Language Models (LLMs) have shown remarkable capabilities in solving various programming tasks, such as code generation. However, their potential for code optimization, particularly in performance enhancement, remains largely unexplored. This paper investigates the capabilities of LLMs in optimizing code for minimal execution time, addressing a critical gap in current research. The recently proposed code optimization methods construct program optimization pairs based on iterative submissions from the same programmer for the same problem. However, this approach confines LLMs to local performance improvements, neglecting global algorithmic innovation. To overcome this limitation, we adopt a completely different perspective by reconstructing the optimization pairs into a problem-oriented approach. This allows for the integration of various ideas from multiple programmers tackling the same problem. Furthermore, we observe that code optimization presents greater challenges compared to code generation, often accompanied by "optimization tax". Recognizing the inherent trade-offs in correctness and efficiency, we introduce a novel anchor verification framework to mitigate this "optimization tax". Ultimately, the problem oriented perspective combined with the anchor verification framework significantly enhances both the correct optimization ratio and speedup to new levels.

23.8AIMay 18, 2023Code

Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization

Tong Ye, Lingfei Wu, Tengfei Ma et al.

Automatically generating human-readable text describing the functionality of a program is the intent of source code summarization. Although neural language models achieve significant performance in this field, they are limited by their inability to access external knowledge. To address this limitation, an emerging trend is combining neural models with external knowledge through retrieval methods. Previous methods have relied on the sentence-level retrieval paradigm on the encoder side. However, this paradigm is coarse-grained, noise-filled and cannot directly take advantage of the high-quality retrieved summary tokens on the decoder side. In this paper, we propose a fine-grained Token-level retrieval-augmented mechanism (Tram) on the decoder side rather than the encoder side to enhance the performance of neural models and produce more low-frequency tokens in generating summaries. Furthermore, to overcome the challenge of token-level retrieval in capturing contextual code semantics, we also propose integrating code semantics into individual summary tokens. The results of extensive experiments and human evaluation show that our token-level retrieval-augmented approach significantly improves performance and is more interpretable.

0.6CLFeb 22, 2022

VU-BERT: A Unified framework for Visual Dialog

Tong Ye, Shijing Si, Jianzong Wang et al.

The visual dialog task attempts to train an agent to answer multi-turn questions given an image, which requires the deep understanding of interactions between the image and dialog history. Existing researches tend to employ the modality-specific modules to model the interactions, which might be troublesome to use. To fill in this gap, we propose a unified framework for image-text joint embedding, named VU-BERT, and apply patch projection to obtain vision embedding firstly in visual dialog tasks to simplify the model. The model is trained over two tasks: masked language modeling and next utterance retrieval. These tasks help in learning visual concepts, utterances dependence, and the relationships between these two modalities. Finally, our VU-BERT achieves competitive performance (0.7287 NDCG scores) on VisDial v1.0 Datasets.