Yuxin Ye

AI
h-index2
4papers
37citations
Novelty56%
AI Score35

4 Papers

SDOct 31, 2023
LAVSS: Location-Guided Audio-Visual Spatial Audio Separation

Yuxin Ye, Wenming Yang, Yapeng Tian

Existing machine learning research has achieved promising results in monaural audio-visual separation (MAVS). However, most MAVS methods purely consider what the sound source is, not where it is located. This can be a problem in VR/AR scenarios, where listeners need to be able to distinguish between similar audio sources located in different directions. To address this limitation, we have generalized MAVS to spatial audio separation and proposed LAVSS: a location-guided audio-visual spatial audio separator. LAVSS is inspired by the correlation between spatial audio and visual location. We introduce the phase difference carried by binaural audio as spatial cues, and we utilize positional representations of sounding objects as additional modality guidance. We also leverage multi-level cross-modal attention to perform visual-positional collaboration with audio features. In addition, we adopt a pre-trained monaural separator to transfer knowledge from rich mono sounds to boost spatial audio separation. This exploits the correlation between monaural and binaural channels. Experiments on the FAIR-Play dataset demonstrate the superiority of the proposed LAVSS over existing benchmarks of audio-visual separation. Our project page: https://yyx666660.github.io/LAVSS/.

AIOct 27, 2023
Ontology Revision based on Pre-trained Language Models

Qiu Ji, Guilin Qi, Yuxin Ye et al.

Ontology revision aims to seamlessly incorporate a new ontology into an existing ontology and plays a crucial role in tasks such as ontology evolution, ontology maintenance, and ontology alignment. Similar to repair single ontologies, resolving logical incoherence in the task of ontology revision is also important and meaningful, because incoherence is a main potential factor to cause inconsistency and reasoning with an inconsistent ontology will obtain meaningless answers.To deal with this problem, various ontology revision approaches have been proposed to define revision operators and design ranking strategies for axioms in an ontology. However, they rarely consider axiom semantics which provides important information to differentiate axioms. In addition, pre-trained models can be utilized to encode axiom semantics, and have been widely applied in many natural language processing tasks and ontology-related ones in recent years.Therefore, in this paper, we study how to apply pre-trained models to revise ontologies. We first define four scoring functions to rank axioms based on a pre-trained model by considering various information from an ontology. Based on the functions, an ontology revision algorithm is then proposed to deal with unsatisfiable concepts at once. To improve efficiency, an adapted revision algorithm is designed to deal with unsatisfiable concepts group by group. We conduct experiments over 19 ontology pairs and compare our algorithms and scoring functions with existing ones. According to the experiments, our algorithms could achieve promising performance.

IVJul 4, 2025
Dual-Alignment Knowledge Retention for Continual Medical Image Segmentation

Yuxin Ye, Yan Liu, Shujian Yu

Continual learning in medical image segmentation involves sequential data acquisition across diverse domains (e.g., clinical sites), where task interference between past and current domains often leads to catastrophic forgetting. Existing continual learning methods fail to capture the complex dependencies between tasks. We introduce a novel framework that mitigates forgetting by establishing and enhancing complex dependencies between historical data and the network in the present task. Our framework features a dual-alignment strategy, the cross-network alignment (CNA) module aligns the features extracted from the bottleneck layers of the current and previous networks, respectively, while the cross-representation alignment (CRA) module aligns the features learned by the current network from historical buffered data and current input data, respectively. Implementing both types of alignment is a non-trivial task. To address this, we further analyze the linear and nonlinear forms of the well-established Hilbert-Schmidt Independence Criterion (HSIC) and deliberately design feature mapping and feature pairing blocks within the CRA module. Experiments on medical image segmentation task demonstrate our framework's effectiveness in mitigating catastrophic forgetting under domain shifts.

CLJun 25, 2024
Banishing LLM Hallucinations Requires Rethinking Generalization

Johnny Li, Saksham Consul, Eda Zhou et al.

Despite their powerful chat, coding, and reasoning abilities, Large Language Models (LLMs) frequently hallucinate. Conventional wisdom suggests that hallucinations are a consequence of a balance between creativity and factuality, which can be mitigated, but not eliminated, by grounding the LLM in external knowledge sources. Through extensive systematic experiments, we show that these traditional approaches fail to explain why LLMs hallucinate in practice. Specifically, we show that LLMs augmented with a massive Mixture of Memory Experts (MoME) can easily memorize large datasets of random numbers. We corroborate these experimental findings with a theoretical construction showing that simple neural networks trained to predict the next token hallucinate when the training loss is above a threshold as it usually does in practice when training on internet scale data. We interpret our findings by comparing against traditional retrieval methods for mitigating hallucinations. We use our findings to design a first generation model for removing hallucinations -- Lamini-1 -- that stores facts in a massive mixture of millions of memory experts that are retrieved dynamically.