Yu-Chen Lin

CL
h-index3
11papers
311citations
Novelty49%
AI Score31

11 Papers

ASJun 18, 2022
NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling

Chi-Chang Lee, Cheng-Hung Hu, Yu-Chen Lin et al.

For deep learning-based speech enhancement (SE) systems, the training-test acoustic mismatch can cause notable performance degradation. To address the mismatch issue, numerous noise adaptation strategies have been derived. In this paper, we propose a novel method, called noise adaptive speech enhancement with target-conditional resampling (NASTAR), which reduces mismatches with only one sample (one-shot) of noisy speech in the target environment. NASTAR uses a feedback mechanism to simulate adaptive training data via a noise extractor and a retrieval model. The noise extractor estimates the target noise from the noisy speech, called pseudo-noise. The noise retrieval model retrieves relevant noise samples from a pool of noise signals according to the noisy speech, called relevant-cohort. The pseudo-noise and the relevant-cohort set are jointly sampled and mixed with the source speech corpus to prepare simulated training data for noise adaptation. Experimental results show that NASTAR can effectively use one noisy speech sample to adapt an SE model to a target condition. Moreover, both the noise extractor and the noise retrieval model contribute to model adaptation. To our best knowledge, NASTAR is the first work to perform one-shot noise adaptation through noise extraction and retrieval.

CLJun 12, 2023
Linear Classifier: An Often-Forgotten Baseline for Text Classification

Yu-Chen Lin, Si-An Chen, Jie-Jyun Liu et al.

Large-scale pre-trained language models such as BERT are popular solutions for text classification. Due to the superior performance of these advanced methods, nowadays, people often directly train them for a few epochs and deploy the obtained model. In this opinion paper, we point out that this way may only sometimes get satisfactory results. We argue the importance of running a simple baseline like linear classifiers on bag-of-words features along with advanced methods. First, for many text data, linear methods show competitive performance, high efficiency, and robustness. Second, advanced models such as BERT may only achieve the best results if properly applied. Simple baselines help to confirm whether the results of advanced models are acceptable. Our experimental results fully support these points.

CLNov 27, 2023
Novel Preprocessing Technique for Data Embedding in Engineering Code Generation Using Large Language Model

Yu-Chen Lin, Akhilesh Kumar, Norman Chang et al.

We present four main contributions to enhance the performance of Large Language Models (LLMs) in generating domain-specific code: (i) utilizing LLM-based data splitting and data renovation techniques to improve the semantic representation of embeddings' space; (ii) introducing the Chain of Density for Renovation Credibility (CoDRC), driven by LLMs, and the Adaptive Text Renovation (ATR) algorithm for assessing data renovation reliability; (iii) developing the Implicit Knowledge Expansion and Contemplation (IKEC) Prompt technique; and (iv) effectively refactoring existing scripts to generate new and high-quality scripts with LLMs. By using engineering simulation software RedHawk-SC as a case study, we demonstrate the effectiveness of our data pre-processing method for expanding and categorizing scripts. When combined with IKEC, these techniques enhance the Retrieval-Augmented Generation (RAG) method in retrieving more relevant information, ultimately achieving a 73.33% "Percentage of Correct Lines" for code generation problems in MapReduce applications.

CVOct 18, 2023
Domain-Generalized Face Anti-Spoofing with Unknown Attacks

Zong-Wei Hong, Yu-Chen Lin, Hsuan-Tung Liu et al.

Although face anti-spoofing (FAS) methods have achieved remarkable performance on specific domains or attack types, few studies have focused on the simultaneous presence of domain changes and unknown attacks, which is closer to real application scenarios. To handle domain-generalized unknown attacks, we introduce a new method, DGUA-FAS, which consists of a Transformer-based feature extractor and a synthetic unknown attack sample generator (SUASG). The SUASG network simulates unknown attack samples to assist the training of the feature extractor. Experimental results show that our method achieves superior performance on domain generalization FAS with known or unknown attacks.

CLFeb 14, 2025
Efficient Multitask Learning in Small Language Models Through Upside-Down Reinforcement Learning

Yu-Chen Lin, Sanat Sharma, Hari Manikandan et al.

In this work, we demonstrate that small language models (SLMs), specifically a 100M parameter GPT-2 model, can achieve competitive performance in multitask prompt generation tasks while requiring only a fraction of the computational resources needed by large language models (LLMs). Through a novel combination of upside-down reinforcement learning and synthetic data distillation from a powerful LLM, Llama-3, we train an SLM that achieves relevance scores within 5% of state-of-the-art models, including Llama-3, Qwen2, and Mistral, despite being up to 80 times smaller, making it highly suitable for resource-constrained and real-time applications. This study highlights the potential of SLMs as efficient multitask learners in multimodal settings, providing a promising alternative to LLMs for scalable, low-latency deployments.

CVApr 9, 2024
Improving Facial Landmark Detection Accuracy and Efficiency with Knowledge Distillation

Zong-Wei Hong, Yu-Chen Lin

The domain of computer vision has experienced significant advancements in facial-landmark detection, becoming increasingly essential across various applications such as augmented reality, facial recognition, and emotion analysis. Unlike object detection or semantic segmentation, which focus on identifying objects and outlining boundaries, faciallandmark detection aims to precisely locate and track critical facial features. However, deploying deep learning-based facial-landmark detection models on embedded systems with limited computational resources poses challenges due to the complexity of facial features, especially in dynamic settings. Additionally, ensuring robustness across diverse ethnicities and expressions presents further obstacles. Existing datasets often lack comprehensive representation of facial nuances, particularly within populations like those in Taiwan. This paper introduces a novel approach to address these challenges through the development of a knowledge distillation method. By transferring knowledge from larger models to smaller ones, we aim to create lightweight yet powerful deep learning models tailored specifically for facial-landmark detection tasks. Our goal is to design models capable of accurately locating facial landmarks under varying conditions, including diverse expressions, orientations, and lighting environments. The ultimate objective is to achieve high accuracy and real-time performance suitable for deployment on embedded systems. This method was successfully implemented and achieved a top 6th place finish out of 165 participants in the IEEE ICME 2024 PAIR competition.

SDNov 8, 2021
SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points

Yu-Chen Lin, Cheng Yu, Yi-Te Hsu et al.

Numerous compression and acceleration strategies have achieved outstanding results on classification tasks in various fields, such as computer vision and speech signal processing. Nevertheless, the same strategies have yielded ungratified performance on regression tasks because the nature between these and classification tasks differs. In this paper, a novel sign-exponent-only floating-point network (SEOFP-NET) technique is proposed to compress the model size and accelerate the inference time for speech enhancement, a regression task of speech signal processing. The proposed method compressed the sizes of deep neural network (DNN)-based speech enhancement models by quantizing the fraction bits of single-precision floating-point parameters during training. Before inference implementation, all parameters in the trained SEOFP-NET model are slightly adjusted to accelerate the inference time by replacing the floating-point multiplier with an integer-adder. For generalization, the SEOFP-NET technique is introduced to different speech enhancement tasks in speech signal processing with different model architectures under various corpora. The experimental results indicate that the size of SEOFP-NET models can be significantly compressed by up to 81.249% without noticeably downgrading their speech enhancement performance, and the inference time can be accelerated to 1.212x compared with the baseline models. The results also verify that the proposed SEOFP-NET can cooperate with other efficiency strategies to achieve a synergy effect for model compression. In addition, the just noticeable difference (JND) was applied to the user study experiment to statistically analyze the effect of speech enhancement on listening. The results indicate that the listeners cannot facilely differentiate between the enhanced speech signals processed by the baseline model and the proposed SEOFP-NET.

SDJun 9, 2021
Speech Recovery for Real-World Self-powered Intermittent Devices

Yu-Chen Lin, Tsun-An Hsieh, Kuo-Hsuan Hung et al.

The incompleteness of speech inputs severely degrades the performance of all the related speech signal processing applications. Although many researches have been proposed to address this issue, they controlled the data missing conditions by simulation with self-defined masking lengths or sizes. Besides, the masking definitions are different among all these experimental settings. This paper presents a novel intermittent speech recovery (ISR) system for real-world self-powered intermittent devices. Three contributive stages: interpolation, enhancement, and combination are applied to the ISR system for speech reconstruction. The experimental results show that our recovery system increases speech quality by up to 591.7%, while increasing speech intelligibility by up to 80.5%. Most importantly, the proposed ISR system improves the WER scores by up to 52.6%. The promising results not only confirm the effectiveness of the reconstruction but also encourage the utilization of these battery-free wearable/IoT devices.

ASMay 24, 2020
SERIL: Noise Adaptive Speech Enhancement using Regularization-based Incremental Learning

Chi-Chang Lee, Yu-Chen Lin, Hsuan-Tien Lin et al.

Numerous noise adaptation techniques have been proposed to fine-tune deep-learning models in speech enhancement (SE) for mismatched noise environments. Nevertheless, adaptation to a new environment may lead to catastrophic forgetting of the previously learned environments. The catastrophic forgetting issue degrades the performance of SE in real-world embedded devices, which often revisit previous noise environments. The nature of embedded devices does not allow solving the issue with additional storage of all pre-trained models or earlier training data. In this paper, we propose a regularization-based incremental learning SE (SERIL) strategy, complementing existing noise adaptation strategies without using additional storage. With a regularization constraint, the parameters are updated to the new noise environment while retaining the knowledge of the previous noise environments. The experimental results show that, when faced with a new noise domain, the SERIL model outperforms the unadapted SE model. Meanwhile, compared with the current adaptive technique based on fine-tuning, the SERIL model can reduce the forgetting of previous noise environments by 52%. The results verify that the SERIL model can effectively adjust itself to new noise environments while overcoming the catastrophic forgetting issue. The results make SERIL a favorable choice for real-world SE applications, where the noise environment changes frequently.

LGSep 25, 2019
Stock Prices Prediction using Deep Learning Models

Jialin Liu, Fei Chao, Yu-Chen Lin et al.

Financial markets have a vital role in the development of modern society. They allow the deployment of economic resources. Changes in stock prices reflect changes in the market. In this study, we focus on predicting stock prices by deep learning model. This is a challenge task, because there is much noise and uncertainty in information that is related to stock prices. So this work uses sparse autoencoders with one-dimension (1-D) residual convolutional networks which is a deep learning model, to de-noise the data. Long-short term memory (LSTM) is then used to predict the stock price. The prices, indices and macroeconomic variables in past are the features used to predict the next day's price. Experiment results show that 1-D residual convolutional networks can de-noise data and extract deep features better than a model that combines wavelet transforms (WT) and stacked autoencoders (SAEs). In addition, we compare the performances of model with two different forecast targets of stock price: absolute stock price and price rate of change. The results show that predicting stock price through price rate of change is better than predicting absolute prices directly.

ASAug 17, 2018
A study on speech enhancement using exponent-only floating point quantized neural network (EOFP-QNN)

Yi-Te Hsu, Yu-Chen Lin, Szu-Wei Fu et al.

Numerous studies have investigated the effectiveness of neural network quantization on pattern classification tasks. The present study, for the first time, investigated the performance of speech enhancement (a regression task in speech processing) using a novel exponent-only floating-point quantized neural network (EOFP-QNN). The proposed EOFP-QNN consists of two stages: mantissa-quantization and exponent-quantization. In the mantissa-quantization stage, EOFP-QNN learns how to quantize the mantissa bits of the model parameters while preserving the regression accuracy using the least mantissa precision. In the exponent-quantization stage, the exponent part of the parameters is further quantized without causing any additional performance degradation. We evaluated the proposed EOFP quantization technique on two types of neural networks, namely, bidirectional long short-term memory (BLSTM) and fully convolutional neural network (FCN), on a speech enhancement task. Experimental results showed that the model sizes can be significantly reduced (the model sizes of the quantized BLSTM and FCN models were only 18.75% and 21.89%, respectively, compared to those of the original models) while maintaining satisfactory speech-enhancement performance.