Shih-Hsuan Chiu

CL
h-index5
5papers
64citations
Novelty46%
AI Score39

5 Papers

LGJan 27
CiMRAG: Cim-Aware Domain-Adaptive and Noise-Resilient Retrieval-Augmented Generation for Edge-Based LLMs

Shih-Hsuan Chiu, Ming-Syan Chen

Personalized virtual assistants powered by large language models (LLMs) on edge devices are attracting growing attention, with Retrieval-Augmented Generation (RAG) emerging as a key method for personalization by retrieving relevant profile data and generating tailored responses. However, deploying RAG on edge devices faces efficiency hurdles due to the rapid growth of profile data, such as user-LLM interactions and recent updates. While Computing-in-Memory (CiM) architectures mitigate this bottleneck by eliminating data movement between memory and processing units via in-situ operations, they are susceptible to environmental noise that can degrade retrieval precision. This poses a critical issue in dynamic, multi-domain edge-based scenarios (e.g., travel, medicine, and law) where both accuracy and adaptability are paramount. To address these challenges, we propose Task-Oriented Noise-resilient Embedding Learning (TONEL), a framework that improves noise robustness and domain adaptability for RAG in noisy edge environments. TONEL employs a noise-aware projection model to learn task-specific embeddings compatible with CiM hardware constraints, enabling accurate retrieval under noisy conditions. Extensive experiments conducted on personalization benchmarks demonstrate the effectiveness and practicality of our methods relative to strong baselines, especially in task-specific noisy scenarios.

SIFeb 15, 2025
Human-Centric Community Detection in Hybrid Metaverse Networks with Integrated AI Entities

Shih-Hsuan Chiu, Ya-Wen Teng, De-Nian Yang et al.

Community detection is a cornerstone problem in social network analysis (SNA), aimed at identifying cohesive communities with minimal external links. However, the rise of generative AI and Metaverse introduce complexities by creating hybrid human-AI social networks (denoted by HASNs), where traditional methods fall short, especially in human-centric settings. This paper introduces a novel community detection problem in HASNs (denoted by MetaCD), which seeks to enhance human connectivity within communities while reducing the presence of AI nodes. Effective processing of MetaCD poses challenges due to the delicate trade-off between excluding certain AI nodes and maintaining community structure. To address this, we propose CUSA, an innovative framework incorporating AI-aware clustering techniques that navigate this trade-off by selectively retaining AI nodes that contribute to community integrity. Furthermore, given the scarcity of real-world HASNs, we devise four strategies for synthesizing these networks under various hypothetical scenarios. Empirical evaluations on real social networks, reconfigured as HASNs, demonstrate the effectiveness and practicality of our approach compared to traditional non-deep learning and graph neural network (GNN)-based methods.

CLNov 5, 2021
Effective Cross-Utterance Language Modeling for Conversational Speech Recognition

Bi-Cheng Yan, Hsin-Wei Wang, Shih-Hsuan Chiu et al.

Conversational speech normally is embodied with loose syntactic structures at the utterance level but simultaneously exhibits topical coherence relations across consecutive utterances. Prior work has shown that capturing longer context information with a recurrent neural network or long short-term memory language model (LM) may suffer from the recent bias while excluding the long-range context. In order to capture the long-term semantic interactions among words and across utterances, we put forward disparate conversation history fusion methods for language modeling in automatic speech recognition (ASR) of conversational speech. Furthermore, a novel audio-fusion mechanism is introduced, which manages to fuse and utilize the acoustic embeddings of a current utterance and the semantic content of its corresponding conversation history in a cooperative way. To flesh out our ideas, we frame the ASR N-best hypothesis rescoring task as a prediction problem, leveraging BERT, an iconic pre-trained LM, as the ingredient vehicle to facilitate selection of the oracle hypothesis from a given N-best hypothesis list. Empirical experiments conducted on the AMI benchmark dataset seem to demonstrate the feasibility and efficacy of our methods in relation to some current top-of-line methods. The proposed methods not only achieve significant inference time reduction but also improve the ASR performance for conversational speech.

CLJun 13, 2021
Cross-utterance Reranking Models with BERT and Graph Convolutional Networks for Conversational Speech Recognition

Shih-Hsuan Chiu, Tien-Hong Lo, Fu-An Chao et al.

How to effectively incorporate cross-utterance information cues into a neural language model (LM) has emerged as one of the intriguing issues for automatic speech recognition (ASR). Existing research efforts on improving contextualization of an LM typically regard previous utterances as a sequence of additional input and may fail to capture complex global structural dependencies among these utterances. In view of this, we in this paper seek to represent the historical context information of an utterance as graph-structured data so as to distill cross-utterances, global word interaction relationships. To this end, we apply a graph convolutional network (GCN) on the resulting graph to obtain the corresponding GCN embeddings of historical words. GCN has recently found its versatile applications on social-network analysis, text summarization, and among others due mainly to its ability of effectively capturing rich relational information among elements. However, GCN remains largely underexplored in the context of ASR, especially for dealing with conversational speech. In addition, we frame ASR N-best reranking as a prediction problem, leveraging bidirectional encoder representations from transformers (BERT) as the vehicle to not only seize the local intrinsic word regularity patterns inherent in a candidate hypothesis but also incorporate the cross-utterance, historical word interaction cues distilled by GCN for promoting performance. Extensive experiments conducted on the AMI benchmark dataset seem to confirm the pragmatic utility of our methods, in relation to some current top-of-the-line methods.

CLApr 11, 2021
Innovative Bert-based Reranking Language Models for Speech Recognition

Shih-Hsuan Chiu, Berlin Chen

More recently, Bidirectional Encoder Representations from Transformers (BERT) was proposed and has achieved impressive success on many natural language processing (NLP) tasks such as question answering and language understanding, due mainly to its effective pre-training then fine-tuning paradigm as well as strong local contextual modeling ability. In view of the above, this paper presents a novel instantiation of the BERT-based contextualized language models (LMs) for use in reranking of N-best hypotheses produced by automatic speech recognition (ASR). To this end, we frame N-best hypothesis reranking with BERT as a prediction problem, which aims to predict the oracle hypothesis that has the lowest word error rate (WER) given the N-best hypotheses (denoted by PBERT). In particular, we also explore to capitalize on task-specific global topic information in an unsupervised manner to assist PBERT in N-best hypothesis reranking (denoted by TPBERT). Extensive experiments conducted on the AMI benchmark corpus demonstrate the effectiveness and feasibility of our methods in comparison to the conventional autoregressive models like the recurrent neural network (RNN) and a recently proposed method that employed BERT to compute pseudo-log-likelihood (PLL) scores for N-best hypothesis reranking.