Soung Chang Liew

CR
h-index: 46
17 papers
439 citations
Novelty: 59%
AI Score: 58

17 Papers

81.9 AR May 9 Code
VeriRAG: A Retrieval-Augmented Framework for Automated RTL Testability Repair

Haomin Qi, Yuyang Du, Lihao Zhang et al.

Large language models (LLMs) have demonstrated immense potential in computer-aided design (CAD), particularly for automated debugging and verification within electronic design automation (EDA) tools. However, Design for Testability (DFT) remains a relatively underexplored area. This paper presents VeriRAG, the first LLM-assisted DFT-EDA framework. VeriRAG leverages a Retrieval-Augmented Generation (RAG) approach to enable an LLM to revise code to ensure DFT compliance. VeriRAG integrates (1) an autoencoder-based similarity measurement model for precise retrieval of reference RTL designs for the LLM, and (2) an iterative code revision pipeline that allows the LLM to ensure DFT compliance while maintaining synthesizability. To support VeriRAG, we introduce VeriDFT, a Verilog-based DFT dataset curated for DFT-aware RTL repairs. VeriRAG retrieves structurally similar RTL designs from VeriDFT, each paired with a rigorously validated correction, as references for code repair. With VeriRAG and VeriDFT, we achieve fully automated DFT correction -- resulting in a 7.72-fold improvement in successful repair rate compared to the zero-shot baseline (Fig. 5 in Section V). Ablation studies further confirm the contribution of each component of the VeriRAG framework. We open-source our data, models, and scripts at https://github.com/HarminChee/VeriRAG.

82.5 LG Apr 18 Code
Learning to Trade Like an Expert: Cognitive Fine-Tuning for Stable Financial Reasoning in Language Models

Yuchen Pan, Soung Chang Liew

Recent deployments of large language models (LLMs) as autonomous trading agents raise questions about whether financial decision-making competence generalizes beyond specific market patterns and how it should be trained and evaluated in noisy markets lacking ground truth. We propose a structured framework for training and evaluating such models. Central to our approach is a curated, multiple-choice question (MCQ) dataset derived from classic textbooks and historical markets, verified by an AI committee, enriched with structured reasoning traces, and augmented to reduce shortcut learning. To evaluate whether performance on isolated MCQs generalizes to real-world trading, we introduce a two-stage protocol combining test-set evaluation with an MCQ-based chronological trading simulation. Extensive evaluations across market regimes provide statistically robust evidence that open models trained with our framework exhibit competitive, risk-aware behavior over time, outperform open-source baselines, and approach frontier-model performance at smaller scale. We release the dataset and evaluation framework to support further research.

17.2 IT Mar 27
CL-SEC: Cross-Layer Semantic Error Correction Empowered by Language Models

Yirun Wang, Yuyang Du, Soung Chang Liew et al.

Achieving reliable communication has long been a fundamental challenge in networked systems. Semantic Error Correction (SEC) leverages the semantic understanding capabilities of language models (LMs) to perform application-layer error correction, complementing conventional channel decoding. While promising, existing SEC approaches rely solely on context captured by LMs at the application layer, ignoring the rich information available at the physical layer. To address this limitation, this paper introduces Cross-Layer SEC (CL-SEC), an LM-empowered error correction framework that integrates cross-layer information from both the physical and application layers to jointly correct corrupted words in text communication. Using a Bayesian combination in product form tailored to this framework, CL-SEC achieves significantly improved performance over methods that process information in isolated layers. CL-SEC shows substantial gains across multiple error-correction metrics, including bit-error rate, word-error rate, and semantic fidelity scores. Importantly, unlike most semantic communication systems that focus solely on recovering the semantic meaning of transmitted messages, CL-SEC aims to reconstruct the original transmitted message verbatim, leveraging the semantic understanding capabilities of LMs for precise reconstruction.
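
The abstract does not give the exact fusion rule, so the following is only a minimal sketch of what a product-form Bayesian combination could look like. It assumes the language model supplies a probability for each candidate word, the physical layer supplies a likelihood for the same candidates, and the two sources are treated as conditionally independent. The function name `fuse_product` and the example numbers are hypothetical and are not taken from the paper.

```python
def fuse_product(lm_probs, phy_likelihoods):
    """Fuse two per-candidate scores by a normalized product.

    lm_probs: dict mapping candidate word -> probability from the language model.
    phy_likelihoods: dict mapping candidate word -> likelihood from the physical layer.
    Both inputs are illustrative assumptions, not the paper's actual interface.
    """
    # Only candidates scored by both layers can be fused.
    candidates = lm_probs.keys() & phy_likelihoods.keys()
    scores = {w: lm_probs[w] * phy_likelihoods[w] for w in candidates}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no candidate has non-zero support in both layers")
    # Normalize so the fused scores form a probability distribution.
    return {w: s / total for w, s in scores.items()}


# Hypothetical example: the LM prefers "cat", the channel evidence prefers "cut".
lm = {"cat": 0.6, "cut": 0.3, "cot": 0.1}
phy = {"cat": 0.2, "cut": 0.7, "cot": 0.1}
posterior = fuse_product(lm, phy)
best_word = max(posterior, key=posterior.get)
print(best_word, round(posterior[best_word], 3))
```

In this toy case neither layer alone decides the outcome; the product rewards the candidate that is plausible under both sources of evidence.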

33.6 CR Apr 12
BioZero: Privacy-Preserving and Publicly Verifiable On-Chain Biometric Authentication via Homomorphic Commitments and Zero-Knowledge Proofs

Zibin Lin, Taotao Wang, Junhao Lai et al.

Decentralized identity systems promise user-controlled identifiers and cross-domain verification without a shared identity provider, yet authentication still reduces to possession of keys or credentials once secrets are leaked, reused, or replayed. We present BioZero, a privacy-preserving biometric authentication protocol for decentralized identity that binds an enrolled identity to a biometric witness without revealing biometric templates, while enabling publicly verifiable on-chain decisions. BioZero combines Pedersen commitment-homomorphic computation, consistency spot-checks, and Groth16 zero-knowledge proofs to achieve identity-bound authentication with succinct on-chain verification. We analyze acceptance soundness, freshness, template privacy, and non-malleability under an open decentralized threat model including replay, timing, brute-force, oracle, and forgery attacks. On an Ethereum testbed, BioZero achieves up to 67.8x lower network-adjusted total authentication latency and up to 266.4x faster client-side proving than a zk-SNARK-only baseline. Verification stays in the millisecond range (28.8-41.2 ms vs. 35.4-77.6 ms). With lambda=1 spot-checking, gas grows from 336,778 to 954,066 as N increases from 2 to 128, becomes lower than the baseline from N>=16, and is 2.59x lower at N=128. LFW experiments on 128D and 512D models show accuracy loss below 1% across practical quantization ranges. These results indicate that BioZero is a practical authentication layer for decentralized biometric identity systems.
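
The additive homomorphism of Pedersen commitments, which the abstract names as one building block, can be illustrated with a toy example. This sketch is not BioZero's construction: the group parameters below are tiny illustrative assumptions with no security, and in a real deployment the discrete logarithm of `H` with respect to `G` must be unknown to the committer.

```python
# Toy Pedersen commitment over the order-11 subgroup of the integers mod 23.
# All parameters are illustrative assumptions and far too small to be secure.
P = 23  # prime modulus
Q = 11  # prime order of the subgroup containing G and H
G = 4   # generator of the order-Q subgroup
H = 9   # second generator; its discrete log w.r.t. G must be unknown in practice


def commit(message: int, blinding: int) -> int:
    """Return the commitment G^message * H^blinding mod P."""
    return (pow(G, message % Q, P) * pow(H, blinding % Q, P)) % P


def add_commitments(c1: int, c2: int) -> int:
    """Multiply two commitments, which adds the committed values and blindings."""
    return (c1 * c2) % P


c1 = commit(3, 5)
c2 = commit(4, 6)
combined = add_commitments(c1, c2)

# Homomorphic property: the product opens to the sum of messages and blindings.
assert combined == commit(3 + 4, 5 + 6)
```

A verifier can therefore check a relation on committed values by operating on the commitments alone, without learning the values themselves.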

CL Sep 21, 2024
Rephrase and Contrast: Fine-Tuning Language Models for Enhanced Understanding of Communication and Computer Networks

Liujianfu Wang, Yuyang Du, Jingqi Lin et al.

Large language models (LLMs) are being widely researched across various disciplines, with significant recent efforts focusing on adapting LLMs for understanding of how communication networks operate. However, over-reliance on prompting techniques hinders the full exploitation of the generalization ability of these models, and the lack of efficient fine-tuning methods prevents the full realization of lightweight LLMs' potential. This paper addresses these challenges by introducing our Rephrase and Contrast (RaC) framework, an efficient fine-tuning framework. RaC enhances LLMs' comprehension and critical thinking abilities by incorporating question reformulation and contrastive analysis of correct and incorrect answers during the fine-tuning process. Experimental results demonstrate a 63.73% accuracy improvement over the foundational model when tested on a comprehensive networking problem set. Moreover, to efficiently construct the dataset for RaC fine-tuning, we develop a GPT-assisted data mining method for generating high-quality question-answer (QA) pairs; furthermore, we introduce ChoiceBoost, a data augmentation technique that expands dataset size while reducing answer-order bias. Apart from these technical innovations, we contribute to the networking community by open-sourcing valuable research resources, including: 1) the fine-tuned networking model referred to as RaC-Net, 2) the training dataset used for fine-tuning the model, 3) three testing problem sets of different difficulties to serve as benchmarks for future research, and 4) code associated with the above resources.
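
The abstract states only that ChoiceBoost expands the dataset while reducing answer-order bias; it does not describe the mechanism. One plausible way to obtain both effects is to reorder the answer options of each multiple-choice item, and the sketch below assumes that approach. The function `permute_choices` and the sample question are hypothetical and should not be read as the authors' implementation.

```python
from itertools import permutations


def permute_choices(question, options, answer_idx, max_variants=None):
    """Generate reordered copies of one multiple-choice item.

    Assumption: augmentation by option permutation. Each variant keeps the
    same question and options but places the correct answer at a new index.
    """
    variants = []
    for perm in permutations(range(len(options))):
        reordered = [options[i] for i in perm]
        # perm[k] is the original index of the option now at position k.
        new_answer = perm.index(answer_idx)
        variants.append(
            {"question": question, "options": reordered, "answer": new_answer}
        )
        if max_variants is not None and len(variants) >= max_variants:
            break
    return variants


# Hypothetical networking question with the correct option at index 0.
item_options = ["TCP", "UDP", "ICMP"]
variants = permute_choices(
    "Which protocol provides reliable, ordered delivery?", item_options, 0
)
print(len(variants))
```

With all permutations kept, the correct answer appears equally often at every position, so a model cannot score well by favoring one option slot.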

LG Aug 19, 2024
GINO-Q: Learning an Asymptotically Optimal Index Policy for Restless Multi-armed Bandits

Gongpu Chen, Soung Chang Liew, Deniz Gunduz

The restless multi-armed bandit (RMAB) framework is a popular model with applications across a wide variety of fields. However, its solution is hindered by the exponentially growing state space (with respect to the number of arms) and the combinatorial action space, making traditional reinforcement learning methods infeasible for large-scale instances. In this paper, we propose GINO-Q, a three-timescale stochastic approximation algorithm designed to learn an asymptotically optimal index policy for RMABs. GINO-Q mitigates the curse of dimensionality by decomposing the RMAB into a series of subproblems, each with the same dimension as a single arm, ensuring that complexity increases linearly with the number of arms. Unlike recently developed Whittle-index-based algorithms, GINO-Q does not require RMABs to be indexable, enhancing its flexibility and applicability. Our experimental results demonstrate that GINO-Q consistently learns near-optimal policies, even for non-indexable RMABs where Whittle-index-based algorithms perform poorly, and it converges significantly faster than existing baselines.

IT Dec 14, 2023
LLMind: Orchestrating AI and IoT with LLM for Complex Task Execution

Hongwei Cui, Yuyang Du, Qun Yang et al.

Task-oriented communications are an important element in future intelligent IoT systems. Existing IoT systems, however, are limited in their capacity to handle complex tasks, particularly in their interactions with humans to accomplish these tasks. In this paper, we present LLMind, an LLM-based task-oriented AI agent framework that enables effective collaboration among IoT devices, with humans communicating high-level verbal instructions, to perform complex tasks. Inspired by the functional specialization theory of the brain, our framework integrates an LLM with domain-specific AI modules, enhancing its capabilities. Complex tasks, which may involve collaborations of multiple domain-specific AI modules and IoT devices, are executed through a control script generated by the LLM using a Language-Code transformation approach, which first converts language descriptions to an intermediate finite-state machine (FSM) before final precise transformation to code. Furthermore, the framework incorporates a novel experience accumulation mechanism to enhance response speed and effectiveness, allowing the framework to evolve and become progressively sophisticated through continuing user and machine interactions.

77.1 NI Apr 9
Real-Time Cross-Layer Semantic Error Correction Using Language Models and Software-Defined Radio

Yuchen Pan, Yuyang Du, Yirun Wang et al.

As Language Models (LMs) advance, Semantic Error Correction (SEC) has emerged as a promising approach for reliable network designs. Yet existing methods prioritize intent over accuracy, falling short of verbatim recovery. Our recent work, Cross-Layer SEC (CL-SEC), addressed this by fusing physical-layer Log-Likelihood Ratios (LLRs) with semantic context, but its real-time feasibility remained unvalidated. This paper demonstrates CL-SEC on a live Software-Defined Radio (SDR) testbed, resolving implementation barriers with: 1) an SDR middleware enabling real-time LLR extraction from FPGA hardware, and 2) a generalized inference interface supporting modern encoder-decoder LMs. Real-world experiments confirm that the cross-layer fusion significantly outperforms either source alone.

77.8 CR Apr 1
LightGuard: Transparent WiFi Security via Physical-Layer LiFi Key Bootstrapping

Shiqi Xu, Yuyang Du, Mingyue Zhang et al.

WiFi is inherently vulnerable to eavesdropping because RF signals may penetrate many physical boundaries, such as walls and floors. LiFi, by contrast, is an optical method confined to line-of-sight and blocked by opaque surfaces. We present LightGuard, a dual-link architecture built on this insight: cryptographic key establishment can be offloaded from WiFi to a physically confined LiFi channel to mitigate the risk of key exposure over RF. LightGuard derives session keys over a LiFi link and installs them on the WiFi interface, ensuring cryptographic material never traverses the open RF medium. A prototype with off-the-shelf WiFi NICs and our LiFi transceiver frontend validates the design.

LG May 22, 2021
Denoising Noisy Neural Networks: A Bayesian Approach with Compensation

Yulin Shao, Soung Chang Liew, Deniz Gunduz

Deep neural networks (DNNs) with noisy weights, which we refer to as noisy neural networks (NoisyNNs), arise from the training and inference of DNNs in the presence of noise. NoisyNNs emerge in many new applications, including the wireless transmission of DNNs, the efficient deployment or storage of DNNs in analog devices, and the truncation or quantization of DNN weights. This paper studies a fundamental problem of NoisyNNs: how to reconstruct the DNN weights from their noisy manifestations. While all prior works relied on maximum likelihood (ML) estimation, this paper puts forth a denoising approach to reconstruct DNNs with the aim of maximizing the inference accuracy of the reconstructed models. The superiority of our denoiser is rigorously proven in two small-scale problems, wherein we consider a quadratic neural network function and a shallow feedforward neural network, respectively. When applied to advanced learning tasks with modern DNN architectures, our denoiser exhibits significantly better performance than the ML estimator. Consider the average test accuracy of the denoised DNN model versus the weight variance to noise power ratio (WNR) performance. When denoising a noisy ResNet34 model arising from noisy inference, our denoiser outperforms ML estimation by up to 4.1 dB to achieve a test accuracy of 60%. When denoising a noisy ResNet18 model arising from noisy training, our denoiser outperforms ML estimation by 13.4 dB and 8.3 dB to achieve test accuracies of 60% and 80%, respectively.

IT Feb 26, 2021
Federated Edge Learning with Misaligned Over-The-Air Computation

Yulin Shao, Deniz Gunduz, Soung Chang Liew

Over-the-air computation (OAC) is a promising technique to realize fast model aggregation in the uplink of federated edge learning. OAC, however, hinges on accurate channel-gain precoding and strict synchronization among the edge devices, which are challenging in practice. As such, how to design the maximum likelihood (ML) estimator in the presence of residual channel-gain mismatch and asynchronies is an open problem. To fill this gap, this paper formulates the problem of misaligned OAC for federated edge learning and puts forth a whitened matched filtering and sampling scheme to obtain oversampled, but independent, samples from the misaligned and overlapped signals. Given the whitened samples, a sum-product ML estimator and an aligned-sample estimator are devised to estimate the arithmetic sum of the transmitted symbols. In particular, the computational complexity of our sum-product ML estimator is linear in the packet length and hence is significantly lower than the conventional ML estimator. Extensive simulations on the test accuracy versus the average received energy per symbol to noise power spectral density ratio (EsN0) yield two main results: 1) In the low EsN0 regime, the aligned-sample estimator can achieve superior test accuracy provided that the phase misalignment is non-severe. In contrast, the ML estimator does not work well due to the error propagation and noise enhancement in the estimation process. 2) In the high EsN0 regime, the ML estimator attains the optimal learning performance regardless of the severity of phase misalignment. On the other hand, the aligned-sample estimator suffers from a test-accuracy loss caused by phase misalignment.

NI Jan 2, 2021
Speeding up Block Propagation in Blockchain Network: Uncoded and Coded Designs

Lihao Zhang, Taotao Wang, Soung Chang Liew

We design and validate new block propagation protocols for the peer-to-peer (P2P) network of the Bitcoin blockchain. Despite its strong protection for security and privacy, the current Bitcoin blockchain can only support a low number of transactions per second (TPS). In this work, we redesign Bitcoin's current networking protocol to increase TPS without changing vital components in its consensus-building protocol. In particular, we improve the compact-block relaying protocol to enable the propagation of blocks containing a massive number of transactions without inducing extra propagation latencies. Our improvements consist of (i) replacing the existing store-and-forward compact-block relaying scheme with a cut-through compact-block relaying scheme; (ii) exploiting rateless erasure codes for P2P networks to increase block-propagation efficiency. Since our protocols only need to rework Bitcoin's current networking protocol and do not modify the data structures and crypto-functional components, they can be seamlessly incorporated into the existing Bitcoin blockchain. To validate our designs, we analyze our protocols and implement a Bitcoin network simulator on NS3 to run different block propagation protocols. The analysis and experimental results confirm that our new block propagation protocols could increase the TPS of the Bitcoin blockchain by 100x without compromising security and consensus-building.

NI Oct 3, 2020
Ethna: Analyzing the Underlying Peer-to-Peer Network of the Ethereum Blockchain

Taotao Wang, Chonghe Zhao, Qing Yang et al.

The peer-to-peer (P2P) network that a blockchain uses to transport its transactions and blocks has a high impact on the efficiency and security of the system. The P2P network topologies of popular blockchains such as Bitcoin and Ethereum, therefore, deserve our highest attention. The current Ethereum blockchain explorers (e.g., Etherscan) focus on the tracking of block and transaction records but omit the characterization of the underlying P2P network. This work presents the Ethereum Network Analyzer (Ethna), a tool that probes and analyzes the P2P network of the Ethereum blockchain. Unlike Bitcoin, which adopts an unstructured P2P network, Ethereum relies on the Kademlia DHT to manage its P2P network. Therefore, the existing analytical methods for Bitcoin-like P2P networks are not applicable to Ethereum. Ethna implements a novel method that accurately measures the degrees of Ethereum nodes. Furthermore, it incorporates an algorithm that derives the latency metrics of message propagation in the Ethereum P2P network. We ran Ethna on the Ethereum Mainnet and conducted extensive experiments to analyze the topological features of its P2P network. Our analysis shows that the Ethereum P2P network exhibits a degree of the small-world effect, and the degrees of nodes follow a power-law distribution that characterizes scale-free networks.

CR Nov 29, 2019
When Blockchain Meets AI: Optimal Mining Strategy Achieved By Machine Learning

Taotao Wang, Soung Chang Liew, Shengli Zhang

This work applies reinforcement learning (RL) from the AI machine learning field to derive an optimal Bitcoin-like blockchain mining strategy without knowing the details of the blockchain network model. Previously, the most profitable mining strategy was believed to be honest mining encoded in the default blockchain protocol. It was shown later that it is possible to gain more mining rewards by deviating from honest mining. In particular, the mining problem can be formulated as a Markov Decision Process (MDP) which can be solved to give the optimal mining strategy. However, solving the mining MDP requires knowing the values of various parameters that characterize the blockchain network model. In real blockchain networks, these parameter values are not easy to obtain and may change over time. This hinders the use of the MDP model-based solution. In this work, we employ RL to dynamically learn a mining strategy with performance approaching that of the optimal mining strategy by observing and interacting with the network. Since the mining MDP problem has a non-linear objective function (rather than linear functions of standard MDP problems), we design a new multi-dimensional RL algorithm to solve the problem. Experimental results indicate that, without knowing the parameter values of the mining MDP model, our multi-dimensional RL mining algorithm can still achieve the optimal performance over time-varying blockchain networks.

CR Nov 3, 2019
Game-Theoretical Analysis of Mining Strategy for Bitcoin-NG Blockchain Protocol

Taotao Wang, Xiaoqian Bai, Hao Wang et al.

Bitcoin-NG, a scalable blockchain protocol, divides each block into a key block and many micro blocks to effectively improve the transaction processing capacity. Bitcoin-NG has a special incentive mechanism (i.e., splitting transaction fees between the current and the next leader) to maintain its security. However, this design of the incentive mechanism ignores the joint effect of transaction fees, minted coins and mining duration lengths on the expected mining reward. In this paper, we identify the advanced mining attack, which deliberately ignores micro blocks to enlarge the mining duration length and thereby increase the likelihood of winning the mining race. We first show that an advanced mining attacker can maximize its expected reward by optimizing its mining duration length. We then formulate a game-theoretical model in which multiple mining players perform advanced mining to compete with each other. We analyze the Nash equilibrium for the mining game. Our analytical and simulation results indicate that all mining players in the mining game converge to advanced mining at the equilibrium and have no incentive to deviate from the equilibrium; the transaction processing capability of the Bitcoin-NG network at the equilibrium is decreased by advanced mining. Therefore, we conclude that the Bitcoin-NG blockchain protocol is vulnerable to the advanced mining attack. We discuss how to reduce the negative impact of advanced mining for Bitcoin-NG.

CR Oct 1, 2019
PubChain: A Decentralized Open-Access Publication Platform with Participants Incentivized by Blockchain Technology

Taotao Wang, Soung Chang Liew, Shengli Zhang

We design and implement Publication Chain (PubChain), a decentralized open-access publication platform built on decentralized and distributed technologies of blockchain and IPFS peer-to-peer file sharing systems. The existing publication platforms have some severe drawbacks. First, instead of promoting widespread knowledge sharing, access to publications on the platforms owned by publishers is often on a fee basis. This paywall prevents researchers from "standing on the shoulders of giants". Moreover, the peer review process on almost all existing publication platforms (including both open-access and publisher platforms) is prone to be ineffective, since there is no proper incentive for reviewers to perform high-quality reviews. PubChain is an alternative platform to the existing publication venues aiming to address their drawbacks. No central third party owns the contents (i.e., papers and reviews) of PubChain. Exploiting blockchain technology, we devise an elaborate incentive scheme on PubChain to incentivize key stakeholders (i.e., authors, readers and reviewers) to participate in publication activities on PubChain in a substantive manner by earning credits and rewards through self-motivated interactions. We have performed simulations to investigate the robustness of our proposed incentive scheme against fraudulent publications and reviews. We have also implemented a prototype of PubChain to demonstrate its key concepts.

LG Sep 26, 2018
AlphaSeq: Sequence Discovery with Deep Reinforcement Learning

Yulin Shao, Soung Chang Liew, Taotao Wang

Sequences play an important role in many applications and systems. Discovering sequences with desired properties has long been an interesting intellectual pursuit. This paper puts forth a new paradigm, AlphaSeq, to discover desired sequences algorithmically using deep reinforcement learning (DRL) techniques. AlphaSeq treats the sequence discovery problem as an episodic symbol-filling game, in which a player fills symbols in the vacant positions of a sequence set sequentially during an episode of the game. Each episode ends with a completely-filled sequence set, upon which a reward is given based on the desirability of the sequence set. AlphaSeq models the game as a Markov Decision Process (MDP), and adapts the DRL framework of AlphaGo to solve the MDP. Sequences discovered improve progressively as AlphaSeq, starting as a novice, learns to become an expert game player through many episodes of game playing. Compared with traditional sequence construction by mathematical tools, AlphaSeq is particularly suitable for problems with complex objectives intractable to mathematical analysis. We demonstrate the searching capabilities of AlphaSeq in two applications: 1) AlphaSeq successfully rediscovers a set of ideal complementary codes that can zero-force all potential interferences in multi-carrier CDMA systems. 2) AlphaSeq discovers new sequences that triple the signal-to-interference ratio -- benchmarked against the well-known Legendre sequence -- of a mismatched filter estimator in pulse compression radar systems.