Toshiaki Koike-Akino

h-index61

58papers

893citations

Novelty52%

AI Score56

Ranked #18,938 of 205,806 authors (top 9%)#4,252 in LG (top 10%)

58 Papers

CVSep 30, 2023

Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis

Nithin Gopalakrishnan Nair, Anoop Cherian, Suhas Lohit et al.

Conditional generative models typically demand large annotated training sets to achieve high-quality synthesis. As a result, there has been significant interest in designing models that perform plug-and-play generation, i.e., to use a predefined or pretrained model, which is not explicitly trained on the generative task, to guide the generative process (e.g., using language). However, such guidance is typically useful only towards synthesizing high-level semantics rather than editing fine-grained details as in image-to-image translation tasks. To this end, and capitalizing on the powerful fine-grained generative control offered by the recent diffusion-based generative models, we introduce Steered Diffusion, a generalized framework for photorealistic zero-shot conditional image generation using a diffusion model trained for unconditional generation. The key idea is to steer the image generation of the diffusion model at inference time via designing a loss using a pre-trained inverse model that characterizes the conditional task. This loss modulates the sampling trajectory of the diffusion process. Our framework allows for easy incorporation of multiple conditions during inference. We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution. Our results demonstrate clear qualitative and quantitative improvements over state-of-the-art diffusion-based plug-and-play models while adding negligible additional computational cost.

HCSep 20, 2022

Adversarial Bi-Regressor Network for Domain Adaptive Regression

Haifeng Xia, Pu Perry Wang, Toshiaki Koike-Akino et al.

Domain adaptation (DA) aims to transfer the knowledge of a well-labeled source domain to facilitate unlabeled target learning. When turning to specific tasks such as indoor (Wi-Fi) localization, it is essential to learn a cross-domain regressor to mitigate the domain shift. This paper proposes a novel method Adversarial Bi-Regressor Network (ABRNet) to seek more effective cross-domain regression model. Specifically, a discrepant bi-regressor architecture is developed to maximize the difference of bi-regressor to discover uncertain target instances far from the source distribution, and then an adversarial training mechanism is adopted between feature extractor and dual regressors to produce domain-invariant representations. To further bridge the large domain gap, a domain-specific augmentation module is designed to synthesize two source-similar and target-similar intermediate domains to gradually eliminate the original domain mismatch. The empirical studies on two cross-domain regressive benchmarks illustrate the power of our method on solving the domain adaptive regression (DAR) problem.

LGMay 17, 2022

AutoQML: Automated Quantum Machine Learning for Wi-Fi Integrated Sensing and Communications

Toshiaki Koike-Akino, Pu Wang, Ye Wang

Commercial Wi-Fi devices can be used for integrated sensing and communications (ISAC) to jointly exchange data and monitor indoor environment. In this paper, we investigate a proof-of-concept approach using automated quantum machine learning (AutoQML) framework called AutoAnsatz to recognize human gesture. We address how to efficiently design quantum circuits to configure quantum neural networks (QNN). The effectiveness of AutoQML is validated by an in-house experiment for human pose recognition, achieving state-of-the-art performance greater than 80% accuracy for a limited data size with a significantly small number of trainable parameters.

LGMay 17, 2022

Quantum Transfer Learning for Wi-Fi Sensing

Toshiaki Koike-Akino, Pu Wang, Ye Wang

Beyond data communications, commercial-off-the-shelf Wi-Fi devices can be used to monitor human activities, track device locomotion, and sense the ambient environment. In particular, spatial beam attributes that are inherently available in the 60-GHz IEEE 802.11ad/ay standards have shown to be effective in terms of overhead and channel measurement granularity for these indoor sensing tasks. In this paper, we investigate transfer learning to mitigate domain shift in human monitoring tasks when Wi-Fi settings and environments change over time. As a proof-of-concept study, we consider quantum neural networks (QNN) as well as classical deep neural networks (DNN) for the future quantum-ready society. The effectiveness of both DNN and QNN is validated by an in-house experiment for human pose recognition, achieving greater than 90% accuracy with a limited data size.

QUANT-PHSep 29, 2022

quEEGNet: Quantum AI for Biosignal Processing

Toshiaki Koike-Akino, Ye Wang

In this paper, we introduce an emerging quantum machine learning (QML) framework to assist classical deep learning methods for biosignal processing applications. Specifically, we propose a hybrid quantum-classical neural network model that integrates a variational quantum circuit (VQC) into a deep neural network (DNN) for electroencephalogram (EEG), electromyogram (EMG), and electrocorticogram (ECoG) analysis. We demonstrate that the proposed quantum neural network (QNN) achieves state-of-the-art performance while the number of trainable parameters is kept small for VQC.

SPMay 17, 2022

Learning to Learn Quantum Turbo Detection

Bryan Liu, Toshiaki Koike-Akino, Ye Wang et al.

This paper investigates a turbo receiver employing a variational quantum circuit (VQC). The VQC is configured with an ansatz of the quantum approximate optimization algorithm (QAOA). We propose a 'learning to learn' (L2L) framework to optimize the turbo VQC decoder such that high fidelity soft-decision output is generated. Besides demonstrating the proposed algorithm's computational complexity, we show that the L2L VQC turbo decoder can achieve an excellent performance close to the optimal maximum-likelihood performance in a multiple-input multiple-output system.

LGAug 29, 2024

Analyzing Inference Privacy Risks Through Gradients in Machine Learning

Zhuohang Li, Andrew Lowy, Jing Liu et al.

In distributed learning settings, models are iteratively updated with shared gradients computed from potentially sensitive user data. While previous work has studied various privacy risks of sharing gradients, our paper aims to provide a systematic approach to analyze private information leakage from gradients. We present a unified game-based framework that encompasses a broad range of attacks including attribute, property, distributional, and user disclosures. We investigate how different uncertainties of the adversary affect their inferential power via extensive experiments on five datasets across various data modalities. Our results demonstrate the inefficacy of solely relying on data aggregation to achieve privacy against inference attacks in distributed learning. We further evaluate five types of defenses, namely, gradient pruning, signed gradient descent, adversarial perturbations, variational information bottleneck, and differential privacy, under both static and adaptive adversary settings. We provide an information-theoretic view for analyzing the effectiveness of these defenses against inference from gradients. Finally, we introduce a method for auditing attribute inference privacy, improving the empirical estimation of worst-case privacy through crafting adversarial canary records.

QUANT-PHJul 18, 2022

Quantum Feature Extraction for THz Multi-Layer Imaging

Toshiaki Koike-Akino, Pu Wang, Genki Yamashita et al.

A learning-based THz multi-layer imaging has been recently used for contactless three-dimensional (3D) positioning and encoding. We show a proof-of-concept demonstration of an emerging quantum machine learning (QML) framework to deal with depth variation, shadow effect, and double-sided content recognition, through an experimental validation.

LGSep 11, 2024

Exploring User-level Gradient Inversion with a Diffusion Prior

Zhuohang Li, Andrew Lowy, Jing Liu et al.

We explore user-level gradient inversion as a new attack surface in distributed learning. We first investigate existing attacks on their ability to make inferences about private information beyond training data reconstruction. Motivated by the low reconstruction quality of existing methods, we propose a novel gradient inversion attack that applies a denoising diffusion model as a strong image prior in order to enhance recovery in the large batch setting. Unlike traditional attacks, which aim to reconstruct individual samples and suffer at large batch and image sizes, our approach instead aims to recover a representative image that captures the sensitive shared semantic information corresponding to the underlying user. Our experiments with face images demonstrate the ability of our methods to recover realistic facial images along with private user attributes.

LGAug 30, 2024

Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage

Md Rafi Ur Rashid, Jing Liu, Toshiaki Koike-Akino et al.

Fine-tuning large language models on private data for downstream applications poses significant privacy risks in potentially exposing sensitive information. Several popular community platforms now offer convenient distribution of a large variety of pre-trained models, allowing anyone to publish without rigorous verification. This scenario creates a privacy threat, as pre-trained models can be intentionally crafted to compromise the privacy of fine-tuning datasets. In this study, we introduce a novel poisoning technique that uses model-unlearning as an attack tool. This approach manipulates a pre-trained language model to increase the leakage of private data during the fine-tuning process. Our method enhances both membership inference and data extraction attacks while preserving model utility. Experimental results across different models, datasets, and fine-tuning setups demonstrate that our attacks significantly surpass baseline performance. This work serves as a cautionary note for users who download pre-trained models from unverified sources, highlighting the potential risks involved.

SPMay 17, 2022

Variational Quantum Compressed Sensing for Joint User and Channel State Acquisition in Grant-Free Device Access Systems

Bryan Liu, Toshiaki Koike-Akino, Ye Wang et al.

This paper introduces a new quantum computing framework integrated with a two-step compressed sensing technique, applied to a joint channel estimation and user identification problem. We propose a variational quantum circuit (VQC) design as a new denoising solution. For a practical grant-free communications system having correlated device activities, variational quantum parameters for Pauli rotation gates in the proposed VQC system are optimized to facilitate to the non-linear estimation. Numerical results show that the VQC method can outperform modern compressed sensing techniques using an element-wise denoiser.

LGJul 16, 2024

Variational Randomized Smoothing for Sample-Wise Adversarial Robustness

Ryo Hase, Ye Wang, Toshiaki Koike-Akino et al.

Randomized smoothing is a defensive technique to achieve enhanced robustness against adversarial examples which are small input perturbations that degrade the performance of neural network models. Conventional randomized smoothing adds random noise with a fixed noise level for every input sample to smooth out adversarial perturbations. This paper proposes a new variational framework that uses a per-sample noise level suitable for each input by introducing a noise level selector. Our experimental results demonstrate enhancement of empirical robustness against adversarial attacks. We also provide and analyze the certified robustness for our sample-wise smoothing method.

LGOct 12, 2023

Stabilizing Subject Transfer in EEG Classification with Divergence Estimation

Niklas Smedemark-Margulies, Ye Wang, Toshiaki Koike-Akino et al.

Classification models for electroencephalogram (EEG) data show a large decrease in performance when evaluated on unseen test sub jects. We reduce this performance decrease using new regularization techniques during model training. We propose several graphical models to describe an EEG classification task. From each model, we identify statistical relationships that should hold true in an idealized training scenario (with infinite data and a globally-optimal model) but that may not hold in practice. We design regularization penalties to enforce these relationships in two stages. First, we identify suitable proxy quantities (divergences such as Mutual Information and Wasserstein-1) that can be used to measure statistical independence and dependence relationships. Second, we provide algorithms to efficiently estimate these quantities during training using secondary neural network models. We conduct extensive computational experiments using a large benchmark EEG dataset, comparing our proposed techniques with a baseline method that uses an adversarial classifier. We find our proposed methods significantly increase balanced accuracy on test subjects and decrease overfitting. The proposed methods exhibit a larger benefit over a greater range of hyperparameters than the baseline method, with only a small computational cost at training time. These benefits are largest when used for a fixed training period, though there is still a significant benefit for a subset of hyperparameters when our techniques are used in conjunction with early stopping regularization.

CVJul 15, 2024

GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM

Keshav Bimbraw, Ye Wang, Jing Liu et al.

Large vision-language models (LVLMs), such as the Generative Pre-trained Transformer 4-omni (GPT-4o), are emerging multi-modal foundation models which have great potential as powerful artificial-intelligence (AI) assistance tools for a myriad of applications, including healthcare, industrial, and academic sectors. Although such foundation models perform well in a wide range of general tasks, their capability without fine-tuning is often limited in specialized tasks. However, full fine-tuning of large foundation models is challenging due to enormous computation/memory/dataset requirements. We show that GPT-4o can decode hand gestures from forearm ultrasound data even with no fine-tuning, and improves with few-shot, in-context learning.

HCJul 15, 2024

Random Channel Ablation for Robust Hand Gesture Classification with Multimodal Biosignals

Keshav Bimbraw, Jing Liu, Ye Wang et al.

Biosignal-based hand gesture classification is an important component of effective human-machine interaction. For multimodal biosignal sensing, the modalities often face data loss due to missing channels in the data which can adversely affect the gesture classification performance. To make the classifiers robust to missing channels in the data, this paper proposes using Random Channel Ablation (RChA) during the training process. Ultrasound and force myography (FMG) data were acquired from the forearm for 12 hand gestures over 2 subjects. The resulting multimodal data had 16 total channels, 8 for each modality. The proposed method was applied to convolutional neural network architecture, and compared with baseline, imputation, and oracle methods. Using 5-fold cross-validation for the two subjects, on average, 12.2% and 24.5% improvement was observed for gesture classification with up to 4 and 8 missing channels respectively compared to the baseline. Notably, the proposed method is also robust to an increase in the number of missing channels compared to other methods. These results show the efficacy of using random channel ablation to improve classifier robustness for multimodal and multi-channel biosignal-based hand gesture classification.

81.8LGMar 16

Amplification Effects in Test-Time Reinforcement Learning: Safety and Reasoning Vulnerabilities

Vanshaj Khattar, Md Rafi ur Rashid, Moumita Choudhury et al.

Test-time training (TTT) has recently emerged as a promising method to improve the reasoning abilities of large language models (LLMs), in which the model directly learns from test data without access to labels. However, this reliance on test data also makes TTT methods vulnerable to harmful prompt injections. In this paper, we investigate safety vulnerabilities of TTT methods, where we study a representative self-consistency-based test-time learning method: test-time reinforcement learning (TTRL), a recent TTT method that improves LLM reasoning by rewarding self-consistency using majority vote as a reward signal. We show that harmful prompt injection during TTRL amplifies the model's existing behaviors, i.e., safety amplification when the base model is relatively safe, and harmfulness amplification when it is vulnerable to the injected data. In both cases, there is a decline in reasoning ability, which we refer to as the reasoning tax. We also show that TTT methods such as TTRL can be exploited adversarially using specially designed "HarmInject" prompts to force the model to answer jailbreak and reasoning queries together, resulting in stronger harmfulness amplification. Overall, our results highlight that TTT methods that enhance LLM reasoning by promoting self-consistency can lead to amplification behaviors and reasoning degradation, highlighting the need for safer TTT methods.

CLFeb 26, 2025Code

Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in Product QA Agents

Ashley Lewis, Michael White, Jing Liu et al.

The deployment of Large Language Models (LLMs) in customer support is constrained by hallucination (generating false information) and the high cost of proprietary models. To address these challenges, we propose a retrieval-augmented question-answering (QA) pipeline and explore how to balance human input and automation. Using a dataset of questions about a Samsung Smart TV user manual, we demonstrate that synthetic data generated by LLMs outperforms crowdsourced data in reducing hallucination in finetuned models. We also compare self-training (fine-tuning models on their own outputs) and knowledge distillation (fine-tuning on stronger models' outputs, e.g., GPT-4o), and find that self-training achieves comparable hallucination reduction. We conjecture that this surprising finding can be attributed to increased exposure bias issues in the knowledge distillation case and support this conjecture with post hoc analysis. We also improve robustness to unanswerable questions and retrieval failures with contextualized "I don't know" responses. These findings show that scalable, cost-efficient QA systems can be built using synthetic data and self-training with open-source models, reducing reliance on proprietary tools or costly human annotations.

67.9LGMay 13

Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment

Ye Wang, Jing Liu, Toshiaki Koike-Akino

Inference-time alignment techniques offer a lightweight alternative or complement to costly reinforcement learning, while enabling continual adaptation as alignment objectives and reward targets evolve. Existing theoretical analyses justify these methods as approximations to sampling from distributions optimally tilted toward a given reward model. We extend these techniques by introducing reference-model temperature adjustment, which leads to further generalization of inference-time alignment to ensembles of generative reward models combined as a sharpened logarithmic opinion pool (SLOP). To mitigate reward hacking, we propose an algorithm for calibrating SLOP weight parameters and experimentally demonstrate that it improves robustness while preserving alignment performance.

67.0LGMar 16

Directional Embedding Smoothing for Robust Vision Language Models

Ye Wang, Jing Liu, Toshiaki Koike-Akino

The safety and reliability of vision-language models (VLMs) are a crucial part of deploying trustworthy agentic AI systems. However, VLMs remain vulnerable to jailbreaking attacks that undermine their safety alignment to yield harmful outputs. In this work, we extend the Randomized Embedding Smoothing and Token Aggregation (RESTA) defense to VLMs and evaluate its performance against the JailBreakV-28K benchmark of multi-modal jailbreaking attacks. We find that RESTA is effective in reducing attack success rate over this diverse corpus of attacks, in particular, when employing directional embedding noise, where the injected noise is aligned with the original token embedding vectors. Our results demonstrate that RESTA can contribute to securing VLMs within agentic systems, as a lightweight, inference-time defense layer of an overall security framework.

90.1LGMar 11

TTQ: Activation-Aware Test-Time Quantization to Accelerate LLM Inference On The Fly

Toshiaki Koike-Akino, Jing Liu, Ye Wang

To tackle the huge computational demand of large foundation models, activation-aware compression techniques without retraining have been introduced. However, since these methods highly rely on calibration data, domain shift issues may arise for unseen downstream tasks. We propose a test-time quantization (TTQ) framework which compresses large models on the fly at inference time to resolve this issue. With an efficient online calibration, instant activation-aware quantization can adapt every prompt regardless of the downstream tasks, yet achieving inference speedup. Several experiments demonstrate that TTQ can improve the quantization performance over state-of-the-art baselines.

ROFeb 26

Embedding Morphology into Transformers for Cross-Robot Policy Learning

Kei Suzuki, Jing Liu, Ye Wang et al.

Cross-robot policy learning -- training a single policy to perform well across multiple embodiments -- remains a central challenge in robot learning. Transformer-based policies, such as vision-language-action (VLA) models, are typically embodiment-agnostic and must infer kinematic structure purely from observations, which can reduce robustness across embodiments and even limit performance within a single embodiment. We propose an embodiment-aware transformer policy that injects morphology via three mechanisms: (1) kinematic tokens that factorize actions across joints and compress time through per-joint temporal chunking; (2) a topology-aware attention bias that encodes kinematic topology as an inductive bias in self-attention, encouraging message passing along kinematic edges; and (3) joint-attribute conditioning that augments topology with per-joint descriptors to capture semantics beyond connectivity. Across a range of embodiments, this structured integration consistently improves performance over a vanilla pi0.5 VLA baseline, indicating improved robustness both within an embodiment and across embodiments.

CVApr 25, 2024

TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models

Haomiao Ni, Bernhard Egger, Suhas Lohit et al.

Text-conditioned image-to-video generation (TI2V) aims to synthesize a realistic video starting from a given image (e.g., a woman's photo) and a text description (e.g., "a woman is drinking water."). Existing TI2V frameworks often require costly training on video-text datasets and specific model designs for text and image conditioning. In this paper, we propose TI2V-Zero, a zero-shot, tuning-free method that empowers a pretrained text-to-video (T2V) diffusion model to be conditioned on a provided image, enabling TI2V generation without any optimization, fine-tuning, or introducing external modules. Our approach leverages a pretrained T2V diffusion foundation model as the generative prior. To guide video generation with the additional image input, we propose a "repeat-and-slide" strategy that modulates the reverse denoising process, allowing the frozen diffusion model to synthesize a video frame-by-frame starting from the provided image. To ensure temporal continuity, we employ a DDPM inversion strategy to initialize Gaussian noise for each newly synthesized frame and a resampling technique to help preserve visual details. We conduct comprehensive experiments on both domain-specific and open-domain datasets, where TI2V-Zero consistently outperforms a recent open-domain TI2V model. Furthermore, we show that TI2V-Zero can seamlessly extend to other tasks such as video infilling and prediction when provided with more images. Its autoregressive design also supports long video generation.

CRFeb 14, 2024

Why Does Differential Privacy with Large Epsilon Defend Against Practical Membership Inference Attacks?

Andrew Lowy, Zhuohang Li, Jing Liu et al.

For small privacy parameter $ε$, $ε$-differential privacy (DP) provides a strong worst-case guarantee that no membership inference attack (MIA) can succeed at determining whether a person's data was used to train a machine learning model. The guarantee of DP is worst-case because: a) it holds even if the attacker already knows the records of all but one person in the data set; and b) it holds uniformly over all data sets. In practical applications, such a worst-case guarantee may be overkill: practical attackers may lack exact knowledge of (nearly all of) the private data, and our data set might be easier to defend, in some sense, than the worst-case data set. Such considerations have motivated the industrial deployment of DP models with large privacy parameter (e.g. $ε\geq 7$), and it has been observed empirically that DP with large $ε$ can successfully defend against state-of-the-art MIAs. Existing DP theory cannot explain these empirical findings: e.g., the theoretical privacy guarantees of $ε\geq 7$ are essentially vacuous. In this paper, we aim to close this gap between theory and practice and understand why a large DP parameter can prevent practical MIAs. To tackle this problem, we propose a new privacy notion called practical membership privacy (PMP). PMP models a practical attacker's uncertainty about the contents of the private data. The PMP parameter has a natural interpretation in terms of the success rate of a practical MIA on a given data set. We quantitatively analyze the PMP parameter of two fundamental DP mechanisms: the exponential mechanism and Gaussian mechanism. Our analysis reveals that a large DP parameter often translates into a much smaller PMP parameter, which guarantees strong privacy against practical MIAs. Using our findings, we offer principled guidance for practitioners in choosing the DP parameter.

CVMar 18, 2024

SuperLoRA: Parameter-Efficient Unified Adaptation of Multi-Layer Attention Modules

Xiangyu Chen, Jing Liu, Ye Wang et al.

Low-rank adaptation (LoRA) and its variants are widely employed in fine-tuning large models, including large language models for natural language processing and diffusion models for computer vision. This paper proposes a generalized framework called SuperLoRA that unifies and extends different LoRA variants, which can be realized under different hyper-parameter settings. Introducing grouping, folding, shuffling, projecting, and tensor factoring, SuperLoRA offers high flexibility compared with other LoRA variants and demonstrates superior performance for transfer learning tasks especially in the extremely few-parameter regimes.

LGMar 7, 2025

Quantum-PEFT: Ultra parameter-efficient fine-tuning

Toshiaki Koike-Akino, Francesco Tonin, Yongtao Wu et al.

This paper introduces Quantum-PEFT that leverages quantum computations for parameter-efficient fine-tuning (PEFT). Unlike other additive PEFT methods, such as low-rank adaptation (LoRA), Quantum-PEFT exploits an underlying full-rank yet surprisingly parameter efficient quantum unitary parameterization. With the use of Pauli parameterization, the number of trainable parameters grows only logarithmically with the ambient dimension, as opposed to linearly as in LoRA-based PEFT methods. Quantum-PEFT achieves vanishingly smaller number of trainable parameters than the lowest-rank LoRA as dimensions grow, enhancing parameter efficiency while maintaining a competitive performance. We apply Quantum-PEFT to several transfer learning benchmarks in language and vision, demonstrating significant advantages in parameter efficiency.

ARMar 15, 2024

AutoHLS: Learning to Accelerate Design Space Exploration for HLS Designs

Md Rubel Ahmed, Toshiaki Koike-Akino, Kieran Parsons et al.

High-level synthesis (HLS) is a design flow that leverages modern language features and flexibility, such as complex data structures, inheritance, templates, etc., to prototype hardware designs rapidly. However, exploring various design space parameters can take much time and effort for hardware engineers to meet specific design specifications. This paper proposes a novel framework called AutoHLS, which integrates a deep neural network (DNN) with Bayesian optimization (BO) to accelerate HLS hardware design optimization. Our tool focuses on HLS pragma exploration and operation transformation. It utilizes integrated DNNs to predict synthesizability within a given FPGA resource budget. We also investigate the potential of emerging quantum neural networks (QNNs) instead of classical DNNs for the AutoHLS pipeline. Our experimental results demonstrate up to a 70-fold speedup in exploration time.

LGJan 27, 2025

Smoothed Embeddings for Robust Language Models

Ryo Hase, Md Rafi Ur Rashid, Ashley Lewis et al.

Improving the safety and reliability of large language models (LLMs) is a crucial aspect of realizing trustworthy AI systems. Although alignment methods aim to suppress harmful content generation, LLMs are often still vulnerable to jailbreaking attacks that employ adversarial inputs that subvert alignment and induce harmful outputs. We propose the Randomized Embedding Smoothing and Token Aggregation (RESTA) defense, which adds random noise to the embedding vectors and performs aggregation during the generation of each output token, with the aim of better preserving semantic information. Our experiments demonstrate that our approach achieves superior robustness versus utility tradeoffs compared to the baseline defenses.

CVApr 24, 2025

Range Image-Based Implicit Neural Compression for LiDAR Point Clouds

Akihiro Kuwabara, Sorachi Kato, Takuya Fujihashi et al.

This paper presents a novel scheme to efficiently compress Light Detection and Ranging~(LiDAR) point clouds, enabling high-precision 3D scene archives, and such archives pave the way for a detailed understanding of the corresponding 3D scenes. We focus on 2D range images~(RIs) as a lightweight format for representing 3D LiDAR observations. Although conventional image compression techniques can be adapted to improve compression efficiency for RIs, their practical performance is expected to be limited due to differences in bit precision and the distinct pixel value distribution characteristics between natural images and RIs. We propose a novel implicit neural representation~(INR)--based RI compression method that effectively handles floating-point valued pixels. The proposed method divides RIs into depth and mask images and compresses them using patch-wise and pixel-wise INR architectures with model pruning and quantization, respectively. Experiments on the KITTI dataset show that the proposed method outperforms existing image, point cloud, RI, and INR-based compression methods in terms of 3D reconstruction and detection quality at low bitrates and decoding latency.

LGNov 6, 2024

Quantum Diffusion Models for Few-Shot Learning

Ruhan Wang, Ye Wang, Jing Liu et al.

Modern quantum machine learning (QML) methods involve the variational optimization of parameterized quantum circuits on training datasets, followed by predictions on testing datasets. Most state-of-the-art QML algorithms currently lack practical advantages due to their limited learning capabilities, especially in few-shot learning tasks. In this work, we propose three new frameworks employing quantum diffusion model (QDM) as a solution for the few-shot learning: label-guided generation inference (LGGI); label-guided denoising inference (LGDI); and label-guided noise addition inference (LGNAI). Experimental results demonstrate that our proposed algorithms significantly outperform existing methods.

LGJun 11, 2025

AWP: Activation-Aware Weight Pruning and Quantization with Projected Gradient Descent

Jing Liu, Toshiaki Koike-Akino, Ye Wang et al.

To address the enormous size of Large Language Models (LLMs), model compression methods, such as quantization and pruning, are often deployed, especially on edge devices. In this work, we focus on layer-wise post-training quantization and pruning. Drawing connections between activation-aware weight pruning and sparse approximation problems, and motivated by the success of Iterative Hard Thresholding (IHT), we propose a unified method for Activation-aware Weight pruning and quantization via Projected gradient descent (AWP). Our experiments demonstrate that AWP outperforms state-of-the-art LLM pruning and quantization methods. Theoretical convergence guarantees of the proposed method for pruning are also provided.

LGMay 31, 2025

Probabilistic Forecasting for Building Energy Systems using Time-Series Foundation Models

Young Jin Park, Francois Germain, Jing Liu et al. · mit

Decision-making in building energy systems critically depends on the predictive accuracy of relevant time-series models. In scenarios lacking extensive data from a target building, foundation models (FMs) represent a promising technology that can leverage prior knowledge from vast and diverse pre-training datasets to construct accurate probabilistic predictors for use in decision-making tools. This paper investigates the applicability and fine-tuning strategies of time-series foundation models (TSFMs) in building energy forecasting. We analyze both full fine-tuning and parameter-efficient fine-tuning approaches, particularly low-rank adaptation (LoRA), by using real-world data from a commercial net-zero energy building to capture signals such as room occupancy, carbon emissions, plug loads, and HVAC energy consumption. Our analysis reveals that the zero-shot predictive performance of TSFMs is generally suboptimal. To address this shortcoming, we demonstrate that employing either full fine-tuning or parameter-efficient fine-tuning significantly enhances forecasting accuracy, even with limited historical data. Notably, fine-tuning with low-rank adaptation (LoRA) substantially reduces computational costs without sacrificing accuracy. Furthermore, fine-tuned TSFMs consistently outperform state-of-the-art deep forecasting models (e.g., temporal fusion transformers) in accuracy, robustness, and generalization across varying building zones and seasonal conditions. These results underline the efficacy of TSFMs for practical, data-constrained building energy management systems, enabling improved decision-making in pursuit of energy efficiency and sustainability.

LGMay 27, 2025

TuneComp: Joint Fine-tuning and Compression for Large Foundation Models

Xiangyu Chen, Jing Liu, Ye Wang et al.

To reduce model size during post-training, compression methods, including knowledge distillation, low-rank approximation, and pruning, are often applied after fine-tuning the model. However, sequential fine-tuning and compression sacrifices performance, while creating a larger than necessary model as an intermediate step. In this work, we aim to reduce this gap, by directly constructing a smaller model while guided by the downstream task. We propose to jointly fine-tune and compress the model by gradually distilling it to a pruned low-rank structure. Experiments demonstrate that joint fine-tuning and compression significantly outperforms other sequential compression methods.

LGMay 24, 2025

$μ$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts

Toshiaki Koike-Akino, Jing Liu, Ye Wang

To tackle the huge computational demand of large foundation models, activation-aware compression techniques without retraining have been introduced. However, since these rely on calibration data, domain shift may arise for unknown downstream tasks. With a computationally efficient calibration, activation-aware pruning can be executed for every prompt adaptively, yet achieving reduced complexity at inference. We formulate it as a mixture of micro-experts, called $μ$-MoE. Several experiments demonstrate that $μ$-MoE can dynamically adapt to task/prompt-dependent structured sparsity on the fly.

LGMay 23, 2025

LatentLLM: Attention-Aware Joint Tensor Compression

Toshiaki Koike-Akino, Xiangyu Chen, Jing Liu et al.

Modern foundation models such as large language models (LLMs) and large multi-modal models (LMMs) require a massive amount of computational and memory resources. We propose a new framework to convert such LLMs/LMMs into a reduced-dimension latent structure. Our method extends a local activation-aware tensor decomposition to a global attention-aware joint tensor de-composition. Our framework can significantly improve the model accuracy over the existing model compression methods when reducing the latent dimension to realize computationally/memory-efficient LLMs/LLMs. We show the benefit on several benchmark including multi-modal reasoning tasks.

IVDec 19, 2024

Quantum Implicit Neural Compression

Takuya Fujihashi, Toshiaki Koike-Akino

Signal compression based on implicit neural representation (INR) is an emerging technique to represent multimedia signals with a small number of bits. While INR-based signal compression achieves high-quality reconstruction for relatively low-resolution signals, the accuracy of high-frequency details is significantly degraded with a small model. To improve the compression efficiency of INR, we introduce quantum INR (quINR), which leverages the exponentially rich expressivity of quantum neural networks for data compression. Evaluations using some benchmark datasets show that the proposed quINR-based compression could improve rate-distortion performance in image compression compared with traditional codecs and classic INR-based coding methods, up to 1.2dB gain.

LGJun 7, 2024

Efficient Differentially Private Fine-Tuning of Diffusion Models

Jing Liu, Andrew Lowy, Toshiaki Koike-Akino et al.

The recent developments of Diffusion Models (DMs) enable generation of astonishingly high-quality synthetic samples. Recent work showed that the synthetic samples generated by the diffusion model, which is pre-trained on public data and fully fine-tuned with differential privacy on private data, can train a downstream classifier, while achieving a good privacy-utility tradeoff. However, fully fine-tuning such large diffusion models with DP-SGD can be very resource-demanding in terms of memory usage and computation. In this work, we investigate Parameter-Efficient Fine-Tuning (PEFT) of diffusion models using Low-Dimensional Adaptation (LoDA) with Differential Privacy. We evaluate the proposed method with the MNIST and CIFAR-10 datasets and demonstrate that such efficient fine-tuning can also generate useful synthetic samples for training downstream classifiers, with guaranteed privacy protection of fine-tuning data. Our source code will be made available on GitHub.

SPFeb 19, 2022

Multi-Modal Recurrent Fusion for Indoor Localization

Jianyuan Yu, Pu, Wang et al.

This paper considers indoor localization using multi-modal wireless signals including Wi-Fi, inertial measurement unit (IMU), and ultra-wideband (UWB). By formulating the localization as a multi-modal sequence regression problem, a multi-stream recurrent fusion method is proposed to combine the current hidden state of each modality in the context of recurrent neural networks while accounting for the modality uncertainty which is directly learned from its own immediate past states. The proposed method was evaluated on the large-scale SPAWC2021 multi-modal localization dataset and compared with a wide range of baseline methods including the trilateration method, traditional fingerprinting methods, and convolution network-based methods.

MMJan 12, 2022

Federated AirNet: Hybrid Digital-Analog Neural Network Transmission for Federated Learning

Takuya Fujihashi, Toshiaki Koike-Akino, Takashi Watanabe

A key issue in federated learning over wireless channels is how to exchange a large number of the model parameters via time-varying channels. Two types of solutions based on digital and analog schemes are used typically. The digital-based solution takes quantization and entropy coding for compression, whereas transmissions via wireless channels may cause catastrophic errors owing to the all-or-nothing behavior in entropy coding. The analog-based solutions such as AirNet and AirComp use analog modulation for the parameter transmissions. However, such an analog scheme often causes significant distortion due to the source signal's large power without compression gain. This paper proposes a novel hybrid digital-analog transmission-Federated AirNet--for the model parameter transmissions in federated learning. The Federated AirNet integrates low-rate digital coding and energy-compact analog modulation. The digital coding offers the baseline of the model parameters and compacts the source signal power. In addition, the residual parameters, which are obtained from the original and encoded model parameters, are analog-modulated to enhance the baseline according to the instantaneous wireless channel quality. We show that the proposed Federated AirNet yields better image classification accuracy compared with the digital-based and analog-based solutions over a wide range of wireless channel signal-to-noise ratios (SNRs).

NIDec 28, 2021

Multi-Band Wi-Fi Sensing with Matched Feature Granularity

Jianyuan Yu, Pu, Wang et al.

Complementary to the fine-grained channel state information (CSI) from the physical layer and coarse-grained received signal strength indicator (RSSI) measurements, the mid-grained spatial beam attributes (e.g., beam SNR) that are available at millimeter-wave (mmWave) bands during the mandatory beam training phase can be repurposed for Wi-Fi sensing applications. In this paper, we propose a multi-band Wi-Fi fusion method for Wi-Fi sensing that hierarchically fuses the features from both the fine-grained CSI at sub-6 GHz and the mid-grained beam SNR at 60 GHz in a granularity matching framework. The granularity matching is realized by pairing two feature maps from the CSI and beam SNR at different granularity levels and linearly combining all paired feature maps into a fused feature map with learnable weights. To further address the issue of limited labeled training data, we propose an autoencoder-based multi-band Wi-Fi fusion network that can be pre-trained in an unsupervised fashion. Once the autoencoder-based fusion network is pre-trained, we detach the decoders and append multi-task sensing heads to the fused feature map by fine-tuning the fusion block and re-training the multi-task heads from the scratch. The multi-band Wi-Fi fusion framework is thoroughly validated by in-house experimental Wi-Fi sensing datasets spanning three tasks: 1) pose recognition; 2) occupancy sensing; and 3) indoor localization. Comparison to four baseline methods (i.e., CSI-only, beam SNR-only, input fusion, and feature fusion) demonstrates the granularity matching improves the multi-task sensing performance. Quantitative performance is evaluated as a function of the number of labeled training data, latent space dimension, and fine-tuning learning rates.

LGDec 17, 2021

AutoTransfer: Subject Transfer Learning with Censored Representations on Biosignals Data

Niklas Smedemark-Margulies, Ye Wang, Toshiaki Koike-Akino et al.

We provide a regularization framework for subject transfer learning in which we seek to train an encoder and classifier to minimize classification loss, subject to a penalty measuring independence between the latent representation and the subject label. We introduce three notions of independence and corresponding penalty terms using mutual information or divergence as a proxy for independence. For each penalty term, we provide several concrete estimation algorithms, using analytic methods as well as neural critic functions. We provide a hands-off strategy for applying this diverse family of regularization algorithms to a new dataset, which we call "AutoTransfer". We evaluate the performance of these individual regularization strategies and our AutoTransfer method on EEG, EMG, and ECoG datasets, showing that these approaches can improve subject transfer learning for challenging real-world datasets.

SPNov 18, 2021

A Modular 1D-CNN Architecture for Real-time Digital Pre-distortion

Udara De Silva, Toshiaki Koike-Akino, Rui Ma et al.

This study reports a novel hardware-friendly modular architecture for implementing one dimensional convolutional neural network (1D-CNN) digital predistortion (DPD) technique to linearize RF power amplifier (PA) real-time.The modular nature of our design enables DPD system adaptation for variable resource and timing constraints.Our work also presents a co-simulation architecture to verify the DPD performance with an actual power amplifier hardware-in-the-loop.The experimental results with 100 MHz signals show that the proposed 1D-CNN obtains superior performance compared with other neural network architectures for real-time DPD application.

MMNov 16, 2021

Soft Delivery: Survey on A New Paradigm for Wireless and Mobile Multimedia Streaming

Takuya Fujihashi, Toshiaki Koike-Akino, Takashi Watanabe

The increasing demand for video streaming services is the key driver of modern wireless and mobile communications. For robust and high-quality delivery of video content over wireless and mobile networks, the main challenge is sending image and video signals to single and multiple users over unstable and diverse channel environments. To this end, many studies have designed digital-based video delivery schemes, which mainly consist of a sequence of digital-based coding and transmission schemes. Although digital-based schemes perform well when the channel characteristics are known in advance, significant quality degradation, known as cliff and leveling effects, often occurs owing to the fluctuating channel characteristics. To prevent cliff and leveling effects irrespective of the channel characteristics of each user, a new paradigm for wireless and mobile video streaming has been proposed. Soft delivery schemes skip the digital operations of quantization and entropy and channel coding while directly mapping the power-assigned frequency--domain coefficients onto the transmission symbols. This modification is based on the fact that the pixel distortion due to communication noise is proportional to the magnitude of the noise, resulting in graceful quality improvement, wherein quality is improved gradually, according to the wireless channel quality without any cliff and leveling effects. Herein, we present a comprehensive summary of soft delivery schemes.

LGJun 16, 2021

EEG-GNN: Graph Neural Networks for Classification of Electroencephalogram (EEG) Signals

Andac Demir, Toshiaki Koike-Akino, Ye Wang et al.

Convolutional neural networks (CNN) have been frequently used to extract subject-invariant features from electroencephalogram (EEG) for classification tasks. This approach holds the underlying assumption that electrodes are equidistant analogous to pixels of an image and hence fails to explore/exploit the complex functional neural connectivity between different electrode sites. We overcome this limitation by tailoring the concepts of convolution and pooling applied to 2D grid-like inputs for the functional network of electrode sites. Furthermore, we develop various graph neural network (GNN) models that project electrodes onto the nodes of a graph, where the node features are represented as EEG channel samples collected over a trial, and nodes can be connected by weighted/unweighted edges according to a flexible policy formulated by a neuroscientist. The empirical evaluations show that our proposed GNN-based framework outperforms standard CNN classifiers across ErrP, and RSVP datasets, as well as allowing neuroscientific interpretability and explainability to deep learning methods tailored to EEG related classification problems. Another practical advantage of our GNN-based framework is that it can be used in EEG channel selection, which is critical for reducing computational cost, and designing portable EEG headsets.

SPSep 28, 2020

Universal Physiological Representation Learning with Soft-Disentangled Rateless Autoencoders

Mo Han, Ozan Ozdenizci, Toshiaki Koike-Akino et al.

Human computer interaction (HCI) involves a multidisciplinary fusion of technologies, through which the control of external devices could be achieved by monitoring physiological status of users. However, physiological biosignals often vary across users and recording sessions due to unstable physical/mental conditions and task-irrelevant activities. To deal with this challenge, we propose a method of adversarial feature encoding with the concept of a Rateless Autoencoder (RAE), in order to exploit disentangled, nuisance-robust, and universal representations. We achieve a good trade-off between user-specific and task-relevant features by making use of the stochastic disentanglement of the latent representations by adopting additional adversarial networks. The proposed model is applicable to a wider range of unknown users and tasks as well as different classifiers. Results on cross-subject transfer evaluations show the advantages of the proposed framework, with up to an 11.6% improvement in the average subject-transfer classification accuracy.

LGAug 26, 2020

Disentangled Adversarial Autoencoder for Subject-Invariant Physiological Feature Extraction

Mo Han, Ozan Ozdenizci, Ye Wang et al.

Recent developments in biosignal processing have enabled users to exploit their physiological status for manipulating devices in a reliable and safe manner. One major challenge of physiological sensing lies in the variability of biosignals across different users and tasks. To address this issue, we propose an adversarial feature extractor for transfer learning to exploit disentangled universal representations. We consider the trade-off between task-relevant features and user-discriminative information by introducing additional adversary and nuisance networks in order to manipulate the latent representations such that the learned feature extractor is applicable to unknown users and various tasks. Results on cross-subject transfer evaluations exhibit the benefits of the proposed framework, with up to 8.8% improvement in average accuracy of classification, and demonstrate adaptability to a broader range of subjects.

LGJul 22, 2020

Robust Machine Learning via Privacy/Rate-Distortion Theory

Ye Wang, Shuchin Aeron, Adnan Siraj Rakin et al.

Robust machine learning formulations have emerged to address the prevalent vulnerability of deep neural networks to adversarial examples. Our work draws the connection between optimal robust learning and the privacy-utility tradeoff problem, which is a generalization of the rate-distortion problem. The saddle point of the game between a robust classifier and an adversarial perturbation can be found via the solution of a maximum conditional entropy problem. This information-theoretic perspective sheds light on the fundamental tradeoff between robustness and clean data performance, which ultimately arises from the geometric structure of the underlying data distribution and perturbation constraints.

LGJul 2, 2020

AutoBayes: Automated Bayesian Graph Exploration for Nuisance-Robust Inference

Andac Demir, Toshiaki Koike-Akino, Ye Wang et al.

Learning data representations that capture task-related features, but are invariant to nuisance variations remains a key challenge in machine learning. We introduce an automated Bayesian inference framework, called AutoBayes, that explores different graphical models linking classifier, encoder, decoder, estimator and adversarial network blocks to optimize nuisance-invariant machine learning pipelines. AutoBayes also enables learning disentangled representations, where the latent variable is split into multiple pieces to impose various relationships with the nuisance variation and task labels. We benchmark the framework on several public datasets, and provide analysis of its capability for subject-transfer learning with/without variational modeling and adversarial training. We demonstrate a significant performance improvement with ensemble learning across explored graphical models.

SPJun 17, 2020

Wireless 3D Point Cloud Delivery Using Deep Graph Neural Networks

Takuya Fujihashi, Toshiaki Koike-Akino, Siheng Chen et al.

In typical point cloud delivery, a sender uses octree-based digital video compression to send three-dimensional (3D) points and color attributes over band-limited links. However, the digital-based schemes have an issue called the cliff effect, where the 3D reconstruction quality will be a step function in terms of wireless channel quality. To prevent the cliff effect subject to channel quality fluctuation, we have proposed soft point cloud delivery called HoloCast. Although the HoloCast realizes graceful quality improvement according to wireless channel quality, it requires large communication overheads. In this paper, we propose a novel scheme for soft point cloud delivery to simultaneously realize better quality and lower communication overheads. The proposed scheme introduces an end-to-end deep learning framework based on graph neural network (GNN) to reconstruct high-quality point clouds from its distorted observation under wireless fading channels. We demonstrate that the proposed GNN-based scheme can reconstruct clean 3D point cloud with low overheads by removing fading and noise effects.

LGMay 6, 2020

Stochastic Bottleneck: Rateless Auto-Encoder for Flexible Dimensionality Reduction

Toshiaki Koike-Akino, Ye Wang

We propose a new concept of rateless auto-encoders (RL-AEs) that enable a flexible latent dimensionality, which can be seamlessly adjusted for varying distortion and dimensionality requirements. In the proposed RL-AEs, instead of a deterministic bottleneck architecture, we use an over-complete representation that is stochastically regularized with weighted dropouts, in a manner analogous to sparse AE (SAE). Unlike SAEs, our RL-AEs employ monotonically increasing dropout rates across the latent representation nodes such that the latent variables become sorted by importance like in principal component analysis (PCA). This is motivated by the rateless property of conventional PCA, where the least important principal components can be discarded to realize variable rate dimensionality reduction that gracefully degrades the distortion. In contrast, since the latent variables of conventional AEs are equally important for data reconstruction, they cannot be simply discarded to further reduce the dimensionality after the AE model is trained. Our proposed stochastic bottleneck framework enables seamless rate adaptation with high reconstruction performance, without requiring predetermined latent dimensionality at training. We experimentally demonstrate that the proposed RL-AEs can achieve variable dimensionality reduction while achieving low distortion compared to conventional AEs.

SPApr 15, 2020

Disentangled Adversarial Transfer Learning for Physiological Biosignals

Mo Han, Ozan Ozdenizci, Ye Wang et al.

Recent developments in wearable sensors demonstrate promising results for monitoring physiological status in effective and comfortable ways. One major challenge of physiological status assessment is the problem of transfer learning caused by the domain inconsistency of biosignals across users or different recording sessions from the same user. We propose an adversarial inference approach for transfer learning to extract disentangled nuisance-robust representations from physiological biosignal data in stress status level assessment. We exploit the trade-off between task-related features and person-discriminative information by using both an adversary network and a nuisance network to jointly manipulate and disentangle the learned latent representations by the encoder, which are then input to a discriminative classifier. Results on cross-subjects transfer evaluations demonstrate the benefits of the proposed adversarial framework, and thus show its capabilities to adapt to a broader range of subjects. Finally we highlight that our proposed adversarial transfer learning approach is also applicable to other deep feature learning frameworks.