Guangyi Zhang

LG
h-index: 16
26 papers
830 citations
Novelty: 49%
AI Score: 56

26 Papers

SP · Jun 8, 2022
Robust Semantic Communications with Masked VQ-VAE Enabled Codebook

Qiyu Hu, Guangyi Zhang, Zhijin Qin et al.

Although semantic communications have exhibited satisfactory performance for a large number of tasks, the impact of semantic noise and the robustness of the systems have not been well investigated. Semantic noise refers to the mismatch between the intended semantic symbols and the received ones, which causes tasks to fail. In this paper, we first propose a framework for robust end-to-end semantic communication systems to combat semantic noise. In particular, we analyze sample-dependent and sample-independent semantic noise. To combat the semantic noise, adversarial training with weight perturbation is developed to incorporate samples with semantic noise into the training dataset. Then, we propose to mask a portion of the input, where the semantic noise appears frequently, and design the masked vector quantized-variational autoencoder (VQ-VAE) with the noise-related masking strategy. We use a discrete codebook shared by the transmitter and the receiver for encoded feature representation. To further improve the system robustness, we develop a feature importance module (FIM) to suppress the noise-related and task-unrelated features. Thus, the transmitter simply needs to transmit the indices of these important task-related features in the codebook. Simulation results show that the proposed method can be applied in many downstream tasks and significantly improves the robustness against semantic noise with a remarkable reduction in the transmission overhead.
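
The codebook idea in the abstract above can be illustrated with a minimal sketch. It assumes a tiny hand-written codebook and plain nearest-neighbor lookup under squared Euclidean distance; the function names (`nearest_index`, `encode`, `decode`) are hypothetical, and the masking strategy and feature importance module from the paper are not modeled here.

```python
def nearest_index(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def encode(features, codebook):
    """Transmitter side: map each feature vector to a codebook index."""
    return [nearest_index(f, codebook) for f in features]


def decode(indices, codebook):
    """Receiver side: look the indices up in the shared codebook."""
    return [codebook[i] for i in indices]


# Toy codebook shared by transmitter and receiver (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]

indices = encode(features, codebook)        # only these integers are sent
reconstructed = decode(indices, codebook)   # receiver recovers quantized features
```

Only the integer indices cross the channel, which is where the reduction in transmission overhead comes from in this kind of scheme.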

LG · Aug 23, 2022
Regularized impurity reduction: Accurate decision trees with complexity guarantees

Guangyi Zhang, Aristides Gionis

Decision trees are popular classification models, providing high accuracy and intuitive explanations. However, as the tree size grows the model interpretability deteriorates. Traditional tree-induction algorithms, such as C4.5 and CART, rely on impurity-reduction functions that promote the discriminative power of each split. Thus, although these traditional methods are accurate in practice, there has been no theoretical guarantee that they will produce small trees. In this paper, we justify the use of a general family of impurity functions, including the popular functions of entropy and Gini-index, in scenarios where small trees are desirable, by showing that a simple enhancement can equip them with complexity guarantees. We consider a general setting, where objects to be classified are drawn from an arbitrary probability distribution, classification can be binary or multi-class, and splitting tests are associated with non-uniform costs. As a measure of tree complexity, we adopt the expected cost to classify an object drawn from the input distribution, which, in the uniform-cost case, is the expected number of tests. We propose a tree-induction algorithm that gives a logarithmic approximation guarantee on the tree complexity. This approximation factor is tight up to a constant factor under mild assumptions. The algorithm recursively selects a test that maximizes a greedy criterion defined as a weighted sum of three components. The first two components encourage the selection of tests that improve the balance and the cost-efficiency of the tree, respectively, while the third impurity-reduction component encourages the selection of more discriminative tests. As shown in our empirical evaluation, compared to the original heuristics, the enhanced algorithms strike an excellent balance between predictive accuracy and tree complexity.

DS · Apr 8, 2022
Ranking with submodular functions on a budget

Guangyi Zhang, Nikolaj Tatti, Aristides Gionis

Submodular maximization has been the backbone of many important machine-learning problems, and has applications to viral marketing, diversification, sensor placement, and more. However, the study of maximizing submodular functions has mainly been restricted to the context of selecting a set of items. On the other hand, many real-world applications require a solution that is a ranking over a set of items. The problem of ranking in the context of submodular function maximization has been considered before, but to a much lesser extent than item-selection formulations. In this paper, we explore a novel formulation for ranking items with submodular valuations and budget constraints. We refer to this problem as max-submodular ranking (MSR). In more detail, given a set of items and a set of non-decreasing submodular functions, where each function is associated with a budget, we aim to find a ranking of the set of items that maximizes the sum of values achieved by all functions under the budget constraints. For the MSR problem with cardinality- and knapsack-type budget constraints, we propose practical algorithms with approximation guarantees. In addition, we perform an empirical evaluation, which demonstrates the superior performance of the proposed algorithms against strong baselines.

LG · Nov 3, 2022
A Convergence Theory for Federated Average: Beyond Smoothness

Xiaoxiao Li, Zhao Song, Runzhou Tao et al.

Federated learning enables a large number of edge computing devices to jointly learn a model without sharing data. As a leading algorithm in this setting, Federated Average (FedAvg), which runs Stochastic Gradient Descent (SGD) in parallel on local devices and averages the sequences only once in a while, has been widely used due to its simplicity and low communication cost. However, despite recent research efforts, it lacks theoretical analysis under assumptions beyond smoothness. In this paper, we analyze the convergence of FedAvg. Different from the existing work, we relax the assumption of strong smoothness. More specifically, we assume the semi-smoothness and semi-Lipschitz properties for the loss function, which have an additional first-order term in the assumption definitions. In addition, we assume a bound on the gradient that is weaker than the commonly used bounded gradient assumption in the convergence analysis scheme. As a solution, this paper provides a theoretical convergence study on Federated Learning.
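
The local-update-then-average structure described in the abstract above can be sketched as follows. This is a toy illustration under loudly stated assumptions, not the setting analyzed in the paper: the model is a single scalar, each client minimizes a quadratic loss, full-batch gradients stand in for stochastic ones, and clients are averaged with equal weight. The names `local_gd` and `fedavg` are hypothetical.

```python
def local_gd(w, data, lr, steps):
    """Run `steps` full-batch gradient steps on the loss mean(0.5 * (w - x)^2)."""
    for _ in range(steps):
        grad = sum(w - x for x in data) / len(data)
        w -= lr * grad
    return w


def fedavg(client_data, rounds, local_steps, lr, w0=0.0):
    """Alternate local training on every client with equal-weight model averaging."""
    w = w0
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        local_models = [local_gd(w, d, lr, local_steps) for d in client_data]
        # The server averages the local models (assumption: equal client weights).
        w = sum(local_models) / len(local_models)
    return w


# Two clients whose local optima are 2.0 and 6.0 respectively.
clients = [[1.0, 2.0, 3.0], [5.0, 7.0]]
w_final = fedavg(clients, rounds=50, local_steps=5, lr=0.1)
```

With equal client weights and these quadratic losses, the global model settles at the average of the client optima (4.0 here); communication happens once per round rather than once per gradient step.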

CV · Feb 25, 2023
Partial Label Learning for Emotion Recognition from EEG

Guangyi Zhang, Ali Etemad

Fully supervised learning has recently achieved promising performance in various electroencephalography (EEG) learning tasks by training on large datasets with ground truth labels. However, labeling EEG data for affective experiments is challenging, as it can be difficult for participants to accurately distinguish between similar emotions, resulting in ambiguous labeling (reporting multiple emotions for one EEG instance). This notion could cause model performance degradation, as the ground truth is hidden within multiple candidate labels. To address this issue, Partial Label Learning (PLL) has been proposed to identify the ground truth from candidate labels during the training phase, and has shown good performance in the computer vision domain. However, PLL methods have not yet been adopted for EEG representation learning or implemented for emotion recognition tasks. In this paper, we adapt and re-implement six state-of-the-art PLL approaches for emotion recognition from EEG on two large emotion datasets (SEED-IV and SEED-V). These datasets contain four and five categories of emotions, respectively. We evaluate the performance of all methods in classical, circumplex-based and real-world experiments. The results show that PLL methods can achieve strong results in affective computing from EEG and achieve comparable performance to fully supervised learning. We also investigate the effect of label disambiguation, a key step in many PLL methods. The results show that in most cases, label disambiguation would benefit the model when the candidate labels are generated based on their similarities to the ground truth rather than obeying a uniform distribution. This finding suggests the potential of using label disambiguation-based PLL methods for circumplex-based and real-world affective tasks.

IV · Aug 29, 2024
Learned Image Transmission with Hierarchical Variational Autoencoder

Guangyi Zhang, Hanlei Li, Yunlong Cai et al.

In this paper, we introduce an innovative hierarchical joint source-channel coding (HJSCC) framework for image transmission, utilizing a hierarchical variational autoencoder (VAE). Our approach leverages a combination of bottom-up and top-down paths at the transmitter to autoregressively generate multiple hierarchical representations of the original image. These representations are then directly mapped to channel symbols for transmission by the JSCC encoder. We extend this framework to scenarios with a feedback link, modeling transmission over a noisy channel as a probabilistic sampling process and deriving a novel generative formulation for JSCC with feedback. Compared with existing approaches, our proposed HJSCC provides enhanced adaptability by dynamically adjusting transmission bandwidth, encoding these representations into varying amounts of channel symbols. Extensive experiments on images of varying resolutions demonstrate that our proposed model outperforms existing baselines in rate-distortion performance and maintains robustness against channel noise. The source code will be made available upon acceptance.

LG · May 20
Efficient Banzhaf-Based Data Valuation for $k$-Nearest Neighbors Classification

Guangyi Zhang, Lutz Oettershagen, Lixu Wang et al.

Data valuation, the task of quantifying the contribution of individual data points to model performance, has emerged as a fundamental challenge in machine learning. Game-theoretic approaches, such as the Banzhaf value, offer principled frameworks for fair data valuation; however, they suffer from exponential computational complexity. We address this challenge by developing efficient algorithms specifically tailored for computing Banzhaf values in $k$-nearest neighbor ($k$NN) classifiers. We first establish the theoretical hardness of the problem by proving that it is #P-hard. Despite this intractability, we exploit the locality properties of $k$NN classifiers to develop practical exact algorithms. Our main contribution is a dynamic programming framework that achieves significant computational improvements: we present a pseudo-polynomial algorithm with $O(Wkn^2)$ time complexity for weighted $k$NN classifiers, where $W$ is the maximum sum of top-$k$ weights, and a specialized algorithm for unweighted $k$NN that achieves $O(nk^2)$ time complexity, that is, linear in the number of data points. We also offer efficient Monte Carlo estimation methods. Extensive experiments on real-world datasets demonstrate the practical efficiency of our approach and its effectiveness in data valuation applications.
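
For context on the exponential cost mentioned in the abstract above, here is a brute-force sketch of the Banzhaf value from its standard definition: the average marginal contribution of point i over all 2^(n-1) subsets of the other points. This is the naive baseline, not the paper's dynamic program; `banzhaf_values` and the toy additive utility are hypothetical illustrations.

```python
from itertools import combinations


def banzhaf_values(n, utility):
    """Exact Banzhaf values by enumerating all subsets (exponential in n)."""
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for r in range(len(others) + 1):
            for subset in combinations(others, r):
                s = frozenset(subset)
                # Marginal contribution of point i to the coalition s.
                total += utility(s | {i}) - utility(s)
        values.append(total / 2 ** (n - 1))
    return values


# Toy utility (assumption): each point contributes a fixed, additive amount.
weights = [1.0, 2.0, 3.0]


def additive_utility(subset):
    return sum(weights[j] for j in subset)


values = banzhaf_values(len(weights), additive_utility)
```

For an additive utility the Banzhaf value of each point equals its own weight, which makes this a convenient sanity check; with a real classifier-accuracy utility the enumeration becomes infeasible beyond a few dozen points.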

LG · Feb 2
Prediction-Powered Risk Monitoring of Deployed Models for Detecting Harmful Distribution Shifts

Guangyi Zhang, Yunlong Cai, Guanding Yu et al.

We study the problem of monitoring model performance in dynamic environments where labeled data are limited. To this end, we propose prediction-powered risk monitoring (PPRM), a semi-supervised risk-monitoring approach based on prediction-powered inference (PPI). PPRM constructs anytime-valid lower bounds on the running risk by combining synthetic labels with a small set of true labels. Harmful shifts are detected via a threshold-based comparison with an upper bound on the nominal risk, satisfying assumption-free finite-sample guarantees on the probability of false alarm. We demonstrate the effectiveness of PPRM through extensive experiments on image classification, large language model (LLM), and telecommunications monitoring tasks.

CV · Jul 24, 2025 · Code
Boosting Multi-View Indoor 3D Object Detection via Adaptive 3D Volume Construction

Runmin Zhang, Zhu Yu, Si-Yuan Cao et al.

This work presents SGCDet, a novel multi-view indoor 3D object detection framework based on adaptive 3D volume construction. Unlike previous approaches that restrict the receptive field of voxels to fixed locations on images, we introduce a geometry and context aware aggregation module to integrate geometric and contextual information within adaptive regions in each image and dynamically adjust the contributions from different views, enhancing the representation capability of voxel features. Furthermore, we propose a sparse volume construction strategy that adaptively identifies and selects voxels with high occupancy probabilities for feature refinement, minimizing redundant computation in free space. Benefiting from the above designs, our framework achieves effective and efficient volume construction in an adaptive way. Better still, our network can be supervised using only 3D bounding boxes, eliminating the dependence on ground-truth scene geometry. Experimental results demonstrate that SGCDet achieves state-of-the-art performance on the ScanNet, ScanNet200 and ARKitScenes datasets. The source code is available at https://github.com/RM-Zhang/SGCDet.

CV · Aug 19, 2020 · Code
Spatio-Temporal EEG Representation Learning on Riemannian Manifold and Euclidean Space

Guangyi Zhang, Ali Etemad

We present a novel deep neural architecture for learning electroencephalogram (EEG) representations. To learn the spatial information, our model first obtains the Riemannian mean and distance from spatial covariance matrices (SCMs) on a Riemannian manifold. We then project the spatial information onto a Euclidean space via tangent space learning. Next, two fully connected layers are used to learn the spatial information embeddings. Moreover, our proposed method learns the temporal information via differential entropy and logarithm power spectrum density features extracted from EEG signals in a Euclidean space using a deep long short-term memory network with a soft attention mechanism. To combine the spatial and temporal information, we use an effective fusion strategy, which learns attention weights applied to embedding-specific features for decision making. We evaluate our proposed framework on four public datasets across three popular EEG-related tasks, notably emotion recognition, vigilance estimation, and motor imagery classification, containing various types of tasks such as binary classification, multi-class classification, and regression. Our proposed architecture outperforms other methods on SEED-VIG, and approaches the state-of-the-art on the other three datasets (SEED, BCI-IV 2A, and BCI-IV 2B), showing the robustness of our framework in EEG representation learning. The source code of our paper is publicly available at https://github.com/guangyizhangbci/EEG_Riemannian.

LG · May 6
FL-Sailer: Efficient and Privacy-Preserving Federated Learning for Scalable Single-Cell Epigenetic Data Analysis via Adaptive Sampling

Guangyi Zhang, Yi Dai, Yiyun He et al.

Single-cell ATAC-seq (scATAC-seq) enables high-resolution mapping of chromatin accessibility, yet privacy regulations and data size constraints hinder multi-institutional sharing. Federated learning (FL) offers a privacy-preserving alternative, but faces three fundamental barriers in scATAC-seq analysis: ultra-high dimensionality, extreme sparsity, and severe cross-institutional heterogeneity. We propose FL-Sailer, the first FL framework designed for scATAC-seq data. FL-Sailer integrates two key innovations: (i) adaptive leverage score sampling, which selects biologically interpretable features while reducing dimensionality by 80%, and (ii) an invariant VAE architecture, which disentangles biological signals from technical confounders via mutual information minimization. We provide a convergence guarantee, showing that FL-Sailer converges to an approximate solution of the original high-dimensional problem with bounded error. Extensive experiments on synthetic and real epigenomic datasets demonstrate that FL-Sailer not only enables previously infeasible multi-institutional collaborations but also surpasses centralized methods by leveraging adaptive sampling as an implicit regularizer to suppress technical noise. Our work establishes that federated learning, when tailored to domain-specific challenges, can become a superior paradigm for collaborative epigenomic research.

LG · Oct 11, 2025
Conformal Sparsification for Bandwidth-Efficient Edge-Cloud Speculative Decoding

Payel Bhattacharjee, Fengwei Tian, Meiyu Zhong et al.

Edge-cloud speculative decoding (SD) accelerates inference by having a cloud-based large language model (LLM) verify draft tokens generated by a resource-constrained small language model (SLM) at the edge. A central bottleneck is the limited bandwidth of the edge-cloud link, which necessitates efficient compression of draft token distributions. We first derive an information-theoretic bound that decomposes the token rejection rate into contributions from SLM-LLM distribution mismatch and from quantization distortion. Guided by this analysis, we propose the Sparse Quantize-and-Sample SD (SQS-SD) framework, which exploits distributional sparsity through structured sparsification and lattice-based quantization. Within this framework, K-SQS applies fixed top-K truncation, while C-SQS adaptively adjusts the retained token set via online conformal prediction to ensure bounded deviation from the dense distribution. Empirical results confirm that both approaches improve end-to-end latency and rejection rates in complementary operating regimes.
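
The fixed top-K truncation step mentioned in the abstract above can be sketched in a few lines. This is only an illustration of truncating and renormalizing a draft token distribution; the lattice-based quantization and the conformal adaptation of the retained set are not modeled, and `top_k_sparsify` is a hypothetical name.

```python
def top_k_sparsify(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities.

    Returns a dict {token_id: probability} so that only k entries need sending.
    """
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}


# Toy draft distribution over a 4-token vocabulary (illustrative values only).
draft = [0.5, 0.05, 0.3, 0.15]
sparse = top_k_sparsify(draft, k=2)
```

The probability mass discarded by truncation (0.2 in this toy example) is one simple measure of how far the sparse distribution deviates from the dense one, which is the quantity an adaptive scheme would try to keep bounded.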

CV · Sep 29, 2025
Rethinking Unsupervised Cross-modal Flow Estimation: Learning from Decoupled Optimization and Consistency Constraint

Runmin Zhang, Jialiang Wang, Si-Yuan Cao et al.

This work presents DCFlow, a novel unsupervised cross-modal flow estimation framework that integrates a decoupled optimization strategy and a cross-modal consistency constraint. Unlike previous approaches that implicitly learn flow estimation solely from appearance similarity, we introduce a decoupled optimization strategy with task-specific supervision to address modality discrepancy and geometric misalignment distinctly. This is achieved by collaboratively training a modality transfer network and a flow estimation network. To enable reliable motion supervision without ground-truth flow, we propose a geometry-aware data synthesis pipeline combined with an outlier-robust loss. Additionally, we introduce a cross-modal consistency constraint to jointly optimize both networks, significantly improving flow prediction accuracy. For evaluation, we construct a comprehensive cross-modal flow benchmark by repurposing public datasets. Experimental results demonstrate that DCFlow can be integrated with various flow estimation networks and achieves state-of-the-art performance among unsupervised approaches.

CV · Jun 25, 2025
Dynamic Bandwidth Allocation for Hybrid Event-RGB Transmission

Pujing Yang, Guangyi Zhang, Yunlong Cai et al.

Event cameras asynchronously capture pixel-level intensity changes with extremely low latency. They are increasingly used in conjunction with RGB cameras for a wide range of vision-related applications. However, a major challenge in these hybrid systems lies in the transmission of the large volume of triggered events and RGB images. To address this, we propose a transmission scheme that retains efficient reconstruction performance of both sources while accomplishing real-time deblurring in parallel. Conventional RGB cameras and event cameras typically capture the same scene in different ways, often resulting in significant redundant information across their outputs. To address this, we develop a joint event and image (E-I) transmission framework to eliminate redundancy and thereby optimize channel bandwidth utilization. Our approach employs Bayesian modeling and the information bottleneck method to disentangle the shared and domain-specific information within the E-I inputs. This disentangled information bottleneck framework ensures both the compactness and informativeness of extracted shared and domain-specific information. Moreover, it adaptively allocates transmission bandwidth based on scene dynamics, i.e., more symbols are allocated to events for dynamic details or to images for static information. Simulation results demonstrate that the proposed scheme not only achieves superior reconstruction quality compared to conventional systems but also delivers enhanced deblurring performance.

DB · Jun 25, 2025
Piecewise Linear Approximation in Learned Index Structures: Theoretical and Empirical Analysis

Jiayong Qin, Xianyu Zhu, Qiyu Liu et al.

A growing trend in the database and system communities is to augment conventional index structures, such as B+-trees, with machine learning (ML) models. Among these, error-bounded Piecewise Linear Approximation ($ε$-PLA) has emerged as a popular choice due to its simplicity and effectiveness. Despite its central role in many learned indexes, the design and analysis of $ε$-PLA fitting algorithms remain underexplored. In this paper, we revisit $ε$-PLA from both theoretical and empirical perspectives, with a focus on its application in learned index structures. We first establish a fundamentally improved lower bound of $Ω(κ\cdot ε^2)$ on the expected segment coverage for existing $ε$-PLA fitting algorithms, where $κ$ is a data-dependent constant. We then present a comprehensive benchmark of state-of-the-art $ε$-PLA algorithms when used in different learned data structures. Our results highlight key trade-offs among model accuracy, model size, and query performance, providing actionable guidelines for the principled design of future learned data structures.
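
To make the error-bounded fitting problem in the abstract above concrete, here is a sketch of a simple greedy "shrinking cone" fit, one common way to build an $ε$-PLA over sorted keys. It is not claimed to be any of the algorithms the paper benchmarks. It assumes strictly increasing keys, positions 0..n-1, and lookups for keys no smaller than the first key; `fit_pla` and `predict` are hypothetical names.

```python
import bisect


def _pick_slope(lo, hi):
    """Choose a slope inside the feasible cone (0.0 for a single-point segment)."""
    if lo == float("-inf"):
        return 0.0
    return (lo + hi) / 2.0


def fit_pla(keys, eps):
    """Greedily cover sorted keys with segments (start_key, start_pos, slope)
    whose predicted position is within eps of the true position."""
    segments = []
    x0, y0 = keys[0], 0
    lo, hi = float("-inf"), float("inf")
    for pos in range(1, len(keys)):
        dx = keys[pos] - x0
        # Slopes through (x0, y0) that keep this point within +/- eps.
        new_lo = max(lo, (pos - eps - y0) / dx)
        new_hi = min(hi, (pos + eps - y0) / dx)
        if new_lo > new_hi:
            # Cone is empty: close the current segment, start a new one here.
            segments.append((x0, y0, _pick_slope(lo, hi)))
            x0, y0 = keys[pos], pos
            lo, hi = float("-inf"), float("inf")
        else:
            lo, hi = new_lo, new_hi
    segments.append((x0, y0, _pick_slope(lo, hi)))
    return segments


def predict(segments, key):
    """Predict the position of key using the segment whose start is <= key."""
    starts = [s[0] for s in segments]
    i = bisect.bisect_right(starts, key) - 1
    x0, y0, slope = segments[i]
    return y0 + slope * (key - x0)


keys = [1, 2, 3, 10, 11, 12, 50, 100, 101]
segments = fit_pla(keys, eps=1.0)
```

A larger eps lets each segment cover more keys, so the index is smaller, but every lookup then needs a wider local search around the predicted position; this is the accuracy, size, and query-time trade-off the abstract refers to.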

IV · Jan 16, 2025
Joint Transmission and Deblurring: A Semantic Communication Approach Using Events

Pujing Yang, Guangyi Zhang, Yunlong Cai et al.

Deep learning-based joint source-channel coding (JSCC) is emerging as a promising technology for effective image transmission. However, most existing approaches focus on transmitting clear images, overlooking real-world challenges such as motion blur caused by camera shaking or fast-moving objects. Motion blur often degrades image quality, making transmission and reconstruction more challenging. Event cameras, which asynchronously record pixel intensity changes with extremely low latency, have shown great potential for motion deblurring tasks. However, the efficient transmission of the abundant data generated by event cameras remains a significant challenge. In this work, we propose a novel JSCC framework for the joint transmission of blurry images and events, aimed at achieving high-quality reconstructions under limited channel bandwidth. This approach is designed as a deblurring task-oriented JSCC system. Since RGB cameras and event cameras capture the same scene through different modalities, their outputs contain both shared and domain-specific information. To avoid repeatedly transmitting the shared information, we extract and transmit their shared information and domain-specific information, respectively. At the receiver, the received signals are processed by a deblurring decoder to generate clear images. Additionally, we introduce a multi-stage training strategy to train the proposed model. Simulation results demonstrate that our method significantly outperforms existing JSCC-based image transmission schemes, addressing motion blur effectively.

SP · May 15, 2023
Deep-Unfolding for Next-Generation Transceivers

Qiyu Hu, Yunlong Cai, Guangyi Zhang et al.

The stringent performance requirements of future wireless networks, such as ultra-high data rates, extremely high reliability and low latency, are spurring worldwide studies on defining the next-generation multiple-input multiple-output (MIMO) transceivers. For the design of advanced transceivers in wireless communications, optimization approaches, which often lead to iterative algorithms, have achieved great success for MIMO transceivers. However, these algorithms generally require a large number of iterations to converge, which entails considerable computational complexity and often requires fine-tuning of various parameters. With the development of deep learning, approximating the iterative algorithms with deep neural networks (DNNs) can significantly reduce the computational time. However, DNNs typically lead to black-box solvers, which require large amounts of data and extensive training time. To further overcome these challenges, deep-unfolding has emerged, which incorporates the benefits of both deep learning and iterative algorithms by unfolding the iterative algorithm into a layer-wise structure analogous to DNNs. In this article, we first go through the framework of deep-unfolding for transceiver design with matrix parameters and its recent advancements. Then, some endeavors in applying deep-unfolding approaches in next-generation advanced transceiver design are presented. Moreover, some open issues for future research are highlighted.

LG · Feb 11, 2022
PARSE: Pairwise Alignment of Representations in Semi-Supervised EEG Learning for Emotion Recognition

Guangyi Zhang, Vandad Davoodnia, Ali Etemad

We propose PARSE, a novel semi-supervised architecture for learning strong EEG representations for emotion recognition. To reduce the potential distribution mismatch between the large amounts of unlabeled data and the limited amount of labeled data, PARSE uses pairwise representation alignment. First, our model performs data augmentation followed by label guessing for large amounts of original and augmented unlabeled data. This is then followed by sharpening of the guessed labels and convex combinations of the unlabeled and labeled data. Finally, representation alignment and emotion classification are performed. To rigorously test our model, we compare PARSE to several state-of-the-art semi-supervised approaches which we implement and adapt for EEG learning. We perform these experiments on four public EEG-based emotion recognition datasets, SEED, SEED-IV, SEED-V and AMIGOS (valence and arousal). The experiments show that our proposed framework achieves the overall best results with varying amounts of limited labeled samples in SEED, SEED-IV and AMIGOS (valence), while approaching the overall best result (reaching the second-best) in SEED-V and AMIGOS (arousal). The analysis shows that our pairwise representation alignment considerably improves the performance by reducing the distribution mismatch between unlabeled and labeled data, especially when only 1 sample per class is labeled.
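
Two of the steps named in the abstract above, sharpening guessed labels and forming convex combinations of samples, can be sketched as below. The sketch assumes the temperature-based sharpening commonly used in holistic semi-supervised methods such as MixMatch; whether PARSE uses exactly this form is not stated in the abstract. The names `sharpen` and `convex_combine` are hypothetical.

```python
def sharpen(probs, temperature):
    """Raise each probability to 1/temperature and renormalize.

    A temperature below 1 pushes the distribution toward its most likely class.
    """
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def convex_combine(a, b, lam):
    """Mix two vectors (inputs or label distributions) as lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


guessed = [0.6, 0.4]                      # guessed label for an unlabeled sample
sharpened = sharpen(guessed, temperature=0.5)

labeled_target = [1.0, 0.0]               # one-hot label of a labeled sample
unlabeled_target = [0.0, 1.0]             # illustrative target for an unlabeled one
mixed = convex_combine(labeled_target, unlabeled_target, lam=0.75)
```

In practice the same mixing coefficient is applied to the input features and to the label vectors, so each mixed input is paired with a matching soft label.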

SP · Feb 7, 2022
Robust Semantic Communications Against Semantic Noise

Qiyu Hu, Guangyi Zhang, Zhijin Qin et al.

Although semantic communications have exhibited satisfactory performance in a large number of tasks, the impact of semantic noise and the robustness of the systems have not been well investigated. Semantic noise is a particular kind of noise in semantic communication systems, which refers to the mismatch between the intended semantic symbols and the received ones. In this paper, we first propose a framework for robust end-to-end semantic communication systems to combat semantic noise. Particularly, we analyze the causes of semantic noise and propose a practical method to generate it. To remove the effect of semantic noise, adversarial training is proposed to incorporate samples with semantic noise into the training dataset. Then, the masked autoencoder (MAE) is designed as the architecture of a robust semantic communication system, where a portion of the input is masked. To further improve the robustness of semantic communication systems, we first employ the vector quantization-variational autoencoder (VQ-VAE) to design a discrete codebook shared by the transmitter and the receiver for encoded feature representation. Thus, the transmitter simply needs to transmit the indices of these features in the codebook. Simulation results show that our proposed method significantly improves the robustness of semantic communication systems against semantic noise with a significant reduction in the transmission overhead.

LG · Sep 24, 2021
Holistic Semi-Supervised Approaches for EEG Representation Learning

Guangyi Zhang, Ali Etemad

Recently, supervised methods, which often require substantial amounts of class labels, have achieved promising results for EEG representation learning. However, labeling EEG data is a challenging task. More recently, holistic semi-supervised learning approaches, which require only a few output labels, have shown promising results in the field of computer vision. These methods, however, have not yet been adapted for EEG learning. In this paper, we adapt three state-of-the-art holistic semi-supervised approaches, namely MixMatch, FixMatch, and AdaMatch, as well as five classical semi-supervised methods for EEG learning. We perform rigorous experiments with all 8 methods on two public EEG-based emotion recognition datasets, namely SEED and SEED-IV. The experiments with different amounts of limited labeled samples show that the holistic approaches achieve strong results even when only 1 labeled sample is used per class. Further experiments show that in most cases, AdaMatch is the most effective method, followed by MixMatch and FixMatch.

LG · Jul 28, 2021
Deep Recurrent Semi-Supervised EEG Representation Learning for Emotion Recognition

Guangyi Zhang, Ali Etemad

EEG-based emotion recognition often requires sufficient labeled training samples to build an effective computational model. Labeling EEG data, on the other hand, is often expensive and time-consuming. To tackle this problem and reduce the need for output labels in the context of EEG-based emotion recognition, we propose a semi-supervised pipeline to jointly exploit both unlabeled and labeled data for learning EEG representations. Our semi-supervised framework consists of both unsupervised and supervised components. The unsupervised part maximizes the consistency between original and reconstructed input data using an autoencoder, while simultaneously the supervised part minimizes the cross-entropy between the input and output labels. We evaluate our framework using both a stacked autoencoder and an attention-based recurrent autoencoder. We test our framework on the large-scale SEED EEG dataset and compare our results with several other popular semi-supervised methods. Our semi-supervised framework with a deep attention-based recurrent autoencoder consistently outperforms the benchmark methods, even when small subsets (3%, 5% and 10%) of the output labels are available during training, achieving a new state-of-the-art semi-supervised performance.

LG · Apr 30, 2021
Distilling EEG Representations via Capsules for Affective Computing

Guangyi Zhang, Ali Etemad

Affective computing with Electroencephalogram (EEG) is a challenging task that requires cumbersome models to effectively learn the information contained in large-scale EEG signals, causing difficulties for real-time smart-device deployment. In this paper, we propose a novel knowledge distillation pipeline to distill EEG representations via capsule-based architectures for both classification and regression tasks. Our goal is to distill information from a heavy model to a lightweight model for subject-specific tasks. To this end, we first pre-train a large model (teacher network) on a large number of training samples. Then, we employ the teacher network to learn the discriminative features embedded in capsules by adopting a lightweight model (student network) to mimic the teacher using the privileged knowledge. Such privileged information learned by the teacher contains similarities among capsules and is only available during the training stage of the student network. We evaluate the proposed architecture on two large-scale public EEG datasets, showing that our framework consistently enables student networks with different compression ratios to effectively learn from the teacher, even when provided with limited training samples. Lastly, our method achieves state-of-the-art results on one of the two datasets.

AIJun 17, 2020
Diverse Rule Sets

Guangyi Zhang, Aristides Gionis

While machine-learning models are flourishing and transforming many aspects of everyday life, the inability of humans to understand complex models poses difficulties for these models to be fully trusted and embraced. Thus, interpretability of models has been recognized as a quality as important as their predictive power. In particular, rule-based systems are experiencing a renaissance owing to their intuitive if-then representation. However, simply being rule-based does not ensure interpretability. For example, overlapping rules spawn ambiguity and hinder interpretation. Here we propose a novel approach of inferring diverse rule sets, by optimizing small overlap among decision rules with a 2-approximation guarantee under the framework of Max-Sum diversification. We formulate the problem as maximizing a weighted sum of discriminative quality and diversity of a rule set. In order to overcome an exponential-size search space of association rules, we investigate several natural options for a small candidate set of high-quality rules, including frequent and accurate rules, and examine their hardness. Leveraging the special structure in our formulation, we then devise an efficient randomized algorithm, which samples rules that are highly discriminative and have small overlap. The proposed sampling algorithm analytically targets a distribution of rules that is tailored to our objective. We demonstrate the superior predictive power and interpretability of our model with a comprehensive empirical study against strong baselines.
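
The "weighted sum of discriminative quality and diversity" can be made concrete with a small sketch. Everything here is an illustrative assumption rather than the paper's method: rules are represented by the set of examples they cover, diversity is measured as pairwise Jaccard distance between those sets, `lam` is a hypothetical trade-off weight, and the selection loop is a plain greedy stand-in, not the randomized sampling algorithm the abstract describes.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Distance between two coverage sets: 0 if identical, 1 if disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def objective(selected, quality, cover, lam):
    """Sum of rule qualities plus lam times the sum of pairwise distances."""
    q = sum(quality[r] for r in selected)
    d = sum(jaccard_distance(cover[a], cover[b])
            for a, b in combinations(selected, 2))
    return q + lam * d

def greedy_diverse_rules(quality, cover, k, lam=1.0):
    """Greedily add the rule that most increases the objective (illustrative only)."""
    selected = []
    remaining = set(quality)
    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining),
                   key=lambda r: objective(selected + [r], quality, cover, lam))
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two high-quality rules that cover exactly the same examples and a weaker rule that covers different ones, this sketch picks one of the duplicates and then the weaker, non-overlapping rule, which is the behaviour the diversity term is meant to encourage.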

LGDec 17, 2019
Capsule Attention for Multimodal EEG-EOG Representation Learning with Application to Driver Vigilance Estimation

Guangyi Zhang, Ali Etemad

Driver vigilance estimation is an important task for transportation safety. Wearable and portable brain-computer interface devices provide a powerful means for real-time monitoring of the vigilance level of drivers to help with avoiding distracted or impaired driving. In this paper, we propose a novel multimodal architecture for in-vehicle vigilance estimation from Electroencephalogram and Electrooculogram. To enable the system to focus on the most salient parts of the learned multimodal representations, we propose an architecture composed of a capsule attention mechanism following a deep Long Short-Term Memory (LSTM) network. Our model learns hierarchical dependencies in the data through the LSTM and capsule feature representation layers. To better explore the discriminative ability of the learned representations, we study the effect of the proposed capsule attention mechanism including the number of dynamic routing iterations as well as other parameters. Experiments show the robustness of our method by outperforming other solutions and baseline techniques, setting a new state-of-the-art. We then provide an analysis on different frequency bands and brain regions to evaluate their suitability for driver vigilance estimation. Lastly, an analysis on the role of capsule attention, multimodality, and robustness to noise is performed, highlighting the advantages of our approach.

LGAug 6, 2019
Classification of Hand Movements from EEG using a Deep Attention-based LSTM Network

Guangyi Zhang, Vandad Davoodnia, Alireza Sepas-Moghaddam et al.

Classifying limb movements using brain activity is an important task in Brain-computer Interfaces (BCI) that has been successfully used in multiple application domains, ranging from human-computer interaction to medical and biomedical applications. This paper proposes a novel solution for classification of left/right hand movement by exploiting a Long Short-Term Memory (LSTM) network with attention mechanism to learn the electroencephalogram (EEG) time-series information. To this end, a wide range of time and frequency domain features are extracted from the EEG signals and used to train an LSTM network to perform the classification task. We conduct extensive experiments with the EEG Movement dataset and show that our proposed solution achieves improvements over several benchmarks and state-of-the-art methods in both intra-subject and cross-subject validation schemes. Moreover, we utilize the proposed framework to analyze the information as received by the sensors and monitor the activated regions of the brain by tracking EEG topography throughout the experiments.
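
As a rough illustration of how an attention mechanism can summarize a sequence of LSTM hidden states, the sketch below applies a generic additive (tanh) attention pooling step. The abstract does not specify the attention form, so the scoring function, the parameter names `w` and `v`, and the function name `attention_pool` are assumptions; the hidden states are taken as given rather than produced by an actual LSTM.

```python
import numpy as np

def attention_pool(hidden, w, v):
    """Pool hidden states of shape (T, d) into one context vector of shape (d,).

    Hypothetical parameters: w has shape (d, a) and v has shape (a,).
    """
    # One unnormalized relevance score per time step.
    scores = np.tanh(hidden @ w) @ v
    # Softmax over time, shifted for numerical stability.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # Context vector: attention-weighted average of the hidden states.
    context = weights @ hidden
    return context, weights
```

The returned `weights` sum to one over the time axis, so they can be inspected to see which time steps contribute most to the pooled representation that a classifier would consume.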

LGJun 10, 2019
Errors-in-variables Modeling of Personalized Treatment-Response Trajectories

Guangyi Zhang, Reza Ashrafi, Anne Juuti et al.

Estimating the effect of a treatment on a given outcome, conditioned on a vector of covariates, is central in many applications. However, learning the impact of a treatment on a continuous temporal response, when the covariates suffer extensively from measurement error and even the timing of the treatments is uncertain, has not been addressed. We introduce a novel data-driven method that can estimate treatment-response trajectories in this challenging scenario. We model personalized treatment-response curves as a combination of parametric response functions, hierarchically sharing information across individuals, and a sparse Gaussian process for the baseline trend. Importantly, our model considers measurement error not only in treatment covariates, but also in treatment times, a problem which arises in practice for example when treatment information is based on self-reporting. In a challenging and timely problem of estimating the impact of diet on continuous blood glucose measurements, our model leads to significant improvements in estimation accuracy and prediction.