Deniz Erdogmus

LG
h-index40
59papers
2,311citations
Novelty49%
AI Score45

59 Papers

IVMay 2, 2022Code
Deep Learning Framework for Real-time Fetal Brain Segmentation in MRI

Razieh Faghihpirayesh, Davood Karimi, Deniz Erdogmus et al.

Fetal brain segmentation is an important first step for slice-level motion correction and slice-to-volume reconstruction in fetal MRI. Fast and accurate segmentation of the fetal brain on fetal MRI is required to achieve real-time fetal head pose estimation and motion tracking for slice re-acquisition and steering. To address this critical unmet need, in this work we analyzed the speed-accuracy performance of a variety of deep neural network models, and devised a symbolically small convolutional neural network that combines spatial details at high resolution with context features extracted at lower resolutions. We used multiple branches with skip connections to maintain high accuracy while devising a parallel combination of convolution and pooling operations as an input downsampling module to further reduce inference time. We trained our model as well as eight alternative, state-of-the-art networks with manually-labeled fetal brain MRI slices and tested on two sets of normal and challenging test cases. Experimental results show that our network achieved the highest accuracy and lowest inference time among all of the compared state-of-the-art real-time segmentation methods. We achieved average Dice scores of 97.99\% and 84.04\% on the normal and challenging test sets, respectively, with an inference time of 3.36 milliseconds per image on an NVIDIA GeForce RTX 2080 Ti. Code, data, and the trained models are available at https://github.com/bchimagine/real_time_fetal_brain_segmentation.

CVDec 8, 2025Code
SSplain: Sparse and Smooth Explainer for Retinopathy of Prematurity Classification

Elifnur Sunger, Tales Imbiriba, Peter Campbell et al.

Neural networks are frequently used in medical diagnosis. However, due to their black-box nature, model explainers are used to help clinicians understand better and trust model outputs. This paper introduces an explainer method for classifying Retinopathy of Prematurity (ROP) from fundus images. Previous methods fail to generate explanations that preserve input image structures such as smoothness and sparsity. We introduce Sparse and Smooth Explainer (SSplain), a method that generates pixel-wise explanations while preserving image structures by enforcing smoothness and sparsity. This results in realistic explanations to enhance the understanding of the given black-box model. To achieve this goal, we define an optimization problem with combinatorial constraints and solve it using the Alternating Direction Method of Multipliers (ADMM). Experimental results show that SSplain outperforms commonly used explainers in terms of both post-hoc accuracy and smoothness analyses. Additionally, SSplain identifies features that are consistent with domain-understandable features that clinicians consider as discriminative factors for ROP. We also show SSplain's generalization by applying it to additional publicly available datasets. Code is available at https://github.com/neu-spiral/SSplain.

SPMay 12, 2022
Neural Network-based OFDM Receiver for Resource Constrained IoT Devices

Nasim Soltani, Hai Cheng, Mauro Belgiovine et al.

Orthogonal Frequency Division Multiplexing (OFDM)-based waveforms are used for communication links in many current and emerging Internet of Things (IoT) applications, including the latest WiFi standards. For such OFDM-based transceivers, many core physical layer functions related to channel estimation, demapping, and decoding are implemented for specific choices of channel types and modulation schemes, among others. To decouple hard-wired choices from the receiver chain and thereby enhance the flexibility of IoT deployment in many novel scenarios without changing the underlying hardware, we explore a novel, modular Machine Learning (ML)-based receiver chain design. Here, ML blocks replace the individual processing blocks of an OFDM receiver, and we specifically describe this swapping for the legacy channel estimation, symbol demapping, and decoding blocks with Neural Networks (NNs). A unique aspect of this modular design is providing flexible allocation of processing functions to the legacy or ML blocks, allowing them to interchangeably coexist. Furthermore, we study the implementation cost-benefits of the proposed NNs in resource-constrained IoT devices through pruning and quantization, as well as emulation of these compressed NNs within Field Programmable Gate Arrays (FPGAs). Our evaluations demonstrate that the proposed modular NN-based receiver improves bit error rate of the traditional non-ML receiver by averagely 61% and 10% for the simulated and over-the-air datasets, respectively. We further show complexity-performance tradeoffs by presenting computational complexity comparisons between the traditional algorithms and the proposed compressed NNs.

SPJul 13, 2023
Corticomorphic Hybrid CNN-SNN Architecture for EEG-based Low-footprint Low-latency Auditory Attention Detection

Richard Gall, Deniz Kocanaogullari, Murat Akcakaya et al.

In a multi-speaker "cocktail party" scenario, a listener can selectively attend to a speaker of interest. Studies into the human auditory attention network demonstrate cortical entrainment to speech envelopes resulting in highly correlated Electroencephalography (EEG) measurements. Current trends in EEG-based auditory attention detection (AAD) using artificial neural networks (ANN) are not practical for edge-computing platforms due to longer decision windows using several EEG channels, with higher power consumption and larger memory footprint requirements. Nor are ANNs capable of accurately modeling the brain's top-down attention network since the cortical organization is complex and layer. In this paper, we propose a hybrid convolutional neural network-spiking neural network (CNN-SNN) corticomorphic architecture, inspired by the auditory cortex, which uses EEG data along with multi-speaker speech envelopes to successfully decode auditory attention with low latency down to 1 second, using only 8 EEG electrodes strategically placed close to the auditory cortex, at a significantly higher accuracy of 91.03%, compared to the state-of-the-art. Simultaneously, when compared to a traditional CNN reference model, our model uses ~15% fewer parameters at a lower bit precision resulting in ~57% memory footprint reduction. The results show great promise for edge-computing in brain-embedded devices, like smart hearing aids.

HCOct 30, 2023
Fast and Expressive Gesture Recognition using a Combination-Homomorphic Electromyogram Encoder

Niklas Smedemark-Margulies, Yunus Bicer, Elifnur Sunger et al.

We study the task of gesture recognition from electromyography (EMG), with the goal of enabling expressive human-computer interaction at high accuracy, while minimizing the time required for new subjects to provide calibration data. To fulfill these goals, we define combination gestures consisting of a direction component and a modifier component. New subjects only demonstrate the single component gestures and we seek to extrapolate from these to all possible single or combination gestures. We extrapolate to unseen combination gestures by combining the feature vectors of real single gestures to produce synthetic training data. This strategy allows us to provide a large and flexible gesture vocabulary, while not requiring new subjects to demonstrate combinatorially many example gestures. We pre-train an encoder and a combination operator using self-supervision, so that we can produce useful synthetic training data for unseen test subjects. To evaluate the proposed method, we collect a real-world EMG dataset, and measure the effect of augmented supervision against two baselines: a partially-supervised model trained with only single gesture data from the unseen subject, and a fully-supervised model trained with real single and real combination gesture data from the unseen subject. We find that the proposed method provides a dramatic improvement over the partially-supervised model, and achieves a useful classification accuracy that in some cases approaches the performance of the fully-supervised model.

SPOct 29, 2022
Recursive Estimation of User Intent from Noninvasive Electroencephalography using Discriminative Models

Niklas Smedemark-Margulies, Basak Celik, Tales Imbiriba et al.

We study the problem of inferring user intent from noninvasive electroencephalography (EEG) to restore communication for people with severe speech and physical impairments (SSPI). The focus of this work is improving the estimation of posterior symbol probabilities in a typing task. At each iteration of the typing procedure, a subset of symbols is chosen for the next query based on the current probability estimate. Evidence about the user's response is collected from event-related potentials (ERP) in order to update symbol probabilities, until one symbol exceeds a predefined confidence threshold. We provide a graphical model describing this task, and derive a recursive Bayesian update rule based on a discriminative probability over label vectors for each query, which we approximate using a neural network classifier. We evaluate the proposed method in a simulated typing task and show that it outperforms previous approaches based on generative modeling.

LGSep 29, 2024
KODA: A Data-Driven Recursive Model for Time Series Forecasting and Data Assimilation using Koopman Operators

Ashutosh Singh, Ashish Singh, Tales Imbiriba et al.

Approaches based on Koopman operators have shown great promise in forecasting time series data generated by complex nonlinear dynamical systems (NLDS). Although such approaches are able to capture the latent state representation of a NLDS, they still face difficulty in long term forecasting when applied to real world data. Specifically many real-world NLDS exhibit time-varying behavior, leading to nonstationarity that is hard to capture with such models. Furthermore they lack a systematic data-driven approach to perform data assimilation, that is, exploiting noisy measurements on the fly in the forecasting task. To alleviate the above issues, we propose a Koopman operator-based approach (named KODA - Koopman Operator with Data Assimilation) that integrates forecasting and data assimilation in NLDS. In particular we use a Fourier domain filter to disentangle the data into a physical component whose dynamics can be accurately represented by a Koopman operator, and residual dynamics that represents the local or time varying behavior that are captured by a flexible and learnable recursive model. We carefully design an architecture and training criterion that ensures this decomposition lead to stable and long-term forecasts. Moreover, we introduce a course correction strategy to perform data assimilation with new measurements at inference time. The proposed approach is completely data-driven and can be learned end-to-end. Through extensive experimental comparisons we show that KODA outperforms existing state of the art methods on multiple time series benchmarks such as electricity, temperature, weather, lorenz 63 and duffing oscillator demonstrating its superior performance and efficacy along the three tasks a) forecasting, b) data assimilation and c) state prediction.

LGOct 12, 2023
Stabilizing Subject Transfer in EEG Classification with Divergence Estimation

Niklas Smedemark-Margulies, Ye Wang, Toshiaki Koike-Akino et al.

Classification models for electroencephalogram (EEG) data show a large decrease in performance when evaluated on unseen test sub jects. We reduce this performance decrease using new regularization techniques during model training. We propose several graphical models to describe an EEG classification task. From each model, we identify statistical relationships that should hold true in an idealized training scenario (with infinite data and a globally-optimal model) but that may not hold in practice. We design regularization penalties to enforce these relationships in two stages. First, we identify suitable proxy quantities (divergences such as Mutual Information and Wasserstein-1) that can be used to measure statistical independence and dependence relationships. Second, we provide algorithms to efficiently estimate these quantities during training using secondary neural network models. We conduct extensive computational experiments using a large benchmark EEG dataset, comparing our proposed techniques with a baseline method that uses an adversarial classifier. We find our proposed methods significantly increase balanced accuracy on test subjects and decrease overfitting. The proposed methods exhibit a larger benefit over a greater range of hyperparameters than the baseline method, with only a small computational cost at training time. These benefits are largest when used for a fixed training period, though there is still a significant benefit for a subset of hyperparameters when our techniques are used in conjunction with early stopping regularization.

LGJun 29, 2023
Fast and Robust State Estimation and Tracking via Hierarchical Learning

Connor Mclaughlin, Matthew Ding, Deniz Erdogmus et al.

Fast and reliable state estimation and tracking are essential for real-time situation awareness in Cyber-Physical Systems (CPS) operating in tactical environments or complicated civilian environments. Traditional centralized solutions do not scale well whereas existing fully distributed solutions over large networks suffer slow convergence, and are vulnerable to a wide spectrum of communication failures. In this paper, we aim to speed up the convergence and enhance the resilience of state estimation and tracking for large-scale networks using a simple hierarchical system architecture. We propose two ``consensus + innovation'' algorithms, both of which rely on a novel hierarchical push-sum consensus component. We characterize their convergence rates under a linear local observation model and minimal technical assumptions. We numerically validate our algorithms through simulation studies of underwater acoustic networks and large-scale synthetic networks.

LGMar 7, 2025Code
Dependency-aware Maximum Likelihood Estimation for Active Learning

Beyza Kalkanli, Tales Imbiriba, Stratis Ioannidis et al.

Active learning aims to efficiently build a labeled training set by strategically selecting samples to query labels from annotators. In this sequential process, each sample acquisition influences subsequent selections, causing dependencies among samples in the labeled set. However, these dependencies are overlooked during the model parameter estimation stage when updating the model using Maximum Likelihood Estimation (MLE), a conventional method that assumes independent and identically distributed (i.i.d.) data. We propose Dependency-aware MLE (DMLE), which corrects MLE within the active learning framework by addressing sample dependencies typically neglected due to the i.i.d. assumption, ensuring consistency with active learning principles in the model parameter estimation process. This improved method achieves superior performance across multiple benchmark datasets, reaching higher performance in earlier cycles compared to conventional MLE. Specifically, we observe average accuracy improvements of 6%, 8.6%, and 10.5% for k=1, k=5, and k=10 respectively, after collecting the first 100 samples, where entropy is the acquisition function and k is the query batch size acquired at every active learning cycle. Our implementation is publicly available at: https://github.com/neu-spiral/DMLEforAL

LGDec 11, 2024
Learning Physics Informed Neural ODEs With Partial Measurements

Paul Ghanem, Ahmet Demirkaya, Tales Imbiriba et al.

Learning dynamics governing physical and spatiotemporal processes is a challenging problem, especially in scenarios where states are partially measured. In this work, we tackle the problem of learning dynamics governing these systems when parts of the system's states are not measured, specifically when the dynamics generating the non-measured states are unknown. Inspired by state estimation theory and Physics Informed Neural ODEs, we present a sequential optimization framework in which dynamics governing unmeasured processes can be learned. We demonstrate the performance of the proposed approach leveraging numerical simulations and a real dataset extracted from an electro-mechanical positioning system. We show how the underlying equations fit into our formalism and demonstrate the improved performance of the proposed method when compared with baselines.

SPFeb 28, 2024
Multistatic-Radar RCS-Signature Recognition of Aerial Vehicles: A Bayesian Fusion Approach

Michael Potter, Murat Akcakaya, Marius Necsoiu et al.

Radar Automated Target Recognition (RATR) for Unmanned Aerial Vehicles (UAVs) involves transmitting Electromagnetic Waves (EMWs) and performing target type recognition on the received radar echo, crucial for defense and aerospace applications. Previous studies highlighted the advantages of multistatic radar configurations over monostatic ones in RATR. However, fusion methods in multistatic radar configurations often suboptimally combine classification vectors from individual radars probabilistically. To address this, we propose a fully Bayesian RATR framework employing Optimal Bayesian Fusion (OBF) to aggregate classification probability vectors from multiple radars. OBF, based on expected 0-1 loss, updates a Recursive Bayesian Classification (RBC) posterior distribution for target UAV type, conditioned on historical observations across multiple time steps. We evaluate the approach using simulated random walk trajectories for seven drones, correlating target aspect angles to Radar Cross Section (RCS) measurements in an anechoic chamber. Comparing against single radar Automated Target Recognition (ATR) systems and suboptimal fusion methods, our empirical results demonstrate that the OBF method integrated with RBC significantly enhances classification accuracy compared to other fusion methods and single radar configurations.

LGFeb 24, 2024
Learning Semilinear Neural Operators : A Unified Recursive Framework For Prediction And Data Assimilation

Ashutosh Singh, Ricardo Augusto Borsoi, Deniz Erdogmus et al.

Recent advances in the theory of Neural Operators (NOs) have enabled fast and accurate computation of the solutions to complex systems described by partial differential equations (PDEs). Despite their great success, current NO-based solutions face important challenges when dealing with spatio-temporal PDEs over long time scales. Specifically, the current theory of NOs does not present a systematic framework to perform data assimilation and efficiently correct the evolution of PDE solutions over time based on sparsely sampled noisy measurements. In this paper, we propose a learning-based state-space approach to compute the solution operators to infinite-dimensional semilinear PDEs. Exploiting the structure of semilinear PDEs and the theory of nonlinear observers in function spaces, we develop a flexible recursive method that allows for both prediction and data assimilation by combining prediction and correction operations. The proposed framework is capable of producing fast and accurate predictions over long time horizons, dealing with irregularly sampled noisy measurements to correct the solution, and benefits from the decoupling between the spatial and temporal dynamics of this class of PDEs. We show through experiments on the Kuramoto-Sivashinsky, Navier-Stokes and Korteweg-de Vries equations that the proposed model is robust to noise and can leverage arbitrary amounts of measurements to correct its prediction over a long time horizon with little computational overhead.

LGJun 30, 2025
Exploring Theory-Laden Observations in the Brain Basis of Emotional Experience

Christiana Westlin, Ashutosh Singh, Deniz Erdogmus et al.

In the science of emotion, it is widely assumed that folk emotion categories form a biological and psychological typology, and studies are routinely designed and analyzed to identify emotion-specific patterns. This approach shapes the observations that studies report, ultimately reinforcing the assumption that guided the investigation. Here, we reanalyzed data from one such typologically-guided study that reported mappings between individual brain patterns and group-averaged ratings of 34 emotion categories. Our reanalysis was guided by an alternative view of emotion categories as populations of variable, situated instances, and which predicts a priori that there will be significant variation in brain patterns within a category across instances. Correspondingly, our analysis made minimal assumptions about the structure of the variance present in the data. As predicted, we did not observe the original mappings and instead observed significant variation across individuals. These findings demonstrate how starting assumptions can ultimately impact scientific conclusions and suggest that a hypothesis must be supported using multiple analytic methods before it is taken seriously.

LGApr 17, 2025
Recursive Deep Inverse Reinforcement Learning

Paul Ghanem, Owen Howell, Michael Potter et al.

Inferring an adversary's goals from exhibited behavior is crucial for counterplanning and non-cooperative multi-agent systems in domains like cybersecurity, military, and strategy games. Deep Inverse Reinforcement Learning (IRL) methods based on maximum entropy principles show promise in recovering adversaries' goals but are typically offline, require large batch sizes with gradient descent, and rely on first-order updates, limiting their applicability in real-time scenarios. We propose an online Recursive Deep Inverse Reinforcement Learning (RDIRL) approach to recover the cost function governing the adversary actions and goals. Specifically, we minimize an upper bound on the standard Guided Cost Learning (GCL) objective using sequential second-order Newton updates, akin to the Extended Kalman Filter (EKF), leading to a fast (in terms of convergence) learning algorithm. We demonstrate that RDIRL is able to recover cost and reward functions of expert agents in standard and adversarial benchmark tasks. Experiments on benchmark tasks show that our proposed approach outperforms several leading IRL algorithms.

LGDec 20, 2024
MarkovType: A Markov Decision Process Strategy for Non-Invasive Brain-Computer Interfaces Typing Systems

Elifnur Sunger, Yunus Bicer, Deniz Erdogmus et al.

Brain-Computer Interfaces (BCIs) help people with severe speech and motor disabilities communicate and interact with their environment using neural activity. This work focuses on the Rapid Serial Visual Presentation (RSVP) paradigm of BCIs using noninvasive electroencephalography (EEG). The RSVP typing task is a recursive task with multiple sequences, where users see only a subset of symbols in each sequence. Extensive research has been conducted to improve classification in the RSVP typing task, achieving fast classification. However, these methods struggle to achieve high accuracy and do not consider the typing mechanism in the learning procedure. They apply binary target and non-target classification without including recursive training. To improve performance in the classification of symbols while controlling the classification speed, we incorporate the typing setup into training by proposing a Partially Observable Markov Decision Process (POMDP) approach. To the best of our knowledge, this is the first work to formulate the RSVP typing task as a POMDP for recursive classification. Experiments show that the proposed approach, MarkovType, results in a more accurate typing system compared to competitors. Additionally, our experiments demonstrate that while there is a trade-off between accuracy and speed, MarkovType achieves the optimal balance between these factors compared to other methods.

LGNov 13, 2022
Inv-SENnet: Invariant Self Expression Network for clustering under biased data

Ashutosh Singh, Ashish Singh, Aria Masoomi et al.

Subspace clustering algorithms are used for understanding the cluster structure that explains the dataset well. These methods are extensively used for data-exploration tasks in various areas of Natural Sciences. However, most of these methods fail to handle unwanted biases in datasets. For datasets where a data sample represents multiple attributes, naively applying any clustering approach can result in undesired output. To this end, we propose a novel framework for jointly removing unwanted attributes (biases) while learning to cluster data points in individual subspaces. Assuming we have information about the bias, we regularize the clustering method by adversarially learning to minimize the mutual information between the data and the unwanted attributes. Our experimental result on synthetic and real-world datasets demonstrate the effectiveness of our approach.

LGDec 17, 2021
AutoTransfer: Subject Transfer Learning with Censored Representations on Biosignals Data

Niklas Smedemark-Margulies, Ye Wang, Toshiaki Koike-Akino et al.

We provide a regularization framework for subject transfer learning in which we seek to train an encoder and classifier to minimize classification loss, subject to a penalty measuring independence between the latent representation and the subject label. We introduce three notions of independence and corresponding penalty terms using mutual information or divergence as a proxy for independence. For each penalty term, we provide several concrete estimation algorithms, using analytic methods as well as neural critic functions. We provide a hands-off strategy for applying this diverse family of regularization algorithms to a new dataset, which we call "AutoTransfer". We evaluate the performance of these individual regularization strategies and our AutoTransfer method on EEG, EMG, and ECoG datasets, showing that these approaches can improve subject transfer learning for challenging real-world datasets.

NCOct 24, 2021
Variation is the Norm: Brain State Dynamics Evoked By Emotional Video Clips

Ashutosh Singh, Christiana Westlin, Hedwig Eisenbarth et al.

For the last several decades, emotion research has attempted to identify a "biomarker" or consistent pattern of brain activity to characterize a single category of emotion (e.g., fear) that will remain consistent across all instances of that category, regardless of individual and context. In this study, we investigated variation rather than consistency during emotional experiences while people watched video clips chosen to evoke instances of specific emotion categories. Specifically, we developed a sequential probabilistic approach to model the temporal dynamics in a participant's brain activity during video viewing. We characterized brain states during these clips as distinct state occupancy periods between state transitions in blood oxygen level dependent (BOLD) signal patterns. We found substantial variation in the state occupancy probability distributions across individuals watching the same video, supporting the hypothesis that when it comes to the brain correlates of emotional experience, variation may indeed be the norm.

LGOct 12, 2021
Cubature Kalman Filter Based Training of Hybrid Differential Equation Recurrent Neural Network Physiological Dynamic Models

Ahmet Demirkaya, Tales Imbiriba, Kyle Lockwood et al.

Modeling biological dynamical systems is challenging due to the interdependence of different system components, some of which are not fully understood. To fill existing gaps in our ability to mechanistically model physiological systems, we propose to combine neural networks with physics-based models. Specifically, we demonstrate how we can approximate missing ordinary differential equations (ODEs) coupled with known ODEs using Bayesian filtering techniques to train the model parameters and simultaneously estimate dynamic state variables. As a study case we leverage a well-understood model for blood circulation in the human retina and replace one of its core ODEs with a neural network approximation, representing the case where we have incomplete knowledge of the physiological state dynamics. Results demonstrate that state dynamics corresponding to the missing ODEs can be approximated well using a neural network trained using a recursive Bayesian filtering approach in a fashion coupled with the known state dynamic differential equations. This demonstrates that dynamics and impact of missing state variables can be captured through joint state estimation and model parameter estimation within a recursive Bayesian state estimation (RBSE) framework. Results also indicate that this RBSE approach to training the NN parameters yields better outcomes (measurement/state estimation accuracy) than training the neural network with backpropagation through time in the same setting.

SYOct 3, 2021
Efficient Modeling of Morphing Wing Flight Using Neural Networks and Cubature Rules

Paul Ghanem, Yunus Bicer, Deniz Erdogmus et al.

Fluidic locomotion of flapping Micro Aerial Vehicles (MAVs) can be very complex, particularly when the rules from insect flight dynamics (fast flapping dynamics and light wings) are not applicable. In these situations, widely used averaging techniques can fail quickly. The primary motivation is to find efficient models for complex forms of aerial locomotion where wings constitute a large part of body mass (i.e., dominant inertial effects) and deform in multiple directions (i.e., morphing wing). In these systems, high degrees of freedom yields complex inertial, Coriolis, and gravity terms. We use Algorithmic Differentiation (AD) and Bayesian filters computed with cubature rules conjointly to quickly estimate complex fluid-structure interactions. In general, Bayesian filters involve finding complex numerical integration (e.g., find posterior integrals). Using cubature rules to compute Gaussian-weighted integrals and AD, we show that the complex multi-degrees-of-freedom dynamics of morphing MAVs can be computed very efficiently and accurately. Therefore, our work facilitates closed-loop feedback control of these morphing MAVs.

CVJul 26, 2021
Circular-Symmetric Correlation Layer based on FFT

Bahar Azari, Deniz Erdogmus

Despite the vast success of standard planar convolutional neural networks, they are not the most efficient choice for analyzing signals that lie on an arbitrarily curved manifold, such as a cylinder. The problem arises when one performs a planar projection of these signals and inevitably causes them to be distorted or broken where there is valuable information. We propose a Circular-symmetric Correlation Layer (CCL) based on the formalism of roto-translation equivariant correlation on the continuous group $S^1 \times \mathbb{R}$, and implement it efficiently using the well-known Fast Fourier Transform (FFT) algorithm. We showcase the performance analysis of a general network equipped with CCL on various recognition and classification tasks and datasets. The PyTorch package implementation of CCL is provided online.

LGJun 16, 2021
EEG-GNN: Graph Neural Networks for Classification of Electroencephalogram (EEG) Signals

Andac Demir, Toshiaki Koike-Akino, Ye Wang et al.

Convolutional neural networks (CNN) have been frequently used to extract subject-invariant features from electroencephalogram (EEG) for classification tasks. This approach holds the underlying assumption that electrodes are equidistant analogous to pixels of an image and hence fails to explore/exploit the complex functional neural connectivity between different electrode sites. We overcome this limitation by tailoring the concepts of convolution and pooling applied to 2D grid-like inputs for the functional network of electrode sites. Furthermore, we develop various graph neural network (GNN) models that project electrodes onto the nodes of a graph, where the node features are represented as EEG channel samples collected over a trial, and nodes can be connected by weighted/unweighted edges according to a flexible policy formulated by a neuroscientist. The empirical evaluations show that our proposed GNN-based framework outperforms standard CNN classifiers across ErrP, and RSVP datasets, as well as allowing neuroscientific interpretability and explainability to deep learning methods tailored to EEG related classification problems. Another practical advantage of our GNN-based framework is that it can be used in EEG channel selection, which is critical for reducing computational cost, and designing portable EEG headsets.

MLMay 4, 2021
On the Sample Complexity of Rank Regression from Pairwise Comparisons

Berkan Kadioglu, Peng Tian, Jennifer Dy et al.

We consider a rank regression setting, in which a dataset of $N$ samples with features in $\mathbb{R}^d$ is ranked by an oracle via $M$ pairwise comparisons. Specifically, there exists a latent total ordering of the samples; when presented with a pair of samples, a noisy oracle identifies the one ranked higher with respect to the underlying total ordering. A learner observes a dataset of such comparisons and wishes to regress sample ranks from their features. We show that to learn the model parameters with $ε> 0$ accuracy, it suffices to conduct $M \in Ω(dN\log^3 N/ε^2)$ comparisons uniformly at random when $N$ is $Ω(d/ε^2)$.

LGMay 1, 2021
Stochastic Mutual Information Gradient Estimation for Dimensionality Reduction Networks

Ozan Ozdenizci, Deniz Erdogmus

Feature ranking and selection is a widely used approach in various applications of supervised dimensionality reduction in discriminative machine learning. Nevertheless there exists significant evidence on feature ranking and selection algorithms based on any criterion leading to potentially sub-optimal solutions for class separability. In that regard, we introduce emerging information theoretic feature transformation protocols as an end-to-end neural network training approach. We present a dimensionality reduction network (MMINet) training procedure based on the stochastic estimate of the mutual information gradient. The network projects high-dimensional features onto an output feature space where lower dimensional representations of features carry maximum mutual information with their associated class labels. Furthermore, we formulate the training objective to be estimated non-parametrically with no distributional assumptions. We experimentally evaluate our method with applications to high-dimensional biological data sets, and relate it to conventional feature selection algorithms to form a special case of our approach.

ROApr 26, 2021
End-to-end grasping policies for human-in-the-loop robots via deep reinforcement learning

Mohammadreza Sharif, Deniz Erdogmus, Christopher Amato et al.

State-of-the-art human-in-the-loop robot grasping is hugely suffered by Electromyography (EMG) inference robustness issues. As a workaround, researchers have been looking into integrating EMG with other signals, often in an ad hoc manner. In this paper, we are presenting a method for end-to-end training of a policy for human-in-the-loop robot grasping on real reaching trajectories. For this purpose we use Reinforcement Learning (RL) and Imitation Learning (IL) in DEXTRON (DEXTerity enviRONment), a stochastic simulation environment with real human trajectories that are augmented and selected using a Monte Carlo (MC) simulation method. We also offer a success model which once trained on the expert policy data and the RL policy roll-out transitions, can provide transparency to how the deep policy works and when it is probably going to fail.

ROApr 19, 2021
Inference of Upcoming Human Grasp Using EMG During Reach-to-Grasp Movement

Mo Han, Mehrshad Zandigohar, Sezen Yagmur Gunay et al.

Electromyography (EMG) data has been extensively adopted as an intuitive interface for instructing human-robot collaboration. A major challenge of the real-time detection of human grasp intent is the identification of dynamic EMG from hand movements. Previous studies mainly implemented steady-state EMG classification with a small number of grasp patterns on dynamic situations, which are insufficient to generate differentiated control regarding the muscular activity variation in practice. In order to better detect dynamic movements, more EMG variability could be integrated into the model. However, only limited research were concentrated on such detection of dynamic grasp motions, and most existing assessments on non-static EMG classification either require supervised ground-truth timestamps of the movement status, or only contain limited kinematic variations. In this study, we propose a framework for classifying dynamic EMG signals into gestures, and examine the impact of different movement phases, using an unsupervised method to segment and label the action transitions. We collected and utilized data from large gesture vocabularies with multiple dynamic actions to encode the transitions from one grasp intent to another based on common sequences of the grasp movements. The classifier for identifying the gesture label was constructed afterwards based on the dynamic EMG signal, with no supervised annotation of kinematic movements required. Finally, we evaluated the performances of several training strategies using EMG data from different movement phases, and explored the information revealed from each phase. All experiments were evaluated in a real-time style with the performance transitions over time presented.

ROApr 8, 2021
Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control

Mehrshad Zandigohar, Mo Han, Mohammadreza Sharif et al.

Objective: For transradial amputees, robotic prosthetic hands promise to regain the capability to perform daily living activities. Current control methods based on physiological signals such as electromyography (EMG) are prone to yielding poor inference outcomes due to motion artifacts, muscle fatigue, and many more. Vision sensors are a major source of information about the environment state and can play a vital role in inferring feasible and intended gestures. However, visual evidence is also susceptible to its own artifacts, most often due to object occlusion, lighting changes, etc. Multimodal evidence fusion using physiological and vision sensor measurements is a natural approach due to the complementary strengths of these modalities. Methods: In this paper, we present a Bayesian evidence fusion framework for grasp intent inference using eye-view video, eye-gaze, and EMG from the forearm processed by neural network models. We analyze individual and fused performance as a function of time as the hand approaches the object to grasp it. For this purpose, we have also developed novel data processing and augmentation techniques to train neural network components. Results: Our results indicate that, on average, fusion improves the instantaneous upcoming grasp type classification accuracy while in the reaching phase by 13.66% and 14.8%, relative to EMG (81.64% non-fused) and visual evidence (80.5% non-fused) individually, resulting in an overall fusion accuracy of 95.3%. Conclusion: Our experimental data analyses demonstrate that EMG and visual evidence show complementary strengths, and as a consequence, fusion of multimodal evidence can outperform each individual evidence modality at any given time.

SPFeb 17, 2021
EEG-based Texture Roughness Classification in Active Tactile Exploration with Invariant Representation Learning Networks

Ozan Ozdenizci, Safaa Eldeeb, Andac Demir et al.

During daily activities, humans use their hands to grasp surrounding objects and perceive sensory information which are also employed for perceptual and motor goals. Multiple cortical brain regions are known to be responsible for sensory recognition, perception and motor execution during sensorimotor processing. While various research studies particularly focus on the domain of human sensorimotor control, the relation and processing between motor execution and sensory processing is not yet fully understood. Main goal of our work is to discriminate textured surfaces varying in their roughness levels during active tactile exploration using simultaneously recorded electroencephalogram (EEG) data, while minimizing the variance of distinct motor exploration movement patterns. We perform an experimental study with eight healthy participants who were instructed to use the tip of their dominant hand index finger while rubbing or tapping three different textured surfaces with varying levels of roughness. We use an adversarial invariant representation learning neural network architecture that performs EEG-based classification of different textured surfaces, while simultaneously minimizing the discriminability of motor movement conditions (i.e., rub or tap). Results show that the proposed approach can discriminate between three different textured surfaces with accuracies up to 70%, while suppressing movement related variability from learned representations.

SPFeb 16, 2021
On the use of generative deep neural networks to synthesize artificial multichannel EEG signals

Ozan Ozdenizci, Deniz Erdogmus

Recent promises of generative deep learning lately brought interest to its potential uses in neural engineering. In this paper we firstly review recently emerging studies on generating artificial electroencephalography (EEG) signals with deep neural networks. Subsequently, we present our feasibility experiments on generating condition-specific multichannel EEG signals using conditional variational autoencoders. By manipulating real resting-state EEG epochs, we present an approach to synthetically generate time-series multichannel signals that show spectro-temporal EEG patterns which are expected to be observed during distinct motor imagery conditions.

LGJan 13, 2021
NetCut: Real-Time DNN Inference Using Layer Removal

Mehrshad Zandigohar, Deniz Erdogmus, Gunar Schirner

Deep Learning plays a significant role in assisting humans in many aspects of their lives. As these networks tend to get deeper over time, they extract more features to increase accuracy at the cost of additional inference latency. This accuracy-performance trade-off makes it more challenging for Embedded Systems, as resource-constrained processors with strict deadlines, to deploy them efficiently. This can lead to selection of networks that can prematurely meet a specified deadline with excess slack time that could have potentially contributed to increased accuracy. In this work, we propose: (i) the concept of layer removal as a means of constructing TRimmed Networks (TRNs) that are based on removing problem-specific features of a pretrained network used in transfer learning, and (ii) NetCut, a methodology based on an empirical or an analytical latency estimator, which only proposes and retrains TRNs that can meet the application's deadline, hence reducing the exploration time significantly. We demonstrate that TRNs can expand the Pareto frontier that trades off latency and accuracy to provide networks that can meet arbitrary deadlines with potential accuracy improvement over off-the-shelf networks. Our experimental results show that such utilization of TRNs, while transferring to a simpler dataset, in combination with NetCut, can lead to the proposal of networks that can achieve relative accuracy improvement of up to 10.43% among existing off-the-shelf neural architectures while meeting a specific deadline, and 27x speedup in exploration time.

LGJan 13, 2021
Towards Creating a Deployable Grasp Type Probability Estimator for a Prosthetic Hand

Mehrshad Zandigohar, Mo Han, Deniz Erdogmus et al.

For lower arm amputees, prosthetic hands promise to restore most of physical interaction capabilities. This requires to accurately predict hand gestures capable of grabbing varying objects and execute them timely as intended by the user. Current approaches often rely on physiological signal inputs such as Electromyography (EMG) signal from residual limb muscles to infer the intended motion. However, limited signal quality, user diversity and high variability adversely affect the system robustness. Instead of solely relying on EMG signals, our work enables augmenting EMG intent inference with physical state probability through machine learning and computer vision method. To this end, we: (1) study state-of-the-art deep neural network architectures to select a performant source of knowledge transfer for the prosthetic hand, (2) use a dataset containing object images and probability distribution of grasp types as a new form of labeling where instead of using absolute values of zero and one as the conventional classification labels, our labels are a set of probabilities whose sum is 1. The proposed method generates probabilistic predictions which could be fused with EMG prediction of probabilities over grasps by using the visual information from the palm camera of a prosthetic hand. Our results demonstrate that InceptionV3 achieves highest accuracy with 0.95 angular similarity followed by 1.4 MobileNetV2 with 0.93 at ~20% the amount of operations.

SPSep 28, 2020
Universal Physiological Representation Learning with Soft-Disentangled Rateless Autoencoders

Mo Han, Ozan Ozdenizci, Toshiaki Koike-Akino et al.

Human computer interaction (HCI) involves a multidisciplinary fusion of technologies, through which the control of external devices could be achieved by monitoring physiological status of users. However, physiological biosignals often vary across users and recording sessions due to unstable physical/mental conditions and task-irrelevant activities. To deal with this challenge, we propose a method of adversarial feature encoding with the concept of a Rateless Autoencoder (RAE), in order to exploit disentangled, nuisance-robust, and universal representations. We achieve a good trade-off between user-specific and task-relevant features by making use of the stochastic disentanglement of the latent representations by adopting additional adversarial networks. The proposed model is applicable to a wider range of unknown users and tasks as well as different classifiers. Results on cross-subject transfer evaluations show the advantages of the proposed framework, with up to an 11.6% improvement in the average subject-transfer classification accuracy.

LGAug 26, 2020
Disentangled Adversarial Autoencoder for Subject-Invariant Physiological Feature Extraction

Mo Han, Ozan Ozdenizci, Ye Wang et al.

Recent developments in biosignal processing have enabled users to exploit their physiological status for manipulating devices in a reliable and safe manner. One major challenge of physiological sensing lies in the variability of biosignals across different users and tasks. To address this issue, we propose an adversarial feature extractor for transfer learning to exploit disentangled universal representations. We consider the trade-off between task-relevant features and user-discriminative information by introducing additional adversary and nuisance networks in order to manipulate the latent representations such that the learned feature extractor is applicable to unknown users and various tasks. Results on cross-subject transfer evaluations exhibit the benefits of the proposed framework, with up to 8.8% improvement in average accuracy of classification, and demonstrate adaptability to a broader range of subjects.

LGJul 30, 2020
Stopping Criterion Design for Recursive Bayesian Classification: Analysis and Decision Geometry

Aziz Kocanaogullari, Murat Akcakaya, Deniz Erdogmus

Systems that are based on recursive Bayesian updates for classification limit the cost of evidence collection through certain stopping/termination criteria and accordingly enforce decision making. Conventionally, two termination criteria based on pre-defined thresholds over (i) the maximum of the state posterior distribution; and (ii) the state posterior uncertainty are commonly used. In this paper, we propose a geometric interpretation over the state posterior progression and accordingly we provide a point-by-point analysis over the disadvantages of using such conventional termination criteria. For example, through the proposed geometric interpretation we show that confidence thresholds defined over maximum of the state posteriors suffer from stiffness that results in unnecessary evidence collection whereas uncertainty based thresholding methods are fragile to number of categories and terminate prematurely if some state candidates are already discovered to be unfavorable. Moreover, both types of termination methods neglect the evolution of posterior updates. We then propose a new stopping/termination criterion with a geometrical insight to overcome the limitations of these conventional methods and provide a comparison in terms of decision accuracy and speed. We validate our claims using simulations and using real experimental data obtained through a brain computer interfaced typing system.

LGJul 2, 2020
AutoBayes: Automated Bayesian Graph Exploration for Nuisance-Robust Inference

Andac Demir, Toshiaki Koike-Akino, Ye Wang et al.

Learning data representations that capture task-related features, but are invariant to nuisance variations remains a key challenge in machine learning. We introduce an automated Bayesian inference framework, called AutoBayes, that explores different graphical models linking classifier, encoder, decoder, estimator and adversarial network blocks to optimize nuisance-invariant machine learning pipelines. AutoBayes also enables learning disentangled representations, where the latent variable is split into multiple pieces to impose various relationships with the nuisance variation and task labels. We benchmark the framework on several public datasets, and provide analysis of its capability for subject-transfer learning with/without variational modeling and adversarial training. We demonstrate a significant performance improvement with ensemble learning across explored graphical models.

SPApr 15, 2020
Disentangled Adversarial Transfer Learning for Physiological Biosignals

Mo Han, Ozan Ozdenizci, Ye Wang et al.

Recent developments in wearable sensors demonstrate promising results for monitoring physiological status in effective and comfortable ways. One major challenge of physiological status assessment is the problem of transfer learning caused by the domain inconsistency of biosignals across users or different recording sessions from the same user. We propose an adversarial inference approach for transfer learning to extract disentangled nuisance-robust representations from physiological biosignal data in stress status level assessment. We exploit the trade-off between task-related features and person-discriminative information by using both an adversary network and a nuisance network to jointly manipulate and disentangle the learned latent representations by the encoder, which are then input to a discriminative classifier. Results on cross-subjects transfer evaluations demonstrate the benefits of the proposed adversarial framework, and thus show its capabilities to adapt to a broader range of subjects. Finally we highlight that our proposed adversarial transfer learning approach is also applicable to other deep feature learning frameworks.

LGApr 7, 2020
Active recursive Bayesian inference using Rényi information measures

Yeganeh M. Marghi, Aziz Kocanaogullari, Murat Akcakaya et al.

Recursive Bayesian inference (RBI) provides optimal Bayesian latent variable estimates in real-time settings with streaming noisy observations. Active RBI attempts to effectively select queries that lead to more informative observations to rapidly reduce uncertainty until a confident decision is made. However, typically the optimality objectives of inference and query mechanisms are not jointly selected. Furthermore, conventional active querying methods stagger due to misleading prior information. Motivated by information theoretic approaches, we propose an active RBI framework with unified inference and query selection steps through Renyi entropy and $α$-divergence. We also propose a new objective based on Renyi entropy and its changes called Momentum that encourages exploration for misleading prior cases. The proposed active RBI framework is applied to the trajectory of the posterior changes in the probability simplex that provides a coordinated active querying and decision making with specified confidence. Under certain assumptions, we analytically demonstrate that the proposed approach outperforms conventional methods such as mutual information by allowing the selections of unlikely events. We present empirical and experimental performance evaluations on two applications: restaurant recommendation and brain-computer interface (BCI) typing systems.

HCOct 2, 2019
User-Adaptive Text Entry for Augmentative and Alternative Communication

Matt Higger, Fernando Quivira, Deniz Erdogmus

The viability of an Augmentative and Alternative Communication device often depends on its ability to adapt to an individual user's unique abilities. Though human input can be noisy, there is often structure to our errors. For example, keyboard keys adjacent to a target may be more likely to be pressed in error. Furthermore, there can be structure in the input message itself (e.g. `u' is likely to follow `q'). In a previous work, `Recursive Bayesian Coding for BCIs' (IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2016), a query strategy considers these structures to offer an error-adaptive single-character text entry scheme. However, constraining ourselves to single-character entry limits performance. A single user input may be able to resolve more uncertainty than the next character has. In this work, we extend the previous framework to incorporate multi-character querying similar to word completion. During simulated spelling, our method requires $20\%$ fewer queries compared to single-character querying with no accuracy penalty. Most significantly, we show that this multi-character querying scheme converges to the information theoretic capacity of the discrete, memoryless user input model.

HCJul 22, 2019
Adversarial Feature Learning in Brain Interfacing: An Experimental Study on Eliminating Drowsiness Effects

Ozan Ozdenizci, Barry Oken, Tab Memmott et al.

Across- and within-recording variabilities in electroencephalographic (EEG) activity is a major limitation in EEG-based brain-computer interfaces (BCIs). Specifically, gradual changes in fatigue and vigilance levels during long EEG recording durations and BCI system usage bring along significant fluctuations in BCI performances even when these systems are calibrated daily. We address this in an experimental offline study from EEG-based BCI speller usage data acquired for one hour duration. As the main part of our methodological approach, we propose the concept of adversarial invariant feature learning for BCIs as a regularization approach on recently expanding EEG deep learning architectures, to learn nuisance-invariant discriminative features. We empirically demonstrate the feasibility of adversarial feature learning on eliminating drowsiness effects from event related EEG activity features, by using temporal recording block ordering as the source of drowsiness variability.

LGMar 28, 2019
Information Theoretic Feature Transformation Learning for Brain Interfaces

Ozan Ozdenizci, Deniz Erdogmus

Objective: A variety of pattern analysis techniques for model training in brain interfaces exploit neural feature dimensionality reduction based on feature ranking and selection heuristics. In the light of broad evidence demonstrating the potential sub-optimality of ranking based feature selection by any criterion, we propose to extend this focus with an information theoretic learning driven feature transformation concept. Methods: We present a maximum mutual information linear transformation (MMI-LinT), and a nonlinear transformation (MMI-NonLinT) framework derived by a general definition of the feature transformation learning problem. Empirical assessments are performed based on electroencephalographic (EEG) data recorded during a four class motor imagery brain-computer interface (BCI) task. Exploiting state-of-the-art methods for initial feature vector construction, we compare the proposed approaches with conventional feature selection based dimensionality reduction techniques which are widely used in brain interfaces. Furthermore, for the multi-class problem, we present and exploit a hierarchical graphical model based BCI decoding system. Results: Both binary and multi-class decoding analyses demonstrate significantly better performances with the proposed methods. Conclusion: Information theoretic feature transformations are capable of tackling potential confounders of conventional approaches in various settings. Significance: We argue that this concept provides significant insights to extend the focus on feature selection heuristics to a broader definition of feature transformation learning in brain interfaces.

LGMar 27, 2019
Adversarial Deep Learning in EEG Biometrics

Ozan Ozdenizci, Ye Wang, Toshiaki Koike-Akino et al.

Deep learning methods for person identification based on electroencephalographic (EEG) brain activity encounters the problem of exploiting the temporally correlated structures or recording session specific variability within EEG. Furthermore, recent methods have mostly trained and evaluated based on single session EEG data. We address this problem from an invariant representation learning perspective. We propose an adversarial inference approach to extend such deep learning models to learn session-invariant person-discriminative representations that can provide robustness in terms of longitudinal usability. Using adversarial learning within a deep convolutional network, we empirically assess and show improvements with our approach based on longitudinally collected EEG data for person identification from half-second EEG epochs.

DSJan 18, 2019
Accelerated Experimental Design for Pairwise Comparisons

Yuan Guo, Jennifer Dy, Deniz Erdogmus et al.

Pairwise comparison labels are more informative and less variable than class labels, but generating them poses a challenge: their number grows quadratically in the dataset size. We study a natural experimental design objective, namely, D-optimality, that can be used to identify which $K$ pairwise comparisons to generate. This objective is known to perform well in practice, and is submodular, making the selection approximable via the greedy algorithm. A naïve greedy implementation has $O(N^2d^2K)$ complexity, where $N$ is the dataset size, $d$ is the feature space dimension, and $K$ is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset--namely, that it consists of pairwise comparisons--the greedy algorithm's complexity can be reduced to $O(N^2(K+d)+N(dK+d^2) +d^2K).$ We apply the same acceleration also to the so-called lazy greedy algorithm. When combined, the above improvements lead to an execution time of less than 1 hour for a dataset with $10^8$ comparisons; the naïve greedy algorithm on the same dataset would require more than 10 days to terminate.

LGDec 17, 2018
Transfer Learning in Brain-Computer Interfaces with Adversarial Variational Autoencoders

Ozan Ozdenizci, Ye Wang, Toshiaki Koike-Akino et al.

We introduce adversarial neural networks for representation learning as a novel approach to transfer learning in brain-computer interfaces (BCIs). The proposed approach aims to learn subject-invariant representations by simultaneously training a conditional variational autoencoder (cVAE) and an adversarial network. We use shallow convolutional architectures to realize the cVAE, and the learned encoder is transferred to extract subject-invariant features from unseen BCI users' data for decoding. We demonstrate a proof-of-concept of our approach based on analyses of electroencephalographic (EEG) data recorded during a motor imagery BCI experiment.

HCSep 15, 2018
Time-Series Prediction of Proximal Aggression Onset in Minimally-Verbal Youth with Autism Spectrum Disorder Using Physiological Biosignals

Ozan Ozdenizci, Catalina Cumpanasoiu, Carla Mazefsky et al.

It has been suggested that changes in physiological arousal precede potentially dangerous aggressive behavior in youth with autism spectrum disorder (ASD) who are minimally verbal (MV-ASD). The current work tests this hypothesis through time-series analyses on biosignals acquired prior to proximal aggression onset. We implement ridge-regularized logistic regression models on physiological biosensor data wirelessly recorded from 15 MV-ASD youth over 64 independent naturalistic observations in a hospital inpatient unit. Our results demonstrate proof-of-concept, feasibility, and incipient validity predicting aggression onset 1 minute before it occurs using global, person-dependent, and hybrid classifier models.

HCSep 15, 2018
Hierarchical Graphical Models for Context-Aware Hybrid Brain-Machine Interfaces

Ozan Ozdenizci, Sezen Yagmur Gunay, Fernando Quivira et al.

We present a novel hierarchical graphical model based context-aware hybrid brain-machine interface (hBMI) using probabilistic fusion of electroencephalographic (EEG) and electromyographic (EMG) activities. Based on experimental data collected during stationary executions and subsequent imageries of five different hand gestures with both limbs, we demonstrate feasibility of the proposed hBMI system through within session and online across sessions classification analyses. Furthermore, we investigate the context-aware extent of the model by a simulated probabilistic approach and highlight potential implications of our work in the field of neurophysiologically-driven robotic hand prosthetics.

LGAug 5, 2018
Structured Adversarial Attack: Towards General Implementation and Better Interpretability

Kaidi Xu, Sijia Liu, Pu Zhao et al.

When generating adversarial examples to attack deep neural networks (DNNs), Lp norm of the added perturbation is usually used to measure the similarity between original image and adversarial example. However, such adversarial attacks perturbing the raw input spaces may fail to capture structural information hidden in the input. This work develops a more general attack model, i.e., the structured attack (StrAttack), which explores group sparsity in adversarial perturbations by sliding a mask through images aiming for extracting key spatial structures. An ADMM (alternating direction method of multipliers)-based framework is proposed that can split the original problem into a sequence of analytically solvable subproblems and can be generalized to implement other attacking methods. Strong group sparsity is achieved in adversarial perturbations even with the same level of Lp norm distortion as the state-of-the-art attacks. We demonstrate the effectiveness of StrAttack by extensive experimental results onMNIST, CIFAR-10, and ImageNet. We also show that StrAttack provides better interpretability (i.e., better correspondence with discriminative image regions)through adversarial saliency map (Papernot et al., 2016b) and class activation map(Zhou et al., 2016).

LGMay 21, 2018
Invariant Representations from Adversarially Censored Autoencoders

Ye Wang, Toshiaki Koike-Akino, Deniz Erdogmus

We combine conditional variational autoencoders (VAE) with adversarial censoring in order to learn invariant representations that are disentangled from nuisance/sensitive variations. In this method, an adversarial network attempts to recover the nuisance variable from the representation, which the VAE is trained to prevent. Conditioning the decoder on the nuisance variable enables clean separation of the representation, since they are recombined for model learning and data reconstruction. We show this natural approach is theoretically well-founded with information-theoretic arguments. Experiments demonstrate that this method achieves invariance while preserving model learning performance, and results in visually improved performance for style transfer and generative sampling tasks.

CVMar 28, 2018
Asymmetric Loss Functions and Deep Densely Connected Networks for Highly Imbalanced Medical Image Segmentation: Application to Multiple Sclerosis Lesion Detection

Seyed Raein Hashemi, Seyed Sadegh Mohseni Salehi, Deniz Erdogmus et al.

Fully convolutional deep neural networks have been asserted to be fast and precise frameworks with great potential in image segmentation. One of the major challenges in training such networks raises when data is unbalanced, which is common in many medical imaging applications such as lesion segmentation where lesion class voxels are often much lower in numbers than non-lesion voxels. A trained network with unbalanced data may make predictions with high precision and low recall, being severely biased towards the non-lesion class which is particularly undesired in most medical applications where FNs are more important than FPs. Various methods have been proposed to address this problem, more recently similarity loss functions and focal loss. In this work we trained fully convolutional deep neural networks using an asymmetric similarity loss function to mitigate the issue of data imbalance and achieve much better tradeoff between precision and recall. To this end, we developed a 3D FC-DenseNet with large overlapping image patches as input and an asymmetric similarity loss layer based on Tversky index (using Fbeta scores). We used large overlapping image patches as inputs for intrinsic and extrinsic data augmentation, a patch selection algorithm, and a patch prediction fusion strategy using B-spline weighted soft voting to account for the uncertainty of prediction in patch borders. We applied this method to MS lesion segmentation based on two different datasets of MSSEG and ISBI longitudinal MS lesion segmentation challenge, where we achieved top performance in both challenges. Our network trained with focal loss ranked first according to the ISBI challenge overall score and resulted in the lowest reported lesion false positive rate among all submitted methods. Our network trained with the asymmetric similarity loss led to the lowest surface distance and the best lesion true positive rate.

CVMar 15, 2018
Real-time Deep Pose Estimation with Geodesic Loss for Image-to-Template Rigid Registration

Seyed Sadegh Mohseni Salehi, Shadab Khan, Deniz Erdogmus et al.

With an aim to increase the capture range and accelerate the performance of state-of-the-art inter-subject and subject-to-template 3D registration, we propose deep learning-based methods that are trained to find the 3D position of arbitrarily oriented subjects or anatomy based on slices or volumes of medical images. For this, we propose regression CNNs that learn to predict the angle-axis representation of 3D rotations and translations using image features. We use and compare mean square error and geodesic loss to train regression CNNs for 3D pose estimation used in two different scenarios: slice-to-volume registration and volume-to-volume registration. Our results show that in such registration applications that are amendable to learning, the proposed deep learning methods with geodesic loss minimization can achieve accurate results with a wide capture range in real-time (<100ms). We also tested the generalization capability of the trained CNNs on an expanded age range and on images of newborn subjects with similar and different MR image contrasts. We trained our models on T2-weighted fetal brain MRI scans and used them to predict the 3D pose of newborn brains based on T1-weighted MRI scans. We showed that the trained models generalized well for the new domain when we performed image contrast transfer through a conditional generative adversarial network. This indicates that the domain of application of the trained deep regression CNNs can be further expanded to image modalities and contrasts other than those used in training. A combination of our proposed methods with accelerated optimization-based registration algorithms can dramatically enhance the performance of automatic imaging devices and image processing methods of the future.