SD · Mar 16, 2023
Improving Perceptual Quality, Intelligibility, and Acoustics on VoIP Platforms
Joseph Konan, Ojas Bhargave, Shikhar Agnihotri et al. · cmu
In this paper, we present a method for fine-tuning models trained on the Deep Noise Suppression (DNS) 2020 Challenge to improve their performance on Voice over Internet Protocol (VoIP) applications. Our approach involves adapting the DNS 2020 models to the specific acoustic characteristics of VoIP communications, which include distortion and artifacts caused by compression, transmission, and platform-specific processing. To this end, we propose a multi-task learning framework for VoIP-DNS that jointly optimizes noise suppression and VoIP-specific acoustics for speech enhancement. We evaluate our approach on diverse VoIP scenarios and show that it outperforms both industry performance and state-of-the-art methods for speech enhancement on VoIP applications. Our results demonstrate that models trained on DNS-2020 can be improved and tailored to different VoIP platforms using VoIP-DNS, with important applications in areas such as speech recognition, voice assistants, and telecommunication.
AO-PH · Jan 15, 2009
Data Assimilation for Wildland Fires: Ensemble Kalman filters in coupled atmosphere-surface models
Jan Mandel, Jonathan D. Beezley, Janice L. Coen et al.
Two wildland fire models are described, one based on reaction-diffusion-convection partial differential equations, and one based on semi-empirical fire spread by the level set method. The level set method model is coupled with the Weather Research and Forecasting (WRF) atmospheric model. The regularized and the morphing ensemble Kalman filter are used for data assimilation.
NA · Jan 15, 2008
A wildland fire model with data assimilation
Jan Mandel, Lynn S. Bennethum, Jonathan D. Beezley et al.
A wildfire model is formulated based on balance equations for energy and fuel, where the fuel loss due to combustion corresponds to the fuel reaction rate. The resulting coupled partial differential equations have coefficients that can be approximated from prior measurements of wildfires. An ensemble Kalman filter technique with regularization is then used to assimilate temperatures measured at selected points into running wildfire simulations. The assimilation technique is able to modify the simulations to track the measurements correctly even if the simulations were started with an erroneous ignition location that is quite far away from the correct one.
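As a rough illustration of how an ensemble Kalman filter pulls a simulation toward point measurements, the sketch below implements a plain stochastic (perturbed-observation) analysis step in NumPy. It is not the regularized variant described in the abstract; the function name `enkf_update`, the linear observation operator `H`, and the scalar observation error `obs_std` are all assumptions made for the example.

```python
import numpy as np

def enkf_update(ensemble, obs, H, obs_std, rng):
    """One stochastic EnKF analysis step (illustrative, no regularization).

    ensemble: (n_state, n_members) array of forecast states
    obs:      (n_obs,) measurement vector, e.g. temperatures at sensors
    H:        (n_obs, n_state) linear observation operator (assumed)
    obs_std:  observation error standard deviation (assumed scalar)
    rng:      numpy random Generator used to perturb the observations
    """
    n_members = ensemble.shape[1]
    mean = ensemble.mean(axis=1, keepdims=True)
    A = ensemble - mean                          # state anomalies
    HA = H @ A                                   # anomalies in observation space
    P_hh = HA @ HA.T / (n_members - 1)           # sample estimate of H P H^T
    P_xh = A @ HA.T / (n_members - 1)            # sample estimate of P H^T
    R = obs_std ** 2 * np.eye(len(obs))          # observation error covariance
    K = P_xh @ np.linalg.inv(P_hh + R)           # Kalman gain
    # Each member assimilates its own noisy copy of the measurement.
    noise = obs_std * rng.standard_normal((len(obs), n_members))
    perturbed = obs[:, None] + noise
    return ensemble + K @ (perturbed - H @ ensemble)
```

With a forecast ensemble of temperatures at a few grid points and a single sensor, calling `enkf_update` should move the ensemble mean at the sensor location toward the measured value, with the size of the shift governed by the ratio of ensemble spread to `obs_std`.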
LG · Jul 1, 2023
Re-Think and Re-Design Graph Neural Networks in Spaces of Continuous Graph Diffusion Functionals
Tingting Dan, Jiaqi Ding, Ziquan Wei et al.
Graph neural networks (GNNs) are widely used in domains like social networks and biological systems. However, the locality assumption of GNNs, which limits information exchange to neighboring nodes, hampers their ability to capture long-range dependencies and global patterns in graphs. To address this, we propose a new inductive bias based on variational analysis, drawing inspiration from the Brachistochrone problem. Our framework establishes a mapping between discrete GNN models and continuous diffusion functionals. This enables the design of application-specific objective functions in the continuous domain and the construction of discrete deep models with mathematical guarantees. To tackle over-smoothing in GNNs, we analyze the existing layer-by-layer graph embedding models and identify that they are equivalent to l2-norm integral functionals of graph gradients, which cause over-smoothing. Similar to edge-preserving filters in image denoising, we introduce total variation (TV) to align the graph diffusion pattern with global community topologies. Additionally, we devise a selective mechanism to address the trade-off between model depth and over-smoothing, which can be easily integrated into existing GNNs. Furthermore, we propose a novel generative adversarial network (GAN) that predicts spreading flows in graphs through a neural transport equation. To mitigate vanishing flows, we customize the objective function to minimize transportation within each community while maximizing inter-community flows. Our GNN models achieve state-of-the-art (SOTA) performance on popular graph learning benchmarks such as Cora, Citeseer, and Pubmed.
CV · May 27
Anomaly as Non-Conformity via Training-Free Graph Laplacian Energy Minimization
Jungwook Seo, Minjeong Kim, Younkwan Lee et al.
Detecting subtle visual anomalies in images remains challenging, particularly when only normal samples are available a priori. Such unsupervised anomaly detection is typically solved by measuring feature similarity of a query patch to a memory of normal patches. However, similarity alone does not reveal how strongly a query patch violates the structure of the normal feature manifold. We propose a training-free Laplacian graph energy optimization formulation, named ANoCo, that scores Anomaly by the cost of Non-Conformity of a query patch to align with a fixed normal manifold. For each query patch, we construct a bipartite query-to-normal graph weighted by cosine affinity, explicitly removing query-query and normal-normal edges to prevent evidence dilution. We formulate anomaly scoring as a convex Laplacian energy with anchored normal nodes, and solve it in closed form. In particular, we do not use the optimized features themselves: the anomaly score is the magnitude of the update required to satisfy normality constraints, reframing the graph Laplacian as a non-conformity operator rather than a smoothing prior. The proposed method introduces no learnable parameters, message passing, or sampling, and has complexity comparable to a single linear solve. Across standard benchmarks, it delivers strong image-level AUROC, stable localization maps, and improved robustness over prior methods, demonstrating the effectiveness of using optimization-induced feature drift as an anomaly measure.
CV · Mar 28, 2023
Enhancing Breast Cancer Risk Prediction by Incorporating Prior Images
Hyeonsoo Lee, Junha Kim, Eunkyung Park et al.
Recently, deep learning models have shown the potential to predict breast cancer risk and enable targeted screening strategies, but current models do not consider the change in the breast over time. In this paper, we present a new method, PRIME+, for breast cancer risk prediction that leverages prior mammograms using a transformer decoder, outperforming a state-of-the-art risk prediction method that only uses mammograms from a single time point. We validate our approach on a dataset with 16,113 exams and further demonstrate that it effectively captures patterns of changes from prior mammograms, such as changes in breast density, resulting in improved short-term and long-term breast cancer risk prediction. Experimental results show that our model achieves a statistically significant improvement in performance over the state-of-the-art based model, with a C-index increase from 0.68 to 0.73 (p < 0.05) on held-out test sets.
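For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index: the fraction of comparable patient pairs in which the patient who developed the event earlier was assigned the higher risk. The function name and the input layout are assumptions for illustration; the abstract does not specify how the authors computed the metric.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times:  follow-up time per subject
    events: 1 if the event was observed, 0 if censored
    risks:  predicted risk score per subject (higher = riskier)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported change from 0.68 to 0.73 in context.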
CV · Sep 30, 2022
Embedded System Performance Analysis for Implementing a Portable Drowsiness Detection System for Drivers
Minjeong Kim, Jimin Koo
Drowsiness on the road is a widespread problem with fatal consequences; thus, a multitude of systems and techniques have been proposed. Among existing methods, Ghoddoosian et al. utilized temporal blinking patterns to detect early signs of drowsiness, but their algorithm was tested only on a powerful desktop computer, which is not practical to apply in a moving vehicle setting. In this paper, we propose an efficient platform to run Ghoddoosian et al.'s algorithm, detail the performance tests we ran to determine this platform, and explain our threshold optimization logic. After considering the Jetson Nano and Beelink (Mini PC), we concluded that the Mini PC is the most efficient and practical to run our embedded system in a vehicle. To determine this, we ran communication speed tests and evaluated total processing times for inference operations. Based on our experiments, the average total processing time to run the drowsiness detection model was 94.27 ms for Jetson Nano and 22.73 ms for the Beelink (Mini PC). Considering the portability and power efficiency of each device, along with the processing time results, the Beelink (Mini PC) was determined to be most suitable. Also, we propose a threshold optimization algorithm, which determines whether the driver is drowsy or alert based on the trade-off between the sensitivity and specificity of the drowsiness detection model. Our study will serve as a crucial next step for drowsiness detection research and its application in vehicles. Through our experiment, we have determined a favorable platform that can run drowsiness detection algorithms in real-time and can be used as a foundation to further advance drowsiness detection research. In doing so, we have bridged the gap between an existing embedded system and its actual implementation in vehicles to bring drowsiness technology a step closer to prevalent real-life implementation.
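The abstract does not say how the threshold is chosen beyond trading off sensitivity against specificity. One common way to do that, shown here purely as an assumed stand-in, is to pick the cut-off that maximizes Youden's J statistic (sensitivity + specificity - 1). The function name and the convention that a score at or above the threshold means "drowsy" are hypothetical.

```python
def best_threshold(scores, labels):
    """Pick the score cut-off that maximizes Youden's J (assumed criterion).

    scores: model outputs, higher meaning more likely drowsy
    labels: 1 for drowsy, 0 for alert
    Returns (threshold, sensitivity, specificity).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for t in sorted(set(scores)):
        # Predict "drowsy" when the score is at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]
```

In a vehicle setting one might weight missed drowsiness more heavily than false alarms; that would replace the symmetric J statistic with a weighted sum, which is a design choice the abstract leaves open.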
HC · Sep 27, 2024
Building Trust Through Voice: How Vocal Tone Impacts User Perception of Attractiveness of Voice Assistants
Sabid Bin Habib Pias, Alicia Freel, Ran Huang et al.
Voice Assistants (VAs) are popular for simple tasks, but users are often hesitant to use them for complex activities like online shopping. We explored whether vocal characteristics, such as the VA's vocal tone, can make VAs be perceived as more attractive and trustworthy to users for complex tasks. Our findings show that the tone of the VA voice significantly impacts its perceived attractiveness and trustworthiness. Participants in our experiment were more likely to be attracted to VAs with positive or neutral tones and ultimately trusted the VAs they found more attractive. We conclude that a VA's perceived trustworthiness can be enhanced through thoughtful voice design, incorporating a variety of vocal tones.
SD · Feb 2, 2023
Speech Enhancement for Virtual Meetings on Cellular Networks
Hojeong Lee, Minseon Gwak, Kawon Lee et al.
We study speech enhancement using deep learning (DL) for virtual meetings on cellular devices, where transmitted speech has background noise and transmission loss that affect speech quality. Since the Deep Noise Suppression (DNS) Challenge dataset does not contain practical disturbances, we collect a transmitted DNS (t-DNS) dataset using Zoom Meetings over the T-Mobile network. We select two baseline models: Demucs and FullSubNet. Demucs is an end-to-end model that takes time-domain inputs and outputs time-domain denoised speech, and FullSubNet takes time-frequency-domain inputs and outputs the energy ratio of the target speech in the inputs. The goal of this project is to enhance the speech transmitted over cellular networks using deep learning models.
NC · Jun 5, 2020 · Code
Neuropsychiatric Disease Classification Using Functional Connectomics -- Results of the Connectomics in NeuroImaging Transfer Learning Challenge
Markus D. Schirmer, Archana Venkataraman, Islem Rekik et al.
Large, open-source consortium datasets have spurred the development of new and increasingly powerful machine learning approaches in brain connectomics. However, one key question remains: are we capturing biologically relevant and generalizable information about the brain, or are we simply overfitting to the data? To answer this, we organized a scientific challenge, the Connectomics in NeuroImaging Transfer Learning Challenge (CNI-TLC), held in conjunction with MICCAI 2019. CNI-TLC included two classification tasks: (1) diagnosis of Attention-Deficit/Hyperactivity Disorder (ADHD) within a pre-adolescent cohort; and (2) transference of the ADHD model to a related cohort of Autism Spectrum Disorder (ASD) patients with an ADHD comorbidity. In total, 240 resting-state fMRI time series averaged according to three standard parcellation atlases, along with clinical diagnosis, were released for training and validation (120 neurotypical controls and 120 ADHD). We also provided demographic information of age, sex, IQ, and handedness. A second set of 100 subjects (50 neurotypical controls, 25 ADHD, and 25 ASD with ADHD comorbidity) was used for testing. Models were submitted in a standardized format as Docker images through ChRIS, an open-source image analysis platform. Utilizing an inclusive approach, we ranked the methods based on 16 different metrics. The final rank was calculated using the rank product for each participant across all measures. Furthermore, we assessed the calibration curves of each method. Five participants submitted their model for evaluation, with one outperforming all other methods in both ADHD and ASD classification. However, further improvements are needed to reach the clinical translation of functional connectomics. We are keeping the CNI-TLC open as a publicly available resource for developing and validating new classification methodologies in the field of connectomics.
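The rank-product aggregation mentioned above can be sketched as follows: rank the participants on each metric, then combine each participant's ranks multiplicatively (here as a geometric mean, so the result stays on the rank scale). The sketch assumes every metric is "higher is better" and resolves ties by giving tied participants the same best rank; the challenge's actual conventions are not stated in the abstract.

```python
import math

def rank_product(scores_by_metric):
    """Geometric mean of per-metric ranks for each participant (sketch).

    scores_by_metric: list of lists; scores_by_metric[k][p] is participant
    p's score on metric k, with higher assumed to be better.
    Returns one value per participant; lower means better overall.
    """
    n_participants = len(scores_by_metric[0])
    log_sums = [0.0] * n_participants
    for scores in scores_by_metric:
        for p, s in enumerate(scores):
            # Rank 1 is best: one plus the number of strictly better scores.
            rank = 1 + sum(1 for other in scores if other > s)
            log_sums[p] += math.log(rank)
    n_metrics = len(scores_by_metric)
    return [math.exp(total / n_metrics) for total in log_sums]
```

Because the ranks are multiplied rather than averaged, a participant who is consistently near the top across all 16 metrics is favoured over one who wins a few metrics but places poorly on others.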
LG · Nov 8, 2016 · Code
PixelSNE: Visualizing Fast with Just Enough Precision via Pixel-Aligned Stochastic Neighbor Embedding
Minjeong Kim, Minsuk Choi, Sunwoong Lee et al.
Embedding and visualizing large-scale high-dimensional data in a two-dimensional space is an important problem since such visualization can reveal deep insights into complex data. Most of the existing embedding approaches, however, run at an excessively high precision, ignoring the fact that, in the end, embedding outputs are converted into coarse-grained discrete pixel coordinates in a screen space. Motivated by this observation and directly considering pixel coordinates in the embedding optimization process, we accelerate Barnes-Hut tree-based t-distributed stochastic neighbor embedding (BH-SNE), known as a state-of-the-art 2D embedding method, and propose a novel method called PixelSNE, a highly efficient, screen resolution-driven 2D embedding method with linear computational complexity in the number of data items. Our experimental results show that PixelSNE runs faster than BH-SNE by a large margin while incurring only minimal degradation in embedding quality. Finally, the source code of our method is publicly available at https://github.com/awesome-davian/PixelSNE
HC · Apr 21
Seeing Your Mindless Face: How Viewing One's Live Self Interrupts Mindless Short-Form Video Scrolling
Kyungjin Kim, Minjeong Kim, Soobeen Jeong et al.
The widespread, addictive consumption of short-form videos, which allegedly causes "brain rot," has become an urgent public concern. This study proposes that self-related cues serve as an intrinsic, self-reflective strategy that enhances self-control over media overuse. We developed an app that de-immerses users by periodically displaying different self-related cues (live camera, selfie, name in text, and black screen) and tested their effects in a laboratory experiment (N=84). Overall, findings show that self-related cues effectively disrupt mindless viewing, enabling users to voluntarily stop short-form video consumption. Interestingly, the black screen, intended as a control, elicited the greatest intention to use the app: Participants noted in the follow-up interview that they preferred the subtler reflection on a black screen over the explicit image from a live camera. The findings offer practical design guidelines for implementing self-awareness interventions in mobile contexts, including which modalities work best and how real-time contextual anchoring enhances effectiveness.
LG · Mar 3, 2025
Learning Covariance-Based Multi-Scale Representation of Neuroimaging Measures for Alzheimer Classification
Seunghun Baek, Injun Choi, Mustafa Dere et al.
Stacking excessive layers in a DNN results in a highly underdetermined system when training samples are limited, which is very common in medical applications. In this regard, we present a framework capable of deriving an efficient high-dimensional space with a reasonable increase in model size. This is done by utilizing a transform (i.e., convolution) that leverages scale-space theory with covariance structure. The overall model trains on this transform together with a downstream classifier (i.e., a fully connected layer) to capture the optimal multi-scale representation of the original data, which corresponds to task-specific components in a dual space. Experiments on neuroimaging measures from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study show that our model performs better and converges faster than conventional models even when the model size is significantly reduced. The trained model is made interpretable using gradient information over the multi-scale transform to delineate personalized AD-specific regions in the brain.
IV · Mar 3, 2025
Modality-Agnostic Style Transfer for Holistic Feature Imputation
Seunghun Baek, Jaeyoon Sim, Mustafa Dere et al.
Characterizing a preclinical stage of Alzheimer's Disease (AD) via a single imaging modality is difficult as its early symptoms are quite subtle. Therefore, many neuroimaging studies are curated with various imaging modalities, e.g., MRI and PET. However, it is often challenging to acquire all of them from all subjects, and missing data become inevitable. In this regard, we propose a framework that generates unobserved imaging measures for specific subjects using their existing measures, thereby reducing the need for additional examinations. Our framework transfers modality-specific style while preserving AD-specific content. This is done by domain adversarial training that preserves modality-agnostic but AD-specific information, while a generative adversarial network adds an indistinguishable modality-specific style. Our proposed framework is evaluated on the Alzheimer's Disease Neuroimaging Initiative (ADNI) study and compared with other imputation methods in terms of generated data quality. A small average Cohen's $d < 0.19$ between our generated measures and real ones suggests that the synthetic data are practically usable regardless of their modality type.
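To make the reported effect size concrete, here is a minimal sketch of Cohen's $d$ using the pooled sample standard deviation, the textbook two-sample form. Whether the authors used exactly this variant is not stated, so treat the formula choice as an assumption; the function name is illustrative.

```python
import math

def cohens_d(a, b):
    """Cohen's d between two samples using the pooled standard deviation."""
    na, nb = len(a), len(b)
    mean_a = sum(a) / na
    mean_b = sum(b) / nb
    # Unbiased sample variances (divide by n - 1).
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled_sd
```

By the usual rule of thumb, an absolute $d$ below about 0.2 is considered a small effect, which is why a value under 0.19 is read as generated and real measures being close in distribution.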
LG · Apr 21, 2025
Edge-boosted graph learning for functional brain connectivity analysis
David Yang, Mostafa Abdelmegeed, John Modl et al.
Predicting disease states from functional brain connectivity is critical for the early diagnosis of severe neurodegenerative diseases such as Alzheimer's Disease and Parkinson's Disease. Existing studies commonly employ Graph Neural Networks (GNNs) to infer clinical diagnoses from node-based brain connectivity matrices generated through node-to-node similarities of regionally averaged fMRI signals. However, recent neuroscience studies found that such node-based connectivity does not accurately capture "functional connections" within the brain. This paper proposes a novel approach to brain network analysis that emphasizes edge functional connectivity (eFC), shifting the focus to inter-edge relationships. Additionally, we introduce a co-embedding technique to integrate edge functional connections effectively. Experimental results on the ADNI and PPMI datasets demonstrate that our method significantly outperforms state-of-the-art GNN methods in classifying functional brain networks.
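The abstract does not define eFC, so the sketch below follows one construction commonly used in the neuroscience literature, stated here as an assumption: z-score each regional signal, form an "edge time series" for every node pair as the element-wise product of the two signals, and take the normalized inner product between edge time series. The function name and array layout are hypothetical.

```python
import numpy as np

def edge_functional_connectivity(ts):
    """Edge-by-edge connectivity from regional time series (sketch).

    ts: (n_timepoints, n_nodes) array of regionally averaged signals.
    Returns an (n_edges, n_edges) matrix, n_edges = n_nodes*(n_nodes-1)/2.
    """
    z = (ts - ts.mean(axis=0)) / ts.std(axis=0)      # z-score each node
    i, j = np.triu_indices(ts.shape[1], k=1)         # all node pairs i < j
    edge_ts = z[:, i] * z[:, j]                      # co-fluctuation per edge
    norms = np.sqrt((edge_ts ** 2).sum(axis=0))
    # Normalized inner product between every pair of edge time series.
    return (edge_ts.T @ edge_ts) / np.outer(norms, norms)
```

Note the quadratic growth: a parcellation with 100 regions yields 4,950 edges and therefore a 4,950 x 4,950 eFC matrix, which is part of why edge-centric methods need dedicated embedding strategies.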
CL · Oct 23, 2020
ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding
Minjeong Kim, Gyuwan Kim, Sang-Woo Lee et al.
Language model pre-training has shown promising results in various downstream tasks. In this context, we introduce a cross-modal pre-trained language model, called Speech-Text BERT (ST-BERT), to tackle end-to-end spoken language understanding (E2E SLU) tasks. Taking phoneme posterior and subword-level text as input, ST-BERT learns a contextualized cross-modal alignment via our two proposed pre-training tasks: Cross-modal Masked Language Modeling (CM-MLM) and Cross-modal Conditioned Language Modeling (CM-CLM). Experimental results on three benchmarks show that our approach is effective for various SLU datasets and exhibits a surprisingly marginal performance degradation even when only 1% of the training data are available. Also, our method shows further SLU performance gain via domain-adaptive pre-training with domain-specific speech-text pair data.
CR · May 13, 2019
Impossibility of Full Decentralization in Permissionless Blockchains
Yujin Kwon, Jian Liu, Minjeong Kim et al.
Bitcoin uses blockchain technology and a proof-of-work (PoW) mechanism where nodes spend computing resources and earn rewards in return for spending these resources. This incentive system has caused power to be significantly biased towards a few nodes, called mining pools. In fact, poor decentralization appears not only in PoW-based coins but also in coins adopting other mechanisms such as proof-of-stake (PoS) and delegated proof-of-stake (DPoS). In this paper, we target this centralization issue. To this end, we first define $(m, \varepsilon, \delta)$-decentralization as a state that satisfies 1) there are at least $m$ participants running a node and 2) the ratio between the total resource power of nodes run by the richest and $\delta$-th percentile participants is less than or equal to $1+\varepsilon$. To see if it is possible to achieve good decentralization, we introduce sufficient conditions for the incentive system of a blockchain to reach $(m, \varepsilon, \delta)$-decentralization. When satisfying the conditions, a blockchain system can reach full decentralization with probability 1. However, to achieve this, the blockchain system should be able to assign a positive Sybil cost, where the Sybil cost is defined as the difference between the cost for one participant running multiple nodes and the total cost for multiple participants each running one node. On the other hand, we prove that when there is no Sybil cost, the probability of reaching $(m, \varepsilon, \delta)$-decentralization is upper bounded by a value close to 0, considering a large rich-poor gap. To determine the conditions that each system cannot satisfy, we also analyze protocols of all PoW, PoS, and DPoS coins in the top 100 coins according to our conditions. Finally, we conduct data analysis of these coins to validate our theory.
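The two conditions in the definition above translate directly into a check over per-participant resource power. The sketch below is an illustration only: it assumes the input already aggregates power across all nodes a participant runs, that all powers are positive, that the percentile parameter is given on a 0 to 100 scale, and that percentiles use the nearest-rank convention. The function name `is_decentralized` is hypothetical.

```python
import math

def is_decentralized(powers, m, eps, delta):
    """Check the two conditions of the decentralization definition (sketch).

    powers: total resource power per participant (assumed positive)
    m:      minimum number of participants running a node
    eps:    allowed relative gap; the ratio must be at most 1 + eps
    delta:  percentile in [0, 100], nearest-rank convention (assumed)
    """
    # Condition 1: at least m participants.
    if len(powers) < m:
        return False
    ordered = sorted(powers)
    # Nearest-rank percentile; delta = 0 selects the poorest participant.
    k = max(1, math.ceil(delta / 100 * len(ordered)))
    percentile_power = ordered[k - 1]
    # Condition 2: richest-to-percentile power ratio bounded by 1 + eps.
    return max(ordered) / percentile_power <= 1 + eps
```

For example, four participants with equal power satisfy the check even with `eps = 0` and `delta = 0`, whereas a distribution such as `[100, 1, 1, 1]` fails it for any small `eps`, mirroring the mining-pool concentration the abstract describes.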
CL · Jul 20, 2018
Question-Aware Sentence Gating Networks for Question and Answering
Minjeong Kim, David Keetae Park, Hyungjong Noh et al.
Machine comprehension question answering, which finds an answer to the question given a passage, involves high-level reasoning processes of understanding and tracking the relevant contents across various semantic units such as words, phrases, and sentences in a document. This paper proposes novel question-aware sentence gating networks that directly incorporate sentence-level information into word-level encoding processes. To this end, our model first learns question-aware sentence representations and then dynamically combines them with word-level representations, resulting in semantically meaningful word representations for QA tasks. Experimental results demonstrate that our approach consistently improves accuracy over existing baseline approaches on various QA datasets and is widely applicable to other neural network-based QA models.