CVJul 26, 2023Code
AIDE: A Vision-Driven Multi-View, Multi-Modal, Multi-Tasking Dataset for Assistive Driving PerceptionDingkang Yang, Shuai Huang, Zhi Xu et al.
Driver distraction has become a significant cause of severe traffic accidents over the past decade. Despite the growing development of vision-driven driver monitoring systems, the lack of comprehensive perception datasets restricts road safety and traffic security. In this paper, we present an AssIstive Driving pErception dataset (AIDE) that considers context information both inside and outside the vehicle in naturalistic scenarios. AIDE facilitates holistic driver monitoring through three distinctive characteristics, including multi-view settings of driver and scene, multi-modal annotations of face, body, posture, and gesture, and four pragmatic task designs for driving understanding. To thoroughly explore AIDE, we provide experimental benchmarks on three kinds of baseline frameworks via extensive methods. Moreover, two fusion strategies are introduced to give new insights into learning effective multi-stream/modal representations. We also systematically investigate the importance and rationality of the key components in AIDE and benchmarks. The project link is https://github.com/ydk122024/AIDE.
IVJul 29, 2022Code
Robust Quantitative Susceptibility Mapping via Approximate Message Passing with Parameter EstimationShuai Huang, James J. Lah, Jason W. Allen et al.
Purpose: For quantitative susceptibility mapping (QSM), the lack of ground-truth in clinical settings makes it challenging to determine suitable parameters for the dipole inversion. We propose a probabilistic Bayesian approach for QSM with built-in parameter estimation, and incorporate the nonlinear formulation of the dipole inversion to achieve a robust recovery of the susceptibility maps. Theory: From a Bayesian perspective, the image wavelet coefficients are approximately sparse and modelled by the Laplace distribution. The measurement noise is modelled by a Gaussian-mixture distribution with two components, where the second component is used to model the noise outliers. Through probabilistic inference, the susceptibility map and distribution parameters can be jointly recovered using approximate message passing (AMP). Methods: We compare our proposed AMP with built-in parameter estimation (AMP-PE) to the state-of-the-art L1-QSM, FANSI and MEDI approaches on the simulated and in vivo datasets, and perform experiments to explore the optimal settings of AMP-PE. Reproducible code is available at https://github.com/EmoryCN2L/QSM_AMP_PE Results: On the simulated Sim2Snr1 dataset, AMP-PE achieved the lowest NRMSE, DFCM and the highest SSIM, while MEDI achieved the lowest HFEN. On the in vivo datasets, AMP-PE is robust and successfully recovers the susceptibility maps using the estimated parameters, whereas L1-QSM, FANSI and MEDI typically require additional visual fine-tuning to select or double-check working parameters. Conclusion: AMP-PE provides automatic and adaptive parameter estimation for QSM and avoids the subjectivity from the visual fine-tuning step, making it an excellent choice for the clinical setting.
LGJul 23, 2022
Density-Aware Personalized Training for Risk Prediction in Imbalanced Medical DataZepeng Huo, Xiaoning Qian, Shuai Huang et al.
Medical events of interest, such as mortality, often happen at a low rate in electronic medical records, as most admitted patients survive. Training models with this imbalance rate (class density discrepancy) may lead to suboptimal prediction. Traditionally this problem is addressed through ad-hoc methods such as resampling or reweighting but performance in many cases is still limited. We propose a framework for training models for this imbalance issue: 1) we first decouple the feature extraction and classification process, adjusting training batches separately for each component to mitigate bias caused by class density discrepancy; 2) we train the network with both a density-aware loss and a learnable cost matrix for misclassifications. We demonstrate our model's improved performance in real-world medical datasets (TOPCAT and MIMIC-III) to show improved AUC-ROC, AUC-PRC, Brier Skill Score compared with the baselines in the domain.
CVMar 21, 2023
Context De-confounded Emotion RecognitionDingkang Yang, Zhaoyu Chen, Yuzheng Wang et al.
Context-Aware Emotion Recognition (CAER) is a crucial and challenging task that aims to perceive the emotional states of the target person with contextual information. Recent approaches invariably focus on designing sophisticated architectures or mechanisms to extract seemingly meaningful representations from subjects and contexts. However, a long-overlooked issue is that a context bias in existing datasets leads to a significantly unbalanced distribution of emotional states among different context scenarios. Concretely, the harmful bias is a confounder that misleads existing models to learn spurious correlations based on conventional likelihood estimation, significantly limiting the models' performance. To tackle the issue, this paper provides a causality-based perspective to disentangle the models from the impact of such bias, and formulate the causalities among variables in the CAER task via a tailored causal graph. Then, we propose a Contextual Causal Intervention Module (CCIM) based on the backdoor adjustment to de-confound the confounder and exploit the true causal effect for model training. CCIM is plug-in and model-agnostic, which improves diverse state-of-the-art approaches by considerable margins. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our CCIM and the significance of causal insight.
SPSep 26, 2022
DynImp: Dynamic Imputation for Wearable Sensing Data Through Sensory and Temporal RelatednessZepeng Huo, Taowei Ji, Yifei Liang et al.
In wearable sensing applications, data is inevitable to be irregularly sampled or partially missing, which pose challenges for any downstream application. An unique aspect of wearable data is that it is time-series data and each channel can be correlated to another one, such as x, y, z axis of accelerometer. We argue that traditional methods have rarely made use of both times-series dynamics of the data as well as the relatedness of the features from different sensors. We propose a model, termed as DynImp, to handle different time point's missingness with nearest neighbors along feature axis and then feeding the data into a LSTM-based denoising autoencoder which can reconstruct missingness along the time axis. We experiment the model on the extreme missingness scenario ($>50\%$ missing rate) which has not been widely tested in wearable data. Our experiments on activity recognition show that the method can exploit the multi-modality features from related sensors and also learn from history time-series dynamics to reconstruct the data under extreme missingness.
AIMar 4Code
LifeBench: A Benchmark for Long-Horizon Multi-Source MemoryZihao Cheng, Weixin Wang, Yu Zhao et al.
Long-term memory is fundamental for personalized agents capable of accumulating knowledge, reasoning over user experiences, and adapting across time. However, existing memory benchmarks primarily target declarative memory, specifically semantic and episodic types, where all information is explicitly presented in dialogues. In contrast, real-world actions are also governed by non-declarative memory, including habitual and procedural types, and need to be inferred from diverse digital traces. To bridge this gap, we introduce Lifebench, which features densely connected, long-horizon event simulation. It pushes AI agents beyond simple recall, requiring the integration of declarative and non-declarative memory reasoning across diverse and temporally extended contexts. Building such a benchmark presents two key challenges: ensuring data quality and scalability. We maintain data quality by employing real-world priors, including anonymized social surveys, map APIs, and holiday-integrated calendars, thus enforcing fidelity, diversity and behavioral rationality within the dataset. Towards scalability, we draw inspiration from cognitive science and structure events according to their partonomic hierarchy; enabling efficient parallel generation while maintaining global coherence. Performance results show that top-tier, state-of-the-art memory systems reach just 55.2\% accuracy, highlighting the inherent difficulty of long-horizon retrieval and multi-source integration within our proposed benchmark. The dataset and data synthesis code are available at https://github.com/1754955896/LifeBench.
OCJul 6, 2022
Orthogonal Matrix Retrieval with Spatial Consensus for 3D Unknown-View TomographyShuai Huang, Mona Zehni, Ivan Dokmanić et al.
Unknown-view tomography (UVT) reconstructs a 3D density map from its 2D projections at unknown, random orientations. A line of work starting with Kam (1980) employs the method of moments (MoM) with rotation-invariant Fourier features to solve UVT in the frequency domain, assuming that the orientations are uniformly distributed. This line of work includes the recent orthogonal matrix retrieval (OMR) approaches based on matrix factorization, which, while elegant, either require side information about the density that is not available, or fail to be sufficiently robust. For OMR to break free from those restrictions, we propose to jointly recover the density map and the orthogonal matrices by requiring that they be mutually consistent. We regularize the resulting non-convex optimization problem by a denoised reference projection and a nonnegativity constraint. This is enabled by the new closed-form expressions for spatial autocorrelation features. Further, we design an easy-to-compute initial density map which effectively mitigates the non-convexity of the reconstruction problem. Experimental results show that the proposed OMR with spatial consensus is more robust and performs significantly better than the previous state-of-the-art OMR approach in the typical low-SNR scenario of 3D UVT.
CEMay 18
Physics-Informed Neural Networks for Radial Consolidation of Combined Electroosmotic, Vacuum and Surcharge Preloading Considering Smear EffectsDong Li, Yapeng Cao, Shuai Huang et al.
This study develops a dimensionless multi-domain physics-informed neural network (PINN) framework for electro-osmotic radial consolidation considering smear effects and combined vacuum and surcharge loading. Three PINN-based models are investigated: a standard soft-constrained PINN (Std-PINN), a modified gated PINN (Mod-PINN), and a modified gated PINN with hard-constraint boundary encoding (Mod-HC-PINN). The models are evaluated against FEM reference solutions under four loading cases, including constant vacuum, exponential vacuum, exponential vacuum with ramp surcharge, and exponential vacuum with cyclic haversine surcharge. The results indicate that the gated architecture applied in Mod-PINN improves the resolution of steep pressure gradients near the cathode and smear-zone interface under constant vacuum loading. Under time-dependent loading, the soft-constrained Mod-PINN shows reduced accuracy because it must learn multiple competing objectives simultaneously. The Mod-HC-PINN mitigates this issue by embedding the cathode boundary and initial conditions into the output structure, thereby reducing the optimization burden and improving physical consistency. The Mod-HC-PINN achieves MAE values of 0.43, 0.41, and 0.27 kPa for the exponential vacuum, ramp surcharge, and cyclic surcharge cases, respectively. Sensitivity analyses further demonstrate that the proposed framework remains robust across practical ranges of network architecture, collocation density, and permeability contrast.
CLJan 15
HumanLLM: Benchmarking and Improving LLM Anthropomorphism via Human Cognitive PatternsXintao Wang, Jian Yang, Weiyuan Li et al.
Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning and generation, serving as the foundation for advanced persona simulation and Role-Playing Language Agents (RPLAs). However, achieving authentic alignment with human cognitive and behavioral patterns remains a critical challenge for these agents. We present HumanLLM, a framework treating psychological patterns as interacting causal forces. We construct 244 patterns from ~12,000 academic papers and synthesize 11,359 scenarios where 2-5 patterns reinforce, conflict, or modulate each other, with multi-turn conversations expressing inner thoughts, actions, and dialogue. Our dual-level checklists evaluate both individual pattern fidelity and emergent multi-pattern dynamics, achieving strong human alignment (r=0.91) while revealing that holistic metrics conflate simulation accuracy with social desirability. HumanLLM-8B outperforms Qwen3-32B on multi-pattern dynamics despite 4x fewer parameters, demonstrating that authentic anthropomorphism requires cognitive modeling--simulating not just what humans do, but the psychological processes generating those behaviors.
CVJun 15, 2025Code
SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Large Language ModelsXinyi Zhao, Congjing Zhang, Pei Guo et al.
Video anomaly detection (VAD) is essential for enhancing safety and security by identifying unusual events across different environments. Existing VAD benchmarks, however, are primarily designed for general-purpose scenarios, neglecting the specific characteristics of smart home applications. To bridge this gap, we introduce SmartHome-Bench, the first comprehensive benchmark specially designed for evaluating VAD in smart home scenarios, focusing on the capabilities of multi-modal large language models (MLLMs). Our newly proposed benchmark consists of 1,203 videos recorded by smart home cameras, organized according to a novel anomaly taxonomy that includes seven categories, such as Wildlife, Senior Care, and Baby Monitoring. Each video is meticulously annotated with anomaly tags, detailed descriptions, and reasoning. We further investigate adaptation methods for MLLMs in VAD, assessing state-of-the-art closed-source and open-source models with various prompting techniques. Results reveal significant limitations in the current models' ability to detect video anomalies accurately. To address these limitations, we introduce the Taxonomy-Driven Reflective LLM Chain (TRLC), a new LLM chaining framework that achieves a notable 11.62% improvement in detection accuracy. The benchmark dataset and code are publicly available at https://github.com/Xinyi-0724/SmartHome-Bench-LLM.
CLOct 27, 2025Code
SI-Bench: Benchmarking Social Intelligence of Large Language Models in Human-to-Human ConversationsShuai Huang, Wenxuan Zhao, Jun Gao
As large language models (LLMs) develop anthropomorphic abilities, they are increasingly being deployed as autonomous agents to interact with humans. However, evaluating their performance in realistic and complex social interactions remains a significant challenge. Most previous research built datasets through simulated agent-to-agent interactions, which fails to capture the authentic linguistic styles and relational dynamics found in real human conversations. To address this gap, we introduce SI-Bench, a novel benchmark designed to evaluate aspects of social intelligence in LLMs. Grounded in broad social science theories, SI-Bench contains 2,221 authentic multi-turn dialogues collected from a social networking application. We further selected a subset of 312 dialogues for manual annotation across 8 major models. The experiments show that SOTA models have surpassed the human expert in process reasoning under complex social situations, yet they still fall behind humans in reply quality. Moreover, introducing Chain-of-Thought (CoT) reasoning may degrade the performance of LLMs in social dialogue tasks. All datasets are openly available at https://github.com/SI-Bench/SI-Bench.git.
AIApr 2
A Role-Based LLM Framework for Structured Information Extraction from Healthy Food PoliciesCongjing Zhang, Ruoxuan Bao, Jingyu Li et al.
Current Large Language Model (LLM) approaches for information extraction (IE) in the healthy food policy domain are often hindered by various factors, including misinformation, specifically hallucinations, misclassifications, and omissions that result from the structural diversity and inconsistency of policy documents. To address these limitations, this study proposes a role-based LLM framework that automates the IE from unstructured policy data by assigning specialized roles: an LLM policy analyst for metadata and mechanism classification, an LLM legal strategy specialist for identifying complex legal approaches, and an LLM food system expert for categorizing food system stages. This framework mimics expert analysis workflows by incorporating structured domain knowledge, including explicit definitions of legal mechanisms and classification criteria, into role-specific prompts. We evaluate the framework using 608 healthy food policies from the Healthy Food Policy Project (HFPP) database, comparing its performance against zero-shot, few-shot, and chain-of-thought (CoT) baselines using Llama-3.3-70B. Our proposed framework demonstrates superior performance in complex reasoning tasks, offering a reliable and transparent methodology for automating IE from health policies.
LGMar 3
Causal Learning Should Embrace the Wisdom of the CrowdRyan Feng Lin, Yuantao Wei, Huiling Liao et al.
Learning causal structures typically represented by directed acyclic graphs (DAGs) from observational data is notoriously challenging due to the combinatorial explosion of possible graphs and inherent ambiguities in observations. This paper argues that causal learning is now ready for the emergence of a new paradigm supported by rapidly advancing technologies, fulfilling the long-standing vision of leveraging human causal knowledge. This paradigm integrates scalable crowdsourcing platforms for data collection, interactive knowledge elicitation for expert opinion modeling, robust aggregation techniques for expert reconciliation, and large language model (LLM)-based simulation for augmenting AI-driven information acquisition. In this paper, we focus on DAG learning for causal discovery and frame the problem as a distributed decision-making task, recognizing that each participant (human expert or LLM agent) possesses fragmented and imperfect knowledge about different subsets of the variables of interest in the causal graph. By proposing a systematic framework to synthesize these insights, we aim to enable the recovery of a global causal structure unachievable by any individual agent alone. We advocate for a new research frontier and outline a comprehensive framework for new research thrusts that range from eliciting, modeling, aggregating, and optimizing human causal knowledge contributions.
LGFeb 4
Team, Then Trim: An Assembly-Line LLM Framework for High-Quality Tabular Data GenerationCongjing Zhang, Ryan Feng Lin, Ruoxuan Bao et al.
While tabular data is fundamental to many real-world machine learning (ML) applications, acquiring high-quality tabular data is usually labor-intensive and expensive. Limited by the scarcity of observations, tabular datasets often exhibit critical deficiencies, such as class imbalance, selection bias, and low fidelity. To address these challenges, building on recent advances in Large Language Models (LLMs), this paper introduces Team-then-Trim (T$^2$), a framework that synthesizes high-quality tabular data through a collaborative team of LLMs, followed by a rigorous three-stage plug-in data quality control (QC) pipeline. In T$^2$, tabular data generation is conceptualized as a manufacturing process: specialized LLMs, guided by domain knowledge, are tasked with generating different data components sequentially, and the resulting products, i.e., the synthetic data, are systematically evaluated across multiple dimensions of QC. Empirical results on both simulated and real-world datasets demonstrate that T$^2$ outperforms state-of-the-art methods in producing high-quality tabular data, highlighting its potential to support downstream models when direct data collection is practically infeasible.
CVJan 23
CASP: Few-Shot Class-Incremental Learning with CLS Token Attention Steering PromptsShuai Huang, Xuhan Lin, Yuwu Lu
Few-shot class-incremental learning (FSCIL) presents a core challenge in continual learning, requiring models to rapidly adapt to new classes with very limited samples while mitigating catastrophic forgetting. Recent prompt-based methods, which integrate pretrained backbones with task-specific prompts, have made notable progress. However, under extreme few-shot incremental settings, the model's ability to transfer and generalize becomes critical, and it is thus essential to leverage pretrained knowledge to learn feature representations that can be shared across future categories during the base session. Inspired by the mechanism of the CLS token, which is similar to human attention and progressively filters out task-irrelevant information, we propose the CLS Token Attention Steering Prompts (CASP). This approach introduces class-shared trainable bias parameters into the query, key, and value projections of the CLS token to explicitly modulate the self-attention weights. To further enhance generalization, we also design an attention perturbation strategy and perform Manifold Token Mixup in the shallow feature space, synthesizing potential new class features to improve generalization and reserve the representation capacity for upcoming tasks. Experiments on the CUB200, CIFAR100, and ImageNet-R datasets demonstrate that CASP outperforms state-of-the-art methods in both standard and fine-grained FSCIL settings without requiring fine-tuning during incremental phases and while significantly reducing the parameter overhead.
NCSep 30, 2025
Atlas-free Brain Network TransformerShuai Huang, Xuan Kan, James J. Lah et al.
Current atlas-based approaches to brain network analysis rely heavily on standardized anatomical or connectivity-driven brain atlases. However, these fixed atlases often introduce significant limitations, such as spatial misalignment across individuals, functional heterogeneity within predefined regions, and atlas-selection biases, collectively undermining the reliability and interpretability of the derived brain networks. To address these challenges, we propose a novel atlas-free brain network transformer (atlas-free BNT) that leverages individualized brain parcellations derived directly from subject-specific resting-state fMRI data. Our approach computes ROI-to-voxel connectivity features in a standardized voxel-based feature space, which are subsequently processed using the BNT architecture to produce comparable subject-level embeddings. Experimental evaluations on sex classification and brain-connectome age prediction tasks demonstrate that our atlas-free BNT consistently outperforms state-of-the-art atlas-based methods, including elastic net, BrainGNN, Graphormer and the original BNT. Our atlas-free approach significantly improves the precision, robustness, and generalizability of brain network analyses. This advancement holds great potential to enhance neuroimaging biomarkers and clinical diagnostic tools for personalized precision medicine.
CVSep 24, 2025
MoTiC: Momentum Tightness and Contrast for Few-Shot Class-Incremental LearningZeyu He, Shuai Huang, Yuwu Lu et al.
Few-Shot Class-Incremental Learning (FSCIL) must contend with the dual challenge of learning new classes from scarce samples while preserving old class knowledge. Existing methods use the frozen feature extractor and class-averaged prototypes to mitigate against catastrophic forgetting and overfitting. However, new-class prototypes suffer significant estimation bias due to extreme data scarcity, whereas base-class prototypes benefit from sufficient data. In this work, we theoretically demonstrate that aligning the new-class priors with old-class statistics via Bayesian analysis reduces variance and improves prototype accuracy. Furthermore, we propose large-scale contrastive learning to enforce cross-category feature tightness. To further enrich feature diversity and inject prior information for new-class prototypes, we integrate momentum self-supervision and virtual categories into the Momentum Tightness and Contrast framework (MoTiC), constructing a feature space with rich representations and enhanced interclass cohesion. Experiments on three FSCIL benchmarks produce state-of-the-art performances, particularly on the fine-grained task CUB-200, validating our method's ability to reduce estimation bias and improve incremental learning robustness.
LGMar 31, 2022
VFDS: Variational Foresight Dynamic Selection in Bayesian Neural Networks for Efficient Human Activity RecognitionRandy Ardywibowo, Shahin Boluki, Zhangyang Wang et al.
In many machine learning tasks, input features with varying degrees of predictive capability are acquired at varying costs. In order to optimize the performance-cost trade-off, one would select features to observe a priori. However, given the changing context with previous observations, the subset of predictive features to select may change dynamically. Therefore, we face the challenging new problem of foresight dynamic selection (FDS): finding a dynamic and light-weight policy to decide which features to observe next, before actually observing them, for overall performance-cost trade-offs. To tackle FDS, this paper proposes a Bayesian learning framework of Variational Foresight Dynamic Selection (VFDS). VFDS learns a policy that selects the next feature subset to observe, by optimizing a variational Bayesian objective that characterizes the trade-off between model performance and feature cost. At its core is an implicit variational distribution on binary gates that are dependent on previous observations, which will select the next subset of features to observe. We apply VFDS on the Human Activity Recognition (HAR) task where the performance-cost trade-off is critical in its practice. Extensive results demonstrate that VFDS selects different features under changing contexts, notably saving sensory costs while maintaining or improving the HAR accuracy. Moreover, the features that VFDS dynamically select are shown to be interpretable and associated with the different activity types. We will release the code.
ROJan 5, 2022
Multi-layer VI-GNSS Global Positioning Framework with Numerical Solution aided MAP InitializationBing Han, Zhongyang Xiao, Shuai Huang et al.
Motivated by the goal of achieving long-term drift-free camera pose estimation in complex scenarios, we propose a global positioning framework fusing visual, inertial and Global Navigation Satellite System (GNSS) measurements in multiple layers. Different from previous loosely- and tightly- coupled methods, the proposed multi-layer fusion allows us to delicately correct the drift of visual odometry and keep reliable positioning while GNSS degrades. In particular, local motion estimation is conducted in the inner-layer, solving the problem of scale drift and inaccurate bias estimation in visual odometry by fusing the velocity of GNSS, pre-integration of Inertial Measurement Unit (IMU) and camera measurement in a tightly-coupled way. The global localization is achieved in the outer-layer, where the local motion is further fused with GNSS position and course in a long-term period in a loosely-coupled way. Furthermore, a dedicated initialization method is proposed to guarantee fast and accurate estimation for all state variables and parameters. We give exhaustive tests of the proposed framework on indoor and outdoor public datasets. The mean localization error is reduced up to 63%, with a promotion of 69% in initialization accuracy compared with state-of-the-art works. We have applied the algorithm to Augmented Reality (AR) navigation, crowd sourcing high-precision map update and other large-scale applications.
IVAug 4, 2020
Fast Nonconvex $T_2^*$ Mapping Using ADMMShuai Huang, James J. Lah, Jason W. Allen et al.
Magnetic resonance (MR)-$T_2^*$ mapping is widely used to study hemorrhage, calcification and iron deposition in various clinical applications, it provides a direct and precise mapping of desired contrast in the tissue. However, the long acquisition time required by conventional 3D high-resolution $T_2^*$ mapping method causes discomfort to patients and introduces motion artifacts to reconstructed images, which limits its wider applicability. In this paper we address this issue by performing $T_2^*$ mapping from undersampled data using compressive sensing (CS). We formulate the reconstruction as a nonconvex problem that can be decomposed into two subproblems. They can be solved either separately via the standard approach or jointly via the alternating direction method of multipliers (ADMM). Compared to previous CS-based approaches that only apply sparse regularization on the spin density $\boldsymbol X_0$ and the relaxation rate $\boldsymbol R_2^*$, our formulation enforces additional sparse priors on the $T_2^*$-weighted images at multiple echoes to improve the reconstruction performance. We performed convergence analysis of the proposed algorithm, evaluated its performance on in vivo data, and studied the effects of different sampling schemes. Experimental results showed that the proposed joint-recovery approach generally outperforms the state-of-the-art method, especially in the low-sampling rate regime, making it a preferred choice to perform fast 3D $T_2^*$ mapping in practice. The framework adopted in this work can be easily extended to other problems arising from MR or other imaging modalities with non-linearly coupled variables.
LGMar 3, 2020
Uncertainty Quantification for Deep Context-Aware Mobile Activity Recognition and Unknown Context DiscoveryZepeng Huo, Arash PakBin, Xiaohan Chen et al.
Activity recognition in wearable computing faces two key challenges: i) activity characteristics may be context-dependent and change under different contexts or situations; ii) unknown contexts and activities may occur from time to time, requiring flexibility and adaptability of the algorithm. We develop a context-aware mixture of deep models termed the α-\b{eta} network coupled with uncertainty quantification (UQ) based upon maximum entropy to enhance human activity recognition performance. We improve accuracy and F score by 10% by identifying high-level contexts in a data-driven way to guide model development. In order to ensure training stability, we have used a clustering-based pre-training in both public and in-house datasets, demonstrating improved accuracy through unknown context discovery.
MEFeb 21, 2019
UQ-CHI: An Uncertainty Quantification-Based Contemporaneous Health Index for Degenerative Disease MonitoringAven Samareh, Shuai Huang
Developing knowledge-driven contemporaneous health index (CHI) that can precisely reflect the underlying patient across the course of the condition's progression holds a unique value, like facilitating a range of clinical decision-making opportunities. This is particularly important for monitoring degenerative condition such as Alzheimer's disease (AD), where the condition of the patient will decay over time. Detecting early symptoms and progression sign, and continuous severity evaluation, are all essential for disease management. While a few methods have been developed in the literature, uncertainty quantification of those health index models has been largely neglected. To ensure the continuity of the care, we should be more explicit about the level of confidence in model outputs. Ideally, decision-makers should be provided with recommendations that are robust in the face of substantial uncertainty about future outcomes. In this paper, we aim at filling this gap by developing an uncertainty quantification based contemporaneous longitudinal index, named UQ-CHI, with a particular focus on continuous patient monitoring of degenerative conditions. Our method is to combine convex optimization and Bayesian learning using the maximum entropy learning (MEL) framework, integrating uncertainty on labels as well. Our methodology also provides closed-form solutions in some important decision making tasks, e.g., such as predicting the label of a new sample. Numerical studies demonstrate the effectiveness of the propose UQ-CHI method in prediction accuracy, monitoring efficacy, and unique advantages if uncertainty quantification is enabled practice.
LGJan 8, 2019
Adaptive Activity Monitoring with Uncertainty Quantification in Switching Gaussian Process ModelsRandy Ardywibowo, Guang Zhao, Zhangyang Wang et al.
Emerging wearable sensors have enabled the unprecedented ability to continuously monitor human activities for healthcare purposes. However, with so many ambient sensors collecting different measurements, it becomes important not only to maintain good monitoring accuracy, but also low power consumption to ensure sustainable monitoring. This power-efficient sensing scheme can be achieved by deciding which group of sensors to use at a given time, requiring an accurate characterization of the trade-off between sensor energy usage and the uncertainty in ignoring certain sensor signals while monitoring. To address this challenge in the context of activity monitoring, we have designed an adaptive activity monitoring framework. We first propose a switching Gaussian process to model the observed sensor signals emitting from the underlying activity states. To efficiently compute the Gaussian process model likelihood and quantify the context prediction uncertainty, we propose a block circulant embedding technique and use Fast Fourier Transforms (FFT) for inference. By computing the Bayesian loss function tailored to switching Gaussian processes, an adaptive monitoring procedure is developed to select features from available sensors that optimize the trade-off between sensor power consumption and the prediction performance quantified by state prediction entropy. We demonstrate the effectiveness of our framework on the popular benchmark of UCI Human Activity Recognition using Smartphones.
LGJun 15, 2018
Safe Active Feature Selection for Sparse LearningShaogang Ren, Jianhua Z. Huang, Shuai Huang et al.
We present safe active incremental feature selection~(SAIF) to scale up the computation of LASSO solutions. SAIF does not require a solution from a heavier penalty parameter as in sequential screening or updating the full model for each iteration as in dynamic screening. Different from these existing screening methods, SAIF starts from a small number of features and incrementally recruits active features and updates the significantly reduced model. Hence, it is much more computationally efficient and scalable with the number of features. More critically, SAIF has the safe guarantee as it has the convergence guarantee to the optimal solution to the original full LASSO problem. Such an incremental procedure and theoretical convergence guarantee can be extended to fused LASSO problems. Compared with state-of-the-art screening methods as well as working set and homotopy methods, which may not always guarantee the optimal solution, SAIF can achieve superior or comparable efficiency and high scalability with the safe guarantee when facing extremely high dimensional data sets. Experiments with both synthetic and real-world data sets show that SAIF can be up to 50 times faster than dynamic screening, and hundreds of times faster than computing LASSO or fused LASSO solutions without screening.
DSApr 6, 2018
Reconstructing Point Sets from Distance DistributionsShuai Huang, Ivan Dokmanić
We address the problem of reconstructing a set of points on a line or a loop from their unassigned noisy pairwise distances. When the points lie on a line, the problem is known as the turnpike; when they are on a loop, it is known as the beltway. We approximate the problem by discretizing the domain and representing the $N$ points via an $N$-hot encoding, which is a density supported on the discretized domain. We show how the distance distribution is then simply a collection of quadratic functionals of this density and propose to recover the point locations so that the estimated distance distribution matches the measured distance distribution. This can be cast as a constrained nonconvex optimization problem which we solve using projected gradient descent with a suitable spectral initializer. We derive conditions under which the proposed distance distribution matching approach locally converges to a global optimizer at a linear rate. Compared to the conventional backtracking approach, our method jointly reconstructs all the point locations and is robust to noise in the measurements. We substantiate these claims with state-of-the-art performance across a number of numerical experiments. Our method is the first practical approach to solve the large-scale noisy beltway problem where the points lie on a loop.
LGMar 18, 2018
A Robust AUC Maximization Framework with Simultaneous Outlier Detection and Feature Selection for Positive-Unlabeled ClassificationKe Ren, Haichuan Yang, Yu Zhao et al.
The positive-unlabeled (PU) classification is a common scenario in real-world applications such as healthcare, text classification, and bioinformatics, in which we only observe a few samples labeled as "positive" together with a large volume of "unlabeled" samples that may contain both positive and negative samples. Building robust classifier for the PU problem is very challenging, especially for complex data where the negative samples overwhelm and mislabeled samples or corrupted features exist. To address these three issues, we propose a robust learning framework that unifies AUC maximization (a robust metric for biased labels), outlier detection (for excluding wrong labels), and feature selection (for excluding corrupted features). The generalization error bounds are provided for the proposed model that give valuable insight into the theoretical performance of the method and lead to useful practical guidance, e.g., to train a model, we find that the included unlabeled samples are sufficient as long as the sample size is comparable to the number of positive samples in the training process. Empirical comparisons and two real-world applications on surgical site infection (SSI) and EEG seizure detection are also conducted to show the effectiveness of the proposed model.
CVNov 29, 2017
Predicting Depression Severity by Multi-Modal Feature Engineering and FusionAven Samareh, Yan Jin, Zhangyang Wang et al.
We present our preliminary work to determine if patient's vocal acoustic, linguistic, and facial patterns could predict clinical ratings of depression severity, namely Patient Health Questionnaire depression scale (PHQ-8). We proposed a multi modal fusion model that combines three different modalities: audio, video , and text features. By training over AVEC 2017 data set, our proposed model outperforms each single modality prediction model, and surpasses the data set baseline with ice margin.
CVSep 10, 2017
Robust Emotion Recognition from Low Quality and Low Bit Rate Video: A Deep Learning ApproachBowen Cheng, Zhangyang Wang, Zhaobin Zhang et al.
Emotion recognition from facial expressions is tremendously useful, especially when coupled with smart devices and wireless multimedia applications. However, the inadequate network bandwidth often limits the spatial resolution of the transmitted video, which will heavily degrade the recognition reliability. We develop a novel framework to achieve robust emotion recognition from low bit rate video. While video frames are downsampled at the encoder side, the decoder is embedded with a deep network model for joint super-resolution (SR) and recognition. Notably, we propose a novel max-mix training strategy, leading to a single "One-for-All" model that is remarkably robust to a vast range of downsampling factors. That makes our framework well adapted for the varied bandwidths in real transmission scenarios, without hampering scalability or efficiency. The proposed framework is evaluated on the AVEC 2016 benchmark, and demonstrates significantly improved stand-alone recognition performance, as well as rate-distortion (R-D) performance, than either directly recognizing from LR frames, or separating SR and recognition.
LGNov 12, 2016
Prognostics of Surgical Site Infections using Dynamic Health DataChuyang Ke, Yan Jin, Heather Evans et al.
Surgical Site Infection (SSI) is a national priority in healthcare research. Much research attention has been attracted to develop better SSI risk prediction models. However, most of the existing SSI risk prediction models are built on static risk factors such as comorbidities and operative factors. In this paper, we investigate the use of the dynamic wound data for SSI risk prediction. There have been emerging mobile health (mHealth) tools that can closely monitor the patients and generate continuous measurements of many wound-related variables and other evolving clinical variables. Since existing prediction models of SSI have quite limited capacity to utilize the evolving clinical data, we develop the corresponding solution to equip these mHealth tools with decision-making capabilities for SSI prediction with a seamless assembly of several machine learning models to tackle the analytic challenges arising from the spatial-temporal data. The basic idea is to exploit the low-rank property of the spatial-temporal data via the bilinear formulation, and further enhance it with automatic missing data imputation by the matrix completion technique. We derive efficient optimization algorithms to implement these models and demonstrate the superior performances of our new predictive model on a real-world dataset of SSI, compared to a range of state-of-the-art methods.
LGAug 14, 2016
Stacked Approximated Regression Machine: A Simple Deep Learning ApproachZhangyang Wang, Shiyu Chang, Qing Ling et al.
With the agreement of my coauthors, I Zhangyang Wang would like to withdraw the manuscript "Stacked Approximated Regression Machine: A Simple Deep Learning Approach". Some experimental procedures were not included in the manuscript, which makes a part of important claims not meaningful. In the relevant research, I was solely responsible for carrying out the experiments; the other coauthors joined in the discussions leading to the main algorithm. Please see the updated text for more details.