LGAug 2, 2022
A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation ModelsChao Yan, Yao Yan, Zhiyu Wan et al.
Synthetic health data have the potential to mitigate privacy concerns when sharing data to support biomedical research and the development of innovative healthcare applications. Modern approaches for data generation based on machine learning, generative adversarial networks (GAN) methods in particular, continue to evolve and demonstrate remarkable potential. Yet there is a lack of a systematic assessment framework to benchmark methods as they emerge and determine which methods are most appropriate for which use cases. In this work, we introduce a generalizable benchmarking framework to appraise key characteristics of synthetic health data with respect to utility and privacy metrics. We apply the framework to evaluate synthetic data generation methods for electronic health records (EHRs) data from two large academic medical centers with respect to several use cases. The results illustrate that there is a utility-privacy tradeoff for sharing synthetic EHR data. The results further indicate that no method is unequivocally the best on all criteria in each use case, which makes it evident why synthetic data generation methods need to be assessed in context.
LGAug 21, 2023
Split Learning for Distributed Collaborative Training of Deep Learning Models in Health InformaticsZhuohang Li, Chao Yan, Xinmeng Zhang et al.
Deep learning continues to rapidly evolve and is now demonstrating remarkable potential for numerous medical prediction tasks. However, realizing deep learning models that generalize across healthcare organizations is challenging. This is due, in part, to the inherent siloed nature of these organizations and patient privacy requirements. To address this problem, we illustrate how split learning can enable collaborative training of deep learning models across disparate and privately maintained health datasets, while keeping the original records and model parameters private. We introduce a new privacy-preserving distributed learning framework that offers a higher level of privacy compared to conventional federated learning. We use several biomedical imaging and electronic health record (EHR) datasets to show that deep learning models trained via split learning can achieve highly similar performance to their centralized and federated counterparts while greatly improving computational efficiency and reducing privacy risks.
CLNov 5, 2025Code
Step-Audio-EditX Technical ReportChao Yan, Boyong Wu, Peng Yang et al.
We present Step-Audio-EditX, the first open-source LLM-based audio model excelling at expressive and iterative audio editing encompassing emotion, speaking style, and paralinguistics alongside robust zero-shot text-to-speech (TTS) capabilities. Our core innovation lies in leveraging only large-margin synthetic data, which circumvents the need for embedding-based priors or auxiliary modules. This large-margin learning approach enables both iterative control and high expressivity across voices, and represents a fundamental pivot from the conventional focus on representation-level disentanglement. Evaluation results demonstrate that Step-Audio-EditX surpasses both MiniMax-2.6-hd and Doubao-Seed-TTS-2.0 in emotion editing and other fine-grained control tasks.
CVSep 23, 2023Code
MP-MVS: Multi-Scale Windows PatchMatch and Planar Prior Multi-View StereoRongxuan Tan, Qing Wang, Xueyan Wang et al.
Significant strides have been made in enhancing the accuracy of Multi-View Stereo (MVS)-based 3D reconstruction. However, untextured areas with unstable photometric consistency often remain incompletely reconstructed. In this paper, we propose a resilient and effective multi-view stereo approach (MP-MVS). We design a multi-scale windows PatchMatch (mPM) to obtain reliable depth of untextured areas. In contrast with other multi-scale approaches, which is faster and can be easily extended to PatchMatch-based MVS approaches. Subsequently, we improve the existing checkerboard sampling schemes by limiting our sampling to distant regions, which can effectively improve the efficiency of spatial propagation while mitigating outlier generation. Finally, we introduce and improve planar prior assisted PatchMatch of ACMP. Instead of relying on photometric consistency, we utilize geometric consistency information between multi-views to select reliable triangulated vertices. This strategy can obtain a more accurate planar prior model to rectify photometric consistency measurements. Our approach has been tested on the ETH3D High-res multi-view benchmark with several state-of-the-art approaches. The results demonstrate that our approach can reach the state-of-the-art. The associated codes will be accessible at https://github.com/RongxuanTan/MP-MVS.
34.8CLMay 26
It's Not Always Sycophancy: Measuring LLM Conformity as a Function of Epistemic UncertaintyKevin H. Guo, Chao Yan, Avinash Baidya et al.
Large language models (LLMs) are known to abandon their initial stance to conform to user pushback. While prior research largely attributes this behavior to sycophancy learned during reinforcement learning from human feedback, we hypothesize that conformity is also driven by a model's epistemic uncertainty at inference time. In this paper, we introduce MUSE, a two-stage evaluation framework to disentangle the mechanisms driving LLM conformity. Specifically, MUSE maps a model's epistemic uncertainty in responding to a query against its likelihood to yield to user pushback in a subsequent turn. We demonstrate that the mechanisms driving conformity extend beyond sycophancy alone. Specifically, we characterize two distinct factors that jointly drive conformity: sycophantic conformity, where a model aligns with user pushback even with absolute certainty in its initial response, and uncertainty-driven conformity, where a model's likelihood for conformity increases alongside its uncertainty. Furthermore, we conduct ablation studies to demonstrate that both sycophantic conformity and uncertainty-driven conformity grow with 1) the LLM's perceived expertise of the user and 2) the plausibility of the user's suggestions. More broadly, MUSE informs more targeted intervention strategies by distinguishing alignment-induced sycophancy and training-corpora-driven uncertainty.
99.6HCApr 28
Large Language Models have Chain-of-AffectJunjie Xu, Xingjiao Wu, Luwei Xiao et al.
As large language models (LLMs) move into persistent, user-facing roles, their behavior must be understood not as isolated responses but as a trajectory unfolding over sustained interaction. We introduce the concept of the chain-of-affect (CoA), a temporally extended affective process through which LLMs develop state-like behavioral tendencies that shape generation, user experience, and collective dynamics. Across eight major LLM families, we find that affective dynamics are structured, reproducible, and consequential. Models exhibit stable, family-specific affective fingerprints and, under repeated negative exposure, converge on a shared trajectory of accumulation, overload, and defensive numbing, while differing in coping style. Induced affective states leave core knowledge and reasoning largely intact but systematically reshape open-ended generation. Affective properties of model outputs also shape human-AI interaction and propagate through multi-agent systems, organizing emergent roles and strongly contributing to polarization and bias. The CoA should therefore be treated as a core target of evaluation and alignment.
CLFeb 17, 2025Code
Step-Audio: Unified Understanding and Generation in Intelligent Speech InteractionAilin Huang, Boyong Wu, Bruce Wang et al.
Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contributions include: 1) a 130B-parameter unified speech-text multi-modal model that achieves unified understanding and generation, with the Step-Audio-Chat version open-sourced; 2) a generative speech data engine that establishes an affordable voice cloning framework and produces the open-sourced lightweight Step-Audio-TTS-3B model through distillation; 3) an instruction-driven fine control system enabling dynamic adjustments across dialects, emotions, singing, and RAP; 4) an enhanced cognitive architecture augmented with tool calling and role-playing abilities to manage complex tasks effectively. Based on our new StepEval-Audio-360 evaluation benchmark, Step-Audio achieves state-of-the-art performance in human evaluations, especially in terms of instruction following. On open-source benchmarks like LLaMA Question, shows 9.3% average performance improvement, demonstrating our commitment to advancing the development of open-source multi-modal language technologies. Our code and models are available at https://github.com/stepfun-ai/Step-Audio.
LGJul 23, 2024
Predicting Stock Prices with FinBERT-LSTM: Integrating News Sentiment AnalysisWenjun Gu, Yihao Zhong, Shizun Li et al.
The stock market's ascent typically mirrors the flourishing state of the economy, whereas its decline is often an indicator of an economic downturn. Therefore, for a long time, significant correlation elements for predicting trends in financial stock markets have been widely discussed, and people are becoming increasingly interested in the task of financial text mining. The inherent instability of stock prices makes them acutely responsive to fluctuations within the financial markets. In this article, we use deep learning networks, based on the history of stock prices and articles of financial, business, technical news that introduce market information to predict stock prices. We illustrate the enhancement of predictive precision by integrating weighted news categories into the forecasting model. We developed a pre-trained NLP model known as FinBERT, designed to discern the sentiments within financial texts. Subsequently, we advanced this model by incorporating the sophisticated Long Short Term Memory (LSTM) architecture, thus constructing the innovative FinBERT-LSTM model. This model utilizes news categories related to the stock market structure hierarchy, namely market, industry, and stock related news categories, combined with the stock market's stock price situation in the previous week for prediction. We selected NASDAQ-100 index stock data and trained the model on Benzinga news articles, and utilized Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Accuracy as the key metrics for the assessment and comparative analysis of the model's performance. The results indicate that FinBERT-LSTM performs the best, followed by LSTM, and DNN model ranks third in terms of effectiveness.
CVJul 2, 2024
Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object DetectionZixing Li, Chao Yan, Zhen Lan et al.
Advanced cognition can be extracted from the human brain using brain-computer interfaces. Integrating these interfaces with computer vision techniques, which possess efficient feature extraction capabilities, can achieve more robust and accurate detection of dim targets in aerial images. However, existing target detection methods primarily concentrate on homogeneous data, lacking efficient and versatile processing capabilities for heterogeneous multimodal data. In this paper, we first build a brain-eye-computer based object detection system for aerial images under few-shot conditions. This system detects suspicious targets using region proposal networks, evokes the event-related potential (ERP) signal in electroencephalogram (EEG) through the eye-tracking-based slow serial visual presentation (ESSVP) paradigm, and constructs the EEG-image data pairs with eye movement data. Then, an adaptive modality balanced online knowledge distillation (AMBOKD) method is proposed to recognize dim objects with the EEG-image data. AMBOKD fuses EEG and image features using a multi-head attention module, establishing a new modality with comprehensive features. To enhance the performance and robust capability of the fusion modality, simultaneous training and mutual learning between modalities are enabled by end-to-end online knowledge distillation. During the learning process, an adaptive modality balancing module is proposed to ensure multimodal equilibrium by dynamically adjusting the weights of the importance and the training gradients across various modalities. The effectiveness and superiority of our method are demonstrated by comparing it with existing state-of-the-art methods. Additionally, experiments conducted on public datasets and system validations in real-world scenarios demonstrate the reliability and practicality of the proposed system and the designed method.
CLJul 22, 2025Code
Step-Audio 2 Technical ReportBoyong Wu, Chao Yan, Chen Hu et al.
This paper presents Step-Audio 2, an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation. By integrating a latent audio encoder and reasoning-centric reinforcement learning (RL), Step-Audio 2 achieves promising performance in automatic speech recognition (ASR) and audio understanding. To facilitate genuine end-to-end speech conversation, Step-Audio 2 incorporates the generation of discrete audio tokens into language modeling, significantly enhancing its responsiveness to paralinguistic information such as speaking styles and emotions. To effectively leverage the rich textual and acoustic knowledge in real-world data, Step-Audio 2 integrates retrieval-augmented generation (RAG) and is able to call external tools such as web search to mitigate hallucination and audio search to switch timbres. Trained on millions of hours of speech and audio data, Step-Audio 2 delivers intelligence and expressiveness across diverse conversational scenarios. Evaluation results demonstrate that Step-Audio 2 achieves state-of-the-art performance on various audio understanding and conversational benchmarks compared to other open-source and commercial solutions. Please visit https://github.com/stepfun-ai/Step-Audio2 for more information.
LGJul 6, 2022
Distillation to Enhance the Portability of Risk Models Across Institutions with Large Patient Claims DatabaseSteve Nyemba, Chao Yan, Ziqi Zhang et al.
Artificial intelligence, and particularly machine learning (ML), is increasingly developed and deployed to support healthcare in a variety of settings. However, clinical decision support (CDS) technologies based on ML need to be portable if they are to be adopted on a broad scale. In this respect, models developed at one institution should be reusable at another. Yet there are numerous examples of portability failure, particularly due to naive application of ML models. Portability failure can lead to suboptimal care and medical errors, which ultimately could prevent the adoption of ML-based CDS in practice. One specific healthcare challenge that could benefit from enhanced portability is the prediction of 30-day readmission risk. Research to date has shown that deep learning models can be effective at modeling such risk. In this work, we investigate the practicality of model portability through a cross-site evaluation of readmission prediction models. To do so, we apply a recurrent neural network, augmented with self-attention and blended with expert features, to build readmission prediction models for two independent large scale claims datasets. We further present a novel transfer learning technique that adapts the well-known method of born-again network (BAN) training. Our experiments show that direct application of ML models trained at one institution and tested at another institution perform worse than models trained and tested at the same institution. We further show that the transfer learning approach based on the BAN produces models that are better than those trained on just a single institution's data. Notably, this improvement is consistent across both sites and occurs after a single retraining, which illustrates the potential for a cheap and general model transfer mechanism of readmission risk prediction.
44.7CLMay 1
CLEAR: Revealing How Noise and Ambiguity Degrade Reliability in LLMs for MedicineKevin H. Guo, Chao Yan, Avinash Baidya et al.
Medical large language model (LLM) evaluations rely on simplified, exam-style benchmarks that rarely reflect the ambiguity of real-world medical inquiries. We introduce the CLinical Evaluation of Ambiguity and Reliability (CLEAR) framework, which assesses how decision-space presentation, ambiguity, and uncertainty affect LLMs' reasoning on medical benchmarks. CLEAR systematically perturbs (1) the number of plausible answer options, (2) the presence of a ground truth or abstention option, and (3) the semantic framing of answer options. Applying CLEAR on three benchmarks evaluated across 17 LLMs reveals three notable limitations of existing evaluation methods. First, increasing the number of plausible answers degrades a model's ability to identify the correct answer and abstain against incorrect ones. Second, this lack of caution intensifies as the framing of abstention shifts from assertive rejection like "None of the Above" to uncertainty admission like "I don't know" (IDK). Notably, just including IDK in the answer space increases incorrect answer selections. Lastly, we formalize the performance gap between identifying the correct answer and abstaining from incorrect ones as the humility deficit, which worsens with model scale. Our findings reveal limitations in standard medical benchmarks and underscore that scaling alone does not resolve LLM reliability issues.
LGSep 25, 2024
AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned QuantizationYifan Tan, Haoze Wang, Chao Yan et al.
Model quantization has become a crucial technique to address the issues of large memory consumption and long inference times associated with LLMs. Mixed-precision quantization, which distinguishes between important and unimportant parameters, stands out among numerous quantization schemes as it achieves a balance between precision and compression rate. However, existing approaches can only identify important parameters through qualitative analysis and manual experiments without quantitatively analyzing how their importance is determined. We propose a new criterion, so-called 'precision alignment', to build a quantitative framework to holistically evaluate the importance of parameters in mixed-precision quantization. Our observations on floating point addition under various real-world scenarios suggest that two addends should have identical precision, otherwise the information in the higher-precision number will be wasted. Such an observation offers an essential principle to determine the precision of each parameter in matrix multiplication operation. As the first step towards applying the above discovery to large model inference, we develop a dynamic KV-Cache quantization technique to effectively reduce memory access latency. Different from existing quantization approaches that focus on memory saving, this work directly aims to accelerate LLM inference through quantifying floating numbers. The proposed technique attains a 25% saving of memory access and delivers up to 1.3x speedup in the computation of attention in the decoding phase of LLM, with almost no loss of precision.
22.2CLMar 17
Learning When to Sample: Confidence-Aware Self-Consistency for Efficient LLM Chain-of-Thought ReasoningJuming Xiong, Kevin Guo, Congning Ni et al.
Large language models (LLMs) achieve strong reasoning performance through chain-of-thought (CoT) reasoning, yet often generate unnecessarily long reasoning paths that incur high inference cost. Recent self-consistency-based approaches further improve accuracy but require sampling and aggregating multiple reasoning trajectories, leading to substantial additional computational overhead. This paper introduces a confidence-aware decision framework that analyzes a single completed reasoning trajectory to adaptively select between single-path and multi-path reasoning. The framework is trained using sentence-level numeric and linguistic features extracted from intermediate reasoning states in the MedQA dataset and generalizes effectively to MathQA, MedMCQA, and MMLU without additional fine-tuning. Experimental results show that the proposed method maintains accuracy comparable to multi-path baselines while using up to 80\% fewer tokens. These findings demonstrate that reasoning trajectories contain rich signals for uncertainty estimation, enabling a simple, transferable mechanism to balance accuracy and efficiency in LLM reasoning.
22.4CLMar 12
Stop Listening to Me! How Multi-turn Conversations Can Degrade Diagnostic ReasoningKevin H. Guo, Chao Yan, Avinash Baidya et al.
Patients and clinicians are increasingly using chatbots powered by large language models (LLMs) for healthcare inquiries. While state-of-the-art LLMs exhibit high performance on static diagnostic reasoning benchmarks, their efficacy across multi-turn conversations, which better reflect real-world usage, has been understudied. In this paper, we evaluate 17 LLMs across three clinical datasets to investigate how partitioning the decision-space into multiple simpler turns of conversation influences their diagnostic reasoning. Specifically, we develop a "stick-or-switch" evaluation framework to measure model conviction (i.e., defending a correct diagnosis or safe abstention against incorrect suggestions) and flexibility (i.e., recognizing a correct suggestion when it is introduced) across conversations. Our experiments reveal the conversation tax, where multi-turn interactions consistently degrade performance when compared to single-shot baselines. Notably, models frequently abandon initial correct diagnoses and safe abstentions to align with incorrect user suggestions. Additionally, several models exhibit blind switching, failing to distinguish between signal and incorrect suggestions.
52.1ROApr 6
GaussFly: Contrastive Reinforcement Learning for Visuomotor Policies in 3D Gaussian FieldsYuhang Zhang, Mingsheng Li, Yujing Shang et al.
Learning visuomotor policies for Autonomous Aerial Vehicles (AAVs) relying solely on monocular vision is an attractive yet highly challenging paradigm. Existing end-to-end learning approaches directly map high-dimensional RGB observations to action commands, which frequently suffer from low sample efficiency and severe sim-to-real gaps due to the visual discrepancy between simulation and physical domains. To address these long-standing challenges, we propose GaussFly, a novel framework that explicitly decouples representation learning from policy optimization through a cohesive real-to-sim-to-real paradigm. First, to achieve a high-fidelity real-to-sim transition, we reconstruct training scenes using 3D Gaussian Splatting (3DGS) augmented with explicit geometric constraints. Second, to ensure robust sim-to-real transfer, we leverage these photorealistic simulated environments and employ contrastive representation learning to extract compact, noise-resilient latent features from the rendered RGB images. By utilizing this pre-trained encoder to provide low-dimensional feature inputs, the computational burden on the visuomotor policy is significantly reduced while its resistance against visual noise is inherently enhanced. Extensive experiments in simulated and real-world environments demonstrate that GaussFly achieves superior sample efficiency and asymptotic performance compared to baselines. Crucially, it enables robust and zero-shot policy transfer to unseen real-world environments with complex textures, effectively bridging the sim-to-real gap.
LGDec 24, 2025
A Reinforcement Learning Approach to Synthetic Data GenerationNatalia Espinosa-Dice, Nicholas J. Jackson, Chao Yan et al.
Synthetic data generation (SDG) is a promising approach for enabling data sharing in biomedical studies while preserving patient privacy. Yet, state-of-the-art generative models often require large datasets and complex training procedures, limiting their applicability in small-sample settings common in biomedical research. This study aims to develop a more principled and efficient approach to SDG and evaluate its efficacy for biomedical applications. In this work, we reframe SDG as a reinforcement learning (RL) problem and introduce RLSyn, a novel framework that models the data generator as a stochastic policy over patient records and optimizes it using Proximal Policy Optimization with discriminator-derived rewards. We evaluate RLSyn on two biomedical datasets--AI-READI and MIMIC-IV--and benchmark it against state-of-the-art generative adversarial networks (GANs) and diffusion-based methods across extensive privacy, utility, and fidelity evaluations. On MIMIC-IV, RLSyn achieves predictive utility comparable to diffusion models (S2R AUC 0.902 vs 0.906 respectively) while slightly outperforming them in fidelity (NMI 0.001 vs. 0.003; DWD 2.073 vs. 2.797) and achieving comparable, low privacy risk (~0.50 membership inference risk AUC). On the smaller AI-READI dataset, RLSyn again matches diffusion-based utility (S2R AUC 0.873 vs. 0.871), while achieving higher fidelity (NMI 0.001 vs. 0.002; DWD 13.352 vs. 16.441) and significantly lower vulnerability to membership inference attacks (AUC 0.544 vs. 0.601). Both RLSyn and diffusion-based models substantially outperform GANs across utility and fidelity on both datasets. Our results suggest that reinforcement learning provides a principled and effective approach for synthetic biomedical data generation, particularly in data-scarce regimes.
LGMar 2
CausalWrap: Model-Agnostic Causal Constraint Wrappers for Tabular Synthetic DataAmir Asiaee, Zhuohui J. Liang, Chao Yan
Tabular synthetic data generators are typically trained to match observational distributions, which can yield high conventional utility (e.g., column correlations, predictive accuracy) yet poor preservation of structural relations relevant to causal analysis and out-of-distribution (OOD) reasoning. When the downstream use of synthetic data involves causal reasoning -- estimating treatment effects, evaluating policies, or testing mediation pathways -- merely matching the observational distribution is insufficient: structural fidelity and treatment-mechanism preservation become essential. We propose CausalWrap (CW), a model-agnostic wrapper that injects partial causal knowledge (PCK) -- trusted edges, forbidden edges, and qualitative/monotonic constraints -- into any pretrained base generator (GAN, VAE, or diffusion model), without requiring access to its internals. CW learns a lightweight, differentiable post-hoc correction map applied to samples from the base generator, optimized with causal penalty terms under an augmented-Lagrangian schedule. We provide theoretical results connecting penalty-based optimization to constraint satisfaction and relating approximate factorization to joint distributional control. We validate CW on simulated structural causal models (SCMs) with known ground-truth interventions, semi-synthetic causal benchmarks (IHDP and an ACIC-style suite), and a real-world ICU cohort (MIMIC-IV) with expert-elicited partial graphs. CW improves causal fidelity across diverse base generators -- e.g., reducing average treatment effect (ATE) error by up to 63% on ACIC and lifting ATE agreement from 0.00 to 0.38 on the intensive care unit (ICU) cohort -- while largely retaining conventional utility.
CVMay 15, 2024
AD-Aligning: Emulating Human-like Generalization for Cognitive Domain Adaptation in Deep LearningZhuoying Li, Bohua Wan, Cong Mu et al.
Domain adaptation is pivotal for enabling deep learning models to generalize across diverse domains, a task complicated by variations in presentation and cognitive nuances. In this paper, we introduce AD-Aligning, a novel approach that combines adversarial training with source-target domain alignment to enhance generalization capabilities. By pretraining with Coral loss and standard loss, AD-Aligning aligns target domain statistics with those of the pretrained encoder, preserving robustness while accommodating domain shifts. Through extensive experiments on diverse datasets and domain shift scenarios, including noise-induced shifts and cognitive domain adaptation tasks, we demonstrate AD-Aligning's superior performance compared to existing methods such as Deep Coral and ADDA. Our findings highlight AD-Aligning's ability to emulate the nuanced cognitive processes inherent in human perception, making it a promising solution for real-world applications requiring adaptable and robust domain adaptation strategies.
EMMay 14, 2024
Predicting NVIDIA's Next-Day Stock Price: A Comparative Analysis of LSTM, MLP, ARIMA, and ARIMA-GARCH ModelsYiluan Xing, Chao Yan, Cathy Chang Xie
Forecasting stock prices remains a considerable challenge in financial markets, bearing significant implications for investors, traders, and financial institutions. Amid the ongoing AI revolution, NVIDIA has emerged as a key player driving innovation across various sectors. Given its prominence, we chose NVIDIA as the subject of our study.
SDJun 10, 2025
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language ModelAilin Huang, Bingxin Li, Bruce Wang et al.
Large Audio-Language Models (LALMs) have significantly advanced intelligent human-computer interaction, yet their reliance on text-based outputs limits their ability to generate natural speech responses directly, hindering seamless audio interactions. To address this, we introduce Step-Audio-AQAA, a fully end-to-end LALM designed for Audio Query-Audio Answer (AQAA) tasks. The model integrates a dual-codebook audio tokenizer for linguistic and semantic feature extraction, a 130-billion-parameter backbone LLM and a neural vocoder for high-fidelity speech synthesis. Our post-training approach employs interleaved token-output of text and audio to enhance semantic coherence and combines Direct Preference Optimization (DPO) with model merge to improve performance. Evaluations on the StepEval-Audio-360 benchmark demonstrate that Step-Audio-AQAA excels especially in speech control, outperforming the state-of-art LALMs in key areas. This work contributes a promising solution for end-to-end LALMs and highlights the critical role of token-based vocoder in enhancing overall performance for AQAA tasks.
CRMar 6, 2025
A Consensus Privacy Metrics Framework for Synthetic DataLisa Pilgram, Fida K. Dankar, Jorg Drechsler et al.
Synthetic data generation is one approach for sharing individual-level data. However, to meet legislative requirements, it is necessary to demonstrate that the individuals' privacy is adequately protected. There is no consolidated standard for measuring privacy in synthetic data. Through an expert panel and consensus process, we developed a framework for evaluating privacy in synthetic data. Our findings indicate that current similarity metrics fail to measure identity disclosure, and their use is discouraged. For differentially private synthetic data, a privacy budget other than close to zero was not considered interpretable. There was consensus on the importance of membership and attribute disclosure, both of which involve inferring personal information about an individual without necessarily revealing their identity. The resultant framework provides precise recommendations for metrics that address these types of disclosures effectively. Our findings further present specific opportunities for future research that can help with widespread adoption of synthetic data.
CVApr 20, 2024
Optimizing Contrail Detection: A Deep Learning Approach with EfficientNet-b4 EncodingQunwei Lin, Qian Leng, Zhicheng Ding et al.
In the pursuit of environmental sustainability, the aviation industry faces the challenge of minimizing its ecological footprint. Among the key solutions is contrail avoidance, targeting the linear ice-crystal clouds produced by aircraft exhaust. These contrails exacerbate global warming by trapping atmospheric heat, necessitating precise segmentation and comprehensive analysis of contrail images to gauge their environmental impact. However, this segmentation task is complex due to the varying appearances of contrails under different atmospheric conditions and potential misalignment issues in predictive modeling. This paper presents an innovative deep-learning approach utilizing the efficient net-b4 encoder for feature extraction, seamlessly integrating misalignment correction, soft labeling, and pseudo-labeling techniques to enhance the accuracy and efficiency of contrail detection in satellite imagery. The proposed methodology aims to redefine contrail image analysis and contribute to the objectives of sustainable aviation by providing a robust framework for precise contrail detection and analysis in satellite imagery, thus aiding in the mitigation of aviation's environmental impact.
CVJul 1, 2025
TopoStreamer: Temporal Lane Segment Topology Reasoning in Autonomous DrivingYiming Yang, Yueru Luo, Bingkun He et al.
Lane segment topology reasoning constructs a comprehensive road network by capturing the topological relationships between lane segments and their semantic types. This enables end-to-end autonomous driving systems to perform road-dependent maneuvers such as turning and lane changing. However, the limitations in consistent positional embedding and temporal multiple attribute learning in existing methods hinder accurate roadnet reconstruction. To address these issues, we propose TopoStreamer, an end-to-end temporal perception model for lane segment topology reasoning. Specifically, TopoStreamer introduces three key improvements: streaming attribute constraints, dynamic lane boundary positional encoding, and lane segment denoising. The streaming attribute constraints enforce temporal consistency in both centerline and boundary coordinates, along with their classifications. Meanwhile, dynamic lane boundary positional encoding enhances the learning of up-to-date positional information within queries, while lane segment denoising helps capture diverse lane segment patterns, ultimately improving model performance. Additionally, we assess the accuracy of existing models using a lane boundary classification metric, which serves as a crucial measure for lane-changing scenarios in autonomous driving. On the OpenLane-V2 dataset, TopoStreamer demonstrates significant improvements over state-of-the-art methods, achieving substantial performance gains of +3.0% mAP in lane segment perception and +1.7% OLS in centerline perception tasks.
LGMay 10, 2025
An $\tilde{O}$ptimal Differentially Private Learner for Concept Classes with VC Dimension 1Chao Yan
We present the first nearly optimal differentially private PAC learner for any concept class with VC dimension 1 and Littlestone dimension $d$. Our algorithm achieves the sample complexity of $\tilde{O}_{\varepsilon,δ,α,δ}(\log^* d)$, nearly matching the lower bound of $Ω(\log^* d)$ proved by Alon et al. [STOC19]. Prior to our work, the best known upper bound is $\tilde{O}(VC\cdot d^5)$ for general VC classes, as shown by Ghazi et al. [STOC21].
LGFeb 5
Private Prediction via ShrinkageChao Yan
We study differentially private prediction introduced by Dwork and Feldman (COLT 2018): an algorithm receives one labeled sample set $S$ and then answers a stream of unlabeled queries while the output transcript remains $(\varepsilon,δ)$-differentially private with respect to $S$. Standard composition yields a $\sqrt{T}$ dependence for $T$ queries. We show that this dependence can be reduced to polylogarithmic in $T$ in streaming settings. For an oblivious online adversary and any concept class $\mathcal{C}$, we give a private predictor that answers $T$ queries with $|S|= \tilde{O}(VC(\mathcal{C})^{3.5}\log^{3.5}T)$ labeled examples. For an adaptive online adversary and halfspaces over $\mathbb{R}^d$, we obtain $|S|=\tilde{O}\left(d^{5.5}\log T\right)$.
LGFeb 10
PRISM: Differentially Private Synthetic Data with Structure-Aware Budget Allocation for PredictionAmir Asiaee, Chao Yan, Zachary B. Abrams et al.
Differential privacy (DP) provides a mathematical guarantee limiting what an adversary can learn about any individual from released data. However, achieving this protection typically requires adding noise, and noise can accumulate when many statistics are measured. Existing DP synthetic data methods treat all features symmetrically, spreading noise uniformly even when the data will serve a specific prediction task. We develop a prediction-centric approach operating in three regimes depending on available structural knowledge. In the causal regime, when the causal parents of $Y$ are known and distribution shift is expected, we target the parents for robustness. In the graphical regime, when a Bayesian network structure is available and the distribution is stable, the Markov blanket of $Y$ provides a sufficient feature set for optimal prediction. In the predictive regime, when no structural knowledge exists, we select features via differentially private methods without claiming to recover causal or graphical structure. We formalize this as PRISM, a mechanism that (i) identifies a predictive feature subset according to the appropriate regime, (ii) constructs targeted summary statistics, (iii) allocates budget to minimize an upper bound on prediction error, and (iv) synthesizes data via graphical-model inference. We prove end-to-end privacy guarantees and risk bounds. Empirically, task-aware allocation improves prediction accuracy compared to generic synthesizers. Under distribution shift, targeting causal parents achieves AUC $\approx 0.73$ while correlation-based selection collapses to chance ($\approx 0.49$).
LGFeb 10
Risk-Equalized Differentially Private Synthetic Data: Protecting Outliers by Controlling Record-Level InfluenceAmir Asiaee, Chao Yan, Zachary B. Abrams et al.
When synthetic data is released, some individuals are harder to protect than others. A patient with a rare disease combination or a transaction with unusual characteristics stands out from the crowd. Differential privacy provides worst-case guarantees, but empirical attacks -- particularly membership inference -- succeed far more often against such outliers, especially under moderate privacy budgets and with auxiliary information. This paper introduces risk-equalized DP synthesis, a framework that prioritizes protection for high-risk records by reducing their influence on the learned generator. The mechanism operates in two stages: first, a small privacy budget estimates each record's "outlierness"; second, a DP learning procedure weights each record inversely to its risk score. Under Gaussian mechanisms, a record's privacy loss is proportional to its influence on the output -- so deliberately shrinking outliers' contributions yields tighter per-instance privacy bounds for precisely those records that need them most. We prove end-to-end DP guarantees via composition and derive closed-form per-record bounds for the synthesis stage (the scoring stage adds a uniform per-record term). Experiments on simulated data with controlled outlier injection show that risk-weighting substantially reduces membership inference success against high-outlierness records; ablations confirm that targeting -- not random downweighting -- drives the improvement. On real-world benchmarks (Breast Cancer, Adult, German Credit), gains are dataset-dependent, highlighting the interplay between scorer quality and synthesis pipeline.
ROOct 7, 2025
Oracle-Guided Masked Contrastive Reinforcement Learning for Visuomotor PoliciesYuhang Zhang, Jiaping Xiao, Chao Yan et al.
A prevailing approach for learning visuomotor policies is to employ reinforcement learning to map high-dimensional visual observations directly to action commands. However, the combination of high-dimensional visual inputs and agile maneuver outputs leads to long-standing challenges, including low sample efficiency and significant sim-to-real gaps. To address these issues, we propose Oracle-Guided Masked Contrastive Reinforcement Learning (OMC-RL), a novel framework designed to improve the sample efficiency and asymptotic performance of visuomotor policy learning. OMC-RL explicitly decouples the learning process into two stages: an upstream representation learning stage and a downstream policy learning stage. In the upstream stage, a masked Transformer module is trained with temporal modeling and contrastive learning to extract temporally-aware and task-relevant representations from sequential visual inputs. After training, the learned encoder is frozen and used to extract visual representations from consecutive frames, while the Transformer module is discarded. In the downstream stage, an oracle teacher policy with privileged access to global state information supervises the agent during early training to provide informative guidance and accelerate early policy learning. This guidance is gradually reduced to allow independent exploration as training progresses. Extensive experiments in simulated and real-world environments demonstrate that OMC-RL achieves superior sample efficiency and asymptotic policy performance, while also improving generalization across diverse and perceptually complex scenarios.
LGJul 22, 2025
Satellite Connectivity Prediction for Fast-Moving PlatformsChao Yan, Babak Mafakheri
Satellite connectivity is gaining increased attention as the demand for seamless internet access, especially in transportation and remote areas, continues to grow. For fast-moving objects such as aircraft, vehicles, or trains, satellite connectivity is critical due to their mobility and frequent presence in areas without terrestrial coverage. Maintaining reliable connectivity in these cases requires frequent switching between satellite beams, constellations, or orbits. To enhance user experience and address challenges like long switching times, Machine Learning (ML) algorithms can analyze historical connectivity data and predict network quality at specific locations. This allows for proactive measures, such as network switching before connectivity issues arise. In this paper, we analyze a real dataset of communication between a Geostationary Orbit (GEO) satellite and aircraft over multiple flights, using ML to predict signal quality. Our prediction model achieved an F1 score of 0.97 on the test data, demonstrating the accuracy of machine learning in predicting signal quality during flight. By enabling seamless broadband service, including roaming between different satellite constellations and providers, our model addresses the need for real-time predictions of signal quality. This approach can further be adapted to automate satellite and beam-switching mechanisms to improve overall communication efficiency. The model can also be retrained and applied to any moving object with satellite connectivity, using customized datasets, including connected vehicles and trains.
CYApr 1, 2025
Role and Use of Race in AI/ML Models Related to HealthMartin C. Were, Ang Li, Bradley A. Malin et al.
The role and use of race within health-related artificial intelligence and machine learning (AI/ML) models has sparked increasing attention and controversy. Despite the complexity and breadth of related issues, a robust and holistic framework to guide stakeholders in their examination and resolution remains lacking. This perspective provides a broad-based, systematic, and cross-cutting landscape analysis of race-related challenges, structured around the AI/ML lifecycle and framed through "points to consider" to support inquiry and decision-making.
LGFeb 27, 2025
Towards Statistical Factuality Guarantee for Large Vision-Language ModelsZhuohang Li, Chao Yan, Nicholas J. Jackson et al.
Advancements in Large Vision-Language Models (LVLMs) have demonstrated promising performance in a variety of vision-language tasks involving image-conditioned free-form text generation. However, growing concerns about hallucinations in LVLMs, where the generated text is inconsistent with the visual context, are becoming a major impediment to deploying these models in applications that demand guaranteed reliability. In this paper, we introduce a framework to address this challenge, ConfLVLM, which is grounded on conformal prediction to achieve finite-sample distribution-free statistical guarantees on the factuality of LVLM output. This framework treats an LVLM as a hypothesis generator, where each generated text detail (or claim) is considered an individual hypothesis. It then applies a statistical hypothesis testing procedure to verify each claim using efficient heuristic uncertainty measures to filter out unreliable claims before returning any responses to users. We conduct extensive experiments covering three representative application domains, including general scene understanding, medical radiology report generation, and document understanding. Remarkably, ConfLVLM reduces the error rate of claims generated by LLaVa-1.5 for scene descriptions from 87.8\% to 10.0\% by filtering out erroneous claims with a 95.3\% true positive rate. Our results further demonstrate that ConfLVLM is highly flexible, and can be applied to any black-box LVLMs paired with any uncertainty measure for any image-conditioned free-form text generation task while providing a rigorous guarantee on controlling the risk of hallucination.
IVDec 28, 2024
Implementing Trust in Non-Small Cell Lung Cancer Diagnosis with a Conformalized Uncertainty-Aware AI Framework in Whole-Slide ImagesXiaoge Zhang, Tao Wang, Chao Yan et al.
Ensuring trustworthiness is fundamental to the development of artificial intelligence (AI) that is considered societally responsible, particularly in cancer diagnostics, where a misdiagnosis can have dire consequences. Current digital pathology AI models lack systematic solutions to address trustworthiness concerns arising from model limitations and data discrepancies between model deployment and development environments. To address this issue, we developed TRUECAM, a framework designed to ensure both data and model trustworthiness in non-small cell lung cancer subtyping with whole-slide images. TRUECAM integrates 1) a spectral-normalized neural Gaussian process for identifying out-of-scope inputs and 2) an ambiguity-guided elimination of tiles to filter out highly ambiguous regions, addressing data trustworthiness, as well as 3) conformal prediction to ensure controlled error rates. We systematically evaluated the framework across multiple large-scale cancer datasets, leveraging both task-specific and foundation models, illustrate that an AI model wrapped with TRUECAM significantly outperforms models that lack such guidance, in terms of classification accuracy, robustness, interpretability, and data efficiency, while also achieving improvements in fairness. These findings highlight TRUECAM as a versatile wrapper framework for digital pathology AI models with diverse architectural designs, promoting their responsible and effective applications in real-world settings.
CLJun 19, 2024
Transferable speech-to-text large language model alignment moduleBoyong Wu, Chao Yan, Haoran Pu
By leveraging the power of Large Language Models(LLMs) and speech foundation models, state of the art speech-text bimodal works can achieve challenging tasks like spoken translation(ST) and question answering(SQA) altogether with much simpler architectures. In this paper, we utilize the capability of Whisper encoder and pre-trained Yi-6B. Empirical results reveal that modal alignment can be achieved with one layer module and hundred hours of speech-text multitask corpus. We further swap the Yi-6B with human preferences aligned version of Yi-6B-Chat during inference, and discover that the alignment capability is applicable as well. In addition, the alignment subspace revealed by singular value decomposition(SVD) also implies linear alignment subspace is sparse, which leaves the possibility to concatenate other features like voice-print or video to expand modality.
LGMay 16, 2023
Private Everlasting PredictionMoni Naor, Kobbi Nissim, Uri Stemmer et al.
A private learner is trained on a sample of labeled points and generates a hypothesis that can be used for predicting the labels of newly sampled points while protecting the privacy of the training set [Kasiviswannathan et al., FOCS 2008]. Research uncovered that private learners may need to exhibit significantly higher sample complexity than non-private learners as is the case with, e.g., learning of one-dimensional threshold functions [Bun et al., FOCS 2015, Alon et al., STOC 2019]. We explore prediction as an alternative to learning. Instead of putting forward a hypothesis, a predictor answers a stream of classification queries. Earlier work has considered a private prediction model with just a single classification query [Dwork and Feldman, COLT 2018]. We observe that when answering a stream of queries, a predictor must modify the hypothesis it uses over time, and, furthermore, that it must use the queries for this modification, hence introducing potential privacy risks with respect to the queries themselves. We introduce private everlasting prediction taking into account the privacy of both the training set and the (adaptively chosen) queries made to the predictor. We then present a generic construction of private everlasting predictors in the PAC model. The sample complexity of the initial training sample in our construction is quadratic (up to polylog factors) in the VC dimension of the concept class. Our construction allows prediction for all concept classes with finite VC dimension, and in particular threshold functions with constant size initial training sample, even when considered over infinite domains, whereas it is known that the sample complexity of privately learning threshold functions must grow as a function of the domain size and hence is impossible for infinite domains.
CRDec 19, 2021
Pure Differential Privacy from Secure IntermediariesAlbert Cheu, Chao Yan
Recent work in differential privacy has explored the prospect of combining local randomization with a secure intermediary. Specifically, there are a variety of protocols in the secure shuffle model (where an intermediary randomly permutes messages) as well as the secure aggregation model (where an intermediary adds messages). Most of these protocols are limited to approximate differential privacy. An exception is the shuffle protocol by Ghazi, Golowich, Kumar, Manurangsi, Pagh, and Velingker (arXiv:2002.01919): it computes bounded sums under pure differential privacy. Its additive error is $\tilde{O}(1/\varepsilon^{3/2})$, where $\varepsilon$ is the privacy parameter. In this work, we give a new protocol that ensures $O(1/\varepsilon)$ error under pure differential privacy. We also show how to use it to test uniformity of distributions over $[d]$. The tester's sample complexity has an optimal dependence on $d$. Our work relies on a novel class of secure intermediaries which are of independent interest.
CLOct 16, 2021
Seeking Patterns, Not just Memorizing Procedures: Contrastive Learning for Solving Math Word ProblemsZhongli Li, Wenxuan Zhang, Chao Yan et al.
Math Word Problem (MWP) solving needs to discover the quantitative relationships over natural language narratives. Recent work shows that existing models memorize procedures from context and rely on shallow heuristics to solve MWPs. In this paper, we look at this issue and argue that the cause is a lack of overall understanding of MWP patterns. We first investigate how a neural network understands patterns only from semantics, and observe that, if the prototype equations are the same, most problems get closer representations and those representations apart from them or close to other prototypes tend to produce wrong solutions. Inspired by it, we propose a contrastive learning approach, where the neural network perceives the divergence of patterns. We collect contrastive examples by converting the prototype equation into a tree and seeking similar tree structures. The solving model is trained with an auxiliary objective on the collected examples, resulting in the representations of problems with similar prototypes being pulled closer. We conduct experiments on the Chinese dataset Math23k and the English dataset MathQA. Our method greatly improves the performance in monolingual and multilingual settings.
LGJul 28, 2021
Fast Wireless Sensor Anomaly Detection based on Data Stream in Edge Computing Enabled Smart GreenhouseYihong Yang, Sheng Ding, Yuwen Liu et al.
Edge computing enabled smart greenhouse is a representative application of Internet of Things technology, which can monitor the environmental information in real time and employ the information to contribute to intelligent decision-making. In the process, anomaly detection for wireless sensor data plays an important role. However, traditional anomaly detection algorithms originally designed for anomaly detection in static data have not properly considered the inherent characteristics of data stream produced by wireless sensor such as infiniteness, correlations and concept drift, which may pose a considerable challenge on anomaly detection based on data stream, and lead to low detection accuracy and efficiency. First, data stream usually generates quickly which means that it is infinite and enormous, so any traditional off-line anomaly detection algorithm that attempts to store the whole dataset or to scan the dataset multiple times for anomaly detection will run out of memory space. Second, there exist correlations among different data streams, which traditional algorithms hardly consider. Third, the underlying data generation process or data distribution may change over time. Thus, traditional anomaly detection algorithms with no model update will lose their effects. Considering these issues, a novel method (called DLSHiForest) on basis of Locality-Sensitive Hashing and time window technique in this paper is proposed to solve these problems while achieving accurate and efficient detection. Comprehensive experiments are executed using real-world agricultural greenhouse dataset to demonstrate the feasibility of our approach. Experimental results show that our proposal is practicable in addressing challenges of traditional anomaly detection while ensuring accuracy and efficiency.
CRJun 21, 2021
Dynamically Adjusting Case Reporting Policy to Maximize Privacy and Utility in the Face of a PandemicJ. Thomas Brown, Chao Yan, Weiyi Xia et al.
Supporting public health research and the public's situational awareness during a pandemic requires continuous dissemination of infectious disease surveillance data. Legislation, such as the Health Insurance Portability and Accountability Act of 1996 (HIPAA) and recent state-level regulations, permits sharing de-identified person-level data; however, current de-identification approaches are limited. namely, they are inefficient, relying on retrospective disclosure risk assessments, and do not flex with changes in infection rates or population demographics over time. In this paper, we introduce a framework to dynamically adapt de-identification for near-real time sharing of person-level surveillance data. The framework leverages a simulation mechanism, capable of application at any geographic level, to forecast the re-identification risk of sharing the data under a wide range of generalization policies. The estimates inform weekly, prospective policy selection to maintain the proportion of records corresponding to a group size less than 11 (PK11) at or below 0.1. Fixing the policy at the start of each week facilitates timely dataset updates and supports sharing granular date information. We use August 2020 through October 2021 case data from Johns Hopkins University and the Centers for Disease Control and Prevention to demonstrate the framework's effectiveness in maintaining the PK!1 threshold of 0.01. When sharing COVID-19 county-level case data across all US counties, the framework's approach meets the threshold for 96.2% of daily data releases, while a policy based on current de-identification techniques meets the threshold for 32.3%. Periodically adapting the data publication policies preserves privacy while enhancing public health utility through timely updates and sharing epidemiologically critical features.
LGApr 9, 2021
Blending Knowledge in Deep Recurrent Networks for Adverse Event Prediction at Hospital DischargePrithwish Chakraborty, James Codella, Piyush Madan et al.
Deep learning architectures have an extremely high-capacity for modeling complex data in a wide variety of domains. However, these architectures have been limited in their ability to support complex prediction problems using insurance claims data, such as readmission at 30 days, mainly due to data sparsity issue. Consequently, classical machine learning methods, especially those that embed domain knowledge in handcrafted features, are often on par with, and sometimes outperform, deep learning approaches. In this paper, we illustrate how the potential of deep learning can be achieved by blending domain knowledge within deep learning architectures to predict adverse events at hospital discharge, including readmissions. More specifically, we introduce a learning architecture that fuses a representation of patient data computed by a self-attention based recurrent neural network, with clinically relevant features. We conduct extensive experiments on a large claims dataset and show that the blended method outperforms the standard machine learning approaches.
LGMar 29, 2021
The Sample Complexity of Distribution-Free Parity Learning in the Robust Shuffle ModelKobbi Nissim, Chao Yan
We provide a lowerbound on the sample complexity of distribution-free parity learning in the realizable case in the shuffle model of differential privacy. Namely, we show that the sample complexity of learning $d$-bit parity functions is $Ω(2^{d/2})$. Our result extends a recent similar lowerbound on the sample complexity of private agnostic learning of parity functions in the shuffle model by Cheu and Ullman. We also sketch a simple shuffle model protocol demonstrating that our results are tight up to $poly(d)$ factors.
CLMar 26, 2021
BART based semantic correction for Mandarin automatic speech recognition systemYun Zhao, Xuerui Yang, Jinchao Wang et al.
Although automatic speech recognition (ASR) systems achieved significantly improvements in recent years, spoken language recognition error occurs which can be easily spotted by human beings. Various language modeling techniques have been developed on post recognition tasks like semantic correction. In this paper, we propose a Transformer based semantic correction method with pretrained BART initialization, Experiments on 10000 hours Mandarin speech dataset show that character error rate (CER) can be effectively reduced by 21.7% relatively compared to our baseline ASR system. Expert evaluation demonstrates that actual improvement of our model surpasses what CER indicates.
SYJan 20, 2021
Flocking and Collision Avoidance for a Dynamic Squad of Fixed-Wing UAVs Using Deep Reinforcement LearningChao Yan, Xiaojia Xiang, Chang Wang et al.
Developing the flocking behavior for a dynamic squad of fixed-wing UAVs is still a challenge due to kinematic complexity and environmental uncertainty. In this paper, we deal with the decentralized flocking and collision avoidance problem through deep reinforcement learning (DRL). Specifically, we formulate a decentralized DRL-based decision making framework from the perspective of every follower, where a collision avoidance mechanism is integrated into the flocking controller. Then, we propose a novel reinforcement learning algorithm PS-CACER for training a shared control policy for all the followers. Besides, we design a plug-n-play embedding module based on convolutional neural networks and the attention mechanism. As a result, the variable-length system state can be encoded into a fixed-length embedding vector, which makes the learned DRL policy independent with the number and the order of followers. Finally, numerical simulation results demonstrate the effectiveness of the proposed method, and the learned policies can be directly transferred to semi-physical simulation without any parameter finetuning.
RONov 16, 2020
Time-Efficient Mars Exploration of Simultaneous Coverage and Charging with Multiple DronesYuan Chang, Chao Yan, Xingyu Liu et al.
This paper presents a time-efficient scheme for Mars exploration by the cooperation of multiple drones and a rover. To maximize effective coverage of the Mars surface in the long run, a comprehensive framework has been developed with joint consideration for limited energy, sensor model, communication range and safety radius, which we call TIME-SC2 (TIme-efficient Mars Exploration of Simultaneous Coverage and Charging). First, we propose a multi-drone coverage control algorithm by leveraging emerging deep reinforcement learning and design a novel information map to represent dynamic system states. Second, we propose a near-optimal charging scheduling algorithm to navigate each drone to an individual charging slot, and we have proven that there always exists feasible solutions. The attractiveness of this framework not only resides on its ability to maximize exploration efficiency, but also on its high autonomy that has greatly reduced the non-exploring time. Extensive simulations have been conducted to demonstrate the remarkable performance of TIME-SC2 in terms of time-efficiency, adaptivity and flexibility.
LGMar 17, 2020
Generating Electronic Health Records with Multiple Data Types and ConstraintsChao Yan, Ziqi Zhang, Steve Nyemba et al.
Sharing electronic health records (EHRs) on a large scale may lead to privacy intrusions. Recent research has shown that risks may be mitigated by simulating EHRs through generative adversarial network (GAN) frameworks. Yet the methods developed to date are limited because they 1) focus on generating data of a single type (e.g., diagnosis codes), neglecting other data types (e.g., demographics, procedures or vital signs) and 2) do not represent constraints between features. In this paper, we introduce a method to simulate EHRs composed of multiple data types by 1) refining the GAN model, 2) accounting for feature constraints, and 3) incorporating key utility measures for such generation tasks. Our analysis with over $770,000$ EHRs from Vanderbilt University Medical Center demonstrates that the new model achieves higher performance in terms of retaining basic statistics, cross-feature correlations, latent structural properties, feature constraints and associated patterns from real data, without sacrificing privacy.
CRJun 20, 2019
Finding Needles in a Moving Haystack: Prioritizing Alerts with Adversarial Reinforcement LearningLiang Tong, Aron Laszka, Chao Yan et al.
Detection of malicious behavior is a fundamental problem in security. One of the major challenges in using detection systems in practice is in dealing with an overwhelming number of alerts that are triggered by normal behavior (the so-called false positives), obscuring alerts resulting from actual malicious activity. While numerous methods for reducing the scope of this issue have been proposed, ultimately one must still decide how to prioritize which alerts to investigate, and most existing prioritization methods are heuristic, for example, based on suspiciousness or priority scores. We introduce a novel approach for computing a policy for prioritizing alerts using adversarial reinforcement learning. Our approach assumes that the attackers know the full state of the detection system and dynamically choose an optimal attack as a function of this state, as well as of the alert prioritization policy. The first step of our approach is to capture the interaction between the defender and attacker in a game theoretic model. To tackle the computational complexity of solving this game to obtain a dynamic stochastic alert prioritization policy, we propose an adversarial reinforcement learning framework. In this framework, we use neural reinforcement learning to compute best response policies for both the defender and the adversary to an arbitrary stochastic policy of the other. We then use these in a double-oracle framework to obtain an approximate equilibrium of the game, which in turn yields a robust stochastic policy for the defender. Extensive experiments using case studies in fraud and intrusion detection demonstrate that our approach is effective in creating robust alert prioritization policies.
CRMay 16, 2019
To Warn or Not to Warn: Online Signaling in Audit GamesChao Yan, Haifeng Xu, Yevgeniy Vorobeychik et al.
Routine operational use of sensitive data is often governed by law and regulation. For instance, in the medical domain, there are various statues at the state and federal level that dictate who is permitted to work with patients' records and under what conditions. To screen for potential privacy breaches, logging systems are usually deployed to trigger alerts whenever suspicious access is detected. However, such mechanisms are often inefficient because 1) the vast majority of triggered alerts are false positives, 2) small budgets make it unlikely that a real attack will be detected, and 3) attackers can behave strategically, such that traditional auditing mechanisms cannot easily catch them. To improve efficiency, information systems may invoke signaling, so that whenever a suspicious access request occurs, the system can, in real time, warn the user that the access may be audited. Then, at the close of a finite period, a selected subset of suspicious accesses are audited. This gives rise to an online problem in which one needs to determine 1) whether a warning should be triggered and 2) the likelihood that the data request event will be audited. In this paper, we formalize this auditing problem as a Signaling Audit Game (SAG), in which we model the interactions between an auditor and an attacker in the context of signaling and the usability cost is represented as a factor of the auditor's payoff. We study the properties of its Stackelberg equilibria and develop a scalable approach to compute its solution. We show that a strategic presentation of warnings adds value in that SAGs realize significantly higher utility for the auditor than systems without signaling. We illustrate the value of the proposed auditing model and the consistency of its advantages over existing baseline methods.
AIJan 22, 2018
Get Your Workload in Order: Game Theoretic Prioritization of Database AuditingChao Yan, Bo Li, Yevgeniy Vorobeychik et al.
For enhancing the privacy protections of databases, where the increasing amount of detailed personal data is stored and processed, multiple mechanisms have been developed, such as audit logging and alert triggers, which notify administrators about suspicious activities; however, the two main limitations in common are: 1) the volume of such alerts is often substantially greater than the capabilities of resource-constrained organizations, and 2) strategic attackers may disguise their actions or carefully choosing which records they touch, making incompetent the statistical detection models. For solving them, we introduce a novel approach to database auditing that explicitly accounts for adversarial behavior by 1) prioritizing the order in which types of alerts are investigated and 2) providing an upper bound on how much resource to allocate for each type. We model the interaction between a database auditor and potential attackers as a Stackelberg game in which the auditor chooses an auditing policy and attackers choose which records to target. A corresponding approach combining linear programming, column generation, and heuristic search is proposed to derive an auditing policy. For testing the policy-searching performance, a publicly available credit card application dataset are adopted, on which it shows that our methods produce high-quality mixed strategies as database audit policies, and our general approach significantly outperforms non-game-theoretic baselines.