CR Jul 8, 2022
Online Evasion Attacks on Recurrent Models: The Power of Hallucinating the Future
Byunggill Joe, Insik Shin, Jihun Hamm
Recurrent models are frequently used in online tasks such as autonomous driving, and a comprehensive study of their vulnerability is called for. Existing research is limited in generality: it either addresses only application-specific vulnerabilities or makes implausible assumptions, such as knowledge of future inputs. In this paper, we present a general attack framework for online tasks that incorporates the constraints unique to the online setting, as opposed to offline tasks. Our framework is versatile in that it covers time-varying adversarial objectives and various optimization constraints, allowing for a comprehensive study of robustness. Using the framework, we also present a novel white-box attack called Predictive Attack that 'hallucinates' the future. The attack achieves 98 percent of the performance of the ideal but infeasible clairvoyant attack on average. We validate the effectiveness of the proposed framework and attacks through various experiments.
HC Dec 4, 2023
Explore, Select, Derive, and Recall: Augmenting LLM with Human-like Memory for Mobile Task Automation
Sunjae Lee, Junyoung Choi, Jungjae Lee et al.
The advent of large language models (LLMs) has opened up new opportunities in the field of mobile task automation. Their superior language understanding and reasoning capabilities allow users to automate complex and repetitive tasks. However, due to the inherent unreliability and high operational cost of LLMs, their practical applicability is quite limited. To address these issues, this paper introduces MobileGPT, an innovative LLM-based mobile task automator equipped with a human-like app memory. MobileGPT emulates the cognitive process of humans interacting with a mobile app: explore, select, derive, and recall. This approach allows for more precise and efficient learning of a task's procedure by breaking it down into smaller, modular sub-tasks that can be re-used, re-arranged, and adapted for various objectives. We implement MobileGPT using online LLM services (GPT-3.5 and GPT-4) and evaluate its performance on a dataset of 185 tasks across 18 mobile apps. The results indicate that MobileGPT can automate and learn new tasks with 82.7% accuracy, and is able to adapt them to different contexts with near-perfect (98.75%) accuracy while reducing latency and cost by 62.5% and 68.8%, respectively, compared to the GPT-4 powered baseline.
HC Mar 24, 2025
VeriSafe Agent: Safeguarding Mobile GUI Agent via Logic-based Action Verification
Jungjae Lee, Dongjae Lee, Chihun Choi et al.
Large Foundation Models (LFMs) have unlocked new possibilities in human-computer interaction, particularly with the rise of mobile Graphical User Interface (GUI) Agents capable of interacting with mobile GUIs. These agents allow users to automate complex mobile tasks through simple natural language instructions. However, the inherent probabilistic nature of LFMs, coupled with the ambiguity and context-dependence of mobile tasks, makes LFM-based automation unreliable and prone to errors. To address this critical challenge, we introduce VeriSafe Agent (VSA): a formal verification system that serves as a logically grounded safeguard for Mobile GUI Agents. VSA deterministically ensures that an agent's actions strictly align with user intent before they are executed. At its core, VSA introduces a novel autoformalization technique that translates natural language user instructions into a formally verifiable specification. This enables runtime, rule-based verification of the agent's actions, detecting erroneous actions even before they take effect. To the best of our knowledge, VSA is the first attempt to bring the rigor of formal verification to GUI agents, bridging the gap between LFM-driven actions and formal software verification. We implement VSA using off-the-shelf LFM services (GPT-4o) and evaluate its performance on 300 user instructions across 18 widely used mobile apps. The results demonstrate that VSA achieves 94.33%-98.33% accuracy in verifying agent actions, outperforming existing LFM-based verification methods by 30.00%-16.33%, and increases the GUI agent's task completion rate by 90%-130%.
AI Dec 14, 2025
Modular and Multi-Path-Aware Offline Benchmarking for Mobile GUI Agents
Youngmin Im, Byeongung Jo, Jaeyoung Wi et al.
Mobile GUI Agents, AI agents capable of interacting with mobile applications on behalf of users, have the potential to transform human-computer interaction. However, current evaluation practices for GUI agents face two fundamental limitations. First, they rely on either single-path offline benchmarks or live online benchmarks. Offline benchmarks using static, single-path annotated datasets unfairly penalize valid alternative actions, while online benchmarks suffer from poor scalability and reproducibility due to the dynamic and unpredictable nature of live evaluation. Second, existing benchmarks treat agents as monolithic black boxes, overlooking the contributions of individual components, which often leads to unfair comparisons or obscures key performance bottlenecks. To address these limitations, we present MobiBench, the first modular and multi-path-aware offline benchmarking framework for mobile GUI agents that enables high-fidelity, scalable, and reproducible evaluation entirely in offline settings. Our experiments demonstrate that MobiBench achieves 94.72 percent agreement with human evaluators, on par with carefully engineered online benchmarks, while preserving the scalability and reproducibility of static offline benchmarks. Furthermore, our comprehensive module-level analysis uncovers several key insights, including a systematic evaluation of diverse techniques used in mobile GUI agents, optimal module configurations across model scales, the inherent limitations of current LFMs, and actionable guidelines for designing more capable and cost-efficient mobile agents.
HC Aug 4, 2025
mCardiacDx: Radar-Driven Contactless Monitoring and Diagnosis of Arrhythmia
Arjun Kumar, Noppanat Wadlom, Jaeheon Kwak et al.
Arrhythmia is a common cardiac condition that can precipitate severe complications without timely intervention. While continuous monitoring is essential for timely diagnosis, conventional approaches such as electrocardiograms and wearable devices are constrained by their reliance on specialized medical expertise and by the patient discomfort caused by their contact-based nature. Existing contactless monitoring systems, primarily designed for healthy subjects, face significant challenges when analyzing reflected signals from arrhythmia patients due to disrupted spatial stability and temporal consistency. In this paper, we introduce mCardiacDx, a radar-driven contactless system that accurately analyzes reflected signals and reconstructs heart pulse waveforms (HPWs) for arrhythmia monitoring and diagnosis. The key contributions of our work include a novel precise target localization (PTL) technique that locates reflected signals despite spatial disruptions, and an encoder-decoder model that transforms these signals into HPWs, addressing temporal inconsistencies. Our evaluation on a large dataset of healthy subjects and arrhythmia patients shows that both mCardiacDx and PTL outperform state-of-the-art approaches in arrhythmia monitoring and diagnosis, while also demonstrating improved performance in healthy subjects.
LG Jun 15, 2021
Machine Learning with Electronic Health Records is vulnerable to Backdoor Trigger Attacks
Byunggill Joe, Akshay Mehra, Insik Shin et al.
Electronic Health Records (EHRs) provide a wealth of information from which machine learning algorithms can predict patient outcomes, including diagnostic information, vital signals, lab tests, drug administration, and demographic information. Machine learning models can be built, for example, to evaluate patients based on their predicted mortality or morbidity and to predict required resources for efficient resource management in hospitals. In this paper, we demonstrate that an attacker can easily and selectively manipulate machine learning predictions on EHRs at test time through backdoor attacks that poison the training data. Furthermore, the poison we create has features statistically similar to the original data, making it hard to detect, and can attack multiple machine learning models without any knowledge of the models. With less than 5% of the raw EHR data poisoned, we achieve average attack success rates of 97% on mortality prediction tasks with the MIMIC-III database against Logistic Regression, Multilayer Perceptron, and Long Short-Term Memory models simultaneously.
LG Dec 7, 2020
Learning to Separate Clusters of Adversarial Representations for Robust Adversarial Detection
Byunggill Joe, Jihun Hamm, Sung Ju Hwang et al.
Although deep neural networks have shown promising performance on various tasks, they are susceptible to incorrect predictions induced by imperceptibly small perturbations in inputs. A large number of previous works have proposed methods to detect adversarial attacks. Yet most of these detectors are ineffective against adaptive whitebox attacks, in which the adversary knows both the model and the defense method. In this paper, we propose a new probabilistic adversarial detector motivated by the recently introduced notion of non-robust features. We consider non-robust features to be a common property of adversarial examples, and we deduce that it is possible to find a cluster in representation space corresponding to this property. This idea leads us to estimate the probability distribution of adversarial representations in a separate cluster and to leverage that distribution for a likelihood-based adversarial detector.
LG Sep 10, 2019
Learning to Disentangle Robust and Vulnerable Features for Adversarial Detection
Byunggill Joe, Sung Ju Hwang, Insik Shin
Although deep neural networks have shown promising performance on various tasks, even achieving human-level performance on some, they have been shown to be susceptible to incorrect predictions even with imperceptibly small perturbations to an input. Many previous works have proposed to defend against such adversarial attacks, either by robust inference or by detection of adversarial inputs. Yet most of them cannot effectively defend against whitebox attacks, where an adversary has knowledge of the model and the defense. More importantly, they do not provide a convincing reason why the generated adversarial inputs successfully fool the target models. To address these shortcomings of existing approaches, we hypothesize that adversarial inputs are tied to latent features that are susceptible to adversarial perturbation, which we call vulnerable features. Based on this intuition, we propose a minimax game formulation to disentangle the latent features of each instance into robust and vulnerable ones, using variational autoencoders with two latent spaces. We thoroughly validate our model against both blackbox and whitebox attacks on the MNIST, Fashion MNIST, and Cat & Dog datasets. The results show that adversarial inputs cannot bypass our detector without changing their semantics, in which case the attack has failed.
CR May 23, 2019
SynFuzz: Efficient Concolic Execution via Branch Condition Synthesis
Wookhyun Han, Md Lutfor Rahman, Yuxuan Chen et al.
Concolic execution is a powerful program analysis technique for exploring execution paths in a systematic manner. Compared to random-mutation-based fuzzing, concolic execution is especially good at exploring paths that are guarded by complex and tight branch predicates (e.g., (a*b) == 0xdeadbeef). The drawback, however, is that concolic execution engines are much slower than native execution. One major source of the slowness is that concolic execution engines have to interpret instructions to maintain the symbolic expressions of program variables. In this work, we propose SynFuzz, a novel approach to perform scalable concolic execution. SynFuzz achieves this goal by replacing interpretation with dynamic taint analysis and program synthesis. In particular, to flip a conditional branch, SynFuzz first uses operation-aware taint analysis to record a partial expression (i.e., a sketch) of its branch predicate. Then it uses oracle-guided program synthesis to reconstruct the symbolic expression based on input-output pairs. The last step is the same as in traditional concolic execution: SynFuzz consults an SMT solver to generate an input that can flip the target branch. By doing so, SynFuzz can achieve an execution speed that is close to fuzzing while retaining concolic execution's capability of flipping complex branch predicates. We have implemented a prototype of SynFuzz and evaluated it with three sets of programs: real-world applications, the LAVA-M benchmark, and the Google Fuzzer Test Suite (FTS). The evaluation results showed that SynFuzz was much more scalable than traditional concolic execution engines, was able to find more bugs in LAVA-M than the state-of-the-art concolic execution engine (QSYM), and achieved better code coverage on real-world applications and FTS.