h-index11
17papers
954citations
Novelty57%
AI Score53

17 Papers

LGFeb 3, 2023
DynaMIX: Resource Optimization for DNN-Based Real-Time Applications on a Multi-Tasking System

Minkyoung Cho, Kang G. Shin

As deep neural networks (DNNs) prove their importance and feasibility, more and more DNN-based apps, such as detection and classification of objects, have been developed and deployed on autonomous vehicles (AVs). To meet their growing expectations and requirements, AVs should "optimize" use of their limited onboard computing resources for multiple concurrent in-vehicle apps while satisfying their timing requirements (especially for safety). That is, real-time AV apps should share the limited on-board resources with other concurrent apps without missing their deadlines dictated by the frame rate of a camera that generates and provides input images to the apps. However, most, if not all, of existing DNN solutions focus on enhancing the concurrency of their specific hardware without dynamically optimizing/modifying the DNN apps' resource requirements, subject to the number of running apps, owing to their high computational cost. To mitigate this limitation, we propose DynaMIX (Dynamic MIXed-precision model construction), which optimizes the resource requirement of concurrent apps and aims to maximize execution accuracy. To realize a real-time resource optimization, we formulate an optimization problem using app performance profiles to consider both the accuracy and worst-case latency of each app. We also propose dynamic model reconfiguration by lazy loading only the selected layers at runtime to reduce the overhead of loading the entire model. DynaMIX is evaluated in terms of constraint satisfaction and inference accuracy for a multi-tasking system and compared against state-of-the-art solutions, demonstrating its effectiveness and feasibility under various environmental/operating conditions.

HCSep 23, 2024Code
Ads that Talk Back: Implications and Perceptions of Injecting Personalized Advertising into LLM Chatbots

Brian Jay Tang, Kaiwen Sun, Noah T. Curran et al.

Recent advances in large language models (LLMs) have enabled the creation of highly effective chatbots. However, the compute costs of widely deploying LLMs have raised questions about profitability. Companies have proposed exploring ad-based revenue streams for monetizing LLMs, which could serve as the new de facto platform for advertising. This paper investigates the implications of personalizing LLM advertisements to individual users via a between-subjects experiment with 179 participants. We developed a chatbot that embeds personalized product advertisements within LLM responses, inspired by similar forays by AI companies. The evaluation of our benchmarks showed that ad injection only slightly impacted LLM performance, particularly response desirability. Results revealed that participants struggled to detect ads, and even preferred LLM responses with hidden advertisements. Rather than clicking on our advertising disclosure, participants tried changing their advertising settings using natural language queries. We created an advertising dataset and an open-source LLM, Phi-4-Ads, fine-tuned to serve ads and flexibly adapt to user preferences.

46.3CVMar 18
Feeling the Space: Egomotion-Aware Video Representation for Efficient and Accurate 3D Scene Understanding

Shuyao Shi, Kang G. Shin

Recent Multimodal Large Language Models (MLLMs) have shown high potential for spatial reasoning within 3D scenes. However, they typically rely on computationally expensive 3D representations like point clouds or reconstructed Bird's-Eye View (BEV) maps, or lack physical grounding to resolve ambiguities in scale and size. This paper significantly enhances MLLMs with egomotion modality data, captured by Inertial Measurement Units (IMUs) concurrently with the video. In particular, we propose a novel framework, called Motion-MLLM, introducing two key components: (1) a cascaded motion-visual keyframe filtering module that leverages both IMU data and visual features to efficiently select a sparse yet representative set of keyframes, and (2) an asymmetric cross-modal fusion module where motion tokens serve as intermediaries that channel egomotion cues and cross-frame visual context into the visual representation. By grounding visual content in physical egomotion trajectories, Motion-MLLM can reason about absolute scale and spatial relationships across the scene. Our extensive evaluation shows that Motion-MLLM makes significant improvements in various tasks related to 3D scene understanding and spatial reasoning. Compared to state-of-the-art (SOTA) methods based on video frames and explicit 3D data, Motion-MLLM exhibits similar or even higher accuracy with significantly less overhead (i.e., 1.40$\times$ and 1.63$\times$ higher cost-effectiveness, respectively).

CVMay 17, 2025Code
Keypoints as Dynamic Centroids for Unified Human Pose and Segmentation

Niaz Ahmad, Jawad Khan, Kang G. Shin et al.

The dynamic movement of the human body presents a fundamental challenge for human pose estimation and body segmentation. State-of-the-art approaches primarily rely on combining keypoint heatmaps with segmentation masks but often struggle in scenarios involving overlapping joints or rapidly changing poses during instance-level segmentation. To address these limitations, we propose Keypoints as Dynamic Centroid (KDC), a new centroid-based representation for unified human pose estimation and instance-level segmentation. KDC adopts a bottom-up paradigm to generate keypoint heatmaps for both easily distinguishable and complex keypoints and improves keypoint detection and confidence scores by introducing KeyCentroids using a keypoint disk. It leverages high-confidence keypoints as dynamic centroids in the embedding space to generate MaskCentroids, allowing for swift clustering of pixels to specific human instances during rapid body movements in live environments. Our experimental evaluations on the CrowdPose, OCHuman, and COCO benchmarks demonstrate KDC's effectiveness and generalizability in challenging scenarios in terms of both accuracy and runtime performance. The implementation is available at: https://sites.google.com/view/niazahmad/projects/kdc.

AISep 23, 2024
Steward: Natural Language Web Automation

Brian Tang, Kang G. Shin

Recently, large language models (LLMs) have demonstrated exceptional capabilities in serving as the foundation for AI assistants. One emerging application of LLMs, navigating through websites and interacting with UI elements across various web pages, remains somewhat underexplored. We introduce Steward, a novel LLM-powered web automation tool designed to serve as a cost-effective, scalable, end-to-end solution for automating web interactions. Traditional browser automation frameworks like Selenium, Puppeteer, and Playwright are not scalable for extensive web interaction tasks, such as studying recommendation algorithms on platforms like YouTube and Twitter. These frameworks require manual coding of interactions, limiting their utility in large-scale or dynamic contexts. Steward addresses these limitations by integrating LLM capabilities with browser automation, allowing for natural language-driven interaction with websites. Steward operates by receiving natural language instructions and reactively planning and executing a sequence of actions on websites, looping until completion, making it a practical tool for developers and researchers to use. It achieves high efficiency, completing actions in 8.52 to 10.14 seconds at a cost of $0.028 per action or an average of $0.18 per task, which is further reduced to 4.8 seconds and $0.022 through a caching mechanism. It runs tasks on real websites with a 40% completion success rate. We discuss various design and implementation challenges, including state representation, action sequence selection, system responsiveness, detecting task completion, and caching implementation.

CVFeb 11
Enhancing Predictability of Multi-Tenant DNN Inference for Autonomous Vehicles' Perception

Liangkai Liu, Kang G. Shin, Jinkyu Lee et al.

Autonomous vehicles (AVs) rely on sensors and deep neural networks (DNNs) to perceive their surrounding environment and make maneuver decisions in real time. However, achieving real-time DNN inference in the AV's perception pipeline is challenging due to the large gap between the computation requirement and the AV's limited resources. Most, if not all, of existing studies focus on optimizing the DNN inference time to achieve faster perception by compressing the DNN model with pruning and quantization. In contrast, we present a Predictable Perception system with DNNs (PP-DNN) that reduce the amount of image data to be processed while maintaining the same level of accuracy for multi-tenant DNNs by dynamically selecting critical frames and regions of interest (ROIs). PP-DNN is based on our key insight that critical frames and ROIs for AVs vary with the AV's surrounding environment. However, it is challenging to identify and use critical frames and ROIs in multi-tenant DNNs for predictable inference. Given image-frame streams, PP-DNN leverages an ROI generator to identify critical frames and ROIs based on the similarities of consecutive frames and traffic scenarios. PP-DNN then leverages a FLOPs predictor to predict multiply-accumulate operations (MACs) from the dynamic critical frames and ROIs. The ROI scheduler coordinates the processing of critical frames and ROIs with multiple DNN models. Finally, we design a detection predictor for the perception of non-critical frames. We have implemented PP-DNN in an ROS-based AV pipeline and evaluated it with the BDD100K and the nuScenes dataset. PP-DNN is observed to significantly enhance perception predictability, increasing the number of fusion frames by up to 7.3x, reducing the fusion delay by >2.6x and fusion-delay variations by >2.3x, improving detection completeness by 75.4% and the cost-effectiveness by up to 98% over the baseline.

85.6HCApr 26
StateScribe: Towards Accessible Change Awareness Across Real-World Revisits

Ruei-Che Chang, Xirui Jiang, Rosiana Natalie et al.

Real-world environments evolve continuously, yet blind and low-vision (BLV) individuals often have limited access to understanding how they change over time. Unexpected or relocated objects, layout modifications, and content updates (e.g., price changes) can introduce safety risks and cognitive burden. While existing visual assistive technologies can describe immediate surroundings, they operate as one-off interactions and lack mechanisms to surface meaningful changes across revisits. Informed by a survey of 33 BLV individuals, we develop StateScribe, a system that supports accessible awareness of real-world changes across revisits. StateScribe employs a dual-layer memory architecture that integrates episodic scene memory and object-centric temporal memory to enable scalable and structured change tracking. It provides both live descriptions of the current scene, and descriptions of what has changed, when and where it occurred across revisits, such as "The shop on your right has a "CLOSED" sign; it was open at this time last week.'' Our evaluation shows that StateScribe maintains high accuracy (F1-score=83.1%) across 11 revisits, while remaining low-latency (mean<1.54s) and memory-efficient (<54MB) across 110 revisits. A user study with nine BLV participants demonstrates that StateScribe improves change awareness across revisits in three real-world locations. Finally, we discuss implications for long-term AI-assisted companions that support broader change observation using multimodal sensing, extend beyond changes to other memory capabilities, and adapt to individual users, intents, and contexts.

ARFeb 3, 2021
Fuzzing Hardware Like Software

Timothy Trippel, Kang G. Shin, Alex Chernyakhovsky et al.

Hardware flaws are permanent and potent: hardware cannot be patched once fabricated, and any flaws may undermine any software executing on top. Consequently, verification time dominates implementation time. The gold standard in hardware Design Verification (DV) is concentrated at two extremes: random dynamic verification and formal verification. Both struggle to root out the subtle flaws in complex hardware that often manifest as security vulnerabilities. The root problem with random verification is its undirected nature, making it inefficient, while formal verification is constrained by the state-space explosion problem, making it infeasible against complex designs. What is needed is a solution that is directed, yet under-constrained. Instead of making incremental improvements to existing DV approaches, we leverage the observation that existing software fuzzers already provide such a solution, and adapt them for hardware DV. Specifically, we translate RTL hardware to a software model and fuzz that model. The central challenge we address is how best to mitigate the differences between the hardware execution model and software execution model. This includes: 1) how to represent test cases, 2) what is the hardware equivalent of a crash, 3) what is an appropriate coverage metric, and 4) how to create a general-purpose fuzzing harness for hardware. To evaluate our approach, we fuzz four IP blocks from Google's OpenTitan SoC. Our experiments reveal a two orders-of-magnitude reduction in run time to achieve Finite State Machine (FSM) coverage over traditional dynamic verification schemes. Moreover, with our design-agnostic harness, we achieve over 88% HDL line coverage in three out of four of our designs -- even without any initial seeds.

LGSep 27, 2019
Federated User Representation Learning

Duc Bui, Kshitiz Malik, Jack Goetz et al.

Collaborative personalization, such as through learned user representations (embeddings), can improve the prediction accuracy of neural-network-based models significantly. We propose Federated User Representation Learning (FURL), a simple, scalable, privacy-preserving and resource-efficient way to utilize existing neural personalization techniques in the Federated Learning (FL) setting. FURL divides model parameters into federated and private parameters. Private parameters, such as private user embeddings, are trained locally, but unlike federated parameters, they are not transferred to or averaged on the server. We show theoretically that this parameter split does not affect training for most model personalization approaches. Storing user embeddings locally not only preserves user privacy, but also improves memory locality of personalization compared to on-server training. We evaluate FURL on two datasets, demonstrating a significant improvement in model quality with 8% and 51% performance increases, and approximately the same level of performance as centralized training with only 0% and 4% reductions. Furthermore, we show that user embeddings learned in FL and the centralized setting have a very similar structure, indicating that FURL can learn collaboratively through the shared parameters while preserving user privacy.

CRJun 20, 2019
T-TER: Defeating A2 Trojans with Targeted Tamper-Evident Routing

Timothy Trippel, Kang G. Shin, Kevin B. Bush et al.

Since the inception of the Integrated Circuit (IC), the size of the transistors used to construct them has continually shrunk. While this advancement significantly improves computing capability, fabrication costs have skyrocketed. As a result, most IC designers must now outsource fabrication. Outsourcing, however, presents a security threat: comprehensive post-fabrication inspection is infeasible given the size of modern ICs, so it is nearly impossible to know if the foundry has altered the original design during fabrication (i.e., inserted a hardware Trojan). Defending against a foundry-side adversary is challenging because---even with as few as two gates---hardware Trojans can completely undermine software security. Researchers have attempted to both detect and prevent foundry-side attacks, but all existing defenses are ineffective against Trojans with footprints of a few gates or less. We present Targeted Tamper-Evident Routing (T-TER), a preventive layout-level defense against untrusted foundries, capable of thwarting the insertion of even the stealthiest hardware Trojans. T-TER is directed and routing-centric: it prevents foundry-side attackers from routing Trojan wires to, or directly adjacent to, security-critical wires by shielding them with guard wires. Unlike shield wires commonly deployed for cross-talk reduction, T-TER guard wires pose an additional technical challenge: they must be tamper-evident in both the digital (deletion attacks) and analog (move and jog attacks) domains. We address this challenge by developing a class of designed-in guard wires, that are added to the design specifically to protect security-critical wires. T-TER's guard wires incur minimal overhead, scale with design complexity, and provide tamper-evidence against attacks.

CRJun 20, 2019
An Extensible Framework for Quantifying the Coverage of Defenses Against Untrusted Foundries

Timothy Trippel, Kang G. Shin, Kevin B. Bush et al.

The transistors used to construct Integrated Circuits (ICs) continue to shrink. While this shrinkage improves performance and density, it also reduces trust: the price to build leading-edge fabrication facilities has skyrocketed, forcing even nation states to outsource the fabrication of high-performance ICs. Outsourcing fabrication presents a security threat because the black-box nature of a fabricated IC makes comprehensive inspection infeasible. Since prior work shows the feasibility of fabrication-time attackers' evasion of existing post-fabrication defenses, IC designers must be able to protect their physical designs before handing them off to an untrusted foundry. To this end, recent work suggests methods to harden IC layouts against attack. Unfortunately, no tool exists to assess the effectiveness of the proposed defenses---meaning gaps may exist. This paper presents an extensible IC layout security analysis tool called IC Attack Surface (ICAS) that quantifies defensive coverage. For researchers, ICAS identifies gaps for future defenses to target, and enables the quantitative comparison of existing and future defenses. For practitioners, ICAS enables the exploration of the impact of design decisions on an IC's resilience to fabrication-time attack. ICAS takes a set of metrics that encode the challenge of inserting a hardware Trojan into an IC layout, a set of attacks that the defender cares about, and a completed IC layout and reports the number of ways an attacker can add each attack to the design. While the ideal score is zero, practically, our experience is that lower scores correlate with increased attacker effort.

CLFeb 7, 2018
Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning

Hamza Harkous, Kassem Fawaz, Rémi Lebret et al.

Privacy policies are the primary channel through which companies inform users about their data collection and sharing practices. These policies are often long and difficult to comprehend. Short notices based on information extracted from privacy policies have been shown to be useful but face a significant scalability hurdle, given the number of policies and their evolution over time. Companies, users, researchers, and regulators still lack usable and scalable tools to cope with the breadth and depth of privacy policies. To address these hurdles, we propose an automated framework for privacy policy analysis (Polisis). It enables scalable, dynamic, and multi-dimensional queries on natural language privacy policies. At the core of Polisis is a privacy-centric language model, built with 130K privacy policies, and a novel hierarchy of neural-network classifiers that accounts for both high-level aspects and fine-grained details of privacy practices. We demonstrate Polisis' modularity and utility with two applications supporting structured and free-form querying. The structured querying application is the automated assignment of privacy icons from privacy policies. With Polisis, we can achieve an accuracy of 88.4% on this task. The second application, PriBot, is the first freeform question-answering system for privacy policies. We show that PriBot can produce a correct answer among its top-3 results for 82% of the test questions. Using an MTurk user study with 700 participants, we show that at least one of PriBot's top-3 answers is relevant to users for 89% of the test questions.

CRJan 23, 2018
Who Killed My Parked Car?

Kyong-Tak Cho, Yuseung Kim, Kang G. Shin

We find that the conventional belief of vehicle cyber attacks and their defenses---attacks are feasible and thus defenses are required only when the vehicle's ignition is turned on---does not hold. We verify this fact by discovering and applying two new practical and important attacks: battery-drain and Denial-of-Body-control (DoB). The former can drain the vehicle battery while the latter can prevent the owner from starting or even opening/entering his car, when either or both attacks are mounted with the ignition off. We first analyze how operation (e.g., normal, sleep, listen) modes of ECUs are defined in various in-vehicle network standards and how they are implemented in the real world. From this analysis, we discover that an adversary can exploit the wakeup function of in-vehicle networks---which was originally designed for enhanced user experience/convenience (e.g., remote diagnosis, remote temperature control)---as an attack vector. Ironically, a core battery-saving feature in in-vehicle networks makes it easier for an attacker to wake up ECUs and, therefore, mount and succeed in battery-drain and/or DoB attacks. Via extensive experimental evaluations on various real vehicles, we show that by mounting the battery-drain attack, the adversary can increase the average battery consumption by at least 12.57x, drain the car battery within a few hours or days, and therefore immobilize/cripple the vehicle. We also demonstrate the proposed DoB attack on a real vehicle, showing that the attacker can cut off communications between the vehicle and the driver's key fob by indefinitely shutting down an ECU, thus making the driver unable to start and/or even enter the car.

CROct 12, 2017
Mobile IMUs Reveal Driver's Identity From Vehicle Turns

Dongyao Chen, Kyong-Tak Cho, Kang G. Shin

As vehicle maneuver data becomes abundant for assisted or autonomous driving, their implication of privacy invasion/leakage has become an increasing concern. In particular, the surface for fingerprinting a driver will expand significantly if the driver's identity can be linked with the data collected from his mobile or wearable devices which are widely deployed worldwide and have increasing sensing capabilities. In line with this trend, this paper investigates a fast emerging driving data source that has driver's privacy implications. We first show that such privacy threats can be materialized via any mobile device with IMUs (e.g., gyroscope and accelerometer). We then present Dri-Fi (Driver Fingerprint), a driving data analytic engine that can fingerprint the driver with vehicle turn(s). Dri-Fi achieves this based on IMUs data taken only during the vehicle's turn(s). Such an approach expands the attack surface significantly compared to existing driver fingerprinting schemes. From this data, Dri-Fi extracts three new features --- acceleration along the end-of-turn axis, its deviation, and the deviation of the yaw rate --- and exploits them to identify the driver. Our extensive evaluation shows that an adversary equipped with Dri-Fi can correctly fingerprint the driver within just one turn with 74.1%, 83.5%, and 90.8% accuracy across 12, 8, and 5 drivers --- typical of an immediate family or close-friends circle --- respectively. Moreover, with measurements on more than one turn, the adversary can achieve up to 95.3%, 95.4%, and 96.6% accuracy across 12, 8, and 5 drivers, respectively.

CRJan 17, 2017
Continuous Authentication for Voice Assistants

Huan Feng, Kassem Fawaz, Kang G. Shin

Voice has become an increasingly popular User Interaction (UI) channel, mainly contributing to the ongoing trend of wearables, smart vehicles, and home automation systems. Voice assistants such as Siri, Google Now and Cortana, have become our everyday fixtures, especially in scenarios where touch interfaces are inconvenient or even dangerous to use, such as driving or exercising. Nevertheless, the open nature of the voice channel makes voice assistants difficult to secure and exposed to various attacks as demonstrated by security researchers. In this paper, we present VAuth, the first system that provides continuous and usable authentication for voice assistants. We design VAuth to fit in various widely-adopted wearable devices, such as eyeglasses, earphones/buds and necklaces, where it collects the body-surface vibrations of the user and matches it with the speech signal received by the voice assistant's microphone. VAuth guarantees that the voice assistant executes only the commands that originate from the voice of the owner. We have evaluated VAuth with 18 users and 30 voice commands and find it to achieve an almost perfect matching accuracy with less than 0.1% false positive rate, regardless of VAuth's position on the body and the user's language, accent or mobility. VAuth successfully thwarts different practical attacks, such as replayed attacks, mangled voice attacks, or impersonation attacks. It also has low energy and latency overheads and is compatible with most existing voice assistants.

CRApr 23, 2016
BinderCracker: Assessing the Robustness of Android System Services

Huan Feng, Kang G. Shin

In Android, communications between apps and system services are supported by a transaction-based Inter-Process Communication (IPC) mechanism. Binder, as the cornerstone of this IPC mechanism, separates two communicating parties as client and server. As with any client-server model, the server should not make any assumption on the validity (sanity) of client-side transaction. To our surprise, we find this principle has frequently been overlooked in the implementation of Android system services. In this paper, we demonstrate the prevalence and severity of this vulnerability surface and try to answer why developers keep making this seemingly simple mistake. Specifically, we design and implement BinderCracker, an automatic testing framework that supports parameter-aware fuzzing and has identified more than 100 vulnerabilities in six major versions of Android, including the latest version Android 6.0, Marshmallow. Some of the vulnerabilities have severe security implications, causing privileged code execution or permanent Denial-of-Service (DoS). We analyzed the root causes of these vulnerabilities to find that most of them exist because system service developers only considered exploitations via public APIs. We thus highlight the deficiency of testing only on client-side public APIs and argue for the necessity of testing and protection on the Binder interface - the actual security boundary. Specifically, we discuss the effectiveness and practicality of potential countermeasures, such as precautionary testing and runtime diagnostic.

AIFeb 13, 2013
Plan Development using Local Probabilistic Models

Ella M. Atkins, Edmund H. Durfee, Kang G. Shin

Approximate models of world state transitions are necessary when building plans for complex systems operating in dynamic environments. External event probabilities can depend on state feature values as well as time spent in that particular state. We assign temporally -dependent probability functions to state transitions. These functions are used to locally compute state probabilities, which are then used to select highly probable goal paths and eliminate improbable states. This probabilistic model has been implemented in the Cooperative Intelligent Real-time Control Architecture (CIRCA), which combines an AI planner with a separate real-time system such that plans are developed, scheduled, and executed with real-time guarantees. We present flight simulation tests that demonstrate how our probabilistic model may improve CIRCA performance.