CL Feb 28, 2023
Goal Driven Discovery of Distributional Differences via Language Descriptions
Ruiqi Zhong, Peter Zhang, Steve Li et al.
Mining large corpora can generate useful discoveries but is time-consuming for humans. We formulate a new task, D5, that automatically discovers differences between two large corpora in a goal-driven way. The task input is a problem comprising a research goal ("comparing the side effects of drug A and drug B") and a corpus pair (two large collections of patients' self-reported reactions after taking each drug). The output is a language description (discovery) of how these corpora differ (patients taking drug A "mention feelings of paranoia" more often). We build a D5 system, and to quantitatively measure its performance, we 1) contribute a meta-dataset, OpenD5, aggregating 675 open-ended problems ranging across business, social sciences, humanities, machine learning, and health, and 2) propose a set of unified evaluation metrics: validity, relevance, novelty, and significance. With the dataset and the unified metrics, we confirm that language models can use the goals to propose more relevant, novel, and significant candidate discoveries. Finally, our system produces discoveries previously unknown to the authors on a wide range of applications in OpenD5, including temporal and demographic differences in discussion topics, political stances and stereotypes in speech, insights in commercial reviews, and error patterns in NLP models.
LG Mar 11 (Code)
LookaheadKV: Fast and Accurate KV Cache Eviction by Glimpsing into the Future without Generation
Jinwoo Ahn, Ingyu Seong, Akhil Kedia et al.
Transformer-based large language models (LLMs) rely on key-value (KV) caching to avoid redundant computation during autoregressive inference. While this mechanism greatly improves efficiency, the cache size grows linearly with the input sequence length, quickly becoming a bottleneck for long-context tasks. Existing solutions mitigate this problem by evicting prompt KV that are deemed unimportant, guided by estimated importance scores. Notably, a recent line of work proposes to improve eviction quality by "glimpsing into the future", in which a draft generator produces a surrogate future response approximating the target model's true response, and this surrogate is subsequently used to estimate the importance of cached KV more accurately. However, these approaches rely on computationally expensive draft generation, which introduces substantial prefilling overhead and limits their practicality in real-world deployment. To address this challenge, we propose LookaheadKV, a lightweight eviction framework that leverages the strength of surrogate future response without requiring explicit draft generation. LookaheadKV augments transformer layers with parameter-efficient modules trained to predict true importance scores with high accuracy. Our design ensures negligible runtime overhead comparable to existing inexpensive heuristics, while achieving accuracy superior to more costly approximation methods. Extensive experiments on long-context understanding benchmarks, across a wide range of models, demonstrate that our method not only outperforms recent competitive baselines in various long-context understanding tasks, but also reduces the eviction cost by up to 14.5x, leading to significantly faster time-to-first-token. Our code is available at https://github.com/SamsungLabs/LookaheadKV.
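The score-guided eviction step that this abstract builds on can be sketched in a few lines. This is only an illustration of generic top-budget eviction, not LookaheadKV itself: the function name `evict_kv`, the list-based cache, and the `scores` values are all hypothetical, and how the importance scores are predicted (the paper's actual contribution) is not shown.

```python
def evict_kv(keys, values, scores, budget):
    """Keep the `budget` cache entries with the highest importance scores.

    `scores` is assumed to be given (one number per cached prompt position);
    producing good scores is the hard part and is outside this sketch.
    Surviving entries stay in their original sequence order.
    """
    if budget >= len(scores):
        return list(keys), list(values)
    # Rank positions by score, take the best `budget`, then restore order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:budget])
    return [keys[i] for i in keep], [values[i] for i in keep]


if __name__ == "__main__":
    keys = ["k0", "k1", "k2", "k3"]
    values = ["v0", "v1", "v2", "v3"]
    scores = [0.10, 0.70, 0.05, 0.40]  # placeholder importance scores
    print(evict_kv(keys, values, scores, budget=2))
```

With a budget of 2, only positions 1 and 3 survive, so the cache shrinks from four entries to two while keeping the entries the scores rate as most important.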
CV Nov 21, 2024 (Code)
VAGUE: Visual Contexts Clarify Ambiguous Expressions
Heejeong Nam, Jinwoo Ahn, Keummin Ka et al.
Human communication often relies on visual cues to resolve ambiguity. While humans can intuitively integrate these cues, AI systems often find it challenging to engage in sophisticated multimodal reasoning. We introduce VAGUE, a benchmark evaluating multimodal AI systems' ability to integrate visual context for intent disambiguation. VAGUE consists of 1.6K ambiguous textual expressions, each paired with an image and multiple-choice interpretations, where the correct answer is only apparent with visual context. The dataset spans both staged, complex (Visual Commonsense Reasoning) and natural, personal (Ego4D) scenes, ensuring diversity. Our experiments reveal that existing multimodal AI models struggle to infer the speaker's true intent. While performance consistently improves from the introduction of more visual cues, the overall accuracy remains far below human performance, highlighting a critical gap in multimodal reasoning. Analysis of failure cases demonstrates that current models fail to distinguish true intent from superficial correlations in the visual scene, indicating that they perceive images but do not effectively reason with them. We release our code and data at https://hazel-heejeong-nam.github.io/vague/.
LG Mar 21, 2025
Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs
Anshumann, Mohd Abbas Zaidi, Akhil Kedia et al.
Knowledge distillation can be a cost-effective technique to distill knowledge in Large Language Models, if the teacher output logits can be pre-computed and cached. However, successfully applying this to pre-training remains largely unexplored. In this work, we prove that naive approaches for sparse knowledge distillation such as caching Top-K probabilities, while intuitive, provide biased estimates of teacher probability distribution to the student, resulting in suboptimal performance and calibration. We propose an importance-sampling-based method, 'Random Sampling Knowledge Distillation', which provides unbiased estimates, preserves the gradient in expectation, and requires storing significantly sparser logits. Our method enables faster training of student models with marginal overhead (<10%) compared to cross-entropy based training, while maintaining competitive performance compared to full distillation, across a range of model sizes from 300M to 3B.
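The bias claim can be illustrated with a toy distribution. The sketch below is not the paper's implementation: the function names are hypothetical, and it assumes the simplest possible proposal, namely drawing token ids from the teacher distribution itself and storing normalized counts. Top-K truncation always zeroes the tail and inflates the head, whereas the sampled estimate matches the teacher distribution on average.

```python
import random


def top_k_estimate(probs, k):
    """Keep the k largest teacher probabilities and renormalize them."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    est = [0.0] * len(probs)
    for i in top:
        est[i] = probs[i] / mass  # tail tokens stay at exactly zero
    return est


def sampled_estimate(probs, n, rng):
    """Draw n token ids from the teacher distribution; store normalized counts."""
    est = [0.0] * len(probs)
    for i in rng.choices(range(len(probs)), weights=probs, k=n):
        est[i] += 1.0 / n
    return est


def average_sampled_estimate(probs, n, trials, rng):
    """Average many independent sampled estimates to expose their expectation."""
    total = [0.0] * len(probs)
    for _ in range(trials):
        for i, p in enumerate(sampled_estimate(probs, n, rng)):
            total[i] += p / trials
    return total


if __name__ == "__main__":
    teacher = [0.5, 0.25, 0.15, 0.1]  # toy teacher distribution (assumed)
    rng = random.Random(0)
    print("top-2:  ", top_k_estimate(teacher, 2))
    print("sampled:", average_sampled_estimate(teacher, 2, 20000, rng))
```

Both estimators store at most two nonzero entries per position here, but only the sampled one recovers the tail mass in expectation.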
CL Feb 5, 2024
Recursive Chain-of-Feedback Prevents Performance Degradation from Redundant Prompting
Jinwoo Ahn, Kyuseung Shin
Large Language Models (LLMs) frequently struggle with complex reasoning tasks, failing to construct logically sound steps towards the solution. In response to this behavior, users often try prompting the LLMs repeatedly in hopes of reaching a better response. This paper studies such repetitive behavior and its effect by defining a novel setting, Chain-of-Feedback (CoF). The setting takes questions that require multi-step reasoning as input. After each response, we repeatedly prompt with meaningless feedback (e.g., 'make another attempt') requesting additional trials. Surprisingly, our preliminary results show that repeated meaningless feedback gradually decreases the quality of the responses, eventually leading to a larger deviation from the intended outcome. To alleviate these troubles, we propose a novel method, Recursive Chain-of-Feedback (R-CoF). Following the logic of recursion in computer science, R-CoF recursively revises the initially incorrect response by breaking down each incorrect reasoning step into smaller individual problems. Our preliminary results show that the majority of questions that LLMs fail to answer correctly can be answered using R-CoF without any sample data outlining the logical process.
CV Nov 23, 2024
Fine-Grained Open-Vocabulary Object Recognition via User-Guided Segmentation
Jinwoo Ahn, Hyeokjoon Kwon, Hwiyeon Yoo
The recent advent of vision-based foundation models has enabled efficient, high-quality object detection with ease. Despite the success of previous studies, object detection models are limited in capturing small components of holistic objects and in taking user intention into account. To address these challenges, we propose a novel foundation model-based detection method called FOCUS: Fine-grained Open-Vocabulary Object ReCognition via User-Guided Segmentation. FOCUS merges the capabilities of vision foundation models to automate open-vocabulary object detection at flexible granularity and allow users to directly guide the detection process via natural language. It not only excels at identifying and locating granular constituent elements but also minimizes unnecessary user intervention while still granting users significant control. With FOCUS, users can make explainable requests to actively guide the detection process in the intended direction. Our results show that FOCUS effectively enhances the detection capabilities of baseline models and shows consistent performance across varying object types.
CV Jun 10, 2024
Solution for SMART-101 Challenge of CVPR Multi-modal Algorithmic Reasoning Task 2024
Jinwoo Ahn, Junhyeok Park, Min-Jun Kim et al.
In this paper, the solution of HYU MLLAB KT Team to the Multimodal Algorithmic Reasoning Task: SMART-101 CVPR 2024 Challenge is presented. Beyond conventional visual question-answering problems, the SMART-101 challenge aims to achieve human-level multimodal understanding by tackling complex visio-linguistic puzzles designed for children in the 6-8 age group. To solve this problem, we suggest two main ideas. First, to utilize the reasoning ability of a large-scale language model (LLM), the given visual cues (images) are grounded in the text modality. For this purpose, we generate highly detailed text captions that describe the context of the image and use these captions as input for the LLM. Second, because puzzle images often contain various geometric visual patterns, we utilize an object detection algorithm to ensure these patterns are not overlooked in the captioning process. We employed the SAM algorithm, which can detect objects of various sizes, to capture the visual features of these geometric patterns and used this information as input for the LLM. Under the puzzle split configuration, we achieved an option selection accuracy (O_acc) of 29.5 on the test set and a weighted option selection accuracy (WOSA) of 27.1 on the challenge set.
CR Aug 14, 2021
A Policy-based Versioning SSD with Intel SGX
Jinwoo Ahn, Seungjin Lee, Jinhoon Lee et al.
Privileged malware neutralizes software-based versioning systems and destroys data. To counter this threat, versioning solid-state drives (SSDs) that perform versioning inside the SSD have been studied. An SSD is a suitable candidate for data versioning because it can preserve previous versions without additional copying, and provide high security with a very small trusted computing base (TCB). However, the versioning SSDs studied so far commonly use a full disk versioning method that preserves all file versions in a batch. This paper demonstrates that SSDs that provide full disk versioning can be exposed to data tampering attacks when the retention time of data is less than the malware's dwell time. To deal with this threat, we propose SGX-SSD, a policy-based per-file versioning SSD that keeps a deeper history for only the user's important files. However, since the SSD is not aware of file semantics, and the versioning policy information must be securely received from the untrusted host computer, implementing per-file versioning in an SSD is a huge challenge. To solve this problem, SGX-SSD utilizes Intel SGX and has a secure host interface to securely receive policy information (configuration values) from the user. Also, to solve the SSD's lack of file-semantic awareness, a piggyback module is designed to give a file hint at the host layer, and an algorithm for selective versioning based on the policy is implemented in the SSD. To validate our system, we prototyped SGX-SSD on the Jasmine OpenSSD platform in a Linux environment. In the experimental evaluation, we showed that SGX-SSD provides strong security with little additional overhead for selective per-file versioning.
CR Apr 28, 2020
SGX-SSD: A Policy-based Versioning SSD with Intel SGX
Jinwoo Ahn, Seungjin Lee, Jinhoon Lee et al.
This paper demonstrates that SSDs that perform device-level versioning can be exposed to data tampering attacks when the retention time of data is less than the malware's dwell time. To deal with that threat, we propose SGX-SSD, an SGX-based versioning SSD which selectively preserves file history based on the given policy. The proposed system adopts Intel SGX to implement a version policy management system that is safe from high-privileged malware. Based on the policy, only the necessary data is selectively preserved in the SSD, which prevents lower-priority files from wasting space and also ensures the integrity of important files.
CR Apr 10, 2019
KEY-SSD: Access-Control Drive to Protect Files from Ransomware Attacks
Jinwoo Ahn, Donggyu Park, Chang-Gyu Lee et al.
Traditional techniques to prevent damage from ransomware attacks detect and block attacks by monitoring known behaviors such as frequent name changes, recurring access to cryptographic libraries, and key exchange with remote servers. Unfortunately, intelligent ransomware can easily bypass these techniques. Another prevention technique is to recover from a backup copy when a file is infected with ransomware. However, data backup requires extra storage space, and the backup itself can be removed by ransomware. In this paper, we propose to implement an access control mechanism on a disk drive, called a KEY-SSD disk drive. KEY-SSD is the data store and the last barrier to data protection. Unauthorized applications cannot read file data even if they bypass the file system defense: the drive denies any block request made without the disk's registered block key, eliminating the possibility of a file being held hostage by ransomware. We have prototyped KEY-SSD and validated its usefulness by demonstrating 1) selective block access control, 2) unauthorized data access blocking, and 3) negligible performance overhead. Our comprehensive evaluation of KEY-SSD for various workloads shows that KEY-SSD performance is hardly degraded, thanks to lightweight key transmission in the OS and access control drive optimization. We also confirmed that KEY-SSD successfully protects files against an actual ransomware sample.