Changhyeon Kim

CL
h-index: 7
4 papers
40 citations
Novelty: 36%
AI Score: 38

4 Papers

CL, Dec 17, 2024
FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning

Seunghee Kim, Changhyeon Kim, Taeuk Kim

Real-world decision-making often requires integrating and reasoning over information from multiple modalities. While recent multimodal large language models (MLLMs) have shown promise in such tasks, their ability to perform multi-hop reasoning across diverse sources remains insufficiently evaluated. Existing benchmarks, such as MMQA, face challenges due to (1) data contamination and (2) a lack of complex queries that necessitate operations across more than two modalities, hindering accurate performance assessment. To address this, we present Financial Cross-Modal Multi-Hop Reasoning (FCMR), a benchmark created to analyze the reasoning capabilities of MLLMs by urging them to combine information from textual reports, tables, and charts within the financial domain. FCMR is categorized into three difficulty levels (Easy, Medium, and Hard), facilitating a step-by-step evaluation. In particular, problems at the Hard level require precise cross-modal three-hop reasoning and are designed to prevent the disregard of any modality. Experiments on this new benchmark reveal that even state-of-the-art MLLMs struggle, with the best-performing model (Claude 3.5 Sonnet) achieving only 30.4% accuracy on the most challenging tier. We also conduct analysis to provide insights into the inner workings of the models, including the discovery of a critical bottleneck in the information retrieval phase.

CL, Aug 22, 2025
CMR-SPB: Cross-Modal Multi-Hop Reasoning over Text, Image, and Speech with Path Balance

Seunghee Kim, Ingyu Bang, Seokgyu Jang et al.

Cross-modal multi-hop reasoning (CMR) is a valuable yet underexplored capability of multimodal large language models (MLLMs), entailing the integration of information from multiple modalities to produce a coherent output for a given context. We argue that existing benchmarks for evaluating this ability have critical shortcomings: (1) they largely overlook the speech modality, and (2) they exhibit heavily biased reasoning path distributions, which can severely undermine fair evaluation. To address these limitations, we introduce a novel benchmark, Cross-Modal Multi-Hop Reasoning over Text, Image, and Speech with Path Balance (CMR-SPB), designed to assess tri-modal multi-hop reasoning while ensuring both unbiased and diverse reasoning paths. Our experiments with the new dataset reveal consistent model failures in specific reasoning sequences and show that biased benchmarks risk misrepresenting model performance. Finally, based on our extensive analysis, we propose a new ECV (Extract, Connect, Verify) prompting technique that effectively mitigates the performance gap across different reasoning paths. Overall, we call for more careful evaluation in CMR to advance the development of robust multimodal AI.

CL, Mar 14, 2024
Hyper-CL: Conditioning Sentence Representations with Hypernetworks

Young Hyun Yoo, Jii Cha, Changhyeon Kim et al.

While the introduction of contrastive learning frameworks in sentence representation learning has significantly contributed to advancements in the field, it remains unclear whether state-of-the-art sentence embeddings can capture the fine-grained semantics of sentences, particularly when conditioned on specific perspectives. In this paper, we introduce Hyper-CL, an efficient methodology that integrates hypernetworks with contrastive learning to compute conditioned sentence representations. In our proposed approach, the hypernetwork is responsible for transforming pre-computed condition embeddings into corresponding projection layers. This enables the same sentence embeddings to be projected differently according to various conditions. Evaluation on two representative conditioning benchmarks, namely conditional semantic textual similarity and knowledge graph completion, demonstrates that Hyper-CL is effective in flexibly conditioning sentence representations while also being computationally efficient. We also provide a comprehensive analysis of the inner workings of our approach, leading to a better interpretation of its mechanisms.
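The hypernetwork idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single linear hypernetwork, the dimensions, the random weights, and the names `make_hypernetwork` and `project` are all assumptions chosen for illustration, and the contrastive training step is omitted entirely.

```python
import random


def make_hypernetwork(cond_dim, in_dim, out_dim, seed=0):
    """Build a toy linear hypernetwork (hypothetical, untrained).

    It maps a condition embedding to an out_dim x in_dim projection matrix.
    """
    rng = random.Random(seed)
    # One weight row per entry of the generated projection matrix.
    weights = [[rng.gauss(0.0, 0.1) for _ in range(cond_dim)]
               for _ in range(out_dim * in_dim)]

    def hypernetwork(condition):
        flat = [sum(w * c for w, c in zip(row, condition)) for row in weights]
        # Reshape the flat output into out_dim rows of in_dim entries.
        return [flat[r * in_dim:(r + 1) * in_dim] for r in range(out_dim)]

    return hypernetwork


def project(matrix, sentence):
    """Apply a generated projection layer to a sentence embedding."""
    return [sum(m * s for m, s in zip(row, sentence)) for row in matrix]


hyper = make_hypernetwork(cond_dim=3, in_dim=4, out_dim=2)
sentence = [0.5, -1.0, 0.25, 2.0]   # one sentence embedding, reused below
cond_a = [1.0, 0.0, 0.0]            # hypothetical condition embeddings
cond_b = [0.0, 1.0, 0.0]

proj_a = hyper(cond_a)              # depends only on the condition
proj_b = hyper(cond_b)
view_a = project(proj_a, sentence)  # same sentence, condition-specific views
view_b = project(proj_b, sentence)
```

Because each projection matrix depends only on the condition, it could be computed once per condition and reused across sentences, which is one plausible reading of the efficiency claim.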

RO, Dec 13, 2021
Aerial Chasing of a Dynamic Target in Complex Environments

Boseong Felipe Jeon, Changhyeon Kim, Hojoon Shin et al.

Rapidly generating an optimal chasing motion of a drone to follow a dynamic target among obstacles is challenging due to numerical issues arising from multiple conflicting objectives and non-convex constraints. This study proposes to resolve the difficulties with a fast and reliable pipeline that incorporates 1) a target movement forecaster and 2) a chasing planner. They are based on a sample-and-check approach that consists of the generation of high-quality candidate primitives and feasibility tests with a light computation load. We forecast the movement of the target by selecting an optimal prediction among a set of candidates built from past observations. Based on the prediction, we construct a set of prospective chasing trajectories which reduce the high-order derivatives, while maintaining the desired relative distance from the predicted target movement. Then, the candidate trajectories are tested for safety of the chaser and visibility of the target without loose approximation of the constraints. The proposed algorithm is thoroughly evaluated in challenging scenarios involving dynamic obstacles. Also, the overall process from target recognition to chasing motion planning is implemented fully onboard a drone, demonstrating real-world applicability.
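The sample-and-check pattern described in this abstract can be sketched in a heavily simplified 2D form. Everything here is an assumption made for illustration and not the paper's planner: static circular obstacles, straight-line candidate motions, a single predicted target position, a travel-distance cost, and the names `segment_hits_circle` and `plan_chase`.

```python
import math


def segment_hits_circle(p, q, center, radius):
    """Return True if segment p-q passes within `radius` of `center`."""
    px, py = p
    qx, qy = q
    cx, cy = center
    dx, dy = qx - px, qy - py
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        t = 0.0
    else:
        # Parameter of the closest point on the segment, clamped to [0, 1].
        t = max(0.0, min(1.0, ((cx - px) * dx + (cy - py) * dy) / length_sq))
    nx, ny = px + t * dx, py + t * dy
    return math.hypot(cx - nx, cy - ny) < radius


def plan_chase(chaser, target_pred, obstacles, distance=2.0, samples=16):
    """Sample candidate goals around the target, keep the cheapest feasible one."""
    best, best_cost = None, math.inf
    for k in range(samples):
        # Sample: goals on a circle at the desired relative distance.
        angle = 2.0 * math.pi * k / samples
        goal = (target_pred[0] + distance * math.cos(angle),
                target_pred[1] + distance * math.sin(angle))
        # Check: chaser path is collision-free and the target stays in view.
        safe = not any(segment_hits_circle(chaser, goal, c, r)
                       for c, r in obstacles)
        visible = not any(segment_hits_circle(goal, target_pred, c, r)
                          for c, r in obstacles)
        cost = math.hypot(goal[0] - chaser[0], goal[1] - chaser[1])
        if safe and visible and cost < best_cost:
            best, best_cost = goal, cost
    return best  # None when every candidate fails a check


# Hypothetical scene: one obstacle sits between the chaser and the target.
goal = plan_chase((0.0, 0.0), (6.0, 0.0), [((3.0, 0.0), 0.5)])
```

The checks are exact for this toy geometry, which mirrors the abstract's point that candidates are tested directly rather than against loosened constraints.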