AIApr 18, 2024
CAUS: A Dataset for Question Generation based on Human Cognition Leveraging Large Language ModelsMinjung Shin, Donghyun Kim, Jeh-Kwang Ryu
We introduce the Curious About Uncertain Scene (CAUS) dataset, designed to enable Large Language Models, specifically GPT-4, to emulate human cognitive processes for resolving uncertainties. Leveraging this dataset, we investigate the potential of LLMs to engage in questioning effectively. Our approach involves providing scene descriptions embedded with uncertainties to stimulate the generation of reasoning and queries. The queries are then classified according to multi-dimensional criteria. All procedures are facilitated by a collaborative system involving both LLMs and human researchers. Our results demonstrate that GPT-4 can effectively generate pertinent questions and grasp their nuances, particularly when given appropriate context and instructions. The study suggests that incorporating human-like questioning into AI models improves their ability to manage uncertainties, paving the way for future advancements in Artificial Intelligence (AI).
AIMay 15, 2025
"There Is No Such Thing as a Dumb Question," But There Are Good OnesMinjung Shin, Donghyun Kim, Jeh-Kwang Ryu
Questioning has become increasingly crucial for both humans and artificial intelligence, yet there remains limited research comprehensively assessing question quality. In response, this study defines good questions and presents a systematic evaluation framework. We propose two key evaluation dimensions: appropriateness (sociolinguistic competence in context) and effectiveness (strategic competence in goal achievement). Based on these foundational dimensions, a rubric-based scoring system was developed. By incorporating dynamic contextual variables, our evaluation framework achieves structure and flexibility through semi-adaptive criteria. The methodology was validated using the CAUS and SQUARE datasets, demonstrating the ability of the framework to access both well-formed and problematic questions while adapting to varied contexts. As we establish a flexible and comprehensive framework for question evaluation, this study takes a significant step toward integrating questioning behavior with structured analytical methods grounded in the intrinsic nature of questioning.
AIOct 8, 2021
Toward a Human-Level Video Understanding IntelligenceYu-Jung Heo, Minsu Lee, Seongho Choi et al.
We aim to develop an AI agent that can watch video clips and have a conversation with human about the video story. Developing video understanding intelligence is a significantly challenging task, and evaluation methods for adequately measuring and analyzing the progress of AI agent are lacking as well. In this paper, we propose the Video Turing Test to provide effective and practical assessments of video understanding intelligence as well as human-likeness evaluation of AI agents. We define a general format and procedure of the Video Turing Test and present a case study to confirm the effectiveness and usefulness of the proposed test.
CVJul 21, 2021
CogME: A Cognition-Inspired Multi-Dimensional Evaluation Metric for Story UnderstandingMinjung Shin, Seongho Choi, Yu-Jung Heo et al.
We introduce CogME, a cognition-inspired, multi-dimensional evaluation metric designed for AI models focusing on story understanding. CogME is a framework grounded in human thinking strategies and story elements that involve story understanding. With a specific breakdown of the questions, this approach provides a nuanced assessment revealing not only AI models' particular strengths and weaknesses but also the characteristics of the benchmark dataset. Our case study with the DramaQA dataset demonstrates a refined analysis of the model and the benchmark dataset. We argue the need for metrics based on understanding the nature of tasks and designed to align closely with human cognitive processes. This approach provides insights beyond traditional overall scores and paves the way for more sophisticated AI development targeting higher cognitive functions.
HCMay 16, 2020
Gentlemen on the Road: Understanding How Pedestrians Interpret Yielding Behavior of Autonomous Vehicles using Machine LearningYoon Kyung Lee, Yong-Eun Rhee, Jeh-Kwang Ryu et al.
Autonomous vehicles (AVs) can prevent collisions by understanding pedestrian intention. We conducted a virtual reality experiment with 39 participants and measured crossing times (seconds) and head orientation (yaw degrees). We manipulated AV yielding behavior (no-yield, slow-yield, and fast-yield) and the AV size (small, medium, and large). Using machine learning approach, we classified head orientation change of pedestrians by time into 6 clusters of patterns. Results indicate that pedestrian head orientation change was influenced by AV yielding behavior as well as the size of the AV. Participants fixated on the front most of the time even when the car approached near. Participants changed head orientation most frequently when a large size AV did not yield (no-yield). In post-experiment interviews, participants reported that yielding behavior and size affected their decision to cross and perceived safety. For autonomous vehicles to be perceived more safe and trustful, vehicle-specific factors such as size and yielding behavior should be considered in the designing process.
AIApr 1, 2019
Constructing Hierarchical Q&A Datasets for Video Story UnderstandingYu-Jung Heo, Kyoung-Woon On, Seongho Choi et al.
Video understanding is emerging as a new paradigm for studying human-like AI. Question-and-Answering (Q&A) is used as a general benchmark to measure the level of intelligence for video understanding. While several previous studies have suggested datasets for video Q&A tasks, they did not really incorporate story-level understanding, resulting in highly-biased and lack of variance in degree of question difficulty. In this paper, we propose a hierarchical method for building Q&A datasets, i.e. hierarchical difficulty levels. We introduce three criteria for video story understanding, i.e. memory capacity, logical complexity, and DIKW (Data-Information-Knowledge-Wisdom) pyramid. We discuss how three-dimensional map constructed from these criteria can be used as a metric for evaluating the levels of intelligence relating to video story understanding.