LGJun 28, 2023
Mass Spectra Prediction with Structural Motif-based Graph Neural NetworksJiwon Park, Jeonghee Jo, Sungroh Yoon
Mass spectra, which are agglomerations of ionized fragments from targeted molecules, play a crucial role across various fields for the identification of molecular structures. A prevalent analysis method involves spectral library searches,where unknown spectra are cross-referenced with a database. The effectiveness of such search-based approaches, however, is restricted by the scope of the existing mass spectra database, underscoring the need to expand the database via mass spectra prediction. In this research, we propose the Motif-based Mass Spectrum Prediction Network (MoMS-Net), a system that predicts mass spectra using the information derived from structural motifs and the implementation of Graph Neural Networks (GNNs). We have tested our model across diverse mass spectra and have observed its superiority over other existing models. MoMS-Net considers substructure at the graph level, which facilitates the incorporation of long-range dependencies while using less memory compared to the graph transformer model.
46.0ROMar 17
DreamFlow: Local Navigation Beyond Observation via Conditional Flow Matching in the Latent SpaceJiwon Park, Dongkyu Lee, I Made Aswin Nahrendra et al.
Local navigation in cluttered environments often suffers from dense obstacles and frequent local minima. Conventional local planners rely on heuristics and are prone to failure, while deep reinforcement learning(DRL)based approaches provide adaptability but are constrained by limited onboard sensing. These limitations lead to navigation failures because the robot cannot perceive structures outside its field of view. In this paper, we propose DreamFlow, a DRL-based local navigation framework that extends the robot's perceptual horizon through conditional flow matching(CFM). The proposed CFM based prediction module learns probabilistic mapping between local height map latent representation and broader spatial representation conditioned on navigation context. This enables the navigation policy to predict unobserved environmental features and proactively avoid potential local minima. Experimental results demonstrate that DreamFlow outperforms existing methods in terms of latent prediction accuracy and navigation performance in simulation. The proposed method was further validated in cluttered real world environments with a quadrupedal robot. The project page is available at https://dreamflow-icra.github.io.
44.7CLMar 10
DEO: Training-Free Direct Embedding Optimization for Negation-Aware RetrievalTaegyeong Lee, Jiwon Park, Seunghyun Hwang et al.
Recent advances in Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) have enabled diverse retrieval methods. However, existing retrieval methods often fail to accurately retrieve results for negation and exclusion queries. To address this limitation, prior approaches rely on embedding adaptation or fine-tuning, which introduce additional computational cost and deployment complexity. We propose Direct Embedding Optimization (DEO), a training-free method for negation-aware text and multimodal retrieval. DEO decomposes queries into positive and negative components and optimizes the query embedding with a contrastive objective. Without additional training data or model updates, DEO outperforms baselines on NegConstraint, with gains of +0.0738 nDCG@10 and +0.1028 MAP@100, while improving Recall@5 by +6\% over OpenAI CLIP in multimodal retrieval. These results demonstrate the practicality of DEO for negation- and exclusion-aware retrieval in real-world settings.
CLMar 18, 2024
Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning CorpusSeungpil Lee, Woochang Sim, Donghyeon Shin et al.
The existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been predominantly results-centric, making it challenging to assess the inference process comprehensively. We introduce a novel approach using the Abstraction and Reasoning Corpus (ARC) benchmark to evaluate the inference and contextual understanding abilities of LLMs in a process-centric manner, focusing on three key components from the Language of Thought Hypothesis (LoTH): Logical Coherence, Compositionality, and Productivity. Our carefully designed experiments reveal that while LLMs demonstrate some inference capabilities, they still significantly lag behind human-level reasoning in these three aspects. The main contribution of this paper lies in introducing the LoTH perspective, which provides a method for evaluating the reasoning process that conventional results-oriented approaches fail to capture, thereby offering new insights into the development of human-level reasoning in artificial intelligence systems.
AISep 2, 2025
Exploring Diffusion Models for Generative Forecasting of Financial ChartsTaegyeong Lee, Jiwon Park, Kyunga Bang et al.
Recent advances in generative models have enabled significant progress in tasks such as generating and editing images from text, as well as creating videos from text prompts, and these methods are being applied across various fields. However, in the financial domain, there may still be a reliance on time-series data and a continued focus on transformer models, rather than on diverse applications of generative models. In this paper, we propose a novel approach that leverages text-to-image model by treating time-series data as a single image pattern, thereby enabling the prediction of stock price trends. Unlike prior methods that focus on learning and classifying chart patterns using architectures such as ResNet or ViT, we experiment with generating the next chart image from the current chart image and an instruction prompt using diffusion models. Furthermore, we introduce a simple method for evaluating the generated chart image against ground truth image. We highlight the potential of leveraging text-to-image generative models in the financial domain, and our findings motivate further research to address the current limitations and expand their applicability.
CLAug 20, 2025
DocHop-QA: Towards Multi-Hop Reasoning over Multimodal Document CollectionsJiwon Park, Seohyun Pyeon, Jinwoo Kim et al.
Despite recent advances in large language models (LLMs), most QA benchmarks are still confined to single-paragraph or single-document settings, failing to capture the complexity of real-world information-seeking tasks. Practical QA often requires multi-hop reasoning over information distributed across multiple documents, modalities, and structural formats. Although prior datasets made progress in this area, they rely heavily on Wikipedia-based content and unimodal plain text, with shallow reasoning paths that typically produce brief phrase-level or single-sentence answers, thus limiting their realism and generalizability. We propose DocHop-QA, a large-scale benchmark comprising 11,379 QA instances for multimodal, multi-document, multi-hop question answering. Constructed from publicly available scientific documents sourced from PubMed, DocHop-QA is domain-agnostic and incorporates diverse information formats, including textual passages, tables, and structural layout cues. Unlike existing datasets, DocHop-QA does not rely on explicitly hyperlinked documents; instead, it supports open-ended reasoning through semantic similarity and layout-aware evidence synthesis. To scale realistic QA construction, we designed an LLM-driven pipeline grounded in 11 high-frequency scientific question concepts. We evaluated DocHop-QA through four tasks spanning structured index prediction, generative answering, and multimodal integration, reflecting both discriminative and generative paradigms. These tasks demonstrate DocHop-QA's capacity to support complex, multimodal reasoning across multiple documents.
CVApr 22, 2025
Structure-Preserving Zero-Shot Image Editing via Stage-Wise Latent Injection in Diffusion ModelsDasol Jeong, Donggoo Kang, Jiwon Park et al.
We propose a diffusion-based framework for zero-shot image editing that unifies text-guided and reference-guided approaches without requiring fine-tuning. Our method leverages diffusion inversion and timestep-specific null-text embeddings to preserve the structural integrity of the source image. By introducing a stage-wise latent injection strategy-shape injection in early steps and attribute injection in later steps-we enable precise, fine-grained modifications while maintaining global consistency. Cross-attention with reference latents facilitates semantic alignment between the source and reference. Extensive experiments across expression transfer, texture transformation, and style infusion demonstrate state-of-the-art performance, confirming the method's scalability and adaptability to diverse image editing scenarios.
LGJun 29, 2021
GeoT: A Geometry-aware Transformer for Reliable Molecular Property Prediction and Chemically Interpretable Representation LearningBumju Kwak, Jiwon Park, Taewon Kang et al.
In recent years, molecular representation learning has emerged as a key area of focus in various chemical tasks. However, many existing models fail to fully consider the geometric information of molecular structures, resulting in less intuitive representations. Moreover, the widely used message-passing mechanism is limited to provide the interpretation of experimental results from a chemical perspective. To address these challenges, we introduce a novel Transformer-based framework for molecular representation learning, named the Geometry-aware Transformer (GeoT). GeoT learns molecular graph structures through attention-based mechanisms specifically designed to offer reliable interpretability, as well as molecular property prediction. Consequently, GeoT can generate attention maps of interatomic relationships associated with training objectives. In addition, GeoT demonstrates comparable performance to MPNN-based models while achieving reduced computational complexity. Our comprehensive experiments, including an empirical simulation, reveal that GeoT effectively learns the chemical insights into molecular structures, bridging the gap between artificial intelligence and molecular sciences.