37.9CVMay 15Code
TriALS: Triphasic-Aided Liver Lesion Segmentation Benchmark in Non-Contrast CTMarawan Elbatel, Mohamed Ghonim, Jiaji Mao et al.
Automated segmentation of liver lesions on non-contrast computed tomography (NCCT) is clinically important but fundamentally challenging, particularly in low-resource settings across Africa and Asia where contrast agents are frequently unavailable. Progress has been limited by the absence of annotated NCCT benchmarks. Here we describe the TriALS challenge for automated liver lesion segmentation under contrast-limited conditions, supported by a multi-centre dataset of 150 cases with four-phase CT acquisitions (600 volumes) from Egyptian and Chinese institutions. Algorithms were evaluated on 70 cases from three institutions, including an independent external cohort. The top-performing method achieved a mean venous-phase Dice of 0.754, consistent with human-level performance, yet dropped to 0.57 on NCCT. On external validation, the leading method outperformed off-the-shelf models by up to 28% in Dice on NCCT. Algorithm performance was most strongly predicted by training data scale and pre-training strategy. A cross-year comparison exposed a persistent perceptual barrier on NCCT that scaling pre-training alone cannot overcome. Data, annotations, and code are available at https://github.com/xmed-lab/TriALS.
66.9DCMar 26
DFLOP: A Data-driven Framework for Multimodal LLM Training Pipeline OptimizationHyeonjun An, Sihyun Kim, Chaerim Lim et al.
Multimodal Large Language Models (MLLMs) have achieved remarkable advances by integrating text, image, and audio understanding within a unified architecture. However, existing distributed training frameworks remain fundamentally data-blind: they parallelize computation without accounting for variations in input data characteristics. This data unawareness leads to severe computation skew across stages and microbatches, where heterogeneous multimodal inputs incur different processing costs. Consequently, GPU resources are unevenly utilized, synchronization delays accumulate, and overall training efficiency degrades. To address this limitation, we present DFLOP, a data-driven framework for multimodal LLM training pipeline optimization. DFLOP continuously profiles runtime behavior to capture data-induced computation variance and employs predictive scheduling to balance workloads across stages and microbatches. By coupling data characteristics with execution planning, DFLOP substantially improves GPU utilization and throughput. Extensive experiments on large-scale multimodal benchmarks show that DFLOP achieves up to 3.6x faster training compared to state-of-the-art distributed training frameworks.
IRJan 12
ReinPool: Reinforcement Learning Pooling Multi-Vector Embeddings for Retrieval SystemSungguk Cha, DongWook Kim, Mintae Kim et al.
Multi-vector embedding models have emerged as a powerful paradigm for document retrieval, preserving fine-grained visual and textual details through token-level representations. However, this expressiveness comes at a staggering cost: storing embeddings for every token inflates index sizes by over $1000\times$ compared to single-vector approaches, severely limiting scalability. We introduce \textbf{ReinPool}, a reinforcement learning framework that learns to dynamically filter and pool multi-vector embeddings into compact, retrieval-optimized representations. By training with an inverse retrieval objective and NDCG-based rewards, ReinPool identifies and retains only the most discriminative vectors without requiring manual importance annotations. On the Vidore V2 benchmark across three vision-language embedding models, ReinPool compresses multi-vector representations by $746$--$1249\times$ into single vectors while recovering 76--81\% of full multi-vector retrieval performance. Compared to static mean pooling baselines, ReinPool achieves 22--33\% absolute NDCG@3 improvement, demonstrating that learned selection significantly outperforms heuristic aggregation.
27.1CYApr 29
The Synthetic Social Graph: Emergent Behavior in AI Agent CommunitiesSungguk Cha, DongWook Kim
Large language model (LLM) agents are increasingly deployed in social settings, yet little is known about how they interact in open-ended environments. We present the first comprehensive sociological analysis of Moltbook, a Facebook-inspired social platform populated entirely by LLM agents. Analyzing 184,203 posts and 465,136 comments across 14 daily snapshots (2026-04-14 to 2026-04-28), we examine agent sociality through six research questions grounded in classical social theory: bonding vs. bridging communities, status hierarchies, temporal coordination, information diffusion, identity performance, and norm enforcement. Our findings reveal a social world that both mirrors and diverges from human online communities. Reciprocity is strikingly low (3.8% multi-day vs. 1.6% single-day; below the 10-30% range typical of human baselines), suggesting "attention bonding without exchange bonding." Prestige is heavy-tailed (top score 104.4 across 2,090 qualified authors), and 31% of posts come from 136 anonymized "super-poster" accounts that lack exposed profiles. Temporal activity is broadly flat across the day with a sustained 12:00-20:00 UTC working-hour band; k-means recovers six distinct temporal communities. Of 458 bridge agents, 325 carry at least one tracked viral phrase; 99.7% of those (324/325) are late amplifiers, not early adopters. Identity performance shows no unconditional engagement payoff (-72%), but stratifying by post-volume quartile reverses the sign in the upper half of the distribution -- a Simpson's-paradox effect rather than a uniform penalty. Most remarkably, downvotes are rare (0.9%), and a comment-sentiment test rejects the alternative-channel hypothesis: textual sanction is also absent. We frame these patterns through a "parasocial simulators" construct.
LGJan 1, 2024
Unsupervised Outlier Detection using Random Subspace and Subsampling Ensembles of Dirichlet Process MixturesDongwook Kim, Juyeon Park, Hee Cheol Chung et al.
Probabilistic mixture models are recognized as effective tools for unsupervised outlier detection owing to their interpretability and global characteristics. Among these, Dirichlet process mixture models stand out as a strong alternative to conventional finite mixture models for both clustering and outlier detection tasks. Unlike finite mixture models, Dirichlet process mixtures are infinite mixture models that automatically determine the number of mixture components based on the data. Despite their advantages, the adoption of Dirichlet process mixture models for unsupervised outlier detection has been limited by challenges related to computational inefficiency and sensitivity to outliers in the construction of outlier detectors. Additionally, Dirichlet process Gaussian mixtures struggle to effectively model non-Gaussian data with discrete or binary features. To address these challenges, we propose a novel outlier detection method that utilizes ensembles of Dirichlet process Gaussian mixtures. This unsupervised algorithm employs random subspace and subsampling ensembles to ensure efficient computation and improve the robustness of the outlier detector. The ensemble approach further improves the suitability of the proposed method for detecting outliers in non-Gaussian data. Furthermore, our method uses variational inference for Dirichlet process mixtures, which ensures both efficient and rapid computation. Empirical analyses using benchmark datasets demonstrate that our method outperforms existing approaches in unsupervised outlier detection.
CVJul 31, 2025
Generalized Reinforcement Learning for Retriever-Specific Query Rewriter with Unstructured Real-World DocumentsSungguk Cha, DongWook Kim, Taeseung Hahn et al.
Retrieval-Augmented Generation (RAG) systems rely heavily on effective query formulation to unlock external knowledge, yet optimizing queries for diverse, unstructured real-world documents remains a challenge. We introduce \textbf{RL-QR}, a reinforcement learning framework for retriever-specific query rewriting that eliminates the need for human-annotated datasets and extends applicability to both text-only and multi-modal databases. By synthesizing scenario-question pairs and leveraging Generalized Reward Policy Optimization (GRPO), RL-QR trains query rewriters tailored to specific retrievers, enhancing retrieval performance across varied domains. Experiments on industrial in-house data demonstrate significant improvements, with $\text{RL-QR}_{\text{multi-modal}}$ achieving an 11\% relative gain in NDCG@3 for multi-modal RAG and $\text{RL-QR}_{\text{lexical}}$ yielding a 9\% gain for lexical retrievers. However, challenges persist with semantic and hybrid retrievers, where rewriters failed to improve performance, likely due to training misalignments. Our findings highlight RL-QR's potential to revolutionize query optimization for RAG systems, offering a scalable, annotation-free solution for real-world retrieval tasks, while identifying avenues for further refinement in semantic retrieval contexts.
LGAug 8, 2025
A New Lens on Homelessness: Daily Tent Monitoring with 311 Calls and Street ImagesWooyong Jung, Sola Kim, Dongwook Kim et al.
Homelessness in the United States has surged to levels unseen since the Great Depression. However, existing methods for monitoring it, such as point-in-time (PIT) counts, have limitations in terms of frequency, consistency, and spatial detail. This study proposes a new approach using publicly available, crowdsourced data, specifically 311 Service Calls and street-level imagery, to track and forecast homeless tent trends in San Francisco. Our predictive model captures fine-grained daily and neighborhood-level variations, uncovering patterns that traditional counts often overlook, such as rapid fluctuations during the COVID-19 pandemic and spatial shifts in tent locations over time. By providing more timely, localized, and cost-effective information, this approach serves as a valuable tool for guiding policy responses and evaluating interventions aimed at reducing unsheltered homelessness.
CVMar 28, 2025
Dataset Distillation of 3D Point Clouds via Distribution MatchingJae-Young Yim, Dongwook Kim, Jae-Young Sim
Large-scale datasets are usually required to train deep neural networks, but it increases the computational complexity hindering the practical applications. Recently, dataset distillation for images and texts has been attracting a lot of attention, that reduces the original dataset to a synthetic dataset to alleviate the computational burden of training while preserving essential task-relevant information. However, the dataset distillation for 3D point clouds remains largely unexplored, as the point clouds exhibit fundamentally different characteristics from that of images, making the dataset distillation more challenging. In this paper, we propose a distribution matching-based distillation framework for 3D point clouds that jointly optimizes the geometric structures as well as the orientations of the synthetic 3D objects. To address the semantic misalignment caused by unordered indexing of points, we introduce a Semantically Aligned Distribution Matching loss computed on the sorted features in each channel. Moreover, to address the rotation variation, we jointly learn the optimal rotation angles while updating the synthetic dataset to better align with the original feature distribution. Extensive experiments on widely used benchmark datasets demonstrate that the proposed method consistently outperforms existing dataset distillation methods, achieving superior accuracy and strong cross-architecture generalization.