Monisha Jothi. R

3.3 · AI · Nov 28, 2025

Chunking Strategies for Multimodal AI Systems

Shashanka B R, Mohith Charan R, Seema Banu F

Chunking has emerged as a critical technique that enhances generative models by grounding their responses in efficiently segmented knowledge [1]. While initially developed for unimodal (primarily textual) domains, chunking approaches have been extended by recent advances in multimodal foundation models to incorporate diverse data types, including images, audio, and video [2]. A key component underpinning the success of these systems is the chunking strategy: how large, continuous streams of multimodal data are segmented into semantically meaningful units suitable for processing [3]. Despite its importance, chunking remains under-explored, especially in multimodal systems, where modality-specific constraints, semantic preservation, and cross-modal alignment introduce unique challenges. Our goal is to consolidate the landscape of multimodal chunking strategies, providing researchers and practitioners with a technical foundation and design space for developing more effective and efficient multimodal AI systems. This survey provides a comprehensive taxonomy and technical analysis of chunking strategies tailored to each modality: text, images, audio, video, and cross-modal data. We examine classical and modern approaches such as fixed-size token windowing, recursive text splitting, object-centric visual chunking, silence-based audio segmentation, and scene detection in videos. Each approach is analyzed in terms of its underlying methodology, supporting tools (e.g., LangChain, Detectron2, PySceneDetect), benefits, and challenges, particularly those related to granularity-context trade-offs and multimodal alignment.
Furthermore, we explore emerging cross-modal chunking strategies that aim to preserve alignment and semantic consistency across disparate data types [4]. We also provide comparative insights, highlight open problems such as asynchronous information density and noisy alignment signals, and identify opportunities for future research in adaptive, learning-based, and task-specific chunking. This survey paves the way for robust chunking pipelines that scale with modality complexity, enhance processing accuracy, and improve generative coherence in real-world applications.
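Two of the classical strategies named in this abstract, fixed-size token windowing and silence-based audio segmentation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the survey's implementation or any library's API: text is assumed to be already tokenized into a list, audio is assumed to be a list of normalized amplitude samples, and the function names and parameters (`fixed_size_windows`, `silence_segments`, `window`, `overlap`, `threshold`, `min_silence`) are hypothetical.

```python
from typing import List, Optional, Tuple


def fixed_size_windows(tokens: List[str], window: int, overlap: int) -> List[List[str]]:
    """Split a token list into fixed-size windows; consecutive windows share `overlap` tokens."""
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("require window > 0 and 0 <= overlap < window")
    step = window - overlap  # how far each window advances
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


def silence_segments(samples: List[float], threshold: float, min_silence: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) sample ranges of non-silent audio.

    Assumption: a sample is silent when its absolute amplitude is below `threshold`,
    and a run of at least `min_silence` silent samples closes the current segment.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index where the current voiced segment began
    silent_run = 0               # length of the current run of silent samples
    for i, x in enumerate(samples):
        if abs(x) >= threshold:
            if start is None:
                start = i
            silent_run = 0
        else:
            silent_run += 1
            if start is not None and silent_run >= min_silence:
                # The segment ends just before the first sample of this silent run.
                segments.append((start, i - silent_run + 1))
                start = None
    if start is not None:
        # Close a segment still open at the end, excluding any trailing silence.
        segments.append((start, len(samples) - silent_run))
    return segments


if __name__ == "__main__":
    print(fixed_size_windows(list("abcdefg"), window=4, overlap=2))
    print(silence_segments([0, 0.5, 0.6, 0, 0, 0, 0.7, 0.8, 0, 0], threshold=0.1, min_silence=2))
```

The overlap in the first function is one simple way to soften the granularity-context trade-off the abstract mentions: context cut at a window boundary still appears in the next chunk. The per-sample amplitude test in the second function is a deliberate simplification; practical audio pipelines typically measure energy over short frames (often in dB) before thresholding.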

0.2 · CL · Jan 19, 2019

Adaptive Artificial Intelligent Q&A Platform

M. R, Akram, C. P et al.

The paper presents an approach to building a question-answering system that processes the information in a large dataset and lets users gain knowledge from it by asking questions in natural language. The research covers four components: corpus preprocessing, question preprocessing, a deep neural network for answer extraction, and answer generation. The system understands the user's question and responds in natural language. The goal is to make users feel as if they were interacting with a person rather than a machine.


2 Papers