Prateek Singh

CL
h-index2
3papers
Novelty50%
AI Score39

3 Papers

73.1CLJun 2
MM-BizRAG: Rethinking Multimodal Retrieval-Augmented Generation for General Purpose Enterprise Q&A

Hanoz Bhathena, Parin Rajesh Jhaveri, Rohan Mittal et al.

Recent advances in multimodal retrieval-augmented generation (MM-RAG) have shifted toward minimal parsing, relying on page-level images for producing retriever embeddings and for answer generation. While efficient, this trend often neglects explicit handling of the rich, structured information in complex enterprise documents, instead depending on pre-trained embeddings or vision-language models to implicitly capture such structure. In this work, we take a more direct approach: MM-BizRAG proactively extracts and represents document structure via a document structure-aware split that dynamically routes documents through orientation-specific ingestion pipelines, applying explicit layout-aware parsing for vertically structured documents (e.g., reports) and holistic page-level representations for horizontally structured documents (e.g., slide decks). A unified LLM-driven artifact transformation pipeline with placeholder-based positional alignment preserves natural reading order, while inference-time multimodal assembly decouples retrieval representations from generation context, enabling richer, more grounded answers without any finetuning requirement. Through experiments on a large, heterogeneous enterprise dataset and two public benchmarks (SlideVQA and FinRAGBench-V), MM-BizRAG consistently outperforms state-of-the-art vision-centric baselines by up to 32% points, with especially strong gains on report-style layouts. Furthermore, we introduce FastRAGEval, a single-call LLM Judge metric for fine-grained generative recall that halves RAGChecker's cost while achieving stronger human alignment.

CVNov 8, 2025
A Dual-Mode ViT-Conditioned Diffusion Framework with an Adaptive Conditioning Bridge for Breast Cancer Segmentation

Prateek Singh, Moumita Dholey, P. K. Vinod

In breast ultrasound images, precise lesion segmentation is essential for early diagnosis; however, low contrast, speckle noise, and unclear boundaries make this difficult. Even though deep learning models have demonstrated potential, standard convolutional architectures frequently fall short in capturing enough global context, resulting in segmentations that are anatomically inconsistent. To overcome these drawbacks, we suggest a flexible, conditional Denoising Diffusion Model that combines an enhanced UNet-based generative decoder with a Vision Transformer (ViT) encoder for global feature extraction. We introduce three primary innovations: 1) an Adaptive Conditioning Bridge (ACB) for efficient, multi-scale fusion of semantic features; 2) a novel Topological Denoising Consistency (TDC) loss component that regularizes training by penalizing structural inconsistencies during denoising; and 3) a dual-head architecture that leverages the denoising objective as a powerful regularizer, enabling a lightweight auxiliary head to perform rapid and accurate inference on smaller datasets and a noise prediction head. Our framework establishes a new state-of-the-art on public breast ultrasound datasets, achieving Dice scores of 0.96 on BUSI, 0.90 on BrEaST and 0.97 on BUS-UCLM. Comprehensive ablation studies empirically validate that the model components are critical for achieving these results and for producing segmentations that are not only accurate but also anatomically plausible.

SPJan 26, 2022
Automated Atrial Fibrillation Classification Based on Denoising Stacked Autoencoder and Optimized Deep Network

Prateek Singh, Ambalika Sharma, Shreesha Maiya

The incidences of atrial fibrillation (AFib) are increasing at a daunting rate worldwide. For the early detection of the risk of AFib, we have developed an automatic detection system based on deep neural networks. For achieving better classification, it is mandatory to have good pre-processing of physiological signals. Keeping this in mind, we have proposed a two-fold study. First, an end-to-end model is proposed to denoise the electrocardiogram signals using denoising autoencoders (DAE). To achieve denoising, we have used three networks including, convolutional neural network (CNN), dense neural network (DNN), and recurrent neural networks (RNN). Compared the three models and CNN based DAE performance is found to be better than the other two. Therefore, the signals denoised by the CNN based DAE were used to train the deep neural networks for classification. Three neural networks' performance has been evaluated using accuracy, specificity, sensitivity, and signal to noise ratio (SNR) as the evaluation criteria. The proposed end-to-end deep learning model for detecting atrial fibrillation in this study has achieved an accuracy rate of 99.20%, a specificity of 99.50%, a sensitivity of 99.50%, and a true positive rate of 99.00%. The average accuracy of the algorithms we compared is 96.26%, and our algorithm's accuracy is 3.2% higher than this average of the other algorithms. The CNN classification network performed better as compared to the other two. Additionally, the model is computationally efficient for real-time applications, and it takes approx 1.3 seconds to process 24 hours ECG signal. The proposed model was also tested on unseen dataset with different proportions of arrhythmias to examine the model's robustness, which resulted in 99.10% of recall and 98.50% of precision.