Devichand Budagam

CL
3papers
7citations
Novelty50%
AI Score35

3 Papers

CLJan 9
Router-Suggest: Dynamic Routing for Multimodal Auto-Completion in Visually-Grounded Dialogs

Sandeep Mishra, Devichand Budagam, Anubhab Mandal et al.

Real-time multimodal auto-completion is essential for digital assistants, chatbots, design tools, and healthcare consultations, where user inputs rely on shared visual context. We introduce Multimodal Auto-Completion (MAC), a task that predicts upcoming characters in live chats using partially typed text and visual cues. Unlike traditional text-only auto-completion (TAC), MAC grounds predictions in multimodal context to better capture user intent. To enable this task, we adapt MMDialog and ImageChat to create benchmark datasets. We evaluate leading vision-language models (VLMs) against strong textual baselines, highlighting trade-offs in accuracy and efficiency. We present Router-Suggest, a router framework that dynamically selects between textual models and VLMs based on dialog context, along with a lightweight variant for resource-constrained environments. Router-Suggest achieves a 2.3x to 10x speedup over the best-performing VLM. A user study shows that VLMs significantly excel over textual models on user satisfaction, notably saving user typing effort and improving the quality of completions in multi-turn conversations. These findings underscore the need for multimodal context in auto-completions, leading to smarter, user-aware assistants.

CLJun 18, 2024
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles

Devichand Budagam, Ashutosh Kumar, Mahsa Khoshnoodi et al.

Assessing the effectiveness of large language models (LLMs) in performing different tasks is crucial for understanding their strengths and weaknesses. This paper presents Hierarchical Prompting Taxonomy (HPT), grounded on human cognitive principles and designed to assess LLMs by examining the cognitive demands of various tasks. The HPT utilizes the Hierarchical Prompting Framework (HPF), which structures five unique prompting strategies in a hierarchical order based on their cognitive requirement on LLMs when compared to human mental capabilities. It assesses the complexity of tasks with the Hierarchical Prompting Index (HPI), which demonstrates the cognitive competencies of LLMs across diverse datasets and offers insights into the cognitive demands that datasets place on different LLMs. This approach enables a comprehensive evaluation of an LLMs problem solving abilities and the intricacy of a dataset, offering a standardized metric for task complexity. Extensive experiments with multiple datasets and LLMs show that HPF enhances LLM performance by 2% to 63% compared to baseline performance, with GSM8k being the most cognitively complex task among reasoning and coding tasks with an average HPI of 3.20 confirming the effectiveness of HPT. To support future research and reproducibility in this domain, the implementations of HPT and HPF are available here.

CVJun 6, 2024
OralBBNet: Spatially Guided Dental Segmentation of Panoramic X-Rays with Bounding Box Priors

Devichand Budagam, Azamat Zhanatuly Imanbayev, Iskander Rafailovich Akhmetov et al.

Teeth segmentation and recognition play a vital role in a variety of dental applications and diagnostic procedures. The integration of deep learning models has facilitated the development of precise and automated segmentation methods. Although prior research has explored teeth segmentation, not many methods have successfully performed tooth segmentation and detection simultaneously. This study presents UFBA-425, a dental dataset derived from the UFBA-UESC dataset, featuring bounding box and polygon annotations for 425 panoramic dental X-rays. In addition, this paper presents the OralBBNet architecture, which is based on the best segmentation and detection qualities of architectures such as U-Net and YOLOv8, respectively. OralBBNet is designed to improve the accuracy and robustness of tooth classification and segmentation on panoramic X-rays by leveraging the complementary strengths of U-Net and YOLOv8. Our approach achieved a 1-3% improvement in mean average precision (mAP) for tooth detection compared to existing techniques and a 15-20% improvement in the dice score for teeth segmentation over state-of-the-art (SOTA) solutions for various tooth categories and 2-4% improvement in the dice score compared to other SOTA segmentation architectures. The results of this study establish a foundation for the wider implementation of object detection models in dental diagnostics.