Sameer Segal

CL
h-index47
3papers
410citations
Novelty33%
AI Score27

3 Papers

CLMar 22, 2023
MEGA: Multilingual Evaluation of Generative AI

Kabir Ahuja, Harshita Diddee, Rishav Hada et al. · microsoft-research

Generative AI models have shown impressive performance on many Natural Language Processing tasks such as language understanding, reasoning, and language generation. An important question being asked by the AI community today is about the capabilities and limits of these models, and it is clear that evaluating generative AI is very challenging. Most studies on generative LLMs have been restricted to English and it is unclear how capable these models are at understanding and generating text in other languages. We present the first comprehensive benchmarking of generative LLMs - MEGA, which evaluates models on standard NLP benchmarks, covering 16 NLP datasets across 70 typologically diverse languages. We compare the performance of generative LLMs including Chat-GPT and GPT-4 to State of the Art (SOTA) non-autoregressive models on these tasks to determine how well generative models perform compared to the previous generation of LLMs. We present a thorough analysis of the performance of models across languages and tasks and discuss challenges in improving the performance of generative LLMs on low-resource languages. We create a framework for evaluating generative LLMs in the multilingual setting and provide directions for future progress in the field.

LGDec 17, 2024
An Agentic Approach to Automatic Creation of P&ID Diagrams from Natural Language Descriptions

Shreeyash Gowaikar, Srinivasan Iyengar, Sameer Segal et al.

The Piping and Instrumentation Diagrams (P&IDs) are foundational to the design, construction, and operation of workflows in the engineering and process industries. However, their manual creation is often labor-intensive, error-prone, and lacks robust mechanisms for error detection and correction. While recent advancements in Generative AI, particularly Large Language Models (LLMs) and Vision-Language Models (VLMs), have demonstrated significant potential across various domains, their application in automating generation of engineering workflows remains underexplored. In this work, we introduce a novel copilot for automating the generation of P&IDs from natural language descriptions. Leveraging a multi-step agentic workflow, our copilot provides a structured and iterative approach to diagram creation directly from Natural Language prompts. We demonstrate the feasibility of the generation process by evaluating the soundness and completeness of the workflow, and show improved results compared to vanilla zero-shot and few-shot generation approaches.

CLOct 25, 2024
KAHANI: Culturally-Nuanced Visual Storytelling Tool for Non-Western Cultures

Hamna, Deepthi Sudharsan, Agrima Seth et al.

Large Language Models (LLMs) and Text-To-Image (T2I) models have demonstrated the ability to generate compelling text and visual stories. However, their outputs are predominantly aligned with the sensibilities of the Global North, often resulting in an outsider's gaze on other cultures. As a result, non-Western communities have to put extra effort into generating culturally specific stories. To address this challenge, we developed a visual storytelling tool called Kahani that generates culturally grounded visual stories for non-Western cultures. Our tool leverages off-the-shelf models GPT-4 Turbo and Stable Diffusion XL (SDXL). By using Chain of Thought (CoT) and T2I prompting techniques, we capture the cultural context from user's prompt and generate vivid descriptions of the characters and scene compositions. To evaluate the effectiveness of Kahani, we conducted a comparative user study with ChatGPT-4 (with DALL-E3) in which participants from different regions of India compared the cultural relevance of stories generated by the two tools. The results of the qualitative and quantitative analysis performed in the user study show that Kahani's visual stories are more culturally nuanced than those generated by ChatGPT-4. In 27 out of 36 comparisons, Kahani outperformed or was on par with ChatGPT-4, effectively capturing cultural nuances and incorporating more Culturally Specific Items (CSI), validating its ability to generate culturally grounded visual stories.