Maximus Powers

CL
h-index8
3papers
1citation
Novelty50%
AI Score39

3 Papers

CLNov 10, 2025
ConvFill: Model Collaboration for Responsive Conversational Voice Agents

Vidya Srinivas, Zachary Englhardt, Maximus Powers et al.

Deploying conversational voice agents with large language models faces a critical challenge: cloud-based foundation models provide deep reasoning and domain knowledge but introduce latency that disrupts natural conversation, while on-device models respond immediately but lack sophistication. We propose conversational infill, a task where a lightweight on-device model generates contextually appropriate dialogue while seamlessly incorporating streaming knowledge from a powerful backend model. This approach decouples response latency from model capability, enabling systems that feel responsive while accessing the full power of large-scale models. We present ConvFill, a 360M parameter model trained on synthetic multi-domain conversations. Evaluation across multiple backend models shows that conversational infill can be successfully learned, with ConvFill achieving accuracy improvements of 36-42% over standalone small models of the same size while consistently retaining sub-200ms response latencies. Our results demonstrate the promise of this approach for building on-device conversational agents that are both immediately responsive and knowledgeable.

CLAug 31, 2025Code
Prompting Away Stereotypes? Evaluating Bias in Text-to-Image Models for Occupations

Shaina Raza, Maximus Powers, Partha Pratim Saha et al.

Text-to-Image (TTI) models are powerful creative tools but risk amplifying harmful social biases. We frame representational societal bias assessment as an image curation and evaluation task and introduce a pilot benchmark of occupational portrayals spanning five socially salient roles (CEO, Nurse, Software Engineer, Teacher, Athlete). Using five state-of-the-art models: closed-source (DALLE 3, Gemini Imagen 4.0) and open-source (FLUX.1-dev, Stable Diffusion XL Turbo, Grok-2 Image), we compare neutral baseline prompts against fairness-aware controlled prompts designed to encourage demographic diversity. All outputs are annotated for gender (male, female) and race (Asian, Black, White), enabling structured distributional analysis. Results show that prompting can substantially shift demographic representations, but with highly model-specific effects: some systems diversify effectively, others overcorrect into unrealistic uniformity, and some show little responsiveness. These findings highlight both the promise and the limitations of prompting as a fairness intervention, underscoring the need for complementary model-level strategies. We release all code and data for transparency and reproducibility https://github.com/maximus-powers/img-gen-bias-analysis.

AIDec 22, 2024
ViLBias: Detecting and Reasoning about Bias in Multimodal Content

Shaina Raza, Caesar Saleh, Azib Farooq et al.

Detecting bias in multimodal news requires models that reason over text--image pairs, not just classify text. In response, we present ViLBias, a VQA-style benchmark and framework for detecting and reasoning about bias in multimodal news. The dataset comprises 40,945 text--image pairs from diverse outlets, each annotated with a bias label and concise rationale using a two-stage LLM-as-annotator pipeline with hierarchical majority voting and human-in-the-loop validation. We evaluate Small Language Models (SLMs), Large Language Models (LLMs), and Vision--Language Models (VLMs) across closed-ended classification and open-ended reasoning (oVQA), and compare parameter-efficient tuning strategies. Results show that incorporating images alongside text improves detection accuracy by 3--5\%, and that LLMs/VLMs better capture subtle framing and text--image inconsistencies than SLMs. Parameter-efficient methods (LoRA/QLoRA/Adapters) recover 97--99\% of full fine-tuning performance with $<5\%$ trainable parameters. For oVQA, reasoning accuracy spans 52--79\% and faithfulness 68--89\%, both improved by instruction tuning; closed accuracy correlates strongly with reasoning ($r = 0.91$). ViLBias offers a scalable benchmark and strong baselines for multimodal bias detection and rationale quality.