CV, Oct 24, 2025
VLM-SlideEval: Evaluating VLMs on Structured Comprehension and Perturbation Sensitivity in PPT
Hyeonsu Kang, Emily Bao, Anjan Goswami
Vision-language models (VLMs) are increasingly used to evaluate multimodal content, including presentation slides, yet their slide-specific understanding remains underexplored, despite their growing role as critics in agentic, model-forward pipelines. We introduce VLM-SlideEval, an evaluation framework that probes VLMs along three axes: (1) element-level extraction from slide images aligned to ground truth; (2) robustness to controlled perturbations in geometry, style, and text; and (3) higher-level comprehension, such as recovering a deck's narrative order from shuffled slides. Using publicly available decks from Zenodo (https://huggingface.co/datasets/Forceless/Zenodo10K/viewer/default/pptx), we standardize ground-truth element metadata from PowerPoint XML and live renderings into a unified, verifiable schema. Empirically, VLMs underperform on pixel-accurate extraction and show non-trivial agreement, fidelity, and consistency under controlled perturbations, while performing better on single-slide content understanding; however, they do not reliably capture narrative structure across slides. These results highlight the limits of current VLMs for slide evaluation and motivate calibrated, critic-in-the-loop evaluators that drive iterative refinement and selection in agentic pipelines.
IR, Feb 20, 2020
Towards a Soft Faceted Browsing Scheme for Information Access
Yinan Zhang, Parikshit Sondhi, Anjan Goswami et al.
Faceted browsing is a commonly supported feature of user interfaces for access to information. Existing interfaces generally treat facet values selected by a user as hard filters: they display only the information items that strictly satisfy the filters, in their original ranking order. We propose a novel alternative strategy for faceted browsing, called soft faceted browsing, where the system also includes some possibly relevant items outside the selected filter in a non-intrusive way and re-ranks the items to better satisfy the user's information need. Such a soft faceted browsing strategy can be beneficial when the user does not have a confident, strict preference for the selected facet values, and is especially appropriate for applications such as e-commerce search, where the user would like to explore a larger space before finalizing a purchasing decision. We propose a probabilistic framework for modeling and solving the soft faceted browsing problem, and apply the framework to study the case of facet filter selection in e-commerce search engines. Preliminary experimental results demonstrate that the soft faceted browsing scheme is better than the traditional faceted browsing scheme in terms of its efficiency in helping users navigate the information space.
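The contrast between hard and soft facet filtering described in this abstract can be illustrated with a small sketch. This is not the paper's probabilistic framework, which the abstract does not specify. It is a simplified stand-in that multiplies a base relevance score by a fixed penalty for items outside the selected facet value. The `Item` type, the `mismatch_weight` parameter, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    brand: str        # the facet used in this example
    relevance: float  # base ranking score (assumed given), higher is better


def hard_filter(items, brand):
    """Traditional faceted browsing: keep only items that match the
    selected facet value, preserving their original order."""
    return [it for it in items if it.brand == brand]


def soft_filter(items, brand, mismatch_weight=0.5):
    """Illustrative soft variant (an assumption, not the paper's model):
    keep every item, down-weight those outside the selected facet value,
    and re-rank by the adjusted score."""
    def score(it):
        return it.relevance * (1.0 if it.brand == brand else mismatch_weight)
    return sorted(items, key=score, reverse=True)


# Hypothetical catalogue, listed in no particular order.
items = [
    Item("A", "acme", 0.9),
    Item("B", "zenith", 0.95),
    Item("C", "acme", 0.4),
    Item("D", "zenith", 0.5),
]

hard_names = [it.name for it in hard_filter(items, "acme")]
soft_names = [it.name for it in soft_filter(items, "acme")]
```

With these sample values, the hard filter returns only the two "acme" items. The soft variant keeps all four and places the highly relevant non-matching item "B" between the two matching ones. Setting `mismatch_weight` to 0 pushes every non-matching item below the matching ones, and setting it to 1 ignores the facet selection entirely.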