Shayan Vassef

AI
h-index8
4papers
14citations
Novelty49%
AI Score47

4 Papers

AIJun 2
VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark

Amirhossein Dabiriaghdam, Shayan Vassef, Mohammadreza Bakhtiari et al.

Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, specifically when they rely on visual aids. This gap is especially important because real engineering and scientific workflows often rely on visualization tools for analysis, validation, and decision-making. To study this discrepancy, we introduce VAMPS (Visual-Assisted Mathematical Problem Solving), a benchmark for graph-assisted mathematics. VAMPS contains 1,168 multimodal, bilingual multiple-choice question-answer pairs drawn from Iranian University Entrance Exam algebra and calculus problems and expanded with human-reviewed LLM-generated synthetic variants, all selected so that plotting provides a natural solution strategy by revealing intersections, extrema, asymptotes, etc. Designed for both benchmarking and diagnosis, VAMPS goes beyond prior multimodal benchmarks that primarily evaluate reasoning over fixed visual inputs by testing whether a model can benefit from constructing a useful graph and grounding its answer in the resulting visualization. Overall, we found that across a diverse set of models, direct analytical solving surprisingly outperforms tool-enabled visual solving, even on problems where plotting is a natural strategy.

AIJul 24, 2025
Agentic AI framework for End-to-End Medical Data Inference

Soorya Ram Shimgekar, Shayan Vassef, Abhay Goyal et al.

Building and deploying machine learning solutions in healthcare remains expensive and labor-intensive due to fragmented preprocessing workflows, model compatibility issues, and stringent data privacy constraints. In this work, we introduce an Agentic AI framework that automates the entire clinical data pipeline, from ingestion to inference, through a system of modular, task-specific agents. These agents handle both structured and unstructured data, enabling automatic feature selection, model selection, and preprocessing recommendation without manual intervention. We evaluate the system on publicly available datasets from geriatrics, palliative care, and colonoscopy imaging. For example, in the case of structured data (anxiety data) and unstructured data (colonoscopy polyps data), the pipeline begins with file-type detection by the Ingestion Identifier Agent, followed by the Data Anonymizer Agent ensuring privacy compliance, where we first identify the data type and then anonymize it. The Feature Extraction Agent identifies features using an embedding-based approach for tabular data, extracting all column names, and a multi-stage MedGemma-based approach for image data, which infers modality and disease name. These features guide the Model-Data Feature Matcher Agent in selecting the best-fit model from a curated repository. The Preprocessing Recommender Agent and Preprocessing Implementor Agent then apply tailored preprocessing based on data type and model requirements. Finally, the ``Model Inference Agent" runs the selected model on the uploaded data and generates interpretable outputs using tools like SHAP, LIME, and DETR attention maps. By automating these high-friction stages of the ML lifecycle, the proposed framework reduces the need for repeated expert intervention, offering a scalable, cost-efficient pathway for operationalizing AI in clinical environments.

AIAug 22, 2025
One VLM, Two Roles: Stage-Wise Routing and Specialty-Level Deployment for Clinical Workflows

Shayan Vassef, Soorya Ram Shimegekar, Abhay Goyal et al.

Clinical ML workflows are often fragmented and inefficient: triage, task selection, and model deployment are handled by a patchwork of task-specific networks. These pipelines are rarely aligned with data-science practice, reducing efficiency and increasing operational cost. They also lack data-driven model identification (from imaging/tabular inputs) and standardized delivery of model outputs. We present a framework that employs a single vision-language model (VLM) in two complementary, modular roles. First (Solution 1): the VLM acts as an aware model-card matcher that routes an incoming image to the appropriate specialist model via a three-stage workflow (modality -> primary abnormality -> model-card ID). Reliability is improved by (i) stage-wise prompts enabling early termination via "None"/"Other" and (ii) a calibrated top-2 answer selector with a stage-wise cutoff. This raises routing accuracy by +9 and +11 percentage points on the training and held-out splits, respectively, compared with a baseline router, and improves held-out calibration (lower Expected Calibration Error, ECE). Second (Solution 2): we fine-tune the same VLM on specialty-specific datasets so that one model per specialty covers multiple downstream tasks, simplifying deployment while maintaining performance. Across gastroenterology, hematology, ophthalmology, pathology, and radiology, this single-model deployment matches or approaches specialized baselines. Together, these solutions reduce data-science effort through more accurate selection, simplify monitoring and maintenance by consolidating task-specific models, and increase transparency via per-stage justifications and calibrated thresholds. Each solution stands alone, and in combination they offer a practical, modular path from triage to deployment.

CLJul 21, 2025
SOI Matters: Analyzing Multi-Setting Training Dynamics in Pretrained Language Models via Subsets of Interest

Shayan Vassef, Amirhossein Dabiriaghdam, Mohammadreza Bakhtiari et al.

This work investigates the impact of multi-task, multi-lingual, and multi-source learning approaches on the robustness and performance of pretrained language models. To enhance this analysis, we introduce Subsets of Interest (SOI), a novel categorization framework that identifies six distinct learning behavior patterns during training, including forgettable examples, unlearned examples, and always correct examples. Through SOI transition heatmaps and dataset cartography visualization, we analyze how examples shift between these categories when transitioning from single-setting to multi-setting configurations. We perform comprehensive experiments across three parallel comparisons: multi-task vs. single-task learning using English tasks (entailment, paraphrase, sentiment), multi-source vs. single-source learning using sentiment analysis datasets, and multi-lingual vs. single-lingual learning using intent classification in French, English, and Persian. Our results demonstrate that multi-source learning consistently improves out-of-distribution performance by up to 7%, while multi-task learning shows mixed results with notable gains in similar task combinations. We further introduce a two-stage fine-tuning approach where the second stage leverages SOI-based subset selection to achieve additional performance improvements. These findings provide new insights into training dynamics and offer practical approaches for optimizing multi-setting language model performance.