CV, Mar 17
Understanding Pruning Regimes in Vision-Language Models Through Domain-Aware Layer Selection
Saeed Khaki, Nima Safaei, Kamal Ginotra
Transformer-based vision-language models (VLMs) contain substantial depth redundancy, yet the effect of removing specific decoder layers remains poorly understood, especially for domains that require tight coupling between perception and multi-step reasoning. We study structured decoder layer pruning through the lens of domain-aware activation similarity, measuring how strongly each layer transforms representations for math versus non-math inputs. This yields simple math-aware, non-math-aware, and mixed ranking criteria that identify layers whose input-output activations change least within a target domain. Across two state-of-the-art VLMs and a broad suite of math and general multimodal benchmarks, we uncover a consistent three-regime structure: at low pruning budgets, performance is highly sensitive to which layers are removed; at moderate budgets, methods converge as structural damage accumulates; and at high budgets, structural continuity dominates, favoring spacing-aware strategies. Our domain-aware rankings achieve the strongest stability in the ranking-sensitive regime, while matching or exceeding structure-aware baselines at larger budgets. These results provide a clearer picture of how depth contributes to domain-specific behavior in VLMs and offer a practical, interpretable approach to reducing model depth without sacrificing essential mathematical or general vision-language capabilities.
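The ranking idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes cosine similarity as the activation-similarity measure and assumes activations have already been collected for one domain (for example, math inputs). The function names `layer_similarity` and `rank_layers_for_pruning` are hypothetical.

```python
import numpy as np

def layer_similarity(hidden_in, hidden_out):
    """Mean cosine similarity between a layer's input and output activations.

    Both arrays have shape (tokens, dim). A value near 1 means the layer
    barely changes the direction of the representations (assumed measure).
    """
    num = np.sum(hidden_in * hidden_out, axis=-1)
    den = np.linalg.norm(hidden_in, axis=-1) * np.linalg.norm(hidden_out, axis=-1)
    return float(np.mean(num / np.maximum(den, 1e-12)))

def rank_layers_for_pruning(activations):
    """Rank layers from most to least redundant for one domain.

    `activations` is a list [h_0, h_1, ..., h_L], where h_i is the input to
    layer i and h_{i+1} is its output. Returns (order, scores).
    """
    scores = [
        layer_similarity(activations[i], activations[i + 1])
        for i in range(len(activations) - 1)
    ]
    # Layers with the highest input-output similarity are pruned first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

A mixed criterion could, under the same assumptions, average the per-layer scores obtained from math and non-math activations before sorting.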
AI, Jan 20
VisTIRA: Closing the Image-Text Modality Gap in Visual Math Reasoning via Structured Tool Integration
Saeed Khaki, Ashudeep Singh, Nima Safaei et al.
Vision-language models (VLMs) lag behind text-only language models on mathematical reasoning when the same problems are presented as images rather than text. We empirically characterize this as a modality gap: the same question in text form yields markedly higher accuracy than its visually typeset counterpart, due to compounded failures in reading dense formulas, layout, and mixed symbolic-diagrammatic context. First, we introduce VisTIRA (Vision and Tool-Integrated Reasoning Agent), a tool-integrated reasoning framework that enables structured problem solving by iteratively decomposing a given math problem (as an image) into natural language rationales and executable Python steps to determine the final answer. Second, we build a framework to measure and improve visual math reasoning: a LaTeX-based pipeline that converts chain-of-thought math corpora (e.g., NuminaMath) into challenging image counterparts, and a large set of synthetic tool-use trajectories derived from a real-world, homework-style image dataset (called SnapAsk) for fine-tuning VLMs. Our experiments show that tool-integrated supervision improves image-based reasoning, and OCR grounding can further narrow the gap for smaller models, although its benefit diminishes at scale. These findings highlight that modality gap severity inversely correlates with model size, and that structured reasoning and OCR-based grounding are complementary strategies for advancing visual mathematical reasoning.
LG, May 4, 2021
Winter wheat yield prediction using convolutional neural networks from environmental and phenological data
Amit Kumar Srivastava, Nima Safaei, Saeed Khaki et al.
Crop yield forecasting depends on many interactive factors, including crop genotype, weather, soil, and management practices. This study analyzes the performance of machine learning and deep learning methods for winter wheat yield prediction using an extensive dataset of weather, soil, and crop phenology variables in 271 counties across Germany from 1999 to 2019. We proposed a Convolutional Neural Network (CNN) model, which uses a 1-dimensional convolution operation to capture the time dependencies of environmental variables. We used eight supervised machine learning models as baselines and evaluated their predictive performance using RMSE, MAE, and correlation coefficient metrics to benchmark the yield prediction results. Our findings suggested that nonlinear models such as the proposed CNN, Deep Neural Network (DNN), and XGBoost were more effective in understanding the relationship between the crop yield and input data compared to the linear models. Our proposed CNN model outperformed all other baseline models used for winter wheat yield prediction (7 to 14% lower RMSE, 3 to 15% lower MAE, and 4 to 50% higher correlation coefficient than the best performing baseline across test data). We aggregated soil moisture and meteorological features at the weekly resolution to address the seasonality of the data. We also moved beyond prediction and interpreted the outputs of our proposed CNN model using SHAP and force plots which provided key insights in explaining the yield prediction results (importance of variables by time). We found DUL, wind speed at week ten, and radiation amount at week seven as the most critical features in winter wheat yield prediction.
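The three evaluation metrics named in this abstract (RMSE, MAE, and correlation coefficient) can be computed as in the sketch below. It assumes the Pearson correlation coefficient, which the abstract does not specify, and the function name `yield_metrics` is hypothetical.

```python
import numpy as np

def yield_metrics(y_true, y_pred):
    """Return RMSE, MAE, and Pearson correlation for predicted yields."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    corr = float(np.corrcoef(y_true, y_pred)[0, 1])
    return {"rmse": rmse, "mae": mae, "corr": corr}
```

Note that a constant offset in the predictions raises RMSE and MAE but leaves the correlation unchanged, which is one reason to report all three.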
CV, Mar 17, 2021
WheatNet: A Lightweight Convolutional Neural Network for High-throughput Image-based Wheat Head Detection and Counting
Saeed Khaki, Nima Safaei, Hieu Pham et al.
For a globally recognized plant breeding organization, manually recorded field observation data is crucial for plant breeding decision making. However, certain phenotypic traits such as plant color, height, and kernel counts can only be collected during a specific time window of a crop's growth cycle. Due to labor-intensive requirements, only a small subset of possible field observations is recorded each season. To help mitigate this data collection bottleneck in wheat breeding, we propose a novel deep learning framework to accurately and efficiently count wheat heads to aid in the gathering of real-time data for decision making. We call our model WheatNet and show that our approach is robust and accurate for a wide range of environmental conditions of the wheat field. WheatNet uses a truncated MobileNetV2 as a lightweight backbone feature extractor which merges feature maps with different scales to counter image scale variations. Then, extracted multi-scale features go to two parallel sub-networks for simultaneous density-based counting and localization tasks. Our proposed method achieves an MAE and RMSE of 3.85 and 5.19 in our wheat head counting task, respectively, while having significantly fewer parameters when compared to other state-of-the-art methods. Our experiments and comparisons with other state-of-the-art methods demonstrate the superiority and effectiveness of our proposed method.
LG, Dec 2, 2020
Regularization and False Alarms Quantification: Two Sides of the Explainability Coin
Nima Safaei, Pooria Assadi
Regularization is a well-established technique in machine learning (ML) to achieve an optimal bias-variance trade-off, which in turn reduces model complexity and enhances explainability. To this end, some hyper-parameters must be tuned, enabling the ML model to accurately fit the unseen data as well as the seen data. In this article, the authors argue that the regularization of hyper-parameters and the quantification of costs and risks of false alarms are in reality two sides of the same coin, explainability. Incorrect or non-existent estimation of either quantity undermines the measurability of the economic value of using ML, to an extent that might make it practically useless.
LG, Dec 4, 2019
Correspondent Banking Networks: Theory and Experiment
Nima Safaei, Ivan A. Sergienko
We employ a mathematical programming approach in conjunction with graph theory to study the structure of correspondent banking networks. Optimizing the network requires decisions to onboard, terminate, or restrict bank relationships so as to optimize the size and overall risk of the network. This study provides a theoretical foundation for detecting the components whose removal does not affect key properties of the network such as connectivity and diameter. We find that correspondent banking networks have a feature we call k-accessibility, which helps to drastically reduce the computational burden required for finding the above-mentioned components. We prove a number of fundamental theorems related to k-accessible directed graphs, which should also be applicable beyond the particular problem of financial networks. The theoretical findings are verified using data from a large international bank.
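As a point of reference for the component-detection problem described here, the brute-force check below tests each node of a directed graph for removability. It is only a naive baseline under stated assumptions, not the k-accessibility method of the paper: "connectivity" is taken to mean strong connectivity, "does not affect the diameter" is taken to mean the diameter does not increase, and the names `diameter` and `removable_nodes` are hypothetical.

```python
from collections import deque

def _bfs_distances(adj, src, removed):
    """Hop distances from src, ignoring the node `removed`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v != removed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj, removed=None):
    """Diameter of the directed graph without `removed`.

    `adj` maps every node to the set of its successors (every successor
    must also be a key). Returns None if the graph is not strongly connected.
    """
    nodes = [n for n in adj if n != removed]
    worst = 0
    for s in nodes:
        dist = _bfs_distances(adj, s, removed)
        if len(dist) < len(nodes):
            return None  # some node is unreachable from s
        worst = max(worst, max(dist.values()))
    return worst

def removable_nodes(adj):
    """Nodes whose removal keeps strong connectivity and does not raise the diameter."""
    base = diameter(adj)
    if base is None:
        return []
    result = []
    for n in adj:
        d = diameter(adj, removed=n)
        if d is not None and d <= base:
            result.append(n)
    return result
```

This check runs one BFS per node for every candidate removal, which is the kind of computational burden that a structural property such as k-accessibility is meant to reduce.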
AI, Mar 3, 2018
A Swift Heuristic Method for Work Order Scheduling under the Skilled-Workforce Constraint
Nima Safaei, Corey Kiassat
The problem considered is how to optimally allocate a set of jobs to technicians of different skills such that the number of technicians of each skill does not exceed the number of persons with that skill designation. The key motivation is quick sensitivity analysis with respect to workforce size, which is necessary in many industries in the presence of unexpected work orders. A time-indexed mathematical model is proposed to minimize the total weighted completion time of the jobs. The proposed model is decomposed into a number of single-skill sub-problems, each of which is a combination of a series of nested binary knapsack problems. A heuristic procedure is proposed to solve the problem. Our experimental results, based on a real-world case study, reveal that the proposed method quickly produces a schedule statistically close to the optimal one, while the classical optimal procedure is very time-consuming.
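The objective named in this abstract, total weighted completion time, can be illustrated for the simplest case of a single technician. The sketch below is not the paper's heuristic: it uses the classical weighted-shortest-processing-time ordering (Smith's rule), which is optimal only for one worker with no skill or capacity constraints. The function names and the `(name, duration, weight)` job format are assumptions made for illustration.

```python
def total_weighted_completion_time(sequence):
    """Sum of weight * completion time for jobs processed back to back.

    `sequence` is a list of (name, duration, weight) tuples in processing order.
    """
    clock = 0.0
    total = 0.0
    for _name, duration, weight in sequence:
        clock += duration          # this job completes at the current clock
        total += weight * clock
    return total

def wspt_order(jobs):
    """Order jobs by duration / weight, smallest ratio first (Smith's rule)."""
    return sorted(jobs, key=lambda job: job[1] / job[2])
```

For two jobs `("A", 3, 1)` and `("B", 1, 2)`, running B first gives a lower objective than running A first, because B is both shorter and more heavily weighted.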