CV Jun 18, 2025
VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service
Xiasi Wang, Tianliang Yao, Simin Chen et al.
Vision-Language Models (VLMs) have demonstrated great potential in real-world applications. While existing research primarily focuses on improving their accuracy, their efficiency remains underexplored. Given the real-time demands of many applications and the high inference overhead of VLMs, efficiency robustness is a critical issue. However, previous studies evaluate efficiency robustness under unrealistic assumptions, requiring access to the model architecture and parameters -- an impractical scenario in ML-as-a-service settings, where VLMs are deployed via inference APIs. To address this gap, we propose VLMInferSlow, a novel approach for evaluating VLM efficiency robustness in a realistic black-box setting. VLMInferSlow incorporates fine-grained efficiency modeling tailored to VLM inference and leverages zero-order optimization to search for adversarial examples. Experimental results show that VLMInferSlow generates adversarial images with imperceptible perturbations, increasing the computational cost by up to 128.47%. We hope this research raises the community's awareness about the efficiency robustness of VLMs.
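To make the idea of zero-order optimization concrete, here is a minimal generic sketch, not the authors' implementation. It estimates a gradient purely from function queries (random-direction finite differences) and takes signed ascent steps that stay inside a small L-infinity ball. The names `estimate_gradient`, `zero_order_ascent`, and `toy_cost` are hypothetical, and `toy_cost` is only a stand-in for whatever cost signal a black-box API would expose.

```python
import random


def estimate_gradient(f, x, rng, num_samples=20, sigma=0.01):
    """Estimate the gradient of f at x using only function queries.

    Uses two-sided finite differences along random Gaussian directions.
    """
    grad = [0.0] * len(x)
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        x_plus = [xi + sigma * ui for xi, ui in zip(x, u)]
        x_minus = [xi - sigma * ui for xi, ui in zip(x, u)]
        slope = (f(x_plus) - f(x_minus)) / (2.0 * sigma)
        for i, ui in enumerate(u):
            grad[i] += slope * ui / num_samples
    return grad


def zero_order_ascent(f, x0, epsilon=0.1, step=0.02, iters=50, seed=0):
    """Maximize f within an L-infinity ball of radius epsilon around x0."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = estimate_gradient(f, x, rng)
        # Signed step in the estimated ascent direction.
        x = [xi + step * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]
        # Project back so the perturbation stays within the epsilon ball.
        x = [min(max(xi, ci - epsilon), ci + epsilon) for xi, ci in zip(x, x0)]
    return x


def toy_cost(x):
    """Hypothetical black-box cost (assumption: stands in for inference cost)."""
    return sum(x)


x0 = [0.5, 0.5, 0.5, 0.5]
x_adv = zero_order_ascent(toy_cost, x0)
```

In a real black-box setting each call to `f` would be an API query, so `num_samples` and `iters` directly control the query budget.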
CV May 21, 2024
Interpretable Interaction Modeling for Trajectory Prediction via Agent Selection and Physical Coefficient
Shiji Huang, Lei Ye, Min Chen et al.
A thorough understanding of the interaction between the target agent and surrounding agents is a prerequisite for accurate trajectory prediction. Although many methods have been explored, they assign correlation coefficients to surrounding agents in a purely learning-based manner. In this study, we present ASPILin, which manually selects interacting agents and replaces the attention scores in Transformer with a newly computed physical correlation coefficient, enhancing the interpretability of interaction modeling. Surprisingly, these simple modifications can significantly improve prediction performance and substantially reduce computational costs. We intentionally simplified our model in other aspects, such as map encoding. Remarkably, experiments conducted on the INTERACTION, highD, and CitySim datasets demonstrate that our method is efficient and straightforward, outperforming other state-of-the-art methods.
LG Jan 4
Utilizing Earth Foundation Models to Enhance the Simulation Performance of Hydrological Models with AlphaEarth Embeddings
Pengfei Qu, Wenyu Ouyang, Chi Zhang et al.
Predicting river flow in places without streamflow records is challenging because basins respond differently to climate, terrain, vegetation, and soils. Traditional basin attributes describe some of these differences, but they cannot fully represent the complexity of natural environments. This study examines whether AlphaEarth Foundation embeddings, which are learned from large collections of satellite images rather than designed by experts, offer a more informative way to describe basin characteristics. These embeddings summarize patterns in vegetation, land surface properties, and long-term environmental dynamics. We find that models using them achieve higher accuracy when predicting flows in basins not used for training, suggesting that they capture key physical differences more effectively than traditional attributes. We further investigate how selecting appropriate donor basins influences prediction in ungauged regions. Similarity based on the embeddings helps identify basins with comparable environmental and hydrological behavior, improving performance, whereas adding many dissimilar basins can reduce accuracy. The results show that satellite-informed environmental representations can strengthen hydrological forecasting and support the development of models that adapt more easily to different landscapes.
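As a rough illustration of similarity-based donor selection, the sketch below ranks candidate basins by cosine similarity between embedding vectors and keeps the top `k`. The abstract does not state which similarity measure is used, so cosine similarity is an assumption here, and the function names and toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_donors(target_embedding, candidate_embeddings, k=2):
    """Return the names of the k candidate basins most similar to the target."""
    ranked = sorted(
        candidate_embeddings.items(),
        key=lambda item: cosine_similarity(target_embedding, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy 2-D embeddings (made up for illustration; real embeddings are much longer).
target = [1.0, 0.0]
candidates = {
    "basin_a": [0.9, 0.1],
    "basin_b": [0.0, 1.0],
    "basin_c": [1.0, 0.5],
}
donors = select_donors(target, candidates, k=2)
```

Keeping `k` small reflects the abstract's observation that adding many dissimilar basins can reduce accuracy.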
LG Jan 12, 2021
Continental-scale streamflow modeling of basins with reservoirs: towards a coherent deep-learning-based strategy
Wenyu Ouyang, Kathryn Lawson, Dapeng Feng et al.
A large fraction of major waterways have dams influencing streamflow, which must be accounted for in large-scale hydrologic modeling. However, daily streamflow prediction for basins with dams is challenging for various modeling approaches, especially at large scales. Here we examined which types of dammed basins could be well represented by long short-term memory (LSTM) models using readily-available information, and delineated the remaining challenges. We analyzed data from 3557 basins (83% dammed) over the contiguous United States and noted strong impacts of reservoir purposes, degree of regulation (dor), and diversion on streamflow modeling. While a model trained on a widely-used reference-basin dataset performed poorly for non-reference basins, the model trained on the whole dataset presented a median Nash-Sutcliffe efficiency coefficient (NSE) of 0.74. The zero-dor, small-dor (with storage of approximately a month of average streamflow or less), and large-dor basins were found to have distinct behaviors, so migrating models between categories yielded catastrophic results, which means we must not treat small-dor basins as reference ones. However, training with pooled data from different sets yielded optimal median NSEs of 0.72, 0.79, and 0.64 for these respective groups, noticeably stronger than existing models. These results support a coherent modeling strategy where smaller dams (storing about a month of average streamflow or less) are modeled implicitly as part of basin rainfall-runoff processes; then, large-dor reservoirs of certain types can be represented explicitly. However, dammed basins must be present in the training dataset. Future work should examine separate modeling of large reservoirs for fire protection and irrigation, hydroelectric power generation, and flood control.
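For readers unfamiliar with the metric, the Nash-Sutcliffe efficiency is conventionally defined as one minus the ratio of the squared simulation error to the variance of the observations about their mean. The sketch below computes it per basin and then takes the median across basins, as in the reported median NSE values. The function names and the toy series are hypothetical.

```python
from statistics import median


def nash_sutcliffe_efficiency(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 is a perfect fit; 0.0 matches always predicting the observed mean.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    if variance == 0.0:
        raise ValueError("NSE is undefined for constant observations")
    return 1.0 - error / variance


def median_nse(basins):
    """Median NSE over an iterable of (observed, simulated) pairs, one per basin."""
    return median(nash_sutcliffe_efficiency(o, s) for o, s in basins)


# Toy daily streamflow series (made up for illustration).
obs = [1.0, 2.0, 3.0, 4.0]
perfect = nash_sutcliffe_efficiency(obs, obs)
mean_only = nash_sutcliffe_efficiency(obs, [2.5, 2.5, 2.5, 2.5])
```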