CY | Mar 16 | Code
InterveneBench: Benchmarking LLMs for Intervention Reasoning and Causal Study Design in Real Social Systems
Shaojie Shi, Zhengyu Shi, Lingran Zheng et al.
Causal inference in social science relies on end-to-end, intervention-centered research-design reasoning grounded in real-world policy interventions, but current benchmarks do not evaluate this capability in large language models (LLMs). We present InterveneBench, a benchmark designed to assess such reasoning in realistic social settings. Each instance in InterveneBench is derived from an empirical social science study and requires models to reason about policy interventions and identification assumptions without access to predefined causal graphs or structural equations. InterveneBench comprises 744 peer-reviewed studies across diverse policy domains. Experimental results show that state-of-the-art LLMs struggle in this setting. To address this limitation, we further propose STRIDES, a multi-agent framework that achieves significant performance improvements over state-of-the-art reasoning models. Our code and data are available at https://github.com/Sii-yuning/STRIDES.
CV | Dec 22, 2024 | Code
Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection
Jiangnan Yang, Shuangli Liu, Jingjun Wu et al.
In recent years, convolutional neural network (CNN)-based methods for detecting infrared small targets have achieved outstanding performance. However, these methods typically employ standard convolutions and neglect the spatial characteristics of the pixel distribution of infrared small targets. Therefore, we propose a novel pinwheel-shaped convolution (PConv) as a replacement for standard convolutions in the lower layers of the backbone network. PConv better aligns with the Gaussian spatial distribution of pixels in dim small targets, enhances feature extraction, significantly increases the receptive field, and introduces only a minimal increase in parameters. Additionally, while recent loss functions combine scale and location losses, they do not adequately account for the varying sensitivity of these losses across different target scales, which limits detection performance on dim small targets. To overcome this, we propose a scale-based dynamic (SD) Loss that dynamically adjusts the influence of scale and location losses based on target size, improving the network's ability to detect targets of varying scales. We construct a new benchmark, SIRST-UAVB, which is the largest and most challenging dataset to date for real-shot single-frame infrared small target detection. Lastly, by integrating PConv and SD Loss into the latest small target detection algorithms, we achieved significant performance improvements on IRSTD-1K and our SIRST-UAVB dataset, validating the effectiveness and generalizability of our approach. Code: https://github.com/JN-Yang/PConv-SDloss-Data
LG | Nov 27, 2024 | Code
DualCast: A Model to Disentangle Aperiodic Events from Traffic Series
Xinyu Su, Feng Liu, Yanchuan Chang et al.
Traffic forecasting is crucial for transportation system optimisation. Current models minimise the mean forecasting error, often favouring periodic events prevalent in the training data while overlooking critical aperiodic ones such as traffic incidents. To address this, we propose DualCast, a dual-branch framework that disentangles traffic signals into intrinsic spatial-temporal patterns and external environmental contexts, including aperiodic events. DualCast also employs a cross-time attention mechanism to capture high-order spatial-temporal relationships from both periodic and aperiodic patterns. DualCast is versatile: we integrate it with recent traffic forecasting models, consistently reducing their forecasting errors by up to 9.6% on multiple real datasets. Our source code is available at https://github.com/suzy0223/DualCast.
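To make the periodic/aperiodic distinction concrete, the following minimal sketch splits a toy series into a per-slot mean profile and a residual. This is a plain seasonal-mean decomposition, not DualCast's learned dual-branch model; the function name `split_periodic_residual`, the toy speeds, and the incident at step 6 are all hypothetical.

```python
from statistics import mean


def split_periodic_residual(series, period):
    """Split a series into a periodic profile and a residual.

    The periodic part at step t is the mean of all values that share
    the slot (t mod period); the residual is what is left over.
    Assumes len(series) >= period so that every slot has a value.
    """
    slots = [[] for _ in range(period)]
    for t, value in enumerate(series):
        slots[t % period].append(value)
    profile = [mean(slot) for slot in slots]
    periodic = [profile[t % period] for t in range(len(series))]
    residual = [value - p for value, p in zip(series, periodic)]
    return periodic, residual


# Toy traffic speeds: a repeating 4-step pattern with one incident-like drop.
speeds = [60.0, 50.0, 40.0, 50.0] * 3
speeds[6] = 10.0  # hypothetical incident at step 6

periodic, residual = split_periodic_residual(speeds, period=4)

# The step with the largest absolute residual is the aperiodic candidate.
incident_step = max(range(len(residual)), key=lambda t: abs(residual[t]))
```

A model trained only to minimise mean error would tend to fit the `periodic` part well; the `residual` is where incident-like deviations show up.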
AI | Jan 29
Zero-Shot Statistical Downscaling via Diffusion Posterior Sampling
Ruian Tie, Wenbo Xiong, Zhengyu Shi et al.
Conventional supervised climate downscaling struggles to generalize to Global Climate Models (GCMs) due to the lack of paired training data and inherent domain gaps relative to reanalysis. Meanwhile, current zero-shot methods suffer from physical inconsistencies and vanishing gradients under large scaling factors. We propose Zero-Shot Statistical Downscaling (ZSSD), a zero-shot framework that performs statistical downscaling without paired data during training. ZSSD leverages a Physics-Consistent Climate Prior learned from reanalysis data, conditioned on geophysical boundaries and temporal information to enforce physical validity. Furthermore, to enable robust inference across varying GCMs, we introduce Unified Coordinate Guidance. This strategy addresses the vanishing gradient problem in vanilla diffusion posterior sampling (DPS) and ensures consistency with large-scale fields. Results show that ZSSD significantly outperforms existing zero-shot baselines in 99th percentile errors and successfully reconstructs complex weather events, such as tropical cyclones, across heterogeneous GCMs.
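For context, the vanilla diffusion posterior sampling step that the abstract refers to is usually written as below. This is the standard form from the DPS literature, not ZSSD's Unified Coordinate Guidance. Reading $y$ as the coarse GCM field and $\mathcal{A}$ as a coarsening operator is an assumption made here for illustration; $x'_{t-1}$ denotes the unconditional reverse-diffusion sample, $s_\theta$ the learned score, and $\zeta_t$ a step size.

```latex
\hat{x}_0(x_t) = \frac{1}{\sqrt{\bar{\alpha}_t}}
  \left( x_t + (1 - \bar{\alpha}_t)\, s_\theta(x_t, t) \right),
\qquad
x_{t-1} = x'_{t-1} - \zeta_t \, \nabla_{x_t}
  \left\lVert y - \mathcal{A}\!\left( \hat{x}_0(x_t) \right) \right\rVert_2^2 .
```

The guidance term is the gradient of the measurement residual with respect to $x_t$; this is the gradient that the abstract describes as vanishing under large scaling factors.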
CE | May 11
QuantWeather: Quantile-Aware Probabilistic Forecasting for Subseasonal Precipitation
Lei Chen, Xinyu Su, Xiaohui Zhong et al.
Subseasonal precipitation forecasting is inherently uncertain due to chaotic atmospheric dynamics, making reliable uncertainty estimation essential for real-world applications. Existing approaches typically represent uncertainty through ensemble forecasts rather than directly modeling predictive distributions. However, due to systematic model biases, raw ensemble outputs are often not well calibrated and cannot be directly interpreted as reliable uncertainty estimates. As a result, operational systems rely on post-hoc calibration based on reforecast datasets, which are computationally expensive to generate and maintain. To address these limitations, we propose QuantWeather, an end-to-end probabilistic forecasting framework with a dual-head design. The probabilistic and deterministic heads are supervised with separate objectives and optimized jointly. The framework further supports stochastic sampling, enabling probabilistic outputs even with a single stochastic forward pass and allowing optional multi-sample aggregation. Extensive experiments show that QuantWeather demonstrates superior probabilistic forecasting skill while substantially reducing inference-time computational and storage costs.
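The abstract does not state which objectives supervise the two heads. As a minimal sketch, assume the probabilistic head is trained with the pinball (quantile) loss, a common choice for quantile-aware models, and the deterministic head with mean squared error. The function names, the `weight` parameter, and the toy values are all hypothetical.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss for a quantile level q in (0, 1)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)


def mse_loss(y_true, y_pred):
    """Mean squared error for the deterministic head."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def joint_loss(y_true, quantile_pred, point_pred, q, weight=1.0):
    """Sum of the two head losses; `weight` is an assumed balancing factor."""
    return pinball_loss(y_true, quantile_pred, q) + weight * mse_loss(y_true, point_pred)


# Toy precipitation values (mm) and hypothetical head outputs.
observed = [0.0, 2.0, 10.0]
predicted_q90 = [1.0, 1.0, 4.0]
predicted_point = [0.0, 2.0, 8.0]

loss_q90 = pinball_loss(observed, predicted_q90, q=0.9)
loss_q10 = pinball_loss(observed, predicted_q90, q=0.1)
total = joint_loss(observed, predicted_q90, predicted_point, q=0.9)
```

Scoring the same predictions at q = 0.9 gives a larger loss than at q = 0.1 because two of the three observations exceed the predicted value, and under-prediction is penalised more heavily at high quantile levels.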
LG | Jan 19, 2024 | Code
Spatial-temporal Forecasting for Regions without Observations
Xinyu Su, Jianzhong Qi, Egemen Tanin et al.
Spatial-temporal forecasting plays an important role in many real-world applications, such as traffic forecasting, air pollutant forecasting, and crowd-flow forecasting. State-of-the-art spatial-temporal forecasting models take data-driven approaches and rely heavily on data availability. Such models suffer from accuracy issues when data is incomplete, which is common in reality due to the heavy costs of deploying and maintaining sensors for data collection. A few recent studies have attempted to address the issue of incomplete data. They typically assume some data availability in a region of interest, either for a short period or at a few locations. In this paper, we further study spatial-temporal forecasting for a region of interest without any historical observations, to address scenarios such as unbalanced region development, progressive deployment of sensors, or lack of open data. We propose a model named STSM for the task. The model takes a contrastive learning-based approach to learn spatial-temporal patterns from adjacent regions that have recorded data. Our key insight is to learn from the locations that resemble those in the region of interest, and we propose a selective masking strategy to enable the learning. As a result, our model outperforms adapted state-of-the-art models, reducing errors consistently over both traffic and air pollutant forecasting tasks. The source code is available at https://github.com/suzy0223/STSM.
LG | Aug 12, 2025
Generalising Traffic Forecasting to Regions without Traffic Observations
Xinyu Su, Majid Sarvi, Feng Liu et al.
Traffic forecasting is essential for intelligent transportation systems. Accurate forecasting relies on continuous observations collected by traffic sensors. However, due to high deployment and maintenance costs, not all regions are equipped with such sensors. This paper aims to forecast for regions without traffic sensors, where the lack of historical traffic observations challenges the generalisability of existing models. We propose GenCast, a model whose core idea is to exploit external knowledge to compensate for the missing observations and to enhance generalisation. We integrate physics-informed neural networks into GenCast, enabling physical principles to regularise the learning process. We introduce an external signal learning module to explore correlations between traffic states and external signals such as weather conditions, further improving model generalisability. Additionally, we design a spatial grouping module to filter out localised features that hinder model generalisability. Extensive experiments show that GenCast consistently reduces forecasting errors on multiple real-world datasets.