Kevin Vinsen

2.1 · LG · Jul 9

DeepPySR -- A Symbolic Regression Framework with Dynamic Pruning, Pareto Selection, and Hierarchical Composition for Real-World Scientific Discovery

Fuling Chen, Kevin Vinsen, Phillip Melton et al.

Symbolic regression (SR) discovers analytical equations from data, yielding glass-box models with directly interpretable formulas, unlike black-box methods that rely on unstable post-hoc tools such as SHAP or LIME. This transparency is crucial in clinical medicine and social science, but SR faces three challenges: high-dimensional inputs, principled selection of Pareto-front formulas, and data irregularities such as multicollinearity and class imbalance. We introduce DeepPySR, which addresses these issues with a dynamic variable-pruning schedule that removes irrelevant features during search, an exponential Pareto selection criterion that balances the trade-off between accuracy and complexity, and a multi-layer architecture for hierarchical symbolic composition. On four Feynman physics benchmarks and seven biomedical and social-science datasets, DeepPySR outperforms PySR and baselines on body fat (R$^2$: 0.794 vs. 0.702), heart disease (F1: 0.898 vs. 0.787), student performance (R$^2$: 0.964 vs. 0.948), and Raine BMI (R$^2$: 0.525 vs. 0.370), producing interpretable formulas aligned with domain risk factors.

2.6 · LG · Jul 26, 2024

Spatial Temporal Approach for High-Resolution Gridded Wind Forecasting across Southwest Western Australia

Fuling Chen, Kevin Vinsen, Arthur Filoche

Accurate wind speed and direction forecasting is paramount across many sectors, including agriculture, renewable energy generation, and bushfire management. However, conventional forecasting models struggle to predict wind conditions precisely at high spatial resolutions for individual locations or small geographical areas (< 20 km²), and to capture medium- to long-range temporal trends and comprehensive spatio-temporal patterns. To overcome these challenges, this study presents a spatial-temporal approach for high-resolution gridded wind forecasting at heights of 3 and 10 metres across large areas of the Southwest of Western Australia. The model uses data covering a broad geographic area and a diverse array of meteorological factors, including terrain characteristics, air pressure, 10-metre wind forecasts from the European Centre for Medium-Range Weather Forecasts, and limited observation data from sparsely distributed weather stations (such as 3-metre wind profiles, humidity, and temperature). It demonstrates promising advancements in wind forecasting accuracy and reliability across the entire region of interest. This paper shows the potential of our machine learning model for wind forecasts across various prediction horizons and spatial coverage. It can help facilitate more informed decision-making and enhance resilience across critical sectors.
