Out-of-Sample Hydrocarbon Production Forecasting: Time Series Machine Learning using Productivity Index-Driven Features and Inductive Conformal Prediction
This work addresses the problem of unreliable out-of-sample forecasts for hydrocarbon production, which is critical for reservoir engineers and the oil industry, though it is incremental as it combines existing methods with domain-specific features.
This research tackled hydrocarbon production forecasting by developing a machine learning framework that integrates Productivity Index-driven feature selection and Inductive Conformal Prediction for uncertainty quantification, resulting in the LSTM model achieving the lowest MAE of 19.468 on test data and 29.638 on out-of-sample forecasts for well PF14.
This research introduces a new ML framework designed to enhance the robustness of out-of-sample hydrocarbon production forecasting, specifically addressing multivariate time series analysis. The proposed methodology integrates Productivity Index (PI)-driven feature selection, a concept derived from reservoir engineering, with Inductive Conformal Prediction (ICP) for rigorous uncertainty quantification. Utilizing historical data from the Volve (wells PF14, PF12) and Norne (well E1H) oil fields, this study investigates the efficacy of various predictive algorithms-namely Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and eXtreme Gradient Boosting (XGBoost) - in forecasting historical oil production rates (OPR_H). All the models achieved "out-of-sample" production forecasts for an upcoming future timeframe. Model performance was comprehensively evaluated using traditional error metrics (e.g., MAE) supplemented by Forecast Bias and Prediction Direction Accuracy (PDA) to assess bias and trend-capturing capabilities. The PI-based feature selection effectively reduced input dimensionality compared to conventional numerical simulation workflows. The uncertainty quantification was addressed using the ICP framework, a distribution-free approach that guarantees valid prediction intervals (e.g., 95% coverage) without reliance on distributional assumptions, offering a distinct advantage over traditional confidence intervals, particularly for complex, non-normal data. Results demonstrated the superior performance of the LSTM model, achieving the lowest MAE on test (19.468) and genuine out-of-sample forecast data (29.638) for well PF14, with subsequent validation on Norne well E1H. These findings highlight the significant potential of combining domain-specific knowledge with advanced ML techniques to improve the reliability of hydrocarbon production forecasts.