Two preprints on arXiv.
One benchmarks deep learning architectures across U.S. grids; the other introduces tail-risk evaluation for safety-critical load forecasting.
Benchmarking Deep Learning Architectures for US Grid Demand Forecasting
Hong, S. and Lee, R.
Benchmark study comparing modern deep learning architectures for electricity demand forecasting across major U.S. grids. The paper shows how weather covariates change performance rankings and provides the public research basis for Gramm's evaluation framework.
Reliable Grid Forecasting: State Space Models for Safety-Critical Energy Systems
Hong, S. and Lee, R.
Introduces evaluation metrics focused on under-prediction risk for grid load forecasting. Benchmarks five neural architectures, including state space models and Transformers, on 24 months of California grid data. Shows that models with similar MAPE can have vastly different operational safety profiles, and proposes bias-controlled objectives to balance tail-risk minimization with preventing systematic over-forecasting.
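The point that similar MAPE can hide different safety profiles can be illustrated with a toy example. The sketch below is not the paper's metric set: `under_prediction_rate` and `mean_shortfall` are hypothetical helpers, and the load values are made up for illustration. It assumes loads are positive numbers in a common unit such as MW.

```python
# Toy illustration: two forecasts with identical MAPE but opposite bias.
# The helper names and numbers are assumptions, not taken from the paper.

def mape(forecast, actual):
    """Mean of |forecast - actual| / actual, returned as a fraction."""
    return sum(abs(f - a) / a for f, a in zip(forecast, actual)) / len(actual)

def under_prediction_rate(forecast, actual):
    """Share of intervals where the forecast fell below actual load."""
    return sum(f < a for f, a in zip(forecast, actual)) / len(actual)

def mean_shortfall(forecast, actual):
    """Average amount by which load exceeded the forecast (0 when covered)."""
    return sum(max(a - f, 0.0) for f, a in zip(forecast, actual)) / len(actual)

actual = [100.0, 100.0, 100.0, 100.0]
over_forecast = [105.0, 105.0, 105.0, 105.0]   # always 5% high
under_forecast = [95.0, 95.0, 95.0, 95.0]      # always 5% low

for name, forecast in [("over", over_forecast), ("under", under_forecast)]:
    print(
        name,
        round(mape(forecast, actual), 4),
        under_prediction_rate(forecast, actual),
        mean_shortfall(forecast, actual),
    )
```

Both forecasts score the same MAPE, yet only the second one ever leaves load uncovered. Penalizing shortfall alone would push a model toward the first forecast, which is the systematic over-forecasting that the bias-controlled objectives are meant to limit.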
What ships, how it’s measured.
The papers above explain the research path. The /accuracy page now shows live production evidence first, followed by the historical archive, so that older validation results are not mistaken for current product quality.
Current production claims use the latest canonical h24 scorecard. Historical results from Aug 2025 to Apr 2026 are kept as a research archive.
ISO references are shown only when paired h24 rows are available. Pending references are marked pending rather than filled by proxy.
Accuracy is reported as the mean of |forecast - actual| / actual (the absolute percentage error), taken over rows paired on issue time, target time, grid, and horizon.
Research candidates must use forecast-vintage weather available at issue time. Oracle weather can be studied only as a diagnostic ceiling.
The older low-MAPE results explain the research direction, but they do not govern public product claims until they are reproduced live.
Architecture and training changes must pass leak-free validation and live shadow checks before promotion.
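The measurement rules above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the production scorecard code: the row layout, the field names (`forecast_mw`, `actual_mw`, `weather_vintage`), the grid label `GRID_A`, and all values are hypothetical. It pairs each forecast row with its observed load, skips rows whose actual is still pending instead of filling them by proxy, averages the absolute percentage error, and flags rows whose weather input was produced after the issue time.

```python
from datetime import datetime

def pair_rows(forecasts, actuals):
    """Attach observed load to each forecast row; skip rows with no actual yet."""
    paired = []
    for row in forecasts:
        actual = actuals.get((row["grid"], row["target_time"]))
        if actual is None:
            continue  # pending: left unpaired rather than filled by proxy
        paired.append({**row, "actual_mw": actual})
    return paired

def mean_ape(paired):
    """Mean of |forecast - actual| / actual over paired rows, as a fraction."""
    if not paired:
        raise ValueError("no paired rows to score")
    errors = [abs(r["forecast_mw"] - r["actual_mw"]) / r["actual_mw"] for r in paired]
    return sum(errors) / len(errors)

def uses_issue_time_weather(row):
    """True when the weather input already existed when the forecast was issued."""
    return row["weather_vintage"] <= row["issue_time"]

# Hypothetical sample rows (values invented for illustration).
forecasts = [
    {"issue_time": datetime(2026, 1, 1, 0), "target_time": datetime(2026, 1, 2, 0),
     "grid": "GRID_A", "horizon": 24, "forecast_mw": 110.0,
     "weather_vintage": datetime(2025, 12, 31, 18)},
    {"issue_time": datetime(2026, 1, 1, 1), "target_time": datetime(2026, 1, 2, 1),
     "grid": "GRID_A", "horizon": 24, "forecast_mw": 90.0,
     "weather_vintage": datetime(2026, 1, 1, 0)},
    # No actual recorded yet, and weather observed after issue time (oracle).
    {"issue_time": datetime(2026, 1, 1, 2), "target_time": datetime(2026, 1, 2, 2),
     "grid": "GRID_A", "horizon": 24, "forecast_mw": 95.0,
     "weather_vintage": datetime(2026, 1, 2, 2)},
]
actuals = {
    ("GRID_A", datetime(2026, 1, 2, 0)): 100.0,
    ("GRID_A", datetime(2026, 1, 2, 1)): 100.0,
}

paired = pair_rows(forecasts, actuals)
leaky = [r for r in forecasts if not uses_issue_time_weather(r)]
print(len(paired), round(mean_ape(paired), 4), len(leaky))
```

Keeping the pairing step separate from the scoring step makes it easy to report how many rows were scored alongside the error itself, so a small or incomplete sample is visible rather than hidden inside an average.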
How to use this research
Read the papers for technical context. For a buying decision, use the live evidence page and request a market-specific backtest with your own evaluation window. We will deliver paired rows against your dates and horizon.
