Two preprints on arXiv.

One benchmarks deep learning architectures across U.S. grids; one introduces tail-risk evaluation for safety-critical load forecasting.

arXiv:2601.01410 · 2026

Benchmarking Deep Learning Architectures for US Grid Demand Forecasting

Hong, S. and Lee, R.

Benchmark study comparing modern deep learning architectures for electricity demand forecasting across major U.S. grids. The paper shows how weather covariates change performance rankings and provides the public research basis for Gramm's evaluation framework.

Deep Learning · Demand Forecasting · CAISO · Weather Integration

arXiv:2602.21415 · 2026

Reliable Grid Forecasting: State Space Models for Safety-Critical Energy Systems

Hong, S. and Lee, R.

Introduces evaluation metrics focused on under-prediction risk for grid load forecasting. Benchmarks five neural architectures, including state space models and Transformers, on 24 months of California grid data. Shows that models with similar MAPE can have vastly different operational safety profiles, and proposes bias-controlled objectives to balance tail-risk minimization with preventing systematic over-forecasting.

State Space Models · Safety-Critical · Tail Risk · CAISO

What ships, how it’s measured.

The papers above explain the research path. The /accuracy page shows live production evidence first, then the historical archive, so that older validation results are not mistaken for current product quality.

Evidence split

Live scorecard first

Current production claims use the latest canonical h24 scorecard. Historical results from Aug 2025 to Apr 2026 are kept as a research archive.

Baselines

Published ISO day-ahead forecasts

ISO references are shown only when paired h24 rows are available. Where they are not yet available, the reference is marked pending rather than filled in with a proxy.
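The pairing rule above can be sketched as follows. This is a minimal illustration, not the production scorer: the key layout `(issue_time, target_time, grid, horizon_hours)` and the function `iso_reference` are hypothetical. The point it shows is that the model and the ISO day-ahead forecast are scored on exactly the same rows, and that a missing ISO series yields "pending" instead of a substitute baseline.

```python
from typing import Dict, Tuple, Union

# Hypothetical key layout: (issue_time, target_time, grid, horizon_hours).
RowKey = Tuple[str, str, str, int]
Series = Dict[RowKey, float]


def iso_reference(model: Series, iso: Series, actuals: Series,
                  horizon: int = 24) -> Union[dict, str]:
    """Score the model and the ISO day-ahead forecast on the same paired rows.

    Only rows present in all three inputs at the requested horizon are used,
    so both MAPE values are computed over an identical evaluation set.
    """
    keys = [k for k in set(model) & set(iso) & set(actuals) if k[3] == horizon]
    if not keys:
        # No paired ISO rows yet: report "pending" instead of a proxy baseline.
        return "pending"

    def mape(preds: Series) -> float:
        # Mean of |forecast - actual| / actual over the shared keys.
        return sum(abs(preds[k] - actuals[k]) / actuals[k] for k in keys) / len(keys)

    return {"rows": len(keys), "model_mape": mape(model), "iso_mape": mape(iso)}
```

Intersecting the key sets before scoring is the design choice that matters: if the model were scored on more rows than the ISO reference, the two numbers would not be comparable.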

Metric

Standard metric: h24 MAPE

Mean of |forecast - actual| / actual over rows paired on issue time, target time, grid, and horizon.
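Written out, the metric is MAPE = (1/N) Σ |forecast_i - actual_i| / actual_i, where the sum runs over the N paired h24 rows. A minimal sketch of that computation, assuming forecasts and actuals are dictionaries keyed by a hypothetical `(issue_time, target_time, grid, horizon_hours)` tuple; `h24_mape` is an illustrative name, not a published API.

```python
from typing import Dict, Tuple

# Hypothetical key layout: (issue_time, target_time, grid, horizon_hours).
RowKey = Tuple[str, str, str, int]


def h24_mape(forecasts: Dict[RowKey, float], actuals: Dict[RowKey, float]) -> float:
    """Mean of |forecast - actual| / actual over paired h24 rows.

    A row counts only if the same key appears in both inputs, so forecasts
    without a matching actual (or vice versa) are excluded, not imputed.
    """
    paired = [k for k in forecasts if k in actuals and k[3] == 24]
    if not paired:
        raise ValueError("no paired h24 rows")
    total = 0.0
    for key in paired:
        actual = actuals[key]
        if actual <= 0:
            # Percentage error is undefined for a non-positive actual.
            raise ValueError(f"non-positive actual for {key}")
        total += abs(forecasts[key] - actual) / actual
    return total / len(paired)
```

For example, forecasts of 105 and 190 against actuals of 100 and 200 each carry a 5% error, so the function returns 0.05; a horizon-1 row in the same inputs is ignored.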

Weather inputs

Causal weather only

Research candidates must use weather forecast vintages that were available at issue time. Oracle (observed) weather may be studied only as a diagnostic ceiling.
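One way to enforce the causal rule is an as-of lookup: for each forecast issue time, take the most recent weather vintage that was available at or before that moment, and never a later one. The sketch below assumes each vintage is stamped with the time it actually became available (not its nominal model-run time); the record layout and `causal_weather` are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical record: (available_at, weather_payload).
Vintage = Tuple[datetime, dict]


def causal_weather(vintages: List[Vintage], issue_time: datetime) -> Optional[dict]:
    """Return the latest weather vintage available at or before issue_time."""
    # Sort by availability time so binary search is valid.
    ordered = sorted(vintages, key=lambda v: v[0])
    times = [t for t, _ in ordered]
    # bisect_right counts vintages with available_at <= issue_time.
    idx = bisect_right(times, issue_time)
    if idx == 0:
        # Nothing was available yet; never fall forward to a later vintage.
        return None
    return ordered[idx - 1][1]
```

With vintages available at 00:00, 06:00, and 12:00, an issue time of 09:00 receives the 06:00 vintage. Because only forecast vintages are ever looked up, observed weather cannot leak into the inputs by this path.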

Archive status

Not production evidence

The older low-MAPE results explain the research direction, but they do not support public product claims until they are reproduced live.

Promotion cadence

Shadow before production

Architecture and training changes must pass leak-free validation and live shadow checks before promotion.

How to use this research

Read the papers for technical context. For a buying decision, use the live evidence page and request a market-specific backtest over your own evaluation window. We will deliver paired rows for your dates and horizon.
