Two preprints on arXiv.

One benchmarks deep learning architectures across U.S. grids; the other introduces tail-risk evaluation for safety-critical load forecasting.

What ships, how it’s measured.

The papers above explain the research path. The /accuracy page now shows live production evidence first, then the historical archive, so that older validation results are not mistaken for current product quality.

Evidence split
Live scorecard first

Current production claims use the latest canonical h24 scorecard. Historical results from Aug 2025 to Apr 2026 are kept as a research archive.

Baselines
Published ISO day-ahead forecasts

ISO references are shown only when paired h24 rows are available. Pending references are marked pending rather than filled by proxy.
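
As a rough illustration of the pairing rule above, the sketch below computes an ISO reference only over rows that also exist on the model side, and reports "pending" when no such rows exist. The pairing key (grid, target time), the function names, and the sample values are all hypothetical; this is not the production scorecard code.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical pairing key: (grid, target_time as an ISO-8601 string).
Key = Tuple[str, str]


def iso_reference_mape(
    iso_forecast: Dict[Key, float],
    actual: Dict[Key, float],
    model_keys: Iterable[Key],
) -> Optional[float]:
    """Return the ISO MAPE over rows paired with the model, or None if unpaired."""
    paired = [
        k for k in model_keys
        if k in iso_forecast and k in actual and actual[k] > 0
    ]
    if not paired:
        return None  # no paired rows: do not substitute a proxy
    errors = [abs(iso_forecast[k] - actual[k]) / actual[k] for k in paired]
    return sum(errors) / len(errors)


def render_reference(value: Optional[float]) -> str:
    """Show a percentage when a paired reference exists, otherwise 'pending'."""
    return "pending" if value is None else f"{100 * value:.2f}%"


# Illustrative values only.
iso = {("ERCOT", "2026-05-01T00:00"): 105.0}
obs = {("ERCOT", "2026-05-01T00:00"): 100.0, ("PJM", "2026-05-01T00:00"): 90.0}

print(render_reference(iso_reference_mape(iso, obs, [("ERCOT", "2026-05-01T00:00")])))
print(render_reference(iso_reference_mape(iso, obs, [("PJM", "2026-05-01T00:00")])))
```

Returning `None` instead of a fallback number keeps the "pending" state explicit all the way to the display layer.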

Metric
Standard metric: h24 MAPE

Mean of |forecast - actual| / actual, computed over rows paired on issue time, target time, grid, and horizon.
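
A minimal sketch of this metric, assuming h24 denotes a 24-hour forecast horizon and that each paired row carries one forecast and one observed load. The `PairedRow` type, its field names, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class PairedRow:
    """One forecast/actual pair (hypothetical schema)."""
    grid: str
    issue_time: datetime
    target_time: datetime
    horizon_hours: int
    forecast_mw: float
    actual_mw: float


def h24_mape(rows: Iterable[PairedRow]) -> float:
    """Mean of |forecast - actual| / actual over horizon-24 rows, as a fraction."""
    errors = [
        abs(r.forecast_mw - r.actual_mw) / r.actual_mw
        for r in rows
        if r.horizon_hours == 24 and r.actual_mw > 0  # skip undefined ratios
    ]
    if not errors:
        raise ValueError("no paired h24 rows to score")
    return sum(errors) / len(errors)


# Illustrative values only.
issue = datetime(2026, 5, 1, 12)
rows = [
    PairedRow("GRID_A", issue, datetime(2026, 5, 2, 12), 24, 102.0, 100.0),
    PairedRow("GRID_B", issue, datetime(2026, 5, 2, 12), 24, 190.0, 200.0),
    PairedRow("GRID_A", issue, datetime(2026, 5, 3, 12), 48, 150.0, 100.0),  # not h24
]
print(f"{h24_mape(rows):.2%}")  # mean of 2% and 5%
```

Raising on an empty set, rather than returning zero, avoids reporting a perfect score when nothing was actually paired.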

Weather inputs
Causal weather only

Research candidates must use forecast-vintage weather available at issue time. Oracle weather can be studied only as a diagnostic ceiling.
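
One way to enforce the causal rule is to filter weather forecasts by their publication time before building features. The sketch below keeps, for each target hour, the latest vintage published at or before the issue time. The `WeatherForecast` type and its fields are hypothetical, and the values are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class WeatherForecast:
    """One weather forecast vintage (hypothetical schema)."""
    published_at: datetime  # when this vintage became available
    valid_time: datetime    # the hour the forecast describes
    temperature_c: float


def causal_weather(
    vintages: Iterable[WeatherForecast], issue_time: datetime
) -> Dict[datetime, WeatherForecast]:
    """Per valid_time, keep the latest vintage published at or before issue_time."""
    latest: Dict[datetime, WeatherForecast] = {}
    for w in vintages:
        if w.published_at > issue_time:
            continue  # not yet available at issue time: using it would leak
        kept = latest.get(w.valid_time)
        if kept is None or w.published_at > kept.published_at:
            latest[w.valid_time] = w
    return latest


# Illustrative values only.
valid = datetime(2026, 5, 2, 6)
vintages = [
    WeatherForecast(datetime(2026, 4, 30, 18), valid, 20.0),
    WeatherForecast(datetime(2026, 5, 1, 0), valid, 21.0),
    WeatherForecast(datetime(2026, 5, 1, 12), valid, 22.5),  # published after issue
]
selected = causal_weather(vintages, issue_time=datetime(2026, 5, 1, 6))
print(selected[valid].temperature_c)  # the 2026-05-01 00:00 vintage
```

An oracle-weather diagnostic would skip this filter and use observed weather instead, which is why it can only serve as a ceiling.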

Archive status
Not production evidence

The older low-MAPE results explain the research direction but do not control public product claims until reproduced live.

Promotion cadence
Shadow before production

Architecture and training changes must pass leak-free validation and live shadow checks before promotion.

How to use this research

Read the papers for technical context. For a buying decision, use the live evidence page and request a market-specific backtest with your own evaluation window. We will deliver paired rows against your dates and horizon.