We started with CAISO. California was the natural first target because the duck curve makes forecasting interesting — solar generation creates a midday trough and a steep evening ramp that trips up simple models. Once CAISO was working, we expanded to ERCOT, then PJM, and eventually all seven major U.S. ISOs. Here is what we learned running the same architecture across very different grids.
Each grid has a distinct personality. CAISO is solar-heavy, with net load patterns that look nothing like gross load. The duck curve means you are really forecasting two things: total demand and the solar generation that offsets it. ERCOT is largely electrically isolated from the rest of the continent, connected to neighboring grids only through a few small DC ties, which makes it volatile: there is very little interconnection cushion when things go wrong. PJM is the largest grid by load, spanning 13 states with a mix of industrial, commercial, and residential demand. MISO stretches across climate zones from the Gulf Coast to the Canadian border. NYISO has the densest urban load center in the country (New York City), where a hot day means millions of window AC units turning on within the same hour. ISO-NE is heating-driven in winter, with natural gas constraints that create price spikes when everyone needs gas for both heating and power generation. SPP is wind-heavy, and wind variability makes the net load signal noisy.
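To make the gross versus net distinction concrete, here is a minimal sketch of the arithmetic. The function names and the five sample values are made up for illustration; they are not CAISO data and not our production code.

```python
from typing import Sequence


def net_load(gross_mw: Sequence[float], solar_mw: Sequence[float]) -> list[float]:
    """Net load per interval: gross demand minus solar output."""
    if len(gross_mw) != len(solar_mw):
        raise ValueError("series must be the same length")
    return [g - s for g, s in zip(gross_mw, solar_mw)]


def max_ramp(series: Sequence[float]) -> float:
    """Largest increase between two consecutive intervals."""
    return max(later - earlier for earlier, later in zip(series, series[1:]))


# Illustrative values only (MW at 09:00, 12:00, 15:00, 18:00, 21:00).
gross = [24000.0, 26000.0, 28000.0, 30000.0, 27000.0]
solar = [6000.0, 12000.0, 9000.0, 1000.0, 0.0]

net = net_load(gross, solar)
print(net)              # midday trough, then a steep climb into the evening
print(max_ramp(gross))  # gross load ramps gently
print(max_ramp(net))    # net load ramps much harder as solar drops off
```

With these toy numbers the gross series never rises by more than 2,000 MW between samples, while the net series jumps by 10,000 MW as solar falls away in the late afternoon. That is the shape a simple model tends to miss.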
We use one model architecture across all seven regions, but train per-region. The architecture is the same — same layer structure, same attention mechanisms, same weather covariate inputs. What changes is the training data, the hyperparameters selected during tuning, and the regional weather station mappings. This is a deliberate choice. A single architecture means one codebase to maintain, one inference pipeline, one monitoring framework. Region-specific training means the model learns that CAISO cares about solar irradiance while ISO-NE cares about heating degree hours.
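One way to picture the split between what is shared and what is per-region is a pair of config objects. This is a rough sketch, not our actual configuration: every field name, station ID, path, and hyperparameter value below is a placeholder invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:
    """Shared by every region. Values used below are placeholders."""
    n_layers: int
    n_attention_heads: int
    weather_covariates: tuple


@dataclass(frozen=True)
class RegionConfig:
    """What changes per region: data, tuned hyperparameters, station mapping."""
    iso_code: str
    training_data_path: str
    weather_stations: tuple
    learning_rate: float
    architecture: Architecture


# Hypothetical shared architecture; the real layer counts are not stated here.
SHARED = Architecture(
    n_layers=4,
    n_attention_heads=8,
    weather_covariates=("temperature", "solar_irradiance", "wind_speed"),
)


def region(iso_code: str, stations: list, learning_rate: float) -> RegionConfig:
    """Build a region config that always points at the shared architecture."""
    return RegionConfig(
        iso_code=iso_code,
        training_data_path=f"data/{iso_code.lower()}.parquet",  # placeholder path
        weather_stations=tuple(stations),
        learning_rate=learning_rate,
        architecture=SHARED,
    )


REGIONS = {
    cfg.iso_code: cfg
    for cfg in (
        region("CAISO", ["station_a", "station_b"], 1e-3),
        region("ISONE", ["station_c"], 5e-4),
    )
}
```

The point of the shape is that nothing region-specific can leak into `Architecture`, so the inference pipeline only ever has one model definition to load.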
The key finding: weather integration matters more in some regions than others. In MISO, the model achieves 1.48% MAPE — the lowest of any region. MISO has large, stable industrial load that creates a predictable base, and weather effects are moderated by the geographic spread. At the other end, SPP comes in at 2.82% MAPE. SPP is the hardest grid to forecast because wind generation variability directly impacts the net load signal, and wind is inherently harder to predict than temperature. The remaining regions fall in between: CAISO at 1.88%, ERCOT at 1.62%, PJM at 1.73%, NYISO at 2.14%, and ISO-NE at 2.31%.
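For readers who have not worked with the metric, this is the standard definition of MAPE as a short function. It is a sketch of the textbook formula; it says nothing about the forecast horizon, the evaluation window, or whether the figures above are computed on gross or net load.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, returned in percent."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    if any(a == 0 for a in actual):
        # MAPE is undefined when an actual value is zero.
        raise ValueError("actual values must be non-zero")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Toy example: each forecast is off by 2% of the actual value.
print(mape([100.0, 200.0], [98.0, 204.0]))
```

Because each error is divided by the actual value, MAPE lets you compare a small grid with a large one on the same scale, which is what makes the regional comparison meaningful.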
A few patterns emerged that we did not expect. First, grid size does not predict forecastability. PJM is the largest grid by load, but it is not the easiest to forecast; MISO is, despite being smaller. The stability of the underlying load matters more than aggregation effects. Second, regions with high renewable penetration tend to be harder, not because renewables make load harder to predict, but because the net load signal (what the grid actually needs to serve from dispatchable generation) has more variance. Third, the gap between Gramm and the ISO baseline is largest in regions where the ISO is still using older methods: ERCOT showed a 68.2% MAPE reduction, while regions with more modern ISO forecasts showed 25-40% reductions.
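A note on how to read those reduction percentages. The sketch below assumes they are relative reductions, meaning the drop in MAPE divided by the baseline MAPE. It also assumes the 68.2% figure and the 1.62% ERCOT MAPE quoted earlier come from the same evaluation, which this post does not confirm. Both function names are made up for the example.

```python
def mape_reduction_pct(baseline_mape: float, model_mape: float) -> float:
    """Relative MAPE reduction versus a baseline, in percent."""
    if baseline_mape <= 0:
        raise ValueError("baseline MAPE must be positive")
    return 100.0 * (baseline_mape - model_mape) / baseline_mape


def implied_baseline_mape(model_mape: float, reduction_pct: float) -> float:
    """Invert the formula: the baseline MAPE that a given reduction implies."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in [0, 100)")
    return model_mape / (1.0 - reduction_pct / 100.0)


# Under the assumptions above, a 68.2% reduction ending at 1.62% MAPE
# implies a baseline a little above 5% MAPE.
print(round(implied_baseline_mape(1.62, 68.2), 2))
```

The same arithmetic shows why the percentage gap shrinks against more modern ISO forecasts: the lower the baseline already is, the less room there is for a large relative reduction.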
Expanding from one grid to seven forced us to build infrastructure that is region-aware at every layer: weather data ingestion routes to the right stations, model versioning is per-region, accuracy monitoring breaks out by territory, and the API lets you query by ISO code. It would have been easier to just do CAISO. But the benchmarking paper covered all seven, so the product had to as well. You cannot publish results you are not willing to ship.
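As a rough illustration of what "region-aware at every layer" means in practice, every entry point can start by resolving a user-supplied ISO code to one canonical key, which then selects the stations, the model version, and the monitoring bucket. The canonical spellings and the function name here are assumptions for the example, not the actual API.

```python
# Canonical keys for the seven regions; the exact spellings are hypothetical.
SUPPORTED_ISOS = ("CAISO", "ERCOT", "PJM", "MISO", "NYISO", "ISONE", "SPP")


def normalize_iso_code(raw: str) -> str:
    """Map user input such as ' iso-ne ' to a canonical key, or raise."""
    code = raw.strip().upper().replace("-", "").replace("_", "")
    if code not in SUPPORTED_ISOS:
        raise ValueError(
            f"unknown ISO code {raw!r}; expected one of {', '.join(SUPPORTED_ISOS)}"
        )
    return code


print(normalize_iso_code(" iso-ne "))
print(normalize_iso_code("ercot"))
```

Failing loudly on an unknown code matters more than it looks: a silent fallback to a default region would serve a forecast from the wrong model.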