After our first paper, we had benchmarked five architectures. None felt right. Transformers were wobbly at the edges. Recurrent networks were slow to adapt. State space models lost accuracy beyond the day-ahead horizon.

In manufacturing, when every tool has the same blind spot, you rethink the fixture.

Every credible architecture, both with and without weather

We ran a broader search. Every credible architecture in the literature, each tested with and without weather covariates. Our first paper had shown that adding weather flips the performance ranking entirely, so both configurations had to be in the search.

Seven U.S. grids. One architecture per region, trained from scratch. The grids are too different for transfer learning. What works in sun-drenched California fails in wind-battered Oklahoma.
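In practice, the search design in the two paragraphs above is a per-region selection over every architecture crossed with two covariate settings. The sketch below is minimal and assumes a scores table keyed by (region, architecture, uses_weather) that already holds a lower-is-better error computed on held-out data. The names `search_grid` and `select_per_region`, along with the key layout, are hypothetical and do not describe our actual pipeline.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple

# Hypothetical key layout: (region, architecture, uses_weather)
ConfigKey = Tuple[str, str, bool]


def search_grid(regions: Iterable[str], architectures: Iterable[str]) -> List[ConfigKey]:
    """Enumerate every architecture, without and with weather, for every region."""
    return list(product(regions, architectures, (False, True)))


def select_per_region(scores: Dict[ConfigKey, float]) -> Dict[str, Tuple[str, bool]]:
    """Pick the lowest-error (architecture, uses_weather) pair independently per region.

    Regions never share a winner: each grid is scored on its own, mirroring the
    choice to train one model per grid from scratch instead of transferring.
    """
    best: Dict[str, Tuple[float, str, bool]] = {}
    for (region, arch, uses_weather), error in scores.items():
        if region not in best or error < best[region][0]:
            best[region] = (error, arch, uses_weather)
    return {region: (arch, weather) for region, (_, arch, weather) in best.items()}
```

Treating the weather flag as part of the configuration, and not as a fixed preprocessing choice, lets the ranking flip between settings without biasing the search toward either one.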

The archived result looked strong: lower MAPE than the ISO baselines on the held-out evaluation across all seven grids. Later live evaluation did not reproduce that advantage, so those numbers now sit in the research archive rather than in our production marketing claims.
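MAPE, the metric behind that archived comparison, is the mean of |actual - forecast| / |actual| over all hours, expressed as a percentage. Below is a minimal NumPy sketch. The function name is illustrative. It assumes every actual load value is nonzero, which holds for grid-level demand but would need a guard for any series that can reach zero.

```python
import numpy as np


def mape(actual, forecast) -> float:
    """Mean absolute percentage error, in percent.

    Assumes every actual value is nonzero (grid load is always positive);
    a zero actual would divide by zero.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(actual - forecast) / np.abs(actual)) * 100.0)
```

MAPE averages over every hour. A model can therefore improve it by doing slightly better on many ordinary hours while staying poor on the few that matter most.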

What convinced us to start a company was the tails, not the average.

On ERCOT, the architecture we converged on cut the size of the worst 5 percent of hourly errors by more than half. Those are the hours that trigger emergency dispatch. Those are the hours that cost real money.
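A tail metric like this can be stated precisely. This minimal sketch uses one reasonable definition: the average of the largest 5 percent of absolute hourly errors, with the count rounded up and never below one hour. The definition used in our paper may differ, and the names `tail_error` and `relative_reduction` are illustrative.

```python
import numpy as np


def tail_error(actual, forecast, tail_fraction: float = 0.05) -> float:
    """Mean absolute error over the worst `tail_fraction` of hours.

    Sorts the absolute hourly errors and averages the largest
    ceil(n * tail_fraction) of them (at least one hour).
    """
    errors = np.abs(np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float))
    k = max(1, int(np.ceil(errors.size * tail_fraction)))
    worst = np.sort(errors)[-k:]  # the k largest absolute errors
    return float(worst.mean())


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction by which the candidate's error is below the baseline's (0.5 = halved)."""
    return (baseline - candidate) / baseline
```

Under this definition, "reduced by more than half" means `relative_reduction(tail_error(actual, baseline_forecast), tail_error(actual, new_forecast)) > 0.5`.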

By the time we submitted the paper, we had already found configurations that outperformed what we published. The paper describes the search. The product uses the result.

We are a new company. We are not pretending otherwise. Two papers. Seven grids. A team that learned, on manufacturing lines, that the tail is where it matters. The benchmarks are on the site.
