Known Gaps

Known data quality gaps and limitations across the powerplantsinfo.org pipeline. We surface these transparently so users can account for them in analysis.

Summary

| Gap | Dimension | Affected rows | Severity |
| --- | --- | --- | --- |
| SPV name masking | Ownership | ~2,320 plants | High |
| M&A name staleness | Ownership | Subset of above | Medium |
| USPVDB coverage | Engineering (solar) | 2,096 / 8,431 generators | Medium |
| Offshore wind USWTDB | Engineering (wind) | 8 generators | Low count, high importance |
| Chain single-branch traversal | Ownership | All chain plants | Low |
| Financial coverage ceiling | Financial | ~21.6% of plants | Expected |
| News classification | News | ~51% classified | Medium |
| Queue ISO field coverage | Interconnection | Varies by ISO | Medium |
| County boundary matching | Interconnection | 93.2% match rate | Low |

SPV name masking

EIA Schedule 4 lists the legal Special Purpose Vehicle (SPV) name as the owner, while GEM identifies the real beneficial owners behind the SPV. For example, EIA lists "Danish Fields Solar, LLC" while GEM correctly attributes ownership to TotalEnergies SE and Apollo Global Management.

About 2,320 plants have owner_agreement == false in the ownership data. Within those plants, roughly 3,100 generators have an EIA owner name that is an SPV while GEM resolves to the parent entity.

Anyone looking up plant ownership through EIA data alone sees the SPV shell company, not the actual investor. The plant detail page shows both names and flags the discrepancy.
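A minimal sketch of how an analyst might pull out the flagged records. Only owner_agreement is a field named on this page; the record layout, the plant_name, eia_owner, and gem_owners keys, and the second sample row are hypothetical and used only for illustration.

```python
# Hypothetical ownership records; only `owner_agreement` is a documented field.
records = [
    {
        "plant_name": "Danish Fields Solar",      # assumed plant label
        "eia_owner": "Danish Fields Solar, LLC",  # SPV name as listed by EIA
        "gem_owners": ["TotalEnergies SE", "Apollo Global Management"],
        "owner_agreement": False,
    },
    {
        "plant_name": "Example Plant",            # made-up row for contrast
        "eia_owner": "Example Utility Co",
        "gem_owners": ["Example Utility Co"],
        "owner_agreement": True,
    },
]


def flagged(rows):
    """Return the rows where the EIA and GEM owner names disagree."""
    return [row for row in rows if not row["owner_agreement"]]


for row in flagged(records):
    gem = ", ".join(row["gem_owners"])
    print(f'{row["plant_name"]}: EIA lists {row["eia_owner"]!r}, GEM lists {gem}')
```

Filtering on the flag rather than comparing names directly keeps the analysis aligned with whatever matching logic produced owner_agreement in the first place.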

Stale entity names after M&A

EIA reflects entity names as they stand after a merger or spin-off, while GEM retains the earlier names. Both are correct at different points in time, but they fail string comparison. For example, Exelon spun off Constellation Energy in 2022: EIA updated to "Constellation Nuclear" while GEM still records "Exelon Generation."
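One way to reason about this gap is an alias lookup applied before comparison. The sketch below is illustrative only: the ALIASES table, normalize, and same_entity are hypothetical names, and nothing here describes how the pipeline itself compares owners.

```python
import re

# Hypothetical alias table: earlier entity name -> current entity name.
ALIASES = {"exelon generation": "constellation nuclear"}


def normalize(name):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def same_entity(eia_name, gem_name):
    """Compare two owner names after normalization and alias lookup."""
    a, b = normalize(eia_name), normalize(gem_name)
    return ALIASES.get(a, a) == ALIASES.get(b, b)


# A plain string comparison treats the two names as different owners.
print("Constellation Nuclear" == "Exelon Generation")
# With the alias applied, they resolve to the same entity.
print(same_entity("Constellation Nuclear", "Exelon Generation"))
```

An alias table needs manual upkeep after each corporate transaction, which is why this gap is hard to close automatically.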

USPVDB coverage gaps (solar)

The US Large-Scale Solar Photovoltaic Database (USPVDB) does not cover every solar generator in the EIA inventory. Plants without a USPVDB match have null values for all panel specification columns (tracking type, panel area, DC/AC ratio, etc.).

2,096 of 8,431 solar generators (24.9%) have no USPVDB match. The UI distinguishes engineering specs that are unavailable (no USPVDB match) from values that are genuinely null.
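The share quoted above, and the difference between "unavailable" and "genuinely null", can be sketched as follows. The counts come from this page; the has_uspvdb_match flag, the dc_ac_ratio key, and the spec_status helper are hypothetical.

```python
TOTAL_SOLAR_GENERATORS = 8431   # from this page
NO_USPVDB_MATCH = 2096          # from this page

unmatched_share = NO_USPVDB_MATCH / TOTAL_SOLAR_GENERATORS
print(f"{unmatched_share:.1%} of solar generators have no USPVDB match")


def spec_status(generator, column):
    """Classify one spec value for a hypothetical generator record."""
    if not generator["has_uspvdb_match"]:   # hypothetical flag
        return "unavailable"                # no USPVDB row exists at all
    if generator.get(column) is None:
        return "null"                       # matched, but no value recorded
    return "present"


matched = {"has_uspvdb_match": True, "dc_ac_ratio": None}
unmatched = {"has_uspvdb_match": False}
print(spec_status(matched, "dc_ac_ratio"))
print(spec_status(unmatched, "dc_ac_ratio"))
```

Treating the two cases separately matters for averages: dropping unmatched generators is a coverage decision, while nulls in matched rows are missing measurements.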

Offshore wind not in USWTDB

The US Wind Turbine Database covers land-based turbines only. Offshore wind plants appear in the EIA data but have zero per-turbine rows in the USWTDB. Currently 8 offshore wind generators are affected. Per-turbine specs (rotor diameter, hub height, manufacturer) are unavailable for these plants.

Ownership chain single-branch traversal

When a GEM entity has multiple parent companies (joint ownership), the chain walker follows only one branch. The n_parents field records the total count of parents, but only one branch is fully traced. This affects the ownership graph visualization for jointly-owned plants.
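The difference between a single-branch walk and a full traversal can be sketched on a toy parent map. Everything here is hypothetical: the entity names, the PARENTS structure, and both functions. The page does not say which branch the real chain walker picks, so this sketch simply takes the first parent.

```python
# Hypothetical parent map: entity -> list of direct parents.
PARENTS = {
    "Project SPV": ["Holdco A", "Holdco B"],   # joint ownership
    "Holdco A": ["Ultimate Parent A"],
    "Holdco B": ["Ultimate Parent B"],
}


def walk_single_branch(entity):
    """Follow only the first parent at each level (assumed branch choice)."""
    chain = [entity]
    while PARENTS.get(chain[-1]):
        chain.append(PARENTS[chain[-1]][0])
    return chain


def walk_all_branches(entity):
    """Return every path to a top-level parent (assumes no cycles)."""
    parents = PARENTS.get(entity, [])
    if not parents:
        return [[entity]]
    return [[entity] + path for p in parents for path in walk_all_branches(p)]


# Analogue of n_parents for the starting entity in this toy map.
n_parents = len(PARENTS["Project SPV"])
print(n_parents)
print(walk_single_branch("Project SPV"))
print(walk_all_branches("Project SPV"))
```

In this toy case the single-branch walk never reaches "Ultimate Parent B", which is the kind of omission an n_parents value above 1 signals.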

Financial coverage ceiling

Financial data covers approximately 3,246 plants (21.6%) from three sources: FERC Form 1 (~1,400 regulated utilities), LBNL Solar (~1,569 projects), and FERC EQR (~594 plants matched by seller name). The remaining ~78% are merchant generators, IPPs, and smaller facilities that do not file public financial data. This is a structural limitation of public data, not a pipeline gap. The theoretical ceiling with all structured public sources is approximately 40–45%.
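As a quick arithmetic check on the figures above, the sketch below derives the plant total implied by the coverage share and compares the per-source counts with the covered-plant count. All inputs are the approximate numbers quoted on this page. The three source counts sum to more than 3,246; this may reflect plants that appear in more than one source or rounding in the approximate counts, and the page does not say which.

```python
covered_plants = 3246          # plants with financial data
coverage_share = 0.216         # 21.6% of all plants
source_counts = {              # approximate counts per source
    "FERC Form 1": 1400,
    "LBNL Solar": 1569,
    "FERC EQR": 594,
}

# Total plant count implied by the two figures above.
implied_total = round(covered_plants / coverage_share)

# Sum of per-source counts versus the covered-plant count.
sum_of_sources = sum(source_counts.values())
excess = sum_of_sources - covered_plants

uncovered_share = 1 - coverage_share

print(f"implied total plants: {implied_total}")
print(f"sum of source counts: {sum_of_sources} (excess over covered: {excess})")
print(f"uncovered share: {uncovered_share:.1%}")
```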

News classification accuracy

News classification is LLM-based and currently in Beta. Approximately 51% of articles have been classified into categories (deals, hazards, regulatory, grid, industry, development). The remaining articles are displayed without category badges or entity extraction. Classification accuracy improves with each build cycle as prompts and caching are refined.

Interconnection queue data variability

ISO-specific fields (study phase, service type, status detail) are available only for projects in the 7 major ISOs (CAISO, ERCOT, MISO, PJM, NYISO, ISO-NE, SPP). Projects from smaller entities (BPA, PacifiCorp, SOCO) have LBNL baseline data only. County-level geographic matching succeeds for 93.2% of projects; the remaining 6.8% fail to match because of data quality issues in the county names reported in queue filings.

All gaps are surfaced in the UI with clear indicators rather than silently showing null values. If you encounter data that looks incorrect, check this page first.