Known data quality gaps and limitations across the powerplantsinfo.org pipeline. We surface these transparently so users can account for them in analysis.
Summary
| Gap | Dimension | Affected rows | Severity |
|---|---|---|---|
| SPV name masking | Ownership | ~2,320 plants | High |
| M&A name staleness | Ownership | Subset of above | Medium |
| USPVDB coverage | Engineering (solar) | 2,096 / 8,431 generators | Medium |
| Offshore wind not in USWTDB | Engineering (wind) | 8 generators | Low count, high importance |
| Chain single-branch traversal | Ownership | All chain plants | Low |
| Financial coverage ceiling | Financial | ~21.6% of plants | Expected |
| News classification | News | ~51% classified | Medium |
| Queue ISO field coverage | Interconnection | Varies by ISO | Medium |
| County boundary matching | Interconnection | 93.2% match rate | Low |
SPV name masking
EIA Schedule 4 lists the legal Special Purpose Vehicle (SPV) name as the owner, while GEM identifies the real beneficial owners behind the SPV. For example, EIA lists "Danish Fields Solar, LLC" while GEM correctly attributes ownership to TotalEnergies SE and Apollo Global Management.
About 2,320 plants have owner_agreement == false in the ownership data. Across a subset of these plants, about 3,100 generators have an EIA owner name that is an SPV while GEM correctly resolves to the parent entity.
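As a rough illustration of how a user might separate likely SPV masking from other disagreements, the sketch below filters on owner_agreement and then applies a name heuristic. The record layout, the field names other than owner_agreement, and the suffix rule are assumptions for illustration, not the pipeline's actual logic.

```python
import re

# Hypothetical records; only `owner_agreement` is a field named in the text.
records = [
    {"plant_id": "P1", "eia_owner": "Danish Fields Solar, LLC",
     "gem_owner": "TotalEnergies SE", "owner_agreement": False},
    {"plant_id": "P2", "eia_owner": "Constellation Nuclear",
     "gem_owner": "Exelon Generation", "owner_agreement": False},
    {"plant_id": "P3", "eia_owner": "Example Utility Co",
     "gem_owner": "Example Utility Co", "owner_agreement": True},
]

# Assumed heuristic: an owner name ending in a legal-entity suffix is SPV-like.
SPV_SUFFIX = re.compile(r"\b(LLC|LP|LLP)\.?\s*$", re.IGNORECASE)

def looks_like_spv(name: str) -> bool:
    """Return True when the name ends in a suffix typical of a project company."""
    return bool(SPV_SUFFIX.search(name))

# Step 1: keep only rows where the two sources disagree.
disagreements = [r for r in records if not r["owner_agreement"]]

# Step 2: within those, flag rows whose EIA owner looks like an SPV.
likely_masked = [r for r in disagreements if looks_like_spv(r["eia_owner"])]

print([r["plant_id"] for r in likely_masked])
```

A suffix rule like this will both over- and under-match, so it is better suited to triage than to final attribution.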
Stale entity names after M&A
EIA reflects the current entity name after a merger or spin-off, while GEM retains the earlier name. Both are correct at different points in time, but they fail exact string comparison. For example, Exelon spun off Constellation Energy in 2022: EIA updated to "Constellation Nuclear" while GEM still records "Exelon Generation."
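One way to account for this in analysis is to compare names through an alias table rather than by raw string equality. The sketch below is a minimal version of that idea; the ALIASES table and both helper functions are hypothetical, and a real table would need to be curated by hand.

```python
# Hypothetical alias table: earlier entity name -> current entity name.
# The single entry reuses the example from the text; it is illustrative only.
ALIASES = {
    "exelon generation": "constellation nuclear",
}

def canonical(name: str) -> str:
    """Lowercase, collapse whitespace, then map known earlier names forward."""
    key = " ".join(name.lower().split())
    return ALIASES.get(key, key)

def same_entity(eia_name: str, gem_name: str) -> bool:
    """Compare two owner names after alias resolution."""
    return canonical(eia_name) == canonical(gem_name)

print(same_entity("Constellation Nuclear", "Exelon Generation"))
```

Names that are absent from the table fall back to a case- and whitespace-insensitive comparison, so unrelated entities still compare as different.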
USPVDB coverage gaps (solar)
The US Large-Scale Solar Photovoltaic Database (USPVDB) does not cover every solar generator in the EIA inventory. Plants without a USPVDB match have null values for all panel specification columns (tracking type, panel area, DC/AC ratio, etc.).
2,096 of 8,431 solar generators (24.9%) have no USPVDB match. The UI indicates whether engineering specs are unavailable (no USPVDB match) or genuinely null (matched, but the value is missing in USPVDB).
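The distinction between "no match" and "matched but null" can be kept explicit in downstream code. The sketch below assumes a hypothetical boolean match flag alongside each spec value; neither the function nor the flag name comes from the pipeline. It also reproduces the coverage percentage from the counts above.

```python
from typing import Optional

def spec_status(has_uspvdb_match: bool, value: Optional[float]) -> str:
    """Classify a panel-spec value; `has_uspvdb_match` is an assumed flag."""
    if not has_uspvdb_match:
        return "unavailable"   # generator is not in USPVDB at all
    if value is None:
        return "null"          # matched, but USPVDB has no value for this column
    return "present"

# Counts quoted in the text.
unmatched, total = 2096, 8431
share_unmatched = unmatched / total

print(spec_status(False, None), spec_status(True, None), spec_status(True, 1.3))
print(f"{share_unmatched:.1%}")
```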
Offshore wind not in USWTDB
The US Wind Turbine Database covers land-based turbines only. Offshore wind plants appear in the EIA data but have zero per-turbine rows in the USWTDB. Currently 8 offshore wind generators are affected. Per-turbine specs (rotor diameter, hub height, manufacturer) are unavailable for these plants.
Ownership chain single-branch traversal
When a GEM entity has multiple parent companies (joint ownership), the chain walker follows only one branch. The n_parents field records the total count of parents, but only one branch is fully traced. This affects the ownership graph visualization for jointly-owned plants.
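To make the limitation concrete, the sketch below contrasts a single-branch walk with a full traversal over a toy parent map. The entity names, the PARENTS structure, and both functions are hypothetical; this is not the pipeline's chain walker, only an illustration of what a single-branch walk leaves out.

```python
from typing import Dict, List

# Hypothetical parent map: entity -> list of direct parents.
PARENTS: Dict[str, List[str]] = {
    "ProjectCo": ["ParentA", "ParentB"],   # joint ownership, two parents
    "ParentA": ["UltimateA"],
}

def single_branch(entity: str, parents: Dict[str, List[str]]) -> List[str]:
    """Follow only the first parent at each level (the behavior described above)."""
    chain = [entity]
    while True:
        ups = [p for p in parents.get(chain[-1], []) if p not in chain]  # cycle guard
        if not ups:
            return chain
        chain.append(ups[0])

def all_branches(entity: str, parents: Dict[str, List[str]],
                 path: List[str] = None) -> List[List[str]]:
    """Depth-first walk that returns one chain per root-reaching branch."""
    path = (path or []) + [entity]
    ups = [p for p in parents.get(entity, []) if p not in path]  # cycle guard
    if not ups:
        return [path]
    chains: List[List[str]] = []
    for parent in ups:
        chains.extend(all_branches(parent, parents, path))
    return chains

n_parents = len(PARENTS["ProjectCo"])
print(n_parents, single_branch("ProjectCo", PARENTS), all_branches("ProjectCo", PARENTS))
```

In this toy graph the single-branch walk never visits ParentB, even though the parent count is 2, which is the gap a user of the ownership graph should keep in mind.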
Financial coverage ceiling
Financial data covers approximately 3,246 plants (21.6%) from three sources: FERC Form 1 (~1,400 regulated utilities), LBNL Solar (~1,569 projects), and FERC EQR (~594 plants matched by seller name). The remaining ~78% are merchant generators, IPPs, and smaller facilities that do not file public financial data. This is a structural limitation of public data, not a pipeline gap. The theoretical ceiling with all structured public sources is approximately 40–45%.
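A quick arithmetic check on the figures above shows that the per-source counts add up to more than the covered-plant total. One plausible reading, which the text does not state, is that some plants appear in more than one source. All inputs below are the approximate figures quoted above, so the derived values are approximate too.

```python
# Approximate per-source counts quoted in the text.
source_counts = {"FERC Form 1": 1400, "LBNL Solar": 1569, "FERC EQR": 594}
covered_plants = 3246      # approximate plants with financial data
coverage_share = 0.216     # quoted share of all plants

sum_of_sources = sum(source_counts.values())
# Assumption: the gap is explained by plants present in several sources.
implied_overlap = sum_of_sources - covered_plants
# Plant total implied by the quoted share (rounded to the nearest plant).
implied_total_plants = round(covered_plants / coverage_share)

print(sum_of_sources, implied_overlap, implied_total_plants)
```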
News classification accuracy
News classification is LLM-based and currently in Beta. Approximately 51% of articles have been classified into categories (deals, hazards, regulatory, grid, industry, development). The remaining articles are displayed without category badges or entity extraction. Classification accuracy improves with each build cycle as prompts and caching are refined.
Interconnection queue data variability
ISO-specific fields (study phase, service type, status detail) are available only for projects in the 7 major ISOs (CAISO, ERCOT, MISO, PJM, NYISO, ISO-NE, SPP). Projects from smaller entities (BPA, PacifiCorp, SOCO) have LBNL baseline data only. County-level geographic matching succeeds for 93.2% of projects; the remaining 6.8% fail to match because of data quality issues in the county names reported in queue filings.
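Users who want to re-attempt county matching on their own can start from a simple name normalizer. The sketch below is an assumption about what such a step might look like, not the pipeline's matcher; the sample filings and the reference set are made up.

```python
import re

def normalize_county(name: str) -> str:
    """Lowercase, drop periods and apostrophes, collapse spaces, strip a trailing unit word."""
    s = name.strip().lower()
    s = re.sub(r"[.']", "", s)                          # "St." -> "st"
    s = re.sub(r"\s+", " ", s)                          # collapse repeated whitespace
    s = re.sub(r" (county|parish|borough)$", "", s)     # drop a trailing unit word
    return s

# Made-up reference names (already normalized) and raw queue-filing values.
reference = {"st clair", "harris"}
filings = ["St. Clair County", "HARRIS", "Haris Co."]

matched = [f for f in filings if normalize_county(f) in reference]
match_rate = len(matched) / len(filings)

print(matched, f"{match_rate:.1%}")
```

Misspellings such as the third sample value still fail after normalization; catching them would need fuzzy matching or a manual correction list.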