Two farmers. Different countries, different regulations, different enforcement bodies. The same problem.
One grows coffee in Oaxaca, Mexico. His family has farmed the same plots for decades. In 2025, researchers including Melvin Lippe of the Thuenen Institute of Forestry in Hamburg tested satellite-derived forest maps against coffee plots in the region. According to Mongabay’s reporting, the team examined around 600 plots and found roughly three-quarters flagged as non-compliant. The satellite said forest. The ground said agroforestry. The canopy had been there since long before the regulation’s December 2020 cutoff. By any reasonable reading, compliant. By the map, flagged.
The other farms livestock in the Netherlands. For years, nitrogen impacts on Natura 2000 areas were calculated through AERIUS, the calculation tool used within the Dutch permitting framework. In 2019, the Council of State ruled that the PAS, the Dutch nitrogen action programme, could no longer be used as a basis for permitting, because the programme relied on expected future mitigation and restoration benefits that were not sufficiently certain in advance. The permitting framework could not provide sufficient guarantees. The question “how do you know” - asked in the formal register of administrative law - received an insufficient answer. A EUR 24 billion transition fund followed. The Dutch rural economy is still rearranging itself around the consequences.
Both cases turned complex environmental evidence into an administrative result, and then met a question they could not answer.
The map is not the measurement
A satellite-derived compliance map is the end of a long inference chain. A satellite records radiance at the top of the atmosphere. An atmospheric correction converts that to surface reflectance, depositing uncertainty from the correction model and its inputs. A radiometric calibration maintains the sensor’s response against a reference standard, depositing instrument uncertainty. A classification algorithm converts reflectance values into land-cover categories, depositing model uncertainty. Each step adds to the chain. None of those contributions travel forward in a form that survives to the map the operator downloads.
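The chain above can be sketched numerically. This is a toy illustration with invented uncertainty figures, not any real sensor’s budget: each step contributes a relative standard uncertainty, the contributions combine in quadrature, and the combined value can exceed the margin by which a pixel clears a classification threshold.

```python
import math

# Toy sketch: illustrative uncertainty contributions per processing step
# (invented numbers, not any real sensor's calibration budget).
steps = {
    "radiometric_calibration": 0.02,     # 2% instrument uncertainty
    "atmospheric_correction": 0.04,      # 4% correction-model uncertainty
    "inter_sensor_normalisation": 0.03,  # 3% cross-sensor residual
}

# Combined relative standard uncertainty, assuming independent contributions.
combined = math.sqrt(sum(u**2 for u in steps.values()))
print(f"combined relative uncertainty: {combined:.3f}")  # ~0.054

# A hypothetical classification threshold applied to a reflectance-derived index:
index_estimate = 0.62      # hypothetical pixel value
forest_threshold = 0.60    # hypothetical classifier cutoff
margin = index_estimate - forest_threshold
sigma = index_estimate * combined
print(f"margin {margin:.3f} vs 1-sigma {sigma:.3f}")
# The margin (~0.02) is smaller than one standard deviation (~0.034):
# the forest/not-forest verdict is not separable from the noise,
# but the downloaded map reports only the category.
```

The point is not the specific numbers; it is that this comparison is computable at every pixel, and the current product boundary discards the inputs needed to make it.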
The map arrives as a polygon: forest or not-forest, compliant or non-compliant. The colour carries no confidence interval. It cannot tell you which sensor captured which observation, what the calibration state of that sensor was on that date, what atmospheric correction was applied, or how the classification would shift if any of those inputs moved by one standard deviation. The map is an assertion.
When a farmer’s right to export, or a developer’s building permit, depends on that assertion, the question “how do you know” is not obstructive. It is the question the compliance framework is implicitly asking. And it is the question the current generation of EO data products cannot answer.
The deforestation problem: maps that fail in opposite directions
The EUDR situation is not one map being wrong. It is multiple maps being wrong in different ways, with no shared calibration to arbitrate between them.
EO firm Space Intelligence published a comparison in February 2026 of three forest maps against verified ground truth on Brazilian coffee farms. The results illustrate the arbitration problem directly. MapBiomas Collection 10 - one of the most respected national land-cover datasets in the world, the basis of thousands of peer-reviewed papers, produced by a network of Brazilian universities, NGOs, and technology companies - missed 65% of genuinely non-compliant farms, passing them as compliant when they should have been flagged. JRC GFC v2, the dataset the EU itself funds to support EUDR compliance, had the opposite failure mode: it incorrectly failed 14% of genuinely compliant farms. Space Intelligence’s own product performed better on both metrics - which is the obvious caveat, since Space Intelligence sells that product.
The maps are not disagreeing about the magnitude of the same error. They are failing in opposite directions. A farm that JRC classifies as non-compliant might sail through MapBiomas. A farm that MapBiomas clears might fail with JRC. For a downstream operator making an import decision, there is no way to know which map’s failure mode applies to their specific plot without independent ground truth - which is the one thing the compliance framework was supposed to replace.
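The scale of the arbitration problem can be sketched from the two published error rates alone. The unpublished rates and the base rate below are assumptions chosen only to make the calculation concrete; treating the two maps’ errors as independent is a further simplification.

```python
# Hedged sketch: the only knowns are the two published rates
# (MapBiomas misses 65% of non-compliant farms; JRC wrongly fails
# 14% of compliant ones). Everything else is an assumed value.
P_NONCOMPLIANT = 0.10  # assumed base rate of truly non-compliant plots

# MapBiomas: P(flag | non-compliant) = 1 - 0.65; assume P(flag | compliant) = 0.05
mb_tpr, mb_fpr = 0.35, 0.05
# JRC: P(flag | compliant) = 0.14; assume P(flag | non-compliant) = 0.90
jrc_tpr, jrc_fpr = 0.90, 0.14

# Probability the two maps return different verdicts on a random plot,
# assuming (unrealistically) that their errors are independent.
p_nc, p_c = P_NONCOMPLIANT, 1 - P_NONCOMPLIANT
disagree = (
    p_nc * (mb_tpr * (1 - jrc_tpr) + (1 - mb_tpr) * jrc_tpr)
    + p_c * (mb_fpr * (1 - jrc_fpr) + (1 - mb_fpr) * jrc_fpr)
)
print(f"P(maps disagree): {disagree:.2f}")  # -> 0.22
```

Under these assumptions, roughly one plot in five gets conflicting verdicts from the two maps - and nothing in either product tells the operator which verdict to trust.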
The Mexico result gives the human scale. Researchers found three-quarters of the coffee plots they studied were flagged non-compliant by satellite maps, despite having been worked for decades. The problem is spectral: a mature agroforestry canopy and primary forest return similar reflectance signatures to many sensors. A system that monitors tree cover rather than EUDR-defined forest - which explicitly excludes land in agricultural use at the cutoff date - will misclassify these plots regardless of classifier sophistication. The classifier is working on inputs that do not contain the information the question requires.
A 2025 paper by Van Noordwijk et al. on this problem cites a legal statistics standard the EO community rarely encounters: statements with a 15% chance of error are generally not seen as convincing evidence in courts. Classification error rates in satellite-derived land-cover products for complex canopy types in smallholder landscapes routinely appear above this threshold in published accuracy assessments. The numbers are in the literature. They are simply not attached to the compliance verdict.
The nitrogen problem: what happened when the model was asked to account for itself
The Dutch nitrogen crisis is a different regulatory context and a structurally identical failure.
AERIUS uses the OPS atmospheric transport model to calculate nitrogen deposition across Natura 2000 areas from emission inventories and meteorological inputs. It is technically serious work. In 2019, the Council of State ruled that the PAS could no longer be used as a basis for permitting, because the anticipated future benefits of nitrogen-reduction and restoration measures were not sufficiently certain in advance. AERIUS was the calculation infrastructure within that framework, but the legal failure was the programme’s evidentiary basis - its reliance on projected future mitigation that had not yet materialised. The permits froze. Farmers who had done nothing wrong lost their right to operate while the government tried to construct a replacement framework that would survive the same question.
The question is about to follow the technology.
TROPOMI, the atmospheric composition sensor aboard ESA’s Sentinel-5P, is one of Europe’s principal satellite instruments for monitoring nitrogen dioxide at continental scale. Its operational pixel size is 3.5 by 5.5 kilometres. The average Dutch farm is 32 hectares. One TROPOMI pixel covers approximately 1,925 hectares - roughly sixty Dutch farms. Attributing a specific farm’s NO2 contribution from a satellite column measurement is not possible at that resolution.
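The footprint arithmetic in that paragraph is worth making explicit, since it is the whole attribution argument in three lines:

```python
# TROPOMI operational pixel vs. average Dutch farm size (figures from the text).
pixel_km = (3.5, 5.5)                        # pixel dimensions, km
pixel_ha = pixel_km[0] * pixel_km[1] * 100   # 1 km^2 = 100 ha
avg_farm_ha = 32                             # average Dutch farm, ha

print(pixel_ha)                # -> 1925.0 ha per pixel
print(pixel_ha / avg_farm_ha)  # -> ~60 farms per pixel
```

Any per-farm attribution from this product is therefore a disaggregation exercise carried entirely by models and inventories, not by the measurement itself.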
The bias characteristics compound this. Independent validation from the Sentinel-5P Mission Performance Centre (Report #30, covering April 2018 to February 2026) shows a median bias of -28% against ground-based reference stations for the offline processed product. The bias is strongly pollution-dependent: +12% in clean-air conditions and -42% in highly polluted environments. The broader Dutch nitrogen debate includes both ammonia from agriculture and nitrogen oxides from combustion. TROPOMI’s NO2 product is therefore not a farm-level livestock enforcement tool; it is a useful illustration of the spatial and uncertainty gap that appears when satellite atmospheric products are pulled toward regulatory attribution.
None of this makes TROPOMI unsuitable for atmospheric science; within its mission requirements, it performs as designed. It describes the gap between what a satellite column measurement can tell you and what a regulatory enforcement decision needs to assert. If a deposition surface derived from TROPOMI columns, dispersion modelling, and emission inventories were used to classify parcels as above or below a regulatory threshold, it would produce a colour ramp. A farmer who watched a predecessor framework collapse in the Council of State precisely because the underlying programme could not account for its own evidentiary basis would ask how you know. The ramp would not answer.
What traceability actually requires
The instinct in the EO industry is to treat failures like these as model problems: better classifiers, higher resolution, radar-optical fusion, more training data. Some of those improvements matter. None of them address the layer they sit on.
Before a pixel can be reliably classified as forest or not-forest, before a NO2 column can be attributed to a source area with a quantified confidence interval, the spectral or radiometric measurement feeding the model has to mean something consistent across the sensors, dates, and atmospheric conditions contributing to the pipeline. That requires per-observation calibration uncertainty that is tracked rather than discarded at the processing stage, and inter-sensor normalisation whose residuals are documented rather than absorbed into model noise.
This layer is currently treated as preprocessing: it happens before the data product, and its uncertainty disappears at the product boundary. The compliance frameworks now consuming EO data were not designed with this in mind. They were designed assuming “satellite-derived” implied something operationally consistent. It does not yet.
Making a satellite-derived compliance claim legally defensible requires three things absent as standard practice. Per-observation calibration uncertainty: a value attached to every input pixel, stating the radiometric uncertainty and the reference chain it traces back to. Propagated uncertainty through processing: not a global accuracy assessment from a validation study, but an uncertainty budget that follows each pixel through atmospheric correction and classification, so the final verdict arrives with a confidence interval specific to that location, that date, and those sensors. Versioned, reproducible provenance: the ability to reproduce any compliance finding against the exact inputs and processing versions that produced it, so a changed verdict can be attributed to a data update, a methodology revision, or a bug fix, with the magnitude of that change quantified at the relevant spatial scale.
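A minimal sketch of the record shape these three requirements imply. The class name and fields are hypothetical, illustrative of the structure rather than any existing product format or standard:

```python
from dataclasses import dataclass
import hashlib
import json

# Hypothetical record: one entry per observation, carrying its own
# calibration uncertainty, reference chain, and processing versions.
@dataclass(frozen=True)
class ObservationRecord:
    sensor_id: str
    acquisition_date: str
    radiometric_uncertainty_pct: float  # per-observation calibration uncertainty
    calibration_reference: str          # the reference chain it traces back to
    processing_versions: dict           # atmospheric correction, classifier, ...

    def provenance_hash(self) -> str:
        # Versioned, reproducible provenance: any change to inputs or
        # processing versions changes the hash, so a changed verdict can
        # be attributed to a specific data or methodology revision.
        payload = json.dumps(self.__dict__, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode()).hexdigest()[:16]

rec = ObservationRecord(
    sensor_id="sensor-A",
    acquisition_date="2024-06-01",
    radiometric_uncertainty_pct=2.0,
    calibration_reference="SI-traceable radiance standard (illustrative)",
    processing_versions={"atm_correction": "1.4.2", "classifier": "0.9.0"},
)
print(rec.provenance_hash())
```

The design point is the hash: if the same finding cannot be regenerated byte-for-byte from the same record, the verdict cannot be defended against the question of what changed and why.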
Metrology-grade satellite programmes produce traceable uncertainty chains. What is missing is the infrastructure that makes those chains accessible for commercial satellite inputs and distributable in a form that compliance workflows can use.
The window that is open now
The EUDR deadline for large and medium operators is 30 December 2026. CSRD limited assurance requirements are rolling out now. The Dutch nitrogen framework has been through two governments, a EUR 24 billion transition fund, and a district court order since 2019, and is still being litigated. EU Taxonomy assessments that use land-use or vegetation metrics are already in audit scope.
Regulators are not approaching this naively. Dry runs facilitated by the European Forest Institute (EFI) in 2025 found that competent authorities consistently required multiple datasets rather than any single source. A systematic review found only two of twenty-one global forest maps met all EUDR criteria - eight cleared a minimum shortlisting bar. The Council of State found a well-regarded dispersion model could not sustain a permitting programme. In each case, the same question was asked. In each case, the colour ramp was not the answer.
The Mexican coffee farmer and the Dutch livestock farmer are not anomalies. They are early cases in a long docket of regulatory proceedings that will eventually require EO data to account for itself. The infrastructure that lets it do so - per-observation, traceable, versioned - is what this field has to build.
Space Intelligence, “Comparing Forest Maps for EUDR Compliance in Brazil,” 17 February 2026 (Space Intelligence sells competing maps; figures are self-reported): https://www.space-intelligence.com/comparing-forest-maps-for-eur-compliance/
Freitas Beyer et al., “Assessing the Suitability of Available Global Forest Maps as Reference Tools for EUDR-Compliant Deforestation Monitoring,” Remote Sensing 17 (2025) 3012: https://doi.org/10.3390/rs17173012
Van Noordwijk et al., “Beyond imperfect maps: Evidence for EUDR-compliant agroforestry,” People and Nature 7 (2025) 1713-1723: https://doi.org/10.1002/pan3.70088
Mexico agroforestry results: Melvin Lippe (Thuenen Institute of Forestry, Hamburg), preliminary results reported in Mongabay, December 2025: https://news.mongabay.com/2025/12/researchers-find-concerning-gaps-in-global-maps-used-for-eudr-compliance/
TROPOMI NO2 validation: Sentinel-5P MPC VDAF, Quarterly Validation Report #30 (April 2018-February 2026): https://mpc-vdaf.tropomi.eu/index.php/nitrogen-dioxide
Dutch farm size: European Commission CAP Strategic Plan for the Netherlands: https://agriculture.ec.europa.eu/cap-my-country/cap-strategic-plans/netherlands_en