
Europe’s ambition to lead in trustworthy AI sits uneasily beside one of its oldest privacy rules: data minimisation. Like its sibling principle, purpose limitation, it was drafted in an analogue world — when computer memory was expensive and data collection was clumsy.
In the era of machine learning, the rule that personal data must be "adequate, relevant and limited to what is necessary" (GDPR Article 5(1)(c)) is colliding with the realities of modern innovation. In Opinion 28/2024, the European Data Protection Board (EDPB) reaffirmed that the data minimisation principle applies even in the context of AI development and deployment. Yet AI thrives on more, not less: more variety, more volume, more iteration. If Europe insists on training tomorrow's AI with the data philosophy of yesterday, it risks building the world's most ethical technology that no one actually uses.
Data minimisation was born out of a perfectly rational fear: the rise of computerised government databases in the 1960s and 70s. Early data protection laws, from Sweden’s 1973 Data Act to the OECD Guidelines (1980) and the Council of Europe’s Convention 108 (1981), all shared the same goal — stop bureaucracies and corporations from gathering and storing excessive amounts of personal information. Back then, more data meant more danger. Each record carried risk; storage was costly; and state surveillance, not statistical insight, was the primary threat. The regulatory philosophy was simplistic and moral: collect only what you need, for as long as you need it.
By the time the GDPR took effect in 2018, the digital world had turned that logic upside down. Storage was cheap, computing was fast, and data had become the raw material of innovation. Yet Europe kept the same guiding principle, barely touched since 1980.
The problem is not that minimisation is wrong; it is that it is anachronistic. The data minimisation principle assumes that the value of data is known in advance and that collecting "too much" creates risk without benefit. But in AI, the relationship is reversed: we often do not know which data will matter until after the model learns from it. AI models need diverse data, and limiting collection too tightly risks producing biased, inaccurate, or fragile systems: precisely the outcomes Europe says it wants to avoid if it is to reap the benefits of innovation.
The EDPB, however, maintains a strict line: the data minimisation principle still applies.
Three structural features of the digital era make the minimisation rule increasingly unworkable: storage is cheap, computing power is abundant, and the value of any given piece of data is rarely known at the moment of collection.
The irony is sharp: a rule designed to protect individuals from data misuse now risks depriving them of the benefits of responsible data use, from better healthcare diagnostics to climate modelling to more capable AI systems.
In short, Europe’s devotion to small data may be producing small results.
In the global AI race, the consequences are visible: most of the leading AI models are built in the United States and China, and Europe's advantage in regulation has become its disadvantage in innovation.
Europe does not need to abandon privacy to stay competitive; it needs to modernise how it interprets its principles. Its approach to AI development should demand not minimal data but responsible data.
Pragmatic reforms could align data minimisation with digital reality: risk-based interpretations that shift the focus from data quantity to risk mitigation; dynamic proportionality tests that allow broader data use when clear societal benefits and public interests are at stake; and a move from compliance checklists to outcome-based accountability, giving AI developers flexibility whilst preserving public trust.
These changes would align the GDPR’s spirit — protecting individuals — with its new context: enabling responsible AI that benefits them.
The principle of data minimisation clearly made sense in the fledgling days of digital technology, but the technologies we now seek to regulate have changed beyond recognition. Europe is trying to train modern AI under rules written for another era. Insistence on ever-smaller datasets in the name of privacy might deliver the cleanest compliance record, but at the cost of innovation.
Less data may mean more virtue — but it also means less progress.
17 December 2025