
A simple assumption underpins how most people interact with information: if data is lawfully available to everyone, its reuse should not be treated in the same way as the use of private or confidential information. When individuals deliberately share personal data, by publishing blog posts, authoring research papers, or posting content openly online, they do so with the knowledge and expectation that this information will be seen, referenced, and reused.
Under GDPR, that assumption is often wrong. Publicly available personal data is frequently subject to almost the same legal constraints as data that was never meant to be disclosed at all. This reality is poorly understood outside specialist circles, and increasingly difficult to reconcile with both social norms and Europe’s broader regulatory objectives.
The legal source of this tension lies primarily in the purpose limitation principle in Article 5(1)(b) GDPR, which requires that personal data be collected for “specified, explicit and legitimate purposes” and not further processed in an incompatible manner.
GDPR does not relax this obligation simply because information is publicly available. Nor does it treat intentional publication by the data subject as a decisive legal differentiator. As the European Data Protection Board (EDPB) has consistently emphasised, personal data remains personal data regardless of its accessibility.
In practice, this means that a new use of publicly available personal data may still require a separate lawful basis and a compatibility assessment under Article 6(4), even where the data subject deliberately placed the information in the public domain.
Despite the centrality of purpose limitation in the GDPR, and its grounding in Article 8 of the Charter of Fundamental Rights, the EDPB has yet to issue a dedicated, principle-level set of Guidelines on the subject; the last standalone treatment dates from the pre-GDPR era, in 2013. Instead, the principle continues to appear only in passing within thematic texts. For example, in Guidelines 2/2019 on Article 6(1)(b) GDPR, the EDPB simply restates that “Article 5(1)(b) … requires that personal data must be collected for specified, explicit, and legitimate purposes and not further processed in a manner that is incompatible with those purposes”. This is a faithful formulation of the rule, but it remains interpretively thin compared with the much earlier WP29 Opinion 03/2013, which is still the most detailed treatment of the principle and which clearly needs updating to take account of the significant technological developments of the intervening twelve years.
A similar pattern appears in later texts. The EDPB’s draft Guidelines 1/2024 on legitimate interests incorporate purpose limitation mainly as a constraint within the Article 6(1)(f) analysis rather than as an independent doctrinal framework. And in Opinion 28/2024 on data protection aspects of AI models, the Board recognises that training datasets may include personal data “collected via web scraping”, but it does not clearly articulate how purpose limitation and the Article 6(4) compatibility test should apply to such large-scale secondary uses.
This absence of modern, standalone guidance matters. Purpose limitation directly challenges the assumption that publicly available data may be reused freely. The principle requires that any processing, even of information obtained from public sources, must remain tied to a specified, explicit and legitimate purpose, consistent with the context of collection and transparently communicated to individuals. Against the backdrop of rapidly evolving data ecosystems, the continuing reliance on a 2013 opinion underscores the case for the EDPB to provide updated, context-aware guidance on how this foundational rule should be applied.
This approach to publicly available personal information may be doctrinally coherent, but it is increasingly misaligned with policy reality.
Firstly, it creates a significant gap between public understanding and legal effect. Many organisations are surprised to learn that using information they can freely access, sometimes even information published by the data subject themselves, may still entail legal risk. This undermines trust in data protection law and contributes to the perception that it is detached from common sense.
Secondly, it generates unnecessary compliance friction, particularly for small and medium-sized organisations, researchers, and civil society. Actors engaging in low-risk reuse of public information must navigate complex legal assessments that add cost and uncertainty without clearly enhancing privacy outcomes.
Thirdly, it raises questions of regulatory prioritisation. When supervisory authorities devote time and resources to enforcing rigid restrictions on the reuse of public information, less attention is available for genuinely harmful practices involving sensitive data, coercive collection, or opaque processing.
These issues are directly relevant to ongoing discussions around GDPR reform and simplification, including the European Commission’s Digital Omnibus initiative and broader efforts to reduce regulatory burden while preserving EU fundamental rights.
Notably, recent policy debates have acknowledged, implicitly at least, that GDPR’s current framework struggles to accommodate how personal data is shared and used in today’s hyperconnected world. This is most visible in proposals to introduce a new or clarified legal basis for the use of personal data in the training of AI models, particularly where that data is lawfully available to the public. Such proposals are intended, in part, to overcome the rigidity of the purpose limitation principle when applied to large-scale, general-purpose reuse of information.
This recognition is important. It reflects an emerging consensus that applying purpose limitation too strictly to publicly available information can produce outcomes that are disproportionate, impractical, and detached from risk.
However, focusing narrowly on AI model training misses the broader point. The tension between purpose limitation and public information is not confined to any single technology or sector. It affects research, market analysis, public-interest investigations, archival work, and a wide range of ordinary, low-risk further processing of information that individuals and organisations have intentionally made public.
In other words, the problem is not only technological; it is structural. Creating a bespoke legal basis for one category of processing may address a specific pressure point, but it leaves untouched the underlying issue: GDPR lacks a clear, context-sensitive approach to how personal data intentionally made public should be treated.
If reform efforts are limited to carve-outs for particular use cases, they risk adding complexity. A more durable response would align with the stated objectives of the reform agenda itself—legal certainty, proportionality, and effective enforcement—by clarifying how purpose limitation should operate where data subjects have deliberately placed information in the public domain.
GDPR already contains concepts that could support a more balanced approach. Article 5(1)(a) requires fairness; the compatibility factors in Article 6(4) turn on the context of collection and the relationship between controller and data subject; and Recital 47 explicitly links legitimate interests to what data subjects can reasonably expect. In practice, however, these concepts rarely override purpose limitation when public data is involved. Intentional publication is treated as legally incidental rather than substantively relevant.
A failure to clarify the application of purpose limitation to public information is a missed opportunity. Information that is deliberately made public carries different expectations and typically presents lower risks than information disclosed under necessity or obligation. Failing to reflect this distinction does not strengthen protections—it blurs them.
None of this argues for exempting publicly available personal information from data protection law. Public data can still be misused, aggregated harmfully, or processed in ways that undermine dignity and autonomy.
But regulation should be risk-based and context-sensitive. A more proportionate approach would:

- give substantive weight, in compatibility and reasonable-expectations assessments, to the fact that the data subject deliberately made the information public;
- calibrate compliance obligations to risk, so that low-risk further uses of public information do not carry the same burden as high-risk processing of sensitive or coerced data;
- direct supervisory resources toward genuinely harmful practices, such as coercive collection and opaque processing, rather than toward rigid restrictions on the reuse of public information.
These changes could largely be achieved through interpretative guidance, without reopening the GDPR’s core architecture.
As the EU considers how to modernise and simplify its data protection framework, the treatment of publicly available personal data deserves more systematic attention. GDPR already contains the conceptual tools—fairness, proportionality, reasonable expectations—to support a more balanced approach. What it lacks is a willingness to let context play a decisive role outside sector-specific exceptions.
Addressing the reuse of public information solely through technology-specific legal bases risks treating symptoms rather than causes. The credibility of data protection law depends not only on safeguarding EU fundamental rights, but on doing so in ways that are intelligible, proportionate, and aligned with social reality. Recognising that public information is not the same as private information is not a concession to innovation pressures. It is a necessary step toward a data protection regime that is both effective and sustainable—regardless of the technology involved.
23 December 2025