Anonymisation in 2026: when your data stops being personal

The EDPB finalized new anonymisation guidelines this April, the first major update since the 2014 Article 29 Working Party opinion. For SaaS teams that build analytics, ML datasets, or research outputs, this matters: data that is truly anonymous falls outside the GDPR entirely, while data that is only pseudonymous does not. Most teams treat the two the same and carry GDPR overhead on both.

The 2026 standard for anonymisation

The EDPB consolidated three tests, each framed as an attack on the dataset. A dataset is anonymous only if all three attacks fail:

Singling out: can you isolate a single individual in the data?
Linkability: can you join this dataset to another to identify someone?
Inference: can you deduce an attribute with high probability?

If any one of the three succeeds, the data is at most pseudonymous and still within GDPR scope.
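One way to turn the three tests into a first-pass check is sketched below in Python. The function names, the choice of "quasi-identifier" columns, the 0.9 threshold, and the sample rows are all assumptions made for illustration; none of them come from the EDPB text, and a dataset that passes these checks is not thereby proven anonymous.

```python
from collections import Counter, defaultdict


def _key(row, cols):
    """Build a hashable key from the chosen columns of a row."""
    return tuple(row[c] for c in cols)


def singled_out(rows, quasi_ids):
    """Rows whose quasi-identifier combination appears exactly once."""
    counts = Counter(_key(r, quasi_ids) for r in rows)
    return [r for r in rows if counts[_key(r, quasi_ids)] == 1]


def linkable(rows, other_rows, shared_cols):
    """Rows that match exactly one record of another dataset on shared columns."""
    other_counts = Counter(_key(r, shared_cols) for r in other_rows)
    return [r for r in rows if other_counts[_key(r, shared_cols)] == 1]


def inferable(rows, quasi_ids, sensitive, threshold=0.9):
    """Groups in which one sensitive value covers at least `threshold` of rows."""
    groups = defaultdict(list)
    for r in rows:
        groups[_key(r, quasi_ids)].append(r[sensitive])
    risky = {}
    for group, values in groups.items():
        value, n = Counter(values).most_common(1)[0]
        if n / len(values) >= threshold:
            risky[group] = value
    return risky


# Hypothetical sample data: an analytics export and a second internal table.
QI = ["zip", "age_band"]
events = [
    {"zip": "1011", "age_band": "30-39", "plan": "pro"},
    {"zip": "1011", "age_band": "30-39", "plan": "pro"},
    {"zip": "1012", "age_band": "40-49", "plan": "free"},
    {"zip": "1013", "age_band": "20-29", "plan": "pro"},
    {"zip": "1013", "age_band": "20-29", "plan": "free"},
]
crm = [
    {"zip": "1012", "age_band": "40-49", "account": "acct-17"},
    {"zip": "1011", "age_band": "30-39", "account": "acct-02"},
    {"zip": "1011", "age_band": "30-39", "account": "acct-05"},
]

unique_rows = singled_out(events, QI)
linked_rows = linkable(events, crm, QI)
risky_groups = inferable(events, QI, "plan")
```

In this sample the single row for zip 1012 is both singled out and linkable to one CRM account, and the 1011 group leaks its plan because every member shares it. A real assessment would also have to consider auxiliary data the checker cannot see.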

Common SaaS data that is NOT anonymous

Dataset	Why it fails
Hashed user IDs in event logs	Linkable back via session correlation
Aggregated metrics with small cohorts (n < 10)	Singling out is trivial
Behavioral fingerprints (timing, click patterns)	Linkability via the same fingerprint
K-anonymised data whose effective k is below 5	Inference attacks work
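To make the first row concrete, here is a minimal sketch of why a hashed ID is still an identifier. It shows two routes that are simpler than session correlation: the same input always produces the same digest, so the hash works as a join key across tables, and anyone holding a list of candidate emails can re-hash them and compare. The helper `hash_id`, the tables, and the addresses are hypothetical.

```python
import hashlib


def hash_id(email: str) -> str:
    """Unsalted SHA-256 of a normalised email (a common 'anonymisation' shortcut)."""
    return hashlib.sha256(email.lower().encode("utf-8")).hexdigest()


# Two hypothetical tables that store only the hash, never the email itself.
event_log = [{"user": hash_id("ana@example.com"), "event": "export_clicked"}]
billing = [{"user": hash_id("Ana@Example.com"), "plan": "pro"}]

# Linkability: identical inputs give identical digests, so the hash is a join key.
joined = [
    (e["event"], b["plan"])
    for e in event_log
    for b in billing
    if e["user"] == b["user"]
]

# Re-identification: hash a list of candidate emails and look the digests up.
candidates = ["bob@example.com", "ana@example.com"]
lookup = {hash_id(c): c for c in candidates}
reidentified = [lookup.get(e["user"]) for e in event_log]
```

A secret salt or keyed hash raises the cost of the lookup attack for outsiders, but whoever holds the key can still reverse the mapping, which is why such data stays pseudonymous.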

What does count as anonymisation in 2026

Differential privacy with documented epsilon budget under 1.0
Synthetic data generated by a model trained on real data, with documented privacy proofs
Strong aggregation with cohort sizes > 100 and counts rounded to the nearest 10
Random sampling without keys from a population large enough to break linkability

Anything weaker is pseudonymisation, which is a GDPR safeguard but does not exit GDPR scope.
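Two of the techniques listed above can be sketched in a few lines of Python. The first function adds Laplace noise to a count, which is the textbook mechanism for a differentially private counting query with sensitivity 1; the second suppresses small cohorts and rounds the rest. The function names and default parameters are assumptions for illustration, and the noise source here is Python's general-purpose `random` module, so treat this as a teaching sketch rather than something to ship. A vetted differential privacy library would be the safer choice in production.

```python
import random


def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return a count with Laplace noise of scale 1/epsilon (sensitivity 1)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # The difference of two exponentials with rate epsilon is Laplace(0, 1/epsilon).
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


def safe_cohort_count(n: int, min_size: int = 100, step: int = 10):
    """Suppress cohorts of min_size or fewer; round the rest to the nearest step."""
    if n <= min_size:
        return None  # suppressed: the cohort is too small to publish
    return (n + step // 2) // step * step  # round half up, e.g. 105 -> 110


rng = random.Random(2026)  # fixed seed only so the sketch is reproducible
noisy = dp_count(1000, epsilon=0.5, rng=rng)
published = {name: safe_cohort_count(n) for name, n in
             {"trial": 42, "starter": 137, "pro": 105}.items()}
```

Each noisy release spends part of the privacy budget: under basic sequential composition, two releases at epsilon 0.5 on the same data already add up to 1.0, so the budget has to be tracked across every query, not per query.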

The operational payoff

If your analytics pipeline produces truly anonymous output:

The output is not personal data; no consent or legal basis needed for further processing
You can share with sub-processors without DPA scope expansions
You can retain indefinitely

This is the carrot. The stick: regulators in 2026 are more sophisticated. Claiming anonymisation without the three-test discipline is a red flag in audits.

The most common mistake is treating data as anonymous because the table has no name column. Names are not the only identifiers: email hashes, IP addresses, device IDs, cookie hashes, and behavior patterns all count.

How to document anonymisation in your policy

Add a section that distinguishes:

Data we collect (personal)
Data we derive (pseudonymous, GDPR scope)
Data we anonymise for research/analytics (out of GDPR scope, method documented)

Document the method publicly. "We use k-anonymity with k=50 and bucket sizes ≥100" is more credible than "we anonymise data".

The EDPB framing in the new guidelines is sharp: anonymisation is a result, not a label. The dataset is anonymous or it is not, regardless of what the README says.

Conclusion

True anonymisation is a competitive advantage in 2026. It expands what your SaaS can do with data without expanding GDPR exposure. The three-test discipline is the entry ticket; without it, anonymisation claims are theater.

To document data flows by category (personal, pseudonymous, anonymous) in your privacy policy with the right legal language for each, try Termerly free.