March 2026 — A widely used Python library was relicensed in a matter of days with the help of an LLM. Its original author, who had vanished from the internet in 2011, resurfaced to contest the change. The case raises a fundamental question for the future of software copyright.
I. The facts: a few days, one LLM, one licence erased
On 4 March 2026, chardet 7.0.0 was published on PyPI as a ‘ground-up, MIT-licensed rewrite’ of the chardet library, which had been distributed under the LGPL since 2006 (the project changelog also shows 2 March 2026; both dates appear in public sources). The project retained the same package name and the same public API. The library is widely distributed in the Python ecosystem and highly visible on PyPI.
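To make concrete what ‘same public API’ means here: chardet exposes essentially one entry point, a detect function that takes raw bytes and returns a guessed encoding with a confidence score. A minimal usage sketch (the confidence value in the comment is illustrative):

```python
import chardet

# Bytes in an unknown encoding -- here, UTF-8-encoded Cyrillic text.
raw = "Привет, мир".encode("utf-8")

result = chardet.detect(raw)
# A dict of the form {'encoding': 'utf-8', 'confidence': 0.99, 'language': ''};
# exact values vary by version and input.
print(result["encoding"], result["confidence"])

# Typical downstream use: decode with whatever encoding was detected.
text = raw.decode(result["encoding"])
```

Code written against this interface resolves identically to 6.0.0 or 7.0.0, which is precisely why the dispute is about the licence rather than the API.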
The rewrite was carried out with Anthropic’s Claude Code. Maintainer Dan Blanchard publicly argued that it constitutes an independent work rather than a modification of the prior code. Published benchmarks report a performance gain of approximately 40 to 44 times over chardet 6.0.0. Structural similarity to prior versions, as measured with JPlag and reported by Simon Willison from Blanchard’s public statements, is at most 1.29% against the preceding release.
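For readers unfamiliar with JPlag: it compares programs as token streams rather than raw text, so renaming variables or reformatting does not hide copying. The sketch below is not JPlag’s algorithm, only a rough standard-library illustration of the token-stream idea; the file paths are hypothetical:

```python
import difflib
import io
import tokenize

def token_types(source: str) -> list[str]:
    """Reduce Python source to its sequence of token types,
    discarding identifier names, comments, and layout."""
    types = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
                        tokenize.INDENT, tokenize.DEDENT):
            continue  # ignore formatting and comments entirely
        types.append(tokenize.tok_name[tok.type])  # e.g. 'NAME', 'OP', 'NUMBER'
    return types

def structural_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] over token-type sequences:
    renaming identifiers leaves the score unchanged."""
    return difflib.SequenceMatcher(None, token_types(a), token_types(b)).ratio()

# Hypothetical checkouts of the two releases.
with open("chardet-6.0.0/chardet/universaldetector.py") as f:
    old = f.read()
with open("chardet-7.0.0/chardet/universaldetector.py") as f:
    new = f.read()
print(f"structural similarity: {structural_similarity(old, new):.2%}")
```

The real tool goes further, using greedy string tiling over language-aware tokens, so its percentages are not directly comparable to a naive ratio like this one; the point is only that structural similarity is measured on normalised token sequences, not on text.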
Also on 4 March 2026, Mark Pilgrim — chardet’s original author, creator of ‘Dive Into Python’, who stepped away from public internet life in 2011 in what the community informally calls his ‘infosuicide’ — resurfaced after fifteen years of silence. His first public post: GitHub Issue #327, ‘No right to relicense this project’.
‘I respectfully insist that they revert the project to its original licence. Adding a fancy code generator into the mix does not somehow grant them any additional rights.’
Mark Pilgrim, Issue #327, 4 March 2026
II. The central legal question: clean room implementation in the age of AI
The concept of clean room implementation is well established in IP law: to rewrite protected software legally, one organises a strict separation between a team that analyses the original code and produces a functional specification, and a development team that implements that specification without ever seeing the original source. The canonical example is Compaq’s rewrite of the IBM BIOS in the early 1980s, which enabled IBM-compatible PCs without copying IBM’s proprietary code.
The Oracle v. Google ruling (US Supreme Court, 2021) confirmed that, in the specific circumstances of that case, Google’s reimplementation of the Java API could qualify as fair use — without establishing a general rule for all API reimplementations.
Blanchard’s defence:
- Started in an empty repository, without copying any files;
- Explicitly instructed Claude not to reference any LGPL or GPL code;
- JPlag results show very low structural similarity to prior versions;
- Only common Python patterns match — elements not individually protectable by copyright.
Pilgrim and his supporters counter:
- Twelve years of immersion in the original code preclude any clean room claim by the person directing the rewrite;
- Claude was very likely trained on public corpora that may have included chardet’s source code; this cannot be established with certainty, which makes a hermetic separation considerably harder to demonstrate;
- Simon Willison notes that a public artefact in the repository shows Claude Code consulting metadata/charsets.py during the rewrite;
- The rewrite kept the same PyPI package name, reinforcing the derivative work argument.
Can a copyleft licence be laundered through an LLM? If so, the entire open source ecosystem protected by the GPL is potentially undermined.
III. Legal analysis
A. The LGPL regime and derivative works
The LGPL requires that any modification be redistributed under the same terms. If chardet 7.0.0 is characterised as a derivative work, relicensing it under MIT without the rights holders’ consent would be highly contestable; the entire dispute therefore hinges on that characterisation.
Case law has established that ideas and algorithms per se are not protected; only their expressive form is. The abstraction-filtration-comparison test (Computer Associates v. Altai, 2d Cir. 1992) operationalises this: the program is decomposed into levels of abstraction, unprotectable elements (ideas, material dictated by efficiency or external constraints, public-domain code) are filtered out, and only what remains is compared.
B. The role of AI in the clean room assessment
- If Claude was trained on data including chardet’s code (plausible but not proven), there is no simple mechanism to attest that no trace of that training influenced the output;
- The watertight barrier of the classical clean room is far more difficult to guarantee with an LLM trained on public corpora than with two siloed human teams;
- Blanchard himself acknowledges his approach was not a ‘traditional’ clean room.
Reactions diverge sharply:
- Simon Willison: ‘The arguments on both sides are entirely credible.’
- Bruce Perens, to The Register: ‘The entire economics of software development are dead, gone, over, kaput!’
- Zoë Kooyman (FSF), to The Register: ‘Refusing to grant others the rights you yourself received as a user is highly antisocial, no matter what method you use.’
The FSF had not, as of writing, published a formal position specifically on this case.
IV. Five practical implications
1. Provenance contamination
The paradox is striking: in this specific context, MIT has become less usable than LGPL. If the rewrite is later held to be a derivative work, the MIT grant collapses and every downstream redistributor is exposed; an LGPL dependency carries known, manageable obligations, while a disputed MIT one carries unquantified risk. Provenance uncertainty is legally more problematic than the copyleft constraint itself.
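In practice, risk-averse organisations already have a blunt instrument available while the question is open: pin the dependency below the contested release. A purely illustrative constraints entry:

```
# constraints.txt -- illustrative only: stay on the last LGPL release
# until the provenance questions around the 7.x line are resolved.
chardet>=6.0.0,<7.0.0
```

The irony repeats itself here: the pin targets the copyleft release because its provenance is the one thing nobody disputes.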
2. The systemic threat to copyleft
If the technique were recognised as legally valid, any GPL project maintainer could theoretically commission a functionally equivalent rewrite from an LLM and relicense under MIT — deeply weakening copyleft as a protection for the digital commons.
3. The copyright status of AI-generated code
On 2 March 2026, the US Supreme Court declined to hear Thaler v. Perlmutter, leaving in place the human authorship requirement for copyright protection. If Claude’s output is not independently protectable, who owns chardet 7.0.0, and what does the MIT licence attached to it actually grant?
4. LLM provider liability
Using Claude Code as the primary rewrite tool raises the question of how liability is apportioned along the chain between the model’s user and its provider. No answer has been established at this stage.
5. The EU AI Act as a future evidentiary tool
Article 53(1)(c) and (d) of the EU AI Act requires providers of general-purpose AI (GPAI) models to implement a policy to comply with EU copyright law and to publish a sufficiently detailed summary of the content used to train the model. In time, these obligations could make it possible to verify whether a given work was part of an LLM’s training data.
V. Conclusion
The chardet case reveals a structural gap in our legal frameworks: the clean room implementation, designed for siloed human developers, cannot be directly transposed to language models trained on the collective digital heritage.
Legal and legitimate are distinct things. Even if the law were to validate the rewrite, that would not mean the act was legitimate toward the contributors who participated in a project historically distributed under copyleft.
The real question: will LLMs, by drastically lowering the cost of software reimplementation, permanently alter the balance between copyleft and permissive licences — and is existing law equipped to respond?