A library with 130 million monthly downloads was rewritten by AI and its license was changed. After being "cyberbullied", the open-source maintainer spoke out: "I think people's reaction is a bit over the top. I couldn't contact the original author at that time, and the new version was rewritten."
Not long ago, we reported that chardet, a classic Python encoding detection tool with over 130 million monthly downloads and directly embedded in countless software programs, was in the midst of a public storm.
The background: the project's original author, Mark Pilgrim, left the internet to live a “hermit life” after five years of development. Since then, maintenance and development have fallen almost entirely on open-source maintainer Dan Blanchard, who has carried the project single-handedly for over a decade.
Recently, Dan Blanchard used Claude Code to rewrite the entire library from scratch in just five days and announced its release under a new open-source license.
Ironically, Mark Pilgrim, the original author who had been silent for 15 years, suddenly reappeared and left a message, questioning Dan's “license change” and demanding that he stop.
This move instantly ignited the community, and the controversy quickly spread.
Some people believe that “as an open-source maintainer, you can rewrite the code and adopt a new license, but you can't directly override the project's original open-source license. If you want to start anew, you should give it a new name and then release it.”
Others stand on the side of the maintainer, believing that “years of investment and the current rewrite are all aimed at making the project better.”
Meanwhile, deeper questions have also been raised: Is it reasonable to use AI for rewriting? Does the so-called “clean room” method hold up? And where is software licensing headed?
In response, Bruce Perens, the well-known American programmer who wrote the Open Source Definition, even compared this event to the invention of the printing press and the scientific method. In an interview with The Register, he said bluntly that “the entire economic model of software development has completely collapsed.”
Amid the controversy, long-term maintainer Dan Blanchard also came forward, sitting down for an in-depth interview with Sachin Kamdar, CEO of the American enterprise AI agent platform Elvex, and Doyle Irvin, its marketing director, to respond to the various doubts raised.
In his opinion, this rewrite is structurally independent of the old code: measured with JPlag, a third-party open-source code similarity tool, the similarity is less than 1.3%, whereas the earlier, manually maintained versions scored as high as 80%. He also consulted legal counsel and kept the entire process open and transparent. More importantly, the rewrite has delivered a better version for users.
Dan admitted that he had never profited from maintaining chardet over the years and had no plans for commercialization in the future. The new version uses a more open and less restrictive license, and its performance has been improved by up to 48 times. Considering that the software runs on a large number of devices, some people even joked that this might be “the first AI application to achieve negative carbon emissions” because the overall CPU usage has actually decreased.
What exactly is the truth?
Here is the full content of Dan's conversation:
What exactly is chardet, which has 130 million monthly downloads and 2.5k GitHub Stars?
Sachin Kamdar: I'm really glad to have Dan Blanchard on the show today.
I've known Dan for a long time. About 10 years ago, we worked together at a company called Parse.ly. Dan was one of our very talented engineers at that time, working on many different projects. Later, he moved on to several other companies and is now an engineer at the American fintech company Monarch Money.
But today, we invited him here because what he did in the past 10 days has gone viral on the Internet. Dan, could you briefly introduce yourself? Talk about your background, what you usually focus on, and also explain what chardet is?
Dan Blanchard: As you just mentioned, I used to work with you at Parse.ly for about six years, so we've collaborated for quite a while.
Actually, I was already maintaining the open-source Python character encoding detection library chardet when I was at Parse.ly.
I've maintained this library for about 12 years in total. I first got in touch with it around 2012 or 2013 and officially took over around 2014. I can't remember the exact time, but it's about 12 years.
Sachin Kamdar: So what exactly does chardet do? “Character encoding detection” sounds a bit abstract. First, explain this concept, and then tell me why you decided to take over its maintenance?
Dan Blanchard: This is actually quite interesting. When I first started this, I was working at ETS (Educational Testing Service), responsible for software for automatic essay scoring, which was an early AI application “before AI became popular.”
The so-called “character encoding detection” can be understood like this: all text files on your computer or the Internet are, at the lowest level, just a series of numbers. The commonly mentioned 0s and 1s can also be understood as a series of numbers.
“Character encoding” is a set of mapping rules, a bit like the Caesar cipher: a certain number corresponds to a certain letter, another number may correspond to a symbol, or any character you can type on your keyboard.
The earliest and most classic character encoding is ASCII, which only supports 128 characters (codes 0 through 127). For a long time, especially in English-speaking environments like the United States, people thought that was more than enough, as if it covered everything on the keyboard.
However, in this situation, other languages such as Chinese and Hebrew were “ignored.”
So in the 1970s and 1980s, various different character encodings gradually emerged. Later, there were also some encodings from the DOS era, which are not mainstream now but can still be seen occasionally. Even earlier, there were encoding systems from the mainframe era, but they are basically no longer in use.
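To make the “mapping rules” idea concrete, here is a minimal sketch using only Python's standard codecs. The four byte values are chosen purely for illustration and do not come from the interview; the point is that the bytes stay fixed while the encoding used to read them decides what text appears.

```python
# One byte sequence, several interpretations. The bytes never change;
# only the mapping (the encoding) used to read them does.
raw = bytes([0xC4, 0xE3, 0xBA, 0xC3])  # illustrative sample bytes


def decode_or_none(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# ascii: 7-bit only; latin-1: Western European; cp437: DOS era; gbk: Chinese.
readings = {
    enc: decode_or_none(raw, enc)
    for enc in ("ascii", "latin-1", "cp437", "gbk")
}

for enc, text in readings.items():
    print(f"{enc:>8}: {text!r}")
```

ASCII rejects these bytes outright because every value is above 127, while the other three encodings each accept them and produce completely different text. Only one of those readings is what the writer intended, which is the ambiguity an encoding detector has to resolve.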
The core problem of “character encoding detection” is: Given a file, you need to determine whether it is in English, Japanese, Chinese, etc. The system will use statistical methods and some rules to infer its encoding method and the language it uses. Once the language is determined, the range of possible encodings can be greatly narrowed, as many encodings are actually bound to specific languages.
For example, if you see a text where the number combinations corresponding to “T,” “H,” and “E” often appear, it is most likely in English. Such patterns can help determine the encoding.
So, in essence, chardet does exactly this: it helps you determine the encoding of text.
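The idea of “try an interpretation, then check whether the result looks like a real language” can be sketched in a few lines. This is a deliberately crude toy and not chardet's actual algorithm: the candidate list, the `english_score` function, and the set of “common letters” are all assumptions made up for illustration.

```python
# Toy encoding guesser: decode under each candidate encoding and keep the
# one whose output looks most like English. Illustrative only.
CANDIDATES = ("utf-8", "cp037", "utf-16-le")  # hypothetical shortlist

COMMON = set("etaoinshrdlu ")  # frequent English letters plus the space


def english_score(text):
    """Share of characters that are common English letters or spaces."""
    if not text:
        return 0.0
    return sum(ch in COMMON for ch in text.lower()) / len(text)


def guess_encoding(data):
    """Return (encoding, score) for the best-scoring candidate, or (None, 0.0)."""
    best_name, best_score = None, 0.0
    for name in CANDIDATES:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        score = english_score(text)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


sample = "the quick brown fox jumps over the lazy dog and then rests"
for name in CANDIDATES:
    print(name, "->", guess_encoding(sample.encode(name)))
```

A real detector works with many more encodings and languages, and with far richer statistics than a single letter set, but the shape is the same: rule out encodings in which the bytes are invalid, then rank the rest by how plausible the decoded text is.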
Its earliest origin can be traced back to a component in the Netscape browser in the late 1990s called the “Universal Character Encoding Detector.” Later, Netscape became Mozilla, and then Firefox. It was on this basis that the earliest Python version of chardet was developed.
This Python version was written by Mark Pilgrim in 2006. He directly translated the C++ implementation into Python.
Sachin Kamdar: To put it in plainer terms, you can think of it as a “Lego block”: when you build software, you embed it as a basic module. Its function is to help the software work out things about its input, such as what language a document is written in and what character encoding it uses.
You mentioned earlier that it has 130 million monthly downloads. This basically means that it has been embedded in various software programs and is widely used everywhere, right?
Dan Blanchard: Yes, that's right. One of the main reasons for its wide use is a Python library called requests. Almost anyone who has written Python should have used it.
requests is used for network requests, such as downloading web pages and accessing APIs. It is one of the most commonly used network libraries in Python. At the underlying level, it relies on character encoding detection.
In the early versions, it only used chardet. Later versions also support another library called charset-normalizer, and users can choose between the two.
Since requests is used by almost all Python programs involving networks, chardet is also widely used.
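The “choose between the two” arrangement can be sketched as an optional-dependency lookup. This is not the actual code in requests; `load_detector` is a hypothetical helper, and the only assumption about the two libraries is that each exposes a module-level `detect` function that takes bytes.

```python
import importlib


def load_detector(preferred=("chardet", "charset_normalizer")):
    """Return (module_name, detect_function) for the first installed
    library in `preferred`, or (None, None) if none is available."""
    for name in preferred:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # not installed, try the next candidate
        return name, module.detect
    return None, None


name, detect = load_detector()
if detect is None:
    print("no detector installed; a caller might fall back to utf-8")
else:
    print(name, detect("caf\u00e9 au lait".encode("utf-8")))
```

Because the lookup happens at import time of the consuming package, whichever detector is installed alongside it gets pulled in automatically, which is one way a small library ends up with a very large download count.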
The original author almost disappeared from the “Internet,” and Dan stepped in as a core maintainer
Sachin Kamdar: So this library is extremely useful and widely covered. How did you become its maintainer?
Dan Blanchard: I was working at ETS at that time. We were one of the early teams to adopt Python 3.
About 15 to 20 years ago, there was a big controversy in the Python community - the language itself changed significantly, which meant that most of the code that could run in Python 2 had to be rewritten in Python 3.
The chardet project was written by Mark Pilgrim in 2006, and he stopped maintaining it around 2009 or 2010 (I can't remember the exact time). He almost “disappeared from the Internet” and deleted all his related code repositories, including chardet. Later, others recovered the code, and we can talk about this in more detail later.
At that time, there was a chardet based on Python 2, and there was also a Python 3 branch called charade, which was created by Ian Cordasco. He is also one of the maintainers of chardet and one of the maintainers of the Requests library. Requests was trying to migrate to Python 3 at that time, so they said, “We need a Python 3 version of this library,” and he forked a copy.
I joined at this stage. My goal at that time was very simple: to turn it back into a unified codebase. That is, instead of having one version for Python 2 and another for Python 3, I merged them into a single set of code that was compatible with both. At that time, it was actually quite technically challenging to do something “compatible with both 2 and 3,” but I happened to have a lot of experience in this area.
So I merged the two codebases together. The merged version was probably chardet 2.3. After that, we started releasing versions that were compatible with both Python 2 and Python 3, and it took a long time before we finally stopped supporting Python 2.
To be honest, this time point was actually much later than ideal, but that's how it was.
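For readers who never lived through the Python 2 to 3 transition, here is a minimal sketch of the kind of single-source compatibility idiom that era required. It is not taken from chardet; the names `PY2`, `text_type`, `binary_type`, and `to_bytes` are hypothetical and only illustrate how one file could behave the same on both major versions.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    # On Python 2, `str` holds bytes and `unicode` holds text.
    text_type = unicode  # noqa: F821 (this name only exists on Python 2)
    binary_type = str
else:
    # On Python 3, `str` holds text and `bytes` holds bytes.
    text_type = str
    binary_type = bytes


def to_bytes(value, encoding="utf-8"):
    """Accept text or bytes and always return bytes, on either major version."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    if isinstance(value, (binary_type, bytearray)):
        return bytes(value)
    raise TypeError("expected text or bytes, got %s" % type(value).__name__)


print(to_bytes("abc"), to_bytes(b"abc"))
```

The difficulty was that the meaning of `str` itself flipped between the two versions, which matters a great deal for a library whose entire job is telling raw bytes apart from decoded text.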
“I wanted to change the license as soon as I took over, but not to make money”
Sachin Kamdar: About 10 to 15 years have passed, and you've been maintaining it all this time. Mark Pilgrim hasn't been online for a long time and hasn't contributed code to the project. At this point, it's basically you and a few others maintaining this library.
So, when did you start considering changing to a new license? When did you start thinking about the things that ultimately caused the controversy?
Dan Blanchard: I've been considering changing the license for a long time. In fact, from the day I started maintaining chardet, I didn't want it to continue using the LGPL license.
Let me briefly explain:
LGPL stands for the “GNU Lesser General Public License,” and GPL stands for the “GNU General Public License” (without the “Lesser”). LGPL is generally used for libraries because it is more lenient than GPL. GPL has stricter restrictions, and some of its terms are not very friendly to commercial use: if you distribute a commercial product that incorporates GPL code, you must release your own source code under the GPL as well.
The MIT license is even more open: it means you can use this code however you want, and I'm not responsible for what you do with it. If you modify the code, it's best to contribute it back, but there is no mandatory requirement. Overall, it's a very flexible license.
Sachin Kamdar: So these three are all licenses for open-source projects, and they have different levels of usage and restrictions. If a developer is the creator of a project, they can choose one of them to apply to their project. Are you dissatisfied with LGPL because of some of its restrictions?
Dan Blanchard: Yes. I'm dissatisfied with some of the restrictions of LGPL, not because I want to make money from it. In fact, one of the most frustrating things about this controversy is that some people say, “He just wants to make a fortune from this.” I have absolutely no intention of making money. It would be great if someone were willing to pay me to support it, but nobody is going to make millions of dollars from a single library like this.
This is a character encoding detection library that has been around for 30 years. The new version is completely rewritten, so technically, it's a different thing. This is also why some people are accusing me of doing something to it.
Doyle Irvin: If others want to use it in commercial software, this new license will make things much easier. In other words, even though the library has already been widely adopted, the change in the license may make it even more widely used. More attention to an open-source project also means more support from people. Especially for someone like you who has been maintaining it for free for a long time, a little more help is always good, right?
Dan Blanchard: Yes, I completely agree. For example, look at issue #36 in the chardet repository from October 2014. I said something in it that seems quite interesting now, something like, “This is a bit strange for me because I'm one of the maintainers, but I still want to ask, can we change the license?”
Then I had a discussion with Ian and Erik Rose, who was also quite involved at that time. Finally, we came to the conclusion that we needed to contact Mark. But no one knew how to contact Mark because he had disappeared, so we had absolutely no idea what to do.
About six months later, I actually found some clues. In 2015, I was talking about this on Twitter with Guido van Rossum, the creator of Python. Unfortunately, I later deleted my Twitter account, so I no longer have that thread.
The conversation at that time was roughly like this: Guido asked me if I supported adding chardet to the Python standard library. I replied, “Of course, that would be great.”
The reason for this discussion was that someone proposed adding requests to the standard library at that time. It didn't happen in the end, but requests has two dependencies. One of them is chardet, and I can't remember the other one. I remember they said, “chardet is under the LGPL license,” so this was a problem because the standard library can't have LGPL-licensed dependencies.
Sachin Kamdar: This is what Guido meant - he thought that if the license didn't allow it, it couldn't be added to the standard library.
Dan Blanchard: Yes, he said, “But the license...”
So we thought maybe we could change the license. His attitude at that time was, “If you can find a way to change it, that would be great.”
I remember in the discussion, someone contacted Mark after seeing this, but I don't know this person. They asked Mark, and Mark replied, “No, the license is LGPL because the basis of my code is LGPL.”
This is one of the characteristics of the GPL family of licenses. It has a so-called “copyleft” clause: any derivative work based on the original work must use the same license. So translating code from one language to another essentially produces a derivative work, which obviously requires the same license.
Interestingly, when Mark ported this code, it was indeed under the LGPL, but later the license of the original code was changed to the Mozilla Public License (MPL), which is more lenient than LGPL. But at that time, it was impossible for us to change our own license to MPL.
The reason is that to change the license, we would need Mark's signed authorization and the signatures of all the contributors at that time, because changing the license usually requires the consent of every contributor, which is almost impossible to achieve. It can be done, but it's very rare and not practical from an operational perspective.
I remember I was also involved in an open-source project before, but I can't remember which one. That project was run by a company, and they wanted to change the license, so they sent an email to everyone who had ever contributed code. Even if you had only changed one line of code, you had to sign an agreement consenting to the license change. If you didn't agree, they would delete your contribution. Their attitude was basically, “It doesn't matter, we can do without that part of the code.”
So what I'm trying to say is that changing the license has always been something I really wanted to do. The new version I rewrote uses a different license. Strictly speaking, this is a kind of re-licensing because I kept the original name. I