arXiv undergoes a major change: it will become independent from Cornell University and recruit a CEO. Netizens: Can we still use it for free in the future?
The "Cornell University" logo that appears every time you open arXiv is very likely to become history.
Recently, arXiv released an official statement saying that "after decades of productive cooperation with Cornell University and with the support of the Simons Foundation, arXiv is transforming into an independent non-profit organization. This also marks a new stage in the 35-year development of this platform that pioneered open-access science."
Meanwhile, arXiv also posted a recruitment notice for a CEO position, stating that the salary for the role is expected to be around $300,000 and that the actual offer may vary depending on job-related knowledge, skills, and experience.
What has arXiv gone through on its way to independence?
arXiv is now the most influential electronic pre-print system in the world. It was founded in August 1991 by Paul Ginsparg. Initially, it was just a set of shell scripts (distributed via email and FTP servers) written by Ginsparg on a NeXT computer at the Los Alamos National Laboratory to automatically distribute pre-prints.
Its birth stemmed from a simple pain point: at that time, physicists exchanged pre-prints via email lists, and their mailboxes were quickly overwhelmed. Ginsparg thought, why not let a server handle these requests automatically? Initially, it served only the high-energy physics field and was expected to receive about 100 papers per year. However, nearly a hundred submissions flooded in during its first month online. The early participation of string theory master Ed Witten quickly gave it academic legitimacy.
In 1993, with the release of the Mosaic browser, Ginsparg built a web interface for arXiv, and this "native of the World Wide Web era" was thus born.
In 1994, it changed its domain from xxx.lanl.gov to arXiv.org, and its subject scope expanded from physics to fields such as mathematics and computer science.
However, the real turning point for arXiv came in 2001: due to internal turmoil at the Los Alamos National Laboratory, Ginsparg brought the project back to his alma mater, Cornell University, and arXiv was subsequently housed in the university's library.
Since then, arXiv has experienced explosive growth: it had 500,000 papers in 2008, reached 1 million by the end of 2014, and exceeded 2 million by the end of 2021. To date, the platform has accumulated more than 2.7 million papers, covering more than 150 categories in eight subject areas, and has served 32 billion downloads.
During its more than twenty years at Cornell, arXiv has undergone a difficult transformation from a personal project into institutional infrastructure. Ginsparg has tried to "step back" many times, but code maintenance, review disputes, and technical debt have always prevented him from truly letting go. It wasn't until 2022, with a $10 million grant from the Simons Foundation, that arXiv finally underwent a large-scale technical upgrade and team expansion, and its codebase was refactored from Perl to Python.
However, in recent years, arXiv has begun to face greater challenges. The most pressing crisis comes from the surge in AI submissions: since 2023, the number of computer science paper submissions has skyrocketed, and low-quality manuscripts have flooded in like a tide. The volunteer-based review system that arXiv relies on is overwhelmed.
Ginsparg himself is still struggling to debug the hard drive of the "holy grail crackpot filter", trying to use early language models to automatically screen out pseudo-science papers. But he also admits that without his personal supervision, quality cannot be guaranteed. This "founder dependence" exposes the structural vulnerability of arXiv: after more than thirty years, it is still struggling with the gravitational pull that the founder tried to escape from.
The deeper tension lies in the eternal game between openness and quality. On the one hand, linguist Emily Bender accuses arXiv of being a "cancer" that allows "junk science" to run rampant. On the other hand, physicist Jorge Hirsch accuses the platform of "censoring" qualified research and of withdrawing his paper due to "incendiary content". This two-way squeeze puts arXiv's review policy in a dilemma: tightening the standards invites accusations of academic gatekeeping, while relaxing control risks turning it into a hotbed for pseudo-science. The sensational "room-temperature superconductivity" paper in 2023, which was eventually proven false, demonstrates this dilemma precisely: arXiv's rapid dissemination mechanism can both accelerate the spread of truth and magnify falsehoods.
Technical debt is another ticking time bomb. Ginsparg wrote code in the manner of 1960s Fortran programmers ("real programmers don't write documentation"), which has left the system hard to maintain over the long term. Although the Python refactoring started after 2022, the historical problems and the pressure of continuous growth are still pulling at the foundation of this old-fashioned platform.
These challenges are forcing arXiv to seek deeper changes.
Where will arXiv go after recruiting a new CEO?
arXiv currently has an annual budget of about $6 million and has about 27 employees (mostly working remotely in the United States). These employees, together with volunteers, serve more than 5 million monthly users.
The responsibilities of the new CEO cover strategic planning, financial management, technical infrastructure, and personnel supervision. The CEO also needs to work closely with the board representatives of Cornell University and the Simons Foundation to jointly establish the organization's independence.
arXiv said in the announcement that "independence enables arXiv to keep up with all aspects of its development: modernize its infrastructure, expand its subject coverage, and engage more deeply with international stakeholders."
In the face of this sudden change, the attitudes of the academic community and the developer community are surprisingly consistent: anxiety far outweighs expectation.
In the past, arXiv had the support of universities and foundations, but now it has to raise funds independently to survive. Many netizens have pointed out sharply that "this is usually the beginning of a change in nature. To survive, you will soon see premium features and sponsored content quietly appear on the page."
What researchers fear most is that the high operating costs after independence will eventually be passed on to the academic community in the form of advertising, paywalls, or institutional subscription fees. Some pessimistic netizens even joke that this is a classic "assembly line from non-profit to subscription-based charging."
Some people question whether it really costs $300,000 to hire an executive to manage a "website that just uploads and downloads PDF files." However, some industry insiders argue that, considering the CEO needs to lead the financing, legal, and compliance affairs of a technology-based non-profit organization in a place like New York, this salary is actually not high by local standards.
As the "infrastructure" on which the entire hard-tech community of AI, physics, mathematics, and other fields depends, arXiv's transformation will undoubtedly have a profound impact on the future direction of academic communication. Those of us who are used to getting papers for free can perhaps only wait for the new CEO to take office and see where this 35-year-old academic giant will sail.
Reference links:
https://www.reddit.com/r/MachineLearning/
https://jobs.chronicle.com/job/37961678/chief-executive-officer
This article is from the WeChat official account "MachineHeart". Author: MachineHeart Editorial Department. Republished by 36Kr with permission.