6.4k Stars! Someone has packaged and open-sourced a complete pipeline for writing papers using Claude Code.
Tingyu from Aofeisi | QbitAI (WeChat official account: QbitAI)
A complete pipeline for writing papers with Claude Code has been packaged and open-sourced.
It squarely hits students' pain points, and its GitHub star count has reached 6,400.
The project is called academic-research-skills (hereinafter ARS), a set of Claude Code skills.
It contains 4 skills, covering research, writing, review, and finalization of papers.
Two commands are all it takes to install, and it wires together the entire academic research pipeline.
I can only say, why didn't I come across such a good thing when I was in graduate school...
4 skills that run through the entire research process
The core architecture of ARS consists of 4 skills, each with its own responsibilities. Together, they form a complete link from topic selection to manuscript submission.
I've also made a diagram here for you to have a more intuitive understanding:
Deep Research is a research team of 13 agents.
It handles literature review, research-question construction, and methodology design, and can also write PRISMA-compliant systematic reviews.
There is an agent in the team dedicated to literature tracing, which will call the Semantic Scholar API to verify the authenticity of each citation.
There is a Socratic tutor agent that guides researchers to clarify their thinking through dialogue.
There is also a devil's advocate agent that specializes in nitpicking to prevent researchers from falling into a fixed mindset at an early stage.
Academic Paper is a writing team of 12 agents.
It covers the entire process from outline design, argument construction, draft writing, to bilingual abstract generation, chart visualization, and citation format conversion.
Notably, there is a style calibration function. The AI will learn your writing style from your previous works to make the output more like your own writing, rather than having a generic AI flavor.
The output formats support Markdown, DOCX, and LaTeX, and the result can be compiled to PDF in APA 7th-edition or IEEE format.
Academic Paper Reviewer is a review team of 7 agents.
It simulates the review process of a real academic journal: an editor-in-chief (EIC) leads three domain reviewers and a devil's advocate, who score along dimensions such as methodology, disciplinary perspective, and interdisciplinary value.
Scoring uses a quantitative 0-100 scale: 80 and above is accepted, 65-79 needs minor revisions, 50-64 needs major revisions, and below 50 is rejected.
The review team will also output a detailed revision roadmap to tell the author what to do next.
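The thresholds described above map cleanly onto a decision function. A minimal sketch; the function name and return labels are my own, not the skill's:

```python
def review_decision(score: float) -> str:
    """Map a 0-100 aggregate review score to an editorial decision,
    using the thresholds described in the article."""
    if not 0 <= score <= 100:
        raise ValueError("score must be in [0, 100]")
    if score >= 80:
        return "accept"
    if score >= 65:
        return "minor revision"
    if score >= 50:
        return "major revision"
    return "reject"
```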
Academic Pipeline is a process orchestrator that connects the previous three teams into a 10-stage pipeline.
From research, writing, integrity check, peer review, revision, final check, to publication preparation and process summary, each stage has clear deliverables and checkpoints.
You can enter at any stage. If you already have a first draft, start from the integrity check at Stage 2.5; if you have just received review comments, jump straight to the revision at Stage 4.
The cost estimate is transparent, too: for a 15,000-word paper, the whole process runs about $4 to $6.
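The fractional stage numbering and the enter-at-any-stage behavior can be sketched like this. The stage names and numbers below are illustrative, reconstructed from the stages the article mentions rather than taken from the skill's actual definition:

```python
# Illustrative stage table; the real skill defines its own 10-stage list.
STAGES = [
    (1.0, "research"),
    (2.0, "writing"),
    (2.5, "integrity check"),
    (3.0, "peer review"),
    (4.0, "revision"),
    (4.5, "integrity re-check"),
    (5.0, "final check"),
    (6.0, "publication prep"),
    (7.0, "process summary"),
]

def run_from(entry: float) -> list:
    """Return the stages still to run when entering the pipeline at
    `entry` (e.g. 2.5 if you already have a first draft)."""
    return [name for num, name in STAGES if num >= entry]
```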
Some interesting designs
There are already plenty of open-source projects that use Claude Code for academic research. But after digging in, I found that ARS still stands out in its underlying design.
It can be simply summarized in one sentence: Systematically prevent AI from ruining academic research.
First, citation verification.
The cardinal sin of AI-written papers is hallucinated citations.
This includes not only fabricating non-existent articles but also more subtle situations such as similar titles but wrong authors and years, or a real DOI but inconsistent content.
ARS embeds a citation-verification mechanism in the Deep Research stage: every piece of literature must have its existence confirmed via the Semantic Scholar API.
This is not a simple title check. It fuzzy-matches with Levenshtein similarity, and only candidates above the 0.70 threshold pass.
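As described, the title check can be implemented as a normalized Levenshtein similarity with a 0.70 cutoff. A self-contained sketch; the exact normalization ARS applies may differ:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (two-row variant)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def title_similarity(query: str, candidate: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical titles."""
    q, c = query.lower().strip(), candidate.lower().strip()
    longest = max(len(q), len(c)) or 1
    return 1 - levenshtein(q, c) / longest

def citation_passes(query: str, candidate: str, threshold: float = 0.70) -> bool:
    """True if the candidate title is close enough to count as a match."""
    return title_similarity(query, candidate) >= threshold
```

This catches the subtle cases the article mentions: a candidate whose title merely resembles the cited one scores well below the threshold.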
Second, the integrity gate.
At Stages 2.5 and 4.5 of the pipeline sit two non-skippable integrity gates that run a 7-item AI failure-mode checklist.
The checklist comes straight from a study of fully autonomous AI research published in Nature in 2026, which summarizes 7 failure modes covering citation hallucination, data fabrication, methodology fraud, and more.
Any item marked SUSPECTED at 2.5 must be CLEAR at 4.5, or be manually overridden by a human with a record kept.
The design logic is to change from "I believe the AI won't make mistakes" to "I require the AI to prove it hasn't made mistakes".
In real-world tests, this mechanism caught 15 forged citations and 3 statistical errors in an actual paper.
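The gate rule (SUSPECTED at 2.5 must become CLEAR at 4.5 unless a recorded human override exists) fits in a few lines. The data shapes below are my own sketch, not ARS's actual schema:

```python
def gate_passes(status_2_5: dict, status_4_5: dict, overrides: dict) -> bool:
    """Every checklist item flagged SUSPECTED at stage 2.5 must be
    CLEAR at stage 4.5, unless a human override is on record for it."""
    for item, status in status_2_5.items():
        if status == "SUSPECTED":
            if status_4_5.get(item) != "CLEAR" and item not in overrides:
                return False
    return True
```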
Third, an anti-sycophancy protocol that lets the AI say no.
Most AI tools share an invisible problem: they curry favor with the user. Ask for a change and they will make it, even when the change makes things worse.
So ARS has a special anti-sycophancy mechanism designed in the review stage.
There is a Devil’s Advocate in the review team, whose duty is to nitpick.
But after nitpicking, there is a concession threshold protocol.
Each of the DA's objections is scored from 1 to 5; the writing team may concede only objections that score 4 or higher.
In other words, the AI cannot easily make concessions just to appear cooperative.
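The concession threshold itself is a one-line rule; a hedged sketch (the function name is mine):

```python
def may_concede(objection_score: int) -> bool:
    """Concession threshold: a Devil's Advocate objection may only be
    conceded if it scores 4 or 5 on the 1-5 scale."""
    if not 1 <= objection_score <= 5:
        raise ValueError("objection score must be in 1-5")
    return objection_score >= 4
```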
Attack intensity must also be maintained across revision rounds: if the methodology was hammered in the first round of review, the reviewers cannot suddenly turn lenient after the author revises.
Score trajectories are tracked as well, and a drop in any dimension is flagged as a regression.
This is the same principle as not introducing new bugs in software engineering: fixing one part should not break another.
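Regression tracking across review rounds reduces to a dimension-by-dimension comparison; an illustration, not ARS's actual code:

```python
def find_regressions(prev: dict, curr: dict) -> list:
    """Flag any review dimension whose score dropped between rounds,
    analogous to a failing regression test in software engineering."""
    return [dim for dim, score in curr.items()
            if dim in prev and score < prev[dim]]
```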
Fourth, three-layer data isolation, so the AI cannot peek at the answers.
ARS strictly divides the data flow into three layers:
Layer 1 is the original input, which is considered untrustworthy by default and may be hallucinated, outdated, or biased.
Layer 2 is the product after passing the integrity verification.
Layer 3 is the scoring criteria, reference answers, and gold standard data. This layer of materials should never appear in the context of the writing AI.
Concretely, the writing team and the review team run as two independent invocations, separated by a stage-boundary isolation.
The writing AI can only receive natural language feedback from the review AI, such as "The argument in Chapter 2 is jumping. It is recommended to add a comparative experiment."
But it cannot see the original scoring criteria or know how many points each dimension accounts for.
The inspiration comes from Anthropic's w2s-researcher study this year, which uses the same three-layer isolation model.
Its conclusion: when an AI can read the label data, the result may not truly generalize but instead optimize for surface features.
The solution is not better prompts but structural isolation.
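The structural part of that isolation can be illustrated as a context filter that simply never includes Layer 3 material; the data model here is my own sketch:

```python
def writing_context(materials: list) -> list:
    """Assemble the writing AI's context: raw input (layer 1) and
    integrity-verified products (layer 2) are allowed in. Layer 3
    material (rubrics, reference answers, gold-standard data) is
    excluded structurally, not merely hidden by a prompt."""
    return [m for m in materials if m["layer"] in (1, 2)]
```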
Finally, honest documentation: "reproducibility is not guaranteed".
Academia constantly runs into "I can't reproduce this result". ARS generates a repro_lock file for every artifact, recording the complete runtime configuration.
But the file carries a mandatory statement: LLM output is not byte-level reproducible. The model provider may update weights without changing the model ID, and external APIs return different data from day to day.
This file is just a configuration document, not a replay guarantee.
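A repro_lock writer in this spirit might look like the following. The file name comes from the article, but every field here is my own guess at a plausible schema, not ARS's actual format:

```python
import json
import platform
import sys
import time

def write_repro_lock(path: str, config: dict) -> None:
    """Record the run configuration plus the mandatory honesty note:
    LLM output is not byte-level reproducible."""
    lock = {
        "config": config,
        "python": sys.version,
        "platform": platform.platform(),
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
        "disclaimer": ("LLM output is not byte-level reproducible: "
                       "providers may update weights behind a stable "
                       "model ID, and external APIs return different "
                       "data over time. This file documents the run "
                       "configuration; it is not a replay guarantee."),
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(lock, f, indent=2)
```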
In the update log, you can see that ARS has gone through many rounds of iterations. Since its launch in February, the number of commits has reached more than 300.
From each version change, you can also see that the author has a deep understanding of the risks of AI academic research systems.
This, in my view, is the crux for today's academic-research AI tools:
Getting AI to help write a paper is not hard. The hard part is keeping it from making mistakes and currying favor, and making the whole process systematic and reliable.
The design philosophy of ARS can be summarized as the sentence in the README:
"AI is your co - pilot, not the pilot."