Karpathy's "Crazy Creation": Train Your Own "Mini GPT" for $100 in 4 Hours
AI legend and former Tesla AI director Karpathy has launched a new open-source project called "nanochat", which reproduces the entire ChatGPT pipeline in roughly 8,000 lines of code. It requires only a single 8XH100 GPU node, about 4 hours, and roughly $100. The project received 4.2k stars on GitHub within less than 12 hours of its launch!
The AI legend and former Tesla AI director Karpathy announced the release of a new project nanochat!
A minimalist but complete "Build ChatGPT from Scratch" training framework.
Karpathy said this is one of the craziest projects he's ever written!
It's like everyone can have their own exclusive ChatGPT.
Within less than 12 hours after the project was released, the number of stars on GitHub exceeded 4.2k (and it's still rising rapidly)!
GitHub project: https://github.com/karpathy/nanochat
All of this attention came organically from the community; that is the measure of Karpathy's influence in the AI field!
Unlike the earlier nanoGPT, nanochat covers not just pre-training but the entire pipeline: data preparation, pre-training, mid-training (dialogue, multiple-choice questions, tool use), SFT, RL fine-tuning, and inference deployment.
The entire system has only about 8,000 lines of clean code. Start a GPU machine, run a single script, and after 4 hours, you can have a conversation with your own trained "mini ChatGPT" on the web interface.
Karpathy calls it the "grand finale" of LLM101n. It may also become a research baseline for the future and an experimental platform for the open-source community.
Let's take a closer look at how to "clone" ChatGPT with just 8,000 lines of code:
Train the tokenizer with a new Rust implementation (a toy BPE sketch follows this list)
Pre-train a Transformer LLM on FineWeb and evaluate its CORE score across a range of metrics
Mid-train on user-assistant dialogue, multiple-choice question, and tool-use data from SmolTalk
Perform SFT and evaluate the chat model on world knowledge multiple-choice questions (ARC-E/C, MMLU), mathematics (GSM8K), and code (HumanEval)
Use "GRPO" to perform reinforcement learning fine-tuning (RL) on the model on GSM8K
Run efficient inference in an engine with a KV cache, simple prefill/decode, and tool use (a Python interpreter in a lightweight sandbox), and interact with it through the CLI or a ChatGPT-like web interface (a prefill/decode sketch also follows this list).
Write a single Markdown report card to summarize and gamify the entire process.
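nanochat's tokenizer trainer is written in Rust, so the following is only a toy Python sketch of what BPE training does conceptually, not the project's code (the function name and details are illustrative assumptions): repeatedly count adjacent symbol pairs and merge the most frequent pair into a new token.

```python
from collections import Counter

def train_bpe(corpus: list[str], num_merges: int):
    """Toy byte-pair-encoding trainer: merge the most frequent adjacent
    symbol pair, num_merges times. Illustrative only."""
    # Start from raw bytes so any string is representable.
    words = [tuple(w.encode("utf-8")) for text in corpus for w in text.split()]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            for a, b in zip(w, w[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        merged_words = []
        for w in words:  # apply the merge everywhere it occurs
            out, i = [], 0
            while i < len(w):
                if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                    out.append(best)  # the pair becomes one composite symbol
                    i += 2
                else:
                    out.append(w[i])
                    i += 1
            merged_words.append(tuple(out))
        words = merged_words
    return merges
```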
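For the RL step, GRPO's core trick is to sample a group of completions per GSM8K prompt, reward each one (for example, 1.0 if the final answer is correct, else 0.0), and use group-normalized rewards as advantages for the policy-gradient update. A minimal sketch of that advantage computation, assuming nothing about nanochat's actual implementation:

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: normalize each prompt's sampled-completion
    rewards by the group mean and std. rewards: (num_prompts, group_size)."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 GSM8K prompts, 4 sampled answers each; reward 1.0 when the
# extracted final answer matches the reference, else 0.0.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
adv = grpo_advantages(rewards)  # positive for correct samples, negative otherwise
```

Each sampled completion's token log-probabilities are then weighted by its advantage in the policy-gradient loss, so answers that beat their group's average are reinforced and the rest are suppressed.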
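And the inference engine's prefill/decode split works roughly like this: the prompt is run through the model once to populate a per-layer key/value cache, after which each new token is fed in alone and attends over the cache. A conceptual sketch with an invented `model(ids, kv_cache=...)` interface (nanochat's real API will differ):

```python
import torch

@torch.no_grad()
def generate(model, prompt_ids: list[int], max_new_tokens: int, eos_id: int) -> list[int]:
    """Minimal prefill/decode loop with a KV cache.

    Assumed (hypothetical) interface: model(ids, kv_cache=None) returns
    (logits of shape (1, T, vocab_size), updated kv_cache).
    """
    ids = torch.tensor([prompt_ids])
    # Prefill: process the whole prompt once and keep the key/value cache.
    logits, kv_cache = model(ids, kv_cache=None)
    out = list(prompt_ids)
    next_id = int(logits[0, -1].argmax())  # greedy pick for simplicity
    for _ in range(max_new_tokens):
        out.append(next_id)
        if next_id == eos_id:
            break
        # Decode: feed only the newest token; attention reads the cached K/V,
        # so each step avoids re-processing the whole sequence.
        step = torch.tensor([[next_id]])
        logits, kv_cache = model(step, kv_cache=kv_cache)
        next_id = int(logits[0, -1].argmax())
    return out
```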
The total cost of the project is as low as about $100 (about 4 hours of training on an 8XH100 node).
That is enough to train your own small conversational ChatGPT clone that can write stories and poems and answer simple questions.
About 12 hours of training is enough to surpass GPT-2 on the CORE metric.
With further expansion to about $1,000 (about 41.6 hours of training), the model will quickly become more coherent and can solve simple math/code problems and answer multiple-choice questions.
The model trained for 24 hours (whose FLOPs are roughly equivalent to GPT-3 Small 125M, about 1/1000 of GPT-3) can reach the 40s on MMLU and the 70s on ARC-Easy.
To summarize:
$100 → You can train your own "mini ChatGPT", an OpenAI-style chatbot that can write poems and answer basic questions;
$1,000 → Performance approaching or exceeding GPT-2, with basic reasoning and code generation.
This project reflects his core concept:
"Lower the threshold for LLM research and reproduction, allowing everyone to train their own models by hand."
This democratizing approach is consistent with the "implement the Transformer from scratch" ethos he championed in the nanoGPT era.
nanoGPT project: https://github.com/karpathy/nanoGPT
Karpathy says his goal is to integrate the complete "strong baseline" stack into one coherent, minimalist, readable, modifiable, and maximally forkable repository.
nanochat will be the grand finale project of LLM101n (still in development).
Karpathy believes that nanochat may also develop into a research tool or benchmark, just like the previous nanoGPT.
nanoGPT teaches you to build a brain, and nanochat teaches you to build ChatGPT.
If nanoGPT was a "Transformer source-code teaching project", then nanochat is a "miniature LLM ecosystem": a mini OpenAI of your own, your exclusive AI.
The relationship between the two can be understood as a two-step closed-loop from "neural network basics to a product-level dialogue system".
From Vibe Coding to nanoGPT, and now to nanochat, Karpathy truly lives up to the title of "AI educator".
This "crazy work" is not a wild fantasy but another implementation of Karpathy's ideal of an open, learnable, and reproducible AI.
Demo of the Mini ChatGPT
Karpathy deployed the nanochat model behind a web UI.
He also shared sample conversations with the nanochat model trained for 4 hours at a cost of about $100.
It's... interesting!
The following image shows part of the "report card" generated by Karpathy for the nanochat "$100 speed run" experiment (a small ChatGPT-style model trained on a single 8XH100 node in about 4 hours), covering the model scale, training time, and performance on various standard evaluations.
Characters: 333,989, the total number of characters in the codebase.
Lines: 8,304, roughly 8,300 lines of clean, well-commented code.
Files: 44, the number of files in the project.
Tokens: about 83,497, the number of tokens in the code (roughly 80,000 words).
Dependencies: a 2,004-line uv.lock dependency list, indicating few dependencies and a lightweight project structure.
These numbers capture the "minimalist" spirit of nanochat: it implements the full ChatGPT training, fine-tuning, and inference pipeline while keeping the code to around 8,300 lines.
References:
https://x.com/karpathy/status/1977755427569111362
https://github.com/karpathy/nanochat
This article is from the WeChat official account "New Intelligence Yuan". Author: New Intelligence Yuan. Editor: Ding Hui. Republished by 36Kr with permission.