HomeArticle

Hugging Face Model TOP List, now I only admire yuxinlu1

量子位2026-06-28 09:27
Carve out a place among big players

Could an individual developer break into the top of the Hugging Face Models Trending list among numerous big tech companies?!

It was an ordinary day, and I was casually browsing the Trending list on Hugging Face.

The first one was GLM - 5.2, the latest open - source model from Zhipu. It's an old acquaintance, with over 60,000 downloads. No surprise there.

The second was Baidu's Unlimited OCR, which was quietly open - sourced recently. It can parse over 40 pages of documents at once, and its download volume has reached 70,000.

Looking further down, suddenly an individual account appeared: yuxinlu1.

Huh... What?!

Moreover, it occupied two positions.

Looking at the download volume again - the latest data shows as high as 207,000 and 536,000. Goodness, what kind of amazing model is this?

Even in the previous week, this individual developer's model once dominated the Hugging Face list, outperforming GLM - 5.2. Even the person in charge of Zhipu publicly recommended it on X:

That is to say, among well - known names like Zhipu, Baidu, Qwen, and NVIDIA, an individual developer's account managed to squeeze into the top, and with such a high download volume.

It's really curious: Who exactly is luyuxin? How can he have such great influence?

"Amateur Model" Rises to the Top of the Hugging Face Trending List

In this wave of the Hugging Face Trending list, the top positions are mostly occupied by big tech companies, star teams, and popular tracks.

For example, Zhipu's GLM - 5.2 has a huge 753B parameter and is a well - known domestic large - scale model; Baidu's Unlimited OCR hits the popular OCR and document understanding directions.

Further down the list are Qwen's AgentWorld, NVIDIA's LocateAnything, and Microsoft's FastContext.

Familiar domestic open - source large - scale models are also on the list: MiniMax M3, Kimi - K2.7 - Code, DeepSeek - V4 - Pro.

In the image generation field, there is also Krea. Its new models, Krea - 2 - Turbo and Krea - 2 - Raw, are both on the list.

As a result, there are also two 12B GGUF models by luyuxin in the list.

Well... luyuxin, you're really standing out!

Taking a closer look, these two new models mainly integrate the programming reasoning ability of Fable 5 into a small Gemma4 - 12B model that can run locally.

It can run with 4.5GB of video memory, locally, offline, and with zero API cost. An ordinary consumer - grade graphics card or even a Mac with unified memory can run it.

The two models also have different functions.

V1 is the Coder version, focusing on writing code, solving problems, and generating runnable code.

According to the model card, its training data is "verifiable" code reasoning: the code corresponding to each thought chain must pass real - world tests before being retained.

The teacher data mainly comes from Cursor's Composer 2.5, plus Fable 5 - for the questions that Composer 2.5 gets wrong, Fable 5 will be used to re - reason and generate new reasoning chains and correct code.

After the release of V1, it dominated the top of the Hugging Face Trending list for many consecutive days.

V2 is the agentic version, with the ability to call multiple - step tools. It can be used as a local agent, reading, reasoning, acting, and verifying on its own.

The author also conducted a benchmark - on the telecom subset of tau2 - bench, the base gemma - 4 - 12B scored 15%, while the V2 model scored 55%, about 3.5 times the basic performance.

However, the author also said that this is a relative value obtained from local self - testing in a single domain with 20 tasks and cannot be directly compared with the official list. He also admitted that there is still a significant gap compared with the frontier large - scale models.

The author also mentioned that Fable 5 was later taken offline, and only his own dataset retains the "original" reasoning process of Fable 5.

For the missing reasoning part in the community - contributed data, he used Claude Opus 4.8(xhigh) to regenerate and fill it in one by one.

He also admitted that the reconstructed trajectory "may differ from the original Fable 5", but this was the only feasible solution at that time.

He also revealed in the discussion that this set of fine - tuning data actually only has about 10,000 examples. He emphasized that the quantity of data is not as important as people think. The real key is quality, screening, and verification.

There is a very practical reason why this set of models has such high popularity on Hugging Face: It can run locally.

These two models are both GGUF quantized versions.

GGUF is a common local model format in the llama.cpp ecosystem. Users can directly load it with tools such as llama.cpp, Ollama, LM Studio, and Jan.

This is especially attractive for coding scenarios. After all, writing code, viewing repositories, running commands, and debugging often involve private projects and local environments. Being able to run on one's own machine means not having to upload code to the cloud and not having to pay API call costs every time.

More importantly, the threshold is not very high.

The V1 model card states that the smallest Q2_K version is about 4.5GB. As long as there is about 4.5GB of video memory or unified memory, a private, offline programming assistant can be run.

The author recommends the Q4_K_M version, which is about 6.87GB; the higher - quality Q8_0 version is about 11.8GB.

For V2, since it is more agentic, the author did not release the Q2_K version. The reason is that it did not pass the stress test and is not reliable enough.

So the smallest reliable version of V2 starts from Q3_K_M, which is about 5.7GB; the recommended Q4_K_M is still about 6.87GB.

The author also previewed the follow - up plan in advance - V3 is on the way.

He said that V3 will still focus on the coding + agentic direction along the 12B line. The author said that he didn't expect such a significant improvement from the post - training this time, so he will continue to move forward.

Especially on the tau2 - bench telecom, V2 still has some problems of "over - trying and repeated retries". V3 will continue to improve through more training.

On the other hand, he is also working on a larger version: Qwen3.6 - 27B. It's like applying the same coding + agentic formula to a larger base for users with more abundant video memory.

One Person, 40 Hours, Breaks into the Middle of Big Tech Companies

Being able to single - handedly rise to the top of the Hugging Face Trending list, with a total download volume of over 700,000, and carve out a place among numerous big tech companies.

Who exactly is this author?

After contacting the author, QbitAI learned his story.

His name is Lu Yuxin. Currently, he is a graduate student in the field of AI at a university in the United States. He majored in data and business analysis as an undergraduate and also took a full - stack development course to learn front - end, back - end, software development, and data processing.

These two popular models are not his main job but purely self - funded personal projects.

"Open - source actually only costs money and doesn't bring any income." He is well aware of this, so his initial motivation for making V1 was actually "self - improvement":

The knowledge taught in school is updated too slowly. When he was in graduate school, the professor was still teaching content from two or three years ago. Since AI is evolving rapidly, he used this project to force himself to keep up with the latest developments.

To make these models, he used up an entire Claude Max 20× package, and it took him over 40 hours just for V2.

He synthesized data one by one, manually cleaned it, trained, evaluated, and retrained, almost all by himself.

For hardware, he used an RTX 5090 with 32GB of VRAM; he also had about 96GB of local SSD resources to use in combination. The actual available resource scale is about 128GB.

It's not bad for an individual developer, but it's not in the same league as the computing power pools of big tech companies and AI labs.

He told QbitAI that the most time - consuming part of the whole process is actually data processing.

Especially for agentic data, real - world conversations are often very long. A single task may have more than a dozen steps, with thousands or even tens of thousands of tokens. But limited by the video memory, he can only feed a maximum of 2048 tokens at a time during training.

So he did a "sliding window" - like processing: in each multi - round conversation, using the most recent user message as an anchor point, he trimmed the context within the budget around a single tool call.

Both V1 and V2 are based on Gemma 4 - 12B. Choosing it was not because it was easy to work with. On the contrary, the format and tool protocols of Gemma 4 are quite special, and it's very troublesome to adapt. Even the support of many clients is not perfect.

Lu Yuxin said that on the one hand, it was to challenge himself; on the other hand, the 12B size is very attractive.

He calculated that if quantized to about 3 bits, many Mac users with 8GB of unified memory can also run it, leaving a certain context window.

I know that many people are still using computers with about 8GB of unified memory. So I want to make it available to more people with the largest possible parameter quantity.

Lu Yuxin summarized the value of local models in two words:

Privacy, Free.

He thinks that many people just want AI to help them organize files, process data, make PPTs, or experience an agent, and they may not be willing to pay for Claude or GPT every month.

People may just want to have some fun. Why do they have to pay?

After the release of V1, he didn't pay much attention to the list at first. He just said in the model card as usual that if people liked it and the download volume and likes were high, he would continue to make V2.

Unexpectedly, two or three days later, the model suddenly jumped from nowhere to the eighth place; after a night's sleep, it rushed to the first place.

Subsequently, a large number of comments and issues flooded in.

He read almost every one of them. At most, he spent three or four hours a day reading Hugging Face comments, answering questions, testing user feedback, and then telling the users the results.

He said: "The community has needs, and I'm really doing something about it. That's the most important thing."

It Turns Out He Loves Reading Online Novels...

On HF, Lu Yuxin has published a total of 9 public models. Besides the two popular models, he also made a model that "directly distills Claude".