
Engram's founders: Even the largest model is useless if it cannot remember anything.

品玩Global, 2026-06-26 11:29
If the model has no memory, even the best prompts are useless.

Pushing prompt optimization to its limit is not enough, and neither is waiting for the next model version. That is the assessment of Dan Biderman, co-founder of Engram, and the starting point for the company. While the rest of the industry works on context engineering, RAG (Retrieval-Augmented Generation) and tool invocation, Biderman and his partner Jessy Lin turned to a different lever: training. Not training models to be more intelligent, but training models to remember you.

Biderman has a background in neuroscience, and Lin comes from NLP (Natural Language Processing) and cognitive computing. The two have assembled a small but highly qualified team and founded Engram, a rising AI lab of the kind referred to as a "neolab". Engram does not develop generic large language models. Its customers are teams that need AI to actually understand their business: companies like Notion, Microsoft and Harvey already use it to train their own "exclusive models", so that the AI not only answers questions but also remembers every decision, every iteration and all the industry-specific tacit knowledge, like a long-serving employee.

In this 45-minute podcast interview, the two moderators, both with venture-capital backgrounds, ask a central question: if AI models are already intelligent enough, what is the next bottleneck? Engram's answer is clear: memory. Not stuffing more content into the context window, but imprinting memories into the weights of the model.

This article is a translation of a YouTube podcast interview called "Memory and Continual Learning: Engram's Dan Biderman and Jessy Lin". Here is the full translation.

1

"Our model is constantly being trained" – Breaking the boundary between pre-training and post-training

A statement on the Engram website challenged the moderators from the start: "We don't view the world from the perspective of pre-training or post-training. Our model is constantly being trained."

What does that mean?

Jessy Lin explains it directly: "Today's models are already very intelligent. The bottleneck in making these models more useful no longer lies in raw intelligence, but in understanding new, constantly changing context, such as a new project you're currently working on or an industry-specific way of working. The question is: how can we imprint this information as deeply into the weights of the model as pre-training imprints 'Paris is the capital of France'?"

Dan adds with a more vivid metaphor: "We humans go to sleep at night and come back to work the next day. In our heads, we don't just have notes, but also a new intuition – we know where to look and how to think. Today's AI solutions are based on external memories: Information is written into the context window or into notes. But this has two problems: First, the number of tokens you generate daily quickly reaches tens of millions, which makes the search very expensive. Second, the external memory understands nothing, it can only search."

The two agree: Context engineering, RAG and tool invocation – all of this has value, but there is a greatly underestimated tool: Training. One can train every vertical field or a company's private data in the same way as leading labs train the best math or code models.

2

Engram architecture: "Burning" corporate knowledge into the model weights

What Engram does can be summarized in one sentence: Train an exclusive model for each team so that it deeply understands the team's context and develops over time.

Jessy describes how the product works: Engram connects to platforms like Notion, Microsoft and Harvey, which hold large amounts of long-term work data. The documents, conversations and feedback, the raw signals that people generate in their daily work, are converted into training data. Then, using adapter fine-tuning techniques such as LoRA (Low-Rank Adaptation), this knowledge is burned into the weights of the model.
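The low-rank adapter idea behind LoRA can be illustrated with a minimal sketch. This is not Engram's implementation, which the interview does not describe. It only shows the general technique: a frozen weight matrix W is augmented by a trainable product B·A of two small matrices, scaled by alpha/r. The dimensions, the values of `r` and `alpha`, and the helper names `matmul`, `lora_effective_weight` and `trainable_params` are assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the weight the adapted layer behaves as."""
    r = len(a)                 # rank r = number of rows of A
    scale = alpha / r
    delta = matmul(b, a)       # low-rank update, same shape as W
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, r):
    """Trainable values for one weight matrix: full fine-tuning vs. rank-r adapter."""
    return {"full": d_out * d_in, "lora": r * (d_out + d_in)}


random.seed(0)
d_out, d_in, r = 4, 6, 2       # toy sizes (assumed)
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen base weight
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]    # trainable, small random init
b = [[0.0] * r for _ in range(d_out)]                                   # trainable, zero init

# With B initialised to zero the adapter is a no-op, so training starts from the base model.
assert lora_effective_weight(w, a, b, alpha=4) == w

counts = trainable_params(d_out=4096, d_in=4096, r=8)
print(counts)
```

For a hypothetical 4096 × 4096 projection with rank 8, the adapter trains 65,536 values instead of 16,777,216, which is why one adapter per team is far cheaper to train and store than a full copy of the model.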

The goal is not for the model to "read a file at inference time", but for it to understand the company the way a long-serving employee does:

  • Know the company's strategic directions
  • Understand the specific way of working
  • Master the hiring process, writing style and internal rules
  • Be able to give accurate answers directly without searching in documents

Dan gives a rough quantitative estimate: today's best model may need 100,000 tokens of searching and reasoning to answer a question about internal corporate knowledge. After Engram's training, it may need only 100 tokens. He puts the saving at up to 100 times, not 50% but two orders of magnitude; taken at face value, the two token counts imply an even larger, 1,000-fold reduction.

Technically, Engram requires white-box access to the model weights. It therefore focuses on open-source models, but can also work with companies that hold closed-source weights. Any model based on the Transformer architecture can be processed by Engram.

3

Should memory be incorporated into the weights? – The limitations of RAG

A question that cannot be avoided: can RAG (Retrieval-Augmented Generation) solve this problem?

The moderator presses Dan, and he gives an apt analogy:

"Do you have to remember the access code to your apartment? Yes, because you use it every day. Do you have to remember the room number of a hotel from last year? No, you can write it down."

Then, however, he points out the core problem of RAG: Do you know what to search for?

The search system solves the problem of what to store and where to store it, but the most difficult part is knowing what to search for. Often, valuable associations cannot be retrieved in advance – for example, when you see someone in a team working on a certain research direction and you intuitively think of something else related to it, this "unsolicited" association can only occur in the weights of the model, not in a search system.
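The "knowing what to search for" problem can be made concrete with a toy retrieval sketch. It uses plain keyword overlap, which is an assumption for illustration only; production RAG systems typically use embeddings, and nothing here reflects a system described in the interview. The notes and the function names `tokenize` and `keyword_search` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query, notes, min_overlap=1):
    """Return the notes sharing at least `min_overlap` words with the query, best first."""
    q = tokenize(query)
    scored = [(len(q & tokenize(note)), note) for note in notes]
    hits = [(score, note) for score, note in scored if score >= min_overlap]
    return [note for score, note in sorted(hits, key=lambda pair: -pair[0])]


# Hypothetical team notes, invented for this example.
notes = [
    "Alice benchmarks sparse attention kernels on long sequences",
    "Bob shelved a block sparse GPU library last spring",
    "Lunch menu of our offsite",
]

# A query that already names the right words finds Bob's library first.
direct = keyword_search("block sparse library", notes)

# A query phrased around the need only finds Alice's own note.
# The useful association with Bob's shelved library is never surfaced.
indirect = keyword_search("what could help Alice", notes)

print(direct[0])
print(indirect)
```

The store contains the relevant note in both cases. What differs is whether the person asking already knows the connection, which is the gap Dan argues only knowledge held in the weights can close.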

Jessy adds from another perspective: If you constantly rely on RAG, you always conduct static searches and cannot accumulate and combine knowledge. "It's like reading your notes over and over again instead of really understanding them – the understanding will never get deeper."

Dan says it even more directly: In a way, Engram's direction is a "RAG killer" – not that RAG has no value, but for the knowledge that really needs to be internalized, training into the weights is the better option.

4

What does "Remember only important things" mean? – Forgetting is part of intelligence

A profound philosophical question arises: Is it a feature or a bug when a large language model imprints all facts into the weights?

Jessy gives her opinion: "You can't completely separate fact memory and skill memory. Some researchers have tried to remove all 'facts' from the model and only keep the 'algorithm skills' – the result was a very unnatural model that couldn't even answer basic questions. You have to internalize some things to build more abstract concepts on them."

But she also admits the core of the problem: Not all facts are worth remembering. Existing academic benchmarks often require the model to remember the length of a bridge in an African country – this information doesn't necessarily need to take up the model's capacity.

Dan looks at the problem from the perspective of neuroscience: "Human memory is lossy. This is not a defect, but part of intelligence: compressing the important and filtering out the unimportant. The magic of deep learning lies in the fact that gradient descent can compress huge amounts of information into very few parameters. The 70B Llama model has a parameter file of about 100 GB and can store the knowledge of the entire Internet. But if you only store the KV-cache entries of a Wikipedia article about Taylor Swift, you need 80 GB of GPU memory: you've converted a few kilobytes of text into an 80 GB 'brain state'."

His conclusion: training is compression. If these 80 GB can be compressed offline to a few hundred megabytes, loading will be 1,000 times faster, which is of revolutionary significance for the entire inference infrastructure.
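The blow-up from text to KV cache in Dan's example can be checked with a back-of-envelope calculation. The sketch below uses the standard estimate of one key and one value vector per layer, per KV head, per token. Every number in it is an assumption rather than a figure from the interview: a hypothetical 70B-class configuration with 80 layers and head dimension 128, 16-bit values, a 30,000-token article, and roughly 4 bytes of raw text per token. The function name `kv_cache_bytes` is likewise illustrative.

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for n_tokens of context.

    Each layer stores one key vector and one value vector per KV head per token.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * n_tokens


# Assumed values, not taken from the interview.
N_TOKENS = 30_000       # a long encyclopedia article
BYTES_PER_TOKEN = 4     # rough size of the raw text per token

# Full multi-head attention: every one of 64 heads keeps its own keys and values.
full_mha = kv_cache_bytes(N_TOKENS, n_layers=80, n_kv_heads=64, head_dim=128)
# Grouped-query attention: 8 shared KV heads shrink the cache eightfold.
grouped = kv_cache_bytes(N_TOKENS, n_layers=80, n_kv_heads=8, head_dim=128)
text_bytes = N_TOKENS * BYTES_PER_TOKEN

print(f"raw text:               {text_bytes / 1e3:.0f} kB")
print(f"KV cache (64 KV heads): {full_mha / 1e9:.1f} GB")
print(f"KV cache (8 KV heads):  {grouped / 1e9:.1f} GB")
```

Under these assumptions the cache comes to about 78.6 GB without grouped-query attention and about 9.8 GB with it, against roughly 120 kB of text. The exact figure depends heavily on the model configuration, but in either case the cached state is several orders of magnitude larger than the text it encodes.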

5

Why don't large language model providers do this themselves?

The moderator asks a pointed question: why don't leading labs like OpenAI and Anthropic do continual learning themselves?

Dan's answer is candid:

"The top goal of leading labs is AGI (Artificial General Intelligence): an extremely general supermodel that excels at programming and mathematics. The way to achieve it is clear: more pre-training, larger models, more data, more RL (Reinforcement Learning) and more inference compute. That's 95% of their energy and budget."

He does not claim that large companies ignore memory and continual learning. In fact, Demis Hassabis of DeepMind said explicitly at a Sequoia event that "new breakthroughs are needed in this area". But for large companies it is, for now, more a product problem than a core research problem.

Jessy adds from another perspective: This problem requires a deep integration of research and product development. In the existing working method of large companies, researchers train a model and hand it over to the product team. The product team then conducts context engineering and prompt engineering. In Engram's world, every user interaction is a training signal – research and product development must work in a common loop, which is a completely different organizational form.

She also mentions a structural difference: what each person and each company wants is often private and mutually contradictory. "My writing style is different from yours, and my work process is different from that of your company. These things will never appear in a post-training dataset."

6

Memory Wallet, personal models and the ultimate goal

At the end of the conversation, the moderator floats an interesting idea: since we now have "token wallets" that we can carry around with us, might there also be "memory wallets" in the future, so that you can take the skills and working methods you learned at one company to your next job?

Dan sees this as one of the ultimate goals:

"You create a lot of value in your work. The IP and secrets stay in the company, but the skills you've learned and your unique way of thinking – something that has been 'neutralized' – you should be able to take with you. We've always done this on a biological level, but we're restricted by NDAs and professional ethics. The digital version will be even more interesting because it will force everyone to integrate AI deeper into their work and be rewarded for it."

Jessy's vision is more concrete: Everyone has their own model – a model that really differs from other models and from leading models and works for you or your team.

Dan closes with a finding from neuroscience: the neural circuits in the brain responsible for memory and for navigation are almost identical, so memory is essentially navigation through a cognitive space. He imagines Engram as a "neural interface": not an index over a file system, but a brain-state representation of the entire data layer, one that is more associative, more efficient and closer to how humans process information.

"This is something like Databricks or Oracle," he says, "except that we store neural memories, the models are individual and there will be hundreds of millions of them."

7

Language vs. Vision – A "crazy theory"

At the end of the podcast, the moderator Sean shares his long-held "crazy theory":

Why have language models ended up overtaking vision models? His hypothesis: in the biological world, the bandwidth of visual information is far greater than that of language (photons vs. sound waves), so the brain has allocated more "computing resources" to vision. In the computer world, all signals are electronic and the processing costs for vision and language are "equalized", so language models get a fair chance to compete, one that language never had in the biological world.

Dan and Jessy find that this direction makes some sense, but they also warn: Most of the knowledge work that people do today (writing notes, reading documents, chatting with AI)