Conversation with Kang Hongwen, Founder of Clipto.AI: An AI without Memory Is Just an "Amnesiac" Smart Person

Models will be upgraded, Agents will be restructured, but the memories accumulated by users over a long period of time will not be easily migrated.

Hardware in Place, Software Missing

In 1945, American scientist Vannevar Bush put forward a concept called Memex (Memory Extension) in his article "As We May Think," which influenced the development of the entire computer science field.

In his imagination, everyone would own a machine. It could store one's reading materials, photos, notes, and knowledge, and like human memory, it could help the owner recall, associate, and retrieve information at any time. Later, people regarded it as one of the earliest ideological sources of personal computers, hypertext, and even the Internet.

Vannevar Bush Memex

Over the past 80 years, computers, the Internet, and smartphones have emerged one after another. Storage capacity has grown millions of times over, and the information accumulated by humans has expanded at an unprecedented speed. However, the dream Bush described has never been truly realized.

The reason is not complicated. Machines are becoming better at storing information, but they still cannot form memories. They can store all your life's data, but they cannot retrieve a specific moment for you when you need it.

It's only recently that things have started to change.

In the past year, the AI industry has almost completed a collective upgrade of its infrastructure.

For the first time, edge computing power is no longer just a concept but has begun to become a standard configuration for consumer electronics. NVIDIA launched RTX Spark, directly deploying AI computing power to PCs. Intel's Lunar Lake and Qualcomm's Snapdragon X Elite have increased the NPU computing power of laptops to 60 TOPS and 45 TOPS respectively. Apple has also continuously integrated AI capabilities into its M-series chips.

The models have also reached a new turning point. Open-source models such as Llama 3, Qwen, Gemma, and Phi keep shrinking in size while improving in capability. The maturity of inference frameworks such as llama.cpp and MLX has enabled large models to run stably on ordinary personal devices for the first time. Meanwhile, Apple Intelligence, Copilot+ PC, and the development toolchain built by NVIDIA around edge AI have further embedded the models into the operating system.

Chips, models, and systems are in place, and with market education, "edge AI" has gradually won the trust of users. Almost every layer of infrastructure is ready.

However, even when all these pieces are put together, it's still difficult to get an AI product that ordinary users will use every day. The problem doesn't lie in single-point technologies but in the lack of a product that can truly integrate models, hardware, systems, and personal data.

Once-popular "consumer-grade edge devices" like Rabbit R1 and Humane AI Pin quickly became a passing fad due to product definition failures. Rabbit R1 hoped to become a new cross-terminal interaction entrance, but it failed to answer the question of "why buy a device other than a phone." Humane AI Pin had the ambition to replace the phone, but its cool hardware didn't create demand; instead, it increased the entropy of the user experience.

More importantly, these new edge species haven't solved a core pain point: even though they are closest to the user's personal database, the AI brain often falls into the embarrassing state of "amnesia."

The industry lacks a player that can integrate models, edge devices, and memory systems.

When everyone is discussing Agents, a more fundamental question begins to surface: what does an Agent rely on for long-term existence?

Two years ago, when the entire industry was still immersed in the idea of "bigger cloud models," Kang Hongwen, the founder of Clipto.AI, made a rather counter-consensus judgment:

The real new opportunity will emerge in a new layer of infrastructure spawned after the convergence of edge computing power and large model capabilities.

In his view, only when two technological curves, the maturity of edge computing power and the maturity of large model capabilities, reach the critical point simultaneously can AI truly become the "second brain" in everyone's device, rather than just a chatbot.

And the real opportunity is not limited to the models themselves but also belongs to the "Memory Layer" built on them.

The product developed by Kang Hongwen and his team, Clipto, is exactly the testing ground for this hypothesis.

Users only need to describe in natural language what they want to find, and Clipto can quickly locate relevant segments and information in terabytes of local videos, audios, pictures, and documents.

However, search is only the first ability exposed by Clipto.

Behind Clipto is a Memory Layer built from more than a dozen self-developed edge large models, inference architectures, computing power scheduling systems, and data organization capabilities. It lets originally scattered data settle continuously into personal memories that AI can call on, and it retrieves information users have long forgotten from massive content within milliseconds.

In May 2026, after releasing the new version of its Mac app, Clipto topped Product Hunt's daily list. The possibilities opened up by edge devices and memory are gradually becoming a reality.

Screenshot of the top of Product Hunt

"A Smart Person Without Memory Is Just an Amnesiac"

In the past year, Agent has become the hottest keyword in the AI industry.

Almost all large model companies are talking about Agents. Startups are developing Agents, and capital is chasing after them. From programming and office work to shopping and customer service, more and more people believe that Agents will be the next product revolution in AI after ChatGPT.

In a report in April 2026, Gartner described the industry's attitude towards Agentic AI as reaching "the Peak of Inflated Expectations." More than 60% of enterprises plan to deploy AI Agents in the next two years, even though only 17% of enterprises have completed the deployment so far.

However, in this almost unanimous pursuit, Kang Hongwen, the founder of Clipto, keeps raising a seemingly simple but rarely answered question: Can an Agent without memory really understand the user?

In his view, most Agents today are built on a dangerous assumption: as long as the model is smart enough, it can become the user's assistant.

But the reality is the opposite. Every time you open an Agent, it's as if it's meeting you for the first time. It doesn't know what meetings you had yesterday, where your photos are stored, or what documents you've accumulated in the past year. It can reason but has no experience; it can answer but cannot continue the conversation.

"A smart person without memory is just an amnesiac," Kang Hongwen said.

This is also the question he has been researching for the past two decades.

In the first ten years, Kang Hongwen's research topic was how machines understand the world. In 2004, he interned at Microsoft Research Asia, where he worked on having Xbox automatically analyze the large number of family photos and videos users had taken and extract key segments from hours of footage to automatically generate a family video.

Then, he went to the Robotics Institute at Carnegie Mellon University to pursue a doctorate, studying under Takeo Kanade, a scholar in the field of computer vision, and continued to research image and video understanding. In his view, understanding videos is essentially understanding the real world.

In the past ten years, Kang Hongwen has turned to researching how machines generate content. In 2017, he founded the AIGC company "Huichuan Intelligence." Its creative platform "Zhiying" was acquired by Tencent at the end of 2020. After joining Tencent, Kang Hongwen continued to be responsible for the R&D of full-stack AIGC products such as text-to-image, text-to-video, and digital humans.

Today, at Clipto, Kang Hongwen has brought the question back to "understanding." Because he believes that generation is no longer the biggest bottleneck for AI, "what's really missing is memory."

The emergence of edge large models has brought the opportunity for this technological route to mature for the first time.

Kang Hongwen told 36Kr that cloud models are more like a "global brain," responsible for learning public knowledge and understanding the whole world, while edge AI should become "personal memory," understanding each specific person.

In his view, the future AI architecture will not be a simple competition between Cloud AI and Edge AI. The real direction of evolution is Cloud Intelligence + Edge Memory: the cloud is responsible for world knowledge, the edge is responsible for personal memory, and the Agent is just the interaction layer connecting the two.

"The Agent is just the interaction interface at the top, and what really determines whether it's smart is not just the model itself but whether there is a continuously growing Memory Layer underneath." He mentioned that in his view, this is an architectural problem that has been long ignored by the industry.

Living Memory Graph

"The model will be upgraded, the Agent will be rebuilt, but the user's long-term accumulated memory will not be easily migrated," he said.

Around the "Memory Layer," Clipto has rebuilt a complete set of edge AI technology systems from the bottom up.

In Kang Hongwen's view, what many people understand as Memory is closer to giving the model a longer context or connecting it to a vector database. But a real Memory Layer is far more than that.

"Memory is not a model but a complete system," he said in an interview.

The first layer is the model.

Multimodal data is naturally highly heterogeneous. Videos, audio, pictures, and documents each require different ways of understanding. Around capabilities such as person recognition, speech understanding, OCR, scene analysis, and event understanding, Clipto has developed more than a dozen edge AI models in-house. Some of them are open-source base models with targeted post-training, and some are completely self-developed. Each model needs to be redesigned for edge computing power instead of being migrated directly from the cloud.

The second layer is the edge computing power architecture.

Different from the cloud, which has almost unlimited computing power, edge devices are limited by CPU, GPU, NPU, memory, storage bandwidth, and system resources. To enable multiple models to work together for a long time, Clipto has built an edge inference framework and a computing power scheduling system from scratch, dynamically scheduling different models according to device resources instead of letting them compete for computing resources.
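To make the scheduling idea concrete, here is a minimal sketch in Python. It is not Clipto's scheduler, whose internals the article does not describe; the `ModelJob` type, the `plan_batches` function, the memory figures, and the priority values are all hypothetical. It only illustrates the principle of admitting models in groups that fit a device's memory budget rather than launching everything at once and letting the models fight over resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelJob:
    """A hypothetical edge model waiting to run on the device."""
    name: str
    memory_mb: int   # assumed peak memory while the model is loaded
    priority: int    # lower number is scheduled earlier


def plan_batches(jobs, budget_mb):
    """Group jobs into sequential batches that each fit the memory budget.

    Models inside one batch may run together; the next batch starts only
    after the previous one has released its memory.
    """
    batches = []
    current, used = [], 0
    for job in sorted(jobs, key=lambda j: (j.priority, j.name)):
        if job.memory_mb > budget_mb:
            raise ValueError(f"{job.name} cannot fit in {budget_mb} MB")
        if used + job.memory_mb > budget_mb:
            # Budget exhausted: close this batch and open a new one.
            batches.append(current)
            current, used = [], 0
        current.append(job.name)
        used += job.memory_mb
    if current:
        batches.append(current)
    return batches


jobs = [
    ModelJob("ocr", 1500, 1),
    ModelJob("speech", 2500, 0),
    ModelJob("faces", 1200, 1),
    ModelJob("scenes", 3000, 2),
]
print(plan_batches(jobs, budget_mb=4000))  # a small-memory device
print(plan_batches(jobs, budget_mb=8000))  # a larger device
```

With these made-up numbers, the smaller budget yields three sequential batches and the larger one yields two, which is the sense in which one plan can adapt to devices of different configurations. A production scheduler would also have to weigh CPU, GPU, NPU, and storage bandwidth, which this sketch ignores.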

Kang Hongwen said that Clipto's architecture automatically adapts to devices of various configurations, including an M1 MacBook with only 8GB of memory. On the latest-generation M5 MacBook Pro, Clipto can complete the offline analysis of 2TB of local videos in about 24 hours. Relying entirely on the cloud, the same processing would cost about $400.
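As a back-of-the-envelope check on those figures, the short Python calculation below converts them into a sustained throughput and a per-gigabyte cloud cost. It assumes decimal units (1 TB = 10^12 bytes) and treats the "about" values quoted above as exact, so the results are rough orders of magnitude rather than measurements.

```python
# Figures quoted in the article, taken at face value.
DATA_BYTES = 2 * 10**12        # 2 TB of local video, decimal units assumed
DURATION_S = 24 * 60 * 60      # about 24 hours
CLOUD_COST_USD = 400           # quoted cloud cost for the same workload

throughput_mb_s = DATA_BYTES / DURATION_S / 10**6
cost_per_gb_usd = CLOUD_COST_USD / (DATA_BYTES / 10**9)

print(f"sustained throughput: {throughput_mb_s:.1f} MB/s")
print(f"cloud cost: ${cost_per_gb_usd:.2f} per GB")
```

Under these assumptions, the device would need to analyze roughly 23 MB of video per second around the clock, and the quoted cloud price works out to about $0.20 per gigabyte processed.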

Screenshot of the user's computer desktop when using Clipto to make a video

The third layer, and the most important one, is to build the memory itself.

The model can understand the content, but it doesn't naturally form memory. The system also needs to continuously organize the scattered multimodal information into structured relationships such as time, location, person, and event, and continuously establish associations across files, time, and sources, ultimately forming a personal memory network that can continuously grow.
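One way to picture this organizing step is a small index that files every understood item under its time, place, people, and event, so that items from different files become linked through the facets they share. The Python sketch below is purely illustrative: the `Moment` and `MemoryGraph` names, the facet set, and the sample data are assumptions, and the article does not say how Clipto actually stores its memory network.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Moment:
    """One understood item, e.g. a video segment or a document (hypothetical)."""
    source: str
    day: str
    place: str
    people: frozenset
    event: str


def facets(m):
    """List the (kind, value) keys under which a moment is filed."""
    keys = [("day", m.day), ("place", m.place), ("event", m.event)]
    return keys + [("person", p) for p in m.people]


class MemoryGraph:
    def __init__(self):
        self.moments = []
        self.index = defaultdict(set)   # (kind, value) -> set of moment ids

    def add(self, moment):
        mid = len(self.moments)
        self.moments.append(moment)
        for key in facets(moment):
            self.index[key].add(mid)
        return mid

    def recall(self, **wanted):
        """Sources of moments that match every requested facet."""
        ids = None
        for kind, value in wanted.items():
            hits = self.index.get((kind, value), set())
            ids = set(hits) if ids is None else ids & hits
        return sorted(self.moments[i].source for i in (ids or set()))

    def related(self, mid):
        """Sources of other moments sharing at least one facet with `mid`."""
        linked = set()
        for key in facets(self.moments[mid]):
            linked |= self.index[key]
        linked.discard(mid)
        return sorted(self.moments[i].source for i in linked)


g = MemoryGraph()
trip = g.add(Moment("trip.mp4", "2024-05-01", "Kyoto", frozenset({"Ana"}), "vacation"))
g.add(Moment("notes.pdf", "2024-05-01", "Office", frozenset({"Ben"}), "meeting"))
g.add(Moment("call.m4a", "2024-06-10", "Office", frozenset({"Ana", "Ben"}), "meeting"))

print(g.recall(person="Ana"))                 # ['call.m4a', 'trip.mp4']
print(g.recall(person="Ben", place="Office")) # ['call.m4a', 'notes.pdf']
print(g.related(trip))                        # ['call.m4a', 'notes.pdf']
```

The last call shows the cross-file association the paragraph describes: the vacation video is linked to a PDF through a shared date and to an audio recording through a shared person, even though the three files have nothing else in common. Each new item added to the index extends these links, which is a toy version of a memory network that keeps growing.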

What the Agent calls is no longer a single model but this continuously accumulated and evolving Memory Layer.

In Kang Hongwen's view, this is also the most difficult part of the Memory Layer.

It spans multiple technical levels, including model R&D, edge inference, computing power scheduling, multimodal understanding, data organization, spatio-temporal databases, knowledge graphs, and retrieval systems. No single module can form real Memory alone. Only by integrating these capabilities into a long-running and continuously growing system can AI truly have "memory."

"The model will be continuously upgraded, and the Agent will also evolve continuously, but the user's long-term accumulated memory will not be easily migrated. The real moat is the entire technical system built around Memory," he told 36Kr.

If today's large models solve the problem of how AI understands the world, then Clipto solves the problem of how AI remembers a person in the long term.

Clipto Is Not a Creation Tool but a Memory Infrastructure

After Clipto topped the daily list of Product Hunt, what really surprised Kang Hongwen was not the achievement itself but the user feedback in the comment section.

Usually, most users discuss whether a product is easy to use and whether its functions are rich enough. But after Clipto was launched, a different kind of comment appeared:

Many developers began to ask whether the API was open and whether it could serve as the long-term memory backend for Agents, and even discussed how to integrate Clipto into their own products. At that time, Clipto hadn't even released an SDK.

This signals that users are no longer just concerned about a search tool but are starting to regard it as a layer of infrastructure.

This change also exceeded the initial expectations of the Clipto team.

At first, Kang Hongwen thought that the first to pay would be content producers such as video creators and photographers. But as the number of users grew, the team found that the rapidly expanding group was not only creators but also knowledge workers such as financial analysts, lawyers, doctors, and consultants.

According to official data, about one-third of Clipto's users are currently creators, and the remaining two-thirds are professionals from industries such as finance, law, and healthcare.

This suggests that memory management is a larger and more essential need than content creation.

In the past, people assumed that multimodal data management was a problem that only needed to be solved in professional scenarios such as video editing and film production. In fact, every knowledge worker is constantly generating audio, pictures, meeting records, and documents. Meeting recordings, training videos, mobile phone screenshots, podcast collections, PDF files... This information grows every day, but it is rarely recalled effectively.

When AI can truly understand this data, "memory management" is no longer just a need for creators but a need for everyone.

Business data further verifies this judgment. Three months after Clipto was launched, it broke even. In 2025, the company's ARR (Annual Recurring Revenue) reached $15 million.

For an AI company that is still in the early stage of product development and adheres to the edge deployment route, such a commercialization speed is a strong signal: the market is willing to pay not just for one-time AI capabilities but for long-term accumulated personal memory. Memory is not a future market but a verified real-world demand.

More importantly, it also verifies the capabilities of the Clipto team. While many AI

This article was originally produced by「晓曦」. Unauthorized reprinting will be held accountable.
