They built a palm-sized, enhanced version of the "Lobster" and closed two consecutive rounds of financing worth tens of millions of yuan.
If you're interested in the recently popular "Lobster" (OpenClaw), you may be aware that it still has issues such as high installation barriers, poor usability, and significant security risks.
Violoop, by contrast, is enhanced in every respect.
Simply put, Violoop is a palm-sized hardware device with a touch screen that displays the tasks it's currently processing.
This is a plug-and-play product: it occupies none of the host computer's CPU/GPU resources and requires no software installation. Connect Violoop with an HDMI cable, and an ordinary computer instantly becomes an "AI computer", an upgraded "Lobster".
Violoop is in the bottom-left corner and can be placed on the desktop | Image source: Violoop
It ships with common Skills pre-installed, so there is no setup barrier; you can command it to work 24/7. Its features don't stop there, and they are introduced in detail below.
Violoop is not a project chasing the trend. When we first communicated with them in early November 2025, OpenClaw hadn't even been born yet. And Violoop started even earlier.
Two months later, OpenClaw became extremely popular, and Violoop "accidentally" caught the wave, quickly completing seed and angel rounds of financing worth tens of millions of yuan.
It is reported that this round of financing will mainly fund product mass production, global marketing, and continued construction of the Action Model dataset. Violoop will launch a global crowdfunding campaign on Kickstarter in April.
Undoubtedly, it has become one of the most sought-after AI hardware projects at present.
Violoop has two founders. CEO Jaylen He is a serial entrepreneur who graduated from the CS major at UC San Diego. His previous project, which provided long-term rental apartment management services for international students, was once selected for the YC Startup Program.
CTO King Zhu is an academic prodigy. He completed his undergraduate and master's degrees in EECS at MIT in 3.5 years, the fastest in his class, and later served as a core engineer on multiple Microsoft product lines, including Xbox, HoloLens, and Surface.
The emergence of Violoop indicates that the hardware and software forms of the AI OS are far from being finalized, and the competition has just begun.
One cable, no software to download: turn an ordinary computer into an upgraded "Lobster" instantly
Violoop connects to the computer physically via HDMI in order to capture the complete data chain of "video stream + operating system API + HID operations", losslessly and in full.
For device linkage, it supports Telegram and Feishu, and the team has also built a dedicated app. The app's advantage is a capability IM tools can't offer: viewing the connected computer's screen in real time and watching the AI's operation process.
Send commands via the mobile phone, and the tasks running on the computer can be displayed on Violoop's screen | Image source: Violoop
"For example, when asking the AI to write front-end code, since HTML files can't be previewed on the mobile phone, users can directly check if the result meets their requirements through the real-time video stream."
One of Violoop's highlights is that it is more proactive and really "knows what to do".
Jaylen He explained, "On the premise of security, with the ability to control the host and perceive the user's screen state, it proactively offers services to the user."
For instance, when the AI sees the user organizing invoices on the computer, even if the user doesn't know what the AI can do, it will proactively push a message: "We noticed you're organizing invoices. Would you like us to take over and automate the process?" Or, when it sees the user watching AI-related videos on Bilibili or YouTube, it will ask, "Are you interested in this field? We can compile relevant reports or collect other video links for you."
Violoop's ability to see the user's operations rests mainly on its self-developed vision model, which recognizes on-screen content and operates software on the computer the way a human would.
This design accounts for the fact that much software offers no API or command-line interface. "We always follow the principle of using the command line whenever possible. Only when the software lacks a command-line interface do we take over through vision."
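The CLI-first, vision-fallback principle can be sketched as a small dispatcher. This is an illustrative sketch, not Violoop's implementation; the names `ActionResult`, `execute`, and the `drive_gui` callback (standing in for the vision model) are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Optional
import shutil
import subprocess

@dataclass
class ActionResult:
    ok: bool
    detail: str

def run_cli(command: list[str]) -> Optional[ActionResult]:
    """Try to do the task through a command-line interface, if one exists."""
    if shutil.which(command[0]) is None:
        return None  # the target software exposes no CLI on this machine
    proc = subprocess.run(command, capture_output=True, text=True)
    return ActionResult(ok=proc.returncode == 0, detail=proc.stdout or proc.stderr)

def execute(task: str, cli_command: list[str],
            drive_gui: Callable[[str], ActionResult]) -> ActionResult:
    # "Use the command line whenever possible"; only take over via vision otherwise.
    result = run_cli(cli_command)
    if result is not None:
        return result
    return drive_gui(task)  # fall back to reading the screen and driving the GUI
```

The key design choice is that the vision path is strictly a fallback: it is slower and costlier than a CLI call, so it is only used when no programmatic interface exists.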
As a result, Violoop can handle even very old software systems that OpenClaw can't operate, greatly expanding its scope of action.
Violoop's hardware parameters | Image source: Violoop
In terms of security, Violoop has designed a dual-chip architecture in the device.
The main control chip is responsible for running the AI and the system. Another independent security chip is specifically responsible for permission review.
For example, if the AI wants to read a file, the security chip can automatically approve it. But if the AI wants to delete a file, send a message, or access sensitive data, it must be confirmed by the user.
The user can approve through the mobile App or the touch screen on the device.
This design essentially adds a layer of "guardrail" between the AI and the system.
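The review logic described above can be sketched as a simple policy function. This is a hypothetical illustration of the article's examples (reads auto-approved; deletion, messaging, and sensitive-data access gated on the user), not the actual firmware; the action names are made up:

```python
from enum import Enum, auto

class Verdict(Enum):
    AUTO_APPROVE = auto()
    NEEDS_USER_CONFIRMATION = auto()

# Hypothetical policy mirroring the article's examples.
READ_ONLY_ACTIONS = {"read_file", "list_dir", "take_screenshot"}

def security_chip_review(action: str) -> Verdict:
    """Runs on the independent security chip, outside the AI's control path."""
    if action in READ_ONLY_ACTIONS:
        return Verdict.AUTO_APPROVE
    # Default-deny: anything not known to be harmless (delete_file, send_message,
    # access_credentials, ...) waits for confirmation on the app or touch screen.
    return Verdict.NEEDS_USER_CONFIRMATION
```

Note the default-deny stance: an unrecognized action is treated as sensitive rather than waved through, which is what makes a separate chip a meaningful guardrail.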
Some time ago, when a person in charge of Meta's security department ran OpenClaw on a Mac mini, the AI accidentally deleted more than 2,000 emails.
This kind of problem may become more common in the Agent era.
Violoop aims to keep such risks within a controllable range through hardware-level design.
No prompts needed: it learns after watching once
Another very interesting design is Violoop's skill learning system.
It requires no prompts from the user: after watching the user's operations and workflow just once, it learns them and upgrades itself.
The method is very simple: the user only needs to swipe left on the device's touch screen to enter the screen recording state. The AI will record the entire operation process and extract a complete chain of behavioral evidence:
- What the user input
- What response the operating system made
- What changes occurred in the GUI interface
This data will be packaged and sent to the cloud for analysis.
Then the system will break down the task into a series of steps and find a better execution path through reinforcement learning.
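The evidence chain described above can be pictured as a list of timestamped records bundled for upload. A minimal sketch, assuming a hypothetical `TraceEvent` record and JSON packaging (the article does not specify the wire format):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class TraceEvent:
    t: float          # seconds since the recording started
    user_input: str   # what the user typed or clicked
    os_response: str  # what the operating system did in response
    gui_change: str   # what changed in the GUI

def package_recording(events: list[TraceEvent]) -> str:
    """Bundle the behavioral evidence chain for cloud-side analysis."""
    return json.dumps({"version": 1, "events": [asdict(e) for e in events]})
```

On the cloud side, a trace like this is what gets decomposed into discrete steps before the reinforcement-learning search for a better path.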
It's worth noting that the AI won't completely copy the user's operations. In many cases, human operations are not the most efficient. The system will try to find the execution method with the lowest cost and the highest success rate.
For example, "When the AI knows that the starting point is 'finding a file' and the ending point is 'sending it to a WeChat friend', it will learn how to complete the task with the lowest cost, the fastest speed, and the highest success rate."
Jaylen He said, "We will design a reward function to encourage the AI to move the mouse as little as possible and make fewer judgments through screenshots, thus optimizing the execution efficiency."
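A reward function in that spirit might look like the following sketch. The weights and the exact shape are hypothetical; only the idea (reward success, penalize mouse movement and screenshot-based judgments) comes from the article:

```python
def reward(mouse_moves: int, screenshot_checks: int, succeeded: bool,
           w_mouse: float = 0.1, w_shot: float = 0.5) -> float:
    """Success earns a bonus; every mouse movement and every screenshot-based
    judgment is penalized, pushing the policy toward cheaper execution paths."""
    base = 10.0 if succeeded else -10.0
    return base - w_mouse * mouse_moves - w_shot * screenshot_checks
```

Because screenshot checks are weighted more heavily than mouse moves here, a policy that replaces visual verification with a cheaper signal (say, a file-system check) scores strictly better, which is exactly the kind of shortcut the optimization is meant to discover.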
Finally, a reusable skill will be generated.
Skills may be an "intermediate state", and personalized edge models are the future
In the Violoop team's view, the currently popular Skills are mostly structured text. On one hand, they are building a community that makes it easy for users to share Skills; on the other, they are exploring a more long-term direction.
When the user records enough data, and the AI has enough understanding of the user and accumulates enough personal memories, the team plans to train this personal data into a dedicated edge model through post-training.
"Currently, AI memory is mainly extracted through external databases (such as RAG). In the future, the edge model can directly internalize these memories and skills. In this way, the model will have a qualitative improvement in understanding user information, memory retrieval speed, and the generalization ability of executing skills."
In every second of coexistence, it silently captures the user's intentions and decision-making preferences, accumulating them into a "personal memory" | Image source: Violoop
That is to say, after the user accumulates enough data, the AI will not just call the workflow but directly "internalize" these abilities into the model.
In this case, everyone's AI will gradually become different.
It will remember your habits, understand your work style, and gradually evolve into a model exclusive to you.
Jaylen He said that they envision that the future will definitely be a combination of "edge model + cloud model", and the edge model will become more and more highly customized.
"Just like mobile app updates today: when Meituan updates, everyone updates together. In the future, software will be highly customized, and each person's update frequency and content may differ. When personal data accumulates to a certain amount, an independent model update will happen automatically."
For both the user and Violoop itself, this will build a more long-term moat than simply building a workflow and a Skills sharing community.
OpenClaw is the Linux of the AI era, and Violoop wants to be the Mac
When comparing OpenClaw and Violoop, Jaylen He gave an analogy: "OpenClaw is the Linux of this era, an open-source underlying operating system."
After Linux, no pure-software OS managed to compete with it; instead, the commercial giants that emerged, Windows and macOS, were distributed with hardware.
Therefore, Violoop wants to be the Mac of the AI era, creating a product that combines hardware and software.
"We believe that the next-generation OS will definitely be a combination of 'cloud model + edge model'."
Jaylen He said, "The computing power cost of cloud large models is extremely high. Our team's daily usage cost on our own software may exceed $1,500. Through edge capabilities, we can perform multi-modal processing locally, which not only saves costs but also takes advantage of the instant response and perception of the edge."
In this form, the cloud large model is like the CPU, memory is like DRAM, and the edge model is like the GPU, taking on its specific share of the computation.
Multi-modal processing happens mainly on-device, because sending audio and video to the cloud adds transmission latency, and cloud-side multi-modal processing costs more.
The edge model understands the content and then sends the information (such as in JSON format) to the cloud, and the cloud then performs subsequent reasoning and task planning.
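The edge-to-cloud handoff can be sketched as follows. The field names are invented for illustration; the article only says the edge model condenses its understanding into structured text such as JSON before the cloud does reasoning and planning:

```python
import json

def edge_perception_to_cloud(transcript: str, screen_summary: str, intent: str) -> str:
    """The edge model condenses raw audio/video into compact structured text,
    so only lightweight JSON (never the media itself) crosses the network."""
    message = {
        "modality_summary": {
            "speech": transcript,   # e.g. transcribed voice command
            "screen": screen_summary,  # e.g. description of the current GUI state
        },
        "inferred_intent": intent,
    }
    return json.dumps(message, ensure_ascii=False)
```

The payload is a few hundred bytes instead of a video stream, which is where both the latency and the cost savings come from.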
In addition to computers, Violoop can also easily connect to the IoT system of smart homes, such as controlling lights, music, air conditioners, etc.
"Currently, smart homes have relatively unified protocols, such as HomeKit. We install the protocol on the hardware and scan the devices through the local area network to directly update and control them."
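As a point of reference, HomeKit accessories advertise themselves on the LAN over mDNS under the `_hap._tcp` service type, so discovery amounts to browsing for that service and filtering. A minimal sketch of the filtering step (the discovery records here are hardcoded stand-ins for what an mDNS browser would return):

```python
HAP_SERVICE = "_hap._tcp.local."

def filter_homekit_devices(mdns_records: list[str]) -> list[str]:
    """Given service names discovered on the LAN via an mDNS browser,
    keep only HomeKit Accessory Protocol (_hap._tcp) advertisements."""
    return [name for name in mdns_records if name.endswith(HAP_SERVICE)]
```

A real implementation would then pair with each accessory over the HomeKit Accessory Protocol before issuing control commands; this sketch covers only the scan-and-filter stage the quote describes.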
Additionally, they run an Android virtual machine to provide phone-control-like capabilities.
This doesn't control the user's physical phone directly; instead, it creates a simulated environment the Agent interacts with at the system level, a bit like the "Doubao Phone" but without a physical device.
The team explained that to use apps like Meituan or Ctrip, users download and log into them again inside the virtual machine. Since these apps support multi-device login, once signed in the Agent can act as an assistant, booking tickets or ordering takeout on your behalf.
By now, the prototype of an AI-era OS has become clear: any device such as a computer, phone, or smart home has become a physical peripheral of this OS.
In other words, the computer is just an entrance. In the future, this kind of product even has the opportunity to become the AI control center of the entire family.
"Violoop is self-iterating"
Violoop's origin stems from a real pain point - a "lazy" need.
From 2023 to 2024, the Violoop team mainly focused on deploying and fine-tuning edge models for Fortune 500 companies and other enterprises.
Jaylen He recalled that by mid-2024, customers' needs for knowledge base retrieval and business decision-making had stabilized, but they were scattered and could arrive at any time. "We didn't want to sit in front of the computer all day. We tried remote-desktop tools like TeamViewer and Sunflower, but the interaction experience was very poor."
So, they started research and finally found the current direction in mid-2025.
"Recently, what amazed us the most is that our AI has almost achieved self-writing."
The Violoop team found that they only need to define the R&D scope and assign the components each Agent is responsible for, and the AI can iterate on itself.
Currently, they maintain an amazing rhythm of "refactoring once every three days".
"The AI that writes code first completes multiple rounds of self-writing and sorts out hundreds of test cases. Then the Agent in charge of testing completes them one by one and submits issues. Another type of Agent is responsible for monitoring and fixing issues. Finally, humans conduct actual verification." Jaylen He said.
After each refactoring, the number of lines in the code library can be reduced by about 20% - 30%, achieving the same functions with less code and more precise arrangement.
Behind this "self-evolution" is real investment: the team currently pays for roughly twenty $200-per-month coding subscriptions, and the testing process runs entirely through API calls, averaging about $1,500 per day.
"The core value of our team is that we will never be stingy with the funds invested in computing power." Jaylen He believes that "computing power is the greatest leverage given to individuals and teams in this era."
Regarding Violoop's future, the team said, "Our vision and goal for it are that when the product reaches a mature stage, we hope it can complete any work that humans can do on the computer by itself."
The emergence of Violoop proves that the forms of personal Agents and AI OS are far from being finalized.
This geek spirit of refusing to compromise with the existing ecosystem deserves admiration, and the direction it represents - hardware-software integration, proactive AI, edge-cloud collaboration, and deep customization - may well be the right way to build an operating system for the AI era.
However, once it enters the "AI OS" arena, with technology giants from the Internet, AI, phone, and PC industries following in, it goes without saying how competitive this battlefield will be.
As large models grow ever more capable, whether this initially small tool can evolve into new infrastructure, a computing platform for the AI era, is a question worth watching.