Early-stage Project | A product veteran from ByteDance, OPPO, and OnePlus puts hardware-software integration at the core of enabling AI to understand the world
Author | Ou Xue
Editor | Yuan Silai
In the past two years, mainstream AI interaction has relied on the input box: users first put their question into words, and then the AI provides an answer.
This "conversational" interaction is extremely efficient, but it runs counter to humans' most natural cognitive path. When we perceive the world, we never start by asking questions; we start by seeing.
A young company named Chance AI, founded in 2025, is trying to change this. It proposes another form of AI product, the Visual Agent, which aims to transform AI from a tool that answers questions into a system that can understand the world.
Zeng Xi, the founder of Chance AI, has a background spanning academia and industry. He earned his doctorate from the University of Barcelona, where his research focused on cognitive science and contemporary art, with a particular interest in how humans understand the world through vision. After graduating, he entered the consumer electronics industry, leading product and design at OnePlus and OPPO. He then joined ByteDance's Flow department, where he helped build AI products such as Doubao from scratch.
This work experience led him to a structural observation: large language models are good at generating language and answering questions, but AI still offers little support for how humans form judgments through vision in the real world.
He left ByteDance in January 2025, registered the company in March, began operations in July, and launched the company's first product, Chance AI, in September.
Chance AI uses the camera as its core interaction entry point. Users open the app, point it at whatever is in front of them, and the AI performs visual reasoning in real time.
In practice, it is widely used to interpret artworks during gallery visits, analyze outfits while shopping, identify versions of trading cards and designer collectibles, and check skin conditions, as well as for everyday exploration such as photographing menus, identifying plants, and observing pets. Zeng Xi told Yingke that most usage scenarios are discovered spontaneously by users rather than pre-set by the team.
Chance AI being used to interpret artworks during a gallery visit (Image source: the company)
Technically, its Visual Agent scored 86.07 on MMMU-Pro, an authoritative benchmark for measuring the visual reasoning ability of multimodal models, ranking first in the world.
In March this year, Chance AI became the official AI partner of Art Central, marking the first time AI has entered the viewing process of a large-scale international art exhibition. On site, when visitors point the camera at an artwork, the AI joins the viewing in real time, observing and conversing as they go.
AI enters the "viewing process" of a large-scale international art exhibition for the first time (Image source: the company)
What powers this experience is its newly launched "Live Mode," a real-time visual interaction system. Unlike existing real-time recognition products, Live Mode integrates multiple visual intelligence capabilities, including knowledge retrieval, content comparison, context understanding, and multi-capability scheduling, into a complete agent that responds in real time within live visual scenes.
Usage scenarios of the Live Mode launched by Chance AI (Image source: the company)
As of now, the total global downloads of Chance AI have exceeded 200,000.
Zeng Xi revealed that the company has done almost no marketing so far; all growth has come from organic spread. Its core users are young people under 25.
On future plans, Zeng Xi said the most important goal for 2026 is to scale up among students in North America. This is not marketing in the traditional sense, however, but in-depth involvement in user communities to discover how young people actually use the product.
Unlike pure AI application-layer entrepreneurs, this serial founder with a background at major hardware companies has put hardware-software integration on the product roadmap from the very beginning. Zeng Xi believes the future hardware form suited to their product is a camera that can capture all of a person's visual information.
We interviewed Zeng Xi to discuss his views on industry development and technological routes.
The following is an excerpt from the interview:
Yingke: Many current AI products have visual capabilities. What is the differentiating advantage of Chance AI?
Zeng Xi: I think it will be difficult for any single AI company to dominate the market in the future; the market will be highly segmented. We chose to focus on vision because it is not yet in the spotlight but will become mainstream.
Our moat is not model strength but how quickly we can iterate with real users. Currently only about 20% to 30% of our features were designed by us; the rest were suggested by users: photographing skin, reading menus, identifying cards, writing commentary... To do this, you must be close enough to your users. We once met a request from a trendy-culture club at New York University within six hours, enabling them to identify specific cards. That is something Google or OpenAI cannot do.
Yingke: Currently, the app has no paid content. What is the future business model?
Zeng Xi: We currently have three directions. First, subscriptions for advanced features, which is this year's plan. We have strong engineering capabilities and low costs, so there is no urgent need to charge. Second, hardware licensing. We are in talks with several hardware manufacturers; they are investing heavily in hardware and have little time to refine the product layer above the model, which is our forte. Third, advertising and recommendation, though we will be very cautious there. For us, the top priority is to build the user habit of photographing whatever they see. If you become the entry-point product, business opportunities follow naturally.
Yingke: Will you develop your own hardware? When approximately?
Zeng Xi: It depends on how the industry develops. Once we judge the supply chain to be mature, we will dive in without hesitation. But more importantly, we won't build hardware for its own sake.
Our core lies in visual reasoning, and Live Mode is just one manifestation of that capability. We believe the portable AI hardware of the future must be a camera that captures everything your eyes see and then offers valuable next steps. This is the fundamental difference between us and all existing products: our starting point is not an input box, it is seeing.