
Wu Di of Yingmo Technology: 3D Generation Is the Last Piece of the "Spatial Intelligence" Puzzle | 36Kr Exclusive Interview

Geng Chenfei · 2025-01-14 16:30
Has the "ChatGPT moment" for 3D generation arrived?

Author | Geng Chenfei

Editor | Song Wanxin

Cover Source | Provided by the enterprise

Not long ago, 3D large-model company Yingmo Technology completed a Series A financing round worth tens of millions of US dollars. The round was led by Meituan Dragonball and ByteDance, with existing shareholders Sequoia China Seed Fund and MiraclePlus following on, and Lighthouse Capital serving as the exclusive financial advisor.

According to Wu Di, founder of Yingmo Technology, the funds will mainly go toward the company's frontier research on 3D large models and toward accelerating the global commercialization of its Hyper3D product series, which is built around the 3D generation large model Rodin.

In 2024, the attention of investors and the market shifted quickly from frontier technical progress in AI to commercial returns. Reportedly, within just 45 days of going online, Rodin, the large model launched by Yingmo Technology, passed 1 million US dollars in annual recurring revenue (ARR), making it a rare commercial success among current AI startups.

Figure: Rodin interface; Picture source: Provided by the enterprise

Rodin's rapid growth reflects the huge potential of the 3D generation market. With the rapid development of emerging application scenarios such as the metaverse, virtual reality, and embodied intelligence, demand for 3D content is surging.

Taken together, industry data put the total addressable market (TAM) for 3D modeling outsourcing across games and entertainment, film and animation, architecture and real estate (AEC), manufacturing and product design, and e-commerce and virtual display at 14.9 to 33.5 billion US dollars.

However, the traditional approach of creating 3D models by hand through geometric modeling is time-consuming and labor-intensive, and it has a high technical barrier to entry, which greatly limits the efficiency and scale of 3D content production. Against this backdrop, generative AI tools are seen as the key to making 3D production more efficient and lowering the barrier to 3D content creation.

The problem is that although the AIGC wave has swept the world, most of the mainstream AIGC tools we know still focus on 2D content such as images and videos, and the "ChatGPT moment" for 3D generation has not truly arrived.

The reason is that, given current technical limits, 3D large models have yet to break through the quality bottleneck in generation.

Compared with images and videos, production-grade 3D content must meet more complex and stricter standards. Beyond generation speed and quality, it must also account for the structure and topological quality of the 3D mesh, the UV map layout, texture clarity, and more.

Wu Di said frankly that although the quality of 3D generation has reached a new level, current results still cannot meet the needs of practical applications in areas such as material usability, topology, and UV unwrapping. These technical shortcomings have become the key problems that the frontier of the 3D generation industry must overcome.

Against this backdrop, 3D generation technology is iterating faster worldwide. Last year, the controllable 3D-native DiT generation framework CLAY and the 3D clothing generation framework DressCode, both proposed by the Yingmo team in cooperation with ShanghaiTech University, significantly improved the quality of 3D generation and are regarded as foundational frameworks for the next generation of 3D generation.

At the same time, a wave of AI 3D generation products has launched in quick succession at home and abroad.

Overseas, Meta launched the text-to-3D model Meta 3D Gen, which can generate 3D assets in 1 second; Google released the foundation world model Genie 2, which can generate a playable 3D environment from a single image that humans or AI agents can control; NVIDIA released Edify 3D, which supports direct generation of 4K-level 3D entities and scenes from text prompts or images.

In China, in addition to Yingmo Technology's AI 3D model generation product Rodin, Tencent has released the open-source 3D generation model Hunyuan3D-1.0, which supports converting both text and images into 3D assets and can complete end-to-end generation in as little as 10 seconds.

Global players in 3D large models are quietly vying with one another. From the a16z-backed Yellow, Kaedim, and BackFlip to Fei-Fei Li's World Labs, AI 3D generation technology is iterating ever faster toward a tipping point.

From a market perspective, however, the users that 3D generation currently reaches are still concentrated in business-facing (B-end) fields such as games, video production, e-commerce, and industrial design, and penetration in the consumer (C-end) market is relatively low.

Wu Di analyzed this phenomenon in a conversation with 36Kr, stating that unlike videos, images, and music, 3D assets cannot be easily shared and disseminated through social media at this stage.

In today's device environment in particular, which is dominated by two-dimensional devices, ordinary users' demand for 3D assets is not yet mature, and consumption scenarios are relatively limited. This has significantly held back the adoption of 3D generation technology in the consumer (C-end) market.

"But as consumer-grade products such as 3D printing, AR, and VR continue to develop and spread, 3D generation is expected to see explosive growth in the consumer (C-end) market." In Wu Di's view, as the technology matures, 3D generation will be applied in more and more fields, and could even become part of ordinary users' daily creation and sharing, like text, images, and videos.

Figure: Yingmo team demonstrating Rodin 3D generation at the SIGGRAPH Real-time Live! session; Picture source: Provided by the enterprise

It is precisely this insight into commercialization that leads Yingmo to target real-world deployment from the research and development stage onward, as reflected in the company's focus on the "Production-Ready" standard.

This standard means that a generated 3D model can be fed directly into the post-production pipeline and enter the actual production process, turning user interest into a genuine productivity tool and creating real commercial value.

"In the more distant future, when the metaverse and robots become a part of life, 3D generation will inevitably see a real explosion," Wu Di said.

Recently, 36Kr spoke with Wu Di, the founder of Yingmo Technology. The following is an edited transcript of the interview:

36Kr: As a company incubated by a university, Yingmo has built much of its commercial progress on academic research and development. Can you briefly walk us through that?

Wu Di: Yingmo was incubated at ShanghaiTech University and has been doing research and development in 3D modeling since 2016. In 2024, two of our papers on 3D generation large models received Best Paper honorable mentions at SIGGRAPH, and we were selected for SIGGRAPH Real-time Live! twice. It was the first time in 50 years that a team from the Chinese mainland had been selected for this program.

36Kr: How did Rodin reach an ARR of 1 million US dollars just 45 days after launch?

Wu Di: It comes down mainly to precise market positioning and product strength. Throughout research and product development, we have always treated "Production-Ready" as the core metric. What we want to build is technology that can be used directly. We also compared all the technical routes during development. Instead of the "2D-to-3D lifting" path that was getting more attention at the time, we chose the "3D native" route, which was out of favor then. In that route, the model's training, supervision, and generation are all carried out in three dimensions. This meant our product was released half a year later than our peers', but it also gave our product's generation quality a generational lead in the industry at that time.

36Kr: On the technical path, Yingmo did not follow the majority in adopting 2D-to-3D lifting, but chose 3D native instead. What was the thinking behind this?

Wu Di: The 2D-to-3D path was the most widely recognized and most used in the industry at that time, because recovering 3D information from multi-view 2D images is the most intuitive approach for everyone. People also generally felt that 3D assets were in short supply and needed to be supplemented with 2D assets. But when we first spoke with target customers, we found that being able to generate a 3D model was not enough for them. More importantly, the model had to be usable enough.

Drawing on our years of research experience in graphics, we realized at the time that once 3D data is compressed to 2D, no number of viewpoints can fully express every detail of an object's 3D structure, which makes it hard for the 2D-to-3D path to meet customers' usage standards. By contrast, the 3D-native approach retains more information and has a higher ceiling on generation quality.

If we had chosen the 2D-to-3D lifting path, we might have been able to launch a product quickly, but in the end we did not. Rodin Gen-1 was released about half a year later than competing products.

36Kr: Many companies worry about releasing their products later than their peers. Did you have no concerns when making this choice?

Wu Di: It was indeed a decision to start later but arrive first. We hardly hesitated at the time, because we firmly believed that only the 3D-native path could come close to the standards needed for commercial use. Internally we call this "Production-Ready", and it is the core standard for Yingmo's research and product development. Achieving "Production-Ready" takes more than building a model; it places further requirements on 3D representation, topology, UV unwrapping, materials, and so on. Although Rodin Gen-1 was released later than other products, it was the first 3D large-model product to clear the usability bar. Of course, even though the quality ceiling of our current generation of models has risen, there is still a gap before it can be truly integrated into production workflows.

36Kr: Where specifically does this gap show up?

Wu Di: In some scenarios, such as using a model in a game, the requirements are very strict: the topology and UV unwrapping we just mentioned, as well as certain absolute levels of detail. Even though we have done very well, a gap remains. Customers still need to modify or even rebuild models during use.

36Kr: What is the company's revenue breakdown?

Wu Di: 70% of our revenue comes from overseas. We do business in the United States, Europe, Japan, and South Korea, with Europe and the United States taking the larger share at more than 50%. For example, a user in Germany built a very attractive product on our API, and that product alone has an ARR of more than 500,000 US dollars.

36Kr: Who are Yingmo's core customer groups?

Wu Di: Currently, they are still concentrated in pan-entertainment and new consumption scenarios such as games, video production, and e-commerce. But 3D generation keeps expanding the boundaries of where it is used. 3D printing, embodied intelligence, and industrial design are the core user groups we are targeting next.

36Kr: Can you give a specific case to explain?

Wu Di: Take our cooperation with Bambu Lab as an example. Bambu Lab develops a product based on our technology and then opens that product to its customers. In Bambu Lab's official printmo project, users only need to upload an image. The AI converts it into a Pokémon style, our technology completes the 3D generation, and finally the user's 3D printer prints it as a physical object. This is also our first attempt in the 3D printing field.

36Kr: How do these customer needs affect Yingmo's technological iteration?

Wu Di: Almost all of Yingmo's technological iterations are driven by customer needs. For example, fields such as games and video production require models with regular, sensible topology and high UV utilization, and our next round of research and development will move toward that goal. But when we cross into fields such as industrial design, the needs are completely different. So in the longest-term view, we hope to find a 3D representation that can unify the world, adapt to different scenarios, and convert and adapt 3D models well under various requirements.

At the end of 2024, we launched the Rodin Gen-1.5 version of the model. Through a new generation of 3D-native representation, this upgrade comprehensively addresses the industry's long-standing problems with thin surfaces and edge sharpness, giving generated models sharper and straighter edges. This matters in games and especially in product design. The upgrade also further widens our product's lead.

36Kr: How does Yingmo view the future development of 3D generation?

Wu Di: There is still a lot of room left to explore in 3D generation. To enable personal creation in AR, VR, and virtual worlds, users' ability to create 3D content has to be unlocked. Because the world exists in three-dimensional space, machines' future understanding of it will necessarily be based on three dimensions as well. 3D generation will deliver the most important part of spatial intelligence.
