
Fei-Fei Li just released her latest work: now even a mobile phone can run a 3D world with hundreds of millions of splats.

ifanr (爱范儿), 2026-04-15 12:04
One link, delivering the world.

Generating an interactive 3D world from a series of photos is no longer a novel topic. The harder question is how to fit a large 3D world into an ordinary person's mobile browser.

Now World Labs, the AI world-model company founded by Fei-Fei Li, has released and open-sourced its latest work: Spark 2.0.

This dynamic 3D Gaussian Splatting (3DGS) rendering engine, built specifically for the web, makes it possible to smoothly run large 3D scenes with hundreds of millions of splats in the browser of almost any device.

Why is it so difficult to fit a 3D world with hundreds of millions of particles into a mobile phone?

You may have heard of 3D Gaussian Splatting, abbreviated 3DGS. In a nutshell, it is a technique that turns real-world scenes into interactive 3D content: no traditional modeling required, just a series of photos to generate a 3D scene.

Unlike traditional 3D modeling, which uses triangle meshes, 3DGS uses millions of semi-transparent colored ellipsoids, each called a "splat".

The left side uses texture-mapped triangle meshes; the right side renders the same object with Gaussian splats.

Each splat is not just a simple point but an ellipsoid with a complete "personality": it records its position in space, the lengths of its three axes, its orientation, its RGB color, and its opacity.

The most crucial property is opacity. It determines a splat's blending weight where splats overlap. Plot the spatial density of a single splat and you get a Gaussian curve: most solid at the center, blurring gradually outward, with edges that blend naturally into the background.
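The attributes above can be pictured as a small record plus a falloff function. A minimal sketch (the field names and layout are illustrative, not Spark's actual data format, and rotation is ignored in the density for brevity):

```typescript
// Illustrative splat record (not Spark's real layout).
type Splat = {
  position: [number, number, number]; // center in world space
  scale: [number, number, number];    // lengths of the three ellipsoid axes
  rotation: [number, number, number, number]; // orientation quaternion (x, y, z, w)
  color: [number, number, number];    // RGB in [0, 1]
  opacity: number;                    // peak opacity at the center, in [0, 1]
};

// Gaussian density of one splat at a point: solid at the center,
// fading smoothly toward the edges (rotation omitted for brevity).
function density(s: Splat, p: [number, number, number]): number {
  let q = 0;
  for (let i = 0; i < 3; i++) {
    const d = (p[i] - s.position[i]) / s.scale[i]; // normalized axis distance
    q += d * d;
  }
  return s.opacity * Math.exp(-0.5 * q);
}
```

At the center the density equals the splat's opacity; one axis length away it has already dropped to about 61% (exp(-0.5)), which is what produces the soft boundary described below.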

It is this soft-boundary overlap that lets millions of splats stack into the grain of a brick wall, the translucency of leaves, and the reflections of glass, rather than the plastic look of a pile of hard-edged triangles.

The results look good, but the data is heavy: a high-quality 3DGS scan often contains tens of millions of splats, and the file can easily exceed 1 GB.

This creates a tricky problem: an ordinary phone tops out at roughly 1 to 5 million splats for smooth rendering, an order of magnitude below a high-quality scan.

Existing renderers also fail to render multiple scanned objects correctly in one scene: either they handle only one object at a time, or the sorting breaks down and objects appear "stuck" to each other's surfaces, looking messy.

Thus, Spark came into being. According to the official blog, Spark began as an internal tool at World Labs. The team needed to display its 3DGS-generated worlds on the web, but every renderer on the market had flaws: some could render only a single object, some relied on WebGPU (which many devices do not support), and some did not support dynamic animation.

After several comparisons, they decided to create their own renderer.

They chose THREE.js, the most popular 3D framework on the web, which runs on top of WebGL2 and covers almost all modern devices. The core rendering logic has three steps: first, generate a global splat list spanning all objects on the GPU; then sort it from far to near; finally, render everything in a single pass.
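The three steps can be sketched on the CPU (Spark does this on the GPU, but the logic is the same; all type and function names here are illustrative):

```typescript
// Gather splats from every object into one global list, then sort far to near
// from the camera so back-to-front alpha blending composites correctly.
type Vec3 = [number, number, number];
type SceneObject = { splats: { position: Vec3 }[] };

function globalSortFarToNear(objects: SceneObject[], camera: Vec3) {
  // Step 1: one flat list across all objects.
  const all = objects.flatMap((o) => o.splats);
  // Step 2: sort by squared distance to the camera, farthest first.
  const dist2 = (p: Vec3) =>
    (p[0] - camera[0]) ** 2 + (p[1] - camera[1]) ** 2 + (p[2] - camera[2]) ** 2;
  all.sort((a, b) => dist2(b.position) - dist2(a.position));
  // Step 3: the caller submits `all` to the GPU in a single draw call.
  return all;
}
```

Because the sort is global rather than per-object, splats from two interpenetrating objects interleave correctly instead of one object being stamped on top of the other.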

"Global sorting" may sound mundane, but it is the key to letting multiple 3DGS objects coexist in one scene without rendering artifacts where they intersect. On top of this, Spark exposes a GPU processing pipeline: users can recolor, adjust opacity, or animate individual splats, either by writing GLSL or by wiring node graphs, Blender-style.

The 1.0 release solved multi-object rendering, but a scene with 40 million splats was still an insurmountable hurdle. Hence Spark 2.0.

Always render just enough for the device

The core of Spark 2.0 combines three techniques: level of detail (LoD), progressive streaming, and virtual memory management. None of these is new on its own; it is their combination that makes a world of hundreds of millions of splats render smoothly in a mobile browser.

1. Continuous LoD Tree: Use resources where they matter most

LoD (Level of Detail) is already a well - established concept in the gaming industry. For trees close by, thousands of triangles are used, while for distant trees, only dozens are needed, allocating computing power according to demand. The Nanite system in Unreal Engine follows the same principle, linking triangle details to the viewing distance and automatically scaling.

Spark 2.0 applies the same logic to splats more thoroughly.

Switching discretely between a few fixed versions easily causes visible "pops" in the image. Spark instead builds a complete continuous LoD tree: each internal node is an approximation fused from its children's splats, converging level by level up to the root, a single splat representing the coarsest-grained version of the entire scene.

During rendering, the system dynamically takes a cut through this tree based on the current view: regions near the camera draw from the bottom-level detail, distant regions from the coarse upper levels.

The whole process is constrained by a fixed splat budget, roughly 500,000 on mobile and 2.5 million on desktop. No matter how many splats the scene contains, the number actually sent to the GPU stays within budget, keeping the frame rate stable.
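A budgeted cut can be sketched as greedy refinement: start from the root and keep splitting whichever node in the cut is closest to the camera, until refining would exceed the budget. This is a toy version; Spark's actual error metric and tree layout are not described in the article:

```typescript
// Greedy LoD cut: refine the nearest nodes first until the splat budget is hit.
// `distance` stands in for a real screen-space error metric.
type LodNode = {
  distance: number;     // distance from the camera
  children?: LodNode[]; // absent on leaf nodes
};

function cutTree(root: LodNode, budget: number): LodNode[] {
  const cut: LodNode[] = [root];
  while (true) {
    // Pick the refinable (non-leaf) node nearest to the camera.
    let best = -1;
    for (let i = 0; i < cut.length; i++) {
      if (!cut[i].children) continue;
      if (best < 0 || cut[i].distance < cut[best].distance) best = i;
    }
    if (best < 0) break; // everything in the cut is a leaf
    const node = cut[best];
    // Refining replaces one node by its children; stop if that busts the budget.
    if (cut.length - 1 + node.children!.length > budget) break;
    cut.splice(best, 1, ...node.children!);
  }
  return cut;
}
```

Nearby subtrees end up represented by fine leaves, distant ones by coarse internal nodes, and the cut never exceeds the budget.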

In addition, Spark introduces foveated rendering, allocating more of the budget to the direction you are looking and thinning out detail in the periphery and behind you. The effect is most noticeable on VR devices. Foveated rendering usually requires eye tracking; Spark approximates it with a fixed cone around the view direction, and it works.
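The fixed-cone approximation can be sketched as a weight that scales a region's share of the budget by its angle from the view direction. The cone angle and floor value below are arbitrary illustrations, not Spark's actual parameters:

```typescript
// Foveation weight: full detail inside a cone around the view direction,
// falling off linearly to a floor directly behind the viewer. Angles in radians.
function foveationWeight(
  viewDir: [number, number, number],
  toRegion: [number, number, number],
  coneHalfAngle = Math.PI / 6, // assumed 30° half-angle
  floor = 0.1                  // assumed minimum share behind the viewer
): number {
  const dot = viewDir[0] * toRegion[0] + viewDir[1] * toRegion[1] + viewDir[2] * toRegion[2];
  const len = (v: number[]) => Math.hypot(v[0], v[1], v[2]);
  const cos = Math.min(1, Math.max(-1, dot / (len(viewDir) * len(toRegion))));
  const angle = Math.acos(cos);
  if (angle <= coneHalfAngle) return 1; // inside the cone: full budget share
  // Linear falloff from the cone edge to directly behind (angle = π).
  const t = (angle - coneHalfAngle) / (Math.PI - coneHalfAngle);
  return 1 - (1 - floor) * t;
}
```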

2. New .RAD Format: "Stream" loading like swiping short videos

Rendering efficiency is solved, but transmission is equally tricky. Two 3DGS file formats already exist: .PLY and .SPZ. The former is uncompressed: a 10-million-splat file can reach 2.3 GB. It can display while downloading, but the size is unwieldy.

The latter uses columnar storage with Gzip compression, shrinking the same data to 200-250 MB. But the whole file must be downloaded before anything can be shown: each splat's attributes are scattered across the file, and no partial download is enough to reconstruct the content.

To get the best of both worlds, Spark 2.0 designed a new format, .RAD (RADiance fields). It cuts splat data into independent blocks of 64K splats each, compresses each block separately, and records every block's byte offset in the file header, enabling random access to any block.
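A byte-offset index makes random access trivial. Here is a sketch of reading one block from such a file; this header layout is hypothetical, invented for illustration rather than taken from the .RAD specification:

```typescript
// Hypothetical block index: [blockCount: u32][offset_0: u32]...[offset_n: u32],
// little-endian, where offset_n marks the end of the last block.
function readBlock(file: ArrayBuffer, blockIndex: number): Uint8Array {
  const view = new DataView(file);
  const blockCount = view.getUint32(0, true);
  if (blockIndex < 0 || blockIndex >= blockCount) throw new RangeError("no such block");
  const start = view.getUint32(4 + 4 * blockIndex, true);
  const end = view.getUint32(4 + 4 * (blockIndex + 1), true);
  // Each block is independently compressed, so this slice can be decoded alone.
  return new Uint8Array(file, start, end - start);
}
```

With such an index, a viewer can fetch only the blocks the current viewpoint needs (for example via HTTP Range requests) instead of the whole file.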

The first block always holds the 64K splats of the coarsest version of the whole scene; once it arrives, the scene's outline is immediately visible. The system then decides, from the current viewpoint, which areas need refining and pulls those blocks first, so the picture sharpens from blurry to detailed. Three Web Workers fetch and decode data in parallel in the background, so detail follows you wherever you go.

3. GPU Virtual Memory: Fit an infinite space into limited video memory

Streaming loading solves the bandwidth problem, but the hard upper limit of GPU memory is still a tough nut to crack. Mobile browsers have strict constraints on video memory and cannot hold an entire scene with 40 million splats.

Spark 2.0 borrows the virtual memory mechanism of the operating system to address this issue.

The system allocates a fixed pool on the GPU with a cap of 16 million splats. A page table records which .RAD blocks are currently resident on the GPU: when an area needs rendering, its block is loaded, and when the pool is full, the least recently used block is swapped out.
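The swap logic is essentially an LRU cache keyed by block ID. A scaled-down sketch (Spark's pool holds up to 16 million splats; here, just a handful of blocks):

```typescript
// LRU page table for GPU-resident blocks: touch on use, evict the
// least-recently-used block when the pool is full. JS Maps preserve
// insertion order, which makes the oldest entry easy to find.
class PageTable<V> {
  private pages = new Map<string, V>();
  constructor(private capacity: number) {}

  // Returns the resident page, loading it (and evicting if necessary) on a miss.
  request(id: string, load: () => V): V {
    if (this.pages.has(id)) {
      const v = this.pages.get(id)!;
      this.pages.delete(id); // re-insert to mark as most recently used
      this.pages.set(id, v);
      return v;
    }
    if (this.pages.size >= this.capacity) {
      const oldest = this.pages.keys().next().value!; // first key = LRU
      this.pages.delete(oldest); // "free" that slice of the pool
    }
    const v = load(); // e.g. fetch and decode a .RAD block
    this.pages.set(id, v);
    return v;
  }

  has(id: string): boolean { return this.pages.has(id); }
}
```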

Thanks to this mechanism, 3DGS scenes from different sources can share the same memory pool. In theory, as long as the network speed is sufficient, countless independent scanned scenes can be seamlessly stitched together to form an infinitely large world.

One link, delivering the world

Right after the release of Spark 2.0, Fei-Fei Li stated publicly: "Spark 2.0 can now smoothly play more than 100 million splat objects on any device. I'm very honored to contribute to the open-source ecosystem of web-based 3DGS rendering."

She didn't emphasize "what has been achieved" but "what has been contributed to the open-source community", and that framing is telling. 3DGS rendering is still evolving rapidly; no single company can drive the whole ecosystem, and open source is the way to accelerate the process.

Developers are already experimenting with Spark. James C. Kane, a Webby Award winner, single-handedly built a multiplayer spaceship shooter called Starspeed.

The entire game scene is built from more than 100 million splats, accompanied by 10 original synth-wave tracks, all streamed to the browser in the .RAD format; the striking sci-fi environment runs directly in a web page.

Try it here 🔗: https://starspeed.game/

On the art side there is "Dormant Memories" by Hugues Bruyère, co-founder of the interactive experience studio Dpt. The series juxtaposes 3D scans of real locations with imagined spaces to create an explorable interactive environment; the boundary between reality and fiction blurs in the grain of the splats, which suits the theme surprisingly well.