Fei-Fei Li Just Released a New World-Model Achievement

It brings 100 million splats to the browser, letting mobile and VR devices open ultra-large-scale 3D worlds.

Less than a week after releasing its new Marble 1.1 and 1.1-Plus models, World Labs, the spatial-intelligence unicorn founded by Fei-Fei Li, has made another announcement:

Spark 2.0, an open-source 3D Gaussian Splatting rendering engine.

In the team's words: "We have developed a streamable Level-of-Detail (LoD) system for 3D Gaussian Splatting, redefining the possibilities of web 3D rendering."

Spark 2.0 is built on Three.js. Users can stream ultra-large-scale 3D worlds containing over 100 million splats (3D Gaussian points) to any device via WebGL2, including desktops, iOS, Android, and VR.

For example, the Coit Tower scene consists of over 40 million splats and can be explored interactively in the browser.

More 3D scenes are available to explore on the official blog.

Traditional 3D modeling uses triangles with texture mapping to piece together the surface of an object.

3D Gaussian Splatting instead uses millions of translucent ellipsoids (i.e., splats). Blending the colors of these ellipsoids produces ultra-realistic detail.

What is a splat?

Each splat is defined by five attributes: position, scale along the X, Y, and Z axes, rotation, color, and opacity.

The most common method for rendering splats on the screen is the painter's algorithm.

Just like painting, where you paint the distant objects first and then the nearby ones, you arrange millions of small ellipsoids in order from far to near and stack them layer by layer to calculate the final image in real-time.

This is like a digital pointillist painting, except that the image is built from 3D Gaussian distributions rather than dots of paint.
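The far-to-near stacking described above can be sketched for a single pixel as follows. This is a minimal illustration, not Spark's code: it assumes each splat has already been projected and reduced to a depth, a color, and an opacity at that pixel, and the names `Splat` and `composite_pixel` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Splat:
    depth: float                       # distance from the camera
    color: Tuple[float, float, float]  # RGB, each channel in [0, 1]
    alpha: float                       # opacity at this pixel, in [0, 1]


def composite_pixel(splats, background=(0.0, 0.0, 0.0)):
    """Blend splats far-to-near (painter's algorithm) with the 'over' operator."""
    result = background
    # Sort by depth, farthest first, so nearer splats are painted on top.
    for s in sorted(splats, key=lambda s: s.depth, reverse=True):
        result = tuple(s.alpha * c + (1.0 - s.alpha) * r
                       for c, r in zip(s.color, result))
    return result
```

A half-transparent red splat in front of an opaque blue one yields an even mix of the two colors, which is why the sort order matters: painting the same two splats in the opposite order would leave the pixel pure blue.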

Fei-Fei Li commented on this achievement as soon as it was announced:

Spark 2.0 can now stream and render over 100 million splats on any device! I'm very proud to contribute to the open-source ecosystem of web-based 3D Gaussian Splatting rendering!

Spark System Design

Spark was originally an internal 3D Gaussian Splatting rendering engine developed by World Labs.

At that time, all web rendering engines on the market had obvious shortcomings. For example, some engines could correctly render only one 3D Gaussian Splatting object at a time; some could not animate splats dynamically; and some were built on niche 3D frameworks or relied on WebGPU, which was not yet widely adopted, resulting in limited device compatibility.

This internal rendering engine was showcased in the team's large-scale world model research preview released in 2024 and the early scene demonstration project Lofi Worlds.

To enable more developers to create interactive 3D Gaussian Splatting web experiences, the team consolidated its accumulated expertise and open-sourced a general-purpose 3D Gaussian Splatting rendering engine last year.

It was originally named Forge and was introduced by QbitAI at that time. Later, it was renamed Spark.

Spark is built on the mainstream Three.js framework. The team also chose WebGL2 as the technical foundation, as it is currently the only 3D web API that runs stably on almost all devices.

The team said that the development of Spark has always been in sync with Marble.

The official blog details the technical aspects of Spark.

The new Spark 2.0 enables preprocessing, streaming loading, and cross-device rendering of ultra-large-scale 3D Gaussian Splatting scenes on the web.

The key lies in the integration of three technologies:

Level-of-Detail (LoD) Technology: Pre-generate splat data at different resolutions and intelligently select the subset of splats to be rendered based on the camera's perspective. For areas that are too far away for the naked eye to distinguish details, reduce the number of splats to be rendered, thereby significantly improving rendering performance.

Progressive Streaming: Adopt a "coarse-to-fine" loading strategy, prioritizing the download of data that can optimize the details of the current view. As the data is gradually downloaded, the scene will be continuously refined, achieving a smooth progressive presentation.

Virtual Memory: Allocate a fixed GPU memory pool for the splat page table and automatically replace 3D Gaussian Splatting data blocks based on the user's position in the scene. With this technology, even massive cross-object splat data obtained over the network can be efficiently accessed.

Let's take a closer look below.

Level-of-Detail

In the field of computer graphics, Level-of-Detail is a classic solution for handling large 3D scenes. It can automatically adjust the rendering details based on the distance between the object and the observer. When the frame rate needs to be increased, the detail level can be reduced; when the user is observing statically, the detail level can be increased to present a more detailed picture.

A typical application of Level-of-Detail is Mipmap texture mapping:

Downsample a texture image step by step to generate a set of texture pyramids with resolutions halved successively, with a single pixel at the top. This technology ensures that texture data matching the screen pixel size can be quickly sampled at any distance.
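The halving pyramid can be made concrete with a short sketch. It is illustrative only: `mip_chain` and `mip_level` are hypothetical helper names, and the level selection is the textbook log2 rule without the filtering that real GPUs apply between levels.

```python
import math


def mip_chain(width, height):
    """Return the (width, height) of every mip level, from full size down to 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width = max(1, width // 2)    # halve each axis, never below one pixel
        height = max(1, height // 2)
        levels.append((width, height))
    return levels


def mip_level(texels_per_pixel, num_levels):
    """Pick the level whose texel size roughly matches one screen pixel."""
    level = int(math.log2(max(texels_per_pixel, 1.0)))
    return min(level, num_levels - 1)
```

For an 8x4 texture, `mip_chain(8, 4)` gives four levels ending in a single pixel, and a surface far enough away that four texels fall on one screen pixel samples from level 2.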

The implementation of Level-of-Detail can be divided into two major categories: discrete and continuous.

The discrete solution requires pre-generating multiple model versions with different numbers of splats and then switching between different versions for rendering based on the distance between the object's bounding box and the camera. This method has obvious drawbacks:

When the user moves in the scene, the sudden change in model details will produce "popping" artifacts; at the same time, when processing splats in blocks, the boundaries between blocks will also be clearly visible.
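A minimal sketch of the discrete scheme shows where the popping comes from. The function name `pick_discrete_level` and the distance thresholds are hypothetical; the point is only that a whole model version is swapped the moment a threshold is crossed.

```python
import bisect


def pick_discrete_level(distance, thresholds):
    """Return the index of the model version to draw; 0 is the most detailed.

    thresholds must be sorted in ascending order: the object switches to the
    next coarser version each time its distance passes one of them.
    """
    return bisect.bisect_right(thresholds, distance)
```

With thresholds of 10 and 50 units, a camera stepping from 9.9 to 10.1 units away jumps from version 0 to version 1 in a single frame, which is the abrupt change the viewer perceives as a pop.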

Spark uses continuous Level-of-Detail, the core of which is a hierarchy built over all splats: the Level-of-Detail Gaussian splat tree.

Spark selects the subset of splats best suited to the current viewport along a frontier (a cut) through the tree, achieving a smooth and seamless transition of details.
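One common way to choose such a subset is to descend the tree only where a node would look too coarse on screen. The sketch below assumes that rule; the article does not state Spark's actual selection criterion, and `LodNode`, `select_cut`, and the `max_screen_size` threshold are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class LodNode:
    name: str
    center: Tuple[float, float, float]
    size: float  # world-space extent of the merged splat at this node
    children: List["LodNode"] = field(default_factory=list)


def select_cut(root, camera, max_screen_size):
    """Collect the nodes to render for this camera position."""
    selected = []
    stack = [root]
    while stack:
        node = stack.pop()
        distance = max(math.dist(node.center, camera), 1e-6)
        projected = node.size / distance  # rough on-screen size
        if node.children and projected > max_screen_size:
            stack.extend(node.children)   # too coarse here: descend
        else:
            selected.append(node)         # fine enough, or a leaf
    return selected
```

Because the decision is made per node rather than per object, nearby regions refine while distant ones stay coarse within the same scene, and the selected set changes a few splats at a time as the camera moves.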

Spark 2.0 has two built-in algorithms for generating Level-of-Detail Gaussian splat trees:

Tiny-LoD Algorithm: A fast and lightweight algorithm, used by default for real-time scene generation on the web.

Bhatt-LoD Algorithm: A high-precision algorithm, used by default for offline processing in command-line tools.

Both of these algorithms are training-free solutions that can directly process 3D Gaussian Splatting data without referring to images or other additional input data. In addition, Spark is also compatible with other third-party generation algorithms, such as NanoGS.

Progressive Streaming

Spark 2.0 defines a new file format, .RAD (Radiance Fields). This format not only compresses 3D Gaussian Splatting data effectively but also supports random access and streaming, enabling progressive refinement of the scene, which makes it well suited to network transmission.

After adopting the RAD format, a 3D Gaussian Splatting object can immediately be presented as a rough version containing 64K splats. Then, the system will prioritize obtaining data blocks for optimizing the details of the visible area based on the user's perspective, achieving dynamic priority adjustment.

The LoD splat tree is essentially a four-dimensional structure: three spatial dimensions plus one level-of-detail dimension.

To achieve progressive and refined rendering through streaming loading, the LoD splats must be divided into data blocks in the RAD file in a reasonable way.

There are many strategies to achieve this goal. The core strategy adopted by Spark is spatial proximity priority:

Recursively divide the three-dimensional space into smaller regions, and each data block will be filled with splats in the corresponding spatial region in a "large-to-small" order, ensuring that each data block can maximize the presentation of details in that region.
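The idea can be illustrated with a simplified sketch. It is not the RAD layout itself: it assumes each splat is just a `(position, size)` pair, splits regions at the median along alternating axes, and uses the hypothetical name `build_chunks`.

```python
from collections import deque


def build_chunks(splats, chunk_size):
    """Group (position, size) splats into chunks, ordered coarse-to-fine."""
    chunks = []
    queue = deque([(list(splats), 0)])  # (splats in region, split axis)
    while queue:
        region, axis = queue.popleft()
        if not region:
            continue
        # Largest splats first: they carry the coarsest detail of the region.
        region.sort(key=lambda s: s[1], reverse=True)
        chunks.append(region[:chunk_size])
        rest = region[chunk_size:]
        if rest:
            # Split what is left at the median position along the current axis.
            rest.sort(key=lambda s: s[0][axis])
            mid = len(rest) // 2
            next_axis = (axis + 1) % 3
            queue.append((rest[:mid], next_axis))
            queue.append((rest[mid:], next_axis))
    return chunks
```

The first chunk holds the largest splats of the whole scene, so it alone gives a rough picture; each later chunk covers a smaller region with smaller splats, so downloading it sharpens only that part of the view.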

Virtual Memory

Virtual memory is a classic memory management technology. By dividing memory pages of a fixed size and building a page table mapping relationship, it uses limited physical memory to simulate a large virtual memory space.

Spark 2.0 innovatively applies this technology to 3D Gaussian Splatting rendering:

It pre-allocates a fixed-size memory pool (with a capacity of 16 million splats) on the GPU and builds a page table that maps each 64K-splat "memory page" on the GPU one-to-one to a 64K-splat data block in the RAD file.

The rules for loading and replacing data blocks are as follows:

Based on the traversal results of the LoD splat trees, load high-priority data blocks into free GPU memory pages.

When the GPU memory pool is full and a new high-priority data block needs to be loaded, a least-recently-used (LRU) policy evicts the data block occupying the lowest-priority memory page.
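These two rules amount to a fixed-size page table with LRU replacement, sketched below. The class `SplatPageTable` and its `touch` method are hypothetical, the GPU upload is left as a comment, and the page count of 256 assumes that "16 million" and "64K" are both binary figures (16 × 1024 × 1024 and 64 × 1024).

```python
from collections import OrderedDict

POOL_SPLATS = 16 * 1024 * 1024          # assumed meaning of "16 million splats"
PAGE_SPLATS = 64 * 1024                 # assumed meaning of "64K splats"
NUM_PAGES = POOL_SPLATS // PAGE_SPLATS  # 256 pages under those assumptions


class SplatPageTable:
    """Maps data blocks to a fixed set of GPU pages with LRU eviction."""

    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))
        self.chunk_to_page = OrderedDict()  # least recently used first

    def touch(self, chunk_id):
        """Return the page holding chunk_id, loading it if necessary."""
        if chunk_id in self.chunk_to_page:
            self.chunk_to_page.move_to_end(chunk_id)  # mark as recently used
            return self.chunk_to_page[chunk_id]
        if self.free_pages:
            page = self.free_pages.pop(0)             # rule 1: use a free page
        else:
            # Rule 2: pool is full, so evict the least recently used block.
            _, page = self.chunk_to_page.popitem(last=False)
        # A real renderer would upload the block's splats into this page here.
        self.chunk_to_page[chunk_id] = page
        return page
```

With a two-page table, touching blocks "a", "b", "a", and then "c" evicts "b", because "a" was used more recently.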

Spark's design is highly flexible: it supports loading multiple RAD files simultaneously and allows these files to share the same GPU memory pool. For each RAD file, Spark maintains two sets of mapping relationships: the mapping from data blocks to memory pages and the reverse mapping from memory pages to files and data blocks.

When traversing multiple Level-of-Detail splat trees, Spark will uniformly record the access order of data blocks in all files and finally generate a global data block priority list, thereby optimizing the loading and storage across all 3D Gaussian Splatting objects.
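A minimal sketch of such a global list follows. It assumes each tree traversal reports a numeric priority per data block, with higher meaning more important; the article does not say how Spark scores blocks, and `global_priority` and the file names in the usage note are hypothetical.

```python
def global_priority(per_file_visits, pool_pages):
    """Merge per-file block priorities into one list capped at the pool size.

    per_file_visits maps a file name to a list of (priority, block_index)
    pairs. Returns the (file_name, block_index) pairs that should be resident,
    most important first.
    """
    merged = [(priority, file_name, block)
              for file_name, visits in per_file_visits.items()
              for priority, block in visits]
    merged.sort(key=lambda item: item[0], reverse=True)
    return [(file_name, block) for _, file_name, block in merged[:pool_pages]]
```

Given two files, `tower.rad` with blocks scored 0.9 and 0.2 and `park.rad` with blocks scored 0.7 and 0.1, a three-page pool keeps the 0.9, 0.7, and 0.2 blocks. The files compete for the same pages, so an object far from the camera naturally gives up memory to one nearby.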

The official blog also introduces more technical details. For those interested, the link is here:

https://www.worldlabs.ai/blog/spark-2.0#lod-splat-tree

Reference link: https://x.com/i/trending/2044161909943918948

This article is from the WeChat official account QbitAI. Author: Xifeng. Republished by 36Kr with permission.

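The views expressed in this article are solely those of the author. The 36Kr platform only provides information storage services.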
