Fei-Fei Li's World Labs Open-Sources Its 3DGS Rendering Tool

Bringing 3D worlds of over a hundred million splats into the browser

Fei-Fei Li's team has a new world-model release, and it is open source!

Early this morning, World Labs, Fei-Fei Li's spatial intelligence unicorn, officially announced "Spark 2.0", which brings its most ambitious 3DGS (3D Gaussian Splatting) worlds to the Web.

Ultra-large-scale, high-fidelity 3D scenes that previously ran only on professional hardware are now accessible to anyone with a browser.

Because high-fidelity 3D content can now be opened on any device, including mobile phones and VR headsets, spatial storytelling takes a big step forward.

The renderer, designed specifically for 3DGS, is also open source. It can stream and load scenes of more than 100 million splats, with the goal of building entirely new 3D worlds.

Spark 2.0, a powerful 3D Gaussian Splatting renderer, is here

Spark is a dynamic 3D Gaussian Splatting (3DGS) renderer for the Web. It integrates with Three.js, currently the most popular Web 3D framework, and runs on WebGL2, so it works anywhere there is a browser: desktop, iOS, Android, or VR devices.

When it was released last year, Spark offered several capabilities that other renderers lacked: rendering multiple 3DGS objects in the same scene, real-time editing and relighting, and a shader graph system that lets users create fully dynamic splat-based effects and animations.

In Spark 2.0, the team added a Level of Detail (LoD) system that can stream and render ultra-large-scale 3DGS worlds on any device. As you move through a scene, Spark automatically adjusts the detail level of the splats to the current viewpoint and loads the required data on demand over the network. The original post illustrates this with several demo scenes.

Next, let's look at the technical details.

Spark 2.0 provides a complete pipeline for preparing, transmitting, and rendering ultra-large-scale 3DGS scenes on the Web across devices. To handle the challenges of scale, it relies on three graphics and systems techniques:

Level of Detail (LoD): Generate versions of the splats at different resolutions and decide which ones to render based on the current camera viewpoint. When an object is far away and its details are hard to make out, render fewer splats to improve performance.
Progressive Streaming: As data downloads, load the 3DGS content from coarse to fine, prioritizing the data that most improves the sharpness of the current view.
Virtual Memory: Allocate a fixed-size pool of GPU memory, managed through a page table of splat pages, and automatically swap 3DGS data blocks in and out according to the viewer's position in the scene. This gives access to massive amounts of data, fetched over the network, across multiple 3DGS objects.

Level of Detail (LoD) Gaussian Splat Tree

Spark uses a continuous LoD method in which all splats are organized into a hierarchy, the LoD splat tree. Spark selects a "cut" through this tree, choosing splats one by one so that detail is spent where it matters most in the current viewport.
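To make the idea of choosing a section (often called a cut) through the tree concrete, here is a minimal sketch. It is not Spark's implementation: the `Node` class, the `projected_size` heuristic (radius divided by camera distance), and the splat `budget` are all assumptions made for illustration. The sketch starts from the root and keeps replacing the node that looks largest on screen with its children until the budget would be exceeded.

```python
import heapq
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    center: tuple            # (x, y, z) position of the merged splat
    radius: float            # rough spatial extent of the merged splat
    children: list = field(default_factory=list)


def projected_size(node, camera):
    """Hypothetical error metric: bigger and closer splats need refining first."""
    return node.radius / max(math.dist(node.center, camera), 1e-6)


def select_cut(root, camera, budget):
    """Greedily refine the coarsest-looking node until `budget` splats are used."""
    heap = [(-projected_size(root, camera), 0, root)]  # max-heap via negation
    tie = 1                                            # unique tie-breaker
    finished = []                                      # leaves that cannot be refined
    while heap:
        _, _, node = heap[0]
        if not node.children:
            heapq.heappop(heap)
            finished.append(node)
            continue
        # Replacing one node by its children changes the cut size by len(children) - 1.
        new_total = len(finished) + len(heap) - 1 + len(node.children)
        if new_total > budget:
            break
        heapq.heappop(heap)
        for child in node.children:
            heapq.heappush(heap, (-projected_size(child, camera), tie, child))
            tie += 1
    return finished + [entry[2] for entry in heap]


# A tiny two-level tree: the root summarizes A and B, which summarize four leaves.
a = Node("A", (-2, 0, 0), 2, [Node("a1", (-3, 0, 0), 1), Node("a2", (-1, 0, 0), 1)])
b = Node("B", (2, 0, 0), 2, [Node("b1", (1, 0, 0), 1), Node("b2", (3, 0, 0), 1)])
root = Node("root", (0, 0, 0), 4, [a, b])

cut = select_cut(root, camera=(-5, 0, 0), budget=3)
print(sorted(n.name for n in cut))
```

With the camera on the left and a budget of three splats, the sketch refines the nearby subtree A into its leaves while the distant subtree B stays as a single coarse splat.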

In this tree, each internal node is a lower-resolution version of its children: several child splats are merged into one new splat that approximates their combined shape and color. The merging proceeds recursively up to the root of the tree, where a single large splat summarizes the overall shape and color of every splat in the object.
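One common way to approximate several Gaussians with a single one is moment matching: take the weighted mean, then add the spread of the child means to the child covariances. The sketch below illustrates only that idea. It is not Tiny-LoD or Bhatt-LoD, whose details the article does not give, and the `Splat` fields and the meaning of `weight` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Splat:
    mean: list     # [x, y, z]
    cov: list      # 3x3 covariance as nested lists
    color: list    # [r, g, b]
    weight: float  # hypothetical importance, e.g. opacity times size


def merge(splats):
    """Moment-matched parent splat for a list of child splats."""
    total = sum(s.weight for s in splats)
    mean = [sum(s.weight * s.mean[i] for s in splats) / total for i in range(3)]
    color = [sum(s.weight * s.color[i] for s in splats) / total for i in range(3)]
    cov = [[0.0] * 3 for _ in range(3)]
    for s in splats:
        d = [s.mean[i] - mean[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                # Child covariance plus the outer product of its offset from the parent mean.
                cov[i][j] += s.weight * (s.cov[i][j] + d[i] * d[j]) / total
    return Splat(mean, cov, color, total)


def isotropic(var):
    return [[var if i == j else 0.0 for j in range(3)] for i in range(3)]


left = Splat([-1.0, 0.0, 0.0], isotropic(0.25), [1.0, 0.0, 0.0], 1.0)
right = Splat([1.0, 0.0, 0.0], isotropic(0.25), [0.0, 0.0, 1.0], 1.0)
parent = merge([left, right])
print(parent.mean, parent.cov[0][0], parent.color)
```

The parent sits midway between the two children, is stretched along the x axis because the children are separated in x, and takes the average of their colors. Applying `merge` bottom-up, level by level, would yield a tree whose root is a single splat.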

Spark 2.0 provides two algorithms for generating the LoD splat tree:

A lightweight and faster algorithm called Tiny-LoD is used by default on the Web.

A higher-quality algorithm called Bhatt-LoD is used by default in the command-line environment.

Both methods are "training-free": they do not rely on reference images or other input data, and operate directly on the 3DGS data itself.

Progressive Streaming

Spark 2.0 defines a new file format, .RAD (Radiance fields), which compresses 3DGS data and supports random access over the network, making progressive streaming possible.

At the start of loading, a 3DGS object appears almost instantly as a coarse version of about 64,000 splats. The system then fetches data in blocks, first refining the LoD splats that are coarsest and most in need of detail in the current view, and it adjusts the loading priority dynamically as the user moves through the scene.
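The scheduling idea can be sketched as a loop that, for each camera position, picks the pending block that would help the current view most and requests just that byte range. Everything concrete here is an assumption for illustration: the `Chunk` fields, the byte offsets, and the `priority` heuristic are hypothetical, and the real .RAD layout is not described in the article. Only the `bytes=start-end` form of the HTTP Range header is standard.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str
    center: tuple   # where in the scene this block adds detail
    extent: float   # size of the coarse splats it would refine
    offset: int     # hypothetical byte offset inside the file
    length: int     # hypothetical byte length


def priority(chunk, camera):
    """Hypothetical score: coarse regions close to the camera come first."""
    return chunk.extent / max(math.dist(chunk.center, camera), 1e-6)


def next_request(pending, camera):
    """Return the most useful pending chunk and its HTTP Range header value."""
    best = max(pending, key=lambda c: priority(c, camera))
    header = f"bytes={best.offset}-{best.offset + best.length - 1}"
    return best, header


def stream(chunks, camera_path):
    """Fetch one chunk per camera position, re-ranking as the camera moves."""
    pending = list(chunks)
    order = []
    for camera in camera_path:
        if not pending:
            break
        chunk, header = next_request(pending, camera)
        pending.remove(chunk)
        order.append((chunk.name, header))
    return order


chunks = [
    Chunk("near", (0, 0, 0), 1.0, 0, 100),
    Chunk("far", (10, 0, 0), 1.0, 100, 50),
    Chunk("mid", (5, 0, 0), 1.0, 150, 25),
]
# The camera starts near the origin, then jumps to the far end of the scene.
order = stream(chunks, [(1, 0, 0), (9, 0, 0), (9, 0, 0)])
print(order)
```

Because the camera moves after the first fetch, the block called "far" overtakes "mid" in the queue, which is the kind of dynamic re-prioritization the paragraph above describes.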

Virtual Memory

Virtual memory is a memory management technique that exposes a very large virtual address space through a small, fixed amount of "physical memory". A page table maps virtual addresses to fixed-size pages in physical memory.

Spark 2.0 brings this mechanism to 3DGS: it allocates a fixed-capacity memory pool (about 16 million splats) on the GPU and automatically manages the mapping between 64K-splat "pages" on the GPU and the corresponding 64K-splat data blocks in the .RAD file.

Data blocks are loaded into free pages in LoD traversal order. Once the page table is full, new higher-priority data replaces older, lower-priority blocks according to a "Least Recently Used" (LRU) policy.
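A minimal sketch of such a page table with LRU replacement follows. It is an illustration under assumptions rather than Spark's code: the class and method names are made up, the pool size is taken as exactly 16 × 1024 × 1024 splats (the article only says "about 16 million"), and the priority handling mentioned above is reduced to plain LRU.

```python
from collections import OrderedDict

SPLATS_PER_PAGE = 64 * 1024
POOL_SPLATS = 16 * 1024 * 1024                # assumed exact value of "about 16 million"
NUM_PAGES = POOL_SPLATS // SPLATS_PER_PAGE    # 256 pages under that assumption


class SplatPageTable:
    """Maps data-block ids to a fixed set of GPU page slots, evicting by LRU."""

    def __init__(self, num_pages):
        self.free = list(range(num_pages))
        self.resident = OrderedDict()   # block id -> page index, least recent first

    def access(self, block_id):
        """Return (page_index, evicted_block_id or None) for the requested block."""
        if block_id in self.resident:
            self.resident.move_to_end(block_id)   # mark as most recently used
            return self.resident[block_id], None
        evicted = None
        if self.free:
            page = self.free.pop(0)
        else:
            # Pool is full: reuse the page of the least recently used block.
            evicted, page = self.resident.popitem(last=False)
        self.resident[block_id] = page
        return page, evicted


table = SplatPageTable(num_pages=2)
print(table.access("a"))   # loads into a free page
print(table.access("b"))   # loads into the other free page
print(table.access("a"))   # hit: "a" becomes most recently used
print(table.access("c"))   # pool full: evicts "b", the least recently used
```

In a real renderer, a miss would also trigger a network fetch of the block and an upload into the chosen GPU page; the sketch tracks only the bookkeeping.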

For more technical details, please refer to the original blog post.

Blog address: https://www.worldlabs.ai/blog/spark-2.0

This article is from the WeChat official account "MachineHeart" (ID: almosthuman2014). Edited by Du Wei. Republished by 36Kr with authorization.

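The views expressed in this article are solely those of the author; the 36Kr platform only provides information storage services.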
