The Highest Form of "Wrapping": OpenAI Unveils the OWL Architecture Behind the Atlas Browser

The core idea of OWL (OpenAI’s Web Layer) is to run Chromium's browser process independently, outside the main Atlas application process.

“Just another Chromium wrapper?”

That may well have been many people's first reaction to Atlas, the AI browser OpenAI released last week (see our earlier report, "Just now, OpenAI released the AI browser ChatGPT Atlas, based on Chromium"). Today, however, OpenAI "pushed back" on that view with a technical blog post: yes, Atlas wraps Chromium, but in a completely different way from everyone else.

Although OpenAI also announced a character cameo feature for Sora and a GPT-5-powered agent that finds and fixes security vulnerabilities today, this article focuses on the "soul" behind Atlas: the OWL architecture. Let's look at how OpenAI tamed Chromium, turning what could have been a simple "reskin" into a genuine architectural restructuring.

Based on Chromium

OpenAI said that making ChatGPT a true copilot for the web required fundamentally restructuring the browser's underlying architecture: separating Atlas from the Chromium runtime. That meant developing an entirely new way to integrate Chromium, with three key goals:

Near-instant startup

Smooth performance even as more tabs are opened

A solid foundation for agentic use cases

OpenAI emphasized that Chromium is a natural building block: it offers a state-of-the-art web engine, a robust security model, top-tier performance, and excellent web compatibility. Just as important, it is continuously improved by a global developer community, which is why it has become the most widely used engine for modern desktop browsers.

Redefining the browser experience

Although Atlas is built on Chromium, OpenAI puts heavy emphasis on its own design, including rich animations and visual effects in features such as Agent mode.

This required the engineering team to use the most modern native frameworks (such as SwiftUI, AppKit, and Metal) rather than simply reskinning Chromium's open-source interface.

As a result, OpenAI said: "Atlas's user interface is an almost entirely new experience, rebuilt from scratch."

In addition, reaching the goals of fast startup and keeping hundreds of tabs running at once without dropped frames required some changes on the Chromium side. After all, Chromium's default architecture is quite "opinionated" about its startup sequence, threading model, and tab management.

OpenAI said: "We considered making major modifications to Chromium, but that would make pulling in subsequent upstream updates complex and fragile. To keep our development velocity, we took a different approach: rethinking how we integrate Chromium."

One of their key engineering criteria was not only to speed up experimentation, iteration, and shipping, but also to preserve a piece of OpenAI's engineering culture: shipping code on day one. "Every new engineer submits and merges a small change on the afternoon of their first day. Even though compiling Chromium from source can take hours, we wanted to make sure that tradition continues."

OpenAI's solution: OWL

To solve these challenges, OpenAI built a new architecture layer called OWL (OpenAI’s Web Layer).

OWL is OpenAI's way of integrating Chromium, and its core idea is to run the Chromium browser process independently, outside the main Atlas application process.

Think of it this way: Chromium revolutionized browser architecture by putting each tab in its own process; OpenAI took the idea a step further by moving the whole of Chromium out of the main application process and into a separate service layer.

This approach brings several benefits:

A simpler, more modern app: Atlas is built mostly with SwiftUI and AppKit, giving it a single language, a single tech stack, and a clean codebase.

Faster startup: Chromium loads asynchronously in the background, so Atlas can put its UI on screen almost instantly.

Isolation from crashes and hangs: even if Chromium runs into trouble, Atlas itself will not crash.

Fewer merge conflicts: OpenAI modifies very little Chromium code, which keeps maintenance easy.

Faster iteration: most engineers never need to build Chromium locally. OWL is distributed internally as prebuilt binaries, so building Atlas takes only a few minutes.

Therefore, even new employees can easily submit changes on the afternoon of their first day.
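To make the startup and crash-isolation benefits concrete, here is a minimal TypeScript sketch under loose assumptions. `ClientApp` stands in for the Atlas app and `SimulatedHost` for the out-of-process Chromium host; both classes and the queue-then-flush strategy are hypothetical illustrations, not OWL's actual code. The UI becomes visible before any host exists, early requests wait in a queue, and a host failure is caught on the client side instead of taking the app down.

```typescript
// Hypothetical sketch: ClientApp stands in for the Atlas app and
// SimulatedHost for the out-of-process Chromium host. Not real OWL code.

interface NavigateRequest {
  kind: "navigate";
  url: string;
}

class SimulatedHost {
  alive = true;
  handled: string[] = [];

  handle(req: NavigateRequest): void {
    // A dead host behaves like a broken IPC connection.
    if (!this.alive) throw new Error("host is not running");
    this.handled.push(req.url);
  }
}

class ClientApp {
  uiVisible = false;
  hostFailures = 0;
  private host: SimulatedHost | null = null;
  private pending: NavigateRequest[] = [];

  // Draw the UI immediately; do not wait for the engine.
  launch(): void {
    this.uiVisible = true;
  }

  // Called once the background host has finished starting.
  attachHost(host: SimulatedHost): void {
    this.host = host;
    const queued = this.pending;
    this.pending = [];
    queued.forEach((req) => this.send(req));
  }

  send(req: NavigateRequest): void {
    if (this.host === null) {
      this.pending.push(req); // no host yet: queue the request
      return;
    }
    try {
      this.host.handle(req);
    } catch {
      // The host failed, but the client (and its UI) keeps running.
      // Drop the dead host and keep the request for a replacement host.
      this.hostFailures += 1;
      this.host = null;
      this.pending.push(req);
    }
  }

  get pendingCount(): number {
    return this.pending.length;
  }
}
```

In a real system the failure would surface as a broken IPC connection rather than a thrown exception, but the design point is the same: the client owns the UI and treats the engine as a replaceable service.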

How OWL works

At a high level, the Atlas browser is the OWL client and the Chromium browser process is the OWL host. The two communicate over Mojo, Chromium's inter-process communication system. OpenAI wrote Mojo bindings for Swift (and even TypeScript), so the Swift app can call host-side interfaces directly.
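The client/host split can be pictured as a proxy on one side and an implementation on the other, with every call serialized across a pipe. The TypeScript sketch below is a hypothetical analogy only: `WebViewHostInterface`, `HostReceiver`, and `ClientProxy` are invented names, JSON stands in for Mojo's binary message format, and calls are synchronous for simplicity, whereas real Mojo interfaces are generated from `.mojom` definitions and are typically asynchronous.

```typescript
// Hypothetical Mojo-style analogy: JSON stands in for Mojo's wire format,
// and calls are synchronous for simplicity. None of these are Mojo APIs.

interface WebViewHostInterface {
  loadURL(url: string): boolean;
  getTitle(): string;
}

interface WireMessage {
  method: string;
  args: unknown[];
}

// Host side: decode each message and dispatch it to the implementation.
class HostReceiver {
  private impl: WebViewHostInterface;
  constructor(impl: WebViewHostInterface) {
    this.impl = impl;
  }

  accept(wire: string): string {
    const msg = JSON.parse(wire) as WireMessage;
    let result: unknown;
    switch (msg.method) {
      case "loadURL":
        result = this.impl.loadURL(String(msg.args[0]));
        break;
      case "getTitle":
        result = this.impl.getTitle();
        break;
      default:
        throw new Error(`unknown method: ${msg.method}`);
    }
    return JSON.stringify({ result });
  }
}

// Client side: looks like a local object, but every call is serialized.
class ClientProxy implements WebViewHostInterface {
  private pipe: (wire: string) => string;
  constructor(pipe: (wire: string) => string) {
    this.pipe = pipe;
  }

  private call(method: string, args: unknown[]): unknown {
    const message: WireMessage = { method, args };
    const reply = JSON.parse(this.pipe(JSON.stringify(message))) as { result: unknown };
    return reply.result;
  }

  loadURL(url: string): boolean {
    return this.call("loadURL", [url]) as boolean;
  }

  getTitle(): string {
    return this.call("getTitle", []) as string;
  }
}

// A stand-in host implementation for demonstration.
class FakeWebViewHost implements WebViewHostInterface {
  private url = "";
  loadURL(url: string): boolean {
    this.url = url;
    return url.indexOf("https://") === 0; // this toy accepts only https
  }
  getTitle(): string {
    return `Title of ${this.url}`;
  }
}
```

The client code only ever sees `WebViewHostInterface`; whether the implementation lives in the same process or behind a pipe is invisible to it, which is the property that lets the engine run elsewhere.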

The OWL client library exposes a set of concise Swift APIs that abstract the host layer's key functionality:

Session: Global configuration and control

Profile: Manage user browsing data

WebView: Rendering, input, navigation, zooming, etc.

WebContentRenderer: Pass input events to the rendering pipeline

LayerHost/Client: Exchange compositing information between the UI and Chromium

In addition, service endpoints such as bookmarks, downloads, extensions, and autofill are also provided.
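As a rough picture of how such an object model can fit together, the hypothetical TypeScript sketch below gives `Session`, `Profile`, and `WebView` roles loosely matching the list above. Every method and field (`profile()`, `createWebView()`, `navigate()`, `goBack()`, `setZoom()`, and the zoom limits) is an assumption made for illustration; the article does not describe OWL's real Swift signatures.

```typescript
// Hypothetical object model: names follow the article's list, but every
// method and field is invented for illustration.

class WebView {
  url = "about:blank";
  zoom = 1.0;
  private backStack: string[] = [];

  navigate(url: string): void {
    this.backStack.push(this.url);
    this.url = url;
  }

  goBack(): boolean {
    const previous = this.backStack.pop();
    if (previous === undefined) return false; // nothing to go back to
    this.url = previous;
    return true;
  }

  setZoom(factor: number): void {
    // Clamp to 25%..500% (an assumed range for this sketch).
    this.zoom = Math.min(5, Math.max(0.25, factor));
  }
}

class Profile {
  readonly id: string;
  readonly webViews: WebView[] = [];

  constructor(id: string) {
    this.id = id;
  }

  createWebView(): WebView {
    const view = new WebView();
    this.webViews.push(view);
    return view;
  }
}

class Session {
  private profiles = new Map<string, Profile>();

  // Return the profile with this id, creating it on first use.
  profile(id: string): Profile {
    const existing = this.profiles.get(id);
    if (existing !== undefined) return existing;
    const created = new Profile(id);
    this.profiles.set(id, created);
    return created;
  }
}
```

The nesting (a session owns profiles, a profile owns its web views) is one plausible reading of "global configuration", "user browsing data", and "rendering, navigation, zooming"; the real ownership rules may differ.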

Rendering: Passing pixels across processes

WebViews share a single compositing container in the client app, and the content of different tabs is swapped in and out of it dynamically. On the Chromium side, this container corresponds to a gfx::AcceleratedWidget, which is backed by a CALayer.

OpenAI exposes this layer's context ID to the client, which embeds it in an NSView using the private CALayerHost API.
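The tab-swapping idea can be modeled in a few lines. In this hypothetical TypeScript sketch, `LayerRegistry` plays the host's role of handing out one stable context ID per tab, and `CompositingContainer` plays the client's role of the single NSView that hosts whichever ID is current. No CoreAnimation or Chromium APIs are called; the numeric IDs are simply stand-ins for CALayer context IDs.

```typescript
// Hypothetical model: numeric IDs stand in for CALayer context IDs; no
// CoreAnimation or Chromium APIs are involved.

// Host side: one stable context ID per tab's composited layer.
class LayerRegistry {
  private nextId = 1;
  private idsByTab = new Map<string, number>();

  contextIdFor(tabId: string): number {
    let id = this.idsByTab.get(tabId);
    if (id === undefined) {
      id = this.nextId++;
      this.idsByTab.set(tabId, id);
    }
    return id;
  }
}

// Client side: a single container that hosts whichever layer is current.
class CompositingContainer {
  hostedContextId: number | null = null;
  swapCount = 0;

  show(contextId: number): void {
    if (this.hostedContextId === contextId) return; // already on screen
    this.hostedContextId = contextId;
    this.swapCount += 1;
  }
}

// Usage: switching tabs swaps the hosted layer instead of building new views.
const registry = new LayerRegistry();
const container = new CompositingContainer();
container.show(registry.contextIdFor("tab-1"));
container.show(registry.contextIdFor("tab-2"));
```

Because only an ID crosses the process boundary, the client never touches the pixels themselves; it just tells the window server which remote layer to display.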

Standalone pop-ups such as <select> dropdowns and color pickers use the same mechanism. OWL keeps the view geometry in sync with Chromium so that the GPU compositor renders content at the correct resolution and scale.
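Keeping geometry in sync largely comes down to converting a view's size in points into a surface size in physical pixels. The following TypeScript sketch assumes a simple model: hypothetical `ViewGeometry` and `GeometrySync` types, rounding up to whole pixels, and notifying the host only when the pixel size changes. It illustrates the idea rather than OWL's protocol, and it ignores position changes for brevity.

```typescript
// Hypothetical geometry model: field names and the "notify only on pixel
// size change" rule are assumptions. Position is ignored for brevity.

interface ViewGeometry {
  width: number;       // view width in points
  height: number;      // view height in points
  scaleFactor: number; // backing scale factor, e.g. 2 on a Retina display
}

interface SurfaceSize {
  pixelWidth: number;
  pixelHeight: number;
}

function surfaceSizeFor(g: ViewGeometry): SurfaceSize {
  // Round up so a fractional size never loses its last row or column.
  return {
    pixelWidth: Math.ceil(g.width * g.scaleFactor),
    pixelHeight: Math.ceil(g.height * g.scaleFactor),
  };
}

class GeometrySync {
  updatesSent = 0;
  private last: SurfaceSize | null = null;

  // Recompute the surface size; count a host update only when it changes.
  update(g: ViewGeometry): SurfaceSize {
    const size = surfaceSizeFor(g);
    const changed =
      this.last === null ||
      this.last.pixelWidth !== size.pixelWidth ||
      this.last.pixelHeight !== size.pixelHeight;
    if (changed) {
      this.updatesSent += 1; // a real client would send an IPC message here
      this.last = size;
    }
    return size;
  }
}
```

For example, an 800×600-point view on a 2× display needs a 1600×1200-pixel surface; dragging the window to a 1× display changes the pixel size, so the host must be told even though the view's point size is unchanged.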

OpenAI also uses this mechanism to project parts of Chromium's native UI, such as permission prompts, directly into Atlas, which lets the team prototype features quickly without rewriting them from scratch.

Input events: Capture and forwarding

Normally, the Chromium UI converts macOS's NSEvent into Blink's WebInputEvent and then passes it to the renderer.

But since Chromium runs in the background in OWL, OpenAI performs event translation in the Swift client itself and then sends the converted events to Chromium.

If a web page does not handle an event, it is returned to the client, which re-synthesizes an NSEvent so that other parts of Atlas can handle the input.
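A minimal sketch of this translate, forward, and fall-back loop follows, in TypeScript. `NativeKeyEvent` stands in for NSEvent and `WebKeyEvent` for Blink's WebInputEvent; both shapes, the modifier names, and the `dispatchKey` helper are hypothetical simplifications that cover only key presses.

```typescript
// Hypothetical event shapes: NativeKeyEvent stands in for NSEvent and
// WebKeyEvent for Blink's WebInputEvent. Only key presses are modeled.

interface NativeKeyEvent {
  characters: string;
  keyCode: number;
  modifiers: string[]; // e.g. ["command", "shift"]
}

interface WebKeyEvent {
  type: "keydown";
  key: string;
  code: number;
  meta: boolean;
  shift: boolean;
}

// Client-side translation before the event is sent to the host.
function toWebEvent(e: NativeKeyEvent): WebKeyEvent {
  return {
    type: "keydown",
    key: e.characters,
    code: e.keyCode,
    meta: e.modifiers.indexOf("command") >= 0,
    shift: e.modifiers.indexOf("shift") >= 0,
  };
}

// Rebuild a native event from a web event the page did not consume.
function toNativeEvent(w: WebKeyEvent): NativeKeyEvent {
  const modifiers: string[] = [];
  if (w.meta) modifiers.push("command");
  if (w.shift) modifiers.push("shift");
  return { characters: w.key, keyCode: w.code, modifiers };
}

// Send to the page first; fall back to the app if the page declines.
function dispatchKey(
  e: NativeKeyEvent,
  sendToPage: (w: WebKeyEvent) => boolean, // true if the page handled it
  appFallback: (n: NativeKeyEvent) => void,
): void {
  const web = toWebEvent(e);
  if (!sendToPage(web)) {
    appFallback(toNativeEvent(web));
  }
}
```

With a page that ignores Command-key combinations, pressing Command-T would travel to the page, be declined, and arrive back in `appFallback` as a native event, which is where a menu shortcut such as "new tab" could be handled.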

Agent mode: A special case

Agentic browsing in Atlas poses additional challenges for rendering, input, and data storage. OpenAI's computer-use model takes a complete image of the screen as its input.

However, some UI elements (such as <select> dropdowns) are rendered separately, outside the tab. In Agent mode, OpenAI recomposites these pop-ups into the main page image so that the model sees the full context in a single frame.
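Recompositing can be illustrated with plain arrays in place of GPU surfaces. The hypothetical `composeForAgent` function below, written in TypeScript, copies a separately rendered pop-up onto a copy of the page frame at the pop-up's offset, clipping anything that falls outside the page, so a single image contains both. It is a CPU-side sketch of the concept under those assumptions; the article does not say how Atlas implements it.

```typescript
// Hypothetical sketch: frames are 2D arrays of pixel values, not GPU surfaces.

type Frame = number[][]; // frame[row][col]

// Paint `popup` onto a copy of `page` with its top-left corner at (top, left).
function composeForAgent(page: Frame, popup: Frame, top: number, left: number): Frame {
  const out = page.map((row) => row.slice()); // leave the page frame untouched
  for (let r = 0; r < popup.length; r++) {
    const y = top + r;
    if (y < 0 || y >= out.length) continue; // clip rows outside the page
    for (let c = 0; c < popup[r].length; c++) {
      const x = left + c;
      if (x < 0 || x >= out[y].length) continue; // clip columns outside the page
      out[y][x] = popup[r][c];
    }
  }
  return out;
}
```

Copying rather than mutating keeps the user-visible page frame independent of the image prepared for the model.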

Input events follow a security principle as well: agent-generated events are sent directly to the renderer rather than through the privileged browser layer, which preserves sandbox isolation. This prevents, for example, automated events from triggering non-web behavior such as system keyboard shortcuts.
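The routing rule can be written as a small decision function. In this hypothetical TypeScript sketch, input tagged as coming from the agent is always delivered to the renderer layer, while user input with the Command modifier may be claimed by the browser layer as a shortcut. The `RoutedInput` type and the "Command means shortcut" rule are simplifying assumptions, not Atlas's actual logic.

```typescript
// Hypothetical routing sketch; types and the shortcut rule are assumptions.

type InputSource = "user" | "agent";

interface RoutedInput {
  source: InputSource;
  key: string;
  meta: boolean; // Command key held
}

interface Delivery {
  layer: "browser" | "renderer";
  input: RoutedInput;
}

function route(input: RoutedInput, deliveries: Delivery[]): void {
  if (input.source === "agent") {
    // Agent input bypasses the privileged browser layer entirely,
    // so it can never trigger browser- or system-level shortcuts.
    deliveries.push({ layer: "renderer", input });
    return;
  }
  if (input.meta) {
    // User input: the browser layer may claim Command-key shortcuts.
    deliveries.push({ layer: "browser", input });
    return;
  }
  // Ordinary user input goes to the renderer as usual.
  deliveries.push({ layer: "renderer", input });
}
```

The key property is structural: there is no code path from an agent-sourced event to the browser layer, so the guarantee does not depend on filtering individual key combinations.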

Agent browsing can also run in an ephemeral "logged-out" context. Rather than reusing the user's incognito (private mode) profile, it creates a separate in-memory store using Chromium's StoragePartition. Each agent session starts fresh, and all cookies and site data are discarded when the session ends. Users can run multiple "logged-out" agent sessions at the same time without them interfering with one another.
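The per-session lifecycle can be sketched as follows. `AgentSessionManager` and `InMemoryPartition` are hypothetical TypeScript stand-ins for the role the article assigns to Chromium's StoragePartition: each session gets a fresh, empty in-memory cookie store, sessions cannot see each other's data, and ending a session wipes and discards its store. Real partitions also hold caches, local storage, and more; only cookies are modeled here.

```typescript
// Hypothetical stand-ins for per-session storage partitions; only cookies
// are modeled, and none of these names are Chromium APIs.

class InMemoryPartition {
  private cookies = new Map<string, string>();
  setCookie(key: string, value: string): void {
    this.cookies.set(key, value);
  }
  getCookie(key: string): string | undefined {
    return this.cookies.get(key);
  }
  clear(): void {
    this.cookies.clear();
  }
  get size(): number {
    return this.cookies.size;
  }
}

class AgentSessionManager {
  private nextId = 1;
  private partitions = new Map<number, InMemoryPartition>();

  // Every session starts with a brand-new, empty partition.
  start(): number {
    const id = this.nextId++;
    this.partitions.set(id, new InMemoryPartition());
    return id;
  }

  partition(id: number): InMemoryPartition {
    const p = this.partitions.get(id);
    if (p === undefined) throw new Error(`no active agent session ${id}`);
    return p;
  }

  // Ending a session wipes its data and forgets the partition entirely.
  end(id: number): void {
    this.partition(id).clear();
    this.partitions.delete(id);
  }

  get activeCount(): number {
    return this.partitions.size;
  }
}
```

Because nothing is ever written to disk or shared with the user's profile, "logging out" is simply dropping the partition.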

Conclusion

OpenAI closed by once again crediting Chromium: "None of this would have been possible without the excellent work of the global Chromium community. OWL opens up a new direction on that foundation: decoupling the engine from the application, and combining a world-class web platform with modern native frameworks to create a faster, more flexible architecture."


Reference link

https://openai.com/index/building-chatgpt-atlas/

This article is from the WeChat official account "Machine Intelligence", edited by Panda, and published by 36Kr with authorization.

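The views expressed in this article are solely those of the author; the 36Kr platform only provides information storage services.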
