Two “code nukes” in one day: OpenAI unveils its first Codex model built for real-time collaboration, and Google releases Gemini 3 Deep Think, whose coding ability ranks among the world's top eight.
1
OpenAI Unveils New Model Tailored for Real-Time Coding
Last night, OpenAI officially released the research preview version of GPT-5.3-Codex-Spark. This is a streamlined version "trimmed" from the main GPT-5.3-Codex model and is also OpenAI's first model specifically designed for real-time coding scenarios.
In terms of positioning, Codex-Spark is not intended to replace the existing Codex but to address its shortcomings in "instant interaction" scenarios. Codex has traditionally excelled at complex, long-running tasks, while Codex-Spark has one clear goal: to compress the interaction latency between humans and the model to a nearly imperceptible level.
This release is also an important phased achievement of the collaboration between OpenAI and the chip startup Cerebras. To reduce its reliance on NVIDIA chips, OpenAI signed an agreement worth over $10 billion last month to use Cerebras' hardware to improve the response speed of its models. Codex-Spark is regarded as the first technological milestone of this collaboration.
Designed for Real-Time: The Core of Codex-Spark is "Speed"
In the official definition, Codex-Spark is a "model specifically designed for real-time use of Codex." It supports targeted editing, logic reshaping, or interface optimization, and users can immediately view the results. Behind this statement lies a reimagining of the interaction mode.
In the traditional AI coding workflow, developers often have to wait for the model to finish a fairly complete round of inference and generation before making the next round of adjustments based on the results. This mode is necessary for complex tasks, but in everyday development, such as small-scale code edits, logic refactoring, and interface styling tweaks, high latency becomes an efficiency bottleneck.
Codex-Spark is designed precisely for these high-frequency, fragmented use scenarios that are extremely sensitive to immediate feedback.
According to OpenAI, the main Codex model excels at long-running tasks and can operate autonomously for hours, days, or even weeks without human intervention. With the addition of Codex-Spark, Codex now supports both complex long-running tasks and near-instant interactive work.
At launch, Codex-Spark has a 128k-token context window and supports text only. During the research preview, Codex-Spark has an independent rate limit, and its usage does not count toward the standard rate limit. However, at times of high demand, users may encounter access restrictions or temporary queuing, since reliability has to be balanced across users.
OpenAI also said that Codex-Spark is optimized for interactive work, where latency is as important as intelligence. Users can collaborate with the model in real-time, interrupt or redirect it at any time during its operation, and iterate quickly to obtain nearly instant responses. Since Codex-Spark focuses on speed, its default working mode is very lightweight: it only performs minimal, targeted editing and will not automatically run tests unless the user actively requests it.
How Strong Is Its Coding Ability?
In terms of evaluation, despite being a small model, Codex-Spark performs strongly across multiple software engineering benchmarks.
Codex-Spark is specifically optimized for fast inference. On SWE-Bench Pro and Terminal-Bench 2.0, two benchmarks that evaluate agents' software engineering capabilities, GPT-5.3-Codex-Spark performs well while completing tasks in far less time than GPT-5.3-Codex.
The estimated duration is the sum of the following: (1) output generation time (number of output tokens ÷ sampling speed), (2) prefill time (number of prefill tokens ÷ prefill speed), (3) total tool execution time, and (4) total network overhead.
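As a rough sketch, the estimate above can be expressed as a small calculation. All numbers below are invented placeholders for illustration, not figures published by OpenAI.

```python
# Illustrative estimate of end-to-end task duration, following the
# four-term breakdown described above. Every number here is a made-up
# placeholder, not a measurement published by OpenAI.

def estimated_duration(output_tokens, sampling_speed,
                       prefill_tokens, prefill_speed,
                       tool_time_s, network_overhead_s):
    """Sum of generation, prefill, tool, and network time (seconds)."""
    generation = output_tokens / sampling_speed   # (1) output generation
    prefill = prefill_tokens / prefill_speed      # (2) prompt prefill
    return generation + prefill + tool_time_s + network_overhead_s

# Hypothetical task: 2,000 output tokens at 1,000 tok/s, a 50,000-token
# prompt prefilled at 20,000 tok/s, 4 s of tool runs, 0.5 s of network.
print(estimated_duration(2000, 1000, 50000, 20000, 4.0, 0.5))  # → 9.0
```

The breakdown makes clear why a faster sampler alone is not enough: prefill, tool execution, and network overhead all contribute to perceived latency.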
So how is this performance achieved? During Codex-Spark's development, OpenAI realized that raw model speed is only part of the real-time experience: what really shapes the user experience is the entire end-to-end path, from the moment the client sends a request, to the appearance of the first visible token, to sustained generation. The R&D team therefore implemented end-to-end latency optimizations in the serving framework, which will benefit all models.
Therefore, OpenAI carried out system-level optimization of Codex's underlying architecture, including:
- simplifying the request path from client to server and the server's response path
- rewriting the critical path in the inference stack
- improving the session initialization mechanism
- introducing persistent WebSocket connections
- optimizing the response API
The quantitative results brought about by these changes include:
- The single-round trip overhead between the client and the server is reduced by 80%
- The processing overhead for each token is reduced by 30%
- The time for the first token to appear is shortened by 50%
Codex-Spark enables the WebSocket path by default, and this transport will gradually become the default configuration for all models.
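The persistent-connection gain above can be illustrated with a toy model: with per-request HTTP, every turn pays connection setup (TCP and TLS handshakes), while a persistent WebSocket pays it once. The timings below are invented for illustration and are not OpenAI's measurements.

```python
# Toy model of why a persistent WebSocket cuts per-turn latency.
# SETUP_MS and TURN_MS are hypothetical placeholder costs, not
# measured values from OpenAI's infrastructure.

SETUP_MS = 120   # hypothetical TCP + TLS handshake cost per new connection
TURN_MS = 40     # hypothetical request/response time on an open connection

def http_total_ms(turns):
    """Each turn opens a fresh connection, so setup cost repeats."""
    return turns * (SETUP_MS + TURN_MS)

def websocket_total_ms(turns):
    """One handshake up front, then every turn reuses the open socket."""
    return SETUP_MS + turns * TURN_MS

for turns in (1, 10, 50):
    print(turns, http_total_ms(turns), websocket_total_ms(turns))
# e.g. 10 turns: 1600 ms via per-request HTTP vs 520 ms over one socket
```

The gap widens with every additional turn, which is exactly the pattern of high-frequency, fragmented interactions Codex-Spark targets.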
This confirms the core positioning of Codex-Spark: it wins not through more complex inference chains but by improving overall efficiency through faster feedback loops.
Developers Care About More Than Just "Faster"
After OpenAI released the research preview version of Codex-Spark for real-time coding scenarios, discussions quickly ensued on X. Compared with the "ultra-low latency" and "instant collaboration experience" emphasized by the official, the focus of the community is significantly more concentrated on one question: Can the model maintain sufficient inference depth and code quality while significantly improving speed?
Judging from the current discussions, the feedback around Codex-Spark is not one-sided but presents several representative voices.
An X user said:
"The real problem is not just speed. The key is whether it can maintain quality under pressure. If the latency is reduced without sacrificing inference depth, it will change the daily workflow."
Some users accused OpenAI of focusing too much on coding performance and neglecting other aspects.
"You've put all your attention on code and those ads that affect the user experience, but this isn't what the vast majority of daily users really care about. You ignore the voice of #Keep4o (keep the 4o model), just like we ignore your lousy new products. Even if you pretend not to see it, we won't stop."
"Faster speed" is certainly good, but the real question is: Can it maintain code quality while being fast?
Some users pointed out that fast but flawed code is useless, while slow but correct code is what matters; they are waiting to see whether Spark can deliver both.
Many users expressed similar views: what is the point of being merely fast? It should at least reach the level of GPT-5.3 Codex. "Otherwise, you'll soon end up with nothing."
2
Google Updates Gemini 3 Deep Think to Tackle Real Scientific Research Challenges
While OpenAI was releasing its new model, Google was not idle.
Google updated its most research-oriented reasoning model, Gemini 3 Deep Think, last night. This update is not a routine capability iteration but a systematic upgrade specifically targeting modern scientific research, engineering modeling, and complex reasoning problems.
Notably, Shunyu Yao, a well-known researcher from the Department of Physics at Tsinghua University who joined Google DeepMind last September, is also one of the core contributors to the new Deep Think model.
From the official positioning, the goal of Gemini 3 Deep Think is not a smoother conversation experience but to solve the "tough problems" that have long troubled researchers and engineers:
These problems often lack a clear solution path and do not have a single correct answer. The data itself is often incomplete, noisy, and even contradictory.
Google said this update was built on long-term collaboration with a large number of scientists and researchers, and the model's design clearly tilts toward real scientific research and engineering practice rather than a display of abstract reasoning ability.
The new Deep Think is now available in the Gemini app for Google AI Ultra subscribers. In addition, Google is opening Deep Think for the first time to selected researchers, engineers, and enterprises through the Gemini API.
Access address for Deep Think: https://forms.gle/eEF5natXTQimPhYH9
Here is how early testers used the latest version of Deep Think:
Lisa Carboni, a mathematician at Rutgers University, studies the mathematical structures needed in high-energy physics to bridge the gap between Einstein's theory of gravity and quantum mechanics. Because the field has little training data available, she used Deep Think to review a highly specialized mathematical paper, and it identified a subtle logical flaw that earlier human peer review had missed.
At Duke University, the Wang Laboratory used Deep Think to optimize the preparation method for complex crystal growth in the search for new semiconductor materials. Deep Think designed a process capable of growing films thicker than 100 microns, a precise target that earlier methods struggled to reach.
Anupam Pathak, the R&D director of Google's Platform and Devices division and former CEO of Liftware, tested the new Deep Think to accelerate the design of physical components.
Enhancing Reasoning Ability with the Rigor of Mathematics and Algorithms
In the previous evaluation paradigm for large models, reasoning ability was usually measured with standardized questions: well-defined problems, clear goals, and a single way to score answers.
What Gemini 3 Deep Think tries to address is a different class of problem: research-style problems.
This type of problem usually has several characteristics:
- No fixed template
- No clear steps
- Complex and incomplete data sources
- The problem-solving process itself may require continuously revising assumptions
Google emphasized in its technical blog that the focus of the Deep Think update is to combine in-depth scientific knowledge with the common sense and methodologies of engineering practice, so the model is no longer confined to theory but moves closer to real-world research workflows.
In terms of improving reasoning ability, mathematics and algorithms remain the core means for Gemini 3 Deep Think.
As early as last year, Google demonstrated a specially customized version of Deep Think that achieved breakthroughs on multiple high-difficulty reasoning tasks and reached gold-medal level in international mathematics and programming competitions. This update continues in that direction.
According to data disclosed by Google, the upgraded Deep Think sets a new state of the art on multiple rigorous academic benchmarks, including:
On Humanity’s Last Exam, without the help of any external tools, it scored 48.4%. This benchmark is considered a high-difficulty test designed specifically to probe the limits of frontier models.
On the ARC-AGI-2 test, Deep Think scored 84.6%, a result officially verified by the ARC Prize Foundation.
On the competitive programming platform Codeforces, the model reached an Elo rating of 3455, an extremely high level on the platform; judging from that score, its coding ability ranks eighth in the world.
In the evaluation on the 2025 International Mathematical Olympiad, its overall performance reached gold-medal level.