
Microsoft Build in One Article: 9 Self-Developed Models, a Windows Version of "Lobster", a "Dream Machine", and High Praise from Jensen Huang

Zhidx, 2026-06-03 21:32
Microsoft is paving the way and building bridges for the era of intelligent agents.

According to a report by Zhidx on June 3rd, early this morning at the Microsoft Build 2026 Developer Conference, Microsoft unveiled over 20 significant updates, including 9 self-developed models, new PC products in collaboration with NVIDIA, the Windows version of "Lobster", and over 10 agent applications and development tools.

Jensen Huang, the founder and CEO of NVIDIA, joined remotely from Taipei late at night for a conversation with Satya Nadella, the chairman and CEO of Microsoft. Huang said that AI infrastructure has entered the agent era and that Microsoft and NVIDIA are jointly defining the next-generation AI computing platform.

"If we can maximize computing power and memory, what kind of 'dream machine' for developers can we create?"

Nadella used this question to introduce Microsoft's headline PC product: the Surface RTX Spark Dev Box, a desktop workstation equipped with NVIDIA's RTX Spark superchip. It delivers 1 PFLOPS of AI compute and can run large models with 120 billion parameters locally.

Surface RTX Spark Dev Box

In addition to the new PC, Microsoft also showcased two new AI hardware devices designed for the multi-agent world: one is an AI wearable device that can be worn on the chest, and the other is an AI desktop companion. They are mainly used to access agents in a low-cost, plug-and-play, and responsive manner.

It's worth mentioning that after supporting OpenAI for seven years, Microsoft's AI superintelligence team has finally made its move, launching 7 self-developed large models. The first flagship reasoning model, MAI-Thinking-1, performs comparably to Claude Opus 4.6, and the image model MAI-Image-2.5 scored higher than Google's Nano Banana 2 in blind tests.

Microsoft's first autonomous agent, Microsoft Scout, also made its official debut. It is built on OpenClaw, adopts an enterprise-level security architecture, and can directly operate computers and daily software. Nadella called it the enterprise-level "Lobster".

A Windows version of OpenClaw is also available in preview. Microsoft worked with OpenClaw to pair it with Microsoft MXC, helping IT administrators secure agents more easily.

In addition, Microsoft released the next-generation quantum computing chip, Majorana 2. The reliability of its qubits is 1000 times that of the previous generation, with an average lifespan of 20 seconds and sometimes even exceeding 1 minute. Microsoft expects to achieve a truly scalable quantum computer by 2029.

01. A Mini Workstation with 128GB of Unified Memory and a "Developer-Tuned" Windows 11

The collaboration between Microsoft and NVIDIA was a highlight at this Build Conference. The Surface Laptop Ultra launched by Microsoft earlier this week was the first to feature NVIDIA's RTX Spark superchip and is expected to be available this fall.

For developers, Microsoft created the Surface RTX Spark Dev Box, a desktop workstation built on NVIDIA's RTX Spark superchip with 1 PFLOPS of AI compute, 20 CPU cores, and 128GB of unified memory. It will also launch this fall.

Microsoft said that the Surface RTX Spark Dev Box can run models with 120 billion parameters and a 1-million-token context locally, and can also fine-tune models.
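
As a rough sanity check on that claim, the weights of a 120-billion-parameter model can be compared against the 128GB of unified memory. The sketch below is a back-of-the-envelope estimate, not Microsoft's method: the article does not say which numeric precision is used, so the three precisions listed are assumptions, and the calculation counts weights only, ignoring the KV cache, activations, and the operating system.

```python
# Back-of-the-envelope estimate of weights-only memory for a 120B-parameter model.
# Assumptions (not from the article): the precisions below, decimal gigabytes,
# and no allowance for KV cache, activations, or OS overhead.

PARAMS = 120e9            # 120 billion parameters (from the article)
UNIFIED_MEMORY_GB = 128   # unified memory of the Dev Box (from the article)

# Hypothetical precisions; the article does not state which one is used.
BITS_PER_PARAM = {"fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gb(params: float, bits: int) -> float:
    """Return the weights-only footprint in decimal gigabytes."""
    return params * bits / 8 / 1e9


def fits_in_memory(params: float, bits: int, memory_gb: float) -> bool:
    """Return True if the weights alone fit in the given memory budget."""
    return weight_memory_gb(params, bits) <= memory_gb


if __name__ == "__main__":
    for name, bits in BITS_PER_PARAM.items():
        gb = weight_memory_gb(PARAMS, bits)
        verdict = "fits" if fits_in_memory(PARAMS, bits, UNIFIED_MEMORY_GB) else "does not fit"
        print(f"{name}: {gb:.0f} GB of weights, {verdict} in {UNIFIED_MEMORY_GB} GB")
```

Under these assumptions, 16-bit weights (240 GB) would exceed the machine's memory, while 8-bit (120 GB) and 4-bit (60 GB) weights would fit, which suggests that running a model of this size locally relies on some form of reduced precision.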

Heat dissipation is particularly crucial for such high-performance desktop devices. The Surface RTX Spark Dev Box uses an aluminum casing that also serves as a heat sink.

The Surface RTX Spark Dev Box comes with Windows 11 optimized for developers, pre-configured with all necessary development tools, such as Visual Studio Code, GitHub Copilot embedded in the Windows Terminal, WSL, and PowerShell 7.

Its settings have also been tuned for development: no news feeds, no pop-up widgets, no notifications, and dark mode enabled by default. This configuration can be deployed with one click through the Windows Developer Config project on GitHub.

Windows 11 Optimized for Developers

In terms of security, the Surface RTX Spark Dev Box has a Secure Core PC architecture, BitLocker encryption, and Microsoft Defender protection. It also provides Entra ID and Intune functions for enterprises, enabling large-scale management.

02. Collaborating with Qualcomm and MediaTek to Launch Reference Designs, Agents' Hardware to Enter Work Scenarios Directly

Steven Bathiche, the global vice president of Microsoft and the head of the Applied Sciences Group, shared Microsoft's exploration in new hardware, Project Solara, and released two hardware reference designs.

Bathiche believes that the next-generation computing device is not necessarily a computer, a mobile phone, or a pair of glasses, but a system composed of multiple devices working together. Users' agents will appear at the most appropriate time and place according to the scenario and complete tasks in different forms.

Project Solara is an agent-first device platform built on three foundations: first, Microsoft's device ecosystem for enterprise-level deployment; second, a new interaction model driven by agents; and finally, the ability to expand, allowing enterprises to connect their own agents.

At the Build Conference, Microsoft showcased two types of reference devices for the first time.

The first type is a fixed device.

This is an agent terminal designed for the desk scenario. It uses the MediaTek platform and supports Windows Hello enterprise identity authentication. Users can log in securely and directly access their agent services when approaching the device.

The device will continuously provide information and suggestions based on the user's work context. For example, it can remind users of the most important to-do items of the day, assist in planning work processes, and even help users directly delegate tasks to agents for execution.

Meanwhile, this device can also seamlessly collaborate with Windows PCs and Windows 365 cloud computers, becoming a supplement to the existing office environment.

The second form is more flexible than the fixed device: a prototype wearable similar to a digital work badge.

This device is built on the Qualcomm Snapdragon platform and is smaller than traditional mobile terminals. In the on-site demonstration, after Bathiche completed identity verification with his fingerprint, he directly called his personal agent to perform tasks. He pressed the recording button, and the side camera of the device started to capture the on-site footage. Then he gave instructions to the agent: organize the on-site materials of the Build Conference, generate content, and send it to the team for review.

Subsequently, the entire process was automatically completed by the agent, including content collection, organization, archiving, and distribution.

Bathiche emphasized that the value of such devices lies not in the hardware itself but in the ability to bring agent capabilities directly to the place where work occurs.

Taking the medical scenario as an example, nurses wearing the device can interact with agents by voice in real time, and the agents automatically handle medical record keeping, voice transcription, speaker identification, and the organization of nursing records. The same capabilities also apply to many other industries, such as retail, manufacturing, financial services, and legal services.

These devices are a type of reference design. Enterprises can adjust the appearance, screen size, sensor configuration, and even input methods on the basis of the same software architecture and load their own agent systems to quickly build dedicated devices for specific scenarios.

Microsoft revealed that several enterprises have started to participate in related explorations. Companies such as Best Buy, CVS Health, Levi's, and Target are all researching how to introduce agent devices into their business processes.

Nadella concluded that the most important significance of Project Solara is not to launch a new platform but to redefine a set of platform rules, allowing developers and enterprises to freely imagine where agents should exist and in what form they will appear.

03. MAI Launches 7 Self-Developed Models, Flagship Reasoning Model Rivals Claude Opus 4.6

At this Build Conference, Microsoft's AI research department, MAI, released 7 new models in one batch, covering a flagship reasoning model, a programming model, image generation models, a speech recognition model, and voice models.

Microsoft's first reasoning model, MAI-Thinking-1, uses a Mixture of Experts (MoE) architecture with 35 billion active parameters out of approximately 1 trillion in total, and has a 256K context window.

Mustafa Suleyman, the CEO of Microsoft AI, said that in comparison tests with mainstream industry models, MAI-Thinking-1 obtained a higher preference rate in human evaluations. In multiple reasoning benchmarks, its performance reached the leading level in the industry.

Notably, on the software engineering benchmark SWE-bench Pro, the model scored 53%, on par with Claude Opus 4.6.

Suleyman emphasized: "MAI-Thinking-1 was trained from scratch, not optimized for specific benchmarks, and did not use distillation technology." This means that the model has a clear, traceable, and commercially licensed data source, making it more suitable for deployment in enterprise production environments.

In terms of image models, Microsoft launched MAI-Image-2.5 and its lightweight version, MAI-Image-2.5-Flash. The two models achieved breakthroughs in image quality and editing capabilities. MAI-Image-2.5 ranked second in the image editing leaderboard of the large model arena, surpassing many mainstream competitors.

Currently, the two models have been integrated into PowerPoint and are being promoted to OneDrive. They are also available on Azure Foundry.

Microsoft also released the next-generation speech transcription model, MAI-Transcribe-1.5. According to the data released by Microsoft, this model supports 43 languages, reaches the leading level in transcription accuracy, and outperforms existing flagship models in multiple tests.

In the field of speech generation, Microsoft released MAI-Voice-2 and MAI-Voice-2-Flash.

MAI-Voice-2 supports 15 languages and offers more natural intonation and emotional expression, along with fine-grained control. The Flash version mainly targets real-time voice agent scenarios, meeting enterprise needs with lower latency and higher efficiency.

Microsoft also launched MAI-Code-1-Flash, which is specifically optimized for programming tasks.

Although it only has 5 billion parameters, this model achieved a score of 51% in the SWE-bench Pro test, showing high inference efficiency. The model is deeply optimized for VS Code and GitHub Copilot CLI and can provide code generation and inference capabilities close to those of large models at a lower cost.

In addition to Azure Foundry, Microsoft announced that it will provide the MAI series of models to multiple third-party AI ecosystem platforms, including OpenRouter, Fireworks AI, Baseten, etc. Developers can directly obtain model weights on these platforms in the future and perform personalized fine-tuning.

Suleyman also highlighted Microsoft's latest progress in the co-design of models and chips.

MA