
WWDC 2026, Silicon Valley's most expensive admission of defeat: the 1.2-trillion-parameter Siri runs on Gemini, but it won't work on your phone

InfoQ (Geekbang Technology), 2026-06-09 16:28
For Apple, this is a moment to "prove itself."

This might be the last time Tim Cook says the familiar "Good morning" at an Apple press conference.

In the early hours of June 9, Beijing time, Apple held its WWDC 2026 keynote. As previously reported, Tim Cook will step down as Apple's CEO on September 1, so this WWDC is widely expected to be the last major Apple event he hosts as CEO. Before the event, Cook posted a special video on X: a lighthearted, farewell-style warm-up built around his classic opening line.

Over the years, Cook has almost always opened Apple events with the same "Good morning." The simple greeting long ago became a fixed ritual of Apple keynotes and has gradually been turned into all kinds of memes online. In the video, Cook invited a host of film, TV, and variety-show stars to make cameos and deliver this "Good morning" in different ways. It reads as both self-deprecation and a way to build anticipation for this special WWDC.

https://x.com/tim_cook/status/2063973568787226897

In April this year, Cook announced that he would hand the reins of Apple to John Ternus in September. Cook helped make Apple one of the world's most valuable companies, while Ternus is an executive with a background in mechanical engineering who currently oversees the development of Apple's hardware products, including the Mac and iPhone.

Although Apple is widely seen as having stumbled in AI, the company has still achieved great success under Cook's leadership. Over his roughly 15-year tenure as CEO, Apple's split-adjusted stock price has risen by about 2,000%.

Beyond this somewhat farewell-themed opening, however, the real focus of WWDC 2026 is still how Apple will tell its next AI story.

Siri from Scratch

At this WWDC, Apple's most-watched protagonist may no longer be iOS, macOS, or new developer tools, but Siri. Over the past year, Apple Intelligence drew high expectations, yet Siri's most crucial upgrade was delayed again and again. Eventually, Apple realized internally that the problem was not just a delayed feature but a crisis in its AI strategy.

According to Mark Gurman, Apple's senior leadership held a key meeting in early 2025 to address the poor performance of Apple Intelligence and the delays in Siri's overhaul, and ultimately placed the Siri rebuild under a new organizational structure. In other words, the new Siri expected at this WWDC is not a routine product iteration but the result of Apple being forced to accelerate its adjustments amid the generative AI wave.

If the old Siri was more of a voice-command entry point, this time Apple is trying to present a redefined, system-level AI assistant.

According to Apple, the new Siri is not simply the old Siri with a few generative AI features bolted on, but a rebuild spanning everything from the underlying architecture to the interaction model.

"Siri has been completely rebuilt with powerful AI at its core. It takes full advantage of the new Apple Intelligence architecture, including next-generation Apple Foundation Models that run both on device and on servers (using Private Cloud Compute)."

Apple Intelligence now has a second-generation on-device model. In the new experience, dictation has been improved, and personal-context understanding has been integrated into the new system. Apple Intelligence uses Spotlight's semantic index to improve search and support its awareness capabilities. The new system also brings broad world knowledge, App Actions, and newly added on-screen awareness.

In terms of specific apps, Apple Intelligence also brings many practical improvements:

  • Passwords app: It can help users update multiple weak passwords with one click.
  • Messages: It can understand the context of a chat, suggest adding items to Reminders or Notes, and help users find photos mentioned in the conversation.
  • Phone app: It adds call-context capabilities. For example, when a user calls an airline, it can help retrieve relevant information such as a flight confirmation number.
  • Mail and Calendar: Both become more context-aware. For example, Mail can offer more appropriate suggestions based on a message's content, and Calendar can create events directly from natural language, automatically identifying the contacts, locations, and other details involved.

According to Mike Rockwell, Apple's vice president of Siri engineering, the new Siri will be a more capable assistant and will come with a dedicated Siri app.

On the design front, the previously rumored changes were adopted: the colorful glow around the screen edges has been replaced by a dark-themed interface anchored to the Dynamic Island.

The new Siri also demonstrates on-screen awareness. For example, when looking at an Instagram post, users can ask directly about the location mentioned in it; Siri can also recognize contacts without users having to spell out details in every prompt. At least judging by the demo, the new Siri finally delivers the capabilities Apple showed at WWDC 2024, two years ago, and originally planned to ship with iOS 18.

The new Siri's voice is also more personalized: beyond preset voices, it adds options such as speaking rate and intonation.

On iOS, users can swipe down from the Dynamic Island to open Siri or use the existing invocation methods. On Mac, Siri is now integrated with Spotlight, and the menu bar icon is finally monochrome rather than multicolored.

Apple specifically noted that the new dedicated Siri app lets users revisit previous conversations and start new ones. Conversation history syncs privately through iCloud, so conversations carry over seamlessly across devices.

Rent Google's "Brain" to Fix Siri

A bigger shift is that Apple is no longer relying entirely on its own models to catch up.

At the core of Apple Intelligence are the Apple Foundation Models developed jointly with Google. In January this year, Google and Apple announced a multi-year partnership under which Google's Gemini models will become the foundation of Apple's AI system.

Apple described the partnership as "deep" and called it a "huge upgrade" for Apple Intelligence, bringing state-of-the-art understanding and reasoning as well as multimodal support, including image understanding and generation.

According to Gurman, this Siri upgrade is built on a custom 1.2-trillion-parameter model from Google that costs Apple about $1 billion a year.

Consider this: Apple, the company with the highest market value, holds $147 billion in cash and securities. A company that controls everything from silicon wafers to screws, and designs its own chips and modems, officially admitted at this WWDC that it will not build cutting-edge AI models on its own. It chose to rent one from Google instead, a fact that says more about the economics of AI than any benchmark this year.

Why rent instead of build? Look at what it takes to own a cutting-edge model: OpenAI's operating margin is -122%, Anthropic spends about $1.25 billion a month on compute, and training a single cutting-edge model costs hundreds of millions of dollars and has to be repeated every few months. Faced with these costs, Apple chose to step back.
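For a rough sense of scale, annualizing the Anthropic figure and setting it beside the roughly $1 billion a year Apple reportedly pays Google gives the comparison below. Treat it as back-of-the-envelope only: Anthropic's number is compute spending, while Apple's is reportedly a model fee, and Apple still bears the separate cost of running inference on its own servers.

$$
\$1.25\ \text{billion/month} \times 12\ \text{months} = \$15\ \text{billion/year} = 15 \times \$1\ \text{billion/year}
$$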

But Apple didn't give up on compute. The heavy Gemini inference runs on Apple's own Private Cloud Compute servers, not on Google's. Apple rents the model weights but keeps the infrastructure: it pays Google for the model while retaining control of the compute layer. That is the truly important part of this bet.

Apple has 2.5 billion active devices, and a 1.2-trillion-parameter model is about to sit behind Siri serving all of them. That means cutting-edge model inference at planetary scale. It runs in data centers, data centers depend on electricity, and the power grid is not yet ready to bear such a load.
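Two quick calculations, using only the reported figures plus one labeled assumption, show why the fee is modest per device and why the model cannot live on a phone. The second line assumes 16-bit weights (2 bytes per parameter); the article does not state the precision Apple actually uses.

$$
\frac{\$1 \times 10^{9}\ \text{per year}}{2.5 \times 10^{9}\ \text{devices}} = \$0.40\ \text{per device per year}
$$

$$
1.2 \times 10^{12}\ \text{parameters} \times 2\ \text{bytes} = 2.4 \times 10^{12}\ \text{bytes} \approx 2.4\ \text{TB}
$$

That is more than a hundred times the memory of any current iPhone, and even aggressive 4-bit quantization (0.5 bytes per parameter) would still leave about 600 GB of weights, so a model of this size can only be served from data centers.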

Others argue that this is not surrender but a builder's judgment. Owning a cutting-edge model is like stepping onto a treadmill: it must be retrained every few months, most inference loses money, and keeping the service running costs enormous sums. Apple is betting that models will become commodities and that the real moat, and the real difficulty, lies in the compute layer beneath them: chips, power, and cooling, none of which scale on software timelines.

So this partnership is hard to read simply as a victory or a defeat. It is a rare compromise made under the pressure of catching up in AI.

After the keynote, Craig Federighi further explained the boundaries of Apple's partnership with Google in a technical briefing with the media.

On the partnership with Google, Federighi explained:

Of course, we don't use the Gemini app as our app. In fact, when running on iOS, we don't use any of its client-side code. Nor do we use any of the models Google deploys to its own customers, or the infrastructure and methods Google uses to deploy models to them. As for the knowledge base, we of course don't build our system on Google Search or similar products. So I hope this is clear: we use zero components of Google Assistant.

Now, let's talk about what we actually use, and how our system is built.

Everything, of course, starts with our assistant experience. As you saw earlier today, it is deeply integrated into the system: into iOS, iPadOS, and macOS. On iPhone, the assistant emerges from the Dynamic Island in Liquid Glass in a way I think is very beautiful; you can invoke it with the side button or simply by saying "Siri." More importantly, it is woven into scenarios throughout the system, whether you're writing with Writing Tools or working through a context menu.

Connected to this experience is the Siri app, a great entry point that lets you return to a conversation you started earlier, see what you've done before, continue that conversation, or start a new one. But the app doesn't simply call a model in the cloud; it's built on the powerful system software in Apple Intelligence.

This includes the System Orchestrator, which is key to the privacy architecture of our entire system. It coordinates each request: accessing actions in your apps through the App Toolbox, retrieving personal content through the Spotlight Semantic Index to help fulfill the request, and even using on-screen context to understand what you're looking at when you make it.

All of this is built on a set of powerful on-device models. They handle a wide range of tasks: understanding your voice and synthesizing speech to respond to you, visually understanding the environment and screen context, judging whether relevant content is present, understanding on-screen text, and a whole set of other capabilities.

In this deployment model, we have a family of models, the third-generation Apple Foundation Models, ranging from AFM Cloud and AFM Cloud Pro to the AFM Fusion model and the image model. These models are the result of our collaboration with Google, and you'll hear more about them shortly. But their architecture is designed to run on our own deployment architecture, and they are built specifically for the Apple Intelligence experience.
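Apple did not explain how the System Orchestrator decides which model handles a given request. The sketch below is a purely hypothetical illustration, assuming simple rule-based routing, of how requests might be dispatched between on-device models and Private Cloud Compute tiers. Only the model names and where each one runs come from Apple's presentation (Subramanya's list that follows); the `Request` fields, the priority order, and the `LONG_PROMPT_WORDS` threshold are invented for illustration.

```python
from dataclasses import dataclass

# Where each tier runs, as described in Apple's presentation. The routing
# rules below are NOT Apple's; they are invented for illustration.
TIERS = {
    "AFM Core": "on device",
    "AFM Core Advanced": "on device",
    "AFM Cloud": "Private Cloud Compute",
    "AFM Cloud Image": "Private Cloud Compute",
    "AFM Cloud Pro": "Private Cloud Compute",
}

# Hypothetical cutoff: prompts longer than this go to the server-side model.
LONG_PROMPT_WORDS = 40


@dataclass
class Request:
    """Hypothetical request descriptor; every field is an assumption."""
    text: str
    wants_image: bool = False         # image generation or editing
    uses_tools: bool = False          # agentic tool use / complex reasoning
    has_screen_context: bool = False  # request refers to what is on screen


def route(req: Request) -> str:
    """Pick a model tier with simple, made-up priority rules."""
    if req.uses_tools:
        return "AFM Cloud Pro"       # most capable tier, for the hardest tasks
    if req.wants_image:
        return "AFM Cloud Image"     # image generation and editing model
    if req.has_screen_context:
        return "AFM Core Advanced"   # natively multimodal, stays on device
    if len(req.text.split()) > LONG_PROMPT_WORDS:
        return "AFM Cloud"           # main server-side model
    return "AFM Core"                # default on-device model


if __name__ == "__main__":
    examples = [
        Request("Set a timer for ten minutes"),
        Request("Where was this place?", has_screen_context=True),
        Request("Find my flight and add it to my calendar", uses_tools=True),
    ]
    for r in examples:
        model = route(r)
        print(f"{r.text!r} -> {model} ({TIERS[model]})")
```

Checking for tool use first mirrors the description of AFM Cloud Pro as the tier for agentic tasks; a real orchestrator would presumably also weigh privacy, latency, and device capability, none of which this sketch models.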

Amar Subramanya, Apple's vice president of AI, added:

"We're very excited to launch the third-generation Apple Foundation Models (AFM), built in collaboration with Google. We've built a family of models that spans from device to cloud. Before walking through them one by one, I want to stress the core point of this generation: every model in it improves significantly over the previous generation in both quality and capability.

Taking them one by one, let's start with the on-device models. First is AFM Core, the next-generation on-device model that ships with our devices, using a dense architecture.

Next is AFM Core Advanced, a kind of model we've never run on device before. It uses a sparse architecture and is natively multimodal. As a result, its capabilities have taken a huge leap, powering some of the features you heard about this morning, such as invitation-related capabilities and more expressive voices, all running entirely on device.

Now for the server-side models, all of which are served through Private Cloud Compute. First is AFM Cloud, our main server-side model, optimized primarily for latency and serving cost.

Next is AFM Cloud Image, our next-generation image generation and editing model, which powers many great experiences, including the spatial reframing you heard about this morning.

The four models above, AFM Core, AFM Core Advanced, AFM Cloud, and AFM Cloud Image, are all custom-built for Apple silicon, trained on proprietary data, and refined using Gemini's cutting-edge models.

Finally, for the most demanding tasks, such as agentic tool use and complex reasoning, we have AFM Cloud Pro, our most capable model, with quality approaching that of Gemini's cutting-edge model.

To bring this model into production, we worked with Google and Nvidia to expand the Private Cloud Compute infrastructure