The Huawei version of CUDA has been fully open-sourced.
According to the latest news, Huawei has announced the open-sourcing of its CANN software toolkit for its Ascend AI GPUs.
Xu Zhijun, Huawei's rotating chairman, emphasized in his keynote speech that computing power is the core of Huawei's AI strategy, and that the company remains committed to realizing the value of its Ascend hardware.
Against this backdrop, Xu Zhijun announced at the meeting that CANN, the enabling software for Ascend hardware, is now fully open-sourced and accessible, and that the Mind series of application-enablement kits and toolchains are fully open-sourced as well. This lets users explore the platform in depth on their own and carry out customized development, accelerating the pace of developer innovation and making Ascend friendlier and easier to use.
CANN, a neural network computing architecture, provides multi-level programming interfaces to help users build AI applications targeting Huawei's Ascend.
It is a software ecosystem composed of various software stacks and operator acceleration libraries. In other words, it is the Huawei version of CUDA, playing the same role for Ascend that CUDA plays for NVIDIA GPUs.
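To make the "multi-level programming interfaces" idea concrete, here is a toy Python sketch. None of these names are real CANN APIs; it only illustrates how a CUDA-like stack layers a high-level, framework-facing API over mid-level operators and a low-level device runtime.

```python
# Toy illustration only -- not real CANN (or CUDA) APIs.

class DeviceRuntime:
    """Lowest level: device buffers and kernel launch (illustrative)."""
    def __init__(self):
        self.memory = {}

    def alloc(self, name, data):
        # Stand-in for a device memory allocation + host-to-device copy.
        self.memory[name] = list(data)

    def launch(self, kernel, *buffer_names):
        # Stand-in for a kernel launch on the named device buffers.
        bufs = [self.memory[n] for n in buffer_names]
        return kernel(*bufs)

# Mid level: hand-written "operators" built on the runtime.
def vector_add(a, b):
    return [x + y for x, y in zip(a, b)]

# High level: a framework-facing API that hides the runtime entirely.
class Session:
    def __init__(self):
        self.rt = DeviceRuntime()

    def add(self, a, b):
        self.rt.alloc("a", a)
        self.rt.alloc("b", b)
        return self.rt.launch(vector_add, "a", "b")

print(Session().add([1, 2], [3, 4]))  # [4, 6]
```

A framework like PyTorch would target only the top layer, while performance tuners can drop down to the operator or runtime level; that is the sense in which CANN, like CUDA, exposes multiple levels.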
Coincidentally, on the same day, a startup founded by a legendary GPU expert came out of stealth. It doesn't focus on consumer-grade GPUs but instead develops a software ecosystem similar to CUDA.
It seems that there are quite a few players willing to challenge NVIDIA.
The Huawei version of CUDA is fully open-sourced
In the past, developers have long suffered from CUDA's closed ecosystem.
CUDA supports essentially no hardware other than NVIDIA's own. Developers who want to build software on CUDA can therefore only use NVIDIA's GPUs, and this is precisely NVIDIA's core moat.
Once developers want to migrate to other products, they need to rewrite code, fall back on relatively immature alternative libraries, and lose the support of the large technical community NVIDIA has built around CUDA.
Previously, some projects tried to bring CUDA functionality to other GPU vendors through a translation layer, but most were unsuccessful due to NVIDIA's obstruction: license terms introduced with CUDA 11.6, which drew wide attention in 2024, prohibit using such translation layers to run CUDA software on other hardware.
Now, at the Ascend Computing Industry Development Summit, Huawei announced the open-sourcing of the CANN architecture, and the Mind series of application-enablement kits and toolchains are also open-sourced. Developers can independently explore the potential of Ascend GPUs in depth.
Currently, CANN is in its 8.x generation and is offered in two editions: a community edition, which provides early access to new features, and a commercial edition, which provides a stable release for enterprise users. Both have been updated to version 8.2.RC1, adding support for 12 operating systems.
Complementary to CANN is Huawei's self-developed deep-learning framework, MindSpore, which plays a role similar to PyTorch's. Together, these tools form Huawei's native AI software-and-hardware solution.
As of now, CANN supports deep-learning frameworks and third-party libraries such as PyTorch, MindSpore, TensorFlow, PaddlePaddle, ONNX, Jittor, OpenCV, and OpenMMLab.
At the meeting, industry representatives and Huawei jointly launched the "Initiative for the Co-construction of the CANN Open-Source and Accessible Ecosystem".
It seems that Huawei has started to take significant action in building an open-sourced and accessible Ascend ecosystem.
A legendary GPU architect starts a business, targeting NVIDIA's CUDA
There are also many players in the industry challenging NVIDIA's CUDA ecosystem.
For example, Raja Koduri, a legendary GPU architect, announced the establishment of a GPU startup, Oxmiq Labs.
He has worked at AMD, Apple, and Intel, where he served as executive vice-president of the Accelerated Computing Systems and Graphics (AXG) group. Before joining Intel, he was senior vice-president and chief architect of AMD's graphics division, the Radeon Technologies Group.
The company he founded focuses on developing GPU hardware and software IP and licensing it to various parties. He positions it as Silicon Valley's first GPU startup in 25 years.
However, it doesn't build consumer-grade GPUs, nor does it develop all the IP blocks a GPU requires. Instead, it provides a vertically integrated platform that combines GPU hardware IP with a fully functional software stack, aimed at AI, graphics, and multimodal workloads where explicit parallel processing is crucial.
On the hardware side, Oxmiq provides a GPU IP core, OxCore, based on the RISC-V instruction set architecture (ISA). The core integrates scalar, vector, and tensor compute engines in a modular architecture and supports near-memory and in-memory computing.
Oxmiq also provides OxQuilt, a chiplet-based system-on-chip (SoC) builder that lets customers quickly and cost-effectively assemble SoCs from Compute Cluster Bridge (CCB, presumably integrating OxCores), Memory Cluster Bridge (MCB), and Interconnect Cluster Bridge (ICB) modules according to specific workload requirements.
For example, an inference accelerator for edge applications might package just one or two CCBs and one ICB; a larger inference SoC requires more CCBs, MCBs, and ICBs; and a large-scale SoC for AI training may package dozens of chiplets.
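The scaling pattern above can be sketched as simple chiplet-count arithmetic. The CCB/MCB/ICB module names come from the text; the class and every specific count here are invented for illustration and are not Oxmiq figures.

```python
# Hypothetical sketch of OxQuilt-style SoC composition.
# Module names (CCB/MCB/ICB) are from the article; the counts are made up.
from dataclasses import dataclass


@dataclass(frozen=True)
class SoCConfig:
    ccb: int  # Compute Cluster Bridges (compute chiplets)
    mcb: int  # Memory Cluster Bridges (memory chiplets)
    icb: int  # Interconnect Cluster Bridges (I/O chiplets)

    @property
    def total_chiplets(self) -> int:
        return self.ccb + self.mcb + self.icb


# Presets mirroring the article's three examples (illustrative counts).
edge_inference = SoCConfig(ccb=2, mcb=0, icb=1)       # "one or two CCBs, one ICB"
datacenter_inference = SoCConfig(ccb=8, mcb=4, icb=2)  # "more CCBs, MCBs, and ICBs"
training = SoCConfig(ccb=24, mcb=16, icb=8)            # "dozens of chiplets"

for cfg in (edge_inference, datacenter_inference, training):
    print(cfg, "->", cfg.total_chiplets, "chiplets")
```

The point of such a builder is that one set of validated chiplet modules spans the whole range from a 3-chiplet edge part to a 48-chiplet training SoC, instead of a separate monolithic design per market.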
Oxmiq has not yet revealed whether OxQuilt only supports building multi-chiplet systems-in-package (SiP) or can also be used to assemble monolithic processors.
However, the software business appears to be the real core. The software package Oxmiq provides is compatible with third-party hardware, supporting deployment of AI and graphics workloads across a variety of hardware platforms.
The core of the software stack is OXCapsule, a unified runtime and scheduling layer for managing workload distribution, resource balancing, and hardware abstraction.
A prominent component of the stack is OXPython, a compatibility layer that converts CUDA-centric workloads to Oxmiq's runtime and allows Python-based CUDA applications to run on non-NVIDIA hardware without modification and without recompilation.
OXPython will initially be released not on Oxmiq's own IP but on Tenstorrent's Wormhole and Blackhole AI accelerators.
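The general shape of such a compatibility layer can be illustrated with a toy model. This is not OXPython's actual mechanism; it only shows how application code written once against a CUDA-like interface can run unmodified when a different backend is plugged in behind that interface.

```python
# Toy model of a compatibility layer -- not OXPython's real design.

def cuda_style_saxpy(api, a, x, y):
    """'Application' code written once against a CUDA-like interface.

    It never names a vendor: it only calls methods on the `api` object.
    """
    buf_x = api.to_device(x)
    buf_y = api.to_device(y)
    return api.saxpy(a, buf_x, buf_y)


class CPUBackend:
    """Stand-in for a non-NVIDIA backend sitting behind the compat layer."""

    def to_device(self, data):
        # No real device here; a copy stands in for a host-to-device transfer.
        return list(data)

    def saxpy(self, a, x, y):
        # a*x + y, the classic SAXPY kernel, executed by this backend.
        return [a * xi + yi for xi, yi in zip(x, y)]


# The same application function runs on whichever backend is supplied.
print(cuda_style_saxpy(CPUBackend(), 2.0, [1, 2], [10, 20]))  # [12.0, 24.0]
```

A real translation layer must of course intercept an existing API's entry points rather than a cooperatively passed-in object, which is exactly the part that NVIDIA's license terms restrict for CUDA itself.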
In fact, Oxmiq's software stack is fundamentally designed to be independent of Oxmiq hardware, which is a core part of its strategy.
Regardless of the final outcome, the competition has begun, and ultimately, it is the developers who will benefit.
Reference links:
[1]https://x.com/RajaXg/status/1952633159818060164
[2]https://www.tomshardware.com/tech-industry/artificial-intelligence/huawei-is-making-its-ascend-ai-gpu-software-toolkit-open-source-to-better-compete-against-cuda
[3]https://www.tomshardware.com/tech-industry/artificial-intelligence/legendary-gpu-architect-raja-koduris-new-startup-leverages-risc-v-and-targets-cuda-workloads-oxmiq-labs-supports-running-python-based-cuda-applications-unmodified-on-non-nvidia-hardware
[4]https://mp.weixin.qq.com/s/cK7REZ9_ToHPEq4iyWoRqA
This article is from the WeChat official account "QbitAI", author: Hong Jiao. Republished by 36Kr with authorization.