Zyphra trains the ZAYA1 Mixture of Experts model entirely on AMD Instinct MI300X hardware

ZAYA1 Base surpasses Llama 3 8B and OLMoE in multiple evaluated benchmarks

High-bandwidth memory simplifies training, and optimized distributed I/O delivers more than ten times faster model save times

AMD and IBM provide the jointly engineered infrastructure that enables large-scale pretraining

AMD Hardware Drives ZAYA1 Development

AMD reported on Nov. 24, 2025, that Zyphra completed training of the ZAYA1 Mixture of Experts foundation model entirely on AMD GPUs and networking. The system used AMD Instinct MI300X GPUs combined with AMD Pensando networking and the ROCm open software stack. Zyphra detailed the results in a technical report released on the same day.

The company positioned ZAYA1 as the first large-scale Mixture of Experts (MoE) model trained exclusively on the AMD platform. According to Zyphra, the model delivers competitive or superior results on reasoning, mathematics, and coding benchmarks. The companies present these results as validation of the platform’s throughput and scalability in production AI environments.

Zyphra emphasized that its collaboration with AMD was central to this achievement. The report outlines a jointly developed workflow that integrates software and hardware contributions to streamline training of a model of this size.

Memory Capacity Shapes Training Efficiency

The AMD Instinct MI300X GPU includes 192 gigabytes of high-bandwidth memory. Zyphra credits this capacity with a more efficient training process, noting that it removed the need for expert or tensor sharding. Eliminating that requirement reduced operational complexity and supported stable throughput across the entire model stack.
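To see why 192 gigabytes matters for a model of this size, a back-of-envelope estimate helps. The sketch below assumes a common mixed-precision Adam layout of roughly 16 bytes per parameter (bf16 weights and gradients, fp32 master weights, and two fp32 Adam moments). That breakdown is an assumption, not Zyphra's reported configuration, and the estimate ignores activations and framework overhead.

```python
# Back-of-envelope per-GPU memory estimate for holding a full ZAYA1-sized
# training state on one MI300X. The bytes-per-parameter breakdown below is a
# common rule of thumb and an ASSUMPTION, not Zyphra's reported setup.
# Activations, temporary buffers, and framework overhead are ignored.

TOTAL_PARAMS = 8.3e9     # total parameters reported for ZAYA1 Base
HBM_PER_GPU_GB = 192     # MI300X high-bandwidth memory capacity (GB)

# Assumed bytes per parameter for mixed-precision Adam training.
BYTES_PER_PARAM = {
    "bf16 weights": 2,
    "bf16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam first moment": 4,
    "fp32 Adam second moment": 4,
}


def training_state_gb(num_params: float, bytes_per_param: dict[str, int]) -> float:
    """Return weights + gradients + optimizer state in decimal gigabytes."""
    return num_params * sum(bytes_per_param.values()) / 1e9


state_gb = training_state_gb(TOTAL_PARAMS, BYTES_PER_PARAM)
headroom_gb = HBM_PER_GPU_GB - state_gb

print(f"model + optimizer state: {state_gb:.1f} GB")
print(f"HBM left for activations: {headroom_gb:.1f} GB")
```

Under these assumptions the full training state comes to about 132.8 GB, which fits within a single GPU's 192 GB with room left for activations. That is consistent with Zyphra's point that expert and tensor sharding were unnecessary, though the real memory budget depends on precision choices and any optimizer-state partitioning the team used.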

The company also reports model save times more than ten times faster, which it credits to AMD-optimized distributed I/O implementations. Faster saves make frequent checkpointing cheaper, which improves training reliability by limiting the work lost to a failure during long-duration runs.

ZAYA1 Base has 8.3 billion total parameters, of which only 760 million are active for any given token. Zyphra states that this configuration matches or exceeds the performance of models including Qwen3 4B from Alibaba, Gemma3 12B from Google, Llama 3 8B from Meta, and OLMoE. These comparisons come directly from Zyphra’s own benchmark evaluations.
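The gap between total and active parameters is what makes the MoE design efficient at inference and training time. The sketch below computes the active fraction from the reported figures and compares per-token forward compute against a hypothetical dense model of the same total size. It relies on the common approximation of about 2 FLOPs per parameter per token for a forward pass, which is an assumption that ignores attention-score compute and router overhead.

```python
# Compare per-token forward compute for an MoE model versus a hypothetical
# dense model with the same total parameter count. Uses the common
# ~2 FLOPs per parameter per token approximation (an ASSUMPTION; it ignores
# attention-score compute and router overhead).

TOTAL_PARAMS = 8.3e9    # ZAYA1 Base total parameters (reported)
ACTIVE_PARAMS = 760e6   # parameters active per token (reported)


def forward_flops_per_token(params_used: float) -> float:
    """Approximate forward-pass FLOPs per token as 2 * parameters used."""
    return 2.0 * params_used


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"active fraction: {active_fraction:.1%}")
print(f"MoE forward compute:   {moe_flops / 1e9:.2f} GFLOPs/token")
print(f"dense forward compute: {dense_flops / 1e9:.2f} GFLOPs/token")
print(f"ratio (dense / MoE): {dense_flops / moe_flops:.1f}x")
```

Roughly 9.2 percent of the parameters participate in each token, so per-token compute is about an order of magnitude lower than for a dense 8.3 billion parameter model. All 8.3 billion parameters still have to be stored, which is why memory capacity remains important for MoE training.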

Collaborative Infrastructure Enables Large Scale Training

Zyphra notes that the project builds on prior collaborative work with AMD and IBM. Together, the companies designed and deployed a training cluster engineered specifically for large-scale workloads. The platform pairs AMD Instinct GPUs with AMD Pensando networking to provide the interconnect performance that MoE pretraining requires.

IBM Cloud contributed a high-performance fabric and storage architecture that supports the sustained bandwidth demands of long-duration AI training. The system was jointly announced earlier in the quarter and served as the operational foundation for ZAYA1’s development cycle.

The integrated approach provided the throughput and stability needed to complete training at the described scale. Zyphra cites this as a demonstration of the system’s effectiveness for future production AI workloads.

Positioning ZAYA1 Within Open Model Benchmarks

Zyphra describes ZAYA1 as competitive with multiple open models across core evaluation areas, and attributes its performance in reasoning, mathematics, and coding to the MoE architecture and to efficiency gains from AMD hardware.

The technical report highlights that ZAYA1 Base activates only a fraction of its parameters (roughly 9 percent, or 760 million of 8.3 billion) for each token while sustaining benchmark scores near or above those of larger models. Zyphra presents this as evidence that efficiency, rather than model size alone, drives the results.
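How an MoE model runs only part of its parameters per token can be illustrated with generic top-k softmax gating, a widely used routing scheme. This is a toy sketch, not ZAYA1's router: the article does not describe Zyphra's routing design, and the expert count, k, and hidden size below are hypothetical values chosen for illustration.

```python
import math
import random

NUM_EXPERTS = 16  # hypothetical; not ZAYA1's actual expert count
TOP_K = 2         # hypothetical; not ZAYA1's actual routing width
D_MODEL = 8       # toy hidden size


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(seed: int):
    """Build a toy expert: a random D_MODEL x D_MODEL linear map."""
    rng = random.Random(seed)
    w = [[rng.gauss(0, 0.1) for _ in range(D_MODEL)] for _ in range(D_MODEL)]

    def expert(x: list[float]) -> list[float]:
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

    return expert


def moe_forward(x, experts, router_w, k):
    """Route one token to its top-k experts and mix their outputs.

    Returns the gated output and the indices of the experts that ran.
    """
    # One router logit per expert: dot product of its router row with x.
    logits = [sum(r * xi for r, xi in zip(row, x)) for row in router_w]
    chosen = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    gates = softmax([logits[e] for e in chosen])  # renormalize over chosen experts
    out = [0.0] * len(x)
    for e, g in zip(chosen, gates):
        y = experts[e](x)  # only the selected experts do any compute
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, chosen


rng = random.Random(0)
experts = [make_expert(seed) for seed in range(NUM_EXPERTS)]
router_w = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(NUM_EXPERTS)]
token = [rng.gauss(0, 1) for _ in range(D_MODEL)]

output, used = moe_forward(token, experts, router_w, TOP_K)
print(f"experts used: {used} ({len(used)} of {NUM_EXPERTS} experts active)")
```

In a real model, shared components such as attention layers and embeddings run for every token, so the overall active fraction is not simply k divided by the number of experts.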

The announcement strengthens AMD’s position in the broader AI hardware landscape. By supporting ZAYA1 with both GPU hardware and software optimization, AMD aims to demonstrate that its platform is viable for production-scale workloads.