Microsoft’s AI data center strategy enables seamless, large-scale deployment of NVIDIA Rubin

CES 2026 showcases the arrival of the NVIDIA Rubin Platform along with proven Azure deployment readiness.

Microsoft’s long-range data center strategy was designed for just such moments, when NVIDIA’s next-generation systems slide right into infrastructure that anticipated their power, heat, memory, and networking requirements years ahead of the industry. Our long-term collaboration with NVIDIA ensures that Rubin fits right into the design of the Azure platform.

Building for the future

Azure AI data centers are designed for the future of accelerated computing, enabling seamless integration of NVIDIA Vera Rubin NVL72 racks across the largest next-generation Azure AI superfactories, from the current Fairwater sites in Wisconsin and Atlanta to future locations.

The latest NVIDIA AI infrastructure requires significant upgrades in power, cooling, and performance optimization. Azure’s experience with our Fairwater sites, and with several upgrade cycles over the years, demonstrates our ability to flexibly improve and expand AI infrastructure as the technology advances.

Azure’s proven track record of delivering scale and performance

Microsoft has years of proven experience designing and deploying scalable AI infrastructure that evolves with each major advance in AI technology. With each successive generation of NVIDIA’s accelerated computing infrastructure, Microsoft has rapidly integrated NVIDIA’s innovations and delivered them at scale. Our early large-scale deployment of NVIDIA Ampere and Hopper GPUs, connected via the NVIDIA Quantum-2 InfiniBand network, was instrumental in bringing models like GPT-3.5 to life, while other clusters set supercomputing performance records, proving that we can bring next-generation systems online faster, and with higher real-world performance, than the rest of the industry.

We unveiled the first and largest deployments of the NVIDIA GB200 NVL72 and NVIDIA GB300 NVL72 platforms, with racks linked into single supercomputers that train AI models significantly faster, helping Azure remain the best choice for customers seeking advanced AI capabilities.

Azure’s systems approach

Azure is designed so that compute, networking, storage, software, and infrastructure all work together as one integrated platform. In this way, Microsoft builds a lasting advantage into Azure, delivering breakthrough cost and performance gains that compound over time.

Maximizing GPU utilization requires optimization across every layer. Beyond being early to adopt NVIDIA’s new accelerated computing platforms, Azure benefits from the surrounding platform: high-capacity Blob storage, proximity and region-scale design shaped by real-world production patterns, and orchestration layers like CycleCloud and AKS tuned for low-overhead scheduling at massive cluster scale.
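To make the scheduling point concrete, here is a minimal, purely illustrative Python sketch of topology-aware job placement, the kind of decision an orchestration layer makes at cluster scale. The function name, pool contents, and domain size are assumptions for illustration, not Azure internals.

    # Illustrative sketch only: a toy topology-aware placement routine that
    # packs a multi-GPU job onto as few rack-scale NVLink domains as possible,
    # keeping collective-heavy traffic on the fast scale-up fabric.
    # All names and numbers here are assumptions, not Azure internals.

    def place_job(gpus_needed: int, free_gpus_per_domain: dict) -> dict:
        """Greedy placement: fill the domains with the most free GPUs first,
        so the job spans the fewest racks. Returns {domain_id: gpus_taken}."""
        placement = {}
        for domain, free in sorted(free_gpus_per_domain.items(),
                                   key=lambda kv: kv[1], reverse=True):
            if gpus_needed == 0:
                break
            take = min(free, gpus_needed)
            if take:
                placement[domain] = take
                gpus_needed -= take
        if gpus_needed:
            raise RuntimeError("not enough free GPUs in the pool")
        return placement

    # A 144-GPU job lands on two full NVL72-class racks, not four partial ones.
    pool = {"rack-0": 40, "rack-1": 72, "rack-2": 72, "rack-3": 16}
    print(place_job(144, pool))  # {'rack-1': 72, 'rack-2': 72}

Spanning fewer racks keeps more of a job’s communication on the scale-up fabric, which is one reason low-overhead scheduling translates directly into GPU utilization.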

Azure Boost and other offload modules remove IO, network, and storage bottlenecks so models can scale seamlessly. Faster storage powers larger clusters, stronger networking supports them, and optimized orchestration keeps performance stable end to end. First-party innovations strengthen the loop: liquid-cooled heat exchanger units maintain tight temperature limits, Azure’s integrated hardware security module (HSM) silicon hardens security, and Azure Cobalt delivers exceptional performance and efficiency for general-purpose computing and AI-adjacent tasks. Together, these integrations ensure efficient system-wide scaling, so GPU investments deliver maximum value.

This systems approach makes Azure ready for the Rubin platform. We don’t just stand up new systems; we deliver an end-to-end platform already shaped by the requirements Rubin brings.

Running the NVIDIA Rubin platform

NVIDIA Vera Rubin Superchips will deliver 50 PF of NVFP4 inference performance per chip and 3.6 EF of NVFP4 per rack, roughly a five-fold per-chip jump over NVIDIA GB200 NVL72 rack systems.
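As a quick sanity check on those figures, here is a back-of-the-envelope calculation; the 72-chips-per-rack count and the ~10 PF NVFP4 per GB200-generation GPU are assumptions of this sketch, not stated specifications.

    # Back-of-the-envelope check of the quoted NVFP4 figures.
    # The chip count per rack and the GB200 per-GPU number are assumptions.
    rubin_pf_per_chip = 50                  # PF NVFP4, per the figure above
    chips_per_rack = 72                     # assumed NVL72-style rack
    rack_ef = rubin_pf_per_chip * chips_per_rack / 1000
    print(rack_ef)                          # 3.6 EF NVFP4 per rack

    gb200_pf_per_gpu = 10                   # assumed Blackwell-generation figure
    print(rubin_pf_per_chip / gb200_pf_per_gpu)  # ~5x per-chip jump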

Azure already incorporates the basic architectural assumptions that Rubin requires:

  • Evolution of NVIDIA NVLink: The sixth-generation NVIDIA NVLink expected in Vera Rubin NVL72 systems achieves ~260 TB/s of scale-up bandwidth; Azure’s rack architecture has already been redesigned to take advantage of these bandwidth and topology advances (see the bandwidth sketch after this list).
  • High-performance scale-out networking: The Rubin AI infrastructure relies on ultra-fast NVIDIA ConnectX-9 networking at 1,600 Gb/s, which Azure’s network infrastructure, purpose-built for large-scale AI workloads, is ready to deliver.
  • HBM4/HBM4e thermal and density planning: The Rubin memory stack demands tighter temperature windows and higher rack density; Azure’s cooling, power envelopes, and rack geometry have already been upgraded to handle exactly these constraints.
  • SOCAMM2-based memory expansion: Rubin Superchips use a new memory expansion architecture; Azure has already integrated and validated similar memory expansion behaviors to power models at scale.
  • GPU reticle-size scaling and multi-die packaging: Rubin moves toward a substantially larger GPU footprint and a multi-die layout; Azure’s supply chain, mechanical design, and orchestration layers have been pre-tuned for these physical and logical scaling characteristics.
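To illustrate why the fabric assumptions above matter, here is a rough Python sketch of the scale-up versus scale-out bandwidth gap implied by the quoted figures; dividing the aggregate NVLink number evenly across 72 GPUs is an assumption of this sketch.

    # Rough bandwidth arithmetic from the figures above; the even per-GPU
    # split of the aggregate NVLink bandwidth is an assumption.
    nvlink_aggregate_tbps = 260             # ~TB/s across the rack, per the list
    gpus_per_rack = 72
    scale_up_tbps = nvlink_aggregate_tbps / gpus_per_rack
    print(round(scale_up_tbps, 1))          # ~3.6 TB/s per GPU inside the rack

    cx9_gbps = 1600                         # ConnectX-9 link speed, per the list
    scale_out_tbps = cx9_gbps / 8 / 1000    # Gb/s -> TB/s = 0.2
    print(round(scale_up_tbps / scale_out_tbps))  # ~18x scale-up advantage

A gap of this size is exactly why rack topology and placement decisions, not just raw GPU counts, determine delivered training performance.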

Azure’s approach to designing for the next generation of accelerated computing platforms like Rubin has been proven over several years, including significant milestones:

  • Operated the largest commercial deployment of InfiniBand in the world across multiple generations of GPUs.
  • Built layers of reliability and congestion-management techniques that unlock higher cluster utilization and larger job sizes than the competition, reflected in our ability to publish industry-leading benchmarks at scale (for example, multi-rack MLPerf runs that competitors have never replicated).
  • AI data centers co-designed with Grace Blackwell and Vera Rubin from the ground up to maximize performance and performance per dollar at the cluster level.

Design principles that differentiate Azure

  • Module-swap architecture: To enable rapid servicing, Azure GPU server modules are designed to be swapped out quickly without extensive rewiring, improving uptime.
  • Cooling abstraction layer: Rubin’s multi-die, high-bandwidth components demand substantial thermal headroom that Fairwater can already accommodate, avoiding costly retrofit cycles.
  • Next-generation power delivery: Vera Rubin NVL72 requires higher wattage density; Azure’s multi-year power and cooling overhaul (liquid cooling loop revisions, CDU scaling, and high-amperage busways) means deployments can begin immediately (a toy headroom check follows this list).
  • AI superfactory modularity: Unlike other hyperscalers, Microsoft builds regional supercomputers rather than single megasites, allowing a more predictable global rollout of new SKUs.
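As a purely illustrative example of the kind of envelope check that a power and cooling overhaul enables, the sketch below tests whether a hypothetical rack fits a busway and CDU budget. Every number and name here is a placeholder, not an Azure or NVIDIA specification.

    # Toy headroom check for a higher-density rack generation.
    # All values are placeholders, not Azure or NVIDIA specifications.
    def rack_fits(rack_kw: float, bus_kw_limit: float,
                  cdu_kw_capacity: float, liquid_fraction: float = 0.9) -> bool:
        """True if the busway can feed the rack and the CDU can absorb the
        liquid-cooled share of its heat; the remainder goes to room air."""
        liquid_heat_kw = rack_kw * liquid_fraction
        return rack_kw <= bus_kw_limit and liquid_heat_kw <= cdu_kw_capacity

    print(rack_fits(rack_kw=150, bus_kw_limit=200, cdu_kw_capacity=160))  # True
    print(rack_fits(rack_kw=250, bus_kw_limit=200, cdu_kw_capacity=160))  # False

Building that headroom into the busways and coolant loops ahead of time is what turns a new rack generation into a deployment rather than a retrofit.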

How co-design leads to user benefits

The NVIDIA Rubin platform represents a major step forward in accelerated computing, and Azure’s AI data centers and superfactories are already designed to take full advantage. Years of co-design with NVIDIA across interconnects, memory systems, thermals, packaging, and rack-scale architecture mean Rubin integrates directly into the Azure platform without re-engineering. Rubin’s core assumptions are already reflected in our design principles for networking, power, cooling, orchestration, and module replacement. This alignment gives customers immediate benefits: faster deployment, faster scaling, and faster impact as they build the next era of large-scale artificial intelligence.
