TL;DR:
- Nvidia introduces AI foundry service on Microsoft Azure, featuring Nemotron-3 8B models.
- Collaboration empowers enterprises and startups to build custom AI applications on Azure.
- Applications use retrieval augmented generation (RAG) to ground outputs in enterprise data.
- Nvidia’s AI foundry service combines generative AI model tech, LLM training expertise, and Azure’s cloud power.
- New 8-billion-parameter Nemotron-3 models announced, with next-generation GPU integration to follow.
- Azure users gain access to a comprehensive AI toolkit, including Nvidia’s foundation models, NeMo framework, and DGX Cloud.
- The partnership extends to Oracle Cloud Marketplace for tool acquisition.
- Early adopters include SAP, Amdocs, and Getty Images, exploring custom AI solutions.
- Microsoft introduces NC H100 v5 VMs for Azure, pairing two NVLink-connected H100 GPUs.
- Nvidia H100 NVL GPU delivers up to 12x higher GPT-3 175B performance for AI workloads.
- Microsoft plans to add the Nvidia H200 Tensor Core GPU to Azure, expanding AI capacity.
- TensorRT-LLM for Windows update brings up to 5x faster inference and compatibility with OpenAI’s Chat API.
Main AI News:
Nvidia has unveiled an AI foundry service on Microsoft Azure, headlined by its new Nemotron-3 8B models. Announced at Microsoft’s Ignite conference, the service is designed to help both established enterprises and startups build custom AI applications on the Azure cloud. A distinguishing feature is its support for retrieval augmented generation (RAG), which lets applications ground model outputs in enterprise data.
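To make the RAG pattern concrete, here is a minimal sketch of the retrieve-then-generate flow, assuming a toy document store and a simple lexical relevance score; a production pipeline on Azure would swap in an embedding model for retrieval and a hosted Nemotron-3 8B endpoint for generation.

```python
# Minimal RAG sketch: retrieve relevant documents, then build a grounded
# prompt for the LLM. Scoring here is a toy stand-in for vector similarity.
from collections import Counter

DOCS = [
    "Q3 revenue grew 12% year over year, driven by cloud services.",
    "The support portal password policy requires rotation every 90 days.",
    "The Azure AI model catalog hosts the Nemotron-3 8B family of models.",
]

def score(query: str, doc: str) -> int:
    """Toy lexical-overlap relevance score (stand-in for embedding similarity)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k most relevant documents for the query."""
    return sorted(DOCS, key=lambda doc: score(query, doc), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Augment the user query with retrieved context before generation."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# This prompt would then be sent to the generation model.
print(build_prompt("Where are the Nemotron-3 8B models hosted?"))
```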
Jensen Huang, founder and CEO of Nvidia, emphasized the significance of the collaboration: “Nvidia’s AI foundry service combines our generative AI model technologies, LLM training expertise, and a massive AI factory, all hosted on Microsoft Azure. This synergy enables global enterprises to effortlessly incorporate their tailored models with Microsoft’s world-class cloud services.”
The 8-billion-parameter Nemotron-3 models are a core part of the foundry service, and Nvidia also said its next-generation GPU, the H200, will come to Microsoft Azure in the near future.
So, what does the AI foundry service bring to Azure users? It gives cloud-based enterprises an end-to-end toolkit for building specialized, business-focused generative AI applications within a single platform. The offering includes Nvidia’s AI foundation models, the NeMo framework, and the Nvidia DGX Cloud supercomputing service.
Manuvir Das, VP of Enterprise Computing at Nvidia, underscored the point: “For the first time, this entire process, from hardware to software, is available end-to-end on Microsoft Azure. Any customer can come and execute the entire enterprise generative AI workflow with Nvidia on Azure. They can procure the necessary technology components right within Azure. Simply put, it’s a collaborative effort between Nvidia and Microsoft.”
To serve a range of enterprise needs within Azure environments, Nvidia is introducing the Nemotron-3 8B family of models, aimed at advanced enterprise chat and Q&A applications in sectors such as healthcare, telecommunications, and financial services. The models offer multilingual capabilities and are available through the Azure AI model catalog, Hugging Face, and the Nvidia NGC catalog.
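As a rough illustration of the Hugging Face route, the sketch below loads a Nemotron-3 8B chat model with the transformers library. The model identifier is an assumption, and the published checkpoints may instead require loading through Nvidia’s NeMo framework, so treat this as a pattern rather than a verified recipe.

```python
# Hypothetical sketch: pulling a Nemotron-3 8B checkpoint from Hugging Face.
# The model id below is an assumption; the real listing may be gated or may
# ship in NeMo format rather than a transformers-compatible one.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/nemotron-3-8b-chat-4k-sft"  # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # requires the accelerate package
)

inputs = tokenizer("Summarize our telecom outage policy.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```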
Among the other foundation models available in the Nvidia catalog are Llama 2 (also coming to the Azure AI catalog), Stable Diffusion XL, and Mistral 7B.
Once users have selected a model, they can move to training and deployment of custom applications using Nvidia DGX Cloud and AI Enterprise software, both available through the Azure Marketplace. DGX Cloud provides scalable instances and includes the AI Enterprise toolkit, featuring the NeMo framework and Nvidia Triton Inference Server, speeding up LLM customization on Azure’s enterprise-grade AI service.
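On the serving side, a model deployed behind Triton Inference Server can be queried with the standard tritonclient package. The sketch below assumes an HTTP endpoint on localhost and illustrative model and tensor names (“ensemble”, “text_input”, “text_output”); the actual names depend on how the deployed model repository is configured.

```python
# Hedged sketch: querying a model served by Triton Inference Server over HTTP.
# The endpoint, model name, and tensor names are assumptions tied to the
# deployed model repository's configuration.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Triton represents strings as the BYTES datatype carried in object arrays.
prompt = np.array([b"Which plans include international roaming?"], dtype=np.object_)
inp = httpclient.InferInput("text_input", list(prompt.shape), "BYTES")
inp.set_data_from_numpy(prompt)

result = client.infer(model_name="ensemble", inputs=[inp])
print(result.as_numpy("text_output"))
```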
Notably, Nvidia recently extended a similar partnership to Oracle, letting eligible enterprises acquire these tools directly from the Oracle Cloud Marketplace for model training and deployment on Oracle Cloud Infrastructure (OCI).
Early adopters of the foundry service on Azure, including SAP, Amdocs, and Getty Images, are already testing and building custom AI applications for a range of use cases.
Beyond the generative AI service, the Microsoft-Nvidia collaboration also extends to Nvidia’s latest hardware. Microsoft has introduced new NC H100 v5 virtual machines for Azure, the industry’s first cloud instances to feature a pair of PCIe-based H100 GPUs connected via Nvidia NVLink. These machines deliver roughly four petaflops of AI compute and 188GB of high-speed HBM3 memory (94GB per GPU).
The H100 NVL GPU in these instances delivers up to 12 times higher performance on GPT-3 175B than the prior-generation A100, making it well suited to inference and mainstream training workloads.
Looking ahead, Microsoft plans to add the new Nvidia H200 Tensor Core GPU to its Azure fleet, offering 141GB of HBM3e memory (nearly 1.8 times its predecessor’s) and a peak memory bandwidth of 4.8 TB/s (a 1.4x increase). The GPU targets large AI workloads, including generative AI training and inference, and gives Azure users another option alongside Microsoft’s new Maia 100 AI accelerator.
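A quick back-of-the-envelope calculation shows why the extra memory matters for LLM inference: weights alone for an N-billion-parameter model at 16-bit precision occupy roughly 2N GB, before counting KV cache and activations. The sketch below is just that arithmetic, not a vendor sizing tool.

```python
# Rough sizing check: FP16 weights need ~2 bytes per parameter, so a model's
# weight footprint is about 2 * (params in billions) GB. KV cache and
# activations add more on top, so these figures are lower bounds.
H200_MEMORY_GB = 141

def weight_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    return params_billion * bytes_per_param

for n in (8, 70, 175):
    fits = "fits" if weight_gb(n) <= H200_MEMORY_GB else "exceeds"
    print(f"{n}B params @ FP16 ≈ {weight_gb(n):.0f} GB ({fits} one H200)")
```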
Nvidia is also accelerating LLM work on Windows devices, announcing several updates, including one for TensorRT-LLM for Windows. The update adds support for new large language models such as Mistral 7B and Nemotron-3 8B and delivers up to five times faster inference, letting these models run more smoothly on desktops and laptops with GeForce RTX 30 Series and 40 Series GPUs that have at least 8GB of VRAM.
Notably, TensorRT-LLM for Windows will also be compatible with OpenAI’s Chat API through a new wrapper, enabling many developer projects and applications to run locally on a Windows 11 PC with RTX instead of relying on cloud infrastructure.
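Because the wrapper speaks the OpenAI Chat API, existing clients should only need their base URL repointed at the local endpoint. The sketch below assumes a wrapper listening on localhost port 8000 and a placeholder model name; both are illustrative values, not documented ones.

```python
# Hedged sketch: the official openai client aimed at a local OpenAI-compatible
# endpoint (here, an assumed TensorRT-LLM wrapper on localhost:8000) instead
# of OpenAI's hosted service. Port and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local-model",  # placeholder; the wrapper serves the locally loaded model
    messages=[{"role": "user", "content": "Explain NVLink in one sentence."}],
)
print(resp.choices[0].message.content)
```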
Conclusion:
The Nvidia-Microsoft partnership, showcased by the launch of the AI foundry service on Azure, marks a significant shift in AI application development. Enterprises and startups now have a single platform for building custom AI applications, combining Nvidia’s model and training expertise with Microsoft’s cloud infrastructure. The collaboration also broadens AI capabilities across the market, from diverse industry use cases to faster local AI on Windows devices.