TensorRT-LLM Release Accelerates AI Inference Performance, Empowering Windows 11 PCs

TL;DR:

  • AI integration in Windows 11 PCs is a game-changer for various user groups.
  • TensorRT-LLM release enhances AI inference performance and adds support for new models on RTX-powered PCs.
  • AI Workbench simplifies AI model creation and deployment.
  • DirectML enhancements accelerate Llama 2, setting a new performance standard.
  • Portable AI becomes accessible with TensorRT-LLM for Windows.
  • A wrapper enables local AI with OpenAI’s Chat API, preserving data privacy.
  • Developers gain access to cutting-edge AI models for cross-vendor deployment.
  • Over 400 partners already leverage RTX GPUs for AI-powered applications and games.

Main AI News:

In the ever-evolving landscape of technology, the integration of artificial intelligence into Windows 11 PCs marks a watershed moment. This groundbreaking advancement is poised to redefine the experiences of gamers, creators, streamers, office professionals, students, and casual PC users alike. It unlocks unprecedented opportunities to elevate productivity for over 100 million Windows PCs and workstations, all powered by RTX GPUs. NVIDIA RTX technology is the catalyst that is simplifying the process for developers to craft AI applications that will fundamentally transform the way people interact with their computers.

Microsoft Ignite has just unveiled a host of new optimizations, models, and resources, poised to expedite the delivery of groundbreaking user experiences. At the heart of this innovation lies an upcoming release of TensorRT-LLM, an open-source software dedicated to enhancing AI inference performance. This release will usher in support for new large language models, making resource-intensive AI workloads accessible on desktops and laptops equipped with RTX GPUs, starting at 8GB of VRAM.

What’s more, TensorRT-LLM for Windows is on the verge of seamless compatibility with OpenAI’s Chat API through an innovative wrapper. This development signifies a monumental shift, as it empowers hundreds of developer projects and applications to run locally on RTX-enabled PCs, eliminating the need for reliance on cloud-based solutions. With this advancement, users can maintain the utmost privacy and safeguard proprietary data on their Windows 11 PCs.

Harnessing the capabilities of custom generative AI projects demands considerable time and effort. Collaborating across various environments and platforms only adds to the complexity and time investment. Enter AI Workbench, a unified, user-friendly toolkit that streamlines the creation, testing, and customization of pretrained generative AI models and LLMs on PCs and workstations. It provides developers with a singular platform to organize their AI initiatives and fine-tune models to cater to specific use cases, fostering seamless collaboration and deployment.

For those looking to be at the forefront of this transformative wave, the early access list is the gateway to gaining priority access to this burgeoning initiative, along with future updates that promise to reshape the AI landscape.

To further empower AI developers, NVIDIA and Microsoft are set to introduce DirectML enhancements that will accelerate one of the most popular foundational AI models, Llama 2. This move expands the horizons for cross-vendor deployment and sets a new standard for performance, enhancing the portability of AI.

Portable AI has become a reality with the recent announcement of TensorRT-LLM for Windows by NVIDIA. This library is designed to accelerate LLM inference, and its forthcoming release, v0.6.0, promises to deliver up to 5x faster inference performance. It will also extend support to additional popular LLMs, including the new Mistral 7B and Nemotron-3 8B. These LLMs will seamlessly run on any GeForce RTX 30 Series and 40 Series GPU equipped with 8GB of RAM or more, democratizing fast and accurate local LLM capabilities, even on some of the most portable Windows devices.

The latest release of TensorRT-LLM will be available for installation on the /NVIDIA/TensorRT-LLM GitHub repository, while optimized models can be accessed via ngc.nvidia.com.

Conversing With Confidence

OpenAI’s Chat API is a global choice for developers and enthusiasts, supporting a wide array of applications, from content summarization and document drafting to data analysis and presentation creation. However, the reliance on cloud-based AIs has its limitations, especially when it comes to handling private or proprietary data and large datasets.

In response to this challenge, NVIDIA is on the verge of enabling TensorRT-LLM for Windows to provide an API interface akin to OpenAI’s Chat API, through an innovative wrapper. This groundbreaking development ensures a similar workflow for developers, whether they are designing models and applications for local PC deployment with RTX or cloud-based utilization. With just a few lines of code, hundreds of AI-powered projects and applications can now tap into the speed of local AI, allowing users to maintain control of their data without the need for data uploads to the cloud.

The added advantage is that many of these projects and applications are open source, simplifying the process for developers to leverage and extend their capabilities, thereby fueling the adoption of generative AI on Windows, fueled by RTX.

The wrapper is designed to work seamlessly with any LLM optimized for TensorRT-LLM, including Llama 2, Mistral, and NV LLM. It will be released as a reference project on GitHub, alongside other developer resources for working with LLMs on RTX.

Model Acceleration

Empowering developers with cutting-edge AI models and enabling cross-vendor API deployment has been a paramount goal for NVIDIA and Microsoft. Building upon the recent announcements regarding the fastest inference performance for these models, a new option for cross-vendor deployment is set to make AI capabilities more accessible than ever for PCs.

Developers and enthusiasts can experience these latest optimizations by downloading the latest ONNX runtime and following the installation instructions provided by Microsoft. Additionally, the latest driver from NVIDIA, available on Nov. 21, will be instrumental in unlocking these new enhancements.

These advancements in optimizations, models, and resources are poised to accelerate the development and deployment of AI features and applications across the 100 million RTX PCs globally. This movement joins the ranks of over 400 partners who are already shipping AI-powered applications and games, all accelerated by RTX GPUs.

As models become more accessible and developers continue to introduce generative AI-powered functionality to RTX-powered Windows PCs, RTX GPUs are poised to play a pivotal role in enabling users to harness the full potential of this transformative technology.

Conclusion:

The integration of AI into Windows 11 PCs, powered by TensorRT-LLM and RTX technology, is poised to reshape the market. It unlocks new possibilities for productivity and collaboration, making AI more accessible and privacy-focused. Developers and users alike stand to benefit from this transformative technology, as it continues to gain momentum in the market.

Source