ONNX Runtime | Microsoft Open Source Blog

•

May 2, 2022

•

5 min read

Optimizing and deploying transformer INT8 inference with ONNX Runtime-TensorRT on NVIDIA GPUs

Mohit Ayani, Solutions Architect, NVIDIA Shang Zhang, Senior AI Developer Technology Engineer, NVIDIA Jay Rodge, Product Marketing Manager-AI, NVIDIA Transformer-based models have revolutionized the natural language processing (NLP) domain.

•

April 19, 2022

•

8 min read

Scaling-up PyTorch inference: Serving billions of daily NLP inferences with ONNX Runtime

Scale, performance, and efficient deployment of state-of-the-art Deep Learning models are ubiquitous challenges as applied machine learning grows across the industry.

•

March 21, 2022

•

6 min read

Supporting efficient large model training on AMD Instinct™ GPUs with DeepSpeed

This post was co-authored by Jithun Nair and Aswin Mathews, members of technical staff at AMD. In recent years, large-scale deep learning models have demonstrated impressive capabilities, excelling at tasks across natural language processing, computer vision, and speech domains.

•

December 14, 2021

•

2 min read

Add AI to mobile applications with Xamarin and ONNX Runtime

ONNX Runtime now supports building mobile applications in C# with Xamarin. Support for Android and iOS is included in the ONNX Runtime release 1.10 NuGet package. This enables C# developers to build AI applications for Android and iOS to execute ONNX models on mobile devices with ONNX Runtime.

•

September 2, 2021

•

5 min read

ONNX Runtime Web—running your machine learning model in browser

We are introducing ONNX Runtime Web (ORT Web), a new feature in ONNX Runtime to enable JavaScript developers to run and deploy machine learning models in browsers. It also helps enable new classes of on-device computation. ORT Web will be replacing the soon to be deprecated onnx.

•

July 13, 2021

•

3 min read

Accelerate PyTorch training with torch-ort

With a simple change to your PyTorch training script, you can now speed up training large language models with torch_ort.ORTModule, running on the target hardware of your choice. Training deep learning models requires ever-increasing compute and memory resources. Today we release torch_ort.

•

July 13, 2021

•

4 min read

ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm™ Open Software Platform

This post was co-authored by Jeff Daily, a Principal Member of Technical Staff, Deep Learning Software for AMD. ONNX Runtime is an open-source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms.

•

July 9, 2021

•

7 min read

Simple steps to create scalable processes to deploy ML models as microservices

This post was co-authored by Alejandro Saucedo, Director of Machine Learning Engineering at Seldon Technologies. About the co-author: Alejandro leads teams of machine learning engineers focused on the scalability and extensibility of machine learning deployment and monitoring products with over five million installations.

•

June 30, 2021

•

7 min read

Journey to optimize large scale transformer model inference with ONNX Runtime

With its resource-efficient and high-performance nature, ONNX Runtime helped us meet the need of deploying a large-scale multi-layer generative transformer model for code, a.k.a., GPT-C, to empower IntelliCode with the whole line of code completion suggestions in Visual Studio and Visual Studio Code.

June 7, 2021

•

2 min read

ONNX Runtime 1.8: mobile, web, and accelerated training

The V1.8 release of ONNX Runtime includes many exciting new features. This release launches ONNX Runtime machine learning model inferencing acceleration for Android and iOS mobile ecosystems (previously in preview) and introduces ONNX Runtime Web. Additionally, the release also debuts official packages for accelerating model training workloads in PyTorch.

•

October 12, 2020

•

2 min read

Introducing ONNX Runtime mobile – a reduced size, high performance package for edge devices

ONNX Runtime is an open source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms. Today, we are excited to announce ONNX Runtime release v1.5 as part of our AI at Scale initiative.

•

August 24, 2020

•

4 min read

GPT-2 fine-tuning with ONNX Runtime – a 34% speedup in training time

Model training is an important step when developing and deploying large scale Artificial Intelligence (AI) models. Training typically utilizes a large amount of compute resources to tune the model based on the input dataset.