One year after ONNX Runtime’s initial preview release, we’re excited to announce v1.0 of the high-performance machine learning model inferencing engine. This release marks our commitment to API stability for the cross-platform, multi-language APIs, and introduces a breadth of performance optimizations, broad operator coverage, and pluggable accelerators to take advantage of new and exciting hardware developments.
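To give a feel for the Python API, here is a minimal inference sketch; the model path, input shape, and dummy data are placeholders for your own model, not part of the release itself.

```python
import numpy as np
import onnxruntime as ort

# Load an ONNX model (path is a placeholder) and create an inference session.
session = ort.InferenceSession("model.onnx")

# Inspect the model's declared first input to build a matching feed dictionary.
input_meta = session.get_inputs()[0]
print(input_meta.name, input_meta.shape, input_meta.type)

# Run inference with a dummy tensor; replace with real data shaped to the model.
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_meta.name: dummy_input})
print(outputs[0].shape)
```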
In its first year, ONNX Runtime was shipped to production for more than 60 models at Microsoft, with adoption from a range of consumer and enterprise products, including Office, Bing, Cognitive Services, Windows, Skype, Ads, and others. These models span from speech to image to text (including state of the art models such as BERT) and ONNX Runtime has improved the performance of these models by an average of 2.5x over previous inferencing solutions.
In addition to performance gains, the interoperable ONNX model format has also provided increased infrastructure flexibility, allowing teams to use a common runtime to scalably deploy a breadth of models to a range of hardware. Across Microsoft technologies, ONNX Runtime is serving hundreds of millions of devices and billions of requests daily.
We also collaborated with a host of community partners to take advantage of ONNX Runtime’s extensibility options and provide accelerators for a variety of hardware. With active contributions from Intel, NVIDIA, JD.com, NXP, and others, today ONNX Runtime can provide acceleration on the Intel® Distribution of the OpenVINO™ Toolkit, the Deep Neural Network Library (DNNL, formerly Intel® MKL-DNN), nGraph, NVIDIA TensorRT, NNAPI for Android, the ARM Compute Library, and more.
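As a sketch of how execution providers surface in the Python API: a session reports which providers its build supports and can be restricted to a preferred subset. Provider availability depends on how ONNX Runtime was built; the CUDA name below assumes a GPU-enabled package.

```python
import onnxruntime as ort

# List the execution providers compiled into this build of ONNX Runtime.
print(ort.get_available_providers())

session = ort.InferenceSession("model.onnx")

# Providers actually registered for this session, in priority order.
print(session.get_providers())

# Restrict the session to a preferred subset; nodes the first provider
# cannot handle fall back to the next one in the list.
session.set_providers(["CUDAExecutionProvider", "CPUExecutionProvider"])
```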
We’ve made some changes to the C API for clarity of usage and introduced versioning to accommodate future updates.
Keeping up with the evolving ONNX spec remains a key focus for ONNX Runtime, and this update provides the most thorough operator coverage to date. ONNX Runtime supports all versions of ONNX since 1.2, with backward and forward compatibility to run a comprehensive variety of ONNX models.
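For example, you can check which opset a model targets with the onnx Python package before handing it to ONNX Runtime (a quick sketch; the model path is a placeholder):

```python
import onnx

# Load the model and validate it against the ONNX spec.
model = onnx.load("model.onnx")
onnx.checker.check_model(model)

# Each opset_import entry names a domain ("" is the default ai.onnx domain)
# and the opset version the model was exported against.
for opset in model.opset_import:
    print(opset.domain or "ai.onnx", opset.version)
```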
In addition to new execution providers for hardware acceleration, we’ve also made a host of updates to minimize default CPU and GPU (CUDA) latency for inference computations.
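A few of these knobs are exposed through session options in the Python API. The sketch below uses property names from recent Python releases and values chosen for illustration; the defaults are generally sensible, and the best thread settings depend on your hardware.

```python
import onnxruntime as ort

options = ort.SessionOptions()

# Apply the full set of graph optimizations (constant folding, node fusions, ...).
options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

# Control the threading used to parallelize work within individual operators.
options.intra_op_num_threads = 4

session = ort.InferenceSession("model.onnx", sess_options=options)
```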
To facilitate production usage of ONNX Runtime, we’ve released the complementary ONNX Go Live tool, which automates the process of shipping ONNX models by combining model conversion, correctness tests, and performance tuning into a single pipeline as a series of Docker images. We’ve also refreshed the quantization tool to support improved performance and accuracy for inferencing quantized models in ONNX Runtime, with updates for node fusions and bias quantization for convolutions.
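As a rough sketch of quantizing a model for ONNX Runtime: the module layout of the quantization tooling has changed across releases, so treat the import path and parameters below as an assumption based on current packages rather than the exact 1.0 workflow.

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Convert floating-point weights to 8-bit integers; activations are quantized
# dynamically at runtime. Input and output paths are placeholders.
quantize_dynamic(
    model_input="model.onnx",
    model_output="model.quant.onnx",
    weight_type=QuantType.QInt8,
)
```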
We’ve added component-level logging through TraceLogging to identify areas for improvement. You can read more about managing these settings and the data collected here.
This release contains many bug fixes identified during the past few months. As an actively growing project, we expect bugs to be uncovered as the breadth of supported models expands. We continue striving for quality and are committed to actively resolving issues as they are uncovered. You can always report bugs on GitHub.
For full release notes, please see https://aka.ms/onnxruntime-release.
ONNX Runtime 1.0 is a notable milestone, but this is just the beginning of our journey. We support the mission of open and interoperable AI and will continue working to make ONNX Runtime even more performant, extensible, and easily deployable across a variety of architectures and devices, from cloud to edge. You can find our detailed roadmap here.
We thank our community of contributors and look forward to even greater impact as we further the innovation and operationalization of ML in the field.
Learn more about ONNX Runtime, and join us on GitHub.
Have feedback or questions about ONNX Runtime? File an issue on GitHub and follow us on Twitter.