
ONNX Runtime scenario highlight: Vespa.ai integration

Since its open source debut two years ago, ONNX Runtime has seen strong growth with performance improvements, expanded platform and device compatibility, hardware accelerator support, an extension to training acceleration, and more. We are excited by its broad usage in production, powering more than a hundred models across Microsoft products and services and bringing concrete business impact, including reduced latency for customer experiences, machine cost savings, shorter time to production, and easier deployment on a variety of platforms.

Beyond internal use within the company, we are also delighted to see the adoption and support it has garnered in our open source community. From collaborations with Hugging Face on optimizations and quantization of transformer models to ONNX Runtime usage in Oracle’s Tribuo Java machine learning library and hardware support contributions from Intel, Nvidia, AMD, Xilinx, and Rockchip, we’re thrilled by the community enthusiasm around ONNX Runtime and the varied use cases we’ve seen.

Recently, our community colleagues at Verizon Media shared their experience incorporating ONNX Runtime into Vespa.ai, an open source engine for real-time computations over large data, including machine learning models. Vespa serves hundreds of thousands of queries per second worldwide, powering hundreds of applications within Verizon Media (formerly Yahoo) and beyond. The Vespa team found that ONNX Runtime supported models created in many frameworks, achieved breakthrough performance for state-of-the-art models like transformers, and integrated into their C++ stack. Read about how Vespa.ai is using ONNX Runtime, details on the technical integration and end-to-end scenario, and more on the Vespa.ai blog.

We’re looking forward to more community collaborations and diverse scenarios using ONNX Runtime! If you have a story you’d like to share, please contact us at onnxruntime@microsoft.com. Find us on GitHub!