Performant on-device inferencing with ONNX Runtime
The team at Pieces shares the problems and solutions evaluated for their on-device model serving stack and how ONNX Runtime enables their success.
Make large models smaller and faster with the OpenVINO Execution Provider, NNCF, and ONNX Runtime, leveraging Azure Machine Learning.
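As a rough illustration of the runtime side of that workflow, the sketch below shows how a compressed ONNX model might be served through the OpenVINO Execution Provider in ONNX Runtime. It assumes the onnxruntime-openvino build is installed; the model path and input shape are placeholders, not taken from the post.

```python
# Minimal sketch: run an ONNX model with the OpenVINO Execution Provider.
# "model_int8.onnx" is a hypothetical path to a model compressed with a tool such as NNCF.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model_int8.onnx",
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],  # CPU as fallback
)

input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed image-shaped input
outputs = session.run(None, {input_name: dummy_input})
print(outputs[0].shape)
```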
Together with our colleagues at LinkedIn, we are happy to announce that Feathr is joining the LF AI & Data Foundation, an umbrella foundation of the Linux Foundation supporting open-source innovation in AI and data.
Choosing which machine learning model to use, sharing a model with a colleague, and quickly trying one out are all reasons why you may find yourself wanting to run inference on a model with minimal setup.
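A minimal sketch of that quick try-out in Python with ONNX Runtime is shown below. The file name and dummy input shape are placeholders to adapt to your own model.

```python
# Minimal sketch: inspect an ONNX model's inputs and run one inference pass.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Check what the model expects before feeding it data.
first_input = session.get_inputs()[0]
print(first_input.name, first_input.shape, first_input.type)

dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # adjust to your model's input
result = session.run(None, {first_input.name: dummy})
print(result[0])
```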
Scale, performance, and efficient deployment of state-of-the-art deep learning models are ubiquitous challenges as applied machine learning grows across the industry.
This post was co-authored by Jithun Nair and Aswin Mathews, members of technical staff at AMD. In recent years, large-scale deep learning models have demonstrated impressive capabilities, excelling at tasks across natural language processing, computer vision, and speech domains.
ONNX Runtime now supports building mobile applications in C# with Xamarin. Support for Android and iOS is included in the ONNX Runtime release 1.10 NuGet package. This enables C# developers to build AI applications for Android and iOS that execute ONNX models on mobile devices with ONNX Runtime.
We are introducing ONNX Runtime Web (ORT Web), a new feature in ONNX Runtime that enables JavaScript developers to run and deploy machine learning models in browsers. It also helps enable new classes of on-device computation. ORT Web will replace the soon-to-be-deprecated ONNX.js.
With a simple change to your PyTorch training script, you can now speed up training large language models with torch_ort.ORTModule, running on the target hardware of your choice. Training deep learning models requires ever-increasing compute and memory resources. Today we release torch_ort.ORTModule.
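The "simple change" is wrapping an existing PyTorch model so that its forward and backward passes run through ONNX Runtime. The sketch below shows one way that might look; the model, optimizer, and data are placeholder examples, not from the original post.

```python
# Minimal sketch: wrap a PyTorch model in torch_ort.ORTModule for accelerated training.
import torch
from torch_ort import ORTModule

model = torch.nn.Sequential(
    torch.nn.Linear(784, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)
model = ORTModule(model)  # the one-line change; the training loop stays the same

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = torch.nn.CrossEntropyLoss()

inputs = torch.randn(64, 784)
labels = torch.randint(0, 10, (64,))

optimizer.zero_grad()
loss = loss_fn(model(inputs), labels)
loss.backward()
optimizer.step()
print(loss.item())
```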
This post was co-authored by Jeff Daily, a Principal Member of Technical Staff, Deep Learning Software for AMD. ONNX Runtime is an open-source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms.
This post was co-authored by Alejandro Saucedo, Director of Machine Learning Engineering at Seldon Technologies. About the co-author: Alejandro leads teams of machine learning engineers focused on the scalability and extensibility of machine learning deployment and monitoring products with over five million installations.
With its resource-efficient and high-performance nature, ONNX Runtime helped us meet the need to deploy a large-scale multi-layer generative transformer model for code, a.k.a. GPT-C, to empower IntelliCode with whole-line code completion suggestions in Visual Studio and Visual Studio Code.
Watch our webinar at the Open Data Science Conference and read the white paper on SmartNoise differential privacy machine learning case studies. The COVID-19 pandemic demonstrates the tremendous importance of sufficient and relevant data for research, causal analysis, government action, and medical progress. However, for understandable data-protection reasons, individuals and decision-makers are often very reluctant to share personal or sensitive data.