Operationalizing scikit-learn with ONNX, sklearn-onnx, and ONNX Runtime
Scikit-learn is one of the most useful libraries for general machine learning in Python. To minimize deployment cost and avoid discrepancies between training and serving environments, scikit-learn models are usually deployed to production with Docker containers and pickle, the object serialization module of the Python standard library. Docker is a good way to create consistent environments, and pickle saves and restores models with ease.
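As a reference point, here is a minimal sketch of that pickle-based flow (the model choice and file name are illustrative):

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a model as usual.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=500).fit(X, y)

# Save the trained model to disk...
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# ...and restore it later, in an environment that must ship the same
# versions of Python and scikit-learn that were used for training.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)

print(restored.predict(X[:5]))
```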
However, there are some limitations when going this route:

- A pickled model is tied to the versions of Python and scikit-learn it was created with, so backward compatibility is fragile.
- Serving requires a full Python environment, which limits interoperability with other frameworks and languages.
- Prediction speed and resource usage are bounded by scikit-learn's Python implementation.
With framework interoperability and backward compatibility, the resource-efficient and high-performance ONNX Runtime can help address these limitations. This blog post introduces how to operationalize scikit-learn with ONNX, sklearn-onnx, and ONNX Runtime.
ONNX (Open Neural Network Exchange) is an open standard format for representing the prediction function of a trained machine learning model. Models trained in various frameworks can be exported to ONNX. sklearn-onnx is the dedicated tool for converting scikit-learn models to ONNX.
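For instance, a trained estimator can be exported with sklearn-onnx in a few lines. A sketch, where the input name, file name, and model choice are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50).fit(X, y)

# Declare the expected input: a float tensor with a dynamic batch size
# and as many columns as the training data.
initial_types = [("float_input", FloatTensorType([None, X.shape[1]]))]
onnx_model = convert_sklearn(model, initial_types=initial_types)

# The result is a standard ONNX protobuf that can be written to disk.
with open("rf_iris.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```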
ONNX Runtime is a high-performance inference engine for both traditional machine learning (ML) and deep neural network (DNN) models. ONNX Runtime was open sourced by Microsoft in 2018. It is compatible with various popular frameworks, such as scikit-learn, Keras, TensorFlow, PyTorch, and others. ONNX Runtime can perform inference for any prediction function converted to the ONNX format.
ONNX Runtime is backward compatible with all the operators in the ONNX specification. Newer versions of ONNX Runtime support all models that worked with the prior version. By offering APIs covering most common languages including C, C++, C#, Python, Java, and JavaScript, ONNX Runtime can be easily plugged into an existing serving stack. With cross-platform support for Linux, Windows, Mac, iOS, and Android, you can run your models with ONNX Runtime across different operating systems with minimum effort, improving engineering efficiency to innovate faster.
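From Python, loading an exported model and computing predictions takes only a few lines with the onnxruntime package. A sketch reusing the file produced above:

```python
import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession("rf_iris.onnx",
                           providers=["CPUExecutionProvider"])

# The input name was chosen at conversion time ("float_input" above).
input_name = sess.get_inputs()[0].name
X_new = np.array([[5.1, 3.5, 1.4, 0.2]], dtype=np.float32)

# run() returns one array per model output; for a classifier converted
# with sklearn-onnx, the outputs are the labels and the probabilities.
outputs = sess.run(None, {input_name: X_new})
print(outputs[0])  # predicted labels
```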
If you are interested in performing high-performance inference with ONNX Runtime for a given scikit-learn model, here are the steps:

1. Train a model with scikit-learn as usual.
2. Convert the model to the ONNX format with sklearn-onnx.
3. Load the exported model with ONNX Runtime and compute the predictions.
The tutorial Train and deploy a scikit-learn pipeline walks through this flow end to end.
A pipeline can be exported to ONNX only when every one of its steps can. Most of the numerical models are now supported in sklearn-onnx, but some restrictions remain; in particular, a pipeline fails to convert as soon as one of its steps has no registered converter.
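Here is a minimal sketch of a pipeline whose steps, a scaler and a classifier, are both supported, so the whole pipeline exports:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from skl2onnx import to_onnx

X, y = load_iris(return_X_y=True)
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=500)),
]).fit(X, y)

# to_onnx infers the input type from a sample; since both steps have
# registered converters, the whole pipeline converts in one call.
onnx_model = to_onnx(pipe, X[:1].astype(np.float32))
with open("pipeline_iris.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```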
In addition to improving model coverage, sklearn-onnx also extends its API to let users register additional converters, or even overwrite existing ones, for custom transformers. The tutorial named Implement a new converter demonstrates how to convert a pipeline that includes an unsupported model class using a custom converter.
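As a rough sketch of the mechanism, the example below registers a converter for a toy transformer; the transformer, its converter, and all names here are hypothetical, and the tutorial covers the real details:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from skl2onnx import to_onnx, update_registered_converter
from skl2onnx.algebra.onnx_ops import OnnxMul

class ScaleBy2(BaseEstimator, TransformerMixin):
    """Toy custom transformer: multiplies every feature by two."""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X * 2

def scaleby2_shape_calculator(operator):
    # The output has the same tensor type and shape as the input.
    input_type = operator.inputs[0].type
    operator.outputs[0].type = input_type.__class__(input_type.shape)

def scaleby2_converter(scope, operator, container):
    # Express the prediction function with ONNX operators.
    op = OnnxMul(
        operator.inputs[0],
        np.array([2.0], dtype=np.float32),
        op_version=container.target_opset,
        output_names=operator.outputs[:1],
    )
    op.add_to(scope, container)

# Make sklearn-onnx aware of the new model class.
update_registered_converter(
    ScaleBy2, "CustomScaleBy2",
    scaleby2_shape_calculator, scaleby2_converter)

model = ScaleBy2().fit(np.zeros((2, 4)))
onnx_model = to_onnx(model, np.zeros((1, 4), dtype=np.float32))
```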
A prediction function may be converted in a way that differs slightly from the original scikit-learn implementation. It is therefore recommended to always check the approximate equality of the predictions on a validation set before deploying the exported ONNX model. In particular, ONNX Runtime development prioritizes float over double, a choice driven by performance for deep learning models, which is why sklearn-onnx also uses single-precision floating-point values by default. However, in some cases double precision is required to avoid significant discrepancies, specifically when the prediction computation involves the inverse of a matrix, as in GaussianProcessRegressor, or a discontinuous function, as in decision trees. An example named Issues when switching to float shows ways to resolve the discrepancies introduced by limited precision. The ONNX specification will be extended to address these discrepancies in the future.
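A sketch of such a validation check, using a deliberately smooth model so only float32 rounding is at play (the data and tolerances are illustrative):

```python
import numpy as np
import onnxruntime as rt
from numpy.testing import assert_allclose
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from skl2onnx import to_onnx

X, y = make_regression(n_samples=1000, n_features=10, random_state=0)
model = LinearRegression().fit(X, y)

onnx_model = to_onnx(model, X[:1].astype(np.float32))
sess = rt.InferenceSession(onnx_model.SerializeToString(),
                           providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
onnx_pred = sess.run(None, {input_name: X.astype(np.float32)})[0].ravel()

# Expect approximate, not exact, equality: the exported graph computes
# in single precision by default, so tolerances must absorb the rounding.
assert_allclose(model.predict(X), onnx_pred, rtol=1e-3, atol=1e-3)
```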
ONNX Runtime includes state-of-the-art CPU implementations for standard machine learning model predictions. In addition, with pure C++ implementations for both data verification and computation, ONNX Runtime only needs to acquire the GIL to return the output predictions (when called from Python), while scikit-learn needs it for every intermediate result. Therefore, ONNX Runtime is usually significantly faster than scikit-learn. The speed improvement depends on the batch size, the model class, and the hyper-parameters.
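A sketch of how such a comparison could be measured with timeit (the model, batch sizes, and repetition counts are illustrative, not the setup used for the benchmarks that follow):

```python
import timeit

import numpy as np
import onnxruntime as rt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from skl2onnx import to_onnx

X, y = make_classification(n_samples=10000, n_features=20, random_state=0)
model = RandomForestClassifier(n_estimators=100).fit(X, y)

sess = rt.InferenceSession(
    to_onnx(model, X[:1].astype(np.float32)).SerializeToString(),
    providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

# Time both engines on the same data at several batch sizes.
for batch in (1, 100, 10000):
    data = X[:batch].astype(np.float32)
    t_skl = timeit.timeit(lambda: model.predict(data), number=10)
    t_ort = timeit.timeit(lambda: sess.run(None, {input_name: data}),
                          number=10)
    print(f"batch={batch}: scikit-learn {t_skl:.4f}s, "
          f"onnxruntime {t_ort:.4f}s")
```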
Let us consider a few popular scikit-learn models as examples. Below are performance benchmarks between scikit-learn 0.23.2 and ONNX Runtime 1.6 on an Intel i7-8650U at 1.90GHz with eight logical cores. The y-axis is the speedup of ONNX Runtime over the prediction speed of the scikit-learn model. The x-axis is the number of observations for which predictions are computed in a single call (the batch size). Different columns highlight the impact of changing some parameter values and the number of features used to train a given model.
The performance of RandomForestRegressor has been improved by a factor of five in the latest release of ONNX Runtime (1.6). The performance difference between ONNX Runtime and scikit-learn is constantly monitored: whichever library is faster points to more efficient implementation strategies for the slower one. The comparison is most relevant at large batch sizes, where the intermediate steps scikit-learn performs to validate the inputs become insignificant compared to the overall computation time. Note that the poor speed of scikit-learn at small batch sizes was reported on the scikit-learn issue tracker and will hopefully be improved in future releases.
The list of supported models is still growing, which has led to some interesting discussions with core developers of scikit-learn. We will continue optimizing performance in ONNX Runtime, a hard challenge given that scikit-learn also keeps improving. This story of scikit-learn and ONNX began two years ago, when Microsoft became a sponsor of the Scikit-learn consortium @ Inria Foundation, and it is still going on.
Questions or feedback? Please let us know in the comments below.