This blog is co-authored by Edwin Cheung, Principal Software Engineering Manager and Xiaoyong Zhu, Principal Data Scientist.
Feathr is an enterprise-scale feature store that facilitates the creation, engineering, and usage of machine learning features in production. Many organizations use it as an online and offline store, as well as for real-time streaming.
Today, we are excited to announce the much-anticipated availability of OSS Feathr 1.0. It contains many new features and enhancements added since Feathr became open source one year ago. Capabilities such as online transformation, a rapid sandbox environment, and MLOps V2 accelerator integration accelerate the development and deployment of machine learning projects at enterprise scale.
In many machine learning scenarios, feature generation is required for both training and inference. Currently, transformation only happens before feature data is published to the online store, so feature data cannot come directly from an online service even when the transformation must happen close to real time. In such cases, users need a mechanism to run transformations on the inference data dynamically, before the model performs inference. The new online transformation via DSL feature addresses this with a custom transformation engine that processes transformation requests and responses on demand, close to real time.
It lets users define transformation logic declaratively, using a DSL syntax based on EBNF. Where custom, complex transformations are needed, it is extensible through user-defined functions (UDFs) written in Python or Java.
nyc_taxi_demo(pu_loc_id as int, do_loc_id as int, pu_time as string, do_time as string, trip_distance as double, fare_amount as double) …
project duration_second = (to_unix_timestamp(do_time, "%Y/%-m/%-d %-H:%-M") - to_unix_timestamp(pu_time, "%Y/%-m/%-d %-H:%-M"))
| project speed_mph = trip_distance * 3600 / duration_second
;
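As a sanity check on the arithmetic, the two projections can be reproduced outside the engine. The Python sketch below is not Feathr code: the function names are hypothetical, trip_distance is assumed to be in miles (as the name speed_mph suggests), and Python's "%Y/%m/%d %H:%M" format, which also accepts non-padded fields when parsing, stands in for the DSL's "%-m"-style specifiers. The sample values are the ones used in the curl request later in this post.

```python
from datetime import datetime

# Assumption: Python stand-in for the DSL format string "%Y/%-m/%-d %-H:%-M".
TIME_FORMAT = "%Y/%m/%d %H:%M"


def duration_second(pu_time: str, do_time: str) -> float:
    """Seconds between pickup and drop-off, mirroring the first projection."""
    pickup = datetime.strptime(pu_time, TIME_FORMAT)
    dropoff = datetime.strptime(do_time, TIME_FORMAT)
    return (dropoff - pickup).total_seconds()


def speed_mph(trip_distance: float, seconds: float) -> float:
    """Average speed in miles per hour, mirroring the second projection."""
    return trip_distance * 3600 / seconds


if __name__ == "__main__":
    seconds = duration_second("2020/4/1 0:41", "2020/4/1 0:56")
    print(seconds, speed_mph(6.79, seconds))
```

For this trip the duration is 15 minutes (900 seconds), so the speed works out to roughly 27.16 mph.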
The declarative logic defined in the DSL runs in a new high-performance DSL engine. We provide a Helm chart to deploy this service on container-based platforms such as Azure Kubernetes Service (AKS).
The transformation engine can also run as a standalone executable: an HTTP server that can be used to transform data for testing purposes. It is also available as a container image, feathrfeaturestore/feathrpiper:latest.
curl -s -H"content-type:application/json" http://localhost:8000/process -d'{"requests": [{"pipeline": "nyc_taxi_demo_3_local_compute","data": {"pu_loc_id": 41,"do_loc_id": 57,"pu_time": "2020/4/1 0:41","do_time": "2020/4/1 0:56","trip_distance": 6.79,"fare_amount": 21.0}}]}'
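The same request can be sent from Python, which may be more convenient in a notebook or test script. This is a minimal sketch using only the standard library. It assumes the engine is listening on localhost:8000 as in the curl command and that it replies with JSON; the helper names build_payload and process are hypothetical and not part of Feathr.

```python
import json
import urllib.request

# Assumption: the engine is listening locally, as in the curl example.
PROCESS_URL = "http://localhost:8000/process"


def build_payload(pipeline: str, data: dict) -> bytes:
    """Wrap one input row in the request envelope used by the curl example."""
    body = {"requests": [{"pipeline": pipeline, "data": data}]}
    return json.dumps(body).encode("utf-8")


def process(pipeline: str, data: dict, url: str = PROCESS_URL) -> dict:
    """POST one transformation request and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_payload(pipeline, data),
        headers={"content-type": "application/json"},
        method="POST",
    )
    # Assumption: the response body is JSON; its exact shape is not shown here.
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)
```

With the engine running, calling process("nyc_taxi_demo_3_local_compute", trip), where trip is a dict holding the same six fields as the curl payload, should return the engine's response for that row.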
Feathr can also auto-generate the DSL file from predefined feature transformations that were already created for offline transformation.
Online transformation must perform close to real time, meeting many enterprise customers' need for low latency at high queries per second (QPS). To measure performance, we ran three benchmark tests: first, a deployment on AKS with traffic going through the ingress controller; second, traffic going through the AKS internal load balancer; and third, traffic via localhost.
Test 1: AKS, traffic through the ingress controller
Total Requests | Concurrency | p90 | p95 | p99 | Requests/sec |
1000000 | 100 | 3 | 4 | 9 | 43710 |
1000000 | 200 | 6 | 8 | 15 | 43685 |
1000000 | 300 | 10 | 11 | 18 | 43378 |
1000000 | 400 | 13 | 15 | 21 | 43220 |
1000000 | 500 | 16 | 19 | 24 | 42406 |
Test 2: AKS, traffic through the internal load balancer
Total Requests | Concurrency | p90 | p95 | p99 | Requests/sec |
1000000 | 100 | 3 | 4 | 4 | 47673 |
1000000 | 200 | 5 | 7 | 8 | 47035 |
1000000 | 300 | 9 | 10 | 12 | 46613 |
1000000 | 400 | 11 | 12 | 15 | 45362 |
1000000 | 500 | 14 | 15 | 19 | 44941 |
Test 3: localhost
Total Requests | Concurrency | p90 | p95 | p99 | Requests/sec |
1000000 | 100 | 2 | 2 | 3 | 59466 |
1000000 | 200 | 4 | 4 | 5 | 59433 |
1000000 | 300 | 6 | 6 | 8 | 60184 |
1000000 | 400 | 8 | 9 | 10 | 59622 |
1000000 | 500 | 10 | 11 | 14 | 59031 |
Benchmark thanks to Blair Chan and Chen Xu. For more details, check out the online transformation guide.
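One way to sanity-check the benchmark tables is Little's Law, which relates concurrency, throughput, and mean latency: mean latency is approximately concurrency divided by throughput. The sketch below applies it to the first table. It assumes a closed-loop load generator that keeps every connection busy, and it reports milliseconds only for readability, since the tables do not state the unit of the percentile columns.

```python
# (concurrency, requests per second) pairs copied from the first benchmark table.
FIRST_TABLE = [
    (100, 43710),
    (200, 43685),
    (300, 43378),
    (400, 43220),
    (500, 42406),
]


def mean_latency_ms(concurrency: int, requests_per_sec: float) -> float:
    """Little's Law: mean time in system = items in system / throughput."""
    return concurrency / requests_per_sec * 1000.0


if __name__ == "__main__":
    for concurrency, rps in FIRST_TABLE:
        latency = mean_latency_ms(concurrency, rps)
        print(f"concurrency={concurrency} mean latency ~ {latency:.2f} ms")
```

At a concurrency of 100 this gives about 2.29, and at 500 about 11.79. If the percentile columns are in milliseconds, those estimated means sit below the reported p90 values of 3 and 16, which is what one would expect.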
The sandbox environment is an exciting feature, especially for data scientists who may not have the infrastructure background needed to deploy Feathr in the cloud. The sandbox is a fully featured, quick-start Feathr environment that enables organizations to rapidly prototype various capabilities of Feathr without the burden of full-scale infrastructure deployment. It is designed to help users get started quickly, validate feature definitions and new ideas, and explore Feathr interactively.
By default, it comes with a Jupyter notebook environment to interact with the Feathr platform.
Users can also use the web user interface (UI) to visualize features, lineage, and other capabilities.
To get started, check out the quick start guide to local sandbox.
The MLOps V2 solution accelerator provides a modular, end-to-end approach to MLOps in Azure based on architecture patterns. We are pleased to announce an initial integration of Feathr into the classical pattern, which enables Terraform-based deployment of Feathr as part of infrastructure provisioning with an Azure Machine Learning (AML) workspace.
With this integration, enterprise customers can use the templates to customize their continuous integration and continuous delivery (CI/CD) workflows and run end-to-end MLOps in their organization.
Check out the Feathr integration with MLOps V2 deployment guide.
We have added a number of enhancements to the graphical user interface (GUI) to improve usability. These include support for registering features, deleting features, and displaying versions, as well as quick access to lineage via the top menu.
Try out our demo UX on our live demo site.
The Feathr journey has just begun, and this is the first stop on the way to many great things to come. Stay tuned for more enterprise enhancements, including security, monitoring, and compliance features, along with a richer MLOps experience. Check out how you can contribute to this great project, and if you have not already, join our Slack channel here.