
Announcing the availability of Feathr 1.0

This blog is co-authored by Edwin Cheung, Principal Software Engineering Manager and Xiaoyong Zhu, Principal Data Scientist.

Feathr is an enterprise-scale feature store that facilitates the creation, engineering, and usage of machine learning features in production. It has been used by many organizations as an online/offline store, as well as for real-time streaming.

Today, we are excited to announce the much-anticipated availability of OSS Feathr 1.0. It contains many new features and enhancements added since Feathr became open source one year ago. Capabilities such as online transformation, a rapid sandbox environment, and MLOps V2 accelerator integration accelerate the development and deployment of machine learning projects at enterprise scale.

Online transformation via domain specific language (DSL)

In many machine learning scenarios, feature generation is required for both training and inference. Until now, transformation happened only before feature data was published to the online store, so features could not be computed from data that arrives from an online service, even when the transformation is needed in near real time. Such cases need a mechanism that lets users run transformations on inference data dynamically, before it is passed to the model. The new online transformation via DSL feature addresses these challenges with a custom transformation engine that processes transformation requests and returns responses on demand, in near real time.


It lets you define transformation logic declaratively using a DSL syntax based on EBNF. Where custom, complex transformations are needed, it can be extended with user-defined functions (UDFs) written in Python or Java.

nyc_taxi_demo(pu_loc_id as int, do_loc_id as int, pu_time as string, do_time as string, trip_distance as double, fare_amount as double) …
| project duration_second = (to_unix_timestamp(do_time, "%Y/%-m/%-d %-H:%-M") - to_unix_timestamp(pu_time, "%Y/%-m/%-d %-H:%-M"))
| project speed_mph = trip_distance * 3600 / duration_second
;
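To make the pipeline more concrete, here is a minimal Python sketch of what the two project stages compute for a single record. It is not the DSL engine and not Feathr code: the helper name transform_trip is hypothetical, and treating the timestamps as UTC is an assumption made only so the arithmetic is deterministic. The sample values are the ones used in the curl request later in this post.

```python
from datetime import datetime, timezone

# Python's strptime accepts non-padded month/day/hour/minute values with these
# directives, so this format parses strings such as "2020/4/1 0:41".
TIME_FORMAT = "%Y/%m/%d %H:%M"


def to_unix_timestamp(value: str) -> float:
    """Parse a timestamp string into seconds since the epoch (UTC is assumed)."""
    parsed = datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def transform_trip(record: dict) -> dict:
    """Hypothetical helper mirroring the two project stages of nyc_taxi_demo."""
    # Stage 1: trip duration in seconds (drop-off time minus pick-up time).
    duration_second = to_unix_timestamp(record["do_time"]) - to_unix_timestamp(record["pu_time"])
    # Stage 2: average speed in miles per hour.
    speed_mph = record["trip_distance"] * 3600 / duration_second
    return {**record, "duration_second": duration_second, "speed_mph": speed_mph}


sample = {
    "pu_loc_id": 41,
    "do_loc_id": 57,
    "pu_time": "2020/4/1 0:41",
    "do_time": "2020/4/1 0:56",
    "trip_distance": 6.79,
    "fare_amount": 21.0,
}
result = transform_trip(sample)
print(result["duration_second"], round(result["speed_mph"], 2))  # 900.0 27.16
```

For this record the trip lasts 15 minutes (900 seconds), so 6.79 miles works out to about 27.16 mph. A record with identical pick-up and drop-off times would divide by zero in this sketch; how the DSL engine handles that case is not covered here.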

This declarative logic runs in a new high-performance DSL engine. We provide a Helm chart to deploy this service on container-based platforms such as Azure Kubernetes Service (AKS).

The transformation engine can also run as a standalone executable, an HTTP server that can be used to transform data for testing purposes. It is available as the container image feathrfeaturestore/feathrpiper:latest.

curl -s -H"content-type:application/json" http://localhost:8000/process -d'{"requests": [{"pipeline": "nyc_taxi_demo_3_local_compute","data": {"pu_loc_id": 41,"do_loc_id": 57,"pu_time": "2020/4/1 0:41","do_time": "2020/4/1 0:56","trip_distance": 6.79,"fare_amount": 21.0}}]}' 
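The same request can be sent from Python. The sketch below uses only the standard library and reuses the endpoint, pipeline name, and payload from the curl command above. The helper name build_process_request is hypothetical, and because the response format is not described here, the sketch simply prints the raw response body.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host and port for your setup.
PROCESS_URL = "http://localhost:8000/process"


def build_process_request(pipeline: str, data: dict, url: str = PROCESS_URL) -> urllib.request.Request:
    """Build a POST request carrying a single transformation request (hypothetical helper)."""
    body = json.dumps({"requests": [{"pipeline": pipeline, "data": data}]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_process_request(
    "nyc_taxi_demo_3_local_compute",
    {
        "pu_loc_id": 41,
        "do_loc_id": 57,
        "pu_time": "2020/4/1 0:41",
        "do_time": "2020/4/1 0:56",
        "trip_distance": 6.79,
        "fare_amount": 21.0,
    },
)

if __name__ == "__main__":
    try:
        # Requires the transformation service to be listening locally.
        with urllib.request.urlopen(request, timeout=5) as response:
            print(response.read().decode("utf-8"))
    except OSError as error:
        # URLError and HTTPError are subclasses of OSError.
        print(f"Could not reach the transformation service: {error}")
```

Building the request separately from sending it keeps the payload easy to inspect or unit test without a running service.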

It also provides the ability to auto-generate the DSL file if there are already predefined feature transformations, which have been created for the offline-transformation.

Online transformation performance benchmark

Online transformation must perform in near real time, delivering low latency at high queries per second (QPS), to meet the needs of many enterprise customers. To measure performance, we ran three benchmarks: first, a deployment on AKS with traffic going through the ingress controller; second, traffic going through the AKS internal load balancer; and finally, traffic over localhost.

Benchmark A: Traffic going through ingress controller (AKS)

Infrastructure setup

  • Test agent runs on 1 pod on node with size Standard_D8ds_v5
  • Transform function deployed as a Docker image running on 1 pod on a different node with size Standard_D8ds_v5 in the same AKS cluster.
  • Agent sends requests through the service hostname, which means traffic goes through the ingress controller.
  • Test command: ab -k -c {concurrency_count} -n 1000000 http://feathr-online.trafficmanager.net/healthz
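The test command is repeated once per concurrency level. As a rough sketch, the sweep can be generated with a few lines of Python. The helper name ab_command is hypothetical, the concurrency levels are the ones reported in the result tables, and the script only prints the commands rather than running them.

```python
import shlex
from typing import List

CONCURRENCY_LEVELS = [100, 200, 300, 400, 500]  # levels reported in the results
TOTAL_REQUESTS = 1_000_000
TARGET_URL = "http://feathr-online.trafficmanager.net/healthz"


def ab_command(url: str, concurrency: int, total_requests: int = TOTAL_REQUESTS) -> List[str]:
    """Build the ApacheBench argument list for one run (hypothetical helper).

    -k enables HTTP keep-alive, -c sets the number of concurrent requests,
    and -n sets the total number of requests.
    """
    return ["ab", "-k", "-c", str(concurrency), "-n", str(total_requests), url]


commands = [ab_command(TARGET_URL, level) for level in CONCURRENCY_LEVELS]
for command in commands:
    # Print a shell-ready command line; pass the list to subprocess.run to execute it.
    print(shlex.join(command))
```

Swapping TARGET_URL for the internal load balancer address or the localhost address gives the command lines for the other two benchmarks.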

Benchmark A result

Total requests   Concurrency   p90 (ms)   p95 (ms)   p99 (ms)   Requests/sec
1000000          100           3          4          9          43710
1000000          200           6          8          15         43685
1000000          300           10         11         18         43378
1000000          400           13         15         21         43220
1000000          500           16         19         24         42406

Benchmark B: Traffic going through AKS internal load balancer (AKS)

Benchmark B—Infrastructure setup

  • Test agent runs on 1 pod on node with size Standard_D8ds_v5
  • Transform function deployed as a Docker image running on 1 pod on a different node with size Standard_D8ds_v5 in the same AKS cluster.
  • Agent sends requests through the service IP address, which means traffic goes through the internal load balancer.
  • Test command: ab -k -c {concurrency_count} -n 1000000 http://10.0.187.2/healthz

Benchmark B result

Total requests   Concurrency   p90 (ms)   p95 (ms)   p99 (ms)   Requests/sec
1000000          100           3          4          4          47673
1000000          200           5          7          8          47035
1000000          300           9          10         12         46613
1000000          400           11         12         15         45362
1000000          500           14         15         19         44941

 

Benchmark C: Traffic going through localhost (AKS)

Infrastructure setup

  • Test agent runs on 1 pod on node with size Standard_D8ds_v5.
  • Transform function deployed as a Docker image running on the same pod.
  • Agent sends requests through localhost, which means there is no network traffic at all.
  • Test command: ab -k -c {concurrency_count} -n 1000000 http://localhost/healthz

Benchmark C result

Total requests   Concurrency   p90 (ms)   p95 (ms)   p99 (ms)   Requests/sec
1000000          100           2          2          3          59466
1000000          200           4          4          5          59433
1000000          300           6          6          8          60184
1000000          400           8          9          10         59622
1000000          500           10         11         14         59031

Benchmark summary

  • If the transform service and its upstream caller are in the same host/pod, the p95 latency is very good, staying within 10 ms when concurrency is below 500.
  • If the transform service and its upstream caller are in different hosts/pods and traffic goes through the internal load balancer, the p95 latency increases by about 2-4 ms compared with localhost.
  • If the transform service and its upstream caller are in different hosts/pods and traffic goes through the ingress controller, the p95 latency increases by about 2-8 ms compared with localhost.
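The latency differences in this summary can be checked directly against the p95 columns of the three result tables. The short Python sketch below does that arithmetic; the dictionary keys and the helper name overhead_vs_localhost are our own labels for illustration.

```python
CONCURRENCY = [100, 200, 300, 400, 500]

# p95 latency in milliseconds, copied from the Benchmark A, B, and C results.
P95_MS = {
    "ingress_controller": [4, 8, 11, 15, 19],
    "internal_load_balancer": [4, 7, 10, 12, 15],
    "localhost": [2, 4, 6, 9, 11],
}


def overhead_vs_localhost(path: str) -> list:
    """Return the extra p95 latency per concurrency level compared with localhost."""
    return [remote - local for remote, local in zip(P95_MS[path], P95_MS["localhost"])]


for path in ("internal_load_balancer", "ingress_controller"):
    deltas = overhead_vs_localhost(path)
    print(f"{path}: +{min(deltas)} to +{max(deltas)} ms at p95")
# internal_load_balancer: +2 to +4 ms at p95
# ingress_controller: +2 to +8 ms at p95

# Localhost p95 stays within 10 ms for every concurrency level below 500.
within_10_ms = [c for c, p95 in zip(CONCURRENCY, P95_MS["localhost"]) if p95 <= 10]
print(within_10_ms)  # [100, 200, 300, 400]
```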

Benchmark thanks to Blair Chan and Chen Xu. For more details, check out the online transformation guide.

Getting started with sandbox environment

This is an exciting feature, especially for data scientists who may not have the necessary infrastructure background or know how to deploy the infrastructure in the cloud. The sandbox is a fully featured, quick-start Feathr environment that enables organizations to rapidly prototype various capabilities of Feathr without the burden of full-scale infrastructure deployment. It is designed to make it easier for users to get started quickly, validate feature definitions and new ideas, and explore Feathr interactively.

By default, it comes with a Jupyter notebook environment to interact with the Feathr platform.


Users can also use the user interface (UI) to visualize features, lineage, and other capabilities.

To get started, check out the quick start guide to local sandbox.

Feathr with MLOps V2 accelerator


The MLOps V2 solution accelerator provides a modular, end-to-end approach to MLOps in Azure based on pattern architectures. We are pleased to announce an initial integration of Feathr into the classical pattern, which enables Terraform-based deployment of Feathr as part of infrastructure provisioning with an Azure Machine Learning (AML) workspace.

With this integration, enterprise customers can use the templates to customize their continuous integration and continuous delivery (CI/CD) workflows to run end-to-end MLOps in their organization.

Check out the Feathr integration with MLOps V2 deployment guide.

Feathr GUI enhancement

We have added a number of enhancements to the graphical user interface (GUI) to improve usability. These include support for registering features, deleting features, and displaying the version, as well as quick access to lineage via the top menu.

Try out our demo UX on our live demo site.

What’s next

The Feathr journey has just begun, and this is the first stop on the way to many great things to come. Stay tuned for more enterprise enhancements, including security, monitoring, and compliance features, along with a richer MLOps experience. Check out how you can contribute to this project, and if you have not already, join our Slack channel.