5 min read

Announcing the availability of Feathr 1.0

This blog is co-authored by Edwin Cheung, Principal Software Engineering Manager and Xiaoyong Zhu, Principal Data Scientist.

Feathr is an enterprise-scale feature store that facilitates the creation, engineering, and usage of machine learning features in production. It has been used by many organizations as an online/offline store, as well as for real-time streaming.

Today, we are excited to announce the much-anticipated availability of the open-source Feathr 1.0. It contains many new features and enhancements added since Feathr became open source one year ago. Capabilities such as online transformation, the rapid sandbox environment, and MLOps V2 accelerator integration significantly accelerate the development and deployment of machine learning projects at enterprise scale.

Online transformation via domain specific language (DSL)

In many machine learning scenarios, features must be generated for both training and inference. Today, transformation happens only before feature data is published to the online store, which means the data source cannot be an online service when the transformation is required close to real time. In such cases, users need a mechanism to run transformations on the inference data dynamically, before it is passed to the model. The new online transformation via DSL addresses these challenges with a custom transformation engine that processes transformation requests and responses on demand, close to real time.

Feathr online transformation architecture.

It allows transformation logic to be defined declaratively using a DSL syntax based on EBNF. Where complex custom transformations are needed, it also provides extensibility by supporting user-defined functions (UDFs) written in Python or Java.

nyc_taxi_demo(pu_loc_id as int, do_loc_id as int, pu_time as string, do_time as string, trip_distance as double, fare_amount as double)
| project duration_second = (to_unix_timestamp(do_time, "%Y/%-m/%-d %-H:%-M") - to_unix_timestamp(pu_time, "%Y/%-m/%-d %-H:%-M"))
| project speed_mph = trip_distance * 3600 / duration_second
;
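When a transformation is too complex for the DSL, the same derived features can be computed in a UDF. The sketch below is a minimal, hypothetical Python equivalent of the pipeline above; the function names and the timestamp format constant are illustrative, and the actual UDF registration mechanism is described in the online transformation guide.

```python
from datetime import datetime

# Timestamp format used by the sample NYC taxi data, e.g. "2020/4/1 0:41".
_FMT = "%Y/%m/%d %H:%M"

def duration_second(pu_time: str, do_time: str) -> float:
    """Trip duration in seconds, mirroring the DSL's to_unix_timestamp difference."""
    pu = datetime.strptime(pu_time, _FMT)
    do = datetime.strptime(do_time, _FMT)
    return (do - pu).total_seconds()

def speed_mph(trip_distance: float, duration: float) -> float:
    """Average speed in mph, mirroring trip_distance * 3600 / duration_second."""
    return trip_distance * 3600 / duration
```

For the sample trip in the request below, `duration_second("2020/4/1 0:41", "2020/4/1 0:56")` yields 900.0 seconds, giving a speed of roughly 27.16 mph for a 6.79-mile trip.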

This declarative logic runs in a new high-performance DSL engine. We provide a Helm chart to deploy this service on container-based platforms such as Azure Kubernetes Service (AKS).

The transformation engine can also run as a standalone executable, an HTTP server that can be used to transform data for testing purposes; it is published as the Docker image feathrfeaturestore/feathrpiper:latest.

curl -s -H "content-type:application/json" http://localhost:8000/process -d '{"requests": [{"pipeline": "nyc_taxi_demo_3_local_compute", "data": {"pu_loc_id": 41, "do_loc_id": 57, "pu_time": "2020/4/1 0:41", "do_time": "2020/4/1 0:56", "trip_distance": 6.79, "fare_amount": 21.0}}]}'
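The same request can be issued from Python. This is a minimal sketch using only the standard library; the endpoint and pipeline name come from the curl example above, and `build_request` is an illustrative helper, not part of the Feathr API.

```python
import json
import urllib.request

def build_request(pipeline: str, data: dict) -> dict:
    """Assemble the JSON body expected by the engine's /process endpoint."""
    return {"requests": [{"pipeline": pipeline, "data": data}]}

def transform(pipeline: str, data: dict,
              url: str = "http://localhost:8000/process") -> dict:
    """POST one transformation request and return the parsed JSON response."""
    body = json.dumps(build_request(pipeline, data)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"content-type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `transform("nyc_taxi_demo_3_local_compute", {...})` against a locally running engine returns the computed features, such as `duration_second` and `speed_mph`, for the submitted trip.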

It can also auto-generate the DSL file from feature transformations that have already been defined for offline transformation.

Online transformation performance benchmark

For many enterprise customers, it is imperative that online transformation performs close to real time, meeting low-latency demands at high queries per second (QPS). To measure performance, we conducted three benchmarks: first, deployment on AKS with traffic going through the ingress controller; second, traffic going through the AKS internal load balancer; and finally, traffic going through localhost.

Benchmark A—Traffic going through ingress controller (AKS)

Infrastructure setup

  • Test agent runs on 1 pod on a node of size Standard_D8ds_v5.
  • Transform function deployed as a Docker image running on 1 pod on a different node of size Standard_D8ds_v5 in the same AKS cluster.
  • The agent sends requests through the service hostname, so traffic goes through the ingress controller.
  • Test command: ab -k -c {concurrency_count} -n 1000000 http://feathr-online.trafficmanager.net/healthz

Benchmark A result

| Total Requests | Concurrency | p90 (ms) | p95 (ms) | p99 (ms) | Requests/sec |
|---|---|---|---|---|---|
| 1000000 | 100 | 3 | 4 | 9 | 43710 |
| 1000000 | 200 | 6 | 8 | 15 | 43685 |
| 1000000 | 300 | 10 | 11 | 18 | 43378 |
| 1000000 | 400 | 13 | 15 | 21 | 43220 |
| 1000000 | 500 | 16 | 19 | 24 | 42406 |

Benchmark B—Traffic going through AKS internal load balancer (AKS)

Infrastructure setup

  • Test agent runs on 1 pod on a node of size Standard_D8ds_v5.
  • Transform function deployed as a Docker image running on 1 pod on a different node of size Standard_D8ds_v5 in the same AKS cluster.
  • The agent sends requests through the service IP, so traffic goes through the internal load balancer.
  • Test command: ab -k -c {concurrency_count} -n 1000000 http://10.0.187.2/healthz

Benchmark B result

| Total Requests | Concurrency | p90 (ms) | p95 (ms) | p99 (ms) | Requests/sec |
|---|---|---|---|---|---|
| 1000000 | 100 | 3 | 4 | 4 | 47673 |
| 1000000 | 200 | 5 | 7 | 8 | 47035 |
| 1000000 | 300 | 9 | 10 | 12 | 46613 |
| 1000000 | 400 | 11 | 12 | 15 | 45362 |
| 1000000 | 500 | 14 | 15 | 19 | 44941 |


Benchmark C—Traffic going through local host (AKS)

Infrastructure setup

  • Test agent runs on 1 pod on a node of size Standard_D8ds_v5.
  • Transform function deployed as a Docker image running on the same pod.
  • The agent sends requests through localhost, so there is no network traffic at all.
  • Test command: ab -k -c {concurrency_count} -n 1000000 http://localhost/healthz

Benchmark C result

| Total Requests | Concurrency | p90 (ms) | p95 (ms) | p99 (ms) | Requests/sec |
|---|---|---|---|---|---|
| 1000000 | 100 | 2 | 2 | 3 | 59466 |
| 1000000 | 200 | 4 | 4 | 5 | 59433 |
| 1000000 | 300 | 6 | 6 | 8 | 60184 |
| 1000000 | 400 | 8 | 9 | 10 | 59622 |
| 1000000 | 500 | 10 | 11 | 14 | 59031 |

Benchmark summary

  • If the transform service and the upstream caller run on the same host/pod, p95 latency is very good, staying within 10 ms at concurrency < 500.
  • If they run on different hosts/pods and traffic goes through the internal load balancer, p95 latency increases by roughly 2-4 ms.
  • If they run on different hosts/pods and traffic goes through the ingress controller, p95 latency increases by roughly 2-8 ms.
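The per-path overheads in the summary follow directly from the p95 columns of the three benchmark tables. The snippet below hard-codes those p95 values and computes each network path's overhead relative to localhost; the dictionary names are illustrative.

```python
# p95 latency (ms) at each concurrency level, copied from the benchmark tables above.
P95 = {
    "ingress":     {100: 4, 200: 8, 300: 11, 400: 15, 500: 19},
    "internal_lb": {100: 4, 200: 7, 300: 10, 400: 12, 500: 15},
    "localhost":   {100: 2, 200: 4, 300: 6,  400: 9,  500: 11},
}

def overhead_vs_localhost(path: str) -> dict:
    """Extra p95 latency (ms) of a network path compared to same-pod localhost."""
    return {c: P95[path][c] - P95["localhost"][c] for c in P95[path]}

print(overhead_vs_localhost("internal_lb"))  # {100: 2, 200: 3, 300: 4, 400: 3, 500: 4}
print(overhead_vs_localhost("ingress"))      # {100: 2, 200: 4, 300: 5, 400: 6, 500: 8}
```

The internal load balancer adds 2-4 ms at p95 across concurrency levels, while the ingress controller adds 2-8 ms, matching the summary above.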

Thanks to Blair Chan and Chen Xu for the benchmark work. For more details, check out the online transformation guide.

Getting started with sandbox environment

This is an exciting feature, especially for data scientists who may not have the necessary infrastructure background or know how to deploy infrastructure in the cloud. The sandbox is a fully featured, quick-start Feathr environment that enables organizations to rapidly prototype Feathr's capabilities without the burden of full-scale infrastructure deployment. It is designed to help users get started quickly, validate feature definitions and new ideas, and work interactively.

By default, it comes with a Jupyter notebook environment to interact with the Feathr platform.

Jupyter notebook running within the Sandbox environment.

Users can also use the user experience (UX) to visualize the features, lineage, and other capabilities.

To get started, check out the quick start guide to local sandbox.

Feathr with MLOps V2 accelerator

Feathr integrated MLOps V2 architecture.

The MLOps V2 solution accelerator provides a modular, end-to-end approach to MLOps in Azure based on pattern architectures. We are pleased to announce an initial integration of Feathr into the classical pattern, which enables Terraform-based infrastructure deployment as part of infrastructure provisioning alongside the Azure Machine Learning (AML) workspace.

With this integration, enterprise customers can use the templates to customize their continuous integration and continuous delivery (CI/CD) workflows to run end-to-end MLOps in their organization.

Check out the Feathr integration with MLOps V2 deployment guide.

Feathr GUI enhancement

Feathr GUI enhancement with registering and deleting features.

We have added a number of enhancements to the graphical user interface (GUI) to improve usability. These include support for registering features, deleting features, and displaying versions, as well as quick access to lineage via the top menu.

Try out our demo UX on our live demo site.

What’s next

The Feathr journey has just begun; this is the first stop on the way to many great things to come. Stay tuned for more enterprise enhancements in security, monitoring, and compliance, along with a richer MLOps experience. Check out how you can contribute to this great project, and if you have not already, join our Slack channel here.