{"id":85277,"date":"2021-03-01T15:45:35","date_gmt":"2021-03-01T23:45:35","guid":{"rendered":"https:\/\/cloudblogs.microsoft.com\/opensource\/?p=85277"},"modified":"2025-06-23T08:42:19","modified_gmt":"2025-06-23T15:42:19","slug":"optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider","status":"publish","type":"post","link":"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/","title":{"rendered":"Optimizing BERT model for Intel CPU Cores using ONNX runtime default execution provider"},"content":{"rendered":"\n<p><em>This blog was co-authored with Manash Goswami, Principal Program Manager, Machine Learning Platform.<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"360\" height=\"249\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/03\/onnx-runtime-intel-ai.jpg\" alt=\"A logo for a company\" class=\"wp-image-97580\" \/><\/figure>\n\n\n\n<p>The performance improvements provided by ONNX Runtime powered by Intel\u00ae Deep Learning Boost: Vector Neural Network Instructions (Intel\u00ae DL Boost: VNNI) greatly improves performance of machine learning model execution for developers. In the past, machine learning models mostly relied on 32-bit floating point instructions using AVX512. Now, machine learning models can use 8-bit integer instructions (Intel\u00ae DL Boost: VNNI) to achieve substantial speed increases without significant loss of accuracy. To fully understand these performance improvements, you must first understand ONNX Runtime, Bi-Directional Encoder Representations from Transformers (BERT), Intel DL Boost: VNNI, and steps to achieve the best performance with ONNX Runtime on Intel platforms. 
Keep reading to learn more about accelerating BERT model inference with ONNX Runtime and Intel\u00ae DL Boost: VNNI.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-is-onnx-runtime\">What is ONNX Runtime?<\/h2>\n\n\n\n<p><a href=\"https:\/\/www.onnxruntime.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">ONNX Runtime<\/a> is an open-source project designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms. It enables acceleration of machine learning inferencing across all of your deployment targets using a single set of APIs.<sup>1<\/sup> Intel has partnered with the Microsoft ONNX Runtime team to add support for Intel\u00ae DL Boost and take advantage of microarchitectural improvements, such as non-exclusive caches on the new 11th Gen Intel\u00ae Core\u2122 processors, to significantly improve performance. Read more to learn how to achieve the best performance using Intel\u00ae Deep Learning Boost: VNNI on ONNX Runtime\u2019s default CPU backend (Microsoft Linear Algebra Subroutine (MLAS)).<\/p>\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Picture2.webp\" alt=\"ONNX Runtime Architecture\" class=\"wp-image-85313 webp-format\" srcset=\"\" data-orig-src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Picture2.jpg\"><\/figure>\n\n\n\n<p><em>Figure 1: ONNX Runtime Architecture<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-is-bert\">What is BERT?<\/h2>\n\n\n\n<p>BERT was originally created and published in 2018 by Jacob Devlin and his colleagues at Google. It\u2019s a machine learning technique that greatly improves natural language processing (NLP) capabilities. This technique does not process words individually (as earlier approaches did); instead, it processes complete sentences. 
Machine learning models can now understand both the relationships between words within a sentence and the context of the sentence. This approach to NLP has revolutionized language processing tasks such as search, document classification, question answering, sentence similarity, text prediction, and more. BERT-class models are widely applied in industry. Recently, techniques such as knowledge distillation and quantization have been successfully applied to BERT, making this model deployable on Windows PCs.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"what-is-deep-learning-boost-vnni\">What is Deep Learning Boost: VNNI?<\/h2>\n\n\n\n<p>Intel Deep Learning Boost: VNNI is designed to deliver significant deep learning acceleration, as well as power-saving optimizations. A single vector instruction (such as VPDPBUSD) multiplies pairs of 8-bit integers and accumulates their products into a 32-bit output.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"360\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/03\/isa-fusion.jpg\" alt=\"A screenshot of a computer screen\" class=\"wp-image-97586\" srcset=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/03\/isa-fusion.jpg 640w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/03\/isa-fusion-388x218.jpg 388w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/03\/isa-fusion-450x253.jpg 450w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"steps-to-build-and-execute-onnx-runtime-for-windows-10-on-11th-gen-intel-core-processors\">Steps to build and execute ONNX Runtime for Windows 10 on 11<sup>th<\/sup> Gen Intel\u00ae Core\u2122 Processors<\/h2>\n\n\n\n<p>Pre-requisites:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Install <a 
href=\"https:\/\/www.python.org\/downloads\/release\/python-380\/\" target=\"_blank\" rel=\"noreferrer noopener\">Python 3.8<\/a>.<\/li>\n\n\n\n<li>Install <a href=\"https:\/\/jupyter.org\/install\" target=\"_blank\" rel=\"noreferrer noopener\">jupyter notebook<\/a>.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"preparing-the-model\">Preparing the model:<\/h2>\n\n\n\n<p>In the Command Line terminal, open the jupyter notebook:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\njupyter notebook\n<\/pre><\/div>\n\n\n<p>Once the notebook opens in the browser, run all the cells in notebook and save the quantized INT8 ONNX model on your local machine.<br>Build ONNXRuntime:<br>When building ONNX Runtime, developers have the flexibility to choose between OpenMP or ONNX Runtime\u2019s own thread pool implementation. For achieving the best performance on Intel platforms, configure ONNX Runtime with OpenMP and later explicitly define the threading policy for model inference.<br>In the Command Line terminal:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\n\u200bgit clone --recursive https:\/\/github.com\/Microsoft\/ONNXRuntime\ncd ONNXRuntime\nInstall cmake-3.13 or higher from https:\/\/cmake.org\/download\/\n.\\build.bat --config RelWithDebInfo --build_shared_lib \u2013parallel --use_openmp\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\" id=\"tuning-performance-for-onnx-runtime-s-default-execution-provider\">Tuning Performance for ONNX Runtime\u2019s Default Execution Provider:<\/h2>\n\n\n\n<p>In conditions where threading can be explicit, it is recommended to parallelize threads, binding each thread to separate physical cores. On platforms where hyperthreading is enabled, the recommendation is to skip alternate cores (if the number of threads needs to be less than the number of logical cores). 
This reduces the overhead of cache thrashing caused by repeated thread swapping between cores.<br>On Windows, use \u201cstart \/affinity AA\u201d to keep four ONNX Runtime threads on physical cores by skipping alternate logical cores. To explicitly fix the number of threads, use the OMP_NUM_THREADS environment variable. For example, in the Command Line terminal:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nset KMP_AFFINITY=granularity=fine,compact,1,0\nset OMP_NESTED=0\nset OMP_WAIT_POLICY=ACTIVE\nset OMP_NUM_THREADS=4\n<\/pre><\/div>\n\n\n<h2 class=\"wp-block-heading\" id=\"run-the-quantized-model-with-onnx-runtime\">Run the quantized model with ONNX Runtime:<\/h2>\n\n\n\n<p>When executing the runtime, place a folder containing the input test dataset you want to use in the same directory as the runtime. For illustration purposes, we will generate a random test input dataset with the following Python script, which we will name <em>generate_test_data_set.py<\/em>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nimport numpy as np\nfrom onnx import numpy_helper\n\n# Write the same random int64 tensor to all three BERT inputs for each\n# batch size and sequence length. Use randint: rand() cast to int64\n# would produce all zeros.\nfor batch in [1, 2, 4, 8, 16]:\n    for seq in [20, 32, 64]:\n        numpy_array = np.random.randint(0, 100, size=(batch, seq)).astype(np.int64)\n        tensor = numpy_helper.from_array(numpy_array)\n        for i in range(3):\n            name = \"input_\" + str(i) + \"_\" + str(batch) + \"_\" + str(seq) + \".pb\"\n            with open(name, \"wb\") as f:\n                f.write(tensor.SerializeToString())\n            print(name)\n<\/pre><\/div>\n\n\n<p>In the Command Line terminal:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\npython generate_test_data_set.py\n<\/pre><\/div>\n\n\n<p>This will generate a test data set for the three BERT base inputs:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>input_0_&lt;batch_size&gt;_&lt;seqLength&gt;.pb<\/li>\n\n\n\n<li>input_1_&lt;batch_size&gt;_&lt;seqLength&gt;.pb<\/li>\n\n\n\n<li>input_2_&lt;batch_size&gt;_&lt;seqLength&gt;.pb<\/li>\n<\/ul>\n\n\n\n<p>Create a new <em>test_data_set_0<\/em> folder in the same location as the ONNX model files. Make sure no other folder exists in that location. Copy the three inputs with the SAME batch size and sequence length to the <em>test_data_set_0<\/em> folder.<br>In the test_data_set_0 folder, rename<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>input_0_&lt;batch_size&gt;_&lt;seqLength&gt;.pb to input_0.pb<\/li>\n\n\n\n<li>input_1_&lt;batch_size&gt;_&lt;seqLength&gt;.pb to input_1.pb<\/li>\n\n\n\n<li>input_2_&lt;batch_size&gt;_&lt;seqLength&gt;.pb to input_2.pb<\/li>\n<\/ul>\n\n\n\n<p>Now run ONNX Runtime. 
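The copy-and-rename steps above can be scripted rather than done by hand. A minimal sketch in Python (the `stage_test_data` helper and its arguments are illustrative, not part of ONNX Runtime; it assumes the generated .pb files are in the source directory):

```python
# Illustrative helper (not part of ONNX Runtime): stage one
# (batch, seq) combination into the test_data_set_0 layout that
# onnxruntime_perf_test.exe expects, renaming
# input_<i>_<batch>_<seq>.pb to input_<i>.pb.
import shutil
from pathlib import Path

def stage_test_data(batch, seq, src_dir=".", model_dir="."):
    dest = Path(model_dir) / "test_data_set_0"
    dest.mkdir(exist_ok=True)
    for i in range(3):  # BERT base takes three inputs
        src = Path(src_dir) / "input_{}_{}_{}.pb".format(i, batch, seq)
        shutil.copy(src, dest / "input_{}.pb".format(i))
    return dest
```

Re-running it for the next (batch, seq) pair overwrites the staged files, so the same folder can be reused between benchmark runs.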
In the Command Line terminal:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\ncd &lt;root&gt;\\onnxruntime\\build\\Windows\\RelWithDebInfo\\RelWithDebInfo\nonnxruntime_perf_test.exe -m times -r &lt;#iterations&gt; -o 99 -e cpu MODEL_NAME.onnx\n<\/pre><\/div>\n\n\n<p>Repeat the steps for the next set of batch sizes and sequence lengths.<br>Get extensive details about <a href=\"https:\/\/www.onnxruntime.ai\/docs\/how-to\/build.html\" target=\"_blank\" rel=\"noreferrer noopener\">ONNX Runtime inference<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"results\">Results<\/h2>\n\n\n\n<p>With Intel\u00ae DL Boost: VNNI and ONNX Runtime, developers can significantly increase throughput and performance for transformer-based natural language processing models with quantization. For example, the quantized BERT 12-layer model with Intel\u00ae DL Boost: VNNI and ONNX Runtime can achieve up to 2.9 times performance gains. The distilled BERT model can achieve up to 3.3 times performance gains.<br>To participate, check out the GitHub repo for <a href=\"https:\/\/github.com\/microsoft\/onnxruntime\/\" target=\"_blank\" rel=\"noreferrer noopener\">ONNX Runtime<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/graph1-1024x635.webp\" alt=\"\" \/><\/figure>\n\n\n\n<p><em>The BERT 12-layer language processing workload on 11th Gen Intel\u00ae Core\u2122 processors achieves up to 2.9 times speedup with DL Boost: VNNI<\/em><\/p>\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3.webp\" alt=\"chart, histogram\" class=\"wp-image-85292 webp-format\" srcset=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3.webp 856w, 
https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3-300x212.webp 300w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3-768x542.webp 768w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3-330x233.webp 330w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3-800x564.webp 800w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3-400x282.webp 400w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3-450x318.jpg 450w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3-650x459.jpg 650w\" data-orig-src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3.jpg\" data-orig-srcset=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3.jpg 856w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3-300x212.jpg 300w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3-768x542.jpg 768w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3-330x233.jpg 330w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3-800x564.jpg 800w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3-400x282.jpg 400w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3-450x318.jpg 450w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/Capture-3-650x459.jpg 650w\"><\/figure>\n\n\n\n<p><em>The distilled BERT model achieves up to 3.38 times speedup with DL Boost: VNNI (four threads)<\/em><br><br><sup>1<\/sup><a 
href=\"https:\/\/microsoft.github.io\/onnxruntime\/about.html\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/microsoft.github.io\/onnxruntime\/about.html<\/a>, 10\/1\/2020<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This blog was co-authored with Manash Goswami, Principal Program Manager, Machine Learning Platform. The performance improvements provided by ONNX Runtime powered by Intel\u00ae Deep Learning Boost: Vector Neural Network Instructions (Intel\u00ae DL Boost: VNNI) greatly improves performance of machine learning model execution for developers.<\/p>\n","protected":false},"author":5562,"featured_media":85457,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msxcm_post_with_no_image":false,"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","footnotes":""},"post_tag":[],"content-type":[],"topic":[],"programming-languages":[],"coauthors":[1775,1778,1781],"class_list":["post-85277","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","review-flag-1593580362-584","review-flag-1593580428-734","review-flag-1593580415-931","review-flag-1593580419-521","review-flag-1-1593580432-963","review-flag-2-1593580437-411","review-flag-3-1593580442-169","review-flag-4-1593580448-609","review-flag-8-1593580468-572","review-flag-9-1593580473-997","review-flag-exclu-1593580297-613","review-flag-machi-1680214156-53","review-flag-new-1593580248-669"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Optimizing BERT model for Intel CPU Cores using ONNX runtime default execution provider | Microsoft Open Source Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Optimizing BERT model for Intel CPU Cores using ONNX runtime default execution provider | Microsoft Open Source Blog\" \/>\n<meta property=\"og:description\" content=\"This blog was co-authored with Manash Goswami, Principal Program Manager, Machine Learning Platform. The performance improvements provided by ONNX Runtime powered by Intel\u00ae Deep Learning Boost: Vector Neural Network Instructions (Intel\u00ae DL Boost: VNNI) greatly improves performance of machine learning model execution for developers.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Open Source Blog\" \/>\n<meta property=\"article:published_time\" content=\"2021-03-01T23:45:35+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-06-23T15:42:19+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/4.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"360\" \/>\n\t<meta property=\"og:image:height\" content=\"249\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Aparna Gollapudi, Ramakrishnan Sivakumar, Saurabh Tangri\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@OpenAtMicrosoft\" \/>\n<meta name=\"twitter:site\" content=\"@OpenAtMicrosoft\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Aparna Gollapudi, Ramakrishnan Sivakumar, Saurabh 
Tangri\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 min read\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/\"},\"author\":[{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/author\/aparna-gollapudi\/\",\"@type\":\"Person\",\"@name\":\"Aparna Gollapudi\"},{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/author\/ramakrishnan-sivakumar\/\",\"@type\":\"Person\",\"@name\":\"Ramakrishnan Sivakumar\"},{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/author\/saurabh-tangri\/\",\"@type\":\"Person\",\"@name\":\"Saurabh Tangri\"}],\"headline\":\"Optimizing BERT model for Intel CPU Cores using ONNX runtime default execution 
provider\",\"datePublished\":\"2021-03-01T23:45:35+00:00\",\"dateModified\":\"2025-06-23T15:42:19+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/\"},\"wordCount\":979,\"publisher\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/4.jpg\",\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/\",\"name\":\"Optimizing BERT model for Intel CPU Cores using ONNX runtime default execution provider | Microsoft Open Source 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/4.jpg\",\"datePublished\":\"2021-03-01T23:45:35+00:00\",\"dateModified\":\"2025-06-23T15:42:19+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/#primaryimage\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/4.jpg\",\"contentUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/4.jpg\",\"width\":360,\"height\":249,\"caption\":\"shape\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/opensource.microsoft.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Optimizing BERT model for Intel CPU Cores using ONNX runtime default 
execution provider\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#website\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/\",\"name\":\"Microsoft Open Source Blog\",\"description\":\"Open dialogue about openness at Microsoft \u2013 open source, standards, interoperability\",\"publisher\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/opensource.microsoft.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#organization\",\"name\":\"Microsoft Open Source Blog\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png\",\"contentUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png\",\"width\":259,\"height\":194,\"caption\":\"Microsoft Open Source Blog\"},\"image\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/OpenAtMicrosoft\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Optimizing BERT model for Intel CPU Cores using ONNX runtime default execution provider | Microsoft Open Source Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/","og_locale":"en_US","og_type":"article","og_title":"Optimizing BERT model for Intel CPU Cores using ONNX runtime default execution provider | Microsoft Open Source Blog","og_description":"This blog was co-authored with Manash Goswami, Principal Program Manager, Machine Learning Platform. The performance improvements provided by ONNX Runtime powered by Intel\u00ae Deep Learning Boost: Vector Neural Network Instructions (Intel\u00ae DL Boost: VNNI) greatly improves performance of machine learning model execution for developers.","og_url":"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/","og_site_name":"Microsoft Open Source Blog","article_published_time":"2021-03-01T23:45:35+00:00","article_modified_time":"2025-06-23T15:42:19+00:00","og_image":[{"width":360,"height":249,"url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/4.jpg","type":"image\/jpeg"}],"author":"Aparna Gollapudi, Ramakrishnan Sivakumar, Saurabh Tangri","twitter_card":"summary_large_image","twitter_creator":"@OpenAtMicrosoft","twitter_site":"@OpenAtMicrosoft","twitter_misc":{"Written by":"Aparna Gollapudi, Ramakrishnan Sivakumar, Saurabh Tangri","Est. 
reading time":"5 min read"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/#article","isPartOf":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/"},"author":[{"@id":"https:\/\/opensource.microsoft.com\/blog\/author\/aparna-gollapudi\/","@type":"Person","@name":"Aparna Gollapudi"},{"@id":"https:\/\/opensource.microsoft.com\/blog\/author\/ramakrishnan-sivakumar\/","@type":"Person","@name":"Ramakrishnan Sivakumar"},{"@id":"https:\/\/opensource.microsoft.com\/blog\/author\/saurabh-tangri\/","@type":"Person","@name":"Saurabh Tangri"}],"headline":"Optimizing BERT model for Intel CPU Cores using ONNX runtime default execution provider","datePublished":"2021-03-01T23:45:35+00:00","dateModified":"2025-06-23T15:42:19+00:00","mainEntityOfPage":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/"},"wordCount":979,"publisher":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#organization"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/#primaryimage"},"thumbnailUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/4.jpg","inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/","url":"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/","name":"Optimizing BERT model for Intel CPU Cores using ONNX runtime default execution provider | 
Microsoft Open Source Blog","isPartOf":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/#primaryimage"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/#primaryimage"},"thumbnailUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/4.jpg","datePublished":"2021-03-01T23:45:35+00:00","dateModified":"2025-06-23T15:42:19+00:00","breadcrumb":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/#primaryimage","url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/4.jpg","contentUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/02\/4.jpg","width":360,"height":249,"caption":"shape"},{"@type":"BreadcrumbList","@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/03\/01\/optimizing-bert-model-for-intel-cpu-cores-using-onnx-runtime-default-execution-provider\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/opensource.microsoft.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Optimizing BERT model for Intel CPU Cores using ONNX runtime default execution 
provider"}]},{"@type":"WebSite","@id":"https:\/\/opensource.microsoft.com\/blog\/#website","url":"https:\/\/opensource.microsoft.com\/blog\/","name":"Microsoft Open Source Blog","description":"Open dialogue about openness at Microsoft \u2013 open source, standards, interoperability","publisher":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/opensource.microsoft.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/opensource.microsoft.com\/blog\/#organization","name":"Microsoft Open Source Blog","url":"https:\/\/opensource.microsoft.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png","contentUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png","width":259,"height":194,"caption":"Microsoft Open Source Blog"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/OpenAtMicrosoft"]}]}},"msxcm_display_generated_audio":false,"msxcm_animated_featured_image":null,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Microsoft Open Source 
Blog","distributor_original_site_url":"https:\/\/opensource.microsoft.com\/blog","push-errors":false,"_links":{"self":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts\/85277","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/users\/5562"}],"replies":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/comments?post=85277"}],"version-history":[{"count":7,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts\/85277\/revisions"}],"predecessor-version":[{"id":97588,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts\/85277\/revisions\/97588"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/media\/85457"}],"wp:attachment":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/media?parent=85277"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/post_tag?post=85277"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/content-type?post=85277"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/topic?post=85277"},{"taxonomy":"programming-languages","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/programming-languages?post=85277"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/coauthors?post=85277"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}