{"id":94525,"date":"2023-06-26T08:00:00","date_gmt":"2023-06-26T15:00:00","guid":{"rendered":""},"modified":"2023-08-01T17:35:34","modified_gmt":"2023-08-02T00:35:34","slug":"automate-optimization-techniques-for-transformer-models","status":"publish","type":"post","link":"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/","title":{"rendered":"Automate optimization techniques for transformer models"},"content":{"rendered":"\n<p><a href=\"https:\/\/github.com\/microsoft\/Olive\" target=\"_blank\" rel=\"noreferrer noopener\">Olive <\/a>is an easy-to-use hardware-aware model optimization tool by Microsoft which builds up a unified optimization framework to enable independent hardware vendors (IHVs) extend their capabilities to include their state-of-the-art and hardware-specific optimization toolchains. The Intel\u00ae Neural Compressor is an open-source library supporting popular advanced model compression technologies, from techniques used in the industry to the latest state-of-the-art from research. 
Intel has collaborated with Microsoft to integrate Intel\u00ae Neural Compressor into Olive, enabling developers to easily take advantage of model compression techniques on their deployment platforms, including Intel processors and accelerators.<\/p>\n\n\n\n<p>The rest of this blog is organized as follows:<\/p>\n\n\n\n<p>1) We first provide an introduction to the toolchains, including <a href=\"https:\/\/www.intel.com\/content\/www\/us\/en\/developer\/tools\/oneapi\/neural-compressor.html\" target=\"_blank\" rel=\"noreferrer noopener\">Intel Neural Compressor<\/a>.<\/p>\n\n\n\n<p>2) We then walk through a step-by-step example of how to optimize popular workloads such as transformer-based models from Hugging Face.<\/p>\n\n\n\n<p>3) Finally, we conclude by summarizing performance gains and accuracy results obtained by using Olive and Intel Neural Compressor to optimize GPT-J, Bert-Base, and RoBERTa.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Olive with Intel\u00ae Neural Compressor<\/h2>\n\n\n\n<p>Olive is a user-friendly tool for optimizing models with hardware awareness. It combines best-in-class techniques across model compression, optimization, and compilation. 
Given a model, the target hardware, and deployment constraints such as accuracy and latency, Olive tunes the most suitable optimization techniques to generate highly efficient models for inference across operating environments, platforms, and devices.<\/p>\n\n\n\n<p>Intel\u00ae Neural Compressor provides popular model compression techniques such as quantization, pruning (sparsity), distillation, and neural architecture search for mainstream <a href=\"https:\/\/www.intel.com\/content\/www\/us\/en\/developer\/tools\/frameworks\/overview.html\" target=\"_blank\" rel=\"noreferrer noopener\">frameworks<\/a> such as PyTorch*, TensorFlow*, and ONNX Runtime.<\/p>\n\n\n\n<p>By integrating the Intel\u00ae Neural Compressor quantizer into Olive\u2019s optimizations, as shown in <em>figure 1<\/em>, developers can benefit from automatic selection of optimizations based on their target deployment platform. For example, developers can easily optimize their models with Olive to take advantage of Intel hardware acceleration technologies such as Intel\u00ae Advanced Matrix Extensions (Intel\u00ae AMX) and Intel\u00ae DL Boost. With Intel\u00ae DL Boost, developers can get up to a 4x theoretical performance speedup compared to FP32 baseline models, and even higher speedups when using Intel\u00ae AMX, while still meeting their model accuracy requirements.<\/p>\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-1024x701.webp\" alt=\"Olive architecture: taking an input model and production requirements, Olive tunes optimization techniques to output deployment-ready model packages. IncQuantization is one optimization technique users can apply in Olive. 
\" class=\"wp-image-94531 webp-format\" width=\"800\" height=\"547\" srcset=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-1024x701.png 1024w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-300x205.png 300w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-768x526.png 768w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-800x547.png 800w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-400x274.png 400w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-450x308.png 450w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-650x445.webp 650w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1.webp 1096w\" data-orig-src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-1024x701.png\" data-orig-srcset=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-1024x701.png 1024w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-300x205.png 300w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-768x526.png 768w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-800x547.png 800w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-400x274.png 400w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-450x308.png 450w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1-650x445.png 650w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture1.png 1096w\"><\/figure>\n\n\n\n<p>Figure 1: Olive Architecture<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Over two times speedup with Hugging Face transformer models<\/h2>\n\n\n\n<h3 
class=\"wp-block-heading\">Model Enabling<\/h3>\n\n\n\n<p>In this section, we\u2019ll explain how to optimize <a href=\"https:\/\/github.com\/microsoft\/Olive\/blob\/main\/examples\/bert\/README.md#bert-optimization-with-intel-neural-compressor-ptq-on-cpu\" target=\"_blank\" rel=\"noreferrer noopener\">Hugging Face Bert-Base<\/a> model. We\u2019ll leverage Intel\u00ae Neural Compressor\u2019s quantization capabilities, which supports static and dynamic, and have been integrated into Olive. Olive model optimization workflows are defined using JSON files and each optimization is named as a pass. Intel\u00ae Neural Compressor\u2019s quantization techniques have been incorporated into Olive as a single pass named, \u2018IncQuantization\u2019, providing developers with the ability to tune quantization methods and hyperparameters at the same time. &nbsp;<\/p>\n\n\n\n<p>The following steps illustrate how developers can use Olive with \u2018IncQuantization\u2019 to accelerate a Hugging Face Bert-Base&nbsp;model:<\/p>\n\n\n\n<p>1. Add \u2018IncQuantization\u2019 to \u2018passes\u2019 in a config.json file<\/p>\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture2.webp\" alt=\"code snippet for INCQuantization pass\" class=\"wp-image-94532 webp-format\" srcset=\"\" data-orig-src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture2.webp\"><\/figure>\n\n\n\n<p>Figure 2: IncQuantization pass<\/p>\n\n\n\n<p>2. Define the dataset and dataloader needed by \u2018IncQuantizaton\u2019 for model calibration and validation. To achieve this, Olive provides the flexibility for developers to feed their data via a separate Python file: \u2018user_script.py\u2019. 
The following shows how IncQuantization users feed calibration data for tuning and validation data for checking quantization accuracy.<\/p>\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture3-1.webp\" alt=\"code snippet for dataloader\" class=\"wp-image-94533 webp-format\" srcset=\"\" data-orig-src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture3-1.webp\"><\/figure>\n\n\n\n<p>Figure 3: Dataloader<\/p>\n\n\n\n<p>3. Run the Olive optimization workflow using the config file.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Install required packages based on the config file: <em>\u2018python -m olive.workflows.run --config config.json --setup\u2019<\/em><\/li>\n\n\n\n<li>Then, optimize the model: <em>\u2018python -m olive.workflows.run --config config.json\u2019<\/em><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Performance benchmark<\/h2>\n\n\n\n<p>The previous section demonstrated the simple process developers can follow to quantize a model using Intel\u00ae Neural Compressor\u2019s quantization capabilities through Olive. Olive tuned the quantization methods and hyperparameters to obtain the best model, delivering performance gains while meeting accuracy requirements. Following this same approach, we enabled and measured the performance and accuracy of three popular Hugging Face models: GPT-J, Bert-Base, and RoBERTa. We deployed these optimized models on the Microsoft Azure Standard E16s v5 (16 vCPUs, 128 GiB memory) Virtual Machine instance. 
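As a rough illustration of the \u2018user_script.py\u2019 pattern described in step 2, the sketch below defines a calibration dataloader that yields Bert-Base-shaped input batches. It is a hypothetical stand-in, not the exact Olive API: random token IDs replace a real dataset such as MRPC, and the name \u2018create_calibration_dataloader\u2019 is simply the function the config would point its dataloader setting at.

```python
# Hypothetical sketch of a user_script.py calibration dataloader for the
# IncQuantization pass. Shapes mimic Bert-Base ONNX inputs (batch size x
# sequence length 128); random token IDs stand in for a real dataset.
import random

VOCAB_SIZE = 30522  # bert-base-uncased vocabulary size
SEQ_LEN = 128


class CalibrationDataLoader:
    """Iterable yielding (inputs_dict, labels) calibration batches."""

    def __init__(self, batch_size=1, num_batches=8, seed=0):
        self.batch_size = batch_size
        self.num_batches = num_batches
        self.seed = seed

    def __iter__(self):
        rnd = random.Random(self.seed)
        for _ in range(self.num_batches):
            inputs = {
                "input_ids": [
                    [rnd.randrange(VOCAB_SIZE) for _ in range(SEQ_LEN)]
                    for _ in range(self.batch_size)
                ],
                "attention_mask": [[1] * SEQ_LEN for _ in range(self.batch_size)],
                "token_type_ids": [[0] * SEQ_LEN for _ in range(self.batch_size)],
            }
            labels = [rnd.randrange(2) for _ in range(self.batch_size)]
            yield inputs, labels


def create_calibration_dataloader(data_dir, batch_size, *args, **kwargs):
    # data_dir is unused in this sketch; a real user_script.py would load
    # and tokenize the actual dataset here.
    return CalibrationDataLoader(batch_size=batch_size)
```

A real script would replace the random batches with tokenized samples from the calibration split, keeping the same (inputs, labels) batch shape.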
The accuracy and performance improvements are shown below.<\/p>\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-1024x536.webp\" alt=\"Here is the performance and accuracy benchmark using Olive with Intel Neural Compressor for three popular Hugging Face models: GPT-J, Bert-Base, and RoBERTa. Taking GPT-J as an example, our solution demonstrates a threefold increase in performance while only experiencing a minimal 0.3% reduction in accuracy.\" class=\"wp-image-94534 webp-format\" srcset=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-1024x536.png 1024w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-300x157.png 300w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-768x402.png 768w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-1536x803.png 1536w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-2048x1071.png 2048w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-800x418.png 800w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-400x209.png 400w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-450x235.png 450w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-650x340.webp 650w\" data-orig-src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-1024x536.png\" data-orig-srcset=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-1024x536.png 1024w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-300x157.png 300w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-768x402.png 768w, 
https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-1536x803.png 1536w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-2048x1071.png 2048w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-800x418.png 800w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-400x209.png 400w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-450x235.png 450w, https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2023\/06\/Picture4-1-650x340.png 650w\"><\/figure>\n\n\n\n<p>Figure 4: Speedups<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td rowspan=\"2\">Model<\/td><td rowspan=\"2\">Dataset<\/td><td rowspan=\"2\">Sequence length<\/td><td colspan=\"3\">Accuracy<\/td><td colspan=\"3\">Latency (ms\/sample, batch size 1)<\/td><\/tr><tr><td>INT8<\/td><td>FP32<\/td><td>Accuracy Change<\/td><td>INT8<\/td><td>FP32<\/td><td>Performance Speedup<\/td><\/tr><tr><td>GPT-J<sup>[D]<\/sup><\/td><td>lambada<\/td><td>196 (32 tokens)<\/td><td>78.93%<\/td><td>79.17%<\/td><td>-0.31%<\/td><td>1426.23<\/td><td>4382.90<\/td><td>3.07x<\/td><\/tr><tr><td>Bert-Base<sup>[D]<\/sup><\/td><td>MRPC<\/td><td>128<\/td><td>84.07%<\/td><td>84.56%<\/td><td>-0.58%<\/td><td>19.96<\/td><td>47.32<\/td><td>2.37x<\/td><\/tr><tr><td>RoBERTa<sup>[S]<\/sup><\/td><td>MRPC<\/td><td>128<\/td><td>86.76%<\/td><td>87.75%<\/td><td>-1.12%<\/td><td>20.47<\/td><td>47.22<\/td><td>2.31x<\/td><\/tr><\/tbody><\/table><figcaption class=\"wp-element-caption\">[D]: post-training dynamic quantization<br>[S]: post-training static quantization<\/figcaption><\/figure>\n\n\n\n<p>Table 1: Accuracy and Latency Results<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What\u2019s next?<\/h2>\n\n\n\n<p>By using Olive with Intel Neural Compressor capabilities, developers can now easily leverage state-of-the-art model compression techniques for their inference deployment. The high level of integration built into the framework enables developers to automatically optimize models to meet performance and accuracy requirements in their targeted deployment. This framework also enables developers to get the most out of Intel platform acceleration capabilities (such as Intel\u00ae Advanced Matrix Extensions and Intel\u00ae DL Boost).<\/p>\n\n\n\n<p>We invite you to try <a href=\"https:\/\/github.com\/microsoft\/Olive\" target=\"_blank\" rel=\"noreferrer noopener\">Olive<\/a> with Intel\u00ae Neural Compressor for your model deployment needs. We look forward to hearing your feedback and requests, and invite you to <a href=\"https:\/\/github.com\/microsoft\/Olive\/issues\/new\/choose\" target=\"_blank\" rel=\"noreferrer noopener\">submit them<\/a> through our GitHub projects.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Configuration details<\/h2>\n\n\n\n<p>Test by Intel as of 04\/28\/23, Azure Standard E16s v5 instance, 1-node, 1x Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz, 8 cores, HT On, Turbo Off, Total Memory 128GB, BIOS Hyper-V UEFI Release v4.1, microcode N\/A, 1x 64G Virtual Disk, Ubuntu 22.04.2 LTS, 5.15.0-1035-azure, gcc 11.3.0, Transformer Models, Deep Learning Framework: ONNXRT v1.13.1, BS1, 1 instance\/1 socket, Datatype: FP32\/INT8<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Intel has collaborated with Microsoft to integrate Intel\u00ae Neural Compressor into Olive, enabling developers to easily take advantage of model compression techniques in their deployment platform, including Intel processors and 
accelerators.<\/p>\n","protected":false},"author":6194,"featured_media":95483,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msxcm_post_with_no_image":false,"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","footnotes":""},"post_tag":[663,1824,1911,1827],"content-type":[346],"topic":[2241,2252],"programming-languages":[],"coauthors":[699,2036,2037,2038,1781],"class_list":["post-94525","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-onnx","tag-onnx-runtime","tag-quantization","tag-transformer","content-type-news","topic-cloud","topic-tools","review-flag-1593580428-734","review-flag-1-1593580432-963","review-flag-2-1593580437-411","review-flag-3-1593580442-169","review-flag-4-1593580448-609","review-flag-5-1593580453-725","review-flag-8-1593580468-572","review-flag-integ-1593580288-449","review-flag-lever-1593580265-989"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Automate optimization techniques for transformer models | Microsoft Open Source Blog<\/title>\n<meta name=\"description\" content=\"Olive is a user-friendly tool for optimizing models with hardware awareness.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Automate optimization techniques for transformer models | Microsoft Open Source Blog\" \/>\n<meta property=\"og:description\" content=\"Olive is a user-friendly tool for optimizing models with hardware awareness.\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Open Source Blog\" \/>\n<meta property=\"article:published_time\" content=\"2023-06-26T15:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-08-02T00:35:34+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/SEC20_Security_042.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1170\" \/>\n\t<meta property=\"og:image:height\" content=\"640\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Emma Ning, Feng Tian, Yuwen Zhou, Haihao Shen, Saurabh Tangri\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@OpenAtMicrosoft\" \/>\n<meta name=\"twitter:site\" content=\"@OpenAtMicrosoft\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Emma Ning, Feng Tian, Yuwen Zhou, Haihao Shen, Saurabh Tangri\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 min read\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/\"},\"author\":[{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/author\/emma-ning\/\",\"@type\":\"Person\",\"@name\":\"Emma Ning\"},{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/author\/feng-tian\/\",\"@type\":\"Person\",\"@name\":\"Feng Tian\"},{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/author\/yuwen-zhou\/\",\"@type\":\"Person\",\"@name\":\"Yuwen Zhou\"},{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/author\/haihao-shen\/\",\"@type\":\"Person\",\"@name\":\"Haihao Shen\"},{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/author\/saurabh-tangri\/\",\"@type\":\"Person\",\"@name\":\"Saurabh Tangri\"}],\"headline\":\"Automate optimization techniques for transformer models\",\"datePublished\":\"2023-06-26T15:00:00+00:00\",\"dateModified\":\"2023-08-02T00:35:34+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/\"},\"wordCount\":880,\"publisher\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/SEC20_Security_042.webp\",\"keywords\":[\"ONNX\",\"ONNX 
Runtime\",\"Quantization\",\"Transformer\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/\",\"name\":\"Automate optimization techniques for transformer models | Microsoft Open Source Blog\",\"isPartOf\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/SEC20_Security_042.webp\",\"datePublished\":\"2023-06-26T15:00:00+00:00\",\"dateModified\":\"2023-08-02T00:35:34+00:00\",\"description\":\"Olive is a user-friendly tool for optimizing models with hardware 
awareness.\",\"breadcrumb\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/#primaryimage\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/SEC20_Security_042.webp\",\"contentUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/SEC20_Security_042.webp\",\"width\":1170,\"height\":640},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/opensource.microsoft.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Automate optimization techniques for transformer models\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#website\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/\",\"name\":\"Microsoft Open Source Blog\",\"description\":\"Open dialogue about openness at Microsoft \u2013 open source, standards, 
interoperability\",\"publisher\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/opensource.microsoft.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#organization\",\"name\":\"Microsoft Open Source Blog\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png\",\"contentUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png\",\"width\":259,\"height\":194,\"caption\":\"Microsoft Open Source Blog\"},\"image\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/OpenAtMicrosoft\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Automate optimization techniques for transformer models | Microsoft Open Source Blog","description":"Olive is a user-friendly tool for optimizing models with hardware awareness.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/","og_locale":"en_US","og_type":"article","og_title":"Automate optimization techniques for transformer models | Microsoft Open Source Blog","og_description":"Olive is a user-friendly tool for optimizing models with hardware awareness.","og_url":"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/","og_site_name":"Microsoft Open Source Blog","article_published_time":"2023-06-26T15:00:00+00:00","article_modified_time":"2023-08-02T00:35:34+00:00","og_image":[{"width":1170,"height":640,"url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/SEC20_Security_042.png","type":"image\/png"}],"author":"Emma Ning, Feng Tian, Yuwen Zhou, Haihao Shen, Saurabh Tangri","twitter_card":"summary_large_image","twitter_creator":"@OpenAtMicrosoft","twitter_site":"@OpenAtMicrosoft","twitter_misc":{"Written by":"Emma Ning, Feng Tian, Yuwen Zhou, Haihao Shen, Saurabh Tangri","Est. 
reading time":"3 min read"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/#article","isPartOf":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/"},"author":[{"@id":"https:\/\/opensource.microsoft.com\/blog\/author\/emma-ning\/","@type":"Person","@name":"Emma Ning"},{"@id":"https:\/\/opensource.microsoft.com\/blog\/author\/feng-tian\/","@type":"Person","@name":"Feng Tian"},{"@id":"https:\/\/opensource.microsoft.com\/blog\/author\/yuwen-zhou\/","@type":"Person","@name":"Yuwen Zhou"},{"@id":"https:\/\/opensource.microsoft.com\/blog\/author\/haihao-shen\/","@type":"Person","@name":"Haihao Shen"},{"@id":"https:\/\/opensource.microsoft.com\/blog\/author\/saurabh-tangri\/","@type":"Person","@name":"Saurabh Tangri"}],"headline":"Automate optimization techniques for transformer models","datePublished":"2023-06-26T15:00:00+00:00","dateModified":"2023-08-02T00:35:34+00:00","mainEntityOfPage":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/"},"wordCount":880,"publisher":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#organization"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/#primaryimage"},"thumbnailUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/SEC20_Security_042.webp","keywords":["ONNX","ONNX Runtime","Quantization","Transformer"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/","url":"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/","name":"Automate optimization techniques for transformer 
models | Microsoft Open Source Blog","isPartOf":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/#primaryimage"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/#primaryimage"},"thumbnailUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/SEC20_Security_042.webp","datePublished":"2023-06-26T15:00:00+00:00","dateModified":"2023-08-02T00:35:34+00:00","description":"Olive is a user-friendly tool for optimizing models with hardware awareness.","breadcrumb":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/#primaryimage","url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/SEC20_Security_042.webp","contentUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/SEC20_Security_042.webp","width":1170,"height":640},{"@type":"BreadcrumbList","@id":"https:\/\/opensource.microsoft.com\/blog\/2023\/06\/26\/automate-optimization-techniques-for-transformer-models\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/opensource.microsoft.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Automate optimization techniques for transformer models"}]},{"@type":"WebSite","@id":"https:\/\/opensource.microsoft.com\/blog\/#website","url":"https:\/\/opensource.microsoft.com\/blog\/","name":"Microsoft Open 
Source Blog","description":"Open dialogue about openness at Microsoft \u2013 open source, standards, interoperability","publisher":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/opensource.microsoft.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/opensource.microsoft.com\/blog\/#organization","name":"Microsoft Open Source Blog","url":"https:\/\/opensource.microsoft.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png","contentUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png","width":259,"height":194,"caption":"Microsoft Open Source Blog"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/OpenAtMicrosoft"]}]}},"msxcm_display_generated_audio":false,"msxcm_animated_featured_image":null,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Microsoft Open Source 
Blog","distributor_original_site_url":"https:\/\/opensource.microsoft.com\/blog","push-errors":false,"_links":{"self":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts\/94525","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/users\/6194"}],"replies":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/comments?post=94525"}],"version-history":[{"count":0,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts\/94525\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/media\/95483"}],"wp:attachment":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/media?parent=94525"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/post_tag?post=94525"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/content-type?post=94525"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/topic?post=94525"},{"taxonomy":"programming-languages","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/programming-languages?post=94525"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/coauthors?post=94525"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}