{"id":86937,"date":"2021-07-13T09:00:14","date_gmt":"2021-07-13T16:00:14","guid":{"rendered":"https:\/\/cloudblogs.microsoft.com\/opensource\/?p=86937"},"modified":"2025-05-30T15:18:18","modified_gmt":"2025-05-30T22:18:18","slug":"onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform","status":"publish","type":"post","link":"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/","title":{"rendered":"ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm\u2122 Open Software Platform"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><em>This post was co-authored by Jeff Daily, a Principal Member of Technical Staff, Deep Learning Software for AMD.<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/07\/ONNX-Runtime-AMD-ROCm_logo_update.png\" alt=\"ONNX Runtime and AMD logo side by side\"\/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/microsoft.github.io\/onnxruntime\/\" target=\"_blank\" rel=\"noreferrer noopener\">ONNX Runtime<\/a> is an open-source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms. Today, we are excited to announce a preview version of ONNX Runtime in release 1.8.1 featuring support for AMD Instinct\u2122 GPUs facilitated by the AMD ROCm\u2122 open software platform. Users can now use AMD Instinct\u2122 GPUs with ONNX Runtime to accelerate distributed training for large-scale DNN models. 
AMD ROCm\u2122 becomes the latest ONNX Runtime execution provider, continuing the Microsoft mission to endorse choice and versatility in targeting different compute devices and server platforms.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/07\/Get-started-easily.png\" alt=\"Selection interface showing AMD GPU support\"\/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Figure 1: Selection interface showing <a href=\"https:\/\/www.onnxruntime.ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">AMD GPU support<\/a>.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-rocm-open-software-platform\">The ROCm Open Software Platform<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">ROCm is AMD&#8217;s open software platform for GPU-accelerated high-performance computing and machine learning workloads. Since the first ROCm release in 2016, the ROCm platform has evolved to support additional libraries and tools for math, AI and machine learning, and communication, a wider set of Linux\u00ae distributions, and a range of new GPUs. This includes the AMD Instinct\u2122 MI100 GPU, the first AMD data center accelerator based on the compute-optimized AMD CDNA\u2122 architecture.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The primary focus of ROCm has been high-performance computing at scale. The combined capabilities of ROCm and the AMD Instinct family of data center accelerators are well suited to accelerate AI\/ML training using ONNX Runtime.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"accelerated-training-with-onnx-runtime-on-amd-gpus\">Accelerated training with ONNX Runtime on AMD GPUs<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Large transformer models like GPT-2 have proven themselves state of the art in natural language processing (NLP) tasks like language understanding, generation, and translation. 
They are also proving useful in applications like time-series prediction and computer vision. Due to their size, these models need to be trained in a large\u2011scale, distributed GPU environment. ONNX Runtime, with support from AMD libraries (rocBLAS, MIOpen, hipRAND, and RCCL), enables users to train large transformer models in mixed precision in a distributed AMD GPU environment. Thus, ONNX Runtime on ROCm supports training state-of-the-art models like BERT, GPT-2, T5, BART, and more using AMD Instinct\u2122 GPUs. Data scientists, researchers, students, and others in the community now have the option to accelerate workloads using ONNX Runtime on AMD GPUs. Supported GPUs include <a href=\"https:\/\/www.amd.com\/en\/products\/server-accelerators\/instinct-mi100\" target=\"_blank\" rel=\"noreferrer noopener\">AMD Instinct\u2122 MI100<\/a>, AMD Radeon Instinct\u2122 MI50, and AMD&nbsp;Radeon\u2122&nbsp;Pro&nbsp;VII GPUs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Today, we are happy to announce the preview of Python\u2122 packages supporting ONNX Runtime on ROCm, making it easy to get started with ROCm and ONNX Runtime.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"training-performance-acceleration\">Training performance acceleration<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">In this preview, we have demonstrated clear performance gains with ONNX Runtime when fine-tuning GPT-2 with <a href=\"https:\/\/github.com\/huggingface\/transformers\" target=\"_blank\" rel=\"noreferrer noopener\">HuggingFace<\/a> on eight AMD Instinct\u2122 MI100 GPUs. 
We see an 18 percent performance gain in these experiments relative to standalone PyTorch, and we validated that the loss curves are well matched.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/07\/Throughput-samples_.png\" alt=\"Using ONNX Runtime gets 18 percent perf gains over standalone PyTorch\"\/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Figure 2: Using ONNX Runtime gets 18 percent perf gains over standalone PyTorch. Configuration details are listed below.<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/07\/Hugging-Face-training-loss.png\" alt=\"chart, line chart, histogram\"\/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Figure 3: Training loss for the standalone PyTorch experiment and the PyTorch with ONNX Runtime experiment.<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In general, the preview ONNX Runtime-ROCm library can be used in multi-node MI100 AMD GPU configurations with high-speed interconnects for inter-GPU communications. 
As we proceed to our official release, we expect users to see excellent performance across a wide range of transformer models and ML\/AI workloads, giving them a highly performant choice for their datacenter applications.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"getting-started-with-onnx-runtime-on-amd-gpus\">Getting started with ONNX Runtime on AMD GPUs<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"with-python-packages\">With Python packages<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">In a ROCm-enabled environment, users can get off to a quick start with a pip install:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; auto-links: false; gutter: false; title: ; quick-code: false; notranslate\" title=\"\">\npip install onnxruntime-training -f https:\/\/onnxruntimepackages.z14.web.core.windows.net\/onnxruntime_stable_torch190.rocm42.html\n<\/pre><\/div>\n\n\n<p class=\"wp-block-paragraph\">Plugging a PyTorch script into ONNX Runtime only* requires wrapping the model with <code>ORTModule<\/code>:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; auto-links: false; gutter: false; title: ; quick-code: false; notranslate\" title=\"\">\nmodel = ORTModule(model)\n<\/pre><\/div>\n\n\n<p class=\"wp-block-paragraph\">More details are available at <a href=\"https:\/\/github.com\/pytorch\/ort\" target=\"_blank\" rel=\"noreferrer noopener\">pytorch\/ort: Accelerate PyTorch models with ONNX Runtime<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>*The PyTorch model should use standard PyTorch so that it can be exported to ONNX.<\/em><\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"with-dockerfiles\">With Dockerfiles<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Users can also take advantage of a simple Dockerfile to get pre-configured packages of the ROCm libraries, PyTorch, 
and ONNX Runtime.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The stable ONNX runtime 1.8.1 release is now available at <a href=\"https:\/\/github.com\/pytorch\/ort\/blob\/main\/docker\/Dockerfile.ort-torch181-onnxruntime-nightly-rocm4.2-ubuntu18.04\" target=\"_blank\" rel=\"noreferrer noopener\">ort\/Dockerfile.ort-torch181-onnxruntime-stable-rocm4.2-ubuntu18.04 at main \u00b7 pytorch\/ort<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">More details are available at <a href=\"https:\/\/github.com\/pytorch\/ort\" target=\"_blank\" rel=\"noreferrer noopener\">pytorch\/ort<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"more-information-about-onnx-runtime\">More information about ONNX Runtime<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Read our recent blog, &#8220;<a href=\"https:\/\/cloudblogs.microsoft.com\/opensource\/2021\/06\/07\/onnx-runtime-1-8-mobile-web-and-accelerated-training\/\" target=\"_blank\" rel=\"noreferrer noopener\">ONNX Runtime 1.8: mobile, web, and accelerated training<\/a>,&#8221; introducing the extended capabilities of ONNX runtime release 1.8.1.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Check out examples demonstrating accelerating <a href=\"https:\/\/github.com\/microsoft\/onnxruntime-training-examples\" target=\"_blank\" rel=\"noreferrer noopener\">large transformer models using ONNX runtime<\/a>.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">View instructions for accelerating general PyTorch workloads using ONNX runtime at <a href=\"https:\/\/github.com\/pytorch\/ort\" target=\"_blank\" rel=\"noreferrer noopener\">Accelerate PyTorch models with ONNX Runtime<\/a>.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"more-information-about-rocm-open-software-platform\">More Information about ROCm\u2122 Open Software Platform<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">A list of <a href=\"https:\/\/github.com\/RadeonOpenCompute\/ROCm\" target=\"_blank\" 
rel=\"noreferrer noopener\">ROCm\u2122 supported GPUs and operating systems<\/a>.<\/li>\n\n\n\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/rocmdocs.amd.com\/en\/latest\/\" target=\"_blank\" rel=\"noreferrer noopener\">General documentation<\/a> on the ROCm\u2122 platform.<\/li>\n\n\n\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/developer.amd.com\/resources\/rocm-resources\/rocm-learning-center\/\" target=\"_blank\" rel=\"noreferrer noopener\">ROCm\u2122 Learning Center<\/a>.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">General information on <a href=\"https:\/\/amd.com\/hpc\" target=\"_blank\" rel=\"noreferrer noopener\">AMD\u2019s offerings for HPC and machine learning<\/a>.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"configuration-and-performance-benchmarking\">Configuration and performance benchmarking<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"hardware-setup\">Hardware setup<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Server Type: HPE Apollo 6500 Gen10 Plus<\/li>\n\n\n\n<li class=\"wp-block-list-item\">8 x <a href=\"https:\/\/www.amd.com\/en\/products\/server-accelerators\/instinct-mi100\" target=\"_blank\" rel=\"noreferrer noopener\">AMD Instinct MI100<\/a> with 2nd Gen Infinity Fabric Link (4 GPUs\/ring) and PCIe Gen4 (across rings)<\/li>\n\n\n\n<li class=\"wp-block-list-item\">GPU Memory: 32 GB<\/li>\n\n\n\n<li class=\"wp-block-list-item\">CPU: 2 x <a href=\"https:\/\/www.amd.com\/en\/products\/cpu\/amd-epyc-7662\" target=\"_blank\" rel=\"noreferrer noopener\">AMD EPYC\u2122 7662 | AMD<\/a><\/li>\n\n\n\n<li class=\"wp-block-list-item\">Main Memory: 512 GB (HPE 32GB 2Rx4 PC4-3200AA-R)<\/li>\n\n\n\n<li class=\"wp-block-list-item\">SSD: HPE 1.92TB NVMe Read-Intensive Smart Carrier U.3 PE8010 SSD<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Ethernet: Intel I350 1GbE 4-port BASE-T<\/li>\n<\/ul>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"huggingface-configuration\">HuggingFace 
configuration<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Repository<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/github.com\/microsoft\/huggingface-transformers\/tree\/blog-commit\" target=\"_blank\" rel=\"noreferrer noopener\">HuggingFace Transformers<\/a> (branch <em>blog-commit<\/em>).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Dockerfile<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/github.com\/pytorch\/ort\/blob\/main\/docker\/Dockerfile.ort-torch181-onnxruntime-stable-rocm4.2-ubuntu18.04\" target=\"_blank\" rel=\"noreferrer noopener\">ort\/Dockerfile.ort-torch181-onnxruntime-stable-rocm4.2-ubuntu18.04<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>HuggingFace GPT2<\/strong><\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; auto-links: false; gutter: false; title: ; quick-code: false; notranslate\" title=\"\">\npython -m torch.distributed.launch \\\n  --nproc_per_node=8 \\\n  huggingface-transformers\/examples\/pytorch\/language-modeling\/run_clm.py \\\n  --model_name_or_path gpt2 \\\n  --dataset_name wikitext \\\n  --dataset_config_name wikitext-2-raw-v1 \\\n  --do_train \\\n  --label_smoothing 0.1 \\\n  --max_steps 260 \\\n  --logging_steps 1 \\\n  --overwrite_output_dir \\\n  --output_dir \/tmp\/test-clm \\\n  --per_device_train_batch_size 8 \\\n  --fp16 \\\n  --dataloader_num_workers 1 \\\n  --ort \\\n  --skip_memory_metrics\n\n<\/pre><\/div>\n\n\n<p class=\"wp-block-paragraph\">The flag below enables wrapping with ONNX Runtime.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; auto-links: false; gutter: false; title: ; quick-code: false; notranslate\" title=\"\">\n--ort\n<\/pre><\/div>\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p class=\"wp-block-paragraph\">Author information: Jeff Daily is a Principal Member of Technical Staff, Deep Learning Software for AMD. 
Weixing Zhang is a Principal Software Engineer, AI Frameworks at Microsoft. Suffian Khan is a Software Engineer, AI Frameworks at Microsoft.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Their postings are their own opinions and may not represent AMD\u2019s or Microsoft\u2019s positions, strategies or opinions. Links to third-party sites are provided for convenience and unless explicitly stated, neither AMD nor Microsoft is responsible for the contents of such linked sites and no endorsement is implied.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post was co-authored by Jeff Daily, a Principal Member of Technical Staff, Deep Learning Software for AMD. ONNX Runtime is an open-source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms.<\/p>\n","protected":false},"author":5562,"featured_media":87660,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","_alt_title":"","ms-ems-related-posts":[],"footnotes":""},"tags":[100,2272,1824],"programming-languages":[2265],"content-type":[361],"job-role":[],"topic":[2238,2252],"coauthors":[1833,1848],"class_list":["post-86937","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-azure-marketplace","tag-microsoft","tag-onnx-runtime","programming-languages-pytorch","content-type-project-updates","topic-ai-machine-learning","topic-tools","review-flag-1593580428-734","review-flag-1-1593580432-963","review-flag-2-1593580437-411","review-flag-3-1593580442-169","review-flag-4-1593580448-609","review-flag-8-1593580468-572","review-flag-machi-1680214156-53","review-flag-micro-1680215167-604","review-flag-ml-1680214110-748","review-flag-new-1593580248-669","review-flag-perce-1706214400-122"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - 
https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm\u2122 Open Software Platform | Microsoft Open Source Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm\u2122 Open Software Platform | Microsoft Open Source Blog\" \/>\n<meta property=\"og:description\" content=\"This post was co-authored by Jeff Daily, a Principal Member of Technical Staff, Deep Learning Software for AMD. 
ONNX Runtime is an open-source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Open Source Blog\" \/>\n<meta property=\"article:published_time\" content=\"2021-07-13T16:00:14+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-05-30T22:18:18+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/07\/ONNX-Runtime-AMD-ROCm_logo_update.png\" \/>\n\t<meta property=\"og:image:width\" content=\"812\" \/>\n\t<meta property=\"og:image:height\" content=\"159\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Weixing Zhang, Suffian Khan\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@OpenAtMicrosoft\" \/>\n<meta name=\"twitter:site\" content=\"@OpenAtMicrosoft\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Weixing Zhang, Suffian Khan\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/2021\\\/07\\\/13\\\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/2021\\\/07\\\/13\\\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\\\/\"},\"author\":[{\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/author\\\/weixing-zhang\\\/\",\"@type\":\"Person\",\"@name\":\"Weixing Zhang\"},{\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/author\\\/suffian-khan\\\/\",\"@type\":\"Person\",\"@name\":\"Suffian Khan\"}],\"headline\":\"ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm\u2122 Open Software Platform\",\"datePublished\":\"2021-07-13T16:00:14+00:00\",\"dateModified\":\"2025-05-30T22:18:18+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/2021\\\/07\\\/13\\\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\\\/\"},\"wordCount\":945,\"publisher\":{\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/2021\\\/07\\\/13\\\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/07\\\/ONNX-Runtime-AMD-ROCm_logo_update.webp\",\"keywords\":[\"Azure Marketplace\",\"Microsoft\",\"ONNX 
Runtime\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/2021\\\/07\\\/13\\\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\\\/\",\"url\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/2021\\\/07\\\/13\\\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\\\/\",\"name\":\"ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm\u2122 Open Software Platform | Microsoft Open Source Blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/2021\\\/07\\\/13\\\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/2021\\\/07\\\/13\\\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/07\\\/ONNX-Runtime-AMD-ROCm_logo_update.webp\",\"datePublished\":\"2021-07-13T16:00:14+00:00\",\"dateModified\":\"2025-05-30T22:18:18+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/2021\\\/07\\\/13\\\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/2021\\\/07\\\/13\\\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\\\/\"]}]},{\"@type\":\"ImageObj
ect\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/2021\\\/07\\\/13\\\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\\\/#primaryimage\",\"url\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/07\\\/ONNX-Runtime-AMD-ROCm_logo_update.webp\",\"contentUrl\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/wp-content\\\/uploads\\\/2021\\\/07\\\/ONNX-Runtime-AMD-ROCm_logo_update.webp\",\"width\":812,\"height\":159,\"caption\":\"icon\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/2021\\\/07\\\/13\\\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm\u2122 Open Software Platform\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/#website\",\"url\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/\",\"name\":\"Microsoft Open Source Blog\",\"description\":\"Open dialogue about openness at Microsoft \u2013 open source, standards, 
interoperability\",\"publisher\":{\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/#organization\",\"name\":\"Microsoft Open Source Blog\",\"url\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/wp-content\\\/uploads\\\/2019\\\/08\\\/Microsoft-Logo.png\",\"contentUrl\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/wp-content\\\/uploads\\\/2019\\\/08\\\/Microsoft-Logo.png\",\"width\":259,\"height\":194,\"caption\":\"Microsoft Open Source Blog\"},\"image\":{\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/OpenAtMicrosoft\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/#\\\/schema\\\/person\\\/4d7e7cd8266dc319e43a6de1e173495f\",\"name\":\"Teri 
Dormer\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/4f1c6b1df49619573e006bda75a18efb7f99db184762acc79d899b8a6ef768aa?s=96&d=microsoft&r=g98331fbdc1fedab03f83292cd9dfa932\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/4f1c6b1df49619573e006bda75a18efb7f99db184762acc79d899b8a6ef768aa?s=96&d=microsoft&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/4f1c6b1df49619573e006bda75a18efb7f99db184762acc79d899b8a6ef768aa?s=96&d=microsoft&r=g\",\"caption\":\"Teri Dormer\"},\"url\":\"https:\\\/\\\/opensource.microsoft.com\\\/blog\\\/author\\\/teridormer\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm\u2122 Open Software Platform | Microsoft Open Source Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/","og_locale":"en_US","og_type":"article","og_title":"ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm\u2122 Open Software Platform | Microsoft Open Source Blog","og_description":"This post was co-authored by Jeff Daily, a Principal Member of Technical Staff, Deep Learning Software for AMD. 
ONNX Runtime is an open-source project that is designed to accelerate machine learning across a wide range of frameworks, operating systems, and hardware platforms.","og_url":"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/","og_site_name":"Microsoft Open Source Blog","article_published_time":"2021-07-13T16:00:14+00:00","article_modified_time":"2025-05-30T22:18:18+00:00","og_image":[{"width":812,"height":159,"url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/07\/ONNX-Runtime-AMD-ROCm_logo_update.png","type":"image\/png"}],"author":"Weixing Zhang, Suffian Khan","twitter_card":"summary_large_image","twitter_creator":"@OpenAtMicrosoft","twitter_site":"@OpenAtMicrosoft","twitter_misc":{"Written by":"Weixing Zhang, Suffian Khan","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/#article","isPartOf":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/"},"author":[{"@id":"https:\/\/opensource.microsoft.com\/blog\/author\/weixing-zhang\/","@type":"Person","@name":"Weixing Zhang"},{"@id":"https:\/\/opensource.microsoft.com\/blog\/author\/suffian-khan\/","@type":"Person","@name":"Suffian Khan"}],"headline":"ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm\u2122 Open Software 
Platform","datePublished":"2021-07-13T16:00:14+00:00","dateModified":"2025-05-30T22:18:18+00:00","mainEntityOfPage":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/"},"wordCount":945,"publisher":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#organization"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/#primaryimage"},"thumbnailUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/07\/ONNX-Runtime-AMD-ROCm_logo_update.webp","keywords":["Azure Marketplace","Microsoft","ONNX Runtime"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/","url":"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/","name":"ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm\u2122 Open Software Platform | Microsoft Open Source 
Blog","isPartOf":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/#primaryimage"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/#primaryimage"},"thumbnailUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/07\/ONNX-Runtime-AMD-ROCm_logo_update.webp","datePublished":"2021-07-13T16:00:14+00:00","dateModified":"2025-05-30T22:18:18+00:00","breadcrumb":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/#primaryimage","url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/07\/ONNX-Runtime-AMD-ROCm_logo_update.webp","contentUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/07\/ONNX-Runtime-AMD-ROCm_logo_update.webp","width":812,"height":159,"caption":"icon"},{"@type":"BreadcrumbList","@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/07\/13\/onnx-runtime-release-1-8-1-previews-support-for-accelerated-training-on-amd-gpus-with-the-amd-rocm-open-software-platform\/#breadcrumb","itemListElement":[{"@type":"ListItem",
"position":1,"name":"Home","item":"https:\/\/opensource.microsoft.com\/blog\/"},{"@type":"ListItem","position":2,"name":"ONNX Runtime release 1.8.1 previews support for accelerated training on AMD GPUs with the AMD ROCm\u2122 Open Software Platform"}]},{"@type":"WebSite","@id":"https:\/\/opensource.microsoft.com\/blog\/#website","url":"https:\/\/opensource.microsoft.com\/blog\/","name":"Microsoft Open Source Blog","description":"Open dialogue about openness at Microsoft \u2013 open source, standards, interoperability","publisher":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/opensource.microsoft.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/opensource.microsoft.com\/blog\/#organization","name":"Microsoft Open Source Blog","url":"https:\/\/opensource.microsoft.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png","contentUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png","width":259,"height":194,"caption":"Microsoft Open Source Blog"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/OpenAtMicrosoft"]},{"@type":"Person","@id":"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/person\/4d7e7cd8266dc319e43a6de1e173495f","name":"Teri 
Dormer","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/4f1c6b1df49619573e006bda75a18efb7f99db184762acc79d899b8a6ef768aa?s=96&d=microsoft&r=g98331fbdc1fedab03f83292cd9dfa932","url":"https:\/\/secure.gravatar.com\/avatar\/4f1c6b1df49619573e006bda75a18efb7f99db184762acc79d899b8a6ef768aa?s=96&d=microsoft&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/4f1c6b1df49619573e006bda75a18efb7f99db184762acc79d899b8a6ef768aa?s=96&d=microsoft&r=g","caption":"Teri Dormer"},"url":"https:\/\/opensource.microsoft.com\/blog\/author\/teridormer\/"}]}},"bloginabox_animated_featured_image":null,"bloginabox_display_generated_audio":false,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Microsoft Open Source Blog","distributor_original_site_url":"https:\/\/opensource.microsoft.com\/blog","push-errors":false,"_links":{"self":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts\/86937","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/users\/5562"}],"replies":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/comments?post=86937"}],"version-history":[{"count":3,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts\/86937\/revisions"}],"predecessor-version":[{"id":97504,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts\/86937\/revisions\/97504"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/media\/87660"}],"wp:attachment":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/media?parent=86937"}],"wp:term":[{"taxonomy":"post_tag","embeddable"
:true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/tags?post=86937"},{"taxonomy":"programming-languages","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/programming-languages?post=86937"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/content-type?post=86937"},{"taxonomy":"job-role","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/job-role?post=86937"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/topic?post=86937"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/coauthors?post=86937"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}