{"id":87762,"date":"2021-08-04T08:00:19","date_gmt":"2021-08-04T15:00:19","guid":{"rendered":"https:\/\/cloudblogs.microsoft.com\/opensource\/?p=87762"},"modified":"2025-05-29T15:00:31","modified_gmt":"2025-05-29T22:00:31","slug":"introducing-distributed-data-parallel-support-on-pytorch-windows","status":"publish","type":"post","link":"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/","title":{"rendered":"Introducing Distributed Data Parallel support on PyTorch Windows"},"content":{"rendered":"\n<p>Model training has been and will be in the foreseeable future one of the most frustrating things machine learning developers face. It takes quite a long time and people can\u2019t really do anything about it. If you have the luxury (especially at this moment of time) of having multiple GPUs, you are likely to find Distributed Data Parallel (DDP) helpful in terms of model training. DDP performs model training across multiple GPUs, in a transparent fashion. You can have multiple GPUs on a single machine, or multiple machines separately. DDP can utilize all the GPUs you have to maximize the computing power, thus significantly shorten the time needed for training.<\/p>\n\n\n\n<p>For a reasonably long time, DDP was only available on Linux. This was changed in PyTorch 1.7. In PyTorch 1.7 the support for DDP on Windows was introduced by Microsoft and has since then been continuously improved. In this article, we\u2019d like to show you how it can help with the training experience on Windows.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"walkthrough\">Walkthrough<\/h2>\n\n\n\n<p>For reference, we\u2019ll set up two machines with the same spec on Azure, with one being Windows and the other being Linux, then perform model training with the same code and dataset.<\/p>\n\n\n\n<p>We use this very nice resource in Azure called a Data Science Virtual Machine (DSVM). 
This is a handy VM image with many machine learning tools preinstalled. At the time of writing, PyTorch 1.8.1 (Anaconda) is included in the DSVM image, which is what we\u2019ll use for this demonstration.<\/p>\n\n\n\n<p>You can search directly for this resource:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/07\/Image-1.png\" alt=\"Create a resource\"\/><\/figure>\n\n\n\n<p>You can also follow the normal VM creation process and choose the desired DSVM image:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/07\/Image-2.png\" alt=\"Instance details\"\/><\/figure>\n\n\n\n<p>In this article, we use the size \u201cStandard NC24s_v3\u201d, which puts four NVIDIA Tesla V100 GPUs at our disposal.<\/p>\n\n\n\n<p>To understand how DDP works, there are a few basic concepts we need to learn first.<\/p>\n\n\n\n<p>The most important concept is the \u201cprocess group\u201d, which is the fundamental tool that powers DDP. A process group is, as the name suggests, a group of processes, each of which is responsible for the training workload of one dedicated GPU. Additionally, we need some method of coordinating the group of processes (more importantly, the GPUs behind them) so that they can communicate with each other. This is called the \u201cbackend\u201d in PyTorch (the --dist-backend script parameter). In PyTorch 1.8 we will be using Gloo as the backend, because the NCCL and MPI backends are currently not available on Windows. See the PyTorch documentation for <a href=\"https:\/\/pytorch.org\/docs\/stable\/distributed.html\" target=\"_blank\" rel=\"noreferrer noopener\">more information about \u201cbackend\u201d<\/a>. And finally, we need a place for the backend to exchange information.
This is called the \u201cstore\u201d in PyTorch (the --dist-url script parameter). See the PyTorch documentation to find out <a href=\"https:\/\/pytorch.org\/docs\/stable\/distributed.html#torch.distributed.Store\" target=\"_blank\" rel=\"noreferrer noopener\">more about \u201cstore\u201d<\/a>.<\/p>\n\n\n\n<p>Other concepts that might be a bit confusing are \u201cworld size\u201d and \u201crank\u201d. World size is essentially the number of processes participating in the training job. As mentioned before, each process is responsible for one dedicated GPU, so the world size also equals the total number of GPUs used. Pretty straightforward, right? Now let\u2019s talk about \u201crank\u201d. Rank can be seen as an index number for each process, which can be used to identify one specific process. Note that a process with rank 0 is always needed, because it acts as the \u201ccontroller\u201d that coordinates all the other processes. If the process with rank 0 doesn\u2019t exist, the entire training is a no-go.<\/p>\n\n\n\n<p>With the necessary knowledge in our backpack, let\u2019s get started with the actual training. We use a small subset of ImageNet 2012 as the dataset. Let\u2019s assume we have downloaded the dataset and placed it somewhere in the filesystem; we\u2019ll use &#8220;D:\\imagenet-small&#8221; for this demonstration.<\/p>\n\n\n\n<p>Obviously, we also need a training script. We use the ImageNet training script from the <a href=\"https:\/\/github.com\/pytorch\/examples\/tree\/master\/imagenet\" target=\"_blank\" rel=\"noreferrer noopener\">PyTorch Examples repo<\/a> and ResNet50 as the target model. The training script here can be seen as a normal training script, plus the DDP power provided by packages like &#8220;torch.distributed&#8221; and &#8220;torch.multiprocessing&#8221;. The script doesn\u2019t contain much extra logic, and you can easily set up your own script based on it.
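To make these pieces concrete, here is a minimal sketch (our own simplified code, not the actual example script) of what each DDP worker process does: join the process group through a store URL using the Gloo backend, wrap the model in DistributedDataParallel, and run a training step. For portability the sketch runs a single worker with a world size of one on the CPU; the real script spawns one such process per GPU via torch.multiprocessing.

```python
import os
import tempfile

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Every spawned worker runs code like this; here we play rank 0 of a
# world of size 1, so the sketch also runs on a single CPU-only machine.
rank, world_size = 0, 1

# The "store": a FileStore URL shared by all workers
# (on Windows this would look like file:///D:\pg).
store_url = "file://" + os.path.join(tempfile.mkdtemp(), "pg")

# Join the process group; "gloo" is the backend, since the NCCL and MPI
# backends are not available on Windows.
dist.init_process_group(backend="gloo", init_method=store_url,
                        world_size=world_size, rank=rank)

model = torch.nn.Linear(10, 1)   # with GPUs: model.cuda(rank) first
ddp_model = DDP(model)           # gradients are all-reduced across ranks

loss = ddp_model(torch.randn(4, 10)).sum()
loss.backward()                  # DDP synchronizes gradients here

assert dist.get_world_size() == world_size
dist.destroy_process_group()
```

The wrapped model behaves like the original one; only the gradient synchronization after backward() is added, which is why an ordinary training loop needs so few changes to become a DDP training loop.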
You can also refer to this <a href=\"https:\/\/pytorch.org\/tutorials\/intermediate\/ddp_tutorial.html\" target=\"_blank\" rel=\"noreferrer noopener\">Getting Started tutorial<\/a> for more inspiration.<\/p>\n\n\n<p>On a single machine, we can simply use FileStore, which is easier to set up. The complete command looks like this:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; auto-links: false; gutter: false; title: ; quick-code: false; notranslate\" title=\"\">\n> python main.py D:\\imagenet-small --arch resnet50 --dist-url file:\/\/\/D:\\pg --dist-backend gloo --world-size 1 --multiprocessing-distributed --rank 0\n<\/pre><\/div>\n\n\n<p>You probably noticed that we are using \u201cworld-size 1\u201d and \u201crank 0\u201d. This is because the script calculates the desired world size and rank based on the available GPUs. Here the actual world size used is the same as the number of GPUs available, which is four. The rank of each process is also automatically assigned the correct number, starting from zero.<\/p>\n\n\n\n<p>If you\u2019re not a fan of command-line arguments, you can also use environment variables to initialize the DDP arguments. This can be helpful if you need to automate the deployment. More details can be found in the <a href=\"https:\/\/pytorch.org\/docs\/stable\/distributed.html\" target=\"_blank\" rel=\"noreferrer noopener\">\u201cEnvironment variable initialization\u201d section<\/a> of the PyTorch documentation.<\/p>\n\n\n\n<p>If everything goes well, the training job will start shortly after.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"troubleshooting\">Troubleshooting<\/h2>\n\n\n\n<p>If something doesn\u2019t go well, here are some troubleshooting tips that might be helpful:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>If you\u2019re using FileStore on Windows, make sure the file used is not locked by other processes, which can happen if you forcefully kill the training processes.
This can cause the DDP training process to freeze, because the script fails to initialize the FileStore. A workaround is to manually kill any leftover training processes and delete the file before starting the next training run.<\/li>\n\n\n\n<li>If you\u2019re using TcpStore, make sure the network is accessible and the port is in fact available. Otherwise, the training may freeze because the script fails to initialize the TcpStore. The process with rank zero binds and listens on the port you provided, and the other processes try to connect to that port. You can use network monitoring tools like \u201cnetstat\u201d to help debug TCP connection issues.<\/li>\n\n\n\n<li>You can use tools like nvidia-smi to monitor the GPU load while the training is running. Ideally, we want all the GPUs running at 100 percent utilization. If you find that the GPU load is low, you may want to increase the batch size and\/or the number of DataLoader workers.<\/li>\n\n\n\n<li>Be aware that the number of GPUs used in DDP also affects the effective batch size. For example, suppose we use a batch size of 128 on a single GPU and then switch to DDP with two GPUs. We have two options: a) split the batch and use a batch size of 64 on each GPU; or b) keep a batch size of 128 on each GPU, resulting in an effective batch size of 256. Aside from the limitation of GPU memory, the choice is mostly up to you, and you can tweak the script to go either way. If you choose option b) and expect a similar training result, remember to also adjust the initial learning rate.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"benchmark\">Benchmark<\/h2>\n\n\n\n<p>Back to our benchmarking mission. First, we performed the training without DDP to establish a baseline. Then we tried the DDP setup with two GPUs, and finally with four GPUs.
These are the results:<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><tbody><tr><td><strong>Duration<\/strong><\/td><td><strong>1 GPU (No DDP)<\/strong><\/td><td><strong>2 GPUs<\/strong><\/td><td><strong>4 GPUs<\/strong><\/td><\/tr><tr><td><strong>Linux<\/strong><\/td><td>56m 58s<\/td><td>31m 7s<\/td><td>17m 20s<\/td><\/tr><tr><td><strong>Windows<\/strong><\/td><td>58m 55s<\/td><td>31m 55s<\/td><td>19m 3s<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>To visualize this better, we plot the results in the chart below:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/07\/Image-3-1024x540.webp\" alt=\"Training Duration for 1GPU is slower than for 2 GPUs. 4 GPUs is fastest.\"\/><\/figure>\n\n\n\n<p>As we can see from the data, the speedup from additional GPUs meets our overall expectations: using two GPUs cuts the training duration almost in half, and using four GPUs cuts it to nearly a quarter.<\/p>\n\n\n\n<p>In terms of accuracy, here\u2019s the loss curve we see on both Windows and Linux:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2021\/07\/Image-4-1024x304.webp\" alt=\"Training with 4 GPUs reaches accuracy threshold much faster than with 2 GPUs or with 1 GPU (No DDP)\"\/><\/figure>\n\n\n\n<p>The loss curves show that the shorter training time does not come at the cost of a worse training result; we can still expect the model to converge steadily over time.<\/p>\n\n\n\n<p>This is of course only a small demonstration of how DDP on Windows can bring users a performance boost comparable to the one on Linux, without compromising accuracy. We at Microsoft are working closely with the PyTorch team to keep improving the PyTorch experience on Windows. The support of DDP on Windows is a huge leap ahead in terms of training performance.
We\u2019d like to encourage people to try it and we\u2019d love to hear your feedback.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Model training has been and will be in the foreseeable future one of the most frustrating things machine learning developers face. It takes quite a long time and people can\u2019t really do anything about it.<\/p>\n","protected":false},"author":5562,"featured_media":95482,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msxcm_post_with_no_image":false,"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","footnotes":""},"post_tag":[],"content-type":[361],"topic":[2240,2244],"programming-languages":[2265],"coauthors":[1854],"class_list":["post-87762","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","content-type-project-updates","topic-application-development","topic-devops","programming-languages-pytorch","review-flag-1593580428-734","review-flag-1-1593580432-963","review-flag-2-1593580437-411","review-flag-4-1593580448-609","review-flag-7-1593580463-151","review-flag-8-1593580468-572","review-flag-alway-1593580310-39","review-flag-and-o-1593580423-446","review-flag-machi-1680214156-53","review-flag-perce-1706214400-122","review-flag-vm-1593580807-312"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Introducing Distributed Data Parallel support on PyTorch Windows | Microsoft Open Source Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" 
content=\"Introducing Distributed Data Parallel support on PyTorch Windows | Microsoft Open Source Blog\" \/>\n<meta property=\"og:description\" content=\"Model training has been and will be in the foreseeable future one of the most frustrating things machine learning developers face. It takes quite a long time and people can\u2019t really do anything about it.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Open Source Blog\" \/>\n<meta property=\"article:published_time\" content=\"2021-08-04T15:00:19+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-05-29T22:00:31+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/MSC24-Japan-business-Getty-1024531730-rgb.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1170\" \/>\n\t<meta property=\"og:image:height\" content=\"640\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Chester Liu\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@OpenAtMicrosoft\" \/>\n<meta name=\"twitter:site\" content=\"@OpenAtMicrosoft\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Chester Liu\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 min read\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/\"},\"author\":[{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/author\/chester-liu\/\",\"@type\":\"Person\",\"@name\":\"Chester Liu\"}],\"headline\":\"Introducing Distributed Data Parallel support on PyTorch Windows\",\"datePublished\":\"2021-08-04T15:00:19+00:00\",\"dateModified\":\"2025-05-29T22:00:31+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/\"},\"wordCount\":1389,\"publisher\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/MSC24-Japan-business-Getty-1024531730-rgb.webp\",\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/\",\"name\":\"Introducing Distributed Data Parallel support on PyTorch Windows | Microsoft Open Source 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/MSC24-Japan-business-Getty-1024531730-rgb.webp\",\"datePublished\":\"2021-08-04T15:00:19+00:00\",\"dateModified\":\"2025-05-29T22:00:31+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/#primaryimage\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/MSC24-Japan-business-Getty-1024531730-rgb.webp\",\"contentUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/MSC24-Japan-business-Getty-1024531730-rgb.webp\",\"width\":1170,\"height\":640},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/opensource.microsoft.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Introducing Distributed Data Parallel support on PyTorch 
Windows\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#website\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/\",\"name\":\"Microsoft Open Source Blog\",\"description\":\"Open dialogue about openness at Microsoft \u2013 open source, standards, interoperability\",\"publisher\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/opensource.microsoft.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#organization\",\"name\":\"Microsoft Open Source Blog\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png\",\"contentUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png\",\"width\":259,\"height\":194,\"caption\":\"Microsoft Open Source Blog\"},\"image\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/OpenAtMicrosoft\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Introducing Distributed Data Parallel support on PyTorch Windows | Microsoft Open Source Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/","og_locale":"en_US","og_type":"article","og_title":"Introducing Distributed Data Parallel support on PyTorch Windows | Microsoft Open Source Blog","og_description":"Model training has been and will be in the foreseeable future one of the most frustrating things machine learning developers face. It takes quite a long time and people can\u2019t really do anything about it.","og_url":"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/","og_site_name":"Microsoft Open Source Blog","article_published_time":"2021-08-04T15:00:19+00:00","article_modified_time":"2025-05-29T22:00:31+00:00","og_image":[{"width":1170,"height":640,"url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/MSC24-Japan-business-Getty-1024531730-rgb.png","type":"image\/png"}],"author":"Chester Liu","twitter_card":"summary_large_image","twitter_creator":"@OpenAtMicrosoft","twitter_site":"@OpenAtMicrosoft","twitter_misc":{"Written by":"Chester Liu","Est. 
reading time":"6 min read"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/#article","isPartOf":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/"},"author":[{"@id":"https:\/\/opensource.microsoft.com\/blog\/author\/chester-liu\/","@type":"Person","@name":"Chester Liu"}],"headline":"Introducing Distributed Data Parallel support on PyTorch Windows","datePublished":"2021-08-04T15:00:19+00:00","dateModified":"2025-05-29T22:00:31+00:00","mainEntityOfPage":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/"},"wordCount":1389,"publisher":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#organization"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/#primaryimage"},"thumbnailUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/MSC24-Japan-business-Getty-1024531730-rgb.webp","inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/","url":"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/","name":"Introducing Distributed Data Parallel support on PyTorch Windows | Microsoft Open Source 
Blog","isPartOf":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/#primaryimage"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/#primaryimage"},"thumbnailUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/MSC24-Japan-business-Getty-1024531730-rgb.webp","datePublished":"2021-08-04T15:00:19+00:00","dateModified":"2025-05-29T22:00:31+00:00","breadcrumb":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/#primaryimage","url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/MSC24-Japan-business-Getty-1024531730-rgb.webp","contentUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2024\/06\/MSC24-Japan-business-Getty-1024531730-rgb.webp","width":1170,"height":640},{"@type":"BreadcrumbList","@id":"https:\/\/opensource.microsoft.com\/blog\/2021\/08\/04\/introducing-distributed-data-parallel-support-on-pytorch-windows\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/opensource.microsoft.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Introducing Distributed Data Parallel support on PyTorch 
Windows"}]},{"@type":"WebSite","@id":"https:\/\/opensource.microsoft.com\/blog\/#website","url":"https:\/\/opensource.microsoft.com\/blog\/","name":"Microsoft Open Source Blog","description":"Open dialogue about openness at Microsoft \u2013 open source, standards, interoperability","publisher":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/opensource.microsoft.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/opensource.microsoft.com\/blog\/#organization","name":"Microsoft Open Source Blog","url":"https:\/\/opensource.microsoft.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png","contentUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png","width":259,"height":194,"caption":"Microsoft Open Source Blog"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/OpenAtMicrosoft"]}]}},"msxcm_display_generated_audio":false,"msxcm_animated_featured_image":null,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Microsoft Open Source 
Blog","distributor_original_site_url":"https:\/\/opensource.microsoft.com\/blog","push-errors":false,"_links":{"self":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts\/87762","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/users\/5562"}],"replies":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/comments?post=87762"}],"version-history":[{"count":2,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts\/87762\/revisions"}],"predecessor-version":[{"id":97498,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts\/87762\/revisions\/97498"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/media\/95482"}],"wp:attachment":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/media?parent=87762"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/post_tag?post=87762"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/content-type?post=87762"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/topic?post=87762"},{"taxonomy":"programming-languages","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/programming-languages?post=87762"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/coauthors?post=87762"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}