{"id":73103,"date":"2018-01-22T14:47:03","date_gmt":"2018-01-22T22:47:03","guid":{"rendered":"https:\/\/open.microsoft.com\/?p=73103"},"modified":"2025-01-23T15:55:02","modified_gmt":"2025-01-23T23:55:02","slug":"openai-masters-scale-kubernetes-azure","status":"publish","type":"post","link":"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/","title":{"rendered":"OpenAI masters scale with Kubernetes on Azure"},"content":{"rendered":"\n<p>OpenAI&#8217;s mission is to build safe artificial general intelligence (AGI) and ensure AGI&#8217;s benefits are as widely and evenly distributed as possible. As a non-profit AI research company, they focus on long-term research, working on problems that require fundamental advances in AI capabilities.<\/p>\n\n\n\n<p>OpenAI runs Kubernetes for their deep learning research because Kubernetes can provide a fast iteration cycle, scalability, and a lack of boilerplate, which makes it ideal for most of OpenAI\u2019s experiments. They currently operate several Kubernetes clusters (some in the cloud and some on physical hardware), the largest of which they pushed to over 2,500 nodes. That largest cluster runs in Azure on a combination of <a href=\"https:\/\/azure.microsoft.com\/en-us\/pricing\/details\/virtual-machines\/linux\/\">D15v2 and NC24 VMs<\/a>.<\/p>\n\n\n\n<p>To find out more about how OpenAI adopted Kubernetes and how they resolved some common deployment issues, check out this detailed OpenAI blog post on <a href=\"https:\/\/blog.openai.com\/scaling-kubernetes-to-2500-nodes\/\">scaling Kubernetes to 2,500 nodes<\/a>.<\/p>\n\n\n\n<p>If you want to learn more about Azure Container Service (AKS), the new managed Kubernetes service that OpenAI is using, visit the <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/container-service\/\">AKS site<\/a>. 
You only pay for the VMs that add value to your business and can try <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/container-service\/\">AKS for free<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI&#8217;s mission is to build safe artificial general intelligence (AGI) and ensure AGI&#8217;s benefits are as widely and evenly distributed as possible. As a non-profit AI research company, they focus on long-term research, working on problems that require fundamental advances in AI capabilities.<\/p>\n","protected":false},"author":5562,"featured_media":52121,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"msxcm_post_with_no_image":false,"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","footnotes":""},"post_tag":[158,2272,166],"content-type":[340],"topic":[2241,2242],"programming-languages":[],"coauthors":[2316],"class_list":["post-73103","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-kubernetes","tag-microsoft","tag-azure","content-type-tutorials-and-demos","topic-cloud","topic-containers","review-flag-2-1593580437-411","review-flag-free-1593619513-693","review-flag-new-1593580248-669"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>OpenAI masters scale with Kubernetes on Microsoft Azure<\/title>\n<meta name=\"description\" content=\"Find out how the AI research company, OpenAI, adopted Kubernetes, resolved common deployment issues, and scaled Kubernetes to 2,500 nodes.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" 
content=\"article\" \/>\n<meta property=\"og:title\" content=\"OpenAI masters scale with Kubernetes on Microsoft Azure\" \/>\n<meta property=\"og:description\" content=\"Find out how the AI research company, OpenAI, adopted Kubernetes, resolved common deployment issues, and scaled Kubernetes to 2,500 nodes.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Open Source Blog\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-22T22:47:03+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-01-23T23:55:02+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2016\/06\/DockerCon-containers.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1800\" \/>\n\t<meta property=\"og:image:height\" content=\"540\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Anand Chandramohan\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@OpenAtMicrosoft\" \/>\n<meta name=\"twitter:site\" content=\"@OpenAtMicrosoft\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Anand Chandramohan\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 min read\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/\"},\"author\":[{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/author\/anand-chandramohan\/\",\"@type\":\"Person\",\"@name\":\"Anand Chandramohan\"}],\"headline\":\"OpenAI masters scale with Kubernetes on Azure\",\"datePublished\":\"2018-01-22T22:47:03+00:00\",\"dateModified\":\"2025-01-23T23:55:02+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/\"},\"wordCount\":194,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2016\/06\/DockerCon-containers.jpg\",\"keywords\":[\"Kubernetes\",\"Microsoft\",\"Microsoft Azure\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/\",\"name\":\"OpenAI masters scale with Kubernetes on Microsoft 
Azure\",\"isPartOf\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2016\/06\/DockerCon-containers.jpg\",\"datePublished\":\"2018-01-22T22:47:03+00:00\",\"dateModified\":\"2025-01-23T23:55:02+00:00\",\"description\":\"Find out how the AI research company, OpenAI, adopted Kubernetes, resolved common deployment issues, and scaled Kubernetes to 2,500 nodes.\",\"breadcrumb\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#primaryimage\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2016\/06\/DockerCon-containers.jpg\",\"contentUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2016\/06\/DockerCon-containers.jpg\",\"width\":1800,\"height\":540,\"caption\":\"shipping containers\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/opensource.microsoft.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"OpenAI masters scale with Kubernetes on 
Azure\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#website\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/\",\"name\":\"Microsoft Open Source Blog\",\"description\":\"Open dialogue about openness at Microsoft \u2013 open source, standards, interoperability\",\"publisher\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/opensource.microsoft.com\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#organization\",\"name\":\"Microsoft Open Source Blog\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png\",\"contentUrl\":\"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png\",\"width\":259,\"height\":194,\"caption\":\"Microsoft Open Source Blog\"},\"image\":{\"@id\":\"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/OpenAtMicrosoft\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"OpenAI masters scale with Kubernetes on Microsoft Azure","description":"Find out how the AI research company, OpenAI, adopted Kubernetes, resolved common deployment issues, and scaled Kubernetes to 2,500 nodes.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/","og_locale":"en_US","og_type":"article","og_title":"OpenAI masters scale with Kubernetes on Microsoft Azure","og_description":"Find out how the AI research company, OpenAI, adopted Kubernetes, resolved common deployment issues, and scaled Kubernetes to 2,500 nodes.","og_url":"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/","og_site_name":"Microsoft Open Source Blog","article_published_time":"2018-01-22T22:47:03+00:00","article_modified_time":"2025-01-23T23:55:02+00:00","og_image":[{"width":1800,"height":540,"url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2016\/06\/DockerCon-containers.jpg","type":"image\/jpeg"}],"author":"Anand Chandramohan","twitter_card":"summary_large_image","twitter_creator":"@OpenAtMicrosoft","twitter_site":"@OpenAtMicrosoft","twitter_misc":{"Written by":"Anand Chandramohan","Est. 
reading time":"1 min read"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#article","isPartOf":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/"},"author":[{"@id":"https:\/\/opensource.microsoft.com\/blog\/author\/anand-chandramohan\/","@type":"Person","@name":"Anand Chandramohan"}],"headline":"OpenAI masters scale with Kubernetes on Azure","datePublished":"2018-01-22T22:47:03+00:00","dateModified":"2025-01-23T23:55:02+00:00","mainEntityOfPage":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/"},"wordCount":194,"commentCount":0,"publisher":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#organization"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#primaryimage"},"thumbnailUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2016\/06\/DockerCon-containers.jpg","keywords":["Kubernetes","Microsoft","Microsoft Azure"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/","url":"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/","name":"OpenAI masters scale with Kubernetes on Microsoft 
Azure","isPartOf":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#primaryimage"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#primaryimage"},"thumbnailUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2016\/06\/DockerCon-containers.jpg","datePublished":"2018-01-22T22:47:03+00:00","dateModified":"2025-01-23T23:55:02+00:00","description":"Find out how the AI research company, OpenAI, adopted Kubernetes, resolved common deployment issues, and scaled Kubernetes to 2,500 nodes.","breadcrumb":{"@id":"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#primaryimage","url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2016\/06\/DockerCon-containers.jpg","contentUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2016\/06\/DockerCon-containers.jpg","width":1800,"height":540,"caption":"shipping containers"},{"@type":"BreadcrumbList","@id":"https:\/\/opensource.microsoft.com\/blog\/2018\/01\/22\/openai-masters-scale-kubernetes-azure\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/opensource.microsoft.com\/blog\/"},{"@type":"ListItem","position":2,"name":"OpenAI masters scale with Kubernetes on Azure"}]},{"@type":"WebSite","@id":"https:\/\/opensource.microsoft.com\/blog\/#website","url":"https:\/\/opensource.microsoft.com\/blog\/","name":"Microsoft Open Source Blog","description":"Open dialogue about 
openness at Microsoft \u2013 open source, standards, interoperability","publisher":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/opensource.microsoft.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/opensource.microsoft.com\/blog\/#organization","name":"Microsoft Open Source Blog","url":"https:\/\/opensource.microsoft.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png","contentUrl":"https:\/\/opensource.microsoft.com\/blog\/wp-content\/uploads\/2019\/08\/Microsoft-Logo.png","width":259,"height":194,"caption":"Microsoft Open Source Blog"},"image":{"@id":"https:\/\/opensource.microsoft.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/OpenAtMicrosoft"]}]}},"msxcm_display_generated_audio":false,"msxcm_animated_featured_image":null,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Microsoft Open Source 
Blog","distributor_original_site_url":"https:\/\/opensource.microsoft.com\/blog","push-errors":false,"_links":{"self":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts\/73103","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/users\/5562"}],"replies":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/comments?post=73103"}],"version-history":[{"count":2,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts\/73103\/revisions"}],"predecessor-version":[{"id":96976,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/posts\/73103\/revisions\/96976"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/media\/52121"}],"wp:attachment":[{"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/media?parent=73103"}],"wp:term":[{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/post_tag?post=73103"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/content-type?post=73103"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/topic?post=73103"},{"taxonomy":"programming-languages","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/programming-languages?post=73103"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/opensource.microsoft.com\/blog\/wp-json\/wp\/v2\/coauthors?post=73103"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}