1 min read

OpenAI masters scale with Kubernetes on Azure

OpenAI’s mission is to build safe artificial general intelligence (AGI) and ensure AGI’s benefits are as widely and evenly distributed as possible. As a non-profit AI research company, they focus on long-term research, working on problems that require fundamental advances in AI capabilities.
OpenAI runs Kubernetes for their deep learning research because Kubernetes can provide a fast iteration cycle, scalability, and a lack of boilerplate, which makes it ideal for most of OpenAI’s experiments. They currently operate several Kubernetes clusters (some in the cloud and some on physical hardware), the largest of which they pushed to over 2,500 nodes. Their Kubernetes cluster runs in Azure on a combination of D15v2 and NC24 VMs.
To find out more about how OpenAI adopted Kubernetes and how they resolved some common deployment issues, check out this detailed OpenAI blog post on scaling Kubernetes to 2,500 nodes.
If you want to learn more about Azure Container Service (AKS), the new managed Kubernetes service that OpenAI is using, visit the AKS site. You only pay for the VMs that add value to your business and can try AKS for free.