What’s new with Microsoft in open-source and Kubernetes at KubeCon North America 2025
WRITTEN BY
Brendan Burns
Open source moves forward when we build in the open, learn from production, and bring those lessons back upstream. That’s why I am excited to be in Atlanta for KubeCon + CloudNativeCon North America 2025 to share what our teams have learned and what we are contributing next upstream.
Since my last update at KubeCon + CloudNativeCon Europe 2025, we have continued investing in existing CNCF projects and strengthening the foundations of Kubernetes, while launching new projects where the community sees a need. From improving reliability and performance to advancing security and AI-native workloads, our goal remains the same: make Kubernetes better for everyone.
We’re also introducing practical improvements across Azure and Azure Kubernetes Service (AKS) that make running cloud-native and AI workloads even simpler and more resilient. These updates aren’t just features; they reflect real-world feedback from customers and the community, and they’re designed to help you scale Kubernetes with confidence.
Our ongoing commitment to building in the open
Previously, I have shared why building in the open matters to our teams at Microsoft—not just for transparency, but for driving real progress in cloud-native technologies. That commitment continues. We’re proud to work alongside the Cloud Native Computing Foundation (CNCF) and the broader ecosystem to help define the new Kubernetes AI Conformance program, aimed at ensuring interoperability and portability for AI workloads. AKS has already met these conformance requirements, reflecting our commitment to open standards. We believe AI conformance will unlock greater choice across open-source tooling and platforms, and we are also contributing to new open-source AI platforms that run on Kubernetes where we see strong community and customer adoption.
Our collaborations span key areas of AI infrastructure. With NVIDIA, we’re scaling multi-node LLM inference with NVIDIA Dynamo. We are also collaborating on Ray (now part of the PyTorch Foundation), a distributed compute engine with a set of AI libraries that accelerate machine learning workloads. AKS is also now recognized as one of the llm-d project’s well-lit infrastructure providers, ensuring robust support for large-scale AI deployments.
We’re bringing AI to the world of Kubernetes observability with the Inspektor Gadget Model Context Protocol (MCP) server, and we’ve open-sourced Wassette, a security-oriented runtime for WebAssembly Components via MCP. In the Kubernetes AI Toolchain Operator (KAITO), we have added multi-node distributed inference for large models, Gateway API Inference Extension (GAIE) integration, and OCI artifact support to vastly reduce model pull time. AIKit has also joined the KAITO project, making it easier to fine-tune, build, and deploy open-source LLMs.
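To make the KAITO piece concrete, here is a minimal sketch of a KAITO Workspace that pairs GPU nodes with a model preset. The field layout follows upstream KAITO examples, but the apiVersion, node count, instance type, labels, and preset name below are illustrative assumptions rather than values from this announcement; check the KAITO docs for what your cluster supports.

```yaml
# Hedged KAITO Workspace sketch. All concrete values here are assumptions.
apiVersion: kaito.sh/v1beta1
kind: Workspace
metadata:
  name: workspace-phi-3-mini
resource:
  count: 2                                  # >1 asks for multi-node distributed inference
  instanceType: "Standard_NC24ads_A100_v4"  # assumed GPU SKU
  labelSelector:
    matchLabels:
      apps: phi-3                           # assumed node label
inference:
  preset:
    name: phi-3-mini-4k-instruct            # assumed model preset
```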
Security remains a top priority. We’re addressing community demand for stronger dataplane and software supply chain protections by contributing multicluster support for Istio's Ambient mode, donating Dalec to the CNCF as a sandbox project for secure package and container creation, and enhancing Copa with multi-platform support and end-of-life detection for direct image patching. We have also introduced the Headlamp plugin for Karpenter to improve visibility into scaling.
In Kubernetes v1.34, Microsoft engineers helped lead the stable feature releases of structured authentication and Dynamic Resource Allocation (DRA), including DRA’s integration with cluster autoscaler. We also collaborated to bring Headlamp under SIG UI and Kube Resource Orchestrator (kro) under SIG Cloud Provider, strengthening governance and collaboration.
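Structured authentication replaces a long list of kube-apiserver OIDC flags with a declarative file passed via --authentication-config. A minimal sketch, with placeholder issuer, audience, and prefix values, looks roughly like this:

```yaml
# Sketch of a structured AuthenticationConfiguration (stable in v1.34).
# The issuer URL, audience, and username prefix are placeholder assumptions.
apiVersion: apiserver.config.k8s.io/v1
kind: AuthenticationConfiguration
jwt:
  - issuer:
      url: https://token.example.com   # placeholder OIDC issuer
      audiences:
        - my-cluster                   # placeholder audience
    claimMappings:
      username:
        claim: sub
        prefix: "oidc:"
```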
For the past three years, Azure has been the top cloud provider in CNCF contributions, and our focus remains on advancing open source through partnership and shared innovation. These efforts reflect our belief that leadership in cloud-native means listening, contributing, and building together with the community. You can meet many of our contributors in the Azure booth and Project Pavilion at KubeCon!
Advancing AKS for what’s next
In addition to our work in the upstream community, I am happy to share several new capabilities in Azure Kubernetes Service that reflect where customers are headed: stronger security and governance, improved performance at scale, AI-powered operations, and simplified management.
Secure by design
Security isn’t an afterthought in AKS; it’s foundational. We’ve focused on making clusters resilient by design, starting with the node layer. Flatcar Container Linux introduces an immutable filesystem that prevents configuration drift and simplifies recovery from incidents. For organizations operating across clouds, Flatcar’s CNCF roots ensure consistency and portability.
Building on that principle, Azure Linux with OS Guard takes host security even further. This next-generation container host enforces immutability and code integrity using technologies like SELinux and Integrity Policy Enforcement (IPE), upstreamed in Linux kernel 6.12. With Trusted Launch enabled by default, OS Guard locks down the boot process and user space, ensuring only trusted binaries run. It’s the same hardened foundation that powers Microsoft’s own fleet, now available for your workloads.
Governance also matters. Azure Kubernetes Fleet Manager’s new Managed Namespaces feature gives platform teams a way to enforce resource quotas, networking policies, and RBAC across clusters without manual intervention. Immutable configurations mean tenants can’t override security baselines, helping organizations maintain compliance and control at scale.
Performance at scale
Networking is central to Kubernetes performance, and AKS is evolving to keep clusters fast and reliable. LocalDNS speeds up DNS resolution by handling queries locally on each node, eliminating bottlenecks and insulating workloads from upstream outages.
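LocalDNS is a managed AKS capability, but the underlying node-local pattern is easy to picture with Cilium's local redirect mechanism, which ACNS builds on. The sketch below is an illustration of that general pattern, not necessarily how AKS implements LocalDNS; it assumes a Cilium dataplane and a hypothetical node-local DNS DaemonSet labeled k8s-app: node-local-dns, and it steers kube-dns traffic to the cache pod on the same node.

```yaml
# Hedged sketch: redirect pod DNS queries for the kube-dns service to a
# DNS cache pod running on the same node. The backend labels are assumptions.
apiVersion: cilium.io/v2
kind: CiliumLocalRedirectPolicy
metadata:
  name: nodelocal-dns
  namespace: kube-system
spec:
  redirectFrontend:
    serviceMatcher:
      serviceName: kube-dns
      namespace: kube-system
  redirectBackend:
    localEndpointSelector:
      matchLabels:
        k8s-app: node-local-dns   # assumed DaemonSet label
    toPorts:
      - port: "53"
        name: dns
        protocol: UDP
      - port: "53"
        name: dns-tcp
        protocol: TCP
```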
For high-scale and latency-sensitive applications, eBPF host routing moves routing logic into the kernel for fewer hops and higher throughput, while local redirect policy ensures traffic stays node-local whenever possible. We’ve also strengthened traffic control and observability in Azure Container Networking Services (ACNS) with Layer 7 policy for fine-grained application-level enforcement and container network metrics filtering to cut data noise and costs. These improvements, combined with new options like Pod CIDR expansion and cluster-wide Cilium policies, give operators the tools to scale without compromise.
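As one example of what Layer 7 enforcement looks like in practice, here is a hedged sketch in Cilium policy form, the dataplane ACNS builds on. The application labels, port, and path are assumptions chosen for illustration: only GET requests under /api/products from the storefront app may reach the products API.

```yaml
# Hedged L7 policy sketch; app labels, port, and path are illustrative.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: products-api-readonly
spec:
  endpointSelector:
    matchLabels:
      app: products-api          # assumed workload label
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: storefront      # assumed caller label
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/products.*"
```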
AI-powered operations
AI workloads need more than GPU capacity—they need operational intelligence. We’re bringing agentic reasoning directly into the CLI with az aks agent, so operators can describe an issue in natural language and get targeted diagnostics and actionable fixes without hopping across tools. We’ve also streamlined model serving with integrated Model Context Protocol to connect models with external tools and data in real time. For teams running GPU fleets, built‑in GPU metrics and a managed device plugin reduce the toil of provisioning and monitoring, while scheduler profile configuration helps place the right workload on the right node for performance and cost.
To give customers more choice in how they run distributed AI, we also recently announced a partnership with Anyscale to deliver a managed Ray service on Azure, bringing a Python‑native engine for training, tuning, and inference to AKS‑backed clusters without the burden of control‑plane management.
Simplifying the Kubernetes experience
Running Kubernetes shouldn’t feel like a maze. We’re streamlining operations with features like one-click Cloud Shell in a VNet for private clusters, removing the need for custom VM setups, and enhancing scheduling flexibility with profile configuration for advanced workload placement.
We’re also continuing to invest in making AKS Automatic the easiest Kubernetes experience for developers and operators. Expect more updates at Microsoft Ignite next week, where we’ll share enhancements that make it even easier to get started with AKS.
See you at KubeCon + CloudNativeCon
I’m looking forward to connecting in Atlanta and hearing what’s top of mind for you. Here’s where you can find us:
- Keynotes to catch:
  - Scaling Smarter: Simplifying Multicluster AI with KAITO and KubeFleet with Jorge Palma, on November 13 at 9:49am.
  - Cloud Native Back to the Future: The Road Ahead with Jeremy Rickard, on November 13 at 9:56am.
- Expo Theater demo:
  - HolmesGPT: Agentic K8s troubleshooting in your terminal with Pavneet Singh Ahluwalia and Arik Alon (Robusta) on Wednesday, November 12, 2:15pm-2:45pm.
- Visit our booth (#500): Watch live demos, chat with experts, enjoy cool swag, and compete in Kubernetes Trivia for exclusive prizes!
- Sessions worth bookmarking:
  - Smarter Together: Orchestrating Multi-Agent AI Systems With A2A and MCP on Container
  - Shaping LTS Together: What We’ve Learned the Hard Way
  - Rage Against the Machine: Fighting AI Complexity with Kubernetes Simplicity
  - OpenTelemetry: Unpacking 2025, Charting 2026
  - AI Models Are Huge, but Your GPUs Aren’t: Mastering Multi-Node Distributed Inference on Kubernetes
  - Drasi: A New Take on Change-driven Architectures
  - How Comcast Leverages Radius in Their Internal Developer Platform
  - Beyond ChatOps: Agentic AI in Kubernetes—What Works, What Breaks, and What’s Next
  - GitHub Actions: Project Usage and Deep Dive
  - Open Policy Agent (OPA) Intro & Deep Dive
  - …and many more.
If you’re at the event, come say hello! We’d love to hear what you’re building and where Kubernetes needs to go next.