Running Kubernetes in the cloud is powerful, but if you’re not careful, it can get expensive fast. In Azure Kubernetes Service (AKS), the largest portion of your bill usually comes from the virtual machines behind your node pools, not from Kubernetes itself.

In this post, we’ll break down practical, real-world techniques to reduce AKS costs—without compromising reliability where it matters. This guide is written for developers, DevOps engineers, and platform teams, and focuses on easy wins first, then slightly more advanced optimizations.


TL;DR

If you’re short on time, feel free to download the checklist and get on with your day. If you’ve got a few minutes to spare, grab a coffee ☕ and read on.

The checklist is a clean, printable 24-point AKS Cost Optimization Checklist for auditing Dev, Test, and Production clusters: Download the Checklist (PDF)


1. Understand Where AKS Costs Actually Come From

Before optimizing, it’s crucial to know what you’re paying for.

Behind every AKS cluster, Azure provisions several resources that do incur cost:

  • Virtual Machine Scale Sets (VMSS) for:
    • System node pools
    • User node pools
  • Load Balancers
  • Public IP addresses
  • Network traffic (especially cross-region traffic)
  • Control plane Standard SKU (optional but recommended for production)
  • Private endpoint traffic (for private clusters)

Free AKS Components

Some components are free and don’t directly add to your bill:

  • AKS control plane (Free SKU)
  • Managed Identity
  • AKS extensions (CSI drivers, Open Service Mesh, etc.)

Additional “Hidden” Costs

In real architectures, AKS is rarely alone. Costs may also come from:

  • Virtual network peering (hub-and-spoke setups)
  • Application Gateway used as an Ingress Controller
  • Azure Key Vault for secrets and certificates
  • Log Analytics, Prometheus, Grafana
  • Persistent storage (Azure Disk, Azure Files, Blob)
  • AKS Backup extension and Backup Vaults

📌 Tip:
Use Azure Portal → Subscription → Cost Analysis to get a full picture of AKS-related spend.
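
When reading Cost Analysis, it helps to know that AKS places node VMs, load balancers, and public IPs in a separate node resource group, named `MC_<resource-group>_<cluster>_<region>` by default. The sketch below is only an illustration of grouping spend once you know that name: the row fields (`resource_group`, `service`, `cost`), the cluster names, and every figure are made up and will not match a real cost export as-is.

```python
from collections import defaultdict

def node_resource_group(resource_group: str, cluster: str, region: str) -> str:
    """Default name of the resource group AKS creates for node infrastructure."""
    return f"MC_{resource_group}_{cluster}_{region}"

def cost_by_service(rows, resource_groups):
    """Sum cost per service for rows that belong to the given resource groups."""
    wanted = {rg.lower() for rg in resource_groups}
    totals = defaultdict(float)
    for row in rows:
        if row["resource_group"].lower() in wanted:
            totals[row["service"]] += row["cost"]
    # Largest spend first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical rows; field names and amounts are illustrative only
rows = [
    {"resource_group": "MC_rg-dev_aks-dev_westeurope", "service": "Virtual Machines", "cost": 412.0},
    {"resource_group": "MC_rg-dev_aks-dev_westeurope", "service": "Load Balancer", "cost": 18.5},
    {"resource_group": "rg-dev", "service": "Log Analytics", "cost": 47.0},
    {"resource_group": "rg-other", "service": "Virtual Machines", "cost": 300.0},
]

groups = ["rg-dev", node_resource_group("rg-dev", "aks-dev", "westeurope")]
for service, cost in cost_by_service(rows, groups):
    print(f"{service:<18}{cost:>8.2f}")
```

If the cluster was created with a custom node resource group name, use that name instead of the default pattern.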


2. Start With Universal Kubernetes Cost Optimizations

These techniques apply to any Kubernetes cluster, not just AKS.

Optimize Docker Images

  • Use smaller base images (Alpine, Distroless)
  • Remove unnecessary layers
  • Smaller images mean faster pulls and less disk and network usage on each node

Use Horizontal Pod Autoscaler (HPA)

  • Automatically scale pods based on CPU or memory
  • Prevent over-provisioning during low traffic
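
To make the scaling behavior concrete, here is a minimal sketch of the ratio formula the Kubernetes documentation gives for HPA. It deliberately leaves out the tolerance band, min/max replica limits, and stabilization windows that a real HPA also applies; the function name is ours.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Replica count from the HPA ratio: ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 pods averaging 90% CPU against a 60% target: scale out
print(desired_replicas(4, 90, 60))
# 4 pods averaging 15% CPU against a 60% target: scale in
print(desired_replicas(4, 15, 60))
```

The second call is where the savings come from: during quiet periods the replica count drops, which in turn lets the cluster autoscaler remove nodes.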

Enable Cluster Autoscaler

  • Automatically scales node pools up and down
  • Removes unused VMs when demand drops
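
On AKS, the cluster autoscaler is enabled per node pool with minimum and maximum node counts. The sketch below only builds the Azure CLI call as an argument list (it does not run it), so it can be reviewed or handed to a pipeline step. The resource group, cluster, and pool names are placeholders.

```python
import shlex

def autoscaler_command(resource_group: str, cluster: str, nodepool: str,
                       min_count: int, max_count: int) -> list[str]:
    """Build (but don't run) the CLI call that enables autoscaling on one pool."""
    if min_count < 0 or max_count < min_count:
        raise ValueError("need 0 <= min_count <= max_count")
    return [
        "az", "aks", "nodepool", "update",
        "--resource-group", resource_group,
        "--cluster-name", cluster,
        "--name", nodepool,
        "--enable-cluster-autoscaler",
        "--min-count", str(min_count),
        "--max-count", str(max_count),
    ]

# Placeholder names; substitute your own
print(shlex.join(autoscaler_command("rg-dev", "aks-dev", "userpool", 1, 5)))
```

A low minimum is what lets idle pools shrink; the maximum acts as a spending cap.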

These alone can significantly reduce waste.


3. Stop and Start AKS Clusters During Non-Working Hours

One of the simplest and most effective cost savers—especially for Dev/Test environments.

How It Works

  • When you stop an AKS cluster:
    • The control plane and all node pool VMs are stopped
    • Cluster state (your Kubernetes objects and configuration) is preserved in Azure
  • When you start it again:
    • The cluster is restored to its previous state

You can do this:

  • From the Azure Portal
  • Using Azure CLI
  • Inside a DevOps pipeline (automation-friendly)
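
For the pipeline option, one possible shape is a scheduled job that decides whether the cluster should be up and then issues `az aks start` or `az aks stop`. The sketch below assumes working hours of Monday to Friday, 08:00 to 18:00, in whatever time zone the job runs in; the schedule, function names, and resource names are all assumptions to adapt.

```python
from datetime import datetime

WORK_DAYS = range(0, 5)     # Monday=0 .. Friday=4
WORK_HOURS = range(8, 18)   # 08:00 up to, but not including, 18:00

def desired_action(now: datetime) -> str:
    """Return 'start' during working hours and 'stop' otherwise."""
    working = now.weekday() in WORK_DAYS and now.hour in WORK_HOURS
    return "start" if working else "stop"

def power_command(action: str, resource_group: str, cluster: str) -> list[str]:
    """Build the `az aks start` / `az aks stop` call for a pipeline step."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    return ["az", "aks", action,
            "--resource-group", resource_group,
            "--name", cluster]

now = datetime(2024, 3, 6, 21, 30)  # a Wednesday evening
print(" ".join(power_command(desired_action(now), "rg-dev", "aks-dev")))
```

Running this every hour is wasteful if the cluster is already in the desired state, so a real job would typically check the cluster's current power state first.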

When to Use This

✅ Dev and Test clusters
❌ Production clusters (should always be running)

💡 Savings Note:
Stopping a cluster saves node pool VM costs, but load balancers, public IPs, ingress components, and persistent disks keep billing while it is stopped.
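
A rough way to size the saving is to split the bill into a part that stops with the VMs and a part that keeps running. The hourly prices below are invented for illustration; plug in your own figures from Cost Analysis.

```python
HOURS_PER_WEEK = 7 * 24  # 168

def weekly_cost(vm_cost_per_hour: float, fixed_cost_per_hour: float,
                running_hours: float) -> float:
    """VMs bill only while running; fixed items (load balancer, IPs) bill all week."""
    return vm_cost_per_hour * running_hours + fixed_cost_per_hour * HOURS_PER_WEEK

# Hypothetical prices: 2.00/h for all node VMs, 0.05/h for always-on resources
always_on = weekly_cost(2.0, 0.05, HOURS_PER_WEEK)   # never stopped
office_hours = weekly_cost(2.0, 0.05, 5 * 10)        # 10 h/day, Monday to Friday
saving = 1 - office_hours / always_on

print(f"always on:    {always_on:.2f}")
print(f"office hours: {office_hours:.2f}")
print(f"saving:       {saving:.0%}")
```

Because the cluster runs 50 of 168 hours, the VM part of the bill falls by about 70%, while the fixed part does not move at all.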


4. Stop Only Specific Node Pools (User Node Pools)

You don’t always need to stop the entire cluster.

Key Rule

  • System node pools must stay running
  • User node pools can be stopped independently

This is perfect when:

  • Only some workloads are idle
  • You want partial cost savings without full downtime

You can stop/start node pools via:

  • Azure CLI (az aks nodepool stop / az aks nodepool start)
  • Azure Portal (Node Pool → Stop)
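
As a sketch of the CLI route, the helper below builds the node pool stop/start call and refuses to stop a pool marked as a system pool, mirroring the key rule above. The `mode` argument is something the caller supplies here, not something the helper looks up, and all names are placeholders.

```python
def nodepool_power_command(action: str, resource_group: str, cluster: str,
                           nodepool: str, mode: str = "User") -> list[str]:
    """Build `az aks nodepool stop|start` for a single node pool."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    if action == "stop" and mode.lower() == "system":
        # Guard that follows the article's rule: only user pools get stopped
        raise ValueError("keep system node pools running; stop user pools only")
    return [
        "az", "aks", "nodepool", action,
        "--resource-group", resource_group,
        "--cluster-name", cluster,
        "--nodepool-name", nodepool,
    ]

# Placeholder names; substitute your own
print(" ".join(nodepool_power_command("stop", "rg-dev", "aks-dev", "batchpool")))
```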

5. Choose the Right VM Size and Family

VM selection has a huge impact on cost.

VM Families to Consider

  • General Purpose
  • Compute-Optimized
  • Memory-Optimized
  • Storage-Optimized
  • GPU-based (expensive, so use carefully)

Even within the same family:

  • Different VM sizes have different:
    • IOPS limits
    • Disk attachment limits
    • Pricing

📌 Best Practice:
Choose the smallest VM that meets your workload needs—and scale horizontally instead of vertically when possible.
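
The sketch below shows one way to reason about "smallest VM, scaled horizontally": for each candidate size, work out how many nodes the workload needs, then compare total hourly cost. The size names and prices are invented, and the calculation ignores resources reserved for the OS and system pods, so treat the result as a starting point rather than a sizing tool.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class VmSize:
    name: str
    vcpus: int
    memory_gib: int
    price_per_hour: float  # hypothetical price

def nodes_needed(size: VmSize, total_vcpus: int, total_memory_gib: int) -> int:
    """Nodes required so that both CPU and memory demand fit."""
    return max(math.ceil(total_vcpus / size.vcpus),
               math.ceil(total_memory_gib / size.memory_gib))

def cheapest_fit(sizes, total_vcpus, total_memory_gib, largest_pod_vcpus):
    """Cheapest (size, node_count) among sizes able to host the largest pod."""
    candidates = [s for s in sizes if s.vcpus >= largest_pod_vcpus]
    options = [(s, nodes_needed(s, total_vcpus, total_memory_gib)) for s in candidates]
    return min(options, key=lambda option: option[0].price_per_hour * option[1])

# Made-up catalog; real sizes and prices vary by region
catalog = [
    VmSize("small", 2, 8, 0.10),
    VmSize("medium", 4, 16, 0.20),
    VmSize("large", 8, 32, 0.42),
]

size, count = cheapest_fit(catalog, total_vcpus=10, total_memory_gib=24,
                           largest_pod_vcpus=1)
print(f"{count} x {size.name} at {size.price_per_hour * count:.2f}/hour")
```

With these numbers, five small nodes beat two large ones because large nodes leave more capacity unused. A workload with a pod that needs 4 vCPUs would rule out the small size entirely.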


6. Use Multiple Node Pools With Independent Autoscaling

AKS allows you to run:

  • At least one system node pool
  • Multiple user node pools

Each node pool can:

  • Use a different VM size
  • Scale independently

Why This Saves Money

  • Heavy workloads scale up only where needed
  • Idle pools scale down automatically
  • VM cost tracks the number and size of VMs in each scale set, so every node that scales away comes off the bill
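
A small worked comparison makes the point. It assumes a cluster that would otherwise run six large nodes all the time, versus a split into a cheap general pool plus a large-VM pool that has scaled down to one node while idle. Pool names, node counts, and prices are hypothetical.

```python
def hourly_cost(pools) -> float:
    """Sum of node count times per-node hourly price across all pools."""
    return sum(pool["nodes"] * pool["price_per_hour"] for pool in pools)

# One pool sized for the heaviest workload
single_pool = [
    {"name": "everything", "nodes": 6, "price_per_hour": 0.80},
]

# Two pools: small VMs for most pods, large VMs only where needed
split_pools = [
    {"name": "general", "nodes": 4, "price_per_hour": 0.20},
    {"name": "heavy", "nodes": 1, "price_per_hour": 0.80},  # scaled down while idle
]

print(f"single pool: {hourly_cost(single_pool):.2f}/hour")
print(f"split pools: {hourly_cost(split_pools):.2f}/hour")
```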

7. Free vs Standard Control Plane SKU

AKS offers two control plane options:

Free SKU (Default)

  • No SLA
  • Best for Dev/Test

Standard SKU

  • Comes with a financially backed uptime SLA
  • Costs $0.10 per cluster per hour, which works out to about $73/month
  • Recommended for Production

✅ Enable Standard SKU only where reliability matters
❌ Avoid it in Dev/Test to save money
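
The rule above can be written down as a small per-environment policy. The sketch assumes a Standard tier rate of $0.10 per cluster-hour and a 730-hour month, which is consistent with the monthly figure quoted above; confirm the current rate on the pricing page. Environment labels and resource names are placeholders.

```python
STANDARD_TIER_PER_HOUR = 0.10  # USD per cluster-hour (assumed; check current pricing)
HOURS_PER_MONTH = 730

def tier_for(environment: str) -> str:
    """Standard tier for production, Free tier everywhere else."""
    return "standard" if environment.lower() in ("prod", "production") else "free"

def tier_command(resource_group: str, cluster: str, environment: str) -> list[str]:
    """Build the `az aks update --tier ...` call for one cluster."""
    return ["az", "aks", "update",
            "--resource-group", resource_group,
            "--name", cluster,
            "--tier", tier_for(environment)]

def monthly_tier_cost(environments) -> float:
    """Control plane fee across clusters, counting only Standard tier ones."""
    standard_clusters = sum(1 for env in environments if tier_for(env) == "standard")
    return standard_clusters * STANDARD_TIER_PER_HOUR * HOURS_PER_MONTH

for env in ("dev", "test", "prod"):
    print(env, " ".join(tier_command("rg-" + env, "aks-" + env, env)))
print(f"{monthly_tier_cost(['dev', 'test', 'prod']):.2f} USD/month")
```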


8. Use Spot Node Pools (Massive Savings)

Spot VMs can reduce VM costs by up to 80% compared with pay-as-you-go prices; the actual discount varies by region and VM size.

When Spot Nodes Make Sense

  • Batch workloads
  • Stateless applications
  • Jobs that can tolerate interruptions

Things to Configure

  • Eviction policy (Delete or Deallocate)
  • Maximum price per hour (or -1 to pay up to the on-demand price)
  • Node size
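
Those three settings map onto flags of `az aks nodepool add`. The sketch below builds such a call with autoscaling enabled; the pool name, VM size, and node counts are example values, and a max price of `-1` means "pay up to the on-demand price" rather than a fixed cap.

```python
def spot_nodepool_command(resource_group: str, cluster: str, nodepool: str,
                          vm_size: str, max_price: float = -1,
                          eviction_policy: str = "Delete",
                          min_count: int = 1, max_count: int = 3) -> list[str]:
    """Build (but don't run) the CLI call that adds a Spot user node pool."""
    if eviction_policy not in ("Delete", "Deallocate"):
        raise ValueError("eviction_policy must be 'Delete' or 'Deallocate'")
    return [
        "az", "aks", "nodepool", "add",
        "--resource-group", resource_group,
        "--cluster-name", cluster,
        "--name", nodepool,
        "--priority", "Spot",
        "--eviction-policy", eviction_policy,
        "--spot-max-price", str(max_price),
        "--node-vm-size", vm_size,
        "--enable-cluster-autoscaler",
        "--min-count", str(min_count),
        "--max-count", str(max_count),
    ]

# Example values only
print(" ".join(spot_nodepool_command("rg-dev", "aks-dev", "spotpool",
                                     "Standard_D4s_v5")))
```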

Spot node pools can be used in:

  • Dev
  • Test
  • Production (with the right workloads)

⚠️ Warning:
Spot VMs can be evicted at any time with very little notice, so don’t use them for critical stateful services.
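
One detail worth knowing when moving workloads onto spot nodes: AKS taints spot node pools with `kubernetes.azure.com/scalesetpriority=spot:NoSchedule`, so a pod lands there only if it carries a matching toleration. This also keeps critical pods off spot nodes by default. A minimal sketch of adding that toleration to a pod spec held as a Python dict follows; the container name and image are placeholders.

```python
import json

SPOT_TOLERATION = {
    "key": "kubernetes.azure.com/scalesetpriority",
    "operator": "Equal",
    "value": "spot",
    "effect": "NoSchedule",
}

def with_spot_toleration(pod_spec: dict) -> dict:
    """Return a copy of a pod spec that is allowed to schedule onto spot nodes."""
    tolerations = list(pod_spec.get("tolerations", []))
    if SPOT_TOLERATION not in tolerations:
        tolerations.append(SPOT_TOLERATION)
    return {**pod_spec, "tolerations": tolerations}

# Placeholder container; only the tolerations part matters here
spec = {"containers": [{"name": "worker", "image": "example.azurecr.io/worker:1.0"}]}
print(json.dumps(with_spot_toleration(spec), indent=2))
```

A toleration only permits scheduling on spot nodes. To prefer or require them, you would also add node affinity for the same label.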


9. Azure-Wide Cost Reduction Techniques

These apply beyond AKS itself:

  • Arm-based VM sizes (often cheaper than comparable x86 sizes)
  • Azure Reservations (1-year or 3-year commitment)
  • Azure savings plan for compute (up to 65% savings)
  • Azure Hybrid Benefit (for Windows Server node pools)
  • Azure Dev/Test subscriptions:
    • No SLA
    • Up to 50% cheaper
    • Ideal for non-production environments
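
To get a feel for what these percentages mean, the sketch below applies the "up to" figures quoted above to a made-up monthly compute bill. These are ceilings, not typical results, and the baseline is purely illustrative.

```python
def discounted(monthly_cost: float, discount: float) -> float:
    """Monthly cost after applying a fractional discount between 0 and 1."""
    if not 0 <= discount <= 1:
        raise ValueError("discount must be between 0 and 1")
    return monthly_cost * (1 - discount)

baseline = 1000.0  # hypothetical pay-as-you-go compute spend per month

# Best-case rates taken from the list above
best_case = {
    "Savings plan (up to 65%)": 0.65,
    "Dev/Test subscription (up to 50%)": 0.50,
}

for label, rate in best_case.items():
    print(f"{label:<36}{discounted(baseline, rate):>9.2f}")
```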

10. Final Thoughts: Think Architecture, Not Just Kubernetes

AKS cost optimization isn’t about one single trick. It’s about combining:

  • Smart VM choices
  • Autoscaling
  • Spot instances
  • Environment-specific decisions (Prod vs Dev/Test)
  • Subscription and pricing model optimizations

Start with simple wins (stopping clusters, autoscaling), then gradually adopt advanced strategies as your platform matures.


References

🎥 Video Reference

This blog post and the downloadable checklist are based on the following video walkthrough:

The video demonstrates real AKS cost optimization techniques including autoscaling, spot node pools, control plane SKUs, and cluster scheduling.

