EKS Cost Optimization Playbook: From Visibility to Savings

Published by Vishnu Siddarth on Jan 18, 2026

Introduction

If you run production workloads on Amazon EKS, you likely have untapped savings sitting in your cluster right now. Finding them requires more than high-level advice. This playbook combines observability through AWS tools, hands-on autoscaler configurations (Karpenter, Cluster Autoscaler, HPA, VPA), a decision matrix for Spot versus Fargate versus Reserved capacity, and a step-by-step rollout plan with real before-and-after numbers.

Before & After: Real Savings in 90 Days

A typical mid-size engineering team running 200 pods across 40 m5.large nodes, spending $8,400 monthly, can expect these results after implementing this playbook:

Before: $8,400/month

  • 40 On-Demand m5.large nodes at 25% average utilization

  • Oversized pod requests

  • No autoscaling

  • Dev clusters running 24/7

After: $2,800/month

  • 15 nodes via Karpenter (70% Spot, 30% On-Demand with Savings Plans)

  • Right-sized requests

  • Dev clusters scaled to zero overnight

Net Savings: $5,600/month (67% reduction)

These numbers are representative, not exceptional. Most teams implementing this playbook end-to-end achieve 50-70% cost reduction within three months.

Key Highlights

  • EKS cost waste primarily comes from oversized pods, idle EC2 nodes, orphaned EBS volumes, and unnecessary cross-AZ traffic

  • AWS Split Cost Allocation Data combined with native billing tools enables pod-level cost visibility and accurate attribution

  • Right-sizing pod requests typically cuts 30 to 40 percent of node capacity requirements

  • Karpenter delivers faster, cheaper autoscaling than Cluster Autoscaler and reduces compute costs by 30 to 50 percent

  • Spot Instances offer 60 to 90 percent savings for tolerant workloads when paired with proper PDBs and diversification

  • A hybrid compute strategy using On-Demand baseline, Spot variability, and Savings Plans commitments produces the largest long-term ROI

  • Most teams achieve 50 to 70 percent EKS savings in three months by implementing this playbook end to end

Why EKS Costs Spiral: Anatomy of Spend

The single biggest driver of EKS cost waste is mis-sized pod resource requests. When developers overestimate CPU and memory needs, often requesting 4 cores when workloads use only 1, Kubernetes provisions additional nodes to satisfy those inflated requests. This creates a cascade effect: more nodes lead to higher EC2 charges, more EBS volumes, increased cross-AZ traffic, and lower overall cluster utilization. Fixing pod resource requests first makes everything downstream significantly cheaper.
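A quick back-of-envelope calculation makes the cascade concrete. The ~$0.048 per vCPU-hour figure below assumes m5.large On-Demand pricing of roughly $0.096/hr for 2 vCPUs; your actual rates will vary:

```shell
# Cost of one pod that requests 4 vCPUs but uses only 1 (illustrative):
# 3 idle vCPUs x ~$0.048/vCPU-hr x 730 hrs/month
awk 'BEGIN{printf "wasted per pod: $%.2f/month\n", (4 - 1) * 0.048 * 730}'
```

Multiply roughly $105/month of stranded capacity across dozens or hundreds of over-requested pods and the node bill explains itself.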

Understanding Cost Components

EKS spending breaks down into several key components, each with its own cost leaks:

EC2 Compute forms the largest expense category. When teams overprovision instance types or leave nodes running during idle periods, waste compounds quickly. A single m5.large instance running 24/7 costs roughly $70 monthly. Multiply that by dozens of underutilized nodes, and you're looking at thousands in unnecessary spend.

EBS Volumes accumulate silently. Developers provision persistent volumes for stateful workloads, but those volumes persist even after pods terminate unless storage lifecycle controls clean them up. Orphaned EBS volumes billed at $0.10 per GB-month add up to significant waste over time.

Data Transfer Costs catch teams off guard. Cross-AZ traffic within your VPC costs $0.01 per GB in each direction. For data-intensive applications processing terabytes monthly, this becomes a five-figure line item. NAT Gateway processing fees add another layer, charging $0.045 per GB processed.

EKS Control Plane costs $0.10 per hour per cluster, totaling $73 monthly. While this fee stays constant, running multiple clusters for different environments multiplies this base cost unnecessarily.
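As a sanity check, the rates above translate into monthly figures like these (a sketch; the 1 TB traffic volume is an assumed workload, not a benchmark):

```shell
# Monthly cost sketches from the published per-unit rates above:
awk 'BEGIN{printf "EKS control plane (1 cluster): $%.2f/month\n", 0.10 * 730}'
awk 'BEGIN{printf "cross-AZ, 1 TB each direction: $%.2f/month\n", 1000 * 0.01 * 2}'
awk 'BEGIN{printf "NAT processing, 1 TB: $%.2f/month\n", 1000 * 0.045}'
```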

Common Cost Leaks

  • Oversized pod resource requests force Kubernetes to provision more nodes than workloads actually need. A pod requesting 4 CPU cores but using only 1 core wastes 75% of allocated capacity.

  • Abandoned workloads from completed experiments continue consuming resources months after teams stop caring about them.

  • Noisy neighbor problems arise when resource-intensive pods land on shared nodes, forcing other workloads onto additional instances.

Quick Decision Matrix: Workload → Autoscaler → Compute

| Workload Type | Best Autoscaler | Compute Strategy | Expected Savings |
|---|---|---|---|
| Stateless web apps | Karpenter + HPA | 70% Spot, 30% On-Demand | 60-75% |
| Batch/CI-CD | Karpenter | 90% Spot, 10% On-Demand | 75-85% |
| Databases/stateful | Cluster Autoscaler | 100% On-Demand + Savings Plans | 40-50% |
| ML inference | Karpenter | 50% Spot, 50% On-Demand | 50-60% |
| Low-traffic services | Fargate | Pay-per-use | 20% (ops efficiency gain) |
| Development/staging | Karpenter | 100% Spot, scale-to-zero | 80-90% |

Fargate Trade-off: ~20% higher compute cost for 80% reduction in operational overhead. Choose when operational simplicity justifies the premium.

Observe: Visibility and Allocation

You cannot optimize what you cannot measure. Establishing cost visibility requires connecting AWS billing data with Kubernetes resource allocation through a structured cost allocation method.

Enable AWS Split Cost Allocation Data

AWS Split Cost Allocation Data for EKS provides native, granular cost visibility down to the pod level. As of October 2025, this feature supports importing up to 50 Kubernetes custom labels per pod as cost allocation tags.

Setup Steps:

  1. Navigate to AWS Billing Console → Cost Management Preferences

  2. Enable Split Cost Allocation Data for Amazon EKS

  3. Choose your measurement method:

    • Resource requests (recommended): Allocates costs based on CPU/memory requests

    • Amazon Managed Service for Prometheus: Uses actual utilization metrics

    • CloudWatch Container Insights: Provides more granular visibility


  4. For accelerated computing workloads (GPU, Trainium, Inferentia): Split cost allocation automatically tracks these resources alongside CPU and memory (feature added September 2025)

  5. Enable Cost and Usage Reports (CUR) delivery to an S3 bucket with hourly granularity

Configure Cost Visualization

After enabling split cost allocation (data appears within 24 hours):

Use AWS Cost Explorer for basic visualization:

  • Filter by aws:eks:cluster-name, aws:eks:namespace, aws:eks:workload-name

  • Group costs by custom Kubernetes labels imported as tags

  • Set up Cost Anomaly Detection for automatic alerts

For Advanced Analysis:

  • Deploy the Containers Cost Allocation Dashboard in Amazon QuickSight

  • Query CUR data using Amazon Athena for custom reports

  • Create Cost Categories to map Kubernetes costs to business units

Map Billing to Kubernetes

Establish consistent labeling across all workloads:

metadata:
  labels:
    team: platform
    environment: production
    app: api-gateway
    cost-center: engineering

These labels automatically appear in CUR as resourceTags/kubernetes.io/team, resourceTags/kubernetes.io/environment, etc., enabling precise cost allocation and chargeback.

Quick Wins: Low-Effort High-Impact Changes

Several optimization tactics deliver immediate savings with minimal implementation effort.

1. Tune Resource Requests and Limits

Analyze actual pod consumption using kubectl top pods to compare requested resources against actual usage. A pod requesting 2 CPU cores but averaging 0.5 cores wastes 75% of its allocation.

resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 1Gi

This change alone can reduce required node capacity by 30-40% in over-provisioned clusters.

2. Scale Down Development Clusters

Development and testing environments rarely need full capacity overnight. Implement cluster schedulers or simple cron jobs:

# Scale down at 7 PM
0 19 * * * kubectl scale deployment --all --replicas=1 -n development

# Scale up at 7 AM 
0 7 * * * kubectl scale deployment --all --replicas=3 -n development

3. Remove Abandoned Workloads

Audit deployments that haven't been updated in 90+ days:

kubectl get deployments --all-namespaces \
  --sort-by=.metadata.creationTimestamp

This housekeeping typically recovers 10-15% of cluster capacity.

4. Choose Right-Sized Node Types

Analyze workload patterns to select optimal instances:

  • Memory-intensive: r5 instances provide more RAM per dollar

  • Compute-bound: c5 instances offer more CPU capacity

  • Burstable workloads: t3/t4g instances save costs during idle periods

Autoscaling Deep Dive: Karpenter vs Cluster Autoscaler

Cluster Autoscaler: Stable and Predictable

The traditional choice that monitors pending pods and scales Auto Scaling Groups.

Setup:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  template:
    spec:
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.33.0
        name: cluster-autoscaler
        command:
          - ./cluster-autoscaler
          - --cloud-provider=aws
          - --skip-nodes-with-local-storage=false
          - --expander=least-waste
          - --scale-down-utilization-threshold=0.5

Characteristics:

  • Scale-up latency: 2-5 minutes

  • Average utilization: 25-40%

  • Works within predefined node groups

  • Mature, stable, multi-cloud portable

Karpenter: Fast and Flexible (Recommended for AWS)

Provisions nodes directly via EC2 APIs, selecting optimal instance types based on pending pod requirements. As of July 2025, Karpenter v1.5 introduced faster bin-packing and "emptiness-first" consolidation.

Basic NodePool Configuration:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      # v1 requires a nodeClassRef pointing at an EC2NodeClass
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5a.large", "m5n.large"]
  limits:
    cpu: 1000
    memory: 1000Gi


Performance Advantages:

  • Scale-up latency: 45-60 seconds (vs 2-5 minutes for CA)

  • Average utilization: 60-70% (vs 25-40% for CA)

  • Cost reduction: Real-world teams report 20-30% cluster-wide savings, up to 90% for CI/CD workloads

  • Intelligent bin-packing: Consolidates workloads aggressively without breaking SLOs
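How aggressively Karpenter consolidates is tunable per NodePool. A minimal sketch of the v1 `disruption` block (values here are illustrative defaults, not recommendations):

```yaml
spec:
  disruption:
    # Reclaim empty nodes first, then repack underutilized ones
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
```

Lengthening `consolidateAfter` trades some savings for fewer node replacements, which matters for workloads sensitive to pod churn.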

Decision Framework

Choose Cluster Autoscaler when:

  • You need multi-cloud portability

  • Regulatory requirements mandate managed node group controls

  • Workload patterns are highly predictable

  • Team prefers proven, stable technology

Choose Karpenter when:

  • Optimizing aggressively for cost on AWS

  • Requiring fast scale-up response (<60s)

  • Running substantial Spot Instance workloads

  • Needing flexible instance type selection

Horizontal and Vertical Pod Autoscalers

HPA scales pod replicas based on metrics:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

VPA adjusts pod resource requests automatically:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  updatePolicy:
    updateMode: "Auto"

VPA analyzes historical usage and right-sizes requests, eliminating guesswork.
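If automatic evictions are too disruptive for a workload, VPA can run in recommendation-only mode: swap the `updatePolicy` above for the fragment below and read the suggested requests from the object's status (for example via `kubectl describe vpa app-vpa`):

```yaml
  updatePolicy:
    updateMode: "Off"   # compute recommendations, never evict pods
```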

Compute Choice Matrix: Spot, On-Demand, Fargate, and Commitments

Spot Instances: 60-90% Savings

Spot Instances offer dramatic discounts on spare EC2 capacity. They work exceptionally well for fault-tolerant workloads like batch processing, CI/CD pipelines, and stateless web services.

Configure Spot-friendly NodePools with diversification:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: spot-pool
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5a.large", "m5n.large", "c5.large"]
      taints:
        - key: workload-type
          value: spot
          effect: NoSchedule
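With the taint in place, only pods that explicitly tolerate it will land on Spot nodes. The matching toleration goes in the pod spec:

```yaml
spec:
  tolerations:
  - key: workload-type
    operator: Equal
    value: spot
    effect: NoSchedule
```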

Implement Pod Disruption Budgets:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app

Real-world Spot adoption typically delivers 60-75% cost reduction for suitable workloads.

On-Demand Instances: Guaranteed Availability

Use for critical workloads requiring guaranteed availability:

  • Databases and stateful applications

  • Message queues

  • Application pods that cannot tolerate interruptions

Price predictability and instant availability justify the premium cost.

AWS Fargate: Zero Infrastructure Management

Fargate eliminates node management entirely, charging only for pod vCPU and memory consumption.

Pricing (as of December 2025):

  • $0.04048 per vCPU-hour

  • $0.004445 per GB-hour

Trade-off: Approximately 20% higher compute costs than equivalent EC2 instances, but removes 80% of operational overhead—no node patching, no capacity planning, no autoscaler configuration.
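Those rates let you verify the premium yourself. A quick check for an m5.large-shaped pod (2 vCPU, 8 GiB), assuming ~$0.096/hr m5.large On-Demand pricing:

```shell
# Fargate vs EC2 for the same 2 vCPU / 8 GiB footprint, 730 hrs/month:
awk 'BEGIN{printf "Fargate: $%.2f/month\n", (2 * 0.04048 + 8 * 0.004445) * 730}'
awk 'BEGIN{printf "EC2 m5.large: $%.2f/month\n", 0.096 * 730}'
awk 'BEGIN{printf "premium: %.0f%%\n", ((2 * 0.04048 + 8 * 0.004445) / 0.096 - 1) * 100}'
```

The Fargate figure comes out roughly 20% above the EC2 equivalent, consistent with the trade-off stated above.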

Consider Fargate when:

  • Operational simplicity justifies the premium

  • Running low-traffic services or scheduled jobs

  • Small teams without dedicated infrastructure engineers

  • Need instant scaling without pre-provisioned capacity

For cost-sensitive production workloads at scale, EC2 with Karpenter delivers better economics.

Savings Plans and Reserved Instances: 40-72% Discounts

Apply commitments to baseline capacity:

Compute Savings Plans: Flexibility across instance families and regions (40-50% discount)

Reserved Instances: Slightly higher discounts (up to 72%) but locked to specific instance types

Strategy: Purchase Savings Plans for predictable baseline capacity, use Spot for variable demand.

Tagging and Cost Allocation Best Practices

Establish a tagging taxonomy covering:

  • Team: Engineering team responsible

  • Environment: Production, staging, development

  • Application: Specific service name

  • Cost-center: Business unit for chargeback

apiVersion: v1
kind: Pod
metadata:
  labels:
    team: platform
    environment: production
    app: api-gateway
    cost-center: engineering

AWS Split Cost Allocation automatically imports these labels as tags with the prefix resourceTags/kubernetes.io/, enabling filtered analysis in Cost Explorer and CUR queries.

KPIs, Alerts, and Guardrails for FinOps

Track These Metrics

  1. Cost per namespace: Monthly spend attributed to each Kubernetes namespace

  2. Cost per pod: Average cost per running pod

  3. CPU/Memory efficiency: Ratio of utilized resources to requested resources

  4. Idle node percentage: Proportion of provisioned capacity unused

  5. Wasted spend: Dollar value of over-provisioned resources

Configure Budget Alerts

Use AWS Budgets with Cost Anomaly Detection:

# Set budget threshold
aws budgets create-budget \
  --account-id 123456789 \
  --budget file://budget-config.json
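The referenced `budget-config.json` follows the AWS Budgets `CreateBudget` schema. A minimal hypothetical example (the name and amount are placeholders):

```json
{
  "BudgetName": "eks-monthly",
  "BudgetType": "COST",
  "TimeUnit": "MONTHLY",
  "BudgetLimit": {
    "Amount": "9000",
    "Unit": "USD"
  }
}
```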

Set Karpenter Limits

Prevent runaway scaling:

spec:
  limits:
    cpu: "1000"
    memory: "1000Gi"

Get Expert Help

Organizations implementing this playbook typically reduce EKS costs by 50-70% within three months. The combination of visibility, right-sizing, intelligent autoscaling, and strategic use of Spot Instances delivers sustainable savings without compromising reliability.

Ready to uncover your hidden EKS savings? Visit Opsolute or book a free assessment to get started.

Opsolute provides comprehensive EKS cost optimization as a service, combining automated analysis, expert recommendations, and hands-on implementation support to help you achieve these savings faster with less risk.