January 15, 2024
4 min read

Achieving 40% Cost Reduction: A Real-World AWS Optimization Case Study

How I led a comprehensive cloud cost optimization initiative that reduced monthly AWS expenditure by 40% through automation, monitoring, and strategic resource management.

AWS · Cost Optimization · Automation · Cloudability · Terraform

When I joined the team at NAB (National Australia Bank), I was immediately struck by the potential for significant cost savings in our AWS infrastructure. What started as routine cost analysis quickly became a comprehensive optimization initiative that would save the organization hundreds of thousands of dollars annually.

The Challenge

Our data platform was serving multiple business units with varying workloads, but we lacked visibility into:

  • Resource utilization patterns
  • Idle or oversized instances
  • Unused storage and services
  • Development environment costs

The monthly AWS bill was growing steadily, and stakeholders needed concrete strategies to optimize spending without compromising performance.

The Solution: Multi-Pronged Approach

1. Comprehensive Cost Visibility

First, I rolled out Cloudability for detailed cost tracking, and complemented its dashboards with a custom Cost Explorer script for ad-hoc analysis:

# Automated cost analysis script
import boto3
import pandas as pd
from datetime import datetime, timedelta

def analyze_aws_costs():
    """Pull the last 30 days of spend, grouped by service and region."""
    ce_client = boto3.client('ce')  # Cost Explorer API

    # Get costs for the last 30 days
    end_date = datetime.now()
    start_date = end_date - timedelta(days=30)

    response = ce_client.get_cost_and_usage(
        TimePeriod={
            'Start': start_date.strftime('%Y-%m-%d'),
            'End': end_date.strftime('%Y-%m-%d')
        },
        Granularity='DAILY',
        Metrics=['BlendedCost'],
        GroupBy=[
            {'Type': 'DIMENSION', 'Key': 'SERVICE'},
            {'Type': 'DIMENSION', 'Key': 'REGION'}
        ]
    )

    # Flatten the response into a report (helper shown below)
    return process_cost_data(response)
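
The process_cost_data helper referenced above wasn't part of the original listing; a minimal sketch, assuming the standard get_cost_and_usage response shape, looks like this:

def process_cost_data(response):
    """Flatten a Cost Explorer response into per-service totals."""
    # Uses the pandas import (pd) from the script above
    rows = []
    for day in response['ResultsByTime']:
        date = day['TimePeriod']['Start']
        for group in day['Groups']:
            service, region = group['Keys']
            rows.append({
                'date': date,
                'service': service,
                'region': region,
                'cost_usd': float(group['Metrics']['BlendedCost']['Amount'])
            })
    df = pd.DataFrame(rows)
    # Rank services by total spend over the period
    return df.groupby('service')['cost_usd'].sum().sort_values(ascending=False)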

2. Automated Resource Management

I created automated scripts to manage non-production environments:

#!/bin/bash
# Automated stop for non-production instances after business hours
# (run on a schedule; instances are tagged Environment=dev or Environment=test)

aws ec2 describe-instances \
    --filters "Name=tag:Environment,Values=dev,test" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[*].Instances[*].[InstanceId,Tags[?Key==`Name`].Value|[0]]' \
    --output text | while read -r instance_id name; do
    aws ec2 stop-instances --instance-ids "$instance_id"
    echo "Stopped instance: $name ($instance_id)"
done

3. Right-Sizing Analysis

Using AWS Cost Explorer and custom scripts, I identified oversized instances:

import boto3
from datetime import datetime, timedelta

def right_size_instances():
    """Flag running instances whose 30-day average CPU suggests a resize."""
    ec2_client = boto3.client('ec2')
    cloudwatch = boto3.client('cloudwatch')

    # Get all running instances
    instances = ec2_client.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )

    recommendations = []

    for reservation in instances['Reservations']:
        for instance in reservation['Instances']:
            instance_id = instance['InstanceId']
            instance_type = instance['InstanceType']

            # Get daily average CPU utilization for the last 30 days
            cpu_metrics = cloudwatch.get_metric_statistics(
                Namespace='AWS/EC2',
                MetricName='CPUUtilization',
                Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
                StartTime=datetime.now() - timedelta(days=30),
                EndTime=datetime.now(),
                Period=86400,          # one datapoint per day
                Statistics=['Average']
            )

            datapoints = cpu_metrics['Datapoints']
            avg_cpu = (sum(d['Average'] for d in datapoints) / len(datapoints)
                       if datapoints else 0.0)

            recommendations.append({
                'instance_id': instance_id,
                'current_type': instance_type,
                'avg_cpu': avg_cpu,
                'recommended_action': get_recommendation(avg_cpu)  # helper below
            })

    return recommendations
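
The get_recommendation helper wasn't included in the original listing; a minimal sketch with illustrative thresholds (the exact cut-offs we used are not reproduced here) could be:

def get_recommendation(avg_cpu):
    """Map 30-day average CPU to a coarse right-sizing suggestion."""
    if avg_cpu < 10:
        return 'candidate for stopping or downsizing'
    if avg_cpu < 40:
        return 'downsize by one instance size'
    if avg_cpu > 80:
        return 'consider upsizing'
    return 'keep current size'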

Key Strategies Implemented

1. Scheduled Automation

  • Non-production instances automatically stopped after 6 PM (a scheduling sketch follows this list)
  • Development environments started only when needed
  • Weekly cleanup of unused resources
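
To make the schedule concrete, here is a minimal sketch of the kind of Lambda function that an EventBridge schedule can trigger at 6 PM; the tag values and cron expression are illustrative assumptions, not the exact production setup:

import boto3

def lambda_handler(event, context):
    """Stop all running dev/test instances.
    Assumed trigger: EventBridge schedule, e.g. cron(0 18 ? * MON-FRI *)."""
    ec2 = boto3.client('ec2')
    reservations = ec2.describe_instances(
        Filters=[
            {'Name': 'tag:Environment', 'Values': ['dev', 'test']},
            {'Name': 'instance-state-name', 'Values': ['running']}
        ]
    )['Reservations']

    instance_ids = [i['InstanceId']
                    for r in reservations for i in r['Instances']]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return {'stopped': instance_ids}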

2. Storage Optimization

  • Implemented S3 lifecycle policies (see the sketch after this list)
  • Moved infrequently accessed data to cheaper storage classes
  • Automated cleanup of temporary files
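
For illustration, a lifecycle policy of this kind can be applied with boto3; the bucket name, prefixes, and transition windows below are hypothetical, not the project's actual values:

import boto3

s3 = boto3.client('s3')

# Hypothetical bucket and prefixes; transition/expiration days are illustrative
s3.put_bucket_lifecycle_configuration(
    Bucket='example-data-platform-bucket',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'tier-down-raw-data',
                'Status': 'Enabled',
                'Filter': {'Prefix': 'raw/'},
                'Transitions': [
                    {'Days': 30, 'StorageClass': 'STANDARD_IA'},
                    {'Days': 90, 'StorageClass': 'GLACIER'}
                ]
            },
            {
                'ID': 'expire-temp-files',
                'Status': 'Enabled',
                'Filter': {'Prefix': 'tmp/'},
                'Expiration': {'Days': 7}
            }
        ]
    }
)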

3. Reserved Instance Planning

  • Analyzed usage patterns to identify candidates for Reserved Instances (a query sketch follows below)
  • Implemented a 3-year RI strategy for predictable workloads
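
Cost Explorer can do much of this analysis itself; a minimal sketch of pulling its RI purchase recommendations (the payment option shown is an assumption):

import boto3

ce = boto3.client('ce')

# Ask Cost Explorer for 3-year EC2 RI recommendations based on the
# last 30 days of usage
response = ce.get_reservation_purchase_recommendation(
    Service='Amazon Elastic Compute Cloud - Compute',
    LookbackPeriodInDays='THIRTY_DAYS',
    TermInYears='THREE_YEARS',
    PaymentOption='PARTIAL_UPFRONT'
)

for rec in response['Recommendations']:
    for detail in rec['RecommendationDetails']:
        print(detail['RecommendedNumberOfInstancesToPurchase'],
              detail['EstimatedMonthlySavingsAmount'])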

4. Monitoring and Alerting

# CloudFormation snippet: billing alarm for unexpected cost spikes
# (EstimatedCharges is published in us-east-1 and is cumulative month-to-date)
HighCostAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: HighCostAlert
    MetricName: EstimatedCharges
    Namespace: AWS/Billing
    Dimensions:
      - Name: Currency
        Value: USD
    Statistic: Maximum
    Period: 86400
    EvaluationPeriods: 1
    Threshold: 1000  # Alert once month-to-date charges exceed $1,000
    ComparisonOperator: GreaterThanThreshold

Results: Measurable Impact

The optimization initiative delivered impressive results:

  • 40% reduction in monthly cloud expenditure
  • 60% improvement in resource utilization
  • 90% automation of cost control processes
  • Zero impact on application performance

Before vs After

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| Monthly AWS Cost | $50,000 | $30,000 | 40% reduction |
| Resource Utilization | 35% | 56% | 60% improvement |
| Manual Processes | 15 hours/week | 1.5 hours/week | 90% automation |
| Cost Visibility | Limited | Comprehensive | Full transparency |

Lessons Learned

1. Start with Visibility

You can't optimize what you can't measure. Comprehensive monitoring is the foundation of any cost optimization strategy.

2. Automate Early and Often

Manual processes are error-prone and don't scale. Automation ensures consistency and reduces human error.

3. Communicate Value

Regular stakeholder updates with concrete metrics help maintain support for optimization initiatives.

4. Balance Cost and Performance

Cost optimization should never compromise application performance or user experience.

Tools and Technologies Used

  • AWS Cost Explorer - Cost analysis and forecasting
  • Cloudability - Third-party cost management
  • AWS Lambda - Automation scripts
  • CloudWatch - Monitoring and alerting
  • Terraform - Infrastructure as Code
  • Python/Boto3 - AWS API automation

Conclusion

This case study demonstrates that significant cost savings are achievable through systematic analysis, automation, and strategic resource management. The key is to start with comprehensive visibility, implement automation early, and maintain a balance between cost optimization and performance.

The 40% cost reduction not only saved money but also improved our team's efficiency and provided valuable insights into our infrastructure usage patterns. This foundation enabled us to make more informed decisions about future infrastructure investments.


Have you implemented similar cost optimization strategies? I'd love to hear about your experiences and challenges. Feel free to reach out on LinkedIn or via email at ajit.kanoli@gmail.com.
