Accelerating Cost Analysis with Amazon SageMaker Notebooks

In today's cloud-driven world, businesses are increasingly relying on cloud resources to power their applications and services. However, with great power comes great complexity, and managing the cost of cloud resources can be a daunting task. Enter Amazon SageMaker notebooks, a powerful tool for accelerating cost analysis and optimizing resource usage.

Understanding the Challenge

As organizations scale up their cloud presence, they often face the challenge of tracking and optimizing their cloud costs. This is especially true when dealing with Amazon Elastic Compute Cloud (Amazon EC2) instances, where the cost can quickly add up if not managed efficiently.

The traditional approach to cost analysis involves manually collecting instance data, retrieving pricing information, and performing calculations. This can be time-consuming and error-prone, leaving room for overspending or underutilization of resources.
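To see why the manual approach is tedious, consider the arithmetic involved: each instance's cost is its hourly On-Demand rate multiplied by its running hours, summed across the fleet. A minimal sketch of that calculation (the rates and hours below are illustrative examples, not actual AWS pricing):

```python
# Illustrative only: the hourly rates and running hours are made-up
# example values, not real AWS pricing.
fleet = [
    {"instance_type": "t3.medium", "hourly_rate_usd": 0.0416, "hours": 24},
    {"instance_type": "m5.large",  "hourly_rate_usd": 0.096,  "hours": 24},
    {"instance_type": "m5.large",  "hourly_rate_usd": 0.096,  "hours": 12},
]

def fleet_cost(fleet):
    """Sum hourly rate x running hours across every instance."""
    return sum(i["hourly_rate_usd"] * i["hours"] for i in fleet)

print(f"Estimated fleet cost: ${fleet_cost(fleet):.2f}")
```

Multiply that by dozens of instance types, changing prices, and instances that start and stop during the day, and the appeal of automating the collection and calculation becomes obvious.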

The Solution: Amazon SageMaker Notebooks

Amazon SageMaker is a fully managed service that simplifies the process of building, training, and deploying machine learning models at scale. While SageMaker is renowned for its machine learning capabilities, it also offers a powerful environment for general-purpose computing, including cost analysis.

One of the key best practices when using SageMaker for cost analysis is leveraging IAM roles instead of access keys. IAM (Identity and Access Management) roles provide a secure way to grant permissions to SageMaker notebook instances without exposing access keys or credentials in your code.

How IAM Roles Enhance Security

IAM roles define a set of permissions for making AWS service requests. These roles can be assumed by AWS services on your behalf, eliminating the need to manage access keys or secrets. When attached to a SageMaker notebook instance, IAM roles provide the following benefits:

1. Enhanced Security

  • No Access Key Exposure: With IAM roles, you don't need to store or manage access keys within your notebook code. This eliminates the risk of exposing sensitive credentials.

  • Fine-Grained Permissions: You can define precise permissions for the IAM role, ensuring that your notebook instance only has access to the resources it needs.

2. Simplified Credential Management

  • Automatic Credential Handling: SageMaker automatically manages the credentials for the attached IAM role. Your code can seamlessly use AWS SDKs without manually configuring credentials.

3. Auditing and Compliance

  • Traceability: Actions performed by the notebook instance are logged with the role's identity, providing traceability and accountability.
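As an illustration of fine-grained permissions, a role policy for a cost-analysis notebook might grant only the read actions the workflow needs: listing EC2 instances and querying the Price List API. This is a hypothetical minimal policy; adjust the actions and resources to your own requirements:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCostAnalysisReads",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "pricing:GetProducts"
      ],
      "Resource": "*"
    }
  ]
}
```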

Implementing IAM Roles in Amazon SageMaker

To implement IAM roles for your SageMaker notebook instances, follow these steps:

  1. Create an IAM Role: Create an IAM role with the required permissions for your notebook instance. For example, if your notebook needs access to specific S3 buckets, attach an S3 policy to the role.

  2. Attach the IAM Role to the Notebook Instance: During the SageMaker notebook instance creation or configuration process, specify the IAM role that should be associated with it.

  3. Seamless Access to AWS Services: Your notebook code can use AWS SDKs or AWS CLI commands without the need for explicit access keys or credentials. SageMaker will automatically assume the IAM role's permissions.
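The steps above can be sketched with boto3. The names here are placeholders, and the ARN check is a hypothetical helper for illustration; `create_notebook_instance` is the SageMaker API call that accepts the role's ARN:

```python
import re

def looks_like_role_arn(arn):
    """Hypothetical sanity check on the IAM role ARN format."""
    return re.fullmatch(r"arn:aws:iam::\d{12}:role/[\w+=,.@/-]+", arn) is not None

def create_notebook_with_role(name, role_arn, instance_type="ml.t3.medium"):
    """Create a SageMaker notebook instance that uses the given IAM role."""
    if not looks_like_role_arn(role_arn):
        raise ValueError(f"Not an IAM role ARN: {role_arn}")
    import boto3  # imported here so the helper above stays dependency-free
    sagemaker = boto3.client("sagemaker")
    return sagemaker.create_notebook_instance(
        NotebookInstanceName=name,
        InstanceType=instance_type,
        RoleArn=role_arn,
    )

# Inside the running notebook, boto3 resolves the role's temporary
# credentials automatically; no access keys appear anywhere in the code.
```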

Practical Example: Cost Analysis with SageMaker

Let's walk through a practical example of using SageMaker notebooks for cost analysis. In this scenario, we want to calculate the cost of running instances and visualize the results.

Sample Code:

import json

import boto3
import matplotlib.pyplot as plt
import pandas as pd

# Initialize the EC2 client (for listing instances) and the Pricing client.
# Note: the AWS Price List API is served from select regions such as us-east-1.
ec2_client = boto3.client('ec2', region_name='us-east-1')
pricing_client = boto3.client('pricing', region_name='us-east-1')

# Get a list of running instances and their types
def get_running_instances():
    instances = []
    try:
        response = ec2_client.describe_instances(
            Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
        )
        for reservation in response['Reservations']:
            for instance in reservation['Instances']:
                instances.append(instance['InstanceType'])
    except Exception as e:
        print(f"Error listing running instances: {e}")
    return instances

# Fetch On-Demand hourly USD pricing for the given instance types
def fetch_pricing_data(instance_types):
    pricing_data = {}
    for instance_type in set(instance_types):
        filters = [
            {'Type': 'TERM_MATCH', 'Field': 'instanceType', 'Value': instance_type},
            {'Type': 'TERM_MATCH', 'Field': 'preInstalledSw', 'Value': 'NA'},
            {'Type': 'TERM_MATCH', 'Field': 'location', 'Value': 'US East (N. Virginia)'},
            {'Type': 'TERM_MATCH', 'Field': 'operatingSystem', 'Value': 'Linux'},
            {'Type': 'TERM_MATCH', 'Field': 'tenancy', 'Value': 'Shared'},
            {'Type': 'TERM_MATCH', 'Field': 'capacitystatus', 'Value': 'Used'},
            {'Type': 'TERM_MATCH', 'Field': 'productFamily', 'Value': 'Compute Instance'},
        ]

        try:
            response = pricing_client.get_products(
                ServiceCode='AmazonEC2',
                Filters=filters,
                MaxResults=1
            )
            if response['PriceList']:
                product = json.loads(response['PriceList'][0])
                for term in product['terms']['OnDemand'].values():
                    for dimension in term['priceDimensions'].values():
                        if 'USD' in dimension['pricePerUnit']:
                            pricing_data[instance_type] = float(dimension['pricePerUnit']['USD'])
        except Exception as e:
            print(f"Error fetching pricing data for {instance_type}: {e}")

    return pricing_data

# Collect the running instance types once, then price them
running_instance_types = get_running_instances()
pricing = fetch_pricing_data(running_instance_types)

# Build a DataFrame with one row per running instance
df = pd.DataFrame(running_instance_types, columns=['InstanceType'])

# Estimate each instance's cost, assuming 24 running hours for simplicity
def calculate_cost(row):
    running_hours = 24
    return running_hours * pricing.get(row['InstanceType'], 0)

df['Cost'] = df.apply(calculate_cost, axis=1)

# Total cost analysis
total_cost = df['Cost'].sum()
print(f"Total Cost: ${total_cost:.2f}")

# Visualization
df.groupby('InstanceType')['Cost'].sum().plot(kind='bar')
plt.title('Cost Analysis by Running Instance Type per Day')
plt.xlabel('Instance Type')
plt.ylabel('Total Cost ($)')
plt.show()


In this example, we list the running instances, retrieve their pricing data (authenticated through the notebook's IAM role), calculate the daily cost, and visualize the results, all without managing access keys.
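The trickiest part of the sample code is the nested structure of each Price List document: the USD rate sits under terms, then OnDemand, then priceDimensions, then pricePerUnit. The helper below walks that path, exercised against a synthetic document (the SKU keys and rate are made up for illustration):

```python
import json

def extract_on_demand_usd(price_list_entry):
    """Walk terms -> OnDemand -> priceDimensions -> pricePerUnit['USD']."""
    product = json.loads(price_list_entry)
    for term in product["terms"]["OnDemand"].values():
        for dimension in term["priceDimensions"].values():
            if "USD" in dimension["pricePerUnit"]:
                return float(dimension["pricePerUnit"]["USD"])
    return None

# Synthetic document mimicking the Price List shape (values are made up).
sample = json.dumps({
    "terms": {"OnDemand": {"SKU.EXAMPLE": {
        "priceDimensions": {"SKU.EXAMPLE.DIM": {
            "pricePerUnit": {"USD": "0.0416"}
        }}
    }}}
})

print(extract_on_demand_usd(sample))  # 0.0416
```

Keeping the traversal in a small function like this makes it easy to unit test without calling the live Pricing API.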

Conclusion

Amazon SageMaker notebooks are a versatile tool that extends beyond machine learning tasks. By leveraging IAM roles, you can enhance the security and efficiency of your cost analysis workflows. Say goodbye to manual credential management and hello to automated, secure, and auditable cost analysis in the cloud.

Start accelerating your cost analysis today with Amazon SageMaker notebooks and IAM roles. Your cloud cost optimization journey just got a whole lot simpler and more secure.

Happy cost analyzing!

James Phipps 11 November, 2023