
Individual Lambda Blue-Green Deployment Pattern

A generic pattern for blue-green deployment of individual AWS Lambda functions using Lambda aliases and weighted routing. This approach allows independent deployment and rollback of single functions without affecting other services.

Architecture

Core Concepts

Lambda Aliases

Lambda aliases are pointers to specific Lambda versions that support:

  • Version pinning: Point to $LATEST or a specific numbered version (e.g., 42)
  • Weighted routing: Split traffic between two versions (blue/green)
  • Stable endpoint: API Gateway integrates with alias, not function directly
  • Independent rollback: Each function manages its own deployment

Weighted Routing

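// Pseudocode: the alias splits traffic between two versions by weight.
// In CDK this maps to the Alias `additionalVersions` property; in the Lambda API, to RoutingConfig.
alias.addVersion(blueVersion, 0.9); // 90% traffic to blue (current)
alias.addVersion(greenVersion, 0.1); // 10% traffic to green (new)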
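
To make the weight arithmetic concrete, here is a minimal sketch of how the traffic share of each version can be derived from an alias configuration. It assumes the Lambda convention that the alias's primary version receives whatever share is not assigned to additional versions. The helper name `effectiveWeights` is hypothetical and not part of any AWS SDK.

```typescript
// Hypothetical helper (not an AWS API): resolves the traffic share of every version behind an alias.
type Weights = Record<string, number>;

function effectiveWeights(primaryVersion: string, additional: Weights): Weights {
  const extra = Object.values(additional).reduce((sum, w) => sum + w, 0);
  if (extra < 0 || extra > 1) {
    throw new RangeError(`additional weights must sum to a value in [0, 1], got ${extra}`);
  }
  // The primary version receives the remainder of the traffic.
  return { ...additional, [primaryVersion]: 1 - extra };
}

// Blue (version 1) is primary, green (version 2) gets 10% of traffic.
const split = effectiveWeights('1', { '2': 0.1 });
console.log(split);
```

Keeping the new version as the additional (minority) entry at first means a rollback only needs to drop that entry.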

Lambda Destinations

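Lambda Destinations send the results of asynchronous invocations to EventBridge. Synchronous invocations, such as those from an API Gateway proxy integration, do not produce destination events and must be monitored through CloudWatch metrics instead. Destinations provide:

  • Success destination: Invocation completed successfully
  • Failure destination: Invocation threw an error (after any configured retries)
  • Enables real-time monitoring without CloudWatch Logs polling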

Error Rate Determination

Error rate is calculated by monitoring Lambda invocations over a time window:

Error Rate (%) = (Failed Invocations / Total Invocations) × 100

Data Sources:

  1. Lambda Destinations → EventBridge (recommended for asynchronous invocations):

    • Real-time events for success/failure
    • Event contains: functionArn, version, requestId, responsePayload
    • Aggregated by monitoring Lambda
    • Most accurate and immediate
  2. CloudWatch Metrics:

    • AWS/Lambda namespace
    • Metrics: Errors, Invocations
    • Filtered by the FunctionName, Resource (function:alias), and ExecutedVersion dimensions
    • 1-minute granularity (can have delay)

Monitoring Window:

  • Typical: 5-minute window (the sample monitoring code uses fixed, non-overlapping windows)
  • Accumulates invocations over this period
  • Calculates error rate when window closes
  • Example: In 5 minutes, 100 invocations, 8 errors = 8% error rate

Thresholds:

  • Error rate threshold: Typically 5% (configurable)
  • Minimum invocations: At least 5 invocations required (reduces false positives at low traffic)
  • Example: If threshold is 5% and you have 8% error rate → trigger rollback
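
The formula and thresholds above can be sketched as a small pure function. This is an illustrative sketch, not the monitoring Lambda itself: the names `WindowStats`, `errorRatePercent`, and `evaluateWindow` are hypothetical, and the defaults simply mirror the 5% threshold and 5-invocation minimum described in this section.

```typescript
// Hypothetical names; defaults mirror the thresholds described above.
interface WindowStats {
  totalCount: number;
  errorCount: number;
}

type Verdict = 'insufficient-data' | 'healthy' | 'rollback';

function errorRatePercent(stats: WindowStats): number {
  // Multiply before dividing to keep whole-number percentages exact.
  return stats.totalCount === 0 ? 0 : (stats.errorCount * 100) / stats.totalCount;
}

function evaluateWindow(
  stats: WindowStats,
  thresholdPercent = 5,
  minInvocations = 5
): Verdict {
  if (stats.totalCount < minInvocations) return 'insufficient-data';
  return errorRatePercent(stats) > thresholdPercent ? 'rollback' : 'healthy';
}

// The worked example from the text: 100 invocations, 8 errors.
console.log(errorRatePercent({ totalCount: 100, errorCount: 8 }));
console.log(evaluateWindow({ totalCount: 100, errorCount: 8 }));
```

Returning a distinct 'insufficient-data' verdict keeps "not enough traffic" separate from "healthy", which matters when deciding whether a canary has really been exercised.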

EventBridge Event Structure (the invocation record, delivered in the event's detail field):

{
"version": "1.0",
"timestamp": "2026-01-26T12:00:00Z",
"requestContext": {
"requestId": "abc-123",
"functionArn": "arn:aws:lambda:us-east-1:123456789012:function:MyFunction:42",
"condition": "Success",
"approximateInvokeCount": 1
},
"requestPayload": { ... },
"responseContext": {
"statusCode": 200,
"executedVersion": "42"
},
"responsePayload": { ... }
}
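
Because the `functionArn` in the record is version-qualified (it ends in `:42` in the example), a monitor usually needs to split it into the function name and the qualifier. The following is a minimal sketch of that parsing; `parseFunctionArn` and `QualifiedFunction` are hypothetical names, and the code assumes the standard `arn:aws:lambda:<region>:<account>:function:<name>[:<qualifier>]` layout.

```typescript
// Hypothetical helper for splitting a (possibly qualified) Lambda function ARN.
interface QualifiedFunction {
  functionName: string;
  qualifier: string | undefined; // version number or alias name, if present
}

function parseFunctionArn(arn: string): QualifiedFunction {
  // Layout: arn:aws:lambda:<region>:<account>:function:<name>[:<qualifier>]
  const [, , service, , , resourceType, functionName, qualifier] = arn.split(':');
  if (service !== 'lambda' || resourceType !== 'function' || !functionName) {
    throw new Error(`Not a Lambda function ARN: ${arn}`);
  }
  return { functionName, qualifier };
}

const parsed = parseFunctionArn('arn:aws:lambda:us-east-1:123456789012:function:MyFunction:42');
console.log(parsed.functionName, parsed.qualifier);
```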

Monitoring Lambda Logic:

// Aggregate invocations from EventBridge events
interface InvocationStats {
totalCount: number;
errorCount: number;
windowStart: number;
}

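// Note: module-level state lives in a single execution environment. Concurrent or recycled
// environments each see only part of the traffic, so production code should keep these
// counters in a shared store (for example DynamoDB).
const stats: Map<string, InvocationStats> = new Map();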
const WINDOW_DURATION_MS = 5 * 60 * 1000; // 5 minutes
const ERROR_THRESHOLD = 5.0; // 5%
const MIN_INVOCATIONS = 5;

async function handleLambdaEvent(event: any) {
const functionArn = event.detail.requestContext.functionArn;
const version = event.detail.responseContext.executedVersion;
const isSuccess = event.detail.requestContext.condition === 'Success';
const key = `${functionArn}:${version}`;

// Initialize or get stats for this function version
if (!stats.has(key)) {
stats.set(key, {
totalCount: 0,
errorCount: 0,
windowStart: Date.now(),
});
}

const stat = stats.get(key)!;
stat.totalCount++;
if (!isSuccess) {
stat.errorCount++;
}

// Check if window is complete
const windowAge = Date.now() - stat.windowStart;
if (windowAge >= WINDOW_DURATION_MS) {
await evaluateErrorRate(key, stat);
stats.delete(key); // Reset for next window
}
}

async function evaluateErrorRate(key: string, stat: InvocationStats) {
// Require minimum invocations to avoid false positives
if (stat.totalCount < MIN_INVOCATIONS) {
console.log(`Skipping - only ${stat.totalCount} invocations (need ${MIN_INVOCATIONS})`);
return;
}

const errorRate = (stat.errorCount / stat.totalCount) * 100;
console.log(`Error rate: ${errorRate.toFixed(2)}% (${stat.errorCount}/${stat.totalCount})`);

if (errorRate > ERROR_THRESHOLD) {
console.log(`⚠️ Error rate ${errorRate.toFixed(2)}% exceeds threshold ${ERROR_THRESHOLD}%`);
await triggerRollbackCheck(key, errorRate);
} else {
console.log(`✅ Error rate ${errorRate.toFixed(2)}% is below threshold`);
}
}

Alternative: CloudWatch Metrics Query

import { CloudWatchClient, GetMetricStatisticsCommand } from '@aws-sdk/client-cloudwatch';

const cloudwatch = new CloudWatchClient({});

// Sum one AWS/Lambda metric for a single version invoked through an alias
async function sumMetric(
  metricName: 'Invocations' | 'Errors',
  functionName: string,
  aliasName: string,
  version: string,
  startTime: Date,
  endTime: Date
): Promise<number> {
  const response = await cloudwatch.send(
    new GetMetricStatisticsCommand({
      Namespace: 'AWS/Lambda',
      MetricName: metricName,
      // Per-version metrics for alias invocations carry all three dimensions;
      // a query must match the full dimension set to return data.
      Dimensions: [
        { Name: 'FunctionName', Value: functionName },
        { Name: 'Resource', Value: `${functionName}:${aliasName}` },
        { Name: 'ExecutedVersion', Value: version },
      ],
      StartTime: startTime,
      EndTime: endTime,
      Period: 60,
      Statistics: ['Sum'],
    })
  );

  // The window spans several datapoints, so add them all up
  return (response.Datapoints ?? []).reduce((sum, point) => sum + (point.Sum ?? 0), 0);
}

async function getErrorRate(
  functionName: string,
  version: string,
  windowMinutes: number = 5,
  aliasName: string = 'production'
): Promise<{ errorRate: number; totalInvocations: number; errors: number }> {
  const endTime = new Date();
  const startTime = new Date(endTime.getTime() - windowMinutes * 60 * 1000);

  // Fetch both metrics for the same window in parallel
  const [totalInvocations, errors] = await Promise.all([
    sumMetric('Invocations', functionName, aliasName, version, startTime, endTime),
    sumMetric('Errors', functionName, aliasName, version, startTime, endTime),
  ]);

  const errorRate = totalInvocations > 0 ? (errors / totalInvocations) * 100 : 0;

  return { errorRate, totalInvocations, errors };
}

// Usage
const { errorRate, totalInvocations, errors } = await getErrorRate('MyFunction', '42', 5);
console.log(`Error rate: ${errorRate.toFixed(2)}% (${errors}/${totalInvocations})`);

if (totalInvocations >= 5 && errorRate > 5.0) {
  console.log('Triggering rollback...');
}

Why Minimum Invocations Matter:

Without minimum invocations, low traffic causes false positives:

  • 1 invocation, 1 error = 100% error rate → false rollback
  • 2 invocations, 1 error = 50% error rate → false rollback
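  • 10 invocations, 1 error = 10% error rate → rollback at a 5% threshold, on thin evidence
  • 100 invocations, 8 errors = 8% error rate → legitimate rollback

Recommended: A minimum of 5 invocations filters out the smallest samples, but it is a floor rather than a guarantee of statistical significance. With a 5% threshold, a single error among fewer than 20 invocations still exceeds the threshold, so raise the minimum for low-traffic functions.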
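
One way to pick a minimum is to ask how many invocations a window needs before a given number of errors no longer exceeds the threshold. The sketch below computes that from the error rate formula in this section; `minInvocationsToTolerate` is a hypothetical helper name, and the result is a sizing aid rather than a statistical test.

```typescript
// Hypothetical sizing helper: smallest window size n for which
// (errors / n) * 100 <= thresholdPercent, i.e. the errors no longer trigger a rollback.
function minInvocationsToTolerate(errors: number, thresholdPercent: number): number {
  if (errors < 0 || thresholdPercent <= 0) {
    throw new RangeError('errors must be >= 0 and thresholdPercent must be > 0');
  }
  return Math.ceil((errors * 100) / thresholdPercent);
}

// With a 5% threshold, one stray error stops triggering a rollback at 20 invocations.
console.log(minInvocationsToTolerate(1, 5));
// With a 10% threshold, the same single error is tolerated from 10 invocations.
console.log(minInvocationsToTolerate(1, 10));
```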

Lock-In Period (Canary Release Window)

After deploying a new Lambda version, automatic rollback is only enabled during the lock-in period (default: 60 minutes):

  • Within lock-in: High error rate triggers automatic rollback to last known good version
  • After lock-in: Version is considered stable, automatic rollback disabled
  • Reasoning: Errors after lock-in period are likely environmental, not code-related
  • Manual rollback: Always available via AWS CLI or console

This prevents unnecessary rollbacks for transient infrastructure issues while protecting against bad deployments.
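
The lock-in rule can be expressed as a small decision function. This is a sketch under the assumptions stated in this section (automatic rollback only inside the lock-in window, manual handling afterwards); the names `HealthCheck`, `Action`, and `decideAction` are hypothetical.

```typescript
// Hypothetical decision helper combining the error threshold with the lock-in window.
type Action = 'none' | 'auto-rollback' | 'manual-review';

interface HealthCheck {
  errorRatePercent: number;
  thresholdPercent: number;
  deployedAtMs: number; // epoch milliseconds of the deployment
  nowMs: number;        // epoch milliseconds of the evaluation
  lockInMinutes: number;
}

function decideAction(check: HealthCheck): Action {
  // A healthy version needs no action, inside or outside the lock-in period.
  if (check.errorRatePercent <= check.thresholdPercent) return 'none';

  const elapsedMs = check.nowMs - check.deployedAtMs;
  const withinLockIn = elapsedMs >= 0 && elapsedMs < check.lockInMinutes * 60 * 1000;

  // Inside the window the deployment is the likely cause; afterwards a human decides.
  return withinLockIn ? 'auto-rollback' : 'manual-review';
}

const deployedAt = Date.UTC(2026, 0, 26, 12, 0, 0);
console.log(
  decideAction({
    errorRatePercent: 8,
    thresholdPercent: 5,
    deployedAtMs: deployedAt,
    nowMs: deployedAt + 10 * 60 * 1000, // 10 minutes after deployment
    lockInMinutes: 60,
  })
);
```

Passing the clock in as `nowMs` keeps the function deterministic and easy to unit test.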

Pattern Implementation

1. Lambda Function with Blue-Green Support

import { Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { Runtime, Alias, Version } from 'aws-cdk-lib/aws-lambda';
import { EventBus } from 'aws-cdk-lib/aws-events';
import { EventBridgeDestination } from 'aws-cdk-lib/aws-lambda-destinations';

export interface BlueGreenLambdaProps {
readonly functionName: string;
readonly entry: string;
readonly handler?: string;
readonly aliasName: string;
readonly eventBus: EventBus;
readonly blueWeight?: number; // 0.0 to 1.0
readonly greenWeight?: number; // 0.0 to 1.0
}

export class BlueGreenLambda extends Construct {
public readonly lambdaFunction: NodejsFunction;
public readonly productionAlias: Alias;
public readonly currentVersion: Version;

constructor(scope: Construct, id: string, props: BlueGreenLambdaProps) {
super(scope, id);

// Create Lambda function
this.lambdaFunction = new NodejsFunction(this, 'Function', {
functionName: props.functionName,
runtime: Runtime.NODEJS_22_X,
entry: props.entry,
handler: props.handler || 'handler',
timeout: Duration.seconds(30),
memorySize: 256,
environment: {
NODE_ENV: 'production',
},
// Lambda Destinations for event-based monitoring
onSuccess: new EventBridgeDestination(props.eventBus),
onFailure: new EventBridgeDestination(props.eventBus),
});

// Create version for this deployment
this.currentVersion = this.lambdaFunction.currentVersion;

// Create or update alias with weighted routing
this.productionAlias = new Alias(this, 'ProductionAlias', {
aliasName: props.aliasName,
version: this.currentVersion,
// Optional: Add additional version with weight for blue-green
// additionalVersions: [
// { version: previousVersion, weight: props.blueWeight || 0.9 },
// ],
});

// Grant permissions for EventBridge
props.eventBus.grantPutEventsTo(this.lambdaFunction);
}

/**
* Update alias to route traffic between blue and green versions
*/
public updateWeights(blueVersion: Version, blueWeight: number, greenVersion: Version, greenWeight: number) {
// Note: In CDK, this is done via UpdateAlias API at runtime
// CDK constructs define infrastructure, not runtime operations
console.log(`Update alias weights: Blue=${blueWeight}, Green=${greenWeight}`);
}
}

2. API Gateway Integration

Always integrate with the alias, not the function directly:

import { Duration } from 'aws-cdk-lib';
import { LambdaIntegration } from 'aws-cdk-lib/aws-apigateway';

// ❌ WRONG: Direct function integration
const wrongIntegration = new LambdaIntegration(lambda.lambdaFunction);

// ✅ CORRECT: Alias integration
const correctIntegration = new LambdaIntegration(lambda.productionAlias, {
proxy: true,
timeout: Duration.seconds(29),
});

// Add to API Gateway
const resource = api.root.addResource('myendpoint');
resource.addMethod('GET', correctIntegration);

3. Deployment Workflow

Initial Deployment (Version 1)

# Deploy function
cdk deploy

# Result:
# - Lambda function created
# - Version v1 created and published
# - Alias "production" points to v1 (100% weight)
# - API Gateway integrates with alias

Deploy New Version (Blue-Green)

# Option 1: Instant switch (no weights)
cdk deploy

# Result:
# - Version v2 created
# - Alias "production" updated to point to v2 (100%)
# - API Gateway automatically routes to v2
# - v1 remains available for rollback

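# Option 2: Gradual rollout with weights (requires AWS CLI)
# Assumes version 2 is published while the alias still points to v1.
# A plain `cdk deploy` as in Option 1 moves the alias to v2 immediately.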
aws lambda update-alias \
--function-name MyFunction \
--name production \
--routing-config "AdditionalVersionWeights={\"2\"=0.1}"

# Result:
# - 90% traffic to v1 (blue)
# - 10% traffic to v2 (green)

4. Weighted Routing Management

# Get current alias configuration
aws lambda get-alias \
--function-name MyFunction \
--name production

# Start with 10% traffic to new version
aws lambda update-alias \
--function-name MyFunction \
--name production \
--function-version 2 \
--routing-config "AdditionalVersionWeights={\"1\"=0.9}"

# Increment to 50%
aws lambda update-alias \
--function-name MyFunction \
--name production \
--function-version 2 \
--routing-config "AdditionalVersionWeights={\"1\"=0.5}"

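# Promote to 100% (an empty map clears the weights; omitting --routing-config leaves them in place)
aws lambda update-alias \
--function-name MyFunction \
--name production \
--function-version 2 \
--routing-config "AdditionalVersionWeights={}"

# Rollback to v1 and clear any weights
aws lambda update-alias \
--function-name MyFunction \
--name production \
--function-version 1 \
--routing-config "AdditionalVersionWeights={}"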
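
A staged rollout like the one above can be planned as data before any API call is made. The sketch below builds one alias update per stage, with the green version as the primary and the blue version holding the remaining share. The field names follow the `FunctionVersion` and `RoutingConfig.AdditionalVersionWeights` parameters of the UpdateAlias API; `rolloutSteps` and `AliasUpdate` are hypothetical names, and nothing here calls AWS.

```typescript
// Hypothetical planner: produces the alias settings for each rollout stage.
interface AliasUpdate {
  FunctionVersion: string;
  RoutingConfig: { AdditionalVersionWeights: Record<string, number> };
}

function rolloutSteps(blue: string, green: string, greenPercents: number[]): AliasUpdate[] {
  return greenPercents.map((pct): AliasUpdate => {
    if (pct <= 0 || pct > 100) {
      throw new RangeError(`green percentage must be in (0, 100], got ${pct}`);
    }
    // Green is the primary version; blue keeps the remainder as an additional weight.
    const blueWeight = (100 - pct) / 100;
    // At 100% the map is empty, which clears weighted routing entirely.
    const weights: Record<string, number> = blueWeight > 0 ? { [blue]: blueWeight } : {};
    return { FunctionVersion: green, RoutingConfig: { AdditionalVersionWeights: weights } };
  });
}

// 10% → 50% → 100% of traffic on version 2, with version 1 as blue.
for (const step of rolloutSteps('1', '2', [10, 50, 100])) {
  console.log(JSON.stringify(step));
}
```

Each element could be passed to an `UpdateAliasCommand` together with the function and alias names, with a health check between stages.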

5. Automated Monitoring with EventBridge

import { Rule } from 'aws-cdk-lib/aws-events';
import { LambdaFunction } from 'aws-cdk-lib/aws-events-targets';

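// Monitor Lambda invocation results. Both outcomes are needed to compute an error rate.
const errorRule = new Rule(this, 'LambdaErrorRule', {
eventBus: eventBus,
eventPattern: {
source: ['lambda'],
detailType: [
'Lambda Function Invocation Result - Success',
'Lambda Function Invocation Result - Failure',
],
detail: {
requestContext: {
// The record carries a version-qualified ARN, so match on the prefix
functionArn: [{ prefix: lambda.lambdaFunction.functionArn }],
},
},
},
});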

// Trigger monitoring Lambda
errorRule.addTarget(new LambdaFunction(errorMonitorLambda));

Complete Example

Stack Implementation

import { Stack, StackProps, Duration } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { RestApi, LambdaIntegration } from 'aws-cdk-lib/aws-apigateway';
import { EventBus } from 'aws-cdk-lib/aws-events';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { Runtime, Alias } from 'aws-cdk-lib/aws-lambda';
import { EventBridgeDestination } from 'aws-cdk-lib/aws-lambda-destinations';

export class MyServiceStack extends Stack {
constructor(scope: Construct, id: string, props?: StackProps) {
super(scope, id, props);

// Create EventBus for monitoring
const eventBus = new EventBus(this, 'EventBus', {
eventBusName: 'my-service-events',
});

// Create Lambda function
const myFunction = new NodejsFunction(this, 'MyFunction', {
functionName: 'MyFunction',
runtime: Runtime.NODEJS_22_X,
entry: './src/lambda/myFunction/src/index.ts',
handler: 'handler',
timeout: Duration.seconds(30),
memorySize: 256,
onSuccess: new EventBridgeDestination(eventBus),
onFailure: new EventBridgeDestination(eventBus),
});

// Create production alias
const productionAlias = new Alias(this, 'ProductionAlias', {
aliasName: 'production',
version: myFunction.currentVersion,
});

// Grant EventBridge permissions
eventBus.grantPutEventsTo(myFunction);

// Create API Gateway
const api = new RestApi(this, 'Api', {
restApiName: 'My Service API',
deployOptions: {
stageName: 'prod',
},
});

// Integrate with alias (not function)
const integration = new LambdaIntegration(productionAlias, {
proxy: true,
});

// Add endpoint
api.root.addResource('myendpoint').addMethod('GET', integration);
}
}

Lambda Handler

import { APIGatewayProxyEvent, APIGatewayProxyResult, Context } from 'aws-lambda';

export async function handler(
event: APIGatewayProxyEvent,
context: Context
): Promise<APIGatewayProxyResult> {
console.log('Event:', JSON.stringify(event, null, 2));
console.log('Context:', JSON.stringify(context, null, 2));

try {
// Your business logic here
const result = {
message: 'Success',
version: context.functionVersion,
requestId: context.awsRequestId,
timestamp: new Date().toISOString(),
};

return {
statusCode: 200,
headers: {
'Content-Type': 'application/json',
'X-Function-Version': context.functionVersion,
},
body: JSON.stringify(result),
};
} catch (error) {
console.error('Error:', error);

// Rethrow so the invocation counts as an error (Errors metric; failure destination for async invocations)
throw error;
}
}

Rollback Strategies

1. Instant Rollback (Update Alias)

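# Rollback to the previous version and clear any weighted routing
aws lambda update-alias \
--function-name MyFunction \
--name production \
--function-version 1 \
--routing-config "AdditionalVersionWeights={}"

# Result: new invocations typically reach the previous version within seconds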

2. Gradual Rollback (Reverse Weights)

# Reduce new version to 0%
aws lambda update-alias \
--function-name MyFunction \
--name production \
--function-version 1 \
--routing-config "AdditionalVersionWeights={\"2\"=0}"

# Remove weight configuration (an empty map clears it)
aws lambda update-alias \
--function-name MyFunction \
--name production \
--function-version 1 \
--routing-config "AdditionalVersionWeights={}"

3. Automated Rollback with Lock-In Period

// Monitoring Lambda checks error rate and rolls back automatically (within lock-in period)
import { LambdaClient, UpdateAliasCommand } from '@aws-sdk/client-lambda';
import { SSMClient, GetParameterCommand, PutParameterCommand } from '@aws-sdk/client-ssm';

const lambda = new LambdaClient({});
const ssm = new SSMClient({});

const LOCK_IN_PERIOD_MS = 60 * 60 * 1000; // 60 minutes

async function checkLockInPeriod(
functionName: string
): Promise<{ withinLockIn: boolean; minutesSinceDeployment: number }> {
const timestampParam = await ssm.send(
new GetParameterCommand({
Name: `/lambda/${functionName}/deployment-timestamp`,
})
);

const deploymentTime = parseInt(timestampParam.Parameter?.Value || '0', 10);
const currentTime = Date.now();
const timeSinceDeployment = currentTime - deploymentTime;
const minutesSinceDeployment = Math.floor(timeSinceDeployment / 1000 / 60);

return {
withinLockIn: timeSinceDeployment < LOCK_IN_PERIOD_MS,
minutesSinceDeployment,
};
}

async function rollbackIfNeeded(
functionName: string,
aliasName: string,
currentVersion: string,
errorRate: number,
threshold: number,
lockInPeriodMinutes: number = 60
) {
if (errorRate > threshold) {
console.log(`Error rate ${errorRate}% exceeds threshold ${threshold}%`);

// Check lock-in period
const { withinLockIn, minutesSinceDeployment } = await checkLockInPeriod(functionName);

if (!withinLockIn) {
console.log(
`⚠️ Lock-in period expired (${minutesSinceDeployment} minutes since deployment). ` +
`Automatic rollback disabled. Manual intervention required.`
);
return;
}

console.log(
`Within lock-in period (${minutesSinceDeployment}/${lockInPeriodMinutes} minutes). ` +
`Initiating automatic rollback...`
);

// Get last known good version
const lastKnownGoodParam = await ssm.send(
new GetParameterCommand({
Name: `/lambda/${functionName}/last-known-good-version`,
})
);

const lastKnownGoodVersion = lastKnownGoodParam.Parameter?.Value || '1';

// Rollback alias to last known good version and clear any weighted routing
await lambda.send(
new UpdateAliasCommand({
FunctionName: functionName,
Name: aliasName,
FunctionVersion: lastKnownGoodVersion,
RoutingConfig: { AdditionalVersionWeights: {} },
})
);

console.log(`✅ Rolled back to last known good version ${lastKnownGoodVersion}`);

// Update current version in SSM
await ssm.send(
new PutParameterCommand({
Name: `/lambda/${functionName}/current-version`,
Value: lastKnownGoodVersion,
Overwrite: true,
})
);
} else {
console.log(`✅ Error rate ${errorRate}% is below threshold ${threshold}%. Version is healthy.`);
}
}

Benefits

Individual Lambda Pattern

  • Independent deployment: Each function deploys separately
  • Granular rollback: Rollback individual functions without affecting others
  • Simple implementation: No complex orchestration required
  • Fast rollback: A single alias update, typically effective within seconds
  • Weighted routing: Gradual traffic shifting built into Lambda
  • Version history: Published versions are retained for rollback until deleted
  • Lock-in period: Automatic rollback only during initial deployment window
  • Canary-style safety: Gradual traffic increase with health monitoring

Limitations

  • Manual coordination: Must manage multiple functions separately
  • Inconsistent state: Functions can be at different versions
  • No atomic rollback: Can't rollback entire application as one unit
  • Complex monitoring: Need to track each function individually

When to Use This Pattern

Use Individual Lambda Blue-Green when:

  • Each Lambda is a separate microservice
  • Functions can deploy independently
  • You need fine-grained control per function
  • Services have different release schedules
  • Rollback requirements are per-function

Use Release-Level Blue-Green when:

  • Multiple Lambdas form a single application
  • Functions must deploy together atomically
  • Need consistent version across all services
  • One failure should rollback entire release
  • See README.md for release-level pattern

Version Management

CDK Version Management

// CDK automatically creates new versions on code changes
const myFunction = new NodejsFunction(this, 'Function', {
// ... config
});

// currentVersion publishes a new version whenever the function's code or configuration changes
const version = myFunction.currentVersion;

// Alias always uses currentVersion
const alias = new Alias(this, 'Alias', {
aliasName: 'production',
version: myFunction.currentVersion, // Points to latest
});

SSM Parameter Tracking

import { StringParameter } from 'aws-cdk-lib/aws-ssm';

// Store current version
new StringParameter(this, 'CurrentVersion', {
parameterName: `/lambda/${props.functionName}/current-version`,
stringValue: myFunction.currentVersion.version,
});

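// Store previous version for rollback
// (previousVersion is not defined in this snippet; it is assumed to be supplied by the deployment pipeline)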
new StringParameter(this, 'PreviousVersion', {
parameterName: `/lambda/${props.functionName}/previous-version`,
stringValue: previousVersion || '1',
});

// Store last known good version
new StringParameter(this, 'LastKnownGoodVersion', {
parameterName: `/lambda/${props.functionName}/last-known-good-version`,
stringValue: previousVersion || '1',
});

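// Store deployment timestamp for lock-in period
// Note: Date.now() is evaluated at synth time, so every synth/deploy rewrites this value,
// even when the function itself did not change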
new StringParameter(this, 'DeploymentTimestamp', {
parameterName: `/lambda/${props.functionName}/deployment-timestamp`,
stringValue: Date.now().toString(),
description: 'Timestamp when current version was deployed',
});

Version Cleanup

# List all versions
aws lambda list-versions-by-function \
--function-name MyFunction

# Delete one old version (here version 5); repeat per version to keep only the last 10
aws lambda delete-function \
--function-name MyFunction \
--qualifier 5
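
Since `delete-function --qualifier` removes one version at a time, a cleanup script first has to decide which versions are safe to delete. The following is a minimal sketch of that selection: it keeps the newest `keep` versions, skips `$LATEST`, and never selects a version that an alias still references. `versionsToDelete` is a hypothetical helper; the inputs would come from `list-versions-by-function` and `list-aliases` output.

```typescript
// Hypothetical selection helper for version cleanup.
function versionsToDelete(published: string[], aliased: string[], keep = 10): string[] {
  const protectedVersions = new Set(aliased);
  const newestFirst = published
    .filter((v) => v !== '$LATEST') // $LATEST is not a deletable published version
    .map(Number)
    .filter(Number.isInteger)
    .sort((a, b) => b - a);

  return newestFirst
    .slice(keep) // everything older than the newest `keep` versions
    .map(String)
    .filter((v) => !protectedVersions.has(v)); // never delete a version an alias points to
}

const published = ['$LATEST', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12'];
// Version 1 is still referenced by an alias, so only version 2 is selected.
console.log(versionsToDelete(published, ['1'], 10));
```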

Testing Blue-Green Deployment

1. Deploy Initial Version

cdk deploy

# Test endpoint
curl https://YOUR-API-ID.execute-api.YOUR-REGION.amazonaws.com/prod/myendpoint

# Response includes version
{
"message": "Success",
"version": "1",
"requestId": "abc123..."
}

2. Make Code Changes

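// Update your Lambda handler
import { APIGatewayProxyEvent, APIGatewayProxyResult, Context } from 'aws-lambda';

export async function handler(
  event: APIGatewayProxyEvent,
  context: Context
): Promise<APIGatewayProxyResult> {
  return {
    statusCode: 200,
    body: JSON.stringify({
      message: 'Success from v2!', // Changed
      version: context.functionVersion,
    }),
  };
}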

3. Deploy with Weighted Routing

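# Deploy new version (with the alias defined in CDK, this moves 100% of traffic to v2)
cdk deploy

# Shift back to a 10% canary: v2 stays primary, 90% returns to v1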
aws lambda update-alias \
--function-name MyFunction \
--name production \
--function-version 2 \
--routing-config "AdditionalVersionWeights={\"1\"=0.9}"

# Test multiple times - should see mix of v1 and v2
for i in {1..20}; do
curl https://YOUR-API-ID.execute-api.YOUR-REGION.amazonaws.com/prod/myendpoint
done

4. Monitor Error Rate

# Check CloudWatch metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/Lambda \
--metric-name Errors \
--dimensions Name=FunctionName,Value=MyFunction Name=ExecutedVersion,Value=2 \
--start-time 2026-01-25T10:00:00Z \
--end-time 2026-01-25T11:00:00Z \
--period 300 \
--statistics Sum

# Check EventBridge for failure events
# (assumes a rule on the bus delivers events to this CloudWatch Logs group)
aws logs tail /aws/events/my-service-events --follow

5. Promote or Rollback

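# If healthy: Promote to 100% and clear the weights
aws lambda update-alias \
--function-name MyFunction \
--name production \
--function-version 2 \
--routing-config "AdditionalVersionWeights={}"

# If unhealthy: Rollback to v1 and clear the weights
aws lambda update-alias \
--function-name MyFunction \
--name production \
--function-version 1 \
--routing-config "AdditionalVersionWeights={}"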

Cost Analysis

Individual Lambda Blue-Green: ~$0/month (within free tier)

  • Lambda versioning: No cost (versions are metadata)
  • Lambda aliases: No cost (aliases are pointers)
  • Lambda invocations: Same cost as before (pay per invoke)
  • EventBridge events: about $1 per million custom events (check current pricing for your region)
  • CloudWatch Logs: $0.50/GB ingested, $0.03/GB stored

Total: ~$0-1/month depending on traffic

Best Practices

  1. Always use aliases for API Gateway integration

    • Never integrate directly with $LATEST or specific versions
    • Aliases provide stable ARN for integrations
  2. Use Lambda Destinations for monitoring

    • Real-time event-based monitoring
    • More efficient than polling CloudWatch Logs
    • Integrates with EventBridge rules
  3. Store version history in SSM

    • Track current and previous versions
    • Enables automated rollback
    • Audit trail of deployments
  4. Gradual rollout for high-risk changes

    • Start at 10% for major changes
    • Monitor for 15-30 minutes per increment
    • Rollback immediately on errors
  5. Automate monitoring and rollback

    • Don't rely on manual monitoring
    • Use EventBridge + monitoring Lambda
    • Define clear error rate thresholds
  6. Keep recent versions

    • Retain last 5-10 versions for rollback
    • Delete old versions to reduce clutter
    • Use version description for release notes
  7. Configure appropriate lock-in period

    • Default 60 minutes for most applications
    • Shorter (15-30 min) for frequently deployed services
    • Longer (2-4 hours) for critical, infrequently deployed services
    • Based on historical error patterns
  8. Test rollback procedures

    • Regularly practice rollback
    • Verify rollback scripts work
    • Document rollback runbook
    • Test both automatic and manual rollback

Comparison: Individual vs Release-Level

| Feature         | Individual Lambda               | Release-Level               |
| --------------- | ------------------------------- | --------------------------- |
| Deployment unit | Single function                 | All functions together      |
| Rollback scope  | Per function                    | Entire release              |
| Coordination    | Manual                          | Automatic                   |
| Consistency     | Functions at different versions | All at same version         |
| Complexity      | Simple                          | Complex orchestration       |
| Use case        | Microservices                   | Monolithic app              |
| Traffic routing | Lambda weights                  | API Gateway stage variables |
| Best for        | Independent services            | Coupled services            |

References