March 29, 2017 Marie H.

Scheduling Lambda Functions with CloudWatch Events

Photo by Walls.io on Unsplash

We had a handful of cron jobs scattered across EC2 instances doing periodic maintenance tasks — old log cleanup, nightly snapshots, that kind of thing. Every time an instance got replaced or AMI-rotated, someone had to remember to set up the crontab again. Someone usually forgot. Lambda + CloudWatch Events fixes this entirely: the "cron server" is just AWS, and it never needs to be patched or rebooted.

rate() vs cron()

CloudWatch Events supports two schedule expression formats. rate() is simpler:

rate(5 minutes)
rate(1 hour)
rate(7 days)
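One detail the examples show without spelling out: the unit is singular when the value is 1 and plural otherwise, so rate(1 hours) is rejected. If you generate these strings from code, a small helper avoids that mistake. This is a minimal sketch; rate_expression is a hypothetical name, not part of any AWS SDK.

```python
VALID_UNITS = ("minute", "hour", "day")


def rate_expression(value, unit):
    """Build a rate() schedule expression, e.g. rate(5 minutes).

    Hypothetical helper for illustration; not part of boto3.
    """
    if unit not in VALID_UNITS:
        raise ValueError(f"unit must be one of {VALID_UNITS}")
    if not isinstance(value, int) or value < 1:
        raise ValueError("value must be a positive integer")
    # The unit is singular for a value of 1 and plural otherwise
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


print(rate_expression(5, "minute"))  # rate(5 minutes)
print(rate_expression(1, "hour"))    # rate(1 hour)
```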

cron() is the full-power option. The syntax differs from standard cron in two ways: it takes six fields (the extra one is Year), and all times are in UTC:

cron(Minutes Hours Day-of-month Month Day-of-week Year)

Examples:

cron(0 8 * * ? *)        # Every day at 8:00 AM UTC
cron(0 2 * * ? *)        # Every day at 2:00 AM UTC
cron(30 6 ? * MON-FRI *) # Weekdays at 6:30 AM UTC
cron(0 12 1 * ? *)       # First day of every month at 12:00 PM UTC

Note the ? — you can't specify both day-of-month and day-of-week, so one of them has to be ?. AWS will yell at you if you try.
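If you'd rather catch that mistake before AWS does, a quick local check is easy to write. The sketch below only covers the two things mentioned here (six fields, and a ? in one of the day fields); it is not a full validator, and check_cron_fields is a hypothetical helper name.

```python
def check_cron_fields(expression):
    """Return a list of problems found in a cron(...) schedule expression.

    Only checks the field count and the day-of-month/day-of-week rule.
    """
    if not (expression.startswith("cron(") and expression.endswith(")")):
        return ["expression must look like cron(...)"]

    fields = expression[len("cron("):-1].split()
    if len(fields) != 6:
        return [f"expected 6 fields, got {len(fields)}"]

    problems = []
    # Field order: Minutes Hours Day-of-month Month Day-of-week Year
    day_of_month, day_of_week = fields[2], fields[4]
    if day_of_month != "?" and day_of_week != "?":
        problems.append("one of day-of-month or day-of-week must be ?")
    return problems


print(check_cron_fields("cron(0 8 * * ? *)"))  # []
print(check_cron_fields("cron(0 8 * * * *)"))  # reports the missing ?
```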

The Lambda function

Let's build something practical: a daily cleanup that deletes S3 objects older than 30 days from a logs bucket. The Lambda receives a scheduled event that looks like this:

{
  "version": "0",
  "id": "53dc4d37-cffa-4f76-80c9-8b7d4a4d2eaa",
  "detail-type": "Scheduled Event",
  "source": "aws.events",
  "account": "123456789012",
  "time": "2017-03-29T06:00:00Z",
  "region": "us-east-1",
  "resources": [
    "arn:aws:events:us-east-1:123456789012:rule/daily-s3-cleanup"
  ],
  "detail": {}
}

The detail is empty for scheduled events. You get the timestamp in time if you need it. Here's the handler:

import boto3
import logging
from datetime import datetime, timezone, timedelta

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3 = boto3.client('s3')

BUCKET = 'my-app-logs'
PREFIX = 'app-logs/'
RETENTION_DAYS = 30

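def delete_batch(keys):
    """Delete up to 1000 keys in one call; return how many S3 confirmed."""
    resp = s3.delete_objects(Bucket=BUCKET, Delete={'Objects': keys})
    # The call can succeed overall while individual keys fail
    for err in resp.get('Errors', []):
        logger.error(f"Could not delete {err.get('Key')}: {err.get('Code')}")
    return len(resp.get('Deleted', []))

def handler(event, context):
    cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
    logger.info(f"Deleting objects older than {cutoff.isoformat()}")

    paginator = s3.get_paginator('list_objects_v2')
    pages = paginator.paginate(Bucket=BUCKET, Prefix=PREFIX)

    deleted = 0
    to_delete = []

    for page in pages:
        # 'Contents' is missing when a page has no matching objects
        for obj in page.get('Contents', []):
            # LastModified is timezone-aware, so it compares cleanly with cutoff
            if obj['LastModified'] < cutoff:
                to_delete.append({'Key': obj['Key']})

            # Delete in batches of 1000 (S3 API limit)
            if len(to_delete) >= 1000:
                deleted += delete_batch(to_delete)
                to_delete = []

    # Flush whatever is left after the last page
    if to_delete:
        deleted += delete_batch(to_delete)

    logger.info(f"Deleted {deleted} objects from s3://{BUCKET}/{PREFIX}")
    return {'deleted': deleted}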

The Lambda execution role needs s3:ListBucket on the target bucket and s3:DeleteObject on the objects inside it (the bucket ARN followed by /*).
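For reference, here is roughly what that policy document looks like, built as a Python dict so it can be dumped to JSON. Treat it as a sketch: cleanup_policy is a hypothetical helper, scoping the delete permission to the prefix is my own choice, and it assumes the role already has the usual CloudWatch Logs permissions from somewhere else.

```python
import json


def cleanup_policy(bucket, prefix=""):
    """Build an IAM policy document for the cleanup function's role.

    Hypothetical helper; the statements cover only the two S3 actions
    the handler uses.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # DeleteObject is an object-level action, so it targets object ARNs
                "Effect": "Allow",
                "Action": "s3:DeleteObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


print(json.dumps(cleanup_policy("my-app-logs", "app-logs/"), indent=2))
```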

Wiring it up in the console

  1. Create the Lambda function (Python 3.6, paste the code above, set environment variables for bucket/prefix if you want to make it configurable).
  2. Go to CloudWatch → Rules → Create Rule.
  3. Under Event Source, choose Schedule.
  4. Enter your expression: cron(0 2 * * ? *) for 2 AM UTC daily.
  5. Under Targets, click Add Target, select Lambda Function, pick your function.
  6. Name the rule something descriptive like daily-s3-log-cleanup, hit Create.
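Step 1 mentions environment variables for the bucket and prefix. One way to read them, falling back to the values hardcoded in the handler, might look like this. The variable names LOG_BUCKET, LOG_PREFIX and RETENTION_DAYS are made up for the example; use whatever you set on the function.

```python
import os


def load_config(environ=os.environ):
    """Read cleanup settings from environment variables.

    The variable names are hypothetical; the defaults match the
    constants used in the handler.
    """
    return {
        "bucket": environ.get("LOG_BUCKET", "my-app-logs"),
        "prefix": environ.get("LOG_PREFIX", "app-logs/"),
        # Environment variables are always strings, so convert explicitly
        "retention_days": int(environ.get("RETENTION_DAYS", "30")),
    }


config = load_config()
print(config)
```

Passing the environment in as an argument is only there to make the function easy to exercise with a plain dict in a test.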

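When you create the rule in the console, CloudWatch Events automatically adds the permission that lets the rule invoke your Lambda. If you create the rule through the CLI or an SDK instead, you have to add that permission yourself.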
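If you want the same wiring without clicking through the console, the three calls involved are put_rule, add_permission and put_targets. The sketch below takes the boto3 clients as arguments so it can be tried against stand-ins; schedule_function and the statement ID format are my own naming, and it assumes the function already exists.

```python
def schedule_function(events, lambda_client, rule_name, schedule, function_arn):
    """Create a scheduled rule and point it at a Lambda function.

    `events` and `lambda_client` are expected to be boto3 clients for
    "events" and "lambda". Hypothetical helper for illustration.
    """
    # 1. Create (or update) the rule with the schedule expression
    rule_arn = events.put_rule(
        Name=rule_name,
        ScheduleExpression=schedule,
        State="ENABLED",
    )["RuleArn"]

    # 2. Allow CloudWatch Events to invoke the function for this rule.
    #    The console does this step for you; the API does not.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # 3. Attach the function as the rule's target
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule_arn
```

In real use you would call it with boto3.client("events") and boto3.client("lambda"). Note that add_permission fails if a statement with the same ID already exists, so running this twice needs either a different ID or a remove_permission first.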

Testing manually

Don't wait until 2 AM to find out your function is broken. In the Lambda console, configure a test event. The body doesn't matter much for scheduled events — just use this:

{
  "source": "aws.events",
  "detail-type": "Scheduled Event",
  "detail": {}
}

Hit Test and check the execution result and CloudWatch Logs. Fix any IAM permission errors now rather than at 2 AM.

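You can also invoke it directly from the CLI. The function's return value lands in response.json; the status block that aws lambda invoke itself prints is omitted below. On AWS CLI v2, add --cli-binary-format raw-in-base64-out so the JSON payload is passed through as-is: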

$ aws lambda invoke \
  --function-name daily-s3-log-cleanup \
  --payload '{"source": "aws.events", "detail": {}}' \
  response.json && cat response.json
{"deleted": 142}
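The minimal test payloads above leave out the time field that real scheduled events carry. If your function ever uses that timestamp, it helps to handle both cases. A small sketch, assuming the timestamp format shown in the sample event earlier; scheduled_time is a hypothetical helper name.

```python
from datetime import datetime, timezone


def scheduled_time(event):
    """Return the event's `time` as an aware UTC datetime, or now if absent."""
    raw = event.get("time")
    if raw is None:
        # Hand-written test events often omit `time`
        return datetime.now(timezone.utc)
    # Scheduled events use the form 2017-03-29T06:00:00Z
    parsed = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


print(scheduled_time({"time": "2017-03-29T06:00:00Z"}))  # 2017-03-29 06:00:00+00:00
```

Computing the cutoff from this value instead of datetime.now() would make a manual re-run for a given timestamp behave the same as the original scheduled run.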

One more thing: timeouts

The default Lambda timeout is 3 seconds, which is not enough for anything iterating over a large S3 bucket. Set a timeout that fits the job: I use 5 minutes for cleanup tasks, which is also the maximum Lambda allows at the time of writing (they've said this will increase eventually).
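If a run might still bump into the limit, the context object passed to the handler can tell you how much time is left via get_remaining_time_in_millis(). A rough sketch of stopping cleanly before the deadline follows; process_until_deadline, the process callback and the 10 second safety margin are all assumptions for illustration.

```python
def process_until_deadline(items, process, context, safety_ms=10000):
    """Process items until the Lambda is close to timing out.

    Returns the number of items processed. `process` is a hypothetical
    callback; `context` is the Lambda context object.
    """
    done = 0
    for item in items:
        # Stop early rather than get killed mid-batch by the timeout
        if context.get_remaining_time_in_millis() < safety_ms:
            break
        process(item)
        done += 1
    return done
```

For the cleanup job, whatever is left over simply gets picked up by the next scheduled run, since the function always starts from the full listing.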

That's it. No servers, no crontab, no "wait which instance was that running on?" The rule shows up in CloudWatch, you can see invocation history in Lambda metrics, and the whole thing is essentially free unless you're running it thousands of times a day.