Fargate has been GA since January and I've been running real workloads on it for a few months now. The pitch — serverless containers, no EC2 to manage — lands well in theory. In practice there are a handful of gotchas that aren't obvious until you're debugging at midnight with no shell access. Here's what I've learned.
What Fargate actually is
Fargate is a launch type for ECS. With the traditional EC2 launch type, ECS schedules containers onto EC2 instances you provision, patch, and pay for whether they're idle or not. With Fargate, the underlying hosts are AWS's problem. You define a task, it runs. You never see a server.
The mental model: instead of "clusters of EC2 instances that run containers," think "tasks as the atomic unit." A Fargate task has a CPU and memory allocation; AWS finds something to run it on. You don't care what that something is, and you can't touch it.
It's not Lambda. Lambda is function-level execution with per-invocation billing and a 15-minute timeout. Fargate is container-level execution — your full Docker image, long-running processes, no timeout. The comparison that matters for most decisions is Fargate vs. ECS on EC2, not Fargate vs. Lambda.
What you gain and what you give up
Compared to the EC2 launch type, Fargate removes:
- AMI management and patching
- Node capacity planning ("do I have enough instances to schedule this task?")
- Paying for idle EC2 headroom
- Shared-kernel risk between workloads on the same host
What it takes away:
- SSH access to any underlying host
- docker exec into running containers
- Bridge and host network modes — Fargate requires awsvpc
- Some flexibility on CPU/memory combos (valid pairings are fixed, not arbitrary)
The first two bullet points in the "takes away" list are the ones that bite people. More on that below.
Task definition
A Fargate task definition looks almost like an EC2 one with three hard requirements: requiresCompatibilities must include FARGATE, networkMode must be awsvpc, and CPU and memory must be declared at the task level.
{
  "family": "my-api",
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/my-api-task-role",
  "containerDefinitions": [
    {
      "name": "my-api",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-api:1.0.0",
      "portMappings": [
        { "containerPort": 8080, "protocol": "tcp" }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/my-api",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "environment": [
        { "name": "APP_ENV", "value": "production" }
      ]
    }
  ]
}
Two IAM roles. The execution role (ecsTaskExecutionRole) is used by ECS itself to pull the image from ECR and write logs to CloudWatch. The task role (my-api-task-role) is what your application code uses to call AWS APIs — S3, DynamoDB, SSM Parameter Store, whatever. Keep them separate. The execution role is nearly identical across all your services; the task role is specific to what each service needs to do.
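As a sketch of that split, a task role policy for a hypothetical my-api that reads one S3 bucket and a few SSM parameters might look like this (the bucket name and parameter path are placeholders, not from a real service):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::my-api-assets/*"
    },
    {
      "Effect": "Allow",
      "Action": ["ssm:GetParameter", "ssm:GetParameters"],
      "Resource": "arn:aws:ssm:us-east-1:123456789012:parameter/my-api/*"
    }
  ]
}
```

The execution role, by contrast, can usually just attach the AWS-managed AmazonECSTaskExecutionRolePolicy and be identical everywhere.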
Register the task definition:
$ aws ecs register-task-definition \
    --cli-input-json file://task-definition.json
VPC and networking config
Every Fargate task needs subnets and security groups. There's no bridge mode, no host mode — each task gets its own elastic network interface in your VPC.
$ aws ecs run-task \
    --cluster my-cluster \
    --task-definition my-api:1 \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={
        subnets=[subnet-abc123,subnet-def456],
        securityGroups=[sg-xyz789],
        assignPublicIp=ENABLED
    }"
assignPublicIp=ENABLED is needed if your task is in a public subnet and needs internet access (e.g., to reach ECR for the image pull). If you're using private subnets with a NAT gateway, set it to DISABLED. If you're in private subnets without NAT and you want to pull from ECR without internet traffic, you need VPC endpoints for ECR and S3 — ECR uses S3 under the hood for image layers.
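If you go the private-subnet, no-NAT route, the endpoint setup looks roughly like this. The VPC, route table, subnet, and security group IDs are placeholders, and the service names assume us-east-1:

```shell
# Interface endpoints for the ECR API and the Docker registry
$ aws ec2 create-vpc-endpoint --vpc-id vpc-0abc123 \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.us-east-1.ecr.api \
    --subnet-ids subnet-abc123 subnet-def456 \
    --security-group-ids sg-xyz789
$ aws ec2 create-vpc-endpoint --vpc-id vpc-0abc123 \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.us-east-1.ecr.dkr \
    --subnet-ids subnet-abc123 subnet-def456 \
    --security-group-ids sg-xyz789
# Gateway endpoint for S3, where ECR keeps the image layers
$ aws ec2 create-vpc-endpoint --vpc-id vpc-0abc123 \
    --vpc-endpoint-type Gateway \
    --service-name com.amazonaws.us-east-1.s3 \
    --route-table-ids rtb-0abc123
```

If you're shipping logs via awslogs from those same private subnets, you'll also want an interface endpoint for CloudWatch Logs (com.amazonaws.us-east-1.logs).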
CloudWatch Logs setup
Set up awslogs logging in your task definition — it's not optional if you want to see what your container is doing. Create the log group first or the task will fail to start:
$ aws logs create-log-group \
    --log-group-name /ecs/my-api \
    --region us-east-1
The awslogs-stream-prefix value gets prepended to the container name in the stream name. With ecs as the prefix, your streams end up named ecs/my-api/<task-id>, which is easy to find in the console or via CLI.
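If you'd rather stay in the terminal, the same streams are readable with get-log-events (replace <task-id> with the actual task ID):

```shell
$ aws logs get-log-events \
    --log-group-name /ecs/my-api \
    --log-stream-name "ecs/my-api/<task-id>" \
    --limit 50
```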
The gotchas
No shell access, no exec. This is the biggest adjustment. With EC2 you can SSH into the host or docker exec into a running container. With Fargate you have CloudWatch Logs and nothing else. If your app crashes on startup and doesn't log the error before dying, you might get nothing. Log early and log everything. I now add a startup log line at the very beginning of my main function, before anything else runs, so I can at least confirm the process started.
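A minimal version of that startup line, as a container entrypoint sketch; the hand-off at the end is to whatever command your task definition specifies:

```shell
#!/bin/sh
# Log before anything else runs, so a crash-on-boot still leaves
# one line in CloudWatch proving the container actually started.
echo "startup: entrypoint reached, APP_ENV=${APP_ENV:-unset}"
exec "$@"    # hand off to the real command from the task definition
```

Wire it in with `ENTRYPOINT ["/entrypoint.sh"]` in your Dockerfile and keep the app's own command as CMD.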
CPU and memory pairings are fixed. You can't request 600 CPU units — valid values are 256, 512, 1024, 2048, and 4096. Each CPU value has a range of valid memory values. If your app needs something in between, you're rounding up. Not usually a problem, but it affects cost calculations.
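The pairings as I understand them today, as a small lookup function; check the current docs before relying on these, since AWS can extend the ranges:

```shell
# Valid Fargate memory (MiB) for each CPU value, as of this writing.
# Ranges step in 1024 MiB increments.
valid_memory() {
  case "$1" in
    256)  echo "512 1024 2048" ;;
    512)  echo "1024-4096" ;;
    1024) echo "2048-8192" ;;
    2048) echo "4096-16384" ;;
    4096) echo "8192-30720" ;;
    *)    echo "invalid" ;;
  esac
}

valid_memory 512    # 1024-4096
valid_memory 600    # invalid
```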
VPC DNS and endpoints. Your Fargate tasks run in your VPC like any other compute. If you're in a private subnet and your task calls AWS services, you need either a NAT gateway or VPC endpoints for those services. It's easy to forget because on the EC2 launch type you typically set up the cluster's networking once and never think about it again; with Fargate, every task brings its own ENI and gets exactly whatever the subnet allows.
Startup time for one-off tasks. For ECS services (long-running tasks with desired count > 0), startup time rarely matters. For tasks you trigger on demand and need to respond quickly, you pay the image pull cost every time: there's no warm host with a cached image to land on. Keep images as small as you can.
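On the keep-images-small point, a multi-stage build is the usual lever. A sketch for a Go service; the image names and tags are illustrative:

```dockerfile
# Build stage: full toolchain
FROM golang:1.10 AS build
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /my-api .

# Runtime stage: just the static binary on a minimal base
FROM alpine:3.7
COPY --from=build /my-api /my-api
ENTRYPOINT ["/my-api"]
```

The pull then transfers a few megabytes instead of the whole build environment.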
When Fargate makes sense
Good fit:
- Stateless services where you want simplicity over compute efficiency
- Dev and staging environments you run intermittently
- Teams that don't want to think about node management or patching
- Workloads that need stronger isolation between tasks
Maybe not:
- High-throughput production workloads where EC2 Reserved Instances are significantly cheaper
- Anything requiring GPU
- Long-running stateful workloads with local storage needs
- Jobs where cold start latency (image pull + container init) is a problem
Pricing reality check
Fargate bills per vCPU-second and per GB-second of memory. A task running 0.25 vCPU and 512MB 24/7 costs roughly $14/month at the launch us-east-1 rates ($0.0506 per vCPU-hour plus $0.0127 per GB-hour). A t2.nano running the equivalent workload continuously is around $4.50/month on-demand, less on Reserved.
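The back-of-envelope behind that number, using the launch us-east-1 rates (rates change, so substitute current ones):

```shell
# Monthly Fargate cost for 0.25 vCPU / 0.5 GB running 24/7.
# Rates are us-east-1 at launch; check current pricing before relying on them.
awk -v vcpu=0.25 -v gb=0.5 \
    -v vcpu_rate=0.0506 -v gb_rate=0.0127 -v hours=730 \
    'BEGIN { printf "%.2f\n", (vcpu*vcpu_rate + gb*gb_rate)*hours }'
# prints 13.87
```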
For steady, always-on workloads Fargate costs more. The math shifts for intermittent workloads where you'd otherwise pay for idle EC2, and it shifts again when you account for engineer time not spent managing the node fleet. That's a real number even if it's hard to put in a spreadsheet.
My take: Fargate is worth the premium for simplicity-sensitive workloads, side projects, and anything running intermittently. For high-volume production services where compute efficiency matters, the EC2 launch type with right-sized Reserved Instances will be cheaper — but you're taking on the operational overhead of managing the cluster nodes.
