Terraform on AWS: Infrastructure as Code Guide

Terraform is the infrastructure layer that makes solo engineering of multiple products tractable. Without it, I'd be clicking through the AWS console, forgetting what I configured, and unable to reproduce environments. Here's the Terraform setup I use across six products.

Why Terraform over AWS CDK / Pulumi / CloudFormation

| Tool | Language | State | My take |
|------|----------|-------|---------|
| Terraform | HCL | Remote (S3) | Declarative, provider-agnostic, largest ecosystem |
| AWS CDK | TypeScript/Python | CloudFormation | Good for AWS-only, more code than config |
| Pulumi | TypeScript/Python | Remote | Full programming language, more power than needed |
| CloudFormation | YAML/JSON | AWS | Verbose, AWS-only, no good local development |

I use Terraform because HCL is readable, the ecosystem is massive, and it works across AWS + Cloudflare + GitHub from one tool.

Remote state with S3 + DynamoDB

# backend.tf
terraform {
  backend "s3" {
    bucket         = "shahriar-terraform-state"
    key            = "letx/terraform.tfstate"
    region         = "ap-south-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

# One-time setup: create state bucket and lock table
aws s3 mb s3://shahriar-terraform-state --region ap-south-1
aws s3api put-bucket-versioning \
  --bucket shahriar-terraform-state \
  --versioning-configuration Status=Enabled

aws dynamodb create-table \
  --table-name terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region ap-south-1

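The S3 backend stores state remotely, so it is shareable and not tied to one machine's local disk. Versioning on the bucket keeps earlier copies of the state file, which makes it possible to recover from a bad write. The DynamoDB lock table prevents two applies from running against the same state at once. Encryption matters because state may contain sensitive values.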

Module structure

infrastructure/
├── modules/
│   ├── ecs-service/        # Reusable ECS Fargate service
│   ├── rds-postgres/       # PostgreSQL instance
│   ├── s3-bucket/          # S3 + CloudFront
│   └── vpc/                # VPC, subnets, security groups
├── environments/
│   ├── prod/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── terraform.tfvars
│   └── staging/
│       ├── main.tf
│       └── terraform.tfvars
└── shared/
    ├── ecr.tf              # ECR repositories
    └── iam.tf              # IAM roles

Modules encapsulate reusable infrastructure. Each environment (prod, staging) uses the same modules with different variable values.
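
As a sketch of what "same modules, different values" might look like, a staging environment could call the ECS service module (covered in the next section) with smaller settings. The service name, the sizes, and the `image` variable below are assumptions for illustration, not the actual staging configuration.

```hcl
# environments/staging/main.tf (hypothetical values)

# Assumed to be supplied by CI or terraform.tfvars.
variable "image" {
  type        = string
  description = "Full image reference for the staging build"
}

module "letx_api" {
  source = "../../modules/ecs-service"

  name          = "letx-api-staging"
  image         = var.image
  cpu           = 256 # smaller than prod
  memory        = 512
  desired_count = 1   # a single task is assumed to be enough for staging
}
```

Only the argument values differ from prod; the module source is identical.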

The ECS service module

The most-used module — every service deploys via it:

# modules/ecs-service/main.tf
variable "name" { type = string }
variable "image" { type = string }
# HCL does not accept semicolons; a block with more than one argument
# must be written across multiple lines.
variable "cpu" {
  type    = number
  default = 256
}

variable "memory" {
  type    = number
  default = 512
}

variable "port" {
  type    = number
  default = 8080
}

variable "desired_count" {
  type    = number
  default = 2
}

variable "environment" {
  type    = map(string)
  default = {}
}

variable "health_check_path" {
  type    = string
  default = "/health"
}

resource "aws_ecs_task_definition" "this" {
  family                   = var.name
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = var.cpu
  memory                   = var.memory
  execution_role_arn       = aws_iam_role.execution.arn
  task_role_arn            = aws_iam_role.task.arn

  container_definitions = jsonencode([{
    name  = var.name
    image = var.image
    portMappings = [{ containerPort = var.port, protocol = "tcp" }]
    environment = [for k, v in var.environment : { name = k, value = v }]
    logConfiguration = {
      logDriver = "awslogs"
      options = {
        "awslogs-group"  = "/ecs/${var.name}"
        "awslogs-region" = data.aws_region.current.name
        "awslogs-stream-prefix" = "ecs"
      }
    }
  }])
}

resource "aws_ecs_service" "this" {
  name            = var.name
  cluster         = data.aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = data.aws_subnets.private.ids
    security_groups  = [aws_security_group.service.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.this.arn
    container_name   = var.name
    container_port   = var.port
  }

  deployment_minimum_healthy_percent = 100
  deployment_maximum_percent         = 200
}
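
The module above refers to several objects it does not show: data sources for the region, cluster and subnets, and the log group named in the awslogs options. The following is a minimal sketch of how those might be declared. The `cluster_name` and `vpc_id` variables, the `Tier = "private"` subnet tag, and the 30-day retention are all assumptions, not part of the original module. The IAM roles, security group, and target group would need similar definitions.

```hcl
# modules/ecs-service/data.tf (hypothetical supporting declarations)

variable "cluster_name" { type = string } # assumed input
variable "vpc_id" { type = string }       # assumed input

data "aws_region" "current" {}

data "aws_ecs_cluster" "main" {
  cluster_name = var.cluster_name
}

# Select private subnets in the VPC; the tag convention is an assumption.
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [var.vpc_id]
  }

  tags = {
    Tier = "private"
  }
}

# The awslogs driver expects the log group to exist before tasks start.
resource "aws_cloudwatch_log_group" "this" {
  name              = "/ecs/${var.name}"
  retention_in_days = 30
}
```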

Using this module:

# environments/prod/main.tf
module "letx_api" {
  source = "../../modules/ecs-service"
  
  name          = "letx-api"
  image         = "${module.ecr.letx_api_url}:${var.image_tag}"
  cpu           = 512
  memory        = 1024
  desired_count = 2
  
  environment = {
    DATABASE_URL = module.rds.connection_string
    REDIS_URL    = module.elasticache.connection_string
    JWT_SECRET   = var.jwt_secret # plaintext in the task definition; see "Secrets management" for the safer route
  }
}

Adding a new service to production takes roughly 15 lines of HCL.

Secrets management

Never put secrets in terraform.tfvars or variables.tf as plaintext. Use AWS Secrets Manager:

# Store secret
resource "aws_secretsmanager_secret" "jwt_secret" {
  name = "letx/prod/jwt_secret"
}

resource "aws_secretsmanager_secret_version" "jwt_secret" {
  secret_id     = aws_secretsmanager_secret.jwt_secret.id
  secret_string = var.jwt_secret  # from TF_VAR_jwt_secret env var
}

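# Pass the secret to the ECS task via `secrets` (not `environment`).
# Fragment: this argument belongs inside the aws_ecs_task_definition resource,
# alongside the name, image, and other container fields.
container_definitions = jsonencode([{
  secrets = [{
    name      = "JWT_SECRET"
    valueFrom = aws_secretsmanager_secret.jwt_secret.arn
  }]
}])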

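ECS Fargate injects secrets from Secrets Manager at task start, so they never appear in the task definition as plaintext. Two caveats apply. The task execution role needs permission to read the secret (secretsmanager:GetSecretValue). And the value passed through var.jwt_secret is still written to Terraform state, which is one reason the state bucket is encrypted.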
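
For the injection to work, the task execution role has to be allowed to read the secret. The sketch below shows one way that might be wired up, and marks the input variable as sensitive so Terraform redacts it in plan output. It reuses `aws_iam_role.execution` and `aws_secretsmanager_secret.jwt_secret` from the snippets above; the policy and data source names are made up for illustration.

```hcl
# Redact the value in plan/apply output (it is still stored in state).
variable "jwt_secret" {
  type      = string
  sensitive = true
}

# Allow reading only this one secret.
data "aws_iam_policy_document" "read_jwt_secret" {
  statement {
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [aws_secretsmanager_secret.jwt_secret.arn]
  }
}

# Attach the permission to the ECS task execution role.
resource "aws_iam_role_policy" "execution_read_secrets" {
  name   = "read-jwt-secret" # hypothetical name
  role   = aws_iam_role.execution.id
  policy = data.aws_iam_policy_document.read_jwt_secret.json
}
```

Scoping `resources` to the single secret ARN keeps the execution role from reading unrelated secrets.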

CI/CD: apply on merge

# .github/workflows/terraform.yml
on:
  push:
    branches: [main]
    paths: ["infrastructure/**"]

jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.9.0"

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ap-south-1

      - working-directory: infrastructure/environments/prod
        run: |
          terraform init
          terraform plan -out=tfplan
          terraform apply tfplan

The workflow always runs terraform plan before apply, so the plan is written to the CI logs. As written, though, apply follows immediately on merge, with no pause for review. For production infrastructure changes, I prefer to apply manually after reviewing the plan.

FAQ

Why use Terraform for a solo project? Because you will forget how you set up your infrastructure. Terraform is documentation that also enforces itself. terraform plan shows exactly what will change before anything touches production.

Should I use workspaces or separate directories for environments? Separate directories (prod/, staging/) over workspaces. Workspaces share one configuration and one backend, and each workspace keeps its own state file, but the active workspace is invisible in the code. That makes a terraform apply in the wrong workspace an easy and dangerous mistake. Separate directories make the separation explicit.

How do you handle Terraform state for multiple products? Separate state keys in S3 per product/service: letx/terraform.tfstate, quantumsketch/terraform.tfstate. Same S3 bucket, same DynamoDB lock table, separate state files. This way, applying changes to LetX doesn't lock QuantumSketch's state.
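
As a sketch, the backend block for the second product would presumably differ from the LetX one only in its `key`. The file location is an assumption; the bucket, table, and region are taken from the backend shown earlier.

```hcl
# backend.tf for QuantumSketch (sketch): same bucket and lock table, different key
terraform {
  backend "s3" {
    bucket         = "shahriar-terraform-state"
    key            = "quantumsketch/terraform.tfstate"
    region         = "ap-south-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
```

The lock entry in DynamoDB is keyed by bucket and state path, so the two products never contend for the same lock even though they share one table.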

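What's the difference between cpu in ECS task vs container definition? The task-level cpu and memory set the Fargate vCPU and RAM allocated (and billed) for the entire task, and on Fargate they are required. Container-level values are optional and only divide that allocation among the containers in the task: cpu is a relative share, memory is a hard per-container limit, and memoryReservation is a soft reservation. None of them can exceed the task-level values. For a single-container task, set cpu and memory at the task level and leave the container-level values out.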
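
To make the two levels concrete, here is a hypothetical two-container task. The resource name, container names, images, and numbers are all invented for illustration. The task-level values are what Fargate provisions; the container-level values only split that allocation.

```hcl
resource "aws_ecs_task_definition" "example" {
  family                   = "example"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]

  # Task level: what Fargate allocates for the whole task.
  cpu    = 512  # CPU units (0.5 vCPU)
  memory = 1024 # MiB

  container_definitions = jsonencode([
    {
      name      = "app"
      image     = "example/app:latest" # placeholder image
      essential = true
      cpu       = 384 # share of the task's 512 CPU units
      memory    = 768 # hard limit in MiB for this container
    },
    {
      name              = "sidecar"
      image             = "example/sidecar:latest" # placeholder image
      essential         = false
      cpu               = 128
      memoryReservation = 128 # soft reservation in MiB
    }
  ])
}
```

With a single container, the container-level `cpu` and `memory` can simply be omitted, and the container gets the whole task allocation.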

Written by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. See also: Microservices as One Engineer · Deploy Always-On AI Agents on AWS for ~$17/mo.