Deploy Always-On AI Agents on AWS for ~$17/mo

An always-on AI agent that monitors your systems, responds to webhooks, and runs scheduled tasks doesn't need to cost $100+/month. My hermes-agent-aws setup runs a persistent AI agent on AWS for ~$17/month using mostly free OpenRouter models, a t3.micro EC2 instance, and Terraform. Here's the exact setup.

What "always-on" means

A standard AI agent is reactive — you prompt it, it responds, session ends. An always-on agent:

Runs as a persistent process (no cold starts)
Listens to webhooks (GitHub, Slack, CloudWatch alarms)
Runs scheduled tasks (daily reports, monitoring checks, batch jobs)
Maintains state between invocations (memory, task queue)

Think of it as a background worker that happens to be LLM-powered.

The architecture

EC2 t3.micro ($8.50/mo)
└── hermes-agent (Go binary)
    ├── HTTP server (webhook receiver)
    ├── Scheduler (cron-like task runner)
    ├── Agent loop (LLM-powered decision making)
    └── Tool registry
        ├── bash_exec (run shell commands)
        ├── http_fetch (call external APIs)
        ├── db_query (PostgreSQL read)
        ├── slack_notify (send Slack messages)
        └── github_pr (create/review PRs)

OpenRouter API (free tier) → Llama 3.1 70B / Mistral 7B / Gemma 2 9B
S3 bucket (agent memory + task logs)
CloudWatch (monitoring + alarms as triggers)

Total monthly cost: EC2 t3.micro ($8.50) + S3 ($0.50) + data transfer ($1) + Route53 ($0.50) + CloudWatch ($0.50) + occasional paid LLM calls ($6) ≈ $17/month.
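To make the arithmetic easy to check or adjust, here is a small sketch that sums the line items quoted above. The type and function names are mine, and the figures are only the estimates from this article, not live AWS prices.

```go
package main

import "fmt"

// lineItem is one row of the monthly estimate quoted in the article.
type lineItem struct {
	name string
	usd  float64
}

// budget returns the article's estimated line items (not live AWS pricing).
func budget() []lineItem {
	return []lineItem{
		{"EC2 t3.micro", 8.50},
		{"S3", 0.50},
		{"data transfer", 1.00},
		{"Route53", 0.50},
		{"CloudWatch", 0.50},
		{"paid LLM calls", 6.00},
	}
}

// monthlyTotal adds up every line item.
func monthlyTotal(items []lineItem) float64 {
	total := 0.0
	for _, it := range items {
		total += it.usd
	}
	return total
}

func main() {
	items := budget()
	for _, it := range items {
		fmt.Printf("%-16s $%5.2f\n", it.name, it.usd)
	}
	fmt.Printf("%-16s $%5.2f\n", "total", monthlyTotal(items))
}
```

Swapping in your own numbers (for example, a different instance price in your region) shows quickly which line dominates the bill.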

Free LLM models via OpenRouter

OpenRouter offers some models at no cost, subject to rate limits. The agent routes tasks by complexity:

type ModelRouter struct {
    client *openrouter.Client
}

func (r *ModelRouter) Route(task Task) string {
    switch {
    case task.RequiresReasoning:
        return "meta-llama/llama-3.1-70b-instruct:free"
    case task.IsSimple:
        return "mistralai/mistral-7b-instruct:free"
    default:
        return "google/gemma-2-9b-it:free"
    }
}

For tasks that genuinely need Claude (code review, architecture analysis), I use the Claude API; those calls account for the ~$6/month of paid usage. Everything else runs free.
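The Route function above only returns free models, so the paid path has to live somewhere else. One way it could be folded into the same switch is sketched below. This is an assumption about structure, not the actual hermes-agent code: the NeedsPaidModel flag, the routeWithEscalation name, and the PAID_MODEL_ID placeholder are all hypothetical.

```go
package main

import "fmt"

// Task mirrors the two flags used by Route in the article.
// NeedsPaidModel is a hypothetical extra flag added for this sketch.
type Task struct {
	RequiresReasoning bool
	IsSimple          bool
	NeedsPaidModel    bool
}

// paidModelID is a placeholder; substitute whatever paid model you call.
const paidModelID = "PAID_MODEL_ID"

// routeWithEscalation picks a model ID and reports whether the call is billed.
// The paid case is checked first so it wins over the free-tier cases.
func routeWithEscalation(t Task) (model string, paid bool) {
	switch {
	case t.NeedsPaidModel:
		return paidModelID, true
	case t.RequiresReasoning:
		return "meta-llama/llama-3.1-70b-instruct:free", false
	case t.IsSimple:
		return "mistralai/mistral-7b-instruct:free", false
	default:
		return "google/gemma-2-9b-it:free", false
	}
}

func main() {
	tasks := []Task{
		{NeedsPaidModel: true},
		{RequiresReasoning: true},
		{IsSimple: true},
		{},
	}
	for _, t := range tasks {
		model, paid := routeWithEscalation(t)
		fmt.Printf("%+v -> %s (paid=%v)\n", t, model, paid)
	}
}
```

Returning the paid flag alongside the model ID makes it straightforward to count billed calls and keep the paid share of the budget visible.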

Terraform: deploy in one command

# main.tf
module "hermes_agent" {
  source = "github.com/shihabshahrier/hermes-agent-aws"

  instance_type    = "t3.micro"
  agent_version    = "v1.2.0"
  openrouter_key   = var.openrouter_key
  slack_webhook    = var.slack_webhook
  github_token     = var.github_token

  # What the agent monitors
  github_repos = ["shihabshahrier/letx", "shihabshahrier/quantumsketch"]
  cron_tasks = [
    { schedule = "0 9 * * *", task = "daily_standup_summary" },
    { schedule = "0 * * * *", task = "check_api_health" },
  ]
}

terraform init && terraform apply
# ~3 minutes to fully deployed
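The cron_tasks entries use standard five-field cron strings. As a rough illustration of how a cron-like scheduler might decide whether a task is due, here is a minimal matcher. It is a sketch under stated assumptions, not the scheduler shipped in hermes-agent: the matches and at helpers are hypothetical, and only "*" and single integers are handled (no ranges, lists, or steps).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// matches reports whether t satisfies a five-field cron expression
// (minute, hour, day-of-month, month, day-of-week with 0 = Sunday).
// Each field must be "*" or a single integer; all fields must match.
func matches(expr string, t time.Time) (bool, error) {
	fields := strings.Fields(expr)
	if len(fields) != 5 {
		return false, fmt.Errorf("want 5 fields, got %d", len(fields))
	}
	actual := []int{t.Minute(), t.Hour(), t.Day(), int(t.Month()), int(t.Weekday())}
	for i, f := range fields {
		if f == "*" {
			continue
		}
		n, err := strconv.Atoi(f)
		if err != nil {
			return false, fmt.Errorf("field %d (%q): %w", i, f, err)
		}
		if n != actual[i] {
			return false, nil
		}
	}
	return true, nil
}

// at builds a fixed UTC timestamp on an arbitrary date for the demo.
func at(hour, minute int) time.Time {
	return time.Date(2024, time.January, 15, hour, minute, 0, 0, time.UTC)
}

func main() {
	schedules := []string{"0 9 * * *", "0 * * * *"}
	for _, hour := range []int{9, 10} {
		for _, s := range schedules {
			due, err := matches(s, at(hour, 0))
			if err != nil {
				fmt.Println("bad schedule:", err)
				continue
			}
			fmt.Printf("%02d:00  %q  due=%v\n", hour, s, due)
		}
	}
}
```

A real scheduler would call something like this once per minute. Full cron semantics, such as the special handling when both day fields are restricted, are deliberately left out here.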

Agent loop implementation

The core agent loop is a simple decision-making cycle:

func (a *Agent) Loop() {
    for {
        select {
        case event := <-a.events:
            a.handleEvent(event)
        case task := <-a.scheduler.Ready():
            a.executeTask(task)
        case <-a.ctx.Done():
            return
        }
    }
}

func (a *Agent) handleEvent(event Event) {
    // Build context from event + memory. The variable is named recalled
    // rather than context so it does not shadow Go's context package.
    recalled := a.memory.Recall(event.Type)

    // LLM decides what to do
    plan := a.llm.Plan(recalled, event, a.tools.Available())

    // Execute the plan step by step, recording each result
    for _, step := range plan.Steps {
        result := a.tools.Execute(step)
        a.memory.Store(step, result)
    }
}

The LLM gets: event details, relevant memory, and available tools. It returns a plan (list of tool calls). The agent executes the plan and stores results in memory for future context.
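To make the plan-then-execute step concrete, here is a self-contained sketch of a tool registry and a plan runner. The Step, ToolFunc, Registry, and Memory types are assumptions made for illustration. Unlike the loop above, this version returns an error from Execute, because an LLM can name a tool that does not exist.

```go
package main

import (
	"errors"
	"fmt"
)

// Step is one tool call in a plan (hypothetical shape).
type Step struct {
	Tool string
	Args map[string]string
}

// ToolFunc is the signature every registered tool implements in this sketch.
type ToolFunc func(args map[string]string) (string, error)

// Registry maps tool names to implementations.
type Registry map[string]ToolFunc

var errUnknownTool = errors.New("unknown tool")

// Execute looks up the step's tool and runs it.
func (r Registry) Execute(s Step) (string, error) {
	fn, ok := r[s.Tool]
	if !ok {
		return "", fmt.Errorf("%w: %s", errUnknownTool, s.Tool)
	}
	return fn(s.Args)
}

// Memory is an append-only log of step outcomes.
type Memory struct{ Entries []string }

// Store records the outcome of one step.
func (m *Memory) Store(s Step, result string) {
	m.Entries = append(m.Entries, s.Tool+": "+result)
}

// RunPlan executes steps in order, stores every outcome, and stops at the
// first failure so later steps never run on bad input.
func RunPlan(r Registry, m *Memory, steps []Step) error {
	for _, s := range steps {
		out, err := r.Execute(s)
		if err != nil {
			m.Store(s, "error: "+err.Error())
			return err
		}
		m.Store(s, out)
	}
	return nil
}

// demoRegistry registers a single stand-in tool that echoes its "msg" argument.
func demoRegistry() Registry {
	return Registry{
		"echo": func(args map[string]string) (string, error) {
			return args["msg"], nil
		},
	}
}

func main() {
	mem := &Memory{}
	plan := []Step{
		{Tool: "echo", Args: map[string]string{"msg": "fetched diff"}},
		{Tool: "post_review_comment"}, // not registered, so the plan stops here
	}
	if err := RunPlan(demoRegistry(), mem, plan); err != nil {
		fmt.Println("plan stopped:", err)
	}
	for _, e := range mem.Entries {
		fmt.Println(e)
	}
}
```

Storing failures in memory as well as successes gives the next planning call a record of what went wrong.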

Practical examples

GitHub webhook → auto-review PR:

Event: PR opened on letx repo
Memory: "this repo uses Chi router, pgx, JWT auth"
Plan: [fetch_pr_diff, analyze_security, analyze_style, post_review_comment]
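Before a PR event reaches the agent, the webhook receiver should confirm the request really came from GitHub. GitHub signs each delivery with HMAC-SHA256 and sends the result in the X-Hub-Signature-256 header as "sha256=" followed by a hex digest. The sketch below shows that check with the standard library only. The function names and the example secret are mine, and this article does not show how hermes-agent itself authenticates webhooks.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// validSignature checks an X-Hub-Signature-256 header value against the raw
// request body using the shared webhook secret.
func validSignature(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	// hmac.Equal compares in constant time to avoid leaking timing information.
	return hmac.Equal(got, mac.Sum(nil))
}

// sign produces the header value a sender would attach (used here for the demo).
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

func main() {
	secret := []byte("example-secret") // placeholder, not a real secret
	body := []byte(`{"action":"opened"}`)

	fmt.Println("genuine delivery accepted:", validSignature(secret, body, sign(secret, body)))
	fmt.Println("forged delivery accepted: ", validSignature(secret, body, "sha256=deadbeef"))
}
```

The check has to run on the raw request bytes, before any JSON decoding, because re-serialized JSON rarely matches the signed payload byte for byte.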

Scheduled hourly health check:

Task: check_api_health (runs every hour)
Tools: http_fetch([letx-api/health, quantumsketch-api/health])
Action: if any fail → slack_notify("letx-api down: status 503")
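The health check above boils down to "GET each endpoint, alert on anything that is not a 2xx". Here is a runnable sketch using only the standard library. The alertFor and check helpers are hypothetical, and local httptest servers stand in for the real health endpoints, whose full URLs are not given in this article.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// alertFor returns the alert text for a failed check, or alert=false when
// the service looks healthy (any 2xx status and no transport error).
func alertFor(name string, status int, err error) (msg string, alert bool) {
	switch {
	case err != nil:
		return fmt.Sprintf("%s down: %v", name, err), true
	case status < 200 || status > 299:
		return fmt.Sprintf("%s down: status %d", name, status), true
	default:
		return "", false
	}
}

// check performs one GET and converts the outcome into an optional alert.
func check(client *http.Client, name, url string) (string, bool) {
	resp, err := client.Get(url)
	if err != nil {
		return alertFor(name, 0, err)
	}
	defer resp.Body.Close()
	return alertFor(name, resp.StatusCode, nil)
}

func main() {
	// Local stand-ins for the real health endpoints.
	failing := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusServiceUnavailable)
	}))
	defer failing.Close()
	healthy := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer healthy.Close()

	// A timeout keeps one hung endpoint from stalling the whole check.
	client := &http.Client{Timeout: 5 * time.Second}
	targets := []struct{ name, url string }{
		{"letx-api", failing.URL},
		{"quantumsketch-api", healthy.URL},
	}
	for _, t := range targets {
		if msg, alert := check(client, t.name, t.url); alert {
			fmt.Println(msg) // in the agent this would go to slack_notify
		}
	}
}
```

Running it prints "letx-api down: status 503", the same message shape as the example above. Keeping alertFor free of network calls makes the alert rule easy to test on its own.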

CloudWatch alarm → incident response:

Event: CPU > 80% on letx-collab service
Plan: [check_ecs_logs, identify_pattern, suggest_remediation, notify_slack]

What doesn't work at this scale

Long-running tasks (> 5 minutes): t3.micro is a burstable instance, so sustained CPU load drains its CPU credits. Offload heavy computation to Lambda or ECS Fargate.
High-volume webhooks (> 100/minute): Single instance becomes a bottleneck. Use SQS as a buffer.
Real-time requirements: Cold LLM inference takes 1–3 seconds. Don't use this for latency-sensitive responses.
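Short of adding SQS, the single instance can at least fail fast when webhooks arrive faster than the agent drains them. The sketch below is an in-memory bounded buffer, not SQS, and the Buffer type and its methods are hypothetical. It only illustrates the backpressure idea: when the queue is full, the HTTP handler can return an error status so the sender retries later.

```go
package main

import "fmt"

// Event is a stand-in for an incoming webhook payload.
type Event struct{ ID int }

// Buffer is a fixed-capacity queue between the webhook handler and the agent loop.
type Buffer struct{ ch chan Event }

// NewBuffer creates a buffer that holds at most capacity pending events.
func NewBuffer(capacity int) *Buffer {
	return &Buffer{ch: make(chan Event, capacity)}
}

// TryEnqueue adds e without blocking. It returns false when the buffer is
// full, which the caller can translate into an HTTP 429 or 503.
func (b *Buffer) TryEnqueue(e Event) bool {
	select {
	case b.ch <- e:
		return true
	default:
		return false
	}
}

// Events exposes the receive side for the agent loop's select statement.
func (b *Buffer) Events() <-chan Event { return b.ch }

func main() {
	buf := NewBuffer(2)
	for i := 1; i <= 3; i++ {
		fmt.Printf("event %d accepted: %v\n", i, buf.TryEnqueue(Event{ID: i}))
	}
	fmt.Println("first queued event:", (<-buf.Events()).ID)
}
```

Events held this way are lost if the process restarts. That durability gap is the main reason to move to SQS once volume grows.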

FAQ

What is an always-on AI agent? An always-on AI agent runs as a persistent process, listens to events (webhooks, schedules, alarms), and takes actions using LLM-powered decision making — without needing human prompts to trigger each task.

Why use OpenRouter free models? OpenRouter provides rate-limited but free access to capable open-source LLMs (Llama 3.1 70B, Mistral 7B). For most background agent tasks, these models are sufficient and cost nothing.

What AWS instance type do you use? EC2 t3.micro: 2 vCPUs (burstable), 1 GiB RAM, ~$8.50/month. Sufficient for a single always-on agent handling 10–50 events/day.

Is hermes-agent-aws open source? Yes — the Terraform module and Go agent binary are open source at shihabshahrier/hermes-agent-aws.

Can I run multiple agents for $17/month? The $17 setup runs one agent per t3.micro. For multiple agents, use a t3.small (~$17/month) and run several agent processes on the same instance, or use EC2 Auto Scaling with spot instances.

Written by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. See also: My AI Agent Skills Stack · Terraform on AWS: Infrastructure as Code Guide.