Deploy Always-On AI Agents on AWS for ~$17/mo

You can run a private always-on AI agent on AWS for ~$17/month using free OpenRouter models and Terraform. Here's the exact setup I use.

An always-on AI agent that monitors your systems, responds to webhooks, and runs scheduled tasks doesn't need to cost $100+/month. My hermes-agent-aws setup runs a persistent AI agent on AWS for ~$17/month using free OpenRouter models, a t3.micro EC2 instance, and Terraform.

What "always-on" means

A standard AI agent is reactive: you prompt it, it responds, and the session ends. An always-on agent:

  • Runs as a persistent process (no cold starts)
  • Listens to webhooks (GitHub, Slack, CloudWatch alarms)
  • Runs scheduled tasks (daily reports, monitoring checks, batch jobs)
  • Maintains state between invocations (memory, task queue)

Think of it as a background worker that happens to be LLM-powered.

The architecture

EC2 t3.micro ($8.50/mo)
└── hermes-agent (Go binary)
    ├── HTTP server (webhook receiver)
    ├── Scheduler (cron-like task runner)
    ├── Agent loop (LLM-powered decision making)
    └── Tool registry
        ├── bash_exec (run shell commands)
        ├── http_fetch (call external APIs)
        ├── db_query (PostgreSQL read)
        ├── slack_notify (send Slack messages)
        └── github_pr (create/review PRs)

OpenRouter API (free tier) → Llama 3.1 70B / Mistral 7B / Gemma 2 9B
S3 bucket (agent memory + task logs)
CloudWatch (monitoring + alarms as triggers)

Total monthly cost: EC2 t3.micro ($8.50) + S3 ($0.50) + data transfer ($1) + Route 53 ($0.50) + CloudWatch ($0.50) + occasional paid LLM calls ($6) ≈ $17/month.

Free LLM models via OpenRouter

OpenRouter provides access to free models (rate-limited but free). The agent routes tasks by complexity:

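// ModelRouter picks an OpenRouter model ID for each task.
type ModelRouter struct {
    client *openrouter.Client
}

// Route returns the model ID to use: the largest free model for tasks that
// need reasoning, the smallest for simple tasks, and a mid-sized default.
func (r *ModelRouter) Route(task Task) string {
    switch {
    case task.RequiresReasoning:
        return "meta-llama/llama-3.1-70b-instruct:free"
    case task.IsSimple:
        return "mistralai/mistral-7b-instruct:free"
    default:
        return "google/gemma-2-9b-it:free"
    }
}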

For tasks that genuinely need Claude (code review, architecture analysis), I use the Claude API; those calls are the ~$6/month paid line item. Everything else runs on free models.

Terraform: deploy in one command

# main.tf
# Declarations for the secrets passed to the module below
variable "openrouter_key" {
  type      = string
  sensitive = true
}

variable "slack_webhook" {
  type      = string
  sensitive = true
}

variable "github_token" {
  type      = string
  sensitive = true
}

module "hermes_agent" {
  source = "github.com/shihabshahrier/hermes-agent-aws"

  instance_type  = "t3.micro"
  agent_version  = "v1.2.0"
  openrouter_key = var.openrouter_key
  slack_webhook  = var.slack_webhook
  github_token   = var.github_token

  # What the agent monitors
  github_repos = ["shihabshahrier/letx", "shihabshahrier/quantumsketch"]
  cron_tasks = [
    { schedule = "0 9 * * *", task = "daily_standup_summary" },
    { schedule = "0 * * * *", task = "check_api_health" },
  ]
}

terraform init && terraform apply
# ~3 minutes to fully deployed

Agent loop implementation

The core agent loop is a simple decision-making cycle:

func (a *Agent) Loop() {
    for {
        select {
        case event := <-a.events:
            a.handleEvent(event)
        case task := <-a.scheduler.Ready():
            a.executeTask(task)
        case <-a.ctx.Done():
            return
        }
    }
}

func (a *Agent) handleEvent(event Event) {
    // Build context from event + memory.
    // Named recalled rather than context to avoid shadowing the context package.
    recalled := a.memory.Recall(event.Type)

    // LLM decides what to do
    plan := a.llm.Plan(recalled, event, a.tools.Available())

    // Execute the plan
    for _, step := range plan.Steps {
        result := a.tools.Execute(step)
        a.memory.Store(step, result)
    }
}

The LLM gets: event details, relevant memory, and available tools. It returns a plan (list of tool calls). The agent executes the plan and stores results in memory for future context.

Practical examples

GitHub webhook → auto-review PR:

Event: PR opened on letx repo
Memory: "this repo uses Chi router, pgx, JWT auth"
Plan: [fetch_pr_diff, analyze_security, analyze_style, post_review_comment]

Scheduled hourly health check:

Task: check_api_health (runs every hour)
Tools: http_fetch([letx-api/health, quantumsketch-api/health])
Action: if any fail → slack_notify("letx-api down: status 503")

CloudWatch alarm → incident response:

Event: CPU > 80% on letx-collab service
Plan: [check_ecs_logs, identify_pattern, suggest_remediation, notify_slack]

What doesn't work at this scale

  • Long-running tasks (> 5 minutes): EC2 t3.micro is a burstable instance with a low CPU baseline, so sustained heavy computation is a poor fit. Offload it to Lambda or ECS Fargate.
  • High-volume webhooks (> 100/minute): Single instance becomes a bottleneck. Use SQS as a buffer.
  • Real-time requirements: Cold LLM inference takes 1–3 seconds. Don't use this for latency-sensitive responses.

FAQ

What is an always-on AI agent? An always-on AI agent runs as a persistent process, listens to events (webhooks, schedules, alarms), and takes actions using LLM-powered decision making — without needing human prompts to trigger each task.

Why use OpenRouter free models? OpenRouter provides access to capable open-weight LLMs (Llama 3.1 70B, Mistral 7B, Gemma 2 9B) for free (rate-limited). For most background agent tasks, these models are sufficient and cost nothing.

What AWS instance type do you use? EC2 t3.micro — 2 vCPU (burstable), 1GB RAM, ~$8.50/month. Sufficient for a single always-on agent handling 10–50 events/day.

Is hermes-agent-aws open source? Yes — the Terraform module and Go agent binary are open source at shihabshahrier/hermes-agent-aws.

Can I run multiple agents for $17/month? One agent per t3.micro. For multiple agents, use a t3.small (~$17/month for the instance alone, before the other line items) and run several agent processes on it, or use EC2 Auto Scaling with spot instances.


Written by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. See also: My AI Agent Skills Stack · Terraform on AWS: Infrastructure as Code Guide.