Deploy Always-On AI Agents on AWS for ~$17/mo
You can run a private always-on AI agent on AWS for ~$17/month using free OpenRouter models and Terraform. Here's the exact setup I use.
An always-on AI agent that monitors your systems, responds to webhooks, and runs scheduled tasks doesn't need to cost $100+/month. My hermes-agent-aws setup gets there with a single t3.micro EC2 instance, free OpenRouter models for most tasks, and a Terraform module for deployment.
What "always-on" means
A standard AI agent is reactive — you prompt it, it responds, session ends. An always-on agent:
- Runs as a persistent process (no cold starts)
- Listens to webhooks (GitHub, Slack, CloudWatch alarms)
- Runs scheduled tasks (daily reports, monitoring checks, batch jobs)
- Maintains state between invocations (memory, task queue)
Think of it as a background worker that happens to be LLM-powered.
The architecture
EC2 t3.micro ($8.50/mo)
└── hermes-agent (Go binary)
    ├── HTTP server (webhook receiver)
    ├── Scheduler (cron-like task runner)
    ├── Agent loop (LLM-powered decision making)
    └── Tool registry
        ├── bash_exec (run shell commands)
        ├── http_fetch (call external APIs)
        ├── db_query (PostgreSQL read)
        ├── slack_notify (send Slack messages)
        └── github_pr (create/review PRs)

OpenRouter API (free tier) → Llama 3 70B / Mistral 7B
S3 bucket (agent memory + task logs)
CloudWatch (monitoring + alarms as triggers)
Total monthly cost: EC2 t3.micro ($8.50) + S3 ($0.50) + data transfer ($1) + Route53 ($0.50) + CloudWatch ($0.50) + occasional paid LLM calls ($6) ≈ $17/month.
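The arithmetic behind that total is easy to check. The sketch below sums the line items quoted above; the figures are the ones from this post, and the type and function names are just for illustration.

```go
package main

import "fmt"

// lineItem is one entry in the monthly bill.
type lineItem struct {
	name string
	usd  float64
}

// baseline holds the figures quoted in this post.
var baseline = []lineItem{
	{"EC2 t3.micro", 8.50},
	{"S3", 0.50},
	{"data transfer", 1.00},
	{"Route53", 0.50},
	{"CloudWatch", 0.50},
	{"paid LLM calls", 6.00},
}

// total adds up every line item.
func total(items []lineItem) float64 {
	sum := 0.0
	for _, it := range items {
		sum += it.usd
	}
	return sum
}

func main() {
	all := total(baseline)
	paid := baseline[len(baseline)-1].usd
	fmt.Printf("total: $%.2f/month\n", all)
	fmt.Printf("without paid LLM calls: $%.2f/month\n", all-paid)
	fmt.Printf("EC2 share of total: %.0f%%\n", 100*baseline[0].usd/all)
}
```

With these numbers the infrastructure alone comes to $11/month, and the paid LLM calls are the only part that scales with how often the agent escalates to a paid model.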
Free LLM models via OpenRouter
OpenRouter offers a set of models at no cost, subject to rate limits. The agent routes tasks by complexity:
// ModelRouter picks an OpenRouter model ID for each task.
type ModelRouter struct {
	client *openrouter.Client
}

// Route maps a task's complexity to a free-tier model ID.
func (r *ModelRouter) Route(task Task) string {
	switch {
	case task.RequiresReasoning:
		return "meta-llama/llama-3.1-70b-instruct:free"
	case task.IsSimple:
		return "mistralai/mistral-7b-instruct:free"
	default:
		return "google/gemma-2-9b-it:free"
	}
}
For tasks that genuinely need Claude (code review, architecture analysis), I use the Claude API; those calls account for the ~$6/month in paid usage. Everything else runs free.
Terraform: deploy in one command
# main.tf
module "hermes_agent" {
  source        = "github.com/shihabshahrier/hermes-agent-aws"
  instance_type = "t3.micro"
  agent_version = "v1.2.0"

  openrouter_key = var.openrouter_key
  slack_webhook  = var.slack_webhook
  github_token   = var.github_token

  # What the agent monitors
  github_repos = ["shihabshahrier/letx", "shihabshahrier/quantumsketch"]
  cron_tasks = [
    { schedule = "0 9 * * *", task = "daily_standup_summary" },
    { schedule = "0 * * * *", task = "check_api_health" },
  ]
}

terraform init && terraform apply
# ~3 minutes to fully deployed
Agent loop implementation
The core agent loop is a simple decision-making cycle:
// Loop blocks until the agent's context is cancelled, handling whichever
// arrives first: an incoming event or a scheduled task that is due.
func (a *Agent) Loop() {
	for {
		select {
		case event := <-a.events:
			a.handleEvent(event)
		case task := <-a.scheduler.Ready():
			a.executeTask(task)
		case <-a.ctx.Done():
			return
		}
	}
}

// handleEvent asks the LLM for a plan and runs it step by step.
func (a *Agent) handleEvent(event Event) {
	// Build context from event + memory
	recalled := a.memory.Recall(event.Type)
	// LLM decides what to do
	plan := a.llm.Plan(recalled, event, a.tools.Available())
	// Execute the plan
	for _, step := range plan.Steps {
		result := a.tools.Execute(step)
		a.memory.Store(step, result)
	}
}
The LLM gets: event details, relevant memory, and available tools. It returns a plan (list of tool calls). The agent executes the plan and stores results in memory for future context.
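Because the plan comes from an LLM, the tool names in it cannot be trusted blindly. The following is a minimal sketch of a tool registry that rejects unknown tools and stops a plan at the first failing step. The Step, Plan, and Tool shapes are assumptions for illustration; the snippet above does not show how hermes-agent defines them, and the stub tools stand in for the real ones.

```go
package main

import "fmt"

// Step is one tool call in a plan (shape assumed for illustration).
type Step struct {
	Tool string
	Args map[string]string
}

// Plan is the ordered list of steps the LLM returned.
type Plan struct{ Steps []Step }

// Tool is any function the agent is allowed to call.
type Tool func(args map[string]string) (string, error)

// Registry maps tool names to implementations.
type Registry struct{ tools map[string]Tool }

func NewRegistry() *Registry { return &Registry{tools: map[string]Tool{}} }

func (r *Registry) Register(name string, t Tool) { r.tools[name] = t }

// Execute runs one step, refusing tool names that were never registered.
func (r *Registry) Execute(s Step) (string, error) {
	t, ok := r.tools[s.Tool]
	if !ok {
		return "", fmt.Errorf("unknown tool %q", s.Tool)
	}
	return t(s.Args)
}

// RunPlan executes steps in order and stops at the first failure, returning
// the results gathered up to that point.
func RunPlan(r *Registry, p Plan) ([]string, error) {
	results := make([]string, 0, len(p.Steps))
	for i, s := range p.Steps {
		out, err := r.Execute(s)
		if err != nil {
			return results, fmt.Errorf("step %d (%s): %w", i, s.Tool, err)
		}
		results = append(results, out)
	}
	return results, nil
}

func main() {
	reg := NewRegistry()
	// Stub tools; real ones would call GitHub, Slack, and so on.
	reg.Register("fetch_pr_diff", func(a map[string]string) (string, error) {
		return "diff for PR " + a["pr"], nil
	})
	reg.Register("post_review_comment", func(a map[string]string) (string, error) {
		return "commented on PR " + a["pr"], nil
	})

	plan := Plan{Steps: []Step{
		{Tool: "fetch_pr_diff", Args: map[string]string{"pr": "42"}},
		{Tool: "delete_repo"}, // not registered: the plan stops here
		{Tool: "post_review_comment", Args: map[string]string{"pr": "42"}},
	}}
	results, err := RunPlan(reg, plan)
	fmt.Println(results, err)
}
```

Stopping at the first error keeps a bad plan from running later steps against a state the earlier steps never produced. Whether to retry, re-plan, or just log is a separate design decision.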
Practical examples
GitHub webhook → auto-review PR:
Event: PR opened on letx repo
Memory: "this repo uses Chi router, pgx, JWT auth"
Plan: [fetch_pr_diff, analyze_security, analyze_style, post_review_comment]
Scheduled hourly health check:
Task: check_api_health (runs every hour)
Tools: http_fetch([letx-api/health, quantumsketch-api/health])
Action: if any fail → slack_notify("letx-api down: status 503")
CloudWatch alarm → incident response:
Event: CPU > 80% on letx-collab service
Plan: [check_ecs_logs, identify_pattern, suggest_remediation, notify_slack]
What doesn't work at this scale
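- Long-running tasks (> 5 minutes): a t3.micro is a burstable instance with a CPU baseline of 10% per vCPU, so sustained computation either drains its CPU credits or, in unlimited mode, adds to the bill. Offload heavy computation to Lambda or ECS Fargate.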
- High-volume webhooks (> 100/minute): Single instance becomes a bottleneck. Use SQS as a buffer.
- Real-time requirements: Cold LLM inference takes 1–3 seconds. Don't use this for latency-sensitive responses.
FAQ
What is an always-on AI agent? An always-on AI agent runs as a persistent process, listens to events (webhooks, schedules, alarms), and takes actions using LLM-powered decision making — without needing human prompts to trigger each task.
Why use OpenRouter free models? OpenRouter provides access to capable open-source LLMs (Llama 3 70B, Mistral 7B) for free (rate-limited). For most background agent tasks, these models are sufficient and cost nothing.
What AWS instance type do you use? EC2 t3.micro: 2 vCPUs (burstable), 1 GiB RAM, ~$8.50/month. Sufficient for a single always-on agent handling 10–50 events/day.
Is hermes-agent-aws open source? Yes — the Terraform module and Go agent binary are open source at shihabshahrier/hermes-agent-aws.
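Can I run multiple agents for $17/month? Not quite. The ~$17 figure assumes one agent on a t3.micro. For multiple agents, move to a t3.small (~$17/month for the instance alone, so roughly $25.50/month with the other line items unchanged) and run multiple agent processes on the same instance, or use EC2 Auto Scaling with Spot Instances.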
Written by Shihab Shahriar Antor, AI Engineer & Founder of Shahriar Labs.