AI Agent Daily: How I Run My Web Businesses on Autopilot

233 skills. 7 MCP connectors. 4 specialized profiles. 2 active cronjobs, ~10 historical. That’s what my AI agent runs every single day on my server. Not a chatbot. A real operational system that drives my web projects while I focus on strategy.

I named this agent Hermes. Here’s exactly how it works, with real numbers pulled from a live audit of the install, not estimates.

Why I run an AI agent on my own server

I run several web projects: content sites, lead-generation pipelines, affiliate projects, niche research. Each one needs technical monitoring, repetitive tasks, content production, and reporting.

Before, I was on every terminal, every API, every spreadsheet myself. Today Hermes handles 80% of operations. I keep control of strategy and high-impact decisions.

Concrete result: over 2,000 work sessions on record, memory that persists across conversations, and automations that run 24/7 without my intervention.

The server: a $10/month VPS

Hermes runs on a standard VPS (think OVH, Hetzner, Oracle Cloud free tier, or any provider you like). No GPU, no dedicated hardware. Real specs at the moment I’m writing:

Resource	Value
vCPU	6 cores
RAM	11 GB (8.3 GB used)
Disk	96 GB (69 GB used, 72%)
OS	Ubuntu 24.04.4 LTS, kernel 6.8.0-110
Virtualization	KVM

Nothing exotic. Any VPS in the $8-$15/month range works.

The architecture: polyglot, not “just Python”

Hermes is not a Python script with network calls. It’s a stack of services running side by side. Actual ps output:

hermes-gateway              <- main orchestrator (Node.js)
hindsight-api               <- vector memory server (Python)
redis                       <- caching layer
nginx                       <- reverse proxy
discord-scrapper            <- Discord data extraction (Node.js)
firecrawl                   <- web crawling service

Node.js for the agent harness, Python for memory and search, Docker to isolate external services, Nginx as reverse proxy. Polyglot by design, each piece is the right tool for its job.

The 7 MCP connectors

MCP (Model Context Protocol) is the standard that lets the agent connect to external tools. I have 7 configured:

Connector	Function
Atomic	Self-hosted RDF knowledge base (documents, procedures, specs)
Hindsight	Long-term vector memory for volatile, contextual facts
Linear	Project management, issue tracking, sprint planning
Discord	Primary channel, notifications, in-thread command execution
Chrome DevTools	Automated web browsing, visual audits via CDP
Context7	Real-time technical documentation lookup for any library
GlitchTip	Self-hosted application error monitoring and alerting

Each connector is an independent server. The agent can query them in parallel. While I’m chatting on Discord it can check a GlitchTip issue and update a Linear task, all in the same conversation turn.

Model routing: optimized for cost

Running a frontier model 24/7 gets expensive fast. The trick is routing the right model to the right task.

Main model: mimo-v2.5-pro (Xiaomi). That’s what handles every conversation turn. 1M token context window, solid reasoning, available via API.

Auxiliary models handle side tasks like image analysis, web page summarization, session titles, context compression, and skill curation. Before optimization, these all ran on the main model, wasting premium tokens on grunt work. Now:

Task	Model	Why
Vision (images, screenshots)	Gemini 3.5 Flash (OpenRouter)	Best multimodal cheap model
All other auxiliary tasks (12)	DeepSeek (deepseek-chat)	Excellent reasoning at $0.14/M input, $0.28/M output

The 12 auxiliary tasks routed to DeepSeek: web extraction, context compression, skills hub, approval flow, MCP routing, title generation, triage, kanban decomposition, profile description, curator, session search, and memory flush.

Smart model routing adds another layer: messages under 160 characters or 28 words get routed to DeepSeek automatically instead of mimo-v2.5-pro. Simple questions get cheap answers. Complex work stays on the premium model.

Fallback chain: if the main provider goes down, Hermes falls back to Hermes 3 405B (Nous Research) on OpenRouter’s free tier. Last resort: a tiny local Granite 3B model that can still answer and reconfigure things even when all API providers fail.

Result: significant cost reduction without quality loss on the conversations that matter.

Profiles: specialized agents, one channel

The biggest optimization isn’t technical, it’s organizational. Instead of one agent doing everything, I run 4 specialized profiles that share a single Discord channel:

Profile	Tools	MCP servers	Memory	Purpose
default	14 tools	all 7 servers	Yes	Hub, receives all Discord messages
coding	7 tools	Context7, Chrome DevTools	No	Pure dev work, no distractions
business	10 tools	Atomic, Hindsight, Linear	Yes	Client work, ERP, project management
data	7 tools	Atomic, Hindsight, Discord scraper	Yes	Scraping, lead enrichment, data pipelines

Each profile has its own config.yaml, .env, SOUL.md, skills directory, and memory store. The coding profile loads only terminal, file, web, browser, delegation, vision, and session search tools. No memory overhead, no irrelevant MCP connections.

How it works from Discord:

Delegation (synchronous, under 5 min): I ask something code-related, the default agent delegates to a subagent with only the coding toolset. Result comes back immediately.
Kanban orchestration (asynchronous, long-running): complex tasks get decomposed and routed to the right worker profile automatically, based on each profile’s description. The kanban dispatcher runs in the gateway, checking every 60 seconds for ready tasks.

No manual switching. No separate channels. One conversation, intelligent routing.

The 4-layer memory system

This is the most underrated part of an AI agent. Without memory, an LLM forgets everything after each message. I run 4 layers in parallel, each with a distinct role:

Layer	Type	What it stores
MEMORY.md	Text file (~4K chars)	Non-critical conventions, public URLs, technical rules
AWS Secrets Manager	Encrypted vault	Production credentials, API tokens, signing keys
Honcho	Remote API	Behavioral patterns, user preferences, communication style
Hindsight	Self-hosted vector DB (MCP)	Volatile facts, session context, 91%+ accuracy
Atomic	Self-hosted RDF store (MCP)	Stable documents, specs, procedures

Golden rule: each information type has its own layer. Non-critical conventions stay in MEMORY.md, injected every turn. Production credentials never leave AWS Secrets Manager: the agent fetches them on demand via IAM scoped to the instance. If MEMORY.md ever leaks into a trace or a dump, nothing critical leaves with it.

A memory router skill automatically decides where to store each new piece of information. No duplicates, no drift.

Compression is tuned for the 1M-token context window: threshold at 0.65 (compresses earlier), target ratio 0.3 (preserves 300K tokens). Prompt caching TTL set to 15 minutes for better cost efficiency on long sessions.

Tools and credentials: all behind AWS Secrets Manager

The agent has access to a full tooling ecosystem. Real numbers:

Category	Count	Examples
APIs and keys	27+ API keys	DataForSEO, Cloudflare, Backblaze B2, FAL, ElevenLabs
Model providers	6 credential pools	Xiaomi (MiMo), Nous Research, OpenRouter, DeepSeek, Google
Google services	Full OAuth	Search Console, Gmail, Calendar (17+ GSC properties)
Cloud storage	Backblaze B2	S3-compatible object storage at $0.006/GB
Infrastructure	Cloudflare	Workers, D1 edge database, Pages
TTS (voice)	2 providers	Google TTS, ElevenLabs
Monitoring	GlitchTip	Self-hosted error tracking

Total: 80+ credential items managed via AWS Secrets Manager. One vault, IAM scoped to the instance, automatic rotation on sensitive secrets. That’s what lets me sleep when a disk dies or a repo leaks.

The 233 skills

The agent does not improvise. It follows precise procedures stored in 233 skill files across 51 categories:

Domain	Example skills
SEO and content	Article writing, on-page optimization, niche ideation, GEO
DevOps	Deployment, monitoring, Docker, kanban-worker, Cloudflare Workers
Research	ML literature review, data extraction, competitive analysis
GitHub	Auth, code-review, PR-workflow, repo-management
Data	Scraping, enrichment, ETL pipelines, OSINT
Creative	Image generation, video editing, infographic creation

Each skill is a markdown file with precise rules: tone, structure, banned patterns, official sources, output schemas. The agent loads the right skill based on the request.

Concrete example: when I ask for a blog post, the agent loads the blog-writer skill which carries anti-AI rules (no em-dashes, banned vocabulary, sentence-length variation), research methodology (official sources only), and the output template (MDX frontmatter, collapsible FAQ, optimized meta description).

Cronjobs: running while I sleep

Two cronjobs are running right now, untouched:

Active cronjob	Frequency	What it does
Affiliate KPI Dashboard	Daily at 7:00 PM	Scrapes the affiliate network dashboard, posts KPIs (payout, clicks, conversion, EPC, rank) to Discord in 3 lines
DMCA Blocklist Updater	Every 6 hours	Reads DMCA notices from Gmail, updates the blocklist TypeScript file, commits and pushes to main

Beyond these, Hermes has built up a dozen historical cronjobs around niche research and business operations. Real list:

SEO automation loops on niche content sites (wellness, education, affiliate verticals)
Daily indexing and crawl monitoring via Google Search Console API
Full SEO cycles on affiliate content sites (keyword research, content generation, internal linking)
Domain monitoring via RDAP (drops, transfers, expirations, useful for niche domain hunting)
Application error scanning via GlitchTip API
Memory hygiene cleanup, twice daily (Hindsight + Atomic deduplication)
Version changelog digests to the team Discord channel

Each historical cronjob left its output in the cron archive. When a niche pivots or a project ends, I disable the cron but keep the artifacts. That archive is how I decide what’s worth restarting.

Session management

Session idle timeout: 7 days. This prevents context loss on projects that span multiple days while still cleaning up stale sessions. Checkpoint snapshots preserve file state across resets.

When a session does reset, the resume mechanism injects the last 10 exchanges into context so the agent picks up where it left off. Combined with persistent memories (Honcho, Hindsight, Atomic), the agent maintains continuity even across resets.

The platforms I actually use

Hermes can speak on 8 platforms (Discord, Matrix, Telegram, WhatsApp, Slack, Mattermost, Signal, SMS). In practice, I use two:

Platform	Real usage
Discord	Primary channel, per-project threads, cronjob delivery, ad-hoc command execution
Matrix	Backup E2EE when Discord is down or when I need end-to-end encryption

The other six are configured and functional but don’t fit my workflow. Multi-platform helps if a team grows or if you want a separate on-call channel. Today, Discord is enough.

Autonomous decisions: the `/goal` command

For a long time, Hermes did one turn, returned a response, and waited for me to re-prompt. Since /goal shipped (our take on the Ralph loop, directly inspired by Codex CLI’s goal mode), I can set an objective and the agent iterates on its own until the goal validates:

/goal Fix every ruff error in src/ and verify scripts/run_tests.sh passes

What happens under the hood:

Goal accepted: Goal set (20-turn budget): <your goal>
Turn 1: Hermes starts as if the goal were a normal message
Judge: after the turn, a small auxiliary model (DeepSeek) replies strict JSON {"done": bool, "reason": "..."}
Loop: if not done, Hermes auto-runs the next turn
Termination: Goal achieved or Goal paused - N/20 turns used

For high-impact actions (deploy, deletion, modification of critical data), /goal doesn’t short-circuit anything: the agent still asks for confirmation. The loop stays closed. I decide what gets trust, the agent executes.

The raw numbers

Full recap, no filter, pulled right now:

Metric	Value
MCP servers	7 (Atomic, Hindsight, Linear, Discord, Chrome DevTools, Context7, GlitchTip)
Loaded skills	233 across 51 categories
Skill files	844 total, 624 .md files
Active profiles	4 (default, coding, business, data)
Active cronjobs	2 (Affiliate KPI, DMCA Updater)
Historical cronjobs with artifacts	~10 (SEO niches, affiliate, monitoring)
Connected platforms	8 (Discord + Matrix used)
Managed credentials	80+ via AWS Secrets Manager
Recorded sessions	2,066
Primary model	mimo-v2.5-pro (Xiaomi), 1M context
Auxiliary models	DeepSeek (12 tasks), Gemini 3.5 Flash (vision)
Smart routing	Enabled, messages under 160 chars route to DeepSeek
Compression threshold	0.65 (1M context window)
Prompt caching TTL	15 min
Session idle timeout	7 days
Fallback model	Hermes 3 405B (Nous Research, OpenRouter free tier)
Max turns per session	90
RAM used	8.3 GB / 11 GB
Disk used	69 GB / 96 GB (72%)
LLM providers	6 credential pools

What this changes for a web entrepreneur

Before Hermes, I spent 3 to 4 hours daily on operational tasks: error checking, data updates, content writing, deploy tracking, KPI reporting.

Today, those tasks are either automated (cronjobs) or delegated to the agent (on-demand, or via /goal for long-running loops). My time goes to strategy, identifying new niches, and direction calls.

The agent doesn’t replace thinking. It frees up time to think.

Updated May 29, 2026. All numbers come from a live audit of the actual install, not estimates.

FAQ

What exactly is an AI agent?

An AI agent is software powered by a large language model (LLM) that runs tasks autonomously: reads files, calls APIs, executes scripts, browses the web, and keeps persistent memory across sessions. Unlike a chatbot, it operates on your infrastructure, not inside a vendor-hosted chat window.

How much does it cost to run an AI agent 24/7?

The server is a standard VPS at around $10/month. API costs depend on the model and usage volume. With smart routing, auxiliary tasks run on DeepSeek (cheap) while complex work stays on the main model. Typical daily usage stays within a few dozen dollars per month.

Can the AI agent make decisions on its own?

Yes, since the /goal command shipped. I set an objective and the agent iterates turn after turn, judged at each step by an auxiliary model, until the goal is achieved or the turn budget runs out. For high-impact actions (deploy, deletion, critical data modification) it still asks for confirmation.

How is this different from ChatGPT or Claude web?

ChatGPT and Claude are hosted chat interfaces. Hermes runs on my server, connected to my tools (Discord, Linear, GitHub, Google Search Console, Cloudflare), with persistent memory, scheduled tasks, and direct access to my infrastructure. The data stays mine.