automa/docs/ARCHITECTURE.md
m1ngsama 49a2621f2f docs: add comprehensive documentation and architecture guides
2026-01-19 16:31:24 +08:00


Automa Architecture

Self-hosted services platform following Unix philosophy: simple, modular, composable.

Design Principles

  1. KISS - Keep It Simple, Stupid
  2. Single Responsibility - Each service does one thing well
  3. Replaceable - Any component can be swapped
  4. Composable - Services work together via standard interfaces
  5. Observable - Everything is monitored and logged
  6. Recoverable - Regular backups, tested restore procedures

System Overview

┌─────────────────────────────────────────────────────┐
│                    Internet                          │
└───────────────────┬──────────────────────────────────┘
                    │
         ┌──────────▼──────────┐
         │  Firewall (UFW)     │
         │  Fail2ban           │
         └──────────┬──────────┘
                    │
         ┌──────────▼──────────┐
         │  Caddy (80/443)     │
         │  - Auto HTTPS       │
         │  - Reverse Proxy    │
         └──────────┬──────────┘
                    │
      ┌─────────────┼─────────────┐
      │             │             │
┌─────▼─────┐ ┌────▼────┐ ┌─────▼─────┐
│ Nextcloud │ │ Grafana │ │ Minecraft │
│ + MariaDB │ │         │ │ (host net)│
│ + Redis   │ │         │ │           │
└───────────┘ └─────────┘ └───────────┘
      │             │             │
      │       ┌─────▼─────┐       │
      │       │Prometheus │       │
      │       │Loki       │       │
      │       │Promtail   │       │
      │       │cAdvisor   │       │
      │       └───────────┘       │
      │                           │
      └─────────┬─────────────────┘
                │
         ┌──────▼──────┐
         │ Watchtower  │
         │ Duplicati   │
         └─────────────┘
                │
         ┌──────▼──────┐
         │   Backups   │
         │  (Local +   │
         │   Remote)   │
         └─────────────┘

Component Stack

Layer 1: Edge (Internet-facing)

| Component | Purpose | Ports | Why |
| --- | --- | --- | --- |
| UFW | Firewall | All | Simple; ships with Ubuntu |
| Fail2ban | Intrusion prevention | - | Auto-ban attackers |
| Caddy | Reverse proxy + SSL | 80, 443 | Auto HTTPS, simple config |

Layer 2: Applications

| Service | Purpose | Ports | Stack |
| --- | --- | --- | --- |
| Nextcloud | Private cloud | 80→Caddy | PHP + MariaDB + Redis |
| Minecraft | Game server | 25565 | Fabric 1.21.1 |
| TeamSpeak | Voice chat | 9987 | TeamSpeak 3 |

Layer 3: Observability

| Component | Purpose | Storage | Why |
| --- | --- | --- | --- |
| Prometheus | Metrics DB | 10GB/30d | Industry standard |
| Grafana | Dashboards | 500MB | Best visualization |
| Loki | Log aggregation | 5GB/30d | Lightweight ELK alternative |
| Promtail | Log collector | - | Pairs with Loki |
| cAdvisor | Container metrics | - | Docker native |

Layer 4: Automation

| Component | Purpose | Why |
| --- | --- | --- |
| Watchtower | Auto-update images | Label-based, simple |
| Duplicati | Remote backups | Web UI, encrypted |
| bin/backup.sh | Local backups | Custom, flexible |

Network Architecture

Networks

automa-proxy (172.20.0.0/16)
  ├─ caddy
  ├─ nextcloud
  └─ grafana

automa-monitoring (172.21.0.0/16, internal)
  ├─ prometheus
  ├─ loki
  ├─ promtail
  └─ cadvisor

nextcloud (172.22.0.0/16)
  ├─ nextcloud
  ├─ nextcloud-db
  └─ nextcloud-redis

teamspeak (172.23.0.0/16)
  └─ teamspeak

(host network)
  └─ minecraft  # Needs direct port access for UDP
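
The layout above could also be created by hand with `docker network create`, although the same subnets would more likely be declared under `networks:` in the Compose files. A minimal sketch, assuming the network names and subnets listed above and that "internal" corresponds to Docker's `--internal` flag; the helper name `create_networks` is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch only: create the user-defined bridge networks listed above.
# create_networks is a hypothetical helper, not part of the repository.
# --internal keeps the monitoring network isolated from external traffic.
create_networks() {
  docker network create --subnet 172.20.0.0/16 automa-proxy &&
    docker network create --subnet 172.21.0.0/16 --internal automa-monitoring &&
    docker network create --subnet 172.22.0.0/16 nextcloud &&
    docker network create --subnet 172.23.0.0/16 teamspeak
}
```

Minecraft is left out on purpose: it uses the host network and needs no bridge of its own.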

Port Mapping

External (public):

  • 80 → Caddy (HTTP → HTTPS redirect)
  • 443 → Caddy (HTTPS)
  • 25565/tcp → Minecraft
  • 9987/udp → TeamSpeak voice
  • 30033/tcp → TeamSpeak file transfer

Internal (localhost only):

  • 3000 → Grafana (proxied via Caddy)
  • 8080 → Nextcloud (proxied via Caddy)
  • 8200 → Duplicati
  • 9090 → Prometheus
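
The public port list translates fairly directly into UFW rules. The sketch below only prints the commands so they can be reviewed first. It assumes TCP for ports 25565 and 30033 and adds an SSH rule on port 22, which is not part of the list above; `firewall_rules` and `PUBLIC_PORTS` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: print UFW commands for the public ports listed above.
# The protocols for 25565 and 30033 and the SSH rule are assumptions.
PUBLIC_PORTS="80/tcp 443/tcp 25565/tcp 9987/udp 30033/tcp"

firewall_rules() {
  local port
  echo "ufw default deny incoming"
  echo "ufw default allow outgoing"
  echo "ufw allow 22/tcp"  # assumption: keep SSH reachable before enabling
  for port in $PUBLIC_PORTS; do
    echo "ufw allow $port"
  done
  echo "ufw --force enable"
}
```

Run `firewall_rules` to inspect the output, then `firewall_rules | sudo sh` to apply it. Ports published by Docker are handled by Docker's own iptables rules and are not filtered by UFW by default, so binding internal services to localhost still matters.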

Data Flow

Request Flow

User → Internet → Firewall → Caddy → Application
                                  ↓
                             Prometheus ← Metrics
                                  ↓
                               Grafana ← Query

Log Flow

Container → stdout/stderr → Docker logs → Promtail → Loki → Grafana

Backup Flow

Service data → bin/backup.sh → local backup → Duplicati → remote storage
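
The first hop of this flow can be pictured as a timestamped tarball per data directory. This is a rough sketch, not the actual contents of bin/backup.sh; `backup_dir` and the archive naming scheme are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: archive one data directory into a timestamped tarball.
# backup_dir and the naming scheme are hypothetical; the real
# bin/backup.sh may work differently.
backup_dir() {
  local src="$1" dest="$2" name
  name="$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$dest" || return 1
  # -C switches into the source directory so archived paths are relative
  tar -czf "$dest/$name" -C "$src" . || return 1
  # Print the archive path so callers can log or verify it
  echo "$dest/$name"
}
```

A call such as `backup_dir /path/to/minecraft/world ./backups` (path is a placeholder) would leave one archive in the local backup directory for Duplicati to pick up. Databases would normally be dumped first rather than copied while running.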

Storage Strategy

Volume Types

Named volumes (managed by Docker):

  • Database data (MariaDB)
  • Cache (Redis)
  • Monitoring data (Prometheus, Loki, Grafana)
  • Config (Caddy, Duplicati)

Bind mounts (host filesystem):

  • Minecraft world/mods/configs (easy access)
  • Backup output directory
  • Log files

Backup Strategy

3-2-1 Rule:

  • 3 copies of data
  • 2 different media
  • 1 offsite

Implementation:

  1. Live data (volumes/bind mounts)
  2. Local backup (bin/backup.sh → ./backups/)
  3. Remote backup (Duplicati → S3/SFTP/etc)

Retention:

  • Local: 7 days
  • Remote: 30 days
  • Configs: forever
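
The local 7-day retention can be enforced with a single `find` call. A minimal sketch, assuming local archives are `*.tar.gz` files under ./backups/ (the file naming is an assumption) and that GNU or BSD `find` is available; `prune_backups` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch only: delete local backup archives older than a retention window.
# prune_backups is a hypothetical helper; the *.tar.gz naming is assumed.
prune_backups() {
  local dir="$1" days="${2:-7}"
  # -mtime +N matches files whose age, in whole days, is greater than N
  find "$dir" -type f -name '*.tar.gz' -mtime +"$days" -delete
}
```

Calling `prune_backups ./backups 7` after each backup run keeps the directory bounded. Config backups, which are kept forever, would need to live outside the pruned directory.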

Update Strategy

Image Versioning

Pinning strategy:

```yaml
# ✅ Good - pin major (or major.minor) version, still get patches
image: nextcloud:28-apache
image: mariadb:11.2-jammy
image: grafana/grafana:10-alpine

# ⚠️  Acceptable - semantic versioning not available
image: teamspeak:latest

# ❌ Bad - unpredictable
image: nextcloud:latest
```

Update Methods

Automatic (Watchtower):

  • Runs daily
  • Only updates labeled containers
  • Good for: Caddy, Grafana, Nextcloud app
  • Bad for: Databases, critical services
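
To see which containers are actually opted in, the Watchtower label can be queried directly. This sketch assumes Watchtower runs in label-enable mode, where only containers carrying `com.centurylinklabs.watchtower.enable=true` are updated; `watchtower_targets` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: list running containers that opted in to Watchtower updates.
# Assumes Watchtower runs with --label-enable, so that only containers
# labelled com.centurylinklabs.watchtower.enable=true are updated.
watchtower_targets() {
  docker ps \
    --filter "label=com.centurylinklabs.watchtower.enable=true" \
    --format '{{.Names}}'
}
```

If a database container shows up in this list, its label is probably a mistake.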

Manual:

```shell
docker compose pull
docker compose up -d
```

  • Good for: Databases, major version bumps
  • Requires: Testing, backup first
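
The "backup first" requirement is easy to forget, so it can be wired into a small wrapper. A sketch under stated assumptions: bin/backup.sh is assumed to run without arguments and to exit non-zero on failure, and `COMPOSE`, `BACKUP_CMD` and `manual_update` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: back up first, then pull and recreate containers.
# COMPOSE and BACKUP_CMD are hypothetical knobs; bin/backup.sh is assumed
# to run without arguments and to exit non-zero on failure.
COMPOSE="${COMPOSE:-docker compose}"
BACKUP_CMD="${BACKUP_CMD:-bin/backup.sh}"

manual_update() {
  if ! $BACKUP_CMD; then
    echo "backup failed, skipping update" >&2
    return 1
  fi
  # Word splitting on $COMPOSE is intentional ("docker compose" is two words)
  $COMPOSE pull && $COMPOSE up -d
}
```

Testing the new images on a non-production copy still has to happen separately; the wrapper only guarantees that a fresh backup exists before anything is replaced.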

Security Model

Defense in Depth

Layer 1: Network

  • UFW firewall (deny all, allow specific)
  • Fail2ban (auto-ban attackers)

Layer 2: TLS

  • Caddy auto-HTTPS
  • Force HTTPS redirect
  • HSTS headers

Layer 3: Application

  • Strong passwords (16+ chars)
  • 2FA where available (Nextcloud)
  • Limited port exposure

Layer 4: Data

  • Encrypted backups (Duplicati)
  • Secrets in .env (not in Git)
  • Read-only mounts where possible

Secrets Management

Current:

.env (git-ignored)
  └─ environment variables
       └─ injected into containers
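
Since the whole scheme depends on .env never reaching Git, a pre-flight check is cheap insurance. A minimal sketch using `git check-ignore`; the helper name `check_env_file` and the choice to tighten permissions to 600 are assumptions, not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch only: refuse to continue if .env is missing or could leak into Git.
# check_env_file is a hypothetical helper name.
check_env_file() {
  local file="${1:-.env}"
  [ -f "$file" ] || { echo "missing $file" >&2; return 1; }
  # git check-ignore exits 0 only when the path matches an ignore rule
  git check-ignore -q "$file" || { echo "$file is not git-ignored" >&2; return 1; }
  # Restrict the file to its owner, since it holds passwords
  chmod 600 "$file"
}
```

It could run at the top of a deploy script, for example `check_env_file .env || exit 1`.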

Future option:

  • Docker secrets (Swarm mode)
  • SOPS/Age encryption for .env

Resource Planning

Minimum Requirements

| Resource | Minimum | Recommended |
| --- | --- | --- |
| CPU | 4 cores | 6-8 cores |
| RAM | 8 GB | 16 GB |
| Disk | 100 GB | 500 GB SSD |
| Network | 10 Mbps | 100 Mbps |

Resource Allocation

Heavy services (reserve resources):

  • Minecraft: 2-4 GB RAM
  • MariaDB: 500 MB RAM
  • Prometheus: 500 MB RAM

Light services (minimal):

  • Caddy: 50 MB RAM
  • Redis: 100 MB RAM
  • Watchtower: 30 MB RAM

Scaling Strategy

Vertical (single server):

  • Add RAM → increase Minecraft players
  • Add CPU → faster builds/queries
  • Add disk → longer retention

Horizontal (multiple servers):

  • Separate services by server
  • Example: Minecraft on server 1, Nextcloud on server 2
  • Use remote monitoring (Prometheus federation)

High Availability (Future)

Current state: Single server

  • No HA (single point of failure)
  • Acceptable for home lab

HA options:

  • Docker Swarm (orchestration)
  • Load balancer (HAProxy/Caddy)
  • Shared storage (NFS/GlusterFS)
  • Database replication (MariaDB primary-replica)

Cost/benefit:

  • Adds significant complexity
  • Not recommended for <10 users

Disaster Recovery

Scenarios

1. Service crash

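  • Auto-restart: the Compose policy restart: unless-stopped brings crashed containers back up
  • Health checks: detect hung or failing services; plain Docker only marks them unhealthy, so the restart itself needs a watcher or manual action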
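
On a plain Docker Compose host, a failing health check only changes the container's status, so the "restart" half needs something to act on it. One possible approach, sketched here as a hypothetical `restart_unhealthy` helper that could run from cron:

```shell
#!/usr/bin/env bash
# Sketch only: restart containers whose health check reports "unhealthy".
# restart_unhealthy is a hypothetical helper, e.g. for a cron job.
restart_unhealthy() {
  local name
  docker ps --filter "health=unhealthy" --format '{{.Names}}' |
    while read -r name; do
      [ -n "$name" ] || continue
      echo "restarting unhealthy container: $name"
      docker restart "$name" >/dev/null
    done
}
```

This only helps for containers that define a health check in the first place; the others never report an unhealthy state.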

2. Data corruption

  • Restore from local backup (minutes)
  • Last resort: remote backup (hours)

3. Server failure

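  • Provision a new server and redeploy the stack from Git
  • Restore data from backups (the remote copy if the local one was lost)
  • Update DNS to point at the new server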

Recovery Time Objective (RTO)

| Scenario | Target | Method |
| --- | --- | --- |
| Container restart | <1 min | Docker auto-restart |
| Service failure | <5 min | Manual restart |
| Data corruption | <30 min | Local backup restore |
| Server failure | <4 hours | New server + backup restore |

Recovery Point Objective (RPO)

| Service | Data Loss | Backup Frequency |
| --- | --- | --- |
| Nextcloud | <24 hours | Daily |
| Minecraft | <6 hours | Every 6 hours |
| Configs | <7 days | Weekly |
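
An RPO target is only met if the newest backup is actually younger than the target. The sketch below compares the age of the newest file in a backup directory with a limit in hours. It assumes one directory per service and GNU `find` (for `-printf`); `check_rpo` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: compare the age of the newest backup file with an RPO target.
# check_rpo and the per-service directory layout are hypothetical.
# Requires GNU find for -printf.
check_rpo() {
  local dir="$1" max_hours="$2" newest age
  # %T@ prints each file's modification time as seconds since the epoch
  newest="$(find "$dir" -type f -printf '%T@\n' | sort -n | tail -n 1)"
  [ -n "$newest" ] || { echo "no backups found in $dir"; return 1; }
  # Drop the fractional seconds, then convert the age to whole hours
  age=$(( ($(date +%s) - ${newest%.*}) / 3600 ))
  if [ "$age" -ge "$max_hours" ]; then
    echo "RPO missed: newest backup is ${age}h old (target ${max_hours}h)"
    return 1
  fi
  echo "RPO ok: newest backup is ${age}h old (target ${max_hours}h)"
}
```

For the table above this would mean calls along the lines of `check_rpo ./backups/nextcloud 24` and `check_rpo ./backups/minecraft 6`, where the directory names are placeholders.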

Monitoring & Alerting

Key Metrics

Infrastructure:

  • CPU usage (alert >80%)
  • Memory usage (alert >85%)
  • Disk space (alert >80%)
  • Network throughput

Services:

  • Container status (alert if down >5min)
  • Response time (alert >2s)
  • Error rate (alert >5%)

Business:

  • Minecraft: player count, TPS
  • Nextcloud: active users, storage
  • Backup: last success timestamp
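
For a quick spot check from the command line, the infrastructure thresholds above can be tested by hand. This is a rough sketch for ad-hoc use, not a substitute for Prometheus and Grafana alerting; `check_threshold`, `disk_usage_pct` and `mem_usage_pct` are hypothetical helpers, and `mem_usage_pct` assumes the Linux procps `free` command.

```shell
#!/usr/bin/env bash
# Sketch only: ad-hoc spot check of the infrastructure thresholds above.
# All three function names are hypothetical helpers.

# Print a status line; return 1 when value is above limit.
check_threshold() {
  local name="$1" value="$2" limit="$3"
  if [ "$value" -gt "$limit" ]; then
    echo "ALERT: $name at ${value}% (limit ${limit}%)"
    return 1
  fi
  echo "OK: $name at ${value}% (limit ${limit}%)"
}

# Used space on a filesystem as an integer percentage (POSIX df output)
disk_usage_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Used memory as an integer percentage (procps "free" output, Linux only)
mem_usage_pct() {
  free | awk '/^Mem:/ { printf "%d\n", $3 * 100 / $2 }'
}
```

Typical use would be `check_threshold disk "$(disk_usage_pct /)" 80` and `check_threshold memory "$(mem_usage_pct)" 85`.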

Alert Channels

Current: Grafana alerts

  • Email
  • Webhook

Future options:

  • Telegram bot
  • Discord webhook
  • PagerDuty

Technology Choices

Why These Tools?

| Component | Alternatives | Why Chosen |
| --- | --- | --- |
| Caddy | Nginx, Traefik | Auto HTTPS, simplest config |
| Prometheus | InfluxDB, VictoriaMetrics | Industry standard, huge ecosystem |
| Grafana | Kibana, Chronograf | Best dashboards, most plugins |
| Loki | ELK, Graylog | Far lighter than ELK (indexes labels, not full log text) |
| Watchtower | Manual, Renovate | Set and forget, label-based |
| Duplicati | Restic, Borg | Web UI, widest storage support |
| MariaDB | PostgreSQL, MySQL | Drop-in MySQL replacement, faster |
| Redis | Memcached, KeyDB | Persistence, richer data types |

What We Avoided

| Tool | Why Not |
| --- | --- |
| Kubernetes | Overkill for <10 services, steep learning curve |
| Traefik | Over-engineered for simple reverse proxy |
| ELK Stack | Too heavy (Elasticsearch needs 2-4GB RAM) |
| Zabbix | Old-school, complex setup |
| Ansible | Not needed for single-server Docker Compose |

Future Enhancements

Phase 1 (Done)

  • Reverse proxy (Caddy)
  • Monitoring (Prometheus + Grafana)
  • Logging (Loki)
  • Auto-update (Watchtower)
  • Remote backup (Duplicati)
  • Security (Fail2ban)

Phase 2 (Optional)

  • Alertmanager (notifications)
  • Uptime Kuma (status page)
  • Gitea (self-hosted Git)
  • Vaultwarden (password manager)
  • Homer (dashboard)

Phase 3 (Advanced)

  • Docker Swarm (HA)
  • CI/CD (Drone)
  • Secret management (Vault)
  • Service mesh (if needed)

Development Workflow

Local Testing

```shell
# Test config syntax
docker compose -f compose.yml config

# Start in foreground
docker compose up

# Check logs
docker compose logs -f
```

Deployment

```shell
# Update code
git pull

# Restart services
make down
make up

# Verify
make status
make health
```

Rollback

```shell
# Git rollback
git log
git checkout <previous-commit>

# Or: Restore from backup
```

Documentation

  • README.md - Project overview
  • QUICKSTART.md - 5-minute setup
  • docs/ARCHITECTURE.md - This file
  • docs/IMPLEMENTATION.md - Step-by-step guide
  • infrastructure/README.md - Infrastructure details
  • docs/architecture-recommendations.md - Detailed component analysis
