mirror of https://github.com/m1ngsama/automa.git synced 2026-02-08 06:24:05 +00:00

m1ngsama 49a2621f2f docs: add comprehensive documentation and architecture guides

- Add QUICKSTART.md for 5-minute setup guide
- Add CHEATSHEET.md for quick command reference
- Add OPTIMIZATION_SUMMARY.md with complete architecture overview
- Add detailed architecture documentation in docs/
  - ARCHITECTURE.md: System design and component details
  - IMPLEMENTATION.md: Step-by-step implementation guide
  - architecture-recommendations.md: Component selection rationale
- Add .env.example template for configuration

Following KISS principles and Unix philosophy for self-hosted IaC platform.

2026-01-19 16:31:24 +08:00


Automa Architecture

Self-hosted services platform following Unix philosophy: simple, modular, composable.

Design Principles

KISS - Keep It Simple, Stupid
Single Responsibility - Each service does one thing well
Replaceable - Any component can be swapped
Composable - Services work together via standard interfaces
Observable - Everything is monitored and logged
Recoverable - Regular backups, tested restore procedures

System Overview

┌─────────────────────────────────────────────────────┐
│                    Internet                          │
└───────────────────┬──────────────────────────────────┘
                    │
         ┌──────────▼──────────┐
         │  Firewall (UFW)     │
         │  Fail2ban           │
         └──────────┬──────────┘
                    │
         ┌──────────▼──────────┐
         │  Caddy (80/443)     │
         │  - Auto HTTPS       │
         │  - Reverse Proxy    │
         └──────────┬──────────┘
                    │
      ┌─────────────┼─────────────┐
      │             │             │
┌─────▼─────┐ ┌────▼────┐ ┌─────▼─────┐
│ Nextcloud │ │ Grafana │ │ Minecraft │
│ + MariaDB │ │         │ │ (host net)│
│ + Redis   │ │         │ │           │
└───────────┘ └─────────┘ └───────────┘
      │             │             │
      │       ┌─────▼─────┐       │
      │       │Prometheus │       │
      │       │Loki       │       │
      │       │Promtail   │       │
      │       │cAdvisor   │       │
      │       └───────────┘       │
      │                           │
      └─────────┬─────────────────┘
                │
         ┌──────▼──────┐
         │ Watchtower  │
         │ Duplicati   │
         └─────────────┘
                │
         ┌──────▼──────┐
         │   Backups   │
         │  (Local +   │
         │   Remote)   │
         └─────────────┘

Component Stack

Layer 1: Edge (Internet-facing)

Component	Purpose	Ports	Why
UFW	Firewall	All	Simple front end to the kernel firewall (iptables/nftables)
Fail2ban	Intrusion prevention	-	Auto-ban attackers
Caddy	Reverse proxy + SSL	80, 443	Auto HTTPS, simple config

Layer 2: Applications

Service	Purpose	Ports	Stack
Nextcloud	Private cloud	80→Caddy	PHP + MariaDB + Redis
Minecraft	Game server	25565	Fabric 1.21.1
TeamSpeak	Voice chat	9987/udp, 30033/tcp	TeamSpeak 3

Layer 3: Observability

Component	Purpose	Storage (size / retention)	Why
Prometheus	Metrics DB	10 GB / 30 days	Industry standard
Grafana	Dashboards	500 MB	Best visualization
Loki	Log aggregation	5 GB / 30 days	Lightweight ELK alternative
Promtail	Log collector	-	Pairs with Loki
cAdvisor	Container metrics	-	Docker native

Layer 4: Automation

Component	Purpose	Why
Watchtower	Auto-update images	Label-based, simple
Duplicati	Remote backups	Web UI, encrypted
bin/backup.sh	Local backups	Custom, flexible

Network Architecture

Networks

automa-proxy (172.20.0.0/16)
  ├─ caddy
  ├─ nextcloud
  └─ grafana

automa-monitoring (172.21.0.0/16, internal)
  ├─ prometheus
  ├─ loki
  ├─ promtail
  └─ cadvisor

nextcloud (172.22.0.0/16)
  ├─ nextcloud
  ├─ nextcloud-db
  └─ nextcloud-redis

teamspeak (172.23.0.0/16)
  └─ teamspeak

(host network)
  └─ minecraft  # Needs direct port access for UDP
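
The layout above could be created by hand roughly as follows. This is a sketch, not the project's actual setup: in a Compose deployment the networks would normally be declared in the compose files, and the function name print_network_commands is made up for illustration. It only prints the docker network create commands so they can be reviewed before anything is run.

```shell
# Print (not run) one `docker network create` command per network listed above.
# Names and subnets come from this document; the function itself is illustrative.
print_network_commands() {
  while read -r name subnet flag; do
    # "internal" marks a network without external connectivity.
    if [ "$flag" = "internal" ]; then
      echo "docker network create --internal --subnet $subnet $name"
    else
      echo "docker network create --subnet $subnet $name"
    fi
  done <<'EOF'
automa-proxy 172.20.0.0/16
automa-monitoring 172.21.0.0/16 internal
nextcloud 172.22.0.0/16
teamspeak 172.23.0.0/16
EOF
}
```

Minecraft is left out on purpose because it uses the host network and needs no Docker network of its own.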

Port Mapping

External (public):

80 → Caddy (HTTP → HTTPS redirect)
443 → Caddy (HTTPS)
25565 → Minecraft
9987/udp → TeamSpeak voice
30033/tcp → TeamSpeak file transfer

Internal (localhost only):

3000 → Grafana (proxied via Caddy)
8080 → Nextcloud (proxied via Caddy)
8200 → Duplicati
9090 → Prometheus
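
The public port list above translates into UFW rules along these lines. A minimal sketch under assumptions: the SSH rule on 22/tcp is not part of this document and is added only to avoid a lockout, the protocol for 25565 is not stated here so it is opened for both TCP and UDP, and the function name print_ufw_rules is hypothetical. The function prints the commands instead of running them.

```shell
# Print the UFW commands implied by the "External (public)" list.
# Nothing is executed; review the output before running it as root.
print_ufw_rules() {
  echo "ufw default deny incoming"
  echo "ufw default allow outgoing"
  # Assumption: SSH on 22/tcp must stay reachable (not listed in this document).
  echo "ufw allow 22/tcp"
  # Public ports from this document. A rule without a protocol opens TCP and UDP.
  # 30033 is assumed to be TCP (file transfer).
  for rule in 80/tcp 443/tcp 25565 9987/udp 30033/tcp; do
    echo "ufw allow $rule"
  done
  echo "ufw enable"
}
```

Internal ports (3000, 8080, 8200, 9090) get no rule, so the default deny policy covers them.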

Data Flow

Request Flow

User → Internet → Firewall → Caddy → Application
                                  ↓
                             Prometheus ← Metrics
                                  ↓
                               Grafana ← Query

Log Flow

Container → stdout/stderr → Docker logs → Promtail → Loki → Grafana

Backup Flow

Service data → bin/backup.sh → local backup → Duplicati → remote storage

Storage Strategy

Volume Types

Named volumes (managed by Docker):

Database data (MariaDB)
Cache (Redis)
Monitoring data (Prometheus, Loki, Grafana)
Config (Caddy, Duplicati)

Bind mounts (host filesystem):

Minecraft world/mods/configs (easy access)
Backup output directory
Log files

Backup Strategy

3-2-1 Rule:

3 copies of data
2 different media
1 offsite

Implementation:

Live data (volumes/bind mounts)
Local backup (bin/backup.sh → ./backups/)
Remote backup (Duplicati → S3/SFTP/etc)

Retention:

Local: 7 days
Remote: 30 days
Configs: forever
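
The 7-day local retention could be enforced with a small pruning step like the one below. This is a sketch, not the contents of bin/backup.sh: the function name prune_local_backups is made up, it assumes config backups (kept forever) live outside the pruned directory, and it relies on the -delete option found in GNU and BSD find.

```shell
# Delete local backup files that are older than the retention window.
# $1: backup directory (this document uses ./backups/), $2: days to keep.
prune_local_backups() {
  # -mtime +N matches files whose age, rounded down to whole days, exceeds N.
  # Each deleted path is printed so the run leaves a trace in the logs.
  find "$1" -type f -mtime +"$2" -print -delete
}
```

Example call: prune_local_backups ./backups 7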

Update Strategy

Image Versioning

Pinning strategy:

# ✅ Good - pin major (or major.minor) version, still get patch releases
image: nextcloud:28-apache
image: mariadb:11.2-jammy
image: grafana/grafana:10-alpine

# ⚠️  Acceptable - semantic versioning not available
image: teamspeak:latest

# ❌ Bad - unpredictable
image: nextcloud:latest
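
To spot floating tags before they cause surprises, a compose file can be scanned for image references that use :latest or no tag at all. A rough sketch, assuming unquoted image: values on their own line; the function name find_floating_images is hypothetical.

```shell
# List image references in a Compose file that are not pinned to a version.
# $1: path to a Compose file.
find_floating_images() {
  awk '$1 == "image:" {
    # A tag, if present, sits in the last path segment; this keeps a
    # registry port such as localhost:5000/app from being read as a tag.
    n = split($2, parts, "/")
    if (parts[n] !~ /:/ || parts[n] ~ /:latest$/) print $2
  }' "$1"
}
```

The check also reports teamspeak:latest, which this document accepts, so treat the output as a review list and not as a failure.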

Update Methods

Automatic (Watchtower):

Runs daily
Only updates labeled containers
Good for: Caddy, Grafana, Nextcloud app
Bad for: Databases, critical services

Manual:

docker compose pull
docker compose up -d

Good for: Databases, major version bumps
Requires: Testing, backup first
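
The "backup first" rule for manual updates can be wired into one helper so the order is never skipped. A sketch under assumptions: bin/backup.sh is called without arguments (its real interface is not described here), and the run and manual_update names are made up. With DRY_RUN=1 the steps are only printed.

```shell
# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Back up, pull new images, then recreate containers.
# The chain stops at the first step that fails.
manual_update() {
  run bin/backup.sh &&
  run docker compose pull &&
  run docker compose up -d
}
```

Try it with DRY_RUN=1 manual_update from the repository root before running it for real.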

Security Model

Defense in Depth

Layer 1: Network

UFW firewall (deny all, allow specific)
Fail2ban (auto-ban attackers)

Layer 2: TLS

Caddy auto-HTTPS
Force HTTPS redirect
HSTS headers

Layer 3: Application

Strong passwords (16+ chars)
2FA where available (Nextcloud)
Limited port exposure

Layer 4: Data

Encrypted backups (Duplicati)
Secrets in .env (not in Git)
Read-only mounts where possible

Secrets Management

Current:

.env (git-ignored)
  └─ environment variables
       └─ injected into containers
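
Because .env is git-ignored, it can drift from the .env.example template without anyone noticing. The sketch below lists keys that the template defines but the real file lacks. It assumes plain KEY=value lines; the function name missing_env_keys is hypothetical.

```shell
# Print keys that appear in the template but not in the real env file.
# $1: template (for example .env.example), $2: real env file (for example .env).
missing_env_keys() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | cut -d= -f1 | sort -u |
  while read -r key; do
    # Report the key when no line in the real file assigns it.
    grep -q "^${key}=" "$2" 2>/dev/null || echo "$key"
  done
}
```

Example call: missing_env_keys .env.example .env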

Future option:

Docker secrets (Swarm mode)
SOPS/Age encryption for .env

Resource Planning

Minimum Requirements

Resource	Minimum	Recommended
CPU	4 cores	6-8 cores
RAM	8 GB	16 GB
Disk	100 GB	500 GB SSD
Network	10 Mbps	100 Mbps

Resource Allocation

Heavy services (reserve resources):

Minecraft: 2-4 GB RAM
MariaDB: 500 MB RAM
Prometheus: 500 MB RAM

Light services (minimal):

Caddy: 50 MB RAM
Redis: 100 MB RAM
Watchtower: 30 MB RAM
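
As a quick sanity check, the figures above can be added up and compared with the host's RAM. The sketch uses the upper bound for Minecraft (4 GB) and covers only the six services listed here; Nextcloud, Grafana, Loki and the rest must fit into the remaining headroom. Function names are illustrative.

```shell
# Sum of the per-service RAM figures listed above, in MB.
reserved_mb() {
  # Minecraft + MariaDB + Prometheus + Caddy + Redis + Watchtower
  echo $(( 4096 + 500 + 500 + 50 + 100 + 30 ))
}

# Remaining memory for everything else. $1: total host RAM in MB.
headroom_mb() {
  echo $(( $1 - $(reserved_mb) ))
}
```

On the 8 GB minimum, headroom_mb 8192 leaves 2916 MB for the unlisted services and the OS.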

Scaling Strategy

Vertical (single server):

Add RAM → increase Minecraft players
Add CPU → faster builds/queries
Add disk → longer retention

Horizontal (multiple servers):

Separate services by server
Example: Minecraft on server 1, Nextcloud on server 2
Use remote monitoring (Prometheus federation)

High Availability (Future)

Current state: Single server

No HA (single point of failure)
Acceptable for home lab

HA options:

Docker Swarm (orchestration)
Load balancer (HAProxy/Caddy)
Shared storage (NFS/GlusterFS)
Database replication (MariaDB primary/replica)

Cost/benefit:

Adds significant complexity
Not recommended for <10 users

Disaster Recovery

Scenarios

1. Service crash

Auto-restart: restart: unless-stopped
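Health checks: mark hung containers as unhealthy (plain Docker Compose does not restart unhealthy containers by itself; that needs an extra watcher or Swarm)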

2. Data corruption

Restore from local backup (minutes)
Last resort: remote backup (hours)

3. Server failure

Provision a new server and redeploy from Git
Restore data from remote backups
Update DNS

Recovery Time Objective (RTO)

Scenario	Target	Method
Container restart	<1 min	Docker auto-restart
Service failure	<5 min	Manual restart
Data corruption	<30 min	Local backup restore
Server failure	<4 hours	New server + backup restore

Recovery Point Objective (RPO)

Service	Data Loss	Backup Frequency
Nextcloud	<24 hours	Daily
Minecraft	<6 hours	Every 6 hours
Configs	<7 days	Weekly
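
An RPO only holds if the backups actually run. The check below reports whether a backup directory contains at least one file newer than the RPO. It is a sketch: the function name backup_is_fresh and the per-service directory layout in the example call are assumptions, and -mmin is a GNU/BSD find option.

```shell
# Succeed if the directory holds at least one file modified within the RPO.
# $1: backup directory, $2: RPO in hours.
backup_is_fresh() {
  # -mmin -N matches files modified less than N minutes ago.
  [ -n "$(find "$1" -type f -mmin -"$(( $2 * 60 ))" -print 2>/dev/null | head -n 1)" ]
}
```

Example call: backup_is_fresh ./backups/minecraft 6 || echo "Minecraft backup is older than its RPO"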

Monitoring & Alerting

Key Metrics

Infrastructure:

CPU usage (alert >80%)
Memory usage (alert >85%)
Disk space (alert >80%)
Network throughput
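
The infrastructure thresholds boil down to three comparisons. In this stack they would live in Grafana alert rules; the sketch below only illustrates the logic with whole-number percentages, and the function name check_infra_thresholds is made up.

```shell
# Print one line per infrastructure metric that exceeds its alert threshold.
# $1: CPU %, $2: memory %, $3: disk % (whole numbers).
check_infra_thresholds() {
  if [ "$1" -gt 80 ]; then echo "cpu ${1}% > 80%"; fi
  if [ "$2" -gt 85 ]; then echo "memory ${2}% > 85%"; fi
  if [ "$3" -gt 80 ]; then echo "disk ${3}% > 80%"; fi
  return 0
}
```

A value exactly at the limit does not alert, matching the ">" in the list above.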

Services:

Container status (alert if down >5min)
Response time (alert >2s)
Error rate (alert >5%)

Business:

Minecraft: player count, TPS
Nextcloud: active users, storage
Backup: last success timestamp

Alert Channels

Current: Grafana alerts

Email
Webhook

Future options:

Telegram bot
Discord webhook
PagerDuty

Technology Choices

Why These Tools?

Component	Alternatives	Why Chosen
Caddy	Nginx, Traefik	Auto HTTPS, simplest config
Prometheus	InfluxDB, VictoriaMetrics	Industry standard, huge ecosystem
Grafana	Kibana, Chronograf	Best dashboards, most plugins
Loki	ELK, Graylog	Much lighter than ELK (indexes labels, not full log text)
Watchtower	Manual, Renovate	Set and forget, label-based
Duplicati	Restic, Borg	Web UI, widest storage support
MariaDB	PostgreSQL, MySQL	Drop-in MySQL replacement, supported by Nextcloud
Redis	Memcached, KeyDB	Persistence, richer data types

What We Avoided

Tool	Why Not
Kubernetes	Overkill for <10 services, steep learning curve
Traefik	Over-engineered for simple reverse proxy
ELK Stack	Too heavy (Elasticsearch needs 2-4GB RAM)
Zabbix	Old-school, complex setup
Ansible	Not needed for single-server Docker Compose

Future Enhancements

Phase 1 (Done)

✅ Reverse proxy (Caddy)
✅ Monitoring (Prometheus + Grafana)
✅ Logging (Loki)
✅ Auto-update (Watchtower)
✅ Remote backup (Duplicati)
✅ Security (Fail2ban)

Phase 2 (Optional)

Alertmanager (notifications)
Uptime Kuma (status page)
Gitea (self-hosted Git)
Vaultwarden (password manager)
Homer (dashboard)

Phase 3 (Advanced)

Docker Swarm (HA)
CI/CD (Drone)
Secret management (Vault)
Service mesh (if needed)

Development Workflow

Local Testing

# Test config syntax
docker compose -f compose.yml config

# Start in foreground
docker compose up

# Check logs
docker compose logs -f

Deployment

# Update code
git pull

# Restart services
make down
make up

# Verify
make status
make health

Rollback

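# Git rollback: find the last known-good commit and check it out
git log --oneline
git checkout <previous-commit>

# Restart services on the rolled-back code
make down
make up

# Or: restore data from backup (see Disaster Recovery)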

Documentation

README.md - Project overview
QUICKSTART.md - 5-minute setup
docs/ARCHITECTURE.md - This file
docs/IMPLEMENTATION.md - Step-by-step guide
infrastructure/README.md - Infrastructure details
docs/architecture-recommendations.md - Detailed component analysis
