Automa Architecture
Self-hosted services platform following Unix philosophy: simple, modular, composable.
Design Principles
- KISS - Keep It Simple, Stupid
- Single Responsibility - Each service does one thing well
- Replaceable - Any component can be swapped
- Composable - Services work together via standard interfaces
- Observable - Everything is monitored and logged
- Recoverable - Regular backups, tested restore procedures
System Overview
┌─────────────────────────────────────────────────────┐
│ Internet │
└───────────────────┬──────────────────────────────────┘
│
┌──────────▼──────────┐
│ Firewall (UFW) │
│ Fail2ban │
└──────────┬──────────┘
│
┌──────────▼──────────┐
│ Caddy (80/443) │
│ - Auto HTTPS │
│ - Reverse Proxy │
└──────────┬──────────┘
│
┌─────────────┼─────────────┐
│ │ │
┌─────▼─────┐ ┌────▼────┐ ┌─────▼─────┐
│ Nextcloud │ │ Grafana │ │ Minecraft │
│ + MariaDB │ │ │ │ (host net)│
│ + Redis │ │ │ │ │
└───────────┘ └─────────┘ └───────────┘
│ │ │
│ ┌─────▼─────┐ │
│ │Prometheus │ │
│ │Loki │ │
│ │Promtail │ │
│ │cAdvisor │ │
│ └───────────┘ │
│ │
└─────────┬─────────────────┘
│
┌──────▼──────┐
│ Watchtower │
│ Duplicati │
└─────────────┘
│
┌──────▼──────┐
│ Backups │
│ (Local + │
│ Remote) │
└─────────────┘
Component Stack
Layer 1: Edge (Internet-facing)
| Component | Purpose | Ports | Why |
|---|---|---|---|
| UFW | Firewall | All | Simple, built-in Linux |
| Fail2ban | Intrusion prevention | - | Auto-ban attackers |
| Caddy | Reverse proxy + SSL | 80, 443 | Auto HTTPS, simple config |
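Caddy's job at the edge needs little more than ports 80/443, a Caddyfile, and somewhere to persist certificates. A minimal sketch of the edge service, assuming the network names used later in this document (image tag and paths are illustrative, not the exact compose file):

```yaml
services:
  caddy:
    image: caddy:2-alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro   # site definitions and reverse_proxy rules
      - caddy-data:/data                      # ACME certificates persist here
    networks:
      - automa-proxy

volumes:
  caddy-data:

networks:
  automa-proxy:
    external: true   # assumed to be created centrally, see Network Architecture
```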
Layer 2: Applications
| Service | Purpose | Ports | Stack |
|---|---|---|---|
| Nextcloud | Private cloud | 80→Caddy | PHP + MariaDB + Redis |
| Minecraft | Game server | 25565 | Fabric 1.21.1 |
| TeamSpeak | Voice chat | 9987 | TeamSpeak 3 |
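The Nextcloud row actually spans three containers. A rough sketch of that stack, with illustrative service names and credentials drawn from .env:

```yaml
services:
  nextcloud:
    image: nextcloud:28-apache
    environment:
      - MYSQL_HOST=nextcloud-db
      - MYSQL_DATABASE=nextcloud
      - MYSQL_USER=nextcloud
      - MYSQL_PASSWORD=${NEXTCLOUD_DB_PASSWORD}
      - REDIS_HOST=nextcloud-redis
    networks: [automa-proxy, nextcloud]

  nextcloud-db:
    image: mariadb:11.2-jammy
    environment:
      - MYSQL_ROOT_PASSWORD=${NEXTCLOUD_DB_ROOT_PASSWORD}
      - MYSQL_DATABASE=nextcloud
      - MYSQL_USER=nextcloud
      - MYSQL_PASSWORD=${NEXTCLOUD_DB_PASSWORD}
    networks: [nextcloud]

  nextcloud-redis:
    image: redis:7-alpine
    networks: [nextcloud]
```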
Layer 3: Observability
| Component | Purpose | Storage | Why |
|---|---|---|---|
| Prometheus | Metrics DB | 10GB/30d | Industry standard |
| Grafana | Dashboards | 500MB | Best visualization |
| Loki | Log aggregation | 5GB/30d | Lightweight ELK alternative |
| Promtail | Log collector | - | Pairs with Loki |
| cAdvisor | Container metrics | - | Docker native |
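Prometheus reaches the other collectors over the internal monitoring network by container name, while Promtail pushes logs to Loki separately. A sketch of what the scrape configuration could look like (job names and interval are assumptions):

```yaml
global:
  scrape_interval: 30s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: cadvisor
    static_configs:
      - targets: ["cadvisor:8080"]   # container metrics on the automa-monitoring network
```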
Layer 4: Automation
| Component | Purpose | Why |
|---|---|---|
| Watchtower | Auto-update images | Label-based, simple |
| Duplicati | Remote backups | Web UI, encrypted |
| bin/backup.sh | Local backups | Custom, flexible |
Network Architecture
Networks
automa-proxy (172.20.0.0/16)
├─ caddy
├─ nextcloud
└─ grafana
automa-monitoring (172.21.0.0/16, internal)
├─ prometheus
├─ loki
├─ promtail
└─ cadvisor
nextcloud (172.22.0.0/16)
├─ nextcloud
├─ nextcloud-db
└─ nextcloud-redis
teamspeak (172.23.0.0/16)
└─ teamspeak
(host network)
└─ minecraft # Needs direct port access for UDP
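In Compose terms, the layout above could be declared roughly like this; pinning the subnets keeps firewall rules and dashboards stable when networks are recreated (the teamspeak network follows the same pattern):

```yaml
networks:
  automa-proxy:
    name: automa-proxy
    ipam:
      config:
        - subnet: 172.20.0.0/16

  automa-monitoring:
    name: automa-monitoring
    internal: true              # no route to the internet
    ipam:
      config:
        - subnet: 172.21.0.0/16

  nextcloud:
    name: nextcloud
    ipam:
      config:
        - subnet: 172.22.0.0/16
```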
Port Mapping
External (public):
- 80 → Caddy (HTTP → HTTPS redirect)
- 443 → Caddy (HTTPS)
- 25565 → Minecraft
- 9987/udp → TeamSpeak voice
- 30033 → TeamSpeak file transfer
Internal (localhost only):
- 3000 → Grafana (proxied via Caddy)
- 8080 → Nextcloud (proxied via Caddy)
- 8200 → Duplicati
- 9090 → Prometheus
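The split is implemented in how ports are published: public services bind on all interfaces, internal ones bind to 127.0.0.1 and are reached through Caddy. A sketch of the pattern (exact service definitions vary):

```yaml
services:
  minecraft:
    ports:
      - "25565:25565"            # public
  grafana:
    ports:
      - "127.0.0.1:3000:3000"    # localhost only, proxied by Caddy
```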
Data Flow
Request Flow
User → Internet → Firewall → Caddy → Application
Application metrics → Prometheus → Grafana (dashboards, queries)
Log Flow
Container → stdout/stderr → Docker logs → Promtail → Loki → Grafana
Backup Flow
Service data → bin/backup.sh → local backup → Duplicati → remote storage
Storage Strategy
Volume Types
Named volumes (managed by Docker):
- Database data (MariaDB)
- Cache (Redis)
- Monitoring data (Prometheus, Loki, Grafana)
- Config (Caddy, Duplicati)
Bind mounts (host filesystem):
- Minecraft world/mods/configs (easy access)
- Backup output directory
- Log files
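Side by side, the two styles could look like this (image names and paths are only examples):

```yaml
services:
  nextcloud-db:
    volumes:
      - nextcloud-db-data:/var/lib/mysql     # named volume, Docker-managed
  minecraft:
    volumes:
      - ./minecraft/data:/data               # bind mount, easy host access

volumes:
  nextcloud-db-data:
```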
Backup Strategy
3-2-1 Rule:
- 3 copies of data
- 2 different media
- 1 offsite
Implementation:
- Live data (volumes/bind mounts)
- Local backup (bin/backup.sh → ./backups/)
- Remote backup (Duplicati → S3/SFTP/etc)
Retention:
- Local: 7 days
- Remote: 30 days
- Configs: forever
Update Strategy
Image Versioning
Pinning strategy:
# ✅ Good - pin major version, get patches
image: nextcloud:28-apache
image: mariadb:11.2-jammy
image: grafana/grafana:10-alpine
# ⚠️ Acceptable - semantic versioning not available
image: teamspeak:latest
# ❌ Bad - unpredictable
image: nextcloud:latest
Update Methods
Automatic (Watchtower):
- Runs daily
- Only updates labeled containers
- Good for: Caddy, Grafana, Nextcloud app
- Bad for: Databases, critical services
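Label-based means containers opt in explicitly and Watchtower ignores everything else. A sketch of that wiring (the schedule and the opted-in service are assumptions):

```yaml
services:
  watchtower:
    image: containrrr/watchtower
    restart: unless-stopped
    environment:
      - WATCHTOWER_LABEL_ENABLE=true        # only update labelled containers
      - WATCHTOWER_SCHEDULE=0 0 4 * * *     # assumed: daily at 04:00
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

  grafana:
    image: grafana/grafana:10-alpine
    labels:
      - com.centurylinklabs.watchtower.enable=true   # opt in to auto-updates
```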
Manual:
docker compose pull
docker compose up -d
- Good for: Databases, major version bumps
- Requires: Testing, backup first
Security Model
Defense in Depth
Layer 1: Network
- UFW firewall (deny all, allow specific)
- Fail2ban (auto-ban attackers)
Layer 2: TLS
- Caddy auto-HTTPS
- Force HTTPS redirect
- HSTS headers
Layer 3: Application
- Strong passwords (16+ chars)
- 2FA where available (Nextcloud)
- Limited port exposure
Layer 4: Data
- Encrypted backups (Duplicati)
- Secrets in .env (not in Git)
- Read-only mounts where possible
Secrets Management
Current:
.env (git-ignored)
└─ environment variables
└─ injected into containers
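Compose substitutes ${VAR} references in the compose files from .env at startup; for example (variable names are illustrative):

```yaml
services:
  grafana:
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}   # read from .env by Compose
```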
Future option:
- Docker secrets (Swarm mode)
- SOPS/Age encryption for .env
Resource Planning
Minimum Requirements
| Resource | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 6-8 cores |
| RAM | 8 GB | 16 GB |
| Disk | 100 GB | 500 GB SSD |
| Network | 10 Mbps | 100 Mbps |
Resource Allocation
Heavy services (reserve resources):
- Minecraft: 2-4 GB RAM
- MariaDB: 500 MB RAM
- Prometheus: 500 MB RAM
Light services (minimal):
- Caddy: 50 MB RAM
- Redis: 100 MB RAM
- Watchtower: 30 MB RAM
Scaling Strategy
Vertical (single server):
- Add RAM → increase Minecraft players
- Add CPU → faster builds/queries
- Add disk → longer retention
Horizontal (multiple servers):
- Separate services by server
- Example: Minecraft on server 1, Nextcloud on server 2
- Use remote monitoring (Prometheus federation)
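For the federation option, the central Prometheus would pull selected series from a remote instance; a sketch (the remote address and match filter are placeholders):

```yaml
scrape_configs:
  - job_name: federate
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{job=~".+"}'          # which series to pull from the remote server
    static_configs:
      - targets: ["server2.example.com:9090"]
```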
High Availability (Future)
Current state: Single server
- No HA (single point of failure)
- Acceptable for home lab
HA options:
- Docker Swarm (orchestration)
- Load balancer (HAProxy/Caddy)
- Shared storage (NFS/GlusterFS)
- Database replication (MariaDB master-slave)
Cost/benefit:
- Adds significant complexity
- Not recommended for <10 users
Disaster Recovery
Scenarios
1. Service crash
- Auto-restart: `restart: unless-stopped`
- Health checks: detect and restart unhealthy containers (see the sketch after this list)
2. Data corruption
- Restore from local backup (minutes)
- Last resort: remote backup (hours)
3. Server failure
- Restore to new server
- Restore backups
- Update DNS
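For scenario 1, the restart policy and a health check sit on the service definition; a sketch (the probe assumes curl is available in the image and the endpoint is illustrative):

```yaml
services:
  nextcloud:
    image: nextcloud:28-apache
    restart: unless-stopped               # bring crashed containers back automatically
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/status.php"]
      interval: 30s
      timeout: 10s
      retries: 3
```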
Recovery Time Objective (RTO)
| Scenario | Target | Method |
|---|---|---|
| Container restart | <1 min | Docker auto-restart |
| Service failure | <5 min | Manual restart |
| Data corruption | <30 min | Local backup restore |
| Server failure | <4 hours | New server + backup restore |
Recovery Point Objective (RPO)
| Service | Data Loss | Backup Frequency |
|---|---|---|
| Nextcloud | <24 hours | Daily |
| Minecraft | <6 hours | Every 6 hours |
| Configs | <7 days | Weekly |
Monitoring & Alerting
Key Metrics
Infrastructure:
- CPU usage (alert >80%)
- Memory usage (alert >85%)
- Disk space (alert >80%)
- Network throughput
Services:
- Container status (alert if down >5 min; see the rule sketch after this list)
- Response time (alert >2s)
- Error rate (alert >5%)
Business:
- Minecraft: player count, TPS
- Nextcloud: active users, storage
- Backup: last success timestamp
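As an illustration, the "container down >5 min" condition could be expressed as a Prometheus rule using cAdvisor's container_last_seen metric; in the current setup the equivalent condition would be configured as a Grafana alert instead:

```yaml
groups:
  - name: containers
    rules:
      - alert: ContainerDown
        expr: time() - container_last_seen{name="nextcloud"} > 300   # example target container
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.name }} has not reported for over 5 minutes"
```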
Alert Channels
Current: Grafana alerts
- Webhook
Future options:
- Telegram bot
- Discord webhook
- PagerDuty
Technology Choices
Why These Tools?
| Component | Alternatives | Why Chosen |
|---|---|---|
| Caddy | Nginx, Traefik | Auto HTTPS, simplest config |
| Prometheus | InfluxDB, VictoriaMetrics | Industry standard, huge ecosystem |
| Grafana | Kibana, Chronograf | Best dashboards, most plugins |
| Loki | ELK, Graylog | 10x lighter than ELK |
| Watchtower | Manual, Renovate | Set and forget, label-based |
| Duplicati | Restic, Borg | Web UI, widest storage support |
| MariaDB | PostgreSQL, MySQL | Drop-in MySQL replacement, faster |
| Redis | Memcached, KeyDB | Persistence, richer data types |
What We Avoided
| Tool | Why Not |
|---|---|
| Kubernetes | Overkill for <10 services, steep learning curve |
| Traefik | Over-engineered for simple reverse proxy |
| ELK Stack | Too heavy (Elasticsearch needs 2-4GB RAM) |
| Zabbix | Old-school, complex setup |
| Ansible | Not needed for single-server Docker Compose |
Future Enhancements
Phase 1 (Done)
- ✅ Reverse proxy (Caddy)
- ✅ Monitoring (Prometheus + Grafana)
- ✅ Logging (Loki)
- ✅ Auto-update (Watchtower)
- ✅ Remote backup (Duplicati)
- ✅ Security (Fail2ban)
Phase 2 (Optional)
- Alertmanager (notifications)
- Uptime Kuma (status page)
- Gitea (self-hosted Git)
- Vaultwarden (password manager)
- Homer (dashboard)
Phase 3 (Advanced)
- Docker Swarm (HA)
- CI/CD (Drone)
- Secret management (Vault)
- Service mesh (if needed)
Development Workflow
Local Testing
# Test config syntax
docker compose -f compose.yml config
# Start in foreground
docker compose up
# Check logs
docker compose logs -f
Deployment
# Update code
git pull
# Restart services
make down
make up
# Verify
make status
make health
Rollback
# Git rollback
git log
git checkout <previous-commit>
# Or: Restore from backup
Documentation
- README.md - Project overview
- QUICKSTART.md - 5-minute setup
- docs/ARCHITECTURE.md - This file
- docs/IMPLEMENTATION.md - Step-by-step guide
- infrastructure/README.md - Infrastructure details
- docs/architecture-recommendations.md - Detailed component analysis