# ☸️ Kubernetes Cluster - AiWorker
**Location**: Houston, Texas (us-hou-1)
**K3s**: v1.35.0+k3s1
**Cost**: $148/month
---
## 🖥️ Servers
| Hostname | Public IP | Private IP | Specs | Role |
|----------------|-----------------|-------------|------------------|---------------|
| k8s-cp-01 | 108.165.47.233 | 10.100.0.2 | 4 vCPU, 8 GB | control-plane |
| k8s-cp-02 | 108.165.47.235 | 10.100.0.3 | 4 vCPU, 8 GB | control-plane |
| k8s-cp-03 | 108.165.47.215 | 10.100.0.4 | 4 vCPU, 8 GB | control-plane |
| k8s-worker-01 | 108.165.47.225 | 10.100.0.5 | 8 vCPU, 16 GB | worker |
| k8s-worker-02 | 108.165.47.224 | 10.100.0.6 | 8 vCPU, 16 GB | worker |
| k8s-worker-03 | 108.165.47.222 | 10.100.0.7 | 8 vCPU, 16 GB | worker |
| k8s-lb-01 | 108.165.47.221 | 10.100.0.8 | 2 vCPU, 4 GB | load-balancer |
| k8s-lb-02 | 108.165.47.203 | 10.100.0.9 | 2 vCPU, 4 GB | load-balancer |
**Total**: 40 vCPU, 80 GB RAM, ~2.9 TB storage
---
## 🌐 Access
### Kubeconfig
```bash
export KUBECONFIG=~/.kube/aiworker-config
kubectl get nodes
```
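If you also keep other clusters in `~/.kube/config`, the file can be merged in via the `KUBECONFIG` path list instead of exported on its own. A minimal sketch; the context name below is the K3s default and may differ here:
```bash
# Make both kubeconfigs visible and switch to the AiWorker cluster
export KUBECONFIG=~/.kube/config:~/.kube/aiworker-config
kubectl config get-contexts
kubectl config use-context default   # K3s names its context "default" by default
```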
### SSH
```bash
ssh root@108.165.47.233 # cp-01
ssh root@108.165.47.225 # worker-01
ssh root@108.165.47.221 # lb-01
```
---
## 📦 Installed Components
| Software | Version | Namespace | URL |
|-----------------|--------------|-----------------|-----|
| K3s | v1.35.0+k3s1 | - | - |
| Longhorn | v1.8.0 | longhorn-system | https://longhorn.fuq.tv |
| Nginx Ingress | latest | ingress-nginx | - |
| Cert-Manager | v1.16.2 | cert-manager | - |
| MariaDB | 11.4 LTS | control-plane | mariadb:3306 |
| Redis | 7 | control-plane | redis:6379 |
| Gitea | 1.25.3 | gitea | https://git.fuq.tv |
| ArgoCD | stable | argocd | https://argocd.fuq.tv |
| Gitea Runner | latest | gitea-actions | - |
---
## 🗄️ Storage (Longhorn HA)
- **StorageClass**: `longhorn` (default)
- **Replication**: 3 replicas per volume
- **Fault tolerance**: survives the loss of up to 2 workers without data loss
**Volumes**:
- `mariadb-pvc`: 20Gi (control-plane)
- `gitea-data`: 50Gi (gitea)
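A minimal sketch of requesting a new Longhorn-backed volume; the `app-data` name, namespace, and 10Gi size are illustrative:
```bash
# Create a PVC on the default longhorn StorageClass (3 replicas automatic)
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  namespace: control-plane
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
EOF

# Verify the volume was bound
kubectl get pvc app-data -n control-plane
```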
---
## 🌍 DNS and Networking
**DNS** (*.fuq.tv → Load Balancers):
```
*.fuq.tv → 108.165.47.221, 108.165.47.203
*.r.fuq.tv → 108.165.47.221, 108.165.47.203
```
**Private network**: 10.100.0.0/24 (eth1)
**Load balancing**: HAProxy on k8s-lb-01 and k8s-lb-02 → worker NodePorts
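To verify the round-robin DNS and the LB path from outside the cluster (the hostname `test.fuq.tv` is illustrative; any name under the wildcard works):
```bash
# Both LB IPs should come back, with order varying per query
dig +short test.fuq.tv

# An Ingress-backed host should answer through either LB
curl -I https://git.fuq.tv
```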
---
## 📋 Namespaces
| Namespace | Purpose | CPU/RAM Quota |
|-----------------|------------------------|---------------|
| control-plane | Backend, DB, Redis | 8 CPU, 16 GB |
| agents | Claude agents | 20 CPU, 40 GB |
| gitea | Git server | 2 CPU, 4 GB |
| gitea-actions | CI/CD runner | - |
| argocd | GitOps | - |
| ingress-nginx | Ingress controller | - |
| cert-manager | TLS management | - |
| longhorn-system | Distributed storage | - |
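Assuming the quotas are enforced as standard `ResourceQuota` objects, current usage versus limits can be checked with:
```bash
# Usage vs. limits for one namespace
kubectl describe resourcequota -n agents

# Overview across all namespaces
kubectl get resourcequota -A
```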
---
## ⚡ Essential Commands
```bash
# Status
kubectl get nodes -o wide
kubectl get pods -A
# Resources
kubectl top nodes
kubectl top pods -A
# Deploy
kubectl apply -f k8s/backend/
kubectl rollout status deployment/backend -n control-plane
# Logs
kubectl logs -f deployment/backend -n control-plane
# Troubleshooting
kubectl describe pod <pod> -n <namespace>
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
```
---
## 🔐 Internal Connections (Cluster DNS)
```
mariadb.control-plane.svc.cluster.local:3306
redis.control-plane.svc.cluster.local:6379
gitea.gitea.svc.cluster.local:3000
```
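A quick in-cluster check that these names resolve and answer; the `busybox` image and the pod names are arbitrary:
```bash
# DNS resolution from inside the cluster
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup mariadb.control-plane.svc.cluster.local

# HTTP check against Gitea's internal service
kubectl run -it --rm http-test --image=busybox:1.36 --restart=Never -- \
  wget -qO- http://gitea.gitea.svc.cluster.local:3000
```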
---
## 💪 High Availability
| Component | Implementation | Fault Tolerance |
|----------------|--------------------------|---------------------|
| Control plane | 3 etcd nodes | 1 node |
| Workers | 3 nodes | 2 nodes |
| Load balancers | DNS round-robin | 1 LB |
| Storage | Longhorn 3x replication | 2 workers |
| Ingress | On all workers | 2 workers |
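A sketch of a failover drill to confirm the worker tolerance in practice (run during a quiet window and uncordon when done):
```bash
# Evacuate one worker and watch workloads reschedule
kubectl drain k8s-worker-01 --ignore-daemonsets --delete-emptydir-data
kubectl get pods -A -o wide    # pods should land on worker-02/03

# Bring the node back into rotation
kubectl uncordon k8s-worker-01
```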
---
## 🔧 Maintenance
### Backup
```bash
# etcd
ssh root@108.165.47.233 "k3s etcd-snapshot save"
# MariaDB
kubectl exec -n control-plane mariadb-0 -- \
  mariadb-dump -uroot -pAiWorker2026_RootPass! --all-databases > backup.sql
```
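A restore sketch for the etcd side, assuming snapshots land in the default K3s directory; `<snapshot-name>` is a placeholder (see `k3s etcd-snapshot ls` for the actual names):
```bash
# On one control-plane node: stop K3s, then reset the cluster from a snapshot
ssh root@108.165.47.233 "systemctl stop k3s && \
  k3s server --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/<snapshot-name>"
# Afterwards, restart k3s on this node and rejoin cp-02/cp-03 per the K3s docs
```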
### Upgrade K3s
```bash
# Control-plane nodes first, one at a time, then workers
ssh root@<worker-ip> "curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.X.X+k3s1 sh -"
```
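Draining around the upgrade keeps workloads moving. Note that re-running the installer on a worker (agent) node needs the same `K3S_URL`/`K3S_TOKEN` it was originally installed with; both values below are placeholders:
```bash
# Per node: drain, upgrade, verify, uncordon
kubectl drain k8s-worker-01 --ignore-daemonsets --delete-emptydir-data
ssh root@108.165.47.225 "curl -sfL https://get.k3s.io | \
  INSTALL_K3S_VERSION=v1.X.X+k3s1 K3S_URL=<server-url> K3S_TOKEN=<token> sh -"
kubectl get nodes               # confirm the new VERSION column
kubectl uncordon k8s-worker-01
```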
---
**Full setup**: `CLUSTER-SETUP-COMPLETE.md`
**Troubleshooting**: `TROUBLESHOOTING.md`
**For AI agents**: `AGENT-GUIDE.md`