# ☸️ Kubernetes Cluster - AiWorker
**Location**: Houston, Texas (us-hou-1)
**K3s**: v1.35.0+k3s1
**Cost**: $148/month
---
## 🖥️ Servers
| Hostname | Public IP | Private IP | Specs | Role |
|----------------|-----------------|-------------|------------------|---------------|
| k8s-cp-01 | 108.165.47.233 | 10.100.0.2 | 4 vCPU, 8 GB | control-plane |
| k8s-cp-02 | 108.165.47.235 | 10.100.0.3 | 4 vCPU, 8 GB | control-plane |
| k8s-cp-03 | 108.165.47.215 | 10.100.0.4 | 4 vCPU, 8 GB | control-plane |
| k8s-worker-01 | 108.165.47.225 | 10.100.0.5 | 8 vCPU, 16 GB | worker |
| k8s-worker-02 | 108.165.47.224 | 10.100.0.6 | 8 vCPU, 16 GB | worker |
| k8s-worker-03 | 108.165.47.222 | 10.100.0.7 | 8 vCPU, 16 GB | worker |
| k8s-lb-01 | 108.165.47.221 | 10.100.0.8 | 2 vCPU, 4 GB | load-balancer |
| k8s-lb-02 | 108.165.47.203 | 10.100.0.9 | 2 vCPU, 4 GB | load-balancer |
**Total**: 40 vCPU, 80 GB RAM, ~2.9 TB storage
---
## 🌐 Access
### Kubeconfig
```bash
export KUBECONFIG=~/.kube/aiworker-config
kubectl get nodes
```
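If you also keep other clusters in `~/.kube/config`, the file can be merged in via the `KUBECONFIG` path list instead of exported on its own. A minimal sketch; the context name below is the K3s default and may differ here:
```bash
# Make both kubeconfigs visible and switch to the AiWorker cluster
export KUBECONFIG=~/.kube/config:~/.kube/aiworker-config
kubectl config get-contexts
kubectl config use-context default   # K3s names its context "default" by default
```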
### SSH
```bash
ssh root@108.165.47.233 # cp-01
ssh root@108.165.47.225 # worker-01
ssh root@108.165.47.221 # lb-01
```
---
## 📦 Installed Components
| Software | Version | Namespace | URL |
|-----------------|--------------|-----------------|-----|
| K3s | v1.35.0+k3s1 | - | - |
| Longhorn | v1.8.0 | longhorn-system | https://longhorn.fuq.tv |
| Nginx Ingress | latest | ingress-nginx | - |
| Cert-Manager | v1.16.2 | cert-manager | - |
| MariaDB | 11.4 LTS | control-plane | mariadb:3306 |
| Redis | 7 | control-plane | redis:6379 |
| Gitea | 1.25.3 | gitea | https://git.fuq.tv |
| ArgoCD | stable | argocd | https://argocd.fuq.tv |
| Gitea Runner | latest | gitea-actions | - |
---
## 🗄️ Storage (Longhorn HA)
- **StorageClass**: `longhorn` (default)
- **Replication**: 3 replicas per volume
- **Fault tolerance**: survives the loss of up to 2 workers without data loss
**Volumes**:
- `mariadb-pvc`: 20Gi (control-plane)
- `gitea-data`: 50Gi (gitea)
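A minimal sketch of requesting a new Longhorn-backed volume; the `app-data` name, namespace, and 10Gi size are illustrative:
```bash
# Create a PVC on the default longhorn StorageClass (3 replicas automatic)
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  namespace: control-plane
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
EOF

# Verify the volume was bound
kubectl get pvc app-data -n control-plane
```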
---
## 🌍 DNS and Networking
**DNS** (*.fuq.tv → Load Balancers):
```
*.fuq.tv → 108.165.47.221, 108.165.47.203
*.r.fuq.tv → 108.165.47.221, 108.165.47.203
```
**Private network**: 10.100.0.0/24 (eth1)
**Load balancing**: HAProxy on k8s-lb-01 and k8s-lb-02 → worker NodePorts
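To verify the round-robin DNS and the LB path from outside the cluster (the hostname `test.fuq.tv` is illustrative; any name under the wildcard works):
```bash
# Both LB IPs should come back, with order varying per query
dig +short test.fuq.tv

# An Ingress-backed host should answer through either LB
curl -I https://git.fuq.tv
```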
---
## 📋 Namespaces
| Namespace | Purpose | CPU/RAM Quota |
|-----------------|------------------------|---------------|
| control-plane | Backend, DB, Redis | 8 CPU, 16 GB |
| agents | Claude agents | 20 CPU, 40 GB |
| gitea | Git server | 2 CPU, 4 GB |
| gitea-actions | CI/CD runner | - |
| argocd | GitOps | - |
| ingress-nginx | Ingress controller | - |
| cert-manager | TLS management | - |
| longhorn-system | Distributed storage | - |
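Assuming the quotas are enforced as standard `ResourceQuota` objects, current usage versus limits can be checked with:
```bash
# Usage vs. limits for one namespace
kubectl describe resourcequota -n agents

# Overview across all namespaces
kubectl get resourcequota -A
```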
---
## ⚡ Essential Commands
```bash
# Status
kubectl get nodes -o wide
kubectl get pods -A
# Resources
kubectl top nodes
kubectl top pods -A
# Deploy
kubectl apply -f k8s/backend/
kubectl rollout status deployment/backend -n control-plane
# Logs
kubectl logs -f deployment/backend -n control-plane
# Troubleshooting
kubectl describe pod <pod> -n <namespace>
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
```
---
## 🔐 Internal Connections (Cluster DNS)
```
mariadb.control-plane.svc.cluster.local:3306
redis.control-plane.svc.cluster.local:6379
gitea.gitea.svc.cluster.local:3000
```
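A quick in-cluster check that these names resolve and answer; the `busybox` image and the pod names are arbitrary:
```bash
# DNS resolution from inside the cluster
kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup mariadb.control-plane.svc.cluster.local

# HTTP check against Gitea's internal service
kubectl run -it --rm http-test --image=busybox:1.36 --restart=Never -- \
  wget -qO- http://gitea.gitea.svc.cluster.local:3000
```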
---
## 💪 High Availability
| Component | Implementation | Fault Tolerance |
|----------------|--------------------------|---------------------|
| Control plane | 3 etcd nodes | 1 node |
| Workers | 3 nodes | 2 nodes |
| Load balancers | DNS round-robin | 1 LB |
| Storage | Longhorn 3x replication | 2 workers |
| Ingress | On all workers | 2 workers |
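A sketch of a failover drill to confirm the worker tolerance in practice (run during a quiet window and uncordon when done):
```bash
# Evacuate one worker and watch workloads reschedule
kubectl drain k8s-worker-01 --ignore-daemonsets --delete-emptydir-data
kubectl get pods -A -o wide    # pods should land on worker-02/03

# Bring the node back into rotation
kubectl uncordon k8s-worker-01
```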
---
## 🔧 Maintenance
### Backup
```bash
# etcd
ssh root@108.165.47.233 "k3s etcd-snapshot save"
# MariaDB
kubectl exec -n control-plane mariadb-0 -- \
  mariadb-dump -uroot -pAiWorker2026_RootPass! --all-databases > backup.sql
```
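A restore sketch for the etcd side, assuming snapshots land in the default K3s directory; `<snapshot-name>` is a placeholder (see `k3s etcd-snapshot ls` for the actual names):
```bash
# On one control-plane node: stop K3s, then reset the cluster from a snapshot
ssh root@108.165.47.233 "systemctl stop k3s && \
  k3s server --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/<snapshot-name>"
# Afterwards, restart k3s on this node and rejoin cp-02/cp-03 per the K3s docs
```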
### Upgrade K3s
```bash
# Control-plane nodes first, one at a time, then workers
ssh root@<worker-ip> "curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.X.X+k3s1 sh -"
```
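Draining around the upgrade keeps workloads moving. Note that re-running the installer on a worker (agent) node needs the same `K3S_URL`/`K3S_TOKEN` it was originally installed with; both values below are placeholders:
```bash
# Per node: drain, upgrade, verify, uncordon
kubectl drain k8s-worker-01 --ignore-daemonsets --delete-emptydir-data
ssh root@108.165.47.225 "curl -sfL https://get.k3s.io | \
  INSTALL_K3S_VERSION=v1.X.X+k3s1 K3S_URL=<server-url> K3S_TOKEN=<token> sh -"
kubectl get nodes               # confirm the new VERSION column
kubectl uncordon k8s-worker-01
```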
---
**Full setup**: `CLUSTER-SETUP-COMPLETE.md`
**Troubleshooting**: `TROUBLESHOOTING.md`
**For AI agents**: `AGENT-GUIDE.md`