# ☸️ Kubernetes Cluster - AiWorker

**Location**: Houston, Texas (us-hou-1)
**K3s**: v1.35.0+k3s1
**Cost**: $148/month

---

## 🖥️ Servers

| Hostname       | Public IP       | Private IP  | Specs            | Role          |
|----------------|-----------------|-------------|------------------|---------------|
| k8s-cp-01      | 108.165.47.233  | 10.100.0.2  | 4 vCPU, 8 GB     | control-plane |
| k8s-cp-02      | 108.165.47.235  | 10.100.0.3  | 4 vCPU, 8 GB     | control-plane |
| k8s-cp-03      | 108.165.47.215  | 10.100.0.4  | 4 vCPU, 8 GB     | control-plane |
| k8s-worker-01  | 108.165.47.225  | 10.100.0.5  | 8 vCPU, 16 GB    | worker        |
| k8s-worker-02  | 108.165.47.224  | 10.100.0.6  | 8 vCPU, 16 GB    | worker        |
| k8s-worker-03  | 108.165.47.222  | 10.100.0.7  | 8 vCPU, 16 GB    | worker        |
| k8s-lb-01      | 108.165.47.221  | 10.100.0.8  | 2 vCPU, 4 GB     | load-balancer |
| k8s-lb-02      | 108.165.47.203  | 10.100.0.9  | 2 vCPU, 4 GB     | load-balancer |

**Total**: 48 vCPU, 104 GB RAM, ~2.9 TB Storage

---

## 🌐 Access

### Kubeconfig
```bash
export KUBECONFIG=~/.kube/aiworker-config
kubectl get nodes
```

### SSH
```bash
ssh root@108.165.47.233  # cp-01
ssh root@108.165.47.225  # worker-01
ssh root@108.165.47.221  # lb-01
```
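
To reach any node without memorizing addresses, a small lookup helper can map hostnames to the public IPs listed in the Servers table. This is only a sketch: the function name `node_ip` is hypothetical and the addresses are copied from that table, so it needs updating whenever the table changes.

```bash
# Map a cluster hostname to its public IP (values copied from the Servers table).
node_ip() {
  case "$1" in
    k8s-cp-01)     echo 108.165.47.233 ;;
    k8s-cp-02)     echo 108.165.47.235 ;;
    k8s-cp-03)     echo 108.165.47.215 ;;
    k8s-worker-01) echo 108.165.47.225 ;;
    k8s-worker-02) echo 108.165.47.224 ;;
    k8s-worker-03) echo 108.165.47.222 ;;
    k8s-lb-01)     echo 108.165.47.221 ;;
    k8s-lb-02)     echo 108.165.47.203 ;;
    *) echo "unknown node: $1" >&2; return 1 ;;
  esac
}

# Usage: ssh "root@$(node_ip k8s-worker-02)"
```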

---

## 📦 Installed Components

| Software        | Version      | Namespace       | URL |
|-----------------|--------------|-----------------|-----|
| K3s             | v1.35.0+k3s1 | -               | - |
| Longhorn        | v1.8.0       | longhorn-system | https://longhorn.fuq.tv |
| Nginx Ingress   | latest       | ingress-nginx   | - |
| Cert-Manager    | v1.16.2      | cert-manager    | - |
| MariaDB         | 11.4 LTS     | control-plane   | mariadb:3306 |
| Redis           | 7            | control-plane   | redis:6379 |
| Gitea           | 1.25.3       | gitea           | https://git.fuq.tv |
| ArgoCD          | stable       | argocd          | https://argocd.fuq.tv |
| Gitea Runner    | latest       | gitea-actions   | - |

---

## 🗄️ Storage (Longhorn HA)

- **StorageClass**: `longhorn` (default)
- **Replication**: 3 replicas per volume
- **Fault tolerance**: survives the loss of up to 2 workers without data loss

**Volumes**:
- `mariadb-pvc`: 20Gi (control-plane)
- `gitea-data`: 50Gi (gitea)
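
A new volume on this storage would typically be requested through a PersistentVolumeClaim. The sketch below prints such a manifest; the function name `pvc_manifest`, the example volume name, and the `ReadWriteOnce` access mode are assumptions, not taken from the cluster's actual manifests. Because `longhorn` is the default StorageClass, the `storageClassName` line could also be omitted.

```bash
# Print a PersistentVolumeClaim manifest bound to the `longhorn` StorageClass.
# Arguments: <name> <namespace> <size>. The access mode is an assumption.
pvc_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
  namespace: $2
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: $3
EOF
}

# Usage (hypothetical volume): pvc_manifest example-data agents 5Gi | kubectl apply -f -
```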

---

## 🌍 DNS and Networking

**DNS** (`*.fuq.tv` → Load Balancers):
```
*.fuq.tv    → 108.165.47.221, 108.165.47.203
*.r.fuq.tv  → 108.165.47.221, 108.165.47.203
```
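
One way to confirm that a hostname resolves only to the two load balancers is to filter the output of `dig +short`. This is a sketch under the assumption that the records are plain A records (a CNAME target in the output would be reported as unexpected); `check_lb_ips` is a hypothetical helper name.

```bash
# Read resolved IPs on stdin (one per line); fail if any is not a load balancer
# or if no address was returned at all.
check_lb_ips() {
  local ip seen=0
  while read -r ip; do
    [ -z "$ip" ] && continue
    case "$ip" in
      108.165.47.221|108.165.47.203) seen=$((seen + 1)) ;;
      *) echo "unexpected address: $ip" >&2; return 1 ;;
    esac
  done
  [ "$seen" -gt 0 ]
}

# Usage: dig +short git.fuq.tv | check_lb_ips && echo "DNS OK"
```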

**Private network**: 10.100.0.0/24 (eth1)
**Load balancing**: HAProxy on LB-01 and LB-02 → worker NodePorts

---

## 📋 Namespaces

| Namespace       | Purpose                | CPU/RAM Quota |
|-----------------|------------------------|---------------|
| control-plane   | Backend, DB, Redis     | 8 CPU, 16 GB  |
| agents          | Claude agents          | 20 CPU, 40 GB |
| gitea           | Git server             | 2 CPU, 4 GB   |
| gitea-actions   | CI/CD runner           | -             |
| argocd          | GitOps                 | -             |
| ingress-nginx   | Ingress controller     | -             |
| cert-manager    | TLS management         | -             |
| longhorn-system | Distributed storage    | -             |
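
Quotas like these are usually enforced with a `ResourceQuota` object per namespace. The sketch below prints one for given values. The object name `compute-quota`, applying the same figure to both `requests.*` and `limits.*`, and writing 40 GB as `40Gi` are all assumptions; the table does not say how the real quotas are defined.

```bash
# Print a ResourceQuota manifest. Arguments: <namespace> <cpu> <memory>.
# Using the same value for requests and limits is an assumption.
quota_manifest() {
  cat <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: $1
spec:
  hard:
    requests.cpu: "$2"
    requests.memory: $3
    limits.cpu: "$2"
    limits.memory: $3
EOF
}

# Usage: quota_manifest agents 20 40Gi | kubectl apply -f -
```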

---

## ⚡ Essential Commands

```bash
# Status
kubectl get nodes -o wide
kubectl get pods -A

# Resources
kubectl top nodes
kubectl top pods -A

# Deploy
kubectl apply -f k8s/backend/
kubectl rollout status deployment/backend -n control-plane

# Logs
kubectl logs -f deployment/backend -n control-plane

# Troubleshooting
kubectl describe pod <pod> -n <namespace>
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
```

---

## 🔐 Internal Connections (Cluster DNS)

```
mariadb.control-plane.svc.cluster.local:3306
redis.control-plane.svc.cluster.local:6379
gitea.gitea.svc.cluster.local:3000
```
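
These names follow the standard Kubernetes pattern `<service>.<namespace>.svc.cluster.local`. Pods in the same namespace as the Service can also use the short name alone, such as `mariadb:3306`. A minimal helper that builds the full name might look like this, assuming the default `cluster.local` domain shown above; `svc_fqdn` is a hypothetical name.

```bash
# Build the in-cluster DNS name for a Service.
# Arguments: <service> <namespace>
svc_fqdn() {
  printf '%s.%s.svc.cluster.local\n' "$1" "$2"
}

# Usage: echo "redis://$(svc_fqdn redis control-plane):6379"
```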

---

## 💪 High Availability

| Component      | Implementation    | Fault Tolerance |
|----------------|-------------------|-----------------|
| Control Plane  | 3 etcd nodes      | 1 node          |
| Workers        | 3 nodes           | 2 nodes         |
| Load Balancers | DNS round-robin   | 1 LB            |
| Storage        | Longhorn 3x       | 2 workers       |
| Ingress        | On all workers    | 2 workers       |
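
The control-plane figure follows from etcd's majority quorum: a cluster of N members needs floor(N/2) + 1 of them available, so it can lose the remainder. The arithmetic can be sketched as follows (`etcd_fault_tolerance` is a hypothetical helper name):

```bash
# Number of members an etcd cluster of size N can lose while keeping quorum.
# Quorum is floor(N/2) + 1, so tolerance is N minus quorum.
etcd_fault_tolerance() {
  local n="$1"
  local quorum=$(( n / 2 + 1 ))
  echo $(( n - quorum ))
}

etcd_fault_tolerance 3   # prints 1 (this cluster)
etcd_fault_tolerance 5   # prints 2
```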

---

## 🔧 Maintenance

### Backup
```bash
# etcd
ssh root@108.165.47.233 "k3s etcd-snapshot save"

# MariaDB
kubectl exec -n control-plane mariadb-0 -- \
  mariadb-dump -uroot -pAiWorker2026_RootPass! --all-databases > backup.sql
```
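
Passing the root password as a command-line argument leaves it in shell history and process listings. One possible alternative is to let the shell inside the container read it from an environment variable and to write each dump to a timestamped file. This sketch assumes the container exposes `MARIADB_ROOT_PASSWORD` (the convention of the official MariaDB image, not confirmed for this deployment); `backup_mariadb` is a hypothetical helper name.

```bash
# Dump all databases to a timestamped file in <dir> (default: current directory).
# Assumes the container exposes MARIADB_ROOT_PASSWORD; adjust if it does not.
backup_mariadb() {
  local dir="${1:-.}"
  local out="$dir/mariadb-$(date +%Y%m%d-%H%M%S).sql"
  # Single quotes keep the variable unexpanded locally; the shell inside the
  # container expands it instead.
  if kubectl exec -n control-plane mariadb-0 -- \
      sh -c 'exec mariadb-dump -uroot -p"$MARIADB_ROOT_PASSWORD" --all-databases' > "$out"; then
    echo "$out"
  else
    rm -f "$out"   # do not leave a partial dump behind
    return 1
  fi
}

# Usage: backup_mariadb /var/backups
```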

### Upgrade K3s
```bash
# Control planes first (one at a time), then workers
# Re-run the installer with the same options used at install time
# (e.g. K3S_URL and K3S_TOKEN on workers)
ssh root@<node-ip> "curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.X.X+k3s1 sh -"
```
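
To limit disruption, a node is commonly drained before its upgrade and uncordoned afterwards. That step is not part of the procedure above, so treat the following as an assumption-laden sketch: it only prints the commands for one node, and draining nodes that hold Longhorn replicas may need extra care. `print_node_upgrade` is a hypothetical helper name.

```bash
# Print (do not run) the steps to upgrade a single node.
# Arguments: <node-name> <node-ip> <version>
print_node_upgrade() {
  echo "kubectl drain $1 --ignore-daemonsets --delete-emptydir-data"
  echo "ssh root@$2 \"curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=$3 sh -\""
  echo "kubectl uncordon $1"
}

# Usage (version is a placeholder): print_node_upgrade k8s-worker-01 108.165.47.225 v1.X.X+k3s1
```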

---

**Full setup**: `CLUSTER-SETUP-COMPLETE.md`
**Troubleshooting**: `TROUBLESHOOTING.md`
**For AI agents**: `AGENT-GUIDE.md`