Rename CLUSTER-READY → K8S-CLUSTER (more direct)

Also added:
- DEVELOPMENT-WORKFLOW.md - Complete dev process documented
- Updated all references across documentation

Documentation is now centralized and direct.

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Hector Ros authored on 2026-01-20 00:44:29 +01:00
parent db71705842, commit e5e039504e
14 changed files with 318 additions and 321 deletions


@@ -805,7 +805,7 @@ spec:
### Project Documentation
- `CLUSTER-CREDENTIALS.md` - Credentials and tokens
- - `CLUSTER-READY.md` - Cluster status
+ - `K8S-CLUSTER.md` - Cluster status
- `docs/` - Complete project documentation
### Useful Commands


@@ -214,7 +214,7 @@ Load balancers (108.165.47.221, 108.165.47.203) run HAProxy balancing to worker
Extensive documentation in `/docs` (40+ files):
- **Start here**: `ROADMAP.md`, `NEXT-SESSION.md`, `QUICK-REFERENCE.md`
- - **Infrastructure**: `CLUSTER-READY.md`, `AGENT-GUIDE.md`, `TROUBLESHOOTING.md`
+ - **Infrastructure**: `K8S-CLUSTER.md`, `AGENT-GUIDE.md`, `TROUBLESHOOTING.md`
- **Gitea**: `GITEA-GUIDE.md` - Complete guide for Git, Registry, API, CI/CD, and webhooks
- **Detailed**: `docs/01-arquitectura/` through `docs/06-deployment/`


@@ -1,311 +0,0 @@
# 🚀 AiWorker Kubernetes Cluster - PRODUCTION READY
**Status**: ✅ Fully functional
**Date**: 2026-01-19
**Location**: Houston, Texas (us-hou-1)
---
## 🎯 Deployed Infrastructure
### Servers (8 VPS)
| Type | Hostname | Public IP | Private IP | Specs | Status |
|----------------|----------------|-----------------|-------------|----------------------|--------|
| Control Plane | k8s-cp-01 | 108.165.47.233 | 10.100.0.2 | 4 vCPU, 8 GB RAM | ✅ |
| Control Plane | k8s-cp-02 | 108.165.47.235 | 10.100.0.3 | 4 vCPU, 8 GB RAM | ✅ |
| Control Plane | k8s-cp-03 | 108.165.47.215 | 10.100.0.4 | 4 vCPU, 8 GB RAM | ✅ |
| Worker | k8s-worker-01 | 108.165.47.225 | 10.100.0.5 | 8 vCPU, 16 GB RAM | ✅ |
| Worker | k8s-worker-02 | 108.165.47.224 | 10.100.0.6 | 8 vCPU, 16 GB RAM | ✅ |
| Worker | k8s-worker-03 | 108.165.47.222 | 10.100.0.7 | 8 vCPU, 16 GB RAM | ✅ |
| Load Balancer | k8s-lb-01 | 108.165.47.221 | 10.100.0.8 | 2 vCPU, 4 GB RAM | ✅ |
| Load Balancer | k8s-lb-02 | 108.165.47.203 | 10.100.0.9 | 2 vCPU, 4 GB RAM | ✅ |
**Total**: 48 vCPU, 104 GB RAM, ~2.9 TB storage
**Cost**: $148/month
---
## 🌐 Access URLs
| Service | URL | Credentials | Status |
|-------------|----------------------------|----------------------------|--------|
| Gitea | https://git.fuq.tv | (initial setup pending) | ✅ |
| ArgoCD | https://argocd.fuq.tv | admin / LyPF4Hy0wvp52IoU | ✅ |
| Longhorn UI | https://longhorn.fuq.tv | admin / aiworker2026 | ✅ |
| HAProxy LB1 | http://108.165.47.221:8404/stats | admin / aiworker2026 | ✅ |
| HAProxy LB2 | http://108.165.47.203:8404/stats | admin / aiworker2026 | ✅ |
| Test App | https://test.fuq.tv | (public) | ✅ |
---
## 💾 Databases
### MariaDB 11.4.9 LTS
**Internal connection (from pods)**:
```
Host: mariadb.control-plane.svc.cluster.local
Port: 3306
```
**Root credentials:**
```
User: root
Password: AiWorker2026_RootPass!
```
**Application credentials:**
```
Database: aiworker
User: aiworker
Password: AiWorker2026_UserPass!
```
**Storage**: 20Gi PVC on Longhorn (3 HA replicas)
**Test connection:**
```bash
kubectl exec -n control-plane mariadb-0 -- mariadb -uaiworker -pAiWorker2026_UserPass! aiworker -e "SHOW TABLES;"
```
### Gitea Database
**Database**: `gitea` (created in MariaDB)
**Connection**: configured automatically in Gitea
---
## 🗂️ HA Storage with Longhorn
### Configuration
- **StorageClass**: `longhorn` (default)
- **Replication**: 3 replicas per volume
- **Fault tolerance**: survives losing 2 nodes without data loss
- **UI**: https://longhorn.fuq.tv
### Current Volumes
| PVC | Namespace | Size | Replicas | Nodes |
|--------------|----------------|--------|----------|--------------------------------------|
| mariadb-pvc | control-plane | 20Gi | 3 | worker-01, worker-02, worker-03 |
| gitea-data | gitea | 50Gi | 3 | worker-01, worker-02, worker-03 |
---
## 🔧 Installed Software
| Component | Version | Namespace | Status |
|-------------------------|--------------|----------------|--------|
| K3s | v1.35.0+k3s1 | - | ✅ |
| Nginx Ingress | latest | ingress-nginx | ✅ |
| Cert-Manager | v1.16.2 | cert-manager | ✅ |
| Longhorn | v1.8.0 | longhorn-system| ✅ |
| ArgoCD | stable | argocd | ✅ |
| MariaDB | 11.4.9 | control-plane | ✅ |
| Gitea | 1.22 | gitea | ✅ |
| HAProxy | 2.8.16 | (on LBs) | ✅ |
---
## 🔐 Kubeconfig
**Local path**: `~/.kube/aiworker-config`
**Set as default:**
```bash
export KUBECONFIG=~/.kube/aiworker-config
```
**Create an alias:**
```bash
alias k='kubectl --kubeconfig ~/.kube/aiworker-config'
```
**Usage:**
```bash
kubectl --kubeconfig ~/.kube/aiworker-config get nodes
kubectl --kubeconfig ~/.kube/aiworker-config get pods -A
```
---
## 📋 Namespaces
| Namespace | Purpose | Resource Quota |
|-----------------|-------------------------------|---------------------|
| control-plane | Backend, API, MySQL, Redis | 8 CPU, 16 GB |
| agents | Claude Code agents | 20 CPU, 40 GB |
| gitea | Git server | 2 CPU, 4 GB |
| monitoring | Prometheus, Grafana (future) | - |
| argocd | GitOps | - |
| ingress-nginx | Ingress controller | - |
| cert-manager | TLS management | - |
| longhorn-system | Distributed storage | - |
---
## 🔒 Security
### TLS/SSL
- ✅ Automatic certificates via Let's Encrypt
- ✅ Forced HTTPS redirect
- ✅ Notification email: hector+aiworker@teamsuqad.io
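The `letsencrypt-prod` issuer referenced by the ingresses would typically be declared as a cert-manager ClusterIssuer. A minimal sketch, not the cluster's actual manifest (the HTTP-01 solver and the account-key secret name are assumptions):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: hector+aiworker@teamsuqad.io
    privateKeySecretRef:
      name: letsencrypt-prod   # assumed name for the ACME account key secret
    solvers:
      - http01:
          ingress:
            class: nginx
```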
### Created Secrets
```bash
# MariaDB
kubectl get secret mariadb-secret -n control-plane
# Longhorn UI
kubectl get secret longhorn-basic-auth -n longhorn-system
# ArgoCD
kubectl get secret argocd-initial-admin-secret -n argocd
```
---
## 🧪 Functional Verification
### Cluster Health
```bash
kubectl get nodes
kubectl get pods -A
kubectl top nodes
kubectl get pvc -A
```
### Storage Replication
```bash
# List volumes
kubectl get volumes.longhorn.io -n longhorn-system
# List replicas
kubectl get replicas.longhorn.io -n longhorn-system
# Web UI: https://longhorn.fuq.tv
```
### Ingress & TLS
```bash
# List ingress resources
kubectl get ingress -A
# List certificates
kubectl get certificate -A
# Test access
curl https://test.fuq.tv
curl https://git.fuq.tv
curl https://argocd.fuq.tv
```
---
## 📦 Next Steps
### 1. Configure Gitea (https://git.fuq.tv)
- Complete the initial installation
- Create the "aiworker" organization
- Create a bot user with an API token
- Configure webhooks
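Creating the bot's token can be done over Gitea's token API (`POST /api/v1/users/{username}/tokens`, authenticated as that user). A sketch under assumptions: the bot account name, token name, and scopes below are hypothetical:

```shell
# Build the token-creation endpoint for a hypothetical bot account.
GITEA_URL="https://git.fuq.tv"
BOT_USER="aiworker-bot"   # hypothetical bot account name
ENDPOINT="${GITEA_URL}/api/v1/users/${BOT_USER}/tokens"
echo "$ENDPOINT"

# Against the live server (prompts for the bot user's password):
# curl -X POST "$ENDPOINT" -u "$BOT_USER" \
#   -H "Content-Type: application/json" \
#   -d '{"name":"backend-token","scopes":["write:repository","write:user"]}'
```

The returned JSON includes the token value once; it should go straight into a Kubernetes secret rather than into a committed file.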
### 2. Deploy Backend
```bash
kubectl apply -f k8s/backend/
```
### 3. Deploy Frontend
```bash
kubectl apply -f k8s/frontend/
```
### 4. Configure ArgoCD
- Log in at https://argocd.fuq.tv
- Connect the Gitea repository
- Create Applications
- Configure auto-sync
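The repository connection and auto-sync steps can also be declared as an ArgoCD Application manifest. A minimal sketch (the repo URL and application name are hypothetical, not taken from the cluster):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: backend
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.fuq.tv/aiworker/backend.git   # hypothetical repo path
    targetRevision: main
    path: k8s/backend
  destination:
    server: https://kubernetes.default.svc
    namespace: control-plane
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift in the cluster
```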
---
## 🎨 Final Architecture
```
Internet
[DNS: *.fuq.tv]
(108.165.47.221 + .203)
┌─────────────┴─────────────┐
↓ ↓
[HAProxy LB-01] [HAProxy LB-02]
:80, :443 :80, :443
↓ ↓
└─────────────┬─────────────┘
[Private Network]
10.100.0.0/24
┌───────────────────┼───────────────────┐
↓ ↓ ↓
[CP etcd HA] [CP etcd HA] [CP etcd HA]
10.100.0.2 10.100.0.3 10.100.0.4
↓ ↓ ↓
─────┴───────────────────┴───────────────────┴─────
↓ ↓ ↓
[Worker + Storage] [Worker + Storage] [Worker + Storage]
10.100.0.5 10.100.0.6 10.100.0.7
↓ ↓ ↓
[Pods] [Pods] [Pods]
│ │ │
[MariaDB PVC]────────[Longhorn 3x Replica]────────[Gitea PVC]
```
---
## 🎓 What We Learned
1. ✅ Deploying K3s HA with embedded etcd (3 control planes)
2. ✅ Configuring a private network for internal communication
3. ✅ Setting up HAProxy for HTTP/HTTPS load balancing
4. ✅ DNS round-robin for load balancer HA
5. ✅ Nginx Ingress Controller with NodePort
6. ✅ Cert-Manager with automatic Let's Encrypt
7. ✅ Longhorn distributed storage with replication
8. ✅ MariaDB 11.4 LTS with HA storage
9. ✅ Gitea with HA storage and MariaDB
10. ✅ ArgoCD for GitOps
---
## 💪 HA Features Implemented
| Component | HA Implemented | Fault Tolerance |
|-------------------|-----------------|---------------------|
| Control Plane | ✅ 3 etcd nodes | survives 1 node |
| Workers | ✅ 3 nodes | survives 2 nodes |
| Load Balancers | ✅ DNS RR | survives 1 LB |
| Storage (Longhorn)| ✅ 3 replicas | survives 2 workers |
| Ingress | ✅ on workers | redundant |
| DNS | ✅ 2 IPs | auto failover |
**The cluster can simultaneously lose:**
- 1 control plane
- 2 workers
- 1 load balancer
- ...and keep running! 🎉
---
## 📞 Support
- **CubePath**: https://cubepath.com/support
- **K3s**: https://docs.k3s.io
- **Longhorn**: https://longhorn.io/docs/
- **Cert-Manager**: https://cert-manager.io/docs/
---
**🎉 Cluster ready to deploy AiWorker!**

K8S-CLUSTER.md (new file)

@@ -0,0 +1,168 @@
# ☸️ Kubernetes Cluster - AiWorker
**Location**: Houston, Texas (us-hou-1)
**K3s**: v1.35.0+k3s1
**Cost**: $148/month
---
## 🖥️ Servers
| Hostname | Public IP | Private IP | Specs | Role |
|----------------|-----------------|-------------|------------------|---------------|
| k8s-cp-01 | 108.165.47.233 | 10.100.0.2 | 4 vCPU, 8 GB | control-plane |
| k8s-cp-02 | 108.165.47.235 | 10.100.0.3 | 4 vCPU, 8 GB | control-plane |
| k8s-cp-03 | 108.165.47.215 | 10.100.0.4 | 4 vCPU, 8 GB | control-plane |
| k8s-worker-01 | 108.165.47.225 | 10.100.0.5 | 8 vCPU, 16 GB | worker |
| k8s-worker-02 | 108.165.47.224 | 10.100.0.6 | 8 vCPU, 16 GB | worker |
| k8s-worker-03 | 108.165.47.222 | 10.100.0.7 | 8 vCPU, 16 GB | worker |
| k8s-lb-01 | 108.165.47.221 | 10.100.0.8 | 2 vCPU, 4 GB | load-balancer |
| k8s-lb-02 | 108.165.47.203 | 10.100.0.9 | 2 vCPU, 4 GB | load-balancer |
**Total**: 48 vCPU, 104 GB RAM, ~2.9 TB Storage
---
## 🌐 Acceso
### Kubeconfig
```bash
export KUBECONFIG=~/.kube/aiworker-config
kubectl get nodes
```
### SSH
```bash
ssh root@108.165.47.233 # cp-01
ssh root@108.165.47.225 # worker-01
ssh root@108.165.47.221 # lb-01
```
---
## 📦 Installed Components
| Software | Version | Namespace | URL |
|-----------------|--------------|-----------------|-----|
| K3s | v1.35.0+k3s1 | - | - |
| Longhorn | v1.8.0 | longhorn-system | https://longhorn.fuq.tv |
| Nginx Ingress | latest | ingress-nginx | - |
| Cert-Manager | v1.16.2 | cert-manager | - |
| MariaDB | 11.4 LTS | control-plane | mariadb:3306 |
| Redis | 7 | control-plane | redis:6379 |
| Gitea | 1.25.3 | gitea | https://git.fuq.tv |
| ArgoCD | stable | argocd | https://argocd.fuq.tv |
| Gitea Runner | latest | gitea-actions | - |
---
## 🗄️ Storage (Longhorn HA)
- **StorageClass**: `longhorn` (default)
- **Replication**: 3 replicas per volume
- **Tolerance**: survives losing up to 2 workers without data loss
**Volumes**:
- `mariadb-pvc`: 20Gi (control-plane)
- `gitea-data`: 50Gi (gitea)
---
## 🌍 DNS and Networking
**DNS** (*.fuq.tv → load balancers):
```
*.fuq.tv → 108.165.47.221, 108.165.47.203
*.r.fuq.tv → 108.165.47.221, 108.165.47.203
```
**Private network**: 10.100.0.0/24 (eth1)
**Load balancing**: HAProxy on LB-01 and LB-02 → worker NodePorts
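The HAProxy side of that path can be sketched roughly as below. This is illustrative only; the actual NodePort numbers exposed by the ingress controller on the workers are assumptions:

```
frontend https_in
    bind *:443
    mode tcp
    default_backend k8s_https

backend k8s_https
    mode tcp
    balance roundrobin
    server worker-01 10.100.0.5:30443 check   # 30443 is a hypothetical NodePort
    server worker-02 10.100.0.6:30443 check
    server worker-03 10.100.0.7:30443 check
```

TCP mode passes TLS through untouched, so certificates terminate at the Nginx ingress rather than on the LBs.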
---
## 📋 Namespaces
| Namespace | Purpose | CPU/RAM Quota |
|-----------------|------------------------|---------------|
| control-plane | Backend, DB, Redis | 8 CPU, 16 GB |
| agents | Claude agents | 20 CPU, 40 GB |
| gitea | Git server | 2 CPU, 4 GB |
| gitea-actions | CI/CD runner | - |
| argocd | GitOps | - |
| ingress-nginx | Ingress controller | - |
| cert-manager | TLS management | - |
| longhorn-system | Distributed storage | - |
---
## ⚡ Essential Commands
```bash
# Status
kubectl get nodes -o wide
kubectl get pods -A
# Resources
kubectl top nodes
kubectl top pods -A
# Deploy
kubectl apply -f k8s/backend/
kubectl rollout status deployment/backend -n control-plane
# Logs
kubectl logs -f deployment/backend -n control-plane
# Troubleshooting
kubectl describe pod <pod> -n <namespace>
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
```
---
## 🔐 Internal Connections (Cluster DNS)
```
mariadb.control-plane.svc.cluster.local:3306
redis.control-plane.svc.cluster.local:6379
gitea.gitea.svc.cluster.local:3000
```
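These addresses follow the standard Kubernetes service DNS pattern `<service>.<namespace>.svc.cluster.local`, for example:

```shell
# Compose a cluster-internal service address from its parts.
svc="mariadb"
ns="control-plane"
fqdn="${svc}.${ns}.svc.cluster.local"
echo "$fqdn"

# To verify resolution from inside the cluster (requires cluster access):
# kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- nslookup "$fqdn"
```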
---
## 💪 High Availability
| Component | Implementation | Fault Tolerance |
|----------------|-------------------|---------------------|
| Control Plane | 3 etcd nodes | 1 node |
| Workers | 3 nodes | 2 nodes |
| Load Balancers | DNS round-robin | 1 LB |
| Storage | Longhorn 3x | 2 workers |
| Ingress | On all workers | 2 workers |
---
## 🔧 Maintenance
### Backup
```bash
# etcd
ssh root@108.165.47.233 "k3s etcd-snapshot save"
# MariaDB
kubectl exec -n control-plane mariadb-0 -- \
mariadb-dump -uroot -pAiWorker2026_RootPass! --all-databases > backup.sql
```
### Upgrade K3s
```bash
# Workers first, then control planes
ssh root@<worker-ip> "curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.X.X+k3s1 sh -"
```
---
**Full setup**: `CLUSTER-SETUP-COMPLETE.md`
**Troubleshooting**: `TROUBLESHOOTING.md`
**For AI agents**: `AGENT-GUIDE.md`


@@ -31,7 +31,7 @@ curl -I https://git.fuq.tv
# HTTP/2 200
```
- **If anything fails, see**: `CLUSTER-READY.md` and `TROUBLESHOOTING.md`
+ **If anything fails, see**: `K8S-CLUSTER.md` and `TROUBLESHOOTING.md`
---


@@ -355,7 +355,7 @@ teamSquadAiWorker/
├── NEXT-SESSION.md # Detailed next steps
├── TROUBLESHOOTING.md # Troubleshooting
├── QUICK-REFERENCE.md # This file
- ├── CLUSTER-READY.md # Cluster status
+ ├── K8S-CLUSTER.md # Cluster status
└── CLUSTER-CREDENTIALS.md # Credentials (sensitive)
```


@@ -33,7 +33,7 @@ Task → Agent → Code → PR → Preview Deploy → Approval → Stagi
- **[GITEA-GUIDE.md](GITEA-GUIDE.md)** - Complete Gitea guide (API, Registry, CI/CD)
### 🏗️ Infrastructure
- - **[CLUSTER-READY.md](CLUSTER-READY.md)** - K8s cluster status
+ - **[K8S-CLUSTER.md](K8S-CLUSTER.md)** - K8s cluster status
- **[CLUSTER-CREDENTIALS.md](CLUSTER-CREDENTIALS.md)** - Credentials (⚠️ sensitive)
- **[AGENT-GUIDE.md](AGENT-GUIDE.md)** - Guide for AI agents
- **[TROUBLESHOOTING.md](TROUBLESHOOTING.md)** - Troubleshooting
@@ -187,7 +187,7 @@ kubectl get pods -n control-plane
- Set up CI/CD with Gitea Actions
- Initialize the backend
- **See**: `CLUSTER-READY.md` for full details
+ **See**: `K8S-CLUSTER.md` for full details
### Session 2 (Next) - Backend API
- Complete the API routes


@@ -15,7 +15,7 @@
- [x] Nginx Ingress + Cert-Manager (automatic TLS)
- [x] DNS configured (*.fuq.tv)
- **Docs**: `CLUSTER-READY.md`, `docs/04-kubernetes/`
+ **Docs**: `K8S-CLUSTER.md`, `docs/04-kubernetes/`
### 2. Databases and Services
- [x] MariaDB 11.4 LTS with HA storage
@@ -317,7 +317,7 @@ kubectl get pods -n control-plane
- `docs/06-deployment/staging-production.md` - Promotion
### Cluster
- - `CLUSTER-READY.md` - Cluster status
+ - `K8S-CLUSTER.md` - Cluster status
- `CLUSTER-CREDENTIALS.md` - Credentials (⚠️ sensitive)
- `AGENT-GUIDE.md` - Guide for AI agents
- `docs/CONTAINER-REGISTRY.md` - Registry usage


@@ -360,7 +360,7 @@ kubectl top pods -A --sort-by=cpu
## 🔗 QUICK LINKS
- - **Cluster Info**: `CLUSTER-READY.md`
+ - **Cluster Info**: `K8S-CLUSTER.md`
- **Credentials**: `CLUSTER-CREDENTIALS.md`
- **Roadmap**: `ROADMAP.md`
- **Next session**: `NEXT-SESSION.md`

Submodule backend updated: ebf5d74933...5672127593


@@ -0,0 +1,91 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: control-plane
  labels:
    app: backend
spec:
  replicas: 2
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      imagePullSecrets:
        - name: gitea-registry
      containers:
        - name: backend
          image: git.fuq.tv/admin/aiworker-backend:latest
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 3000
              protocol: TCP
          env:
            # Database
            - name: DB_HOST
              value: mariadb.control-plane.svc.cluster.local
            - name: DB_PORT
              value: "3306"
            - name: DB_USER
              value: aiworker
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: backend-secrets
                  key: db-password
            - name: DB_NAME
              value: aiworker
            # Redis
            - name: REDIS_HOST
              value: redis.control-plane.svc.cluster.local
            - name: REDIS_PORT
              value: "6379"
            # Gitea
            - name: GITEA_URL
              value: https://git.fuq.tv
            - name: GITEA_TOKEN
              valueFrom:
                secretKeyRef:
                  name: backend-secrets
                  key: gitea-token
            # Kubernetes
            - name: K8S_IN_CLUSTER
              value: "true"
            # App config
            - name: NODE_ENV
              value: production
            - name: PORT
              value: "3000"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 30
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 3

k8s/backend/ingress.yaml (new file)

@@ -0,0 +1,24 @@
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backend
  namespace: control-plane
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    kubernetes.io/ingress.class: nginx
spec:
  tls:
    - hosts:
        - api.fuq.tv
      secretName: backend-tls
  rules:
    - host: api.fuq.tv
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend
                port:
                  number: 3000

k8s/backend/secrets.yaml (new file)

@@ -0,0 +1,9 @@
apiVersion: v1
data:
  db-password: QWlXb3JrZXIyMDI2X1VzZXJQYXNzXCE=
  gitea-token: MTU5YTVkZTJhMTZkMTVmMzNlMzg4YjU1YjEyNzZlNDMxZGJjYTNmMw==
kind: Secret
metadata:
  creationTimestamp: null
  name: backend-secrets
  namespace: control-plane
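For reference, Secret values are plain base64. The committed `db-password` decodes to `AiWorker2026_UserPass\!` with a literal backslash, which suggests the `!` was shell-escaped when the value was encoded; single-quoting the password avoids that:

```shell
# Encode a password for a Secret manifest. Single quotes keep '!' literal,
# so no stray backslash ends up inside the encoded value.
printf '%s' 'AiWorker2026_UserPass!' | base64
```

The whole manifest can be regenerated with `kubectl create secret generic backend-secrets --from-literal=db-password='AiWorker2026_UserPass!' --dry-run=client -o yaml`.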

k8s/backend/service.yaml (new file)

@@ -0,0 +1,16 @@
apiVersion: v1
kind: Service
metadata:
  name: backend
  namespace: control-plane
  labels:
    app: backend
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 3000
      targetPort: 3000
      protocol: TCP
  selector:
    app: backend