# ☸️ Kubernetes Cluster - AiWorker

**Location:** Houston, Texas (us-hou-1) · **K3s:** v1.35.0+k3s1 · **Cost:** $148/month


## 🖥️ Servers

| Hostname | Public IP | Private IP | Specs | Role |
|---|---|---|---|---|
| k8s-cp-01 | 108.165.47.233 | 10.100.0.2 | 4 vCPU, 8 GB | control-plane |
| k8s-cp-02 | 108.165.47.235 | 10.100.0.3 | 4 vCPU, 8 GB | control-plane |
| k8s-cp-03 | 108.165.47.215 | 10.100.0.4 | 4 vCPU, 8 GB | control-plane |
| k8s-worker-01 | 108.165.47.225 | 10.100.0.5 | 8 vCPU, 16 GB | worker |
| k8s-worker-02 | 108.165.47.224 | 10.100.0.6 | 8 vCPU, 16 GB | worker |
| k8s-worker-03 | 108.165.47.222 | 10.100.0.7 | 8 vCPU, 16 GB | worker |
| k8s-lb-01 | 108.165.47.221 | 10.100.0.8 | 2 vCPU, 4 GB | load-balancer |
| k8s-lb-02 | 108.165.47.203 | 10.100.0.9 | 2 vCPU, 4 GB | load-balancer |

**Total:** 40 vCPU, 80 GB RAM, ~2.9 TB storage
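
These totals can be cross-checked against what the cluster itself reports. A minimal sketch using plain kubectl (nothing cluster-specific assumed):

```bash
# Per-node CPU and memory capacity as reported by the API
kubectl get nodes -o custom-columns='NODE:.metadata.name,CPU:.status.capacity.cpu,MEMORY:.status.capacity.memory'
```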


## 🌐 Access

### Kubeconfig

```bash
export KUBECONFIG=~/.kube/aiworker-config
kubectl get nodes
```
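
To avoid exporting the variable in every new shell, the kubeconfig selection can be made persistent. A minimal sketch; the `~/.bashrc` location assumes a local bash setup and is not part of the cluster configuration:

```bash
# Persist the kubeconfig selection for new shells (assumes bash)
echo 'export KUBECONFIG=~/.kube/aiworker-config' >> ~/.bashrc

# Or merge it into the default config (overwrites ~/.kube/config, use with care)
KUBECONFIG=~/.kube/config:~/.kube/aiworker-config kubectl config view --flatten > ~/.kube/merged && mv ~/.kube/merged ~/.kube/config
```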

### SSH

```bash
ssh root@108.165.47.233  # cp-01
ssh root@108.165.47.225  # worker-01
ssh root@108.165.47.221  # lb-01
```
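
Host aliases save retyping the IPs. A sketch that appends entries to the local SSH config; the alias names are illustrative, not part of the cluster setup:

```bash
# Append host aliases to the local SSH config (alias names are illustrative)
cat >> ~/.ssh/config <<'EOF'
Host k8s-cp-01
    HostName 108.165.47.233
    User root
Host k8s-worker-01
    HostName 108.165.47.225
    User root
EOF
```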

## 📦 Installed Components

| Software | Version | Namespace | URL |
|---|---|---|---|
| K3s | v1.35.0+k3s1 | - | - |
| Longhorn | v1.8.0 | longhorn-system | https://longhorn.fuq.tv |
| Nginx Ingress | latest | ingress-nginx | - |
| Cert-Manager | v1.16.2 | cert-manager | - |
| MariaDB | 11.4 LTS | control-plane | mariadb:3306 |
| Redis | 7 | control-plane | redis:6379 |
| Gitea | 1.25.3 | gitea | https://git.fuq.tv |
| ArgoCD | stable | argocd | https://argocd.fuq.tv |
| Gitea Runner | latest | gitea-actions | - |
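
A quick health check across these components, using only the namespaces listed above (a minimal sketch; it prints any pod that is not in the Running phase):

```bash
# Flag non-Running pods in each platform namespace
for ns in longhorn-system ingress-nginx cert-manager control-plane gitea gitea-actions argocd; do
  echo "== $ns =="
  kubectl get pods -n "$ns" --no-headers | grep -v Running || true
done
```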

## 🗄️ Storage (Longhorn HA)

- **StorageClass:** `longhorn` (default)
- **Replication:** 3 replicas per volume
- **Fault tolerance:** up to 2 workers can be lost without data loss

Volumes (see the PVC sketch below):

- `mariadb-pvc`: 20Gi (control-plane)
- `gitea-data`: 50Gi (gitea)
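
New volumes follow the same pattern. A minimal PVC sketch against the default `longhorn` StorageClass; the name and size below are illustrative, not existing volumes:

```bash
# Hypothetical example: a 10Gi Longhorn-backed volume in control-plane
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
  namespace: control-plane
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
EOF
```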

## 🌍 DNS and Networking

DNS (*.fuq.tv → Load Balancers):

```
*.fuq.tv    → 108.165.47.221, 108.165.47.203
*.r.fuq.tv  → 108.165.47.221, 108.165.47.203
```

**Private network:** 10.100.0.0/24 (eth1)
**Load balancing:** HAProxy on LB-01 and LB-02 → worker NodePorts
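
Services are exposed under `*.fuq.tv` via the nginx ingress. A minimal sketch of such an Ingress; the hostname, service name, and ClusterIssuer name are assumptions for illustration only:

```bash
# Hypothetical example: expose a service at example.fuq.tv with a Let's Encrypt certificate
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
  namespace: control-plane
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt   # assumes a ClusterIssuer with this name exists
spec:
  ingressClassName: nginx
  tls:
    - hosts: [example.fuq.tv]
      secretName: example-fuq-tv-tls
  rules:
    - host: example.fuq.tv
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example
                port:
                  number: 80
EOF
```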


## 📋 Namespaces

| Namespace | Purpose | CPU/RAM Quota |
|---|---|---|
| control-plane | Backend, DB, Redis | 8 CPU, 16 GB |
| agents | Claude agents | 20 CPU, 40 GB |
| gitea | Git server | 2 CPU, 4 GB |
| gitea-actions | CI/CD runner | - |
| argocd | GitOps | - |
| ingress-nginx | Ingress controller | - |
| cert-manager | TLS management | - |
| longhorn-system | Distributed storage | - |
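
Per-namespace quotas like those above are typically enforced with a ResourceQuota object. A minimal sketch for `control-plane`; the object name and the use of `limits.*` are assumptions, only the 8 CPU / 16 GB ceiling comes from the table:

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: control-plane-quota   # hypothetical name
  namespace: control-plane
spec:
  hard:
    limits.cpu: "8"
    limits.memory: 16Gi
EOF
```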

## Essential Commands

```bash
# Status
kubectl get nodes -o wide
kubectl get pods -A

# Resources
kubectl top nodes
kubectl top pods -A

# Deploy
kubectl apply -f k8s/backend/
kubectl rollout status deployment/backend -n control-plane

# Logs
kubectl logs -f deployment/backend -n control-plane

# Troubleshooting
kubectl describe pod <pod> -n <namespace>
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
```

## 🔐 Internal Connections (Cluster DNS)

```
mariadb.control-plane.svc.cluster.local:3306
redis.control-plane.svc.cluster.local:6379
gitea.gitea.svc.cluster.local:3000
```
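
To verify these names resolve from inside the cluster, a throwaway debug pod works (a minimal sketch; the busybox image tag is an assumption):

```bash
# Resolve the MariaDB service name from a temporary pod
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup mariadb.control-plane.svc.cluster.local
```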

## 💪 High Availability

| Component | Implementation | Fault Tolerance |
|---|---|---|
| Control plane | 3 etcd nodes | 1 node |
| Workers | 3 nodes | 2 nodes |
| Load balancers | DNS round-robin | 1 LB |
| Storage | Longhorn 3× replication | 2 workers |
| Ingress | On all workers | 2 workers |
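
The control-plane/worker split above can be confirmed from the node-role labels K3s applies to its server nodes (a minimal sketch):

```bash
# Control-plane (etcd) members vs. workers
kubectl get nodes -l node-role.kubernetes.io/control-plane=true
kubectl get nodes -l '!node-role.kubernetes.io/control-plane'
```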

## 🔧 Maintenance

### Backup

```bash
# etcd
ssh root@108.165.47.233 "k3s etcd-snapshot save"

# MariaDB
kubectl exec -n control-plane mariadb-0 -- \
  mariadb-dump -uroot -pAiWorker2026_RootPass! --all-databases > backup.sql
```
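
Snapshots are stored on the node that took them. Listing them, and scheduling them automatically via K3s server options, looks roughly like this (a sketch; the cron expression and retention value are examples, not the cluster's current settings):

```bash
# List existing etcd snapshots on cp-01
ssh root@108.165.47.233 "k3s etcd-snapshot ls"

# K3s can also snapshot on a schedule via server config, e.g. /etc/rancher/k3s/config.yaml:
#   etcd-snapshot-schedule-cron: "0 */12 * * *"   # hypothetical: every 12 hours
#   etcd-snapshot-retention: 5
```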

### Upgrade K3s

```bash
# Control planes first (one at a time), then workers
ssh root@<node-ip> "curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.X.X+k3s1 sh -"
```
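
When upgrading worker nodes it is safer to drain them first and uncordon them afterwards (a generic sketch, not a procedure specific to this cluster):

```bash
# Move workloads off the node before upgrading it
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# ...run the K3s upgrade on the node over SSH...

# Let workloads schedule back afterwards
kubectl uncordon <node-name>
```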

**Full setup:** CLUSTER-SETUP-COMPLETE.md · **Troubleshooting:** TROUBLESHOOTING.md · **For AI agents:** AGENT-GUIDE.md