# ☸️ Kubernetes Cluster - AiWorker

**Location:** Houston, Texas (us-hou-1) · **K3s:** v1.35.0+k3s1 · **Cost:** $148/month


## 🖥️ Servers

| Hostname | Public IP | Private IP | Specs | Role |
|---|---|---|---|---|
| k8s-cp-01 | 108.165.47.233 | 10.100.0.2 | 4 vCPU, 8 GB | control-plane |
| k8s-cp-02 | 108.165.47.235 | 10.100.0.3 | 4 vCPU, 8 GB | control-plane |
| k8s-cp-03 | 108.165.47.215 | 10.100.0.4 | 4 vCPU, 8 GB | control-plane |
| k8s-worker-01 | 108.165.47.225 | 10.100.0.5 | 8 vCPU, 16 GB | worker |
| k8s-worker-02 | 108.165.47.224 | 10.100.0.6 | 8 vCPU, 16 GB | worker |
| k8s-worker-03 | 108.165.47.222 | 10.100.0.7 | 8 vCPU, 16 GB | worker |
| k8s-lb-01 | 108.165.47.221 | 10.100.0.8 | 2 vCPU, 4 GB | load-balancer |
| k8s-lb-02 | 108.165.47.203 | 10.100.0.9 | 2 vCPU, 4 GB | load-balancer |

**Total:** 40 vCPU, 80 GB RAM, ~2.9 TB storage
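
These totals can be cross-checked against what the cluster itself reports. A minimal sketch using plain kubectl (nothing cluster-specific assumed):

```bash
# Per-node CPU and memory capacity as reported by the API
kubectl get nodes -o custom-columns='NODE:.metadata.name,CPU:.status.capacity.cpu,MEMORY:.status.capacity.memory'
```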


## 🌐 Access

### Kubeconfig

```bash
export KUBECONFIG=~/.kube/aiworker-config
kubectl get nodes
```
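
To avoid exporting the variable in every new shell, the kubeconfig selection can be made persistent. A minimal sketch; the `~/.bashrc` location assumes a local bash setup and is not part of the cluster configuration:

```bash
# Persist the kubeconfig selection for new shells (assumes bash)
echo 'export KUBECONFIG=~/.kube/aiworker-config' >> ~/.bashrc

# Or merge it into the default config (overwrites ~/.kube/config, use with care)
KUBECONFIG=~/.kube/config:~/.kube/aiworker-config kubectl config view --flatten > ~/.kube/merged && mv ~/.kube/merged ~/.kube/config
```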

### SSH

```bash
ssh root@108.165.47.233  # cp-01
ssh root@108.165.47.225  # worker-01
ssh root@108.165.47.221  # lb-01
```
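
Host aliases save retyping the IPs. A sketch that appends entries to the local SSH config; the alias names are illustrative, not part of the cluster setup:

```bash
# Append host aliases to the local SSH config (alias names are illustrative)
cat >> ~/.ssh/config <<'EOF'
Host k8s-cp-01
    HostName 108.165.47.233
    User root
Host k8s-worker-01
    HostName 108.165.47.225
    User root
EOF
```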

## 📦 Installed Components

| Software | Version | Namespace | URL |
|---|---|---|---|
| K3s | v1.35.0+k3s1 | - | - |
| Longhorn | v1.8.0 | longhorn-system | https://longhorn.fuq.tv |
| Nginx Ingress | latest | ingress-nginx | - |
| Cert-Manager | v1.16.2 | cert-manager | - |
| MariaDB | 11.4 LTS | control-plane | mariadb:3306 |
| Redis | 7 | control-plane | redis:6379 |
| Gitea | 1.25.3 | gitea | https://git.fuq.tv |
| ArgoCD | stable | argocd | https://argocd.fuq.tv |
| Gitea Runner | latest | gitea-actions | - |
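
A quick health check across these components, using only the namespaces listed above (a minimal sketch; it prints any pod that is not in the Running phase):

```bash
# Flag non-Running pods in each platform namespace
for ns in longhorn-system ingress-nginx cert-manager control-plane gitea gitea-actions argocd; do
  echo "== $ns =="
  kubectl get pods -n "$ns" --no-headers | grep -v Running || true
done
```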

## 🗄️ Storage (Longhorn HA)

- **StorageClass:** `longhorn` (default)
- **Replication:** 3 replicas per volume
- **Fault tolerance:** up to 2 workers can be lost without data loss

Volumes (see the PVC sketch below):

- `mariadb-pvc`: 20Gi (control-plane)
- `gitea-data`: 50Gi (gitea)
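
New volumes follow the same pattern. A minimal PVC sketch against the default `longhorn` StorageClass; the name and size below are illustrative, not existing volumes:

```bash
# Hypothetical example: a 10Gi Longhorn-backed volume in control-plane
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
  namespace: control-plane
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
EOF
```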

## 🌍 DNS and Networking

DNS (*.fuq.tv → Load Balancers):

```
*.fuq.tv    → 108.165.47.221, 108.165.47.203
*.r.fuq.tv  → 108.165.47.221, 108.165.47.203
```

**Private network:** 10.100.0.0/24 (eth1)
**Load balancing:** HAProxy on LB-01 and LB-02 → worker NodePorts
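
Services are exposed under `*.fuq.tv` via the nginx ingress. A minimal sketch of such an Ingress; the hostname, service name, and ClusterIssuer name are assumptions for illustration only:

```bash
# Hypothetical example: expose a service at example.fuq.tv with a Let's Encrypt certificate
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
  namespace: control-plane
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt   # assumes a ClusterIssuer with this name exists
spec:
  ingressClassName: nginx
  tls:
    - hosts: [example.fuq.tv]
      secretName: example-fuq-tv-tls
  rules:
    - host: example.fuq.tv
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example
                port:
                  number: 80
EOF
```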


## 📋 Namespaces

| Namespace | Purpose | CPU/RAM Quota |
|---|---|---|
| control-plane | Backend, DB, Redis | 8 CPU, 16 GB |
| agents | Claude agents | 20 CPU, 40 GB |
| gitea | Git server | 2 CPU, 4 GB |
| gitea-actions | CI/CD runner | - |
| argocd | GitOps | - |
| ingress-nginx | Ingress controller | - |
| cert-manager | TLS management | - |
| longhorn-system | Distributed storage | - |
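
Per-namespace quotas like those above are typically enforced with a ResourceQuota object. A minimal sketch for `control-plane`; the object name and the use of `limits.*` are assumptions, only the 8 CPU / 16 GB ceiling comes from the table:

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: control-plane-quota   # hypothetical name
  namespace: control-plane
spec:
  hard:
    limits.cpu: "8"
    limits.memory: 16Gi
EOF
```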

## Essential Commands

```bash
# Status
kubectl get nodes -o wide
kubectl get pods -A

# Resources
kubectl top nodes
kubectl top pods -A

# Deploy
kubectl apply -f k8s/backend/
kubectl rollout status deployment/backend -n control-plane

# Logs
kubectl logs -f deployment/backend -n control-plane

# Troubleshooting
kubectl describe pod <pod> -n <namespace>
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
```

## 🔐 Internal Connections (Cluster DNS)

```
mariadb.control-plane.svc.cluster.local:3306
redis.control-plane.svc.cluster.local:6379
gitea.gitea.svc.cluster.local:3000
```
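
To verify these names resolve from inside the cluster, a throwaway debug pod works (a minimal sketch; the busybox image tag is an assumption):

```bash
# Resolve the MariaDB service name from a temporary pod
kubectl run dnstest --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup mariadb.control-plane.svc.cluster.local
```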

## 💪 High Availability

| Component | Implementation | Fault Tolerance |
|---|---|---|
| Control plane | 3 etcd nodes | 1 node |
| Workers | 3 nodes | 2 nodes |
| Load balancers | DNS round-robin | 1 LB |
| Storage | Longhorn 3× replication | 2 workers |
| Ingress | On all workers | 2 workers |
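
The control-plane/worker split above can be confirmed from the node-role labels K3s applies to its server nodes (a minimal sketch):

```bash
# Control-plane (etcd) members vs. workers
kubectl get nodes -l node-role.kubernetes.io/control-plane=true
kubectl get nodes -l '!node-role.kubernetes.io/control-plane'
```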

## 🔧 Maintenance

### Backup

```bash
# etcd
ssh root@108.165.47.233 "k3s etcd-snapshot save"

# MariaDB
kubectl exec -n control-plane mariadb-0 -- \
  mariadb-dump -uroot -pAiWorker2026_RootPass! --all-databases > backup.sql
```
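
Snapshots are stored on the node that took them. Listing them, and scheduling them automatically via K3s server options, looks roughly like this (a sketch; the cron expression and retention value are examples, not the cluster's current settings):

```bash
# List existing etcd snapshots on cp-01
ssh root@108.165.47.233 "k3s etcd-snapshot ls"

# K3s can also snapshot on a schedule via server config, e.g. /etc/rancher/k3s/config.yaml:
#   etcd-snapshot-schedule-cron: "0 */12 * * *"   # hypothetical: every 12 hours
#   etcd-snapshot-retention: 5
```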

### Upgrade K3s

```bash
# Control planes first (one at a time), then workers
ssh root@<node-ip> "curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.X.X+k3s1 sh -"
```
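
When upgrading worker nodes it is safer to drain them first and uncordon them afterwards (a generic sketch, not a procedure specific to this cluster):

```bash
# Move workloads off the node before upgrading it
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# ...run the K3s upgrade on the node over SSH...

# Let workloads schedule back afterwards
kubectl uncordon <node-name>
```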

**Full setup:** CLUSTER-SETUP-COMPLETE.md · **Troubleshooting:** TROUBLESHOOTING.md · **For AI agents:** AGENT-GUIDE.md