# AI Agent Guide - Kubernetes Cluster Management

This document contains everything an AI agent needs to manage and operate the AiWorker Kubernetes cluster.
## 🔑 Cluster Access

### Kubeconfig

```bash
export KUBECONFIG=~/.kube/aiworker-config
```

Every kubectl command must use:

```bash
kubectl --kubeconfig ~/.kube/aiworker-config <command>
```

Or, with an alias:

```bash
alias k='kubectl --kubeconfig ~/.kube/aiworker-config'
```
## 📋 Essential Commands

### Cluster Health

```bash
# Node status
kubectl get nodes -o wide

# All pods
kubectl get pods -A

# Pods in a namespace
kubectl get pods -n <namespace>

# Cluster resource usage
kubectl top nodes
kubectl top pods -A

# Recent events
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
```
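The pod listing above can feed a one-line health summary. The sketch below is an illustration, not part of the cluster tooling: `count_unhealthy` is a hypothetical helper name, and the column positions assume the default `kubectl get pods -A --no-headers` output (STATUS is the fourth column).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count pods whose STATUS is neither Running nor
# Completed. In practice, pipe live output into it:
#   kubectl get pods -A --no-headers | count_unhealthy

count_unhealthy() {
  # $4 is the STATUS column of `kubectl get pods -A` output
  awk '$4 != "Running" && $4 != "Completed" { n++ } END { print n+0 }'
}

# Example with canned output (NAMESPACE NAME READY STATUS RESTARTS AGE):
printf '%s\n' \
  'default  web-1  1/1  Running           0  5d' \
  'default  job-1  0/1  Completed         0  1d' \
  'default  db-1   0/1  CrashLoopBackOff  7  1d' \
  | count_unhealthy   # prints 1
```

A non-zero count is a quick signal to run the troubleshooting steps later in this guide.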
### Deployment Management

```bash
# List deployments
kubectl get deployments -A

# Deployment details
kubectl describe deployment <name> -n <namespace>

# Scale a deployment
kubectl scale deployment <name> -n <namespace> --replicas=3

# Restart a deployment
kubectl rollout restart deployment <name> -n <namespace>

# View rollout history
kubectl rollout history deployment <name> -n <namespace>

# Roll back
kubectl rollout undo deployment <name> -n <namespace>
```
### Pod Management

```bash
# Stream logs
kubectl logs -f <pod-name> -n <namespace>

# Logs from a specific container
kubectl logs -f <pod-name> -c <container-name> -n <namespace>

# Run a command inside a pod
kubectl exec -n <namespace> <pod-name> -- <command>

# Interactive shell
kubectl exec -it -n <namespace> <pod-name> -- /bin/bash

# Copy files
kubectl cp <namespace>/<pod>:/path/to/file ./local-file
kubectl cp ./local-file <namespace>/<pod>:/path/to/file
```
### Service Management

```bash
# List services
kubectl get svc -A

# Port-forward for testing
kubectl port-forward -n <namespace> svc/<service-name> 8080:80

# Endpoints of a service
kubectl get endpoints -n <namespace> <service-name>
```
### Ingress and TLS

```bash
# List ingresses
kubectl get ingress -A

# List certificates
kubectl get certificate -A

# Certificate details
kubectl describe certificate <name> -n <namespace>

# List CertificateRequests
kubectl get certificaterequest -A
```
### Storage and PVCs

```bash
# List PVCs
kubectl get pvc -A

# List PVs
kubectl get pv

# Longhorn volumes
kubectl get volumes.longhorn.io -n longhorn-system

# Storage replicas
kubectl get replicas.longhorn.io -n longhorn-system
```
## 📦 Deploying Applications

### Create a Basic Deployment

```bash
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: control-plane
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:latest
        ports:
        - containerPort: 3000
        env:
        - name: NODE_ENV
          value: production
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
          limits:
            cpu: 1
            memory: 2Gi
EOF
```
### Create a Service

```bash
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: control-plane
spec:
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 3000
  type: ClusterIP
EOF
```
### Create an Ingress with TLS

```bash
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  namespace: control-plane
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - myapp.fuq.tv
    secretName: myapp-fuq-tv-tls
  rules:
  - host: myapp.fuq.tv
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp
            port:
              number: 80
EOF
```
### Create a PVC with Longhorn HA

```bash
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
  namespace: control-plane
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
EOF
```
## 🗄️ Database Access

### MariaDB (Internal)

**Connection string:**

```
Host: mariadb.control-plane.svc.cluster.local
Port: 3306
Database: aiworker
User: aiworker
Password: AiWorker2026_UserPass!
```

**Root access:**

```bash
kubectl exec -n control-plane mariadb-0 -- mariadb -uroot -pAiWorker2026_RootPass!
```

**Create a new database:**

```bash
kubectl exec -n control-plane mariadb-0 -- mariadb -uroot -pAiWorker2026_RootPass! -e "CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
```

**Backup:**

```bash
kubectl exec -n control-plane mariadb-0 -- mariadb-dump -uroot -pAiWorker2026_RootPass! --all-databases > backup.sql
```

**Restore:**

```bash
cat backup.sql | kubectl exec -i -n control-plane mariadb-0 -- mariadb -uroot -pAiWorker2026_RootPass!
```
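The backup command above always writes to `backup.sql`, so each run overwrites the last dump. A timestamped file name avoids that. This is a sketch under assumptions: `backup_name` is a hypothetical helper, not an existing script in this repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: generate a UTC-timestamped dump file name.
# Combine it with the mariadb-dump invocation shown earlier, e.g.:
#   kubectl exec -n control-plane mariadb-0 -- mariadb-dump ... > "$(backup_name)"

backup_name() {
  echo "mariadb-backup-$(date -u +%Y%m%d-%H%M%S).sql"
}

backup_name   # e.g. mariadb-backup-20260115-031500.sql
```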
## 🔧 Troubleshooting

### Pod won't start

```bash
# Check events
kubectl describe pod <pod-name> -n <namespace>

# Check logs
kubectl logs <pod-name> -n <namespace>

# Logs from the previous container (if it crashed)
kubectl logs <pod-name> -n <namespace> --previous

# Attach a debug container to a failing pod
kubectl debug -it <pod-name> -n <namespace> --image=busybox
```
### Ingress not working

```bash
# Check the Ingress
kubectl get ingress -n <namespace>
kubectl describe ingress <name> -n <namespace>

# Nginx Ingress controller logs
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller --tail=100

# Check the certificate
kubectl get certificate -n <namespace>
kubectl describe certificate <name> -n <namespace>

# If TLS fails, inspect the CertificateRequests
kubectl get certificaterequest -A
```
### Storage/PVC issues

```bash
# Check the PVC
kubectl get pvc -n <namespace>
kubectl describe pvc <name> -n <namespace>

# Check Longhorn volumes
kubectl get volumes.longhorn.io -n longhorn-system

# Longhorn UI: https://longhorn.fuq.tv (admin / aiworker2026)

# Check replicas
kubectl get replicas.longhorn.io -n longhorn-system
```
### Node problems

```bash
# Cordon (stop scheduling new pods onto the node)
kubectl cordon <node-name>

# Drain (evict pods to other nodes)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Uncordon (re-enable scheduling)
kubectl uncordon <node-name>
```
## 🚀 Common Workflows

### Deploy a complete new application

```bash
# 1. Create the namespace if it does not exist
kubectl create namespace myapp

# 2. Create a secret if the app needs one
kubectl create secret generic myapp-secret -n myapp \
  --from-literal=api-key=xxx

# 3. Apply the manifests
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml

# 4. Verify
kubectl get all -n myapp
kubectl get ingress -n myapp
kubectl get certificate -n myapp

# 5. Watch the logs
kubectl logs -f -n myapp deployment/myapp
```
### Update a deployment's image

```bash
# Option 1: Imperative
kubectl set image deployment/<name> <container>=<new-image>:<tag> -n <namespace>

# Option 2: Patch
kubectl patch deployment <name> -n <namespace> \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","image":"<new-image>:<tag>"}]}}}}'

# Option 3: Edit
kubectl edit deployment <name> -n <namespace>
```
### Preview environment (temporary namespace)

```bash
# 1. Create the namespace
kubectl create namespace preview-task-123

# 2. Label it for automatic cleanup
kubectl label namespace preview-task-123 environment=preview ttl=168h

# 3. Deploy the app
kubectl apply -f app.yaml -n preview-task-123

# 4. Create the ingress
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: preview
  namespace: preview-task-123
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - task-123.r.fuq.tv
    secretName: preview-task-123-tls
  rules:
  - host: task-123.r.fuq.tv
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp
            port:
              number: 80
EOF

# 5. Clean up when done
kubectl delete namespace preview-task-123
```
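The `ttl=168h` label applied in step 2 only marks a namespace; nothing in this guide shows the expiry logic itself. The sketch below illustrates one way that decision could work. It is an assumption, not an existing cleanup job: `is_expired` is a hypothetical helper, and it requires GNU `date` (`-d` parsing).

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide whether a preview namespace has outlived
# its ttl label, given its creationTimestamp and the ttl value.
# In practice the inputs would come from something like:
#   kubectl get ns <name> -o jsonpath='{.metadata.creationTimestamp}'
#   kubectl get ns <name> -o jsonpath='{.metadata.labels.ttl}'

# is_expired CREATION_TIMESTAMP TTL (e.g. "168h") -> "expired" or "active"
is_expired() {
  local created="$1" ttl_hours="${2%h}"   # strip the trailing "h"
  local created_s now_s age_h
  created_s=$(date -u -d "$created" +%s)  # GNU date ISO-8601 parsing
  now_s=$(date -u +%s)
  age_h=$(( (now_s - created_s) / 3600 ))
  if (( age_h > ttl_hours )); then echo expired; else echo active; fi
}

# A namespace created in 2020 with ttl=168h is long past its TTL:
is_expired "2020-01-01T00:00:00Z" 168h   # prints "expired"
```

A cron job could loop over namespaces labeled `environment=preview` and `kubectl delete` the expired ones.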
## 🛡️ Security

### Secrets Management

```bash
# Create a secret
kubectl create secret generic mysecret -n <namespace> \
  --from-literal=username=admin \
  --from-literal=password=xxx

# List secrets (values are not shown)
kubectl get secrets -n <namespace>

# Read a secret value
kubectl get secret mysecret -n <namespace> -o jsonpath='{.data.password}' | base64 -d
```
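The `base64 -d` step above works because Kubernetes stores every value under `.data` base64-encoded. A quick local round trip (no cluster needed) makes the encoding behavior concrete; the value here is just an example string.

```shell
#!/usr/bin/env bash
# Round-trip a value the way kubectl stores and returns it.
encoded=$(printf '%s' 'hunter2' | base64)
echo "$encoded"                       # prints aHVudGVyMg==
printf '%s' "$encoded" | base64 -d    # prints hunter2
```

Note: use `printf '%s'` rather than `echo` when encoding, so no trailing newline sneaks into the stored value.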
### RBAC

```bash
# List service accounts
kubectl get sa -A

# List roles
kubectl get roles -A
kubectl get clusterroles

# List bindings
kubectl get rolebindings -A
kubectl get clusterrolebindings
```
## 📊 Monitoring

### Resource Usage

```bash
# Usage per node
kubectl top nodes

# Usage per pod
kubectl top pods -A

# Usage in a specific namespace
kubectl top pods -n control-plane
```

### Health Checks

```bash
# System components (componentstatuses is deprecated in recent Kubernetes)
kubectl get componentstatuses

# API server health
kubectl get --raw='/readyz?verbose'

# Control plane health (run from a control plane node; inspects the kube-apiserver endpoints)
ssh root@108.165.47.233 "k3s kubectl get endpoints -n kube-system kube-apiserver"
```
## 🔄 GitOps with ArgoCD

### Access

- URL: https://argocd.fuq.tv
- User: admin
- Pass: LyPF4Hy0wvp52IoU

### Create an Application

```bash
cat <<EOF | kubectl apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.fuq.tv/aiworker/myapp
    targetRevision: HEAD
    path: k8s
  destination:
    server: https://kubernetes.default.svc
    namespace: control-plane
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF
```
## 📍 Cluster Information

### Service URLs

- Gitea: https://git.fuq.tv
- ArgoCD: https://argocd.fuq.tv
- Longhorn: https://longhorn.fuq.tv
- Test App: https://test.fuq.tv

### Internal Endpoints

- MariaDB: `mariadb.control-plane.svc.cluster.local:3306`
- Gitea: `gitea.gitea.svc.cluster.local:3000`
- ArgoCD API: `argocd-server.argocd.svc.cluster.local:443`
### SSH to Nodes

```bash
# Control planes
ssh root@108.165.47.233  # k8s-cp-01
ssh root@108.165.47.235  # k8s-cp-02
ssh root@108.165.47.215  # k8s-cp-03

# Workers
ssh root@108.165.47.225  # k8s-worker-01
ssh root@108.165.47.224  # k8s-worker-02
ssh root@108.165.47.222  # k8s-worker-03

# Load balancers
ssh root@108.165.47.221  # k8s-lb-01
ssh root@108.165.47.203  # k8s-lb-02
```
## 🎯 Common Agent Tasks

### 1. Deploy a new app version

```bash
# Update the image
kubectl set image deployment/<name> <container>=<image>:<new-tag> -n <namespace>

# Watch the rollout
kubectl rollout status deployment/<name> -n <namespace>

# Roll back if it fails
kubectl rollout undo deployment/<name> -n <namespace>
```
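The update/verify/rollback sequence above can be wrapped into a single function so a failed rollout is reverted automatically. This is a sketch under assumptions: `deploy_or_rollback` is a hypothetical helper name, and it assumes `kubectl` is on PATH with `KUBECONFIG` already exported as described at the top of this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set a new image, wait for the rollout, and undo
# it if the rollout does not complete within the timeout.

# deploy_or_rollback NAME CONTAINER IMAGE NAMESPACE
deploy_or_rollback() {
  local name="$1" container="$2" image="$3" ns="$4"
  kubectl set image "deployment/$name" "$container=$image" -n "$ns" || return 1
  if ! kubectl rollout status "deployment/$name" -n "$ns" --timeout=120s; then
    echo "rollout failed, rolling back" >&2
    kubectl rollout undo "deployment/$name" -n "$ns"
    return 1
  fi
}

# Usage: deploy_or_rollback myapp app registry.example/myapp:v2 control-plane
```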
### 2. Create a preview environment

```bash
# Namespace
kubectl create namespace preview-<task-id>

# Deploy
kubectl apply -f manifests/ -n preview-<task-id>

# Ingress
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: preview
  namespace: preview-<task-id>
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - <task-id>.r.fuq.tv
    secretName: preview-tls
  rules:
  - host: <task-id>.r.fuq.tv
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app
            port:
              number: 80
EOF

# Verify the URL
curl https://<task-id>.r.fuq.tv
```
### 3. Scale an application

```bash
# Auto-scaling
kubectl autoscale deployment <name> -n <namespace> --cpu-percent=80 --min=2 --max=10

# Check the resulting HPA
kubectl get hpa -n <namespace>

# Manual
kubectl scale deployment <name> -n <namespace> --replicas=5
```
### 4. Investigate a problem

```bash
# 1. Check overall state
kubectl get pods -n <namespace>

# 2. Describe the failing pod
kubectl describe pod <pod-name> -n <namespace>

# 3. Check logs
kubectl logs <pod-name> -n <namespace> --tail=100

# 4. Check events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# 5. If it's storage-related
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>

# 6. If it's networking-related
kubectl get svc,endpoints -n <namespace>
kubectl get ingress -n <namespace>
```
### 5. Back up configuration

```bash
# Export all resources in a namespace
kubectl get all,ingress,certificate,pvc -n <namespace> -o yaml > backup.yaml

# Back up a single resource
kubectl get deployment <name> -n <namespace> -o yaml > deployment-backup.yaml
```
## 🏗️ Manifest Structure

### Complete Template

```yaml
---
# Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
---
# ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: myapp
data:
  NODE_ENV: "production"
  LOG_LEVEL: "info"
---
# Secret
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secret
  namespace: myapp
type: Opaque
stringData:
  api-key: "your-api-key"
---
# PVC (if the app needs storage)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
  namespace: myapp
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:latest
        ports:
        - containerPort: 3000
        env:
        - name: NODE_ENV
          valueFrom:
            configMapKeyRef:
              name: myapp-config
              key: NODE_ENV
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: myapp-secret
              key: api-key
        volumeMounts:
        - name: data
          mountPath: /data
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
          limits:
            cpu: 1
            memory: 2Gi
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 5
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: myapp-data
---
# Service
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: myapp
spec:
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 3000
  type: ClusterIP
---
# Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  namespace: myapp
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - myapp.fuq.tv
    secretName: myapp-tls
  rules:
  - host: myapp.fuq.tv
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp
            port:
              number: 80
```
## 📚 Reference Resources

### Project Documentation

- `CLUSTER-CREDENTIALS.md` - Credentials and tokens
- `CLUSTER-READY.md` - Cluster status
- `docs/` - Full project documentation
### Useful Commands

```bash
# Everything in a namespace
kubectl get all -n <namespace>

# Apply a whole directory recursively
kubectl apply -f ./k8s/ -R

# Diff before applying
kubectl diff -f manifest.yaml

# Validate YAML
kubectl apply --dry-run=client -f manifest.yaml

# Output formats
kubectl get pods -o wide
kubectl get pods -o json
kubectl get pods -o yaml
```
## ⚡ Quick Reference

### Project Namespaces

- `control-plane` - Backend, API, MySQL, Redis
- `agents` - Claude Code agents
- `gitea` - Git server
- `monitoring` - Metrics, logs
- `argocd` - GitOps

### StorageClass

- `longhorn` (default) - HA storage with 3 replicas

### ClusterIssuers

- `letsencrypt-prod` - Production certificates
- `letsencrypt-staging` - Testing certificates

### IngressClass

- `nginx` - Use for all Ingress resources

With this guide, any AI agent can operate the cluster autonomously.