
AI Agent Guide - Kubernetes Cluster Management

This document contains everything an AI agent needs to manage and operate the AiWorker Kubernetes cluster.


🔑 Cluster Access

Kubeconfig

export KUBECONFIG=~/.kube/aiworker-config

All kubectl commands must use:

kubectl --kubeconfig ~/.kube/aiworker-config <comando>

Or with the alias:

alias k='kubectl --kubeconfig ~/.kube/aiworker-config'
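Note that aliases expand only in interactive shells; scripts and CI jobs will not see them. A shell function is a drop-in alternative (a minimal sketch using the same kubeconfig path as above):

```shell
# Define a function instead of an alias: functions are visible in
# non-interactive shells, so scripts can use the same shorthand.
k() { kubectl --kubeconfig ~/.kube/aiworker-config "$@"; }

# Confirm the shorthand is defined
type k
```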

📋 Essential Commands

Cluster Verification

# Node status
kubectl get nodes -o wide

# All pods
kubectl get pods -A

# Pods by namespace
kubectl get pods -n <namespace>

# Cluster resource usage
kubectl top nodes
kubectl top pods -A

# Recent events
kubectl get events -A --sort-by='.lastTimestamp' | tail -20

Deployment Management

# List deployments
kubectl get deployments -A

# Deployment details
kubectl describe deployment <name> -n <namespace>

# Scale a deployment
kubectl scale deployment <name> -n <namespace> --replicas=3

# Restart a deployment
kubectl rollout restart deployment <name> -n <namespace>

# Rollout history
kubectl rollout history deployment <name> -n <namespace>

# Rollback
kubectl rollout undo deployment <name> -n <namespace>

Pod Management

# Stream logs
kubectl logs -f <pod-name> -n <namespace>

# Logs from a specific container
kubectl logs -f <pod-name> -c <container-name> -n <namespace>

# Run a command in a pod
kubectl exec -n <namespace> <pod-name> -- <command>

# Interactive shell
kubectl exec -it -n <namespace> <pod-name> -- /bin/bash

# Copy files
kubectl cp <namespace>/<pod>:/path/to/file ./local-file
kubectl cp ./local-file <namespace>/<pod>:/path/to/file

Service Management

# List services
kubectl get svc -A

# Port-forward for testing
kubectl port-forward -n <namespace> svc/<service-name> 8080:80

# Service endpoints
kubectl get endpoints -n <namespace> <service-name>

Ingress and TLS

# List ingresses
kubectl get ingress -A

# List certificates
kubectl get certificate -A

# Certificate details
kubectl describe certificate <name> -n <namespace>

# List CertificateRequests
kubectl get certificaterequest -A

Storage and PVCs

# List PVCs
kubectl get pvc -A

# List PVs
kubectl get pv

# Longhorn volumes
kubectl get volumes.longhorn.io -n longhorn-system

# Storage replicas
kubectl get replicas.longhorn.io -n longhorn-system

📦 Deploying Applications

Create a Basic Deployment

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: control-plane
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:latest
        ports:
        - containerPort: 3000
        env:
        - name: NODE_ENV
          value: production
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
          limits:
            cpu: 1
            memory: 2Gi
EOF

Create a Service

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: control-plane
spec:
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 3000
  type: ClusterIP
EOF

Create an Ingress with TLS

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  namespace: control-plane
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - myapp.fuq.tv
    secretName: myapp-fuq-tv-tls
  rules:
  - host: myapp.fuq.tv
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp
            port:
              number: 80
EOF

Create a PVC with Longhorn HA

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
  namespace: control-plane
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
EOF
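The default longhorn StorageClass keeps 3 replicas per volume. If a workload needs a different replica count, a dedicated StorageClass can be defined; a sketch below, where the name longhorn-2-replicas is a placeholder and numberOfReplicas is a Longhorn-specific parameter:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2-replicas     # placeholder name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"         # Longhorn-specific: replicas per volume
  staleReplicaTimeout: "30"
```

PVCs then reference it via storageClassName the same way the example above references longhorn.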

🗄️ Database Access

MariaDB (Internal)

Connection String:

Host: mariadb.control-plane.svc.cluster.local
Port: 3306
Database: aiworker
User: aiworker
Password: AiWorker2026_UserPass!
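Rather than hard-coding these values into an app, they can be consumed from a Secret. A sketch (the secret name myapp-db and the key names are assumptions, not existing resources):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: myapp-db                # assumed name, not an existing resource
  namespace: control-plane
type: Opaque
stringData:
  DB_HOST: mariadb.control-plane.svc.cluster.local
  DB_PORT: "3306"
  DB_NAME: aiworker
  DB_USER: aiworker
  DB_PASSWORD: AiWorker2026_UserPass!
```

A container can then load all keys as environment variables with envFrom: secretRef: name: myapp-db.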

Root Access:

kubectl exec -n control-plane mariadb-0 -- mariadb -uroot -pAiWorker2026_RootPass!

Create a new database:

kubectl exec -n control-plane mariadb-0 -- mariadb -uroot -pAiWorker2026_RootPass! -e "CREATE DATABASE mydb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"

Backup:

kubectl exec -n control-plane mariadb-0 -- mariadb-dump -uroot -pAiWorker2026_RootPass! --all-databases > backup.sql

Restore:

cat backup.sql | kubectl exec -i -n control-plane mariadb-0 -- mariadb -uroot -pAiWorker2026_RootPass!
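The manual dump above can also be scheduled in-cluster with a CronJob. A sketch, assuming the root password lives in a Secret named mariadb-credentials and dumps land on a PVC named mariadb-backups (neither resource is confirmed to exist):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mariadb-backup
  namespace: control-plane
spec:
  schedule: "0 3 * * *"           # daily at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: dump
            image: mariadb:11
            command:
            - /bin/sh
            - -c
            - mariadb-dump -h mariadb.control-plane.svc.cluster.local
              -uroot -p"$MARIADB_ROOT_PASSWORD" --all-databases
              > /backup/backup-$(date +%Y%m%d).sql
            env:
            - name: MARIADB_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mariadb-credentials   # assumed secret
                  key: root-password          # assumed key
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: mariadb-backups      # assumed PVC
```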

🔧 Troubleshooting

Pod won't start

# Check events
kubectl describe pod <pod-name> -n <namespace>

# Check logs
kubectl logs <pod-name> -n <namespace>

# Logs from the previous container (after a crash)
kubectl logs <pod-name> -n <namespace> --previous

# Debug shell alongside a failing pod
kubectl debug -it <pod-name> -n <namespace> --image=busybox

Ingress not working

# Check the Ingress
kubectl get ingress -n <namespace>
kubectl describe ingress <name> -n <namespace>

# Nginx Ingress controller logs
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller --tail=100

# Check the certificate
kubectl get certificate -n <namespace>
kubectl describe certificate <name> -n <namespace>

# If TLS fails, inspect the CertificateRequests
kubectl get certificaterequest -A

Storage/PVC issues

# Check the PVC
kubectl get pvc -n <namespace>
kubectl describe pvc <name> -n <namespace>

# Check Longhorn volumes
kubectl get volumes.longhorn.io -n longhorn-system

# Longhorn UI
https://longhorn.fuq.tv (admin / aiworker2026)

# Check replicas
kubectl get replicas.longhorn.io -n longhorn-system

Node problems

# Cordon (stop scheduling new pods)
kubectl cordon <node-name>

# Drain (evict pods to other nodes)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Uncordon (re-enable scheduling)
kubectl uncordon <node-name>

🚀 Common Workflows

Deploy a complete new application

# 1. Create the namespace if it doesn't exist
kubectl create namespace myapp

# 2. Create a secret if needed
kubectl create secret generic myapp-secret -n myapp \
  --from-literal=api-key=xxx

# 3. Apply manifests
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml

# 4. Verify
kubectl get all -n myapp
kubectl get ingress -n myapp
kubectl get certificate -n myapp

# 5. Check logs
kubectl logs -f -n myapp deployment/myapp

Update a deployment's image

# Option 1: Imperative
kubectl set image deployment/<name> <container>=<new-image>:<tag> -n <namespace>

# Option 2: Patch
kubectl patch deployment <name> -n <namespace> \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","image":"<new-image>:<tag>"}]}}}}'

# Option 3: Edit
kubectl edit deployment <name> -n <namespace>

Preview Environment (temporary namespace)

# 1. Create the namespace
kubectl create namespace preview-task-123

# 2. Label for automatic cleanup
kubectl label namespace preview-task-123 environment=preview ttl=168h

# 3. Deploy app
kubectl apply -f app.yaml -n preview-task-123

# 4. Create the ingress
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: preview
  namespace: preview-task-123
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - task-123.r.fuq.tv
    secretName: preview-task-123-tls
  rules:
  - host: task-123.r.fuq.tv
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp
            port:
              number: 80
EOF

# 5. Clean up when finished
kubectl delete namespace preview-task-123
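The ttl=168h label above only takes effect if an external cleanup job interprets it. A minimal sketch of the expiry check such a job could perform (timestamps are hard-coded samples; GNU date assumed):

```shell
# Parse a "ttl" label like 168h into seconds and compare it against the
# namespace age (creationTimestamp would come from kubectl in practice).
ttl="168h"
ttl_seconds=$(( ${ttl%h} * 3600 ))

created="2026-01-01T00:00:00Z"                       # sample creationTimestamp
created_epoch=$(date -u -d "$created" +%s)
now_epoch=$(date -u -d "2026-01-10T00:00:00Z" +%s)   # pretend "now"

age=$(( now_epoch - created_epoch ))
if [ "$age" -gt "$ttl_seconds" ]; then
  echo "expired"   # the job would then run: kubectl delete namespace <name>
else
  echo "keep"
fi
```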

🛡️ Security

Secrets Management

# Create a secret
kubectl create secret generic mysecret -n <namespace> \
  --from-literal=username=admin \
  --from-literal=password=xxx

# List secrets (values not shown)
kubectl get secrets -n <namespace>

# Read a secret value
kubectl get secret mysecret -n <namespace> -o jsonpath='{.data.password}' | base64 -d
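Secret values are stored base64-encoded, which is why the command above pipes through base64 -d. Encoding is not encryption: anyone with read access to the Secret recovers the plaintext. A quick local illustration:

```shell
# base64 is reversible encoding, not encryption.
plain='S3cure-Pass!'
encoded=$(printf '%s' "$plain" | base64)
decoded=$(printf '%s' "$encoded" | base64 -d)

echo "$encoded"   # what kubectl shows in the Secret's .data field
echo "$decoded"   # the original value
```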

RBAC

# List service accounts
kubectl get sa -A

# List roles
kubectl get roles -A
kubectl get clusterroles

# List bindings
kubectl get rolebindings -A
kubectl get clusterrolebindings

📊 Monitoring

Resource Usage

# Usage per node
kubectl top nodes

# Usage per pod
kubectl top pods -A

# Usage in a specific namespace
kubectl top pods -n control-plane

Health Checks

# System component status (the componentstatuses API is deprecated in recent Kubernetes)
kubectl get componentstatuses

# API server health
kubectl get --raw='/readyz?verbose'

# API server endpoint check (from a control plane node)
ssh root@108.165.47.233 "k3s kubectl get endpoints -n kube-system kube-apiserver"

🔄 GitOps with ArgoCD

Access

Create an Application

cat <<EOF | kubectl apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.fuq.tv/aiworker/myapp
    targetRevision: HEAD
    path: k8s
  destination:
    server: https://kubernetes.default.svc
    namespace: control-plane
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF

📍 Cluster Information

Service URLs

Internal Connections

  • MariaDB: mariadb.control-plane.svc.cluster.local:3306
  • Gitea: gitea.gitea.svc.cluster.local:3000
  • ArgoCD API: argocd-server.argocd.svc.cluster.local:443

SSH to Nodes

# Control planes
ssh root@108.165.47.233  # k8s-cp-01
ssh root@108.165.47.235  # k8s-cp-02
ssh root@108.165.47.215  # k8s-cp-03

# Workers
ssh root@108.165.47.225  # k8s-worker-01
ssh root@108.165.47.224  # k8s-worker-02
ssh root@108.165.47.222  # k8s-worker-03

# Load balancers
ssh root@108.165.47.221  # k8s-lb-01
ssh root@108.165.47.203  # k8s-lb-02

🎯 Common Agent Tasks

1. Deploy a new app version

# Update the image
kubectl set image deployment/<name> <container>=<image>:<new-tag> -n <namespace>

# Watch the rollout
kubectl rollout status deployment/<name> -n <namespace>

# If it fails, roll back
kubectl rollout undo deployment/<name> -n <namespace>

2. Create a preview environment

# Namespace
kubectl create namespace preview-<task-id>

# Deploy
kubectl apply -f manifests/ -n preview-<task-id>

# Ingress
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: preview
  namespace: preview-<task-id>
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - <task-id>.r.fuq.tv
    secretName: preview-tls
  rules:
  - host: <task-id>.r.fuq.tv
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app
            port:
              number: 80
EOF

# Verify the URL
curl https://<task-id>.r.fuq.tv

3. Scale an application

# Auto-scaling
kubectl autoscale deployment <name> -n <namespace> --cpu-percent=80 --min=2 --max=10

# Manual
kubectl scale deployment <name> -n <namespace> --replicas=5
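The autoscale command above creates a HorizontalPodAutoscaler imperatively. The declarative equivalent (autoscaling/v2; resource names are placeholders) can live alongside the app's other manifests:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp                 # placeholder name
  namespace: control-plane
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```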

4. Investigate a problem

# 1. Check overall status
kubectl get pods -n <namespace>

# 2. Describe the problem pod
kubectl describe pod <pod-name> -n <namespace>

# 3. Check logs
kubectl logs <pod-name> -n <namespace> --tail=100

# 4. Check events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# 5. If it's storage
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>

# 6. If it's networking
kubectl get svc,endpoints -n <namespace>
kubectl get ingress -n <namespace>

5. Back up configuration

# Export all resources
kubectl get all,ingress,certificate,pvc -n <namespace> -o yaml > backup.yaml

# Back up a specific resource
kubectl get deployment <name> -n <namespace> -o yaml > deployment-backup.yaml

🏗️ Manifest Structure

Complete Template

---
# Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: myapp

---
# ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: myapp
data:
  NODE_ENV: "production"
  LOG_LEVEL: "info"

---
# Secret
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secret
  namespace: myapp
type: Opaque
stringData:
  api-key: "your-api-key"

---
# PVC (if storage is needed)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
  namespace: myapp
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi

---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:latest
        ports:
        - containerPort: 3000
        env:
        - name: NODE_ENV
          valueFrom:
            configMapKeyRef:
              name: myapp-config
              key: NODE_ENV
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: myapp-secret
              key: api-key
        volumeMounts:
        - name: data
          mountPath: /data
        resources:
          requests:
            cpu: 250m
            memory: 512Mi
          limits:
            cpu: 1
            memory: 2Gi
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 5
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: myapp-data

---
# Service
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: myapp
spec:
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 3000
  type: ClusterIP

---
# Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  namespace: myapp
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - myapp.fuq.tv
    secretName: myapp-tls
  rules:
  - host: myapp.fuq.tv
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp
            port:
              number: 80

📚 Reference Resources

Project Documentation

  • CLUSTER-CREDENTIALS.md - Credentials and tokens
  • K8S-CLUSTER.md - Cluster status
  • docs/ - Complete project documentation

Useful Commands

# Everything in a namespace
kubectl get all -n <namespace>

# Apply an entire directory recursively
kubectl apply -f ./k8s/ -R

# Diff before applying
kubectl diff -f manifest.yaml

# Validate YAML
kubectl apply --dry-run=client -f manifest.yaml

# Output formats
kubectl get pods -o wide
kubectl get pods -o json
kubectl get pods -o yaml

Quick Reference

Project Namespaces

  • control-plane - Backend, API, MariaDB, Redis
  • agents - Claude Code agents
  • gitea - Git server
  • monitoring - Metrics, logs
  • argocd - GitOps

StorageClass

  • longhorn (default) - HA storage with 3 replicas

ClusterIssuers

  • letsencrypt-prod - Production certificates
  • letsencrypt-staging - Staging/testing certificates

IngressClass

  • nginx - Use for all Ingress resources

With this guide, any AI agent can operate the AiWorker cluster autonomously.