# AI Agent Guide - Kubernetes Cluster Management

This document contains all the information AI agents need to manage and operate the AiWorker Kubernetes cluster.

---

## 🔑 Cluster Access

### Kubeconfig

```bash
export KUBECONFIG=~/.kube/aiworker-config
```

All kubectl commands must use:

```bash
kubectl --kubeconfig ~/.kube/aiworker-config
```

Or with the alias:

```bash
alias k='kubectl --kubeconfig ~/.kube/aiworker-config'
```

---

## 📋 Essential Commands

### Cluster Verification

```bash
# Node status
kubectl get nodes -o wide

# All pods
kubectl get pods -A

# Pods in a namespace
kubectl get pods -n <namespace>

# Cluster resource usage
kubectl top nodes
kubectl top pods -A

# Recent events
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
```

### Deployment Management

```bash
# List deployments
kubectl get deployments -A

# Deployment details
kubectl describe deployment <name> -n <namespace>

# Scale a deployment
kubectl scale deployment <name> -n <namespace> --replicas=3

# Restart a deployment
kubectl rollout restart deployment <name> -n <namespace>

# Rollout history
kubectl rollout history deployment <name> -n <namespace>

# Rollback
kubectl rollout undo deployment <name> -n <namespace>
```

### Pod Management

```bash
# View logs
kubectl logs -f <pod> -n <namespace>

# Logs from a specific container
kubectl logs -f <pod> -c <container> -n <namespace>

# Run a command in a pod
kubectl exec <pod> -n <namespace> -- <command>

# Interactive shell
kubectl exec -it <pod> -n <namespace> -- /bin/bash

# Copy files
kubectl cp <namespace>/<pod>:/path/to/file ./local-file
kubectl cp ./local-file <namespace>/<pod>:/path/to/file
```

### Service Management

```bash
# List services
kubectl get svc -A

# Port-forward for testing
kubectl port-forward -n <namespace> svc/<service> 8080:80

# Endpoints of a service
kubectl get endpoints <service> -n <namespace>
```

### Ingress and TLS

```bash
# List ingresses
kubectl get ingress -A

# List certificates
kubectl get certificate -A

# Certificate details
kubectl describe certificate <name> -n <namespace>

# List CertificateRequests
kubectl get certificaterequest -A
```

### Storage and PVCs

```bash
# List PVCs
kubectl get pvc -A

# List PVs
kubectl get pv

# Longhorn volumes
kubectl get volumes.longhorn.io -n longhorn-system

# Storage replicas
kubectl get replicas.longhorn.io -n longhorn-system
```

---

## 📦 Deploying Applications

### Create a Basic Deployment

Pipe an inline manifest into `kubectl apply -f -` with a heredoc (`cat <<EOF | kubectl apply -f -`); a complete manifest template is provided under "Manifest Structure" below.

### MariaDB Backup and Restore

**Restore** (load a SQL dump into the MariaDB instance in the `control-plane` namespace):

```bash
cat backup.sql | kubectl exec -i -n control-plane mariadb-0 -- mariadb -uroot -pAiWorker2026_RootPass!
```
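The dump consumed by the restore above has to be produced first; a minimal backup sketch, assuming `mariadb-dump` is available inside the `mariadb-0` pod and that all databases should be dumped (adjust the flags to what you actually need):

```bash
# Assumed counterpart to the restore above: dump all databases to backup.sql on the local machine
kubectl exec -n control-plane mariadb-0 -- \
  mariadb-dump -uroot -pAiWorker2026_RootPass! --all-databases > backup.sql
```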
---

## 🔧 Troubleshooting

### Pod won't start

```bash
# View events
kubectl describe pod <pod> -n <namespace>

# View logs
kubectl logs <pod> -n <namespace>

# Logs from the previous container (if it crashed)
kubectl logs <pod> -n <namespace> --previous

# Shell into a failed pod
kubectl debug -it <pod> -n <namespace> --image=busybox
```

### Ingress not working

```bash
# Check the Ingress
kubectl get ingress -n <namespace>
kubectl describe ingress <name> -n <namespace>

# Nginx Ingress controller logs
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller --tail=100

# Check the certificate
kubectl get certificate -n <namespace>
kubectl describe certificate <name> -n <namespace>

# If TLS fails, check the CertificateRequests
kubectl get certificaterequest -A
```

### Storage/PVC issues

```bash
# Check the PVC
kubectl get pvc -n <namespace>
kubectl describe pvc <name> -n <namespace>

# Longhorn volumes
kubectl get volumes.longhorn.io -n longhorn-system

# Longhorn UI: https://longhorn.fuq.tv (admin / aiworker2026)

# Check replicas
kubectl get replicas.longhorn.io -n longhorn-system
```

### Node problems

```bash
# Cordon (stop scheduling new pods)
kubectl cordon <node>

# Drain (move pods to other nodes)
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data

# Uncordon (re-enable scheduling)
kubectl uncordon <node>
```

---

## 🚀 Common Workflows

### Deploy a complete new application

```bash
# 1. Create the namespace if it does not exist
kubectl create namespace myapp

# 2. Create a secret if needed
kubectl create secret generic myapp-secret -n myapp \
  --from-literal=api-key=xxx

# 3. Apply the manifests
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f ingress.yaml

# 4. Verify
kubectl get all -n myapp
kubectl get ingress -n myapp
kubectl get certificate -n myapp

# 5. Check logs
kubectl logs -f -n myapp deployment/myapp
```

### Update a deployment's image

```bash
# Option 1: Imperative
kubectl set image deployment/<name> <container>=<image>:<tag> -n <namespace>

# Option 2: Patch
kubectl patch deployment <name> -n <namespace> \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"<container>","image":"<image>:<tag>"}]}}}}'

# Option 3: Edit
kubectl edit deployment <name> -n <namespace>
```

### Preview Environment (temporary namespace)

```bash
# 1. Create the namespace
kubectl create namespace preview-task-123

# 2. Label it for automatic cleanup
kubectl label namespace preview-task-123 environment=preview ttl=168h

# 3. Deploy the app
kubectl apply -f app.yaml -n preview-task-123

# 4. Create the ingress (full example under "2. Create a preview environment" below)
cat <<EOF | kubectl apply -f -
# ...Ingress manifest...
EOF
```

---

## 🔐 Security

### Secrets

```bash
# Create a secret
kubectl create secret generic <name> -n <namespace> \
  --from-literal=username=admin \
  --from-literal=password=xxx

# List secrets (values are not shown)
kubectl get secrets -n <namespace>

# Read a secret value
kubectl get secret mysecret -n <namespace> -o jsonpath='{.data.password}' | base64 -d
```

### RBAC

```bash
# List service accounts
kubectl get sa -A

# List roles
kubectl get roles -A
kubectl get clusterroles

# List bindings
kubectl get rolebindings -A
kubectl get clusterrolebindings
```

---

## 📊 Monitoring

### Resource Usage

```bash
# Usage per node
kubectl top nodes

# Usage per pod
kubectl top pods -A

# Usage in a specific namespace
kubectl top pods -n control-plane
```

### Health Checks

```bash
# System components
kubectl get componentstatuses

# API server health
kubectl get --raw='/readyz?verbose'

# etcd health (from the control plane)
ssh root@108.165.47.233 "k3s kubectl get endpoints -n kube-system kube-apiserver"
```

---

## 🔄 GitOps with ArgoCD

### Access

- **URL**: https://argocd.fuq.tv
- **User**: admin
- **Pass**: LyPF4Hy0wvp52IoU

### Create an Application

```bash
cat <<EOF | kubectl apply -f -
# ...ArgoCD Application manifest...
EOF
```

---

## 🎯 Common Tasks

### 1. Update a deployment's image

```bash
# Update the image
kubectl set image deployment/<name> <container>=<image>:<tag> -n <namespace>

# Verify the rollout
kubectl rollout status deployment/<name> -n <namespace>

# If it fails, roll back
kubectl rollout undo deployment/<name> -n <namespace>
```
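The three commands above can be combined into one guarded update; a minimal sketch, reusing the `myapp` names used elsewhere in this guide (adjust namespace, deployment, container, and image to the real target):

```bash
#!/usr/bin/env bash
# Guarded image update: apply the new image, wait for the rollout, undo on failure.
set -euo pipefail

NS="myapp"           # illustrative namespace
DEPLOY="myapp"       # illustrative deployment name
CONTAINER="app"      # illustrative container name
IMAGE="myapp:v2"     # illustrative image tag

kubectl set image deployment/"$DEPLOY" "$CONTAINER"="$IMAGE" -n "$NS"

# Wait up to 2 minutes; roll back automatically if the rollout does not complete.
if ! kubectl rollout status deployment/"$DEPLOY" -n "$NS" --timeout=120s; then
  kubectl rollout undo deployment/"$DEPLOY" -n "$NS"
fi
```

Using `--timeout` keeps an agent from waiting indefinitely on a rollout that will never converge.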
### 2. Create a preview environment

```bash
# Namespace
kubectl create namespace preview-<task-id>

# Deploy
kubectl apply -f manifests/ -n preview-<task-id>

# Ingress
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
  namespace: preview-<task-id>
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - <task-id>.r.fuq.tv
      secretName: preview-tls
  rules:
    - host: <task-id>.r.fuq.tv
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80
EOF

# Verify the URL
curl https://<task-id>.r.fuq.tv
```

### 3. Scale an application

```bash
# Auto-scaling
kubectl autoscale deployment <name> -n <namespace> --cpu-percent=80 --min=2 --max=10

# Manual
kubectl scale deployment <name> -n <namespace> --replicas=5
```

### 4. Investigate a problem

```bash
# 1. Check overall status
kubectl get pods -n <namespace>

# 2. Describe the problematic pod
kubectl describe pod <pod> -n <namespace>

# 3. View logs
kubectl logs <pod> -n <namespace> --tail=100

# 4. View events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# 5. If it is storage-related
kubectl get pvc -n <namespace>
kubectl describe pvc <name> -n <namespace>

# 6. If it is networking-related
kubectl get svc,endpoints -n <namespace>
kubectl get ingress -n <namespace>
```

### 5. Back up configuration

```bash
# Export all resources
kubectl get all,ingress,certificate,pvc -n <namespace> -o yaml > backup.yaml

# Back up a specific resource
kubectl get deployment <name> -n <namespace> -o yaml > deployment-backup.yaml
```

---

## 🏗️ Manifest Structure

### Complete Template

```yaml
---
# Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
---
# ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: myapp
data:
  NODE_ENV: "production"
  LOG_LEVEL: "info"
---
# Secret
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secret
  namespace: myapp
type: Opaque
stringData:
  api-key: "your-api-key"
---
# PVC (if storage is needed)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
  namespace: myapp
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              valueFrom:
                configMapKeyRef:
                  name: myapp-config
                  key: NODE_ENV
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: myapp-secret
                  key: api-key
          volumeMounts:
            - name: data
              mountPath: /data
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
            limits:
              cpu: 1
              memory: 2Gi
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: myapp-data
---
# Service
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: myapp
spec:
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
---
# Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  namespace: myapp
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.fuq.tv
      secretName: myapp-tls
  rules:
    - host: myapp.fuq.tv
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
```

---

## 📚 Reference Resources

### Project Documentation

- `CLUSTER-CREDENTIALS.md` - Credentials and tokens
- `K8S-CLUSTER.md` - Cluster status
- `docs/` - Full project documentation

### Useful Commands

```bash
# Show everything in a namespace
kubectl get all -n <namespace>

# Apply an entire directory
kubectl apply -f ./k8s/ -R
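
# A few more generally useful checks (added examples; same placeholder conventions as above)
# Wait for a deployment to become available
kubectl wait --for=condition=available deployment/<name> -n <namespace> --timeout=120s

# Show only warning events
kubectl get events -A --field-selector type=Warning

# Inspect the schema of a resource field
kubectl explain deployment.spec.strategy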
# Diff before applying
kubectl diff -f manifest.yaml

# Validate YAML
kubectl apply --dry-run=client -f manifest.yaml

# Format output
kubectl get pods -o wide
kubectl get pods -o json
kubectl get pods -o yaml
```

---

## ⚡ Quick Reference

### Project Namespaces

- `control-plane` - Backend, API, MySQL, Redis
- `agents` - Claude Code agents
- `gitea` - Git server
- `monitoring` - Metrics, logs
- `argocd` - GitOps

### StorageClass

- `longhorn` (default) - HA storage with 3 replicas

### ClusterIssuers

- `letsencrypt-prod` - Production certificates
- `letsencrypt-staging` - Testing certificates

### IngressClass

- `nginx` - Use for all Ingress resources

---

**With this guide, any AI agent can operate the cluster autonomously.**