# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
AiWorker is an AI agent orchestration platform that uses Claude Code agents running in Kubernetes pods to autonomously complete development tasks. The system manages a full workflow from task creation to production deployment.
**Core Flow**: Task → Agent (via MCP) → Code → PR → Preview Deploy → Approval → Staging → Production

**Current Status**: Infrastructure complete (K8s HA cluster), backend initialized (20% done), frontend and agents pending.
## Architecture

### Three-Tier System
- **Infrastructure Layer**: K3s HA cluster (8 VPS servers in Houston)
  - 3 control planes with etcd HA
  - 3 workers with Longhorn distributed storage (3 replicas)
  - 2 HAProxy load balancers for HTTP/HTTPS
  - Private network (10.100.0.0/24) for inter-node communication
- **Platform Layer**: MariaDB, Redis, Gitea, ArgoCD
  - MariaDB 11.4 LTS with HA storage (database: `aiworker`)
  - Gitea 1.25.3 with built-in container registry
  - Gitea Actions for CI/CD (runner in K8s)
  - Automatic TLS via Cert-Manager + Let's Encrypt
- **Application Layer**: Backend (Bun), Frontend (React), Agents (Claude Code pods)
  - Backend uses the `Bun.serve()` native API (NOT Express, despite the dependency)
  - Drizzle ORM with auto-migrations on startup
  - MCP protocol for agent communication
### Data Model (Drizzle schema in `backend/src/db/schema.ts`)
- `projects`: User projects linked to Gitea repos and K8s namespaces
- `agents`: Claude Code pods running in K8s (status: idle/busy/error/offline)
- `tasks`: Development tasks with a state machine (backlog → in_progress → needs_input → ready_to_test → approved → staging → production)

Relations: projects → many tasks, tasks → one agent, agents → one current task (see the schema sketch below).
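To make the model concrete, here is a minimal sketch of what the schema file might contain. Only the status values and relations are taken from the description above; all other column names and types are illustrative assumptions, not the actual schema:

```typescript
// Hypothetical excerpt of backend/src/db/schema.ts (MariaDB dialect).
// Column names beyond status/relations are illustrative assumptions.
import { mysqlTable, varchar, int, mysqlEnum, timestamp } from 'drizzle-orm/mysql-core'
import { relations } from 'drizzle-orm'

export const projects = mysqlTable('projects', {
  id: int('id').primaryKey().autoincrement(),
  name: varchar('name', { length: 255 }).notNull(),
  giteaRepo: varchar('gitea_repo', { length: 255 }),      // linked Gitea repo
  k8sNamespace: varchar('k8s_namespace', { length: 63 }), // linked K8s namespace
})

export const tasks = mysqlTable('tasks', {
  id: int('id').primaryKey().autoincrement(),
  projectId: int('project_id').notNull(),
  agentId: int('agent_id'), // nullable while the task sits in the backlog
  status: mysqlEnum('status', [
    'backlog', 'in_progress', 'needs_input',
    'ready_to_test', 'approved', 'staging', 'production',
  ]).notNull().default('backlog'),
  createdAt: timestamp('created_at').defaultNow(),
})

// projects → many tasks (tasks → one agent would mirror this shape)
export const projectRelations = relations(projects, ({ many }) => ({
  tasks: many(tasks),
}))
```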
## Development Commands

### Backend (Bun 1.3.6)
```bash
cd backend

# Development with hot-reload
bun run dev

# Start production
bun run start

# Database migrations
bun run db:generate  # Generate new migration from schema changes
bun run db:migrate   # Apply migrations (also runs on app startup)
bun run db:studio    # Visual database explorer

# Code quality
bun run lint
bun run format
```
**IMPORTANT**: Use Bun native APIs (see the WebSocket sketch below):
- `Bun.serve()` for the HTTP server (NOT Express)
- `Bun.sql()` or `mysql2` for MariaDB (decision pending)
- Native WebSocket support in `Bun.serve()`
- `.env` is auto-loaded by Bun
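As a quick illustration of that last point, a minimal sketch of native WebSockets in `Bun.serve()`; the `/ws` route and echo behavior are assumptions for demonstration, not existing code:

```typescript
// Sketch of Bun's built-in WebSocket support; /ws and the echo
// behavior are illustrative assumptions.
const server = Bun.serve({
  port: 3000,
  fetch(req, server) {
    const url = new URL(req.url)
    if (url.pathname === '/ws' && server.upgrade(req)) {
      return // connection upgraded; Bun now owns the socket
    }
    return new Response('Not Found', { status: 404 })
  },
  websocket: {
    open(ws) {
      ws.send('connected')
    },
    message(ws, message) {
      ws.send(`echo: ${message}`) // echo back for demonstration
    },
  },
})

console.log(`Listening on ${server.hostname}:${server.port}`)
```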
### Kubernetes Operations
```bash
# Set kubeconfig (ALWAYS required)
export KUBECONFIG=~/.kube/aiworker-config

# Cluster status
kubectl get nodes
kubectl get pods -A

# Deploy to K8s
kubectl apply -f k8s/backend/
kubectl apply -f k8s/frontend/

# Logs
kubectl logs -f -n control-plane deployment/backend
kubectl logs -n gitea gitea-0
kubectl logs -n gitea-actions deployment/gitea-runner -c runner
```
### CI/CD Workflow

A push to the `main` branch triggers an automatic build:
- Git push → Gitea receives webhook
- Gitea Actions Runner (in K8s) picks up the job
- Docker build inside the runner pod (DinD)
- Push to `git.fuq.tv/admin/<repo>:latest`
- View progress: https://git.fuq.tv/admin/aiworker-backend/actions

Registry format: `git.fuq.tv/<owner>/<package>:<tag>`
## Critical Architecture Details

### Database Migrations

Migrations run automatically on app startup in `src/index.ts`:
```typescript
await runMigrations() // First thing on startup
await testConnection()
```
Never manually port-forward to run migrations. The app handles this in production when pods start.
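For reference, a plausible shape for `runMigrations()` built on Drizzle's migrator. The `mysql2` driver choice (still pending, per the note above) and the migrations folder path are assumptions:

```typescript
// Hypothetical src/db/migrate.ts; driver choice and the './drizzle'
// folder path are illustrative assumptions.
import { drizzle } from 'drizzle-orm/mysql2'
import { migrate } from 'drizzle-orm/mysql2/migrator'
import mysql from 'mysql2/promise'

export async function runMigrations() {
  const connection = await mysql.createConnection({
    host: process.env.DB_HOST,
    user: process.env.DB_USER,
    password: process.env.DB_PASSWORD,
    database: process.env.DB_NAME,
  })
  const db = drizzle(connection)
  // Applies any pending SQL migrations generated by `bun run db:generate`
  await migrate(db, { migrationsFolder: './drizzle' })
  await connection.end()
}
```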
### Bun.serve() Routing Pattern

Unlike Express, `Bun.serve()` uses a single `fetch(req)` function:
```typescript
Bun.serve({
  async fetch(req) {
    const url = new URL(req.url)

    if (url.pathname === '/api/health') {
      return Response.json({ status: 'ok' })
    }

    if (url.pathname.startsWith('/api/projects')) {
      return handleProjectRoutes(req, url)
    }

    return new Response('Not Found', { status: 404 })
  },
})
```
Route handlers should be organized in `src/api/routes/` and imported into the main `fetch`, as sketched below.
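A sketch of what one such route module might look like; the file name and the in-memory store (standing in for real Drizzle queries) are assumptions:

```typescript
// Hypothetical src/api/routes/projects.ts; the in-memory array stands
// in for Drizzle queries and is an illustrative assumption.
const projects: { id: number; name: string }[] = []

export async function handleProjectRoutes(req: Request, url: URL): Promise<Response> {
  if (req.method === 'GET' && url.pathname === '/api/projects') {
    return Response.json(projects)
  }
  if (req.method === 'POST' && url.pathname === '/api/projects') {
    const body = (await req.json()) as { name: string }
    const project = { id: projects.length + 1, name: body.name }
    projects.push(project)
    return Response.json(project, { status: 201 })
  }
  return new Response('Not Found', { status: 404 })
}
```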
### MCP Communication Flow

Agents communicate via the Model Context Protocol:
- Agent calls an MCP tool (e.g., `get_next_task`)
- Backend MCP server (port 3100) handles the request
- Backend queries the database and performs actions
- Result is returned to the agent
- Agent continues work autonomously
MCP tools to implement (see `docs/05-agents/mcp-tools.md`; a registration sketch follows below):
`get_next_task`, `update_task_status`, `ask_user_question`, `create_branch`, `create_pull_request`, `trigger_preview_deploy`
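A hedged sketch of registering the first of these tools with the MCP TypeScript SDK. `claimNextTask` and the server metadata are assumptions, and transport wiring for port 3100 is omitted:

```typescript
// Sketch using @modelcontextprotocol/sdk; claimNextTask is a
// hypothetical stub standing in for the real MariaDB query.
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'
import { z } from 'zod'

// Hypothetical stub: the real version would pop a backlog task from the DB
async function claimNextTask(agentId: string): Promise<{ id: number; title: string } | null> {
  return null
}

const server = new McpServer({ name: 'aiworker-backend', version: '0.1.0' })

server.tool(
  'get_next_task',
  { agentId: z.string() }, // which agent is requesting work
  async ({ agentId }) => {
    const task = await claimNextTask(agentId)
    return {
      content: [{ type: 'text', text: JSON.stringify(task ?? { message: 'no tasks available' }) }],
    }
  },
)
```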
### Preview Environments

Each task gets an isolated namespace: `preview-task-{taskId}` (a creation sketch follows this list):
- Auto-deploy on PR creation
- Accessible at `task-{shortId}.r.fuq.tv`
- Auto-cleanup after 7 days (TTL label)
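A sketch of how the backend might create such a namespace with `@kubernetes/client-node`, assuming the pre-1.0 client API; the TTL label name and the cleanup job that honors it are assumptions:

```typescript
// Sketch with @kubernetes/client-node (0.x-style API); the label names
// and the external cleanup mechanism are illustrative assumptions.
import * as k8s from '@kubernetes/client-node'

export async function createPreviewNamespace(taskId: string) {
  const kc = new k8s.KubeConfig()
  kc.loadFromDefault() // honors KUBECONFIG / in-cluster service account

  const core = kc.makeApiClient(k8s.CoreV1Api)
  await core.createNamespace({
    metadata: {
      name: `preview-task-${taskId}`,
      labels: {
        'aiworker/preview': 'true',
        'aiworker/ttl-days': '7', // hypothetical label a cleanup job watches
      },
    },
  })
}
```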
## Key Environment Variables

Backend (`.env` file):
```bash
# Database (MariaDB in K8s)
DB_HOST=mariadb.control-plane.svc.cluster.local
DB_USER=aiworker
DB_PASSWORD=AiWorker2026_UserPass!
DB_NAME=aiworker

# Redis
REDIS_HOST=redis.control-plane.svc.cluster.local

# Gitea
GITEA_URL=https://git.fuq.tv
GITEA_TOKEN=159a5de2a16d15f33e388b55b1276e431dbca3f3

# Kubernetes
K8S_IN_CLUSTER=false  # true when running in K8s
K8S_CONFIG_PATH=~/.kube/aiworker-config
```
Local development: port-forward services from K8s:

```bash
kubectl port-forward -n control-plane svc/mariadb 3306:3306 &
kubectl port-forward -n control-plane svc/redis 6379:6379 &
```
## Important Constraints

### Storage HA Strategy
All stateful data uses Longhorn with 3 replicas for high availability:
- MariaDB PVC: 20Gi replicated across 3 workers
- Gitea PVC: 50Gi replicated across 3 workers
- Can tolerate 2 worker node failures without data loss
### DNS and Domains

All services use `*.fuq.tv` with DNS round-robin pointing to the 2 load balancers:
- `api.fuq.tv` → Backend API
- `app.fuq.tv` → Frontend dashboard
- `git.fuq.tv` → Gitea
- `*.r.fuq.tv` → Preview environments (e.g., `task-abc.r.fuq.tv`)

The load balancers (108.165.47.221, 108.165.47.203) run HAProxy, balancing traffic to worker NodePorts.
### Namespace Organization

- `control-plane`: Backend API, MariaDB, Redis
- `agents`: Claude Code agent pods
- `gitea`: Git server
- `gitea-actions`: CI/CD runner with Docker-in-Docker
- `preview-*`: Temporary namespaces for preview deployments
## Documentation Structure

Extensive documentation lives in `/docs` (40+ files):
- Start here: `ROADMAP.md`, `NEXT-SESSION.md`, `QUICK-REFERENCE.md`
- Infrastructure: `CLUSTER-READY.md`, `AGENT-GUIDE.md`, `TROUBLESHOOTING.md`
- Gitea: `GITEA-GUIDE.md` - complete guide for Git, Registry, API, CI/CD, and webhooks
- Detailed: `docs/01-arquitectura/` through `docs/06-deployment/`
**For AI agent operations**: Read `AGENT-GUIDE.md` - contains all kubectl commands and workflows needed to manage the cluster autonomously.

**For Gitea operations**: Read `GITEA-GUIDE.md` - complete API usage, registry, tokens, webhooks, and CI/CD setup.

**For credentials**: See `CLUSTER-CREDENTIALS.md` (not in git, local only).
## Next Development Steps

Current phase: Backend API implementation (see `NEXT-SESSION.md` for a detailed checklist)

Priority order (a Gitea client sketch follows this list):
- Verify the CI/CD build succeeds → image lands in the registry
- Implement REST API routes (`/api/projects`, `/api/tasks`, `/api/agents`)
- Implement the MCP server (port 3100) for agent communication
- Integrate a Gitea API client (repos, PRs, webhooks)
- Integrate a Kubernetes client (create namespaces, deployments, ingress)
- Deploy the backend to K8s at `api.fuq.tv`
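For the Gitea integration step, a minimal fetch-based sketch; the endpoint path follows Gitea's standard REST API, but this wrapper shape is an assumption (see https://git.fuq.tv/api/swagger for the authoritative contract):

```typescript
// Minimal fetch-based Gitea client sketch; the wrapper shape is an
// illustrative assumption, the endpoint path is standard Gitea API.
const GITEA_URL = process.env.GITEA_URL ?? 'https://git.fuq.tv'
const GITEA_TOKEN = process.env.GITEA_TOKEN ?? ''

async function giteaFetch(path: string, init: RequestInit = {}) {
  const res = await fetch(`${GITEA_URL}/api/v1${path}`, {
    ...init,
    headers: {
      Authorization: `token ${GITEA_TOKEN}`, // Gitea token auth scheme
      'Content-Type': 'application/json',
      ...init.headers,
    },
  })
  if (!res.ok) throw new Error(`Gitea ${res.status}: ${await res.text()}`)
  return res.json()
}

// Create a pull request: POST /repos/{owner}/{repo}/pulls
export function createPullRequest(owner: string, repo: string, head: string, base: string, title: string) {
  return giteaFetch(`/repos/${owner}/${repo}/pulls`, {
    method: 'POST',
    body: JSON.stringify({ head, base, title }),
  })
}
```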
Frontend and agents come after the backend is functional.
## External References

- Lucia Auth (for the React frontend): https://github.com/lucia-auth/lucia
- Vercel Agent Skills (for the React frontend): https://github.com/vercel-labs/agent-skills
- Gitea API: https://git.fuq.tv/api/swagger
- MCP SDK: `@modelcontextprotocol/sdk` documentation
## Deployment Flow

### Backend Deployment

Code change → Git push → Gitea Actions → Docker build → Push to git.fuq.tv → ArgoCD sync → K8s deploy

### Agent Deployment

Backend creates pod → Agent starts → Registers via MCP → Polls for tasks → Works autonomously → Reports back

### Preview Deployment

Agent completes task → Create PR → Trigger preview → K8s namespace created → Deploy at `task-{id}.r.fuq.tv` → User tests
Read `NEXT-SESSION.md` for detailed next steps. All credentials and cluster-access info are in `QUICK-REFERENCE.md`.