Foundation

The roadmap for making the mson group's shared infrastructure ready for its first workloads

Foundation makes the mson group’s shared infrastructure ready for its first workloads. It is complete when eve-skill-farm runs in the lab Kubernetes cluster.

Each phase below is a GitLab group milestone, where its status and progress live. This page records the dependency structure and the sequencing decisions behind it.

Phases

  • “Phase 0: Close open loops” finishes in-flight work across projects so later phases start from a clean base.
  • “Phase 1: CI capacity” delivers fast pipelines with security scan results in merge requests. It covers the runner-critical path of the AWS landing zone and ARM spot runners. The landing zone path includes org topology, Identity Center, bootstrap and state, OIDC CI, KMS, and IaC scanning.
  • “Phase 2: Group management via tofu” evaluates OpenTofu with the GitLab provider for declarative management of the group, decides bart’s role, and promotes the standard label set to group labels.
  • “Phase 3: Group standards docs” records group-wide standards and security conventions on this site.
  • “Phase 4: Lab Kubernetes” brings up the Talos cluster with GitOps in the lab and migrates the runner manager into it.
  • “Phase 5: Workload enablement” adds minimal platform services (ingress, storage, secrets, DNS) and deploys the first workload as the exit proof.

Dependency Structure

Phase 0 precedes everything. Phase 1 is the critical path to fast CI. Phases 2 and 3 run in parallel with Phase 1. Phase 4’s cluster work is independent of Phases 1 through 3, though the runner-manager migration inside it assumes the Phase 1 manager exists. Phase 5 depends on Phase 4 and completes Foundation.

Phase 0: Close open loops
    |
    +--> Phase 1: CI capacity (critical path to fast CI)
    |
    +--> Phase 2: Group management via tofu  (parallel)
    |
    +--> Phase 3: Group standards docs       (parallel)
    |
    +--> Phase 4: Lab Kubernetes  <... Phase 1 (manager migration only)
             |
             v
         Phase 5: Workload enablement (Foundation exit)
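
The same structure can be written down as a small dependency graph and sorted into waves of phases that may run side by side. The following is a minimal sketch, not part of the roadmap tooling. It assumes Python 3.9 or later for the standard-library graphlib module. The split of Phase 4 into "cluster" and "manager migration" nodes is a hypothetical modeling choice made here so that only the migration waits on Phase 1; the milestone itself is a single unit.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of phases it depends on.
# Assumption: Phase 4 is split into two illustrative nodes so that only the
# runner-manager migration depends on Phase 1, as the text describes.
DEPENDS_ON = {
    "Phase 0": set(),
    "Phase 1": {"Phase 0"},
    "Phase 2": {"Phase 0"},
    "Phase 3": {"Phase 0"},
    "Phase 4 (cluster)": {"Phase 0"},
    "Phase 4 (manager migration)": {"Phase 4 (cluster)", "Phase 1"},
    "Phase 5": {"Phase 4 (cluster)", "Phase 4 (manager migration)"},
}


def waves(depends_on):
    """Group phases into waves; phases within a wave have no mutual dependency."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything currently unblocked
        result.append(ready)
        sorter.done(*ready)  # mark the wave finished to unblock successors
    return result


if __name__ == "__main__":
    for number, wave in enumerate(waves(DEPENDS_ON)):
        print(f"wave {number}: {', '.join(wave)}")
```

Under these assumptions the sort places Phase 0 alone in the first wave, Phases 1 through 3 together with the Phase 4 cluster work in the second, the manager migration in the third, and Phase 5 last. A wave only says what is unblocked at a given point; it does not say the phases in it must start together.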

Sequencing Rationale

Run the runner manager on a VM until the lab cluster exists

Fast pipelines shouldn’t wait on lab Kubernetes (Phase 4), so the runner manager runs on a Proxmox VM in the meantime. The manager is a small always-on process with no cluster requirements. The durable pieces (runner config.toml, AWS IAM role, AMI bake, job routing) carry over to Kubernetes unchanged; only VM provisioning is replaced. The migration is tracked in mson/runners#5.

Group milestones hold only the cross-project critical path

GitLab allows only one milestone per issue, so an issue sits in either a group milestone or a project milestone, never both. That makes the split between group and project scope explicit: group milestones carry only the cross-project critical path, and project milestones keep the full local scope. The aws-landing-zone “Tier 1: foundation” milestone keeps guardrail work (SCPs, CloudTrail, budgets, tags) off Foundation’s critical path. The lab “Kubernetes Platform” milestone keeps platform services beyond the Phase 5 minimum (cert-manager, observability, backup).

Freeze bart feature work during the Phase 2 evaluation

If the evaluation favors OpenTofu, new settings surfaces would be written as GitLab provider resource blocks, so building them in bart first would duplicate the work. The evaluation is tracked in mson/bart#39.

Phase 5 exists because a bootstrapped cluster is not a workload-ready platform

The exit proof is a deploy (mson/eve-skill-farm#120), which forces ingress, storage, secrets, and DNS to actually work.

After Foundation

eve-skill-farm development resumes, with its own milestones carrying that roadmap. A cloud workload runtime may get its own roadmap later; it is not part of Foundation.