From 10+ Kubernetes clusters to 4: a hub-and-spoke ArgoCD story

2026-06-27 · kubernetes · argocd · gitops · platform-engineering

We ran more than ten Kubernetes clusters across four clouds. Deploys were inconsistent, on-call was painful, and there was no single place to see what ran where. We consolidated into four clusters under one hub-and-spoke ArgoCD control plane. This is the reasoning, the topology, and the gotchas worth knowing before you try it.

The problem

10+ clusters across 4 clouds (DigitalOcean, AWS, Google Cloud (GKE), Hetzner), each with its own deploy story.
No unified GitOps: some Helm-by-hand, some CI-push, some ClickOps.
Per-cluster observability: no federated view, so incident triage was slow.

The cost wasn't compute; it was cognitive load. Every cluster was a slightly different snowflake, and nobody could hold all ten in their head.

The target: hub-and-spoke

One hub cluster runs the control plane; spoke (workload) clusters run only workloads plus thin agents and register to the hub. Everything between hub and spokes travels over mTLS.

            ┌────────────── hub ───────────────┐
            │ ArgoCD · Prometheus(fed) · Loki  │
            │ Tempo · Alertmanager · Vault     │
            └───┬───────────┬───────────┬──────┘
            mTLS│       mTLS│       mTLS│
          ┌─────▼───┐ ┌─────▼───┐ ┌─────▼───┐
          │ spoke A │ │ spoke B │ │ spoke C │
          └─────────┘ └─────────┘ └─────────┘

The hub holds: ArgoCD, a central Prometheus (federating from spokes), Loki, Tempo, Alertmanager, Vault, and a policy/scanning layer (Trivy Operator). Spokes stay deliberately boring.
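The post doesn't say how spokes register with the hub. One documented ArgoCD mechanism is a declarative cluster Secret: a Secret in the hub's `argocd` namespace labelled `argocd.argoproj.io/secret-type: cluster`, whose `config` field carries a `tlsClientConfig`. The sketch below builds such a manifest as a Python dict, assuming client-certificate authentication for the mTLS leg; the helper names, cluster name and server URL are hypothetical.

```python
import base64
import json


def b64(pem: str) -> str:
    """Base64-encode PEM text, the form ArgoCD expects in caData/certData/keyData."""
    return base64.b64encode(pem.encode("utf-8")).decode("ascii")


def spoke_cluster_secret(
    name: str,
    server: str,
    ca_pem: str,
    client_cert_pem: str,
    client_key_pem: str,
    namespace: str = "argocd",
) -> dict:
    """Build a declarative ArgoCD cluster Secret registering one spoke (hypothetical helper).

    ArgoCD on the hub connects to the spoke API server at `server`,
    verifies it against `ca_pem`, and presents a client certificate,
    so both sides authenticate (mutual TLS).
    """
    config = {
        "tlsClientConfig": {
            "insecure": False,                 # never skip server-cert verification
            "caData": b64(ca_pem),             # CA for the spoke API server cert
            "certData": b64(client_cert_pem),  # hub's client certificate
            "keyData": b64(client_key_pem),    # hub's client private key
        }
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {
            "name": f"cluster-{name}",
            "namespace": namespace,
            # This label is what makes ArgoCD treat the Secret as a cluster.
            "labels": {"argocd.argoproj.io/secret-type": "cluster"},
        },
        "type": "Opaque",
        "stringData": {
            "name": name,                  # display name in ArgoCD
            "server": server,              # spoke API server URL
            "config": json.dumps(config),  # connection config as a JSON string
        },
    }


if __name__ == "__main__":
    # Placeholder PEM contents; in practice these would come from Vault.
    secret = spoke_cluster_secret(
        name="spoke-a",
        server="https://spoke-a.example.internal:6443",
        ca_pem="<spoke-a CA PEM>",
        client_cert_pem="<hub client cert PEM>",
        client_key_pem="<hub client key PEM>",
    )
    print(json.dumps(secret, indent=2))
```

Because this Secret carries the hub's private key, it falls under the "nothing sensitive in Git" rule: it would be materialized from Vault (for example via External Secrets) rather than committed to the fleet repo.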

How GitOps is structured

App-of-apps. One root Application points at a Git path of child Application manifests, so the entire fleet is described in Git and bootstraps itself.
ArgoCD Projects per environment/team — RBAC and guardrails (which repos, which destinations, which namespaces) live here, not in tribal knowledge.
Sync waves order the dependency chain: CRDs → operators → workloads. This single change removes most "resource type not found" flakes on a fresh cluster.
Secrets via Vault (agent injector / external-secrets). Nothing sensitive in Git, ever.
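The post doesn't include its manifests, so here is a minimal sketch of what the root app-of-apps might find in Git, written as Python that emits the YAML-equivalent dicts. It assumes one child Application per component per spoke, ordered by the `argocd.argoproj.io/sync-wave` annotation (CRDs in wave -1, operators in 0, workloads in 1), plus an AppProject that pins the allowed repo and destinations. The repo URL, component names, paths and project names are hypothetical.

```python
import json

SYNC_WAVE = "argocd.argoproj.io/sync-wave"
REPO_URL = "https://git.example.com/platform/fleet.git"  # hypothetical fleet repo

# Hypothetical components: (name, sync wave, destination namespace).
# Lower waves sync first: CRDs, then operators, then workloads.
COMPONENTS = [
    ("platform-crds", -1, "kube-system"),
    ("cert-manager", 0, "cert-manager"),
    ("external-secrets", 0, "external-secrets"),
    ("checkout-api", 1, "checkout"),
]


def child_application(component, wave, namespace, cluster, server, project):
    """One child Application, as the root app-of-apps would find it in Git."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {
            "name": f"{component}-{cluster}",
            "namespace": "argocd",
            "annotations": {SYNC_WAVE: str(wave)},  # annotation values are strings
        },
        "spec": {
            "project": project,
            "source": {
                "repoURL": REPO_URL,
                "targetRevision": "main",
                "path": f"components/{component}",
            },
            "destination": {"server": server, "namespace": namespace},
            "syncPolicy": {
                "automated": {"prune": True, "selfHeal": True},
                "syncOptions": ["CreateNamespace=true"],
            },
        },
    }


def app_project(name, destinations, allow_cluster_resources=False):
    """An AppProject restricting which repo and destinations its apps may use."""
    spec = {
        "sourceRepos": [REPO_URL],
        "destinations": [{"server": s, "namespace": ns} for s, ns in destinations],
    }
    if allow_cluster_resources:
        # Needed for cluster-scoped kinds such as CustomResourceDefinition.
        spec["clusterResourceWhitelist"] = [{"group": "*", "kind": "*"}]
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "AppProject",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": spec,
    }


def sync_order(apps):
    """Names ordered by wave, then by name (same-kind resources only)."""
    def key(app):
        meta = app["metadata"]
        return (int(meta["annotations"][SYNC_WAVE]), meta["name"])
    return [a["metadata"]["name"] for a in sorted(apps, key=key)]


if __name__ == "__main__":
    server = "https://spoke-a.example.internal:6443"
    apps = [
        child_application(c, w, ns, "spoke-a", server, "platform")
        for c, w, ns in COMPONENTS
    ]
    project = app_project("platform", [(server, "*")], allow_cluster_resources=True)
    print(sync_order(apps))
    print(json.dumps(project, indent=2))
```

`sync_order` only models the wave-then-name ordering for resources of the same kind; it doesn't talk to ArgoCD. One caveat worth verifying against the ArgoCD docs for your version: since v1.8 ArgoCD no longer ships a health check for the `Application` kind, so waves across child Applications only wait for a child to become healthy if that check is added back through `resource.customizations` in `argocd-cm`.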

Gotchas we hit

Cross-cloud reachability. Hub↔spoke registration assumes a network path: ArgoCD on the hub must be able to reach every spoke's API server. Plan egress, peering and firewall rules before you register clusters, not after ArgoCD reports Unknown.
CRD ordering. Operators must exist before their custom resources. Sync waves fix this; without them, first-apply on a clean cluster is a coin toss.
Federated metrics cardinality. Federating everything centrally will melt your hub Prometheus. Federate selected aggregates, keep raw series local.
One hub = one blast radius. A single control plane is the whole point, and the whole risk. Run the hub highly available across zones, and back up ArgoCD and Vault as if production depends on them, because it does.
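A cheap way to catch the reachability problem early is a pre-flight TCP check, run from inside the hub's network (for example from a throwaway pod), against every spoke API server URL before it is registered. The sketch below is a hypothetical helper using only the Python standard library; it proves that a network path exists, not that TLS or credentials are correct.

```python
import socket
import sys
from urllib.parse import urlparse


def api_server_reachable(server_url: str, timeout: float = 3.0) -> tuple[bool, str]:
    """Try a plain TCP connect to a spoke API server from where this runs.

    Only checks the network path; TLS and authentication are not exercised.
    """
    parsed = urlparse(server_url)
    host = parsed.hostname
    port = parsed.port or 443  # https default when the URL has no port
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"{host}:{port} reachable"
    except OSError as exc:  # refused, timed out, DNS failure, no route
        return False, f"{host}:{port} unreachable: {exc}"


if __name__ == "__main__":
    # Spoke API server URLs are passed on the command line.
    failed = False
    for url in sys.argv[1:]:
        ok, detail = api_server_reachable(url)
        print(("OK   " if ok else "FAIL ") + detail)
        failed = failed or not ok
    if failed:
        sys.exit(1)  # non-zero exit lets a pipeline gate registration on it
```

For example, `python preflight.py https://spoke-a.example.internal:6443 https://spoke-b.example.internal:6443` (hypothetical script name and hosts) prints one line per spoke and exits non-zero if any of them is unreachable.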
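Federating only aggregates is usually done by sending `match[]` selectors to each spoke Prometheus's `/federate` endpoint and pointing them at recording rules rather than raw series. Below is a sketch in Python that emits such a scrape job (it would be serialized into the hub's `prometheus.yml`), assuming spokes record aggregates under `job:` and `cluster:` prefixes and that the hub scrapes them over the same mTLS leg. The job name, certificate paths and targets are hypothetical.

```python
import json

# Catch-all selectors that would pull every raw series into the hub.
CATCH_ALL = {'{__name__=~".+"}', '{__name__=~".*"}'}


def federation_job(spokes: dict[str, str], match: list[str]) -> dict:
    """Build a Prometheus scrape job that federates selected series from spokes.

    `spokes` maps a cluster name to its Prometheus host:port; `match`
    holds the series selectors sent as `match[]` to /federate.
    """
    bad = [m for m in match if m.replace(" ", "") in CATCH_ALL]
    if bad:
        raise ValueError(f"refusing catch-all federation selectors: {bad}")
    return {
        "job_name": "federate-spokes",
        "scrape_interval": "60s",
        "honor_labels": True,  # keep labels as the spoke recorded them
        "metrics_path": "/federate",
        "params": {"match[]": match},
        "scheme": "https",
        "tls_config": {  # hypothetical paths for the hub's mTLS client cert
            "ca_file": "/etc/prometheus/tls/spokes-ca.crt",
            "cert_file": "/etc/prometheus/tls/hub.crt",
            "key_file": "/etc/prometheus/tls/hub.key",
        },
        "static_configs": [
            # The cluster label is attached unless a series already has one
            # (honor_labels keeps the scraped value on conflict).
            {"targets": [target], "labels": {"cluster": name}}
            for name, target in sorted(spokes.items())
        ],
    }


if __name__ == "__main__":
    job = federation_job(
        {"spoke-a": "prometheus.spoke-a.example.internal:9090"},
        # Only recording-rule aggregates (level:metric:operation naming).
        ['{__name__=~"job:.*"}', '{__name__=~"cluster:.*"}'],
    )
    print(json.dumps({"scrape_configs": [job]}, indent=2))
```

The catch-all guard is a deliberately blunt design choice: it encodes "keep raw series local" as a check that fails loudly, instead of leaving it to code review.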

Results

One pane of glass to see and deploy everything; consistent, repeatable rollouts.
Lower operational overhead and faster triage with federated logs and metrics.
Clear separation of control plane vs workloads — spokes became replaceable.

This federated view is exactly the kind of thing KubeMeridian is built to surface — cluster topology and health across a fleet, from one Grafana app.

What I'd do differently

Introduce policy-as-code (Kyverno/OPA) from day one, not after the third "how did that get deployed?"
Treat the hub's disaster recovery as a first-class, rehearsed runbook before go-live — not a wiki page written after the first scare.

Written by the engineer behind KubeMeridian · About · GitHub
