# 05 — Creating the k3s cluster with hetzner-k3s

This is the central document. Follow each step in order.

---

## What is hetzner-k3s?

[hetzner-k3s](https://github.com/vitobotta/hetzner-k3s) is a CLI tool that automates the creation of a k3s cluster on Hetzner Cloud. With a single command, it:

1. Creates the servers (control plane + workers)
2. Configures the private network
3. Installs k3s on every node
4. Installs the Hetzner Cloud Controller Manager (provisions LBs and volumes from Kubernetes)
5. Installs the Hetzner CSI Driver (PersistentVolumes backed by Hetzner Volumes)
6. Configures the Cluster Autoscaler (automatic worker scaling)
7. Installs the System Upgrade Controller (automated k3s upgrades)
8. Configures kubectl locally

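Before starting, it is worth checking that the client-side tools used throughout this guide are installed. The sketch below only probes the local PATH; it touches nothing on Hetzner:

```shell
#!/usr/bin/env sh
# Preflight: check that the CLIs used throughout this guide are on PATH.
# The tool list comes from the commands in this document.
need() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "MISSING: $1"
  fi
}

for tool in hetzner-k3s hcloud kubectl helm; do
  need "$tool"
done
```

Any `MISSING:` line means that tool must be installed before continuing.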
---
## Cluster configuration file

Create the `cluster.yaml` file at the project root or in a secure directory (never inside the Git repo):

```bash
mkdir -p ~/.xpeditis
cat > ~/.xpeditis/cluster.yaml << 'EOF'
# ============================================================
# Xpeditis Production Cluster — hetzner-k3s configuration
# ============================================================

# Hetzner API token (keep secret)
hetzner_token: "<YOUR_HCLOUD_TOKEN>"

# Cluster name
cluster_name: xpeditis-prod

# Path of the kubeconfig that will be generated
kubeconfig_path: "~/.kube/kubeconfig-xpeditis-prod"

# k3s version
# Check the latest stable release at https://github.com/k3s-io/k3s/releases
k3s_version: v1.30.4+k3s1

# SSH keys
public_ssh_key_path: "~/.ssh/xpeditis_hetzner.pub"
private_ssh_key_path: "~/.ssh/xpeditis_hetzner"
use_ssh_agent: false
ssh_port: 22

# Networks allowed to reach SSH and the Kubernetes API
# Replace with your static IP for tighter security
ssh_allowed_networks:
  - "<YOUR_IP>/32"

api_allowed_networks:
  - "<YOUR_IP>/32"

# Hetzner private network
# Created in doc 03-hetzner-setup.md
existing_network: "xpeditis-network"
private_network_subnet: 10.0.0.0/16

# Kubernetes CIDRs (do not change unless they conflict)
cluster_cidr: 10.244.0.0/16
service_cidr: 10.96.0.0/16
cluster_dns: 10.96.0.10

# OS image
image: ubuntu-24.04
snapshot_os: ubuntu

# Datacenter (same region as the Object Storage)
location: fsn1

# k3s options
disable_flannel: false               # Flannel CNI (k3s default)
schedule_workloads_on_masters: false # Masters dedicated to the control plane

# Extra packages installed on every node
additional_packages:
  - curl
  - jq
  - htop
  - fail2ban # SSH brute-force protection

# Post-creation commands run on every node
post_create_commands:
  - apt-get update -qq
  - apt-get install -y -qq fail2ban
  - systemctl enable fail2ban
  - systemctl start fail2ban
  - |
    cat >> /etc/fail2ban/jail.local << 'FAIL2BAN'
    [sshd]
    enabled = true
    maxretry = 3
    bantime = 3600
    FAIL2BAN
  - systemctl restart fail2ban

# Manifests installed automatically
cloud_controller_manager_manifest_url: "https://github.com/hetznercloud/hcloud-cloud-controller-manager/releases/download/v1.21.0/ccm-networks.yaml"
csi_driver_manifest_url: "https://raw.githubusercontent.com/hetznercloud/csi-driver/v2.8.0/deploy/kubernetes/hcloud-csi.yml"

# System Upgrade Controller (automated k3s upgrades)
system_upgrade_controller_install: true
system_upgrade_controller_manifest_url: "https://github.com/rancher/system-upgrade-controller/releases/download/v0.13.4/system-upgrade-controller.yaml"

# Cluster Autoscaler
cluster_autoscaler_install: true
cluster_autoscaler_version: "9.36.0"
cluster_autoscaler_image: "registry.k8s.io/autoscaling/cluster-autoscaler"
cluster_autoscaler_cmdline_args:
  - --scan-interval=10s
  - --scale-down-delay-after-add=5m
  - --scale-down-unneeded-time=5m
  - --max-nodes-total=12

# Metrics Server (needed by HPA)
metrics_server_manifest_url: "https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml"

# kube-apiserver extra args (security)
kube_api_server_args:
  - "--audit-log-path=/var/log/kubernetes/audit.log"
  - "--audit-log-maxage=30"
  - "--audit-log-maxbackup=3"
  - "--audit-log-maxsize=100"

# kubelet extra args
kubelet_args:
  - "--max-pods=110"
  - "--system-reserved=cpu=200m,memory=200Mi"
  - "--kube-reserved=cpu=200m,memory=200Mi"

# ============================================================
# CONTROL PLANE
# ============================================================
masters:
  instance_type: cx22 # 2 vCPU, 4 GB
  instance_count: 1   # Raise to 3 for HA (10,000 users)
  location: fsn1
  image: ~ # Uses the global image

# ============================================================
# WORKER NODE POOLS
# ============================================================
worker_node_pools:
  - name: app-workers
    instance_type: cx32 # 4 vCPU, 8 GB (MVP)
    instance_count: 2   # Minimum node count
    location: fsn1
    image: ~
    additional_packages: ~
    post_create_commands: ~
    taints: []
    labels:
      - "xpeditis.io/node-role=app"
    autoscaling:
      enabled: true
      min_instances: 2 # Minimum for HA
      max_instances: 6 # Cap to control costs
EOF
```
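Before running hetzner-k3s, it is a good idea to check that no angle-bracket placeholders from the template are left unfilled. A minimal sketch (the helper name `check_placeholders` is ours, not part of hetzner-k3s):

```shell
# check_placeholders FILE — fail if FILE still contains <UPPER_CASE> placeholders
# such as the token and IP placeholders in the template above.
check_placeholders() {
  if grep -qE '<[A-Z_]+>' "$1"; then
    echo "placeholders remain in $1 — fill them in before running hetzner-k3s"
    return 1
  fi
  echo "no placeholders left in $1"
}

# Demo on a throwaway copy (the real file lives at ~/.xpeditis/cluster.yaml)
tmp=$(mktemp)
printf 'hetzner_token: "<TOKEN>"\n' > "$tmp"
check_placeholders "$tmp" || echo "as expected: template not ready"
rm -f "$tmp"
```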
> **For the 1,000-user tier**, change `cx32` → `cx42` and `max_instances: 8`.
> **For the 10,000-user tier**, change `cx42` → `cx52`, `instance_count: 4`, `max_instances: 12`, and `masters.instance_count: 3`.

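The worker counts in these tiers, combined with the `--max-pods=110` kubelet flag set earlier, give a rough ceiling on cluster pod capacity — a back-of-the-envelope sketch, ignoring system pods:

```shell
# Upper bound on schedulable pods per tier, from the kubelet flag
# --max-pods=110 configured in cluster.yaml (system pods not subtracted).
max_pods_per_node=110
for workers in 2 6 12; do
  echo "${workers} workers -> at most $((workers * max_pods_per_node)) pods"
done
# 2 workers -> at most 220 pods
# 6 workers -> at most 660 pods
# 12 workers -> at most 1320 pods
```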
---
## Creating the cluster

```bash
# Validate the configuration
hetzner-k3s validate --config ~/.xpeditis/cluster.yaml

# Create the cluster (takes 5-10 minutes)
hetzner-k3s create --config ~/.xpeditis/cluster.yaml

# Expected output:
# Creating infrastructure...
# Creating network...
# Creating SSH key...
# Creating firewall...
# Creating placement group...
# Creating load balancer...
# Creating masters...
# Waiting for masters to be ready...
# Creating worker pools...
# Waiting for workers to be ready...
# Installing k3s on masters...
# Installing k3s on workers...
# Installing Hetzner CCM...
# Installing Hetzner CSI...
# Installing Cluster Autoscaler...
# Installing System Upgrade Controller...
# Installing Metrics Server...
# Configuring kubeconfig...
# ✅ Cluster xpeditis-prod created successfully!
```

---
## Configuring kubectl

```bash
# Point KUBECONFIG at the new cluster
export KUBECONFIG=~/.kube/kubeconfig-xpeditis-prod

# Add it to .zshrc or .bashrc to make it persistent
echo 'export KUBECONFIG=~/.kube/kubeconfig-xpeditis-prod' >> ~/.zshrc

# Check the connection to the cluster
kubectl cluster-info
# Kubernetes control plane is running at https://<IP>:6443
# CoreDNS is running at https://<IP>:6443/api/v1/...

# List the nodes
kubectl get nodes -o wide
# NAME                          STATUS   ROLES                  AGE   VERSION
# xpeditis-prod-cx22-master-1   Ready    control-plane,master   5m    v1.30.4+k3s1
# xpeditis-prod-cx32-worker-1   Ready    <none>                 4m    v1.30.4+k3s1
# xpeditis-prod-cx32-worker-2   Ready    <none>                 4m    v1.30.4+k3s1

# Check all system pods
kubectl get pods --all-namespaces
# All pods must be Running
```

---
## Verifying the Hetzner Cloud Controller Manager

The CCM lets Kubernetes provision Hetzner resources (load balancers, volumes):

```bash
# Check that the CCM is running
kubectl get pods -n kube-system | grep hcloud

# Check that the nodes carry the zone label
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.topology\.kubernetes\.io/zone}{"\n"}{end}'
# xpeditis-prod-cx22-master-1   fsn1
# xpeditis-prod-cx32-worker-1   fsn1
# xpeditis-prod-cx32-worker-2   fsn1
```

---
## Verifying the Hetzner CSI Driver

```bash
# The CSI driver creates PersistentVolumes backed by Hetzner Volumes
kubectl get pods -n kube-system | grep hcloud-csi

# Check the available StorageClasses
kubectl get storageclass
# NAME                       PROVISIONER         RECLAIMPOLICY   VOLUMEBINDINGMODE
# hcloud-volumes (default)   csi.hetzner.cloud   Delete          WaitForFirstConsumer
```

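To confirm dynamic provisioning works end to end, you can create a throwaway PVC against this StorageClass. The claim name and namespace below are illustrative, not part of the cluster setup:

```shell
cat <<'MANIFEST' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-volume        # illustrative name — delete the PVC afterwards
  namespace: default
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: hcloud-volumes
  resources:
    requests:
      storage: 10Gi        # Hetzner Volumes start at 10 GB
MANIFEST

# The status stays "Pending" until a pod mounts it (WaitForFirstConsumer),
# at which point the CSI driver creates the Hetzner Volume.
kubectl get pvc -n default test-volume
```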
---
## Configuring Traefik (Ingress Controller)

k3s installs Traefik by default. We need to configure it for:

1. HTTP → HTTPS redirection
2. WebSocket support (Socket.IO)
3. Sticky sessions for the backend

```bash
# Create the Traefik configuration file
cat > /tmp/traefik-config.yaml << 'EOF'
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    # Logs
    logs:
      general:
        level: INFO
      access:
        enabled: true

    # Ports
    ports:
      web:
        port: 8000
        redirectTo:
          port: websecure # Force HTTPS
      websecure:
        port: 8443
        tls:
          enabled: true

    # Service: preserve the client source IP and drive the Hetzner Load Balancer
    service:
      spec:
        externalTrafficPolicy: Local
      annotations:
        load-balancer.hetzner.cloud/name: "xpeditis-lb"
        load-balancer.hetzner.cloud/location: "fsn1"
        load-balancer.hetzner.cloud/health-check-interval: "15s"
        load-balancer.hetzner.cloud/health-check-timeout: "10s"
        load-balancer.hetzner.cloud/health-check-retries: "3"
        load-balancer.hetzner.cloud/use-private-ip: "true"

    # Resources
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 256Mi

    # Replicas (1 is enough for the MVP)
    deployment:
      replicas: 1

    # Extra providers
    providers:
      kubernetesCRD:
        enabled: true
        allowCrossNamespace: true
      kubernetesIngress:
        enabled: true
        publishedService:
          enabled: true
EOF

kubectl apply -f /tmp/traefik-config.yaml

# Wait for Traefik to roll out the update
kubectl rollout status deployment/traefik -n kube-system --timeout=120s
```

---
## Installing cert-manager

cert-manager manages TLS certificates automatically via Let's Encrypt:

```bash
# Add the cert-manager Helm repo
helm repo add jetstack https://charts.jetstack.io
helm repo update

# Install cert-manager
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.15.3 \
  --set installCRDs=true \
  --set resources.requests.cpu=50m \
  --set resources.requests.memory=64Mi \
  --set webhook.resources.requests.cpu=50m \
  --set webhook.resources.requests.memory=32Mi

# Wait for cert-manager to be ready
kubectl wait --for=condition=Ready pod \
  --selector=app.kubernetes.io/instance=cert-manager \
  -n cert-manager \
  --timeout=120s

# Verify
kubectl get pods -n cert-manager
# NAME                                       READY   STATUS
# cert-manager-7f9f87595d-xxx                1/1     Running
# cert-manager-cainjector-54db9f97d8-xxx     1/1     Running
# cert-manager-webhook-8698c586b7-xxx        1/1     Running
```

---
## Let's Encrypt ClusterIssuers

```bash
# Create the issuers (staging for tests, prod for production)
cat > /tmp/cluster-issuers.yaml << 'EOF'
---
# STAGING — For testing without risking the rate limit
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@xpeditis.com # ← Replace
    privateKeySecretRef:
      name: letsencrypt-staging-key
    solvers:
      - http01:
          ingress:
            class: traefik
---
# PRODUCTION — Real certificates (rate limit: 5 duplicate certificates per week)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@xpeditis.com # ← Replace
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: traefik
EOF

kubectl apply -f /tmp/cluster-issuers.yaml

# Check the issuers
kubectl get clusterissuers
# NAME                  READY   AGE
# letsencrypt-staging   True    30s
# letsencrypt-prod      True    30s
```

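To exercise the staging issuer before pointing real Ingresses at it, you can request a throwaway Certificate. The name, namespace, and DNS name below are illustrative; the DNS name must resolve to the load balancer for the HTTP-01 challenge to succeed:

```shell
cat <<'MANIFEST' | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: test-cert           # illustrative name — delete afterwards
  namespace: default
spec:
  secretName: test-cert-tls
  issuerRef:
    name: letsencrypt-staging
    kind: ClusterIssuer
  dnsNames:
    - test.xpeditis.com     # replace with a name that resolves to the LB
MANIFEST

# READY becomes True once the HTTP-01 challenge completes
kubectl get certificate -n default test-cert
```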
---
## Recap: cluster state after this step

```bash
# Full overview
kubectl get nodes
kubectl get pods --all-namespaces --field-selector=status.phase!=Running

# The second command should return nothing; all of the following
# pods should be Running:
# kube-system    traefik-*                  Running
# kube-system    hcloud-cloud-controller    Running
# kube-system    hcloud-csi-*               Running
# kube-system    coredns-*                  Running
# kube-system    metrics-server-*           Running
# cert-manager   cert-manager-*             Running

echo "✅ Cluster ready for application deployment"
```

---
## Cluster operations

### Manually adding a worker node

```bash
# Edit cluster.yaml: change instance_count from 2 to 3 in worker_node_pools,
# then re-run create — it only adds what is missing
hetzner-k3s create --config ~/.xpeditis/cluster.yaml
```

### Deleting the cluster (⚠️ irreversible)

```bash
hetzner-k3s delete --config ~/.xpeditis/cluster.yaml
```

### Listing the Hetzner resources created

```bash
hcloud server list
hcloud load-balancer list
hcloud network list
hcloud firewall list
hcloud placement-group list
```