How to Back Up and Restore ETCD?

Back Up ETCD

This guide shows how to back up and restore ETCD. If you encounter any issues, refer to the Troubleshooting section at the end of this guide.

First, find the CA and server certificates in the etcd static pod manifest:

cat /etc/kubernetes/manifests/etcd.yaml
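Instead of reading the whole manifest, the relevant lines can be filtered out. The following is a minimal sketch, assuming a kubeadm-style node; the function name etcd_cert_flags is hypothetical and not part of any tool.

```shell
# Print the three certificate flags from the etcd static pod manifest.
# The default path assumes a kubeadm-style control plane node.
etcd_cert_flags() {
  manifest="${1:-/etc/kubernetes/manifests/etcd.yaml}"
  # The leading "--" in the pattern keeps the --peer-* variants out of the output.
  grep -E -e '--(trusted-ca-file|cert-file|key-file)=' "$manifest"
}

# Usage: etcd_cert_flags
```

Run without arguments it reads the default manifest path; another path can be passed as the first argument.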

Note three flags: --trusted-ca-file, --cert-file and --key-file. Pass their values to etcdctl as --cacert, --cert and --key when saving the snapshot:

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
snapshot save /opt/snapshot.db
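Before relying on the backup, it can be worth confirming that the snapshot file is readable. The sketch below wraps etcdctl snapshot status, which reads the file directly and needs no endpoint or certificates. The function name snapshot_status is hypothetical, and on newer etcd releases this subcommand may only be available through etcdutl.

```shell
# Show hash, revision, key count and size of a snapshot file.
# The default path matches the snapshot saved in this guide.
snapshot_status() {
  snapshot="${1:-/opt/snapshot.db}"
  ETCDCTL_API=3 etcdctl snapshot status "$snapshot" --write-out=table
}

# Usage: snapshot_status /opt/snapshot.db
```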

Restore ETCD to a new folder

ETCDCTL_API=3 etcdctl --data-dir /var/lib/etcd-backup \
snapshot restore /opt/snapshot.db
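A quick sanity check on the restored folder can catch a failed restore early. This sketch assumes the usual layout that a restore produces (member/snap/db and member/wal under the data directory); the function name restored_dir_ok is hypothetical.

```shell
# Succeed only if the restored data directory has the expected layout.
restored_dir_ok() {
  dir="${1:-/var/lib/etcd-backup}"
  # A restored data dir normally holds the backend db and a WAL directory.
  [ -f "$dir/member/snap/db" ] && [ -d "$dir/member/wal" ]
}

# Usage: restored_dir_ok /var/lib/etcd-backup && echo "restore looks complete"
```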

Modify etcd.yaml and point data-dir to the directory restored in the previous step: /var/lib/etcd-backup

Note: The path appears in three places: the --data-dir flag, the etcd-data mountPath, and the etcd-data hostPath. Change all of them; otherwise ETCD may start with an empty or wrong data directory.

vi /etc/kubernetes/manifests/etcd.yaml

Change the values below:

--data-dir=/var/lib/etcd-backup

...

    volumeMounts:
    - mountPath: /var/lib/etcd-backup
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd-backup

Reload systemd and restart kubelet:

systemctl daemon-reload
systemctl restart kubelet
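Because a mismatch among the data-dir values is an easy mistake to make, a small check can count how many of the three places point at the restored directory. This is only a sketch: the function name count_dir_refs is hypothetical, and it does a plain substring match rather than parsing the YAML.

```shell
# Count manifest lines that reference the restored directory in one of the
# three expected places: --data-dir flag, mountPath, and hostPath path.
count_dir_refs() {
  dir="${1:-/var/lib/etcd-backup}"
  manifest="${2:-/etc/kubernetes/manifests/etcd.yaml}"
  grep -c -F -e "--data-dir=$dir" -e "mountPath: $dir" -e "path: $dir" "$manifest"
}

# Usage: count_dir_refs /var/lib/etcd-backup
```

A result of 3 suggests all three values were updated; a lower number points to a value that still holds a different path.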

Wait about 3 minutes for the ETCD static pod to be recreated.

Then check the pod.
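One way to do this is to query the pod phase with kubectl. The sketch below assumes the pod carries the component=etcd label, as kubeadm-generated manifests normally do; the function name etcd_pod_phase is hypothetical.

```shell
# Print the phase (for example Running or Pending) of the etcd pod.
# Assumes the kubeadm convention of labelling the pod component=etcd.
etcd_pod_phase() {
  kubectl get pods -n kube-system -l component=etcd \
    -o jsonpath='{.items[0].status.phase}'
}

# Usage: etcd_pod_phase
```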

Troubleshooting

No resources found on node

Reload the systemd daemon and restart the kubelet service:

systemctl daemon-reload
systemctl restart kubelet

Check the control plane status after the service has restarted:

kubectl get node

NotReady status on node

This may be caused by the data-dir in etcd.yaml. Make sure all of the data-dir values have been changed:

vi /etc/kubernetes/manifests/etcd.yaml

Change the values below:

--data-dir=/var/lib/etcd-backup

...

    volumeMounts:
    - mountPath: /var/lib/etcd-backup
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd-backup

Pending status of all Pods

If the data-dir is correct but all pods are in the Pending state, or no pods are displayed, remove the new ETCD data directory:

rm -rf /var/lib/etcd-backup

Re-run the restore process

ETCDCTL_API=3 etcdctl --data-dir /var/lib/etcd-backup \
snapshot restore /opt/snapshot.db

Delete the ETCD pod so that it is recreated, then check its status. The static pod is named etcd-<node name>; in this example the node is called controlplane:

kubectl delete pod -n kube-system etcd-controlplane

Check running pods

kubectl get pod --all-namespaces
