We’ve covered off prepping and installing K8s on this blog a few different ways: with VM templates built manually, with cloud-init, and with ClusterAPI vSphere. Let’s say you’ve grown attached to some of the workloads you’re running on one of your clusters, naturally. It would be nice to back up and restore those should something go wrong - or even, as was my case, I deployed a distro of K8s on my Raspberry Pi cluster that I wasn’t wild about and wanted to move to another - how do you migrate those workloads?
Enter Velero. Velero (formerly Heptio Ark) is a backup, restore and DR orchestration application for your K8s workloads. In this post I’d like to take you through the installation and use of Velero, as well as some test backups and restores so you can kick the tyres on your own clusters and maybe give the team some feedback!
I’m assuming you have a K8s cluster up and running with a working storage system. I mean, otherwise you’d have nothing to back up. If not - check the blogs mentioned above to get one running.
If you just want to see it running - check out my VMworld session and skip to 19:30.
I am using macOS, so will be using the brew package manager to install and manage my tools. If you are using Linux or Windows, use the appropriate install guide for each tool, according to your OS.
For each tool I will list the brew install command and the link to the install instructions for other OSes.
- git - brew install git
- helm - brew install kubernetes-helm
- kubectl - brew install kubernetes-cli
Installation and Use Workflow
To get Velero running on our cluster there are a few steps we need to run through, at a high level (explanation of these components in a bit):
- Download and install the Velero CLI to our local machine
- Install Minio on our cluster for use as a backup repo
- Install Velero on our cluster
The Velero CLI isn’t strictly required, but it handles a lot of the heavy lifting of creating the Velero-specific custom resources in K8s that you’d otherwise have to write by hand - things like backups, backup schedules and all that jazz.
The Velero CLI is pre-compiled and available for download on the Velero GitHub page. As stated before, I’m running macOS, so I’ll download the binary and move it into my PATH (adjust this to suit your OS).
wget https://github.com/vmware-tanzu/velero/releases/download/v1.1.0/velero-v1.1.0-darwin-amd64.tar.gz
tar -zxvf velero-v1.1.0-darwin-amd64.tar.gz
mv velero-v1.1.0-darwin-amd64/velero /usr/local/bin/.
As long as /usr/local/bin is in your PATH, you’ll now be able to run the CLI:
$ velero version
Client:
    Version: v1.1.0
    Git commit: a357f21aec6b39a8244dd23e469cc4519f1fe608
<error getting server version: the server could not find the requested resource (post serverstatusrequests.velero.io)>
The error is expected as we haven’t yet installed Velero into our cluster - but it shows that the CLI is working. An important thing to note is that the Velero CLI talks to whichever K8s cluster is currently active in your terminal session (your current kubectl context).
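If the velero command isn’t found at all, the usual suspect is /usr/local/bin not being in your PATH. Below is a minimal sketch of a check for that; in_path is a hypothetical helper name of my own for illustration, not something that ships with Velero.

```shell
# in_path DIR [PATHSTRING]: succeed if DIR is one of the colon-separated
# entries in PATHSTRING (defaults to the current $PATH).
# Hypothetical helper for illustration; not part of Velero.
in_path() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if in_path /usr/local/bin; then
  echo "/usr/local/bin is in PATH"
else
  echo "/usr/local/bin is NOT in PATH"
fi
```

Wrapping both sides in colons means the match only succeeds on a whole PATH entry, so /usr/local/bin won’t be confused with something like /usr/local/bin-old.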
Velero uses S3 API-compatible object storage as its backup location, which means that to create a backup we need something that exposes an S3 API. Minio is a small, easy to deploy S3 object store you can run on-prem.
For this example, we’re going to run Minio on our K8s cluster. In production you’d want your S3 store somewhere else, for reasons that should be obvious.
To install Minio we’re going to use helm, which is a package manager for K8s - this simplifies the installation down to creating a yaml file for the configuration.
Let’s create the yaml file for the setup of Minio with helm (a full list of variables can be found on the chart page in the repo):
$ cat minio.yaml
image:
  tag: latest
accessKey: "minio"
secretKey: "minio123"
service:
  type: LoadBalancer
defaultBucket:
  enabled: true
  name: velero
persistence:
  size: 50G
Stepping through this: it will deploy the latest version of Minio available, set the username and password to minio and minio123 respectively, and expose the service using a LoadBalancer (consequently, you’ll need a LoadBalancer of some kind in your cluster - I recommend MetalLB for labs). Next up, we tell it to automatically create a bucket called velero and to persist the data in a 50GB volume.
I’m assuming you have the file saved as minio.yaml - so let’s now use helm to deploy this to our cluster.
helm install stable/minio --name minio --namespace infra -f minio.yaml
This installs Minio to your cluster, in a namespace called infra, and the helm deployment is given a name of minio (otherwise you’ll get a randomly allocated name).
If we run the following, we’ll get the IP and port that Minio will be accessible on outside the cluster - in my case the IP is 10.198.26.3 and it is accessible on port 9000:
$ kubectl get service minio -n infra
NAME    TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
minio   LoadBalancer   10.110.94.210   10.198.26.3   9000:32549/TCP   127m
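If you’d rather not read the address out of the table by eye, the external IP and port can be stitched into a URL with a little awk. This is only a sketch: svc_url is a hypothetical helper name, and it assumes the default column order shown above with a single port mapping per service.

```shell
# svc_url: read `kubectl get service` table output on stdin and print
# http://EXTERNAL-IP:PORT for each service row (the header row is skipped).
# Assumes the default column layout and one port mapping per service.
# Hypothetical helper for illustration.
svc_url() {
  awk 'NR > 1 { split($5, port, ":"); print "http://" $4 ":" port[1] }'
}

# Against a live cluster you would run:
#   kubectl get service minio -n infra | svc_url
# Here it is fed the sample output from this post instead:
printf '%s\n' \
  'NAME    TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE' \
  'minio   LoadBalancer   10.110.94.210   10.198.26.3   9000:32549/TCP   127m' \
  | svc_url
```

With the sample output this prints http://10.198.26.3:9000, the address Minio is reachable on from outside the cluster.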
If you browse to that address (http://10.198.26.3:9000 in my case - substitute your own details) and log in using minio and minio123, you’ll see the Minio UI with the velero bucket present.
Ta-da, an S3 compliant object store, running on K8s.
Velero can be installed either via a helm chart or via the Velero CLI. My preferred method is to use the helm chart, as it means I can store the configuration in a yaml file and deploy it repeatably without having to memorise commands.
If you want to deploy via the CLI, see the Velero documentation; we are going to use helm.
Again, as with the Minio chart, the first step is to create the configuration yaml file:
$ cat velero.yaml
image:
  tag: v1.1.0
configuration:
  provider: aws
  backupStorageLocation:
    name: aws
    bucket: velero
    config:
      region: minio
      s3ForcePathStyle: true
      publicUrl: http://10.198.26.3:9000
      s3Url: http://minio.infra.svc:9000
credentials:
  useSecret: true
  secretContents:
    cloud: |
      [default]
      aws_access_key_id = minio
      aws_secret_access_key = minio123
snapshotsEnabled: false
configMaps:
  restic-restore-action-config:
    labels:
      velero.io/plugin-config: ""
      velero.io/restic: RestoreItemAction
    data:
      image: gcr.io/heptio-images/velero-restic-restore-helper:v1.1.0
deployRestic: true
So, it may look a little strange with the provider type aws and such, but that is simply there to allow us to use the S3 backup target - notice that we just use the IP address and port of the Minio service we deployed in the previous step as the URL to send the backups to.
One thing I’d like to call out is the difference between s3Url and publicUrl. publicUrl is what the Velero CLI will communicate with when it needs to get things like logs and such; s3Url is what the Velero in-cluster process sends the data and logs to. In this case s3Url is not publicly accessible, as it uses a Kubernetes in-cluster DNS record (minio.infra.svc:9000) - this says, send the data to minio in namespace infra, of type service (svc), on port 9000. Because s3Url is only resolvable within the K8s cluster, we must also specify the publicUrl to allow the CLI to also interface with the assets in that object store.
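To make the shape of that in-cluster address concrete, here is a tiny sketch that assembles it from its parts. in_cluster_url is a hypothetical helper of my own; the service, namespace and port are the ones used in this post.

```shell
# in_cluster_url SERVICE NAMESPACE PORT: print the in-cluster URL of a Service.
# The <service>.<namespace>.svc name only resolves from inside the cluster,
# which is why the CLI needs publicUrl as well.
# Hypothetical helper for illustration.
in_cluster_url() {
  printf 'http://%s.%s.svc:%s\n' "$1" "$2" "$3"
}

in_cluster_url minio infra 9000
```

This prints http://minio.infra.svc:9000, matching the s3Url value in velero.yaml. If you installed Minio under a different release name or namespace, this is the value that changes.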
The last line may be something you’re wondering about - deployRestic tells Velero to deploy the restic data mover to pull bits off the disk from inside the cluster, rather than relying on native snapshotting and diff capabilities. It is required for vSphere installations.
With all that said, once you’ve adjusted the above to suit your environment (likely just s3Url) you can deploy the helm chart.
helm install stable/velero --name velero --namespace velero -f velero.yaml
With Velero deployed to our cluster, we can now get to creating some backup schedules and test how it all works.
Deploying a Sample Application
As of Velero v1.1.0, CSI volumes are supported, meaning we can back up the contents of PVs on Kubernetes clusters running CSI plugins, as well as the manifests that make up that app.
To test this out, let’s deploy an app - a Slack clone I’m awfully fond of called RocketChat - as usual, we’ll create the config yaml file first:
$ cat rocketchat.yaml
persistence:
  enabled: true
service:
  type: LoadBalancer
mongodb:
  mongodbPassword: password
  mongodbRootPassword: password
This will deploy RocketChat (which uses MongoDB as a database) to our cluster and expose it using another LoadBalancer IP - again, ideally this would be done using an Ingress Controller instead, but for simplicity we’ll do it this way.
helm install stable/rocketchat --name rocketchat --namespace rocketchat -f rocketchat.yaml
If you watch the pods as this comes up, you should see the arbiter, the primary and then the secondary MongoDB nodes come up. Following that, the RocketChat app itself will come up and, at that point, will be accessible within the browser:
kubectl get pod -n rocketchat -w
Once all the pods show 1/1 - we can grab the LoadBalancer IP and port and access the app:
$ kubectl get svc -n rocketchat
NAME                          TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
rocketchat-mongodb            ClusterIP      10.102.96.222   <none>        27017/TCP      3m34s
rocketchat-mongodb-headless   ClusterIP      None            <none>        27017/TCP      3m34s
rocketchat-rocketchat         LoadBalancer   10.106.105.16   10.198.26.4   80:30904/TCP   3m34s
So, to access this service, as with Minio - browse to the LoadBalancer address, subbing in your own IP (in my case http://10.198.26.4).
Go through the motions of creating a user account with whatever name and password you like until you get to the main page:
Navigate to the #general channel and upload something or type in some text - this will be the data we want to protect with Velero!
Now, we can’t have that data going missing - I’m sure you’ll agree, so let’s back it up with Velero!
Backup and Restore with Velero
Now that we have an application, and data we want to protect - let’s tag the PersistentVolumes so Velero will back them up. First - we need to find out what the volumes are called:
$ kubectl get pvc -n rocketchat
NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
datadir-rocketchat-mongodb-primary-0     Bound    pvc-dda3e972-e5fa-11e9-a30e-00505691513e   8Gi        RWO            space-efficient   23m
datadir-rocketchat-mongodb-secondary-0   Bound    pvc-ddb17d70-e5fa-11e9-a30e-00505691513e   8Gi        RWO            space-efficient   23m
rocketchat-rocketchat                    Bound    pvc-dd633f78-e5fa-11e9-a30e-00505691513e   8Gi        RWO            space-efficient   23m
The first word in the name of each PVC is the name of the volume - so datadir and rocketchat. Let’s tell Velero to back up those datadir volumes by tagging the pods.
$ kubectl annotate pod -n rocketchat --selector=release=rocketchat,app=mongodb backup.velero.io/backup-volumes=datadir --overwrite
pod/rocketchat-mongodb-arbiter-0 annotated
pod/rocketchat-mongodb-primary-0 annotated
pod/rocketchat-mongodb-secondary-0 annotated
The above command looks for all pods in the rocketchat namespace with the labels release=rocketchat and app=mongodb and annotates them with backup.velero.io/backup-volumes=datadir - this tells Velero to back up the Persistent Volumes that are consumed with the name datadir.
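The "first word of the PVC name" rule used above to pick the annotation value can be sketched in plain shell. Treat it as a heuristic rather than a guarantee: it works here because StatefulSet claims are named after the claim template followed by the pod name, and it assumes the volume name itself contains no hyphen. volume_prefix is a hypothetical helper name of my own.

```shell
# volume_prefix PVC_NAME: print everything before the first hyphen.
# Heuristic only; assumes the volume name itself contains no hyphen.
# Hypothetical helper for illustration.
volume_prefix() {
  printf '%s\n' "${1%%-*}"
}

# PVC names taken from the kubectl output earlier in this post
for pvc in datadir-rocketchat-mongodb-primary-0 \
           datadir-rocketchat-mongodb-secondary-0 \
           rocketchat-rocketchat; do
  echo "$pvc -> $(volume_prefix "$pvc")"
done
```

If the heuristic doesn’t fit your app, the volume names in the pod spec itself are the ones to trust, since those are what the annotation refers to.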
Set up a Velero Schedule
Now that our app is set up to request Velero backups - let’s schedule some - in the below example, we are asking for a backup to be taken every hour and for them to be held for 24 hours each.
velero schedule create hourly --schedule="@every 1h" --ttl 24h0m0s
Let’s create another that runs daily and retains the backups for 7 days:
velero schedule create daily --schedule="@every 24h" --ttl 168h0m0s
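The --ttl values above are just a retention period written out in hours: 1 day is 24h0m0s and 7 days is 7 x 24 = 168h0m0s. A small sketch of that arithmetic, where ttl_days is a hypothetical helper name of my own:

```shell
# ttl_days N: print N days as an hours-based duration string,
# in the same form as the --ttl values used above.
# Hypothetical helper for illustration.
ttl_days() {
  printf '%dh0m0s\n' "$(( $1 * 24 ))"
}

ttl_days 1   # prints 24h0m0s
ttl_days 7   # prints 168h0m0s

# It could then be dropped into the schedule command, for example:
#   velero schedule create daily --schedule="@every 24h" --ttl "$(ttl_days 7)"
```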
If we query Velero, we can now see what schedules are set up:
$ velero get schedules
NAME     STATUS    CREATED                         SCHEDULE     BACKUP TTL   LAST BACKUP   SELECTOR
daily    Enabled   2019-10-03 17:57:43 +0100 BST   @every 24h   168h0m0s     23s ago       <none>
hourly   Enabled   2019-10-03 17:56:20 +0100 BST   @every 1h    24h0m0s      1m ago        <none>
Additionally, we can see they’ve already taken a backup each. We can query those backups with the following command:
$ velero get backups
NAME                   STATUS      CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
daily-20191003165757   Completed   2019-10-03 17:58:33 +0100 BST   6d        default            <none>
hourly-20191003165634   Completed   2019-10-03 17:56:34 +0100 BST   23h       default            <none>
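Going by the output above, backups created by a schedule are named after the schedule with a 14-digit timestamp appended (which, comparing with the CREATED column, looks to be in UTC). If you ever need to pull those two parts apart in a script, splitting at the last hyphen does the job. The helper names below are hypothetical and my own.

```shell
# Split NAME-TIMESTAMP at the last hyphen.
# name_prefix prints the part before it, name_stamp the part after it.
# Hypothetical helpers for illustration.
name_prefix() { printf '%s\n' "${1%-*}"; }
name_stamp()  { printf '%s\n' "${1##*-}"; }

name_prefix daily-20191003165757   # prints daily
name_stamp  daily-20191003165757   # prints 20191003165757
```

Splitting at the last hyphen rather than the first means a schedule name that itself contains hyphens still comes out whole.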
If we wanted to take an ad-hoc backup, that can be achieved through the following (in this case, we will only back up the rocketchat namespace):
$ velero backup create before-disaster --include-namespaces rocketchat
Backup request "before-disaster" submitted successfully.
Run `velero backup describe before-disaster` or `velero backup logs before-disaster` for more details.
As the command says - we can query progress with the following:
velero backup describe before-disaster --details
The --details option will show us the restic backup status of the persistent volumes at the very bottom:
Restic Backups:
  Completed:
    rocketchat/rocketchat-mongodb-primary-0: datadir
    rocketchat/rocketchat-mongodb-secondary-0: datadir
And now if we go to Minio, in the velero bucket you will see the backups and their contents (they are all encrypted on disk by default):
Simulating a disaster
Now that we have a backup and some scheduled backups, let’s delete the rocketchat app - and all its data off disk - and restore it using Velero.
helm delete --purge rocketchat
This will delete the RocketChat app - but because MongoDB uses a StatefulSet, the data volumes will stick around - as you can see from the CNS UI:
We can delete these PVs by deleting the namespace too:
kubectl delete ns rocketchat
So, now all our data is truly gone - as evidenced by the CNS UI no longer showing any of its volumes.
Restoring with Velero
Our app is dead, and the data is gone - so it’s time to restore it from one of the backups we took - I’ll use the ad-hoc one for ease of naming:
$ velero restore create --from-backup before-disaster --include-namespaces rocketchat
Restore request "before-disaster-20191003181320" submitted successfully.
Run `velero restore describe before-disaster-20191003181320` or `velero restore logs before-disaster-20191003181320` for more details.
Again - let’s monitor it with the command from above:
velero restore describe before-disaster-20191003181320 --details
Once the output of the command shows completed and the Restic Restores at the bottom are done, like below, we can check on our app:
Name:         before-disaster-20191003181320
Namespace:    velero
Labels:       <none>
Annotations:  <none>
Phase:  Completed
Backup:  before-disaster
Namespaces:
  Included:  rocketchat
  Excluded:  <none>
Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto
Namespace mappings:  <none>
Label selector:  <none>
Restore PVs:  auto
Restic Restores:
  Completed:
    rocketchat/rocketchat-mongodb-primary-0: datadir
    rocketchat/rocketchat-mongodb-secondary-0: datadir
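If you’d rather script the "is it done yet?" check than eyeball the describe output, the Phase line is the bit to look at. A minimal sketch, assuming the output layout shown above; describe_phase is a hypothetical helper name of my own.

```shell
# describe_phase: read `velero backup describe` or `velero restore describe`
# output on stdin and print the value of the first Phase: line.
# Assumes the output layout shown above. Hypothetical helper for illustration.
describe_phase() {
  awk '$1 == "Phase:" { print $2; exit }'
}

# Against a live cluster you would run:
#   velero restore describe before-disaster-20191003181320 | describe_phase
# Here it is fed a few lines of the sample output instead:
printf '%s\n' \
  'Name:         before-disaster-20191003181320' \
  'Namespace:    velero' \
  'Phase:  Completed' \
  'Backup:  before-disaster' \
  | describe_phase
```

With the sample lines this prints Completed. Note that it only reports the overall phase; the Restic Restores section at the bottom is still worth a look for the per-volume status.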
Let’s see if the pods are back up and running, and our PVCs are restored in our namespace:
$ kubectl get po,pvc -n rocketchat
NAME                                        READY   STATUS    RESTARTS   AGE
pod/rocketchat-mongodb-arbiter-0            1/1     Running   0          3m5s
pod/rocketchat-mongodb-primary-0            1/1     Running   0          3m5s
pod/rocketchat-mongodb-secondary-0          1/1     Running   0          3m5s
pod/rocketchat-rocketchat-7bdf95cb47-86q9t  1/1     Running   0          3m4s
NAME                                                           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
persistentvolumeclaim/datadir-rocketchat-mongodb-primary-0     Bound    pvc-1d90d0d7-e601-11e9-a30e-00505691513e   8Gi        RWO            space-efficient   3m5s
persistentvolumeclaim/datadir-rocketchat-mongodb-secondary-0   Bound    pvc-1d95abc7-e601-11e9-a30e-00505691513e   8Gi        RWO            space-efficient   3m5s
persistentvolumeclaim/rocketchat-rocketchat                    Bound    pvc-1d99ea27-e601-11e9-a30e-00505691513e   8Gi        RWO            space-efficient   3m5s
In the CNS UI - we’ll see the volumes present again - this time with some extra velero labels against them:
And our app should once again be accessible and our data safe:
A tip on troubleshooting Velero backups and restores - make liberal use of the logs command:
velero restore logs before-disaster-20191003181320
This is where the publicUrl section from the very start matters - if you don’t have that populated, your logs won’t get displayed to you, so if you’re experiencing that, make sure you’ve defined that parameter.
The logs have a trove of information in them, so if Restic is having trouble pulling data from a volume or such, all that info is in there!
This brings us to the end of our look at Velero on vSphere - and in particular the integration with CSI. If you have feedback for the Velero team - please reach out on GitHub and file some issues, whether it’s enhancements, bugs - or if you just need help. Stay tuned for more K8s goodness in the near future!
Why not follow @mylesagray on Twitter for more like this!