How to Migrate Between Kubernetes Storage Classes

Context

I recently found myself in the predicament of needing to migrate data between storage classes. I upgraded my home lab to include a shiny new database server to help alleviate the traffic spikes that I encounter when the site gets hit hard by some of the streamers who use it (PLUG: if you need an instant runoff or first past the post polling site with live results, check me out at streem.tech). I was migrating from a VM-based NFS server, where all the data was piped to the same physical drive that the k8s nodes were on, to a new TrueNAS server based on a Ryzen 3600, 64 GB of RAM, 4x 1TB NVMe drives, and a dozen HDDs in a disk shelf. The goal was to have the data migrated over as quickly as possible; however, given the circumstances, there was an expectation and acceptance of downtime. Though outside the scope of this gist, the final section discusses how to potentially avoid downtime.

Tools

This gist is going to be working with a number of tools, but the heart of the operation is Velero, the backup and restore tool. Velero itself backs up most of the metadata of the cluster, but it does not itself back up the data in the PVs. For that, it relies on Restic, another tool. We will not be interacting with Restic directly, but it is worth mentioning if you wish to dig deeper. On top of those, some form of S3 storage is required for Velero/Restic to store the data. I used the Minio plugin that comes natively with TrueNAS, and the Velero quickstart mentioned below has an easy-to-deploy Minio container if needed.

Obviously a Kubernetes cluster and kubectl are required, and the storage classes that are being moved to/from must also be part of that cluster. For reference, in my cluster I am moving FROM managed-nfs-storage and moving TO freenas-nfs-csi and freenas-iscsi-csi (based on the data sets that I am moving).
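Before touching anything, it does not hurt to confirm that both classes actually exist in the cluster and to see which claims are still on the old one. A rough sketch (the class name here is mine; substitute your own):

kubectl get storageclass
kubectl get pvc --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,CLASS:.spec.storageClassName | grep managed-nfs-storage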

I will be including all files that I use in this as part of the gist, but the nginx deployment is particularly important. The nginx deployment that I am including is the example deployment that I use to show off how to migrate. I recommend that you create the deployment and set its storage class to the one you are migrating away from. This will give you an opportunity to practice, and make sure that the new class and migration process are working. I would also recommend that after you practice on the nginx deployment, you migrate a less vital PVC, or a PVC that you can and have backed up in a secondary manner (such as a directory you can download the contents of by hand), so as to verify the results.
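When you get to that practice run, spinning up the example looks roughly like this (the file name is whatever you saved the attached deployment as, so treat it as a placeholder):

# deploy the example into its own namespace, backed by the OLD storage class
kubectl apply -f nginx.deployment.yaml
# wait for the claim to bind against the class you are migrating away from
kubectl -n nginx-example get pvc nginx-logs
kubectl -n nginx-example get pods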

Getting Started

Much of this tutorial will be parroting the quickstart guide on the Velero website, though I do hope to minimize the unnecessary steps. The first step is to visit the Velero GitHub releases page and download the velero CLI build that matches your machine (a sketch of the download is shown after the credentials file below). Then create a credentials-velero file like the following.

[default]
aws_access_key_id = minio
aws_secret_access_key = minio123
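If you have not installed the velero CLI yet, the download step mentioned above looks roughly like this on Linux (the version and architecture here are assumptions; pick the release and platform that match your machine):

curl -fsSL -o velero.tar.gz \
    https://github.com/vmware-tanzu/velero/releases/download/v1.9.2/velero-v1.9.2-linux-amd64.tar.gz
tar -xzf velero.tar.gz
sudo mv velero-v1.9.2-linux-amd64/velero /usr/local/bin/velero
velero version --client-only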

Using that credentials file, you then run the following command to install Velero on your cluster, changing http://minio.velero.svc:9000 to the URL of your S3/Minio bucket, and --bucket velero to match the name of the bucket. If you are not using Minio, you will likely need to change other config values, but I can only leave you with the documentation as that is out of scope.

velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.0.0 \
    --bucket velero \
    --secret-file ./credentials-velero \
    --use-restic \
    --default-volumes-to-restic \
    --use-volume-snapshots=false \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000

You can confirm the Velero install finished via kubectl get deployments -l component=velero --namespace=velero, and check the nginx example deployment through similar means.
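Concretely, the checks I mean look something like this (the restic DaemonSet name assumes a pre-1.10 Velero, which is what the --use-restic flag above implies):

kubectl get deployments -l component=velero --namespace=velero
kubectl get daemonset restic --namespace=velero
kubectl get pods --namespace=velero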

Migrating

The actual migration of the data is to Freeze, Backup, Delete, and Restore. This is why I said above that downtime would be hard to avoid, as you basically have to stop the servers to migrate them (again, reference the final section for some ideas on avoiding downtime). The actual work of migrating the data is done via a single config file (documentation) that I called migrate.configmap.yaml (don't forget to change the storage class names on line 22). Adding this config map to the velero namespace tells Velero to restore the data to a different storage class on restoration of a backup.
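Applying it is just a normal kubectl apply; I would also confirm it landed in the velero namespace, since that is the only place Velero looks (the ConfigMap name below is the one used in the attached file):

kubectl apply -f migrate.configmap.yaml
kubectl -n velero get configmap change-storage-class-config -o yaml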

Because the PVC must be deleted, as stated above it is recommended that you test/practice on an example container. I actually made the mistake of not checking that the data had transferred over while writing this, and had to restore data from a VM backup. I'd throw a "test" entry in the nginx access log to check for on reload. Via kubectl you can install the nginx testing deployment (don't forget to change the storageClassName on line 33 to your old storage class) to practice on for this purpose. This section is written assuming this is the PVC being migrated, but little would change if using something else.
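One way to plant that "test" marker is to append a line to the access log that lives on the PVC and make sure you can read it back before backing up. A minimal sketch, assuming the deployment, container, and volume names from the attached example:

# drop a marker into the log that lives on the PVC
kubectl -n nginx-example exec deploy/nginx-deployment -c nginx -- \
    sh -c 'echo "MIGRATION-TEST $(date)" >> /var/log/nginx/access.log'
# confirm it is there before running the backup
kubectl -n nginx-example exec deploy/nginx-deployment -c nginx -- \
    tail -n 3 /var/log/nginx/access.log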

Once you have added this config map to the cluster, run velero backup create nginx-backup --include-namespaces=nginx-example to begin the backup. Something to note about this backup command is that it backs up everything in the namespace. This can be tuned to be more specific, but it really shouldn't matter in this case. Once completed, check the log output from the backup to make sure the backup was successful. The example deployment includes an fsfreeze command that I could not actually run, causing the backup to fail! Freezing before the snapshot is the recommended process, but technically not 100% necessary, circumstance dependent. If you do get an error, disable the freeze/unfreeze annotations and try again. It is possible that you may have other issues occur during the backup that cause Restic to fail, but I can only point you to the documentation.
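For reference, the backup and log check look like the following; the annotate command is one quick way to strip the freeze hooks off the running pods if fsfreeze fails for you too (editing the annotations out of the Deployment template is the more permanent fix):

velero backup create nginx-backup --include-namespaces=nginx-example
velero backup logs nginx-backup
# if fsfreeze fails, remove the hook annotations from the pods and retry
kubectl -n nginx-example annotate pods -l app=nginx \
    pre.hook.backup.velero.io/container- pre.hook.backup.velero.io/command- \
    post.hook.backup.velero.io/container- post.hook.backup.velero.io/command-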

You can now use the command velero backup describe nginx-backup to see the status of the backup in detail. It should be marked as completed. You can also check all backups via velero backup get. If the backup does not show as completed, do not continue!

If you want to be safe, set the reclaim policy on the PV connected to the PVC to Retain. This makes the next step slightly less terrifying, as removing the PVC will not immediately delete the data. Now that the backup is complete, scale down the deployment and delete the PVC that you wish to migrate. This gives Velero the "space" that it needs to restore the PVC and, based on the config that you entered earlier, use a different storage class. Once the PVC has been removed, restore it from the backup using velero restore create --from-backup nginx-backup. While everything in the namespace will technically have been backed up, because you only cleared the PVC, and the PV will be on a different storage class, only those will be restored.
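Strung together, that sequence looks roughly like this for the example deployment (the PV name is whatever your claim happened to bind to; the jsonpath lookup is just one way to find it):

# find the PV behind the claim and make it survive PVC deletion
PV_NAME=$(kubectl -n nginx-example get pvc nginx-logs -o jsonpath='{.spec.volumeName}')
kubectl patch pv "$PV_NAME" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
# stop the workload and remove the old claim
kubectl -n nginx-example scale deployment nginx-deployment --replicas=0
kubectl -n nginx-example delete pvc nginx-logs
# restore; the ConfigMap above swaps in the new storage class
velero restore create --from-backup nginx-backup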

Make sure to verify that the restoration is complete via velero restore describe <INSERT NAME HERE> (the name is output from the restore create command), and via velero restore get. Some warnings are to be expected if you backed up the entire namespace, given it cannot restore something that already exists. So long as you don't get could not restore, persistentvolumeclaims or a Restic error, you should be fine. You can now scale the deployment back up, and check to confirm that the data properly transferred over. (Make sure to do this. As I said, I had to do a VM level restore because I lost data by not checking the restore.)
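The verification pass might look like the following; the tail should show the marker line planted before the backup:

velero restore get
kubectl -n nginx-example get pvc nginx-logs   # should now show the new storage class
kubectl -n nginx-example scale deployment nginx-deployment --replicas=1
kubectl -n nginx-example exec deploy/nginx-deployment -c nginx -- \
    tail -n 5 /var/log/nginx/access.log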

To clean up the backup, simply run velero backup delete nginx-backup and it will be deleted from Minio. You can also remove the PV that you set to Retain earlier. Now that you have transferred the nginx test PVC, you should know what you need to transfer the rest of your PVCs.
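Cleanup, under the same assumptions (the PV name is the one captured earlier, or look it up again the same way):

velero backup delete nginx-backup
# the old PV was set to Retain, so it has to be removed by hand
kubectl delete pv "$PV_NAME"
# once you are done practicing, the whole example namespace can go too
kubectl delete namespace nginx-example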

Avoiding Downtime

Each scenario will be different and will need to be dealt with on a case-by-case basis; however, there are two situations in which I can offer at least some suggestions on avoiding full downtime when migrating. They should only be taken as that, however: suggestions and starting points. You know your system better than anyone, and I take no responsibility for your individual migration.

High Availability Database

In the case where you have a highly available database, it may be possible to migrate the data without any downtime by moving each node one at a time, and giving the database time to recover between migrations. This will also depend on the database, as some don't recover well from being out of the loop for long periods of time.

Read Only

Depending on the scenario and the control you have over your system, it might be possible to do the migration in a read-only state, thus avoiding total downtime.

  1. set your system to a read-only state
  2. duplicate the database to the new storage class and namespace
  3. start a second database using the new data
  4. change the program configs to point to the new database
  5. restart the programs, and then disable read-only
migrate.configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  # any name can be used; Velero uses the labels (below)
  # to identify it rather than the name
  name: change-storage-class-config
  # must be in the velero namespace
  namespace: velero
  # the below labels should be used verbatim in your
  # ConfigMap.
  labels:
    # this value-less label identifies the ConfigMap as
    # config for a plugin (i.e. the built-in restore item action plugin)
    velero.io/plugin-config: ""
    # this label identifies the name and kind of plugin
    # that this ConfigMap is for.
    velero.io/change-storage-class: RestoreItemAction
data:
  # add 1+ key-value pairs here, where the key is the old
  # storage class name and the value is the new storage
  # class name.
  managed-nfs-storage: freenas-iscsi-csi
nginx example deployment

# Copyright 2017 the Velero contributors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

---
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-example
  labels:
    app: nginx

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nginx-logs
  namespace: nginx-example
  labels:
    app: nginx
spec:
  # Optional:
  storageClassName: managed-nfs-storage
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Mi

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: nginx-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        pre.hook.backup.velero.io/container: fsfreeze
        pre.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--freeze", "/var/log/nginx"]'
        post.hook.backup.velero.io/container: fsfreeze
        post.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--unfreeze", "/var/log/nginx"]'
    spec:
      volumes:
        - name: nginx-logs
          persistentVolumeClaim:
            claimName: nginx-logs
      containers:
        - image: nginx:1.17.6
          name: nginx
          ports:
            - containerPort: 80
          volumeMounts:
            - mountPath: "/var/log/nginx"
              name: nginx-logs
              readOnly: false
        - image: ubuntu:bionic
          name: fsfreeze
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: "/var/log/nginx"
              name: nginx-logs
              readOnly: false
          command:
            - "/bin/bash"
            - "-c"
            - "sleep infinity"