A production-ready PostgreSQL on Kubernetes with KubeDB

Running a database in production is hard. It usually seems simple at first, but that changes when something goes wrong: someone deletes a table, you get hacked, or data gets corrupted. Now you need to restore a backup (better hope you made one!). Or the load on your database grows until your setup can no longer keep up, and you need to scale or modify it.

I've been searching for a good Kubernetes-native solution for running real databases. As part of that research, I'm trying out different database operators and writing a tutorial on how to use each of them. If you'd like to know about the others, make sure to subscribe to our newsletter.

This is a tutorial (and a video!) on using AppsCode's KubeDB and Stash to create a PostgreSQL database on Kubernetes and set up reliable, regular backups.

A video walkthrough of this tutorial

KubeDB by AppsCode simplifies and automates routine database tasks such as provisioning, patching, backup, recovery, failure detection, and repair for various popular databases on private and public clouds.

KubeDB website

In this tutorial I will:

  1. Install the KubeDB operator and create a simple PostgreSQL database
  2. Install Stash/KubeDB Enterprise (for backup and restore)
  3. Configure an automated backup job
  4. Share some closing remarks

1. Install the KubeDB operator and create a simple PostgreSQL database

Installing the KubeDB operator is described in the KubeDB guides, but it comes down to adding the AppsCode Helm repository and then installing a couple of Helm charts, as shown below. Because I like repeatable steps, I always create a Makefile, and that is what I will show you.

From what I understand, the operator must be installed in the kube-system namespace.

repository:
	helm repo add appscode https://charts.appscode.com/stable/

update:
	helm repo update

# check what the latest version is with:
versions:
	helm search repo appscode/kubedb

# installing the operator
install-community:
	helm install kubedb-community appscode/kubedb \
		--version v0.16.1 \
		--namespace kube-system

# installing the catalog of supported database versions
install-catalog:
	helm install kubedb-catalog appscode/kubedb-catalog \
		--version v0.16.1 \
		--namespace kube-system

This should install everything you need. You can check it with kubectl get postgresversions, which lists the PostgreSQL versions made available by the catalog.

The next step is to create an actual PostgreSQL database. We can use the following minimal manifest to create the Postgres resource (my-postgres.yaml):

apiVersion: kubedb.com/v1alpha2
kind: Postgres
metadata:
  name: my-postgres
spec:
  version: "10.6-v3"
  storageType: Durable
  storage:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 1Gi
  terminationPolicy: DoNotTerminate

For more details, see the KubeDB guide on Postgres.
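If you plan to create more than one database, it can help to render the manifest from a few parameters instead of copying the file around. The sketch below is a hypothetical bash helper (`render_postgres` is my own name, not part of KubeDB) that prints the same minimal spec as above with the name, version, and storage size filled in; it assumes nothing beyond a POSIX-ish shell.

```shell
# Hypothetical helper (not part of KubeDB): prints the minimal Postgres
# manifest shown above, with name, version and storage size substituted.
render_postgres() {
  local name="$1" version="$2" size="${3:-1Gi}"  # size defaults to 1Gi
  cat <<EOF
apiVersion: kubedb.com/v1alpha2
kind: Postgres
metadata:
  name: ${name}
spec:
  version: "${version}"
  storageType: Durable
  storage:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: ${size}
  terminationPolicy: DoNotTerminate
EOF
}
```

You could then pipe the output straight into kubectl, e.g. `render_postgres my-postgres 10.6-v3 1Gi | kubectl apply -f -`, which keeps the one-liner repeatable in a Makefile target.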

This is the moment to check that you are in the namespace you want. By the way: I use kubens for this, together with kubectx; they are great helpers for switching namespaces and clusters.

To create the database, simply run kubectl apply -f my-postgres.yaml

Now check which resources are being created with kubectl get all. If everything worked, you should see at least the following resources:

pod/my-postgres-0
service/my-postgres
statefulset.apps/my-postgres
appbinding.appcatalog.appscode.com/my-postgres
postgres.kubedb.com/my-postgres
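If you want to script this check (for example in CI), a small loop over the expected objects is enough. This is a sketch under my own assumptions: `check_postgres_resources` is a hypothetical name, it assumes the naming pattern above (`<name>-0` for the pod, `<name>` for everything else), and the `KUBECTL` variable is only there so you can point it at a different binary.

```shell
# Hypothetical check: reports which of the objects KubeDB should create for a
# Postgres named $1 exist in the current namespace. Returns 1 if any is missing.
check_postgres_resources() {
  local name="$1" missing=0 res
  for res in "pod/${name}-0" "service/${name}" "statefulset.apps/${name}" \
             "appbinding.appcatalog.appscode.com/${name}" "postgres.kubedb.com/${name}"; do
    # kubectl get TYPE/NAME exits non-zero when the object does not exist
    if "${KUBECTL:-kubectl}" get "$res" >/dev/null 2>&1; then
      echo "found:   $res"
    else
      echo "missing: $res"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_postgres_resources my-postgres` should print five `found:` lines once the database is up.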

2. Installing Stash/KubeDB Enterprise (for backup and restore)

2.1 Get the license

To get access to the automated backup and restore features of KubeDB, you'll need the 'enterprise' license of KubeDB. That license includes the ability to use Stash for backing up and restoring databases.

Currently, it works like this: you fetch a 14-day trial license from the license issuer, which is enough to get you started. After that, you can get in touch with the KubeDB team and sign an agreement for pay-as-you-go licensing, which also adds you to priority support.

I think it's very reasonably priced: your own server plus the license comes out to less than what you'd pay for AWS managed databases.

2.2 Install the operators

# Install KubeDB Enterprise
install-enterprise:
	helm install kubedb-enterprise appscode/kubedb-enterprise \
		--version v0.2.1 \
		--namespace kube-system \
		--set-file license=kubedb-enterprise-license.txt

# Install Stash Enterprise
install-stash:
	helm install stash-enterprise appscode/stash-enterprise \
		--version v0.11.7 \
		--namespace kube-system \
		--set-file license=stash-enterprise-license.txt

2.3 Install stash-postgres-addon

Next, you'll need to install the Postgres add-on for Stash. It contains the logic for backing up and restoring this particular database type.

stash-postgres-addon:
	curl -fsSL https://github.com/stashed/catalog/raw/v2020.11.17/deploy/helm3.sh | bash -s -- --catalog=stash-postgres

If you have successfully installed the stash-postgres add-on, you should be able to list the available tasks with kubectl get tasks. It shows me a bunch of tasks like:

postgres-backup-xx
postgres-restore-xx

3. Configure a backup job

3.1 Configure a backup repository

First, we'll configure where the backups will go; here we use an S3 bucket. Stash uses Restic, and Restic supports symmetric encryption, so your backups are not stored in plaintext. Make sure you store the Restic password safely outside your cluster: otherwise, after a catastrophic failure, you would have backups that you can no longer decrypt!

You will need to create a secret with three key-value pairs: the Restic password and the bucket's access key ID and secret access key. I find it easiest to create a file named secrets and put the key-value pairs in there, like so:

$ cat > secrets
RESTIC_PASSWORD=a_strong_password
AWS_ACCESS_KEY_ID=key
AWS_SECRET_ACCESS_KEY=secret
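Instead of typing the file by hand, you could generate it, which also gives you a random Restic password. The sketch below is a hypothetical helper (`write_secrets_file` is my own name); it assumes `head`, `base64` and `/dev/urandom` are available, as on typical Linux and macOS systems, and creates the file readable only by you. Remember to copy the generated password somewhere safe outside the cluster.

```shell
# Hypothetical helper: writes the env file used for the bucket secret,
# with a randomly generated Restic password.
write_secrets_file() {
  local file="$1" key_id="$2" secret_key="$3" password
  # 32 random bytes, base64-encoded (44 characters), newline stripped
  password="$(head -c 32 /dev/urandom | base64 | tr -d '\n')"
  # umask 077 inside a subshell: a newly created file gets mode 600
  (
    umask 077
    printf 'RESTIC_PASSWORD=%s\nAWS_ACCESS_KEY_ID=%s\nAWS_SECRET_ACCESS_KEY=%s\n' \
      "$password" "$key_id" "$secret_key" > "$file"
  )
}
```

For example, `write_secrets_file secrets my-access-key my-secret-key` produces a file with the same three keys as above. Note that umask only affects newly created files, so delete an existing secrets file first if you want the restricted permissions.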

Then create the secret from that file:

kubectl create secret generic bucket-secrets \
  --from-env-file="secrets"

Now you can create the Repository resource (the storage definition). I used a Minio server, but any S3-compatible bucket should work. Note the 'http://' prefix on the endpoint address: since the bucket runs inside the cluster, we do not want to use TLS, and omitting this prefix causes a TLS error.

# repository.yaml
apiVersion: stash.appscode.com/v1alpha1
kind: Repository
metadata:
  name: minio-repo
spec:
  backend:
    s3:
      endpoint: http://minio.minio.svc.cluster.local:7777
      bucket: database-backups
      prefix: /backups/demo/
    storageSecretName: bucket-secrets

Apply this resource, again, with kubectl apply -f repository.yaml
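Because a missing scheme on the endpoint led to a TLS error in my setup, a quick check before applying can save a debugging round. This is a hypothetical helper (`check_endpoint_scheme` is my own name) that only inspects the string; it does not contact the bucket.

```shell
# Hypothetical sanity check: the S3 endpoint should carry an explicit scheme.
check_endpoint_scheme() {
  case "$1" in
    http://*)  echo "plain HTTP endpoint: $1" ;;
    https://*) echo "TLS endpoint: $1" ;;
    *)
      echo "no scheme in endpoint '$1'; add http:// or https://" >&2
      return 1
      ;;
  esac
}
```

To run it against the file above, something like `check_endpoint_scheme "$(awk '$1 == "endpoint:" {print $2}' repository.yaml)"` should work, assuming there is a single endpoint line.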

3.2 Create the BackupConfiguration

Now we are ready to create the backup configuration. Here is an example:

# backupconfiguration.yaml
apiVersion: stash.appscode.com/v1beta1
kind: BackupConfiguration
metadata:
  name: minio-backup
spec:
  driver: Restic
  repository:
    name: minio-repo
  task:
    name: postgres-backup-10.14.0-v3
  target:
    ref:
      apiVersion: appcatalog.appscode.com/v1alpha1
      kind: AppBinding
      name: my-postgres
  schedule: "0 * * * *" # backup every hour, on the hour
  paused: false
  backupHistoryLimit: 1
  retentionPolicy:
    name: "keep-some"
    keepLast: 15
    keepHourly: 24
    keepDaily: 30
    keepMonthly: 12
    prune: true

The 'task' field references the particular backup task you want to run. From the output of kubectl get tasks, select the version that matches the major version of your database.
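If several tasks exist for the same major version, picking one by hand is error-prone. The sketch below is a hypothetical helper (`pick_postgres_task` is my own name) that reads task names on stdin and prints the highest-versioned `postgres-<kind>-<major>.*` task; it assumes task names follow the pattern shown above and that your `sort` supports `-V` (GNU coreutils does).

```shell
# Hypothetical helper: from task names on stdin, print the highest-versioned
# task matching postgres-<kind>-<major>.*, where <kind> is backup or restore.
pick_postgres_task() {
  local kind="$1" major="$2"
  grep -E "^postgres-${kind}-${major}\." | sort -V | tail -n 1
}
```

Against the cluster, you could feed it plain names with `kubectl get tasks -o custom-columns=NAME:.metadata.name --no-headers | pick_postgres_task backup 10`.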

The target.ref.name field must match the name you gave the database when you created it in step 1 (in my case, my-postgres); KubeDB creates an AppBinding with that same name.

The retentionPolicy tells Restic to prune old backups to free up space while still keeping some. Personally, I really like this feature, as it allows you to back up more frequently without bloating your storage.
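To get a feel for how much a policy keeps: Restic keeps a snapshot if any keep rule selects it, so the sum of the keep values is an upper bound on the number of retained snapshots. With the values above that is 15 + 24 + 30 + 12 = 81, and usually fewer, because one snapshot can satisfy several rules. The hypothetical helper below (`max_retained_snapshots` is my own name) computes that bound from a manifest, assuming each keep rule sits on its own `keepXxx: N` line as in the file above.

```shell
# Hypothetical helper: sums every keepLast/keepHourly/keepDaily/... value in a
# manifest. This is an upper bound on retained snapshots, since overlapping
# rules can select the same snapshot.
max_retained_snapshots() {
  awk '$1 ~ /^keep[A-Z][a-z]*:$/ { total += $2 } END { print total + 0 }' "$1"
}
```

Running `max_retained_snapshots backupconfiguration.yaml` on the configuration above should print 81.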

Once it is applied, you should see a new backupsession (kubectl get backupsession) started every time the schedule fires (set a shorter cron interval while you are testing). If a session starts but runs into problems, a good way to debug is to look at the pod that is started as part of the session; you may need to look at its init container as well.
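Since the interesting error is often in an init container, it helps to dump the logs of every container in that pod at once. A minimal sketch, assuming you already know the pod name (for example from kubectl get pods); `pod_logs_all_containers` is a hypothetical name, and `KUBECTL` can be overridden to use another binary.

```shell
# Hypothetical debugging helper: prints the logs of all init containers and
# regular containers of pod $1, each under a header line.
pod_logs_all_containers() {
  local pod="$1" k="${KUBECTL:-kubectl}" c
  # jsonpath prints init container names, then regular container names,
  # space-separated; the unquoted substitution splits them into words.
  for c in $("$k" get pod "$pod" -o jsonpath='{.spec.initContainers[*].name} {.spec.containers[*].name}'); do
    echo "=== ${pod}/${c} ==="
    "$k" logs "$pod" -c "$c"
  done
}
```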

Finally, check the repository (kubectl get repository) to see whether the backup has been recorded, and use your favorite tool to confirm that the backup files have arrived where you expect them.

If all of this works, you have completed your automated backup configuration!

4. Closing remarks

While this tutorial stops here, if you plan to run a database in production, make sure to regularly test a full restore of your database.
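As a starting point for such a test, Stash restores through a RestoreSession resource. The sketch below renders one that restores the latest snapshot from minio-repo into the database behind a given AppBinding. The field layout is my reading of the Stash v1beta1 documentation and `render_restore_session` is a hypothetical helper, so verify both against the docs for your Stash version. Ideally, target a separate, freshly created database rather than production.

```shell
# Hypothetical helper: renders a Stash RestoreSession for AppBinding $1 using
# restore task $2 (e.g. a postgres-restore-* task from kubectl get tasks).
# Assumption: field names follow the Stash v1beta1 RestoreSession docs.
render_restore_session() {
  local target="$1" task="$2"
  cat <<EOF
apiVersion: stash.appscode.com/v1beta1
kind: RestoreSession
metadata:
  name: ${target}-restore
spec:
  task:
    name: ${task}
  repository:
    name: minio-repo
  target:
    ref:
      apiVersion: appcatalog.appscode.com/v1alpha1
      kind: AppBinding
      name: ${target}
  rules:
    - snapshots: [latest]
EOF
}
```

Usage against a scratch database could look like `render_restore_session my-postgres-restored postgres-restore-10.14.0-v3 | kubectl apply -f -`, after which kubectl get restoresession shows its progress.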

Overall, I have found that the KubeDB and Stash operators are reasonably well documented, and the support is good.

Finally, be sure to sign up for the Leafcloud newsletter, as I hope to complete evaluations of the alternatives and a comparison as well.