I want to run PostgreSQL in my Kubernetes cluster with automated backups, a UI to query and manage the database, and a way to easily transfer databases between the cluster and my other environments. This post shows that these goals are not very difficult to achieve.

Installing Postgres#

There are a number of ways to install Postgres on a Kubernetes cluster, many of them involving a lot of steps. Bitnami’s Helm chart, however, is as simple as this:

```bash
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-release bitnami/postgresql
```

Wow. After reading so many other approaches I was expecting this bit to be much harder.
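
The chart also generates a password for the postgres user and stashes it in a secret named my-release-postgresql, the same secret the backup job references later. You can pull the password out now to connect and confirm the install worked:

```bash
# Read the generated password from the secret the chart created
kubectl get secret my-release-postgresql \
  -o jsonpath="{.data.postgresql-password}" | base64 --decode
```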

Installing Adminer#

Adminer is a fantastic web-based tool for managing and querying many different databases, including Microsoft SQL Server, MySQL and PostgreSQL. There’s also a Docker container for it, meaning we can run it from any Docker-enabled host. I run multiple database types across multiple devices, and having one tool that can connect to them all is great.

So let’s stick it in our Kubernetes cluster. I installed it manually with a deployment and service as defined in this git repo. Instead of a LoadBalancer service type I used ClusterIP, since Adminer isn’t secured; my expectation is to authenticate with kubectl and port-forward to the service whenever I use it.
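
When I do want to use it, the port-forward looks something like the below; the service name adminer and port 8080 are just what my manifests happen to use, so adjust them to match yours.

```bash
# Forward a local port to the Adminer service, then browse to http://localhost:8080
kubectl port-forward service/adminer 8080:8080
```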

However, as of a day ago, there’s a helm chart, so maybe look at using that instead.

Scheduling backups#

I’m messing with my cluster, tearing it down and replacing it, and running development and test apps on it, but I’m not super keen to lose the data I have in postgres during my constant messing around. I don’t have a ton of data either, so I want to back it up every hour and send those backups to S3, where I’ll mostly forget about them until some time in the future when I spontaneously run az aks delete.

Create Bucket and IAM User#

To create an S3 bucket to store our backups, I’ll run the following:

```bash
aws s3api create-bucket --bucket my-bucket --region us-east-1
```

I used the AWS Console for IAM to add a new user with a single permission: PutObject on the bucket created above. The policy attached to the user looks like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```

Adding this user gives us an Access Key ID and a Secret Access Key that we can use in the backup job below.

Perform a Backup#

We’d now need to write a script that does something like this (a rough sketch follows the list):

  1. Use pg_dump to back up the database to a file
  2. Use the aws CLI to push the file to S3
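
As a sketch only, assuming pg_dump, gzip and the aws CLI are available and PGPASSWORD is set in the environment (none of these names are the container’s real config), it might look like this:

```bash
#!/bin/sh
set -e

BACKUP_FILE="${POSTGRES_DATABASE}_$(date +"%Y-%m-%dT%H:%M:%SZ").sql.gz"

# 1. Use pg_dump to back up the database to a compressed file
pg_dump -h "$POSTGRES_HOST" -U "$POSTGRES_USER" "$POSTGRES_DATABASE" \
  | gzip > "$BACKUP_FILE"

# 2. Use the aws CLI to push the file to S3
aws s3 cp "$BACKUP_FILE" "s3://my-bucket/backups/$BACKUP_FILE"
```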

We’re doing this in Kubernetes, so it’d be easiest to put the above script inside a container. This seemed like a problem someone may have already solved, so I had the good sense to google whether such a container already existed, and it does! Thankfully Johannes Schickling has done most of the work for us and has provided two container images: postgres-backup-s3 and postgres-restore-s3.

I forked this and made two small edits before pushing postgres-backup-s3 to docker hub.

Version mismatch#

Off the bat, using the existing container failed with the following message:

```
pg_dump: server version: 12.1 (Debian 12.1-1.pgdg100+1); pg_dump version: 10.4
pg_dump: aborting because of server version mismatch
```

This was because the version of pg_dump inside the postgres-backup-s3 container was older than the Postgres version running inside my postgres container. Updating the container to the latest Alpine base image fixed this.
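
If you hit the same thing, comparing the two versions directly makes the mismatch obvious; the pod name below is just what the Bitnami chart named mine, and overriding the image entrypoint is only for the check.

```bash
# Server version running in the cluster (pod name from the Bitnami chart)
kubectl exec my-release-postgresql-0 -- postgres --version

# pg_dump version baked into the backup image
docker run --rm --entrypoint pg_dump staff0rd/postgres-backup-s3 --version
```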

Folder format#

The existing container saved every backup into the bucket/prefix path, with the date and time in the file name. Over time this would end up with a large number of files in a single directory. Instead, I’ll split the date function up and create year, month and day folders.

Before#

```bash
s3://$S3_BUCKET/$S3_PREFIX/${POSTGRES_DATABASE}_$(date +"%Y-%m-%dT%H:%M:%SZ").sql.gz
```

After#

```bash
s3://$S3_BUCKET/$S3_PREFIX/$(date +"%Y")/$(date +"%m")/$(date +"%d")/${POSTGRES_DATABASE}_$(date +"%H:%M:%SZ").sql.gz
```
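
With that change in place, listing the bucket shows backups grouped by date instead of piling up in one directory (the test prefix here matches the config further down):

```bash
# Backups now land under year/month/day "folders"
aws s3 ls s3://my-bucket/test/2020/05/05/
```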

Creating secrets#

Instead of having sensitive information lying around in yaml files in source, I’ll put that data in Kubernetes Secrets. I found the simplest way to do this was to create a temporary folder, add a file per secret, and set the contents of each file to the value of the secret. When we pass these files to kubectl create secret, each file name becomes a key.

```bash
mkdir secrets
cd secrets
echo -n 'xxx' > S3_ACCESS_KEY_ID
echo -n 'xxx' > S3_SECRET_ACCESS_KEY
echo -n 'xxx' > S3_BUCKET
kubectl create secret generic s3-postgres-backup \
  --from-file=./S3_ACCESS_KEY_ID \
  --from-file=./S3_SECRET_ACCESS_KEY \
  --from-file=./S3_BUCKET
rm -rf ../secrets
```
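
It’s worth a quick check that the secret landed with the three expected keys before wiring it into the job:

```bash
# Should list S3_ACCESS_KEY_ID, S3_BUCKET and S3_SECRET_ACCESS_KEY under Data
kubectl describe secret s3-postgres-backup
```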

We can reference these secrets in the CronJob below.

Creating a CronJob#

The configuration that we’ll apply with kubectl apply -f this-config.yml is listed below, and is pretty simple. We set a schedule, in this case every hour, and set history limits so the pods don’t hang around once they’re finished. On that schedule a container is run, in this case the postgres-backup-s3 image from above, and we set its environment variables both directly in the yaml and as references to the secret defined above. The POSTGRES_PASSWORD comes from the secret that was automatically created for us when we installed the postgres helm chart.

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 * * * *"
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: postgres-backup
              image: staff0rd/postgres-backup-s3
              envFrom:
                - secretRef:
                    name: s3-postgres-backup
              env:
                - name: POSTGRES_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: my-release-postgresql
                      key: postgresql-password
                - name: S3_PREFIX
                  value: test
                - name: POSTGRES_DATABASE
                  value: postgres
                - name: POSTGRES_USER
                  value: postgres
                - name: POSTGRES_HOST
                  value: my-release-postgresql.default.svc.cluster.local
          restartPolicy: Never
```
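
Rather than waiting an hour to find out whether it works, you can trigger a one-off run from the CronJob and tail its logs; the job name manual-backup is arbitrary.

```bash
kubectl apply -f this-config.yml

# Kick off a run immediately and check its output once the pod completes
kubectl create job --from=cronjob/postgres-backup manual-backup
kubectl logs job/manual-backup
```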

Restoring from backup#

Don’t trust your backup solution until you’ve confirmed both that it’s writing backups and that you can restore them. To restore we’ll use the postgres-restore-s3 container mentioned earlier, and we’ll run it directly in our cluster as a pod. We’ll pass the secrets as environment variables, similar to before, and this time I’ll set S3_PREFIX to reflect the folder-structure edit I made to postgres-backup-s3. The command below will start a container in the cluster, grab the latest backup from S3, and restore it to our postgres pod.

```bash
kubectl run restore-backup \
  --env S3_ACCESS_KEY_ID=$(kubectl get secret --namespace default s3-postgres-backup -o jsonpath="{.data.S3_ACCESS_KEY_ID}" | base64 --decode) \
  --env S3_SECRET_ACCESS_KEY=$(kubectl get secret s3-postgres-backup -o jsonpath="{.data.S3_SECRET_ACCESS_KEY}" | base64 --decode) \
  --env S3_BUCKET=$(kubectl get secret s3-postgres-backup -o jsonpath="{.data.S3_BUCKET}" | base64 --decode) \
  --env S3_PREFIX=test/2020/05/05 \
  --env POSTGRES_DATABASE=postgres \
  --env POSTGRES_USER=postgres \
  --env POSTGRES_PASSWORD=$(kubectl get secret my-release-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode) \
  --env POSTGRES_HOST=my-release-postgresql.default.svc.cluster.local \
  --image staff0rd/postgres-restore-s3 \
  --restart Never
```

This approach could easily be used to move databases between environments, even our local machine. Simply docker run the postgres-backup-s3 container to back up your local db to some S3 location, and then run the above command to restore it to the postgres pod in the cluster.
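
As a sketch, that local backup run might look like the below. host.docker.internal reaching the host’s postgres is a Docker Desktop assumption (use the host’s IP on Linux), and I’m assuming the image does a single backup run when no schedule is configured.

```bash
# Back up a local database to S3 using the same image the cluster uses;
# the xxx values are placeholders for real credentials.
docker run --rm \
  -e S3_ACCESS_KEY_ID=xxx \
  -e S3_SECRET_ACCESS_KEY=xxx \
  -e S3_BUCKET=my-bucket \
  -e S3_PREFIX=local \
  -e POSTGRES_HOST=host.docker.internal \
  -e POSTGRES_DATABASE=postgres \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=xxx \
  staff0rd/postgres-backup-s3
```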

Read policy#

But wait, did you get an access denied error in the previous section? You’ll need another policy attached to the previously configured IAM user to grant read access to the bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}
```

Conclusion#

Overall this approach was quite easy, given we could lean on existing tools and open-source software. A couple of small edits and some configuration were all that was really needed.