PostgreSQL in Kubernetes
I want to run PostgreSQL in my Kubernetes cluster with automated backups, a UI to query and manage the database, and a means to easily transfer databases between the cluster and my other environments. This post shows that these goals are not difficult to achieve.
Installing Postgres#
There are a number of ways to install Postgres on a Kubernetes cluster, many of them involving quite a few steps. Bitnami’s helm chart, however, is as simple as this;
```bash
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-release bitnami/postgresql
```
Wow. After reading so many other approaches I was expecting this bit to be much harder.
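To sanity-check the install, something like the following should work. It's a sketch only: the secret and service names come from the `my-release` release name used above, but the exact names the chart generates can differ between chart versions.

```bash
# Grab the generated superuser password from the secret the chart created
export POSTGRES_PASSWORD=$(kubectl get secret my-release-postgresql \
  -o jsonpath="{.data.postgresql-password}" | base64 --decode)

# Port-forward the service and connect with psql from the local machine
kubectl port-forward svc/my-release-postgresql 5432:5432 &
PGPASSWORD="$POSTGRES_PASSWORD" psql -h 127.0.0.1 -U postgres -d postgres
```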
Installing Adminer#
Adminer is a fantastic web-based tool for managing and querying many different databases, including Microsoft SQL Server, MySQL and PostgreSQL. There’s also a docker container for it, meaning we can run it from any docker-enabled host. I run multiple database types across multiple devices and having one tool that can connect to them all is great.
So let’s stick it in our Kubernetes cluster. I installed it manually with a deployment and service as defined in this git repo. Instead of using a `LoadBalancer` service type, I used `ClusterIP`, as my expectation (given there’s no security in front of it) is to authenticate with `kubectl` and `port-forward` to the service when I use it.
However, as of a day ago, there’s a helm chart, so maybe look at using that instead.
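If you go the manual route, access looks something like this. Treat the service name and port as assumptions; they depend on how the deployment in that repo is defined.

```bash
# Forward the Adminer service to localhost, then browse to http://localhost:8080
# and enter the postgres host and credentials in the Adminer login form
kubectl port-forward svc/adminer 8080:8080
```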
Scheduling backups#
I’m messing with my cluster, tearing it down and replacing it, and running development and test apps on it, but I’m not super keen to lose the data I have in postgres during my constant messing around. I don’t have a ton of data either, so I want to back it up every hour and send those backups to S3, where I’ll mostly forget about them until some time in the future when I spontaneously run `az aks delete`.
Create Bucket and IAM User#
To create an S3 bucket to store our backups I’ll run the following;
```bash
aws s3api create-bucket --bucket my-bucket --region us-east-1
```
I used the AWS Console for IAM to add a new user with a single permission: `PutObject` on the bucket added above. The policy attached to the user looks like this;
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```
Adding this user gives us an Access Key and an Access Key Secret that we can use in the backup job below.
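If you prefer the CLI over the console, the same user can be created along these lines. The user and policy names here are just examples, and `put-policy.json` is assumed to contain the policy document above.

```bash
# Create the backup user, attach the PutObject policy, and generate access keys
aws iam create-user --user-name postgres-backup
aws iam put-user-policy --user-name postgres-backup \
  --policy-name postgres-backup-write --policy-document file://put-policy.json
aws iam create-access-key --user-name postgres-backup
```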
Perform a Backup#
We’d now need to write a script that does something like this: run `pg_dump` against the database, compress the output, and ship the result off to S3.
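A minimal sketch of that script might look like the following, assuming the same environment variable names used later in this post:

```bash
# Sketch only: dump the database, gzip it, and push it to S3
# (PGPASSWORD would also need to be set for pg_dump to authenticate)
pg_dump -h "$POSTGRES_HOST" -U "$POSTGRES_USER" "$POSTGRES_DATABASE" \
  | gzip > dump.sql.gz
aws s3 cp dump.sql.gz "s3://$S3_BUCKET/$S3_PREFIX/dump.sql.gz"
```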
We’re doing this in Kubernetes so it’d be easiest to put the above script inside a container. This seemed like a problem someone may have already solved, and I had the good sense to google whether such a container already existed; it does! Thankfully Johannes Schickling has done most of the work for us and has provided two container images: postgres-backup-s3 and postgres-restore-s3.
I forked this and made two small edits before pushing `postgres-backup-s3` to Docker Hub.
Version mismatch#
Off the bat, using the existing container failed with the following message;
```
pg_dump: server version: 12.1 (Debian 12.1-1.pgdg100+1); pg_dump version: 10.4
pg_dump: aborting because of server version mismatch
```
This was because the version of `pg_dump` inside the `postgres-backup-s3` container was older than the postgres version running inside my `postgres` container. Updating to the latest alpine base image fixes this.
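To confirm you've ended up with matching versions, something like this works. The image tag and pod name here are assumptions based on my setup, so adjust for yours.

```bash
# pg_dump version baked into the backup image
docker run --rm --entrypoint pg_dump staff0rd/postgres-backup-s3 --version
# server version running in the cluster
kubectl exec my-release-postgresql-0 -- postgres --version
```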
Folder format#
The existing container just saved every backup into the `bucket/prefix` path, with the date and time in the file name. Over time this would end up with a large number of files in a single directory. Instead, I’ll split the date function up and create year, month and day folders.
Before#
```bash
s3://$S3_BUCKET/$S3_PREFIX/${POSTGRES_DATABASE}_$(date +"%Y-%m-%dT%H:%M:%SZ").sql.gz
```
After#
```bash
s3://$S3_BUCKET/$S3_PREFIX/$(date +"%Y")/$(date +"%m")/$(date +"%d")/${POSTGRES_DATABASE}_$(date +"%H:%M:%SZ").sql.gz
```
Creating secrets#
Instead of having sensitive information lying around in yaml files in source, I’ll put that data in Kubernetes Secrets. I found the simplest way to do this was to create a temporary folder, add a file per secret, and set the contents of each file to the value of the secret. When we pass these files to `kubectl create secret`, each file name will become a key.
```bash
mkdir secrets
cd secrets
echo -n 'xxx' > S3_ACCESS_KEY_ID
echo -n 'xxx' > S3_SECRET_ACCESS_KEY
echo -n 'xxx' > S3_BUCKET
kubectl create secret generic s3-postgres-backup \
  --from-file=./S3_ACCESS_KEY_ID \
  --from-file=./S3_SECRET_ACCESS_KEY \
  --from-file=./S3_BUCKET
rm -rf ../secrets
```
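To double-check that the keys landed as expected:

```bash
# Each file name should show up as a data key on the secret
kubectl describe secret s3-postgres-backup
```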
We can reference these secrets in the cron job below.
Creating a CronJob#
The configuration that we’ll apply with `kubectl apply -f this-config.yml` is listed below, and is pretty simple. We set a schedule, in this case every hour, and set history limits so the pods don’t hang around once they’re finished. On that schedule, a container will be run, in this case the `postgres-backup-s3` image from above, and we set the environment variables both directly in the yaml and as references to the secret defined above. The `POSTGRES_PASSWORD` comes from the secret automatically created for us when we installed the postgres helm chart.
```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 * * * *"
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: postgres-backup
              image: staff0rd/postgres-backup-s3
              envFrom:
                - secretRef:
                    name: s3-postgres-backup
              env:
                - name: POSTGRES_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: my-release-postgresql
                      key: postgresql-password
                - name: S3_PREFIX
                  value: test
                - name: POSTGRES_DATABASE
                  value: postgres
                - name: POSTGRES_USER
                  value: postgres
                - name: POSTGRES_HOST
                  value: my-release-postgresql.default.svc.cluster.local
          restartPolicy: Never
```
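Rather than waiting an hour to see whether the job works, a one-off job can be kicked off from the CronJob spec. The job name here is arbitrary:

```bash
# Trigger an immediate run from the CronJob definition and tail its output
kubectl create job --from=cronjob/postgres-backup manual-backup-test
kubectl get pods -l job-name=manual-backup-test
kubectl logs -l job-name=manual-backup-test -f
```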
Restoring from backup#
Don’t trust your backup solution until you confirm both that it’s writing backups and that you can restore those backups. To restore we’ll use the `postgres-restore-s3` container mentioned earlier, and we’ll run it directly in our cluster as a pod. We’ll pass the secrets as environment variables, similar to before, and this time I’ll set `S3_PREFIX` to reflect the folder structure edit I made to `postgres-backup-s3`. The command below will start a container in the cluster, grab the latest backup from S3, and restore it to our postgres pod.
```bash
kubectl run restore-backup \
  --env S3_ACCESS_KEY_ID=$(kubectl get secret --namespace default s3-postgres-backup -o jsonpath="{.data.S3_ACCESS_KEY_ID}" | base64 --decode) \
  --env S3_SECRET_ACCESS_KEY=$(kubectl get secret s3-postgres-backup -o jsonpath="{.data.S3_SECRET_ACCESS_KEY}" | base64 --decode) \
  --env S3_BUCKET=$(kubectl get secret s3-postgres-backup -o jsonpath="{.data.S3_BUCKET}" | base64 --decode) \
  --env S3_PREFIX=test/2020/05/05 \
  --env POSTGRES_DATABASE=postgres \
  --env POSTGRES_USER=postgres \
  --env POSTGRES_PASSWORD=$(kubectl get secret my-release-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode) \
  --env POSTGRES_HOST=my-release-postgresql.default.svc.cluster.local \
  --image staff0rd/postgres-restore-s3 \
  --restart Never
```
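Once the pod finishes, its logs show whether the restore succeeded, and the one-off pod can then be removed:

```bash
# Check the restore output, then delete the one-off pod
kubectl logs restore-backup
kubectl delete pod restore-backup
```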
This approach could easily be used to move backups between environments, even our local one. Simply `docker run` the `postgres-backup-s3` container to back up your local db to some S3 location, and then run the above command to restore it to the postgres pod in the cluster.
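For example, backing up a local database might look something like this. The values are placeholders, and `host.docker.internal` assumes Docker Desktop; on Linux you’d point at your host’s address instead.

```bash
# Back up a local postgres database to the same bucket under a 'local' prefix
docker run --rm \
  -e S3_ACCESS_KEY_ID=xxx \
  -e S3_SECRET_ACCESS_KEY=xxx \
  -e S3_BUCKET=my-bucket \
  -e S3_PREFIX=local \
  -e POSTGRES_DATABASE=postgres \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=xxx \
  -e POSTGRES_HOST=host.docker.internal \
  staff0rd/postgres-backup-s3
```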
Read policy#
But wait, did you get access denied in the previous section? You’ll need another policy attached to the previously configured IAM user to grant read access to the bucket.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}
```
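As before, this can be attached from the CLI if you prefer. Again, the user and policy names are just examples, and `read-policy.json` is assumed to contain the document above.

```bash
# Attach the read policy to the same backup user
aws iam put-user-policy --user-name postgres-backup \
  --policy-name postgres-backup-read --policy-document file://read-policy.json
```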
Conclusion#
Overall this approach was quite easy, given we could lean on existing tools and open-source software. Small edits and filling out config were all that was really needed.