PostgreSQL in Kubernetes
I want to run PostgreSQL in my Kubernetes cluster with automated backups, a UI to query and manage the database, and a means to easily transfer databases between the cluster and my other environments. This post shows that those goals are not very difficult to achieve.
Installing Postgres
There are a number of ways to install Postgres on a Kubernetes cluster, many of them involving a lot of steps. Bitnami’s helm chart, however, is as simple as this;
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-release bitnami/postgresql
Wow. After reading so many other approaches I was expecting this bit to be much harder.
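The chart also generates a password for the postgres user and stores it in a secret named after the release; we’ll reference that same secret later for the backup and restore jobs. To grab the password directly;
kubectl get secret my-release-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode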
Installing Adminer
Adminer is a fantastic web-based tool for managing and querying many different databases including Microsoft SQL Server, MySQL and PostgreSQL. There’s also a docker container for it, meaning we can run it from any docker-enabled host. I run multiple database types across multiple devices and having one tool that can connect to them all is great.
So let’s stick it in our Kubernetes cluster. I installed it manually with a deployment and service as defined in this git repo. Instead of using a LoadBalancer service type, I used ClusterIP, as my expectation (given there’s no security in front of it) is to authenticate with kubectl and port-forward to the service when I use it.
However, as of a day ago, there’s a helm chart, so maybe look at using that instead.
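Once it’s running, using Adminer is just a port-forward away; the service name and port below assume the manifests linked above expose Adminer’s default port 8080;
kubectl port-forward svc/adminer 8080:8080
Then browse to http://localhost:8080 and connect with the database credentials.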
Scheduling backups
I’m messing with my cluster, tearing it down and replacing it, running development and test apps on it, but I’m not super keen to lose the data I have in postgres during my constant messing around. I don’t have a ton of data either, so I want to back it up every hour and send those backups to S3, where I’ll mostly forget about them until some time in the future when I spontaneously run az aks delete.
Create Bucket and IAM User
To create an S3 bucket to store our backups I’ll run the following;
aws s3api create-bucket --bucket my-bucket --region us-east-1
I used the AWS Console for IAM to add a new user with a single permission: PutObject on the bucket added above. The policy attached to the user looks like this;
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
Adding this user gives us an Access Key ID and Secret Access Key that we can use in the backup job below.
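If you’d rather skip the console, the same user can be created with the AWS CLI; the user and policy names below are placeholders, and policy.json holds the policy document above;
aws iam create-user --user-name postgres-backup
aws iam put-user-policy --user-name postgres-backup --policy-name postgres-backup-s3 --policy-document file://policy.json
aws iam create-access-key --user-name postgres-backup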
Perform a Backup
We’d now need to write a script that would do something like this (a rough sketch with placeholder variable names);
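# Dump the database, compress it, and ship it to S3.
# The POSTGRES_* and S3_* variables are placeholders for illustration.
export PGPASSWORD="$POSTGRES_PASSWORD"
pg_dump -h "$POSTGRES_HOST" -U "$POSTGRES_USER" "$POSTGRES_DATABASE" | gzip > dump.sql.gz
aws s3 cp dump.sql.gz "s3://$S3_BUCKET/$S3_PREFIX/${POSTGRES_DATABASE}_$(date +"%Y-%m-%dT%H:%M:%SZ").sql.gz"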
We’re doing this in Kubernetes, so it’d be easiest to put the above script inside a container. This seemed like a problem someone may have already solved, and I had the good sense to google whether such a container already existed. It does! Thankfully Johannes Schickling has done most of the work for us, providing two container images: postgres-backup-s3 and postgres-restore-s3.
I forked this and made two small edits before pushing postgres-backup-s3 to docker hub.
Version mismatch
Off the bat, using the existing container failed with the following message;
pg_dump: server version: 12.1 (Debian 12.1-1.pgdg100+1); pg_dump version: 10.4
pg_dump: aborting because of server version mismatch
This was because the version of pg_dump inside the postgres-backup-s3 container was older than the postgres version running inside my postgres container. Updating the container to the latest alpine release fixes this.
Folder format
The existing container just saved every backup into the bucket/prefix path, with the date and time in the file name. Over time this will end up with a large number of files in a single directory. Instead I’ll split the date function up and create year, month and day folders.
Before
s3://$S3_BUCKET/$S3_PREFIX/${POSTGRES_DATABASE}_$(date +"%Y-%m-%dT%H:%M:%SZ").sql.gz
After
s3://$S3_BUCKET/$S3_PREFIX/$(date +"%Y")/$(date +"%m")/$(date +"%d")/${POSTGRES_DATABASE}_$(date +"%H:%M:%SZ").sql.gz
Creating secrets
Instead of having sensitive information lying around in yaml files in source, I’ll put that data in Kubernetes Secrets. I found the simplest way to do this was to create a temporary folder, add a file per secret, and set the contents of each file to the value of the secret. When we pass these files to kubectl create secret, each file name will become a key.
mkdir secrets
cd secrets
echo -n 'xxx' > S3_ACCESS_KEY_ID
echo -n 'xxx' > S3_SECRET_ACCESS_KEY
echo -n 'xxx' > S3_BUCKET
kubectl create secret generic s3-postgres-backup \
--from-file=./S3_ACCESS_KEY_ID \
--from-file=./S3_SECRET_ACCESS_KEY \
--from-file=./S3_BUCKET
cd ..
rm -rf secrets
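Before wiring the job up, it’s worth checking the secret contains the expected keys (describe lists keys and sizes without exposing the values);
kubectl describe secret s3-postgres-backup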
We can reference these secrets in the CronJob below.
Creating a CronJob
The configuration that we’ll apply with kubectl apply -f this-config.yml is listed below, and is pretty simple. We’ll set a schedule, in this case every hour, and set history limits so the pods don’t hang around when they’re finished. On that schedule a container will be run, in this case the postgres-backup-s3 from above, and we set the environment variables both directly in the yaml and as references to the secret defined above. The POSTGRES_PASSWORD comes from the secret automatically created for us when we installed the postgres helm chart.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 * * * *"
  successfulJobsHistoryLimit: 1
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: postgres-backup
              image: staff0rd/postgres-backup-s3
              envFrom:
                - secretRef:
                    name: s3-postgres-backup
              env:
                - name: POSTGRES_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: my-release-postgresql
                      key: postgresql-password
                - name: S3_PREFIX
                  value: test
                - name: POSTGRES_DATABASE
                  value: postgres
                - name: POSTGRES_USER
                  value: postgres
                - name: POSTGRES_HOST
                  value: my-release-postgresql.default.svc.cluster.local
          restartPolicy: Never
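Rather than waiting for the top of the hour, we can trigger the CronJob by hand to confirm it works end-to-end; the job name here is arbitrary, and the bucket and prefix match the values used above;
kubectl create job --from=cronjob/postgres-backup postgres-backup-manual
kubectl logs job/postgres-backup-manual
aws s3 ls s3://my-bucket/test/ --recursive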
Restoring from backup
Don’t trust your backup solution until you confirm both that it’s writing backups and that you can restore those backups. To restore we’ll use the postgres-restore-s3 container mentioned earlier, and we’ll run it directly in our cluster as a pod. We’ll pass the secrets as environment variables, similar to before, and this time I’ll set S3_PREFIX to reflect the folder structure edit I made to postgres-backup-s3. The command below will start a container in the cluster, grab the latest backup from S3, and restore it to our postgres pod.
kubectl run restore-backup \
--env S3_ACCESS_KEY_ID=$(kubectl get secret --namespace default s3-postgres-backup -o jsonpath="{.data.S3_ACCESS_KEY_ID}" | base64 --decode) \
--env S3_SECRET_ACCESS_KEY=$(kubectl get secret s3-postgres-backup -o jsonpath="{.data.S3_SECRET_ACCESS_KEY}" | base64 --decode) \
--env S3_BUCKET=$(kubectl get secret s3-postgres-backup -o jsonpath="{.data.S3_BUCKET}" | base64 --decode) \
--env S3_PREFIX=test/2020/05/05 \
--env POSTGRES_DATABASE=postgres \
--env POSTGRES_USER=postgres \
--env POSTGRES_PASSWORD=$(kubectl get secret my-release-postgresql -o jsonpath="{.data.postgresql-password}" | base64 --decode) \
--env POSTGRES_HOST=my-release-postgresql.default.svc.cluster.local \
--image staff0rd/postgres-restore-s3 \
--restart Never
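With --restart Never the pod sticks around after it completes, so we can check its output and clean it up afterwards;
kubectl logs restore-backup
kubectl delete pod restore-backup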
This approach could easily be used to move backups between environments, even our local. Simply docker run the postgres-backup-s3 container to back up your local db to some s3 location, and then run the above command to restore it to the postgres pod in the cluster.
Read policy
But wait, did you get access denied in the previous section? You’ll need another policy attached to the previously configured IAM user to grant read access to the bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}
Conclusion
Overall this approach was quite easy, considering we could lean on existing tools and open-source software. Small edits and filling out config were all that was really needed.