Redis Operator for Kubernetes released

Julio Chana · Published in Spotahome Product · Dec 1, 2017

We, at Spotahome, are happy to announce that we’ve just released a new Redis Operator for Kubernetes. This project manages a Redis failover installation inside a Kubernetes cluster: it can deploy, maintain, heal, and delete all the pieces necessary to run Redis in high availability.

What is a Kubernetes Operator? See the CoreOS post introducing the concept.

This post explains why we needed to build it and how we did it.

Motivation

To save user favorites on our website, speed up navigation (by caching some database requests), and more, we needed a fast and reliable database.
The development team decided to use Redis as a cache/database, so we set out to create a high-availability deployment.

Steps followed

As a first step, we created a Helm chart that deployed a Redis failover with an HAProxy on top of it, redirecting requests only to the master node.

This approach had some advantages:

  • Highly fault tolerant
  • Single entry point (via the HAProxies, behind a service)
  • The deployments ensured that all the pieces would keep running (though separately)

But it also had drawbacks:

  • The bootstrap had to be done manually
  • The Redis and Sentinel nodes needed a custom startup script to find the rest of the nodes
  • The HAProxy added an unnecessary hop
  • From time to time, because of the Kubernetes pod life cycle, the Sentinel nodes lost track of the master and the whole cluster ended up in a split brain

Next, we removed the HAProxy to eliminate that extra hop, changing the way clients connected to the Redis master. This was not enough; the rest of the drawbacks were still there.

In the end, we decided to do things as they should have been done, with care and excellence. So we started developing a Kubernetes Custom Resource.

Kubernetes CRD Definition

The Redis Failover CRD must meet the following requirements:

  • Be able to bootstrap the application
  • If an error occurs, try to fix it
  • When deleted, clean up all the elements it created
  • Log operator events
  • Export Redis metrics

It also has to check the status of the Redis Failover elements (a sketch of these checks follows the list):

  • Only one node acts as master
  • The number of Redis nodes matches the one set in the Redis Failover specification
  • The number of Sentinels matches the one set in the Redis Failover specification
  • All Redis slaves have the same master
  • All Sentinels point to the same Redis master
  • Sentinel has no dead nodes registered
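
Below is a minimal sketch of these checks using the redis-py client. The real operator implements them in Go; the hostnames, the Sentinel port, and the master name “mymaster” are assumptions for illustration:

import redis

# Illustrative addresses: one entry per Redis / Sentinel pod.
REDIS_NODES = ["redis-0.example", "redis-1.example", "redis-2.example"]
SENTINEL_NODES = ["sentinel-0.example", "sentinel-1.example", "sentinel-2.example"]

def check_failover(expected_redis=3, expected_sentinels=3):
    # One INFO call per Redis node tells us its role and, for slaves,
    # which master it replicates from.
    infos = [redis.Redis(host=h).info("replication") for h in REDIS_NODES]
    assert len(infos) == expected_redis, "Redis count must match the spec"
    assert sum(i["role"] == "master" for i in infos) == 1, "exactly one master"

    slave_masters = {i["master_host"] for i in infos if i["role"] == "slave"}
    assert len(slave_masters) <= 1, "all slaves must share the same master"

    # Every Sentinel must agree on who the master is.
    sentinels = [redis.Redis(host=h, port=26379) for h in SENTINEL_NODES]
    assert len(sentinels) == expected_sentinels, "Sentinel count must match the spec"
    monitored = {s.sentinel_master("mymaster")["ip"] for s in sentinels}
    assert len(monitored) == 1, "all Sentinels must point to the same master"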

Redis Failover CRD bootstrap procedure

  1. Create a standalone Redis master with Sentinel, so that the other nodes can connect to it
  2. Create a Sentinel service that provides a single entry point to any Sentinel node, as all Sentinel nodes are equal (a sketch of this step follows the list)
  3. Create a Sentinel deployment that will watch the Redis nodes (at this moment only the bootstrap node is running)
  4. Create a Redis service to expose the Redis exporter metrics
  5. Create a Redis statefulset that brings up the Redis nodes one by one
  6. Delete the bootstrap pod
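
As an illustration of step 2, here is a minimal sketch of creating such a Sentinel service with the official Kubernetes Python client. The operator itself is written in Go, and the service name, labels, and namespace below are assumptions:

from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a pod

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="redisfailover-sentinel", namespace="test"),
    spec=client.V1ServiceSpec(
        # Any Sentinel pod can answer, so a plain selector-based Service is
        # enough to provide the single entry point.
        selector={"app": "redisfailover", "component": "sentinel"},
        ports=[client.V1ServicePort(name="sentinel", port=26379)],
    ),
)
client.CoreV1Api().create_namespaced_service(namespace="test", body=service)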

Redis Failover CRD liveness probe

Once the Redis Failover is deployed, the liveness of the nodes is controlled by two elements (a sketch follows the list):

  1. The Kubernetes deployments and statefulsets check that all the pods are running and working. If a pod fails, they destroy it and start it again
  2. The Sentinel nodes check that the Redis master is running. If it is not, they perform an automated failover, voting for a new master and promoting it from slave to master
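
For example, any Sentinel node can be asked which node is currently the master, and the answer stays correct across failovers. A minimal sketch with redis-py (the Sentinel service hostname and the master name “mymaster” are assumptions):

from redis.sentinel import Sentinel

# Any Sentinel can answer; here we go through the Sentinel service.
sentinel = Sentinel([("rfs-redisfailover.test.svc", 26379)], socket_timeout=0.5)

# After an automated failover this answer changes on its own: clients that
# keep asking Sentinel simply follow the newly promoted master.
host, port = sentinel.discover_master("mymaster")
print(f"current master: {host}:{port}")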

Because of the life cycle of a pod inside Kubernetes (pods are volatile: they can be deleted, recreated, and moved between host nodes), the CRD has to check that Sentinel is working properly, preventing it from causing a split brain. The underlying problem is that a Sentinel node has no way to deregister itself from the Sentinel cluster before dying, so the Sentinel node list grows without control.

If this happens, the CRD sends a RESET signal to every Sentinel node (as long as no failover is running at that moment).
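
A minimal sketch of that reset with redis-py (the addresses and the master name are assumptions, and the check that no failover is in progress is omitted here):

import redis

# Illustrative addresses: one entry per Sentinel pod.
SENTINELS = ["sentinel-0.example", "sentinel-1.example", "sentinel-2.example"]
EXPECTED_PEERS = len(SENTINELS) - 1  # each Sentinel should know only the others

for host in SENTINELS:
    s = redis.Redis(host=host, port=26379)
    peers = s.sentinel_sentinels("mymaster")  # Sentinels this node has registered
    if len(peers) != EXPECTED_PEERS:
        # SENTINEL RESET drops the stale node list; healthy Sentinels then
        # rediscover each other through the master.
        s.execute_command("SENTINEL", "RESET", "*")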

How to deploy a Redis Failover CRD inside a Kubernetes Cluster

To have our Redis Failover CRD running inside Kubernetes, we first need to deploy the operator itself as a Deployment:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: redisoperator
    component: app
  name: redisoperator
  namespace: tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redisoperator
      component: app
  template:
    metadata:
      labels:
        app: redisoperator
        component: app
    spec:
      containers:
        - image: quay.io/spotahome/redis-operator:0.1.0
          imagePullPolicy: IfNotPresent
          name: app
          resources:
            limits:
              cpu: 100m
              memory: 50Mi
            requests:
              cpu: 10m
              memory: 50Mi
      restartPolicy: Always

Once this manifest is applied (for example with kubectl apply -f), the operator will start managing all RedisFailover resources inside our Kubernetes cluster.

To deploy a RedisFailover, we need to create the following specification:

apiVersion: spotahome.com/v1alpha1
kind: RedisFailover
metadata:
  name: redisfailover
  namespace: test
spec:
  redis:
    replicas: 3
    resources:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 200m
        memory: 200Mi
    exporter: true
  sentinel:
    replicas: 3
    resources:
      limits:
        cpu: 200m
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 100Mi

With this, the operator will create all the required elements and check that the failover is ready and running.
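
Applications then connect through Sentinel rather than to a fixed Redis address. A minimal sketch with redis-py (the Sentinel service hostname and the master name “mymaster” are assumptions):

from redis.sentinel import Sentinel

sentinel = Sentinel([("rfs-redisfailover.test.svc", 26379)], socket_timeout=0.5)

master = sentinel.master_for("mymaster", socket_timeout=0.5)   # read-write
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)   # read-only

master.set("favorite:user42", "listing-123")  # writes always hit the master
print(replica.get("favorite:user42"))         # reads can be served by a slave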

Redis Failover specification options

The Redis Failover is configurable. This allows us to decide, for example, which version of Redis to deploy, or the number of Sentinel replicas:

apiVersion: spotahome.com/v1alpha1
kind: RedisFailover
metadata:
  name: myredisfailover
  namespace: mynamespace
spec:
  sentinel:
    replicas: 3 # Optional. Default value; can be set higher.
    resources: # Optional. If not set, no resources will be set on the created pods.
      requests:
        cpu: 100m # Optional
        memory: 100Mi # Optional
      limits:
        cpu: 200m # Optional
        memory: 200Mi # Optional
  redis:
    replicas: 3 # Optional. Default value; can be set higher.
    resources: # Optional. If not set, no resources will be set on the created pods.
      requests:
        cpu: 100m # Optional
        memory: 100Mi # Optional
      limits:
        cpu: 200m # Optional
        memory: 200Mi # Optional
    exporter: false # Optional. False by default. Adds a redis-exporter container to export metrics.

Participate in the project

As this is an open source project, you can collaborate by creating a pull request on our GitHub page or by opening an issue.
