Redis Operator for Kubernetes released

Julio Chana · Published in Spotahome Product · Dec 1, 2017

We, at Spotahome, are happy to announce that we’ve just released a new Redis Operator for Kubernetes. This project manages a Redis failover installation inside a Kubernetes cluster: it can deploy, maintain, heal, and delete all the pieces necessary to run Redis in high availability.

What is a Kubernetes Operator? See the CoreOS post introducing the concept.

This post explains why we needed to build it and how we did it.

Motivation

To save user favorites on our website, speed up navigation (by caching some database requests), and more, we needed a fast and reliable database.
The development team decided to use Redis as a cache/database, so we set out to create a high-availability deployment.

Steps followed

As a first step, we created a Helm chart that deployed a Redis failover with an HAProxy on top of it, redirecting requests only to the master node.

This approach had some advantages:

  • Highly fault tolerant
  • Single entry point (via the HAProxies, behind a service)
  • The deployments ensured that all the pieces would keep running (though separately)

But it also had drawbacks:

  • The bootstrap had to be done manually
  • The Redis and Sentinel nodes needed a custom startup script to find the rest of the nodes
  • The HAProxy added an unnecessary hop
  • From time to time, because of the Kubernetes pod life cycle, the Sentinel nodes lost track of the master and the whole cluster ended up in a split brain

Next, we removed the HAProxy to eliminate that extra hop, changing the way clients connected to the Redis master. This was not enough; the rest of the drawbacks were still there.

In the end, we decided to do things as they should have been done, with care and excellence. So we started developing a Kubernetes Custom Resource.

Kubernetes CRD Definition

The Redis Failover CRD must meet the following requirements:

  • Be able to bootstrap the application
  • If an error occurs, try to fix it
  • When deleted, clean up all the elements it created
  • Log operator events
  • Export Redis metrics

It also has to check the status of the Redis Failover elements (a sketch of these checks follows the list):

  • Only one node acts as master
  • The number of Redis nodes matches the one set in the Redis Failover specification
  • The number of Sentinels matches the one set in the Redis Failover specification
  • All Redis slaves have the same master
  • All Sentinels point to the same Redis master
  • Sentinel has no dead nodes registered
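
Below is a minimal sketch of these checks using the redis-py client. The real operator implements them in Go; the hostnames, the Sentinel port, and the master name “mymaster” are assumptions for illustration:

import redis

# Illustrative addresses: one entry per Redis / Sentinel pod.
REDIS_NODES = ["redis-0.example", "redis-1.example", "redis-2.example"]
SENTINEL_NODES = ["sentinel-0.example", "sentinel-1.example", "sentinel-2.example"]

def check_failover(expected_redis=3, expected_sentinels=3):
    # One INFO call per Redis node tells us its role and, for slaves,
    # which master it replicates from.
    infos = [redis.Redis(host=h).info("replication") for h in REDIS_NODES]
    assert len(infos) == expected_redis, "Redis count must match the spec"
    assert sum(i["role"] == "master" for i in infos) == 1, "exactly one master"

    slave_masters = {i["master_host"] for i in infos if i["role"] == "slave"}
    assert len(slave_masters) <= 1, "all slaves must share the same master"

    # Every Sentinel must agree on who the master is.
    sentinels = [redis.Redis(host=h, port=26379) for h in SENTINEL_NODES]
    assert len(sentinels) == expected_sentinels, "Sentinel count must match the spec"
    monitored = {s.sentinel_master("mymaster")["ip"] for s in sentinels}
    assert len(monitored) == 1, "all Sentinels must point to the same master"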

Redis Failover CRD bootstrap procedure

  1. Create a standalone Redis master with Sentinel, so that the other nodes can connect to it
  2. Create a Sentinel service that provides a single entry point to any Sentinel node, as all Sentinel nodes are equal (a sketch of this step follows the list)
  3. Create a Sentinel deployment that will watch the Redis nodes (at this moment only the bootstrap node is running)
  4. Create a Redis service to expose the Redis exporter metrics
  5. Create a Redis statefulset that brings up the Redis nodes one by one
  6. Delete the bootstrap pod
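
As an illustration of step 2, here is a minimal sketch of creating such a Sentinel service with the official Kubernetes Python client. The operator itself is written in Go, and the service name, labels, and namespace below are assumptions:

from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a pod

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="redisfailover-sentinel", namespace="test"),
    spec=client.V1ServiceSpec(
        # Any Sentinel pod can answer, so a plain selector-based Service is
        # enough to provide the single entry point.
        selector={"app": "redisfailover", "component": "sentinel"},
        ports=[client.V1ServicePort(name="sentinel", port=26379)],
    ),
)
client.CoreV1Api().create_namespaced_service(namespace="test", body=service)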

Redis Failover CRD liveness probe

Once the Redis Failover is deployed, the liveness of the nodes is controlled by two elements (a sketch follows the list):

  1. The Kubernetes deployments and statefulsets check that all the pods are running and working. If a pod fails, they destroy it and start it again
  2. The Sentinel nodes check that the Redis master is running. If it is not, they perform an automated failover, voting for a new master and promoting it from slave to master
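
For example, any Sentinel node can be asked which node is currently the master, and the answer stays correct across failovers. A minimal sketch with redis-py (the Sentinel service hostname and the master name “mymaster” are assumptions):

from redis.sentinel import Sentinel

# Any Sentinel can answer; here we go through the Sentinel service.
sentinel = Sentinel([("rfs-redisfailover.test.svc", 26379)], socket_timeout=0.5)

# After an automated failover this answer changes on its own: clients that
# keep asking Sentinel simply follow the newly promoted master.
host, port = sentinel.discover_master("mymaster")
print(f"current master: {host}:{port}")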

Because of the life cycle of a pod inside Kubernetes (pods are volatile: they can be deleted, recreated, and moved between host nodes), the CRD has to check that Sentinel is working properly, preventing it from causing a split brain. The underlying problem is that a Sentinel node has no way to deregister itself from the Sentinel cluster before dying, so the Sentinel node list grows without control.

If this happens, the CRD sends a RESET signal to every Sentinel node (as long as no failover is running at that moment).
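
A minimal sketch of that reset with redis-py (the addresses and the master name are assumptions, and the check that no failover is in progress is omitted here):

import redis

# Illustrative addresses: one entry per Sentinel pod.
SENTINELS = ["sentinel-0.example", "sentinel-1.example", "sentinel-2.example"]
EXPECTED_PEERS = len(SENTINELS) - 1  # each Sentinel should know only the others

for host in SENTINELS:
    s = redis.Redis(host=host, port=26379)
    peers = s.sentinel_sentinels("mymaster")  # Sentinels this node has registered
    if len(peers) != EXPECTED_PEERS:
        # SENTINEL RESET drops the stale node list; healthy Sentinels then
        # rediscover each other through the master.
        s.execute_command("SENTINEL", "RESET", "*")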

How to deploy a Redis Failover CRD inside a Kubernetes Cluster

To have our Redis Failover CRD running inside Kubernetes, we first need to deploy the operator itself as a Deployment:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: redisoperator
    component: app
  name: redisoperator
  namespace: tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redisoperator
      component: app
  template:
    metadata:
      labels:
        app: redisoperator
        component: app
    spec:
      containers:
        - image: quay.io/spotahome/redis-operator:0.1.0
          imagePullPolicy: IfNotPresent
          name: app
          resources:
            limits:
              cpu: 100m
              memory: 50Mi
            requests:
              cpu: 10m
              memory: 50Mi
      restartPolicy: Always

Once this manifest is applied (for example with kubectl apply -f), the operator will start managing all RedisFailover resources inside our Kubernetes cluster.

To deploy a RedisFailover, we need to create the following specification:

apiVersion: spotahome.com/v1alpha1
kind: RedisFailover
metadata:
  name: redisfailover
  namespace: test
spec:
  redis:
    replicas: 3
    resources:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 200m
        memory: 200Mi
    exporter: true
  sentinel:
    replicas: 3
    resources:
      limits:
        cpu: 200m
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 100Mi

With this, the operator will create all the required elements and check that the failover is ready and running.
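
Applications then connect through Sentinel rather than to a fixed Redis address. A minimal sketch with redis-py (the Sentinel service hostname and the master name “mymaster” are assumptions):

from redis.sentinel import Sentinel

sentinel = Sentinel([("rfs-redisfailover.test.svc", 26379)], socket_timeout=0.5)

master = sentinel.master_for("mymaster", socket_timeout=0.5)   # read-write
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)   # read-only

master.set("favorite:user42", "listing-123")  # writes always hit the master
print(replica.get("favorite:user42"))         # reads can be served by a slave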

Redis Failover specification options

The Redis Failover is configurable. This allows us to decide, for example, which version of Redis to deploy, or the number of Sentinel replicas:

apiVersion: spotahome.com/v1alpha1
kind: RedisFailover
metadata:
  name: myredisfailover
  namespace: mynamespace
spec:
  sentinel:
    replicas: 3 # Optional. Default value; can be set higher.
    resources: # Optional. If not set, no resources will be set on the created pods.
      requests:
        cpu: 100m # Optional
        memory: 100Mi # Optional
      limits:
        cpu: 200m # Optional
        memory: 200Mi # Optional
  redis:
    replicas: 3 # Optional. Default value; can be set higher.
    resources: # Optional. If not set, no resources will be set on the created pods.
      requests:
        cpu: 100m # Optional
        memory: 100Mi # Optional
      limits:
        cpu: 200m # Optional
        memory: 200Mi # Optional
    exporter: false # Optional. False by default. Adds a redis-exporter container to export metrics.

Participate in the project

As this is an open source project, you can collaborate by creating a pull request on our GitHub page or by opening an issue.
