Redis stands for REmote DIctionary Server. By default, Redis stores all data in memory. It's a key-structure database: keys map to rich data structures rather than plain values. redis-server is the actual datastore, and redis-cli is the command-line interface used to run any Redis command. By default, Redis binds to port 6379.
Starting the redis server
redis-server
While you can build a complete system using Redis only, I think most people will find that it supplements their more generic data solution - whether that be a traditional relational database, a document-oriented system, or something else. It’s the kind of solution you use to implement specific features.
Starting a redis container with persistent storage
docker run --name redis-test -d -v $(pwd)/data:/data redis:alpine redis-server --appendonly yes
Starting a redis container with a custom config
docker run --name redis-test -d -v $(pwd)/data:/data -v $(pwd)/config/redis.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
Connecting to the redis server using redis-cli
docker run -it --link redis-test:redis-server --rm redis:alpine redis-cli -h redis-server -p 6379
Redis exposes different data structures. Each one comes with a set of commands that run on the server in order to manipulate the data. This is very powerful: you don't have to read the value, change it in the client, and then send the altered value back to the server. You just tell the server what you want to do, and everything happens on the server, which is very performant. This is what sets Redis apart from other cache systems.
- Strings
- Lists
- Hashes
- Sets
- Sorted Sets
- Bitmaps
- HyperLogLog
A String can store any type of data: text, integers, floats, or binary data (video, image, or audio). A String value cannot exceed 512 MB.
use cases
- Cache mechanisms: SET, GET, MSET and MGET
- Cache with automatic expiration: SETEX, EXPIRE and EXPIREAT. Very useful for caching, for a short period, the results of DB queries that take a long time to run.
- Counting (e.g page views, likes, metrics): INCR, INCRBY, DECR, DECRBY and INCRBYFLOAT.
$ redis-cli
127.0.0.1:6379> MSET first "First Key value" second "Second Key value"
OK
127.0.0.1:6379> MGET first second
1) "First Key value"
2) "Second Key value"
127.0.0.1:6379> SET current_chapter "Chapter 1"
OK
127.0.0.1:6379> EXPIRE current_chapter 10
(integer) 1
127.0.0.1:6379> GET current_chapter
"Chapter 1"
127.0.0.1:6379> TTL current_chapter
(integer) 3
127.0.0.1:6379> SET counter 100
OK
127.0.0.1:6379> INCR counter
(integer) 101
127.0.0.1:6379> INCRBY counter 5
(integer) 106
127.0.0.1:6379> DECR counter
(integer) 105
127.0.0.1:6379> DECRBY counter 100
(integer) 5
Lists are linked lists; inserts and deletes at the beginning or the end run in constant time, O(1), meaning they don't depend on the length of the list. A List can be memory-optimized if it has fewer elements than list-max-ziplist-entries and each value is smaller in size than list-max-ziplist-value (bytes). The maximum number of entries is 2^32 - 1, more than 4 billion elements. List indices are zero-based and can be positive or negative.
use cases
- Event queue. e.g Kue.js
- Storing the most recent "something". e.g most recent user posts, news, user activity, etc.
LPUSH, RPUSH, LLEN, LINDEX, LRANGE, LPOP, RPOP and RPOPLPUSH
$ redis-cli
127.0.0.1:6379> LPUSH books "Clean Code"
(integer) 1
127.0.0.1:6379> RPUSH books "Code Complete"
(integer) 2
127.0.0.1:6379> LPUSH books "Peopleware"
(integer) 3
127.0.0.1:6379> LLEN books
(integer) 3
127.0.0.1:6379> LINDEX books 1
"Clean Code"
127.0.0.1:6379> LRANGE books 0 1
1) "Peopleware"
2) "Clean Code"
127.0.0.1:6379> LPOP books
"Peopleware"
127.0.0.1:6379> RPOP books
"Code Complete"
Hashes are a great data structure for storing objects because you can map fields to values. A Hash can be memory-optimized if it has fewer fields than hash-max-ziplist-entries and each value is smaller in size than hash-max-ziplist-value (bytes). Internally, a Hash can be a ziplist or a hash table. A ziplist is a doubly linked list designed to be memory-efficient: integers are stored as real integers rather than as sequences of characters. Although a ziplist has memory optimizations, lookups are not performed in constant time. A hash table, on the other hand, has constant-time lookups but is not memory-optimized.
$ redis-cli
127.0.0.1:6379> HSET movie "title" "The Godfather"
(integer) 1
127.0.0.1:6379> HMSET movie "year" 1972 "rating" 9.2 "watchers" 10000000
OK
127.0.0.1:6379> HINCRBY movie "watchers" 3
(integer) 10000003
127.0.0.1:6379> HGET movie "title"
"The Godfather"
127.0.0.1:6379> HMGET movie "title" "watchers"
1) "The Godfather"
2) "10000003"
127.0.0.1:6379> HDEL movie "watchers"
(integer) 1
127.0.0.1:6379> HGETALL movie
1) "title"
2) "The Godfather"
3) "year"
4) "1972"
5) "rating"
6) "9.2"
It is possible to retrieve only the field names or the field values of a Hash with the commands HKEYS and HVALS, respectively.
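Continuing the movie example above (after HDEL removed watchers); the output order shown assumes the small-hash ziplist encoding, which preserves insertion order:
$ redis-cli
127.0.0.1:6379> HKEYS movie
1) "title"
2) "year"
3) "rating"
127.0.0.1:6379> HVALS movie
1) "The Godfather"
2) "1972"
3) "9.2"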
A Set in Redis is an unordered collection of distinct Strings—it's not possible to add repeated elements to a Set. Internally, a Set is implemented as a hash table. The maximum number of elements that a Set can hold is 2^32 - 1, which means that there can be more than 4 billion elements per Set.
use cases
- Data filtering
- Data grouping
- Membership checking
$ redis-cli
127.0.0.1:6379> SADD user:max:favorite_artists "Arcade Fire" "Arctic Monkeys" "Belle & Sebastian" "Lenine"
(integer) 4
127.0.0.1:6379> SADD user:hugo:favorite_artists "Daft Punk" "The Kooks" "Arctic Monkeys"
(integer) 3
127.0.0.1:6379> SINTER user:max:favorite_artists user:hugo:favorite_artists
1) "Arctic Monkeys"
127.0.0.1:6379> SDIFF user:max:favorite_artists user:hugo:favorite_artists
1) "Belle & Sebastian"
2) "Arcade Fire"
3) "Lenine"
127.0.0.1:6379> SUNION user:max:favorite_artists user:hugo:favorite_artists
1) "Lenine"
2) "Daft Punk"
3) "Belle & Sebastian"
4) "Arctic Monkeys"
5) "Arcade Fire"
6) "The Kooks"
127.0.0.1:6379> SRANDMEMBER user:max:favorite_artists
"Arcade Fire"
127.0.0.1:6379> SRANDMEMBER user:max:favorite_artists
"Lenine"
127.0.0.1:6379> SISMEMBER user:max:favorite_artists "Arctic Monkeys"
(integer) 1
127.0.0.1:6379> SREM user:max:favorite_artists "Arctic Monkeys"
(integer) 1
127.0.0.1:6379> SISMEMBER user:max:favorite_artists "Arctic Monkeys"
(integer) 0
127.0.0.1:6379> SCARD user:max:favorite_artists
(integer) 3
127.0.0.1:6379> SMEMBERS user:max:favorite_artists
1) "Belle & Sebastian"
2) "Arcade Fire"
3) "Lenine"
A Sorted Set is a collection of nonrepeating Strings sorted by score. It is possible to have elements with repeated scores; in that case, the tied elements are ordered lexicographically (in alphabetical order).
use cases
- Build a real-time waiting list for customer service
- Show a leaderboard of a massive online game that displays the top players, users with similar scores, or the scores of your friends
- Build an autocomplete system using millions of words
$ redis-cli
127.0.0.1:6379> ZADD leaders 100 "Alice"
(integer) 1
127.0.0.1:6379> ZADD leaders 100 "Zed"
(integer) 1
127.0.0.1:6379> ZADD leaders 102 "Hugo"
(integer) 1
127.0.0.1:6379> ZADD leaders 101 "Max"
(integer) 1
There is a family of commands that can fetch ranges in a Sorted Set: ZRANGE, ZRANGEBYLEX, ZRANGEBYSCORE, ZREVRANGE, ZREVRANGEBYLEX, and ZREVRANGEBYSCORE.
- ZRANGE returns elements from the lowest to the highest score, and it uses ascending lexicographical order if a score tie exists
- ZREVRANGE returns elements from the highest to the lowest score, and it uses descending lexicographical order if a score tie exists
Both of these commands expect a key name, a start index, and an end index.
127.0.0.1:6379> ZREVRANGE leaders 0 -1
1) "Hugo"
2) "Max"
3) "Zed"
4) "Alice"
127.0.0.1:6379> ZREVRANGE leaders 0 -1 WITHSCORES
1) "Hugo"
2) "102"
3) "Max"
4) "101"
5) "Zed"
6) "100"
7) "Alice"
8) "100"
127.0.0.1:6379> ZREM leaders "Hugo"
(integer) 1
127.0.0.1:6379> ZSCORE leaders "Max"
"101"
127.0.0.1:6379> ZRANK leaders "Max"
(integer) 2
127.0.0.1:6379> ZREVRANK leaders "Max"
(integer) 0
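The BYSCORE variants take a minimum and a maximum score instead of indices. A short sketch continuing the leaders example above (after Hugo was removed, Alice and Zed have score 100 and Max has 101); a parenthesis makes a bound exclusive:
127.0.0.1:6379> ZRANGEBYSCORE leaders 100 101
1) "Alice"
2) "Zed"
3) "Max"
127.0.0.1:6379> ZRANGEBYSCORE leaders (100 +inf
1) "Max"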
A Bitmap is not a real data type in Redis. Under the hood, a Bitmap is a String. We can also say that a Bitmap is a set of bit operations on a String. A Bitmap is a sequence of bits where each bit can store 0 or 1. You can think of a Bitmap as an array of ones and zeroes. Bitmaps are memory efficient, support fast data lookups, and can store up to 2^32 bits (more than 4 billion bits).
use cases
Bitmaps are a great match for applications that involve real-time analytics, because they can tell whether a user performed an action (that is, "Did user X perform action Y today?") or how many times an event occurred (that is, "How many users performed action Y this week?"). Each user is identified by an ID, which is a sequential integer. Each Bitmap offset represents a user: user 1 is offset 1, user 30 is offset 30, and so on.
127.0.0.1:6379> SETBIT visits:2015-01-01 10 1
(integer) 0
127.0.0.1:6379> SETBIT visits:2015-01-01 15 1
(integer) 0
127.0.0.1:6379> SETBIT visits:2015-01-02 10 1
(integer) 0
127.0.0.1:6379> SETBIT visits:2015-01-02 11 1
(integer) 0
127.0.0.1:6379> GETBIT visits:2015-01-01 10
(integer) 1
127.0.0.1:6379> GETBIT visits:2015-01-02 15
(integer) 0
127.0.0.1:6379> BITCOUNT visits:2015-01-01
(integer) 2
Conceptually, a HyperLogLog is an algorithm that uses randomization in order to provide a very good approximation of the number of unique elements that exist in a Set. The Redis implementation of the HyperLogLog has a standard error of 0.81 percent.
use cases
- Counting the number of unique users who visited a website
- Counting the number of distinct terms that were searched for on your website on a specific date or time
- Counting the number of distinct hashtags that were used by a user
- Counting the number of distinct words that appear in a book
A HyperLogLog uses up to 12 kB to store 100,000 unique visits (or any cardinality). On the other hand, a Set uses 3.2 MB to store 100,000 UUIDs that are 32 bytes each.
$ redis-cli
127.0.0.1:6379> PFADD visits:2015-01-01 "carl" "max" "hugo" "arthur"
(integer) 1
127.0.0.1:6379> PFADD visits:2015-01-01 "max" "hugo"
(integer) 0
127.0.0.1:6379> PFADD visits:2015-01-02 "max" "kc" "hugo" "renata"
(integer) 1
127.0.0.1:6379> PFCOUNT visits:2015-01-01
(integer) 4
127.0.0.1:6379> PFCOUNT visits:2015-01-02
(integer) 4
127.0.0.1:6379> PFCOUNT visits:2015-01-01 visits:2015-01-02
(integer) 6
127.0.0.1:6379> PFMERGE visits:total visits:2015-01-01 visits:2015-01-02
OK
127.0.0.1:6379> PFCOUNT visits:total
(integer) 6
Pub/Sub stands for Publish-Subscribe, which is a pattern where messages are not sent directly to specific receivers. Publishers send messages to channels, and subscribers receive these messages if they are listening to a given channel.
The command PUBLISH sends a message to a Redis channel, and it returns the number of clients that received that message. A message gets lost if there are no clients subscribed to the channel when it comes in. The command SUBSCRIBE subscribes a client to one or many channels. The command UNSUBSCRIBE unsubscribes a client from one or many channels. The command PUBSUB introspects the state of the Redis Pub/Sub system. This command accepts three subcommands: CHANNELS, NUMSUB, and NUMPAT.
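A minimal sketch using two redis-cli sessions and a made-up channel name, news:
Terminal 1 (subscriber):
$ redis-cli
127.0.0.1:6379> SUBSCRIBE news
1) "subscribe"
2) "news"
3) (integer) 1
Terminal 2 (publisher):
$ redis-cli
127.0.0.1:6379> PUBLISH news "hello subscribers"
(integer) 1
Terminal 1 then receives:
1) "message"
2) "news"
3) "hello subscribers"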
A transaction in Redis is a sequence of commands executed in order and atomically. The command MULTI marks the beginning of a transaction, and the command EXEC marks its end. Any commands between MULTI and EXEC are serialized and executed as an atomic operation. Redis does not serve any other client in the middle of a transaction.
All commands in a transaction are queued in the client and are only sent to the server when the EXEC command is executed. It is possible to prevent a transaction from being executed by using the DISCARD command instead of EXEC.
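A minimal redis-cli sketch, using a made-up key name (tx:counter):
$ redis-cli
127.0.0.1:6379> MULTI
OK
127.0.0.1:6379> INCR tx:counter
QUEUED
127.0.0.1:6379> INCR tx:counter
QUEUED
127.0.0.1:6379> EXEC
1) (integer) 1
2) (integer) 2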
var redis = require("redis");
var client = redis.createClient();
function transfer(from, to, value, callback) {
  client.get(from, function(err, balance) { // read the current balance of the source account
    var multi = client.multi(); // start queueing a transaction
    multi.decrby(from, value); // queue the debit...
    multi.incrby(to, value); // ...and the credit; nothing is sent to the server yet
    if (balance >= value) {
      multi.exec(function(err, reply) { // send MULTI/DECRBY/INCRBY/EXEC atomically
        callback(null, reply[0]); // reply[0] is the new balance of the source account
      });
    } else {
      multi.discard(); // drop the queued commands
      callback(new Error("Insufficient funds"), null);
    }
  });
}
In Redis, a pipeline is a way to send multiple commands together to the Redis server without waiting for individual replies. Redis commands sent in a pipeline must be independent: they run sequentially on the server (the order is preserved), but they do not run as a transaction. Even though pipelines are neither transactional nor atomic (different Redis commands may be executed between the pipelined ones), they are still useful because they save a lot of network time, preventing the network from becoming a bottleneck, as it often does in heavily loaded applications.
best practices
- When sending many commands, it might be a good idea to use multiple pipelines rather than one big pipeline.
- It is a good idea to send transactions in a pipeline to avoid an extra round trip.
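For instance, a minimal sketch with the node_redis client used elsewhere in these notes; its batch() call queues commands client-side and sends them as one pipeline, without MULTI/EXEC transaction semantics (the key names are made up for illustration):
var redis = require("redis");
var client = redis.createClient();

var batch = client.batch(); // queue commands locally; nothing is sent yet
batch.set("pipeline:key1", "value1");
batch.incr("pipeline:counter");
batch.get("pipeline:key1");
batch.exec(function(err, replies) { // all queued commands go out in one round trip
  console.log(replies); // e.g. [ 'OK', 1, 'value1' ]
  client.quit();
});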
Redis 2.6 introduced the scripting feature, and the language that was chosen to extend Redis was Lua. Lua scripts are executed atomically, which means that the Redis server is blocked during script execution. Because of this, Redis has a default timeout of 5 seconds to run any script, although this value can be changed through the configuration directive lua-time-limit.
Ideally, scripts should be simple, have a single responsibility, and run fast. It is possible to pass Redis key names and parameters to a Lua script, and they will be available inside the script through the variables KEYS and ARGV, respectively.
There are two commands for running Lua scripts: EVAL and EVALSHA. The next example uses EVAL; its syntax is the following:
EVAL script numkeys key [key ...] arg [arg ...]
var redis = require("redis");
var client = redis.createClient();
client.set("mykey", "myvalue"); // create a key for the script to read
var luaScript = 'return redis.call("GET", KEYS[1])'; // the script reads its first key argument
client.eval(luaScript, 1, "mykey", function(err, reply) { // EVAL script numkeys key
  console.log(reply); // prints "myvalue"
  client.quit();
});
best practices
- Avoid using hardcoded key names inside a Lua script; pass all key names as parameters to the commands EVAL/EVALSHA.
- Many Redis users have replaced their transactional code in the form of WATCH/MULTI/EXEC with Lua scripts.
- In order to make scripts play nicely with Redis replication, you should write scripts that do not change Redis keys in non-deterministic ways (that is, do not use random values). Well-written scripts behave the same way when they are re-executed with the same data.
The command SCRIPT LOAD caches a Lua script and returns an identifier (which is the SHA1 hash of the script). The command EVALSHA executes a Lua script based on an identifier returned by SCRIPT LOAD.
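A short redis-cli sketch; <sha1-of-the-script> stands in for the real hash, which depends on the script text:
$ redis-cli
127.0.0.1:6379> SCRIPT LOAD "return 1"
"<sha1-of-the-script>"
127.0.0.1:6379> EVALSHA <sha1-of-the-script> 0
(integer) 1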
- The INFO command returns all Redis server statistics
- The DBSIZE command returns the number of existing keys in a Redis server
- The DEBUG SEGFAULT command crashes the Redis server process by performing an invalid memory access
- The command MONITOR shows all the commands processed by the Redis server in real time. MONITOR could reduce Redis's throughput by over 50%.
- The CLIENT LIST command returns a list of all clients connected to the server
- The CLIENT SETNAME command changes a client name; it is only useful for debugging purposes.
- The CLIENT KILL command terminates a client connection
- The FLUSHALL command deletes all keys from Redis
- The command RANDOMKEY returns a random existing key name
- The PERSIST command removes the existing timeout of a given key
- The EXISTS command returns 1 if a certain key exists and 0 if it does not
- The PING command returns the string "PONG". It is useful for testing a server/client connection and verifying that Redis is able to exchange data
- The AUTH command is used to authorize a client to connect to Redis.
- The SCRIPT KILL command terminates the running Lua script if no write operations have been performed by the script. If the script has performed any write operations, the SCRIPT KILL command will not be able to terminate it; in that case, the SHUTDOWN NOSAVE command must be executed.
- The SHUTDOWN command stops all clients, causes data to persist if enabled, and shuts down the Redis server
- The OBJECT ENCODING command returns the encoding used by a given key
In Redis, all data types can use different encodings to save memory or improve performance. For instance, a String that has only digits (for example, 12345) uses less memory than a string of letters (for example, abcde) because they use different encodings. Data types will use different encodings based on thresholds defined in the Redis server configuration.
If you have a large dataset and need to optimize for memory, tweak these configurations until you find a good trade-off between memory and performance.
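For example, with redis-cli (the encoding names assume a reasonably recent Redis, where short digit-only Strings are stored as int and other short Strings as embstr):
$ redis-cli
127.0.0.1:6379> SET digits 12345
OK
127.0.0.1:6379> OBJECT ENCODING digits
"int"
127.0.0.1:6379> SET letters "abcde"
OK
127.0.0.1:6379> OBJECT ENCODING letters
"embstr"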
Redis was designed to be used in a trusted private network. It supports a very basic security system that protects the connection between client and server via a plain-text password. Redis does not implement an Access Control List (ACL), so it is not possible to have users with different permission levels.
The authentication feature can be enabled through the configuration directive requirepass. Choose a complex password of at least 64 characters. The command AUTH authenticates a Redis client.
$ redis-cli
127.0.0.1:6379> SET hello world
(error) NOAUTH Authentication required.
127.0.0.1:6379> AUTH a7f$f35eceb7e@3edd502D892f5885007869dd2f80434Fed5b4!fac0057f51fM
OK
127.0.0.1:6379> SET hello world
OK
Another interesting technique is obfuscating or disabling critical commands, such as FLUSHDB, FLUSHALL, CONFIG, KEYS, DEBUG, and SAVE. To disable a command, set its new name to an empty string. It is good practice to create a configuration file called rename-commands.conf for organization purposes, and to use the include directive in redis.conf to include the rename-commands.conf file.
rename-command FLUSHDB e0cc96ad2eab73c2c347011806a76b73
rename-command FLUSHALL a31907b21c437f46808ea49322c91d23a
rename-command CONFIG ""
rename-command KEYS ""
rename-command DEBUG ""
rename-command SAVE ""
Add the following to redis.conf and then restart the redis-server:
include /path/to/config/rename-commands.conf
$ redis-cli
127.0.0.1:6379> SAVE
(error) ERR unknown command 'SAVE'
127.0.0.1:6379> FLUSHALL
(error) ERR unknown command 'FLUSHALL'
127.0.0.1:6379> a31907b21c437f46808ea49322c91d23a
OK
There are many ways to make Redis secure, such as the following:
- Use firewall rules to block access from unknown clients
- Run Redis on the loopback interface rather than a publicly accessible network interface: bind Redis to 127.0.0.1
- Run Redis in a virtual private cloud instead of the public Internet
- Encrypt client-to-server communication, using a tool such as stunnel
If a Redis instance is shut down, crashes, or needs to be rebooted, all of the stored data is lost. To solve this problem, Redis provides two mechanisms to deal with persistence: Redis Database (RDB) and Append-only File (AOF). The two mechanisms can be used separately or simultaneously in the same Redis instance.
Recommended reading: Redis persistence demystified
A .rdb file is a binary file that represents a point-in-time snapshot of the data stored in a Redis instance. The RDB file format is optimized for fast reads and writes. To achieve the necessary performance, the internal representation of a .rdb file on disk is very similar to Redis's in-memory representation. A single RDB file is sufficient to restore a Redis instance completely.
RDB is great for backups and disaster recovery because it allows you to save an RDB file every hour, day, week, or month, depending on your needs.
The command SAVE creates an RDB immediately, but it should be avoided because it blocks the Redis server during snapshot creation. The command BGSAVE (background save) should be used instead; it has the same effect as SAVE, but it runs in a child process so as not to block Redis.
Redis creates snapshots based on the save directive, which combines two conditions: if at least Y write operations happen within X seconds, Redis creates a .rdb file. The RDB filename is based on the directive dbfilename (this defaults to dump.rdb). It is not recommended to use save directives less than 30 seconds apart from each other. RDB is not a 100% guaranteed data-recovery approach.
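A sketch of save directives in redis.conf (these thresholds are the long-standing defaults shipped with Redis):
# snapshot if at least 1 write happened within 900 seconds
save 900 1
# snapshot if at least 10 writes happened within 300 seconds
save 300 10
# snapshot if at least 10000 writes happened within 60 seconds
save 60 10000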
Another downside of RDB is that every time you create a snapshot, the Redis main process executes a fork() to create a child process that persists the data to disk. This can make your Redis instance stop serving clients for milliseconds, sometimes even a few seconds, depending on the hardware and the size of the dataset.
When AOF is enabled, every time Redis receives a command that changes the dataset, it appends that command to the AOF (Append-only File). So if AOF is enabled and Redis is restarted, it restores the data by executing all commands listed in the AOF, preserving their order, and rebuilds the state of the dataset. The AOF is a "human-readable" append-only log file. There is a tool called redis-check-aof that checks and fixes AOF files easily.
These are the most important directives in the Redis configuration for AOF:
- appendonly: enables or disables the AOF
- appendfsync: controls when data is flushed to disk; the options are no, always, and everysec
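A minimal redis.conf sketch enabling AOF (everysec is a common compromise between durability and performance):
# enable the append-only file
appendonly yes
# fsync once per second: at most one second of writes can be lost
appendfsync everysec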
Note: Restoring data from an RDB is faster than AOF when recovering a big dataset. This is because an RDB does not need to re-execute every change made in the entire database; it only needs to load the data that was previously stored.
Replication means that while you write to a Redis instance (usually referred to as the master), it will ensure that one or more instances (usually referred to as the slaves) become exact copies of the master.
There are three ways of making a Redis server instance a slave:
- Add the directive slaveof IP PORT to the configuration file and start a Redis server using this configuration
- Use the redis-server command-line option --slaveof IP PORT, as in the example below
- Use the command SLAVEOF IP PORT
$ redis-server --port 5555
$ redis-server --port 6666 --slaveof 127.0.0.1 5555
$ redis-server --port 7777 --slaveof 127.0.0.1 5555
$ redis-cli -p 5555 SET testkey testvalue
OK
$ redis-cli -p 6666 GET testkey
"testvalue"
Replicas are widely used for scalability purposes so that all read operations are handled by replicas and the master handles only write operations.
Data redundancy is another reason for having multiple replicas.
Persistence can be moved to the replicas so that the master does not perform disk I/O operations. In this scenario, the master server needs to disable persistence, and it should not restart automatically for any reason; otherwise, it will restart with an empty dataset and replicate it to the replicas, making them delete all of their stored data.
It is possible to improve data-consistency guarantees by requiring a minimum number of replicas connected to the master server (min-slaves-to-write).
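A redis.conf sketch (both directives exist in the standard distribution; the numbers are illustrative):
# reject writes unless at least 2 slaves are connected...
min-slaves-to-write 2
# ...and have acknowledged replication within the last 10 seconds
min-slaves-max-lag 10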
Replicas are very useful in a master failure scenario because they contain all of the most recent data and can be promoted to master. Unfortunately, when Redis is running in single-instance mode, there is no automatic failover to promote a slave to master. All replicas and clients connected to the old master need to be reconfigured with the new master. The automatic failover feature is the core of Redis Sentinel.
The command SLAVEOF NO ONE converts a slave into a master instance, and it should be used in a failover scenario.
$ redis-cli -p 5555 DEBUG SEGFAULT
$ redis-cli -p 6666 SLAVEOF NO ONE
$ redis-cli -p 7777 SLAVEOF 127.0.0.1 6666
In the previous scenario, all clients that were connected to 127.0.0.1:5555 need to be reconfigured to connect to 127.0.0.1:6666.
docker run --name redis-master -d redis:alpine redis-server
Starting 2 slave instances pointing to the master instance. By default, slave instances are read-only.
docker run --name redis-slave-1 --link redis-master:redis-master -d -v $(pwd)/data/slave-1:/data redis:alpine redis-server --appendonly yes --slaveof redis-master 6379
docker run --name redis-slave-2 --link redis-master:redis-master -d -v $(pwd)/data/slave-2:/data redis:alpine redis-server --appendonly yes --slaveof redis-master 6379
Connect to the master and make some changes to the dataset
docker run -it --link redis-master:redis-master --rm redis:alpine redis-cli -h redis-master -p 6379
Connect to the slave instances and double-check that you can read the keys you created on the master instance
docker run -it --link redis-slave-1:redis-slave-1 --rm redis:alpine redis-cli -h redis-slave-1 -p 6379
docker run -it --link redis-slave-2:redis-slave-2 --rm redis:alpine redis-cli -h redis-slave-2 -p 6379
In a master failure scenario, a slave instance can be promoted to master. All clients should then connect to the new master instance. Execute this on a slave instance:
slaveof no one
Redis Sentinel is a distributed system designed to automatically promote a Redis slave to master if the existing master fails. Run one Sentinel for each Redis server; Sentinel listens on its own port and runs as a separate process.
A client always connects to a Redis instance, but it needs to query a Sentinel to find out which Redis instance it should connect to. Communication between all Sentinels takes place through a Pub/Sub channel called __sentinel__:hello in the Redis master.
Partitioning is a general term used to describe the act of breaking up data and distributing it across different hosts. There are two types of partitioning: horizontal partitioning, where keys are distributed across different servers (also known as sharding), and vertical partitioning. Partitioning is performed in a cluster of hosts when better performance, maintainability, or availability is desired.
This is useful for cases where:
- The total data to be stored is larger than the total memory available in a Redis server
- The network bandwidth is not enough to handle all of the traffic
Partitioning types:
- Range. Data is distributed based on a range of keys.
- Hash. The instance to send a command to is found by applying a hash function to the Redis key.
- Consistent hashing. Consistent hashing, in our context, is a kind of hashing that remaps only a small portion of the data to different servers when the list of Redis servers is changed (only K/n keys are remapped, where K is the number of keys and n is the number of servers). The technique consists of creating multiple points in a circle for each Redis key and server. The appropriate server for a given key is the closest server to that key in the circle (clockwise); this circle is also referred to as "ring." The points are created using a hash function, such as MD5.
Different ways to implement partitioning:
- The client layer. Your own implementation.
- The proxy layer. An extra layer that proxies all Redis queries and performs partitioning for applications. e.g twemproxy
- The query router layer. Implemented in the data store itself. e.g Redis Cluster
Key tags are a technique for ensuring that related keys are stored on the same server. The convention is to add a tag to the key name, with the tag name inside curly braces.
users:1{users}
users:3{users}
Redis Cluster was designed to automatically shard data across different Redis instances and to perform automatic failover if any problem happens to any master instance. It uses two ports: a lower one for client connections and a higher one for node-to-node communication.
It requires at least 3 master instances. It's recommended that you have at least one replica per master.
When connecting to a Redis cluster using redis-cli, the -c parameter is required to enable cluster mode.
redis-cli -c -h <hostname or IP> -p <port-number>
The data-partitioning method used is called hash slot. Each master in a cluster owns a portion of the 16384 slots; a master without any slots won't be able to store any data. You need to manually assign a number of slots to each master.
HASH_SLOT = CRC16(key) mod 16384
Hash tags are used when applying the hash function to ensure that different key names end up in the same hash slot. In the following example, all keys would be stored in the same slot, based on the hash tag {user123}.
SADD {user123}:friends:usa "John" "Bob"
SADD {user123}:friends:brazil "Max" "Hugo"
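You can verify this with CLUSTER KEYSLOT: only the text inside the braces is hashed, so both keys map to the same slot. <slot> stands in for the actual slot number, which depends on CRC16 of user123:
127.0.0.1:6379> CLUSTER KEYSLOT {user123}:friends:usa
(integer) <slot>
127.0.0.1:6379> CLUSTER KEYSLOT {user123}:friends:brazil
(integer) <slot>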
Since the redis instances need to be able to connect to each other, we should create a docker network they can join
docker network create redis-cluster-network
Creating 3 redis instances in cluster mode
docker run --name redis-master-1 --network redis-cluster-network -d -v $(pwd)/data/master-1:/data -v $(pwd)/config/redis-cluster-master-1.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
docker run --name redis-master-2 --network redis-cluster-network -d -v $(pwd)/data/master-2:/data -v $(pwd)/config/redis-cluster-master-2.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
docker run --name redis-master-3 --network redis-cluster-network -d -v $(pwd)/data/master-3:/data -v $(pwd)/config/redis-cluster-master-3.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
Listing all redis master nodes
docker container ps
Connecting to a master node and getting information about the cluster. It should report the cluster state as fail, since we're not done setting up the cluster.
docker container exec -it redis-master-3 /bin/sh
redis-cli -c
cluster info
Next, we should distribute the 16384 slots evenly across the 3 Redis instances. The cluster addslots command informs a node which slots it should own.
Note: {0..5460} below is Bash brace expansion. If you need to install bash on Alpine Linux, do the following:
apk update
apk add bash
bash
Assigning the slots each Redis instance should own. Slots are where keys are stored, based on each key's hash. To allow the Redis cluster to start in a safe way, we also manually change the configuration epoch. Note: don't do this again; this is the only time you need to change the configuration epoch manually.
docker container exec -it redis-master-1 /bin/sh
redis-cli -c cluster addslots {0..5460}
redis-cli -c cluster set-config-epoch 1
docker container exec -it redis-master-2 /bin/sh
redis-cli -c cluster addslots {5461..10922}
redis-cli -c cluster set-config-epoch 2
docker container exec -it redis-master-3 /bin/sh
redis-cli -c cluster addslots {10923..16383}
redis-cli -c cluster set-config-epoch 3
Making all Redis instances aware of each other so they can exchange information, e.g on redis-master-1 execute:
redis-cli -c cluster meet <redis-master-2 IP> 6379
redis-cli -c cluster meet <redis-master-3 IP> 6379
Double-checking the cluster is up and running:
redis-cli -c cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:3
cluster_size:3
cluster_current_epoch:3
cluster_my_epoch:1
cluster_stats_messages_sent:191
cluster_stats_messages_received:191
Adding replicas to the master Redis instances. So far, we have 3 Redis masters but no slaves. We should have at least one slave per master, and having one or two extra slaves above the minimum required (cluster-migration-barrier) is recommended.
- Create a new Redis instance in cluster mode
docker run --name redis-slave-1 --network redis-cluster-network -d -v $(pwd)/data/slave-1:/data -v $(pwd)/config/redis-cluster-slave-1.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
docker run --name redis-slave-2 --network redis-cluster-network -d -v $(pwd)/data/slave-2:/data -v $(pwd)/config/redis-cluster-slave-2.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
docker run --name redis-slave-3 --network redis-cluster-network -d -v $(pwd)/data/slave-3:/data -v $(pwd)/config/redis-cluster-slave-3.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
- Add the new Redis instances to the cluster using cluster meet
docker container exec -it redis-slave-1 /bin/sh
redis-cli -c cluster meet 172.19.0.2 6379
docker container exec -it redis-slave-2 /bin/sh
redis-cli -c cluster meet 172.19.0.2 6379
docker container exec -it redis-slave-3 /bin/sh
redis-cli -c cluster meet 172.19.0.2 6379
- Get the node ID of the master that will be replicated. cluster nodes outputs a list of all the nodes that belong to the cluster, along with their properties; the node ID is the first string displayed in each row.
redis-cli -c cluster nodes
- Start the replication by using the command cluster replicate <master-node-id>
-- Slave 1
redis-cli -c cluster replicate 7e78c9a76ee462350a064694683fae266b1afc3a
-- Slave 2
redis-cli -c cluster replicate 2eb1abc6c8ad9a98333eeb1dafe088748ecf97d5
-- Slave 3
redis-cli -c cluster replicate b749483152945869cdd062cb29a0f780b6f0ce29
Now that the cluster is up and running, let's add a key for testing's sake:
- Connect to any Redis instance in the cluster
- Create a key, e.g set cebroker:dev:test-cluster "Yay!". The reply shows which Redis master the key was stored on.
- Connect to the replica of that master and try to get the newly created key, e.g get cebroker:dev:test-cluster
redis-cli -c
set cebroker:dev:test-cluster "Yay!"
get cebroker:dev:test-cluster
- Use benchmarks to decide which data type works best for your case: FLUSHALL, then create your keys, then compare INFO memory.
- Instead of using multiple Redis DBs, run multiple Redis servers. Since Redis is single-threaded, a Redis server with multiple DBs will still only use one CPU.
- Use namespaces for your keys. e.g namespace:key-name, music-online:album:10001:songs
- There is a Linux kernel parameter called swappiness that controls when the operating system starts using swap space. Use a swappiness of 0 when your data always fits into RAM, and 1 when you are not sure:
sysctl -w vm.swappiness=0
To persist it across reboots, add vm.swappiness=0 to /etc/sysctl.conf.
- The Redis server needs enough memory to perform backups if any strategy is enabled. In the worst-case scenario, redis-server may double the used memory during a backup. There is a configuration directive called maxmemory that limits the amount of memory Redis is allowed to use (in bytes); see the sketch after this list. Redis should not use more than 50 percent of the available memory when any backup strategy is enabled. Make sure that you set up alarms for Redis memory usage.
- Choose an appropriate persistence strategy. If your application doesn't need persistence, disable RDB and AOF. If your application tolerates some data loss, use RDB. If it requires fully durable persistence, use both RDB and AOF.
- Enable authentication. e.g requirepass password-in-plain-text
- Disable critical commands, e.g FLUSHDB, FLUSHALL, CONFIG, KEYS, DEBUG, and SAVE, by including a rename-commands.conf in the redis.conf file.
- Encrypt client-to-server communication using stunnel.
- Let all read operations be handled by slave instances and all write operations by the master instance.
- Persistence can be moved to the slaves so that the master doesn't have to write to disk. In that setup, don't restart the master; otherwise it will lose all its data and replicate its empty dataset to the slaves.
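A redis.conf sketch combining a few of the directives above (the values are illustrative, not recommendations for a specific workload):
# cap memory usage; leave headroom for backups
maxmemory 2gb
# enable AUTH (plain text, so choose a long, complex value)
requirepass password-in-plain-text
# durable persistence via the append-only file
appendonly yes
# disable/obfuscate critical commands
include /path/to/config/rename-commands.conf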