Redis stands for REmote DIctionary Server. By default, Redis stores all data in memory. It's a key-structure database: keys map to rich data structures rather than plain values. redis-server is the actual datastore, and redis-cli is the command-line interface used to run any Redis command. By default, Redis binds to port 6379.
Starting the redis server
redis-server
While you can build a complete system using Redis only, I think most people will find that it supplements their more generic data solution - whether that be a traditional relational database, a document-oriented system, or something else. It’s the kind of solution you use to implement specific features.
Starting a redis container with persistent storage
docker run --name redis-test -d -v $(pwd)/data:/data redis:alpine redis-server --appendonly yes
Starting a redis container with a custom config
docker run --name redis-test -d -v $(pwd)/data:/data -v $(pwd)/config/redis.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
Connecting to the redis server using redis-cli
docker run -it --link redis-test:redis-server --rm redis:alpine redis-cli -h redis-server -p 6379
Redis exposes different data structures. Each one comes with a set of commands that run on the server in order to manipulate the data. This is very powerful: you don't have to read the value, change it in the client, and then send the altered value back to the server. You just tell the server what you want to do, and everything happens on the server, which is very performant. This is what sets Redis apart from other cache systems.
- Strings
- Lists
- Hashes
- Sets
- Sorted Sets
- Bitmaps
- HyperLogLog
A String can store any type of data: text, integers, floats, or binary data (video, image, or audio). A String value cannot exceed 512 MB.
use cases
- Cache mechanisms: SET, GET, MSET and MGET
- Cache with automatic expiration: SETEX, EXPIRE and EXPIREAT. Very useful for caching, for a short period, the results of DB queries that take a long time to run.
- Counting (e.g page views, likes, metrics): INCR, INCRBY, DECR, DECRBY and INCRBYFLOAT.
$ redis-cli
127.0.0.1:6379> MSET first "First Key value" second "Second Key value"
OK
127.0.0.1:6379> MGET first second
1) "First Key value"
2) "Second Key value"
127.0.0.1:6379> SET current_chapter "Chapter 1"
OK
127.0.0.1:6379> EXPIRE current_chapter 10
(integer) 1
127.0.0.1:6379> GET current_chapter
"Chapter 1"
127.0.0.1:6379> TTL current_chapter
(integer) 3
127.0.0.1:6379> SET counter 100
OK
127.0.0.1:6379> INCR counter
(integer) 101
127.0.0.1:6379> INCRBY counter 5
(integer) 106
127.0.0.1:6379> DECR counter
(integer) 105
127.0.0.1:6379> DECRBY counter 100
(integer) 5
Lists are linked lists; inserts and deletes at the beginning or the end run in constant time, O(1), meaning they don't depend on the length of the list. A List can be memory-optimized if it has fewer elements than list-max-ziplist-entries and each value is smaller in size than list-max-ziplist-value (bytes). The maximum number of entries is 2^32 - 1, more than 4 billion elements. List indices are zero-based and can be positive or negative.
use cases
- Event queue. e.g Kue.js
- Storing the most recent "something". e.g most recent user posts, news, user activity, etc.
LPUSH, RPUSH, LLEN, LINDEX, LRANGE, LPOP, RPOP and RPOPLPUSH
$ redis-cli
127.0.0.1:6379> LPUSH books "Clean Code"
(integer) 1
127.0.0.1:6379> RPUSH books "Code Complete"
(integer) 2
127.0.0.1:6379> LPUSH books "Peopleware"
(integer) 3
127.0.0.1:6379> LLEN books
(integer) 3
127.0.0.1:6379> LINDEX books 1
"Clean Code"
127.0.0.1:6379> LRANGE books 0 1
1) "Peopleware"
2) "Clean Code"
127.0.0.1:6379> LPOP books
"Peopleware"
127.0.0.1:6379> RPOP books
"Code Complete"
Hashes are a great data structure for storing objects because you can map fields to values. A Hash can be memory-optimized if it has fewer fields than hash-max-ziplist-entries and each value is smaller in size than hash-max-ziplist-value (bytes). Internally, a Hash can be a ziplist or a hash table. A ziplist is a doubly linked list designed to be memory-efficient: integers are stored as real integers rather than as sequences of characters. Although a ziplist has memory optimizations, lookups are not performed in constant time. A hash table, on the other hand, has constant-time lookups but is not memory-optimized.
$ redis-cli
127.0.0.1:6379> HSET movie "title" "The Godfather"
(integer) 1
127.0.0.1:6379> HMSET movie "year" 1972 "rating" 9.2 "watchers" 10000000
OK
127.0.0.1:6379> HINCRBY movie "watchers" 3
(integer) 10000003
127.0.0.1:6379> HGET movie "title"
"The Godfather"
127.0.0.1:6379> HMGET movie "title" "watchers"
1) "The Godfather"
2) "10000003"
127.0.0.1:6379> HDEL movie "watchers"
(integer) 1
127.0.0.1:6379> HGETALL movie
1) "title"
2) "The Godfather"
3) "year"
4) "1972"
5) "rating"
6) "9.2"
It is possible to retrieve only the field names or the field values of a Hash with the commands HKEYS and HVALS, respectively.
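Continuing the movie example above (after HDEL removed watchers); the output order shown assumes the small-hash ziplist encoding, which preserves insertion order:
$ redis-cli
127.0.0.1:6379> HKEYS movie
1) "title"
2) "year"
3) "rating"
127.0.0.1:6379> HVALS movie
1) "The Godfather"
2) "1972"
3) "9.2"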
A Set in Redis is an unordered collection of distinct Strings—it's not possible to add repeated elements to a Set. Internally, a Set is implemented as a hash table. The maximum number of elements that a Set can hold is 2^32 - 1, which means that there can be more than 4 billion elements per Set.
use cases
- Data filtering
- Data grouping
- Membership checking
$ redis-cli
127.0.0.1:6379> SADD user:max:favorite_artists "Arcade Fire" "Arctic Monkeys" "Belle & Sebastian" "Lenine"
(integer) 4
127.0.0.1:6379> SADD user:hugo:favorite_artists "Daft Punk" "The Kooks" "Arctic Monkeys"
(integer) 3
127.0.0.1:6379> SINTER user:max:favorite_artists user:hugo:favorite_artists
1) "Arctic Monkeys"
127.0.0.1:6379> SDIFF user:max:favorite_artists user:hugo:favorite_artists
1) "Belle & Sebastian"
2) "Arcade Fire"
3) "Lenine"
127.0.0.1:6379> SUNION user:max:favorite_artists user:hugo:favorite_artists
1) "Lenine"
2) "Daft Punk"
3) "Belle & Sebastian"
4) "Arctic Monkeys"
5) "Arcade Fire"
6) "The Kooks"
127.0.0.1:6379> SRANDMEMBER user:max:favorite_artists
"Arcade Fire"
127.0.0.1:6379> SRANDMEMBER user:max:favorite_artists
"Lenine"
127.0.0.1:6379> SISMEMBER user:max:favorite_artists "Arctic Monkeys"
(integer) 1
127.0.0.1:6379> SREM user:max:favorite_artists "Arctic Monkeys"
(integer) 1
127.0.0.1:6379> SISMEMBER user:max:favorite_artists "Arctic Monkeys"
(integer) 0
127.0.0.1:6379> SCARD user:max:favorite_artists
(integer) 3
127.0.0.1:6379> SMEMBERS user:max:favorite_artists
1) "Belle & Sebastian"
2) "Arcade Fire"
3) "Lenine"
A Sorted Set is a collection of nonrepeating Strings sorted by score. It is possible to have elements with repeated scores; in that case, the tied elements are ordered lexicographically (in alphabetical order).
use cases
- Build a real-time waiting list for customer service
- Show a leaderboard of a massive online game that displays the top players, users with similar scores, or the scores of your friends
- Build an autocomplete system using millions of words
$ redis-cli
127.0.0.1:6379> ZADD leaders 100 "Alice"
(integer) 1
127.0.0.1:6379> ZADD leaders 100 "Zed"
(integer) 1
127.0.0.1:6379> ZADD leaders 102 "Hugo"
(integer) 1
127.0.0.1:6379> ZADD leaders 101 "Max"
(integer) 1
There is a family of commands that can fetch ranges in a Sorted Set: ZRANGE, ZRANGEBYLEX, ZRANGEBYSCORE, ZREVRANGE, ZREVRANGEBYLEX, and ZREVRANGEBYSCORE.
- ZRANGE returns elements from the lowest to the highest score, and it uses ascending lexicographical order if a score tie exists
- ZREVRANGE returns elements from the highest to the lowest score, and it uses descending lexicographical order if a score tie exists
Both of these commands expect a key name, a start index, and an end index.
127.0.0.1:6379> ZREVRANGE leaders 0 -1
1) "Hugo"
2) "Max"
3) "Zed"
4) "Alice"
127.0.0.1:6379> ZREVRANGE leaders 0 -1 WITHSCORES
1) "Hugo"
2) "102"
3) "Max"
4) "101"
5) "Zed"
6) "100"
7) "Alice"
8) "100"
127.0.0.1:6379> ZREM leaders "Hugo"
(integer) 1
127.0.0.1:6379> ZSCORE leaders "Max"
"101"
127.0.0.1:6379> ZRANK leaders "Max"
(integer) 2
127.0.0.1:6379> ZREVRANK leaders "Max"
(integer) 0
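The BYSCORE variants take a minimum and a maximum score instead of indices. A short sketch continuing the leaders example above (after Hugo was removed, Alice and Zed have score 100 and Max has 101); a parenthesis makes a bound exclusive:
127.0.0.1:6379> ZRANGEBYSCORE leaders 100 101
1) "Alice"
2) "Zed"
3) "Max"
127.0.0.1:6379> ZRANGEBYSCORE leaders (100 +inf
1) "Max"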
A Bitmap is not a real data type in Redis. Under the hood, a Bitmap is a String. We can also say that a Bitmap is a set of bit operations on a String. A Bitmap is a sequence of bits where each bit can store 0 or 1. You can think of a Bitmap as an array of ones and zeroes. Bitmaps are memory efficient, support fast data lookups, and can store up to 2^32 bits (more than 4 billion bits).
use cases
Bitmaps are a great match for applications that involve real-time analytics, because they can tell whether a user performed an action (that is, "Did user X perform action Y today?") or how many times an event occurred (that is, "How many users performed action Y this week?"). Each user is identified by an ID, which is a sequential integer. Each Bitmap offset represents a user: user 1 is offset 1, user 30 is offset 30, and so on.
127.0.0.1:6379> SETBIT visits:2015-01-01 10 1
(integer) 0
127.0.0.1:6379> SETBIT visits:2015-01-01 15 1
(integer) 0
127.0.0.1:6379> SETBIT visits:2015-01-02 10 1
(integer) 0
127.0.0.1:6379> SETBIT visits:2015-01-02 11 1
(integer) 0
127.0.0.1:6379> GETBIT visits:2015-01-01 10
(integer) 1
127.0.0.1:6379> GETBIT visits:2015-01-02 15
(integer) 0
127.0.0.1:6379> BITCOUNT visits:2015-01-01
(integer) 2
Conceptually, a HyperLogLog is an algorithm that uses randomization in order to provide a very good approximation of the number of unique elements that exist in a Set. The Redis implementation of the HyperLogLog has a standard error of 0.81 percent.
use cases
- Counting the number of unique users who visited a website
- Counting the number of distinct terms that were searched for on your website on a specific date or time
- Counting the number of distinct hashtags that were used by a user
- Counting the number of distinct words that appear in a book
A HyperLogLog uses up to 12 kB to store 100,000 unique visits (or any cardinality). On the other hand, a Set uses 3.2 MB to store 100,000 UUIDs that are 32 bytes each.
$ redis-cli
127.0.0.1:6379> PFADD visits:2015-01-01 "carl" "max" "hugo" "arthur"
(integer) 1
127.0.0.1:6379> PFADD visits:2015-01-01 "max" "hugo"
(integer) 0
127.0.0.1:6379> PFADD visits:2015-01-02 "max" "kc" "hugo" "renata"
(integer) 1
127.0.0.1:6379> PFCOUNT visits:2015-01-01
(integer) 4
127.0.0.1:6379> PFCOUNT visits:2015-01-02
(integer) 4
127.0.0.1:6379> PFCOUNT visits:2015-01-01 visits:2015-01-02
(integer) 6
127.0.0.1:6379> PFMERGE visits:total visits:2015-01-01 visits:2015-01-02
OK
127.0.0.1:6379> PFCOUNT visits:total
(integer) 6
Pub/Sub stands for Publish-Subscribe, which is a pattern where messages are not sent directly to specific receivers. Publishers send messages to channels, and subscribers receive these messages if they are listening to a given channel.
The command PUBLISH sends a message to a Redis channel, and it returns the number of clients that received that message. A message gets lost if there are no clients subscribed to the channel when it comes in. The command SUBSCRIBE subscribes a client to one or many channels. The command UNSUBSCRIBE unsubscribes a client from one or many channels. The command PUBSUB introspects the state of the Redis Pub/Sub system. This command accepts three subcommands: CHANNELS, NUMSUB, and NUMPAT.
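A minimal sketch using two redis-cli sessions and a made-up channel name, news:
Terminal 1 (subscriber):
$ redis-cli
127.0.0.1:6379> SUBSCRIBE news
1) "subscribe"
2) "news"
3) (integer) 1
Terminal 2 (publisher):
$ redis-cli
127.0.0.1:6379> PUBLISH news "hello subscribers"
(integer) 1
Terminal 1 then receives:
1) "message"
2) "news"
3) "hello subscribers"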
A transaction in Redis is a sequence of commands executed in order and atomically. The command MULTI marks the beginning of a transaction, and the command EXEC marks its end. Any commands between MULTI and EXEC are serialized and executed as an atomic operation. Redis does not serve any other client in the middle of a transaction.
All commands in a transaction are queued in the client and are only sent to the server when the EXEC command is executed. It is possible to prevent a transaction from being executed by using the DISCARD command instead of EXEC.
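A minimal redis-cli sketch, using a made-up key name (tx:counter):
$ redis-cli
127.0.0.1:6379> MULTI
OK
127.0.0.1:6379> INCR tx:counter
QUEUED
127.0.0.1:6379> INCR tx:counter
QUEUED
127.0.0.1:6379> EXEC
1) (integer) 1
2) (integer) 2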
var redis = require("redis");
var client = redis.createClient();
function transfer(from, to, value, callback) {
  client.get(from, function(err, balance) { // read the current balance of the source account
    var multi = client.multi(); // start queueing a transaction
    multi.decrby(from, value); // queue the debit...
    multi.incrby(to, value); // ...and the credit; nothing is sent to the server yet
    if (balance >= value) {
      multi.exec(function(err, reply) { // send MULTI/DECRBY/INCRBY/EXEC atomically
        callback(null, reply[0]); // reply[0] is the new balance of the source account
      });
    } else {
      multi.discard(); // drop the queued commands
      callback(new Error("Insufficient funds"), null);
    }
  });
}
In Redis, a pipeline is a way to send multiple commands together to the Redis server without waiting for individual replies. Redis commands sent in a pipeline must be independent: they run sequentially on the server (the order is preserved), but they do not run as a transaction. Even though pipelines are neither transactional nor atomic (different Redis commands may be executed between the pipelined ones), they are still useful because they save a lot of network time, preventing the network from becoming a bottleneck, as it often does in heavily loaded applications.
best practices
- When sending many commands, it might be a good idea to use multiple pipelines rather than one big pipeline.
- It is a good idea to send transactions in a pipeline to avoid an extra round trip.
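For instance, a minimal sketch with the node_redis client used elsewhere in these notes; its batch() call queues commands client-side and sends them as one pipeline, without MULTI/EXEC transaction semantics (the key names are made up for illustration):
var redis = require("redis");
var client = redis.createClient();

var batch = client.batch(); // queue commands locally; nothing is sent yet
batch.set("pipeline:key1", "value1");
batch.incr("pipeline:counter");
batch.get("pipeline:key1");
batch.exec(function(err, replies) { // all queued commands go out in one round trip
  console.log(replies); // e.g. [ 'OK', 1, 'value1' ]
  client.quit();
});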
Redis 2.6 introduced the scripting feature, and the language that was chosen to extend Redis was Lua. Lua scripts are executed atomically, which means that the Redis server is blocked during script execution. Because of this, Redis has a default timeout of 5 seconds to run any script, although this value can be changed through the configuration directive lua-time-limit.
Ideally, scripts should be simple, have a single responsibility, and run fast. It is possible to pass Redis key names and parameters to a Lua script, and they will be available inside the script through the variables KEYS and ARGV, respectively.
There are two commands for running Lua scripts: EVAL and EVALSHA. The next example uses EVAL; its syntax is the following:
EVAL script numkeys key [key ...] arg [arg ...]
var redis = require("redis");
var client = redis.createClient();
client.set("mykey", "myvalue"); // create a key for the script to read
var luaScript = 'return redis.call("GET", KEYS[1])'; // the script reads its first key argument
client.eval(luaScript, 1, "mykey", function(err, reply) { // EVAL script numkeys key
  console.log(reply); // prints "myvalue"
  client.quit();
});
best practices
- Avoid using hardcoded key names inside a Lua script; pass all key names as parameters to the commands EVAL/EVALSHA.
- Many Redis users have replaced their transactional code in the form of WATCH/MULTI/EXEC with Lua scripts.
- In order to make scripts play nicely with Redis replication, you should write scripts that do not change Redis keys in non-deterministic ways (that is, do not use random values). Well-written scripts behave the same way when they are re-executed with the same data.
The command SCRIPT LOAD caches a Lua script and returns an identifier (which is the SHA1 hash of the script). The command EVALSHA executes a Lua script based on an identifier returned by SCRIPT LOAD.
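A short redis-cli sketch; <sha1-of-the-script> stands in for the real hash, which depends on the script text:
$ redis-cli
127.0.0.1:6379> SCRIPT LOAD "return 1"
"<sha1-of-the-script>"
127.0.0.1:6379> EVALSHA <sha1-of-the-script> 0
(integer) 1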
- The INFO command returns all Redis server statistics
- The DBSIZE command returns the number of existing keys in a Redis server
- The DEBUG SEGFAULT command crashes the Redis server process by performing an invalid memory access
- The command MONITOR shows all the commands processed by the Redis server in real time. MONITOR could reduce Redis's throughput by over 50%.
- The CLIENT LIST command returns a list of all clients connected to the server
- The CLIENT SETNAME command changes a client name; it is only useful for debugging purposes.
- The CLIENT KILL command terminates a client connection
- The FLUSHALL command deletes all keys from Redis
- The command RANDOMKEY returns a random existing key name
- The PERSIST command removes the existing timeout of a given key
- The EXISTS command returns 1 if a certain key exists and 0 if it does not
- The PING command returns the string "PONG". It is useful for testing a server/client connection and verifying that Redis is able to exchange data
- The AUTH command is used to authorize a client to connect to Redis.
- The SCRIPT KILL command terminates the running Lua script if no write operations have been performed by the script. If the script has performed any write operations, the SCRIPT KILL command will not be able to terminate it; in that case, the SHUTDOWN NOSAVE command must be executed.
- The SHUTDOWN command stops all clients, causes data to persist if enabled, and shuts down the Redis server
- The OBJECT ENCODING command returns the encoding used by a given key
In Redis, all data types can use different encodings to save memory or improve performance. For instance, a String that has only digits (for example, 12345) uses less memory than a string of letters (for example, abcde) because they use different encodings. Data types will use different encodings based on thresholds defined in the Redis server configuration.
If you have a large dataset and need to optimize for memory, tweak these configurations until you find a good trade-off between memory and performance.
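For example, with redis-cli (the encoding names assume a reasonably recent Redis, where short digit-only Strings are stored as int and other short Strings as embstr):
$ redis-cli
127.0.0.1:6379> SET digits 12345
OK
127.0.0.1:6379> OBJECT ENCODING digits
"int"
127.0.0.1:6379> SET letters "abcde"
OK
127.0.0.1:6379> OBJECT ENCODING letters
"embstr"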
Redis was designed to be used in a trusted private network. It supports a very basic security system that protects the connection between client and server via a plain-text password. Redis does not implement an Access Control List (ACL), so it is not possible to have users with different permission levels.
The authentication feature can be enabled through the configuration directive requirepass. Choose a complex password of at least 64 characters. The command AUTH authenticates a Redis client.
$ redis-cli
127.0.0.1:6379> SET hello world
(error) NOAUTH Authentication required.
127.0.0.1:6379> AUTH a7f$f35eceb7e@3edd502D892f5885007869dd2f80434Fed5b4!fac0057f51fM
OK
127.0.0.1:6379> SET hello world
OK
Another interesting technique is obfuscating or disabling critical commands, such as FLUSHDB, FLUSHALL, CONFIG, KEYS, DEBUG, and SAVE. To disable a command, set its new name to an empty string. It is good practice to create a configuration file called rename-commands.conf for organization purposes, and to use the include directive in redis.conf to include the rename-commands.conf file.
rename-command FLUSHDB e0cc96ad2eab73c2c347011806a76b73
rename-command FLUSHALL a31907b21c437f46808ea49322c91d23a
rename-command CONFIG ""
rename-command KEYS ""
rename-command DEBUG ""
rename-command SAVE ""
Add the following to redis.conf and then restart the redis-server:
include /path/to/config/rename-commands.conf
$ redis-cli
127.0.0.1:6379> SAVE
(error) ERR unknown command 'SAVE'
127.0.0.1:6379> FLUSHALL
(error) ERR unknown command 'FLUSHALL'
127.0.0.1:6379> a31907b21c437f46808ea49322c91d23a
OK
There are many ways to make Redis secure, such as the following:
- Use firewall rules to block access from unknown clients
- Run Redis on the loopback interface rather than a publicly accessible network interface: bind Redis to 127.0.0.1
- Run Redis in a virtual private cloud instead of the public Internet
- Encrypt client-to-server communication, using a tool such as stunnel
If a Redis instance is shut down, crashes, or needs to be rebooted, all of the stored data is lost. To solve this problem, Redis provides two mechanisms to deal with persistence: Redis Database (RDB) and Append-only File (AOF). The two mechanisms can be used separately or simultaneously in the same Redis instance.
Recommended reading: Redis persistence demystified
A .rdb file is a binary file that represents a point-in-time snapshot of the data stored in a Redis instance. The RDB file format is optimized for fast reads and writes. To achieve the necessary performance, the internal representation of a .rdb file on disk is very similar to Redis's in-memory representation. A single RDB file is sufficient to restore a Redis instance completely.
RDB is great for backups and disaster recovery because it allows you to save an RDB file every hour, day, week, or month, depending on your needs.
The command SAVE creates an RDB immediately, but it should be avoided because it blocks the Redis server during snapshot creation. The command BGSAVE (background save) should be used instead; it has the same effect as SAVE, but it runs in a child process so as not to block Redis.
Redis creates snapshots based on the save directive, which combines two conditions: if at least Y write operations happen within X seconds, Redis creates a .rdb file. The RDB filename is based on the directive dbfilename (this defaults to dump.rdb). It is not recommended to use save directives less than 30 seconds apart from each other. RDB is not a 100% guaranteed data-recovery approach.
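A sketch of save directives in redis.conf (these thresholds are the long-standing defaults shipped with Redis):
# snapshot if at least 1 write happened within 900 seconds
save 900 1
# snapshot if at least 10 writes happened within 300 seconds
save 300 10
# snapshot if at least 10000 writes happened within 60 seconds
save 60 10000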
Another downside of RDB is that every time you create a snapshot, the Redis main process executes a fork() to create a child process that persists the data to disk. This can make your Redis instance stop serving clients for milliseconds, sometimes even a few seconds, depending on the hardware and the size of the dataset.
When AOF is enabled, every time Redis receives a command that changes the dataset, it appends that command to the AOF (Append-only File). So if AOF is enabled and Redis is restarted, it restores the data by executing all commands listed in the AOF, preserving their order, and rebuilds the state of the dataset. The AOF is a "human-readable" append-only log file. There is a tool called redis-check-aof that checks and fixes AOF files easily.
These are the most important directives in the Redis configuration for AOF:
- appendonly: enables or disables the AOF
- appendfsync: controls when data is flushed to disk; the options are no, always, and everysec
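A minimal redis.conf sketch enabling AOF (everysec is a common compromise between durability and performance):
# enable the append-only file
appendonly yes
# fsync once per second: at most one second of writes can be lost
appendfsync everysec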
Note: Restoring data from an RDB is faster than AOF when recovering a big dataset. This is because an RDB does not need to re-execute every change made in the entire database; it only needs to load the data that was previously stored.
Replication means that while you write to a Redis instance (usually referred to as the master), it will ensure that one or more instances (usually referred to as the slaves) become exact copies of the master.
There are three ways of making a Redis server instance a slave:
- Add the directive slaveof IP PORT to the configuration file and start a Redis server using this configuration
- Use the redis-server command-line option --slaveof IP PORT, as in the example below
- Use the command SLAVEOF IP PORT
$ redis-server --port 5555
$ redis-server --port 6666 --slaveof 127.0.0.1 5555
$ redis-server --port 7777 --slaveof 127.0.0.1 5555
$ redis-cli -p 5555 SET testkey testvalue
OK
$ redis-cli -p 6666 GET testkey
"testvalue"
Replicas are widely used for scalability purposes so that all read operations are handled by replicas and the master handles only write operations.
Data redundancy is another reason for having multiple replicas.
Persistence can be moved to the replicas so that the master does not perform disk I/O operations. In this scenario, the master server needs to disable persistence, and it should not restart automatically for any reason; otherwise, it will restart with an empty dataset and replicate it to the replicas, making them delete all of their stored data.
It is possible to improve data-consistency guarantees by requiring a minimum number of replicas connected to the master server (min-slaves-to-write).
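A redis.conf sketch (both directives exist in the standard distribution; the numbers are illustrative):
# reject writes unless at least 2 slaves are connected...
min-slaves-to-write 2
# ...and have acknowledged replication within the last 10 seconds
min-slaves-max-lag 10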
Replicas are very useful in a master failure scenario because they contain all of the most recent data and can be promoted to master. Unfortunately, when Redis is running in single-instance mode, there is no automatic failover to promote a slave to master. All replicas and clients connected to the old master need to be reconfigured with the new master. The automatic failover feature is the core of Redis Sentinel.
The command SLAVEOF NO ONE converts a slave into a master instance, and it should be used in a failover scenario.
$ redis-cli -p 5555 DEBUG SEGFAULT
$ redis-cli -p 6666 SLAVEOF NO ONE
$ redis-cli -p 7777 SLAVEOF 127.0.0.1 6666
In the previous scenario, all clients that were connected to 127.0.0.1:5555 need to be reconfigured to connect to 127.0.0.1:6666.
docker run --name redis-master -d redis:alpine redis-server
Starting 2 slave instances pointing to the master instance. By default, slave instances are read-only.
docker run --name redis-slave-1 --link redis-master:redis-master -d -v $(pwd)/data/slave-1:/data redis:alpine redis-server --appendonly yes --slaveof redis-master 6379
docker run --name redis-slave-2 --link redis-master:redis-master -d -v $(pwd)/data/slave-2:/data redis:alpine redis-server --appendonly yes --slaveof redis-master 6379
Connect to the master and make some changes to the dataset
docker run -it --link redis-master:redis-master --rm redis:alpine redis-cli -h redis-master -p 6379
Connect to the slave instances and double-check that you can read the keys you created on the master instance
docker run -it --link redis-slave-1:redis-slave-1 --rm redis:alpine redis-cli -h redis-slave-1 -p 6379
docker run -it --link redis-slave-2:redis-slave-2 --rm redis:alpine redis-cli -h redis-slave-2 -p 6379
In a master failure scenario, a slave instance can be promoted to master. All clients should then connect to the new master instance. Execute this on a slave instance:
slaveof no one
Redis Sentinel is a distributed system designed to automatically promote a Redis slave to master if the existing master fails. Run one Sentinel for each Redis server; Sentinel listens on its own port and runs as a separate process.
A client always connects to a Redis instance, but it needs to query a Sentinel to find out which Redis instance it should connect to. Communication between all Sentinels takes place through a Pub/Sub channel called __sentinel__:hello in the Redis master.
Partitioning is a general term used to describe the act of breaking up data and distributing it across different hosts. There are two types of partitioning: horizontal partitioning, where keys are distributed across different servers (also known as sharding), and vertical partitioning. Partitioning is performed in a cluster of hosts when better performance, maintainability, or availability is desired.
This is useful for cases where:
- The total data to be stored is larger than the total memory available in a Redis server
- The network bandwidth is not enough to handle all of the traffic
Partitioning types:
- Range. Data is distributed based on a range of keys.
- Hash. The instance to send a command to is found by applying a hash function to the Redis key.
- Consistent hashing. Consistent hashing, in our context, is a kind of hashing that remaps only a small portion of the data to different servers when the list of Redis servers is changed (only K/n keys are remapped, where K is the number of keys and n is the number of servers). The technique consists of creating multiple points in a circle for each Redis key and server. The appropriate server for a given key is the closest server to that key in the circle (clockwise); this circle is also referred to as "ring." The points are created using a hash function, such as MD5.
Different ways to implement partitioning:
- The client layer. Your own implementation.
- The proxy layer. An extra layer that proxies all Redis queries and performs partitioning for applications. e.g twemproxy
- The query router layer. Implemented in the data store itself. e.g Redis Cluster
Key tags are a technique for ensuring that related keys are stored on the same server. The convention is to add a tag to the key name, with the tag name inside curly braces.
users:1{users}
users:3{users}
Redis Cluster was designed to automatically shard data across different Redis instances and to perform automatic failover if any problem happens to any master instance. It uses two ports: a lower one for client connections and a higher one for node-to-node communication.
It requires at least 3 master instances. It's recommended that you have at least one replica per master.
When connecting to a Redis cluster using redis-cli, the -c parameter is required to enable cluster mode.
redis-cli -c -h <hostname or IP> -p <port-number>
The data-partitioning method used is called hash slot. Each master in a cluster owns a portion of the 16384 slots; a master without any slots won't be able to store any data. You need to manually assign a number of slots to each master.
HASH_SLOT = CRC16(key) mod 16384
Hash tags are used when applying the hash function to ensure that different key names end up in the same hash slot. In the following example, all keys would be stored in the same slot, based on the hash tag {user123}.
SADD {user123}:friends:usa "John" "Bob"
SADD {user123}:friends:brazil "Max" "Hugo"
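You can verify this with CLUSTER KEYSLOT: only the text inside the braces is hashed, so both keys map to the same slot. <slot> stands in for the actual slot number, which depends on CRC16 of user123:
127.0.0.1:6379> CLUSTER KEYSLOT {user123}:friends:usa
(integer) <slot>
127.0.0.1:6379> CLUSTER KEYSLOT {user123}:friends:brazil
(integer) <slot>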
Since the redis instances need to be able to connect to each other, we should create a docker network they can join
docker network create redis-cluster-network
Creating 3 redis instances in cluster mode
docker run --name redis-master-1 --network redis-cluster-network -d -v $(pwd)/data/master-1:/data -v $(pwd)/config/redis-cluster-master-1.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
docker run --name redis-master-2 --network redis-cluster-network -d -v $(pwd)/data/master-2:/data -v $(pwd)/config/redis-cluster-master-2.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
docker run --name redis-master-3 --network redis-cluster-network -d -v $(pwd)/data/master-3:/data -v $(pwd)/config/redis-cluster-master-3.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
Listing all redis master nodes
docker container ps
Connecting to a master node and getting information about the cluster. It should report the cluster state as fail, since we're not done setting up the cluster.
docker container exec -it redis-master-3 /bin/sh
redis-cli -c
cluster info
Next, we should distribute the 16384 slots evenly across the 3 Redis instances. The cluster addslots command informs a node which slots it should own.
Note: {0..5460} below is Bash brace expansion. If you need to install bash on Alpine Linux, do the following:
apk update
apk add bash
bash
Assigning the slots each Redis instance should own. Slots are where keys are stored, based on each key's hash. To allow the Redis cluster to start in a safe way, we also manually change the configuration epoch. Note: don't do this again; this is the only time you need to change the configuration epoch manually.
docker container exec -it redis-master-1 /bin/sh
redis-cli -c cluster addslots {0..5460}
redis-cli -c cluster set-config-epoch 1
docker container exec -it redis-master-2 /bin/sh
redis-cli -c cluster addslots {5461..10922}
redis-cli -c cluster set-config-epoch 2
docker container exec -it redis-master-3 /bin/sh
redis-cli -c cluster addslots {10923..16383}
redis-cli -c cluster set-config-epoch 3
Making all Redis instances aware of each other so they can exchange information, e.g on redis-master-1 execute:
redis-cli -c cluster meet <redis-master-2 IP> 6379
redis-cli -c cluster meet <redis-master-3 IP> 6379
Double-checking the cluster is up and running:
redis-cli -c cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:3
cluster_size:3
cluster_current_epoch:3
cluster_my_epoch:1
cluster_stats_messages_sent:191
cluster_stats_messages_received:191
Adding replicas to the master Redis instances. So far, we have 3 Redis masters but no slaves. We should have at least one slave per master, and having one or two extra slaves above the minimum required (cluster-migration-barrier) is recommended.
- Create a new Redis instance in cluster mode
docker run --name redis-slave-1 --network redis-cluster-network -d -v $(pwd)/data/slave-1:/data -v $(pwd)/config/redis-cluster-slave-1.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
docker run --name redis-slave-2 --network redis-cluster-network -d -v $(pwd)/data/slave-2:/data -v $(pwd)/config/redis-cluster-slave-2.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
docker run --name redis-slave-3 --network redis-cluster-network -d -v $(pwd)/data/slave-3:/data -v $(pwd)/config/redis-cluster-slave-3.conf:/usr/local/etc/redis/redis.conf redis:alpine redis-server /usr/local/etc/redis/redis.conf
- Add the new Redis instances to the cluster using cluster meet
docker container exec -it redis-slave-1 /bin/sh
redis-cli -c cluster meet 172.19.0.2 6379
docker container exec -it redis-slave-2 /bin/sh
redis-cli -c cluster meet 172.19.0.2 6379
docker container exec -it redis-slave-3 /bin/sh
redis-cli -c cluster meet 172.19.0.2 6379
- Get the node ID of the master that will be replicated. cluster nodes outputs a list of all the nodes that belong to the cluster, along with their properties; the node ID is the first string displayed in each row.
redis-cli -c cluster nodes
- Start the replication by using the command cluster replicate <master-node-id>
-- Slave 1
redis-cli -c cluster replicate 7e78c9a76ee462350a064694683fae266b1afc3a
-- Slave 2
redis-cli -c cluster replicate 2eb1abc6c8ad9a98333eeb1dafe088748ecf97d5
-- Slave 3
redis-cli -c cluster replicate b749483152945869cdd062cb29a0f780b6f0ce29
Now that the cluster is up and running, let's add a key for testing's sake:
- Connect to any Redis instance in the cluster
- Create a key, e.g set cebroker:dev:test-cluster "Yay!". The reply shows which Redis master the key was stored on.
- Connect to the replica of that master and try to get the newly created key, e.g get cebroker:dev:test-cluster
redis-cli -c
set cebroker:dev:test-cluster "Yay!"
get cebroker:dev:test-cluster
- Use benchmarks to decide which data type works best for your case: FLUSHALL, then create your keys, then compare INFO memory.
- Instead of using multiple Redis DBs, run multiple Redis servers. Since Redis is single-threaded, a Redis server with multiple DBs will still only use one CPU.
- Use namespaces for your keys. e.g namespace:key-name, music-online:album:10001:songs
- There is a Linux kernel parameter called swappiness that controls when the operating system starts using swap space. Use a swappiness of 0 when your data always fits into RAM, and 1 when you are not sure:
sysctl -w vm.swappiness=0
To persist it across reboots, add vm.swappiness=0 to /etc/sysctl.conf.
- The Redis server needs enough memory to perform backups if any strategy is enabled. In the worst-case scenario, redis-server may double the used memory during a backup. There is a configuration directive called maxmemory that limits the amount of memory Redis is allowed to use (in bytes); see the sketch after this list. Redis should not use more than 50 percent of the available memory when any backup strategy is enabled. Make sure that you set up alarms for Redis memory usage.
- Choose an appropriate persistence strategy. If your application doesn't need persistence, disable RDB and AOF. If your application tolerates some data loss, use RDB. If it requires fully durable persistence, use both RDB and AOF.
- Enable authentication. e.g requirepass password-in-plain-text
- Disable critical commands, e.g FLUSHDB, FLUSHALL, CONFIG, KEYS, DEBUG, and SAVE, by including a rename-commands.conf in the redis.conf file.
- Encrypt client-to-server communication using stunnel.
- Let all read operations be handled by slave instances and all write operations by the master instance.
- Persistence can be moved to the slaves so that the master doesn't have to write to disk. In that setup, don't restart the master; otherwise it will lose all its data and replicate its empty dataset to the slaves.
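A redis.conf sketch combining a few of the directives above (the values are illustrative, not recommendations for a specific workload):
# cap memory usage; leave headroom for backups
maxmemory 2gb
# enable AUTH (plain text, so choose a long, complex value)
requirepass password-in-plain-text
# durable persistence via the append-only file
appendonly yes
# disable/obfuscate critical commands
include /path/to/config/rename-commands.conf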