Redis Best Practices

Best Practices for Azure Redis

Below are a set of best practices that I recommend for most customers. This information is based on my experience helping hundreds of Azure Redis customers investigate various issues.

Configuration and Concepts

  1. Use Standard or Premium Tier for Production systems. The Basic Tier is a single node system with no data replication and no SLA. Also, use at least a C1 cache. C0 caches are really meant for simple dev/test scenarios since they have a shared CPU core, very little memory, and are prone to "noisy neighbor" issues.
  2. Remember that Redis is an In-Memory data store. Read this article so that you are aware of scenarios where data loss can occur.
  3. Configure your client library to use a "connect timeout" of at least 10 to 15 seconds, giving the system time to connect even under higher CPU conditions. If your client or server tend to be under high load, use an even larger value. If you use a large number of connections in a single application, consider adding some type of staggered reconnect logic to prevent a flood of connections hitting the server at the same time.
  4. Develop your system such that it can handle connection blips due to patching and failover.
  5. Configure your maxmemory-reserved setting to improve system responsiveness under memory pressure conditions, especially for write-heavy workloads or if you are storing larger values (100KB or more) in Redis. I would recommend starting with 10% of the size of your cache, then increase if you have write-heavy loads. See some considerations when selecting a value.
  6. Redis works best with smaller values, so consider splitting bigger data across multiple keys. In this Redis discussion, 100KB is considered "large". Read this article for an example problem that can be caused by large values.
  7. Locate your cache instance and your application in the same region. Connecting to a cache in a different region can significantly increase latency and reduce reliability. Connecting from outside of Azure is supported, but not recommended especially when using Redis as a cache (as opposed to a key/value store where latency may not be the primary concern).
  8. Reuse connections - Creating new connections is expensive and increases latency, so reuse connections as much as possible. If you choose to create new connections, make sure to close the old connections before you release them (even in managed memory languages like .NET or Java).
  9. Avoid Expensive Commands - Some Redis operations, like the "KEYS" command, are VERY expensive and should be avoided. Read more here
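
As a rough illustration of the "staggered reconnect" idea from item 3, here is a minimal sketch of a randomized (jittered) backoff delay. The class and method names are hypothetical and not part of any Redis client library; it only computes how long a client could wait before its next reconnect attempt so that many clients do not reconnect at the same instant.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration only; not part of any Redis client library.
final class ReconnectDelay {
    private ReconnectDelay() {}

    /**
     * Returns how long to wait (in milliseconds) before reconnect attempt
     * number "attempt" (0-based). The upper limit doubles with each attempt
     * until it reaches maxDelayMs, and the actual delay is picked uniformly
     * at random from [0, limit) to spread clients out over time.
     */
    static long nextDelayMs(int attempt, long baseDelayMs, long maxDelayMs) {
        if (attempt < 0 || baseDelayMs <= 0 || maxDelayMs < baseDelayMs) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        // Clamp the shift, then compare before shifting so the limit cannot overflow.
        int shift = Math.min(attempt, 20);
        long limit = baseDelayMs > (maxDelayMs >> shift) ? maxDelayMs : baseDelayMs << shift;
        return ThreadLocalRandom.current().nextLong(limit);
    }
}
```

A caller would sleep for `ReconnectDelay.nextDelayMs(attempt, 100, 5000)` milliseconds before each reconnect attempt. The base and maximum values here are assumptions and should be tuned for your application.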

Memory Management

There are several things related to memory usage within your Redis server instance that you may want to consider. Here are a few:

  1. Choose an eviction policy that works for your application. The default policy for Azure Redis is volatile-lru, which means that only keys that have an expiration value configured will be considered for eviction. If no keys have an expiration value, then the system won't evict any keys and clients will get out of memory errors when trying to write to Redis. If you want the system to allow any key to be evicted under memory pressure, then you may want to consider the allkeys-lru policy.
  2. Set an expiration value on your keys. This will help expire keys proactively instead of waiting until there is memory pressure. Evictions due to memory pressure can cause additional load on your server, so it is always best to stay ahead of the curve whenever possible. See the Expire and ExpireAt commands for more details.

Client Library Specific Guidance

  1. StackExchange.Redis (.NET)
  2. Java - Which client should I use?
  3. Lettuce (Java)
  4. Jedis (Java)
  5. Node.js
  6. Asp.Net Session State Provider

When is it safe to retry?

Unfortunately, there is no easy answer. Each application needs to decide what operations can be retried and which cannot because each has different requirements and inter-key dependencies. Things you should consider:

  1. You can get client-side errors even though Redis successfully ran the command you asked it to run. For example:
    • Timeouts are a client-side concept. If the operation reached the server, the server will run the command even if the client gives up waiting.
    • When an error occurs on the socket connection, it is indeterminate whether or not the operation ran on the server. For example, the connection error could happen after the request was processed by the server but before the response was received by the client.
  2. How does my application react if I accidentally run the same operation twice? For instance, what if I increment an integer twice instead of just once? Is my application writing to the same key from multiple places? What if my retry logic overwrites a value set by some other part of my app?

If you would like to test how your code works under error conditions, one option is to use the Reboot Feature as a way to trigger such connection blips, then see how your application reacts.
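
To make the considerations above concrete, here is a minimal sketch of a retry wrapper that only retries operations the caller has explicitly marked as safe to repeat. The `RetryPolicy` class is hypothetical. Deciding which operations are safe to repeat is still up to your application (for example, an increment usually is not).

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration only.
final class RetryPolicy {
    private RetryPolicy() {}

    /**
     * Runs the operation. If it throws and the caller marked it as safe to
     * repeat, it is retried up to maxAttempts times in total. Operations that
     * are not safe to repeat are attempted exactly once, because the server
     * may already have executed the command even though the client saw an error.
     */
    static <T> T run(Supplier<T> operation, boolean safeToRepeat, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        int attempts = safeToRepeat ? maxAttempts : 1;
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw last; // attempts >= 1, so at least one failure was recorded
    }
}
```

In real code you would likely also add a delay between attempts and only retry on connection or timeout errors. Both are left out here to keep the sketch short.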

Performance Testing

  1. Start by using redis-benchmark.exe to get a feel for possible throughput/latency before writing your own perf tests. Redis-benchmark documentation can be found at http://redis.io/topics/benchmarks. Note that redis-benchmark does not support SSL, so you will have to enable the Non-SSL port through the Azure Portal before you run the test. A Windows-compatible version of redis-benchmark.exe can be found here
  2. The client VM used for testing should be in the same region as your Redis cache instance.
  3. We recommend using Dv2 VM Series for your client as they have better hardware and will give the best results.
  4. Make sure the client VM you choose has at least as much computing and bandwidth capability as the cache you are testing.
  5. Enable VRSS on the client machine if you are on Windows. See here for details. Example PowerShell script:

PowerShell -ExecutionPolicy Unrestricted Enable-NetAdapterRSS -Name (Get-NetAdapter).Name

  6. Premium tier Redis instances will have better network latency and throughput because they are running on better hardware for both CPU and Network.

Note: Our observed performance results are published here for your reference. Also, be aware that SSL/TLS adds some overhead, so you may get different latencies and/or throughput if you are using transport encryption.

Redis-Benchmark Examples

Set up the cache:

redis-benchmark.exe -h yourcache.redis.cache.windows.net -a yourAccesskey -t SET -n 10 -d 1024

Test Latency for GET requests using a 1k payload:

redis-benchmark.exe -h yourcache.redis.cache.windows.net -a yourAccesskey -t GET -d 1024 -P 50 -c 4

Test the throughput you are able to get using pipelined GET requests with a 1k payload:

redis-benchmark.exe -h yourcache.redis.cache.windows.net -a yourAccesskey -t GET -n 1000000 -d 1024 -P 50 -c 50

Jedis Java Client

Jedis instances are single threaded

  • Don't use the same Jedis connection instance from multiple threads at the same time.
  • Using the same Jedis instance from multiple threads at the same time will result in socket connection errors/resets or strange error messages like "expected '$' but got ' '".

Use JedisPool

  • This allows you to talk to redis from multiple threads while still getting the benefits of reused connections.
  • The JedisPool object is thread-safe and can be used from multiple threads at the same time.
  • This pool should be configured once and reused.
  • Make sure to return the Jedis instance back to the pool when done, otherwise you will leak the connection.
  • We have seen a few cases where connections in the pool get into a bad state. As a failsafe, you may want to re-create the JedisPool if you see connection errors that continue for longer than 30 seconds.
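
The 30-second failsafe in the last bullet could be tracked with something like the following. This is a minimal sketch under my own assumptions: `SustainedErrorTracker` is a hypothetical class, the caller passes in the current time in milliseconds, and the actual closing and re-creating of the JedisPool is left to the caller.

```java
import java.time.Duration;

// Hypothetical helper for illustration only; the pool itself is not shown.
final class SustainedErrorTracker {
    private final long thresholdMs;
    private long firstErrorMs = -1; // -1 means no error streak is in progress

    SustainedErrorTracker(Duration threshold) {
        this.thresholdMs = threshold.toMillis();
    }

    /** Call after a successful operation to end the current error streak. */
    synchronized void recordSuccess() {
        firstErrorMs = -1;
    }

    /**
     * Call after a connection error. Returns true once errors have continued
     * for at least the threshold, meaning the caller may want to re-create the
     * pool. The streak is reset when true is returned, so a single streak
     * triggers only one re-creation.
     */
    synchronized boolean recordError(long nowMs) {
        if (firstErrorMs < 0) {
            firstErrorMs = nowMs; // first error of a new streak
            return false;
        }
        if (nowMs - firstErrorMs >= thresholdMs) {
            firstErrorMs = -1;
            return true;
        }
        return false;
    }
}
```

The idea is to call `recordError(System.currentTimeMillis())` wherever you catch a connection exception and `recordSuccess()` after a successful call. When `recordError` returns true, close the old pool and build a new one.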

Some important settings to consider:

| Setting | Description |
|---|---|
| connectTimeout | How long to allow for new connections to be established (in milliseconds). In general, this should be at least 5000ms. If your client application tends to have high spikes in CPU usage, setting this to 15000ms or 20000ms would be a good idea. |
| soTimeout | This configures the socket timeout (in milliseconds) for the underlying connection. You can basically think of this as the operation timeout (how long you are willing to wait for a response from Redis). Think about this one in terms of worst case, not best case. Setting this too low can cause you to get timeout errors due to unexpected bursts in load. I typically recommend 1000ms as a good value for most customers. |
| port | In Azure, 6379 is non-SSL and 6380 is SSL/TLS. Important Note: 6379 is disabled by default - you have to explicitly enable this insecure port if you wish to use it. |

Choose JedisPoolConfig settings with care

| Setting | Description |
|---|---|
| maxTotal | This setting controls the max number of connections that can be created at a given time. Given that Jedis connections cannot be shared across threads, this setting affects the amount of concurrency your application can have when talking to Redis. Note that each connection does have some memory and CPU overhead, so setting this to a very high value may have negative side effects. If not set, the default value is 8, which is probably too low for most applications. When choosing a value, consider how many concurrent calls into Redis you think you will have under load. |
| maxIdle | This is the max number of connections that can be idle in the pool without being immediately evicted (closed). If not set, the default value is 8. I would recommend that this setting be configured the same as maxTotal to help avoid connection ramp-up costs when your application has many bursts of load in a short period of time. If a connection is idle for a long time, it will still be evicted until the idle connection count hits minIdle (described below). |
| minIdle | This is the number of "warm" connections (e.g. ready for immediate use) that remain in the pool even when load has reduced. If not set, the default is 0. When choosing a value, consider your steady-state concurrent requests to Redis. For instance, if your application is calling into Redis from 10 threads simultaneously, then you should set this to at least 10 (probably a bit higher to give you some room). |
| blockWhenExhausted | This controls behavior when a thread asks for a connection, but there aren't any that are free and the pool can't create more (due to maxTotal). If set to true, the calling thread will block for up to maxWaitMillis before throwing an exception. The default is true and I recommend true for production environments. You could set it to false in testing environments to help you more easily discover what value to use for maxTotal. |
| maxWaitMillis | How long to wait, in milliseconds, when a call to JedisPool.getResource() blocks. The default is -1, which means block indefinitely. I would set this to the same value as the configured soTimeout. Related to blockWhenExhausted. |
| testOnBorrow | Controls whether or not the connection is tested before it is returned from the pool. The default is false. Setting to true may increase resilience to connection blips but may also have a performance cost when taking connections from the pool. In my quick testing, I saw a noticeable increase in the 50th percentile latencies, but no significant increase in 98th percentile latencies. |

Use Pipelining

  • This will improve the throughput of the application. Read more about redis pipelining here https://redis.io/topics/pipelining.
  • Jedis does not do pipelining automatically for you. You have to call different APIs in order to get the significant performance benefits that can come from using pipelining.
  • Examples can be found here
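
To get a feel for why pipelining helps, here is a back-of-the-envelope sketch. It assumes a very simplified model of my own: each batch costs one network round trip, and server processing time and bandwidth limits are ignored. The `PipelineEstimate` class is hypothetical and does not talk to Redis.

```java
// Hypothetical helper for illustration only; it performs no network calls.
final class PipelineEstimate {
    private PipelineEstimate() {}

    /** Number of network round trips needed to send "commands" in batches of "batchSize". */
    static long roundTrips(long commands, int batchSize) {
        if (commands < 0 || batchSize < 1) {
            throw new IllegalArgumentException("commands must be >= 0 and batchSize >= 1");
        }
        return (commands + batchSize - 1) / batchSize; // ceiling division
    }

    /** Rough time spent waiting on the network, given a round-trip time in milliseconds. */
    static double networkWaitMs(long commands, int batchSize, double roundTripMs) {
        return roundTrips(commands, batchSize) * roundTripMs;
    }
}
```

Under this model, 1000 commands sent one at a time need 1000 round trips, while the same commands pipelined in batches of 50 need only 20. Real numbers will differ, so measure with your own workload.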

Log Pool Usage Periodically

  • Debugging performance problems due to JedisPool contention issues will be easier if you log the pool usage regularly.
  • If you ever get an error when trying to get a connection from the pool, you should definitely log usage stats. There is sample code here that shows which values to log...
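
One possible way to log usage on a schedule is sketched below. The `PoolUsageLogger` class is hypothetical. It takes the usage text from a supplier you provide (for example, a method that formats the pool's active, idle and waiter counts) and hands it to whatever logging sink you use.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper for illustration only.
final class PoolUsageLogger {
    private PoolUsageLogger() {}

    /**
     * Starts a background task that reads the usage text and passes it to the
     * sink immediately and then once every periodMs milliseconds.
     * Call shutdown() on the returned scheduler to stop logging.
     */
    static ScheduledExecutorService start(Supplier<String> usage, Consumer<String> sink, long periodMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "pool-usage-logger");
            thread.setDaemon(true); // do not keep the JVM alive just for logging
            return thread;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                sink.accept(usage.get());
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so report it and continue.
                System.err.println("pool usage logging failed: " + e);
            }
        }, 0, periodMs, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

For example, `PoolUsageLogger.start(() -> "Active=" + pool.getNumActive(), System.out::println, 60_000)` would print the active connection count once a minute, assuming `pool` is your JedisPool.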

Sample Code

Node.js

Avoid Idle Connections

Azure Redis currently has a 10-minute idle timeout for connections, which will cause short network blips if your connection has long periods of inactivity. The most common Node.js libraries should automatically reconnect.

However, you can avoid this brief connectivity loss if your application sends a PING command on connections to prevent them from being idle. Some client libraries send this ping automatically.

At the time of this writing, the Node.js Redis client library I tested (node_redis) did NOT do this automatically. You can do this yourself by adding a timer that sends a PING command every minute or two. Below is an example of how to do this.

    setInterval(function(){
    	console.log('redisClient => Sending Ping...');
    	redisClient.ping();
    }, 60000); // 60 seconds

Recreate the Connection

We have seen a few cases where a node_redis connection gets into a bad state and can no longer successfully send commands to Redis even though other clients are actively able to interact with Redis. If you see connection issues that last longer than some threshold (say 30 seconds), then you may want to add logic to your app that forcefully recreates the connection instead of waiting for node_redis to reconnect.

ASP.Net Session State Provider

Session State Best Practices

  1. Enable session state only on required pages - This will avoid known session state provider performance problems.
    • You can disable session state by setting the web.config enableSessionState option to false.
      <system.web>
        <pages enableSessionState="false" />
      </system.web>

    • You can enable it on specific pages by setting the page directive's EnableSessionState option to true.
      <%@ Page EnableSessionState="true" %>

    • Mark pages using Session State as ReadOnly whenever possible - this helps avoid locking contention.
      <%@ Page EnableSessionState="ReadOnly" %>
      
  2. Avoid Session State (or at least use ReadOnly) on pages that have long load times - When a page with write-access to the session state takes a long time to load, it will hold the lock for that session until the load completes. This can prevent other requests for other pages for the same session from loading. Also, the session state module in ASP.NET will, in the background, continue to ask for the session lock for any additional requests for that same session until the lock is available or until the executionTimeout is exceeded for the lock. This can generate additional load on your session state store.
  3. Make sure you understand the impact of session state locks. Read this article for an example of why this is important.
  4. Select your httpRuntime/executionTimeout carefully - The executionTimeout you select is the duration that the session lock is held should the app crash without releasing the lock. Select a value that is as low as possible while still meeting your application's specific requirements.

Note: None of these recommendations are specific to Redis - they are good recommendations regardless of which SessionStateProvider you use. Also, some of these recommendations are based on this article, which has additional recommendations beyond those specifically called out here.

StackExchange.Redis

General Guidance

  1. Set abortConnect to false (the AbortOnConnectFail property when using ConfigurationOptions), then let the ConnectionMultiplexer reconnect automatically. See here for details

  2. Reuse the ConnectionMultiplexer - do not create a new one for each request. The Lazy<ConnectionMultiplexer> pattern shown here is strongly recommended.

  3. Configure your ThreadPool settings to avoid timeouts.

  4. Be aware of the performance costs associated with different operations you are running. For instance, the "KEYS" command is an O(n) operation and should be avoided. The redis.io site has details around the time complexity for each operation that it supports.

  5. Consider turning on "Server GC". "Workstation GC" is the default and can impact latencies while garbage collection is happening.

  6. Most customers find that a single ConnectionMultiplexer is sufficient for their needs. However, if you have high throughput requirements, you may consider slowly increasing the number of connections to see if you get an improvement in throughput. Avoid setting it to an arbitrarily large number of connections as it may not give the desired benefit.

  7. Configure supported TLS settings. If you are targeting .NET 4.7 or later, you should not have to do anything because StackExchange.Redis will automatically use the OS level settings when deciding which TLS versions to support (which is a good thing in most cases). If you are targeting an older version of .NET, then you should configure the client to use the highest TLS version that your client system supports (typically TLS 1.2). Once you move to a newer version of the .NET framework, then you should probably remove this configuration and let the OS settings take precedence. This can be configured through the sslProtocols connection string entry (requires NuGet package version 1.2.3 or later), or through the ConfigurationOptions class as shown here:

    var options = ConfigurationOptions.Parse(connectionString);

    options.SslProtocols = System.Security.Authentication.SslProtocols.Tls12;

    ConnectionMultiplexer.Connect(options);

Reconnecting with Lazy<T> pattern

We have seen a few rare cases where StackExchange.Redis fails to reconnect after a connection blip (for example, due to patching). Restarting the client or creating a new ConnectionMultiplexer will fix the issue. Here is some sample code that still uses the recommended Lazy<ConnectionMultiplexer> pattern while allowing apps to force a reconnection periodically. Make sure to update code calling into the ConnectionMultiplexer so that it handles any ObjectDisposedException errors that occur as a result of disposing the old one.

using System;
using System.Threading;
using StackExchange.Redis;

static class Redis
{
    static long lastReconnectTicks = DateTimeOffset.MinValue.UtcTicks;
    static DateTimeOffset firstError = DateTimeOffset.MinValue;
    static DateTimeOffset previousError = DateTimeOffset.MinValue;
    static object reconnectLock = new object();

    // In general, let StackExchange.Redis handle most reconnects,
    // so limit the frequency of how often this will actually reconnect.
    public static TimeSpan ReconnectMinFrequency = TimeSpan.FromSeconds(60);

    // If errors continue for longer than the below threshold, then the
    // multiplexer seems to not be reconnecting, so re-create the multiplexer.
    public static TimeSpan ReconnectErrorThreshold = TimeSpan.FromSeconds(30);

    static string connectionString = "TODO: CALL InitializeConnectionString() method with connection string";
    static Lazy<ConnectionMultiplexer> multiplexer = CreateMultiplexer();

    public static ConnectionMultiplexer Connection { get { return multiplexer.Value; } }

    public static void InitializeConnectionString(string cnxString)
    {
        if (string.IsNullOrWhiteSpace(cnxString))
            throw new ArgumentNullException(nameof(cnxString));

        connectionString = cnxString;
    }

    /// <summary>
    /// Force a new ConnectionMultiplexer to be created.
    /// NOTES:
    /// 1. Users of the ConnectionMultiplexer MUST handle ObjectDisposedExceptions, which can now happen as a result of calling ForceReconnect()
    /// 2. Don't call ForceReconnect for Timeouts, just for RedisConnectionExceptions or SocketExceptions
    /// 3. Call this method every time you see a connection exception, the code will wait to reconnect:
    ///     a. for at least the "ReconnectErrorThreshold" time of repeated errors before actually reconnecting
    ///     b. not reconnect more frequently than configured in "ReconnectMinFrequency"
    /// </summary>
    public static void ForceReconnect()
    {
        var utcNow = DateTimeOffset.UtcNow;
        var previousTicks = Interlocked.Read(ref lastReconnectTicks);
        var previousReconnect = new DateTimeOffset(previousTicks, TimeSpan.Zero);
        var elapsedSinceLastReconnect = utcNow - previousReconnect;

        // If multiple threads call ForceReconnect at the same time, we only want to honor one of them.
        if (elapsedSinceLastReconnect > ReconnectMinFrequency)
        {
            lock (reconnectLock)
            {
                utcNow = DateTimeOffset.UtcNow;

                // Re-read the last reconnect time, since another thread may have
                // reconnected while this thread was waiting for the lock.
                previousTicks = Interlocked.Read(ref lastReconnectTicks);
                previousReconnect = new DateTimeOffset(previousTicks, TimeSpan.Zero);
                elapsedSinceLastReconnect = utcNow - previousReconnect;

                if (firstError == DateTimeOffset.MinValue)
                {
                    // We haven't seen an error since last reconnect, so set initial values.
                    firstError = utcNow;
                    previousError = utcNow;
                    return;
                }

                if (elapsedSinceLastReconnect < ReconnectMinFrequency)
                    return; // Some other thread made it through the check and the lock, so nothing to do.

                var elapsedSinceFirstError = utcNow - firstError;
                var elapsedSinceMostRecentError = utcNow - previousError;

                var shouldReconnect =
                    elapsedSinceFirstError >= ReconnectErrorThreshold       // make sure we gave the multiplexer enough time to reconnect on its own if it can
                    && elapsedSinceMostRecentError <= ReconnectErrorThreshold; // make sure we aren't working on stale data (e.g. if there was a gap in errors, don't reconnect yet).

                // Update the previousError timestamp to be now (e.g. this reconnect request)
                previousError = utcNow;

                if (shouldReconnect)
                {
                    firstError = DateTimeOffset.MinValue;
                    previousError = DateTimeOffset.MinValue;

                    var oldMultiplexer = multiplexer;
                    CloseMultiplexer(oldMultiplexer);
                    multiplexer = CreateMultiplexer();
                    Interlocked.Exchange(ref lastReconnectTicks, utcNow.UtcTicks);
                }
            }
        }
    }

    private static Lazy<ConnectionMultiplexer> CreateMultiplexer()
    {
        return new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(connectionString));
    }

    private static void CloseMultiplexer(Lazy<ConnectionMultiplexer> oldMultiplexer)
    {
        if (oldMultiplexer != null)
        {
            try
            {
                oldMultiplexer.Value.Close();
            }
            catch (Exception)
            {
                // Example error condition: if accessing old.Value causes a connection attempt and that fails.
            }
        }
    }
}

Jedis Sample Code (Java)

import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLPeerUnverifiedException;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocketFactory;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class Redis {
    private static Object staticLock = new Object();
    // volatile is required for the double-checked locking in getPoolInstance() to be safe
    private static volatile JedisPool pool;
    private static String host;
    private static int port; // 6379 for NonSSL, 6380 for SSL
    private static int connectTimeout; // milliseconds
    private static int operationTimeout; // milliseconds
    private static String password;
    private static JedisPoolConfig config;

    // Should be called exactly once during App Startup logic.
    public static void initializeSettings(String host, int port, String password, int connectTimeout, int operationTimeout) {
        Redis.host = host;
        Redis.port = port;
        Redis.password = password;
        Redis.connectTimeout = connectTimeout;
        Redis.operationTimeout = operationTimeout;
    }

    // MAKE SURE to call the initializeSettings method first
    public static JedisPool getPoolInstance() {
        if (pool == null) { // avoid synchronization lock if initialization has already happened
            synchronized (staticLock) {
                if (pool == null) { // don't re-initialize if another thread beat us to it.
                    JedisPoolConfig poolConfig = getPoolConfig();
                    boolean useSsl = port == 6380;
                    int db = 0;
                    String clientName = "MyClientName"; // null means use default
                    SSLSocketFactory sslSocketFactory = null; // null means use default
                    SSLParameters sslParameters = null; // null means use default
                    HostnameVerifier hostnameVerifier = new SimpleHostNameVerifier(host);
                    pool = new JedisPool(poolConfig, host, port, connectTimeout, operationTimeout, password, db,
                            clientName, useSsl, sslSocketFactory, sslParameters, hostnameVerifier);
                }
            }
        }
        return pool;
    }

    public static JedisPoolConfig getPoolConfig() {
        if (config == null) {
            JedisPoolConfig poolConfig = new JedisPoolConfig();

            // Each thread trying to access Redis needs its own Jedis instance from the pool.
            // Using too small a value here can lead to performance problems, too big and you have wasted resources.
            int maxConnections = 200;
            poolConfig.setMaxTotal(maxConnections);
            poolConfig.setMaxIdle(maxConnections);

            // Using "false" here will make it easier to debug when your maxTotal/minIdle/etc settings need adjusting.
            // Setting it to "true" will result in better behavior when unexpected load hits in production
            poolConfig.setBlockWhenExhausted(true);

            // How long to wait before throwing when pool is exhausted
            poolConfig.setMaxWaitMillis(operationTimeout);

            // This controls the number of connections that should be maintained for bursts of load.
            // Increase this value when you see pool.getResource() taking a long time to complete under burst scenarios
            poolConfig.setMinIdle(50);

            Redis.config = poolConfig;
        }
        return config;
    }

    public static String getPoolCurrentUsage() {
        JedisPool jedisPool = getPoolInstance();
        JedisPoolConfig poolConfig = getPoolConfig();

        int active = jedisPool.getNumActive();
        int idle = jedisPool.getNumIdle();
        int total = active + idle;
        String log = String.format(
                "JedisPool: Active=%d, Idle=%d, Waiters=%d, total=%d, maxTotal=%d, minIdle=%d, maxIdle=%d",
                active,
                idle,
                jedisPool.getNumWaiters(),
                total,
                poolConfig.getMaxTotal(),
                poolConfig.getMinIdle(),
                poolConfig.getMaxIdle()
        );
        return log;
    }

    private static class SimpleHostNameVerifier implements HostnameVerifier {
        private String exactCN;
        private String wildCardCN;

        public SimpleHostNameVerifier(String cacheHostname) {
            exactCN = "CN=" + cacheHostname;
            wildCardCN = "CN=*" + cacheHostname.substring(cacheHostname.indexOf('.'));
        }

        public boolean verify(String s, SSLSession sslSession) {
            try {
                String cn = sslSession.getPeerPrincipal().getName();
                return cn.equalsIgnoreCase(wildCardCN) || cn.equalsIgnoreCase(exactCN);
            } catch (SSLPeerUnverifiedException ex) {
                return false;
            }
        }
    }
}