Created August 9, 2019 14:51
fast way to remove large number of redis keys by pattern
# To remove all keys matching a pattern in Redis,
# we could use the recommended way: redis-cli --scan --pattern 'abc:*' | xargs redis-cli del
# but this can be very slow if you have lots of data (like an 8G Redis cluster).
# We can use the following script to remove keys (considerably faster).
import time
import logging
from rediscluster import StrictRedisCluster

logger = logging.getLogger(__name__)

# NOTE: hosts and password are not defined in this snippet; supply your own
# cluster startup nodes and credentials before running.
client = StrictRedisCluster(startup_nodes=hosts, password=password,
                            skip_full_coverage_check=True)

pattern = "abc:*"
start_time = time.time()
item_count = 0
batch_size = 100000
keys = []

logger.info("Start scanning keys...")
# Scan with a large count and delete one full batch at a time.
for k in client.scan_iter(pattern, count=batch_size):
    keys.append(k)
    if len(keys) >= batch_size:
        item_count += len(keys)
        logger.info("batch delete to {} ...".format(item_count))
        client.delete(*keys)
        keys = []

# Delete whatever is left over after the last full batch.
if len(keys) > 0:
    item_count += len(keys)
    logger.info("batch delete to {}".format(item_count))
    client.delete(*keys)

end_time = time.time()
# time.time() returns seconds, so multiply by 1000 to report milliseconds.
elapsed_ms = (end_time - start_time) * 1000.0
logger.info("deleted {0} keys in {1:0.3f} ms.".format(item_count, elapsed_ms))
Or you could just change `xargs` to `xargs -n500 -P10`, which deletes 500 keys per invocation and runs 10 clients in parallel.
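For readers who stay in Python, the same idea (fixed-size chunks, several workers) can be sketched roughly as below. This is only an illustration, not something from the thread: `chunked` and `parallel_delete` are hypothetical helper names, and `delete_fn` is assumed to be something like a non-cluster client's `delete` method that is safe to call from several threads and returns the number of keys removed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def parallel_delete(delete_fn, keys, chunk_size=500, workers=10):
    """Call delete_fn(*chunk) for each chunk of keys using a thread pool.

    delete_fn is a hypothetical callable (for example client.delete) that
    returns how many keys it removed. Returns the sum of those results.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map submits every chunk up front, so all chunks are
        # materialized in memory before the workers finish.
        results = pool.map(lambda chunk: delete_fn(*chunk),
                           chunked(keys, chunk_size))
        return sum(results)
```

The defaults mirror `-n500 -P10`. As with the `xargs` variant, each call is a multi-key delete, so the same cluster-mode caveats would apply.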
Thanks for this gist. This is much faster than not using `count=batch_size`. Saves me a lot of time.
@andsens I don't think this works with Cluster Mode enabled right? I was getting CROSSSLOT errors
@DaveLanday hm, no, that would fail. A multi-key operation (500 keys in this case), if I understand cluster mode correctly (never used it), has to operate on keys in the same hash slot.
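To make the CROSSSLOT point concrete, here is a small sketch of how keys could be grouped by hash slot before issuing multi-key deletes. It assumes the slot mapping described in the Redis Cluster specification (CRC16/XMODEM of the key, modulo 16384, with `{...}` hash tags); `crc16`, `key_slot` and `group_by_slot` are hypothetical helper names written for illustration, not part of any client library.

```python
from collections import defaultdict


def crc16(data: bytes) -> int:
    """CRC16/XMODEM (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key) -> int:
    """Return the cluster hash slot (0..16383) for a key."""
    if isinstance(key, str):
        key = key.encode("utf-8")
    # Hash tag rule: if the key contains "{...}" with a non-empty body,
    # only the part between the first "{" and the next "}" is hashed.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key) % 16384


def group_by_slot(keys):
    """Group keys so that each group only contains keys from one slot."""
    groups = defaultdict(list)
    for k in keys:
        groups[key_slot(k)].append(k)
    return dict(groups)
```

Each group could then be passed to a single delete call without crossing slots. With random key names most groups will be small, so this helps mainly when keys share a hash tag.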
The key difference is the larger scan count, which here equals the batch size. Scanning with a small count is extremely slow (redis-cli --scan does not provide an argument to specify it, and I suppose it uses a small default value). Since I am already deleting 100,000 keys in a single delete command, pipelining would not make much difference here, I think.
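A rough back-of-the-envelope calculation shows why the count matters. The sketch below assumes each SCAN call examines about COUNT keys (COUNT is only a hint to the server, and its documented default is 10), and uses a made-up keyspace of 10 million keys; `scan_round_trips` is a hypothetical helper name.

```python
import math


def scan_round_trips(total_keys, count):
    """Approximate number of SCAN calls needed to walk total_keys keys."""
    return math.ceil(total_keys / count)


total = 10_000_000  # assumed keyspace size, for illustration only
print(scan_round_trips(total, 10))       # 1000000 calls with COUNT 10
print(scan_round_trips(total, 100_000))  # 100 calls with COUNT 100000
```

Under these assumptions the larger count cuts the number of network round trips by four orders of magnitude, which is consistent with the speedup described above.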
To do this in one round trip, you have to download all the keys to your local machine and then send them back to the Redis server. If the data is large, this can cause problems (for example, what if your local machine runs out of memory, or starts swapping?). So I prefer to use batches.