@esafwan
Last active March 21, 2024 17:24
Python zlib example: compressing with a custom preset dictionary (zdict), with and without Unicode data.
import zlib
# Data to compress (the same bytes also serve as the preset dictionary below)
hello = b'hello'
# Compress with a preset dictionary (negative wbits = raw deflate, no header or checksum)
co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=hello)
compress_data = co.compress(hello) + co.flush()
# Compress without a dictionary, for comparison
co_nodict = zlib.compressobj(wbits=-zlib.MAX_WBITS)
compress_data_nodict = co_nodict.compress(hello) + co_nodict.flush()
# Decompress; the dictionary must be the same one used for compression
do = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=hello)
data = do.decompress(compress_data)
# Compressed output when the dictionary is used
print(compress_data)
print('\n')
# Compressed output when no dictionary is used
print(compress_data_nodict)
print('\n')
# Decompressed output; should equal the original bytes
print(data)
# UNICODE EXAMPLE
import zlib
# Data to compress: zlib works on bytes, so encode the text first
unicode_data = 'റെക്കോർഡ്'
encoded = unicode_data.encode('utf-16be')
# Compress with a preset dictionary (negative wbits = raw deflate, no header or checksum)
co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=encoded)
compress_data = co.compress(encoded) + co.flush()
# Compress without a dictionary, for comparison
co_nodict = zlib.compressobj(wbits=-zlib.MAX_WBITS)
compress_data_nodict = co_nodict.compress(encoded) + co_nodict.flush()
# Decompress; the dictionary must be the same one used for compression
do = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=encoded)
data = do.decompress(compress_data)
# Compressed output when the dictionary is used
print(compress_data)
print('\n')
# Compressed output when no dictionary is used
print(compress_data_nodict)
print('\n')
# Decompressed output (UTF-16BE bytes; call data.decode('utf-16be') to get the text back)
print(data)
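
The compress and decompress steps above can be wrapped in two small helpers. This is a minimal sketch, not part of the original gist: the names `deflate_with_dict` and `inflate_with_dict` are hypothetical, and it assumes the same raw-deflate settings (`wbits=-zlib.MAX_WBITS`) used above. It also decodes the result back to text to confirm the Unicode round trip.

```python
import zlib


def deflate_with_dict(payload: bytes, zdict: bytes) -> bytes:
    """Raw-deflate payload using zdict as the preset dictionary (hypothetical helper)."""
    co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=zdict)
    return co.compress(payload) + co.flush()


def inflate_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Inverse of deflate_with_dict; zdict must match the one used to compress."""
    do = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=zdict)
    return do.decompress(blob) + do.flush()


text = 'റെക്കോർഡ്'
raw = text.encode('utf-16be')

# Same setup as the gist: the data doubles as its own dictionary.
blob = deflate_with_dict(raw, raw)
restored = inflate_with_dict(blob, raw).decode('utf-16be')

print(restored == text)
```

Using the data as its own dictionary is only a demonstration; in practice the dictionary would normally be a separate sample that both sides already hold.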
ilyazub commented Mar 21, 2024

Thank you. It's super useful.

Thanks to your example, we at SerpApi found out that zlib with a custom dictionary compressed data even more than zstd with a custom dictionary. (Checked with the small JSON for Sidekiq scheduled jobs.)
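
A rough way to see the dictionary's effect on small JSON payloads with zlib alone is sketched below. The job fields are made up for illustration and are not SerpApi's actual data, `raw_deflate` is a hypothetical helper, and no zstd comparison is attempted here. The dictionary is built from one sample job and then used to compress a similar one.

```python
import json
import zlib

# Hypothetical sample jobs; the field names and values are illustrative only.
sample_job = {"class": "ExampleWorker", "queue": "default", "retry": True, "args": [1, "a"]}
new_job = {"class": "ExampleWorker", "queue": "default", "retry": True, "args": [42, "b"]}

zdict = json.dumps(sample_job).encode("utf-8")
payload = json.dumps(new_job).encode("utf-8")


def raw_deflate(data, zdict=None):
    """Raw-deflate data, optionally with a preset dictionary (hypothetical helper)."""
    if zdict is None:
        co = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    else:
        co = zlib.compressobj(wbits=-zlib.MAX_WBITS, zdict=zdict)
    return co.compress(data) + co.flush()


with_dict = raw_deflate(payload, zdict)
without_dict = raw_deflate(payload)

# Confirm the dictionary-compressed payload round-trips before comparing sizes.
do = zlib.decompressobj(wbits=-zlib.MAX_WBITS, zdict=zdict)
assert do.decompress(with_dict) + do.flush() == payload

print("original bytes:    ", len(payload))
print("without dictionary:", len(without_dict))
print("with dictionary:   ", len(with_dict))
```

The gain depends on how closely the dictionary resembles the payloads, so results on real data may differ from this toy case.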
