@tim-salabim
Created February 15, 2023 16:22
snappy decompression issue
---
title: "snappy decompression issue"
---
```{r}
#| message: false
library(reticulate)
library(ggplot2)
library(arrow)
dat = iris
```
```{python}
import pandas as pd
import pyarrow as pa
import base64
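# Serialize a pandas DataFrame to an Arrow IPC buffer, compress it with the
# given codec, and return the result as a base64 string.
def b64ArrowEncode(data, codec):
    ar_dat = pa.ipc.serialize_pandas(data, preserve_index=False)
    ar_dat = codec.compress(ar_dat)
    b64 = base64.b64encode(ar_dat).decode('utf-8')
    return b64

# "lz4" in pyarrow is the LZ4 frame format ("lz4_frame" on the R side).
ar_lz4 = b64ArrowEncode(r.dat, pa.Codec("lz4"))
ar_snappy = b64ArrowEncode(r.dat, pa.Codec("snappy"))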
```
```{r}
b64ArrowDecode = function(b64_string, codec) {
  ar_dat = enc2utf8(b64_string)
  ar_dat = base64enc::base64decode(ar_dat)
  buf = arrow::BufferReader$create(ar_dat)
  reader = arrow::CompressedInputStream$create(
    buf
    , codec = codec
  )
  dat = reader$Read(buf$GetSize() * 20) # how to get proper size estimate?
  df = arrow::read_ipc_stream(dat$data(), as_data_frame = TRUE)
  return(df)
}
py$ar_lz4
py$ar_snappy
b64ArrowDecode(py$ar_lz4, arrow::Codec$create("lz4_frame"))
# b64ArrowDecode(py$ar_snappy, arrow::Codec$create("snappy"))
tryCatch(
  b64ArrowDecode(py$ar_snappy, arrow::Codec$create("snappy"))
  , error = function(e) print(e)
)
```
@tim-salabim
Author

<simpleError: NotImplemented: Streaming decompression unsupported with Snappy>
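
The error suggests that Snappy can only be decompressed in one shot, from a complete buffer, and not through a stream reader. One-shot decompression generally needs the uncompressed size up front, which is also the open question in the `reader$Read(...)` line. One possible workaround is to ship that size with the payload. The sketch below shows only the framing idea, in plain Python: `zlib` stands in for the codec so the example runs without pyarrow, and the helper names `encode_with_size` and `decode_with_size` and the 8-byte header layout are hypothetical, not part of the gist or of Arrow.

```python
import base64
import struct
import zlib  # stand-in for a one-shot codec such as Snappy (assumption)

# Hypothetical framing: 8-byte little-endian unsigned length, then the
# compressed bytes. Not an Arrow or Snappy format.
HEADER = struct.Struct("<Q")


def encode_with_size(payload: bytes) -> str:
    """Compress payload, prepend its uncompressed size, base64-encode."""
    compressed = zlib.compress(payload)
    framed = HEADER.pack(len(payload)) + compressed
    return base64.b64encode(framed).decode("utf-8")


def decode_with_size(b64_string: str) -> bytes:
    """Read the size header, then decompress the remainder in one call."""
    framed = base64.b64decode(b64_string)
    (size,) = HEADER.unpack_from(framed, 0)
    # A codec that requires the output size would receive `size` here;
    # zlib does not need it, so it is only used as a consistency check.
    payload = zlib.decompress(framed[HEADER.size:])
    if len(payload) != size:
        raise ValueError("size header does not match decompressed length")
    return payload


original = b"sepal_length,sepal_width\n5.1,3.5\n" * 100
encoded = encode_with_size(original)
restored = decode_with_size(encoded)
```

With this kind of framing the decoder never has to guess a read size such as `buf$GetSize() * 20`. Whether the R arrow package exposes a matching one-shot decompression call for Snappy would still need to be checked.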
