
@asford
Last active January 29, 2021 04:52
Pandas Error Repro
channels:
- conda-forge
dependencies:
- pandas=1.2.1
- dask=2021.01.1
- pytest
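The failure depends on these exact versions, so it can help to confirm that the active environment actually has them before running the tests. Below is a minimal sketch that uses only the standard library (`importlib.metadata`, Python 3.8+). `check_pins` and `PINNED` are hypothetical names for illustration and are not part of the original gist.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the conda environment above.
PINNED = {"pandas": "1.2.1", "dask": "2021.01.1"}


def check_pins(pins=PINNED):
    """Return {package: (pinned, installed)} for every package that differs.

    A package that is not installed is reported with installed=None.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    print(check_pins() or "environment matches the pins")
```

An empty result means the installed pandas and dask match the pins.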
import pandas as pd
import dask.dataframe as ddf

# Both columns are categorical and declare categories that never occur in
# the data ("c" for column "a"; "bar", "bat", "bazz" for column "b").
test_frame = pd.DataFrame.from_dict(
    {
        "a": ["a", "b"],
        "b": ["foo", "foo"],
    }
).astype(
    dtype={
        "a": pd.CategoricalDtype(["a", "b", "c"]),
        "b": pd.CategoricalDtype(["foo", "bar", "bat", "bazz"]),
    }
)

d_test_frame = ddf.from_pandas(test_frame, npartitions=1)
def test_pandas_observed():
    test_frame.groupby("a", observed=True)["b"].value_counts()


def test_pandas_default():
    # Errors out, potentially related to
    # https://github.com/pandas-dev/pandas/issues/36698
    test_frame.groupby("a", observed=False)["b"].value_counts()


def test_dask_observed():
    # .compute() makes dask execute the lazy graph, not just build it.
    d_test_frame.groupby("a", observed=True)["b"].value_counts().compute()


def test_dask_default():
    d_test_frame.groupby("a", observed=False)["b"].value_counts().compute()
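The `observed` flag only matters here because both columns carry categories that have no rows. The sketch below is a hypothetical standalone script and is not part of the original repro. It lists those unobserved categories and shows one possible workaround when the goal is per-group counts that include empty categories: group on both columns with `observed=False` and call `size()`. This avoids `SeriesGroupBy.value_counts` entirely. It has not been checked against the pinned pandas 1.2.1. Unlike `value_counts`, the result is ordered by category rather than sorted by count.

```python
import pandas as pd

# Same frame as in the repro above.
test_frame = pd.DataFrame.from_dict(
    {"a": ["a", "b"], "b": ["foo", "foo"]}
).astype(
    dtype={
        "a": pd.CategoricalDtype(["a", "b", "c"]),
        "b": pd.CategoricalDtype(["foo", "bar", "bat", "bazz"]),
    }
)


def unobserved_categories(frame):
    """Map each categorical column to the categories that have no rows."""
    return {
        col: set(frame[col].cat.categories) - set(frame[col].dropna())
        for col in frame.columns
        if isinstance(frame[col].dtype, pd.CategoricalDtype)
    }


# Grouping on both categorical columns with observed=False yields one row per
# combination of categories (3 x 4 = 12), with 0 for combinations never seen.
counts = test_frame.groupby(["a", "b"], observed=False).size()

if __name__ == "__main__":
    print(unobserved_categories(test_frame))
    print(counts)
```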