@bkj
Created January 10, 2018 16:12
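# pyspark -- partition by key
def partition_by_key(x):
    # Collect the distinct keys of the pair RDD to the driver.
    key_lookup = x.keys().distinct().collect()
    # Map each distinct key to its own partition index.
    key_lookup = dict(zip(key_lookup, range(len(key_lookup))))
    # One partition per distinct key; partitionFunc returns the key's index.
    return x.partitionBy(len(key_lookup), partitionFunc=lambda k: key_lookup[k])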
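The mapping that the function builds can be illustrated without a Spark cluster. The following is only a plain-Python sketch of the idea, not PySpark code: `assign_partitions`, `pairs`, and `buckets` are hypothetical names, and the first-seen key ordering is an assumption made here for reproducibility (the order returned by `distinct().collect()` in Spark is not guaranteed).

```python
# Plain-Python sketch (no Spark required) of the key -> partition mapping.
# `assign_partitions`, `pairs`, and `buckets` are hypothetical names.

def assign_partitions(pairs):
    # Give each distinct key its own index, in first-seen order (assumption).
    key_lookup = {}
    for key, _ in pairs:
        key_lookup.setdefault(key, len(key_lookup))
    num_partitions = len(key_lookup)

    # One bucket per distinct key, mirroring one partition per key.
    buckets = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        # PySpark applies partitionFunc(k) % numPartitions; the indices here
        # are already in range, so the modulo leaves them unchanged.
        buckets[key_lookup[key] % num_partitions].append((key, value))
    return buckets


pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
buckets = assign_partitions(pairs)
```

Because the number of buckets equals the number of distinct keys, every bucket ends up holding records for exactly one key. In the Spark version the same property comes at the cost of collecting all distinct keys to the driver, so it is presumably best suited to RDDs with a modest number of keys.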