@cpcloud
Last active October 9, 2017 14:42
Sparkimalz
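from pyspark.sql import Row, SparkSession

# Reuse the active session (e.g. the pyspark shell's `spark`) or start one.
spark = SparkSession.builder.getOrCreate()
# Write standard (non-legacy) Parquet decimals, uncompressed.
spark.conf.set('spark.sql.parquet.writeLegacyFormat', 'false')
spark.conf.set('spark.sql.parquet.compression.codec', 'uncompressed')
sc = spark.sparkContext
# Single-column DataFrame holding the integers 1 through 99.
df = spark.createDataFrame(
    sc.parallelize(range(1, 100)).map(lambda i: Row(value=i))
)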
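# Precision <= 9: decimal is stored as INT32.
df.select(df.value.cast('decimal(4, 2)')).write.parquet('/mounted/int32_decimal.parquet')
# Precision 10 to 18: decimal is stored as INT64.
df.select(df.value.cast('decimal(10, 2)')).write.parquet('/mounted/int64_decimal.parquet')
# Precision > 18: decimal is stored as FIXED_LEN_BYTE_ARRAY.
df.select(df.value.cast('decimal(25, 2)')).write.parquet('/mounted/fixed_length_decimal.parquet')
# Legacy mode: decimals are stored as FIXED_LEN_BYTE_ARRAY regardless of precision.
spark.conf.set('spark.sql.parquet.writeLegacyFormat', 'true')
df.select(df.value.cast('decimal(13, 2)')).write.parquet('/mounted/fixed_length_decimal_legacy.parquet')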
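
The file names above reflect how Spark picks a Parquet physical type from the decimal precision. The sketch below restates that rule in plain Python so the four cases can be checked without a cluster. It is an illustration based on my reading of Spark's Parquet schema converter, not Spark code: `min_bytes_for_precision` and `parquet_physical_type` are hypothetical helper names, and the thresholds (9 digits for INT32, 18 for INT64) are assumptions to verify against the Spark version in use.

```python
def min_bytes_for_precision(precision):
    """Smallest byte width whose signed range holds 10**precision (assumed rule)."""
    num_bytes = 1
    while 2 ** (8 * num_bytes - 1) < 10 ** precision:
        num_bytes += 1
    return num_bytes


def parquet_physical_type(precision, legacy=False):
    """Hypothetical helper: physical type expected for decimal(precision, _)."""
    if not legacy and precision <= 9:
        return 'INT32'
    if not legacy and precision <= 18:
        return 'INT64'
    # Legacy mode, or precision too wide for a 64-bit integer.
    return 'FIXED_LEN_BYTE_ARRAY'


# The four writes in the script, in order.
for precision, legacy in [(4, False), (10, False), (25, False), (13, True)]:
    print(precision, legacy, parquet_physical_type(precision, legacy),
          min_bytes_for_precision(precision))
```

The byte width only matters for the FIXED_LEN_BYTE_ARRAY cases; for the integer-backed types it is printed for comparison only.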