@elowy01
Last active November 22, 2019 13:40
# Reading multiple tab-separated .csv files that match a glob pattern, e.g.:
# files/input.cova
# files/input.covb
# .....
import dask.dataframe as dd
df = dd.read_csv('files/input.cov*', names=['chr','pos','cov'], sep='\t')
print("Descriptors: {0}".format(df['cov'].describe().compute()))
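# To preview which files a pattern such as 'files/input.cov*' would pick up before
# handing it to Dask, here is a minimal standard-library sketch. The temporary
# directory, the file names and the matching_files helper are made up for
# illustration; Dask resolves the glob on its own, so this is only a sanity check.

```python
import glob
import os
import tempfile


def matching_files(pattern):
    """Return the sorted list of paths matched by a glob pattern."""
    return sorted(glob.glob(pattern))


# Hypothetical demo data: two tab-separated coverage files plus an unrelated file.
demo_dir = tempfile.mkdtemp()
for name in ("input.cova", "input.covb", "other.txt"):
    with open(os.path.join(demo_dir, name), "w") as fh:
        fh.write("chr1\t100\t30\n")

# Only the files whose names start with 'input.cov' should be listed.
matched = matching_files(os.path.join(demo_dir, "input.cov*"))
print([os.path.basename(p) for p in matched])
```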
//
# Operation on a single .csv that will be partitioned by Dask:
import dask.dataframe as dd
df = dd.read_csv('files/out.cov', names=['chr','pos','cov'], sep='\t', blocksize=34000000) # blocksize sets the target size of each partition in bytes (here about 34 MB).
print("Descriptors: {0}".format(df['cov'].describe().compute()))
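# To get a feel for how blocksize translates into partitions, a rough back-of-the-envelope
# sketch follows. It assumes the partition count is about file size / blocksize; the
# estimate_partitions helper and the 1 GB file size are hypothetical, and the exact
# count Dask produces can differ (it is available as df.npartitions).

```python
import math


def estimate_partitions(file_size_bytes, blocksize_bytes):
    """Rough estimate only: file size divided by blocksize, rounded up."""
    if blocksize_bytes <= 0:
        raise ValueError("blocksize_bytes must be positive")
    return max(1, math.ceil(file_size_bytes / blocksize_bytes))


# Hypothetical 1 GB file with the blocksize used above (34,000,000 bytes).
print(estimate_partitions(1_000_000_000, 34_000_000))
```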