@jdthorpe
Last active November 16, 2018 00:46
A little function for generating parameters for pandas' read_csv from an R data.frame
dtype <- function(s, numpy="np", fields=names(s), pretty=TRUE){
    # Take the first class of each requested column, so that `classes` lines up
    # with `fields` and multi-class columns still yield a single string.
    classes <- vapply(s[fields], function(x) class(x)[1], character(1))
    dtypes <- character()
    parse_dates <- character()
    for(i in seq_along(fields)){
        cls <- classes[i]
        fld <- fields[i]
        if(cls == "numeric"){
            dtypes[i] <- sprintf("%s.float64", numpy)
        }else if(cls == "integer"){
            dtypes[i] <- sprintf("%s.int64", numpy)
        }else if(cls == "factor"){
            dtypes[i] <- "str"
        }else if(cls == "character"){
            dtypes[i] <- "str"
        }else if(cls == "logical"){
            # np.bool_ rather than np.bool, which some NumPy releases lack
            dtypes[i] <- sprintf("%s.bool_", numpy)
        }else if(cls == "Date"){
            # Read dates as strings and let pandas convert them via parse_dates
            dtypes[i] <- "str"
            parse_dates[length(parse_dates) + 1] <- fld
        }else{
            stop(sprintf("Don't know how to handle field %s of class %s", fld, cls))
        }
    }
    # Separators placed before the first entry and after every entry
    if(pretty){
        start <- "\n    "
        end <- "\n    "
    }else{
        start <- ""
        end <- " "
    }
    out <- sprintf("\ndtype={%s%s},\nparse_dates=[%s%s],\n",
        start, paste0(sprintf('"%s":%s,%s', fields, dtypes, end), collapse = ""),
        start, paste0(sprintf('"%s",%s', parse_dates, end), collapse = ""))
    # Print the snippet for copy-pasting and return it invisibly
    cat(out)
    invisible(out)
}

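When you've got a data.frame in R and want to write it to CSV and then read it into a pandas DataFrame in Python, lazy code like this leaves the column types to pandas' inference, which can misread string fields (for example, codes made of digits) and does not parse date fields: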

<some_file.R>
write.csv(my_data_frame, "my_data.csv", row.names=FALSE)
<another_file.py>
import pandas as pd
my_data = pd.read_csv("my_data.csv")
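To make the problem concrete, here is a small sketch of what the default inference does. The CSV text is a made-up stand-in for a file written by write.csv(), and the "zip" column is a hypothetical string field chosen because it looks numeric.

```python
import io

import pandas as pd

# Made-up stand-in for a file produced by write.csv(..., row.names=FALSE).
# "zip" is a hypothetical string column whose values happen to look numeric.
csv_text = '"widgets","zip","today"\n5,"00123","2018-11-16"\n'

# No dtype or parse_dates: pandas infers every column type on its own.
naive = pd.read_csv(io.StringIO(csv_text))

print(naive.dtypes)     # "today" is not a datetime column
print(naive["zip"][0])  # the leading zeros of "00123" are gone
```

The quoted "00123" is still read as a number, and the date column stays as plain text.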

The solution is of course to pass the dtype and parse_dates parameters to pd.read_csv(). This little function generates these parameters for you. For example:

> DF = data.frame(widgets=5L,size="large",today=Sys.Date())
> dtype(DF)

dtype={
    "widgets":np.int64,
    "size":str,
    "today":str,
    },
parse_dates=[
    "today",
    ],
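The printed snippet is meant to be pasted straight into the pd.read_csv() call. A minimal sketch of the Python side, assuming numpy is imported as np (the function's default prefix) and using an in-memory stand-in for my_data.csv with a made-up date value:

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the file written by
# write.csv(DF, "my_data.csv", row.names=FALSE); the date is made up.
csv_text = '"widgets","size","today"\n5,"large","2018-11-16"\n'

my_data = pd.read_csv(
    io.StringIO(csv_text),
    # The two arguments below are pasted from the output of dtype(DF).
    dtype={
        "widgets": np.int64,
        "size": str,
        "today": str,
    },
    parse_dates=[
        "today",
    ],
)

print(my_data.dtypes)
```

The trailing commas that the function emits are valid Python, so the pasted text needs no cleanup.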