Skip to content

Instantly share code, notes, and snippets.

@stephlocke
Last active August 29, 2015 14:09
Show Gist options
  • Save stephlocke/dc638351cfdf683b3746 to your computer and use it in GitHub Desktop.
Save stephlocke/dc638351cfdf683b3746 to your computer and use it in GitHub Desktop.
Overcoming file limitations and frustrations of using Excel & data import wizard
# Source of origpostcodedata data: https://geoportal.statistics.gov.uk/geoportal/catalog/search/resource/details.page?uuid=%7BD14C75CE-95C6-4E9F-8753-ECDD88BC2B7D%7D
# Source of postcodefiles data:https://www.ordnancesurvey.co.uk/opendatadownload/products.html
library(foreach)
library(data.table)
setwd("~")
postcodefiles<-dir("../Downloads/codepo_gb/Data//CSV/",full.names = TRUE)
# Slim load of 2.5m records into memory with f(ast)Read function
origpostcodedata<-fread("../Downloads/ONSPD_MAY_2013_csv/Data//ONSPD_MAY_2013_UK_O.csv",
select=c("pcd","pcd2","pcds","dointr",
"doterm","oscty","oslaua","osward",
"usertype","ctry"))
# Specifying PK
setkey(origpostcodedata,pcd)
# Loop through OS files and compile into 1 dataset
newpostcodedata<-foreach(f=postcodefiles,.combine = rbind ) %do% {
fread(f,header = FALSE)
}
# Specifying PK
setkey(newpostcodedata,V1)
# Not exists join, and column aliasing/reduction to conform to titles needed for upload
newupload<-newpostcodedata[!origpostcodedata,.(
pcd=V1,ctry=V5,oslaua=V9,osward=V10)]
# Join, changed value detection on important column, and column aliasing/reduction to conform to titles needed for upload
modupload<-newpostcodedata[origpostcodedata][V9!=oslaua,.(
pcd=V1,ctry=V5,oslaua=V9,osward=V10)]
# Output for upload
write.csv(rbind(newupload,modupload),"trimpostcode20141127.csv",row.names=FALSE)
@stephlocke
Copy link
Author

Just pulled in OS based data to get an update into our DW, very neat how quickly all this could be achieved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment