Gist hrbrmstr/7707116, last active December 29, 2015 17:49
A response to the following LinkedIn question about how to read a semi-ugly, semi-large file into R: https://www.linkedin.com/groups/Read-data-in-R-Hello-4066593.S.5812033094177271812
# install.packages("data.table") # if you don't have it installed
library(data.table)
# download the file first since it's *huge* and we don't want to have to
# re-download it every time we work with it. comment this out after you
# read it in once or put an "if exists" wrapper around it to avoid
# re-downloading it from an errant script run. this assumes you have
# a "data" directory under your home directory; change the destination
# as appropriate
download.file("http://fimi.ua.ac.be/data/accidents.dat", "~/data/accidents.dat")
# it's a wretched file format (rows have different numbers of fields), so we
# first need the maximum number of fields in any row, since read.table (et al.)
# only scans the first 5 lines to work out the column count
max.fields <- max(count.fields("~/data/accidents.dat", sep=" "))
# now read it in (you can use better column names if you want) and
# use a data.table since it will really help speed up further
# operations on this data
accidents.df <- data.table(read.table("~/data/accidents.dat",
                                      sep=" ", header=FALSE, fill=TRUE,
                                      col.names=1:max.fields))
# take a look at the data
summary(accidents.df)
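The script's comments suggest putting an "if exists" wrapper around the download so an errant re-run does not fetch the large file again. A minimal sketch of that idea follows; `download_if_missing` is a hypothetical helper name (not part of the original gist), and it only calls base R's `download.file()` when the destination file is absent.

```r
# Hypothetical helper, not from the original gist: fetch `url` into `dest`
# only when `dest` does not already exist on disk.
download_if_missing <- function(url, dest) {
  dest <- path.expand(dest)  # turn "~/data/..." into an absolute path
  if (!file.exists(dest)) {
    # create the target directory (e.g. ~/data) if it is not there yet
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    # mode = "wb" avoids newline translation on Windows
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Example call (commented out so that sourcing this block does not hit the network):
# download_if_missing("http://fimi.ua.ac.be/data/accidents.dat",
#                     "~/data/accidents.dat")
```

Returning the expanded path invisibly lets the call feed straight into `count.fields()` or `read.table()` if you prefer not to repeat the file name.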
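To see why the `count.fields()` step matters, here is a small sketch on a made-up six-line file (the toy data is an assumption, not the accidents data) whose only long row sits past the first five lines. Passing a `col.names` vector as long as the widest row makes `read.table()` allocate enough columns, and `fill = TRUE` pads the shorter rows with `NA`.

```r
# Toy ragged file: lines 1-5 have 2 fields, line 6 has 4 fields,
# so a column count guessed from the first five lines would be too small.
tmp <- tempfile(fileext = ".dat")
writeLines(c("1 2", "3 4", "5 6", "7 8", "9 10", "11 12 13 14"), tmp)

# Widest row across the whole file, not just the first five lines
max.fields <- max(count.fields(tmp, sep = " "))

# col.names of length max.fields forces the full width;
# fill = TRUE turns the missing trailing fields into NA
ragged <- read.table(tmp, sep = " ", header = FALSE, fill = TRUE,
                     col.names = paste0("V", seq_len(max.fields)))
print(ragged)
```

The sketch uses `paste0("V", ...)` names instead of the gist's `1:max.fields`; with the default `check.names = TRUE`, `read.table()` would turn purely numeric names into `X1`, `X2`, and so on.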