Last active December 30, 2015 08:39
Gist epijim/7804470: Load in data from data.police.uk
setwd("Directory where the data is saved")
library(plyr) # used to read in the CSVs (ldply)
library(psych) # has the describe function
################
# import data #
################
# Data is in monthly CSVs, so loop over them, pulling in the data
# CAREFUL: this will get all .csv names in the directory
# (pattern is a regular expression, so anchor it to the file extension)
temp_filenames <- list.files(pattern = "\\.csv$")
# Now we have a list of the CSV names; read them into one data frame
crime_data <- ldply(temp_filenames, read.csv)
# Could do it in full like below, but I prefer the shorter plyr method above
# crime_data <- do.call("rbind", lapply(temp_filenames, read.csv, header = TRUE))
# take a gander!
head(crime_data)
summary(crime_data)
describe(crime_data)
str(crime_data)
# So 228,949 crimes are in the database.
# It seems most of the crime ID data came in later and ties into the separate
# outcome data. Same with context. For now I'll focus on location,
# time and type of crime.
# DATA ISSUES: Time is a factor variable.
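The import step uses `list.files()`, whose `pattern` argument is a regular expression rather than a shell glob, so a glob-style `"*.csv"` is not anchored to the end of the file name. Below is a minimal base-R sketch of the same import, assuming you also want to know which monthly file each row came from. It anchors the pattern with `"\\.csv$"` and tags each row with a hypothetical `source_file` column. The demo files (names and contents) are toy data written to a temporary directory, not real data.police.uk extracts.

```r
# Build a throwaway directory with toy files (made-up data, for illustration only).
demo_dir <- tempfile("crime_demo_")
dir.create(demo_dir)
write.csv(data.frame(Month = "2013-01", Crime.type = "Burglary"),
          file.path(demo_dir, "2013-01-street.csv"), row.names = FALSE)
write.csv(data.frame(Month = "2013-02", Crime.type = "Vehicle crime"),
          file.path(demo_dir, "2013-02-street.csv"), row.names = FALSE)
writeLines("not a csv", file.path(demo_dir, "notes.txt"))
writeLines("stale copy", file.path(demo_dir, "old.csv.bak"))

# Regex anchored at the end: only names ending in ".csv" are returned.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read one file and record where its rows came from (hypothetical column name).
read_one <- function(path) {
  df <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
  df$source_file <- basename(path)
  df
}

# Base-R equivalent of the plyr one-liner: read each file, then stack the rows.
crime_demo <- do.call(rbind, lapply(files, read_one))
str(crime_demo)
```

With plyr loaded, `ldply(files, read_one)` should give the same stacked data frame; the base-R form just avoids the extra dependency.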
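The script ends by noting that time comes in as a factor. In R versions before 4.0, `read.csv()` defaulted to `stringsAsFactors = TRUE`, so text columns became factors. Below is a minimal sketch of one fix. It assumes the time field is a `Month` column holding `"YYYY-MM"` strings, which you should confirm with `str(crime_data)`. The approach goes through character and appends a day so `as.Date()` can parse the value. The data frame is toy data standing in for `crime_data`.

```r
# Toy stand-in for crime_data (made-up rows, for illustration only).
# Assumption: the time column is "Month", stored as a "YYYY-MM" factor.
crime_toy <- data.frame(
  Month = factor(c("2013-01", "2013-01", "2013-02")),
  Crime.type = factor(c("Burglary", "Vehicle crime", "Burglary"))
)

# factor -> character first, so we work with the labels, not the integer codes;
# as.Date() needs a full date, so pin every month to its first day.
crime_toy$Month_date <- as.Date(paste0(as.character(crime_toy$Month), "-01"))

# Crimes per month, now keyed by a real Date that sorts and plots correctly.
monthly_counts <- table(crime_toy$Month_date)
print(monthly_counts)
```

Converting via `as.character()` matters: calling `as.numeric()` or similar directly on a factor returns its internal level codes rather than the dates.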