Last active
August 29, 2015 14:05
don't use head and tail … always be sampling from the middle of the data.frame
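taste <- function(soup, ladle=5L) sample.int(n=soup, size=ladle, replace=TRUE)   #draw `ladle` row numbers from 1..soup; sample.int's first argument is `n`, and replace=TRUE means a row can come up twice
peek <- function(df,n=5L) df[ taste(nrow(df),n) , ]     #dataframe[r,c] means "subset of dataframe row #r, column #c"
p <- function(df,n=5L) rbind(head(df,1L), peek(df,n), tail(df,1L))     #first row, then n random rows, then last row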
#SAMPLE OUTPUT
require(bigvis)
data(movies)
dim(movies)
#[1] 130456     14
#so movies is too long to look at.
#yet if I always peek at it with head(movies) I will get bored,
#and be wasting the opportunity to gradually get to know my dataset.
peek(movies)
#                                  title year length budget rating votes mpaa
#109844                      Liberty Kid 2007     92 200000    6.2   126 <NA>
#30142  Due mattacchioni al Moulin Rouge 1964     90     NA    3.7     9 <NA>
#87158                  Nunca es domingo 2002     19     NA    4.4    16 <NA>
#122636               Turning Point 1977 2009    110     NA    6.7     6 <NA>
#124824                         Hooters! 2010     90  62250    7.0     5 <NA>
#
#       Action Animation Comedy Drama Documentary Romance Short
#109844  FALSE     FALSE  FALSE FALSE       FALSE   FALSE FALSE
#30142   FALSE     FALSE  FALSE FALSE       FALSE   FALSE FALSE
#87158   FALSE     FALSE  FALSE FALSE       FALSE   FALSE FALSE
#122636  FALSE     FALSE  FALSE FALSE       FALSE   FALSE FALSE
#124824  FALSE     FALSE  FALSE FALSE       FALSE   FALSE FALSE
#TODO:
# - contiguous random pieces
# - deal with other shapes like lists
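
One possible take on the first TODO item ("contiguous random pieces"), offered only as a sketch. The name `chunk` and its clamping behaviour are assumptions made for illustration, not part of the original gist. It picks a random starting row and returns the `n` rows that follow it, so you see neighbouring records together rather than scattered ones.

```r
# Hypothetical helper (not in the original gist):
# return `n` consecutive rows of `df`, starting at a random position.
chunk <- function(df, n = 5L) {
  rows <- nrow(df)
  n <- min(n, rows)                               # never ask for more rows than exist
  if (n < 1L) return(df[0L, , drop = FALSE])      # empty input: return zero rows
  start <- sample.int(rows - n + 1L, size = 1L)   # random first row of the window
  df[seq.int(start, length.out = n), , drop = FALSE]
}

# Usage on a dataset that ships with base R:
chunk(mtcars, 3L)
```

Unlike `peek`, the row numbers here are drawn once and then extended, so the result never repeats a row. `drop = FALSE` keeps the result a data.frame even when `df` has a single column.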