Created September 28, 2012 20:46
    R Trick - reading data faster
  
        
  
    
    
  
  
    
    # Making R read data faster by precomputing the column
    # data types
    sample <- read.table("data.txt", nrows = 100)          # peek at the first 100 rows only
    types <- sapply(sample, class)                         # one class name per column
    allData <- read.table("data.txt", colClasses = types)  # full read with known column types
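A self-contained sketch of the same trick, for trying it without a real data.txt. It writes a small hypothetical file to a temporary path and assumes that file has a header row (header = TRUE, as suggested in the comment below). The column names id, value and label are made up for illustration. stringsAsFactors = FALSE is passed explicitly so text columns come back as "character" regardless of the R version's default.

```r
# Hypothetical example data written to a temporary file (stands in for data.txt)
tmp <- tempfile(fileext = ".txt")
df <- data.frame(id = 1:1000,
                 value = seq(0.5, by = 0.5, length.out = 1000),
                 label = rep(c("a", "b"), 500),
                 stringsAsFactors = FALSE)
write.table(df, tmp, row.names = FALSE)

# Step 1: read only the first 100 rows and let read.table guess the types
sample <- read.table(tmp, header = TRUE, nrows = 100, stringsAsFactors = FALSE)

# Step 2: record the class of each column as a named character vector
types <- sapply(sample, class)

# Step 3: read the whole file, telling read.table the column classes up front
allData <- read.table(tmp, header = TRUE, colClasses = types)

print(types)
str(allData)
```

Note that the guess is only as good as the first 100 rows: if a column looks numeric there but contains text further down, the full read with that colClasses will fail.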
  
  
            
Hi Irene
Does it work better with a couple of fixes?
sample <- read.table(fname, nrows = 100, header=TRUE)
types <- sapply(sample, class)
read.table(fname, colClasses = types, header=TRUE)
Or maybe I'm missing something about the classes function and where types is used.
Then, benchmarking the classic method against this one on ~10M lines with 3 columns (numeric and text), I did not find any major time improvement (barely a few %).
This method might be worthwhile in other situations (or with older R versions).
(Anyway, I love your tweets, I'm a big fan. I was stuck with a slow load this morning and remembered this one from 3 months ago.)
Alex
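A rough way to reproduce the kind of comparison described in the comment above, as a sketch only: it times a plain read.table call against one using precomputed colClasses on a generated file. The row count (2e5) and the three column types are arbitrary assumptions, much smaller than the ~10M lines mentioned; actual timings depend on the machine, the file and the R version, so no numbers are claimed here.

```r
set.seed(1)
n <- 2e5  # assumed size, kept small so the sketch runs quickly
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(x = runif(n),
                       y = sample.int(100L, n, replace = TRUE),
                       z = sample(letters, n, replace = TRUE)),
            tmp, row.names = FALSE)

# Classic method: read.table guesses every column's type itself
t_guess <- system.time(
  d1 <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE)
)

# Trick: infer classes from the first 100 rows, then reuse them
head_rows <- read.table(tmp, header = TRUE, nrows = 100, stringsAsFactors = FALSE)
types <- sapply(head_rows, class)
t_known <- system.time(
  d2 <- read.table(tmp, header = TRUE, colClasses = types)
)

# Elapsed (wall-clock) seconds for each approach
print(rbind(guessing = t_guess, colClasses = t_known)[, "elapsed"])
```

Both calls should produce the same data frame; only the time spent on type detection differs, which is consistent with the gain being small on a file with just a few simple columns.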