billyfung · August 31, 2017 22:08
diff --git a/gistfile1.txt b/gistfile1.txt
 > library(data.table)
 data.table 1.10.5 IN DEVELOPMENT built 2017-08-22 22:20:41 UTC; travis
  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way
  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")
  Release notes, videos and slides: http://r-datatable.com

 > fread('demand_full.csv', verbose=TRUE)
 Input contains no \n. Taking this to be a filename to open
 [1] Check arguments
  Using 4 threads (omp_get_max_threads()=4, nth=4)
  NAstrings = [<<NA>>]
  None of the NAstrings look like numbers.
  show progress = 1
 [2] Opening the file
  Opening file demand_full.csv
  File opened, size = 1.185GB (1272708830 bytes).
  Memory mapping ... ok
 [3] Detect and skip BOM
 [4] Detect end-of-line character(s)
  Detected eol as \n only, the UNIX and Mac standard.
 [6] Skipping initial rows if needed
  Positioned on line 1 starting: <<2017-02-26,1,BOB1101,24.122>>
 [7] Detect separator, quoting rule, and ncolumns
  Detecting sep ...
  sep=','  with 100 lines of 4 fields using quote rule 0
  Detected 4 columns on line 1. This line is either column names or first data row. Line starts as: <<2017-02-26,1,BOB1101,24.122>>
  Quote rule picked = 0
 [8] Determine column names
  Some fields on line 1 are not type character. Treating as a data row and using default column names.
 [9] Detect column types
  Number of sampling jump points = 101 because (1272708829 bytes from row 1 to eof) / (2 * 2763 jump0size) == 230312
  Type codes (jump 000)    : 6265  Quote rule 0
  Type codes (jump 100)    : 6265  Quote rule 0
  =====
  Sampled 10048 rows (handled \n inside quoted fields) at 101 jump points
  Bytes from first data row on line 1 to the end of last row: 1272708829
  Line length: mean=28.46 sd=0.74 min=27 max=34
  Estimated number of rows: 1272708829 / 28.46 = 44721260
  Initial alloc = 49193386 rows (44721260 + 10%) using bytes/max(mean-2*sd,min) clamped between [1.1*estn, 2.0*estn]
  =====
 [10] Apply user overrides on column types
 After 0 type and 0 drop user overrides : 6265
 [11] Allocate memory for the datatable
  Allocating 4 column slots (4 - 0 dropped) with 49193386 rows
 [12] Read the data
 Read 99%. ETA 00:00 
 [13] Finalizing the datatable
 Read 44777716 rows x 4 columns from 1.185GB (1272708830 bytes) file in 00:10.180 wall clock time
 Thread buffers were grown 0 times (if all 4 threads each grew once, this figure would be 4)
 Final type counts
         0 : drop     
         0 : bool8    
         1 : int32    
         0 : int32    
         0 : int64    
         1 : float64  
         2 : string   
 =============================
   0.001s (  0%) Memory map 1.185GB file
   0.001s (  0%) sep=',' ncol=4 and header detection
   0.039s (  0%) Column type detection using 10048 sample rows
   1.655s ( 16%) Allocation of 44777716 rows x 4 cols (1.283GB)
   8.484s ( 83%) Reading 1216 chunks of 0.998MB (36777 rows) using 4 threads
   =    0.138s (  1%) Finding first non-embedded \n after each jump
   +    6.113s ( 60%) Parse to row-major thread buffers
   +    2.156s ( 21%) Transpose
   +    0.078s (  1%) Waiting
   0.000s (  0%) Rereading 0 columns due to out-of-sample type exceptions
  10.180s        Total
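
The "Initial alloc" figure in step [9] can be reproduced from the numbers printed in the log. This is a minimal sketch in R and not data.table's own code: the variable names are made up for illustration, the inputs are copied from the verbose output above (so the mean and sd are the rounded values shown there), and the formula is the one quoted in the log line itself.

```r
# Figures copied from the verbose output above (mean and sd are rounded in the log).
bytes    <- 1272708829   # bytes from first data row to the end of the last row
mean_len <- 28.46        # mean line length
sd_len   <- 0.74         # standard deviation of line length
min_len  <- 27           # minimum line length
estn     <- 44721260     # estimated number of rows, as printed in the log

# Formula quoted in the log:
#   bytes / max(mean - 2*sd, min), clamped between [1.1*estn, 2.0*estn]
raw   <- bytes / max(mean_len - 2 * sd_len, min_len)
alloc <- min(max(raw, 1.1 * estn), 2.0 * estn)

# Here raw falls below 1.1*estn, so the lower clamp decides the result,
# which is why the log reports the allocation as "44721260 + 10%".
print(round(alloc))

# The row count actually read (step [13]) fits inside the allocation.
actual <- 44777716
print(actual <= alloc)
```

With these inputs the lower clamp applies, which agrees with the "Thread buffers were grown 0 times" line: the initial allocation was large enough for all rows read.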