Reading multiple csv files into data frames and concatenating (or row binding) them into one data frame is a task we routinely face. In R, there are many ways of doing it. But which is the best, and why? I think the best way is the simplest and most high-level one: something that is easy to read, write, and edit. Here are three variants of what I think is the right way. The first is close to a base R approach (except for the use of read_csv and the beloved pipe), the second uses purrr and dplyr, and the third uses purrr alone.
library(readr)
library(tibble)
library(purrr)
library(dplyr)
# Make K csv files, each with N rows --------------------------------------
K <- 100; N <- 1000
lapply(seq(K), function(i){
  data_df <- tibble(x = rnorm(N), y = rnorm(N), z = rnorm(N))
  write_csv(data_df, paste0('data_', i, '.csv'))
})
# Get the file list -------------------------------------------------------
file_list <- list.files(pattern = 'data_[0-9]+\\.csv')
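One practical aside: reading a hundred files with read_csv prints a column specification message for each one. If you are on readr 2.0 or later (a version assumption on my part), you can silence this by passing show_col_types = FALSE, for example via a small wrapper:

# A quiet reader to use in place of read_csv below (optional)
read_csv_quietly <- function(file) {
  read_csv(file, show_col_types = FALSE)
}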
# Read in the data frames and concatenate them: Version 1 -----------------
data_df1 <- lapply(file_list, read_csv) %>%
  do.call(rbind, .)
# Read in the data frames and concatenate them: Version 2 -----------------
data_df2 <- map(file_list, read_csv) %>%
  bind_rows()
# Read in the data frames and concatenate them: Version 3 -----------------
data_df3 <- map_dfr(file_list, read_csv)
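If you are on purrr 1.0.0 or later (an assumption about your installed version), map_dfr is marked as superseded there, and the equivalent recommended spelling is a map followed by list_rbind:

# Version 3, purrr >= 1.0.0 style
data_df3 <- map(file_list, read_csv) %>%
  list_rbind()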
# But are they all the same? ----------------------------------------------
all_equal(data_df1, data_df2)
all_equal(data_df2, data_df3)
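A caveat here: dplyr's all_equal is deprecated in recent releases (an assumption: dplyr 1.1.0 or later), so a more future-proof check is base R's all.equal on plain data frames:

# Base R alternative to dplyr::all_equal
all.equal(as.data.frame(data_df1), as.data.frame(data_df2))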
# Delete the files (not that you'd usually want to do this) ---------------
file.remove(file_list)