@yinleon
Last active February 25, 2021 00:14
This Python routine reads many files in parallel using multiprocessing. `files` is a list of file paths, and `file_parser_func` is any function that takes one file path and returns a list (for example, a list of records).
from multiprocessing import Pool

import pandas as pd
from tqdm import tqdm

# Example inputs (uncomment and adapt):
# def file_parser_func(fn: str):
#     return pd.read_csv(fn).to_dict('records')
# files = ['a.csv', 'b.csv']

data = []
with Pool(processes=8) as pool:
    # imap_unordered yields each file's list as soon as a worker finishes it,
    # so the order of records in `data` is not guaranteed.
    for records in tqdm(pool.imap_unordered(file_parser_func, files),
                        total=len(files)):
        data.extend(records)

df = pd.DataFrame(data)
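Since `file_parser_func` only needs to take a path and return a list, parsers for other formats can be swapped in. Below is a minimal sketch of one for JSON Lines files, using only the standard library. The name `parse_jsonl` and the choice to skip blank lines are assumptions made for illustration and are not part of the original gist.

```python
import json


def parse_jsonl(fn: str) -> list:
    """Read one JSON Lines file and return its records as a list.

    Hypothetical example parser: one JSON value per line, blank lines skipped.
    """
    records = []
    with open(fn, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # ignore empty lines between records
                records.append(json.loads(line))
    return records
```

To use it with the routine above, define it at module level (so worker processes can import it) and pass it as `file_parser_func`, with `files` set to a list of `.jsonl` paths.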