Last active
June 25, 2024 19:42
This simple Python file generates a large random CSV file efficiently: it streams each row into the file as it is generated, so it uses almost no RAM. I've pre-calculated how the row and column counts map to file size, so you can add or remove zeroes from the "rows" variable to increase or decrease the size of the generated file.
import csv
import random

# 1000000 rows and 52 columns == roughly 1GB (WARNING: takes a while, 30s+)
rows = 1000000
columns = 52
print_after_rows = 100000


def generate_random_row(index, col):
    # One row: the row index followed by `col` random floats in [0, 1)
    return [index] + [random.random() for _ in range(col)]


if __name__ == '__main__':
    # newline='' lets the csv module control line endings itself
    with open('sample.csv', 'w', newline='') as f:
        w = csv.writer(f, lineterminator='\n')
        for i in range(rows):
            # Print a progress dot every `print_after_rows` rows
            if i % print_after_rows == 0:
                print(".", end="", flush=True)
            # Write one row at a time so memory use stays flat
            w.writerow(generate_random_row(i, columns))
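If you change "rows" or "columns" and want a rough idea of the resulting file size before committing to a long run, one option is to write a small sample into memory and scale it up. The sketch below is not part of the original script: the function name `estimate_csv_bytes` and the `sample_rows` parameter are made up for illustration, and it assumes the same row layout as above (a row index followed by random floats). Because the sample uses small row indexes, it will slightly underestimate very large files.

```python
import csv
import io
import random


def estimate_csv_bytes(rows, columns, sample_rows=1000):
    """Estimate output size by measuring a small in-memory sample.

    Hypothetical helper: assumes each row is an index followed by
    `columns` random floats, written with '\n' line endings.
    """
    buf = io.StringIO()
    w = csv.writer(buf, lineterminator='\n')
    for i in range(sample_rows):
        w.writerow([i] + [random.random() for _ in range(columns)])
    # Average bytes per sampled row, scaled to the requested row count
    bytes_per_row = len(buf.getvalue().encode('utf-8')) / sample_rows
    return int(bytes_per_row * rows)


if __name__ == '__main__':
    size = estimate_csv_bytes(1000000, 52)
    print(f"Estimated size: ~{size / 1e9:.2f} GB")
```

The estimate is approximate because the printed length of each random float varies from value to value, so treat it as a ballpark figure rather than an exact byte count.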
Super logic, I generated around 10 gb of data without any issue. Thanks!!