@hamelsmu
Created May 11, 2022 03:03
Load CSV Data in Metaflow steps
from metaflow import FlowSpec, step

url = "https://raw.githubusercontent.com/Netflix/metaflow/master/metaflow/tutorials/02-statistics/movies.csv"
local_path = "./movies.csv"


class CSVFlow(FlowSpec):
    @step
    def start(self):
        self.next(self.get_csv_from_web)

    @step
    def get_csv_from_web(self):
        import pandas as pd

        # Read the CSV from the web; self.df is stored as a flow artifact.
        self.df = pd.read_csv(url)
        # Write a local copy; to_csv also writes the index as the first column.
        self.df.to_csv(local_path)
        self.next(self.read_csv_locally)

    @step
    def read_csv_locally(self):
        import csv

        with open(local_path, newline="") as csvfile:
            # Use the default comma delimiter so fields containing spaces stay intact.
            reader = csv.reader(csvfile)
            next(reader)  # skip the header row
            first_row = next(reader)
        # Column 0 is the index written by to_csv, so column 1 is the first data column.
        print(f"First Movie in the list is {first_row[1]}.")
        self.next(self.end)

    @step
    def end(self):
        print("Top Five Highest Grossing Movies:")
        # Sort descending so rows with a missing gross value fall to the bottom.
        print(self.df.sort_values(by="gross", ascending=False).head(5))


if __name__ == "__main__":
    CSVFlow()

hamelsmu commented May 11, 2022

This code answers the question:

"I have a CSV (locally or on the web) and want to access it in a Metaflow flow. How can I read this data in and write it to disk?"
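
The disk round trip itself can be tried outside Metaflow with only the standard library. The following is a minimal sketch, not the flow's own code: the sample rows, the column names `title` and `gross`, and the helper names `write_rows`, `read_rows`, `top_n_by_gross`, and `demo` are all hypothetical and chosen for illustration. One title contains a comma to show that the `csv` module quotes and parses such fields correctly with its default settings.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; these are NOT taken from the real movies.csv.
SAMPLE_ROWS = [
    {"title": "Movie A, Part 1", "gross": 300},
    {"title": "Movie B", "gross": 500},
    {"title": "Movie C", "gross": 100},
]


def write_rows(path, rows):
    """Write dict rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "gross"])
        writer.writeheader()
        writer.writerows(rows)


def read_rows(path):
    """Read the CSV back as a list of dicts, converting gross to int."""
    with open(path, newline="") as f:
        return [
            {"title": row["title"], "gross": int(row["gross"])}
            for row in csv.DictReader(f)
        ]


def top_n_by_gross(rows, n):
    """Return the n rows with the largest gross, highest first."""
    return sorted(rows, key=lambda row: row["gross"], reverse=True)[:n]


def demo():
    """Write the sample rows to a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv")
        write_rows(path, SAMPLE_ROWS)
        return read_rows(path)
```

Calling `demo()` returns the rows as they were read back from disk, and `top_n_by_gross(demo(), 2)` picks the two highest-grossing sample rows. Opening the file with `newline=""` follows the `csv` module's recommendation for both reading and writing.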

For more information, visit: https://docs-git-poc-rhs-panel-metaflow.vercel.app/how-to-guides/csv
