Skip to content

Instantly share code, notes, and snippets.

View fozziethebeat's full-sized avatar

Keith Stevens fozziethebeat

View GitHub Profile
@fozziethebeat
fozziethebeat / minimal_can_beam_pipeline.py
Created December 3, 2021 05:18
Minimal Beam CAN Pipeline
# Requires
# pip install apache-beam
# pip install apache-beam[dataframe]
#
# Associated documentation
# Beam Dataframe API: https://beam.apache.org/releases/pydoc/2.34.0/apache_beam.dataframe.html
# Beam Dataframe Overview: https://beam.apache.org/documentation/dsls/dataframes/overview/
# Beam Dataframe Differences: https://beam.apache.org/documentation/dsls/dataframes/differences-from-pandas/
@fozziethebeat
fozziethebeat / save_computation_prefect.py
Created December 15, 2021 07:35
Tries reading some Dataframes and only re-computes new output Dataframes if needed, otherwise copies old result dataframes
import numpy as np
import pandas as pd
import prefect
from os import listdir
from os.path import isfile, join
from prefect import Flow, apply_map, case, task
from prefect.tasks.control_flow import merge
INPUT_BASE_PATH = './data/input'
@fozziethebeat
fozziethebeat / prefect_caching_tasks.py
Created December 16, 2021 03:05
Same as other simple Prefect sample pipeline but caches output of tasks to local files. Data is stored as CloudPickles.
import numpy as np
import pandas as pd
import prefect
from os import listdir
from os.path import isfile, join
from prefect import Flow, apply_map, case, task
from prefect.engine.results import LocalResult
from prefect.tasks.control_flow import merge