Richard (Rick) Zamora rjzamora

import os
import shutil
import time
import numpy as np
import dask.dataframe as dd
from dask.dataframe.io.demo import names as name_list
import fastavro as fa
import cudf
@rjzamora
rjzamora / read_file_list.ipynb
Created October 30, 2020 20:02
Simple example of mapping multiple parquet files to each dask_cudf.DataFrame partition.
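The notebook itself is not shown here, so the following is only a minimal sketch of one way to map several Parquet files onto each partition, not the gist's actual code. The helper names `group_paths` and `read_grouped` and the `files_per_partition` parameter are hypothetical; the sketch assumes the reader (by default `cudf.read_parquet`) accepts a list of paths and returns a single DataFrame.

```python
from typing import List, Sequence


def group_paths(paths: Sequence[str], files_per_partition: int) -> List[List[str]]:
    """Split `paths` into consecutive groups of at most
    `files_per_partition` entries (hypothetical helper)."""
    if files_per_partition < 1:
        raise ValueError("files_per_partition must be >= 1")
    return [
        list(paths[i : i + files_per_partition])
        for i in range(0, len(paths), files_per_partition)
    ]


def read_grouped(paths, files_per_partition, reader=None):
    """Build a Dask collection with one partition per group of files.

    Assumption: `reader` takes a list of paths and returns one DataFrame.
    """
    # Imported lazily so that `group_paths` works without Dask installed.
    import dask
    import dask.dataframe as dd

    if reader is None:
        import cudf  # requires a GPU environment

        reader = cudf.read_parquet

    # One delayed read per group; each group becomes one partition.
    parts = [
        dask.delayed(reader)(group)
        for group in group_paths(paths, files_per_partition)
    ]
    return dd.from_delayed(parts)
```

Calling `read_grouped(file_list, 4)` would then yield a collection in which each partition holds the rows of up to four files. Without a `meta` argument, `dd.from_delayed` computes the first partition to infer the schema.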
@rjzamora
rjzamora / nvtabular_client_example.py
Created December 10, 2020 22:58
NVTabular + Distributed Client Example
from dask.distributed import Client
import nvtabular as nvt

# Address of a running Dask scheduler
cluster = "tcp://MachineA:8786"
client = Client(cluster)

# Pass the client so the workflow executes on that cluster
workflow = nvt.Workflow(client=client, …)
@rjzamora
rjzamora / io_column_scaling.ipynb
Last active March 18, 2021 23:16
IO Column-Scaling Experiment
@rjzamora
rjzamora / dask_summit_scheduler_problems.ipynb
Last active May 12, 2021 14:42
Simple examples of the Dataframe-based workflows targeted by the ongoing Scheduler-optimization project.
@rjzamora
rjzamora / fsspec_optimize.ipynb
Created September 7, 2021 19:14
Parquet and FSSpec Experiments
@rjzamora
rjzamora / hlg_layer_dev_notes.ipynb
Last active December 9, 2021 15:55
HLG-Layer Dev Notes
import importlib
import time
import dask.dataframe as dd
from dask.distributed import LocalCluster, Client
try:
    from dask_cuda import LocalCUDACluster
except ImportError:
    dask_cuda = None
try: