Richard (Rick) Zamora rjzamora
@rjzamora
rjzamora / backend_dispatch_demo.ipynb
Last active October 30, 2024 01:02
High-level demo on backend-configuration dispatching in Dask
@rjzamora
rjzamora / remote_parquet_benchmark.py
Created March 21, 2022 18:15
Simple benchmark to measure the performance of fsspec.parquet.read_parquet_file for a single-column read.
import time
import argparse
try:
    import cudf
except ImportError:
    cudf = None
import pandas as pd
import numpy as np
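Only the imports of the benchmark script are visible in this preview. As a rough sketch of how a single-column read might be timed, the loop below measures repeated calls with `time.perf_counter`. The helper `time_read` and the stand-in `fake_read` are assumptions made for illustration; in a real run, `fake_read` would be replaced by the actual read call being measured.

```python
import time
from statistics import mean
from typing import Callable, List


def time_read(read_column: Callable[[], object], trials: int = 5) -> List[float]:
    """Return the wall-clock duration (seconds) of each call to `read_column`."""
    durations = []
    for _ in range(trials):
        start = time.perf_counter()
        read_column()
        durations.append(time.perf_counter() - start)
    return durations


def fake_read() -> int:
    # Hypothetical placeholder for the real single-column read.
    return sum(range(10_000))


durations = time_read(fake_read, trials=3)
print(f"mean: {mean(durations):.6f} s over {len(durations)} trials")
```

Keeping the per-trial durations, rather than only the mean, makes it easier to spot a slow first read caused by caching or connection setup.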
import importlib
import time
import dask.dataframe as dd
from dask.distributed import LocalCluster, Client
try:
    from dask_cuda import LocalCUDACluster
except ImportError:
    dask_cuda = None
try:
@rjzamora
rjzamora / hlg_layer_dev_notes.ipynb
Last active December 9, 2021 15:55
HLG-Layer Dev Notes
@rjzamora
rjzamora / fsspec_optimize.ipynb
Created September 7, 2021 19:14
Parquet and FSSpec Experiments
@rjzamora
rjzamora / dask_summit_scheduler_problems.ipynb
Last active May 12, 2021 14:42
Simple examples of the Dataframe-based workflows targeted by the ongoing Scheduler-optimization project.
@rjzamora
rjzamora / io_column_scaling.ipynb
Last active March 18, 2021 23:16
IO Column-Scaling Experiment
@rjzamora
rjzamora / nvtabular_client_example.py
Created December 10, 2020 22:58
NVTabular + Distributed Client Example
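from dask.distributed import Client
import nvtabular as nvt

# Address of an already-running Dask scheduler
cluster = "tcp://MachineA:8786"
client = Client(cluster)

# Pass the client to the workflow so it can use the distributed cluster
workflow = nvt.Workflow(client=client, …)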