Jacob Tomlinson jacobtomlinson

@jacobtomlinson
jacobtomlinson / environment.yaml
Created December 6, 2024 14:26
Minimal RAPIDS cudf 24.10 Coiled Software Environment
channels:
- rapidsai
- conda-forge
- nvidia
dependencies:
- cudf=24.10
- python=3.12
- cuda-version>=12.0,<=12.5
- s3fs
- dask
@jacobtomlinson
jacobtomlinson / history.sh
Last active December 6, 2024 14:07
Shell History Unwrapped
awk 'NR==FNR {map[$1]=$2; next} {print ($1 in map ? map[$1] : $1)}' <(alias | sed -E "s/^([^=]*)='?([^ ]*).*/\1 \2/") <(echo "SHELL HISTORY UNWRAPPED" `date +%Y` && history | gawk '{gsub(/^\s*[0-9]+\*?(\s*[0-9/T:]+)\s+/, "", $0); print $0}' | gawk '{gsub(/ \| /, "\n", $0); print $0}' | gawk ' { i=2; while ($1 ~ /^[A-Z0-9_]+=/) { $1=$i; i++ }; print $1 }') | sort | uniq -c | sort -n | tail -n 10
@jacobtomlinson
jacobtomlinson / README.md
Last active January 15, 2025 16:01
Using Coiled to run cudf.pandas workloads on the cloud

cudf.pandas Coiled Demo

The RAPIDS cudf.pandas accelerator allows you to leverage the power of NVIDIA GPU acceleration in your pandas workflows.

Scripts that use pandas can be run via the cudf.pandas module, which accelerates your code with zero code changes.

python my_code.py  # Uses the CPU
python -m cudf.pandas my_code.py  # Same pandas code uses the GPU
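To make the two commands above concrete, here is a minimal sketch of what a script like my_code.py might contain. The function name, column names, and readings are made up for illustration and are not from the original demo. The script uses only the ordinary pandas API, so the same file can be passed to either command unchanged.

```python
# my_code.py (hypothetical example script; it imports only pandas)
import pandas as pd


def mean_temp_by_station(df: pd.DataFrame) -> pd.DataFrame:
    """Return the mean temperature per station, sorted by station name."""
    return (
        df.groupby("station", as_index=False)["temp"]
        .mean()
        .rename(columns={"temp": "mean_temp"})
        .sort_values("station")
        .reset_index(drop=True)
    )


if __name__ == "__main__":
    # Made-up readings, used only to exercise the function.
    readings = pd.DataFrame(
        {
            "station": ["Accra", "Abha", "Accra", "Abha"],
            "temp": [26.0, 17.0, 27.0, 19.0],
        }
    )
    print(mean_temp_by_station(readings))
```

Because nothing in the script refers to cudf, whether it runs on the CPU or the GPU is decided entirely by how it is launched.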
@jacobtomlinson
jacobtomlinson / .gitignore
Created November 14, 2024 10:37
Global gitignore
### Apple Specific ###
# ignore OS X hidden meta files
.DS_Store
.AppleDouble
.LSOverride
# Icon must end with two \r
Icon
@jacobtomlinson
jacobtomlinson / helloworld.py
Created October 31, 2024 11:01
Say hello to computers all around the world
import contextlib
import codecs
import subprocess
import pandas as pd
# Load list of global nameservers and country code information
print("Loading data sources...")
nameservers = pd.read_csv("https://public-dns.info/nameservers.csv")
countries = pd.read_csv("https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/raw/refs/heads/master/all/all.csv")
@jacobtomlinson
jacobtomlinson / README.md
Last active June 13, 2024 09:43
Run dask-cuda on a SLURM HPC

If you use LocalCUDACluster on a single node, you can scale your work out across a SLURM-based HPC cluster with a few small tweaks.

First install the Dask Runners package. (Note: this is a prototype and will be merged into dask-jobqueue in the future)

pip install git+https://github.com/jacobtomlinson/dask-hpc-runner.git

Then replace LocalCUDACluster with the SLURMRunner class.

station mean_temp
Abha 18.0
Abidjan 26.0
Abéché 29.4
Accra 26.4
Addis Ababa 16.0
Adelaide 17.3
Aden 29.1
Ahvaz 25.4
Albuquerque 14.0
@jacobtomlinson
jacobtomlinson / run.py
Created November 2, 2023 17:21
Databricks run
import os
import subprocess
import time
import socket
DB_IS_DRIVER = os.getenv('DB_IS_DRIVER')
DB_DRIVER_IP = os.getenv('DB_DRIVER_IP')
if DB_IS_DRIVER == "TRUE":
    print("This node is the Dask scheduler.")
@jacobtomlinson
jacobtomlinson / beam_k8s.py
Last active May 12, 2023 13:42
Apache Beam Dask Limitation MRE
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.runners.dask.dask_runner import DaskRunner
from dask.distributed import Client, performance_report
class NoopDoFn(beam.DoFn):
    def process(self, item):
        import time
        time.sleep(0.1)
@jacobtomlinson
jacobtomlinson / beam_k8s.py
Created May 11, 2023 13:41
Apache Beam Dask Limitation MRE
import warnings
import time
from contextlib import contextmanager
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.runners.dask.dask_runner import DaskRunner
from dask.distributed import Client
from distributed.versions import VersionMismatchWarning