Jacob Tomlinson jacobtomlinson

`cudf.pandas` Coiled Demo

The RAPIDS cudf.pandas accelerator allows you to leverage the power of NVIDIA GPU acceleration in your pandas workflows.

Scripts that use pandas can be run via the cudf.pandas module, which accelerates them on the GPU with zero code changes.

	python my_code.py  # Uses the CPU
	python -m cudf.pandas my_code.py  # Same pandas code uses the GPU
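The accelerator can also be enabled from inside a script rather than on the command line. The following is a minimal sketch, assuming `cudf.pandas.install()` is called before pandas is imported. The fallback to plain pandas and the toy DataFrame are illustrative additions, not part of the original demo.

```python
# Try to enable the accelerator; fall back to plain pandas if cuDF is not installed.
try:
    import cudf.pandas

    cudf.pandas.install()  # must run before pandas is imported
    backend = "gpu"
except ImportError:
    backend = "cpu"

import pandas as pd

# Ordinary pandas code; it is the same whether or not the accelerator is active.
df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
totals = df.groupby("key")["value"].sum()

print(f"backend={backend}", totals.to_dict())
```

The pandas code itself does not change between the two branches, which is the point of the zero-code-change claim.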

If you are using LocalCUDACluster on a single node, you can scale your work out to a SLURM-based HPC cluster with a few small tweaks.

First, install the Dask Runners package. (Note: this is a prototype and will be merged into dask-jobqueue in the future.)

	pip install git+https://github.com/jacobtomlinson/dask-hpc-runner.git

Then replace LocalCUDACluster with the SLURMRunner class.
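As a rough sketch of what that swap might look like: the import path `dask_hpc_runner`, the class spelling `SlurmRunner`, and the `scheduler_file` argument are assumptions about the prototype package, so check its README before relying on them. The function name `run_on_slurm` and the toy computation are hypothetical.

```python
def run_on_slurm(scheduler_file):
    """Sketch of a script body meant to be launched inside a SLURM job (e.g. via srun).

    `scheduler_file` should point at a path on a filesystem shared by all nodes.
    """
    # Imports are deferred so this module can be imported where Dask is not installed.
    from dask.distributed import Client

    # Assumption: the prototype package exposes the runner under this name.
    from dask_hpc_runner import SlurmRunner

    # Every process in the job runs this same code. The runner is expected to
    # assign each process a role (scheduler, worker, or client).
    with SlurmRunner(scheduler_file=scheduler_file) as runner:
        with Client(runner) as client:
            # Toy computation standing in for the real workload.
            return client.submit(sum, [1, 2, 3]).result()
```

Compared with LocalCUDACluster, the cluster is not created by the script. Instead, the same script is started many times by SLURM and the processes coordinate through the shared scheduler file.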

	import contextlib
	import codecs
	import subprocess
	import pandas as pd

	# Load list of global nameservers and country code information
	print("Loading data sources...")
	nameservers = pd.read_csv("https://public-dns.info/nameservers.csv")
	countries = pd.read_csv("https://github.com/lukes/ISO-3166-Countries-with-Regional-Codes/raw/refs/heads/master/all/all.csv")

	import os
	import subprocess
	import time
	import socket

	DB_IS_DRIVER = os.getenv('DB_IS_DRIVER')
	DB_DRIVER_IP = os.getenv('DB_DRIVER_IP')

	if DB_IS_DRIVER == "TRUE":
	    print("This node is the Dask scheduler.")

	import apache_beam as beam
	from apache_beam.options.pipeline_options import PipelineOptions
	from apache_beam.runners.dask.dask_runner import DaskRunner

	from dask.distributed import Client, performance_report

	class NoopDoFn(beam.DoFn):
	    def process(self, item):
	        import time
	        time.sleep(0.1)