Last active January 22, 2025 07:40
Benchmark bandwidth and latency of P2P NVIDIA GPUs (NVLINK vs PCI)

NVIDIA GPU P2P Benchmark bandwidth/throughput and latency


You can also view the GPU topology using nvidia-smi topo -m

  1. Download repo git clone
  2. Checkout the tag that corresponds with the right CUDA version: git checkout tags/v11.1
  3. You might need to install some additional packages: sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev libglfw3-dev libgles2-mesa-dev
  4. Either build everything by running make in the root directory, or build only this sample: cd Samples/p2pBandwidthLatencyTest; make
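The `nvidia-smi topo -m` command mentioned above can also be called from a script. This is a minimal sketch that only shells out to that command; the function name `gpu_topology` is made up for illustration, and it returns None when the tool is not installed or the call fails.

```python
import shutil
import subprocess


def gpu_topology(executable="nvidia-smi"):
    """Return the text printed by `<executable> topo -m`, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        # The tool is not on $PATH (for example, no NVIDIA driver installed).
        return None
    result = subprocess.run(
        [path, "topo", "-m"], capture_output=True, text=True, check=False
    )
    # A non-zero exit code usually means the driver could not be reached.
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    print(gpu_topology() or "nvidia-smi not available")
```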
joshlk /
Created March 9, 2021 13:23
Remote debugging with PyCharm

Steps to debug a program on a remote machine without using remote deployment.

  1. Start the debug server in PyCharm and specify a port such as 21000
  2. Remote-forward the port over SSH, e.g. ssh host -R 21000:localhost:21000
  3. Start a Python process and insert the following line (first run pip install pydevd-pycharm):
import pydevd_pycharm; pydevd_pycharm.settrace('localhost', port=21000, stdoutToServer=True, stderrToServer=True)
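Rather than hard-coding the `settrace` line, the attach step can be made opt-in so the same program still runs when no debug server is listening. This is a sketch under assumptions: the environment variable name `PYCHARM_DEBUG_PORT` and the helper `attach_debugger` are hypothetical, and only the `settrace` call itself comes from the steps above.

```python
import os


def attach_debugger(env_var="PYCHARM_DEBUG_PORT", host="localhost"):
    """Attach to a PyCharm debug server only if env_var holds a port number.

    Returns True if settrace was called, False otherwise.
    """
    port = os.environ.get(env_var)
    if not port:
        # Debugging was not requested for this run.
        return False
    try:
        import pydevd_pycharm  # provided by `pip install pydevd-pycharm`
    except ImportError:
        return False
    # Same call as in the steps above, with the port taken from the environment.
    pydevd_pycharm.settrace(
        host, port=int(port), stdoutToServer=True, stderrToServer=True
    )
    return True
```

With the SSH remote forward from step 2 in place, running the program with `PYCHARM_DEBUG_PORT=21000` set would attach to the server started in step 1.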
joshlk /
Last active February 4, 2021 13:54
`where` in the $PATH is an executable found. Bash/shell/unix script. Extends `which` to also print the $PATH index (GNU General Public License)
#! /bin/sh
set -ef
# Define puts with whichever builtin the current shell provides
if test -n "$KSH_VERSION"; then
    puts() {
        print -r -- "$*"
    }
else
    puts() {
        printf '%s\n' "$*"
    }
fi
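The idea described in the title (report every match together with its position in $PATH) can be sketched in Python. This is not a port of the script, whose body is cut off here; the function name `where` and the 0-based index are assumptions made for illustration.

```python
import os


def where(name, path=None):
    """Return (index, full_path) for every $PATH entry containing executable `name`."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for index, directory in enumerate(path.split(os.pathsep)):
        # An empty $PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append((index, candidate))
    return matches


if __name__ == "__main__":
    for index, full_path in where("sh"):
        print(index, full_path)
```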
joshlk / sentence-segmentation-benchmark.ipynb
Last active October 23, 2023 18:24
A comparison of different sentence segmentation models for the English language. The Brown corpus is used to benchmark the models.
joshlk /
Created May 22, 2020 09:55
A PriorityQueue or MinHeap implementation. Items with the smallest cost are popped first. Object orientated interface for `heapq` (Python standard library).
import heapq

class PriorityQueue:
    """
    A PriorityQueue or MinHeap implementation.
    Items with the smallest cost are popped first.
    """
    def __init__(self):
        self.h = []
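The preview stops at the constructor. A fuller sketch of how such a `heapq` wrapper might look is given below; the method names `push` and `pop` and the tie-breaking counter are assumptions, not the gist's actual interface.

```python
import heapq
from itertools import count


class PriorityQueue:
    """Min-heap wrapper: items with the smallest cost are popped first."""

    def __init__(self):
        self.h = []
        # The counter breaks ties so that items themselves are never compared.
        self._counter = count()

    def push(self, item, cost):
        heapq.heappush(self.h, (cost, next(self._counter), item))

    def pop(self):
        cost, _, item = heapq.heappop(self.h)
        return item, cost

    def __len__(self):
        return len(self.h)


queue = PriorityQueue()
queue.push("b", 2)
queue.push("a", 1)
queue.push("c", 3)
print(queue.pop())  # ('a', 1)
```

Storing `(cost, counter, item)` tuples keeps insertion order among equal costs and avoids a TypeError when two items with the same cost are not orderable.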
joshlk /
Last active October 1, 2019 14:23
Directed Agglomerative Clustering: similar to `sklearn.cluster.AgglomerativeClustering` but the label is the root node. The root node is the root of a connected DAG (Directed Acyclic Graph). The algorithm can also be framed as determining the weakly connected components and identifying the root of each.
import numpy as np
from itertools import count, product

class DirectedAgglomerativeClustering:
    """
    Similar to `sklearn.cluster.AgglomerativeClustering` but the label is the root node. The root node is the root of a
    connected DAG (Directed Acyclic Graph).
    The algorithm can also be framed as determining the weakly connected components and identifying the root of each.
    This is a naive implementation and runs in O(n^3).
    """
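The "weakly connected components plus root" framing can be sketched as follows. This is not the gist's O(n^3) implementation: it swaps in a union-find pass over an edge list, and the function name `root_labels`, the `(parent, child)` edge format, and the rule that each component has exactly one node without incoming edges are all assumptions.

```python
def root_labels(n_nodes, edges):
    """Label every node with the root of its weakly connected component.

    edges: iterable of (parent, child) pairs forming a DAG.
    """
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    has_incoming = [False] * n_nodes
    for src, dst in edges:
        has_incoming[dst] = True
        # Ignore edge direction when merging components (weak connectivity).
        parent[find(src)] = find(dst)

    root_of_component = {}
    for node in range(n_nodes):
        if not has_incoming[node]:
            component = find(node)
            if component in root_of_component:
                raise ValueError("component has more than one root")
            root_of_component[component] = node

    return [root_of_component[find(node)] for node in range(n_nodes)]


print(root_labels(6, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 3, 3, 5]
```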
joshlk / random_numbers.pyx
Last active July 11, 2019 10:22
Random numbers in Cython. Random integers and floating points from the standard library
from libc.stdlib cimport rand, srand, RAND_MAX
from libc.time cimport time

def get_RAND_MAX():
    return RAND_MAX

# A long is at least 32 bits and so covers at least the range [−2,147,483,647, +2,147,483,647];
# an unsigned long covers at least [0, 4,294,967,295]
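The usual way to turn a raw `rand()` value in [0, RAND_MAX] into a float or a bounded integer can be shown in plain Python. This is a sketch, not the Cython code from the gist: `c_rand` stands in for C's `rand()`, the helper names are hypothetical, and the `RAND_MAX` value is glibc's (the C standard only guarantees at least 32767).

```python
import random

RAND_MAX = 2**31 - 1  # assumption: glibc's value; C only guarantees >= 32767


def c_rand():
    """Stand-in for C's rand(): an integer in [0, RAND_MAX]."""
    return random.randint(0, RAND_MAX)


def random_float(raw=None):
    """Map a raw value to a float in [0, 1)."""
    raw = c_rand() if raw is None else raw
    return raw / (RAND_MAX + 1.0)


def random_int(low, high, raw=None):
    """Map a raw value to an integer in [low, high).

    The modulo step is slightly biased unless (high - low) divides RAND_MAX + 1.
    """
    raw = c_rand() if raw is None else raw
    return low + raw % (high - low)
```

Dividing by `RAND_MAX + 1.0` rather than `RAND_MAX` keeps 1.0 out of the float range.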
joshlk /
Created May 31, 2019 13:58
Progress bar for Python Trio tasks using tqdm
import trio
import tqdm
class TrioProgress(
    def __init__(self, total, notebook_mode=False, **kwargs):
        if notebook_mode:
            from tqdm import tqdm_notebook as tqdm
        else:
            from tqdm import tqdm
joshlk /
Created November 20, 2018 17:45
Minimal working example of pySpark memory leak
from pyspark import SparkContext
from pyspark.sql import SQLContext
import numpy as np
sc = SparkContext()
sqlContext = SQLContext(sc)
# Create dummy pySpark DataFrame with 1e5 rows and 16 partitions
df = sqlContext.range(0, int(1e5), numPartitions=16)
joshlk / uber_h3_geoindex_plot.ipynb
Last active November 2, 2020 22:09
Uber's H3 Geoindex plotted on the UK and Bristol at different levels
