@gmarkall
gmarkall / extending_cuda.py
Created June 16, 2020 15:15
Extending the CUDA target to support an Interval type - based on https://numba.pydata.org/numba-doc/latest/extending/interval-example.html with some modifications
# From Interval example docs - typing
from numba import types
from numba.core.extending import typeof_impl, type_callable
# Data model
from numba.core.extending import models, register_model
# Lowering
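The preview cuts off before the lowering code. For context, the plain Python class that this typing machinery teaches Numba about — as given in the linked docs example — is just:

```python
class Interval(object):
    """A half-open interval on the real number line."""

    def __init__(self, lo, hi):
        self.lo = lo
        self.hi = hi

    def __repr__(self):
        return 'Interval(%f, %f)' % (self.lo, self.hi)

    @property
    def width(self):
        return self.hi - self.lo
```

The `typeof_impl` and `type_callable` registrations then let Numba-compiled functions accept and construct instances of this class.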
@gmarkall
gmarkall / output.txt
Created August 7, 2020 11:16
Numba CUDA kernel compile and launch traces
$ NUMBA_CUDA_LOG_LEVEL=DEBUG python repro.py
== CUDA [182] DEBUG -- call runtime api: cudaRuntimeGetVersion
Define a kernel
Copy to device
== CUDA [200] INFO -- init
== CUDA [200] DEBUG -- call driver api: cuInit
== CUDA [200] DEBUG -- call driver api: cuCtxGetCurrent
== CUDA [200] DEBUG -- call driver api: cuCtxGetCurrent
== CUDA [200] DEBUG -- call driver api: cuDeviceGetCount
== CUDA [200] DEBUG -- call driver api: cuDeviceGet
@gmarkall
gmarkall / compile_to_ptx.py
Created August 7, 2020 11:23
Numba runtime/driver calls when compiling to PTX
from numba import cuda, types
import time
def add(r, x, y):
    i = cuda.grid(1)
    if i < len(r):
        r[i] = x[i] + y[i]
diff --git a/numba/cuda/cudadrv/nvvm.py b/numba/cuda/cudadrv/nvvm.py
index 24569f99b..505e6797f 100644
--- a/numba/cuda/cudadrv/nvvm.py
+++ b/numba/cuda/cudadrv/nvvm.py
@@ -272,29 +272,39 @@ data_layout = {
default_data_layout = data_layout[tuple.__itemsize__ * 8]
+_supported_cc = None
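The `_supported_cc = None` line introduces a lazy module-level cache, so the set of supported compute capabilities is only computed once. A pure-Python sketch of that pattern (names and the default `query` are illustrative stand-ins, not Numba's actual NVVM version check):

```python
_supported_cc = None


def get_supported_ccs(query=lambda: [(3, 5), (5, 0), (6, 0), (7, 0)]):
    """Return the cached tuple of supported compute capabilities,
    computing it on the first call via `query` (a placeholder for
    the real driver/NVVM version probe)."""
    global _supported_cc
    if _supported_cc is None:
        # Only hit the (potentially expensive) probe once per process.
        _supported_cc = tuple(query())
    return _supported_cc
```

Subsequent calls return the cached value without re-running the probe.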
@gmarkall
gmarkall / mcjit.cpp
Created February 10, 2021 13:03 — forked from gligneul/mcjit.cpp
LLVM MCJIT Code Samples (Working!)
/*
* To compile, execute on terminal:
* g++ -o mcjit mcjit.cpp `llvm-config --cxxflags --ldflags --libs all --system-libs`
*/
#include <iostream>
#include <memory>
#include <llvm/ADT/StringRef.h>
#include <llvm/ExecutionEngine/ExecutionEngine.h>
@gmarkall
gmarkall / cuda_pipeline.diff
Created February 18, 2021 15:42
Adding a CUDA pipeline to Numba
diff --git a/numba/core/compiler.py b/numba/core/compiler.py
index a0c03fdb3..e47270a2f 100644
--- a/numba/core/compiler.py
+++ b/numba/core/compiler.py
@@ -28,7 +28,8 @@ from numba.core.typed_passes import (NopythonTypeInference, AnnotateTypes,
NopythonRewrites, PreParforPass,
ParforPass, DumpParforDiagnostics,
IRLegalization, NoPythonBackend,
- InlineOverloads, PreLowerStripPhis)
+ InlineOverloads, PreLowerStripPhis,
@gmarkall
gmarkall / cupy_pycuda_arrays.py
Created February 19, 2021 11:59
Calling PyCUDA kernels on CuPy arrays using the CUDA array interface
import pycuda.driver as cuda
import pycuda.autoinit # noqa
from collections import namedtuple
from pycuda.compiler import SourceModule
import cupy as cp
# PyCUDA will try to get a pointer to data from an object it doesn't recognise
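The bridge between the two libraries is the CUDA Array Interface: any object exposing a `__cuda_array_interface__` dict can hand its device pointer, shape, and dtype to a consumer such as Numba or CuPy. A minimal pure-Python sketch of such a wrapper — the pointer below is a placeholder value, not a real device allocation:

```python
class DeviceArrayView:
    """Minimal object exposing the CUDA Array Interface (v2).

    It owns no memory itself; it simply republishes a device pointer,
    shape, and type string so other libraries can wrap the data.
    """

    def __init__(self, ptr, shape, typestr):
        self._ptr = ptr
        self._shape = tuple(shape)
        self._typestr = typestr

    @property
    def __cuda_array_interface__(self):
        return {
            'shape': self._shape,
            'typestr': self._typestr,    # e.g. '<f4' for little-endian float32
            'data': (self._ptr, False),  # (device pointer, read-only flag)
            'version': 2,
        }
```

A consumer that recognises the interface (e.g. `cupy.asarray`) reads this dict instead of copying through the host.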
@gmarkall
gmarkall / fwht.py
Created April 8, 2021 09:27
Fast Walsh Hadamard Transform code from Wikipedia accelerated with Numba
import numpy as np
from numba import njit
from time import perf_counter
# From https://en.wikipedia.org/wiki/Fast_Walsh%E2%80%93Hadamard_transform
def fwht(a) -> None:
    """In-place Fast Walsh–Hadamard Transform of array a."""
    h = 1
    while h < len(a):
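The preview cuts off mid-loop; the full Wikipedia algorithm (unnormalised, for power-of-two lengths) continues as below — shown here as a pure-Python sketch without the `@njit` decoration:

```python
def fwht(a) -> None:
    """In-place Fast Walsh–Hadamard Transform of array a.

    len(a) must be a power of two.  Unnormalised variant.
    """
    h = 1
    while h < len(a):
        # Butterfly pass: combine each pair of elements h apart.
        for i in range(0, len(a), h * 2):
            for j in range(i, i + h):
                x = a[j]
                y = a[j + h]
                a[j] = x + y
                a[j + h] = x - y
        h *= 2
```

For example, `fwht` applied to `[1, 0, 1, 0, 0, 1, 1, 0]` yields `[4, 2, 0, -2, 0, 2, 0, 2]`, the worked example on the Wikipedia page.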
@gmarkall
gmarkall / output.txt
Last active June 9, 2023 16:45
Numba CUDA Warp-aggregated atomics example. See PR #6911: https://github.com/numba/numba/pull/6911
$ python wagg.py
Running with 16777216 elements, of which approximately 25.0% are zero
There are 12584753 nonzeroes in:
[0.417022 0.72032449 0. ... 0.20570723 0.36716537 0.0979951 ]
The kernel found 12584753 elements, resulting in the array:
[0.14349547 0.43006714 0.48695992 ... 0. 0. 0. ]
Traceback (most recent call last):
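The core of warp aggregation is bit arithmetic on the warp's active mask: the lowest active lane becomes the leader, performs a single atomic increment for the whole warp, and every lane derives its own output slot from its rank among the active lanes. A pure-Python model of that arithmetic — the CUDA version would use intrinsics like an active-mask/ballot query and `popc`; the function name here is illustrative:

```python
def warp_aggregate(mask, lane):
    """Given a 32-bit mask of lanes whose predicate is true and this
    thread's lane id, return (is_leader, rank), where rank is the
    number of active lanes below this one."""
    # Index of the lowest set bit: the leader lane.
    leader = (mask & -mask).bit_length() - 1
    # Population count of active lanes strictly below this lane.
    rank = bin(mask & ((1 << lane) - 1)).count('1')
    return lane == leader, rank
```

With mask `0b1011` (lanes 0, 1 and 3 active), lane 0 is the leader with rank 0, lane 1 has rank 1, and lane 3 has rank 2 — so the leader reserves three slots with one atomic, and each lane writes to `base + rank`.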