David Berard (davidberard98), PyTorch, Menlo Park, CA
# $CUDA_HOME/bin/nvcc binary_search_cuda.cu -std=c++17 -o binary_search_cuda -O3 # -Wl,-rpath $CUDA_HOME/lib64
$CUDA_HOME/bin/nvcc dense_to_jagged.cu -std=c++17 -o dense_to_jagged -O3 # -Wl,-rpath $CUDA_HOME/lib64
import torch
import torch._dynamo
import torch._inductor.inductor_prims  # importing this registers torch.ops.prims._inductor_bucketize

def fn(values, boundaries):
    return torch.bucketize(values, boundaries)

def fn_ind(values, boundaries):
    return torch.ops.prims._inductor_bucketize(values, boundaries)
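A minimal usage sketch of the two paths above (shapes, device, and the torch.compile call are illustrative assumptions, not part of the original gist):

# Hypothetical usage; requires a CUDA build of PyTorch.
values = torch.rand(1024, device="cuda")
boundaries = torch.linspace(0, 1, steps=128, device="cuda")  # boundaries must be sorted

eager_out = fn(values, boundaries)
compiled_out = torch.compile(fn)(values, boundaries)  # exercises the inductor lowering
torch.testing.assert_close(eager_out, compiled_out)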
// BUILD COMMAND:
// LD_LIBRARY_PATH=/usr/local/cuda-11.6/extras/CUPTI/lib64:$LD_LIBRARY_PATH nvcc -arch=sm_80 -std=c++17 -o cudagraph cudagraph.cu -lcupti
#include <cstddef>
#include <cuda_runtime_api.h>
#include <cstdio>
#include <sys/time.h>
#include <iostream>
#include <cupti.h>
#define N 500000 // tuned such that kernel takes a few microseconds
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x564019c3bda0)
at ../sysdeps/nptl/futex-internal.h:320
320 ../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) bt
#0 futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x564019c3bda0)
at ../sysdeps/nptl/futex-internal.h:320
#1 do_futex_wait (sem=sem@entry=0x564019c3bda0, abstime=0x0, clockid=0) at sem_waitcommon.c:112
Function                          Runtimes (s) across 10 successive calls
_compile                          10.0482 0.1572 0.1491 0.1555 0.1522 0.2298 0.1465 0.1443 0.1435 0.1459
OutputGraph.call_user_compiler     9.7309 0.1286 0.1210 0.1268 0.1233 0.1999 0.1209 0.1176 0.1174 0.1186
create_aot_dispatcher_function     1.0459 0.1273 0.1196 0.1254 0.1220 0.1984 0.1195 0.1163 0.1159 0.1174
compile_fx.<locals>.fw_compiler    0.0453 0.0127 0.0114 0.0107 0.0135 0.0284 0.0104 0.0122 0.0106 0.0111
GraphLowering.run                  0.0062 0.0061 0.0041 0.0038 0.0063 0.0049 0.0038 0.0059 0.0037 0.0040
GraphLowering.compile_to_module    0.0362 0.0049 0.0053 0.0052 0.0054 0.0053 0.0049 0.0047 0.0049 0.0052
Scheduler.__init__                 0.0038 0.0027 0.0031 0.0030 0.0032 0.0031 0.0028 0.0028 0.0028 0.0029
Scheduler.codegen                  0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004
WrapperCodeGen.generate            0.0015 0.0013 0.0013 0.0013 0.0013 0.0011 0.0012 0.0011 0.0012 0.0013
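The gist does not show how these per-call timings were collected; the sketch below is a hypothetical harness (the timed decorator and all names are assumptions) that would produce rows like the table above:

import time
from collections import defaultdict

timings = defaultdict(list)  # function name -> list of per-call runtimes (s)

def timed(fn):
    # Hypothetical instrumentation; the gist's actual mechanism is not shown.
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            timings[fn.__qualname__].append(time.perf_counter() - start)
    return wrapper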
cuda train hf_Longformer [2023-03-08 23:40:23,242] torch._dynamo.debug_utils: [WARNING] Compiled Fx GraphModule failed. Creating script to minify the error.
[2023-03-08 23:40:23,244] torch._dynamo.debug_utils: [WARNING] Writing minified repro to /scratch/dberard/bisectdynamo/pytorch/torch_compile_debug/run_2023_03_08_23_40_23_244562-pid_3089959/minifier/minifier_launcher.py
ERROR:common:inductor raised Exception: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.copy_.default(*(tensor([[[[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.]]],
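The error message itself names the escape hatch; a minimal sketch of that workaround, assuming the internal torch._subclasses.FakeTensorMode API (subject to change between releases):

import torch
from torch._subclasses import FakeTensorMode

real = torch.zeros(2, 3)                    # plain, non-fake tensor
mode = FakeTensorMode(allow_non_fake_inputs=True)
fake = mode.from_tensor(torch.ones(2, 3))   # fake counterpart
with mode:
    fake.copy_(real)  # aten.copy_ with a non-fake input no longer raises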
pip3 uninstall -y triton
WARNING: Skipping triton as it is not installed.
pip3 install -U "git+https://github.com/openai/triton@b8b470bc597c1c5bd03682c09fe3e6b7c53787fd#subdirectory=python"
Collecting git+https://github.com/openai/triton@b8b470bc597c1c5bd03682c09fe3e6b7c53787fd#subdirectory=python
Cloning https://github.com/openai/triton (to revision b8b470bc597c1c5bd03682c09fe3e6b7c53787fd) to /tmp/pip-req-build-41rq5c6y
Running command git clone --filter=blob:none --quiet https://github.com/openai/triton /tmp/pip-req-build-41rq5c6y
Running command git rev-parse -q --verify 'sha^b8b470bc597c1c5bd03682c09fe3e6b7c53787fd'
Running command git fetch -q https://github.com/openai/triton b8b470bc597c1c5bd03682c09fe3e6b7c53787fd
Running command git checkout -q b8b470bc597c1c5bd03682c09fe3e6b7c53787fd
Resolved https://github.com/openai/triton to commit b8b470bc597c1c5bd03682c09fe3e6b7c53787fd
WARNING:root:Super_SloMo failed to load
Eager model failed to run
Traceback (most recent call last):
File "/scratch/dberard/bisectdynamo/pytorch/benchmarks/dynamo/common.py", line 1115, in validate_model
self.model_iter_fn(model, example_inputs)
File "benchmarks/dynamo/torchbench.py", line 370, in forward_and_backward_pass
self.grad_scaler.scale(loss).backward()
File "/scratch/dberard/bisectdynamo/pytorch/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/scratch/dberard/bisectdynamo/pytorch/torch/autograd/__init__.py", line 204, in backward
import torch
import torch.nn as nn

# LM example
class EncoderDecoderLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()

    @torch.jit.export
    def prepare_inputs(self, var1: torch.Tensor, var2: torch.Tensor):
        # Body truncated in the original gist; echoing the inputs keeps the
        # snippet scriptable for illustration.
        return var1, var2
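Because prepare_inputs is marked @torch.jit.export, it remains callable on the scripted module; a minimal usage sketch (the inputs are illustrative):

scripted = torch.jit.script(EncoderDecoderLanguageModel())
var1, var2 = scripted.prepare_inputs(torch.zeros(2, 4), torch.ones(2, 4))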
davidberard98 / def.h
Last active January 24, 2023 04:03
Explicit template specialization: demonstrates that the presence of a generic implementation can cause errors when an explicit template specialization exists. See run.sh for details.
#pragma once
#include <stdexcept>
class Base {
public:
  virtual void run() = 0;
};

// The header is truncated in the original gist; a plausible completion of
// the dangling "template<int val>" declaration:
template<int val>
class Impl : public Base {
public:
  void run() override;  // generic implementation; specializations may exist elsewhere
};