David Berard (davidberard98), PyTorch, Menlo Park, CA
# $CUDA_HOME/bin/nvcc binary_search_cuda.cu -std=c++17 -o binary_search_cuda -O3 # -Wl,-rpath $CUDA_HOME/lib64
$CUDA_HOME/bin/nvcc dense_to_jagged.cu -std=c++17 -o dense_to_jagged -O3 # -Wl,-rpath $CUDA_HOME/lib64
import torch
import torch._dynamo
import torch._inductor.inductor_prims  # importing this registers torch.ops.prims._inductor_bucketize

def fn(values, boundaries):
    return torch.bucketize(values, boundaries)

def fn_ind(values, boundaries):
    return torch.ops.prims._inductor_bucketize(values, boundaries)
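A minimal usage sketch of the two paths above (shapes, device, and the torch.compile call are illustrative assumptions, not part of the original gist):

# Hypothetical usage; requires a CUDA build of PyTorch.
values = torch.rand(1024, device="cuda")
boundaries = torch.linspace(0, 1, steps=128, device="cuda")  # boundaries must be sorted

eager_out = fn(values, boundaries)
compiled_out = torch.compile(fn)(values, boundaries)  # exercises the inductor lowering
torch.testing.assert_close(eager_out, compiled_out)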
// BUILD COMMAND:
// LD_LIBRARY_PATH=/usr/local/cuda-11.6/extras/CUPTI/lib64:$LD_LIBRARY_PATH nvcc -arch=sm_80 -std=c++17 -o cudagraph cudagraph.cu -lcupti
#include <cstddef>
#include <cuda_runtime_api.h>
#include <cstdio>
#include <sys/time.h>
#include <iostream>
#include <cupti.h>
#define N 500000 // tuned such that kernel takes a few microseconds
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x564019c3bda0)
at ../sysdeps/nptl/futex-internal.h:320
320 ../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) bt
#0 futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x564019c3bda0)
at ../sysdeps/nptl/futex-internal.h:320
#1 do_futex_wait (sem=sem@entry=0x564019c3bda0, abstime=0x0, clockid=0) at sem_waitcommon.c:112
Function                          Runtimes (s) across 10 successive calls
_compile                          10.0482 0.1572 0.1491 0.1555 0.1522 0.2298 0.1465 0.1443 0.1435 0.1459
OutputGraph.call_user_compiler     9.7309 0.1286 0.1210 0.1268 0.1233 0.1999 0.1209 0.1176 0.1174 0.1186
create_aot_dispatcher_function     1.0459 0.1273 0.1196 0.1254 0.1220 0.1984 0.1195 0.1163 0.1159 0.1174
compile_fx.<locals>.fw_compiler    0.0453 0.0127 0.0114 0.0107 0.0135 0.0284 0.0104 0.0122 0.0106 0.0111
GraphLowering.run                  0.0062 0.0061 0.0041 0.0038 0.0063 0.0049 0.0038 0.0059 0.0037 0.0040
GraphLowering.compile_to_module    0.0362 0.0049 0.0053 0.0052 0.0054 0.0053 0.0049 0.0047 0.0049 0.0052
Scheduler.__init__                 0.0038 0.0027 0.0031 0.0030 0.0032 0.0031 0.0028 0.0028 0.0028 0.0029
Scheduler.codegen                  0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004
WrapperCodeGen.generate            0.0015 0.0013 0.0013 0.0013 0.0013 0.0011 0.0012 0.0011 0.0012 0.0013
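The gist does not show how these per-call timings were collected; the sketch below is a hypothetical harness (the timed decorator and all names are assumptions) that would produce rows like the table above:

import time
from collections import defaultdict

timings = defaultdict(list)  # function name -> list of per-call runtimes (s)

def timed(fn):
    # Hypothetical instrumentation; the gist's actual mechanism is not shown.
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            timings[fn.__qualname__].append(time.perf_counter() - start)
    return wrapper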
cuda train hf_Longformer [2023-03-08 23:40:23,242] torch._dynamo.debug_utils: [WARNING] Compiled Fx GraphModule failed. Creating script to minify the error.
[2023-03-08 23:40:23,244] torch._dynamo.debug_utils: [WARNING] Writing minified repro to /scratch/dberard/bisectdynamo/pytorch/torch_compile_debug/run_2023_03_08_23_40_23_244562-pid_3089959/minifier/minifier_launcher.py
ERROR:common:inductor raised Exception: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.copy_.default(*(tensor([[[[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.]]],
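The error message itself names the escape hatch; a minimal sketch of that workaround, assuming the internal torch._subclasses.FakeTensorMode API (subject to change between releases):

import torch
from torch._subclasses import FakeTensorMode

real = torch.zeros(2, 3)                    # plain, non-fake tensor
mode = FakeTensorMode(allow_non_fake_inputs=True)
fake = mode.from_tensor(torch.ones(2, 3))   # fake counterpart
with mode:
    fake.copy_(real)  # aten.copy_ with a non-fake input no longer raises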
pip3 uninstall -y triton
WARNING: Skipping triton as it is not installed.
pip3 install -U "git+https://github.com/openai/triton@b8b470bc597c1c5bd03682c09fe3e6b7c53787fd#subdirectory=python"
Collecting git+https://github.com/openai/triton@b8b470bc597c1c5bd03682c09fe3e6b7c53787fd#subdirectory=python
Cloning https://github.com/openai/triton (to revision b8b470bc597c1c5bd03682c09fe3e6b7c53787fd) to /tmp/pip-req-build-41rq5c6y
Running command git clone --filter=blob:none --quiet https://github.com/openai/triton /tmp/pip-req-build-41rq5c6y
Running command git rev-parse -q --verify 'sha^b8b470bc597c1c5bd03682c09fe3e6b7c53787fd'
Running command git fetch -q https://github.com/openai/triton b8b470bc597c1c5bd03682c09fe3e6b7c53787fd
Running command git checkout -q b8b470bc597c1c5bd03682c09fe3e6b7c53787fd
Resolved https://github.com/openai/triton to commit b8b470bc597c1c5bd03682c09fe3e6b7c53787fd
WARNING:root:Super_SloMo failed to load
Eager model failed to run
Traceback (most recent call last):
File "/scratch/dberard/bisectdynamo/pytorch/benchmarks/dynamo/common.py", line 1115, in validate_model
self.model_iter_fn(model, example_inputs)
File "benchmarks/dynamo/torchbench.py", line 370, in forward_and_backward_pass
self.grad_scaler.scale(loss).backward()
File "/scratch/dberard/bisectdynamo/pytorch/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/scratch/dberard/bisectdynamo/pytorch/torch/autograd/__init__.py", line 204, in backward
import torch
import torch.nn as nn

# LM example
class EncoderDecoderLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()

    @torch.jit.export
    def prepare_inputs(self, var1: torch.Tensor, var2: torch.Tensor):
        # Body truncated in the original gist; echoing the inputs keeps the
        # snippet scriptable for illustration.
        return var1, var2
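Because prepare_inputs is marked @torch.jit.export, it remains callable on the scripted module; a minimal usage sketch (the inputs are illustrative):

scripted = torch.jit.script(EncoderDecoderLanguageModel())
var1, var2 = scripted.prepare_inputs(torch.zeros(2, 4), torch.ones(2, 4))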
davidberard98 / def.h
Last active January 24, 2023 04:03
Explicit template specialization: demonstrates that the presence of a generic implementation can cause errors when an explicit template specialization exists. See run.sh for details.
#pragma once
#include <stdexcept>
class Base {
public:
  virtual void run() = 0;
};

// The header is truncated in the original gist; a plausible completion of
// the dangling "template<int val>" declaration:
template<int val>
class Impl : public Base {
public:
  void run() override;  // generic implementation; specializations may exist elsewhere
};