# $CUDA_HOME/bin/nvcc binary_search_cuda.cu -std=c++17 -o binary_search_cuda -O3 # -Wl,-rpath $CUDA_HOME/lib64
$CUDA_HOME/bin/nvcc dense_to_jagged.cu -std=c++17 -o dense_to_jagged -O3 # -Wl,-rpath $CUDA_HOME/lib64
import torch
import torch._dynamo
import torch._inductor.inductor_prims  # imported so the Inductor-specific prims are registered


def fn(values, boundaries):
    # eager reference implementation
    return torch.bucketize(values, boundaries)


def fn_ind(values, boundaries):
    # same computation through the Inductor bucketize prim
    return torch.ops.prims._inductor_bucketize(values, boundaries)
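When comparing `fn` and `fn_ind`, it can help to have a plain reference for the semantics `torch.bucketize` documents for 1-D inputs. The sketch below is a hypothetical helper, `bucketize_reference`, which is not part of PyTorch. It assumes `boundaries` is sorted ascending and shows that both modes reduce to a binary search: `right=False` corresponds to `bisect_left` and `right=True` corresponds to `bisect_right`.

```python
from bisect import bisect_left, bisect_right


def bucketize_reference(values, boundaries, right=False):
    """Hypothetical pure-Python model of torch.bucketize for 1-D inputs.

    Assumes `boundaries` is sorted ascending. With right=False the returned
    index i satisfies boundaries[i-1] < v <= boundaries[i]; with right=True it
    satisfies boundaries[i-1] <= v < boundaries[i].
    """
    search = bisect_right if right else bisect_left
    return [search(boundaries, v) for v in values]


boundaries = [1, 3, 5, 7, 9]
values = [3, 6, 9]
print(bucketize_reference(values, boundaries))              # [1, 3, 4]
print(bucketize_reference(values, boundaries, right=True))  # [2, 3, 5]
```

Values that fall below the first boundary map to 0, and values above the last boundary map to `len(boundaries)`. A tensor implementation has to reproduce this behavior as well.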
// BUILD COMMAND:
// LD_LIBRARY_PATH=/usr/local/cuda-11.6/extras/CUPTI/lib64:$LD_LIBRARY_PATH nvcc -arch=sm_80 -std=c++17 -o cudagraph cudagraph.cu -lcupti
#include <cstddef>
#include <cuda_runtime_api.h>
#include <cstdio>
#include <sys/time.h>
#include <iostream>
#include <cupti.h>
#define N 500000 // tuned so that the kernel takes a few microseconds
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
--Type <RET> for more, q to quit, c to continue without paging--
futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x564019c3bda0)
    at ../sysdeps/nptl/futex-internal.h:320
320     ../sysdeps/nptl/futex-internal.h: No such file or directory.
(gdb) bt
#0  futex_abstimed_wait_cancelable (private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x564019c3bda0)
    at ../sysdeps/nptl/futex-internal.h:320
#1  do_futex_wait (sem=sem@entry=0x564019c3bda0, abstime=0x0, clockid=0) at sem_waitcommon.c:112
Function | Run 1 (s) | Run 2 (s) | Run 3 (s) | Run 4 (s) | Run 5 (s) | Run 6 (s) | Run 7 (s) | Run 8 (s) | Run 9 (s) | Run 10 (s)
---|---|---|---|---|---|---|---|---|---|---
`_compile` | 10.0482 | 0.1572 | 0.1491 | 0.1555 | 0.1522 | 0.2298 | 0.1465 | 0.1443 | 0.1435 | 0.1459
`OutputGraph.call_user_compiler` | 9.7309 | 0.1286 | 0.1210 | 0.1268 | 0.1233 | 0.1999 | 0.1209 | 0.1176 | 0.1174 | 0.1186
`create_aot_dispatcher_function` | 1.0459 | 0.1273 | 0.1196 | 0.1254 | 0.1220 | 0.1984 | 0.1195 | 0.1163 | 0.1159 | 0.1174
`compile_fx.<locals>.fw_compiler` | 0.0453 | 0.0127 | 0.0114 | 0.0107 | 0.0135 | 0.0284 | 0.0104 | 0.0122 | 0.0106 | 0.0111
`GraphLowering.run` | 0.0062 | 0.0061 | 0.0041 | 0.0038 | 0.0063 | 0.0049 | 0.0038 | 0.0059 | 0.0037 | 0.0040
`GraphLowering.compile_to_module` | 0.0362 | 0.0049 | 0.0053 | 0.0052 | 0.0054 | 0.0053 | 0.0049 | 0.0047 | 0.0049 | 0.0052
`Scheduler.__init__` | 0.0038 | 0.0027 | 0.0031 | 0.0030 | 0.0032 | 0.0031 | 0.0028 | 0.0028 | 0.0028 | 0.0029
`Scheduler.codegen` | 0.0005 | 0.0004 | 0.0004 | 0.0004 | 0.0004 | 0.0004 | 0.0004 | 0.0004 | 0.0004 | 0.0004
`WrapperCodeGen.generate` | 0.0015 | 0.0013 | 0.0013 | 0.0013 | 0.0013 | 0.0011 | 0.0012 | 0.0011 | 0.0012 | 0.0013
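One way to quantify the gap between the first column and the others in the runtime table is to compare the first measurement with the mean of the remaining nine. The sketch below does this for three rows copied from the table. `cold_vs_warm` and `RUNTIMES` are hypothetical names. The script also assumes that the first column is an initial (cold) compilation. The roughly 10 s value points that way, but the table itself does not say so.

```python
from statistics import mean

# Runtimes (s) copied from three rows of the table; index 0 is the first column.
RUNTIMES = {
    "_compile": [10.0482, 0.1572, 0.1491, 0.1555, 0.1522, 0.2298, 0.1465, 0.1443, 0.1435, 0.1459],
    "OutputGraph.call_user_compiler": [9.7309, 0.1286, 0.1210, 0.1268, 0.1233, 0.1999, 0.1209, 0.1176, 0.1174, 0.1186],
    "create_aot_dispatcher_function": [1.0459, 0.1273, 0.1196, 0.1254, 0.1220, 0.1984, 0.1195, 0.1163, 0.1159, 0.1174],
}


def cold_vs_warm(samples):
    """Return (first sample, mean of the remaining samples, first / mean)."""
    first, rest = samples[0], samples[1:]
    warm = mean(rest)
    return first, warm, first / warm


for name, samples in RUNTIMES.items():
    first, warm, ratio = cold_vs_warm(samples)
    print(f"{name:32s} first={first:8.4f}s  warm mean={warm:.4f}s  ratio={ratio:6.1f}x")
```

For `_compile`, the first column is roughly 63x the mean of the other nine. Most of that time falls inside `OutputGraph.call_user_compiler`.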
cuda train hf_Longformer [2023-03-08 23:40:23,242] torch._dynamo.debug_utils: [WARNING] Compiled Fx GraphModule failed. Creating script to minify the error.
[2023-03-08 23:40:23,244] torch._dynamo.debug_utils: [WARNING] Writing minified repro to /scratch/dberard/bisectdynamo/pytorch/torch_compile_debug/run_2023_03_08_23_40_23_244562-pid_3089959/minifier/minifier_launcher.py
ERROR:common:inductor raised Exception: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten.copy_.default(*(tensor([[[[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.]]],
pip3 uninstall -y triton
WARNING: Skipping triton as it is not installed.
pip3 install -U "git+https://github.com/openai/triton@b8b470bc597c1c5bd03682c09fe3e6b7c53787fd#subdirectory=python"
Collecting git+https://github.com/openai/triton@b8b470bc597c1c5bd03682c09fe3e6b7c53787fd#subdirectory=python
  Cloning https://github.com/openai/triton (to revision b8b470bc597c1c5bd03682c09fe3e6b7c53787fd) to /tmp/pip-req-build-41rq5c6y
  Running command git clone --filter=blob:none --quiet https://github.com/openai/triton /tmp/pip-req-build-41rq5c6y
  Running command git rev-parse -q --verify 'sha^b8b470bc597c1c5bd03682c09fe3e6b7c53787fd'
  Running command git fetch -q https://github.com/openai/triton b8b470bc597c1c5bd03682c09fe3e6b7c53787fd
  Running command git checkout -q b8b470bc597c1c5bd03682c09fe3e6b7c53787fd
  Resolved https://github.com/openai/triton to commit b8b470bc597c1c5bd03682c09fe3e6b7c53787fd
WARNING:root:Super_SloMo failed to load
Eager model failed to run
Traceback (most recent call last):
  File "/scratch/dberard/bisectdynamo/pytorch/benchmarks/dynamo/common.py", line 1115, in validate_model
    self.model_iter_fn(model, example_inputs)
  File "benchmarks/dynamo/torchbench.py", line 370, in forward_and_backward_pass
    self.grad_scaler.scale(loss).backward()
  File "/scratch/dberard/bisectdynamo/pytorch/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/scratch/dberard/bisectdynamo/pytorch/torch/autograd/__init__.py", line 204, in backward
import torch
import torch.nn as nn

# LM example
class EncoderDecoderLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()

    @torch.jit.export
    def prepare_inputs(self, var1: torch.Tensor, var2: torch.Tensor):
#pragma once
#include <stdexcept>

class Base {
 public:
  virtual void run() = 0;
};

template<int val>