David Berard (davidberard98)

  • PyTorch
  • Menlo Park, CA
@davidberard98
davidberard98 / torchvision_0513.txt
Last active May 17, 2022 21:14
$ PYTORCH_JIT_ENABLE_NVFUSER=1 CIRCLECI=1 python -m pytest --junitxml=test-results/junit.xml -v --durations 20
[file truncated in this preview]
============================= test session starts ==============================
platform linux -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0 -- /data/home/dberard/miniconda/envs/scratch39/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/scratch/dberard/local/vision/.hypothesis/examples')
rootdir: /scratch/dberard/local/vision, configfile: pytest.ini, testpaths: test
plugins: hypothesis-6.46.2
collecting ... collected 29608 items
test/test_backbone_utils.py::test_resnet_fpn_backbone[resnet18] SKIPPED [ 0%]
test/test_backbone_utils.py::test_resnet_fpn_backbone[resnet50] SKIPPED [ 0%]
@davidberard98
davidberard98 / may3-scripted.csv
Last active May 3, 2022 18:37
convolution measurements - A100, channels_last
N   tdim  wdim  ch  is_channels_last  ms
16  100   3     3   1                 18.43367154942825
16  100   3     3   0                 18.246989184990525
16  100   3     7   1                 18.320465798024088
16  100   3     7   0                 18.22661985643208
16  100   3     8   1                 20.038793003186584
16  100   3     8   0                 18.317282549105585
16  100   5     3   1                 18.30171929905191
16  100   5     3   0                 18.09040275402367
16  100   5     7   1                 29.462874494493008
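Each shape appears twice (is_channels_last = 1 then 0), so the channels_last cost can be read off by pairing rows. A small parser for that pairing (hypothetical helper, not part of the gist) might look like:

```python
# Pair channels_last (1) and contiguous (0) rows per shape and report the ratio.
# Sample rows copied from the measurements above.
rows = """\
16 100 3 3 1 18.43367154942825
16 100 3 3 0 18.246989184990525
16 100 3 8 1 20.038793003186584
16 100 3 8 0 18.317282549105585
""".splitlines()

def channels_last_ratios(lines):
    # key = (N, tdim, wdim, ch) -> {is_channels_last flag: ms}
    timings = {}
    for line in lines:
        *shape, flag, ms = line.split()
        timings.setdefault(tuple(shape), {})[flag] = float(ms)
    # ratio > 1 means channels_last was slower than contiguous
    return {k: v["1"] / v["0"] for k, v in timings.items() if set(v) == {"0", "1"}}

for shape, ratio in channels_last_ratios(rows).items():
    print(shape, round(ratio, 3))
```

For example, the (16, 100, 3, 8) pair above comes out to roughly a 1.09x slowdown under channels_last.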
@davidberard98
davidberard98 / backup.sh
Last active April 27, 2022 03:21
scratch tools
# Snapshot /scratch/$USER/local/pytorch every 5 minutes into an lz4-compressed
# tarball, then move it to persistent storage under /data/home.
while true
do
    sleep 300
    tar cf - --directory /scratch/$USER/local pytorch | lz4 - -f /scratch/$USER/local-pytorch.lz4 -q
    mv /scratch/$USER/local-pytorch.lz4 /data/home/$USER/scratch_tools/
done
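The gist only shows the backup half; the matching restore step is not included. A sketch of the round trip, using /tmp stand-ins for the real /scratch and /data/home paths (hypothetical, adjust paths as needed):

```shell
# Demo of backup + restore in /tmp; the real script uses /scratch/$USER paths.
mkdir -p /tmp/demo/local/pytorch
echo "checkout" > /tmp/demo/local/pytorch/README
# backup (same shape as the loop above): tar to stdout, compress with lz4
tar cf - --directory /tmp/demo/local pytorch | lz4 - -f /tmp/demo/local-pytorch.lz4 -q
# restore: decompress to stdout, unpack the tarball into the target directory
mkdir -p /tmp/demo/restore
lz4 -dc /tmp/demo/local-pytorch.lz4 | tar xf - --directory /tmp/demo/restore
cat /tmp/demo/restore/pytorch/README
```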
@davidberard98
davidberard98 / extremal.txt
Last active April 28, 2022 21:02
extremal nvfuser opinfo failures - logs
2022-04-28T18:21:31.8743911Z ======================================================================
2022-04-28T18:21:31.8745075Z FAIL [0.152s]: test_nvfuser_extremal_values_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16 (__main__.TestCudaFuserOpInfoCUDA)
2022-04-28T18:21:31.8746363Z ----------------------------------------------------------------------
2022-04-28T18:21:31.8746833Z Traceback (most recent call last):
2022-04-28T18:21:31.8747861Z File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py", line 1796, in wrapper
2022-04-28T18:21:31.8748515Z method(*args, **kwargs)
2022-04-28T18:21:31.8749414Z File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/common_utils.py", line 1796, in wrapper
2022-04-28T18:21:31.8749935Z method(*args, **kwargs)
2022-04-28T18:21:31.8750485Z File "/opt/conda/lib/python3.9/site-packages/torch/testing/_internal/common_device_type.py", line 376, in instantiated_test
2022-04-28T18:21:31.8751089Z result =
srun: error: ioctl(TIOCGWINSZ): Inappropriate ioctl for device
srun: error: Not using a pseudo-terminal, disregarding --pty option
[DUMP graph_fuser.cpp:2323] Before Fusion:
[DUMP graph_fuser.cpp:2323] graph(%t1.1 : Tensor,
[DUMP graph_fuser.cpp:2323] %t2.1 : Tensor,
[DUMP graph_fuser.cpp:2323] %t3.1 : Tensor,
[DUMP graph_fuser.cpp:2323] %t4.1 : Tensor,
[DUMP graph_fuser.cpp:2323] %i1.1 : int,
[DUMP graph_fuser.cpp:2323] %i2.1 : int):
[DUMP graph_fuser.cpp:2323] %9 : int = prim::Constant[value=-1]() # /fsx/users/dberard/pytorch/33-repro.py:8:28
@davidberard98
davidberard98 / nvfuser-microbenchmarks-apr19.csv
Last active April 19, 2022 21:01
nvfuser microbenchmark results from apr19
name            eager (ms)  nnc static (ms)  nnc dynamic (ms)  nvfuser (ms)
autogen-0       0.290       0.281            0.283             0.279
autogen-1       0.177       0.176            0.176             0.175
autogen-2       0.489       0.464            0.491             0.276
autogen-3       4.090       0.875            0.919             1.002
batchnorm-silu  0.289       0.285            0.285             0.221
autogen-4       0.372       0.372            0.368             0.431
autogen-5       0.599       0.597            0.619             0.313
autogen-6       5.152       1.212            1.169             1.384
autogen-7       0.185       0.185            0.183             0.184
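One way to rank these rows is by the eager/nvfuser speedup; a small helper to do that (hypothetical, not part of the gist) could look like:

```python
# (name, eager ms, nnc static ms, nnc dynamic ms, nvfuser ms) — values copied
# from a few rows of the table above.
results = [
    ("autogen-2", 0.489, 0.464, 0.491, 0.276),
    ("autogen-3", 4.090, 0.875, 0.919, 1.002),
    ("batchnorm-silu", 0.289, 0.285, 0.285, 0.221),
]

def nvfuser_speedups(rows):
    # speedup > 1 means nvfuser is faster than eager mode
    return {name: eager / nvf for name, eager, _, _, nvf in rows}

for name, s in sorted(nvfuser_speedups(results).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {s:.2f}x")
```

On these rows, autogen-3 shows the largest win over eager (about 4x), even though NNC's static shapes beat nvfuser there.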
import torch
import torch.utils.jit.log_extract as log_extract

ir = """graph(%0 : Double(204, 204, 26, strides=[5304, 26, 1], requires_grad=0, device=cuda:0),
      %1 : Double(204, 204, 26, strides=[5304, 26, 1], requires_grad=0, device=cuda:0),
      %2 : Double(204, 204, 26, strides=[5304, 26, 1], requires_grad=0, device=cuda:0),
      %3 : Double(204, 204, 26, strides=[5304, 26, 1], requires_grad=0, device=cuda:0),
      %4 : Double(204, 204, 26, strides=[5304, 26, 1], requires_grad=0, device=cuda:0),
      %5 : Double(204, 204, 26, strides=[5304, 26, 1], requires_grad=0, device=cuda:0),
      %6 : Double(requires_grad=0, device=cuda:0),
import torch
from torchvision.models import regnet_y_128gf

def run(model, iters: int = 20, bs: int = 64, device="cuda") -> None:
    print("Warm up ...")
    with torch.no_grad():
        for i in range(5):
            model(torch.rand(bs, 3, 224, 224, device=device))
        print("Start benchmarking...")
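The preview cuts off before the timed loop. A generic version of that step (hypothetical sketch built on time.perf_counter; real GPU benchmarks would additionally need torch.cuda.synchronize() around the timed region) could look like:

```python
import time

def benchmark(fn, iters: int = 20) -> float:
    # Average wall-clock milliseconds per call of fn over `iters` iterations.
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1e3

# Example with a trivial callable standing in for model(...):
print(f"{benchmark(lambda: sum(range(1000))):.3f} ms/iter")
```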
$ python ../../../test/test_jit_cuda_fuser.py -v
test__softmax_function (__main__.TestCudaFuser) ... ok
test__softmax_function_half_to_float (__main__.TestCudaFuser) ... ok
test_addcmul_ops (__main__.TestCudaFuser) ... ok
test_alias_pass_fix (__main__.TestCudaFuser) ... ERROR
test_autocast_1 (__main__.TestCudaFuser) ... ok
test_autocast_1_bfloat (__main__.TestCudaFuser) ... skipped 'device does not support BFloat16'
test_autocast_2 (__main__.TestCudaFuser) ... ok
test_autocast_2_bfloat (__main__.TestCudaFuser) ... skipped 'device does not support BFloat16'
test_backward_type (__main__.TestCudaFuser) ... ok