David Berard davidberard98

  • PyTorch
  • Menlo Park, CA
import torch
import torchdynamo
import os
import logging
torchdynamo.config.verbose = True
torchdynamo.config.log_level = logging.DEBUG
def setup():
    os.environ["MASTER_ADDR"] = "localhost"
davidberard98 / 67434_0_log.out
Last active September 23, 2022 00:49
hf_T5, 2 nodes, dynamo+inductor, verbose=True, log_level=DEBUG; functorch..debug_graphs is FALSE.
This file has been truncated, but you can view the full file.
submitit INFO (2022-09-22 18:42:53,293) - Starting with JobEnvironment(job_id=67434, hostname=a100-st-p4d24xlarge-3, local_rank=0(8), node=0(2), global_rank=0(16))
submitit INFO (2022-09-22 18:42:53,294) - Loading pickle: /fsx/users/dberard/scratch-local/bench-fast/benchmark/logs/67434_submitted.pkl
Process group: 16 tasks, rank: 0
MY HOSTNAME: a100-st-p4d24xlarge-3
FI_PROVIDER : efa
LD_LIBRARY_PATH : /fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/lib:/opt/amazon/efa/lib:/fsx/users/dberard/scratch-local/bench-fast/aws-ofi-nccl/lib:/opt/amazon/efa/lib:/path/to/aws-ofi-nccl:/opt/amazon/efa/lib:/path/to/aws-ofi-nccl:/opt/amazon/efa/lib:/usr/local/cuda-11.6/lib:/usr/local/cuda-11.6/lib64:/opt/amazon/efa/lib:/opt/amazon/openmpi/lib:/usr/local/cuda/efa/lib:/usr/local/cuda/lib:/usr/local/cuda:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda/targets/x86_64-linux/lib:/usr/local/lib:/usr/lib:
NCCL_DEBUG : TRACE
FI_EFA_USE_DEVICE_RDMA : 1
a100-st-p4d24xlarge-3:69371:69371 [0] NCCL INFO
[I debug.cpp:49] [c10d] The debug level is set to DETAIL.
[I ProcessGroupNCCL.cpp:835] [Rank 0] NCCL watchdog thread started!
[I ProcessGroupNCCL.cpp:669] [Rank 0] ProcessGroupNCCL initialized with following options:
NCCL_ASYNC_ERROR_HANDLING: -2
NCCL_DESYNC_DEBUG: 1
NCCL_BLOCKING_WAIT: 0
TIMEOUT(ms): 1800000
USE_HIGH_PRIORITY_STREAM: 0
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
warnings.warn(
This file has been truncated, but you can view the full file.
submitit INFO (2022-09-15 00:31:04,046) - Starting with JobEnvironment(job_id=64923, hostname=a100-st-p4d24xlarge-15, local_rank=0(8), node=0(1), global_rank=0(8))
submitit INFO (2022-09-15 00:31:04,047) - Loading pickle: /fsx/users/dberard/scratch-local/bench-fast/benchmark/logs/64923_submitted.pkl
Process group: 8 tasks, rank: 0
a100-st-p4d24xlarge-15:26872:26872 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens
a100-st-p4d24xlarge-15:26872:26872 [0] NCCL INFO NCCL_SOCKET_IFNAME set to ens
a100-st-p4d24xlarge-15:26872:26872 [0] NCCL INFO Bootstrap : Using ens32:10.200.83.149<0>
a100-st-p4d24xlarge-15:26872:26872 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
a100-st-p4d24xlarge-15:26872:26872 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin symbol (v4 or v5).
a100-st-p4d24xlarge-15:26872:26872 [0] NCCL INFO cudaDriverVersion 11060
NCCL version 2.13.4+cuda11.6
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
warnings.warn(
/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
WARNING:root:Using TorchDynamo with a context manager will be deprecated soon.Please read https://github.com/pytorch/torchdynamo#usage-example to use TorchDynamo using an annotation.
ERROR:root:Error while processing frame
Traceback (most recent call last):
File "/data/home/dberard/miniconda/envs/bench-fast/lib/python3.8/site-packages/torchd
davidberard98 / build.sh
Last active July 6, 2022 00:35
duplicate build
g++ counter.cpp -c -o counter.o
g++ x.cpp -c -o x.o
g++ x.cpp -c -o y.o
g++ main.cpp -c -o main.o
g++ counter.o x.o y.o main.o -o main
resnext50_32x4d_forward_0 54 218.74908078461885 4436.730502173305 57523.576692678034 4421.9345189630985 None
resnext50_32x4d_backward_0 50 285.865287296474 42.50716231763363 None None None
nvidia_deeprecommender_backward_0 12 15.30700083822012 19.517005421221256 None None None
nvidia_deeprecommender_forward_0 6 11.350277811288834 14.164607971906662 4775.836288928986 8.331834338605404 True
moco_forward_4 0 0.40543172508478165 0.25066547095775604 0.2816496416926384 0.22500194609165192 True
moco_backward_0 52 211.168865673244 46.68072052299976 None None None
moco_backward_7 0 0.3038998693227768 0.578882172703743 0.03198999911546707 0.017229467630386353 True
moco_forward_9 5 4.927339963614941 1.6684727743268013 None None None
moco_forward_3 0 0.7490329444408417 0.4087546840310097 0.3969920799136162 0.3871563822031021 True
moco_backward_10 0 0.23642834275960922 0.047839246690273285 0.027767382562160492 0.01574307680130005 True
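The result rows above are whitespace-separated, with the literal `None` marking fields that were not recorded and a trailing `True` on some rows. The source does not name the columns, so the sketch below makes no claim about what they mean: `int_field` and `fields` are placeholder names chosen for illustration only. It shows one way such rows might be loaded for further inspection.

```python
from typing import Dict, List, Union

Value = Union[float, bool, None]


def parse_value(token: str) -> Value:
    """Convert one whitespace-separated field into a Python value."""
    if token == "None":
        return None
    if token in ("True", "False"):
        return token == "True"
    return float(token)


def parse_row(line: str) -> Dict[str, object]:
    """Split a row into its name, the integer second column, and the rest.

    The key names are placeholders; the source gives no column headers.
    """
    name, int_field, *rest = line.split()
    fields: List[Value] = [parse_value(tok) for tok in rest]
    return {"name": name, "int_field": int(int_field), "fields": fields}


# Two rows copied verbatim from the listing above.
rows = [
    "moco_forward_4 0 0.40543172508478165 0.25066547095775604 "
    "0.2816496416926384 0.22500194609165192 True",
    "moco_forward_9 5 4.927339963614941 1.6684727743268013 None None None",
]
parsed = [parse_row(r) for r in rows]

# Rows whose last field is missing can then be filtered out.
complete = [p for p in parsed if p["fields"][-1] is not None]
```

Keeping `None` as a real Python `None` rather than a string makes it straightforward to separate rows with complete measurements from rows where later fields are missing.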
====== resnext50_32x4d_forward_0 ======
Generating testing data...
...Aten2aten called ...
['getattr', '_operator.getitem', 'torch.ops.aten.abs.default', 'torch.ops.aten.abs.out', 'torch.ops.aten.acos.default', 'torch.ops.aten.acos.out', 'torch.ops.aten.acos.int', 'torch.ops.aten.acos.float', 'torch.ops.aten.acos.complex', 'torch.ops.aten.acos.Scalar', 'torch.ops.aten.acosh.default', 'torch.ops.aten.acosh.out', 'torch.ops.aten.acosh.int', 'torch.ops.aten.acosh.float', 'torch.ops.aten.acosh.complex', 'torch.ops.aten.acosh.Scalar', 'torch.ops.aten.asin.default', 'torch.ops.aten.asin.out', 'torch.ops.aten.asin.int', 'torch.ops.aten.asin.float', 'torch.ops.aten.asin.complex', 'torch.ops.aten.asin.Scalar', 'torch.ops.aten.atan.default', 'torch.ops.aten.atan.out', 'torch.ops.aten.atan.int', 'torch.ops.aten.atan.float', 'torch.ops.aten.atan.complex', 'torch.ops.aten.atan.Scalar', 'torch.ops.aten.bitwise_not.default', 'torch.ops.aten.bitwise_not.out', 'torch.ops.aten.ceil.default', 'torch.ops.aten.ceil.out', 'torch.o
import argparse
from torch.utils.jit.log_extract import load_graph_and_inputs, run_nnc, run_nvfuser
ir = ["""graph(%maskT.1 : Double(204, 204, 26, strides=[5304, 26, 1], requires_grad=0, device=cuda:0),
%1 : Double(204, 204, 26, strides=[5304, 26, 1], requires_grad=0, device=cuda:0),
%2 : Double(204, 204, 26, strides=[15912, 78, 3], requires_grad=0, device=cuda:0),
%zt.1 : Double(26, strides=[1], requires_grad=0, device=cuda:0)):
%4 : float = prim::Constant[value=0.79871999999999999]()
%betaT.1 : float = prim::Constant[value=0.00016699999999999999]() # /scratch/dberard/bench-june/benchmark/torchbenchmark/models/pyhpc_isoneutral_mixing/isoneutral_pytorch.py:9:12
%rho0.1 : float = prim::Constant[value=1024.]() # /scratch/dberard/bench-june/benchmark/torchbenchmark/models/pyhpc_isoneutral_mixing/isoneutral_pytorch.py:5:11
<GRAPH_EXPORT>
graph(%s.1 : Double(204, 204, 26, strides=[5304, 26, 1], requires_grad=0, device=cuda:0),
%t.1 : Double(204, 204, 26, strides=[5304, 26, 1], requires_grad=0, device=cuda:0),
%p.1 : Double(1, 1, 26, strides=[26, 26, 1], requires_grad=0, device=cuda:0)):
%3 : float = prim::Constant[value=2500.]() # /scratch/dberard/bench-june/benchmark/torchbenchmark/models/pyhpc_equation_of_state/eos_pytorch.py:236:10
%4 : float = prim::Constant[value=5000.]() # /scratch/dberard/bench-june/benchmark/torchbenchmark/models/pyhpc_equation_of_state/eos_pytorch.py:197:10
%5 : float = prim::Constant[value=3.028951243773433e-17]()
%6 : float = prim::Constant[value=1.2115804975093732e-16]()
%7 : float = prim::Constant[value=10000.]() # /scratch/dberard/bench-june/benchmark/torchbenchmark/models/pyhpc_equation_of_state/eos_pytorch.py:184:8
%v31.1 : float = prim::Constant[value=-3.3033088713864211e-05]() # /scratch/dberard/bench-june/benchmark/torchbenchmark/models/pyhpc_equation_of_state