(pt) root@rocm:~# python tmp/quickstart.py
/root/pt/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/root/pt/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ConvNeXt_Small_Weights.IMAGENET1K_V1`. You can also use `weights=ConvNeXt_Small_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
epoch train_loss valid_loss error_rate time
0 0.110175 0.006588 0.001353 00:34
epoch train_loss valid_loss error_rate time
0 0.015370 0.002482 0.000677 00:47
Training text processing model
epoch train_loss valid_loss accuracy time
0 0.467646 0.418376 0.808040 01:08
epoch train_loss valid_loss accuracy time
Traceback (most recent call last):---------------------------------------| 0.00% [0/390 00:00<?]
File "/root/tmp/quickstart.py", line 20, in <module>
learn.fine_tune(2, 1e-2)
File "/root/pt/lib/python3.10/site-packages/fastai/callback/schedule.py", line 168, in fine_tune
self.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)
File "/root/pt/lib/python3.10/site-packages/fastai/callback/schedule.py", line 119, in fit_one_cycle
self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd, start_epoch=start_epoch)
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 264, in fit
self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 199, in _with_events
try: self(f'before_{event_type}'); f()
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 253, in _do_fit
self._with_events(self._do_epoch, 'epoch', CancelEpochException)
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 199, in _with_events
try: self(f'before_{event_type}'); f()
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 247, in _do_epoch
self._do_epoch_train()
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 239, in _do_epoch_train
self._with_events(self.all_batches, 'train', CancelTrainException)
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 199, in _with_events
try: self(f'before_{event_type}'); f()
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 205, in all_batches
for o in enumerate(self.dl): self.one_batch(*o)
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 235, in one_batch
self._with_events(self._do_one_batch, 'batch', CancelBatchException)
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 199, in _with_events
try: self(f'before_{event_type}'); f()
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 223, in _do_one_batch
self._do_grad_opt()
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 211, in _do_grad_opt
self._with_events(self._backward, 'backward', CancelBackwardException)
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 199, in _with_events
try: self(f'before_{event_type}'); f()
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 207, in _backward
def _backward(self): self.loss_grad.backward()
File "/root/pt/lib/python3.10/site-packages/torch/_tensor.py", line 516, in backward
return handle_torch_function(
File "/root/pt/lib/python3.10/site-packages/torch/overrides.py", line 1636, in handle_torch_function
result = torch_func_method(public_api, types, args, kwargs)
File "/root/pt/lib/python3.10/site-packages/fastai/torch_core.py", line 382, in __torch_function__
res = super().__torch_function__(func, types, args, ifnone(kwargs, {}))
File "/root/pt/lib/python3.10/site-packages/torch/_tensor.py", line 1443, in __torch_function__
ret = func(*args, **kwargs)
File "/root/pt/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
torch.autograd.backward(
File "/root/pt/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
_engine_run_backward(
File "/root/pt/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: unique_by_key failed on 2nd step: hipErrorSharedObjectInitFailed: shared object initialization failed
(pt) root@rocm:~# pip3 install --pre --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.0
Looking in indexes: https://download.pytorch.org/whl/nightly/rocm6.0
Collecting torch
Using cached https://download.pytorch.org/whl/nightly/rocm6.0/torch-2.4.0.dev20240503%2Brocm6.0-cp310-cp310-linux_x86_64.whl (2198.3 MB)
Collecting torchvision
Using cached https://download.pytorch.org/whl/nightly/rocm6.0/torchvision-0.19.0.dev20240503%2Brocm6.0-cp310-cp310-linux_x86_64.whl (65.9 MB)
Collecting torchaudio
Using cached https://download.pytorch.org/whl/nightly/rocm6.0/torchaudio-2.2.0.dev20240503%2Brocm6.0-cp310-cp310-linux_x86_64.whl (1.7 MB)
Collecting networkx
Using cached https://download.pytorch.org/whl/nightly/networkx-3.2.1-py3-none-any.whl (1.6 MB)
Collecting typing-extensions>=4.8.0
Using cached https://download.pytorch.org/whl/nightly/typing_extensions-4.8.0-py3-none-any.whl (31 kB)
Collecting sympy
Using cached https://download.pytorch.org/whl/nightly/sympy-1.12-py3-none-any.whl (5.7 MB)
Collecting jinja2
Using cached https://download.pytorch.org/whl/nightly/Jinja2-3.1.3-py3-none-any.whl (133 kB)
Collecting pytorch-triton-rocm==3.0.0+bbe6246e37
Using cached https://download.pytorch.org/whl/nightly/pytorch_triton_rocm-3.0.0%2Bbbe6246e37-cp310-cp310-linux_x86_64.whl (330.5 MB)
Collecting fsspec
Using cached https://download.pytorch.org/whl/nightly/fsspec-2024.2.0-py3-none-any.whl (170 kB)
Collecting filelock
Using cached https://download.pytorch.org/whl/nightly/filelock-3.13.1-py3-none-any.whl (11 kB)
Collecting pillow!=8.3.*,>=5.3.0
Using cached https://download.pytorch.org/whl/nightly/Pillow-9.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Collecting numpy
Using cached https://download.pytorch.org/whl/nightly/numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
Collecting MarkupSafe>=2.0
Using cached https://download.pytorch.org/whl/nightly/MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting mpmath>=0.19
Using cached https://download.pytorch.org/whl/nightly/mpmath-1.2.1-py3-none-any.whl (532 kB)
Installing collected packages: mpmath, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, fsspec, filelock, pytorch-triton-rocm, jinja2, torch, torchvision, torchaudio
Attempting uninstall: mpmath
Found existing installation: mpmath 1.3.0
Uninstalling mpmath-1.3.0:
Successfully uninstalled mpmath-1.3.0
Attempting uninstall: typing-extensions
Found existing installation: typing_extensions 4.9.0
Uninstalling typing_extensions-4.9.0:
Successfully uninstalled typing_extensions-4.9.0
Attempting uninstall: sympy
Found existing installation: sympy 1.12
Uninstalling sympy-1.12:
Successfully uninstalled sympy-1.12
Attempting uninstall: pillow
Found existing installation: pillow 10.2.0
Uninstalling pillow-10.2.0:
Successfully uninstalled pillow-10.2.0
Attempting uninstall: numpy
Found existing installation: numpy 1.26.3
Uninstalling numpy-1.26.3:
Successfully uninstalled numpy-1.26.3
Attempting uninstall: networkx
Found existing installation: networkx 3.2.1
Uninstalling networkx-3.2.1:
Successfully uninstalled networkx-3.2.1
Attempting uninstall: MarkupSafe
Found existing installation: MarkupSafe 2.1.5
Uninstalling MarkupSafe-2.1.5:
Successfully uninstalled MarkupSafe-2.1.5
Attempting uninstall: fsspec
Found existing installation: fsspec 2024.2.0
Uninstalling fsspec-2024.2.0:
Successfully uninstalled fsspec-2024.2.0
Attempting uninstall: filelock
Found existing installation: filelock 3.13.1
Uninstalling filelock-3.13.1:
Successfully uninstalled filelock-3.13.1
Attempting uninstall: pytorch-triton-rocm
Found existing installation: pytorch-triton-rocm 2.3.0
Uninstalling pytorch-triton-rocm-2.3.0:
Successfully uninstalled pytorch-triton-rocm-2.3.0
Attempting uninstall: jinja2
Found existing installation: Jinja2 3.1.3
Uninstalling Jinja2-3.1.3:
Successfully uninstalled Jinja2-3.1.3
Attempting uninstall: torch
Found existing installation: torch 2.3.0+rocm6.0
Uninstalling torch-2.3.0+rocm6.0:
Successfully uninstalled torch-2.3.0+rocm6.0
Attempting uninstall: torchvision
Found existing installation: torchvision 0.18.0+rocm6.0
Uninstalling torchvision-0.18.0+rocm6.0:
Successfully uninstalled torchvision-0.18.0+rocm6.0
Attempting uninstall: torchaudio
Found existing installation: torchaudio 2.3.0+rocm6.0
Uninstalling torchaudio-2.3.0+rocm6.0:
Successfully uninstalled torchaudio-2.3.0+rocm6.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fastai 2.7.15 requires torch<2.4,>=1.10, but you have torch 2.4.0.dev20240503+rocm6.0 which is incompatible.
Successfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.2.0 jinja2-3.1.3 mpmath-1.2.1 networkx-3.2.1 numpy-1.26.4 pillow-9.3.0 pytorch-triton-rocm-3.0.0+bbe6246e37 sympy-1.12 torch-2.4.0.dev20240503+rocm6.0 torchaudio-2.2.0.dev20240503+rocm6.0 torchvision-0.19.0.dev20240503+rocm6.0 typing-extensions-4.8.0
(pt) root@rocm:~# python tmp/quickstart.py
/root/pt/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/root/pt/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ConvNeXt_Small_Weights.IMAGENET1K_V1`. You can also use `weights=ConvNeXt_Small_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
epoch train_loss valid_loss error_rate time
0 0.112227 0.001195 0.000000 00:35
epoch train_loss valid_loss error_rate time
0 0.011899 0.002989 0.000677 00:48
Training text processing model
epoch train_loss valid_loss accuracy time
0 0.465231 0.393258 0.821160 01:09
epoch train_loss valid_loss accuracy time
Traceback (most recent call last):---------------------------------------| 0.00% [0/390 00:00<?]
File "/root/tmp/quickstart.py", line 20, in <module>
learn.fine_tune(2, 1e-2)
File "/root/pt/lib/python3.10/site-packages/fastai/callback/schedule.py", line 168, in fine_tune
self.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)
File "/root/pt/lib/python3.10/site-packages/fastai/callback/schedule.py", line 119, in fit_one_cycle
self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd, start_epoch=start_epoch)
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 264, in fit
self._with_events(self._do_fit, 'fit', CancelFitException, self._end_cleanup)
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 199, in _with_events
try: self(f'before_{event_type}'); f()
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 253, in _do_fit
self._with_events(self._do_epoch, 'epoch', CancelEpochException)
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 199, in _with_events
try: self(f'before_{event_type}'); f()
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 247, in _do_epoch
self._do_epoch_train()
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 239, in _do_epoch_train
self._with_events(self.all_batches, 'train', CancelTrainException)
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 199, in _with_events
try: self(f'before_{event_type}'); f()
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 205, in all_batches
for o in enumerate(self.dl): self.one_batch(*o)
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 235, in one_batch
self._with_events(self._do_one_batch, 'batch', CancelBatchException)
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 199, in _with_events
try: self(f'before_{event_type}'); f()
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 223, in _do_one_batch
self._do_grad_opt()
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 211, in _do_grad_opt
self._with_events(self._backward, 'backward', CancelBackwardException)
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 199, in _with_events
try: self(f'before_{event_type}'); f()
File "/root/pt/lib/python3.10/site-packages/fastai/learner.py", line 207, in _backward
def _backward(self): self.loss_grad.backward()
File "/root/pt/lib/python3.10/site-packages/torch/_tensor.py", line 514, in backward
return handle_torch_function(
File "/root/pt/lib/python3.10/site-packages/torch/overrides.py", line 1646, in handle_torch_function
result = torch_func_method(public_api, types, args, kwargs)
File "/root/pt/lib/python3.10/site-packages/fastai/torch_core.py", line 382, in __torch_function__
res = super().__torch_function__(func, types, args, ifnone(kwargs, {}))
File "/root/pt/lib/python3.10/site-packages/torch/_tensor.py", line 1441, in __torch_function__
ret = func(*args, **kwargs)
File "/root/pt/lib/python3.10/site-packages/torch/_tensor.py", line 523, in backward
torch.autograd.backward(
File "/root/pt/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
_engine_run_backward(
File "/root/pt/lib/python3.10/site-packages/torch/autograd/graph.py", line 767, in _engine_run_backward
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: unique_by_key failed on 2nd step: hipErrorSharedObjectInitFailed: shared object initialization failed
(pt) root@rocm:~# pip3 install --pre --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1
Looking in indexes: https://download.pytorch.org/whl/nightly/rocm6.1
Collecting torch
Using cached https://download.pytorch.org/whl/nightly/rocm6.1/torch-2.4.0.dev20240503%2Brocm6.1-cp310-cp310-linux_x86_64.whl (2496.9 MB)
Collecting torchvision
Using cached https://download.pytorch.org/whl/nightly/rocm6.1/torchvision-0.19.0.dev20240503%2Brocm6.1-cp310-cp310-linux_x86_64.whl (72.7 MB)
Collecting torchaudio
Using cached https://download.pytorch.org/whl/nightly/rocm6.1/torchaudio-2.2.0.dev20240503%2Brocm6.1-cp310-cp310-linux_x86_64.whl (1.7 MB)
Collecting jinja2
Using cached https://download.pytorch.org/whl/nightly/Jinja2-3.1.3-py3-none-any.whl (133 kB)
Collecting pytorch-triton-rocm==3.0.0+bbe6246e37
Using cached https://download.pytorch.org/whl/nightly/pytorch_triton_rocm-3.0.0%2Bbbe6246e37-cp310-cp310-linux_x86_64.whl (330.5 MB)
Collecting filelock
Using cached https://download.pytorch.org/whl/nightly/filelock-3.13.1-py3-none-any.whl (11 kB)
Collecting typing-extensions>=4.8.0
Using cached https://download.pytorch.org/whl/nightly/typing_extensions-4.8.0-py3-none-any.whl (31 kB)
Collecting fsspec
Using cached https://download.pytorch.org/whl/nightly/fsspec-2024.2.0-py3-none-any.whl (170 kB)
Collecting networkx
Using cached https://download.pytorch.org/whl/nightly/networkx-3.2.1-py3-none-any.whl (1.6 MB)
Collecting sympy
Using cached https://download.pytorch.org/whl/nightly/sympy-1.12-py3-none-any.whl (5.7 MB)
Collecting numpy
Using cached https://download.pytorch.org/whl/nightly/numpy-1.26.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
Collecting pillow!=8.3.*,>=5.3.0
Using cached https://download.pytorch.org/whl/nightly/Pillow-9.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.2 MB)
Collecting MarkupSafe>=2.0
Using cached https://download.pytorch.org/whl/nightly/MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting mpmath>=0.19
Using cached https://download.pytorch.org/whl/nightly/mpmath-1.2.1-py3-none-any.whl (532 kB)
Installing collected packages: mpmath, typing-extensions, sympy, pillow, numpy, networkx, MarkupSafe, fsspec, filelock, pytorch-triton-rocm, jinja2, torch, torchvision, torchaudio
Attempting uninstall: mpmath
Found existing installation: mpmath 1.2.1
Uninstalling mpmath-1.2.1:
Successfully uninstalled mpmath-1.2.1
Attempting uninstall: typing-extensions
Found existing installation: typing_extensions 4.8.0
Uninstalling typing_extensions-4.8.0:
Successfully uninstalled typing_extensions-4.8.0
Attempting uninstall: sympy
Found existing installation: sympy 1.12
Uninstalling sympy-1.12:
Successfully uninstalled sympy-1.12
Attempting uninstall: pillow
Found existing installation: Pillow 9.3.0
Uninstalling Pillow-9.3.0:
Successfully uninstalled Pillow-9.3.0
Attempting uninstall: numpy
Found existing installation: numpy 1.26.4
Uninstalling numpy-1.26.4:
Successfully uninstalled numpy-1.26.4
Attempting uninstall: networkx
Found existing installation: networkx 3.2.1
Uninstalling networkx-3.2.1:
Successfully uninstalled networkx-3.2.1
Attempting uninstall: MarkupSafe
Found existing installation: MarkupSafe 2.1.5
Uninstalling MarkupSafe-2.1.5:
Successfully uninstalled MarkupSafe-2.1.5
Attempting uninstall: fsspec
Found existing installation: fsspec 2024.2.0
Uninstalling fsspec-2024.2.0:
Successfully uninstalled fsspec-2024.2.0
Attempting uninstall: filelock
Found existing installation: filelock 3.13.1
Uninstalling filelock-3.13.1:
Successfully uninstalled filelock-3.13.1
Attempting uninstall: pytorch-triton-rocm
Found existing installation: pytorch-triton-rocm 3.0.0+bbe6246e37
Uninstalling pytorch-triton-rocm-3.0.0+bbe6246e37:
Successfully uninstalled pytorch-triton-rocm-3.0.0+bbe6246e37
Attempting uninstall: jinja2
Found existing installation: Jinja2 3.1.3
Uninstalling Jinja2-3.1.3:
Successfully uninstalled Jinja2-3.1.3
Attempting uninstall: torch
Found existing installation: torch 2.4.0.dev20240503+rocm6.0
Uninstalling torch-2.4.0.dev20240503+rocm6.0:
Successfully uninstalled torch-2.4.0.dev20240503+rocm6.0
Attempting uninstall: torchvision
Found existing installation: torchvision 0.19.0.dev20240503+rocm6.0
Uninstalling torchvision-0.19.0.dev20240503+rocm6.0:
Successfully uninstalled torchvision-0.19.0.dev20240503+rocm6.0
Attempting uninstall: torchaudio
Found existing installation: torchaudio 2.2.0.dev20240503+rocm6.0
Uninstalling torchaudio-2.2.0.dev20240503+rocm6.0:
Successfully uninstalled torchaudio-2.2.0.dev20240503+rocm6.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fastai 2.7.15 requires torch<2.4,>=1.10, but you have torch 2.4.0.dev20240503+rocm6.1 which is incompatible.
Successfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.2.0 jinja2-3.1.3 mpmath-1.2.1 networkx-3.2.1 numpy-1.26.4 pillow-9.3.0 pytorch-triton-rocm-3.0.0+bbe6246e37 sympy-1.12 torch-2.4.0.dev20240503+rocm6.1 torchaudio-2.2.0.dev20240503+rocm6.1 torchvision-0.19.0.dev20240503+rocm6.1 typing-extensions-4.8.0
(pt) root@rocm:~# python tmp/quickstart.py
/root/pt/lib/python3.10/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/root/pt/lib/python3.10/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ConvNeXt_Small_Weights.IMAGENET1K_V1`. You can also use `weights=ConvNeXt_Small_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
epoch train_loss valid_loss error_rate time
0 0.117716 0.001144 0.000677 00:35
epoch train_loss valid_loss error_rate time
0 0.013864 0.000865 0.000677 00:48
Training text processing model
epoch train_loss valid_loss accuracy time
0 0.464525 0.385803 0.827080 01:09
epoch train_loss valid_loss accuracy time
0 0.289083 0.221270 0.913160 02:03
1 0.228864 0.199963 0.922200 02:04
(pt) root@rocm:~#