Created April 29, 2023 18:18
errors from AnVIL GPU
> r1 = run_cifar100() # pip3 install tensorflow; BiocManager::install("vjcitn/littleDeep", dependencies=TRUE)
No non-system installation of Python could be found.
Would you like to download and install Miniconda?
Miniconda is an open source environment management system for Python.
See https://docs.conda.io/en/latest/miniconda.html for more details.
Would you like to install Miniconda? [Y/n]: n
Installation aborted.
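One way to avoid the Miniconda prompt is to tell reticulate which Python to use before R starts. A minimal sketch, assuming a system Python at /usr/bin/python3 (the path is an assumption; substitute your own):

```shell
# Point reticulate at an existing Python installation; the path below
# is an assumption -- use `which python3` to find yours.
export RETICULATE_PYTHON=/usr/bin/python3
echo "$RETICULATE_PYTHON"
```

reticulate reads RETICULATE_PYTHON when it loads, so this must be set in the shell (or in ~/.Renviron) before the R session begins.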
2023-04-29 18:14:44.950515: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-29 18:14:45.911588: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz
169001437/169001437 [==============================] - 3s 0us/step
2023-04-29 18:14:59.114193: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:996] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[the same NUMA-node message repeated eight more times between 18:14:59 and 18:15:00]
2023-04-29 18:15:00.144383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13745 MB memory: -> device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5
Epoch 1/30
2023-04-29 18:15:07.031058: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8700
2023-04-29 18:15:07.849723: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-04-29 18:15:07.850611: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-04-29 18:15:07.850657: W tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:109] Couldn't get ptxas version : FAILED_PRECONDITION: Couldn't get ptxas/nvlink version string: INTERNAL: Couldn't invoke ptxas --version
2023-04-29 18:15:07.851492: I tensorflow/tsl/platform/default/subprocess.cc:304] Start cannot spawn child process: No such file or directory
2023-04-29 18:15:07.851594: W tensorflow/compiler/xla/stream_executor/gpu/redzone_allocator.cc:317] INTERNAL: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
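The ptxas warnings above usually mean the CUDA toolkit's bin directory is not on $PATH, as the "Modify $PATH" hint suggests. A hedged sketch of the fix, assuming the toolkit lives at /usr/local/cuda-11.8 (adjust to your installation):

```shell
# ptxas ships in the CUDA toolkit's bin directory; prepend it to PATH.
# /usr/local/cuda-11.8 is an assumed location -- verify with `ls /usr/local`.
export PATH=/usr/local/cuda-11.8/bin:$PATH
command -v ptxas || echo "ptxas still not found; check the CUDA install path"
```

This must be exported in the environment that launches R/RStudio so the TensorFlow subprocess inherits it.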
2023-04-29 18:15:08.078024: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x560c20215ac0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-04-29 18:15:08.078050: I tensorflow/compiler/xla/service/service.cc:177] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2023-04-29 18:15:08.083889: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2023-04-29 18:15:08.102549: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:530] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice.
Searched for CUDA in the following directories:
./cuda_sdk_lib
/usr/local/cuda-11.8
/usr/local/cuda
.
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
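Following the advice in the warning above, XLA can be pointed at the CUDA installation via XLA_FLAGS. A sketch assuming the toolkit is at /usr/local/cuda-11.8 (an assumption; libdevice.10.bc should sit under that directory's nvvm/libdevice subdirectory):

```shell
# Must be set before TensorFlow is imported/initialized in the session.
# The CUDA path is an assumption; adjust to where your toolkit lives.
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/usr/local/cuda-11.8
echo "$XLA_FLAGS"
```

If no directory contains nvvm/libdevice, the CUDA toolkit (not just the driver) likely needs to be installed on the VM image.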
2023-04-29 18:15:08.102920: W tensorflow/compiler/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:274] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
2023-04-29 18:15:08.103185: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at xla_ops.cc:362 : INTERNAL: libdevice not found at ./libdevice.10.bc
2023-04-29 18:15:08.103213: I tensorflow/core/common_runtime/executor.cc:1197] [/job:localhost/replica:0/task:0/device:GPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INTERNAL: libdevice not found at ./libdevice.10.bc
[[{{node StatefulPartitionedCall_11}}]]
[the libdevice warning and OP_REQUIRES failure above repeated eleven more times between 18:15:08.118935 and 18:15:08.699613]
Error: tensorflow.python.framework.errors_impl.InternalError: Graph execution error:
<... omitted ...>rs/optimizer.py", line 650, in apply_gradients
iteration = self._internal_apply_gradients(grads_and_vars)
File "/home/rstudio/.local/lib/python3.10/site-packages/keras/optimizers/optimizer.py", line 1200, in _internal_apply_gradients
return tf.__internal__.distribute.interim.maybe_merge_call(
File "/home/rstudio/.local/lib/python3.10/site-packages/keras/optimizers/optimizer.py", line 1250, in _distributed_apply_gradients_fn
distribution.extended.update(
File "/home/rstudio/.local/lib/python3.10/site-packages/keras/optimizers/optimizer.py", line 1245, in apply_grad_to_update_var
return self._update_step_xla(grad, var, id(self._var_key(var)))
Node: 'StatefulPartitionedCall_11'
libdevice not found at ./libdevice.10.bc
[[{{node StatefulPartitionedCall_11}}]] [Op:__inference_train_function_1255]
See `reticulate::py_last_error()` for details