(partial) EasyBuild log for failed build of /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-rbsrurr4/files_pr21438/d/DeepSpeed/DeepSpeed-0.14.5-foss-2023a-CUDA-12.1.1.eb (PR(s) #21438) (easyblock PR(s) #3450)
== 2024-11-12 11:53:10,711 easyblock.py:311 INFO This is EasyBuild 4.9.4 (framework: 4.9.4, easyblocks: 4.9.4) on host alvis4-25.
== 2024-11-12 11:53:10,711 easyblock.py:317 INFO This is easyblock PythonBundle from module easybuild.easyblocks.generic.pythonbundle (/apps/Common/software/EasyBuild/4.9.4/lib/python3.6/site-packages/easybuild/easyblocks/generic/pythonbundle.py)
== 2024-11-12 11:53:10,711 easyblock.py:1063 INFO Build dir set to /dev/shm/DeepSpeed/0.14.5/foss-2023a-CUDA-12.1.1
== 2024-11-12 11:53:10,711 easyblock.py:1120 INFO Software install dir set to /apps/Test/software/DeepSpeed/0.14.5-foss-2023a-CUDA-12.1.1
== 2024-11-12 11:53:10,711 easyblock.py:1125 INFO Module install dir set to /apps/Test/fmodules/all
== 2024-11-12 11:53:10,711 easyblock.py:286 INFO Init completed for application name DeepSpeed version 0.14.5
== 2024-11-12 11:53:10,711 easyconfig.py:467 INFO Performing quick parse to check for valid easyconfig file...
== 2024-11-12 11:53:10,712 environment.py:93 INFO Environment variable LMOD_QUIET set to 1 (previous value: '1')
== 2024-11-12 11:53:10,712 environment.py:93 INFO Environment variable LMOD_IGNORE_CACHE set to 1 (previous value: '1')
== 2024-11-12 11:53:10,712 environment.py:93 INFO Environment variable LMOD_REDIRECT set to no (previous value: 'no')
== 2024-11-12 11:53:10,712 environment.py:93 INFO Environment variable LMOD_EXTENDED_DEFAULT set to no (previous value: 'no')
== 2024-11-12 11:53:10,713 environment.py:93 INFO Environment variable LMOD_TERSE_DECORATIONS set to no (previous value: 'no')
== 2024-11-12 11:53:10,713 modules.py:301 INFO Full path for Lmod command is /usr/share/lmod/lmod/libexec/lmod, so using it
== 2024-11-12 11:53:10,713 modules.py:457 INFO Prepended list of module paths with path used by EasyBuild: /apps/Test/fmodules/all
== 2024-11-12 11:53:10,713 easyconfig.py:476 INFO Obtained list of valid module classes: ['base', 'ai', 'astro', 'bio', 'cae', 'chem', 'compiler', 'data', 'debugger', 'devel', 'geo', 'ide', 'lang', 'lib', 'math', 'mpi', 'numlib', 'perf', 'quantum', 'phys', 'system', 'toolchain', 'tools', 'vis']
== 2024-11-12 11:53:10,714 easyconfig.py:1911 INFO Derived full easyblock module path for PythonBundle: easybuild.easyblocks.generic.pythonbundle
== 2024-11-12 11:53:10,714 easyconfig.py:1960 INFO Successfully obtained class 'PythonBundle' for easyblock 'PythonBundle' (software name 'DeepSpeed')
== 2024-11-12 11:53:10,714 easyconfig.py:708 INFO Parsing easyconfig file None with rawcontent: easyblock = 'PythonBundle'
name = 'DeepSpeed'
version = '0.14.5'
versionsuffix = '-CUDA-%(cudaver)s'
homepage = "http://www.deepspeed.ai/"
description = """
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
"""
toolchain = {'name': 'foss', 'version': '2023a'}
builddependencies = [
    ('Ninja', '1.11.1'),
    ('Transformers', '4.39.3'),
]
dependencies = [
    ('Python', '3.11.3'),
    ('CUDA', '12.1.1', '', SYSTEM),
    ('NCCL', '2.18.3', versionsuffix),
    ('CUTLASS', '3.4.0', versionsuffix),
    ('PyTorch', '2.1.2', versionsuffix),
    ('CuPy', '13.0.0', versionsuffix),
    ('Triton', '2.1.0', versionsuffix),
    ('accelerate', '0.33.0', versionsuffix),
    ('PyTorch-bundle', '2.1.2', versionsuffix), # torchvision dependency for mup
    ('mpi4py', '3.1.4'),
    ('Seaborn', '0.13.2'), # dependency for mup
    ('DLPack', '0.8'),
    ('py-cpuinfo', '9.0.0'),
    ('pydantic', '2.5.3'),
    ('tqdm', '4.66.1'),
    ('libaio', '0.3.113'), # for async_io (builddep only?)
]
use_pip = True
github_account = 'microsoft'
exts_list = [
    ('hjson', '3.1.0', {
        'checksums': ['55af475a27cf83a7969c808399d7bccdec8fb836a07ddbd574587593b9cdcf75'],
    }),
    ('nvidia-ml-py', '12.535.161', {
        'checksums': ['2bcc31ff7a0ea291ed8d7fc39b149391a42c2fb1cb4256c935e692de488b4d17'],
        'modulename': 'pynvml',
    }),
    ('mup', '1.0.0', {
        'checksums': ['9639e3d19f90e754f985ed444542ed2f8a049f3c0488fcb6efe150f30922cf74'],
    }),
    (name, version, {
        'ds_build_ops_to_skip': [
            # Sets DS_BUILD_<OPT>=0 http://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops
            'SPARSE_ATTN', # requires PyTorch<2.0
            'FP_QUANTIZER', # Untested triton version (2.1.0), only 2.3.0 and 2.3.1 are known to be compatible
            'CUTLASS_OPS', # requires dskernels
            'RAGGED_DEVICE_OPS', # requires dskernels
        ],
        'fix_python_shebang_for': ['bin/*'],
        'installopts': '--global-option="build_ext" --global-option="-j%(parallel)s"',
        'patches': [
            'DeepSpeed-0.14.2_no-ninja-dep.patch',
            'DeepSpeed-0.14.5_pic-compile.patch',
            'DeepSpeed-0.14.5_pdsh-env-vars.patch',
            'DeepSpeed-0.14.5_use-eb-cutlass.patch',
            'DeepSpeed-0.14.5_test-nvme-offload.patch',
        ],
        'runtest': (
            'ln -s $PWD/tests/ ../tests'
            ' && cd ../'
            ' && pytest tests/unit/'
            ' -k "not TestTensorBoard' # requires tensorboard
            ' and not TestWandb' # requires wandb
            ' and not TestCometMonitor"' # requires comet
        ),
        # Test suite not available on pypi
        'source_urls': [GITHUB_SOURCE],
        'sources': [{'download_filename': 'v%(version)s.tar.gz', 'filename': SOURCE_TAR_GZ}],
        'testinstall': True,
        'checksums': [
            {'DeepSpeed-0.14.5.tar.gz': '9f5622715cbd89c7382bfecf7fb188419ad3f2af7764dc6de35917abc6390cce'},
            {'DeepSpeed-0.14.2_no-ninja-dep.patch': '03ab528096387e7f18d2a5a6f5fc20ed86d1ca8f63f0e65f266f4dda30e11776'},
            {'DeepSpeed-0.14.5_pic-compile.patch': '1b9c070b77cf24351bff29bab7d23baacde31c7ea211a4bc75732ac38a99d6b0'},
            {'DeepSpeed-0.14.5_pdsh-env-vars.patch':
             '02f053d8de17e4e607b223e836658d8223cb26a3a7d8c9135e67b69aaa7f83a9'},
            {'DeepSpeed-0.14.5_use-eb-cutlass.patch':
             '43675f7c84fd0b0cea1050a4419020b377de414fc7f83d69b8010ab368964d8d'},
            {'DeepSpeed-0.14.5_test-nvme-offload.patch':
             '1592097867c5d4594a434cca727df134fcaa0e3ea8c595eb5951856a501cf422'},
        ],
    }),
]
sanity_pip_check = True
moduleclass = 'ai'
== 2024-11-12 11:53:10,715 parser.py:139 INFO Type checking of easyconfig parameter values passed!
== 2024-11-12 11:53:10,715 easyconfig.py:689 INFO setting easyconfig parameter builddependencies: value [('Ninja', '1.11.1'), ('Transformers', '4.39.3')] (type: <class 'list'>)
== 2024-11-12 11:53:10,715 easyconfig.py:689 INFO setting easyconfig parameter dependencies: value [('Python', '3.11.3'), ('CUDA', '12.1.1', '', {'name': 'system', 'version': 'system'}), ('NCCL', '2.18.3', '-CUDA-%(cudaver)s'), ('CUTLASS', '3.4.0', '-CUDA-%(cudaver)s'), ('PyTorch', '2.1.2', '-CUDA-%(cudaver)s'), ('CuPy', '13.0.0', '-CUDA-%(cudaver)s'), ('Triton', '2.1.0', '-CUDA-%(cudaver)s'), ('accelerate', '0.33.0', '-CUDA-%(cudaver)s'), ('PyTorch-bundle', '2.1.2', '-CUDA-%(cudaver)s'), ('mpi4py', '3.1.4'), ('Seaborn', '0.13.2'), ('DLPack', '0.8'), ('py-cpuinfo', '9.0.0'), ('pydantic', '2.5.3'), ('tqdm', '4.66.1'), ('libaio', '0.3.113')] (type: <class 'list'>)
== 2024-11-12 11:53:10,715 easyconfig.py:689 INFO setting easyconfig parameter description: value
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
(type: <class 'str'>)
== 2024-11-12 11:53:10,715 easyconfig.py:689 INFO setting easyconfig parameter easyblock: value PythonBundle (type: <class 'str'>)
== 2024-11-12 11:53:10,715 easyconfig.py:689 INFO setting easyconfig parameter exts_list: value [('hjson', '3.1.0', {'checksums': ['55af475a27cf83a7969c808399d7bccdec8fb836a07ddbd574587593b9cdcf75']}), ('nvidia-ml-py', '12.535.161', {'checksums': ['2bcc31ff7a0ea291ed8d7fc39b149391a42c2fb1cb4256c935e692de488b4d17'], 'modulename': 'pynvml'}), ('mup', '1.0.0', {'checksums': ['9639e3d19f90e754f985ed444542ed2f8a049f3c0488fcb6efe150f30922cf74']}), ('DeepSpeed', '0.14.5', {'ds_build_ops_to_skip': ['SPARSE_ATTN', 'FP_QUANTIZER', 'CUTLASS_OPS', 'RAGGED_DEVICE_OPS'], 'fix_python_shebang_for': ['bin/*'], 'installopts': '--global-option="build_ext" --global-option="-j%(parallel)s"', 'patches': ['DeepSpeed-0.14.2_no-ninja-dep.patch', 'DeepSpeed-0.14.5_pic-compile.patch', 'DeepSpeed-0.14.5_pdsh-env-vars.patch', 'DeepSpeed-0.14.5_use-eb-cutlass.patch', 'DeepSpeed-0.14.5_test-nvme-offload.patch'], 'runtest': 'ln -s $PWD/tests/ ../tests && cd ../ && pytest tests/unit/ -k "not TestTensorBoard and not TestWandb and not TestCometMonitor"', 'source_urls': ['https://github.com/%(github_account)s/%(name)s/archive'], 'sources': [{'download_filename': 'v%(version)s.tar.gz', 'filename': '%(name)s-%(version)s.tar.gz'}], 'testinstall': True, 'checksums': [{'DeepSpeed-0.14.5.tar.gz': '9f5622715cbd89c7382bfecf7fb188419ad3f2af7764dc6de35917abc6390cce'}, {'DeepSpeed-0.14.2_no-ninja-dep.patch': '03ab528096387e7f18d2a5a6f5fc20ed86d1ca8f63f0e65f266f4dda30e11776'}, {'DeepSpeed-0.14.5_pic-compile.patch': '1b9c070b77cf24351bff29bab7d23baacde31c7ea211a4bc75732ac38a99d6b0'}, {'DeepSpeed-0.14.5_pdsh-env-vars.patch': '02f053d8de17e4e607b223e836658d8223cb26a3a7d8c9135e67b69aaa7f83a9'}, {'DeepSpeed-0.14.5_use-eb-cutlass.patch': '43675f7c84fd0b0cea1050a4419020b377de414fc7f83d69b8010ab368964d8d'}, {'DeepSpeed-0.14.5_test-nvme-offload.patch': '1592097867c5d4594a434cca727df134fcaa0e3ea8c595eb5951856a501cf422'}]})] (type: <class 'list'>)
== 2024-11-12 11:53:10,715 easyconfig.py:689 INFO setting easyconfig parameter github_account: value microsoft (type: <class 'str'>)
== 2024-11-12 11:53:10,715 easyconfig.py:689 INFO setting easyconfig parameter homepage: value http://www.deepspeed.ai/ (type: <class 'str'>)
== 2024-11-12 11:53:10,715 easyconfig.py:689 INFO setting easyconfig parameter moduleclass: value ai (type: <class 'str'>)
== 2024-11-12 11:53:10,715 easyconfig.py:689 INFO setting easyconfig parameter name: value DeepSpeed (type: <class 'str'>)
== 2024-11-12 11:53:10,716 easyconfig.py:689 INFO setting easyconfig parameter sanity_pip_check: value True (type: <class 'bool'>)
== 2024-11-12 11:53:10,716 easyconfig.py:689 INFO setting easyconfig parameter toolchain: value {'name': 'foss', 'version': '2023a'} (type: <class 'dict'>)
== 2024-11-12 11:53:10,716 easyconfig.py:689 INFO setting easyconfig parameter use_pip: value True (type: <class 'bool'>)
== 2024-11-12 11:53:10,716 easyconfig.py:689 INFO setting easyconfig parameter version: value 0.14.5 (type: <class 'str'>)
== 2024-11-12 11:53:10,716 easyconfig.py:689 INFO setting easyconfig parameter versionsuffix: value -CUDA-%(cudaver)s (type: <class 'str'>)
== 2024-11-12 11:53:10,716 hooks.py:205 INFO Found parse_hook hook
== 2024-11-12 11:53:10,716 hooks.py:239 INFO Running 'parse_hook' hook function (args: [<easybuild.framework.easyconfig.easyconfig.EasyConfig object at 0x1495f41f5a90>], keyword args: {})...
== 2024-11-12 11:53:10,717 easyconfig.py:750 INFO Parsing dependency specifications...
== 2024-11-12 11:53:10,717 easyconfig.py:1705 INFO Generating template values...
== 2024-11-12 11:53:10,718 mpi.py:122 INFO Using template MPI command 'mpirun -n %(nr_ranks)s %(cmd)s' for MPI family 'OpenMPI'
== 2024-11-12 11:53:10,718 mpi.py:307 INFO Using MPI command template 'mpirun -n %(nr_ranks)s %(cmd)s' (params: {'nr_ranks': 1, 'cmd': 'xxx_command_xxx'})
== 2024-11-12 11:53:10,718 easyconfig.py:1724 INFO Template values: arch='x86_64', bitbucket_account='deepspeed', cuda_cc_cmake='80;86', cuda_cc_semicolon_sep='8.0;8.6', cuda_cc_space_sep='8.0 8.6', cuda_cc_space_sep_no_period='80 86', cuda_compute_capabilities='8.0,8.6', cuda_sm_comma_sep='sm_80,sm_86', cuda_sm_space_sep='sm_80 sm_86', cudamajver='12', cudaminver='1', cudashortver='12.1', cudaver='12.1.1', github_account='microsoft', mpi_cmd_prefix='mpirun -n 1', name='DeepSpeed', nameletter='D', nameletterlower='d', namelower='deepspeed', pymajver='3', pyminver='11', pyshortver='3.11', pyver='3.11.3', software_commit='', sysroot='', toolchain_name='foss', toolchain_version='2023a', version='0.14.5', version_major='0', version_major_minor='0.14', version_minor='14', versionprefix='', versionsuffix='-CUDA-12.1.1'
== 2024-11-12 11:53:10,719 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'Ninja', 'version': '1.11.1', 'versionsuffix': '', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': True, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'GCCcore', 'version': '12.3.0'}
== 2024-11-12 11:53:10,719 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,719 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,719 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,719 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,719 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,719 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,720 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'Transformers', 'version': '4.39.3', 'versionsuffix': '', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': True, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'gfbf', 'version': '2023a'}
== 2024-11-12 11:53:10,720 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'Python', 'version': '3.11.3', 'versionsuffix': '', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'GCCcore', 'version': '12.3.0'}
== 2024-11-12 11:53:10,720 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'NCCL', 'version': '2.18.3', 'versionsuffix': '-CUDA-12.1.1', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'GCCcore', 'version': '12.3.0'}
== 2024-11-12 11:53:10,721 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,721 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,721 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,721 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,721 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,721 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,721 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,721 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,721 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,721 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,722 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,722 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,722 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'CUTLASS', 'version': '3.4.0', 'versionsuffix': '-CUDA-12.1.1', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'foss', 'version': '2023a'}
== 2024-11-12 11:53:10,722 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,722 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,722 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,722 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,722 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,722 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,723 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,723 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,723 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,723 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,723 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,723 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,723 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'PyTorch', 'version': '2.1.2', 'versionsuffix': '-CUDA-12.1.1', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'foss', 'version': '2023a'}
== 2024-11-12 11:53:10,723 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,723 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,724 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,724 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,724 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,724 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,724 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,724 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,724 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,724 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,724 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,724 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,724 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'CuPy', 'version': '13.0.0', 'versionsuffix': '-CUDA-12.1.1', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'foss', 'version': '2023a'}
== 2024-11-12 11:53:10,725 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,725 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,725 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,725 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,725 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,725 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,725 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,725 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,725 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,726 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,726 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,726 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,726 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'Triton', 'version': '2.1.0', 'versionsuffix': '-CUDA-12.1.1', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'foss', 'version': '2023a'}
== 2024-11-12 11:53:10,726 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,726 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,726 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,726 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,726 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,726 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,727 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,727 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,727 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,727 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,727 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,727 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,727 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'accelerate', 'version': '0.33.0', 'versionsuffix': '-CUDA-12.1.1', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'foss', 'version': '2023a'}
== 2024-11-12 11:53:10,727 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,727 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,727 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,728 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,728 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,728 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,728 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,728 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,728 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,728 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,728 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,728 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,728 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'PyTorch-bundle', 'version': '2.1.2', 'versionsuffix': '-CUDA-12.1.1', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'foss', 'version': '2023a'}
== 2024-11-12 11:53:10,729 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,729 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,729 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,729 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,729 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,729 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,729 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,729 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,729 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,729 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'mpi4py', 'version': '3.1.4', 'versionsuffix': '', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'gompi', 'version': '2023a'}
== 2024-11-12 11:53:10,730 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,730 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,730 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,730 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,730 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,730 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,730 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'Seaborn', 'version': '0.13.2', 'versionsuffix': '', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'gfbf', 'version': '2023a'}
== 2024-11-12 11:53:10,730 easyconfig.py:2200 INFO Found loaded index for /apps/c3se-easyconfigs
== 2024-11-12 11:53:10,730 easyconfig.py:2200 INFO Found loaded index for /apps/easybuild-easyconfigs/easybuild/easyconfigs
== 2024-11-12 11:53:10,730 easyconfig.py:2200 INFO Found loaded index for /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/files_pr21438
== 2024-11-12 11:53:10,731 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'DLPack', 'version': '0.8', 'versionsuffix': '', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'GCC', 'version': '12.3.0'}
== 2024-11-12 11:53:10,731 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'py-cpuinfo', 'version': '9.0.0', 'versionsuffix': '', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'GCCcore', 'version': '12.3.0'}
== 2024-11-12 11:53:10,731 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'pydantic', 'version': '2.5.3', 'versionsuffix': '', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'GCCcore', 'version': '12.3.0'}
== 2024-11-12 11:53:10,731 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'tqdm', 'version': '4.66.1', 'versionsuffix': '', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'GCCcore', 'version': '12.3.0'}
== 2024-11-12 11:53:10,731 easyconfig.py:2376 INFO Minimally resolving dependency {'full_mod_name': None, 'short_mod_name': None, 'name': 'libaio', 'version': '0.3.113', 'versionsuffix': '', 'toolchain': {'name': 'foss', 'version': '2023a'}, 'toolchain_inherited': True, 'system': False, 'hidden': False, 'build_only': False, 'external_module': False, 'external_module_metadata': {}} using toolchain {'name': 'GCCcore', 'version': '12.3.0'}
== 2024-11-12 11:53:10,731 easyconfig.py:869 INFO Validating easyconfig
== 2024-11-12 11:53:10,732 easyconfig.py:874 INFO Checking OS dependencies
== 2024-11-12 11:53:10,732 easyconfig.py:929 INFO OS dependencies ok: []
== 2024-11-12 11:53:10,732 easyconfig.py:879 INFO Checking skipsteps
== 2024-11-12 11:53:10,732 easyconfig.py:884 INFO Checking build option lists
== 2024-11-12 11:53:10,732 easyconfig.py:887 INFO Checking licenses
== 2024-11-12 11:53:10,733 pythonbundle.py:73 INFO Detection of downloaded extension dependencies is enabled
== 2024-11-12 11:53:10,733 pythonbundle.py:75 INFO exts_default_options: {'buildcmd': None, 'check_ldshared': None, 'download_dep_fail': True, 'install_src': None, 'install_target': 'install', 'pip_ignore_installed': True, 'pip_no_index': None, 'pip_verbose': None, 'req_py_majver': None, 'req_py_minver': None, 'max_py_majver': None, 'max_py_minver': None, 'sanity_pip_check': True, 'runtest': True, 'testinstall': False, 'unpack_sources': None, 'unversioned_packages': [], 'use_pip': True, 'use_pip_editable': False, 'use_pip_extras': None, 'use_pip_for_deps': False, 'use_pip_requirement': False, 'zipped_egg': False, 'source_urls': ['https://pypi.python.org/packages/source/%(nameletter)s/%(name)s'], 'options': {}}
== 2024-11-12 11:53:10,733 easyblock.py:4240 INFO Obtained application instance of for DeepSpeed (easyblock: PythonBundle)
== 2024-11-12 11:53:10,733 easyconfig.py:1705 INFO Generating template values...
== 2024-11-12 11:53:10,734 mpi.py:122 INFO Using template MPI command 'mpirun -n %(nr_ranks)s %(cmd)s' for MPI family 'OpenMPI'
== 2024-11-12 11:53:10,734 mpi.py:307 INFO Using MPI command template 'mpirun -n %(nr_ranks)s %(cmd)s' (params: {'nr_ranks': 1, 'cmd': 'xxx_command_xxx'})
== 2024-11-12 11:53:10,734 easyconfig.py:1724 INFO Template values: arch='x86_64', bitbucket_account='deepspeed', cuda_cc_cmake='80;86', cuda_cc_semicolon_sep='8.0;8.6', cuda_cc_space_sep='8.0 8.6', cuda_cc_space_sep_no_period='80 86', cuda_compute_capabilities='8.0,8.6', cuda_sm_comma_sep='sm_80,sm_86', cuda_sm_space_sep='sm_80 sm_86', cudamajver='12', cudaminver='1', cudashortver='12.1', cudaver='12.1.1', github_account='microsoft', module_name='DeepSpeed/0.14.5-foss-2023a-CUDA-12.1.1', mpi_cmd_prefix='mpirun -n 1', name='DeepSpeed', nameletter='D', nameletterlower='d', namelower='deepspeed', pymajver='3', pyminver='11', pyshortver='3.11', pyver='3.11.3', software_commit='', sysroot='', toolchain_name='foss', toolchain_version='2023a', version='0.14.5', version_major='0', version_major_minor='0.14', version_minor='14', versionprefix='', versionsuffix='-CUDA-12.1.1'
== 2024-11-12 11:53:10,736 one.py:180 INFO Skipping reformatting value for parameter 'toolchain'
== 2024-11-12 11:53:10,737 filetools.py:1924 INFO Creating directory /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/reprod_20241112115310_981421 (parents: True, set_gid: False, sticky: False)
== 2024-11-12 11:53:10,737 easyblock.py:4494 INFO Dumped easyconfig instance to /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/reprod_20241112115310_981421/DeepSpeed-0.14.5-foss-2023a-CUDA-12.1.1.eb
== 2024-11-12 11:53:10,737 filetools.py:1924 INFO Creating directory /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/reprod_20241112115310_981421/easyblocks (parents: True, set_gid: False, sticky: False)
== 2024-11-12 11:53:10,739 filetools.py:2445 INFO /apps/Common/software/EasyBuild/4.9.4/lib/python3.6/site-packages/easybuild/easyblocks/generic/bundle.py copied to /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/reprod_20241112115310_981421/easyblocks/bundle.py
== 2024-11-12 11:53:10,739 easyblock.py:4474 INFO Dumped easyblock bundle.py required for reproduction to /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/reprod_20241112115310_981421/easyblocks
== 2024-11-12 11:53:10,740 filetools.py:2445 INFO /apps/Common/software/EasyBuild/4.9.4/lib/python3.6/site-packages/easybuild/easyblocks/generic/pythonbundle.py copied to /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/reprod_20241112115310_981421/easyblocks/pythonbundle.py
== 2024-11-12 11:53:10,740 easyblock.py:4474 INFO Dumped easyblock pythonbundle.py required for reproduction to /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/reprod_20241112115310_981421/easyblocks
== 2024-11-12 11:53:10,741 filetools.py:1924 INFO Creating directory /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/reprod_20241112115310_981421/hooks (parents: True, set_gid: False, sticky: False)
== 2024-11-12 11:53:10,741 filetools.py:2445 INFO /apps/c3se_hooks.py copied to /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/reprod_20241112115310_981421/hooks/c3se_hooks.py
== 2024-11-12 11:53:10,742 easyblock.py:4506 INFO Dumped hooks file /apps/c3se_hooks.py which is (potentially) required for reproduction to /cephyr/NOBACKUP/priv/c3-staff/eb-tmp/eb-ufzq98gi/reprod_20241112115310_981421/hooks/c3se_hooks.py
== 2024-11-12 11:53:10,744 easyblock.py:2243 INFO Number of iterations to perform for central part of installation procedure: 1
== 2024-11-12 11:53:10,744 build_log.py:267 INFO building and installing DeepSpeed/0.14.5-foss-2023a-CUDA-12.1.1...
== 2024-11-12 11:53:10,745 filetools.py:1979 INFO Lock /apps/Test/software/.locks/_apps_Test_software_DeepSpeed_0.14.5-foss-2023a-CUDA-12.1.1.lock exists!
== 2024-11-12 11:53:10,750 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/base/exceptions.py:126 in __init__): Lock /apps/Test/software/.locks/_apps_Test_software_DeepSpeed_0.14.5-foss-2023a-CUDA-12.1.1.lock already exists, aborting! (at easybuild/tools/filetools.py:2015 in check_lock)
== 2024-11-12 11:53:10,750 easyblock.py:4297 WARNING build failed (first 300 chars): Lock /apps/Test/software/.locks/_apps_Test_software_DeepSpeed_0.14.5-foss-2023a-CUDA-12.1.1.lock already exists, aborting!
== 2024-11-12 11:53:10,750 easyblock.py:326 INFO Closing log for application name DeepSpeed version 0.14.5