#!/bin/bash
#SBATCH --job-name=llm-swarm
#SBATCH --partition hopper-prod
#SBATCH --gpus={{gpus}}
#SBATCH --cpus-per-task=12
#SBATCH --mem-per-cpu=11G
#SBATCH -o slurm/logs/%x_%j.out
# See original source here:
# https://github.com/huggingface/llm-swarm/blob/main/templates/tgi_h100.template.slurm
"""
# Source: https://gist.github.com/jneuff/682d47b786329f19291d166957b3274a
/// Fix a huggingface tokenizer to which tokens have been added after training.
///
/// Adding tokens after training via `add_special_tokens` leads to them being added to the
/// `added_tokens` section but not to the `model.vocab` section. This yields warnings like:
/// ```
/// [2023-10-17T07:54:05Z WARN tokenizers::tokenizer::serialization] Warning: Token '<|empty_usable_token_space_1023|>' was expected to have ID '129023' but was given ID 'None'
/// ```
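The mismatch described above can be illustrated with a minimal sketch. It assumes the tokenizer has already been loaded from `tokenizer.json` into a Python dict, and that `model.vocab` is a token-to-ID mapping (as in BPE or WordPiece models; Unigram models store the vocabulary differently and are not covered). The function name `add_missing_tokens_to_vocab` is hypothetical and not taken from the original gist.

```python
def add_missing_tokens_to_vocab(tokenizer: dict) -> int:
    """Copy entries from `added_tokens` into `model.vocab` when absent.

    Assumes `tokenizer["model"]["vocab"]` is a dict of token -> ID and
    each `added_tokens` entry has `content` and `id` keys.
    Returns the number of tokens that were added to the vocab.
    """
    vocab = tokenizer["model"]["vocab"]
    added = 0
    for entry in tokenizer.get("added_tokens", []):
        content, token_id = entry["content"], entry["id"]
        if content not in vocab:
            # Keep the ID recorded in `added_tokens` so both sections agree.
            vocab[content] = token_id
            added += 1
    return added
```

In practice the dict would presumably come from `json.load` on the tokenizer file and be written back with `json.dump` after the call; existing vocab entries are left untouched, so running the function twice changes nothing the second time.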