dcbark01 / tgi_api.sh
Last active September 17, 2024 22:50
Huggingface Text Generation Inference SLURM
#!/bin/bash
#SBATCH --job-name=llm-swarm
#SBATCH --partition hopper-prod
#SBATCH --gpus={{gpus}}
#SBATCH --cpus-per-task=12
#SBATCH --mem-per-cpu=11G
#SBATCH -o slurm/logs/%x_%j.out
# See original source here:
# https://github.com/huggingface/llm-swarm/blob/main/templates/tgi_h100.template.slurm
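The `{{gpus}}` value in the header is a template placeholder that llm-swarm fills in before submission. As a rough sketch of doing the same by hand (file names here are assumptions, not from the gist), render the placeholder with `sed` and submit the result:

```shell
# Stand-in template line for illustration; the real template is the
# #SBATCH header above.
printf '#SBATCH --gpus={{gpus}}\n' > tgi_api.template
# Render the placeholder (8 GPUs chosen arbitrarily) into a runnable script.
sed 's/{{gpus}}/8/g' tgi_api.template > tgi_api.rendered.slurm
cat tgi_api.rendered.slurm
# sbatch tgi_api.rendered.slurm   # submit once rendered
```

Note that `sbatch` rejects scripts containing unrendered `{{...}}` placeholders in numeric options, so the substitution step must happen before submission.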
dcbark01 / fix_tokenizer.py
Last active June 17, 2024 14:03
Fix tokenizer
"""
# Source: https://gist.github.com/jneuff/682d47b786329f19291d166957b3274a
/// Fix a huggingface tokenizer to which tokens have been added after training.
///
/// Adding tokens after training via `add_special_tokens` leads to them being added to the
/// `added_tokens` section but not to the `model.vocab` section. This yields warnings like:
/// ```
/// [2023-10-17T07:54:05Z WARN tokenizers::tokenizer::serialization] Warning: Token '<|empty_usable_token_space_1023|>' was expected to have ID '129023' but was given ID 'None'
/// ```