Junbum Lee Beomi

🤗Huggingface Model Downloader

Note

(2025-01-08) Add feature for 🏷️Tag(Revision) Selection, contributed by @Bamboo-D.
(2024-12-17) Add feature for ⚡Quick Startup and ⏭️Fast Resume: already-downloaded files are skipped, and the git clone dependency is removed to speed up file list retrieval.

Because the official huggingface-cli lacks multi-threaded download support and hf_transfer has inadequate error handling, this command-line tool uses curl and aria2c for fast and robust downloading of models and datasets.

Features

⏯️ Resume from breakpoint: You can press Ctrl+C at any time and re-run the command to continue.

Building fontconfig

Start up a lambda-like docker container:

docker run -i -t -v /tmp:/var/task lambci/lambda:build /bin/bash

Install some dependencies inside the container:

yum install gperf freetype-devel libxml2-devel git libtool -y

easy_install pip

	A::B is a system with 4 tokens: `A#`, `#A`, `B#` and `#B`.

	An A::B program is a sequence of tokens. Example:

	B# A# #B #A B#

	To compute a program, we must rewrite neighbor tokens, using the rules:

	A# #A ... becomes ... nothing
	A# #B ... becomes ... #B A#
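
The rewriting procedure described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the names `RULES`, `step`, and `rewrite` are hypothetical, and only the two rules listed above are encoded, so a program may stop with neighbor pairs that rules not shown here would still reduce.

```python
# Rewrite rules copied from the text above: a pair of neighbor tokens
# maps to the tokens that replace it. Only the two listed rules appear here.
RULES = {
    ("A#", "#A"): [],            # A# #A ... becomes ... nothing
    ("A#", "#B"): ["#B", "A#"],  # A# #B ... becomes ... #B A#
}


def step(tokens):
    """Apply the first matching rule, scanning left to right.

    Returns the rewritten token list, or None if no rule applies.
    """
    for i in range(len(tokens) - 1):
        pair = (tokens[i], tokens[i + 1])
        if pair in RULES:
            return tokens[:i] + RULES[pair] + tokens[i + 2:]
    return None


def rewrite(program):
    """Rewrite a whitespace-separated program until no encoded rule applies."""
    tokens = program.split()
    while True:
        rewritten = step(tokens)
        if rewritten is None:
            return tokens
        tokens = rewritten
```

Calling `rewrite("B# A# #B #A B#")` first swaps `A# #B`, then removes the resulting `A# #A` pair. Further rules can be added to `RULES` without changing the loop.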

	# %%
	# this is run from /notebooks on paperspace
	import os

	from dotenv import load_dotenv
	from huggingface_hub import login

	load_dotenv("/notebooks/.env")
	os.environ["TOKENIZERS_PARALLELISM"] = "false"
	login(token=os.getenv("HUGGINGFACE_TOKEN"))

	"""
	Demo script for writing a pandas data frame to a CSV file on S3 using s3fs-supported pandas APIs
	"""

	import os

	import pandas as pd

	AWS_S3_BUCKET = os.getenv("AWS_S3_BUCKET")
	AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
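
The snippet above stops after reading the environment variables. As a rough sketch of how the write itself might look, the helper below passes credentials to pandas through `storage_options`, which pandas forwards to s3fs. The function names, the `AWS_SECRET_ACCESS_KEY` variable, and the object key used in the usage note are assumptions, not part of the original script.

```python
import os


def s3_storage_options(env=os.environ):
    """Collect s3fs credentials from an environment-style mapping."""
    return {
        "key": env.get("AWS_ACCESS_KEY_ID"),
        # Assumed variable name; the original snippet does not show it.
        "secret": env.get("AWS_SECRET_ACCESS_KEY"),
    }


def write_frame_to_s3(frame, bucket, key, env=os.environ):
    """Write a pandas DataFrame to s3://bucket/key as CSV and return the URL.

    `frame` is expected to be a pandas DataFrame; s3fs must be installed
    for pandas to resolve the s3:// URL.
    """
    url = f"s3://{bucket}/{key}"
    frame.to_csv(url, index=False, storage_options=s3_storage_options(env))
    return url
```

A call such as `write_frame_to_s3(df, AWS_S3_BUCKET, "demo/out.csv")` would then upload the frame, where `"demo/out.csv"` is a placeholder object key.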

	set backspace=2
	set autoindent
	set smartindent
	set expandtab
	set tabstop=4
	set shiftwidth=4
	set showmatch
	set ignorecase
	set incsearch

	def plot_multi(data, cols=None, spacing=.1, **kwargs):

	    from pandas import plotting

	    # Get default color style from pandas - can be changed to any other color list
	    if cols is None: cols = data.columns
	    if len(cols) == 0: return
	    # colors = getattr(getattr(plotting, '_style'), '_get_standard_colors')(num_colors=len(cols))
	    # colors = [rand_color.generate() for _ in range(len(cols))]
	    # colors = rand_color.generate(hue="blue", count=len(cols))

	# Sorry for the scrappiness of this. It's supposed to be scrappy.

	import time
	from botocore.exceptions import ClientError
	import boto3
	import boto3.session
	# boto3.set_stream_logger('')

	session = boto3.session.Session()
	s3 = session.client('s3')