Skip to content

Instantly share code, notes, and snippets.

@padeoe
padeoe / README_hfd.md
Last active June 19, 2025 14:08
CLI-Tool for download Huggingface models and datasets with aria2/wget: hfd

🤗Huggingface Model Downloader

Note

(2025-01-08) Add feature for 🏷️Tag(Revision) Selection, contributed by @Bamboo-D.
(2024-12-17) Add feature for ⚡Quick Startup and ⏭️Fast Resume, enabling skipping of downloaded files, while removing the git clone dependency to accelerate file list retrieval.

Considering the lack of multi-threaded download support in the official huggingface-cli, and the inadequate error handling in hf_transfer, This command-line tool leverages curl and aria2c for fast and robust downloading of models and datasets.

Features

  • ⏯️ Resume from breakpoint: You can re-run it or Ctrl+C anytime.
@hanxiao
hanxiao / testRegex.js
Last active June 15, 2025 14:05
Regex for chunking by using all semantic cues
// Updated: Aug. 20, 2024
// Run: node testRegex.js whatever.txt
// Live demo: https://jina.ai/tokenizer
// LICENSE: Apache-2.0 (https://www.apache.org/licenses/LICENSE-2.0)
// COPYRIGHT: Jina AI
const fs = require('fs');
const util = require('util');
// Define variables for magic numbers
const MAX_HEADING_LENGTH = 7;