When an LLM processes a prompt, it computes Key and Value vectors for every token at each attention layer; together these form the KV cache. If many requests share the same system prompt, recomputing its KV cache from scratch each time is wasteful. Radix Cache stores these computed prefixes in a radix tree and reuses them across requests, which is one of the main reasons SGLang achieves high throughput.
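The prefix-reuse idea can be sketched in a few lines of Python. This is only a toy illustration, not SGLang's implementation: it uses a plain trie with one token per edge instead of a compressed radix tree, stores placeholder strings where real KV tensors would live, and omits eviction entirely. The names `Node`, `PrefixCache`, `match_prefix`, and `insert` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Child nodes keyed by the next token id.
    children: dict = field(default_factory=dict)
    # Placeholder for the cached KV entry of the token leading to this node.
    kv: object = None


class PrefixCache:
    """Toy prefix cache (assumption: one token per edge, no eviction)."""

    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens already have cached KV entries."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens, kv_entries):
        """Store one KV entry per token along the path for this sequence."""
        node = self.root
        for tok, kv in zip(tokens, kv_entries):
            node = node.children.setdefault(tok, Node())
            if node.kv is None:
                node.kv = kv


cache = PrefixCache()
system_prompt = [1, 2, 3, 4]            # token ids shared by every request
first = system_prompt + [10, 11]
cache.insert(first, [f"kv{t}" for t in first])

second = system_prompt + [20, 21]
reused = cache.match_prefix(second)
print(reused, len(second) - reused)     # prints "4 2": 4 tokens reused, 2 to compute
```

In this sketch the second request only needs fresh KV entries for its two new tokens; the four system-prompt tokens are served from the tree. A real radix tree would additionally merge single-child chains into one edge so that long shared prefixes cost one node rather than one node per token.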
| global | |
| daemon | |
| maxconn 10000 | |
| defaults | |
| mode tcp | |
| timeout connect 5s | |
| timeout client 1h | |
| timeout server 1h |
| [package] | |
| name = "chinese_search" | |
| version = "0.1.0" | |
| edition = "2024" | |
| [dependencies] | |
| jieba-rs = "0.9.0" | |
| tantivy = "0.26.0" | |
| tantivy-jieba = "0.19.0" |
Note
(2025-01-08) Add feature for 🏷️Tag(Revision) Selection, contributed by @Bamboo-D.
(2024-12-17) Add feature for ⚡Quick Startup and ⏭️Fast Resume, enabling skipping of downloaded files, while removing the git clone dependency to accelerate file list retrieval.
Because the official huggingface-cli lacks multi-threaded download support and hf_transfer has inadequate error handling, this command-line tool uses curl and aria2c for fast and robust downloading of models and datasets.
- ⏯️ Resume from breakpoint: You can press Ctrl+C at any time and re-run the command to continue where it stopped.
| { | |
| "architectures": [ | |
| "Qwen3ForCausalLMEagle" | |
| ], | |
| "attention_bias": false, | |
| "attention_dropout": 0.0, | |
| "bos_token_id": 151643, | |
| "eos_token_id": 151645, | |
| "head_dim": 128, | |
| "hidden_act": "silu", |
| ## Setup | |
| # conda create -n modular python=3.11 | |
| # uv pip install modular --extra-index-url https://download.pytorch.org/whl/cpu --index-url https://dl.modular.com/public/nightly/python/simple/ --index-strategy unsafe-best-match --prerelease allow | |
| # conda install -c conda-forge gcc=12.1.0 | |
| model_path = 'Qwen/Qwen2.5-0.5B' | |
| import time | |
| from max.entrypoints.llm import LLM | |
| from max.pipelines import PipelineConfig |
| window.scrollTo(0, 0) | |
| var bodyRect = document.body.getBoundingClientRect(); | |
| var items = Array.prototype.slice.call( | |
| document.querySelectorAll('*') | |
| ).map(function(element) { | |
| var rect=element.getBoundingClientRect(); | |
| return { | |
| element: element, | |
| include: (element.tagName === "BUTTON" || element.tagName === "A" || (element.onclick != null) || window.getComputedStyle(element).cursor == "pointer"), |
| import os | |
| import torch | |
| import psutil | |
| import datasets | |
| import glob | |
| from transformers import ( | |
| AutoTokenizer, LlamaConfig, LlamaForCausalLM, Trainer, TrainingArguments, | |
| DataCollatorForLanguageModeling | |
| ) |
| FROM nvcr.io/nvidia/pytorch:23.11-py3 | |
| WORKDIR /workspace | |
| RUN pip install -r r1.txt | |
| ADD requirements.txt r2.txt | |
| # FlashAttention-2 compatibility copied from https://github.com/Dao-AILab/flash-attention/issues/836#issuecomment-1951433985 | |
| RUN pip install flash-attn==2.5.1.post1 | |
| RUN apt update && apt install -y tmux git-lfs | |
| RUN pip install nvitop | |
| ADD . myproject | |
| WORKDIR /workspace/myproject |

