WARNING 09-28 20:31:17 preprocess.py:86] Falling back on <BOS> for decoder start token id because decoder start token id is not available.
ERROR 09-28 20:31:18 async_llm_engine.py:61] Engine background task failed
ERROR 09-28 20:31:18 async_llm_engine.py:61] Traceback (most recent call last):
ERROR 09-28 20:31:18 async_llm_engine.py:61] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 51, in _log_task_completion
ERROR 09-28 20:31:18 async_llm_engine.py:61] return_value = task.result()
ERROR 09-28 20:31:18 async_llm_engine.py:61] ^^^^^^^^^^^^^
ERROR 09-28 20:31:18 async_llm_engine.py:61] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 755, in run_engine_loop
ERROR 09-28 20:31:18 async_llm_engine.py:61] result = task.result()
ERROR 09-28 20:31:18 async_llm_engine.py:61] ^^^^^^^^^^^^^
ERROR 09-28 20:31:18 async_llm_engine.py:61] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm
WARNING 09-28 19:47:27 preprocess.py:86] Falling back on <BOS> for decoder start token id because decoder start token id is not available.
ERROR 09-28 19:47:28 async_llm_engine.py:61] Engine background task failed
ERROR 09-28 19:47:28 async_llm_engine.py:61] Traceback (most recent call last):
ERROR 09-28 19:47:28 async_llm_engine.py:61] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 51, in _log_task_completion
ERROR 09-28 19:47:28 async_llm_engine.py:61] return_value = task.result()
ERROR 09-28 19:47:28 async_llm_engine.py:61] ^^^^^^^^^^^^^
ERROR 09-28 19:47:28 async_llm_engine.py:61] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 755, in run_engine_loop
ERROR 09-28 19:47:28 async_llm_engine.py:61] result = task.result()
ERROR 09-28 19:47:28 async_llm_engine.py:61] ^^^^^^^^^^^^^
ERROR 09-28 19:47:28 async_llm_engine.py:61] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm
INFO 09-28 19:29:57 api_server.py:526] vLLM API server version 0.6.1.dev238+ge2c6e0a82
INFO 09-28 19:29:57 api_server.py:527] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=True, enable_auto_tool_choice=False, tool_call_parser=None, model='neuralmagic/Llama-3.2-11B-Vision-Instruct-FP8-dynamic', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', config_format='auto', dtype='auto', kv_cache_dtype='fp8', quantization_param_path=None, max_model_len=16384, guided_decoding_backend='outlines', distribut
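The Namespace dump above maps back to an equivalent command line. A minimal sketch, assuming vLLM's `vllm serve` entrypoint; flags left at their logged defaults are omitted, so this is a reconstruction rather than the exact invocation that produced the log:

# Sketch reconstructed from the logged args above.
vllm serve neuralmagic/Llama-3.2-11B-Vision-Instruct-FP8-dynamic \
    --port 8000 \
    --kv-cache-dtype fp8 \
    --max-model-len 16384 \
    --disable-frontend-multiprocessing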
jax 0.4.30
I0926 18:08:39.951711 136615241593984 trainer.py:318] gpt_trainer process 0 step 0] Training state size: 621.76 GiB
Training state size (partitioned): 12.68 GiB
Max training state size (partitioned): 12.68 GiB
I0926 18:08:40.992022 136615241593984 trainer.py:465] Starting loop...
2024-09-26 18:08:41.106644: E tensorflow/core/util/util.cc:131] oneDNN supports DT_INT64 only on platforms with AVX-512. Falling back to the default Eigen-based implementation if present.
Exception in thread gpt_trainer.checkpointer.gc_loop:
Traceback (most recent call last):
File "/root/axlearn/common/file_system.py", line 46, in _wrap_exception
yield
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 4894.0
python_gc_objects_collected_total{generation="1"} 4412.0
python_gc_objects_collected_total{generation="2"} 1244.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
spec:
  args:
    - --max-model-len=65536
    - --max-num-batched-tokens=65536
    - --gpu-memory-utilization=0.9
    - --tensor-parallel-size=2
    - --enable-prefix-caching
    - --disable-log-requests
    - --max-num-seqs=1024
  engine: VLLM
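For reference, these args map one-to-one onto a standalone vLLM launch. A sketch only; "$MODEL" is a hypothetical placeholder, since the manifest fragment does not name the model being served:

# Sketch: the args above as a direct invocation.
# "$MODEL" is a placeholder, not shown in the fragment.
vllm serve "$MODEL" \
    --max-model-len 65536 \
    --max-num-batched-tokens 65536 \
    --gpu-memory-utilization 0.9 \
    --tensor-parallel-size 2 \
    --enable-prefix-caching \
    --disable-log-requests \
    --max-num-seqs 1024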
# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 5528.0
python_gc_objects_collected_total{generation="1"} 4430.0
python_gc_objects_collected_total{generation="2"} 1244.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
samos123 / gist:bd293d16bc8ad4be2c1f5b96539967a2
Last active August 4, 2024 06:29
benchmark vllm using openai backend and random dataset
#!/usr/bin/env bash
# Benchmark a running vLLM OpenAI-compatible server with random prompts.
git clone https://github.com/vllm-project/vllm.git
cd vllm/benchmarks
python3 benchmark_serving.py --backend openai \
    --base-url http://127.0.0.1:8080 \
    --dataset-name=random \
    --model neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 \
    --seed 12345
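The run above leaves the random dataset's prompt shape and load at their defaults. A heavier sweep might look like the sketch below; the flag names match recent vLLM checkouts, so verify against `python3 benchmark_serving.py --help` on yours:

python3 benchmark_serving.py --backend openai \
    --base-url http://127.0.0.1:8080 \
    --dataset-name=random \
    --random-input-len 1024 --random-output-len 256 \
    --num-prompts 500 --request-rate 8 \
    --model neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8 \
    --seed 12345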
#!/usr/bin/env bash
# Convert a video ($1) to an optimized GIF ($2): 5 fps, 980 px wide,
# 32-color generated palette with Bayer dithering.
# Source: a Stack Overflow answer I forgot to capture the link of.
ffmpeg -y -i "$1" \
    -filter_complex "fps=5,scale=980:-1:flags=lanczos,split[s0][s1];[s0]palettegen=max_colors=32[p];[s1][p]paletteuse=dither=bayer" \
    "$2"
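Usage, assuming the script above is saved as togif.sh (the filename is arbitrary):

./togif.sh input.mp4 output.gif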
{
  "classes": [
    {
      "class": "Instrument",
      "description": "A musical instrument.",
      "vectorIndexType": "hnsw",
      "vectorizer": "text2vec-transformers",
      "properties": [
        {
          "name": "name",