@razetime
Created May 3, 2024 15:21
https://gist.github.com/razetime/50b1d04a06c87b22409a98924b9f2bf7
MLC FP8 error 1
[2024-05-02 23:38:12] INFO auto_config.py:115: Found model configuration: dist/models/RedPajama-INCITE-Chat-3B-v1/config.json
[2024-05-02 23:38:12] INFO auto_device.py:79: Found device: cuda:0
[2024-05-02 23:38:13] INFO auto_device.py:88: Not found device: rocm:0
[2024-05-02 23:38:14] INFO auto_device.py:88: Not found device: metal:0
[2024-05-02 23:38:14] INFO auto_device.py:79: Found device: vulkan:0
[2024-05-02 23:38:14] INFO auto_device.py:79: Found device: vulkan:1
[2024-05-02 23:38:15] INFO auto_device.py:88: Not found device: opencl:0
[2024-05-02 23:38:15] INFO auto_device.py:35: Using device: cuda:0
[2024-05-02 23:38:15] INFO auto_weight.py:70: Finding weights in: dist/models/RedPajama-INCITE-Chat-3B-v1
[2024-05-02 23:38:15] INFO auto_weight.py:129: Found source weight format: huggingface-torch. Source configuration: dist/models/RedPajama-INCITE-Chat-3B-v1/pytorch_model.bin
[2024-05-02 23:38:15] INFO auto_weight.py:167: Not found Huggingface Safetensor
[2024-05-02 23:38:15] INFO auto_weight.py:106: Using source weight configuration: dist/models/RedPajama-INCITE-Chat-3B-v1/pytorch_model.bin. Use `--source` to override.
[2024-05-02 23:38:15] INFO auto_weight.py:110: Using source weight format: huggingface-torch. Use `--source-format` to override.
[2024-05-02 23:38:15] INFO auto_config.py:153: Found model type: gpt_neox. Use `--model-type` to override.
Weight conversion with arguments:
  --config          dist/models/RedPajama-INCITE-Chat-3B-v1/config.json
  --quantization    PerTensorQuantize(name='e4m3_e4m3_f16', kind='per-tensor-quant', activation_dtype='e4m3_float8', weight_dtype='e4m3_float8', storage_dtype='e4m3_float8', model_dtype='float16', quantize_embedding=False, quantize_final_fc=False, quantize_linear=True, num_elem_per_storage=1, max_int_value=448, use_scale=False)
  --model-type      gpt_neox
  --device          cuda:0
  --source          dist/models/RedPajama-INCITE-Chat-3B-v1/pytorch_model.bin
  --source-format   huggingface-torch
  --output          dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC
[2024-05-02 23:38:15] INFO gpt_neox_model.py:49: context_window_size not found in config.json. Falling back to max_position_embeddings (2048)
[2024-05-02 23:38:15] INFO gpt_neox_model.py:72: prefill_chunk_size defaults to context_window_size (2048)
Traceback (most recent call last):
  File "/home/raghu/miniconda3/envs/mlc-chat-venv/bin/mlc_llm", line 33, in <module>
    sys.exit(load_entry_point('mlc-llm', 'console_scripts', 'mlc_llm')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raghu/mlc-llm/python/mlc_llm/__main__.py", line 29, in main
    cli.main(sys.argv[2:])
  File "/home/raghu/mlc-llm/python/mlc_llm/cli/convert_weight.py", line 87, in main
    convert_weight(
  File "/home/raghu/mlc-llm/python/mlc_llm/interface/convert_weight.py", line 181, in convert_weight
    _convert_args(args)
  File "/home/raghu/mlc-llm/python/mlc_llm/interface/convert_weight.py", line 68, in _convert_args
    model, quantize_map = args.model.quantize[args.quantization.kind](
                          ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'per-tensor-quant'
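The traceback above fails on a plain dictionary lookup: each model definition carries a `quantize` map from quantization kind to a loader function, and the gpt_neox model evidently has no entry for `'per-tensor-quant'`. A minimal sketch of that failure mode (hypothetical kind names and loaders, not the actual MLC-LLM registry) looks like:

```python
# Hypothetical sketch of the per-model quantization registry that
# convert_weight.py indexes on line 68. The real map lives in the
# model definitions inside mlc_llm; the kind names below are examples.
quantize = {
    "group-quant": lambda config, quant: "group-quantized model",
    "no-quant": lambda config, quant: "unquantized model",
    # no "per-tensor-quant" entry registered for this model type
}

kind = "per-tensor-quant"  # requested by the e4m3_e4m3_f16 quantization
try:
    loader = quantize[kind]  # same lookup shape as args.model.quantize[args.quantization.kind]
except KeyError as exc:
    print(f"KeyError: {exc} -- model type has no loader for this quantization kind")
```

So the error indicates a missing registration for this model/quantization pair rather than a problem with the weights themselves.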