MLC FP8 error 1
[2024-05-02 23:38:12] INFO auto_config.py:115: Found model configuration: dist/models/RedPajama-INCITE-Chat-3B-v1/config.json
[2024-05-02 23:38:12] INFO auto_device.py:79: Found device: cuda:0
[2024-05-02 23:38:13] INFO auto_device.py:88: Not found device: rocm:0
[2024-05-02 23:38:14] INFO auto_device.py:88: Not found device: metal:0
[2024-05-02 23:38:14] INFO auto_device.py:79: Found device: vulkan:0
[2024-05-02 23:38:14] INFO auto_device.py:79: Found device: vulkan:1
[2024-05-02 23:38:15] INFO auto_device.py:88: Not found device: opencl:0
[2024-05-02 23:38:15] INFO auto_device.py:35: Using device: cuda:0
[2024-05-02 23:38:15] INFO auto_weight.py:70: Finding weights in: dist/models/RedPajama-INCITE-Chat-3B-v1
[2024-05-02 23:38:15] INFO auto_weight.py:129: Found source weight format: huggingface-torch. Source configuration: dist/models/RedPajama-INCITE-Chat-3B-v1/pytorch_model.bin
[2024-05-02 23:38:15] INFO auto_weight.py:167: Not found Huggingface Safetensor
[2024-05-02 23:38:15] INFO auto_weight.py:106: Using source weight configuration: dist/models/RedPajama-INCITE-Chat-3B-v1/pytorch_model.bin. Use `--source` to override.
[2024-05-02 23:38:15] INFO auto_weight.py:110: Using source weight format: huggingface-torch. Use `--source-format` to override.
[2024-05-02 23:38:15] INFO auto_config.py:153: Found model type: gpt_neox. Use `--model-type` to override.
Weight conversion with arguments:
  --config          dist/models/RedPajama-INCITE-Chat-3B-v1/config.json
  --quantization    PerTensorQuantize(name='e4m3_e4m3_f16', kind='per-tensor-quant', activation_dtype='e4m3_float8', weight_dtype='e4m3_float8', storage_dtype='e4m3_float8', model_dtype='float16', quantize_embedding=False, quantize_final_fc=False, quantize_linear=True, num_elem_per_storage=1, max_int_value=448, use_scale=False)
  --model-type      gpt_neox
  --device          cuda:0
  --source          dist/models/RedPajama-INCITE-Chat-3B-v1/pytorch_model.bin
  --source-format   huggingface-torch
  --output          dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC
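
(For reference: the argument dump above corresponds to an invocation along the lines of the following. This is a reconstruction from the logged values, since the gist does not include the actual command; note that the --output directory is named q4f16_1 even though the requested quantization is the FP8 scheme e4m3_e4m3_f16.)

    mlc_llm convert_weight dist/models/RedPajama-INCITE-Chat-3B-v1/config.json \
        --quantization e4m3_e4m3_f16 \
        --device cuda:0 \
        --output dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC
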
[2024-05-02 23:38:15] INFO gpt_neox_model.py:49: context_window_size not found in config.json. Falling back to max_position_embeddings (2048)
[2024-05-02 23:38:15] INFO gpt_neox_model.py:72: prefill_chunk_size defaults to context_window_size (2048)
Traceback (most recent call last):
  File "/home/raghu/miniconda3/envs/mlc-chat-venv/bin/mlc_llm", line 33, in <module>
    sys.exit(load_entry_point('mlc-llm', 'console_scripts', 'mlc_llm')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raghu/mlc-llm/python/mlc_llm/__main__.py", line 29, in main
    cli.main(sys.argv[2:])
  File "/home/raghu/mlc-llm/python/mlc_llm/cli/convert_weight.py", line 87, in main
    convert_weight(
  File "/home/raghu/mlc-llm/python/mlc_llm/interface/convert_weight.py", line 181, in convert_weight
    _convert_args(args)
  File "/home/raghu/mlc-llm/python/mlc_llm/interface/convert_weight.py", line 68, in _convert_args
    model, quantize_map = args.model.quantize[args.quantization.kind](
                          ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'per-tensor-quant'
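
The KeyError suggests that, in this build, the gpt_neox model definition registers no handler for the 'per-tensor-quant' quantization kind: _convert_args looks up args.model.quantize[args.quantization.kind], and the per-model table has no such key. A minimal Python sketch of the failing pattern, with assumed/simplified names (not MLC LLM's actual registry code):

    def group_quant(model_config, quantization):
        # Placeholder for a quantization path the model does register.
        raise NotImplementedError

    # Per-model table mapping a quantization kind to its quantize function.
    quantize = {
        "group-quant": group_quant,
        # no "per-tensor-quant" entry for gpt_neox in this build
    }

    kind = "per-tensor-quant"  # PerTensorQuantize(...).kind from the log above
    model, quantize_map = quantize[kind](None, None)  # KeyError: 'per-tensor-quant'

A likely workaround, pending per-tensor FP8 support for gpt_neox, is to request a quantization the model does register, such as q4f16_1 (which matches the output directory name above).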