@razetime
Created May 3, 2024 15:21
https://gist.github.com/razetime/50b1d04a06c87b22409a98924b9f2bf7
MLC FP8 error 1
[2024-05-02 23:38:12] INFO auto_config.py:115: Found model configuration: dist/models/RedPajama-INCITE-Chat-3B-v1/config.json
[2024-05-02 23:38:12] INFO auto_device.py:79: Found device: cuda:0
[2024-05-02 23:38:13] INFO auto_device.py:88: Not found device: rocm:0
[2024-05-02 23:38:14] INFO auto_device.py:88: Not found device: metal:0
[2024-05-02 23:38:14] INFO auto_device.py:79: Found device: vulkan:0
[2024-05-02 23:38:14] INFO auto_device.py:79: Found device: vulkan:1
[2024-05-02 23:38:15] INFO auto_device.py:88: Not found device: opencl:0
[2024-05-02 23:38:15] INFO auto_device.py:35: Using device: cuda:0
[2024-05-02 23:38:15] INFO auto_weight.py:70: Finding weights in: dist/models/RedPajama-INCITE-Chat-3B-v1
[2024-05-02 23:38:15] INFO auto_weight.py:129: Found source weight format: huggingface-torch. Source configuration: dist/models/RedPajama-INCITE-Chat-3B-v1/pytorch_model.bin
[2024-05-02 23:38:15] INFO auto_weight.py:167: Not found Huggingface Safetensor
[2024-05-02 23:38:15] INFO auto_weight.py:106: Using source weight configuration: dist/models/RedPajama-INCITE-Chat-3B-v1/pytorch_model.bin. Use `--source` to override.
[2024-05-02 23:38:15] INFO auto_weight.py:110: Using source weight format: huggingface-torch. Use `--source-format` to override.
[2024-05-02 23:38:15] INFO auto_config.py:153: Found model type: gpt_neox. Use `--model-type` to override.
Weight conversion with arguments:
  --config          dist/models/RedPajama-INCITE-Chat-3B-v1/config.json
  --quantization    PerTensorQuantize(name='e4m3_e4m3_f16', kind='per-tensor-quant', activation_dtype='e4m3_float8', weight_dtype='e4m3_float8', storage_dtype='e4m3_float8', model_dtype='float16', quantize_embedding=False, quantize_final_fc=False, quantize_linear=True, num_elem_per_storage=1, max_int_value=448, use_scale=False)
  --model-type      gpt_neox
  --device          cuda:0
  --source          dist/models/RedPajama-INCITE-Chat-3B-v1/pytorch_model.bin
  --source-format   huggingface-torch
  --output          dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC
[2024-05-02 23:38:15] INFO gpt_neox_model.py:49: context_window_size not found in config.json. Falling back to max_position_embeddings (2048)
[2024-05-02 23:38:15] INFO gpt_neox_model.py:72: prefill_chunk_size defaults to context_window_size (2048)
Traceback (most recent call last):
  File "/home/raghu/miniconda3/envs/mlc-chat-venv/bin/mlc_llm", line 33, in <module>
    sys.exit(load_entry_point('mlc-llm', 'console_scripts', 'mlc_llm')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raghu/mlc-llm/python/mlc_llm/__main__.py", line 29, in main
    cli.main(sys.argv[2:])
  File "/home/raghu/mlc-llm/python/mlc_llm/cli/convert_weight.py", line 87, in main
    convert_weight(
  File "/home/raghu/mlc-llm/python/mlc_llm/interface/convert_weight.py", line 181, in convert_weight
    _convert_args(args)
  File "/home/raghu/mlc-llm/python/mlc_llm/interface/convert_weight.py", line 68, in _convert_args
    model, quantize_map = args.model.quantize[args.quantization.kind](
                          ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'per-tensor-quant'
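The traceback above fails on a plain dictionary lookup: each model definition carries a `quantize` map from quantization kind to a loader function, and the gpt_neox model evidently has no entry for `'per-tensor-quant'`. A minimal sketch of that failure mode (hypothetical kind names and loaders, not the actual MLC-LLM registry) looks like:

```python
# Hypothetical sketch of the per-model quantization registry that
# convert_weight.py indexes on line 68. The real map lives in the
# model definitions inside mlc_llm; the kind names below are examples.
quantize = {
    "group-quant": lambda config, quant: "group-quantized model",
    "no-quant": lambda config, quant: "unquantized model",
    # no "per-tensor-quant" entry registered for this model type
}

kind = "per-tensor-quant"  # requested by the e4m3_e4m3_f16 quantization
try:
    loader = quantize[kind]  # same lookup shape as args.model.quantize[args.quantization.kind]
except KeyError as exc:
    print(f"KeyError: {exc} -- model type has no loader for this quantization kind")
```

So the error indicates a missing registration for this model/quantization pair rather than a problem with the weights themselves.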