Running the rinna model on the free-tier Colab runtime without a GPU
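The notebook below installs transformers/sentencepiece, clones rinna/japanese-gpt-neox-3.6b-instruction-ppo, and then patches redpajama.cpp's `convert_gptneox_to_ggml.py` before running it. The point of the patch is memory: instead of instantiating the full `AutoModelForCausalLM` (which materializes every weight and can exhaust the free runtime's RAM), it reads the config with `AutoConfig` and loads the checkpoint's state dict directly on CPU via `torch.load`, casting each tensor to float32 only at conversion time. A minimal sketch of that core idea, using a tiny stand-in checkpoint rather than the real 6.9 GB `pytorch_model.bin`:

```python
import os
import tempfile

import numpy as np
import torch

# Tiny stand-in checkpoint; the notebook operates on pytorch_model.bin (~6.9 GB).
state = {"embed.weight": torch.randn(4, 8, dtype=torch.float16)}
path = os.path.join(tempfile.mkdtemp(), "pytorch_model.bin")
torch.save(state, path)

# Load the raw state dict on CPU -- no model object is ever constructed,
# so peak memory stays close to the checkpoint size itself.
with open(path, "rb") as fp:
    list_vars = torch.load(fp, map_location=torch.device("cpu"))

# Per-tensor conversion, matching the patched line in the script:
# squeeze, upcast fp16 -> fp32, then hand off to numpy for ggml writing.
data = list_vars["embed.weight"].squeeze().to(torch.float32).numpy()
print(data.dtype, data.shape)  # float32 (4, 8)
```

The patch in the notebook applies the same three changes to the upstream script: import `AutoConfig`, replace the `from_pretrained` model load with `torch.load` of `pytorch_model.bin`, and fold the float cast into the per-tensor loop.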
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "code",
"source": [
"! pip install transformers sentencepiece"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "oj1x8I_OY-S5",
"outputId": "8af95a6e-d5bb-42ce-d3ae-b27e9c85d9a0"
},
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (4.30.2)\n",
"Collecting sentencepiece\n",
" Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m19.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers) (3.12.2)\n",
"Requirement already satisfied: huggingface-hub<1.0,>=0.14.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.15.1)\n",
"Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (1.22.4)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers) (23.1)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (6.0)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (2022.10.31)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers) (2.27.1)\n",
"Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.13.3)\n",
"Requirement already satisfied: safetensors>=0.3.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.3.1)\n",
"Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers) (4.65.0)\n",
"Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers) (2023.6.0)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers) (4.6.3)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (1.26.16)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (2023.5.7)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (2.0.12)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (3.4)\n",
"Installing collected packages: sentencepiece\n",
"Successfully installed sentencepiece-0.1.99\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"import torch"
],
"metadata": {
"id": "soFuW7XtURV5"
},
"execution_count": 7,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"Clone the required data"
],
"metadata": {
"id": "r9SQh9ukaviw"
}
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "BIspRbccP4zr",
"outputId": "38027f08-6c4b-49f6-b8dc-8dd44d2b7c3c"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"WARNING: 'git lfs clone' is deprecated and will not be updated\n",
" with new flags from 'git clone'\n",
"\n",
"'git clone' has been updated in upstream Git to have comparable\n",
"speeds to 'git lfs clone'.\n",
"Cloning into 'japanese-gpt-neox-3.6b-instruction-ppo'...\n",
"remote: Enumerating objects: 10, done.\u001b[K\n",
"remote: Counting objects: 100% (10/10), done.\u001b[K\n",
"remote: Compressing objects: 100% (10/10), done.\u001b[K\n",
"remote: Total 10 (delta 0), reused 9 (delta 0), pack-reused 0\u001b[K\n",
"Unpacking objects: 100% (10/10), 287.66 KiB | 8.72 MiB/s, done.\n"
]
}
],
"source": [
"!git lfs clone --depth=1 https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-ppo"
]
},
{
"cell_type": "code",
"source": [
"! ls -lh japanese-gpt-neox-3.6b-instruction-ppo/"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Ujs2uQPlT4qD",
"outputId": "c3c5bac6-c764-4634-b82c-358dedfba979"
},
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"total 6.9G\n",
"-rw-r--r-- 1 root root 534 Jul 4 14:32 config.json\n",
"-rw-r--r-- 1 root root 6.9G Jul 4 14:35 pytorch_model.bin\n",
"-rw-r--r-- 1 root root 8.5K Jul 4 14:32 README.md\n",
"-rw-r--r-- 1 root root 59K Jul 4 14:32 rinna.png\n",
"-rw-r--r-- 1 root root 768K Jul 4 14:32 spiece.model\n",
"-rw-r--r-- 1 root root 561K Jul 4 14:32 spiece.vocab\n",
"-rw-r--r-- 1 root root 284 Jul 4 14:32 tokenizer_config.json\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"!git clone https://github.com/togethercomputer/redpajama.cpp.git"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "FNZPkGD-YhrN",
"outputId": "c1a8c687-49a7-4429-f6c8-cea68948339f"
},
"execution_count": 15,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Cloning into 'redpajama.cpp'...\n",
"remote: Enumerating objects: 3014, done.\u001b[K\n",
"remote: Counting objects: 100% (3014/3014), done.\u001b[K\n",
"remote: Compressing objects: 100% (991/991), done.\u001b[K\n",
"remote: Total 3014 (delta 2015), reused 2929 (delta 1988), pack-reused 0\u001b[K\n",
"Receiving objects: 100% (3014/3014), 2.74 MiB | 14.47 MiB/s, done.\n",
"Resolving deltas: 100% (2015/2015), done.\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"# ! rm -r redpajama.cpp"
],
"metadata": {
"id": "2ePcy8xPebsv"
},
"execution_count": 14,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# ! cd redpajama.cpp && git diff > /content/redpajama.patch"
],
"metadata": {
"id": "-mBaJt32Yhop"
},
"execution_count": 13,
"outputs": []
},
{
"cell_type": "code",
"source": [
"%%shell\n",
"\n",
"pushd redpajama.cpp >& /dev/null\n",
"\n",
"git rev-parse HEAD\n",
"\n",
"cat <<EOD > redpajama.patch\n",
"diff --git a/examples/redpajama/scripts/convert_gptneox_to_ggml.py b/examples/redpajama/scripts/convert_gptneox_to_ggml.py\n",
"index 6a32942..4a552fa 100644\n",
"--- a/examples/redpajama/scripts/convert_gptneox_to_ggml.py\n",
"+++ b/examples/redpajama/scripts/convert_gptneox_to_ggml.py\n",
"@@ -9,7 +9,7 @@ import code\n",
" import torch\n",
" import numpy as np\n",
"\n",
"-from transformers import AutoModelForCausalLM, AutoTokenizer\n",
"+from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig\n",
"\n",
" # ref: https://github.com/openai/gpt-2/blob/master/src/encoder.py\n",
" def bytes_to_unicode():\n",
"@@ -59,12 +59,12 @@ if len(sys.argv) > 3:\n",
"\n",
" tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
" print(\"Loading model: \", model_name)\n",
"-model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16 if ftype == 1 else torch.float32,\n",
"- cache_dir=model_cache_dir)\n",
"-model.eval()\n",
"-for p in model.parameters():\n",
"- p.requires_grad = False\n",
"-hparams = model.config.to_dict()\n",
"+config = AutoConfig.from_pretrained(model_name)\n",
"+\n",
"+with open(os.path.join(model_name, \"pytorch_model.bin\"), \"rb\") as fp:\n",
"+ list_vars = torch.load(fp, map_location=torch.device('cpu'))\n",
"+\n",
"+hparams = config.to_dict()\n",
" print(\"Model loaded: \", model_name)\n",
"\n",
" fn_bin = f\"/ggml-{model_name.split('/')[-1]}-{ftype_str[ftype]}.bin\"\n",
"@@ -94,8 +94,6 @@ for i in range(hparams[\"vocab_size\"]):\n",
" fout.write(struct.pack(\"i\", len(text)))\n",
" fout.write(text)\n",
"\n",
"-list_vars = model.state_dict()\n",
"-\n",
" print(hparams)\n",
"\n",
" for name in list_vars.keys():\n",
"@@ -110,8 +108,7 @@ for name in list_vars.keys():\n",
" nn = name\n",
"\n",
" print(src, ' -> ', name)\n",
"- data = list_vars[src].squeeze().numpy()\n",
"- data = data.astype(np.float32)\n",
"+ data = list_vars[src].squeeze().to(torch.float32).numpy()\n",
"\n",
" n_dims = len(data.shape)\n",
" print(name, n_dims, data.shape)\n",
"EOD\n",
"\n",
"patch -p1 < redpajama.patch\n",
"\n",
"popd >& /dev/null"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "L7ISe-u3Yhmd",
"outputId": "a87c34f0-a913-458a-ecc3-00c634bd91cf"
},
"execution_count": 17,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"b9e0389a8fee1a5a8fce1a58e5184194990308bd\n",
"patching file examples/redpajama/scripts/convert_gptneox_to_ggml.py\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": []
},
"metadata": {},
"execution_count": 17
}
]
},
{
"cell_type": "code",
"source": [
"%%shell\n",
"\n",
"pushd redpajama.cpp/examples/redpajama/scripts/\n",
"\n",
"python -u ./convert_gptneox_to_ggml.py /content/japanese-gpt-neox-3.6b-instruction-ppo /content/rinna_model_ggml\n",
"\n",
"echo $?\n",
"\n",
"popd"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "V-8yZlOXYhij",
"outputId": "3f3648c1-b119-45f9-b61d-67e47c2120bc"
},
"execution_count": 18,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"/content/redpajama.cpp/examples/redpajama/scripts /content\n",
"/usr/local/lib/python3.10/dist-packages/transformers/convert_slow_tokenizer.py:454: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.\n",
" warnings.warn(\n",
"Loading model: /content/japanese-gpt-neox-3.6b-instruction-ppo\n",
"Model loaded: /content/japanese-gpt-neox-3.6b-instruction-ppo\n",
"2023-07-04 15:17:44.956308: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"{'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'float16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': False, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'chunk_size_feed_forward': 0, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['GPTNeoXForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 2, 'pad_token_id': None, 'eos_token_id': 3, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': '/content/japanese-gpt-neox-3.6b-instruction-ppo', 'transformers_version': '4.30.2', 'model_type': 'gpt_neox', 'vocab_size': 32000, 'max_position_embeddings': 2048, 'hidden_size': 2816, 'num_hidden_layers': 36, 'num_attention_heads': 22, 'intermediate_size': 11264, 'hidden_act': 'gelu', 'rotary_pct': 1.0, 'rotary_emb_base': 10000, 'classifier_dropout': 0.1, 'initializer_range': 0.02, 'layer_norm_eps': 1e-05, 'use_cache': True, 'use_parallel_residual': False, 'multiple_of': 1}\n",
"gpt_neox.embed_in.weight -> gpt_neox.embed_in.weight\n",
"gpt_neox.embed_in.weight 2 (32000, 2816)\n",
" Converting to float16 (32000, 2816) [[0.012939453125, -0.00921630859375, 0.0284423828125], [0.0036163330078125, 0.00927734375, 0.0118408203125], [0.00494384765625, -0.029541015625, -0.017333984375]]\n",
"b'gpt_neox.embed_in.weight'\n",
"gpt_neox.layers.0.input_layernorm.weight -> gpt_neox.layers.0.input_layernorm.weight\n",
"gpt_neox.layers.0.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.166015625, 0.0654296875, 0.1953125]\n",
"b'gpt_neox.layers.0.input_layernorm.weight'\n",
"gpt_neox.layers.0.input_layernorm.bias -> gpt_neox.layers.0.input_layernorm.bias\n",
"gpt_neox.layers.0.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.017822265625, -0.02001953125, 0.0294189453125]\n",
"b'gpt_neox.layers.0.input_layernorm.bias'\n",
"gpt_neox.layers.0.post_attention_layernorm.weight -> gpt_neox.layers.0.post_attention_layernorm.weight\n",
"gpt_neox.layers.0.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.796875, 0.83203125, 0.8515625]\n",
"b'gpt_neox.layers.0.post_attention_layernorm.weight'\n",
"gpt_neox.layers.0.post_attention_layernorm.bias -> gpt_neox.layers.0.post_attention_layernorm.bias\n",
"gpt_neox.layers.0.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0074462890625, 0.01556396484375, -0.0164794921875]\n",
"b'gpt_neox.layers.0.post_attention_layernorm.bias'\n",
"gpt_neox.layers.0.attention.query_key_value.weight -> gpt_neox.layers.0.attention.query_key_value.weight\n",
"gpt_neox.layers.0.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.004974365234375, 0.08935546875, -0.10400390625], [0.056884765625, -0.01129150390625, -0.04345703125], [0.008544921875, -0.056640625, 0.08056640625]]\n",
"b'gpt_neox.layers.0.attention.query_key_value.weight'\n",
"gpt_neox.layers.0.attention.query_key_value.bias -> gpt_neox.layers.0.attention.query_key_value.bias\n",
"gpt_neox.layers.0.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.06005859375, -0.1484375, 0.162109375]\n",
"b'gpt_neox.layers.0.attention.query_key_value.bias'\n",
"gpt_neox.layers.0.attention.dense.weight -> gpt_neox.layers.0.attention.dense.weight\n",
"gpt_neox.layers.0.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.0084228515625, -0.038330078125, 0.01092529296875], [-0.01495361328125, 0.0022430419921875, 0.01165771484375], [0.00799560546875, 0.0012969970703125, -0.01495361328125]]\n",
"b'gpt_neox.layers.0.attention.dense.weight'\n",
"gpt_neox.layers.0.attention.dense.bias -> gpt_neox.layers.0.attention.dense.bias\n",
"gpt_neox.layers.0.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.01513671875, 0.01324462890625, -0.027099609375]\n",
"b'gpt_neox.layers.0.attention.dense.bias'\n",
"gpt_neox.layers.0.mlp.dense_h_to_4h.weight -> gpt_neox.layers.0.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.0.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.029052734375, -0.0059814453125, -0.003082275390625], [-0.007568359375, -0.0036773681640625, 0.0888671875], [0.048828125, -0.0186767578125, -0.0038909912109375]]\n",
"b'gpt_neox.layers.0.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.0.mlp.dense_h_to_4h.bias -> gpt_neox.layers.0.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.0.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [0.0027618408203125, -0.019775390625, -0.01263427734375]\n",
"b'gpt_neox.layers.0.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.0.mlp.dense_4h_to_h.weight -> gpt_neox.layers.0.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.0.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.0242919921875, -0.00811767578125, 0.057373046875], [0.005462646484375, -0.01129150390625, -0.029541015625], [-0.031982421875, 0.006378173828125, -0.01385498046875]]\n",
"b'gpt_neox.layers.0.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.0.mlp.dense_4h_to_h.bias -> gpt_neox.layers.0.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.0.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0037384033203125, -0.023681640625, -0.00738525390625]\n",
"b'gpt_neox.layers.0.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.1.input_layernorm.weight -> gpt_neox.layers.1.input_layernorm.weight\n",
"gpt_neox.layers.1.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.73046875, 0.71484375, 0.76953125]\n",
"b'gpt_neox.layers.1.input_layernorm.weight'\n",
"gpt_neox.layers.1.input_layernorm.bias -> gpt_neox.layers.1.input_layernorm.bias\n",
"gpt_neox.layers.1.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0302734375, 0.08837890625, 0.020263671875]\n",
"b'gpt_neox.layers.1.input_layernorm.bias'\n",
"gpt_neox.layers.1.post_attention_layernorm.weight -> gpt_neox.layers.1.post_attention_layernorm.weight\n",
"gpt_neox.layers.1.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.90625, 0.875, 0.953125]\n",
"b'gpt_neox.layers.1.post_attention_layernorm.weight'\n",
"gpt_neox.layers.1.post_attention_layernorm.bias -> gpt_neox.layers.1.post_attention_layernorm.bias\n",
"gpt_neox.layers.1.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.03125, 0.026123046875, -0.06298828125]\n",
"b'gpt_neox.layers.1.post_attention_layernorm.bias'\n",
"gpt_neox.layers.1.attention.query_key_value.weight -> gpt_neox.layers.1.attention.query_key_value.weight\n",
"gpt_neox.layers.1.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.033203125, -0.01214599609375, 0.032470703125], [-0.0250244140625, 0.0546875, 0.0177001953125], [0.01190185546875, -0.016845703125, 0.0181884765625]]\n",
"b'gpt_neox.layers.1.attention.query_key_value.weight'\n",
"gpt_neox.layers.1.attention.query_key_value.bias -> gpt_neox.layers.1.attention.query_key_value.bias\n",
"gpt_neox.layers.1.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.06787109375, -0.006744384765625, 0.0250244140625]\n",
"b'gpt_neox.layers.1.attention.query_key_value.bias'\n",
"gpt_neox.layers.1.attention.dense.weight -> gpt_neox.layers.1.attention.dense.weight\n",
"gpt_neox.layers.1.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.00738525390625, 0.02783203125, -0.0037841796875], [-0.0047607421875, -0.031005859375, 0.0026702880859375], [-0.0400390625, 0.00640869140625, 0.007232666015625]]\n",
"b'gpt_neox.layers.1.attention.dense.weight'\n",
"gpt_neox.layers.1.attention.dense.bias -> gpt_neox.layers.1.attention.dense.bias\n",
"gpt_neox.layers.1.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.032958984375, -0.0240478515625, -0.0223388671875]\n",
"b'gpt_neox.layers.1.attention.dense.bias'\n",
"gpt_neox.layers.1.mlp.dense_h_to_4h.weight -> gpt_neox.layers.1.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.1.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.02490234375, 0.0164794921875, -0.049072265625], [-0.043212890625, -0.016845703125, 0.01397705078125], [0.01611328125, 0.06640625, -0.0047607421875]]\n",
"b'gpt_neox.layers.1.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.1.mlp.dense_h_to_4h.bias -> gpt_neox.layers.1.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.1.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.018310546875, -0.0169677734375, -0.011474609375]\n",
"b'gpt_neox.layers.1.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.1.mlp.dense_4h_to_h.weight -> gpt_neox.layers.1.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.1.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.0234375, 0.040771484375, 0.048583984375], [0.0322265625, 0.00592041015625, 0.01171875], [-0.01165771484375, 0.0257568359375, -0.0184326171875]]\n",
"b'gpt_neox.layers.1.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.1.mlp.dense_4h_to_h.bias -> gpt_neox.layers.1.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.1.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0279541015625, -0.0205078125, 0.000751495361328125]\n",
"b'gpt_neox.layers.1.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.2.input_layernorm.weight -> gpt_neox.layers.2.input_layernorm.weight\n",
"gpt_neox.layers.2.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.67578125, 0.69921875, 0.69140625]\n",
"b'gpt_neox.layers.2.input_layernorm.weight'\n",
"gpt_neox.layers.2.input_layernorm.bias -> gpt_neox.layers.2.input_layernorm.bias\n",
"gpt_neox.layers.2.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.038330078125, 0.10009765625, 0.0086669921875]\n",
"b'gpt_neox.layers.2.input_layernorm.bias'\n",
"gpt_neox.layers.2.post_attention_layernorm.weight -> gpt_neox.layers.2.post_attention_layernorm.weight\n",
"gpt_neox.layers.2.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.95703125, 0.95703125, 0.94140625]\n",
"b'gpt_neox.layers.2.post_attention_layernorm.weight'\n",
"gpt_neox.layers.2.post_attention_layernorm.bias -> gpt_neox.layers.2.post_attention_layernorm.bias\n",
"gpt_neox.layers.2.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.04443359375, -0.00225830078125, -0.01116943359375]\n",
"b'gpt_neox.layers.2.post_attention_layernorm.bias'\n",
"gpt_neox.layers.2.attention.query_key_value.weight -> gpt_neox.layers.2.attention.query_key_value.weight\n",
"gpt_neox.layers.2.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.00024127960205078125, -0.07080078125, -0.037841796875], [-0.000576019287109375, 0.01165771484375, 0.019287109375], [0.0283203125, -0.00457763671875, 0.0281982421875]]\n",
"b'gpt_neox.layers.2.attention.query_key_value.weight'\n",
"gpt_neox.layers.2.attention.query_key_value.bias -> gpt_neox.layers.2.attention.query_key_value.bias\n",
"gpt_neox.layers.2.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [-0.00469970703125, -0.0111083984375, 0.00616455078125]\n",
"b'gpt_neox.layers.2.attention.query_key_value.bias'\n",
"gpt_neox.layers.2.attention.dense.weight -> gpt_neox.layers.2.attention.dense.weight\n",
"gpt_neox.layers.2.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.007171630859375, -0.0225830078125, -0.00262451171875], [0.0108642578125, -0.03759765625, 0.007110595703125], [-0.00531005859375, 0.013916015625, -0.00445556640625]]\n",
"b'gpt_neox.layers.2.attention.dense.weight'\n",
"gpt_neox.layers.2.attention.dense.bias -> gpt_neox.layers.2.attention.dense.bias\n",
"gpt_neox.layers.2.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.03564453125, -0.0247802734375, 0.0022735595703125]\n",
"b'gpt_neox.layers.2.attention.dense.bias'\n",
"gpt_neox.layers.2.mlp.dense_h_to_4h.weight -> gpt_neox.layers.2.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.2.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.023193359375, 0.01953125, 0.016845703125], [0.02783203125, -0.002197265625, 0.01318359375], [-0.0206298828125, -0.0146484375, -0.0101318359375]]\n",
"b'gpt_neox.layers.2.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.2.mlp.dense_h_to_4h.bias -> gpt_neox.layers.2.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.2.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.0035247802734375, -0.006744384765625, -0.01361083984375]\n",
"b'gpt_neox.layers.2.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.2.mlp.dense_4h_to_h.weight -> gpt_neox.layers.2.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.2.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[4.4345855712890625e-05, 0.00537109375, 0.04150390625], [-0.0189208984375, 0.00738525390625, 0.07470703125], [0.00946044921875, -0.037841796875, -0.01226806640625]]\n",
"b'gpt_neox.layers.2.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.2.mlp.dense_4h_to_h.bias -> gpt_neox.layers.2.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.2.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.024658203125, -0.02001953125, 0.00921630859375]\n",
"b'gpt_neox.layers.2.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.3.input_layernorm.weight -> gpt_neox.layers.3.input_layernorm.weight\n",
"gpt_neox.layers.3.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.765625, 0.76953125, 0.83984375]\n",
"b'gpt_neox.layers.3.input_layernorm.weight'\n",
"gpt_neox.layers.3.input_layernorm.bias -> gpt_neox.layers.3.input_layernorm.bias\n",
"gpt_neox.layers.3.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.037109375, 0.0693359375, 0.0250244140625]\n",
"b'gpt_neox.layers.3.input_layernorm.bias'\n",
"gpt_neox.layers.3.post_attention_layernorm.weight -> gpt_neox.layers.3.post_attention_layernorm.weight\n",
"gpt_neox.layers.3.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.9296875, 1.0078125, 0.91796875]\n",
"b'gpt_neox.layers.3.post_attention_layernorm.weight'\n",
"gpt_neox.layers.3.post_attention_layernorm.bias -> gpt_neox.layers.3.post_attention_layernorm.bias\n",
"gpt_neox.layers.3.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.02294921875, -0.0400390625, 0.0230712890625]\n",
"b'gpt_neox.layers.3.post_attention_layernorm.bias'\n",
"gpt_neox.layers.3.attention.query_key_value.weight -> gpt_neox.layers.3.attention.query_key_value.weight\n",
"gpt_neox.layers.3.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.047607421875, -0.0279541015625, -0.00506591796875], [-0.007720947265625, -0.0341796875, 0.00677490234375], [-0.01361083984375, 0.0123291015625, -0.0186767578125]]\n",
"b'gpt_neox.layers.3.attention.query_key_value.weight'\n",
"gpt_neox.layers.3.attention.query_key_value.bias -> gpt_neox.layers.3.attention.query_key_value.bias\n",
"gpt_neox.layers.3.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.0028076171875, -0.01068115234375, -0.005157470703125]\n",
"b'gpt_neox.layers.3.attention.query_key_value.bias'\n",
"gpt_neox.layers.3.attention.dense.weight -> gpt_neox.layers.3.attention.dense.weight\n",
"gpt_neox.layers.3.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.0262451171875, -0.0015411376953125, -0.0023040771484375], [0.01055908203125, 0.021240234375, -0.00543212890625], [-0.040771484375, 0.01080322265625, -0.0146484375]]\n",
"b'gpt_neox.layers.3.attention.dense.weight'\n",
"gpt_neox.layers.3.attention.dense.bias -> gpt_neox.layers.3.attention.dense.bias\n",
"gpt_neox.layers.3.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.03759765625, -0.02099609375, -0.00244140625]\n",
"b'gpt_neox.layers.3.attention.dense.bias'\n",
"gpt_neox.layers.3.mlp.dense_h_to_4h.weight -> gpt_neox.layers.3.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.3.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.045166015625, 0.0027008056640625, 0.033447265625], [0.05078125, -0.01177978515625, -0.0021514892578125], [0.01507568359375, 0.050537109375, 0.05224609375]]\n",
"b'gpt_neox.layers.3.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.3.mlp.dense_h_to_4h.bias -> gpt_neox.layers.3.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.3.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [0.0024566650390625, -0.0130615234375, -0.0181884765625]\n",
"b'gpt_neox.layers.3.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.3.mlp.dense_4h_to_h.weight -> gpt_neox.layers.3.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.3.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.00457763671875, 0.005706787109375, -0.041015625], [-0.041259765625, -0.01031494140625, -0.01190185546875], [-0.026123046875, -0.0286865234375, -0.0172119140625]]\n",
"b'gpt_neox.layers.3.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.3.mlp.dense_4h_to_h.bias -> gpt_neox.layers.3.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.3.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.026123046875, -0.01068115234375, -0.006622314453125]\n",
"b'gpt_neox.layers.3.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.4.input_layernorm.weight -> gpt_neox.layers.4.input_layernorm.weight\n",
"gpt_neox.layers.4.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.8984375, 0.859375, 0.95703125]\n",
"b'gpt_neox.layers.4.input_layernorm.weight'\n",
"gpt_neox.layers.4.input_layernorm.bias -> gpt_neox.layers.4.input_layernorm.bias\n",
"gpt_neox.layers.4.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.046142578125, 0.05419921875, 0.02294921875]\n",
"b'gpt_neox.layers.4.input_layernorm.bias'\n",
"gpt_neox.layers.4.post_attention_layernorm.weight -> gpt_neox.layers.4.post_attention_layernorm.weight\n",
"gpt_neox.layers.4.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.95703125, 1.015625, 0.90234375]\n",
"b'gpt_neox.layers.4.post_attention_layernorm.weight'\n",
"gpt_neox.layers.4.post_attention_layernorm.bias -> gpt_neox.layers.4.post_attention_layernorm.bias\n",
"gpt_neox.layers.4.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.033447265625, -0.055908203125, -0.0262451171875]\n",
"b'gpt_neox.layers.4.post_attention_layernorm.bias'\n",
"gpt_neox.layers.4.attention.query_key_value.weight -> gpt_neox.layers.4.attention.query_key_value.weight\n",
"gpt_neox.layers.4.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.04150390625, -0.005767822265625, -0.007080078125], [-0.0361328125, 0.013671875, -0.0286865234375], [0.035400390625, -0.10205078125, 0.0216064453125]]\n",
"b'gpt_neox.layers.4.attention.query_key_value.weight'\n",
"gpt_neox.layers.4.attention.query_key_value.bias -> gpt_neox.layers.4.attention.query_key_value.bias\n",
"gpt_neox.layers.4.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.0123291015625, 0.004119873046875, -0.0093994140625]\n",
"b'gpt_neox.layers.4.attention.query_key_value.bias'\n",
"gpt_neox.layers.4.attention.dense.weight -> gpt_neox.layers.4.attention.dense.weight\n",
"gpt_neox.layers.4.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.01300048828125, 0.007476806640625, 0.0107421875], [0.03369140625, -0.0076904296875, -0.007293701171875], [-0.0179443359375, -0.02783203125, -0.0234375]]\n",
"b'gpt_neox.layers.4.attention.dense.weight'\n",
"gpt_neox.layers.4.attention.dense.bias -> gpt_neox.layers.4.attention.dense.bias\n",
"gpt_neox.layers.4.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.033447265625, -0.0213623046875, -5.173683166503906e-05]\n",
"b'gpt_neox.layers.4.attention.dense.bias'\n",
"gpt_neox.layers.4.mlp.dense_h_to_4h.weight -> gpt_neox.layers.4.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.4.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.091796875, -0.042724609375, -0.06396484375], [0.03173828125, -0.025634765625, 0.01220703125], [0.07080078125, -0.0146484375, -0.032470703125]]\n",
"b'gpt_neox.layers.4.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.4.mlp.dense_h_to_4h.bias -> gpt_neox.layers.4.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.4.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.0191650390625, -0.0079345703125, -0.0137939453125]\n",
"b'gpt_neox.layers.4.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.4.mlp.dense_4h_to_h.weight -> gpt_neox.layers.4.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.4.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.024658203125, 0.01031494140625, -0.0079345703125], [-0.0498046875, -0.01416015625, -0.0634765625], [-0.004364013671875, 0.0380859375, -0.00174713134765625]]\n",
"b'gpt_neox.layers.4.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.4.mlp.dense_4h_to_h.bias -> gpt_neox.layers.4.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.4.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0189208984375, -0.003753662109375, 0.006927490234375]\n",
"b'gpt_neox.layers.4.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.5.input_layernorm.weight -> gpt_neox.layers.5.input_layernorm.weight\n",
"gpt_neox.layers.5.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.8359375, 0.8359375, 0.85546875]\n",
"b'gpt_neox.layers.5.input_layernorm.weight'\n",
"gpt_neox.layers.5.input_layernorm.bias -> gpt_neox.layers.5.input_layernorm.bias\n",
"gpt_neox.layers.5.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.037841796875, 0.05322265625, 0.004425048828125]\n",
"b'gpt_neox.layers.5.input_layernorm.bias'\n",
"gpt_neox.layers.5.post_attention_layernorm.weight -> gpt_neox.layers.5.post_attention_layernorm.weight\n",
"gpt_neox.layers.5.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.984375, 1.0078125, 0.953125]\n",
"b'gpt_neox.layers.5.post_attention_layernorm.weight'\n",
"gpt_neox.layers.5.post_attention_layernorm.bias -> gpt_neox.layers.5.post_attention_layernorm.bias\n",
"gpt_neox.layers.5.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.058837890625, -0.04150390625, 0.0024566650390625]\n",
"b'gpt_neox.layers.5.post_attention_layernorm.bias'\n",
"gpt_neox.layers.5.attention.query_key_value.weight -> gpt_neox.layers.5.attention.query_key_value.weight\n",
"gpt_neox.layers.5.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.0400390625, 0.029052734375, 0.0184326171875], [-0.021484375, 0.0181884765625, -0.0103759765625], [-0.029296875, 0.005462646484375, 0.022216796875]]\n",
"b'gpt_neox.layers.5.attention.query_key_value.weight'\n",
"gpt_neox.layers.5.attention.query_key_value.bias -> gpt_neox.layers.5.attention.query_key_value.bias\n",
"gpt_neox.layers.5.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.0126953125, -0.004119873046875, 0.007537841796875]\n",
"b'gpt_neox.layers.5.attention.query_key_value.bias'\n",
"gpt_neox.layers.5.attention.dense.weight -> gpt_neox.layers.5.attention.dense.weight\n",
"gpt_neox.layers.5.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.022216796875, -0.01361083984375, -0.033935546875], [0.01031494140625, -0.013427734375, 0.005126953125], [0.0135498046875, 0.00982666015625, 0.009521484375]]\n",
"b'gpt_neox.layers.5.attention.dense.weight'\n",
"gpt_neox.layers.5.attention.dense.bias -> gpt_neox.layers.5.attention.dense.bias\n",
"gpt_neox.layers.5.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0234375, -0.0164794921875, 0.0128173828125]\n",
"b'gpt_neox.layers.5.attention.dense.bias'\n",
"gpt_neox.layers.5.mlp.dense_h_to_4h.weight -> gpt_neox.layers.5.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.5.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.040771484375, -0.03564453125, 0.01422119140625], [-0.0037994384765625, 0.06982421875, -0.0096435546875], [0.06494140625, 0.08544921875, 0.023193359375]]\n",
"b'gpt_neox.layers.5.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.5.mlp.dense_h_to_4h.bias -> gpt_neox.layers.5.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.5.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.02099609375, -0.0186767578125, -0.01556396484375]\n",
"b'gpt_neox.layers.5.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.5.mlp.dense_4h_to_h.weight -> gpt_neox.layers.5.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.5.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.054443359375, -0.033203125, -0.0218505859375], [0.0458984375, 0.010986328125, -0.0294189453125], [0.03857421875, 0.059326171875, 0.0146484375]]\n",
"b'gpt_neox.layers.5.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.5.mlp.dense_4h_to_h.bias -> gpt_neox.layers.5.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.5.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0004062652587890625, -0.0019683837890625, 0.0135498046875]\n",
"b'gpt_neox.layers.5.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.6.input_layernorm.weight -> gpt_neox.layers.6.input_layernorm.weight\n",
"gpt_neox.layers.6.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.80859375, 0.86328125, 0.89453125]\n",
"b'gpt_neox.layers.6.input_layernorm.weight'\n",
"gpt_neox.layers.6.input_layernorm.bias -> gpt_neox.layers.6.input_layernorm.bias\n",
"gpt_neox.layers.6.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.045166015625, 0.0380859375, 0.01171875]\n",
"b'gpt_neox.layers.6.input_layernorm.bias'\n",
"gpt_neox.layers.6.post_attention_layernorm.weight -> gpt_neox.layers.6.post_attention_layernorm.weight\n",
"gpt_neox.layers.6.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.96484375, 1.015625, 0.9609375]\n",
"b'gpt_neox.layers.6.post_attention_layernorm.weight'\n",
"gpt_neox.layers.6.post_attention_layernorm.bias -> gpt_neox.layers.6.post_attention_layernorm.bias\n",
"gpt_neox.layers.6.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.046875, -0.01495361328125, 0.0234375]\n",
"b'gpt_neox.layers.6.post_attention_layernorm.bias'\n",
"gpt_neox.layers.6.attention.query_key_value.weight -> gpt_neox.layers.6.attention.query_key_value.weight\n",
"gpt_neox.layers.6.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.0018157958984375, -0.018310546875, 0.015869140625], [-0.01177978515625, 0.0030517578125, -0.0224609375], [0.0159912109375, 0.015869140625, -0.0225830078125]]\n",
"b'gpt_neox.layers.6.attention.query_key_value.weight'\n",
"gpt_neox.layers.6.attention.query_key_value.bias -> gpt_neox.layers.6.attention.query_key_value.bias\n",
"gpt_neox.layers.6.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [-0.005126953125, -0.0185546875, 0.00811767578125]\n",
"b'gpt_neox.layers.6.attention.query_key_value.bias'\n",
"gpt_neox.layers.6.attention.dense.weight -> gpt_neox.layers.6.attention.dense.weight\n",
"gpt_neox.layers.6.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.0322265625, -0.0126953125, -0.029052734375], [-0.01513671875, -0.00982666015625, -0.033203125], [-0.00970458984375, 0.0078125, 0.0322265625]]\n",
"b'gpt_neox.layers.6.attention.dense.weight'\n",
"gpt_neox.layers.6.attention.dense.bias -> gpt_neox.layers.6.attention.dense.bias\n",
"gpt_neox.layers.6.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0166015625, -0.007110595703125, 0.016357421875]\n",
"b'gpt_neox.layers.6.attention.dense.bias'\n",
"gpt_neox.layers.6.mlp.dense_h_to_4h.weight -> gpt_neox.layers.6.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.6.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.005157470703125, 0.0341796875, -0.0498046875], [0.00799560546875, 0.00274658203125, 0.03271484375], [0.04638671875, 0.0052490234375, -0.02685546875]]\n",
"b'gpt_neox.layers.6.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.6.mlp.dense_h_to_4h.bias -> gpt_neox.layers.6.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.6.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [0.002105712890625, -0.0208740234375, -0.01177978515625]\n",
"b'gpt_neox.layers.6.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.6.mlp.dense_4h_to_h.weight -> gpt_neox.layers.6.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.6.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.00109100341796875, 0.06494140625, -0.0133056640625], [-0.0169677734375, 0.0169677734375, 0.0201416015625], [-0.0341796875, 0.008544921875, 0.0067138671875]]\n",
"b'gpt_neox.layers.6.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.6.mlp.dense_4h_to_h.bias -> gpt_neox.layers.6.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.6.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.005157470703125, 0.001495361328125, 0.01385498046875]\n",
"b'gpt_neox.layers.6.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.7.input_layernorm.weight -> gpt_neox.layers.7.input_layernorm.weight\n",
"gpt_neox.layers.7.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.88671875, 0.890625, 0.91015625]\n",
"b'gpt_neox.layers.7.input_layernorm.weight'\n",
"gpt_neox.layers.7.input_layernorm.bias -> gpt_neox.layers.7.input_layernorm.bias\n",
"gpt_neox.layers.7.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.05517578125, 0.0208740234375, 0.0025787353515625]\n",
"b'gpt_neox.layers.7.input_layernorm.bias'\n",
"gpt_neox.layers.7.post_attention_layernorm.weight -> gpt_neox.layers.7.post_attention_layernorm.weight\n",
"gpt_neox.layers.7.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.9921875, 0.95703125, 0.96484375]\n",
"b'gpt_neox.layers.7.post_attention_layernorm.weight'\n",
"gpt_neox.layers.7.post_attention_layernorm.bias -> gpt_neox.layers.7.post_attention_layernorm.bias\n",
"gpt_neox.layers.7.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0284423828125, -0.02490234375, 0.0283203125]\n",
"b'gpt_neox.layers.7.post_attention_layernorm.bias'\n",
"gpt_neox.layers.7.attention.query_key_value.weight -> gpt_neox.layers.7.attention.query_key_value.weight\n",
"gpt_neox.layers.7.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.01251220703125, 0.03466796875, -0.0294189453125], [0.044921875, 0.029541015625, 0.038818359375], [0.03369140625, 0.015869140625, 0.019287109375]]\n",
"b'gpt_neox.layers.7.attention.query_key_value.weight'\n",
"gpt_neox.layers.7.attention.query_key_value.bias -> gpt_neox.layers.7.attention.query_key_value.bias\n",
"gpt_neox.layers.7.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.0213623046875, -0.01519775390625, 0.000545501708984375]\n",
"b'gpt_neox.layers.7.attention.query_key_value.bias'\n",
"gpt_neox.layers.7.attention.dense.weight -> gpt_neox.layers.7.attention.dense.weight\n",
"gpt_neox.layers.7.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.01519775390625, -0.01177978515625, 0.00787353515625], [-0.056396484375, -0.00179290771484375, 0.047607421875], [-0.0196533203125, 0.0162353515625, 0.00537109375]]\n",
"b'gpt_neox.layers.7.attention.dense.weight'\n",
"gpt_neox.layers.7.attention.dense.bias -> gpt_neox.layers.7.attention.dense.bias\n",
"gpt_neox.layers.7.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.035888671875, -0.0004787445068359375, 0.0260009765625]\n",
"b'gpt_neox.layers.7.attention.dense.bias'\n",
"gpt_neox.layers.7.mlp.dense_h_to_4h.weight -> gpt_neox.layers.7.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.7.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.0162353515625, 0.0341796875, 0.02978515625], [-0.003692626953125, 0.0167236328125, -0.004119873046875], [0.0098876953125, -0.0123291015625, -0.0130615234375]]\n",
"b'gpt_neox.layers.7.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.7.mlp.dense_h_to_4h.bias -> gpt_neox.layers.7.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.7.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [0.0057373046875, -0.006683349609375, -0.0274658203125]\n",
"b'gpt_neox.layers.7.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.7.mlp.dense_4h_to_h.weight -> gpt_neox.layers.7.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.7.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.030517578125, 0.024658203125, 0.00628662109375], [0.00151824951171875, -0.0341796875, -0.0086669921875], [0.01483154296875, -0.02685546875, -0.00537109375]]\n",
"b'gpt_neox.layers.7.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.7.mlp.dense_4h_to_h.bias -> gpt_neox.layers.7.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.7.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.02001953125, 0.006256103515625, 0.0218505859375]\n",
"b'gpt_neox.layers.7.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.8.input_layernorm.weight -> gpt_neox.layers.8.input_layernorm.weight\n",
"gpt_neox.layers.8.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.87890625, 0.9140625, 0.8984375]\n",
"b'gpt_neox.layers.8.input_layernorm.weight'\n",
"gpt_neox.layers.8.input_layernorm.bias -> gpt_neox.layers.8.input_layernorm.bias\n",
"gpt_neox.layers.8.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0341796875, 0.00616455078125, 0.00494384765625]\n",
"b'gpt_neox.layers.8.input_layernorm.bias'\n",
"gpt_neox.layers.8.post_attention_layernorm.weight -> gpt_neox.layers.8.post_attention_layernorm.weight\n",
"gpt_neox.layers.8.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.9921875, 0.97265625, 0.953125]\n",
"b'gpt_neox.layers.8.post_attention_layernorm.weight'\n",
"gpt_neox.layers.8.post_attention_layernorm.bias -> gpt_neox.layers.8.post_attention_layernorm.bias\n",
"gpt_neox.layers.8.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.038330078125, 0.0111083984375, 0.04052734375]\n",
"b'gpt_neox.layers.8.post_attention_layernorm.bias'\n",
"gpt_neox.layers.8.attention.query_key_value.weight -> gpt_neox.layers.8.attention.query_key_value.weight\n",
"gpt_neox.layers.8.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.0174560546875, -0.00701904296875, -0.0008544921875], [-0.01904296875, -0.0012664794921875, 0.0128173828125], [-0.0013427734375, -0.0556640625, -0.0264892578125]]\n",
"b'gpt_neox.layers.8.attention.query_key_value.weight'\n",
"gpt_neox.layers.8.attention.query_key_value.bias -> gpt_neox.layers.8.attention.query_key_value.bias\n",
"gpt_neox.layers.8.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.01300048828125, 0.006317138671875, -0.029052734375]\n",
"b'gpt_neox.layers.8.attention.query_key_value.bias'\n",
"gpt_neox.layers.8.attention.dense.weight -> gpt_neox.layers.8.attention.dense.weight\n",
"gpt_neox.layers.8.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.00848388671875, 0.0272216796875, -0.0223388671875], [0.0018463134765625, -0.01495361328125, 0.041015625], [0.0230712890625, -0.039794921875, -0.04248046875]]\n",
"b'gpt_neox.layers.8.attention.dense.weight'\n",
"gpt_neox.layers.8.attention.dense.bias -> gpt_neox.layers.8.attention.dense.bias\n",
"gpt_neox.layers.8.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.03564453125, 0.01226806640625, 0.0194091796875]\n",
"b'gpt_neox.layers.8.attention.dense.bias'\n",
"gpt_neox.layers.8.mlp.dense_h_to_4h.weight -> gpt_neox.layers.8.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.8.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.080078125, -0.0257568359375, 0.033447265625], [0.00360107421875, 0.009521484375, -0.04443359375], [-0.0234375, -0.0101318359375, 0.045654296875]]\n",
"b'gpt_neox.layers.8.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.8.mlp.dense_h_to_4h.bias -> gpt_neox.layers.8.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.8.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [0.059326171875, -0.0074462890625, -0.0230712890625]\n",
"b'gpt_neox.layers.8.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.8.mlp.dense_4h_to_h.weight -> gpt_neox.layers.8.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.8.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.04345703125, 0.03857421875, -0.01422119140625], [0.005523681640625, -0.006317138671875, -0.046630859375], [-0.006011962890625, -0.006927490234375, 0.01202392578125]]\n",
"b'gpt_neox.layers.8.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.8.mlp.dense_4h_to_h.bias -> gpt_neox.layers.8.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.8.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.01953125, 0.0115966796875, 0.01422119140625]\n",
"b'gpt_neox.layers.8.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.9.input_layernorm.weight -> gpt_neox.layers.9.input_layernorm.weight\n",
"gpt_neox.layers.9.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.8984375, 0.93359375, 0.91015625]\n",
"b'gpt_neox.layers.9.input_layernorm.weight'\n",
"gpt_neox.layers.9.input_layernorm.bias -> gpt_neox.layers.9.input_layernorm.bias\n",
"gpt_neox.layers.9.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.036376953125, 0.0030670166015625, 0.0038909912109375]\n",
"b'gpt_neox.layers.9.input_layernorm.bias'\n",
"gpt_neox.layers.9.post_attention_layernorm.weight -> gpt_neox.layers.9.post_attention_layernorm.weight\n",
"gpt_neox.layers.9.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.99609375, 0.96484375, 0.96875]\n",
"b'gpt_neox.layers.9.post_attention_layernorm.weight'\n",
"gpt_neox.layers.9.post_attention_layernorm.bias -> gpt_neox.layers.9.post_attention_layernorm.bias\n",
"gpt_neox.layers.9.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.034912109375, -0.0120849609375, 0.034912109375]\n",
"b'gpt_neox.layers.9.post_attention_layernorm.bias'\n",
"gpt_neox.layers.9.attention.query_key_value.weight -> gpt_neox.layers.9.attention.query_key_value.weight\n",
"gpt_neox.layers.9.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.01092529296875, 0.031494140625, -0.005523681640625], [-0.0002002716064453125, -0.0028076171875, 0.00168609619140625], [0.0026397705078125, 0.015869140625, 0.01123046875]]\n",
"b'gpt_neox.layers.9.attention.query_key_value.weight'\n",
"gpt_neox.layers.9.attention.query_key_value.bias -> gpt_neox.layers.9.attention.query_key_value.bias\n",
"gpt_neox.layers.9.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.027587890625, 0.006011962890625, -0.00013637542724609375]\n",
"b'gpt_neox.layers.9.attention.query_key_value.bias'\n",
"gpt_neox.layers.9.attention.dense.weight -> gpt_neox.layers.9.attention.dense.weight\n",
"gpt_neox.layers.9.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.0019683837890625, -0.037841796875, 0.006378173828125], [0.037109375, 0.0003147125244140625, 0.00836181640625], [0.021728515625, 0.000804901123046875, 0.0302734375]]\n",
"b'gpt_neox.layers.9.attention.dense.weight'\n",
"gpt_neox.layers.9.attention.dense.bias -> gpt_neox.layers.9.attention.dense.bias\n",
"gpt_neox.layers.9.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0264892578125, 0.0118408203125, 0.0208740234375]\n",
"b'gpt_neox.layers.9.attention.dense.bias'\n",
"gpt_neox.layers.9.mlp.dense_h_to_4h.weight -> gpt_neox.layers.9.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.9.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.03759765625, 0.03955078125, -0.018798828125], [0.00445556640625, -0.040283203125, -0.03125], [-0.05712890625, -0.041015625, -0.041015625]]\n",
"b'gpt_neox.layers.9.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.9.mlp.dense_h_to_4h.bias -> gpt_neox.layers.9.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.9.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [0.0026092529296875, -0.0225830078125, -0.0054931640625]\n",
"b'gpt_neox.layers.9.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.9.mlp.dense_4h_to_h.weight -> gpt_neox.layers.9.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.9.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.057373046875, 0.0087890625, 0.0223388671875], [-0.0101318359375, 0.0576171875, -0.01153564453125], [0.05126953125, -0.05126953125, 0.0177001953125]]\n",
"b'gpt_neox.layers.9.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.9.mlp.dense_4h_to_h.bias -> gpt_neox.layers.9.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.9.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0096435546875, 0.01324462890625, 0.0155029296875]\n",
"b'gpt_neox.layers.9.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.10.input_layernorm.weight -> gpt_neox.layers.10.input_layernorm.weight\n",
"gpt_neox.layers.10.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.91796875, 0.91796875, 0.91796875]\n",
"b'gpt_neox.layers.10.input_layernorm.weight'\n",
"gpt_neox.layers.10.input_layernorm.bias -> gpt_neox.layers.10.input_layernorm.bias\n",
"gpt_neox.layers.10.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0196533203125, -0.00086212158203125, -0.0025787353515625]\n",
"b'gpt_neox.layers.10.input_layernorm.bias'\n",
"gpt_neox.layers.10.post_attention_layernorm.weight -> gpt_neox.layers.10.post_attention_layernorm.weight\n",
"gpt_neox.layers.10.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.015625, 0.9765625, 0.96875]\n",
"b'gpt_neox.layers.10.post_attention_layernorm.weight'\n",
"gpt_neox.layers.10.post_attention_layernorm.bias -> gpt_neox.layers.10.post_attention_layernorm.bias\n",
"gpt_neox.layers.10.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.044921875, -0.0023193359375, 0.0262451171875]\n",
"b'gpt_neox.layers.10.post_attention_layernorm.bias'\n",
"gpt_neox.layers.10.attention.query_key_value.weight -> gpt_neox.layers.10.attention.query_key_value.weight\n",
"gpt_neox.layers.10.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.036865234375, 0.0166015625, -0.0185546875], [-0.06005859375, 0.04638671875, 0.07568359375], [0.011962890625, 0.000415802001953125, -0.003448486328125]]\n",
"b'gpt_neox.layers.10.attention.query_key_value.weight'\n",
"gpt_neox.layers.10.attention.query_key_value.bias -> gpt_neox.layers.10.attention.query_key_value.bias\n",
"gpt_neox.layers.10.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [-0.0031280517578125, 0.005462646484375, -0.00494384765625]\n",
"b'gpt_neox.layers.10.attention.query_key_value.bias'\n",
"gpt_neox.layers.10.attention.dense.weight -> gpt_neox.layers.10.attention.dense.weight\n",
"gpt_neox.layers.10.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.06494140625, 0.007293701171875, -0.013427734375], [-0.0017852783203125, -0.00909423828125, 0.0247802734375], [-0.0185546875, -0.0185546875, -0.0203857421875]]\n",
"b'gpt_neox.layers.10.attention.dense.weight'\n",
"gpt_neox.layers.10.attention.dense.bias -> gpt_neox.layers.10.attention.dense.bias\n",
"gpt_neox.layers.10.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0223388671875, 0.0211181640625, 0.0238037109375]\n",
"b'gpt_neox.layers.10.attention.dense.bias'\n",
"gpt_neox.layers.10.mlp.dense_h_to_4h.weight -> gpt_neox.layers.10.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.10.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.0107421875, 0.0029754638671875, -0.035400390625], [-0.007049560546875, 0.0037078857421875, -0.0247802734375], [0.008544921875, -0.00494384765625, -0.01385498046875]]\n",
"b'gpt_neox.layers.10.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.10.mlp.dense_h_to_4h.bias -> gpt_neox.layers.10.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.10.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.019775390625, -0.020751953125, -0.004974365234375]\n",
"b'gpt_neox.layers.10.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.10.mlp.dense_4h_to_h.weight -> gpt_neox.layers.10.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.10.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.01220703125, -0.042236328125, -0.0007476806640625], [0.004669189453125, -0.00958251953125, -0.01495361328125], [-0.000705718994140625, 0.03857421875, -0.00141143798828125]]\n",
"b'gpt_neox.layers.10.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.10.mlp.dense_4h_to_h.bias -> gpt_neox.layers.10.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.10.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.00604248046875, 0.02197265625, 0.01953125]\n",
"b'gpt_neox.layers.10.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.11.input_layernorm.weight -> gpt_neox.layers.11.input_layernorm.weight\n",
"gpt_neox.layers.11.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.90234375, 0.9375, 0.90234375]\n",
"b'gpt_neox.layers.11.input_layernorm.weight'\n",
"gpt_neox.layers.11.input_layernorm.bias -> gpt_neox.layers.11.input_layernorm.bias\n",
"gpt_neox.layers.11.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0179443359375, 0.00872802734375, -0.0036163330078125]\n",
"b'gpt_neox.layers.11.input_layernorm.bias'\n",
"gpt_neox.layers.11.post_attention_layernorm.weight -> gpt_neox.layers.11.post_attention_layernorm.weight\n",
"gpt_neox.layers.11.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.0, 0.984375, 0.9609375]\n",
"b'gpt_neox.layers.11.post_attention_layernorm.weight'\n",
"gpt_neox.layers.11.post_attention_layernorm.bias -> gpt_neox.layers.11.post_attention_layernorm.bias\n",
"gpt_neox.layers.11.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.04150390625, -0.0118408203125, 0.0205078125]\n",
"b'gpt_neox.layers.11.post_attention_layernorm.bias'\n",
"gpt_neox.layers.11.attention.query_key_value.weight -> gpt_neox.layers.11.attention.query_key_value.weight\n",
"gpt_neox.layers.11.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.005645751953125, -0.04931640625, -0.048828125], [0.00469970703125, -0.016357421875, 0.016357421875], [-0.0004138946533203125, -0.017578125, 0.02099609375]]\n",
"b'gpt_neox.layers.11.attention.query_key_value.weight'\n",
"gpt_neox.layers.11.attention.query_key_value.bias -> gpt_neox.layers.11.attention.query_key_value.bias\n",
"gpt_neox.layers.11.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [-0.005645751953125, 0.0079345703125, -0.03564453125]\n",
"b'gpt_neox.layers.11.attention.query_key_value.bias'\n",
"gpt_neox.layers.11.attention.dense.weight -> gpt_neox.layers.11.attention.dense.weight\n",
"gpt_neox.layers.11.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.004730224609375, -0.041015625, -0.026611328125], [0.0274658203125, -8.0108642578125e-05, 0.0011138916015625], [0.040283203125, 0.01416015625, 0.003936767578125]]\n",
"b'gpt_neox.layers.11.attention.dense.weight'\n",
"gpt_neox.layers.11.attention.dense.bias -> gpt_neox.layers.11.attention.dense.bias\n",
"gpt_neox.layers.11.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0152587890625, 0.021240234375, 0.0244140625]\n",
"b'gpt_neox.layers.11.attention.dense.bias'\n",
"gpt_neox.layers.11.mlp.dense_h_to_4h.weight -> gpt_neox.layers.11.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.11.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.0128173828125, 0.045166015625, -0.034912109375], [-0.046630859375, -0.03955078125, 0.03857421875], [-0.01239013671875, -0.030029296875, 0.00921630859375]]\n",
"b'gpt_neox.layers.11.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.11.mlp.dense_h_to_4h.bias -> gpt_neox.layers.11.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.11.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.0126953125, -0.01202392578125, 0.004730224609375]\n",
"b'gpt_neox.layers.11.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.11.mlp.dense_4h_to_h.weight -> gpt_neox.layers.11.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.11.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.00347900390625, 0.00860595703125, -0.017822265625], [0.00726318359375, -0.003662109375, 0.031982421875], [0.0191650390625, -0.00408935546875, 0.010986328125]]\n",
"b'gpt_neox.layers.11.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.11.mlp.dense_4h_to_h.bias -> gpt_neox.layers.11.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.11.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.003936767578125, 0.0224609375, 0.022705078125]\n",
"b'gpt_neox.layers.11.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.12.input_layernorm.weight -> gpt_neox.layers.12.input_layernorm.weight\n",
"gpt_neox.layers.12.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.93359375, 0.9296875, 0.921875]\n",
"b'gpt_neox.layers.12.input_layernorm.weight'\n",
"gpt_neox.layers.12.input_layernorm.bias -> gpt_neox.layers.12.input_layernorm.bias\n",
"gpt_neox.layers.12.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0166015625, 0.0191650390625, -0.000736236572265625]\n",
"b'gpt_neox.layers.12.input_layernorm.bias'\n",
"gpt_neox.layers.12.post_attention_layernorm.weight -> gpt_neox.layers.12.post_attention_layernorm.weight\n",
"gpt_neox.layers.12.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.9921875, 0.984375, 0.96484375]\n",
"b'gpt_neox.layers.12.post_attention_layernorm.weight'\n",
"gpt_neox.layers.12.post_attention_layernorm.bias -> gpt_neox.layers.12.post_attention_layernorm.bias\n",
"gpt_neox.layers.12.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.023193359375, 0.0157470703125, 0.016357421875]\n",
"b'gpt_neox.layers.12.post_attention_layernorm.bias'\n",
"gpt_neox.layers.12.attention.query_key_value.weight -> gpt_neox.layers.12.attention.query_key_value.weight\n",
"gpt_neox.layers.12.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.0137939453125, 0.02978515625, 0.036865234375], [0.0005950927734375, 0.022705078125, 0.01263427734375], [0.056884765625, 0.004486083984375, 0.013916015625]]\n",
"b'gpt_neox.layers.12.attention.query_key_value.weight'\n",
"gpt_neox.layers.12.attention.query_key_value.bias -> gpt_neox.layers.12.attention.query_key_value.bias\n",
"gpt_neox.layers.12.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.01068115234375, -0.00872802734375, 0.001434326171875]\n",
"b'gpt_neox.layers.12.attention.query_key_value.bias'\n",
"gpt_neox.layers.12.attention.dense.weight -> gpt_neox.layers.12.attention.dense.weight\n",
"gpt_neox.layers.12.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.031005859375, 0.0205078125, -0.006683349609375], [-0.021240234375, 0.043212890625, -0.030517578125], [0.0008087158203125, -0.0274658203125, 0.0478515625]]\n",
"b'gpt_neox.layers.12.attention.dense.weight'\n",
"gpt_neox.layers.12.attention.dense.bias -> gpt_neox.layers.12.attention.dense.bias\n",
"gpt_neox.layers.12.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.010986328125, 0.01251220703125, 0.0228271484375]\n",
"b'gpt_neox.layers.12.attention.dense.bias'\n",
"gpt_neox.layers.12.mlp.dense_h_to_4h.weight -> gpt_neox.layers.12.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.12.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.033447265625, -0.0079345703125, -0.05615234375], [-0.072265625, -0.04248046875, -0.008544921875], [-0.00732421875, 0.030029296875, 0.01422119140625]]\n",
"b'gpt_neox.layers.12.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.12.mlp.dense_h_to_4h.bias -> gpt_neox.layers.12.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.12.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.00689697265625, -0.0098876953125, -0.010009765625]\n",
"b'gpt_neox.layers.12.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.12.mlp.dense_4h_to_h.weight -> gpt_neox.layers.12.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.12.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.0308837890625, -0.0021514892578125, -0.0341796875], [-0.01409912109375, 0.030517578125, 0.0224609375], [0.0830078125, 0.016357421875, 0.033447265625]]\n",
"b'gpt_neox.layers.12.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.12.mlp.dense_4h_to_h.bias -> gpt_neox.layers.12.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.12.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0025177001953125, 0.0106201171875, 0.0206298828125]\n",
"b'gpt_neox.layers.12.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.13.input_layernorm.weight -> gpt_neox.layers.13.input_layernorm.weight\n",
"gpt_neox.layers.13.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.9453125, 0.92578125, 0.91015625]\n",
"b'gpt_neox.layers.13.input_layernorm.weight'\n",
"gpt_neox.layers.13.input_layernorm.bias -> gpt_neox.layers.13.input_layernorm.bias\n",
"gpt_neox.layers.13.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.015869140625, 0.007415771484375, 0.0078125]\n",
"b'gpt_neox.layers.13.input_layernorm.bias'\n",
"gpt_neox.layers.13.post_attention_layernorm.weight -> gpt_neox.layers.13.post_attention_layernorm.weight\n",
"gpt_neox.layers.13.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.984375, 0.95703125, 0.96484375]\n",
"b'gpt_neox.layers.13.post_attention_layernorm.weight'\n",
"gpt_neox.layers.13.post_attention_layernorm.bias -> gpt_neox.layers.13.post_attention_layernorm.bias\n",
"gpt_neox.layers.13.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0390625, -0.0034637451171875, 0.036376953125]\n",
"b'gpt_neox.layers.13.post_attention_layernorm.bias'\n",
"gpt_neox.layers.13.attention.query_key_value.weight -> gpt_neox.layers.13.attention.query_key_value.weight\n",
"gpt_neox.layers.13.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.009765625, 0.00909423828125, 0.00421142578125], [-0.020263671875, 0.00823974609375, 0.0478515625], [-0.00885009765625, -0.0027313232421875, 0.0213623046875]]\n",
"b'gpt_neox.layers.13.attention.query_key_value.weight'\n",
"gpt_neox.layers.13.attention.query_key_value.bias -> gpt_neox.layers.13.attention.query_key_value.bias\n",
"gpt_neox.layers.13.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.00113677978515625, 0.0255126953125, 0.011474609375]\n",
"b'gpt_neox.layers.13.attention.query_key_value.bias'\n",
"gpt_neox.layers.13.attention.dense.weight -> gpt_neox.layers.13.attention.dense.weight\n",
"gpt_neox.layers.13.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.0322265625, 0.0003452301025390625, 0.0123291015625], [0.023193359375, -0.0162353515625, 0.006500244140625], [0.01446533203125, 0.0040283203125, 0.01361083984375]]\n",
"b'gpt_neox.layers.13.attention.dense.weight'\n",
"gpt_neox.layers.13.attention.dense.bias -> gpt_neox.layers.13.attention.dense.bias\n",
"gpt_neox.layers.13.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.01019287109375, 0.0103759765625, 0.01422119140625]\n",
"b'gpt_neox.layers.13.attention.dense.bias'\n",
"gpt_neox.layers.13.mlp.dense_h_to_4h.weight -> gpt_neox.layers.13.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.13.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.0186767578125, -0.011474609375, -0.04052734375], [-0.04443359375, -0.02734375, 0.028564453125], [0.0703125, -0.007659912109375, -0.048095703125]]\n",
"b'gpt_neox.layers.13.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.13.mlp.dense_h_to_4h.bias -> gpt_neox.layers.13.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.13.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.0107421875, -0.007598876953125, -0.0235595703125]\n",
"b'gpt_neox.layers.13.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.13.mlp.dense_4h_to_h.weight -> gpt_neox.layers.13.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.13.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.027587890625, -0.021728515625, 0.048583984375], [-0.0152587890625, -0.01611328125, -0.0198974609375], [-0.01239013671875, 0.0142822265625, 0.01214599609375]]\n",
"b'gpt_neox.layers.13.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.13.mlp.dense_4h_to_h.bias -> gpt_neox.layers.13.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.13.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.001556396484375, 0.0133056640625, 0.00994873046875]\n",
"b'gpt_neox.layers.13.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.14.input_layernorm.weight -> gpt_neox.layers.14.input_layernorm.weight\n",
"gpt_neox.layers.14.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.9609375, 0.9765625, 0.94921875]\n",
"b'gpt_neox.layers.14.input_layernorm.weight'\n",
"gpt_neox.layers.14.input_layernorm.bias -> gpt_neox.layers.14.input_layernorm.bias\n",
"gpt_neox.layers.14.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.019775390625, 0.0186767578125, 0.0024871826171875]\n",
"b'gpt_neox.layers.14.input_layernorm.bias'\n",
"gpt_neox.layers.14.post_attention_layernorm.weight -> gpt_neox.layers.14.post_attention_layernorm.weight\n",
"gpt_neox.layers.14.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.96484375, 0.99609375, 0.96875]\n",
"b'gpt_neox.layers.14.post_attention_layernorm.weight'\n",
"gpt_neox.layers.14.post_attention_layernorm.bias -> gpt_neox.layers.14.post_attention_layernorm.bias\n",
"gpt_neox.layers.14.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.005859375, 0.027587890625, 0.020263671875]\n",
"b'gpt_neox.layers.14.post_attention_layernorm.bias'\n",
"gpt_neox.layers.14.attention.query_key_value.weight -> gpt_neox.layers.14.attention.query_key_value.weight\n",
"gpt_neox.layers.14.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.00022411346435546875, 0.006256103515625, 0.0238037109375], [0.007232666015625, -0.008544921875, 0.0186767578125], [0.0283203125, 0.01483154296875, 0.057861328125]]\n",
"b'gpt_neox.layers.14.attention.query_key_value.weight'\n",
"gpt_neox.layers.14.attention.query_key_value.bias -> gpt_neox.layers.14.attention.query_key_value.bias\n",
"gpt_neox.layers.14.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.00157928466796875, 0.003875732421875, -0.005950927734375]\n",
"b'gpt_neox.layers.14.attention.query_key_value.bias'\n",
"gpt_neox.layers.14.attention.dense.weight -> gpt_neox.layers.14.attention.dense.weight\n",
"gpt_neox.layers.14.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.013671875, -0.0103759765625, 0.0302734375], [-0.054443359375, -0.00543212890625, -0.043701171875], [0.054931640625, -0.031005859375, -0.03515625]]\n",
"b'gpt_neox.layers.14.attention.dense.weight'\n",
"gpt_neox.layers.14.attention.dense.bias -> gpt_neox.layers.14.attention.dense.bias\n",
"gpt_neox.layers.14.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.01483154296875, 0.00830078125, 0.01495361328125]\n",
"b'gpt_neox.layers.14.attention.dense.bias'\n",
"gpt_neox.layers.14.mlp.dense_h_to_4h.weight -> gpt_neox.layers.14.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.14.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.050537109375, -0.00142669677734375, 0.019775390625], [0.034423828125, -0.03857421875, 0.00897216796875], [0.0625, 0.07177734375, -0.0130615234375]]\n",
"b'gpt_neox.layers.14.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.14.mlp.dense_h_to_4h.bias -> gpt_neox.layers.14.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.14.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.004058837890625, -0.018310546875, -0.005340576171875]\n",
"b'gpt_neox.layers.14.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.14.mlp.dense_4h_to_h.weight -> gpt_neox.layers.14.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.14.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.036865234375, -0.07861328125, -0.047607421875], [-0.0118408203125, 0.060546875, -0.027587890625], [-0.00970458984375, -0.006591796875, 0.02294921875]]\n",
"b'gpt_neox.layers.14.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.14.mlp.dense_4h_to_h.bias -> gpt_neox.layers.14.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.14.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.01275634765625, 0.0062255859375, 0.00994873046875]\n",
"b'gpt_neox.layers.14.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.15.input_layernorm.weight -> gpt_neox.layers.15.input_layernorm.weight\n",
"gpt_neox.layers.15.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.94921875, 0.9765625, 0.95703125]\n",
"b'gpt_neox.layers.15.input_layernorm.weight'\n",
"gpt_neox.layers.15.input_layernorm.bias -> gpt_neox.layers.15.input_layernorm.bias\n",
"gpt_neox.layers.15.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.032470703125, 0.01611328125, -0.00616455078125]\n",
"b'gpt_neox.layers.15.input_layernorm.bias'\n",
"gpt_neox.layers.15.post_attention_layernorm.weight -> gpt_neox.layers.15.post_attention_layernorm.weight\n",
"gpt_neox.layers.15.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.984375, 0.98828125, 0.9765625]\n",
"b'gpt_neox.layers.15.post_attention_layernorm.weight'\n",
"gpt_neox.layers.15.post_attention_layernorm.bias -> gpt_neox.layers.15.post_attention_layernorm.bias\n",
"gpt_neox.layers.15.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.00836181640625, -0.0069580078125, 0.0115966796875]\n",
"b'gpt_neox.layers.15.post_attention_layernorm.bias'\n",
"gpt_neox.layers.15.attention.query_key_value.weight -> gpt_neox.layers.15.attention.query_key_value.weight\n",
"gpt_neox.layers.15.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.0172119140625, 0.000759124755859375, 0.07421875], [0.06787109375, -0.046630859375, 0.068359375], [-0.0419921875, -0.001953125, -0.0181884765625]]\n",
"b'gpt_neox.layers.15.attention.query_key_value.weight'\n",
"gpt_neox.layers.15.attention.query_key_value.bias -> gpt_neox.layers.15.attention.query_key_value.bias\n",
"gpt_neox.layers.15.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.0027008056640625, 0.000568389892578125, -0.0007171630859375]\n",
"b'gpt_neox.layers.15.attention.query_key_value.bias'\n",
"gpt_neox.layers.15.attention.dense.weight -> gpt_neox.layers.15.attention.dense.weight\n",
"gpt_neox.layers.15.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.0186767578125, -0.0189208984375, -0.0093994140625], [-0.00531005859375, -0.03662109375, 0.01611328125], [-0.037353515625, 0.01214599609375, -0.00946044921875]]\n",
"b'gpt_neox.layers.15.attention.dense.weight'\n",
"gpt_neox.layers.15.attention.dense.bias -> gpt_neox.layers.15.attention.dense.bias\n",
"gpt_neox.layers.15.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0250244140625, 0.00141143798828125, 0.0172119140625]\n",
"b'gpt_neox.layers.15.attention.dense.bias'\n",
"gpt_neox.layers.15.mlp.dense_h_to_4h.weight -> gpt_neox.layers.15.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.15.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.0281982421875, 0.031494140625, 0.049560546875], [0.08251953125, -0.047119140625, -0.03662109375], [0.06884765625, -0.036376953125, -0.0208740234375]]\n",
"b'gpt_neox.layers.15.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.15.mlp.dense_h_to_4h.bias -> gpt_neox.layers.15.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.15.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.00897216796875, -0.0030975341796875, 0.003875732421875]\n",
"b'gpt_neox.layers.15.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.15.mlp.dense_4h_to_h.weight -> gpt_neox.layers.15.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.15.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.00537109375, 0.0201416015625, -0.03369140625], [-0.02294921875, -0.0031280517578125, 0.0036773681640625], [-0.06689453125, 0.0208740234375, 0.0068359375]]\n",
"b'gpt_neox.layers.15.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.15.mlp.dense_4h_to_h.bias -> gpt_neox.layers.15.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.15.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.019775390625, 0.00408935546875, 0.012451171875]\n",
"b'gpt_neox.layers.15.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.16.input_layernorm.weight -> gpt_neox.layers.16.input_layernorm.weight\n",
"gpt_neox.layers.16.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.9296875, 0.9765625, 0.94140625]\n",
"b'gpt_neox.layers.16.input_layernorm.weight'\n",
"gpt_neox.layers.16.input_layernorm.bias -> gpt_neox.layers.16.input_layernorm.bias\n",
"gpt_neox.layers.16.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0201416015625, 0.01434326171875, 0.018310546875]\n",
"b'gpt_neox.layers.16.input_layernorm.bias'\n",
"gpt_neox.layers.16.post_attention_layernorm.weight -> gpt_neox.layers.16.post_attention_layernorm.weight\n",
"gpt_neox.layers.16.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.9765625, 0.984375, 0.9609375]\n",
"b'gpt_neox.layers.16.post_attention_layernorm.weight'\n",
"gpt_neox.layers.16.post_attention_layernorm.bias -> gpt_neox.layers.16.post_attention_layernorm.bias\n",
"gpt_neox.layers.16.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0301513671875, -0.0208740234375, 0.0296630859375]\n",
"b'gpt_neox.layers.16.post_attention_layernorm.bias'\n",
"gpt_neox.layers.16.attention.query_key_value.weight -> gpt_neox.layers.16.attention.query_key_value.weight\n",
"gpt_neox.layers.16.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.010009765625, 0.03369140625, 0.0185546875], [-0.0086669921875, 0.0140380859375, -0.0120849609375], [-0.0250244140625, 0.04150390625, 0.00151824951171875]]\n",
"b'gpt_neox.layers.16.attention.query_key_value.weight'\n",
"gpt_neox.layers.16.attention.query_key_value.bias -> gpt_neox.layers.16.attention.query_key_value.bias\n",
"gpt_neox.layers.16.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.014404296875, 0.004241943359375, 0.01611328125]\n",
"b'gpt_neox.layers.16.attention.query_key_value.bias'\n",
"gpt_neox.layers.16.attention.dense.weight -> gpt_neox.layers.16.attention.dense.weight\n",
"gpt_neox.layers.16.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.0234375, -0.00872802734375, 0.0057373046875], [0.033203125, 0.04833984375, -0.03955078125], [0.02734375, 0.050048828125, -0.00640869140625]]\n",
"b'gpt_neox.layers.16.attention.dense.weight'\n",
"gpt_neox.layers.16.attention.dense.bias -> gpt_neox.layers.16.attention.dense.bias\n",
"gpt_neox.layers.16.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0264892578125, 0.000274658203125, 0.0076904296875]\n",
"b'gpt_neox.layers.16.attention.dense.bias'\n",
"gpt_neox.layers.16.mlp.dense_h_to_4h.weight -> gpt_neox.layers.16.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.16.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.053955078125, -0.00640869140625, -0.02783203125], [0.028564453125, 0.0181884765625, 0.027587890625], [0.047119140625, -0.028076171875, -0.001983642578125]]\n",
"b'gpt_neox.layers.16.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.16.mlp.dense_h_to_4h.bias -> gpt_neox.layers.16.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.16.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.0009002685546875, -0.00080108642578125, 0.0067138671875]\n",
"b'gpt_neox.layers.16.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.16.mlp.dense_4h_to_h.weight -> gpt_neox.layers.16.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.16.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.06298828125, -0.02099609375, -0.000225067138671875], [-0.0478515625, 0.0125732421875, 0.0019989013671875], [-0.056884765625, -0.01275634765625, -0.01904296875]]\n",
"b'gpt_neox.layers.16.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.16.mlp.dense_4h_to_h.bias -> gpt_neox.layers.16.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.16.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0179443359375, 0.004119873046875, 0.00194549560546875]\n",
"b'gpt_neox.layers.16.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.17.input_layernorm.weight -> gpt_neox.layers.17.input_layernorm.weight\n",
"gpt_neox.layers.17.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.96875, 1.0078125, 0.9609375]\n",
"b'gpt_neox.layers.17.input_layernorm.weight'\n",
"gpt_neox.layers.17.input_layernorm.bias -> gpt_neox.layers.17.input_layernorm.bias\n",
"gpt_neox.layers.17.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0225830078125, 0.01116943359375, 0.0035858154296875]\n",
"b'gpt_neox.layers.17.input_layernorm.bias'\n",
"gpt_neox.layers.17.post_attention_layernorm.weight -> gpt_neox.layers.17.post_attention_layernorm.weight\n",
"gpt_neox.layers.17.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.98046875, 0.96875, 0.9609375]\n",
"b'gpt_neox.layers.17.post_attention_layernorm.weight'\n",
"gpt_neox.layers.17.post_attention_layernorm.bias -> gpt_neox.layers.17.post_attention_layernorm.bias\n",
"gpt_neox.layers.17.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0439453125, -0.029296875, 0.0120849609375]\n",
"b'gpt_neox.layers.17.post_attention_layernorm.bias'\n",
"gpt_neox.layers.17.attention.query_key_value.weight -> gpt_neox.layers.17.attention.query_key_value.weight\n",
"gpt_neox.layers.17.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.0225830078125, -0.021484375, -0.0035400390625], [-0.0017242431640625, -0.009033203125, -0.00537109375], [0.0296630859375, -0.0067138671875, -0.00634765625]]\n",
"b'gpt_neox.layers.17.attention.query_key_value.weight'\n",
"gpt_neox.layers.17.attention.query_key_value.bias -> gpt_neox.layers.17.attention.query_key_value.bias\n",
"gpt_neox.layers.17.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.0029449462890625, 0.00518798828125, -0.00150299072265625]\n",
"b'gpt_neox.layers.17.attention.query_key_value.bias'\n",
"gpt_neox.layers.17.attention.dense.weight -> gpt_neox.layers.17.attention.dense.weight\n",
"gpt_neox.layers.17.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.042724609375, -0.01324462890625, 0.00144195556640625], [-0.02197265625, -0.006622314453125, 0.029052734375], [-0.0177001953125, -0.0279541015625, 0.012451171875]]\n",
"b'gpt_neox.layers.17.attention.dense.weight'\n",
"gpt_neox.layers.17.attention.dense.bias -> gpt_neox.layers.17.attention.dense.bias\n",
"gpt_neox.layers.17.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0303955078125, 0.004119873046875, 0.0108642578125]\n",
"b'gpt_neox.layers.17.attention.dense.bias'\n",
"gpt_neox.layers.17.mlp.dense_h_to_4h.weight -> gpt_neox.layers.17.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.17.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.01324462890625, 0.005035400390625, -0.006561279296875], [-0.0162353515625, -0.052001953125, 0.00811767578125], [0.00897216796875, 0.06396484375, 0.0133056640625]]\n",
"b'gpt_neox.layers.17.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.17.mlp.dense_h_to_4h.bias -> gpt_neox.layers.17.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.17.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.0103759765625, -0.00933837890625, -0.00885009765625]\n",
"b'gpt_neox.layers.17.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.17.mlp.dense_4h_to_h.weight -> gpt_neox.layers.17.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.17.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.01043701171875, 0.00775146484375, 0.04638671875], [-0.0125732421875, 0.034912109375, 0.0751953125], [-0.0228271484375, 0.0150146484375, 0.00182342529296875]]\n",
"b'gpt_neox.layers.17.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.17.mlp.dense_4h_to_h.bias -> gpt_neox.layers.17.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.17.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.020263671875, 0.00732421875, 0.00933837890625]\n",
"b'gpt_neox.layers.17.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.18.input_layernorm.weight -> gpt_neox.layers.18.input_layernorm.weight\n",
"gpt_neox.layers.18.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.98828125, 1.03125, 0.97265625]\n",
"b'gpt_neox.layers.18.input_layernorm.weight'\n",
"gpt_neox.layers.18.input_layernorm.bias -> gpt_neox.layers.18.input_layernorm.bias\n",
"gpt_neox.layers.18.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0400390625, 0.0286865234375, 0.0194091796875]\n",
"b'gpt_neox.layers.18.input_layernorm.bias'\n",
"gpt_neox.layers.18.post_attention_layernorm.weight -> gpt_neox.layers.18.post_attention_layernorm.weight\n",
"gpt_neox.layers.18.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.99609375, 0.9609375, 0.96484375]\n",
"b'gpt_neox.layers.18.post_attention_layernorm.weight'\n",
"gpt_neox.layers.18.post_attention_layernorm.bias -> gpt_neox.layers.18.post_attention_layernorm.bias\n",
"gpt_neox.layers.18.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.01397705078125, -0.026123046875, 0.0211181640625]\n",
"b'gpt_neox.layers.18.post_attention_layernorm.bias'\n",
"gpt_neox.layers.18.attention.query_key_value.weight -> gpt_neox.layers.18.attention.query_key_value.weight\n",
"gpt_neox.layers.18.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.01123046875, 0.00616455078125, 0.04931640625], [0.005401611328125, 0.020263671875, -0.01953125], [-0.0213623046875, 0.005767822265625, 0.019775390625]]\n",
"b'gpt_neox.layers.18.attention.query_key_value.weight'\n",
"gpt_neox.layers.18.attention.query_key_value.bias -> gpt_neox.layers.18.attention.query_key_value.bias\n",
"gpt_neox.layers.18.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [-0.005706787109375, 0.00193023681640625, 0.005950927734375]\n",
"b'gpt_neox.layers.18.attention.query_key_value.bias'\n",
"gpt_neox.layers.18.attention.dense.weight -> gpt_neox.layers.18.attention.dense.weight\n",
"gpt_neox.layers.18.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.0196533203125, 0.0322265625, -0.01165771484375], [-0.00787353515625, 0.030517578125, -0.007781982421875], [0.0020599365234375, 0.042236328125, -0.044921875]]\n",
"b'gpt_neox.layers.18.attention.dense.weight'\n",
"gpt_neox.layers.18.attention.dense.bias -> gpt_neox.layers.18.attention.dense.bias\n",
"gpt_neox.layers.18.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.036376953125, -0.00025177001953125, 0.0076904296875]\n",
"b'gpt_neox.layers.18.attention.dense.bias'\n",
"gpt_neox.layers.18.mlp.dense_h_to_4h.weight -> gpt_neox.layers.18.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.18.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.06884765625, 0.0634765625, 0.01953125], [-0.0177001953125, 0.003204345703125, 0.0238037109375], [0.019775390625, -0.01055908203125, -0.047119140625]]\n",
"b'gpt_neox.layers.18.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.18.mlp.dense_h_to_4h.bias -> gpt_neox.layers.18.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.18.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.00183868408203125, -0.0054931640625, -0.00156402587890625]\n",
"b'gpt_neox.layers.18.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.18.mlp.dense_4h_to_h.weight -> gpt_neox.layers.18.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.18.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.0115966796875, 0.04052734375, 0.0064697265625], [-0.0546875, 0.020751953125, -0.01190185546875], [0.025390625, -0.0264892578125, -0.048583984375]]\n",
"b'gpt_neox.layers.18.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.18.mlp.dense_4h_to_h.bias -> gpt_neox.layers.18.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.18.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.03173828125, 0.003662109375, 0.00384521484375]\n",
"b'gpt_neox.layers.18.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.19.input_layernorm.weight -> gpt_neox.layers.19.input_layernorm.weight\n",
"gpt_neox.layers.19.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.96875, 1.0234375, 1.0]\n",
"b'gpt_neox.layers.19.input_layernorm.weight'\n",
"gpt_neox.layers.19.input_layernorm.bias -> gpt_neox.layers.19.input_layernorm.bias\n",
"gpt_neox.layers.19.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.01275634765625, 0.02197265625, -0.006317138671875]\n",
"b'gpt_neox.layers.19.input_layernorm.bias'\n",
"gpt_neox.layers.19.post_attention_layernorm.weight -> gpt_neox.layers.19.post_attention_layernorm.weight\n",
"gpt_neox.layers.19.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.0, 0.98828125, 0.9765625]\n",
"b'gpt_neox.layers.19.post_attention_layernorm.weight'\n",
"gpt_neox.layers.19.post_attention_layernorm.bias -> gpt_neox.layers.19.post_attention_layernorm.bias\n",
"gpt_neox.layers.19.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0166015625, -0.0230712890625, 0.0027008056640625]\n",
"b'gpt_neox.layers.19.post_attention_layernorm.bias'\n",
"gpt_neox.layers.19.attention.query_key_value.weight -> gpt_neox.layers.19.attention.query_key_value.weight\n",
"gpt_neox.layers.19.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.020751953125, -0.01397705078125, 0.02978515625], [0.0291748046875, -0.005645751953125, -0.0064697265625], [-0.019287109375, 0.04541015625, -0.03662109375]]\n",
"b'gpt_neox.layers.19.attention.query_key_value.weight'\n",
"gpt_neox.layers.19.attention.query_key_value.bias -> gpt_neox.layers.19.attention.query_key_value.bias\n",
"gpt_neox.layers.19.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.00144195556640625, -0.00086212158203125, -0.00604248046875]\n",
"b'gpt_neox.layers.19.attention.query_key_value.bias'\n",
"gpt_neox.layers.19.attention.dense.weight -> gpt_neox.layers.19.attention.dense.weight\n",
"gpt_neox.layers.19.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.007110595703125, -0.03564453125, 0.05859375], [0.0218505859375, -0.00159454345703125, 0.01239013671875], [-0.044921875, -0.0517578125, -0.0133056640625]]\n",
"b'gpt_neox.layers.19.attention.dense.weight'\n",
"gpt_neox.layers.19.attention.dense.bias -> gpt_neox.layers.19.attention.dense.bias\n",
"gpt_neox.layers.19.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.034912109375, -0.00127410888671875, 0.0086669921875]\n",
"b'gpt_neox.layers.19.attention.dense.bias'\n",
"gpt_neox.layers.19.mlp.dense_h_to_4h.weight -> gpt_neox.layers.19.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.19.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.030517578125, 0.0220947265625, -0.0033111572265625], [-0.043212890625, -0.008544921875, 0.0164794921875], [0.05078125, 0.0128173828125, 0.050048828125]]\n",
"b'gpt_neox.layers.19.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.19.mlp.dense_h_to_4h.bias -> gpt_neox.layers.19.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.19.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [0.003021240234375, -0.01348876953125, -0.02587890625]\n",
"b'gpt_neox.layers.19.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.19.mlp.dense_4h_to_h.weight -> gpt_neox.layers.19.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.19.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.0185546875, 0.0269775390625, 0.028564453125], [-0.0269775390625, 0.0018463134765625, 0.007476806640625], [0.0277099609375, -0.01446533203125, 0.034912109375]]\n",
"b'gpt_neox.layers.19.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.19.mlp.dense_4h_to_h.bias -> gpt_neox.layers.19.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.19.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.033935546875, 0.005279541015625, 0.006317138671875]\n",
"b'gpt_neox.layers.19.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.20.input_layernorm.weight -> gpt_neox.layers.20.input_layernorm.weight\n",
"gpt_neox.layers.20.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.97265625, 1.0390625, 1.015625]\n",
"b'gpt_neox.layers.20.input_layernorm.weight'\n",
"gpt_neox.layers.20.input_layernorm.bias -> gpt_neox.layers.20.input_layernorm.bias\n",
"gpt_neox.layers.20.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.005126953125, 0.031982421875, 0.0203857421875]\n",
"b'gpt_neox.layers.20.input_layernorm.bias'\n",
"gpt_neox.layers.20.post_attention_layernorm.weight -> gpt_neox.layers.20.post_attention_layernorm.weight\n",
"gpt_neox.layers.20.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.0078125, 0.98046875, 0.97265625]\n",
"b'gpt_neox.layers.20.post_attention_layernorm.weight'\n",
"gpt_neox.layers.20.post_attention_layernorm.bias -> gpt_neox.layers.20.post_attention_layernorm.bias\n",
"gpt_neox.layers.20.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.058349609375, -0.003326416015625, 0.006591796875]\n",
"b'gpt_neox.layers.20.post_attention_layernorm.bias'\n",
"gpt_neox.layers.20.attention.query_key_value.weight -> gpt_neox.layers.20.attention.query_key_value.weight\n",
"gpt_neox.layers.20.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.0185546875, -0.0126953125, -0.03515625], [0.0031280517578125, -0.004669189453125, -0.0145263671875], [0.003173828125, -0.0166015625, -0.01904296875]]\n",
"b'gpt_neox.layers.20.attention.query_key_value.weight'\n",
"gpt_neox.layers.20.attention.query_key_value.bias -> gpt_neox.layers.20.attention.query_key_value.bias\n",
"gpt_neox.layers.20.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.0034332275390625, -0.00994873046875, -0.004364013671875]\n",
"b'gpt_neox.layers.20.attention.query_key_value.bias'\n",
"gpt_neox.layers.20.attention.dense.weight -> gpt_neox.layers.20.attention.dense.weight\n",
"gpt_neox.layers.20.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.036376953125, -0.00250244140625, 0.0038909912109375], [-0.00830078125, 0.00677490234375, 0.03759765625], [0.021484375, -0.0216064453125, -0.0400390625]]\n",
"b'gpt_neox.layers.20.attention.dense.weight'\n",
"gpt_neox.layers.20.attention.dense.bias -> gpt_neox.layers.20.attention.dense.bias\n",
"gpt_neox.layers.20.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0341796875, 0.00567626953125, 0.00970458984375]\n",
"b'gpt_neox.layers.20.attention.dense.bias'\n",
"gpt_neox.layers.20.mlp.dense_h_to_4h.weight -> gpt_neox.layers.20.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.20.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.0146484375, -0.00421142578125, 0.010498046875], [0.06103515625, 0.028564453125, -0.04150390625], [-0.0274658203125, -0.01141357421875, 0.01318359375]]\n",
"b'gpt_neox.layers.20.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.20.mlp.dense_h_to_4h.bias -> gpt_neox.layers.20.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.20.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.0036468505859375, -0.001556396484375, -0.0107421875]\n",
"b'gpt_neox.layers.20.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.20.mlp.dense_4h_to_h.weight -> gpt_neox.layers.20.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.20.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.01080322265625, -0.00433349609375, -0.0186767578125], [0.0057373046875, 0.046630859375, 0.00927734375], [0.023681640625, 0.030029296875, 0.02880859375]]\n",
"b'gpt_neox.layers.20.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.20.mlp.dense_4h_to_h.bias -> gpt_neox.layers.20.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.20.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0196533203125, 0.0029449462890625, 0.00799560546875]\n",
"b'gpt_neox.layers.20.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.21.input_layernorm.weight -> gpt_neox.layers.21.input_layernorm.weight\n",
"gpt_neox.layers.21.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.0625, 1.0703125, 1.0546875]\n",
"b'gpt_neox.layers.21.input_layernorm.weight'\n",
"gpt_neox.layers.21.input_layernorm.bias -> gpt_neox.layers.21.input_layernorm.bias\n",
"gpt_neox.layers.21.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.00031280517578125, 0.0281982421875, 0.01324462890625]\n",
"b'gpt_neox.layers.21.input_layernorm.bias'\n",
"gpt_neox.layers.21.post_attention_layernorm.weight -> gpt_neox.layers.21.post_attention_layernorm.weight\n",
"gpt_neox.layers.21.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.0, 1.0, 0.9921875]\n",
"b'gpt_neox.layers.21.post_attention_layernorm.weight'\n",
"gpt_neox.layers.21.post_attention_layernorm.bias -> gpt_neox.layers.21.post_attention_layernorm.bias\n",
"gpt_neox.layers.21.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.01708984375, -0.021240234375, 0.010498046875]\n",
"b'gpt_neox.layers.21.post_attention_layernorm.bias'\n",
"gpt_neox.layers.21.attention.query_key_value.weight -> gpt_neox.layers.21.attention.query_key_value.weight\n",
"gpt_neox.layers.21.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.005035400390625, 0.001068115234375, -0.0010528564453125], [-0.021240234375, 0.0303955078125, -0.045654296875], [0.03857421875, -0.001922607421875, -0.0020904541015625]]\n",
"b'gpt_neox.layers.21.attention.query_key_value.weight'\n",
"gpt_neox.layers.21.attention.query_key_value.bias -> gpt_neox.layers.21.attention.query_key_value.bias\n",
"gpt_neox.layers.21.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [-0.00089263916015625, 0.0054931640625, 0.003448486328125]\n",
"b'gpt_neox.layers.21.attention.query_key_value.bias'\n",
"gpt_neox.layers.21.attention.dense.weight -> gpt_neox.layers.21.attention.dense.weight\n",
"gpt_neox.layers.21.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.03955078125, -0.0184326171875, 0.0021820068359375], [0.0016937255859375, 0.0177001953125, 0.005218505859375], [0.0186767578125, 0.01251220703125, 0.035400390625]]\n",
"b'gpt_neox.layers.21.attention.dense.weight'\n",
"gpt_neox.layers.21.attention.dense.bias -> gpt_neox.layers.21.attention.dense.bias\n",
"gpt_neox.layers.21.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0277099609375, 0.004302978515625, 0.0162353515625]\n",
"b'gpt_neox.layers.21.attention.dense.bias'\n",
"gpt_neox.layers.21.mlp.dense_h_to_4h.weight -> gpt_neox.layers.21.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.21.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.06396484375, 0.05712890625, -0.07470703125], [0.006439208984375, 0.00689697265625, -0.0225830078125], [-0.019287109375, -0.008056640625, -0.036376953125]]\n",
"b'gpt_neox.layers.21.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.21.mlp.dense_h_to_4h.bias -> gpt_neox.layers.21.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.21.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.01116943359375, -0.0067138671875, -0.0106201171875]\n",
"b'gpt_neox.layers.21.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.21.mlp.dense_4h_to_h.weight -> gpt_neox.layers.21.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.21.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.05126953125, -0.0174560546875, 0.0067138671875], [0.034423828125, -0.025390625, 0.0035552978515625], [-0.0281982421875, 0.037109375, -0.0206298828125]]\n",
"b'gpt_neox.layers.21.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.21.mlp.dense_4h_to_h.bias -> gpt_neox.layers.21.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.21.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.021484375, 0.005096435546875, 0.01422119140625]\n",
"b'gpt_neox.layers.21.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.22.input_layernorm.weight -> gpt_neox.layers.22.input_layernorm.weight\n",
"gpt_neox.layers.22.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.9765625, 1.015625, 1.0234375]\n",
"b'gpt_neox.layers.22.input_layernorm.weight'\n",
"gpt_neox.layers.22.input_layernorm.bias -> gpt_neox.layers.22.input_layernorm.bias\n",
"gpt_neox.layers.22.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0106201171875, 0.034423828125, 0.0166015625]\n",
"b'gpt_neox.layers.22.input_layernorm.bias'\n",
"gpt_neox.layers.22.post_attention_layernorm.weight -> gpt_neox.layers.22.post_attention_layernorm.weight\n",
"gpt_neox.layers.22.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.015625, 1.0234375, 1.015625]\n",
"b'gpt_neox.layers.22.post_attention_layernorm.weight'\n",
"gpt_neox.layers.22.post_attention_layernorm.bias -> gpt_neox.layers.22.post_attention_layernorm.bias\n",
"gpt_neox.layers.22.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0220947265625, -0.0201416015625, 0.028076171875]\n",
"b'gpt_neox.layers.22.post_attention_layernorm.bias'\n",
"gpt_neox.layers.22.attention.query_key_value.weight -> gpt_neox.layers.22.attention.query_key_value.weight\n",
"gpt_neox.layers.22.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.03564453125, 0.0294189453125, -0.040283203125], [0.01336669921875, -0.00677490234375, 0.01483154296875], [0.0216064453125, 0.00360107421875, 0.037841796875]]\n",
"b'gpt_neox.layers.22.attention.query_key_value.weight'\n",
"gpt_neox.layers.22.attention.query_key_value.bias -> gpt_neox.layers.22.attention.query_key_value.bias\n",
"gpt_neox.layers.22.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.0026702880859375, -0.0091552734375, -0.00531005859375]\n",
"b'gpt_neox.layers.22.attention.query_key_value.bias'\n",
"gpt_neox.layers.22.attention.dense.weight -> gpt_neox.layers.22.attention.dense.weight\n",
"gpt_neox.layers.22.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.036376953125, 0.036865234375, 0.0194091796875], [-0.006927490234375, -0.0177001953125, -0.06396484375], [0.0103759765625, 0.005340576171875, -0.052734375]]\n",
"b'gpt_neox.layers.22.attention.dense.weight'\n",
"gpt_neox.layers.22.attention.dense.bias -> gpt_neox.layers.22.attention.dense.bias\n",
"gpt_neox.layers.22.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0296630859375, 0.003997802734375, 0.017822265625]\n",
"b'gpt_neox.layers.22.attention.dense.bias'\n",
"gpt_neox.layers.22.mlp.dense_h_to_4h.weight -> gpt_neox.layers.22.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.22.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.044189453125, 0.0194091796875, -0.00860595703125], [-0.02294921875, -0.0252685546875, -0.08349609375], [-0.04150390625, -0.034912109375, 0.01397705078125]]\n",
"b'gpt_neox.layers.22.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.22.mlp.dense_h_to_4h.bias -> gpt_neox.layers.22.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.22.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.00726318359375, -0.001953125, -0.0048828125]\n",
"b'gpt_neox.layers.22.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.22.mlp.dense_4h_to_h.weight -> gpt_neox.layers.22.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.22.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.06689453125, 0.025146484375, 0.00014591217041015625], [-0.0140380859375, 0.0206298828125, -0.010986328125], [-0.0439453125, -0.0017242431640625, -0.003509521484375]]\n",
"b'gpt_neox.layers.22.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.22.mlp.dense_4h_to_h.bias -> gpt_neox.layers.22.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.22.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.020751953125, 0.0027923583984375, 0.0087890625]\n",
"b'gpt_neox.layers.22.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.23.input_layernorm.weight -> gpt_neox.layers.23.input_layernorm.weight\n",
"gpt_neox.layers.23.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [0.9921875, 1.0390625, 1.015625]\n",
"b'gpt_neox.layers.23.input_layernorm.weight'\n",
"gpt_neox.layers.23.input_layernorm.bias -> gpt_neox.layers.23.input_layernorm.bias\n",
"gpt_neox.layers.23.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0087890625, 0.0189208984375, 0.01300048828125]\n",
"b'gpt_neox.layers.23.input_layernorm.bias'\n",
"gpt_neox.layers.23.post_attention_layernorm.weight -> gpt_neox.layers.23.post_attention_layernorm.weight\n",
"gpt_neox.layers.23.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.0234375, 1.03125, 1.0]\n",
"b'gpt_neox.layers.23.post_attention_layernorm.weight'\n",
"gpt_neox.layers.23.post_attention_layernorm.bias -> gpt_neox.layers.23.post_attention_layernorm.bias\n",
"gpt_neox.layers.23.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.040771484375, -0.0189208984375, 0.03271484375]\n",
"b'gpt_neox.layers.23.post_attention_layernorm.bias'\n",
"gpt_neox.layers.23.attention.query_key_value.weight -> gpt_neox.layers.23.attention.query_key_value.weight\n",
"gpt_neox.layers.23.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.0250244140625, 0.00677490234375, -0.033935546875], [-0.015869140625, -0.07177734375, -0.0203857421875], [0.03271484375, 0.04345703125, -0.0025634765625]]\n",
"b'gpt_neox.layers.23.attention.query_key_value.weight'\n",
"gpt_neox.layers.23.attention.query_key_value.bias -> gpt_neox.layers.23.attention.query_key_value.bias\n",
"gpt_neox.layers.23.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.004608154296875, 0.0025177001953125, 0.021484375]\n",
"b'gpt_neox.layers.23.attention.query_key_value.bias'\n",
"gpt_neox.layers.23.attention.dense.weight -> gpt_neox.layers.23.attention.dense.weight\n",
"gpt_neox.layers.23.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.00144195556640625, 0.049560546875, 0.0194091796875], [-0.00408935546875, 0.0211181640625, -0.0303955078125], [-0.02197265625, -0.045654296875, 0.037353515625]]\n",
"b'gpt_neox.layers.23.attention.dense.weight'\n",
"gpt_neox.layers.23.attention.dense.bias -> gpt_neox.layers.23.attention.dense.bias\n",
"gpt_neox.layers.23.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.031005859375, 0.006195068359375, 0.0164794921875]\n",
"b'gpt_neox.layers.23.attention.dense.bias'\n",
"gpt_neox.layers.23.mlp.dense_h_to_4h.weight -> gpt_neox.layers.23.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.23.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.01904296875, -0.009033203125, 0.0223388671875], [-0.007476806640625, -0.004180908203125, -0.037109375], [-0.01336669921875, 0.00186920166015625, 0.058349609375]]\n",
"b'gpt_neox.layers.23.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.23.mlp.dense_h_to_4h.bias -> gpt_neox.layers.23.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.23.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.011962890625, -0.0145263671875, -0.01220703125]\n",
"b'gpt_neox.layers.23.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.23.mlp.dense_4h_to_h.weight -> gpt_neox.layers.23.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.23.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.05322265625, 0.025634765625, -0.023193359375], [0.00787353515625, -0.0167236328125, -0.002532958984375], [-0.03271484375, 0.037841796875, 0.033935546875]]\n",
"b'gpt_neox.layers.23.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.23.mlp.dense_4h_to_h.bias -> gpt_neox.layers.23.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.23.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.018798828125, 0.0062255859375, 0.00555419921875]\n",
"b'gpt_neox.layers.23.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.24.input_layernorm.weight -> gpt_neox.layers.24.input_layernorm.weight\n",
"gpt_neox.layers.24.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.03125, 1.0703125, 1.046875]\n",
"b'gpt_neox.layers.24.input_layernorm.weight'\n",
"gpt_neox.layers.24.input_layernorm.bias -> gpt_neox.layers.24.input_layernorm.bias\n",
"gpt_neox.layers.24.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.01385498046875, 0.0284423828125, 0.03564453125]\n",
"b'gpt_neox.layers.24.input_layernorm.bias'\n",
"gpt_neox.layers.24.post_attention_layernorm.weight -> gpt_neox.layers.24.post_attention_layernorm.weight\n",
"gpt_neox.layers.24.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.03125, 1.0546875, 1.015625]\n",
"b'gpt_neox.layers.24.post_attention_layernorm.weight'\n",
"gpt_neox.layers.24.post_attention_layernorm.bias -> gpt_neox.layers.24.post_attention_layernorm.bias\n",
"gpt_neox.layers.24.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.059814453125, -0.02490234375, 0.035400390625]\n",
"b'gpt_neox.layers.24.post_attention_layernorm.bias'\n",
"gpt_neox.layers.24.attention.query_key_value.weight -> gpt_neox.layers.24.attention.query_key_value.weight\n",
"gpt_neox.layers.24.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.0162353515625, 0.0220947265625, -0.05859375], [-0.0179443359375, -0.0242919921875, 0.0244140625], [0.01953125, 0.00732421875, -0.03173828125]]\n",
"b'gpt_neox.layers.24.attention.query_key_value.weight'\n",
"gpt_neox.layers.24.attention.query_key_value.bias -> gpt_neox.layers.24.attention.query_key_value.bias\n",
"gpt_neox.layers.24.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [-0.0137939453125, -0.001312255859375, -0.0030517578125]\n",
"b'gpt_neox.layers.24.attention.query_key_value.bias'\n",
"gpt_neox.layers.24.attention.dense.weight -> gpt_neox.layers.24.attention.dense.weight\n",
"gpt_neox.layers.24.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.01513671875, -0.04248046875, 0.08251953125], [0.00165557861328125, 0.004119873046875, 0.034912109375], [0.0196533203125, 0.038818359375, 0.00689697265625]]\n",
"b'gpt_neox.layers.24.attention.dense.weight'\n",
"gpt_neox.layers.24.attention.dense.bias -> gpt_neox.layers.24.attention.dense.bias\n",
"gpt_neox.layers.24.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0308837890625, 0.00189208984375, 0.010498046875]\n",
"b'gpt_neox.layers.24.attention.dense.bias'\n",
"gpt_neox.layers.24.mlp.dense_h_to_4h.weight -> gpt_neox.layers.24.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.24.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.046875, 0.031982421875, -0.00762939453125], [0.060791015625, -0.03857421875, 0.04638671875], [0.028564453125, 0.0186767578125, -0.0245361328125]]\n",
"b'gpt_neox.layers.24.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.24.mlp.dense_h_to_4h.bias -> gpt_neox.layers.24.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.24.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.012451171875, -0.01202392578125, -0.0162353515625]\n",
"b'gpt_neox.layers.24.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.24.mlp.dense_4h_to_h.weight -> gpt_neox.layers.24.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.24.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.054931640625, -0.032958984375, -0.0283203125], [0.038330078125, -0.01275634765625, -0.008056640625], [-0.048828125, -0.0302734375, 0.06640625]]\n",
"b'gpt_neox.layers.24.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.24.mlp.dense_4h_to_h.bias -> gpt_neox.layers.24.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.24.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.00775146484375, 0.0021820068359375, -0.00408935546875]\n",
"b'gpt_neox.layers.24.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.25.input_layernorm.weight -> gpt_neox.layers.25.input_layernorm.weight\n",
"gpt_neox.layers.25.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.03125, 1.0703125, 1.0546875]\n",
"b'gpt_neox.layers.25.input_layernorm.weight'\n",
"gpt_neox.layers.25.input_layernorm.bias -> gpt_neox.layers.25.input_layernorm.bias\n",
"gpt_neox.layers.25.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.002777099609375, 0.0341796875, 0.0185546875]\n",
"b'gpt_neox.layers.25.input_layernorm.bias'\n",
"gpt_neox.layers.25.post_attention_layernorm.weight -> gpt_neox.layers.25.post_attention_layernorm.weight\n",
"gpt_neox.layers.25.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.078125, 1.0625, 1.015625]\n",
"b'gpt_neox.layers.25.post_attention_layernorm.weight'\n",
"gpt_neox.layers.25.post_attention_layernorm.bias -> gpt_neox.layers.25.post_attention_layernorm.bias\n",
"gpt_neox.layers.25.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.03369140625, -0.006927490234375, 0.0263671875]\n",
"b'gpt_neox.layers.25.post_attention_layernorm.bias'\n",
"gpt_neox.layers.25.attention.query_key_value.weight -> gpt_neox.layers.25.attention.query_key_value.weight\n",
"gpt_neox.layers.25.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.00604248046875, 0.01043701171875, -0.054931640625], [-0.01513671875, 0.0306396484375, -0.000614166259765625], [0.0281982421875, 0.01556396484375, -0.006011962890625]]\n",
"b'gpt_neox.layers.25.attention.query_key_value.weight'\n",
"gpt_neox.layers.25.attention.query_key_value.bias -> gpt_neox.layers.25.attention.query_key_value.bias\n",
"gpt_neox.layers.25.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.0240478515625, 0.0103759765625, 0.00116729736328125]\n",
"b'gpt_neox.layers.25.attention.query_key_value.bias'\n",
"gpt_neox.layers.25.attention.dense.weight -> gpt_neox.layers.25.attention.dense.weight\n",
"gpt_neox.layers.25.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.01129150390625, 0.0196533203125, 0.0036468505859375], [-0.00244140625, -0.0159912109375, -0.0380859375], [-0.025634765625, 0.0174560546875, -0.01068115234375]]\n",
"b'gpt_neox.layers.25.attention.dense.weight'\n",
"gpt_neox.layers.25.attention.dense.bias -> gpt_neox.layers.25.attention.dense.bias\n",
"gpt_neox.layers.25.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0162353515625, 0.00124359130859375, 0.003265380859375]\n",
"b'gpt_neox.layers.25.attention.dense.bias'\n",
"gpt_neox.layers.25.mlp.dense_h_to_4h.weight -> gpt_neox.layers.25.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.25.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.016845703125, -0.045166015625, 0.05224609375], [0.03466796875, 0.008544921875, -0.04443359375], [0.01275634765625, 0.0263671875, -0.10693359375]]\n",
"b'gpt_neox.layers.25.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.25.mlp.dense_h_to_4h.bias -> gpt_neox.layers.25.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.25.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.013427734375, -0.0120849609375, -0.009765625]\n",
"b'gpt_neox.layers.25.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.25.mlp.dense_4h_to_h.weight -> gpt_neox.layers.25.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.25.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.058837890625, 0.062255859375, -0.0012664794921875], [-0.01806640625, 0.006439208984375, -0.0042724609375], [0.045654296875, 0.0218505859375, -0.01104736328125]]\n",
"b'gpt_neox.layers.25.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.25.mlp.dense_4h_to_h.bias -> gpt_neox.layers.25.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.25.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.006988525390625, -0.000568389892578125, -0.007110595703125]\n",
"b'gpt_neox.layers.25.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.26.input_layernorm.weight -> gpt_neox.layers.26.input_layernorm.weight\n",
"gpt_neox.layers.26.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.015625, 1.078125, 1.0390625]\n",
"b'gpt_neox.layers.26.input_layernorm.weight'\n",
"gpt_neox.layers.26.input_layernorm.bias -> gpt_neox.layers.26.input_layernorm.bias\n",
"gpt_neox.layers.26.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.007659912109375, 0.0257568359375, 0.006744384765625]\n",
"b'gpt_neox.layers.26.input_layernorm.bias'\n",
"gpt_neox.layers.26.post_attention_layernorm.weight -> gpt_neox.layers.26.post_attention_layernorm.weight\n",
"gpt_neox.layers.26.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.0859375, 1.09375, 1.0546875]\n",
"b'gpt_neox.layers.26.post_attention_layernorm.weight'\n",
"gpt_neox.layers.26.post_attention_layernorm.bias -> gpt_neox.layers.26.post_attention_layernorm.bias\n",
"gpt_neox.layers.26.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.05029296875, -0.022705078125, 0.0281982421875]\n",
"b'gpt_neox.layers.26.post_attention_layernorm.bias'\n",
"gpt_neox.layers.26.attention.query_key_value.weight -> gpt_neox.layers.26.attention.query_key_value.weight\n",
"gpt_neox.layers.26.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.0167236328125, -0.0291748046875, 0.0196533203125], [-0.001190185546875, 0.005584716796875, -0.0216064453125], [-0.03662109375, 0.00677490234375, 0.00921630859375]]\n",
"b'gpt_neox.layers.26.attention.query_key_value.weight'\n",
"gpt_neox.layers.26.attention.query_key_value.bias -> gpt_neox.layers.26.attention.query_key_value.bias\n",
"gpt_neox.layers.26.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.004852294921875, -0.0103759765625, 0.00164031982421875]\n",
"b'gpt_neox.layers.26.attention.query_key_value.bias'\n",
"gpt_neox.layers.26.attention.dense.weight -> gpt_neox.layers.26.attention.dense.weight\n",
"gpt_neox.layers.26.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.0380859375, 0.033447265625, 0.02587890625], [-0.017578125, 0.01708984375, -0.00616455078125], [0.0096435546875, -0.009033203125, -0.032470703125]]\n",
"b'gpt_neox.layers.26.attention.dense.weight'\n",
"gpt_neox.layers.26.attention.dense.bias -> gpt_neox.layers.26.attention.dense.bias\n",
"gpt_neox.layers.26.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0185546875, -0.0011138916015625, 0.007598876953125]\n",
"b'gpt_neox.layers.26.attention.dense.bias'\n",
"gpt_neox.layers.26.mlp.dense_h_to_4h.weight -> gpt_neox.layers.26.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.26.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.07275390625, 0.01397705078125, -0.030029296875], [0.0306396484375, 0.0181884765625, 0.021240234375], [0.00244140625, -0.043212890625, 0.00174713134765625]]\n",
"b'gpt_neox.layers.26.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.26.mlp.dense_h_to_4h.bias -> gpt_neox.layers.26.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.26.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.007781982421875, -0.013671875, -0.01287841796875]\n",
"b'gpt_neox.layers.26.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.26.mlp.dense_4h_to_h.weight -> gpt_neox.layers.26.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.26.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.011962890625, 0.0026397705078125, 0.03125], [0.01324462890625, -0.08154296875, -0.05810546875], [-0.021728515625, -0.037353515625, 0.0037841796875]]\n",
"b'gpt_neox.layers.26.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.26.mlp.dense_4h_to_h.bias -> gpt_neox.layers.26.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.26.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0037384033203125, -0.00046539306640625, -0.00341796875]\n",
"b'gpt_neox.layers.26.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.27.input_layernorm.weight -> gpt_neox.layers.27.input_layernorm.weight\n",
"gpt_neox.layers.27.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.078125, 1.1015625, 1.1171875]\n",
"b'gpt_neox.layers.27.input_layernorm.weight'\n",
"gpt_neox.layers.27.input_layernorm.bias -> gpt_neox.layers.27.input_layernorm.bias\n",
"gpt_neox.layers.27.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.006011962890625, 0.011474609375, 0.000946044921875]\n",
"b'gpt_neox.layers.27.input_layernorm.bias'\n",
"gpt_neox.layers.27.post_attention_layernorm.weight -> gpt_neox.layers.27.post_attention_layernorm.weight\n",
"gpt_neox.layers.27.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.09375, 1.1171875, 1.1015625]\n",
"b'gpt_neox.layers.27.post_attention_layernorm.weight'\n",
"gpt_neox.layers.27.post_attention_layernorm.bias -> gpt_neox.layers.27.post_attention_layernorm.bias\n",
"gpt_neox.layers.27.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.04638671875, 0.006439208984375, 0.04150390625]\n",
"b'gpt_neox.layers.27.post_attention_layernorm.bias'\n",
"gpt_neox.layers.27.attention.query_key_value.weight -> gpt_neox.layers.27.attention.query_key_value.weight\n",
"gpt_neox.layers.27.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.0301513671875, -0.0030517578125, 0.01434326171875], [0.0198974609375, 0.004486083984375, -0.01153564453125], [0.041259765625, -0.0361328125, 0.034912109375]]\n",
"b'gpt_neox.layers.27.attention.query_key_value.weight'\n",
"gpt_neox.layers.27.attention.query_key_value.bias -> gpt_neox.layers.27.attention.query_key_value.bias\n",
"gpt_neox.layers.27.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [-0.01190185546875, 3.4809112548828125e-05, -0.00396728515625]\n",
"b'gpt_neox.layers.27.attention.query_key_value.bias'\n",
"gpt_neox.layers.27.attention.dense.weight -> gpt_neox.layers.27.attention.dense.weight\n",
"gpt_neox.layers.27.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.01043701171875, -0.06298828125, 0.0133056640625], [-0.05615234375, 0.0264892578125, 0.0054931640625], [0.015625, 0.034912109375, -0.025390625]]\n",
"b'gpt_neox.layers.27.attention.dense.weight'\n",
"gpt_neox.layers.27.attention.dense.bias -> gpt_neox.layers.27.attention.dense.bias\n",
"gpt_neox.layers.27.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.01202392578125, -0.00030517578125, 0.0004749298095703125]\n",
"b'gpt_neox.layers.27.attention.dense.bias'\n",
"gpt_neox.layers.27.mlp.dense_h_to_4h.weight -> gpt_neox.layers.27.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.27.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.022705078125, -0.035888671875, -0.0712890625], [0.03076171875, -0.06396484375, -0.00921630859375], [-0.0947265625, 0.048095703125, 0.044189453125]]\n",
"b'gpt_neox.layers.27.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.27.mlp.dense_h_to_4h.bias -> gpt_neox.layers.27.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.27.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.01031494140625, -0.007110595703125, -0.01611328125]\n",
"b'gpt_neox.layers.27.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.27.mlp.dense_4h_to_h.weight -> gpt_neox.layers.27.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.27.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.002105712890625, -0.045654296875, 0.07373046875], [-0.0084228515625, -0.0341796875, 0.046142578125], [0.016845703125, 0.0157470703125, 0.0225830078125]]\n",
"b'gpt_neox.layers.27.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.27.mlp.dense_4h_to_h.bias -> gpt_neox.layers.27.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.27.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.008056640625, -0.00726318359375, -0.0081787109375]\n",
"b'gpt_neox.layers.27.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.28.input_layernorm.weight -> gpt_neox.layers.28.input_layernorm.weight\n",
"gpt_neox.layers.28.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.0078125, 1.0703125, 1.09375]\n",
"b'gpt_neox.layers.28.input_layernorm.weight'\n",
"gpt_neox.layers.28.input_layernorm.bias -> gpt_neox.layers.28.input_layernorm.bias\n",
"gpt_neox.layers.28.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0262451171875, 0.00909423828125, 0.00848388671875]\n",
"b'gpt_neox.layers.28.input_layernorm.bias'\n",
"gpt_neox.layers.28.post_attention_layernorm.weight -> gpt_neox.layers.28.post_attention_layernorm.weight\n",
"gpt_neox.layers.28.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.109375, 1.140625, 1.109375]\n",
"b'gpt_neox.layers.28.post_attention_layernorm.weight'\n",
"gpt_neox.layers.28.post_attention_layernorm.bias -> gpt_neox.layers.28.post_attention_layernorm.bias\n",
"gpt_neox.layers.28.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.025634765625, -0.0002079010009765625, -0.00165557861328125]\n",
"b'gpt_neox.layers.28.post_attention_layernorm.bias'\n",
"gpt_neox.layers.28.attention.query_key_value.weight -> gpt_neox.layers.28.attention.query_key_value.weight\n",
"gpt_neox.layers.28.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.0247802734375, -0.020263671875, -0.0400390625], [0.00897216796875, 0.0050048828125, 0.00909423828125], [0.036376953125, 0.0042724609375, 0.01226806640625]]\n",
"b'gpt_neox.layers.28.attention.query_key_value.weight'\n",
"gpt_neox.layers.28.attention.query_key_value.bias -> gpt_neox.layers.28.attention.query_key_value.bias\n",
"gpt_neox.layers.28.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [-0.01055908203125, -0.01171875, 0.0107421875]\n",
"b'gpt_neox.layers.28.attention.query_key_value.bias'\n",
"gpt_neox.layers.28.attention.dense.weight -> gpt_neox.layers.28.attention.dense.weight\n",
"gpt_neox.layers.28.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.05712890625, 0.022705078125, -0.0289306640625], [0.0017547607421875, 0.041015625, 0.01904296875], [0.0556640625, -0.0035552978515625, 0.0208740234375]]\n",
"b'gpt_neox.layers.28.attention.dense.weight'\n",
"gpt_neox.layers.28.attention.dense.bias -> gpt_neox.layers.28.attention.dense.bias\n",
"gpt_neox.layers.28.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.004638671875, -0.0052490234375, -0.002593994140625]\n",
"b'gpt_neox.layers.28.attention.dense.bias'\n",
"gpt_neox.layers.28.mlp.dense_h_to_4h.weight -> gpt_neox.layers.28.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.28.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.0035552978515625, -0.01287841796875, -0.08154296875], [-0.00537109375, -0.0625, -0.00933837890625], [-0.030029296875, 0.060302734375, 0.00506591796875]]\n",
"b'gpt_neox.layers.28.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.28.mlp.dense_h_to_4h.bias -> gpt_neox.layers.28.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.28.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.019287109375, -0.0159912109375, -0.00738525390625]\n",
"b'gpt_neox.layers.28.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.28.mlp.dense_4h_to_h.weight -> gpt_neox.layers.28.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.28.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.00531005859375, 0.0301513671875, -0.02392578125], [-0.037353515625, 0.056396484375, 0.0118408203125], [0.015869140625, -0.04150390625, -0.078125]]\n",
"b'gpt_neox.layers.28.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.28.mlp.dense_4h_to_h.bias -> gpt_neox.layers.28.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.28.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0084228515625, -0.00799560546875, -0.00885009765625]\n",
"b'gpt_neox.layers.28.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.29.input_layernorm.weight -> gpt_neox.layers.29.input_layernorm.weight\n",
"gpt_neox.layers.29.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.09375, 1.1171875, 1.109375]\n",
"b'gpt_neox.layers.29.input_layernorm.weight'\n",
"gpt_neox.layers.29.input_layernorm.bias -> gpt_neox.layers.29.input_layernorm.bias\n",
"gpt_neox.layers.29.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.032470703125, 0.004608154296875, 0.01483154296875]\n",
"b'gpt_neox.layers.29.input_layernorm.bias'\n",
"gpt_neox.layers.29.post_attention_layernorm.weight -> gpt_neox.layers.29.post_attention_layernorm.weight\n",
"gpt_neox.layers.29.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.15625, 1.1484375, 1.109375]\n",
"b'gpt_neox.layers.29.post_attention_layernorm.weight'\n",
"gpt_neox.layers.29.post_attention_layernorm.bias -> gpt_neox.layers.29.post_attention_layernorm.bias\n",
"gpt_neox.layers.29.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0419921875, -0.005523681640625, 0.02587890625]\n",
"b'gpt_neox.layers.29.post_attention_layernorm.bias'\n",
"gpt_neox.layers.29.attention.query_key_value.weight -> gpt_neox.layers.29.attention.query_key_value.weight\n",
"gpt_neox.layers.29.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.0224609375, -0.0272216796875, -0.005859375], [-0.022705078125, -0.033203125, 0.046875], [0.004669189453125, -0.02685546875, 0.00012302398681640625]]\n",
"b'gpt_neox.layers.29.attention.query_key_value.weight'\n",
"gpt_neox.layers.29.attention.query_key_value.bias -> gpt_neox.layers.29.attention.query_key_value.bias\n",
"gpt_neox.layers.29.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [-0.006317138671875, 0.000385284423828125, -0.00150299072265625]\n",
"b'gpt_neox.layers.29.attention.query_key_value.bias'\n",
"gpt_neox.layers.29.attention.dense.weight -> gpt_neox.layers.29.attention.dense.weight\n",
"gpt_neox.layers.29.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.004669189453125, -0.022705078125, 0.03271484375], [0.0252685546875, -0.05517578125, -0.04248046875], [0.0302734375, -0.042724609375, -0.01470947265625]]\n",
"b'gpt_neox.layers.29.attention.dense.weight'\n",
"gpt_neox.layers.29.attention.dense.bias -> gpt_neox.layers.29.attention.dense.bias\n",
"gpt_neox.layers.29.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0024566650390625, -0.00555419921875, -0.0024566650390625]\n",
"b'gpt_neox.layers.29.attention.dense.bias'\n",
"gpt_neox.layers.29.mlp.dense_h_to_4h.weight -> gpt_neox.layers.29.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.29.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.026611328125, 0.046142578125, -0.0625], [0.05078125, -0.00433349609375, 0.033203125], [-0.005096435546875, -0.0634765625, 0.0157470703125]]\n",
"b'gpt_neox.layers.29.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.29.mlp.dense_h_to_4h.bias -> gpt_neox.layers.29.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.29.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.01470947265625, -0.01409912109375, -0.007232666015625]\n",
"b'gpt_neox.layers.29.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.29.mlp.dense_4h_to_h.weight -> gpt_neox.layers.29.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.29.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.064453125, -0.01031494140625, -0.04345703125], [-0.00439453125, 0.0208740234375, 0.0035247802734375], [0.03515625, 0.031005859375, -0.0262451171875]]\n",
"b'gpt_neox.layers.29.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.29.mlp.dense_4h_to_h.bias -> gpt_neox.layers.29.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.29.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.00994873046875, -0.007080078125, -0.01171875]\n",
"b'gpt_neox.layers.29.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.30.input_layernorm.weight -> gpt_neox.layers.30.input_layernorm.weight\n",
"gpt_neox.layers.30.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.1015625, 1.1328125, 1.1484375]\n",
"b'gpt_neox.layers.30.input_layernorm.weight'\n",
"gpt_neox.layers.30.input_layernorm.bias -> gpt_neox.layers.30.input_layernorm.bias\n",
"gpt_neox.layers.30.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.024658203125, 0.0030364990234375, -0.00592041015625]\n",
"b'gpt_neox.layers.30.input_layernorm.bias'\n",
"gpt_neox.layers.30.post_attention_layernorm.weight -> gpt_neox.layers.30.post_attention_layernorm.weight\n",
"gpt_neox.layers.30.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.1796875, 1.1796875, 1.15625]\n",
"b'gpt_neox.layers.30.post_attention_layernorm.weight'\n",
"gpt_neox.layers.30.post_attention_layernorm.bias -> gpt_neox.layers.30.post_attention_layernorm.bias\n",
"gpt_neox.layers.30.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.048583984375, 0.00494384765625, 0.01165771484375]\n",
"b'gpt_neox.layers.30.post_attention_layernorm.bias'\n",
"gpt_neox.layers.30.attention.query_key_value.weight -> gpt_neox.layers.30.attention.query_key_value.weight\n",
"gpt_neox.layers.30.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.015625, -0.0498046875, -0.04541015625], [-0.003875732421875, -0.039306640625, -0.043701171875], [-0.01220703125, 0.010009765625, -0.046630859375]]\n",
"b'gpt_neox.layers.30.attention.query_key_value.weight'\n",
"gpt_neox.layers.30.attention.query_key_value.bias -> gpt_neox.layers.30.attention.query_key_value.bias\n",
"gpt_neox.layers.30.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [-0.000659942626953125, -0.0130615234375, 0.0196533203125]\n",
"b'gpt_neox.layers.30.attention.query_key_value.bias'\n",
"gpt_neox.layers.30.attention.dense.weight -> gpt_neox.layers.30.attention.dense.weight\n",
"gpt_neox.layers.30.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.017333984375, 0.0069580078125, -0.02294921875], [-0.0242919921875, -0.00982666015625, -0.0267333984375], [0.01092529296875, -0.0264892578125, -0.00148773193359375]]\n",
"b'gpt_neox.layers.30.attention.dense.weight'\n",
"gpt_neox.layers.30.attention.dense.bias -> gpt_neox.layers.30.attention.dense.bias\n",
"gpt_neox.layers.30.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.00445556640625, -0.00543212890625, -0.0012969970703125]\n",
"b'gpt_neox.layers.30.attention.dense.bias'\n",
"gpt_neox.layers.30.mlp.dense_h_to_4h.weight -> gpt_neox.layers.30.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.30.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.0036773681640625, 0.007232666015625, -0.00897216796875], [-0.002044677734375, -0.0108642578125, -0.02783203125], [-0.05126953125, -0.0673828125, 0.0164794921875]]\n",
"b'gpt_neox.layers.30.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.30.mlp.dense_h_to_4h.bias -> gpt_neox.layers.30.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.30.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.009765625, -0.0167236328125, -0.018310546875]\n",
"b'gpt_neox.layers.30.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.30.mlp.dense_4h_to_h.weight -> gpt_neox.layers.30.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.30.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.07080078125, 0.008056640625, -0.00958251953125], [-0.07177734375, 0.038330078125, 0.05078125], [0.04248046875, 0.04931640625, 0.0281982421875]]\n",
"b'gpt_neox.layers.30.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.30.mlp.dense_4h_to_h.bias -> gpt_neox.layers.30.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.30.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.01507568359375, -0.0111083984375, -0.008544921875]\n",
"b'gpt_neox.layers.30.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.31.input_layernorm.weight -> gpt_neox.layers.31.input_layernorm.weight\n",
"gpt_neox.layers.31.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.09375, 1.1171875, 1.15625]\n",
"b'gpt_neox.layers.31.input_layernorm.weight'\n",
"gpt_neox.layers.31.input_layernorm.bias -> gpt_neox.layers.31.input_layernorm.bias\n",
"gpt_neox.layers.31.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.036376953125, 0.00958251953125, 0.0225830078125]\n",
"b'gpt_neox.layers.31.input_layernorm.bias'\n",
"gpt_neox.layers.31.post_attention_layernorm.weight -> gpt_neox.layers.31.post_attention_layernorm.weight\n",
"gpt_neox.layers.31.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.203125, 1.1953125, 1.1875]\n",
"b'gpt_neox.layers.31.post_attention_layernorm.weight'\n",
"gpt_neox.layers.31.post_attention_layernorm.bias -> gpt_neox.layers.31.post_attention_layernorm.bias\n",
"gpt_neox.layers.31.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0286865234375, -0.037841796875, 0.0289306640625]\n",
"b'gpt_neox.layers.31.post_attention_layernorm.bias'\n",
"gpt_neox.layers.31.attention.query_key_value.weight -> gpt_neox.layers.31.attention.query_key_value.weight\n",
"gpt_neox.layers.31.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.00311279296875, 0.05322265625, 0.0361328125], [0.0211181640625, 0.052978515625, 0.0230712890625], [0.0045166015625, -0.00091552734375, 0.018310546875]]\n",
"b'gpt_neox.layers.31.attention.query_key_value.weight'\n",
"gpt_neox.layers.31.attention.query_key_value.bias -> gpt_neox.layers.31.attention.query_key_value.bias\n",
"gpt_neox.layers.31.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [-0.0068359375, -0.0031585693359375, 0.0003337860107421875]\n",
"b'gpt_neox.layers.31.attention.query_key_value.bias'\n",
"gpt_neox.layers.31.attention.dense.weight -> gpt_neox.layers.31.attention.dense.weight\n",
"gpt_neox.layers.31.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.005615234375, 0.031494140625, 0.020751953125], [0.033447265625, -0.036376953125, 0.018310546875], [0.004974365234375, -0.00099945068359375, -0.06640625]]\n",
"b'gpt_neox.layers.31.attention.dense.weight'\n",
"gpt_neox.layers.31.attention.dense.bias -> gpt_neox.layers.31.attention.dense.bias\n",
"gpt_neox.layers.31.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.004302978515625, -0.006439208984375, -0.00543212890625]\n",
"b'gpt_neox.layers.31.attention.dense.bias'\n",
"gpt_neox.layers.31.mlp.dense_h_to_4h.weight -> gpt_neox.layers.31.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.31.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.040283203125, -0.0303955078125, -0.006927490234375], [0.0673828125, -0.00390625, -0.041748046875], [0.0015869140625, -0.08056640625, 0.01025390625]]\n",
"b'gpt_neox.layers.31.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.31.mlp.dense_h_to_4h.bias -> gpt_neox.layers.31.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.31.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.01080322265625, -0.004638671875, -0.0126953125]\n",
"b'gpt_neox.layers.31.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.31.mlp.dense_4h_to_h.weight -> gpt_neox.layers.31.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.31.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.0203857421875, 0.0235595703125, 0.0693359375], [0.0235595703125, -0.00750732421875, 0.006744384765625], [0.0019683837890625, 0.01080322265625, -0.02734375]]\n",
"b'gpt_neox.layers.31.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.31.mlp.dense_4h_to_h.bias -> gpt_neox.layers.31.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.31.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.015869140625, -0.00469970703125, -0.01611328125]\n",
"b'gpt_neox.layers.31.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.32.input_layernorm.weight -> gpt_neox.layers.32.input_layernorm.weight\n",
"gpt_neox.layers.32.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.09375, 1.109375, 1.109375]\n",
"b'gpt_neox.layers.32.input_layernorm.weight'\n",
"gpt_neox.layers.32.input_layernorm.bias -> gpt_neox.layers.32.input_layernorm.bias\n",
"gpt_neox.layers.32.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.054443359375, -0.0238037109375, 0.0260009765625]\n",
"b'gpt_neox.layers.32.input_layernorm.bias'\n",
"gpt_neox.layers.32.post_attention_layernorm.weight -> gpt_neox.layers.32.post_attention_layernorm.weight\n",
"gpt_neox.layers.32.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.2265625, 1.2109375, 1.203125]\n",
"b'gpt_neox.layers.32.post_attention_layernorm.weight'\n",
"gpt_neox.layers.32.post_attention_layernorm.bias -> gpt_neox.layers.32.post_attention_layernorm.bias\n",
"gpt_neox.layers.32.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.03515625, -0.003448486328125, 0.01495361328125]\n",
"b'gpt_neox.layers.32.post_attention_layernorm.bias'\n",
"gpt_neox.layers.32.attention.query_key_value.weight -> gpt_neox.layers.32.attention.query_key_value.weight\n",
"gpt_neox.layers.32.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.02197265625, -0.02587890625, 0.006317138671875], [0.01556396484375, 0.02490234375, 0.0230712890625], [0.015869140625, -0.0257568359375, -0.0126953125]]\n",
"b'gpt_neox.layers.32.attention.query_key_value.weight'\n",
"gpt_neox.layers.32.attention.query_key_value.bias -> gpt_neox.layers.32.attention.query_key_value.bias\n",
"gpt_neox.layers.32.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.003265380859375, -0.002685546875, -0.00616455078125]\n",
"b'gpt_neox.layers.32.attention.query_key_value.bias'\n",
"gpt_neox.layers.32.attention.dense.weight -> gpt_neox.layers.32.attention.dense.weight\n",
"gpt_neox.layers.32.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.054443359375, -0.00157928466796875, 0.05322265625], [-0.0038604736328125, 0.009033203125, 0.0185546875], [-0.014892578125, -0.0020599365234375, 0.037841796875]]\n",
"b'gpt_neox.layers.32.attention.dense.weight'\n",
"gpt_neox.layers.32.attention.dense.bias -> gpt_neox.layers.32.attention.dense.bias\n",
"gpt_neox.layers.32.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0003871917724609375, -0.0008087158203125, -0.014404296875]\n",
"b'gpt_neox.layers.32.attention.dense.bias'\n",
"gpt_neox.layers.32.mlp.dense_h_to_4h.weight -> gpt_neox.layers.32.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.32.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.01611328125, -0.09716796875, 0.064453125], [-0.0306396484375, 0.0133056640625, -0.0272216796875], [0.04248046875, -0.0003204345703125, -0.027099609375]]\n",
"b'gpt_neox.layers.32.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.32.mlp.dense_h_to_4h.bias -> gpt_neox.layers.32.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.32.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.00787353515625, -0.01129150390625, -0.01153564453125]\n",
"b'gpt_neox.layers.32.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.32.mlp.dense_4h_to_h.weight -> gpt_neox.layers.32.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.32.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[-0.000698089599609375, 0.00811767578125, -0.005218505859375], [0.0191650390625, -0.01055908203125, 0.028076171875], [-0.04052734375, 0.0242919921875, 0.07568359375]]\n",
"b'gpt_neox.layers.32.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.32.mlp.dense_4h_to_h.bias -> gpt_neox.layers.32.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.32.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.010009765625, -0.004730224609375, -0.01409912109375]\n",
"b'gpt_neox.layers.32.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.33.input_layernorm.weight -> gpt_neox.layers.33.input_layernorm.weight\n",
"gpt_neox.layers.33.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.15625, 1.140625, 1.1640625]\n",
"b'gpt_neox.layers.33.input_layernorm.weight'\n",
"gpt_neox.layers.33.input_layernorm.bias -> gpt_neox.layers.33.input_layernorm.bias\n",
"gpt_neox.layers.33.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.051513671875, -0.03173828125, 0.050048828125]\n",
"b'gpt_neox.layers.33.input_layernorm.bias'\n",
"gpt_neox.layers.33.post_attention_layernorm.weight -> gpt_neox.layers.33.post_attention_layernorm.weight\n",
"gpt_neox.layers.33.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.234375, 1.2265625, 1.203125]\n",
"b'gpt_neox.layers.33.post_attention_layernorm.weight'\n",
"gpt_neox.layers.33.post_attention_layernorm.bias -> gpt_neox.layers.33.post_attention_layernorm.bias\n",
"gpt_neox.layers.33.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.048095703125, 0.00311279296875, -0.0050048828125]\n",
"b'gpt_neox.layers.33.post_attention_layernorm.bias'\n",
"gpt_neox.layers.33.attention.query_key_value.weight -> gpt_neox.layers.33.attention.query_key_value.weight\n",
"gpt_neox.layers.33.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[-0.0054931640625, -0.02392578125, 0.053955078125], [-0.0986328125, 0.013916015625, 0.07373046875], [0.010009765625, -0.064453125, -0.016357421875]]\n",
"b'gpt_neox.layers.33.attention.query_key_value.weight'\n",
"gpt_neox.layers.33.attention.query_key_value.bias -> gpt_neox.layers.33.attention.query_key_value.bias\n",
"gpt_neox.layers.33.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [-0.0079345703125, -0.003143310546875, -0.006378173828125]\n",
"b'gpt_neox.layers.33.attention.query_key_value.bias'\n",
"gpt_neox.layers.33.attention.dense.weight -> gpt_neox.layers.33.attention.dense.weight\n",
"gpt_neox.layers.33.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[0.0242919921875, -0.01495361328125, -0.03369140625], [-0.03515625, -0.01202392578125, -0.04931640625], [-0.0091552734375, -0.017822265625, -0.0269775390625]]\n",
"b'gpt_neox.layers.33.attention.dense.weight'\n",
"gpt_neox.layers.33.attention.dense.bias -> gpt_neox.layers.33.attention.dense.bias\n",
"gpt_neox.layers.33.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.009521484375, 0.00201416015625, -0.0103759765625]\n",
"b'gpt_neox.layers.33.attention.dense.bias'\n",
"gpt_neox.layers.33.mlp.dense_h_to_4h.weight -> gpt_neox.layers.33.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.33.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[-0.03173828125, 0.0030975341796875, 0.01171875], [0.055908203125, -0.0234375, 0.06640625], [0.10205078125, 0.0133056640625, 0.0216064453125]]\n",
"b'gpt_neox.layers.33.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.33.mlp.dense_h_to_4h.bias -> gpt_neox.layers.33.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.33.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.0167236328125, -0.0155029296875, -0.002899169921875]\n",
"b'gpt_neox.layers.33.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.33.mlp.dense_4h_to_h.weight -> gpt_neox.layers.33.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.33.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.044677734375, -0.00022125244140625, -0.0162353515625], [-0.00799560546875, -0.01300048828125, -0.03466796875], [0.019775390625, 0.0024566650390625, -0.051513671875]]\n",
"b'gpt_neox.layers.33.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.33.mlp.dense_4h_to_h.bias -> gpt_neox.layers.33.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.33.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.003997802734375, -0.0064697265625, -0.0037994384765625]\n",
"b'gpt_neox.layers.33.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.34.input_layernorm.weight -> gpt_neox.layers.34.input_layernorm.weight\n",
"gpt_neox.layers.34.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.140625, 1.109375, 1.1484375]\n",
"b'gpt_neox.layers.34.input_layernorm.weight'\n",
"gpt_neox.layers.34.input_layernorm.bias -> gpt_neox.layers.34.input_layernorm.bias\n",
"gpt_neox.layers.34.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.02783203125, -0.0419921875, 0.052978515625]\n",
"b'gpt_neox.layers.34.input_layernorm.bias'\n",
"gpt_neox.layers.34.post_attention_layernorm.weight -> gpt_neox.layers.34.post_attention_layernorm.weight\n",
"gpt_neox.layers.34.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.2109375, 1.1796875, 1.1875]\n",
"b'gpt_neox.layers.34.post_attention_layernorm.weight'\n",
"gpt_neox.layers.34.post_attention_layernorm.bias -> gpt_neox.layers.34.post_attention_layernorm.bias\n",
"gpt_neox.layers.34.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.057861328125, -0.01611328125, -0.01251220703125]\n",
"b'gpt_neox.layers.34.post_attention_layernorm.bias'\n",
"gpt_neox.layers.34.attention.query_key_value.weight -> gpt_neox.layers.34.attention.query_key_value.weight\n",
"gpt_neox.layers.34.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.00035858154296875, 0.032470703125, -0.052490234375], [0.0274658203125, -0.051513671875, 0.006988525390625], [-0.02197265625, 0.00689697265625, -0.0115966796875]]\n",
"b'gpt_neox.layers.34.attention.query_key_value.weight'\n",
"gpt_neox.layers.34.attention.query_key_value.bias -> gpt_neox.layers.34.attention.query_key_value.bias\n",
"gpt_neox.layers.34.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [-0.00421142578125, -0.000530242919921875, -0.002899169921875]\n",
"b'gpt_neox.layers.34.attention.query_key_value.bias'\n",
"gpt_neox.layers.34.attention.dense.weight -> gpt_neox.layers.34.attention.dense.weight\n",
"gpt_neox.layers.34.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.0034027099609375, 0.0028533935546875, -0.0186767578125], [0.013427734375, -0.06005859375, -0.00830078125], [-0.00037384033203125, -0.0147705078125, -0.01251220703125]]\n",
"b'gpt_neox.layers.34.attention.dense.weight'\n",
"gpt_neox.layers.34.attention.dense.bias -> gpt_neox.layers.34.attention.dense.bias\n",
"gpt_neox.layers.34.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0021820068359375, 0.000850677490234375, -0.0123291015625]\n",
"b'gpt_neox.layers.34.attention.dense.bias'\n",
"gpt_neox.layers.34.mlp.dense_h_to_4h.weight -> gpt_neox.layers.34.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.34.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.030029296875, -0.0081787109375, -0.0216064453125], [0.0196533203125, 0.041015625, -0.02685546875], [0.01190185546875, 0.010498046875, -0.0125732421875]]\n",
"b'gpt_neox.layers.34.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.34.mlp.dense_h_to_4h.bias -> gpt_neox.layers.34.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.34.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.004241943359375, -0.0025634765625, -0.00811767578125]\n",
"b'gpt_neox.layers.34.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.34.mlp.dense_4h_to_h.weight -> gpt_neox.layers.34.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.34.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.0034637451171875, 0.005523681640625, 0.0216064453125], [0.052490234375, 0.00014019012451171875, 0.061279296875], [-0.01953125, 0.0191650390625, -0.005767822265625]]\n",
"b'gpt_neox.layers.34.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.34.mlp.dense_4h_to_h.bias -> gpt_neox.layers.34.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.34.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.00482177734375, -0.002899169921875, -0.0022430419921875]\n",
"b'gpt_neox.layers.34.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.layers.35.input_layernorm.weight -> gpt_neox.layers.35.input_layernorm.weight\n",
"gpt_neox.layers.35.input_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.1328125, 1.0859375, 1.1875]\n",
"b'gpt_neox.layers.35.input_layernorm.weight'\n",
"gpt_neox.layers.35.input_layernorm.bias -> gpt_neox.layers.35.input_layernorm.bias\n",
"gpt_neox.layers.35.input_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.0252685546875, -0.025634765625, 0.033203125]\n",
"b'gpt_neox.layers.35.input_layernorm.bias'\n",
"gpt_neox.layers.35.post_attention_layernorm.weight -> gpt_neox.layers.35.post_attention_layernorm.weight\n",
"gpt_neox.layers.35.post_attention_layernorm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [1.125, 1.0703125, 1.078125]\n",
"b'gpt_neox.layers.35.post_attention_layernorm.weight'\n",
"gpt_neox.layers.35.post_attention_layernorm.bias -> gpt_neox.layers.35.post_attention_layernorm.bias\n",
"gpt_neox.layers.35.post_attention_layernorm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [-0.05126953125, -0.01904296875, 0.00714111328125]\n",
"b'gpt_neox.layers.35.post_attention_layernorm.bias'\n",
"gpt_neox.layers.35.attention.query_key_value.weight -> gpt_neox.layers.35.attention.query_key_value.weight\n",
"gpt_neox.layers.35.attention.query_key_value.weight 2 (8448, 2816)\n",
" Converting to float16 (8448, 2816) [[0.008544921875, 0.053466796875, 0.00885009765625], [-0.046875, -0.0191650390625, 0.027099609375], [-0.0306396484375, -0.033935546875, -0.0003566741943359375]]\n",
"b'gpt_neox.layers.35.attention.query_key_value.weight'\n",
"gpt_neox.layers.35.attention.query_key_value.bias -> gpt_neox.layers.35.attention.query_key_value.bias\n",
"gpt_neox.layers.35.attention.query_key_value.bias 1 (8448,)\n",
" Converting to float32 (8448,) [0.000766754150390625, -0.005950927734375, 0.004730224609375]\n",
"b'gpt_neox.layers.35.attention.query_key_value.bias'\n",
"gpt_neox.layers.35.attention.dense.weight -> gpt_neox.layers.35.attention.dense.weight\n",
"gpt_neox.layers.35.attention.dense.weight 2 (2816, 2816)\n",
" Converting to float16 (2816, 2816) [[-0.06103515625, -0.03857421875, -0.055908203125], [-0.0057373046875, -0.03662109375, 0.022216796875], [0.005279541015625, -0.0023651123046875, -0.0074462890625]]\n",
"b'gpt_neox.layers.35.attention.dense.weight'\n",
"gpt_neox.layers.35.attention.dense.bias -> gpt_neox.layers.35.attention.dense.bias\n",
"gpt_neox.layers.35.attention.dense.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.0118408203125, 0.00653076171875, -0.016845703125]\n",
"b'gpt_neox.layers.35.attention.dense.bias'\n",
"gpt_neox.layers.35.mlp.dense_h_to_4h.weight -> gpt_neox.layers.35.mlp.dense_h_to_4h.weight\n",
"gpt_neox.layers.35.mlp.dense_h_to_4h.weight 2 (11264, 2816)\n",
" Converting to float16 (11264, 2816) [[0.013916015625, -0.024169921875, -0.00970458984375], [0.008544921875, 0.0147705078125, -0.006256103515625], [-0.003326416015625, -0.0751953125, 0.01055908203125]]\n",
"b'gpt_neox.layers.35.mlp.dense_h_to_4h.weight'\n",
"gpt_neox.layers.35.mlp.dense_h_to_4h.bias -> gpt_neox.layers.35.mlp.dense_h_to_4h.bias\n",
"gpt_neox.layers.35.mlp.dense_h_to_4h.bias 1 (11264,)\n",
" Converting to float32 (11264,) [-0.004638671875, 0.003997802734375, -0.01116943359375]\n",
"b'gpt_neox.layers.35.mlp.dense_h_to_4h.bias'\n",
"gpt_neox.layers.35.mlp.dense_4h_to_h.weight -> gpt_neox.layers.35.mlp.dense_4h_to_h.weight\n",
"gpt_neox.layers.35.mlp.dense_4h_to_h.weight 2 (2816, 11264)\n",
" Converting to float16 (2816, 11264) [[0.01483154296875, -0.034423828125, -0.04638671875], [-0.004241943359375, -0.0108642578125, -0.042724609375], [0.02880859375, 0.030029296875, -0.033935546875]]\n",
"b'gpt_neox.layers.35.mlp.dense_4h_to_h.weight'\n",
"gpt_neox.layers.35.mlp.dense_4h_to_h.bias -> gpt_neox.layers.35.mlp.dense_4h_to_h.bias\n",
"gpt_neox.layers.35.mlp.dense_4h_to_h.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.036376953125, 0.006195068359375, -0.01483154296875]\n",
"b'gpt_neox.layers.35.mlp.dense_4h_to_h.bias'\n",
"gpt_neox.final_layer_norm.weight -> gpt_neox.final_layer_norm.weight\n",
"gpt_neox.final_layer_norm.weight 1 (2816,)\n",
" Converting to float32 (2816,) [2.046875, 2.171875, 2.109375]\n",
"b'gpt_neox.final_layer_norm.weight'\n",
"gpt_neox.final_layer_norm.bias -> gpt_neox.final_layer_norm.bias\n",
"gpt_neox.final_layer_norm.bias 1 (2816,)\n",
" Converting to float32 (2816,) [0.05615234375, -0.004425048828125, 0.006866455078125]\n",
"b'gpt_neox.final_layer_norm.bias'\n",
"embed_out.weight -> embed_out.weight\n",
"embed_out.weight 2 (32000, 2816)\n",
" Converting to float16 (32000, 2816) [[-0.007232666015625, -0.005859375, -0.0029144287109375], [-0.01031494140625, -0.0267333984375, 0.013916015625], [-0.0191650390625, 0.04736328125, -0.002471923828125]]\n",
"b'embed_out.weight'\n",
"Done. Output file: /content/rinna_model_ggml/ggml-japanese-gpt-neox-3.6b-instruction-ppo-f16.bin\n",
"\n",
"0\n",
"/content\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": []
},
"metadata": {},
"execution_count": 18
}
]
},
{
"cell_type": "code",
"source": [
"! ls -lh rinna_model_ggml"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Gcsltxb4aL7G",
"outputId": "53c442fa-cba5-4063-a230-c901964f1a02"
},
"execution_count": 19,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"total 6.8G\n",
"-rw-r--r-- 1 root root 6.8G Jul 4 15:21 ggml-japanese-gpt-neox-3.6b-instruction-ppo-f16.bin\n"
]
}
]
},
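    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The next cell builds the `redpajama` and `redpajama-chat` binaries from redpajama.cpp, which provide GPT-NeoX-compatible inference for the converted GGML file. If the build fails on a fresh runtime (for example after a disconnect wiped `/content`), a sketch of the recovery steps, assuming the default Colab working directory and the upstream togethercomputer repository:\n",
        "\n",
        "```shell\n",
        "# Hypothetical recovery if redpajama.cpp is missing on a fresh runtime\n",
        "cd /content\n",
        "git clone https://github.com/togethercomputer/redpajama.cpp.git\n",
        "cd redpajama.cpp\n",
        "make redpajama redpajama-chat\n",
        "```"
      ]
    },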
{
"cell_type": "code",
"source": [
"%%shell\n",
"\n",
"pushd redpajama.cpp >& /dev/null\n",
"\n",
"make redpajama redpajama-chat\n",
"\n",
"popd >& /dev/null"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "U8w1arVHeogu",
"outputId": "adef4c1f-629b-427d-cdf1-5a089c0de888"
},
"execution_count": 20,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"I llama.cpp build info: \n",
"I UNAME_S: Linux\n",
"I UNAME_P: x86_64\n",
"I UNAME_M: x86_64\n",
"I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native\n",
"I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native\n",
"I LDFLAGS: \n",
"I CC: cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0\n",
"I CXX: g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0\n",
"\n",
"cc -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -c ggml.c -o ggml.o\n",
"g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c examples/redpajama/gptneox.cpp -o gptneox.o\n",
"g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c examples/redpajama/common-gptneox.cpp -o common-gptneox.o\n",
"g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/redpajama/main-redpajama.cpp ggml.o gptneox.o common-gptneox.o -o redpajama \n",
"\u001b[01m\u001b[Kexamples/redpajama/main-redpajama.cpp:\u001b[m\u001b[K In function ‘\u001b[01m\u001b[Kint main(int, char**)\u001b[m\u001b[K’:\n",
"\u001b[01m\u001b[Kexamples/redpajama/main-redpajama.cpp:291:10:\u001b[m\u001b[K \u001b[01;35m\u001b[Kwarning: \u001b[m\u001b[Kvariable ‘\u001b[01m\u001b[Kis_antiprompt\u001b[m\u001b[K’ set but not used [\u001b[01;35m\u001b[K-Wunused-but-set-variable\u001b[m\u001b[K]\n",
" 291 | bool \u001b[01;35m\u001b[Kis_antiprompt\u001b[m\u001b[K = false;\n",
" | \u001b[01;35m\u001b[K^~~~~~~~~~~~~\u001b[m\u001b[K\n",
"\n",
"==== Run ./redpajama -h for help. ====\n",
"\n",
"g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/redpajama/main-redpajama-chat.cpp ggml.o gptneox.o common-gptneox.o -o redpajama-chat \n",
"\u001b[01m\u001b[Kexamples/redpajama/main-redpajama-chat.cpp:\u001b[m\u001b[K In function ‘\u001b[01m\u001b[Kint main(int, char**)\u001b[m\u001b[K’:\n",
"\u001b[01m\u001b[Kexamples/redpajama/main-redpajama-chat.cpp:294:23:\u001b[m\u001b[K \u001b[01;35m\u001b[Kwarning: \u001b[m\u001b[Kcomparison of integer expressions of different signedness: ‘\u001b[01m\u001b[Kint\u001b[m\u001b[K’ and ‘\u001b[01m\u001b[Klong unsigned int\u001b[m\u001b[K’ [\u001b[01;35m\u001b[K-Wsign-compare\u001b[m\u001b[K]\n",
" 294 | while (\u001b[01;35m\u001b[Kn_past < inp_size\u001b[m\u001b[K) {\n",
" | \u001b[01;35m\u001b[K~~~~~~~^~~~~~~~~~\u001b[m\u001b[K\n",
"\u001b[01m\u001b[Kexamples/redpajama/main-redpajama-chat.cpp:296:41:\u001b[m\u001b[K \u001b[01;35m\u001b[Kwarning: \u001b[m\u001b[Kcomparison of integer expressions of different signedness: ‘\u001b[01m\u001b[Kint32_t\u001b[m\u001b[K’ {aka ‘\u001b[01m\u001b[Kint\u001b[m\u001b[K’} and ‘\u001b[01m\u001b[Klong unsigned int\u001b[m\u001b[K’ [\u001b[01;35m\u001b[K-Wsign-compare\u001b[m\u001b[K]\n",
" 296 | int n_eval = \u001b[01;35m\u001b[Kparams.n_batch < remaining\u001b[m\u001b[K ? params.n_batch : remaining;\n",
" | \u001b[01;35m\u001b[K~~~~~~~~~~~~~~~^~~~~~~~~~~\u001b[m\u001b[K\n",
"\u001b[01m\u001b[Kexamples/redpajama/main-redpajama-chat.cpp:389:38:\u001b[m\u001b[K \u001b[01;35m\u001b[Kwarning: \u001b[m\u001b[Kcomparison of integer expressions of different signedness: ‘\u001b[01m\u001b[Kstd::vector<int>::size_type\u001b[m\u001b[K’ {aka ‘\u001b[01m\u001b[Klong unsigned int\u001b[m\u001b[K’} and ‘\u001b[01m\u001b[Kint32_t\u001b[m\u001b[K’ {aka ‘\u001b[01m\u001b[Kint\u001b[m\u001b[K’} [\u001b[01;35m\u001b[K-Wsign-compare\u001b[m\u001b[K]\n",
" 389 | if (\u001b[01;35m\u001b[Klast_n_tokens.size() > params.repeat_last_n\u001b[m\u001b[K) {\n",
" | \u001b[01;35m\u001b[K~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~\u001b[m\u001b[K\n",
"\u001b[01m\u001b[Kexamples/redpajama/main-redpajama-chat.cpp:188:19:\u001b[m\u001b[K \u001b[01;35m\u001b[Kwarning: \u001b[m\u001b[Kunused variable ‘\u001b[01m\u001b[Ktop_k\u001b[m\u001b[K’ [\u001b[01;35m\u001b[K-Wunused-variable\u001b[m\u001b[K]\n",
" 188 | const int32_t \u001b[01;35m\u001b[Ktop_k\u001b[m\u001b[K = params.top_k;\n",
" | \u001b[01;35m\u001b[K^~~~~\u001b[m\u001b[K\n",
"\u001b[01m\u001b[Kexamples/redpajama/main-redpajama-chat.cpp:189:19:\u001b[m\u001b[K \u001b[01;35m\u001b[Kwarning: \u001b[m\u001b[Kunused variable ‘\u001b[01m\u001b[Ktop_p\u001b[m\u001b[K’ [\u001b[01;35m\u001b[K-Wunused-variable\u001b[m\u001b[K]\n",
" 189 | const float \u001b[01;35m\u001b[Ktop_p\u001b[m\u001b[K = params.top_p;\n",
" | \u001b[01;35m\u001b[K^~~~~\u001b[m\u001b[K\n",
"\u001b[01m\u001b[Kexamples/redpajama/main-redpajama-chat.cpp:190:19:\u001b[m\u001b[K \u001b[01;35m\u001b[Kwarning: \u001b[m\u001b[Kunused variable ‘\u001b[01m\u001b[Ktemp\u001b[m\u001b[K’ [\u001b[01;35m\u001b[K-Wunused-variable\u001b[m\u001b[K]\n",
" 190 | const float \u001b[01;35m\u001b[Ktemp\u001b[m\u001b[K = params.temp;\n",
" | \u001b[01;35m\u001b[K^~~~\u001b[m\u001b[K\n",
"\u001b[01m\u001b[Kexamples/redpajama/main-redpajama-chat.cpp:191:19:\u001b[m\u001b[K \u001b[01;35m\u001b[Kwarning: \u001b[m\u001b[Kunused variable ‘\u001b[01m\u001b[Krepeat_penalty\u001b[m\u001b[K’ [\u001b[01;35m\u001b[K-Wunused-variable\u001b[m\u001b[K]\n",
" 191 | const float \u001b[01;35m\u001b[Krepeat_penalty\u001b[m\u001b[K = params.repeat_penalty;\n",
" | \u001b[01;35m\u001b[K^~~~~~~~~~~~~~\u001b[m\u001b[K\n",
"\n",
"==== Run ./redpajama-chat -h for help. ====\n",
"\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": []
},
"metadata": {},
"execution_count": 20
}
]
},
{
"cell_type": "code",
"source": [
"%%shell\n",
"\n",
"pushd redpajama.cpp >& /dev/null\n",
"\n",
"prompt=\"ユーザー: 世界的に有名な日本の映画にはどのようなものがありますか?<NL>システム: \"\n",
"\n",
"./redpajama -s 10 -m /content/rinna_model_ggml/ggml-japanese-gpt-neox-3.6b-instruction-ppo-f16.bin --no-mmap -t 2 -p \"$prompt\"\n",
"\n",
"popd >& /dev/null"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "lFaSYEkgeody",
"outputId": "4ab8dd0c-836b-4750-b5c5-6685ad8e6484"
},
"execution_count": 27,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"main: seed = 10\n",
"gptneox.cpp: loading model from /content/rinna_model_ggml/ggml-japanese-gpt-neox-3.6b-instruction-ppo-f16.bin\n",
"gptneox_model_load_internal: format = ggmf v1 (old version with no mmap support)\n",
"gptneox_model_load_internal: n_vocab = 32000\n",
"gptneox_model_load_internal: n_ctx = 512\n",
"gptneox_model_load_internal: n_embd = 2816\n",
"gptneox_model_load_internal: n_head = 22\n",
"gptneox_model_load_internal: n_layer = 36\n",
"gptneox_model_load_internal: n_rot = 128\n",
"gptneox_model_load_internal: use_parallel_residual = 0\n",
"gptneox_model_load_internal: ftype = 1 (mostly F16)\n",
"gptneox_model_load_internal: n_parts = 1\n",
"gptneox_model_load_internal: model size = 12B\n",
"gptneox_model_load_internal: ggml ctx size = 7048088.19 KiB\n",
"gptneox_model_load_internal: mem required = 8930.90 MiB (+ 1608.00 MiB per state)\n",
"..................................................................................................\n",
".\n",
".\n",
"gptneox_init_from_file: kv self size = 198.00 MiB\n",
"\n",
"system_info: n_threads = 2 / 2 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | \n",
"sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000\n",
"generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0\n",
"\n",
"\n",
"ユーザー:<0x1C>世界的に有名な日本の映画にはどのようなものがありますか<0xEB><0xB8><0x9B><NL>システム:<0x1C>ん、日本映画はとても有名です。最も有名なものとしては、黒澤明監督の「羅生門」や溝口健二監督の「雨月物語」などがあります。これらは、世界中の多くの国で上映されています。また、宮崎駿監督のアニメ映画である「千と千尋の神隠し」も非常に人気があり、世界中で1000万枚以上の売り上げを記録しています。日本映画は、世界的に有名な日本の伝統文化の重要な部分であり、重要な役割を果たしています。</s>1つの目標にコミットし、成果を出すこと。成功するためには、一貫性が重要です。目標を明確にし、進捗状況を追跡することが大切です。\n",
"\n",
"\n",
"gptneox_print_timings: load time = 36223.65 ms\n",
"gptneox_print_timings: sample time = 85.74 ms / 128 runs ( 0.67 ms per run)\n",
"gptneox_print_timings: prompt eval time = 3908.38 ms / 23 tokens ( 169.93 ms per token)\n",
"gptneox_print_timings: eval time = 61449.48 ms / 127 runs ( 483.85 ms per run)\n",
"gptneox_print_timings: total time = 97792.79 ms\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": []
},
"metadata": {},
"execution_count": 27
}
]
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "V4ICom8xgN1G"
},
"execution_count": null,
"outputs": []
}
]
}
jnory commented Jul 4, 2023

https://github.com/togethercomputer/redpajama.cpp is a tool for running https://huggingface.co/rinna/japanese-gpt-neox-3.6b-instruction-ppo.
It ships with a script that converts the model from the transformers format to the ggml format, but that script loads the entire model into memory at once, which exceeds the memory limit of the free Colab tier.
As a workaround, I applied a patch so that the model is read in while keeping memory usage low.
As a result, I was able to convert the rinna model to ggml on the free, GPU-less Colab runtime.
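The low-memory idea can be sketched as follows: instead of materializing the whole converted model in memory, stream one tensor at a time to the output file and release it before reading the next. This is a minimal illustration only; `write_tensor`, the header layout, and the `state_dict_items` iterable are simplified stand-ins, not the actual redpajama.cpp ggml conversion script or file format.

```python
import struct
import numpy as np

def write_tensor(fout, name, tensor):
    """Write one tensor record: (n_dims, name_len), dims, name, raw fp16 data.

    The layout loosely mirrors ggml-style per-tensor headers but is
    illustrative, not the real redpajama.cpp format.
    """
    data = tensor.astype(np.float16)
    encoded = name.encode("utf-8")
    fout.write(struct.pack("ii", len(data.shape), len(encoded)))
    for dim in reversed(data.shape):
        fout.write(struct.pack("i", dim))
    fout.write(encoded)
    data.tofile(fout)

def convert_streaming(state_dict_items, path):
    """Convert tensors to fp16 one at a time, keeping peak memory low.

    state_dict_items should yield (name, ndarray) pairs lazily (e.g. read
    from the checkpoint tensor-by-tensor), so only one tensor is resident
    in memory at any moment.
    """
    with open(path, "wb") as fout:
        for name, tensor in state_dict_items:
            write_tensor(fout, name, tensor)
            del tensor  # release before loading the next tensor
```

The key design point is that the iterable is consumed lazily: as long as the checkpoint reader yields tensors one by one rather than loading a full state dict, peak memory stays near the size of the largest single tensor.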
