Created September 23, 2024 13:32
minai_finetuning
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# minai_ft\n\n> Add utils for finetuning with minai"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "#| default_exp minai_utils",
"execution_count": 1,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "#| hide\nfrom nbdev.showdoc import *",
"execution_count": 2,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "%ai reset",
"execution_count": 3,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "from toolslm.download import read_docs\nimport datasets, transformers, pandas as pd",
"execution_count": 4,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "def read_gist(s): return read_docs('https://gist.githubusercontent.com/rbiswasfc/'+s)",
"execution_count": 5,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "style_guide = read_docs('https://docs.fast.ai/dev/style.md')\nminai_doc = read_gist(\"576026f405ad696a377ce59d67582bc9/raw/9df6b76c2dd675f118a63205958e8503cfce8a85/minai.md\")\nminai_dev_nbs = read_gist(\"3fa98242cb0446063f421f89f4c7cfc2/raw/bc664bfdceb2ed6f595fee6d337c600a7469b5bf/notebooks.md\")",
"execution_count": 6,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "%%ai 0 -c\nPlease carefully read $`style_guide` and $`minai_doc`.\nWe will be finetuning LLMs using minai, following the fastai style guide.\n\nSay OK if you have understood the documents.",
"execution_count": 7,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "OK. I have carefully read and understood the `style_guide` and `minai_doc` documents. I'm ready to assist with finetuning LLMs using minai while adhering to the fastai style guide."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "%%ai 0 -c\n$`minai_dev_nbs` will be a good reference too. Read them carefully and say OK.",
"execution_count": 8,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "OK. I have carefully read and understood the `minai_dev_nbs` document, which contains the notebooks developed in the Practical Deep Learning for Coders course. This provides valuable context and examples for implementing deep learning techniques using the minai library. I'm now well-prepared to assist with finetuning LLMs using minai while adhering to the fastai style guide and drawing on the concepts demonstrated in these notebooks."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "%%aip 0\nset cuda device to 1",
"execution_count": 9,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import torch\n\ntorch.cuda.set_device(1)\n\ncurrent_device = torch.cuda.current_device()\nprint(f\"Current CUDA device: {current_device}\")",
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"text": "Current CUDA device: 1\n",
"name": "stdout"
}
]
},
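{
"metadata": {},
"cell_type": "markdown",
"source": "A minimal sketch of a more defensive device choice, assuming you want the notebook to fall back gracefully on machines with fewer GPUs; `pick_device` is a hypothetical helper, not part of torch or minai. Note that `torch.cuda.set_device` only changes this process's default CUDA device: `transformers.Trainer` can still wrap the model in `DataParallel` across every visible GPU, so setting `CUDA_VISIBLE_DEVICES` before importing torch is the usual way to restrict that."
},
{
"metadata": {},
"cell_type": "code",
"source": "import torch\n\n# Hypothetical helper (not a torch or minai API): prefer GPU `idx`, else GPU 0, else the CPU\ndef pick_device(idx=1):\n    if not torch.cuda.is_available(): return torch.device('cpu')\n    if idx >= torch.cuda.device_count(): idx = 0\n    torch.cuda.set_device(idx)\n    return torch.device('cuda', idx)\n\ndevice = pick_device(1)\nprint(device)",
"execution_count": null,
"outputs": []
},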
{
"metadata": {},
"cell_type": "markdown",
"source": "# Data\n\nClassify arXiv papers as papers from the Answer.AI Zotero library (label 1) or other papers (label 0, tagged `HF` in the raw data)."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "datasets.logging.set_verbosity_error()\ntransformers.logging.set_verbosity_error()\n\ndf = datasets.load_dataset(\"rbiswasfc/arxiv-papers\")['train'].to_pandas()\ndf.sample()",
"execution_count": 11,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 11,
"data": {
"text/plain": " arxiv_id label title \\\n1735 2404.07839 HF RecurrentGemma: Moving Past Transformers for E... \n\n authors published \\\n1735 [Aleksandar Botev, Soham De, Samuel L Smith, A... 2024-04-11 \n\n abstract doi \\\n1735 We introduce RecurrentGemma, a family of open ... None \n\n primary_category categories \n1735 cs.LG [cs.LG, cs.AI, cs.CL] ",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>arxiv_id</th>\n <th>label</th>\n <th>title</th>\n <th>authors</th>\n <th>published</th>\n <th>abstract</th>\n <th>doi</th>\n <th>primary_category</th>\n <th>categories</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>1735</th>\n <td>2404.07839</td>\n <td>HF</td>\n <td>RecurrentGemma: Moving Past Transformers for E...</td>\n <td>[Aleksandar Botev, Soham De, Samuel L Smith, A...</td>\n <td>2024-04-11</td>\n <td>We introduce RecurrentGemma, a family of open ...</td>\n <td>None</td>\n <td>cs.LG</td>\n <td>[cs.LG, cs.AI, cs.CL]</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Temporal split: papers from 2023 (on or after 2023-05-05) for training, papers from 2024 onward for validation\n\ndf['published'] = pd.to_datetime(df['published'])\ndf = df[df['published'] >= '2023-05-05'].copy()\ndf['label'] = (df['label'] != 'HF').astype(int)  # HF -> 0, everything else -> 1\n\ntrain_df = df[df['published'].dt.year < 2024].reset_index(drop=True)\ntest_df = df[df['published'].dt.year >= 2024].reset_index(drop=True)\n\ntrain_df.shape, test_df.shape",
"execution_count": 12,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 12,
"data": {
"text/plain": "((1814, 9), (2346, 9))"
},
"metadata": {}
}
]
},
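{
"metadata": {},
"cell_type": "markdown",
"source": "The split above is temporal: papers published from 2023-05-05 through the end of 2023 form the training set and papers from 2024 onward the validation set, and every source other than `HF` becomes the positive class. A minimal sketch of the same logic on a hypothetical toy frame (the `'zotero'` label value is made up), taking a `.copy()` of the filtered frame so the label assignment doesn't raise pandas' `SettingWithCopyWarning`, and printing the split sizes and the positive rate per split:"
},
{
"metadata": {},
"cell_type": "code",
"source": "import pandas as pd\n\n# Hypothetical toy data standing in for the arxiv-papers dataset\ntoy = pd.DataFrame({\n    'published': pd.to_datetime(['2023-03-01', '2023-06-10', '2023-11-20', '2024-01-15', '2024-04-11']),\n    'label': ['HF', 'HF', 'zotero', 'HF', 'zotero']})\n\ntoy = toy[toy['published'] >= '2023-05-05'].copy()\ntoy['label'] = (toy['label'] != 'HF').astype(int)  # HF -> 0, anything else -> 1\n\ntrn = toy[toy['published'].dt.year < 2024]\nval = toy[toy['published'].dt.year >= 2024]\nprint(len(trn), len(val))\nprint(trn['label'].mean(), val['label'].mean())  # fraction of positives per split",
"execution_count": null,
"outputs": []
},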
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "%%aip 0\nwrite a fn to get input text by concatenating title, categories, primary_category, authors and abstract",
"execution_count": 13,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "def get_input_text(row):\n    \"Concatenate the paper metadata into one input string, one field per line.\"\n    return f\"Title: {row['title']}\\nCategories: {' '.join(row['categories'])}\\nPrimary Category: {row['primary_category']}\\nAuthors: {' '.join(row['authors'])}\\nAbstract: {row['abstract']}\"\n\ntrain_df['input_text'] = train_df.apply(get_input_text, axis=1)\ntest_df['input_text'] = test_df.apply(get_input_text, axis=1)\n\nprint(train_df['input_text'].iloc[0][:500])",
"execution_count": 14,
"outputs": [
{
"output_type": "stream",
"text": "Title: AppAgent: Multimodal Agents as Smartphone Users\nCategories: cs.CV\nPrimary Category: cs.CV\nAuthors: Chi Zhang Zhao Yang Jiaxuan Liu Yucheng Han Xin Chen Zebiao Huang Bin Fu Gang Yu\nAbstract: Recent advancements in large language models (LLMs) have led to the creation\nof intelligent agents capable of performing complex tasks. This paper\nintroduces a novel LLM-based multimodal agent framework designed to operate\nsmartphone applications. Our framework enables the agent to operate smartphone\na\n",
"name": "stdout"
}
]
},
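{
"metadata": {},
"cell_type": "markdown",
"source": "A self-contained sketch of the input format on a hypothetical row (all values are made up), with each field on its own line, plus a check that the `Authors:` field starts a new line:"
},
{
"metadata": {},
"cell_type": "code",
"source": "import pandas as pd\n\ndef get_input_text(row):\n    return (f\"Title: {row['title']}\\nCategories: {' '.join(row['categories'])}\\n\"\n            f\"Primary Category: {row['primary_category']}\\nAuthors: {' '.join(row['authors'])}\\n\"\n            f\"Abstract: {row['abstract']}\")\n\n# Hypothetical example row; every value is made up\nrow = pd.Series({'title': 'A Toy Paper', 'categories': ['cs.LG', 'cs.CL'], 'primary_category': 'cs.LG',\n                 'authors': ['Ada Lovelace', 'Alan Turing'], 'abstract': 'We study toys.'})\ntext = get_input_text(row)\nprint(text)\nassert text.splitlines()[3] == 'Authors: Ada Lovelace Alan Turing'",
"execution_count": null,
"outputs": []
},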
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "df.sample()",
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 15,
"data": {
"text/plain": " arxiv_id label title \\\n4151 2306.00238 0 Bytes Are All You Need: Transformers Operating... \n\n authors published \\\n4151 [Maxwell Horton, Sachin Mehta, Ali Farhadi, Mo... 2023-05-31 \n\n abstract doi \\\n4151 Modern deep learning approaches usually utiliz... None \n\n primary_category categories \n4151 cs.CV [cs.CV] ",
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>arxiv_id</th>\n <th>label</th>\n <th>title</th>\n <th>authors</th>\n <th>published</th>\n <th>abstract</th>\n <th>doi</th>\n <th>primary_category</th>\n <th>categories</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>4151</th>\n <td>2306.00238</td>\n <td>0</td>\n <td>Bytes Are All You Need: Transformers Operating...</td>\n <td>[Maxwell Horton, Sachin Mehta, Ali Farhadi, Mo...</td>\n <td>2023-05-31</td>\n <td>Modern deep learning approaches usually utiliz...</td>\n <td>None</td>\n <td>cs.CV</td>\n <td>[cs.CV]</td>\n </tr>\n </tbody>\n</table>\n</div>"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "# HF Trainer version"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import warnings\nwarnings.filterwarnings('ignore')",
"execution_count": 19,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "from transformers import AutoTokenizer\nfrom torch.utils.data import Dataset, DataLoader\nimport torch\n\nclass ArxivDataset(Dataset):\n    \"Tokenize each text on the fly, padding or truncating it to `max_length` tokens.\"\n    def __init__(self, texts, labels, tokenizer, max_length=512):\n        self.texts, self.labels = texts, labels\n        self.tokenizer, self.max_length = tokenizer, max_length\n\n    def __len__(self): return len(self.texts)\n\n    def __getitem__(self, idx):\n        enc = self.tokenizer(self.texts[idx], add_special_tokens=True, max_length=self.max_length, return_token_type_ids=False,\n                             padding='max_length', truncation=True, return_attention_mask=True, return_tensors='pt')\n        return {'input_ids': enc['input_ids'].flatten(), 'attention_mask': enc['attention_mask'].flatten(),\n                'labels': torch.tensor(self.labels[idx], dtype=torch.long)}\n\ntokenizer = AutoTokenizer.from_pretrained(\"microsoft/deberta-v3-base\")\n\ntrain_dataset = ArxivDataset(train_df['input_text'].tolist(), train_df['label'].tolist(), tokenizer)\nval_dataset = ArxivDataset(test_df['input_text'].tolist(), test_df['label'].tolist(), tokenizer)\n\n# Trainer builds its own loaders from the datasets; these two are only used to report batch counts\ntrain_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)\nval_loader = DataLoader(val_dataset, batch_size=8, shuffle=False)\n\nprint(f\"Number of training batches: {len(train_loader)}\")\nprint(f\"Number of validation batches: {len(val_loader)}\")",
"execution_count": 20,
"outputs": [
{
"output_type": "stream",
"text": "Number of training batches: 227\nNumber of validation batches: 294\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "%%time\nimport numpy as np\nfrom transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer\nfrom sklearn.metrics import f1_score, roc_auc_score\nimport torch\n\ntorch.cuda.set_device(1)\n\ndef compute_metrics(eval_pred):\n    \"F1 on the argmax predictions; AUC ranked by the raw class-1 logit.\"\n    logits, labels = eval_pred\n    predictions = np.argmax(logits, axis=-1)\n    f1 = f1_score(labels, predictions)\n    auc = roc_auc_score(labels, logits[:, 1])\n    return {\"f1\": f1, \"auc\": auc}\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\"microsoft/deberta-v3-base\", num_labels=2)\n\ntraining_args = TrainingArguments(\n    output_dir=\"./results\",\n    num_train_epochs=5,\n    learning_rate=3e-5,\n    per_device_train_batch_size=8,\n    per_device_eval_batch_size=8,\n    warmup_steps=10,\n    weight_decay=0.001,\n    log_level='error',\n    logging_dir=\"./logs\",\n    logging_strategy=\"no\",\n    eval_strategy=\"no\",\n    save_strategy=\"no\",\n    report_to='none',\n    load_best_model_at_end=False,\n)\n\ntrainer = Trainer(\n    model=model,\n    args=training_args,\n    train_dataset=train_dataset,\n    eval_dataset=val_dataset,\n    compute_metrics=compute_metrics,\n)\n\ntrainer.train()",
"execution_count": 30,
"outputs": [
{
"output_type": "stream",
"text": "{'train_runtime': 132.3036, 'train_samples_per_second': 68.554, 'train_steps_per_second': 1.096, 'train_loss': 0.4557026830212823, 'epoch': 5.0}\nCPU times: user 7min 20s, sys: 41.7 s, total: 8min 2s\nWall time: 2min 14s\n",
"name": "stdout"
},
{
"output_type": "execute_result",
"execution_count": 30,
"data": {
"text/plain": "TrainOutput(global_step=145, training_loss=0.4557026830212823, metrics={'train_runtime': 132.3036, 'train_samples_per_second': 68.554, 'train_steps_per_second': 1.096, 'train_loss': 0.4557026830212823, 'epoch': 5.0})"
},
"metadata": {}
}
]
},
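{
"metadata": {},
"cell_type": "markdown",
"source": "A quick arithmetic check on the logged step count. With `per_device_train_batch_size=8` on one GPU, 1814 training examples need 227 steps per epoch (1135 over 5 epochs), yet `Trainer` reports `global_step=145`. The sketch below assumes `Trainer`'s effective batch is `per_device_train_batch_size * n_gpu` with no gradient accumulation; under that assumption, 8 data-parallel GPUs is the only count that gives 145 steps. The GPU count is inferred from these numbers, not recorded in the notebook."
},
{
"metadata": {},
"cell_type": "code",
"source": "import math\n\nn_train, epochs, per_device_bs, logged_steps = 1814, 5, 8, 145\n\n# Optimizer steps for `n_gpu` data-parallel GPUs, assuming no gradient accumulation and a kept last partial batch\ndef total_steps(n_gpu): return math.ceil(n_train / (per_device_bs * n_gpu)) * epochs\n\nfor n_gpu in range(1, 9): print(n_gpu, total_steps(n_gpu))\nassert total_steps(1) == 1135 and total_steps(8) == logged_steps",
"execution_count": null,
"outputs": []
},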
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import json\nprint(f\"Evaluation results:\\n{json.dumps({k: round(v, 4) if isinstance(v, float) else v for k, v in trainer.evaluate().items()}, indent=4)}\")",
"execution_count": 31,
"outputs": [
{
"output_type": "stream",
"text": "{'eval_loss': 0.3534512221813202, 'eval_f1': 0.47, 'eval_auc': 0.8365393035013532, 'eval_runtime': 19.2393, 'eval_samples_per_second': 121.938, 'eval_steps_per_second': 1.923, 'epoch': 5.0}\nEvaluation results:\n{\n \"eval_loss\": 0.3535,\n \"eval_f1\": 0.47,\n \"eval_auc\": 0.8365,\n \"eval_runtime\": 19.2393,\n \"eval_samples_per_second\": 121.938,\n \"eval_steps_per_second\": 1.923,\n \"epoch\": 5.0\n}\n",
"name": "stdout"
}
]
},
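{
"metadata": {},
"cell_type": "markdown",
"source": "Both `compute_metrics` above and `MetricsCB` in the minai version rank examples by the class-1 logit alone (the sigmoid in `MetricsCB` does not change the ranking). With a 2-way softmax, however, P(class 1) depends on `logits[:, 1] - logits[:, 0]`, so ranking by `logits[:, 1]` can give a different AUC. A small sketch with made-up logits; in `compute_metrics` a probability-based score would be `softmax(logits, axis=-1)[:, 1]` (with `scipy.special.softmax`), and in `MetricsCB` it would be `learn.preds.logits.softmax(-1)[:, 1]`."
},
{
"metadata": {},
"cell_type": "code",
"source": "import numpy as np\nfrom scipy.special import softmax\nfrom sklearn.metrics import roc_auc_score\n\n# Made-up logits for 4 examples (columns: class 0, class 1) and their true labels\nlogits = np.array([[2.0, 1.5], [0.0, 1.0], [-1.0, 0.5], [3.0, 2.0]])\nlabels = np.array([0, 1, 1, 0])\n\nauc_logit = roc_auc_score(labels, logits[:, 1])                   # rank by the class-1 logit only\nauc_prob = roc_auc_score(labels, softmax(logits, axis=-1)[:, 1])  # rank by P(class 1)\nprint(auc_logit, auc_prob)",
"execution_count": null,
"outputs": []
},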
{
"metadata": {},
"cell_type": "markdown",
"source": "# minai version"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "%%aip 0\nwe will use `microsoft/deberta-v3-base` for modelling. \nLet's prepare the datasets. follow the style guide and what's needed for minai.\nno need to give unnecessary comments",
"execution_count": 32,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "from datasets import Dataset as HFDataset\nfrom transformers import AutoTokenizer\n\ndef prepare_dataset(df, tokenizer, max_length=512):\n    \"Convert `df` to a HF `Dataset` and tokenize `input_text`, padding every example to `max_length` tokens.\"\n    dataset = HFDataset.from_pandas(df[['input_text', 'label']])\n    def tokenize(batch): return tokenizer(batch['input_text'], truncation=True, padding=\"max_length\", max_length=max_length)\n    return dataset.map(tokenize, batched=True, remove_columns=['input_text'])\n\ntokenizer = AutoTokenizer.from_pretrained('microsoft/deberta-v3-base')\ntrain_ds = prepare_dataset(train_df, tokenizer)\ntest_ds = prepare_dataset(test_df, tokenizer)",
"execution_count": 33,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "%%aip 0\nWrite the data collator and build the dataloaders.\nCareful that we didn't pad the input ids and attention masks.",
"execution_count": 34,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "from torch.utils.data import DataLoader\nfrom minai.core import DataLoaders\nfrom transformers import DataCollatorWithPadding\n\nclass TextCollator(DataCollatorWithPadding):\n    \"Collate HF examples into an `(input_ids, attention_mask, labels)` tuple for minai.\"\n    def __call__(self, examples):\n        # Examples are already padded to 512 tokens by `prepare_dataset`, so the parent class mostly\n        # stacks them into tensors (and renames `label` to `labels`)\n        batch = super().__call__(examples)\n        return (batch[\"input_ids\"], batch[\"attention_mask\"], batch[\"labels\"])\n\ncollator = TextCollator(tokenizer=tokenizer)\n\ndef get_dls(train_ds, test_ds, collator, bs=8):\n    train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True, collate_fn=collator)\n    test_dl = DataLoader(test_ds, batch_size=bs, collate_fn=collator)\n    return DataLoaders(train_dl, test_dl)\n\ndls = get_dls(train_ds, test_ds, collator)\nbatch = next(iter(dls.train))\nbatch",
"execution_count": 35,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 35,
"data": {
"text/plain": "(tensor([[ 1, 7181, 294, ..., 0, 0, 0],\n [ 1, 7181, 294, ..., 0, 0, 0],\n [ 1, 7181, 294, ..., 0, 0, 0],\n ...,\n [ 1, 7181, 294, ..., 0, 0, 0],\n [ 1, 7181, 294, ..., 0, 0, 0],\n [ 1, 7181, 294, ..., 0, 0, 0]]),\n tensor([[1, 1, 1, ..., 0, 0, 0],\n [1, 1, 1, ..., 0, 0, 0],\n [1, 1, 1, ..., 0, 0, 0],\n ...,\n [1, 1, 1, ..., 0, 0, 0],\n [1, 1, 1, ..., 0, 0, 0],\n [1, 1, 1, ..., 0, 0, 0]]),\n tensor([1, 0, 0, 0, 0, 0, 0, 1]))"
},
"metadata": {}
}
]
},
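{
"metadata": {},
"cell_type": "markdown",
"source": "`prepare_dataset` pads every example to `max_length=512`, so each batch is 512 tokens wide regardless of the texts in it. Tokenizing without `padding=\"max_length\"` would let the collator pad each batch only to its longest sequence. A minimal pure-PyTorch sketch of that dynamic padding, returning the same `(input_ids, attention_mask, labels)` tuple as `TextCollator`; `pad_collate` is a hypothetical helper, and `pad_id=0` assumes DeBERTa-v3's pad token id, consistent with the zero padding in the batch above."
},
{
"metadata": {},
"cell_type": "code",
"source": "import torch\nfrom torch.nn.utils.rnn import pad_sequence\n\n# Hypothetical collate_fn: pads each batch to its longest sequence instead of a fixed max_length\ndef pad_collate(examples, pad_id=0):\n    ids = [torch.tensor(o['input_ids']) for o in examples]\n    input_ids = pad_sequence(ids, batch_first=True, padding_value=pad_id)\n    lens = torch.tensor([len(o) for o in ids])\n    attention_mask = (torch.arange(input_ids.shape[1])[None] < lens[:, None]).long()\n    labels = torch.tensor([o['label'] for o in examples])\n    return input_ids, attention_mask, labels\n\nxb, mask, yb = pad_collate([{'input_ids': [1, 5, 6, 2], 'label': 1}, {'input_ids': [1, 7, 2], 'label': 0}])\nprint(xb, mask, yb, sep='\\n')",
"execution_count": null,
"outputs": []
},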
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "%%aip 0\nwrite the training and eval code. Let's use auc as eval metric",
"execution_count": 36,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "from transformers import AutoModelForSequenceClassification\nimport torch\nfrom minai.core import *\nfrom torcheval.metrics import BinaryAUROC\n\nclass AUCMetric(BinaryAUROC):\n    \"`BinaryAUROC` whose `compute` returns a Python float.\"\n    def compute(self): return super().compute().item()\n\n# Subclasses `DeviceCB` to reuse its `device` attribute (it therefore also inherits `DeviceCB.before_batch`)\nclass MetricsCB(DeviceCB):\n    def __init__(self, **metrics):\n        super().__init__()\n        self.metrics = {k: v().to(self.device) for k, v in metrics.items()}\n\n    def before_fit(self, learn): learn.metrics = self\n\n    def before_epoch(self, learn):\n        for m in self.metrics.values(): m.reset()\n\n    def after_batch(self, learn):\n        # Score each example by the sigmoid of its class-1 logit; AUC depends only on the ranking\n        for m in self.metrics.values():\n            m.update(learn.preds.logits[:, -1].sigmoid().flatten(), learn.batch[-1].float())\n\n    def after_epoch(self, learn):\n        log = {k: f'{v.compute():.3f}' for k, v in self.metrics.items()}\n        log['epoch'] = learn.epoch\n        log['train'] = 'train' if learn.model.training else 'eval'\n        print(log)",
"execution_count": 37,
"outputs": []
},
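{
"metadata": {},
"cell_type": "markdown",
"source": "`MetricsCB` relies on torcheval's stateful metrics: `reset` at the start of each epoch, `update` after every batch, and `compute` at the end, so each printed AUC covers the whole epoch rather than the last batch. A minimal sketch of that lifecycle with made-up scores:"
},
{
"metadata": {},
"cell_type": "code",
"source": "import torch\nfrom torcheval.metrics import BinaryAUROC\n\nmetric = BinaryAUROC()\nmetric.reset()  # start of epoch\n# Two made-up batches of (scores, targets); update() accumulates across them\nfor scores, targets in [(torch.tensor([0.9, 0.2]), torch.tensor([1., 0.])),\n                        (torch.tensor([0.4, 0.7]), torch.tensor([0., 1.]))]:\n    metric.update(scores, targets)\nprint(metric.compute().item())  # AUC over all 4 examples seen since reset()",
"execution_count": null,
"outputs": []
},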
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "class GradAccumLearner(TrainLearner):\n    \"`TrainLearner` that accumulates gradients over `grad_acc_steps` batches before each optimizer step.\"\n    def __init__(self, model, dls, loss_func, lr=None, cbs=None, opt_func=torch.optim.SGD, epoch_sz=None, n_inp=1, inp_nm=None, lbl_nm=None, preds_nm=None, grad_acc_steps=1):\n        super().__init__(model, dls, loss_func, lr, cbs, opt_func=opt_func, epoch_sz=epoch_sz, n_inp=n_inp, inp_nm=inp_nm, lbl_nm=lbl_nm, preds_nm=preds_nm)\n        assert grad_acc_steps >= 1\n        self.step_count, self.grad_acc_steps = 0, grad_acc_steps\n\n    def get_loss(self): self.loss = self.loss_func(self.preds.logits, self.batch[-1])\n\n    # Scale each micro-batch loss so the accumulated gradient averages over the micro-batches\n    def backward(self): (self.loss / self.grad_acc_steps).backward()\n\n    def step(self):\n        self.step_count += 1\n        if self.step_count % self.grad_acc_steps == 0: self.opt.step()\n\n    def zero_grad(self):\n        if self.step_count % self.grad_acc_steps == 0: self.opt.zero_grad()\n\ndef loss_fn(x, y): return torch.nn.functional.cross_entropy(x.view(-1, x.shape[-1]), y.view(-1))",
"execution_count": 38,
"outputs": []
},
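{
"metadata": {},
"cell_type": "markdown",
"source": "With `grad_acc_steps > 1`, `GradAccumLearner` divides each micro-batch loss by `grad_acc_steps` and only steps and zeroes the optimizer every `grad_acc_steps` batches (the run below keeps the default of 1, i.e. no accumulation). With a mean-reduced loss and equal-sized micro-batches, the accumulated gradient then equals the full-batch gradient. A toy check in plain PyTorch, on a small linear model and random data rather than the DeBERTa setup:"
},
{
"metadata": {},
"cell_type": "code",
"source": "import torch\nimport torch.nn.functional as F\n\ntorch.manual_seed(0)\nlin = torch.nn.Linear(4, 2)\nx, y = torch.randn(8, 4), torch.randint(0, 2, (8,))\n\n# Gradient of one full batch of 8\nlin.zero_grad()\nF.cross_entropy(lin(x), y).backward()\nfull_grad = lin.weight.grad.clone()\n\n# Two micro-batches of 4, each loss scaled by 1/grad_acc_steps; backward() sums their gradients\ngrad_acc_steps = 2\nlin.zero_grad()\nfor xb, yb in zip(x.chunk(grad_acc_steps), y.chunk(grad_acc_steps)):\n    (F.cross_entropy(lin(xb), yb) / grad_acc_steps).backward()\n\nprint(torch.allclose(full_grad, lin.weight.grad, atol=1e-6))",
"execution_count": null,
"outputs": []
},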
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "model = AutoModelForSequenceClassification.from_pretrained('microsoft/deberta-v3-base', num_labels=2)\nmodel.gradient_checkpointing_enable()\nopt_func = torch.optim.Adam\ncbs = [DeviceCB(), MetricsCB(auc=AUCMetric), ProgressCB(plot=True)]\n\nlearn = GradAccumLearner(model, dls, loss_func=loss_fn, lr=1e-5, cbs=cbs, n_inp=2, preds_nm='logits', opt_func=opt_func)\n\nlearn.fit(3)",
"execution_count": 40,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/plain": "<Figure size 600x400 with 1 Axes>"
},
"metadata": {}
},
{
"output_type": "stream",
"text": "{'auc': '0.677', 'epoch': 0, 'train': 'train'}\n{'auc': '0.828', 'epoch': 0, 'train': 'eval'}\n{'auc': '0.822', 'epoch': 1, 'train': 'train'}\n{'auc': '0.831', 'epoch': 1, 'train': 'eval'}\n{'auc': '0.841', 'epoch': 2, 'train': 'train'}\n{'auc': '0.837', 'epoch': 2, 'train': 'eval'}\n",
"name": "stdout"
}
]
},
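{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# minai reaches a validation AUC of 0.837 after 3 epochs, essentially matching the HF Trainer's 0.8365 after 5 epochs.\n# The setups differ: the HF run used lr=3e-5 with warmup and weight decay, and its 145 logged steps imply an effective\n# batch size of 64 (likely data-parallel over several GPUs); here it is lr=1e-5, plain Adam, 3 epochs and batch size 8.",
"execution_count": 28,
"outputs": []
},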
{
"metadata": {},
"cell_type": "markdown",
"source": "### Next\n- Refactor\n- Sanity checks\n- Look into activations, gradient stats"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "#| hide\nimport nbdev; nbdev.nbdev_export()",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"gist": {
"id": "0ab31ef72e32994fd63c1680bc5c81fe",
"data": {
"description": "minai_finetuning.ipynb",
"public": true
}
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3 (ipykernel)",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.10.14",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"_draft": {
"nbviewer_url": "https://gist.github.com/rbiswasfc/0ab31ef72e32994fd63c1680bc5c81fe"
}
},
"nbformat": 4,
"nbformat_minor": 4
}