@mantasu
Last active January 29, 2024 17:15
How to use Colab GPUs to train on your own data by uploading it to Kaggle
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# From Kaggle to Colab\n",
"\n",
"This guide shows how to use Colab GPUs with your own data uploaded to Kaggle. Before starting, make sure a GPU is selected under `Runtime -> Change runtime type`."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Configuration\n",
"\n",
"Step-by-step runtime configuration:\n",
"1. **Mounting**: mount your Google Drive to load and save model checkpoints\n",
"2. **Kaggle API**: add your *Kaggle API* credentials so that `kaggle` commands, such as dataset downloads, work\n",
"3. **Dataset**: download and unzip your *Kaggle* dataset\n",
"4. **Dependencies**: install missing `pip` packages and upgrade outdated ones\n",
"5. **Fix libraries**: if a library fails, e.g., because it uses deprecated code that upgrading does not fix, patch its code manually\n",
"6. **Autoreload**: enable `%autoreload` so that imported libraries are reloaded whenever their code changes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"with open(\"kaggle.json\", 'w') as f:\n",
"    # Enter your own Kaggle username and API key here\n",
"    json.dump({\"username\": \"[kaggle-username]\", \"key\": \"[kaggle-API-key]\"}, f)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!mkdir -p ~/.kaggle\n",
"!mv kaggle.json ~/.kaggle/\n",
"!chmod 600 ~/.kaggle/kaggle.json\n",
"!kaggle datasets download [kaggle-username]/[dataset-name] --unzip"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install \"jsonargparse[signatures]\" pytorch_lightning torchmetrics\n",
"!pip install --upgrade numpy scipy"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def fix_library_code(lib_filepath: str, old_str: str, new_str: str):\n",
"    with open(lib_filepath, 'r') as file:\n",
"        # Read library file contents\n",
"        file_contents = file.read()\n",
"\n",
"    # Replace all instances of old code with new code\n",
"    file_contents = file_contents.replace(old_str, new_str)\n",
"\n",
"    with open(lib_filepath, 'w') as file:\n",
"        # Write modified contents\n",
"        file.write(file_contents)\n",
"\n",
"# Define which library has broken code and what needs to be replaced\n",
"# (newer SciPy versions dropped the deprecated `Delaunay.vertices`; `simplices` replaces it)\n",
"# Note: the path depends on the runtime's Python version\n",
"LIB_FILEPATH = \"/usr/local/lib/python3.10/dist-packages/skimage/transform/_geometric.py\"\n",
"OLD_STR = \"_tesselation.vertices\"\n",
"NEW_STR = \"_tesselation.simplices\"\n",
"\n",
"# Fix the library code (remember to use %autoreload!)\n",
"fix_library_code(LIB_FILEPATH, OLD_STR, NEW_STR)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Always reload when libraries have changed\n",
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Main\n",
"\n",
"Here, define the runtime behavior before running the main code, for example, point the checkpoint paths to `MyDrive`. If the Python script added to this notebook is meant to be run from the command line, its arguments can be set manually through `sys.argv`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"\n",
"# Command-line arguments for the script; `main` must be defined by the code added to this notebook\n",
"kwargs = [\n",
"    \"fit\",\n",
"    \"--root\", \"\",\n",
"    \"--batch-size\", \"64\",\n",
"    \"--num-workers\", \"2\",\n",
"    \"--checkpoint.dirpath\", \"/content/drive/MyDrive/checkpoints/runtime\",\n",
"    \"--ckpt_path\", \"/content/drive/MyDrive/checkpoints/saved/my_checkpoint.ckpt\",\n",
"]\n",
"\n",
"sys.argv = [sys.argv[0]] + kwargs\n",
"main()"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
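To confirm that the runtime chosen under `Runtime -> Change runtime type` really has a GPU, a quick check like the one below can run before any training. It is a minimal sketch that assumes an NVIDIA GPU with `nvidia-smi` on the `PATH`; `gpu_summary` is a hypothetical helper. Once PyTorch is imported, `torch.cuda.is_available()` answers the same question from Python.

```python
import shutil
import subprocess
from typing import Optional


def gpu_summary() -> Optional[str]:
    """Return the output of `nvidia-smi -L`, or None if no NVIDIA GPU is usable."""
    # No nvidia-smi on PATH usually means no NVIDIA driver, hence no usable GPU
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    # A non-zero exit code means the tool could not query any GPU
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


summary = gpu_summary()
print(summary or "No GPU detected: select one under Runtime -> Change runtime type")
```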
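For the **Mounting** step, checkpoints only survive a runtime reset if they are written to the mounted Drive. The hypothetical helper below sketches a guard that fails fast when Drive is not mounted and otherwise creates the checkpoint folder; the folder names mirror the paths in the **Main** section but are otherwise assumptions.

```python
import os
import tempfile


def ensure_checkpoint_dir(drive_root: str, subdir: str) -> str:
    """Create `subdir` inside a mounted Drive folder and return its full path."""
    # Fail early if Drive is not mounted: otherwise the folder would be created
    # on the ephemeral Colab disk and lost when the runtime is recycled
    if not os.path.isdir(drive_root):
        raise FileNotFoundError(f"{drive_root} does not exist; is Google Drive mounted?")
    path = os.path.join(drive_root, subdir)
    os.makedirs(path, exist_ok=True)  # no error if it already exists
    return path


# Demo with a temporary folder standing in for /content/drive/MyDrive
with tempfile.TemporaryDirectory() as fake_drive:
    created = ensure_checkpoint_dir(fake_drive, "checkpoints/runtime")
    created_ok = os.path.isdir(created)
    created_rel = os.path.relpath(created, fake_drive)
```

On Colab, after `drive.mount('/content/drive')`, the call would be `ensure_checkpoint_dir("/content/drive/MyDrive", "checkpoints/runtime")`.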
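The **Kaggle API** step writes `kaggle.json` to the working directory and then moves it and restricts its permissions with shell commands. As a sketch, the same can be done in pure Python, creating the file directly in the config folder with owner-only permissions (matching the notebook's `chmod 600`). `write_kaggle_credentials` is a hypothetical name; the JSON keys `username` and `key` are the ones the notebook uses. Alternatively, the Kaggle CLI also reads the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, which avoids writing a file at all.

```python
import json
import os
import stat
import tempfile


def write_kaggle_credentials(username: str, key: str, config_dir: str) -> str:
    """Write kaggle.json into `config_dir`, readable and writable by the owner only."""
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, "kaggle.json")
    # Request mode 0o600 at creation time so a new file is never group/world-readable
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump({"username": username, "key": key}, f)
    # os.open's mode is ignored for an existing file, so enforce 0o600 explicitly
    os.chmod(path, 0o600)
    return path


# Demo in a temporary folder; on Colab, pass os.path.expanduser("~/.kaggle") instead
with tempfile.TemporaryDirectory() as tmp:
    cred_path = write_kaggle_credentials("[kaggle-username]", "[kaggle-API-key]", tmp)
    with open(cred_path) as f:
        saved = json.load(f)
    mode = stat.S_IMODE(os.stat(cred_path).st_mode)
```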
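For the **Dataset** step, the placeholder `[kaggle-username]/[dataset-name]` has to be replaced with a real dataset slug, and forgetting to do so only shows up as a CLI error. The sketch below validates the slug before calling `kaggle datasets download`, using the CLI's `-p` option to pick the target folder together with `--unzip`. The helper names and the slug pattern are assumptions, not part of the Kaggle API.

```python
import re
import subprocess


def kaggle_download_command(dataset: str, dest: str = ".") -> list[str]:
    """Build the `kaggle datasets download` argv for an `owner/dataset-name` slug."""
    # Reject placeholders such as "[kaggle-username]/[dataset-name]" before calling the CLI
    if not re.fullmatch(r"[\w.-]+/[\w.-]+", dataset):
        raise ValueError(f"expected 'owner/dataset-name', got {dataset!r}")
    # -p sets the download folder; --unzip extracts the archive there
    return ["kaggle", "datasets", "download", dataset, "-p", dest, "--unzip"]


def download_dataset(dataset: str, dest: str = ".") -> None:
    """Run the download; requires the Kaggle CLI and valid credentials."""
    # check=True raises CalledProcessError if the CLI exits with an error
    subprocess.run(kaggle_download_command(dataset, dest), check=True)
```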
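After the **Dependencies** step, it can help to confirm which versions are actually installed, since `pip install --upgrade numpy scipy` does not affect modules that were already imported until the runtime is restarted. A minimal sketch using the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper and expects distribution names as `pip` uses them.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Print one line per package from the notebook's install cell
for name, ver in installed_versions(
    ["jsonargparse", "pytorch-lightning", "torchmetrics", "numpy", "scipy"]
).items():
    print(f"{name:20s} {ver or 'not installed'}")
```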
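The **Fix libraries** cell hardcodes a `python3.10` site-packages path and rewrites the file even when nothing matches. The sketch below is one possible variant: it resolves a module's file with `importlib.util.find_spec` and reports how many replacements were made, so running it twice (or on an already fixed library) is harmless. Both helper names are hypothetical, and the demo patches a temporary file rather than `skimage`.

```python
import importlib.util
import os
import tempfile


def module_source_path(module_name: str) -> str:
    """Locate a module's source file without hardcoding the site-packages path."""
    # For dotted names, find_spec imports the parent packages first
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(module_name)
    return spec.origin


def patch_file(path: str, old: str, new: str) -> int:
    """Replace every `old` with `new` in `path`; return how many were replaced."""
    with open(path, "r", encoding="utf-8") as f:
        contents = f.read()
    count = contents.count(old)
    if count:
        # Only rewrite the file if something actually changes
        with open(path, "w", encoding="utf-8") as f:
            f.write(contents.replace(old, new))
    return count


# Demo on a throwaway file instead of a real library
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.py")
    with open(demo, "w", encoding="utf-8") as f:
        f.write("x = tri.vertices\ny = tri.vertices\n")
    first = patch_file(demo, "tri.vertices", "tri.simplices")
    second = patch_file(demo, "tri.vertices", "tri.simplices")  # nothing left to replace
    with open(demo, encoding="utf-8") as f:
        patched = f.read()

json_decoder_path = module_source_path("json.decoder")
```

On Colab, the notebook's fix would then read `patch_file(module_source_path("skimage.transform._geometric"), "_tesselation.vertices", "_tesselation.simplices")`. A return value of 0 means the string was not found, either because the fix was already applied or because the installed version differs.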
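The **Autoreload** step matters because patching a file on disk does not change a module that is already imported. The sketch below shows this with a throwaway module and `importlib.reload`, which is roughly what `%autoreload 2` does for every module before each cell runs. The module name `patch_demo_mod` is made up for the demo. Plain `importlib.reload` does not update names bound earlier with `from module import name`; the autoreload extension additionally tries to update existing function and class objects in place.

```python
import importlib
import os
import sys
import tempfile

old_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True  # skip .pyc files so reload always recompiles the source
with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway module that stands in for a library file patched on disk
    module_path = os.path.join(tmp, "patch_demo_mod.py")
    with open(module_path, "w") as f:
        f.write("ATTR = 'vertices'\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # make the import system notice the new file
    try:
        import patch_demo_mod

        before = patch_demo_mod.ATTR

        # Patch the source on disk, as the library fix in the notebook does
        with open(module_path, "w") as f:
            f.write("ATTR = 'simplices'\n")
        stale = patch_demo_mod.ATTR  # the loaded module has not changed yet

        importlib.reload(patch_demo_mod)  # roughly what %autoreload 2 automates
        after = patch_demo_mod.ATTR
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("patch_demo_mod", None)
        sys.dont_write_bytecode = old_flag

print(before, stale, after)  # vertices vertices simplices
```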
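For the **Main** step, assigning `sys.argv` directly leaves it modified for the rest of the session, and `--ckpt_path` may point at a checkpoint that does not exist yet on a first run. The sketch below restores `sys.argv` after the call and only adds `--ckpt_path` when the file is present. `patched_argv` and `build_cli_args` are hypothetical helpers, the argument values are copied from the notebook, and `main` is still assumed to come from your own script.

```python
import contextlib
import os
import sys
from collections.abc import Iterator


@contextlib.contextmanager
def patched_argv(args: list[str]) -> Iterator[None]:
    """Temporarily replace sys.argv[1:], restoring the original afterwards."""
    original = sys.argv[:]
    sys.argv = [original[0] if original else ""] + list(args)
    try:
        yield
    finally:
        sys.argv = original


def build_cli_args(checkpoint_dir: str, resume_from: str) -> list[str]:
    """Assemble the notebook's arguments, resuming only if the checkpoint exists."""
    args = [
        "fit",
        "--root", "",
        "--batch-size", "64",
        "--num-workers", "2",
        "--checkpoint.dirpath", checkpoint_dir,
    ]
    # A missing checkpoint would likely make loading fail, so only pass it once saved
    if os.path.isfile(resume_from):
        args += ["--ckpt_path", resume_from]
    return args


# Demo: no checkpoint exists at this path, so --ckpt_path is left out
argv_before = sys.argv[:]
with patched_argv(build_cli_args("/tmp/ckpts", "/nonexistent/my_checkpoint.ckpt")):
    seen = sys.argv[1:]
argv_after = sys.argv[:]
```

In the notebook, the last cell could then become `with patched_argv(build_cli_args("/content/drive/MyDrive/checkpoints/runtime", "/content/drive/MyDrive/checkpoints/saved/my_checkpoint.ckpt")): main()`.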