Last active: January 29, 2024 17:15
How to utilize Colab GPUs for training with your own data that can be uploaded to Kaggle
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# From Kaggle to Colab\n",
    "\n",
    "This guide shows how to utilize Colab GPUs with your own data that can be uploaded to Kaggle. Before starting, ensure a GPU is selected by going to `Runtime -> Change runtime type`."
   ]
  },
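  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*A quick sanity check (added here as a sketch; it assumes `torch` is pre-installed in the Colab environment, which it normally is) to confirm the GPU runtime is actually active before going further:*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "# Should print True, followed by the GPU name, if a GPU runtime is selected\n",
    "print(torch.cuda.is_available())\n",
    "if torch.cuda.is_available():\n",
    "    print(torch.cuda.get_device_name(0))"
   ]
  },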
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configuration\n",
    "\n",
    "Step-by-step runtime configuration:\n",
    "1. **Mounting**: mount your Google Drive to access and save model checkpoints\n",
    "2. **Kaggle API**: add your *Kaggle API* credentials to access `kaggle` commands, e.g., for downloading datasets\n",
    "3. **Dataset**: download and unzip your *Kaggle* dataset\n",
    "4. **Dependencies**: install missing `pip` packages and upgrade outdated ones\n",
    "5. **Fix libraries**: if some libraries fail, e.g., due to deprecated code that cannot be resolved by upgrading them, manually replace the offending code\n",
    "6. **Autoreload**: add `%autoreload` to reload any imported libraries whenever their code changes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.colab import drive\n",
    "drive.mount('/content/drive')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "with open(\"kaggle.json\", 'w') as f:\n",
    "    # Enter your own Kaggle username and API key here\n",
    "    json.dump({\"username\": \"[kaggle-username]\", \"key\": \"[kaggle-API-key]\"}, f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p ~/.kaggle\n",
    "!mv kaggle.json ~/.kaggle/\n",
    "!chmod 600 ~/.kaggle/kaggle.json\n",
    "!kaggle datasets download [kaggle-username]/[dataset-name] --unzip"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install \"jsonargparse[signatures]\" pytorch_lightning torchmetrics\n",
    "!pip install --upgrade numpy scipy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def fix_library_code(lib_filepath: str, old_str: str, new_str: str):\n",
    "    with open(lib_filepath, 'r') as file:\n",
    "        # Read library file contents\n",
    "        file_contents = file.read()\n",
    "\n",
    "    # Replace all instances of old code with new code\n",
    "    file_contents = file_contents.replace(old_str, new_str)\n",
    "\n",
    "    with open(lib_filepath, 'w') as file:\n",
    "        # Write modified contents\n",
    "        file.write(file_contents)\n",
    "\n",
    "# Define which library has broken code and what needs to be replaced\n",
    "LIB_FILEPATH = \"/usr/local/lib/python3.10/dist-packages/skimage/transform/_geometric.py\"\n",
    "OLD_STR = \"_tesselation.vertices\"\n",
    "NEW_STR = \"_tesselation.simplices\"\n",
    "\n",
    "# Fix the library code (remember to use %autoreload!)\n",
    "fix_library_code(LIB_FILEPATH, OLD_STR, NEW_STR)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Always reload imported modules when their code has changed\n",
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Main\n",
    "\n",
    "Here, just define the runtime behavior before running the main code, for example, specify that the path to checkpoints is in `MyDrive`. Further, it is possible to set `sys.argv` manually if the Python script added to this notebook is expected to be run from the command line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "\n",
    "kwargs = [\n",
    "    \"fit\",\n",
    "    \"--root\", \"\",\n",
    "    \"--batch-size\", \"64\",\n",
    "    \"--num-workers\", \"2\",\n",
    "    \"--checkpoint.dirpath\", \"/content/drive/MyDrive/checkpoints/runtime\",\n",
    "    \"--ckpt_path\", \"/content/drive/MyDrive/checkpoints/saved/my_checkpoint.ckpt\",\n",
    "]\n",
    "\n",
    "# `main` is the entry point of your own training script added to this notebook\n",
    "sys.argv = [sys.argv[0]] + kwargs\n",
    "main()"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}