Last active: January 29, 2024 17:15
How to utilize Colab GPUs for training with your own data that can be uploaded to Kaggle
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# From Kaggle to Colab\n",
    "\n",
    "This guide shows how to utilize Colab GPUs with your own data that can be uploaded to Kaggle. Before starting, ensure a GPU is selected by going to `Runtime -> Change runtime type`."
   ]
  },
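  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*A quick sanity check (added here as a sketch; it assumes `torch` is pre-installed in the Colab environment, which it normally is) to confirm the GPU runtime is actually active before going further:*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "# Should print True, followed by the GPU name, if a GPU runtime is selected\n",
    "print(torch.cuda.is_available())\n",
    "if torch.cuda.is_available():\n",
    "    print(torch.cuda.get_device_name(0))"
   ]
  },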
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configuration\n",
    "\n",
    "Step-by-step runtime configuration:\n",
    "1. **Mounting**: mount your Google Drive to access and save model checkpoints\n",
    "2. **Kaggle API**: add your *Kaggle API* credentials to access `kaggle` commands, e.g., for downloading datasets\n",
    "3. **Dataset**: download and unzip your *Kaggle* dataset\n",
    "4. **Dependencies**: install missing `pip` packages and upgrade outdated ones\n",
    "5. **Fix libraries**: if some libraries fail, e.g., due to deprecated code that cannot be resolved by upgrading them, manually replace the offending code\n",
    "6. **Autoreload**: add `%autoreload` to reload any imported libraries whenever their code changes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.colab import drive\n",
    "drive.mount('/content/drive')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "with open(\"kaggle.json\", 'w') as f:\n",
    "    # Enter your own Kaggle username and API key here\n",
    "    json.dump({\"username\": \"[kaggle-username]\", \"key\": \"[kaggle-API-key]\"}, f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p ~/.kaggle\n",
    "!mv kaggle.json ~/.kaggle/\n",
    "!chmod 600 ~/.kaggle/kaggle.json\n",
    "!kaggle datasets download [kaggle-username]/[dataset-name] --unzip"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install \"jsonargparse[signatures]\" pytorch_lightning torchmetrics\n",
    "!pip install --upgrade numpy scipy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def fix_library_code(lib_filepath: str, old_str: str, new_str: str):\n",
    "    with open(lib_filepath, 'r') as file:\n",
    "        # Read library file contents\n",
    "        file_contents = file.read()\n",
    "\n",
    "    # Replace all instances of old code with new code\n",
    "    file_contents = file_contents.replace(old_str, new_str)\n",
    "\n",
    "    with open(lib_filepath, 'w') as file:\n",
    "        # Write modified contents\n",
    "        file.write(file_contents)\n",
    "\n",
    "# Define which library has broken code and what needs to be replaced\n",
    "LIB_FILEPATH = \"/usr/local/lib/python3.10/dist-packages/skimage/transform/_geometric.py\"\n",
    "OLD_STR = \"_tesselation.vertices\"\n",
    "NEW_STR = \"_tesselation.simplices\"\n",
    "\n",
    "# Fix the library code (remember to use %autoreload!)\n",
    "fix_library_code(LIB_FILEPATH, OLD_STR, NEW_STR)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Always reload imported modules when their code has changed\n",
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Main\n",
    "\n",
    "Here, just define the runtime behavior before running the main code, for example, specify that the path to checkpoints is in `MyDrive`. Further, it is possible to set `sys.argv` manually if the Python script added to this notebook is expected to be run from the command line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "\n",
    "kwargs = [\n",
    "    \"fit\",\n",
    "    \"--root\", \"\",\n",
    "    \"--batch-size\", \"64\",\n",
    "    \"--num-workers\", \"2\",\n",
    "    \"--checkpoint.dirpath\", \"/content/drive/MyDrive/checkpoints/runtime\",\n",
    "    \"--ckpt_path\", \"/content/drive/MyDrive/checkpoints/saved/my_checkpoint.ckpt\",\n",
    "]\n",
    "\n",
    "# `main` is the entry point of your own training script added to this notebook\n",
    "sys.argv = [sys.argv[0]] + kwargs\n",
    "main()"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}