Last active
November 12, 2021 01:02
vertex-pipelines-handson.ipynb
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"name": "vertex-pipelines-handson.ipynb", | |
"provenance": [], | |
"collapsed_sections": [ | |
"Fi93BUsLRg1E", | |
"EjhvP0ZIdspI", | |
"Iqh0Pcbjpcv1", | |
"ftHyqjfaq2qa" | |
], | |
"authorship_tag": "ABX9TyMoRqO2XJf8tMQv1MFouuTj", | |
"include_colab_link": true | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
}, | |
"language_info": { | |
"name": "python" | |
} | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "view-in-github", | |
"colab_type": "text" | |
}, | |
"source": [ | |
"<a href=\"https://colab.research.google.com/gist/AseiSugiyama/d189a43f656a3313837e820bc54f873b/vertex-pipelines-handson.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "D-ev19nR4gbn" | |
}, | |
"source": [ | |
"# Vertex Pipelines Hands-on!\n", | |
"\n", | |
"## Agenda\n", | |
"\n", | |
"- Set up & Hello world pipeline\n", | |
"- [Lab Sample Pipelines](https://github.com/reproio/lab_sample_pipelines)\n", | |
"\n", | |
"## Set up & Hello world pipeline\n", | |
"\n", | |
"### Set up" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "4gxnn1Ix4aDM", | |
"outputId": "3d143737-8102-4caa-eaf0-4925bc3f416d", | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 1000 | |
} | |
}, | |
"source": [ | |
"!apt-get install tree\n", | |
"!pip install -q poetry \n", | |
"!pip install kfp" | |
], | |
"execution_count": 2, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"Reading package lists... Done\n", | |
"Building dependency tree \n", | |
"Reading state information... Done\n", | |
"The following NEW packages will be installed:\n", | |
" tree\n", | |
"0 upgraded, 1 newly installed, 0 to remove and 37 not upgraded.\n", | |
"Need to get 40.7 kB of archives.\n", | |
"After this operation, 105 kB of additional disk space will be used.\n", | |
"Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 tree amd64 1.7.0-5 [40.7 kB]\n", | |
"Fetched 40.7 kB in 0s (110 kB/s)\n", | |
"Selecting previously unselected package tree.\n", | |
"(Reading database ... 155047 files and directories currently installed.)\n", | |
"Preparing to unpack .../tree_1.7.0-5_amd64.deb ...\n", | |
"Unpacking tree (1.7.0-5) ...\n", | |
"Setting up tree (1.7.0-5) ...\n", | |
"Processing triggers for man-db (2.8.3-2ubuntu0.1) ...\n", | |
"\u001b[K |████████████████████████████████| 175 kB 5.1 MB/s \n", | |
"\u001b[K |████████████████████████████████| 40 kB 4.3 MB/s \n", | |
"\u001b[K |████████████████████████████████| 54 kB 2.5 MB/s \n", | |
"\u001b[K |████████████████████████████████| 424 kB 59.5 MB/s \n", | |
"\u001b[K |████████████████████████████████| 5.3 MB 43.5 MB/s \n", | |
"\u001b[K |████████████████████████████████| 91 kB 8.6 MB/s \n", | |
"\u001b[K |████████████████████████████████| 54 kB 1.9 MB/s \n", | |
"\u001b[K |████████████████████████████████| 3.5 MB 42.4 MB/s \n", | |
"\u001b[K |████████████████████████████████| 496 kB 56.8 MB/s \n", | |
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", | |
"datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.\u001b[0m\n", | |
"\u001b[?25hCollecting kfp\n", | |
" Downloading kfp-1.8.5.tar.gz (264 kB)\n", | |
"\u001b[K |████████████████████████████████| 264 kB 5.2 MB/s \n", | |
"\u001b[?25hCollecting absl-py<=0.11,>=0.9\n", | |
" Downloading absl_py-0.11.0-py3-none-any.whl (127 kB)\n", | |
"\u001b[K |████████████████████████████████| 127 kB 46.7 MB/s \n", | |
"\u001b[?25hCollecting PyYAML<6,>=5.3\n", | |
" Downloading PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636 kB)\n", | |
"\u001b[K |████████████████████████████████| 636 kB 50.2 MB/s \n", | |
"\u001b[?25hCollecting google-cloud-storage<2,>=1.20.0\n", | |
" Downloading google_cloud_storage-1.42.3-py2.py3-none-any.whl (105 kB)\n", | |
"\u001b[K |████████████████████████████████| 105 kB 54.6 MB/s \n", | |
"\u001b[?25hCollecting kubernetes<19,>=8.0.0\n", | |
" Downloading kubernetes-18.20.0-py2.py3-none-any.whl (1.6 MB)\n", | |
"\u001b[K |████████████████████████████████| 1.6 MB 32.3 MB/s \n", | |
"\u001b[?25hRequirement already satisfied: google-api-python-client<2,>=1.7.8 in /usr/local/lib/python3.7/dist-packages (from kfp) (1.12.8)\n", | |
"Requirement already satisfied: google-auth<2,>=1.6.1 in /usr/local/lib/python3.7/dist-packages (from kfp) (1.35.0)\n", | |
"Requirement already satisfied: requests-toolbelt<1,>=0.8.0 in /usr/local/lib/python3.7/dist-packages (from kfp) (0.9.1)\n", | |
"Collecting cloudpickle<3,>=2.0.0\n", | |
" Downloading cloudpickle-2.0.0-py3-none-any.whl (25 kB)\n", | |
"Collecting kfp-server-api<2.0.0,>=1.1.2\n", | |
" Downloading kfp-server-api-1.7.0.tar.gz (52 kB)\n", | |
"\u001b[K |████████████████████████████████| 52 kB 1.3 MB/s \n", | |
"\u001b[?25hCollecting jsonschema<4,>=3.0.1\n", | |
" Downloading jsonschema-3.2.0-py2.py3-none-any.whl (56 kB)\n", | |
"\u001b[K |████████████████████████████████| 56 kB 3.9 MB/s \n", | |
"\u001b[?25hRequirement already satisfied: tabulate<1,>=0.8.6 in /usr/local/lib/python3.7/dist-packages (from kfp) (0.8.9)\n", | |
"Requirement already satisfied: click<9,>=7.1.2 in /usr/local/lib/python3.7/dist-packages (from kfp) (7.1.2)\n", | |
"Collecting Deprecated<2,>=1.2.7\n", | |
" Downloading Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)\n", | |
"Collecting strip-hints<1,>=0.1.8\n", | |
" Downloading strip-hints-0.1.10.tar.gz (29 kB)\n", | |
"Collecting docstring-parser<1,>=0.7.3\n", | |
" Downloading docstring_parser-0.11.tar.gz (22 kB)\n", | |
" Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", | |
" Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n", | |
" Preparing wheel metadata ... \u001b[?25l\u001b[?25hdone\n", | |
"Collecting kfp-pipeline-spec<0.2.0,>=0.1.10\n", | |
" Downloading kfp_pipeline_spec-0.1.12-py3-none-any.whl (18 kB)\n", | |
"Collecting fire<1,>=0.3.1\n", | |
" Downloading fire-0.4.0.tar.gz (87 kB)\n", | |
"\u001b[K |████████████████████████████████| 87 kB 5.2 MB/s \n", | |
"\u001b[?25hRequirement already satisfied: protobuf<4,>=3.13.0 in /usr/local/lib/python3.7/dist-packages (from kfp) (3.17.3)\n", | |
"Requirement already satisfied: uritemplate<4,>=3.0.1 in /usr/local/lib/python3.7/dist-packages (from kfp) (3.0.1)\n", | |
"Collecting pydantic<2,>=1.8.2\n", | |
" Downloading pydantic-1.8.2-cp37-cp37m-manylinux2014_x86_64.whl (10.1 MB)\n", | |
"\u001b[K |████████████████████████████████| 10.1 MB 54.5 MB/s \n", | |
"\u001b[?25hCollecting typer<1.0,>=0.3.2\n", | |
" Downloading typer-0.4.0-py3-none-any.whl (27 kB)\n", | |
"Requirement already satisfied: typing-extensions<4,>=3.7.4 in /usr/local/lib/python3.7/dist-packages (from kfp) (3.7.4.3)\n", | |
"Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from absl-py<=0.11,>=0.9->kfp) (1.15.0)\n", | |
"Requirement already satisfied: wrapt<2,>=1.10 in /usr/local/lib/python3.7/dist-packages (from Deprecated<2,>=1.2.7->kfp) (1.12.1)\n", | |
"Requirement already satisfied: termcolor in /usr/local/lib/python3.7/dist-packages (from fire<1,>=0.3.1->kfp) (1.1.0)\n", | |
"Requirement already satisfied: google-api-core<2dev,>=1.21.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client<2,>=1.7.8->kfp) (1.26.3)\n", | |
"Requirement already satisfied: httplib2<1dev,>=0.15.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client<2,>=1.7.8->kfp) (0.17.4)\n", | |
"Requirement already satisfied: google-auth-httplib2>=0.0.3 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client<2,>=1.7.8->kfp) (0.0.4)\n", | |
"Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.21.0->google-api-python-client<2,>=1.7.8->kfp) (1.53.0)\n", | |
"Requirement already satisfied: pytz in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.21.0->google-api-python-client<2,>=1.7.8->kfp) (2018.9)\n", | |
"Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.21.0->google-api-python-client<2,>=1.7.8->kfp) (2.23.0)\n", | |
"Requirement already satisfied: packaging>=14.3 in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.21.0->google-api-python-client<2,>=1.7.8->kfp) (20.9)\n", | |
"Requirement already satisfied: setuptools>=40.3.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.21.0->google-api-python-client<2,>=1.7.8->kfp) (57.4.0)\n", | |
"Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.7/dist-packages (from google-auth<2,>=1.6.1->kfp) (0.2.8)\n", | |
"Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from google-auth<2,>=1.6.1->kfp) (4.2.4)\n", | |
"Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.7/dist-packages (from google-auth<2,>=1.6.1->kfp) (4.7.2)\n", | |
"Collecting google-cloud-core<3.0dev,>=1.6.0\n", | |
" Downloading google_cloud_core-2.1.0-py2.py3-none-any.whl (27 kB)\n", | |
"Collecting google-resumable-media<3.0dev,>=1.3.0\n", | |
" Downloading google_resumable_media-2.0.3-py2.py3-none-any.whl (75 kB)\n", | |
"\u001b[K |████████████████████████████████| 75 kB 3.5 MB/s \n", | |
"\u001b[?25hCollecting google-api-core<2dev,>=1.21.0\n", | |
" Downloading google_api_core-1.31.3-py2.py3-none-any.whl (93 kB)\n", | |
"\u001b[K |████████████████████████████████| 93 kB 1.4 MB/s \n", | |
"\u001b[?25hCollecting google-crc32c<2.0dev,>=1.0\n", | |
" Downloading google_crc32c-1.3.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (38 kB)\n", | |
"Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.7/dist-packages (from jsonschema<4,>=3.0.1->kfp) (21.2.0)\n", | |
"Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from jsonschema<4,>=3.0.1->kfp) (1.7.0)\n", | |
"Requirement already satisfied: pyrsistent>=0.14.0 in /usr/local/lib/python3.7/dist-packages (from jsonschema<4,>=3.0.1->kfp) (0.18.0)\n", | |
"Requirement already satisfied: urllib3>=1.15 in /usr/local/lib/python3.7/dist-packages (from kfp-server-api<2.0.0,>=1.1.2->kfp) (1.24.3)\n", | |
"Requirement already satisfied: certifi in /usr/local/lib/python3.7/dist-packages (from kfp-server-api<2.0.0,>=1.1.2->kfp) (2021.5.30)\n", | |
"Requirement already satisfied: python-dateutil in /usr/local/lib/python3.7/dist-packages (from kfp-server-api<2.0.0,>=1.1.2->kfp) (2.8.2)\n", | |
"Requirement already satisfied: requests-oauthlib in /usr/local/lib/python3.7/dist-packages (from kubernetes<19,>=8.0.0->kfp) (1.3.0)\n", | |
"Collecting websocket-client!=0.40.0,!=0.41.*,!=0.42.*,>=0.32.0\n", | |
" Downloading websocket_client-1.2.1-py2.py3-none-any.whl (52 kB)\n", | |
"\u001b[K |████████████████████████████████| 52 kB 1.2 MB/s \n", | |
"\u001b[?25hRequirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging>=14.3->google-api-core<2dev,>=1.21.0->google-api-python-client<2,>=1.7.8->kfp) (2.4.7)\n", | |
"Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.7/dist-packages (from pyasn1-modules>=0.2.1->google-auth<2,>=1.6.1->kfp) (0.4.8)\n", | |
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.21.0->google-api-python-client<2,>=1.7.8->kfp) (2.10)\n", | |
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.21.0->google-api-python-client<2,>=1.7.8->kfp) (3.0.4)\n", | |
"Requirement already satisfied: wheel in /usr/local/lib/python3.7/dist-packages (from strip-hints<1,>=0.1.8->kfp) (0.37.0)\n", | |
"Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->jsonschema<4,>=3.0.1->kfp) (3.6.0)\n", | |
"Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from requests-oauthlib->kubernetes<19,>=8.0.0->kfp) (3.1.1)\n", | |
"Building wheels for collected packages: kfp, docstring-parser, fire, kfp-server-api, strip-hints\n", | |
" Building wheel for kfp (setup.py) ... \u001b[?25l\u001b[?25hdone\n", | |
" Created wheel for kfp: filename=kfp-1.8.5-py3-none-any.whl size=365073 sha256=5ce47c22e2c08135eec54b4bdd4dde69a5a6c8c9a7913da85a97edf3bc247161\n", | |
" Stored in directory: /root/.cache/pip/wheels/8c/22/69/4d6779bbc4269af9351cf1392cf33d37b5699a81f763632cd8\n", | |
" Building wheel for docstring-parser (PEP 517) ... \u001b[?25l\u001b[?25hdone\n", | |
" Created wheel for docstring-parser: filename=docstring_parser-0.11-py3-none-any.whl size=31531 sha256=647cb459d5e7cbde6541ebac22d9bbee31b4dc89d113c95620e50ec883760491\n", | |
" Stored in directory: /root/.cache/pip/wheels/8d/ba/2a/1376f9ea0b3f20a9700b4f1b4ee3cefe69b4a96e26a28e0240\n", | |
" Building wheel for fire (setup.py) ... \u001b[?25l\u001b[?25hdone\n", | |
" Created wheel for fire: filename=fire-0.4.0-py2.py3-none-any.whl size=115943 sha256=f9d5d5211d9a983ecc848d739f18f72e36ffef93f78c5f2b2cefc9fce9a6180d\n", | |
" Stored in directory: /root/.cache/pip/wheels/8a/67/fb/2e8a12fa16661b9d5af1f654bd199366799740a85c64981226\n", | |
" Building wheel for kfp-server-api (setup.py) ... \u001b[?25l\u001b[?25hdone\n", | |
" Created wheel for kfp-server-api: filename=kfp_server_api-1.7.0-py3-none-any.whl size=92619 sha256=99653a7b5ceebaa141885ea73a1837c5ce58e8facee4f996815a3955b7a4bcb7\n", | |
" Stored in directory: /root/.cache/pip/wheels/7c/f0/36/cd1c7475b12b2541f90e4ab9413e59756a11262c1307a97633\n", | |
" Building wheel for strip-hints (setup.py) ... \u001b[?25l\u001b[?25hdone\n", | |
" Created wheel for strip-hints: filename=strip_hints-0.1.10-py2.py3-none-any.whl size=22302 sha256=46e009ff33a5243ff39cf5acb05d555ad35e8d102e4d490c0343e1e6947c79d7\n", | |
" Stored in directory: /root/.cache/pip/wheels/5e/14/c3/6e44e9b2545f2d570b03f5b6d38c00b7534aa8abb376978363\n", | |
"Successfully built kfp docstring-parser fire kfp-server-api strip-hints\n", | |
"Installing collected packages: google-crc32c, google-api-core, websocket-client, PyYAML, google-resumable-media, google-cloud-core, typer, strip-hints, pydantic, kubernetes, kfp-server-api, kfp-pipeline-spec, jsonschema, google-cloud-storage, fire, docstring-parser, Deprecated, cloudpickle, absl-py, kfp\n", | |
" Attempting uninstall: google-api-core\n", | |
" Found existing installation: google-api-core 1.26.3\n", | |
" Uninstalling google-api-core-1.26.3:\n", | |
" Successfully uninstalled google-api-core-1.26.3\n", | |
" Attempting uninstall: PyYAML\n", | |
" Found existing installation: PyYAML 3.13\n", | |
" Uninstalling PyYAML-3.13:\n", | |
" Successfully uninstalled PyYAML-3.13\n", | |
" Attempting uninstall: google-resumable-media\n", | |
" Found existing installation: google-resumable-media 0.4.1\n", | |
" Uninstalling google-resumable-media-0.4.1:\n", | |
" Successfully uninstalled google-resumable-media-0.4.1\n", | |
" Attempting uninstall: google-cloud-core\n", | |
" Found existing installation: google-cloud-core 1.0.3\n", | |
" Uninstalling google-cloud-core-1.0.3:\n", | |
" Successfully uninstalled google-cloud-core-1.0.3\n", | |
" Attempting uninstall: jsonschema\n", | |
" Found existing installation: jsonschema 2.6.0\n", | |
" Uninstalling jsonschema-2.6.0:\n", | |
" Successfully uninstalled jsonschema-2.6.0\n", | |
" Attempting uninstall: google-cloud-storage\n", | |
" Found existing installation: google-cloud-storage 1.18.1\n", | |
" Uninstalling google-cloud-storage-1.18.1:\n", | |
" Successfully uninstalled google-cloud-storage-1.18.1\n", | |
" Attempting uninstall: cloudpickle\n", | |
" Found existing installation: cloudpickle 1.3.0\n", | |
" Uninstalling cloudpickle-1.3.0:\n", | |
" Successfully uninstalled cloudpickle-1.3.0\n", | |
" Attempting uninstall: absl-py\n", | |
" Found existing installation: absl-py 0.12.0\n", | |
" Uninstalling absl-py-0.12.0:\n", | |
" Successfully uninstalled absl-py-0.12.0\n", | |
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", | |
"nbclient 0.5.4 requires jupyter-client>=6.1.5, but you have jupyter-client 5.3.5 which is incompatible.\n", | |
"gym 0.17.3 requires cloudpickle<1.7.0,>=1.2.0, but you have cloudpickle 2.0.0 which is incompatible.\n", | |
"google-cloud-translate 1.5.0 requires google-cloud-core<2.0dev,>=1.0.0, but you have google-cloud-core 2.1.0 which is incompatible.\n", | |
"google-cloud-firestore 1.7.0 requires google-cloud-core<2.0dev,>=1.0.3, but you have google-cloud-core 2.1.0 which is incompatible.\n", | |
"google-cloud-datastore 1.8.0 requires google-cloud-core<2.0dev,>=1.0.0, but you have google-cloud-core 2.1.0 which is incompatible.\n", | |
"google-cloud-bigquery 1.21.0 requires google-cloud-core<2.0dev,>=1.0.3, but you have google-cloud-core 2.1.0 which is incompatible.\n", | |
"google-cloud-bigquery 1.21.0 requires google-resumable-media!=0.4.0,<0.5.0dev,>=0.3.1, but you have google-resumable-media 2.0.3 which is incompatible.\u001b[0m\n", | |
"Successfully installed Deprecated-1.2.13 PyYAML-5.4.1 absl-py-0.11.0 cloudpickle-2.0.0 docstring-parser-0.11 fire-0.4.0 google-api-core-1.31.3 google-cloud-core-2.1.0 google-cloud-storage-1.42.3 google-crc32c-1.3.0 google-resumable-media-2.0.3 jsonschema-3.2.0 kfp-1.8.5 kfp-pipeline-spec-0.1.12 kfp-server-api-1.7.0 kubernetes-18.20.0 pydantic-1.8.2 strip-hints-0.1.10 typer-0.4.0 websocket-client-1.2.1\n" | |
] | |
}, | |
{ | |
"output_type": "display_data", | |
"data": { | |
"application/vnd.colab-display-data+json": { | |
"pip_warning": { | |
"packages": [ | |
"google" | |
] | |
} | |
} | |
}, | |
"metadata": {} | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "efnV0xiT8VaT" | |
}, | |
"source": [ | |
"Because the libraries were updated, the runtime must be restarted. Click the \"Restart Runtime\" button.\n", | |
"\n", | |
"That completes the setup.\n", | |
"\n", | |
"### Hello, world pipeline\n", | |
"\n", | |
"Let's build a minimal pipeline that prints the text passed to it unchanged.\n", | |
"\n", | |
"A Kubeflow Pipelines pipeline is built by connecting processing units called \"components\". There are two kinds of components: [lightweight components built from a Python function](https://www.kubeflow.org/docs/components/pipelines/sdk/v2/python-function-components/) and [reusable components](https://www.kubeflow.org/docs/components/pipelines/sdk/v2/component-development/), which are implemented by writing an interface definition called a ComponentSpec.\n", | |
"\n", | |
"Let's start with a lightweight component. First, define a function." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "VQqlbjxw8LA7", | |
"outputId": "74206c43-a56a-4dc5-bd4e-8024b253f3a8", | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
} | |
}, | |
"source": [ | |
"def hello_world(text: str):\n", | |
" print(text)\n", | |
"\n", | |
"hello_world(\"Hello, world\")" | |
], | |
"execution_count": 1, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"Hello, world\n" | |
] | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "_4egPnsrBfSf" | |
}, | |
"source": [ | |
"Next, convert the `hello_world` function into a component." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "emZdApcnBaht" | |
}, | |
"source": [ | |
"from kfp.v2 import components\n", | |
"\n", | |
"hw_op = components.component_factory.create_component_from_func(\n", | |
" hello_world, \n", | |
" output_component_file=\"hw.yaml\"\n", | |
")" | |
], | |
"execution_count": 2, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "4IU2GgVQEd1W" | |
}, | |
"source": [ | |
"`create_component_from_func` returns a factory function that creates the component. Calling the resulting `hw_op` inside a pipeline creates a task that runs the component.\n", | |
"\n", | |
"The same thing can be written with the `@component` decorator as follows." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "s13Nza7Td-8Y" | |
}, | |
"source": [ | |
"from kfp.v2.dsl import component\n", | |
"\n", | |
"@component(output_component_file='hw.yaml')\n", | |
"def hw_op(text: str):\n", | |
" print(text)\n" | |
], | |
"execution_count": 3, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "qwOnETmQemLD" | |
}, | |
"source": [ | |
"Next, define the pipeline." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "xPsP-YZ9B7Aj" | |
}, | |
"source": [ | |
"GCS_PIPELINE_ROOT = \"gs://your-gcs-bucket/root/\"\n", | |
"from kfp.v2 import dsl\n", | |
"\n", | |
"@dsl.pipeline(\n", | |
" name=\"hello-world\",\n", | |
" description=\"A simple intro pipeline\",\n", | |
" pipeline_root=f\"{GCS_PIPELINE_ROOT}helloworld/\"\n", | |
")\n", | |
"def hello_world_pipeline(text: str = \"hi there\"):\n", | |
" hello_world_task = hw_op(text)\n" | |
], | |
"execution_count": 4, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "teEAEE2AFABv" | |
}, | |
"source": [ | |
"A pipeline can define arguments. The available types are `int`, `bool`, `float`, `double`, and `str`.\n", | |
"\n", | |
"Next, compile the pipeline to produce its definition file. This file contains a JSON string whose schema, called PipelineSpec, is defined with Protocol Buffers." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "gMw6q5enD2HZ" | |
}, | |
"source": [ | |
"from kfp.v2 import compiler\n", | |
"compiler.Compiler().compile(\n", | |
" pipeline_func=hello_world_pipeline,\n", | |
" package_path='hello-world-pipeline.json',\n", | |
")" | |
], | |
"execution_count": 5, | |
"outputs": [] | |
}, | |
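{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The compiled file is plain JSON, so it can be inspected with the standard library. The following is a minimal sketch, assuming the compile cell above has already written `hello-world-pipeline.json` to the working directory. It only lists the top-level keys and makes no assumption about what they are.\n", | |
"\n", | |
"```python\n", | |
"import json\n", | |
"\n", | |
"# Load the file written by compiler.Compiler().compile(...) above.\n", | |
"with open('hello-world-pipeline.json') as f:\n", | |
"    job_spec = json.load(f)\n", | |
"\n", | |
"# Print only the top-level keys to get a first overview of the document.\n", | |
"for key in sorted(job_spec):\n", | |
"    print(key)\n", | |
"```" | |
] | |
}, | |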
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "FsXQmd5dGF71" | |
}, | |
"source": [ | |
"Finally, run the pipeline on Vertex Pipelines. First, log in to your account." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "PD5ofYBzGL-Z" | |
}, | |
"source": [ | |
"from google.colab import auth as google_auth\n", | |
"google_auth.authenticate_user()" | |
], | |
"execution_count": 6, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "2bxZNsswMr1P" | |
}, | |
"source": [ | |
"You can run the pipeline using the published SDK." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "50_J8X7OMrkP" | |
}, | |
"source": [ | |
"GCP_PROJECT_ID = \"your-project-id\"\n", | |
"\n", | |
"from kfp.v2.google.client import AIPlatformClient\n", | |
"api_client = AIPlatformClient(project_id=GCP_PROJECT_ID, region=\"us-central1\")\n", | |
"api_client.create_run_from_job_spec(\n", | |
" job_spec_path=\"hello-world-pipeline.json\",\n", | |
")" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
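{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The pipeline argument `text` defaults to `hi there`. As a sketch, assuming the `parameter_values` argument of `create_run_from_job_spec` in the kfp version installed above (1.8.5 in the install log), the default can be overridden when the run is submitted. The value `Hello, Vertex` is only an illustration.\n", | |
"\n", | |
"```python\n", | |
"# Reuses api_client from the previous cell.\n", | |
"api_client.create_run_from_job_spec(\n", | |
"    job_spec_path='hello-world-pipeline.json',\n", | |
"    # Keys must match the argument names of the pipeline function.\n", | |
"    parameter_values={'text': 'Hello, Vertex'},\n", | |
")\n", | |
"```" | |
] | |
}, | |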
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "SY1bbFhGF-Y4" | |
}, | |
"source": [ | |
"## Hello, world deep dive\n", | |
"\n", | |
"Let's look more closely at what we have run so far.\n", | |
"\n", | |
"### Introduction to reusable component\n", | |
"\n", | |
"A Kubeflow Pipelines pipeline consists of the following elements.\n", | |
"\n", | |
"- Pipeline Python DSL: the pipeline definition file (.py)\n", | |
"- ComponentSpec: the definition file for a component's I/O (.yaml)\n", | |
"- Container Image: the entity that actually runs the component's processing (container image)\n", | |
"\n", | |
"#### Python DSL\n", | |
"\n", | |
"Of the code so far, the following part corresponds to the Python DSL.\n", | |
"\n", | |
"```python\n", | |
"@dsl.pipeline(\n", | |
" name=\"hello-world\",\n", | |
" description=\"A simple intro pipeline\",\n", | |
" pipeline_root=f\"{GCS_PIPELINE_ROOT}helloworld/\"\n", | |
")\n", | |
"def hello_world_pipeline(text: str = \"hi there\"):\n", | |
" hello_world_task = hw_op(text)\n", | |
"```\n", | |
"\n", | |
"#### ComponentSpec\n", | |
"\n", | |
"When we created the component from the Python function, we also generated `hw.yaml`. It follows a schema called ComponentSpec, defined with Protocol Buffers, and specifies the component's interface." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "gioeD6HhPkHR", | |
"outputId": "3e95aadd-d52c-47a8-c133-35e997ba1979", | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
} | |
}, | |
"source": [ | |
"!cat hw.yaml" | |
], | |
"execution_count": 7, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"name: Hw op\n", | |
"inputs:\n", | |
"- {name: text, type: String}\n", | |
"implementation:\n", | |
" container:\n", | |
" image: python:3.7\n", | |
" command:\n", | |
" - sh\n", | |
" - -c\n", | |
" - (python3 -m ensurepip || python3 -m ensurepip --user) && (PIP_DISABLE_PIP_VERSION_CHECK=1\n", | |
" python3 -m pip install --quiet --no-warn-script-location 'kfp==1.8.5'\n", | |
" || PIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip install --quiet --no-warn-script-location\n", | |
" 'kfp==1.8.5' --user) && \"$0\" \"$@\"\n", | |
" - sh\n", | |
" - -ec\n", | |
" - |\n", | |
" program_path=$(mktemp -d)\n", | |
" printf \"%s\" \"$0\" > \"$program_path/ephemeral_component.py\"\n", | |
" python3 -m kfp.v2.components.executor_main --component_module_path \"$program_path/ephemeral_component.py\" \"$@\"\n", | |
" - |2+\n", | |
"\n", | |
" import kfp\n", | |
" from kfp.v2 import dsl\n", | |
" from kfp.v2.dsl import *\n", | |
" from typing import *\n", | |
"\n", | |
" def hw_op(text: str):\n", | |
" print(text)\n", | |
"\n", | |
" args:\n", | |
" - --executor_input\n", | |
" - {executorInput: null}\n", | |
" - --function_to_execute\n", | |
" - hw_op\n" | |
] | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "uCHZsjYoRDY2" | |
}, | |
"source": [ | |
"A component can be created by loading this YAML file." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "sB_mzza-RSbV" | |
}, | |
"source": [ | |
"from kfp.components import load_component_from_file\n", | |
"\n", | |
"hw_op2 = load_component_from_file(\"hw.yaml\")" | |
], | |
"execution_count": 12, | |
"outputs": [] | |
}, | |
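{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A component loaded from YAML can be called in a pipeline in the same way as `hw_op`. The following is a minimal sketch, assuming `GCS_PIPELINE_ROOT` and `hw_op2` from the cells above. The pipeline name `hello-world-from-yaml` and the function name are hypothetical.\n", | |
"\n", | |
"```python\n", | |
"from kfp.v2 import dsl\n", | |
"\n", | |
"@dsl.pipeline(\n", | |
"    name='hello-world-from-yaml',\n", | |
"    pipeline_root=f'{GCS_PIPELINE_ROOT}helloworld-from-yaml/'\n", | |
")\n", | |
"def hello_world_from_yaml_pipeline(text: str = 'hi there'):\n", | |
"    # The keyword matches the input name declared in hw.yaml.\n", | |
"    hw_op2(text=text)\n", | |
"```" | |
] | |
}, | |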
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "Fi93BUsLRg1E" | |
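}, | |
"source": [ | |
"#### Container Image\n", | |
"\n", | |
"The container image to use is defined in the ComponentSpec. The usual approach is to implement the required processing in a container image and invoke it from the component. This example uses `python:3.7`, but in practice it is preferable to push your own image to a container registry and specify it as `gcr.io/your-container-image:your-tag`.\n", | |
"\n", | |
"### PipelineSpec\n", | |
"\n", | |
"Passing the pipeline's Python DSL implementation to the Compiler generates the pipeline definition file." | |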
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "UtuE7X3qEJhl" | |
}, | |
"source": [ | |
"!cat hello-world-pipeline.json" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "VANskWwXTooc" | |
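}, | |
"source": [ | |
"This file contains the components you defined and the dependencies that determine their execution order. Editing it by hand is impractical and should be a last resort.\n", | |
"\n", | |
"### Component Input/Output\n", | |
"\n", | |
"More complex pipelines can be built as follows." | |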
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "wm7l6jPKEMGd" | |
}, | |
"source": [ | |
"@dsl.component\n", | |
"def echo(text: str) -> str:\n", | |
" return text\n", | |
"\n", | |
"@dsl.pipeline(\n", | |
" name=\"producer-consumer-pipeline\",\n", | |
" pipeline_root=f\"{GCS_PIPELINE_ROOT}producer-consumer/\"\n", | |
")\n", | |
"def producer_consumer_pipeline(text: str = \"hi there\"):\n", | |
" producer = echo(text)\n", | |
" consumer = echo(producer.output)" | |
], | |
"execution_count": 13, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "KYxMbHL1XdoM" | |
}, | |
"source": [ | |
"When the output of one step is passed as the input of the next, as here, a dependency on execution order is defined. Let's actually run it." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "0DGmF7FlVLth" | |
}, | |
"source": [ | |
"compiler.Compiler().compile(\n", | |
" pipeline_func=producer_consumer_pipeline,\n", | |
" package_path='producer-consumer-pipeline.json',\n", | |
")\n", | |
"api_client = AIPlatformClient(project_id=GCP_PROJECT_ID, region=\"us-central1\")\n", | |
"api_client.create_run_from_job_spec(\n", | |
" job_spec_path='producer-consumer-pipeline.json',\n", | |
")" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
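{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Passing data is not the only way to order tasks. When two tasks share no data, an explicit dependency can be declared with `after`. The following is a minimal sketch, assuming `GCS_PIPELINE_ROOT` from the cells above. The component `say` and the pipeline `ordered-pipeline` are hypothetical names used only for illustration.\n", | |
"\n", | |
"```python\n", | |
"from kfp.v2 import dsl\n", | |
"\n", | |
"@dsl.component\n", | |
"def say(text: str):\n", | |
"    print(text)\n", | |
"\n", | |
"@dsl.pipeline(\n", | |
"    name='ordered-pipeline',\n", | |
"    pipeline_root=f'{GCS_PIPELINE_ROOT}ordered/'\n", | |
")\n", | |
"def ordered_pipeline():\n", | |
"    first = say(text='first')\n", | |
"    second = say(text='second')\n", | |
"    # No data flows between the tasks, so declare the order explicitly.\n", | |
"    second.after(first)\n", | |
"```" | |
] | |
}, | |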
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "hr_b4CiLXy2t" | |
}, | |
"source": [ | |
"### Using GPUs\n", | |
"\n", | |
"A GPU can be used with a similar pipeline definition." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "kQBePTU_WGwM" | |
}, | |
"source": [ | |
"@dsl.pipeline(\n", | |
" name=\"hello-world-with-gpu\",\n", | |
" description=\"A simple intro pipeline with GPU\",\n", | |
" pipeline_root=f\"{GCS_PIPELINE_ROOT}helloworld/\"\n", | |
")\n", | |
"def hello_world_pipeline_with_gpu(text: str = \"hi accelerator\"):\n", | |
" hello_world_task = (hw_op(text)\n", | |
" .add_node_selector_constraint('cloud.google.com/gke-accelerator', 'nvidia-tesla-k80')\n", | |
" .set_gpu_limit(1))" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "OksnsRzSZIi0" | |
}, | |
"source": [ | |
"compiler.Compiler().compile(\n", | |
" pipeline_func=hello_world_pipeline_with_gpu,\n", | |
" package_path='hello-world-pipeline-with-gpu.json',\n", | |
")\n", | |
"api_client = AIPlatformClient(project_id=GCP_PROJECT_ID, region=\"us-central1\")\n", | |
"api_client.create_run_from_job_spec(\n", | |
" job_spec_path='hello-world-pipeline-with-gpu.json',\n", | |
")" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "UEMKw_MdZnQ-" | |
}, | |
"source": [ | |
"## lab_sample_pipeline\n", | |
"\n", | |
"We use [reproio/lab_sample_pipelines](https://github.com/reproio/lab_sample_pipelines/tree/main/kfp) to walk through the development workflow.\n", | |
"\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "w4x9S5p3ZV9n" | |
}, | |
"source": [ | |
"!git clone https://github.com/reproio/lab_sample_pipelines.git" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "iF9nPrOPdYGV" | |
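}, | |
"source": [ | |
"### At the start of the project\n", | |
"\n", | |
"Building on the deliverables so far, align all stakeholders on the machine learning pipeline to be built.\n", | |
"One option is to write it up in a format shared by the stakeholders, such as an inception deck.\n", | |
"\n", | |
"### Defining the pipeline requirements\n", | |
"\n", | |
"Split the machine learning pipeline into components and write the pipeline spec. [See here for an example](https://github.com/reproio/lab_sample_pipelines/blob/main/kfp/README.md).\n", | |
"\n", | |
"You don't need to settle every detail, but because changing a component's I/O can force changes across the whole pipeline, some decisions have to be made up front.\n", | |
"\n", | |
"### Developing each component\n", | |
"\n", | |
"Carry out the following tasks.\n", | |
"\n", | |
"- Write the specification\n", | |
"- Development / unit tests\n", | |
"- Tests\n", | |
"- Build the container image\n", | |
"\n", | |
"The project's directory structure is as follows.\n" | |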
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "3jW7gJ1ZdsQI" | |
}, | |
"source": [ | |
"!tree ." | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "EjhvP0ZIdspI" | |
}, | |
"source": [ | |
"\n", | |
"#### Writing the specification\n", | |
"\n", | |
"Write the specification of each component. [See here for an example](https://github.com/reproio/lab_sample_pipelines/blob/main/kfp/components/data_generator/README.md).\n", | |
"\n", | |
"#### Development / unit tests\n", | |
"\n", | |
"Develop the component." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "03WJGTyJeIWd" | |
}, | |
"source": [ | |
"%cd lab_sample_pipelines/kfp/components/data_generator/" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "oPLdBTPGeX-3" | |
}, | |
"source": [ | |
"The component directory is structured as follows." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "FJ28RbU7egNX" | |
}, | |
"source": [ | |
"! tree ." | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "vmWw8h7iee44" | |
}, | |
"source": [ | |
"The [Kubeflow documentation](https://www.kubeflow.org/docs/components/pipelines/sdk/v2/component-development/#organizing-the-component-files) recommends a similar directory structure.\n", | |
"\n", | |
"Development itself is the same as developing a CLI application. Because each component uses different libraries, library dependencies are managed separately for each component." | |
] | |
}, | |
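{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"To make \"the same as developing a CLI application\" concrete, here is a minimal sketch of a component entry point that takes two output paths as positional arguments, in the same shape as the `src/data_generator.py` invocation used later in this notebook. Everything in it is an assumption for illustration: the function names, the CSV columns, and the random data are hypothetical and are not taken from the actual component.\n", | |
"\n", | |
"```python\n", | |
"import argparse\n", | |
"import csv\n", | |
"import random\n", | |
"from pathlib import Path\n", | |
"from typing import List, Optional\n", | |
"\n", | |
"\n", | |
"def write_rows(path: Path, n_rows: int, seed: int) -> None:\n", | |
"    \"\"\"Write n_rows of hypothetical (feature, label) data to a CSV file.\"\"\"\n", | |
"    rng = random.Random(seed)\n", | |
"    # Create the parent directory so that a path like ./tmp/train.csv works.\n", | |
"    path.parent.mkdir(parents=True, exist_ok=True)\n", | |
"    with path.open(\"w\", newline=\"\") as f:\n", | |
"        writer = csv.writer(f)\n", | |
"        writer.writerow([\"feature\", \"label\"])\n", | |
"        for _ in range(n_rows):\n", | |
"            writer.writerow([rng.random(), rng.randint(0, 1)])\n", | |
"\n", | |
"\n", | |
"def main(argv: Optional[List[str]] = None) -> None:\n", | |
"    \"\"\"Parse the two output paths and write one CSV file to each.\"\"\"\n", | |
"    parser = argparse.ArgumentParser(description=\"Hypothetical data generator\")\n", | |
"    parser.add_argument(\"train_data_path\", type=Path)\n", | |
"    parser.add_argument(\"eval_data_path\", type=Path)\n", | |
"    args = parser.parse_args(argv)\n", | |
"    write_rows(args.train_data_path, n_rows=80, seed=0)\n", | |
"    write_rows(args.eval_data_path, n_rows=20, seed=1)\n", | |
"```\n", | |
"\n", | |
"Accepting `argv` as a parameter keeps `main` callable from a unit test, for example `main([\"./tmp/train.csv\", \"./tmp/eval.csv\"])`; a script would call `main()` under an `if __name__ == \"__main__\":` guard." | |
] | |
}, | |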
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "uoE_ODS_jISM" | |
}, | |
"source": [ | |
"%%writefile pyproject.toml\n", | |
"[tool.poetry]\n", | |
"name = \"kfp_sample_data_generator\"\n", | |
"version = \"0.0.0\"\n", | |
"description = \"data-generator component of kfp sample pipeline\"\n", | |
"authors = [\"Repro AI Lab Team <[email protected]>\"]\n", | |
"license = \"MIT\"\n", | |
"\n", | |
"[tool.poetry.dependencies]\n", | |
"python = \"^3.7.0\"\n", | |
"pip = \"^21.1.1\"\n", | |
"numpy = \"^1.20.3\"\n", | |
"scikit-learn = \"^0.24.2\"\n", | |
"\n", | |
"[tool.poetry.dev-dependencies]\n", | |
"pydocstyle = \"*\"\n", | |
"bandit = \"*\"\n", | |
"prospector = \"*\"\n", | |
"mypy = \"*\"\n", | |
"pytest = \"*\"\n", | |
"black = { version = \"*\", allow-prereleases = true }\n", | |
"\n", | |
"[build-system]\n", | |
"requires = [\"poetry>=0.12\"]\n", | |
"build-backend = \"poetry.masonry.api\"" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "SU3yhbC5k194" | |
}, | |
"source": [ | |
"!poetry update\n", | |
"!poetry install" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "t4RMOo2Ro9jZ" | |
}, | |
"source": [ | |
"Write tests alongside the code as you develop." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "4XiFlbbvn3xw" | |
}, | |
"source": [ | |
"!poetry run pytest" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "3QbTe6KYpAfV" | |
}, | |
"source": [ | |
"Unit tests cannot always verify I/O behavior, so run the component once from the CLI as well." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "g-GB79LmotL1" | |
}, | |
"source": [ | |
"!poetry run python src/data_generator.py ./tmp/train.csv ./tmp/eval.csv" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "Iqh0Pcbjpcv1" | |
}, | |
"source": [ | |
"We skip it in this hands-on, but the next step would be to build a container image and push it to a container registry such as GCR.\n", | |
"\n", | |
"#### Writing the ComponentSpec\n", | |
"\n", | |
"Write a YAML file that defines the interface of each component." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "TzwDrxoWqUNu" | |
}, | |
"source": [ | |
"%cd /content/lab_sample_pipelines/kfp/" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "k0Dc_FCDpYu1" | |
}, | |
"source": [ | |
"!cat components/data_generator/src/data_generator.yaml" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "thgqzsuDqG4S" | |
}, | |
"source": [ | |
"!cat components/transform/src/transform.yaml" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "fbjmOeD_qq6E" | |
}, | |
"source": [ | |
"!cat components/trainer/src/trainer.yaml" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "G_mBSkAkquGG" | |
}, | |
"source": [ | |
"!cat components/evaluator/src/evaluator.yaml" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "ftHyqjfaq2qa" | |
}, | |
"source": [ | |
"#### Writing the Pipeline with the Python DSL\n", | |
"\n", | |
"Define the Pipeline in Python. When a component with multiple outputs feeds the input of another component, select the output by key, as in `op_task.outputs[key]`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "MW08Xh3lqzLY" | |
}, | |
"source": [ | |
"%%writefile pipeline.py\n", | |
"\"\"\"KFP penguin classification sample pipeline\"\"\"\n", | |
"\n", | |
"from pathlib import Path\n", | |
"from string import Template\n", | |
"from enum import Enum\n", | |
"\n", | |
"import kfp\n", | |
"from kfp.v2 import compiler\n", | |
"\n", | |
"#\n", | |
"# CONSTANTS\n", | |
"# ------------------------------------------------------------------------------\n", | |
"\n", | |
"PIPELINE_NAME = \"kfp-sample-pipeline\"\n", | |
"COMPONENT_PREFIX = \"kfp-sample\"\n", | |
"GCP_PROJECT_ID = \"your-gcp-project\"  # Enter your GCP project ID\n", | |
"GCP_GCR_ENDPOINT = \"gcr.io\"  # Enter your GCR endpoint\n", | |
"GCP_GCS_PIPELINE_ROOT = \"gs://your-gcs-bucket/\"  # Enter your GCS bucket\n", | |
"\n", | |
"\n", | |
"class GeneratedData(Enum):\n", | |
"    \"\"\"Output keys declared in the ComponentSpec YAML files.\"\"\"\n", | |
"\n", | |
"    TrainData = \"train_data_path\"\n", | |
"    EvalData = \"eval_data_path\"\n", | |
"    TransformedTrainData = \"transformed_train_data_path\"\n", | |
"    TransformedEvalData = \"transformed_eval_data_path\"\n", | |
"    TrainedModel = \"trained_model_path\"\n", | |
"\n", | |
"\n", | |
"#\n", | |
"# SUB FUNCTIONS\n", | |
"# ------------------------------------------------------------------------------\n", | |
"\n", | |
"\n", | |
"def get_version_from_toml(toml_path: str) -> str:\n", | |
"    \"\"\"Return the version declared in a pyproject.toml, or \"latest\" if none is found.\"\"\"\n", | |
"    path = Path(toml_path)\n", | |
"    lines = path.read_text().split(\"\\n\")\n", | |
"    for line in lines:\n", | |
"        if line.startswith(\"version = \"):\n", | |
"            # Split only on the first \"=\" so the value itself may contain \"=\".\n", | |
"            _, right = line.split(\"=\", 1)\n", | |
"            return right.replace('\"', \"\").strip()\n", | |
"    return \"latest\"\n", | |
"\n", | |
"\n", | |
"def get_component_spec(name: str) -> str:\n", | |
"    \"\"\"Load a component's YAML spec and fill in its tagged container image name.\"\"\"\n", | |
"    base_dir = f\"components/{name.replace('-', '_')}\"\n", | |
"    version = get_version_from_toml(f\"{base_dir}/pyproject.toml\")\n", | |
"    tag = f\"v{version}\"\n", | |
"    image = f\"{GCP_GCR_ENDPOINT}/{GCP_PROJECT_ID}/{COMPONENT_PREFIX}-{name}:{tag}\"\n", | |
"    path = Path(f\"{base_dir}/src/{name.replace('-', '_')}.yaml\")\n", | |
"    template = Template(path.read_text())\n", | |
"    return template.substitute(tagged_name=image)\n", | |
"\n", | |
"\n", | |
"#\n", | |
"# COMPONENTS\n", | |
"# ------------------------------------------------------------------------------\n", | |
"\n", | |
"\n", | |
"def _data_generator_op() -> kfp.dsl.ContainerOp:\n", | |
"    name = \"data-generator\"\n", | |
"    component_spec = get_component_spec(name)\n", | |
"    data_generator_op = kfp.components.load_component_from_text(component_spec)\n", | |
"    return data_generator_op()\n", | |
"\n", | |
"\n", | |
"def _transform_op(\n", | |
"    train_data_path: str, eval_data_path: str, suffix: str\n", | |
") -> kfp.dsl.ContainerOp:\n", | |
"    name = \"transform\"\n", | |
"    component_spec = get_component_spec(name)\n", | |
"    transform_op = kfp.components.load_component_from_text(component_spec)\n", | |
"    return transform_op(\n", | |
"        train_data_path=train_data_path, eval_data_path=eval_data_path, suffix=suffix\n", | |
"    )\n", | |
"\n", | |
"\n", | |
"def _trainer_op(transformed_train_data_path: str, suffix: str) -> kfp.dsl.ContainerOp:\n", | |
"    name = \"trainer\"\n", | |
"    component_spec = get_component_spec(name)\n", | |
"    trainer_op = kfp.components.load_component_from_text(component_spec)\n", | |
"    return trainer_op(\n", | |
"        transformed_train_data_path=transformed_train_data_path, suffix=suffix\n", | |
"    )\n", | |
"\n", | |
"\n", | |
"def _evaluator_op(\n", | |
"    trained_model_path: str, transformed_eval_data_path: str, suffix: str\n", | |
") -> kfp.dsl.ContainerOp:\n", | |
"    name = \"evaluator\"\n", | |
"    component_spec = get_component_spec(name)\n", | |
"    evaluator_op = kfp.components.load_component_from_text(component_spec)\n", | |
"    return evaluator_op(\n", | |
"        trained_model_path=trained_model_path,\n", | |
"        transformed_eval_data_path=transformed_eval_data_path,\n", | |
"        suffix=suffix,\n", | |
"    )\n", | |
"\n", | |
"\n", | |
"#\n", | |
"# PIPELINE\n", | |
"# ------------------------------------------------------------------------------\n", | |
"\n", | |
"\n", | |
"@kfp.dsl.pipeline(\n", | |
"    name=PIPELINE_NAME,\n", | |
"    pipeline_root=GCP_GCS_PIPELINE_ROOT,\n", | |
")\n", | |
"def kfp_sample_pipeline(suffix: str = \"_xf\"):\n", | |
"    data_generator = _data_generator_op()\n", | |
"    transform = _transform_op(\n", | |
"        train_data_path=data_generator.outputs[GeneratedData.TrainData.value],\n", | |
"        eval_data_path=data_generator.outputs[GeneratedData.EvalData.value],\n", | |
"        suffix=suffix,\n", | |
"    )\n", | |
"    trainer = _trainer_op(\n", | |
"        transformed_train_data_path=transform.outputs[\n", | |
"            GeneratedData.TransformedTrainData.value\n", | |
"        ],\n", | |
"        suffix=suffix,\n", | |
"    )\n", | |
"    _ = _evaluator_op(\n", | |
"        trained_model_path=trainer.outputs[GeneratedData.TrainedModel.value],\n", | |
"        transformed_eval_data_path=transform.outputs[\n", | |
"            GeneratedData.TransformedEvalData.value\n", | |
"        ],\n", | |
"        suffix=suffix,\n", | |
"    )\n", | |
"\n", | |
"\n", | |
"# Compile the pipeline with V2 SDK to test the compatibility between V1 and V2 SDK\n", | |
"compiler.Compiler().compile(\n", | |
"    pipeline_func=kfp_sample_pipeline,\n", | |
"    package_path=\"kfp_sample_pipeline.json\",\n", | |
")" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
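{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"`get_component_spec` in `pipeline.py` relies on `string.Template` to fill the container image into each ComponentSpec before it is loaded. The sketch below isolates that step. The YAML text is a made-up, heavily shortened stand-in and is not the contents of the repository's `data_generator.yaml`, and `render_spec` is a hypothetical helper; only the `$tagged_name` placeholder name follows from the `tagged_name=` keyword used in `pipeline.py`.\n", | |
"\n", | |
"```python\n", | |
"from string import Template\n", | |
"\n", | |
"# Hypothetical spec text for illustration; the real YAML files are longer.\n", | |
"SPEC_TEMPLATE = \"\"\"\n", | |
"name: data-generator\n", | |
"implementation:\n", | |
"  container:\n", | |
"    image: $tagged_name\n", | |
"\"\"\"\n", | |
"\n", | |
"\n", | |
"def render_spec(\n", | |
"    template_text: str, endpoint: str, project: str, name: str, version: str\n", | |
") -> str:\n", | |
"    \"\"\"Build the image reference and substitute it for $tagged_name.\"\"\"\n", | |
"    image = f\"{endpoint}/{project}/kfp-sample-{name}:v{version}\"\n", | |
"    return Template(template_text).substitute(tagged_name=image)\n", | |
"\n", | |
"\n", | |
"spec = render_spec(\n", | |
"    SPEC_TEMPLATE, \"gcr.io\", \"your-gcp-project\", \"data-generator\", \"0.0.0\"\n", | |
")\n", | |
"print(spec)\n", | |
"```\n", | |
"\n", | |
"`substitute` raises `KeyError` when a placeholder has no value, which surfaces a misspelled placeholder at compile time; `safe_substitute` would instead leave it in the text unchanged." | |
] | |
}, | |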
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "Yqb-eC89sYzW" | |
}, | |
"source": [ | |
"Now compile the pipeline. Running the Python file above compiles it." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "c7m-n8-ZsYlV" | |
}, | |
"source": [ | |
"!python pipeline.py" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "2iwxfSTDstZG" | |
}, | |
"source": [ | |
"Run the compiled pipeline in the same way as before." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "NIEleRF6rxK8" | |
}, | |
"source": [ | |
"api_client = AIPlatformClient(project_id=GCP_PROJECT_ID, region=\"us-central1\")\n", | |
"api_client.create_run_from_job_spec(\n", | |
" job_spec_path='kfp_sample_pipeline.json',\n", | |
")" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "jSZtovuvtK5X" | |
}, | |
"source": [ | |
"## Appendix: Running on Kubeflow Pipelines v1\n", | |
"\n", | |
"This pipeline is built to run on both Vertex Pipelines and Kubeflow Pipelines v1. It still has to be recompiled for each target, so first recompile it for Kubeflow Pipelines.\n", | |
"\n", | |
"See [lab_sample_pipelines/DEPLOYMENT.md](https://github.com/reproio/lab_sample_pipelines/blob/main/kfp/DEPLOYMENT.md) for how to compile." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "pbRzJlups4Xo" | |
}, | |
"source": [ | |
"!dsl-compile --py pipeline.py --output kfp_sample_pipeline.yaml" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "dyIb8rFftYpz" | |
}, | |
"source": [ | |
"from google.colab import files\n", | |
"\n", | |
"files.download('kfp_sample_pipeline.yaml')" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "RmukmbYdvDVT" | |
}, | |
"source": [ | |
"After deploying, you can inspect how the pipeline behaves on Kubernetes by running the following commands.\n", | |
"\n", | |
"- `kubectl api-resources`\n", | |
"- `kubectl get wf`\n", | |
"- `kubectl describe wf [workflow name]`" | |
] | |
} | |
] | |
} |