@manisnesan
Created October 11, 2023 19:06
021_caikit_tutorial.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/manisnesan/c008f853cd1699db87ccb097a1945e77/021_caikit_tutorial.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cKcNAW1PGjNX"
},
"source": [
"# Getting Started with Sentiment Analysis Pipeline in caikit"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "s8VqflCQGjNa"
},
"source": [
"## Setup"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YrSF4jKOGjNa"
},
"outputs": [],
"source": [
"# %pip install caikit transformers requests\n",
"# mamba install pytorch cpuonly -c pytorch\n",
"# mamba install grpcio"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XK3lZt1xGjNc"
},
"outputs": [],
"source": [
"# %%bash\n",
"# pip install caikit[runtime-grpc] -qqq\n",
"# pip install caikit[runtime-http] -qqq"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "CK5TzvWYGjNc",
"outputId": "73e677bd-a555-461a-84b2-da1b1d51f18b"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Python 3.9.18\n"
]
}
],
"source": [
"!python --version"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tqYMv6bXGjNd",
"outputId": "77b70f99-53ee-4023-ea85-128247c9546e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Name: protobuf\n",
"Version: 4.24.4\n",
"Summary: \n",
"Home-page: https://developers.google.com/protocol-buffers/\n",
"Author: [email protected]\n",
"Author-email: [email protected]\n",
"License: 3-Clause BSD License\n",
"Location: /home/msivanes/miniconda3/envs/fastchai/lib/python3.9/site-packages\n",
"Requires: \n",
"Required-by: caikit, grpcio-health-checking, grpcio-reflection, py-to-proto\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip show protobuf"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "OAu3-5KEGjNd",
"outputId": "268efb27-3142-4895-f636-6acb3def73d2"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Name: py-to-proto\n",
"Version: 0.5.1\n",
"Summary: A tool to dynamically create protobuf message classes from python data schemas\n",
"Home-page: https://github.com/IBM/py-to-proto\n",
"Author: Gabe Goodhart\n",
"Author-email: [email protected]\n",
"License: MIT\n",
"Location: /home/msivanes/miniconda3/envs/fastchai/lib/python3.9/site-packages\n",
"Requires: alchemy-logging, protobuf\n",
"Required-by: caikit\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip show py-to-proto"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "h2wXQpmvGjNe",
"outputId": "a25f1cf9-ff9a-44db-db68-3559c6d6972e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Name: grpcio\n",
"Version: 1.59.0\n",
"Summary: HTTP/2-based RPC framework\n",
"Home-page: https://grpc.io\n",
"Author: The gRPC Authors\n",
"Author-email: [email protected]\n",
"License: Apache License 2.0\n",
"Location: /home/msivanes/miniconda3/envs/fastchai/lib/python3.9/site-packages\n",
"Requires: \n",
"Required-by: caikit, grpcio-health-checking, grpcio-reflection, py-grpc-prometheus\n",
"Note: you may need to restart the kernel to use updated packages.\n"
]
}
],
"source": [
"%pip show grpcio"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Eiq2UpauGjNe"
},
"source": [
"## Outline"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zuxq0lX4GjNe"
},
"source": [
"- Data Module\n",
"- Runtime Model\n",
"- Config\n",
"- Runtime\n",
"- Client"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Wl1iC3idGjNf"
},
"source": [
"## Text Classification Example"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YqQom8UrGjNf",
"outputId": "3850274a-f6b9-4122-a0ab-c31882d5b84a"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/msivanes/.local/lib/python3.9/site-packages/fastprogress/fastprogress.py:102: UserWarning: Couldn't import ipywidgets properly, progress bar will use console behavior\n",
" warn(\"Couldn't import ipywidgets properly, progress bar will use console behavior\")\n"
]
}
],
"source": [
"from fastcore.all import *\n",
"import warnings; warnings.filterwarnings('ignore')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QZ7EptdxGjNf"
},
"source": [
"### Requirements"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "USU31haWGjNf",
"outputId": "ba61e2ce-a08f-4640-da3b-b6b2489b8c45"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting requirements-caikit.txt\n"
]
}
],
"source": [
"%%writefile requirements-caikit.txt\n",
"\n",
"caikit[runtime-grpc, runtime-http]\n",
"\n",
"# Only needed for HuggingFace\n",
"scipy\n",
"# torch\n",
"# transformers~=4.27.2\n",
"\n",
"# For http client\n",
"requests"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iHiKIw2jGjNg"
},
"source": [
"### Data Module"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3gh0hNLBGjNg"
},
"outputs": [],
"source": [
"Path('./text_sentiment/data_model').mkdir(exist_ok=True, parents=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_0Tt9hurGjNg",
"outputId": "015c1928-bf76-400b-8d5e-dcb44389fa0f"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting ./text_sentiment/data_model/classification.py\n"
]
}
],
"source": [
"%%writefile ./text_sentiment/data_model/classification.py\n",
"\n",
"from typing import List\n",
"from caikit.core import DataObjectBase\n",
"\n",
"from caikit.core.data_model import dataobject\n",
"\n",
"# A DataObject is a data model class that is backed by a @dataclass.\n",
"@dataobject(package=\"text_sentiment.data_model\")\n",
"class ClassInfo(DataObjectBase):\n",
"    class_name: str\n",
"    conf: float\n",
"\n",
"@dataobject(package=\"text_sentiment.data_model\")\n",
"class ClassificationPrediction(DataObjectBase):\n",
"    classes: List[ClassInfo]"
]
},
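{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough mental model for what `@dataobject` generates, here is a plain-`dataclass` stand-in (a sketch only; the real caikit classes are additionally backed by generated protobuf messages):\n",
"\n",
"```python\n",
"from dataclasses import dataclass, field\n",
"from typing import List\n",
"\n",
"@dataclass\n",
"class ClassInfo:  # stand-in for the @dataobject-generated class\n",
"    class_name: str\n",
"    conf: float\n",
"\n",
"@dataclass\n",
"class ClassificationPrediction:\n",
"    classes: List[ClassInfo] = field(default_factory=list)\n",
"\n",
"pred = ClassificationPrediction(classes=[ClassInfo('POSITIVE', 0.98)])\n",
"print(pred.classes[0].class_name)\n",
"```"
]
},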
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pfJaei69GjNg",
"outputId": "f643b8d7-76a8-4e1d-cd48-e02d7a30262e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting ./text_sentiment/data_model/__init__.py\n"
]
}
],
"source": [
"%%writefile ./text_sentiment/data_model/__init__.py\n",
"\n",
"from .classification import ClassificationPrediction"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-_Ngy0xiGjNh"
},
"source": [
"### Runtime Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "HcFedBT8GjNh"
},
"outputs": [],
"source": [
"Path('./text_sentiment/runtime_model').mkdir(exist_ok=True, parents=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6mEm0GA9GjNh",
"outputId": "8d9d6591-3db2-4c14-a222-8197cef9af65"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting ./text_sentiment/runtime_model/hf_module.py\n"
]
}
],
"source": [
"%%writefile ./text_sentiment/runtime_model/hf_module.py\n",
"\n",
"import os\n",
"\n",
"from caikit.core import ModuleBase, ModuleLoader, ModuleSaver, TaskBase, task, module\n",
"from text_sentiment.data_model.classification import ClassificationPrediction, ClassInfo\n",
"from transformers import pipeline\n",
"\n",
"@task(required_parameters={\"text_input\": str}, output_type=ClassificationPrediction)\n",
"class HFSentimentTask(TaskBase):  # defines the input args and output type for the task\n",
"    pass\n",
"\n",
"@module('8f72161-c0e4-49b0-8fd0-7587b3017a35', 'HFSentimentModule', '0.0.1', HFSentimentTask)\n",
"class HFSentimentModule(ModuleBase):  # wraps the HF sentiment-analysis pipeline as a caikit module\n",
"    def __init__(self, model_path) -> None:\n",
"        super().__init__()\n",
"        loader = ModuleLoader(model_path)  # loads the model from the path\n",
"        config = loader.config  # gets the config from the model\n",
"        model = pipeline(model=config.hf_artifact_path, task='sentiment-analysis')\n",
"        self.sentiment_pipeline = model  # keeps the pipeline as an attribute of the module\n",
"\n",
"    def run(self, text_input: str) -> ClassificationPrediction:\n",
"        raw_results = self.sentiment_pipeline([text_input])  # runs the pipeline on the input text\n",
"        class_info = []\n",
"        for result in raw_results:\n",
"            class_info.append(ClassInfo(class_name=result['label'], conf=result['score']))  # one ClassInfo per result\n",
"        return ClassificationPrediction(classes=class_info)\n",
"\n",
"    @classmethod\n",
"    def bootstrap(cls, model_path='distilbert-base-uncased-finetuned-sst-2-english'):\n",
"        # classmethod to create a caikit module from a HF model\n",
"        return cls(model_path=model_path)\n",
"\n",
"    def save(self, model_path, **kwargs):\n",
"        # ModuleSaver saves modules and acts as a context manager for cleaning up after saving\n",
"        module_saver = ModuleSaver(self, model_path=model_path)\n",
"        with module_saver:\n",
"            rel_path, _ = module_saver.add_dir(\"hf_model\")\n",
"            save_path = os.path.join(model_path, rel_path)\n",
"            self.sentiment_pipeline.save_pretrained(save_path)\n",
"            module_saver.update_config({\"hf_artifact_path\": rel_path})\n",
"\n",
"    @classmethod\n",
"    def load(cls, model_path):  # classmethod to load a previously saved caikit model\n",
"        return cls(model_path=model_path)"
]
},
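{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mapping inside `run` can be sanity-checked offline by faking the pipeline output: the HF sentiment pipeline returns one `{'label': ..., 'score': ...}` dict per input. This sketch mirrors the loop with plain dicts instead of `ClassInfo` objects:\n",
"\n",
"```python\n",
"# Fake what self.sentiment_pipeline([text_input]) returns for a single input\n",
"raw_results = [{'label': 'NEGATIVE', 'score': 0.9997}]\n",
"\n",
"# Mirrors the loop in HFSentimentModule.run\n",
"class_info = [{'class_name': r['label'], 'conf': r['score']} for r in raw_results]\n",
"print(class_info)\n",
"```"
]
},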
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9bh4VTcDGjNh",
"outputId": "8eab787c-da28-4a5e-de0d-e412f8bcf488"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting ./text_sentiment/runtime_model/__init__.py\n"
]
}
],
"source": [
"%%writefile ./text_sentiment/runtime_model/__init__.py\n",
"\n",
"from .hf_module import HFSentimentModule"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jUgUD1QZGjNi"
},
"source": [
"### Config"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "lWDrjilVGjNi",
"outputId": "125176a7-7e73-48ed-d2af-45d9ad47e2a6"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting ./text_sentiment/config.yml\n"
]
}
],
"source": [
"%%writefile ./text_sentiment/config.yml\n",
"\n",
"runtime:\n",
" library: text_sentiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4HX6PuKxGjNi"
},
"outputs": [],
"source": [
"Path('./models/text_sentiment').mkdir(exist_ok=True, parents=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_kDVwA3gGjNj",
"outputId": "eae14e21-a428-439c-f800-daf12d845e17"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting ./models/text_sentiment/config.yml\n"
]
}
],
"source": [
"%%writefile ./models/text_sentiment/config.yml\n",
"\n",
"module_id: 8f72161-c0e4-49b0-8fd0-7587b3017a35\n",
"name: HFSentimentModule\n",
"version: 0.0.1"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "I6KFIYV5GjNj"
},
"source": [
"### Runtime\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "EDmm2h9vGjNj"
},
"outputs": [],
"source": [
"# Kill the process using a particular port\n",
"# !lsof -ti tcp:8086 | xargs kill -9"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fq5qMemwGjNj"
},
"outputs": [],
"source": [
"# !lsof -ti tcp:8086"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "OmwEciaLGjNk",
"outputId": "4876c633-cb49-4cd7-b554-17b7ed068228"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting start_runtime.py\n"
]
}
],
"source": [
"%%writefile start_runtime.py\n",
"\n",
"# Standard\n",
"from os import path\n",
"import sys\n",
"\n",
"# First Party\n",
"import alog\n",
"\n",
"# Local\n",
"from caikit.runtime.__main__ import main\n",
"import caikit\n",
"\n",
"if __name__ == \"__main__\":\n",
"    models_directory = path.abspath(path.join(path.dirname(__file__), \"models\"))\n",
"    caikit.config.configure(\n",
"        config_dict={\n",
"            \"merge_strategy\": \"merge\",\n",
"            \"runtime\": {\n",
"                \"local_models_dir\": models_directory,\n",
"                \"library\": \"text_sentiment\",\n",
"                \"grpc\": {\"enabled\": True},\n",
"                \"http\": {\"enabled\": True},\n",
"            },\n",
"        }\n",
"    )\n",
"\n",
"    # Assumes this start_runtime file sits at the same level as the\n",
"    # text_sentiment package\n",
"    sys.path.append(path.abspath(path.join(path.dirname(__file__), \"../\")))\n",
"\n",
"    alog.configure(default_level=\"debug\")\n",
"\n",
"    main()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Z4KrfHcbGjNk",
"outputId": "4375903b-a569-449a-d47a-f77fa108c54b"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting ./text_sentiment/__init__.py\n"
]
}
],
"source": [
"%%writefile ./text_sentiment/__init__.py\n",
"\n",
"from os import path\n",
"from . import data_model, runtime_model\n",
"import caikit\n",
"\n",
"CONFIG_PATH = path.realpath(path.join(path.dirname(__file__), \"config.yml\"))\n",
"caikit.configure(CONFIG_PATH)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tJ4-y9rDGjNk"
},
"source": [
"### Client"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qWd5FhGBGjNk",
"outputId": "1e474b02-b8ee-4937-b8a6-32eed87e8d9e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting ./client.py\n"
]
}
],
"source": [
"%%writefile ./client.py\n",
"\n",
"from caikit.config.config import get_config # interacts with config.yml\n",
"from caikit.runtime import get_inference_request # return the inference request DataModel for the Module or Task Class\n",
"from caikit.runtime.service_factory import ServicePackageFactory\n",
"from text_sentiment.runtime_model.hf_module import HFSentimentModule\n",
"import caikit, grpc, requests, json\n",
"\n",
"if __name__ == \"__main__\":\n",
"    caikit.config.configure(\n",
"        config_dict=dict(\n",
"            merge_strategy='merge',\n",
"            runtime=dict(library='text_sentiment', grpc=dict(enabled=True), http=dict(enabled=True)),\n",
"        )\n",
"    )\n",
"    # ServicePackage: a container with properties referencing everything you need to bind\n",
"    # a concrete Servicer implementation to a protobufs Service and grpc Server\n",
"    inference_service = ServicePackageFactory.get_service_package(\n",
"        ServicePackageFactory.ServiceType.INFERENCE\n",
"    )\n",
"\n",
"    model_id = 'text_sentiment'\n",
"\n",
"    if get_config().runtime.grpc.enabled:\n",
"        # set up the gRPC client\n",
"        port = 8085\n",
"        channel = grpc.insecure_channel(f'localhost:{port}')\n",
"        client_stub = inference_service.stub_class(channel)\n",
"\n",
"        for text in ['I am not feeling well today', 'Today is a nice sunny day']:\n",
"            request = get_inference_request(task_or_module_class=HFSentimentModule.TASK_CLASS)(text_input=text).to_proto()\n",
"            response = client_stub.HFSentimentTaskPredict(\n",
"                request, metadata=[('mm-model-id', model_id)], timeout=1\n",
"            )\n",
"            print('Text: ', text)\n",
"            print('Response from gRPC: ', response)\n",
"\n",
"    if get_config().runtime.http.enabled:\n",
"        port = 8080\n",
"        for text in ['I am not feeling well today', 'Today is a nice sunny day']:\n",
"            url = f'http://localhost:{port}/api/v1/models/{model_id}/task/hugging-face-sentiment'\n",
"            data = {'inputs': text}\n",
"            response = requests.post(url, json=data, timeout=1)\n",
"            print('\\nText: ', text)\n",
"            print('Response from HTTP: ', json.dumps(response.json(), indent=4))"
]
},
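{
"cell_type": "markdown",
"metadata": {},
"source": [
"The HTTP branch above posts a JSON body of the form `{'inputs': text}`. The URL and payload construction can be checked without a running server (assumption: this endpoint shape matches the caikit HTTP runtime version used in this notebook):\n",
"\n",
"```python\n",
"import json\n",
"\n",
"model_id = 'text_sentiment'\n",
"port = 8080\n",
"url = f'http://localhost:{port}/api/v1/models/{model_id}/task/hugging-face-sentiment'\n",
"payload = json.dumps({'inputs': 'Today is a nice sunny day'})\n",
"print(url)\n",
"```"
]
},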
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fPfbemtqGjNl"
},
"outputs": [],
"source": [
"# Install the dependencies\n",
"# %pip install -r requirements-caikit.txt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mQ3q65j3GjNl",
"outputId": "e653a7eb-23bb-40ef-8610-6d0baaae19b3"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2023-10-11T18:55:47.765526 [RUNTI:DBUG] Starting up caikit.runtime.grpc_server\n",
"{\"channel\": \"COM-LIB-INIT\", \"exception\": null, \"level\": \"info\", \"log_code\": \"<RUN11997772I>\", \"message\": \"Loading service module: text_sentiment\", \"num_indent\": 0, \"thread_id\": 140468445459712, \"timestamp\": \"2023-10-11T18:55:47.766843\"}\n",
"{\"channel\": \"COM-LIB-INIT\", \"exception\": null, \"level\": \"info\", \"log_code\": \"<RUN11997772I>\", \"message\": \"Loading service module: caikit.interfaces.common\", \"num_indent\": 0, \"thread_id\": 140468445459712, \"timestamp\": \"2023-10-11T18:55:49.599237\"}\n",
"{\"channel\": \"COM-LIB-INIT\", \"exception\": null, \"level\": \"info\", \"log_code\": \"<RUN11997772I>\", \"message\": \"Loading service module: caikit.interfaces.runtime\", \"num_indent\": 0, \"thread_id\": 140468445459712, \"timestamp\": \"2023-10-11T18:55:49.599381\"}\n",
"Traceback (most recent call last):\n",
" File \"/home/msivanes/Documents/1Projects/fastchai/start_runtime.py\", line 51, in <module>\n",
" main()\n",
" File \"/home/msivanes/miniconda3/envs/fastchai/lib/python3.9/site-packages/caikit/runtime/__main__.py\", line 62, in main\n",
" _grpc_server = RuntimeGRPCServer()\n",
" File \"/home/msivanes/miniconda3/envs/fastchai/lib/python3.9/site-packages/caikit/runtime/grpc_server.py\", line 67, in __init__\n",
" super().__init__(get_config().runtime.grpc.port, tls_config_override)\n",
" File \"/home/msivanes/miniconda3/envs/fastchai/lib/python3.9/site-packages/caikit/runtime/server_base.py\", line 57, in __init__\n",
" ServicePackageFactory.get_service_package(\n",
" File \"/home/msivanes/miniconda3/envs/fastchai/lib/python3.9/site-packages/caikit/runtime/service_factory.py\", line 161, in get_service_package\n",
" rpc_jsons = [rpc.create_rpc_json(package_name) for rpc in rpc_list]\n",
" File \"/home/msivanes/miniconda3/envs/fastchai/lib/python3.9/site-packages/caikit/runtime/service_factory.py\", line 161, in <listcomp>\n",
" rpc_jsons = [rpc.create_rpc_json(package_name) for rpc in rpc_list]\n",
" File \"/home/msivanes/miniconda3/envs/fastchai/lib/python3.9/site-packages/caikit/runtime/service_generation/rpcs.py\", line 304, in create_rpc_json\n",
" output_type_name = self.return_type.get_proto_class().DESCRIPTOR.full_name\n",
"AttributeError: 'NoneType' object has no attribute 'DESCRIPTOR'\n"
]
}
],
"source": [
"# Running the caikit runtime\n",
"!python start_runtime.py"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bZ4Joo9qGjNl"
},
"source": [
"## fin"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "gfW6SD2fGjNl"
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.18"
},
"colab": {
"provenance": [],
"toc_visible": true,
"include_colab_link": true
}
},
"nbformat": 4,
"nbformat_minor": 0
}