Pinecone Preprocessing Data for Vector Database
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "W390nLBmZlId"
},
"source": [
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/search/semantic-search/semantic-search.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/search/semantic-search/semantic-search.ipynb)\n",
"\n",
"# Semantic Search\n",
"\n",
"[![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/fast-link.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/semantic-search.ipynb)\n",
"\n",
"In this walkthrough we will see how to use Pinecone for semantic search. To begin we must install the required prerequisite libraries:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "q03L1BYEZQfe"
},
"outputs": [],
"source": [
"!pip install -qU \\\n",
" \"pinecone-client[grpc]\"==3.1.0 \\\n",
" datasets==2.12.0 \\\n",
" sentence-transformers==2.2.2"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"🚨 _Note: the above `pip install` is formatted for Jupyter notebooks. If running elsewhere you may need to drop the `!`._\n",
"\n",
"---"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "hrSfFiIC5roI"
},
"source": [
"## Data Preprocessing"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "kujS_e8s55oJ"
},
"source": [
"The dataset preparation process requires a few steps:\n",
"\n",
"1. We download the Quora dataset from Hugging Face Datasets.\n",
"\n",
"2. The text content of the dataset is embedded into vectors.\n",
"\n",
"3. We reformat into a `(id, vector, metadata)` structure to be added to Pinecone.\n",
"\n",
"We will see how steps `1`, `2`, and `3` are done in this section, but we won't implement `2` and `3` across the whole dataset until we reach the *upsert loop* as we will iteratively perform these two steps.\n",
"\n",
"In either case, this can take some time. If you'd rather skip the data preparation step and get straight to upserts and testing the semantic search functionality, you should \n",
"refer to the [**fast notebook**](https://colab.research.google.com/github/pinecone-io/examples/blob/master/docs/semantic-search.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "IeJPWu9P7EtR",
"outputId": "2323c6b2-5feb-4601-e843-1dc04e272008"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:datasets.builder:Found cached dataset quora (/root/.cache/huggingface/datasets/quora/default/0.0.0/36ba4cd42107f051a158016f1bea6ae3f4685c5df843529108a54e42d86c1e04)\n"
]
},
{
"data": {
"text/plain": [
"Dataset({\n",
" features: ['questions', 'is_duplicate'],\n",
" num_rows: 80000\n",
"})"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from datasets import load_dataset\n",
"\n",
"dataset = load_dataset('quora', split='train[240000:290000]')\n",
"dataset"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "ngFHND1nQQU2"
},
"source": [
"The dataset contains ~400K pairs of natural language questions from Quora."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "CsA67WpW7El4",
"outputId": "5b69152a-3809-453d-ae9e-0bddc66db2eb"
},
"outputs": [
{
"data": {
"text/plain": [
"{'questions': [{'id': [207550, 351729],\n",
" 'text': ['What is the truth of life?', \"What's the evil truth of life?\"]},\n",
" {'id': [33183, 351730],\n",
" 'text': ['Which is the best smartphone under 20K in India?',\n",
" 'Which is the best smartphone with in 20k in India?']},\n",
" {'id': [351731, 351732],\n",
" 'text': ['Steps taken by Canadian government to improve literacy rate?',\n",
" 'Can I send homemade herbal hair oil from India to US via postal or private courier services?']},\n",
" {'id': [37799, 94186],\n",
" 'text': ['What is a good way to lose 30 pounds in 2 months?',\n",
" 'What can I do to lose 30 pounds in 2 months?']},\n",
" {'id': [351733, 351734],\n",
" 'text': ['Which of the following most accurately describes the translation of the graph y = (x+3)^2 -2 to the graph of y = (x -2)^2 +2?',\n",
" 'How do you graph x + 2y = -2?']}],\n",
" 'is_duplicate': [False, True, False, True, False]}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dataset[:5]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "H_Zy8zoeQmRZ"
},
"source": [
"Whether or not the questions are duplicates is not so important, all we need for this example is the text itself. We can extract them all into a single `questions` list."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "heGUpy_37Eis",
"outputId": "a9b61af4-6595-44fc-bd35-7a7501b26123"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Does Jimmy Wales use Wikipedia?\n",
"How can I find the real true purpose of my life?\n",
"What is the meaning of \"we'd\"?\n",
"If you could be famous for 15 minutes, for what would you want to be known?\n",
"Do you think Bruno Mars' songs are as good as the 70's funk songs?\n",
"136057\n"
]
}
],
"source": [
"questions = []\n",
"\n",
"for record in dataset['questions']:\n",
" questions.extend(record['text'])\n",
" \n",
"# remove duplicates\n",
"questions = list(set(questions))\n",
"print('\\n'.join(questions[:5]))\n",
"print(len(questions))"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "4BknpfucRkkm"
},
"source": [
"With our questions ready to go we can move on to demoing steps **2** and **3** above.\n",
"\n",
"### Building Embeddings and Upsert Format\n",
"\n",
"To create our embeddings we will us the `MiniLM-L6` sentence transformer model. This is a very efficient semantic similarity embedding model from the `sentence-transformers` library. We initialize it like so:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "uxcGjb9GSEqA",
"outputId": "e3b269d0-c7f4-41d8-85ad-5a94a0b79d8d"
},
"outputs": [
{
"data": {
"text/plain": [
"SentenceTransformer(\n",
" (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel \n",
" (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})\n",
" (2): Normalize()\n",
")"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sentence_transformers import SentenceTransformer\n",
"import torch\n",
"\n",
"device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
"if device != 'cuda':\n",
" print(f\"You are using {device}. This is much slower than using \"\n",
" \"a CUDA-enabled GPU. If on Colab you can change this by \"\n",
" \"clicking Runtime > Change runtime type > GPU.\")\n",
"\n",
"model = SentenceTransformer('all-MiniLM-L6-v2', device=device)\n",
"model"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "iy2itPb0S5js"
},
"source": [
"There are *three* interesting bits of information in the above model printout. Those are:\n",
"\n",
"* `max_seq_length` is `256`. That means that the maximum number of tokens (like words) that can be encoded into a single vector embedding is `256`. Anything beyond this *must* be truncated.\n",
"\n",
"* `word_embedding_dimension` is `384`. This number is the dimensionality of vectors output by this model. It is important that we know this number later when initializing our Pinecone vector index.\n",
"\n",
"* `Normalize()`. This final normalization step indicates that all vectors produced by the model are normalized. That means that models that we would typical measure similarity for using *cosine similarity* can also make use of the *dotproduct* similarity metric. In fact, with normalized vectors *cosine* and *dotproduct* are equivalent.\n",
"\n",
"Moving on, we can create a sentence embedding using this model like so:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "dyzoJEsAULOe",
"outputId": "a1ea6149-4b2d-4dfb-e984-b067fb9980d5"
},
"outputs": [
{
"data": {
"text/plain": [
"(384,)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = 'which city is the most populated in the world?'\n",
"\n",
"xq = model.encode(query)\n",
"xq.shape"
]
},
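{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Two of the points from the model printout are easy to verify directly. The following cell is a quick sanity check (an addition to the walkthrough): encoding a text longer than `max_seq_length` still returns a single `384`-dimensional vector because the input is truncated, and the embedding of our query is unit-length, which is why *cosine* and *dot product* agree for this model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# 1) inputs longer than max_seq_length (256 tokens) are truncated,\n",
"#    so we still get one fixed-size embedding back\n",
"long_text = ' '.join(['word'] * 1000)\n",
"print(model.encode(long_text).shape)\n",
"\n",
"# 2) the final Normalize() step makes every embedding unit-length,\n",
"#    so dot product and cosine similarity are equivalent\n",
"print(np.linalg.norm(xq))"
]
},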
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "qVZi8xevUWM6"
},
"source": [
"Encoding this single sentence leaves us with a `384` dimensional sentence embedding (aligned to the `word_embedding_dimension` above).\n",
"\n",
"To prepare this for `upsert` to Pinecone, all we do is this:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "T38HdqxwVg6p"
},
"outputs": [],
"source": [
"_id = '0'\n",
"metadata = {'text': query}\n",
"\n",
"vectors = [(_id, xq, metadata)]"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "wiXig_rHV2Wz"
},
"source": [
"Later when we do upsert our data to Pinecone, we will be doing so in batches. Meaning `vectors` will be a list of `(id, embedding, metadata)` tuples."
]
},
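{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration (a sketch only; the real batching happens in the upsert loop later, and nothing is upserted here), a two-record batch would look like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# a sketch of a two-record batch in (id, embedding, metadata) form\n",
"batch = [\n",
"    ('0', model.encode(questions[0]), {'text': questions[0]}),\n",
"    ('1', model.encode(questions[1]), {'text': questions[1]})\n",
"]\n",
"print(len(batch), batch[0][0], batch[0][1].shape)"
]
},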
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "ebd7XSamfMsC"
},
"source": [
"## Creating an Index\n",
"\n",
"Now the data is ready, we can set up our index to store it.\n",
"\n",
"We begin by initializing our connection to Pinecone. To do this we need a [free API key](https://app.pinecone.io)."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"id": "mc66NEBAcQHY"
},
"outputs": [],
"source": [
"import os\n",
"from pinecone import Pinecone\n",
"\n",
"# initialize connection to pinecone (get API key at app.pinecone.io)\n",
"api_key = os.environ.get('PINECONE_API_KEY') or 'PINECONE_API_KEY'\n",
"\n",
"# configure client\n",
"pc = Pinecone(api_key=api_key)"
]
},
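{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sanity check that the API key works (this uses the same `list_indexes` call we rely on below when creating the index), we can list the indexes already in the project:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional: confirm the connection works by listing existing indexes\n",
"print(pc.list_indexes().names())"
]
},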
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we setup our index specification, this allows us to define the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pinecone import ServerlessSpec\n",
"\n",
"cloud = os.environ.get('PINECONE_CLOUD') or 'aws'\n",
"region = os.environ.get('PINECONE_REGION') or 'us-east-1'\n",
"\n",
"spec = ServerlessSpec(cloud=cloud, region=region)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "SdaTip6CfllN"
},
"source": [
"Now we create a new index called `semantic-search`. It's important that we align the index `dimension` and `metric` parameters with those required by the `MiniLM-L6` model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"parameters"
]
},
"outputs": [],
"source": [
"index_name = 'semantic-search'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"# check if index already exists (it shouldn't if this is first time)\n",
"if index_name not in pc.list_indexes().names():\n",
" # if does not exist, create index\n",
" pc.create_index(\n",
" index_name,\n",
" dimension=model.get_sentence_embedding_dimension(),\n",
" metric='cosine',\n",
" spec=spec\n",
" )\n",
" # wait for index to be initialized\n",
" while not pc.describe_index(index_name).status['ready']:\n",
" time.sleep(1)\n",
"\n",
"# connect to index\n",
"index = pc.Index(index_name)\n",
"# view index stats\n",
"index.describe_index_stats()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "YUd1VGg6i108"
},
"source": [
"Now we upsert the data, we will do this in batches of `128`.\n",
"\n",
"_**Note:** On Google Colab with GPU expected runtime is ~7 minutes. If using CPU this will be significantly longer. If you'd like to get this running faster refer to the [fast notebook](https://github.com/pinecone-io/examples/blob/master/search/semantic-search/semantic-search-fast.ipynb)._"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 120,
"referenced_widgets": [
"0e5dc9a271184100a92cd6b373ab8e7d",
"8396bfd36c2b4869a0b4680587b604f1",
"d714f8b3ebca4195b747ab18d196b88b",
"56266bc7062541f5ba042848205270ef",
"78fab28034dd428c9f49a5653fd003e5",
"d8ca1c42783f41a8bd17491073883fdd",
"cffd5e66b82344b0946630eb320ca5b4",
"b63c0eee680f4708a4c68a4746cb21a4",
"12e8c994f3ac4f88ae5387c72b4d8cc7",
"9ea70e9ea5fb4040a080b941950b0fef",
"b30a98ddfc1841e381937571edcfae71"
]
},
"id": "RhR6WOi1huXZ",
"outputId": "ef9f74ef-2ae3-4eb3-cef4-6814e98861a7"
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "0e5dc9a271184100a92cd6b373ab8e7d",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/1063 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"{'dimension': 384,\n",
" 'index_fullness': 0.1,\n",
" 'namespaces': {'': {'vector_count': 136057}},\n",
" 'total_vector_count': 136057}"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from tqdm.auto import tqdm\n",
"\n",
"batch_size = 128\n",
"vector_limit = 100000\n",
"\n",
"questions = questions[:vector_limit]\n",
"\n",
"for i in tqdm(range(0, len(questions), batch_size)):\n",
" # find end of batch\n",
" i_end = min(i+batch_size, len(questions))\n",
" # create IDs batch\n",
" ids = [str(x) for x in range(i, i_end)]\n",
" # create metadata batch\n",
" metadatas = [{'text': text} for text in questions[i:i_end]]\n",
" # create embeddings\n",
" xc = model.encode(questions[i:i_end])\n",
" # create records list for upsert\n",
" records = zip(ids, xc, metadatas)\n",
" # upsert to Pinecone\n",
" index.upsert(vectors=records)\n",
"\n",
"# check number of records in the index\n",
"index.describe_index_stats()"
]
},
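{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional check (an addition to the walkthrough; not required for the next steps), we can fetch one record back by ID to confirm the vector and its metadata were stored:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional: fetch a single record by ID to confirm it was stored with metadata\n",
"res = index.fetch(ids=['0'])\n",
"print(res)"
]
},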
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "VrK_IN079Vuu"
},
"source": [
"## Making Queries"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "rr4unPAq9alb"
},
"source": [
"Now that our index is populated we can begin making queries. We are performing a semantic search for *similar questions*, so we should embed and search with another question. Let's begin."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "JWcO7jAK-N_1",
"outputId": "db437488-c928-4fd7-e4b7-83c93ac67f88"
},
"outputs": [
{
"data": {
"text/plain": [
"{'matches': [{'id': '31072',\n",
" 'metadata': {'text': 'What country has the biggest population?'},\n",
" 'score': 0.7655585,\n",
" 'sparse_values': {'indices': [], 'values': []},\n",
" 'values': []},\n",
" {'id': '23769',\n",
" 'metadata': {'text': 'What is the biggest city?'},\n",
" 'score': 0.7271395,\n",
" 'sparse_values': {'indices': [], 'values': []},\n",
" 'values': []},\n",
" {'id': '65783',\n",
" 'metadata': {'text': 'What is the most isolated city in the '\n",
" 'world, with over a million metro area '\n",
" 'inhabitants?'},\n",
" 'score': 0.7020447,\n",
" 'sparse_values': {'indices': [], 'values': []},\n",
" 'values': []},\n",
" {'id': '104484',\n",
" 'metadata': {'text': 'Which is the most beautiful city in '\n",
" 'world?'},\n",
" 'score': 0.69991666,\n",
" 'sparse_values': {'indices': [], 'values': []},\n",
" 'values': []},\n",
" {'id': '79997',\n",
" 'metadata': {'text': 'Where is the most beautiful city in the '\n",
" 'world?'},\n",
" 'score': 0.69605494,\n",
" 'sparse_values': {'indices': [], 'values': []},\n",
" 'values': []}],\n",
" 'namespace': ''}"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"query = \"which city has the highest population in the world?\"\n",
"\n",
"# create the query vector\n",
"xq = model.encode(query).tolist()\n",
"\n",
"# now query\n",
"xc = index.query(vector=xq, top_k=5, include_metadata=True)\n",
"xc"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "-XwOWcgo_QtI"
},
"source": [
"In the returned response `xc` we can see the most relevant questions to our particular query. We can reformat this response to be a little easier to read:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gy7isg_f-vWg",
"outputId": "bbeee182-8e31-4bba-da39-38521b13683f"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.77: What country has the biggest population?\n",
"0.73: What is the biggest city?\n",
"0.7: What is the most isolated city in the world, with over a million metro area inhabitants?\n",
"0.7: Which is the most beautiful city in world?\n",
"0.7: Where is the most beautiful city in the world?\n"
]
}
],
"source": [
"for result in xc['matches']:\n",
" print(f\"{round(result['score'], 2)}: {result['metadata']['text']}\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "1JK5yApl_5fE"
},
"source": [
"These are good results, let's try and modify the words being used to see if we still surface similar results."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "dJbjE-iq_yMr",
"outputId": "b81c67b1-9fe7-48c3-d2b0-14100cbbc25d"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.67: What is the most isolated city in the world, with over a million metro area inhabitants?\n",
"0.64: What is the biggest city?\n",
"0.61: Which place has the highest Asian Indian population in the USA?\n",
"0.6: What is the most dangerous city in USA?\n",
"0.59: What country has the biggest population?\n"
]
}
],
"source": [
"query = \"which metropolis has the highest number of people?\"\n",
"\n",
"# create the query vector\n",
"xq = model.encode(query).tolist()\n",
"\n",
"# now query\n",
"xc = index.query(vector=xq, top_k=5, include_metadata=True)\n",
"for result in xc['matches']:\n",
" print(f\"{round(result['score'], 2)}: {result['metadata']['text']}\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "HIAxOPb-A2w_"
},
"source": [
"Here we used different terms in our query than that of the returned documents. We substituted **\"city\"** for **\"metropolis\"** and **\"populated\"** for **\"number of people\"**.\n",
"\n",
"Despite these very different terms and *lack* of term overlap between query and returned documents — we get highly relevant results — this is the power of *semantic search*.\n",
"\n",
"You can go ahead and ask more questions above. When you're done, delete the index to save resources:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"id": "-cWdeKzhAtww"
},
"outputs": [],
"source": [
"pc.delete_index(index_name)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "PMMJSu_DbRx0"
},
"source": [
"---"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"0e5dc9a271184100a92cd6b373ab8e7d": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HBoxModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HBoxModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HBoxView",
"box_style": "",
"children": [
"IPY_MODEL_8396bfd36c2b4869a0b4680587b604f1",
"IPY_MODEL_d714f8b3ebca4195b747ab18d196b88b",
"IPY_MODEL_56266bc7062541f5ba042848205270ef"
],
"layout": "IPY_MODEL_78fab28034dd428c9f49a5653fd003e5"
}
},
"12e8c994f3ac4f88ae5387c72b4d8cc7": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "ProgressStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "ProgressStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"bar_color": null,
"description_width": ""
}
},
"56266bc7062541f5ba042848205270ef": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_9ea70e9ea5fb4040a080b941950b0fef",
"placeholder": "​",
"style": "IPY_MODEL_b30a98ddfc1841e381937571edcfae71",
"value": " 1063/1063 [02:48&lt;00:00, 6.41it/s]"
}
},
"78fab28034dd428c9f49a5653fd003e5": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"8396bfd36c2b4869a0b4680587b604f1": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "HTMLModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "HTMLModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "HTMLView",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_d8ca1c42783f41a8bd17491073883fdd",
"placeholder": "​",
"style": "IPY_MODEL_cffd5e66b82344b0946630eb320ca5b4",
"value": "100%"
}
},
"9ea70e9ea5fb4040a080b941950b0fef": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"b30a98ddfc1841e381937571edcfae71": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"b63c0eee680f4708a4c68a4746cb21a4": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
},
"cffd5e66b82344b0946630eb320ca5b4": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "DescriptionStyleModel",
"state": {
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "DescriptionStyleModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "StyleView",
"description_width": ""
}
},
"d714f8b3ebca4195b747ab18d196b88b": {
"model_module": "@jupyter-widgets/controls",
"model_module_version": "1.5.0",
"model_name": "FloatProgressModel",
"state": {
"_dom_classes": [],
"_model_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"_model_name": "FloatProgressModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/controls",
"_view_module_version": "1.5.0",
"_view_name": "ProgressView",
"bar_style": "success",
"description": "",
"description_tooltip": null,
"layout": "IPY_MODEL_b63c0eee680f4708a4c68a4746cb21a4",
"max": 1063,
"min": 0,
"orientation": "horizontal",
"style": "IPY_MODEL_12e8c994f3ac4f88ae5387c72b4d8cc7",
"value": 1063
}
},
"d8ca1c42783f41a8bd17491073883fdd": {
"model_module": "@jupyter-widgets/base",
"model_module_version": "1.2.0",
"model_name": "LayoutModel",
"state": {
"_model_module": "@jupyter-widgets/base",
"_model_module_version": "1.2.0",
"_model_name": "LayoutModel",
"_view_count": null,
"_view_module": "@jupyter-widgets/base",
"_view_module_version": "1.2.0",
"_view_name": "LayoutView",
"align_content": null,
"align_items": null,
"align_self": null,
"border": null,
"bottom": null,
"display": null,
"flex": null,
"flex_flow": null,
"grid_area": null,
"grid_auto_columns": null,
"grid_auto_flow": null,
"grid_auto_rows": null,
"grid_column": null,
"grid_gap": null,
"grid_row": null,
"grid_template_areas": null,
"grid_template_columns": null,
"grid_template_rows": null,
"height": null,
"justify_content": null,
"justify_items": null,
"left": null,
"margin": null,
"max_height": null,
"max_width": null,
"min_height": null,
"min_width": null,
"object_fit": null,
"object_position": null,
"order": null,
"overflow": null,
"overflow_x": null,
"overflow_y": null,
"padding": null,
"right": null,
"top": null,
"visibility": null,
"width": null
}
}
}
}
},
"nbformat": 4,
"nbformat_minor": 0
}