@manisnesan
Last active October 23, 2024 10:20
Qna.yaml as Pydantic Data Model with YAML to Pydantic Conversions

This script defines a Pydantic data model for a Q&A system based on a YAML structure. It includes classes for handling questions and answers, document chunks, and overall document metadata. The model supports validation for SHA1 commit hashes.
{
"cells": [
{
"cell_type": "markdown",
"id": "dd8bbed4-1d63-4e93-8447-9455a975f7aa",
"metadata": {},
"source": [
"## Qna.yaml as Pydantic Data Model with YAML to Pydantic conversions"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "e231bff2-fd1c-4caf-aad9-85013abe94f7",
"metadata": {},
"outputs": [],
"source": [
"# %pip install pydantic"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "74049fdc-4e8e-423f-a608-e3b774098138",
"metadata": {},
"outputs": [],
"source": [
"# %pip install pyyaml rich"
]
},
{
"cell_type": "code",
"execution_count": 87,
"id": "30752f9c-231a-42e6-afe9-e6edf9d66226",
"metadata": {},
"outputs": [],
"source": [
"from pydantic import BaseModel, HttpUrl, field_validator\n",
"from typing import List\n",
"import re\n",
"import yaml\n",
"\n",
"class QuestionAnswer(BaseModel):\n",
"    question: str  # Question about a single step, or a summary of all steps, or a yes/no question, or a reasoning question on two or three steps.\n",
"    answer: str  # The answer corresponding to the question.\n",
"\n",
"class DocChunkQnA(BaseModel):\n",
"    context: str  # Token size limit per chunk is 500; should be directly extracted from the document.\n",
"    questions_and_answers: List[QuestionAnswer]  # Each question and answer is limited to 250 tokens.\n",
"    # Any QnA pairs beyond the first 3 are ignored.\n",
"    # Should be diverse and reflect the use case for the model.\n",
"    # Another LLM could be used to generate questions.\n",
"    # Answers for the generated questions should be discerned directly using knowledge from the content.\n",
"\n",
"class Doc(BaseModel):\n",
"    repo: HttpUrl  # URL of the repository.\n",
"    commit: str  # SHA1 commit hash.\n",
"    patterns: List[str]  # List of patterns related to the document.\n",
"\n",
"    @field_validator('commit')\n",
"    @classmethod\n",
"    def validate_sha1(cls, v):\n",
"        # fullmatch also rejects a trailing newline, which '$' with re.match would accept\n",
"        if not re.fullmatch(r'[a-f0-9]{40}', v):\n",
"            raise ValueError('commit must be a valid SHA1 hash')\n",
"        return v\n",
"\n",
"class ILabQnA(BaseModel):\n",
"    version: int  # Version of the QnA model.\n",
"    domain: str  # Domain of the knowledge base (e.g., astronomy, cooking).\n",
"    created_by: str  # Creator of the QnA instance.\n",
"    seed_examples: List[DocChunkQnA]  # No upper bound, but generally 5-10 is considered good.\n",
"    document_outline: str  # Outline of the document; should be concise, around 10 words.\n",
"    document: Doc  # Document containing the repository information."
]
},
{
"cell_type": "code",
"execution_count": 79,
"id": "6bcc7d3b-0dcc-458e-83ae-68e6788e79ed",
"metadata": {},
"outputs": [],
"source": [
"import yaml\n",
"def pydantic_to_yaml(data: ILabQnA, file_path: str) -> None:\n",
"    \"\"\"Convert Pydantic model to YAML and write to a file.\"\"\"\n",
"    with open(file_path, 'w', encoding='utf-8') as file:\n",
"        # mode='json' turns HttpUrl into a plain string so the file can be read back with yaml.safe_load\n",
"        yaml.dump(data.model_dump(mode='json'), file, sort_keys=False, allow_unicode=True)\n",
"\n",
"def yaml_to_pydantic(file_path: str) -> ILabQnA:\n",
"    \"\"\"Read YAML file and convert to Pydantic model.\"\"\"\n",
"    with open(file_path, 'r', encoding='utf-8') as file:\n",
"        data = yaml.safe_load(file)\n",
"    return ILabQnA(**data)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e4d9842c-fa0c-4684-9af0-39426d31557a",
"metadata": {},
"outputs": [],
"source": [
"# rprint(yaml_to_pydantic('qna.yaml'))"
]
},
{
"cell_type": "code",
"execution_count": 81,
"id": "e2d82719-3d22-46fa-8e0c-ac2b26e1ef78",
"metadata": {},
"outputs": [],
"source": [
"# Example data to write to YAML\n",
"example_data = ILabQnA(\n",
"    version=3,\n",
"    domain=\"astronomy\",\n",
"    created_by=\"juliadenham\",\n",
"    seed_examples=[\n",
"        DocChunkQnA(\n",
"            context=\"**Phoenix** is a minor [constellation](constellation \\\"wikilink\\\") in the southern sky...\",\n",
"            questions_and_answers=[\n",
"                QuestionAnswer(question=\"What is the Phoenix constellation?\", answer=\"Phoenix is a minor constellation in the southern sky.\"),\n",
"                QuestionAnswer(question=\"Who charted the Phoenix constellation?\", answer=\"The Phoenix constellation was charted by french explorer and astronomer Nicolas Louis de Lacaille.\"),\n",
"            ]\n",
"        )\n",
"    ],\n",
"    document_outline=\"Information about the Phoenix Constellation including the history, characteristics, and features of the stars in the constellation.\",\n",
"    document=Doc(\n",
"        repo=\"https://github.com/RedHatOfficial/rhelai-taxonomy-data\",\n",
"        commit=\"c87a82eb15567f28c0a8d30025e0cd77a2150646\",\n",
"        patterns=[\"phoenix.md\"]\n",
"    )\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 82,
"id": "b2dc53db-7b4c-4037-a970-8af586f62191",
"metadata": {},
"outputs": [],
"source": [
"from rich import print as rprint"
]
},
{
"cell_type": "code",
"execution_count": 83,
"id": "6aaf47a2-09bc-4b84-b57f-21c47b2413e5",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">QuestionAnswer</span><span style=\"font-weight: bold\">(</span>\n",
" <span style=\"color: #808000; text-decoration-color: #808000\">question</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'What is the Phoenix constellation?'</span>,\n",
" <span style=\"color: #808000; text-decoration-color: #808000\">answer</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'Phoenix is a minor constellation in the southern sky.'</span>\n",
"<span style=\"font-weight: bold\">)</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1;35mQuestionAnswer\u001b[0m\u001b[1m(\u001b[0m\n",
" \u001b[33mquestion\u001b[0m=\u001b[32m'What is the Phoenix constellation?'\u001b[0m,\n",
" \u001b[33manswer\u001b[0m=\u001b[32m'Phoenix is a minor constellation in the southern sky.'\u001b[0m\n",
"\u001b[1m)\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"rprint(example_data.seed_examples[0].questions_and_answers[0])"
]
},
{
"cell_type": "code",
"execution_count": 84,
"id": "849c862a-da5c-427e-bebe-a4db79035895",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'question': 'What is the Phoenix constellation?',\n",
" 'answer': 'Phoenix is a minor constellation in the southern sky.'}"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"example_data.seed_examples[0].questions_and_answers[0].model_dump()"
]
},
{
"cell_type": "code",
"execution_count": 86,
"id": "8ef677cd-41ce-4026-9d10-87587e34f2d2",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">created_by: juliadenham\n",
"document:\n",
" commit: c87a82eb15567f28c0a8d30025e0cd77a2150646\n",
" patterns:\n",
" - phoenix.md\n",
" repo: !!python/object/new:pydantic_core._pydantic_core.Url\n",
" - <span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://github.com/RedHatOfficial/rhelai-taxonomy-data</span>\n",
"document_outline: Information about the Phoenix Constellation including the history,\n",
" characteristics, and features of the stars in the constellation.\n",
"domain: astronomy\n",
"seed_examples:\n",
"- context: '**Phoenix** is a minor <span style=\"font-weight: bold\">(</span>constellation <span style=\"color: #008000; text-decoration-color: #008000\">\"wikilink\"</span><span style=\"font-weight: bold\">)</span> in the\n",
" southern sky<span style=\"color: #808000; text-decoration-color: #808000\">...</span>'\n",
" questions_and_answers:\n",
" - answer: Phoenix is a minor constellation in the southern sky.\n",
" question: What is the Phoenix constellation?\n",
" - answer: The Phoenix constellation was charted by french explorer and astronomer\n",
" Nicolas Louis de Lacaille.\n",
" question: Who charted the Phoenix constellation?\n",
"version: <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3</span>\n",
"\n",
"</pre>\n"
],
"text/plain": [
"created_by: juliadenham\n",
"document:\n",
" commit: c87a82eb15567f28c0a8d30025e0cd77a2150646\n",
" patterns:\n",
" - phoenix.md\n",
" repo: !!python/object/new:pydantic_core._pydantic_core.Url\n",
" - \u001b[4;94mhttps://github.com/RedHatOfficial/rhelai-taxonomy-data\u001b[0m\n",
"document_outline: Information about the Phoenix Constellation including the history,\n",
" characteristics, and features of the stars in the constellation.\n",
"domain: astronomy\n",
"seed_examples:\n",
"- context: '**Phoenix** is a minor \u001b[1m(\u001b[0mconstellation \u001b[32m\"wikilink\"\u001b[0m\u001b[1m)\u001b[0m in the\n",
" southern sky\u001b[33m...\u001b[0m'\n",
" questions_and_answers:\n",
" - answer: Phoenix is a minor constellation in the southern sky.\n",
" question: What is the Phoenix constellation?\n",
" - answer: The Phoenix constellation was charted by french explorer and astronomer\n",
" Nicolas Louis de Lacaille.\n",
" question: Who charted the Phoenix constellation?\n",
"version: \u001b[1;36m3\u001b[0m\n",
"\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"rprint(yaml.dump(example_data.model_dump()))"
]
},
{
"cell_type": "code",
"execution_count": 55,
"id": "99b527a1-3257-4bed-b4bc-10a2579f5690",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'version': 3,\n",
" 'domain': 'astronomy',\n",
" 'created_by': 'juliadenham',\n",
" 'seed_examples': [{'context': '**Phoenix** is a minor [constellation](constellation \"wikilink\") in the southern sky...',\n",
" 'questions_and_answers': [{'question': 'What is the Phoenix constellation?',\n",
" 'answer': 'Phoenix is a minor constellation in the southern sky.'},\n",
" {'question': 'Who charted the Phoenix constellation?',\n",
" 'answer': 'The Phoenix constellation was charted by french explorer and astronomer Nicolas Louis de Lacaille.'}]}],\n",
" 'document_outline': 'Information about the Phoenix Constellation including the history, characteristics, and features of the stars in the constellation.',\n",
" 'document': {'repo': Url('https://github.com/RedHatOfficial/rhelai-taxonomy-data'),\n",
" 'commit': 'c87a82eb15567f28c0a8d30025e0cd77a2150646',\n",
" 'patterns': ['phoenix.md']}}"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"example_data.model_dump()"
]
},
{
"cell_type": "code",
"execution_count": 56,
"id": "2d738229-0b0e-40a5-b530-b147c2ae544b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">Doc</span><span style=\"font-weight: bold\">(</span>\n",
" <span style=\"color: #808000; text-decoration-color: #808000\">repo</span>=<span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">Url</span><span style=\"font-weight: bold\">(</span><span style=\"color: #008000; text-decoration-color: #008000\">'https://github.com/RedHatOfficial/rhelai-taxonomy-data'</span><span style=\"font-weight: bold\">)</span>,\n",
" <span style=\"color: #808000; text-decoration-color: #808000\">commit</span>=<span style=\"color: #008000; text-decoration-color: #008000\">'c87a82eb15567f28c0a8d30025e0cd77a2150646'</span>,\n",
" <span style=\"color: #808000; text-decoration-color: #808000\">patterns</span>=<span style=\"font-weight: bold\">[</span><span style=\"color: #008000; text-decoration-color: #008000\">'phoenix.md'</span><span style=\"font-weight: bold\">]</span>\n",
"<span style=\"font-weight: bold\">)</span>\n",
"</pre>\n"
],
"text/plain": [
"\u001b[1;35mDoc\u001b[0m\u001b[1m(\u001b[0m\n",
" \u001b[33mrepo\u001b[0m=\u001b[1;35mUrl\u001b[0m\u001b[1m(\u001b[0m\u001b[32m'https://github.com/RedHatOfficial/rhelai-taxonomy-data'\u001b[0m\u001b[1m)\u001b[0m,\n",
" \u001b[33mcommit\u001b[0m=\u001b[32m'c87a82eb15567f28c0a8d30025e0cd77a2150646'\u001b[0m,\n",
" \u001b[33mpatterns\u001b[0m=\u001b[1m[\u001b[0m\u001b[32m'phoenix.md'\u001b[0m\u001b[1m]\u001b[0m\n",
"\u001b[1m)\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"rprint(example_data.document)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "077c91a7-3c69-4a93-871a-381b733cae4c",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "rhelai",
"language": "python",
"name": "rhelai"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@manisnesan
Author

Key Features:

  • Pydantic Models: Defines QuestionAnswer, DocChunkQnA, Doc, and ILabQnA classes for structured data.
  • YAML Conversion: Functions to convert Pydantic models to YAML and vice versa (pydantic_to_yaml and yaml_to_pydantic).
  • Validation: Uses field_validator to enforce constraints on fields, such as SHA1 format for commits.
  • Example Usage: Demonstrates how to create an instance of the model, serialize it to YAML, and deserialize it back to a Pydantic model.
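
As a rough illustration of the validation feature, the sketch below rebuilds only the Doc model from the notebook and checks which commit strings it accepts. The helper commit_is_accepted and the example.com repository URL are hypothetical names introduced here for illustration; they are not part of the gist.

```python
import re
from typing import List

from pydantic import BaseModel, HttpUrl, ValidationError, field_validator


class Doc(BaseModel):
    repo: HttpUrl
    commit: str
    patterns: List[str]

    @field_validator("commit")
    @classmethod
    def validate_sha1(cls, v: str) -> str:
        # Accept exactly 40 lowercase hexadecimal characters.
        if not re.fullmatch(r"[a-f0-9]{40}", v):
            raise ValueError("commit must be a valid SHA1 hash")
        return v


def commit_is_accepted(commit: str) -> bool:
    """Hypothetical helper: True if Doc accepts the commit, False otherwise."""
    try:
        # The repo URL is a placeholder used only for this sketch.
        Doc(repo="https://example.com/repo", commit=commit, patterns=["phoenix.md"])
    except ValidationError:
        # Pydantic wraps the ValueError raised in the validator.
        return False
    return True


print(commit_is_accepted("c87a82eb15567f28c0a8d30025e0cd77a2150646"))
print(commit_is_accepted("c87a82e"))
```

Note that the pattern only matches lowercase hex digits, so an uppercase hash is rejected as well; whether that is desirable depends on where the commit values come from.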

Installation:

pip install pydantic rich pyyaml

Usage:

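  • Use pydantic_to_yaml to write an ILabQnA instance to a YAML file and yaml_to_pydantic to read one back.
  • Loading or constructing a model validates its structure and content; invalid data (for example, a commit that is not a 40-character lowercase hex SHA1) raises a pydantic ValidationError.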
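
One detail worth sketching is the round trip itself. The notebook output shows that dumping the model's dict directly makes PyYAML emit the repo field as a `!!python/object/new:pydantic_core._pydantic_core.Url` tag, which yaml.safe_load will not read back. A possible way around this, assuming Pydantic v2, is to dump with mode="json" so the URL becomes a plain string. The functions doc_to_yaml and yaml_to_doc below are hypothetical in-memory variants of the gist's file-based helpers, and the Doc model is reduced to its fields for brevity.

```python
from typing import List

import yaml
from pydantic import BaseModel, HttpUrl


class Doc(BaseModel):
    repo: HttpUrl
    commit: str
    patterns: List[str]


def doc_to_yaml(doc: Doc) -> str:
    """Hypothetical helper: serialize a Doc to a YAML string."""
    # mode="json" converts HttpUrl to str, so safe_dump can represent it.
    return yaml.safe_dump(doc.model_dump(mode="json"), sort_keys=False)


def yaml_to_doc(text: str) -> Doc:
    """Hypothetical helper: parse a YAML string and validate it as a Doc."""
    return Doc.model_validate(yaml.safe_load(text))


original = Doc(
    repo="https://github.com/RedHatOfficial/rhelai-taxonomy-data",
    commit="c87a82eb15567f28c0a8d30025e0cd77a2150646",
    patterns=["phoenix.md"],
)
text = doc_to_yaml(original)
restored = yaml_to_doc(text)
print(text)
```

Using yaml.safe_dump rather than yaml.dump here is a deliberate choice for the sketch: it refuses to serialize arbitrary Python objects, so a field that was not converted to a plain type fails loudly instead of producing a Python-specific tag.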
