def get_weaviate():
    try:
        # Try to connect to an existing Weaviate instance
        client = weaviate.connect_to_local(port=8079, grpc_port=50060)
        if client.is_ready():
            print("Connected to an already running Weaviate instance.")
            return client
        # else:
        #     print("Weaviate instance not ready.")
def inference(text, model, tokenizer, max_input_tokens=1000, max_output_tokens=100):
    """
    Generates a continuation of the input text using the provided model and tokenizer.

    Args:
        text (str): The input text prompt.
        model: The pre-trained language model for generation.
        tokenizer: The tokenizer corresponding to the model.
        max_input_tokens (int, optional): Maximum number of tokens for the input. Defaults to 1000.
        max_output_tokens (int, optional): Maximum number of tokens to generate. Defaults to 100.
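
The preview above cuts off before the function body, so the following is only a rough illustration of what the two limits in the signature mean, not the gist's implementation. It assumes a toy whitespace tokenizer and a stand-in `next_token` callable (both hypothetical), and it assumes truncation keeps the first `max_input_tokens` tokens; the original may truncate differently.

```python
def toy_inference(text, next_token, max_input_tokens=1000, max_output_tokens=100):
    """Toy stand-in for a model call: cap the prompt, then cap the generation."""
    # Assumption: whitespace "tokens", truncated from the right.
    prompt_tokens = text.split()[:max_input_tokens]

    generated = []
    for _ in range(max_output_tokens):
        # next_token is a hypothetical callable: context tokens -> next token or None.
        token = next_token(prompt_tokens + generated)
        if token is None:  # the stand-in model signals end of sequence
            break
        generated.append(token)

    # Return only the continuation, not the prompt.
    return " ".join(generated)
```

With a real Hugging Face model, the same two caps are usually expressed through tokenizer truncation and a generation length limit rather than a hand-written loop.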
"""
This script loads the Hugging Face "google/gemma-2-2b-it" model using 8-bit quantization for optimized inference.
It integrates with LangChain using a prompt template to simulate a travel agent AI, which breaks down travel requests into start, pitstops, and end locations.
"""
# Inspired by "Get Started with LangChain: Your Key to Mastering LLM Pipelines"
# https://medium.com/the-ai-espresso/get-started-with-langchain-your-key-to-mastering-llm-pipelines-b25a1728e8f3/
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
from langchain.prompts import PromptTemplate
from langchain_huggingface.llms import HuggingFacePipeline
from pymongo import MongoClient
import weaviate
from weaviate.classes.query import MetadataQuery
import os
from faker import Faker
from dotenv import load_dotenv
from pathlib import Path
from weaviate.classes.config import Configure, Property, DataType
import random
"""
This module processes a PDF file using the Unstructured API to extract text, tables, and images.
The extracted data is saved in a specified output directory. The module also provides functions
to convert HTML tables to pandas DataFrames and HTML content to Markdown.

Functions:
    - html_table_to_dataframe: Convert an HTML table to a pandas DataFrame.
    - html_to_markdown: Convert HTML content to Markdown.
    - process_pdf: Process a PDF file using the Unstructured API and save the extracted data.
      Images are saved as PNG files, tables as HTML files, and text as JSON.
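
The docstring above names an `html_table_to_dataframe` helper, but the preview ends before its body. As a sketch of the idea only, and not the author's code, the following uses Python's standard `html.parser` to collect table cells into a list of rows. It deliberately stops short of building a pandas DataFrame so that it stays dependency-free; `TableRows` and `html_table_to_rows` are hypothetical names.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row being built, or None when outside a <tr>
        self._cell = None  # text chunks of the open cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_rows(html):
    """Return the table's cells as a list of rows (first row is often the header)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

If pandas is available, the returned rows could be passed to `pd.DataFrame(rows[1:], columns=rows[0])`, treating the first row as the header.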
import pandas as pd
import os
from dotenv import load_dotenv
from datasets import load_dataset
from itertools import islice


def load_and_preview_dataset(dataset_name, config=None, split="train"):
    print(f"Loading {dataset_name} dataset...")
    try: