| import os | |
| import sys | |
| from pprint import pprint | |
| import asyncio | |
| from loguru import logger | |
| from dotenv import load_dotenv | |
| from llama_index.core.agent.workflow import FunctionAgent | |
| from llama_index.llms.openai import OpenAI | |
| from llama_index.core import VectorStoreIndex, SimpleDirectoryReader | |
| from llama_index.readers.file import PandasCSVReader |
| #!/bin/bash | |
| # Llama.cpp Docker Server Launcher | |
| # | |
| # This script can be configured using environment variables and/or command-line arguments. | |
| # Command-line arguments take precedence over environment variables. | |
| # | |
| # Environment variables: | |
| # LLAMA_HOST - Server host (default: 0.0.0.0) | |
| # LLAMA_PORT - Server port (default: 8000) |
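The launcher itself is a Bash script, and its argument parsing is not shown in this preview. As a rough illustration of the precedence rule stated in the header comments (command-line arguments override environment variables, which override built-in defaults), here is a minimal Python sketch. The `--host` and `--port` flag names and the `resolve_config` helper are assumptions made for this example; only `LLAMA_HOST`, `LLAMA_PORT`, and their defaults come from the script's comments.

```python
import argparse


def resolve_config(argv, env):
    """Resolve settings with precedence: CLI argument > environment variable > default.

    `--host` and `--port` are hypothetical flag names; the original Bash
    launcher may use different ones.
    """
    parser = argparse.ArgumentParser(description="Precedence sketch")
    # The environment value (or the documented default) becomes the argparse
    # default, so an explicit flag on the command line always wins.
    parser.add_argument("--host", default=env.get("LLAMA_HOST", "0.0.0.0"))
    parser.add_argument("--port", type=int, default=int(env.get("LLAMA_PORT", "8000")))
    return parser.parse_args(argv)
```

In a real script this would be called as `resolve_config(sys.argv[1:], os.environ)`; passing both explicitly keeps the function easy to test.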
| import pandas as pd | |
| from urllib import request | |
| from bs4 import BeautifulSoup | |
| from fake_useragent import UserAgent | |
| from typing import Union | |
| from time import sleep | |
| class WorldPostCodeScraper: | |
| """Scraper class for https://worldpostalcode.com/.""" |
| """Python executable which scrapes IMDB for reviews.""" | |
| import argparse | |
| import pandas as pd | |
| from time import sleep | |
| from tqdm import tqdm | |
| from dependencies.general import timing | |
| from dependencies.scrapers import ImdbReviewScraper |
| class ImdbReviewScraper(Scraper): | |
| """Implements methods for scraping IMDB. | |
| Inherited Attributes: | |
| chromedriver (chromedriver): a Chrome webdriver for Selenium. | |
| Own Methods: | |
| @staticmethod get_ratings_page | |
| @staticmethod get_reviews_page | |
| get_episodes_links |
| class ScraperException(Exception): | |
| """Starting point for Scraper exceptions.""" | |
| pass | |
| class ImdbScraperException(ScraperException): | |
| """Starting point for IMDB Scraper exceptions.""" | |
| pass | |
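A base exception with site-specific subclasses lets callers catch every scraper failure in one place while still distinguishing IMDB errors when needed. The sketch below illustrates that pattern under stated assumptions: `fetch_reviews` and `safe_fetch` are hypothetical helpers invented for this example and are not part of the original code.

```python
class ScraperException(Exception):
    """Starting point for Scraper exceptions."""


class ImdbScraperException(ScraperException):
    """Starting point for IMDB Scraper exceptions."""


def fetch_reviews(url: str) -> str:
    """Hypothetical helper: rejects URLs that do not point at IMDB."""
    if "imdb.com" not in url:
        raise ImdbScraperException(f"Not an IMDB URL: {url}")
    return f"reviews from {url}"


def safe_fetch(url: str) -> str:
    """Catch the base class so any scraper-specific error is handled here."""
    try:
        return fetch_reviews(url)
    except ScraperException as exc:
        return f"scraper error: {exc}"
```

Catching `ScraperException` rather than `Exception` keeps unrelated bugs (for example a `TypeError`) visible instead of silently swallowing them.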
| # Data manipulation | |
| import pandas as pd | |
| import re as regex | |
| # Scraping | |
| from bs4 import BeautifulSoup | |
| from selenium import webdriver | |
| from selenium.webdriver.common.by import By | |
| from selenium.webdriver.chrome.service import Service | |
| from webdriver_manager.chrome import ChromeDriverManager |
| @timing | |
| def main(season_link: str, show_link: str, driver_service: Service, output_path: str) -> None: | |
| """Scrape the IMDB reviews for each episode of a season, plus the show's general reviews. | |
| Args: | |
| season_link (str): URL pointing to season page. | |
| show_link (str): URL pointing to show general reviews. | |
| driver_service (Service): a Selenium Service for the Chrome driver. | |
| output_path (str): path including filename where we want to save the CSV. | |
| """ |
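The `@timing` decorator is imported from `dependencies.general`, whose source is not shown here. As a rough idea of what such a decorator usually does, the following is a hypothetical stand-in that reports how long the wrapped function took; the real implementation may log differently or return extra data.

```python
import functools
import time


def timing(func):
    """Hypothetical stand-in for dependencies.general.timing."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Report the elapsed time even if the wrapped function raises.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.3f}s")

    return wrapper


@timing
def demo(n: int) -> int:
    """Toy workload used only to exercise the decorator."""
    return sum(range(n))
```

Using `time.perf_counter` rather than `time.time` avoids distortions from system clock adjustments during long scraping runs.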
| def scrape_reviews_page(reviews_soup: BeautifulSoup) -> pd.DataFrame: | |
| """Scrape IMDB reviews page. | |
| Note: Extracts ratings, usernames, review date, titles, review body text, | |
| number of reactions, total reactions to review. | |
| Args: | |
| reviews_soup (BeautifulSoup): soup of the entirely loaded reviews page. | |
| Returns: |