Introductory Course in Web Scraping
Gist blude/3f24de6c58c8913d1b630a8417a562f0, created August 20, 2025 14:50
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3688fd13",
   "metadata": {},
   "source": [
    "# Web Scraping Exam"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "07633b04",
   "metadata": {},
   "source": [
    "## Evaluation\n",
    "\n",
    "Your performance will be evaluated according to the following criteria:\n",
    "\n",
    "- **Task Completion:** All tasks are completed as described.\n",
    "- **Correctness:** Code runs without errors and produces the expected output.\n",
    "- **Understanding:** Answers to the questions demonstrate an understanding of the key concepts.\n",
    "\n",
    "**Passing the course:** \n",
    "Learners must complete all tasks, answer at least 70% of the questions correctly, and meet the evaluation criteria above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "93ce3b5f",
   "metadata": {},
   "outputs": [],
   "source": [
    "url_fc_article1 = \"https://www.fastcompany.com/91374558/where-are-all-the-designers\"\n",
    "url_fc_category = \"https://www.fastcompany.com/section/strategy\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c4a526f0",
   "metadata": {},
   "source": [
    "### Part 1\n",
    "1. Scrape a list of articles from a news site and collect their titles, URLs, thumbnails, lead texts and publication dates.\n",
    "2. Save this list as a JSON file.\n",
    "\n",
    "### Part 2\n",
    "1. Go through each article in the previous list and scrape its full content, including the full-resolution image.\n",
    "2. Collect every link found in the article's body into a separate list.\n",
    "\n",
    "### Bonus\n",
    "- Remove extraneous text that is not part of the original news article."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ffd0ce25",
   "metadata": {},
   "source": [
    "Complete the mini-project and submit your code and results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30370f0b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# your solution for part 1 here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "59682fa2",
   "metadata": {},
   "source": [
    "## Solution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3501a5dd",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "154e1f24",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv (3.9.6)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
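Part 1 of the exercise above could be approached along the lines of the following minimal sketch. It is not a reference solution. It uses only the Python standard library (`urllib.request` to download, `html.parser.HTMLParser` to parse) instead of any third-party scraping library. It also assumes a teaser markup that has not been checked against the real Fast Company section page: each article wrapped in `<article>` with a link, an `<img>`, an `<h2>`/`<h3>` title, a `<p>` lead text and a `<time datetime>` element. `fetch`, `ListingParser`, `scrape_listing`, `save_json` and the `User-Agent` value are hypothetical names and choices made for illustration.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def fetch(url):
    """Download a page and return its HTML as text."""
    # ASSUMPTION: a browser-like User-Agent; some sites reject urllib's default one.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class ListingParser(HTMLParser):
    """Build one record per <article> teaser on a section page.

    ASSUMED markup (verify on the real page): each teaser is an <article>
    holding a link, an <img>, an <h2>/<h3> title, a <p> lead and a <time datetime>.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.articles = []
        self.current = None  # record being filled while inside <article>
        self.field = None    # "title" or "lead" while reading that element's text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article":
            self.current = {"title": "", "url": None, "thumbnail": None,
                            "lead": "", "date": None}
            self.field = None
        elif self.current is None:
            return  # ignore everything outside teasers
        elif tag == "a" and attrs.get("href") and self.current["url"] is None:
            self.current["url"] = urljoin(self.base_url, attrs["href"])
        elif tag == "img" and attrs.get("src") and self.current["thumbnail"] is None:
            self.current["thumbnail"] = urljoin(self.base_url, attrs["src"])
        elif tag == "time" and self.current["date"] is None:
            self.current["date"] = attrs.get("datetime")
        elif tag in ("h2", "h3") and not self.current["title"]:
            self.field = "title"
        elif tag == "p" and not self.current["lead"]:
            self.field = "lead"

    def handle_endtag(self, tag):
        if tag in ("h2", "h3", "p") and self.field and self.current is not None:
            # Collapse runs of whitespace once the element is complete.
            self.current[self.field] = " ".join(self.current[self.field].split())
            self.field = None
        elif tag == "article" and self.current is not None:
            self.articles.append(self.current)
            self.current = None

    def handle_data(self, data):
        if self.current is not None and self.field:
            self.current[self.field] += data


def scrape_listing(html, base_url):
    """Return the teaser records that have a URL."""
    parser = ListingParser(base_url)
    parser.feed(html)
    parser.close()
    return [a for a in parser.articles if a["url"]]


def save_json(records, path):
    """Write the records as pretty-printed UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

If the assumed markup holds, `articles = scrape_listing(fetch(url_fc_category), url_fc_category)` followed by `save_json(articles, "articles.json")` would cover both steps of Part 1. If the section page loads its teasers with JavaScript, a static parser like this one will not see them, and the approach has to change.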
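For Part 2, a similar standard-library sketch can parse a single article page. It assumes, without having verified it for Fast Company, that the article body sits inside `<article>`, that its text is in `<p>` tags, and that the full-resolution image is announced in an Open Graph `<meta property="og:image">` tag in the page head. It treats every `<a href>` inside `<article>` as a link "found in the article's body". `ArticleParser` and `scrape_article` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ArticleParser(HTMLParser):
    """Collect the title, lead image, body paragraphs and body links of one article.

    ASSUMED markup (verify on the real page): the body sits inside <article>,
    its text is in <p> tags, and the full-resolution image is announced in an
    Open Graph <meta property="og:image"> tag.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = None
        self.image = None
        self.paragraphs = []
        self.links = []
        self.depth = 0       # > 0 while inside an <article> element
        self.buffer = None   # text pieces of the <p> currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            prop, content = attrs.get("property"), attrs.get("content")
            if prop == "og:image" and content and self.image is None:
                self.image = urljoin(self.base_url, content)
            elif prop == "og:title" and content and self.title is None:
                self.title = content
        elif tag == "article":
            self.depth += 1
        elif self.depth and tag == "p":
            self.buffer = []
        elif self.depth and tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "article" and self.depth:
            self.depth -= 1
        elif tag == "p" and self.buffer is not None:
            text = " ".join("".join(self.buffer).split())
            if text:
                self.paragraphs.append(text)
            self.buffer = None

    def handle_data(self, data):
        if self.buffer is not None:
            self.buffer.append(data)


def scrape_article(html, url):
    """Return one article's details; `links` is the separate list of body links."""
    parser = ArticleParser(url)
    parser.feed(html)
    parser.close()
    # Keep only web links (drops mailto:, javascript:, ...) and remove duplicates
    # while preserving their order of appearance.
    links = [u for u in dict.fromkeys(parser.links)
             if urlparse(u).scheme in ("http", "https")]
    return {"url": url, "title": parser.title, "image": parser.image,
            "content": "\n\n".join(parser.paragraphs), "links": links}
```

To cover the whole list, download each `url` from the Part 1 JSON file and pass the HTML together with that URL to `scrape_article`; the `links` field of each result is the separate link list the task asks for. Anything else the real page places inside `<article>` (related-story boxes, captions, promos) will also end up in `content` and `links`, which is what the bonus task is about.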
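For the bonus task, one simple heuristic is to filter the extracted body paragraphs before joining them into the article text. The sketch below drops paragraphs that match a list of boilerplate patterns, are very short, or repeat an earlier paragraph. The entries in `BOILERPLATE_PATTERNS` and the `min_words` threshold are hypothetical placeholders, not strings observed on Fast Company pages; replace them with the promo, caption and navigation text you actually find.

```python
import re

# HYPOTHETICAL examples of non-article text that news pages often mix into the
# body. They are not taken from Fast Company; replace them with real examples.
BOILERPLATE_PATTERNS = [
    r"^advertisement$",
    r"^(related|read more|see also)\s*:",
    r"sign up for .*newsletter",
    r"^(photo|image):",
]
BOILERPLATE_RE = re.compile("|".join(BOILERPLATE_PATTERNS), re.IGNORECASE)


def clean_paragraphs(paragraphs, min_words=3):
    """Return the paragraphs that look like article text, in their original order.

    A paragraph is dropped if it matches BOILERPLATE_RE, has fewer than
    `min_words` words, or repeats an earlier paragraph verbatim
    (after whitespace is collapsed).
    """
    seen = set()
    kept = []
    for text in paragraphs:
        normalized = " ".join(text.split())
        if (not normalized
                or BOILERPLATE_RE.search(normalized)
                or len(normalized.split()) < min_words
                or normalized in seen):
            continue
        seen.add(normalized)
        kept.append(normalized)
    return kept
```

The word-count threshold trades recall for precision: it also removes genuine one- or two-word paragraphs, so lower `min_words` if real content disappears.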