@blude
Created August 20, 2025 14:50
Introductory Web Scraping Course
{
"cells": [
{
"cell_type": "markdown",
"id": "3688fd13",
"metadata": {},
"source": [
"# Web Scraping Exam"
]
},
{
"cell_type": "markdown",
"id": "07633b04",
"metadata": {},
"source": [
"## Evaluation\n",
"\n",
"Your performance will be evaluated according to the following criteria:\n",
"\n",
"- **Task Completion:** All tasks are completed as described.\n",
"- **Correctness:** Code runs without errors and produces the expected output.\n",
"- **Understanding:** Answers to questions demonstrate understanding of key concepts.\n",
"\n",
"**Passing the course:** \n",
"Learners must complete all tasks, answer at least 70% of the questions correctly, and meet the evaluation criteria above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "93ce3b5f",
"metadata": {},
"outputs": [],
"source": [
"url_fc_article1 = \"https://www.fastcompany.com/91374558/where-are-all-the-designers\"\n",
"url_fc_category = \"https://www.fastcompany.com/section/strategy\""
]
},
{
"cell_type": "markdown",
"id": "c4a526f0",
"metadata": {},
"source": [
"### Part 1\n",
"1. Scrape a list of articles from a news site, collecting each article's title, URL, thumbnail, lead text, and publication date.\n",
"2. Save this list as a JSON file.\n",
"\n",
"### Part 2\n",
"1. Go through each article in the list from Part 1 and scrape its full content, including the full-resolution image.\n",
"2. Collect every link found in the article's body into a separate list.\n",
"\n",
"### Bonus\n",
"- Remove extraneous text that is not part of the original news article."
]
},
{
"cell_type": "markdown",
"id": "ffd0ce25",
"metadata": {},
"source": [
"Complete the mini-project and submit your code and results."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "30370f0b",
"metadata": {},
"outputs": [],
"source": [
"# your solution for part 1 here"
]
},
{
"cell_type": "markdown",
"id": "59682fa2",
"metadata": {},
"source": [
"## Solution"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3501a5dd",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "154e1f24",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv (3.9.6)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
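
The notebook above leaves the solution cells empty. As a rough illustration of two of its steps (Part 2, step 2: collecting the links in an article's body, and Part 1, step 2: saving a list as JSON), here is a minimal sketch. It is not the course's reference solution. It uses only the Python standard library rather than whichever scraping library the course teaches, and it assumes the article body sits inside an `<article>` element, which has not been checked against the real site. The names `BodyLinkParser`, `extract_body_links`, and `save_articles`, as well as the sample markup, are hypothetical.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class BodyLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside an <article> element.

    Assumption: the article body is wrapped in <article>; adjust the tag
    check to match the markup of the site actually being scraped.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.depth = 0   # greater than 0 while inside <article>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1
        elif tag == "a" and self.depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1


def extract_body_links(html, base_url):
    """Return absolute URLs of all links found inside the article body."""
    parser = BodyLinkParser(base_url)
    parser.feed(html)
    return parser.links


def save_articles(articles, path):
    """Write a list of article dicts to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(articles, f, ensure_ascii=False, indent=2)


# Hypothetical sample markup, not taken from the real site.
sample_html = """
<nav><a href="/section/strategy">Strategy</a></nav>
<article>
  <p>See <a href="/about">our team</a> and
  <a href="https://example.com/report">this report</a>.</p>
</article>
"""

url_fc_article1 = "https://www.fastcompany.com/91374558/where-are-all-the-designers"
links = extract_body_links(sample_html, url_fc_article1)
print(links)
```

The link in the `<nav>` element is skipped because it lies outside `<article>`. In a real run, the HTML would come from an HTTP response for each article URL, and `save_articles(articles, "articles.json")` would persist the list built in Part 1.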