Created November 4, 2024 02:04
Super quick and dirty web scraper using Newspaper4k
import logging

import newspaper  # provided by the newspaper4k package

logger = logging.getLogger(__name__)


def get_news_article_text(url):
    """Return the article's url, title, and text; title and text are empty on failure."""
    try:
        article = newspaper.article(url)
        title = article.title
        text = article.text_cleaned
    except Exception as e:
        logger.debug(f"Error occurred while fetching article at {url}: {e}")
        return {"url": url, "title": "", "text": ""}
    return {"url": url, "title": title, "text": text}
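Because the function swallows errors and returns empty strings instead of raising, a caller has to check the "text" field to tell a failed fetch from a real article. Below is a rough sketch of one way to do that over a list of URLs. `scrape_many` and `fake_fetch` are hypothetical names that are not part of the gist or of Newspaper4k; `fake_fetch` only stands in for `get_news_article_text` so the example runs without network access.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def scrape_many(urls: Iterable[str], fetch: Callable[[str], Record]) -> List[Record]:
    """Call fetch on each URL and keep only records with non-empty text."""
    results = []
    for url in urls:
        record = fetch(url)
        if record["text"]:  # failed fetches come back with text == ""
            results.append(record)
    return results


def fake_fetch(url: str) -> Record:
    """Hypothetical stand-in that mimics the gist function's return shape."""
    if "bad" in url:
        return {"url": url, "title": "", "text": ""}
    return {"url": url, "title": "Example", "text": "Body text"}


articles = scrape_many(["https://example.com/a", "https://example.com/bad"], fake_fetch)
print([a["url"] for a in articles])
```

In real use, `get_news_article_text` would be passed in place of `fake_fetch`. Note that the gist logs failures at debug level, so they stay invisible unless logging is configured to show debug messages.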