@jgoodie
Created November 4, 2024 02:04
Super quick and dirty web scraper using Newspaper4k
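import logging

import newspaper  # provided by the newspaper4k package

logger = logging.getLogger(__name__)


def get_news_article_text(url):
    """Download and parse the article at url; return empty strings on any failure."""
    try:
        article = newspaper.article(url)
        title = article.title
        text = article.text_cleaned
    except Exception as e:
        logger.debug(f"Error occurred while fetching article at {url}: {e}")
        return {"url": url, "title": "", "text": ""}
    return {"url": url, "title": title, "text": text}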
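
Since the function reports failure by returning empty strings rather than raising, a caller scraping several URLs probably wants to filter those out. The following is a minimal sketch of that pattern, not part of the gist: `scrape_many` and `fake_fetch` are hypothetical names, and `fake_fetch` is an offline stand-in with the same return shape as `get_news_article_text` so the example runs without network access.

```python
from typing import Callable, Dict, Iterable, List


def scrape_many(
    urls: Iterable[str],
    fetch: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Run `fetch` on each URL and keep only results that contain article text."""
    results = []
    for url in urls:
        record = fetch(url)
        # Failures come back with empty strings, so filter on the "text" field.
        if record.get("text"):
            results.append(record)
    return results


def fake_fetch(url: str) -> Dict[str, str]:
    """Hypothetical offline stand-in that mimics the scraper's return shape."""
    if "bad" in url:
        return {"url": url, "title": "", "text": ""}
    return {"url": url, "title": "Example", "text": "Body of " + url}


articles = scrape_many(
    ["https://example.com/a", "https://example.com/bad"],
    fake_fetch,
)
print(len(articles))
```

In real use, `get_news_article_text` would be passed in place of `fake_fetch`.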