@nick3499
Last active October 16, 2021 01:41
Python 3: Scrape weather data from Dark Sky: requests, BeautifulSoup, pendulum
#!/usr/bin/env python3
'''Scrape Dark Sky website for weather data.'''
import requests
from bs4 import BeautifulSoup
import pendulum

# source website
URL = 'https://darksky.net/forecast/40.7127,-74.0059/us12/en'
# browser header
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36'}
# request webpage; stop early on an HTTP error
webpage = requests.get(URL, headers=headers, timeout=10)
webpage.raise_for_status()
# print title
print('\x1b[1;7;96m=== Forecasted Temperatures ===\x1b[0m')
# scrape content
soup = BeautifulSoup(webpage.content, 'html.parser')
temp_range = soup.select('span.tempRange')
condition = soup.select('a.day')
# print weather data for today (day 0) and the next seven days
today = pendulum.now()
for day in range(8):
    print(f"\x1b[1;7;32m {today.add(days=day).format('dddd'):10}\x1b[0m ")
    print(f" \x1b[1;31mhi\x1b[0m: {temp_range[day].find('span', {'class': 'maxTemp'}).get_text()}")
    print(f" \x1b[1;34mlo\x1b[0m: {temp_range[day].find('span', {'class': 'minTemp'}).get_text()}")
    # the alt text's last word is dropped; the first remaining word is the condition
    print(f" \x1b[38;5;202mcond\x1b[0m: {condition[day].find('img')['alt'].split()[:-1][0]}")
@nick3499

Notes:

  • requests is often already installed, but the bs4 and pendulum modules may need to be installed (bs4 is published on PyPI as beautifulsoup4).
  • Change the source URL to your own location by replacing the latitude and longitude (40.7127,-74.0059 in the script).
  • Search Google for 'my header' to find a site that displays your browser's User-Agent string, then replace the User-Agent value in the script with your own so that it reflects your system. That header makes the request look like it came from a browser.
  • pendulum simplifies working with dates compared with Python's built-in datetime module.
  • bs4 (BeautifulSoup) parses the HTML of the weather page so the data can be extracted.
  • Be aware that scrapers can break when a website's markup changes.
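
Two of the notes above (changing the location, and scrapers breaking when the site changes) can be sketched as small helpers. This is only a rough sketch: forecast_url and require_days are hypothetical names that are not part of the script, and the URL layout (/forecast/<lat>,<lon>/<units>/<lang>) is an assumption inferred from the single URL used in the script.

```python
def forecast_url(lat, lon, units='us12', lang='en'):
    '''Build a forecast URL for a location (hypothetical helper).

    Assumes the path layout seen in the script's URL:
    /forecast/<lat>,<lon>/<units>/<lang>
    '''
    return f'https://darksky.net/forecast/{lat:.4f},{lon:.4f}/{units}/{lang}'


def require_days(matches, selector, days=8):
    '''Return matches unchanged, or raise a clear error if there are too few.

    Hypothetical guard: the script indexes 8 elements per selector, so a
    shorter list usually means the page markup has changed.
    '''
    if len(matches) < days:
        raise RuntimeError(
            f'{selector!r} matched {len(matches)} element(s), expected at '
            f'least {days}; the page layout may have changed')
    return matches


# New York City, the location used in the script
print(forecast_url(40.7127, -74.0059))
# prints https://darksky.net/forecast/40.7127,-74.0059/us12/en
```

In the script, the guard could wrap each selector call, for example temp_range = require_days(soup.select('span.tempRange'), 'span.tempRange'), so that a layout change produces a readable message instead of an IndexError partway through the output.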
