@dzakyputra
Last active April 25, 2020 14:51
import re

import requests
from bs4 import BeautifulSoup


def get_hangeul():
    """Return the set of unique words in BBC Korean's most-read headlines."""
    # Scrape the page and collect the headline elements
    url = 'https://www.bbc.com/korean/popular/read'
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    titles = soup.find_all('span',
                           {'class': 'most-popular-list-item__headline'})

    # For each title: strip punctuation, split into words, collect them
    result = []
    for title in titles:
        text = re.sub(r'[^\w\s]', '', title.text)
        result += text.split()

    # Return the unique words
    return set(result)
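
The punctuation-stripping and de-duplication step can be checked offline, without requesting the page. The following is a minimal sketch under stated assumptions: the helper name `extract_words` and the sample headlines are hypothetical and not part of the original gist. It relies on `\w` being Unicode-aware for `str` patterns in Python 3, so Hangul syllables are kept while punctuation is removed.

```python
import re


def extract_words(headlines):
    """Strip punctuation from each headline and return the unique words.

    Hypothetical helper: mirrors the loop in get_hangeul() but takes
    plain strings, so it can be run without network access.
    """
    words = set()
    for headline in headlines:
        # Drop every character that is neither a word character nor whitespace
        cleaned = re.sub(r'[^\w\s]', '', headline)
        words.update(cleaned.split())
    return words


# Made-up sample headlines for illustration (not taken from the BBC page)
sample = ['한국, 오늘의 뉴스!', '오늘의 날씨: 맑음']
unique_words = extract_words(sample)
print(sorted(unique_words))
```

Because the result is a set, the word that appears in both sample headlines is stored only once, and the order of the words is not preserved.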