Created
December 31, 2018 17:40
Extract your "Top 100 Songs 2018" playlist (Spotify)
"""Prerequisites:
(1) Open the developer console in your browser (e.g. Chrome: cmd-shift-c)
(2) Locate the div element with id "main" (it should be relatively easy to find, as it's the main enclosing div in the body)
(3) Copy all the inner HTML of that div (easy way: right-click and select "Edit as HTML", then copy normally)
(4) Save that to a file somewhere (e.g. /tmp/tracklist.html)
(5) Open up a Python shell (preferably IPython!) and execute the following
"""
import lxml.html

# read in the file from wherever it is saved
with open('/tmp/tracklist.html', 'r') as f:
    html = f.read()

# convert the html blob into an lxml tree
tree = lxml.html.fragment_fromstring(html)

# grab all the songs
songs = tree.xpath("//div[contains(concat(' ', normalize-space(@class), ' '),' tracklist-name ')]")

# grab all the artist/album info
metadata = tree.xpath("//div[contains(concat(' ', normalize-space(@class), ' '),' second-line ')]")

# zip the songs and metadata together, clean up the results a little, and spit them out
for s, m in zip(songs, metadata):
    meta = m.text_content()[8:] if "Explicit" in m.text_content() else m.text_content()
    meta = meta.replace('•', '/')
    print("%s - %s" % (s.text, meta))
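The cleanup step in the loop drops the first 8 characters whenever "Explicit" appears anywhere in the metadata text, so an artist or album whose name happens to contain that word would lose its first 8 characters. A minimal sketch of a stricter variant follows, assuming the "Explicit" badge text is always fused onto the very start of the string. The helper name `clean_metadata` and the sample strings are hypothetical and do not come from the original script.

```python
def clean_metadata(raw: str) -> str:
    """Tidy one artist/album string scraped from the track list."""
    text = raw.strip()
    # Assumption: the "Explicit" badge, when present, is a prefix of the text.
    if text.startswith("Explicit"):
        text = text[len("Explicit"):]
    # Swap the bullet separator for a slash, as the original loop does.
    return text.replace("•", "/").strip()


if __name__ == "__main__":
    # Hypothetical sample strings, not real playlist data.
    for sample in ["ExplicitSome Artist • Some Album",
                   "Another Artist • Explicit Content"]:
        print(clean_metadata(sample))
```

Inside the loop this could stand in for the two `meta = ...` lines, e.g. `meta = clean_metadata(m.text_content())`.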