@loisaidasam
Last active April 5, 2016 17:48
Delicious Tag Slugger

I just migrated from Delicious to Pinboard:

https://twitter.com/LoisaidaSam/status/717372389307322368

and ran into an issue where each multi-word tag was split into separate single-word tags (Raspberry and Pi instead of Raspberry Pi):

https://twitter.com/LoisaidaSam/status/717374089216786432

So I wrote this tag slugger. It takes your exported delicious.html and slugs the tags, replacing the spaces inside each multi-word tag with hyphens, so Pinboard reads each one as a single tag (Raspberry-Pi).
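As a rough illustration of what "slugging" means here, the sketch below applies the same idea to a single TAGS value. The helper name `slug_tag_string` is hypothetical and only used for this example; the script's own implementation is in slugger.py.

```python
def slug_tag_string(tags_value):
    """Slug a comma-separated TAGS value (hypothetical helper for illustration)."""
    # Trim whitespace around each tag.
    tags = [tag.strip() for tag in tags_value.split(',')]
    # Drop empty tags, then replace inner spaces with hyphens.
    return ','.join(tag.replace(' ', '-') for tag in tags if tag)


print(slug_tag_string("Raspberry Pi,,hardware"))  # prints Raspberry-Pi,hardware
```

Dropping empty tags matters because the Delicious export sometimes contains doubled or trailing commas, as in the sample file.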

Usage:

$ python slugger.py delicious.html delicious_fixed.html

I even geeked out and wrote a few tests to confirm/validate my edge case suspicions:

$ python test.py

Check out the included delicious.html and delicious_fixed.html samples to see how it works.
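If you want to sanity-check a fixed file yourself, something like the sketch below could list any tags that still look unslugged. It is not part of the gist: the name `unslugged_tags` and the simple regular expression are assumptions for illustration, and unlike slugger.py the regex does not handle escaped quotes inside the TAGS value.

```python
import re

# Assumption: the TAGS value contains no escaped double quotes.
TAGS_RE = re.compile(r'TAGS="([^"]*)"')


def unslugged_tags(lines):
    """Return tags that are empty, padded with whitespace, or contain a space."""
    bad = []
    for line in lines:
        match = TAGS_RE.search(line)
        if not match or not match.group(1):
            # No TAGS attribute, or an empty one: nothing to check.
            continue
        for tag in match.group(1).split(','):
            if not tag or tag != tag.strip() or ' ' in tag:
                bad.append(tag)
    return bad


with_issue = ['<DT><A HREF="http://example.com/" TAGS="Giphy, Slack">x</A>']
fixed = ['<DT><A HREF="http://example.com/" TAGS="Giphy,Slack">x</A>']
print(unslugged_tags(with_issue))  # prints [' Slack']
print(unslugged_tags(fixed))       # prints []
```

To check a real export, pass an open file object instead of a list, for example `unslugged_tags(open('delicious_fixed.html'))`.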

Happy pinning!

<!DOCTYPE NETSCAPE-Bookmark-file-1>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<!-- This is an automatically generated file.
It will be read and overwritten.
Do Not Edit! -->
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
<DT><A HREF="http://giphy.com/posts/giphy-slack-easter-eggs-2/" ADD_DATE="1459806962" PRIVATE="0" TAGS="Giphy, Slack">GIPHY + Slack Easter Eggs! | GIPHY</A>
<DT><A HREF="http://www.esquire.com/news-politics/a38421/edward-scissorhands-hln/" ADD_DATE="1459797443" PRIVATE="0" TAGS="Jon Hendren,Edward Snowden">Jon Hendren Discusses Edward Scissorhands Instead of Edward Snowden</A>
<DT><A HREF="https://www.teslamotors.com/model3" ADD_DATE="1459787672" PRIVATE="0" TAGS="Tesla,,cars">Model 3 | Tesla Motors</A>
<DT><A HREF="http://www.museumofconceptualart.com/clockworks/clockworks.html" ADD_DATE="1459545730" PRIVATE="0" TAGS="">The Clockworks Project</A>
<DT><A HREF="http://firstround.com/review/Spotifys-Design-Lead-on-Why-Side-Projects-Should-be-Stupid/" ADD_DATE="1459545711" PRIVATE="0" TAGS="Spotify">Spotify’s Design Lead on Why Side Projects Should Be Stupid | First Round Review</A>
<DT><A HREF="http://www.mattmahoney.net/barkley/" ADD_DATE="1459545709" PRIVATE="0" TAGS="">The Barkley Marathons</A>
<DT><A HREF="http://www.newsweek.com/secret-ska-history-man-business-suit-levitating-emoji-442192?piano_t=1" ADD_DATE="1459545704" PRIVATE="0" TAGS="emoji">The Secret Ska History of That Weird Levitating Businessman Emoji</A>
<DT><A HREF="http://blog.instagram.com/post/141107034797/160315-news" ADD_DATE="1459545675" PRIVATE="0" TAGS="Instagram">See the Moments You Care About First - Instagram Blog</A>
<DT><A HREF="http://www.impactinterview.com/2009/10/140-google-interview-questions/" ADD_DATE="1459289218" PRIVATE="0" TAGS="interview,interviews,interview questions">140 Google Interview Questions | Impact Interview</A>
<DT><A HREF="http://ny.eater.com/2016/3/29/11322284/mealpass-nyc-launch" ADD_DATE="1459289024" PRIVATE="0" TAGS="NYC, food">MealPass, a ClassPass-Style Lunch Service, Launches in NYC This Week - Eater NY</A>
<DT><A HREF="http://www.digitaltrends.com/wearables/ringly-aries-bracelet-news/" ADD_DATE="1459289021" PRIVATE="0" TAGS="Ringly,,wearables">Ringly Brings Smart Tech to Bracelets | Digital Trends</A>
<DT><A HREF="https://github.com/blog/2135-saved-replies" ADD_DATE="1459289002" PRIVATE="0" TAGS="Github,git,dev">Saved replies</A>
<DT><A HREF="https://connect.garmin.com/modern/course/11225771" ADD_DATE="1459272106" PRIVATE="0" TAGS="cycling,San Fransisco">Garmin Connect</A>
<DT><A HREF="http://www.brooklynvegan.com/lcd-soundsystem-webster-hall-night-2-setlist-video-instagrams/" ADD_DATE="1459272089" PRIVATE="0" TAGS="LCD Soundsystem,music">LCD Soundsystem played a 2nd show at Webster Hall (setlist, video, pics), expand tour, playing Red Rocks w/ Savages</A>
<DT><A HREF="http://www.brooklynvegan.com/lcd-soundsystem-played-their-first-show-in-five-years-webster-hall-setlist-review/" ADD_DATE="1459272055" PRIVATE="0" TAGS="LCD Soundsystem,music">LCD Soundsystem played their first show in five years @ Webster Hall (setlist / review)</A>
<DT><A HREF="http://www.brooklynvegan.com/j-mascis-patti-smith/" ADD_DATE="1459272007" PRIVATE="0" TAGS="David Bowie,,music">J Mascis, Patti Smith & more added to this week’s big David Bowie tributes; Radio City show will be streamed live</A>
<DT><A HREF="http://www.brooklynvegan.com/five-notable-releases-of-the-week-32516/" ADD_DATE="1459271929" PRIVATE="0" TAGS="music">Five Notable Releases of the Week (3/25/16)</A>
<DT><A HREF="http://www.wired.com/2016/03/soundclouds-new-venture-mixes-social-network-music-service/" ADD_DATE="1459271905" PRIVATE="0" TAGS="Soundcloud,music">SoundCloud Go: An Audacious Answer to Spotify That’s Dying to Stand Out | WIRED</A>
<DT><A HREF="http://www.esquire.com/entertainment/music/news/a43372/lcd-soundsystem-return-show-webster-video/" ADD_DATE="1459209324" PRIVATE="0" TAGS="LCD Soundsystem,music,">LCD Soundsystem Return Show at Webster Hall Video</A>
<DT><A HREF="https://drewdevault.com/2014/10/10/The-profitability-of-online-services.html" ADD_DATE="1459208900" PRIVATE="0" TAGS="">On the profitability of image hosting websites</A>
<DT><A HREF="http://www.thedailybeast.com/articles/2016/03/22/butch-vig-on-the-25th-anniversary-of-nirvana-s-nevermind-and-the-mediocre-state-of-music.html" ADD_DATE="1459182336" PRIVATE="0" TAGS="Butch Vig,,music">Butch Vig on the 25th Anniversary of Nirvana’s ‘Nevermind’ and the ‘Mediocre’ State of Music - The Daily Beast</A>
</DL><p>
<!DOCTYPE NETSCAPE-Bookmark-file-1>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<!-- This is an automatically generated file.
It will be read and overwritten.
Do Not Edit! -->
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
<DT><A HREF="http://giphy.com/posts/giphy-slack-easter-eggs-2/" ADD_DATE="1459806962" PRIVATE="0" TAGS="Giphy,Slack">GIPHY + Slack Easter Eggs! | GIPHY</A>
<DT><A HREF="http://www.esquire.com/news-politics/a38421/edward-scissorhands-hln/" ADD_DATE="1459797443" PRIVATE="0" TAGS="Jon-Hendren,Edward-Snowden">Jon Hendren Discusses Edward Scissorhands Instead of Edward Snowden</A>
<DT><A HREF="https://www.teslamotors.com/model3" ADD_DATE="1459787672" PRIVATE="0" TAGS="Tesla,cars">Model 3 | Tesla Motors</A>
<DT><A HREF="http://www.museumofconceptualart.com/clockworks/clockworks.html" ADD_DATE="1459545730" PRIVATE="0" TAGS="">The Clockworks Project</A>
<DT><A HREF="http://firstround.com/review/Spotifys-Design-Lead-on-Why-Side-Projects-Should-be-Stupid/" ADD_DATE="1459545711" PRIVATE="0" TAGS="Spotify">Spotify’s Design Lead on Why Side Projects Should Be Stupid | First Round Review</A>
<DT><A HREF="http://www.mattmahoney.net/barkley/" ADD_DATE="1459545709" PRIVATE="0" TAGS="">The Barkley Marathons</A>
<DT><A HREF="http://www.newsweek.com/secret-ska-history-man-business-suit-levitating-emoji-442192?piano_t=1" ADD_DATE="1459545704" PRIVATE="0" TAGS="emoji">The Secret Ska History of That Weird Levitating Businessman Emoji</A>
<DT><A HREF="http://blog.instagram.com/post/141107034797/160315-news" ADD_DATE="1459545675" PRIVATE="0" TAGS="Instagram">See the Moments You Care About First - Instagram Blog</A>
<DT><A HREF="http://www.impactinterview.com/2009/10/140-google-interview-questions/" ADD_DATE="1459289218" PRIVATE="0" TAGS="interview,interviews,interview-questions">140 Google Interview Questions | Impact Interview</A>
<DT><A HREF="http://ny.eater.com/2016/3/29/11322284/mealpass-nyc-launch" ADD_DATE="1459289024" PRIVATE="0" TAGS="NYC,food">MealPass, a ClassPass-Style Lunch Service, Launches in NYC This Week - Eater NY</A>
<DT><A HREF="http://www.digitaltrends.com/wearables/ringly-aries-bracelet-news/" ADD_DATE="1459289021" PRIVATE="0" TAGS="Ringly,wearables">Ringly Brings Smart Tech to Bracelets | Digital Trends</A>
<DT><A HREF="https://github.com/blog/2135-saved-replies" ADD_DATE="1459289002" PRIVATE="0" TAGS="Github,git,dev">Saved replies</A>
<DT><A HREF="https://connect.garmin.com/modern/course/11225771" ADD_DATE="1459272106" PRIVATE="0" TAGS="cycling,San-Fransisco">Garmin Connect</A>
<DT><A HREF="http://www.brooklynvegan.com/lcd-soundsystem-webster-hall-night-2-setlist-video-instagrams/" ADD_DATE="1459272089" PRIVATE="0" TAGS="LCD-Soundsystem,music">LCD Soundsystem played a 2nd show at Webster Hall (setlist, video, pics), expand tour, playing Red Rocks w/ Savages</A>
<DT><A HREF="http://www.brooklynvegan.com/lcd-soundsystem-played-their-first-show-in-five-years-webster-hall-setlist-review/" ADD_DATE="1459272055" PRIVATE="0" TAGS="LCD-Soundsystem,music">LCD Soundsystem played their first show in five years @ Webster Hall (setlist / review)</A>
<DT><A HREF="http://www.brooklynvegan.com/j-mascis-patti-smith/" ADD_DATE="1459272007" PRIVATE="0" TAGS="David-Bowie,music">J Mascis, Patti Smith & more added to this week’s big David Bowie tributes; Radio City show will be streamed live</A>
<DT><A HREF="http://www.brooklynvegan.com/five-notable-releases-of-the-week-32516/" ADD_DATE="1459271929" PRIVATE="0" TAGS="music">Five Notable Releases of the Week (3/25/16)</A>
<DT><A HREF="http://www.wired.com/2016/03/soundclouds-new-venture-mixes-social-network-music-service/" ADD_DATE="1459271905" PRIVATE="0" TAGS="Soundcloud,music">SoundCloud Go: An Audacious Answer to Spotify That’s Dying to Stand Out | WIRED</A>
<DT><A HREF="http://www.esquire.com/entertainment/music/news/a43372/lcd-soundsystem-return-show-webster-video/" ADD_DATE="1459209324" PRIVATE="0" TAGS="LCD-Soundsystem,music">LCD Soundsystem Return Show at Webster Hall Video</A>
<DT><A HREF="https://drewdevault.com/2014/10/10/The-profitability-of-online-services.html" ADD_DATE="1459208900" PRIVATE="0" TAGS="">On the profitability of image hosting websites</A>
<DT><A HREF="http://www.thedailybeast.com/articles/2016/03/22/butch-vig-on-the-25th-anniversary-of-nirvana-s-nevermind-and-the-mediocre-state-of-music.html" ADD_DATE="1459182336" PRIVATE="0" TAGS="Butch-Vig,music">Butch Vig on the 25th Anniversary of Nirvana’s ‘Nevermind’ and the ‘Mediocre’ State of Music - The Daily Beast</A>
</DL><p>
import sys


def _get_parts(line):
    """Split a bookmark line into (before, tags, after) around the TAGS="..." value."""
    # Skip past 'TAGS=' and the opening double quote (6 characters in total).
    tags_part_start = line.index('TAGS=') + 6
    before = line[:tags_part_start]
    tags_part = ""
    escaped = False
    for i, char in enumerate(line[tags_part_start:]):
        if escaped:
            # The previous character was a backslash, so keep this one verbatim.
            tags_part += char
            escaped = False
            continue
        if char == '\\':
            tags_part += char
            escaped = True
            continue
        if char == '"':
            # An unescaped quote closes the TAGS value; `after` starts at that quote.
            return before, tags_part, line[tags_part_start+i:]
        tags_part += char
    raise Exception("Unable to parse 'TAGS=\"...\"' from line!")


def _slug_tags(tags_part):
    """Strip each tag, drop empty tags, and replace spaces with hyphens."""
    tags = tags_part.split(',')
    tags = map(lambda tag: tag.strip(), tags)
    tags = filter(None, tags)
    tags = map(lambda tag: tag.replace(' ', '-'), tags)
    return ','.join(tags)


def slug_delicious_tags(filename, filename_fixed):
    """Copy filename to filename_fixed, slugging the tags on every TAGS= line."""
    with open(filename, 'r') as fp:
        with open(filename_fixed, 'w') as fp_fixed:
            for line in fp:
                if 'TAGS=' not in line:
                    fp_fixed.write(line)
                    continue
                before, tags_part, after = _get_parts(line)
                tags_part_slugged = _slug_tags(tags_part)
                result = before + tags_part_slugged + after
                fp_fixed.write(result)


if __name__ == '__main__':
    filename = sys.argv[1]
    filename_fixed = sys.argv[2]
    slug_delicious_tags(filename, filename_fixed)
import unittest

import slugger


class GetPartsTests(unittest.TestCase):
    def test_basic(self):
        line = '''<DT><A HREF="http://www.esquire.com/news-politics/a38421/edward-scissorhands-hln/" ADD_DATE="1459797443" PRIVATE="0" TAGS="Jon Hendren,Edward Snowden">Jon Hendren Discusses Edward Scissorhands Instead of Edward Snowden</A>'''
        parts = slugger._get_parts(line)
        self.assertEqual(parts, (
            '''<DT><A HREF="http://www.esquire.com/news-politics/a38421/edward-scissorhands-hln/" ADD_DATE="1459797443" PRIVATE="0" TAGS="''',
            '''Jon Hendren,Edward Snowden''',
            '''">Jon Hendren Discusses Edward Scissorhands Instead of Edward Snowden</A>'''
        ))

    def test_with_quotes(self):
        line = '''<DT><A HREF="http://www.esquire.com/news-politics/a38421/edward-scissorhands-hln/" ADD_DATE="1459797443" PRIVATE="0" TAGS="Jon \\"fuzzy\\" Hendren,Edward Snowden">Jon Hendren Discusses Edward Scissorhands Instead of Edward Snowden</A>'''
        parts = slugger._get_parts(line)
        self.assertEqual(parts, (
            '''<DT><A HREF="http://www.esquire.com/news-politics/a38421/edward-scissorhands-hln/" ADD_DATE="1459797443" PRIVATE="0" TAGS="''',
            '''Jon \\"fuzzy\\" Hendren,Edward Snowden''',
            '''">Jon Hendren Discusses Edward Scissorhands Instead of Edward Snowden</A>'''
        ))

    def test_empty(self):
        line = '''<DT><A HREF="http://www.esquire.com/news-politics/a38421/edward-scissorhands-hln/" ADD_DATE="1459797443" PRIVATE="0" TAGS="">Jon Hendren Discusses Edward Scissorhands Instead of Edward Snowden</A>'''
        parts = slugger._get_parts(line)
        self.assertEqual(parts, (
            '''<DT><A HREF="http://www.esquire.com/news-politics/a38421/edward-scissorhands-hln/" ADD_DATE="1459797443" PRIVATE="0" TAGS="''',
            "",
            '''">Jon Hendren Discusses Edward Scissorhands Instead of Edward Snowden</A>'''
        ))


class SlugTagsTests(unittest.TestCase):
    def test_empty(self):
        result = slugger._slug_tags('')
        self.assertEqual(result, '')

    def test_spaces_and_commas(self):
        result = slugger._slug_tags("Jon Hendren, Edward Snowden")
        self.assertEqual(result, "Jon-Hendren,Edward-Snowden")

    def test_empty_nonsense(self):
        result = slugger._slug_tags(" , ")
        self.assertEqual(result, "")


if __name__ == '__main__':
    unittest.main()