@caingougou
Last active August 29, 2015 13:57
checkbookmarks
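# -*- coding: utf8 -*-
# List every bookmark (link text and URL) in an exported bookmarks file. Python 3.
from bs4 import BeautifulSoup

# Read the export as UTF-8 explicitly; this replaces the Python 2
# reload(sys) / sys.setdefaultencoding('utf8') workaround.
with open("/home/cain/bookmarks_3_21_14.html", encoding="utf8") as f:
    content = f.read()

# Name the parser so the result does not depend on which parsers are installed.
soup = BeautifulSoup(content, "html.parser")
for i, link in enumerate(soup.find_all('a'), start=1):
    print(i)
    # An <a> tag without an href returns None, which cannot be concatenated.
    print(link.get_text() + (link.get('href') or ''))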
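# Fetch each bookmark and compare the live page title with the saved one. Ruby.
require 'markio'
require 'nokogiri'
require 'open-uri'
require 'ostruct'

filename = ARGV[0] || '/home/cain/bookmarks_3_21_14.html'
bookmarks = Markio::parse(File.open(filename))
bookmarks.each do |b|
  begin
    # URI.open is the open-uri entry point; bare Kernel#open no longer fetches URLs on Ruby 3.
    doc = Nokogiri::HTML(URI.open(b.href))
    title_node = doc.at_css('title')
    if title_node.nil?
      # Reachable page without a <title>; report it separately from network failures.
      puts b.href + " has no title"
    elsif title_node.content != b.title
      puts title_node.content + " !== " + b.title
    else
      puts b.title + " found!"
    end
  rescue StandardError => e
    # StandardError covers HTTP, socket, and timeout errors without swallowing Interrupt.
    puts b.href + " is not accessible"
  end
  sleep(1) # pause between requests
  # b.title # String
  # b.href # String with bookmark URL
  # b.folders # Array of strings - folders (tags)
  # b.add_date # DateTime
  # b.last_visit # DateTime
  # b.last_modified # DateTime
end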
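
The Ruby script compares the live title with the saved one using strict string equality, so a title that differs only in surrounding or repeated whitespace is reported as a mismatch. Below is a minimal Python sketch of a more lenient comparison, using only the standard library. The names `TitleParser`, `extract_title`, `normalize`, and `titles_match` are hypothetical helpers for illustration and are not part of the gist; fetching the page is left out on purpose.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []
        self.title = None  # stays None when no complete <title> is seen

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts)

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)


def extract_title(html):
    """Return the raw title text, or None if the page has no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


def normalize(text):
    """Collapse runs of whitespace and strip both ends."""
    return " ".join(text.split())


def titles_match(live_title, saved_title):
    """Compare titles while ignoring whitespace differences (an assumption
    about what should count as 'the same' title, not the gist's rule)."""
    if live_title is None or saved_title is None:
        return False
    return normalize(live_title) == normalize(saved_title)
```

With these helpers, `titles_match(extract_title(page_html), saved_title)` would stand in for the `!=` check, where `page_html` is the fetched page body and `saved_title` is the bookmark's stored title.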