@crmaxx
Last active October 16, 2015 13:45
PoC for parsing a browser_history dump
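#!/usr/bin/env ruby
require "uri" # stdlib; provides ::URI.scheme_list used below
require "addressable/uri"
require "nokogiri"
# Local helper, not included in this gist; presumably adds
# String#is_number? and String#includes? (both are called below).
require "string_monkey_path"

# Returns the text of the first child element called +name+, or "" if absent.
def item_field_by_name(item, name)
  node = item.at_css(name)
  node ? node.children.first.to_s : ""
end

# Extracts "scheme://host" from a well-formed URL.
def get_host_by_uri(url)
  site = Addressable::URI.parse(url).site
  # site is nil when the URL has neither scheme nor authority.
  return unless site&.include?(".")
  return site if site.include?("http")
  "http://#{site}"
rescue Addressable::URI::InvalidURIError => e
  raise "Unable to parse url '#{url}'.\n#{e.class}: #{e}."
end

# Extracts the host from a ":Host:" history entry (no scheme present).
def get_host(url)
  # Skip the 7-character prefix, then split the host into its labels.
  host = url[7..-1].downcase.split(".")
  # Keep IP addresses whole; otherwise keep only the last two labels.
  site = host.last.is_number? ? host.join(".") : host.last(2).join(".")
  return unless site.include?(".")
  return site if site.include?("http")
  "http://#{site}"
end

# Reads the XML dump given as the first argument and returns the unique hosts.
def collect_browsers
  parsed_xml_file = ::Nokogiri::XML(::File.read(ARGV[0]))
  items = parsed_xml_file.xpath("//browsing_history_items/item")
  items.map do |item|
    url = item_field_by_name(item, "url")
    if url.include?(":Host:") && !url.include?("My Computer")
      get_host(url)
    elsif url.includes?(::URI.scheme_list.keys.map(&:downcase))
      get_host_by_uri(url)
    end # entries matching neither branch yield nil and are dropped by compact
  end.compact.uniq
end

puts collect_browsers.inspect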
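
The script requires a local `string_monkey_path` file that is not part of the gist, and it calls two `String` methods, `is_number?` and `includes?`, that Ruby does not define. The following is a minimal sketch of what that file might contain; both method bodies are assumptions inferred from how the script calls them, not the author's actual code.

```ruby
# Hypothetical stand-in for the missing 'string_monkey_path' helper.
# The method names match the calls in the script; the bodies are guesses.
class String
  # True when the string consists only of ASCII digits,
  # e.g. the last octet of an IPv4 address.
  def is_number?
    match?(/\A\d+\z/)
  end

  # True when the string contains at least one of the given substrings.
  def includes?(substrings)
    Array(substrings).any? { |s| include?(s) }
  end
end
```

Saved as `string_monkey_path.rb` next to the script, it would need to be loaded with `require_relative` or with its directory on the load path.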