Created
January 2, 2018 21:46
# KPI 1. Domain dynamics
# values: None (1) / Small (1.1) / Medium (1.3) / Large (1.5)
# min/max = 1/1.5
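require 'uri'
require 'open-uri'
require 'nokogiri'

# get the domain root of the result link url, e.g. "https://example.com/"
# (URI.parse(url).host returns only the host name, which cannot be fetched)
root = URI.join(url, '/').to_s
# fetch and parse the root page; Kernel#open no longer opens URLs since Ruby 3.0
html = Nokogiri::HTML(URI.open(root))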
# if the domain has not been collected before, parse the html for dates
unless Domain.exists?(root)
  found = [] # found[k] = number of matches for date_formats[k]
  date_formats = [
    /\d{4}-\d{2}-\d{2}/, /\d{4}-\d{1}-\d{2}/, /\d{4}-\d{1}-\d{1}/, /\d{4}-\d{2}-\d{1}/, # yyyy-mm-dd..
    /\d{2}-\d{2}-\d{4}/, /\d{2}-\d{1}-\d{4}/, /\d{1}-\d{1}-\d{4}/, /\d{1}-\d{2}-\d{4}/, # dd-mm-yyyy..
    /\d{4}\.\d{2}\.\d{2}/, /\d{4}\.\d{2}\.\d{1}/, # yyyy.mm.dd
    /\d{2}\.\d{2}\.\d{4}/, /\d{1}\.\d{2}\.\d{4}/, # dd.mm.yyyy
    /\d{4}\/\d{2}\/\d{2}/, /\d{4}\/\d{2}\/\d{1}/, # yyyy/mm/dd
    /\d{2}\/\d{2}\/\d{4}/, /\d{1}\/\d{2}\/\d{4}/  # dd/mm/yyyy
  ]
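  # Nokogiri documents have no #scan, so match against the page text;
  # the patterns are unanchored, so one date can be counted by several formats
  text = html.text
  date_formats.each_with_index { |v, k| found[k] = text.scan(v).size }
  # if any found, check how recent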
else
  # if checked recently (1w) return the latest value
  # or get % change from last time .. https://github.com/postmodern/nokogiri-diff
end
# store result
Domain.new(root, html)
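
The "if any found, check how recent" step is left open in the snippet above. One possible way to fill it in is sketched below: pull the dates out of the page text, keep the newest one that is not in the future, and map its age to one of the four KPI values. The day thresholds (7 / 30 / 365), the constant `DYNAMICS_LEVELS`, and the helpers `extract_dates` and `dynamics_multiplier` are all hypothetical; the gist only fixes the values 1 / 1.1 / 1.3 / 1.5, not when each one applies.

```ruby
require 'date'

# ASSUMPTION: thresholds in days since the newest date found on the page.
# Only the multipliers themselves (1.5 / 1.3 / 1.1 / 1) come from the gist.
DYNAMICS_LEVELS = [
  [7,   1.5], # Large:  newest date within a week
  [30,  1.3], # Medium: within a month
  [365, 1.1]  # Small:  within a year
].freeze

# Year-first and day-first dates with '-', '.' or '/' as separator.
YMD_DATE = /\b(\d{4})[-.\/](\d{1,2})[-.\/](\d{1,2})\b/
DMY_DATE = /\b(\d{1,2})[-.\/](\d{1,2})[-.\/](\d{4})\b/

# Returns every valid calendar date found in the text as a Date object.
def extract_dates(text)
  ymd = text.scan(YMD_DATE).map { |y, m, d| [y.to_i, m.to_i, d.to_i] }
  dmy = text.scan(DMY_DATE).map { |d, m, y| [y.to_i, m.to_i, d.to_i] }
  (ymd + dmy).select { |y, m, d| Date.valid_date?(y, m, d) }
             .map { |y, m, d| Date.new(y, m, d) }
end

# Maps the age of the newest non-future date to a KPI multiplier.
def dynamics_multiplier(text, today = Date.today)
  newest = extract_dates(text).select { |date| date <= today }.max
  return 1.0 if newest.nil? # None: no usable date on the page

  age_in_days = (today - newest).to_i
  level = DYNAMICS_LEVELS.find { |max_days, _| age_in_days <= max_days }
  level ? level[1] : 1.0
end
```

With the snippet above, this could be called as `dynamics_multiplier(html.text)`. Day-first patterns are read as dd-mm-yyyy, matching the gist's comments, so US-style mm/dd/yyyy dates would be misread or dropped as invalid.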