@divoxx
Last active December 21, 2015 02:48
# Generated with: curl -s http://data.iana.org/TLD/tlds-alpha-by-domain.txt | grep -vE "^#" | awk '{print length($1),$0}' | sort -k1nr | cut -d' ' -f 2- | tr '\n' '|'
TOP_LEVEL = /XN--CLCHC0EA0B2G2A9GCD|XN--HGBK6AJ7F53BBA|XN--HLCJ6AYA9ESC7A|XN--11B5BS3A9AJ6G|XN--MGBERP4A5D4AR|XN--XKC2DL3A5EE0H|XN--80AKHBYKNJ4F|XN--XKC2AL3HYE2A|XN--LGBBAT1AD8J|XN--MGBC0A9AZCG|XN--9T4B11YI5A|XN--MGBAAM7A8H|XN--MGBAYH7GPA|XN--MGBBH1A71E|XN--MGBX4CD0AB|XN--FPCRJ9C3D|XN--FZC2C9E2C|XN--YFRO4I67O|XN--YGBI2AMMX|XN--3E0B707E|XN--JXALPDLP|XN--KGBECHTV|XN--MGB9AWBF|XN--OGBPF8FL|XN--0ZWM56D|XN--45BRJ9C|XN--80AO21A|XN--DEBA0AD|XN--G6W251D|XN--GECRJ9C|XN--H2BRJ9C|XN--J6W193G|XN--KPRW13D|XN--KPRY57D|XN--PGBS0DH|XN--S9BRJ9C|XN--90A3AC|XN--FIQS8S|XN--FIQZ9S|XN--O3CW4H|XN--WGBH1C|XN--WGBL6A|XN--ZCKZAH|XN--P1AI|MUSEUM|TRAVEL|AERO|ARPA|ASIA|COOP|INFO|JOBS|MOBI|NAME|POST|BIZ|CAT|COM|EDU|GOV|INT|MIL|NET|ORG|PRO|TEL|XXX|AC|AD|AE|AF|AG|AI|AL|AM|AN|AO|AQ|AR|AS|AT|AU|AW|AX|AZ|BA|BB|BD|BE|BF|BG|BH|BI|BJ|BM|BN|BO|BR|BS|BT|BV|BW|BY|BZ|CA|CC|CD|CF|CG|CH|CI|CK|CL|CM|CN|CO|CR|CU|CV|CW|CX|CY|CZ|DE|DJ|DK|DM|DO|DZ|EC|EE|EG|ER|ES|ET|EU|FI|FJ|FK|FM|FO|FR|GA|GB|GD|GE|GF|GG|GH|GI|GL|GM|GN|GP|GQ|GR|GS|GT|GU|GW|GY|HK|HM|HN|HR|HT|HU|ID|IE|IL|IM|IN|IO|IQ|IR|IS|IT|JE|JM|JO|JP|KE|KG|KH|KI|KM|KN|KP|KR|KW|KY|KZ|LA|LB|LC|LI|LK|LR|LS|LT|LU|LV|LY|MA|MC|MD|ME|MG|MH|MK|ML|MM|MN|MO|MP|MQ|MR|MS|MT|MU|MV|MW|MX|MY|MZ|NA|NC|NE|NF|NG|NI|NL|NO|NP|NR|NU|NZ|OM|PA|PE|PF|PG|PH|PK|PL|PM|PN|PR|PS|PT|PW|PY|QA|RE|RO|RS|RU|RW|SA|SB|SC|SD|SE|SG|SH|SI|SJ|SK|SL|SM|SN|SO|SR|ST|SU|SV|SX|SY|SZ|TC|TD|TF|TG|TH|TJ|TK|TL|TM|TN|TO|TP|TR|TT|TV|TW|TZ|UA|UG|UK|US|UY|UZ|VA|VC|VE|VG|VI|VN|VU|WF|WS|YE|YT|ZA|ZM|ZW/i
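# The shell pipeline above sorts the names longest-first, so that inside the
# alternation a longer name such as COM is tried before a shorter prefix such
# as CO. A minimal Ruby sketch of the same idea follows. SAMPLE_TLD_FILE,
# SAMPLE_TOP_LEVEL and build_tld_pattern are hypothetical names, and the sample
# text only stands in for the downloaded IANA file.

```ruby
# Hypothetical stand-in for the contents of tlds-alpha-by-domain.txt.
SAMPLE_TLD_FILE = <<~TXT
  # sample header line, skipped like the comment line in the real file
  CO
  COM
  MUSEUM
  XN--P1AI
TXT

# Builds a case-insensitive alternation with the longest names first.
def build_tld_pattern(text)
  names = text.each_line.map(&:strip).reject { |line| line.empty? || line.start_with?("#") }
  longest_first = names.sort_by { |name| [-name.length, name] }
  Regexp.new(longest_first.map { |name| Regexp.escape(name) }.join("|"), Regexp::IGNORECASE)
end

SAMPLE_TOP_LEVEL = build_tld_pattern(SAMPLE_TLD_FILE)

# With CO listed before COM, this would stop at "co" instead.
puts SAMPLE_TOP_LEVEL.match("com")[0]
```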
PROXY_URL = /
  https?:\/\/                        # Either starts with http or https
  (?:(?<=:\/\/)[^:@]+(?::[^@]+)?@)?  # Userinfo (username:password@), but only if protocol is specified
  (?:www\.)?                         # It might contain www
  (?:[\w_-]+\.)+                     # Needs at least one domain above top-level
  (?:#{TOP_LEVEL})                   # Available top-level domains, this allows for easy matching such as "google.com"
  (?::\d+)?                          # With port
  (?:\/.*?)?                         # Matches the path of the URI
  \?.*?                              # Matches the parameters
  =                                  # Expects an empty param in the end
/mix
good = [
  "http://aol.com?url=",
  "http://aol.com/123/something?url=",
  "https://aol.com/123/something?url=",
  "https://aol.com/123/something?foo=bar&url=",
  "https://aol.com:123/123/something?foo=bar&url=",
  "https://www.aol.com:123/123/something?foo=bar&url=",
  "https://www.aol.com:123/123/something?foo=bar&url=",
]
bad = [
  "http://aol.com?url",
]
good.each do |url|
  if url =~ PROXY_URL
    puts "#{url} ok"
  else
    puts "#{url} failed"
  end
end
bad.each do |url|
  if url =~ PROXY_URL
    puts "#{url} ok, should have failed"
  else
    puts "#{url} failed, as expected"
  end
end
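# PROXY_URL carries no \A or \z anchors, so it also matches when more text
# follows the final "=". If a whole-string check is wanted, one option is to
# wrap the pattern in anchors. A minimal sketch, assuming a shortened TLD list.
# SAMPLE_TLDS, SAMPLE_PROXY_URL and ANCHORED_PROXY_URL are hypothetical names
# and not part of the gist.

```ruby
# Assumption: three TLDs only, to keep the example short.
SAMPLE_TLDS = /COM|NET|ORG/i

# Simplified pattern with the same overall shape as PROXY_URL.
SAMPLE_PROXY_URL = /
  https?:\/\/                 # Scheme
  (?:[^:@]+(?::[^@]+)?@)?     # Optional userinfo
  (?:[\w-]+\.)+               # One or more labels before the TLD
  (?:#{SAMPLE_TLDS})          # Top-level domain
  (?::\d+)?                   # Optional port
  (?:\/.*?)?                  # Optional path
  \?.*?                       # Query string
  =                           # Trailing empty parameter
/mix

# Interpolating a Regexp keeps its own flags, so only the anchors are added here.
ANCHORED_PROXY_URL = /\A#{SAMPLE_PROXY_URL}\z/

with_target = "http://aol.com?url=http://example.org"
puts SAMPLE_PROXY_URL.match?(with_target)            # unanchored: matches the prefix
puts ANCHORED_PROXY_URL.match?(with_target)          # anchored: the string must end in "="
puts ANCHORED_PROXY_URL.match?("http://aol.com?url=")
```

# Whether anchoring is appropriate depends on the use: detecting a proxy prefix
# inside a longer string calls for the unanchored form.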