
@rbrito
Created April 30, 2015 12:50
Doubled (nested, percent-encoded) URLs in Coursera.
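# unquote() lives in urllib.parse on Python 3 and in urllib on Python 2.
try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2

# Some course pages list invalid URLs like this one as resources for
# students to download:
#
# 'https://d396qusza40orc.cloudfront.net/webapplications/https%3A//d396qusza40orc.cloudfront.net/webapplications/lecture_slides/M6-L8-Ajax-Handout.pdf'
#
# The real URL (with its ':' percent-encoded as '%3A') has been glued onto
# the end of the course's base URL. We work around the mistake.
s = 'https://d396qusza40orc.cloudfront.net/webapplications/https%3A//d396qusza40orc.cloudfront.net/webapplications/lecture_slides/M6-L8-Ajax-Handout.pdf'
s_unquoted = unquote(s)
# The last 'https://' marks where the embedded (real) URL starts.
pos = s_unquoted.rfind('https://')
suffix = s_unquoted[pos:]
prefix = s_unquoted[:pos]
# Only treat the URL as doubled when the embedded URL repeats the base
# prefix; otherwise keep the original, still-quoted URL untouched.
url = suffix if pos > 0 and suffix.startswith(prefix) else s
print(url)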
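The same workaround can be packaged as a reusable function. This is a minimal sketch, not part of the original gist: the name `fix_nested_url` is hypothetical, it assumes Python 3's `urllib.parse.unquote`, and it additionally assumes the embedded URL may start with either `http://` or `https://`. Like the script, it only accepts the embedded URL when it repeats the base prefix, and it returns the input unchanged otherwise.

```python
from urllib.parse import unquote


def fix_nested_url(url):
    """Return the embedded URL if `url` is a base URL with a second,
    percent-encoded URL glued onto its end; otherwise return `url` as-is.

    Hypothetical helper; assumes the embedded URL starts with
    'http://' or 'https://'.
    """
    unquoted = unquote(url)
    # Start of the last embedded absolute URL, for either scheme.
    pos = max(unquoted.rfind('https://'), unquoted.rfind('http://'))
    if pos <= 0:
        # No second URL after the start of the string: nothing to fix.
        return url
    prefix, suffix = unquoted[:pos], unquoted[pos:]
    # Same check as the script: the embedded URL must repeat the prefix.
    if suffix.startswith(prefix):
        return suffix
    return url


if __name__ == '__main__':
    s = ('https://d396qusza40orc.cloudfront.net/webapplications/'
         'https%3A//d396qusza40orc.cloudfront.net/webapplications/'
         'lecture_slides/M6-L8-Ajax-Handout.pdf')
    print(fix_nested_url(s))
```

Returning the original string in the fallback case matters: unquoting an already valid URL such as `.../a%20b.pdf` would turn `%20` into a literal space and break it.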