@sloanlance
Forked from lsloan/urltidy.py
Last active July 20, 2016 16:20
Python experiments in removing contiguous slashes from URLs. The `urlparse` module should do this.

This is the README file.

"""
Experiments in removing contiguous slashes in URLs.
Why doesn't urlparse do this for us?
"""
import posixpath
import re
import urlparse
apiBaseURL = 'http://example.org//api/v1/'
apiQueryURI = '/search///////items/////?name=fubar'
# The problem: Too many contiguous slashes
apiFullURL = apiBaseURL + '/' + apiQueryURI
print apiFullURL # http://example.org//api/v1///search///////items/////?name=fubar
# Attempt 1: Fewer slashes, but still too many
apiFullURL = apiBaseURL.strip('/') + '/' + apiQueryURI.strip('/')
print apiFullURL # http://example.org//api/v1/search///////items/////?name=fubar
# Attempt 2: Maybe safer than above, but similar results
apiFullURL = apiBaseURL.rstrip('/') + '/' + apiQueryURI.lstrip('/')
print apiFullURL # http://example.org//api/v1/search///////items/////?name=fubar
# Attempt 3: A regex works, but is it safe? (Now we have two problems.)
apiFullURL = apiBaseURL + '/' + apiQueryURI
newApiFullURL = re.sub(r'([^:])/+', r'\1/', apiFullURL)
print newApiFullURL # http://example.org/api/v1/search/items/?name=fubar
# Attempt 4: The code below is longer, but works safely
# Parse the URL into parts as a mutable, ordered dictionary
apiFullURLParts = urlparse.urlparse(apiBaseURL + '/' + apiQueryURI)._asdict()
# posixpath.normpath() collapses repeated slashes but can keep one or two
# leading slashes, so strip them first; urlunparse() restores a single one
apiFullURLParts['path'] = posixpath.normpath(apiFullURLParts['path'].strip('/'))
# Success! No contiguous slashes
newApiFullURL = urlparse.urlunparse(apiFullURLParts.values())
print newApiFullURL # http://example.org/api/v1/search/items?name=fubar
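
For readers on Python 3, where `urlparse` became `urllib.parse`, Attempt 4 could be sketched roughly as below. The function name `tidy_url` is hypothetical and not part of the original gist, and the empty-path guard is an addition, since `posixpath.normpath('')` returns `'.'`.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def tidy_url(url):
    """Collapse runs of slashes in the path component of a URL.

    Hypothetical helper: a Python 3 take on Attempt 4, not the gist's own code.
    """
    parts = urlsplit(url)
    stripped = parts.path.strip('/')
    # normpath('') returns '.', so only normalize a non-empty path;
    # otherwise keep at most one leading slash from the original path.
    path = '/' + posixpath.normpath(stripped) if stripped else parts.path[:1]
    # Scheme, netloc, query, and fragment pass through untouched.
    return urlunsplit(parts._replace(path=path))


print(tidy_url('http://example.org//api/v1///search///////items/////?name=fubar'))
```

Like Attempt 4, this drops any trailing slash from the path, and `posixpath.normpath` also resolves `.` and `..` segments, which may or may not be wanted. Unlike the regex in Attempt 3, slashes in the query string are left alone.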