Skip to content

Instantly share code, notes, and snippets.

@hamletbatista
Created March 12, 2019 19:32
Show Gist options
  • Save hamletbatista/9af977b9e73d7a30b47af9e216d0000e to your computer and use it in GitHub Desktop.
Save hamletbatista/9af977b9e73d7a30b47af9e216d0000e to your computer and use it in GitHub Desktop.
#convert absolute URLs to relative
from urllib.parse import urlsplit, urlunsplit
#Absolute source URLs linking to 404s from Search Console API: webmasters.urlcrawlerrorssamples.list
linkedFromUrls= [
"http://www.example.com/brand/swirly/shopby?sizecode=99",
"https://www.example.com/brand/swirly"
]
#first break url into parts
u = urlsplit(linkedFromUrls[0])
# u -> SplitResult(scheme='http', netloc='www.example.com', path='/brand/swirly/shopby', query='sizecode=99', fragment='')
#then rebuild it back with empty scheme, netloc and fragment
relative_url = urlunsplit(("", "", o.path, o.query, ""))
print(relative_url)
#Output -> /brand/swirly/shopby?sizecode=99
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment