Created
November 28, 2019 17:59
Python script that prints the product or category names (URL slugs) found in whatever sitemap you provide.
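# Lists all the items from either product-sitemap.xml or product_category-sitemap.xml.
# You must provide the full URL of the sitemap as the first argument.
import sys

import requests
from bs4 import BeautifulSoup

# Take the sitemap URL from the command line; do nothing if it is missing.
url = sys.argv[1] if len(sys.argv) > 1 else ""

if url:
    r = requests.get(url)
    # Stop early on HTTP errors instead of parsing an error page.
    r.raise_for_status()
    # Parse the response and collect every <loc> element.
    soup = BeautifulSoup(r.content, 'html.parser')
    links = soup.find_all('loc')
    for link in links:
        # The slug is the last path segment, with or without a trailing slash.
        print(link.text.strip().rstrip('/').split('/')[-1])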
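The slug extraction can also be tried offline, without fetching anything. The following is a minimal sketch that uses only the standard library instead of requests and BeautifulSoup; the function name `slugs_from_sitemap`, the `SAMPLE` document, and the example.com URLs are made up for illustration and are not part of the original script. It assumes the sitemap uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by standard sitemap files (assumed here).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def slugs_from_sitemap(xml_text):
    """Return the last path segment of every <loc> URL in a sitemap (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    slugs = []
    for loc in root.iter(SITEMAP_NS + "loc"):
        path = urlparse((loc.text or "").strip()).path
        # Drop empty segments so a trailing slash does not matter.
        segments = [s for s in path.split("/") if s]
        if segments:
            slugs.append(segments[-1])
    return slugs


# Made-up sample data standing in for a real product sitemap.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/product/blue-widget/</loc></url>
  <url><loc>https://example.com/product/red-widget</loc></url>
</urlset>"""

if __name__ == "__main__":
    for slug in slugs_from_sitemap(SAMPLE):
        print(slug)
```

Keeping the parsing in a function that takes a string makes it possible to check the behavior against a saved sitemap before pointing the script at a live site.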