@jiahao
Created August 31, 2021 15:37
Scrape my arXiv author page and print one PDF link per line. Useful for updating conference submission profiles.
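
```python
# Scrape an arXiv author page and print one PDF URL per line.
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup

url = "https://arxiv.org/a/chen_j_2.html"
with urllib.request.urlopen(url) as response:
    html = response.read()

soup = BeautifulSoup(html, "html.parser")
for link in soup.find_all("a"):
    link_url = link.get("href")
    # Skip anchors with no href (get() returns None) and keep only PDF links.
    if link_url and "/pdf/" in link_url:
        # urljoin turns relative links like /pdf/... into full URLs
        # and leaves absolute links unchanged.
        print(urljoin(url, link_url))
```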
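
If installing BeautifulSoup is inconvenient, the same filtering can be sketched with Python's standard-library `html.parser` instead. This is a swapped-in technique, not the gist's method: the names `PdfLinkParser` and `extract_pdf_links` are hypothetical, the arXiv IDs in the sample are made up, and it assumes (as the script above does) that the author page links each PDF through an `href` containing `/pdf/`. Unlike the original, it also drops repeated links.

```python
# Minimal stdlib-only sketch (assumption: PDF links contain "/pdf/" in href).
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Hypothetical helper: collects absolute URLs of <a> tags pointing at PDFs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Anchors without an href give None; skip them and non-PDF links.
        if href and "/pdf/" in href:
            full_url = urljoin(self.base_url, href)
            # Keep only the first occurrence, preserving page order.
            if full_url not in self._seen:
                self._seen.add(full_url)
                self.pdf_urls.append(full_url)


def extract_pdf_links(html, base_url="https://arxiv.org/"):
    """Return de-duplicated absolute PDF URLs found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pdf_urls


if __name__ == "__main__":
    # Made-up sample markup standing in for an author page.
    sample = (
        '<a href="/abs/2108.00001">abs</a>'
        '<a href="/pdf/2108.00001v1">pdf</a>'
        "<a>no href</a>"
        '<a href="/pdf/2108.00001v1">pdf again</a>'
        '<a href="https://arxiv.org/pdf/2107.00002v2">pdf</a>'
    )
    for pdf_url in extract_pdf_links(sample):
        print(pdf_url)
```

To use it on the live page, decode the bytes returned by `urlopen` first, e.g. `extract_pdf_links(html.decode("utf-8"), url)`; UTF-8 is an assumption about the page encoding.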