@mr-yoo
Created April 23, 2021 18:43
Scraping attachments from Financial Supervisory Service (FSS) disclosure posts
import time

import requests
from bs4 import BeautifulSoup

# Walk the first five pages of the FSS board listing.
for page in range(1, 6):
    url = f"https://www.fss.or.kr/fss/kr/bbs/list.jsp?url=/fss/kr/1207404857348&bbsid=1207404857348&page={page}"
    resp = requests.get(url)
    # html5lib inserts <tbody> the way a browser does, which the selectors below rely on.
    soup = BeautifulSoup(resp.text, 'html5lib')
    sel = "#contents_area > div.contents > table > tbody > tr > td.tit > a"
    titles = soup.select(sel)

    for tag in titles:
        # Drop the first character of the href (presumably the "." of a relative link).
        sub_url = "https://www.fss.or.kr/fss/kr/bbs" + tag['href'][1:]
        sub_resp = requests.get(sub_url)
        sub_soup = BeautifulSoup(sub_resp.text, 'html5lib')
        file_sel = "#contents_area > div.contents > table:nth-child(1) > tbody > tr:nth-child(3) > td > a"
        links = sub_soup.select(file_sel)

        for item in links:
            file_name = item.text.strip()
            print(file_name)
            file_url = "https://www.fss.or.kr" + item['href']
            print(file_url)
            file_resp = requests.get(file_url)
            # Save the attachment under its link text in the current directory.
            with open(file_name, "wb") as f:
                f.write(file_resp.content)

        # Pause between posts to avoid hammering the server.
        time.sleep(1)
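
The script above uses each attachment's link text directly as the local file name. If that text ever contains a path separator or another character the file system rejects, `open()` could fail or write somewhere unexpected. A minimal sketch of one way to guard against that follows; `safe_filename` and its `fallback` parameter are hypothetical names that are not part of the original gist.

```python
import re


def safe_filename(name: str, fallback: str = "attachment") -> str:
    """Replace characters that are commonly invalid in file names.

    Hypothetical helper, not part of the original script.
    """
    # Swap path separators, Windows-reserved characters, and control
    # characters for underscores, then trim surrounding spaces and dots.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", name).strip(" .")
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or fallback
```

It could be applied at the point where the file is opened, for example `open(safe_filename(item.text.strip()), "wb")`.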