Last active: January 3, 2016 20:49
Gist: shiumachi/8517272
Get the list of hrefs from the requests object for a given URL (note: this is BS3 code; for the BS4 version, see https://gist.github.com/shiumachi/8633275 )
```python
# -*- coding: utf-8 -*-
# Reference: http://kondou.com/BS4/
# note: This code uses BeautifulSoup3, which is deprecated (and supports Python 2 only).
# If you need a BS4 code sample, please see https://gist.github.com/shiumachi/8633275
from BeautifulSoup import BeautifulSoup
import requests


def get_href_list(requests_obj):
    """Return the href attribute of every <a> tag in a requests response."""
    soup = BeautifulSoup(requests_obj.text)
    href_list = []
    for i in soup.findAll('a'):
        # get() returns None for <a> tags that have no href attribute
        href_list.append(i.get('href'))
    return href_list


if __name__ == '__main__':
    r = requests.get("http://yahoo.co.jp/")
    href_list = get_href_list(r)
    for h in href_list:
        print(h)
```
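Since BeautifulSoup 3 no longer runs on Python 3, here is a minimal sketch of the same href extraction that relies only on the standard library's `html.parser.HTMLParser` instead of BeautifulSoup. This is a swapped-in technique, not the author's code. The names `HrefCollector` and `get_href_list_from_html` and the `base_url` parameter are hypothetical. Unlike the original, it skips `<a>` tags without an href rather than appending `None`, and it can optionally resolve relative links with `urllib.parse.urljoin`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags in document order."""

    def __init__(self, base_url=None):
        super().__init__()
        self.base_url = base_url
        self.href_list = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value is not None:
                # Resolve relative links against base_url when one is given
                if self.base_url:
                    value = urljoin(self.base_url, value)
                self.href_list.append(value)
                break


def get_href_list_from_html(html, base_url=None):
    """Hypothetical stdlib-only counterpart of get_href_list()."""
    parser = HrefCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.href_list


if __name__ == '__main__':
    sample = ('<p><a href="/news">News</a> <a name="top">Top</a> '
              '<A HREF="https://example.com/">Ex</A></p>')
    print(get_href_list_from_html(sample, "http://yahoo.co.jp/"))
```

With a live page, you would pass the response body and its final URL, e.g. `get_href_list_from_html(r.text, base_url=r.url)` where `r = requests.get(...)`.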