@willxiang
Created March 27, 2016 14:08
Python regular expression
# coding:utf-8
import re
from ExportXiaMiList.HttpHelper import HttpHelper
from bs4 import BeautifulSoup
# url = 'http://faxian.smzdm.com/9kuai9/p1'
url = 'http://sakurako2007.tumblr.com/'
helper = HttpHelper()
soup = helper.getSoupHtml(url)
# soup = BeautifulSoup(helper.getRawText(),'html.parser')
# Find <img> tags whose src is a "_1280.jpg" image on a media.tumblr.com host.
lista = soup.find_all('img', src=re.compile(r'http://(.*?)\.media\.tumblr\.com/(.*?)/(.*?_1280)\.jpg'))
for img in lista:
    print(img['src'])
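
The script above depends on a private `HttpHelper` module and a live page, so the pattern is hard to check in isolation. Below is a minimal sketch that exercises the same regular expression with only the standard library. The sample `src` values and the names `TUMBLR_1280` and `matching_srcs` are hypothetical, made up for illustration; they are not part of the gist. It uses `search()`, which is how BeautifulSoup applies a compiled pattern passed as an attribute filter.

```python
import re

# Same pattern as in the gist, compiled once so it can be reused.
TUMBLR_1280 = re.compile(r'http://(.*?)\.media\.tumblr\.com/(.*?)/(.*?_1280)\.jpg')

# Hypothetical src values, invented for this example only.
samples = [
    'http://40.media.tumblr.com/abc123/tumblr_xyz_1280.jpg',   # matches
    'http://40.media.tumblr.com/abc123/tumblr_xyz_500.jpg',    # wrong size suffix
    'https://40.media.tumblr.com/abc123/tumblr_xyz_1280.jpg',  # https, not http
]


def matching_srcs(srcs):
    """Return the src values that the pattern matches anywhere in the string."""
    return [s for s in srcs if TUMBLR_1280.search(s)]


if __name__ == '__main__':
    for src in matching_srcs(samples):
        # The three groups are the host prefix, the path segment, and the file stem.
        print(src, TUMBLR_1280.search(src).groups())
```

Note that the pattern only accepts `http://` URLs. If the page serves images over HTTPS, changing the prefix to `https?://` would likely be needed.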