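Scraping images from a web page:
In HTML, an img tag usually holds the image's address in its src attribute.
src="http...": the address starts with http. Photos usually end in .jpg, while .gif files tend to be small images such as avatars, so .jpg is what we usually scrape.
width="" and height="" give the image's size.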
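A minimal sketch of the idea above, assuming the page writes image addresses as src="http...jpg" in double quotes, as described. The name get_jpg_urls and the sample HTML are hypothetical; a real page may need an HTML parser rather than a regex.

```python
import re

# Capture http(s) addresses ending in .jpg inside a src="..." attribute.
# [^"]*? cannot cross the closing quote, so .gif addresses are skipped.
JPG_SRC = re.compile(r'src="(http[^"]*?\.jpg)"')

def get_jpg_urls(html: str) -> list:
    """Return every .jpg address found in src attributes."""
    return JPG_SRC.findall(html)

sample_html = (
    '<img src="http://example.com/photo1.jpg" width="640" height="480">'
    '<img src="http://example.com/avatar.gif" width="32" height="32">'
    '<img src="https://example.com/photo2.jpg">'
)
urls = get_jpg_urls(sample_html)
# urls: ['http://example.com/photo1.jpg', 'https://example.com/photo2.jpg']
```

On a live page, the HTML would first be downloaded (for example with urllib.request.urlopen(url).read().decode("utf-8")); the utf-8 encoding is an assumption and varies by site.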
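Common modules for fetching URLs: urllib and urllib2 (Python 2 names; in Python 3 their functions live mainly in urllib.request and urllib.parse):
Besides "http:", a URL can also use other schemes such as "ftp:" and "file:".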
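In Python 3, urllib.request.urlopen() picks a handler based on the URL scheme. To stay offline, this sketch assumes Python 3 and fetches a temporary local file through a file: URL instead of an http: one; the helper name fetch is hypothetical.

```python
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

def fetch(url: str) -> bytes:
    """Read the raw bytes behind a URL; urlopen chooses a handler from the scheme."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "page.html"
    page.write_text('<img src="http://example.com/a.jpg">', encoding="utf-8")
    url = page.as_uri()            # a file:///... URL pointing at the temp file
    scheme = urlparse(url).scheme  # 'file'
    body = fetch(url)              # same call would work for an http:// URL
```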
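Compilation flags (S, I, L, M, X)
re.S: the dot . matches any character except the newline "\n"; if that position holds a \n, the match fails and findall returns an empty list. Passing re.S after the pattern makes . match any character, including the newline.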
>>> import re
>>>
>>> cs = r"scb.wq"
>>>
>>> re.findall(cs,"scbzwq")
['scbzwq']
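A small sketch of the flags in action (Python 3), reusing the pattern scb.wq; the test strings are made up. re.L is left out because Python 3 only accepts it with bytes patterns.

```python
import re

cs = r"scb.wq"

# re.S (DOTALL): without it, . refuses to match "\n".
without_s = re.findall(cs, "scb\nwq")          # []
with_s = re.findall(cs, "scb\nwq", re.S)       # ['scb\nwq']

# re.I (IGNORECASE): letters match regardless of case.
ignore_case = re.findall(r"scb", "SCB Scb scb", re.I)

# re.M (MULTILINE): ^ and $ also match at each line boundary.
no_m = re.findall(r"^wq", "scb\nwq")           # []
with_m = re.findall(r"^wq", "scb\nwq", re.M)   # ['wq']

# re.X (VERBOSE): whitespace and # comments inside the pattern are ignored.
verbose = re.findall(r"""
    \d{3,4}   # area code
    -?        # optional hyphen
    \d{8}     # local number
""", "010-12345678", re.X)

# Flags combine with |.
combined = re.findall(r"SCB.WQ", "scb\nwq", re.S | re.I)
```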
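Using regular expressions:
re.findall(pattern, string): returns a list of every match of the pattern in the string.
re.compile(): compiles a regular expression into a reusable pattern object; when the same pattern is matched many times, this avoids re-processing it on every call and can make matching faster (the re module also caches recently used patterns, so the gain is often modest).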
>>> import re
>>>
>>> c1 = r"\d{3,4}-?\d{8}$"  # a landline number: a 3- or 4-digit area code, an optional "-" (the ? makes it optional), then 8 digits at the end of the string
>>>
>>> c1
'\\d{3,4}-?\\d{8}$'
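A short sketch of compiling c1 once and reusing the pattern object; the sample numbers are made up. Because the pattern has no leading ^, findall can also match a valid-looking tail of a longer string, so fullmatch() is shown for whole-string checks.

```python
import re

# Compile once: 3-4 digit area code, optional "-", 8 digits, end of string.
c1 = re.compile(r"\d{3,4}-?\d{8}$")

with_hyphen = c1.findall("010-12345678")   # ['010-12345678']
no_hyphen = c1.findall("01012345678")      # ['01012345678']
too_short = c1.findall("010-1234567")      # [] : only 7 digits after "-"

# Reuse the same compiled object across many inputs.
numbers = ["010-12345678", "0755-87654321", "12345"]
valid = [n for n in numbers if c1.match(n)]

# With no leading ^, findall accepts a matching tail of a longer string;
# fullmatch() requires the whole string to match instead.
tail = c1.findall("12010-12345678")        # ['2010-12345678']
whole = c1.fullmatch("12010-12345678")     # None
```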
\ : the backslash, the escape character.
Introduction:
>>> import re
>>> r = r"^abc"  # pattern: a string that starts with abc
>>> re.findall(r,'abc')
['abc']
>>> re.findall(r,'adcsds')  # no match: findall returns an empty list
[]
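A sketch of what the backslash changes: it turns a special character such as ^ or . into a literal one. The sample strings are made up.

```python
import re

# Unescaped ^ is an anchor meaning "start of string".
anchored = re.findall(r"^abc", "abcdef")        # ['abc']

# \^ escapes it, so it matches a literal caret anywhere.
literal_caret = re.findall(r"\^abc", "x^abc")   # ['^abc']

# A bare . matches any character; \. matches only a dot.
bare_dot = re.findall(r"a.c", "abc a.c")        # ['abc', 'a.c']
escaped_dot = re.findall(r"a\.c", "abc a.c")    # ['a.c']

# A raw string keeps the backslash for the regex engine: r"\d" equals "\\d".
same_pattern = r"\d" == "\\d"                   # True

# re.escape() inserts the backslashes when searching for literal text.
literal = re.findall(re.escape("1+1=2"), "is 1+1=2?")  # ['1+1=2']
```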
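Modules (code packaged together in a .py file so it can be imported and called elsewhere):
Modules are the basic way Python organizes code.
Python scripts are saved as text files with the .py extension. A script can run on its own or be imported into another script; when a script is imported, we call it a module.
# The module name is the same as the script's file name:
For example, if we write a script named ltems.py, another script can import it as a module with the statement import ltems.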
root@kali:~/xuexi# ls
10.py 1.py 3.py 6.py 8.py jisuanfu.py new.py
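A self-contained sketch of importing a script as a module. It writes a hypothetical ltems.py (with a made-up add function) into a temporary directory, puts that directory on sys.path, and loads it with importlib.import_module, which does the same job as an import ltems statement. The if __name__ == '__main__' block runs only when the file is executed directly, not when it is imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_SOURCE = (
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    # Runs only when the file is executed as a script, not on import.\n"
    "    print(add(1, 2))\n"
)

with tempfile.TemporaryDirectory() as tmp:
    # The module name comes from the file name: ltems.py -> ltems.
    Path(tmp, "ltems.py").write_text(MODULE_SOURCE, encoding="utf-8")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()  # let the import system notice the new file
    try:
        ltems = importlib.import_module("ltems")  # equivalent to `import ltems`
    finally:
        sys.path.remove(tmp)

result = ltems.add(2, 3)
module_name = ltems.__name__
```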
# Built-in string methods: they operate only on strings.
string methods:
variable.capitalize(): returns a copy of the string with the first character in uppercase and the rest in lowercase.
>>> f = 'hello world'
>>> f.capitalize()
'Hello world'
>>> f
'hello world'  # the call returns a capitalized copy; f itself is unchanged because strings are immutable
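A few more common string methods applied to the same f; like capitalize(), each returns a new value and leaves f unchanged.

```python
f = 'hello world'

results = {
    "capitalize": f.capitalize(),               # 'Hello world'
    "title": f.title(),                         # 'Hello World'
    "upper": f.upper(),                         # 'HELLO WORLD'
    "replace": f.replace("world", "python"),    # 'hello python'
    "split": f.split(),                         # ['hello', 'world']
    "find": f.find("world"),                    # 6 (index of the first match)
    "strip": "  hi  ".strip(),                  # 'hi'
    "join": "-".join(["a", "b", "c"]),          # 'a-b-c'
}

# capitalize() also lowercases everything after the first character.
mixed = "hELLO wORLD".capitalize()              # 'Hello world'
```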
Built-in functions: fairly important, in my opinion.
Examples:
1. Return the absolute value of a number
2. Get the maximum and minimum values of a list
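The two cases above, sketched with the built-ins abs(), max() and min(), plus hand-written versions for comparison; the list values and the helper names my_abs and my_max_min are hypothetical.

```python
nums = [3, -7, 12, 0, -1]

# Case 1: absolute value with the built-in abs().
abs_int = abs(-7)       # 7
abs_float = abs(-2.5)   # 2.5

# Case 2: largest and smallest items with max() and min().
largest = max(nums)     # 12
smallest = min(nums)    # -7

# key= compares items by a derived value, here the distance from zero.
farthest_from_zero = max(nums, key=abs)  # 12
closest_to_zero = min(nums, key=abs)     # 0

# default= avoids a ValueError when the list is empty.
empty_max = max([], default=None)        # None

# Hand-written equivalents (hypothetical helpers).
def my_abs(x):
    return x if x >= 0 else -x

def my_max_min(values):
    hi = lo = values[0]
    for v in values[1:]:
        if v > hi:
            hi = v
        if v < lo:
            lo = v
    return hi, lo
```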
Anonymous functions:
lambda is a quick way to define a minimal, single-line function.
>>> x = lambda a,s:a*s  # define an anonymous function bound to x, with parameters a and s; calling it returns their product
>>> x(5,6)  # arguments 5 and 6; the result is their product
30
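1. When writing small Python scripts, lambda saves you from defining a named function, which keeps the code more concise.
2. For abstract functions that won't be reused anywhere else, picking a name can itself be a chore; with lambda you don't have to name them.
3. In some cases, lambda makes code easier to read.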
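A sketch of where lambda typically appears: as a short, throwaway argument to sorted(), map() or filter(). The data is made up; the named multiply function is a hypothetical equivalent of x for comparison.

```python
pairs = [("b", 2), ("a", 3), ("c", 1)]

# Sort by the number in each pair without defining a named key function.
by_number = sorted(pairs, key=lambda p: p[1])       # [('c', 1), ('b', 2), ('a', 3)]

squares = list(map(lambda n: n * n, [1, 2, 3]))     # [1, 4, 9]
evens = list(filter(lambda n: n % 2 == 0, range(10)))  # [0, 2, 4, 6, 8]

# x from the notes and an equivalent named function.
x = lambda a, s: a * s

def multiply(a, s):
    return a * s

same = x(5, 6) == multiply(5, 6) == 30              # True
```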