@upangka
Created January 2, 2020 15:44
Python: using a Shadowsocks proxy for basic access to sites outside the firewall

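Background

  1. I bought a Shadowsocks service online to get past the firewall, and I use it as a local proxy as a first step toward scraping information from outside the firewall.
    1. I can follow along with overseas tutorials
    2. I want to scrape information about my favorite channels on YouTube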

A first look at Shadowsocks


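In short, requests from the local machine to sites outside the firewall go through the Shadowsocks proxy. The Shadowsocks client installed on the local machine (the "little airplane") accepts traffic over the SOCKS5 protocol, encrypts the packets, and exchanges them with the Shadowsocks server software running on a machine outside the firewall. That server talks to the target website on our behalf, which is how the Great Firewall (GFW) is bypassed.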

Ref-1: How Shadowsocks works (shadowsocks实现原理)
Ref-2: Basic principles of Shadowsocks(R) (Shadowsocks(R)基本原理)

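Installing the Shadowsocks client locally

Google search: shadowsocks windows download

Download -> Shadowsocks releases

A basic introduction to the Shadowsocks client -> Shadowsocks guide (summarized and categorized): from nothing to something, with no end in sight

Get the server address, port, password, and encryption method from the service provider, then start the proxy locally.

By default the Shadowsocks client listens on port 1080, which is its SOCKS5 proxy port. See: How to check which process is using a port on Windows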

REM Find the PID of the process listening on port 1080 (here it is 29236)
netstat -aon | findstr "1080"
> TCP    127.0.0.1:1080         0.0.0.0:0              LISTENING       29236

Look up the process details by PID

tasklist | findstr "29236"
> Shadowsocks.exe              29236 Console                    1     12,464 K
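
The same check can be sketched from Python before sending any requests. This is a minimal sketch, not part of the original note: the helper name `is_port_listening` is hypothetical, and it only confirms that something accepts TCP connections on the port, not that the listener is Shadowsocks.

```python
import socket


def is_port_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 127.0.0.1:1080 is the local proxy address used throughout this note
    if is_port_listening("127.0.0.1", 1080):
        print("Something is listening on 127.0.0.1:1080")
    else:
        print("Nothing is listening on 127.0.0.1:1080; is the client running?")
```

Running this first gives a clearer failure than a proxy error raised later from inside a request.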

Using the proxy from Python to reach sites outside the firewall

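import requests

# The local Shadowsocks client listens on 127.0.0.1:1080.
# Both entries use the http:// scheme: the scheme in the value describes how to
# reach the proxy itself, and the local proxy is assumed to speak plain HTTP, not TLS.
proxies = {
    "http": "http://127.0.0.1:1080",
    "https": "http://127.0.0.1:1080"
}

# http://icanhazip.com returns the IP address the request came from
url = "http://icanhazip.com"
head = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
        'Connection': 'keep-alive'}
response = requests.get(url, proxies=proxies, headers=head)
print(response.text)

# Fetch the Google home page.
# proxies must be passed by keyword: the second positional argument of
# requests.get is params, so requests.get(url, proxies) would not use the proxy.
url = "http://www.google.com"
response = requests.get(url, proxies=proxies)
with open("index.html", "w", encoding="utf-8") as f:
    f.write(response.text)
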
  1. Use the requests library; it is simple and convenient
  2. The proxy is configured in the format shown above, rather than as
{
    "http": "socks5://127.0.0.1:1080",
    "https": "socks5://127.0.0.1:1080"
}

The reason is described in this issue: Not supported proxy scheme socks5
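
For comparison, recent releases of requests can use SOCKS proxies when the optional dependency is installed with `pip install requests[socks]`. Whether that resolves the issue above on a given setup is untested here. The sketch below only builds the `proxies` mapping for either style; `make_proxies` is a hypothetical helper, not something from the original note.

```python
def make_proxies(host: str = "127.0.0.1", port: int = 1080, scheme: str = "http") -> dict:
    """Build a requests-style proxies mapping that points at one local proxy.

    scheme is how requests should talk to the proxy itself:
      "http"    - plain HTTP proxy (the format used in this note)
      "socks5"  - SOCKS5, hostnames resolved locally (needs requests[socks])
      "socks5h" - SOCKS5, hostnames resolved by the proxy (needs requests[socks])
    """
    allowed = {"http", "socks5", "socks5h"}
    if scheme not in allowed:
        raise ValueError(f"unsupported proxy scheme: {scheme!r}")
    proxy_url = f"{scheme}://{host}:{port}"
    # The same proxy URL is used for both http:// and https:// target URLs
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    print(make_proxies())
    print(make_proxies(scheme="socks5h"))
```

The result can be passed as `requests.get(url, proxies=make_proxies(...))`. With `socks5h`, DNS lookups also go through the proxy, which matters when local resolution of the target hostname is unreliable.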

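  1. Visiting http://icanhazip.com returns the IP address the request came from (Ref: testing whether a proxy IP works with python requests). The result, 104.238.150.226, is indeed the node (Japan) I configured in Shadowsocks. (IP address lookup)

  2. The Google home page is fetched and saved to a local file. After running the program, opening the generated index.html shows that it is indeed the Google home page.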
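
The IP check can also be automated by comparing the address seen with and without the proxy. This is a rough sketch under assumptions: the helper names `fetch_public_ip` and `proxy_changes_ip` are hypothetical, the timeout value is arbitrary, and the proxy address is the one used in this note.

```python
from typing import Optional


def fetch_public_ip(proxies: Optional[dict] = None, timeout: float = 10.0) -> str:
    """Return the IP address that http://icanhazip.com sees for this request."""
    # Imported here so the pure helper below works without requests installed
    import requests

    response = requests.get("http://icanhazip.com", proxies=proxies, timeout=timeout)
    response.raise_for_status()  # fail loudly on an HTTP error status
    return response.text.strip()


def proxy_changes_ip(direct_ip: str, proxied_ip: str) -> bool:
    """True when both addresses are non-empty and differ."""
    return bool(direct_ip) and bool(proxied_ip) and direct_ip != proxied_ip


if __name__ == "__main__":
    local_proxy = {"http": "http://127.0.0.1:1080", "https": "http://127.0.0.1:1080"}
    # Note: with proxies=None, requests still honors proxy environment variables
    direct = fetch_public_ip()
    proxied = fetch_public_ip(proxies=local_proxy)
    print("direct :", direct)
    print("proxied:", proxied)
    print("proxy in effect:", proxy_changes_ip(direct, proxied))
```

If the two addresses match, the request most likely did not go through the Shadowsocks node.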
