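- Log in to Bilibili.
- Open the Network panel in the browser's DevTools and refresh the page. Click any request whose response is JSON, then copy the contents of its request cookies into `cookies.json`. The script loads this file with `json.load` and passes the result to `requests` as cookies, so the file must contain a JSON object that maps cookie names to values.
- Run the script to export the data:
  `python export_bili_history.py -c cookies.json -o results.jsonl -p 10`
  The Bilibili API returns 20 records per page. To export more, change `-p 10` to a larger number. At most the last three months of history can be exported; older records are no longer available.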
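If the browser only shows the raw `Cookie` request header (a single `name=value; name=value` string) rather than a JSON view, the cookies.json step above needs a conversion first. The following is a minimal sketch of one way to do that. The helper names `cookie_header_to_dict` and `save_cookies` are hypothetical and not part of the original script.

```python
import json


def cookie_header_to_dict(header: str) -> dict:
    """Parse a raw Cookie header value ("k1=v1; k2=v2") into a dict."""
    cookies = {}
    for pair in header.split(";"):
        pair = pair.strip()
        # Skip empty fragments and fragments without a name=value shape.
        if not pair or "=" not in pair:
            continue
        # Split on the first "=" only, since values may contain "=".
        name, value = pair.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies


def save_cookies(header: str, path: str = "cookies.json") -> None:
    """Write the parsed cookies as a JSON object that json.load can read back."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cookie_header_to_dict(header), f, ensure_ascii=False, indent=2)
```

For example, `save_cookies("name1=value1; name2=value2")` writes a `cookies.json` containing `{"name1": "value1", "name2": "value2"}`, which is the shape the export script passes to `requests`.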
Created December 25, 2021 12:39
Export Bilibili watch history data
import json
import time

import click
import requests


@click.command()
@click.option("-c", "--cookie-file", required=True)
@click.option("-o", "--outfile", required=True)
@click.option("-p", "--page-num", type=int, default=10)
def main(cookie_file, outfile, page_num):
    # cookies.json must hold a JSON object mapping cookie names to values.
    with open(cookie_file, encoding='utf-8') as fin:
        cookies = json.load(fin)
    headers = {
        'Connection': 'keep-alive',
        'Host': 'api.bilibili.com',
        'Referer': 'https://www.bilibili.com/account/history',
        'User-Agent': (
            'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) '
            'Gecko/20100101 Firefox/90.0'
        )
    }
    session = requests.Session()
    url = 'https://api.bilibili.com/x/web-interface/history/cursor'
    # Cursor for the first page; each response supplies the cursor for the next.
    params = {'max': 0, 'view_at': 0, 'business': ''}
    with open(outfile, 'w', encoding='utf-8') as fout:
        # Use a separate loop variable so the page_num option is not shadowed.
        for page in range(page_num):
            time.sleep(1)  # pause between requests
            resp = session.get(url, params=params, headers=headers, cookies=cookies)
            if resp.status_code != 200:
                break

            result = resp.json()
            # Stop when the response has no data or the cursor's `ps` is 0.
            if not result.get('data') or result['data']['cursor']['ps'] == 0:
                break

            print(
                f'page = {page} '
                f'code = {result.get("code")} '
                f'datalen = {len(result["data"]["list"])} '
                f'cursor = {result["data"]["cursor"]}'
            )
            # Write one JSON object per line (JSON Lines).
            for item in result['data']['list']:
                print(json.dumps(item, ensure_ascii=False), file=fout)

            # Advance the cursor to request the next page.
            params = {
                'max': result['data']['cursor']['max'],
                'view_at': result['data']['cursor']['view_at'],
                'business': result['data']['cursor']['business'],
            }


if __name__ == '__main__':
    main()
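
The script writes one JSON object per line, so the output can be read back line by line. The following is a rough sketch of loading `results.jsonl` and counting records per day. The helpers `load_jsonl` and `count_by_day` are hypothetical. The sketch also assumes each record carries a Unix timestamp in a `view_at` field, which the script itself does not guarantee, so records without it are skipped.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def count_by_day(records, ts_field="view_at"):
    """Count records per UTC calendar day.

    ts_field is an assumption about the record layout; records that lack
    a positive numeric timestamp in that field are ignored.
    """
    counts = Counter()
    for rec in records:
        ts = rec.get(ts_field)
        if isinstance(ts, (int, float)) and ts > 0:
            day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
            counts[day] += 1
    return counts
```

Calling `count_by_day(load_jsonl("results.jsonl"))` returns a `Counter` keyed by ISO dates. UTC is used here only to keep the grouping deterministic; a local time zone may be more natural for viewing habits.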