This document describes an SOP for archiving a YouTube livestream, either public or private.
The commands below were run on Arch Linux, but they should work on other systems as well, including MSYS2 on Windows.
This SOP was originally written for archiving Gawr Gura’s unarchived streams.
The resulting archive consists of:
- an MPEG2-TS video file
- a thumbnail file
- a metadata JSON file
- a streamlink trace log file
- an integrity check log file
- a raw livechat JSON file
- a rendered livechat HTML file
The folder layout will look like this:
.
├── m-Bq5CG_rGQ.html
├── m-Bq5CG_rGQ.json.gz
├── slchk.log
├── streamlink.log
├── [UNARCHIVED KARAOKE] Jazz Lounge!-m-Bq5CG_rGQ.info.json
├── [UNARCHIVED KARAOKE] Jazz Lounge!-m-Bq5CG_rGQ.jpg
└── [UNARCHIVED KARAOKE] Jazz Lounge!-m-Bq5CG_rGQ.ts
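The layout above can be checked mechanically once an archive is finished. The following is a minimal sketch and is not part of the gist: the function name `missing_archive_files` is hypothetical, and it assumes the naming shown in the listing (fixed names for the two logs, the video ID as the stem of the chat files, and `-<video id>` before the extension for the video, thumbnail, and metadata files).

```python
from pathlib import Path


def missing_archive_files(folder, video_id):
    """Return descriptions of expected archive files that are absent.

    Assumes the naming from the listing above; adjust it if your tools
    are configured to name their output differently.
    """
    folder = Path(folder)
    names = [p.name for p in folder.iterdir() if p.is_file()]
    missing = []

    # Files whose full name is known in advance.
    for exact in (f"{video_id}.html", f"{video_id}.json.gz",
                  "slchk.log", "streamlink.log"):
        if exact not in names:
            missing.append(exact)

    # Files named "<title>-<video id><suffix>"; the title is not known here.
    for suffix in (".info.json", ".jpg", ".ts"):
        tail = f"-{video_id}{suffix}"
        if not any(n.endswith(tail) for n in names):
            missing.append(f"<title>{tail}")

    return missing
```

Calling `missing_archive_files(".", "m-Bq5CG_rGQ")` in a complete archive folder should return an empty list. Names are matched by suffix rather than with a glob pattern because stream titles often contain square brackets, which glob would treat as character classes.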
You will need:
- streamlink
- youtube-dl
- ffmpeg
- virtualenv
- pytchat
- scripts in this gist
$ sudo pacman -S streamlink youtube-dl python-virtualenv ffmpeg
$ git clone https://github.com/taizan-hokuto/pytchat.git
$ cd pytchat
$ virtualenv venv
$ . venv/bin/activate
$ pip install -r requirements.txt
$ deactivate
Note: This step is only needed if you are going to archive a private stream.
Export your cookies for the host youtube.com to a cookies.txt file, then test and sanitize the file using youtube-dl.
$ youtube-dl --cookies cookies.txt --skip-download "$any_youtube_video_url"
youtube-dl will actually rewrite your cookies.txt. Afterwards, you will see the line "# This file is generated by youtube-dl. Do not edit." at the beginning of your cookies.txt.
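If you want to confirm the rewrite happened before starting a long capture, a small check like the one below may help. It is a sketch, not part of the gist: the function name `looks_sanitized` is hypothetical, and it only looks for the phrase "generated by youtube-dl" among the leading comment lines, on the assumption that the marker appears there as described above.

```python
def looks_sanitized(cookies_path):
    """Return True if the cookie file carries youtube-dl's marker comment."""
    with open(cookies_path, encoding="utf-8") as f:
        # Only the leading comment lines are inspected; the scan stops
        # at the first line that is not a comment.
        for line in f:
            if not line.startswith("#"):
                break
            if "generated by youtube-dl" in line:
                return True
    return False
```

A `False` result suggests the youtube-dl test command has not been run against this file yet, or that it failed before writing the file back.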
Note: This step is only needed if you are going to archive a private stream.
You need to prepare the livechat archiver at this step.
Visit https://www.youtube.com/live_chat?v=$video_id in your browser. After the page has loaded, open devtools (usually F12), go to the "Network" tab, and look for any POST request whose URL starts with https://www.youtube.com/youtubei/v1/live_chat/get_live_chat, then copy the value of that request's Cookie header.
Edit pytchat/config/__init__.py: add a field with the key cookie to headers, and paste the cookie value there.
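As a rough illustration of that edit, the `headers` dictionary might end up looking like the sketch below. The existing entries in pytchat/config/__init__.py can differ between pytchat versions, so the `user-agent` entry and both values here are placeholders and assumptions, not the file's real contents.

```python
# Sketch of the edited section of pytchat/config/__init__.py.
# Keep whatever entries your copy of the file already has; the
# "user-agent" value below is a placeholder, not pytchat's real one.
headers = {
    "user-agent": "<existing value, left unchanged>",
    # Added: the Cookie request header copied from devtools, pasted
    # as a single string (placeholder shown here).
    "cookie": "<value copied from the Cookie header>",
}
```

The cookie value grants access to your account, so avoid committing or sharing the edited file.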
Run the script streamlink-cookies.py if you have prepared a cookies.txt; otherwise, replace it with vanilla streamlink.
$ env TZ=UTC ./streamlink-cookie.py \
-o archive.ts \
-l trace \
--retry-streams 30 \
--hls-live-restart \
--hls-segment-threads 4 \
--hls-segment-attempts 20 \
--hls-playlist-reload-attempts 20 \
"$video_url" \
best \
|& tee streamlink.log
Tip: The log file at trace level is very important, as it is the source used to check your archive's integrity.
Note: If the stream is going to be unarchived, this will be your only chance to get the information.
$ ./get_start_time.sh "$video_id"
$ env TZ=UTC ./chat-archive.py render "$video_id" "$exact_stream_start_timestamp"
$ # compress the raw json
$ gzip -9 "$video_id.json"
$ ./slchk.py streamlink.log |& tee slchk.log
If you see "missing segments count: 0", congratulations: no segments are missing from your archive.
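For readers curious how such a check can work in principle, here is a minimal sketch. It is not the gist's slchk.py, whose logic is not reproduced here. It assumes the trace log contains lines of the form "Adding segment <n> to queue" and "... segment <n> complete"; the exact wording varies between streamlink versions, so treat both patterns as assumptions and adjust them to match your own log. The function name `find_missing_segments` is hypothetical.

```python
import re

# Assumed log wording; check it against your own streamlink.log.
QUEUED = re.compile(r"Adding segment (\d+) to queue")
DONE = re.compile(r"segment (\d+) complete", re.IGNORECASE)


def find_missing_segments(lines):
    """Return sorted segment numbers that look lost.

    A segment counts as missing if it was queued but never completed,
    or if its number falls in a gap between queued segments.
    """
    queued, done = set(), set()
    for line in lines:
        m = QUEUED.search(line)
        if m:
            queued.add(int(m.group(1)))
        m = DONE.search(line)
        if m:
            done.add(int(m.group(1)))
    if not queued:
        return []
    # Every number between the first and last queued segment is expected.
    expected = set(range(min(queued), max(queued) + 1))
    return sorted(expected - done)
```

The function only iterates over its argument, so an open file object for streamlink.log can be passed directly as `lines`. An empty result corresponds to a missing segment count of zero under these assumptions.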