Forked from Parler-Analysis/gist:2c023fd2e053fba5bc85b09209f606eb
Created
January 15, 2021 07:17
Parler Data & Tools
Data & Tools:
Many contributors. Thanks to all.
Contact:
[email protected]
IRC Channels:
#parlerparsers at https://webirc.hackint.org/
#parlerparsers-video for video IDing
Please register your nick and at least take a vhost before joining to mask your IP. Using a VPN or Tor is recommended.
/msg NickServ register <yourpassword> <[email protected]>
/msg hostserv take hackint/user/$account
FBI Tips:
https://tips.fbi.gov/digitalmedia/aad18481a3e8f02
Many dev efforts are being consolidated in:
https://github.com/ozywog/parler-data-tools
Open spreadsheet for listing notable video IDs:
https://docs.google.com/spreadsheets/d/1ThPUH5HgTcVKCoyfr2oJ21AWKTGq-dR-cRZjPOER-Q0/edit#gid=0
Listing of videos:
tommycarsten.com/terrorism/index.html - Most videos posted from Capitol Hill on Jan6th
https://www.youtube.com/channel/UCZk6IiAVk2QwOdljEAYCPLw - More of the same, also avail on mega (see 10)
Various Maps, most are focused on Jan 6th:
kylemcdonald.net/parler/map/
https://fortress.maptive.com/ver4/a3486a6ab9a9a12aa9a9cb067839079c/410491
https://darthnithin.github.io/earth/index.html
https://parlervid.herokuapp.com/
!!! VideoID can be added to url to download a video, ala https://parlervid.herokuapp.com/VIDEOID
Want to help but don't know how?
Download copies of data and scripts, rehost them elsewhere, and seed torrents.
Help make this file easier for others to understand.
Like-minded list with nice formatting - https://github.com/rljacobson/CapitolResources/
Develop ways to make data easy to visualize and sort with current tools.
Come ask in IRC about current efforts.
Tools & Resources
Latest resources at top
================================
! Torrent of ~all videos from Parler's CDN - Said to contain more than the archive.org pull
Ongoing split torrent - recheck your chunks often
README FIRST: https://gist.github.com/shoghicp/714f590f3a175635b7a377905bd21ea4
https://pl.gammaspectra.live/
- Formatted jsons of parler posts from Jan 6th.
magnet:?xt=urn:btih:03b3250bcf3fc335d74605709f8e081929d2bda7&dn=parler_posts_json.zip&tr=http%3a%2f%2f128.199.70.66%3a5944%2fannounce&tr=udp%3a%2f%2f194.106.216.222%3a80%2fannounce
https://gofile.io/d/7TGoWj
- Usernames and posts, separated from dataset
https://drive.google.com/file/d/1Lo4I2du5rGSKqPcrC_hnEvrLTufDqTpy/view?usp=sharing
- Pictures / Images
https://irc.gammaspectra.live/339648d275d2712b/imagelist.zip
List of all image filepaths from Internet Archive collection
https://par.pw/v1/photo?id=IMAGEID
Webtool to download images. You'll need other tools to get IDs
- Massive listing of Jan 6th media from across multiple socials:
https://capitol-hill-riots.s3.us-east-1.wasabisys.com/directory.html
Looks to be the same as the mega dump below, but easier to grab from.
https://mega.nz/folder/30MlkQib#RDOaGzmtFEHkxSYBaJSzVA
- Videos from DC area, Jan 6th. Currently estimated to be only about 10% of what was available
https://www.youtube.com/channel/UCZk6IiAVk2QwOdljEAYCPLw
https://mega.nz/file/Pkk2VSRT#x-Gnl1-FddGwHumBXAGsCJ2FL1VHE-Y-u2SFW48KpeQ
- 948 files from around DC area Jan5-Jan10, 2021
magnet:?xt=urn:btih:387b8615beec9b506b4f448af0002cd3d651dd00&dn=geocoded&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fexplodie.org%3A6969%2Fannounce
- Script to extract images/videos from WARCs:
https://gist.github.com/redd-dedd/9a200a9ba789f312faf53b25ac63e024
- JSON / CSV / KML Scrapes:
gofile.io is currently having issues
https://gofile.io/d/p8RxUC - CSV, with all non-zero lat/long from donk's JSON
https://gofile.io/d/WVmqhR - quick 'n dirty KML made from the csv
https://gofile.io/d/DsUUte - KML of posts made 1/6/2021, DC Area Only
https://gofile.io/d/EJczW8 - CSV, Cleaned ver of 1/6 in DC
https://gofile.io/d/PUxeV4 - CSV, Cleaned ver of all available geotagged data
https://gofile.io/d/zKTsWr - list of videos taken within 100m of a LE or gov't building, all-time
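For anyone wanting to rebuild a proximity list like the one above from the geotagged CSVs, here is a minimal sketch. It is not the script used to produce the gofile lists. The column names "lat" and "lon", the building coordinates, and the helper names are all assumptions made for illustration; check them against the actual CSV headers.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def near_any(rows, buildings, radius_m=100.0):
    """Keep rows with non-zero coordinates that lie within radius_m of any building.

    rows:      iterable of dicts with "lat" and "lon" keys (hypothetical column names)
    buildings: iterable of (lat, lon) tuples in degrees
    """
    buildings = list(buildings)
    kept = []
    for row in rows:
        lat, lon = float(row["lat"]), float(row["lon"])
        if lat == 0.0 and lon == 0.0:
            continue  # 0/0 means the video carried no usable location
        if any(haversine_m(lat, lon, b_lat, b_lon) <= radius_m for b_lat, b_lon in buildings):
            kept.append(row)
    return kept
```

Rows can come straight from csv.DictReader. For a large list of buildings a spatial index would be faster, but a plain loop is usually enough for a few thousand rows.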
- Script to scrape videos:
https://github.com/darthnithin/parlervideoscraper
You will need donk's metadata.tar.gz (listed below) to use this
- Script to generate a list of unique names and usernames, then collect all the
posts and associate them with the person who posted them
Requires raw html source:
https://github.com/billstrobl/Prooter
https://github.com/billstrobl/Prooter/blob/master/prooter.py
- Magnet URI for torrent of a file that contains 1.8 million texts scraped from
Parler and is a subset of the full data. Originally hosted on https://parler-archive.deadops.de/
This is the parler_2020-01-06_posts-partial
magnet:?xt=urn:btih:FF29970B902657A32D561C0720E70FACFB8C4284&dn=parler_2020-01-06_posts-partial&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce&tr=udp%3a%2f%2ftracker.internetwarriors.net%3a1337%2fannounce
- Metadata json files with EXIF data on all MP4 videos scraped from Parler:
donk.sh/metadata.tar.gz
magnet:?xt=urn:btih:1723e27bc79186c4574ff056ddb458d771c26e2f&dn=metadata.tar.gz&tr=wss%3A%2F%2Ftracker.btorrent.xyz&tr=wss%3A%2F%2Ftracker.openwebtorrent.com&tr=udp%3A%2F%2Ftracker.leechers-paradise.org%3A6969&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337&tr=udp%3A%2F%2Fexplodie.org%3A6969&tr=udp%3A%2F%2
SHA256: 66809d9ae0a5a6577a3c80bb623562274ceccd96b35519f15f568d09cefc56f8 metadata.tar.gz
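Since copies of metadata.tar.gz are being rehosted and torrented, it is worth checking a download against the SHA256 above before using it. A minimal sketch using only the Python standard library follows; the function names are made up for this example.

```python
import hashlib

# Checksum published above for metadata.tar.gz
EXPECTED_SHA256 = "66809d9ae0a5a6577a3c80bb623562274ceccd96b35519f15f568d09cefc56f8"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 of a file, reading it in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

Usage: matches("metadata.tar.gz") should return True for an intact copy. On Linux, running sha256sum metadata.tar.gz and comparing by eye does the same job.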
===========
- Needs to be sorted.
http://donk.sh/06d639b2-0252-4b1e-883b-f275eff7e792/
https://web.archive.org/web/timemap/?url=https%3A%2F%2Fimage-cdn.parler.com%2F&matchType=prefix&collapse=urlkey&output=json&fl=original%2Cuniqcount&filter=!statuscode%3A%5B45%5D
https://irc.gammaspectra.live/eaa6fa678444b5f4/videos.txt
https://gist.github.com/kylemcdonald/8fdabd6526924012c1f5afe538d7dc09
https://github.com/acanthias13/legendary-octo-guacamole - backup of Clean CSVs
===================================
HOW TO VIEW WARC/ZSTD from ArchiveTeam's Parler scrape
# How to View Parler Archive "megawarc.warc.zst" files.
These files use the standard zstd and WARC formats.
They are uploading to: https://archive.org/details/archiveteam_neparlepas
$ tar -I zstd -xvf archive.tar.zst
===Old.
1. Install Python 3.7
2. Execute: pip install zstandard==0.10.2
3. Download archive from here: https://archive.org/details/archiveteam_neparlepas?tab=collection
4. Copy this script into a new file called xtract.py: https://hastebin.com/bugedubaxi.py
5. Execute: python ./xtract.py /path/to/parler_blahblah.megawarc.warc.zst > dict
6. Execute: zstd -d /path/to/parler_blahblah.megawarc.warc.zst -D dict
7. Import the decompressed parler_blahblah.megawarc.warc file into this tool: https://github.com/webrecorder/webrecorder-desktop
If you cannot install Python 3.7 for some reason, or just want a container, a dockerfile is available at:
https://gist.github.com/shoghicp/6ce05806ffc805929667ec2d4c62aba2
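The xtract.py script from step 4 is not reproduced here, and the hastebin link may not last. As a rough sketch of what steps 5 and 6 appear to rely on: this assumes the .warc.zst file begins with a zstd "skippable frame" whose payload is the dictionary, which is how ArchiveTeam-style .warc.zst files are generally understood to be laid out. That layout and the function names below are assumptions, not a copy of the original script.

```python
import struct

# zstd skippable frames use magic numbers 0x184D2A50..0x184D2A5F (little-endian),
# followed by a 4-byte little-endian payload size.
SKIPPABLE_MAGIC_MIN = 0x184D2A50
SKIPPABLE_MAGIC_MAX = 0x184D2A5F
ZSTD_FRAME_MAGIC = b"\x28\xb5\x2f\xfd"  # start of a regular zstd frame


def read_leading_skippable_frame(data):
    """Return the payload of a skippable frame at the start of data.

    Raises ValueError if data does not start with a complete skippable frame.
    """
    if len(data) < 8:
        raise ValueError("input too short to hold a skippable frame header")
    magic, size = struct.unpack("<II", data[:8])
    if not SKIPPABLE_MAGIC_MIN <= magic <= SKIPPABLE_MAGIC_MAX:
        raise ValueError("file does not start with a zstd skippable frame")
    payload = data[8:8 + size]
    if len(payload) != size:
        raise ValueError("skippable frame is truncated")
    return payload


def payload_is_compressed(payload):
    """True if the payload itself looks like a zstd frame rather than a raw dictionary."""
    return payload.startswith(ZSTD_FRAME_MAGIC)
```

To use it, read the first few megabytes of the .warc.zst file in binary mode and pass the bytes to read_leading_skippable_frame. If payload_is_compressed returns True, the payload would need to be decompressed (for example with the zstandard package from step 2) before being written out as the dict file passed to zstd -D.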