raffaele messuti atomotic

@atomotic
atomotic / readme.md
Last active October 19, 2017 04:29
poor man's WARC viewer
atomotic / gist:6821aded25864548dd4a578fbe1fa8b1
Created October 9, 2017 08:37
ffmpeg to save radioradicale stream
ffmpeg -i http://video.radioradicale.it/store-83/_definst_/mp3:roma/2016/12/MP909393.mp3/playlist.m3u8 -c copy archivi.mp3
➜ tika -m -j http://www.anvur.org/rapporto-2016/files/Area01/VQR2011-2014_Area01_Tabelle.pdf | jq .
ERROR OpenType Layout tables used in font ABCDEE+Cambria,Bold are not implemented in PDFBox and will be ignored
ERROR OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
ERROR OpenType Layout tables used in font Times New Roman,BoldItalic are not implemented in PDFBox and will be ignored
ERROR OpenType Layout tables used in font Times New Roman,Italic are not implemented in PDFBox and will be ignored
ERROR OpenType Layout tables used in font ABCDEE+Calibri,BoldItalic are not implemented in PDFBox and will be ignored
{
"Author": "Andrea Gordiani",
"Content-Length": "9651618",
"Content-Type": "application/pdf",
➜ ~ pip install wdmapper
➜ ~ wdmapper get P2748 P3266
#FORMAT: BEACON
#NAME: LocFDD ID
#DESCRIPTION: Mapping from PRONOM file format identifiers to LocFDD IDs
#PREFIX: https://www.nationalarchives.gov.uk/pronom/
#TARGET: http://www.loc.gov/preservation/digital/formats/fdd/{ID}.shtml
#SOURCESET: http://www.wikidata.org/entity/Q235557
#TARGETSET: http://www.wikidata.org/entity/Q235557
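The header lines printed by wdmapper above are plain `#FIELD: value` pairs, so they can be read with a few lines of Python. This is a minimal sketch, not part of wdmapper: the function names `parse_beacon_header` and `expand_target` are made up here, the identifier `fdd000001` is only an illustrative placeholder, and the sketch assumes that `{ID}` in `#TARGET` is the slot where a target identifier goes.

```python
def parse_beacon_header(text):
    """Collect '#FIELD: value' meta lines from the top of a BEACON dump."""
    meta = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if not line.startswith("#"):
            break  # first link line ends the header
        # split at the first colon only, so URLs in the value stay intact
        field, sep, value = line[1:].partition(":")
        if sep:
            meta[field.strip().upper()] = value.strip()
    return meta


def expand_target(meta, identifier):
    """Fill the {ID} placeholder of the TARGET template (assumed convention)."""
    return meta["TARGET"].replace("{ID}", identifier)


# Header copied from the wdmapper output shown above.
SAMPLE = """#FORMAT: BEACON
#NAME: LocFDD ID
#DESCRIPTION: Mapping from PRONOM file format identifiers to LocFDD IDs
#PREFIX: https://www.nationalarchives.gov.uk/pronom/
#TARGET: http://www.loc.gov/preservation/digital/formats/fdd/{ID}.shtml
#SOURCESET: http://www.wikidata.org/entity/Q235557
#TARGETSET: http://www.wikidata.org/entity/Q235557
"""

header = parse_beacon_header(SAMPLE)
print(header["NAME"])
print(expand_target(header, "fdd000001"))  # hypothetical identifier
```

Keeping the header as a plain dict makes it easy to check which properties a dump maps before reading the link lines that follow it.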
A_Blok_1917
AgentVA_1917
Bolsheviks_1917
BritishEmb1917
Bublikov_1917
CaptZeitlin1917
DukeMikhail1917
EmpressAlix1917
ErvinGrimm_1917
FarmAndrey_1917
atomotic / readme.md
Created April 13, 2017 15:21
chrome headless: capture har and replay with webrecorderplayer
  1. install chrome-har-capturer

     ~ npm install -g chrome-har-capturer
    
  2. install Chrome Canary and start it headless with remote debugging enabled

     ~ /Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary --remote-debugging-port=9222 --headless
    
  3. capture har
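A HAR file is JSON, so once a capture exists it can be sanity-checked before replaying it. The sketch below is an illustration only and not part of chrome-har-capturer: `summarize_har` is a hypothetical helper, the sample data is hand-written, and it assumes the usual HAR 1.2 layout (`log.entries[].response.content.mimeType`).

```python
import json
from collections import Counter


def summarize_har(har):
    """Count captured responses per MIME type in a HAR document (a dict)."""
    counts = Counter()
    for entry in har.get("log", {}).get("entries", []):
        mime = entry.get("response", {}).get("content", {}).get("mimeType", "")
        # drop parameters such as "; charset=utf-8"
        counts[mime.split(";")[0].strip() or "unknown"] += 1
    return counts


# Tiny hand-written, HAR-shaped sample (made up for illustration).
sample = json.loads("""
{"log": {"version": "1.2", "entries": [
  {"request": {"url": "http://example.com/"},
   "response": {"status": 200, "content": {"mimeType": "text/html; charset=utf-8"}}},
  {"request": {"url": "http://example.com/a.png"},
   "response": {"status": 200, "content": {"mimeType": "image/png"}}},
  {"request": {"url": "http://example.com/b.png"},
   "response": {"status": 200, "content": {"mimeType": "image/png"}}}
]}}
""")

for mime, count in sorted(summarize_har(sample).items()):
    print(mime, count)
```

For a real capture, load the file with `json.load(open(path))` and pass the result to the same function; an empty summary suggests the capture did not record any responses.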

➜ ~ /Applications/webrecorderplayer-electron.app/Contents/Resources/app/python-binaries/webrecorder {file.warc}
# https://github.com/internetarchive/warctools
~ warcfilter -H video/mp4 original.warc.gz > video.warc
# https://github.com/chfoo/warcat
~ python3 -m warcat extract video.warc --output-dir ./videos --progress
(venv) ➜ twarc git:(master) ✗ pyinstaller --clean --hidden-import urllib3 --hidden-import queue --onefile twarc.py
(venv) ➜ twarc git:(master) ✗ ls -lah dist/twarc
-rwxr-xr-x 1 raffaele 5.4M Jan 4 21:01 dist/twarc
(venv) ➜ twarc git:(master) ✗ file dist/twarc
dist/twarc: Mach-O 64-bit executable x86_64
./dist/twarc --help
usage: twarc [-h] [--log LOG] [--consumer_key CONSUMER_KEY]
package EPrints::Plugin::Export::DEPOSITOLEGALE;
# eprint needs magic documents field
# documents needs magic files field
use EPrints::Plugin::Export::XMLFile;
@ISA = ( "EPrints::Plugin::Export::DIDL" );