https://asciinema.org/a/FqmauknkDWf8eIXHbyd77aJct
install rat https://github.com/ericfreese/rat
go get github.com/ericfreese/rat
install warcio and warctools
ffmpeg -i http://video.radioradicale.it/store-83/_definst_/mp3:roma/2016/12/MP909393.mp3/playlist.m3u8 -c copy archivi.mp3
➜ tika -m -j http://www.anvur.org/rapporto-2016/files/Area01/VQR2011-2014_Area01_Tabelle.pdf | jq .
ERROR OpenType Layout tables used in font ABCDEE+Cambria,Bold are not implemented in PDFBox and will be ignored
ERROR OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
ERROR OpenType Layout tables used in font Times New Roman,BoldItalic are not implemented in PDFBox and will be ignored
ERROR OpenType Layout tables used in font Times New Roman,Italic are not implemented in PDFBox and will be ignored
ERROR OpenType Layout tables used in font ABCDEE+Calibri,BoldItalic are not implemented in PDFBox and will be ignored
{
  "Author": "Andrea Gordiani",
  "Content-Length": "9651618",
  "Content-Type": "application/pdf",
➜ ~ pip install wdmapper
➜ ~ wdmapper get P2748 P3266
#FORMAT: BEACON
#NAME: LocFDD ID
#DESCRIPTION: Mapping from PRONOM file format identifiers to LocFDD IDs
#PREFIX: https://www.nationalarchives.gov.uk/pronom/
#TARGET: http://www.loc.gov/preservation/digital/formats/fdd/{ID}.shtml
#SOURCESET: http://www.wikidata.org/entity/Q235557
#TARGETSET: http://www.wikidata.org/entity/Q235557
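The BEACON header above carries everything needed to turn link tokens into full URLs. A minimal sketch of reading it in Python, assuming link lines use the `source|annotation|target` layout; the function names and the sample tokens `x-fmt/0` and `fdd000000` are made up for illustration and are not part of wdmapper or its output:

```python
# Header copied from the wdmapper output above.
HEADER = """\
#FORMAT: BEACON
#NAME: LocFDD ID
#DESCRIPTION: Mapping from PRONOM file format identifiers to LocFDD IDs
#PREFIX: https://www.nationalarchives.gov.uk/pronom/
#TARGET: http://www.loc.gov/preservation/digital/formats/fdd/{ID}.shtml
"""


def parse_beacon(text):
    """Split BEACON text into a dict of meta fields and a list of link rows."""
    meta = {}
    links = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Meta line: "#KEY: value". Split on the first colon only,
            # so URLs in the value stay intact.
            key, sep, value = line[1:].partition(":")
            if sep:
                meta[key.strip().upper()] = value.strip()
            continue
        # Link line: fields separated by "|".
        links.append([part.strip() for part in line.split("|")])
    return meta, links


def expand(template, token):
    """Fill {ID} in the template if present, otherwise append the token."""
    if "{ID}" in template:
        return template.replace("{ID}", token)
    return template + token


def resolve(meta, link):
    """Return (source URL, target URL) for one link row.

    Simplification: only a non-empty third field is treated as the target
    token; otherwise the source token is reused.
    """
    source = link[0]
    target = link[2] if len(link) >= 3 and link[2] else source
    return expand(meta.get("PREFIX", ""), source), expand(meta.get("TARGET", ""), target)


if __name__ == "__main__":
    # The link line below is hypothetical, not real wdmapper output.
    meta, links = parse_beacon(HEADER + "x-fmt/0||fdd000000\n")
    for link in links:
        print(resolve(meta, link))
```

Saving the real `wdmapper get P2748 P3266` output to a file and passing its contents to `parse_beacon` should work the same way, as long as the link lines follow the layout assumed here.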
A_Blok_1917
AgentVA_1917
Bolsheviks_1917
BritishEmb1917
Bublikov_1917
CaptZeitlin1917
DukeMikhail1917
EmpressAlix1917
ErvinGrimm_1917
FarmAndrey_1917
install chrome-har-capturer
~ npm install -g chrome-har-capturer
install Chrome Canary
~ /Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary --remote-debugging-port=9222 --headless
capture har
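Once a HAR file has been captured, it is plain JSON and can be inspected without extra tools. A minimal sketch, assuming the standard HAR 1.2 layout (`log.entries[].request.url`, `response.status`, `response.content.mimeType`); the function names and the sample entries are hypothetical:

```python
import json


def load_har(path):
    """Read a HAR file from disk and return the parsed JSON document."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_har(har):
    """Return one (status, mime type, url) tuple per captured request."""
    rows = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        content = response.get("content", {})
        rows.append(
            (response.get("status"), content.get("mimeType", ""), request.get("url", ""))
        )
    return rows


# Hypothetical two-entry HAR, standing in for a real capture.
SAMPLE_HAR = {
    "log": {
        "entries": [
            {
                "request": {"url": "http://example.org/"},
                "response": {"status": 200, "content": {"mimeType": "text/html"}},
            },
            {
                "request": {"url": "http://example.org/clip.mp4"},
                "response": {"status": 200, "content": {"mimeType": "video/mp4"}},
            },
        ]
    }
}

if __name__ == "__main__":
    for status, mime, url in summarize_har(SAMPLE_HAR):
        print(status, mime, url)
```

For a real capture, `summarize_har(load_har("capture.har"))` would list the same fields, where `capture.har` is a placeholder file name.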
➜ ~ /Applications/webrecorderplayer-electron.app/Contents/Resources/app/python-binaries/webrecorder {file.warc}
# https://github.com/internetarchive/warctools
~ warcfilter -H video/mp4 original.warc.gz > video.warc
# https://github.com/chfoo/warcat
~ python3 -m warcat extract video.warc --output-dir ./videos --progress
(venv) ➜ twarc git:(master) ✗ pyinstaller --clean --hidden-import urllib3 --hidden-import queue --onefile twarc.py
(venv) ➜ twarc git:(master) ✗ ls -lah dist/twarc
-rwxr-xr-x 1 raffaele 5.4M Jan 4 21:01 dist/twarc
(venv) ➜ twarc git:(master) ✗ file dist/twarc
dist/twarc: Mach-O 64-bit executable x86_64
./dist/twarc --help
usage: twarc [-h] [--log LOG] [--consumer_key CONSUMER_KEY]
package EPrints::Plugin::Export::DEPOSITOLEGALE;
# eprint needs magic documents field
# documents needs magic files field
use EPrints::Plugin::Export::XMLFile;
@ISA = ( "EPrints::Plugin::Export::DIDL" );