note: Firefox will be subsequently called FckUfox in this doc
it eats:
- your SSD/HDD: https://www.servethehome.com/firefox-is-eating-your-ssd-here-is-how-to-fix-it/
- your privacy, since it calls ~50 domains UNASKED
- your nerves, since ~FckUfox v100 it silently eats your about:sessionrestore when killed with e.g.
killall -9 fckufox
congrats, over the years a widely accepted open-source product became a complete nightmare,
- forcing people to use AppImages of firefox rip-offs that are rarely updated,
- not even whitelisting i2p and onion domains (so everybody googles these domains until they set browser.fixup.domainwhitelist.i2p, browser.fixup.domainwhitelist.onion and network.dns.blockDotOnion in about:config; see the user.js sketch after this list),
- shipping with telemetry enabled by default,
- not even asking if it is okay to pull data from up to 50 Firefox-internal domains (etc etc)
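for the onion/i2p part, a minimal user.js sketch, assuming those pref names are still current (the two whitelist prefs are booleans, and network.dns.blockDotOnion has to be flipped to false so .onion is not blocked at the dns level); drop it into PROFILE/user.js:
// user.js sketch -- assumption: pref names unchanged in current builds
user_pref("browser.fixup.domainwhitelist.i2p", true);    // stop turning .i2p into a search
user_pref("browser.fixup.domainwhitelist.onion", true);  // same for .onion
user_pref("network.dns.blockDotOnion", false);           // stop blocking .onion at the dns level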
session-storage is robust, but not too robust, and the resulting jsonlz4 might be nested (leaving about:sessionrestore open in another restored session), yielding monster jsonlz4 files (an 80 MByte jsonlz4 is ~300 MByte+ uncompressed) ..
since FckUfox sometimes refuses to eat these files (either through enabling/disabling "restore previous session" in the settings and putting the file in PROFILE/previous.jsonlz4, or through killing a running instance and placing the file in PROFILE/sessionstore-backups/recovery.[jsonlz4|baklz4]), the method here is to get all urls, deduplicated, out of that json.
the only status you might see is pv (how many uncompressed MBytes flow per input file)
- kill firefox
- find the PROFILE folder (e.g. on linux:
~/.mozilla/firefox/b3efc4fe.default
) - install: python, pip, pv, jq, awk
- get https://github.com/russellballestrini/nested-lookup/ somehow (e.g. pip install nested-lookup)
- copy the content of the PROFILE/sessionstore-backups folder into another one
- go to that folder (e.g.
cd sessionstore-backup-extraction
) - softlink (
ln -s /where/this/repo/is/*.py ./
) the python files
for dest in *jsonlz4* *baklz4 ;do python mozlz4.py -d < "$dest" |pv |python unnest-firefox-json.py |jq -c '.[]' ;done |awk '!x[$0]++' |grep -v '^""$' |grep -v '^$' > sessionsave-urls.txt
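in case you wonder what mozlz4.py does: a minimal sketch of such a decompressor, assuming the usual mozlz4 layout (8-byte magic "mozLz40\0", then a 4-byte little-endian uncompressed size, then one raw lz4 block; needs pip install lz4):
#!/usr/bin/env python
# minimal mozlz4 decompressor sketch (assumed equivalent of mozlz4.py -d)
# layout assumption: b"mozLz40\0" magic + 4-byte LE uncompressed size + lz4 block
# needs: pip install lz4
import sys
import lz4.block

data = sys.stdin.buffer.read()
if not data.startswith(b"mozLz40\0"):
    sys.exit("not a mozlz4 file")
size = int.from_bytes(data[8:12], "little")  # uncompressed size hint for lz4
sys.stdout.buffer.write(lz4.block.decompress(data[12:], uncompressed_size=size))
used exactly like above: python mozlz4.py -d < recovery.jsonlz4 > recovery.json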
-
all your urls are now in sessionsave-urls.txt (along with about: pages, data:image urls and some strange things like \000)
-
you might import it with URLs List https://addons.mozilla.org/en-US/firefox/addon/urls-list/ , Tab-List https://addons.mozilla.org/de/firefox/addon/tab-list/ or Session Buddy (chrome)
-
you might filter it with stuff like
-
cat sessionsave-urls.txt |grep -e file: -e ftp -e http
-
cat sessionsave-urls.txt |grep -v -e '^"data:image' -e '^"about:preferenc' -e '^"about:newtab' -e '^"about:settings' -e '^"about:addons' -e '^"about:logins' -e http://192.168 -e https://192.168
-
still too much? → use tagging
-
mkdir -p tags ;for tag in graylog ;do grep "$tag" sessionsave-urls.txt.filtered.txt > tags/$tag ;grep -v "$tag" sessionsave-urls.txt.filtered.txt > sessionsave-urls.txt.filtered.txt.tmp ;mv sessionsave-urls.txt.filtered.txt.tmp sessionsave-urls.txt.filtered.txt ;done
-
of course there is a forensic approach as well ...
- what exactly is that alien line doing?
## process all jsonlz4/baklz4 files → uncompress (mozlz4.py -d) → status (pv) → flatten json / get urls (unnest-firefox-json.py) → 1 entry per line (jq -c '.[]') → deduplicate (awk '!x[$0]++') → drop empty/"" lines (grep -v) → save result (> sessionsave-urls.txt)
for dest in *jsonlz4* *baklz4 ;do python mozlz4.py -d < "$dest" |pv |python unnest-firefox-json.py |jq -c '.[]' ;done |awk '!x[$0]++' |grep -v '^""$' |grep -v '^$' > sessionsave-urls.txt
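unnest-firefox-json.py is not shown here; a minimal sketch of what such a flattener could look like on top of nested-lookup (assumption: it simply collects every value stored under a "url" key, however deeply nested, and prints them as one json array for the jq stage):
#!/usr/bin/env python
# sketch of a flattener in the spirit of unnest-firefox-json.py (assumed behavior)
# needs: pip install nested-lookup
import json
import sys
from nested_lookup import nested_lookup

doc = json.loads(sys.stdin.read())
print(json.dumps(nested_lookup("url", doc)))  # every "url" value, any depth
the jq -c '.[]' stage then turns that array into one quoted url per line, which is what the awk deduplication expects.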
first variant/playground/testing space:
sessionCheckpoints.json
## extract the inputs
for dest in *jsonlz4 *baklz4 ;do python unlz4.py -d < "$dest" > "$dest".jsontmp ;done;
##then we build ONE large file containing SINGLE LINE ENTRIES
#for file in *.jsontmp;do cat $file |jq -c '.[]' ;done |awk '!x[$0]++' > deduped.json
## remove tmp files
#rm *.jsontmp
## beware of jq/python json max depth: the following lines do not recover all data
#for lines in $(seq 1 $(cat deduped.json|wc -l ));do
# head -n${lines} deduped.json |tail -n1|sed 's/<stripped: exceeds max depth>/"<stripped: exceeds max depth>"/g' > oneline$lines;
# done
#grep '"tabs":' oneline* -l |while read tabfile;do cat $tabfile |jq . > pretty.${tabfile}.json ;done