@benchonaut
Last active August 30, 2024 14:33
firefox-restore-jsonlz4-dump

FckUfox ( Firefox ) session jsonlz4 recovery from sessionstore-backup

note: Firefox will be referred to as FckUfox throughout this doc

edit 2022: TRY TO AVOID FCKUFOX AS MUCH AS POSSIBLE ; FCKUFOX WILL waste

if anyone accountable for firefox ever reads this:

congrats, over the years a widely accepted open-source product became a complete nightmare,

  • forcing people to use AppImages of firefox rip-offs that are rarely updated,
  • it did not even whitelist i2p and onion domains (so everybody googles these domains until they set browser.fixup.domainwhitelist.i2p, browser.fixup.domainwhitelist.onion and network.dns.blockDotOnion in about:config)
  • telemetry is enabled by default
  • it does not even ask whether it is okay to pull data from up to 50 Firefox-internal domains (etc. etc.)

session storage is robust, but not overly so, and the resulting jsonlz4 might be nested (e.g. an about:sessionrestore tab left open inside another restored session), yielding monster jsonlz4 files (an 80 MByte jsonlz4 is ~300 MByte+ uncompressed) ..
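to see how much a given file blows up, a quick size check; this is just a sketch assuming the mozlz4.py script from the bottom of this doc is already in the current folder and recovery.jsonlz4 is one of your session files:

ls -lh recovery.jsonlz4                           # compressed size on disk
python mozlz4.py -d < recovery.jsonlz4 |wc -c     # uncompressed size in bytes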

since firefox sometimes refuses to eat these files (either by enabling/disabling "restore previous session" in settings and putting the file at PROFILE/previous.jsonlz4, or by killing a running instance and placing the file at PROFILE/sessionstore-backups/recovery.[jsonlz4|baklz4]), the method here is to get all urls, deduplicated, out of that json.
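if you still want to try the feed-it-back route described above first, a minimal sketch; PROFILE and my-session.jsonlz4 are placeholders (the profile name b3efc4fe.default is just an example) and firefox must not be running:

PROFILE=~/.mozilla/firefox/b3efc4fe.default
cp my-session.jsonlz4 "$PROFILE/previous.jsonlz4"                        # variant 1: the "restore previous session" setting route
cp my-session.jsonlz4 "$PROFILE/sessionstore-backups/recovery.jsonlz4"   # variant 2: pose as the crash-recovery file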

the only status you will see is pv (how many uncompressed MBytes flow through per input file)

#### ATTENTION: machine load ahead, keep 1 GByte+ RAM free for jsonlz4 sizes over 50 MByte
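a rough pre-flight check (plain coreutils/procps, nothing specific to this gist):

du -ch *jsonlz4* *baklz4 2>/dev/null    # total compressed size of the session files
free -h                                 # how much RAM is actually free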

HowTo

  • kill firefox
  • find the PROFILE folder (e.g. on linux: ~/.mozilla/firefox/b3efc4fe.default )
  • install: pip python pv jq awk
  • get https://github.com/russellballestrini/nested-lookup/ somehow
  • copy the PROFILE/sessionstore-backups folder content into another folder
  • go to that folder (e.g. cd sessionstore-backup-extraction)
  • softlink (ln -s /where/this/repo/is/*.py ./) the python files, then run:
for dest in *jsonlz4* *baklz4 ;do python mozlz4.py -d < $dest |pv |python unnest-firefox-json.py |jq -c '.[]'  ;done |awk '!x[$0]++'  |grep -v '^""$' |grep -v ^$  > sessionsave-urls.txt
  • all your urls are now in sessionsave-urls.txt (also about: and data:image entries and some strange things like \000)

  • you might import it with URLs List https://addons.mozilla.org/en-US/firefox/addon/urls-list/ , Tab-List https://addons.mozilla.org/de/firefox/addon/tab-list/ or Session Buddy (chrome)

  • you might filter it with stuff like

    • cat sessionsave-urls.txt |grep -e file: -e ftp -e http
    • cat sessionsave-urls.txt|grep -v -e '^"data:image' -e '^"about:preferenc' -e '^"about:newtab' -e '^"about:settings' -e '^"about:addons' -e '^"about:logins' -e http://192.168 -e https://192.168
  • still too much ? → use tagging

    • mkdir tags ;for tag in graylog ;do grep $tag sessionsave-urls.txt.filtered.txt > tags/$tag;grep -v $tag sessionsave-urls.txt.filtered.txt >> sessionsave-urls.txt.filtered.txt.tmp;cat sessionsave-urls.txt.filtered.txt.tmp>sessionsave-urls.txt.filtered.txt;rm sessionsave-urls.txt.filtered.txt.tmp;done
  • of course there is a forensic approach as well ...


  • what exactly is that alien line doing ?
    • for dest in *jsonlz4* *baklz4 ... ;done : process all jsonlz4/baklz4 files in the folder
    • python mozlz4.py -d : uncompress the mozLz4 container
    • pv : status (how many uncompressed MBytes flow through)
    • python unnest-firefox-json.py : flatten the json and pull out every url
    • jq -c '.[]' : one entry per line
    • awk '!x[$0]++' : deduplicate
    • grep -v '^""$' |grep -v ^$ : drop empty lines
    • > sessionsave-urls.txt : save the result
for dest in *jsonlz4* *baklz4 ;do python mozlz4.py -d < $dest |pv |python unnest-firefox-json.py |jq -c '.[]'  ;done |awk '!x[$0]++'  |grep -v '^""$' |grep -v ^$  > sessionsave-urls.txt
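a quick sanity check on the result (plain coreutils, nothing fancy); remember the entries are still json strings, so they start with a double quote:

wc -l sessionsave-urls.txt                                         # total deduplicated entries
grep -c -e '^"http' -e '^"file:' -e '^"ftp' sessionsave-urls.txt   # how many look like real links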

END


first variant/playground/testing space:

sessionCheckpoints.json

## extract the inputs
for dest in *jsonlz4 *baklz4 ;do python unlz4.py -d < $dest >$dest.jsontmp ;done;



##then we build ONE large file containing SINGLE LINE ENTRIES
#for file in *.jsontmp;do cat $file |jq -c '.[]' ;done |awk '!x[$0]++'  > deduped.json
## remove tmp files
#rm *.jsontmp

## beware of jq/python json max depth , the following lines do not recover all data
#for lines in $(seq 1 $(cat deduped.json|wc -l ));do 
#	head -n${lines} deduped.json |tail -n1|sed 's/<stripped: exceeds max depth>/"<stripped: exceeds max depth>"/g' > oneline$lines;
#   done

#grep '"tabs":' oneline* -l |while read tabfile;do  cat $tabfile |jq .  > pretty.${tabfile}.json ;done

mozlz4.py (the mozLz4 compressor/decompressor referenced in the pipeline above):

#!/usr/bin/env python
# compress/decompress Mozilla-flavor LZ4 ("mozLz4") files such as jsonlz4/baklz4
from sys import stdin, stdout, argv, stderr
import os

try:
    import lz4.block as lz4
except ImportError:
    import lz4

# reopen stdin/stdout in binary mode, the payload is not text
stdin = os.fdopen(stdin.fileno(), 'rb')
stdout = os.fdopen(stdout.fileno(), 'wb')

if argv[1:] == ['-c']:
    # compress: write the mozLz40 magic header, then the lz4 block
    stdout.write(b'mozLz40\0' + lz4.compress(stdin.read()))
elif argv[1:] == ['-d']:
    # decompress: verify the magic header, then inflate the rest
    assert stdin.read(8) == b'mozLz40\0'
    stdout.write(lz4.decompress(stdin.read()))
else:
    stderr.write('Usage: %s -c|-d < infile > outfile\n' % argv[0])
    stderr.write('Compress or decompress Mozilla-flavor LZ4 files.\n\n')
    stderr.write('Examples:\n')
    stderr.write('\t%s -d < infile.json.mozlz4 > outfile.json\n' % argv[0])
    stderr.write('\t%s -c < infile.json > outfile.json.mozlz4\n' % argv[0])
    exit(1)
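standalone usage of the script above, a sketch assuming it is saved as mozlz4.py and the python lz4 module is installed (pip install lz4):

python mozlz4.py -d < recovery.jsonlz4 > recovery.json    # unpack one session file
python mozlz4.py -c < recovery.json > recovery.jsonlz4    # pack it up again if needed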
unnest-firefox-json.py (flattens the session json and collects the urls, also referenced in the pipeline above):

from sys import stdin
import os
import json
from nested_lookup import nested_lookup
## sourced here: https://github.com/russellballestrini/nested-lookup/
#import pprint

# read the decompressed session json from stdin (binary mode, the file may be huge)
stdin = os.fdopen(stdin.fileno(), 'rb')
f = stdin.read()
#with open('json_sample.txt', 'r') as f:
data = json.loads(f)

# collect every value whose key matches 'url'; wild=True also matches keys that merely contain 'url'
results = nested_lookup(
    key='url',
    document=data,
    wild=True,
)

# deduplicate while preserving order and print everything as one json array
results = list(dict.fromkeys(results))
print(json.dumps(results))
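a quick way to run just this script on one already-decompressed file, assuming it is saved as unnest-firefox-json.py and nested-lookup is installed (pip install nested-lookup):

python unnest-firefox-json.py < recovery.json |jq -c '.[]' |head    # preview the first recovered urls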