WHERE IS THE DATA?
SSH into {FIXME} while connected to the ImageRive VPN (must be from a Windows machine).
All data is in /merantix_core/data/hospitals/imagerive/export
Anonymized reports: reports/
anonymized_dicoms/
export/cases_new.json
export/patients_new.json
VPN: normal Windows VPN connection.
IP: 212.243.133.154
Protocol: L2TP with IPsec and optional encryption (may also be called "L2TP with key")
L2TP key: DE70CABBDE7AC31F
VPN login: VPN.TEMP
Password: ID2018/
SSH: 192.168.5.126, login merantix / merantix (from Windows)
Unfixed:
(re) De’cembre, (re) fe’vrier
Newlines in the JSON (they make it less readable)
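The two unfixed (re) cases are month names where the accent was typed as an apostrophe after the vowel (De’cembre for décembre, fe’vrier for février). A minimal sketch of a date regex that tolerates that variant, assuming dates are written as day, month name, year. `DATE_RE`, `redact_french_dates` and the `[DATE]` placeholder are hypothetical and not taken from Report.process_text.

```python
import re

# Hypothetical pattern, not the one in Report.process_text.
# An accented vowel may appear as-is, as the bare vowel, or as the bare
# vowel followed by an apostrophe ("De’cembre", "fe’vrier").
E_ACUTE = r"(?:é|e[’']?)"
U_CIRC = r"(?:û|u[’']?)"

MONTHS = [
    "janvier", f"f{E_ACUTE}vrier", "mars", "avril", "mai", "juin",
    "juillet", f"ao{U_CIRC}t", "septembre", "octobre", "novembre",
    f"d{E_ACUTE}cembre",
]

# Optional day number, month name, optional four-digit year.
DATE_RE = re.compile(
    r"\b(?:\d{1,2}\s+)?(?:" + "|".join(MONTHS) + r")(?:\s+\d{4})?\b",
    re.IGNORECASE,
)


def redact_french_dates(text: str) -> str:
    """Replace French written-out dates with a [DATE] placeholder."""
    return DATE_RE.sub("[DATE]", text)
```

The word boundaries keep short month names from firing inside other words, so "mais" is left alone while "mai 2018" is redacted.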
Unanswered:
What is hospitals/die-radiologie in the codebase?
How do you get the left Control key working on Jonas Probst's Windows machine?
Imagerive Only Notes:
For Mess-Inheriting Developers
First of all, do not expect my code to work on the first run. I am sorry. You will need to fix bugs and hand-check the results for at least 45 minutes. This took me about 50 hours; in hindsight it doesn't seem that hard, but lots of things can go wrong at every turn.
End goal:
A directory called anonymized_dicoms/ filled with DICOMs that have been anonymized by the code in anonymization.py (this was fairly easy).
A directory called anonymized_reports/ containing anonymized .txt reports (in my case these were communications between doctors). The reports are anonymized by adding regexes to Report.process_text.
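Report.process_text itself is not reproduced in these notes. The sketch below only illustrates the "add regexes" approach as a standalone function: an ordered list of (pattern, replacement) rules for titles plus names, numeric dates, ages and the hospital name. `RULES`, `anonymize_report` and the placeholders are hypothetical; the patterns are examples, not the ones in the codebase.

```python
import re

# Hypothetical rules illustrating the "add regexes" approach; not the
# patterns actually used in Report.process_text. Applied in order.
RULES = [
    # A title (any case) followed by one or more capitalized words:
    # "Madame Dupont" -> "Madame [NAME]".
    (
        re.compile(r"\b((?i:madame|monsieur|docteur|dr\.?))\s+"
                   r"[A-ZÀ-Ý][\w-]*(?:\s+[A-ZÀ-Ý][\w-]*)*"),
        r"\1 [NAME]",
    ),
    # Numeric day/month with optional year: 04/04, 21.12, 21/12/2017.
    # Requiring a two-digit month keeps decimals like 12.5 out, but not 12.05.
    (
        re.compile(r"\b(?:0[1-9]|[12]\d|3[01])[./](?:0[1-9]|1[0-2])"
                   r"(?:[./](?:\d{4}|\d{2}))?\b"),
        "[DATE]",
    ),
    # Ages written as "45 ans".
    (re.compile(r"\b\d{1,3}\s?ans\b"), "[AGE]"),
    # References to the hospital.
    (re.compile(r"\bimage\s?rive\b", re.IGNORECASE), "[HOSPITAL]"),
]


def anonymize_report(text: str) -> str:
    """Apply every rule in order and return the anonymized text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

These rules miss plenty (a lowercase name after "madame", street names, measurements like 12.05 get over-redacted), which is why the reports still have to be checked by hand.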
Steps and Hacks:
Run dicom_receiver.py (it just sits there waiting for DICOMs) and make sure the port is correct.
This only worked outside of Docker for me.
Somebody will send a batch of DICOMs to it; it stores them on the file system and creates a file called dicoms.json for you.
If it runs out of space or breaks, it will tell you.
Run it with tee in tmux and save the logs.
Change things in directories.py (if you want).
I made it so that there can be duplicate study_ids (following how Flo stored stuff)
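Allowing duplicate study_ids means that any index keyed by study ID has to map each ID to a list of entries instead of a single one. A minimal sketch, assuming each received DICOM is described by a dict with hypothetical `study_id` and `path` keys (the real dicoms.json layout may differ); `build_study_index`, `duplicate_study_ids` and `index_to_json` are hypothetical helpers.

```python
import json
from collections import defaultdict


def build_study_index(records):
    """Map each study_id to the list of all paths stored under it.

    `records` is an iterable of dicts with hypothetical "study_id" and
    "path" keys; duplicates are kept instead of overwriting each other.
    """
    index = defaultdict(list)
    for record in records:
        index[record["study_id"]].append(record["path"])
    return dict(index)


def duplicate_study_ids(index):
    """Study IDs that appear more than once, sorted."""
    return sorted(sid for sid, paths in index.items() if len(paths) > 1)


def index_to_json(index):
    """Serialize the index with a stable key order for easy diffing."""
    return json.dumps(index, indent=2, sort_keys=True)
```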
Try running python projects/edison/hospitals/imagerive/export_process.py in the Docker image:
Fix errors
There will be some MatchingExceptions. I don't know how to fix these, but I am satisfied with 818 matched reports. Feel free to look into them.
There will be some empty-report warnings; I don't know how to fix those either.
There will also be some warnings about duplicate study IDs.
Make sure patients.json is >= 4.6 MB and cases.json is > 3 MB, and look at their contents.
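The size check can be scripted. A sketch, assuming "MB" means 10**6 bytes and using the file names as written in this step (the export may instead write patients_new.json / cases_new.json, as listed at the top). `MIN_SIZES`, `size_problems` and `check_export_sizes` are hypothetical helpers.

```python
import os

# Hypothetical thresholds from the notes, assuming MB = 10**6 bytes.
# Each entry: file name -> (threshold in bytes, whether equality passes).
MIN_SIZES = {
    "patients.json": (4_600_000, True),   # ">= 4.6 MB"
    "cases.json": (3_000_000, False),     # "> 3 MB"
}


def size_problems(sizes, min_sizes=MIN_SIZES):
    """Given name -> size in bytes (None if missing), list what is wrong."""
    problems = []
    for name, (threshold, inclusive) in min_sizes.items():
        size = sizes.get(name)
        if size is None:
            problems.append(f"{name}: missing")
            continue
        ok = size >= threshold if inclusive else size > threshold
        if not ok:
            problems.append(f"{name}: {size / 1e6:.2f} MB is too small")
    return problems


def check_export_sizes(export_dir, min_sizes=MIN_SIZES):
    """Stat the JSON files in export_dir and check them against min_sizes."""
    sizes = {}
    for name in min_sizes:
        path = os.path.join(export_dir, name)
        sizes[name] = os.path.getsize(path) if os.path.exists(path) else None
    return size_problems(sizes, min_sizes)
```

An empty list only means the sizes look plausible; opening the files and reading a few entries is still part of the step.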
Run cat /merantix_core/data/hospitals/telepaxx/export/reports/* > all_txt.txt and then open all_txt.txt in vim.
Names, dates, places, and ages need to be anonymized, as well as references to Imagerive
Search for strings like: madame, Madame, ['04/04', '21.12', '21/12', 'HS15', '06', '05', '01', '08','RUE DES MOULINS', 'Gen']
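A scripted version of the same manual search, as a sketch: it counts case-insensitive occurrences (like grep -i) of each search string in every .txt file of a reports directory. The term list is copied from the step above; `SEARCH_TERMS`, `scan_text` and `scan_reports` are hypothetical helpers. Short tokens such as "06" will match many harmless substrings, so treat hits as leads to inspect, not as confirmed leaks.

```python
from pathlib import Path

# Search strings from the notes ("Madame" is covered by the
# case-insensitive match on "madame").
SEARCH_TERMS = ["madame", "04/04", "21.12", "21/12", "HS15", "06", "05",
                "01", "08", "RUE DES MOULINS", "Gen"]


def scan_text(text, terms=SEARCH_TERMS):
    """Count case-insensitive occurrences of each term; omit zero counts."""
    lowered = text.lower()
    counts = {}
    for term in terms:
        n = lowered.count(term.lower())
        if n:
            counts[term] = n
    return counts


def scan_reports(report_dir, terms=SEARCH_TERMS):
    """Return {file name: counts} for every .txt report with at least one hit."""
    hits = {}
    for path in sorted(Path(report_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        counts = scan_text(text, terms)
        if counts:
            hits[path.name] = counts
    return hits
```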
Hack: IR Only: Get_export_case_sort_key # HACK, visitation-pattern maybe misordered, but relative dates should be fine
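One way to see why relative dates can survive a misordered visitation pattern: if each case date is stored as days since that patient's first case, the offsets do not depend on the order in which cases are listed. A sketch under that assumption, with hypothetical `patient_id` and `date` fields; `case_sort_key` and `add_relative_days` are illustrations, not the actual Get_export_case_sort_key.

```python
from datetime import date


def case_sort_key(case):
    # Hypothetical key: order by patient, then by date. sorted() is stable,
    # so cases on the same day keep their input order, which may not be
    # the true visit order.
    return (case["patient_id"], case["date"])


def days_since(first: date, current: date) -> int:
    """Whole days between a patient's first case and this case."""
    return (current - first).days


def add_relative_days(cases):
    """Return sorted copies of cases, each with 'days_since_first'
    relative to that patient's earliest case date."""
    first_seen = {}
    for case in cases:
        pid = case["patient_id"]
        if pid not in first_seen or case["date"] < first_seen[pid]:
            first_seen[pid] = case["date"]
    result = []
    for case in sorted(cases, key=case_sort_key):
        copy = dict(case)
        copy["days_since_first"] = days_since(first_seen[case["patient_id"]], case["date"])
        result.append(copy)
    return result
```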
Tips:
If you don't have sudo access, give up right now.
Check how much space is on the box (df -h). We ran out of space on the 500 GB we had: 4000 DICOMs at ~50 MB each is ~200 GB, and because you need to copy each DICOM, you need 2x+ as much space as the DICOMs themselves take up.
France/Switzerland only: the \xe characters are French accents (for example, \xe9 is é in Latin-1). They are very annoying. I still don't know how to type them.
Install zsh and tmux on the box: your connection will die a lot, and you will need to be comfortable in the shell. You may b
Don't modify anything in place!
grep -Ril "madame" reports/* should reveal
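The space tip is plain arithmetic: number of DICOMs × average size × a copy factor of at least 2, since nothing is modified in place. A sketch using the standard library's shutil.disk_usage; `required_bytes` and `has_enough_space` are hypothetical helpers, and the 50 MB average is the rough figure from the notes.

```python
import shutil

# Hypothetical helpers; sizes in bytes, assuming MB = 10**6 bytes.


def required_bytes(n_dicoms, avg_bytes, copy_factor=2.0):
    """Space for the raw DICOMs times the copy factor (>= 2, because
    each DICOM is copied rather than modified in place)."""
    return int(n_dicoms * avg_bytes * copy_factor)


def has_enough_space(path, n_dicoms, avg_bytes, copy_factor=2.0):
    """Compare the estimate with the free space on the filesystem holding path.

    Returns (enough, needed_bytes, free_bytes).
    """
    free = shutil.disk_usage(path).free
    need = required_bytes(n_dicoms, avg_bytes, copy_factor)
    return free >= need, need, free
```

With the figures above, required_bytes(4000, 50_000_000) is 400_000_000_000 bytes (400 GB) before counting anything else stored on the box.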