Created
January 17, 2024 18:21
Get an anonymized database
#!/usr/bin/env bash
# Run as: ./anonymize_database.sh my-database-production (database name)
set -e

# check if pganonymize is installed
if ! command -v pganonymize &> /dev/null
then
    echo "pganonymize could not be found"
    exit 1
fi

# check if the pganonymize.yml file exists
if [ ! -f pganonymize.yml ]; then
    echo "pganonymize.yml does not exist"
    exit 1
fi

# Get the database name from the first command line argument
DB_NAME=$1
if [ -z "$DB_NAME" ]; then
    echo "No database name provided"
    exit 1
fi

# dump the original database
pg_dump -Fc "$DB_NAME" > "/tmp/$DB_NAME.dump"

# create a new database for the anonymized data
createdb "$DB_NAME-anonimized"

# restore the original dump into the new database
pg_restore -d "$DB_NAME-anonimized" "/tmp/$DB_NAME.dump"

# delete the dump file
rm "/tmp/$DB_NAME.dump"

# anonymize the restored copy
pganonymize --schema=pganonymize.yml --dbname="$DB_NAME-anonimized"

# dump the anonymized database
pg_dump -Fc "$DB_NAME-anonimized" > "/tmp/$DB_NAME-anonimized.dump"

# delete the anonymized database
dropdb "$DB_NAME-anonimized"

# /tmp/$DB_NAME-anonimized.dump must be deleted after the file has been downloaded
tables:
  - user_user:
      primary_key: id
      chunk_size: 5000
      fields:
        - first_name:
            provider:
              name: fake.first_name
        - last_name:
            provider:
              name: set
              value: "Bar"
        - email:
            provider:
              name: md5
            append: "@localhost.com"
truncate:
  - django_session
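To make the `email` rule above concrete, here is a rough Python sketch of what the `md5` provider combined with `append` is expected to produce. It assumes the provider hashes the UTF-8 bytes of the original value and returns the hex digest; `anonymize_email` is a hypothetical helper for illustration, not part of pganonymize.

```python
import hashlib


def anonymize_email(original, suffix="@localhost.com"):
    """Hypothetical helper mirroring the `md5` provider plus `append` rule."""
    # Assumption: the provider hashes the UTF-8 bytes of the original value
    digest = hashlib.md5(original.encode("utf-8")).hexdigest()
    # `append` adds a fixed suffix so the result still looks like an address
    return digest + suffix
```

Because the hash is deterministic, the same original address always maps to the same anonymized one, so a uniqueness constraint on the column should keep holding as long as the original values were unique.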
import os

from django.http import HttpResponse


def plug_cleaning_into_stream(stream, filename):
    """Wrap stream.close() so the file is deleted once the stream is closed."""
    closer = stream.close

    def new_closer():
        closer()
        # remove the file once the stream has been closed
        os.remove(filename)

    stream.close = new_closer


# path_to_file is /tmp/$DB_NAME-anonimized.dump
def send_file(request, path_to_file):
    # Call anonymize_database.sh from Python first, passing DB_NAME from the settings as the argument
    with open(path_to_file, 'rb') as ready_file:
        plug_cleaning_into_stream(ready_file, path_to_file)
        # the whole dump is read into memory; leaving the with block closes and deletes the file
        response = HttpResponse(ready_file.read(), content_type='application/force-download')
    return response
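The comment in `send_file` mentions calling `anonymize_database.sh` from Python but leaves that step out. One possible sketch, using `subprocess.run`, is shown below. The function name `run_anonymizer`, the default script path, and the injectable `runner` parameter are assumptions made for illustration; the returned path simply follows the naming the shell script uses (including its `anonimized` spelling).

```python
import subprocess


def run_anonymizer(db_name, script="./anonymize_database.sh", runner=subprocess.run):
    """Run the shell script and return the path of the dump it leaves behind."""
    # check=True raises CalledProcessError if the script exits non-zero
    runner([script, db_name], check=True)
    # The script writes its result here (path taken from the script itself)
    return f"/tmp/{db_name}-anonimized.dump"
```

In a view, this could be called as `send_file(request, run_anonymizer(db_name))`, with `db_name` read from the project settings. Passing the arguments as a list avoids shell interpolation of the database name.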