I ran the migrate_package_activity.py (ckan/ckan#3972) on a real existing CKAN database to get a sense of the time taken and the change in database size
Timing: Migration took 49s for 236 datasets. If you scale that linearly to 900k datasets (europeandataportal.eu) then you're talking 1 hour. There's likely to be more activities per dataset though, so it would be more like several hours. Seems reasonable to want to do the migration while the site is up.
Size: size of the activity table went up about 20%, because every revision we now store resources, extras and tags on each package. However the overall size came down when we deleted the content of the activity_detail table (which is no longer needed), because that had a quirk that when you add a tag, it stored the dictized package as well in that table - this seems like a mistake.
Test database properties:
- 4 years old (used by OpenGov developers)
- 236 public datasets (and 1263 private ones, without activity)
- 1499 activities - an average of 6 activities per dataset
- 16MB SQL dump
- Migrated from CKAN 2.7.3
Migration:
49s time taken 1499 activities, of which one dataset had 22 had missing revisions, from before it was deleted, (purged?) and recreated.
What was interesting was to compare the size of the data before and after migration.
Query: sudo -u postgres psql -c "SELECT pg_size_pretty( pg_total_relation_size('activity')) as activity, pg_size_pretty( pg_total_relation_size('activity_detail')) as activity_detail, pg_size_pretty( pg_database_size('ckan_default') ) as total;" ckan_default
Before:
activity | activity_detail | total
----------+-----------------+--------
15 MB | 124 MB | 186 MB
After (with a vacuum):
activity | activity_detail | total
----------+-----------------+-------
17 MB | 12 MB | 78 MB
activity_detail has 0 rows, but still occupies 12MB, even after the vacuum - a hangover, I guess.
This is what we got rid of from activity_detail:
select count(object_type), object_type from activity_detail group by object_type;
count | object_type
-------+--------------
21000 | PackageExtra
40302 | tag
38632 | Resource
5666 | Package
paster db clean -c $CKAN_INI sudo -u postgres pg_restore --clean --if-exists -d ckan_default < /vagrant/src/ckan/ckansandbox.dump sudo -u postgres psql ckan_default -c "delete from activity_detail ad where ad.object_id not in (select id from package);" # DELETE 105152 sudo -u postgres psql ckan_default -c "delete from activity a where a.object_id not in (select id from package);" # DELETE 9527 sudo -u postgres psql ckan_default -c "VACUUM;" sudo -u postgres psql -c "SELECT pg_size_pretty( pg_total_relation_size('activity')) as activity, pg_size_pretty( pg_total_relation_size('activity_detail')) as activity_detail, pg_size_pretty( pg_database_size('ckan_default') ) as total;" ckan_default python ckan/migration/migrate_package_activity.py -c $CKAN_INI --delete yes > ckansandbox.migrate2 sudo -u postgres psql ckan_default -c "VACUUM;" sudo -u postgres psql -c "SELECT pg_size_pretty( pg_total_relation_size('activity')) as activity, pg_size_pretty( pg_total_relation_size('activity_detail')) as activity_detail, pg_size_pretty( pg_database_size('ckan_default') ) as total;" ckan_default paster search-index rebuild -r --config=/etc/ckan/default/development.ini paster --plugin=ckan user add admin password=password apikey=adminkey email=[email protected] paster --plugin=ckan sysadmin add admin