- How to Rebuild a Pre-Existing AEM+Mongo Cluster
- or migrate from Tar to MongoDB or MongoDB to Tar
- or migrate from 5.6.1 or 6.x to 6.x via data migration instead of jar upgrade
- (Mongo to Mongo only) Remove one replica from the replica set and delete/recreate the db
- Remove the replica node from the set: http://docs.mongodb.org/master/tutorial/remove-replica-set-member/
- Validate that no other node in the set still considers that node a member: log into each remaining node via the mongo shell and run rs.status() to confirm that the removed node is no longer listed.
- Drop the aem database on that node http://docs.mongodb.org/manual/reference/command/dropDatabase/
- Re-add the database with the correct user permissions (do not add the mongo instance back to the replica set)
- Remove one of the AEM cluster nodes from the load balancer or dispatcher config (whichever config is managing the active cluster nodes)
- Stop the AEM instance on the inactive node
- On the same server, install a fresh AEM6.x instance pointed at the newly emptied mongo database and install hotfixes (for AEM6.0, install Oak 1.0.27 or later; for AEM6.1, install Oak 1.2.11 or later)
- Skip step 2 below
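The validation step above (checking rs.status() on every remaining node) can be sketched as a small helper. This is a minimal sketch, assuming you have already collected the rs.status() output from each node as a dictionary; the host names and the replica set name below are hypothetical, and the only field relied on is the `members` array with a `name` entry per member.

```python
def hosts_still_listing_member(status_by_host, removed_member):
    """Return the hosts whose rs.status() output still lists removed_member."""
    offenders = []
    for host, status in status_by_host.items():
        # Each entry in "members" carries the member address in its "name" field.
        names = [member.get("name") for member in status.get("members", [])]
        if removed_member in names:
            offenders.append(host)
    return sorted(offenders)


# Hypothetical rs.status() output collected from the two remaining nodes.
statuses = {
    "mongo1:27017": {
        "set": "aem-rs",
        "members": [{"name": "mongo1:27017"}, {"name": "mongo2:27017"}],
    },
    "mongo2:27017": {
        "set": "aem-rs",
        "members": [
            {"name": "mongo1:27017"},
            {"name": "mongo2:27017"},
            {"name": "mongo3:27017"},
        ],
    },
}

stale = hosts_still_listing_member(statuses, "mongo3:27017")
print(stale)  # hosts that still need attention before the database is dropped
```

An empty list means no remaining node lists the removed member, so it should be safe to proceed to dropping the database on it.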
- Install a fresh AEM6.x instance and all the recent recommended hotfixes
- 6.0 - https://helpx.adobe.com/experience-manager/kb/aem6-available-hotfixes.html
- 6.1 - https://helpx.adobe.com/experience-manager/kb/aem61-available-hotfixes.html
- Optimize the instance using the configurations here - https://helpx.adobe.com/experience-manager/kb/performance-tuning-tips.html
- Prep the systems for data migration
- Prep source system if migrating from CQ5.x version to AEM6.x:
- Run consistency check and fix: https://helpx.adobe.com/experience-manager/kb/RepositoryInconsistency.html
- Remove Same Name Sibling nodes: https://helpx.adobe.com/experience-manager/kb/find-sns-nodes.html
- Prep the destination 6.x system:
- Apply any customizations to the AEM start script and restart AEM
- Disable the workflow launchers for DAM on the Launchers tab of the workflow console: http://host:port/libs/cq/workflow/content/console.html
- Disable all launchers that use the "DAM Update Asset" and "DAM Metadata Writeback" workflow models.
- Disable OSGi components to reduce overhead.
- Install ACS Commons (on the newly installed environment): http://adobe-consulting-services.github.io/acs-aem-commons/features/osgi-disabler.html
- Disable these OSGi components using the osgi-disabler:
com.day.cq.dam.core.impl.event.DamEventAuditListener
com.day.cq.replication.audit.ReplicationEventListener
com.day.cq.wcm.core.impl.event.PageEventAuditListener
com.day.cq.tagging.impl.TagValidatingEventListener
com.day.cq.dam.core.impl.DamChangeEventListener
- (6.x to 6.x migration only) On the old/source instance, package the out-of-the-box indexes that were disabled or modified and install that package on the destination. Do not install custom indexes yet, as it is faster to install them after the migration. The objective here is to make sure that any index disabled on the source is also disabled on the destination.
- Go to CRXDE on the destination instance, browse to /oak:index/lucene and add a String[] property excludedPaths
excludedPaths=[/var, /jcr:system, /etc/workflow/instances]
- Perform repository maintenance
- Mongo (only) Run the "Revision Clean Up" task.
By default, revision GC only cleans up revisions that are older than 24 hours. To make it clean up more recent revisions, do the following:
- Stop all AEM cluster nodes
- On each node, modify the DocumentNodeStore config file under crx-quickstart/install and set the versionGcMaxAgeInSecs property to 3600 which is 1 hour
- Start the leader AEM node
- Go to this URL path: /libs/granite/operations/content/maintenance/window.html/mnt/overlay/granite/operations/config/maintenance/granite_daily
- Start Revision GC
- Monitor the logs until it is done
- See here for relevant log messages https://gist.github.com/andrewmkhoury/39b69daf5a097b53937e
- Start the other AEM cluster nodes
- Tar (only) - Backup the segmentstore and run offline compaction
- Mongo (only) Monitor the logs to validate that Revision GC completed successfully:
grep VersionGarbage error*
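The two mechanical parts of the maintenance steps above, converting the age threshold to seconds and filtering the logs for VersionGarbage entries, can be sketched as follows. This is a minimal sketch: the sample log lines are made-up placeholders rather than real Oak log output, and the helper names are hypothetical.

```python
def hours_to_seconds(hours):
    """Convert an age threshold in hours to the seconds value the property expects."""
    return int(hours * 3600)


def version_gc_lines(log_lines):
    """Keep only lines mentioning VersionGarbage, like `grep VersionGarbage`."""
    return [line.rstrip("\n") for line in log_lines if "VersionGarbage" in line]


# Placeholder log lines for illustration only (not real Oak messages).
sample_log = [
    "16.09.2015 10:00:00 *INFO* VersionGarbageCollector (placeholder: run started)\n",
    "16.09.2015 10:00:01 *INFO* SomeOtherComponent (placeholder: unrelated line)\n",
    "16.09.2015 10:05:00 *INFO* VersionGarbageCollector (placeholder: run finished)\n",
]

print(hours_to_seconds(24))  # the default age threshold expressed in seconds
print(hours_to_seconds(1))   # the reduced threshold used in the steps above
for line in version_gc_lines(sample_log):
    print(line)
```

To scan a real log, pass an open file object instead of `sample_log`; iterating over it yields one line at a time.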
- Migrate the data from the old AEM cluster node to the new one
- If there is not a lot of content in the system (10GB worth or less), use packages. Make sure to set the AC Handling mode on the package to Merge so that migrating the ACLs to the new system does not remove the out-of-the-box ACLs.
- Or use the VLT service with the instructions below (if you have greater than 5GB worth of content)
- If the destination instance is AEM6.1 or later, do the following extra preparation:
- Make sure ACS Commons is installed (it should have been installed)
- Create a new ACL packager page using the ACS Commons ACL Packager: https://adobe-consulting-services.github.io/acs-aem-commons/features/acl-packager.html
- Have it package all ACLs under /content, /etc, and /apps
- Go to the package manager and edit the package definition. On the Advanced tab, set AC Handling mode to Merge.
- Install the latest vlt-rcp bundle on the destination server: http://repo1.maven.org/maven2/org/apache/jackrabbit/vault/org.apache.jackrabbit.vault.rcp/3.1.24/
- Instructions for vlt-rcp service here: https://github.com/apache/jackrabbit-filevault/blob/trunk/vault-doc/src/site/markdown/rcp.md
https://github.com/apache/jackrabbit-filevault/tree/trunk/vault-rcp
Note: If you are using CQ5.6.1, install 5.6.1 Service Pack 2 first, as it includes all the dependency bundles required to support vlt-rcp. Download SP2 from here: https://helpx.adobe.com/experience-manager/kb/cq561-available-hotfixes.html
- Some automated scripts to start with: https://github.com/apache/jackrabbit-filevault/tree/trunk/vault-rcp/src/test/resources
- Alternatively, use VLT RCP UI here: http://adobe-consulting-services.github.io/acs-aem-tools/vlt-rcp.html
- Use vlt-rcp to copy everything from /etc/tags
- Use vlt-rcp to copy everything from /content/dam
- Use vlt-rcp to copy each site under /content individually as well
- If possible run concurrent vlt copy tasks to make it faster
- Use vlt-rcp to copy any large content under other paths like /etc or /var that is required
- Install the ACL package created in the first step
- In CRXDE, go to the paths where the ACL package installed the ACLs. Use the "Access Control" tab to re-order the ACLs so that any that override your custom ones are at the top of the list.
- Or leverage the Oak migration tool, which might be faster and, unlike VLT, supports version history migration: http://dev.day.com/content/ddc/en/gems/deep-dive-into-aem-upgrade-process.html
Note: Alternatively, it might be possible to use grabbit (https://github.com/TWCable/grabbit) to perform the data migration instead.
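For the vlt-rcp option above, the per-path copy tasks (tags first, then DAM, then each site) can be prepared as JSON payloads before they are submitted to the service. This is a minimal sketch, not the definitive request format: the field names (`cmd`, `id`, `src`, `dst`, `recursive`, `update`, `onlyNewer`, `batchsize`, `throttle`) and the `/crx/-/jcr:root` source URL form follow my reading of the vlt-rcp instructions linked above and should be verified there. The source host, credentials, site paths, and tuning values are hypothetical, and the sketch only builds the payloads without sending anything.

```python
import json


def rcp_task(source_base, path):
    """Build one vlt-rcp 'create' payload that copies `path` to the same path."""
    return {
        "cmd": "create",
        "id": path.strip("/").replace("/", "-"),   # e.g. "etc-tags"
        "src": f"{source_base}/crx/-/jcr:root{path}",
        "dst": path,
        "recursive": True,
        "update": True,
        "onlyNewer": False,
        "batchsize": 2048,   # assumed tuning value, adjust for your system
        "throttle": 1,       # assumed tuning value, adjust for your system
    }


def build_tasks(source_base, site_paths):
    """Order the tasks as described above: tags, then DAM, then each site."""
    paths = ["/etc/tags", "/content/dam"] + list(site_paths)
    return [rcp_task(source_base, path) for path in paths]


# Hypothetical source instance and site paths.
tasks = build_tasks(
    "http://admin:admin@old-aem-host:4502",
    ["/content/site-a", "/content/site-b"],
)
for task in tasks:
    print(json.dumps(task))
```

Keeping one task per site makes it easier to run several copies concurrently and to retry a single site if its copy fails.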
- Migrate users and groups (If users were not imported automatically via LDAP)
- Follow steps here: https://helpx.adobe.com/experience-manager/kb/migrate-users-groups-ACLs.html
- Install the application and other required files
- Install the application including all configurations
- Package the replication agents from production, install the package on the new instance and disable the agents
- Package any extra items like /etc/designs and other items
- Validate any backend and third party integrations
- Install the custom lucene and property indexes, and make sure all indexes have been applied and have finished indexing
- Do testing (functional and load testing) to validate that the application works perfectly
- Create packages of the content (tags, assets, and pages) that changed since the copy was done
- Install either one of these tools for packaging the changed content:
- Create package using a search for tags that changed since the copy was done. Use this query for tags (but change the date):
//element(*, cq:Tag)[@jcr:created > xs:dateTime('2015-09-16T00:00:00.000-05:00')]
- Create package using a search for assets that changed since the copy was done. Use this query for assets (but change the date):
//element(*, dam:AssetContent)[@jcr:lastModified > xs:dateTime('2015-09-16T00:00:00.000-05:00')]
- Create package using a search for pages that changed since the copy was done. Use this query for pages (but change the date):
//element(*,cq:PageContent)[@cq:lastModified >= xs:dateTime('2015-09-16T00:00:00.000-05:00')]
- Install all the packages to the AEM instance (install the tags first, then the assets then pages)
- Account for any other data that might have changed (for example, new users added to the system or other content)
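Since the three queries above differ only in the cutoff date, they can be generated from a single timestamp to avoid copy-and-paste mistakes. This is a minimal sketch: the function name is hypothetical, the query text is copied from the queries listed above, and the list is returned in the install order given above (tags, then assets, then pages).

```python
from datetime import datetime, timedelta, timezone


def changed_content_queries(cutoff):
    """Return (label, query) pairs in install order for content changed since cutoff."""
    if cutoff.tzinfo is None:
        raise ValueError("cutoff must be timezone-aware so the offset is explicit")
    # Produces e.g. 2015-09-16T00:00:00.000-05:00
    stamp = cutoff.isoformat(timespec="milliseconds")
    return [
        ("tags", f"//element(*, cq:Tag)[@jcr:created > xs:dateTime('{stamp}')]"),
        ("assets", f"//element(*, dam:AssetContent)[@jcr:lastModified > xs:dateTime('{stamp}')]"),
        ("pages", f"//element(*,cq:PageContent)[@cq:lastModified >= xs:dateTime('{stamp}')]"),
    ]


# Same example date as in the queries above: midnight on 2015-09-16 at UTC-5.
cutoff = datetime(2015, 9, 16, tzinfo=timezone(timedelta(hours=-5)))
queries = changed_content_queries(cutoff)
for label, query in queries:
    print(f"{label}: {query}")
```

If the environment has to be re-synced later, call the function again with the time of the previous run as the new cutoff.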
- Remove the OSGi component disabler configuration that was added in step 3 above to re-enable audit trails and tag validation.
- (Mongo to Mongo only) Remove another replica from the original mongo cluster, wipe out the database there, and add it to the new replica set from step A.
- Remove the replica node from the set: http://docs.mongodb.org/master/tutorial/remove-replica-set-member
- Validate that no other nodes in the set consider that node to be part of the set anymore
- Drop the aem database on that node http://docs.mongodb.org/manual/reference/command/dropDatabase/
- Add the node as a replica http://docs.mongodb.org/master/tutorial/expand-replica-set/
- Do testing (functional and load testing) to validate that the application works perfectly
- Validate that the environment is fully in sync and, if needed, repeat step D above with a later date / time as needed
- Perform repository maintenance
- Mongo (only) - Run the "Revision Clean Up" task.
By default, revision GC only cleans up revisions that are older than 24 hours. To make it clean up more recent revisions, do the following:
- Stop all AEM cluster nodes
- On each node, modify the DocumentNodeStore config file under crx-quickstart/install and set the versionGcMaxAgeInSecs property to 3600 which is 1 hour
- Start the leader AEM node
- Go to this URL path: /libs/granite/operations/content/maintenance/window.html/mnt/overlay/granite/operations/config/maintenance/granite_daily
- Start Revision GC
- Monitor the logs until it is done
- See here for relevant log messages https://gist.github.com/andrewmkhoury/39b69daf5a097b53937e
- Start the other AEM cluster nodes
- Tar (only) - Backup the segmentstore and run offline compaction
- Take a short outage window and do the cut over to the environment by simply clearing the dispatcher cache and pointing it to the new environment. If no dispatcher is used, then point the load balancer to the new AEM environment instead of the old.
Alternative upgrade approach for CQ5.x to AEM6.x upgrade using Oak level crx2oak tool documented here: http://dev.day.com/content/ddc/en/gems/deep-dive-into-aem-upgrade-process.html
WARNING: the crx2oak (aka oak-upgrade) tool will NOT migrate namespaces that are required for the nodes and properties.