Working with Resources and Pages

I’m currently migrating many thousands of pages out of OrchestraCMS, a Salesforce-native CMS, and into AEM for my client. In doing so, I’ve had to do some extensive work with Pages and Resources and thought I’d share some lesser-known intricacies of working with them. While this post draws its examples from the migration, the migration is only a vehicle for discussing those intricacies.

Generally speaking, I’m a huge fan of using the Sling API whenever possible. Sometimes, though, it’s best to bring in another API as a means to an end. In our case, the first part of the code uses the available Salesforce and OrchestraCMS APIs to bring the many thousands of pages into AEM as cq:Pages with associated cq:PageContent children. Once the pages are over, we begin cleaning things up.

The first cleanup step looks at a page property we set during the migration: a String named “published” whose value is either “true” or “false”. For those wondering, this property reflects the published state of the content in Salesforce. Since it arrives as a String and only exists temporarily, I didn’t bother casting it to Boolean before setting it on the page. The idea was to write a method that traverses the site’s page tree and checks this “published” property. If the value is “false”, the method deletes that page and all of its descendants. The astute reader might wonder why we didn’t simply filter out unpublished content when querying in the first place. That would have been nice, but the combination of version history and published state makes this fairly difficult (and expensive) to do at query time. So we opted to bring everything over and clean it up on the AEM side.

To start, I wrote the following method to traverse a tree of Resources:

You might notice immediately, as I did after writing it, that this fails to account for cq:PageContent nodes. On finding a page property named “published” with a value of “false”, it deletes only that page’s jcr:content node and the content beneath it, leaving the page itself in place. That is definitely not what we’re after here. Let’s fix that:
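The core of the fix is deciding which node to delete, and that logic can be sketched without an AEM instance. The snippet below is a self-contained plain-Java model, not the Sling or WCM API. Its `Node` class is a hypothetical stand-in for a Resource, and `findUnpublishedPages` is an illustrative name.

The model shows the two things the first attempt missed:

- The “published” flag is read from the page’s jcr:content child, but the path recorded for deletion is the cq:Page itself.
- The walk never descends into cq:PageContent subtrees.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnpublishedPageFinder {

    /** Hypothetical in-memory stand-in for a JCR node / Sling Resource. */
    static final class Node {
        final String path;
        final String name;
        final String primaryType;
        final Map<String, Object> properties = new HashMap<>();
        final List<Node> children = new ArrayList<>();

        Node(String path, String primaryType) {
            this.path = path;
            this.name = path.substring(path.lastIndexOf('/') + 1);
            this.primaryType = primaryType;
        }

        /** Adds and returns a child node. */
        Node child(String childName, String childType) {
            Node c = new Node(path + "/" + childName, childType);
            children.add(c);
            return c;
        }

        /** Returns the direct child with the given name, or null. */
        Node getChild(String childName) {
            for (Node c : children) {
                if (c.name.equals(childName)) {
                    return c;
                }
            }
            return null;
        }
    }

    /** Returns the paths of cq:Page nodes whose jcr:content has published = "false". */
    static List<String> findUnpublishedPages(Node root) {
        List<String> result = new ArrayList<>();
        collect(root, result);
        return result;
    }

    private static void collect(Node node, List<String> result) {
        if ("cq:Page".equals(node.primaryType)) {
            Node content = node.getChild("jcr:content");
            // The flag lives on jcr:content, but it is the cq:Page that must be deleted.
            if (content != null && "false".equals(content.properties.get("published"))) {
                result.add(node.path);
                return; // a deep delete removes the descendants too, so don't visit them
            }
        }
        for (Node child : node.children) {
            // Skip cq:PageContent subtrees: they hold page content, not child pages.
            if (!"cq:PageContent".equals(child.primaryType)) {
                collect(child, result);
            }
        }
    }

    public static void main(String[] args) {
        Node site = new Node("/content/site", "cq:Page");
        site.child("jcr:content", "cq:PageContent").properties.put("published", "true");
        Node draft = site.child("draft", "cq:Page");
        draft.child("jcr:content", "cq:PageContent").properties.put("published", "false");
        draft.child("nested", "cq:Page")
                .child("jcr:content", "cq:PageContent")
                .properties.put("published", "false");

        System.out.println(findUnpublishedPages(site)); // [/content/site/draft]
    }
}
```

Collecting paths first and deleting afterwards is a choice made for this sketch, not necessarily the original code’s approach. It keeps the traversal from modifying the tree it is iterating over. In AEM, the same walk would run over real Resources (for example via `Resource.listChildren()`), and each collected page would then be deleted.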

So this is closer to what we want. Here we use the WCM API’s PageManager instead, so we can use its delete method. That method accepts three things: the Page to delete, whether the deletion should be shallow (we want it deep), and whether the changes should be saved automatically (we do). Upon running the code, however, I noticed something interesting in the logs: PageManager.delete actually triggers replication for every page it deletes! That is perfectly reasonable when deleting pages from a live site. Here, though, we’re simply migrating content into an author instance and cleaning it up afterward. We certainly don’t want to fire off all these replication requests in production, so let’s update the code:

Here we still use PageManager for the convenience of working with Pages and their page content, but we hand the deletion itself to the Sling API’s ResourceResolver. Deleting content this way does not trigger replication, although AEM’s targeting engine will still notice the change. Since none of the content I’m migrating is part of a targeting campaign yet, that’s a non-issue here.
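`ResourceResolver.delete(Resource)` removes a resource along with everything beneath it. A batch of paths handed to it therefore should not contain both a page and one of that page’s descendants: once the ancestor is deleted, the descendant’s path no longer resolves. If a traversal collects every unpublished page rather than only the top-most ones, a small pruning step helps.

The plain-Java sketch below (class and method names are hypothetical) keeps only paths with no ancestor in the set. It compares whole path segments, so `/a-x` is not mistaken for a child of `/a`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TopmostPaths {

    /** Keeps only paths that have no ancestor in the input, in sorted order. */
    static List<String> topmost(Collection<String> paths) {
        Set<String> all = new HashSet<>(paths);
        List<String> result = new ArrayList<>();
        for (String path : new TreeSet<>(paths)) {
            if (!hasAncestorIn(path, all)) {
                result.add(path);
            }
        }
        return result;
    }

    /** Walks up the path one segment at a time, looking for a listed ancestor. */
    private static boolean hasAncestorIn(String path, Set<String> candidates) {
        int slash = path.lastIndexOf('/');
        while (slash > 0) {
            String parent = path.substring(0, slash);
            if (candidates.contains(parent)) {
                return true;
            }
            slash = parent.lastIndexOf('/');
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> unpublished = List.of(
                "/content/site/a/b", "/content/site/a", "/content/site/a-x", "/content/site/c");
        System.out.println(topmost(unpublished));
        // [/content/site/a, /content/site/a-x, /content/site/c]
    }
}
```

In AEM, each surviving path would be looked up with `ResourceResolver.getResource(String)` and passed to `delete`. The changes would then be saved with a single `commit()`.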

With the unwanted content actually removed, I can now focus on the next task: combining orphaned page trees with their true parent pages. As with the previous examples, the use case itself matters little here, since it stems from how my client’s content is organized. What matters are a few intricacies of moving this content and cleaning up afterward. First and foremost, if you’re thinking of using PageManager’s move method, know that the destination must be the full path the moved Page will have, not the path of its new parent Page. Second, keep in mind that moving a page this way triggers not only an activation of the page at its new location but also a delete request for the original path. Lastly, take care to update any LiveRelationships the page(s) participate in after the move.
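Because PageManager’s move method expects the page’s future full path rather than its new parent, it can help to build that string in one place. The helper below is plain Java with a hypothetical name. It appends the source page’s name to the intended parent path and tolerates a trailing slash on the parent.

```java
public class MoveDestination {

    /**
     * Builds the full destination path for a move: the new parent path plus
     * the moved page's own name, e.g. /content/site/en/about-us rather than
     * just /content/site/en.
     */
    static String destinationPath(String sourcePagePath, String newParentPath) {
        String name = sourcePagePath.substring(sourcePagePath.lastIndexOf('/') + 1);
        String parent = newParentPath.endsWith("/")
                ? newParentPath.substring(0, newParentPath.length() - 1)
                : newParentPath;
        return parent + "/" + name;
    }

    public static void main(String[] args) {
        String dest = destinationPath("/content/orphans/about-us", "/content/site/en");
        System.out.println(dest); // /content/site/en/about-us
    }
}
```

In AEM, the resulting string would be passed as the destination argument of PageManager’s move method. The replication side effects and LiveRelationship updates described above still apply and are not handled here.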