Solving the dissonance between AEM Dispatcher and Publish

Part of the beauty and attraction of Adobe Experience Manager is its system architecture: the Author, Publish, and Dispatcher tiers. Let’s look at a high-level overview of how they work together. The Author instance allows content to be authored and updated easily and efficiently. The content is then activated and replicated to the Publish instance, so web pages always carry the newest content, ready for public consumption. The Dispatcher sits in front of Publish and keeps a cached copy of published content, so when users hit a website, pages are served quickly without a round trip to Publish. The Publish server also lets the public-facing Dispatcher know when cached content under a given content path is out of date and needs to be re-cached. When a user visits a page whose cached copy has been invalidated, that request prompts the Dispatcher to fetch new content from the Publish instance, serve it to the user, and cache it. The next person to visit the page receives a fast, updated web page.

It’s simple, right? Author new content -> activate -> publish updates -> Dispatcher is informed it has old content -> Dispatcher re-caches new content when page is visited. It really is a great system, but it isn’t without complex and interesting challenges, depending on the structure of your site and how content is being used.

The problem we faced:
What happens when you have content on one page relying on content from another page? In one particular project, we worked on a complicated site architecture where pages used information from other related pages to populate certain content. For example, Page A pulls some of the content it displays from Page B. When Page B is updated and activated, AEM runs its usual replication process: Page B is updated on the Publish instance and its cached copy on the Dispatcher is invalidated and eventually refreshed. But Page A relies on that content, and nothing tells the Dispatcher that Page A is affected by the change to Page B. Page A is neither invalidated nor re-cached, so it keeps displaying old Page B content. There may be additional cases where Page B refers to content from other pages as well. As you can imagine, this created a huge problem. A system built for speed and accuracy was suddenly only fast and definitely not accurate. How did we get related content to stay updated?

Our solution:
We used a combination of an AEM workflow launcher and a custom Java class implementing the WorkflowProcess interface. Let’s first look under the hood of how this site architecture works. The relationship is that Page A contains a property that holds the path of Page B. There may also be several other pages (Page C, Page D, etc.) that point to Page B for information and content. If we can collect all of these paths, we can send a request to the Dispatcher, telling it to invalidate and re-cache all of these pages. Whenever a page was activated, the workflow launcher kicked off the custom workflow, which executed a series of JCR queries looking for the path of the recently activated page (Page B). We got that path from the workflow payload exposed by the WorkItem.
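To make the relationship concrete, here is a minimal sketch of the path collection in plain Java, using an in-memory map instead of the JCR. It is an illustration only, not the project's actual code: the class name RelatedPages, the method collect, and the example paths are all hypothetical, and it assumes each page stores at most one reference.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RelatedPages {

    /**
     * Hypothetical in-memory stand-in for the repository:
     * key = page path, value = the path stored in that page's reference property.
     * Returns every page that points at the activated page, plus the page
     * the activated page itself points at (if any).
     */
    static Set<String> collect(Map<String, String> references, String activatedPath) {
        Set<String> related = new TreeSet<>(); // sorted, so the result order is stable

        // Pages that refer to the activated page (Page A, Page C, ...).
        for (Map.Entry<String, String> entry : references.entrySet()) {
            if (activatedPath.equals(entry.getValue())) {
                related.add(entry.getKey());
            }
        }

        // The page the activated page refers to, if it has a reference at all.
        String target = references.get(activatedPath);
        if (target != null) {
            related.add(target);
        }

        // The activated page is already handled by normal replication.
        related.remove(activatedPath);
        return related;
    }

    public static void main(String[] args) {
        Map<String, String> references = new LinkedHashMap<>();
        references.put("/content/site/page-a", "/content/site/page-b");
        references.put("/content/site/page-c", "/content/site/page-b");
        references.put("/content/site/page-b", "/content/site/page-d");

        System.out.println(collect(references, "/content/site/page-b"));
    }
}
```

In a real workflow step, the map lookups would be replaced by JCR queries, and the activated path would come from the workflow payload rather than a hard-coded string.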

If Page B’s path appeared in a property on the JCR nodes of any other page, we treated that page as related to the activated page. We also checked whether Page B itself referred to any additional pages. The site architecture gave us a good idea of which page component types to include in the query, which narrowed down where to look for relationships; where we looked depended largely on the page component type of the activated page. For example, one page component type (the Map Modal) will always refer to another page component type (the Location) that provides information about a physical location or an address. We knew that if a Location page was modified and activated, we needed to look in the pages that referenced Location pages.
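A lookup like the one described above could be expressed as a JCR-SQL2 query. The sketch below only builds the query string; it assumes the reference is stored on the page's jcr:content node (node type cq:PageContent) in a property we call relatedPagePath. That property name, the class name ReferenceQuery, and the example paths are hypothetical and would differ per project.

```java
class ReferenceQuery {

    /**
     * Builds a JCR-SQL2 query that finds page content nodes under rootPath
     * whose (hypothetical) reference property equals the activated page's path.
     */
    static String build(String rootPath, String propertyName, String activatedPath) {
        return "SELECT * FROM [cq:PageContent] AS c"
                + " WHERE ISDESCENDANTNODE(c, '" + escape(rootPath) + "')"
                + " AND c.[" + propertyName + "] = '" + escape(activatedPath) + "'";
    }

    /** JCR-SQL2 string literals escape a single quote by doubling it. */
    static String escape(String value) {
        return value.replace("'", "''");
    }

    public static void main(String[] args) {
        // Example values only; the property name is an assumption.
        System.out.println(build("/content/site", "relatedPagePath", "/content/site/page-b"));
    }
}
```

Adding a further condition on sling:resourceType would be one way to restrict the search to specific page component types, such as pages that can reference a Location.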

We collected the paths of all pages that referred to Page B and all pages that Page B referenced, then sent a request to the Dispatcher telling it to invalidate all of those pages so they would be re-cached.
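For readers unfamiliar with what such a request looks like, here is a hedged sketch of the raw HTTP form of a Dispatcher invalidation: a POST to /dispatcher/invalidate.cache with the CQ-Action and CQ-Handle headers. It uses the java.net.http client from Java 11 and only builds the requests without sending them. The host name and the class name DispatcherFlush are assumptions; in an AEM installation this call is usually made by a flush replication agent, and the Dispatcher must be configured to accept invalidation requests from the sender.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.List;

class DispatcherFlush {

    /**
     * Builds one invalidation request per page path.
     * dispatcherBaseUrl is a placeholder such as "http://dispatcher.example.com".
     */
    static List<HttpRequest> buildInvalidations(String dispatcherBaseUrl, List<String> pagePaths) {
        List<HttpRequest> requests = new ArrayList<>();
        URI endpoint = URI.create(dispatcherBaseUrl + "/dispatcher/invalidate.cache");

        for (String path : pagePaths) {
            HttpRequest request = HttpRequest.newBuilder(endpoint)
                    .header("CQ-Action", "Activate")   // treat the path as freshly activated
                    .header("CQ-Handle", path)         // the content path to invalidate
                    .header("Content-Type", "application/octet-stream")
                    .POST(HttpRequest.BodyPublishers.noBody())
                    .build();
            requests.add(request);
        }
        return requests;
    }

    public static void main(String[] args) {
        List<HttpRequest> requests = buildInvalidations(
                "http://dispatcher.example.com",
                List.of("/content/site/page-a", "/content/site/page-c"));

        // Print what would be sent; actually sending requires an HttpClient
        // and a Dispatcher that allows invalidation from this host.
        for (HttpRequest request : requests) {
            System.out.println(request.method() + " " + request.uri()
                    + " CQ-Handle=" + request.headers().firstValue("CQ-Handle").orElse(""));
        }
    }
}
```

With this form of the request, the Dispatcher marks the cached files as stale, and each page is fetched from Publish and cached again the next time it is requested.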

Now, pages can be activated without worrying about the repercussions of pages relying on each other for their content! Even though it was a decent effort up front, this solution has been a big help to keep the site up to date and fresh, and has taken a big load off of the content authors.