A Year in the Desert of AEM: The Publish

You’re tired, thirsty and maybe a little cranky. I get it; Adobe Experience Manager can be a little frustrating and overwhelming. It might make you feel like you have been left in a desolate wasteland with nothing but your wits and a half empty—or half full—canteen. Never fear. With these articles you will have another tool in your belt to make it across the desert. Today we will cover the Publish server in AEM; and as mentioned in my introductory article, the Publish server is the workhorse of AEM. What do you mean you haven’t read my first two articles? The first article covers some basics about Adobe Experience Manager. The second article covers the Author server and its role within the AEM stack. I do sometimes reference back to those articles and It could be helpful to read them before moving on, but I won’t make you.

Just like the Author, the Publish also runs as a Java application; in fact, most of the information about setup and startup from the Author applies to the Publish as well (they even use the same jar). This means that the steps to get a Publish server up and running are going to feel like déjà vu. You will obviously want to use the installation run mode of publish, and the default port changes as well (Author :4502, Publish :4503). Other than those changes, you are looking at the same process. Let’s move on and learn more about the Publish instance.

The Desert Oasis – A Return to Familiar Ground
The Publish server is your return to an oasis during your journey as you’re well equipped for the task, having already set up the Author. The Publish server in the AEM stack is a render host and is the first way it differs from the Author instance. The Author pushes new content to the Publish server and it ingests that content. The Dispatcher sends requests to the Publish to be rendered and returned so it can cache and deliver them. The point I want to get across here is that the Publish is not starting these communications; instead it’s processing requests as they come in from the rest of the stack. Now, just because the Publish is zen regarding communication doesn’t mean it isn’t doing anything. In fact, most of the time the Publish will be the busiest of the two Java applications. It has to render all that code and process the logic you or the developers created and turn it into static content; i.e., HTML, CSS, JavaScript, etc. It also handles other tasks such as queries coming through the Dispatcher. The Publish is also where requests to flush the Dispatcher originate. These requests to the Dispatcher are normally only triggered after content has been ingested.

The second difference is that the Publish does not have the Authoring UI. As such, making changes directly to the Publish server should not be done. Normally I am not going to tell you how to manage your stack; instead I will give examples and suggestions. For this I am just going to be blunt and tell you—do not create anything directly on the Publish server. The Publish server has its purposes in the stack; manual configuration and changes is not one of them. Instead, changes or config settings should be created on Author and then replicated to Publish. Remembering this will help you avoid situations where content or configs are mysteriously overwritten. I am going to cover a little bit about configs and touch briefly on those run modes again a little later on.

Scaling for Your Journey’s Load
Having regrouped at the oasis, it’s time to determine if you will need to bring on other resources to help with the load. As the Publish server normally deals with a larger load than the Author, it’s nice that it’s also easier to scale than the Author. In the Introduction article there was a graphic that showed the standard Adobe Experience Manager architecture we employ here at Axis41. If you refer back to that image, you will see that we have two Publish servers, which could just as easily be changed to three, four, or more depending on the load and needs of your project. As long as you remember to create a Publish agent on your Author, you can keep more than one Publish up to date and ready to ingest and serve content. Besides scaling for load, the other reason we recommend running with two or more Publish/Dispatcher servers is for higher availability. This allows you to load balance across different geographic locations. It also allows you to perform maintenance on one of the Publish/Dispatcher lines without reducing availability for end users. This same technique could also be used to have a disaster recovery option available as a hot standby.

Note: Scaling Publish servers, or creating a hot standby, may include additional costs for licensing and you should check with your Sales Rep to make sure you are in compliance.

The Refetching Mirage
In my Author article, I talked a little bit about letting the Publish handle flush requests. There are really two types of flush agents available to you on the Publish—the flush and re-fetch agents. In simple terms, the flush agent sends a request to the Dispatcher to invalidate its cache, so on the next request the Dispatcher will get a new render from the Publish. The re-fetch agent sends a POST request to the Dispatcher that will delete the cache and then trigger an immediate re-cache event so that there is no extra wait time when that resource is next requested. The re-fetch agents sounds pretty great, however there is a cost associated with this agent. The major issue here is that the request made by the Dispatcher isn’t the same as a request made from a browser; this can cause irregularities when a page is cached. The re-fetch also triggers an immediate re-cache; if there are many resources being flushed, this can put a large amount of load on the Publish, creating a snowball effect and making the Publish slow down for all tasks.

The other agent you may see or want to use on Publish is the reverse replication agent. Remembering that the Publish is a render host, also means that it will not try to push content to Author, nor should it. This would appear to cause issues with user-generated content such as comments, reviews, etc., getting back to Author or the other Publish servers. AEM, therefore, has the option to enable reverse replication. The Publish stores user-generated content in a special outbox. The Author periodically requests from the Publish anything in the reverse replication outbox, and pulls that content in. The Author can then pass this content through an approval workflow, if needed, and replicate the user-generated content to the other Publish instances.

Publish Configs along Your Route
Now that we’ve talked about scaling your load and visited the refetching mirage, it’s time to cover what the last leg of your Publish journey will entail: configurations. We’ll begin by discussing configurations that normally you’ll only want configured on the Publish instance, or maybe you want them working differently on Publish. First, the way to make these configs only apply to Publish is by accessing those run modes we talked about in the Author article. Remember there are installation run modes, two of which are Author and Publish. This means you can use these run modes to identify which servers to apply the configs to. Adobe Experience Manager has a feature where if you place a directory named config. under the app/project directory, it will automagically apply any configs in that directory only if the run modes match. Armed with this knowledge, you can take it one step further and use multiple run modes, such as config.publish.production, which will only apply to a Publish server also using the production run mode.

So if you have settings that only apply to production, such as mail server settings for contact forms, then you can make sure that only applies in the right situations. Creating these config nodes can be a little bit of a challenge, but luckily for you there is a post right here on AEM Podcast that covers this process.

“Etc” Map Rules to Your Destination
The final configurations that you normally apply only to Publish are etc map rules, so named because of the default path by which they are found in AEM: /etc/map. Think of them as markers along your desert map.

These configurations are officially called Resource Resolver Mapping, and they allow you to rewrite hrefs on pages to strip out paths that you may not want exposed to an end user. A classic example is removing /content/sitename/en from links for SEO as well as security purposes. I am not really going to cover all the specifics of setting up etc map rules; instead, I will cover how to make etc map rules only apply in the same way that other configs do—by leveraging a sling OSGI configuration node inside of a run mode named config directory.

You will need to have a config node named “org.apache.sling.jcr.resource.internal.JcrResourceResolverFactoryImpl.sitename” created under config.publish.

Next you will need to add or modify a property on that JcrResourceResolver config, resource.resolver.map.location. Set it as a string, and then for the value enter a unique etc map path; I would suggest something like /etc/map.publish so as not to confuse it with anything else. What this does is tell the Resource Resolver that it should load its rules from the directory you defined.

You can then set up your etc map rules under that same directory and they will only apply to Publish servers. You can also take this one step further and create the JcrResourceResolver config under config.publish.production and create a map location of /etc/map.publish.production and have rules that apply only to a server that is both Publish and production.

Like the Author article, we really only covered the basics of the Publish server and its configurations. I hope this has shed some light into how the Publish server works and some of the customizations you may need or want to set up. Our next, and final, article will cover the DIspatcher; which for some may be the most frustrating of the servers to set up and configure. Make sure you subscribe and come back as we finish our journey through the desert of AEM, and finally make it out.