I read with interest the post “AEM and UGC – A Practical Example” because I have spent a lot of time in similar situations during my professional life with CQ/AEM. The approach described in the article is what Adobe recommended with CQ 5.x: user generated content (UGC) posted on a Publish instance is reverse replicated to the Author and then forward replicated to all the other Publish instances.
I never liked this approach because of the overhead it involves. Consider a simple “rating” application. A user rating is stored on a single Publish instance, then the Author fetches it by polling the Publish outbox (reverse replication). Once the rating has reached the Author, a specific workflow there replicates it to all Publish instances. The approach works, and Adobe recommended it up to CQ 5.x. The following figure shows the many different operations required to have the rating stored and propagated to all instances.
I saw a similar implementation on a big banking website with a lot of ratings moving from Publish to Author and vice versa. The application soon became unstable, with replication queues busy dispatching a lot of nodes. Moreover, the risk of inconsistencies across the different Publish servers was very high.
In such a scenario I have always preferred a different solution, with UGC stored directly by the Publish instance in an external SQL database. A single INSERT into a central store replaces the expensive sum of HTTP requests described above, and the same information becomes available to all Publish instances at the same time. Clearly some custom OSGi development working with a datasource is required, but the improvements in terms of performance and Publish synchronization are evident.
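As a rough illustration of the "single INSERT" idea, here is a minimal sketch in plain JDBC. The class name `RatingStore`, the table `ugc_rating`, its columns and the 1 to 5 rating range are all hypothetical; in a real AEM project the `DataSource` would presumably be obtained from an OSGi service rather than passed in by hand.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

/**
 * Hypothetical helper that writes one user rating to a central SQL database
 * shared by all Publish instances. Table and column names are assumptions.
 */
final class RatingStore {

    // Assumed schema: ugc_rating(content_path, user_id, rating)
    static final String INSERT_SQL =
            "INSERT INTO ugc_rating (content_path, user_id, rating) VALUES (?, ?, ?)";

    private final DataSource dataSource;

    RatingStore(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Stores a single rating and returns the number of rows written. */
    int save(String contentPath, String userId, int rating) throws SQLException {
        // The 1..5 range is an assumption made for this example.
        if (rating < 1 || rating > 5) {
            throw new IllegalArgumentException("rating must be between 1 and 5");
        }
        // try-with-resources closes the statement and returns the connection to the pool.
        try (Connection connection = dataSource.getConnection();
             PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            statement.setString(1, contentPath);
            statement.setString(2, userId);
            statement.setInt(3, rating);
            return statement.executeUpdate();
        }
    }
}
```

Every Publish instance pointing at the same database sees the new row as soon as the statement commits, so no replication step is involved in this sketch.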
That said, I have to admit that the database approach is appropriate mainly when user generated content consists of many small pieces of information that don’t require approval or author management prior to publication. In this situation, it’s quite simple to store data in an SQL database. With more structured and complex UGC, and/or when some sort of processing is required on Author, storing information directly on CRX may have a big advantage.
For example, one of our clients (Cucchiaio d’Argento – www.cucchiaio.it) asked us to let users post their own custom recipes (text + images). In this scenario we implemented a full reverse replication approach for the following reasons:
- There was less user generated content to manage
- Content consisted of fully structured pages, with text + images, so it was useful to have it stored directly as a cq:PageContent on CRX
- Content required approval (and corrections) by authors before being republished, hence an author workflow was required
In conclusion, every time user generated content is required, we carefully evaluate the impact it may have on performance. When we have a lot of UGC we tend to use centralized storage, normally a database shared between all Publish instances (and possibly the Author, if moderation is required). When there is very little UGC, the content is best structured as complex nodes and it requires author management, the reverse replication approach may become more appropriate.
It’s important to note that with AEM 6.x the Social Communities architecture has changed a lot, moving toward centralized storage based on MongoDB (or on Adobe Social Cloud). In this scenario, reverse replication can be avoided.
The approach is very similar to the database pattern described above. MongoDB is used only for nodes under /content/usergenerated, with the possibility to continue to use TarMK for the rest of the storage (as per Adobe best practices). The drawback of this solution is that you are required to set up a dedicated Mongo instance (which means at least two servers to have failover) plus a Solr index engine (two additional instances). Depending on the type of UGC you need to manage, all this configuration and maintenance may be too much effort.
Ignazio Locatelli is the CEO of CodeLand [www.codeland.it] (formerly DotLand) and has been working on CQ5/AEM since 2009. He’s involved in the coordination and design of big AEM projects in Europe ranging from banking applications to retail, from e-commerce to editorial platforms. As an AEM enthusiast, he loves to focus on performance and scalability topics.