Sling Resource API vs. JCR API

Intro
Although many of us have been using both the Sling and JCR APIs, few numbers are publicly available when it comes to comparing the performance of the two. This subject was briefly touched on in a presentation at Connect 2015 by Maciej Matuszewski.

This article will strive to highlight the performance of both APIs using simple test cases as well as a more complex business case. To protect the data of the client, some of the graphics will be blurred slightly.

For those of you who just want the cliff notes – you can skip to the Summary section. To all the others, have fun ploughing through it all ;).

Test setup & system specs
All tests are run on the same AEM 6.2 SP1 instance, updated to Oak 1.4.11, on a MacBook Pro (15-inch, Mid 2014) with the following specs:

  • Processor: 2.2 GHz Intel Core i7
  • Memory: 16 GB 1600 MHz DDR3
  • Disk: 512 GB PCIe-based flash storage

The AEM Java process is limited to 6 GB of memory. After each test, AEM is shut down, the tar files are compacted and AEM is restarted before the next test. This keeps the numbers comparable.

Test cases
First off, I decided to run the same snippets as Maciej to get a baseline. Instead of doing only a single run, I did multiple iterations and calculated an average. With multiple iterations I wanted to see whether there are any notable improvements when performing something on a larger scale. Using averages also mitigates the effect that unrelated activity on my system might have on the test results.
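To make the averaging concrete, here is a minimal sketch of a timing helper. It is not the original test script (those were Groovy); the class name `AverageTimer`, the method `averageMillis` and the placeholder workload are assumptions made for illustration only.

```java
// Hypothetical helper, not taken from the original test scripts: runs a task
// several times and reports the average wall-clock time per iteration.
final class AverageTimer {

    /** Returns the average duration of one run of the task, in milliseconds. */
    static double averageMillis(Runnable task, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        // Placeholder workload; in the tests this would be a traversal or a page creation.
        Runnable workload = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 1_000; i++) {
                sb.append(i);
            }
        };
        // Scale the number of iterations up, as done in the tests.
        for (int iterations : new int[] {1, 10, 100}) {
            System.out.printf("%d iterations: %.4f ms on average%n",
                    iterations, averageMillis(workload, iterations));
        }
    }
}
```

Timing the whole loop and dividing by the iteration count keeps the measurement overhead out of each individual run; it does not account for JVM warm-up, which a more careful benchmark would have to handle.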

The code snippets are run through Groovy and omit proper error handling for readability.

Test 1: Traversing /content/geometrixx tree
Description: I’m traversing the /content/geometrixx tree using both APIs and essentially counting the number of nodes. Geometrixx consists of 905 nodes. I scale the number of iterations up during the test to arrive at a more consistent average per iteration.

JCR code snippet:

Sling code snippet:

Results:

This result illustrates that the JCR API is roughly twice as fast as the Sling API when traversing a tree. To make sure that the number of nodes has no effect, I applied the same scripts to a tree containing 30 million nodes and the outcome remained the same.
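For readers who want to picture what "traversing and counting" means here, the following is a self-contained sketch of a recursive node count. It deliberately uses a stand-in tree type (`SimpleNode`, hypothetical) rather than the JCR or Sling API, so it says nothing about the relative speed of either; with the real APIs the child accessor would be something like `Node#getNodes()` or `Resource#getChildren()`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Stand-in sketch: counts a node and all of its descendants, depth first.
final class TreeCounter {

    /** Counts the root plus every descendant reachable through childrenOf. */
    static <T> long countNodes(T root, Function<T, Iterable<T>> childrenOf) {
        long count = 1; // the node itself
        for (T child : childrenOf.apply(root)) {
            count += countNodes(child, childrenOf);
        }
        return count;
    }

    /** Hypothetical in-memory tree node, used only to make the sketch runnable. */
    static final class SimpleNode {
        final String name;
        final List<SimpleNode> children;

        SimpleNode(String name, SimpleNode... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        List<SimpleNode> children() {
            return children;
        }
    }

    public static void main(String[] args) {
        SimpleNode root = new SimpleNode("content",
                new SimpleNode("en", new SimpleNode("products")),
                new SimpleNode("fr"));
        // content, en, products, fr
        System.out.println(countNodes(root, SimpleNode::children)); // prints 4
    }
}
```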

Test 2: Creating simple pages (single save)
Description: This test is all about creating pages consisting only of the page node and the jcr:content node. Only once all pages have been created do I save the pending changes.

JCR code snippet:

Sling code snippet:

Result:

You can read this chart as follows. When creating 1000 simple pages and saving them in one go, it takes 0.20 ms with the JCR API and 0.26 ms with the Sling API to create a single page. In total, creating 1000 pages took 200 ms with JCR and 260 ms with Sling. That’s blazingly fast!

To my own surprise, writing nodes seems to go a lot faster than traversing them. And just as surprising, Sling isn’t all that much slower than JCR. I expected a gap similar to the one in Test 1, but instead the results of the two APIs are extremely close. We are talking about only a few hundredths of a millisecond difference per page between both APIs. When creating 3000 pages, the total time difference between both scripts is a mere 210 ms.

Test 3: Creating simple pages (save each page)
Description: This test is all about creating pages consisting only of the page node and the jcr:content node. But this time, we save each page separately inside the for loop.

JCR code snippet:

Sling code snippet:

Result:

Saving each page separately dramatically increases the average creation time. We all know that saving only every few thousand changes improves performance, but it was actually unknown to me (and possibly also to you) by how much. Depending on which series you compare, saving per page is a factor of 16 to 20 slower than saving only once. Creating 3000 pages now takes about 15 seconds instead of less than a second.
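The middle ground between "save once" and "save every page" is to save every N changes. Below is a minimal sketch of that batching pattern. The `CountingSession` class is a hypothetical stand-in that only counts calls; it is not a JCR `Session` or a Sling `ResourceResolver`, and it measures nothing.

```java
// Sketch of batched saving: persist after every batchSize pages, plus one
// final save for whatever is still pending at the end.
final class BatchedSave {

    /** Hypothetical stand-in for a repository session; it only keeps counters. */
    static final class CountingSession {
        int pendingChanges = 0;
        int saveCalls = 0;
        int persistedChanges = 0;

        void createPage(String name) {
            pendingChanges++;
        }

        void save() {
            saveCalls++;
            persistedChanges += pendingChanges;
            pendingChanges = 0;
        }
    }

    static void createPages(CountingSession session, int pageCount, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int i = 1; i <= pageCount; i++) {
            session.createPage("page-" + i);
            if (i % batchSize == 0) {
                session.save(); // batch is full
            }
        }
        if (session.pendingChanges > 0) {
            session.save(); // flush the incomplete last batch
        }
    }

    public static void main(String[] args) {
        for (int batchSize : new int[] {1, 1000, 3000}) {
            CountingSession session = new CountingSession();
            createPages(session, 3000, batchSize);
            System.out.printf("batch size %d: %d save calls%n", batchSize, session.saveCalls);
        }
    }
}
```

A batch size of 1 corresponds to Test 3, a batch size equal to the page count corresponds to Test 2. Anything in between trades save overhead against the amount of unsaved work that is lost if something fails halfway.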

Outlining the business case
The previous tests are a great start for a discussion. However, they lack the variations that come naturally with real data. That’s why I’m also going to time the creation of pages using real business data.

The business case is taken directly from one of our clients. The test uses actual production data. The outline is simple: import a bunch of XML files into AEM as ‘source data’ pages, which serve as the foundation to manage product pages in a bunch of sites.

The set of XML files we start with represents ‘product’ and ‘product range’ data. Each product is linked to a product range. This implies that there is an n-to-1 mapping to ‘source data’ pages in AEM. Only product range XML files translate to a page; products translate to nodes under said page. Each file is linked to a specific locale, and this is reflected in AEM as well.

To clarify, I’ve added a screenshot of an imported product range page in AEM. Data from the product range XML is placed under the product-range-data node, while data from actual products is placed under the products/[product-name] node.

I will focus only on the import process which creates pages in AEM as the rest of the business case is not relevant for this article.

Explaining the import process
The XML files are placed in the ‘ready’ folder on the server by a system of the client. This folder is polled at regular intervals by AEM. Once one or more files are detected, AEM creates a Sling job for each of those files and then moves the XML file to the ‘processing’ folder. Next, processing kicks in. This again happens in a separate Sling job. During the processing phase, the XML file is read, mapped to Java models, restructured, and finally used to update/create pages in AEM. Once processing has completed, AEM moves the file to the ‘done’ folder. This is clarified with the image below. This strategy allows us to report on import progress during large-scale imports.
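The folder hand-off can be sketched with plain `java.nio.file` calls. This is a simplified, synchronous illustration under stated assumptions: the real import uses Sling jobs for both steps, which are omitted here, and the names `FolderImport`, `FileProcessor`, `pollOnce` and `demo` are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Synchronous sketch of the ready -> processing -> done hand-off (no Sling jobs).
final class FolderImport {

    /** Hypothetical callback standing in for "read, map, restructure, create/update pages". */
    interface FileProcessor {
        void process(Path file) throws IOException;
    }

    /** Handles every XML file currently in 'ready' and returns the names that reached 'done'. */
    static List<String> pollOnce(Path ready, Path processing, Path done, FileProcessor processor)
            throws IOException {
        // Collect first, so files are not moved while the directory is still being listed.
        List<Path> detected = new ArrayList<>();
        try (DirectoryStream<Path> xmlFiles = Files.newDirectoryStream(ready, "*.xml")) {
            for (Path file : xmlFiles) {
                detected.add(file);
            }
        }
        List<String> finished = new ArrayList<>();
        for (Path file : detected) {
            Path inProgress = Files.move(file, processing.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
            processor.process(inProgress);
            Path result = Files.move(inProgress, done.resolve(inProgress.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
            finished.add(result.getFileName().toString());
        }
        return finished;
    }

    /** Runs the sketch against temporary folders with a single dummy file. */
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("import-sketch");
            Path ready = Files.createDirectory(root.resolve("ready"));
            Path processing = Files.createDirectory(root.resolve("processing"));
            Path done = Files.createDirectory(root.resolve("done"));
            Files.write(ready.resolve("range-1.xml"), "<range/>".getBytes(StandardCharsets.UTF_8));
            // The "processing" here only reads the file; the real import builds pages from it.
            return pollOnce(ready, processing, done, file -> Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Finished: " + demo());
    }
}
```

Because each file sits in exactly one folder at any time, counting the files per folder is enough to report progress, which is the property the strategy above relies on.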

Let’s import!
The data set I’m importing is the entire en_US XML data collection (14,400 files). The file sizes range between 10 KB and 266 KB each. In total, 1.2 GB of XML data is imported. This translates to 3,322 pages in AEM with a total of 3,311,344 nodes containing 9,600,101 properties. On average, that is roughly 997 nodes and 2,890 properties per page. Some pages of course have fewer, others more.

I’ve done 3 imports using each API. Results are shown in the tables below. We are only interested in the ‘Create page’ column, but the import does have some overhead: parsing the XML files, mapping them to Java models and restructuring them before the page can actually be created or updated, and finally moving the file to the ‘done’ folder on disk.

JCR             File parsing   Java modelling   Create page   Move file
Run 1 (ms)      1.97           1.03             200.79        2.04
Run 2 (ms)      2.20           0.98             188.44        2.01
Run 3 (ms)      2.10           1.03             204.72        2.10
Average (ms)    2.09           1.01             197.98        2.05

Sling           File parsing   Java modelling   Create page   Move file
Run 1 (ms)      2.01           1.10             237.43        2.25
Run 2 (ms)      2.46           1.29             265.90        3.08
Run 3 (ms)      2.02           1.12             254.38        2.10
Average (ms)    2.16           1.17             252.57        2.48
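
As a worked check on the ‘Create page’ column, the short sketch below recomputes both averages from the three runs and derives the relative difference. The class and method names are made up for this example; the run values are copied from the tables above.

```java
// Recomputes the 'Create page' averages and the relative difference between the APIs.
final class ImportTimings {

    static double average(double... values) {
        double sum = 0.0;
        for (double value : values) {
            sum += value;
        }
        return sum / values.length;
    }

    /** How much longer 'slower' takes compared to 'faster', in percent of 'faster'. */
    static double percentSlower(double faster, double slower) {
        return (slower - faster) / faster * 100.0;
    }

    public static void main(String[] args) {
        double jcr = average(200.79, 188.44, 204.72);   // 'Create page' runs, JCR
        double sling = average(237.43, 265.90, 254.38); // 'Create page' runs, Sling
        System.out.printf("JCR average:   %.2f ms%n", jcr);
        System.out.printf("Sling average: %.2f ms%n", sling);
        System.out.printf("Difference:    %.2f ms per page%n", sling - jcr);
        System.out.printf("Sling takes %.1f%% longer than JCR%n", percentSlower(jcr, sling));
    }
}
```

The difference comes out at roughly 55 ms per page, with Sling taking a little under 28% longer than JCR for this step.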

While averages give a decent view on the actual import situation, it’s also interesting to look at some of the min and max times I recorded. I’m only showing an example using the JCR API during import, but the same is true for the Sling based import.

JCR             File parsing   Java modelling   Create page   Move file
Min time (ms)   0.33           0.09             6.87          0.19
Max time (ms)   166.88         41.14            36102.19      390.38

The big gap between min and max times raises the question: why? Upon further investigation, it turned out that the further the import progressed, the higher the average times became. The answer to the why question turned out to be disk I/O.

The combination of writing data to the repository (and thus to tar files on disk), running Sling jobs and moving files from one folder to another on disk, all while trying to read files, turns out to be a real show stopper. This doesn’t become an issue until the imported data set becomes large enough that the disk can’t keep up anymore.

Summary
What have we learned from all this testing?

  • The lower-level API was faster in every test
  • When traversing the content tree, JCR API is about twice as fast as the Sling API
  • When writing, the JCR API is about 30% faster than Sling
  • When writing, batching save/commit actions will improve performance; this must be weighed against odds of data loss or impact of retry mechanisms
  • Disk I/O can have a bigger impact on performance than API choice
  • Writing nodes is more expensive than writing properties (1)

(1) Tested by writing 1000 pages, each with 100 nodes containing 10 properties, vs. 1000 pages, each with 10 nodes containing 100 properties. Results are not included in this article.

Which API should you use?
So now what? Should you always pick JCR? Despite the difference in performance, I would still recommend writing code using the higher-level Sling Resource API over the JCR API (most of the time). This recommendation might seem a bit odd given the results. Why would I recommend the slower Sling API as the better way to go? For me, there are 2 big factors in recommending one over the other, and the recommendation is always tied to the use case at hand.
1. Performance
2. API features

1. Performance
First, let’s put things in perspective, shall we? We are talking about milliseconds here, which means that even if you write everything using JCR, you’ll still only have gained … milliseconds. So basically, a gain of between 30 and 50% if ALL your code is driven by the JCR API. In my case, I gained 27.5% by switching APIs for this part of the codebase (not everything was JCR related). Let’s put that in perspective again. On a single page this means I’ve gained roughly 55 ms; on 1000 pages this translates to ‘only’ 55 s of improvement. On 10,000 pages that is about 9 minutes, … See where I’m going with this?
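
The extrapolation is simple multiplication, but spelling it out makes the scale easier to judge. The sketch below assumes the saving stays constant at roughly 55 ms per page, which the disk I/O findings earlier suggest is optimistic for large imports; `SavingsEstimate` is a name made up for this example.

```java
// Extrapolates a constant per-page saving to larger page counts.
final class SavingsEstimate {

    /** Total saving in seconds for the given number of pages. */
    static double savedSeconds(double savedMillisPerPage, long pages) {
        return savedMillisPerPage * pages / 1000.0;
    }

    public static void main(String[] args) {
        double savedMillisPerPage = 55.0; // rounded per-page difference from the import runs
        for (long pages : new long[] {1, 1_000, 10_000}) {
            double seconds = savedSeconds(savedMillisPerPage, pages);
            System.out.printf("%d pages: %.3f s (%.1f min)%n", pages, seconds, seconds / 60.0);
        }
    }
}
```

For 10,000 pages this works out to 550 s, a bit more than 9 minutes.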

Only when the number of operations reaches a large enough scale to cause performance issues can switching to the JCR API provide some relief. When you reach this point, though, you might want to consider extracting the operations into a separate (better suited) system altogether. If your performance comes down to the choice of API, there is probably something wrong with the architecture, in my opinion.

2. API features
My prime recommendation is to choose the API which supports your business case the best, which is usually the higher-level API. The switch to the lower-level API is justified when you need something which isn’t available otherwise. A classic example of this with regard to Sling and JCR would be versioning, which is only available in JCR (so far).

The main reasons to pick the higher-level API boil down to readability, maintainability and productivity.

In general, a higher-level API usually sets out to accomplish two things. First, it tries to make your life as a coder easier and more productive, as you can do more with less code. This is achieved by adding another layer of abstraction on top of the lower-level API. This layer attempts to reduce the boilerplate code you’d otherwise have to write by wrapping underlying features (to reduce complexity) and by providing sensible defaults for exception-throwing lower-level function calls. Another thing this layer usually does is hide the complexities tied to ‘structure’ which you’d otherwise have to know.
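
The idea of "sensible defaults for exception-throwing calls" can be illustrated without either API. In the sketch below every type is a hypothetical stand-in: `LowLevelStore` plays the role of a lower-level API whose reads throw a checked exception, and `FriendlyView` wraps it the way a higher-level layer might, similar in spirit to reading a value with a default from a Sling `ValueMap`.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in illustration of a higher-level layer that turns exceptions into defaults.
final class DefaultsLayer {

    /** Hypothetical checked exception, standing in for a lower-level "not found" error. */
    static final class MissingValueException extends Exception {
        MissingValueException(String key) {
            super("No value for " + key);
        }
    }

    /** Lower-level stand-in: every read can fail with a checked exception. */
    static final class LowLevelStore {
        private final Map<String, String> values = new HashMap<>();

        void put(String key, String value) {
            values.put(key, value);
        }

        String read(String key) throws MissingValueException {
            String value = values.get(key);
            if (value == null) {
                throw new MissingValueException(key);
            }
            return value;
        }
    }

    /** Higher-level stand-in: same data, but a missing value becomes a default. */
    static final class FriendlyView {
        private final LowLevelStore store;

        FriendlyView(LowLevelStore store) {
            this.store = store;
        }

        String get(String key, String defaultValue) {
            try {
                return store.read(key);
            } catch (MissingValueException e) {
                return defaultValue; // the caller never sees the exception
            }
        }
    }

    public static void main(String[] args) {
        LowLevelStore store = new LowLevelStore();
        store.put("jcr:title", "Products");
        FriendlyView view = new FriendlyView(store);
        System.out.println(view.get("jcr:title", "untitled"));       // existing value
        System.out.println(view.get("jcr:description", "(none)"));   // falls back to the default
    }
}
```

The calling code shrinks to a single line per property, at the price of no longer being able to tell a missing value from a stored value that happens to equal the default.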

Second, it tends to implement useful new concepts and features. These, in turn, allow you to build more advanced features on top of what exists without having to worry about the basics. A prime example of such a feature would be the ‘adaptTo’ feature. This feature allows you to quite literally adapt a resource to a custom implementation which can manage (and hide) underlying structure and complexities. There are plenty of other useful features and concepts available in Sling (script resolution using selectors and suffix, Models, Jobs, Pipes, …).

Below are 3 code snippets all doing the same thing – retrieving a ‘jcr:title’ property from an expected page path. The first is pure JCR, the second pure Sling, the third AEM API.

The JCR snippet is clearly a lot of code to do something very simple. It throws a lot of exceptions and requires a lot of extra checks to get to the property value. The Sling and AEM API both do the same thing with far less code and without any exceptions. The last 2 snippets are so much easier to read and maintain than the JCR snippet.

The AEM API has the added benefit that the developer doesn’t need to know how a property is named internally or where it’s saved. If the value were stored in a property named ‘title’ in the next version, none of your code would need to change!

Although this is a very simple example, there are plenty of others …

Conclusion
Performance is important, but so are readability and maintainability, perhaps even more so. For most code, the difference between both APIs comes down to only a few milliseconds over a set of operations. And that’s before you’ve implemented any form of caching. As throughput of operations goes up, you might be tempted to switch APIs. And while switching might be an improvement at first, you’ll quickly end up in the same situation as before if that number goes up further.

The higher-level API is faster to write and far more readable and maintainable in the long run. It also tends to protect you from any structure changes and costly code changes, although the latter is largely determined by quality delivered by the developer.

My personal guideline is to use the higher-level APIs when you can. Only when you need a feature that isn’t available higher up should you switch to the lower-level API.

Author bio
Bart Wulteputte is a Senior Adobe AEM Consultant at Amplexor International. While he started his career in Enterprise Content Management technologies with SDL Tridion, he switched over to CQ5/AEM at the beginning of 2013. His role in AEM implementations spans the entire project: project setup, infrastructure architecture, server provisioning, development, quality assurance, continuous integration, and automation. He upholds very high standards during the development process with a very strong focus on (code) quality, performance, and scalability.