The AEM Podcast team had a good surprise last week when David Nuescheler stopped by our studio to discuss a few things before speaking at a Meetup event, hosted by Axis41. The topics of conversation during the podcast interview included things he has learned over 20 years of development, David’s Model, and the new MicroKernel architecture available in Adobe Experience Manager 6 (AEM6). We are very grateful for David’s time and candid answers.
David then was the guest speaker at the first Meetup for the Utah AEM/CQ/CQ5 Developer Community, where he spoke about the origins of AEM, and the evolution of what we know as AEM 6.0. For a recap of the Meetup event, visit the Axis41 blog.
Transcript: Welcome to the Adobe Experience Manager podcast, a weekly discussion regarding the Adobe Experience Manager, formerly CQ, and other marketing and technical issues. This podcast is presented by Axis41, your partner in Adobe Experience Manager implementations. Your hosts for the podcast are Joey Smith and Peter Nash.
Peter Nash: Good afternoon. I am Peter.
Joey Smith: And I am Joey.
Peter Nash: Welcome back, everyone. We are excited for this particular podcast today because we have a very special guest with us, David Nuescheler, Vice President of Enterprise Technology at this new company called Adobe, but I think everybody probably knows him as one of the creators of CQ, now Adobe Experience Manager. So David, thanks very much for taking the time to come down and chat with us today.
David Nuescheler: Thanks a lot for having me.
Joey Smith: This is in conjunction with our meet-up here in the Salt Lake City area to talk about Adobe Experience Manager. So, we are going to be having a lot of different developers in today to discuss a few things as they work with CQ, but David has taken the time to answer a couple of questions from us, and so we will go ahead and just jump into those right now.
Looking back on the history of Adobe Experience Manager, what lessons have you learned since the initial creation of it that you could share with yourself 20 years ago, and you know, what are some of the things that you know now that you wish you could have known then that might have affected the direction that you would have taken CQ and eventually AEM?
David Nuescheler: Yeah, I know that’s a lot of lessons we learned today. It’s interesting. I just, a couple of weeks ago, got my 20 year anniversary tombstone, I would say, from Adobe, and that sort of made me think I’ve been doing this for more than half of my life, which is scary—a scary proposition. If you think that you have been in a job for 20 years, then you probably think that’s weird, or I think that’s weird. But the interesting thing is that the challenge, while it was always around the same problem, changed every couple of years, and that’s really—we are in such a rapidly evolving space—technology and then the web, in particular—and then if you take it a little bit broader and say web technology into apps and what not, I think there is so much that’s happening. And if I would have to pick sort of one lesson learned where I think we made the right strategic investment, and I would just have made it earlier and more forcefully, then it would probably be the concept of open development. And what we mean by that is open development has sort of three different components.
One is open source, open architecture, and open APIs—or open standards, if you wish. And it’s interesting because in this ever-changing world, then that’s really what’s the hardest about it, right? The environment changes so rapidly, and if you watch things like the last Google IO and now there is a round screen, right? So, what are you going to do with that, right? And open development really helps us deal with this rapid increase of things that change. And so if we go through the three components, the open source component, of course, was very important to us. The open standards was very important to us, and the open architecture, and you see that reflected in the product’s architecture very much today where big chunks of the infrastructure are open source, where we try to work with a broader community than just ourselves, then obviously we adhere very much to standards. And the standard is an interesting one because if you create open APIs or open standards that you commit yourself to, then you create stability, essentially, for your product.
Joey Smith: It should be like a contract.
David Nuescheler: That is really – exactly as a contract, then if it’s a public standard, it’s not going to change very soon and that allows third party developers to build things against a stable API. That’s not going to go out of fashion with the next big thing, and then that brings us into open architecture, and open architecture is the concept that we build the platform in a way that would give everybody who works on Adobe Experience Manager sort of the same extension capabilities that we use to build Adobe Experience Manager ourselves.
So, the entire OSGi container as an architectural principle, where you say, “Well, we build bundles, you build bundles. We are sort of the same.” Exactly. And we use our own container to build our own product and, frankly, that’s how we found a lot of the deficiencies, right? If you just build infrastructure and then you hand it over to partners, to integrators, then it’s just like, “So, what am I going to do with this?” But if you have to use the same thing to build the product itself, you get to a much better tested and much more extensible platform.
So, it’s almost that the stability comes from the standard, and the extensibility and speed of adapting to the evolution comes from the open architecture. And open source is really just making sure that we all work together on basic infrastructure things, without everybody having to go and build these things themselves. So, if you think of the open source projects that are material in Adobe Experience Manager, then I would say it depends how wide we cast the net. So, I would say if we look at AEM 6, there are probably four major open source initiatives.
Obviously, the Apache Jackrabbit project is probably fairly well-known for the repository, and the Apache Felix project for the OSGi container, and then the Apache Sling project. And the reason why I lump the fourth into that is because of Adobe Experience Manager apps. And Adobe Experience Manager apps is obviously based on PhoneGap, PhoneGap Enterprise, and DPS, and obviously the PhoneGap project is the Apache Cordova project, where we collaborate with everybody around the very rapidly evolving mobile space where things change every day.
Joey Smith: Constantly shifting, right?
David Nuescheler: Constantly shifting. And it’s great to have contributions from the Googles, and IBMs, and everybody who is involved, so we do not have to run after a lot of the operating system changes and new APIs as the vendors themselves are producing that and putting that into our projects.
So, I think the bottom line is that the open development paradigm is something that took eight years for us to get started with. It took Roy Fielding to join us and sort of tell us about it and sort of hold our hands and coach us through it, and that was in 2001. And obviously, Day, the company that produced CQ at the time, was founded in 1993. So, it took us eight years to get there and to get started with it, and obviously, in the meantime, it’s completely established. If I could go back in time to 1993, I would say, “Well, do this from the beginning.”
Joey Smith: Nice. You know, David, one of the things that I struggled with: I’ve been working with Adobe Experience Manager for about a year and a half at this point. I spent the first 15 years of my career working with relational databases, and so I came in and I kept trying to make JCR act like a relational database. I wanted IDs and I wanted some of the things that I was so comfortable with, and I struggled with a lot of what I would call cognitive dissonance, trying to build a hierarchical model when deep down inside I wanted a relational model. And one of the things that they shared with me here that kind of helped me bridge the gap was your paper, David’s Model. And I’ve noticed in the past year or so that a lot of the links that it used to have going out to the Jackrabbit user forums have died. They are no longer fresh. I am wondering if you have thought about going back and revisiting that document at all, and if you have, are there any new rules that you have discovered or put in place that you would like to add to it?
David Nuescheler: Yeah, I mean that was an interesting time, and I remember when I came up with those sort of content modeling rules, and I initially called them sort of, I forgot what the name, what the theme was, but it was sort of rules for blissful content modeling or something along those lines. And I still remember, I was on vacation and, almost every day, I sent one rule to the mailing list, and people beat me over the head with it, which is what those links are.
Joey Smith: I saw a lot of value in watching that community kind of hash that out, what that really meant.
David Nuescheler: And some of the rules have been adjusted a little bit based on the feedback of the community where I realized that I may have gone too far. They were meant to be a little bit provocative, at least thought provoking, and they were exactly for people who sort of come from a relational database background and think, “This is a different model that I have to, or that I can, work with.” And yeah, there are various different things from specifying the JCR specs that were not obvious that made it into these rules. Then there were a lot of things that just are web things. If you mention the hierarchy, and I would obviously argue that JCR is not necessarily a hierarchy, it’s a network, but still, it has this sort of path thing in it. And if you think of the web, the web also has this path thing in it, and the hierarchy is, I mean, I think that’s rule one. I haven’t looked at it for a long time. I sort of love the hierarchy because a lot of the web infrastructure uses the hierarchy, and it makes it super transparent for somebody to understand where things are. If you think of any framework that maps a relational database to a URL, very often you look at the URL and you have no idea where this is going to end up, especially if you take the IDs out of the equation, which you should. I mean, that’s, I think, rule 2 or 3, because especially if you deal with the web, the URL probably shouldn’t have numbers in it because of SEO and 100 other reasons. So, it feels like it’s a more natural thing to map a hierarchy to a URL structure, and obviously if you think of file systems, that’s exactly how it has been done.
Yeah, and everybody sort of knows, okay hierarchy is like this, if I need to extend the hierarchy, I create a folder, I put some stuff in it, and absolutely brain-dead clear to everybody how that happens.
Peter Nash: Yeah, I like the phrase you just used, the brain-dead clear, because of the fact, like when I came here I was working with other types of databases and I come to this and everything is in a folder, and I said, “Son of a gun, I get it. I can understand where this is going.” And for me, it made a lot of sense because I am not a technical guy, I am not a developer, but I certainly have to work with them a lot and when they use certain words and they talk about how things are structured, I get how this is.
David Nuescheler: And I mean, the inherent structure of the URL, or the path segment that you have in the URL, sort of makes it more intuitive, but there are other reasons, there are other things that work better in a hierarchy than they work in relational. Access control is one of the popular ones, where access control with inheritance just works. Access control without inheritance, if you think of all the attempts at role-based access control in relational databases, nobody ever used that. Everybody built their own access control in the application tier, with all the problems that come with that, with SQL injection, and the security nightmare if you do the access control on the wrong level. So, there are a couple of reasons why hierarchy has value, and that is the way I like to think about it, because if you think of the query language in JCR, it’s SQL, right? It gives you the same data, again, in a table-oriented, sort of almost relational model, which is interesting. So, I usually like to say that JCR supports hierarchy as opposed to JCR is hierarchical.
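To illustrate the point about access control with inheritance, here is a minimal Python sketch of how a rule set on a parent path can apply to everything beneath it. This is only a toy model: the `ACL` table, the principal names, and the `can_read` helper are all hypothetical, and real JCR access control evaluation is considerably richer than this.

```python
from typing import Optional

# Hypothetical rules: path -> {principal: allowed?}. Not a real JCR API.
ACL = {
    "/content": {"everyone": True},
    "/content/site/private": {"everyone": False, "editors": True},
}


def parent(path: str) -> Optional[str]:
    """Return the parent path, or None when the path is the root."""
    if path == "/":
        return None
    head = path.rsplit("/", 1)[0]
    return head or "/"


def can_read(principal: str, path: str) -> bool:
    """Walk up the tree; the nearest rule that names the principal wins."""
    current: Optional[str] = path
    while current is not None:
        rule = ACL.get(current, {})
        if principal in rule:
            return rule[principal]
        current = parent(current)
    return False  # no rule found anywhere up to the root


if __name__ == "__main__":
    print(can_read("everyone", "/content/site/page"))
    print(can_read("everyone", "/content/site/private/doc"))
    print(can_read("editors", "/content/site/private/doc"))
```

In this sketch a page needs no rule of its own: it inherits whatever the closest ancestor says, which is the property that is hard to get from a flat table of rows.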
Joey Smith: Okay.
David Nuescheler: And obviously, I have no database background. I’ve never in my entire life worked with a relational database and I never will, which is great, and it is a fairly unusual course for a developer to be able to avoid relational databases, but I was lucky enough not to be involved. So, I came to this without being tainted with that. [laughter] I did not understand the relational model and I never sort of cared for it all that much and...
Joey Smith: Well, what is interesting is in the years that I spent in relational databases, especially when you were building web frontend stuff, there always ends up being a kind of hierarchical model built into the relational stuff anyway, and so those aspects I bought into immediately, like, “Oh yeah, that makes sense, I get it.” Really, at the end of the day, as we are delivering content to the web, there is a hierarchy there. Why not express that directly? But then also things like, don’t use IDs.
David Nuescheler: And the hilarious part behind it is, obviously, in the implementation of JCR, we use IDs left and right, right? I mean, there are IDs all over the place, and I usually use the analogy of a file system, because in a Unix-based file system you have inodes, which are IDs, of course. They are identified with numbers, and then you have paths, right? And every single interaction that you have with the file is through the path. Nobody in their right mind would ever use an inode for anything, right? [laughter] And it’s sort of this abstraction where you say, “Yes, there are IDs, but really don’t show them to the user, whatever you do,” right? And I think that’s almost true for software design in general: just don’t create IDs that don’t mean anything to the user.
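As a small illustration of the inode analogy, the following Python sketch lists the numeric IDs that sit behind ordinary file names, roughly what `ls -i` prints. The helper name `list_with_inodes` and the file names are made up for the example.

```python
import os
import tempfile


def list_with_inodes(directory: str) -> list[tuple[int, str]]:
    """Return (inode, name) pairs for a directory, similar to `ls -i`."""
    pairs = []
    with os.scandir(directory) as entries:
        for entry in entries:
            pairs.append((entry.inode(), entry.name))
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("notes.txt", "report.txt"):
            with open(os.path.join(tmp, name), "w") as handle:
                handle.write("example")
        # The numbers exist, but nobody opens a file by them.
        for inode, name in list_with_inodes(tmp):
            print(inode, name)
```

The numbers printed will differ from machine to machine, which is part of the point: they carry no meaning for the person using the files, while the names do.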
And we had a lot of internal arguments over it, and obviously when we specified JCR and we introduced references, and weak and strong references, and all that kind of stuff, we sort of suggest to use references, right? And to use IDs for some of those things, and stability over move is something that you generally do with an ID, right? You say, “I have a reference to this object and I don’t care where you move in the tree,” it’s still the reference and it still points to the same object.
And it turns out, the stability over move, or the stability across move operations, is something that is very interesting to develop [indiscernible] [00:15:11] but it’s very undesirable for the end user, and that’s really the scary part, if you think about it. Because if you think of a file system, and again, the Unix file system is a great reference, there are two ways of linking things in the Unix file system using the ln command: the hard link and the soft link. And how often in your life have you made a hard link? That’s when you become a user, right? That’s when you are not a developer anymore.
How often have you made the symbolic, or soft, link? And for me, I have never made a hard link in my life. For me, it is like ln -s is just the command, right? That is like, why would I use something else? And it’s so much more descriptive, right? For example, look, we have, for example, an area in access control where you have group memberships. That is a good one, right?
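The difference between the two kinds of link can be sketched in a few lines of Python on a Unix-like system. The function name `link_demo` and the file names are illustrative only. A hard link is a second name for the same inode, so it survives a move of the original, while a symbolic link stores a readable path and dangles once the target moves.

```python
import os
import tempfile


def link_demo() -> dict:
    """Create a hard link and a symlink, then move the original file."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "original.txt")
        with open(target, "w") as handle:
            handle.write("data")

        hard = os.path.join(tmp, "hard.txt")
        soft = os.path.join(tmp, "soft.txt")
        os.link(target, hard)     # like: ln original.txt hard.txt
        os.symlink(target, soft)  # like: ln -s original.txt soft.txt

        result = {
            # The hard link shares the numeric ID of the original.
            "hard_same_inode": os.stat(hard).st_ino == os.stat(target).st_ino,
            # The symlink stores a human-readable path instead.
            "soft_stores_path": os.readlink(soft) == target,
        }

        os.rename(target, os.path.join(tmp, "moved.txt"))
        # os.path.exists follows symlinks, so a dangling link reports False.
        result["hard_readable_after_move"] = os.path.exists(hard)
        result["soft_readable_after_move"] = os.path.exists(soft)
        return result


if __name__ == "__main__":
    for key, value in link_demo().items():
        print(key, value)
```

This is the trade-off described above: the ID-based link is stable across moves but opaque, and the path-based link is descriptive but breaks when the target moves.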
Joey Smith: That is a great example where I have seen.
David Nuescheler: I’ve seen them too, yeah. [laughter] And we had lengthy arguments over that, and I always argued against the current implementation and I still would, because if you browse this thing, it has UUIDs in there, right? And then you think, “What is this? Who is this?” Right?
Joey Smith: What is that trying to do?
David Nuescheler: And I have to sort of click on it or not. If there was an email address or some form of a path in there, it’s like, “Okay, I get it.” So, I think I am not arguing against IDs, and this is probably what I should do if I would revisit it. I would probably say I am arguing against IDs where the identifier does not have any semantic merit.
Joey Smith: Okay.
David Nuescheler: And it’s really, if it’s just an arbitrary number, then it’s like that.
Joey Smith: Yeah, as you talked, I suddenly had a vision of having to memorize the inode of every file I ever wanted to use, and, like, cd 1379. That can be kind of a nightmare.
David Nuescheler: And even seeing them, right? I mean, in ls there is an option so that you can see the inode, but I don’t even know.
Joey Smith: What good is it, right?
David Nuescheler: Exactly. So, I think, I mean these – the reason why I called them sort of my model is also not because I wanted to claim ownership, but because I knew they were controversial and people wouldn’t like them, and it’s just, “Hey, here is my idea,” right? And even within our own development organization, I mean, as the example of the access control shows, it’s not necessarily, there is not a 100% agreement, and that’s fine. It’s just if you want to hear my opinion, here is my opinion. Feel free to..
Joey Smith: Like I said, it definitely is useful to help bridge that mental gap to say, “Okay, I am used to thinking about everything in this one model. I have got to get myself over to the new way of thinking,” and I think that document is very helpful for that.
David Nuescheler: So, I think you mentioned the links that are broken on that document, and I think it’s really just because they went to Nabble or something like that.
Joey Smith: Yeah. Nabble, I think, re-architected how they do the links.
David Nuescheler: So, it is probably up to someone to go and Google them again, which you probably can, and just point them to different things. I think it should be on the Jackrabbit wiki, I think that is where the authoritative copy of that is, so if anybody who listens to the podcast...
Joey Smith: Maybe I’ll go fix it myself.
David Nuescheler: Why can’t you do so?
Joey Smith: You know, one of the things, as we have been looking forward to AEM 6.0, a big change coming down the pipeline for us as implementers, and for myself as a systems guy, is the new MicroKernel architecture, and all the changes that is going to bring with how we index and how we maintain our content repositories. I wondered if you could just walk us a little bit through the mental process, or the concepts, that went into moving away from TarPM and going to the MicroKernel architecture.
David Nuescheler: Yep, I can definitely do that. So, if you think from a timeline perspective, we started with JCR in 2001, and that’s also when we started with the implementation of Jackrabbit, essentially with the codebase that we have in Jackrabbit 1 and Jackrabbit 2, which are the ones that we have been using up to version 6. And after doing this for, I would say, eight or nine years, you sort of realize where the pain points are, what the learnings are, and that’s when we decided, it was actually like four years ago or so, that we want to give it another shot with a different implementation. And that’s where the open standards come in, right? Where you can say, “Well, we just implement things underneath,” and for developers, it will not have as much impact, right? Ideally, it has no impact, and your application runs essentially the same way. Now for system administrators and architects, that’s a different story, right? Because the persistence works somewhat differently, and there are a couple of main angles that we had in re-implementing that. One was that we constantly had people having arguments about the horizontal scale for writes. And I still haven’t found someone who really could argue that they ran into the bottleneck of the Jackrabbit 2 implementation on how fast you can write. You can write about 500 nodes per second, and then there is sort of a hard limit independently of how many cluster nodes you have. Actually, it gets a little bit worse with more cluster nodes.
While nobody really ran into that, I felt like it is a psychological limitation: “I cannot scale out horizontally. What happens if this becomes the next Twitter?” And apparently everybody thinks their website becomes the next one. [laughter] So, while it never really has become a problem, it was an architectural limitation of our implementation, and that was really sort of the biggest driving force: let’s remove that and make things distributed and work on an MVCC model and merge things asynchronously, and also allow things to get out of sync for a while, which is okay in most cases. You mentioned search, which is another good example. In the Jackrabbit 1 and 2 implementations, we had a synchronized query mechanism that would have one index for the entire repository, and it turns out that it’s hard to maintain. It’s a big sort of factoring and locking thing. And in Jackrabbit Oak, which is sort of the code name for the Jackrabbit 3 implementation, we made that asynchronous, and that means it’s totally okay for the index to be out of date. And once you’re in that contract where it is okay, it does not really matter whether it is one second out of date, or one minute, or 10 hours, or whatever. You can distribute things a lot better. And one other aspect is we didn’t create one super index that would hold everything but, almost on a per-query basis, create individual indexes, which takes some of the complexity of the queries out of the actual querying and pushes it into the index configuration. So, you say, “I want to index this property and that property, and here is how I want to index them,” and that’s it. And that’s the only thing that is going into that index. So, one of my favorite examples is the DAM search, or the asset finder searches that we have in the product.
We think there is only a very minimal set of information that we really need to index, but you want to index that in a very specific way. Like, for example, one index used for what a name starts with, one used for substrings, or for whatever looks like the beginning of a new word in a file name. That’s a very special way of indexing things, and you maybe have paths and folders and so forth that are interesting, but there is a lot of stuff where indexing just doesn’t make all that much sense and it just pollutes your –
Joey Smith: Sure, like all the XMP properties or something like that.
David Nuescheler: Exactly, and if you search for Adobe and it’s like everything has been touched by an Adobe product, so every single asset is returned, as opposed to one that has Adobe in its name, right?
And this is where you could get to the same point with the Jackrabbit 2 implementation, where you exclude all these things from your query, because you don’t want to exclude them from indexing, since that means it’s not accessible for anything anymore. It is a bit of a messy process and your query gets very complicated, because you sort of include this, exclude that, but not here and there, and this complexity would now reside in the index configuration.
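The two indexing ideas described here, an index that only covers configured properties and an index that is allowed to lag behind writes, can be sketched as a toy model in Python. The `PropertyIndex` and `Repository` classes, the `catch_up` method, and the property names are all hypothetical; this is not the Jackrabbit Oak API, and the sketch ignores details such as removing stale entries when a node is overwritten.

```python
from collections import defaultdict, deque


class PropertyIndex:
    """Toy index that covers only the configured property names."""

    def __init__(self, properties):
        self.properties = set(properties)
        self.entries = defaultdict(set)  # (property, value) -> set of paths

    def update(self, path, props):
        for name, value in props.items():
            if name in self.properties:  # everything else is never indexed
                self.entries[(name, value)].add(path)

    def lookup(self, name, value):
        return sorted(self.entries.get((name, value), ()))


class Repository:
    """Toy repository whose index is updated asynchronously."""

    def __init__(self, index):
        self.nodes = {}
        self.index = index
        self.pending = deque()  # writes the index has not seen yet

    def write(self, path, props):
        self.nodes[path] = props
        self.pending.append(path)  # indexing is deferred, not done inline

    def catch_up(self):
        """Stand-in for the background indexing job."""
        while self.pending:
            path = self.pending.popleft()
            self.index.update(path, self.nodes[path])


if __name__ == "__main__":
    repo = Repository(PropertyIndex(["name"]))
    repo.write("/content/dam/logo.png",
               {"name": "logo.png", "xmp:CreatorTool": "Adobe Photoshop"})
    print(repo.index.lookup("name", "logo.png"))  # index is still behind
    repo.catch_up()
    print(repo.index.lookup("name", "logo.png"))
    print(repo.index.lookup("xmp:CreatorTool", "Adobe Photoshop"))
```

In this model the decision about what is searchable lives in the index definition, so a lookup on an unindexed metadata property simply finds nothing, and queries do not need long include and exclude clauses.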
Another change that we made: the index information was stored before sort of separately on the file system, and it was always a bit of a pain with backup, restore, and synchronizing that, and now we are storing it in the repository itself. So, you just back up the repository and…
Joey Smith: Yeah. That’s one change I am looking forward to, because we run into that where we do a restore, and because of the version, the separate workspace where that stuff is stored, all of a sudden all the indexes are corrupt, and we have to rebuild the whole index, and that was a huge thing.
David Nuescheler: And I think with having that index asynchronous, you have that as well, right? In the repository, maybe all the indexes, however many you have, may be a couple of minutes behind, but they will catch up and it’s alright, and you don’t have to wait for anything. So, I think there are a couple of things. Another one that I really like, and that came in as a design goal, is that people always gave us a hard time for a lot of siblings, or a lot of children, under one node, just like having a thousand files in one folder, or ten thousand, or 100 thousand, or a million, or 10 million. And I would always argue that’s a bad design anyway, because eventually somebody is going to hit the browse button, be it in a file system explorer, or in CRX, wherever, and then they are going to iterate through all of them and then you have the mess. It’s sort of the same argument that you have on a file system: just do not put a million files into one folder. It’s just not a good idea. And it turns out that the first test that everybody does when they download Jackrabbit, or CRX, or CQ, and they want to do a load test, is that they create a loop that creates filenames with just numbers in them, right? Like 1, 2, 3, 4, 5, and let’s see how that works, right?
And obviously, in the old implementation it would linearly degrade: the 100th node would be this slow, the thousandth node would be sort of 10 times slower. And that was another thing that we just addressed and said, “Well, if you want to put 10 million nodes in there, here is enough rope to hang yourself. You will figure out why it’s not a good idea later.” Because what would happen is people would start playing with Jackrabbit, think, “What is this?” and not continue the development. So, it’s just getting over that first hurdle.
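One common way to avoid a single node with millions of children is to derive a few meaningful folder levels from the content itself. The Python sketch below spreads items across year, month, and day folders; the function name `dated_path`, the root path, and the date-based scheme are assumptions chosen for illustration, not something prescribed in the interview.

```python
from datetime import date


def dated_path(root: str, name: str, created: date) -> str:
    """Place an item under year/month/day folders instead of one flat node."""
    return f"{root.rstrip('/')}/{created:%Y/%m/%d}/{name}"


if __name__ == "__main__":
    print(dated_path("/content/uploads", "photo.jpg", date(2014, 7, 3)))
```

A date-based layout keeps each folder small enough to browse, and the intermediate levels still mean something to a person reading the path, in keeping with the earlier point about avoiding identifiers with no semantic merit.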
Joey Smith: I think that is all the questions we had for you, David. We really appreciate you coming and sharing your time with us.
David Nuescheler: Thanks a lot.
Thanks for listening to the Adobe Experience Manager Podcast brought to you by Axis41, your premier partner in AEM implementations. If you have questions, comments or something you’d like us to cover, send an email to firstname.lastname@example.org.