The CallBacks Dilemma

EDIT: I think Barney's comment did actually help me (although I may not have realized it at the time), to find a better way to handle the more traditional ORM in DataFaucet and I've since published a new version of DF (1.1 RC1) with a more traditional ORM tool that uses your objects from a Dependency Injection (DI) factory. The good news imo is that the new tool doesn't require any call-backs in CacheBox. Although I think this is still a valid question: are there cases in which we actually need call-backs from the cache service? Should they be implemented in CacheBox? I'm inclined to say no, but I'd still like to know what others think.

I'm leaving the rest of this article here at least for now for posterity. There's a part of me that says "dude, your frustration is showing", and would like to remove it in an attempt to present an image of someone who doesn't get frustrated (or confused), who always keeps his cool. But I know that's not true for anyone and I think it's better that I get over myself and let everyone see that I'm a real person with real flaws. And hopefully sharing the frustrations that I've learned from will help some other folks learn from them as well. :)

--------------------

A while back I read this blog article from Joe Rinehart about the idea that the ColdFusion community didn't have any "real ORMs" and it listed several specific things that "real ORMs" do, for example, not having your created objects extend objects in the ORM. Okay, fine. I can see where some folks might consider that useful. But that's where you start to run into the problem.

In the OO world, in an ideologically pure ORM system, your individual objects compose each other. I don't want to get too technical here, but basically it means that one object will have another object as its property, so for example, if you have a family, you can call husband.getWife() and it returns the wife object, which in turn has a getHusband() method that returns the husband object. I can see why this is the ideal that purists strive for, because in a perfect world, this all makes very logical sense. But the ORM also needs to do something else and that is to ensure that if we have a Husband object named Bob that there is only one copy of Bob in the system anywhere at any given time. So if you have Bob in the ShoppingCart and you also have Bob in the WishList, it has to be EXACTLY the same Bob, not two different Husband-Bob objects. That means caching. And caching means that since Husband-Bob has a Wife-Karen, we automatically are REQUIRED to create Karen when we create Bob (or vice versa), unless we lazy-load. But because of some of the current limitations in ColdFusion, we can't lazy-load AND not extend something (remember Joe said extending is bad). And that means we're REQUIRED to aggressively load Karen and Bob at the same time -- and any other objects that you might be able to get from them.
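To make the "exactly one Bob" requirement concrete, here is a minimal sketch in Python rather than CFML. Everything in it (the Person and Orm classes, the ROWS table) is a hypothetical stand-in, not DataFaucet code. It shows an ORM that keeps one cached instance per ID and, because it can't lazy-load, pulls in the spouse the moment a person is loaded.

```python
# Hypothetical stand-ins: this is not DataFaucet or CacheBox code.
class Person:
    def __init__(self, person_id, name):
        self.person_id = person_id
        self.name = name
        self.spouse = None  # composed object, like husband.getWife()


class Orm:
    def __init__(self, rows):
        self._rows = rows   # id -> (name, spouse id), stand-in for a table
        self._cache = {}    # one shared instance per id
        self.loads = 0      # how many objects were actually built

    def get(self, person_id):
        if person_id in self._cache:
            return self._cache[person_id]
        name, spouse_id = self._rows[person_id]
        person = Person(person_id, name)
        self.loads += 1
        # Register before loading the spouse so the Bob <-> Karen
        # cycle stops at the cache instead of recursing forever.
        self._cache[person_id] = person
        person.spouse = self.get(spouse_id)  # aggressive (eager) load
        return person


ROWS = {1: ("Bob", 2), 2: ("Karen", 1)}
orm = Orm(ROWS)

cart_owner = orm.get(1)      # Bob, as seen from the ShoppingCart
wishlist_owner = orm.get(1)  # Bob, as seen from the WishList
```

Both variables point at the same object, and asking only for Bob has already built Karen as well: orm.loads is 2 after the first call.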

So what this all boils down to here is that even if all you're doing is returning Bob's Birthday to see if he's old enough to buy beer, you're REQUIRED to aggressively load Karen and the kids and their house and dog etc. etc. ad infinitum just to get Bob's Birthday.

And at this point I haven't even described the problem yet! Before I started into the CacheBox project I had started working on a more traditional ORM system in DataFaucet based on Joe's comments about traditional ORM systems. I even have some ideas about managing to do the lazy-loading, although they involve extending in the opposite direction (the ORM extends your objects, instead of you extending the ORM).

The real problem comes when we reap the cache. Reaping the cache is important because it removes old items that are hanging out, chewing up memory, so that memory gets freed up so that it can be used to cache other things that have greater demand. And the CacheBox system right now has a pretty darned neat and efficient little system for expiring / deleting / reaping the cache, that allows you to expire or delete multiple items at once with a % wild-card. So if I expire or delete "my.query.%", it's applied to anything in cache with a name that starts with "my.query." (my.query.products, my.query.photos, my.query.beer).
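As a rough illustration of the wild-card idea, here is a Python sketch. The PrefixCache class and its method names are made up for this example and are not the CacheBox API. It also assumes the % only ever appears at the end of the name, which is all the example above needs.

```python
# Hypothetical cache: names and behavior are assumptions, not CacheBox's API.
class PrefixCache:
    def __init__(self):
        self._items = {}

    def put(self, name, value):
        self._items[name] = value

    def names(self):
        return list(self._items)

    def delete(self, pattern):
        """Delete one item, or every item matching a trailing-% prefix."""
        if pattern.endswith("%"):
            prefix = pattern[:-1]
            doomed = [n for n in self._items if n.startswith(prefix)]
        else:
            doomed = [pattern] if pattern in self._items else []
        for name in doomed:
            del self._items[name]
        return doomed


cache = PrefixCache()
for name in ("my.query.products", "my.query.photos",
             "my.query.beer", "my.object.bob"):
    cache.put(name, object())

removed = cache.delete("my.query.%")
remaining = cache.names()
```

After the call, the three my.query.* entries are gone and my.object.bob is the only name left.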

So what happens if we decide to expire or delete our Husband-Bob? Well first, the cache is no longer the only place where we have a pointer to Bob, because Wife-Karen has a pointer to Bob. And that means if we expire, delete or reap Bob, we're REQUIRED to also expire, delete or reap Karen at EXACTLY the same time. And if the kids have a getFather() method, then we also are REQUIRED to expire, delete or reap all the kids, and if the dog has a getOwner() method, we're REQUIRED to expire, delete or reap the dog. Because if we don't expire all of these objects simultaneously, then the next time someone goes to the ORM to get Husband-Bob and he's not found in the cache, BAM! You've got 2 copies of Husband-Bob floating around in your application, as though he'd cloned himself and run off to the store and to see his mistress at the same time.
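The two-Bobs problem is easy to reproduce. This is a deliberately tiny Python sketch with made-up names (Person, get_person, a plain dict as the cache), not code from DataFaucet or CacheBox.

```python
class Person:
    def __init__(self, name):
        self.name = name
        self.spouse = None


cache = {}  # stand-in for the cache service


def get_person(name):
    # Hypothetical ORM lookup: reuse the cached object or build a new one.
    if name not in cache:
        cache[name] = Person(name)  # stand-in for a database read
    return cache[name]


bob = get_person("Bob")
karen = get_person("Karen")
bob.spouse, karen.spouse = karen, bob  # composed objects

del cache["Bob"]             # reap Bob alone, leaving Karen in the cache
new_bob = get_person("Bob")  # cache miss, so a second Bob gets built

print(new_bob is karen.spouse)  # False: Karen still points at the old Bob
```

Karen was never reaped, so she keeps the original Bob alive while the ORM hands out the new one to everybody else.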

The Bleeding Edge Release (BER) of DataFaucet actually has caching code already that does basically this. It was put in before I started working on CacheBox and then I never released it (or documented it) because I got started on CacheBox. But the current version of CacheBox, which I honestly think is as complex as it needs to be, won't support it. The reason is that cache reaping happens at the service level, far away from the ORM where, in my opinion, you really want the reaping to happen. In an ideal world you don't really want the rest of your system to know much at all about the cache, you want it to be a black box - stuff gets put in and later comes back out and occasionally you manually remove stuff when it's updated, but beyond those three very simple operations, the rest of your forum or your shopping cart is completely ignorant. And that's the way it should be imo. But in order to support the more traditional ORM features in the DataFaucet BER, I would be REQUIRED to make CacheBox a bit more complex and allow it to muddy-up your application with more knowledge of the caching system. DataFaucet can't know when reaping cycles occur in CacheBox - it just can't, there's no way around that. So in order for DataFaucet to know when it needs to reap Karen, CacheBox would have to INFORM the CacheBoxAgent that "hey, we're throwing away Bob, in case you're interested". That way the agent can then rifle through its list of composed objects like the Wife-Karen, the kids and the dog and expire or delete each of them in turn. You can call this a "call back" or you can call it a listener (the agent being the listener), but either way it means more complexity on both sides (agent and service) and it means more information about the cache on the agent side, which imo can't be good.
It also means that the expire / delete for ANY of the objects stored for the benefit of the ORM suddenly becomes not only MUCH more complicated, but has a HIGH likelihood of creating a chain-reaction in which what was a relatively simple delete becomes a HORRENDOUSLY slow operation as it churns through N arbitrary number of related objects to decide what it needs to throw away.
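For the sake of discussion, here is roughly what such a call-back arrangement could look like, sketched in Python. CacheService, OrmAgent, add_listener and on_evict are invented names for this example. CacheBox has no such API today, and this is not how the DataFaucet BER does it.

```python
# Hypothetical sketch of a cache service that notifies listeners on delete.
class CacheService:
    def __init__(self):
        self._items = {}
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def put(self, name, value):
        self._items[name] = value

    def size(self):
        return len(self._items)

    def delete(self, name):
        if name not in self._items:
            return  # already gone, which is what ends the cascade
        value = self._items.pop(name)
        for callback in list(self._listeners):
            callback(name, value)  # "hey, we're throwing away Bob"


class OrmAgent:
    """Listener that knows which cached objects point at each other."""

    def __init__(self, cache):
        self.cache = cache
        self.related = {}  # name -> set of names composed with it
        self.evicted = []  # order in which call-backs fired
        cache.add_listener(self.on_evict)

    def relate(self, a, b):
        self.related.setdefault(a, set()).add(b)
        self.related.setdefault(b, set()).add(a)

    def on_evict(self, name, value):
        self.evicted.append(name)
        for other in sorted(self.related.get(name, ())):
            self.cache.delete(other)  # each delete may trigger more


cache = CacheService()
agent = OrmAgent(cache)
for name in ("bob", "karen", "kid", "dog"):
    cache.put(name, object())
agent.relate("bob", "karen")
agent.relate("bob", "kid")
agent.relate("karen", "kid")
agent.relate("bob", "dog")

cache.delete("bob")  # one delete requested by the application
```

A single delete of "bob" ends up firing four call-backs and emptying the cache, and the agent now has to carry a relationship map that exists only to serve the cache. With a real object graph the walk is as large as the graph.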

Basically what I'm saying is, this is the price of ideological purity. At the end of the day, it seems to me like there is a very simple solution to all of this; no composed objects in the ORM. And purists cry out "NO FAIR!" because then instead of husband.getWife() I'm now calling husband.getWifeID() and going back to the ORM to get the wife object whenever I need her. Oh the tragedy! I don't just have immediate access to an object! I'll spend the rest of my days bleeding from both eyes! But the system will be simple, efficient, fast and easy to maintain... Weren't those the objectives of OO in the first place? So if the "philosophically pure" approach causes massive complexity, huge headaches and likely performance bottlenecks and the only advantage is that we get an object instead of an ID without having to use an extends attribute somewhere, how is that doing us any good?
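A sketch of the ID-based alternative, again in Python with hypothetical names (Person, Orm, get_wife_id, expire). Nothing in the cache holds a reference to anything else in the cache, so expiring one entry never forces another out.

```python
class Person:
    def __init__(self, person_id, name, wife_id=None):
        self.person_id = person_id
        self.name = name
        self.wife_id = wife_id  # an ID, not an object

    def get_wife_id(self):
        return self.wife_id


class Orm:
    def __init__(self, rows):
        self._rows = rows  # id -> (name, wife id), stand-in for a table
        self._cache = {}

    def get(self, person_id):
        if person_id not in self._cache:
            name, wife_id = self._rows[person_id]
            self._cache[person_id] = Person(person_id, name, wife_id)
        return self._cache[person_id]

    def expire(self, person_id):
        # No cascade needed: nothing else in the cache points here.
        self._cache.pop(person_id, None)


orm = Orm({1: ("Bob", 2), 2: ("Karen", None)})

bob = orm.get(1)
karen = orm.get(bob.get_wife_id())  # go back to the ORM for the wife

orm.expire(2)                             # reap Karen on her own
karen_again = orm.get(bob.get_wife_id())  # rebuilt on demand
```

Bob is untouched by Karen's expiry, and the next lookup through his wife ID simply gets a freshly loaded Karen. Code that hangs on to its own local reference (like the karen variable here) is still responsible for not holding it longer than it needs to.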

I'm posting this on the blog here because I honestly feel like adding call-backs to CacheBox to broadcast information about the reap-cycle is a bad idea (TM). I think it's simple and efficient the way it is, and I think we should all be glad that we don't have to write all that horribly complex code to make it work. But CacheBox isn't only for me, CacheBox is for the community and I want to know what the community thinks. If the community really disagrees with me, then I'll add call-backs. The cache that's built in to ColdBox has them and my ColdBox sample just ignores that fact, because I think it's one of a small handful of bad ideas in the ColdBox cache. So this is your chance to weigh in and let me know how you feel about it.

Thanks! :)

Comments
I think you're missing the point with caching in an ORM framework. The Bob object should never have a reference to the Karen object (nor the converse), it should just appear as if he does. What should happen on bob.getWife() is the ORM framework should look up in the session/transaction/unit-of-work to see if it already knows about Karen. If so, use it, otherwise hit the DB. Then Karen can be expired at any time without Bob caring, even in the middle of a unit of work. Everyone keeps their own isolated workspace with loaded objects (all of which can be discarded at session close), and you can reap cache simply and with impunity.
# Posted By Barney Barney Boisvert | 9/22/09 3:04 AM
Thanks Barney. Of course that would be my preference. But the reason for all this back and forth in the blog over the dilemma isn't because of the concept, but rather because of what's available to implement it in current versions of CF.

I don't have a relatively simple "wrapper" object that I can just layer over the original object that will act in all ways identical to the original object. I can come close with onMissingMethod or with inheritance, both of which potentially cause other issues.

OnMissingMethod doesn't support plain-old variables in the "this" scope and potentially causes object type exceptions depending on how the other person writes their code, and of course the objective is to make it seamless so it doesn't matter how they write their code.

Inheritance solves those two issues, but then creates some other potential issues like what if the person wanted special logic in the getWife() method? And I'm also back to using code-generation, which I generally try to avoid. I think this inheritance route may ultimately be the one I go with because it seems the least likely to cause problems.
# Posted By ike | 9/22/09 12:15 PM