Sunday, September 24, 2006

Validation III

This post is a continuation of my original article about validation. In that post I used the example of multiple oil wells connecting to a single battery, where a battery - for the sake of simplicity in this example - is just a tank that stores the production from all of the wells. Again as a simplifying conceit, I proposed the rule that a battery and the wells connected to it should always be of the same color. Having set up a red battery and hooked up a bunch of red wells to it, I then raised the problem of validation deadlock: What do you do if you realize that you meant to make the battery blue, and the wells should be blue as well? Assuming you always validate all of the wells at a battery when you make changes to the battery's attributes, and also that you validate each well when you change its attributes, you wouldn't be able to save your changes from red to blue. Changing the battery would produce an error message stating that there are red wells connected to it; changing a well would generate a validation error stating that the well is connected to a red battery. We're stuck!
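To make the deadlock concrete, here is a minimal Python sketch. The class names, the `save_color` helper and the roll-back-on-failure behaviour are assumptions made up for illustration; they are not taken from any real application.

```python
class ValidationError(Exception):
    """Raised when an object breaks the same-color rule."""


class Battery:
    def __init__(self, color):
        self.color = color
        self.wells = []

    def validate(self):
        # A battery is valid only if every connected well shares its color.
        for well in self.wells:
            if well.color != self.color:
                raise ValidationError(
                    f"battery is {self.color} but a connected well is {well.color}"
                )


class Well:
    def __init__(self, color, battery):
        self.color = color
        self.battery = battery
        battery.wells.append(self)

    def validate(self):
        # A well is valid only if it matches the battery it is connected to.
        if self.color != self.battery.color:
            raise ValidationError(
                f"well is {self.color} but its battery is {self.battery.color}"
            )


def save_color(obj, new_color):
    """Change one object, validate it immediately, undo the change on failure."""
    old_color = obj.color
    obj.color = new_color
    try:
        obj.validate()
    except ValidationError:
        obj.color = old_color
        return False
    return True


battery = Battery("red")
wells = [Well("red", battery) for _ in range(3)]

battery_saved = save_color(battery, "blue")  # rejected: the wells are still red
well_saved = save_color(wells[0], "blue")    # rejected: the battery is still red
```

Whichever object is changed first, the other side of the relationship vetoes the save.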

One can take several approaches to tackle this problem, and the particular approach one decides on really depends on the application itself. One idea might be to simply not permit updating. In the case of our ongoing example, users would have to delete the wells and the battery and start over again, entering the data correctly this time. This simple approach may actually work in cases where the information is quick and easy to enter, but it's not a very good idea most of the time. Having users re-enter the data for all of the wells and for the battery too would likely be time-consuming and frustrating, especially just to correct a simple data entry mistake.

Another approach, my favourite when I can get away with it, is to allow users to make changes on the user interface one at a time, then to save those changes all at once. Behind the scenes, the application deletes and re-creates everything from scratch. This kind of approach generally requires that all of the data being worked on can be managed on a single screen. In the case of the battery-wells example and similar scenarios, one can imagine first displaying the battery data at the top of the screen, then each well in a list below. Clicking on a well would make its attributes editable. A disadvantage of this approach is that if the user is happily typing away making changes and the application suddenly crashes (or, in the case of a Web app, the user accidentally closes the window), all those changes are gone and must be re-entered. Such difficulties can be overcome by periodically saving the data behind the scenes - to the session for example in the case of a Web app. Two last points: Deleting and re-creating doesn't work well if there are other dependencies in the system on the existing objects. Also, performance would likely be a problem if there were a lot of data to delete and then re-create. So, in the case of a pure composition* relationship, this simple approach could work, but not in the case of aggregation**. It would probably not be a suitable solution to our well-battery problem.

A variation on this approach is to defer validation until the unit of work (or transaction) is ready to be committed. One can wait until that time to call the validation method on all objects inside the unit of work. This approach also solves the validation deadlock problem if all of the changes can be made in one unit of work. The description I gave above of one single screen that allows the user to edit both the battery and all of its wells in one go is an example of a case where I think this approach would work. If however the battery is on one form and each well also has its own form, this idea won't help.
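Deferred validation could be sketched roughly as follows in Python. The `UnitOfWork` class and its `set` and `commit` methods are hypothetical names invented for this illustration, not the API of any particular framework; the battery and well classes are the same toy objects with the same-color rule.

```python
class ValidationError(Exception):
    pass


class Battery:
    def __init__(self, color):
        self.color = color
        self.wells = []

    def validate(self):
        if any(well.color != self.color for well in self.wells):
            raise ValidationError("battery has wells of a different color")


class Well:
    def __init__(self, color, battery):
        self.color = color
        self.battery = battery
        battery.wells.append(self)

    def validate(self):
        if self.color != self.battery.color:
            raise ValidationError("well does not match its battery")


class UnitOfWork:
    """Collects attribute changes and validates only at commit time."""

    def __init__(self):
        self._pending = []  # list of (object, attribute name, new value)

    def set(self, obj, attr, value):
        self._pending.append((obj, attr, value))

    def commit(self):
        undo = []
        # Step 1: apply every pending change without validating anything.
        for obj, attr, value in self._pending:
            undo.append((obj, attr, getattr(obj, attr)))
            setattr(obj, attr, value)
        try:
            # Step 2: validate each touched object once, after all changes.
            touched = {id(obj): obj for obj, _, _ in self._pending}
            for obj in touched.values():
                obj.validate()
        except ValidationError:
            # Step 3: on failure, restore the old values in reverse order.
            for obj, attr, old_value in reversed(undo):
                setattr(obj, attr, old_value)
            raise
        finally:
            self._pending.clear()


battery = Battery("red")
wells = [Well("red", battery) for _ in range(3)]

uow = UnitOfWork()
uow.set(battery, "color", "blue")
for well in wells:
    uow.set(well, "color", "blue")
uow.commit()  # succeeds: everything is blue by the time validation runs
```

If only the battery were changed inside the unit of work, `commit` would raise and restore the old color, which is the case of separate forms described above.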

* Composition is the idea that an object is made up of constituent objects which have no life of their own. That means that when you delete the main object all of its parts can be safely deleted as well. There are never any external dependencies on these parts. Pure composition is fairly rare in the software world, even in cases where it seems to be the right answer at first glance. For example, going back to our ongoing example, it might seem reasonable to delete all the wells associated with a battery along with the battery itself. However, those wells might be connected to different batteries later or earlier in time. Here's an example of composition from the application I am currently working on. There are meters in this application which measure volumes of oil and gas. At a sales meter, you may enter the amount of oil sold in a month. You can also enter priorities for that meter to define how the sales are allocated. You may want to first allocate sales to a particular producer, then to the rest. Since these priorities are defined for one meter and one meter alone, they can be deleted along with the meter itself.

** Aggregation is the notion that an object contains other objects, but that these other objects do have a life of their own. A very typical example is a university course catalog. You have courses and students enrolled in those courses. A student can be enrolled in many courses at once, and if you delete a course from the system, you definitely don't want the system to delete all of the students who may be enrolled in it at the same time.

Let's say that the preceding strategies are not the right answer for our battery-well example. What can we do then? One solution that I personally like is to put the power to validate in the hands of the user. You can let the user toggle validation. If we have a convenient single validation method for each relevant domain object as discussed earlier, that's fairly easy to do. If the validation flag is turned off, the validation method simply isn't called, and the information entered is saved even if it is invalid. One can also alter the behaviour of this method to generate warnings instead of errors. That way the user still sees all of the messages that the validation produces, but they don't prevent saving changes. In order to work with the application beyond just data entry, presumably the user would have to toggle the validation back on. You have to be careful that subsequent validations don't rely on previous ones. For example, if the first validation checks that a particular attribute is not null and a subsequent one uses that attribute, you will have to put up guards so that the later check runs only if the attribute is set.
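Here is a rough Python sketch of the warnings-instead-of-errors variant, including the kind of guard just mentioned. The `save` function, its `validation_on` flag and the dictionary standing in for the database are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


class ValidationError(Exception):
    pass


@dataclass
class Battery:
    color: Optional[str]


@dataclass
class Well:
    name: str
    color: Optional[str]
    battery: Battery

    def validate(self) -> List[str]:
        """Return every problem found instead of stopping at the first one."""
        problems = []
        if self.color is None:
            problems.append(f"{self.name}: color is not set")
        # Guard: the comparison is only meaningful when the color is set,
        # because the not-null check above may have been let through
        # earlier while validation was switched off.
        elif self.color != self.battery.color:
            problems.append(
                f"{self.name}: color {self.color} does not match "
                f"battery color {self.battery.color}"
            )
        return problems


def save(well: Well, store: Dict[str, Optional[str]], validation_on: bool = True) -> List[str]:
    """Save the well; problems are errors when the flag is on, warnings when off."""
    problems = well.validate()
    if problems and validation_on:
        raise ValidationError("; ".join(problems))
    store[well.name] = well.color  # stand-in for writing to the database
    return problems  # shown to the user as warnings


store: Dict[str, Optional[str]] = {}
battery = Battery("blue")
well = Well("W1", "red", battery)

warnings = save(well, store, validation_on=False)  # saved despite the mismatch
```

With `validation_on=True` the same call raises `ValidationError` and nothing is stored.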

One can also remove the notion of a user manually disabling validation by having the validation method generate warnings when it is called during data entry, then have it automatically produce errors when it is triggered prior to processing that actually uses the information that was entered into the system.
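One way to picture this automatic switch is to let the caller's context decide the severity. In the Python sketch below, `save_well` and `run_allocation` are hypothetical entry points for data entry and for later processing; the names and the `Severity` enum are invented for this illustration.

```python
import enum
from typing import List


class Severity(enum.Enum):
    WARNING = "warning"
    ERROR = "error"


class ValidationError(Exception):
    pass


def check_colors(well_color: str, battery_color: str) -> List[str]:
    """The single validation rule, shared by both contexts."""
    if well_color != battery_color:
        return [f"well is {well_color} but battery is {battery_color}"]
    return []


def report(problems: List[str], severity: Severity) -> List[str]:
    # The same problems either block the operation or are merely reported.
    if problems and severity is Severity.ERROR:
        raise ValidationError("; ".join(problems))
    return [f"warning: {p}" for p in problems]


def save_well(well_color: str, battery_color: str) -> List[str]:
    # Data entry: never blocks, only warns.
    return report(check_colors(well_color, battery_color), Severity.WARNING)


def run_allocation(well_color: str, battery_color: str) -> str:
    # Processing that relies on the data: invalid data is now an error.
    report(check_colors(well_color, battery_color), Severity.ERROR)
    return "allocated"


entry_warnings = save_well("red", "blue")
```

Calling `run_allocation("red", "blue")` raises `ValidationError`, while the same data was accepted with a warning during data entry.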

That's about it. I will finish up by discussing a few odds and ends. I've encountered an application that was split in two. There was a 'staging area' application, in which users were able to enter whatever data they wanted. The application would generate warnings for inappropriate data. Once the data was 'clean', it was transferred into the 'production area', where no errors were allowed. I don't know how well this worked out as I wasn't actually working on this application, but it's not a solution I'd recommend. First, it means more programming. Second, it means the whole process of using the system becomes much more involved. Finally, it doesn't solve the validation deadlock problem. What do you do if you realize you've promoted data to the production system with mutual dependencies that both need to be changed?

A very simple, but in my opinion inelegant, solution to the validation deadlock problem is going through the backdoor. That is, issuing a SQL statement to update all of the appropriate records at one time. Since this bypasses the application layer entirely, there is no need to worry about validation. Even if there are database constraints in place, these can usually be deferred until the transaction is ready to be committed. While I believe this technique has its place, it's not a good idea in general. It requires outside intervention in most cases, since few users know enough about databases and SQL to do it themselves, and it is quite error-prone since, as I mentioned earlier, it bypasses the application.
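For illustration only, here is what the backdoor might look like against an in-memory SQLite database driven from Python. The schema is entirely made up: the same-color rule is modelled as a composite foreign key declared DEFERRABLE INITIALLY DEFERRED, which SQLite checks at commit time rather than per statement. Other databases have their own syntax for deferring constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Hypothetical schema: a well must reference an existing (battery id, color) pair.
conn.executescript("""
CREATE TABLE batteries (
    id    INTEGER PRIMARY KEY,
    color TEXT NOT NULL,
    UNIQUE (id, color)
);
CREATE TABLE wells (
    id         INTEGER PRIMARY KEY,
    battery_id INTEGER NOT NULL,
    color      TEXT NOT NULL,
    FOREIGN KEY (battery_id, color)
        REFERENCES batteries (id, color)
        DEFERRABLE INITIALLY DEFERRED
);
INSERT INTO batteries VALUES (1, 'red');
INSERT INTO wells VALUES (10, 1, 'red'), (11, 1, 'red');
""")


def recolor(connection, battery_id, new_color):
    """Update the battery and all of its wells inside one transaction."""
    with connection:  # commits on success, rolls back if the body raises
        connection.execute(
            "UPDATE batteries SET color = ? WHERE id = ?",
            (new_color, battery_id),
        )
        # Between these two statements the data is inconsistent; the
        # deferred constraint is only checked when the transaction commits.
        connection.execute(
            "UPDATE wells SET color = ? WHERE battery_id = ?",
            (new_color, battery_id),
        )


recolor(conn, 1, "blue")
```

None of the application's validation code runs here, which is exactly why the technique is both effective and risky.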
