Saturday, September 30, 2006

Validation IV

I figured I would write a bit more about validation after my last three already long winded posts on the subject. Basically this is just a brief synopsis with an example and a few minor elaborations. My major conclusion is that all domain-level validation that one would think to put into setting methods or initializers/constructors should instead be done via a single method for each domain object. Let's call that method valid, or isValid, or validate. Ideally, rather than throwing an exception every time it finds an error, this method would add messages as it goes along, and would only throw an exception when it's done. In Java, here's a typical example:
public void validate() throws ValidationException {
Messages messages = new Messages(this);
checkStartDateAndEndDate(messages);
//more validations go here
if (!messages.empty())
throw new ValidationException(messages);
}

private void checkStartDateAndEndDate(Messages messages) {
if (Util.empty(startDate) || Util.empty(endDate)) {
if (Util.empty(startDate))
messages.addError(Messages.ENTER_START_DATE);
if (Util.empty(endDate))
messages.addError(Messages.ENTER_END_DATE);
} else if (Util.before(endDate, startDate)) {
messages.addError(Messages.START_DATE_MUST_PRECEDE_END_DATE,
startDate, endDate)
}
}
I think it's important to avoid performing domain object validation in setting methods, and instead to collect these validations in a single method. The reason, as stated in my earlier posts, is so this method can be called to validate dependencies from different objects, not just when this particular object is being instantiated or modified. However, what of getting methods or methods that calculate or process (the difference between such methods and simple getting methods is not obvious to me. I consider it to be the difference between methods called during basic data entry vs. methods that are called afterwards)? It's not possible to avoid validating while processing, at least I don't think so in most cases. I am not sure, it may be possible in theory, but in practice you really can't validate all the possible results of calculations before actually performing those calculations. Therefore there will always be method calls on objects which can generate meaninful exceptions (i.e. not bugs) even though the validation methods used to save objects entered into the system have all passed. I suggest using a different type of exception when throwing these kinds of errors, maybe something like ProcessingException instead of ValidationException. In a language with compile-time exception checking, that has the added benefit of helping to make sure that you are not calling methods that throw ProcessingException inside of your validation methods.

Sunday, September 24, 2006

Validation III

This post is a continuation of my original article about validation. In that post I brought up as an example the case of multiple oil wells connecting to a single battery, where a battery - for the sake of simplicity in this example - is just a tank that stores the production from all of the wells. Again as a simplifying conceit, I brought up the idea that the battery and wells connected to it should always be of the same color. Having set up a red battery and hooked up a bunch of red wells to it, I then brought up the problem of validation deadlock: What do you do if you realize that you meant to make the battery blue, and the wells should be blue as well? Assuming you always validate all of the wells at a battery when you make changes to the battery's attributes, and also that you validate each well when you change its attributes, you wouln't be able to save your changes from red to blue. Changing the battery would produce an error message stating that there are red wells connected to it; changing a well would generate a validation error stating that the well is connected to a red battery. We're stuck!

One can take several approaches to tackle this problem, and the particular approach one decides on really depends on the application itself. One idea might be to simply not permit updating. In the case of our ongoing example, users would have to delete the wells and the battery and start over again, entering the data in correctly. This simple approach may actually work for some cases where the information is quick and easy to enter, but it's not a very good idea most of the time. Having users re-enter data for all wells and for the battery too would likely be too time-consuming and frustrating, especially just to correct a simple data entry mistake.

Another approach, my favourite when I can get away with it, is to allow users to make changes on the user interface one at a time, then to save those changes all at once. Behind the scenes, the application deletes and re-create everything from scratch. This kind of approach generally requires that all of the data being worked on can be managed on a single screen. In the case of the battery-wells example and similar scenarios, one can imagine first displaying the battery data at the top of the screen, then each well in a list below. Clicking on a well would make its attributes editable. A disadvantage of this approach is if the user is happily typing away making changes, and the application suddenly crashes (or in the case of a Web app, the user accidentally closes the window) all those changes are gone and must be re-entered. Such difficulties can be overcome by periodically saving the data behind the scenes - to the session for example in the case of a Web app. Finally two last points: Deleting and re-creating also doesn't work well if there are other dependencies in the system on the existing objects. Finally, performance would likely be a problem if there were a lot of data to delete and then recreate. So, in the case of a pure composition* relationship, this simple approach could work, but not in the case of aggregation**. It would probably not be a suitable solution to our well-battery problem.

A variation on this approach is to defer validation until the unit of work (or transaction) is ready to be committed. One can wait until that time to call the validation method on all objects inside the unit of work. This approach also solves the validation deadlock problem if all of the changes can be made in one unit of work. The description I gave above of one single screen that allows the user to edit both the battery and all of its wells in one go is an example of a case where I think this approach would work. If however the battery is on one form and each well also has its own form, this idea won't help.

* Composition is the idea that an object is made up of constituent objects which have no life of their own. That means that when you delete the main object all of its parts can be safely deleted as well. There are never any external dependecies on these parts. Pure composition is fairly rare in the software world, even in cases where it seems to be the right answer at first glance. For example, going back to our ongoing example, it might seem reasonable to delete all the wells associated with a battery along with the battery itself. However, those wells might be connected to different batteries later or earlier in time. Here's an example of composition from the application I am currently working on. There are meters in this application which measure volumes of oil and gas. At a sales meter, you may enter in the amount of oil sold in a month. You can also enter in priorities for that meter to define how the sales are allocated. You may want to first allocate sales to a particular producer, then to the rest. Since these priorities are defined for one meter and one meter alone, they can be deleted along with the meter itself.

** Aggregation is the notion that an object contains other objects, but that these other objects do have a life of their own. A very typical example is a university course catalog. You have courses and students enrolled in those courses. A student can be enrolled in many courses at once, and if you delete a course from the system, you definitely don't want the system to delete all of the students who may be enrolled in it at the same time.

Let's say that the preceding strategies are not the right answer for our battery-well example. What can we do then? One solution that I personally like is to put the power to validate in the hands of the user. You can let the user toggle validation. If we have a convenient single validation method for each relevant domain object as discussed earlier, that's fairly easy to do. If the validation flag is turned off, the validation method simply isn't called, and the information entered is saved even if it is invalid. One can also alter the behaviour of this method to generate warnings instead of errors. That way the user still sees all messages that the validation produces, but it doesn't prevent saving changes. In order to work with the application beyond just data entry, presumably the user would have to toggle the validation back on. You have to be careful that subsequent validation don't rely on previous ones. For example, if the first validation checked that a particular attribute is not null and a subsequent uses that attribute, you will have to put up guards to validate only if the attribute is set.

One can also remove the notion of a user manually disabling validation by having the validation method generate warnings when it is called during data entry, then have it automatically produce errors when it is triggered prior to processing that actually uses the information that was entered into the system.

That's about it. I will finish up by discussing a few odds and ends. I've encountered an application that was split in two. There was a 'staging area' application. In this application users were able to enter in whatever data they wanted. The application would generate warnings for inappropriate data. Once the data was 'clean', the data was then transferred into the 'production area'. Here no errors were allowed. I don't know how well this worked out as I wasn't actually working on this application, but it's not a solution I'd recommend. First it means more programming. Second, it means the whole process of using the system becomes much more involved. Finally, it doesn't solve the validation deadlock problem. What do you do if you realize you've promoted data to the production system with mutual dependencies that both need to be changed? A very simple, but in my opinion inelegant, solution to the validation deadlock problem is going through the backdoor. That is, issuing a SQL statement to update all of the appropriate records at one time. Since this bypasses the application layer entirely, there is no need to worry about validation. Even if there are database constraints in place, these can usually be deferred until the transaction is ready to be committed. While I believe this technique has its place, it's not a good idea in general. It requires outside intervention in most cases, since few users know enough about databases and SQL to do it themselves, and it is quite error prone since as I mentioned earlier, it bypasses the application.

Sunday, September 17, 2006

Validation II

This is a continuation of my previous posting about validation. When validating attributes on an object, the setting method on the attribute would seem to be the most obvious place for validation logic. However, this presents problems when the object must be validated externally. The first example I described in my earlier post concerned the need to validate all the wells associated with a battery when saving the battery - because changing an attribute on a battery could be invalid given the state of wells already attached to that battery. Since the wells themselves already exist, one wouldn't be calling any of the setting methods! A solution I generally adopt to this kind of problem is to write a 'valid' or 'isValid' method for each domain object - in fact I think it's a good idea (in languages like Java and C#) to define an interface (Validatable?) with this method and to implement it for all domain objects. That way, the valid method can be called after all attributes on an object have been set. This method can then also be called by the battery for each well. In my next posting I will discuss the validation deadlock problem.

Saturday, September 16, 2006

Validation

Validating state may not the most exciting aspect of software development, but it's important for just about all applications. It's also a challenging area, because validation requirements vary from one application to another. The topic of validation is something I've been thinking about for years now, and in this post I will try to outline some of my thoughts about it.

I'll start with a common validation problem. It's one that occurs in many applications I've worked on. I'll call it the Validation Deadlock Problem. Snazzy, huh? Most application allow users to work with data via separate data entry screens. As an example, the application I am currently working on has data entry screens for 'wells' and for 'batteries'. When you create a well, you have to connect it to a battery. Only certain kinds of wells can be allowed to connect to certain kinds of batteries. Let's say for the sake of simplicity that wells and batteries have to be the same 'color'. The first step is to create a battery. Then you can create a well and hook it up to that battery. Once you've established the connection though, you can go back and change the well or the battery. Generally one wishes to validate all objects in an application before they are actually saved to the database. However, that's not always easy. Let's say you've realized that you set up your well/battery combination incorrectly. You had a 'red' well connected to a 'red' battery, but actually both should be 'blue'.

This presents two problems for validation. The first problem is that when you change the battery, you can't just validate its own state alone. You also have to validate the wells connected to it. There is a perfomance cost to validate all the wells connected to a battery every time you make a change on the battery screen, and it may annoy users that editting a single field is slower than they expect. That's the first problem. The second problem is the dreaded validation deadlock: Assuming you bite the bullet and validate all the wells every time you save a battery, you try to change the battery to blue but you get a validation message saying that you have a red well attached to that battery. The difficulty is that you also can't change the well to blue because it's still connected to a red battery. The particularly annoying aspect of this problem is that in order for the user to be able to change both the battery and well(s) to blue, the well object(s) or the battery object must be allowed to go into an invalid state and saved that way to the database, in other words, across transactional boundaries. Generally the goal is to make sure all objects are always in a consistent state before they are persisted, but here that seems impossible. In my next posting I will explore some solutions to the two problems I've described.