Friday, December 01, 2006

Notable Books

There are so many books about software development out there. Here is my list of essentials:
  • Code Complete: This book is a classic, and the new edition covers a lot of new material. I would suggest that thoroughly going over this book would be excellent preparation for professional software development, and would put one ahead of most of the people already working in the industry. Steve McConnell has that great quality of explaining things clearly without needless embellishment.
  • Agile Software Development with SCRUM: Scrum provides a very simple and lightweight set of project management artifacts, yet these few simple lists and charts produce a powerful form of feedback. Scrum also enlists all stakeholders in a software project to be honest and courageous, and to keep everything out in the open. It is my preferred method of managing software development.
  • Fundamentals of Object-Oriented Design in UML: For anyone who wants to understand object-oriented programming more deeply, in terms of the true underlying principles, I think this book is a great choice. It makes you think about design patterns in a more profound way. The use of the word "fundamentals" in the title has more to do with the notion of depth and first principles; it's not necessarily a book for novices. I think this is one of the most underrated books in software development. There's an intro to UML too, but to me that's hardly the main point of this terrific book.
  • Rapid Development: Another Steve McConnell book. What he did for programming in Code Complete, he does here for the management of software projects. This book is very well researched and comprehensive.
  • XP Explained: Kent Beck expounds his XP philosophy. Test-driven development, or TDD, fundamentally changed my approach to programming. I don't think you have to practice XP to have a successful software project, but I do believe that the general attitude of XP when it is applied correctly is the right one.
  • Refactoring: While this book is rather wordy for its content, it's worth reading. The most important idea in this book is that you don't have to produce a glorious design on paper before writing any code but instead you can incrementally shape your code as you go on. However, it is also important to realize that thinking hard about one's design is a very important part of the test-code-refactor cycle: It's not just about moving code around.
  • Design Patterns: In my opinion this is a somewhat overrated book. It's really an exploration of techniques that polymorphism (overriding functions via inheritance) makes possible. In that respect it's certainly worth reading, but it has spawned a cult of object/pattern nonsense whose nefarious influence continues to make itself felt to this day.

Saturday, November 11, 2006

Mock Objects and Testing

I have heard rumblings of controversy around testing with mock objects. For example, Jeff Perrin had an interesting link in his blog to an article about why not to use mock objects. I find it hard, when I'm trying to illustrate an idea, to come up with simple examples off the top of my head. They tend to be either very specific to a project I'm working on or very contrived. That's why I figured I would take an opportunity to describe something simple yet concrete that I encountered the other day at work. In the project I'm currently working on, I was trying to fix a bug. To do so, I added a condition to a function that retrieved a list of wells. This function was retrieving all of the wells connected to a given battery (effectively a storage tank for oil produced at a well). I modified the function to return only wells that had not been "shut-in." While running all of our tests, I ran across a few tests that were breaking as a result of my change. The tests had to do with contracts, an area of the system I am not very familiar with. I asked one of our on-site business users about this and it turned out that in certain circumstances it made sense to include shut-in wells as part of processing a contract.

I thought this was interesting because the tests we had did not use mock objects. Because of this, my change produced failures in tests that had real business meaning, which explained why my change was not a good idea. With mock objects, I imagine the tests would have been decoupled. There would be tests for the function that returned wells at a battery. Then there would be tests to make sure contracts were being processed correctly. These latter tests would mock or stub out the function that searched for wells at a battery. Therefore the change I made would likely have broken only one test, one that made sure that particular function included shut-in wells, outside of any other context. On one hand that seems like a good thing, but the problem is, without the context of the business problems in which shut-in wells should be included, I could imagine thinking the failing test was probably wrong.
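To make the contrast concrete, here is a rough sketch of what that decoupled, stubbed style might look like. All of the names here (Battery, Well, Contract, ContractProcessor, and the stub itself) are hypothetical, made up just to illustrate the shape of such a test, not our actual code:

import java.util.Arrays;
import java.util.List;
import junit.framework.TestCase;

public class ContractProcessorTest extends TestCase {
    public void testContractProcessingIncludesShutInWells() {
        // setup: one producing well, one shut-in well
        final Well producing = new Well("100/01-01", false);
        final Well shutIn = new Well("100/02-01", true);

        // Stub the battery so it always returns both wells, bypassing
        // the real well-lookup code entirely. A change to the real
        // lookup logic would therefore never break this test.
        Battery battery = new Battery() {
            public List findWells() {
                return Arrays.asList(new Well[] { producing, shutIn });
            }
        };

        // execute and assert: the business rule under test is that
        // shut-in wells still count when processing a contract
        Contract contract = new ContractProcessor().process(battery);
        assertEquals(2, contract.getWellCount());
    }
}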

One could argue first of all that's what the on-site customer is for in XP, and secondly that one should also have customer-facing integration tests (FIT tests) that would probably fail as a result of this type of change. Still, in the first case the customers were busy and may not have thought of the problem the failing contract test had identified, at least not right on the spot. As for the integration tests, by definition these would probably be run as a batch job overnight, so I wouldn't find out about the problem I had checked into source control until quite a bit later on.

It's this sort of thinking that leads me to believe that over-mocking is a bad idea. If a test triggers code that a) accesses external resources that are not available at the time the test is being run, b) takes too long to run, or c) does not yield consistent results over time (as in the case of a stock ticker or a system clock), then that test is a good candidate for using stubs or mocks. Otherwise I think it's best to avoid mocking or stubbing as a matter of principle.

Wednesday, October 11, 2006

Command Pattern

Jeff wrote, commenting on one of my posts about validation:

I also like the idea of just having an isValid() method type solution. I did a 1 day spike on our Well class to look at implementing our validation as separate business rule objects. This way, you can easily validate based on a given context, so you could have different rules be applicable during creation vs during an update (as an example).

I was particularly inspired by this post:
http://www.jpboodhoo.com/blog/ValidationInTheDomainLayerTakeOne.aspx
This comment made me think about the command pattern and its applicability to different problems. The basic idea of the command pattern is to convert a function into a full-blown class. You can read more about it here on Wikipedia. I've used the command pattern in a number of projects in the past. Here are a couple of examples:

On one of my projects, users were able to choose from a number of batch processes they wanted to run, either to load data or to do diagnostics on large recordsets all at once. In general there were several hundred thousand records processed by each such batch job. Another developer started off by writing code that implemented each process as a function in a large class. This was an oil and gas application, so he had functions like loadWells(), loadPools(), etc... all in one class. There was a lot of duplication from one function to another and the code was messy, so one of the first things I did was to refactor it into the command pattern. I turned each loading process into a class that implemented a single method, execute. For example, I had a WellLoader class and a PoolLoader class that both extended a BatchProcess abstract base class which exposed an abstract execute method. The base class also exposed some protected methods that permitted, among other things, a progress indicator during batch processing (since each batch job generally took several hours). When users selected which batch processes they wanted to run, the system would create an instance of each job, add them all to a list, and then successively call the execute method on each one. Whatever the details of implementation of each batch process, its only public interface consisted of the methods in the BatchProcess class, primarily the execute method.

Another case where I used the command pattern was an embedded system in the SCADA world that did scheduled polling of sensors. In this case I took advantage of the command pattern to schedule and execute different kinds of polls and alarms. I set up each poll or alarm as a different class. The scheduler class would loop through each process in its queue and if it was time to run that process, it would call its send method.
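Here is a minimal sketch of the batch-loading version. The class and method names come from the description above, but the bodies are invented for illustration (the real code had progress reporting, error handling, and much more):

import java.util.Iterator;
import java.util.List;

// Each batch job is a command: its only public entry point is execute().
public abstract class BatchProcess {
    public abstract void execute();

    // Shared plumbing, like progress reporting, lives in the base class.
    protected void reportProgress(int percentComplete) {
        System.out.println(getClass().getName() + ": " + percentComplete + "%");
    }
}

class WellLoader extends BatchProcess {
    public void execute() {
        // load well records here...
        reportProgress(100);
    }
}

class PoolLoader extends BatchProcess {
    public void execute() {
        // load pool records here...
        reportProgress(100);
    }
}

// The runner neither knows nor cares what each job actually does.
class BatchRunner {
    public static void runAll(List jobs) {
        for (Iterator it = jobs.iterator(); it.hasNext();) {
            ((BatchProcess) it.next()).execute();
        }
    }
}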

In Domain Driven Design, Eric Evans shows how the command pattern can be used to create specification objects. Such objects can be chained together to validate constraints, create objects, or filter lists. I think the example Jeff presents above falls into this general category. It's an interesting idea, but I think some caution is warranted, because the goal of patterns is to simplify code and reduce duplication. If validation rules can be re-used or need to be chained together and executed in different combinations at different times, then it makes sense to use the command pattern. However, if validation rules are always applicable to a given domain object and are not re-used between different classes, then maybe the less flashy approach of just making each validation a method in a Validator class is more readable and good enough.
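As a rough illustration of the specification idea (the method names here follow the common presentation of the pattern, not Evans's exact code):

// A specification wraps a single rule as an object so that rules can
// be reused and combined in different ways at different times.
public interface Specification {
    boolean isSatisfiedBy(Object candidate);
}

class AndSpecification implements Specification {
    private final Specification left;
    private final Specification right;

    AndSpecification(Specification left, Specification right) {
        this.left = left;
        this.right = right;
    }

    public boolean isSatisfiedBy(Object candidate) {
        return left.isSatisfiedBy(candidate) && right.isSatisfiedBy(candidate);
    }
}

Given individual rules like notShutIn and sameColorAsBattery (hypothetical examples), one could then chain them with new AndSpecification(notShutIn, sameColorAsBattery).isSatisfiedBy(well). If you never need that kind of recombination, the plain Validator-method approach from the previous paragraph is simpler.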

On a quick final note, I recently read Jamie Mcilroy's post about JSF validation. It seems as though JSF binds the gui directly to a domain object and invokes validation separately for each form field. I don't know how much I like that. I think I'd rather JSF invoked validation on the whole domain object in one call and used keys to determine which form fields had errors. Anyway, given the way it seems to work, that might be a good opportunity to take advantage of Jeff's idea of making each validation a separate class. That way, each attribute validation can easily be invoked from JSF but still be independent from the JSF framework itself.

Tuesday, October 03, 2006

Defensive Programming: Handling Nulls

C was one of the first languages I learned to program in. In C, you access memory directly with pointers and it's not unusual for bugs to arise from improperly dereferencing those pointers. This generally causes the application to crash, often with a segmentation fault or bus error, and sometimes the application manages to carry on, overwriting memory until it finally fails, often in a way that is very difficult to reproduce or track down. Accessing memory directly certainly gives the maximum amount of control, but it's also very error-prone. In managed/interpreted environments, like Java, C#, Perl, Python, or Ruby, one can't access memory directly. Instead, the runtime environment interposes a handle to an object or structure. This prevents most of the memory problems that arise in C applications. However, there is still the problem of using references that don't point to any particular object. These are called null references, or nil references, or something along those lines. In Java, when you call a method on a reference that doesn't point to any object, the runtime throws a NullPointerException. I'd rather it throw something like a NullReferenceException, but that's neither here nor there :). It's not nearly as bad as having memory problems in C, because you can trap these exceptions and then your application can carry on, but it doesn't improve user confidence, and it leaves parts of the application inaccessible to the user.

One of the ways to handle such exceptions is called the Null Object pattern. Basically, the idea is to guarantee that a reference to a particular kind of object will always be initialized with a stub that implements some kind of harmless do-nothing behaviour for that type of object, so that an object will always be there when you call a method on an object reference. Let's say your application involves keeping track of customers. If you search for a customer with an invalid id, instead of returning null, you'd get back a Customer object. If you try to call any methods on this object, they will either do nothing or return intelligent defaults. For example, getAge() would return 0. This is a useful pattern, but there are some issues to keep in mind. First of all, there will always be cases where one must distinguish between a real object and a null object stub. This is generally done by implementing an isNull method for all domain objects (in Java, you could create a Substitutable interface for this purpose). This isNull method will only return true when executed on Null objects. Secondly, defining the proper do-nothing behaviour for complex domain objects can lead to overhead. In an example below, I will try to show what might need to be done to rigorously implement Null Object. Also, if you want to use the Null Object pattern, you should adhere to the Law of Demeter to avoid having to implement a whole bunch of unnecessary Null Object types. In other words, instead of having something like student.findExam(3).getGrade(), you would simply call student.getGradeForExam(3). This approach makes the Student class easier to turn into a Null Object. If you tend to be checking object.isNull all the time, then maybe this pattern is not the right one to use, since it is designed for cases where most of the time you are ok with the Null Object's default behaviour. Finally, I'm not sure it's always possible to define proper do-nothing behaviour that won't depend on some context, though I can't think of a real case of this problem at the moment.

Here is an example I've tried to cook up to show what Null Object implemented throughout an application might look like (it's Java code, as usual, since that's the language I feel most comfortable with).

public interface Substitutable {
    public boolean isNull();
}

public abstract class NullObject implements Substitutable {
    public boolean isNull() {
        return true;
    }

    // Must take an Object parameter to actually override Object.equals;
    // a null object never equals anything.
    public boolean equals(Object other) {
        return false;
    }
}

public abstract class DomainEntity implements Substitutable {
    public boolean isNull() {
        return false;
    }
}

public interface Customer {
    public String getName();
    public void setName(String name);
}

public class CustomerImpl extends DomainEntity implements Customer {
    private String name;

    public String getName() {
        return name;
    }

    public void setName(String aName) {
        name = aName;
    }
}

public class NullCustomer extends NullObject implements Customer {
    public String getName() {
        return "";
    }

    public void setName(String s) {}
}


Note the extra overhead of an interface for each domain class, needed (at least in my example) to more easily support NullObject. To introduce methods in Customer that return proper domain entities rather than value objects like String, I'd have to provide NullObject support for those as well! Also, I've implemented the default behaviour for the equals method to always return false. I wonder what the right behaviour for hashCode might be. Hmmm. It's definitely a bit of extra work. Is it worth it? I haven't ever tried this approach so I'm not really sure.
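To show the payoff, here is a hypothetical lookup that returns a NullCustomer instead of null (the repository class and its persistence details are made up for illustration):

public class CustomerRepository {
    public Customer findById(long id) {
        Customer found = lookup(id);
        return found != null ? found : new NullCustomer();
    }

    // Stand-in for real persistence code.
    private Customer lookup(long id) {
        return null;
    }
}

Calling code then never has to null-check: getName() on a failed lookup harmlessly returns "", and the not-found case can still be handled explicitly with isNull() where it matters.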

While the Null Object pattern can be helpful in eliminating null exceptions, it has to be implemented judiciously throughout an application. Another approach I tend to use to prevent having to check for nulls all over the place, which I haven't seen on too many other projects, is to write static utility methods for many standard operations that compare two objects in some way. For example, one often finds code like if (a != null && a.equals(b)) - I suggest replacing that with if (Util.equal(a, b)). Here is how I implement the Util.equal method:
public static boolean equal(Object a, Object b) {
    if (a == null || b == null)
        return false;
    return a.equals(b);
}
Again, if a were a Null Object, it would implement the equals method by just returning false.

Other ideas include numeric comparisons on non-primitive numeric types, e.g. isZero(Number a), isLessThan(Number a, Number b), etc..., date comparisons, and so on.
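A few null-safe helpers along those lines might look like this (a sketch; the exact semantics, such as treating a missing value as "not comparable", would have to match your domain):

import java.util.Date;

public final class Util {
    private Util() {}

    public static boolean equal(Object a, Object b) {
        if (a == null || b == null)
            return false;
        return a.equals(b);
    }

    // Null-safe numeric checks on boxed types.
    public static boolean isZero(Number a) {
        return a != null && a.doubleValue() == 0.0;
    }

    public static boolean isLessThan(Number a, Number b) {
        return a != null && b != null && a.doubleValue() < b.doubleValue();
    }

    // Null-safe date comparison: false if either date is missing.
    public static boolean before(Date a, Date b) {
        return a != null && b != null && a.before(b);
    }
}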

Saturday, September 30, 2006

Validation IV

I figured I would write a bit more about validation after my last three already long-winded posts on the subject. Basically this is just a brief synopsis with an example and a few minor elaborations. My major conclusion is that all domain-level validation that one would think to put into setting methods or initializers/constructors should instead be done via a single method for each domain object. Let's call that method valid, or isValid, or validate. Ideally, rather than throwing an exception every time it finds an error, this method would add messages as it goes along, and would only throw an exception when it's done. In Java, here's a typical example:
public void validate() throws ValidationException {
    Messages messages = new Messages(this);
    checkStartDateAndEndDate(messages);
    // more validations go here
    if (!messages.empty())
        throw new ValidationException(messages);
}

private void checkStartDateAndEndDate(Messages messages) {
    if (Util.empty(startDate) || Util.empty(endDate)) {
        if (Util.empty(startDate))
            messages.addError(Messages.ENTER_START_DATE);
        if (Util.empty(endDate))
            messages.addError(Messages.ENTER_END_DATE);
    } else if (Util.before(endDate, startDate)) {
        messages.addError(Messages.START_DATE_MUST_PRECEDE_END_DATE,
                startDate, endDate);
    }
}
I think it's important to avoid performing domain object validation in setting methods, and instead to collect these validations in a single method. The reason, as stated in my earlier posts, is so this method can be called to validate dependencies from different objects, not just when this particular object is being instantiated or modified.

However, what of getting methods, or methods that calculate or process? (The difference between such methods and simple getting methods is not obvious to me. I consider it to be the difference between methods called during basic data entry vs. methods that are called afterwards.) It's not possible to avoid validating while processing, at least I don't think so in most cases. It may be possible in theory, but in practice you really can't validate all the possible results of calculations before actually performing those calculations. Therefore there will always be method calls on objects which can generate meaningful exceptions (i.e. not bugs) even though the validation methods used to save objects entered into the system have all passed. I suggest using a different type of exception when throwing these kinds of errors, maybe something like ProcessingException instead of ValidationException. In a language with compile-time exception checking, that has the added benefit of helping to make sure that you are not calling methods that throw ProcessingException inside of your validation methods.
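For example, a calculation that can only fail at processing time might look like this (the allocation method and its rule are hypothetical; the point is just the distinct checked exception type):

// Thrown for meaningful failures during processing/calculation, as
// opposed to ValidationException for data-entry validation.
public class ProcessingException extends Exception {
    public ProcessingException(String message) {
        super(message);
    }
}

public class Battery {
    // Even if all saved data passed validate(), the calculation
    // itself can still fail in a meaningful, non-bug way.
    public void allocateProduction() throws ProcessingException {
        if (totalMeasuredVolume() <= 0.0)
            throw new ProcessingException(
                    "Cannot allocate: no measured production for this battery.");
        // ... allocation logic goes here ...
    }

    // Stand-in for real code.
    private double totalMeasuredVolume() {
        return 0.0;
    }
}

Since ProcessingException is checked, the compiler will complain if a validation method accidentally calls allocateProduction() without handling it.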

Sunday, September 24, 2006

Validation III

This post is a continuation of my original article about validation. In that post I brought up as an example the case of multiple oil wells connecting to a single battery, where a battery - for the sake of simplicity in this example - is just a tank that stores the production from all of the wells. Again as a simplifying conceit, I brought up the idea that the battery and wells connected to it should always be of the same color. Having set up a red battery and hooked up a bunch of red wells to it, I then brought up the problem of validation deadlock: What do you do if you realize that you meant to make the battery blue, and the wells should be blue as well? Assuming you always validate all of the wells at a battery when you make changes to the battery's attributes, and also that you validate each well when you change its attributes, you wouldn't be able to save your changes from red to blue. Changing the battery would produce an error message stating that there are red wells connected to it; changing a well would generate a validation error stating that the well is connected to a red battery. We're stuck!

One can take several approaches to tackle this problem, and the particular approach one decides on really depends on the application itself. One idea might be to simply not permit updating. In the case of our ongoing example, users would have to delete the wells and the battery and start over again, entering the data correctly. This simple approach may actually work for some cases where the information is quick and easy to enter, but it's not a very good idea most of the time. Having users re-enter data for all wells and for the battery too would likely be too time-consuming and frustrating, especially just to correct a simple data entry mistake.

Another approach, my favourite when I can get away with it, is to allow users to make changes on the user interface one at a time, then to save those changes all at once. Behind the scenes, the application deletes and re-creates everything from scratch. This kind of approach generally requires that all of the data being worked on can be managed on a single screen. In the case of the battery-wells example and similar scenarios, one can imagine first displaying the battery data at the top of the screen, then each well in a list below. Clicking on a well would make its attributes editable. A disadvantage of this approach is that if the user is happily typing away making changes and the application suddenly crashes (or, in the case of a Web app, the user accidentally closes the window), all those changes are gone and must be re-entered. Such difficulties can be overcome by periodically saving the data behind the scenes - to the session, for example, in the case of a Web app. Two last points: Deleting and re-creating doesn't work well if there are other dependencies in the system on the existing objects, and performance would likely be a problem if there were a lot of data to delete and then recreate. So, in the case of a pure composition* relationship, this simple approach could work, but not in the case of aggregation**. It would probably not be a suitable solution to our well-battery problem.

A variation on this approach is to defer validation until the unit of work (or transaction) is ready to be committed. One can wait until that time to call the validation method on all objects inside the unit of work. This approach also solves the validation deadlock problem if all of the changes can be made in one unit of work. The description I gave above of one single screen that allows the user to edit both the battery and all of its wells in one go is an example of a case where I think this approach would work. If however the battery is on one form and each well also has its own form, this idea won't help.
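A bare-bones sketch of that deferral, assuming each domain object implements the kind of Validatable interface discussed in these posts, here with a validate() method that throws ValidationException (the UnitOfWork class itself is invented for illustration):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Collects every object touched in the transaction and validates them
// together at commit time, so mutually dependent edits (like the
// battery and its wells both changing color) can't deadlock.
public class UnitOfWork {
    private final List dirty = new ArrayList();

    public void register(Validatable object) {
        dirty.add(object);
    }

    public void commit() throws ValidationException {
        for (Iterator it = dirty.iterator(); it.hasNext();) {
            ((Validatable) it.next()).validate();
        }
        // only now write everything to the database...
    }
}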

* Composition is the idea that an object is made up of constituent objects which have no life of their own. That means that when you delete the main object all of its parts can be safely deleted as well. There are never any external dependencies on these parts. Pure composition is fairly rare in the software world, even in cases where it seems to be the right answer at first glance. For example, going back to our ongoing example, it might seem reasonable to delete all the wells associated with a battery along with the battery itself. However, those wells might be connected to different batteries later or earlier in time. Here's an example of composition from the application I am currently working on. There are meters in this application which measure volumes of oil and gas. At a sales meter, you may enter in the amount of oil sold in a month. You can also enter in priorities for that meter to define how the sales are allocated. You may want to first allocate sales to a particular producer, then to the rest. Since these priorities are defined for one meter and one meter alone, they can be deleted along with the meter itself.

** Aggregation is the notion that an object contains other objects, but that these other objects do have a life of their own. A very typical example is a university course catalog. You have courses and students enrolled in those courses. A student can be enrolled in many courses at once, and if you delete a course from the system, you definitely don't want the system to delete all of the students who may be enrolled in it at the same time.

Let's say that the preceding strategies are not the right answer for our battery-well example. What can we do then? One solution that I personally like is to put the power to validate in the hands of the user. You can let the user toggle validation. If we have a convenient single validation method for each relevant domain object as discussed earlier, that's fairly easy to do. If the validation flag is turned off, the validation method simply isn't called, and the information entered is saved even if it is invalid. One can also alter the behaviour of this method to generate warnings instead of errors. That way the user still sees all messages that the validation produces, but it doesn't prevent saving changes. In order to work with the application beyond just data entry, presumably the user would have to toggle the validation back on. You have to be careful that subsequent validations don't rely on previous ones. For example, if the first validation checked that a particular attribute is not null and a subsequent one uses that attribute, you will have to put up guards to validate only if the attribute is set.

One can also remove the notion of a user manually disabling validation by having the validation method generate warnings when it is called during data entry, then have it automatically produce errors when it is triggered prior to processing that actually uses the information that was entered into the system.
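One way to express that switch in code (a sketch; the severity flag, the addWarning method, and the color-check rule are all my own inventions for the example):

public class Well {
    public static final int WARN = 0; // during data entry
    public static final int FAIL = 1; // prior to processing

    public void validate(int severity, Messages messages)
            throws ValidationException {
        if (!sameColorAs(getBattery())) {
            if (severity == FAIL)
                messages.addError(Messages.WELL_BATTERY_COLOR_MISMATCH);
            else
                messages.addWarning(Messages.WELL_BATTERY_COLOR_MISMATCH);
        }
        // ... more checks here ...
        if (severity == FAIL && !messages.empty())
            throw new ValidationException(messages);
    }

    // Stand-ins for the real domain code.
    private boolean sameColorAs(Battery battery) { return true; }
    private Battery getBattery() { return null; }
}

During data entry the application calls validate(Well.WARN, messages) and just displays the messages; before processing it calls validate(Well.FAIL, messages), and saving is blocked.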

That's about it. I will finish up by discussing a few odds and ends. I've encountered an application that was split in two. There was a 'staging area' application, in which users were able to enter whatever data they wanted. The application would generate warnings for inappropriate data. Once the data was 'clean', it was then transferred into the 'production area', where no errors were allowed. I don't know how well this worked out as I wasn't actually working on this application, but it's not a solution I'd recommend. First, it means more programming. Second, it means the whole process of using the system becomes much more involved. Finally, it doesn't solve the validation deadlock problem: What do you do if you realize you've promoted data to the production system with mutual dependencies that both need to be changed?

A very simple, but in my opinion inelegant, solution to the validation deadlock problem is going through the backdoor - that is, issuing a SQL statement to update all of the appropriate records at one time. Since this bypasses the application layer entirely, there is no need to worry about validation. Even if there are database constraints in place, these can usually be deferred until the transaction is ready to be committed. While I believe this technique has its place, it's not a good idea in general. It requires outside intervention in most cases, since few users know enough about databases and SQL to do it themselves, and it is quite error-prone since, as I mentioned earlier, it bypasses the application.

Sunday, September 17, 2006

Validation II

This is a continuation of my previous posting about validation. When validating attributes on an object, the setting method on the attribute would seem to be the most obvious place for validation logic. However, this presents problems when the object must be validated externally. The first example I described in my earlier post concerned the need to validate all the wells associated with a battery when saving the battery - because changing an attribute on a battery could be invalid given the state of wells already attached to that battery. Since the wells themselves already exist, one wouldn't be calling any of the setting methods! A solution I generally adopt to this kind of problem is to write a 'valid' or 'isValid' method for each domain object - in fact I think it's a good idea (in languages like Java and C#) to define an interface (Validatable?) with this method and to implement it for all domain objects. That way, the valid method can be called after all attributes on an object have been set. This method can then also be called by the battery for each well. In my next posting I will discuss the validation deadlock problem.
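In Java, that interface could be as small as this (the name Validatable is tentative, as noted above; in the later Validation IV post this evolves into a validate() method that collects messages and throws an exception):

// Every domain object exposes one entry point for external validation,
// callable after all of its attributes have been set.
public interface Validatable {
    boolean isValid();
}

With this in place, the battery's own validation can simply loop over its wells and call isValid() on each one.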

Saturday, September 16, 2006

Validation

Validating state may not be the most exciting aspect of software development, but it's important for just about all applications. It's also a challenging area, because validation requirements vary from one application to another. The topic of validation is something I've been thinking about for years now, and in this post I will try to outline some of my thoughts about it.

I'll start with a common validation problem. It's one that occurs in many applications I've worked on. I'll call it the Validation Deadlock Problem. Snazzy, huh? Most applications allow users to work with data via separate data entry screens. As an example, the application I am currently working on has data entry screens for 'wells' and for 'batteries'. When you create a well, you have to connect it to a battery. Only certain kinds of wells are allowed to connect to certain kinds of batteries. Let's say for the sake of simplicity that wells and batteries have to be the same 'color'. The first step is to create a battery. Then you can create a well and hook it up to that battery. Once you've established the connection though, you can go back and change the well or the battery. Generally one wishes to validate all objects in an application before they are actually saved to the database. However, that's not always easy. Let's say you've realized that you set up your well/battery combination incorrectly. You had a 'red' well connected to a 'red' battery, but actually both should be 'blue'.

This presents two problems for validation. The first problem is that when you change the battery, you can't just validate its own state alone. You also have to validate the wells connected to it. There is a performance cost to validating all the wells connected to a battery every time you make a change on the battery screen, and it may annoy users that editing a single field is slower than they expect. That's the first problem. The second problem is the dreaded validation deadlock: Assuming you bite the bullet and validate all the wells every time you save a battery, you try to change the battery to blue but you get a validation message saying that you have a red well attached to that battery. The difficulty is that you also can't change the well to blue because it's still connected to a red battery. The particularly annoying aspect of this problem is that in order for the user to be able to change both the battery and well(s) to blue, the well object(s) or the battery object must be allowed to go into an invalid state and be saved that way to the database, in other words, across transactional boundaries. Generally the goal is to make sure all objects are always in a consistent state before they are persisted, but here that seems impossible. In my next posting I will explore some solutions to the two problems I've described.

Sunday, July 30, 2006

Ravi's Post About Boring Enterprise Software

I really enjoy reading Ravi Mohan's blog. Recently I read an entertaining post about enterprise software development, and how truly boring it is, written in response to a post by Martin Fowler. Here it is, "But Martin, Enterprise Software Is Boring!" Hehe. :)

In reading Ravi's post, I thought about a few things, like what makes software development fun and interesting. Ultimately, I came to the conclusion that software has a number of dimensions to it. The first cool thing about software is that you get to build something incredibly interactive. Software has a magical quality to it. It's almost mystical. Most things in life, you can kind of see how they work; they're physical. But on a computer, you just click on some keys for a while and all sorts of things may appear on the screen. You can make a video game, a business application, a compiler. It's amazing! The first computer I remember using was the Commodore 64. I learned the "poke" command and would poke in random numbers. Sometimes the screen would change color; sometimes nothing would happen; sometimes the system would just go wonky. The amount of technology that goes into making that sort of thing possible is truly vertiginous - from the design itself to physical manufacturing of all of the circuits and I/O to the logic of the operating system and device drivers, and finally to the development environment and compiler or interpreter. A few hundred dollars' worth of hardware and software that anyone can pick up at a local electronics outlet represents one of the crowning technological achievements of our age. They're so ubiquitous that we tend to forget how extraordinary computers really are. When we program, we have a kind of deep connection to this incredible phenomenon.

I think there are different personalities in the software dev world. Some dimensions of software development are only barely visible to the end user; in XP, when making a new piece of software, they are often labeled "motherhood stories." Performance is one of them, perhaps the most common. In business software, often called enterprise software, it is often easier to spend 10, 20, or 50k on improved hardware rather than optimize memory or cpu. From a user's point of view, the software either performs adequately or it doesn't. How one gets there doesn't matter much to the user. Some developers really enjoy the challenge of finding creative solutions to such problems, and reasonably often, simply buying hardware won't cut it. Buying hardware is the brute force solution. It's how Alexander The Great "untied" the Gordian Knot and went on to conquer most of the known world at the time. When it works, it works. However, in the world of operating system design, compilers, video games, and embedded systems, it is often not a reasonable solution. Here, the cleverness of systems and framework programmers comes into full force. The user requirement is very simple: Make it work with this much memory and this much cpu. The hacker then goes off and happily works away behind the scenes to make that requirement a reality. It can be an enormous amount of work that the end user never fully appreciates, just as none of us truly appreciates the wonder of being able to flick a switch and voila, the lights are on, the TV is on, we can check our e-mail. For the everyday business programmer, most such techniques are not only unnecessary, they would probably be counterproductive; using them would lead to a less maintainable system.

Another aspect that some developers are attracted to in such "hardcore" systems is novelty. I suspect that developing one's first compiler must be very interesting. I myself only ever touched the surface of compiler-writing in a university course, but I can see how the sophistication of optimization can become tantalizing. However, I think that novelty is something that never lasts, no matter what domain one is working with. I once worked with a very smart software developer who had been in the game for 20 years. He had his own company and had written numerous compilers and interpreters. For him it was routine, and he never wanted to do it again. Ultimately, everyone must make that decision in life. If you're only interested in novelty and the excitement of new problems, the research field is probably the best place to go. Even there, you will encounter the tedium of having to publish papers and cite references. If you're a professional software developer, one way or the other you will spend a certain amount of time learning new things, but more time building software to a customer's specification. If all you do is develop operating systems, or compilers, or quantum simulation technologies, whatever it is, soon enough you will have a standard set of tools and your main focus will be to figure out how the building blocks and tools you have fit into what your customer wants. In that respect, I think all professional software developers are in the same boat. Our main job is not to tackle a sexy new problem every day; it's to understand what our customers/users want and to provide a quality product in a timely fashion.

For me, the interesting thing about software development has to do with understanding the business needs of the customer. One ability I've had to hone is the capacity to understand a customer's business on the fly. I've worked in many different areas, and within a few months of starting a project I've had to get to the point where I could probably get hired to do my customers' job, usually at an entry level, I'm sure. Still, being able to understand the fundamentals of various businesses, the motivations, the potential efficiencies, that's a real skill, as valid in its own right as being able to optimize C code to operate in under a megabyte of RAM. Another area is modelling.

Being able to model a piece of software so it's maintainable, so new features fit in nicely, while dealing with the fact that many business rules are quite arbitrary, is a difficult skill. It means walking a fine line between generalizing code to avoid duplication and avoiding over-design that leads to crufty frameworks that won't accommodate tomorrow's strange and inconsistent variations. More "technical" software projects are often far neater and more orthogonal, and therefore more amenable to a general analysis. As an analogy, consider how some differential equations can be solved analytically, but most can't. There is a neatness to solving something completely, but the messy reality of the world is that you come up with partial solutions that suit you in practice. Anyway, I can see how a tendency to construct a complex mental model of the whole system right up front can be a good thing when developing a compiler, while it actually might be a very nasty trait when developing certain business applications.

I guess my point is that people tend to vary quite a bit in the problem domains they're interested in or are good at solving. We tend to assume that whatever our particular area of interest or expertise is, that's the really hard thing to do. The truth is that software is a difficult thing to master for a reason, precisely because there are so many dimensions, and I'm sure I've only scratched the surface in this post!

Sunday, July 23, 2006

Software Testing

These are some notes from an evening course I've taught at the
University of Calgary in Continuing Education for a few years.
Eventually I hope to organize them better and merge them into
something more whole and coherent, but for now here they are so
I won't forget where I put them!


Unit Testing and Refactoring
============================

The topic of this class is TDD, or Test-Driven Development. TDD is
about developing high quality code, keeping the design simple, and
having a regression framework in place for code maintenance. However,
TDD is not sufficient to ensure the quality of commercial or
enterprise applications. In general terms, I break down testing
into three general categories:

I) Developer Tests (TDD)
II) QA Tests
III) User Acceptance Tests

Each category is somewhat different from the others. The primary
goal of developer testing is a strong design, high code quality, and
low defect rates.

I) Developer Tests (TDD)

In TDD, a developer always writes a test first, before writing any
code. The steps in TDD are always the same:

1) Write a test
2) Make sure the test fails as expected. If there are new classes or
methods involved, then the test won't even compile in a language
like Java or C#.Net. If the function already exists, but the
code inside has been modified, then make sure the test
initially fails the way you expect it to fail. Let's say you
have a function which calculates change for a purchase and you
are writing a test to cause it to break change down differently
depending on the quantities of available denominations; so whereas
it currently returns one quarter, it is supposed to return two
dimes and a nickel. Make sure the test fails initially by returning
a quarter (as opposed to simply crashing or actually returning the
right change before you've modified the code).
3) Write the new code and make sure the tests now all pass.
4) Examine the design and refactor any duplication (I'll discuss
refactoring in more detail in another class).

That's it, now rinse and repeat!

I'd like to make it clear that the goals of TDD are somewhat
different from the goals of other kinds of testing. TDD is
a development activity. The goal of TDD is first and foremost
to drive the design of the code itself. Using TDD ought to
generate code in which independent concerns are expressed in
separate pieces of code. Such separation makes it easier to
test the code. If the code is hard to test, that implies the
design of the code is not optimal. If the code is easy to
test, then the design of the code is better. Better code means
it's easier to add new features and it also means the QA people
will find fewer defects, and especially should find almost no
defects related to basic functionality.

You can look at an application as a bunch of nested boxes. The
innermost box is the code framework you're using to develop on
top of. It's there before you've written a single line of your
own code. Then you develop code around that kernel. The code
tends to become organized in ever wider layers, although some
"inner" layers may depend on outer layers, so the layering
is rarely "perfect." Still, good code generally has this sort of
hierarchical structure:
 _____________
[    __A__    ]
[   [  B  ]   ]
[   [ [C] ]   ]
[   [_____]   ]
[_____________] etc...

If you're writing tests for code in B the general approach is
as follows: If necessary, you can "mock" out functionality in C
by substituting a mock/stub/fake instead of the real code in C.
This kind of thing is done when the real code in C accesses
external resources which are irrelevant to testing the logic in B.
Having extensively tested B by itself, you would then write
a comparatively small number of tests against the application
as a whole making sure that the code in B is actually used by
the application. Such "sanity" tests will make sure the code in B
really has been included in the app and works for a few basic
cases. You can think of such "functional" tests as poking lines
through the outer layer of the application all the way through.
Note, do not confuse a functional test in TDD with a FIT test,
which is a functional integration test. The extensive testing has
already been done, so even though these functional tests do not
test every path through B, they make sure it's properly fitted
into the application as a whole.

A functional test through B looks something like this:
 ____\________
[   _\_A___   ]
[  [  \B  ]   ]
[  [  [\C] ]  ]
[  [_______]  ]
[_____________] etc...


You can see it's hard to write enough tests like this to properly
cover B. That's what the unit tests are for.

Here is an example unit test in Java-like syntax (modified from real
Java to make the example more readable):

public void testCalculateChange_WithChangeLevelling() {
    //setup
    VendingMachine vm = new VendingMachine();
    vm.setNumberOfQuarters(10);
    vm.setNumberOfDimes(100);
    vm.setNumberOfNickles(100);

    //execute
    int[] change = vm.makeChange(25);

    //assert
    assertEquals("use nickles and dimes", [10, 10, 5], change);
}

Note: Before implementing the "levelling" algorithm, makeChange should
return [25] instead of the expected [10, 10, 5]. Note that this test
just tests the makeChange function. It does not worry about how the
amount of change itself is calculated. A functional test would
be more complicated because more setup would be required, but
there are generally fewer of these per feature:

public void testFunctionalTest_Purchase_WithChangeLevelling() {
    //setup
    VendingMachine vm = new VendingMachine();
    vm.setNumberOfQuarters(10);
    vm.setNumberOfDimes(100);
    vm.setNumberOfNickles(100);
    vm.addItem("Mars Bar", "$0.75", 15); //15 mars bars; 75 cents

    //execute
    int[] change = vm.purchase("Mars Bar", "$1.00");

    //assert
    assertEquals("use nickles and dimes", [10, 10, 5], change);
}


II) QA testing is done with a build of the application provided by
the developers. Therefore QA testing is done after a certain amount of
code has been developed. QA testing is usually done by QA
professionals often with support from users or business people to
make sure the developed software really does work and meets the user
requirements. Some QA testing can be automated with scripts. On
our project, we use a tool called Watir (http://wtr.rubyforge.org/)
to script interactions with the Web application as if a real user
were clicking on buttons and entering data.

III) User acceptance tests are generally written before the
application code has been developed. The format of the tests is
a table with the inputs entered in by the user and expected outputs.
Later, such tests are linked in with the application and executed
to determine whether they've passed. FIT (http://fit.c2.com/)
and Fitnesse (http://fitnesse.org/) are tools commonly used in
the XP community for such "functional integration" testing.




Friday, June 16, 2006

Agile In Action

I don't know to what extent the notions of up-front requirements analysis and design are still being actively used these days, but I've heard a lot of debate on the subject over the past few years. I'm currently working on an agile project and today I had a good experience in which three of us, a customer and two developers, worked together to flesh out a requirement in a way that simply would not be possible in a more traditional environment heavy on up-front requirement preparation. Since this is an example from the real world, I'll be using project-specific terminology, but I hope to make the general sense of what's going on clear to anyone reading this blog. If you're reading this and have something to say that may clarify matters, I encourage you to e-mail me and I'll update this posting.

In our application (oil and gas production accounting), we have the notion of a gas-equivalent factor (ge factor). It is a number that you multiply by a volume of light oil, also called condensate, to get an "equivalent" volume of gas. It's a bit like comparing, say, two grads from different schools. One grad has a GPA of 3.2 and the other one has a 78% average. To compare them, you may want to convert the GPA of 3.2 to a percentage, or vice versa. Anyway, the point is that this number is displayed on several screens of the application. Now, there are several different ways that this number is obtained, and as part of a story I was working on, I had to include the source of the ge factor along with the number itself. The story (similar to a use case) specified that one of two icons should be displayed along with the factor: "A" for analysis, and "E" for entered. In one case, an analysis (a breakdown of the molecular constituents of the condensate) is used to calculate a ge factor; in the other case, the user simply enters in the factor manually. As I began to get into the code, I realized that there was a third case that the story didn't talk about. I turned to the customer who had put together the story, who sits just across from me, and asked him about it: Al, what happens when the ge factor is not necessarily derived from an analysis, but it's averaged from ge factors at several other measurements? After a bit of discussion, the user and I agreed to include a third icon as part of the story, "C" for calculated.

I implemented the requirement simply by putting a text label "A", "C", or "E" next to the ge factors on the appropriate screens. Then I went to talk to our resident gui (graphical user interface) expert about the story: Hey Chris, I've put these text labels next to the ge factor on our balancing screens. Could you cook up some icons that look a bit nicer? Chris came over and asked Al: What if, instead of an extra icon next to the factor, we turned the factor itself into a link? If the number came from an analysis, the link would actually take the user to the analysis that was used. If the number was entered, the link would lead to the screen where that number was entered. The user really liked this idea. I objected: In the case that the number was calculated by averaging several measurements, it would be a fair amount of work to create a new screen that showed all of those measurements in one place. The user however told us that it wasn't necessary to go that far. If the number was an average, simply omitting the link was fine.

The buzzwords "multidisciplinary" and "synergy" are used a lot these days, but in this case, we solved a problem by combining our skills and perspectives. Requirements are a bit like art. If you're a customer, you generally know what you want when you see it, but describing it ahead of time isn't so easy. In a non-trivial application, it's hard to think of all the possible scenarios for a given feature. As a developer, I'm close to the code and I can see those scenarios, so it's a lot easier for me to ask the kind of question that I asked. Finally, the user interface expert was concerned about clutter on the screen and how effective the user interface would be, whereas I just cared about the fact that the right information would show up as described in the story. We all worked together to come up with a better solution - without really extending the development time. The whole discussion probably clocked in at about 15-30 minutes, and nothing about the requirement caused an enormous amount of extra work. In the future, if someone decides to show all of the measurements that contribute to an "averaged" ge factor, we can implement that as a separate story. The important thing is that the users currently don't consider it to be especially useful or a high priority. We've built the software not to honor a general principle of orthogonality, but to meet our users' actual requirements.

I was really impressed with the process we went through today and I thought it was a nice, simple example of the kind of power an agile approach can have.

Sunday, June 11, 2006

Just-In-Time Performance Tuning

I've been thinking a bit about performance tuning business applications lately, after reading some of Ted Ogrady's recent writings on the subject (see Empirical Performance, Elaboration, and Response to my concerns about risk). Ted pointed out that some authorities in the agile realm recommend avoiding premature optimization (see Fowler and Beck). It's true that optimizing code early can lead to problems, and especially that one should not optimize code without profiling it first. However, I do think that letting the performance get bad enough to upset users is a bad thing, and seems out of character in agile development. For example, Kent Beck writes:

Later, a developer pulled Kent aside and said, "You got me in trouble."

"About what?"

"You said I could tune performance after I got the design right."

"Yes, I did. What happened?"

"We spent all that time getting the design right. Then the users wanted sub-second response time. The system was taking as long as fifteen seconds to respond to a query. It took me all weekend to change the design around to fix the performance problem."

"How fast is it now?"

"Well, now it comes back in a third of a second."

"So, what’s the problem?"

"I was really nervous that I wouldn’t be able to fix the performance problem."


Kent's suggestion is to do some envelope calculations to assess performance early in the project, but that's just a written-down artifact, the kind of thing that agile practices generally discount.

As I see it, the history of the performance tuning debate goes something like this: Early on, performance tuning had to be done all the time because hardware was so slow and memory was so expensive. Later on, as overall computer performance improved, there was a backlash against this kind of optimize-always behaviour. If you aren't desperately worried about memory, you don't need to use bit fields - that sort of thing. However, as a result of leaving performance issues for late in the project, I've seen a number of projects now where the performance becomes really terrible. Users get upset and the overall perception of management is not positive. The developers in this case say "Don't worry, we'll solve the performance problems later." However, these performance problems can start to affect the project. Users who are testing the app spend too much time navigating from one screen to another, and even the time to run the automated tests written by the development team suffers.

Basically, I think there is a better way: One of the best ideas in software development I've come across in the last while is the notion of a FIT test. A FIT test is a customer-facing test. My suggestion is to devote time to developing a relatively small number of performance-oriented FIT tests during each iteration. These tests execute an area of code where performance is important, under conditions that are as realistic as possible. Just as with normal FIT tests, performance FIT tests can be written before the actual code exists. Initially, there is no processing done and the test passes trivially. Each iteration, someone is responsible for maintaining the FIT test - adding setup data and making sure it runs without errors. If the test meets the established performance criteria, the bar is green; otherwise it's red. That's when we jump in with the profiler to get the performance back to acceptable levels. The code should remain as clean as possible, and only the minimum amount of tweaking required to make the test pass should be done. That way the users won't run across unacceptable levels of performance as the app is being developed, thus reducing risk and stress for everyone. The basic point I am trying to make is not that performance cannot be improved late in a project, but that maybe it doesn't have to be that way.
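Here is a sketch of what one of these tests might look like, written as a plain JUnit-style test for simplicity (the class names, the operation, and the two-second budget are all invented for the example; a real FIT test would wrap the same idea in a customer-readable table):

import junit.framework.TestCase;

// Hypothetical performance test: set up a realistic volume of data, run
// the operation users care about, and go red if it blows the budget.
public class BalanceReportPerformanceTest extends TestCase {
    private static final long BUDGET_MILLIS = 2000; // agreed with the customer

    public void testMonthlyBalanceReportWithinBudget() {
        Battery battery = TestData.batteryWithWells(500); // realistic setup

        long start = System.currentTimeMillis();
        battery.runMonthlyBalanceReport();
        long elapsed = System.currentTimeMillis() - start;

        assertTrue("report took " + elapsed + "ms; budget is "
                + BUDGET_MILLIS + "ms", elapsed <= BUDGET_MILLIS);
    }
}

Each iteration, whoever maintains the test tops up the setup data so the conditions stay realistic; the moment the bar goes red, that's the signal to break out the profiler.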

Saturday, June 03, 2006

Further Lessons In Humility

In an earlier post, I discussed creating an abstract class Node and extending it with either TestNode (for unit tests) or FacilityNode (for the actual production code) in order to move some functionality related to facility network topology out of the Facility class. I was kind of proud of my accomplishment, especially when it came to me that I had implemented a kind of "mixin" inheritance in Java. However, a friend of mine burst my bubble by pointing out that a simpler implementation existed. Namely, just move the topology code into something like a NetworkTopologyUtils class as static methods. Hence, we have something like this:

public class NetworkTopologyUtils {
    public static List findLoop(Node n) { /* code goes here ... */ }
    // more applicable methods below...
}

This is just as easy to test, and makes more sense, since the responsibility for finding loops no longer rests with a hard to pin down "Node" class. Now all that remains is to implement the Node interface (e.g. sendsTo) in Facility. For unit testing, one can just as easily write a TestNode class that also implements the Node interface.

What's the lesson here? For me, it's that I shouldn't fall in love with my own code. Also, I should not let my ingrained biases (against such static singleton classes, for example) get in the way of putting together the right design. Finally, there's nothing wrong with keeping it simple. Simplicity is good. Even though my example wasn't a great one, I still do like mixins in Ruby and think that Java should have implemented them (not to mention C#)! :)

Saturday, May 27, 2006

SmackBook

Way COOL!

http://www.youtube.com/watch?v=6uvQTTPr9Rw&www.reghardware.co.uk

Mixins, Interfaces, and Multiple Inheritance

As I've been doing Ruby programming for the last little while, I've discovered the concept of Mixins. Now, my programming experience in OO languages other than Java has been limited. I only did about a year of C++ at my first job before I ended up switching to Java, so that's why I haven't really been exposed to the notion of a Mixin before. But now that I've been using it a bit, I think it is in fact a rather wonderful concept! Interestingly, I've cooked up my own ersatz mixins in Java in the past without fully realizing what I was doing, but I think proper mixins really are more elegant!

A module in Ruby is similar to a class, with one major exception: it cannot be instantiated. You can execute methods on it statically, so for example you can have a Math module with functions like sin, cosine, absolute_value, and so on. More interestingly, you can mix a module into a proper Ruby class, whereupon all of the module's methods become available to the class as normal member methods - and the module's methods can in fact interact with methods supplied by the class.

This is similar to multiple inheritance as implemented in C++, but slightly weaker. C++ has full multiple inheritance, so it can do both interfaces as implemented in Java and mixins as implemented in Ruby as special cases. To make a C++ 'interface', you create a class with all pure virtual methods. To create a 'mixin module', you create a class with some pure virtual methods and at least one fully-implemented method which calls the pure virtual methods. In C++, however, you can do even more: you can simply write any number of fully functional classes, and then create another class which extends all of them. In OO terms, this means that wherever you have a reference to any of those base classes, you can substitute the subclass. Thus, every method from all base classes is valid for the subclass in terms of class invariants as well as preconditions and postconditions.

Both the Java and Ruby designers chose not to provide that degree of generality. I am guessing they felt that encouraging developers to inherit from more than one fully-functional class is asking for trouble, especially as the inheritance chain becomes more than one level deep! Mixins are a nice compromise: you can inherit functionality from a variety of sources (better than Java), but you can't arbitrarily extend any number of actual classes, with all the attendant headaches.

In Java, I have used delegation to do the same kind of thing that mixins provide, but without fully realizing what I was doing, until now! A while back I needed to write some code to find a loop in a network of oil facilities. Here is roughly what I did in order not to pollute the Facility class (loop finding after all is a lower-level behaviour) and to make the code easy to write test-first.
public abstract class Node {
    public abstract List sendsTo();

    public List findLoop() {
        // Recursively goes through the nodes in sendsTo().
        // If it finds itself as it goes along, it returns
        // a list of nodes that form the loop. Otherwise it
        // returns an empty list. In my implementation, if
        // there are multiple loops, only the first one
        // this method finds is returned.
    }
}
To get this to work with my Facility class, I implemented a class FacilityNode as follows:
public class FacilityNode extends Node {
    private Facility facility;

    public FacilityNode(Facility facility) {
        this.facility = facility;
    }

    public List sendsTo() {
        // Return a list of all dispositions of oil from this
        // facility with non-zero volumes. Ignore dispositions
        // of gas or water, and ignore oil dispositions that
        // have 0 volume.
    }
}
Then in the Facility class, I use this FacilityNode as follows:
public class Facility {
    // ... whole bunch of other methods

    public List findLoop() {
        return new FacilityNode(this).findLoop();
    }
}
This is admittedly far more awkward than simply mixing Node into the Facility class and implementing sendsTo, but it's the same general idea. One can see that the Node class can be expanded to include a variety of operations related to network topology and thereby provide far better overall cohesion in the application. In order to take advantage of these mixed-in methods, one need only implement a much smaller number of required methods, like sendsTo for example! Mixins are cool!

Tuesday, May 23, 2006

Mindless Ranting

When I talk about calling a function on an object, I use the word "method." I have no idea where it all started - probably those four gangstas started it. Those ghetto kids, always coming up with the crazy patois. What, plain English ain't good enough for 'em?

Isn't it such a weird term though? Saying you're calling a function on a particular object seems clear enough. The term function has been around since the dawn of programming, so everyone knows what it means. Sure, it's not very OO, and someone - heaven forbid! - might think you're not all down with the objects. In that case, I am ok with "message." You send a message to a particular object. That's the Smalltalk lingo, and it makes a certain amount of reasonable sense. It emphasizes the behaviour of objects over the idea of calling a function which manipulates data. Plus it has that OO cachet. When you talk about sending the hello_world message to the person object, no one can look down on you as if you were some grubby C programmer, or worse, PHP!

In a conversation with a coworker today, the only explanation I came up with was that if you were relying on polymorphism (whoa, I am not even going to get into that mouthful today!) to do the right thing, then the particular method of implementing a given interface would depend on the subclass. So if you said the method of executing a message on a particular object depends on the subclass, I wouldn't complain. It's long-winded, but hey, it's ok. But if that's the idea, then what's "calling a method" all about? It just makes no sense to me! If that's the terminology you're going to use, well, just say "function." If you've made it this far, I'm impressed! :) Well, I don't know about you, but I do feel better! And hey, if you're using "method" all over the place, don't feel too bad. When it comes down to it, so do I. As long as we're all drinking the Kool-Aid together, it's all good.

Wednesday, May 17, 2006

What Is Good Code?

I've noticed that it's somewhat easier to deal with poorly written procedural code than poorly written object-oriented code. The reason, I think, is that the procedural code is all together in large unwieldy functions full of repeated conditional logic. One can take that kind of code, break it up into smaller functions, and then organize those functions into classes. Then one can take duplicate code and just call the appropriate methods instead. It's not necessarily easy, true, but it tends to be easier than cleaning up a confusing tangle of objects and factories. Often the code duplication in object code is hidden, so it takes quite a while to figure out how the code currently works and where the duplication starts and ends; rather surprisingly, it's also more difficult to figure out how to refactor the code to a better model. I think one reason object code becomes problematic is that developers who are less experienced with OO try too hard to make their code "object-oriented." The approach I would recommend focuses on function rather than form. Consider paying attention to two qualities:
  1. Is your code easy to follow?
  2. Do you have to make changes in several different places when modifying or extending your code?
Focusing on these items, making the code clear and removing duplication, ought to naturally lead to nice clean object-oriented code; the sketch below shows the idea.
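As a contrived illustration of the second question (every name here is invented): suppose the rule "an order can ship once it's paid" used to be repeated at every call site. Giving the rule one well-named home means a change to it touches exactly one place, and the call sites become easier to follow at the same time.

public class Order {
    public enum Status { OPEN, PAID, SHIPPED }

    private Status status = Status.OPEN;

    // The shipping rule lives here and nowhere else; callers no
    // longer repeat the "status == PAID" check themselves.
    public boolean isReadyToShip() {
        return status == Status.PAID;
    }

    public void markPaid() {
        status = Status.PAID;
    }

    public void ship() {
        if (isReadyToShip()) {
            status = Status.SHIPPED;
        }
    }
}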

Saturday, May 13, 2006

Developing Stories

Based on experiences on my current project, I felt I had something to say about how to write stories. A story should be short. A story should have a clear purpose that can be demo'd to a user on its own. If two aspects of a story can be demo'd separately, then there is a good chance they should each be defined as a separate story. One should not throw unrelated bits and pieces into a story - "Oh, and while you're at it, do this here." I was spending some time thinking about how to clearly justify and explain these opinions of mine. Luckily, now I don't really have to: Brian Marick seems to have mostly done the job for me: http://www.exampler.com/writing/product-director.pdf

By the way, one idea that was introduced on my current project is excellent, and I'm not sure it's a standard agile practice. If not, it should be: The Demo. Each story should be demo'd briefly to the user when it has been completed. This practice ensures that nothing significant is missing or misunderstood and tightens the feedback loop. The demo should be brief, no more than 5 or 10 minutes, and it should not include too many people - we had some problems with that in the beginning. Being able to demo a story to the user when it's been completed has proven to be both useful and simply satisfying for all concerned.

Don't Get Me Wrong...

I actually like Ruby, I really do. It's become my favourite language for programming at home. One major annoyance I ran into today though has to do with the way booleans are implemented. The number 0 is evaluated as true. So the following actually prints!

my_var = 0
if (my_var)
  printf("my_var evaluated as true?!")
end

As long as you're in the world of pure Ruby, this is ok, since you should really be writing 'my_var = false' anyway, so the if statement behaves. However, databases generally implement booleans as ints. In fact, if you declare a field as a boolean in mysql, it really just creates a tinyint in the background. When you set that field to false in Rails, it ends up being stored as '0' in the database, and when you later evaluate that attribute in a condition, the '0' reads back as a truthy value, so a flag you set to false in your code behaves very unexpectedly. You have to use the '?' form of the accessor to resolve the problem, e.g.

if (my_active_record_object.flag_set?)
  printf("This works as expected\n")
end

if (my_active_record_object.flag_set)
  printf("This doesn't work as expected at all!\n")
end

It's a very insidious problem... This is one of those cases where I just kind of mentally shrug... WTF indeed!

Sunday, April 09, 2006

More Observations about Ruby and Rails

I've been sick for the past few days and it's given me the opportunity to play around some more with test-driven development in Rails. My overall conclusion is that Ruby is a very interesting language, and the Rails active-record approach looks very promising, but I am not sure I would actually recommend this technology for a production system, at least not yet. Here are some pros and cons:

Good points:
  • Ruby syntax is clear and simple, a joy to work with. The shortcuts for attributes, parenthesizing arguments to methods, passing in hashes as parameters, the simple use of 'end' to close blocks of code, and other such ideas really pop when you start using them.
  • Mixins seem to be a very nice compromise between the single inheritance of Java and the multiple-inheritance madness of C++. Again, the way they are implemented in Ruby is very nice and clean.
  • The Rails active record pattern and the built-in testing framework encourage test-driven development and make it very easy to develop a clear model. Mixins make it very easy for only the true domain-level code to show up in your model. The rest of the framework functionality is just there.
  • Only database schema changes require Rails to be restarted. Otherwise, you can make any change you want in the code and run the app! It's a vast improvement over JSP, that's for sure!
Weaknesses:
  • Running tests seems quite slow: 17 unit tests took 3.7 seconds on my 1.6GHz laptop with 512 MB of RAM.
  • It seems as though Rails might run each unit test concurrently, which can cause weird problems if you are doing a lot of setup in your test rather than in the fixtures.
  • So far there is no direct step-through-the-source-code debugging in any IDE, as far as I can tell.
  • So far there is no IDE-supported refactoring as there is in Eclipse and IDEA.
  • Because Ruby is a dynamic language, even simple syntax errors are only identified at runtime. In practice, this means you really have to write your code test-first. Even so, it is annoying to have to run tests just to find out that you forgot to pass in a parameter to a method.
  • I presume this is also because Ruby is dynamic, but as far as I know, there is no built-in code completion yet.
  • This is not necessarily the fault of Rails, but still, I have a feeling that many developers out there will end up putting a bunch of business logic right into the views and controllers. This tends to be a problem in all languages and frameworks, with people simply not putting code in the domain where it belongs, so I don't know if it's fair to say it's a major fault of Ruby; nevertheless, I bet it will happen if Ruby and Rails take off.
  • Having used them a bit, I don't know how I feel about blocks and closures. On one hand, they do let you effectively create domain-specific languages, e.g. see the Rake configuration. On the other hand, I have a feeling this kind of thing can easily be taken to dangerous extremes.

Friday, March 24, 2006

Does Code Duplication Have To Be A Bad Thing?

This is a bit of a crazy idea. I'm curious what anyone reading this blog might think about it...

The project I am currently working on validates the data coming from forms and imports in the domain layer. If you try to save a record with bad data in it, the domain will catch the error. The users also want irrelevant fields not to appear on the screen. A generic example would go something like this: Let's say you have a form which lets you enter a salesperson's total sales for the month. If the sales exceed a certain threshold, then you can enter a bonus. The back end of the system validates the bonus: it makes sure that you are not allowed to enter a bonus if the total sales are too low. The customer also requests that the screen not show the entry field for the bonus if the sales entered are too low. There are a couple of options I can think of: either you drive the screen code from the domain validation logic, or you duplicate the logic, e.g.

In the form code:

if (totalSales > MIN_BONUS_SALES) {
    displayBonusField();
}

In the back-end domain code:

if (totalSales < MIN_BONUS_SALES) {
    createNotEnoughSalesError();
}

This is bad because it means you have to maintain this logic in two places. What if the development environment allowed you to set up dependencies so you could still maintain the duplicate code, but the compiler would warn you if you changed code in one function and not the other? e.g.

In the form code:

@depends_on SalesPerson.validateBonus
if (totalSales > MIN_BONUS_SALES) {
    displayBonusField();
}

In the back-end domain code:

validateBonus() {
    if (totalSales < MIN_BONUS_SALES) {
        createNotEnoughSalesError();
    }
}


The point I'm making is that just updating both pieces of code isn't really a big problem, but knowing about all of the dependencies is. The normal way of dealing with it is to remove the dependencies and to develop a code framework that allows these ideas to be maintained in one place, but that can mean more work than one might want to do. I wonder if enabling tools to help manage code dependencies like this would be a viable option.
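For comparison, the conventional way to remove the dependency in this particular example is fairly cheap: extract the rule itself into one method and have both the form and the validation call it. A sketch, with the threshold value invented:

public class SalesPerson {
    private static final double MIN_BONUS_SALES = 10000.0; // invented value

    // The bonus rule lives in exactly one place.
    public static boolean qualifiesForBonus(double totalSales) {
        return totalSales > MIN_BONUS_SALES;
    }
}

// In the form code:
if (SalesPerson.qualifiesForBonus(totalSales)) {
    displayBonusField();
}

// In the back-end domain code:
if (!SalesPerson.qualifiesForBonus(totalSales)) {
    createNotEnoughSalesError();
}

Note that centralizing the rule also settles the boundary case at exactly MIN_BONUS_SALES, which the two separate conditions treated inconsistently. Of course, once the rule needs data from more than one place, this kind of extraction gets harder, which is exactly where tool-managed dependencies might earn their keep.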

Saturday, March 04, 2006

Ruby and Rails

For the past few weeks I've been playing around with Ruby and Rails. I was somewhat hesitant at first, because I had heard such great things about Ruby and Rails. My natural reaction to that sort of effusive enthusiasm is to be skeptical. After all, the software development world seems to have an endless appetite for fads. Every time a new technology emerges, it promises to make programming easy and to solve the difficult problems that cause so many projects to fail, or at least to run into serious delays and cost overruns. However, at present I must say that I am quite impressed. It really does seem to be such a nice clean framework, and it really is easy to do stuff. One of the things I like best about it so far is that my code doesn't have to be instrumented with a bunch of dependencies on the Rails environment. I can write a pure Ruby class and it will just work with Rails. If I want to persist to a database, I can take advantage of the object-relational mapping, but the framework seems to do a great job of staying out of my way. It's not like J2EE, where I can remember having to produce multitudes of classes and configuration data just to implement some trivial functionality. I am no expert on Ruby or Rails yet, but I must admit that I am pretty enthusiastic about the whole thing and hope to learn more as I go along.

Monday, February 27, 2006

TestDrivenDesignPhaseShift

This is rather belated news, but I am kind of proud of the fact that I've added a new term to the XP lexicon: Phase Shift. See the c2 wiki:

http://www.c2.com/cgi/wiki?TestDrivenDesignPhaseShift
The notion came out of some discussions in the XP newsgroup. The basic idea is that even if you develop your code incrementally using test-driven development, it is possible that your approach may have to fundamentally change at some point. In my opinion, that's OK and does not invalidate XP and TDD as reasonable approaches to developing software. Anyway, I was quite happy when I found out that someone had edited my initial posting to sound more definite and authoritative, and phase shift may well be a legitimate part of the XP lexicon now! :)

Monday, February 13, 2006

Simple Refactoring

Refactoring doesn't have to be, and generally should not be, a complex and time-consuming activity. By using some simple refactoring practices it is possible to greatly simplify code and save time for other people reading the code in question. Here is an example:

Here is the refactored code for calculating the number of hours remaining in the month:


public static int calculateNumberOfRemainingHoursInMonthFromDate(Date fromDate) {
    return hoursBetween(startOfNextMonth(fromDate), startOfDay(fromDate));
}

And here is the original code:

public static int calculateNumberOfRemainingHoursInMonthFromDate(Date fromDate) {
    Calendar startCal = Calendar.getInstance();
    startCal.setTime(fromDate);
    int startDay = startCal.get(Calendar.DAY_OF_MONTH);
    int maxHoursInMonth = 0;
    Calendar calendar = Calendar.getInstance();
    calendar.setTime(fromDate);

    calendar.set(Calendar.SECOND, calendar.getMinimum(Calendar.SECOND));
    calendar.set(Calendar.MINUTE, calendar.getMinimum(Calendar.MINUTE));
    calendar.set(Calendar.HOUR_OF_DAY, calendar.getMinimum(Calendar.HOUR_OF_DAY));
    calendar.set(Calendar.DAY_OF_MONTH, startDay);
    Date startDate = calendar.getTime();

    calendar.add(Calendar.MONTH, 1);
    calendar.set(Calendar.DAY_OF_MONTH, 1);
    Date endDate = calendar.getTime();

    long milliseconds = endDate.getTime() - startDate.getTime();
    maxHoursInMonth = (int) (milliseconds / 1000 / 60 / 60);

    maxHoursInMonth = correctHoursForAprilTo720(maxHoursInMonth);

    return maxHoursInMonth;
}

Below are the utility functions used in the refactored code:

private static int hoursBetween(Date endDate, Date startDate) {
    long millisecondsRemainingInMonth = endDate.getTime() - startDate.getTime();
    int hoursRemainingInMonth = (int) (millisecondsRemainingInMonth / 1000 / 60 / 60);
    hoursRemainingInMonth = correctHoursForAprilTo720(hoursRemainingInMonth);
    return hoursRemainingInMonth;
}

private static Date startOfNextMonth(Date date) {
    Date initializedDate = startOfDay(date);
    Calendar toDateCalendar = Calendar.getInstance();
    toDateCalendar.setTime(initializedDate);
    toDateCalendar.add(Calendar.MONTH, 1);
    toDateCalendar.set(Calendar.DAY_OF_MONTH, 1);
    return toDateCalendar.getTime();
}

private static Date startOfDay(Date date) {
    Calendar cal = Calendar.getInstance();
    cal.setTime(date);
    cal.set(Calendar.SECOND, cal.getMinimum(Calendar.SECOND));
    cal.set(Calendar.MINUTE, cal.getMinimum(Calendar.MINUTE));
    cal.set(Calendar.HOUR_OF_DAY, cal.getMinimum(Calendar.HOUR_OF_DAY));
    return cal.getTime();
}