Tuesday, April 24, 2007

TDD Is Not An Algorithm Generator!

[Update: more on this topic]

In his blog, Ravi compares Ron Jeffries's attempt to work out a sudoku solver using TDD (test-driven development) with Peter Norvig's implementation.

Now, what can we learn from the two efforts? Peter does not use TDD to solve the problem, yet his solution is compact and complete, whereas Ron's TDD effort remains only a partial solution. Does that mean that TDD is inferior to design up front? I don't believe that's the point Ravi was trying to make, but perhaps he was trying to say that TDD is not an algorithm generator. In that respect, I completely agree. In fact, one of the first rules I teach my students when I run a TDD workshop or teach a course is precisely that TDD is not an algorithm generator! Solving sudoku is exactly the kind of problem where you want to find an algorithm first and then implement it. This is not a question of up-front design. It's just common sense: it's perfectly reasonable to know conceptually how to solve sudoku puzzles before writing any code. Otherwise, how would you know whether any of the intermediate methods you so diligently write tests for will be useful in the end?
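To make "know the algorithm first" concrete, here is one well-known way to solve sudoku: depth-first backtracking, which tries each legal digit in the first empty cell and undoes the choice when it leads to a dead end. This is a simpler technique than Peter's (he combines constraint propagation with search), and the code below is only a minimal Python sketch. The grid representation (a flat list of 81 ints, with 0 for an empty cell) and the names `parse`, `candidates`, and `solve` are my own assumptions for illustration.

```python
def parse(puzzle: str) -> list[int]:
    """Convert an 81-character string ('0' or '.' for blanks) into a flat list of ints."""
    cells = [0 if ch in "0." else int(ch) for ch in puzzle if ch in "0123456789."]
    if len(cells) != 81:
        raise ValueError("expected exactly 81 cells")
    return cells


def candidates(grid: list[int], idx: int) -> set[int]:
    """Digits 1-9 not already used in the row, column, or 3x3 box of cell idx."""
    row, col = divmod(idx, 9)
    box_row, box_col = 3 * (row // 3), 3 * (col // 3)
    used = set(grid[row * 9:(row + 1) * 9])  # same row
    used.update(grid[col::9])                # same column
    for r in range(box_row, box_row + 3):    # same 3x3 box
        used.update(grid[r * 9 + box_col:r * 9 + box_col + 3])
    return set(range(1, 10)) - used


def solve(grid: list[int]) -> bool:
    """Depth-first backtracking: fill the first empty cell, recurse, undo on failure.
    Mutates grid in place; returns True if a complete solution was found."""
    try:
        idx = grid.index(0)
    except ValueError:
        return True  # no empty cells left: the grid is solved
    for digit in sorted(candidates(grid, idx)):
        grid[idx] = digit
        if solve(grid):
            return True
    grid[idx] = 0  # every candidate failed here: backtrack
    return False


if __name__ == "__main__":
    # An easy puzzle, written row by row; 0 marks an empty cell.
    grid = parse("003020600900305001001806400008102900"
                 "700000008006708200002609500800203009005010300")
    if solve(grid):
        for r in range(9):
            print(" ".join(str(d) for d in grid[r * 9:(r + 1) * 9]))
    else:
        print("no solution")
```

The interesting part is not the code but the order of events: the search strategy was decided before a single function was written, so each small function has an obvious job and is a natural target for a test.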

There is a real gap between a design, which is a particular structure of code, and an algorithm, which is a description of how to solve a problem. An algorithm can be expressed in any form you choose: it can be written in mathematical notation or in plain English, or it can live in your head. But if a story requires an algorithm, you should certainly know what it is before you start writing production code. If you doubt what I'm saying, consider a story where I ask you to write code to encrypt and decrypt files. Would you just start hacking out methods (using TDD or otherwise)? At best, you'd end up with a trivial cipher. You need to choose a cryptographic algorithm before you begin any programming. Once you do start programming, I would certainly recommend using TDD as a way of designing the code.

Let's say you're faced with a problem where you want to find the right algorithm yourself, either because you have to or just because you enjoy the challenge. Writing some code to solve a simpler part of the problem may be a good way to start. That approach worked for Peter in his sudoku solution; I doubt it's guaranteed to work every time. In any case, when you're hunting for an algorithm, any code you write should be considered provisional. The main purpose is not to produce high-quality, maintainable code; it's to see whether getting partway toward a solution triggers any additional insights. Again, only once you've understood the correct algorithm should you begin writing production code.
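As an illustration of that kind of throwaway exploration (this is not Peter's code), a first experiment might tackle only the easiest part of the problem: repeatedly filling any empty cell that has exactly one legal digit. The sketch assumes a flat list of 81 ints with 0 for blanks; `candidates` and `fill_singles` are hypothetical names.

```python
def candidates(grid: list[int], idx: int) -> set[int]:
    """Digits 1-9 not yet used in the row, column, or 3x3 box of cell idx."""
    row, col = divmod(idx, 9)
    br, bc = 3 * (row // 3), 3 * (col // 3)
    used = {grid[row * 9 + c] for c in range(9)}
    used |= {grid[r * 9 + col] for r in range(9)}
    used |= {grid[r * 9 + c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
    return set(range(1, 10)) - used


def fill_singles(grid: list[int]) -> int:
    """Throwaway experiment: fill every empty cell that has exactly one candidate,
    repeating until a full pass makes no progress. Mutates grid in place and
    returns how many cells are still empty."""
    progress = True
    while progress:
        progress = False
        for idx in range(81):
            if grid[idx] == 0:
                cands = candidates(grid, idx)
                if len(cands) == 1:
                    grid[idx] = cands.pop()
                    progress = True
    return grid.count(0)


if __name__ == "__main__":
    # An empty grid gives this rule nothing to work with.
    print(fill_singles([0] * 81))  # prints 81
```

Code like this is provisional by design. It may solve some easy puzzles outright but stalls on harder ones, and noticing where it stalls (some further deduction or search is needed) is the insight the experiment is meant to provoke, not the code itself.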

I don't think Ron's example is a very good indication of how TDD should work. I am also disheartened that Ron doesn't clearly make the point that he is looking for an algorithm. Because he is still searching for one, his efforts to refactor his code during the TDD process are premature. Based on Peter's solution, I'd say Ron never even gets halfway to the correct algorithm, and he wastes his last few articles on refactoring to objects instead of doing anything particularly useful or enlightening. When you're in an investigative mode where you don't even know how to solve your problem, that hardly seems like the time to solidify the design. Even if you're doing agile development, keep in mind that for each XP story, you should know, at least informally, the steps needed to solve the problem. Presenting TDD as an algorithm generator is dopey, and it just gives people an excuse not to take TDD seriously as a valid design technique.

So what is the purpose of TDD, then? One goal of TDD is to reduce the need to decide ahead of time which classes and methods you're going to implement for an entire story. There's a large body of shared experience in the developer community that trying to anticipate such things tends to cause paralysis, where nothing useful gets done, or to produce bloated, over-designed code (or both). Instead, you can develop one aspect of the story at a time, using each test to keep yourself moving forward and refactoring the design as you go. Think of TDD, then, as a ratchet, or as a belay device in climbing. One of the principles of agile development is that it's generally not a good idea to try to comprehensively understand a whole project (or a big part of it) up front, as is often done in waterfall-ish methodologies. However, that doesn't mean you need to begin every story blindly, without understanding how to solve that particular problem.
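As a small illustration of the ratchet, suppose you have already decided how to solve sudoku, and your chosen algorithm needs to know which cells constrain each other. The next step might be a helper that returns a cell's 20 "peers" (the other cells in its row, column, and 3x3 box), driven by tests written one at a time before the code that passes them. The name `peers` and these particular tests are hypothetical; this is a sketch using Python's standard `unittest` module, not a prescribed design.

```python
import unittest


def peers(idx: int) -> set[int]:
    """Indices of the cells sharing a row, column, or 3x3 box with idx (excluding idx).
    Cells are numbered 0-80 in row-major order."""
    row, col = divmod(idx, 9)
    br, bc = 3 * (row // 3), 3 * (col // 3)
    same_row = {row * 9 + c for c in range(9)}
    same_col = {r * 9 + col for r in range(9)}
    same_box = {r * 9 + c for r in range(br, br + 3) for c in range(bc, bc + 3)}
    return (same_row | same_col | same_box) - {idx}


class PeersTest(unittest.TestCase):
    # Each test is meant to be written first, watched fail, then made to pass.
    def test_every_cell_has_twenty_peers(self):
        for idx in range(81):
            self.assertEqual(len(peers(idx)), 20)

    def test_cell_is_not_its_own_peer(self):
        self.assertNotIn(0, peers(0))

    def test_box_neighbour_is_a_peer(self):
        self.assertIn(10, peers(0))  # row 1, col 1 shares the top-left box

    def test_unrelated_cell_is_not_a_peer(self):
        self.assertNotIn(40, peers(0))  # centre cell: different row, column, and box
```

Saved as, say, `test_peers.py`, it runs with `python -m unittest test_peers`. Each passing test locks in a little progress, but note that the tests only helped because the algorithm was already known.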

Updates: UncleBob makes a good comment on reddit: http://programming.reddit.com/info/1kth0/comments/c1lkvh; also, read about my TDD Sudoku effort.