Legacy Code: When Should You Rewrite It?
A couple of years ago I rewrote one of the smaller products that my team has responsibility for. We needed to add significant new functionality and the existing software’s design was going to make this challenging. Hence, I took the decision to rewrite the existing code in a way that would make it quicker to add the new features (and serve as a better foundation for future development).
The rewrite I undertook changed only the structure of the code. The language (C++) and environment (Microsoft Visual Studio) remained unchanged. I can’t be certain that the rewrite saved time, but the overall development was completed within our original estimate (25 person days). What I can be certain of is that it significantly reduced the number of lines of code (from 15,398 to 4,446), reduced the average routine complexity (from 11.26 to 4.63), reduced the maximum routine complexity (from 95 to 26) and increased the percentage of comment lines (from 10.3% to 14.4%). Even after I had added the new functionality, the complexity measures stayed about the same and the line count was 7,317 (less than half that of the original software). From a personal point of view, I was also very pleased to know that I would never again have to work with two of the routines in the original code: one was 450 lines long and the other was 560! Quick aside: I collected these metrics very easily using Campwood Software’s free tool SourceMonitor, which is a great little Windows utility for analysing source code.
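To put the before and after figures in proportion, the relative change can be worked out with a one-line calculation. The sketch below is only an illustration: the function name `percent_change` is a hypothetical helper and is not something SourceMonitor provides.

```cpp
// Hypothetical helper (not part of any tool mentioned in this post).
// Returns the relative change from `before` to `after` as a percentage;
// a negative result means the metric went down.
double percent_change(double before, double after) {
    return (after - before) / before * 100.0;
}
```

With the numbers quoted above, `percent_change(15398, 4446)` comes out at roughly -71% for lines of code straight after the rewrite, and `percent_change(15398, 7317)` at roughly -52% once the new functionality had been added.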
In the two years since the rewrite, we have continued to benefit from the ease with which new features can be added, and this has led us to innovate on the product far more in that time than we had in the 10 years before.
So if rewriting code provides so many benefits, does it always make sense to do so? My answer would be “Yes... but”. There are well-documented cases of software rewrites that have failed spectacularly to deliver (for example, this account of the 3-year rewrite of Netscape version 4 into version 6, during which time its market share dropped from around 60% to just 14%). It seems to me that, to be successful, a rewrite has to apply the following principles:
- Adding value
- Understanding the domain
- Divide and conquer
- The rewrite hierarchy
Adding value
In my view, code should not be rewritten just for the sake of it. It shouldn’t be changed just because it would be nice for the code to be more elegant. The cold, hard business case for a rewrite is that, by doing so, it will be possible to deliver something else with less effort. That might be new features, improved performance or fewer bugs.
One thing I would add, however, based on my experience, is that it can be amazing how rapidly new features can be implemented once the design of the existing code has been changed to support them. Don’t underestimate this!
Divide and conquer
In my example, I was rewriting an application that had only just over 15 thousand lines of code. The task was small enough to complete in less than a month with only one person working on it. I would be very wary of undertaking the same task in one go for applications with more than 30 thousand lines of code. In the increasingly agile world of software development, the idea of putting all new features on hold for more than a couple of months would not make sense. So for larger applications I would encourage people to tackle only part of the application at a time (for example, a single module or a group of related modules).
Understand the domain
In my case, I had no prior experience of the code I was rewriting, but I did have a reasonable understanding of what the application was meant to do. This made it fairly easy for me to figure out what all the code was trying to achieve (although, of course, some would say the only way to understand code is to rewrite it!).
The rewrite hierarchy
In my view there are fundamentally three different types of rewrite, which are (in order of increasing risk and difficulty):
- Refactoring
- Port
- Redesign from requirements
Refactoring
My definition of refactoring here is much broader than what is outlined in Martin Fowler’s (excellent) book Refactoring. I would describe it as:
“Changing the internal structure of code to improve its design without affecting any required functionality”
The language, platform and required external interfaces remain unchanged. The only changes are to the code structure and to remove any redundant code.
The example I talked about at the beginning of this post was really a refactoring exercise. I found the lowest-hanging fruit to be removing redundant code and moving duplicated code into routines that could then be called wherever they were needed.
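As a rough illustration of moving duplicated code into a routine, here is a minimal C++ sketch. It is not taken from the product described in this post: the namespaces `legacy` and `refactored` and the functions `format_name`, `format_city` and `trim` are all hypothetical names invented for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical "before" code: both routines repeat the same
// whitespace-trimming logic inline.
namespace legacy {
std::string format_name(const std::string& raw) {
    std::size_t first = raw.find_first_not_of(" \t");
    if (first == std::string::npos) return "Name: ";
    std::size_t last = raw.find_last_not_of(" \t");
    return "Name: " + raw.substr(first, last - first + 1);
}

std::string format_city(const std::string& raw) {
    std::size_t first = raw.find_first_not_of(" \t");
    if (first == std::string::npos) return "City: ";
    std::size_t last = raw.find_last_not_of(" \t");
    return "City: " + raw.substr(first, last - first + 1);
}
}  // namespace legacy

// Hypothetical "after" code: the duplicated logic lives in one routine
// and the callers shrink to a single line each.
namespace refactored {
// Removes leading and trailing spaces and tabs; returns "" if nothing is left.
std::string trim(const std::string& raw) {
    std::size_t first = raw.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    std::size_t last = raw.find_last_not_of(" \t");
    return raw.substr(first, last - first + 1);
}

std::string format_name(const std::string& raw) { return "Name: " + trim(raw); }
std::string format_city(const std::string& raw) { return "City: " + trim(raw); }
}  // namespace refactored
```

Both versions return the same strings for the same input, which is the point of this kind of change: the external behaviour stays the same while the line count drops and any future fix to the trimming logic only has to be made in one place.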
Port
This is where you choose to move the code to a different language or platform. The rationale is often that you believe the new language/platform will allow you to develop more productively, but it can also be a necessity if your existing platform will soon become unsupported.
Redesign from requirements
This is where you completely re-engineer the solution from scratch, throwing everything away except for the original requirements. This is usually driven by a desire to improve many different aspects of the software (for example the user interface, the internal structure, interfaces with other systems, the platform). The biggest danger with this approach is that many real requirements may not be documented.
Fortunately, the three different types of rewrite are not mutually exclusive. In fact, I would suggest always refactoring before undertaking a port or redesign from requirements. In my example, reducing the total lines of code by two-thirds would significantly reduce the effort involved in a port (and in fact did save us effort when we recently switched to the latest version of Microsoft Visual Studio). What’s more, the detailed insight into the actual requirements that the software meets (rather than just those that are documented) is very useful in assessing the feasibility (and estimating the effort) of a redesign from requirements.
I think that rewriting legacy code generally makes sense but only as a step taken before you make other required changes (such as adding new features). I encourage people to refactor before doing anything else (sometimes you even find that after refactoring, further improvements are no longer required). For large applications, I would always recommend that the task is broken down into sub-projects that can each be delivered in 2 months or less.