Legacy code is frustrating but unavoidable. Here are some ways to work with it more effectively.
It’s a fact of working in an engineering company that you will encounter challenges posed by legacy code at multiple points in your career. Whether you’re an individual contributor, tech lead, or engineering/product manager, there will come a time when some work you want to undertake is hindered by legacy code.
In this article, we’ll walk through five steps for dealing with legacy code. We’ll explore what legacy code really means, and how to think through possible ways to improve both the state of the codebase and your team’s ability to ship features efficiently.
Step 1: Accepting that legacy code exists
Before you and your team can start tackling legacy code, it’s important to accept that all codebases and all teams will have code that they define as ‘legacy’. It’s not a failing of any individual or team to end up in a situation where some section of the code is outdated or harder to work on.
There are techniques teams can employ to keep the amount of legacy code in check and proactively reduce it, but eradicating every trace of ‘legacy’ is an unrealistic goal, and one that’s probably counterproductive to the business and its needs.
Step 2: Being realistic about the definition of legacy code
The term ‘legacy code’ gets thrown around a lot, and as a team it’s important to have a clear definition that is shared and understood by everyone. Over time, this term has become very overloaded; I’ve been on teams where any code considered old has been deemed legacy.
This is where the waters start to muddy: code that is old is not necessarily legacy. In the teams I work on, I try to make it clear that legacy is less to do with the code’s age, and more to do with the characteristics that make it hard and unproductive to work on, such as:
- Lack of automated tests. When developers make changes to this code, they have to manually test it because there is no automated test coverage at any level (unit tests, integration tests, end-to-end tests, or others).
- Lack of documentation. When developers have to work with this code, it is hard for them to understand. This may be because the code is written in a way that obscures what it does, but the lack of documentation also plays a part. Documentation might take the form of substantial comments in the code itself explaining complex parts, separate README.md files, or pages in a company’s internal documentation system. It also covers more than literal documentation about the code: it includes the context of how the code came to be. Is there a design document that was shared with a clear motivation for why this feature exists, and a strategy for how it was going to be implemented?
- Lack of an expert. When the team makes changes to this code, is there an obvious person to review it, or is no one familiar enough to review with confidence? You can often spot this by code reviews that are nothing more than a cursory ‘looks good’, because the reviewer lacks the context and knowledge to provide a thorough review. The most likely scenario here is that the code in question was written by someone who has since left the team and no one else worked with them on this area of the codebase.
Code that is simply old, but has thorough documentation and tests, is not legacy code. It’s important to be rigorous about understanding why code is hard to work with and what can be done about it.
For example, say we have some code that was written ten years ago and has no tests, but it has good documentation and the team are familiar with it. The situation can be improved simply by investing time in adding some tests. Once we’ve made this easy fix, we’re dealing with old tech, rather than legacy tech.
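One common way to add a first layer of tests to old, untested code is a characterization test, which records what the code does today rather than what it ideally should do. The sketch below is illustrative only: `legacy_shipping_cost` is a hypothetical stand-in for a long-lived function, and the test functions are written in the plain-assert style that a runner such as pytest can collect.

```python
def legacy_shipping_cost(weight_kg, express):
    """Hypothetical ten-year-old function with no tests (illustration only)."""
    cost = 5 + weight_kg * 2
    if express:
        cost = cost * 1.5
    if weight_kg > 20:
        cost = cost + 10
    return cost


# Characterization tests: each assertion pins down current behaviour,
# so a later refactor can be checked against it.
def test_standard_parcel():
    assert legacy_shipping_cost(2, express=False) == 9


def test_express_parcel():
    assert legacy_shipping_cost(2, express=True) == 13.5


def test_heavy_surcharge_is_added_after_express_multiplier():
    # Possibly surprising, but it is what the code does today:
    # (5 + 30 * 2) * 1.5 + 10
    assert legacy_shipping_cost(30, express=True) == 107.5
```

Tests like these do not claim the behaviour is correct. They only make it visible when a change alters it, which is often enough to let a team start refactoring with more confidence.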
Step 3: Prioritizing updates to legacy code based on churn
Once code has been justifiably deemed ‘legacy’, it’s important to think about the value that your team can unlock by rewriting or updating the code. It can be tempting to think of all legacy code as equal, but in my experience it is anything but.
Let’s say there are two areas of the codebase deemed legacy. They are legacy in every sense of the word: poorly understood, poorly documented, without tests, using now outdated techniques. Which of these areas do you tackle first? Or do you take them both on at the same time?
The criterion I use is churn: how often does this area of the code have to change? If one of these areas is attached to a mission-critical feature that your team wants to iterate on, improve, and extend, then the legacy status of the code is going to have a huge impact: slowing the engineers down, reducing satisfaction, and delaying shipping time.
On the other hand, if the legacy code is in an area of your product that is rarely touched, doesn’t have planned features, and is all functioning without causing users a problem, how much value is there really in rewriting the code, or investing in improving its status?
Additionally, investing in rarely touched code is a very hard sell to those who need to be on board. When legacy code is actively impacting day-to-day shipping and your team’s velocity, that’s (hopefully!) an easier issue to get buy-in to resolve.
When planning investments in legacy code, focus not on the technical benefits of the rewrite, but on what it unblocks for the business as a whole.
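Churn can be estimated from version control history. The sketch below is one rough way to do it, assuming the codebase lives in Git: it counts how many commits touched each file, using the output of `git log --name-only --pretty=format:`. The function names, the six-month window, and the sample paths are all assumptions made for illustration.

```python
import subprocess
from collections import Counter


def count_churn(log_output: str) -> Counter:
    """Count how many commits touched each file.

    Expects the output of `git log --name-only --pretty=format:`,
    which lists the files changed by each commit, one path per line,
    with blank lines between commits.
    """
    paths = (line.strip() for line in log_output.splitlines())
    return Counter(path for path in paths if path)


def churn_since(since: str = "6 months ago", repo: str = ".") -> Counter:
    """Run git in `repo` and count file changes since the given date."""
    result = subprocess.run(
        ["git", "log", f"--since={since}", "--name-only", "--pretty=format:"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return count_churn(result.stdout)


# Example with hypothetical paths: the cart code changed in three commits,
# the report exporter in only one.
sample_log = """
src/checkout/cart.py
src/checkout/payment.py

src/checkout/cart.py
src/reports/export.py

src/checkout/cart.py
"""
for path, changes in count_churn(sample_log).most_common(2):
    print(f"{changes:>3}  {path}")
```

A raw commit count is a crude signal, so it is best read alongside the roadmap: a file with high churn and upcoming planned features is a stronger candidate than one that merely changed a lot in the past.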
Step 4: Making incremental improvements to legacy code
You’ve got your legacy code, and you’ve got the necessary sign-off to invest engineering time into improving its state, because it will increase your team’s velocity and happiness, and make the code more robust. Great! But how do we actually approach changing the code?
There tend to be two approaches here: a ‘big bang’ rewrite, or a steady incremental approach, which Martin Fowler calls the ‘Strangler Fig’ approach. I tend to prefer an incremental approach to the problem wherever possible but, as always in software, there is no right answer every time, and occasionally you will need to reach for an aggressive rewrite.
‘Big bang’ rewrites are inherently risky. You can try to reduce that risk, but the approach still opens up a number of tricky questions:
- When we are rewriting, what do we do if a bug is found? Do we fix it in the old codebase? Or in the new codebase? Or both?
- What happens if the rewrite takes longer than anticipated? Will we have to maintain two codebases for a considerable amount of time? Will this lead to duplicated work?
- Throughout the duration of the rewrite, what happens if we want to ship a feature? Are we even able to change the old code at all?
- How do we ship this new rewrite? Do we do one large release at the end? What happens if something goes wrong? Does that mean we won’t get any feedback or a sense of progress until we ship the final version?
An incremental approach proposes that we make a series of smaller steps towards the end goal. This has its downsides – typically the work needs to be carefully planned to map out a series of steps – but it reduces some of the risks of a big rewrite mentioned previously:
- When we ship incrementally, there is less concern about one large release at the end of the process.
- We avoid creating a distinct codebase, so don’t risk having to maintain or duplicate work across two codebases.
- When we tackle the work as a series of steps, we are able to pause it to ship new features if necessary (I’d try to avoid doing this generally, but sometimes it has to be done!).
In my team at Google, we used the incremental approach to migrate the Chrome DevTools codebase to TypeScript. We could have done this as a huge ‘big bang’ release, but that would have meant pausing all work on any features (or bug fixes) whilst moving a vast codebase, which would likely have taken months to complete. Instead, we accepted a longer timeframe, because the migration ran alongside feature work, and in return we were able to move incrementally, file by file. You can learn more about this work and our approach via this blog post and talk from CDS 2020.
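The incremental approach described in this step is often implemented with a thin facade that sits in front of the old code and hands over one piece at a time. The sketch below is a generic illustration of that idea, not the approach used for the DevTools migration: the class, function names, and page names are all hypothetical.

```python
from typing import Callable, Dict


def legacy_render(page: str) -> str:
    """Stand-in for the old code path (hypothetical)."""
    return f"legacy:{page}"


def new_render_settings(page: str) -> str:
    """Stand-in for the first migrated piece (hypothetical)."""
    return f"new:{page}"


class RenderFacade:
    """Single entry point that callers use during the migration.

    Pages that have been migrated are served by new code; everything
    else falls through to the legacy implementation unchanged.
    """

    def __init__(self, fallback: Callable[[str], str]) -> None:
        self._fallback = fallback
        self._migrated: Dict[str, Callable[[str], str]] = {}

    def migrate(self, page: str, handler: Callable[[str], str]) -> None:
        # Each call here corresponds to one small, shippable step.
        self._migrated[page] = handler

    def render(self, page: str) -> str:
        handler = self._migrated.get(page, self._fallback)
        return handler(page)


facade = RenderFacade(fallback=legacy_render)
facade.migrate("settings", new_render_settings)

print(facade.render("settings"))  # served by the new code
print(facade.render("billing"))   # still served by the legacy code
```

Because callers only ever talk to the facade, the migration can pause at any point, for example to ship a feature, and the system keeps working with whatever mix of old and new code is in place.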
Step 5: Clearly scoping rewrites to be as small as possible
Sometimes, a project may be beyond incremental rewrites and require a more aggressive approach. As mentioned above, this isn’t my preferred option most of the time, but sometimes it’s necessary – typically when the code has so many bad characteristics that even attempting to understand how to incrementally improve it will take an unreasonable amount of time and energy.
When taking this approach, I still like to work hard to reduce risk. The main way to do this is to very tightly control the scope of the rewrite. Sometimes, it’s possible to rewrite the entire codebase in chunks, rather than all at once. For example, we might rewrite the data layer before rewriting the front-end, rather than tackling them both at once. Given that a rewrite usually requires any work or changes to the original to be paused, I like to keep the number of rewrites and the scope of each one down to a minimum.
Another area I like to focus on is the risk of release. A huge rewrite spanning weeks or months is a risky release: that’s a lot of code changes being dumped onto production all at once. Instead, I prefer to enable the rewrite changes to be shipped incrementally as a series of pull requests. This can be done by landing the rewrite behind a flag (often known as a ‘feature flag’). The code ships, but it is not executed by your application until the flag is toggled on. You can then toggle this flag on for a set group of users – for example, all internal staff – and ensure that things are functioning. The additional benefit here is that you can launch the rewrite to users by turning the feature flag on for everyone. If there are any issues, you can disable it again by toggling the flag off.
This ability to switch a rewrite on/off without deploying any code is very powerful. It allows you to stay reactive to unexpected issues (there will always be issues, no matter how careful and diligent you are) and encourages the team to think about how they can land changes for this rewrite across multiple pull requests, getting good code review from others at the same time. Once your rewrite has shipped and you are confident it’s been successful, you can then remove the feature flag and all the old code – which is always a very satisfying feeling!
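The flag-based rollout described in this step can be sketched in a few lines. This is a minimal in-memory illustration under stated assumptions: `FeatureFlag`, `NEW_SEARCH`, and the search functions are hypothetical names, and a real system would typically read flag state from a configuration service so that toggling needs no deploy.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureFlag:
    """A flag that is off by default and can be enabled per group or for everyone."""
    name: str
    enabled_for_everyone: bool = False
    enabled_groups: Set[str] = field(default_factory=set)

    def is_enabled(self, user_groups: Set[str]) -> bool:
        if self.enabled_for_everyone:
            return True
        # Enabled if the user belongs to at least one enabled group.
        return bool(self.enabled_groups & user_groups)


def legacy_search(query: str) -> str:
    return f"legacy results for {query}"


def rewritten_search(query: str) -> str:
    return f"rewritten results for {query}"


NEW_SEARCH = FeatureFlag(name="new-search")


def search(query: str, user_groups: Set[str]) -> str:
    # Both implementations ship; the flag decides which one runs.
    if NEW_SEARCH.is_enabled(user_groups):
        return rewritten_search(query)
    return legacy_search(query)


# 1. Shipped but dark: everyone still gets the old code.
print(search("invoices", {"customer"}))
# 2. Enabled for internal staff only.
NEW_SEARCH.enabled_groups.add("staff")
print(search("invoices", {"staff"}))
# 3. Launched to everyone; setting this back to False is the rollback.
NEW_SEARCH.enabled_for_everyone = True
print(search("invoices", {"customer"}))
```

Once the rewrite has proven itself, the cleanup is to delete `legacy_search`, the flag, and the branch in `search`, leaving only the new path.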
Reflections
Legacy code is clearly here to stay. Products pivot, teams evolve, and technologies change. Some of the code you’re writing today will be considered legacy by people in years to come – through no fault of your own! Learning to embrace legacy code and see it as an interesting technical challenge is key. As you work on these issues over time, you’ll build up an array of experience and tools for tackling legacy code. You’ll be able to carefully modernize it over time, with minimal disruption to your users and colleagues, as you increase test coverage, developer satisfaction and code quality. There’s nothing more satisfying than that final pull request that removes the last remnants of the code you’ve worked to remove for multiple months. I hope you find these challenges as rewarding and interesting as I do. Good luck!