7 principles for balancing agility and durability

Engineering is a game of trade-offs – move fast and break things, or build slow and last forever? Know when to do which.

By Maxime Najim

March 24, 2025

Estimated reading time: 10 minutes

Balancing speed and durability in engineering is a constant challenge – move too fast, and technical debt piles up; over-engineer and agility suffers.

As staff+ engineers, we are always navigating a balance between prioritizing speed and investing in the durability of systems. If we prioritize speed too much without planning, we run the risk of accumulating significant technical debt that can impede future progress. On the other hand, if we focus heavily on building overly complex systems too early on, it may stifle our ability to innovate quickly. The true test lies in determining when to prioritize speed and when to enhance stability.

Striking the perfect balance

The debate between speed and scalability is often framed as a binary choice. Experienced engineers understand it’s not that straightforward. They believe that both speed and excellence can work together harmoniously and should do so where possible, but the key lies in balancing compromises according to the situation at hand.

Without flexibility, overreliance on either extreme causes problems. Moving too fast accumulates technical debt that impedes progress over time, while over-engineering risks spending too much time on unnecessary optimizations early on. Engineers must find the balance between shipping quickly and building the right abstractions for a solid foundation.

Here are seven strategies to navigate this balance.

1. Build steel threads

A steel thread is a thin, end-to-end slice of functionality that touches all critical system components. It’s a solid, foundational aspect that weaves through different parts of the software system to implement a key use case.

Steel threads integrate major system components from the outset, which makes later scaling and modification much easier. They also surface integration problems early, when they are simpler (and cheaper) to fix. In effect, engineers get both speed and scalability.

Use case: Airbnb

During Airbnb’s expansion phase, the company needed payments to pass smoothly across different countries and currencies. Initially, it built a complicated payment system that was hard to maintain and update, but over time it opted for a more streamlined strategy.

Their steel thread approach involved crafting a comprehensive payment process that seamlessly connected the user interface with the back-end system and database to ensure efficiency and reliability. This enabled Airbnb to test its structure in stages and address integration challenges before expanding its operations. They were able to enhance the system gradually without the need for expensive overhauls and guarantee smooth payment transactions across various regions.

Actionable tip: start with a steel thread implementation for critical paths, ensuring they function across the entire stack before adding complexity.
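
As a rough illustration, a steel thread for a hypothetical checkout flow might look like the sketch below. Every name here (`InMemoryStore`, `charge`, `handle_checkout`) is an assumption made for the example, not Airbnb's design. The point is only that one thin path crosses the entry point, the service logic, and storage before any single layer is built out.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Stand-in for the database layer; a real store can replace it later."""
    payments: dict = field(default_factory=dict)

    def save(self, payment_id: str, record: dict) -> None:
        self.payments[payment_id] = record


def charge(amount_cents: int, currency: str) -> dict:
    """Stand-in for the payment service: no retries, no fraud checks yet."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    return {"status": "captured", "amount_cents": amount_cents, "currency": currency}


def handle_checkout(store: InMemoryStore, payment_id: str, amount_cents: int) -> str:
    """Stand-in for the API entry point: exercises every layer end to end."""
    result = charge(amount_cents, "USD")  # single currency until the thread works
    store.save(payment_id, result)
    return result["status"]
```

Once this path works end to end, each stand-in can be deepened independently (more currencies, a real database) without changing how the layers connect.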

2. Managing technical debt intentionally

Technical debt isn’t inherently bad – it’s a trade-off that enables faster progress in the short term and is often necessary for quick wins. Over-focusing on future-proofing can backfire, as the anticipated benefits may never materialize, leading to wasted time and resources. However, excessive technical debt can eventually drag down a team’s productivity. The key is to manage it intentionally – taking calculated shortcuts while planning and tracking future fixes – to maintain a healthy balance between speed and long-term sustainability.

Best practices include:

Set times to revisit – Set explicit deadlines to revisit scrappy solutions.
Debt registers – Maintain a backlog of tech debt items and prioritize them alongside feature work.
Regular refactoring – Invest in periodic (every one to three months) improvements to prevent accumulated debt from becoming overwhelming.
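
A debt register does not need special tooling. The sketch below is one possible shape, assuming a hypothetical `DebtItem` record with a revisit deadline and a 1-5 "pain to fix" score. Both fields are illustrative choices rather than an established standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DebtItem:
    title: str
    revisit_by: date   # explicit deadline for revisiting the shortcut
    pain_to_fix: int   # hypothetical 1-5 score: how painful is the fix later?


def overdue(register: list[DebtItem], today: date) -> list[DebtItem]:
    """Return items past their revisit deadline, most painful first."""
    late = [item for item in register if item.revisit_by < today]
    return sorted(late, key=lambda item: item.pain_to_fix, reverse=True)
```

Reviewing the output of `overdue` during planning is one way to prioritize debt alongside feature work.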

Use case: X

In its early stages, X (formerly Twitter) prioritized growth over perfect infrastructure. When the platform gained traction, the system couldn’t support such large volumes of traffic. System-wide outages were common, giving rise to the Fail Whale’s popularity.

To remedy the issue, X chose to manage its technical debt strategically instead of rewriting everything from scratch. They closely monitored issues like database bottlenecks and focused on improvements that increased reliability without slowing down feature development. Over time, they gradually migrated to a new framework while continuing to ship new products without major disruptions.

Actionable tip: before taking a shortcut, ask yourself: “How painful will this be to fix later?” If the answer is “very,” think twice.

3. The last responsible moment

For some, optimizing systems for potential future requirements is an attractive option, but this approach can impede a developer’s efficiency and creativity in the long run. Further, doing so will most likely invite unnecessary complexity, wasting time and resources.

A better approach is to delay decisions until they’re truly necessary – when waiting any longer would create significant risks or costs. This isn’t about procrastinating; it’s about maximizing adaptability and avoiding premature commitments based on assumptions.

How to apply this principle in practice:

Start simple: build something that solves today’s problem. For example, deploy your service in a single cloud region rather than designing a global multi-region failover system.
Monitor real usage: track how your system is actually used, including traffic growth, query patterns, and latency trends. Only when the data shows strain should you invest time in scaling. This protects your team from solving hypothetical problems while staying responsive to real needs.
Have a contingency plan: know how you will scale when the time comes – whether that’s database sharding, breaking a monolith into services, or adopting new infrastructure.
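
The "only when the data shows strain" rule can be written down as an explicit check. This is a minimal sketch under assumed numbers: the 80% utilization threshold and the three-week window are placeholders to tune, and `should_invest_in_scaling` is a hypothetical helper rather than part of any monitoring tool.

```python
def should_invest_in_scaling(
    weekly_peak_utilization: list[float],
    threshold: float = 0.8,    # assumed: 80% of capacity counts as strain
    sustained_weeks: int = 3,  # assumed: strain must persist, not spike once
) -> bool:
    """True only when each of the most recent weeks reached the threshold."""
    recent = weekly_peak_utilization[-sustained_weeks:]
    return len(recent) == sustained_weeks and all(u >= threshold for u in recent)
```

Requiring sustained strain, rather than a single spike, is what keeps the decision tied to observed data rather than to a hypothetical future.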

Use case: SpaceX

SpaceX is known for its deliberate approach to innovation, especially in developing the Falcon 9 rocket. Instead of trying to perfect designs from the start, they prioritized building practical prototypes to gather real-world data through testing. This strategy allowed them to refine their designs based on actual performance, reflecting their philosophy of deferring final decisions until absolutely necessary.

Actionable tip: if a feature’s future evolution is unclear, keep the design simple and loosely coupled. Frequent iteration gives your team regular checkpoints to learn from real user behavior and spot emerging patterns, narrowing the cone of uncertainty over time. Overall, this reduces risk and enables smarter, more informed design and architectural decisions as the product evolves.

4. Selectively invest in quality

It’s natural for teams to strive for consistent quality standards across their codebase – many engineering organizations even formalize this through guidelines or automated checks. But in practice, not every part of a system carries the same weight or risk. Instead, think strategically about where to focus your engineering investment. Aim for high quality where it has the most impact and focus engineering efforts on areas that are:

Core to business functionality: the parts of your system that differentiate your product and drive revenue.
Frequently modified or extended: high-change areas are more prone to introducing errors – each change is a new opportunity for regressions. Prioritizing test coverage, readability, and clean design in these areas makes iteration safer and faster.
High-risk in security or performance: systems handling sensitive data or operating under a heavy load demand extra scrutiny.
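
One lightweight way to apply these criteria is to tag each module and rank by how many apply. The sketch below assumes a hypothetical `Module` record and weights the three criteria equally, which is a simplification; real teams would likely weight them differently.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    core_to_business: bool
    frequently_modified: bool
    high_risk: bool  # security- or performance-sensitive


def rank_for_investment(modules: list[Module]) -> list[str]:
    """Order module names by how many of the three criteria they meet."""
    def score(m: Module) -> int:
        return sum([m.core_to_business, m.frequently_modified, m.high_risk])

    # sorted() is stable, so modules with equal scores keep their input order.
    return [m.name for m in sorted(modules, key=score, reverse=True)]
```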

Use case: Slack

Slack prioritizes reliability in core features like message delivery and API availability. To accurately measure user experience and system reliability, Slack uses a metric called the service delivery index for reliability (SDI-R). This approach ensures that essential functions remain dependable while allowing less critical areas to evolve with more flexible enhancements over time.

Actionable tip: regularly assess engineering friction – places where small changes take too long due to tangled or brittle code. If minor changes require excessive effort, it’s a signal to invest in targeted refactoring.

5. Use the Strangler Fig pattern

Strangler Fig trees germinate on host trees and send roots downward, eventually engulfing and supplanting the host altogether. Once the host tree dies and decays, the strangler fig stands as a hollow, self-sustaining tree.

The software engineering practice is less ruthless but just as effective. The Strangler Fig approach modernizes a system incrementally, avoiding the risk of a full rebuild. New features are developed as separate services or components alongside the established legacy system and gradually take over its responsibilities until the old code can be phased out.

The Strangler Fig pattern is successful because it:

Defines boundaries: establishes clear sections within the current system so that one feature or function can be transitioned at a time.
Develops new services in parallel: builds replacement services that run alongside the existing system and gradually replicate its capabilities.
Gradually transitions: shifts traffic and dependencies to the updated services step by step, verifying performance and functionality at each stage.
Deprecates the legacy system: retires old components as their roles move to the new system, until the old platform is retired completely.
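
The steps above are often implemented with a routing facade in front of both systems. The following is a minimal sketch, not Shopify's implementation: `StranglerRouter`, its handler dictionaries, and the percentage-based rollout are all assumptions made for illustration.

```python
import zlib


class StranglerRouter:
    """Facade that sends each feature's traffic to legacy or new code."""

    def __init__(self, legacy: dict, modern: dict):
        self.legacy = legacy   # feature name -> legacy handler
        self.modern = modern   # feature name -> replacement handler
        self.rollout = {}      # feature name -> percent of traffic on new code

    def set_rollout(self, feature: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.rollout[feature] = percent

    def handle(self, feature: str, user_id: str, payload):
        # Stable hash so a given user consistently lands on the same side.
        bucket = zlib.crc32(f"{feature}:{user_id}".encode()) % 100
        use_new = feature in self.modern and bucket < self.rollout.get(feature, 0)
        handler = self.modern[feature] if use_new else self.legacy[feature]
        return handler(payload)
```

A feature starts at 0% and is raised in steps as each stage is verified. Once it has run at 100% for long enough, the legacy handler can be deleted.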

Use case: Shopify

Shopify faced growing challenges maintaining a large, monolithic codebase – making it harder to ship features quickly without introducing bugs or creating operational risks. To tackle this, they adopted the Strangler Fig pattern, gradually replacing legacy components with new, more maintainable services. Over the course of several years, this incremental approach reduced technical debt, improved system reliability, and gave their teams more confidence to iterate without fear of breaking critical functionality.

Actionable tip: design modular boundaries early so that future refactoring or service extraction can happen without major rewrites.

6. Embrace sacrificial architecture

When Martin Fowler introduced the concept of “sacrificial architecture,” he emphasized that early-stage systems are often built with the expectation they’ll be replaced later. Instead of aiming for a long-lasting architecture upfront – especially when requirements or technical details are still unclear – teams should intentionally create temporary, practical solutions to address immediate needs. They should use this period to learn more about the problem space.

What might this look like in practice?

Launching with a simple monolith to validate product-market fit before breaking into services.
Hard-coding values or manual workflows in a minimum viable product (MVP), knowing they’ll be automated later.
Using no-code tools or third-party platforms to test demand before investing in custom solutions.

The fundamental concept of sacrificial architecture includes:

Intentional short-term focus: engineers must recognize upfront that the current architecture is a short-term solution.
Rapid experimentation: sacrificial architecture thrives on fast, focused experimentation. When requirements are still evolving, the priority is testing assumptions – not perfecting the architecture. This means running small experiments to see what resonates with users and trying different technical approaches to test feasibility. Make sure to avoid getting caught up in long-term sustainability or scalability concerns.
Planned obsolescence: acknowledge that once key insights have been gathered and analyzed enough, the original design will be set aside. Eventually, what has been built will be replaced with an improved and more enduring solution.

Sacrificial architectures build momentum early, but they carry risks:

Technical debt if temporary code stays too long.
Scaling challenges if success outpaces the initial design.
Alignment issues if teams forget the system was meant to be replaced.

Use case: Instagram

Instagram started with a simple, monolithic backend – fast to build and good enough to support an early-stage product. Their priority was speed and learning, not long-term scalability. As user growth exploded, they treated that initial architecture as sacrificial – intentionally rebuilding core components to handle scale, performance, and reliability demands. That early simplicity wasn’t wasted; it gave them the insight and time needed to invest wisely in long-term infrastructure.

Actionable tip: when building experimental systems, document assumptions and set clear triggers to revisit the architecture before fragility sets in. Triggers could include hitting a user threshold (e.g., 50K daily active users), experiencing issues like rising error rates, or reaching business milestones such as preparing for an international launch.
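
Triggers like these are easier to honor when they are written down as code rather than remembered. Here is a minimal sketch, assuming a hypothetical `SystemSnapshot` record. The 50K user limit comes from the example above, while the 1% error-rate limit is an assumed placeholder.

```python
from dataclasses import dataclass


@dataclass
class SystemSnapshot:
    daily_active_users: int
    error_rate: float  # fraction of failed requests, e.g. 0.02 means 2%
    international_launch_planned: bool


def revisit_triggers(
    snapshot: SystemSnapshot,
    dau_limit: int = 50_000,
    error_rate_limit: float = 0.01,  # assumed placeholder, tune per system
) -> list[str]:
    """Return the reasons, if any, to revisit the sacrificial architecture."""
    reasons = []
    if snapshot.daily_active_users >= dau_limit:
        reasons.append("user threshold reached")
    if snapshot.error_rate > error_rate_limit:
        reasons.append("error rate above limit")
    if snapshot.international_launch_planned:
        reasons.append("business milestone: international launch")
    return reasons
```

An empty list means the temporary design is still within its documented assumptions.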

7. Use AI

Generative AI has emerged as a true asset for engineers. Platforms such as GitHub Copilot and ChatGPT are helping enhance team productivity by expediting tasks like creating code and outlining service structures.

Although these tools cannot entirely replace thinking in systems design, their capacity to automate mundane tasks and simplify coding procedures is impressive. In addition to generating code, we are observing the emergence of vibe coding – a method through which engineers partner with AI to brainstorm ideas and outline solutions.

Actionable tip: try out vibe coding to explore ideas and brainstorm approaches in code. Some teams have started incorporating this method into design meetings. Prototype service interfaces to evaluate trade-offs, outline API structures to coordinate front-end and back-end teams, and test out architectural patterns.

Final thoughts

The best staff+ engineers promote a culture of experimentation while also setting up a structure that supports the scalability and durability of systems.

Balancing quality and speed does not mean compromising on quality. Releasing often creates feedback loops that help refine products to meet user needs and market demands. These seven principles can help team leaders maintain a steady pace without accumulating excessive technical debt over time.

About the author

Maxime Najim

With two decades of experience building resilient, scalable systems, Maxime has had the privilege of working with engineering teams at Yahoo!, Apple, Netflix, and Amazon. Today, as a distinguished engineer at Target, he partners closely with senior engineering leaders—directors, VPs, and principal architects—to navigate complex technical challenges and align architectural decisions with business goals.
