Engineering managers ditch cloud AI for local LLMs

Are local LLMs a third way forward?

By Chris Stokel-Walker

June 23, 2026

You have 1 article left to read this month before you need to register a free LeadDev.com account.

Estimated reading time: 3 minutes

Key takeaways:

Local LLMs have crossed a credibility threshold. When the engineer whose work underpins local inference says he’s using a 27B model as his daily coding driver, that’s a signal worth taking seriously.
Local LLMs are already good enough for the boring-but-useful stuff where paying for frontier inference is overkill.
The “expensive licence or no AI” binary is breaking down: local LLMs offer a third path.

Between rising costs, tighter token limits, and the whims of a US President willing to flick the switch on availability for frontier models, engineering managers are facing Hobson’s choice.

Since the advent of ChatGPT those in charge have historically had two, equally bad, options available to them. Either they fork out for expensive and sometimes shaky seat licences for AI, or they have to forego frontier inference entirely because legal departments are too worried about data governance.

However, there’s a third option emerging for managers to pick from: local large language models (LLMs).

Your inbox, upgraded.

Receive weekly engineering insights to level up your leadership approach.

Go local?

Until recently, running serious models locally was largely the preserve of hobbyists, researchers, and those lucky enough to have bought high-specced graphics processing units (GPUs) when they were far cheaper.

Even then, performance lagged far behind frontier APIs – sometimes by several generations – and the setup was fiddly. When it worked, context windows were tight and tool calls weren’t always reliable.

The breakwater moment for the local LLM revolution was announced by Georgi Gerganov, the engineer behind llama.cpp, who is using a local model for much of his daily work.

Gerganov did not respond to a request for comment made via Hugging Face, the AI company he joined in February, but we do have a sense of his thinking thanks to a Hacker News comment.

“I can 100% attest to the fact that Qwen3.6-27B is a very capable local model for coding tasks,” Gerganov wrote. “Over the last month and a half I’ve been using it almost daily, either on my M2 Ultra or on my RTX 5090 box.”

Gerganov explained he is using a lightweight pi agent with everything stripped and a short system prompt “to align it a bit with my style.”

A broader consensus on local LLMs

It’s not just Gerganov who is a convert to the local LLM route. “Open source has played a critical role in the software industry for decades, offering invaluable lessons along the way,” says Mat Velloso, former VP at Meta and Google DeepMind, who recently posted that he is starting to use GLM-5.2, an open model developed by Z.AI, as his daily driver for development.

Velloso is a supporter of open-source, though also points out that reliance on it is nothing new.

“Today, both private and public organizations are significantly stronger thanks to the relentless dedication of open-source contributors,” Velloso says. “I believe the same will prove true for open-weight AI models.”

This reflects a bigger shift in how things are changing. “The world is quickly realising that critical infrastructure cannot rely purely on proprietary models that might be withdrawn without notice,” says Velloso. “Openness is no longer just a matter of business continuity; it is an imperative for technological sovereignty and safety.”

Total replacement?

The practical question for engineering managers is less whether local models can go toe-to-toe with the frontier labs (they can) and more about where they are already good enough to start eating into paid API usage.

For many teams, that’s the boring-but-useful bits of coding like autocomplete and refactoring, as well as documentation, test generation, and lightweight debugging.

Gerganov hinted at that in his own comment, saying “I use it for small mundane tasks at ggml-org – nothing really impressive, but definitely a helpful tool for a maintainer.”

That’s useful, particularly in organizations where latency, privacy, or predictable costs matter more than squeezing out the last few percentage points of capability.

New York • September 15 & 16, 2026

Delivering AI results without a playbook?

Find what’s working at LDX3

Explore

That said, while open-source models and local LLMs are a new option for software engineers, there’s a deeper question about whether they could solely be responsible for development and advancement in the field.

Velloso isn’t so sure that we could rely on open-source labs alone without the commercial incentives enforced by those closed-source, proprietary labs, with their funding and massive user bases.

“I am deeply grateful to the many AI labs – both in the US and globally – that continue to push the frontiers of AI and ensure it remains accessible to all,” he says. “The world needs them to keep leading the charge.”

About the author

Chris Stokel-Walker

Chris Stokel-Walker is a freelance journalist based in the UK.
- @stokel

Newsletters

Panel discussions

Videos

Reports

For you

London

Meetups

New York

Berlin

Engineering managers ditch cloud AI for local LLMs

By Chris Stokel-Walker

Your inbox, upgraded.

Go local?

A broader consensus on local LLMs

More like this

Total replacement?

About the author

Chris Stokel-Walker

London

Meetups

New York

Berlin

Engineering managers ditch cloud AI for local LLMs

By Chris Stokel-Walker

Your inbox, upgraded.

Go local?

A broader consensus on local LLMs

More like this

Total replacement?

Share:

About the author

Share:

More like this