Best practices for letting AI write integration code

Integrations are no walk in the park.
December 4, 2025

When you’re integrating AI into engineering workflows, reliability, robustness, and fit with your existing processes are the key considerations.

Before we built an AI system that could write integration code for us, we had to confront a simple question that shaped the entire project. What actually makes an integration hard for an AI to write in the first place? 

From the outside, integrations can look like straightforward plumbing. One system has data, another system needs that data, and the code in the middle simply moves it from point A to point B. But anyone who has ever built integrations knows that this image is an illusion. Integrations are among the most complex and failure-prone parts of software development, not because the concept is difficult, but because the real world is messy.

APIs, the interfaces that systems use to communicate, rarely behave consistently. One system might paginate data with page numbers, another with opaque tokens, and another with a mix of both depending on the dataset. A vendor might enforce strict rate limits on one customer but not on another. Documentation might describe authentication flows that differ from what actually happens in production. Even the shape of the data itself can shift between accounts.
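To make that concrete, here is a small, hypothetical sketch of the same “list contacts” call against two vendors with different pagination schemes. The endpoints, parameters, and field names are illustrative only, not any specific vendor’s API.

```python
# Illustrative sketch: two vendors, two pagination styles for the same job.
# All endpoint paths, parameters, and response fields here are hypothetical.
import requests


def fetch_all_page_numbered(base_url: str, token: str) -> list[dict]:
    """Vendor A: classic page-number pagination."""
    records, page = [], 1
    while True:
        resp = requests.get(
            f"{base_url}/contacts",
            params={"page": page, "per_page": 100},
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()["results"]
        if not batch:          # an empty page means we are done
            break
        records.extend(batch)
        page += 1
    return records


def fetch_all_cursor(base_url: str, token: str) -> list[dict]:
    """Vendor B: opaque cursor tokens; the loop ends when no cursor comes back."""
    records, cursor = [], None
    while True:
        params = {"limit": 100}
        if cursor:
            params["cursor"] = cursor
        resp = requests.get(
            f"{base_url}/contacts",
            params=params,
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        resp.raise_for_status()
        body = resp.json()
        records.extend(body["items"])
        cursor = body.get("next_cursor")
        if not cursor:
            break
    return records
```

Two loops, two termination conditions, two response shapes. An engineer absorbs this kind of variation without thinking about it; a model asked to infer it from documentation will often guess.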

Humans handle these inconsistencies by drawing on intuition and accumulated experience. AI has none of those instincts. It has patterns, not understanding.

Our early experiments made this painfully clear. We asked the model to generate full integrations based on API documentation, and the code it produced appeared polished at first glance. But beneath the polish, it invented details that were simply not true. It guessed at pagination rules. It assumed fields existed in responses where they did not. It stitched together flows that would break as soon as they interacted with real data. The model was filling in gaps, because the world we placed it in was full of ambiguity.

That was the turning point. We saw that the model wasn’t failing. It was doing exactly what we asked. The failure was our assumption that the model could reason through the same implicit knowledge an engineer carries around. To make the model useful, we would need to build a world where the “unknowns” were no longer unknowns.

Building a world AI can succeed in

The first step was reshaping the environment that the model worked inside. Instead of feeding it raw API documentation and hoping it would piece things together, we created a set of internal tools that hid the complexity of real APIs behind predictable, well-defined interfaces. 

These tools handled everything that tends to vary in the wild. They knew how to authenticate reliably. They understood the quirks of pagination. They respected rate limits. They stabilized inconsistent fields. By wrapping external systems in a consistent shape, we removed the need for the AI to interpret behavior that even humans find confusing.
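As an illustration, a wrapper of this kind might look roughly like the sketch below. The class, method names, and vendor details are hypothetical, but they show where authentication, rate limiting, pagination, and field normalization live, so generated code never has to deal with them.

```python
# Minimal sketch of an internal wrapper, assuming a single fictional CRM vendor.
# Names and fields are hypothetical; the point is that the messy parts live here.
import time
from typing import Iterator

import requests


class CrmClient:
    """Stable, predictable interface over one vendor's CRM API."""

    def __init__(self, base_url: str, token: str, min_interval: float = 0.2):
        self.base_url = base_url
        self.session = requests.Session()
        self.session.headers["Authorization"] = f"Bearer {token}"
        self.min_interval = min_interval  # crude client-side rate limiting
        self._last_call = 0.0

    def _get(self, path: str, **params) -> dict:
        # Enforce a minimum spacing between calls so flows never think about rate limits.
        wait = self.min_interval - (time.monotonic() - self._last_call)
        if wait > 0:
            time.sleep(wait)
        resp = self.session.get(f"{self.base_url}{path}", params=params, timeout=30)
        self._last_call = time.monotonic()
        resp.raise_for_status()
        return resp.json()

    def list_contacts(self) -> Iterator[dict]:
        # Pagination quirks are hidden here; callers just iterate.
        cursor = None
        while True:
            extra = {"cursor": cursor} if cursor else {}
            body = self._get("/contacts", limit=100, **extra)
            for raw in body.get("items", []):
                yield self._normalize(raw)
            cursor = body.get("next_cursor")
            if not cursor:
                return

    @staticmethod
    def _normalize(raw: dict) -> dict:
        # Stabilize fields that vary between accounts into one predictable shape.
        return {
            "id": raw.get("id") or raw.get("contact_id"),
            "email": (raw.get("email") or "").lower(),
            "name": raw.get("full_name") or raw.get("name", ""),
        }
```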

Once those foundations were in place, we narrowed the AI’s responsibility. Instead of asking it to build integrations from scratch, we assigned it one specific job: generate the flow logic that connects two systems, using only the functions we gave it. Flow logic is the part of an integration that describes how data should move, how it should be transformed, and how systems should interact. It’s the human-readable layer that expresses intent.
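In this framing, a generated flow can be as small as the sketch below. The helper names (sync_contacts, map_contact, and the wrapped source and destination clients) are hypothetical; what matters is that the model only composes functions we supplied.

```python
# Illustrative only: what a generated flow might look like when the model can only
# call wrapper functions we provide. The source/destination clients are assumed to
# exist (e.g. the CrmClient-style wrappers sketched earlier).
def sync_contacts(source, destination) -> int:
    """Move contacts from one wrapped system to another, transforming as we go."""
    synced = 0
    for contact in source.list_contacts():      # wrapper handles auth and pagination
        record = map_contact(contact)            # pure transformation, no I/O
        destination.upsert_contact(record)       # wrapper handles retries and rate limits
        synced += 1
    return synced


def map_contact(contact: dict) -> dict:
    """Express intent: which fields matter and how they are renamed."""
    return {
        "external_id": contact["id"],
        "email": contact["email"],
        "display_name": contact["name"],
    }
```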

By limiting the AI to this focused task, we eliminated most of the uncertainty that had caused trouble early on. The model no longer needed to wonder how an API might behave. It only needed to decide how to move information through a controlled environment. The shift was immediate. The output became consistent, because the system only allowed consistent choices.

But even with strong architecture, another issue surfaced. The flows produced by the AI were far more reliable than before, but there were still mechanical slips. A method call might reference the correct idea but miss an argument. A transformation might follow the right pattern but leave out a detail. Engineers could fix these issues easily, but if they had to fix them constantly, the time savings would evaporate. We needed a way to ensure that the AI only handed engineers work worth reviewing.

That realization led naturally into the next phase.

Guardrails that keep the system honest

To support the AI’s progress and give engineers meaningful output, we introduced an automated validation system. This system sat between the AI and the humans, serving as an initial reviewer. As soon as the AI produced a flow, the system attempted to run it through a series of checks. It compiled the code. It verified that the functions the AI used were real. It tested whether the structure matched what our architecture expected. Whenever something fell out of alignment, the system rejected the flow and sent feedback back to the model.
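A rough sketch of that first-pass validation, using Python’s ast module to reject any call to a function outside an allow-list, is shown below. The allow-list contents are hypothetical, and the real checks go further, but the shape of the idea is the same.

```python
# Sketch of a pre-review check, not a production pipeline: parse the generated flow,
# confirm it compiles at the syntax level, and flag calls to anything outside an
# allow-list of known wrapper functions. The allow-lists here are hypothetical.
import ast

ALLOWED_FUNCTIONS = {"sync_contacts", "map_contact"}
ALLOWED_METHODS = {"list_contacts", "upsert_contact"}


def validate_flow(source_code: str) -> list[str]:
    """Return a list of problems; an empty list means the flow passes this stage."""
    problems = []
    try:
        tree = ast.parse(source_code)  # "does it compile" at the syntax level
    except SyntaxError as exc:
        return [f"syntax error: {exc}"]
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id not in ALLOWED_FUNCTIONS:
                problems.append(f"unknown function: {func.id}")
            elif isinstance(func, ast.Attribute) and func.attr not in ALLOWED_METHODS:
                problems.append(f"unknown method: {func.attr}")
    return problems
```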

This process improved consistency in a way prompts alone never could. The model received immediate corrections each time it stepped outside its boundaries. Over time, the number of corrections dropped, and the code began to settle into a stable pattern. Engineers who reviewed flows after this automated stage found that their time shifted from repairing structure to interpreting intent, which is exactly the kind of work humans excel at.
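The correction loop itself can be sketched in a few lines, assuming the validate_flow check above and a generate_flow callable standing in for whatever wraps the model; a real pipeline would add logging, retries with backoff, and escalation.

```python
# Hedged sketch of the feedback loop: regenerate until the automated checks pass,
# or give up and escalate to a human instead of merging anything questionable.
from typing import Callable, Optional


def generate_until_valid(
    generate_flow: Callable[[str], str],  # stand-in for the model call, injected here
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    feedback = ""
    for _ in range(max_attempts):
        code = generate_flow(prompt + feedback)
        problems = validate_flow(code)        # the allow-list check sketched above
        if not problems:
            return code                       # structurally sound: hand off to human review
        feedback = "\nFix these issues before regenerating:\n" + "\n".join(problems)
    return None                               # escalate to an engineer instead of merging
```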

Still, the automated checks had a natural limit. They could confirm whether the code was structurally valid, but they could not tell us whether the choices inside the code matched the expectations of a real integration. They could not interpret the purpose behind a field or the nuance inside a mapping. They could not recognize when a vendor’s behavior required a subtle exception. The final evaluation always belonged to a human. And that took us to the last piece of the system.

Keeping humans in control where it matters

With architecture guiding the model and automation screening its work, the remaining step was creating a workflow where engineers could evaluate the output in a way that fit naturally into their existing process. We decided early that AI-generated code would not be treated differently from human-written code. Every flow lives in our repository. Every flow passes through CI. Every flow becomes a pull request. Engineers review these flows exactly as they review one another’s work.

This choice preserves human judgment, which cannot be automated. Engineers understand subtle business requirements. They recognize when a flow is technically correct but practically risky. They know which fields matter most to a particular customer and how an API is likely to behave under real conditions. They can interpret the purpose behind the code, not just its structure.

The new system also changed how engineers onboard. Instead of beginning with a blank file, new team members encounter AI-generated flows that demonstrate our conventions in action. They learn by examining concrete examples rather than navigating abstract documentation. The AI becomes a starting point for learning rather than a replacement for it.

With all three pieces in place, the system functions smoothly. The architecture creates stability. The automation reinforces consistency. The human workflow brings meaning and judgment. Each step solves something the previous step could not. The model produces the code, but the system around the model determines whether that code can be trusted.

Final thoughts

Throughout this project, one lesson became clear. The reliability of AI-generated code is not determined by the AI alone. It is determined by the environment that surrounds it. A well-designed environment removes ambiguity, supports consistency, and gives engineers the clarity they need to trust the output. Without that environment, the AI guesses. With it, the AI collaborates.

We did not set out to build a system that would write integrations for us. We set out to build a world where an AI could participate in integration work without introducing risk. And in building that world, we discovered a way for humans and AI to work together that preserves speed, improves consistency, and maintains control. The structure, not the model, is what makes the system reliable.