Planning next moves: Improving performance when half your stack is someone else's problem

Learn how to measure latency, set realistic goals, and improve performance even when critical parts of your system are out of your control.

Speakers: Maude Lemaire


June 03, 2026

New York • September 15 & 16, 2026


Most of my career has been spent making one big backend stay up and go fast. My new job description is harder: keep a product fast when half the latency budget lives inside somebody else’s GPU cluster, every model provider degrades differently, and the honest answer to “what is our time to first token?” is “which of the five numbers do you want?”

So what does building a reliable, performant product on top of notoriously unreliable and underperforming AI models look like at Cursor? We’ll walk through the time-to-first-token pipeline from a user’s keystroke through the client, network, agent server, inference proxy, and model provider, along with the (sometimes comical) ways each layer lies to you about where the time went. We’ll cover why you can’t skimp on the basics (good observability), how to set aggressive-but-achievable goals, and how sometimes a handful of relatively simple changes can make all the difference.
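The talk page doesn’t include code, but the idea of splitting one time-to-first-token number into per-layer hops can be illustrated with a small sketch. It is not Cursor’s implementation: the stage names, the `RequestTrace` class, and the `percentile` and `meets_goal` helpers are hypothetical, and it assumes every timestamp comes from one synchronized clock, which real client, server, and provider machines rarely guarantee (one of the ways a layer can mislead you about where time went).

```python
from dataclasses import dataclass
from statistics import quantiles

# Hypothetical stage names, in the order a request passes through them.
STAGES = [
    "keystroke",             # user presses a key in the client
    "client_sent",           # client finishes building and sends the request
    "server_received",       # agent server receives it (network hop)
    "proxy_sent",            # inference proxy forwards it to a model provider
    "provider_first_token",  # provider emits its first token
    "client_first_token",    # first token is rendered back in the client
]


@dataclass
class RequestTrace:
    # Millisecond timestamps keyed by stage name. Assumes a single shared
    # clock; real deployments must correct for skew between machines.
    marks: dict[str, float]

    def ttft_ms(self) -> float:
        """End-to-end time to first token as the user experiences it."""
        return self.marks["client_first_token"] - self.marks["keystroke"]

    def breakdown_ms(self) -> dict[str, float]:
        """Time spent in each hop; the hops always sum to ttft_ms()."""
        return {
            f"{start}->{end}": self.marks[end] - self.marks[start]
            for start, end in zip(STAGES, STAGES[1:])
        }


def percentile(values: list[float], p: int) -> float:
    """p-th percentile for p in 1..99; quantiles(n=100) returns 99 cut points."""
    return quantiles(values, n=100, method="inclusive")[p - 1]


def meets_goal(traces: list[RequestTrace], p: int, goal_ms: float) -> bool:
    """True if the p-th percentile TTFT across traces is at or under the goal."""
    return percentile([t.ttft_ms() for t in traces], p) <= goal_ms


# Usage: one illustrative trace with made-up timings.
trace = RequestTrace({
    "keystroke": 0, "client_sent": 20, "server_received": 70,
    "proxy_sent": 95, "provider_first_token": 700, "client_first_token": 760,
})
print(trace.ttft_ms())  # 760
print(max(trace.breakdown_ms().items(), key=lambda kv: kv[1]))
# ('proxy_sent->provider_first_token', 605)
```

Recording a timestamp at each boundary, rather than a single end-to-end number, is what makes it possible to say which hop grew when the overall figure regresses. Judging a goal at a percentile such as p95 over many traces, instead of a single request, keeps one slow provider response from dominating the result.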

If you work on a product that sits downstream of dependencies you don’t own, providers you can’t fire mid-request, and users who think your product is slow when really it’s the weather, this one’s for you.
