We dive into the technical details of three areas in the context of shipping AI agents: security, reliability and performance.
This is a talk written by engineers, for engineers. At Gradient Labs we build AI agents for financial services companies. We share techniques and best practices that we’ve developed while shipping AI agents in high-stakes environments, covering the three areas outlined below.
Security
We talk about how we apply the principle of least privilege to AI agents, and walk through practical cases where we’ve had to think through security, such as identity verification with voice agents.
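As a rough illustration of least privilege for agent tools, here is a minimal sketch, not the implementation presented in the talk. It assumes a hypothetical tool registry (`TOOLS`), a hypothetical per-agent `AgentScope`, and a hypothetical `identity_verified` flag that gates sensitive tools; all names are invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple


class ToolNotPermitted(Exception):
    """Raised when an agent requests a tool outside its granted scope."""


@dataclass(frozen=True)
class AgentScope:
    # Hypothetical: the only tool names this agent is allowed to call.
    allowed_tools: FrozenSet[str]
    # Hypothetical: set to True only after the customer's identity is verified.
    identity_verified: bool = False


# Hypothetical registry: tool name -> (function, requires verified identity).
TOOLS: Dict[str, Tuple[Callable[..., str], bool]] = {
    "lookup_faq": (lambda query: f"FAQ answer for {query!r}", False),
    "get_account_balance": (lambda customer_id: f"balance for {customer_id}", True),
}


def call_tool(scope: AgentScope, name: str, **kwargs) -> str:
    """Run a tool only if the scope grants it; deny by default."""
    if name not in TOOLS or name not in scope.allowed_tools:
        raise ToolNotPermitted(f"tool {name!r} is not granted to this agent")
    func, needs_identity = TOOLS[name]
    if needs_identity and not scope.identity_verified:
        raise ToolNotPermitted(f"tool {name!r} requires a verified identity")
    return func(**kwargs)
```

The check lives outside the model, so a prompt that persuades the agent to request a sensitive tool still fails at the call boundary.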
Reliability
We talk about how we handle failures, rate limits and latency spikes, how we use durable execution and retries, and how we protect our agent from external factors like spikes in load.
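One common building block for this kind of resilience is retrying transient failures with capped exponential backoff and jitter. The sketch below is illustrative only and is not a durable execution system; `TransientError`, `retry_with_backoff` and their default parameters are assumptions made for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying transient errors with full-jitter backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retry budget exhausted: surface the failure
            # The delay ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads retries out so clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable without real waiting. Non-transient errors propagate immediately rather than being retried.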
Performance
We talk about testing non-deterministic systems using production conversations as test fixtures, LLM-as-judge evaluation, and the metrics that actually predict customer satisfaction.
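To make the idea of conversations as fixtures plus a judge concrete, here is a minimal sketch of an evaluation loop. It is an assumption-laden illustration, not the pipeline from the talk: `Fixture`, `pass_rate`, and the `agent` and `judge` callables are hypothetical, and in practice the judge would wrap an LLM call that scores the reply against the rubric.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Fixture:
    # Hypothetical: customer turns taken from a recorded conversation.
    customer_turns: List[str]
    # Hypothetical: plain-language criterion the judge checks the reply against.
    rubric: str


Agent = Callable[[List[str]], str]     # conversation so far -> agent reply
Judge = Callable[[Fixture, str], bool]  # (fixture, reply) -> pass or fail


def pass_rate(fixtures: List[Fixture], agent: Agent, judge: Judge) -> float:
    """Replay each fixture through the agent and return the fraction judged a pass."""
    if not fixtures:
        raise ValueError("at least one fixture is required")
    passed = sum(1 for f in fixtures if judge(f, agent(f.customer_turns)))
    return passed / len(fixtures)
```

Because replies are non-deterministic, the useful signal is the aggregate pass rate across many fixtures and how it moves between versions, rather than any single exact-match assertion.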
Key takeaways
- How to manage the risk of AI agents reading sensitive data and performing high-stakes actions
- How to ensure reliable execution in the presence of failures when working with AI agents
- How to test AI agents when traditional unit tests don’t work: synthetic evaluation pipelines, LLM-as-judge, and metrics that catch regressions before customers do