Detect malicious attacks at 84M RPS with ML under 500us

A deep dive into the high-performance, distributed architecture and machine learning optimizations Cloudflare uses to detect malicious attacks at a global scale with sub-millisecond latencies.

Speakers: Denzil Correa

November 14, 2025

With internet traffic constantly growing, how do you protect against malicious attacks while handling over 84 million requests per second? This talk pulls back the curtain on Cloudflare’s approach to high-throughput, low-latency threat detection. We’ll explore our distributed architecture, which spans data centers worldwide, and how we use Rust for efficiency in an environment where every CPU cycle and byte of memory counts. We’ll also cover the processing techniques that are critical to this performance, including caching strategies and hardware tuning.

A significant portion of the talk focuses on the machine learning models at the core of our Web Application Firewall (WAF) and the optimizations that keep ML-based detection under 500 microseconds, from SIMD and TensorFlow Lite upgrades to LRU caching. We’ll also touch on the data generation and sampling strategies we use to train accurate and resilient models.

Key takeaways

Architecting for Scale: Learn how to design and build distributed systems capable of handling massive volumes of traffic with low latency.
Performance Optimization: Discover practical techniques for optimizing software and hardware, including the benefits of using Rust and advanced caching.
ML in Production: Understand the challenges and solutions for deploying and optimizing machine learning models in a high-stakes, real-time environment.
Actionable Scaling Strategies: Gain insights into how to leverage serverless and distributed architectures to scale your own applications and ML algorithms.
