How I Built a Real-Time DDoS Detection Engine from Scratch

Introduction

When your web application is under attack, every second counts. In this post I'll walk through how I built a real-time anomaly detection engine that watches HTTP traffic, learns what normal looks like, and automatically blocks suspicious IPs with iptables. Everything is built from scratch, with no rate-limiting libraries.

What the Project Does

The live dashboard showing a banned IP, global request rate, and top source IPs

The system sits alongside a Nextcloud instance and monitors every HTTP request in real time. It learns normal traffic patterns, detects deviations using statistical analysis, blocks attacking IPs at the network level within 10 seconds, sends Slack alerts, and serves a live dashboard showing the current state of the system.

The Sliding Window

The detector daemon processing log lines and detecting anomalies in real time

The core data structure is a Python deque. For each IP address, I maintain a deque of (timestamp, is_error) tuples. For global traffic I maintain a deque of timestamps.

Every time a new request arrives, I append it to the deque. Before checking the rate, I evict old entries from the left using a while loop:

python

# Evict entries older than 60 seconds; dq holds (timestamp, is_error) tuples.
cutoff = now - 60
while dq and dq[0][0] < cutoff:
    dq.popleft()

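Both append and popleft are O(1), and each entry is evicted at most once, so the cost per request is amortized O(1). After eviction, the length of the deque equals the number of requests in the last 60 seconds. That count is the rate in requests per minute; divide by 60 for requests per second. No counters, no resets, no drift.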
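To make the two windows concrete, here is a minimal sketch of the structure described above. The class name `SlidingWindow` and its method names are hypothetical, chosen for illustration, and the optional `now` argument exists only so the behavior can be checked with fixed timestamps.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60  # window length from the post


class SlidingWindow:
    """Per-IP and global request windows (illustrative helper)."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.per_ip = defaultdict(deque)  # ip -> deque of (timestamp, is_error)
        self.global_hits = deque()        # deque of timestamps

    def record(self, ip, is_error, now=None):
        """Append one request to the per-IP and global windows."""
        now = time.time() if now is None else now
        self.per_ip[ip].append((now, is_error))
        self.global_hits.append(now)

    def ip_count(self, ip, now=None):
        """Evict stale entries for this IP, then return its request count."""
        now = time.time() if now is None else now
        dq = self.per_ip[ip]
        cutoff = now - self.window
        while dq and dq[0][0] < cutoff:
            dq.popleft()
        return len(dq)

    def global_count(self, now=None):
        """Evict stale global entries, then return the global request count."""
        now = time.time() if now is None else now
        cutoff = now - self.window
        while self.global_hits and self.global_hits[0] < cutoff:
            self.global_hits.popleft()
        return len(self.global_hits)
```

Eviction happens lazily at read time, so an IP that goes quiet keeps its old entries until its count is next requested. A long-running daemon would probably also want to drop empty deques from `per_ip` periodically.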

How the Baseline Learns

Baseline mean changing across time — from 1.03 during quiet periods to 16.60 during attack traffic, then decaying back as traffic normalized

Every second, I record how many requests arrived that second into a rolling deque spanning 30 minutes. Every 60 seconds, a background thread wakes up and computes the mean and standard deviation of all per-second counts in that window:

python

import math

# Population mean and standard deviation of the per-second counts.
mean = sum(counts) / len(counts)
variance = sum((x - mean) ** 2 for x in counts) / len(counts)
stddev = math.sqrt(variance)

I also maintain per-hour slots. If the slot for the current hour has enough data (10 or more samples), I use it instead of the full 30-minute window, so the baseline adapts to time-of-day patterns automatically.

Floor values prevent division-by-zero and false positives during startup: mean floors at 1.0, stddev at 0.5.
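Putting the pieces of this section together, a baseline recalculation could look roughly like the sketch below. The function names, the `hour_slots` dict (hour of day mapped to a list of per-second counts), and the choice to return the floors when there is no data yet are assumptions made for illustration; only the 30-minute window, the 10-sample rule, and the floor values come from the description above.

```python
import math
from collections import deque

WINDOW_SECONDS = 30 * 60  # 30-minute rolling window of per-second counts
MIN_HOUR_SAMPLES = 10     # "10+ samples" before the hourly slot is preferred
MEAN_FLOOR = 1.0
STDDEV_FLOOR = 0.5


def mean_stddev(counts):
    """Population mean and standard deviation with the floors applied."""
    if not counts:
        # Assumption: with no data at all, fall back to the floors.
        return MEAN_FLOOR, STDDEV_FLOOR
    mean = sum(counts) / len(counts)
    variance = sum((x - mean) ** 2 for x in counts) / len(counts)
    return max(mean, MEAN_FLOOR), max(math.sqrt(variance), STDDEV_FLOOR)


def compute_baseline(rolling_counts, hour_slots, hour):
    """Prefer the current hour's slot when it has enough samples."""
    hour_counts = hour_slots.get(hour, [])
    if len(hour_counts) >= MIN_HOUR_SAMPLES:
        return mean_stddev(hour_counts)
    return mean_stddev(rolling_counts)


# A deque with maxlen drops the oldest second automatically once it is full.
rolling = deque([2, 4, 4, 4, 5, 5, 7, 9], maxlen=WINDOW_SECONDS)
baseline = compute_baseline(rolling, {}, hour=14)  # (5.0, 2.0)
```

With the eight sample counts above, the mean is 40 / 8 = 5.0 and the population variance is 32 / 8 = 4.0, giving a standard deviation of 2.0. An all-zero window would instead return the floors, (1.0, 0.5).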

Detection Logic

Structured audit log showing BAN events and baseline recalculations

Two conditions can trigger a ban, whichever fires first:

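Z-score threshold: if (rate - mean) / stddev > 3.0, the IP is statistically anomalous. A z-score of 3.0 means the rate is 3 standard deviations above the baseline mean. If traffic were normally distributed, a value that far above the mean would occur by chance only about 0.13% of the time (0.27% counting both tails). Real traffic is burstier than a normal distribution, so treat that figure as a rough guide rather than a guarantee.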

Rate multiplier: if rate > 5 * mean, the IP is sending more than 5 times the normal rate regardless of variance.

If an IP also has an elevated error rate (more than 3x the baseline error mean), both thresholds tighten to 70% of their normal values, which works out to a z-score of 2.1 and a multiplier of 3.5. The system becomes more sensitive to IPs that are both high-volume and causing errors.
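The two conditions and the tightening rule can be sketched as a single function. This is an illustration under assumptions, not the detector's actual code: the function name, the returned labels, the order in which the conditions are checked, and the `error_rate` and `error_mean` parameters are all hypothetical, and `rate` is assumed to be expressed in the same unit as `mean`.

```python
Z_THRESHOLD = 3.0
RATE_MULTIPLIER = 5.0
ERROR_MULTIPLIER = 3.0  # "3x the baseline error mean"
TIGHTEN_FACTOR = 0.7    # thresholds drop to 70% of normal


def check_ip(rate, mean, stddev, error_rate=0.0, error_mean=0.0):
    """Return the name of the condition that fired, or None.

    mean and stddev are expected to already carry the floors
    (1.0 and 0.5), so the division below cannot hit zero.
    """
    z_limit, mult_limit = Z_THRESHOLD, RATE_MULTIPLIER
    if error_mean > 0 and error_rate > ERROR_MULTIPLIER * error_mean:
        # Elevated errors: tighten both thresholds.
        z_limit *= TIGHTEN_FACTOR
        mult_limit *= TIGHTEN_FACTOR
    if (rate - mean) / stddev > z_limit:
        return "zscore"
    if rate > mult_limit * mean:
        return "rate_multiplier"
    return None
```

The multiplier matters most when the baseline is noisy. With a mean of 2 and a standard deviation of 4, a rate of 11 has a z-score of only 2.25, but it still exceeds 5 times the mean and gets flagged.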

iptables Blocking

iptables DROP rule inserted for the attacking IP

When an IP is flagged, the blocker runs:

python

import subprocess
subprocess.run(["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"], check=True)  # raise if iptables fails

The -I flag inserts the rule at the top of the INPUT chain, so it is evaluated before any existing ACCEPT rules and takes effect immediately for all subsequent packets. The IP is blocked at the kernel level, so its packets never reach Nginx or the application.

On unban, the rule is deleted:

python

subprocess.run(["iptables", "-D", "INPUT", "-s", ip, "-j", "DROP"])
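Because the IP string comes out of a log line, it is worth validating before it is handed to iptables. The sketch below wraps the two commands above in hypothetical helpers (`iptables_cmd`, `ban`, `unban`). The validation step and the switch to `ip6tables` for IPv6 addresses are my additions for illustration, not something the original blocker is stated to do.

```python
import ipaddress
import subprocess


def iptables_cmd(action, ip):
    """Build the argument list to insert ("-I") or delete ("-D") a DROP rule."""
    if action not in ("-I", "-D"):
        raise ValueError(f"unsupported action: {action}")
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    # Assumption: IPv6 sources are handled with the separate ip6tables binary.
    binary = "iptables" if addr.version == 4 else "ip6tables"
    return [binary, action, "INPUT", "-s", str(addr), "-j", "DROP"]


def ban(ip):
    """Insert the DROP rule; raises CalledProcessError if iptables fails."""
    subprocess.run(iptables_cmd("-I", ip), check=True)


def unban(ip):
    """Delete the DROP rule; raises CalledProcessError if iptables fails."""
    subprocess.run(iptables_cmd("-D", ip), check=True)
```

Passing the command as a list avoids shell interpretation entirely, and `ipaddress.ip_address` rejects anything that is not a literal address. Running `ban` and `unban` requires root privileges.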

Bans follow a backoff schedule: 10 minutes for the first offense, 30 minutes for the second, 2 hours for the third, and a permanent ban for any offense after that.
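The schedule maps naturally onto a small lookup. In this sketch the function name and the use of `None` to mean "permanent" are assumptions, as is the reading that the permanent ban starts at the fourth offense.

```python
BAN_SCHEDULE = [10 * 60, 30 * 60, 2 * 60 * 60]  # seconds: 10 min, 30 min, 2 h


def ban_duration(offense_count):
    """Return the ban length in seconds for the Nth offense (1-based).

    Returns None for a permanent ban once the schedule is exhausted.
    """
    if offense_count < 1:
        raise ValueError("offense_count must be >= 1")
    if offense_count <= len(BAN_SCHEDULE):
        return BAN_SCHEDULE[offense_count - 1]
    return None
```

Keeping the durations in a list means the schedule can be tuned without touching the logic.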

Slack Alerts

Slack alert for a global traffic anomaly

Slack ban notification with condition, rate, baseline and duration

Automatic unban notification after the 10-minute backoff expired

Every ban, unban, and global anomaly sends a Slack message via an incoming webhook with the condition that fired, current rate, baseline mean, timestamp, and ban duration. This gives operators immediate visibility without needing to watch a dashboard.
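As a rough sketch of the alert path, the code below builds a message and posts it to an incoming webhook using only the standard library. Slack incoming webhooks accept a JSON body with a `text` field. The function names, the message layout, and the parameter names here are illustrative assumptions, not the project's actual format.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_alert(event, ip, condition, rate, mean, duration_s):
    """Build a webhook payload; duration_s=None is rendered as permanent."""
    duration = "permanent" if duration_s is None else f"{duration_s // 60} min"
    ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
    text = (
        f"*{event}* {ip}\n"
        f"condition: {condition} | rate: {rate:.2f} | baseline mean: {mean:.2f}\n"
        f"duration: {duration} | time: {ts}"
    )
    return {"text": text}


def send_alert(webhook_url, payload, timeout=5):
    """POST the payload as JSON to the webhook URL and return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

In a detector loop it is probably safer to call `send_alert` from a separate thread or wrap it in a try/except, so that a slow or unreachable webhook cannot delay a ban.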

The Dashboard

A Flask app serves a dark-themed dashboard that auto-refreshes every 3 seconds via a JavaScript fetch loop hitting /api/metrics. It shows uptime, global request rate, baseline mean and stddev, CPU and memory usage, currently banned IPs, and the top 10 source IPs by request count.
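The payload behind /api/metrics can be assembled from data the detector already holds. The sketch below shows one possible shape. The function name and field names are hypothetical, CPU and memory figures are left out because they would need an extra library, and the Flask route itself is omitted to keep the example standard-library only.

```python
import time
from collections import Counter


def build_metrics(started_at, global_rate, mean, stddev, banned_ips, ip_counts, now=None):
    """Assemble a JSON-serializable dict that a metrics endpoint could return."""
    now = time.time() if now is None else now
    return {
        "uptime_seconds": int(now - started_at),
        "global_rate": global_rate,
        "baseline_mean": mean,
        "baseline_stddev": stddev,
        "banned_ips": sorted(banned_ips),
        # Top 10 source IPs as (ip, count) pairs, highest count first.
        "top_ips": Counter(ip_counts).most_common(10),
    }
```

A Flask view would then only need to return this dict through `jsonify`, and the dashboard's fetch loop can render it directly.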

Conclusion

Building this from scratch taught me how statistical anomaly detection works in practice, how iptables integrates with application logic, and how to think about baseline learning in a system where traffic patterns change over time. The hardest part was getting the baseline to adapt quickly enough to real attacks while not being so sensitive that normal traffic spikes trigger false positives.
