How I Built a Real-Time DDoS Detection Engine from Scratch

Introduction
When your web application is under attack, every second counts. In this post I'll walk you through how I built a real-time anomaly detection engine that watches HTTP traffic, learns what normal looks like, and automatically blocks suspicious IPs using iptables — all from scratch, no rate-limiting libraries allowed.
What the Project Does
The live dashboard showing a banned IP, global request rate, and top source IPs
The system sits alongside a Nextcloud instance and monitors every HTTP request in real time. It learns normal traffic patterns, detects deviations using statistical analysis, blocks attacking IPs at the network level within 10 seconds, sends Slack alerts, and serves a live dashboard showing the current state of the system.
The Sliding Window
The detector daemon processing log lines and detecting anomalies in real time
The core data structure is a Python deque. For each IP address, I maintain a deque of (timestamp, is_error) tuples. For global traffic I maintain a deque of timestamps.
Every time a new request arrives, I append it to the deque. Before checking the rate, I evict old entries from the left using a while loop:
```python
# Evict (timestamp, is_error) entries older than 60 seconds; `now` is the current Unix time
while dq and dq[0][0] < now - 60:
    dq.popleft()
```
Appending is O(1). Eviction is amortized O(1) because each entry is popped at most once. After eviction, the length of the deque equals the number of requests in the last 60 seconds, and that count is the rate. No counters, no resets, no drift.
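Here is a minimal sketch of the per-IP and global windows. It assumes timestamps are Unix seconds taken from the log line, and the class and method names are hypothetical, not the project's actual code. Passing `now` explicitly lets you test the eviction logic without waiting on the clock.

```python
import time
from collections import deque


class SlidingWindow:
    """Request counts over the last `window` seconds (hypothetical class)."""

    def __init__(self, window=60.0):
        self.window = window
        self.per_ip = {}            # ip -> deque of (timestamp, is_error)
        self.global_hits = deque()  # timestamp of every request

    def record(self, ip, is_error=False, now=None):
        now = time.time() if now is None else now
        self.per_ip.setdefault(ip, deque()).append((now, is_error))
        self.global_hits.append(now)

    def ip_count(self, ip, now=None):
        """Requests from `ip` inside the window, after evicting stale entries."""
        now = time.time() if now is None else now
        dq = self.per_ip.get(ip)
        if dq is None:
            return 0
        # Each entry is popped at most once, so eviction is amortized O(1).
        while dq and dq[0][0] < now - self.window:
            dq.popleft()
        return len(dq)

    def global_count(self, now=None):
        """Requests from all IPs inside the window."""
        now = time.time() if now is None else now
        while self.global_hits and self.global_hits[0] < now - self.window:
            self.global_hits.popleft()
        return len(self.global_hits)


w = SlidingWindow(window=60)
w.record("203.0.113.7", now=0.0)
w.record("203.0.113.7", now=30.0)
w.record("203.0.113.7", now=61.0)
print(w.ip_count("203.0.113.7", now=61.0))  # 2: the t=0 entry has aged out
```

There are two design points to keep in mind. First, eviction happens lazily on read, so an IP that stops sending leaves an empty deque behind in `per_ip`. A periodic sweep that deletes empty entries keeps memory bounded when traffic comes from many distinct addresses. Second, `len(dq)` counts requests per 60-second window. If the baseline is measured per second, as in the next section, dividing by the window length puts the two on the same scale. That normalization is an assumption of this sketch, not something stated above.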
How the Baseline Learns
Baseline mean changing across time — from 1.03 during quiet periods to 16.60 during attack traffic, then decaying back as traffic normalized
Every second, I record how many requests arrived that second into a rolling deque spanning 30 minutes. Every 60 seconds, a background thread wakes up and computes the mean and standard deviation of all per-second counts in that window:
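```python
import math

# Population mean and standard deviation of the per-second counts (counts must be non-empty)
mean = sum(counts) / len(counts)
variance = sum((x - mean) ** 2 for x in counts) / len(counts)
stddev = math.sqrt(variance)
```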
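The calculation above needs a history of per-second counts. One way to build it is to bucket requests by the integer second of their log timestamp, driven by log timestamps rather than a wall-clock timer (an assumption). Idle seconds are zero-filled so that quiet periods pull the mean down as they should. `PerSecondCounts` and its method are hypothetical names.

```python
from collections import deque


class PerSecondCounts:
    """Rolling history of per-second request counts (hypothetical class).

    A bucket is closed when a request from a later second arrives; this
    assumes log timestamps are (mostly) non-decreasing.
    """

    def __init__(self, history_seconds=1800):  # 1800 s = 30 minutes
        self.history = deque(maxlen=history_seconds)  # oldest buckets drop off
        self.current_second = None
        self.current_count = 0

    def hit(self, timestamp):
        second = int(timestamp)
        if self.current_second is None:
            self.current_second = second
        elif second > self.current_second:
            # Close the finished bucket, then record a zero for each idle second.
            self.history.append(self.current_count)
            idle = second - self.current_second - 1
            self.history.extend([0] * min(idle, self.history.maxlen))
            self.current_second = second
            self.current_count = 0
        # Late (out-of-order) lines are counted toward the current second.
        self.current_count += 1


counts = PerSecondCounts(history_seconds=1800)
for ts in [0.1, 0.5, 0.9, 1.2, 4.0]:
    counts.hit(ts)
print(list(counts.history))  # [3, 1, 0, 0]; the bucket for second 4 is still open
```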
I also maintain per-hour slots. If the current hour has enough data (10+ samples), I prefer that over the full 30-minute window — this means the baseline adapts to time-of-day patterns automatically.
Floor values prevent division-by-zero and false positives during startup: mean floors at 1.0, stddev at 0.5.
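The function below combines the per-hour preference and the floors. It is a sketch. The `hourly_slots` layout (hour of day mapped to a list of per-second counts) and the constant names are assumptions. Only the 10-sample threshold and the 1.0 / 0.5 floors come from the description above.

```python
import math

MEAN_FLOOR = 1.0
STDDEV_FLOOR = 0.5
MIN_HOURLY_SAMPLES = 10


def mean_stddev(counts):
    """Population mean and standard deviation; `counts` must be non-empty."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((x - mean) ** 2 for x in counts) / n
    return mean, math.sqrt(variance)


def compute_baseline(window_counts, hourly_slots, current_hour):
    """Return (mean, stddev), preferring the current hour's slot when it has enough samples.

    window_counts: per-second counts from the rolling 30-minute window
    hourly_slots:  hypothetical layout, hour of day (0-23) -> list of per-second counts
    """
    hour_counts = hourly_slots.get(current_hour, [])
    if len(hour_counts) >= MIN_HOURLY_SAMPLES:
        source = hour_counts
    elif window_counts:
        source = window_counts
    else:
        # Nothing recorded yet: start from the floors.
        return MEAN_FLOOR, STDDEV_FLOOR
    mean, stddev = mean_stddev(list(source))
    # Floors keep the z-score denominator non-zero and stop a near-silent
    # start-up period from making every request look anomalous.
    return max(mean, MEAN_FLOOR), max(stddev, STDDEV_FLOOR)


print(compute_baseline([2, 4, 4, 4, 5, 5, 7, 9], {}, current_hour=13))  # (5.0, 2.0)
print(compute_baseline([0, 0, 0], {}, current_hour=13))                 # (1.0, 0.5)
```

In the daemon, a background thread would call this every 60 seconds and publish the result for the detector to read. Looping on `threading.Event.wait(60)` is a simple way to get that cadence while still allowing a clean shutdown.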
Detection Logic
Structured audit log showing BAN events and baseline recalculations
Two conditions can trigger a ban, whichever fires first:
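Z-score threshold: if (rate - mean) / stddev > 3.0, the IP is statistically anomalous. A z-score of 3.0 means the rate is 3 standard deviations above the mean. If per-second counts were normally distributed, a value that far above the mean would occur by chance only about 0.13% of the time (about 0.27% counting both tails). Real request counts are bursty and rarely normal, so treat that figure as a rough guide rather than a guaranteed false-positive rate.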
Rate multiplier: if rate > 5 * mean, the IP is sending more than 5 times the normal rate regardless of variance.
If an IP also has an elevated error rate (3x the baseline error mean), both thresholds tighten to 70% of their normal values: a z-score of 2.1 and a rate multiplier of 3.5. This makes the system more sensitive to IPs that are both high-volume and causing errors.
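The two conditions and the error-based tightening fit into a single check. This is a hedged sketch, not the project's code. The function and constant names are hypothetical, and skipping the tightening when there is no error baseline yet is an assumption of this sketch.

```python
Z_THRESHOLD = 3.0
RATE_MULTIPLIER = 5.0
ERROR_FACTOR = 3.0     # "elevated" errors: more than 3x the baseline error mean
TIGHTEN_FACTOR = 0.7   # thresholds drop to 70% when errors are elevated


def check_ip(rate, mean, stddev, error_rate=0.0, error_mean=0.0):
    """Return the condition that fired ("zscore" or "rate_multiplier"), or None.

    `rate`, `mean` and `stddev` must be in the same units, and `mean` and
    `stddev` are assumed to be already floored (stddev > 0).
    """
    z_limit, mult_limit = Z_THRESHOLD, RATE_MULTIPLIER
    # With no error baseline yet, this sketch skips tightening.
    if error_mean > 0 and error_rate > ERROR_FACTOR * error_mean:
        z_limit *= TIGHTEN_FACTOR      # 3.0 -> roughly 2.1
        mult_limit *= TIGHTEN_FACTOR   # 5.0 -> 3.5
    z = (rate - mean) / stddev
    if z > z_limit:
        return "zscore"
    if rate > mult_limit * mean:
        return "rate_multiplier"
    return None


print(check_ip(rate=3.0, mean=1.0, stddev=0.5))   # zscore (z = 4.0)
print(check_ip(rate=11.0, mean=2.0, stddev=4.0))  # rate_multiplier (z = 2.25, but 11 > 10)
print(check_ip(rate=2.2, mean=1.0, stddev=0.5))   # None (z = 2.4)
print(check_ip(rate=2.2, mean=1.0, stddev=0.5, error_rate=1.0, error_mean=0.2))  # zscore
```

The multiplier check catches the case the z-score misses: when the baseline is very noisy, a large stddev shrinks the z-score even though the IP is sending far more than usual.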
iptables Blocking
iptables DROP rule inserted for the attacking IP
When an IP is flagged, the blocker runs:
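```python
import subprocess
# Insert a DROP rule for this source IP at the top of the INPUT chain
subprocess.run(["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"], check=True)
```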
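The -I flag inserts the rule at the top of the INPUT chain (position 1), so it is evaluated before any existing rules in that chain, including ACCEPT rules, and applies to every subsequent packet from that address. The IP is blocked at the kernel level: its packets are dropped before they reach Nginx or the application.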
On unban, the rule is deleted:
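```python
import subprocess
# Delete the first INPUT rule that matches this exact specification
subprocess.run(["iptables", "-D", "INPUT", "-s", ip, "-j", "DROP"], check=True)
```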
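The wrapper sketched below (function names hypothetical) adds two safeguards that the one-liners above leave out. First, it validates the address with Python's `ipaddress` module before the address ever reaches `iptables`. Second, it passes `check=True`, so a failed rule change raises `subprocess.CalledProcessError` instead of being ignored. The sketch also routes IPv6 addresses to `ip6tables`, because `iptables` itself only manages IPv4 rules.

```python
import ipaddress
import subprocess


def iptables_cmd(action, ip):
    """Build the argv for a ban ("-I") or unban ("-D") of a single source address."""
    if action not in ("-I", "-D"):
        raise ValueError(f"unsupported action: {action}")
    # Reject anything that is not a literal IP address, e.g. a malformed log field.
    addr = ipaddress.ip_address(ip)
    # iptables only handles IPv4; IPv6 rules go through ip6tables.
    binary = "iptables" if addr.version == 4 else "ip6tables"
    return [binary, action, "INPUT", "-s", str(addr), "-j", "DROP"]


def run_iptables(action, ip, run=subprocess.run):
    """Execute the rule change; `run` is injectable so the logic can be tested without root."""
    # A list argv (no shell=True) means the address is never parsed by a shell,
    # and check=True raises CalledProcessError instead of failing silently.
    return run(iptables_cmd(action, ip), check=True)


print(iptables_cmd("-I", "203.0.113.7"))
# ['iptables', '-I', 'INPUT', '-s', '203.0.113.7', '-j', 'DROP']
```

`-D` removes only one matching rule per call, so banning the same IP twice would leave a stray rule behind after the unban. Running `iptables -C` with the same rule specification first avoids that: it exits with status 0 when the rule already exists.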
Bans follow a backoff schedule: 10 minutes for the first offense, 30 minutes for the second, 2 hours for the third, and permanent from the fourth offense onward.
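The backoff schedule maps onto a lookup table plus a per-IP offense counter. The names in this sketch are hypothetical. The offense count is never reset, which is an assumption, because the post does not say whether offenses expire.

```python
import math
from datetime import timedelta

# Ban lengths for the 1st, 2nd and 3rd offense; later offenses are permanent.
BAN_SCHEDULE = [timedelta(minutes=10), timedelta(minutes=30), timedelta(hours=2)]


def ban_duration(offense):
    """Ban length for a 1-based offense number, or None for a permanent ban."""
    if offense < 1:
        raise ValueError("offense numbers start at 1")
    if offense <= len(BAN_SCHEDULE):
        return BAN_SCHEDULE[offense - 1]
    return None


class BanTracker:
    """Tracks offenses and unban deadlines (hypothetical class)."""

    def __init__(self):
        self.offenses = {}  # ip -> number of bans so far (never reset in this sketch)
        self.expires = {}   # ip -> Unix time when the ban ends; math.inf if permanent

    def ban(self, ip, now):
        self.offenses[ip] = self.offenses.get(ip, 0) + 1
        duration = ban_duration(self.offenses[ip])
        self.expires[ip] = math.inf if duration is None else now + duration.total_seconds()
        return duration

    def due_for_unban(self, now):
        return [ip for ip, deadline in self.expires.items() if deadline <= now]

    def unban(self, ip):
        # Forget the deadline but keep the offense count for the next ban.
        self.expires.pop(ip, None)
```

A loop that periodically calls `due_for_unban`, deletes the matching iptables rule, and then calls `unban` is enough to drive automatic unbans.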
Slack Alerts
Slack alert for a global traffic anomaly
Slack ban notification with condition, rate, baseline and duration
Automatic unban notification after the 10-minute backoff expired
Every ban, unban, and global anomaly triggers a Slack message through an incoming webhook. Each message includes the condition that fired, the current rate, the baseline mean, a timestamp, and the ban duration, so operators get immediate visibility without having to watch the dashboard.
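Slack incoming webhooks accept an HTTP POST with a JSON body whose `text` field holds the message, so the standard library is enough. The sketch below separates formatting from sending so the message can be tested offline. The function names, the message layout, and the req/s units are assumptions.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_slack_payload(event, target, condition, rate, baseline_mean,
                        duration=None, when=None):
    """Format an alert as an incoming-webhook JSON body.

    `target` is an IP address, or e.g. "global" for a global anomaly.
    Units (req/s) are an assumption of this sketch.
    """
    if when is None:
        when = datetime.now(timezone.utc)
    lines = [
        f"*{event.upper()}* {target}",
        f"Condition: {condition}",
        f"Rate: {rate:.2f} req/s | Baseline mean: {baseline_mean:.2f} req/s",
        f"Time: {when.isoformat()}",
    ]
    if duration is not None:
        lines.append(f"Ban duration: {duration}")
    return {"text": "\n".join(lines)}


def send_slack(webhook_url, payload, timeout=5.0):
    """POST the payload as JSON; raises urllib.error.URLError on network failure."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A slow or unreachable webhook would otherwise stall detection, so it is reasonable to send alerts from a separate thread or queue and to catch `urllib.error.URLError` there. The webhook URL is a secret, and reading it from an environment variable keeps it out of the code.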
The Dashboard
A Flask app serves a dark-themed dashboard that auto-refreshes every 3 seconds via a JavaScript fetch loop hitting /api/metrics. It shows uptime, global request rate, baseline mean and stddev, CPU and memory usage, currently banned IPs, and the top 10 source IPs by request count.
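The data behind `/api/metrics` can be assembled in a plain function and then serialized by Flask. The field names below are hypothetical. CPU and memory usage are left out because the post does not say how it collects them; the third-party `psutil` package is one common option. `Counter.most_common(10)` produces the top-10 list directly.

```python
import time
from collections import Counter


def build_metrics(start_time, global_rate, baseline_mean, baseline_stddev,
                  banned, request_counts, now=None):
    """Assemble the JSON body for /api/metrics (field names are hypothetical).

    banned:         dict of ip -> unban timestamp
    request_counts: Counter of ip -> requests seen in the current window
    """
    now = time.time() if now is None else now
    return {
        "uptime_seconds": int(now - start_time),
        "global_rate": global_rate,
        "baseline": {"mean": baseline_mean, "stddev": baseline_stddev},
        "banned_ips": sorted(banned),
        # most_common(10) returns (ip, count) pairs, highest count first.
        "top_ips": [{"ip": ip, "requests": n} for ip, n in request_counts.most_common(10)],
    }
```

In Flask 1.1 and later, a view function can return this dict as-is and Flask serializes it to JSON. The route for `/api/metrics` can therefore be a one-line wrapper around `build_metrics`.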
Conclusion
Building this from scratch taught me how statistical anomaly detection works in practice, how iptables integrates with application logic, and how to think about baseline learning in a system where traffic patterns change over time. The hardest part was getting the baseline to adapt quickly enough to real attacks while not being so sensitive that normal traffic spikes trigger false positives.



