Shivansh21-pixel/WireSpy
# DPI Engine - Deep Packet Inspection System
This document explains **everything** about this project - from basic networking concepts to the complete code architecture. After reading this, you should understand exactly how packets flow through the system without needing to read the code.
## Table of Contents
1. [What is DPI?](#1-what-is-dpi)
2. [Networking Background](#2-networking-background)
3. [Project Overview](#3-project-overview)
4. [File Structure](#4-file-structure)
5. [The Journey of a Packet (Simple Version)](#5-the-journey-of-a-packet-simple-version)
6. [The Journey of a Packet (Multi-threaded Version)](#6-the-journey-of-a-packet-multi-threaded-version)
7. [Deep Dive: Each Component](#7-deep-dive-each-component)
8. [How SNI Extraction Works](#8-how-sni-extraction-works)
9. [How Blocking Works](#9-how-blocking-works)
10. [Building and Running](#10-building-and-running)
11. [Understanding the Output](#11-understanding-the-output)
12. [Extending the Project](#12-extending-the-project)
## 1. What is DPI?
**Deep Packet Inspection (DPI)** is a technology used to examine the contents of network packets as they pass through a checkpoint. Unlike simple firewalls that only look at packet headers (source/destination IP addresses and ports), DPI looks *inside* the packet payload.
### Real-World Uses:
- **ISPs**: Throttle or block certain applications (e.g., BitTorrent)
- **Enterprises**: Block social media on office networks
- **Parental Controls**: Block inappropriate websites
- **Security**: Detect malware or intrusion attempts
### What Our DPI Engine Does:
```
User Traffic (PCAP) → [DPI Engine] → Filtered Traffic (PCAP)
                           ↓
                      - Identifies apps (YouTube, Facebook, etc.)
                      - Blocks based on rules
                      - Generates reports
```
## 2. Networking Background
### The Network Stack (Layers)
When you visit a website, data travels through multiple "layers":
```
┌──────────────────────┬──────────────────────────────────┐
│ Layer 7: Application │ HTTP, TLS, DNS                   │
├──────────────────────┼──────────────────────────────────┤
│ Layer 4: Transport   │ TCP (reliable), UDP (fast)       │
├──────────────────────┼──────────────────────────────────┤
│ Layer 3: Network     │ IP addresses (routing)           │
├──────────────────────┼──────────────────────────────────┤
│ Layer 2: Data Link   │ MAC addresses (local network)    │
└──────────────────────┴──────────────────────────────────┘
```
### A Packet's Structure
Every network packet is like a **Russian nesting doll** - headers wrapped inside headers:
```
┌──────────────────────────────────────────────────────────────────┐
│ Ethernet Header (14 bytes)                                       │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ IP Header (20+ bytes)                                        │ │
│ │ ┌──────────────────────────────────────────────────────────┐ │ │
│ │ │ TCP Header (20+ bytes)                                   │ │ │
│ │ │ ┌──────────────────────────────────────────────────────┐ │ │ │
│ │ │ │ Payload (Application Data)                           │ │ │ │
│ │ │ │ e.g., TLS Client Hello with SNI                      │ │ │ │
│ │ │ └──────────────────────────────────────────────────────┘ │ │ │
│ │ └──────────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
```
### The Five-Tuple
A **connection** (or "flow") is uniquely identified by 5 values:
| Field | Example | Purpose |
|-------|---------|---------|
| Source IP | 192.168.1.100 | Who is sending |
| Destination IP | 172.217.14.206 | Where it's going |
| Source Port | 54321 | Sender's application identifier |
| Destination Port | 443 | Service being accessed (443 = HTTPS) |
| Protocol | TCP (6) | TCP or UDP |
**Why is this important?**
- All packets with the same 5-tuple belong to the same connection
- If we block one packet of a connection, we should block all of them
- This is how we "track" conversations between computers
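To use the 5-tuple as a flow-table key in C++, the struct needs equality and a hash. The sketch below assumes IPv4 addresses stored as `uint32_t` and uses a boost-style hash combiner chosen purely for illustration; `FiveTupleHash` and the `packets` counter are hypothetical names, and the project's own `types.h` may hash differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Sketch: a 5-tuple usable as an std::unordered_map key (IPv4 only).
struct FiveTuple {
    uint32_t src_ip = 0;
    uint32_t dst_ip = 0;
    uint16_t src_port = 0;
    uint16_t dst_port = 0;
    uint8_t protocol = 0;  // 6 = TCP, 17 = UDP

    bool operator==(const FiveTuple& o) const {
        return src_ip == o.src_ip && dst_ip == o.dst_ip &&
               src_port == o.src_port && dst_port == o.dst_port &&
               protocol == o.protocol;
    }
};

// Hypothetical hash: fold every field in with a boost-style hash_combine.
struct FiveTupleHash {
    std::size_t operator()(const FiveTuple& t) const {
        std::size_t h = 0;
        auto mix = [&h](std::size_t v) { h ^= v + 0x9e3779b9 + (h << 6) + (h >> 2); };
        mix(std::hash<uint32_t>{}(t.src_ip));
        mix(std::hash<uint32_t>{}(t.dst_ip));
        mix(std::hash<uint16_t>{}(t.src_port));
        mix(std::hash<uint16_t>{}(t.dst_port));
        mix(std::hash<uint8_t>{}(t.protocol));
        return h;
    }
};

struct Flow {
    uint64_t packets = 0;  // illustrative per-flow counter
};
```

With this, `flows[tuple]` creates a flow the first time a tuple is seen and returns the same entry for every later packet with an identical tuple.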
### What is SNI?
**Server Name Indication (SNI)** is part of the TLS/HTTPS handshake. When you visit `https://www.youtube.com`:
1. Your browser sends a "Client Hello" message
2. This message includes the domain name in **plaintext** (not encrypted yet!)
3. The server uses this to know which certificate to send
```
TLS Client Hello:
├── Version: TLS 1.2
├── Random: [32 bytes]
├── Cipher Suites: [list]
└── Extensions:
    └── SNI Extension:
        └── Server Name: "www.youtube.com"  ← We extract THIS!
```
**This is the key to DPI**: Even though HTTPS is encrypted, the domain name is visible in the Client Hello, the first message the client sends after the TCP handshake (unless the connection uses Encrypted Client Hello, ECH).
## 3. Project Overview
### What This Project Does
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Wireshark  │     │ DPI Engine  │     │   Output    │
│   Capture   │ ──► │             │ ──► │    PCAP     │
│(input.pcap) │     │ - Parse     │     │ (filtered)  │
└─────────────┘     │ - Classify  │     └─────────────┘
                    │ - Block     │
                    │ - Report    │
                    └─────────────┘
```
### Two Versions
| Version | File | Use Case |
|---------|------|----------|
| Simple (Single-threaded) | `src/main_working.cpp` | Learning, small captures |
| Multi-threaded | `src/dpi_mt.cpp` | Production, large captures |
## 4. File Structure
```
packet_analyzer/
├── include/                    # Header files (declarations)
│   ├── pcap_reader.h           # PCAP file reading
│   ├── packet_parser.h         # Network protocol parsing
│   ├── sni_extractor.h         # TLS/HTTP inspection
│   ├── types.h                 # Data structures (FiveTuple, AppType, etc.)
│   ├── rule_manager.h          # Blocking rules (multi-threaded version)
│   ├── connection_tracker.h    # Flow tracking (multi-threaded version)
│   ├── load_balancer.h         # LB thread (multi-threaded version)
│   ├── fast_path.h             # FP thread (multi-threaded version)
│   ├── thread_safe_queue.h     # Thread-safe queue
│   └── dpi_engine.h            # Main orchestrator
│
├── src/                        # Implementation files
│   ├── pcap_reader.cpp         # PCAP file handling
│   ├── packet_parser.cpp       # Protocol parsing
│   ├── sni_extractor.cpp       # SNI/Host extraction
│   ├── types.cpp               # Helper functions
│   ├── main_working.cpp        # ★ SIMPLE VERSION ★
│   ├── dpi_mt.cpp              # ★ MULTI-THREADED VERSION ★
│   └── [other files]           # Supporting code
│
├── generate_test_pcap.py       # Creates test data
├── test_dpi.pcap               # Sample capture with various traffic
└── README.md                   # This file!
```
## 5. The Journey of a Packet (Simple Version)
Let's trace a single packet through `main_working.cpp`:
### Step 1: Read PCAP File
```cpp
PcapReader reader;
reader.open("capture.pcap");
```
**What happens:**
1. Open the file in binary mode
2. Read the 24-byte global header (magic number, version, etc.)
3. Verify it's a valid PCAP file
**PCAP File Format:**
```
┌────────────────────────────┐
│ Global Header (24 bytes)   │  ← Read once at start
├────────────────────────────┤
│ Packet Header (16 bytes)   │  ← Timestamp, length
│ Packet Data (variable)     │  ← Actual network bytes
├────────────────────────────┤
│ Packet Header (16 bytes)   │
│ Packet Data (variable)     │
├────────────────────────────┤
│ ... more packets ...       │
└────────────────────────────┘
```
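The header check in Step 1 can be sketched as follows, assuming the classic libpcap format (not pcapng). Besides 0xa1b2c3d4, a reader may meet the byte-swapped value (file written on a machine of the opposite endianness) or the nanosecond-timestamp variants; `classifyMagic` and `PcapMagic` are hypothetical names, not the project's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Sketch: classify the first 4 bytes of a classic PCAP file.
enum class PcapMagic { Micro, MicroSwapped, Nano, NanoSwapped, Invalid };

PcapMagic classifyMagic(const uint8_t* buf, std::size_t len) {
    if (len < 24) return PcapMagic::Invalid;  // the global header is 24 bytes
    uint32_t m;
    std::memcpy(&m, buf, sizeof m);           // interpret in host byte order
    switch (m) {
        case 0xa1b2c3d4: return PcapMagic::Micro;         // same endianness as the writer
        case 0xd4c3b2a1: return PcapMagic::MicroSwapped;  // opposite endianness: swap fields
        case 0xa1b23c4d: return PcapMagic::Nano;          // nanosecond timestamps
        case 0x4d3cb2a1: return PcapMagic::NanoSwapped;
        default:         return PcapMagic::Invalid;       // not classic PCAP (e.g. pcapng)
    }
}
```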
### Step 2: Read Each Packet
```cpp
RawPacket raw;
while (reader.readNextPacket(raw)) {
    // raw.data contains the packet bytes
    // raw.header contains timestamp and length
}
```
**What happens:**
1. Read 16-byte packet header
2. Read N bytes of packet data (N = header.incl_len)
3. Return false when no more packets
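The record loop can be sketched over a file that is already loaded into memory, assuming its byte order matches the host (magic 0xa1b2c3d4 read natively). `readRecords` and `Record` are illustrative names, not the project's `PcapReader` API; a record whose `incl_len` runs past the end of the data is treated as truncated and ends the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch: walk the PCAP packet records in a buffer holding the whole file.
struct Record {
    uint32_t ts_sec, ts_usec, incl_len, orig_len;
    const uint8_t* data;  // points into the buffer, incl_len bytes long
};

std::vector<Record> readRecords(const uint8_t* buf, std::size_t len) {
    std::vector<Record> out;
    std::size_t off = 24;            // skip the 24-byte global header
    while (off + 16 <= len) {        // room for a 16-byte record header?
        Record r{};
        std::memcpy(&r.ts_sec, buf + off, 4);
        std::memcpy(&r.ts_usec, buf + off + 4, 4);
        std::memcpy(&r.incl_len, buf + off + 8, 4);
        std::memcpy(&r.orig_len, buf + off + 12, 4);
        off += 16;
        if (r.incl_len > len - off) break;  // truncated final record: stop
        r.data = buf + off;
        off += r.incl_len;                  // N = incl_len bytes of packet data
        out.push_back(r);
    }
    return out;
}
```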
### Step 3: Parse Protocol Headers
```cpp
PacketParser::parse(raw, parsed);
```
**What happens (in packet_parser.cpp):**
```
raw.data bytes (assuming 20-byte IP and TCP headers, i.e. no options):
[0-13]   Ethernet Header
[14-33]  IP Header
[34-53]  TCP Header
[54+]    Payload
```
After parsing:
```
parsed.src_mac   = "00:11:22:33:44:55"
parsed.dest_mac  = "aa:bb:cc:dd:ee:ff"
parsed.src_ip    = "192.168.1.100"
parsed.dest_ip   = "172.217.14.206"
parsed.src_port  = 54321
parsed.dest_port = 443
parsed.protocol  = 6 (TCP)
parsed.has_tcp   = true
```
**Parsing the Ethernet Header (14 bytes):**
```
Bytes 0-5:   Destination MAC
Bytes 6-11:  Source MAC
Bytes 12-13: EtherType (0x0800 = IPv4)
```
**Parsing the IP Header (20+ bytes):**
```
Byte 0:      Version (upper 4 bits) + Header Length/IHL (lower 4 bits, in 32-bit words)
Byte 8:      TTL (Time To Live)
Byte 9:      Protocol (6=TCP, 17=UDP)
Bytes 12-15: Source IP
Bytes 16-19: Destination IP
```
**Parsing the TCP Header (20+ bytes):**
```
Bytes 0-1:   Source Port
Bytes 2-3:   Destination Port
Bytes 4-7:   Sequence Number
Bytes 8-11:  Acknowledgment Number
Byte 12:     Data Offset (upper 4 bits: header length in 32-bit words)
Byte 13:     Flags (SYN, ACK, FIN, etc.)
```
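The fixed offsets in Step 3 only hold for 20-byte IP and TCP headers. Real traffic often carries TCP options, so a parser should derive the payload start from the IHL and Data Offset fields. This sketch (hypothetical `tcpPayloadOffset`) assumes untagged Ethernet + IPv4 + TCP and rejects everything else, including VLAN-tagged and IPv6 frames.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Sketch: locate the TCP payload using IHL and Data Offset instead of
// assuming 20-byte headers. Returns std::nullopt for non-IPv4, non-TCP,
// or truncated frames.
std::optional<std::size_t> tcpPayloadOffset(const uint8_t* d, std::size_t len) {
    const std::size_t eth = 14;
    if (len < eth + 20) return std::nullopt;
    uint16_t ether_type = static_cast<uint16_t>((d[12] << 8) | d[13]);
    if (ether_type != 0x0800) return std::nullopt;                 // not IPv4
    if ((d[eth] >> 4) != 4) return std::nullopt;                    // version nibble
    std::size_t ihl = static_cast<std::size_t>(d[eth] & 0x0F) * 4;  // 32-bit words -> bytes
    if (ihl < 20 || d[eth + 9] != 6) return std::nullopt;           // bad IHL or not TCP
    std::size_t tcp = eth + ihl;
    if (len < tcp + 20) return std::nullopt;
    std::size_t doff = static_cast<std::size_t>(d[tcp + 12] >> 4) * 4;  // upper nibble
    if (doff < 20 || len < tcp + doff) return std::nullopt;
    return tcp + doff;
}
```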
### Step 4: Create Five-Tuple and Look Up Flow
```cpp
FiveTuple tuple;
tuple.src_ip   = parseIP(parsed.src_ip);    // "192.168.1.100" -> uint32_t
tuple.dst_ip   = parseIP(parsed.dest_ip);
tuple.src_port = parsed.src_port;
tuple.dst_port = parsed.dest_port;
tuple.protocol = parsed.protocol;
Flow& flow = flows[tuple];  // Get or create
```
**What happens:**
- The flow table is a hash map: `FiveTuple → Flow` (so `FiveTuple` needs equality and a hash function)
- If this 5-tuple exists, we get the existing flow
- If not, a new flow is created
- All packets with the same 5-tuple share the same flow
### Step 5: Extract SNI (Deep Packet Inspection)
```cpp
// For HTTPS traffic (port 443); payload points just past the TCP header
if (tuple.dst_port == 443 && payload_length > 5) {
    auto sni = SNIExtractor::extract(payload, payload_length);
    if (sni) {
        flow.sni = *sni;                     // "www.youtube.com"
        flow.app_type = sniToAppType(*sni);  // AppType::YOUTUBE
    }
}
```
**What happens (in sni_extractor.cpp):**
1. **Check if it's a TLS Client Hello:**
   ```
   Byte 0: Content Type = 0x16 (Handshake) ✓
   Byte 5: Handshake Type = 0x01 (Client Hello) ✓
   ```
2. **Navigate to Extensions:**
   ```
   Skip: Version, Random, Session ID, Cipher Suites, Compression
   ```
3. **Find SNI Extension (type 0x0000):**
   ```
   Extension Type: 0x0000 (SNI)
   Extension Length: N
   SNI List Length: M
   SNI Type: 0x00 (hostname)
   SNI Length: L
   SNI Value: "www.youtube.com"  ← FOUND!
   ```
4. **Map SNI to App Type:**
   ```cpp
   // In types.cpp
   if (sni.find("youtube") != std::string::npos) {
       return AppType::YOUTUBE;
   }
   ```
### Step 6: Check Blocking Rules
```cpp
if (rules.isBlocked(tuple.src_ip, flow.app_type, flow.sni)) {
    flow.blocked = true;
}
```
**What happens:**
```cpp
// Check IP blacklist
if (blocked_ips.count(src_ip)) return true;
// Check app blacklist
if (blocked_apps.count(app)) return true;
// Check domain blacklist (substring match)
for (const auto& dom : blocked_domains) {
    if (sni.find(dom) != std::string::npos) return true;
}
return false;
```
### Step 7: Forward or Drop
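```cpp
if (flow.blocked) {
    dropped++;
    // Don't write to output
} else {
    forwarded++;
    // Write packet to output file (the file's 24-byte global header
    // must already have been written once, before the first packet)
    output.write(packet_header);  // 16-byte record header
    output.write(packet_data);
}
```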
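For the output to be a valid capture, the 24-byte global header has to be written once before the first forwarded packet, and each packet then gets a 16-byte record header. A minimal writer sketch, assuming host byte order (readers detect it from the magic number) and Ethernet link type 1; `PcapWriter` is a hypothetical class, not necessarily how the project writes its output.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Sketch: global header once in the constructor, then one record per packet.
class PcapWriter {
public:
    explicit PcapWriter(const std::string& path, uint32_t snaplen = 65535)
        : out_(path, std::ios::binary) {
        uint32_t magic = 0xa1b2c3d4;        // microsecond timestamps, host order
        uint16_t major = 2, minor = 4;      // format version 2.4
        int32_t thiszone = 0;               // GMT offset
        uint32_t sigfigs = 0, network = 1;  // link type 1 = Ethernet
        put(magic); put(major); put(minor); put(thiszone);
        put(sigfigs); put(snaplen); put(network);  // 24 bytes total
    }
    bool ok() const { return static_cast<bool>(out_); }
    void writePacket(uint32_t ts_sec, uint32_t ts_usec,
                     const std::vector<uint8_t>& data, uint32_t orig_len) {
        put(ts_sec); put(ts_usec);
        put(static_cast<uint32_t>(data.size()));  // incl_len
        put(orig_len);
        out_.write(reinterpret_cast<const char*>(data.data()),
                   static_cast<std::streamsize>(data.size()));
    }
private:
    template <typename T> void put(T v) {
        out_.write(reinterpret_cast<const char*>(&v), sizeof v);
    }
    std::ofstream out_;
};
```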
### Step 8: Generate Report
After processing all packets:
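```cpp
// Count packets per app (flow.packets = per-flow packet counter; name illustrative)
for (const auto& [tuple, flow] : flows) {
    app_stats[flow.app_type] += flow.packets;
}
// Print report, e.g.:
// YouTube: 150 packets (15%)
// Facebook: 80 packets (8%)
// ...
```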
## 6. The Journey of a Packet (Multi-threaded Version)
The multi-threaded version (`dpi_mt.cpp`) adds **parallelism** for high performance:
### Architecture Overview
```
                       ┌─────────────────┐
                       │  Reader Thread  │
                       │  (reads PCAP)   │
                       └────────┬────────┘
                                │
               ┌────────────────┴────────────────┐
               │        hash(5-tuple) % 2        │
               ▼                                 ▼
      ┌─────────────────┐               ┌─────────────────┐
      │   LB0 Thread    │               │   LB1 Thread    │
      │ (Load Balancer) │               │ (Load Balancer) │
      └────────┬────────┘               └────────┬────────┘
               │                                 │
       ┌───────┴────────┐                ┌───────┴────────┐
       │    hash % 2    │                │    hash % 2    │
       ▼                ▼                ▼                ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│ FP0 Thread  │  │ FP1 Thread  │  │ FP2 Thread  │  │ FP3 Thread  │
│ (Fast Path) │  │ (Fast Path) │  │ (Fast Path) │  │ (Fast Path) │
└──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
       │                │                │                │
       └────────────────┴───────┬────────┴────────────────┘
                                ▼
                    ┌───────────────────────┐
                    │     Output Queue      │
                    └───────────┬───────────┘
                                ▼
                    ┌───────────────────────┐
                    │  Output Writer Thread │
                    │    (writes to PCAP)   │
                    └───────────────────────┘
```
### Why This Design?
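1. **Load Balancers (LBs):** Distribute work across FPs
2. **Fast Paths (FPs):** Do the actual DPI processing
3. **Consistent Hashing:** The same 5-tuple always goes to the same FP. Here "consistent" just means deterministic (`hash % N`), not the ring-based consistent-hashing scheme.

**Why consistent hashing matters:**
```
Connection: 192.168.1.100:54321 → 142.250.185.206:443
Packet 1 (SYN):          hash → FP2
Packet 2 (SYN-ACK):      hash → FP2 (same FP!)
Packet 3 (Client Hello): hash → FP2 (same FP!)
Packet 4 (Data):         hash → FP2 (same FP!)
```
All packets of this connection go to FP2, so FP2 can track the flow state correctly. Note that the SYN-ACK travels server → client, so its source and destination fields are swapped: it lands on the same FP only if the hash is direction-independent (for example, computed over the two endpoints in a fixed order). A hash of the raw directional 5-tuple would generally send reply packets to a different FP.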
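One way to get a direction-independent hash is to put the two (IP, port) endpoints into a fixed order before hashing, as sketched below. `symmetricHash` and its mixing constants are illustrative assumptions, not the project's hash function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

struct FiveTuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t protocol;
};

// Sketch: both directions of a connection hash to the same value, so a SYN
// and its SYN-ACK select the same Fast Path.
std::size_t symmetricHash(const FiveTuple& t) {
    uint64_t a = (uint64_t(t.src_ip) << 16) | t.src_port;  // endpoint A (ip:port)
    uint64_t b = (uint64_t(t.dst_ip) << 16) | t.dst_port;  // endpoint B (ip:port)
    if (a > b) std::swap(a, b);                            // canonical order
    std::size_t h = std::hash<uint64_t>{}(a);
    h ^= std::hash<uint64_t>{}(b) + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
    h ^= std::hash<uint8_t>{}(t.protocol) + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
    return h;
}
```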
### Detailed Flow
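#### Step 1: Reader Thread
The reader thread reads each packet from the input PCAP, parses it to get its 5-tuple, and pushes it onto the input queue of the load balancer chosen by `hash(5-tuple) % 2` (two LBs, as in the diagram above).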
#### Step 2: Load Balancer Thread
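```cpp
void LoadBalancer::run() {
    while (running_) {
        // Pop from my input queue
        auto pkt = input_queue_.pop();
        // Hash to select one of this LB's Fast Paths.
        // Caveat: if the reader chose this LB with the same hash modulo the
        // LB count, reusing hash % num_fps_ here can leave FPs idle
        // (2 LBs x 2 FPs: LB0 only ever sees even hashes -> local FP 0).
        size_t fp_idx = hash(pkt.tuple) % num_fps_;
        // Push to FP's queue
        fps_[fp_idx]->queue().push(pkt);
    }
}
```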
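The idle-FP effect can be checked with a small simulation. The sketch assumes the reader picks `lb = h % num_lbs` and that global FP ids are `lb * fps_per_lb + local`, matching the diagram; `fpsUsed` is a hypothetical helper. With 2 LBs × 2 FPs, reusing `h % fps_per_lb` reaches only two FPs, which is consistent with the sample output in section 11 (FP1 and FP2 processed 0 packets). Dividing out the LB choice first, `(h / num_lbs) % fps_per_lb`, reaches all four.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Sketch: which global FP ids receive work under two local-selection rules.
std::set<std::size_t> fpsUsed(std::size_t num_lbs, std::size_t fps_per_lb,
                              bool divide_out_lb, std::size_t samples) {
    std::set<std::size_t> used;
    for (std::size_t h = 0; h < samples; ++h) {  // h stands in for hash(tuple)
        std::size_t lb = h % num_lbs;             // reader's choice of LB
        std::size_t local = divide_out_lb
                                ? (h / num_lbs) % fps_per_lb  // uses fresh hash bits
                                : h % fps_per_lb;             // reuses the same bits
        used.insert(lb * fps_per_lb + local);
    }
    return used;
}
```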
#### Step 3: Fast Path Thread
```cpp
void FastPath::run() {
    while (running_) {
        // Pop from my input queue
        auto pkt = input_queue_.pop();
        // Look up flow (each FP has its own flow table)
        Flow& flow = flows_[pkt.tuple];
        // Classify (SNI extraction)
        classifyFlow(pkt, flow);
        // Check rules
        if (rules_->isBlocked(pkt.tuple.src_ip, flow.app_type, flow.sni)) {
            stats_->dropped++;
        } else {
            // Forward: push to output queue
            output_queue_->push(pkt);
        }
    }
}
```
#### Step 4: Output Writer Thread
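```cpp
void outputThread() {
    // Keep draining until producers have stopped AND the queue is empty.
    // Caveat: pop() blocks, so if running_ becomes false while the queue is
    // empty, this loop waits forever unless something wakes it (e.g. a
    // sentinel packet or a queue that can be closed).
    while (running_ || output_queue_.size() > 0) {
        auto pkt = output_queue_.pop();
        // Write to output file
        output_file.write(packet_header);  // 16-byte PCAP record header for pkt
        output_file.write(pkt.data);
    }
}
```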
### Thread-Safe Queue
The magic that makes multi-threading work:
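```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

template <typename T>
class TSQueue {
public:
    void push(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(item));
        not_empty_.notify_one();  // Wake up a waiting consumer
    }

    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Sleep until an item is available; the predicate handles spurious wakeups
        not_empty_.wait(lock, [this] { return !queue_.empty(); });
        T item = std::move(queue_.front());
        queue_.pop();
        return item;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return queue_.size();
    }

private:
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::condition_variable not_full_;  // for a bounded push() that waits for space (unused here)
};
```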
**How it works:**
- `push()`: Producer adds item, signals waiting consumers
- `pop()`: Consumer waits until item available, then takes it
- `mutex`: Only one thread can access at a time
- `condition_variable`: Efficient waiting (no busy-loop)
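The queue above cannot tell a blocked consumer that no more items are coming, which is exactly what the output thread needs at shutdown. A common remedy, sketched here under the assumption that a `close()` method is acceptable (the project may use a sentinel item instead), is to let `pop()` return `std::nullopt` once the queue is closed and drained. `ClosableQueue` is a hypothetical name.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Sketch: pop() blocks until an item arrives or the queue is closed and empty.
template <typename T>
class ClosableQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(item));
        }
        not_empty_.notify_one();
    }
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        not_empty_.notify_all();  // wake every waiting consumer
    }
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;  // closed and fully drained
        T item = std::move(queue_.front());
        queue_.pop();
        return item;
    }
private:
    std::queue<T> queue_;
    std::mutex mutex_;
    std::condition_variable not_empty_;
    bool closed_ = false;
};
```

A writer thread can then loop with `while (auto pkt = q.pop()) { ... }` and exit cleanly once the producers call `close()`; items pushed before `close()` are still delivered.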
## 7. Deep Dive: Each Component
### pcap_reader.h / pcap_reader.cpp
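**Purpose:** Read network captures in the classic PCAP format, as saved by Wireshark or tcpdump (Wireshark's default pcapng format has to be saved as plain pcap first, since this reader checks for the classic magic number)
**Key structures:**
```cpp
struct PcapGlobalHeader {      // 24 bytes
    uint32_t magic_number;     // 0xa1b2c3d4 identifies PCAP (0xd4c3b2a1 if byte-swapped)
    uint16_t version_major;    // Usually 2
    uint16_t version_minor;    // Usually 4
    int32_t  thiszone;         // GMT offset, usually 0
    uint32_t sigfigs;          // Timestamp accuracy, usually 0
    uint32_t snaplen;          // Max packet size captured
    uint32_t network;          // Link-layer type, 1 = Ethernet
};

struct PcapPacketHeader {      // 16 bytes
    uint32_t ts_sec;           // Timestamp (seconds)
    uint32_t ts_usec;          // Timestamp (microseconds)
    uint32_t incl_len;         // Bytes saved in file
    uint32_t orig_len;         // Original packet size
};
```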
**Key functions:**
- `open(filename)`: Open PCAP, validate header
- `readNextPacket(raw)`: Read next packet into buffer
- `close()`: Clean up
### packet_parser.h / packet_parser.cpp
**Purpose:** Extract protocol fields from raw bytes
**Key function:**
```cpp
bool PacketParser::parse(const RawPacket& raw, ParsedPacket& parsed) {
    parseEthernet(...);  // Extract MACs, EtherType
    parseIPv4(...);      // Extract IPs, protocol, TTL
    parseTCP(...);       // Extract ports, flags, seq numbers
    // OR
    parseUDP(...);       // Extract ports
}
```
**Important concepts:**
*Network Byte Order:* Network protocols use big-endian (most significant byte first). Your computer might use little-endian. We use `ntohs()` and `ntohl()` to convert:
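```cpp
#include <arpa/inet.h>  // ntohs, ntohl
#include <cstring>      // std::memcpy
// ntohs = Network TO Host Short (16-bit); memcpy avoids unaligned/aliasing issues
uint16_t raw16; std::memcpy(&raw16, data + offset, sizeof raw16);
uint16_t port = ntohs(raw16);
// ntohl = Network TO Host Long (32-bit)
uint32_t raw32; std::memcpy(&raw32, data + offset, sizeof raw32);
uint32_t seq = ntohl(raw32);
```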
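Casting `data + offset` to `uint16_t*` or `uint32_t*` can be an unaligned read and breaks C++ strict-aliasing rules. Assembling the value from individual bytes sidesteps both and needs no `ntohs`/`ntohl`, because it reads big-endian regardless of the host. The SNI code in section 8 calls a `readUint16BE` helper that this document does not define; the following is one plausible definition, not necessarily the project's.

```cpp
#include <cassert>
#include <cstdint>

// Sketch: read big-endian (network order) integers byte by byte.
inline uint16_t readUint16BE(const uint8_t* p) {
    return static_cast<uint16_t>((p[0] << 8) | p[1]);
}

inline uint32_t readUint32BE(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}
```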
### sni_extractor.h / sni_extractor.cpp
**Purpose:** Extract domain names from TLS and HTTP
**For TLS (HTTPS):**
```cpp
std::optional<std::string> SNIExtractor::extract(
    const uint8_t* payload,
    size_t length
) {
    // 1. Verify TLS record header
    // 2. Verify Client Hello handshake
    // 3. Skip to extensions
    // 4. Find SNI extension (type 0x0000)
    // 5. Extract hostname string
}
```
**For HTTP:**
```cpp
std::optional<std::string> HTTPHostExtractor::extract(
    const uint8_t* payload,
    size_t length
) {
    // 1. Verify HTTP request (GET, POST, etc.)
    // 2. Search for "Host: " header
    // 3. Extract value until newline
}
```
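The three HTTP steps might look like the sketch below. `extractHttpHost` is a hypothetical free function (the project's `HTTPHostExtractor::extract` may differ); it accepts a few common methods, compares header names case-insensitively as HTTP requires, and gives up if the header block is incomplete.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Sketch: pull the Host header value out of an HTTP/1.x request.
std::optional<std::string> extractHttpHost(const uint8_t* payload, std::size_t length) {
    std::string_view s(reinterpret_cast<const char*>(payload), length);
    // 1. Cheap check for a request line starting with a common method.
    static const char* methods[] = {"GET ", "POST ", "HEAD ", "PUT ", "DELETE ", "OPTIONS "};
    bool is_request = false;
    for (const char* m : methods) {
        if (s.substr(0, std::string_view(m).size()) == m) { is_request = true; break; }
    }
    if (!is_request) return std::nullopt;
    // 2. Walk the header lines (each ends with "\r\n") looking for "Host:".
    std::size_t pos = s.find("\r\n");
    while (pos != std::string_view::npos) {
        std::size_t start = pos + 2;
        std::size_t end = s.find("\r\n", start);
        if (end == std::string_view::npos || end == start) break;  // partial data or end of headers
        std::string_view line = s.substr(start, end - start);
        if (line.size() > 5) {
            std::string name(line.substr(0, 5));
            for (char& c : name) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
            if (name == "host:") {
                // 3. The value runs to the end of the line; trim leading whitespace.
                std::string_view v = line.substr(5);
                while (!v.empty() && (v.front() == ' ' || v.front() == '\t')) v.remove_prefix(1);
                return std::string(v);
            }
        }
        pos = end;
    }
    return std::nullopt;
}
```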
### types.h / types.cpp
**Purpose:** Define data structures used throughout
**FiveTuple:**
```cpp
struct FiveTuple {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t protocol;
    bool operator==(const FiveTuple& other) const;
};
```
**AppType:**
```cpp
enum class AppType {
    UNKNOWN,
    HTTP,
    HTTPS,
    DNS,
    GOOGLE,
    YOUTUBE,
    FACEBOOK,
    // ... more apps
};
```
**sniToAppType function:**
```cpp
AppType sniToAppType(const std::string& sni) {
    if (sni.find("youtube") != std::string::npos)
        return AppType::YOUTUBE;
    if (sni.find("facebook") != std::string::npos)
        return AppType::FACEBOOK;
    // ... more patterns
    return AppType::UNKNOWN;  // no pattern matched
}
```
## 8. How SNI Extraction Works
### The TLS Handshake
When you visit `https://www.youtube.com`:
```
┌──────────┐                              ┌──────────┐
│ Browser  │                              │  Server  │
└────┬─────┘                              └────┬─────┘
     │                                         │
     │ ──── Client Hello ────────────────────► │
     │      (includes SNI: www.youtube.com)    │
     │                                         │
     │ ◄─── Server Hello ───────────────────── │
     │      (includes certificate)             │
     │                                         │
     │ ──── Key Exchange ────────────────────► │
     │                                         │
     │ ◄═══ Encrypted Data ══════════════════► │
     │      (from here on, everything is       │
     │       encrypted - we can't see it)      │
     │                                         │
```
**We can only extract SNI from the Client Hello!**
### TLS Client Hello Structure
```
-- TLS Record Layer --
Byte 0:           Content Type = 0x16 (Handshake)
Bytes 1-2:        Version (legacy field, often 0x0301 = TLS 1.0)
Bytes 3-4:        Record Length
-- Handshake Layer --
Byte 5:           Handshake Type = 0x01 (Client Hello)
Bytes 6-8:        Handshake Length
-- Client Hello Body --
Bytes 9-10:       Client Version
Bytes 11-42:      Random (32 bytes)
Byte 43:          Session ID Length (N)
Bytes 44 to 43+N: Session ID
... Cipher Suites (2-byte length + list) ...
... Compression Methods (1-byte length + list) ...
-- Extensions --
Bytes X to X+1:   Extensions Length
For each extension:
  Extension Type (2 bytes)
  Extension Length (2 bytes)
  Extension Data (Extension Length bytes)
-- SNI Extension (Type 0x0000) --
Extension Type:   0x0000
Extension Length: L
SNI List Length:  M (2 bytes)
SNI Type:         0x00 (hostname)
SNI Length:       K (2 bytes)
SNI Value:        "www.youtube.com"  ← THE GOAL!
```
### Our Extraction Code (Simplified)
```cpp
// Simplified: assumes the whole Client Hello is in `payload` and omits the
// bounds checks against `length` that real code needs before every read.
std::optional<std::string> SNIExtractor::extract(
    const uint8_t* payload, size_t length
) {
    // Check TLS record header
    if (payload[0] != 0x16) return std::nullopt;  // Not handshake
    if (payload[5] != 0x01) return std::nullopt;  // Not Client Hello

    size_t offset = 43;  // Skip to session ID
    // Skip Session ID
    uint8_t session_len = payload[offset];
    offset += 1 + session_len;
    // Skip Cipher Suites
    uint16_t cipher_len = readUint16BE(payload + offset);
    offset += 2 + cipher_len;
    // Skip Compression Methods
    uint8_t comp_len = payload[offset];
    offset += 1 + comp_len;
    // Read Extensions Length
    uint16_t ext_len = readUint16BE(payload + offset);
    offset += 2;

    // Search for SNI extension
    size_t ext_end = offset + ext_len;
    while (offset + 4 <= ext_end) {
        uint16_t ext_type = readUint16BE(payload + offset);
        uint16_t ext_data_len = readUint16BE(payload + offset + 2);
        offset += 4;
        if (ext_type == 0x0000) {  // SNI!
            // Layout: list length (2), name type (1), name length (2), name
            uint16_t sni_len = readUint16BE(payload + offset + 3);
            return std::string(
                reinterpret_cast<const char*>(payload + offset + 5),
                sni_len
            );
        }
        offset += ext_data_len;
    }
    return std::nullopt;  // SNI not found
}
```
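The simplified code trusts every length field it reads. A bounds-checked variant of the same walk, sketched below as the hypothetical `extractSniChecked`, returns `std::nullopt` instead of reading past `length`. Like the original, it only handles a Client Hello contained in a single payload; one split across TCP segments would need reassembly first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Sketch: same field walk as above, with every read checked against length.
std::optional<std::string> extractSniChecked(const uint8_t* p, std::size_t length) {
    auto u16 = [p](std::size_t o) { return static_cast<std::size_t>((p[o] << 8) | p[o + 1]); };
    if (length < 44 || p[0] != 0x16 || p[5] != 0x01) return std::nullopt;
    std::size_t off = 43;                        // session ID length byte
    off += 1 + p[off];                           // skip session ID
    if (off + 2 > length) return std::nullopt;
    off += 2 + u16(off);                         // skip cipher suites
    if (off + 1 > length) return std::nullopt;
    off += 1 + p[off];                           // skip compression methods
    if (off + 2 > length) return std::nullopt;
    std::size_t ext_end = off + 2 + u16(off);    // end of the extensions block
    off += 2;
    if (ext_end > length) ext_end = length;      // never walk past the buffer
    while (off + 4 <= ext_end) {
        std::size_t type = u16(off), len = u16(off + 2);
        off += 4;
        if (off + len > ext_end) return std::nullopt;
        if (type == 0x0000 && len >= 5 && p[off + 2] == 0x00) {  // host_name entry
            std::size_t name_len = u16(off + 3);
            if (5 + name_len > len) return std::nullopt;
            return std::string(reinterpret_cast<const char*>(p + off + 5), name_len);
        }
        off += len;
    }
    return std::nullopt;
}
```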
## 9. How Blocking Works
### Rule Types
| Rule Type | Example | What it Blocks |
|-----------|---------|----------------|
| IP | `192.168.1.50` | All traffic from this source |
| App | `YouTube` | All YouTube connections |
| Domain | `tiktok` | Any SNI containing "tiktok" |
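Substring matching is simple but broad: the rule `tiktok` also matches an unrelated host such as `nottiktok.example`, and `find` is case-sensitive while hostnames are not. The sketch below contrasts the substring behaviour described here with a case-insensitive label-suffix match; `suffixMatch` is an illustrative alternative, not a feature the project is described as having.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Mirrors the "SNI contains the rule text" behaviour.
bool substringMatch(const std::string& sni, const std::string& rule) {
    return sni.find(rule) != std::string::npos;
}

// Alternative: match the domain itself or any subdomain, ignoring case.
bool suffixMatch(const std::string& sni_raw, const std::string& rule_raw) {
    std::string sni = toLower(sni_raw), rule = toLower(rule_raw);
    if (sni == rule) return true;
    if (sni.size() <= rule.size()) return false;
    // Must end with "." + rule, so "evilfacebook.com" does not match "facebook.com".
    return sni.compare(sni.size() - rule.size(), rule.size(), rule) == 0 &&
           sni[sni.size() - rule.size() - 1] == '.';
}
```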
### The Blocking Flow
```
          Packet arrives
                 │
                 ▼
┌─────────────────────────────────┐
│  Is source IP in blocked list?  │──Yes──► DROP
└────────────────┬────────────────┘
                 │No
                 ▼
┌─────────────────────────────────┐
│  Is app type in blocked list?   │──Yes──► DROP
└────────────────┬────────────────┘
                 │No
                 ▼
┌─────────────────────────────────┐
│  Does SNI match blocked domain? │──Yes──► DROP
└────────────────┬────────────────┘
                 │No
                 ▼
              FORWARD
```
### Flow-Based Blocking
**Important:** We block at the *flow* level, not packet level.
```
Connection to YouTube:
Packet 1 (SYN)          → No SNI yet, FORWARD
Packet 2 (SYN-ACK)      → No SNI yet, FORWARD
Packet 3 (ACK)          → No SNI yet, FORWARD
Packet 4 (Client Hello) → SNI: www.youtube.com
                        → App: YOUTUBE (blocked!)
                        → Mark flow as BLOCKED
                        → DROP this packet
Packet 5 (Data)         → Flow is BLOCKED → DROP
Packet 6 (Data)         → Flow is BLOCKED → DROP
...all subsequent packets → DROP
```
**Why this approach?**
- We can't identify the app until we see the Client Hello
- Once identified, we block all future packets of that flow
- The connection will fail/timeout on the client
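The sticky behaviour can be sketched with a `blocked` flag that, once set, is never cleared. The `Flow` fields, the `onPacket` helper, and the keyword rule are simplified stand-ins for the project's rule manager, assumed for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Flow {
    std::string sni;       // empty until a Client Hello is seen
    bool blocked = false;  // sticky: never cleared once set
};

enum class Verdict { Forward, Drop };

// sni_in_packet is empty for packets that carry no Client Hello.
Verdict onPacket(Flow& flow, const std::string& sni_in_packet,
                 const std::string& blocked_keyword) {
    if (flow.sni.empty() && !sni_in_packet.empty()) {
        flow.sni = sni_in_packet;  // classify the flow once
        if (flow.sni.find(blocked_keyword) != std::string::npos) {
            flow.blocked = true;   // mark the whole flow, not just this packet
        }
    }
    return flow.blocked ? Verdict::Drop : Verdict::Forward;
}
```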
## 10. Building and Running
### Prerequisites
- **macOS/Linux** with C++17 compiler
- **g++** or **clang++**
- No external libraries needed!
### Build Commands
**Simple Version:**
```bash
g++ -std=c++17 -O2 -I include -o dpi_simple \
    src/main_working.cpp \
    src/pcap_reader.cpp \
    src/packet_parser.cpp \
    src/sni_extractor.cpp \
    src/types.cpp
```
**Multi-threaded Version:**
```bash
g++ -std=c++17 -pthread -O2 -I include -o dpi_engine \
    src/dpi_mt.cpp \
    src/pcap_reader.cpp \
    src/packet_parser.cpp \
    src/sni_extractor.cpp \
    src/types.cpp
```
### Running
**Basic usage:**
```bash
./dpi_engine test_dpi.pcap output.pcap
```
**With blocking:**
```bash
./dpi_engine test_dpi.pcap output.pcap \
    --block-app YouTube \
    --block-app TikTok \
    --block-ip 192.168.1.50 \
    --block-domain facebook
```
**Custom thread counts:**
```bash
./dpi_engine input.pcap output.pcap --lbs 4 --fps 4
# Creates 4 LB threads, each feeding 4 FP threads: 16 FP threads in total
```
### Creating Test Data
```bash
python3 generate_test_pcap.py
# Creates test_dpi.pcap with sample traffic
```
## 11. Understanding the Output
### Sample Output
```
╔══════════════════════════════════════════════════════════════╗
║               DPI ENGINE v2.0 (Multi-threaded)               ║
╠══════════════════════════════════════════════════════════════╣
║ Load Balancers: 2    FPs per LB: 2    Total FPs: 4           ║
╚══════════════════════════════════════════════════════════════╝
[Rules] Blocked app: YouTube
[Rules] Blocked IP: 192.168.1.50
[Reader] Processing packets...
[Reader] Done reading 77 packets
╔══════════════════════════════════════════════════════════════╗
║                      PROCESSING REPORT                       ║
╠══════════════════════════════════════════════════════════════╣
║  Total Packets:    77                                        ║
║  Total Bytes:      5738                                      ║
║  TCP Packets:      73                                        ║
║  UDP Packets:      4                                         ║
╠══════════════════════════════════════════════════════════════╣
║  Forwarded:        69                                        ║
║  Dropped:          8                                         ║
╠══════════════════════════════════════════════════════════════╣
║                      THREAD STATISTICS                       ║
║  LB0 dispatched:   53                                        ║
║  LB1 dispatched:   24                                        ║
║  FP0 processed:    53                                        ║
║  FP1 processed:    0                                         ║
║  FP2 processed:    0                                         ║
║  FP3 processed:    24                                        ║
╠══════════════════════════════════════════════════════════════╣
║                    APPLICATION BREAKDOWN                     ║
╠══════════════════════════════════════════════════════════════╣
║  HTTPS         39   50.6%  ##########                        ║
║  Unknown       16   20.8%  ####                              ║
║  YouTube        4    5.2%  # (BLOCKED)                       ║
║  DNS            4    5.2%  #                                 ║
║  Facebook       3    3.9%                                    ║
║  ...                                                         ║
╚══════════════════════════════════════════════════════════════╝
[Detected Domains/SNIs]
  - www.youtube.com -> YouTube
  - www.facebook.com -> Facebook
  - www.google.com -> Google
  - github.com -> GitHub
  ...
```
### What Each Section Means
| Section | Meaning |
|---------|---------|
| Configuration | Number of threads created |
| Rules | Which blocking rules are active |
| Total Packets | Packets read from input file |
| Forwarded | Packets written to output file |
| Dropped | Packets blocked (not written) |
| Thread Statistics | Work distribution across threads |
| Application Breakdown | Traffic classification results |
| Detected SNIs | Actual domain names found |
## 12. Extending the Project
### Ideas for Improvement
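1. **Add More App Signatures**
   ```cpp
   // In types.cpp (TWITCH must also be added to the AppType enum)
   if (sni.find("twitch") != std::string::npos)
       return AppType::TWITCH;
   ```
2. **Add Bandwidth Throttling**
   ```cpp
   // Instead of DROP, delay packets (needs <thread> and <chrono>)
   using namespace std::chrono_literals;
   if (shouldThrottle(flow)) {
       // Note: sleeping here also stalls every other flow on this FP thread
       std::this_thread::sleep_for(10ms);
   }
   ```
3. **Add Live Statistics Dashboard**
   ```cpp
   // Separate thread printing stats every second
   void statsThread() {
       while (running) {
           printStats();
           std::this_thread::sleep_for(std::chrono::seconds(1));
       }
   }
   ```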
4. **Add QUIC/HTTP3 Support**
- QUIC uses UDP on port 443
- SNI is in the Initial packet (encrypted differently)
5. **Add Persistent Rules**
- Save rules to file
- Load on startup
## Summary
This DPI engine demonstrates:
1. **Network Protocol Parsing** - Understanding packet structure
2. **Deep Packet Inspection** - Reading application metadata (SNI, HTTP Host) from the unencrypted parts of connections
3. **Flow Tracking** - Managing stateful connections
4. **Multi-threaded Architecture** - Scaling with thread pools
5. **Producer-Consumer Pattern** - Thread-safe queues
The key insight is that even HTTPS traffic leaks the destination domain in the TLS handshake, allowing network operators to identify and control application usage.
## Questions?
If you have questions about any part of this project, the code is well-commented and follows the same flow described in this document. Start with the simple version (`main_working.cpp`) to understand the concepts, then move to the multi-threaded version (`dpi_mt.cpp`) to see how parallelism is added.