Overview
RAXE uses a dual-layer detection system to identify threats in LLM prompts and responses:- L1 (Rule-Based): Fast regex pattern matching (~1ms)
- L2 (ML-Based): Neural classifier for novel attacks (~3ms)
L1: Rule-Based Detection
The first layer uses 460+ curated regex patterns organized into 7 threat families. Characteristics:- Sub-millisecond latency
- High precision (95%+) on known patterns
- Zero false positives on benign prompts
- No external dependencies
L2: ML-Based Detection
The second layer uses a CPU-friendly ONNX classifier to catch:- Obfuscated attacks (l33t speak, Unicode tricks)
- Novel attack patterns
- Semantic attacks that don’t match regex
- ~3ms latency (CPU-only, no GPU needed)
- Catches attacks L1 misses
- Trained on real-world attack data
- Updates via model downloads
Detection Flow
Combining Results
When both layers detect threats, RAXE merges results:Enabling/Disabling Layers
Performance Comparison
| Configuration | Latency | Detection Rate | Use Case |
|---|---|---|---|
| L1 only | ~0.4ms | 85% | High-throughput |
| L2 only | ~3ms | 90% | Novel attacks |
| L1 + L2 | ~3.5ms | 95%+ | Maximum security |
