Overview
RAXE is designed for production workloads with sub-millisecond latency and high throughput.P50 Latency
0.37ms
P95 Latency
0.49ms
Throughput
~1,200/sec
Benchmark Results
Latency by Configuration
| Configuration | P50 | P95 | P99 | Use Case |
|---|---|---|---|---|
| L1 only (fast) | 0.37ms | 0.49ms | 1.34ms | High-throughput APIs |
| L2 only (ML) | ~3ms | ~5ms | ~10ms | Novel attack detection |
| L1 + L2 (balanced) | ~3.5ms | ~5.5ms | ~12ms | Production default |
| L1 + L2 (thorough) | ~5ms | ~8ms | ~15ms | Maximum security |
Throughput
| Mode | Single-threaded | Multi-threaded (10) | AsyncRaxe |
|---|---|---|---|
| L1 only | ~1,200/sec | ~8,000/sec | ~10,000/sec |
| L1 + L2 | ~250/sec | ~2,000/sec | ~3,000/sec |
Memory Usage
| Component | Memory |
|---|---|
| Base SDK | ~20MB |
| L1 Rules (460+) | ~10MB |
| L2 ML Model | ~30MB |
| Total Peak | ~60MB |
Performance Modes
RAXE provides three performance modes to balance speed and detection:Fast Mode
L1 rules only, optimized for latency.- ~0.4ms average latency
- 85% detection rate
- Zero ML overhead
- Best for: High-volume APIs, real-time chat
Balanced Mode (Default)
L1 + L2 with async parallel execution.- ~3.5ms average latency
- 95% detection rate
- ML runs in parallel with rules
- Best for: Production applications
Thorough Mode
All detection layers with maximum coverage.- ~5ms average latency
- 95%+ detection rate
- Additional rule variations checked
- Best for: Security-critical applications
Optimization Tips
1. Use AsyncRaxe for High Throughput
2. Enable Caching
AsyncRaxe includes built-in caching for repeated scans:3. Disable L2 for Speed-Critical Paths
4. Use Thread Pools for Sync Code
5. Warm Up on Startup
First scan has initialization overhead. Warm up during startup:Latency Breakdown
L1 (Rule-Based) Detection
| Stage | Time |
|---|---|
| Text preprocessing | ~0.05ms |
| Pattern compilation | Cached |
| Pattern matching | ~0.25ms |
| Result aggregation | ~0.05ms |
| Total | ~0.35ms |
L2 (ML-Based) Detection
| Stage | Time |
|---|---|
| Text tokenization | ~0.5ms |
| Feature extraction | ~0.5ms |
| ONNX inference | ~2ms |
| Prediction decode | ~0.1ms |
| Total | ~3ms |
Combined Pipeline
Hardware Recommendations
Minimum Requirements
- CPU: 2 cores
- RAM: 512MB
- Python: 3.10+
Recommended (Production)
- CPU: 4+ cores (for parallel L1/L2)
- RAM: 2GB+
- SSD: For scan history database
High-Throughput
- CPU: 8+ cores
- RAM: 4GB+
- Use AsyncRaxe with high concurrency
Monitoring Performance
Built-in Profiling
CLI Profiling
Statistics
Benchmarking Your Setup
Run the built-in benchmark:Performance Guarantees
RAXE is designed to avoid performance regressions:- No catastrophic backtracking: All 460+ regex patterns are REDOS-safe
- Bounded memory: Fixed-size buffers, no unbounded allocations
- Timeouts: Configurable scan timeouts prevent runaway processing
- Circuit breaker: Graceful degradation under extreme load
