vLLM Semantic Router
System-Level Intelligence for Mixture-of-Models (MoM) - An intelligent routing layer that brings collective intelligence to LLM systems. Acting as an Envoy External Processor (ExtProc), it uses a signal-driven decision engine and plugin chain architecture to capture missing signals, make better routing decisions, and secure your LLM infrastructure.
Project Goalsâ
We are building the System Level Intelligence for Mixture-of-Models (MoM), bringing Collective Intelligence into LLM systems, answering:
- How to capture the missing signals in request, response and context?
- How to combine the signals to make better decisions?
- How to collaborate more efficiently between different models?
- How to secure the real world and LLM system from jailbreaks, PII leaks, hallucinations?
- How to collect valuable signals and build a self-learning system?
Core Architectureâ
Signal-Driven Decision Engineâ
Captures and combines 6 types of signals to make intelligent routing decisions:
| Signal Type | Description | Use Case |
|---|---|---|
| keyword | Pattern matching with AND/OR operators | Fast rule-based routing for specific terms |
| embedding | Semantic similarity using embeddings | Intent detection and semantic understanding |
| domain | MMLU domain classification (14 categories) | Academic and professional domain routing |
| fact_check | ML-based fact-checking requirement detection | Identify queries needing fact verification |
| user_feedback | User satisfaction and feedback classification | Handle follow-up messages and corrections |
| preference | LLM-based route preference matching | Complex intent analysis via external LLM |
How it works: Signals are extracted from requests, combined using AND/OR operators in decision rules, and used to select the best model and configuration.
Plugin Chain Architectureâ
Extensible plugin system for request/response processing:
| Plugin Type | Description | Use Case |
|---|---|---|
| semantic-cache | Semantic similarity-based caching | Reduce latency and costs for similar queries |
| jailbreak | Adversarial prompt detection | Block prompt injection and jailbreak attempts |
| pii | Personally identifiable information detection | Protect sensitive data and ensure compliance |
| system_prompt | Dynamic system prompt injection | Add context-aware instructions per route |
| header_mutation | HTTP header manipulation | Control routing and backend behavior |
| hallucination | Token-level hallucination detection | Real-time fact verification during generation |
How it works: Plugins form a processing chain, each plugin can inspect/modify requests and responses, with configurable enable/disable per decision.
Architecture Overviewâ
Key Benefitsâ
Intelligent Routingâ
- Signal Fusion: Combine multiple signals (keyword + embedding + domain) for accurate routing
- Adaptive Decisions: Use AND/OR operators to create complex routing logic
- Model Specialization: Route math to math models, code to code models, etc.
Security & Complianceâ
- Multi-layer Protection: PII detection, jailbreak prevention, hallucination detection
- Policy Enforcement: Model-specific PII policies and security rules
- Audit Trail: Complete logging of all security decisions
Performance & Costâ
- Semantic Caching: 10-100x latency reduction for similar queries
- Smart Model Selection: Use smaller models for simple tasks, larger for complex
- Tool Optimization: Auto-select relevant tools to reduce token usage
Flexibility & Extensibilityâ
- Plugin Architecture: Add custom processing logic without modifying core
- Signal Extensibility: Define new signal types for your use cases
- Configuration-Driven: Change routing behavior without code changes
Use Casesâ
- Enterprise API Gateways: Intelligent routing with security and compliance
- Multi-tenant Platforms: Per-tenant routing policies and model selection
- Development Environments: Cost optimization through smart model selection
- Production Services: High-performance routing with comprehensive monitoring
- Regulated Industries: Compliance-ready with PII detection and audit trails
Quick Linksâ
- Installation - Setup and installation guide
- Overview - Project goals and core concepts
- Configuration - Configure signals and routing decisions
- Tutorials - Step-by-step guides
Documentation Structureâ
This documentation is organized into the following sections:
Overviewâ
Learn about our goals, semantic routing concepts, collective intelligence, and signal-driven decisions.
Installation & Configurationâ
Get started with installation and learn how to configure signals, decisions, and plugins.
Tutorialsâ
Step-by-step guides for implementing intelligent routing, semantic caching, content safety, and observability.
Contributingâ
We welcome contributions! Please see our Contributing Guide for details.
Licenseâ
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.