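Configuration
This guide covers the configuration options for the Semantic Router. The system uses a single YAML configuration file to control signal-driven routing, plugin chain processing, and model selection.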
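Architecture Overview
The configuration defines three main layers:
- Signal extraction layer: defines 6 types of signals (keyword, embedding, domain, fact check, user feedback, preference)
- Decision engine: combines signals with AND/OR operators to make routing decisions
- Plugin chain: configures plugins for caching, security, and optimization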
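The way the decision engine combines signals can be sketched in a few lines of Python. This is only an illustration of AND/OR combination, not the router's actual implementation; the names `Condition`, `rule_matches`, and `fired` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Condition:
    type: str  # signal type, e.g. "keyword", "embedding", "domain"
    name: str  # name of a signal defined under `signals`


def rule_matches(operator: str, conditions: List[Condition],
                 fired: Set[Tuple[str, str]]) -> bool:
    """Combine per-signal results with AND/OR.

    `fired` holds the (type, name) pairs of the signals that matched the query.
    """
    results = [(c.type, c.name) in fired for c in conditions]
    if operator == "OR":
        return any(results)
    if operator == "AND":
        # An empty AND rule is treated as "no match" in this sketch.
        return bool(results) and all(results)
    raise ValueError(f"unknown operator: {operator!r}")


math_conditions = [
    Condition("keyword", "math_keywords"),
    Condition("embedding", "math_intent"),
    Condition("domain", "mathematics"),
]
fired = {("keyword", "math_keywords")}  # only the keyword signal matched

print(rule_matches("OR", math_conditions, fired))   # True
print(rule_matches("AND", math_conditions, fired))  # False
```

With `operator: "OR"`, one matching signal is enough to select the decision; with `"AND"`, every listed condition has to match.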
Configuration File
The configuration file is located at config/config.yaml. The structure below is based on the actual implementation:
# config/config.yaml - actual configuration structure
# BERT model for semantic similarity
bert_model:
  model_id: sentence-transformers/all-MiniLM-L12-v2
  threshold: 0.6
  use_cpu: true
# Semantic cache
semantic_cache:
  backend_type: "memory"  # Options: "memory" or "milvus"
  enabled: false
  similarity_threshold: 0.8  # Global default threshold
  max_entries: 1000
  ttl_seconds: 3600
  eviction_policy: "fifo"  # Options: "fifo", "lru", "lfu"
# Automatic tool selection
tools:
  enabled: false
  top_k: 3
  similarity_threshold: 0.2
  tools_db_path: "config/tools_db.json"
  fallback_to_empty: true
# Jailbreak protection
prompt_guard:
  enabled: false  # Global default - can be overridden per category
  use_modernbert: true
  model_id: "models/jailbreak_classifier_modernbert-base_model"
  threshold: 0.7
  use_cpu: true
# vLLM endpoints - your backend models
vllm_endpoints:
  - name: "endpoint1"
    address: "192.168.1.100"  # Replace with your server IP address
    port: 11434
    models:
      - "your-model"  # Replace with your model
    weight: 1
# Model configuration
model_config:
  "your-model":
    pii_policy:
      allow_by_default: true
      pii_types_allowed: ["EMAIL_ADDRESS", "PERSON"]
    preferred_endpoints: ["endpoint1"]
  # Example: DeepSeek model with a custom name
  "ds-v31-custom":
    reasoning_family: "deepseek"  # Uses DeepSeek reasoning syntax
    preferred_endpoints: ["endpoint1"]
  # Example: Qwen3 model with a custom name
  "my-qwen3-model":
    reasoning_family: "qwen3"  # Uses Qwen3 reasoning syntax
    preferred_endpoints: ["endpoint2"]
  # Example: model without reasoning support
  "phi4":
    preferred_endpoints: ["endpoint1"]
# Classification models
classifier:
  category_model:
    model_id: "models/category_classifier_modernbert-base_model"
    use_modernbert: true
    threshold: 0.6
    use_cpu: true
  pii_model:
    model_id: "models/pii_classifier_modernbert-base_presidio_token_model"
    use_modernbert: true
    threshold: 0.7
    use_cpu: true
# Signals - signal extraction configuration
signals:
  # Keyword-based signals (fast pattern matching)
  keywords:
    - name: "math_keywords"
      operator: "OR"
      keywords:
        - "calculate"
        - "equation"
        - "solve"
        - "derivative"
        - "integral"
      case_sensitive: false
    - name: "code_keywords"
      operator: "OR"
      keywords:
        - "function"
        - "class"
        - "debug"
        - "compile"
      case_sensitive: false
  # Embedding-based signals (semantic similarity)
  embeddings:
    - name: "code_debug"
      threshold: 0.70
      candidates:
        - "how to debug the code"
        - "troubleshooting steps for my code"
      aggregation_method: "max"
    - name: "math_intent"
      threshold: 0.75
      candidates:
        - "solve mathematical problem"
        - "calculate the result"
      aggregation_method: "max"
  # Domain signals (MMLU classification)
  domains:
    - name: "mathematics"
      description: "Mathematical and computational problems"
      mmlu_categories:
        - "abstract_algebra"
        - "college_mathematics"
        - "elementary_mathematics"
    - name: "computer_science"
      description: "Programming and computer science"
      mmlu_categories:
        - "computer_security"
        - "machine_learning"
  # Fact-check signals (detect the need for verification)
  fact_check:
    - name: "needs_verification"
      description: "Queries requiring fact verification"
  # User feedback signals (satisfaction analysis)
  user_feedbacks:
    - name: "correction_needed"
      description: "User indicates previous answer was wrong"
  # Preference signals (LLM-based matching)
  preferences:
    - name: "complex_reasoning"
      description: "Requires deep reasoning and analysis"
      llm_endpoint: "http://localhost:11434"
# Categories - define domain categories
categories:
  - name: math
  - name: computer science
  - name: other
# Decisions - combine signals to make routing decisions
decisions:
  - name: math
    description: "Route mathematical queries"
    priority: 10
    rules:
      operator: "OR"  # Match any condition
      conditions:
        - type: "keyword"
          name: "math_keywords"
        - type: "embedding"
          name: "math_intent"
        - type: "domain"
          name: "mathematics"
    modelRefs:
      - model: your-model
        use_reasoning: true  # Enable reasoning for math problems
    # Optional: decision-level plugins
    plugins:
      - type: "semantic-cache"
        configuration:
          enabled: true
          similarity_threshold: 0.9  # Math problems need a higher threshold
      - type: "jailbreak"
        configuration:
          enabled: true
      - type: "pii"
        configuration:
          enabled: true
          threshold: 0.8
      - type: "system_prompt"
        configuration:
          enabled: true
          prompt: "You are a mathematics expert. Solve problems step by step."
  - name: computer_science
    description: "Route computer science queries"
    priority: 10
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "code_keywords"
        - type: "embedding"
          name: "code_debug"
        - type: "domain"
          name: "computer_science"
    modelRefs:
      - model: your-model
        use_reasoning: true  # Enable reasoning for code
    plugins:
      - type: "semantic-cache"
        configuration:
          enabled: true
          similarity_threshold: 0.85
      - type: "system_prompt"
        configuration:
          enabled: true
          prompt: "You are a programming expert. Provide clear code examples."
  - name: other
    description: "Route general queries"
    priority: 5
    rules:
      operator: "OR"
      conditions:
        - type: "domain"
          name: "other"
    modelRefs:
      - model: your-model
        use_reasoning: false  # General queries do not use reasoning
    plugins:
      - type: "semantic-cache"
        configuration:
          enabled: true
          similarity_threshold: 0.75  # General queries use a lower threshold
default_model: your-model
# Reasoning family configuration - defines how different model families handle reasoning syntax
reasoning_families:
  deepseek:
    type: "chat_template_kwargs"
    parameter: "thinking"
  qwen3:
    type: "chat_template_kwargs"
    parameter: "enable_thinking"
  gpt-oss:
    type: "reasoning_effort"
    parameter: "reasoning_effort"
  gpt:
    type: "reasoning_effort"
    parameter: "reasoning_effort"
# Global default reasoning effort level
default_reasoning_effort: "medium"
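Assign reasoning families in the model_config block above: each model sets reasoning_family (see ds-v31-custom and my-qwen3-model in the example). Models that do not support reasoning syntax simply omit the field (for example, phi4).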
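To make the two family types more concrete, the sketch below shows one plausible way a reasoning family could translate into extra fields on an OpenAI-style chat request. The exact request shape the router produces is an assumption here, and `reasoning_fields` is a hypothetical helper, not part of the project.

```python
from typing import Any, Dict, Optional

# Mirrors part of the reasoning_families section of config.yaml.
REASONING_FAMILIES = {
    "deepseek": {"type": "chat_template_kwargs", "parameter": "thinking"},
    "qwen3": {"type": "chat_template_kwargs", "parameter": "enable_thinking"},
    "gpt-oss": {"type": "reasoning_effort", "parameter": "reasoning_effort"},
}


def reasoning_fields(family: Optional[str], use_reasoning: bool,
                     effort: str = "medium") -> Dict[str, Any]:
    """Return the extra request fields for a model (assumed request shape)."""
    if family is None:
        # Models without reasoning_family (such as phi4) get no extra fields.
        return {}
    spec = REASONING_FAMILIES[family]
    if spec["type"] == "chat_template_kwargs":
        # Boolean switch passed through the chat template.
        return {"chat_template_kwargs": {spec["parameter"]: use_reasoning}}
    if spec["type"] == "reasoning_effort":
        # Effort level is sent only when reasoning is requested.
        return {spec["parameter"]: effort} if use_reasoning else {}
    raise ValueError(f"unknown reasoning family type: {spec['type']!r}")


print(reasoning_fields("qwen3", True))    # {'chat_template_kwargs': {'enable_thinking': True}}
print(reasoning_fields("gpt-oss", True))  # {'reasoning_effort': 'medium'}
print(reasoning_fields(None, True))       # {}
```

The point of the indirection is that a model with a custom name, such as ds-v31-custom, only needs to name its family; the parameter name and mechanism come from the family definition.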
Configuration Recipes (Presets)
We provide curated, versioned presets that you can use directly or as a starting point:
- Accuracy optimized: https://github.com/vllm-project/semantic-router/blob/main/config/config.recipe-accuracy.yaml
- Token efficiency optimized: https://github.com/vllm-project/semantic-router/blob/main/config/config.recipe-token-efficiency.yaml
- Latency optimized: https://github.com/vllm-project/semantic-router/blob/main/config/config.recipe-latency.yaml
- Guide and usage: https://github.com/vllm-project/semantic-router/blob/main/config/RECIPES.md
Quick usage:
- Local: copy a recipe to config.yaml, then run the router
  - cp config/config.recipe-accuracy.yaml config/config.yaml
  - make run-router
- Helm/Argo: reference the recipe file contents in your ConfigMap (examples are in the guide above).
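Signal Configuration
Signals are the foundation of intelligent routing. The system supports 6 types of signals that can be combined to make routing decisions.
1. Keyword Signals - Fast Pattern Matching
signals:
  keywords:
    - name: "math_keywords"
      operator: "OR"  # OR: match any keyword, AND: match all keywords
      keywords:
        - "calculate"
        - "equation"
        - "solve"
      case_sensitive: false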
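Use cases:
- Deterministic routing for specific terms
- Compliance and security (PII keywords, prohibited terms)
- High-throughput scenarios that need <1ms latency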
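The effect of `operator` and `case_sensitive` can be illustrated with a small Python sketch. It uses plain substring matching, which is an assumption; the router's real matcher may tokenize the input or use regular expressions. `keyword_signal_matches` is a hypothetical name.

```python
from typing import List


def keyword_signal_matches(text: str, keywords: List[str],
                           operator: str = "OR",
                           case_sensitive: bool = False) -> bool:
    """Return True if the text satisfies the keyword rule (substring match)."""
    if not case_sensitive:
        text = text.lower()
        keywords = [k.lower() for k in keywords]
    hits = [k in text for k in keywords]
    if operator == "OR":
        return any(hits)  # at least one keyword present
    if operator == "AND":
        return bool(hits) and all(hits)  # every keyword present
    raise ValueError(f"unknown operator: {operator!r}")


math_keywords = ["calculate", "equation", "solve"]
query = "Please SOLVE this equation"

print(keyword_signal_matches(query, math_keywords, "OR"))   # True
print(keyword_signal_matches(query, math_keywords, "AND"))  # False ("calculate" is missing)
print(keyword_signal_matches(query, ["solve"], "OR", case_sensitive=True))  # False
```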
2. Embedding Signals - Semantic Understanding
signals:
  embeddings:
    - name: "code_debug"
      threshold: 0.70  # Similarity threshold (0-1)
      candidates:
        - "how to debug the code"
        - "troubleshooting steps"
      aggregation_method: "max"  # max, avg, or min
Use cases:
- Intent detection that is robust to paraphrasing
- Semantic similarity matching
- Handling varied user phrasing
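How `threshold` and `aggregation_method` interact can be sketched as follows. The example assumes cosine similarity between a query embedding and each candidate embedding, and uses tiny hand-written vectors in place of real model output; `cosine` and `embedding_signal_matches` are hypothetical helpers.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embedding_signal_matches(query_vec: Sequence[float],
                             candidate_vecs: List[Sequence[float]],
                             threshold: float,
                             aggregation_method: str = "max") -> bool:
    """Aggregate per-candidate similarities, then compare with the threshold."""
    if not candidate_vecs:
        raise ValueError("at least one candidate is required")
    sims = [cosine(query_vec, c) for c in candidate_vecs]
    if aggregation_method == "max":
        score = max(sims)               # best single candidate
    elif aggregation_method == "avg":
        score = sum(sims) / len(sims)   # mean over all candidates
    elif aggregation_method == "min":
        score = min(sims)               # worst candidate must still pass
    else:
        raise ValueError(f"unknown aggregation_method: {aggregation_method!r}")
    return score >= threshold


# Toy 2-D embeddings: the query is identical to the first candidate
# and orthogonal to the second, so the similarities are 1.0 and 0.0.
query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0]]

print(embedding_signal_matches(query, candidates, 0.70, "max"))  # True
print(embedding_signal_matches(query, candidates, 0.70, "avg"))  # False
print(embedding_signal_matches(query, candidates, 0.70, "min"))  # False
```

In this reading, `max` fires when the query is close to any one candidate phrase, while `avg` and `min` require the query to be close to the candidate set as a whole.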
3. Domain Signals - MMLU Classification
signals:
  domains:
    - name: "mathematics"
      description: "Mathematical problems"
      mmlu_categories:
        - "abstract_algebra"
        - "college_mathematics"
Use cases:
- Routing for academic and professional domains
- Selecting domain-expert models
- Supports 14 MMLU categories
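One way to read the `mmlu_categories` list is as a lookup from the classifier's predicted label to a domain signal name. The sketch below assumes the category classifier returns a single label string; `domain_for_category` is a hypothetical helper used only to show the mapping.

```python
from typing import Dict, List, Optional

# Mirrors the signals.domains section of config.yaml.
domains: List[Dict] = [
    {"name": "mathematics",
     "mmlu_categories": ["abstract_algebra", "college_mathematics"]},
    {"name": "computer_science",
     "mmlu_categories": ["computer_security", "machine_learning"]},
]


def domain_for_category(predicted: str, domains: List[Dict]) -> Optional[str]:
    """Return the first domain whose mmlu_categories contains the label."""
    for domain in domains:
        if predicted in domain["mmlu_categories"]:
            return domain["name"]
    return None  # no domain signal fires for this label


print(domain_for_category("abstract_algebra", domains))  # mathematics
print(domain_for_category("machine_learning", domains))  # computer_science
print(domain_for_category("world_history", domains))     # None
```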
4. Fact Check Signals - Verification Need Detection
signals:
  fact_check:
    - name: "needs_verification"
      description: "Queries requiring fact verification"
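Use cases:
- Distinguishing factual queries from creative or coding tasks
- Routing to models with hallucination detection
- Triggering fact-checking plugins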