Preference Signal Routing
This guide shows you how to route requests using LLM-based preference matching. The preference signal uses an external LLM to analyze complex intent and make nuanced routing decisions.
Key Advantages
- Complex Intent Analysis: Use LLM reasoning for nuanced routing decisions
- Flexible Logic: Define routing preferences in natural language
- High Accuracy: 90-98% for complex intent detection
- Extensible: Add new preferences without retraining models
What Problem Does It Solve?
Some routing decisions are too complex for simple pattern matching or classification:
- Nuanced Intent: "Explain the philosophical implications of quantum mechanics"
- Multi-faceted Queries: "Compare and contrast utilitarianism and deontology"
- Context-dependent: "What's the best approach for this problem?"
The preference signal uses an external LLM to analyze these complex queries and match them to routing preferences, allowing you to:
- Handle complex intent that other signals miss
- Make nuanced routing decisions based on LLM reasoning
- Define routing logic in natural language
- Adapt to new use cases without retraining
Configuration
Basic Configuration
Define preference signals in your config.yaml:
signals:
  preferences:
    - name: "code_generation"
      description: "Generating new code snippets, writing functions, creating classes"
    - name: "bug_fixing"
      description: "Identifying and fixing errors, debugging issues, troubleshooting problems"
    - name: "code_review"
      description: "Reviewing code quality, suggesting improvements, best practices"
    - name: "other"
      description: "Irrelevant queries or already fulfilled requests"
External LLM Configuration
Configure the external LLM for preference matching in router-defaults.yaml:
# External models configuration
# Used for advanced routing signals like preference-based routing via external LLM
external_models:
  - llm_provider: "vllm"
    model_role: "preference"
    llm_endpoint:
      address: "127.0.0.1"
      port: 8000
    llm_model_name: "openai/gpt-oss-120b"
    llm_timeout_seconds: 30
    parser_type: "json"
    access_key: ""  # Optional: for Authorization header (Bearer token)
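Under the hood this amounts to a chat completions call against the configured endpoint (vLLM serves an OpenAI-compatible API). The sketch below shows roughly what such a request could look like; `build_preference_request`, the prompt wording, and the JSON response contract are illustrative assumptions, not the router's actual internals.

```python
import json

def build_preference_request(query, preferences, model="openai/gpt-oss-120b", access_key=""):
    """Build headers and body for an OpenAI-compatible /v1/chat/completions call
    that asks the external LLM to match the query to one configured preference."""
    preference_list = "\n".join(
        "- " + p["name"] + ": " + p["description"] for p in preferences
    )
    prompt = (
        "Match the user query to exactly one preference.\n"
        "Preferences:\n" + preference_list + "\n"
        "Query: " + query + "\n"
        'Respond as JSON: {"preference": "<name>", "confidence": <0-1>}'
    )
    headers = {"Content-Type": "application/json"}
    if access_key:  # mirrors the optional access_key field: Bearer token auth
        headers["Authorization"] = "Bearer " + access_key
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # deterministic classification
    })
    return headers, body

headers, body = build_preference_request(
    "Fix this null pointer exception",
    [{"name": "bug_fixing", "description": "Debugging issues"}],
    access_key="secret",
)
```

The `parser_type: "json"` setting above is why the sketch asks the model for a JSON verdict rather than free text.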
Use in Decision Rules
decisions:
  - name: preference_code_generation
    description: "Route code generation requests based on LLM preference matching"
    priority: 200
    rules:
      operator: "AND"
      conditions:
        - type: "preference"
          name: "code_generation"
    modelRefs:
      - model: "openai/gpt-oss-120b"
        use_reasoning: false
    plugins:
      - type: "system_prompt"
        configuration:
          system_prompt: "You are an expert code generator. Write clean, efficient, and well-documented code."
  - name: preference_bug_fixing
    description: "Route bug fixing requests based on LLM preference matching"
    priority: 200
    rules:
      operator: "AND"
      conditions:
        - type: "preference"
          name: "bug_fixing"
    modelRefs:
      - model: "openai/gpt-oss-120b"
        use_reasoning: true
    plugins:
      - type: "system_prompt"
        configuration:
          system_prompt: "You are an expert debugger. Analyze the issue carefully, identify the root cause, and provide a clear fix with explanation."
How It Works
1. Query Analysis
The external LLM analyzes the query:
Query: "Explain the philosophical implications of quantum mechanics"
LLM Analysis:
- Requires deep reasoning: YES
- Complexity level: HIGH
- Domain: Philosophy + Physics
- Reasoning type: Analytical, conceptual
2. Preference Matching
The LLM matches the query to defined preferences:
preferences:
  - name: "complex_reasoning"
    description: "Requires deep reasoning and analysis"
    # LLM evaluates: Does this query require deep reasoning?
    # Result: YES (confidence: 0.95)
3. Routing Decision
Based on the match, the query is routed:
Preference matched: complex_reasoning (0.95)
Decision: deep_reasoning
Model: reasoning-specialist
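Putting the three steps together, the match-and-route logic can be sketched as follows; `route_from_llm_response`, the exact response schema, and the 0.7 confidence threshold are illustrative assumptions rather than the router's actual internals.

```python
import json

def route_from_llm_response(response_text, decisions, threshold=0.7):
    """Parse the external LLM's JSON verdict and return the model of the
    decision whose preference condition matches, if confidence clears the bar."""
    verdict = json.loads(response_text)
    if verdict["confidence"] < threshold:
        return None  # no confident match: fall through to other signals
    for decision in decisions:
        if decision["preference"] == verdict["preference"]:
            return decision["model"]
    return None

decisions = [{"preference": "complex_reasoning", "model": "reasoning-specialist"}]
response = '{"preference": "complex_reasoning", "confidence": 0.95}'
print(route_from_llm_response(response, decisions))  # reasoning-specialist
```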
Use Cases
1. Academic Research - Complex Analysis
Problem: Research queries require deep reasoning and analysis
signals:
  preferences:
    - name: "research_analysis"
      description: "Academic research requiring deep analysis and critical thinking"
  domains:
    - name: "philosophy"
      description: "Philosophical queries"
      mmlu_categories: ["philosophy", "formal_logic"]
decisions:
  - name: academic_research
    description: "Route academic research queries"
    priority: 200
    rules:
      operator: "AND"
      conditions:
        - type: "domain"
          name: "philosophy"
        - type: "preference"
          name: "research_analysis"
    modelRefs:
      - model: "openai/gpt-oss-120b"
        use_reasoning: true
    plugins:
      - type: "system_prompt"
        configuration:
          system_prompt: "You are an academic research specialist with expertise in critical analysis and philosophical reasoning."
Example Queries:
- "Analyze the epistemological implications of Kant's Critique" → ✅ Complex analysis
- "What is philosophy?" → ❌ Simple definition
2. Business Strategy - Decision Making
Problem: Strategic queries need nuanced analysis
signals:
  preferences:
    - name: "strategic_thinking"
      description: "Business strategy requiring multi-faceted analysis"
  keywords:
    - name: "business_keywords"
      operator: "OR"
      keywords: ["strategy", "market", "competition", "growth"]
      case_sensitive: false
decisions:
  - name: strategic_analysis
    description: "Route strategic business queries"
    priority: 200
    rules:
      operator: "AND"
      conditions:
        - type: "keyword"
          name: "business_keywords"
        - type: "preference"
          name: "strategic_thinking"
    modelRefs:
      - model: "openai/gpt-oss-120b"
        use_reasoning: true
    plugins:
      - type: "system_prompt"
        configuration:
          system_prompt: "You are a senior business strategist with expertise in market analysis and competitive strategy."
Example Queries:
- "Analyze our competitive position and recommend growth strategies" → ✅ Strategic
- "What is our revenue?" → ❌ Simple query
3. Technical Architecture - Design Decisions
Problem: Architecture decisions require deep technical reasoning
signals:
  preferences:
    - name: "architecture_design"
      description: "Technical architecture requiring design thinking and trade-off analysis"
  keywords:
    - name: "architecture_keywords"
      operator: "OR"
      keywords: ["architecture", "design", "scalability", "performance"]
      case_sensitive: false
decisions:
  - name: architecture_analysis
    description: "Route architecture design queries"
    priority: 200
    rules:
      operator: "AND"
      conditions:
        - type: "keyword"
          name: "architecture_keywords"
        - type: "preference"
          name: "architecture_design"
    modelRefs:
      - model: "openai/gpt-oss-120b"
        use_reasoning: true
    plugins:
      - type: "system_prompt"
        configuration:
          system_prompt: "You are a technical architecture specialist with expertise in system design, scalability, and performance optimization."
Example Queries:
- "Design a scalable microservices architecture with trade-offs" → ✅ Design thinking
- "What are microservices?" → ❌ Simple definition
Performance Characteristics
| Aspect | Value |
|---|---|
| Latency | 100-500ms (depends on LLM) |
| Accuracy | 90-98% |
| Cost | Higher (external LLM call) |
| Scalability | Limited by LLM endpoint |
Best Practices
1. Use as Last Resort
Preference signals are expensive. Use other signals first:
decisions:
  - name: simple_math
    priority: 10
    rules:
      operator: "OR"
      conditions:
        - type: "keyword"
          name: "math_keywords"  # Fast, cheap
  - name: complex_reasoning
    priority: 5
    rules:
      operator: "OR"
      conditions:
        - type: "preference"
          name: "complex_reasoning"  # Slow, expensive
2. Combine with Other Signals
Use the AND operator to reduce false positives:
rules:
  operator: "AND"
  conditions:
    - type: "domain"
      name: "philosophy"  # Fast pre-filter
    - type: "preference"
      name: "complex_reasoning"  # Expensive verification
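The cost intuition behind this pattern can be sketched as a short-circuiting AND that evaluates cheap signals first; the cost ordering and the signal functions below are stand-ins, not the router's actual evaluator.

```python
def evaluate_and(conditions, query):
    """Evaluate (cost, check) conditions cheapest-first, short-circuiting on the
    first miss so the expensive preference signal only runs when needed."""
    for _cost, check in sorted(conditions, key=lambda c: c[0]):
        if not check(query):
            return False  # cheap miss: the expensive check below never runs
    return True

calls = []

def domain_check(query):  # stand-in for the fast domain classifier
    calls.append("domain")
    return "kant" in query.lower()

def preference_check(query):  # stand-in for the slow external LLM call
    calls.append("preference")
    return True

evaluate_and([(1, domain_check), (100, preference_check)], "What is 2+2?")
print(calls)  # ['domain'] (the expensive call was skipped)
```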
3. Cache LLM Responses
Enable caching to reduce latency and cost:
preferences:
  - name: "complex_reasoning"
    description: "Requires deep reasoning"
    llm_endpoint: "http://localhost:11434"
    cache_enabled: true
    cache_ttl: 3600  # 1 hour
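Conceptually, cache_enabled and cache_ttl put a TTL cache in front of the LLM call. A minimal sketch of such a cache, assuming per-query keying (the router's own implementation may differ):

```python
import time

class TTLCache:
    """Remember preference-match results per query for cache_ttl seconds."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.entries = {}  # query -> (result, expiry time)

    def get(self, query):
        hit = self.entries.get(query)
        if hit and hit[1] > time.monotonic():
            return hit[0]
        return None  # miss or expired: caller falls back to the LLM

    def put(self, query, result):
        self.entries[query] = (result, time.monotonic() + self.ttl)

cache = TTLCache(ttl_seconds=3600)
cache.put("Explain Kant", "complex_reasoning")
print(cache.get("Explain Kant"))  # complex_reasoning (until the TTL lapses)
```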
4. Set Appropriate Timeouts
Prevent slow LLM calls from blocking:
preferences:
  - name: "complex_reasoning"
    description: "Requires deep reasoning"
    llm_endpoint: "http://localhost:11434"
    timeout: 2000  # 2 seconds
    fallback_on_timeout: false  # Don't match on timeout
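The timeout behavior can be sketched as a bounded wait around the LLM call; `match_with_timeout` and its use of a thread pool are illustrative assumptions, not how the router implements it.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def match_with_timeout(llm_call, query, timeout_ms=2000, fallback_on_timeout=False):
    """Run the preference LLM call with a deadline; on timeout, either treat the
    signal as unmatched (False) or force a match, per fallback_on_timeout."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(llm_call, query)
        try:
            return future.result(timeout=timeout_ms / 1000)
        except TimeoutError:
            return fallback_on_timeout

def slow_llm_call(query):
    time.sleep(0.2)  # simulate a slow external LLM
    return True

print(match_with_timeout(slow_llm_call, "query", timeout_ms=50))  # False (timed out)
```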
5. Monitor Performance
Track LLM call latency and accuracy:
logging:
  level: info
  preference_signals: true
  llm_latency: true
Advanced Configuration
Multiple LLM Endpoints
Use different LLMs for different preferences:
signals:
  preferences:
    - name: "complex_reasoning"
      description: "Deep reasoning"
      llm_endpoint: "http://localhost:11434"
      model: "llama3-70b"  # Large model for complex reasoning
    - name: "simple_classification"
      description: "Simple intent classification"
      llm_endpoint: "http://localhost:11435"
      model: "llama3-8b"  # Small model for simple tasks
Custom Prompts
Customize the LLM prompt for better accuracy:
preferences:
  - name: "complex_reasoning"
    description: "Requires deep reasoning"
    llm_endpoint: "http://localhost:11434"
    prompt_template: |
      Analyze the following query and determine if it requires deep reasoning and analysis.
      Query: {query}
      Answer with YES or NO and explain why.
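For intuition, template filling and verdict parsing might look like the sketch below; Python str.format-style `{query}` substitution and a leading YES/NO check are assumptions about the template contract, not the router's documented behavior.

```python
def render_prompt(template, query):
    """Fill the {query} placeholder (str.format-style substitution is an
    assumption; the router's templating may differ)."""
    return template.format(query=query)

def parse_verdict(llm_output):
    """Treat the preference as matched only on a leading YES."""
    return llm_output.strip().upper().startswith("YES")

template = (
    "Analyze the following query and determine if it requires deep reasoning and analysis.\n"
    "Query: {query}\n"
    "Answer with YES or NO and explain why."
)
prompt = render_prompt(template, "Explain the philosophical implications of quantum mechanics")
print(parse_verdict("YES: this needs multi-step conceptual analysis"))  # True
```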
Reference
See Signal-Driven Decision Architecture for the complete signal architecture.