
LLM Service

LLMService provides a unified interface for calling language model providers (Anthropic, OpenAI, Google) with optional intelligent routing that selects the best provider and model based on task type, complexity, and cost preferences.


Configuration

LLMService is configured entirely through agentmap_config.yaml. There are two top-level sections.

Provider config (llm:)

API keys, default model, and temperature per provider:

```yaml
llm:
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    model: "claude-sonnet-4-6"
    temperature: 0.7
    max_tokens: 4096
  openai:
    api_key: "${OPENAI_API_KEY}"
    model: "gpt-4o"
    temperature: 0.7
  google:
    api_key: "${GOOGLE_API_KEY}"
    model: "gemini-2.5-flash"
    temperature: 0.5
```
  • max_tokens — (optional) maximum number of tokens in the LLM response. Omit or set to null to use the provider's default. Set to 0 to explicitly mean "no limit". Can be overridden per-call via call_llm(max_tokens=...) or via routing_context.

Routing config (routing:)

Opt-in intelligent routing. Key sub-sections:

| Sub-section | Purpose |
|---|---|
| routing_matrix | Provider × complexity → model mapping (used as fallback when no activity matches) |
| activities | Explicit provider/model plans per activity + complexity tier — evaluated first |
| task_types | Keyword-based complexity detection and provider preferences (used when no activity is set) |
| complexity_analysis | Thresholds for auto-detecting complexity from prompt length, keywords, memory size |
| cost_optimization | Prefer cost-effective models |
| fallback | Default provider/model when routing fails |

See src/agentmap/templates/config/agentmap_config.yaml.template (lines 105–365) for the full annotated routing config.
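To show how those sub-sections fit together, here is a minimal sketch of a routing: section. The key shapes are inferred from the examples on this page; take exact field names and values from the annotated template:

```yaml
# Illustrative shapes only — consult the annotated template for real keys
routing:
  routing_matrix:
    anthropic:
      low: "claude-haiku-4-5"
      medium: "claude-sonnet-4-6"
  activities:
    code_generation:
      high:
        - "anthropic:claude-sonnet-4-6"   # primary
        - "openai:gpt-4.1"                # fallback
  task_types:
    code_generation:
      provider_preference: ["anthropic", "openai"]
      complexity_keywords:
        medium: ["debug", "refactor"]
  fallback:
    default_provider: "openai"
    default_model: "gpt-4o-mini"
```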


Execution Patterns

call_llm() has two mutually exclusive modes:

| Mode | Triggered by | provider | model |
|---|---|---|---|
| Direct | no routing_context | Required — target provider | Optional — overrides config default |
| Routing | routing_context present | Ignored (warning logged) | Ignored (warning logged) |

Within the routing path, control provider and model selection through routing_context fields instead: provider_preference, fallback_provider, and model_override.

Pattern 1: Direct provider call

Specify the provider directly, optionally overriding model and temperature. provider is required in this path.

```python
response = llm_service.call_llm(
    provider="anthropic",
    messages=[{"role": "user", "content": "Explain quantum entanglement"}],
    model="claude-sonnet-4-6",  # optional override
    temperature=0.2,            # optional override
    max_tokens=2048,            # optional override
)
```

Pattern 2: Simple string prompt (ask())

Convenience wrapper for single plain-string prompts — no messages list required:

```python
response = llm_service.ask("Summarize this document: ...")
response = llm_service.ask("...", provider="openai", temperature=0.5)
```

ask() constructs [{"role": "user", "content": prompt}] and calls call_llm(). The default provider is "anthropic".

Pattern 3: Intelligent routing

Pass a routing_context dict to let the routing system select provider and model. When routing_context is present, routing owns all provider and model selection — the provider and model parameters are ignored and a warning is logged if you pass them.

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short story about a robot who learns to paint."},
]

# Route by task type
response = llm_service.call_llm(
    messages=messages,
    routing_context={"task_type": "code_generation"},
)

# Route by activity (takes priority over task_type)
response = llm_service.call_llm(
    messages=messages,
    routing_context={"activity": "code_generation"},
)

# Force a specific model through routing
response = llm_service.call_llm(
    messages=messages,
    routing_context={"task_type": "code_generation", "model_override": "claude-sonnet-4-6"},
)

# Set a fallback if routing fails
response = llm_service.call_llm(
    messages=messages,
    routing_context={"task_type": "code_generation", "fallback_provider": "openai"},
)
```

Resilience & Retries

Every LLM call is automatically protected by retry with exponential backoff and a circuit breaker. No additional configuration is required to get these protections — they are on by default.

Configuration

Configure resilience behavior in agentmap_config.yaml under llm.resilience:

```yaml
llm:
  resilience:
    retry:
      max_attempts: 3       # retries per provider:model
      backoff_base: 2.0     # exponential backoff base (seconds): 1s, 2s, 4s...
      backoff_max: 30.0     # cap on backoff delay
      jitter: true          # randomize delay to avoid thundering herd
    circuit_breaker:
      failure_threshold: 5  # failures before opening circuit for a provider:model
      reset_timeout: 60     # seconds before half-open (allows one retry)
```

Retry behavior

Transient errors — rate limits, timeouts, and 5xx server errors — are retried automatically up to max_attempts times with exponential backoff. Non-transient errors (bad API key, missing model, missing package) fail immediately without retrying.
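The delay schedule implied by backoff_base and backoff_max can be sketched as follows. backoff_delay is a hypothetical helper that illustrates the arithmetic, not AgentMap's internal function:

```python
import random


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 30.0,
                  jitter: bool = False) -> float:
    """Delay before retry `attempt` (1-indexed): 1s, 2s, 4s, ... capped at `cap`."""
    delay = min(base ** (attempt - 1), cap)
    if jitter:
        # Randomize so many clients don't retry in lockstep
        delay *= random.uniform(0.5, 1.5)
    return delay


print([backoff_delay(a) for a in range(1, 4)])  # [1.0, 2.0, 4.0]
```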

Circuit breaker behavior

After failure_threshold consecutive failures for a given provider:model pair, the circuit opens. While open, calls to that provider:model fail fast without making an API request. After reset_timeout seconds, the circuit enters a half-open state and allows one request through. A success closes the circuit; another failure re-opens it.
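The state machine above can be captured in a few lines. This is a minimal standalone sketch of the described behavior; the class and method names are illustrative, not AgentMap's internal API:

```python
import time


class CircuitBreaker:
    """Closed → open after N consecutive failures → half-open after a timeout."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 60.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: normal operation
        if time.monotonic() - self.opened_at >= self.reset_timeout:
            return True  # half-open: allow one probe request through
        return False  # open: fail fast, no API request

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the circuit

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # (re)open the circuit
```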

These protections apply to all LLM calls — direct provider calls, routed calls, and fallback attempts.

See LLM Configuration for the full configuration reference.


Tiered Fallback

When a call fails after all retries are exhausted, a tiered fallback strategy kicks in. Fallback requires routing to be configured.

| Tier | Strategy | Example |
|---|---|---|
| 1 | Same provider, lower-complexity model from routing matrix | anthropic:claude-opus-4-6 → anthropic:claude-haiku-4-5 |
| 2 | Configured fallback provider (routing.fallback.default_provider) | Switch to openai:gpt-4o-mini |
| 3 | Emergency: first available provider not yet tried | Try google:gemini-2.5-flash-lite |
| 4 | All fallbacks exhausted: raises LLMServiceError with full context | |

Dependency errors (missing packages) and configuration errors (bad API key) skip fallback entirely. Only transient provider errors trigger the fallback chain.
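The tier walk reduces to trying an ordered candidate list. In this standalone sketch, call_with_fallback is illustrative (not the service's API) and RuntimeError stands in for transient provider errors:

```python
def call_with_fallback(call, candidates):
    """Try (provider, model) candidates in tier order; raise when all fail.

    `call` stands in for the underlying provider invocation; only transient
    errors (modeled here as RuntimeError) advance to the next tier.
    """
    errors = []
    for provider, model in candidates:
        try:
            return call(provider, model)
        except RuntimeError as exc:
            errors.append(f"{provider}:{model}: {exc}")
    raise RuntimeError("all fallbacks exhausted: " + "; ".join(errors))
```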


Routing System

Task Types vs Activities

These are two alternative approaches to controlling model selection:

| Approach | What you configure | How the model is chosen |
|---|---|---|
| Task type | Provider preferences + complexity keywords | Routing matrix lookup (provider + complexity → model) |
| Activity | Exact provider:model pairs per complexity tier | Direct — bypasses the routing matrix |

Task types provide soft guidance. You list preferred providers and keywords that detect complexity from the prompt. The system looks up the final model from the routing matrix. Good for most use cases.

Activities provide hard control. You pin exact models for each complexity tier with explicit fallback chains. The routing matrix is bypassed. Use when you need a specific model every time.

Most users need only one. If you set both with the same name (e.g., "code_generation"), the activity controls model selection and the task type only contributes complexity keyword detection.

Task type example

```python
# System picks the model based on prompt analysis and provider preferences
response = llm_service.call_llm(
    messages=messages,
    routing_context={"task_type": "code_generation"},
)
# "debug" in prompt → medium complexity → anthropic preferred → claude-sonnet-4-6
```

Activity example

```python
# You control exactly which model is used
response = llm_service.call_llm(
    messages=messages,
    routing_context={
        "activity": "code_generation",
        "complexity_override": "high",
    },
)
# → anthropic:claude-sonnet-4-6 (primary for code_generation:high)
# → falls back to openai:gpt-4.1 if primary fails
```

See LLM Configuration for the full task type and activity configuration reference.

How routing selects a model

  1. Determine complexity (from complexity_analysis config — prompt length, keywords, memory size)
  2. Check routing cache
  3. If activity is set → look up activity routing table → get ordered candidates
  4. If no activity candidates → fall back to routing_matrix (task_type + complexity → model)
  5. On failure → use fallback.default_provider + fallback.default_model
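Setting aside the cache and complexity auto-detection, the selection order can be sketched as a small function. The config shapes here are assumed for illustration:

```python
def select_model(routing_context, activities, routing_matrix, fallback):
    """Sketch of the selection order above (caching and auto-detection elided)."""
    # Step 1: complexity (keyword/length auto-detection elided here)
    complexity = routing_context.get("complexity_override", "medium")
    # Step 3: the activity routing table is consulted first
    activity = routing_context.get("activity")
    if activity in activities:
        candidates = activities[activity].get(complexity, [])
        if candidates:
            return candidates[0]  # ordered candidates; first is primary
    # Step 4: routing matrix lookup (provider + complexity -> model)
    provider = (routing_context.get("provider_preference") or [fallback["provider"]])[0]
    model = routing_matrix.get(provider, {}).get(complexity)
    if model:
        return f"{provider}:{model}"
    # Step 5: configured fallback
    return f"{fallback['provider']}:{fallback['model']}"
```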

routing_context fields

All fields are optional. Routing is activated by passing a routing_context dict — no flag required.

| Field | Default | Description |
|---|---|---|
| task_type | "general" | Task classification; valid values come from routing.task_types in config |
| activity | None | Explicit activity name; takes priority over task_type |
| complexity_override | None | Skip auto-detection: "low", "medium", "high", "critical" |
| auto_detect_complexity | True | Enable keyword/length-based complexity analysis |
| provider_preference | [] | Override provider order |
| excluded_providers | [] | Providers to skip |
| model_override | None | Force a specific model |
| max_cost_tier | None | Cap complexity tier (e.g. "medium" prevents high/critical models) |
| cost_optimization | True | Prefer cost-effective models |
| prefer_speed | False | Bias toward faster models |
| prefer_quality | False | Bias toward highest-quality models |
| fallback_provider | None | Override fallback provider for this call |
| fallback_model | None | Override fallback model for this call |
| retry_with_lower_complexity | True | On failure, retry with lower complexity tier |
| max_tokens | None | Max response tokens for this call. Overrides provider and activity defaults. 0 = no limit |

max_tokens Priority

When using routing, max_tokens is resolved from multiple sources in this priority order:

  1. Node context: routing_context["max_tokens"] or max_tokens in the CSV context field
  2. Activity config: max_tokens set at the tier or candidate level in the activity definition
  3. Provider default: max_tokens in the provider's llm: config section

If no source sets max_tokens, the provider's built-in default is used. Setting max_tokens to 0 at any level means "no limit" — it actively suppresses any provider default.

For direct calls (no routing), max_tokens passed to call_llm() overrides the provider config default.
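The precedence rules can be expressed as a small resolution function. This is an illustrative model of the rules above, not the service's actual code; the provider_builtin default of 4096 is an assumed placeholder:

```python
def resolve_max_tokens(node_ctx=None, activity_cfg=None, provider_cfg=None,
                       provider_builtin=4096):
    """Each argument is that source's setting: None = not set, 0 = explicit 'no limit'.

    Returns the effective limit, or None for unlimited.
    """
    for value in (node_ctx, activity_cfg, provider_cfg):  # priority order
        if value is not None:
            # An explicit 0 means unlimited and suppresses lower-priority defaults
            return None if value == 0 else value
    return provider_builtin  # nothing set anywhere: provider's built-in default
```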


Exception Types

Import from agentmap.exceptions.

| Exception | When raised | Retryable? |
|---|---|---|
| LLMConfigurationError | Bad API key, auth failure, invalid model | No |
| LLMDependencyError | Missing provider package (e.g. anthropic not installed) | No |
| LLMProviderError | Generic provider-level errors | No |
| LLMTimeoutError | Timeout, connection errors, 5xx server errors | Yes (automatic) |
| LLMRateLimitError | 429 / rate limit / quota exceeded | Yes (automatic) |
| LLMServiceError | General service errors, all fallbacks exhausted | No |

LLMTimeoutError and LLMRateLimitError are subclasses of LLMProviderError, which is a subclass of LLMServiceError.

Error handling in a host application

```python
import logging

from agentmap.exceptions import (
    LLMServiceError,
    LLMConfigurationError,
    LLMDependencyError,
    LLMRateLimitError,
    LLMTimeoutError,
)

logger = logging.getLogger(__name__)


def fallback_response():
    """Placeholder for your application's fallback logic."""
    return "Service is temporarily unavailable. Please try again later."


def summarize_report(llm_service):
    # Wrapped in a function so the handlers below can return a fallback value.
    try:
        return llm_service.call_llm(
            provider="anthropic",
            messages=[{"role": "user", "content": "Summarize this report"}],
        )
    except LLMConfigurationError as e:
        # Bad API key or invalid model — fix your configuration
        logger.error(f"Configuration error: {e}")
        raise
    except LLMDependencyError as e:
        # Missing provider package — install it (e.g. pip install anthropic)
        logger.error(f"Missing dependency: {e}")
        raise
    except LLMRateLimitError as e:
        # Rate limited even after automatic retries — back off at application level
        logger.warning(f"Rate limited after retries: {e}")
        return fallback_response()
    except LLMTimeoutError as e:
        # Timeout/connection error after retries — provider may be down
        logger.warning(f"Provider unreachable after retries: {e}")
        return fallback_response()
    except LLMServiceError as e:
        # All fallback tiers exhausted
        logger.error(f"LLM call failed completely: {e}")
        raise
```

Error handling in a custom agent

```python
class MyAgent(BaseAgent, LLMCapableAgent):
    def process(self, inputs):
        try:
            return self.llm_service.call_llm(
                provider="anthropic",
                messages=[{"role": "user", "content": inputs["query"]}],
            )
        except LLMConfigurationError:
            # Surface config errors — the workflow operator needs to fix this
            raise
        except LLMServiceError:
            # Transient errors were already retried; fallback was attempted.
            # Return a graceful degradation or let the error_node handle it.
            return "I'm sorry, I couldn't process your request right now."
```

Available Providers

```python
providers = llm_service.get_available_providers()
# Returns: ['anthropic', 'openai', 'google'] (only those with API keys configured)
```
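"Available" here means the provider has credentials configured. A rough standalone model of that check, assuming the llm: config shape shown earlier (this mirrors the behavior for illustration; it is not the actual implementation):

```python
def available_providers(llm_config: dict) -> list:
    """Providers whose api_key resolves to a non-empty value."""
    return [
        name
        for name, cfg in llm_config.items()
        if isinstance(cfg, dict) and cfg.get("api_key")
    ]


print(available_providers({
    "anthropic": {"api_key": "sk-test", "model": "claude-sonnet-4-6"},
    "openai": {"api_key": ""},        # no key: excluded
    "resilience": {"retry": {}},      # non-provider section: excluded
}))  # ['anthropic']
```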

Agent Integration

Agents that need LLM access implement the LLMCapableAgent protocol:

```python
from typing import Any, Dict

from agentmap.agents.base_agent import BaseAgent
from agentmap.services.protocols.llm_protocol import LLMCapableAgent, LLMServiceProtocol


class MyLLMAgent(BaseAgent, LLMCapableAgent):
    _llm_service: LLMServiceProtocol = None  # default so the property check works pre-configuration

    def configure_llm_service(self, llm_service: LLMServiceProtocol) -> None:
        self._llm_service = llm_service

    @property
    def llm_service(self) -> LLMServiceProtocol:
        if self._llm_service is None:
            raise ValueError(f"LLM service not configured for agent '{self.name}'")
        return self._llm_service

    def process(self, inputs: Dict[str, Any]) -> Any:
        provider = self.context.get("provider", "anthropic")
        messages = [
            {"role": "system", "content": self.prompt},
            {"role": "user", "content": inputs.get("query", "")},
        ]
        return self.llm_service.call_llm(
            provider=provider,
            messages=messages,
            temperature=self.context.get("temperature", 0.7),
        )
```

CSV configuration

The context field contains JSON. In CSV, double quotes inside a quoted field must be escaped as "" — this is standard CSV encoding, not AgentMap-specific.

Direct provider call:

workflow,node,description,type,next_node,error_node,input_fields,output_field,prompt,context
ChatBot,Chat,Chat with AI,llm,Chat,Error,message,response,You are a helpful assistant,"{""provider"": ""anthropic"", ""model"": ""claude-sonnet-4-6"", ""temperature"": 0.7, ""max_tokens"": 2048}"

With routing context — routing selects the provider and model; provider and model are omitted:

workflow,node,description,type,next_node,error_node,input_fields,output_field,prompt,context
CodeBot,Generate,Generate code,llm,Review,Error,request,code,You are an expert software engineer,"{""routing_context"": {""activity"": ""code_generation"", ""complexity_override"": ""high""}, ""temperature"": 0.2}"

With task-type routing and a cost cap:

workflow,node,description,type,next_node,error_node,input_fields,output_field,prompt,context
Analyst,Analyze,Analyze data,llm,Output,Error,data,analysis,You are a data analyst,"{""routing_context"": {""task_type"": ""data_analysis"", ""max_cost_tier"": ""medium""}, ""temperature"": 0.5}"

With an activity for pinned model selection:

workflow,node,description,type,next_node,error_node,input_fields,output_field,prompt,context
CodeBot,Review,Review code,llm,Done,Error,code,feedback,You are a code reviewer,"{""routing_context"": {""activity"": ""code_generation"", ""complexity_override"": ""high""}, ""temperature"": 0.2}"
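Rather than escaping quotes by hand, you can generate these rows with Python's csv and json modules, which apply the "" doubling automatically (the row values mirror the direct-call example above):

```python
import csv
import io
import json

context = {"provider": "anthropic", "model": "claude-sonnet-4-6",
           "temperature": 0.7, "max_tokens": 2048}

buf = io.StringIO()
csv.writer(buf).writerow([
    "ChatBot", "Chat", "Chat with AI", "llm", "Chat", "Error",
    "message", "response", "You are a helpful assistant",
    json.dumps(context),  # csv.writer quotes the field and doubles inner quotes
])
row = buf.getvalue()

# Round-trip: reading the row back yields valid JSON in the context column
parsed = json.loads(next(csv.reader(io.StringIO(row)))[-1])
assert parsed["provider"] == "anthropic"
```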

External Usage

```python
from agentmap import agentmap_initialize
from agentmap.runtime_api import get_container  # not exported from top-level agentmap

agentmap_initialize()
llm_service = get_container().llm_service()

response = llm_service.call_llm(
    provider="anthropic",
    messages=[{"role": "user", "content": "Hello"}],
)
```

Monitoring

Use get_routing_stats() to inspect circuit breaker state and identify providers experiencing issues:

```python
stats = llm_service.get_routing_stats()
# Returns:
# {
#     "circuit_breaker": {
#         "open_circuits": ["anthropic:claude-opus-4-6"],
#         "failure_counts": {"anthropic:claude-opus-4-6": 5}
#     },
#     ...routing stats...
# }
```

Open circuits indicate a provider:model pair that has hit the failure threshold and is currently being bypassed. Monitor this in production to detect persistent provider outages or configuration issues early.
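For example, a monitoring hook might flag open circuits from that stats dict. The dict shape follows the example above; the helper itself is illustrative:

```python
def open_circuit_alerts(stats: dict) -> list:
    """Human-readable alerts for provider:model pairs currently being bypassed."""
    cb = stats.get("circuit_breaker", {})
    counts = cb.get("failure_counts", {})
    return [
        f"circuit open: {pair} ({counts.get(pair, '?')} failures)"
        for pair in cb.get("open_circuits", [])
    ]
```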


Best Practices

  1. Store API keys in environment variables — never hardcode them.
  2. Use routing for complex pipelines — activities give you explicit control; task_types offer keyword-driven automation.
  3. Use ask() for quick one-off prompts — only reach for call_llm() when you need messages, routing, or model overrides.
  4. Cap complexity tier with max_cost_tier — prevents accidentally routing simple tasks to expensive models.
  5. Keep conversation history reasonable — 10–20 messages is a good ceiling; trim older messages when memory grows.
  6. Let retries handle transient failures — don't add your own retry loop around call_llm(); the service already retries rate limits and timeouts automatically.
  7. Catch specific exceptions — handle LLMConfigurationError (fix your config) differently from LLMServiceError (transient, may resolve later).
  8. Monitor circuit breaker state — use get_routing_stats() to detect providers that are consistently failing.
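The history-trimming guidance in point 5 can be implemented with a small helper. This sketch assumes the standard messages list format and keeps a leading system message intact:

```python
def trim_history(messages: list, max_messages: int = 20) -> list:
    """Keep the system message (if first) plus the most recent turns."""
    if messages and messages[0].get("role") == "system":
        system, rest = messages[:1], messages[1:]
    else:
        system, rest = [], messages
    return system + rest[-max_messages:]
```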

Next Steps