LLM Service
LLMService provides a unified interface for calling language model providers (Anthropic, OpenAI, Google) with optional intelligent routing that selects the best provider and model based on task type, complexity, and cost preferences.
Configuration
LLMService is configured entirely through agentmap_config.yaml. There are two top-level sections.
Provider config (llm:)
API keys, default model, and temperature per provider:
```yaml
llm:
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    model: "claude-sonnet-4-6"
    temperature: 0.7
    max_tokens: 4096
  openai:
    api_key: "${OPENAI_API_KEY}"
    model: "gpt-4o"
    temperature: 0.7
  google:
    api_key: "${GOOGLE_API_KEY}"
    model: "gemini-2.5-flash"
    temperature: 0.5
```
`max_tokens` — (optional) maximum number of tokens in the LLM response. Omit or set to `null` to use the provider's default. Set to `0` to explicitly mean "no limit". Can be overridden per-call via `call_llm(max_tokens=...)` or via `routing_context`.
Routing config (routing:)
Opt-in intelligent routing. Key sub-sections:
| Sub-section | Purpose |
|---|---|
| `routing_matrix` | Provider × complexity → model mapping (used as fallback when no activity matches) |
| `activities` | Explicit provider/model plans per activity + complexity tier — evaluated first |
| `task_types` | Keyword-based complexity detection and provider preferences (used when no activity is set) |
| `complexity_analysis` | Thresholds for auto-detecting complexity from prompt length, keywords, and memory size |
| `cost_optimization` | Preference for cost-effective models |
| `fallback` | Default provider/model when routing fails |
See src/agentmap/templates/config/agentmap_config.yaml.template (lines 105–365) for the full annotated routing config.
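To make the shape concrete, here is a minimal sketch of a routing section. The nesting inside each sub-section is an assumption for illustration only — the annotated template above is the authoritative schema:

```yaml
routing:
  routing_matrix:
    # assumed layout: provider -> complexity tier -> model
    anthropic:
      low: "claude-haiku-4-5"
      medium: "claude-sonnet-4-6"
      high: "claude-sonnet-4-6"
  fallback:
    default_provider: "anthropic"
    default_model: "claude-haiku-4-5"
```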
Execution Patterns
`call_llm()` has two mutually exclusive modes:

| Mode | Triggered by | `provider` | `model` |
|---|---|---|---|
| Direct | no `routing_context` | Required — target provider | Optional — overrides config default |
| Routing | `routing_context` present | Ignored (warning logged) | Ignored (warning logged) |

Use `routing_context['provider_preference']` / `routing_context['fallback_provider']` and `routing_context['model_override']` to control provider and model selection within the routing path.
Pattern 1: Direct provider call
Specify the provider directly, optionally overriding model and temperature. `provider` is required in this path.
```python
response = llm_service.call_llm(
    provider="anthropic",
    messages=[{"role": "user", "content": "Explain quantum entanglement"}],
    model="claude-sonnet-4-6",  # optional override
    temperature=0.2,            # optional override
    max_tokens=2048,            # optional override
)
```
Pattern 2: Simple string prompt (ask())
Convenience wrapper for single plain-string prompts — no messages list required:
response = llm_service.ask("Summarize this document: ...")
response = llm_service.ask("...", provider="openai", temperature=0.5)
`ask()` constructs `[{"role": "user", "content": prompt}]` and calls `call_llm()`. The default provider is `"anthropic"`.
Pattern 3: Intelligent routing
Pass a routing_context dict to let the routing system select provider and model. When routing_context is present, routing owns all provider and model selection — the provider and model parameters are ignored and a warning is logged if you pass them.
```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short story about a robot who learns to paint."},
]

# Route by task type
response = llm_service.call_llm(
    messages=messages,
    routing_context={"task_type": "code_generation"},
)

# Route by activity (takes priority over task_type)
response = llm_service.call_llm(
    messages=messages,
    routing_context={"activity": "code_generation"},
)

# Force a specific model through routing
response = llm_service.call_llm(
    messages=messages,
    routing_context={"task_type": "code_generation", "model_override": "claude-sonnet-4-6"},
)

# Set a fallback if routing fails
response = llm_service.call_llm(
    messages=messages,
    routing_context={"task_type": "code_generation", "fallback_provider": "openai"},
)
```
Resilience & Retries
Every LLM call is automatically protected by retry with exponential backoff and a circuit breaker. No additional configuration is required to get these protections — they are on by default.
Configuration
Configure resilience behavior in agentmap_config.yaml under llm.resilience:
```yaml
llm:
  resilience:
    retry:
      max_attempts: 3        # retries per provider:model
      backoff_base: 2.0      # exponential backoff base (seconds): 1s, 2s, 4s...
      backoff_max: 30.0      # cap on backoff delay
      jitter: true           # randomize delay to avoid thundering herd
    circuit_breaker:
      failure_threshold: 5   # failures before opening circuit for a provider:model
      reset_timeout: 60      # seconds before half-open (allows one retry)
```
Retry behavior
Transient errors — rate limits, timeouts, and 5xx server errors — are retried automatically up to `max_attempts` times with exponential backoff. Non-transient errors (bad API key, missing model, missing package) fail immediately without retrying.
Circuit breaker behavior
After `failure_threshold` consecutive failures for a given `provider:model` pair, the circuit opens. While open, calls to that `provider:model` fail fast without making an API request. After `reset_timeout` seconds, the circuit enters a half-open state and allows one request through. A success closes the circuit; another failure re-opens it.
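This is the standard circuit-breaker pattern. A self-contained sketch of the state machine described above (illustrative only — not AgentMap's internal implementation):

```python
import time
from typing import Optional


class CircuitBreaker:
    """One instance per provider:model pair."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 60.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: requests pass through
        if time.monotonic() - self.opened_at >= self.reset_timeout:
            return True  # half-open: allow one probe request
        return False     # open: fail fast without an API call

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # success closes the circuit

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()  # open (or re-open) the circuit
```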
These protections apply to all LLM calls — direct provider calls, routed calls, and fallback attempts.
See LLM Configuration for the full configuration reference.
Tiered Fallback
When a call fails after all retries are exhausted, a tiered fallback strategy kicks in. Fallback requires routing to be configured.
| Tier | Strategy | Example |
|---|---|---|
| 1 | Same provider, lower-complexity model from the routing matrix | `anthropic:claude-opus-4-6` → `anthropic:claude-haiku-4-5` |
| 2 | Configured fallback provider (`routing.fallback.default_provider`) | Switch to `openai:gpt-4o-mini` |
| 3 | Emergency — first available provider not yet tried | Try `google:gemini-2.5-flash-lite` |
| 4 | All fallbacks exhausted — raises `LLMServiceError` with full context | — |
Dependency errors (missing packages) and configuration errors (bad API key) skip fallback entirely. Only transient provider errors trigger the fallback chain.
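The fallback chain runs inside the service; the sketch below only illustrates the tier order. The function and the way the candidate list is built are hypothetical:

```python
from agentmap.exceptions import LLMProviderError, LLMServiceError


def call_with_tiered_fallback(llm_service, messages, candidates):
    """Illustrative tier order only — the service performs this internally.

    `candidates` is an ordered list of (provider, model) pairs:
    tier 1 = same provider, lower-complexity model from the routing matrix;
    tier 2 = routing.fallback.default_provider;
    tier 3 = first available provider not yet tried.
    """
    last_error = None
    for provider, model in candidates:
        try:
            return llm_service.call_llm(provider=provider, messages=messages, model=model)
        except LLMProviderError as exc:
            # Only transient provider errors continue down the chain;
            # configuration and dependency errors propagate immediately.
            last_error = exc
    # Tier 4: everything exhausted.
    raise LLMServiceError(f"All fallbacks exhausted: {last_error}")
```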
Routing System
Task Types vs Activities
These are two alternative approaches to controlling model selection:
| Approach | What you configure | How the model is chosen |
|---|---|---|
| Task type | Provider preferences + complexity keywords | Routing matrix lookup (provider + complexity → model) |
| Activity | Exact provider:model pairs per complexity tier | Direct — bypasses the routing matrix |
Task types provide soft guidance. You list preferred providers and keywords that detect complexity from the prompt. The system looks up the final model from the routing matrix. Good for most use cases.
Activities provide hard control. You pin exact models for each complexity tier with explicit fallback chains. The routing matrix is bypassed. Use when you need a specific model every time.
Most users need only one. If you set both with the same name (e.g., "code_generation"), the activity controls model selection and the task type only contributes complexity keyword detection.
Task type example
```python
# The system picks the model based on prompt analysis and provider preferences
response = llm_service.call_llm(
    messages=messages,
    routing_context={"task_type": "code_generation"},
)
# "debug" in prompt → medium complexity → anthropic preferred → claude-sonnet-4-6
```
Activity example
```python
# You control exactly which model is used
response = llm_service.call_llm(
    messages=messages,
    routing_context={
        "activity": "code_generation",
        "complexity_override": "high",
    },
)
# → anthropic:claude-sonnet-4-6 (primary for code_generation:high)
# → falls back to openai:gpt-4.1 if the primary fails
```
See LLM Configuration for the full task type and activity configuration reference.
How routing selects a model
1. Determine complexity (from the `complexity_analysis` config — prompt length, keywords, memory size)
2. Check the routing cache
3. If `activity` is set → look up the activity routing table → get ordered candidates
4. If no activity candidates → fall back to the `routing_matrix` (task_type + complexity → model)
5. On failure → use `fallback.default_provider` + `fallback.default_model`
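In code form, the decision corresponds roughly to the sketch below. It is illustrative only — a simplified stand-in for the actual implementation, with the config dict shapes assumed:

```python
def select_model(routing_context, config, cache):
    """Rough shape of the routing decision described above."""
    # 1. Determine complexity: an explicit override wins; real auto-detection
    #    uses complexity_analysis (prompt length, keywords, memory size).
    complexity = routing_context.get("complexity_override") or "medium"

    # 2. Check the routing cache.
    key = (routing_context.get("activity"), routing_context.get("task_type"), complexity)
    if key in cache:
        return cache[key]

    # 3. Activity set? Use its ordered candidate list directly (matrix bypassed).
    activities = config.get("activities", {})
    plan = activities.get(routing_context.get("activity"), {}).get(complexity)
    if plan:
        provider, model = plan[0].split(":", 1)  # assumed "provider:model" candidates
    else:
        # 4. No activity candidates: routing matrix (provider preference × complexity).
        provider = (routing_context.get("provider_preference") or ["anthropic"])[0]
        model = config.get("routing_matrix", {}).get(provider, {}).get(complexity)
        if model is None:
            # 5. Routing failed: fall back to the configured defaults.
            provider = config["fallback"]["default_provider"]
            model = config["fallback"]["default_model"]

    cache[key] = (provider, model)
    return provider, model
```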
routing_context fields
All fields are optional. Routing is activated by passing a routing_context dict — no flag required.
| Field | Default | Description |
|---|---|---|
| `task_type` | `"general"` | Task classification; valid values come from `routing.task_types` in config |
| `activity` | `None` | Explicit activity name; takes priority over `task_type` |
| `complexity_override` | `None` | Skip auto-detection: `"low"`, `"medium"`, `"high"`, `"critical"` |
| `auto_detect_complexity` | `True` | Enable keyword/length-based complexity analysis |
| `provider_preference` | `[]` | Override provider order |
| `excluded_providers` | `[]` | Providers to skip |
| `model_override` | `None` | Force a specific model |
| `max_cost_tier` | `None` | Cap complexity tier (e.g. `"medium"` prevents high/critical models) |
| `cost_optimization` | `True` | Prefer cost-effective models |
| `prefer_speed` | `False` | Bias toward faster models |
| `prefer_quality` | `False` | Bias toward highest-quality models |
| `fallback_provider` | `None` | Override the fallback provider for this call |
| `fallback_model` | `None` | Override the fallback model for this call |
| `retry_with_lower_complexity` | `True` | On failure, retry with a lower complexity tier |
| `max_tokens` | `None` | Max response tokens for this call; overrides provider and activity defaults. `0` = no limit |
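Fields compose freely. For example, preferring certain providers, excluding another, and capping cost and response length in a single call:

```python
response = llm_service.call_llm(
    messages=messages,
    routing_context={
        "task_type": "data_analysis",
        "provider_preference": ["anthropic", "google"],  # try these first, in order
        "excluded_providers": ["openai"],                # never route here
        "max_cost_tier": "medium",                       # no high/critical-tier models
        "max_tokens": 1024,                              # cap response length for this call
    },
)
```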
max_tokens Priority
When using routing, `max_tokens` is resolved from multiple sources in this priority order:

1. Node context — `routing_context["max_tokens"]` or `max_tokens` in the CSV `context` field
2. Activity config — `max_tokens` set at the tier or candidate level in the activity definition
3. Provider default — `max_tokens` in the provider's `llm:` config section

If no source sets `max_tokens`, the provider's built-in default is used. Setting `max_tokens` to `0` at any level means "no limit" — it actively suppresses any provider default.
For direct calls (no routing), `max_tokens` passed to `call_llm()` overrides the provider config default.
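For example, with `max_tokens: 4096` set for `anthropic` in the `llm:` section above:

```python
# Direct call: the explicit argument overrides the config default → 2048.
response = llm_service.call_llm(
    provider="anthropic",
    messages=[{"role": "user", "content": "Summarize briefly"}],
    max_tokens=2048,
)

# Routed call: routing_context["max_tokens"] outranks activity and provider defaults.
response = llm_service.call_llm(
    messages=[{"role": "user", "content": "Summarize briefly"}],
    routing_context={"task_type": "general", "max_tokens": 512},
)

# 0 means "no limit", suppressing the provider default entirely.
response = llm_service.call_llm(
    provider="anthropic",
    messages=[{"role": "user", "content": "Write at length"}],
    max_tokens=0,
)
```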
Exception Types
Import from `agentmap.exceptions`.
| Exception | When raised | Retryable? |
|---|---|---|
| `LLMConfigurationError` | Bad API key, auth failure, invalid model | No |
| `LLMDependencyError` | Missing provider package (e.g. `anthropic` not installed) | No |
| `LLMProviderError` | Generic provider-level errors | No |
| `LLMTimeoutError` | Timeouts, connection errors, 5xx server errors | Yes (automatic) |
| `LLMRateLimitError` | 429 / rate limit / quota exceeded | Yes (automatic) |
| `LLMServiceError` | General service errors; all fallbacks exhausted | No |

`LLMTimeoutError` and `LLMRateLimitError` are subclasses of `LLMProviderError`, which is a subclass of `LLMServiceError`.
Error handling in a host application
```python
import logging

from agentmap.exceptions import (
    LLMServiceError,
    LLMConfigurationError,
    LLMDependencyError,
    LLMRateLimitError,
    LLMTimeoutError,
)

logger = logging.getLogger(__name__)


def fallback_response():
    """Placeholder for your application's fallback logic."""
    return "Service is temporarily unavailable. Please try again later."


try:
    response = llm_service.call_llm(
        provider="anthropic",
        messages=[{"role": "user", "content": "Summarize this report"}],
    )
except LLMConfigurationError as e:
    # Bad API key or invalid model — fix your configuration
    logger.error(f"Configuration error: {e}")
    raise
except LLMDependencyError as e:
    # Missing provider package — install it (e.g. pip install anthropic)
    logger.error(f"Missing dependency: {e}")
    raise
except LLMRateLimitError as e:
    # Rate limited even after automatic retries — back off at the application level
    logger.warning(f"Rate limited after retries: {e}")
    response = fallback_response()
except LLMTimeoutError as e:
    # Timeout/connection error after retries — the provider may be down
    logger.warning(f"Provider unreachable after retries: {e}")
    response = fallback_response()
except LLMServiceError as e:
    # All fallback tiers exhausted
    logger.error(f"LLM call failed completely: {e}")
    raise
```
Error handling in a custom agent
```python
class MyAgent(BaseAgent, LLMCapableAgent):
    def process(self, inputs):
        try:
            return self.llm_service.call_llm(
                provider="anthropic",
                messages=[{"role": "user", "content": inputs["query"]}],
            )
        except LLMConfigurationError:
            # Surface config errors — the workflow operator needs to fix this
            raise
        except LLMServiceError:
            # Transient errors were already retried; fallback was attempted.
            # Return a graceful degradation or let the error_node handle it.
            return "I'm sorry, I couldn't process your request right now."
```
Available Providers
```python
providers = llm_service.get_available_providers()
# Returns: ['anthropic', 'openai', 'google'] (only those with API keys configured)
```
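This is handy for guarding calls when a provider may not be configured:

```python
providers = llm_service.get_available_providers()
if not providers:
    raise RuntimeError("No LLM providers configured — check your API keys")

# Prefer anthropic when available, otherwise use the first configured provider.
provider = "anthropic" if "anthropic" in providers else providers[0]
response = llm_service.call_llm(
    provider=provider,
    messages=[{"role": "user", "content": "Hello"}],
)
```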
Agent Integration
Agents that need LLM access implement the `LLMCapableAgent` protocol:
```python
from typing import Any, Dict, Optional

from agentmap.agents.base_agent import BaseAgent
from agentmap.services.protocols.llm_protocol import LLMCapableAgent, LLMServiceProtocol


class MyLLMAgent(BaseAgent, LLMCapableAgent):
    # Default so the property check works before configure_llm_service() runs
    _llm_service: Optional[LLMServiceProtocol] = None

    def configure_llm_service(self, llm_service: LLMServiceProtocol) -> None:
        self._llm_service = llm_service

    @property
    def llm_service(self) -> LLMServiceProtocol:
        if self._llm_service is None:
            raise ValueError(f"LLM service not configured for agent '{self.name}'")
        return self._llm_service

    def process(self, inputs: Dict[str, Any]) -> Any:
        provider = self.context.get("provider", "anthropic")
        messages = [
            {"role": "system", "content": self.prompt},
            {"role": "user", "content": inputs.get("query", "")},
        ]
        return self.llm_service.call_llm(
            provider=provider,
            messages=messages,
            temperature=self.context.get("temperature", 0.7),
        )
```
CSV configuration
The `context` field contains JSON. In CSV, double quotes inside a quoted field must be escaped as `""` — this is standard CSV encoding, not AgentMap-specific.
Direct provider call:
```csv
workflow,node,description,type,next_node,error_node,input_fields,output_field,prompt,context
ChatBot,Chat,Chat with AI,llm,Chat,Error,message,response,You are a helpful assistant,"{""provider"": ""anthropic"", ""model"": ""claude-sonnet-4-6"", ""temperature"": 0.7, ""max_tokens"": 2048}"
```
With routing context — routing selects the provider and model; `provider` and `model` are omitted:
```csv
workflow,node,description,type,next_node,error_node,input_fields,output_field,prompt,context
CodeBot,Generate,Generate code,llm,Review,Error,request,code,You are an expert software engineer,"{""routing_context"": {""activity"": ""code_generation"", ""complexity_override"": ""high""}, ""temperature"": 0.2}"
```
With task-type routing and a cost cap:
```csv
workflow,node,description,type,next_node,error_node,input_fields,output_field,prompt,context
Analyst,Analyze,Analyze data,llm,Output,Error,data,analysis,You are a data analyst,"{""routing_context"": {""task_type"": ""data_analysis"", ""max_cost_tier"": ""medium""}, ""temperature"": 0.5}"
```
With an activity for pinned model selection:
```csv
workflow,node,description,type,next_node,error_node,input_fields,output_field,prompt,context
CodeBot,Review,Review code,llm,Done,Error,code,feedback,You are a code reviewer,"{""routing_context"": {""activity"": ""code_generation"", ""complexity_override"": ""high""}, ""temperature"": 0.2}"
```
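If you generate workflow rows programmatically, the standard library's `csv` and `json` modules produce the quote-doubling shown above automatically. A sketch (the file name and row values are just examples):

```python
import csv
import json

context = {
    "routing_context": {"activity": "code_generation", "complexity_override": "high"},
    "temperature": 0.2,
}

with open("workflow.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([
        "workflow", "node", "description", "type", "next_node", "error_node",
        "input_fields", "output_field", "prompt", "context",
    ])
    # csv.writer quotes the context field and doubles its inner quotes for us.
    writer.writerow([
        "CodeBot", "Review", "Review code", "llm", "Done", "Error",
        "code", "feedback", "You are a code reviewer", json.dumps(context),
    ])
```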
External Usage
```python
from agentmap import agentmap_initialize
from agentmap.runtime_api import get_container  # not exported from top-level agentmap

agentmap_initialize()
llm_service = get_container().llm_service()

response = llm_service.call_llm(
    provider="anthropic",
    messages=[{"role": "user", "content": "Hello"}],
)
```
Monitoring
Use `get_routing_stats()` to inspect circuit breaker state and identify providers experiencing issues:
```python
stats = llm_service.get_routing_stats()
# Returns:
# {
#   "circuit_breaker": {
#     "open_circuits": ["anthropic:claude-opus-4-6"],
#     "failure_counts": {"anthropic:claude-opus-4-6": 5}
#   },
#   ...routing stats...
# }
```
Open circuits indicate a provider:model pair that has hit the failure threshold and is currently being bypassed. Monitor this in production to detect persistent provider outages or configuration issues early.
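For example, a periodic health check might log every open circuit, based on the stats shape shown above:

```python
import logging

logger = logging.getLogger(__name__)

stats = llm_service.get_routing_stats()
breaker = stats.get("circuit_breaker", {})
for pair in breaker.get("open_circuits", []):
    count = breaker.get("failure_counts", {}).get(pair, "?")
    logger.warning(f"Circuit open for {pair} ({count} consecutive failures)")
```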
Best Practices
- Store API keys in environment variables — never hardcode them.
- Use routing for complex pipelines — activities give you explicit control; task types offer keyword-driven automation.
- Use `ask()` for quick one-off prompts — only reach for `call_llm()` when you need messages, routing, or model overrides.
- Cap the complexity tier with `max_cost_tier` — prevents accidentally routing simple tasks to expensive models.
- Keep conversation history reasonable — 10–20 messages is a good ceiling; trim older messages when memory grows (see the sketch after this list).
- Let retries handle transient failures — don't add your own retry loop around `call_llm()`; the service already retries rate limits and timeouts automatically.
- Catch specific exceptions — handle `LLMConfigurationError` (fix your config) differently from `LLMServiceError` (transient, may resolve later).
- Monitor circuit breaker state — use `get_routing_stats()` to detect providers that are consistently failing.
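A minimal history-trimming helper (illustrative only — AgentMap does not ship this function):

```python
def trim_history(messages, max_messages=20):
    """Keep the system message (if present) plus the most recent turns."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    keep = max(max_messages - len(system), 1)
    return system + rest[-keep:]
```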
Next Steps
- LLM Configuration — Provider setup, resilience tuning, and routing matrix
- Storage Services — Data persistence options
- Capability Protocols — Agent protocol reference
- Agent Development — Build custom LLM agents