Cache Management & System Administration
This guide provides comprehensive administrative procedures for managing AgentMap's cache systems, including validation cache, system performance optimization, and maintenance workflows.
Administrative Overview
Cache System Architecture
AgentMap implements a sophisticated caching system with multiple components:
- Unified Availability Cache: Centralized cache for all availability checking (dependencies, LLM providers, storage)
- Validation Cache: Stores CSV and configuration validation results
- File Hash Cache: MD5-based change detection for automatic invalidation
- Performance Cache: System-level performance optimizations
- Temporary Cache: Short-lived cache for active sessions
Cache Storage Locations
# Unified availability cache
~/.agentmap/availability_cache.json
# Validation cache directory
~/.agentmap/validation_cache/
# System cache (if applicable)
/tmp/agentmap_cache/
# Application cache
~/.agentmap/app_cache/
Administrative Responsibilities
- Cache Health Monitoring: Regular cache statistics review across all cache systems
- Storage Management: Disk space monitoring and cleanup for unified and validation caches
- Performance Optimization: Cache hit rate analysis and tuning for optimal system performance
- Maintenance Scheduling: Automated and manual cache maintenance workflows
- Troubleshooting: Cache-related issue resolution and diagnostic procedures
- Unified Cache Operations: Specialized management for availability cache service
Unified Availability Cache Administration
Availability Cache Overview
The unified AvailabilityCacheService is AgentMap's centralized caching system for storing boolean availability results across all categories:
- Dependencies:
dependency.llm.openai
,dependency.storage.csv
- LLM Providers:
llm_provider.anthropic
,llm_provider.openai
- Storage Services:
storage.csv
,storage.vector
,storage.firebase
- Custom Categories: Application-specific availability checks
Availability Cache Statistics
Monitor unified cache health and performance:
# Python API for cache statistics
from agentmap.services.config.availability_cache_service import AvailabilityCacheService
from pathlib import Path
# Initialize cache service
cache_service = AvailabilityCacheService(
cache_file_path=Path("~/.agentmap/availability_cache.json").expanduser()
)
# Get comprehensive cache statistics
stats = cache_service.get_cache_stats()
print(json.dumps(stats, indent=2))
Sample Administrative Output:
{
"cache_file_path": "/home/user/.agentmap/availability_cache.json",
"cache_exists": true,
"auto_invalidation_enabled": true,
"cache_version": "2.0",
"last_updated": "2024-01-15T15:45:30Z",
"total_entries": 24,
"categories": {
"dependency.llm": 3,
"dependency.storage": 5,
"llm_provider": 4,
"storage": 6,
"custom": 6
},
"environment_hash": "a1b2c3d4e5f6",
"performance": {
"cache_hits": 847,
"cache_misses": 156,
"cache_sets": 234,
"invalidations": 12,
"auto_invalidations": 3
}
}
Availability Cache Operations
Manual Cache Invalidation:
# Invalidate specific category and key
cache_service.invalidate_cache("dependency.llm", "openai")
# Invalidate entire category
cache_service.invalidate_cache("dependency.llm")
# Invalidate entire cache
cache_service.invalidate_cache()
# Force environment cache invalidation
cache_service.invalidate_environment_cache()
Configuration File Monitoring:
# Register config files for automatic invalidation
cache_service.register_config_file("/path/to/agentmap_config.yaml")
cache_service.register_config_file("/path/to/storage_config.yaml")
Performance Monitoring:
# Enable/disable automatic invalidation
cache_service.enable_auto_invalidation(True) # Enable
cache_service.enable_auto_invalidation(False) # Disable for debugging
# Get hit rate analysis
stats = cache_service.get_cache_stats()
hit_rate = stats["performance"]["cache_hits"] / (
stats["performance"]["cache_hits"] + stats["performance"]["cache_misses"]
) * 100
print(f"Cache hit rate: {hit_rate:.1f}%")
Availability Cache Maintenance
Daily Maintenance Script:
#!/usr/bin/env python3
"""
Daily availability cache maintenance script.
"""
import json
import sys
from pathlib import Path
from datetime import datetime, timezone
from agentmap.services.config.availability_cache_service import AvailabilityCacheService
def daily_cache_maintenance():
"""Perform daily cache maintenance and health checks."""
cache_file = Path("~/.agentmap/availability_cache.json").expanduser()
cache_service = AvailabilityCacheService(cache_file)
# Get cache statistics
stats = cache_service.get_cache_stats()
print(f"=== AgentMap Availability Cache Daily Report - {datetime.now().strftime('%Y-%m-%d %H:%M:%S')} ===")
print(f"📁 Cache File: {stats['cache_file_path']}")
print(f"📊 Total Entries: {stats.get('total_entries', 0)}")
# Performance metrics
perf = stats.get('performance', {})
total_requests = perf.get('cache_hits', 0) + perf.get('cache_misses', 0)
hit_rate = (perf.get('cache_hits', 0) / total_requests * 100) if total_requests > 0 else 0
print(f"⚡ Cache Hit Rate: {hit_rate:.1f}% ({perf.get('cache_hits', 0)}/{total_requests})")
print(f"💾 Cache Sets: {perf.get('cache_sets', 0)}")
print(f"🗑️ Invalidations: {perf.get('invalidations', 0)} (Auto: {perf.get('auto_invalidations', 0)})")
# Category breakdown
categories = stats.get('categories', {})
if categories:
print("\n📂 Category Breakdown:")
for category, count in categories.items():
print(f" {category}: {count} entries")
# Health checks
warnings = []
if hit_rate < 70:
warnings.append(f"⚠️ Low cache hit rate: {hit_rate:.1f}% (target: >70%)")
if perf.get('auto_invalidations', 0) > 10:
warnings.append(f"⚠️ High auto-invalidation count: {perf.get('auto_invalidations', 0)} (investigate environment changes)")
if not stats.get('cache_exists', False):
warnings.append("❌ Cache file does not exist - cache will rebuild on next access")
# Cache file size check
if cache_file.exists():
size_mb = cache_file.stat().st_size / (1024 * 1024)
if size_mb > 10: # Cache file larger than 10MB
warnings.append(f"⚠️ Large cache file: {size_mb:.1f}MB (consider cleanup)")
if warnings:
print("\n🚨 Warnings:")
for warning in warnings:
print(f" {warning}")
else:
print("\n✅ All cache health checks passed")
# Cleanup recommendations
if hit_rate < 50:
print("\n💡 Recommendations:")
print(" - Consider reviewing cache invalidation frequency")
print(" - Check for frequent environment changes")
print(" - Verify cache configuration file monitoring")
print("\n" + "=" * 80)
return len(warnings)
if __name__ == "__main__":
warning_count = daily_cache_maintenance()
sys.exit(0 if warning_count == 0 else 1)
Weekly Cache Analysis Script:
#!/usr/bin/env python3
"""
Weekly availability cache analysis and optimization script.
"""
import json
from collections import defaultdict
from pathlib import Path
from datetime import datetime, timezone, timedelta
from agentmap.services.config.availability_cache_service import AvailabilityCacheService
def weekly_cache_analysis():
"""Perform weekly cache analysis and optimization."""
cache_file = Path("~/.agentmap/availability_cache.json").expanduser()
if not cache_file.exists():
print("No cache file found - nothing to analyze")
return
# Load cache data directly for analysis
with open(cache_file, 'r') as f:
cache_data = json.load(f)
availability_data = cache_data.get('availability', {})
print(f"=== Weekly Availability Cache Analysis - {datetime.now().strftime('%Y-%m-%d')} ===")
# Analyze entry age distribution
now = datetime.now(timezone.utc)
age_distribution = {
'fresh': 0, # < 1 hour
'recent': 0, # 1-6 hours
'day': 0, # 6-24 hours
'week': 0, # 1-7 days
'old': 0, # > 7 days
'invalid': 0 # Invalid timestamps
}
category_stats = defaultdict(lambda: {'count': 0, 'avg_age_hours': 0, 'total_age': 0})
for cache_key, entry in availability_data.items():
# Parse category
parts = cache_key.split('.', 2)
if len(parts) >= 2:
category = f"{parts[0]}.{parts[1]}"
else:
category = parts[0] if parts else 'unknown'
category_stats[category]['count'] += 1
# Parse timestamp
cached_at = entry.get('cached_at')
if cached_at:
try:
cached_time = datetime.fromisoformat(cached_at.replace('Z', '+00:00'))
age_hours = (now - cached_time).total_seconds() / 3600
category_stats[category]['total_age'] += age_hours
if age_hours < 1:
age_distribution['fresh'] += 1
elif age_hours < 6:
age_distribution['recent'] += 1
elif age_hours < 24:
age_distribution['day'] += 1
elif age_hours < 168: # 7 days
age_distribution['week'] += 1
else:
age_distribution['old'] += 1
except (ValueError, TypeError):
age_distribution['invalid'] += 1
else:
age_distribution['invalid'] += 1
# Calculate average ages
for category, stats in category_stats.items():
if stats['count'] > 0:
stats['avg_age_hours'] = stats['total_age'] / stats['count']
# Report findings
total_entries = sum(age_distribution.values())
print(f"\n📊 Cache Entry Age Distribution (Total: {total_entries}):")
for age_group, count in age_distribution.items():
percentage = (count / total_entries * 100) if total_entries > 0 else 0
print(f" {age_group.title()}: {count} ({percentage:.1f}%)")
print("\n📂 Category Analysis:")
for category, stats in sorted(category_stats.items()):
avg_age = stats['avg_age_hours']
print(f" {category}: {stats['count']} entries, avg age: {avg_age:.1f}h")
# Optimization recommendations
print("\n💡 Optimization Recommendations:")
if age_distribution['old'] > total_entries * 0.1: # More than 10% old entries
print(f" ⚠️ {age_distribution['old']} entries are >7 days old - consider cleanup")
if age_distribution['invalid'] > 0:
print(f" ⚠️ {age_distribution['invalid']} entries have invalid timestamps - investigate")
# Find categories with high average age
stale_categories = [cat for cat, stats in category_stats.items() if stats['avg_age_hours'] > 72]
if stale_categories:
print(f" 📅 Stale categories (>72h avg): {', '.join(stale_categories)}")
# Environment hash analysis
env_hash = cache_data.get('environment_hash')
if env_hash:
print(f"\n🌍 Environment Hash: {env_hash}")
# Check if entries have mismatched environment hashes
mismatched = 0
for entry in availability_data.values():
if entry.get('environment_hash') != env_hash:
mismatched += 1
if mismatched > 0:
print(f" ⚠️ {mismatched} entries have mismatched environment hashes")
print("\n" + "=" * 80)
if __name__ == "__main__":
weekly_cache_analysis()
Availability Cache Troubleshooting
Common Issues and Solutions:
1. High Cache Miss Rate:
# Diagnostic script for cache miss analysis
def diagnose_cache_misses():
cache_service = AvailabilityCacheService(cache_file_path)
stats = cache_service.get_cache_stats()
perf = stats.get('performance', {})
total = perf.get('cache_hits', 0) + perf.get('cache_misses', 0)
miss_rate = (perf.get('cache_misses', 0) / total * 100) if total > 0 else 0
if miss_rate > 30: # High miss rate
print(f"High miss rate detected: {miss_rate:.1f}%")
# Check auto-invalidation frequency
auto_invalidations = perf.get('auto_invalidations', 0)
if auto_invalidations > 5:
print(f"Frequent auto-invalidations: {auto_invalidations}")
print("Possible causes:")
print(" - Frequent package installations")
print(" - Configuration file changes")
print(" - Environment instability")
# Check if cache file exists and is readable
if not stats.get('cache_exists', False):
print("Cache file missing - will be rebuilt on next access")
2. Cache File Corruption:
# Cache file recovery procedure
def recover_corrupted_cache():
cache_file = Path("~/.agentmap/availability_cache.json")
if cache_file.exists():
try:
with open(cache_file, 'r') as f:
cache_data = json.load(f)
print("Cache file is valid JSON")
except json.JSONDecodeError as e:
print(f"Cache file corrupted: {e}")
# Backup corrupted file
backup_file = cache_file.with_suffix('.corrupted')
cache_file.rename(backup_file)
print(f"Moved corrupted cache to: {backup_file}")
print("Cache will be rebuilt on next access")
3. Performance Degradation:
# Performance monitoring script
def monitor_cache_performance():
import time
cache_service = AvailabilityCacheService(cache_file_path)
# Test cache read performance
start_time = time.time()
for i in range(100):
result = cache_service.get_availability("test", f"key_{i}")
read_time = (time.time() - start_time) * 1000 / 100 # ms per operation
# Test cache write performance
test_data = {"available": True, "test": True}
start_time = time.time()
for i in range(10):
cache_service.set_availability("test", f"perf_key_{i}", test_data)
write_time = (time.time() - start_time) * 1000 / 10 # ms per operation
print(f"Cache read performance: {read_time:.2f}ms per operation")
print(f"Cache write performance: {write_time:.2f}ms per operation")
# Performance thresholds
if read_time > 10: # More than 10ms per read
print("⚠️ Slow cache reads detected")
if write_time > 50: # More than 50ms per write
print("⚠️ Slow cache writes detected")
Validation Cache Administration Commands
System-Wide Cache Statistics
Monitor cache health across the entire system:
# Comprehensive cache statistics
agentmap validate-cache --stats
# System-wide cache analysis
agentmap validate-cache --stats --verbose
# JSON output for monitoring tools
agentmap validate-cache --stats --format json
Sample Administrative Output:
🔧 AgentMap Cache Administration Report
==========================================
📊 Cache Statistics:
- Total cache entries: 247
- Valid cache entries: 198 (80.16%)
- Expired cache entries: 34 (13.77%)
- Corrupted cache entries: 15 (6.07%)
💾 Storage Usage:
- Cache directory: ~/.agentmap/validation_cache
- Total cache size: 1.2 MB
- Average file size: 4.9 KB
- Disk space available: 45.2 GB
⚡ Performance Metrics:
- Cache hit rate (24h): 87.3%
- Average cache access time: 12ms
- Average validation time: 234ms
- Performance improvement: 19.5x
🕒 Cache Age Distribution:
- < 1 hour: 45 files (18.2%)
- 1-6 hours: 89 files (36.0%)
- 6-24 hours: 64 files (25.9%)
- > 24 hours (expired): 34 files (13.8%)
- Corrupted: 15 files (6.1%)
📈 Recent Activity:
- Cache writes (24h): 156
- Cache reads (24h): 1,247
- Cache invalidations (24h): 23
- Cleanup operations (24h): 3
Administrative Cache Cleanup
Comprehensive cache maintenance operations:
# Remove all expired cache entries
agentmap validate-cache --cleanup
# Remove corrupted cache files
agentmap validate-cache --cleanup --corrupted
# Full cache reset (use with caution)
agentmap validate-cache --clear --force
# Cleanup with size limits
agentmap validate-cache --cleanup --max-size 500MB
# Cleanup files older than specific time
agentmap validate-cache --cleanup --older-than 7d
Cache Maintenance Scheduling
Set up automated cache maintenance:
# Daily cleanup of expired entries
0 2 * * * /usr/local/bin/agentmap validate-cache --cleanup >/dev/null 2>&1
# Weekly full cache analysis
0 3 * * 0 /usr/local/bin/agentmap validate-cache --stats --verbose >> /var/log/agentmap-cache.log
# Monthly cache optimization
0 4 1 * * /usr/local/bin/agentmap validate-cache --optimize >> /var/log/agentmap-cache.log
Performance Monitoring
Cache Performance Metrics
Key metrics for administrative monitoring:
Hit Rate Analysis:
# Current hit rate
agentmap validate-cache --stats | grep "hit rate"
# Historical hit rate (if logging enabled)
grep "cache hit" /var/log/agentmap.log | tail -100
Storage Growth Monitoring:
# Monitor cache directory size
du -sh ~/.agentmap/validation_cache
# Track cache growth over time
echo "$(date): $(du -sh ~/.agentmap/validation_cache)" >> /var/log/agentmap-cache-size.log
Performance Benchmarking:
# Benchmark cache vs no-cache performance
time agentmap validate csv --csv large_file.csv # With cache
time agentmap validate csv --csv large_file.csv --no-cache # Without cache
System Resource Impact
Monitor cache impact on system resources:
# Memory usage analysis
ps aux | grep agentmap
# I/O usage monitoring
iotop -p $(pgrep agentmap)
# Disk usage tracking
df -h ~/.agentmap/
Cache Optimization Strategies
Hit Rate Optimization
Improve cache efficiency through configuration tuning:
Cache TTL Adjustment:
# Extend cache lifetime for stable environments
export AGENTMAP_CACHE_TTL=48h
# Reduce cache lifetime for dynamic environments
export AGENTMAP_CACHE_TTL=6h
Cache Size Optimization:
# Set maximum cache size
export AGENTMAP_MAX_CACHE_SIZE=1GB
# Set maximum cache files
export AGENTMAP_MAX_CACHE_FILES=10000
Storage Optimization
Optimize cache storage for performance:
SSD Optimization:
# Move cache to SSD for better performance
export AGENTMAP_CACHE_DIR=/fast-ssd/agentmap-cache
Network Storage Considerations:
# Avoid network storage for cache (performance impact)
# Use local storage: ~/.agentmap/validation_cache
# Avoid: /nfs/shared/cache or /smb/cache
Multi-User Environment Setup
Configure cache for multi-user environments:
# Per-user cache isolation (default)
~/.agentmap/validation_cache/
# Shared cache directory (advanced setup)
/opt/agentmap/shared_cache/
chmod 775 /opt/agentmap/shared_cache/
chgrp agentmap-users /opt/agentmap/shared_cache/
Troubleshooting Administration
Common Administrative Issues
High Cache Miss Rate:
# Analyze cache miss patterns
agentmap validate-cache --stats --verbose | grep "miss"
# Check for frequent file changes
find ~/.agentmap/validation_cache -name "*.json" -newer -1h | wc -l
# Review cache TTL settings
echo "Current TTL: $AGENTMAP_CACHE_TTL"
Disk Space Issues:
# Check cache directory size
du -sh ~/.agentmap/validation_cache
# Find largest cache files
find ~/.agentmap/validation_cache -type f -exec ls -lh {} + | sort -k5 -hr | head -20
# Emergency cache cleanup
agentmap validate-cache --cleanup --max-size 100MB
Corrupted Cache Files:
# Identify corrupted files
agentmap validate-cache --stats | grep "corrupted"
# Remove corrupted files
agentmap validate-cache --cleanup --corrupted
# Verify cache integrity
agentmap validate-cache --verify
Cache Recovery Procedures
Complete Cache Recovery:
# Step 1: Backup current cache (optional)
tar -czf cache-backup-$(date +%Y%m%d).tar.gz ~/.agentmap/validation_cache/
# Step 2: Clear all cache
agentmap validate-cache --clear --force
# Step 3: Verify cache directory is clean
ls -la ~/.agentmap/validation_cache/
# Step 4: Run test validation to rebuild cache
agentmap validate all
# Step 5: Verify cache is working
agentmap validate-cache --stats
Selective Cache Recovery:
# Remove cache for specific patterns
find ~/.agentmap/validation_cache -name "*corrupted_file*" -delete
# Rebuild cache for specific files
agentmap validate csv --csv problem_file.csv --no-cache
Monitoring and Alerting
Cache Health Monitoring Scripts
Create monitoring scripts for cache health:
#!/bin/bash
# cache-health-check.sh
CACHE_DIR="$HOME/.agentmap/validation_cache"
MAX_SIZE_MB=1000
MAX_CORRUPTED=10
# Check cache size
CACHE_SIZE_MB=$(du -sm "$CACHE_DIR" | cut -f1)
if [ "$CACHE_SIZE_MB" -gt "$MAX_SIZE_MB" ]; then
echo "WARNING: Cache size $CACHE_SIZE_MB MB exceeds $MAX_SIZE_MB MB"
fi
# Check corrupted files
CORRUPTED_COUNT=$(agentmap validate-cache --stats | grep -o '[0-9]* corrupted' | cut -d' ' -f1)
if [ "$CORRUPTED_COUNT" -gt "$MAX_CORRUPTED" ]; then
echo "WARNING: $CORRUPTED_COUNT corrupted cache files exceeds threshold $MAX_CORRUPTED"
fi
# Check hit rate
HIT_RATE=$(agentmap validate-cache --stats | grep "hit rate" | grep -o '[0-9.]*%' | cut -d'%' -f1)
if (( $(echo "$HIT_RATE < 70" | bc -l) )); then
echo "WARNING: Cache hit rate $HIT_RATE% is below 70%"
fi
Log Analysis
Analyze cache behavior through logs:
# Enable cache logging
export AGENTMAP_CACHE_LOG_LEVEL=INFO
# Analyze cache patterns
grep "cache" /var/log/agentmap.log | tail -100
# Cache performance analysis
awk '/cache hit/ {hits++} /cache miss/ {misses++} END {print "Hit rate:", hits/(hits+misses)*100"%"}' /var/log/agentmap.log
Security Considerations
Cache Security
Protect cached data and system integrity:
File Permissions:
# Secure cache directory permissions
chmod 750 ~/.agentmap/validation_cache
chmod 640 ~/.agentmap/validation_cache/*.json
Data Sensitivity:
- Cache files may contain sensitive validation data
- Consider encryption for cache in sensitive environments
- Implement cache cleanup policies for compliance
Access Control:
# Multi-user environment access control
sudo groupadd agentmap-cache-users
sudo usermod -a -G agentmap-cache-users username
sudo chgrp -R agentmap-cache-users /opt/agentmap/cache
sudo chmod -R 2750 /opt/agentmap/cache
Enterprise Administration
Large-Scale Cache Management
For enterprise deployments with multiple AgentMap instances:
Centralized Cache Monitoring:
# Collect cache statistics from multiple instances
for host in $(cat agentmap-hosts.txt); do
echo "=== $host ==="
ssh $host "agentmap validate-cache --stats"
done
Distributed Cache Cleanup:
# Automated cleanup across multiple systems
ansible-playbook -i inventory cache-cleanup.yml
Cache Performance Aggregation:
# Aggregate cache metrics for enterprise dashboard
for host in $(cat agentmap-hosts.txt); do
ssh $host "agentmap validate-cache --stats --format json" >> cache-metrics.jsonl
done
Backup and Disaster Recovery
Cache Backup Strategy:
# Daily cache backup (optional - cache is rebuiltable)
tar -czf "cache-backup-$(date +%Y%m%d).tar.gz" ~/.agentmap/validation_cache/
# Weekly cleanup of old backups
find . -name "cache-backup-*.tar.gz" -mtime +7 -delete
Disaster Recovery:
- Cache is rebuiltable from source files
- Focus on backing up source CSV/config files
- Cache can be rebuilt automatically during recovery
Best Practices for Administrators
Daily Operations
- Monitor Cache Health: Check cache statistics daily
- Review Storage Usage: Monitor disk space consumption
- Analyze Performance: Track cache hit rates and performance metrics
- Cleanup Automation: Ensure automated cleanup is running
Weekly Maintenance
- Deep Analysis: Review cache patterns and optimization opportunities
- Security Review: Check cache permissions and access controls
- Performance Tuning: Adjust cache parameters based on usage patterns
- Documentation Updates: Update cache policies and procedures
Monthly Reviews
- Capacity Planning: Analyze growth trends and storage requirements
- Performance Benchmarking: Compare cache performance over time
- Policy Review: Update cache retention and cleanup policies
- Training Updates: Ensure team is aware of cache management procedures
Related Documentation
- Validation Cache Development Guide: Developer-focused cache usage
- Performance Optimization Guide: System-wide performance tuning
- System Health Monitoring: Overall system monitoring practices
- Troubleshooting Guide: General troubleshooting procedures
- CLI Validation Reference: Complete CLI command reference
Next Steps
- Set Up Monitoring: Implement cache health monitoring scripts
- Schedule Maintenance: Configure automated cache cleanup procedures
- Optimize Performance: Tune cache parameters for your environment
- Document Policies: Create organization-specific cache management policies
- Train Team: Ensure administrators understand cache management procedures