Skip to main content

Cache Management & System Administration

📍 AgentMapDeploymentCache Management

This guide provides comprehensive administrative procedures for managing AgentMap's cache systems, including validation cache, system performance optimization, and maintenance workflows.

Administrative Overview

Cache System Architecture

AgentMap implements a sophisticated caching system with multiple components:

  • Unified Availability Cache: Centralized cache for all availability checking (dependencies, LLM providers, storage)
  • Validation Cache: Stores CSV and configuration validation results
  • File Hash Cache: MD5-based change detection for automatic invalidation
  • Performance Cache: System-level performance optimizations
  • Temporary Cache: Short-lived cache for active sessions

Cache Storage Locations

# Unified availability cache
~/.agentmap/availability_cache.json

# Validation cache directory
~/.agentmap/validation_cache/

# System cache (if applicable)
/tmp/agentmap_cache/

# Application cache
~/.agentmap/app_cache/

Administrative Responsibilities

  • Cache Health Monitoring: Regular cache statistics review across all cache systems
  • Storage Management: Disk space monitoring and cleanup for unified and validation caches
  • Performance Optimization: Cache hit rate analysis and tuning for optimal system performance
  • Maintenance Scheduling: Automated and manual cache maintenance workflows
  • Troubleshooting: Cache-related issue resolution and diagnostic procedures
  • Unified Cache Operations: Specialized management for availability cache service

Unified Availability Cache Administration

Availability Cache Overview

The unified AvailabilityCacheService is AgentMap's centralized caching system for storing boolean availability results across all categories:

  • Dependencies: dependency.llm.openai, dependency.storage.csv
  • LLM Providers: llm_provider.anthropic, llm_provider.openai
  • Storage Services: storage.csv, storage.vector, storage.firebase
  • Custom Categories: Application-specific availability checks

Availability Cache Statistics

Monitor unified cache health and performance:

# Python API for cache statistics
from agentmap.services.config.availability_cache_service import AvailabilityCacheService
from pathlib import Path

# Initialize cache service
cache_service = AvailabilityCacheService(
cache_file_path=Path("~/.agentmap/availability_cache.json").expanduser()
)

# Get comprehensive cache statistics
stats = cache_service.get_cache_stats()
print(json.dumps(stats, indent=2))

Sample Administrative Output:

{
"cache_file_path": "/home/user/.agentmap/availability_cache.json",
"cache_exists": true,
"auto_invalidation_enabled": true,
"cache_version": "2.0",
"last_updated": "2024-01-15T15:45:30Z",
"total_entries": 24,
"categories": {
"dependency.llm": 3,
"dependency.storage": 5,
"llm_provider": 4,
"storage": 6,
"custom": 6
},
"environment_hash": "a1b2c3d4e5f6",
"performance": {
"cache_hits": 847,
"cache_misses": 156,
"cache_sets": 234,
"invalidations": 12,
"auto_invalidations": 3
}
}

Availability Cache Operations

Manual Cache Invalidation:

# Invalidate specific category and key
cache_service.invalidate_cache("dependency.llm", "openai")

# Invalidate entire category
cache_service.invalidate_cache("dependency.llm")

# Invalidate entire cache
cache_service.invalidate_cache()

# Force environment cache invalidation
cache_service.invalidate_environment_cache()

Configuration File Monitoring:

# Register config files for automatic invalidation
cache_service.register_config_file("/path/to/agentmap_config.yaml")
cache_service.register_config_file("/path/to/storage_config.yaml")

Performance Monitoring:

# Enable/disable automatic invalidation
cache_service.enable_auto_invalidation(True) # Enable
cache_service.enable_auto_invalidation(False) # Disable for debugging

# Get hit rate analysis
stats = cache_service.get_cache_stats()
hit_rate = stats["performance"]["cache_hits"] / (
stats["performance"]["cache_hits"] + stats["performance"]["cache_misses"]
) * 100
print(f"Cache hit rate: {hit_rate:.1f}%")

Availability Cache Maintenance

Daily Maintenance Script:

#!/usr/bin/env python3
"""
Daily availability cache maintenance script.
"""
import json
import sys
from pathlib import Path
from datetime import datetime, timezone
from agentmap.services.config.availability_cache_service import AvailabilityCacheService

def daily_cache_maintenance():
"""Perform daily cache maintenance and health checks."""

cache_file = Path("~/.agentmap/availability_cache.json").expanduser()
cache_service = AvailabilityCacheService(cache_file)

# Get cache statistics
stats = cache_service.get_cache_stats()

print(f"=== AgentMap Availability Cache Daily Report - {datetime.now().strftime('%Y-%m-%d %H:%M:%S')} ===")
print(f"📁 Cache File: {stats['cache_file_path']}")
print(f"📊 Total Entries: {stats.get('total_entries', 0)}")

# Performance metrics
perf = stats.get('performance', {})
total_requests = perf.get('cache_hits', 0) + perf.get('cache_misses', 0)
hit_rate = (perf.get('cache_hits', 0) / total_requests * 100) if total_requests > 0 else 0

print(f"⚡ Cache Hit Rate: {hit_rate:.1f}% ({perf.get('cache_hits', 0)}/{total_requests})")
print(f"💾 Cache Sets: {perf.get('cache_sets', 0)}")
print(f"🗑️ Invalidations: {perf.get('invalidations', 0)} (Auto: {perf.get('auto_invalidations', 0)})")

# Category breakdown
categories = stats.get('categories', {})
if categories:
print("\n📂 Category Breakdown:")
for category, count in categories.items():
print(f" {category}: {count} entries")

# Health checks
warnings = []

if hit_rate < 70:
warnings.append(f"⚠️ Low cache hit rate: {hit_rate:.1f}% (target: >70%)")

if perf.get('auto_invalidations', 0) > 10:
warnings.append(f"⚠️ High auto-invalidation count: {perf.get('auto_invalidations', 0)} (investigate environment changes)")

if not stats.get('cache_exists', False):
warnings.append("❌ Cache file does not exist - cache will rebuild on next access")

# Cache file size check
if cache_file.exists():
size_mb = cache_file.stat().st_size / (1024 * 1024)
if size_mb > 10: # Cache file larger than 10MB
warnings.append(f"⚠️ Large cache file: {size_mb:.1f}MB (consider cleanup)")

if warnings:
print("\n🚨 Warnings:")
for warning in warnings:
print(f" {warning}")
else:
print("\n✅ All cache health checks passed")

# Cleanup recommendations
if hit_rate < 50:
print("\n💡 Recommendations:")
print(" - Consider reviewing cache invalidation frequency")
print(" - Check for frequent environment changes")
print(" - Verify cache configuration file monitoring")

print("\n" + "=" * 80)

return len(warnings)

if __name__ == "__main__":
warning_count = daily_cache_maintenance()
sys.exit(0 if warning_count == 0 else 1)

Weekly Cache Analysis Script:

#!/usr/bin/env python3
"""
Weekly availability cache analysis and optimization script.
"""
import json
from collections import defaultdict
from pathlib import Path
from datetime import datetime, timezone, timedelta
from agentmap.services.config.availability_cache_service import AvailabilityCacheService

def weekly_cache_analysis():
"""Perform weekly cache analysis and optimization."""

cache_file = Path("~/.agentmap/availability_cache.json").expanduser()

if not cache_file.exists():
print("No cache file found - nothing to analyze")
return

# Load cache data directly for analysis
with open(cache_file, 'r') as f:
cache_data = json.load(f)

availability_data = cache_data.get('availability', {})

print(f"=== Weekly Availability Cache Analysis - {datetime.now().strftime('%Y-%m-%d')} ===")

# Analyze entry age distribution
now = datetime.now(timezone.utc)
age_distribution = {
'fresh': 0, # < 1 hour
'recent': 0, # 1-6 hours
'day': 0, # 6-24 hours
'week': 0, # 1-7 days
'old': 0, # > 7 days
'invalid': 0 # Invalid timestamps
}

category_stats = defaultdict(lambda: {'count': 0, 'avg_age_hours': 0, 'total_age': 0})

for cache_key, entry in availability_data.items():
# Parse category
parts = cache_key.split('.', 2)
if len(parts) >= 2:
category = f"{parts[0]}.{parts[1]}"
else:
category = parts[0] if parts else 'unknown'

category_stats[category]['count'] += 1

# Parse timestamp
cached_at = entry.get('cached_at')
if cached_at:
try:
cached_time = datetime.fromisoformat(cached_at.replace('Z', '+00:00'))
age_hours = (now - cached_time).total_seconds() / 3600

category_stats[category]['total_age'] += age_hours

if age_hours < 1:
age_distribution['fresh'] += 1
elif age_hours < 6:
age_distribution['recent'] += 1
elif age_hours < 24:
age_distribution['day'] += 1
elif age_hours < 168: # 7 days
age_distribution['week'] += 1
else:
age_distribution['old'] += 1

except (ValueError, TypeError):
age_distribution['invalid'] += 1
else:
age_distribution['invalid'] += 1

# Calculate average ages
for category, stats in category_stats.items():
if stats['count'] > 0:
stats['avg_age_hours'] = stats['total_age'] / stats['count']

# Report findings
total_entries = sum(age_distribution.values())
print(f"\n📊 Cache Entry Age Distribution (Total: {total_entries}):")
for age_group, count in age_distribution.items():
percentage = (count / total_entries * 100) if total_entries > 0 else 0
print(f" {age_group.title()}: {count} ({percentage:.1f}%)")

print("\n📂 Category Analysis:")
for category, stats in sorted(category_stats.items()):
avg_age = stats['avg_age_hours']
print(f" {category}: {stats['count']} entries, avg age: {avg_age:.1f}h")

# Optimization recommendations
print("\n💡 Optimization Recommendations:")

if age_distribution['old'] > total_entries * 0.1: # More than 10% old entries
print(f" ⚠️ {age_distribution['old']} entries are >7 days old - consider cleanup")

if age_distribution['invalid'] > 0:
print(f" ⚠️ {age_distribution['invalid']} entries have invalid timestamps - investigate")

# Find categories with high average age
stale_categories = [cat for cat, stats in category_stats.items() if stats['avg_age_hours'] > 72]
if stale_categories:
print(f" 📅 Stale categories (>72h avg): {', '.join(stale_categories)}")

# Environment hash analysis
env_hash = cache_data.get('environment_hash')
if env_hash:
print(f"\n🌍 Environment Hash: {env_hash}")

# Check if entries have mismatched environment hashes
mismatched = 0
for entry in availability_data.values():
if entry.get('environment_hash') != env_hash:
mismatched += 1

if mismatched > 0:
print(f" ⚠️ {mismatched} entries have mismatched environment hashes")

print("\n" + "=" * 80)

if __name__ == "__main__":
weekly_cache_analysis()

Availability Cache Troubleshooting

Common Issues and Solutions:

1. High Cache Miss Rate:

# Diagnostic script for cache miss analysis
def diagnose_cache_misses():
cache_service = AvailabilityCacheService(cache_file_path)
stats = cache_service.get_cache_stats()

perf = stats.get('performance', {})
total = perf.get('cache_hits', 0) + perf.get('cache_misses', 0)
miss_rate = (perf.get('cache_misses', 0) / total * 100) if total > 0 else 0

if miss_rate > 30: # High miss rate
print(f"High miss rate detected: {miss_rate:.1f}%")

# Check auto-invalidation frequency
auto_invalidations = perf.get('auto_invalidations', 0)
if auto_invalidations > 5:
print(f"Frequent auto-invalidations: {auto_invalidations}")
print("Possible causes:")
print(" - Frequent package installations")
print(" - Configuration file changes")
print(" - Environment instability")

# Check if cache file exists and is readable
if not stats.get('cache_exists', False):
print("Cache file missing - will be rebuilt on next access")

2. Cache File Corruption:

# Cache file recovery procedure
def recover_corrupted_cache():
cache_file = Path("~/.agentmap/availability_cache.json")

if cache_file.exists():
try:
with open(cache_file, 'r') as f:
cache_data = json.load(f)
print("Cache file is valid JSON")
except json.JSONDecodeError as e:
print(f"Cache file corrupted: {e}")

# Backup corrupted file
backup_file = cache_file.with_suffix('.corrupted')
cache_file.rename(backup_file)
print(f"Moved corrupted cache to: {backup_file}")

print("Cache will be rebuilt on next access")

3. Performance Degradation:

# Performance monitoring script
def monitor_cache_performance():
import time

cache_service = AvailabilityCacheService(cache_file_path)

# Test cache read performance
start_time = time.time()
for i in range(100):
result = cache_service.get_availability("test", f"key_{i}")
read_time = (time.time() - start_time) * 1000 / 100 # ms per operation

# Test cache write performance
test_data = {"available": True, "test": True}
start_time = time.time()
for i in range(10):
cache_service.set_availability("test", f"perf_key_{i}", test_data)
write_time = (time.time() - start_time) * 1000 / 10 # ms per operation

print(f"Cache read performance: {read_time:.2f}ms per operation")
print(f"Cache write performance: {write_time:.2f}ms per operation")

# Performance thresholds
if read_time > 10: # More than 10ms per read
print("⚠️ Slow cache reads detected")

if write_time > 50: # More than 50ms per write
print("⚠️ Slow cache writes detected")

Validation Cache Administration Commands

System-Wide Cache Statistics

Monitor cache health across the entire system:

# Comprehensive cache statistics
agentmap validate-cache --stats

# System-wide cache analysis
agentmap validate-cache --stats --verbose

# JSON output for monitoring tools
agentmap validate-cache --stats --format json

Sample Administrative Output:

🔧 AgentMap Cache Administration Report
==========================================

📊 Cache Statistics:
- Total cache entries: 247
- Valid cache entries: 198 (80.16%)
- Expired cache entries: 34 (13.77%)
- Corrupted cache entries: 15 (6.07%)

💾 Storage Usage:
- Cache directory: ~/.agentmap/validation_cache
- Total cache size: 1.2 MB
- Average file size: 4.9 KB
- Disk space available: 45.2 GB

⚡ Performance Metrics:
- Cache hit rate (24h): 87.3%
- Average cache access time: 12ms
- Average validation time: 234ms
- Performance improvement: 19.5x

🕒 Cache Age Distribution:
- < 1 hour: 45 files (18.2%)
- 1-6 hours: 89 files (36.0%)
- 6-24 hours: 64 files (25.9%)
- > 24 hours (expired): 34 files (13.8%)
- Corrupted: 15 files (6.1%)

📈 Recent Activity:
- Cache writes (24h): 156
- Cache reads (24h): 1,247
- Cache invalidations (24h): 23
- Cleanup operations (24h): 3

Administrative Cache Cleanup

Comprehensive cache maintenance operations:

# Remove all expired cache entries
agentmap validate-cache --cleanup

# Remove corrupted cache files
agentmap validate-cache --cleanup --corrupted

# Full cache reset (use with caution)
agentmap validate-cache --clear --force

# Cleanup with size limits
agentmap validate-cache --cleanup --max-size 500MB

# Cleanup files older than specific time
agentmap validate-cache --cleanup --older-than 7d

Cache Maintenance Scheduling

Set up automated cache maintenance:

# Daily cleanup of expired entries
0 2 * * * /usr/local/bin/agentmap validate-cache --cleanup >/dev/null 2>&1

# Weekly full cache analysis
0 3 * * 0 /usr/local/bin/agentmap validate-cache --stats --verbose >> /var/log/agentmap-cache.log

# Monthly cache optimization
0 4 1 * * /usr/local/bin/agentmap validate-cache --optimize >> /var/log/agentmap-cache.log

Performance Monitoring

Cache Performance Metrics

Key metrics for administrative monitoring:

Hit Rate Analysis:

# Current hit rate
agentmap validate-cache --stats | grep "hit rate"

# Historical hit rate (if logging enabled)
grep "cache hit" /var/log/agentmap.log | tail -100

Storage Growth Monitoring:

# Monitor cache directory size
du -sh ~/.agentmap/validation_cache

# Track cache growth over time
echo "$(date): $(du -sh ~/.agentmap/validation_cache)" >> /var/log/agentmap-cache-size.log

Performance Benchmarking:

# Benchmark cache vs no-cache performance
time agentmap validate csv --csv large_file.csv # With cache
time agentmap validate csv --csv large_file.csv --no-cache # Without cache

System Resource Impact

Monitor cache impact on system resources:

# Memory usage analysis
ps aux | grep agentmap

# I/O usage monitoring
iotop -p $(pgrep agentmap)

# Disk usage tracking
df -h ~/.agentmap/

Cache Optimization Strategies

Hit Rate Optimization

Improve cache efficiency through configuration tuning:

Cache TTL Adjustment:

# Extend cache lifetime for stable environments
export AGENTMAP_CACHE_TTL=48h

# Reduce cache lifetime for dynamic environments
export AGENTMAP_CACHE_TTL=6h

Cache Size Optimization:

# Set maximum cache size
export AGENTMAP_MAX_CACHE_SIZE=1GB

# Set maximum cache files
export AGENTMAP_MAX_CACHE_FILES=10000

Storage Optimization

Optimize cache storage for performance:

SSD Optimization:

# Move cache to SSD for better performance
export AGENTMAP_CACHE_DIR=/fast-ssd/agentmap-cache

Network Storage Considerations:

# Avoid network storage for cache (performance impact)
# Use local storage: ~/.agentmap/validation_cache
# Avoid: /nfs/shared/cache or /smb/cache

Multi-User Environment Setup

Configure cache for multi-user environments:

# Per-user cache isolation (default)
~/.agentmap/validation_cache/

# Shared cache directory (advanced setup)
/opt/agentmap/shared_cache/
chmod 775 /opt/agentmap/shared_cache/
chgrp agentmap-users /opt/agentmap/shared_cache/

Troubleshooting Administration

Common Administrative Issues

High Cache Miss Rate:

# Analyze cache miss patterns
agentmap validate-cache --stats --verbose | grep "miss"

# Check for frequent file changes
find ~/.agentmap/validation_cache -name "*.json" -newer -1h | wc -l

# Review cache TTL settings
echo "Current TTL: $AGENTMAP_CACHE_TTL"

Disk Space Issues:

# Check cache directory size
du -sh ~/.agentmap/validation_cache

# Find largest cache files
find ~/.agentmap/validation_cache -type f -exec ls -lh {} + | sort -k5 -hr | head -20

# Emergency cache cleanup
agentmap validate-cache --cleanup --max-size 100MB

Corrupted Cache Files:

# Identify corrupted files
agentmap validate-cache --stats | grep "corrupted"

# Remove corrupted files
agentmap validate-cache --cleanup --corrupted

# Verify cache integrity
agentmap validate-cache --verify

Cache Recovery Procedures

Complete Cache Recovery:

# Step 1: Backup current cache (optional)
tar -czf cache-backup-$(date +%Y%m%d).tar.gz ~/.agentmap/validation_cache/

# Step 2: Clear all cache
agentmap validate-cache --clear --force

# Step 3: Verify cache directory is clean
ls -la ~/.agentmap/validation_cache/

# Step 4: Run test validation to rebuild cache
agentmap validate all

# Step 5: Verify cache is working
agentmap validate-cache --stats

Selective Cache Recovery:

# Remove cache for specific patterns
find ~/.agentmap/validation_cache -name "*corrupted_file*" -delete

# Rebuild cache for specific files
agentmap validate csv --csv problem_file.csv --no-cache

Monitoring and Alerting

Cache Health Monitoring Scripts

Create monitoring scripts for cache health:

#!/bin/bash
# cache-health-check.sh

CACHE_DIR="$HOME/.agentmap/validation_cache"
MAX_SIZE_MB=1000
MAX_CORRUPTED=10

# Check cache size
CACHE_SIZE_MB=$(du -sm "$CACHE_DIR" | cut -f1)
if [ "$CACHE_SIZE_MB" -gt "$MAX_SIZE_MB" ]; then
echo "WARNING: Cache size $CACHE_SIZE_MB MB exceeds $MAX_SIZE_MB MB"
fi

# Check corrupted files
CORRUPTED_COUNT=$(agentmap validate-cache --stats | grep -o '[0-9]* corrupted' | cut -d' ' -f1)
if [ "$CORRUPTED_COUNT" -gt "$MAX_CORRUPTED" ]; then
echo "WARNING: $CORRUPTED_COUNT corrupted cache files exceeds threshold $MAX_CORRUPTED"
fi

# Check hit rate
HIT_RATE=$(agentmap validate-cache --stats | grep "hit rate" | grep -o '[0-9.]*%' | cut -d'%' -f1)
if (( $(echo "$HIT_RATE < 70" | bc -l) )); then
echo "WARNING: Cache hit rate $HIT_RATE% is below 70%"
fi

Log Analysis

Analyze cache behavior through logs:

# Enable cache logging
export AGENTMAP_CACHE_LOG_LEVEL=INFO

# Analyze cache patterns
grep "cache" /var/log/agentmap.log | tail -100

# Cache performance analysis
awk '/cache hit/ {hits++} /cache miss/ {misses++} END {print "Hit rate:", hits/(hits+misses)*100"%"}' /var/log/agentmap.log

Security Considerations

Cache Security

Protect cached data and system integrity:

File Permissions:

# Secure cache directory permissions
chmod 750 ~/.agentmap/validation_cache
chmod 640 ~/.agentmap/validation_cache/*.json

Data Sensitivity:

  • Cache files may contain sensitive validation data
  • Consider encryption for cache in sensitive environments
  • Implement cache cleanup policies for compliance

Access Control:

# Multi-user environment access control
sudo groupadd agentmap-cache-users
sudo usermod -a -G agentmap-cache-users username
sudo chgrp -R agentmap-cache-users /opt/agentmap/cache
sudo chmod -R 2750 /opt/agentmap/cache

Enterprise Administration

Large-Scale Cache Management

For enterprise deployments with multiple AgentMap instances:

Centralized Cache Monitoring:

# Collect cache statistics from multiple instances
for host in $(cat agentmap-hosts.txt); do
echo "=== $host ==="
ssh $host "agentmap validate-cache --stats"
done

Distributed Cache Cleanup:

# Automated cleanup across multiple systems
ansible-playbook -i inventory cache-cleanup.yml

Cache Performance Aggregation:

# Aggregate cache metrics for enterprise dashboard
for host in $(cat agentmap-hosts.txt); do
ssh $host "agentmap validate-cache --stats --format json" >> cache-metrics.jsonl
done

Backup and Disaster Recovery

Cache Backup Strategy:

# Daily cache backup (optional - cache is rebuiltable)
tar -czf "cache-backup-$(date +%Y%m%d).tar.gz" ~/.agentmap/validation_cache/

# Weekly cleanup of old backups
find . -name "cache-backup-*.tar.gz" -mtime +7 -delete

Disaster Recovery:

  • Cache is rebuiltable from source files
  • Focus on backing up source CSV/config files
  • Cache can be rebuilt automatically during recovery

Best Practices for Administrators

Daily Operations

  1. Monitor Cache Health: Check cache statistics daily
  2. Review Storage Usage: Monitor disk space consumption
  3. Analyze Performance: Track cache hit rates and performance metrics
  4. Cleanup Automation: Ensure automated cleanup is running

Weekly Maintenance

  1. Deep Analysis: Review cache patterns and optimization opportunities
  2. Security Review: Check cache permissions and access controls
  3. Performance Tuning: Adjust cache parameters based on usage patterns
  4. Documentation Updates: Update cache policies and procedures

Monthly Reviews

  1. Capacity Planning: Analyze growth trends and storage requirements
  2. Performance Benchmarking: Compare cache performance over time
  3. Policy Review: Update cache retention and cleanup policies
  4. Training Updates: Ensure team is aware of cache management procedures

Next Steps

  1. Set Up Monitoring: Implement cache health monitoring scripts
  2. Schedule Maintenance: Configure automated cache cleanup procedures
  3. Optimize Performance: Tune cache parameters for your environment
  4. Document Policies: Create organization-specific cache management policies
  5. Train Team: Ensure administrators understand cache management procedures