Cloud Storage Integration

AgentMap supports integration with major cloud storage providers for JSON document operations, letting you read JSON documents from and write them to Azure Blob Storage, AWS S3, and Google Cloud Storage without changing your workflow structure.

Cloud Storage Benefits
  • Scalability: Handle large datasets without local storage limitations
  • Reliability: Built-in redundancy and backup features
  • Security: Enterprise-grade encryption and access controls
  • Collaboration: Share data across teams and environments
  • Cost-Effective: Pay only for what you use

Supported Cloud Providers

  • Service: Azure Blob Storage
  • Authentication: Connection string or account key
  • Features: Container-based organization, metadata support
  • Best For: Microsoft ecosystem integration

  • Service: AWS S3
  • Authentication: Access key/secret key or IAM role
  • Features: Bucket-based organization, regional configuration
  • Best For: AWS ecosystem integration

  • Service: Google Cloud Storage
  • Authentication: Service account credentials
  • Features: Bucket-based organization, project-scoped access
  • Best For: Google Cloud ecosystem integration

Configuration

Basic Configuration Structure

Update your storage_config.yaml file with cloud provider configurations:

json:
  default_provider: "local"  # Default provider if not specified in URI
  providers:
    local:
      base_dir: "data/json"

    azure:
      connection_string: "env:AZURE_STORAGE_CONNECTION_STRING"
      default_container: "documents"
      containers:
        users: "users-container"
        reports: "reports-container"

    aws:
      region: "us-west-2"
      access_key: "env:AWS_ACCESS_KEY_ID"
      secret_key: "env:AWS_SECRET_ACCESS_KEY"
      default_bucket: "my-documents"
      buckets:
        users: "users-bucket"
        reports: "reports-bucket"

    gcp:
      project_id: "env:GCP_PROJECT_ID"
      credentials_file: "path/to/service-account.json"
      default_bucket: "documents"

  collections:
    # Local files
    users: "users.json"

    # Cloud storage with explicit URIs
    azure_users: "azure://users/data.json"
    aws_reports: "s3://reports/monthly.json"
    gcp_documents: "gs://documents/archive.json"
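
The file is ordinary YAML, so you can sanity-check it before running a workflow. A minimal sketch, assuming PyYAML is installed and the collections block is nested under json as shown above:

import yaml

# Load and inspect the storage configuration
with open("storage_config.yaml") as f:
    config = yaml.safe_load(f)

json_config = config["json"]
print("Default provider:", json_config["default_provider"])
print("Configured providers:", list(json_config["providers"].keys()))
print("Collections:", json_config["collections"])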

Provider-Specific Configuration

azure:
  # Primary authentication method (recommended)
  connection_string: "env:AZURE_STORAGE_CONNECTION_STRING"

  # Alternative authentication
  # account_name: "env:AZURE_STORAGE_ACCOUNT"
  # account_key: "env:AZURE_STORAGE_KEY"

  # Container configuration
  default_container: "documents"
  containers:
    users: "users-prod-container"
    configs: "app-configs"
    logs: "application-logs"

  # Optional settings
  timeout: 30  # Connection timeout in seconds
  retry_count: 3
  enable_logging: true

URI Format for Cloud Storage

Cloud storage locations are specified using these URI formats:

  • Azure Blob Storage: azure://container/path/to/blob.json (container-based storage), e.g. azure://documents/users.json
  • AWS S3: s3://bucket/path/to/object.json (bucket-based storage), e.g. s3://my-bucket/data/config.json
  • Google Cloud Storage: gs://bucket/path/to/blob.json (bucket-based storage), e.g. gs://my-bucket/reports/monthly.json

URI Examples

collections:
  user_profiles: "azure://users/profiles.json"
  app_config: "s3://config/app.json"
  reports: "gs://analytics/reports.json"
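
These URIs follow the standard scheme://bucket/path shape, so they split cleanly with the Python standard library. AgentMap resolves them internally; the sketch below is only to show how the pieces map onto provider, container/bucket, and blob path:

from urllib.parse import urlsplit

def split_storage_uri(uri):
    """Split a storage URI into (provider_scheme, container_or_bucket, blob_path)."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path.lstrip("/")

print(split_storage_uri("azure://users/profiles.json"))  # ('azure', 'users', 'profiles.json')
print(split_storage_uri("s3://config/app.json"))         # ('s3', 'config', 'app.json')
print(split_storage_uri("gs://analytics/reports.json"))  # ('gs', 'analytics', 'reports.json')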

Authentication Methods

Environment Variables

The recommended approach is to use environment variables for sensitive credentials:

# Azure Blob Storage
export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"

# Alternative approach
export AZURE_STORAGE_ACCOUNT="mystorageaccount"
export AZURE_STORAGE_KEY="your-account-key"
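
Values prefixed with env: in storage_config.yaml refer to environment variables rather than literal credentials. A minimal sketch of how such a reference can be resolved (illustrative only; resolve_config_value is a hypothetical helper, not part of the AgentMap API):

import os

def resolve_config_value(value):
    """Resolve "env:VAR_NAME" references against the environment; pass other values through."""
    if isinstance(value, str) and value.startswith("env:"):
        return os.environ.get(value[len("env:"):])
    return value

connection_string = resolve_config_value("env:AZURE_STORAGE_CONNECTION_STRING")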

Authentication Best Practices

Security Best Practices
  1. Use Environment Variables: Never hardcode credentials in configuration files
  2. Rotate Credentials Regularly: Set up automatic credential rotation where possible
  3. Least Privilege Access: Grant only the minimum required permissions
  4. Use IAM Roles: Prefer IAM roles over static credentials in cloud environments
  5. Enable Audit Logging: Track access to sensitive data

Advanced Authentication Options

Azure Managed Identity (for Azure VMs/App Service):

azure:
  use_managed_identity: true
  default_container: "documents"

AWS IAM Roles (for EC2/Lambda):

aws:
  region: "us-west-2"
  # No credentials needed - uses instance profile
  default_bucket: "my-documents"

GCP Service Account (detailed configuration):

gcp:
  project_id: "env:GCP_PROJECT_ID"
  credentials_file: "/opt/app/credentials/service-account.json"
  scopes:
    - "https://www.googleapis.com/auth/devstorage.read_write"
  default_bucket: "documents"
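
For reference, this is roughly what the three authentication modes look like when constructing clients directly with the provider SDKs. This is a sketch only: the Azure account URL is a placeholder, and the managed-identity path additionally requires the azure-identity package, which is not listed in the dependency section below.

# Azure: managed identity via DefaultAzureCredential (requires azure-identity)
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

blob_client = BlobServiceClient(
    account_url="https://<account-name>.blob.core.windows.net",  # placeholder account URL
    credential=DefaultAzureCredential(),
)

# AWS: no explicit credentials - boto3 falls back to the instance profile / IAM role
import boto3
s3_client = boto3.client("s3", region_name="us-west-2")

# GCP: explicit service-account credentials file
from google.cloud import storage
gcs_client = storage.Client.from_service_account_json("/opt/app/credentials/service-account.json")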

Using Cloud Storage in Workflows

CSV Workflow Examples

GraphName,Node,Edge,Context,AgentType,Success_Next,Failure_Next,Input_Fields,Output_Field,Prompt
CloudFlow,ReadData,,Read from Azure,cloud_json_reader,Process,,collection,data,"azure://container/data.json"
CloudFlow,Process,,Process data,DataProcessor,SaveData,,data,processed_data,"Process the data"
CloudFlow,SaveData,,Save to AWS S3,cloud_json_writer,End,,processed_data,result,"s3://bucket/output.json"
CloudFlow,End,,Completion,Echo,,,"result",final_message,Data processing complete

Agent Context Configuration

# Cloud JSON Reader Agent Context
{
  "collection": "azure://prod-data/users.json",
  "format": "raw",
  "timeout": 30,
  "retry_count": 3,
  "cache_enabled": true
}

Container/Bucket Mappings

You can map logical container/bucket names to actual storage containers for better organization and environment management:

azure:
  containers:
    users: "users-prod-container"    # Production users
    configs: "app-configs-v2"        # Application configurations
    logs: "application-logs-2024"    # Current year logs
    temp: "temporary-processing"     # Temporary data

aws:
  buckets:
    analytics: "analytics-prod-us-west-2"  # Regional analytics data
    backups: "system-backups-encrypted"    # Encrypted backups
    ml_models: "ml-models-versioned"       # Versioned ML models
    user_uploads: "user-uploads-secure"    # Secure user uploads

gcp:
  buckets:
    documents: "documents-prod-global"  # Global document storage
    images: "images-cdn-optimized"      # CDN-optimized images
    archives: "long-term-archives"      # Long-term archival
    processing: "temp-processing-queue" # Temporary processing queue

Then use logical names in URIs:

collections:
  user_data: "azure://users/profiles.json"  # Uses "users-prod-container"
  analytics: "s3://analytics/reports.json"  # Uses "analytics-prod-us-west-2"
  documents: "gs://documents/archive.json"  # Uses "documents-prod-global"
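
Resolution of logical names is handled by AgentMap; the sketch below only illustrates the idea using the aws.buckets mapping above (resolve_bucket is a hypothetical helper, not part of the AgentMap API):

# Logical-to-physical mapping taken from the aws.buckets section above
aws_buckets = {
    "analytics": "analytics-prod-us-west-2",
    "backups": "system-backups-encrypted",
}

def resolve_bucket(logical_name, mapping, default_bucket="my-documents"):
    """Return the physical bucket for a logical name, falling back to the default bucket."""
    return mapping.get(logical_name, default_bucket)

print(resolve_bucket("analytics", aws_buckets))  # analytics-prod-us-west-2
print(resolve_bucket("scratch", aws_buckets))    # my-documents (not mapped, use default)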

Required Dependencies

Install the appropriate cloud SDK packages:

# Azure Blob Storage
pip install azure-storage-blob

# AWS S3
pip install boto3

# Google Cloud Storage
pip install google-cloud-storage

# Install all cloud providers
pip install azure-storage-blob boto3 google-cloud-storage
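
A quick way to confirm which SDKs are available in the current environment, using only the standard library:

import importlib.util

def sdk_available(module_name):
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        return False

for module in ("azure.storage.blob", "boto3", "google.cloud.storage"):
    print(f"{module}: {'installed' if sdk_available(module) else 'missing'}")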

Error Handling and Troubleshooting

Common Issues and Solutions

Authentication Failures

Problem: AuthenticationError or Unauthorized exceptions

Solutions:

  1. Verify environment variables are set correctly
  2. Check credential expiration dates
  3. Ensure proper permissions on containers/buckets
  4. Validate the connection string format

# Debug authentication: confirm the variables are set without printing secret values
import os
print("Azure connection string:", "set" if os.getenv("AZURE_STORAGE_CONNECTION_STRING") else "not set")
print("AWS access key:", "set" if os.getenv("AWS_ACCESS_KEY_ID") else "not set")
print("GCP project:", os.getenv("GCP_PROJECT_ID", "not set"))

Network Connectivity Issues

Problem: ConnectionError or timeout exceptions

Solutions:

  1. Check internet connectivity
  2. Verify firewall rules allow outbound HTTPS
  3. Increase timeout values in configuration
  4. Check cloud provider service status

# Increase timeouts
azure:
  timeout: 60  # Increase from default 30
  retry_count: 5

aws:
  timeout: 60
  verify_ssl: false  # Only for testing

gcp:
  timeout: 60
  retry_count: 5

Container/Bucket Not Found

Problem: ContainerNotFound or NoSuchBucket errors

Solutions:

  1. Verify container/bucket exists in cloud console
  2. Check spelling and case sensitivity
  3. Ensure proper region configuration
  4. Create containers/buckets if needed

# Check if the target container/bucket exists (S3 shown; Azure and GCP clients offer similar checks)
import boto3
from botocore.exceptions import ClientError

def check_storage_exists(bucket_name):
    try:
        boto3.client("s3").head_bucket(Bucket=bucket_name)  # raises ClientError if missing or forbidden
        return True
    except ClientError:
        return False

Permission Denied Errors

Problem: PermissionDenied or AccessDenied exceptions

Solutions:

  1. Verify IAM permissions include required operations
  2. Check if containers/buckets have public access restrictions
  3. Ensure service account has proper roles
  4. Review bucket policies and ACLs

Required Permissions by Provider:

Azure: Storage Blob Data Contributor or custom role with:

  • Microsoft.Storage/storageAccounts/blobServices/containers/read
  • Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read
  • Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write

AWS: Policy with actions:

  • s3:GetObject
  • s3:PutObject
  • s3:DeleteObject
  • s3:ListBucket

GCP: Role Storage Object Admin or custom role with:

  • storage.objects.get
  • storage.objects.create
  • storage.objects.update
  • storage.objects.delete

Monitoring and Logging

# Enhanced logging configuration
azure:
  enable_logging: true
  log_level: "INFO"  # DEBUG, INFO, WARNING, ERROR

aws:
  enable_logging: true
  log_requests: true
  log_responses: false  # Avoid logging sensitive data

gcp:
  enable_logging: true
  log_level: "INFO"
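
If you need more detail than these flags provide, the provider SDKs all log through Python's standard logging module, so the same verbosity can be dialed in from code. A minimal sketch (logger names follow each SDK's conventional namespaces):

import logging
import boto3

logging.basicConfig(level=logging.INFO)

# Azure SDK clients log under the "azure" logger hierarchy
logging.getLogger("azure.storage.blob").setLevel(logging.INFO)

# boto3/botocore provide a helper that streams request logs to stderr
boto3.set_stream_logger("botocore", logging.INFO)

# google-cloud-storage also uses standard logging
logging.getLogger("google.cloud").setLevel(logging.INFO)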

Performance Optimization

Caching Strategies

json:
  cache:
    enabled: true
    ttl: 300         # 5 minutes
    max_size: 100    # Max cached items
    strategy: "lru"  # Least Recently Used
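
AgentMap's cache internals are not shown here; the sketch below only illustrates what the ttl, max_size, and strategy: "lru" settings mean in practice: entries expire after the TTL, and the least recently used entry is evicted once the cache is full.

import time
from collections import OrderedDict

class TTLCache:
    """Tiny LRU cache with per-entry TTL, mirroring the ttl/max_size/strategy settings above."""

    def __init__(self, ttl=300, max_size=100):
        self.ttl = ttl
        self.max_size = max_size
        self._items = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._items[key]          # expired
            return None
        self._items.move_to_end(key)      # mark as recently used
        return value

    def put(self, key, value):
        self._items[key] = (time.monotonic() + self.ttl, value)
        self._items.move_to_end(key)
        if len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict least recently used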

Batch Operations

# Efficient batch processing
import asyncio

async def process_cloud_data_batch(collections, batch_size=10):
    """Process multiple cloud collections efficiently."""
    results = []

    for i in range(0, len(collections), batch_size):
        batch = collections[i:i + batch_size]

        # Read the whole batch concurrently (storage_service is provided by the workflow context)
        batch_tasks = [
            storage_service.read(collection)
            for collection in batch
        ]

        batch_results = await asyncio.gather(*batch_tasks, return_exceptions=True)
        results.extend(batch_results)

    return results
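
For example, the helper can be driven with the cloud collection URIs defined earlier (this assumes the storage_service used above is available in scope and exposes an async read):

# Example usage with the collection URIs defined earlier
results = asyncio.run(process_cloud_data_batch([
    "azure://users/profiles.json",
    "s3://config/app.json",
    "gs://analytics/reports.json",
]))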

Connection Pooling

# Connection pool configuration
azure:
  connection_pool:
    max_connections: 10
    max_idle_time: 300

aws:
  connection_pool:
    max_connections: 10
    max_retries: 3
    backoff_mode: "adaptive"

gcp:
  connection_pool:
    max_connections: 10
    keepalive_timeout: 300

Security Best Practices

Data Encryption

# Ensure HTTPS/TLS for all providers
azure:
  use_ssl: true
  verify_ssl: true

aws:
  use_ssl: true
  verify_ssl: true

gcp:
  use_ssl: true  # Always enabled

Access Control

# Implement least privilege access
azure:
  rbac:
    enabled: true
    roles:
      - "Storage Blob Data Reader"       # For read-only operations
      - "Storage Blob Data Contributor"  # For read/write operations

aws:
  iam:
    policy_arn: "arn:aws:iam::123456789012:policy/AgentMapStoragePolicy"
    # Custom policy with minimal required permissions

gcp:
  iam:
    service_account: "agentmap-storage@project.iam.gserviceaccount.com"
    roles:
      - "roles/storage.objectViewer"   # Read access
      - "roles/storage.objectCreator"  # Write access

Production Deployment

For production deployments, consider implementing:

  1. Multi-region replication for disaster recovery
  2. Automated backup strategies with retention policies
  3. Monitoring and alerting for storage operations
  4. Cost optimization through lifecycle policies
  5. Compliance controls for data governance