CSV Validation Guide
The CSV validation system ensures your workflow definition files are structurally correct, logically consistent, and ready for compilation. This guide covers all aspects of CSV validation from basic structure to advanced graph topology checks.
Validation Overview
CSV validation operates at multiple levels to ensure comprehensive workflow integrity:
- Structure Validation: Column presence, naming, and basic format
- Row-level Validation: Individual row data using Pydantic models
- Graph Consistency: Workflow topology and node relationships
- Agent Validation: Agent type availability and configuration
- Routing Logic: Edge definitions and navigation paths
Column Validation
Required Columns
Every CSV workflow file must contain these essential columns:
- GraphName: Name of the workflow graph
- Node: Unique identifier for each node within the graph
Optional Columns
Additional columns provide functionality and configuration:
- AgentType: Specifies which agent handles the node
- Prompt: Instructions or template for the agent
- Description: Human-readable description of the node
- Context: Additional context or configuration data
- Input_Fields: Pipe-separated list of input field names
- Output_Field: Name of the output field produced
- Edge: Direct routing to the next node
- Success_Next: Node to route to on successful execution
- Failure_Next: Node to route to on failure
Column Alias Support
The system supports flexible column naming with case-insensitive matching:
# These are all equivalent to "GraphName"
GraphName, graph_name, Graph, WorkflowName, workflow_name, workflow
# These are all equivalent to "Node"
Node, node_name, NodeName, Step, StepName, name
# These are all equivalent to "AgentType"
AgentType, agent_type, Agent, Type
Normalization Process:
- Column names are matched case-insensitively
- Aliases are automatically converted to canonical names
- Validation proceeds with normalized column names
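The normalization steps above can be sketched as a lookup from lowercase alias to canonical name. This is a minimal illustration, not the validator's own code: the `COLUMN_ALIASES` table and the `normalize_columns` helper are hypothetical names, and the alias lists are copied only from the examples in this section.

```python
# Hypothetical alias table built from the examples above; the real
# validator may recognise more (or different) aliases.
COLUMN_ALIASES = {
    "GraphName": ["GraphName", "graph_name", "Graph", "WorkflowName", "workflow_name", "workflow"],
    "Node": ["Node", "node_name", "NodeName", "Step", "StepName", "name"],
    "AgentType": ["AgentType", "agent_type", "Agent", "Type"],
}

# Reverse lookup keyed by lowercase alias, so matching is case-insensitive.
_ALIAS_LOOKUP = {
    alias.lower(): canonical
    for canonical, aliases in COLUMN_ALIASES.items()
    for alias in aliases
}


def normalize_columns(columns):
    """Map each header to its canonical name; unknown headers pass through."""
    normalized = []
    for column in columns:
        cleaned = column.strip()
        normalized.append(_ALIAS_LOOKUP.get(cleaned.lower(), cleaned))
    return normalized


print(normalize_columns(["workflow_name", "STEP", "agent", "Prompt"]))
```

Headers that are not in the table (here `Prompt`) are left untouched, so later validation stages still see them under their original names.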
Row-Level Validation
Each CSV row is validated using a Pydantic model to ensure data integrity:
Required Field Validation
# Required fields cannot be empty or whitespace-only
GraphName: "customer_onboarding" # ✅ Valid
Node: "validate_email" # ✅ Valid
GraphName: "" # ❌ Error: cannot be empty
Node: " " # ❌ Error: cannot be whitespace only
Input Fields Validation
Input fields must follow pipe-separated format with valid field names:
# Valid input field formats
Input_Fields: "email|name|phone" # ✅ Multiple fields
Input_Fields: "user_data" # ✅ Single field
Input_Fields: "customer-info|preferences" # ✅ With dashes
Input_Fields: "" # ✅ Empty (optional)
# Invalid formats
Input_Fields: "field with spaces" # ❌ Spaces not allowed
Input_Fields: "field@domain" # ❌ Special characters not allowed
Output Field Validation
Output field names must be valid identifiers; dashes are accepted, but spaces and other special characters are not:
# Valid output field names
Output_Field: "processed_email" # ✅ Valid identifier
Output_Field: "result" # ✅ Simple name
Output_Field: "customer-data" # ✅ With dash
# Invalid output field names
Output_Field: "result data" # ❌ Spaces not allowed
Output_Field: "result@processed" # ❌ Special characters not allowed
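The field rules above can be approximated with a few plain functions. This sketch deliberately uses only the standard library rather than the Pydantic model the validator relies on, and the exact pattern in `FIELD_NAME` is an assumption inferred from the examples (letters, digits, underscores, and dashes). The helper names are hypothetical.

```python
import re

# Assumed rule, inferred from the examples above: letters, digits,
# underscores, and dashes, starting with a letter or underscore.
FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def check_required(value, field):
    """Reject empty or whitespace-only values for a required field."""
    if value is None or not value.strip():
        raise ValueError(f"{field}: Field cannot be empty or just whitespace")
    return value.strip()


def check_field_name(name):
    """Validate a single field name, such as an Output_Field value."""
    if not FIELD_NAME.fullmatch(name):
        raise ValueError(f"Invalid field name: {name!r}")
    return name


def check_input_fields(value):
    """Split a pipe-separated Input_Fields value; empty means no inputs."""
    if not value.strip():
        return []
    return [check_field_name(part) for part in value.split("|")]


print(check_input_fields("email|name|phone"))
print(check_input_fields("customer-info|preferences"))
for bad in ("field with spaces", "field@domain"):
    try:
        check_input_fields(bad)
    except ValueError as exc:
        print("rejected:", exc)
```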
Routing Logic Validation
The system validates routing configurations to prevent conflicts:
# Valid: Direct routing
GraphName,Node,Edge
workflow1,start,process_data
# Valid: Conditional routing
GraphName,Node,Success_Next,Failure_Next
workflow1,validate,approved,rejected
# Invalid: Conflicting routing
GraphName,Node,Edge,Success_Next
workflow1,node1,next_node,success_node # ❌ Cannot use both Edge and Success/Failure_Next
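A routing conflict check of this kind could look roughly like the following. The `routing_conflict` helper is a hypothetical name used for illustration; only the column names and the error text come from this guide.

```python
import csv
import io

CONDITIONAL_COLUMNS = ("Success_Next", "Failure_Next")


def routing_conflict(row):
    """Return an error message if the row mixes Edge with Success/Failure_Next."""
    has_edge = bool((row.get("Edge") or "").strip())
    has_conditional = any((row.get(col) or "").strip() for col in CONDITIONAL_COLUMNS)
    if has_edge and has_conditional:
        return "Cannot have both Edge and Success/Failure_Next defined"
    return None


sample = io.StringIO(
    "GraphName,Node,Edge,Success_Next,Failure_Next\n"
    "workflow1,start,process_data,,\n"
    "workflow1,validate,,approved,rejected\n"
    "workflow1,node1,next_node,success_node,\n"
)
for row in csv.DictReader(sample):
    print(row["Node"], "->", routing_conflict(row) or "ok")
```

Missing and empty cells are treated the same way, so a CSV that omits the conditional columns entirely passes the check.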
Graph Consistency Validation
Duplicate Node Detection
Each node name must be unique within its graph:
GraphName,Node
customer_flow,validate_email # ✅ First instance
customer_flow,process_payment # ✅ Different node
customer_flow,validate_email # ❌ Error: Duplicate node in same graph
# Valid: Same node name in different graphs
customer_flow,validate_email # ✅ Valid
admin_flow,validate_email # ✅ Valid (different graph)
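Because uniqueness is scoped to the graph, a duplicate check can key on the (GraphName, Node) pair. The sketch below is illustrative only: `find_duplicate_nodes` is a hypothetical helper, and counting the header as line 1 is an assumption about how line numbers are reported.

```python
import csv
import io


def find_duplicate_nodes(rows):
    """Return (line_number, graph, node) for each repeated node within a graph."""
    first_seen = {}
    duplicates = []
    # Assumption: the header is line 1, so data rows start at line 2.
    for line_number, row in enumerate(rows, start=2):
        key = (row["GraphName"], row["Node"])
        if key in first_seen:
            duplicates.append((line_number, row["GraphName"], row["Node"]))
        else:
            first_seen[key] = line_number
    return duplicates


sample = io.StringIO(
    "GraphName,Node\n"
    "customer_flow,validate_email\n"
    "customer_flow,process_payment\n"
    "customer_flow,validate_email\n"
    "admin_flow,validate_email\n"
)
for line_number, graph, node in find_duplicate_nodes(csv.DictReader(sample)):
    print(f"Duplicate node '{node}' in graph '{graph}' (line {line_number})")
```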
Node Reference Validation
All edge targets must reference existing nodes within the same graph:
# Valid references
GraphName,Node,Edge
workflow1,start,process_data
workflow1,process_data,end
workflow1,end,
# Invalid reference
GraphName,Node,Edge
workflow1,start,nonexistent_node # ❌ Error: Target node doesn't exist
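One way to picture this check is to collect the node names of each graph first and then test every routing target against that set. The following is a sketch under assumptions: `find_missing_targets` is a hypothetical helper, each routing cell is treated as a single target, and `func:` references are skipped because their targets are only known at runtime.

```python
from collections import defaultdict

ROUTING_COLUMNS = ("Edge", "Success_Next", "Failure_Next")


def find_missing_targets(rows):
    """Return (graph, node, column, target) for targets that are not nodes of the same graph."""
    nodes_by_graph = defaultdict(set)
    for row in rows:
        nodes_by_graph[row["GraphName"]].add(row["Node"])

    problems = []
    for row in rows:
        for column in ROUTING_COLUMNS:
            target = (row.get(column) or "").strip()
            # Empty cells and runtime function references are not checked here.
            if not target or target.startswith("func:"):
                continue
            if target not in nodes_by_graph[row["GraphName"]]:
                problems.append((row["GraphName"], row["Node"], column, target))
    return problems


rows = [
    {"GraphName": "workflow1", "Node": "start", "Edge": "process_data"},
    {"GraphName": "workflow1", "Node": "process_data", "Edge": "nonexistent_node"},
    {"GraphName": "workflow2", "Node": "nonexistent_node", "Edge": ""},
]
for graph, node, column, target in find_missing_targets(rows):
    print(f"Node '{node}' references non-existent target '{target}' in {column}")
```

The node in `workflow2` does not satisfy the reference from `workflow1`, since targets are resolved within the same graph only.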
Entry Point Detection
The system identifies potential workflow entry points:
- Entry Points: Nodes with no incoming edges from other nodes
- Multiple Entry Points: Warning if multiple nodes could be starting points
- No Entry Points: Warning if all nodes have incoming edges (potential cycle)
# Clear entry point example
GraphName,Node,Edge
workflow1,start,middle # start = entry point (no incoming edges)
workflow1,middle,end # middle has incoming edge from start
workflow1,end, # end = terminal point (no outgoing edges)
Terminal Node Detection
The system identifies workflow endpoints:
- Terminal Nodes: Nodes with no outgoing edges
- No Terminal Nodes: Warning if all nodes have outgoing edges (potential infinite loop)
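Entry points and terminal nodes can both be derived from one pass over a graph's routing columns. This is a minimal sketch for a single graph, not the validator's implementation; `entry_and_terminal_nodes` is a hypothetical helper, and ignoring self-loops and `func:` targets when counting incoming edges is an assumption.

```python
ROUTING_COLUMNS = ("Edge", "Success_Next", "Failure_Next")


def entry_and_terminal_nodes(rows):
    """Return (entry_points, terminal_nodes) for the rows of a single graph."""
    has_incoming = set()
    has_outgoing = set()
    for row in rows:
        for column in ROUTING_COLUMNS:
            target = (row.get(column) or "").strip()
            if not target:
                continue
            has_outgoing.add(row["Node"])
            # Only edges from other nodes count as incoming; func: targets are
            # resolved at runtime, so they cannot be attributed to a node here.
            if target != row["Node"] and not target.startswith("func:"):
                has_incoming.add(target)
    nodes = [row["Node"] for row in rows]
    entries = [node for node in nodes if node not in has_incoming]
    terminals = [node for node in nodes if node not in has_outgoing]
    return entries, terminals


rows = [
    {"GraphName": "workflow1", "Node": "start", "Edge": "middle"},
    {"GraphName": "workflow1", "Node": "middle", "Edge": "end"},
    {"GraphName": "workflow1", "Node": "end", "Edge": ""},
]
entries, terminals = entry_and_terminal_nodes(rows)
print("entry points:", entries)
print("terminal nodes:", terminals)
if not entries:
    print("warning: no entry points (potential cycle)")
if not terminals:
    print("warning: no terminal nodes (potential infinite loop)")
```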
Agent Type Validation
Agent Registry Check
The system validates agent types against the available agent registry:
# Valid agent types (if registered)
AgentType: "GPTAgent" # ✅ Valid if registered
AgentType: "HumanAgent" # ✅ Built-in agent type
AgentType: "CustomEmailAgent" # ✅ Valid if custom agent exists
# Unknown agent types
AgentType: "NonexistentAgent" # ⚠️ Warning: Unknown agent type
AgentType: "GPTAgnet" # ⚠️ Warning: Possible typo
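A "possible typo" hint like the one above can be produced with fuzzy string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library; whether the validator uses this technique is not stated in this guide, and the `REGISTERED_AGENTS` set is a hypothetical stand-in for the real agent registry.

```python
import difflib

# Hypothetical registry contents, taken from the examples above.
REGISTERED_AGENTS = {"GPTAgent", "HumanAgent", "CustomEmailAgent"}


def check_agent_type(agent_type, registry=REGISTERED_AGENTS):
    """Return None for a known agent type, otherwise a warning message."""
    if agent_type in registry:
        return None
    # Suggest the closest registered name, if any is similar enough.
    close = difflib.get_close_matches(agent_type, sorted(registry), n=1)
    if close:
        return f"Unknown agent type: '{agent_type}' (possible typo for '{close[0]}')"
    return f"Unknown agent type: '{agent_type}'"


for name in ("HumanAgent", "GPTAgnet", "NonexistentAgent"):
    print(name, "->", check_agent_type(name) or "ok")
```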
Agent Availability Verification
The validator checks if agent classes can be instantiated:
- Built-in Agents: Always available
- Custom Agents: Must be in the custom agents directory
- Function-based Agents: Must be in the functions directory
Function Reference Validation
The system handles function references in routing:
# Function reference in routing
GraphName,Node,Edge
workflow1,decision,func:determine_next_node(result)
# Validation behavior:
# ✅ Function reference detected and noted
# ℹ️ Info: Cannot validate target nodes (determined at runtime)
# ⚠️ Warning: If function is not found in functions directory
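The behavior listed above can be illustrated with a small parser for routing targets. The `func:<name>(<arguments>)` pattern is an assumption based on the single example in this section, and `check_function_ref` and its `available_functions` argument are hypothetical; how the validator actually discovers functions in the functions directory is not covered here.

```python
import re

# Assumed syntax, based on the single example above: func:<name>(<arguments>)
FUNC_REF = re.compile(r"func:(?P<name>[A-Za-z_]\w*)\((?P<args>.*)\)")


def check_function_ref(target, available_functions):
    """Return (level, message) notes for a func: target, or [] for a plain node name."""
    match = FUNC_REF.fullmatch(target.strip())
    if match is None:
        return []
    name = match.group("name")
    notes = [("info", f"Function '{name}': cannot validate target nodes (determined at runtime)")]
    if name not in available_functions:
        notes.append(("warning", f"Function '{name}' not found in functions directory"))
    return notes


known = {"determine_next_node"}
for target in ("func:determine_next_node(result)", "func:missing_router(result)", "process_data"):
    print(target, "->", check_function_ref(target, known))
```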
Validation Output Examples
Successful Validation
🔍 Validating CSV file: workflows/customer_onboarding.csv
✅ CSV file format is valid
ℹ️ CSV contains 8 rows and 6 columns
ℹ️ Found 1 graph(s): 'customer_onboarding' (8 nodes)
ℹ️ Found 3 unique agent types: GPTAgent, HumanAgent, EmailAgent
ℹ️ Graph 'customer_onboarding' has multiple potential entry points: 'start', 'manual_entry'
ℹ️ Node 'final_approval' has no outgoing edges (terminal node)
✅ Validation completed successfully
Validation with Errors
🔍 Validating CSV file: workflows/broken_workflow.csv
❌ CSV Validation Errors:
1. Required column missing: 'Node'
2. Duplicate node 'validate_email' in graph 'customer_flow'
Line 5
3. Node 'process_payment' references non-existent target 'send_confirmtion' in Edge
Line 6, Field: Edge, Value: send_confirmtion
Suggestion: Valid targets: validate_email, process_payment, send_confirmation
⚠️ CSV Validation Warnings:
1. Unknown agent type: 'GPTAgnet'
Line 3, Field: AgentType, Value: GPTAgnet
Suggestion: Check spelling or ensure agent is properly registered/available
Common Validation Errors
Structural Errors
Missing Required Columns
❌ Required column missing: 'GraphName'
Solution: Add GraphName column to your CSV
Empty Required Fields
❌ Row validation error: Field cannot be empty or just whitespace
Line 3, Field: Node
Solution: Provide a non-empty node name
Graph Consistency Errors
Duplicate Nodes
❌ Duplicate node 'process_data' in graph 'workflow1'
Line 5, Field: Node
Solution: Use unique node names within each graph
Invalid Node References
❌ Node 'start' references non-existent target 'proces_data' in Edge
Line 2, Field: Edge, Value: proces_data
Suggestion: Valid targets: process_data, validate_input, end
Solution: Fix typo in target node name
Routing Logic Errors
Conflicting Edge Definitions
❌ Cannot have both Edge and Success/Failure_Next defined
Solution: Use either direct routing (Edge) or conditional routing (Success/Failure_Next)
Best Practices
CSV Structure
- Consistent Naming: Use consistent column names throughout your project
- Clear Node Names: Use descriptive, unique node names
- Logical Grouping: Group related nodes in the same graph
- Documentation: Use Description column for complex nodes
Graph Design
- Clear Entry Points: Design workflows with obvious starting points
- Defined Endpoints: Ensure workflows have clear termination conditions
- Error Handling: Use Success/Failure routing for robust error handling
- Function Usage: Leverage function references for dynamic routing
Development Workflow
- Validate Early: Run validation after structural changes
- Fix Errors First: Address errors before warnings
- Review Warnings: Investigate warnings to prevent future issues
- Use Cache: Let caching speed up repeated validations
Advanced Features
Graph Statistics
The validator provides insights about your workflow structure:
ℹ️ Found 2 graph(s): 'main_flow' (12 nodes), 'error_handler' (4 nodes)
ℹ️ Graph 'main_flow' has multiple potential entry points: 'start', 'resume'
ℹ️ Found 5 unique agent types: GPTAgent, HumanAgent, EmailAgent, ValidationAgent, ProcessorAgent
Performance Optimization
The validator can identify potential performance issues:
⚠️ Node 'complex_analysis' has a large prompt (500+ characters)
Line 8, Field: Prompt
Suggestion: Consider breaking into smaller, focused prompts
Related Documentation
- Validation System Overview: Complete validation system architecture
- CSV Schema Reference: Detailed CSV format specification
- CLI Validation Commands: Command-line validation tools
- Agent Development: Creating custom agents
- Best Practices: Development workflow integration
Next Steps
- Validate Your CSV: Run agentmap validate csv --csv your_workflow.csv
- Fix Any Issues: Address errors and warnings systematically
- Configure Validation: Set up config validation for complete workflow validation
- Integrate Cache: Use cache management for optimal performance