Network Automation
Authoritative Resources
Getting Started:
- NANOG: Network Automation using Ansible - Introductory webcast
- NetBox + Ansible Getting Started - Practical implementation guide
Architecture and Design:
- NetBox Labs: Network Automation Architecture - Comprehensive architecture guide
- NetBox Automation Workflows (Build/Deploy/Operate) - Stage-by-stage workflow guidance
Multivendor Support:
- NANOG: NAPALM Presentation - Abstraction layer for multivendor networks
Core Components
Source of Truth (NetBox/Nautobot)
Centralized database for network state:
- Device inventory (make, model, serial numbers)
- IP address management (IPAM)
- Circuit documentation
- Cable/connection tracking
- Configuration context
Why it matters: Automation is only as good as its data source. Without an accurate inventory, automated changes can target the wrong devices.
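As a rough illustration of using the source of truth as the inventory, the sketch below builds a NetBox REST query for devices with a given role and extracts device names from a list response. The hostname, token, role slug, and sample response are hypothetical. The `Token` authorization scheme and the `role` filter are assumptions based on NetBox's REST API and may differ between versions. The request is built but never sent.

```python
import json
import urllib.parse
import urllib.request


def build_devices_request(base_url, token, role_slug):
    """Build (but do not send) a GET request for devices with a given role."""
    query = urllib.parse.urlencode({"role": role_slug, "limit": 100})
    url = f"{base_url.rstrip('/')}/api/dcim/devices/?{query}"
    return urllib.request.Request(
        url,
        headers={
            # Assumption: token-based auth header as used by NetBox's REST API.
            "Authorization": f"Token {token}",
            "Accept": "application/json",
        },
    )


def device_names(payload):
    """Extract device names from a paginated list response."""
    return [device["name"] for device in payload.get("results", [])]


# Hypothetical response body, trimmed to the fields used here.
sample = json.loads(
    '{"count": 2, "next": null, "previous": null, '
    '"results": [{"name": "edge1"}, {"name": "edge2"}]}'
)

request = build_devices_request("https://netbox.example.com", "0123abcd", "edge-router")
print(request.full_url)
print(device_names(sample))
```

In practice the request would be sent with `urllib.request.urlopen` or an HTTP client of your choice, following the `next` link until all pages are read.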
Choosing between NetBox and Nautobot:
- NetBox: Original project, large community, stable
- Nautobot: Fork with additional features, built for automation-first workflows
Configuration Management (Ansible/Salt/Nornir)
Deploy configurations consistently across devices.
Ansible:
- Agentless (uses SSH)
- Large network module ecosystem
- Playbooks in YAML
- Good for periodic changes
Salt:
- Agent-based (faster for large deployments; network devices that cannot run an agent are managed through proxy minions)
- Event-driven architecture
- Good for continuous enforcement
Nornir:
- Python framework
- More programmatic than Ansible
- Better performance for large inventories
Selection criteria: Ansible is the most widely used choice for network automation because of its ecosystem and ease of use. Consider the alternatives at scale (more than 1,000 devices) or for event-driven requirements.
Version Control (Git)
All configurations and automation code must be versioned.
What to store:
- Device configurations (as templates plus data, or as rendered configs)
- Ansible playbooks/Salt states
- Jinja2 templates
- Python scripts
- Documentation
Workflow:
- Feature branches for changes
- Pull requests for peer review
- CI/CD pipeline for validation
- Merge to main after approval
CI/CD Pipeline
Automate testing before deployment.
Validation stages:
- Syntax checking (ansible-playbook --syntax-check, yamllint)
- Linting (ansible-lint for playbooks)
- Unit tests (for custom modules)
- Integration tests (against lab environment)
- Deployment to production (manual approval gate)
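The early validation stages can be sketched as an ordered list of commands where the first failure stops the pipeline. This is a minimal illustration rather than a replacement for a real CI system. The playbook name `site.yml` is hypothetical, and `fake_runner` is a stand-in so the example runs without Ansible installed.

```python
import subprocess

# Commands for the early validation stages.
# "site.yml" is a hypothetical playbook name.
STAGES = [
    ("syntax", ["ansible-playbook", "--syntax-check", "site.yml"]),
    ("yaml-lint", ["yamllint", "."]),
    ("ansible-lint", ["ansible-lint", "site.yml"]),
]


def run_command(command):
    """Run one command and return its exit code."""
    return subprocess.run(command, check=False).returncode


def run_stages(stages, runner=run_command):
    """Run stages in order and stop at the first failure.

    Returns (passed_stage_names, failed_stage_name_or_None).
    """
    passed = []
    for name, command in stages:
        if runner(command) != 0:
            return passed, name
        passed.append(name)
    return passed, None


def fake_runner(command):
    """Stand-in runner for demonstration: pretend yamllint fails."""
    return 1 if command[0] == "yamllint" else 0


print(run_stages(STAGES, runner=fake_runner))
```

Injecting the runner keeps the ordering logic testable on its own; a CI job would call `run_stages(STAGES)` and fail the build when the second element of the result is not `None`.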
Tools: Jenkins, GitLab CI, GitHub Actions, AWX/Ansible Tower
Template Engine (Jinja2)
Generate device-specific configurations from templates + data.
Example:
interface {{ interface.name }}
 description {{ interface.description }}
 ip address {{ interface.ipv4.address }}/{{ interface.ipv4.prefix_length }}
{% if interface.enabled %}
 no shutdown
{% else %}
 shutdown
{% endif %}
Data source: NetBox/Nautobot via API or Ansible inventory plugin
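The template expects a nested `interface` object with separate `address` and `prefix_length` fields, while inventory data often stores the address in CIDR notation. A minimal sketch of the data-preparation step, assuming a hypothetical flat record with `name`, `description`, `enabled`, and `address` keys, might look like this:

```python
import ipaddress


def interface_context(record):
    """Turn a flat interface record into the structure the template expects.

    `record` is a hypothetical dict with "name", "description", "enabled",
    and "address" in CIDR notation (for example "192.0.2.1/30").
    """
    iface = ipaddress.ip_interface(record["address"])
    return {
        "interface": {
            "name": record["name"],
            "description": record.get("description", ""),
            "enabled": record.get("enabled", True),
            "ipv4": {
                "address": str(iface.ip),
                "prefix_length": iface.network.prefixlen,
            },
        }
    }


context = interface_context(
    {
        "name": "GigabitEthernet0/0/0",
        "description": "Uplink to Core",
        "enabled": True,
        "address": "192.0.2.1/30",
    }
)
print(context["interface"]["ipv4"])
```

The resulting dictionary can then be passed to the template engine as the rendering context. Using `ipaddress` also rejects malformed addresses before they reach a device.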
Automation Patterns
Intent-Based Automation
Declare desired state, let automation converge to it.
Example: "All edge routers should have NTP server 192.0.2.1"
Automation:
- Query source of truth for list of edge routers
- Generate NTP config snippet
- Deploy if different from current state
- Verify NTP sync after deployment
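The "deploy if different" step can be sketched as a set comparison between desired and current state. The per-device current state below is hypothetical, and the `ntp server` / `no ntp server` syntax is an IOS-style assumption; other platforms will differ.

```python
def ntp_changes(desired, current):
    """Return the CLI lines needed to move `current` NTP servers to `desired`.

    The "ntp server" / "no ntp server" syntax is an IOS-style assumption.
    """
    desired_set, current_set = set(desired), set(current)
    add = [f"ntp server {s}" for s in sorted(desired_set - current_set)]
    remove = [f"no ntp server {s}" for s in sorted(current_set - desired_set)]
    # Add before removing so the device is never left without a server.
    return add + remove


# Hypothetical current state per edge router.
current_state = {
    "edge1": ["192.0.2.1"],
    "edge2": ["198.51.100.7"],
}
desired = ["192.0.2.1"]

plan = {
    device: ntp_changes(desired, servers)
    for device, servers in current_state.items()
}
for device, commands in plan.items():
    print(device, commands or "no change")
```

A device that already matches the intent produces an empty plan, so repeated runs make no changes.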
Event-Driven Automation
React to network events automatically.
Triggers:
- Device added to NetBox → provision base config
- BGP session down → create ticket, notify NOC
- Interface utilization >80% → adjust traffic engineering
Implementation: NetBox webhooks + Ansible Automation Platform
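A receiver for such events usually boils down to mapping an event to an action. The sketch below dispatches on `model` and `event` keys, which are assumed here to match the webhook body; the payload and the `provision_base_config` handler are hypothetical placeholders, and a real receiver would also authenticate the request.

```python
import json


def provision_base_config(data):
    """Placeholder action: a real handler would launch a provisioning job."""
    return f"provision {data['name']}"


# Map (model, event) pairs to handlers; extend as more triggers are added.
HANDLERS = {
    ("device", "created"): provision_base_config,
}


def handle_webhook(body):
    """Dispatch a webhook body (JSON text) to the matching handler, if any."""
    payload = json.loads(body)
    handler = HANDLERS.get((payload.get("model"), payload.get("event")))
    if handler is None:
        return None  # Ignore events nobody subscribed to.
    return handler(payload.get("data", {}))


# Hypothetical payload, trimmed to the keys used above.
body = json.dumps(
    {"event": "created", "model": "device", "data": {"name": "edge3"}}
)
print(handle_webhook(body))
```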
Declarative vs. Imperative
Declarative (preferred):
interfaces:
  - name: GigabitEthernet0/0/0
    description: Uplink to Core
    enabled: true
Automation determines steps to achieve this state.
Imperative (fragile):
- configure terminal
- interface GigabitEthernet0/0/0
- description Uplink to Core
- no shutdown
Explicitly lists commands. Breaks if device already configured differently.
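To make the difference concrete, here is a minimal sketch of how a declarative tool might derive commands: it compares desired state with current state and emits only what needs to change. The state dictionaries are hypothetical and the command syntax is an IOS-style assumption.

```python
def plan_interface(desired, current):
    """Return only the commands needed to move `current` to `desired`.

    Both arguments are dicts with "description" and "enabled" keys;
    `desired` also carries the interface "name".
    The command syntax is an IOS-style assumption.
    """
    commands = []
    if desired["description"] != current.get("description"):
        commands.append(f"description {desired['description']}")
    if desired["enabled"] != current.get("enabled"):
        commands.append("no shutdown" if desired["enabled"] else "shutdown")
    if commands:
        # Enter interface context only when something needs to change.
        commands.insert(0, f"interface {desired['name']}")
    return commands


desired = {
    "name": "GigabitEthernet0/0/0",
    "description": "Uplink to Core",
    "enabled": True,
}

# Device has a stale description but is already enabled.
print(plan_interface(desired, {"description": "old", "enabled": True}))
# Device already matches the desired state.
print(plan_interface(desired, dict(desired)))
```

The second call returns an empty list, which is the property the imperative command list lacks: running it again is safe.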
Device Interaction Methods
SSH (Paramiko/Netmiko)
Traditional CLI automation.
Netmiko: Wrapper around Paramiko with vendor-specific handling.
Pros: Works with any device that has SSH
Cons: Screen scraping, fragile, slow
NETCONF/YANG
Structured configuration encoded as XML. (RESTCONF exposes the same YANG models over HTTP with XML or JSON.)
YANG: Data modeling language
NETCONF: Protocol for manipulating YANG-modeled data
Pros: Structured, transactional, supports validation
Cons: Not universally supported, complex to learn
REST APIs
HTTP-based device APIs.
Pros: Easy to use, language-agnostic
Cons: Vendor-specific, not standardized
gRPC/gNMI
Modern streaming telemetry and configuration.
Pros: High performance, streaming capable, structured
Cons: Limited device support (improving)
Recommendation: Use the highest-level API the device supports. Prefer NETCONF over REST, and REST over SSH screen scraping.
Practical Implementation
Starting Small
- Backup automation: Automated config backups to Git
- Documentation: Generate device lists from NetBox
- Read-only validation: Check configs match standards
- Simple config pushes: NTP, SNMP, syslog servers
- Expand gradually: More complex workflows as confidence grows
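A read-only validation step can be very small. The sketch below checks a saved configuration for required lines and reports what is missing, without touching any device. The required lines and the sample configuration are hypothetical and use IOS-style syntax as an assumption.

```python
def missing_lines(config_text, required_lines):
    """Return the required lines that do not appear in the config.

    Matching is exact after stripping surrounding whitespace, so this
    is a simple presence check and not a full configuration parser.
    """
    present = {line.strip() for line in config_text.splitlines()}
    return [line for line in required_lines if line not in present]


# Hypothetical standard and device configuration.
REQUIRED = [
    "ntp server 192.0.2.1",
    "logging host 192.0.2.50",
]

running_config = """
hostname edge1
ntp server 192.0.2.1
"""

print(missing_lines(running_config, REQUIRED))
```

Running this against configs already backed up to Git gives a compliance report with no risk to production.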
Common Use Cases
Day 0 (Provisioning):
- ZTP (Zero Touch Provisioning)
- Base configuration deployment
- Management access setup
Day 1 (Initial Config):
- Service-specific configuration
- Routing protocol setup
- Interface configuration
Day 2 (Operations):
- Config compliance checks
- Certificate renewal
- Software upgrades
Ongoing:
- Automated backups
- Configuration drift detection
- Inventory synchronization
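Configuration drift detection is essentially a diff between the intended configuration (from Git) and the running configuration (from the latest backup). A minimal sketch using Python's standard `difflib`, with hypothetical config text and device name:

```python
import difflib


def config_drift(intended, running, device):
    """Return unified-diff lines between intended and running config.

    An empty list means the device matches its intended configuration.
    """
    diff = difflib.unified_diff(
        intended.splitlines(),
        running.splitlines(),
        fromfile=f"{device}/intended",
        tofile=f"{device}/running",
        lineterm="",
    )
    return list(diff)


# Hypothetical configs for one device.
intended = "hostname edge1\nntp server 192.0.2.1\n"
running = "hostname edge1\nntp server 198.51.100.7\n"

drift = config_drift(intended, running, "edge1")
print("\n".join(drift) if drift else "no drift")
```

Lines starting with `-` exist only in the intended config and lines starting with `+` exist only on the device, so the output can go straight into a ticket or alert.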
Integration with Network Operations
With BGP: Automate peer provisioning, filter generation from IRR
With Traffic Engineering: Deploy TE policies from centralized computation
With Routing Security: Generate prefix lists from RPKI/IRR
See respective sections: BGP, Traffic Engineering, Routing Security
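As one illustration of filter generation, the sketch below turns a list of prefixes into prefix-list lines. It assumes the prefixes have already been fetched from an IRR or RPKI source; the list name, the input prefixes, and the IOS-style `ip prefix-list` syntax are assumptions, and only IPv4 is handled.

```python
import ipaddress


def prefix_list(name, prefixes, seq_start=5, seq_step=5):
    """Render validated, de-duplicated IPv4 prefixes as prefix-list lines.

    Raises ValueError for malformed prefixes or prefixes with host bits
    set, so bad input fails before it reaches a device.
    """
    networks = sorted({ipaddress.IPv4Network(p) for p in prefixes})
    return [
        f"ip prefix-list {name} seq {seq_start + i * seq_step} permit {net}"
        for i, net in enumerate(networks)
    ]


# Hypothetical prefixes, including a duplicate.
lines = prefix_list(
    "AS64500-IN",
    ["198.51.100.0/24", "192.0.2.0/24", "192.0.2.0/24"],
)
print("\n".join(lines))
```

Sorting and de-duplicating the input keeps the generated output stable between runs, which makes diffs in Git meaningful.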
Avoiding Common Pitfalls
Don't automate broken processes: Fix the process first, then automate.
Test in a lab: Never run untested automation in production.
Start with read-only: Prove automation works before making changes.
Have a rollback plan: Every change should be reversible.
Monitor automation: Automation that fails silently is worse than no automation.
Document: Explain why automation does what it does.
Tools Ecosystem
Network-Specific:
- NAPALM - Multivendor abstraction
- Nornir - Python automation framework
- Netmiko - SSH library for network devices
- Scrapli - Modern SSH/Telnet library, often faster than Netmiko
General Automation:
- Ansible - Widely used, large ecosystem
- Salt - Event-driven, agent-based
- Terraform - Infrastructure-as-code (limited network support)
Source of Truth:
- NetBox - Open source network documentation
- Nautobot - NetBox fork with automation focus
Testing:
- Batfish - Network configuration analysis
- GNS3/EVE-NG - Network emulation for testing
Recommended stack for most operators: NetBox + Ansible + Git + GitLab CI