How We Reduced Manual CLI Changes by 60% Using NETCONF and YANG: A Network Automation Story by Ashwani Sugandhi, Cisco Network Engineer

Feb 23, 2026

Expert Insight by Ashwani Sugandhi, Network & Automation Engineer at Cisco Systems

Ashwani spent 4+ years at Cisco Systems automating enterprise data center operations for Microsoft, Standard Chartered Bank, Samsung, and Duke Energy. She holds CCNP, CCNA, and Cisco DevNet Associate certifications — a rare triple that spans both traditional networking and programmable infrastructure. She led the Master Automation Plan that standardized automation practices across Cisco project teams, trained new hires on Python and Robot Framework, and is currently pursuing an M.S. in Computer Science at the University of Cincinnati.

Hundreds of CLI changes per week. That was the reality on one of Cisco's largest Microsoft data center engagements. Every interface configuration, every BGP neighbor, every QoS policy — typed by hand into a terminal, verified by eye, documented in a spreadsheet.

One mistyped subnet mask. One wrong AS number. One forgotten commit. Any of these could take down a production fabric serving millions of users.

Then we started pushing structured configurations through NETCONF using YANG models. Not scripts that scraped CLI output. Not Ansible playbooks that still sent raw commands under the hood. Actual model-driven automation — where the network device understands the intent, validates the schema, and either accepts the change or rejects it cleanly.

Within months, 60% of manual CLI changes never touched a human keyboard again. But the technology was the easy part. The hard part was everything else.

Quick Answers (TL;DR)

What is NETCONF and why does it matter for network automation?

NETCONF (Network Configuration Protocol) is an IETF standard (RFC 6241) that provides a programmatic interface for configuring network devices using structured XML data modeled by YANG. Unlike CLI screen-scraping, NETCONF supports atomic transactions, rollback on error, and schema validation — meaning the device either accepts a valid configuration change completely or rejects it entirely, eliminating partial-configuration failures.

How did NETCONF reduce manual CLI changes by 60%?

By replacing hand-typed CLI commands with structured YANG-modeled configurations pushed via Python (ncclient). Interface configs, BGP neighbors, and routing policies were defined as data templates, validated against the device schema before deployment, and pushed in bulk. The remaining 40% of changes stayed manual because they involved edge cases, troubleshooting, or one-off debugging that required human judgment.

What tools do Cisco network automation engineers actually use?

The core stack: NETCONF + ncclient (Python) for configuration, gNMI + OpenConfig for streaming telemetry, PyATS for regression testing, Robot Framework for end-to-end validation, Terraform for cloud infrastructure, and GitHub Actions for CI/CD pipelines. Ansible is common but often used as a thin wrapper over CLI — real model-driven automation uses NETCONF or gNMI directly.

Why do most network automation projects fail?

Technology is 20% of the problem. The other 80% is organizational: teams that have done CLI for a decade resist change, there is no rollback strategy, no one invests in testing, and automation scripts become unmaintainable because each engineer writes them differently. The fix is a standardized automation plan — shared libraries, coding standards, CI/CD governance, and hands-on training.

What Is Network Automation (And Why CLI Scripting Doesn't Count)

Most network engineers think they're doing automation. They're not. They're doing faster typing.

Writing a Python script that SSHes into a router, sends show ip bgp summary, and parses the output with regex is not automation. It's a brittle hack that breaks every time the vendor changes the CLI output format by one space.

Network Automation

Network automation is the use of programmatic interfaces, data models, and orchestration systems to configure, validate, and monitor network infrastructure without manual CLI interaction. True automation uses structured data (not screen-scraped text), supports atomic transactions (all-or-nothing changes), and includes automated validation — confirming that a change produced the intended result.

There's a spectrum, and most teams are stuck at level 1:

| Level | What It Looks Like | Reliability |
|-------|--------------------|-------------|
| Level 0: Manual CLI | Engineer types commands one by one, copies output into a spreadsheet | Low: human error rate of 3-5% per change window |
| Level 1: CLI Scripting | Python/Expect scripts that SSH in and send CLI commands, parse text output with regex | Medium: breaks when CLI output format changes |
| Level 2: Model-Driven | NETCONF/gNMI with YANG models: structured data in, structured data out, schema validation | High: device validates the config before applying |
| Level 3: Intent-Based | Declare desired state, system computes and applies the diff, validates the outcome | Highest: closed-loop automation with self-healing |

On the Microsoft engagement, we operated between Level 2 and Level 3. The difference between Level 1 and Level 2 is night and day. When a Python script sends CLI commands, the device parses them one line at a time. It can reject a line with bad syntax, but it has no view of the change as a whole: on platforms without a commit model the earlier lines are already live, and a well-formed command carrying the wrong value is executed exactly as typed. With NETCONF and YANG, the device understands the data model. It validates the configuration against a schema before applying it. If something is wrong, it rejects the entire transaction: no partial configs, no half-applied changes sitting on a live production device.
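
To make the Level 1 versus Level 2 contrast concrete, here is a minimal sketch rather than production code. The CLI lines, the XML reply, and the element names are invented for illustration and are not taken from a real device or YANG model. It shows a position-based regex silently failing when one column is added, while a lookup by element name keeps working.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical CLI rows before and after an upgrade that adds an uptime column.
CLI_OLD = "10.0.0.2  4  65002  Established"
CLI_NEW = "10.0.0.2  4  65002  00:12:44  Established"

# Level 1: a regex tied to the exact column layout.
ROW = re.compile(r"^(\S+)\s+\d+\s+(\d+)\s+(\w+)$")


def scrape_state(line):
    """Return the session state, or None when the layout no longer matches."""
    match = ROW.match(line)
    return match.group(3) if match else None


# Level 2: a structured reply, addressed by element name rather than position.
# Element names here are illustrative only.
XML_REPLY = """
<neighbor>
  <neighbor-address>10.0.0.2</neighbor-address>
  <peer-as>65002</peer-as>
  <uptime>00:12:44</uptime>
  <session-state>ESTABLISHED</session-state>
</neighbor>
"""


def structured_state(xml_text):
    """Return the session state by name; extra elements do not matter."""
    return ET.fromstring(xml_text).findtext("session-state")
```

The scraper does not raise an error on the new layout; it simply returns nothing, which is the kind of quiet failure that makes regex parsing hard to trust at scale.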

Key Takeaway

CLI scripting is faster typing. Model-driven automation with NETCONF and YANG is a fundamentally different approach — the device validates the intent before execution, eliminating an entire class of human errors that CLI scripting cannot prevent.

The Stack That Actually Works — NETCONF, YANG, gNMI, and Beyond

Everyone teaches you Python for network automation. And you need it — Python is the glue. But Python is a language, not an architecture. The question isn't "do you know Python?" It's "do you know what to build with it?"

Here's the stack that worked across Microsoft data centers, Standard Chartered Bank's ACI fabric, and Samsung's VXLAN environments:

NETCONF + ncclient: Structured Configuration

NETCONF (RFC 6241) is the protocol. ncclient is the Python library. Together, they let you push structured XML configurations to network devices over SSH.

NETCONF (Network Configuration Protocol)

NETCONF is an IETF standard protocol (RFC 6241) for installing, manipulating, and deleting the configuration of network devices. It uses XML-encoded data over a secure transport (SSH) and supports operations such as get-config, edit-config, commit, and discard-changes, along with optional capabilities such as rollback-on-error and confirmed commit. Together these enable atomic, validated configuration changes that either succeed completely or fail cleanly.

Here's what an actual NETCONF config push looks like in production. This isn't a tutorial example — it's the pattern we used to configure BGP neighbors across hundreds of IOS-XR routers:

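import os

from ncclient import manager

# OpenConfig BGP payload: local AS and router ID, plus one eBGP neighbor.
# OpenConfig list keys point at a leaf inside the config container, so
# neighbor-address appears both as the list key and under <config>.
bgp_config = """
<config xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <bgp xmlns="http://openconfig.net/yang/bgp">
    <global>
      <config>
        <as>65001</as>
        <router-id>10.0.0.1</router-id>
      </config>
    </global>
    <neighbors>
      <neighbor>
        <neighbor-address>10.0.0.2</neighbor-address>
        <config>
          <neighbor-address>10.0.0.2</neighbor-address>
          <peer-as>65002</peer-as>
          <description>DC-SPINE-02</description>
        </config>
      </neighbor>
    </neighbors>
  </bgp>
</config>
"""

# Credentials are read from the environment instead of being hard-coded.
# The variable names NETCONF_USER and NETCONF_PASS are placeholders.
username = os.environ["NETCONF_USER"]
password = os.environ["NETCONF_PASS"]

with manager.connect(
    host="router-01.dc.microsoft.com",
    port=830,
    username=username,
    password=password,
    hostkey_verify=False,  # skips SSH host key checks; enable them in production
    device_params={"name": "iosxr"},
) as m:
    # Stage the change in the candidate datastore, then commit it to running.
    m.edit_config(target="candidate", config=bgp_config)
    m.commit()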

The critical difference from CLI: if the YANG model validation fails — wrong data type, missing required field, invalid neighbor address — the device rejects the entire change. No half-applied configuration. No production outage because one field was wrong.
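
One way to make that all-or-nothing behavior explicit in client code is sketched below. It assumes a session object that behaves like an ncclient manager connected to a device advertising the candidate and validate capabilities; the function name push_candidate is hypothetical. The edit is staged, validated, and committed, and any failure drops the staged edits before re-raising.

```python
def push_candidate(session, config_xml):
    """Stage, validate, and commit a change; discard it on any failure.

    `session` is expected to behave like an ncclient manager for a device
    that advertises the :candidate and :validate capabilities.
    """
    try:
        session.edit_config(target="candidate", config=config_xml)
        session.validate(source="candidate")
        session.commit()
    except Exception:
        # Nothing has reached the running datastore yet, so dropping the
        # staged edits leaves the device as it was before the attempt.
        session.discard_changes()
        raise
```

A production version would probably also lock the candidate datastore for the duration of the change so that two engineers or pipelines cannot interleave edits.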

YANG Models: The Schema That Makes It Work

YANG (RFC 7950) is the data modeling language that defines what a valid configuration looks like. Think of it as a schema for your network. Without YANG, NETCONF is just XML over SSH. With YANG, every configuration element has a defined type, constraints, and relationships.

YANG Data Modeling Language

YANG (RFC 7950) is a data modeling language used to define the structure, syntax, and semantics of network device configurations and operational state. YANG models describe what fields exist, what values they accept, and how elements relate to each other — enabling programmatic validation of configurations before they reach the device.
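
The idea of typed, constrained fields can be illustrated with a toy validator. This is not YANG and not how devices enforce models; it is a small Python stand-in, with a made-up schema dictionary and an illustrative AS-number range, showing the kind of checks (required fields, data types, value ranges) that a YANG model expresses declaratively.

```python
import ipaddress


def _in_range(value, low, high):
    """Convert to int and raise ValueError if outside the inclusive range."""
    number = int(value)
    if not low <= number <= high:
        raise ValueError(f"{number} outside {low}..{high}")
    return number


# Toy stand-in for a few YANG constraints on a BGP neighbor entry.
# Field names mirror the example payload; the rules are illustrative.
NEIGHBOR_SCHEMA = {
    "neighbor-address": {"required": True, "check": ipaddress.ip_address},
    "peer-as": {"required": True, "check": lambda v: _in_range(v, 1, 4294967295)},
    "description": {"required": False, "check": str},
}


def validate(record, schema=NEIGHBOR_SCHEMA):
    """Return a list of problems; an empty list means the record is valid."""
    problems = [f"unknown field: {key}" for key in record if key not in schema]
    for name, rule in schema.items():
        if name not in record:
            if rule["required"]:
                problems.append(f"missing field: {name}")
            continue
        try:
            rule["check"](record[name])
        except (ValueError, TypeError) as exc:
            problems.append(f"{name}: {exc}")
    return problems
```

Running a check like this before a push catches obvious mistakes early, but the device's own model remains the authority on what is valid.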

gNMI + OpenConfig: Real-Time Telemetry

SNMP polling every 5 minutes tells you what happened 5 minutes ago. When a link flaps or a BGP session drops, 5 minutes is an eternity.

gNMI (gRPC Network Management Interface) with OpenConfig models provides streaming telemetry — the device pushes interface counters, routing state, and health metrics in real time. At Microsoft, we used this to replace SNMP polling with sub-second visibility into interface utilization and routing metrics.
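
What changes on the consuming side is that state arrives as a stream of updates instead of a periodic snapshot. The sketch below is not a gNMI client; it assumes the notifications have already been decoded into hypothetical (timestamp, interface, status) tuples and shows how a consumer might flag a link flap as soon as the update arrives.

```python
def detect_flaps(updates):
    """Yield (timestamp, interface, old, new) whenever oper-status changes.

    `updates` is any iterable of (timestamp, interface, status) tuples,
    standing in for decoded notifications from a telemetry subscription.
    """
    last_seen = {}
    for timestamp, interface, status in updates:
        previous = last_seen.get(interface)
        if previous is not None and previous != status:
            yield (timestamp, interface, previous, status)
        last_seen[interface] = status
```

Because the function is a generator, it can sit directly on a live subscription and emit each transition without waiting for a polling cycle.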

PyATS + Robot Framework: Automated Validation

Pushing configs is half the job. Validating that they worked is the other half. PyATS (Python Automated Test System) from Cisco provides structured parsers for device output, so you can validate post-change state programmatically:

  • Did the BGP neighbor come up?
  • Is the interface in the correct VLAN?
  • Are the expected routes in the routing table?

Robot Framework sits on top, providing readable test cases that non-Python engineers can understand and maintain.
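
Once device output is available as structured data, a check such as "did the BGP neighbor come up?" becomes a few lines of code. The sketch below assumes a parsed dictionary with a hypothetical shape (a "neighbors" mapping with a "session_state" field); real parser output differs by platform and command, so the keys would need adjusting.

```python
def failed_checks(parsed_bgp, expected_neighbors):
    """Return readable failures for neighbors that are missing or not up.

    `parsed_bgp` uses an assumed layout:
    {"neighbors": {"10.0.0.2": {"session_state": "Established"}}}
    """
    failures = []
    neighbors = parsed_bgp.get("neighbors", {})
    for address in expected_neighbors:
        state = neighbors.get(address, {}).get("session_state")
        if state is None:
            failures.append(f"{address}: neighbor not found")
        elif state.lower() != "established":
            failures.append(f"{address}: state is {state}")
    return failures
```

An empty list means every expected neighbor is established, which makes the function easy to wrap in a test case that passes or fails on the result.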

Key Takeaway

The network automation stack is not a single tool — it's an architecture. NETCONF + YANG handles configuration, gNMI + OpenConfig handles telemetry, PyATS + Robot Framework handles validation, and CI/CD ties it all together. Learning Python without understanding this architecture is like learning a language without knowing what to say.

Real Project: Automating Microsoft's Data Center Fabric

This was the largest and most complex engagement — validating and automating large-scale data center topologies supporting IPv4/IPv6, BGP, MPLS, Segment Routing, VXLAN, QoS, and MACsec.

The Problem

Microsoft's data center fabric had rigorous standards. Every network change — interface configuration, routing policy update, QoS adjustment — required manual validation against Microsoft's specifications. Before automation, this meant:

  • Engineers typing CLI commands one device at a time
  • Manual comparison of running configs against baseline specifications
  • Regression testing performed by hand after every software upgrade
  • Throughput and latency benchmarking using Spirent — results validated manually

The volume was relentless. Hundreds of changes per sprint, across multiple device platforms (IOS-XR, NX-OS), with zero tolerance for deviation from Microsoft's standards.

The Solution

We replaced manual CLI operations with NETCONF-based configuration automation using Python and ncclient. The approach:

  1. Define configurations as YANG-modeled templates — BGP neighbors, interface parameters, routing policies became structured data, not CLI scripts
  2. Push via NETCONF with schema validation — the device validates before applying, eliminating typo-induced outages
  3. Validate with PyATS and Robot Framework — automated post-change verification confirms the config produced the intended state
  4. Benchmark with Spirent — traffic generation and performance analysis validated throughput and latency against Microsoft's standards
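
The first step, turning configuration into data, might look roughly like the sketch below. It is an illustration, not the project's actual templating code: the function name render_bgp and the neighbor dictionary keys are assumptions, and the XML layout follows the OpenConfig-style payload shown earlier in this article.

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
OC_BGP_NS = "http://openconfig.net/yang/bgp"


def render_bgp(local_as, neighbors):
    """Build a <config> payload from plain data.

    `neighbors` is a list of dicts with hypothetical keys:
    "address", "peer_as", and an optional "description".
    """
    config = ET.Element("config", xmlns=NETCONF_NS)
    bgp = ET.SubElement(config, "bgp", xmlns=OC_BGP_NS)
    global_cfg = ET.SubElement(ET.SubElement(bgp, "global"), "config")
    ET.SubElement(global_cfg, "as").text = str(local_as)

    neighbor_list = ET.SubElement(bgp, "neighbors")
    for item in neighbors:
        neighbor = ET.SubElement(neighbor_list, "neighbor")
        ET.SubElement(neighbor, "neighbor-address").text = item["address"]
        cfg = ET.SubElement(neighbor, "config")
        ET.SubElement(cfg, "neighbor-address").text = item["address"]
        ET.SubElement(cfg, "peer-as").text = str(item["peer_as"])
        if item.get("description"):
            ET.SubElement(cfg, "description").text = item["description"]
    return ET.tostring(config, encoding="unicode")
```

Building the payload with an XML library instead of string formatting means special characters in descriptions are escaped automatically, and the same data can feed hundreds of devices by changing only the input list.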

  • 60%: Reduction in manual CLI-based changes
  • 100%: Compliance with Microsoft network standards
  • Multiple: "Getting to the Finish Line" awards earned

What Stayed Manual (And Why)

The 40% that remained manual wasn't a failure — it was a design decision. Some operations require human judgment:

  • Troubleshooting live issues — when a BGP session flaps at 2 AM, you need an engineer reading show commands and making real-time decisions
  • One-off debugging — packet captures, traceroutes, and protocol-level analysis don't lend themselves to templates
  • Edge cases in new deployments — first-time configurations for new topologies require human validation before they can become templates

Automation replaces the repeatable. It amplifies the engineer — it doesn't eliminate them.

The biggest misconception about network automation is that it replaces network engineers. It doesn't. It replaces the boring, error-prone parts of their job so they can focus on architecture, troubleshooting, and design — the work that actually requires expertise.

Ashwani Sugandhi, Network & Automation Engineer, Cisco Systems

Key Takeaway

Network automation at Microsoft scale meant replacing 60% of manual CLI operations with NETCONF-based model-driven configuration. The remaining 40% stayed manual by design — troubleshooting, debugging, and first-time deployments require human judgment that templates cannot replicate.

Real Project: CI/CD for Network Changes at Standard Chartered Bank

At Standard Chartered Bank, the challenge wasn't configuration — it was validation. The bank's ACI (Application Centric Infrastructure) data center fabric required rigorous pre- and post-change checks for every maintenance window. Each check was manual, time-consuming, and prone to human oversight.

The Problem

Before every change window, engineers ran dozens of show commands, copied output into documents, and compared results line by line. After the change, the same process repeated. A single change window could consume 4-6 hours of manual pre/post-check work — and the results still depended on whether the engineer caught every deviation.

The Solution

We automated the entire pre/post-check workflow using Robot Framework with TextFSM for structured parsing:

  • Pre-check automation: Scripts captured the state of ACI fabric — tenant configurations, endpoint groups, contracts, interface status — in a structured format before any change
  • Post-check automation: The same scripts ran after the change, comparing results against the pre-change baseline
  • Deviation alerting: Any difference between pre and post states was flagged automatically
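
The pre/post comparison described in the list above can be sketched in a few lines. This is a simplified illustration, assuming both snapshots have already been parsed into nested dictionaries; the function names and the example keys are hypothetical.

```python
def flatten(state, prefix=""):
    """Flatten nested dicts into {"a/b/c": value} for easy comparison."""
    flat = {}
    for key, value in state.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def deviations(pre, post):
    """Return sorted (path, before, after) tuples for every difference."""
    before, after = flatten(pre), flatten(post)
    return [
        (path, before.get(path), after.get(path))
        for path in sorted(before.keys() | after.keys())
        if before.get(path) != after.get(path)
    ]
```

A value of None in the before or after position means the item appeared or disappeared during the change window, which is often the deviation a tired engineer is most likely to miss by eye.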

But the real breakthrough was the CI/CD pipeline built with GitHub Actions:

[Figure: CI/CD Pipeline for Network Changes (interactive, 6 steps; step details not preserved)]

The Results

  • 35%: Reduction in post-change incidents
  • 40%: Reduction in manual pre/post-check effort

The APIC API automation scripts earned recognition from Cisco leadership for creating new business opportunities in the APJC region — proving that automation isn't just an operational efficiency play, it's a revenue driver.

The Governance Battle

The hardest part wasn't building the pipeline. It was getting approval to use it.

Enterprise banks have strict change management governance. Every tool that touches production infrastructure requires security review, compliance sign-off, and integration with existing ITSM workflows. The technical build took weeks. The governance approval took months.

The lesson: if you're building network automation for a regulated industry, start the compliance conversation on day one — not after the code is written.

Key Takeaway

CI/CD for network infrastructure is not just a DevOps trend — it's a governance tool. At Standard Chartered Bank, the GitHub Actions pipeline didn't just reduce incidents by 35%. It created an auditable, repeatable process that satisfied compliance requirements better than manual change windows ever could.

Real Project: The $600K VXLAN Sprint at Samsung

This one was about speed. Samsung needed VXLAN Testing Automation scripts — and they needed them fast.

The Sprint

Ten to twelve Robot Framework scripts per week. For an entire month. Each script validated a specific aspect of VXLAN fabric behavior: overlay tunnel establishment, VTEP discovery, VNI-to-VLAN mappings, BUM traffic handling, multi-tenancy isolation.

  • 10-12: Robot Framework scripts delivered per week
  • 1 month: Total delivery time
  • $600K: Upsell opportunity generated

Why Speed Matters for Automation ROI

Most automation projects die in a pilot that takes too long. Stakeholders lose patience. Budgets get reallocated. The team moves on to the next priority.

The Samsung sprint proved something important: when automation delivers visible results fast, it generates its own funding. The 40+ scripts we delivered in one month demonstrated enough value that Samsung committed to a $600K expansion of the automation engagement.

Key Takeaway

The best automation investment is the one that proves ROI within the first month. At Samsung, a focused VXLAN automation sprint — 40+ Robot Framework scripts in 30 days — generated a $600K upsell because speed of delivery built confidence in the approach.

Why Most Network Automation Initiatives Fail

Here's the contrarian take that four years at Cisco taught me: network automation failures are almost never technical. The tools work. NETCONF works. Python works. CI/CD works.

The failures are organizational.

Why Network Automation Projects Fail
  • No standardization — every engineer writes scripts their own way, creating unmaintainable spaghetti code
  • No testing culture — automation scripts are pushed to production without validation, causing the very outages they were supposed to prevent
  • No rollback strategy — when an automated change goes wrong, there is no automated way to undo it
  • Team resistance — engineers who have used CLI for 10+ years see automation as a threat, not a tool
  • Over-engineering the first iteration — teams try to build a complete self-healing network before they have automated a single show command

The Master Automation Plan

At Cisco, I created the Master Automation Plan to solve exactly these problems. The plan had three pillars:

1. Standardization — Shared Libraries, Not Individual Scripts

Instead of every engineer writing their own connection handler, parser, and output formatter, we built shared libraries that everyone used. This improved code reusability by 30-40% across projects.

2. Training — Build Skills, Not Dependencies

I designed a training pathway on the Degreed learning platform and ran organization-wide sessions on Python, Robot Framework, and automation best practices. The goal was to make every team member capable of contributing to automation — not dependent on a single "automation person."

3. Governance — CI/CD for Network Code

Every automation script went through code review, linting, and regression testing before deployment. This wasn't optional. If your script didn't pass CI, it didn't deploy. The same discipline that software engineering teams take for granted was completely absent from network operations.
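
As a small example of what such a gate can check, the sketch below verifies that every configuration payload in a repository is at least well-formed XML before anything is deployed. It is an assumed, minimal check and not the actual pipeline: the function name lint_payloads is hypothetical, and a real gate would layer code linting and regression tests on top.

```python
import xml.etree.ElementTree as ET


def lint_payloads(payloads):
    """Return {name: error} for every payload that is not well-formed XML.

    `payloads` maps a file or template name to its XML text.
    """
    errors = {}
    for name, text in payloads.items():
        try:
            ET.fromstring(text)
        except ET.ParseError as exc:
            errors[name] = str(exc)
    return errors
```

In a CI job, a non-empty result would cause the step to exit with a failure code, which is what turns "please check your templates" into a rule nobody can skip.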

Where to Start

If your team has zero automation today, don't start with NETCONF or CI/CD. Start with automating one show command that your team runs every day. Parse the output. Store it. Compare it over time. Once the team sees the value, they'll ask for more, and that's when you introduce model-driven automation.

Key Takeaway

Network automation is 20% technology and 80% organizational change. The Master Automation Plan worked because it addressed all three failure modes: inconsistent code (standardization), skill gaps (training), and lack of discipline (CI/CD governance). Without all three, even the best tools fail.

The Network Automation Toolbox — An Honest Assessment

After four years of using these tools in production across multiple enterprise clients, here's the unfiltered assessment — what works, what doesn't, and when to use each.

| Tool / Protocol | Best For | Limitations |
|-----------------|----------|-------------|
| NETCONF + ncclient | Structured config management on IOS-XR, IOS-XE, Junos. Atomic transactions with rollback. | XML verbosity. Not all features have YANG model coverage. Vendor-specific model deviations. |
| gNMI + OpenConfig | Real-time streaming telemetry. Sub-second interface and routing metrics. | Fewer vendors support it natively. Client tooling less mature than NETCONF ecosystem. |
| Ansible | Quick wins for teams new to automation. Playbook syntax is readable. Large module ecosystem. | Often just a CLI wrapper under the hood. No true model-driven validation. Scaling issues with large inventories. |
| Terraform | Cloud infrastructure (AWS VPC, Transit Gateway, security groups). Repeatable lab environments. | Limited native support for network device configuration. Better for cloud-adjacent networking than box-level config. |
| PyATS + Genie | Automated testing and validation. Structured parsers for 1000+ CLI commands across vendors. | Learning curve. Cisco-centric (though community parsers exist for other vendors). |
| Robot Framework | End-to-end test suites. Readable test cases for non-Python engineers. Strong reporting. | Slower execution than pure Python. Can become unwieldy for very complex test logic. |
| TextFSM / NTC Templates | Parsing unstructured CLI output into structured data. Useful when NETCONF is not available. | Regex-based, so fragile when output format changes. A bridge technology, not a long-term solution. |

Pros
  • Model-driven automation (NETCONF/gNMI) eliminates entire categories of human error through schema validation
  • CI/CD pipelines create auditable change trails that satisfy compliance requirements
  • Streaming telemetry provides real-time visibility that SNMP polling cannot match
  • Standardized automation frameworks reduce onboarding time for new engineers by 30-40%
  • Automated pre/post-change validation catches configuration drift before it causes outages

Cons
  • YANG model coverage varies wildly between vendors and even between platforms from the same vendor
  • Initial investment is significant — building the automation framework takes months before ROI materializes
  • Not every operation should be automated — troubleshooting and edge-case debugging require human judgment
  • Organizational resistance from experienced CLI engineers can stall adoption for years
  • Maintaining automation scripts requires the same discipline as maintaining application code — many teams underestimate this

The Tool Nobody Talks About: Culture

No tool in this list matters if the team doesn't trust automation. The engineers who have been typing CLI commands for 15 years need to see automation succeed — in production, on their devices, without breaking anything — before they'll adopt it. That means starting small, proving value fast, and never deploying untested automation to production.

Key Takeaway

The best network automation tool is the one your team will actually use. Start with the simplest tool that solves a real pain point (often PyATS or Robot Framework for validation), prove value, then graduate to model-driven automation with NETCONF and YANG once the team trusts the approach.

Key Takeaways: Network Automation with NETCONF and YANG
  1. CLI scripting is faster typing, not automation. Model-driven automation with NETCONF and YANG provides schema validation, atomic transactions, and rollback, eliminating entire categories of human error
  2. At Microsoft, NETCONF-based automation reduced manual CLI changes by 60%. The remaining 40% stayed manual by design: troubleshooting and edge-case debugging require human judgment
  3. CI/CD for network changes reduced post-change incidents by 35% at Standard Chartered Bank, while also satisfying compliance and audit requirements
  4. Speed of delivery matters for automation ROI. A one-month VXLAN sprint at Samsung delivered 40+ scripts and generated a $600K upsell opportunity
  5. Network automation fails for organizational reasons, not technical ones. Standardization, training, and CI/CD governance (the Master Automation Plan) address the 80% that technology alone cannot fix
  6. Start small. Automate one show command. Parse the output. Store it. Once the team sees value, they will ask for more

FAQ

What is NETCONF and how does it differ from SSH/CLI automation?

NETCONF (RFC 6241) is a protocol that provides a programmatic, structured interface for configuring network devices using XML data modeled by YANG. Unlike SSH-based CLI automation — which sends text commands and parses text output — NETCONF uses structured data with schema validation. The device validates configurations against its YANG model before applying them, supports atomic transactions (all-or-nothing changes), and provides native rollback capability. This eliminates partial-configuration failures that are common with CLI scripting.

What is YANG and why is it important for network automation?

YANG (RFC 7950) is a data modeling language that defines the structure, syntax, and semantics of network device configurations and operational state. It specifies what configuration fields exist, what values they accept (data types, ranges, patterns), and how elements relate to each other. Without YANG, NETCONF is just XML transport. With YANG, every configuration change is validated against a formal schema — catching errors before they reach production devices.

What tools do network automation engineers use?

The production stack typically includes: NETCONF + ncclient (Python) for structured configuration management, gNMI + OpenConfig for streaming telemetry, PyATS + Genie for automated testing and CLI output parsing, Robot Framework for end-to-end validation suites, Terraform for cloud infrastructure (AWS VPC, Transit Gateway), and GitHub Actions for CI/CD pipeline automation. Ansible is common for quick wins but often operates as a CLI wrapper rather than true model-driven automation.

How do I get started with network automation if my team uses only CLI?

Start with automated validation, not configuration. Pick one show command your team runs daily (e.g., show ip bgp summary). Write a Python script that connects via SSH, runs the command, parses the output with TextFSM or PyATS, and stores the result. Run it daily and compare results over time. Once the team sees the value of automated monitoring, introduce NETCONF for configuration changes — starting with low-risk, repeatable operations like interface descriptions or SNMP community strings.
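
The store-and-compare half of that first project can be sketched with the standard library alone. The connection and parsing steps are left out because they depend on the platform and parser you choose; the sketch assumes the parsed result is a flat dictionary (for example, neighbor address to session state), and the function names are hypothetical.

```python
import json
from pathlib import Path


def save_snapshot(directory, day, parsed):
    """Write one day's parsed output to <directory>/<day>.json."""
    path = Path(directory) / f"{day}.json"
    path.write_text(json.dumps(parsed, indent=2, sort_keys=True))
    return path


def changed_keys(directory, old_day, new_day):
    """Return the top-level keys whose values differ between two snapshots."""
    old = json.loads((Path(directory) / f"{old_day}.json").read_text())
    new = json.loads((Path(directory) / f"{new_day}.json").read_text())
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

Keeping the snapshots as sorted JSON files also means they can be committed to version control, where an ordinary diff shows what changed from one day to the next.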

What is the difference between NETCONF, RESTCONF, and gNMI?

All three are programmatic interfaces for network devices, but they serve different primary use cases. NETCONF (RFC 6241) uses XML over SSH — best for configuration management with atomic transactions and rollback. RESTCONF (RFC 8040) uses JSON/XML over HTTPS — best for integration with web-based systems and REST APIs. gNMI (gRPC Network Management Interface) uses Protocol Buffers over gRPC — best for high-performance streaming telemetry and real-time monitoring. Most production environments use NETCONF for config and gNMI for telemetry.

Is Ansible good enough for network automation?

Ansible is a good starting point — its playbook syntax is readable, it has a large module ecosystem, and it lowers the barrier to entry. However, most Ansible network modules are CLI wrappers: they SSH into the device, send text commands, and parse text output. This means you don't get schema validation, atomic transactions, or rollback. For production-grade automation at scale, model-driven approaches using NETCONF or gNMI directly provide stronger guarantees. Many teams start with Ansible and graduate to NETCONF as their automation maturity increases.

How do you handle rollback when automated network changes fail?

NETCONF supports rollback through its candidate configuration datastore. Changes are staged in the candidate config, validated, and only committed to the running config after passing checks. If validation fails or the commit is rejected, the running config is never touched, and the client clears the staged edits with discard-changes. Devices that advertise the confirmed-commit capability go a step further: a commit reverts automatically unless it is confirmed before a timeout expires. For CI/CD pipelines, we implement automated rollback triggers: if post-deployment validation detects a deviation from the expected state, the pipeline automatically reverts to the last known-good configuration.
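
A rollback trigger of this kind can lean on NETCONF's confirmed commit. The sketch below is an assumption about how such a step might be written, not the pipeline's actual code: it expects a session that behaves like an ncclient manager on a device advertising the confirmed-commit capability, and post_check is a hypothetical callable that returns True when the network looks healthy.

```python
def commit_with_rollback(session, config_xml, post_check, timeout_s=120):
    """Apply a change under a confirmed commit; confirm only if checks pass.

    Returns True when the change was confirmed. When it returns False, no
    confirming commit is sent, so the device reverts once the timer expires.
    """
    session.edit_config(target="candidate", config=config_xml)
    # Provisional commit: reverts automatically unless confirmed in time.
    session.commit(confirmed=True, timeout=str(timeout_s))
    if post_check():
        session.commit()  # confirming commit makes the change permanent
        return True
    return False
```

The useful property is that the safe outcome is the default: if the pipeline crashes or loses connectivity after the provisional commit, the device still rolls itself back.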

Sources
  1. RFC 6241: Network Configuration Protocol (NETCONF). R. Enns, M. Bjorklund, A. Bierman, J. Schoenwaelder. IETF (2011)
  2. RFC 7950: The YANG 1.1 Data Modeling Language. M. Bjorklund. IETF (2016)
  3. gNMI Specification: gRPC Network Management Interface. OpenConfig Working Group
  4. Cisco PyATS Documentation: Python Automated Test System. Cisco Systems
  5. OpenConfig: Vendor-Neutral Network Configuration Models. OpenConfig Working Group
  6. ncclient: Python Library for NETCONF Clients. ncclient contributors