Why Automate? Ansible Playbooks and Desired State for Network Operating Systems
Writing your own code isn’t always the answer
Often, communities such as Python’s will contribute code of substantially higher quality than what you or I can create individually.
This is OK. In nearly every case, dyed-in-the-wool traditionalist programmers consume “libraries” in their language of choice; the idea that developers create everything they use is only an outsider’s perspective.
In modern engineering, a true engineer or architect will often apply practices they studied in college to real-world situations instead of trying to create their own solutions. This doesn’t discount creativity, nor does it discount those who are more pragmatically oriented. Without creativity, we have no way to improve engineering practice, and without pragmatism, we have seen some pretty serious loss of life: https://interestingengineering.com/23-engineering-disasters-of-all-time
…but you still have a lot of work to do
Adapting engineering practices, code from the internet, or Googled Cisco example topologies as a matter of routine does take work. Do you trust all code from Stack Overflow? From Cisco-answers.net (not a real website)?
You shouldn’t, and modern engineering practice doesn’t either. In nearly every case, the ability to apply engineering practice to a problem comes from years of training and from millennia of past examples (failures and successes), ideally in similar applications. A good example is the study of brittle fracture, where maximizing material hardness is no longer an automatic win but can become a serious safety risk.
We live in a simpler world of abstraction and pure mathematics, where behaviors are a lot more reliable, but not perfectly so. We as designers and implementers of computer solutions (network, systems, don’t care) can learn from our more disciplined cousins. I’ll write more on this later, but for now, let’s at least agree to review every action critically.
Let’s look at this through the lens of an engineer evaluating a technical control. Ansible will be my example, as it’s probably the most straightforward.
While it is possible to run a standalone, self-supporting playbook, it’s not generally recommended at scale. The first step towards leveraging this automation is defining an inventory. This is typically written in YAML, so most of the effort goes into structuring your data rather than doing actual work.
- Don’t let names collide between production, lab, etc. We don’t want a WarGames scenario in anybody’s production network.
- Make sure it makes sense. It’s pretty easy to over/under-organize; think about the smallest elemental unit you may work on.
- Leverage Source Control! Save a copy, keep your revision history. Even better, get peer reviews.
- Remember, this can be edited later! This should continually improve.
Example (loosely based on https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html)
I’m using the project (virtualized Clos Topologies) as a prefix, and then organizing device types from there. Spines don’t need VLANs, and will be route reflectors — which is enough to justify separation in this case.
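The inventory file itself isn’t reproduced in this post. As a rough sketch of the layout just described, a YAML inventory might look like the following; every group name, host name, and address is a hypothetical placeholder (addresses are from the 192.0.2.0/24 documentation range), not the contents of my repository.

```yaml
# Hypothetical hosts.yml: names and addresses are placeholders for illustration.
all:
  children:
    vclos:                      # project prefix keeps lab names from colliding with production
      children:
        vclos_spines:           # route reflectors, no VLANs
          hosts:
            vclos-spine-01:
              ansible_host: 192.0.2.11
            vclos-spine-02:
              ansible_host: 192.0.2.12
        vclos_leaves:           # leaf switches that carry VLANs
          hosts:
            vclos-leaf-01:
              ansible_host: 192.0.2.21
            vclos-leaf-02:
              ansible_host: 192.0.2.22
```

Running `ansible-inventory -i hosts.yml --graph` prints the resulting group tree, which is a quick way to confirm the hierarchy before any playbook touches a device.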
Let’s walk through what I’ve done here. There are a few deviations from the typical setup:
- YAML Inventory: This is just personal preference; as a Linux guy, I prefer it over the INI format. It also helps a lot with structured hierarchies, which I like as a network guy.
- Variable declarations:
  - ansible_network_os: More or less does exactly what it says. There’s built-in support for VyOS (the vyos.vyos collection), but this is really only true for a handful of network distributions. You can get more from Ansible Galaxy, but those should be tested extensively.
  - ansible_connection: This is basically the "driver" for the CLI. You can use Paramiko or SSH as well; this is primarily governed by your network OS.
  - ansible_user: Just instructs the control node on what username to attempt against the target host.
Outside of this, I have also set up SSH key authentication to all VyOS nodes. It’s pretty easy (see https://wiki.vyos.net/wiki/Remote_access).
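Put together, the variables above might live in a group_vars file like the one below. The file name, user name, and key path are assumptions for illustration. The fully qualified values shown for ansible_network_os and ansible_connection are the names used by the vyos.vyos and ansible.netcommon collections; that the network_cli connection picks up the key via ansible_private_key_file is an assumption worth verifying against your installed collection version.

```yaml
# Hypothetical group_vars/vclos.yml: applies to every host under the vclos group.
ansible_network_os: vyos.vyos.vyos                   # VyOS platform plugins from the vyos.vyos collection
ansible_connection: ansible.netcommon.network_cli    # talk to the device CLI over SSH
ansible_user: vyos                                   # placeholder account name
ansible_private_key_file: ~/.ssh/vclos_ed25519       # placeholder path; assumed honored by network_cli
```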
Before designing a playbook, we do need to cover some of Ansible’s key design values:
- Idempotency: Run it once or many times, and get the same result every time. If a change has already been made and is invasive, don’t repeat it unless the state doesn’t match.
- Thin Veil of Abstraction: You should be aware of what is being implemented from a technical perspective, but not have to control every last aspect of it.
- Be Declarative: Try to design from the abstract concept you want to implement, and fill in the technical details as needed, not the other way around.
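To make "be declarative" and idempotency a little more concrete, here is a hedged sketch using the vyos.vyos.vyos_interfaces resource module; it’s an illustration I’ve picked, not part of the setup in this post. The task describes what the interfaces should look like, and the module compares that with the device and only sends commands for the difference, so a second run against an unchanged device should report no changes. The group name and descriptions are placeholders.

```yaml
# Hypothetical play: declare the desired interface state, not the commands.
- hosts: vclos_leaves            # placeholder group name
  gather_facts: false
  tasks:
    - name: Declare leaf uplink interfaces
      vyos.vyos.vyos_interfaces:
        config:
          - name: eth1
            description: uplink to vclos-spine-01
            enabled: true
          - name: eth2
            description: uplink to vclos-spine-02
            enabled: true
        state: merged            # add/update only what is listed; leave other interfaces alone
```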
Day 0, get the system online
In this example, we want four devices to have some level of usable configuration, without lots of manual, error-prone editing to get there. We’re going to adapt my base configuration for this purpose by re-tooling it to support Jinja deployments. At a high level, a Jinja templating playbook will:
- Load Variables: This will be a separate file, effectively defining the "what" of your deployment
- Load Template, then translate variables: This will be executed by the
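As an illustration of the "what", a variables file could look something like this. Its structure and every key in it are hypothetical; the real vyos-base.yml is in the linked repository. The spines carry no VLANs, mirroring the separation described for the inventory.

```yaml
# Hypothetical vyos-base.yml: illustrative keys only, not the author's file.
domain_name: lab.example          # shared by every device
name_servers:
  - 192.0.2.53
devices:
  - hostname: vclos-spine-01
    role: spine                   # route reflector, no VLANs
    loopback: 10.255.0.1/32
  - hostname: vclos-spine-02
    role: spine
    loopback: 10.255.0.2/32
  - hostname: vclos-leaf-01
    role: leaf
    loopback: 10.255.1.1/32
    vlans: [10, 20]
  - hostname: vclos-leaf-02
    role: leaf
    loopback: 10.255.1.2/32
    vlans: [10, 20]
```

Inside templates/vyos-base.j2, these values would be referenced with ordinary Jinja2 expressions, for example {{ domain_name }}, or a for loop over a device’s vlans.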
We’ll keep this example pretty short (the full version is available in the linked repository), but we also want to leverage idempotency for future changes. It doesn’t use the inventory, because it’s creating base configurations to be applied by some other method.
Fun fact: this is the first stage of any Infrastructure-as-Code implementation. The resulting files (*-compiled.conf) can be applied directly, or by using a "Day 2 Method".
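```yaml
# Day 0: render a base configuration locally from a variables file and a Jinja2 template
- hosts: localhost
  gather_facts: false          # nothing on localhost needs inspecting for this job
  tasks:
    # Load the "what" of the deployment into play variables
    - name: Import Vars...
      include_vars:
        file: vyos-base.yml

    # Merge those variables into the template and write the result to disk
    - name: Combine vyos...
      template:
        src: templates/vyos-base.j2
        dest: vyos-compiled.conf
```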
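The playbook above writes a single vyos-compiled.conf. If the variables file held a list of devices (a hypothetical devices list; the original file isn’t confirmed to contain one), the same two tasks could render one file per device, matching the *-compiled.conf pattern. Inside the template, each device’s values would then be read from item, for example {{ item.hostname }}.

```yaml
# Hypothetical variant: one rendered file per entry in an assumed "devices" list.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Import variables
      ansible.builtin.include_vars:
        file: vyos-base.yml

    - name: Render one configuration per device
      ansible.builtin.template:
        src: templates/vyos-base.j2
        dest: "{{ item.hostname }}-compiled.conf"   # e.g. vclos-leaf-01-compiled.conf
      loop: "{{ devices }}"
      loop_control:
        label: "{{ item.hostname }}"                # keep task output readable
```

Keeping the file names equal to the inventory host names makes it easy for a later playbook to find each device’s file via inventory_hostname.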
Day 2, apply routine changes
In this example, we’ve already started the deployment and have it up and running. We have some form of routine change to make, but we want it to be applied consistently and idempotently. In an ideal world with this method, the change playbook itself shouldn’t contain anything about the specific change.
If re-executed, this will re-apply any changes that are staged via the base configuration and Jinja merge.
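The Day 2 playbook isn’t reproduced here. One possible shape for it, sketched under the assumptions that each compiled file is named after its inventory host and contains configuration that vyos.vyos.vyos_config can load (for example, set commands), is:

```yaml
# Hypothetical Day 2 play: push each device's compiled configuration.
- hosts: vclos                   # placeholder group name
  gather_facts: false
  tasks:
    - name: Apply compiled configuration
      vyos.vyos.vyos_config:
        src: "{{ inventory_hostname }}-compiled.conf"   # assumed per-host file name
        save: true               # also write the committed config to the boot config
```

Because the playbook only points at the compiled file, a routine change is made in the variables or template, re-rendered, and pushed by this same unchanged playbook.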
Note: This particular network driver is not idempotent. In production networks, something like NAPALM/Nornir may be more appropriate. You can verify whether a method is idempotent by running the playbook repeatedly; an expected result is
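The simplest test is to run the same playbook twice and check that the PLAY RECAP reports changed=0 for every host on the second run. The same idea can be written as a hedged sketch: a dry run (check_mode) of a Day 2 style task, followed by an assertion that nothing would change. The group name and file naming are hypothetical placeholders; with a non-idempotent driver, expect this check to fail, which is exactly what it is meant to catch.

```yaml
# Hypothetical idempotency check: a dry-run pass should report no changes.
- hosts: vclos                   # placeholder group name
  gather_facts: false
  tasks:
    - name: Dry-run the compiled configuration
      vyos.vyos.vyos_config:
        src: "{{ inventory_hostname }}-compiled.conf"   # assumed per-host file name
      check_mode: true           # compare only; nothing is committed
      register: second_pass

    - name: Flag devices where the change would be applied again
      ansible.builtin.assert:
        that:
          - not second_pass.changed
        fail_msg: "{{ inventory_hostname }} would be changed again; this method is not idempotent here"
```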
The next step is important — automatically updating a network based on configuration changes! As always, my source code for executing this is here. Note that this is a moving project and will get updates with future posts.