Ansible Adventures

Fixed! Discourse had a glitch formatting the quote for some reason so I retyped that section.

1 Like

lol, I need some oversight…

This parses a list of physical interfaces on the host from dmesg output. It provides the true hardware MAC addresses and skips over VLANs, virtual interfaces, etc. Also, if a device was plugged in but was not present when Ansible gathered facts (its device name is not in ansible_interfaces), it is not included in the list.


  - name: Get boot log (dmesg)
    ansible.builtin.command:
      cmd: dmesg
    become: true
    changed_when: false
    register: dmesg_reg

  - name: Parse hardware addresses and device names from boot log
    ansible.builtin.set_fact:
      phy_ifaces: >-
        {{  phy_ifaces
            | default([])
            | union(  [ { 'dev':  iface_lines_item
                                  | map('regex_replace', ':', '')
                                  | intersect(ansible_interfaces)
                                  | join,
                          'hw_addr':  iface_lines_item
                                      | select( 'match',
                                                '^' + hw_addr_re_var + '$' )
                                      | join } ] ) }}
    vars:
      hw_addr_re_var: '([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}'
    when:
    - iface_lines_item
      | map('regex_replace', ':', '')
      | intersect(ansible_interfaces)
      != []
    - iface_lines_item
      | select( 'match', '^' + hw_addr_re_var + '$' )
      != []
    loop: "{{ dmesg_reg['stdout_lines']
              | map('lower')
              | select('match', '^.*' + hw_addr_re_var + '.*$')
              | map('split', ' ') }}"
    loop_control:
      loop_var: iface_lines_item

  - debug:
      var: phy_ifaces
ok: [fedora35] => {
    "phy_ifaces": [
        {
            "dev": "eth0",
            "hw_addr": "08:00:27:fe:97:ba"
        },
        {
            "dev": "eth1",
            "hw_addr": "08:00:27:ab:7d:a6"
        },
        {
            "dev": "eth2",
            "hw_addr": "0a:5d:08:6d:fe:2f"
        }
    ]
}
ok: [debian11] => {
    "phy_ifaces": [
        {
            "dev": "eth0",
            "hw_addr": "08:00:27:fe:b5:aa"
        },
        {
            "dev": "eth1",
            "hw_addr": "08:00:27:4e:fb:3d"
        },
        {
            "dev": "eth2",
            "hw_addr": "08:00:27:0d:a6:90"
        }
    ]
}
ok: [arch-current] => {
    "phy_ifaces": [
        {
            "dev": "eth0",
            "hw_addr": "08:00:27:42:78:de"
        },
        {
            "dev": "eth1",
            "hw_addr": "08:00:27:02:cd:8d"
        }
    ]
}
ok: [openbsd7] => {
    "phy_ifaces": [
        {
            "dev": "em0",
            "hw_addr": "08:00:27:4d:a2:90"
        },
        {
            "dev": "em1",
            "hw_addr": "0a:5d:08:6d:00:01"
        },
        {
            "dev": "em2",
            "hw_addr": "0a:5d:08:6d:00:02"
        },
        {
            "dev": "em3",
            "hw_addr": "0a:5d:08:6d:00:03"
        }
    ]
}

This would be a universal solution to discovering physical interfaces on a host, except on Arch and macOS, the dmesg buffer tends to get full of garbage and you lose the early boot messages pretty quickly. On Arch, this is mostly attributable to auditd and on macOS, it was primarily the wifi spamming the kernel log.

I was planning to use nmcli to do this if it was available, but I realized that what it calls the hardware address does not report the true hardware address of the NIC if it’s configured to be randomized. However, the better solution appears to be the ip command, which shows the “permanent” hardware address in the output of ip link if it differs from the configured one (haven’t tested this on all distros yet though).
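
Something like this is what I have in mind for the ip link approach (an untested sketch; the register/fact names are just placeholders, and it assumes iproute2 prints a permaddr field whenever the configured MAC differs from the hardware one):

- name: Get link details for eth0
  ansible.builtin.command:
    cmd: ip -d link show eth0
  changed_when: false
  register: ip_link_reg

- name: Prefer the permanent hardware address if one is reported
  ansible.builtin.set_fact:
    eth0_hw_addr: "{{ perm_var if perm_var else cfg_var }}"
  vars:
    mac_re_var: '([0-9a-f]{2}:){5}[0-9a-f]{2}'
    # 'permaddr aa:bb:...' only appears when the MAC has been overridden
    perm_var: "{{ ip_link_reg['stdout']
                  | regex_search('permaddr ' ~ mac_re_var)
                  | default('', true)
                  | regex_replace('permaddr ', '') }}"
    cfg_var: "{{ ip_link_reg['stdout']
                 | regex_search('link/ether ' ~ mac_re_var)
                 | default('', true)
                 | regex_replace('link/ether ', '') }}"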

1 Like

I am working through an issue where a portion of my role needs to be run serially to avoid address collisions. Officially, there is no way to switch to serial execution in the middle of a role, but there are ways to effectively achieve this with run_once and delegation. There are several gotchas, though. I was able to pretty easily work around hostvars using delegate_facts: true and/or simply navigating through the hostvars dictionary. But what has stopped me from using run_once is that handlers are not delegated: if you use notify on a run_once task, it will only run the handler on the first host, despite delegation on the task itself. Also of note, task results (failed, changed, ok) are not delegated either, so if you delegate a config change to Host B from Host A, Host A will be marked as changed.
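
For reference, the run_once + delegation pattern I was playing with looks roughly like this (just a sketch; the template and handler names are made up):

# Each loop iteration is delegated to one play host in turn, which
# effectively serializes the work, but notify only ever fires the
# handler on the first host, and changed/failed status sticks to the
# first host as well.
- name: Push network config to every host from a single task
  ansible.builtin.template:
    src: example.conf.j2      # hypothetical template
    dest: /etc/example.conf   # hypothetical destination
  run_once: true
  delegate_to: "{{ play_host_item }}"
  notify: restart networking  # only notified on the first host!
  loop: "{{ ansible_play_hosts }}"
  loop_control:
    loop_var: play_host_item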

So I am working on achieving the same result but with a host loop and when: ansible_host == item. We’ll see if that works out.

2 Likes

This appears to be working. It looks like this:

- name: Configure the network
  ansible.builtin.include_tasks: cfg_net.yml
  loop: "{{ansible_play_hosts}}"
  loop_control:
    loop_var: host_item

# cfg_net.yml
- name: Configure interfaces
  ansible.builtin.include_tasks: cfg_iface.yml
  when: inventory_hostname == host_item
  loop: "{{ phy_ifaces | default([]) }}"
  loop_control:
    loop_var: phy_iface_item

A nice side benefit of running tasks in serial is that you can put variables into the task names which makes debugging easier and generally gives you a better idea of what’s going on.

- name: "Interface {{iface['dev']}} has an IPv4 address of {{ip_var}}"
  ansible.builtin.set_fact:
    iface_ip4: "{{ip_var}}"
  vars:
    iface_var: "{{ vars[ 'ansible_' + iface['dev'] ] | default }}"
    ip_var: "{{ iface_var['ipv4'][0]['address']
                | default(iface_var['ipv4']['address'])
                | default('') }}"

1 Like

One thing that can be frustrating in Ansible is that variables cannot be unset. Once you declare a variable, it exists forever. You can set it to an empty string, empty list or whatever, but it’s never completely gone.

However, there are 2 good ways to scope variables… well it’s really one way applied in 2 different ways.

I think most people using Ansible are aware that you can add a vars: section to any task to declare variables that will only be used within that task. But you can also do this on blocks and include_tasks, which makes those variables available to a series of sub-tasks; afterwards, the variable is undefined again. It is the closest you can get to a “local” variable.
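
A trivial illustration of the scoping (sketch):

- name: block_var only exists inside this block
  vars:
    block_var: foo
  block:

    - name: Defined in here
      ansible.builtin.debug:
        var: block_var

- name: Undefined again out here
  ansible.builtin.debug:
    msg: "block_var is {{ 'defined' if block_var is defined else 'undefined' }}"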


Additionally, while most examples I’ve come across online structure blocks like this:

- name: The block
  block:
    - task1
    - task2
    - task3
  when: block conditional
  vars:
    block_var: foo

I have found it much more readable to contain all of the block information at the beginning of the block, so:

- name: The block
  when: block conditional
  vars:
    block_var: foo
  block:
    - task1
    - task2
    - task3

This prevents me from having to scroll down scanning for an indentation change to find the conditionals and/or variables for the block.

1 Like

Related to this post:

I have Ansible generating standardized IAID/DUIDs for DHCP across several platforms.

# RFC 4361 advocates for IAID/DUID use in DHCPv4. No specific method
# for generating the IAID is specified, other than that it be unique,
# persistent and 32 bits. We use a truncated SHA256 hash of the
# hardware MAC address, or if the interface is virtual and doesn't
# have a hardware address, we hash the interface name (dev).
- name: "The IAID for {{iface['dev']}} is {{iaid_var}}"
  ansible.builtin.set_fact:
    iface: "{{  iface
                | combine( { 'iaid': iaid_var } ) }}"
  vars:
    iaid_var: "{{ ( iface['hw_addr']
                    | default(iface['dev'])
                    | hash('sha256') )[:8]
                    | regex_findall('..')
                    | join(':') }}"

- name: "The DUID for {{iface['dev']}} is {{duid_var}}"
  ansible.builtin.set_fact:
    iface: "{{  iface
                | combine( { 'duid': duid_var } ) }}"
  vars:
    duid_var: "{{ ( '0004'
                    + ( ( ansible_system_vendor
                          + ansible_product_name
                          + ansible_product_uuid
                          + ansible_product_version
                          + ansible_product_serial )
                        | hash('sha256') )[:32] )
                  | regex_findall('..')
                  | join(':') }}"
TASK [o0_o.site.network : The IAID for eth0 is 52:e6:d6:1e] *****************************************************************
ok: [debian11.hq.example.com]

TASK [o0_o.site.network : The DUID for eth0 is 00:04:d4:d6:5f:0d:ae:79:ea:72:b7:97:91:84:2d:d1:b4:c8] ***********************
ok: [debian11.hq.example.com]

It turned out (imo) that generating my own IAID and DUID and supplying them to NetworkManager or networkd was easier than trying to get the stock values out of Linux.*

At least it’s all standardized now.

* I have not yet fully implemented this, so knock on wood…

2 Likes

macOS is to Ansible what Internet Explorer was to CSS.

# Ansible fails to collect certain hardware information on macOS
- name: Get system hardware information (macOS)
  ansible.builtin.command:
    cmd: system_profiler -json SPHardwareDataType
  changed_when: false
  register: sys_prof_hw_reg

- name: Define missing Ansible hardware facts (macOS)
  ansible.builtin.set_fact:
    ansible_system_vendor: 'Apple Inc.'
    ansible_product_version: "{{  ansible_product_name
                                  | regex_replace('^[A-Za-z]*', '') }}"
    ansible_product_uuid: "{{ hw_var['platform_UUID'] }}"
    ansible_product_serial: "{{ hw_var['serial_number'] }}"
  vars:
    hw_var: "{{ ( sys_prof_hw_reg['stdout']
                  | from_json )['SPHardwareDataType'][0] }}"

2 Likes

The error was: error while evaluating conditional (carp_var is undefined)

…what? How does that error?

If you are building a collection of handlers across multiple roles, all handlers are loaded per play. So if you, like me, have a catch-all listen: save host vars and add lineinfile handlers to it across multiple roles, you may need a when: var is defined guard as well, because flushing handlers will even call handlers from roles that have not yet executed.
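
So the guarded handlers end up looking something like this (a sketch; the variable name and path are made up):

# handlers/main.yml in one of the roles
- name: Save my_role host var
  ansible.builtin.lineinfile:
    path: /etc/example/host_vars.yml  # hypothetical destination
    line: "my_role_line: {{ my_role_line }}"
  when: my_role_line is defined       # skip if this role hasn't run yet
  listen: save host vars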

1 Like

Decided to be clever with this one:

# The nmcli module seems insistent that certain parameters are set in
# the presence of others, so in order to gradually build the
# configuration, we have to build the command and then execute it. This
# is not recommended by Ansible because set_fact can be overridden in
# variable precedence. We can protect against that with the assert task
# below.

- name: >-
    Test that we have control of the nm_task variable by assigning it a
    benign value
  ansible.builtin.set_fact:
    nm_task: {}

- name: Confirm nm_task has that value
  ansible.builtin.assert:
    that: nm_task == {}
    quiet: true
    fail_msg: >
      The variable nm_task has been overridden (potentially maliciously).

- name: >-
    Define the basic community.general.nmcli task for {{ nm_name_pretty }}
  ansible.builtin.set_fact:
    nm_task:
      conn_name: "{{ nm_con_name }}"
      type: "{{ nm_iface_type }}"
      method6: disabled
      autoconnect:  true
      state: "{{ present }}"

- name: Confirm nm_task has changed
  ansible.builtin.assert:
    that: nm_task != {}
    quiet: true
    fail_msg: >
      The variable nm_task has been overridden (potentially maliciously).

For reference:

https://docs.ansible.com/ansible/devel/reference_appendices/faq.html#when-is-it-unsafe-to-bulk-set-task-arguments-from-a-variable

An attack on a set_fact declaration seems unrealistic to me, and even if it were realistic, it would be a much larger threat than the one described in the link above, but whatever. Just doing my due diligence…

1 Like

I thought you might be interested in reading this.

2 Likes

Currently, I’m targeting bare metal deployment across several OS’s including OpenBSD, so Vagrant is the better option for me but when I start working on services further down the line, I might switch to Docker.

My thought process was that it would be easier to use Docker to test your mock deploys, because then you can use GitLab or GitHub pipelines to increase your testing.

2 Likes

[Screenshots of the task output: the IP in the task name doesn’t match the IP that ends up in the set_fact.]

1 Like

Isn’t the set_fact module setting it after you tried to debug it?

The IP in the task name doesn’t match the IP in the set_fact. It’s defined below in the vars section (but it’s kind of long so I didn’t include it). There’s no pre-existing value, so I don’t understand.


Oh wait, I see it! lol

vars:
  ip_var: "{{ available_ips_var | random }}"

So I reasonably assumed that this meant that it would pick a random IP and save it in the ip_var variable for re-use in both name and set_fact. BUT, it recalculates it each time it’s referenced! Why?!


Simple example:

  - debug:
      msg: "{{ rand }}-{{ rand }}-{{ rand2 }}-{{ rand2 }}"
    vars:
      rand: "{{ [0,1,2,3,4,5,6,7,8] | random }}"
      rand2: "{{ rand }}"
ok: [localhost] => {
    "msg": "6-6-5-5"
}

So it will re-use the value within a single templated field, but if it’s referenced in multiple places within a task (or indirectly through another variable, like rand2 above), it’s completely re-evaluated each time.
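
One way around it is to pin the value with set_fact first, since set_fact renders the template once and stores the result; later references then all see the same value (sketch):

- name: Pin a random value
  ansible.builtin.set_fact:
    rand_fact: "{{ [0,1,2,3,4,5,6,7,8] | random }}"

- name: Stable across references now
  ansible.builtin.debug:
    msg: "{{ rand_fact }}-{{ rand_fact }}-{{ rand_fact }}"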


Here’s the entire task from above if anyone is curious. It’s how I assign IP addresses at random.

    # We use the integer representation of the IP address so that we can
    # use range to enumerate all usable host addresses which is then
    # differenced against the used IP list and finally a random IP is
    # chosen. However, if an address pool is present, an IPv4 address is
    # chosen at random from that list instead.
    - name: "An IPv4 address has been selected at random"
      ansible.builtin.set_fact:
        ip4_candidate: "{{ ip_var }}"
      vars:
        first_ip_int_var: "{{ iface['subnet_addr']
                              | ansible.netcommon.ipv4('next_usable')
                              | ansible.netcommon.ipv4('int') }}"
        last_ip_int_var: "{{  iface['subnet_addr']
                              | ansible.netcommon.ipv4('last_usable')
                              | ansible.netcommon.ipv4('int') }}"
        available_ips_var: "{{  iface['addr_pool']
                                | default(  range(  first_ip_int_var | int,
                                                    last_ip_int_var | int )
                                            | list )
                                | ansible.netcommon.ipv4
                                | difference( used_ip4s | default([]) ) }}"
        ip_var: "{{ available_ips_var | random }}"

2 Likes

are you checking dns or pinging it to see if its in use before using it?

1 Like

  1. Used IPs are pulled from Ansible facts and inventory and excluded from the available IP list.

  2. A random IP is selected.

  3. Both the remote host and local host try to ping it.

  4. If either ping succeeds (meaning the address is already in use), the task file adds the selected IP to the used list and calls the random assignment recursively until it finds a good IP or runs out and fails.

# group_vars inventory file
# site_ips:
#   subnet/vlan:
#     dhcp-client-id:
#       ip4:
#       ip6: #future use

# BEGIN ANSIBLE MANAGED BLOCK: Site IPs
site_ips:
  sales:
      21:49:54:dd:00:04:4a:13:2f:1b:23:71:41:45:98:e4:f4:14:99:87:78:ed:
          ip4: 10.157.253.179
      24:89:fd:bf:00:04:4a:13:2f:1b:23:71:41:45:98:e4:f4:14:99:87:78:ed:
          ip4: 10.157.253.131
  srv:
      8a:c4:c2:58:00:04:4a:13:2f:1b:23:71:41:45:98:e4:f4:14:99:87:78:ed:
          ip4: 10.157.87.114
# END ANSIBLE MANAGED BLOCK: Site IPs

- run_once: true
  block:

    - name: Get IPv4 addresses from all hosts
      ansible.builtin.setup:
        filter:
          - ansible_all_ipv4_addresses
      delegate_to: "{{ play_host_item }}"
      delegate_facts: true
      loop: "{{ ansible_play_hosts }}"
      loop_control:
        loop_var: play_host_item

    - name: Define a list of used IPv4 addresses
      ansible.builtin.set_fact:
        used_ip4s: "{{ ansible_used_ip4s_var | union(site_used_ip4s_var) }}"
      vars:
        ansible_used_ip4s_var: >-
          {{  ansible_play_hosts
              | map('extract', hostvars, 'ansible_all_ipv4_addresses')
              | select('defined')
              | flatten
              | unique }}
        site_used_ip4s_var: >-
          {{  ( site_ips | default({}) ).values()
              | default({})
              | map('dict2items')
              | flatten
              | selectattr('value.ip4', 'defined')
              | map(attribute='value.ip4') }}
    - name: Address collision tests (will re-attempt on failure)
      block:

        # NOTE: ping output on some platforms formats packet loss as 100%, on others as 100.0%
        - name: "Ping {{ ip4_candidate }} from the host (collision test)"
          ansible.builtin.command:
            cmd: "ping -c 2 {{ ip4_candidate }}"
          register: host_ping_reg
          changed_when: false
          failed_when: not  host_ping_reg['stdout']
                            | regex_search('100.*% packet loss')

        - name: "Ping {{ ip4_candidate }} from localhost (collision test)"
          ansible.builtin.command:
            cmd: "ping -c 2 {{ ip4_candidate }}"
          register: localhost_ping_reg
          changed_when: false
          failed_when: not  localhost_ping_reg['stdout']
                            | regex_search('100.*% packet loss')
          delegate_to: 127.0.0.1

      # If a collision is detected, recursively call this tasks file
      # until all available IPs are exhausted.
      rescue:

        - name: >-
            Collision was detected, adding {{ ip4_candidate }} to used IPv4
            list
          ansible.builtin.set_fact:
            used_ip4s: "{{ used_ip4s | union( [ip4_candidate] ) }}"

        - name: Increment recursion counter
          ansible.builtin.set_fact:
            def_iface_ip4_rec_count: "{{  def_iface_ip4_rec_count
                                          | default(1)
                                          | int
                                          + 1 }}"

        - name: >-
            Begin attempt {{ def_iface_ip4_rec_count }} to assign an IPv4
            address to {{ iface['dev'] }}
          ansible.builtin.include_tasks: def_iface_ip4.yml

Oh and IP assignment through configuration is run serially to avoid collisions.

2 Likes

Ansible is randomly detecting this VLAN on my RHEL VMs when it doesn’t exist (Fedora below, but also happening on Rocky):

I seem to remember something about facts caching? That is a VLAN in my config; it just isn’t on that host at the moment. I am constantly popping snapshots on these though, so if it’s getting cached, that could be why…

I’m adding ansible.builtin.meta: clear_facts at the beginning of the play to see if that helps. It’s hard to test because it’s intermittent.
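
i.e., something along these lines (a sketch; clearing facts means gathering them again afterwards):

- hosts: all
  gather_facts: false
  tasks:

    - name: Drop any cached or stale facts
      ansible.builtin.meta: clear_facts

    - name: Gather fresh facts
      ansible.builtin.setup: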

2 Likes

Finally decided to take a crack at using Ansible with Mikrotik’s RouterOS. I had a feeling it would be a pain, and yes it was.

First thing, you’ll need to make an account at Mikrotik’s site. Once that’s done, log in and select Make a demo key from the left menu.

Next, go grab whichever vagrant box you want or I guess download the CHR image from Mikrotik and spin up your VM. SSH into it and it should let you know that you have 24 hours to enter a key. Hit enter to get a prompt. Then, literally paste the whole multiline key into the command prompt. Like magic, it will accept this and prompt you to reboot. Do so and then snapshot the VM for your convenience.

Note that none of this is necessary on actual Mikrotik hardware which comes pre-licensed.

Anyway, so then you might try to add the VM to your inventory and run a basic command with community.routeros.command which will of course fail. You will want to add the following variables to the RouterOS host, or if you like, to a group:

ansible_connection: ansible.netcommon.network_cli
ansible_network_os: community.network.routeros

At this point, it might work, but it probably won’t. First thing, Mikrotik uses some sketchy console detection/colors that I already knew about because it would always crash minicom on login. The solution to this is to append +cet1024w to the username, so in my case:

ansible_user: vagrant+cet1024w

In addition to the color issue, the 1024w (console width) circumvents a problem if the username and/or hostname are too long.

At this point, I could get facts out of the RouterOS VM, but community.routeros.command would crash, with "msg": "encountered RSA key, expected OPENSSH key".

Long story short, by default ansible.netcommon.network_cli uses paramiko to connect via ssh. I honestly don’t know what paramiko is but it just does not work for me. Luckily, we can:

ansible_network_cli_ssh_type: libssh

But wait! We need ansible-pylibssh. On Linux, or in any properly managed Python environment, that should merely be a pip install away, but unfortunately on macOS, this is not the case. For some reason, even when using the correct version of pip3 from the Ansible dependency installed by Homebrew, the import falls back to macOS’s stock version of Python 3 when attempting to import toml.

Quick solve for this is /usr/bin/python3 -m pip install toml. No idea if these version mismatches will byte me later, but at this point I am finally able to issue commands to the RouterOS VM. Huzzah.
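
Putting it all together, the host_vars for the CHR VM end up looking something like this, plus a quick sanity-check task (the filename and the print command are just examples):

# host_vars/routeros-chr.yml (hypothetical filename)
ansible_connection: ansible.netcommon.network_cli
ansible_network_os: community.network.routeros
ansible_network_cli_ssh_type: libssh
ansible_user: vagrant+cet1024w

# quick sanity check
- name: Print RouterOS resources
  community.routeros.command:
    commands:
      - /system resource print
  register: ros_resource_reg

- ansible.builtin.debug:
    var: ros_resource_reg['stdout_lines']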

2 Likes