Networkd for nspawn with OpenStack-Ansible

In this post I will cover the systemd-networkd aspect of the PR that creates a systemd-nspawn driver within OSA, working with systemd-networkd and systemd-resolved to integrate nspawn containers into the existing topology of a typical OpenStack-Ansible deployment.

Getting systemd-networkd working for you

[Diagram: macvlan-osa-prod-simple]

One of the container interfaces (an overview is covered here) is a single MACVLAN interface which is bound into all of the containers. This single interface (the Container Bridge) allows for external communication from the containers through the gateway using IP forwarding and masquerading. This special network is bound only to the one host, sets up a locally scoped DHCP network, and pushes NTP and DNS to the containers from the host. To make this possible we're going to use two additional systemd built-ins: systemd-networkd and systemd-resolved.

Setting up systemd-resolved

The systemd-resolved setup is nothing special; in the test case I used nothing more than the package-installed defaults. See more about the options of systemd-resolved here. To set up systemd-resolved we simply link the appropriate file to /etc/resolv.conf, enable the service, and start it.

ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf  # Create the link
systemctl enable systemd-resolved.service  # Enable the service
systemctl start systemd-resolved.service  # Start the service
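
To sanity check that resolved is up and answering, its status can be inspected. The client name depends on your systemd version; newer releases ship resolvectl while older ones use systemd-resolve:

resolvectl status  # newer systemd releases
systemd-resolve --status  # older systemd releases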

Building the Container Bridge

For the DHCP configuration we set up a simple dummy interface and a standalone bridge called br-container, all of which is done using systemd-networkd.

Link configuration

The first thing we need is a link file, /etc/systemd/network/99-default.link, which will ensure all interfaces managed by systemd-networkd use a persistent MAC address:

[Link]
MACAddressPolicy=persistent

Dummy Interface

Once the link file is in place, create a network device file for our dummy interface, /etc/systemd/network/00-dummy0.netdev, with the following contents:

[NetDev]
Name=dummy0
Kind=dummy

With the network device in place, we'll create a network configuration file, /etc/systemd/network/00-dummy0.network, so that the dummy0 interface is enrolled within the br-container device:

[Match]
Name=dummy0

[Network]
Bridge=br-container

Bridge Interface

Now that the dummy network device is ready, we're going to create a network device file for our bridge, /etc/systemd/network/br-container.netdev, with the following contents:

[NetDev]
Name=br-container
Kind=bridge

Finally, we'll create our bridge network file, /etc/systemd/network/br-container.network, with all of the options we need enabled:

[Match]
Name=br-container

[Network]
Address=10.0.3.2/24
Gateway=10.0.3.1
IPForward=yes
IPMasquerade=yes
DHCPServer=yes

[DHCPServer]
PoolOffset=25
PoolSize=100
DefaultLeaseTimeSec=3h
EmitTimezone=yes
EmitNTP=yes
EmitDNS=yes
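
As a quick sanity check on those values: with PoolOffset=25 and PoolSize=100 on the 10.0.3.0/24 subnet, the DHCP server should hand out leases in roughly the 10.0.3.25 through 10.0.3.124 range, comfortably clear of the bridge address at 10.0.3.2 and the gateway at 10.0.3.1.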

Load the new configs

Now that we have all of the needed files in place we can enable and start the systemd-networkd service:

systemctl enable systemd-networkd.service  # Enable the service
systemctl start systemd-networkd.service  # Start the service

Check the status of systemd-networkd to make sure it's running:

systemctl status systemd-networkd.service  # Check service status

Finally, list out all of the networks managed:

networkctl list
IDX LINK             TYPE               OPERATIONAL SETUP     
  1 lo               loopback           carrier     unmanaged 
  2 ens3             ether              routable    unmanaged 
  3 br-container     ether              routable    configured
  4 dummy0           ether              degraded    configured

Note: the OPERATIONAL state of the dummy0 interface will always show as "degraded". This is simply because the underlying interface at the lowest level is a dummy device. If you wanted br-container to be backed by a real link you could enroll an actual interface, in my case ens3, into the bridge instead of the dummy device.
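
For reference, a minimal sketch of what that could look like: instead of the dummy0 files, a network file for the physical NIC, say /etc/systemd/network/00-ens3.network, would enroll it into the bridge (ens3 is just the example NIC on this host; in a real deployment you would also move the host's addressing onto br-container before doing this):

[Match]
Name=ens3

[Network]
Bridge=br-container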

This simple bridge interface managed by systemd-networkd replaces several iptables rules, a dnsmasq process with a specialized configuration limiting its scope to only the one host, and an ad-hoc interface running pre-up scripts to glue everything together. As you can surmise, that process can be fragile, and should it get out of sync it impacts every running container on a given host. With the new systemd-networkd capabilities we only have one process to manage, in a single unified way, across all supported operating systems. While the generally smaller footprint may confuse operators at first, folks will come to love this change: there's no longer any magic required to understand how it all works, and recovering from failure is as simple as a service restart.
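
For a sense of what systemd-networkd is doing on our behalf, the legacy approach amounted to roughly the following. This is an illustrative sketch only, not the exact rules or dnsmasq configuration the old tooling shipped:

# Enable forwarding and masquerade traffic leaving the container network
sysctl -w net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -s 10.0.3.0/24 ! -o br-container -j MASQUERADE

# Run a host-scoped dnsmasq bound to the bridge for DHCP and DNS
dnsmasq --interface=br-container --bind-interfaces --except-interface=lo \
        --dhcp-range=10.0.3.25,10.0.3.124,3h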

Connecting bridges to MacVlan interfaces

Now that we've set up our basic container bridge, which is capable of issuing addresses on our internal-only network and routing that traffic out of our host, we're going to create a macvlan interface and wire it back into the br-container device. To do this we're going to create a few more files.

MacVlan Interface

First create a network device file, /etc/systemd/network/mv-container.netdev, for our mv-container interface:

[NetDev]
Name=mv-container
Kind=macvlan

[MACVLAN]
Mode=bridge

Then create a network file, /etc/systemd/network/mv-container.network, to ensure our mv-container is a link only network:

[Match]
Name=mv-container

[Network]
DHCP=no

MacVlan to Bridge Network Integration

Now create a veth pair file, /etc/systemd/network/mv-container-vs.netdev, which will be used to link the mv-container interface to the br-container interface:

[NetDev]
Name=mv-container-v1
Kind=veth

[Peer]
Name=mv-container-v2

Plug one side of the veth pair into the br-container interface by creating a network file, /etc/systemd/network/mv-container-v1.network, for this side of the veth:

[Match]
Name=mv-container-v1

[Network]
Bridge=br-container

Then plug the other side of the veth pair into the mv-container interface by creating a network file, /etc/systemd/network/mv-container-v2.network, for the remaining side of the veth:

[Match]
Name=mv-container-v2

[Network]
MACVLAN=mv-container

Load the config and connect the networks

Finally, restart systemd-networkd, then list all of the managed interfaces again and confirm that the new mv-container devices are up and configured:

systemctl restart systemd-networkd.service  # Restart the service
networkctl list

IDX LINK             TYPE               OPERATIONAL SETUP     
  1 lo               loopback           carrier     unmanaged 
  2 ens3             ether              routable    unmanaged 
  3 br-container     ether              routable    configured
  4 dummy0           ether              degraded    configured
  5 mv-container-v2  ether              degraded    configured
  6 mv-container-v1  ether              degraded    configured
  7 mv-container     ether              degraded    configured

Integrating the rest of the network interfaces

In a typical OpenStack-Ansible deployment we will have multiple purpose-built bridges which all need macvlan integration. The process of getting these bridges and macvlan devices squared away can be handled with a simple script:

#!/usr/bin/env bash
# Create the macvlan network device
cat >/etc/systemd/network/mv-${1}.netdev <<EOF
[NetDev]
Name=mv-${1}
Kind=macvlan

[MACVLAN]
Mode=bridge
EOF

# Create the macvlan network
cat >/etc/systemd/network/mv-${1}.network <<EOF
[Match]
Name=mv-${1}

[Network]
DHCP=no
EOF

# Create the veth network device
cat > /etc/systemd/network/mv-${1}-vs.netdev <<EOF
[NetDev]
Name=mv-${1}-v1
Kind=veth

[Peer]
Name=mv-${1}-v2
EOF

# Plug the first side of the veth pair into the bridge
cat > /etc/systemd/network/mv-${1}-v1.network <<EOF
[Match]
Name=mv-${1}-v1

[Network]
Bridge=br-${1}
EOF

# Plug the other side of the veth pair into the macvlan interface
cat > /etc/systemd/network/mv-${1}-v2.network <<EOF
[Match]
Name=mv-${1}-v2

[Network]
MACVLAN=mv-${1}
EOF

# Restart systemd-networkd
systemctl restart systemd-networkd.service

This script can be used by simply executing it and passing in the name of the bridge you want to integrate with. I saved the script as a file named mv-int-script.sh on my local system. As root, run the following:

for interface in mgmt vlan vxlan storage flat; do
  bash ./mv-int-script.sh "$interface"
done

Verify Configuration

Once you've integrated with all of the bridges on your host machine, check the network list to ensure they're all online. Assuming everything is showing up as configured, you're good to go.

networkctl list
IDX LINK             TYPE               OPERATIONAL SETUP     
  1 lo               loopback           carrier     unmanaged 
  2 ens3             ether              routable    unmanaged 
  3 br-storage       ether              routable    unmanaged 
  4 eth12            ether              degraded    unmanaged 
  5 br-vlan-veth     ether              carrier     unmanaged 
  6 br-vlan          ether              routable    unmanaged 
  7 br-vxlan         ether              routable    unmanaged 
  8 br-container     ether              routable    configured
  9 dummy0           ether              degraded    configured
 10 mv-vxlan-v2      ether              degraded    configured
 11 mv-vxlan-v1      ether              degraded    configured
 12 mv-vlan-v2       ether              degraded    configured
 13 mv-vlan-v1       ether              degraded    configured
 14 mv-storage-v2    ether              degraded    configured
 15 mv-storage-v1    ether              degraded    configured
 16 mv-mgmt-v2       ether              degraded    configured
 17 mv-mgmt-v1       ether              degraded    configured
 18 mv-storage       ether              degraded    configured
 19 mv-vxlan         ether              degraded    configured
 20 mv-mgmt          ether              degraded    configured
 21 mv-vlan          ether              degraded    configured
 22 mv-container-v2  ether              degraded    configured
 23 mv-container-v1  ether              degraded    configured
 24 mv-container     ether              degraded    configured
 25 br-mgmt          ether              routable    unmanaged 

Then verify the overall network status:

networkctl status
●        State: routable
       Address: 10.1.167.190 on ens3
                172.29.244.100 on br-storage
                172.29.248.1 on br-vlan
                172.29.248.100 on br-vlan
                172.29.240.100 on br-vxlan
                10.0.3.2 on br-container
                172.29.236.100 on br-mgmt
                2001:4800:1ae1:18:f816:3eff:fe84:5510 on ens3
                fe80::f816:3eff:fe84:5510 on ens3
                fe80::f058:67ff:fe7d:aa82 on br-storage
                fe80::4c84:b2ff:fef7:1614 on eth12
                fe80::3821:d7ff:fe73:afd3 on br-vlan
                fe80::98ac:97ff:fecf:d91b on br-vxlan
                fe80::d8d9:8ff:fe6a:36a8 on br-container
                fe80::5c34:6cff:fe2f:d45d on dummy0
                fe80::d440:33ff:fea7:3f7d on mv-vxlan-v2
                fe80::fc0e:88ff:fe9a:d149 on mv-vxlan-v1
                fe80::a02e:cdff:fe82:a606 on mv-vlan-v2
                fe80::2891:39ff:feca:f070 on mv-vlan-v1
                fe80::30af:ccff:fe75:fa62 on mv-storage-v2
                fe80::5401:bff:fe9b:b9a0 on mv-storage-v1
                fe80::f08c:c0ff:fe7e:db5f on mv-mgmt-v2
                fe80::d041:f5ff:fee9:2377 on mv-mgmt-v1
                fe80::40b2:a8ff:fed3:a077 on mv-storage
                fe80::d05e:ffff:fe04:ff83 on mv-vxlan
                fe80::894:4aff:fefc:5fdf on mv-mgmt
                fe80::c7:e1ff:fec8:1d77 on mv-vlan
                fe80::7876:ecff:fee4:6819 on mv-container-v2
                fe80::20ed:44ff:feda:97d6 on mv-container-v1
                fe80::d0e2:e2ff:fed3:b283 on mv-container
                fe80::c004:9dff:fe1f:6269 on br-mgmt
                fe80::98dc:1fff:fe8a:d2b6 on brq2c0f1c0f-20
                fe80::f855:d2ff:fe47:9d33 on brq915f42bf-cf
       Gateway: 10.0.0.1 on ens3
                10.0.3.1 on br-container

Assuming all of the expected interfaces are present, have addresses, and are configured, everything should be ready to receive container workloads.
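
As a final illustration of how a container would consume this network, here is a minimal sketch of attaching a machine to the mv-container interface. The machine name test1 is an assumption for illustration; the MACVLAN= option in a .nspawn file (the equivalent of the --network-macvlan= command line switch) asks systemd-nspawn to create a MACVLAN interface on top of the named host interface inside the container. A drop-in such as /etc/systemd/nspawn/test1.nspawn could look like this:

[Network]
MACVLAN=mv-container

Inside the container, a network file that enables DHCP on that device, for example /etc/systemd/network/80-container.network, should then pick up an address, DNS, and NTP from the br-container DHCP server (the container-side device is named after the host interface with an mv- prefix, so a glob match keeps this simple):

[Match]
Name=mv-*

[Network]
DHCP=yes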
