VyOS 1.2.0 development news in July

Despite the slow news season and the RAID incident that luckily slowed us down only for a couple of days, I think we've made good progress in July.

First, Kim Hagen got cloud-init to work, even though it didn't make it to the mainline image, and the WAAgent required for Azure is not working yet. With some more work, VyOS will get much wider cloud platform support. He's also working on WireGuard integration, which is expected to be merged into current soon.

The new VRRP CLI and IPv6 support is another big change, but it's got its own blog post, so I won't dwell on it here and will cover things that did not get their own blog posts instead.

IPsec and VTI

While I regard VTI as the most leaky abstraction ever created and always suggest using honest GRE/IPsec instead, I know many people don't really have any choice because their partners or service providers are using it. In older StrongSWAN versions it used to just work.

Updating StrongSWAN to the latest version had an unforeseen and very unpleasant side effect: VTI tunnels stopped working. A workaround in the form of "install_routes = no" in /etc/strongswan.d/charon.conf was discovered, but it has an equally bad side effect: site-to-site tunnels stop working when it's applied.

The root cause of the problem is that for VTI tunnels to work, their traffic selectors have to be set to 0.0.0.0/0 for traffic to match the tunnel, even though the actual routing decision is made according to netfilter marks. Unless route insertion is disabled entirely, StrongSWAN thus mistakenly inserts a default route through the VTI peer address, which routes all traffic to nowhere.
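To make the mechanism concrete, here is a small Python sketch of why a 0.0.0.0/0 selector is so destructive once it is turned into a route. It is only an illustration: the function names are made up for this post, and it models nothing of StrongSWAN's internals beyond the fact that a selector with a zero-length prefix covers every destination.

```python
import ipaddress


def selector_covers(selector, destination):
    """Return True if the destination address falls inside the traffic selector."""
    return ipaddress.ip_address(destination) in ipaddress.ip_network(selector)


def becomes_default_route(selector):
    """A selector with a zero-length prefix turns into a default route when installed."""
    return ipaddress.ip_network(selector).prefixlen == 0


# A VTI-style selector covers any destination; an ordinary site-to-site
# selector (hypothetical 10.0.0.0/24 here) covers only its own subnet.
for selector in ("0.0.0.0/0", "10.0.0.0/24"):
    print(selector,
          selector_covers(selector, "198.51.100.7"),
          becomes_default_route(selector))
```

A route derived from an ordinary site-to-site selector only affects the remote subnet, which is why the problem shows up with VTI only.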

This is a hard problem with no easy and effective workaround. It's an architectural problem in the new StrongSWAN: according to our investigation of its source code and its developers' responses, there is simply no way to control route insertion per peer. One developer responded to it with "why, site to site and VTI tunnels are never used on the same machine anyway". Yeah, people are reporting bugs just out of curiosity.

While there is no clean solution within StrongSWAN, this definitely has been a blocker for the release candidate. Reimplementing route insertion with an up/down script proved to be a hard problem, since there are lots of cases to handle and complete information about the intended SA may not always be available to scripts. Switching to another IKE implementation seems like an attractive option, but it needs a serious evaluation of the alternatives and a complete rewrite of the IPsec config scripts. That rewrite is planned, but it will take a while because the legacy scripts are an unmaintainable mess.

I think I've found a workable (even if far from perfect) workaround: instead of inserting missing routes, delete the bad routes. I've made a test setup and it seems to work reasonably well. The obvious issue is that it doesn't prevent bad things from happening, but rather undoes the damage, so there may still be a brief traffic disruption when VTI tunnels go up. Another problem is a possible race condition between StrongSWAN inserting routes and the script deleting them, though I haven't seen it in practice yet and I hope it doesn't exist. But at least you can now use both VTI and site-to-site tunnels on the same machine.
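The "delete the bad routes" idea can be sketched in a few lines of Python. This is not the script that went into VyOS, just a minimal illustration under stated assumptions: that StrongSWAN keeps its routes in a dedicated routing table (220 is its documented default), that the input is the text printed by "ip route show table 220", and that any default route found there came from a VTI's 0.0.0.0/0 selector. The function names and the sample addresses are hypothetical.

```python
def find_default_routes(route_lines):
    """Return the default-route entries (as field lists) from `ip route show` output."""
    found = []
    for line in route_lines:
        fields = line.split()
        if fields and fields[0] == "default":
            found.append(fields)
    return found


def deletion_commands(route_lines, table=220):
    """Build one `ip route del` argument list per default route found.

    Nothing is executed here; the caller decides whether to run the commands.
    The table number is an assumption (StrongSWAN's default).
    """
    return [["ip", "route", "del"] + fields + ["table", str(table)]
            for fields in find_default_routes(route_lines)]


# Hypothetical table contents: one bad default route, one legitimate
# site-to-site route that must be left alone.
SAMPLE = [
    "default via 203.0.113.1 dev eth0 proto static src 192.0.2.1",
    "10.10.0.0/24 via 203.0.113.1 dev eth0 proto static src 192.0.2.1",
]

for command in deletion_commands(SAMPLE):
    print(" ".join(command))
```

Note that the sketch inherits the limitations described above: it only cleans up after the fact, so it would have to run every time a VTI tunnel comes up, and it can still lose a race against StrongSWAN reinserting the route.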

For people who want to use VTI exclusively, there is now a "set vpn ipsec options disable-route-autoinstall" option that disables route insertion globally, thus removing the possible disruption, at the cost of making site-to-site tunnels impossible to use. That option is disabled by default.

I hope it will be good enough until we find a better solution. Your testing is needed to confirm that it is!

DNS forwarding in VyOS

A lot of small networks do not have their own DNS server, but it's not always desirable to just leave hosts to use an external third-party server either. That's why we've had DNS forwarding in VyOS for a long time and are going to keep it there for the foreseeable future.

Experienced VyOS users already know all about it, but we should post something for newcomers too, shouldn't we?

Configuring DNS forwarding is very simple. Assuming you have "system name-server" set, all you need to do to forward requests from hosts behind eth0 to it is "set service dns forwarding listen-on eth0". Repeat for every interface where you have clients and you are done.

There are some knobs for telling the service to use or not use specific DNS servers though:

set service dns forwarding listen-on eth0

# Use name servers from "system name-server"
set service dns forwarding system

# Use servers received from DHCP on eth1 (typically an ISP interface)
set service dns forwarding dhcp eth1

# Use a hardcoded name server
set service dns forwarding name-server 192.0.2.10

You can also specify cache size:

set service dns forwarding cache-size 1000

One of the lesser-known features is the option to use different name servers for different domains. It can be used for a quick and dirty split-horizon DNS, or simply for using an internal server just for internal domains rather than for recursive queries:

set service dns forwarding domain mycompany.local server 192.168.52.100
set service dns forwarding domain mycompany.example.com server 192.168.52.100
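For newcomers, here is a rough Python model of what per-domain rules like these mean for an incoming query. It is a sketch of the general idea only, not the code of the forwarder VyOS runs: it assumes the most specific matching domain wins and that matching happens on whole labels, and the function name and the default server 192.0.2.10 are made up for the example.

```python
def pick_servers(query_name, domain_servers, default_servers):
    """Choose upstream servers for a query name.

    Walks the name from most specific to least specific suffix and returns
    the servers of the first configured domain that matches; otherwise
    falls back to the default servers.
    """
    labels = query_name.rstrip(".").lower().split(".")
    for start in range(len(labels)):
        suffix = ".".join(labels[start:])
        if suffix in domain_servers:
            return domain_servers[suffix]
    return default_servers


# Mirrors the two "domain ... server" rules above.
RULES = {
    "mycompany.local": ["192.168.52.100"],
    "mycompany.example.com": ["192.168.52.100"],
}
DEFAULT = ["192.0.2.10"]

for name in ("intranet.mycompany.local", "www.example.com"):
    print(name, pick_servers(name, RULES, DEFAULT))
```

Because matching is done on whole labels, a name such as notmycompany.local does not hit the mycompany.local rule and goes to the default servers.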

And that's all there is to it. DNS forwarding is not a big feature, but useful doesn't always equal complex.