blog.vyos.net

VyOS 1.2.0 development news in July

Despite the slow news season and the RAID incident that luckily slowed us down only for a couple of days, I think we've made good progress in July.

First, Kim Hagen got cloud-init to work, even though it didn't make it to the mainline image, and WAAgent required for Azure is not working yet. Some more work, and VyOS will get a much wider cloud platform support. He's also working on Wireguard integration and it's expected to be merged into current soon.

The new VRRP CLI and IPv6 support is another big change, but it's got its own blog post, so I won't stop there and cover things that did not get their own blog posts instead.

IPsec and VTI

While I regard VTI as the most leaky abstraction ever created and always suggest using honest GRE/IPsec instead, I know many people don't really have any choice because their partners or service providers are using it. In older StrongSWAN versions it used to just work.

Updating StrongSWAN to the latest version had an unforeseen and very unpleasant side effect: VTI tunnels stopped working. A workaround in form of "install_routes = no" in /etc/strongswan.d/charon.conf was discovered, but it has an equally bad side effect: site to site tunnels stop working when it's applied.

The root cause of the problem is that for VTI tunnels to work, their traffic selectors have to be set to 0.0.0.0/0 for traffic to match the tunnel, even though actual routing decision is made according to netfilter marks. Unless route insertion is disabled entirely, StrongSWAN thus mistakenly inserts a default route through the VTI peer address, which makes all traffic routed to nowhere.

This is a hard problem without a workaround that is easy and effective. It's an architectural problem in the new StrongSWAN, according to our investigation of its source code and its developer responses, there is simply no way to control route insertion per peer. One developer responded to it with "why, site to site and VTI tunnels are never used on the same machine anyway" — yeah, people are reporting bugs just out of curiosity.

While there is no clean solution within StrongSWAN, this definitely has been a blocker for the release candidate. Reimplementing route insertion with an up/down script proved to be a hard problem since there are lots of cases to handle and complete information about the intended SA may not always be available to scripts. Switching to another IKE implementation seems like an attractive option, but needs a serious evaluation of the alternatives, and a complete rewrite of the IPsec config scripts — which is planned, but will take a while because the legacy scripts is an unmaintainable mess.

I think I've found a workable (even if far from perfect workaround) — instead of inserting missing routes, delete the bad routes. I've made a test setup and it seems to work reasonably well. The obvious issue is that it doesn't prevent bad things from happening, but rather undoes the damage, so there may still be a brief traffic disruption when VTI tunnels go up. Another problem is a possible race condition between StrongSWAN inserting routes and the script deleting them, though I haven't seen it in practice yet and I hope it doesn't exist. But, at least you can now use both VTI and site to site tunnels on the same machine.

For people who want to use VTI exclusively, there is now "set vpn ipsec options disable-route-autoinstall" option that disables route insertion globally, thus removing the possible disruption, at cost of making site to site tunnels impossible to use. That option is disabled by default.

I hope it will be good enough until we find a better solution. Your testing is needed to confirm that it is!

On security of GRE/IPsec scenarios

As we've already discussed, there are many ways to setup GRE (or something else) over IPsec and they all have their advantages and disadvantages. Recently an issue was brought to my attention: which ones are safe against unencrypted GRE traffic being sent?

The reason this issue can appear at all is that GRE and IPsec are related to each other more like routing and NAT: in some setups their configuration has to be carefully coordinated, but in general they can easily be used without each other. Lack of tight coupling between features allows greater flexibility, but it may also create situations when the setup stops working as intended without a clear indication as to why it happened.

Let's review the knowingly safe scenarios:

VTI

This one is least flexible, but also foolproof by design: the VTI interface (which is secretly simply IPIP) is brought up only when an IPsec tunnel associated with it is up, and goes down when the tunnel goes down. No traffic will ever be sent over a VTI interface until IKE succeeds.

Tunnel sourced from a loopback address

If you have missed it, the basic idea of this setup is the following:

set interfaces dummy dum0 address 192.168.1.100/32

set interfaces tunnel tun0 local-ip 192.168.1.100/32
set interfaces tunnel tun0 remote-ip 192.168.1.101/32 # assigned to dum0 on the remote side

set vpn ipsec site-to-site peer 203.0.113.50 tunnel 1 local prefix 192.168.1.100/32
set vpn ipsec site-to-site peer 203.0.113.50 tunnel 1 remote prefix 192.168.1.101/32

Most often it's used when the routers are behind NAT, or one side lacks a static address, which makes selecting traffic for encryptions by protocol alone impossible. However, it also introduces tight coupling between IPsec and GRE: since the remote end of the GRE tunnel can only be reached via an IPsec tunnel, no communication between the routers over GRE is possible unless the IPsec tunnel is up. If you fear that any packets may be sent via the default route, you can nullroute the IPsec tunnel network to be sure.

The complicated case

Now let's examine the simplest kind of setup:

set interfaces tunnel tun0 local-ip 192.0.2.100 # WAN address
set interfaces tunnel tun0 remote-ip 203.0.113.200

set vpn ipsec site-to-site peer 203.0.113.200 tunnel 1 protocol gre

In this case IPsec is setup to encrypt the GRE traffic to 203.0.113.200, but the GRE tunnel itself can work without IPsec. In fact, it will work without IPsec, just without encryption, and that is the concern for some people. If the IPsec tunnel goes down due to misconfiguration, it will fall back to the common, unencrypted GRE.

What can you do about it?

As a user, if your requirement is to prevent unencrypted traffic from ever being sent, you should use VTI or use loopback addresses for tunnel endpoints.

For developers this question is more complicated.

What should be done about it?

The opinions are divided. I'll summarize the arguments here.

Arguments for fixing it:

Cisco does it that way (attempts to detect that GRE and IPsec are related — at least in some implementations and at least when it's referenced as IPsec profile in the GRE tunnel)
The current behaviour is against user's intentions

Arguments against fixing it:

Attempts to guess user's intentions are doomed to fail at least some of the time (for example, what if a user intentionally brings an IPsec tunnel down to isolate GRE setup issues?)
The only way to guarantee that unencrypted traffic is never sent is checking for a live SA matching protocol and source before forwarding every packet — that's not good for performance).

Practical considerations:

Since IKE is in the userspace, the kernel can't even know that an SA is supposed to exist until IKE succeeds: automatic detection would be a big change that is unlikely to be accepted in the mainline kernel.
Configuration changes required to avoid the issue are simple

If you have any thoughts on the issue, please share with us!

Interaction between IPsec and NAT (on the same router)

I've just completed a certain unusual setup that involved NATing packets before they are sent to an IPsec tunnel, so I thought I'll write about this topic. Even in perfectly ordinary setups, the interaction between the two often catches people off guard, me included.

No, this is not a premature Friday post. The Friday post will be a continuation of the little known featured of the VyOS CLI.

Most routers these days have some NAT configured, so if you setup an IPsec tunnel, you need to understand the interaction between the two. Luckily, it's pretty simple.

Every network OS has a fixed packet processing order, and for a good reason. For example, source NAT has to be performed after routing because otherwise the OS will not know which outgoing interface must be used for the packet, and will not be able to determine which SNAT rule must be applied to that packet. Likewise, destination NAT must happen before routing if we want to be able to send incoming packets to the intended host — the routing decision depends on the new destination address.

Sometimes the order is less critical but reversing it would create inconvenience for network admins. For example, in Linux (and thus in VyOS), inbound firewall rules are processed after DNAT, so the destination address the firewall will see is the internal address, and you can easily setup a firewall that mentions private addresses on your WAN interface. If it was the other way around, then if you wanted to setup firewall rules for your private addresses, you would have to assign the firewall to the out direction of the LAN interface — not quite as logical or convenient, even if the end result is the same.

Where's IPsec in that processing flow and what are the implications of its position in it?

Let's revisit the complete diagram (image by Jan Engelhardt, CC-BY-SA):

If posthaven can't handle images properly, here's a direct link to the larger version:

The box you are looking for is "XFRM". In Linux, IPsec is not a special component, but a part of the XFRM framework that can do encryption amond other things (it also does compression and header modification).

From the diagram we can see that XFRM decode step (thus IPsec encryption) is before DNAT (NAT prerouting), and IPsec decryption is after SNAT (NAT postrouting). The implications of it are twofold: first you need to be careful when setting up SNAT and IPsec on the same machine, second, you can apply NAT rules to traffic that will go to the tunnel if you really have to.

Avoiding adverse interaction

Suppose you have this config:

vyos@vyos# show vpn ipsec site-to-site 
 peer 192.0.2.150 {
     [SNIP]
     tunnel 1 {
         local {
             prefix 192.168.10.0/24
         }
         remote {
             prefix 10.10.10.0/24
         }
     }
 }

vyos@vyos# show nat source 
 rule 10 {
     outbound-interface eth0
     source {
         address 192.168.10.0/24
     }
     translation {
         address 203.0.113.134
     }
 }

What will happen to a packet sent by host 192.168.10.100 to host 10.10.10.200? Since SNAT is performed before IPsec, and the 192.168.10.100 source address matches the rule 10, the rule will be applied and the packet will go down the packet processing pipeline with source address 203.0.113.134, which does not match the IPsec policy from tunnel 1. The packet will be sent out of the eth0 interface, unencrypted, and destined to be dropped by the ISP due to its private destination address (or it will be sent to a wrong host, which is not any better).

In this case this order of packet processing seems to be a real hassle. There's a very easy workaround though: exclude packets with destination address 10.10.10.0/24 from SNAT, like this:

vyos@VyOS-AMI# show nat source 
rule 5 {
    outbound-interface eth0
    destination {
        address 10.10.10.0/24
    }
    exclude
}
 rule 10 {
     outbound-interface eth0
     source {
         address 192.168.10.0/24
     }
     translation {
         address 203.0.113.134
     }
 }

If you've setup IPsec, the SA is up, but for some reason packets don't get through, make sure that you didn't forget to exclude traffic to the remote network from NAT. It's easy to see with tcpdump whether packets are sent the wrong way or not.

Exploiting the interaction

So far we've only seen how this particular processing order can be bad for our setup. Can it be good for anything then? Sometimes it seems like the Linux network stack was optimized to allow doing crazy things. Just a few days ago I've run into a case when this turned beneficial.

Suppose you setup an IPsec tunnel to your partner, and it turns out you both are using 192.168.10.0/24 subnet internally. None of you is willing to renumber your own network to solve the problem cleanly, but some compromise must be made. The solution is to NAT packets before they are encrypted, which works as expected precisely because IPsec happens after SNAT.

For simplicity let's assume only a single host from our network (internal address 192.168.10.45) needs to interact with a single host from the remote network (10.10.10.55). We will make up an intermediate 172.16.17.45 address and NAT the tunnel traffic to and from 10.10.10.55 host to actually be sent to the 192.168.10.45 host.

The config looks like this:

vyos@vyos# show vpn ipsec site-to-site 
 peer 192.0.2.150 {
     [SNIP]
     tunnel 1 {
         local {
             prefix 172.16.17.45/32
         }
         remote {
             prefix 10.10.10.55/32
         }
     }
 }

vyos@vyos# show nat source 
 rule 10 {
     outbound-interface any
     destination {
         address 10.10.10.55
     }
     source {
         address 192.168.10.45
     }
     translation {
         address 172.16.17.45
     }
 }

vyos@vyos# show nat destination 
 rule 10 {
     destination {
         address 172.16.17.45
     }
     inbound-interface any
     translation {
         address 192.168.10.45
     }
 }

If IPsec was performed before source NAT, this kind of setup would be impossible.

Setting up GRE/IPsec behind NAT

In the previous posts of this series we've discussed setting up "plain" IPsec tunnels from behind NAT.

The transparency of the plain IPsec, however, is more often a curse than a blessing. Truly transparent IPsec is only possible between publicly routed networks, and the tunnel mode creates a strange mix of the two approaches: you do not have a network interface associated with the tunnel, but the setup is not free of routing issues either, and it's often hard to test whether the tunnel actually works or not from the router itself.

GRE/IPsec (or IPIP/IPsec, or anything else) offers a convenient solution: for all intents and purposes it's a normal network interface and makes it look like the networks are connected with a wire. You can easily ping the other side, use the interface for firewall and QoS rulesets, and setup dynamic routing protocols in a straightforward way. However, NAT creates a unique challenge for this setup.

The canonical and the simplest GRE/IPsec setup looks like this:

interfaces {
  tunnel tun0 {
    address 10.0.0.2/29
    local-ip 192.0.2.10
    remote-ip 203.0.113.20
    encapsulation gre
  }
}
vpn {
  ipsec {
    site-to-site {
      peer 203.0.113.20 {
        tunnel 1 {
          protocol gre
        }
        local-address 192.0.2.10

It creates a policy that encrypts any GRE packets sent to 203.0.113.20. Of course it's not going to work with NAT because the remote side is not directly routable.

Let's see how we can get around it. Suppose you are setting up a tunnel between routers called East and West. The way to get around it is pretty simple even if not exactly intuitive and boils down to this:

Setup an additional address on a loopback or dummy interface on each router, e.g. 10.10.0.1/32 on the East and 10.10.0.2/32 on the West.
Setup GRE tunnels that are using 10.10.0.1 and .2 as local-ip and remote-ip respectively.
Setup an IPsec tunnels that uses 10.10.0.1 and .2 as local-prefix and remote-prefix respectively.

This way when traffic is sent through the GRE tunnel on the East, the GRE packets will use 10.10.0.1 as a source address, which will match the IPsec policy. Since 10.10.0.2/32 is specified as the remote-prefix of the tunnel, the IPsec process will setup a kernel route to it, and the GRE packets will reach the other side.

Let's look at the config:

interfaces {
  dummy dum0 {
    address 10.10.0.1/32
  }
  tunnel tun0 {
    address 10.0.0.1/29
    local-ip 10.10.0.1
    remote-ip 10.10.0.2
    encapsulation gre
  }
}
vpn {
  ipsec {
    site-to-site {
      peer @west {
        connection-type respond
        tunnel 1 {
          local {
            prefix 10.10.0.1/32
          }
          remote {
            prefix 10.10.0.2/32
          }

This approach also has a property that may make it useful even in publicly routed networks if you are going to use the GRE tunnel for sensitive but unencrypted traffic (I've seen that in legacy applications): unlike the canonical setup, GRE tunnel stops working when the IPsec SA goes down because the remote end becomes unreachable. The canonical setup will continue to work even without IPsec and may expose the GRE traffic to eavesdropping and MitM attacks.

This concludes the series of posts about IPsec and NAT. Next Friday I'll find something else to write about. ;)

How to setup an IPsec connection between two NATed peers: using id's and RSA keys

In the previous post from this series, we've discussed setting up an IPsec tunnel from a NATed router to a non-NATed one. The key point is that in the presence of NAT, the non-NATed side cannot identify the NATed peer by its public address, so a manually configured id is required.

What if both peers are NATed though? Suppose you are setting up a tunnel between two EC2 instances. They are both NATed, and this creates its own unique challenges: neither of them know their public addresses or can identify their peers by their public address. So, we need to solve two problems.

In this post, we'll setup a tunnel between two routers, let's call them "east" and "west". The "east" router will be the initiator, and "west" will be the responder.

Why IPsec behind 1:1 NAT is so problematic and what you can do about it

Not so long ago the only scenario when the issues with IPsec and NAT could arise was a remote access setup, while routers invariably had real public addresses and router to router IPsec operators were incredibly unlikely to run into those issues. A router behind NAT was seen as a pathological case.

I believe it's still a pathological case, but as platforms such as Amazon EC2 that use one to one NAT to simplify IP address management rose in popularity and people started setting up their routers there, it became the new normal. This new reality became a constant source of confusion for beginner network admins and people who simply are not VPN specialists. Attempts to setup a VPN as if there's no NAT, with peer external IP addresses and pre-shared keys as it's commonly done, just fail to work.

Sometimes I'm asked why is that so, if the NAT in question is 1:1, and no other protocol just refuses to work behind it without reconfiguration. The reasons are inside the IPsec protocols themselves, and 1:1 NAT is not much better in this regard than any other kind (even though it's somewhat better than the situation when there are multiple clients behind NAT).

IPsec pre-dates NAT by over a decade, and was explicitly designed with end to end connectivity in mind. It was also designed as a way to add built-in security to the IP protocol stack rather than a new protocol within the stack, so it made heavy use of those underlying assumptions. That's why workarounds had to be added when the assumption of end to end connectivity became incorrect.

Let's look into the details and then see how IPsec can be configured to work around those issues. Incidentally, the solutions are equally applicable to machines with dynamic addresses as well as those behind a 1:1 NAT, since the root cause of the issues is not knowing beforehand what your outgoing address will be next time.

AH and ESP

We all (hopefully) remember that there are three criteria of information security: availability, confidentiality, and integrity. Availability (when required) in the IP is provided by reliable transmission protocols such as TCP and SCTP. IPsec was supposed to provide the other two.

AH (Authentication Header) was designed to ensure packet integrity. For that, it calculates a hash function from the entire packet with all its headers, including source and destination addresses. Since NAT changes the addresses, it invalidates the checksum and NATed packets would be considered corrupted. So, AH is unconditionally incompatible with NAT.

ESP only protects the payload of the packet (whether it's some protocol data, or an entire tunnelled packet), so it, luckily, can work behind NAT. The UDP encapsulation of ESP packets was introduced to allow the traffic to pass through incorrect NAT implementations that may discard non-TCP or UDP protocols, and for correct implementations this should be irrelevant (I've seen a number of SOHO routers that couldn't handle typical L2TP/IPsec connections despite that UDP encapsulation — but that's another story).

Before we can start sending ESP packets, we need to establish a "connection" (security association) though. In practice, most IPsec connections are negotiated via the IKE protocol rather than statically configured, and that's where IKE issues come into play.

IKE and pre-shared keys

To begin with, original IKE was specified to use UDP/500 for both source and destination port. Since NAT may change the source port, the specification had to be adjusted to allow other source ports. Even more issues arise when there are multiple clients behind NAT and traffic has to be multiplexed, but we'll limit the discussion to 1:1 NAT where canonical IKE might work. Let's assume the IKE exchange has gone that far.

The vast majority of IPsec setups in the wild use pre-shared keys. I've configured connections to vendors and partners on behalf of my employers and customers countless times, and I believe I've seen an x.509-based setup once or twice. In a setup with pre-shared keys, routers need to know which keys to use for which peers. How do they know? Suppose you've configured a tunnel with peer 192.0.2.10 and key "qwerty". The other side has 172.16.0.1 address NATed to 192.0.2.10. Will it be the source address of the peer that will be used for the key lookup? Not quite.

In the IKE/ISAKMP exchange, there are identifiers. It's the pairs of identifiers and pre-shared keys that must be unique to reliably tell one peer from another and use correct keys for every peer. There are several types of identifiers specified by the ISAKMP protocol: IP (v4 or v6) address, subnet, FQDN, user and FQDN pair (user@example.com), and raw byte string.

By default, most implementations (including StrongSWAN in VyOS) will use the IP address of the outgoing interface for the identifier, and it will be embedded in the IKE packet. If the host is behind NAT, that address is a private address, 172.16.0.1. When the packet passes through the NAT, the payload will obviously remain unchanged, and when it arrives, the responder will try to find the pair of 172.16.0.1:qwerty in its configuration rather than 192.0.2.1:qwerty that is has configured, and will send NO_PROPOSAL_CHOSEN to the NATed peer.

The solutions

So, how do you get around those issues? Whether IKE is configured to use NAT detection or not, you'll get to use peer identification methods that do not rely on known and fixed IP addresses.

The simplest approach is to use a manually configured peer identifier. This approach will work when one side is NATed and the other is not. The only thing you need to remember is that VyOS (right now at least) requires that FQDN identifiers are prepended with the "@" character. They also do not need to match any real domain name of your machine. You also need to set the local-address on the NATed side to "any".

Here's an example:

Non-NATed side

vpn {
  ipsec {
    site-to-site {
       peer @foo {
         authentication {
             mode pre-shared-secret
             pre-shared-secret qwerty
         }
         default-esp-group Foo
         ike-group Foo
         local-address 203.0.113.1
         tunnel 1 {
             local {
                 prefix 10.103.0.0/24
             }
             remote {
                 prefix 10.104.0.0/24
             }
         }
     }
 }

NATed side

  ipsec {
    site-to-site {
         peer 203.0.113.1 {
             authentication {
                 id @foo
                 mode pre-shared-secret
                 pre-shared-secret qwerty
             }
             default-esp-group Foo
             ike-group Foo
             local-address any
             tunnel 1 {
                 local {
                     prefix 10.104.0.0/24
                 }
                 remote {
                     prefix 10.103.0.0/24
                 }
             }
         }
     }
 }