NAT with a thousand faces

The familiar use cases for NAT are source NAT/masquerade for allowing private subnets access to the Internet, and port forwarding from the Internet to a host in a private network. However, there are more use cases that are less obvious, in part because they are defined by the relative size of the source/destination and translation address options.

One to one NAT

Very common among cloud providers, but equally useful if your ISP is ready to give you an additional address, but not a routable subnet.

Suppose your ISP gave you two addresses, 203.0.113.114 and 203.0.113.115. You use the .114 address for the router itself and want to map the .115 to a server inside your network that has 192.168.136.100 address.

Here's how to do it:

 interfaces {
     ethernet eth0 {
         address 203.0.113.114/24
         address 203.0.113.115/24
         ...
     }
 nat {
     destination {
         rule 10 {
             inbound-interface eth0
             destination {
                 address 203.0.113.115
             }
             translation {
                 address 192.168.136.100
             }
         }
     }
     source {
         rule 10 {
             outbound-interface eth0
             source {
                 address 192.168.136.100
             }
             translation {
                 address 203.0.113.115
             }
         }
     }
 }

One to many NAT

If the network or range specified in translation address is larger than the network in source/destination address, connections from the same host will be translated to more than one address. In source NAT, this is only useful for a bizzare kind of conspicious consumptions like buying a /24 subnet for yourself and using it all for just your desktop.

In destination NAT, however, it can be used as a simple form of L3, non-application aware load balancing.

Suppose you got 10 web servers all in the range of 192.168.136.100 to 192.168.136.110. You want traffic sent to 203.0.113.115 balanced across them. Here's an example:

nat {
     destination {
         rule 10 {
             destination {
                 address 203.0.113.115
                 port 80,443
             }
             inbound-interface eth0
             protocol tcp
             translation {
                 address 192.168.136.100-192.168.136.110
             }
         }
     }
     source {
         rule 10 {
             outbound-interface eth0
             source {
                 address 192.168.136.100-192.168.136.110
             }
             translation {
                 address 203.0.113.115
             }
         }
     }
 }

Many to many NAT

What happens if the source/destination and translation networks are the same size though? In that case, only the network part is translated, while the host part should stay untouched.

This is useful for getting around subnet conflicts.

nat {
     destination {
         rule 10 {
             destination {
                address 192.168.136.0/24
             }
             inbound-interface eth0
             translation {
                 address 10.20.30.0/24
             }
         }
     }
     source {
         rule 10 {
             outbound-interface eth0
             source {
                 address 10.20.30.0/24
             }
             translation {
                 address 192.168.136.0/24
             }
         }
     }
 }

If you know more variations, please let me know.

Interaction between IPsec and NAT (on the same router)

I've just completed a certain unusual setup that involved NATing packets before they are sent to an IPsec tunnel, so I thought I'll write about this topic. Even in perfectly ordinary setups, the interaction between the two often catches people off guard, me included.

No, this is not a premature Friday post. The Friday post will be a continuation of the little known featured of the VyOS CLI.

Most routers these days have some NAT configured, so if you setup an IPsec tunnel, you need to understand the interaction between the two. Luckily, it's pretty simple.

Every network OS has a fixed packet processing order, and for a good reason. For example, source NAT has to be performed after routing because otherwise the OS will not know which outgoing interface must be used for the packet, and will not be able to determine which SNAT rule must be applied to that packet. Likewise, destination NAT must happen before routing if we want to be able to send incoming packets to the intended host — the routing decision depends on the new destination address.

Sometimes the order is less critical but reversing it would create inconvenience for network admins. For example, in Linux (and thus in VyOS), inbound firewall rules are processed after DNAT, so the destination address the firewall will see is the internal address, and you can easily setup a firewall that mentions private addresses on your WAN interface. If it was the other way around, then if you wanted to setup firewall rules for your private addresses, you would have to assign the firewall to the out direction of the LAN interface — not quite as logical or convenient, even if the end result is the same.

Where's IPsec in that processing flow and what are the implications of its position in it?

Let's revisit the complete diagram (image by Jan Engelhardt, CC-BY-SA):

If posthaven can't handle images properly, here's a direct link to the larger version:


The box you are looking for is "XFRM". In Linux, IPsec is not a special component, but a part of the XFRM framework that can do encryption amond other things (it also does compression and header modification).

From the diagram we can see that XFRM decode step (thus IPsec encryption) is before DNAT (NAT prerouting), and IPsec decryption is after SNAT (NAT postrouting). The implications of it are twofold: first you need to be careful when setting up SNAT and IPsec on the same machine, second, you can apply NAT rules to traffic that will go to the tunnel if you really have to.

Avoiding adverse interaction

Suppose you have this config:

vyos@vyos# show vpn ipsec site-to-site 
 peer 192.0.2.150 {
     [SNIP]
     tunnel 1 {
         local {
             prefix 192.168.10.0/24
         }
         remote {
             prefix 10.10.10.0/24
         }
     }
 }

vyos@vyos# show nat source 
 rule 10 {
     outbound-interface eth0
     source {
         address 192.168.10.0/24
     }
     translation {
         address 203.0.113.134
     }
 }

What will happen to a packet sent by host 192.168.10.100 to host 10.10.10.200? Since SNAT is performed before IPsec, and the 192.168.10.100 source address matches the rule 10, the rule will be applied and the packet will go down the packet processing pipeline with source address 203.0.113.134, which does not match the IPsec policy from tunnel 1. The packet will be sent out of the eth0 interface, unencrypted, and destined to be dropped by the ISP due to its private destination address (or it will be sent to a wrong host, which is not any better).

In this case this order of packet processing seems to be a real hassle. There's a very easy workaround though: exclude packets with destination address 10.10.10.0/24 from SNAT, like this:

vyos@VyOS-AMI# show nat source 
rule 5 {
    outbound-interface eth0
    destination {
        address 10.10.10.0/24
    }
    exclude
}
 rule 10 {
     outbound-interface eth0
     source {
         address 192.168.10.0/24
     }
     translation {
         address 203.0.113.134
     }
 }

If you've setup IPsec, the SA is up, but for some reason packets don't get through, make sure that you didn't forget to exclude traffic to the remote network from NAT. It's easy to see with tcpdump whether packets are sent the wrong way or not.

Exploiting the interaction

So far we've only seen how this particular processing order can be bad for our setup. Can it be good for anything then? Sometimes it seems like the Linux network stack was optimized to allow doing crazy things. Just a few days ago I've run into a case when this turned beneficial.

Suppose you setup an IPsec tunnel to your partner, and it turns out you both are using 192.168.10.0/24 subnet internally. None of you is willing to renumber your own network to solve the problem cleanly, but some compromise must be made. The solution is to NAT packets before they are encrypted, which works as expected precisely because IPsec happens after SNAT.

For simplicity let's assume only a single host from our network (internal address 192.168.10.45) needs to interact with a single host from the remote network (10.10.10.55). We will make up an intermediate 172.16.17.45 address and NAT the tunnel traffic to and from 10.10.10.55 host to actually be sent to the 192.168.10.45 host.

The config looks like this:

vyos@vyos# show vpn ipsec site-to-site 
 peer 192.0.2.150 {
     [SNIP]
     tunnel 1 {
         local {
             prefix 172.16.17.45/32
         }
         remote {
             prefix 10.10.10.55/32
         }
     }
 }

vyos@vyos# show nat source 
 rule 10 {
     outbound-interface any
destination {
address 10.10.10.55
}
 source { address 192.168.10.45 }
 translation { address 172.16.17.45 } } vyos@vyos# show nat destination rule 10 { destination { address 172.16.17.45 } inbound-interface any translation { address 192.168.10.45 } }

If IPsec was performed before source NAT, this kind of setup would be impossible.

Setting up GRE/IPsec behind NAT

In the previous posts of this series we've discussed setting up "plain" IPsec tunnels from behind NAT.

The transparency of the plain IPsec, however, is more often a curse than a blessing. Truly transparent IPsec is only possible between publicly routed networks, and the tunnel mode creates a strange mix of the two approaches: you do not have a network interface associated with the tunnel, but the setup is not free of routing issues either, and it's often hard to test whether the tunnel actually works or not from the router itself.

GRE/IPsec (or IPIP/IPsec, or anything else) offers a convenient solution: for all intents and purposes it's a normal network interface and makes it look like the networks are connected with a wire. You can easily ping the other side, use the interface for firewall and QoS rulesets, and setup dynamic routing protocols in a straightforward way. However, NAT creates a unique challenge for this setup.

The canonical and the simplest GRE/IPsec setup looks like this:

interfaces {
  tunnel tun0 {
    address 10.0.0.2/29
    local-ip 192.0.2.10
    remote-ip 203.0.113.20
    encapsulation gre
  }
}
vpn {
  ipsec {
    site-to-site {
      peer 203.0.113.20 {
        tunnel 1 {
          protocol gre
        }
        local-address 192.0.2.10

It creates a policy that encrypts any GRE packets sent to 203.0.113.20. Of course it's not going to work with NAT because the remote side is not directly routable.

Let's see how we can get around it. Suppose you are setting up a tunnel between routers called East and West. The way to get around it is pretty simple even if not exactly intuitive and boils down to this:

  1. Setup an additional address on a loopback or dummy interface on each router, e.g. 10.10.0.1/32 on the East and 10.10.0.2/32 on the West.
  2. Setup GRE tunnels that are using 10.10.0.1 and .2 as local-ip and remote-ip respectively.
  3. Setup an IPsec tunnels that uses 10.10.0.1 and .2 as local-prefix and remote-prefix respectively.

This way when traffic is sent through the GRE tunnel on the East, the GRE packets will use 10.10.0.1 as a source address, which will match the IPsec policy. Since 10.10.0.2/32 is specified as the remote-prefix of the tunnel, the IPsec process will setup a kernel route to it, and the GRE packets will reach the other side.

Let's look at the config:

interfaces {
  dummy dum0 {
    address 10.10.0.1/32
  }
  tunnel tun0 {
    address 10.0.0.1/29
    local-ip 10.10.0.1
    remote-ip 10.10.0.2
    encapsulation gre
  }
}
vpn {
  ipsec {
    site-to-site {
      peer @west {
        connection-type respond
        tunnel 1 {
          local {
            prefix 10.10.0.1/32
          }
          remote {
            prefix 10.10.0.2/32
          }

This approach also has a property that may make it useful even in publicly routed networks if you are going to use the GRE tunnel for sensitive but unencrypted traffic (I've seen that in legacy applications): unlike the canonical setup, GRE tunnel stops working when the IPsec SA goes down because the remote end becomes unreachable. The canonical setup will continue to work even without IPsec and may expose the GRE traffic to eavesdropping and MitM attacks.

This concludes the series of posts about IPsec and NAT. Next Friday I'll find something else to write about. ;)