blog.vyos.net

VyOS builds now use the deb.debian.net load balanced mirror

If there are any good things about that packages server migration and restructuring is that it promoted a revamp of the associated part of the build scripts.

Since the start the default Debian mirror was set to nl.debian.org for a completely arbitrary reason. This of course was suboptimal for most users who are far from the Netherlands, and while the mirror is easy enough to change in ./configure options, a better out of the box experience wouldn't harm.

Danny ter Haar (fromport) suggested that we change it to deb.debian.org which is load balanced, which I think is a good idea. There's a small chance that it will redirect you to a dead mirror, but if you run into any issues, you can always set it by hand.

VyOS builds and HTTPS: build works again, HTTP still needs testing

We have restored VyOS builds. Nightly build should work as expected today, and you can build it by hand as well if you want. This is not exactly the end of the story for us since we need to finish some reconfiguration of Jenkins to accomodate the new setup, but nothing dramatic should happen to the ISO builds any soon, or so we hope.

HTTPS, however, is another story. It still doesn't work for me, and I'm not sure if it's APT itself to blame or anything in our build setup. Since this is not a pressing issue, I'm not going to put much effort into it right now, but if you have a build setup, please check if it works for you. If it doesn't work for anyone, then we can write it off as an APT issue.

Follow-up: VyOS builds and HTTPS

We've made HTTP on the dev.packages.vyos.net host optional, and restored the real directory index (provided by the Apache HTTP's mod_autoindex) instead of using the DirectoryLister that was proven a bit problematic with APT.

Since we had to change the default repository URL anyway, I also took a chance to finally make it configurable rather than hardcoded (T519). Now you can specify a custom URL with the --vyos-mirror="$URL" option. It defaults to the plain HTTP URL right now for the reason stated below.

I have also found a way to make live-build include apt-transport-https packages at the bootstrap stage and enable it to use HTTPS servers for building images. However, for some reason it doesn't work for me, apt says it cannot fetch the package index, while fetching that file with curl works just fine from the same host. I'm not sure what the issue may be. If you verify that it works for you or doesn't, or you know how to make it work, please comment upon T422.

Actual builds "still" don't work but for a completely unrelated reasons: mdns-repeater package needed for the recently merged mDNS repeater feature is not yet in the repository. We will fix it shortly, now that the builds otherwise work.

VyOS builds and HTTPS

For a while some people kept asking why we do not enable HTTPS on the servers with ISOs and repositories. Now we have enabled it, but it turned out it's not all that simple: since APT doesn't support HTTPS by default and the apt-transport-https package is not installed by the bootstrap stage of the live-build process, the dev.packages.vyos.net server is now unusable and image builds are broken.

We are looking for solutions. If you know how to make live-build use apt-transport-https for the boostrap stage, please tell us. If we don't find anything today, we'll try to make HTTPS optional, and if that turns out to be impractical in VestaCP (which we are using for the web frontend server), we'll just disable it there.

Sorry for the inconvenience!

No new features in Perl and shell and no old style templates since May 2018

Now that the Python library for accessing the running config and the generator of old style templates from new style XML command definitions are known to be functional, it's time to set a cutoff date for the old style code. We decided to arbitrarily set it to May 2018.

Since May the 1st, no new code written in the old style will be accepted. We will accept (and make ourselves) fixes to the old code when the bug severity is high enough to affect VyOS operation, but all new features get to be new style.

If you have some new feature already almost done, consider completing it until May. If you are planning a new feature, consider learning about the new style first.

If you are late to the party, please read these blog posts:

For those who are not, I'll reiterate the main points:

Contributors who know Perl and are still willing to write it are getting progressively harder to find and many people say they would contribute if it wasn't in Perl.
The reasons people abandoned Perl are more than compelling: lack of proper type checking, exception handling, and many other features of post-60's language designs do not help reliability and ease of maintenance at all.
Old style handwritten templates are very hard to maintain since both the data and often the logic is spread across dozens, if not hundreds of files in deeply nested directories. Separation of logic from data and a way to keep all command definitions of a feature in a single, observable file are required to make it maintainable.
With old style templates, there's no way to verify them without installing them on VyOS and trying them by hand. XML is the only common language that has ready to use tools for verification of its syntax and semantics: right now it's already integrated in our build system and a build with malformed command definitions will fail.

I'll write a tutorial about writing features in the new style shortly, stay tuned. Meanwhile you can look at the implementation of cron in 1.2.0:

https://github.com/vyos/vyos-1x/blob/current/src/conf-mode/vyos-update-crontab.py (conf mode script)
https://github.com/vyos/vyos-1x/blob/current/interface-definitions/cron.xml (command definitions)

Migration and restructuring of the (dev).packages.vyos.net hosts

The original host where the packages.vyos.net and dev.packages.vyos.net web servers used to live has kept having serious I/O performance issues ever since a RAID failure event, and the situation hasn't improved much. After we've started having very noticeable latency even with bash completion (not speaking of very slow download speed for our users and mirror maintainers), we decided that it's time to move it elsewhere.

We do not blame the hoster, to the contrary, we are grateful for hosting it for us at no cost for so long despite its bandwidth consumption, and for offering it to us at the point in the VyOS project life when it has exactly zero budget for anything. But, it seems now it's time to move on.

The dev.packages.vyos.net host is used for the package repositories of the current branch and for nightly builds, as well as experimental images we make available for testing. It is already migrated to the new host and operational, even though some things are not working as well as they should, including temporary (we hope) loss of IPv6 access to it, and somewhat untidy directory links generated by the (otherwise very nice) DirectoryLister script.

We are sorry for the migration taking longer than it should have and for not communicating it to the community. The dev.packages.vyos.net stayed inaccessible for a whole day because of miscommunication: we have migrated the data, but I assumed that Yuriy (syncer) will setup the web server, while he assumed I will. Our maintenance procedures definitely could use some improvement.

Note that apart from physical data migration, we have also done some directory restructuring. We hope it's now more logical and that it will not have any impact on anyone, but if it does, we can setup redirects to make old direct links work again.

The packages.vyos.net server will stay on the old machine for a bit as we also need to migrate the rsync setup that the mirrors are using. We are still not sure whether to use downloads.vyos.io as a central server for mirrors too.

As of building VyOS images, we'll make adjustments to the build scripts to make them use the new paths by default and images will be buildable again.

Why IPsec behind 1:1 NAT is so problematic and what you can do about it

Not so long ago the only scenario when the issues with IPsec and NAT could arise was a remote access setup, while routers invariably had real public addresses and router to router IPsec operators were incredibly unlikely to run into those issues. A router behind NAT was seen as a pathological case.

I believe it's still a pathological case, but as platforms such as Amazon EC2 that use one to one NAT to simplify IP address management rose in popularity and people started setting up their routers there, it became the new normal. This new reality became a constant source of confusion for beginner network admins and people who simply are not VPN specialists. Attempts to setup a VPN as if there's no NAT, with peer external IP addresses and pre-shared keys as it's commonly done, just fail to work.

Sometimes I'm asked why is that so, if the NAT in question is 1:1, and no other protocol just refuses to work behind it without reconfiguration. The reasons are inside the IPsec protocols themselves, and 1:1 NAT is not much better in this regard than any other kind (even though it's somewhat better than the situation when there are multiple clients behind NAT).

IPsec pre-dates NAT by over a decade, and was explicitly designed with end to end connectivity in mind. It was also designed as a way to add built-in security to the IP protocol stack rather than a new protocol within the stack, so it made heavy use of those underlying assumptions. That's why workarounds had to be added when the assumption of end to end connectivity became incorrect.

Let's look into the details and then see how IPsec can be configured to work around those issues. Incidentally, the solutions are equally applicable to machines with dynamic addresses as well as those behind a 1:1 NAT, since the root cause of the issues is not knowing beforehand what your outgoing address will be next time.

AH and ESP

We all (hopefully) remember that there are three criteria of information security: availability, confidentiality, and integrity. Availability (when required) in the IP is provided by reliable transmission protocols such as TCP and SCTP. IPsec was supposed to provide the other two.

AH (Authentication Header) was designed to ensure packet integrity. For that, it calculates a hash function from the entire packet with all its headers, including source and destination addresses. Since NAT changes the addresses, it invalidates the checksum and NATed packets would be considered corrupted. So, AH is unconditionally incompatible with NAT.

ESP only protects the payload of the packet (whether it's some protocol data, or an entire tunnelled packet), so it, luckily, can work behind NAT. The UDP encapsulation of ESP packets was introduced to allow the traffic to pass through incorrect NAT implementations that may discard non-TCP or UDP protocols, and for correct implementations this should be irrelevant (I've seen a number of SOHO routers that couldn't handle typical L2TP/IPsec connections despite that UDP encapsulation — but that's another story).

Before we can start sending ESP packets, we need to establish a "connection" (security association) though. In practice, most IPsec connections are negotiated via the IKE protocol rather than statically configured, and that's where IKE issues come into play.

IKE and pre-shared keys

To begin with, original IKE was specified to use UDP/500 for both source and destination port. Since NAT may change the source port, the specification had to be adjusted to allow other source ports. Even more issues arise when there are multiple clients behind NAT and traffic has to be multiplexed, but we'll limit the discussion to 1:1 NAT where canonical IKE might work. Let's assume the IKE exchange has gone that far.

The vast majority of IPsec setups in the wild use pre-shared keys. I've configured connections to vendors and partners on behalf of my employers and customers countless times, and I believe I've seen an x.509-based setup once or twice. In a setup with pre-shared keys, routers need to know which keys to use for which peers. How do they know? Suppose you've configured a tunnel with peer 192.0.2.10 and key "qwerty". The other side has 172.16.0.1 address NATed to 192.0.2.10. Will it be the source address of the peer that will be used for the key lookup? Not quite.

In the IKE/ISAKMP exchange, there are identifiers. It's the pairs of identifiers and pre-shared keys that must be unique to reliably tell one peer from another and use correct keys for every peer. There are several types of identifiers specified by the ISAKMP protocol: IP (v4 or v6) address, subnet, FQDN, user and FQDN pair (user@example.com), and raw byte string.

By default, most implementations (including StrongSWAN in VyOS) will use the IP address of the outgoing interface for the identifier, and it will be embedded in the IKE packet. If the host is behind NAT, that address is a private address, 172.16.0.1. When the packet passes through the NAT, the payload will obviously remain unchanged, and when it arrives, the responder will try to find the pair of 172.16.0.1:qwerty in its configuration rather than 192.0.2.1:qwerty that is has configured, and will send NO_PROPOSAL_CHOSEN to the NATed peer.

The solutions

So, how do you get around those issues? Whether IKE is configured to use NAT detection or not, you'll get to use peer identification methods that do not rely on known and fixed IP addresses.

The simplest approach is to use a manually configured peer identifier. This approach will work when one side is NATed and the other is not. The only thing you need to remember is that VyOS (right now at least) requires that FQDN identifiers are prepended with the "@" character. They also do not need to match any real domain name of your machine. You also need to set the local-address on the NATed side to "any".

Here's an example:

Non-NATed side

vpn {
  ipsec {
    site-to-site {
       peer @foo {
         authentication {
             mode pre-shared-secret
             pre-shared-secret qwerty
         }
         default-esp-group Foo
         ike-group Foo
         local-address 203.0.113.1
         tunnel 1 {
             local {
                 prefix 10.103.0.0/24
             }
             remote {
                 prefix 10.104.0.0/24
             }
         }
     }
 }

NATed side

  ipsec {
    site-to-site {
         peer 203.0.113.1 {
             authentication {
                 id @foo
                 mode pre-shared-secret
                 pre-shared-secret qwerty
             }
             default-esp-group Foo
             ike-group Foo
             local-address any
             tunnel 1 {
                 local {
                     prefix 10.104.0.0/24
                 }
                 remote {
                     prefix 10.103.0.0/24
                 }
             }
         }
     }
 }

The meltdown and spectre vulnerabilities

Everyone is talking about the meltdown and the spectre vulnerabilities now. If you are late to the party, read https://meltdownattack.com/

Of course we are aware of it and took time to assess the risks for VyOS. Since both vulnerabilities can only be exploited locally, the risk for a typical VyOS installation is very low: if an untrusted person managed to login to your router, you are already deep in trouble and unathorized access to the OS memory is arguably the least of your concerns since even operator level users can make traffic dumps.

The fix will not be included in 1.1.9. Since the fix is associated with an up to 30% performance penalty, in 1.2.x, we will make it optional is feasible.

VyOS mission statement

Past year the VyOS project has turned four. Perhaps it's time to update the mission statement I've made back in 2013. This one doesn't exactly contradict it, but needs to include the most important lesson we've learnt: if there is essentially no influx of commited maintainers rather than occasional contributors, the emphasis should be on finding ways to build and retain the maintainers team and enable them to put more time and effort into the project, and thus the commercial services should be a part of the mission statement.

If anyone has fears of VyOS losing anything in the freedom part, please do not fear. There will be a post about release model change that has to do with reconciling the conflicting goals of people who want latest features even at cost of stability and people who want stability, and creating an incentive for commercial users to give back to the project, but none of those come at cost of compromising the software freedom.

So, here's the updated mission statement:

VyOS is a successor of the original Vyatta

VyOS started as a Vyatta Core fork when it became clear that Vyatta Core was abandoned and no one else was going to continue it as a free open source software product. The development team and a substantial part of the user community still consist of long time Vyatta Core users and the primary reason why VyOS came into existence is to not let Vyatta Core effort to die.

Now that Brocade vRouter is no longer sold and supported, VyOS can as well be an update path for its former users, though we cannot provide a zero-effort update path for them of course.

VyOS is a free and open source platform

We believe that software freedom is very important and every user should be free to inspect the code, modify it, contribute to it, build customized images.

We will always keep the entire codebase of VyOS under free licenses. We will always keep the entire toolchain for building VyOS images open source and publicly accessible. We will try to make it as easy to use as possible. We will collaborate with vendors to support latest hardware, virtualization and cloud platforms.

VyOS is an affordable and accessible platform for a diverse user base

We know that price is a serious contributing factor that makes people turn to software-based routers. We also know that many VyOS users are networking enthusiasts and small businesses from around the world who cannot afford or not willing to use expensive hardware, whether commodity or specialized.

We will never make VyOS require any specific hardware to be fully functional. We will try to make VyOS usable on wide range of hardware, from small devices to carrier grade servers.

We are open for collaboration with hardware/software manufacturers who want to adopt VyOS

Maintainers’ work needs to be compensated

While we hope that VyOS is a community-driven project, an important thing is that community contributors typically only do the work they need themselves, but the maintainers get to maintain even features they do not use themselves and do work that is necessary but offers no reward for anyone in the short run such as code refactoring and test writing etc. This work needs to be compensated.

Besides, a large number of VyOS users are for-profit companies and use VyOS to reduce their costs. The maintainers are committed to keep VyOS free as in freedom, but will commercialize the project as well to get compensated for their work and build a team of people required to keep the development sustainable.

Possible re-appearance of the AWS public key loading bug and what to do if you see it

There were some reports of re-appearence of the infamous bug with SSH public key loading in Amazon EC2. None of those were confirmed so far and one of them was traced to a user error, and we and our community members alike created dozens instances in different regions but could not reproduce the bug, so chances are it's a false alarm.

It's too early to become concerned, but we cannot rule out anything just yet because its previous appearence also started as intermittent and then became permanent and propagated to all regions.

To prevent a wide scale disaster, we are watching the situation closely. Past time when we could reproduce the bug ourselves it has already propagated too all regions and made the old AMI unusable for all users, so we definitely wouldn't want to repeat that. I suppose we should deploy some kind of automated test for it as well.

Meanwhile, if you have any issues logging in to a newly deployed AMI:

Check carefully if the permissions of the private key file are set to 600 (rw-------). OpenSSH will say permission denied if the permissions are not restrictive enough, after telling you about permissions (http://lpaste.net/360350), but it has nothing to do with the key itself, it's just that it ignores the key with wrong permissions
If you still cannot login to a newly deployed instance, please do not terminate it!. Snapshot it and contact us so that we can investigate it together.

Since marketplace AMI updates can take up to two weeks, if the bug does re-appear, we will provide a community AMI with all platform checks disabled to make the key load by default as an interim solution.

We are not ready to say if 1.1.9 will include the code that checks the platform by verifying the digital signature of the environment data, but 1.2.0 definitely will.

P.S.

Right now the marketplace page of VyOS (https://aws.amazon.com/marketplace/pp/B074KJK4WC) has the sole negative review about the original SSH key issue that was written (or at least published) after the issue was resolved and the update was announced on the blog, which gives our AMI overall rating of one star. It feels rather unfair that this is all we get for working hard to fix the issue that wasn't even our fault as soon as possible and communicate it to our users.

If you are using VyOS on AWS, please consider leaving a customer review there to give it a more balanced rating.