Possible re-appearance of the AWS public key loading bug and what to do if you see it

There were some reports of re-appearence of the infamous bug with SSH public key loading in Amazon EC2. None of those were confirmed so far and one of them was traced to a user error, and we and our community members alike created dozens instances in different regions but could not reproduce the bug, so chances are it's a false alarm.

It's too early to become concerned, but we cannot rule out anything just yet because its previous appearence also started as intermittent and then became permanent and propagated to all regions.

To prevent a wide scale disaster, we are watching the situation closely. Past time when we could reproduce the bug ourselves it has already propagated too all regions and made the old AMI unusable for all users, so we definitely wouldn't want to repeat that. I suppose we should deploy some kind of automated test for it as well.

Meanwhile, if you have any issues logging in to a newly deployed AMI:

  1. Check carefully if the permissions of the private key file are set to 600 (rw-------). OpenSSH will say permission denied if the permissions are not restrictive enough, after telling you about permissions (http://lpaste.net/360350), but it has nothing to do with the key itself, it's just that it ignores the key with wrong permissions
  2. If you still cannot login to a newly deployed instance, please do not terminate it!. Snapshot it and contact us so that we can investigate it together.

Since marketplace AMI updates can take up to two weeks, if the bug does re-appear, we will provide a community AMI with all platform checks disabled to make the key load by default as an interim solution.

We are not ready to say if 1.1.9 will include the code that checks the platform by verifying the digital signature of the environment data, but 1.2.0 definitely will.

P.S.

Right now the marketplace page of VyOS (https://aws.amazon.com/marketplace/pp/B074KJK4WC) has the sole negative review about the original SSH key issue that was written (or at least published) after the issue was resolved and the update was announced on the blog, which gives our AMI overall rating of one star. It feels rather unfair that this is all we get for working hard to fix the issue that wasn't even our fault as soon as possible and communicate it to our users.

If you are using VyOS on AWS, please consider leaving a customer review there to give it a more balanced rating.


VyOS cloud support platform strategy

Recently we have merged the old VyOS product on Amazon (which was free as in price) with the new one listed by Sentrium (reminder: the company setup by VyOS maintainers to provide commercial services for VyOS) that is available at ~$60/year as a means to support the project. It’s probably a good time to talk about our cloud platform support strategy.

It’s quite obvious that cloud platforms are popular, and that VyOS is often used as an alternative to their own VPN and routing facilities or proprietary products, because of both functionality and price. So far we’ve been providing our official EC2 AMI for free. This approach is not exactly sustainable since we ourselves do not benefit from it in any way, but the support level required of an AMI listed on the marketplace creates work that has to be done, and so far (past the initial contribution of the person who wrote the first AMI build scripts) it has been done solely by the maintainers.

And, let’s face it, VyOS needs funding sources. There are many activities that get virtually no community participation, including documentation writing, code refactoring, developing automated tests and so on. We are grateful to everyone who contributes, but that’s not enough to keep VyOS as a service provider and enterprise quality router. Migration to Jessie was the point when making an enterprise quality router OS by few people in spare time stopped scaling, and now, at least until we refactor the whole system to be easy to maintain and extend, either it's going to take a very long time to get done or we need a way to free the time of the maintainers and hire some community members (or someone else) to get it done at reasonable pace.

Some companies mentioned a possibility of corporate sponsorships but those offers failed to materialize, donations likewise have been enough only to pay for domains and some hosting. Switching to a RedHat-like model with paid access to LTS releases is something that we are considering, but not until a stable 1.2.0 release is made. As for now, an alternative revenue stream is needed to break the cycle, and making the official cloud images paid looks like the most viable option. This wasn't an easy decision, but it had to be made.

We've been keeping both the old (free as in price) 1.1.0 listing and new paid listing updated to 1.1.8 active for a while and so far the new one has been positively received and is already bringing us a small profit. We hope most of the old AMI users will choose to support the project as well, but if not, you have time until February 2018 to migrate your setup to self-built or community AMIs.

Most importantly

VyOS will remain free as in freedom forever — the goal of the project was and is to make a free and open source router that people can study and modify. It will always be possible to build a cloud image yourself and use it for free. While we hope most cloud users who run unmodified images will choose to support the project, trying to prevent people from building their own image would be against the free software ideals. We are also not going to hide the build scripts we use for building the official images, so self-built images will be identical to the official ones — without the convenience of deploying them in one click from the marketplace.

The AMI build scripts for 1.1.x can be found at https://github.com/vyos/build-ami

The build process is fairly straightforward and boils down to "./build-ami $isoURL". Right now it only takes signed releases and ourselves we test new images by upgrading already deployed instances to them, but we may add an option to allow building AMIs from unsigned images in the future.

Free access for contributors

The point of the whole deal is not just to make money for the maintainers, it's to make VyOS a better system and people who support VyOS by contributing to it are equally valuable as people who support it financially by using the paid images.
Everyone who had merged pull requests before this time or made substantial contribution to the documentation, automatically qualifies. New contributors who contribute to high priority areas (particularly, fix bugs in 1.2.0 and/or document the yet undocumented features) will also get free access. Active evangelists may also quality.

Update on the AWS SSH key fetching issue

We have fixed the issue with key fetching and submitted the updated AMI for review. It passed the automated scan, but manual review and deployment to the marketplace will take some time.

The new AMI also includes updates for dnsmasq security vulnerabilities that will be included in 1.1.8. If you want to install those updates on 1.1.7 by hand, you can use these packages: http://dev.packages.vyos.net/tmp/dnsmasq/


Permission denied issues with AWS instances

Quick facts: the issue is caused by an unexpected change in the EC2 system, there is no solution or workaround yet but we are working on it.

In the last week a number of people reported an issue with newly created EC2 instances of VyOS where they could not login to their newly created instance. At first we thought it may be an intermittent fault in the AWS since the AMI has not changes and we could not reproduce the problem ourselves, but the number of reports grew quickly, and our own test instances started showing the problem as well.

Since EC2 instances don't provide any console access, it took us a bit of time to debug. By juggling EBS volumes we finally managed to boot an affected instance with an disk image modified to include our own SSH keys.

The root cause is in our script that checks if the machine is running in EC2. We wanted to produce the AMI from an unmodified image, which required inclusion of the script that checks if the environment is EC2. Executing a script that obtains an SSH key from a remote (even if link-local) address is a security risk since in a less controlled environment an attacker could setup a server that could inject their keys to all VyOS systems.

The key observation was that in EC2, both system-uuid and system-serial-number fields in the DMI data always start with "EC2". We thought this is a good enough condition, and for the few years we've been providing AMIs, it indeed was.

However, Amazon changed it without warning and now the system-uuid may not start with EC2 (serial numbers still do), and VyOS instances stopped executing their key fetching script.

We are working on the 1.1.8 release now, but it will go through an RC phase, while the solution to the AWS issue is needed right now. We'll contact Amazon support to see what are the options, stay tuned.

VyOS usage report on AWS

Once in a while people ask how many people are using VyOS on AWS. Since AWS sends us usage reports, this is something we can find out.

Note that this is about the AMI we distribute through the marketplace, we know nothing about instances people deploy from community AMIs.

Anyway, on May the 29th, there were 418 users of the marketplace AMI who ran 466 active instances (roughly 1.1 instance per user).

Breakdown by country

There were users from 32 countries. Countries with the largest numbers of users are the USA (232), Japan (57), United Kingdom (25), and Australia (18). Most other users are in other european countries and a few are in american countries other than the USA and asian countries other than Japan.

Breakdown by instances

The most popular instance type is, unsurprisingly, the free tier eligible t2.micro (202 instances). It is followed by 100 m3.medium, 54 t2.small, 48 t2.medium, and 13 c4.large instances. The remaining larger 49 instances include various large and xlarge ones, including some m4.4xlarge and c3.8xlarge (I sure would like to hear from those people about their use cases!).

Conclusion

Conclusion? What conclusion? I'm not sure if 418 users on AWS is a failure or an achievement. Maybe when we make images for other cloud platforms, we'll have something to compare it with.