VyOS 2.0 development digest #1

I keep talking about the future VyOS 2.0 and how we all should be doing it, but I guess my biggest mistake is not being public enough, and not being structured enough.

In the early days of VyOS, I used to post development updates, which no one would read or comment upon, so I gave up on them. Now that I think of it, I shouldn't have expected much: the community was very small at the time, and there were hardly any people to read those updates in the first place. Still, it was a critical time for the project, and input from readers would have been very valuable.

Well, this is a critical time for the project too, and we need your input and your contributions more than ever, so I need to get to fixing my mistakes and try to make it easy for everyone to see what's going on and what we need help with.

Getting a steady stream of contributions is a very important goal. The commercial support we are offering may let the maintainers focus on VyOS and ensure that things like security fixes and release builds get guaranteed, timely attention. But without occasional contributors who add the things they personally need, the project will never realize its full potential and may go stale. Maintainers may not need those things at all: I think I personally use maybe 30% of all VyOS features with any regularity.

But to make the project easy to manage and easy to contribute to, we need to solve multiple hard problems. It can be hard to get oneself to do things that promise no immediate returns, but if you look at it the other way, we have a chance to build the system of our dreams together. As for 1.1.x and 1.2.x (the jessie branch), we'll figure out how to maintain them until we solve those problems, but that's for another post. Right now we are talking about VyOS 2.0, which gets to be a clean-room rewrite.

Why VyOS isn't as good as it could be, and can't be improved

I considered using "Why VyOS sucks" to catch readers' attention. It's a harsh word, and it may not be all that true, given that VyOS in its current state is way ahead of many other systems that don't even have system-wide config consistency checks, revisions, or safe upgrades. But there are multiple problems so fundamental that they are impossible to fix without rewriting at least a very large part of the code.

I'll state the design problems that cannot be fixed in the current system. They affect both end users and contributors, sometimes indirectly, but very seriously.

Design problem #1: partial commits

You've seen it. You commit, there's an error somewhere, and one part of the config is applied while the other isn't. Most of the time it's just a nuisance: you fix the issue and commit again. But if you, say, change an interface address and the firewall rule that is supposed to allow SSH to it, you can get locked out of your system.

The worst case, however, is when a commit fails at boot. Even if you at least still have SSH access, debugging it can be very frustrating: something doesn't work, and you have no idea why until you inspect the running config and see that something is simply missing (if you run into it in VyOS 1.x, do "load /config/config.boot" and commit; this will either work or show you why it failed). It's made worse by the lack of notifications about config load failures for remote users: you can only see that error on the console.

The feature that can't be implemented because of it is what goes by "commit check" in JunOS. You can't test whether your configuration will apply cleanly without actually committing it.

It's because in the scripts, the logic for consistency checking and generating real configs (and sometimes applying them too) is mixed together. Regardless of the backend issues, every script needs to be taken apart and rewritten to separate that logic. We'll talk more about it later.
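Below is a minimal sketch of what that separation could look like. It assumes a hypothetical layout where each component exposes a pure verify() and generate(), and the commit driver applies nothing until every component has passed verification. The component names, config keys, and the apply callback are illustrative only, not VyOS code. With this split, a "commit check" is simply the verify pass run on its own.

```python
class ConfigError(Exception):
    """Raised by a verify() function when the proposed config is inconsistent."""


# Hypothetical "ssh" component: pure functions, no side effects.
def verify_ssh(ssh):
    port = ssh.get("port", 22)
    if not 1 <= port <= 65535:
        raise ConfigError("port must be in 1..65535, got {}".format(port))


def generate_ssh(ssh):
    return "Port {}\n".format(ssh.get("port", 22))


# Hypothetical "host-name" component.
def verify_host_name(name):
    if not name or len(name) > 63:
        raise ConfigError("host-name must be 1..63 characters")


def generate_host_name(name):
    return name + "\n"


# Component name -> (verify, generate); the top-level config key is the name.
COMPONENTS = {
    "ssh": (verify_ssh, generate_ssh),
    "host-name": (verify_host_name, generate_host_name),
}


def commit_check(config):
    """Run every verify() and collect errors; nothing is generated or applied."""
    errors = []
    for name, value in config.items():
        verify, _ = COMPONENTS[name]
        try:
            verify(value)
        except ConfigError as e:
            errors.append("{}: {}".format(name, e))
    return errors


def commit(config, apply):
    """apply() is only called once the whole config has passed verification."""
    errors = commit_check(config)
    if errors:
        raise ConfigError("; ".join(errors))
    # Generate everything up front, so a generator bug also leaves the system untouched.
    generated = {name: COMPONENTS[name][1](value) for name, value in config.items()}
    for name, text in generated.items():
        apply(name, text)
```

Note that apply() itself can still fail halfway through. The separation removes the validation-time partial commits, but it does not make the whole commit atomic.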

Design problem #2: read and write operations disparity

Config reads and writes are implemented in completely different ways. There is no easy programmatic API for modifying the config, and it's very hard to implement because binaries that do it rely on specific environment setup. Not impossible, but very hard to do right, and to maintain afterwards.

This blocks many things: a network API, and thus an easy-to-implement GUI, and modifying the config from scripts in sane ways (we do have the script-template, which does the trick, kinda, but it could be a lot better).
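As an illustration of a single programmatic API for both reads and writes, here is a sketch of a hypothetical in-memory config tree that uses the same space-separated paths as the CLI. The ConfigTree class and its method names are invented for this example. Leaf values are stored as child nodes, which mirrors how `set system time-zone UTC` reads on the command line.

```python
class ConfigTree:
    """Hypothetical in-memory config tree with one API for reads and writes."""

    def __init__(self):
        self._root = {}

    def _walk(self, words, create=False):
        # Follow the path word by word, optionally creating missing nodes.
        node = self._root
        for word in words:
            if word not in node:
                if not create:
                    return None
                node[word] = {}
            node = node[word]
        return node

    def set(self, path):
        self._walk(path.split(), create=True)

    def delete(self, path):
        words = path.split()
        parent = self._walk(words[:-1]) if words else None
        if parent is None or words[-1] not in parent:
            raise KeyError("no such node: " + path)
        del parent[words[-1]]

    def exists(self, path):
        return self._walk(path.split()) is not None

    def values(self, path):
        # Children of a node; for a leaf node these are its values.
        node = self._walk(path.split())
        return sorted(node) if node is not None else []
```

With something like this, a network API, a GUI backend, and a config script would all call the same set() and delete() methods, rather than recreating the environment that the current binaries expect.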

Design problem #3: internal representation

Now we are getting to really bad stuff. The running config is represented as a directory tree in tmpfs. If you find it hard to believe, browse /opt/vyatta/config/active, e.g. /opt/vyatta/config/active/system/time-zone/node.val

Config levels are directories, and node values are in node.val files. For every config session, a copy of the active directory is made, and mounted together with the original directory in union mount through UnionFS.
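Here is a rough sketch of how a single value lookup works in this layout. It assumes the session layer simply shadows the active directory, the way a union mount does. The function names are hypothetical. Real config trees also escape some characters in tag node names, and a union filesystem needs a way to hide deleted nodes (whiteouts); this sketch ignores both.

```python
import os

ACTIVE_ROOT = "/opt/vyatta/config/active"


def node_val_path(root, path):
    """Map a config path like "system time-zone" to <root>/system/time-zone/node.val."""
    return os.path.join(root, *path.split(), "node.val")


def read_value(layers, path):
    """Return the value from the first layer that has it, like a union mount lookup."""
    for root in layers:
        candidate = node_val_path(root, path)
        if os.path.isfile(candidate):
            # Every single read is a stat + open + read + close on the filesystem.
            with open(candidate) as f:
                return f.read().strip()
    return None
```

For example, `read_value([session_dir, ACTIVE_ROOT], "system time-zone")` would prefer a value changed in the session over the active one. The file operations per node read also show where the system call overhead for large configs comes from.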

There are lots of reasons why it's bad:

  • It relies on the behaviour of UnionFS; OverlayFS or another filesystem won't do. We are at the mercy of the unionfs-fuse developers now, and if they stop maintaining it (and I can see why they may, since OverlayFS has many advantages over it), things will get interesting for us
  • It requires watching file ownership and permissions. Scripts that modify the config need to run as vyattacfg group, and if you forget to sg, you end up with a system where no one but you (or root) can make any new commits, until you fix it by hand or reboot
  • It keeps us from implementing role-based access control, since config permissions are tied to UNIX permissions, and we'd have to map it to POSIX ACLs or SELinux and re-create those access rules at boot since the running config dir is populated by loading the config
  • For large configs, it causes a fair number of system calls and context switches, which may make the system run slower than it could

Design problem #4: rollback mechanism

Due to certain details (mostly the handling of default values), and the way config scripts work too, rollback cannot be done without a reboot. The same issue once made the Vyatta developers revert the activate/deactivate feature.

It makes confirmed commit a lot less useful than it should be, especially in telecom where routers cannot be rebooted at random even in maintenance windows.
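Here is a minimal sketch of confirmed commit with rollback that needs no reboot. It assumes the apply step can take a complete config revision and bring the system to exactly that state, which is precisely what the current scripts can't guarantee. The class, its methods, and the plain numeric timestamps are hypothetical; a real implementation would use a timer and persistent revisions.

```python
class ConfirmedCommit:
    """Hypothetical revision keeper: unconfirmed commits roll back without a reboot."""

    def __init__(self, initial_config, apply):
        self._apply = apply              # brings the system to a given full config
        self._history = [initial_config]
        self._deadline = None            # time by which the last commit must be confirmed

    @property
    def running(self):
        return self._history[-1]

    def commit(self, config, now, confirm_within=None):
        self._apply(config)
        self._history.append(config)
        self._deadline = now + confirm_within if confirm_within is not None else None

    def confirm(self):
        self._deadline = None

    def check_timeout(self, now):
        """Roll back to the previous revision if the confirm deadline has passed."""
        if self._deadline is None or now < self._deadline:
            return False
        self._history.pop()
        self._apply(self._history[-1])   # re-apply the old revision, no reboot
        self._deadline = None
        return True
```

The whole mechanism depends on apply() being a reliable function of the full config. That is why default value handling and side effects hidden in the scripts make rollback impossible today.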

Implementation problem #1: untestable logic

We already discussed it a bit. The logic for reading the config, validating it, and generating application configs is mixed together in most of the scripts. It may not look like a big deal, but for the maintainers and contributors it is. It's also amplified by the fact that there is no way to create and manipulate configs separately. The only way you can test anything is to build a complete image, boot it, and painstakingly test everything by hand, or have an expect-like tool emulate testing it by hand.

You never know if your changes will work until you get them onto a live system. This allows syntax errors in command definitions and compilation errors in scripts to make it into builds, and such errors made it into a release more than once when they weren't immediately apparent and only appeared with a certain combination of options.

This can be improved a lot by testing components in isolation, but that requires the code to be written in an appropriate way. If you write a calculator and start with add(), sub(), mul() etc. functions, and use them in a GUI form, you can test the logic on its own automatically: does add(2, 3) equal 5, does mul(9, 0) equal 0, does sqrt(-3) raise an exception, and so on. But if you embed that logic in button event handlers, you are out of luck. That's how VyOS is for the most part: even if you mock the config subsystem so that config read functions return test data, you still need to redo the script so that every function does exactly one thing that can be tested in isolation.
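Here is the calculator example from the paragraph above, written so that the logic can be tested without any GUI. The functions and tests are only illustrative. They use pytest-style test functions with plain asserts, which run under pytest or can be called directly.

```python
import math


# Pure logic: no GUI, no global state, so each function is testable on its own.
def add(a, b):
    return a + b


def sub(a, b):
    return a - b


def mul(a, b):
    return a * b


def sqrt(x):
    # math.sqrt raises ValueError for negative input.
    return math.sqrt(x)


# Tests in the style pytest collects automatically (test_* functions).
def test_add():
    assert add(2, 3) == 5


def test_mul_by_zero():
    assert mul(9, 0) == 0


def test_sqrt_negative_raises():
    try:
        sqrt(-3)
    except ValueError:
        return
    raise AssertionError("sqrt(-3) should raise ValueError")
```

A button event handler would then only call these functions and display the result, so the untested part shrinks to the wiring. The same idea applies to config scripts: a function that receives config data as an argument, rather than reading it from the config subsystem itself, can be fed test data directly.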

This is one of the reasons 1.2.0 is taking so long: without tests, or even the ability to add them, we don't know what's not working until we stumble upon it in manual testing.

Implementation problem #2: command definitions

This is a design problem too, but it's not as fundamental. Right now we use a custom syntax for command definitions (aka "templates"), which have tags such as help: or type: and embedded shell scripts. There are multiple problems with it. For example, it's not easy to automatically generate even a command reference from them, and you need a complete live system for that, since part of the templates is autogenerated. Another issue is that some components make very extensive use of embedded shell, and some things are implemented entirely in shell scripts embedded in templates, which makes testing even harder than it already is.
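Here is a simplified sketch of pulling fields out of a template to build a command reference entry. It assumes a format of `tag: value` lines, where any line that doesn't start with a tag continues the previous field, as with multi-line embedded shell code. The real template syntax has more rules than this, and the function names are hypothetical.

```python
import re

# A tag such as "help:", "type:" or "syntax:expression:" at the start of a line.
TAG_RE = re.compile(r"^([a-z_]+(?::[a-z_]+)*):\s*(.*)$")


def parse_template(text):
    """Split a template into {tag: value}; untagged lines continue the previous tag."""
    fields = {}
    current = None
    for line in text.splitlines():
        match = TAG_RE.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2)
        elif current is not None:
            fields[current] += "\n" + line
    return fields


def reference_entry(path, fields):
    """One line of a generated command reference."""
    value_type = fields.get("type", "none")
    return "{} ({}): {}".format(path, value_type, fields.get("help", ""))
```

Even with a parser like this, the body of a shell-valued tag remains an opaque shell fragment, which can only really be tested on a live system.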

We could talk about upgrade mechanism too, but I guess I'll leave it for another post. Right now I'd like to talk about proposed solutions, and what's being done already, and what kind of work you can join.

Sentrium Cyber Monday promotion: get some free hands on assistance hours with every VyOS commercial support plan

Hi everyone,

This summer, Sentrium, an IT consulting company run by some long-time VyOS developers and users, launched commercial support for VyOS. We think it will make VyOS more attractive for enterprise and service provider networks, give the VyOS project some funding, and ensure its sustainable development and growth.

We would also like to say a huge thanks to all of our existing customers for your commitment to the VyOS project and for your trust in us! You are contributing to VyOS development, and we are really happy to see such interest from companies around the world. Thank you!

Based on these few months of experience with VyOS support, we decided to adjust our support plans according to customer feedback. You can view the new plans at this web page: https://sentrium.io/vyos-commercial-support/

We tried to cover the common use cases that we observe:

  • Small companies and nonprofit organizations with budget constraints
  • Companies with internal expertise which just need a “formal” support contract due to the business requirements.
  • Businesses who need phone support and hands-on assistance with strict SLAs for mission critical applications.

So, at a glance, here is how our support differs from that of other vendors:

  • All support plans cover all routers in the company, whether it's a production instance or you just spun up another VyOS in your lab. You don’t need to pay more when your network grows, or wait to get support contracts for new routers.

  • Initial configuration and environment review is included in each plan. This allows us to suggest config improvements and also to shorten resolution times (we can go directly to the issue, bypassing topology discovery)

  • We only employ people with real networking knowledge and do not do anything to artificially make response time appear shorter than it is (e.g. by sending a form reply). We do not offer guaranteed short response time for all plans, but you can be sure the first reply you get includes some useful information about your problem. For email support, relatively long response time also allows us to do some research about your problem, try a solution in a lab, or consult a maintainer or a contributor, if we cannot offer a useful response offhand.

From Cyber Monday until the end of the year, we’ll be running a promotion: everyone who buys a support subscription also gets free hands-on assistance hours. The basic plan comes with one free hour, the standard plan comes with four, and the production support plan comes with eight.

We know there are still a number of people running Vyatta Core who may want to switch to VyOS. Switching to a new system is always a concern, so if you are using Vyatta Core, you can use the hands-on assistance hours for migration to VyOS. With all the improvements and, most importantly, the security fixes that have been added, and will be added in 1.1.8 and then 1.2.0, upgrading is very important.

If you are running Vyatta Core 6.5 or 6.6, it can be upgraded to VyOS as if it was a new Vyatta version, with a few minor caveats. If your version is 6.4 or earlier, there are some features that may require manual config rewrite, which shouldn’t take too long unless your config is particularly large. We have successfully upgraded versions as old as 6.0 to VyOS without any problems. If, no matter how unlikely, you are still using VC 5.0, things get interesting, since it didn’t have the image upgrade yet, but even that is doable and has been done before (though if you can, we suggest that you reinstall and we can help with migrating the config).

Some VyOS servers are down

Hi everyone,

Due to a hardware issue our hoster is having, some of our servers are down. What's down is the wiki.vyos.net/forum.vyos.net host and the packages.vyos.net host.

The new website, vyos.io, is up, it's hosted in a different place. If you need to download VyOS images, you can use any of the mirrors, e.g. 0.de.mirrors.vyos.net.

VyOS Project October Update

Hi everyone,

The last two weeks were pretty busy, and we attended multiple events, so I decided to share what happened with all of you.

On behalf of Sentrium, I (Yuriy Andamasov, that is) and other Sentrium employees visited VMworld, the OpenStack Summit, and a meeting held by an ISP consortium known as Guifi in Barcelona this October.

Among other things, one of my goals was to find out if there’s any interest in VyOS in the enterprise and cloud markets. I’m excited to see that VyOS seems to be the best deal for many use cases. There are so many cool applications and technologies around: many of them require networking, and others provide networking functions and/or orchestration functions for large deployments.

However, many people are interested in integration with management tools and other software, so we came to the conclusion that it’s a good idea to provide integration with VMware NSX and OpenStack Neutron at least, as they are quite popular. Of course, we need your help with it. If you are interested, or even already working on it, feel free to contact us, come to Phabricator or the IRC channel, or contact me personally by email (yuriy@sentrium.io) or in the RocketChat (I’m syncer there).

I’ve also had a great conversation with Netronome people about their SmartNICs and VyOS, and about how we can collaborate on open source solutions that get the best of the software and hardware worlds. Hopefully this collaboration can bring us some truly impressive results.

As we realized, and strange as it seems to us, many people have just never heard of VyOS, and only some of them have heard something about Vyatta Core, so we are trying to raise awareness of it. That was one of the reasons I attended those events. We made some leaflets with information about the project, and I handed them out at all the events I attended. We’ll make the PDF and Scribus source files available soon so that you can print them for your own use if you are going to attend an event and promote VyOS there.

We also gave a talk about VyOS at the Guifi meeting. Guifi members are building a big community network here in Spain, and we think that VyOS is a perfect match for this effort. Later this year we are going to give a more detailed technical talk, discuss the challenges they are having, and see how VyOS can help them make their network better.

We hope this effort will help VyOS to acquire a wider community interested in open source networking. Remember that even if you are not a programmer, there are still lots of things you can help with, such as testing, feature design discussion, documentation, howtos, and just helping other community members on our forum or chat.

In other news, a few more designs were added to our merchandise store, and a few more will be available soon.

Now that this streak of community building events is over, we can all get back to working on the code.

Daniil (dmbaturin) is working on the 1.1.8 maintenance release and messing around with the nextgen VyOS prototype. The 1.2.0 beta still needs your testing, so please pick it up here and give it a try, but remember that it is still not production ready.



Informal meeting of VyOS users in Barcelona

If you are in Barcelona this October, we have some news for you.

There is a plan to organize an informal meeting of VyOS users and all interested in open source routing, roughly at the same time as VMworld and the OpenStack Summit. 

The current idea is October 17th at 19:00 for VMworld week and October 24th at 19:00 for OpenStack Summit week, though it's not set in stone yet.

Who's in charge

Yuriy Andamasov (syncer on #vyos) is the initiator and the primary contact.
On behalf of Sentrium, Yuriy and Santiago will take care of administrative questions.

Note: none of the maintainers are planning to attend, so if you want to discuss specific issues with the code and contributing to it, it's likely not the best place to ask.
Yuriy is one of VyOS infrastructure admins and the community manager, so feel free to discuss these issues with him. If you want to contribute to VyOS but don't know where to start, feel free to ask him and he can suggest some tasks to work on too.

What's planned

A walk around Barcelona with informal talks about VyOS, virtualization, networking, and other IT and non-IT subjects.
There is no plan to have a formal meeting with talks prepared in advance or anything else at this time, though if there is an interest in it, perhaps it's possible to arrange it.
The idea is for VyOS users, and everyone interested in VyOS and open source routing in general, to come together and socialize over sightseeing, food, etc. in the great city of Barcelona.

Want to participate?

Go to https://www.meetup.com/VyOS-Default-Route-Barcelona/events/234567678/, register, and leave your comments. If you want to sponsor laptop stickers, t-shirts, or anything else for the attendees, it will be appreciated. In exchange, we can put your company logo on the VyOS brochure that Yuriy will be distributing at VMworld and OpenStack Summit.

VyOS virtual meeting notes - 14 September 2016

We hosted our first VyOS virtual meeting in September and invited both developers and enthusiasts to attend. The meeting was held on September 14th at 18:00 UTC, and all in all about 11 participants joined. Yuriy Andamasov (syncer) brought this idea of a virtual meeting to fruition. Thank you, Yuriy!

Meeting summary:

VyOS 1.1.8 release
We discussed the general question of whether this should be a maintenance-only release or whether new features should be included. The community has readied a few new features which could easily be imported to this release. In general, past VyOS micro releases have included new features as long as they are safe, low risk changes. There was a lot of discussion about this topic mostly related to where developer time is best spent and whether making this a maintenance-only release would help justify more effort from the community to put towards v1.2. In the end we agreed that 1.1.8 will include backports of a few new features, but only where it's not a major headache or risk to do so.

Web GUI discussion
Mihail brought up the work he's been doing on a web GUI front-end for VyOS. His work can be found here:
https://github.com/mickvav/vyatta-webgui

https://github.com/mickvav/vyatta-accel-ppp

General consensus on a web GUI is that it's a nice to have, not a requirement at the moment for the project. We might look to integrate this at some point in the 1.2 future or beyond.

The move to Jessie (VyOS 1.2)

Here's where we are.  We have nightly builds for VyOS 1.2 based on Debian Jessie.  The original VyOS code base is challenging and there's no current automated testing system, so we need testers.  We agreed that one thing we need is visibility on what testing has been done so far.  If you have tested a 1.2 nightly build or would like to, please see this thread to view and get access to the testing matrix.

Jason Hendry mentioned a side project which some Mintel hackers had started on, using serverspec to automate tests of VyOS nightly builds.  On top of doing some manual testing of the nightly builds and contributing to the spreadsheet, he's going to look into getting the serverspec base pushed into CI.

Community Members Present

We had a lot of responses to the original phabricator thread.  Unfortunately not everyone could make it, and also a few people weren't able to join because we hit limits with maximum number of participants in Google Hangouts.  Next meeting we will try a different piece of technology.

  • Jason Hendry (jhendryUK)
  • Daniil Baturin (Dmbaturin)
  • Kim Hagen (UnicronNL)
  • Paul Fitzgerald
  • Michael Zimmerer (mtz4718)
  • Mihail Vasilev (mickvav)
  • Ewald van Geffen (Feedmytv)
  • Patrick van Staveren (trickv)
  • Yuriy Andamasov (syncer)
  • Bronislav Robenek (BillyTheCzech)
  • Amos Shapira
We took some meeting notes which are currently available only on Google Docs but will be centralized somewhere agreeable in the future.

Feedback & Next Meeting
If you would like to join the next meeting, please comment on Q55 in Phabricator to get yourself on the list.  Hope to see you there!

CVE-2016-5696, development meeting, and other things

TCP vulnerability and its fix

There was a vulnerability discovered in the implementation of TCP in Linux (CVE-2016-5696) that is remotely exploitable and allows an attacker to impersonate a connected user. For a hotfix on a live system, you can add sudo sysctl net.ipv4.tcp_challenge_ack_limit=1000000000 to /config/scripts/vyatta-postconfig-bootup.script. The fix will be included in the 1.1.8 release.
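To confirm that the hotfix is active after a reboot, you could use a small check like the one below, which reads the live value back from /proc. This is only a sketch: the helper names are hypothetical, and it simply compares against the value suggested above. It does not check your kernel version.

```python
SYSCTL_PATH = "/proc/sys/net/ipv4/tcp_challenge_ack_limit"
HOTFIX_VALUE = 1000000000  # the value used in the hotfix line above


def challenge_ack_limit(path=SYSCTL_PATH):
    """Read the current net.ipv4.tcp_challenge_ack_limit value."""
    with open(path) as f:
        return int(f.read().strip())


def hotfix_active(path=SYSCTL_PATH):
    """True if the live limit is at least the hotfix value."""
    return challenge_ack_limit(path) >= HOTFIX_VALUE
```

Calling `hotfix_active()` on the router should return True once the bootup script has run.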

Development meetup

Today at 18:00 UTC we will hold a development meeting, everyone is invited to join. It was discussed in phabricator (https://phabricator.vyos.net/Q41) originally, and most people voted for Google Hangouts, so that's what we'll use. I'm not fond of it myself, but so be it.

If you want to participate, fill the form here: https://goo.gl/forms/PBarp2bPWvAncQtJ2 so that we know your email address and can send you an invite.

It will be a semi-structured format, the agenda is the following:

  • 1.1.8 maintenance release and what backports to include in it
  • 1.2.0 and how to go about verifying that it actually works
  • VyOS 2.0 (the clean rewrite) design principles
  • Strategies for attracting more contributors

Rocket Chat

Lately we've set up Rocket Chat, an open source chat platform with some interesting features, including voice support, offline logs, and more. If it works well, we can use it for future development meetings, and possibly other things as well. If you want to give it a try, visit https://chat.vyos.io

 

1.2.0 beta2

People keep asking about 1.2.0-beta2. The truth is, if we build some image and call it 1.2.0-beta2, nothing will really change. You can always get the latest nightly build from http://dev.packages.vyos.net/iso/current/amd64/ and start testing it. Maybe when it's stable enough for at least non-critical production environments, we will call it beta2, but until then, nightly builds are really the better way to go.

VyOS shirts

A bit to our surprise, there was some interest in VyOS shirts (https://teespring.com/stores/vyos), but they also created a bit of confusion as to what it costs and where the profit goes.

The base cost of the shirt is around $15/EUR 10, so the "retail price" is some $5 higher than the base cost. Any profit from it will be used for things directly related to VyOS, such as domain renewals and the like (if the shirts somehow become popular enough to pay for anything else, we'll see what else we can do with it).

We need to come up with some convenient way to make the ledger public, so that everyone can see what we received and what it was used for.


VyOS remote management library for Python

Someone on Facebook rightfully noted that lately there's been more work on the infrastructure than development. This is true, but that work on infrastructure was long overdue and we just had to do it some time. There is even more work on the infrastructure waiting to be done, though it's more directly related to development, like restructuring the package repos.

Anyway, it doesn't mean all development has stopped while we've been working on infrastructure. Today we released a Python library for managing VyOS routers remotely.

Before I get to the details, have a quick example of what using it is like:

import vymgmt

vyos = vymgmt.Router('192.0.2.1', 'vyos', password='vyos', port=22)

vyos.login()
vyos.configure()

vyos.set("protocols static route 203.0.113.0/25 next-hop 192.0.2.20")
vyos.delete("system options reboot-on-panic")
vyos.commit()

vyos.save()
vyos.exit()
vyos.logout()

If you want to give it a try, you can install it from PyPI ("pip install vymgmt"), it's compatible with both Python 2.7 and Python 3. You can read the API reference at http://vymgmt.readthedocs.io/en/latest/ or get the source code at https://github.com/vyos/python-vyos-mgmt .

Now to the details. This is not a true remote API, the library connects to VyOS over SSH and sends commands as if it was a user session. Surprisingly, one of the tricky parts was to find an SSH/expect library that can cope with VyOS shell environment well, and is compatible with both 2.7 and 3. All credit for this goes to our contributor who goes by Hochikong, who tried a whole bunch of them, settled with pexpect and wrote a prototype.

How is the library better than using pexpect directly, if it's a rather thin wrapper for it? First, it's definitely more convenient to just call set(), delete(), or commit() than to format command strings yourself and take care of sending and receiving lines.

Second, common error conditions are detected (through simple regex matching) and raise appropriate exceptions, such as ConfigError for set/delete failures or CommitError for commit errors. There's also a special ConfigLocked exception (a subclass of CommitError), raised when a commit fails because another commit is in progress, so you can recover from it by calling sleep() and retrying. This may seem uncommon, but people who use VRRP transition scripts and the like on VyOS have already reported running into it.
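The sleep-and-retry recovery from ConfigLocked can be wrapped in a small helper. This is a sketch, not part of the library: retry_on() and its parameters are hypothetical. It accepts any exception class, so it can be tested without a router, and the import path for ConfigLocked in the usage comment is an assumption to check against the API reference.

```python
import time


def retry_on(exc_type, action, attempts=5, delay=2.0, sleep=time.sleep):
    """Call action(); if it raises exc_type, sleep and retry, up to `attempts` calls.

    The last failure is re-raised, so the caller still sees the error.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exc_type:
            if attempt == attempts:
                raise
            sleep(delay)


# Intended usage with vymgmt (the exact import path of ConfigLocked is an assumption):
#   retry_on(ConfigLocked, vyos.commit, attempts=10, delay=1.0)
```

Catching ConfigLocked specifically, rather than its parent CommitError, means genuine commit errors are raised immediately instead of being retried.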

Third, the library is aware of the state machine of VyOS sessions and will not let you accidentally do wrong things, such as entering set/delete commands before entering conf mode. By default it also doesn't let you exit configure sessions if there are uncommitted or unsaved changes, though you can override it. If a timeout occurs, an exception will be raised too (while pexpect returns False in this case).
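To make this kind of session state tracking concrete, here is a toy model of the rules just described. It is not the library's code. The Session class, its flags, and the force parameter are illustrative assumptions.

```python
class SessionStateError(Exception):
    """Raised when a command is not valid in the current session state."""


class Session:
    """Toy model: login -> configure -> set/delete -> commit -> save -> exit."""

    def __init__(self):
        self.logged_in = False
        self.conf_mode = False
        self.uncommitted = False
        self.unsaved = False

    def login(self):
        self.logged_in = True

    def configure(self):
        if not self.logged_in:
            raise SessionStateError("not logged in")
        self.conf_mode = True

    def _require_conf_mode(self, command):
        if not self.conf_mode:
            raise SessionStateError(command + " is only valid in conf mode")

    def set(self, path):
        self._require_conf_mode("set")
        self.uncommitted = True

    def delete(self, path):
        self._require_conf_mode("delete")
        self.uncommitted = True

    def commit(self):
        self._require_conf_mode("commit")
        if self.uncommitted:
            self.unsaved = True
        self.uncommitted = False

    def save(self):
        self._require_conf_mode("save")
        self.unsaved = False

    def exit(self, force=False):
        self._require_conf_mode("exit")
        if not force and (self.uncommitted or self.unsaved):
            raise SessionStateError("uncommitted or unsaved changes")
        self.conf_mode = False
        self.uncommitted = False  # a forced exit discards uncommitted changes
```

Keeping the state in explicit flags turns every illegal transition into an immediate exception, instead of a confusing shell error that surfaces somewhere down the line.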

Of all the high-level methods, right now it only supports set, delete, and commit. This should be enough for a start, but if you want something else, there are generic methods for running op and conf mode commands (run_op_mode_command() and run_conf_mode_command(), respectively). We are not sure what people want most, so what we implement depends on your requests and suggestions (and pull requests, of course!). Other planned things that aren't there yet are SSH public key auth and top-level words other than set and delete (rename, copy, etc.). We are not sure if commit-confirm is really friendly to programmatic access, but if you have any ideas on how to handle it, share them with us.

On an unrelated note, syncer and his graphics designer friend made a design for VyOS t-shirts. If anyone buys that stuff, the funds will be used for the project needs. The base cost is around 20 eur, but you can get them with 15% discount by using VYOSMGTLIB promo code: https://teespring.com/stores/vyos?source=blog&pr=VYOSMGTLIB

The new website is now live

Hi everyone,

The new website is now live. There are still some rough edges (typos, odd links, odd wording etc.), if you find anything like this, let us know. But overall it does what it's supposed to do, tells newcomers what VyOS is and provides quick links to downloads and other resources to existing users.

Here's an explanation of what happened exactly. Before that, vyos.net and vyos.org were pointing at the wiki host, and the wiki main page used to serve as our primary website. It means there are quite a few links like http://vyos.net/wiki/Something on the net, and simply pointing that domain at the host with the new website would create quite some link rot.

To avoid this, we've set up two conditional redirects: one redirects vyos.net/wiki/Something to wiki.vyos.net/wiki/Something, and the other redirects everything else to vyos.io. So far it seems to work properly, but if you notice any issues with it, such as links that are not redirected correctly, let us know.
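The two redirect rules can be written as a small function, which also makes them easy to test. This only models the logic described above; it is not the actual web server configuration. The https scheme for the targets, the treatment of vyos.org and www. hostnames, and sending non-wiki URLs to the vyos.io front page are all assumptions.

```python
from urllib.parse import urlsplit

# Assumption: both old domains, with or without www., get the same treatment.
OLD_HOSTS = {"vyos.net", "www.vyos.net", "vyos.org", "www.vyos.org"}


def redirect_target(url):
    """Return where an old-site URL should redirect to, or None if it isn't ours."""
    parts = urlsplit(url)
    if parts.hostname not in OLD_HOSTS:
        return None
    if parts.path.startswith("/wiki/"):
        # Keep the page path (and query) so existing wiki links don't rot.
        query = "?" + parts.query if parts.query else ""
        return "https://wiki.vyos.net" + parts.path + query
    # Everything else goes to the new website.
    return "https://vyos.io/"
```

Checking the wiki prefix first matters: if the catch-all rule came first, every old wiki link would land on the new front page.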

P.S. We added some merch (not much, but we will add more soon) to our store. Please check it out at https://teespring.com/stores/vyos