VyOS 2.0 development digest #9: socket communication functionality, complete parser, and open tasks

Socket communication

A long-awaited (by me, anyway ;) milestone: VyConf is now capable of communicating with clients. This allows us to write a simple non-interactive client. Right now the only supported operaion is "status" (a keepalive of sorts), but the list will be growing.

I guess I should talk about the client before going into technical details of the protocol. The client will be way easier to use than what we have now. Two main problems with CLI tools from VyOS 1.x is that my_cli_bin (the command used by set/delete operations) requires a lot of environment setup, and that cli-shell-api is limited in scope. Part of the reason for this is that my_cli_bin is used in the interactive shell. Since the interactive shell of VyConf will be a standalone program rather than a bash completion hack, we are free to make the non-interactive client more idiomatic as a shell command, closer in user experience to git or s3cmd.

This is what it will look like:

SESSION=$(vycli setupSession)
vycli --session=$SESSION configure
vycli --session=$SESSION set "system host-name vyos"
vycli --session=$SESSION delete "system name-server 192.0.2.1"
vycli --session=$SESSION commit
vycli --session=$SESSION exists "service dhcp-server"
vycli --session=$SESSION commit returnValue "system host-name"
vycli --session=$SESSION --format=json show "interfaces ethernet"

As you can see, first, the top level words are subcommands, much like "git branch". Since the set of top level words is fixed anyway, this doesn't create new limitations. Second, the same client can execute both high level set/delete/commit operations and low level exists/returnValue/etc. methods. Third, the only thing it needs to operate is a session token (I'm thinking that unless it's passed in --session option, vycli should try to get it from an environment variable, but we'll see, let me know what you think about this issue). This way contributors will get an easy way to test the code even before interactive shell is complete; and when VyOS 2.0 is usable, shell scripts and people fond of working from bash rather than the domain-specific shell will have access to all system functions, without worrying about intricate environment variable setup.

The protocol

As I already said in the previous post, VyConf uses Protobuf for serialized messages. Protobuf doesn't define any framing, however, so we have to come up with something. Most popular options are delimiters and length headers. The issue with delimiters is that you have to make sure they do not appear in user input, or you risk losing a part of the message. Some programs choose to escape delimiters, other rely on unusual sequences, e.g. the backend of OPNSense uses three null bytes for it. Since Protobuf is a binary protocol, no sequence is unusual enough, so length headers look like the best option. VyConf uses 4 byte headers in network order, that are followed by a Protobuf message. This is easy enough to implement in any language, so it shouldn't be a problem when writing bindings for other languages.

The code

There is a single client library that can be used by all of the non-interactive client and the interactive shell. It will also serve as the OCaml bindings package for VyConf (Python and other languages wil need their own bindings, but with Protobuf, most of it can be autogenerated).

Parser improvements

Inactive and ephemeral nodes

The curly config parser is now complete. It supports the inactive and ephemeral properties. This is what a config with those will look like:

protocols {
  static {
    /* Inserted by a fail2ban-like script */
    #EPHEMERAL route 192.0.2.78/32 {
      blackhole;
    }
    /* DIsabled by admin */
    #INACTIVE route 203.0.113.128/25 {
      next-hop 203.0.113.1;
    }
  }
}

While I'm not sure if there are valid use cases for it, nodes can be inactive and ephemeral at the same time. Deactivating an ephemeral node that was created by scritp perhaps? Anyway, since both are a part of the config format that the "show" command will produce, we get to support both in the parser too.

Multi nodes

By multi nodes I mean nodes that may have more than one value, such as "address" in interfaces. As you remember, I suggested and implemented a new syntax for such nodes:

interfaces {
  ethernet eth0 {
    address [
      192.0.2.1/24;
      192.0.2.2/24;
    ];
  }
}

However, the parser now supports the original syntax too, that is:.

interfaces {
  ethernet eth0 {
    address 192.0.2.1/24;
    address 192.0.2.2/24;
  }
}

I didn't intend to support it originally, but it was another edge case that prompted me to add it. For config read operations to work correctly, every path in the tree must be unique. The high level Config_tree.set function maintains this invariant, but the parser gets to use lower level primitives that do not, so if a user creates a config with duplicate nodes, e.g. by careless pasting, the config tree that the parser returns will have them too, so we get to detect such situations and do something about it. Configs with duplicate tag nodes (e.g. "ethernet eth0 { ... } ethernet eth0 { ... }") are rejected as incorrect since there is no way to recover from this. Multiple non-leaf nodes with distinct children (e.g. "system { host-name vyos; } system { name-server 192.0.2.1; }") can be merged cleanly, so I've added some code to merge them by moving children of subsequent nodes under the first on and removing the extra nodes afterwards. However, since in the raw config there is no real distinction between leaf and non-leaf nodes, so in case of leaf nodes that code would simply remove all but the first. I've extended it to also move values into the first node, which equates support for the old syntax, except node comments and inactive/ephemeral properties will be inherited from the first node. Then again, this is how the parser in VyOS 1.x behaves, so nothing is lost.

While the show command in VyOS 2.0 will always use the new syntax with curly brackets, the parser will not break the principle of least astonishment for people used to the old one. Also, if we decide to write a migration utility for converting 1.x configs to 2.0, we'll be able to reuse the parser, after adding semicolons to the old config with a simple regulat expression perhaps.

Misc

Node names and unquoted values now can contain any characters that are not reserved, that is, anything but whitespace, curly braces, square brackets, and semicolons.

What's next?

Next I'm going to work on adding low level config operations (exists/returnValue/...) and set commands so that we can do some real life tests.

There's a bunch of open tasks if you want to join the development:

T254 is about preventing nodes with reserved characters in their names early in the process, at the "set" time. There's a rather nasty bug in VyOS 1.1.7 related to this: you can pass a quoted node name with spaces to set and if there is no validation rule attached to the node, as it's with "vpn l2tp remote-access authentication local-users", the node will be created. It will fail to parse correctly after you save and reload the config. We'll fix it in 1.2.0 of course, but we also need to prevent it from ever appearing in 2.0 too.

T255 is about adding the curly config renderer. While we can use the JSON serializer for testing right now, the usual format is also just easier on the eyes, and it's a relatively simple task too.

VyOS 2.0 development digest #8: vote for or against the new tag node syntax, and the protobuf schema

Tag node syntax

The change in tag node format I introduced in the previous post turned out quite polarizing and started quite some discussion in the comments. I created a poll in phabricator for it: https://phabricator.vyos.net/V3 , please cast your vote there.

If you missed the post, or found the explanation confusing, here's what it's all about. Right now in config files we format tag nodes (i.e. nodes that can have children without predefined names, such as interfaces and firewall rules) differently from other nodes:

/* normal node */
interfaces {
  /* tag node */
  ethernet eth0 {
    address 192.0.2.1/24
  }
  /* tag node */
  ethernet eth1 {
    address 203.0.113.1/24
  }
}

It looks nice, but complicates the parser. What I proposed and implemented in the initial parser draft is to not use any custom formatting for tag nodes:

/* normal node */
interfaces {
  /* actually a tag node, but rendering is now the same as for normal */
  ethernet {
    eth0 {
      address 192.0.2.1/24;
    }
    eth1 {
     address 203.0.113.1/24
    }
  }
}

This makes the parser noticeable simpler, but makes the syntax more verbose and adds more newlines.

If more people vote against this change than for it, I'll take time to implement it in the parser.

Note: This change only affects the config syntax, and has no effect on the command syntax. The command for the example above would still be "set interfaces ethernet eth0 address 192.0.2.1/24", in user input and in the output of "show configuration commands". Tag nodes will also be usable as edit levels regardless of the config file syntax, as in "edit interfaces tunnel; copy tun0 to tun1".

Protobuf schema

Today I wrote an initial draft of the protobuf schema that VyConf daemon will use for communication with clients (shell, CLI tool, and HTTP bridge). You can find it here: https://github.com/vyos/vyconf/blob/master/data/vyconf.proto

Right now it defines the following operations:

VyOS 2.0 development digest #7: Python coding guidelines for config scripts in 1.2.0, and config parser for VyConf

Python coding guidelines for 1.2.0

In some previous post I was talking about the Python wrapper for the config reading library. However, simply switching to a language that is not Perl will not automatically make that code easy to move to 2.0 when the backend is ready, neither it will automatically improve the design and architecture. It will also improve the code in general, and help keeping it readable and maintainable.

You can find the document here: http://wiki.vyos.net/wiki/Python_config_script_policy

In short:

Logic for config validation, generating configs, and changing system settings/restarting services must be completely separated
For any configs that allow nesting (dhcpd.conf, ipsec.conf etc.) template processor must be used (as opposed to string concatenation)
Functions should not randomly output anything to stdout/stderr
Code must be unit-testable

Config parser for VyConf/VyOS 2.0

Today I pushed initial implementation of the new config lexer and parser. It already supports nodes and node comments, but doesn't support node metadata (that will be used to mark inactive and ephemeral nodes).

You can read the code (https://github.com/vyos/vyconf/blob/master/src/curly_lexer.mll and https://github.com/vyos/vyconf/blob/master/src/curly_parser.mly) and play with it by loading the .cma's into REPL. Next step is to add config renderer. Once the protobuf schema is ready we can wrap it all into a daemon and finally have something to really play with, rather than just run the unit tests.

Informally, here's what I changed in the config syntax.

Old config

interfaces {
  /* WAN interface */
  ethernet eth0 {
    address 192.0.2.1/24
    address 192.0.2.2/24
    duplex auto
  }
}

New config

interfaces {
  ethernet {
    /* WAN interface */
    eth0 {
      address [
        192.0.2.1/24;
        192.0.2.2/24;
      ];
      duplex auto;
      // This kind of comment is ignored by the parser
    }
  }
}

As you can see, the changes are:

Leaf nodes are now terminated by semicolons rather than newlines.
There is syntax for comments that are ignored by the parser
Multi nodes have the array of values in square brackets.
Tag nodes do not receive any special formatting.

I suppose the last change may be controversial, because it can lead to somewhat odd-looking constructs like:

interfaces {
  ethernet {
    eth0 {
      vif {
        21 {
          address 192.0.2.1/24
        }
      }
    }
  }
}

If you are really going to miss the old approach to tag nodes (that is "ethernet eth0 {" as opposed to "ethernet { eth0 { ...", let me know and I guess I can come up with something. The main difficulty is that, while this never occurs in configs VyOS config save produces, different tag nodes, e.g. "interfaces ethernet" and "interfaces tunnel" can be intermingled, so for parsing we have to track which ones were already created, and this will make the parser code a lot longer.

I'm pretty convinced that "address 192.0.2.1/24; address 192.0.2.2/24" is simply visual clutter and JunOS-like square bracket syntax will make it cleaner. It also solves the aforementioned problem with interleaved nodes tracking for leaf nodes.

Let me know what you think about the syntax.

VyOS 2.0 development digest #6: new beginner-friendly tasks, design questions, and the details of the config tree

The tasks

Both tasks from the previous post have been taken up and implemented by Phil Summers (thanks, Phil!). New tasks await.

First task was very simple: the Reference_tree module needs functions for checking facts about nodes, analogous to is_multi. For config output, and for high level set/delete/commit operations we need easy ways to know if the node is tag or leaf, or valueless, what component is responsible for it etc. It can be done mostly by analogy with is_multi function and its relatives, so it's friendly to complete beginners. But Phil Summers implemented it before I could make the post (thanks again, Phil!).

Second task is a little bit more involved but still simple enough for anyone who started learning ML not long ago. It's about loading interface definitions from a directory. In VyOS, we may have a bunch of files in /usr/share/vyos/interfaces such as firewall.xml, system.xml, ospf.xml, and so on, and we need to load them into the reference tree that is used for path validation, completion etc.

Design questions

To give you some context, I'll remind you that the vyconf shell will not be bash-based, due to having to fork and modify bash (or any other UNIX shell) to get completion from the first word to begin with, and for variety of other reasons. So, first question: do you think we should use the vyconf shell where you can enter VyOS configuration commands as login shell, or we should go for JunOS-like approach when you login to a UNIX shell and then issue a command to enter the configuration shell? You can cast your vote here: https://phabricator.vyos.net/V2

Second question is more open-ended: we are going to printing the config as the normal VyOS config syntax, and as set commands, but what else should we support? Some considerations: since "show" will be a part of the config API, it can be used by e.g. web GUI to display the config. This means config output of XML or JSON can be a useful thing. But, which one, or perhaps both? And also we need to decide what the XML and/or JSON shouid look like, since we can go for a generic schema that keeps node names in attributes, or we can use custom tags such as <interfaces> (but then every component should provide a schema).

Now, to the "long-awaited" details of the config tree...

VyOS 2.0 development digest #4: simple tasks for beginners, and the reasons to learn (and use) OCaml

Look, I lied again. This post is still not about the config and reference tree internals. People in the chat and elsewhere started showing some interest in learning OCaml and contributing to VyConf, and Hiroyuki Satou even already made a small pull request (it's regarding build docs rather than the code itself, but that's a good start and will help people with setting up the environment), so I decided to make a post to explain some details and address common concerns.

The easy tasks

There are a couple of tasks that can be done completely by analogy, so they are good for getting familiar with the code and making sure your build environment actually works.

The first one is about new properties of config tree nodes, "inactive" and "ephemeral", that will be used for JunOS-like activate/deactivate functionality, and for nodes that are meant to be temporary and won't make it to the saved config respectively.

The other one is about adding "hidden" and "secret" properties to the command definition schema and the reference tree, "hidden" is meant for hiding commands from completion (for feature toggles, or easter eggs ;), and "secret" is meant to hide sensitive data from unpriviliged users or for making public pastes.

Make sure you reference the task number in your commit description, as in "T225: add this and that" so that phabricator can automatically connect the commits with the task.

If you want to take them up and need any assistance, feel free to ask me in phabricator or elsewhere.

VyOS 2.0 development digest #3: questions for you, vyconf daemon config format, appliance directory structure, and external validators lookup

Ok, I changed my mind: before we jump into the abyss of datastructures and look how the config and reference trees work, I'll describe the changes I made in the last few days.

Also, I'd like to make it clear that if you don't respond to design questions I state in these posts, I, or whoever takes up the task, will just do what they think is right. ;)

I guess I'll start with the questions this time. First, in the comments to the first post, the person who goes by kglkgvkvsd544 suggested two features: commit-confirm by default, and an alternative solution to the partial commit where instead of loading the failsafe config, the system loads a previous revision instead, in the same way as commit-confirm does. I don't think any of those should be the only available option, but having them as configurable options may be a good idea. Let me know what you think about it.

Another very important question: we need to decide on the wire protocol that vyconf daemon will use for communication with its clients (the CLI tool, the interactive shell, and the HTTP bridge). I created a task for it, https://phabricator.vyos.net/T216, let me know what you think about it. I'm inclined towards protobuf myself.

Now to the recent changes.

vyconfd config

As I already said, VyConf is supposed to run as a daemon and keep the running config tree, the reference tree, and the user sessions in memory. Obviously, it needs to know where to get that data. It also needs to know where to write logs, where the PID file and socket file should be, and other things daemons are supposed to know. Here's what the vyconf config for VyOS may look like:

[appliance]
name = "VyOS"

data_dir = "/usr/share/vyos/"
program_dir = "/usr/libexec/vyos"
config_dir = "/etc/vyos"

# paths relative to config_dir
primary_config = "config.boot"
fallback_config = "config.failsafe"

[vyconf]
socket = "/var/run/vyconfd.sock"
pid_file = "/var/run/vyconfd.pid"
log_file = "/var/log/vyconfd.log"

That INI-like language is called TOML, it's pretty well specified and capable of some fancy stuff like arrays and hashes, apart from simple (string, string) pairs. The best part is that there are libraries for parsing it for many languages, and the one for OCaml is particularly nice and idiomatic (it uses lenses and option types to access a nested and possibly underdefined datastructure in a handy and typesafe way), like:

let pid_file = TomlLenses.(get conf (key "vyconf" |-- table |-- key "pid_file" |-- string)) in
match pid_file with
| Some f -> f
| None -> "/var/run/vyconfd.pid"

The config format and this example reflects some decisions. First, the directory structure if more FHS-friendly. What do we need /opt/vyos for, if FHS already has directories meant for exactly what we need: architecture-independent data (/usr/share), programs called by other programs and not directly by users (/usr/libexec), and config files (/etc)?
Second, all important parameters are configurable. The primary motivation for this is making VyConf usable for every appliance developer (most appliances will not be called VyOS for certain), but for ourselves and for every contributor to VyOS it's a reminder to avoid hardcoded paths anywhere: if changing it is just one line edit away, a hardcoded path is an especially bad idea.

Here's the complete directory structure I envision:

$data_dir/
  interfaces/ # interface definitions
  components/ # Component definitions
$program_dir/
  components/  # Scripts/programs that verify the appliance config, generate actual configs, and apply them
  migration/       # Migration scripts (convert appliance config if syntax changes)
  validators/       # Value validators
$config_dir/
  scripts/              # User scripts
  post-config.d/   # Post-config hooks, like vyatta-postconfig-bootup.script
  pre-commit.d/   # Pre-commit hooks
  post-commit.d/ # Post-commit hooks
  archive/             # Commit archive

This is not an exhaustive list of directories an appliance can have of course, it's just directories that have any significance for VyConf. I'm also wondering if we should introduce post-save hooks for those who want to do something beyond the built-in commit archive functionality.

External validators

As you remember, my idea is to get rid of the inflexible system of built-in "types" and make regex the only built-in constraint type, and use external validators for everything else.

External validators will be stored in the $program_dir/validators. Since many packages use the same types of values (IP addresses is a common example), and in VyOS 1.x we already have quite a lot of templates that reference the same validation scripts, making it a separate entity will simplify reusing them, since it's easy to see what validators exist, and you can also be sure that they behave like you expect (or if they don't, it's a bug).
Validator executable take two arguments, first argument is constraint string, and the second is the value to be validated (e.g. range "1-99" "42").The expected behaviour is to return 0 if the value is valid and a non-zero exit code otherwise. Validators should not produce any output, instead the user will get the message defined in the constraintError tag of the interface definition (this approach is more flexible since different things may want to use different messages, e.g. to clarify why exactly the value is not valid).

That's all for today. Feel free to comment and ask questions. The next post really will be about the config tree and the way set commands work, stay tuned.

VyOS 2.0 development digest #2

In the previous post we talked about the reasons for rewrite, design and implementation issues, and basic ideas. Now it's time to get to details. Today we'll mostly talk about command definitions, or rather interface definitions, since set commands is just one way to access the configuration interface.

Let's review the VyConf architecture (I included a few things in the diagram that we haven't discussed yet, ignore them for now):

At startup, VyConf will load the main config (or the fallback config, if that fails). But to know whether the config is valid, and to know what programs to call to actually configure the target applications, it needs additional data. We'll call that data "interface definitions" since it defines the configuration interface. Specifically, it defines:

What config nodes (paths) are allowed (e.g. "interfaces ethernet", or "protocols ospf")
What values are valid for that nodes (e.g. any IPv4 or IPv6 address for "system name-server")
What script/program should be called when this or that part of the config is changed

The old way

Before we get to the new way, let's review the old way, the way it's one in the current VyOS implementation. In the current VyOS, those definitions are called "templates", no one remembers why.

This is a typical template file:

vyos@vyos# cat /opt/vyatta/share/vyatta-cfg/templates/interfaces/ethernet/node.tag/speed/node.def 
type: txt
help: Link speed
default: "auto"
syntax:expression: $VAR(@) in "auto", "10", "100", "1000", "2500", "10000"; "Speed must be auto, 10, 100, 1000, 2500, or 10000"
allowed: echo auto 10 100 1000 2500 10000

commit:expression: exec "\
	/opt/vyatta/sbin/vyatta-interfaces.pl --dev=$VAR(../@) \
	--check-speed $VAR(@) $VAR(../duplex/@)"

update: if [ ! -f /tmp/speed-duplex.$VAR(../@) ]; then
	   /opt/vyatta/sbin/vyatta-interfaces.pl --dev=$VAR(../@) \
	   	--speed-duplex $VAR(@) $VAR(../duplex/@)
	   touch /tmp/speed-duplex.$VAR(../@)
	fi

val_help: auto; Auto negotiation (default)
val_help: 10; 10 Mbit/sec
val_help: 100; 100 Mbit/sec
val_help: 1000; 1 Gbit/sec
val_help: 2500; 2.5 Gbit/sec
val_help: 10000; 10 Gbit/sec

We can spot a few issues with it already. First, the set of definitions is a huge directory tree where each directory represents a config node, e.g. "interfaces ethernet address" will be "interfaces/ethernet/node.tag/address/node.def". This makes them hard to navigate, you need to read a lot of small files to get the whole picture, and you need to do a lot of file/directory hopping to edit them. Now, try mass editing, or checking for mistakes before release...

Next, they use a custom syntax, which needs custom lexer and parser, and custom documentation (in practice, the source is your best bet, though I did write something of a syntax reference). No effortless parsing, no effortless analysis or transformation either.

Next, the value checking has its peculiarities. There is concept of type (it can be txt, u32, ipv4, ipv4net, ipv6, ipv6net, and macaddr), and there is "syntax:expression:" thing which partially duplicate each other. The types are hardcoded in the config backend and cannot be added without modifying it, even though they only define validation procedure and are not used in any other way. The "syntax:expression:" can be either "foo in bar baz quux", or "pattern $regex", or "exec $externalScript".

But, the "original sin" of those files is that they allow embedded shell scripts, as you can see. Mixing data with logic is rarely a good idea, and in this case it's especially annoying because a) you cannot test such code other than on a live system b) you have to read every single file in a package to get the complete picture, since any of them may have embedded shell.

Now to the new way.

VyOS 2.0 development digest #1

I keep talking about the future VyOS 2.0 and how we all should be doing it, but I guess my biggest mistake is not being public enough, and not being structured enough.

In the early days of VyOS, I used to post development updates, which no one would read or comment upon, so I gave up on it. Now that I think of it, I shouldn't have expected much as the size of the community was very small at the time, and there were hardly many people to read it in the first place, even though it was a critical time for the project, and input from the readers would have been very valuable.

Well, this is a critical time for the project too, and we need your input and your contributions more than ever, so I need to get to fixing my mistakes and try to make it easy for everyone to see what's going on and what we need help with.

Getting a steady stream of contributions is a very important goal. While the commercial support thing we are doing may let the maintainers focus on VyOS and ensure that things like security fixes and release builds get guaranteed attention in time, without occasional contributors who add things they personally need (while maintainers may not, I think myself I'm using maybe 30% of all VyOS features any often) the project will never realize its full potential, and may go stale.

But to make the project easy to manage and easy to contribute to, we need to solve multiple hard problems. It can be hard to get oneself to do things that promise no immediate returns, but if you looks at it the other way, we have a chance to build a system of our dreams together. As of 1.1.x and 1.2.x (the jessie branch), we'll figure it out how to maintain it until we solve those problems, but that's for another post. Right now we are talking about VyOS 2.0, which gets to be a cleanroom rewrite.

Why VyOS isn't as good as it could be, and can't be improved

I considered using "Why VyOS sucks" to catch reader's attention. It's a harsh word, and it may not be all that true, given that VyOS in its current state is way ahead of many other systems that don't even have system-wide config consistency checks, or revisions, or safe upgrades, but there are multiple problems that are so fundamental that they are impossible to fix without rewriting at least a very large part of the code.

I'll state the design problems that cannot be fixed in the current system. They affect both end users and contributors, sometimes indirectly, but very seriously.

Design problem #1: partial commits

You've seen it. You commit, there's an error somewhere, and one part of the config is applied, while the other isn't. Most of the time it's just a nuisance, you fix the issue and commit again, but if you, say, change interface address and firewall rule that is supposed to allow SSH to it, you can get locked out of your system.

The worst case, however, is when commit fails at boot. While it's good to have SSH at least, debugging it can be very frustrating, when something doesn't work, and you have no idea why, until you inspect the running config and see that something is simply missing (if you run into it in VyOS 1.x, do "load /config/config.boot" and commit, this will either work or show you why it failed). It's made worse by lack of notifications about config load failure for remote users, you can only see that error on the console.

The feature that can't be implemented due to it is what goes by "commit check" in JunOS. You can't test if your configuration will apply cleanly without actually commiting it.

It's because in the scripts, the logic for consistency checking and generating real configs (and sometimes applying them too) is mixed together. Regardless of the backend issues, every script needs to be taken apart and rewritten to separate that logic. We'll talk more about it later.

Design problem #2: read and write operations disparity

Config reads and writes are implemented in completely different ways. There is no easy programmatic API for modifying the config, and it's very hard to implement because binaries that do it rely on specific environment setup. Not impossible, but very hard to do right, and to maintain afterwards.

This blocks many things: network API and thus an easy to implement GUI, modifying the config script scripts in sane ways (we do have the script-template which does the trick, kinda, but it could be a lot better).

Design problem #3: internal representation

Now we are getting to really bad stuff. The running config is represented as a directory tree in tmpfs. If you find it hard to believe, browse /opt/vyatta/config/active, e.g. /opt/vyatta/config/active/system/time-zone/node.val

Config levels are directories, and node values are in node.val files. For every config session, a copy of the active directory is made, and mounted together with the original directory in union mount through UnionFS.

There are lots of reasons why it's bad:

It relies on behaviour of UnionFS, OverlayFS or another filesystem won't do. We are at mercy of unionfs-fuse developers now, and if they stop maintaining it (and I can see why they may, OverlayFS has many advantages over it), things will get interesting for us
It requires watching file ownership and permissions. Scripts that modify the config need to run as vyattacfg group, and if you forget to sg, you end up with a system where no one but you (or root) can make any new commits, until you fix it by hand or reboot
It keeps us from implementing role-based access control, since config permissions are tied to UNIX permissions, and we'd have to map it to POSIX ACLs or SELinux and re-create those access rules at boot since the running config dir is populated by loading the config
For large configs, it creates a fair amount of system calls and context switches, which may make system run slower than it could

Design problem #3: rollback mechanism

Due to certain details (mostly handling of default values), and the way config scripts work too, rollback cannot be done without reboot. Same issue once made Vyatta developers revert activate/deactivate feature.

It makes confirmed commit a lot less useful than it should be, especially in telecom where routers cannot be rebooted at random even in maintenance windows.

Implementation problem #1: untestable logic

We already discussed it a bit. The logic for reading the config, validating it, and generating application configs is mixed in most of the scripts. It may not look like a big deal, but for the maintainers and contributors it is. It's also amplified by the fact that there is not way to create and manipulate configs separately, the only way you can test anything is to build a complete image, boot it, and painstakingly test everything by hand, or have expect-like tool emulate testing it by hand.

You never know if your changes may possibly work until you get them to a live system. This allows syntax errors in command definitions and compilation errors in scripts to make it into builds, and it make it into a release more than one time when it wasn't immediately apparent and only appread with certain combination of options.

This can be improved a lot by testing components in isolation, but this requires that the code is written in appropriate way. If you write a calculator and start with add(), sub(), mul() etc. functions, and use them in a GUI form, you can test the logic on its own automatically, e.g. does add(2,3) equal 5, and does mul(9, 0) equal 0, does sqrt(-3) raise an exception and so on. But if you embed that logic in button event handlers, you are out of luck. That's how VyOS is for the most part, even if you mock the config subsystem so that config read functions return the test data, you need to redo the script so that every function does exactly one thing testable in isolation.

This is one of the reasons 1.2.0 is taking so long, without tests, or even ability to add them, we don't even know what's not working until we stumble upon it in manual testing.

Implementation problem #2: command definitions

This is a design problem too, but it's not so fundamental. Now we use custom syntax for command definitions (aka "templates"), which have tags such as help: or type: and embedded shell scripts. There are multiple problem with it. For example, it's not so easy to automatically generate at least a command reference from them, and you need a complete live system for that, since part of the templates is autogenerated. The other issue is that right now some components feature very extensive use of embedded shell, and some things are implemented in embedded shell scripts inside templates entirely, which makes testing even harder than it already is.

We could talk about upgrade mechanism too, but I guess I'll leave it for another post. Right now I'd like to talk about proposed solutions, and what's being done already, and what kind of work you can join.

blog.vyos.net

VyOS Blog - An open source operating system for routers and firewalls

Tag vyconf

VyOS 2.0 development digest #9: socket communication functionality, complete parser, and open tasks

Socket communication

The protocol

The code

Parser improvements

Inactive and ephemeral nodes

Multi nodes

Misc

What's next?

VyOS 2.0 development digest #8: vote for or against the new tag node syntax, and the protobuf schema

Tag node syntax

Protobuf schema

VyOS 2.0 development digest #7: Python coding guidelines for config scripts in 1.2.0, and config parser for VyConf

Python coding guidelines for 1.2.0

Config parser for VyConf/VyOS 2.0

Old config

New config

VyOS 2.0 development digest #6: new beginner-friendly tasks, design questions, and the details of the config tree

The tasks

Design questions

VyOS 2.0 development digest #4: simple tasks for beginners, and the reasons to learn (and use) OCaml

The easy tasks

VyOS 2.0 development digest #3: questions for you, vyconf daemon config format, appliance directory structure, and external validators lookup

vyconfd config

External validators

VyOS 2.0 development digest #2

The old way

VyOS 2.0 development digest #1

Why VyOS isn't as good as it could be, and can't be improved

Design problem #1: partial commits

Design problem #2: read and write operations disparity

Design problem #3: internal representation

Design problem #3: rollback mechanism

Implementation problem #1: untestable logic

Implementation problem #2: command definitions