VyOS 2.0 development digest #6: new beginner-friendly tasks, design questions, and the details of the config tree

The tasks

Both tasks from the previous post have been taken up and implemented by Phil Summers (thanks, Phil!). New tasks await.

The first task was very simple: the Reference_tree module needs functions for checking facts about nodes, analogous to is_multi. For config output, and for high-level set/delete/commit operations, we need easy ways to know whether a node is a tag node, a leaf node, or valueless, which component is responsible for it, and so on. It can be done mostly by analogy with the is_multi function and its relatives, so it's friendly to complete beginners. But Phil Summers implemented it before I could make the post (thanks again, Phil!).
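
To give a feel for what such helpers look like, here is a minimal sketch. The tree type, get_data, and the helper names other than is_multi are my assumptions for illustration, with a cut-down record instead of the real reference node data; the actual Reference_tree code may be organized differently.

```ocaml
(* Hypothetical, simplified stand-ins for the real vyconf types. *)
type node_type = Leaf | Tag | Other

type ref_node_data = {
  node_type : node_type;
  multi : bool;
  valueless : bool;
  owner : string option;
}

(* A minimal multi-way tree: a name, some data, and a list of children. *)
type 'a tree = { name : string; data : 'a; children : 'a tree list }

(* Walk the tree along a path and return the data of the node it names.
   Raises Not_found if any path element is missing. *)
let rec get_data node path =
  match path with
  | [] -> node.data
  | name :: rest ->
    let child = List.find (fun c -> c.name = name) node.children in
    get_data child rest

(* Fact-checking helpers, all written the same way as is_multi. *)
let is_multi reftree path = (get_data reftree path).multi
let is_leaf reftree path = (get_data reftree path).node_type = Leaf
let is_tag reftree path = (get_data reftree path).node_type = Tag
let is_valueless reftree path = (get_data reftree path).valueless
let get_owner reftree path = (get_data reftree path).owner

(* A tiny example reference tree: interfaces -> ethernet (tag) -> address (leaf, multi). *)
let default = { node_type = Other; multi = false; valueless = false; owner = None }

let example_reftree =
  { name = "root"; data = default; children = [
      { name = "interfaces"; data = default; children = [
          { name = "ethernet";
            data = { default with node_type = Tag; owner = Some "interfaces" };
            children = [
              { name = "address";
                data = { default with node_type = Leaf; multi = true };
                children = [] } ] } ] } ] }
```

With that tree, is_tag example_reftree ["interfaces"; "ethernet"] is true, and is_multi example_reftree ["interfaces"; "ethernet"; "address"] is true as well.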

The second task is a little bit more involved but still simple enough for anyone who started learning ML not long ago. It's about loading interface definitions from a directory. In VyOS, we may have a bunch of files in /usr/share/vyos/interfaces such as firewall.xml, system.xml, ospf.xml, and so on, and we need to load them into the reference tree that is used for path validation, completion etc.
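
One possible shape for this, as a rough sketch: list the .xml files in the directory and fold a single-file loader over them. All function names here are hypothetical, and load_file stands for whatever function loads one definition file into a reference tree; only the standard library calls (Sys.readdir, Filename.check_suffix, Filename.concat) are real.

```ocaml
(* Keep only the *.xml entries, sorted so the load order is deterministic. *)
let select_definitions names =
  names
  |> List.filter (fun f -> Filename.check_suffix f ".xml")
  |> List.sort compare

(* Full paths of all definition files in a directory. *)
let definition_files dir =
  Sys.readdir dir
  |> Array.to_list
  |> select_definitions
  |> List.map (Filename.concat dir)

(* Fold a single-file loader over every definition file.
   [load_file] is a hypothetical function of type
   reference tree -> file path -> reference tree. *)
let load_interface_definitions load_file reftree dir =
  List.fold_left load_file reftree (definition_files dir)
```

Sorting the file names is my own choice here, so that two runs over the same directory build the tree in the same order.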

Design questions

To give you some context, I'll remind you that the vyconf shell will not be bash-based: getting completion from the first word would require forking and modifying bash (or any other UNIX shell) to begin with, and there is a variety of other reasons. So, the first question: do you think we should use the vyconf shell, where you can enter VyOS configuration commands, as the login shell, or should we go for a JunOS-like approach where you log in to a UNIX shell and then issue a command to enter the configuration shell? You can cast your vote here: https://phabricator.vyos.net/V2

The second question is more open-ended: we are going to print the config in the normal VyOS config syntax and as set commands, but what else should we support? Some considerations: since "show" will be a part of the config API, it can be used by e.g. a web GUI to display the config. This means config output in XML or JSON can be a useful thing. But which one, or perhaps both? We also need to decide what the XML and/or JSON should look like, since we can go for a generic schema that keeps node names in attributes, or we can use custom tags such as <interfaces> (but then every component should provide a schema).

Now, to the "long-awaited" details of the config tree...

The tree

As I already said, the VyOS config is essentially a multi-way tree: nodes have children and data associated with them. For instance, node "system" has children named "host-name", "name-server", and so on, and node "host-name" may have value "vyos" associated with it. However, the data is not limited to the value alone: nodes may have comments, and if we implement the long-wished-for activate/deactivate, it will also be a piece of data associated with the node, internally.

Config tree nodes have this kind of data attached to them:

type config_node_data = {
    values: string list;
    comment: string option; (* set by the "comment" command *)
    inactive: bool; (* set by "deactivate" command *)
    ephemeral: bool; (* set by scripts that create temporary nodes *)
}

Reference tree nodes have this kind of data:

type ref_node_data = {
    node_type: node_type; (* tag, leaf, or "normal" *)
    constraints: (Value_checker.value_constraint list); (* used for value validation *)
    help: string; (* displayed in tab completion *)
    value_help: (string * string) list; (* value format help in tab completion *)
    constraint_error_message: string; (* displayed if the value doesn't match constraints *)
    multi: bool; (* indicates that node can have more than one value *)
    valueless: bool; (* indicates that node can't have values (such as "disable") *)
    owner: string option; (* which component is called if node is changed in proposed config *)
    keep_order: bool; (* whether config output is allowed to auto-sort nodes or not *)
    hidden: bool; (* whether the node will show up in completion *)
    secret: bool; (* whether the value is sensitive data and should be obscured in output *)
}

Apart from the config tree that represents the running config and proposed configs from sessions, we also need a way to store information about available commands (really, allowed node names in the config tree) to validate paths (as in, "interfaces ethernet eth0" is valid while "interfaces foo bar0" is not), get help strings, get information needed to validate values and so on. The key observation here is that if we take a fully populated config tree (where every possible node is created) and attach value validation data instead of values to leaf nodes, we can validate config paths simply* by checking if they exist in that tree, and validate values by retrieving validation data in the same way as we retrieve values from the config and doing something with that data. We'll call that a reference tree, because we use it as a reference when we need to check what's allowed.

Config tree:
interfaces: 
  ethernet: 
    eth0: 
      address: data(values=[192.0.2.1/24, 192.0.2.2/24])

Reference tree:
interfaces: 
  ethernet: data(type=tag)
    address: data(multi=true,value_constraint=ipv4|ipv6)

Ok, not quite that simple. Tag nodes (nodes whose children can have variable names, such as "ethernet" there) ruin the pretty picture: in the reference tree that path is "interfaces ethernet address", while in a config tree this path would be invalid. But, you get the idea.

At the top of every tree, there is a root node, and every other node is its child. These are the primitive operations on the tree nodes:

  • List its children
  • Update the data associated with it
  • Retrieve the data associated with it
  • Insert a child

There are some practical considerations that come into play, however. First, a lot of the time in VyOS we don't insert nodes directly, we do something like "set interfaces tunnel tun0 parameters ip key 42", where of all the nodes involved perhaps only "interfaces" already exists. This needs some workaround to make such inserts convenient: I went with a function that takes a default data value and creates the missing nodes on the way, with the default data attached to them. This approach works very well for the config tree, where only leaf nodes have any meaningful data, and for building the reference tree from interface definitions we can use direct insertions sequentially.
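
The "create missing nodes on the way" idea can be sketched like this. The tree type and the insert_path name are assumptions for illustration; in particular, overwriting the data of an already existing final node is a simplification of mine, and the real function may well treat that as an error.

```ocaml
(* A hypothetical minimal tree node. *)
type 'a t = { name : string; data : 'a; children : 'a t list }

(* Put [data] at [path] below [node]. Any missing node on the way is
   created with [default] data. If the final node already exists, its
   data is overwritten (a simplification for this sketch). *)
let rec insert_path default node path data =
  match path with
  | [] -> { node with data }
  | name :: rest ->
    let existing = List.find_opt (fun c -> c.name = name) node.children in
    let child =
      match existing with
      | Some c -> c
      | None -> { name; data = default; children = [] }
    in
    let updated = insert_path default child rest data in
    let children =
      match existing with
      (* Existing child: replace it in place so the order is preserved. *)
      | Some _ -> List.map (fun c -> if c.name = name then updated else c) node.children
      (* New child: append it at the end. *)
      | None -> node.children @ [updated]
    in
    { node with children }

(* Only "interfaces" exists before the insert. *)
let before =
  { name = "root"; data = [];
    children = [ { name = "interfaces"; data = []; children = [] } ] }

let after =
  insert_path [] before
    ["interfaces"; "tunnel"; "tun0"; "parameters"; "ip"; "key"] ["42"]
```

Here the data is just a list of values, with the empty list as the default for the intermediate nodes.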

But, how does this translate to high level set/delete operations? There are some tricky points.

Suppose we have this command: "set interfaces ethernet eth0 address 192.0.2.1/24". To add it to the config tree, we need to create a node at path "interfaces ethernet eth0 address" and put "192.0.2.1/24" into the "values" field of its data. But wait, how do we know which of those is the value? We cannot know without consulting the reference tree, so the cooperation between config and reference tree functions needs to be very close. For this reason, the function for validating paths doesn't simply return true or false; instead, it returns the path and value parts, or raises an exception if the command is invalid.
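
To make that more concrete, here is a simplified sketch of such a function. The types, the split_path name, and the Invalid_command exception are made up for illustration, and value constraints are ignored entirely. The rule it implements: the word after a tag node is accepted as a child name without being looked up, and the word after a leaf node is the value.

```ocaml
type node_type = Leaf | Tag | Other

(* A hypothetical cut-down reference tree node. *)
type ref_node = { name : string; node_type : node_type; children : ref_node list }

exception Invalid_command of string

(* Split the words of a command into (path, value option)
   by walking the reference tree. *)
let split_path reftree words =
  let rec go node words acc =
    match words with
    | [] -> (List.rev acc, None)
    | word :: rest ->
      (match List.find_opt (fun c -> c.name = word) node.children with
       | None -> raise (Invalid_command ("no such node: " ^ word))
       | Some child ->
         (match child.node_type, rest with
          | Leaf, [] -> (List.rev (word :: acc), None)
          | Leaf, [value] -> (List.rev (word :: acc), Some value)
          | Leaf, _ -> raise (Invalid_command "too many words after a leaf node")
          (* The word after a tag node is a user-chosen name such as eth0:
             it goes into the path, but we stay at the same reference node. *)
          | Tag, tag_name :: rest' -> go child rest' (tag_name :: word :: acc)
          | Tag, [] -> (List.rev (word :: acc), None)
          | Other, _ -> go child rest (word :: acc)))
  in
  go reftree words []

let reftree =
  { name = "root"; node_type = Other; children = [
      { name = "interfaces"; node_type = Other; children = [
          { name = "ethernet"; node_type = Tag; children = [
              { name = "address"; node_type = Leaf; children = [] } ] } ] } ] }
```

With this tree, the words of "interfaces ethernet eth0 address 192.0.2.1/24" split into the path "interfaces ethernet eth0 address" and the value "192.0.2.1/24", while "interfaces foo bar0" raises Invalid_command.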

Another tricky thing about set is where exactly you put the new child in the list of children. Most of the time order doesn't matter, but in route-maps, firewall rulesets, and other things that are read top down, changing the order changes the semantics! VyOS 1.x avoids this issue by using numeric names for such nodes and sorting them numerically in the config scripts. The downside is that node names have a hidden meaning: the output is also sorted numerically, but that doesn't guarantee that the config script really treats the names this way, since it's just a convention. It also makes reordering rules quite annoying since you have to rename them (try to insert a rule between, say, rules 5 and 6: you get to rename both, and probably some nodes before and after that too).

For this reason, while we have yet to decide what the syntax for it will look like in the CLI, I already implemented the foundation for it in the tree module. The insertion function takes a "position" argument that can make it insert at the beginning, at the end, before the child with a certain name, or after the child with a certain name.
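
The idea can be sketched on a plain list of child names. The constructor names and the No_such_child exception are my own for this example and may not match what the tree module actually uses.

```ocaml
(* Where to put a new child relative to its siblings (hypothetical names). *)
type position = Beginning | End | Before of string | After of string

exception No_such_child of string

(* Insert [item] right before the sibling called [anchor]. *)
let rec insert_before anchor item = function
  | [] -> raise (No_such_child anchor)
  | x :: rest when x = anchor -> item :: x :: rest
  | x :: rest -> x :: insert_before anchor item rest

(* Insert [item] right after the sibling called [anchor]. *)
let rec insert_after anchor item = function
  | [] -> raise (No_such_child anchor)
  | x :: rest when x = anchor -> x :: item :: rest
  | x :: rest -> x :: insert_after anchor item rest

(* Insert a child name into the list of siblings at the given position. *)
let insert position item children =
  match position with
  | Beginning -> item :: children
  | End -> children @ [item]
  | Before anchor -> insert_before anchor item children
  | After anchor -> insert_after anchor item children
```

With this, putting a new rule between two existing ones is insert (Before "deny-all") "allow-ssh" rules, and nothing has to be renamed.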

To learn more about the trees, you can read the vytree.ml, config_tree.ml, and reference_tree.ml modules. If you have any questions about them, feel free to ask.

What's next?

By the next post, I hope I'll have a draft of the protobuf schema and its implementation, so that's what I'll write about. In 1.2.0, we are working on packaging the Python library, so there may be some news about using Python in 1.x development soon too.

6 responses
"I'll remind you that the vyconf shell will not be bash-based, due to having to fork and modify bash (or any other UNIX shell) to get completion from the first word to begin with" You could create commands to manipulate the config and add them to the path. Eg: ./show, ./set, ./commit... Then, you can use bash autocompletion (https://github.com/scop/bash-completion). These autocompletions could be autogenerated from the syntax config tree each time it is changed. So you could have a standard bash...
kglkgvkvsd544: Been there, done that. ;) First, completion from the first word, something any network guy would expect, requires modifying bash. Second, there are minor annoyances here and there that may not be a big deal, but if we were to fix them, it would require modifying bash too, such as lexicographic sort as the only kind of sort, which puts eth10 before eth9. Third, regarding autogenerated completion, those completion scripts aren't quite declarative: they use logic even for stuff with a fixed set of options like SVN, to account for special cases. And there are things that always require logic, such as handling of tag and leaf nodes, since they are not predictable: you have to stop in the middle and get a list of existing nodes. Check out https://github.com/scop/bash-completion/blob/ma... for instance and imagine what it would be like to autogenerate it. Last, and arguably most important: the shell part of the system ends up absolutely undebuggable. Remember the bug when pasting too many set commands into the shell starts giving "command not found" errors? We have no idea where it comes from, and we don't have a feasible way to know either. After years of experience with it, I'm convinced that bash doesn't make a good domain-specific shell, and, well, it was never meant for that so who can blame it.