Trust configuration access

Trust is a hierarchical read-only database engine which can be used locally or remotely through REST to access hierarchies stored in plain text files or in a MongoDB database. While users and groups can be used to limit access to specific parts of the hierarchy, Trust also monitors access to sensitive data in order to provide audit capabilities. Trust is ideally used to store configuration and describe infrastructure.


Trust provides a caller with information stored in a JSON file, multiple JSON files or a MongoDB database. By allowing JSON objects to refer to other JSON objects, Trust is a convenient way to access data while ensuring that no data is duplicated. A single piece of data, such as an IP address, a price or a list of records, is stored in a single location and may be referred to by other objects in a way that is transparent for the caller.

Trust is also a way to secure data and audit data access. This makes it particularly valuable for storing configuration containing sensitive pieces, such as passwords.

Trust is a sort of hierarchical database engine with read-only data access. Trust only queries the data and never changes it, which means that it is intended for cases such as application configuration or infrastructure description, and not for situations where the application accessing the data also needs to change it.

Let's see how it works:

Terms

Object: as opposed to value, an object is a part of the JSON data which is enclosed in curly braces.

Value: as opposed to object, a value is either a scalar value such as a string, a number or a boolean, or an array of values or objects.

Fork: a case where a JSON file has the same name (ignoring the extension) as a directory within the same parent directory. See Queries, Forks.

Getting started

If you're reading this documentation on GitHub, you probably know how to get the source. Otherwise, you can get it from our own SVN server by executing:

svn checkout http://source.pelicandd.com/infrastructure/trust/

Now create a directory where data will be stored, for instance /tmp/trust-data. Inside, create a file called example.json with the following contents:

{
    "hello": "Hello, World!"
}

Go to the application directory and run the local client:

python3 localtrust.py /tmp/trust-data /example/hello

You should be able to see "Hello, World!". The next section explains how to make queries go beyond the boring hello world.

If you want to use the web service, look at the app/trust_blueprint.py file. This is a Flask blueprint which you can add to your Flask application. If you have no idea what a blueprint is, server.py contains an example. Another example is website.py, which corresponds to the actual site presenting the web service.

If you want to host the web service as a standalone web application, without integrating it into an existing website, use server.py. For instance, it can be hosted with Gunicorn using the following command:

gunicorn server:flaskApp -b :80 -w 4 --log-syslog

Queries

When accessing Trust data, the underlying data source is abstracted away from the caller. The data may be in a single JSON file, multiple JSON files, MongoDB documents, or something else.

This means that, when working with underlying JSON files, Trust will use the query to determine both the concerned file and the node within the file itself.

Since the data is hierarchical, the query looks like a path traversing the nodes from the root down to a leaf (or to a non-leaf node).
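To make this concrete, here is a minimal sketch, not Trust's actual code, of how a query could be split between the file system and the contents of a JSON file, assuming data stored as <name>.json files under a root directory. The function name resolve is hypothetical, and queries ending on a directory (which require combining several files) are left out.

```python
import json
import os


def resolve(root, query):
    """Hypothetical sketch: walk the query through directories until a
    JSON file is found, then walk the remaining steps inside its contents."""
    steps = [step for step in query.split("/") if step]
    current_dir = root
    for index, step in enumerate(steps):
        file_path = os.path.join(current_dir, step + ".json")
        if os.path.isfile(file_path):
            with open(file_path, encoding="utf-8") as f:
                node = json.load(f)
            # The remaining steps address a node inside the JSON document.
            for key in steps[index + 1:]:
                if not isinstance(node, dict) or key not in node:
                    raise KeyError(query)
                node = node[key]
            return node
        dir_path = os.path.join(current_dir, step)
        if not os.path.isdir(dir_path):
            raise KeyError(query)
        current_dir = dir_path
    # The query ended on a directory: combining its files is not covered here.
    raise NotImplementedError("directory queries are not covered by this sketch")
```

With the example.json file from Getting started, resolve("/tmp/trust-data", "/example/hello") would return "Hello, World!".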

The query may lead to either an object or a value. The original JSON is parsed, which means that the resulting objects or values are not necessarily written in the same way they were originally written, but in a way a JSON serializer will write them. For instance, note the dropped trailing zero in the price in the illustrations below.

Illustration 1: query of an object

If the query leads to a value, the value is returned as is, without being enclosed in curly braces.

Illustration 2: query of a value

Arrays are considered values too, which means they don't have curly braces either. Arrays can contain values or objects.

Illustration 3: query of an array

Since the underlying data source is abstracted away from the caller, a query may return the contents of multiple files when data is stored in JSON files. Those files are combined following the file system hierarchy, the names of directories and files being used as keys in the resulting JSON:
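The combination of files can be pictured with the following sketch, which assumes a root directory of .json files and uses a hypothetical function name, load_directory. It is only an illustration of the idea; in particular, forks (a file and a directory sharing a name), described in the next section, are deliberately not handled.

```python
import json
import os


def load_directory(path):
    """Hypothetical sketch: combine every JSON file and subdirectory under
    `path` into one dictionary, using file names (without the .json
    extension) and directory names as keys. Forks are not handled."""
    result = {}
    for entry in sorted(os.listdir(path)):
        full_path = os.path.join(path, entry)
        if os.path.isdir(full_path):
            result[entry] = load_directory(full_path)
        elif entry.endswith(".json"):
            with open(full_path, encoding="utf-8") as f:
                result[entry[:-len(".json")]] = json.load(f)
    return result
```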

Illustration 4: getting multiple files at once

Forks

It may happen that the name of a file (ignoring the extension) is the same as the name of a directory within the same parent directory. We call this a fork.

In case of a collision between JSON contents and the directory hierarchy, only the JSON contents are taken and the directory hierarchy is ignored. This is done to ensure consistency. Let's see how it works on an example:

Illustration 5: fork resolution

The situation is necessarily ambiguous: should we use demo.json, product.json or both? Since demo.json is the first one appearing through the hierarchy, it looks natural to use demo.json. Merging, in this situation, is problematic, because a person editing the JSON file may be unaware of other data through the hierarchy. To avoid this source of error, the directory hierarchy is ignored.

Note that with the current data, there is no way to read the description. For instance, the query for /illustration5/demo/product/description will produce the same result as when querying for a non-existent piece of data.

Back to consistency, let's query data a few levels above:

Description doesn't appear here either; otherwise, the results would be highly inconsistent: we would get the product description here, but it would disappear when requesting the product specifically.

When a fork is encountered, a warning is issued. The warning is crucial, because the behavior may lead to unexpected results. As in most systems, warnings should be avoided: in Trust, data architects are expected to avoid forks by naming directories and files in a way that doesn't collide with JSON data.
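The fork rule can be sketched as follows. This is a hypothetical illustration (the name load_tree is ours) which assumes data stored as .json files: when a directory has the same name as a JSON file in the same parent, the directory is skipped and a Python warning is emitted, mirroring the warning Trust issues.

```python
import json
import os
import warnings


def load_tree(path):
    """Hypothetical sketch: combine a directory into a dictionary,
    resolving forks in favor of the JSON file and emitting a warning."""
    entries = os.listdir(path)
    json_names = {
        entry[:-len(".json")]
        for entry in entries
        if entry.endswith(".json") and os.path.isfile(os.path.join(path, entry))
    }
    result = {}
    for entry in sorted(entries):
        full_path = os.path.join(path, entry)
        if os.path.isdir(full_path):
            if entry in json_names:
                # Fork: the JSON file wins and the directory is ignored.
                warnings.warn(f"fork at {full_path}: directory ignored")
                continue
            result[entry] = load_tree(full_path)
        elif entry.endswith(".json"):
            with open(full_path, encoding="utf-8") as f:
                result[entry[:-len(".json")]] = json.load(f)
    return result
```

Checking the JSON file before descending into the directory means the directory contents are never read at all, which matches the behavior where a description stored under the directory cannot be reached by any query.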

Query formatting

The tree is traversed using a slash (/) character. Every step in the query path contains one or more Unicode characters. Although characters of any Unicode class are allowed, the underlying data sources may be more restrictive. In order to simplify a later migration of data from one data source to another, data architects are invited to consider the limitations of all the concerned data sources.

A query always starts with a slash (/) and may contain nothing more than a slash (which will query for everything).

Queries are case-insensitive and are lowercased before reaching the data source. On operating systems such as Linux, where file names are case-sensitive, all files involved in querying should have lowercase names.

The characters . (a dot) and : (a colon) receive special treatment. For instance, /example/.keys will return an array containing the keys of the /example object:

Illustration 6: retrieving object keys

while /example/.plain:.keys will return the object or value stored in a node literally named .keys inside /example. If the .plain: part appears several times, only the first one is treated as an escape; the rest is taken literally.
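The two rules can be illustrated by this sketch, which works on an already loaded JSON tree. The function name query_node is hypothetical, other special steps are not modeled, and returning keys in their original order is our assumption.

```python
PLAIN_PREFIX = ".plain:"


def query_node(data, steps):
    """Hypothetical sketch of the .keys and .plain: rules, applied to an
    already loaded JSON tree; `steps` is the query split on slashes."""
    node = data
    for step in steps:
        if step == ".keys":
            # Special step: replace the current object by the list of its keys.
            node = list(node.keys())
        elif step.startswith(PLAIN_PREFIX):
            # Escaped step: strip only the first ".plain:", keep the rest literally.
            node = node[step[len(PLAIN_PREFIX):]]
        else:
            node = node[step]
    return node
```

For instance, with a product object containing the keys name, .keys and .plain:.keys, the step .plain:.plain:.keys reaches the node named .plain:.keys.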

Illustration 7: escaping

A step cannot be . or .., even when the data source is not a file system. Blocking such steps early is easier and safer, and should not be an important limitation for data architects (if you are actually naming your nodes . and .., you may want to reconsider your naming conventions).
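The formatting rules of this section can be summarized in a small validation function. This is a hypothetical sketch (parse_query is not a name from Trust): it requires the leading slash, lowercases the query, treats "/" as a query for everything, and rejects empty steps as well as . and .. steps.

```python
def parse_query(query):
    """Hypothetical sketch of the query formatting rules."""
    if not query.startswith("/"):
        raise ValueError("a query must start with a slash")
    # Queries are case-insensitive: lowercase before reaching the data source.
    lowered = query.lower()
    if lowered == "/":
        return []  # A lone slash queries everything.
    steps = lowered[1:].split("/")
    for step in steps:
        if step == "":
            raise ValueError("every step must contain at least one character")
        if step in (".", ".."):
            raise ValueError(f"invalid step: {step!r}")
    return steps
```

Quotes need no special handling: parse_query('/say "hi"') simply returns ['say "hi"'].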

Note that quotes (") are valid in query paths.

The encoding used by Trust is UTF-8 for both the query and the underlying data. Trust is expected to support the full Unicode range of characters, although some underlying data sources may not be able to accept some characters due to their specific constraints.

Inheritance

Inheritance is the mechanism used by Trust to eliminate data duplication.

With inheritance, a node (the child) refers to another node (the parent). The child and the parent are then combined using the rules described below, which differ depending on whether they involve objects or values (see Terms).

Inheritance is declared with the .special:inherit statement.

Inheritance involving objects

When an object inherits another one, the default behavior is to merge them. For instance, if JSON files are describing machines, another file may contain global settings used for every machine.

Illustration 8: default object inheritance

It is also possible to replace the child node with the parent node. For instance, let's imagine two JSON objects: one describes a DNS machine; another one describes a machine which needs to access the DNS server.

Illustration 9: object inheritance with replacement

Inheritance involving values

Values are simply overwritten; that is, child values replace parent values when inheritance is applied.

Illustration 10: inheritance involving values
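The object and value rules can be sketched on plain Python dictionaries. The JSON syntax of .special:inherit appears only in the illustrations, so the function inherit_object and its mode parameter are hypothetical; merging nested objects recursively is also our assumption.

```python
import copy


def inherit_object(parent, child, mode="merge"):
    """Hypothetical sketch: combine a parent object with a child object.

    "merge" (the default) keeps the keys of both objects, the child's
    values winning on conflicts; nested objects are merged recursively
    (an assumption). "replace" substitutes the parent for the child."""
    if mode == "replace":
        return copy.deepcopy(parent)
    result = copy.deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = inherit_object(result[key], value)
        else:
            # Values (arrays included, by default) from the child overwrite the parent's.
            result[key] = copy.deepcopy(value)
    return result
```

Copying the parent keeps the shared settings untouched, so several machines can inherit from the same global object.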

Dealing with arrays is slightly different. While the default behavior is replacement, exactly as with the non-array values, other actions can be applied. Arrays may be extended by specifying either add or merge in .special:actions.

Arrays addition produces an array which contains all the elements of the parent array and all the elements of the child array. If an element exists in both the parent and the child array, it will appear twice in the resulting array.

Illustration 11: inheritance involving arrays addition

The merge mode is very different. Not only does it keep a single copy of an element which appears in both the child and the parent array, but it also removes duplicate elements within the child and within the parent. It is like using an addition and then keeping unique elements only. Moreover, the results are sorted.

Illustration 12: inheritance involving arrays merging
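The three array behaviors can be compared in the following sketch. The function name inherit_array is hypothetical, and placing the parent elements before the child elements in the addition is our assumption. Since merging relies on a set and on sorting, this version requires elements which are hashable and comparable with each other, such as numbers or strings.

```python
def inherit_array(parent, child, action=None):
    """Hypothetical sketch of the array actions.

    None: replacement (the default), the child array wins.
    "add": parent elements followed by child elements, duplicates kept.
    "merge": unique elements of both arrays, sorted."""
    if action is None:
        return list(child)
    if action == "add":
        return list(parent) + list(child)
    if action == "merge":
        return sorted(set(parent) | set(child))
    raise ValueError(f"unknown action: {action!r}")
```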

Prevention of circular inheritance

Circular inheritance is prevented by disallowing inheritance from an element which is already in the inheritance chain. This covers both the situation where a node inherits from itself (or from nodes which contain it) and the one where two nodes inherit from each other.

Illustration 27: circular inheritance on a single node

Illustration 29: circular inheritance involving multiple nodes
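Here is a sketch of such a check, assuming the inheritance links are available as a mapping from a node path to its parent path (a hypothetical representation). Following our reading of the illustrations, a parent is refused when it equals, contains or is contained by a node already in the chain.

```python
def overlaps(a, b):
    """True when one path equals or contains the other."""
    return a == b or a.startswith(b + "/") or b.startswith(a + "/")


def inheritance_chain(parents, node):
    """Hypothetical sketch: follow `parents` (node path -> parent path)
    from `node`, refusing any parent which overlaps a path already in
    the chain."""
    chain = [node]
    current = node
    while parents.get(current) is not None:
        parent = parents[current]
        if any(overlaps(parent, seen) for seen in chain):
            raise ValueError("circular inheritance: " + " -> ".join(chain + [parent]))
        chain.append(parent)
        current = parent
    return chain
```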

Trust comes in two forms: a script and an API.

Script

The script is a Python 3 application which runs locally and accesses data stored in files, on NFS or in MongoDB.

Synopsis

python3 trust.py [-h] [--optional] [--source source] [--response-mode json|complete|text] [--username username [--password password]] query

Arguments

API

(This section is incomplete. The API is not done yet.)

Warnings and errors

When calling the script, warnings and errors are logged to syslog. If the caller specifies complete response mode (see response modes above), the warnings and errors are also enclosed within the response.

Security

Some pieces of data can be restricted to some users or groups of users.

Trust has a default authentication provider, but other providers can be developed and injected in Trust when called through Python. In order to work, a credentials provider should be a Python class which has the following methods:

def verify(username, password)

def ingroup(username, groupname)
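As an illustration, here is a minimal in-memory provider exposing these two methods. The class name, the add_user helper and the boolean return values are assumptions, and since the signatures above omit self, we assume they are instance methods. Passwords are stored as salted PBKDF2 digests rather than in clear text.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000


class InMemoryProvider:
    """Hypothetical credentials provider exposing verify() and ingroup()."""

    def __init__(self):
        self._users = {}  # username -> (salt, derived key, set of groups)

    def add_user(self, username, password, groups=()):
        salt = os.urandom(16)
        key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
        self._users[username] = (salt, key, set(groups))

    def verify(self, username, password):
        user = self._users.get(username)
        if user is None:
            return False
        salt, key, _ = user
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
        # Constant-time comparison to avoid leaking information through timing.
        return hmac.compare_digest(candidate, key)

    def ingroup(self, username, groupname):
        user = self._users.get(username)
        return user is not None and groupname in user[2]
```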

When the default provider is used, the users' data exists within the /_users/ node. The node contains a dictionary where keys are user names and values are objects containing information about the users, such as a hash. For instance, the user hello with a password world belonging to groups administrators and data managers will be stored this way:

{
    "hello": {
        "hash": "$pbkdf2-sha256$100000$k1IqZUwphbA2RgghxPg/5w$iqYsBdtwBKxAI2p/HAOvFuKLfakQDhwFqzszP3IgD/w",
        "member-of": ["administrators", "data managers"]
    }
}

The hash can be generated by genpassword.py in the extras directory. The tool has three levels of security, set through the --level option:

The longer the check takes, the harder it is for an attacker to crack the password if the attacker manages to obtain the hashes.
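We have not checked how genpassword.py builds its hashes, but the sample above looks like the format used by passlib's pbkdf2_sha256: $pbkdf2-sha256$rounds$salt$checksum, with salt and checksum written in passlib's adapted base64 (standard base64 with "+" replaced by "." and no padding). Assuming that format, this standard-library-only sketch generates and checks such hashes; the function names are hypothetical, and the mapping between --level and the number of rounds is unknown.

```python
import base64
import hashlib
import hmac
import os


def ab64_encode(data):
    # Adapted base64: standard base64, "+" replaced by ".", padding removed.
    return base64.b64encode(data).decode("ascii").replace("+", ".").rstrip("=")


def ab64_decode(text):
    text = text.replace(".", "+")
    return base64.b64decode(text + "=" * (-len(text) % 4))


def make_hash(password, rounds=100_000):
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, rounds)
    return f"$pbkdf2-sha256${rounds}${ab64_encode(salt)}${ab64_encode(key)}"


def check_hash(password, stored):
    _, scheme, rounds, salt, checksum = stored.split("$")
    if scheme != "pbkdf2-sha256":
        raise ValueError(f"unsupported scheme: {scheme}")
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), ab64_decode(salt), int(rounds))
    # Constant-time comparison of the derived key with the stored checksum.
    return hmac.compare_digest(key, ab64_decode(checksum))
```

The number of rounds is stored in the hash itself, which is what makes it possible to raise the cost of new hashes without invalidating the existing ones.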

The groups are stored in /_groups/ node. The node contains a dictionary where keys are group names and values are objects containing information about the groups, such as the groups this group belongs to. Example:

{
    "administrators": { },
    "data managers": {
        "member-of": ["backup operators"]
    },
    "backup operators": { }
}

Let's use a few illustrations involving four users and three groups. Each user has the same password, demo. Here are the actual contents of the /_users.json file (hashes being truncated for better readability):

{
    "Lucy": {
        "hash": "$pbkdf2-sha256$10$7l0LYUxpzbn3H...",
        "member-of": ["users"]
    },
    "Emily": {
        "hash": "$pbkdf2-sha256$10$7l0LYUxpzbn3H...",
        "member-of": ["administrators"]
    },
    "William": {
        "hash": "$pbkdf2-sha256$10$7l0LYUxpzbn3H...",
        "member-of": ["backup operators"]
    },
    "James": {
        "hash": "$pbkdf2-sha256$10$7l0LYUxpzbn3H...",
        "member-of": ["users"]
    }
}

Here are the groups. Notice the circular link between backup operators and administrators: Trust doesn't complain about it, so a user who belongs to either the backup operators group or the administrators group will belong to all three groups.

{
    "users": { },
    "backup operators": {
        "member-of": ["users", "administrators"]
    },
    "administrators": {
        "member-of": ["backup operators"]
    }
}
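The way group membership propagates, including through a cycle like the one above, can be sketched with a breadth-first traversal. The function name effective_groups is hypothetical; it takes the users and groups dictionaries in the shape shown above and visits each group only once, which is why the circular link is harmless.

```python
from collections import deque


def effective_groups(username, users, groups):
    """Hypothetical sketch: every group a user belongs to, directly or
    through "member-of" links between groups."""
    pending = deque(users[username].get("member-of", []))
    found = set()
    while pending:
        group = pending.popleft()
        if group in found:
            continue  # Already visited: this is what makes cycles harmless.
        found.add(group)
        pending.extend(groups.get(group, {}).get("member-of", []))
    return found
```

With the data above, Emily (a member of administrators) ends up in administrators, backup operators and users, while Lucy remains in users only.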

Credentials are checked only when private data is accessed. This is important, since merely providing credentials doesn't mean that they will be checked. In the following example, the credentials are invalid, and still, no error is returned because they are not checked:

The checks are postponed for a reason. Checks are slow and CPU-intensive; if they were performed on every request, callers would need to know whether the accessed information is public or contains restricted nodes, and provide credentials only in the latter case, in order to achieve better performance. However, callers may not necessarily know whether information is restricted or public, and even if they know it, providing credentials in some cases but not in others can make the scripts unnecessarily complicated.

Let's see how permissions work. We have a JSON file containing a node restricted to a specific user:

We can get the piece of information when the correct credentials of this specific user are provided:

On the other hand, a different user cannot access the data:

The response is identical when a guest is trying to access the data:

Groups work in the same way. Let's consider the following file:

The whole contents of the file are restricted to the users group, and the secrets node is further restricted to administrators. For instance, James can access the data, but not the secrets:

Obviously, if we reverse the groups, James won't be able to access anything:

Combining users and groups is possible too:

Here, Lucy can access the node even though she is not a member of the administrators group:

Emily can access the node as well, since she is a member of administrators group:
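The behavior of the examples above can be summarized in a sketch. The restrictions mapping (node path to allowed users and groups) is a hypothetical representation, not Trust's storage format, and provider is any object with verify and ingroup methods. Every restricted node on the way down must grant access, and credentials are verified lazily, only once the first restricted node is met.

```python
def can_access(path, restrictions, provider, username=None, password=None):
    """Hypothetical sketch of lazy credential checks combined with
    user and group restrictions along the queried path."""
    verified = False
    steps = [step for step in path.split("/") if step]
    for depth in range(len(steps) + 1):
        node = "/" + "/".join(steps[:depth])
        rule = restrictions.get(node)
        if rule is None:
            continue  # Public node: no credentials needed.
        if not verified:
            # First restricted node: only now are the credentials checked.
            if username is None or not provider.verify(username, password):
                return False
            verified = True
        allowed = username in rule.get("users", []) or any(
            provider.ingroup(username, group) for group in rule.get("groups", [])
        )
        if not allowed:
            return False
    return True
```

A public query with invalid credentials therefore succeeds without any call to verify, while a guest (no username) is refused as soon as a restricted node is reached.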

Forbidden parts

For security reasons, some parts are forbidden in a query. Any access to the /_users and /_groups nodes is forbidden. Another forbidden element is two dots surrounded by slashes (or a slash followed by two dots at the end of the query).

Data architects, be warned: users and groups can still be accessed, and two dots can still be used, through inheritance.

Note that when Trust is accessed through a web service, URIs such as http://example.com/a/../b are automatically transformed by the client into http://example.com/b. This is why, for instance, curl tests which use two dots won't generate a query-invalid error.
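A check for these forbidden parts could look like the following sketch, applied to the raw query. The function name is_forbidden is hypothetical; the root node names are compared in lowercase because queries are case-insensitive.

```python
FORBIDDEN_ROOTS = ("_users", "_groups")


def is_forbidden(query):
    """Hypothetical sketch: anything under /_users or /_groups, and two
    dots used as a whole step, are forbidden."""
    steps = query.split("/")[1:]
    if steps and steps[0].lower() in FORBIDDEN_ROOTS:
        return True
    return ".." in steps
```

Two dots inside a step name, as in /a..b, remain allowed, since only a step made entirely of two dots is forbidden.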

Illustration 18: access of _users node

Illustration 19: access inside _users node

Illustration 20: access to _groups node

Illustration 21: access inside _groups node

Audit

The audit covers two scenarios:

Currently, the audit entries can be sent to syslog only. Support for Redis and MongoDB is planned.

Contribute

If you have a technical question, you may post it on Stack Overflow. You may also contact me by e-mail at arseni.mourzenko@pelicandd.com

The source code of the project is hosted in our own version control. If you want to contribute to the project, contact me so I can give you access to the SVN servers.

Here are some technical details about the architecture.

When Trust receives a request, it selects a formatter based on the value of the response mode. This formatter is then in charge of the process. This may seem unnatural (why would a formatter be in charge, instead of being called only at the end of the process?), but this approach makes the code simpler: different formatters handle exceptions and warnings differently. For instance, some catch exceptions while others let them propagate up the stack.

The formatter handles the request by using a finder. A finder corresponds more or less to what we call a data source in the documentation, given that some finders may correspond to multiple data sources.

Finders do all the major work: authentication, permissions and the actual loading of data.
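The relationship between formatters and finders can be pictured with this simplified sketch. All class and function names are hypothetical, the finder only reads an in-memory dictionary (authentication, permissions and warnings are omitted), and only errors are handled: the JSON formatter lets exceptions propagate, while the complete formatter catches them and reports them within the response.

```python
import json


class Finder:
    """Hypothetical finder reading an in-memory tree."""

    def __init__(self, data):
        self.data = data

    def find(self, steps):
        node = self.data
        for step in steps:
            node = node[step]  # KeyError when the node doesn't exist.
        return node


class JsonFormatter:
    """Lets exceptions propagate up the stack."""

    def handle(self, finder, steps):
        return json.dumps(finder.find(steps))


class CompleteFormatter:
    """Catches exceptions and reports them inside the response."""

    def handle(self, finder, steps):
        try:
            return json.dumps({"result": finder.find(steps), "errors": []})
        except KeyError as error:
            return json.dumps({"result": None, "errors": [f"not found: {error}"]})


FORMATTERS = {"json": JsonFormatter, "complete": CompleteFormatter}


def process(response_mode, finder, steps):
    # The formatter chosen from the response mode is in charge of the request.
    return FORMATTERS[response_mode]().handle(finder, steps)
```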

That's all for now.