Filetree Schema Language is a format that's intended to make it easy
to specify certain file tree layouts.
The idea is that the performance dataset tool (yet to be written) will be capable of producing actual file trees upon feeding it with a file tree schema.
I tried to strike a good trade-off between versatility and simplicity.
Syntactically, this manifests in being based on JSON -- that's easy to write by humans while providing the processing utility with properly typed structural data. A lot of abbreviations are added to make writing easy and
expressive (eg. you can specify the empty directory as "DIR" or {},
while the full form would be [ "DIR", {"entries": {} } ]).
Semantically, it manifests as having good support for recursive constructs,
while enforcing some naming conventions (ie. giving up some freedom in terms of naming the files); also, we stick with the most basic filesystem semantics, at the level of VFAT -- there is no metadata and the types known are restricted to regular files and directories. If some specific demand calls for more, please speak up and I will try to adjust it accordingly.
To get some feel of it, I attach a patch against Vijaykumar's crefi tool
(revision ecc49c04) which adds the dumpschema fop; so that with
--fop=dumpschema the tool spits out a description of the file tree that would be created, instead of doing any changes on the filesystem. Due to the above
mentioned naming conventions, the schema just approximates the names used by Vijaykumar, but the layout and the content of the files will be faithfully represented.
Filetree Schema Language is a JSON-based domain specific language to describe
file trees.
Text documents adhering to the specification of the Filetree Schema
Language are simply referred to as "file tree schemas".
semantic sketch
The semantic atom we choose is the entity, which, in this discourse,
is either a file or a directory; and a file tree is a tree of entities.
(So hereby we neglect other possible filesystem objects, like symlinks
or device files; also, the identity of an entity is given by its position
in the tree, so there are no absolute (position-independent) identifiers like
inodes.)
A file tree schema describes (potentially ambiguously) a file tree.
A file tree schema is a tree of entity schemas, each of which describes (potentially
ambiguously) an entity. Let's see an example.
{ "foo": { "bar": ["STRING", "aa"], "baz": "NULL" }, "quux": {} }
is a file tree schema that describes the file tree we can get by
executing the following commands in an empty directory:
mkdir foo quux
touch foo/baz
echo aa > foo/bar
Within this schema, ["STRING", "aa"], "NULL", {},
{ "bar": ["STRING", "aa"], "baz": "NULL" } and the whole thing
are entity schemas (the first two are file schemas, the latter three
are directory schemas).
This is very straightforward, but it is hard to see what it is good for.
Let's see then an ambiguous file schema:
[ "RANDOM", {"size": ["1k", "1m"] } ]
describes a file with random content of random size in between 1 kilobyte
and 1 megabyte.
This is a little less trivial but the real purpose is to describe large
trees with constant size schemas. Eg. the directory chain a/a/a/a/a
can be specified with the following schema:
{ "ROOT": ["entry", 5], "entry": { "a": "SELF" } }
and the length of the chain can be set to anything by replacing the 5 with
the chosen value. (This is also an example of a file tree schema which
is not an entity schema: the labels ROOT and entry do not correspond
to directory entries.)
Or would we like an increase in width? Eg. how could we describe the
binary file tree that's produced by
That is, an index after the entry name describes an enumerated list of
entries up to that index.
syntax elements
Entity schemas are specified, identified and referenced by means of labels.
Labels are strings, and they are grouped in two ways:
reserved vs. user: reserved ones have a meaning defined in this spec;
user ones are specific to a particular file tree schema. Reserved are
written ALL_CAPS, user ones should start with a lowercase letter.
topological vs. type vs. special: topological ones have a role to define the tree
layout; type ones give type information for an entity. Special ones are, well, special.
All user labels are topological; that is, all type and special labels are reserved.
There is a fixed set of reserved labels, as follows:
DIR is the label representing directories; all the other type ones represent files,
in various ways. An entity schema is given in the form
[ <TYPE>, {<attribute>: <value>,...} ]
where TYPE is a type label; the optional and mandatory attributes are specific to TYPE
as discussed below.
Any string adhering to the syntactical restriction above (start with lowercase)
can serve as a user label.
size spec
One particular value is size. Mathematically, a size is given by
a closed non-empty interval of integers, which means a random value
within that interval for any entity instantiating the schema. Note that in JSON
all numbers are floats; thus in terms of JSON, we consider a dotless float to
be an integer.
Thus:
a size spec is either exact (represents a single value) or fuzzy (represents
a proper interval).
An exact size spec is:
either a dotless float;
or a string consisting of digits followed by "K", "M", or "G"
(case-insensitively), whereby "K", "M", and "G" stand for, respectively,
multipliers of kilo (1<<10), mega (1<<20) and giga (1<<30) for the number
represented by the digits.
A fuzzy size spec is an array [a, b] with a, b being exact size specs,
so that a ≤ b.
Example:
["1k", 1536]
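The size spec rules above can be sketched in Python as follows. This is only an illustration of one way a processing tool might interpret them: the function names `parse_exact_size`, `parse_size` and `pick_size` are made up here, and rejecting negative numbers is an assumption that the text does not spell out. It relies on the fact that Python's `json` module already decodes a dotless number into an `int`.

```python
import random
import re

# Multipliers for the K/M/G suffixes, as defined in the text above.
_MULTIPLIERS = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}


def parse_exact_size(spec):
    """Turn an exact size spec into an integer."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(spec, int) and not isinstance(spec, bool):
        if spec < 0:  # assumption: sizes are non-negative
            raise ValueError(f"negative size: {spec!r}")
        return spec
    if isinstance(spec, str):
        match = re.fullmatch(r"([0-9]+)([kKmMgG])", spec)
        if match:
            return int(match.group(1)) * _MULTIPLIERS[match.group(2).lower()]
    raise ValueError(f"not an exact size spec: {spec!r}")


def parse_size(spec):
    """Return the closed interval (low, high) denoted by a size spec."""
    if isinstance(spec, list):
        if len(spec) != 2:
            raise ValueError(f"a fuzzy size spec needs two members: {spec!r}")
        low, high = parse_exact_size(spec[0]), parse_exact_size(spec[1])
        if low > high:
            raise ValueError(f"empty interval: {spec!r}")
        return low, high
    value = parse_exact_size(spec)
    return value, value


def pick_size(spec, rng=random):
    """Pick a random size from the interval, both ends included."""
    low, high = parse_size(spec)
    return rng.randint(low, high)
```

With these definitions, `parse_size(["1k", 1536])` yields the interval from 1024 to 1536, and `pick_size` draws one value from it for each instantiated entity.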
file types
For each file type we give the attributes with type and default value. Type of
the value is either a native JSON type or a size spec. Attributes are optional unless
declared mandatory.
STRING
Attributes: data (string, defaults to ""), size (size spec, defaults to length of data).
[In the sequel we shorten such a description as data (string: ""), size (size: len(data)).]
Represents a file with data as its content, iterated circularly up to size. Eg.
[ "STRING", {"data": "abc", "size": 5 } ]
represents a file with content "abcab"
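"Iterated circularly up to size" can be illustrated with a short Python sketch. The helper name `cycle_to_size` is hypothetical, and raising an error for empty data with a non-zero size is an assumption, since the text does not say what should happen in that case.

```python
def cycle_to_size(data: bytes, size: int) -> bytes:
    """Repeat data from its start until exactly size bytes are produced."""
    if size == 0:
        return b""
    if not data:
        # Assumption: there is nothing to repeat, so treat this as an error.
        raise ValueError("cannot extend empty data to a non-zero size")
    repeats = -(-size // len(data))  # ceiling division
    return (data * repeats)[:size]
```

For the example above, `cycle_to_size(b"abc", 5)` gives `b"abcab"`. The same helper would also cover truncation, since a size smaller than the data length just cuts the data short.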
BINARY
Attributes: data (string: ""), encoding (one of "hex", "base64", "quoted", mandatory),
size (size: len(decoded data)).
Represents a file with content of data decoded from encoding, iterated circularly up to size. Eg.
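As a hedged illustration of the decoding step, the following Python sketch maps the three encoding names to standard library decoders. The function name `decode_binary` is made up, and reading "quoted" as quoted-printable is an assumption; the text does not define it further.

```python
import base64
import quopri


def decode_binary(data: str, encoding: str) -> bytes:
    """Decode the data attribute of a BINARY schema into raw bytes."""
    if encoding == "hex":
        return bytes.fromhex(data)
    if encoding == "base64":
        return base64.b64decode(data)
    if encoding == "quoted":
        # Assumption: "quoted" stands for quoted-printable.
        return quopri.decodestring(data.encode("ascii"))
    raise ValueError(f"unknown encoding: {encoding!r}")
```

Under these assumptions, a schema such as `[ "BINARY", {"data": "616263", "encoding": "hex"} ]` would describe a file with content `abc`.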
LOOP
Attributes: file (string, mandatory), size (size: size of the file named file).
Represents a file with content taken from file, iterated circularly up to size. Eg.
[ "LOOP", {"file": "/etc/services" } ]
is a replica of /etc/services.
RANDOM
Attributes: size (size: 0).
Represents a file with random binary content of size size. Eg.
[ "RANDOM", {"size": ["1k", "1m"] } ]
represents a file with random content of random size in between 1 kilobyte and 1 megabyte.
CALLOUT
Attributes: command (string or array of strings, mandatory), size (size: size of output).
Represents a file with content taken from the output of command, iterated circularly up to size.
If command is a string, it will be passed on to shell; if command is an array of strings, it will
be directly executed by fork/exec (in particular, the execution will happen in execlp(3) style, ie.
initial member of the command array is searched along the PATH environment variable and that
will be executed with command as the argument vector). The path of the file to be created will
be passed on to the program in the ENTRYPATH environment variable (it is guaranteed to be a relative
path without ".." components). Eg.
represents a file with its content being the MD5 sum of its path from the top of the file tree.
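The two ways of running a command could be sketched in Python as below. This is not the tool's implementation (the tool is yet to be written); `run_callout` is a hypothetical helper that uses `subprocess.run`, which passes a string to the shell when `shell=True` and otherwise executes the list directly, looking up its first member along PATH.

```python
import os
import subprocess


def run_callout(command, entry_path):
    """Run a CALLOUT command and return its standard output as bytes.

    command is a string (handed to the shell) or a list of strings
    (executed directly); entry_path is the relative path of the file
    being created, exported as ENTRYPATH.
    """
    env = dict(os.environ, ENTRYPATH=entry_path)
    result = subprocess.run(
        command,
        shell=isinstance(command, str),  # string form goes through the shell
        env=env,
        stdout=subprocess.PIPE,
        check=True,  # a failing command raises CalledProcessError
    )
    return result.stdout
```

The returned bytes would then be iterated circularly up to size, like the content of the other file types.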
abbreviations
The following abbreviations can be applied:
[ <TYPE>, {} ] can be shortened either as [ <TYPE> ] or just <TYPE>. Thus, eg., "NULL"
specifies an empty file, abbreviating [ "NULL", {} ].
if the only given attribute is the first one (wrt. the order it occurs in the type definitions
above), the enclosing JSON object can be omitted. Thus, eg., for the STRING type the first
attribute is data, which lets us write [ "STRING", "aa" ] instead of
[ "STRING", { "data": "aa" } ].
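A processing tool would probably expand these abbreviations before doing anything else. The sketch below shows one possible normalization step; the name `normalize_file_schema` and the `FIRST_ATTRIBUTE` table are assumptions made for this illustration, with the table filled in from the order of attributes in the type definitions above.

```python
# First attribute of each file type, in the order given in the type definitions.
FIRST_ATTRIBUTE = {
    "STRING": "data",
    "BINARY": "data",
    "LOOP": "file",
    "RANDOM": "size",
    "CALLOUT": "command",
}


def normalize_file_schema(schema):
    """Expand an abbreviated file schema to [TYPE, {attribute: value, ...}]."""
    if isinstance(schema, str):  # bare <TYPE>
        return [schema, {}]
    if (not isinstance(schema, list) or not 1 <= len(schema) <= 2
            or not isinstance(schema[0], str)):
        raise ValueError(f"not a file schema: {schema!r}")
    type_label = schema[0]
    if len(schema) == 1:  # [ <TYPE> ]
        return [type_label, {}]
    attributes = schema[1]
    if not isinstance(attributes, dict):
        # A bare value stands for the first attribute of the type.
        if type_label not in FIRST_ATTRIBUTE:
            raise ValueError(f"{type_label} takes no bare value")
        attributes = {FIRST_ATTRIBUTE[type_label]: attributes}
    return [type_label, dict(attributes)]
```

This works because no file type attribute takes a JSON object as its value, so an object in second position can only be the attribute object itself.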
the directory type
DIR
Attributes: entries (object: {})
Represents a directory with entries serving as a template of its content. Eg.
[ "DIR", {"entries": {"bar": ["STRING", "aa"], "baz": "NULL"} } ]
represents a directory with two files in it: bar, with content aa, and the empty file baz.
the entries object
In the above example entries is a literal specification of the content; but in general
it's a template that's expanded according to certain rules.
We will refer to the attributes (or names) of entries as entry name schemas and to the
values as entry specs.
entry name schemas
an entry name schema is a string which
must contain a non-numerical character
cannot be identical with ".."
if the last character of an entry name schema is not a digit, then it's a literal entry name schema, ie. an instance of the containing directory schema will have
an entry of the same name.
[ "DIR", {"entries": {"baz": "NULL"} } ]
represents a directory with a single empty file in it, named baz.
if an entry name schema ends with a digit, then it is split as
<entry name schema> = <name base><multiplicity specifier>
where <multiplicity specifier> is a sequence of digits, and the last character of
<name base> is not a digit. The multiplicity of the name is the integer denoted
by the <multiplicity specifier>. The actual entry names are indexed copies of the
<name base>, ie. with printf syntax, "<name base>%0ld" % k, for 0 ≤ k < multiplicity,
with l being the number of digits in multiplicity - 1 in decimal representation.
So for example
[ "DIR", {"entries": {"baz3": "NULL"} } ]
represents a directory with 3 empty files in it, named baz0, baz1 and baz2.
An exception is ".", called the inline entry name schema. It has a special
value (which is not an entry spec), see below.
This relation between entry names and entry name schemas will be referred to as derived from.
So in the above example the actual entry names baz0, baz1 and baz2 are derived from the
entry name schema baz3.
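The derivation of entry names can be sketched in Python as follows. The function name `derive_entry_names` is hypothetical; the zero-padding width follows the rule above (the number of decimal digits of multiplicity - 1).

```python
import re


def derive_entry_names(name_schema):
    """Return the list of entry names derived from an entry name schema."""
    if name_schema == ".":
        raise ValueError('"." is the inline entry name schema, not a name')
    if name_schema == ".." or not re.search(r"[^0-9]", name_schema):
        raise ValueError(f"invalid entry name schema: {name_schema!r}")
    # Split into a base ending in a non-digit and a trailing run of digits.
    match = re.fullmatch(r"(.*[^0-9])([0-9]+)", name_schema, re.DOTALL)
    if match is None:
        return [name_schema]  # literal entry name schema
    base, multiplicity = match.group(1), int(match.group(2))
    width = len(str(multiplicity - 1))
    return [f"{base}{k:0{width}d}" for k in range(multiplicity)]
```

So `derive_entry_names("baz3")` gives `["baz0", "baz1", "baz2"]`, while a schema such as `f100` yields the hundred names `f00` to `f99`.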
entry specs
In their full form, entry specs are arrays of the following formats:
either [ <schema reference>, <stacking level> ]
or [ "SELF", <schema reference>, <stacking level> ]
where:
<schema reference> is an entry schema, a user label, or NONE;
<stacking level> is a non-negative integer (dotless float).
These are called, respectively, non-self-referent and self-referent entry specs.
A directory schema is self-contained if no user label occurs in it -- ie. the
schema references of the entry specs of its entries object are entry schemas, and these
entry schemas (should they be directory schemas) also fulfill this condition, and so on.
You can notice that in the above examples the entry specs are not arrays. That's because
there we used abbreviated forms. Let's therefore discuss available abbreviations.
the inline entry name schema
The value of the inline entry name schema is an array of schema references.
abbreviations
The abbreviations discussed above, for the file schemas, also apply for directories.
Thus, for example, "DIR" is a valid abbreviation for the schema that specifies the
empty directory, ie. [ "DIR", {"entries": {}} ].
However, in case of directories we also allow omitting the type label and using solely
the entries object; in that way, {} is also an abbreviation for the empty directory schema.
Furthermore, entry specs can be abbreviated:
the stacking level of 0 can be omitted
[ <schema reference> ] can be unboxed, ie. abbreviated as <schema reference>
[ "SELF", "NONE" ] can be abbreviated as "SELF"
[ <schema reference> ] can be unboxed also when it occurs as the value of the inline
entry name schema.
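Undoing the entry spec abbreviations might look like the Python sketch below. The name `normalize_entry_spec` and the returned triple are made up for this illustration. Note one assumption in particular: since an entry schema can itself be an array, the sketch treats any array whose first member is a type label as an unboxed entry schema, and only other arrays as [ <schema reference>, <stacking level> ]. The text does not state this disambiguation rule.

```python
TYPE_LABELS = {"DIR", "NULL", "STRING", "BINARY", "LOOP", "RANDOM", "CALLOUT"}


def _level(value):
    """Validate a stacking level (a non-negative integer)."""
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError(f"not a stacking level: {value!r}")
    return value


def normalize_entry_spec(spec):
    """Return (self_referent, schema_reference, stacking_level)."""
    if spec == "SELF":  # abbreviates [ "SELF", "NONE" ]
        return (True, "NONE", 0)
    if isinstance(spec, list):
        if not spec:
            raise ValueError("empty entry spec")
        head = spec[0]
        if head == "SELF":
            if len(spec) not in (2, 3):
                raise ValueError(f"malformed self-referent spec: {spec!r}")
            return (True, spec[1], _level(spec[2]) if len(spec) == 3 else 0)
        # Assumption: [ <TYPE>, ... ] is an unboxed entry schema, handled below.
        if not (isinstance(head, str) and head in TYPE_LABELS):
            if len(spec) > 2:
                raise ValueError(f"malformed entry spec: {spec!r}")
            return (False, head, _level(spec[1]) if len(spec) == 2 else 0)
    # An unboxed schema reference: a label, an entries object, or [ <TYPE>, ... ].
    return (False, spec, 0)
```

For example, `normalize_entry_spec(["entry", 5])` gives `(False, "entry", 5)`, and `normalize_entry_spec(["STRING", "aa"])` gives `(False, ["STRING", "aa"], 0)`.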
expanding directory schemas
A single directory schema represents a tree of its instances. Producing this tree
from the given schema is referred to as "expanding the directory schema".
User labels are references; they resolve to entry schemas (we'll discuss later how this
association is defined). Therefore, as a zeroth step of the expansion, we can recursively
substitute all user labels in the schema with their corresponding entry schemas (cyclical
references won't be allowed so this is a finite process).
Recall, "." is the inline entry name schema and its value is an array of schema references.
These are required to resolve to directory schemas, with the semantics that their entries
object is merged into the current one. So, also as a pre-processing step, we can perform these
object mergers (in case of an attribute conflict, the later ones overwrite the earlier ones,
and the entries object of the actual directory schema trumps the others).
So we can restrict ourselves to the expansion of self-contained directory schemas where
the entries object does not have "." among its attributes.
To each entry that instantiates the given entry schema we will assign a stacking level.
(Note that the stacking level will have a significance only for directories and directory
schemas, but formally we define this assignment for any entry and entry schema.)
Assume that
D is a directory schema;
n: X is an entry name schema / entry spec pair in its entries object;
d is an instance of D with stacking level s;
m is an entry name in d derived from n.
Then
if X is of the form [ E, S ] (for some entry schema E and stacking level S),
then the entry at m will be an instance of E with stacking level S;
if X is of the form [ "SELF", E, S ] and s = 0, then again,
the entry at m will be an instance of E with stacking level S;
if X is of the form [ "SELF", E, S ] and s > 0,
then the entry at m will be an instance of D with stacking level s - 1.
A special case is when E is NONE; in that case, instance of E is to be understood
as there is no entry at m (that is, {} and {"foo": "NONE"} are the same).
So, for example, an instance of
{ "a": ["SELF", { "b": "SELF"}, 2 ] }
with stacking level 2 will unfold into the directory chain a/a/a/b/b.
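The expansion rules can be tried out with the following Python sketch. It is only an illustration under stated assumptions: it handles self-contained directory schemas without "." entries, written either as a bare entries object, as "DIR", or as [ "DIR", {"entries": ...} ]; it lists entry paths instead of creating files; and it treats an array starting with a type label as an unboxed entry schema. All function names are hypothetical.

```python
import re

TYPE_LABELS = {"DIR", "NULL", "STRING", "BINARY", "LOOP", "RANDOM", "CALLOUT"}


def entry_names(name_schema):
    """Names derived from an entry name schema (indexed if it ends in digits)."""
    match = re.fullmatch(r"(.*[^0-9])([0-9]+)", name_schema, re.DOTALL)
    if match is None:
        return [name_schema]
    base, count = match.group(1), int(match.group(2))
    width = len(str(count - 1))
    return [f"{base}{k:0{width}d}" for k in range(count)]


def entry_spec(spec):
    """Normalize an entry spec to (self_referent, schema, stacking_level)."""
    if spec == "SELF":
        return True, "NONE", 0
    if isinstance(spec, list) and spec:
        head = spec[0]
        if head == "SELF":
            return True, spec[1], spec[2] if len(spec) > 2 else 0
        if not (isinstance(head, str) and head in TYPE_LABELS):
            return False, head, spec[1] if len(spec) > 1 else 0
    return False, spec, 0


def entries_of(schema):
    """Return the entries object of a directory schema, or None for a file."""
    if isinstance(schema, dict):
        return schema
    if schema == "DIR" or schema == ["DIR"]:
        return {}
    if isinstance(schema, list) and schema[0] == "DIR":
        return schema[1].get("entries", {})
    return None


def expand(schema, level=0, prefix=""):
    """Yield the relative paths of all entries below an instance of schema."""
    entries = entries_of(schema)
    if entries is None:
        return  # a file has nothing beneath it
    for name_schema, spec in entries.items():
        self_referent, target, target_level = entry_spec(spec)
        if self_referent and level > 0:
            child, child_level = schema, level - 1  # recurse into D itself
        else:
            child, child_level = target, target_level
        if child == "NONE":
            continue  # no entry at this name
        for name in entry_names(name_schema):
            yield prefix + name
            yield from expand(child, child_level, prefix + name + "/")
```

Calling `list(expand({"a": ["SELF", {"b": "SELF"}, 2]}, 2))` ends with the path `a/a/a/b/b`, matching the example above; likewise, under this reading, `{"a": "SELF"}` with stacking level 5 unfolds into the chain a/a/a/a/a.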
file tree schema
A file tree schema is a JSON object with certain labels as attributes. That is,
a file tree schema
must have a ROOT attribute;
must have a VERSION attribute;
optionally might have any number of user labels as attributes.
the ROOT attribute
The value of the ROOT attribute is a non-self-referent entry spec.
the VERSION attribute
VERSION serves for versioning the file tree schema format itself. Its
value must be 1. Further revisions of this document might specify or allow
higher numeric values (possibly non-integers).
user label attributes
Values of user label attributes are entity schemas. They are regarded to be
the definitions of their respective labels.
consistency requirements
Let's call a JSON object that matches the above conditions a quasi file tree schema.
(Sorry for the weird name, we need to call such objects somehow, and they are not yet
necessarily file tree schemas, as further conditions apply.)
Basically, what we require is that each user label should be defined and the definitions
should be non-circular.
Formally, we can define the following bipartite graph between the user labels and entry
schemas occurring in a quasi file tree schema (considering the { "ROOT": <entry spec> }
pair of the quasi file tree schema to be an honorary directory schema):
there is an edge from a user label to an entry schema if the entry schema is the
definition of the label;
there is an edge from an entry schema to a user label if it's a directory schema
and some entry spec in its entries object includes the given label.
Then our requirement is:
there should be an edge going out from each user label in the graph;
the graph should not contain a directed cycle.
If a quasi file tree schema meets this condition, we call it a file tree schema.
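A checker for these requirements could be sketched in Python as below. It collapses the bipartite graph into a graph between labels (each label points to the labels mentioned in its definition), which has a directed cycle exactly when the bipartite one does. The function names are hypothetical, and two points are assumptions: labels inside nested directory schemas count as "included", and an array starting with a type label is read as an unboxed entry schema.

```python
TYPE_LABELS = {"DIR", "NULL", "STRING", "BINARY", "LOOP", "RANDOM", "CALLOUT"}


def _target(spec):
    """Return the schema reference inside a (possibly abbreviated) entry spec."""
    if spec == "SELF":
        return "NONE"
    if isinstance(spec, list) and spec:
        head = spec[0]
        if head == "SELF":
            return spec[1]
        if not (isinstance(head, str) and head in TYPE_LABELS):
            return head
    return spec


def referenced_labels(schema):
    """Collect the user labels mentioned anywhere inside a schema reference."""
    if isinstance(schema, str):
        return {schema} if schema[:1].islower() else set()
    if (isinstance(schema, list) and len(schema) == 2
            and schema[0] == "DIR" and isinstance(schema[1], dict)):
        schema = schema[1].get("entries", {})
    if not isinstance(schema, dict):
        return set()  # file schemas reference nothing
    found = set()
    for name, value in schema.items():
        if name == ".":  # inline entry: an array of schema references
            refs = value if isinstance(value, list) else [value]
        else:
            refs = [_target(value)]
        for ref in refs:
            found |= referenced_labels(ref)
    return found


def check_consistency(quasi_schema):
    """Raise ValueError unless all user labels are defined and non-circular."""
    definitions = {k: v for k, v in quasi_schema.items() if k[:1].islower()}
    graph = {label: referenced_labels(d) for label, d in definitions.items()}
    graph["ROOT"] = referenced_labels(_target(quasi_schema["ROOT"]))
    for label, refs in graph.items():
        undefined = refs - set(definitions)
        if undefined:
            raise ValueError(f"undefined user label(s): {sorted(undefined)}")
    state = {}

    def visit(label):  # depth-first search for a directed cycle
        if state.get(label) == "done":
            return
        if state.get(label) == "active":
            raise ValueError(f"circular definition involving {label!r}")
        state[label] = "active"
        for ref in graph[label]:
            visit(ref)
        state[label] = "done"

    for label in graph:
        visit(label)
```

Note that SELF does not count as a reference here, so a recursive definition such as `{"a": "SELF"}` is not reported as circular.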
expanding file tree schemas
Let the value of the ROOT attribute be of the form [ E, S ], where E is a
schema reference and S is a stacking level; then an instance of this file tree
schema is an instance of the entry schema referred by E with stacking level S.
As an abbreviation, the VERSION attribute might be omitted.
The entry spec that is the value of the ROOT attribute can be abbreviated as
entry specs in general.
{ "ROOT": E }, with E being a self-contained entry schema such that
"ROOT" and "VERSION" are not among its entry name schemas, can be abbreviated
by E. In other words, any self-contained entry schema E, with "ROOT" and
"VERSION" not among its entry name schemas, is identified with the file tree schema
that has E as its ROOT with stacking level 0.