Created
February 19, 2014 19:18
pystachio 1.0 designdoc
Pystachio 1.0 redesign | |
1. Documents: | |
Essentially a set of hierarchical key/value pairs a la a JSON document. | |
Documents may contain: | |
1. Leaves (string or numeric [integer, long, float, etc]) | |
2. Iterables (can contain values of 1, 2, 3; will always be coerced to | |
tuples, regardless of underlying type e.g. set, so set | |
properties will not be preserved when coerced to Document) | |
3. Maps (keys /must/ be coercible to 1, values can be any of 1, 2, 3) | |
The Document itself is just a Map (#3). String leaves are always coerced to | |
either a Ref or Fragment as described in the next section. Maps are always | |
coerced to Documents (in other words, Documents are recursive data types.) | |
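The coercion rules above can be sketched as follows. This is a minimal illustration, not the Pystachio implementation: the function name `coerce` is hypothetical, plain dicts stand in for Documents, and the coercion of string leaves to Refs or Fragments is left out.

```python
from collections.abc import Mapping

LEAF_TYPES = (str, int, float)


def coerce(value):
    """Coerce a plain Python value into one of the three document kinds."""
    if isinstance(value, Mapping):
        # Maps: keys must be leaves, values are coerced recursively.
        for key in value:
            if not isinstance(key, LEAF_TYPES):
                raise TypeError('map keys must be leaves, got %r' % (key,))
        return {key: coerce(item) for key, item in value.items()}
    if isinstance(value, LEAF_TYPES):
        # Leaves are kept as they are.
        return value
    try:
        # Iterables always become tuples, so set properties are lost.
        return tuple(coerce(item) for item in value)
    except TypeError:
        raise TypeError('cannot coerce %r into a document value' % (value,))
```

For example, coerce({'a': [1, {'b': {2}}]}) gives {'a': (1, {'b': (2,)})}: the list and the set both come back as tuples.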
There are two important methods on documents: | |
.bind(*args, **kw) | |
.__call__(*args, **kw) | |
These used to be different, but in Pystachio 1.0 they are aliased together. | |
Each argument in args must be either another document or a dict which is to | |
be merged with this document. **kw should be passed to a Document and | |
merged in as well. Merge is performed by dict .update(). | |
Documents should provide a .raw_items() iterator analogous to .items() that
iterates over the raw, unresolved contents of the document. .bind() and | |
.__call__() should merge .raw_items() from Documents, but plain .items() | |
from dicts. .items() should iterate over *resolved* versions of the | |
dictionary contents in the case of Documents. | |
NOTE: .bind and .__call__ functions return *new* Documents -- the original | |
document is immutable and unchanged, so: | |
>>> d = Document(hello = 'world') | |
>>> d(hello = 'universe') | |
Leaves 'd' unchanged, but instead returns a new Document(hello = | |
'universe'). | |
Furthermore, since Documents are like dictionaries, they can be accessed via
__getitem__. Use of __getattr__ is discouraged; it simply delegates to
__getitem__.
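A minimal sketch of the merge and immutability behavior described in this section. It assumes a flat dict as the backing store and leaves out coercion and ref resolution entirely, so __getitem__ here returns raw values; it is an illustration rather than the intended implementation.

```python
class Document(object):
    """Immutable mapping whose bind() and __call__ return merged copies."""

    def __init__(self, *args, **kw):
        self._data = {}
        for arg in args:
            # Documents contribute their raw contents, dicts their items.
            if isinstance(arg, Document):
                self._data.update(arg.raw_items())
            else:
                self._data.update(arg.items())
        self._data.update(kw)

    def raw_items(self):
        return iter(self._data.items())

    def bind(self, *args, **kw):
        # The receiver goes first so that later arguments override it.
        return Document(self, *args, **kw)

    __call__ = bind  # the two are aliased together

    def __getitem__(self, name):
        return self._data[name]

    def __getattr__(self, name):
        # Discouraged, but delegates to __getitem__ for public names.
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)
```

With this sketch, d = Document(hello='world') followed by d(hello='universe') yields a new Document while d['hello'] is still 'world'.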
2. Mustaches: | |
We do not implement the Mustache document format (i.e. no loops or
conditionals). We only use mustaches '{{}}' to denote pointers.
There are two kinds of indirect objects: | |
1. Refs: | |
A single mustache instance, e.g. | |
{{foo}} | |
{{foo.bar}} | |
{{foo.bar[baz]}} | |
{{foo[`baz`]}} | |
2. Fragments | |
A sequence of [... str, ref, str, ref, str ...] representing a string. | |
A fragment may neither start with a Ref nor end with a Ref. A fragment | |
cannot be [Ref(...)]; instead it will just be a Ref. | |
It is possible to have the following case: | |
{{foo.bar[{{baz}}]}} | |
This will be parsed as: | |
Fragment(['{{foo.bar[', Ref('baz'), ']}}']) | |
The reason for the distinction between Refs and Fragments is due to how they | |
are resolved. | |
Ref resolution is simply "find the reference and substitute it in place of | |
this value." The process of finding the associated reference value is | |
described in the next section. | |
Fragment resolution on the other hand is an iterative process. If there are | |
any Refs within a fragment that cannot be found within the Document, raise | |
NotFound as it cannot be resolved. If all Refs can be resolved, | |
''.join(resulting strings) and reparse. There are three possible outcomes: | |
1. A single Ref: perform ref resolution (i.e., substitution) | |
2. Fragment with no Refs: return fragment.components()[0] (its string representation) | |
3. Fragment with Refs: repeat iteration | |
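The parse and resolve loop above can be sketched as follows. Several things here are assumptions made for illustration: refs are looked up in a flat dict keyed by the text inside the mustache (real lookup is described in the next section), the & escape is not handled, and there is no cycle detection.

```python
import re
from collections import namedtuple

Ref = namedtuple('Ref', 'name')

# Matches innermost mustaches only: no braces allowed inside.
MUSTACHE = re.compile(r'\{\{([^{}]+)\}\}')


class NotFound(KeyError):
    """Stand-in for the NotFound error named in the text."""


def parse(text):
    """Return a Ref for a lone mustache, else a list of str/Ref components."""
    parts = MUSTACHE.split(text)  # [str, name, str, name, ..., str]
    if len(parts) == 3 and parts[0] == '' and parts[2] == '':
        return Ref(parts[1])
    return [Ref(part) if index % 2 else part for index, part in enumerate(parts)]


def resolve(text, bindings):
    parsed = parse(text)
    while True:
        if isinstance(parsed, Ref):
            # Outcome 1: a single Ref is substituted by its value as-is.
            if parsed.name not in bindings:
                raise NotFound(parsed.name)
            return bindings[parsed.name]
        if len(parsed) == 1:
            # Outcome 2: no Refs left, return the plain string.
            return parsed[0]
        # Outcome 3: substitute every Ref, join, reparse and repeat.
        pieces = []
        for part in parsed:
            if isinstance(part, Ref):
                if part.name not in bindings:
                    raise NotFound(part.name)
                part = str(bindings[part.name])
            pieces.append(part)
        parsed = parse(''.join(pieces))
```

Because the pattern only matches innermost mustaches, parse('{{foo.bar[{{baz}}]}}') produces the three-component fragment shown earlier, and a second pass turns the joined string into a single Ref.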
3. Evaluating mustache refs | |
There are three forms of refs: | |
{{name}} | |
{{name1.name2}} | |
{{name1[name2]}} | |
It is possible to escape a mustache using &: | |
{{&name}} | |
will always result in the string {{name}} rather than Ref('name'). The & | |
should be stripped as late as possible. | |
Implicitly, {{name}} is equivalent to {{.name}}, which on a document means
__getitem__('name'). Furthermore, these can be composed together, e.g.
{{name1.name2[name3]}}. It should also be possible to escape names using | |
back-ticks (this was not possible in Pystachio 0.x): | |
{{`foo bar`}} | |
{{name1.`foo bar`}} | |
{{name1[`foo bar`]}} | |
As such, back-ticks are not allowed within keys in Documents. Names must be
escaped if they do not conform to C-style variable names, i.e.
[a-zA-Z_][a-zA-Z0-9_]*. This includes common team names such as `aurora-team`.
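One way to split the text inside a mustache into dereference steps is sketched below. The function name `tokenize` and the ('.' or '[]', name) step format are hypothetical, and bare integers are accepted as names so that list indices can be written without back-ticks; the real grammar may differ.

```python
import re

TOKEN = re.compile(r'''
    (?P<op>\.|\[)?                              # optional dereference operator
    (?:`(?P<quoted>[^`]+)`                      # back-tick escaped name
      |(?P<bare>[A-Za-z_][A-Za-z0-9_]*|\d+))    # C-style name or integer index
    (?P<close>\])?                              # closing bracket, if any
''', re.VERBOSE)


def tokenize(path):
    """Split 'a.b[c]' into [('.', 'a'), ('.', 'b'), ('[]', 'c')]."""
    steps, pos = [], 0
    while pos < len(path):
        match = TOKEN.match(path, pos)
        if match is None:
            raise ValueError('cannot parse %r at offset %d' % (path, pos))
        op, close = match.group('op'), match.group('close')
        if (op == '[') != (close == ']'):
            raise ValueError('unbalanced brackets in %r' % path)
        if op is None and pos != 0:
            raise ValueError('missing . or [] in %r' % path)
        name = match.group('quoted') or match.group('bare')
        # A leading bare name is treated as a .-dereference.
        steps.append(('[]' if op == '[' else '.', name))
        pos = match.end()
    return steps
```

An unescaped name such as aurora-team is rejected at the hyphen, while `aurora-team` in back-ticks yields a single step.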
In order to find a reference value, each of the 3 primary Document types | |
must understand finding via . and []: | |
1. Leaves: Dereference is not supported -- raise NotFound | |
2. Iterables | |
a) .-dereference: Not supported | |
b) []-dereference: The dereference is coerced to an integer (if it is not
coercible, raise NotFound) and used as an index.
3. Maps | |
a) .-dereference: Equivalent to __getitem__ | |
b) []-dereference: Equivalent to __getitem__ | |
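The three cases above can be sketched as a single function. It assumes dicts stand in for Maps, tuples for Iterables, and that the operator is passed as '.' or '[]'; the function name `dereference` is hypothetical.

```python
class NotFound(KeyError):
    """Stand-in for the NotFound error named in the text."""


def dereference(value, op, name):
    """Apply one '.' or '[]' step to a leaf, iterable or map."""
    if isinstance(value, dict):
        # Maps: both forms are equivalent to __getitem__.
        try:
            return value[name]
        except KeyError:
            raise NotFound(name)
    if isinstance(value, tuple):
        # Iterables: only []-dereference, with the name coerced to an integer.
        if op != '[]':
            raise NotFound(name)
        try:
            return value[int(name)]
        except (ValueError, IndexError):
            raise NotFound(name)
    # Leaves: dereference is not supported.
    raise NotFound(name)
```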
*scoping rules* | |
Since Documents may be contained within Documents, there are scoping rules to
be aware of. Consider the following document:
{
  'name': '{{profile.first}} {{profile.last}}',
  'profile': {
    'title': 'mr.',
    'first': '{{title}} brian',
    'last': 'wickman',
  }
}
Resolving {{profile.last}} is unambiguous: dereference .profile, which | |
results in the document {'title': 'mr.', 'first': '{{title}} brian', 'last': 'wickman'}. | |
Then dereference .last from said document resulting in 'wickman'. | |
Resolving {{profile.first}} is slightly more nuanced. You begin with | |
resolving {{profile}}, then next you must resolve {{.first}}. In order to | |
resolve {{.first}}, we must resolve '{{title}} brian'. '{{title}}' is | |
scoped to the 'profile' document. In this case, it's simple, as it resolves | |
directly to 'mr.'. However, should {{title}} not be found | |
within 'profile', all enclosing documents must be searched *in stack order*. | |
In other words, the top level document containing 'name' and 'profile' must be | |
used to attempt to resolve {{title}}. (If it were nested multiple levels, | |
this would continue until no more documents are reached, at which point | |
NotFound is raised.) | |
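The stack-order search can be sketched as a loop over the enclosing documents, innermost first. The function name `lookup` and the explicit list of scopes are assumptions made for illustration; plain dicts stand in for Documents and values are returned unresolved.

```python
class NotFound(KeyError):
    """Stand-in for the NotFound error named in the text."""


def lookup(name, scopes):
    """Search enclosing documents in stack order, innermost first."""
    for scope in scopes:
        if name in scope:
            return scope[name]
    raise NotFound(name)


top = {
    'name': '{{profile.first}} {{profile.last}}',
    'profile': {
        'title': 'mr.',
        'first': '{{title}} brian',
        'last': 'wickman',
    },
}
profile = top['profile']

# {{title}} inside 'profile' is found locally ...
title = lookup('title', [profile, top])
# ... while a name missing from 'profile' falls back to the enclosing document.
outer_name = lookup('name', [profile, top])
```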
*special names* | |
There are two special names in the dereferencing algorithm that alter | |
behavior of resolution. | |
1. self: Restricts resolution of the variable to within the current document | |
and will not delegate to parent documents. This won't ever change the | |
resolution result (as it's always done locally) but it will change error | |
handling. For example, the difference between {{name}} and {{self.name}}
is that with {{self.name}}, parent documents should never be able to provide
the value should it not be provided within the scope of that document.
2. super: Restricts resolution to the parent document. For example, if you | |
want explicit inheritance: | |
task = Document(name = '{{super.name}}', attribute = 'value') | |
job = Document(name = 'the job', task = task) | |
job.task.name will be 'the job' | |
This can be used multiple times, e.g. {{super.super.cpu}}. | |
This does complicate things slightly as Documents must retain the context in | |
which they were evaluated: | |
job.task is a Document with the parent document of job | |
This is necessary in order for job.task.name to be properly evaluated when
{{super.name}} is being resolved against "job".
There are two approaches to implementing this functionality: | |
i) If a Ref returns a Document 'd', yield d(super=self), but make Document | |
aware that 'super' ought to be hidden from most introspection e.g. | |
items(), raw_items(), and __str__. This means that you must be very | |
careful about doing resolution in that you do not lose the contents of | |
self['super']. | |
ii) Maintain a hidden _super attribute set by the parent and treat it | |
specially. | |
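Approach (ii) could look roughly like the sketch below. The class name `Scope`, its `child` and `find` methods, and the choice to pass the ref as a list of names are all hypothetical; values are returned raw, and 'super' is treated as restricting the search to the parent without further delegation, which is one reading of the description above.

```python
class NotFound(KeyError):
    """Stand-in for the NotFound error named in the text."""


class Scope(object):
    def __init__(self, data, parent=None):
        self._data = data
        self._super = parent  # hidden: not part of the document contents

    def child(self, name):
        # A nested document remembers the context it was reached from.
        return Scope(self._data[name], parent=self)

    def find(self, path):
        """Look up a list of names, e.g. ['super', 'name']."""
        scope, local_only = self, False
        path = list(path)
        while path and path[0] in ('self', 'super'):
            if path.pop(0) == 'super':
                if scope._super is None:
                    raise NotFound('super')
                scope = scope._super
            local_only = True  # self/super both disable parent delegation
        if not path:
            return scope._data
        head, rest = path[0], path[1:]
        while scope is not None:
            if head in scope._data:
                value = scope._data[head]
                for name in rest:
                    value = value[name]
                return value
            if local_only:
                break
            scope = scope._super
        raise NotFound(head)
```

With job = Scope({'name': 'the job', 'task': {...}}) and task = job.child('task'), task.find(['super', 'name']) returns 'the job', while task.find(['self', 'cluster']) raises NotFound even if the job defines 'cluster'.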
**Illustration 1** | |
d = Document({
  'name': '{{profile.first}} "{{nicknames[{{profile.nick_index}}]}}" {{profile.last}}',
  'yob': '{{profile.yob}}',
  'nicknames': ['b', 'bibby', 'wickyman'],
  'profile': {
    'title': 'mr.',
    'first': '{{title}} brian',
    'last': 'wickman',
    'yob': 1981,
    'occupation': 'engineer',
    'nick_index': '1',
  },
})
# profile.first resolves to 'mr. brian', so the title is part of the name.
assert d['name'] == d.name == 'mr. brian "bibby" wickman'
assert d['yob'] == d.yob == 1981
assert d['nicknames'][0] == d.nicknames[0] == 'b'
assert dict(d.items()) == {
  'name': 'mr. brian "bibby" wickman',
  'yob': 1981,
  'nicknames': ('b', 'bibby', 'wickyman'),
  'profile': {
    'title': 'mr.',
    'first': 'mr. brian',
    'last': 'wickman',
    'yob': 1981,
    'occupation': 'engineer',
    'nick_index': '1',
  },
}
N.B. These should always return copies of the underlying structure. So that | |
"d['nicknames'][0] = 23" should be a no-op, as d['nicknames'] is merely a | |
copy of the original. | |
4. Traits | |
Once a document has been acquired, information must be extracted from that | |
document. Traits are effectively schemas that dictate how to extract and | |
optionally serialize content from documents. | |
In an ideal world, they would be completely separate from documents, but in | |
order to maintain backwards compatibility with Pystachio 0.x, they must be | |
slightly conflated with documents in that they must subclass documents. | |
In other words, ideally trait expression and extraction would appear like: | |
class Process(Trait):
  name = Required(String)
  cmdline = Required(String)
  daemon = Default(Boolean, False)
  ephemeral = Default(Boolean, False)
  max_failures = Default(Integer, 1)
d = Document.from_json('process.json')
process = d.extract_trait(Process)  # extracts trait and type checks
subprocess.call(process.cmdline.split())
Instead Trait's metaclass must inject Document as a parent class and | |
behave like so in order to maintain backwards compatibility: | |
process = Process.from_json('process.json') | |
process.check() | |
subprocess.call(process.cmdline.split()) | |
However, they should not require _any_ shared methods with Documents, so | |
they can be tested in isolation. | |
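To illustrate that extraction needs nothing from Documents beyond mapping access, here is a minimal sketch. It assumes a free function `extract_trait` (hypothetical), plain dicts in place of Documents, and Python's built-in types in place of String, Boolean and Integer.

```python
class Required(object):
    def __init__(self, kind):
        self.kind = kind


class Default(object):
    def __init__(self, kind, value):
        self.kind, self.value = kind, value


def extract_trait(trait, document):
    """Pull the attributes declared on `trait` out of a mapping and type check them."""
    result = {}
    for name, spec in vars(trait).items():
        if not isinstance(spec, (Required, Default)):
            continue  # skip __module__, __doc__ and other class internals
        if name in document:
            value = document[name]
        elif isinstance(spec, Default):
            value = spec.value
        else:
            raise ValueError('missing required attribute %r' % name)
        if not isinstance(value, spec.kind):
            raise TypeError('%s should be %s' % (name, spec.kind.__name__))
        result[name] = value
    return result


class Process(object):
    name = Required(str)
    cmdline = Required(str)
    daemon = Default(bool, False)
    max_failures = Default(int, 1)
```

Calling extract_trait(Process, {'name': 'hello', 'cmdline': 'echo hi'}) fills in the two defaults, and a document without 'cmdline' raises ValueError.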
This separation of concerns between Documents and Traits should make it
simpler to extract Traits from other IDLs, e.g. Thrift, using a library
like ptsd:
Process = ThriftTrait('thermos.thrift', 'Process') | |
process = Process.from_json('process.json') | |
process.check() | |
process.serialize() # serialize to thrift byte stream | |
where thermos.thrift may look like: | |
struct Process {
  1: required string name
  2: required string cmdline
  3: optional bool daemon = false
  4: optional bool ephemeral = false
  5: optional i16 max_failures = 1
}
In practice, we'll alias Struct = Trait to maintain backwards compatibility, | |
but implicitly you can think of Structs as Documents with an implied Trait. | |
5. Trait representation and extraction | |
XXX(finishme) | |
TBD. Represent base types: | |
Boolean .coerce | |
Integer .coerce | |
Float .coerce | |
String .coerce | |
Enum .coerce | |
Container types: | |
List .coerce | |
Map .coerce | |
Trait.coerce(document) ? Seems reasonable -- should also be compatible with | |
ThriftTrait too. | |
Requirements: | |
Required | |
Default | |
Then Trait has a class attribute: | |
_TYPE_MAP { name => ??? } | |
_TRAIT_MAP { name => Value } | |
6. Trait merging | |
It should be possible to merge certain traits together and/or monkeypatch
existing traits. Traits should support an .extend() method that accepts new
traits and merges them together to produce new ones.
For example, | |
class Job(Trait):
  name = Required(String)
  task = Required(Task)
class Announceable(Trait):
  announce = Announce
AnnounceableJob = Job.extend(Announceable)
Then the Pystachio Loader (which should remain essentially unchanged from
version 0.x to version 1.x) can do things like:
Job = Job.extend(AnnounceableJob) | |
so that it is possible to create organizational-specific configuration on | |
top of jobs and tasks. | |
Unfortunately, this only works elegantly for top-most declarations, so it
may be worth considering something along the lines of:
Job = Job.extend_attribute('task', Task.extend(HealthCheckable)) | |
which could correspondingly be chained: | |
Job = Job.extend_attribute('task', | |
Task.extend_attribute('process', Process.extend(HealthCheckable))) | |
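One possible shape for .extend() and .extend_attribute() is sketched below, building a new subclass on each call so the original trait is untouched. This is an assumption about how merging might work, not a settled design: only attributes declared directly on the extension are copied, and plain strings stand in for the Required(...) declarations.

```python
class Trait(object):
    @classmethod
    def extend(cls, other):
        # Copy the attributes declared directly on `other` onto a new subclass.
        attrs = {key: value for key, value in vars(other).items()
                 if not key.startswith('__')}
        return type(cls.__name__, (cls,), attrs)

    @classmethod
    def extend_attribute(cls, name, value):
        # Replace a single attribute, leaving the original trait untouched.
        return type(cls.__name__, (cls,), {name: value})


class Process(Trait):
    cmdline = 'Required(String)'


class Task(Trait):
    process = Process


class Job(Trait):
    name = 'Required(String)'
    task = Task


class HealthCheckable(Trait):
    health_check = 'Default(String, "/health")'


PatchedJob = Job.extend_attribute(
    'task', Task.extend_attribute('process', Process.extend(HealthCheckable)))
```

Here PatchedJob.task.process gains health_check while Job.task.process remains the original Process.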
7. Lambdas | |
Just joking, I'm not proposing implementing Lambdas for Pystachio. | |
However, there are certain use-cases where it makes sense to provide a | |
smarter document. Consider a typical schema: | |
class Resources(Trait):
  cpu = Default(Float, 1.0)
  ram = Default(Integer, 1 * GB)
  disk = Default(Integer, 1 * GB)
class Process(Trait):
  name = Required(String)
  cmdline = Required(String)
class Task(Trait):
  name = Default(String, '{{processes[0].name}}')
  processes = Required(List(Process))
  resources = Default(Resources, Resources())
Now you may construct a task in the following manner: | |
task = Task(
  processes = [Process(name = 'hello_world', cmdline = 'echo hello world')],
  resources = Resources(ram = 8 * GB),
)
Consider the case where we want to run a JVM but must set things like -Xmx | |
properly. We may not want to set -Xmx blindly to -Xmx{{super.resources.ram}} | |
but instead perform arithmetic on the values provided. | |
Documents accept both dicts and Documents as Mappings, but explicitly only | |
coerce dicts to Documents. Therefore, it is perfectly acceptable to | |
subclass Document to do smarter things. | |
import math

class JavaProcess(Document):
  def __init__(self, jar_name, args, **kw):
    self._jar_name = jar_name
    self._args = args
    super(JavaProcess, self).__init__(**kw)

  def __produce_cmdline(self):
    cpu = self._resolve('{{super.resources.cpu}}')
    assert cpu >= 1, 'cpu cannot be less than 1'
    if cpu <= 8:
      gc_threads = cpu
    else:
      # beyond 8 cpus, add 5 GC threads for every 8 additional cpus
      gc_threads = int(math.ceil(8 + (cpu - 8) * 5/8))
    return 'java -jar %s -XX:ParallelGCThreads=%s %s' % (
        self._jar_name, gc_threads, ' '.join(self._args))

  def __getitem__(self, name):
    # only cmdline is computed; everything else is resolved as usual
    if name == 'cmdline':
      return self.__produce_cmdline()
    return super(JavaProcess, self).__getitem__(name)
This way it is possible to do: | |
task = Task(
  processes = [JavaProcess('foo.jar', ['-httpPort', '80'], name='foo')],
  resources = '{{profile.resources}}',
)(profile = {'resources': {'cpu': 16}})
and have the following hold true: | |
assert task.processes[0].cmdline == 'java -jar foo.jar -XX:ParallelGCThreads=13 -httpPort 80'
whereas before it was not possible to add logic to template evaluation. | |
The downside of course is that it is no longer possible to do d = | |
Document.from_json('task.json') and have any way to express that a process | |
should be evaluated as a JavaProcess. However, task.to_json() would work | |
correctly in the above situation. | |
8. Dynamic Documents | |
In the same spirit as section 7, it is possible to consider dynamic
documents. There are specific use-cases where we have need for this at | |
Twitter: | |
1) Resolving package locations | |
2) Resolving jenkins artifacts | |
3) Resolving build artifacts | |
For example, | |
{{artifactory[`wickman-cache`][`org.apache.aurora.scheduler`][`0.5.0`]}} | |
This might dynamically resolve this artifact and replace it with an https | |
URL that can be curled.
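As a rough sketch of the idea, a dynamic document only needs to answer []-dereferences lazily. Everything below is hypothetical: the class name `ArtifactPath`, the base URL, and the path layout are invented for illustration, and a real resolver would query the artifact store rather than join strings.

```python
class ArtifactPath(object):
    """Accumulates []-dereferences and renders them as a URL on demand."""

    def __init__(self, base_url, parts=()):
        self._base_url = base_url
        self._parts = tuple(parts)

    def __getitem__(self, name):
        # Each dereference returns a new, deeper path; nothing is fetched yet.
        return ArtifactPath(self._base_url, self._parts + (name,))

    def __str__(self):
        return '/'.join((self._base_url,) + self._parts)


# Hypothetical base URL, not a real service.
artifactory = ArtifactPath('https://artifacts.example.invalid')
url = str(artifactory['wickman-cache']['org.apache.aurora.scheduler']['0.5.0'])
```

Binding such an object under the name artifactory would let the ref above resolve through ordinary []-dereferences.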