w.r.t. my experience with defcon and ufoLib. Some of the functionality I discuss (notifications, etc.) definitely shouldn't go into fontTools; however, we ought to make fontTools.ufoLib "compatible" with these extra features. I'm sure we can make it both simple and versatile.
IMO fontTools.ufoLib should basically be written from scratch (with copy-pasting here and there) since ufoLib/defcon have significant bloat and apparently we want to use lxml.
ufoLib does a lot of things that look like this:
    if not os.path.exists(path):
        return
    file = open(path)
    file.read()
This is slow, as it's making an extra check which becomes expensive in hot code paths as I witnessed when flame-graphing ufoLib/defcon.
In this case it's also wrong (the well-known TOCTOU, i.e. time-of-check to time-of-use, problem from C): the filesystem can be accessed from multiple processes, so the file can disappear between the check and the open.
Exceptions let us drop pre-checks:
    try:
        with open(path) as file:
            file.read()
    except FileNotFoundError:
        return
Take what's good, leave what's bad. For perf and complexity reasons, also for letting a minimally bad file be opened...
Conceptually the reading process goes like this: file -parse-> tree (e.g. ElementTree) -structure-> font (e.g. Font) -validate-> blessed font
Validation should be optional, since e.g. with fontTools you want to be able to open and modify the font even if it's semantically invalid, and with fontmake you want to optimize for speed (assuming 99% of fonts are valid) and it'll fail during compilation if some data is invalid either way. For font editors, though, validation is generally desirable; otherwise you have to do many checks every time you use data from the font, which isn't practical for the developer and quickly becomes redundant.
In the majority of cases though, validation is as simple as converting the data into its expected type (which can be done easily with attrs); more sophisticated forms of validation include uniqueness of identifiers and correctness of segments in the contours.
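As a rough sketch of the parse, structure, validate pipeline with validation as an opt-in step: this is not ufoLib's API, it uses the standard-library ElementTree and dataclasses as stand-ins for lxml and attrs, and the names `Anchor`, `read_anchor` and `validate_anchor` (as well as the rule it checks) are made up for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class Anchor:
    # Hypothetical data class; a stdlib dataclass stands in for attrs here.
    x: float
    y: float
    name: Optional[str] = None


def validate_anchor(anchor):
    # Illustrative semantic rule only, not taken from the UFO spec.
    if anchor.name is not None and not anchor.name:
        raise ValueError("anchor name must not be empty")


def read_anchor(xml_text, validate=False):
    # file -parse-> tree
    element = ET.fromstring(xml_text)
    # tree -structure-> object: converting to the expected type is the
    # cheap part of validation and always happens.
    anchor = Anchor(
        x=float(element.get("x")),
        y=float(element.get("y")),
        name=element.get("name"),
    )
    # object -validate-> blessed object: semantic checks are opt-in.
    if validate:
        validate_anchor(anchor)
    return anchor
```

A tool like fontmake would call `read_anchor(text)` and skip the semantic pass, while an editor would pass `validate=True` once at load time instead of re-checking on every access.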
Just like fontTools.feaLib does – it maintains a location tuple that is passed to each exception being thrown and indicates the current filename/line/column.
The original ElementTree library doesn't maintain a location but lxml objects have a sourceline attribute.
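A minimal sketch of an exception that carries a location tuple, in the spirit of what feaLib does. `UFOReadError` and `parse_xml` are hypothetical names. The stdlib parser only exposes a position for parse errors (via `ParseError.position`); for structural errors on an lxml tree one would presumably fill in the line from the element's `sourceline` instead.

```python
import xml.etree.ElementTree as ET


class UFOReadError(Exception):
    # Hypothetical exception type carrying a (filename, line, column) tuple.
    def __init__(self, message, location=None):
        super().__init__(message)
        self.location = location

    def __str__(self):
        message = super().__str__()
        if self.location is None:
            return message
        filename, line, column = self.location
        return f"{filename}:{line}:{column}: {message}"


def parse_xml(text, filename):
    try:
        return ET.fromstring(text)
    except ET.ParseError as e:
        # ParseError.position is a (line, column) tuple.
        line, column = e.position
        raise UFOReadError("malformed XML", (filename, line, column)) from e
```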
Whatever data structure the reader reads into, it should be retained after the initial read (don't trash it or copy it). It should be the data that sticks around until the end of the program.
This ties into support for custom classes. I'll need to extend the base with extra attributes, so..
- do we want to read to primitive data structures like dict (which are easy to swap/serialize/compose) and then the Font, etc. classes "mount" (store) that data structure and mutate it internally. I like the KISS aspect of it: "this" file corresponds to "this" data structure. or
- do we want to pass classes to the reader (which it'll instantiate? in which case the ctor is part of the reader/writer API?)
What we should aim for is not needing to reconstruct classes (e.g. currently for glyph anchors, ufoLib gives a list of dicts and defcon takes each dict and creates an Anchor() from it; this two-pass aspect can be avoided).
attrs lets us write data classes without boilerplate ctors, cmp etc. Showoff
It's a compelling candidate for writing the data structures mentioned in the previous point and instantiating them directly from the reader.
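To illustrate the "pass classes to the reader" option, here is a small sketch where the reader instantiates the final objects in a single pass. It uses stdlib dataclasses in place of attrs, and `read_anchors`, its `anchor_class` parameter and `EditorAnchor` are all hypothetical names, not existing ufoLib or defcon API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Anchor:
    x: float
    y: float
    name: str = ""


@dataclass
class EditorAnchor(Anchor):
    # App-side subclass adding state that is never written to disk.
    selected: bool = False


def read_anchors(glyph_xml, anchor_class=Anchor):
    # One pass: the reader builds the final objects directly,
    # with no intermediate list of dicts to convert afterwards.
    root = ET.fromstring(glyph_xml)
    return [
        anchor_class(
            x=float(el.get("x")),
            y=float(el.get("y")),
            name=el.get("name", ""),
        )
        for el in root.iter("anchor")
    ]
```

The trade-off is the one raised above: the constructor signature (`x`, `y`, `name` keywords here) effectively becomes part of the reader/writer API.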
Would be nice to have UFOs as a single file.
Only Mac considers dotted folders as files ("packages"), and file picker dialogs on Windows/Linux/etc. aren't expecting a folder (or it's impractical, e.g. when you saveAs with a folder the folder picker can only choose one that exists, you can't write a desired name and just click OK). Also the filesystem isn't optimized for many small files afaict.
Many modern multi-platform apps (Microsoft Office, XD, Sketch etc.) use a zipped tree of files (zip with no compression).
Going forward it would be nice to move towards single-file becoming the default.
Also, what I like about this is that we can lock the whole file while we're reading it; in a directory, things can race. Directories cannot be locked on Windows.
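For reference, writing and reading an uncompressed zip tree takes very little code with the stdlib `zipfile` module. This is only a sketch: `write_ufo_zip` and `read_ufo_zip` are hypothetical helpers, and the archive layout used in the usage example is an assumption, not a defined format.

```python
import io
import zipfile


def write_ufo_zip(fileobj, files):
    # `files` maps archive paths to bytes.
    # ZIP_STORED means the members are stored without compression.
    with zipfile.ZipFile(fileobj, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)


def read_ufo_zip(fileobj):
    # Returns a dict mapping archive paths to their raw bytes.
    with zipfile.ZipFile(fileobj) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


# Usage with an in-memory buffer; a real file opened in binary mode works the same.
buffer = io.BytesIO()
write_ufo_zip(buffer, {"metainfo.plist": b"<plist/>", "glyphs/a.glif": b"<glyph/>"})
buffer.seek(0)
contents = read_ufo_zip(buffer)
```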
Clojure/FP-like data structures, cf. FB Immutable.js talk.
Hierarchy of Layer -> Glyph -> Contour etc. could form a deeply nested such structure inside the Font.
Takes more memory, but is fully versioned... easy to slice?
makes it easy to save the font while still using it (since it's COW).
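A toy illustration of the copy-on-write idea, using frozen dataclasses and tuples. The class and function names are hypothetical, and this is not a real persistent data structure like the ones in Clojure or Immutable.js: rebuilding the tuple is O(n) in the number of glyphs, whereas those use tries to share structure more cheaply. It only shows that an edit produces a new version while untouched objects are shared.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Glyph:
    name: str
    width: float = 0.0
    contours: tuple = ()


@dataclass(frozen=True)
class Layer:
    name: str
    glyphs: tuple = ()


def set_glyph_width(layer, glyph_name, width):
    # Returns a new Layer; glyphs that don't change are shared, not copied.
    glyphs = tuple(
        replace(g, width=width) if g.name == glyph_name else g
        for g in layer.glyphs
    )
    return replace(layer, glyphs=glyphs)


v1 = Layer("foreground", (Glyph("a", 500), Glyph("b", 600)))
v2 = set_glyph_width(v1, "a", 520)
# v1 is untouched, so a writer thread could still be saving it
# while the app keeps working on v2.
```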
And make renaming etc. easier. Layer colors?
Could probably be made zero-cost
I'd prefer a more generic set of infos and custom parameters that can override specific OT fields.
Does not need to be dynamic.
What happens right now with the current stack is that ufoLib reads data with zealous checks and at times unnecessary copies, then ufoLib and defcon are totally blind to each other. ufoLib sets the attributes of whatever Font object it's given, like .anchors, .guidelines etc. defcon treats that as arbitrary data and retakes each element one by one, sends notifications, makes asserts etc. while it's TOTALLY UNNEEDED in that case!
The custom classes need a privileged path where it just swallows the furnished data, ideally with zero copy (i.e. setting data using that path is free).
The alternative is to disable notifications, or to have a notification system that costs nothing when there are no subscribers, and let ufoLib set data normally.
Also, ctors should be as zero-cost as possible.
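One possible shape for such a privileged path, as a sketch only: `from_reader`, `add_listener` and `append_anchor` are hypothetical names, not defcon API. The loader hands over its list and the object adopts it as is, while the normal mutation path still notifies.

```python
class Glyph:
    def __init__(self):
        self._anchors = []
        self._listeners = set()

    @classmethod
    def from_reader(cls, anchors):
        # Privileged path (hypothetical): adopt the reader's list as is.
        # No per-element copy, no notifications, no asserts.
        glyph = cls()
        glyph._anchors = anchors
        return glyph

    @property
    def anchors(self):
        return self._anchors

    def add_listener(self, callback):
        self._listeners.add(callback)

    def append_anchor(self, anchor):
        # Normal path, used by the app after loading.
        self._anchors.append(anchor)
        # With no subscribers this loop body never runs, so the
        # notification cost is close to nothing.
        for callback in self._listeners:
            callback(self)
```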
Essentially for Copy and Paste.
- When copying part of a glyph, currently I create another glyph, use a special pen to pass on the selection to it then serialize that glyph. Maybe that could be simplified? It would be nice to have a serialization pen but I don't know if the pickle library could work that way. Also if serializing all elements of a glyph that are selected can be automated, that's cool.
- I should be able to deserialize into a Glyph without clearing its contents (for Paste, basically); defcon doesn't currently allow it
Will these be going into fontTools?
defcon has extended unicodeData and some bezierMath (join segments, cut contour at position). Will these be going into fontTools?
Also there's the representations system (just a cache, pretty straightforward) and identifiers (unique persistent hash for a given point, I think). These are only useful in apps so I don't think they should be in fontTools.
a.k.a the detect external modifications thing.
Note: the stampGlyphDataState method in defcon is also wrong as it stamps before attempting to load the Glyph and thus is prone to data races.
Cons:
- the os.path.getmtime() function can be expensive when in a hot path.
The NSNotificationCenter-like system has several shortcomings that are well-documented (relevant: Deprecating the Observer pattern):
- string notifications prone to typos (e.g.), no checking
- you have to unregister and not mess it up, which is impractical for GUI environments; otherwise you get errors like: "only one observer allowed for this notification" (with weakrefs, you don't even need to unsubscribe explicitly... if we make subscribers a set() then it's totally irrelevant!)
Related Q: does the order of notifications matter? can we make it so that it doesn't matter? (generally representations clearing [kill-cache] should have priority)
- high overhead of the machinery (packing notifications, weakrefs, is it disabled?, send to all possible targets :: objects get spawned during that process, so the cost is high even with no observers); a slimmer, typed system such as delegates should be more efficient
Solution:
have each object store its listeners and call them directly
- avoids having a big global notification handler that's expensive to work with
- avoids having to deal with many weakrefs and a chain of getDispatcher calls (i.e. where's my font? :: for e.g. glyph guidelines that can take 6 stack frames and weakref unpacking)
NOTE: compared to the current system this won't allow e.g. subscribing to all instances of a given class or notification in bulk but I don't think that's needed (even if it turned out to be, we needn't optimize for that case).
use an Enum type for notifications
- that gives us typechecking
- possible to add to that enum at runtime? otherwise store it as a class attr
allow disarming notifications? have a _isLoading attr? or try to handle that all in ctor?
- Note: if we nil the cost of notifications w. no subscribers, this becomes a non-issue
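A sketch of what the per-object listener approach could look like. The `GlyphEvent` members and the `subscribe`/`unsubscribe` names are assumptions for illustration. Callbacks are held by strong reference in a plain set here; a weakref-based variant would need extra handling for bound methods (e.g. `weakref.WeakMethod`).

```python
import enum


class GlyphEvent(enum.Enum):
    # Hypothetical notification names; a typo in an Enum member raises
    # AttributeError immediately, unlike a mistyped string.
    NAME_CHANGED = enum.auto()
    WIDTH_CHANGED = enum.auto()


class Glyph:
    def __init__(self, name, width=0):
        self._name = name
        self._width = width
        # Created lazily, so glyphs without subscribers carry no extra objects.
        self._listeners = None

    def subscribe(self, event, callback):
        if self._listeners is None:
            self._listeners = {}
        # A set makes duplicate subscriptions harmless.
        self._listeners.setdefault(event, set()).add(callback)

    def unsubscribe(self, event, callback):
        if self._listeners:
            self._listeners.get(event, set()).discard(callback)

    def _notify(self, event):
        if not self._listeners:
            return  # no subscribers: one attribute check, nothing else
        # Iterate over a snapshot so callbacks may unsubscribe themselves.
        for callback in tuple(self._listeners.get(event, ())):
            callback(self, event)

    @property
    def width(self):
        return self._width

    @width.setter
    def width(self, value):
        if value != self._width:
            self._width = value
            self._notify(GlyphEvent.WIDTH_CHANGED)
```

Since subscribers live in a set, the call order is unspecified, so this design only works if the answer to the ordering question above is "it doesn't matter" (or if cache-clearing is handled outside the listener set).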
Re: Single file UFOs
One thing I like about UFOs now is that they lend themselves better to version control. I dislike how e.g. .glyphs files are just big text blobs you'd need tools to dissect. Granted, this might be a more ideological view.