Created
June 9, 2014 21:11
Draft Spec for Community Dependency Management for the Go Language
Specifically written for the Go team. Feel free to fork.

It's unproductive to have preconceived notions, so the following assumes
"any language" and ignores "current language limitations", though Go and node
among others are referenced quite a bit in examples where real world
situations are relevant to the point made. Also, the least ideal circumstances
are taken as the "default working circumstances," since that's what serves the
community best.

Community Dependency Management
-------------------------------

A dependency management system should do the following:

- pull all the dependencies
- error out when the dependencies are not met
- provide dependency safe-states (commonly .lock files)
- isolate dependency persistence to the project directory
- read/parse remote resources (this includes git, svn, etc. repos)

And optionally, and very desirably:

- "help" easily and reliably get dependencies (including private ones)
- provide a dependency grabbing syntax that is easy to blog about or write
  out (eg. cmd install A --save)
- not force the user to use only one source for dependencies (github is not
  the be all and end all, same for any sort of "central system"), though
  having an official source can be helpful in a lot of situations

Actors
======

The following are responsible for "managing dependencies":

- the "dependency system"
- the language (LANGUAGE)
- the dependencies (LIB)
- the developer trying to manage his dependencies (USER)

Dependency resolution is a problem that affects everyone. It is not something
that any one entity can or should be responsible for on its own. The language
is included because a language is nothing but a toy if it can not be used in a
real world production environment or can only be used under very specific
circumstances.

There is a 5th actor involved in the process:

- the developer that short-circuits/sabotages your system for the purpose of
  "getting the job done," be that with good intentions or malicious laziness
  (aka. the BADUSER)

I mention this one separately as it only matters for one thing: how strict or
permissive you make any part of the dependency system.

The rule of thumb is this: if you are NOT sure it's impossible for a BADUSER
to sabotage a part of your system, make that part of the system permissive so
that the BADUSER stays a (good) USER. Control then stays in the hands of the
dependency system rather than becoming "undefined", which doesn't help anyone
(except the BADUSER, who doesn't care).

Bad practices, when done by enough users, become de facto standards. The only
ones who stand to lose are the dependency system, which the community now has
to classify as having inconsistent/undefined behaviour for all such cases
(much as it might defend itself by calling X and Y BADUSERs), and the poor
saps who have to come in afterwards to clean up the mess (the good USERs),
because you, the dependency system, can not help them.

An example: if you strictly do not allow two dependencies to ask for versions
of a package that differ only in the bugfix version, then a malicious user
might just forge the version of one of them in some way, resulting in your
system no longer having any clue what it is doing. If instead you allow it but
show a warning, then everything works and the BADUSER doesn't have to be a
BADUSER. If a good USER later picks up the work and wants to (and knows how
to) fix it, they can, as opposed to receiving a project filled with hacks for
which you, the dependency system, by not offering the option, are partially
responsible.

Real world example 1: consider jslint vs jshint. jshint was more or less
developed as a "version with hacks" of jslint and is now more or less the de
facto standard, with jslint almost never being mentioned as being used.
Consider the implications of that for a packaging tool: your version of the
packaging tool would become unusable if a "hacks version" became the de facto
standard by virtue of the official version being "unusable" in real world
circumstances.

Real world example 2: coffeescript, sass, etc; life becomes complicated when
the community has to fix "your problems".

Problem 1: Project A depends on B, C and D
==========================================

The USER needs to be able to specify, within Project A, dependencies on
projects B, C and D, preferably in a human readable format (if possible one
that supports comments). We'll refer to this as the DEPMAP to avoid confusion
with other mappings.

In addition,

- the method by which the USER specifies his dependencies needs to be capable
  of being persisted with the other project source files in a source version
  control system

    Note: only the project directory (the location of go source files) is
    safe enough to assume as "sacred Go land," the dependency system should
    not assume anything more. It is not even safe to assume that the
    sacred Go land is at the root of the source version control, since
    there are patterns where it is not.

    eg. server + frontend in the same project under the same source
    version control, both in distinct directories that are built into
    deployable versions in some other directory in the project via a
    3rd party build system not controlled by Go

- the USER needs to have precise control over which VERSION of his
  dependencies he is pulling. If he is pulling a "non-version" identifier then
  said identifier should just be exactly what it is (eg. master, dev,
  [commithash], etc)

- the USER needs to be able to distinguish "trusted" from "untrusted" versions,
  which is to say specify that they want version 1.2.3 but do not trust the
  authors of the library to version properly, or know the authors use a
  different versioning system that just looks the same. Suggested syntax:
  #1.2.3, which resolves the same as 1.2.3 except that no version logic is
  executed on it; the version is interpreted verbatim, the same as "master"
  or any other symbol

- the USER needs to be able to specify an array of acceptable versions; this
  is to empower the USER to "manually" achieve single dependency parity by
  saying they accept any of a list of symbols (eg. master, dev, etc) which can
  be used by the versioning system to mitigate conflicts with other
  dependencies

- a LIB needs to be able to specify an array of acceptable versions; this is
  to allow for multi-major-version compatibility. For example, a LIB might
  depend on a utility library X, so in the LIB you would have X -> ^1.0. In
  time the version of X advances to 2.0 due to changes to some utility
  functions, but LIB doesn't actually break from the changes, since the
  functions it depends on haven't changed and its tests all pass with both
  1.0 and 2.0 of X. Unfortunately the LIB author now has to force their users
  to do a lot of undesirable things to help both his X v1.0 users and users
  who need compatibility with v2.0. Ideally the author should be able to say
  he is compatible with both ^1.0 and ^2.0 of X and not change his version or
  do any other unnecessary task.

- the dependency system MUST try as hard as possible to avoid fooling the USER
  about the "stable" nature of the dependencies he is pulling

    Examples of (very common) bad behavior:

    - pulling master branches as default, as if those are stable
    - pulling the last tagged version as if there is never going to be
      backwards incompatibility between tags
    - making ANY sort of assumption about the behavior/conventions/etc the
      dependency is using

    It should be noted that forcing some "almighty standard" and then
    allowing only what conforms to it is very unproductive in a real world
    environment outside of completely closed ecosystems, such as those of
    very large corporations (ie. google, facebook, etc). The USER needs to
    have the ability to pull from anywhere, since denying that right will
    just force them to hack their way around it.

- the dependency system must have the ability to pull all dependencies
  (preferably in clear source format if available) into the project directory
  so that the USER has the choice of saving his dependencies with the project.
  No system on Earth is foolproof and for some users even a small downtime in
  the service is catastrophic.

- the dependency system should write an "exact dependency mapping" file
  (LOCKFILE) after every "dependency refresh." Whenever another USER on
  another machine asks for the dependencies to be resolved (assuming
  dependencies weren't pushed with the project altogether) the dependency
  system MUST use the LOCKFILE to resolve the dependencies, if the state
  of the DEPMAP has not changed relative to the state for which the LOCKFILE
  was created. The USER of course can explicitly specify to ignore the
  LOCKFILE and reprocess all dependencies if she/he wishes.

    LOCKFILEs are meant to be committed. By committing them the USER
    ensures his team members have a consistent copy (unless there's an error
    with a LIB, which is not entirely the dependency system's problem), and
    also ensures there is a history of "stable states" of what are possibly
    very volatile dependencies.

    The dependency system is responsible for storing some basic consistency
    information with the LOCKFILE and warning the USER when they install
    the dependencies but get a slightly different version than what was
    recorded in the LOCKFILE as corresponding to the given symbol.

    eg. LOCKFILE has dependency A as 1.2.4 with a checksum of 1010101; if
    the system gets a checksum of 100000 after pulling version 1.2.4 into
    the project then it warns the USER

- the dependency system should allow the USER to split off the dependencies
  into a separate, distinct source path (eg. vendor/ etc); this is so that a
  USER who pulls 3rd party dependencies and wishes only to receive upstream
  changes, not change them himself/herself, can easily communicate this fact
  to colleagues through the project structure

    In an ideal world the USER would have full control over where each
    and every dependency goes, but this is not explicitly required and many
    dependency systems are very bad at supporting this.

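As a rough illustration of the LOCKFILE consistency check described in the
list above, here is a minimal sketch in Go. The LockEntry type, its fields and
the choice of SHA-256 are assumptions made for the example; the spec itself
does not prescribe a LOCKFILE format or a checksum algorithm.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// LockEntry is a hypothetical LOCKFILE record: the symbol the USER asked
// for, plus a checksum of the content that was actually pulled.
type LockEntry struct {
	Name     string
	Symbol   string // e.g. "1.2.4", "master", or a commit hash
	Checksum string // hex-encoded SHA-256 of the pulled content (assumed)
}

// Checksum returns the hex-encoded SHA-256 of the pulled content.
func Checksum(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// Verify compares freshly pulled content against the recorded entry.
// It returns a warning message, or "" when the content matches.
func Verify(e LockEntry, pulled []byte) string {
	if got := Checksum(pulled); got != e.Checksum {
		return fmt.Sprintf("warning: %s@%s differs from LOCKFILE (recorded %s, got %s)",
			e.Name, e.Symbol, e.Checksum, got)
	}
	return ""
}

func main() {
	entry := LockEntry{Name: "A", Symbol: "1.2.4", Checksum: Checksum([]byte("original release"))}
	fmt.Println(Verify(entry, []byte("original release")) == "") // content unchanged
	fmt.Println(Verify(entry, []byte("retagged release")))       // same symbol, different content
}
```

Note that the sketch only warns and never fails, in keeping with the
"permissive" rule of thumb from the Actors section.
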
Optionally,

- the USER would much appreciate the ability to specify a smart range of
  acceptable versions (eg. 1.*, <=2.2, ^1.2.* ie. 1.2.* to <2.0, etc) as
  well as "tooling dependencies" (eg. golang >=1.3, optipng >=1.*). If tooling
  dependencies are allowed then it may be wise to force golang version
  constraints to always be provided for libraries, so that the standard
  library of the language can be handled just like any other dependency
  (see problem 3)

    As a sidenote, people expect 1.* but ^1.0 (ie. any version from 1.0 up
    to but not including 2.0) is much clearer and does the same thing;
    authors of libraries might have a harder time providing incomplete or
    incorrect information (ie. 1.* instead of ^1.2 for example) if the star
    syntax is just plain not supported (for libraries).

- the USER would much appreciate the ability to specify dependencies that are
  environment specific (ie. dev, staging, etc), so as to avoid the workload
  when deploying to an environment that doesn't need them (real world: when
  using build tools a node project can easily have 30 dev dependencies and 5
  actual "production" dependencies). This is even more useful if we consider
  such headaches as dependency conflicts: fewer dependencies, fewer conflicts.

- when possible the dependency system should try caching dependencies
  globally on the system to avoid network activity (some servers, ironically,
  can behave badly when it comes to retrieving files from exotic
  sources such as github, etc). The USER needs to have the choice of both
  ignoring the cache and purging it at will (as caching has been known
  to cause problems in other dependency systems)

- it would be nice for the USER to be able to "correct" the dependencies of
  his dependencies, whether as a means of applying security hotfixes, applying
  a custom version he/she maintains that his dependencies can't officially
  use, or just fixing his dependency tree manually

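To make the version symbols from this problem more concrete, here is a small
Go sketch of how a constraint might be matched against a candidate. It covers
only three of the forms mentioned above: the verbatim "#1.2.3" syntax, plain
symbols such as "master", and the caret form "^1.2". The function names are
hypothetical, and the caret rule is simply the one stated in this document
(same major version, not lower than the given version); special cases such as
0.x versions are deliberately ignored.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse turns "1.2" or "1.2.3" into [major, minor, patch]; missing parts are 0.
func parse(v string) ([3]int, bool) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) > 3 {
		return out, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

// less reports whether version a sorts before version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// Accepts reports whether the constraint accepts the candidate symbol.
//   "#1.2.3" is "untrusted": it is compared verbatim, no version logic.
//   "^1.2"   accepts 1.2.0 up to, but not including, 2.0.0.
//   "1.2.3"  accepts exactly that version.
//   anything else ("master", a commit hash) is compared verbatim.
func Accepts(constraint, candidate string) bool {
	if strings.HasPrefix(constraint, "#") {
		return strings.TrimPrefix(constraint, "#") == candidate
	}
	caret := strings.HasPrefix(constraint, "^")
	want, okWant := parse(strings.TrimPrefix(constraint, "^"))
	got, okGot := parse(candidate)
	if !okWant || !okGot {
		return constraint == candidate
	}
	if !caret {
		return want == got
	}
	return got[0] == want[0] && !less(got, want)
}

func main() {
	fmt.Println(Accepts("^1.2", "1.9.0"))   // true
	fmt.Println(Accepts("^1.2", "2.0.0"))   // false
	fmt.Println(Accepts("#1.2.3", "1.2.3")) // true
	fmt.Println(Accepts("master", "dev"))   // false
}
```

An array of acceptable versions, as asked for earlier, would then just be a
list of such constraints where any single match is enough.
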
Problem 2: Project A depends on B, B depends on C and D
=======================================================

The dependency system should be able to read into B, check if B has a DEPMAP
and resolve B's dependencies (finer points will be treated in other problems).
The same applies if C itself depends on E, etc.

If A depends on B and B depends on C and C depends on A, the dependency
system should error out. Same for any other case of circular dependency. It is
very important that the system inform the USER what the circular dependency is
and NOT just dump a load of internal variables or states on the USER.

Example of good user presentation:

    Error, circular dependency was detected:
      A -> B
      B -> C
      C -> A
    Please fix and try again. Bye.

In a more complex case the output would show a tree of dependencies and
highlight in color the key points (ie. A -> B, B -> C and C -> A) so that
the USER can visualize the error.

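One possible way to detect the cycle and produce a report in the style shown
above is a depth-first walk that remembers the current path. This is only a
sketch: the map-based graph and the FindCycle and Report names are assumptions
for the example, and a real system would build the graph from the DEPMAPs it
reads.

```go
package main

import (
	"fmt"
	"sort"
)

// FindCycle walks the dependency graph depth-first and returns the first
// cycle found as a path such as [A B C A], or nil if there is none.
func FindCycle(deps map[string][]string) []string {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := map[string]int{}
	var path []string

	var visit func(n string) []string
	visit = func(n string) []string {
		state[n] = inProgress
		path = append(path, n)
		for _, d := range deps[n] {
			switch state[d] {
			case inProgress:
				// d is already on the current path: slice out the loop.
				for i, p := range path {
					if p == d {
						return append(append([]string{}, path[i:]...), d)
					}
				}
			case unvisited:
				if c := visit(d); c != nil {
					return c
				}
			}
		}
		path = path[:len(path)-1]
		state[n] = done
		return nil
	}

	// Sort the starting points so the report is deterministic.
	names := make([]string, 0, len(deps))
	for n := range deps {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		if state[n] == unvisited {
			if c := visit(n); c != nil {
				return c
			}
		}
	}
	return nil
}

// Report formats a cycle as one "X -> Y" line per edge.
func Report(cycle []string) string {
	msg := "Error, circular dependency was detected:\n"
	for i := 0; i+1 < len(cycle); i++ {
		msg += fmt.Sprintf("  %s -> %s\n", cycle[i], cycle[i+1])
	}
	return msg + "Please fix and try again."
}

func main() {
	deps := map[string][]string{"A": {"B"}, "B": {"C"}, "C": {"A"}}
	if c := FindCycle(deps); c != nil {
		fmt.Println(Report(c))
	}
}
```
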
Problem 3: Project A depends on B and C, which both depend on D
===============================================================

There are multiple problems here, depending on the version of D, but most boil
down to the same thing.

It's important to note that it is a good idea, albeit not required if you wish
to just deny all, to have the LANGUAGE understand package versions (as a
hidden part of the package name), both for the sake of interoperability of
different versions of the same dependency within different packages that need
it and because it's useful information when debugging. It's also a "good
idea" for the dependency system to be able to notify the USER of "updated
versions" whenever it can, especially for cases such as the USER having a
dependency on 1.2.3 (either due to LOCKFILE or DEPMAP) and 1.2.4 being
available, since 1.2.4 may be a critical security update (this applies to all
dependencies, even dependencies of dependencies, not just root).

Before continuing it's important to first verify the source of D. If the
sources of D can't be identified as the same or compatible, then the USER
should be notified through a warning and offered help on how they can
configure the DEPMAP to identify the two sources of D as the same source if
they believe them to be identical. D from two sources follows the same case as
D with different symbols.

If D has two different sources then it's treated as two separate entities, so
each version is independent of the other and the symbols in the LANGUAGE are
considered different.

If D is the same version (B -> Dv1.0, C -> Dv1.0) both B and C should just be
linked to the same D (v1.0). The LANGUAGE should see the same D.

If one package can be coerced into the other, ie. one package specifies 1.1 but
the other specifies 1.* or similar, then the 1.* is coerced into 1.1 by the
dependency system. It is the LIB's responsibility to provide accurate
dependencies; if the intention was "any 1.x version so long as it's 1.2 or
higher" then it should have specified ^1.2. It is the dependency system's
responsibility to provide support for the LIB to specify the correct intent.

If D differs only in bugfix version (B -> Dv1.0.1, C -> Dv1.0.2) then B and C
should be linked to the highest bugfix version, unless the USER specifies in
his DEPMAP that he doesn't want that behavior. If the versions cause
incompatibility then it's a LIB problem, not a dependency system problem. A
bugfix release should be considered compatible by default unless otherwise
specified.

If D differs in minor version but not major version (B -> Dv1.1.1, C -> Dv1.2.1)
then the dependency system must first check if D specifies it can only exist
as a single dependency (ie. it MUST be a single D dependency), and if so the
system fails, providing the USER with a map of how D asks to be unique and
which packages are trying to use incompatible versions. Otherwise, if D accepts
being multiple entities, the dependency system must ask the LANGUAGE to perform
a DEPENDENCY LOGIC MAP (DLM) on the two D versions (explained later). If the
DLM fails the dependency system fails; if the DLM doesn't fail then the
LANGUAGE needs to treat the two as different symbols. The USER may specify that
the dependency system should forcefully coerce "feature versions" into a single
entity, in which case the LANGUAGE just sees v1.2.1 of D and everything is the
same symbol.

If D differs in major version or is just different symbols then a DEPENDENCY
LOGIC MAP (DLM) is requested from the LANGUAGE. If it fails the dependency
system fails; if it doesn't fail then the LANGUAGE treats both as different
symbols.

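The version cases above can be summarized as a small decision function. The
Go sketch below only classifies two already-parsed versions of D; the Version
and Action types are hypothetical, and it leaves out source verification,
range coercion, USER overrides and the DLM itself.

```go
package main

import "fmt"

// Version is a parsed major.minor.bugfix triple.
type Version struct{ Major, Minor, Bugfix int }

// Action is what the dependency system should do next for a shared D.
type Action int

const (
	ShareSame          Action = iota // identical version: link B and C to one D
	UseHighestBugfix                 // only bugfix differs: link both to the higher one
	CheckUniqueThenDLM               // minor differs: honour D's single-instance demand, else DLM
	RequireDLM                       // major differs: ask the LANGUAGE for a DLM
)

// String gives a readable name for each action.
func (a Action) String() string {
	return [...]string{
		"share the same D",
		"use highest bugfix version",
		"check single-instance requirement, then DLM",
		"require DLM",
	}[a]
}

// Decide classifies the difference between the versions of D that B and C ask for.
func Decide(b, c Version) Action {
	switch {
	case b == c:
		return ShareSame
	case b.Major != c.Major:
		return RequireDLM
	case b.Minor != c.Minor:
		return CheckUniqueThenDLM
	default:
		return UseHighestBugfix
	}
}

func main() {
	fmt.Println(Decide(Version{1, 0, 0}, Version{1, 0, 0}))
	fmt.Println(Decide(Version{1, 0, 1}, Version{1, 0, 2}))
	fmt.Println(Decide(Version{1, 1, 1}, Version{1, 2, 1}))
	fmt.Println(Decide(Version{1, 0, 0}, Version{2, 0, 0}))
}
```
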
### Algorithm for forming a DEPENDENCY LOGIC MAP (DLM) for package X

We'll consider we have two versions of X, version X1 and version X2.

Start with all .go files.

Naive loop:

1. if a file imports "the package X#" then it is part of the map
2. the package version it imports is considered the SEMANTIC INPUT
3. anything that exports the package out is considered OUTPUT
4. repeat from (1) with every package that has OUTPUT taking the place of
   "the package X#", until it's pointless to continue

After doing the above two times (once for X1 and once for X2), you now have
every package marked as receiving X1 or X2, and as either outputting one of
them or not outputting either of them.

So now you just check if there is a package that accepts both as input. If you
find one then the algorithm FAILs and you print a map to the USER of how the
two would get used by a single package simultaneously. If you don't, the
algorithm has passed, since X1 and X2 won't ever exist in the same scope;
whether they can exist in the same program simultaneously is then only a
LANGUAGE problem.

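The naive loop can be sketched in Go as a fixed-point computation over a
simplified package graph. This is one reading of the steps above, not a
definitive implementation: the Package type, the ExportsX flag (standing in
for "exports the package out") and the "X@1"/"X@2" naming are all assumptions
made for the example, and it works on packages rather than individual .go
files.

```go
package main

import (
	"fmt"
	"sort"
)

// Package is a simplified view of one package in the program.
type Package struct {
	Imports  []string // imported package names; versions of X appear as "X@1", "X@2"
	ExportsX bool     // assumed flag: the package exposes X in its own API (OUTPUT)
}

// Receivers returns every package that takes the given version of X as
// input, either by importing it directly or through a package that
// outputs it.
func Receivers(pkgs map[string]Package, version string) map[string]bool {
	sources := map[string]bool{version: true}
	received := map[string]bool{}
	for changed := true; changed; {
		changed = false
		for name, p := range pkgs {
			if received[name] {
				continue
			}
			for _, imp := range p.Imports {
				if sources[imp] {
					received[name] = true
					if p.ExportsX {
						sources[name] = true // step 4: repeat with this package
					}
					changed = true
					break
				}
			}
		}
	}
	return received
}

// DLM returns the packages that would receive both versions. An empty
// result means the map passes; anything else is a failure to report.
func DLM(pkgs map[string]Package, v1, v2 string) []string {
	r1, r2 := Receivers(pkgs, v1), Receivers(pkgs, v2)
	var both []string
	for name := range r1 {
		if r2[name] {
			both = append(both, name)
		}
	}
	sort.Strings(both)
	return both
}

func main() {
	pkgs := map[string]Package{
		"A": {Imports: []string{"B", "C"}},
		"B": {Imports: []string{"X@1"}, ExportsX: true},
		"C": {Imports: []string{"X@2"}, ExportsX: true},
	}
	fmt.Println(DLM(pkgs, "X@1", "X@2")) // [A]
}
```

In the example both B and C pass X on to A, so A would see both versions and
the map fails. If C kept X@2 internal (ExportsX false) the result would be
empty and the two versions could coexist as different symbols.
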
It should be noted that in the real world this theoretical problem is very
rare, as most "shared dependencies" tend to be in the form of "utility
libraries" and libraries will typically export a "universal" resource. If the
PHP ecosystem is any indication, when it does happen it's not such a "world
ending problem" that the USER can't simply work around it themselves to an
extent, much like if you had two separate packages with identical package
names. A lot of libraries (of any language) also consider it "sexy" to be able
to claim "we don't depend on anything," either in the spirit of avoiding the
problem or just because it allows them to be consistent (though this depends
on the ecosystem and popularity).

Problem 4: Project A depends on B, C and D, which all depend on E
=================================================================

Same solution as 3. The dependency system needs to be able to apply the logic
of all problems so far both to any number of dependencies and at any depth
in the dependency tree.

Algorithm wise, so long as the problem can be solved for 2 it can be solved
for any number greater than 2. The process is as follows if you have 3:

- solve E for B and C
- solve E for B and D (taking into account the solution of B and C)
- solve E for C and D (taking into account both previous steps)

Or if we take a hard example:

    Let B be incompatible with C
    Let C be compatible with D

    - solve E for B and C: Eb, Ec
    - solve E for B and D: Ed
    - solve E for C and D: Ec, Ed become Ecd

    Result: Eb, Ecd

This is just a naive algorithm to prove it is possible; better algorithms may
be applicable in practice.

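The pairwise process can be sketched as a greedy grouping: each dependent of E
joins the first existing group whose members it is compatible with, otherwise
it starts a new group, and every group ends up sharing one copy of E. The
Group function and the compatible callback are hypothetical names; how two
dependents are judged compatible is whatever Problem 3 decides for that pair.

```go
package main

import "fmt"

// Group partitions the dependents of E so that every member of a group is
// pairwise compatible with the others; each group shares one copy of E.
func Group(names []string, compatible func(a, b string) bool) [][]string {
	var groups [][]string
	for _, n := range names {
		placed := false
		for i, g := range groups {
			ok := true
			for _, m := range g {
				if !compatible(n, m) {
					ok = false
					break
				}
			}
			if ok {
				groups[i] = append(g, n)
				placed = true
				break
			}
		}
		if !placed {
			groups = append(groups, []string{n})
		}
	}
	return groups
}

func main() {
	// The hard example above: only C and D agree on a version of E.
	pairs := map[[2]string]bool{{"C", "D"}: true}
	compatible := func(a, b string) bool {
		return pairs[[2]string{a, b}] || pairs[[2]string{b, a}]
	}
	fmt.Println(Group([]string{"B", "C", "D"}, compatible)) // [[B] [C D]]
}
```

The output corresponds to the result Eb, Ecd. Like the process it mirrors,
this is a naive sketch: the grouping it finds depends on the order in which
dependents are visited and is not guaranteed to use the fewest copies of E.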