GEDCOM is a well-structured format. Defining a "standard" way to represent a GEDCOM node in JavaScript memory is useful because it allows tools to be built in JavaScript that work with these representations. For example, one tool could be an editor that edits such a structure in a browser, another might use the information to present a pleasing web page, and a third might traverse links between documents to produce a pedigree report.
Both formats are hierarchical:
- GEDCOM is a hierarchical file format, although the Wikipedia page doesn't mention the word "hierarchy" at all, only "indentation level". Indentation level 1 items are usually followed by one or more indentation level 2 items, which are
- JSON is in fact a hierarchical format, in that nodes contain other nodes.
Both formats are ordered:
- In GEDCOM, two DATE records might have conflicting information; the first one is the "preferred" one.
- JSON arrays are defined explicitly to be ordered, and can contain duplicates.
Both formats (should) support links:
- In GEDCOM, a record can be a link (with @X123@ syntax) pointing to another record (in the same database)
- In JSON, a link could be defined to be a simple URI reference (e.g. ./X123)
To distill a GEDCOM node into its constituent parts:
- "indentation level"
- node type
- textual value (or linked @X1234@ record)
- non-unique, ordered sub-nodes (indentation level +1)
For example:
1 NAME John /Doe/
2 SOUR @S123@
2 NOTE Pretty sure that's his name
Could be transcribed as:
- Indentation level 1
- Node type NAME
- Textual value "John /Doe/"
- Sub-nodes: two (SOUR and NOTE)
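The decomposition above can be sketched as a small line parser. This is only a rough illustration: the function name parseGedcomLine and the property names level, id, type, value and link are my own assumptions, and the regular expression covers just the simple line shapes used in the examples here, not the full GEDCOM grammar.

```javascript
// Hypothetical helper (not from any GEDCOM library): splits one GEDCOM
// line into level, optional record id, node type, and value or link.
function parseGedcomLine(line) {
  // level, optional @ID@, tag, optional rest of line
  const match = /^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?: (.*))?$/.exec(line);
  if (!match) {
    throw new Error("Not a GEDCOM line: " + line);
  }
  const [, level, id, type, rest] = match;
  const parsed = { level: Number(level), type: type };
  if (id !== undefined) {
    parsed.id = id;
  }
  if (rest !== undefined && rest !== "") {
    // Assumption: a value that is exactly @X123@ is a pointer to a record.
    if (/^@[^@]+@$/.test(rest)) {
      parsed.link = rest;
    } else {
      parsed.value = rest;
    }
  }
  return parsed;
}

console.log(parseGedcomLine("1 NAME John /Doe/"));
console.log(parseGedcomLine("2 SOUR @S123@"));
```

The sub-nodes are not part of a single line; they fall out of comparing the level of each line with the level of the lines that follow it.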
Translated to JSON:
- Indentation level comes from the JSON containment hierarchy
- Node type → attribute "type", a string, like NAME or SOUR
- Textual value → attribute "value", like John /Doe/
- Links → attribute "link", like @S234@, or perhaps ./S223.json or http://other-place.org/other.file.json
- Child nodes → attribute "nodes": an array of nodes like this one.
The GEDCOM above could then be rewritten as the following JSON:
{ "type": "NAME",
  "value" : "John /Doe/",
  "nodes" : [
    { "type": "SOUR",
      "link": "./S123" },
    { "type": "NOTE",
      "value": "Pretty sure that's his name" }
  ]
}
Or, as more readable but equivalent YAML:
type: NAME
value: John /Doe/
nodes:
  - type: SOUR
    link: ./S123
  - type: NOTE
    value: "Pretty sure that's his name"
Note, the @ signs around the source record might not be the right thing...
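A minimal sketch of that translation, assuming the input lines carry no leading record id (like the level 1 and deeper lines above). The function name gedcomToNodes is made up for illustration, and since the right link form is still undecided, the sketch keeps the GEDCOM pointer (@S123@) as-is rather than rewriting it to ./S123.

```javascript
// Hypothetical converter: turns GEDCOM lines into nested nodes with the
// attributes "type", "value", "link" and "nodes" described above.
function gedcomToNodes(text) {
  const roots = [];
  const stack = []; // stack[d] is the most recent node seen at depth d
  let baseLevel = null;
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    const m = /^\s*(\d+)\s+(\S+)(?: (.*))?$/.exec(line);
    if (!m) throw new Error("Not a GEDCOM line: " + line);
    const level = Number(m[1]);
    if (baseLevel === null) baseLevel = level; // first line sets the top level
    const depth = level - baseLevel;
    if (depth < 0) throw new Error("Level below the first line: " + line);
    const node = { type: m[2] };
    const rest = m[3];
    if (rest !== undefined && rest !== "") {
      // Assumption: a value that is exactly @X123@ is a link, kept verbatim.
      if (/^@[^@]+@$/.test(rest)) node.link = rest;
      else node.value = rest;
    }
    if (depth === 0) {
      roots.push(node);
    } else {
      const parent = stack[depth - 1];
      if (!parent) throw new Error("Level skipped before: " + line);
      if (!parent.nodes) parent.nodes = [];
      parent.nodes.push(node);
    }
    stack.length = depth; // forget deeper nodes from earlier branches
    stack.push(node);
  }
  return roots;
}

const sampleGedcom = [
  "1 NAME John /Doe/",
  "2 SOUR @S123@",
  "2 NOTE Pretty sure that's his name",
].join("\n");
console.log(JSON.stringify(gedcomToNodes(sampleGedcom), null, 2));
```

The containment hierarchy replaces the indentation level entirely: the only place the level survives is in how deeply a node is nested.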
The full Wikipedia GEDCOM example:
0 @I1@ INDI
1 NAME John /Doe/
1 BIRT
2 DATE 10 JAN 1800
2 SOUR @S1@
3 DATA
4 TEXT Transcription from birth certificate would go here
3 NOTE This birth record is preferred because it comes from the birth certificate
3 QUAY 2
1 BIRT
2 DATE 11 JAN 1800
2 SOUR @S2@
3 DATA
4 TEXT Transcription from death certificate would go here
3 QUAY 2
would become this JSON structure (the root node has been removed, since this represents a single individual). Note that the top-level value is an array, corresponding to all the level 1 GEDCOM records.
[
  {
    "type" : "NAME",
    "value" : "John /Doe/"
  },
  {
    "type" : "BIRT",
    "nodes" : [
      {
        "type" : "DATE",
        "value" : "10 JAN 1800"
      },
      {
        "link" : "@S1@",
        "type" : "SOUR",
        "nodes" : [
          {
            "type" : "DATA",
            "nodes" : [
              {
                "type" : "TEXT",
                "value" : "Transcription from birth certificate would go here"
              }
            ]
          },
          {
            "type" : "NOTE",
            "value" : "This birth record is preferred because it comes from the birth certificate"
          },
          {
            "type" : "QUAY",
            "value" : "2"
          }
        ]
      }
    ]
  },
  {
    "type" : "BIRT",
    "nodes" : [
      {
        "type" : "DATE",
        "value" : "11 JAN 1800"
      },
      {
        "link" : "@S2@",
        "type" : "SOUR",
        "nodes" : [
          {
            "type" : "DATA",
            "nodes" : [
              {
                "type" : "TEXT",
                "value" : "Transcription from death certificate would go here"
              }
            ]
          },
          {
            "type" : "QUAY",
            "value" : "2"
          }
        ]
      }
    ]
  }
]
which again is the same as this (more legible) YAML file:
- type: NAME
  value: "John /Doe/"
- type: BIRT
  nodes:
    - type: DATE
      value: 10 JAN 1800
    - type: SOUR
      link: '@S1@'
      nodes:
        - type: DATA
          nodes:
            - type: TEXT
              value: Transcription from birth certificate would go here
        - type: NOTE
          value: This birth record is preferred because it comes from the birth certificate
        - type: QUAY
          value: "2"
- type: BIRT
  nodes:
    - type: DATE
      value: 11 JAN 1800
    - type: SOUR
      link: '@S2@'
      nodes:
        - type: DATA
          nodes:
            - type: TEXT
              value: Transcription from death certificate would go here
        - type: QUAY
          value: "2"
The mapping above can handle any GEDCOM thrown at it, and it should be fairly easy to write round-tripping software that converts between the two formats without any loss of data.
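As a rough sketch of the return trip, the nested nodes can be written back out as GEDCOM lines by walking the hierarchy and counting the depth. The function name nodesToGedcom and the default starting level of 1 are assumptions of mine, and the sketch assumes links are still stored in GEDCOM pointer form (@S123@); a URI-style link would need to be mapped back to a pointer first.

```javascript
// Hypothetical serializer: walks nodes with "type", "value", "link" and
// "nodes" attributes and returns the corresponding GEDCOM lines.
function nodesToGedcom(nodes, level = 1) {
  const lines = [];
  for (const node of nodes) {
    const parts = [String(level), node.type];
    if (node.link !== undefined) {
      parts.push(node.link); // assumed to already look like @S123@
    } else if (node.value !== undefined) {
      parts.push(node.value);
    }
    lines.push(parts.join(" "));
    if (node.nodes) {
      // Children sit one indentation level deeper than their parent.
      lines.push(...nodesToGedcom(node.nodes, level + 1));
    }
  }
  return lines;
}

const nameNode = {
  type: "NAME",
  value: "John /Doe/",
  nodes: [
    { type: "SOUR", link: "@S123@" },
    { type: "NOTE", value: "Pretty sure that's his name" },
  ],
};
console.log(nodesToGedcom([nameNode]).join("\n"));
```

Because arrays keep their order and the level is derived purely from nesting, the lines come back out in the same order they went in.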
The root node (0 @I1@ INDI) could also have a definition, perhaps borrowing the same attribute names.
{ "type": "INDI",
"nodes": [ ... (nodes) ... ]
}
My current plans don't need to keep the 0 @S123@ INDI line in the actual object; it would be replaced by the URI at which the JSON resource resides, as my plan involves hyperlinking between smaller files in GEDCOM JSON format.
With a single (or a list of) GEDCOM JSON files, it would be possible to assemble a large GEDCOM file for easy import into any genealogy software for further processing (phpgedview, lifelines, gramps or whatever) for nice report generation and so on. The primary editing would happen on the basis of these hyperlinked JSON files. The conversion from GEDCOM to JSON and back should be possible to do losslessly.
Interestingly I might be able to link to someone else's tree (and them back to me) so that I could build e.g. large GEDCOM files based on many users' (hyperlinked) GEDCOM JSON files, which then can be used to generate reports.
Other possibilities include porting e.g. some of the lifelines reports directly to a system which merely traverses these JSON structures and dereferences links; this could make it possible to do advanced client-based reports which allow users to experience a distributed genealogical tree as one.
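To give an idea of what such a traversal might look like, here is a small sketch that walks a node array and resolves every "link" against the URI of the document it came from, using the standard URL class available in browsers and Node.js. The function name collectLinks, the example.org base URI and the sample nodes are all made up for illustration; actually fetching the linked documents is left out.

```javascript
// Hypothetical traversal: returns the absolute URIs of all links found in
// a node array, resolved relative to the URI of the containing document.
function collectLinks(nodes, baseUri) {
  const found = [];
  for (const node of nodes) {
    if (node.link !== undefined) {
      // new URL(relative, base) performs standard relative URI resolution.
      found.push(new URL(node.link, baseUri).href);
    }
    if (node.nodes) {
      found.push(...collectLinks(node.nodes, baseUri));
    }
  }
  return found;
}

const linkedNodes = [
  { type: "NAME", value: "John /Doe/", nodes: [{ type: "SOUR", link: "./S123" }] },
  { type: "OBJE", link: "http://other-place.org/other.file.json" },
];
console.log(collectLinks(linkedNodes, "http://example.org/tree/I1"));
```

A report generator would then fetch each of these URIs, parse the JSON it gets back, and repeat the walk on the result.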
I could keep separate "databases" just by creating different directories to hold the different lineages, and enforce different access control levels if needed. For example, my brother would be interested in editing our parents and their ancestors, but not my wife's ancestors; and my father-in-law isn't so interested in my parents' ancestors. Links between them would be just as simple as "link":"fred/I212" (and back again).
Any image on the web would be fair game as a media resource, be it a YouTube video, a Flickr image, or whatever. Being a primary citizen of the Web means a lot.
The author of http://synapse.cs.byu.edu/~randy/misc/WishList.html (a smart guy who now works for FamilySearch) outlined his wishes back in 1997/1998; a lot of them just happen magically with this way of solving the Genealogy problem.
And finally, it's not centralized, unlike geni.com or the like, where you have to buy into their way of doing things.
- You own your own data
- You control how it's published
- You link to the sources you want to
- You include the data you want to from others
- You can even indicate that this URL is an ALIA (alias) of a person, sort of like owl:sameAs