To have a more maintainable library with type definitions
I have always hated needing to transform organisms back and forth between the BioLogica.Organism class objects and plain-object formats. Frequently in Geni* we need to turn objects into BioLogica.Organisms before we can use them, or, worse, check whether we've already been passed an Organism or a plain object.
For this reason it would be nice to define an Organism so that it was simply a typed object, which could be serialized, read, and used without any further transformations.
As a result, it seems that the library should consist only of pure functions which
take Organisms (or other typed objects) as inputs (e.g. createGametes(Organism) instead
of Organism.createGametes()). A worry is that this may result in some awkward syntax.
{
alleles: "a-a b-b ...",
sex: "female",
// optional
phenotype: {
trait: characteristic
},
// optional
secondXAlleles: "c d ..."
}
- alleles: string
  Fully-defined allele string. Two organisms with the same allele string will always have the same phenotype. Or should this be called genotype?
- sex: string
- phenotype: { [s: string]: string; }
  It's useful to be able to read the phenotype of an organism easily, e.g. createOrganism(...).phenotype.color instead of org = createOrganism(...); getCharacteristic(org, 'wings'), so it seems to make sense for this to be a part of any organism generated by the library. However, it is redundant information, so it probably doesn't need to be passed in when using an organism.
- secondXAlleles: string
  This formalizes the pattern from Geni* that allows us to switch sexes without changing the phenotype. Maybe this is unneeded? We could presumably also have a changeSex function that explicitly picks second-X alleles that don't change the phenotype. Do we need this for anything else?
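As a rough sketch of what "simply a typed object" could look like, the properties above map onto a TypeScript interface. The Sex union and the isFemale helper are assumptions made for illustration (the list above only says sex: string), and nothing here is an existing BioLogica API.

```typescript
// Hypothetical shape of a 2.0 organism: a plain, serializable object.
// Narrowing sex to a union is an assumption; the proposal says `string`.
type Sex = "female" | "male";

interface Organism {
  alleles: string;                          // fully-defined allele string
  sex: Sex;
  phenotype?: { [trait: string]: string };  // optional, derivable from alleles
  secondXAlleles?: string;                  // optional
}

// A pure function takes the plain object directly; no class wrapping needed.
// (isFemale is an illustrative helper, not part of any existing API.)
function isFemale(org: Organism): boolean {
  return org.sex === "female";
}

const original: Organism = { alleles: "a-a b-b", sex: "female" };

// The object survives a JSON round trip without any conversion step.
const restored: Organism = JSON.parse(JSON.stringify(original));
```

Because the organism is only data, the same value can come from a server, local storage, or the library itself, and functions never need to check which form they were handed.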
organism({
/** optional. One would pass in fully-specified alleles when re-generating
phenotypes */
alleles: "a-a b-b ...",
/** optional. Explicitly specifying the authored syntax */
authoredAlleles: "a-a b- -c d-[D1, D2] e-^[E1] XY",
/** optional */
sex: "male"
}): Organism
breed(org1: Organism, org2: Organism, crossover=true): Organism
breed(org1: Organism, org2: Organism, quantity: number, crossover=true): Organism[]
fertilize(gamete1: Gamete, gamete2: Gamete): Organism
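The two breed signatures differ only in whether the third positional argument is a number or a boolean, which TypeScript overloads can express. The sketch below shows one possible dispatch. breedOne is a loudly labeled placeholder (it ignores crossover and does no real genetics), and the mother/father parameter names are assumptions.

```typescript
interface Organism { alleles: string; sex: "female" | "male" }

// PLACEHOLDER, not real genetics: takes the left allele of each pair from the
// first parent and the right allele from the second, and ignores crossover.
function breedOne(mother: Organism, father: Organism, _crossover: boolean): Organism {
  const left = mother.alleles.split(/\s+/).map(pair => pair.split("-")[0]);
  const right = father.alleles.split(/\s+/).map(pair => pair.split("-")[1]);
  return { alleles: left.map((a, i) => `${a}-${right[i]}`).join(" "), sex: "female" };
}

// Overload signatures mirroring the two forms listed above.
function breed(mother: Organism, father: Organism, crossover?: boolean): Organism;
function breed(mother: Organism, father: Organism, quantity: number, crossover?: boolean): Organism[];
function breed(
  mother: Organism,
  father: Organism,
  quantityOrCrossover?: number | boolean,
  crossover?: boolean
): Organism | Organism[] {
  if (typeof quantityOrCrossover === "number") {
    // Quantity form: return an array of that many offspring.
    return Array.from({ length: quantityOrCrossover }, () =>
      breedOne(mother, father, crossover ?? true));
  }
  // Single-offspring form: the third argument, if present, is the crossover flag.
  return breedOne(mother, father, quantityOrCrossover ?? true);
}

const mom: Organism = { alleles: "a1-a2 b1-b1", sex: "female" };
const dad: Organism = { alleles: "a2-a2 b2-b2", sex: "male" };
const single = breed(mom, dad);
const clutch = breed(mom, dad, 3, false);
```

An options object (for example a hypothetical { quantity, crossover }) would avoid the positional ambiguity, at the cost of a slightly longer call.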
In biologica 1.0, we need to pass in a BioLogica.Species object in any Organism creation
method. This seems ugly, and would be even more so now because we'd no longer create
Organism classes that have their own references to the species, so we'd have to either
add the entire species spec to every organism, or pass in the species spec in every
single method (e.g. breed(Organism, Organism, Species)).
One solution would be to have an initialization method that sets up the species from the start:
import biologica from 'biologica'
const Drakes = biologica(drakeSpecies)
const org1 = Drakes.organism(...)
const child = Drakes.breed(org1, org2)
// instead of
import { organism, breed } from 'biologica'
const org1 = organism(..., drakeSpecies)
const child = breed(org1, org2, drakeSpecies)
The top example seems a little uglier than the alternative below, but saves us from
including drakeSpecies everywhere. Thoughts?
Edit: Or maybe we can keep the top solution but make it cleaner by using
import biologica from 'biologica'
const { organism, breed } = biologica(drakeSpecies)
const org1 = organism(...)
const child = breed(org1, org2)
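One way the initialization approach could work is a factory that closes over the species and returns species-bound versions of otherwise species-explicit functions. This is a sketch under assumptions: the Species shape (name, alleleLabels), organismFor, and breedFor are all hypothetical, and breedFor is a placeholder rather than real breeding logic.

```typescript
// Minimal stand-in types; a real species spec would carry much more.
interface Species { name: string; alleleLabels: string[] }
interface Organism { alleles: string; sex: "female" | "male" }

// Species-explicit function: checks that every allele is known to the species.
function organismFor(species: Species, spec: Organism): Organism {
  for (const allele of spec.alleles.split(/[\s-]+/).filter(a => a.length > 0)) {
    if (!species.alleleLabels.includes(allele)) {
      throw new Error(`Unknown allele "${allele}" for ${species.name}`);
    }
  }
  return { alleles: spec.alleles, sex: spec.sex };
}

// PLACEHOLDER, not real genetics: left allele from the first parent,
// right allele from the second, always female.
function breedFor(species: Species, mother: Organism, father: Organism): Organism {
  const left = mother.alleles.split(/\s+/).map(pair => pair.split("-")[0]);
  const right = father.alleles.split(/\s+/).map(pair => pair.split("-")[1]);
  const alleles = left.map((a, i) => `${a}-${right[i]}`).join(" ");
  return organismFor(species, { alleles, sex: "female" });
}

// The factory binds the species once, so callers never pass it again.
function biologica(species: Species) {
  return {
    organism: (spec: Organism) => organismFor(species, spec),
    breed: (mother: Organism, father: Organism) => breedFor(species, mother, father),
  };
}

const drakeSpecies: Species = { name: "Drake", alleleLabels: ["a1", "a2", "b1", "b2"] };
const { organism, breed } = biologica(drakeSpecies);
const org1 = organism({ alleles: "a1-a2 b1-b1", sex: "female" });
const org2 = organism({ alleles: "a2-a2 b2-b2", sex: "male" });
const child = breed(org1, org2);
```

The organisms themselves stay plain objects with no species reference; only the returned functions remember which species they were created for.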
This is probably not so important, but classes with methods can often be read in a more literate manner: If we wanted to, say, get the image src for an authored organism in 1.0, we could write
new Organism(authoredAlleles).getImageSrc()
In 2.0 it seems that we would either need to split it into two lines or use the backwards
getImageSrc(organism(authoredAlleles)) (or we could use the futuristic syntax
organism(authoredAlleles)::getImageSrc(), but let's not).
a1-a2 b1- -c2 d1 e1-[e2, e3] f1-^[f3] XY
This makes the new dashed syntax the prime format (though we probably need to support authoring
with the old a:a1,b:a1 syntax) and adds a couple of features Geni* has been needing:
- a1-a2 b1- -c2
  Any allele listed on one side or the other of a dash indicates a requirement for that allele on the left or right chromosome. (Note, we need to make explicit which side is from which parent. This is implicit, not explicit, in 1.0.)
- d1
  A required allele that could be on either side.
- [e2, e3]
  Either e2 or e3
- ^[f3]
  Not f3
- XY
  It seems like it would be nice to be able to specify the sex of an organism directly in the allele string. The sex chromosomes are part of the genes, after all, and this would allow for authoring organisms with a single string. Unsure whether it should also be part of the fully-defined allele string, because it is redundant with sex, and it seems like the sex property is a useful one to have in an Organism. That said, sex could also be treated like phenotype: part of the organism object when created by biologica, but not necessary for any functions.
Thoughts?
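To check that the authored syntax is unambiguous, here is a minimal parser sketch for the example string above. The output types (Constraint, GeneSpec, AuthoredSpec) and the function names are invented for illustration, and it assumes that allele names contain no dashes or whitespace and that XX means female and XY means male.

```typescript
// Hypothetical output types for the parsed authoring syntax.
type Constraint = { anyOf: string[] } | { noneOf: string[] };
interface GeneSpec { left?: Constraint; right?: Constraint; either?: Constraint }
interface AuthoredSpec { genes: GeneSpec[]; sex?: "female" | "male" }

// Parses one side of a gene: "", "a1", "[e2, e3]" or "^[f3]".
function parseSide(text: string): Constraint | undefined {
  if (text === "") return undefined;              // unconstrained side
  const negated = text.startsWith("^");
  const body = negated ? text.slice(1) : text;
  const names = body.startsWith("[")
    ? body.slice(1, -1).split(",").map(s => s.trim()).filter(s => s.length > 0)
    : [body];
  return negated ? { noneOf: names } : { anyOf: names };
}

function parseAuthored(text: string): AuthoredSpec {
  // Split on whitespace, but keep bracketed lists such as "[e2, e3]" together.
  const tokens = text.match(/(?:\[[^\]]*\]|\S)+/g) ?? [];
  const spec: AuthoredSpec = { genes: [] };
  for (const token of tokens) {
    if (token === "XX" || token === "XY") {
      spec.sex = token === "XX" ? "female" : "male";  // assumption: XX = female
      continue;
    }
    const dash = token.indexOf("-");
    if (dash === -1) {
      spec.genes.push({ either: parseSide(token) });  // e.g. "d1"
    } else {
      spec.genes.push({
        left: parseSide(token.slice(0, dash)),
        right: parseSide(token.slice(dash + 1)),
      });
    }
  }
  return spec;
}

const parsed = parseAuthored("a1-a2 b1- -c2 d1 e1-[e2, e3] f1-^[f3] XY");
```

A real implementation would still need to turn these constraints into a fully-defined allele string by picking random alleles wherever a side is unconstrained.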
Gametes are generated through meiosis, and it's useful to know where the crossovers occurred and the provenance of the alleles, so that we can accurately depict a meiosis animation if needed.
In 1.0 we pass back an object with gametes and metadata as separate properties. I think we can do something similar, with the addition that the MeiosisData
object should have everything it needs to display meiosis, so this includes the original genotype:
{
genotype: `a1-a2 b1-b2 ...`,
crosses: {
[chromosome1Name-a]: [500, 1252, ...],
[chromosome1Name-b]: [...],
[chromosome2Name-a]: [...],
[chromosome2Name-b]: [...]
...
},
haploidCells: [
[chromosome1Name-a-1, chromosome2Name-b-1, ...]
...
],
gametes: [
`a1 b2 ... X`,
`a2 b1 ... Y`,
...
]
}
- genotype
  Original genotype of the organism, so we don't need to maintain a reference to it in order to display meiosis.
- crosses
  The locations along the length of the chromatid where crosses occurred. Note a chromosome named chromosome1Name will eventually split into four chromatids. These are made up of two pairs with complementary crosses, so we can call them chromosome1Name-a-1, chromosome1Name-a-2, chromosome1Name-b-1, chromosome1Name-b-2, where chromosome1Name-a-1 and chromosome1Name-a-2 have the exact same crosses but alleles from opposite parents.
- haploidCells
  These are four cells which become the gamete cells, but here we specify which chromatid ends up in each cell, rather than their actual alleles. We need to specify chromatids because the actual alleles may be identical, but we still want to know which one came from which parent.
- gametes
  The genotype of each gamete. This is redundant and can be deduced from the above, but it seems useful to put this here rather than requiring another step to transform the above into alleles.
Not sure if there is a clearer way to represent this, and if there are any thoughts on the "haploidCells/gametes" redundancy.
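For discussion, the MeiosisData object could be typed roughly as follows. The interface follows the property list above. The chromatidsAreKnown helper and all sample values (chromosome name, cross positions, alleles) are made up for illustration. The check relies only on the naming convention described above, where a chromatid name is a crosses key plus a -1 or -2 suffix.

```typescript
interface MeiosisData {
  genotype: string;                                // original genotype
  crosses: { [chromatidPair: string]: number[] };  // e.g. "chr1-a" -> positions
  haploidCells: string[][];                        // chromatid names per cell
  gametes: string[];                               // resulting allele strings
}

// Illustrative consistency check: every chromatid listed in a haploid cell
// should belong to a chromatid pair that has an entry in `crosses`.
function chromatidsAreKnown(data: MeiosisData): boolean {
  return data.haploidCells.every(cell =>
    cell.every(chromatid => chromatid.replace(/-[12]$/, "") in data.crosses));
}

// Made-up sample with a single chromosome named "chr1".
const sample: MeiosisData = {
  genotype: "a1-a2",
  crosses: { "chr1-a": [500, 1252], "chr1-b": [300] },
  haploidCells: [["chr1-a-1"], ["chr1-a-2"], ["chr1-b-1"], ["chr1-b-2"]],
  gametes: ["a1 X", "a2 X", "a1 X", "a2 X"],
};

const sampleIsConsistent = chromatidsAreKnown(sample);
```

Keeping gametes alongside haploidCells means a check like this is the only place the two representations need to be reconciled. Dropping gametes would instead require a function that walks the crosses to rebuild the alleles.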
A single gamete in this case is just a string, so we'll have
fertilize(gamete1: string, gamete2: string): Organism
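With gametes as plain strings, fertilization becomes simple string work. The sketch below makes several assumptions that the proposal does not settle: both gametes list their alleles in the same gene order, the last token is the sex chromosome (X or Y), gamete1 supplies the left side of each pair, and an organism with a Y is male.

```typescript
interface Organism { alleles: string; sex: "female" | "male" }

function fertilize(gamete1: string, gamete2: string): Organism {
  const left = gamete1.trim().split(/\s+/);
  const right = gamete2.trim().split(/\s+/);
  if (left.length !== right.length) {
    throw new Error("Gametes must describe the same number of genes");
  }
  // Assumption: the final token of each gamete is its sex chromosome.
  const sexLeft = left.pop();
  const sexRight = right.pop();
  for (const chromosome of [sexLeft, sexRight]) {
    if (chromosome !== "X" && chromosome !== "Y") {
      throw new Error(`Expected X or Y as the last token, got "${chromosome}"`);
    }
  }
  // Pair alleles positionally: gamete1 on the left, gamete2 on the right.
  const alleles = left.map((allele, i) => `${allele}-${right[i]}`).join(" ");
  const sex = sexLeft === "Y" || sexRight === "Y" ? "male" : "female";
  return { alleles, sex };
}

const offspring = fertilize("a1 b2 X", "a2 b1 Y");
```

Whether the sex chromosomes should also be appended to the resulting allele string depends on the open question above about including XY in the fully-defined format.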