This formalisation is based on the idea that the meaning of a variant description is a set of fixed sequences projected onto the reference sequence. (We call them replacements, but they can also be identities.) We explicitly do not mean the sequence that results from applying these replacements. At least in my understanding, that is not how HGVS is intended to be interpreted (it would also make it impossible to combine variants; how do you combine two plain sequences?).
A replacement r is a triple s @ n:m where s is a sequence and n <= m are two integers. With s @ n:m we mean replacing the interbase range n:m by sequence s.
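As a minimal sketch, a replacement could be modelled in Python as a small immutable record. The class name `Replacement`, its field names, and the helper methods are hypothetical and only serve as illustration. The check n >= 0 is an added assumption; the definition above only requires n <= m.

```python
from typing import NamedTuple


class Replacement(NamedTuple):
    """A replacement s @ n:m (hypothetical name, for illustration only)."""

    seq: str    # the sequence s projected onto the reference
    start: int  # n, left interbase boundary
    end: int    # m, right interbase boundary

    def is_valid(self) -> bool:
        # The definition requires n <= m; n >= 0 is an extra assumption.
        return 0 <= self.start <= self.end

    def is_insertion(self) -> bool:
        # An empty range n:n covers no bases, so s is placed at boundary n.
        return self.start == self.end

    def __str__(self) -> str:
        # Mirror the notation used in the text; '' marks the empty sequence.
        shown = self.seq if self.seq else "''"
        return f"{shown} @ {self.start}:{self.end}"


print(Replacement("", 3, 4))   # '' @ 3:4
print(Replacement("T", 2, 2))  # T @ 2:2
```

A named tuple keeps replacements hashable, so they can be collected in ordinary Python sets.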
We shall define a relation s, d => s, R which means the application of variant description d on sequence s results in the set of replacements R on s.
For example:
ATCG, 3del => s, { '' @ 3:4 }
ATCG, 2dup => s, { C @ 3:3 }
ATCG, [3del;2dup] => s, { '' @ 3:4 , C @ 3:3 }
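We now define our relation. Positions i and j are read as 0-based indices into s, so that base s[i] spans the interbase range i:i+1: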
s, iX>Y => s, { Y @ i:i+1 } if i < len(s) and s[i] = X
s, idel => s, { '' @ i:i+1 } if i < len(s)
s, i_jdel => s, { '' @ i:j+1 } if i < j < len(s)
s, idup => s, { s[i] @ i+1:i+1 } if i < len(s)
s, i_jdup => s, { s[i:j+1] @ j+1:j+1 } if i < j < len(s)
s, [d1;...;dn] => s, R1 + ... + Rn if s, di => s, Ri for i = 1...n
Here + denotes the union of the replacement sets.
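The rules above can be sketched as a small Python function. This is an illustration under stated assumptions, not a reference implementation: the function name `replacements` is hypothetical, a replacement s @ n:m is modelled as the tuple (s, n, m), positions are taken as 0-based (as the side conditions i < len(s) suggest), X and Y are single uppercase letters, lists are flat (not nested), and a description for which no rule applies raises `ValueError`.

```python
import re
from typing import FrozenSet, Tuple

# A replacement s @ n:m is modelled as the tuple (s, n, m).
Replacement = Tuple[str, int, int]


def replacements(s: str, d: str) -> FrozenSet[Replacement]:
    """Return the set R with s, d => s, R; raise ValueError if no rule applies."""
    # [d1;...;dn]: union of the replacement sets of the parts (flat lists only).
    if d.startswith("[") and d.endswith("]"):
        result: FrozenSet[Replacement] = frozenset()
        for part in d[1:-1].split(";"):
            result |= replacements(s, part)
        return result

    # iX>Y: requires i < len(s) and s[i] = X.
    match = re.fullmatch(r"(\d+)([A-Z])>([A-Z])", d)
    if match:
        i, x, y = int(match[1]), match[2], match[3]
        if i < len(s) and s[i] == x:
            return frozenset({(y, i, i + 1)})
        raise ValueError(f"substitution {d!r} does not match {s!r}")

    # idel, idup (single position) and i_jdel, i_jdup (range).
    match = re.fullmatch(r"(\d+)(?:_(\d+))?(del|dup)", d)
    if match:
        i = int(match[1])
        ranged = match[2] is not None
        j = int(match[2]) if ranged else i
        in_bounds = (i < j < len(s)) if ranged else (i < len(s))
        if not in_bounds:
            raise ValueError(f"position out of range in {d!r}")
        if match[3] == "del":
            return frozenset({("", i, j + 1)})        # '' @ i:j+1
        return frozenset({(s[i:j + 1], j + 1, j + 1)})  # s[i:j+1] @ j+1:j+1

    raise ValueError(f"unsupported description {d!r}")


print(replacements("ATCG", "3del"))  # frozenset({('', 3, 4)})
print(replacements("ATCG", "1T>A"))  # frozenset({('A', 1, 2)})
```

Treating the single-position rules as the range rules with j = i keeps the sketch short: '' @ i:i+1 and s[i] @ i+1:i+1 fall out of the range forms directly. Only the side condition differs (i < len(s) instead of i < j < len(s)).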