TL;DR: I'm not sure whether it's just a data problem or I'm tackling the problem the wrong way.
What I'm trying to do is map a list of entries with variable properties to a well-known list of entries with fixed properties.
In other words, I have a variable textual entry composed of {"ABC (test)" "wrongdata"}, which needs to be mapped to a fixed entry {"ABC" "test"}.
There are a variety of data providers that output entries with different combinations of fields and values, and these may need to be mapped to the same final entry.
The input format contains about 4 fields, a few of them optional. The output format contains the same fields plus an order of magnitude more fields that further identify each entry.
The input format is a list containing one of the following variants:
- "Name, Edition"
- "Name (Variant)" (no Edition)
- "Name, WrongEdition" (WrongEdition is wrong; it should be ignored)
- "Name (Variant), Edition" (Edition could be wrong, or it could be a hint for Variant)
- optional additional values, not described here, that could be used to further process each entry
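A minimal Go sketch of how the variants above could be split into one common shape before any matching happens. `RawEntry` and `parseEntry` are hypothetical names, not taken from go-mtgban. The sketch assumes the edition, when present, follows the last comma and the variant is the last parenthesized group. It deliberately does not decide whether the edition is wrong or is really a variant hint; that decision is left to the matching step.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// RawEntry is a hypothetical normalized form of one provider line.
// Edition is kept only as a hint: it may be wrong, or it may really be the variant.
type RawEntry struct {
	Name    string
	Variant string
	Edition string
}

// variantRe splits "Name (Variant)" into the name and the last parenthesized group.
var variantRe = regexp.MustCompile(`^(.*?)\s*\(([^()]*)\)\s*$`)

// parseEntry handles "Name, Edition", "Name (Variant)" and "Name (Variant), Edition".
func parseEntry(s string) RawEntry {
	var e RawEntry
	namePart := s
	// Assumption: the edition, if any, is whatever follows the last comma.
	if i := strings.LastIndex(s, ","); i >= 0 {
		namePart = s[:i]
		e.Edition = strings.TrimSpace(s[i+1:])
	}
	namePart = strings.TrimSpace(namePart)
	if m := variantRe.FindStringSubmatch(namePart); m != nil {
		e.Name = strings.TrimSpace(m[1])
		e.Variant = strings.TrimSpace(m[2])
	} else {
		e.Name = namePart
	}
	return e
}

func main() {
	for _, s := range []string{"Name, Edition", "Name (Variant)", "ABC (test), wrongdata"} {
		fmt.Printf("%q -> %+v\n", s, parseEntry(s))
	}
}
```

If a name itself contains a comma, the last-comma split will misfire, which is exactly the kind of ambiguity a real rule set has to handle.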
The output format is a map (a JSON file in the form of {"Edition": {"Name", "property1", "property2", ...}}).
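The exact JSON schema isn't shown above, so this Go sketch assumes each edition maps to an array of card objects. The `name`, `number`, and `variant` keys and the `Card`/`Database` types are hypothetical stand-ins for the real properties; only `encoding/json` from the standard library is used.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Card is a hypothetical shape for one output entry; the real file has many
// more identifying properties than these three.
type Card struct {
	Name    string `json:"name"`
	Number  string `json:"number"`
	Variant string `json:"variant,omitempty"`
}

// Database maps an edition name to the cards it contains (assumed layout).
type Database map[string][]Card

// loadDatabase decodes the output file contents into a Database.
func loadDatabase(data []byte) (Database, error) {
	var db Database
	if err := json.Unmarshal(data, &db); err != nil {
		return nil, err
	}
	return db, nil
}

func main() {
	data := []byte(`{"Edition": [{"name": "ABC", "number": "1", "variant": "test"}]}`)
	db, err := loadDatabase(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", db["Edition"])
}
```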
I created a sort of parser that, for each possible combination of inputs, tries different permutations of the output until it finds the one that most closely mimics the input.
However, there are several drawbacks:
- it creates a lot of false duplicates
- it takes an enormous amount of time to find and write each rule
- every time you want to add a new input provider, you need to repeat the whole process
- Example data provider (a little more than a CSV parser): https://github.com/kodabb/go-mtgban/blob/master/strikezone/strikezone.go#L221-L350
- Example parser: https://github.com/kodabb/go-mtgban/blob/master/strikezone/card.go#L92-L135. This basically iterates over a map according to the rules defined by `parseSet()` and `parseNumber()`, and these two functions are just a whole lot of nothing but "if xxx, try this".
- Example ruleset for Set: https://github.com/kodabb/go-mtgban/blob/master/strikezone/set.go#L131-L346
- Example ruleset for Number: https://github.com/kodabb/go-mtgban/blob/master/strikezone/set.go#L348-L568
What the parser does is merely `for ed in Editions; for name in ed.Names; if name == MYNAME then FOUND()`.
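That nested loop could look like `findLinear` below in Go. The sketch also shows one common alternative: inverting the database once into an index keyed by a normalized name, so each lookup becomes a single map access instead of a full scan. `Card`, `Database`, `Match`, and `normalize` are hypothetical names, and `normalize` only lowercases and trims, which real data would likely outgrow.

```go
package main

import (
	"fmt"
	"strings"
)

// Card is a hypothetical output entry with only the fields needed for matching.
type Card struct {
	Name    string
	Variant string
}

// Database maps edition -> cards (assumed shape of the output JSON).
type Database map[string][]Card

// Match records which edition a card was found in.
type Match struct {
	Edition string
	Card    Card
}

// findLinear is the nested loop: every lookup walks the whole database.
func findLinear(db Database, name string) []Match {
	var out []Match
	for ed, cards := range db {
		for _, c := range cards {
			if c.Name == name {
				out = append(out, Match{Edition: ed, Card: c})
			}
		}
	}
	return out
}

// normalize is a placeholder key function: lowercase and trim spaces.
func normalize(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// buildIndex inverts the database once, keyed by normalized name.
func buildIndex(db Database) map[string][]Match {
	idx := make(map[string][]Match)
	for ed, cards := range db {
		for _, c := range cards {
			key := normalize(c.Name)
			idx[key] = append(idx[key], Match{Edition: ed, Card: c})
		}
	}
	return idx
}

func main() {
	db := Database{
		"Edition A": {{Name: "ABC", Variant: "test"}},
		"Edition B": {{Name: "ABC"}, {Name: "XYZ"}},
	}
	fmt.Println(len(findLinear(db, "ABC"))) // 2
	idx := buildIndex(db)
	fmt.Println(len(idx[normalize(" abc ")])) // 2
}
```

The index returns every candidate for a name across editions; choosing among them is still the rule-heavy part.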
If this were a one-time parser it would be done and done, but there are hundreds of parsers that behave differently, so I thought of creating a generic one instead of an ad-hoc one. However, it's becoming an eldritch monstrosity the more rules I add.
Example unified parser: https://github.com/kodabb/go-mtgban/blob/unified-converter/mtgdb/match.go
Right now, if a new edition (data entry format) comes out, I need to add a new rule to each data provider, whereas I wanted something that would let me add a single rule to a single parser.
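One possible shape for the "single rule in a single place" goal, sketched in Go under stated assumptions: each provider only converts its rows into a common `Entry`, edition spellings are resolved through one shared alias table, and candidates (already fetched by name) are ranked by a small scoring function. All names and alias entries here are hypothetical placeholders, not go-mtgban code. Returning `ok == false` on a tie is meant to surface ambiguous matches instead of silently producing false duplicates.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is the common shape every provider would emit (hypothetical).
// Name is assumed to have been used already to fetch the candidates.
type Entry struct {
	Name, Variant, EditionHint string
}

// Candidate is an output entry that matched by name (hypothetical).
type Candidate struct {
	Edition, Variant string
}

// editionAliases is the single shared rule table; entries are made-up placeholders.
var editionAliases = map[string]string{
	"ed1":         "Edition One",
	"edition 1":   "Edition One",
	"edition one": "Edition One",
	"ed2":         "Edition Two",
}

// canonicalEdition resolves a provider's edition spelling, if known.
func canonicalEdition(hint string) (string, bool) {
	ed, ok := editionAliases[strings.ToLower(strings.TrimSpace(hint))]
	return ed, ok
}

// score ranks a candidate: a recognized, matching edition counts most;
// a matching variant counts less. An unknown edition hint is also tried
// as a variant, since it may really be a variant hint.
func score(e Entry, c Candidate) int {
	s := 0
	if ed, ok := canonicalEdition(e.EditionHint); ok && ed == c.Edition {
		s += 2
	}
	switch {
	case e.Variant != "" && strings.EqualFold(e.Variant, c.Variant):
		s++
	case c.Variant != "" && strings.EqualFold(e.EditionHint, c.Variant):
		s++
	}
	return s
}

// best returns the single highest-scoring candidate; ok is false when there
// are no candidates or the top score is shared (an ambiguous match).
func best(e Entry, cands []Candidate) (Candidate, bool) {
	var top Candidate
	topScore, ties := -1, 0
	for _, c := range cands {
		switch s := score(e, c); {
		case s > topScore:
			top, topScore, ties = c, s, 1
		case s == topScore:
			ties++
		}
	}
	return top, ties == 1
}

func main() {
	cands := []Candidate{
		{Edition: "Edition One"},
		{Edition: "Edition One", Variant: "test"},
		{Edition: "Edition Two", Variant: "test"},
	}
	fmt.Println(best(Entry{Name: "ABC", Variant: "test", EditionHint: "ed1"}, cands))
	fmt.Println(best(Entry{Name: "ABC", Variant: "test", EditionHint: "wrongdata"}, cands))
}
```

With this layout, a new edition spelling would be one line in `editionAliases` rather than a new branch in every provider's parser; the scoring weights are arbitrary and would need tuning against real data.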