I have some problems related to how core data in the Pokémon games has changed over time. "Time" here is a discrete quantity, where each game is a point in time.
I just said each game is a point in time and I am lying. Many of the games come in pairs, which come out simultaneously. So Red and Blue are distinct games, but they are almost identical, except when they aren't. The current schema adds a layer of indirection called a "version group", but for the most part it's just been cumbersome and confusing.
There are also side games that are somewhat compatible to the main series, like Stadium, an N64 game that allowed you to transfer your Pokémon from your game and battle with them. The game exists, and has a couple of differences from the main-series games of the time, but placing it in the main-series chronology as a point in time doesn't quite make sense.
As an additional wrinkle, there are games that are completely incompatible with the main series, but which bear the Pokémon name and borrow many of the mechanics — primarily Pokémon Mystery Dungeon, Pokémon Ranger, and Pokémon Conquest. Putting these in the chronology doesn't make a lot of sense, either.
Oh and there have been a couple very rare cases where very minor details changed in the same game between different language releases. But I'm okay ignoring those, especially since it looks like we'll be doing simultaneous identical worldwide releases from now on.
These are the primary elements of the game: Pokémon, moves, types, abilities, and whatnot.
They have some number of properties, which is not a surprise. A subset of the Pokémon table might look like this:
identifier | parent | capture_rate |
---|---|---|
bulbasaur | 45 | |
ivysaur | bulbasaur | 45 |
venusaur | ivysaur | 45 |
But wait, oops! X and Y changed the capture rate for a bunch of Pokémon. So now I would need a table like this:
identifier | parent |
---|---|
bulbasaur | |
ivysaur | bulbasaur |
venusaur | ivysaur |
And ALSO a table (pokemon_data
?) like this:
pokemon | game | capture_rate |
---|---|---|
bulbasaur | red | 45 |
bulbasaur | blue | 45 |
bulbasaur | yellow | 45 |
bulbasaur | gold | 45 |
bulbasaur | silver | 45 |
bulbasaur | crystal | 45 |
bulbasaur | ruby | 45 |
bulbasaur | sapphire | 45 |
bulbasaur | fire-red | 45 |
bulbasaur | leaf-green | 45 |
bulbasaur | emerald | 45 |
bulbasaur | diamond | 45 |
bulbasaur | pearl | 45 |
bulbasaur | platinum | 45 |
bulbasaur | heart-gold | 45 |
bulbasaur | soul-silver | 45 |
bulbasaur | black | 45 |
bulbasaur | white | 45 |
bulbasaur | black-2 | 45 |
bulbasaur | white-2 | 45 |
bulbasaur | x | 40 |
bulbasaur | y | 40 |
bulbasaur | omega-ruby | 40 |
bulbasaur | alpha-sapphire | 40 |
I feel in my soul that this may be slightly ridiculous.
The only obvious alternative is to keep "current" data in the main table:
identifier | parent | capture_rate |
---|---|---|
bulbasaur | 40 | |
ivysaur | bulbasaur | 40 |
venusaur | ivysaur | 40 |
And then have a "history" table for changes:
pokemon | last_version | capture_rate |
---|---|---|
bulbasaur | white-2 | 45 |
ivysaur | white-2 | 45 |
venusaur | white-2 | 45 |
This is certainly more compact. But it's also very awkward to query, especially if all you want to know is "what was Bulbasaur's capture rate in Leaf Green"? And it relies on ordering the games, when their identifiers aren't inherently ordered. It doesn't really make sense to say "up through" White 2 anyway — what if I said up through Black 2? They were released at the same time.
It's worth noting that Pokémon evolution (the "parent" field above) has also changed, rather frequently. But only in the form of adding entirely new Pokémon. So it's easy to tell that Pikachu doesn't evolve from Pichu in Pokémon Red, because Pichu didn't yet exist at all.
A similar problem happens with one-to-many data, like Pokémon types. A Pokémon may have one or two types, thus:
pokemon | type | order |
---|---|---|
bulbasaur | grass | 1 |
bulbasaur | poison | 2 |
ivysaur | grass | 1 |
ivysaur | poison | 2 |
venusaur | grass | 1 |
venusaur | poison | 2 |
But, alas, these too have changed. New types have been introduced twice, and both times, existing Pokémon gained a new type. So now I need to do this:
pokemon | game | type | order |
---|---|---|---|
magnemite | red | electric | 1 |
magnemite | blue | electric | 1 |
magnemite | yellow | electric | 1 |
magnemite | gold | electric | 1 |
magnemite | gold | steel | 2 |
magnemite | silver | electric | 1 |
magnemite | silver | steel | 2 |
magnemite | crystal | electric | 1 |
magnemite | crystal | steel | 2 |
...and so on.
Until now, I actually haven't been doing the above; there's only been a single set of types. The reason was that the Pokémon who changed (like Magnemite) only grew a new type, which hadn't existed previously. So much like the evolution case, it was easy to "correct" the data for older versions.
X and Y have (once again) added a huge wrinkle. Clefairy has historically been normal-type, but now is fairy-type. This is a change, not an addition, which has never happened before.
Types in Pokémon primarily serve as damage modifiers. Type A attacking type B might be super effective (×2), not very effective (×½), or not at all effective (×0). This matrix of interactions is usually called the type chart.
Naturally I have a table as follows:
attacker_type | target_type | modifier |
---|---|---|
electric | water | 200 |
electric | normal | 100 |
electric | dragon | 50 |
electric | ground | 0 |
The problem, of course, is that the type chart has also changed twice. So this table, too, needs to be duplicated once for every game.
There is one alternative here that doesn't readily present itself anywhere else. Since the type chart as a whole has only changed twice, I could make "type chart" itself an entity. Then I'd only have to store the data three times, and associate each game with one of the three type charts as appropriate.
However, in the grand scheme of things this is probably the smallest table that would need duplicating, so it doesn't seem worth the pain of doing things differently.
Names are localized, so we already have a composite key for them, e.g.:
pokemon | in_language | name |
---|---|---|
eevee | en | eevee |
eevee | ja | イーブイ |
But names have, on rare occasions, also changed. Before Black and White, most names were actually rendered in all caps throughout the games, even though we refer to them (and falsely store them) as Title Case. Then X and Y increased the maximum length of move names, so several moves that were crammed awkwardly into the old limit have now grown spaces. Thus the move Thunder Shock has actually had three different names: THUNDERSHOCK, ThunderShock, and Thunder Shock.
The obvious thing to do is to add to the composite key.
move | in_language | game | name |
---|---|---|---|
thunder-shock | en | red | THUNDERSHOCK |
thunder-shock | en | black | ThunderShock |
thunder-shock | en | x | Thunder Shock |
But... that example doesn't really tell the whole story. The table would actually contain, just for this one move in one language:
move | in_language | game | name |
---|---|---|---|
thunder-shock | en | red | THUNDERSHOCK |
thunder-shock | en | blue | THUNDERSHOCK |
thunder-shock | en | yellow | THUNDERSHOCK |
thunder-shock | en | gold | THUNDERSHOCK |
thunder-shock | en | silver | THUNDERSHOCK |
thunder-shock | en | crystal | THUNDERSHOCK |
thunder-shock | en | ruby | THUNDERSHOCK |
thunder-shock | en | sapphire | THUNDERSHOCK |
thunder-shock | en | fire-red | THUNDERSHOCK |
thunder-shock | en | leaf-green | THUNDERSHOCK |
thunder-shock | en | diamond | THUNDERSHOCK |
thunder-shock | en | pearl | THUNDERSHOCK |
thunder-shock | en | platinum | THUNDERSHOCK |
thunder-shock | en | heart-gold | THUNDERSHOCK |
thunder-shock | en | soul-silver | THUNDERSHOCK |
thunder-shock | en | black | ThunderShock |
thunder-shock | en | white | ThunderShock |
thunder-shock | en | black-2 | ThunderShock |
thunder-shock | en | white-2 | ThunderShock |
thunder-shock | en | x | Thunder Shock |
thunder-shock | en | y | Thunder Shock |
thunder-shock | en | omega-ruby | Thunder Shock |
thunder-shock | en | alpha-sapphire | Thunder Shock |
This seems, to me, a slightly excessive way to express "fixed the case, then added a space".
Again, the only alternative is a "history" table — which has the same drawbacks as before, plus the added complication of an existing composite key.
The database contains descriptions of move effects, item effects, and the like.
These, too, have changed over time. But I especially don't want to duplicate this prose, because:
-
Prose is something we write by hand, not something we can just extract from the games once and be done with. Even prose for very old games might be updated now, as our understanding improves. So if prose were duplicated per game, it would be very easy for it to get out of sync across games where it should be identical. (We have enough trouble already, keeping prose similar for similar effects.)
-
Ideally we'll have localized versions of prose too, and duplicates make that all the more difficult.
-
It's useful to see how effects changed over time, but not necessarily easy to make this happen with prose. Inline diffs are not particularly reliable things, so we'd likely end up explicitly writing prose that's diff-friendly, which is even more of a pain.
I only really have two ideas for addressing this.
-
Have a single prose description, plus a history table that expresses changes in relative terms, e.g. "Prior to Gold and Silver, the user skipped its recharge turn if this move fainted the target." Advantage is that it's easier to understand; disadvantage is that it gets kind of hairy for moves that have changed a lot or that have changed very significantly.
-
Have a single prose description, with some kind of invented inline markup for expressing changes — i.e., write and maintain our own diffs, manually. The disadvantage, of course, is tedium. But the advantage is a reduction in how much overall stuff there is to write and maintain.
-
Possibly outlandishly impractical, but a third option is to get the prose out of the database entirely, and instead represent move effects in as data-oriented a way as possible. Which is easy for "30% chance to paralyze", but more difficult for "applies a curse if the user is a ghost otherwise applies these stat changes".
Are there any great solutions I've overlooked? For the moment I'm heavily
leaning towards just adding a game
column to almost every goddamn table and
duplicating the hell out of everything. The db will balloon in size and
there'll be a lot of UI work to make any sense of it, but it's the most
flexible and the most correct. It's also nice to have more or less the same
versioning story throughout, whereas right now it's kind of ad-hoc and just
scoped to whatever we needed at the time.
The thing that most bothers me is knowing how much unnecessary duplication there'll be for games in a pair. But version groups just kinda suck.
Are separate schemas out of the question? For instance, schema GEN1 contains data pertaining to Red, Blue, and Yellow. GEN2 contains data pertaining to Gold, Silver, and Crystal. GEN3 contains Ruby, Sapphire, Emerald, AND Fire Red and Leaf Green. So on and so forth. You could do schemas per game, but this seems a bit excessive. Of course, you are completely correct that this could cause significant data duplication amongst games that didn't change much from their predecessors. In cases where there were no changes, that data could live in another schema called COMMON or something like that. This schema would contain anything prone to excessive duplication, like the original 150 Pokemon. Sure, the types and moves and whatnot have changed, but I've never seen a Pokemon removed or renamed. If something does radically change in the future, move that data out of the COMMON schema and into wherever it needs to go. But how do we know if data in a particular field is common or unique? Flag it. If it's unique, the field will contain meaningful data. If it's common between other games/gens, the field would contain some sort of flag or reference to where the common data is located. This is just me brainstorming, so feel free to build on it, correct me, or tear it apart with constructive criticism. 😃
Edit: And maybe a separate schema for side games.