catb0t · September 12, 2016 14:05
 canonical-data.json needs standardisation

 Hello,

 I maintain the Factor track, 
 and I'd like to automate generation of unit tests 
 for exercises in my language. Looking at
 `exercises/leap/canonical-data.json`, this would seem
 to be quite simple. However, many of the
 `canonical-data.json` files don't use the set
 of keys found in `leap`'s JSON, and this makes them
 difficult to automate around.

 There are, as far as I can tell, two solutions to 
 the problems introduced by the inconsistencies.

 * Rather than hardcoding the `description`, `input`
 and `expected` keys, use a regex / fuzzy find to
 group keys into description, input and output.
 The disadvantage of this is twofold: not only must
 my code be flimsy, but so must everyone else's, and
 all of it is liable to break at anyone's whim.

 * Standardise on a fixed, predictable set of keys 
 and what their values represent. This makes the jobs
 of track maintainers easier, simplifies interacting 
 code, and future-proofs the API and the code.
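
To illustrate the second option, here is a minimal sketch (in Python rather than Factor) of what a generator could look like if every file used the same keys. The top-level `cases` array, the sample data, and the names `load_cases` and `render_tests` are assumptions made for illustration, not the actual layout of any existing `canonical-data.json`.

```python
import json

# Hypothetical sample; the top-level "cases" array and the descriptions are
# assumptions, not the confirmed contents of any real canonical-data.json.
SAMPLE = """
{
  "cases": [
    {"description": "year divisible by 4", "input": 1996, "expected": true},
    {"description": "year not divisible by 4", "input": 1997, "expected": false}
  ]
}
"""

REQUIRED_KEYS = ("description", "input", "expected")


def load_cases(text):
    """Parse the JSON text and return its cases, failing loudly on a missing key."""
    cases = json.loads(text)["cases"]
    for case in cases:
        missing = [key for key in REQUIRED_KEYS if key not in case]
        if missing:
            raise KeyError(f"case {case!r} is missing keys: {missing}")
    return cases


def render_tests(cases, function_name):
    """Emit a comment line and an assertion line per case, as plain text."""
    lines = []
    for case in cases:
        lines.append(f"# {case['description']}")
        lines.append(
            f"assert {function_name}({case['input']!r}) == {case['expected']!r}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_tests(load_cases(SAMPLE), "is_leap_year"))
```

With fixed keys the generator needs no guessing at all: a file that deviates fails immediately with a clear error instead of silently producing wrong tests.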

 I think standardisation would be greatly beneficial,
 but before I open a pull request with structural 
 changes to hundreds of lines of data, I'd like some
 feedback.

 First, does anyone object to changing the names of
 the keys? They're rather haphazard (nearly as if
 they had been written for humans to read :) ), and some
 exercises are missing `canonical-data.json` altogether,
 so I have difficulty believing there are
 programs reading this stuff. (If we make an API more
 accessible, perhaps more tracks will automate 
 generation / regeneration of tests, which would be 
 positive.)

 Second, what keys should be used? I'm thinking 
 something like:

 * For exercises with one input translating to one
 output, `description`, `input` and `output`.

 * For exercises with multiple inputs / multiple 
 outputs, `description`, `input_N`, `output_N`. 

 Note that it would be disadvantageous to use an array
 for multiple inputs / outputs, because where an array is
 itself the input or output of an exercise it would be hard
 or impossible to tell the two cases apart.
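
As a rough sketch of how the proposed naming could be consumed, the Python below gathers `input` / `input_N` (and likewise `output` / `output_N`) into an ordered list. The sample cases, the helper name `collect`, and numbering that starts at 1 are all assumptions for illustration; none of them come from existing data.

```python
import re

# Hypothetical cases following the proposed naming; the exercises and values
# are made up for illustration.
single = {"description": "reverses a word", "input": "abc", "output": "cba"}
multi = {"description": "adds two numbers", "input_1": 2, "input_2": 3, "output_1": 5}


def collect(case, prefix):
    """Return the values for `prefix` ("input" or "output") in numeric order.

    A bare key (e.g. "input") yields a one-element list; numbered keys
    (e.g. "input_1", "input_2") are sorted by their numeric suffix.
    """
    if prefix in case:
        return [case[prefix]]
    pattern = re.compile(rf"^{prefix}_(\d+)$")
    numbered = []
    for key, value in case.items():
        match = pattern.match(key)
        if match:
            numbered.append((int(match.group(1)), value))
    return [value for _, value in sorted(numbered, key=lambda pair: pair[0])]


# The array ambiguity: here the single input is itself a list. If multiple
# inputs were also packed into an array under "input", a generator could not
# tell "one list argument" from "three separate arguments".
one_list_argument = {"description": "sums a list", "input": [1, 2, 3], "output": 6}
```

Here `collect(multi, "input")` gives two separate arguments, while `collect(one_list_argument, "input")` gives a single argument that happens to be a list, so the distinction survives.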