plentz/gist:1873224

Created February 21, 2012 02:51



https://github.com/plentz/jruby_report/blob/master/ok_json_test.rb

~/Projects/opensource/jruby_report (master) $ ruby -I. ok_json_test.rb 
Run options: --seed 31809

# Running tests:

"\xEF\xBF\xBD"
..F

Finished tests in 0.049639s, 60.4364 tests/s, 60.4364 assertions/s.

  1) Failure:
test_json_encode(OkJsonTest) [ok_json_test.rb:14]:
Expected: "{\"message\":\"á\"}"
  Actual: "{\"message\":\"\\ufffd\"}"

3 tests, 3 assertions, 1 failures, 0 errors, 0 skips

After the change, the output became:


~/Projects/opensource/jruby_report (master) $ ruby -I. ok_json_test.rb 
Run options: --seed 3567

# Running tests:

.FF

Finished tests in 0.028424s, 105.5446 tests/s, 105.5446 assertions/s.

  1) Failure:
test_decode_bad(OkJsonTest) [ok_json_test.rb:24]:
Expected: "\xEF\xBF\xBD"
  Actual: "�"

  2) Failure:
test_json_encode(OkJsonTest) [ok_json_test.rb:14]:
Expected: "{\"message\":\"á\"}"
  Actual: "{\"message\":\"\\u00e1\"}"

3 tests, 3 assertions, 2 failures, 0 errors, 0 skips

~/Projects/opensource/jruby_report (master) $ ruby -v
ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-darwin11.3.0]
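Some context for the second test_json_encode failure: JSON allows any character in a string to be written as a \uXXXX escape (RFC 8259, section 7), so "\u00e1" and a raw "á" encode the same text. The sketch below uses Ruby's bundled json library as a stand-in for OkJson. That is an assumption: it shows the JSON semantics, not OkJson's behavior. The constant and method names are made up for the example.

```ruby
require 'json'

# The same one-key object written two ways: with U+00E1 escaped as
# \u00e1, and with the character itself.
ESCAPED = '{"message":"\u00e1"}'     # single quotes keep \u00e1 as six literal chars
LITERAL = "{\"message\":\"\u00e1\"}" # double quotes: Ruby inserts the real character

# Hypothetical helper: do two JSON texts carry the same "message" value?
def same_message?(a, b)
  JSON.parse(a)['message'] == JSON.parse(b)['message']
end

puts same_message?(ESCAPED, LITERAL) # prints true

# The bundled json library writes the raw character by default and
# switches to \uXXXX escapes only when the ascii_only option is set.
msg = { 'message' => "\u00e1" }
puts JSON.generate(msg).ascii_only?                      # prints false
puts JSON.generate(msg, :ascii_only => true).ascii_only? # prints true
```

So an encoder that emits "\u00e1" is producing valid JSON. A test that compares the encoded text byte-for-byte against "á" is checking the escaping style, not the decoded value.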


plentz commented Feb 22, 2012

Great, @kr! I've updated the tests as you said, but there's one thing that still makes me think this is the wrong behavior. Shouldn't test_decode_bad pass this way?

  def test_decode_bad
    json = "{\"message\":\"\\ufffd\"}"
    assert_equal("á", OkJson.decode(json)['message'])
  end

When we decode a JSON string, I'd expect the output to be "á". Or am I wrong? Why does the test fail this way?

kr commented Feb 22, 2012

Ah, sorry, I guess my comment was unclear. This string in test_decode_bad:

{"message":"\ufffd"}

is actually valid json representing U+FFFD (REPLACEMENT CHARACTER). This
same character is used by UTF-8 decoders (including okjson) to represent invalid
data that was found in the string during decoding. The UTF-8 representation of this
codepoint is 0xEF 0xBF 0xBD, so in ruby it's "\xEF\xBF\xBD". (By contrast, U+00E1
(LATIN SMALL LETTER A WITH ACUTE) in UTF-8 is 0xC3 0xA1.)
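The byte sequences above are easy to check in Ruby. This is a small sketch: utf8_bytes is a hypothetical helper, and the last part uses String#scrub, which only exists in Ruby 2.1+ (not in the 1.9.3 used in this thread), to show a decoder replacing invalid data with U+FFFD.

```ruby
# Hypothetical helper: the raw bytes of a string as hex labels.
def utf8_bytes(str)
  str.unpack('C*').map { |b| format('0x%02X', b) }
end

puts utf8_bytes("\uFFFD").join(' ') # U+FFFD REPLACEMENT CHARACTER: prints 0xEF 0xBF 0xBD
puts utf8_bytes("\u00E1").join(' ') # U+00E1 LATIN SMALL LETTER A WITH ACUTE: prints 0xC3 0xA1

# A decoder that meets invalid UTF-8 substitutes U+FFFD. String#scrub
# (Ruby 2.1+) does the same to a truncated sequence: "a" followed by
# only the first byte of "á".
broken = [0x61, 0xC3].pack('C*').force_encoding('UTF-8')
puts broken.valid_encoding?    # prints false
puts broken.scrub == "a\uFFFD" # prints true
```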

The test was almost correct before. The string data was right, but the metadata
(the encoding on the string) was wrong. So I meant to suggest changing

assert_equal([0xef, 0xbf, 0xbd].pack('C*'), OkJson.decode(json)['message'])

to

assert_equal("\xEF\xBF\xBD", OkJson.decode(json)['message'])

Another way to represent this idea would be:

s = [0xef, 0xbf, 0xbd].pack('C*')
s.force_encoding('UTF-8')
assert_equal(s, OkJson.decode(json)['message'])
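The "right data, wrong metadata" point can be seen directly in irb. A minimal sketch for Ruby 1.9+; the variable names are just for the example:

```ruby
expected = "\uFFFD"                 # U+FFFD, tagged UTF-8
raw = [0xef, 0xbf, 0xbd].pack('C*') # same bytes, but pack tags the result ASCII-8BIT

puts raw.encoding == Encoding::ASCII_8BIT      # prints true
puts raw.unpack('C*') == expected.unpack('C*') # prints true: identical byte data
puts raw == expected                           # prints false: the encodings differ

raw.force_encoding('UTF-8') # relabels in place; the bytes are untouched
puts raw == expected        # prints true
```

String#== refuses to match two non-ASCII strings whose encodings are incompatible, which is why the pack('C*') version of the test failed even though the bytes were correct.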

(Also, I take back the suggestion to use "\uFFFD", because it doesn't work in ruby < 1.9.)
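If the test file also has to load on Ruby 1.8, one option (an assumption, not something suggested in this thread) is to build the string from its bytes and only call force_encoding where it exists. The helper name utf8_replacement_char is hypothetical:

```ruby
# Hypothetical helper: U+FFFD built from its UTF-8 bytes, avoiding the
# "\uFFFD" literal that Ruby 1.8 does not understand.
def utf8_replacement_char
  s = [0xef, 0xbf, 0xbd].pack('C*')
  # Ruby 1.8 strings carry no encoding, so only relabel where supported.
  s.force_encoding('UTF-8') if s.respond_to?(:force_encoding)
  s
end
```

A test could then assert with assert_equal(utf8_replacement_char, OkJson.decode(json)['message']).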


plentz commented Feb 22, 2012

Great! I had misunderstood your comment; I've updated the code. By the way, I added a couple of tests to flori/json and found something weird. Compare these two tests:

To me, it looks like the flori/json test is just "righter" (it asserts against á instead of \u00e1). I googled for it but didn't find a spec that says which one is recommended,