
@plentz
Created February 21, 2012 02:51

https://github.com/plentz/jruby_report/blob/master/ok_json_test.rb

~/Projects/opensource/jruby_report (master) $ ruby -I. ok_json_test.rb 
Run options: --seed 31809

# Running tests:

"\xEF\xBF\xBD"
..F

Finished tests in 0.049639s, 60.4364 tests/s, 60.4364 assertions/s.

  1) Failure:
test_json_encode(OkJsonTest) [ok_json_test.rb:14]:
Expected: "{\"message\":\"á\"}"
  Actual: "{\"message\":\"\\ufffd\"}"

3 tests, 3 assertions, 1 failures, 0 errors, 0 skips

changed to this:


~/Projects/opensource/jruby_report (master) $ ruby -I. ok_json_test.rb 
Run options: --seed 3567

# Running tests:

.FF

Finished tests in 0.028424s, 105.5446 tests/s, 105.5446 assertions/s.

  1) Failure:
test_decode_bad(OkJsonTest) [ok_json_test.rb:24]:
Expected: "\xEF\xBF\xBD"
  Actual: "�"

  2) Failure:
test_json_encode(OkJsonTest) [ok_json_test.rb:14]:
Expected: "{\"message\":\"á\"}"
  Actual: "{\"message\":\"\\u00e1\"}"

3 tests, 3 assertions, 2 failures, 0 errors, 0 skips
~/Projects/opensource/jruby_report (master) $ ruby -v
ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-darwin11.3.0]
@plentz
Author

plentz commented Feb 22, 2012

Great, @kr! I've updated the tests as you suggested, but there's one thing that still makes me think it's the wrong behavior. Shouldn't test_decode_bad pass this way?

  def test_decode_bad
    json = "{\"message\":\"\\ufffd\"}"
    assert_equal("á", OkJson.decode(json)['message'])
  end

When we decode a JSON string, I expect the output to be "á". Am I wrong? Why does the test fail this way?

@kr

kr commented Feb 22, 2012

Ah, sorry, I guess my comment was unclear. This string in test_decode_bad:

{"message":"\ufffd"}

is actually valid json representing U+FFFD (REPLACEMENT CHARACTER). This
same character is used by UTF-8 decoders (including okjson) to represent invalid
data that was found in the string during decoding. The UTF-8 representation of this
codepoint is 0xEF 0xBF 0xBD, so in ruby it's "\xEF\xBF\xBD". (By contrast, U+00E1
(LATIN SMALL LETTER A WITH ACUTE) in UTF-8 is 0xC3 0xA1.)
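To make the byte-level point concrete, here is a minimal sketch (not from the thread, assuming Ruby 1.9 or later) that prints the UTF-8 bytes of both codepoints using only core `Array#pack` and `String#bytes`. It shows why decoding `\ufffd` can never produce "á": they are different characters with different byte sequences.

```ruby
# Minimal sketch (Ruby 1.9+): inspect the UTF-8 bytes of two codepoints.
# pack('U') builds a UTF-8 string from a Unicode codepoint.
replacement = [0xFFFD].pack('U')   # U+FFFD REPLACEMENT CHARACTER
a_acute     = [0x00E1].pack('U')   # U+00E1 LATIN SMALL LETTER A WITH ACUTE

# String#bytes yields the raw UTF-8 byte values; format them as hex.
puts replacement.bytes.map { |b| format('0x%02X', b) }.join(' ')  # prints 0xEF 0xBF 0xBD
puts a_acute.bytes.map { |b| format('0x%02X', b) }.join(' ')      # prints 0xC3 0xA1

# Different characters, so "\ufffd" in JSON can never decode to "á".
puts replacement == a_acute  # prints false
```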

The test was almost correct before. The string data was right, but the metadata
(the encoding on the string) was wrong. So I meant to suggest changing

assert_equal([0xef, 0xbf, 0xbd].pack('C*'), OkJson.decode(json)['message'])

to

assert_equal("\xEF\xBF\xBD", OkJson.decode(json)['message'])

Another way to represent this idea would be:

s = [0xef, 0xbf, 0xbd].pack('C*')
s.force_encoding('UTF-8')
assert_equal(s, OkJson.decode(json)['message'])
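The "right data, wrong metadata" distinction can be seen directly in Ruby 1.9+. The sketch below (an illustration, not part of the original test file) builds the same three bytes twice: once as a binary string via `pack('C*')` and once re-tagged with `force_encoding('UTF-8')`. The bytes are identical, but `String#==` treats them as unequal because the encodings differ and neither string is plain ASCII.

```ruby
# Minimal sketch (Ruby 1.9+): identical bytes, different encoding metadata.
raw  = [0xef, 0xbf, 0xbd].pack('C*')        # binary (ASCII-8BIT) string, no character semantics
utf8 = raw.dup.force_encoding('UTF-8')      # same bytes, now tagged as UTF-8

puts raw.bytes.to_a == utf8.bytes.to_a      # prints true: the data is identical
puts raw == utf8                            # prints false: the encoding tag differs
puts utf8 == [0xFFFD].pack('U')             # prints true: this really is U+FFFD
```

This is why the original `pack('C*')` assertion failed even though the decoder returned the correct bytes.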

(Also, I take back the suggestion to use "\uFFFD", because it doesn't work in ruby < 1.9.)

@plentz
Author

plentz commented Feb 22, 2012

Great! I had misunderstood your comment; I've updated the code. By the way, I added a couple of tests to flori/json and found something weird. Compare these two tests:

To me, the flori/json test just looks "righter" (it asserts against á instead of \u00e1). I searched, but didn't find a spec that says which one is recommended.

@plentz
Author

plentz commented Feb 22, 2012

Forget what I said. I just found this: http://tools.ietf.org/html/rfc4627#section-2.5

Any character *may* be escaped.

So both are correct. Right?

@kr

kr commented Feb 22, 2012

Yes, both are correct.
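A quick way to check that both forms mean the same thing is to decode them and compare. The sketch below uses Ruby's bundled `json` library (flori/json) rather than OkJson, as an assumption for illustration. Both the escaped `\u00e1` form and the literal "á" form decode to the same string; on the encoding side, whether to escape is a generator choice (here the `ascii_only` option).

```ruby
# Minimal sketch (Ruby 1.9+) using the bundled json library, not OkJson.
require 'json'

escaped = '{"message":"\u00e1"}'         # single quotes: the JSON text contains a literal \u00e1 escape
literal = "{\"message\":\"\u00e1\"}"     # double quotes: Ruby inserts the real UTF-8 "á"

from_escaped = JSON.parse(escaped)['message']
from_literal = JSON.parse(literal)['message']

puts from_escaped == from_literal        # prints true: RFC 4627 treats both as the same string
puts from_escaped == "\u00e1"            # prints true

# Escaping on output is optional; ascii_only: true escapes every non-ASCII character.
plain = JSON.generate({ 'message' => "\u00e1" })
ascii = JSON.generate({ 'message' => "\u00e1" }, ascii_only: true)
puts plain
puts ascii
```

So a test that asserts on the encoder's exact output is really testing an escaping policy, not correctness; comparing decoded values avoids that.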

@plentz
Copy link
Author

plentz commented Feb 22, 2012

@kr, sorry to bother you, but I think you'd like to read this: intridea/multi_json#25 (comment). (By the way, thanks for all your help so far :)

@kr
Copy link

kr commented Feb 23, 2012

Hey, no problem. That sounds about right. I've thought about this a few times before,
but it didn't seem like a big deal.

I just made kr/okjson#4 so I don't forget about this.
