jacegu · September 11, 2012 08:06
diff --git a/issue_json.rb b/issue_json.rb
 #encoding: utf-8

 require 'json'

 curl_json    = `curl "https://api.github.com/repos/rails/rails/issues/7311"`

 literal_json = %q{
  {
    "title": "Authentication With Token Problem",
    "html_url": "https://github.com/rails/rails/issues/7311",
    "state": "open",
    "milestone": null,
    "assignee": null,
    "number": 7311,
    "closed_at": null,
    "updated_at": "2012-08-20T12:56:09Z",
    "body": "So I've been building an API and ran into a rather unique problem: When using a token value for http authentication w/token that has a `=` character I would lose the data after and any `=` characters. This caused a significant confusion and problem.\n\nI set out to fix the problem and discovered some code that might use a refactor:\n\n``` ruby\n# rails/actionpack/lib/action_controller/metal/http_authentication.rb:433\n\n# Parses the token and options out of the token authorization header.\n# If the header looks like this:\n#   Authorization: Token token=\"abc\", nonce=\"def\"\n# Then the returned token is \"abc\", and the options is {:nonce => \"def\"}\n#\n# request - ActionDispatch::Request instance with the current headers.\n#\n# Returns an Array of [String, Hash] if a token is present.\n# Returns nil if no token is found.\ndef token_and_options(request)\n  if request.authorization.to_s[/^Token (.*)/]\n    values = Hash[$1.split(',').map do |value|\n      value.strip!                      # remove any spaces between commas and values\n      key, value = value.split(/\\=\\\"?/) # split key=value pairs\n      if value\n        value.chomp!('\"')                 # chomp trailing \" in value\n        value.gsub!(/\\\\\\\"/, '\"')          # unescape remaining quotes\n        [key, value]\n      end\n    end.compact]\n    [values.delete(\"token\"), values.with_indifferent_access]\n  end\nend\n```\n\nThis code, when given the Authorization text: `Token token=\"abc\", nonce=\"def\"` would produce the value: `[\"abc\", { :nonce => \"def\" }]` which is entirely correct.\n\nHowever, when given the text: `Token token=\"rcHu+HzSFw89Ypyhn/896A==\", nonce=\"def\"` it would result in a truncated value: `[\"rcHu+HzSFw89Ypyhn/896A\", { :nonce => \"def\" }]`. Worse is if your random value contained a `=` earlier in the value, say with the text: `Token token=\"rcHu+=HzSFw89Ypyhn/896A=\", nonce=\"def\"` you would get the return value `[\"rcHu+\", {\"nonce\"=>\"def\"}]`.\n\nThis is obviously problematic, but I wanted to make sure it was correct, so I hunted down the header RFC:\n\n[The RFC2616](http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2) documents the grammar for header field values:\n\n> A string of text is parsed as a single word if it is quoted using double-quote marks.\n>\n>     quoted-string  = ( <\"> *(qdtext | quoted-pair ) <\"> )\n>     qdtext         = <any TEXT except <\">>\n>\n> The backslash character (\"\\\") MAY be used as a single-character quoting mechanism only within quoted-string and comment constructs.\n>\n>     quoted-pair    = \"\\\" CHAR\n\nNow that we know that `=` characters can exist in these values I now had to write code to allow it. This actually lead me to another problem: The text in a field value containing: `\\\\\"` would return the value `\\\"`. So after identifying two problems with the current method I decided to rewrite it and break it up into this:\n\n``` ruby\ndef token_and_options(request)\n  authorization = request.authorization.to_s\n  if authorization[/Token .+/]\n    params = token_params_from authorization_request\n    [params.shift.last, Hash[params].with_indifferent_access]\n  end\nend\n\ndef token_params_from(authorization)\n  raw = authorization.split(/,*\\s+/).drop 1\n  rewrite_param_values params_array_from raw\nend\n\ndef params_array_from(raw_params)\n  raw_params.map { |p| p.split(%r/=(.+)?/) }\nend\n\ndef rewrite_param_values(params_array)\n  params_array.each { |p| p.last.gsub! /^\"|\"$/, '' }\nend\n```\n\nI then ran all the tests. This is where I decided to make an issue instead of a pull request:\n\n```\nFinished tests in 0.046664s, 428.5959 tests/s, 878.6216 assertions/s.\n\n  1) Failure:\ntest_0020_authentication request with valid credential(HttpTokenAuthenticationTest) [test/controller/http_token_authentication_test.rb:111]:\nExpected response to be a <success>, but was <401>\n```\n\nLooking into the test I see this source:\n\n``` ruby\n# rails/actionpack/test/controller/http_token_authentication_test.rb\n\ntest \"authentication request with valid credential\" do\n  @request.env['HTTP_AUTHORIZATION'] = encode_credentials('\"quote\" pretty', :algorithm => 'test')\n  get :display\n\n  assert_response :success\n  assert assigns(:logged_in)\n  assert_equal 'Definitely Maybe', @response.body\nend\n```\n\nFuther I find that the test was spitting out this for the Authorization text: `Token token=\"\\\"quote\\\" pretty\", algorithm=\"test\"`. It appears to me that it's attempting to test escaped strings as field values. The problem is that there are no specifications for this. The only details we have is that the `quoted-string` value **cant** be the `\"` character.\n\nThus the two reasons for this issue:\n\n  1. I don't believe that we should stop the use of the `\"`, because we don't need to.\n  2. This test should be removed as it's not required to test this sort of thing, it's not implied by the RFC.\n\nI don't think we need to follow that rule because the below modified code I wrote has no problem dealing with texts field values with `\"` characters:\n\n``` ruby\ndef token_and_options(request)\n  authorization_request = request.authorization.to_s\n  if authorization_request[/Token .+/]\n    params = token_params_from authorization_request\n    [params.shift.last, Hash[params].with_indifferent_access]\n  end\nend\n\ndef token_params_from(authorization)\n  raw = authorization.sub(/^Token /, '').split(/(?:,|;|\\s)\\s*/)\n  rewrite_param_values params_array_from raw\nend\n\ndef params_array_from(raw_params)\n  raw_params.map { |param| param.split(%r/=(.+)?/) }\nend\n\ndef rewrite_param_values(array_params)\n  array_params.each { |param| param.last.gsub! /^\"|\"$/, '' }\nend\n```\n\nHere are the different cases I've documented (first is old method, second is new method):\n\n``` ruby\n# The text value: Token token=\"lifo\"\n[\"lifo\", {}]\n[\"lifo\", {}]\n\n# The text value: Token token=\"lifo\", algorithm=\"test\"\n[\"lifo\", {\"algorithm\"=>\"test\"}]\n[\"lifo\", {\"algorithm\"=>\"test\"}]\n\n# The text value: Token token=\"rcHu+HzSFw89Ypyhn/896A==\", nonce=\"def\"\n[\"rcHu+HzSFw89Ypyhn/896A\", {\"nonce\"=>\"def\"}]\n[\"rcHu+HzSFw89Ypyhn/896A==\", {\"nonce\"=>\"def\"}]\n\n# The text value: Token token=\"rcHu+=HzSFw89Ypyhn/896A==f34\", nonce=\"def\"\n[\"rcHu+\", {\"nonce\"=>\"def\"}]\n[\"rcHu+=HzSFw89Ypyhn/896A==f34\", {\"nonce\"=>\"def\"}]\n\n# The text value: Token token=\"rcHu+\\\\\\\\\"/896A\", nonce=\"def\"\n[\"rcHu+\\\\\\\"/896A\", {\"nonce\"=>\"def\"}]\n[\"rcHu+\\\\\\\\\\\"/896A\", {\"nonce\"=>\"def\"}]\n\n# The text value: Token token=\"lifo\"; algorithm=\"test\"\n[\"lifo\\\"; algorithm\", {}]\n[\"lifo\", {\"algorithm\"=>\"test\"}]\n\n# The text value: Token token=\"lifo\" algorithm=\"test\"\n[\"lifo\\\" algorithm\", {}]\n[\"lifo\", {\"algorithm\"=>\"test\"}]\n\n# The text value: Token token=\"lifo\"\talgorithm=\"test\"\n[\"lifo\\\"\\talgorithm\", {}]\n[\"lifo\", {\"algorithm\"=>\"test\"}]\n\n# The text value: Token token=\"\\\"quote\\\" pretty\", algorithm=\"test\"\n[\"\\\"quote\\\" pretty\", {\"algorithm\"=>\"test\"}]\n[\"\\\\\\\"quote\\\\\", {\"algorithm\"=>\"test\", \"pretty\"=>nil}]\n```\n\nYou may have noticed also that my code allows for the `;`, ' ', and '\\t' character to be a seperator. Clearly the RFC allows for many more, but I figure this is enough for now.\n",
    "labels": [

    ],
    "user": {
      "gravatar_id": "e1e3a7a63326260b58f82d12f9003e64",
      "avatar_url": "https://secure.gravatar.com/avatar/e1e3a7a63326260b58f82d12f9003e64?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png",
      "url": "https://api.github.com/users/krainboltgreene",
      "login": "krainboltgreene",
      "id": 334809
    },
    "closed_by": null,
    "url": "https://api.github.com/repos/rails/rails/issues/7311",
    "id": 6141700,
    "pull_request": {
      "html_url": null,
      "diff_url": null,
      "patch_url": null
    },
    "created_at": "2012-08-10T01:29:39Z",
    "comments": 8
  }
 }

 JSON.parse curl_json
 puts 'CURL JSON PARSED'

 JSON.parse literal_json
 puts 'LITERAL JSON PARSED'
	#encoding: utf-8

	require 'json'

	curl_json = `curl "https://api.github.com/repos/rails/rails/issues/7311"`

	literal_json = %q{
	{
	"title": "Authentication With Token Problem",
	"html_url": "https://github.com/rails/rails/issues/7311",
	"state": "open",
	"milestone": null,
	"assignee": null,
	"number": 7311,
	"closed_at": null,
	"updated_at": "2012-08-20T12:56:09Z",
	"body": "So I've been building an API and ran into a rather unique problem: When using a token value for http authentication w/token that has a `=` character I would lose the data after and any `=` characters. This caused a significant confusion and problem.\n\nI set out to fix the problem and discovered some code that might use a refactor:\n\n``` ruby\n# rails/actionpack/lib/action_controller/metal/http_authentication.rb:433\n\n# Parses the token and options out of the token authorization header.\n# If the header looks like this:\n# Authorization: Token token=\"abc\", nonce=\"def\"\n# Then the returned token is \"abc\", and the options is {:nonce => \"def\"}\n#\n# request - ActionDispatch::Request instance with the current headers.\n#\n# Returns an Array of [String, Hash] if a token is present.\n# Returns nil if no token is found.\ndef token_and_options(request)\n if request.authorization.to_s[/^Token (.)/]\n values = Hash[$1.split(',').map do \|value\|\n value.strip! # remove any spaces between commas and values\n key, value = value.split(/\\=\\\"?/) # split key=value pairs\n if value\n value.chomp!('\"') # chomp trailing \" in value\n value.gsub!(/\\\\\\\"/, '\"') # unescape remaining quotes\n [key, value]\n end\n end.compact]\n [values.delete(\"token\"), values.with_indifferent_access]\n end\nend\n```\n\nThis code, when given the Authorization text: `Token token=\"abc\", nonce=\"def\"` would produce the value: `[\"abc\", { :nonce => \"def\" }]` which is entirely correct.\n\nHowever, when given the text: `Token token=\"rcHu+HzSFw89Ypyhn/896A==\", nonce=\"def\"` it would result in a truncated value: `[\"rcHu+HzSFw89Ypyhn/896A\", { :nonce => \"def\" }]`. Worse is if your random value contained a `=` earlier in the value, say with the text: `Token token=\"rcHu+=HzSFw89Ypyhn/896A=\", nonce=\"def\"` you would get the return value `[\"rcHu+\", {\"nonce\"=>\"def\"}]`.\n\nThis is obviously problematic, but I wanted to make sure it was correct, so I hunted down the header RFC:\n\n[The RFC2616](http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2) documents the grammar for header field values:\n\n> A string of text is parsed as a single word if it is quoted using double-quote marks.\n>\n> quoted-string = ( <\"> (qdtext \| quoted-pair ) <\"> )\n> qdtext = <any TEXT except <\">>\n>\n> The backslash character (\"\\\") MAY be used as a single-character quoting mechanism only within quoted-string and comment constructs.\n>\n> quoted-pair = \"\\\" CHAR\n\nNow that we know that `=` characters can exist in these values I now had to write code to allow it. This actually lead me to another problem: The text in a field value containing: `\\\\\"` would return the value `\\\"`. So after identifying two problems with the current method I decided to rewrite it and break it up into this:\n\n``` ruby\ndef token_and_options(request)\n authorization = request.authorization.to_s\n if authorization[/Token .+/]\n params = token_params_from authorization_request\n [params.shift.last, Hash[params].with_indifferent_access]\n end\nend\n\ndef token_params_from(authorization)\n raw = authorization.split(/,\\s+/).drop 1\n rewrite_param_values params_array_from raw\nend\n\ndef params_array_from(raw_params)\n raw_params.map { \|p\| p.split(%r/=(.+)?/) }\nend\n\ndef rewrite_param_values(params_array)\n params_array.each { \|p\| p.last.gsub! /^\"\|\"$/, '' }\nend\n```\n\nI then ran all the tests. This is where I decided to make an issue instead of a pull request:\n\n```\nFinished tests in 0.046664s, 428.5959 tests/s, 878.6216 assertions/s.\n\n 1) Failure:\ntest_0020_authentication request with valid credential(HttpTokenAuthenticationTest) [test/controller/http_token_authentication_test.rb:111]:\nExpected response to be a <success>, but was <401>\n```\n\nLooking into the test I see this source:\n\n``` ruby\n# rails/actionpack/test/controller/http_token_authentication_test.rb\n\ntest \"authentication request with valid credential\" do\n @request.env['HTTP_AUTHORIZATION'] = encode_credentials('\"quote\" pretty', :algorithm => 'test')\n get :display\n\n assert_response :success\n assert assigns(:logged_in)\n assert_equal 'Definitely Maybe', @response.body\nend\n```\n\nFuther I find that the test was spitting out this for the Authorization text: `Token token=\"\\\"quote\\\" pretty\", algorithm=\"test\"`. It appears to me that it's attempting to test escaped strings as field values. The problem is that there are no specifications for this. The only details we have is that the `quoted-string` value cant* be the `\"` character.\n\nThus the two reasons for this issue:\n\n 1. I don't believe that we should stop the use of the `\"`, because we don't need to.\n 2. This test should be removed as it's not required to test this sort of thing, it's not implied by the RFC.\n\nI don't think we need to follow that rule because the below modified code I wrote has no problem dealing with texts field values with `\"` characters:\n\n``` ruby\ndef token_and_options(request)\n authorization_request = request.authorization.to_s\n if authorization_request[/Token .+/]\n params = token_params_from authorization_request\n [params.shift.last, Hash[params].with_indifferent_access]\n end\nend\n\ndef token_params_from(authorization)\n raw = authorization.sub(/^Token /, '').split(/(?:,\|;\|\\s)\\s*/)\n rewrite_param_values params_array_from raw\nend\n\ndef params_array_from(raw_params)\n raw_params.map { \|param\| param.split(%r/=(.+)?/) }\nend\n\ndef rewrite_param_values(array_params)\n array_params.each { \|param\| param.last.gsub! /^\"\|\"$/, '' }\nend\n```\n\nHere are the different cases I've documented (first is old method, second is new method):\n\n``` ruby\n# The text value: Token token=\"lifo\"\n[\"lifo\", {}]\n[\"lifo\", {}]\n\n# The text value: Token token=\"lifo\", algorithm=\"test\"\n[\"lifo\", {\"algorithm\"=>\"test\"}]\n[\"lifo\", {\"algorithm\"=>\"test\"}]\n\n# The text value: Token token=\"rcHu+HzSFw89Ypyhn/896A==\", nonce=\"def\"\n[\"rcHu+HzSFw89Ypyhn/896A\", {\"nonce\"=>\"def\"}]\n[\"rcHu+HzSFw89Ypyhn/896A==\", {\"nonce\"=>\"def\"}]\n\n# The text value: Token token=\"rcHu+=HzSFw89Ypyhn/896A==f34\", nonce=\"def\"\n[\"rcHu+\", {\"nonce\"=>\"def\"}]\n[\"rcHu+=HzSFw89Ypyhn/896A==f34\", {\"nonce\"=>\"def\"}]\n\n# The text value: Token token=\"rcHu+\\\\\\\\\"/896A\", nonce=\"def\"\n[\"rcHu+\\\\\\\"/896A\", {\"nonce\"=>\"def\"}]\n[\"rcHu+\\\\\\\\\\\"/896A\", {\"nonce\"=>\"def\"}]\n\n# The text value: Token token=\"lifo\"; algorithm=\"test\"\n[\"lifo\\\"; algorithm\", {}]\n[\"lifo\", {\"algorithm\"=>\"test\"}]\n\n# The text value: Token token=\"lifo\" algorithm=\"test\"\n[\"lifo\\\" algorithm\", {}]\n[\"lifo\", {\"algorithm\"=>\"test\"}]\n\n# The text value: Token token=\"lifo\"\talgorithm=\"test\"\n[\"lifo\\\"\\talgorithm\", {}]\n[\"lifo\", {\"algorithm\"=>\"test\"}]\n\n# The text value: Token token=\"\\\"quote\\\" pretty\", algorithm=\"test\"\n[\"\\\"quote\\\" pretty\", {\"algorithm\"=>\"test\"}]\n[\"\\\\\\\"quote\\\\\", {\"algorithm\"=>\"test\", \"pretty\"=>nil}]\n```\n\nYou may have noticed also that my code allows for the `;`, ' ', and '\\t' character to be a seperator. Clearly the RFC allows for many more, but I figure this is enough for now.\n",
	"labels": [

	],
	"user": {
	"gravatar_id": "e1e3a7a63326260b58f82d12f9003e64",
	"avatar_url": "https://secure.gravatar.com/avatar/e1e3a7a63326260b58f82d12f9003e64?d=https://a248.e.akamai.net/assets.github.com%2Fimages%2Fgravatars%2Fgravatar-user-420.png",
	"url": "https://api.github.com/users/krainboltgreene",
	"login": "krainboltgreene",
	"id": 334809
	},
	"closed_by": null,
	"url": "https://api.github.com/repos/rails/rails/issues/7311",
	"id": 6141700,
	"pull_request": {
	"html_url": null,
	"diff_url": null,
	"patch_url": null
	},
	"created_at": "2012-08-10T01:29:39Z",
	"comments": 8
	}
	}

	JSON.parse curl_json
	puts 'CURL JSON PARSED'

	JSON.parse literal_json
	puts 'LITERAL JSON PARSED'