@kennyluck
Created July 10, 2011 01:52
Twitter doesn't sanitize certain control characters in JSON output of API

Problem

Twitter's API doesn't sanitize control characters such as U+0010 (Data Link Escape) in its JSON output, although it replaces them with "*" in its XML output. Compare the text field in the two attached files. This becomes a problem when a third-party site such as Gtweet generates an XML-based format (in this case RSS) from the JSON: unless the third-party site sanitizes the text itself, the result is malformed XML.

In fact, this already broke my feed reader.
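
To make the failure concrete, here is a minimal sketch of a check, assuming the XML 1.0 `Char` production (below U+0020 it allows only U+0009, U+000A, and U+000D). The helper name `findInvalidXmlChars` is hypothetical; the sample string is the text field from the attached JSON output.

```javascript
// Hypothetical helper: list the code points in a string that XML 1.0 forbids.
// XML 1.0: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function findInvalidXmlChars(str) {
  const invalid = [];
  // for...of iterates by code point, so surrogate pairs are handled as one character.
  for (const ch of str) {
    const cp = ch.codePointAt(0);
    const ok =
      cp === 0x9 || cp === 0xa || cp === 0xd ||
      (cp >= 0x20 && cp <= 0xd7ff) ||
      (cp >= 0xe000 && cp <= 0xfffd) ||
      (cp >= 0x10000 && cp <= 0x10ffff);
    if (!ok) invalid.push(cp);
  }
  return invalid;
}

// The "text" field as it appears in the attached JSON output.
const json = '{"text":"\\u0010@far \\u597d\\u50cf\\u4e5f\\u662f\\u88ab\\u9019\\u6a23\\u8ce3\\u6389\\u7684..."}';
const tweet = JSON.parse(json);

console.log(tweet.text.charCodeAt(0));        // 16
console.log(findInvalidXmlChars(tweet.text)); // [ 16 ]
```

The JSON itself is well-formed, since `\u0010` is a legal escape there. The trouble only starts once the parsed string is written into XML without filtering.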

How to reproduce

The attached files were generated by

curl http://api.twitter.com/1/statuses/show/89719449015427072.json

and

curl http://api.twitter.com/1/statuses/show/89719449015427072.xml

Possible solutions

Any one of the following three changes would solve my problem.

  1. Since the XML output already sanitizes these control characters, applying the same sanitization to the JSON output probably wouldn't cost much performance.

  2. Gtweet could sanitize the text when generating RSS. This might be the most reasonable option, though I am not sure.

  3. Twitter could sanitize user input, since control characters are considered invalid in raw HTML and shouldn't end up in the DOM when you open, say, the example tweet.
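
Solutions 1 and 2 come down to the same filtering step, applied either by Twitter or by the third-party site. The following is a minimal sketch of that step, assuming XML 1.0 character rules and the "*" replacement seen in the attached XML output. The names `sanitizeForXml` and `escapeXml` are hypothetical and are not taken from Twitter's or Gtweet's code.

```javascript
// Hypothetical sketch: replace every character that XML 1.0 forbids with "*",
// mirroring what the attached XML output shows for U+0010.
// The "u" flag makes the class match by code point, so lone surrogates are
// replaced while valid characters above U+FFFF are kept.
const INVALID_XML_CHARS =
  /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

function sanitizeForXml(str) {
  return str.replace(INVALID_XML_CHARS, "*");
}

// Sanitize first, then escape markup characters so the result can be used as element content.
function escapeXml(str) {
  return sanitizeForXml(str)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// First characters of the example tweet's text field.
const text = "\u0010@far \u597d\u50cf";
console.log(escapeXml(text)); // *@far 好像
```

Replacing rather than dropping the character keeps the output consistent with the XML endpoint. A site that prefers to drop it could pass an empty string as the replacement instead.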

{
"place":null,
"user":{
"protected":false,
"default_profile":false,
"contributors_enabled":false,
"profile_text_color":"333333",
"name":"xdite",
"default_profile_image":false,
"profile_sidebar_fill_color":"F6F6F6",
"id_str":"5981612",
"profile_background_tile":false,
"utc_offset":28800,
"friends_count":390,
"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/1360519344\/avatar2_normal.jpg",
"is_translator":false,
"following":true,
"description":"Rails Developer, Tech Blogger. Architect of Techbang",
"location":"Taiwan",
"follow_request_sent":false,
"verified":false,
"profile_link_color":"038543",
"followers_count":2206,
"screen_name":"xdite",
"profile_sidebar_border_color":"EEEEEE",
"url":"http:\/\/blog.xdite.net",
"show_all_inline_media":false,
"geo_enabled":true,
"time_zone":"Taipei",
"id":5981612,
"notifications":false,
"profile_use_background_image":true,
"favourites_count":462,
"created_at":"Sat May 12 05:09:35 +0000 2007",
"listed_count":99,
"profile_background_image_url_https":"https:\/\/si0.twimg.com\/images\/themes\/theme18\/bg.gif",
"profile_background_color":"ACDED6",
"lang":"en",
"statuses_count":11767,
"profile_background_image_url":"http:\/\/a1.twimg.com\/images\/themes\/theme18\/bg.gif",
"profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/1360519344\/avatar2_normal.jpg"},
"in_reply_to_user_id":null,
"retweet_count":0,
"id_str":"89719449015427072",
"geo":null,
"favorited":false,
"text":"\u0010@far \u597d\u50cf\u4e5f\u662f\u88ab\u9019\u6a23\u8ce3\u6389\u7684...",
"in_reply_to_status_id_str":null,
"in_reply_to_screen_name":null,
"in_reply_to_user_id_str":null,
"coordinates":null,
"truncated":false,
"contributors":null,
"retweeted":false,
"id":89719449015427072,
"source":"\u003Ca href=\"http:\/\/itunes.apple.com\/us\/app\/twitter\/id409789998?mt=12\" rel=\"nofollow\"\u003ETwitter for Mac\u003C\/a\u003E",
"created_at":"Sat Jul 09 15:35:58 +0000 2011",
"in_reply_to_status_id":null
}
<?xml version="1.0" encoding="UTF-8"?>
<status>
<created_at>Sat Jul 09 15:35:58 +0000 2011</created_at>
<id>89719449015427072</id>
<text>*@far &#22909;&#20687;&#20063;&#26159;&#34987;&#36889;&#27171;&#36067;&#25481;&#30340;...</text>
<source>&lt;a href="http://itunes.apple.com/us/app/twitter/id409789998?mt=12" rel="nofollow"&gt;Twitter for Mac&lt;/a&gt;</source>
<truncated>false</truncated>
<favorited>false</favorited>
<in_reply_to_status_id></in_reply_to_status_id>
<in_reply_to_user_id></in_reply_to_user_id>
<in_reply_to_screen_name></in_reply_to_screen_name>
<retweet_count>0</retweet_count>
<retweeted>false</retweeted>
<user>
<id>5981612</id>
<name>xdite</name>
<screen_name>xdite</screen_name>
<location>Taiwan</location>
<description>Rails Developer, Tech Blogger. Architect of Techbang</description>
<profile_image_url>http://a0.twimg.com/profile_images/1360519344/avatar2_normal.jpg</profile_image_url>
<profile_image_url_https>https://si0.twimg.com/profile_images/1360519344/avatar2_normal.jpg</profile_image_url_https>
<url>http://blog.xdite.net</url>
<protected>false</protected>
<followers_count>2206</followers_count>
<profile_background_color>ACDED6</profile_background_color>
<profile_text_color>333333</profile_text_color>
<profile_link_color>038543</profile_link_color>
<profile_sidebar_fill_color>F6F6F6</profile_sidebar_fill_color>
<profile_sidebar_border_color>EEEEEE</profile_sidebar_border_color>
<friends_count>390</friends_count>
<created_at>Sat May 12 05:09:35 +0000 2007</created_at>
<favourites_count>462</favourites_count>
<utc_offset>28800</utc_offset>
<time_zone>Taipei</time_zone>
<profile_background_image_url>http://a1.twimg.com/images/themes/theme18/bg.gif</profile_background_image_url>
<profile_background_image_url_https>https://si0.twimg.com/images/themes/theme18/bg.gif</profile_background_image_url_https>
<profile_background_tile>false</profile_background_tile>
<profile_use_background_image>true</profile_use_background_image>
<notifications>true</notifications>
<geo_enabled>true</geo_enabled>
<verified>false</verified>
<following>true</following>
<statuses_count>11767</statuses_count>
<lang>en</lang>
<contributors_enabled>false</contributors_enabled>
<follow_request_sent>false</follow_request_sent>
<listed_count>99</listed_count>
<show_all_inline_media>false</show_all_inline_media>
<default_profile>false</default_profile>
<default_profile_image>false</default_profile_image>
<is_translator>false</is_translator>
</user>
<geo/>
<coordinates/>
<place/>
<contributors/>
</status>
// A control character appears in the DOM when http://twitter.com/xdite/statuses/89719449015427072 is opened.
// To verify, run the following in Web Inspector/Firebug:
var textDiv = document.getElementsByClassName("tweet-text")[0];
// firstChild is expected to be the text node holding the tweet text; read its first character code.
textDiv.firstChild.data.charCodeAt(0);
// >> 16 (U+0010, Data Link Escape)