@jonhurlock
Created June 20, 2012 15:07
Example Python code showing cURLing and POSTing data to Elasticsearch; it fails with escaped speech marks
############# My Cluster's Health
curl -XGET 'http://127.0.0.1:9200/_cluster/health?pretty=true'
{
"cluster_name" : "TweetHadoop",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 15,
"active_shards" : 15,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 15
}
############# Works fine for posting data
import urllib
import urllib2
url = 'http://localhost:9200/twitter/tweet/22499999'
data = '{"user" : "helenax33","message" : "LIVE: http://www.justin.tv/xxhelenaxx i look like a can read+sound like a man.","pubDate" : "20090705T21:46:34","isAQuestion" : "0" }'
#data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
##### This also works fine for posting data.
import pycurl
apiURL = 'http://localhost:9200/twitter/tweet/22499999'
c = pycurl.Curl()
c.setopt(c.URL, apiURL)
c.setopt(c.POSTFIELDS, '{ "user" : "helenax33", "message" : "LIVE: http://www.justin.tv/xxhelenaxx i look like a can read+sound like a man.", "pubDate" : "20090705T21:46:34", "isAQuestion" : "0" }')
c.setopt(c.VERBOSE, True)
c.perform()
# However, if I want to put a speech mark (") in the message part, then it fails e.g.
import pycurl
apiURL = 'http://localhost:9200/twitter/tweet/22499999'
c = pycurl.Curl()
c.setopt(c.URL, apiURL)
c.setopt(c.POSTFIELDS, '{ "user" : "helenax33", "message" : "LIVE: http://www.justin.tv/xxhelenaxx i " look like a can read+sound like a man.", "pubDate" : "20090705T21:46:34", "isAQuestion" : "0" }')
c.setopt(c.VERBOSE, True)
c.perform()
# So I tried to escape the speech mark e.g.
import pycurl
apiURL = 'http://localhost:9200/twitter/tweet/22499999'
c = pycurl.Curl()
c.setopt(c.URL, apiURL)
c.setopt(c.POSTFIELDS, '{ "user" : "helenax33", "message" : "LIVE: http://www.justin.tv/xxhelenaxx i \" look like a can read+sound like a man.", "pubDate" : "20090705T21:46:34", "isAQuestion" : "0" }')
c.setopt(c.VERBOSE, True)
c.perform()
# However, this still fails. Please help :(
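
One likely reason the escaped version still fails: inside an ordinary Python string literal, \" is itself an escape sequence for a plain ", so the backslash may never reach Elasticsearch. The sketch below is only an illustration under assumptions (Python 3, a shortened made-up message, and a hypothetical helper named is_valid_json); it compares a normal literal, a doubled backslash, and a raw string.

```python
import json

# In a normal string literal, Python turns \" into a bare ", so no backslash survives.
plain = '{ "message" : "i \" look" }'
# Doubling the backslash, or using a raw string, keeps a literal backslash in the payload.
doubled = '{ "message" : "i \\" look" }'
raw = r'{ "message" : "i \" look" }'


def is_valid_json(text):
    """Hypothetical helper: True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


print(is_valid_json(plain))    # backslash was consumed by Python before JSON parsing
print(is_valid_json(doubled))  # backslash reaches the JSON parser
print(is_valid_json(raw))      # same payload as the doubled form
```

If this is what is happening, the pycurl call above sends exactly the same bytes as the unescaped attempt, which would explain the identical failure.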
@jonhurlock

Doesn't seem to help. See the input below:

import pycurl
import re

apiURL = 'http://localhost:9200/twitter/tweet/22499999'
data = '{ "user" : "someusername", "message" : "something they are tweeting about", "pubDate" : "20090705T21:46:34"}'
cleandata = re.escape(data)
c = pycurl.Curl()
c.setopt(c.URL, apiURL)
c.setopt(c.POSTFIELDS, cleandata)
c.setopt(c.VERBOSE, True)
c.perform()

Below is the output from running the above input:

python curltest.py

* About to connect() to localhost port 9200 (#0)
*   Trying ::1... * connected
* Connected to localhost (::1) port 9200 (#0)
> POST /twitter/tweet/22499999 HTTP/1.1
> User-Agent: PycURL/7.19.5
> Host: localhost:9200
> Accept: */*
> Content-Length: 142
> Content-Type: application/x-www-form-urlencoded
>
< HTTP/1.1 400 Bad Request
< Content-Type: application/json; charset=UTF-8
< Content-Length: 263
<
* Connection #0 to host localhost left intact
* Closing connection #0
{"error":"MapperParsingException[Failed to parse]; nested: JsonParseException[Unexpected character ('\' (code 92)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n at [Source: [B@296a13c9; line: 1, column: 2]]; ","status":400}

@seanhandley

Ah. From that, it looks like you're escaping the whole JSON string, and that's confusing the parser. The JSON string's own quote marks should remain unescaped; it's the quote marks INSIDE the message you want to sort out. So re.escape(message) and then interpolate that into your JSON string. Hopefully you'll have a winner :-)

@seanhandley

i.e.

{ "user" : "someusername", "message" : "something they are tweeting about that contains \"speech marks\".", "pubDate" : "20090705T21:46:34"}
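
A hedged sketch of producing a body like the one above from Python: escape only the message, then interpolate it. Note that this swaps in json.dumps for the escaping step rather than re.escape, since re.escape targets regular-expression syntax and can add backslashes (before spaces, for instance) that JSON does not accept. Python 3 is assumed, and the variable names are made up for illustration.

```python
import json

# Hypothetical tweet text containing speech marks.
message = 'something they are tweeting about that contains "speech marks".'

# json.dumps on a str returns a quoted JSON string literal with inner quotes escaped.
encoded_message = json.dumps(message)

# Interpolate the already-quoted message; the surrounding JSON quotes stay untouched.
body = (
    '{ "user" : "someusername", "message" : %s, "pubDate" : "20090705T21:46:34"}'
    % encoded_message
)

print(body)
```

Because encoded_message already carries its own surrounding quotes, the template uses a bare %s rather than "%s".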

@jonhurlock

jonhurlock commented Jun 21, 2012 via email

@seanhandley

No, not ideal. Surely there's a Python JSON lib that will parse/encode for you? In Ruby you can just say .to_json and it works (escaping all dodgy chars also).
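
Python's standard library does ship such a module: json. Here is a minimal sketch, assuming Python 3 (the gist itself uses Python 2's urllib2). The helper name build_index_request is hypothetical, and the Content-Type header is an assumption that does not appear in the original requests. The request is only built here, not sent.

```python
import json
import urllib.request


def build_index_request(url, document):
    """Serialize document to JSON and wrap it in a POST request (not sent here)."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: declare the body as JSON instead of the form-encoded default.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


tweet = {
    "user": "helenax33",
    "message": 'LIVE: http://www.justin.tv/xxhelenaxx i " look like a can read+sound like a man.',
    "pubDate": "20090705T21:46:34",
    "isAQuestion": "0",
}
request = build_index_request("http://localhost:9200/twitter/tweet/22499999", tweet)
print(request.data.decode("utf-8"))
# To actually send it (needs a running node): urllib.request.urlopen(request)
```

Building the document as a dict and letting json.dumps do the quoting means speech marks, backslashes, and newlines in the message are escaped without any manual string handling.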
