Created June 20, 2012 15:07
Example Python code showing cURLing and POSTing data to Elastic Search, which fails when the message contains escaped speech marks
############# My Clusters Health
curl -XGET 'http://127.0.0.1:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "TweetHadoop",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 15,
  "active_shards" : 15,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 15
}
############# Works fine for posting data
import urllib
import urllib2
url = 'http://localhost:9200/twitter/tweet/22499999'
data = '{"user" : "helenax33","message" : "LIVE: http://www.justin.tv/xxhelenaxx i look like a can read+sound like a man.","pubDate" : "20090705T21:46:34","isAQuestion" : "0" }'
#data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
##### Also this works fine for posting data.
import pycurl
apiURL = 'http://localhost:9200/twitter/tweet/22499999'
c = pycurl.Curl()
c.setopt(c.URL, apiURL)
c.setopt(c.POSTFIELDS, '{ "user" : "helenax33", "message" : "LIVE: http://www.justin.tv/xxhelenaxx i look like a can read+sound like a man.", "pubDate" : "20090705T21:46:34", "isAQuestion" : "0" }')
c.setopt(c.VERBOSE, True)
c.perform()
# However, if I want to put a speech mark (") in the message part, then it fails e.g.
import pycurl
apiURL = 'http://localhost:9200/twitter/tweet/22499999'
c = pycurl.Curl()
c.setopt(c.URL, apiURL)
c.setopt(c.POSTFIELDS, '{ "user" : "helenax33", "message" : "LIVE: http://www.justin.tv/xxhelenaxx i " look like a can read+sound like a man.", "pubDate" : "20090705T21:46:34", "isAQuestion" : "0" }')
c.setopt(c.VERBOSE, True)
c.perform()
# So I tried to escape the speech mark e.g.
import pycurl
apiURL = 'http://localhost:9200/twitter/tweet/22499999'
c = pycurl.Curl()
c.setopt(c.URL, apiURL)
c.setopt(c.POSTFIELDS, '{ "user" : "helenax33", "message" : "LIVE: http://www.justin.tv/xxhelenaxx i \" look like a can read+sound like a man.", "pubDate" : "20090705T21:46:34", "isAQuestion" : "0" }')
c.setopt(c.VERBOSE, True)
c.perform()
# However, this still fails. Please help :(
i.e.
{ "user" : "someusername", "message" : "something they are tweeting about that contains \"speech marks\".", "pubDate" : "20090705T21:46:34"}
Hey Sean,
Yup, tried it, doesn't work. :( Going to replace " and ' with some 'unused'
Unicode character, then replace them back to speech marks and apostrophes
when displayed to users.
Not ideal, but should work. Cheers for your input though.
Jon
On 21 June 2012 11:18, Sean Handley wrote:
> i.e.
> { "user" : "someusername", "message" : "something they are tweeting about that contains \"speech marks\".", "pubDate" : "20090705T21:46:34"}
No, not ideal. Surely there's a Python JSON lib that will parse/encode for you? In Ruby you can just say .to_json and it works (escaping all dodgy chars also).
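A minimal sketch of that suggestion in Python, assuming the standard library `json` module is acceptable here: build the document as a dict and let `json.dumps` handle the escaping. The field values are copied from the gist; the names `doc` and `payload` are illustrative only.

```python
import json

# Document fields copied from the gist; the message contains a literal speech mark.
doc = {
    "user": "helenax33",
    "message": 'LIVE: http://www.justin.tv/xxhelenaxx i " look like a can read+sound like a man.',
    "pubDate": "20090705T21:46:34",
    "isAQuestion": "0",
}

# json.dumps escapes the inner speech mark as \" so the payload stays valid JSON.
payload = json.dumps(doc)
print(payload)
```

The resulting string could then be passed as the request body, for example as `data` in `urllib2.Request(url, data)` or via `c.setopt(c.POSTFIELDS, payload)` as in the gist.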
Ah. Looks from that that you're escaping the whole JSON string. And that's confusing it. The JSON string's quote marks should remain unescaped - it's the quote marks INSIDE the message you want to sort out. So re.escape(message) and then interpolate that into your JSON string. Hopefully you'll have a winner :-)
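One possible reason the escaped attempt in the gist still fails: in an ordinary (non-raw) Python string literal, `\"` is itself an escape sequence for a plain `"`, so the backslash may never reach the server. The sketch below checks that, then escapes only the message before interpolating it into the JSON string. It swaps in `json.dumps` for the escaping step as an assumption, since `re.escape` targets regular-expression metacharacters rather than JSON; `message` and `body` are illustrative names.

```python
import json

# In a normal Python literal, \" collapses to a bare speech mark.
assert '\"' == '"'
# A raw string (or a doubled backslash) keeps the backslash in the data.
assert r'\"' == '\\"' and len(r'\"') == 2

message = 'something they are tweeting about that contains "speech marks".'

# json.dumps on a str returns it already quoted and escaped,
# so the outer JSON quote marks are left untouched.
body = ('{ "user" : "someusername", "message" : %s, '
        '"pubDate" : "20090705T21:46:34" }') % json.dumps(message)

# Round-trip check: the body parses and the message survives intact.
assert json.loads(body)["message"] == message
print(body)
```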