############# My Cluster's Health
curl -XGET 'http://127.0.0.1:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "TweetHadoop",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 15,
  "active_shards" : 15,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 15
}
############# Works fine for posting data
import urllib
import urllib2
url = 'http://localhost:9200/twitter/tweet/22499999'
data = '{"user" : "helenax33","message" : "LIVE: http://www.justin.tv/xxhelenaxx i look like a can read+sound like a man.","pubDate" : "20090705T21:46:34","isAQuestion" : "0" }'
#data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
##### Also this works Fine for posting data.
import pycurl
apiURL = 'http://localhost:9200/twitter/tweet/22499999'
c = pycurl.Curl()
c.setopt(c.URL, apiURL)
c.setopt(c.POSTFIELDS, '{ "user" : "helenax33", "message" : "LIVE: http://www.justin.tv/xxhelenaxx i look like a can read+sound like a man.", "pubDate" : "20090705T21:46:34", "isAQuestion" : "0" }')
c.setopt(c.VERBOSE, True)
c.perform()
# However, if I want to put a speech mark (") in the message part, then it fails e.g.
import pycurl
apiURL = 'http://localhost:9200/twitter/tweet/22499999'
c = pycurl.Curl()
c.setopt(c.URL, apiURL)
c.setopt(c.POSTFIELDS, '{ "user" : "helenax33", "message" : "LIVE: http://www.justin.tv/xxhelenaxx i " look like a can read+sound like a man.", "pubDate" : "20090705T21:46:34", "isAQuestion" : "0" }')
c.setopt(c.VERBOSE, True)
c.perform()
# So I tried to escape the speech mark e.g.
import pycurl
apiURL = 'http://localhost:9200/twitter/tweet/22499999'
c = pycurl.Curl()
c.setopt(c.URL, apiURL)
c.setopt(c.POSTFIELDS, '{ "user" : "helenax33", "message" : "LIVE: http://www.justin.tv/xxhelenaxx i \" look like a can read+sound like a man.", "pubDate" : "20090705T21:46:34", "isAQuestion" : "0" }')
c.setopt(c.VERBOSE, True)
c.perform()
# However, this still fails. Please help :(
That doesn't seem to help. See the input below:
import pycurl
import re
apiURL = 'http://localhost:9200/twitter/tweet/22499999'
data = '{ "user" : "someusername", "message" : "something they are tweeting about", "pubDate" : "20090705T21:46:34"}'
cleandata = re.escape(data)
c = pycurl.Curl()
c.setopt(c.URL, apiURL)
c.setopt(c.POSTFIELDS, cleandata)
c.setopt(c.VERBOSE, True)
c.perform()
Below is the output from running the above input:
python curltest.py
* About to connect() to localhost port 9200 (#0)
*   Trying ::1... * connected
* Connected to localhost (::1) port 9200 (#0)
> POST /twitter/tweet/22499999 HTTP/1.1
> User-Agent: PycURL/7.19.5
> Host: localhost:9200
> Accept: */*
> Content-Length: 142
> Content-Type: application/x-www-form-urlencoded
< HTTP/1.1 400 Bad Request
< Content-Type: application/json; charset=UTF-8
< Content-Length: 263
<
* Connection #0 to host localhost left intact
* Closing connection #0
{"error":"MapperParsingException[Failed to parse]; nested: JsonParseException[Unexpected character ('\' (code 92)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')\n at [Source: [B@296a13c9; line: 1, column: 2]]; ","status":400}
Ah. It looks from that output as if you're escaping the whole JSON string, and that's confusing the parser. The JSON string's own quote marks should remain unescaped; it's the quote marks INSIDE the message that you want to sort out. So re.escape(message) and then interpolate that into your JSON string. Hopefully you'll have a winner :-)
i.e.
{ "user" : "someusername", "message" : "something they are tweeting about that contains \"speech marks\".", "pubDate" : "20090705T21:46:34"}
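A minimal sketch of that distinction, assuming Python 3 and using the standard library's `json.loads` as a stand-in for the server's parser (the error above comes from a Java parser, so this only illustrates the idea). The `parses` helper and the shortened sample documents are hypothetical, made up for illustration. It also shows a detail of Python string literals that may explain the earlier failure: in a normal (non-raw) literal, `\"` collapses to a bare quote, so the backslash has to be written as `\\"` to survive into the request body.

```python
import json
import re


def parses(text):
    """Return True if text is valid JSON (json.loads is a stand-in parser)."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


doc = '{ "user" : "someusername", "message" : "hello" }'

# Escaping the WHOLE document puts a backslash in front of the opening brace,
# so the result no longer starts like a JSON object.
escaped_whole = re.escape(doc)

# Escaping only the quotes inside the message value keeps the document valid.
# "\\" in the Python literal produces one real backslash in the string.
escaped_inner = '{ "message" : "say \\"hi\\"" }'

# In a non-raw literal, \" is just a quote: no backslash ends up in the string.
collapsed = '{ "message" : "say \"hi\"" }'

print(parses(doc), parses(escaped_whole), parses(escaped_inner), parses(collapsed))
```

Here only `doc` and `escaped_inner` parse. `escaped_whole` begins with a backslash (character code 92, as in the error message), and `collapsed` contains an unescaped quote in the middle of the message.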
No, not ideal. Surely there's a Python JSON library that will parse and encode for you? In Ruby you can just call .to_json and it works (escaping all the dodgy characters too).
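On the library question: Python's standard library includes a `json` module, and `json.dumps` takes care of the escaping (whereas `re.escape` is meant for regular expressions, not JSON). A minimal sketch, assuming Python 3; the `tweet` and `body` names are illustrative, and the field values are copied from the gist, including the troublesome speech mark.

```python
import json

# Build the document as a normal Python dict; no manual escaping is needed.
tweet = {
    "user": "helenax33",
    "message": 'LIVE: http://www.justin.tv/xxhelenaxx i " look like a can read+sound like a man.',
    "pubDate": "20090705T21:46:34",
    "isAQuestion": "0",
}

# json.dumps serialises the dict and escapes the inner quote as \" for us.
body = json.dumps(tweet)
print(body)

# Round trip: decoding the body gives back the original values.
assert json.loads(body) == tweet
```

The resulting `body` string could then be passed to `c.setopt(c.POSTFIELDS, body)` in place of the hand-written literal. That step is left out here because it needs pycurl and a running server.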
http://stackoverflow.com/questions/4202538/python-escape-special-characters ?