@rgbkrk
Created January 2, 2014 23:16

I'm working on something that queries the StackExchange API for tagged questions.

A call to the search API looks like this:

/2.1/search?order=desc&sort=activity&tagged=python;ruby&site=stackoverflow

Search requires tags to be separated by semicolons. That's not a problem, and everything in my module works as I expect.
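As a rough sketch of how such a request gets built (this uses Python 3's standard library and is not the code from my module), joining the tags with ";" and URL-encoding the parameters yields the query above. The one difference is that the semicolon is percent-encoded as %3B, which is also what requests sends, as the qstring value in the ipdb session below shows.

```python
from urllib.parse import urlencode

# Tags are joined with ";" because the search endpoint expects that separator.
tags = ["python", "ruby"]
params = {
    "order": "desc",
    "sort": "activity",
    "tagged": ";".join(tags),
    "site": "stackoverflow",
}

# urlencode percent-encodes ";" as %3B inside the value.
query = urlencode(params)
url = "https://api.stackexchange.com/2.1/search?" + query
print(query)  # order=desc&sort=activity&tagged=python%3Bruby&site=stackoverflow
```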

To test this, I brought in httpretty to mock the API calls, since it already does the mocking for all my other calls.

When I use httpretty, though, the tagged field of the query string gets truncated to just python. What gives?

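import requests
import httpretty

# Start intercepting HTTP requests at the socket level
httpretty.enable()

# Stub the search endpoint with an empty result set
httpretty.register_uri(httpretty.GET, "https://api.stackexchange.com/2.1/search", body='{"items":[]}')
# requests percent-encodes the ";" in the tag list as %3B
resp = requests.get("https://api.stackexchange.com/2.1/search", params={"tagged": "python;ruby"})
# Inspect the parsed query string httpretty recorded for the last request
httpretty_request = httpretty.last_request()
print(httpretty_request.querystring)

httpretty.disable()
httpretty.reset()
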
(semicolon) ~/code/HTTPretty$ python query_string_breakage.py
{u'tagged': [u'python']}

After some digging into httpretty's core code and a short ipdb session, I found the culprit.

In [1]: %run -d query_string_breakage.py
> /Users/rgbkrk/code/HTTPretty/query_string_breakage.py(5)<module>()
      4
----> 5 import requests
      6 import httpretty

ipdb> b httpretty/core.py:172
Breakpoint 1 at /Users/rgbkrk/code/HTTPretty/httpretty/core.py:172
ipdb> c
> /Users/rgbkrk/code/HTTPretty/httpretty/core.py(172)__init__()
    171         qstring = self.path.split("?", 1)[-1]
1-> 172         self.querystring = self.parse_querystring(qstring)
    173

ipdb> qstring
u'tagged=python%3Bruby'
ipdb> s
--Call--
> /Users/rgbkrk/code/HTTPretty/httpretty/core.py(185)parse_querystring()
    184
--> 185     def parse_querystring(self, qs):
    186         expanded = unquote_utf8(qs)

ipdb> n
> /Users/rgbkrk/code/HTTPretty/httpretty/core.py(186)parse_querystring()
    185     def parse_querystring(self, qs):
--> 186         expanded = unquote_utf8(qs)
    187         parsed = parse_qs(expanded)

ipdb> n
> /Users/rgbkrk/code/HTTPretty/httpretty/core.py(187)parse_querystring()
    186         expanded = unquote_utf8(qs)
--> 187         parsed = parse_qs(expanded)
    188         result = {}

ipdb> expanded
u'tagged=python;ruby'
ipdb> n
> /Users/rgbkrk/code/HTTPretty/httpretty/core.py(188)parse_querystring()
    187         parsed = parse_qs(expanded)
--> 188         result = {}
    189         for k in parsed:

ipdb> parsed
{u'tagged': [u'python']}
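The two steps inside parse_querystring can be replayed in isolation. This is a sketch assuming Python 3.10+ (for the separator argument): urllib.parse.unquote stands in for httpretty's unquote_utf8, and separator=";" mimics the 2014-era parse_qs, which split on both & and ;. This particular string contains no &, so splitting on ";" alone is equivalent here.

```python
from urllib.parse import parse_qs, unquote

# The raw query string seen in the ipdb session.
qstring = "tagged=python%3Bruby"

# Step 1: decoding first turns %3B back into a bare ";".
expanded = unquote(qstring)

# Step 2: parsing now sees ";" as a pair separator, so "ruby" becomes
# a separate field with no "=" and is silently dropped.
parsed = parse_qs(expanded, separator=";")
print(parsed)  # {'tagged': ['python']}
```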

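Since I'm using Python 2.7, this is parse_qs from the urlparse module (urllib.parse.parse_qs in Python 3). By the time it runs, unquote_utf8 has already turned the %3B that requests sent into a bare semicolon, and parse_qs treats a semicolon as a separator just like &.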

Down to the smallest test case now:

In [8]: urlparse.parse_qs("tagged=python;ruby")
Out[8]: {'tagged': ['python']}
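Worth noting for anyone reproducing this today: newer Python 3 releases (3.10+, plus security backports to some older 3.x lines) changed parse_qs to split only on & by default, so the same call no longer truncates. A quick check, assuming a Python 3.10+ interpreter:

```python
from urllib.parse import parse_qs

# With "&" as the only default separator, ";" stays inside the value.
modern = parse_qs("tagged=python;ruby")
print(modern)  # {'tagged': ['python;ruby']}

# ";" no longer splits pairs at all.
pairs = parse_qs("a=1;b=2")
print(pairs)  # {'a': ['1;b=2']}
```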

My gut reaction, on poor sleep (I'm a dad x 2!), was to flip the table in front of me. I had also hoped it would be a simple fix in httpretty, one pull request away.

For shame.

Googling "parse_qs semicolon" immediately leads to a StackOverflow post asking "Why does Python's urlparse.parse_qs() split arguments on semicolon?". The answer there is to switch the asker's server to commas instead of semicolons, since a semicolon is treated as equivalent to &. That doesn't help me: I can't change the StackExchange servers or API, and simply changing the request to use commas returns empty results.

What do I do now? I guess I'll write one of those blog things and come back to this in a second. Do I propose a hack for httpretty? Chat with the StackExchange folks?
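One possible hack for httpretty is to stop unquoting before parsing. Below is a hypothetical standalone sketch (not httpretty's actual code; the name parse_querystring_patched is made up) that hands the raw string straight to parse_qs. parse_qs splits first and percent-decodes each name and value afterwards, so an encoded %3B never acts as a separator. It assumes Python 3.10+ for the separator argument.

```python
from urllib.parse import parse_qs


def parse_querystring_patched(qs, separator="&"):
    """Hypothetical fix: split the raw query string before decoding it.

    parse_qs splits on the separator first and percent-decodes each
    name and value afterwards, so an encoded %3B stays in its value.
    """
    return parse_qs(qs, separator=separator)


# The value requests actually sent (qstring in the ipdb session).
print(parse_querystring_patched("tagged=python%3Bruby"))
# {'tagged': ['python;ruby']}

# Even with ";" as the separator, mimicking 2014-era parse_qs, the encoded
# semicolon survives because it is decoded only after splitting.
print(parse_querystring_patched("tagged=python%3Bruby", separator=";"))
# {'tagged': ['python;ruby']}
```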
