@smerritt
Last active December 18, 2015 00:59
Let's examine how the account and container caching works for a simple
middleware like container_quotas.
Abbreviations used:
s.c.m --> swift.common.middleware
s.p.c.a --> swift.proxy.controllers.account
s.p.c.b --> swift.proxy.controllers.base
s.p.c.c --> swift.proxy.controllers.container
s.p.c.o --> swift.proxy.controllers.obj
Scenario 1: Basic
=================
Okay, so we've got an object PUT request that comes in.
So, s.c.m.container_quotas.ContainerQuotaMiddleware.__call__() calls
s.p.c.b.get_container_info(), then pulls the quota metadata out of the
return value and enforces it. We don't care about the enforcement
here; let's look into get_container_info().
s.p.c.b.get_container_info() checks for container info in a few
places. If it's in the WSGI environment, then that's what is returned.
Otherwise, if it's in memcache, then it's placed into the WSGI
environment and returned. If it's not in memcache, then a container
HEAD request is made, the container info is generated from the HEAD
response, and the result is stuck into the WSGI environment and
returned.
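
Here's a rough Python sketch of that lookup order. It is not the Swift
source: CountingCache, head_container() and both key formats are
assumptions made up for illustration. The point is only the order of
the three lookups and who writes where.

```python
class CountingCache:
    """Hypothetical stand-in for memcache that counts gets and sets."""

    def __init__(self):
        self.data = {}
        self.gets = 0
        self.sets = 0

    def get(self, key):
        self.gets += 1
        return self.data.get(key)

    def set(self, key, value):
        self.sets += 1
        self.data[key] = value


def get_container_info(env, cache, head_container, account, container):
    """Check the WSGI environment, then memcache, then fall back to a HEAD."""
    key = "container/%s/%s" % (account, container)  # assumed key format
    env_key = "swift." + key                        # assumed key format
    if env_key in env:
        return env[env_key]
    info = cache.get(key)
    if info is None:
        # The HEAD handler, not this function, is what writes to memcache.
        info = head_container(account, container)
    env[env_key] = info
    return info


cache = CountingCache()
heads = []


def head_container(account, container):
    """Hypothetical HEAD handler: asks the backend, then populates memcache."""
    heads.append((account, container))
    info = {"status": 200, "meta": {"quota-bytes": "1000"}}
    cache.set("container/%s/%s" % (account, container), info)
    return info


env = {}
first = get_container_info(env, cache, head_container, "AUTH_test", "c")
second = get_container_info(env, cache, head_container, "AUTH_test", "c")
```

The second call is answered from env, so it costs no memcache get and
no HEAD.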
Interestingly, get_container_info() does not ever put anything into
memcache. It relies on the container HEAD request to populate the
cache. On the one hand, it's an odd asymmetry, but on the other hand,
it does mean that there's no chance of overwriting
ever-so-slightly-fresher cached data with an ever-so-slightly-stale
response.
(To see what I mean: imagine that get_container_info() issues a
container HEAD request. Shortly thereafter, someone issues a POST to
that same container. Then the POST completes, then the HEAD completes.
If get_container_info() stored things into memcache, it could
overwrite the new stuff that got stored on the POST request, thereby
taking the cache from fresh to stale. Not good.)
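
That interleaving can be replayed as a toy Python sketch. Nothing here
is Swift code: the dicts stand in for the container server and
memcache, and replay() is a hypothetical helper that follows the
sequence of events described above.

```python
def replay(caller_writes_cache):
    """Replay the HEAD/POST interleaving; return the quota left in the cache."""
    container = {"x-container-meta-quota-bytes": "100"}
    cache = {}
    # 1. The HEAD reads the container before the POST is applied.
    head_result = dict(container)
    # 2. The POST changes the container and the new info lands in the cache.
    container["x-container-meta-quota-bytes"] = "200"
    cache["info"] = dict(container)
    # 3. The HEAD response finally reaches the caller.
    if caller_writes_cache:
        cache["info"] = head_result  # the older snapshot overwrites the newer
    return cache["info"]["x-container-meta-quota-bytes"]


fresh = replay(caller_writes_cache=False)
stale = replay(caller_writes_cache=True)
```

With caller_writes_cache=False the cache keeps the post-POST value;
with True it ends up holding the pre-POST value.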
Now let's drill down a little more into that container HEAD request.
After some routing and whatnot, we wind up in
s.p.c.c.ContainerController.GETorHEAD(). This method makes requests to
the container servers, then stashes the result in memcache. Note that
it never *reads* from memcache; it only writes to it. There's even a
comment there talking about ratelimiting. Yikes.
To recap: if get_container_info() suffers two cache misses (WSGI
environment and memcache), then it relies on the container HEAD to
populate memcache; get_container_info() itself only populates the WSGI
environment.
Okay, so now we're done, right? Wrong! We've passed the
container_quotas middleware without error, so now our original object
PUT request is on its way down to the proxy.
Running total:
* 1 memcache get
* 1 container-server HEAD
* 1 memcache set
The container info is now in the WSGI environment and in memcache.
Got it? Okay, let's keep going. Remember, we're done with middleware
now, and we're on to the proxy. (I'll look at how multiple middlewares
interact with the cache and each other at a future point. Summary:
it's complicated.)
So, this object PUT request makes its way to
s.p.c.o.ObjectController.PUT(). The first thing this method does is
call s.p.c.b.Controller.container_info() [for ACLs, versioning, et
cetera. It's got good reasons]. That method checks memcache for
container info, and returns it if found. Note that it *does not* check
the WSGI environment for container info, so the data that
s.p.c.b.get_container_info() stuffed into the environment earlier was
for nothing.
Let's say that our cache was big enough and this request moved fast
enough that we got a memcache hit for the container info.
Running total:
* 2 memcache gets
* 1 container-server HEAD
* 1 memcache set
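
To double-check that tally, here's a small Python model of the two
lookups. It's a sketch under assumptions, not Swift code: the function
names, the cache key and the info dict are all hypothetical, and the
controller's miss path is not modeled.

```python
from collections import Counter

ops = Counter()
memcache = {}


def cache_get(key):
    ops["memcache get"] += 1
    return memcache.get(key)


def cache_set(key, value):
    ops["memcache set"] += 1
    memcache[key] = value


def container_head(key):
    # The HEAD handler talks to the container servers and writes memcache.
    ops["container-server HEAD"] += 1
    info = {"status": 200}
    cache_set(key, info)
    return info


def middleware_lookup(env, key):
    # get_container_info(): WSGI environment, then memcache, then HEAD.
    if key not in env:
        env[key] = cache_get(key) or container_head(key)
    return env[key]


def controller_lookup(env, key):
    # Old Controller.container_info(): memcache only; env is never consulted.
    return cache_get(key)


env = {}
middleware_lookup(env, "container/AUTH_test/c")
hit = controller_lookup(env, "container/AUTH_test/c")
totals = dict(ops)
```

Even though env already holds the info, the controller still pays for
a second memcache get.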
Now we're down to just the rest of the PUT method, which is 300
lines of code (ugh) that doesn't seem to do any more container or
account info fetching.
Scenario 2: Small / Missing Cache
=================================
Well, the basic scenario doesn't look too bad. Let's see how this
plays out with a small cache (so we have misses) or just no memcache
at all.
The container_quotas middleware goes as before, which gives us:
Running total:
* 1 memcache get
* 1 container-server HEAD
* 1 memcache set
We get back to s.p.c.b.Controller.container_info(), and now we get a
cache miss instead of a hit.
Now something interesting happens: instead of just doing a
container-server HEAD request, s.p.c.b.Controller.container_info()
calls s.p.c.b.Controller.account_info() for some reason. This checks
memcache for the account info, and let's say it misses. Now
account_info() goes and makes a HEAD request to the account servers,
then stashes the result in memcache.
Now, at the end of account_info(), we have:
Running total:
* 3 memcache gets
* 1 account-server HEAD
* 1 container-server HEAD
* 2 memcache sets
Right? Okay, back to s.p.c.b.Controller.container_info(). Now, having
verified that the account exists (I guess), it makes a
container-server HEAD request, then stashes the result in memcache
before returning.
Final total:
* 3 memcache gets
* 1 account-server HEAD
* 2 container-server HEADs
* 3 memcache sets
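
The same kind of toy model reproduces this total. Again, this is a
sketch with made-up names and keys, assuming a cache where every get
misses (sets still happen; they just don't help).

```python
from collections import Counter

ops = Counter()


def cache_get(key):
    ops["memcache get"] += 1
    return None  # every lookup misses in this scenario


def cache_set(key, value):
    ops["memcache set"] += 1  # counted, but the value is lost


def head(kind, key):
    # A backend HEAD; the handler stashes the result in memcache.
    ops[kind + "-server HEAD"] += 1
    info = {"status": 200}
    cache_set(key, info)
    return info


def get_container_info(env, key):
    # Middleware path: WSGI environment, then memcache, then HEAD.
    if key not in env:
        env[key] = cache_get(key) or head("container", key)
    return env[key]


def account_info(key):
    return cache_get(key) or head("account", key)


def container_info(account_key, container_key):
    # Old controller path: memcache only, and the account is checked first.
    info = cache_get(container_key)
    if info is None:
        if account_info(account_key) is None:
            return None
        info = head("container", container_key)
    return info


env = {}
get_container_info(env, "container/AUTH_test/c")
container_info("account/AUTH_test", "container/AUTH_test/c")
totals = dict(ops)
```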
New code + requests:
====================
Run through Scenario 1 again here, with the container_quotas
middleware. Assume an empty cache to start.
First, container_quotas calls get_container_info() as before, which
calls get_info(). This calls _get_info_cache(), which looks in the
WSGI environment and then in memcache for stuff. Since we're in
cache-miss land, we try memcache, but find nothing.
Now, the get_info() call for the container goes and recursively calls
itself for the account, resulting in another memcache miss. Post-miss,
get_info() makes a HEAD request into the application for the account.
The account HEAD handler ends up calling
s.p.c.b.Controller.GETorHEAD_base(), which stores the account info in
memcache. get_info() then takes the account info, sticks it in the
WSGI environment, and returns it.
Okay, recursive call over; we're now back in get_info() for the
container. Since get_info() for the account returned a truthy value,
we then continue on with a container HEAD request. Like its sibling in
AccountController, this guy calls up to Controller.GETorHEAD_base(),
which stores the result in memcache. Back up to get_info(), which
stashes things in the WSGI environment and returns.
Running total:
* 2 memcache gets
* 1 account-server HEAD
* 1 container-server HEAD
* 2 memcache sets
Moving on from container_quotas, we hit the proxy server in
s.p.c.o.ObjectController.PUT(). This method calls
s.p.c.b.Controller.container_info().
Now we're back to get_info() for the container again. Fortunately,
this time the result is cached in the WSGI environment, so no more
memcache traffic is necessary.
Final total:
* 2 memcache gets
* 1 account-server HEAD
* 1 container-server HEAD
* 2 memcache sets (account info and container info)
Old final total:
* 2 memcache gets
* 1 container-server HEAD
* 1 memcache set (container info)
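
Here's the new-code walkthrough as a Python sketch. It's a model of the
flow described above, not the real get_info(): handle_head(), the path
strings used as keys, and the info dict are all assumptions.

```python
from collections import Counter

ops = Counter()
memcache = {}


def handle_head(path):
    """Stand-in for the proxy's HEAD handling ending in GETorHEAD_base()."""
    kind = "container" if path.count("/") == 1 else "account"
    ops[kind + "-server HEAD"] += 1
    info = {"status": 200}
    ops["memcache set"] += 1
    memcache[path] = info
    return info


def get_info(env, account, container=None):
    path = account if container is None else account + "/" + container
    if path in env:              # WSGI environment first
        return env[path]
    ops["memcache get"] += 1     # then memcache
    info = memcache.get(path)
    if info is None:
        # A container lookup first checks its account by recursing.
        if container is not None and not get_info(env, account):
            return None
        info = handle_head(path)
    env[path] = info
    return info


env = {}
get_info(env, "AUTH_test", "c")  # container_quotas middleware
get_info(env, "AUTH_test", "c")  # ObjectController.PUT() via container_info()
totals = dict(ops)
```

The second call never reaches memcache because the first one left both
the account and container info in env.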
Run through Scenario 2 again here (where memcache is broken)
Due to the caching in the WSGI environment, we get the same final
total, minus the memcache traffic.
Final total:
* 1 account-server HEAD
* 1 container-server HEAD
Old final total:
* 3 memcache gets
* 1 account-server HEAD
* 2 container-server HEADs
* 3 memcache sets
However, if get_info() for a container didn't call itself for the
account, then we'd get:
Better final total:
* 1 container-server HEAD
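
A sketch comparing the two variants with no memcache at all. run_put()
and its check_account flag are hypothetical; the flag switches the
recursive account lookup on and off so the two totals can be put side
by side.

```python
from collections import Counter


def run_put(check_account):
    """Count backend HEADs for one object PUT when memcache is unavailable."""
    ops = Counter()
    env = {}

    def get_info(account, container=None):
        path = account if container is None else account + "/" + container
        if path in env:
            return env[path]
        # No memcache here, so a miss in env goes straight to the backend.
        if container is not None and check_account and not get_info(account):
            return None
        kind = "account" if container is None else "container"
        ops[kind + "-server HEAD"] += 1
        env[path] = {"status": 200}
        return env[path]

    get_info("AUTH_test", "c")  # container_quotas middleware
    get_info("AUTH_test", "c")  # ObjectController.PUT()
    return dict(ops)


with_account_check = run_put(check_account=True)
without_account_check = run_put(check_account=False)
```

Dropping the account lookup saves one account-server HEAD per request
in this model; whether the account check is needed for other reasons is
outside what this sketch covers.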