@terriyu
Last active December 20, 2015 03:39
Journal for OpenStack Ceilometer work -- 15 Jul 2013

15 Jul 2013

Bug I'm working on: "unable to sort data with MongoDB"

Status

  • The bug report is here: https://bugs.launchpad.net/ceilometer/+bug/1193906

  • Successfully reproduced the error in the bug report:

    OperationFailure: database error: too much data for sort() with no index. add an index or specify a smaller limit
    
  • Attempted to fix the bug by creating an index on the timestamp field, but so far it hasn't worked.

  • My patch so far: https://review.openstack.org/#/c/36159/

  • Debugging shows that the index is created when the MongoDB connection is initialized in __init__(), but get_samples() doesn't see the indexes when it is called. Strangely enough, after running ./stack.sh to get a fresh database, I could see the indexes in get_samples().

  • Tried jd's patch https://review.openstack.org/#/c/33290/ which uses a real MongoDB instance to run the unit tests, but that didn't work either.

    • After applying jd's patch, I had weird connection errors when I ran the tests. My quick hack was to manually change the port number each time I ran tests.
    • jd then suggested adding a 5-second sleep to clear() in ceilometer/storage/impl_mongodb.py. He thought that if there was a race condition between tests, the sleep would fix it. Unfortunately, even with the sleep() added, the sorting error still occurs, although the connection errors intermittently disappear. Increasing the sleep time to 50 seconds doesn't seem to change anything.
  • Next idea: jd thinks that the last line of the setUp() function for the test database class defined in ceilometer/tests/db.py, self.conn.clear(), is destroying the MongoDB indexes.

    To fix this possible (but unconfirmed) bug, jd made a new patch "storage: fix clear/upgrade order": https://review.openstack.org/#/c/36854/

    Unfortunately, this patch does not fix the sorting error.
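To make jd's ordering hypothesis concrete, here is a minimal in-memory sketch that needs no MongoDB. FakeCollection, FakeConnection and set_up() are hypothetical stand-ins, not Ceilometer's code; only the method names create_index(), index_information() and drop() mirror pymongo's Collection API. The sketch assumes that clear() drops the meter collection and that upgrade() creates the timestamp index. If those assumptions hold, running clear() after upgrade() silently throws the index away.

```python
# Hypothetical in-memory stand-ins; NOT Ceilometer's real classes.
# Method names mirror pymongo's Collection API, return values are simplified.


class FakeCollection:
    def __init__(self):
        # A (re)created MongoDB collection only has the default _id index.
        self._indexes = {"_id_": [("_id", 1)]}

    def create_index(self, keys, name):
        self._indexes[name] = list(keys)
        return name

    def index_information(self):
        # Simplified: real pymongo output also carries fields such as "v".
        return {name: {"key": keys} for name, keys in self._indexes.items()}

    def drop(self):
        # Dropping a collection in MongoDB also drops all of its indexes.
        self._indexes = {"_id_": [("_id", 1)]}


class FakeConnection:
    def __init__(self):
        self.meter = FakeCollection()

    def upgrade(self):
        # Assumption: upgrade() is where the timestamp index is created.
        self.meter.create_index([("timestamp", -1)], name="timestamp_idx")

    def clear(self):
        # Assumption: clear() wipes the test data by dropping the collection.
        self.meter.drop()


def set_up(clear_first):
    """Run the two setup steps in either order and report surviving indexes."""
    conn = FakeConnection()
    if clear_first:
        conn.clear()
        conn.upgrade()
    else:
        conn.upgrade()
        conn.clear()  # discards the index that upgrade() just created
    return conn.meter.index_information()
```

Since jd's patch did not make the sorting error go away, this only illustrates the hypothesis; it does not show that the ordering was the cause.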

Trying the patches again

Connection errors

Once again, I got lots of intermittent connection errors, similar to what I had before: https://gist.github.com/terriyu/6004283

  • jd thought it could be an out-of-memory error and asked me to check by running

    $ dmesg | grep mongo
    

    which didn't return anything, so he asked me to try

    $ dmesg | grep -i memory
    

    which returned something, but nothing relevant or abnormal: https://gist.github.com/terriyu/6064325

  • I had connection problems even when running the full test suite on the master branch with no patches. jd thought my disk or /tmp directory might be full, which could prevent MongoDB from starting.

    I checked and I had only used 22 GB out of 32 GB in my Ubuntu partition. jd suggested checking the "disk space" in the Vagrant VM, so I ran

    $ df
    

    but I didn't see anything relevant or abnormal: https://gist.github.com/terriyu/6064609

  • I noticed that when I ran the full test suite I had fewer problems than if I ran a single storage test. jd said this was to be expected.

    when you run a lot of tests, the first ones do not use MongoDB

    so by the time your test run, mongod started already

    in the case of 1 test, there's a race condition

    https://review.openstack.org/37105 might help, though it's not the best fix

    you can also add a "sleep 3" in run-tests.sh just after mongod is starting as a temporary fix

    I tried adding in "sleep 3" in run-tests.sh like jd suggested, but it didn't seem to help.
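A fixed "sleep 3" only guesses how long mongod takes to start. One possible alternative, sketched below on the assumption that the test script can call into Python, is to poll the port until mongod accepts connections, and to ask the OS for an unused port instead of changing the port number by hand. find_free_port() and wait_for_port() are hypothetical helpers, not part of run-tests.sh or jd's patch; mongod would be started with its --port option set to the chosen port.

```python
import socket
import time


def find_free_port():
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def wait_for_port(port, host="127.0.0.1", timeout=10.0, interval=0.1):
    """Poll until something accepts TCP connections on host:port.

    Returns True as soon as a connection succeeds, or False once
    `timeout` seconds pass without one.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: mongod is not listening yet.
            time.sleep(interval)
    return False
```

find_free_port() has a small race of its own, since another process could take the port before mongod binds it, but it avoids reusing a port that an earlier test run left occupied.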

Running full test suite to check my timestamp index patch

  • Running the full test suite with a timestamp index made by create_index() doesn't throw any MongoDB sorting errors: https://gist.github.com/terriyu/6004959

  • Running the full test suite with a timestamp index made by ensure_index() doesn't throw any MongoDB sorting errors: https://gist.github.com/terriyu/6004970

  • If I put in an assert False, self.db.meter.index_information() statement in the get_samples() function inside ceilometer/storage/impl_mongodb.py, I see that the value of index_information() is correct:

    {u'_id_': {u'key': [(u'_id', 1)], u'v': 1}, u'timestamp_idx': {u'key': [(u'timestamp', -1)], u'v': 1}, u'meter_idx': {u'key': [(u'resource_id', 1), (u'user_id', 1), (u'counter_name', 1), (u'timestamp', 1), (u'source', 1)], u'v': 1}}
    

    Full test output: https://gist.github.com/terriyu/6004977

  • All the above tests suggest that the index is working (with either create_index() or ensure_index()).

  • However, this isn't a fully satisfying result, because I haven't been able to generate the sorting error even when I make the test database very large. I tried using the same size of test database that gave me a sorting error before, but with the new changes I pulled down and the updated version of jd's "storage: fix clear/upgrade order" patch, I am no longer able to generate the error.

  • For a fully rigorous fix, I should be able to generate the sorting error for a fixed size test database and then show that the error is resolved by creating a timestamp index.
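The two checks described above can be sketched as plain Python helpers. has_timestamp_index() inspects a dictionary shaped like the index_information() output printed from get_samples(), and docs_needed() estimates how many samples a test database needs before an unindexed sort should fail. Both names are hypothetical, and the 32 MB in-memory sort limit is an assumption about the MongoDB version in use; the average document size could be read from the avgObjSize field of the collStats command rather than guessed.

```python
# Assumption: MongoDB servers of this era refused to sort more than
# 32 MB of data in memory when no index could serve the sort.
# Check the limit documented for the server version you actually run.
SORT_MEMORY_LIMIT_BYTES = 32 * 1024 * 1024


def docs_needed(avg_doc_bytes, limit_bytes=SORT_MEMORY_LIMIT_BYTES):
    """Smallest number of documents whose total size exceeds limit_bytes."""
    return limit_bytes // avg_doc_bytes + 1


def has_timestamp_index(index_info):
    """True if any index in index_information() output starts with timestamp.

    A compound index such as meter_idx, where timestamp is not the first
    key, is not counted here: it can only serve a timestamp sort when the
    query fixes the keys that come before it.
    """
    return any(
        bool(spec["key"]) and spec["key"][0][0] == "timestamp"
        for spec in index_info.values()
    )
```

For example, with documents averaging 1 KB, docs_needed(1024) returns 32769, so a test database of that size should reproduce the error without the index and pass with it, provided the limit assumption is right.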
