-
The bug report is here: https://bugs.launchpad.net/ceilometer/+bug/1193906
-
Successfully reproduced the error in the bug report:
`OperationFailure: database error: too much data for sort() with no index. add an index or specify a smaller limit`
-
Attempted to fix the bug by creating an index for timestamp, but so far, it hasn't worked.
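For reference, the index creation I'm attempting has roughly this shape. This is a minimal sketch, not the patch itself: the helper name `ensure_timestamp_index` and the index name `timestamp_idx` are my own, and a `MagicMock` stands in for the pymongo `meter` collection so the call shape can be shown without a live MongoDB:

```python
# Sketch of a timestamp index creation call (hypothetical helper name).
# A MagicMock stands in for the pymongo collection, so no mongod is needed.
from unittest.mock import MagicMock

DESCENDING = -1  # same value as pymongo.DESCENDING


def ensure_timestamp_index(meter_collection):
    """Create a descending index on 'timestamp' (ensure_index is idempotent)."""
    meter_collection.ensure_index(
        [('timestamp', DESCENDING)], name='timestamp_idx')


meter = MagicMock()
ensure_timestamp_index(meter)
print(meter.ensure_index.call_args)
```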
-
My patch so far: https://review.openstack.org/#/c/36159/
-
Debugging shows that the index is created when the MongoDB connection is initialized in `__init__()`, but when `get_samples()` is called, it doesn't see the indexes. Strangely enough, if I ran `./stack.sh` and got a fresh database, I could see the indexes in `get_samples()`.
-
Tried jd's patch https://review.openstack.org/#/c/33290/ which uses a real MongoDB instance to run the unit tests, but that didn't work either.
- After applying jd's patch, I had weird connection errors when I ran the tests. My quick hack was to manually change the port number each time I ran tests.
- jd then suggested I try adding a Python `sleep()` of 5 seconds to `clear()` in `ceilometer/storage/impl_mongodb.py`. He thought that if there is a race condition between tests, the sleep would mask it. Unfortunately, even with the `sleep()` added, the sorting error still occurs, although the connection errors intermittently disappear. Increasing the sleep time to 50 seconds doesn't seem to change anything.
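A fixed `sleep()` papers over a race without removing it; a bounded poll is the more reliable general pattern. This is a generic sketch of that pattern (my own illustration, not code from any of the patches):

```python
# Generic "wait for a condition" helper: poll instead of sleeping blindly.
import time


def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll predicate until it returns True or the timeout expires.
    Returns True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False


# Usage: simulate a resource that only becomes ready after a few polls.
state = {'polls': 0}


def ready():
    state['polls'] += 1
    return state['polls'] >= 3


print(wait_until(ready))  # True
```

Unlike a hard-coded 5- or 50-second sleep, this returns as soon as the condition holds, and its failure mode (timeout) is explicit.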
-
Next idea: jd thinks that the `setUp()` function for the test database class defined in `ceilometer/tests/db.py` is destroying the MongoDB indexes in its last line, `self.conn.clear()`. To fix this possible (but unconfirmed) bug, jd made a new patch "storage: fix clear/upgrade order": https://review.openstack.org/#/c/36854/
Unfortunately, this patch does not fix the sorting error.
-
There were some changes over the weekend.
- jd's patch https://review.openstack.org/#/c/33290/ which uses a real MongoDB instance to run the unit tests was merged.
- jd's patch https://review.openstack.org/#/c/36854/ which recreates the MongoDB indexes in `setUp()` was rebased.
-
To try out the new changes, I did the following:
- Updated my local `master` branch
- Cherry-picked jd's patch to fix the MongoDB indexes in `setUp()` from `ceilometer/tests/db.py`: https://review.openstack.org/#/c/36854/
- Cherry-picked my own patch to create the timestamp MongoDB index: https://review.openstack.org/#/c/36159/
Once again, I got lots of intermittent connection errors, similar to what I had before: https://gist.github.com/terriyu/6004283
-
jd thought it could be an out-of-memory error and asked me to check by running `$ dmesg | grep mongo`, which didn't return anything, so he asked me to try `$ dmesg | grep -i memory`, which returned something, but nothing relevant or abnormal: https://gist.github.com/terriyu/6064325
-
I had connection problems even when running the full test suite on the `master` branch with no patches. jd thought it was possible that my disk or `/tmp` directory was full, which might prevent MongoDB from starting. I checked, and I had only used 22 GB out of 32 GB in my Ubuntu partition. jd suggested checking the disk space in the Vagrant VM, so I ran `$ df`, but I didn't see anything relevant or abnormal: https://gist.github.com/terriyu/6064609
-
I noticed that when I ran the full test suite, I had fewer problems than if I ran a single storage test. jd said this was to be expected:
> when you run a lot of tests, the first ones do not use MongoDB, so by the time your test runs, mongod has started already
> in the case of 1 test, there's a race condition
> https://review.openstack.org/37105 might help, though it's not the best fix
> you can also add a "sleep 3" in run-tests.sh just after mongod is starting as a temporary fix
I tried adding `sleep 3` in `run-tests.sh` like jd suggested, but it didn't seem to help.
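For the startup race jd describes, a more robust alternative to a fixed `sleep 3` is to poll until mongod's TCP port actually accepts connections. A sketch of that pattern in Python (my own illustration; the host and port would come from the test configuration):

```python
# Poll a TCP port until something is listening on it, instead of sleeping
# a fixed amount and hoping mongod has started.
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to (host, port) succeeds,
    False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Demonstrate against a throwaway listener on an ephemeral port.
server = socket.socket()
server.bind(('127.0.0.1', 0))
server.listen(1)
port = server.getsockname()[1]
print(wait_for_port('127.0.0.1', port, timeout=2.0))  # True
server.close()
```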
-
Running the full test suite with a timestamp index made by `create_index()` doesn't throw any MongoDB sorting errors: https://gist.github.com/terriyu/6004959
-
Running the full test suite with a timestamp index made by `ensure_index()` doesn't throw any MongoDB sorting errors: https://gist.github.com/terriyu/6004970
-
If I put an `assert False, self.db.meter.index_information()` statement in the `get_samples()` function inside `ceilometer/storage/impl_mongodb.py`, I see that the value of `index_information()` is correct: `{u'_id_': {u'key': [(u'_id', 1)], u'v': 1}, u'timestamp_idx': {u'key': [(u'timestamp', -1)], u'v': 1}, u'meter_idx': {u'key': [(u'resource_id', 1), (u'user_id', 1), (u'counter_name', 1), (u'timestamp', 1), (u'source', 1)], u'v': 1}}` Full test output: https://gist.github.com/terriyu/6004977
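Rather than tripping an `assert False`, the `index_information()` output can also be inspected programmatically. A sketch (the `has_timestamp_index` helper is my own illustration, not Ceilometer code; the dict is the output I saw in `get_samples()`):

```python
# Check index_information() output for a timestamp index without asserting.
index_info = {
    u'_id_': {u'key': [(u'_id', 1)], u'v': 1},
    u'timestamp_idx': {u'key': [(u'timestamp', -1)], u'v': 1},
    u'meter_idx': {u'key': [(u'resource_id', 1), (u'user_id', 1),
                            (u'counter_name', 1), (u'timestamp', 1),
                            (u'source', 1)], u'v': 1},
}


def has_timestamp_index(info):
    """True if any index's leading key is on the 'timestamp' field."""
    return any(spec[u'key'][0][0] == u'timestamp' for spec in info.values())


print(has_timestamp_index(index_info))  # True
```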
-
All the above tests suggest that the index is working (with either `create_index()` or `ensure_index()`).
-
However, this isn't a fully satisfying result, because I haven't been able to reproduce the sorting error even when I make the test database very large. I tried using the same size test database that gave me a sorting error before, but with the new changes I pulled down and the updated version of jd's "storage: fix clear/upgrade order" patch, I am no longer able to reproduce the error.
-
For a fully rigorous fix, I should be able to reproduce the sorting error with a fixed-size test database and then show that the error is resolved by creating a timestamp index.