-
The bug report is here: https://bugs.launchpad.net/ceilometer/+bug/1193906
-
Successfully reproduced the error in the bug report:
`OperationFailure: database error: too much data for sort() with no index. add an index or specify a smaller limit`
-
Attempted to fix the bug by creating an index for timestamp, but so far, it hasn't worked.
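For reference, the index creation I'm attempting has roughly this shape. This is a minimal sketch, not the patch itself: the helper name `ensure_timestamp_index` and the index name `timestamp_idx` are my own, and a `MagicMock` stands in for the pymongo `meter` collection so the call shape can be shown without a live MongoDB:

```python
# Sketch of a timestamp index creation call (hypothetical helper name).
# A MagicMock stands in for the pymongo collection, so no mongod is needed.
from unittest.mock import MagicMock

DESCENDING = -1  # same value as pymongo.DESCENDING


def ensure_timestamp_index(meter_collection):
    """Create a descending index on 'timestamp' (ensure_index is idempotent)."""
    meter_collection.ensure_index(
        [('timestamp', DESCENDING)], name='timestamp_idx')


meter = MagicMock()
ensure_timestamp_index(meter)
print(meter.ensure_index.call_args)
```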
-
My patch so far: https://review.openstack.org/#/c/36159/
-
Debugging shows that the index is created when the MongoDB connection is initialized in `__init__()`, but when `get_samples()` is called, it doesn't see the indexes. Strangely enough, if I ran `./stack.sh` and got a fresh database, I could see the indexes in `get_samples()`.
-
Tried jd's patch https://review.openstack.org/#/c/33290/ which uses a real MongoDB instance to run the unit tests, but that didn't work either.
- After applying jd's patch, I had weird connection errors when I ran the tests. My quick hack was to manually change the port number each time I ran tests.
- jd then suggested I try adding a Python `sleep()` of 5 seconds to `clear()` in `ceilometer/storage/impl_mongodb.py`. He thought that if there is a race condition between tests, the sleep would mask it. Unfortunately, even with the `sleep()` added, the sorting error still occurs, although the connection errors intermittently disappear. Increasing the sleep time to 50 seconds doesn't seem to change anything.
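A fixed `sleep()` papers over a race without removing it; a bounded poll is the more reliable general pattern. This is a generic sketch of that pattern (my own illustration, not code from any of the patches):

```python
# Generic "wait for a condition" helper: poll instead of sleeping blindly.
import time


def wait_until(predicate, timeout=5.0, interval=0.1):
    """Poll predicate until it returns True or the timeout expires.
    Returns True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False


# Usage: simulate a resource that only becomes ready after a few polls.
state = {'polls': 0}


def ready():
    state['polls'] += 1
    return state['polls'] >= 3


print(wait_until(ready))  # True
```

Unlike a hard-coded 5- or 50-second sleep, this returns as soon as the condition holds, and its failure mode (timeout) is explicit.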
-
Next idea: jd thinks that the `setUp()` function for the test database class defined in `ceilometer/tests/db.py` is destroying the MongoDB indexes in its last line, `self.conn.clear()`. To fix this possible (but unconfirmed) bug, jd made a new patch "storage: fix clear/upgrade order": https://review.openstack.org/#/c/36854/
Unfortunately, this patch does not fix the sorting error.
-
There were some changes over the weekend.
- jd's patch https://review.openstack.org/#/c/33290/ which uses a real MongoDB instance to run the unit tests was merged.
- jd's patch https://review.openstack.org/#/c/36854/ which recreates the MongoDB indexes in `setUp()` was rebased.
-
To try out the new changes, I did the following:
- Updated my local `master` branch
- Cherry-picked jd's patch to fix the MongoDB indexes in `setUp()` from `ceilometer/tests/db.py`: https://review.openstack.org/#/c/36854/
- Cherry-picked my own patch to create the timestamp MongoDB index: https://review.openstack.org/#/c/36159/
Once again, I got lots of intermittent connection errors, similar to what I had before: https://gist.github.com/terriyu/6004283
-
jd thought it could be an out-of-memory error and asked me to check by running `$ dmesg | grep mongo`, which didn't return anything, so he asked me to try `$ dmesg | grep -i memory`, which returned something, but nothing relevant or abnormal: https://gist.github.com/terriyu/6064325
-
I had connection problems even when running the full test suite on the `master` branch with no patches. jd thought it was possible that my disk or `/tmp` directory was full, which might prevent MongoDB from starting. I checked, and I had only used 22 GB out of 32 GB in my Ubuntu partition. jd suggested checking the disk space in the Vagrant VM, so I ran `$ df`, but I didn't see anything relevant or abnormal: https://gist.github.com/terriyu/6064609
-
I noticed that when I ran the full test suite, I had fewer problems than if I ran a single storage test. jd said this was to be expected:
> when you run a lot of tests, the first ones do not use MongoDB, so by the time your test runs, mongod has started already
> in the case of 1 test, there's a race condition
> https://review.openstack.org/37105 might help, though it's not the best fix
> you can also add a "sleep 3" in run-tests.sh just after mongod is starting as a temporary fix
I tried adding `sleep 3` in `run-tests.sh` like jd suggested, but it didn't seem to help.
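For the startup race jd describes, a more robust alternative to a fixed `sleep 3` is to poll until mongod's TCP port actually accepts connections. A sketch of that pattern in Python (my own illustration; the host and port would come from the test configuration):

```python
# Poll a TCP port until something is listening on it, instead of sleeping
# a fixed amount and hoping mongod has started.
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to (host, port) succeeds,
    False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Demonstrate against a throwaway listener on an ephemeral port.
server = socket.socket()
server.bind(('127.0.0.1', 0))
server.listen(1)
port = server.getsockname()[1]
print(wait_for_port('127.0.0.1', port, timeout=2.0))  # True
server.close()
```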
-
Running the full test suite with a timestamp index made by `create_index()` doesn't throw any MongoDB sorting errors: https://gist.github.com/terriyu/6004959
-
Running the full test suite with a timestamp index made by `ensure_index()` doesn't throw any MongoDB sorting errors: https://gist.github.com/terriyu/6004970
-
If I put an `assert False, self.db.meter.index_information()` statement in the `get_samples()` function inside `ceilometer/storage/impl_mongodb.py`, I see that the value of `index_information()` is correct: `{u'_id_': {u'key': [(u'_id', 1)], u'v': 1}, u'timestamp_idx': {u'key': [(u'timestamp', -1)], u'v': 1}, u'meter_idx': {u'key': [(u'resource_id', 1), (u'user_id', 1), (u'counter_name', 1), (u'timestamp', 1), (u'source', 1)], u'v': 1}}` Full test output: https://gist.github.com/terriyu/6004977
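Rather than tripping an `assert False`, the `index_information()` output can also be inspected programmatically. A sketch (the `has_timestamp_index` helper is my own illustration, not Ceilometer code; the dict is the output I saw in `get_samples()`):

```python
# Check index_information() output for a timestamp index without asserting.
index_info = {
    u'_id_': {u'key': [(u'_id', 1)], u'v': 1},
    u'timestamp_idx': {u'key': [(u'timestamp', -1)], u'v': 1},
    u'meter_idx': {u'key': [(u'resource_id', 1), (u'user_id', 1),
                            (u'counter_name', 1), (u'timestamp', 1),
                            (u'source', 1)], u'v': 1},
}


def has_timestamp_index(info):
    """True if any index's leading key is on the 'timestamp' field."""
    return any(spec[u'key'][0][0] == u'timestamp' for spec in info.values())


print(has_timestamp_index(index_info))  # True
```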
-
All the above tests suggest that the index is working (with either `create_index()` or `ensure_index()`).
-
However, this isn't a fully satisfying result, because I haven't been able to reproduce the sorting error even when I make the test database very large. I tried using the same size test database that gave me a sorting error before, but with the new changes I pulled down and the updated version of jd's "storage: fix clear/upgrade order" patch, I am no longer able to reproduce the error.
-
For a fully rigorous fix, I should be able to reproduce the sorting error with a fixed-size test database and then show that the error is resolved by creating a timestamp index.