Created December 16, 2013 23:34
bug 1043693
Description of problem:
A specific customer workload causes abnormally high system load on RHEL6; the same code did not cause this under RHEL4. The abnormal load relates to the dnotify / inotify subsystems.
Version-Release number of selected component (if applicable):
Tests run on 2.6.32-279.2.1.el6.x86_64
How reproducible:
100%
Steps to Reproduce:
1. Run test case: ./msys_sim -c 50 -m 1024 -d /mnt/tmp
1. Wait; load goes up over time
1. See http://i.imgur.com/LJFbt99.png
Actual results:
1. Latency across the entire system is seriously affected
1. System CPU % climbs continuously
1. The kernel spends nearly all of its time contending for a spin lock
1. The rest of the time, the kernel is in __fsnotify_update_child_dentry_flags
Expected results:
1. OS should cruise along smoothly (http://i.imgur.com/TLElwsw.png)
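The system CPU share described under "Actual results" can be tracked without a graphing tool by sampling the aggregate `cpu` line of `/proc/stat`. The following is a minimal sketch, not part of the original test case; the function names `read_cpu_times` and `system_share` are hypothetical, and it assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def read_cpu_times(path="/proc/stat"):
    """Return the cumulative tick counters from the aggregate 'cpu' line."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(v) for v in line.split()[1:]]
    raise RuntimeError("no aggregate cpu line found")


def system_share(before, after):
    """Fraction of elapsed CPU time spent in system mode between two samples.

    Only the first eight fields are summed, because guest time is already
    included in the user and nice counters.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    return deltas[2] / total if total else 0.0


if __name__ == "__main__":
    first = read_cpu_times()
    time.sleep(1)
    second = read_cpu_times()
    print(f"system share over the last second: {system_share(first, second):.1%}")
```

Running this in a loop while the test case is active should show the system share rising over time if the problem is being reproduced.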
Additional info:
* Mount line for the filesystem used for the test:
/dev/mapper/sysvg-msys /mnt/tmp ext4 rw,seclabel,relatime,barrier=1,data=ordered 0 0
* The test case is a mockup of a real production system
* The high load is generated inside the close() call on the inotify fd (or, presumably, DN_CREATE in the dnotify case). Syscall trace with per-call timings from a system that has been running this test case for a while:
inotify_init() = 4 <0.000020>
inotify_add_watch(4, "/mnt/tmp/msys_sim/QUEUES/Child_032", IN_CREATE) = 1 <0.040385>
write(1, "Child [032] sleeping\n", 21) = 21 <0.000903>
read(4, "\1\0\0\0\0\1\0\0\0\0\0\0\20\0\0\0SrcFile.mQgUSh\0\0", 512) = 32 <0.023423>
inotify_rm_watch(4, 1) = 0 <0.000012>
close(4) = 0 <0.528736>
* The problem can be avoided by using inotify without re-initializing it every time; this avoids the teardown
* The problem can also be avoided by using dnotify with DN_MULTISHOT, again avoiding the teardown
* Unfortunately, neither of these two workarounds is usable in the production application
* The test case generates 256K files in a single directory. Modifying it to use two-level buckets instead of a single directory reduces the user% CPU consumed but DOES NOT AFFECT the system% CPU
* Strangely, calling mount with MS_REMOUNT seems to clear the problem: system% drops to zero
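
The difference between the pattern seen in the trace (set up and tear down inotify around every event) and the first workaround (keep one inotify fd open) can be sketched as follows. This is an illustration only, not the msys_sim source: it calls the libc inotify functions through `ctypes`, and the helper names `open_watch`, `read_created`, `one_shot_cycle` and `persistent_loop`, as well as the `trigger` callbacks that create files, are hypothetical.

```python
import ctypes
import os
import struct
import tempfile
import time

IN_CREATE = 0x00000100  # value from <sys/inotify.h>
EVENT_HEADER = struct.Struct("iIII")  # struct inotify_event: wd, mask, cookie, len

libc = ctypes.CDLL(None, use_errno=True)
libc.inotify_add_watch.argtypes = [ctypes.c_int, ctypes.c_char_p, ctypes.c_uint32]


def open_watch(directory):
    """inotify_init() followed by inotify_add_watch(IN_CREATE); returns (fd, wd)."""
    fd = libc.inotify_init()
    if fd < 0:
        raise OSError(ctypes.get_errno(), "inotify_init failed")
    wd = libc.inotify_add_watch(fd, os.fsencode(directory), IN_CREATE)
    if wd < 0:
        err = ctypes.get_errno()
        os.close(fd)
        raise OSError(err, "inotify_add_watch failed")
    return fd, wd


def read_created(fd):
    """Block until events arrive, then return the names of the created files."""
    buf = os.read(fd, 4096)
    names, offset = [], 0
    while offset < len(buf):
        _wd, _mask, _cookie, name_len = EVENT_HEADER.unpack_from(buf, offset)
        offset += EVENT_HEADER.size
        # The name field is NUL-padded up to name_len bytes.
        names.append(buf[offset:offset + name_len].split(b"\0", 1)[0].decode())
        offset += name_len
    return names


def one_shot_cycle(directory, trigger):
    """Pattern from the trace: full setup and teardown around a single read.

    Returns the names seen and the wall-clock time spent in close().
    """
    fd, wd = open_watch(directory)
    trigger()
    names = read_created(fd)
    libc.inotify_rm_watch(fd, wd)
    start = time.perf_counter()
    os.close(fd)
    return names, time.perf_counter() - start


def persistent_loop(directory, triggers):
    """Workaround shape: one fd and one watch reused for every event."""
    fd, _wd = open_watch(directory)
    seen = []
    try:
        for trigger in triggers:
            trigger()
            seen.extend(read_created(fd))
    finally:
        os.close(fd)  # teardown happens once, not once per event
    return seen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        def touch(name):
            return lambda: open(os.path.join(d, name), "w").close()
        names, close_secs = one_shot_cycle(d, touch("SrcFile.demo"))
        print(f"one-shot saw {names}; close() took {close_secs:.6f}s")
        print("persistent saw", persistent_loop(d, [touch("a"), touch("b")]))
```

In a fresh, nearly empty directory the measured close() time is expected to be small; the 0.5 s figure in the trace came from a system that had been running the workload for a while. As noted in the list above, the persistent variant is not an option for the production application.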