laoar · November 28, 2019 03:27
diff --git a/0001-mm-memcg-fix-the-stupid-OOM-killer-when-shrinking-me.patch b/0001-mm-memcg-fix-the-stupid-OOM-killer-when-shrinking-me.patch
 Subject: [PATCH] mm, memcg: fix the stupid OOM killer when shrinking memcg
 hard limit

 When there are no more processes in a memcg (e.g., due to OOM 
 group), we can still have file pages in the page cache.

 If these pages are protected by memory.min, they can't be reclaimed.
 Especially if there won't be another process in this memcg and the memcg
 is kept online, we do want to drop these pages from the page cache.

 By dropping these page caches we can avoid reclaimers (e.g., kswapd or
 direct) to scan and reclaim pages from all memcgs in the system -
 because the reclaimers will try to fairly reclaim pages from all memcgs
 in the system when under memory pressure.

 By setting the hard limit of such a memcg to 0, we allow to drop the 
 page cache of such memcgs. Unfortunately, this may invoke the OOM killer
 and generate a lot of output. The OOM output is not expected by an admin
 who wants to drop these caches and knows that there are no processes in
 this memcg anymore.

 Therefore, if a memcg is not populated, we should not invoke the OOM 
 killer - there is nothing to kill. The next time a new process is
 started in the memcg and the "max" is still below usage, the OOM killer
 will be invoked and the new process will be killed.

 [ Above commit log is contributed by David ]

 What's worse about this issue is that when there're killable tasks and the 
 OOM killer killed the last task, and what will happen then ? As nr_reclaims
 is already 0 and drained is alreay true, the OOM killer will try to kill
 nothing (because he knows he has killed the last task), what's a stupid
 behavior.

 Someone may worry that the admins may not going to see that the memcg was 
 OOM due to the limit change. But this is not a issue, because the admins
 changes the limit and then the admins must check the result of his change
 - by checking memory.{max, current, stat} he can get all he wants.

 Cc: David Hildenbrand <david@redhat.com>
 Nacked-by: Michal Hocko <mhocko@suse.com>
 Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

 ---
 mm/memcontrol.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

 diff --git a/mm/memcontrol.c b/mm/memcontrol.c
 index 1c4c08b..e936f1b 100644
 --- a/mm/memcontrol.c
 +++ b/mm/memcontrol.c
 @@ -6139,9 +6139,20 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
                        continue;
                }

 -               memcg_memory_event(memcg, MEMCG_OOM);
 -               if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
 +               /* If there's no procesess, we don't need to invoke the OOM
 +                * killer. Then next time when you try to start a process
 +                * in this memcg, the max may still bellow usage, and then
 +                * this OOM killer will be invoked. This can be considered
 +                * as lazy OOM, that is we have been always doing in the
 +                * kernel. Pls. Michal, that is really consistency.
 +                */
 +               if (cgroup_is_populated(memcg->css.cgroup)) {
 +                       memcg_memory_event(memcg, MEMCG_OOM);
 +                       if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
 +                               break;
 +               } else  {
                        break;
 +               }
        }

        memcg_wb_domain_size_changed(memcg);
 -- 
 1.8.3.1
	Subject: [PATCH] mm, memcg: fix the stupid OOM killer when shrinking memcg
	hard limit

	When there are no more processes in a memcg (e.g., due to OOM
	group), we can still have file pages in the page cache.

	If these pages are protected by memory.min, they can't be reclaimed.
	Especially if there won't be another process in this memcg and the memcg
	is kept online, we do want to drop these pages from the page cache.

	By dropping these page caches we can avoid reclaimers (e.g., kswapd or
	direct) to scan and reclaim pages from all memcgs in the system -
	because the reclaimers will try to fairly reclaim pages from all memcgs
	in the system when under memory pressure.

	By setting the hard limit of such a memcg to 0, we allow to drop the
	page cache of such memcgs. Unfortunately, this may invoke the OOM killer
	and generate a lot of output. The OOM output is not expected by an admin
	who wants to drop these caches and knows that there are no processes in
	this memcg anymore.

	Therefore, if a memcg is not populated, we should not invoke the OOM
	killer - there is nothing to kill. The next time a new process is
	started in the memcg and the "max" is still below usage, the OOM killer
	will be invoked and the new process will be killed.

	[ Above commit log is contributed by David ]

	What's worse about this issue is that when there're killable tasks and the
	OOM killer killed the last task, and what will happen then ? As nr_reclaims
	is already 0 and drained is alreay true, the OOM killer will try to kill
	nothing (because he knows he has killed the last task), what's a stupid
	behavior.

	Someone may worry that the admins may not going to see that the memcg was
	OOM due to the limit change. But this is not a issue, because the admins
	changes the limit and then the admins must check the result of his change
	- by checking memory.{max, current, stat} he can get all he wants.

	Cc: David Hildenbrand <david@redhat.com>
	Nacked-by: Michal Hocko <mhocko@suse.com>
	Signed-off-by: Yafang Shao <laoar.shao@gmail.com>

	---
	mm/memcontrol.c \| 15 +++++++++++++--
	1 file changed, 13 insertions(+), 2 deletions(-)

	diff --git a/mm/memcontrol.c b/mm/memcontrol.c
	index 1c4c08b..e936f1b 100644
	--- a/mm/memcontrol.c
	+++ b/mm/memcontrol.c
	@@ -6139,9 +6139,20 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
	continue;
	}

	- memcg_memory_event(memcg, MEMCG_OOM);
	- if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
	+ /* If there's no procesess, we don't need to invoke the OOM
	+ * killer. Then next time when you try to start a process
	+ * in this memcg, the max may still bellow usage, and then
	+ * this OOM killer will be invoked. This can be considered
	+ * as lazy OOM, that is we have been always doing in the
	+ * kernel. Pls. Michal, that is really consistency.
	+ */
	+ if (cgroup_is_populated(memcg->css.cgroup)) {
	+ memcg_memory_event(memcg, MEMCG_OOM);
	+ if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
	+ break;
	+ } else {
	break;
	+ }
	}

	memcg_wb_domain_size_changed(memcg);
	--
	1.8.3.1
No results found