Created June 10, 2014 03:59
fixing shitty systemd update
From fa647ab1c8fb6580813973976a4fa646623b769d Mon Sep 17 00:00:00 2001
From: Markus Rathgeb <[email protected]>
Date: Thu, 5 Jun 2014 11:10:44 +0200
Subject: [PATCH] cgroup: add xattr support
Squashed commit of the following:
commit 83ba09a9229baee9af79724ab3ef0840c97b3b74
Author: Aristeu Rozanski <[email protected]>
Date: Thu Aug 23 16:53:30 2012 -0400
cgroup: add xattr support
This is one of the items in the plumber's wish list.
For use cases:
>> What would the use case be for this?
>
> Attaching meta information to services, in an easily discoverable
> way. For example, in systemd we create one cgroup for each service, and
> could then store data like the main pid of the specific service as an
> xattr on the cgroup itself. That way we'd have almost all service state
> in the cgroupfs, which would make it possible to terminate systemd and
> later restart it without losing any state information. But there's more:
> for example, some very peculiar services cannot be terminated on
> shutdown (i.e. fakeraid DM stuff) and it would be really nice if the
> services in question could just mark that on their cgroup, by setting an
> xattr. On the more desktopy side of things there are other
> possibilities: for example there are plans defining what an application
> is along the lines of a cgroup (i.e. an app being a collection of
> processes). With xattrs one could then attach an icon or human readable
> program name on the cgroup.
>
> The key idea is that this would allow attaching runtime meta information
> to cgroups and everything they model (services, apps, vms), that doesn't
> need any complex userspace infrastructure, has good access control
> (i.e. because the file system enforces that anyway, and there's the
> "trusted." xattr namespace), notifications (inotify), and can easily be
> shared among applications.
>
> Lennart
v7:
- no changes
v6:
- remove user xattr namespace, only allow trusted and security
v5:
- check for capabilities before setting/removing xattrs
v4:
- no changes
v3:
- instead of config option, use mount option to enable xattr support
Original-patch-by: Li Zefan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Lennart Poettering <[email protected]>
Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Aristeu Rozanski <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
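On a kernel carrying this patch, a cgroupfs mounted with the xattr option accepts trusted.* and security.* attributes (setting them requires the appropriate capability, per the v5/v6 notes). The sketch below exercises the same setxattr/getxattr round-trip unprivileged by using the user.* namespace on an ordinary file; the attribute name is invented for illustration, not taken from systemd:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Round-trip one extended attribute through a freshly created file.
 * Returns 1 on success, 0 if this filesystem rejects user xattrs
 * (e.g. ENOTSUP), -1 on unexpected failure. */
static int xattr_roundtrip(const char *name, const char *val)
{
    char path[] = "./xattr-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    int ret = -1;
    char buf[64] = {0};
    if (fsetxattr(fd, name, val, strlen(val), 0) != 0) {
        ret = 0;                        /* xattrs unsupported here */
    } else if (fgetxattr(fd, name, buf, sizeof(buf) - 1) ==
               (ssize_t)strlen(val) && strcmp(buf, val) == 0) {
        ret = 1;                        /* value read back intact */
    }
    close(fd);
    unlink(path);
    return ret;
}
```

Against a patched cgroupfs the same calls would target a cgroup directory and a trusted.* name instead.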
commit cac4522c594319e40a39b5427623996761b24b81
Author: Aristeu Rozanski <[email protected]>
Date: Thu Aug 23 16:53:28 2012 -0400
xattr: extract simple_xattr code from tmpfs
Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.
$ size vmlinux.o
text data bss dec hex filename
4658782 880729 5195032 10734543 a3cbcf vmlinux.o
$ size vmlinux.o
text data bss dec hex filename
4658957 880729 5195032 10734718 a3cc7e vmlinux.o
v7:
- checkpatch warnings fixed
- Implement the changes requested by Hugh Dickins:
- make simple_xattrs_init and simple_xattrs_free inline
- get rid of locking and list reinitialization in simple_xattrs_free,
they're not needed
v6:
- no changes
v5:
- no changes
v4:
- move simple_xattrs_free() to fs/xattr.c
v3:
- in kmem_xattrs_free(), reinitialize the list
- use simple_xattr_* prefix
- introduce simple_xattr_add() to prevent direct list usage
Original-patch-by: Li Zefan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Lennart Poettering <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Aristeu Rozanski <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Conflicts:
mm/shmem.c
commit cd8f6a121c4e0e49e9c15f4f1ab1e2c534a65d3b
Author: Aristeu Rozanski <[email protected]>
Date: Thu Aug 23 16:53:29 2012 -0400
cgroup: revise how we re-populate root directory
When remounting cgroupfs with some subsystems added to it and some
removed, cgroup will remove all the files in the root directory and then
re-populate it.
What I'm doing here is, only remove files which belong to subsystems that
are to be unbound, and only create files for newly-added subsystems.
The purpose is to leave all other files untouched.
This is a preparation for cgroup xattr support.
v7:
- checkpatch warnings fixed
v6:
- no changes
v5:
- no changes
v4:
- refactored cgroup_clear_directory() to not use cgroup_rm_file()
- instead of going through the list of files, get the file list using the
subsystems
- use 'subsys_mask' instead of {added,removed}_bits and made
cgroup_populate_dir() match the parameters of cgroup_clear_directory()
v3:
- refresh patches after recent refactoring
Original-patch-by: Li Zefan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Lennart Poettering <[email protected]>
Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Aristeu Rozanski <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
commit fa68f525a488e74d35f44e4cf5b1dbbc83ad8da9
Author: Tejun Heo <[email protected]>
Date: Tue Jul 3 10:38:06 2012 -0700
cgroup: cgroup_rm_files() was calling simple_unlink() with the wrong inode
While refactoring cgroup file removal path, 05ef1d7c4a "cgroup:
introduce struct cfent" incorrectly changed the @dir argument of
simple_unlink() to the inode of the file being deleted instead of that
of the containing directory.
The effect of this bug is minor - ctime and mtime of the parent
weren't properly updated on file deletion.
Fix it by using @cgrp->dentry->d_inode instead.
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Al Viro <[email protected]>
Acked-by: Li Zefan <[email protected]>
Cc: [email protected]
commit 8ca67b989e17a377f6aee5e251b0fa1b10ccd70f
Author: Tejun Heo <[email protected]>
Date: Sat Jul 7 16:08:18 2012 -0700
cgroup: fix cgroup hierarchy umount race
48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal
optional" allowed a css to linger after the associated cgroup is
removed. As a css holds a reference on the cgroup's dentry, it means
that cgroup dentries may linger for a while.
Destroying a superblock which has dentries with positive refcnts is a
critical bug and triggers BUG() in vfs code. As each cgroup dentry
holds an s_active reference, any lingering cgroup has both its dentry
and the superblock pinned, preventing premature release of the
superblock.
Unfortunately, after 48ddbe1946, there's a small window while
releasing a cgroup which is directly under the root of the hierarchy.
When a cgroup directory is released, vfs layer first deletes the
corresponding dentry and then invokes dput() on the parent, which may
recurse further, so when a cgroup directly below root cgroup is
released, the cgroup is first destroyed - which releases the s_active
it was holding - and then the dentry for the root cgroup is dput().
This creates a window where the root dentry's refcnt isn't zero but
superblock's s_active is. If umount happens before or during this
window, vfs will see the root dentry with non-zero refcnt and trigger
BUG().
Before 48ddbe1946, this problem didn't exist because the last dentry
reference was guaranteed to be put synchronously from rmdir(2)
invocation which holds s_active around the whole process.
Fix it by holding an extra superblock->s_active reference across
dput() from css release, which is the dput() path added by 48ddbe1946
and the only one which doesn't hold an extra s_active ref across the
final cgroup dput().
Signed-off-by: Tejun Heo <[email protected]>
LKML-Reference: <[email protected]>
Reported-by: shyju pv <[email protected]>
Tested-by: shyju pv <[email protected]>
Cc: Sasha Levin <[email protected]>
Acked-by: Li Zefan <[email protected]>
commit 7db74fed5e5137076ce2e2ad8de8bcff08fbaca2
Author: Tejun Heo <[email protected]>
Date: Sat Jul 7 15:55:47 2012 -0700
Revert "cgroup: superblock can't be released with active dentries"
This reverts commit fa980ca87d15bb8a1317853f257a505990f3ffde. The
commit was an attempt to fix a race condition where a cgroup hierarchy
may be unmounted with positive dentry reference on root cgroup. While
the commit made the race condition slightly more difficult to trigger,
the race was still there and could be reliably triggered using a
different test case.
Revert the incorrect fix. The next commit will describe the race and
fix it correctly.
Signed-off-by: Tejun Heo <[email protected]>
LKML-Reference: <[email protected]>
Reported-by: shyju pv <[email protected]>
Cc: Sasha Levin <[email protected]>
Acked-by: Li Zefan <[email protected]>
commit 8a817aa20830078f4027fe9c23ba30d79b1d9d06
Author: Salman Qazi <[email protected]>
Date: Thu Jun 14 14:55:30 2012 -0700
cgroups: Account for CSS_DEACT_BIAS in __css_put
When we fixed the race between atomic_dec and css_refcnt, we missed
the fact that css_refcnt internally subtracts CSS_DEACT_BIAS to get
the actual reference count. This can potentially cause a refcount leak
if __css_put races with cgroup_clear_css_refs.
Signed-off-by: Salman Qazi <[email protected]>
Acked-by: Li Zefan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
commit 8a0b43b1785f1eaf533693c65a5880a2951dbd1d
Author: Li Zefan <[email protected]>
Date: Wed Jun 6 19:12:30 2012 -0700
cgroup: remove hierarchy_mutex
It was introduced for memcg to iterate cgroup hierarchy without
holding cgroup_mutex, but soon after that it was replaced with
a lockless way in memcg.
No one has used hierarchy_mutex since then, so remove it.
Signed-off-by: Li Zefan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
commit 2d111e16082c524a12fcb86b7a7d09e46767d7e2
Author: Salman Qazi <[email protected]>
Date: Wed Jun 6 18:51:35 2012 -0700
cgroup: make sure that decisions in __css_put are atomic
__css_put is using atomic_dec on the ref count, and then
looking at the ref count to make decisions. This is prone
to races, as someone else may decrement ref count between
our decrement and our decision. Instead, we should base our
decisions on the value that we decremented the ref count to.
(This results in an actual race on Google's kernel which I
haven't been able to reproduce on the upstream kernel. Having
said that, it's still incorrect by inspection).
Signed-off-by: Salman Qazi <[email protected]>
Acked-by: Li Zefan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Cc: [email protected]
commit e742b3c8b306ebc06a1321b1437de20b935ac892
Author: Johannes Weiner <[email protected]>
Date: Tue May 29 15:06:24 2012 -0700
kernel: cgroup: push rcu read locking from css_is_ancestor() to callsite
Library functions should not grab locks when the callsites can do it,
even if the lock nests like the rcu read-side lock does.
Push the rcu_read_lock() from css_is_ancestor() to its single user,
mem_cgroup_same_or_subtree(), in preparation for another user that may
already hold the rcu read-side lock.
Signed-off-by: Johannes Weiner <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Li Zefan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
commit 4851db2641ccb6cdb2f32d9507505c52813562b3
Author: Tejun Heo <[email protected]>
Date: Thu May 24 08:24:39 2012 -0700
cgroup: superblock can't be released with active dentries
48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal
optional" allowed a css to linger after the associated cgroup is
removed. As a css holds a reference on the cgroup's dentry, it means
that cgroup dentries may linger for a while.
cgroup_create() does grab an active reference on the superblock to
prevent it from going away while there are !root cgroups; however, the
reference is put from cgroup_diput() which is invoked on cgroup
removal, so cgroup dentries which are removed but persisting due to
lingering csses have already released their superblock active refs,
allowing the superblock to be killed while those dentries are around.
Given the right condition, this makes cgroup_kill_sb() call
kill_litter_super() with dentries with non-zero d_count, leading to
BUG() in shrink_dcache_for_umount_subtree().
Fix it by adding cgroup_dops->d_release() operation and moving
deactivate_super() to it. cgroup_diput() now marks dentry->d_fsdata
with itself if superblock should be deactivated and cgroup_d_release()
deactivates the superblock on dentry release.
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Tested-by: Sasha Levin <[email protected]>
LKML-Reference: <CA+1xoqe5hMuxzCRhMy7J0XchDk2ZnuxOHJKikROk1-ReAzcT6g@mail.gmail.com>
Acked-by: Li Zefan <[email protected]>
commit ee7067598065cd269825ac9812040c9e2760d4b4
Author: Eric W. Biederman <[email protected]>
Date: Thu Nov 17 10:23:55 2011 -0800
userns: Add a Kconfig option to enforce strict kuid and kgid type checks
Make it possible to easily switch between strong mandatory type checks
and relaxed type checks, so that the code can be tested with the type
checks enabled and then built with them disabled for production use.
Require strong mandatory type checks when enabling the user namespace.
It is very simple to make a typo and use the wrong type, allowing
conversions to/from userspace values to be bypassed by accident; the
strong type checks prevent this.
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
commit c5601b73aab8bac90be8b69968bd99e76c0e516d
Author: Eric W. Biederman <[email protected]>
Date: Mon Nov 14 14:29:51 2011 -0800
userns: Add kuid_t and kgid_t and associated infrastructure in uidgid.h
Start distinguishing between internal kernel uids and gids and
values that userspace can use. This is done by introducing two
new types: kuid_t and kgid_t. These types and their associated
functions and infrastructure are declared in the new header
uidgid.h.
Ultimately there will be a different implementation of the mapping
functions for use with user namespaces. But to keep it simple
we introduce the mapping functions first, to separate the meat
from the mechanical code conversions.
Export overflowuid and overflowgid so we can use from_kuid_munged
and from_kgid_munged in modular code.
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
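The two userns commits hinge on one trick: wrapping a uid in a one-member struct makes kernel-internal and userspace-visible uids distinct C types, so mixing them is a compile error instead of a silent conversion. A simplified sketch of the uidgid.h idea (the real API also takes a struct user_namespace argument, omitted here; the identity mapping mirrors the pre-namespace implementation):

```c
#include <sys/types.h>

/* A kernel uid is a struct, not a bare integer: passing a raw uid_t
 * where a kuid_t is expected now fails to compile. */
typedef struct { uid_t val; } kuid_t;

#define KUIDT_INIT(v) (kuid_t){ .val = (v) }
#define INVALID_UID   KUIDT_INIT((uid_t)-1)

/* Exported so modular code can substitute it for unmappable ids. */
static uid_t overflowuid = 65534;

/* Identity mapping; a user-namespace implementation would translate. */
static kuid_t make_kuid(uid_t uid)  { return KUIDT_INIT(uid); }
static uid_t  from_kuid(kuid_t k)   { return k.val; }
static int    uid_valid(kuid_t k)   { return k.val != (uid_t)-1; }

/* "Munged" conversion: fall back to overflowuid instead of failing. */
static uid_t from_kuid_munged(kuid_t k)
{
    return uid_valid(k) ? from_kuid(k) : overflowuid;
}
```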
commit 531ca1b1f0f5a873d89950d9683f44d9f7001cd3
Author: Mike Galbraith <[email protected]>
Date: Sat Apr 21 09:13:46 2012 +0200
cgroups: disallow attaching kthreadd or PF_THREAD_BOUND threads
Allowing kthreadd to be moved to a non-root group makes no sense, it being
a global resource, and needlessly leads unsuspecting users toward trouble.
1. An RT workqueue worker thread spawned in a task group with no rt_runtime
allocated is not schedulable. Simple user error, but harmful to the box.
2. A worker thread which acquires PF_THREAD_BOUND can never leave a cpuset,
rendering the cpuset immortal.
Save the user some unexpected trouble, just say no.
Signed-off-by: Mike Galbraith <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Acked-by: Li Zefan <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
commit 5e4cba06cbe7fcd2b3ffee5c62f79acf4d126e92
Author: Tejun Heo <[email protected]>
Date: Tue Apr 10 10:16:36 2012 -0700
cgroup: remove cgroup_subsys->populate()
With memcg converted, cgroup_subsys->populate() doesn't have any user
left. Remove it.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
commit a7a79b552a3bf97788d8d22f2888a4f3310c128e
Author: Glauber Costa <[email protected]>
Date: Mon Apr 9 19:36:34 2012 -0300
cgroup: get rid of populate for memcg
The last man standing justifying the need for populate() is the
sock memcg initialization functions. Now that we are able to pass
a struct mem_cgroup instead of a struct cgroup to the socket
initialization, there is nothing that stops us from initializing
everything in create().
Signed-off-by: Glauber Costa <[email protected]>
Acked-by: Kamezawa Hiroyuki <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
CC: Li Zefan <[email protected]>
CC: Johannes Weiner <[email protected]>
CC: Michal Hocko <[email protected]>
Conflicts:
mm/memcontrol.c
commit bed6daf96599cbfdec259507eaccd1d1967f2a9a
Author: Glauber Costa <[email protected]>
Date: Mon Apr 9 19:36:33 2012 -0300
cgroup: pass struct mem_cgroup instead of struct cgroup to socket memcg
The only reason cgroup was used was to be consistent with the populate()
interface. Now that we're getting rid of it, not only do we no longer need
it, but we also *can't* call it this way.
Since we will no longer rely on populate(), this will be called from
create(). During create, the association between struct mem_cgroup
and struct cgroup does not yet exist, since cgroup internals haven't
yet initialized their bookkeeping. This means we would not be able
to derive the memcg pointer from the cgroup pointer in these
functions, which is highly undesirable.
Signed-off-by: Glauber Costa <[email protected]>
Acked-by: Kamezawa Hiroyuki <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
CC: Li Zefan <[email protected]>
CC: Johannes Weiner <[email protected]>
CC: Michal Hocko <[email protected]>
commit 9a98fea2f9879cc84238cfb57fccf0956627cea1
Author: Tejun Heo <[email protected]>
Date: Sun Apr 1 12:09:56 2012 -0700
cgroup: make css->refcnt clearing on cgroup removal optional
Currently, cgroup removal tries to drain all css references. If there
are active css references, the removal logic waits and retries
->pre_destroy() until either all refs drop to zero or removal is
cancelled.
These semantics are unusual and add non-trivial complexity to cgroup
core and IMHO are fundamentally misguided in that they couple internal
implementation details (references to internal data structure) with
externally visible operation (rmdir). To userland, this is a behavior
peculiarity which is unnecessary and difficult to expect (css refs is
otherwise invisible from userland), and, to policy implementations,
this is an unnecessary restriction (e.g. blkcg wants to hold css refs
for caching purposes but can't as that becomes visible as rmdir hang).
Unfortunately, memcg currently depends on ->pre_destroy() retries and
cgroup removal vetoing and can't be immediately switched to the new
behavior. This patch introduces the new behavior of not waiting for
css refs to drain and maintains the old behavior for subsystems which
have __DEPRECATED_clear_css_refs set.
Once memcg is updated, we can drop the code paths for the old
behavior as proposed in the following patch. Note that the following
patch is incorrect in that the dput work item is in cgroup and may lose
some of the dputs when multiple css's are released back-to-back, and
__css_put() triggers check_for_release() when refcnt reaches 0 instead
of 1; however, it shows what part can be removed.
http://thread.gmane.org/gmane.linux.kernel.containers/22559/focus=75251
Note that, in the not-too-distant future, cgroup core will start emitting
warning messages for subsystems which require the old behavior, so please
get moving.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Conflicts:
kernel/cgroup.c
commit 37e6593be2a784f62e3b7c350a0b778dbc62f922
Author: Tejun Heo <[email protected]>
Date: Sun Apr 1 12:09:56 2012 -0700
cgroup: use negative bias on css->refcnt to block css_tryget()
When a cgroup is about to be removed, cgroup_clear_css_refs() is
called to check and ensure that there are no active css references.
This is currently achieved by dropping the refcnt to zero iff it has
only the base ref. If all css refs could be dropped to zero, ref
clearing is successful and CSS_REMOVED is set on all css. If not, the
base ref is restored. While css ref is zero w/o CSS_REMOVED set, any
css_tryget() attempt on it busy loops so that they are atomic
w.r.t. the whole css ref clearing.
This does work but dropping and re-instating the base ref is somewhat
hairy and makes it difficult to add more logic to the put path as
there are two of them - the regular css_put() and the reversible base
ref clearing.
This patch updates css ref clearing such that blocking new
css_tryget() and putting the base ref are separate operations.
CSS_DEACT_BIAS, defined as INT_MIN, is added to css->refcnt and
css_tryget() busy loops while refcnt is negative. After all css refs
are deactivated, if they were all one, ref clearing succeeded and
CSS_REMOVED is set and the base ref is put using the regular
css_put(); otherwise, CSS_DEACT_BIAS is subtracted from the refcnts
and the original positive values are restored.
css_refcnt() accessor which always returns the unbiased positive
reference counts is added and used to simplify refcnt usages. While
at it, relocate and reformat comments in cgroup_has_css_refs().
This separates css->refcnt deactivation and putting the base ref,
which enables the next patch to make ref clearing optional.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
Conflicts:
kernel/cgroup.c
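The bias trick described above can be sketched in userspace stdatomic. This is a simplification under stated assumptions: the real css_tryget() busy-loops on a negative count until CSS_REMOVED is decided, while this sketch just fails fast, and the surrounding locking is omitted:

```c
#include <limits.h>
#include <stdatomic.h>

#define CSS_DEACT_BIAS INT_MIN   /* added to refcnt to force it negative */

/* Grab a reference only while the count is positive; a deactivated
 * (biased-negative) count refuses new references. */
static int css_tryget(atomic_int *refcnt)
{
    int v = atomic_load(refcnt);
    while (v > 0) {
        if (atomic_compare_exchange_weak(refcnt, &v, v + 1))
            return 1;            /* reference taken */
    }
    return 0;                    /* zero or deactivated: no new refs */
}

/* Deactivate: bias the count negative without touching the base ref. */
static void css_deactivate(atomic_int *refcnt)
{
    atomic_fetch_add(refcnt, CSS_DEACT_BIAS);
}

/* Unbiased view of the count, as the css_refcnt() accessor provides. */
static int css_refcnt(atomic_int *refcnt)
{
    int v = atomic_load(refcnt);
    return v >= 0 ? v : v - CSS_DEACT_BIAS;
}
```

Because INT_MIN dwarfs any realistic reference count, a biased count is always negative, so tryget and deactivation need no extra flag word.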
commit 6ae80cc8464ab30d769e70241d61ddd45865f3bd
Author: Tejun Heo <[email protected]>
Date: Sun Apr 1 12:09:56 2012 -0700
cgroup: implement cgroup_rm_cftypes()
Implement cgroup_rm_cftypes() which removes an array of cftypes from a
subsystem. It can be called whether the target subsys is attached or
not. cgroup core will remove the specified file from all existing
cgroups.
This will be used to improve sub-subsys modularity and will be helpful
for unified hierarchy.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
commit d73a46c95efe8514724ad48fcac7dfb29bcac923
Author: Tejun Heo <[email protected]>
Date: Sun Apr 1 12:09:56 2012 -0700
cgroup: introduce struct cfent
This patch adds cfent (cgroup file entry) which is the association
between a cgroup and a file. This is in-cgroup representation of
files under a cgroup directory. This simplifies walking
cgroup files and thus cgroup_clear_directory(), which is now
implemented in two parts - cgroup_rm_file() and a loop around it.
cgroup_rm_file() will be used to implement cftype removal and cfent is
scheduled to serve cgroup specific per-file data (e.g. for sysfs-like
"sever" semantics).
v2: - cfe was freed from cgroup_rm_file() which led to use-after-free
if the file had openers at the time of removal. Moved to
cgroup_diput().
- cgroup_clear_directory() triggered WARN_ON_ONCE() if d_subdirs
wasn't empty after removing all files. This triggered
spuriously if some files were open during directory clearing.
Removed.
v3: - In cgroup_diput(), WARN_ONCE(!list_empty(&cfe->node)) could be
spuriously triggered for root cgroups because they don't go
through cgroup_clear_directory() on unmount. Don't trigger WARN
for root cgroups.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
Cc: Glauber Costa <[email protected]>
commit 82408a55564678646baf265768ce3d33b064e25c
Author: Tejun Heo <[email protected]>
Date: Sun Apr 1 12:09:55 2012 -0700
cgroup: relocate __d_cgrp() and __d_cft()
Move the two macros upwards as they'll be used earlier in the file.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
commit 9fbb0bb71a3bbf8b30278100e7804a52e9b25e62
Author: Tejun Heo <[email protected]>
Date: Sun Apr 1 12:09:55 2012 -0700
cgroup: remove cgroup_add_file[s]()
No controller is using cgroup_add_file[s](). Unexport them, and
convert cgroup_add_files() to handle a NULL-entry terminated array
instead of taking the count explicitly and to continue creation on
failure for internal use.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
commit 9e8ad7804f72e8b6eff7d959390f0a2ce369fdd9
Author: Tejun Heo <[email protected]>
Date: Sun Apr 1 12:09:55 2012 -0700
cgroup: convert memcg controller to the new cftype interface
Convert memcg to use the new cftype based interface. kmem support
abuses ->populate() for mem_cgroup_sockets_init() so it can't be
removed at the moment.
tcp_memcontrol is updated so that tcp_files[] is registered via a
__initcall. This change also allows removing the forward declaration
of tcp_files[]. Removed.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Glauber Costa <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Greg Thelen <[email protected]>
commit b32023e8c10f97dbc094c7030f5b9d9f760e4935
Author: Tejun Heo <[email protected]>
Date: Sun Apr 1 12:09:55 2012 -0700
memcg: always create memsw files if CONFIG_CGROUP_MEM_RES_CTLR_SWAP
Instead of conditioning creation of memsw files on do_swap_account,
always create the files if compiled-in and fail read/write attempts
with -EOPNOTSUPP if !do_swap_account.
This is suggested by KAMEZAWA to simplify memcg file creation so that
it can use cgroup->subsys_cftypes.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Li Zefan <[email protected]>
commit 304613589145ed4689da8a882d9b6652b93f0b1e
Author: Tejun Heo <[email protected]>
Date: Sun Apr 1 12:09:55 2012 -0700
cgroup: convert all non-memcg controllers to the new cftype interface
Convert debug, freezer, cpuset, cpu_cgroup, cpuacct, net_prio, blkio,
net_cls and device controllers to use the new cftype based interface.
Termination entry is added to cftype arrays and populate callbacks are
replaced with cgroup_subsys->base_cftypes initializations.
This is a functionally identical transformation. There shouldn't be any
visible behavior change.
memcg is rather special and will be converted separately.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
Cc: Paul Menage <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Vivek Goyal <[email protected]>
commit 45da257f6dcf9fdfab5a7594f271397cbc731f2b
Author: Tejun Heo <[email protected]>
Date: Sun Apr 1 12:09:55 2012 -0700
cgroup: relocate cftype and cgroup_subsys definitions in controllers
blk-cgroup, netprio_cgroup, cls_cgroup and tcp_memcontrol
unnecessarily define cftype array and cgroup_subsys structures at the
top of the file, which is unconventional and necessitates forward
declaration of methods.
This patch relocates those below the definitions of the methods and
removes the forward declarations. Note that forward declaration of
tcp_files[] is added in tcp_memcontrol.c for tcp_init_cgroup(). This
will be removed soon by another patch.
This patch doesn't introduce any functional change.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
commit 9799ae980cdd1e311eba6b7e7f9f13aa3528a42d
Author: Tejun Heo <[email protected]>
Date: Sun Apr 1 12:09:55 2012 -0700
cgroup: merge cft_release_agent cftype array into the base files array
Now that cftype can express whether a file should only be on root,
cft_release_agent can be merged into the base files cftypes array.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
commit bd6bb9b83926409b7ee13f03def2161266ca6d30 | |
Author: Tejun Heo <[email protected]> | |
Date: Sun Apr 1 12:09:55 2012 -0700 | |
cgroup: implement cgroup_add_cftypes() and friends | |
Currently, cgroup directories are populated by subsys->populate() | |
callback explicitly creating files on each cgroup creation. This | |
level of flexibility isn't needed or desirable. It provides largely | |
unused flexibility which call for abuses while severely limiting what | |
the core layer can do through the lack of structure and conventions. | |
Per each cgroup file type, the only distinction that cgroup users is | |
making is whether a cgroup is root or not, which can easily be | |
expressed with flags. | |
This patch introduces cgroup_add_cftypes(). These deal with cftypes | |
instead of individual files - controllers indicate that certain types | |
of files exist for certain subsystem. Newly added CFTYPE_*_ON_ROOT | |
flags indicate whether a cftype should be excluded or created only on | |
the root cgroup. | |
cgroup_add_cftypes() can be called any time whether the target | |
subsystem is currently attached or not. cgroup core will create files | |
on the existing cgroups as necessary. | |
Also, cgroup_subsys->base_cftypes is added to ease registration of the | |
base files for the subsystem. If non-NULL on subsys init, the cftypes | |
pointed to by ->base_cftypes are automatically registered on subsys | |
init / load. | |
Further patches will convert the existing users and remove the file | |
based interface. Note that this interface allows dynamic addition of | |
files to an active controller. This will be used for sub-controller | |
modularity and unified hierarchy in the longer term. | |
This patch implements the new mechanism but doesn't apply it to any | |
user. | |
v2: replaced DECLARE_CGROUP_CFTYPES[_COND]() with | |
cgroup_subsys->base_cftypes, which works better for a cgroup_subsys | |
that is loaded as a module. | |
Signed-off-by: Tejun Heo <[email protected]> | |
Acked-by: Li Zefan <[email protected]> | |
commit f7c3e8e02ead0cffbbb1e80a20dab96317950429 | |
Author: Tejun Heo <[email protected]> | |
Date: Sun Apr 1 12:09:54 2012 -0700 | |
cgroup: build list of all cgroups under a given cgroupfs_root | |
Build a list of all cgroups anchored at cgroupfs_root->allcg_list and | |
going through cgroup->allcg_node. The list is protected by | |
cgroup_mutex and will be used to improve cgroup file handling. | |
Signed-off-by: Tejun Heo <[email protected]> | |
Acked-by: Li Zefan <[email protected]> | |
commit 5b4f71cda30e97cdc874ca8a98992135de189d0b | |
Author: Tejun Heo <[email protected]> | |
Date: Sun Apr 1 12:09:54 2012 -0700 | |
cgroup: move cgroup_clear_directory() call out of cgroup_populate_dir() | |
cgroup_populate_dir() currently clears all files and then repopulates | |
the directory; however, the clearing part is only useful when it's | |
called from cgroup_remount(). Relocate the invocation to | |
cgroup_remount(). | |
This is to prepare for further cgroup file handling updates. | |
Signed-off-by: Tejun Heo <[email protected]> | |
Acked-by: Li Zefan <[email protected]> | |
commit d8d6a3fa984898b1256b9c126c31acf85eb02d72 | |
Author: Tejun Heo <[email protected]> | |
Date: Sun Apr 1 12:09:54 2012 -0700 | |
cgroup: deprecate remount option changes | |
This patch marks the following features for deprecation. | |
* Rebinding subsys by remount: Never reached useful state - only works | |
on empty hierarchies. | |
* release_agent update by remount: release_agent itself will be | |
replaced with conventional fsnotify notification. | |
v2: Lennart pointed out that "name=" is necessary for mounts w/o any | |
controller attached. Drop "name=" deprecation. | |
Signed-off-by: Tejun Heo <[email protected]> | |
Acked-by: Li Zefan <[email protected]> | |
Cc: Lennart Poettering <[email protected]> | |
Conflicts: | |
Documentation/feature-removal-schedule.txt | |
commit 4e5190f3c4ca056efdead0f98d081fc5f8c7aa11 | |
Author: Daisuke Nishimura <[email protected]> | |
Date: Wed Mar 10 15:22:05 2010 -0800 | |
cgroup: introduce coalesce css_get() and css_put() | |
Current css_get() and css_put() increment/decrement css->refcnt one by | |
one. | |
This patch adds a new function, __css_get(), which takes "count" as an | |
arg and increments css->refcnt by "count". It also adds a new | |
arg ("count") to __css_put() and changes the function to decrement | |
css->refcnt by "count". | |
These coalesced versions of __css_get()/__css_put() will later be used to | |
improve the performance of memcg's moving-charge feature: instead of | |
calling css_get()/css_put() repeatedly, these new functions will be used. | |
No change is needed for current users of css_get()/css_put(). | |
Signed-off-by: Daisuke Nishimura <[email protected]> | |
Acked-by: Paul Menage <[email protected]> | |
Cc: Balbir Singh <[email protected]> | |
Acked-by: KAMEZAWA Hiroyuki <[email protected]> | |
Cc: Li Zefan <[email protected]> | |
Cc: Daisuke Nishimura <[email protected]> | |
Signed-off-by: Andrew Morton <[email protected]> | |
Signed-off-by: Linus Torvalds <[email protected]> | |
Conflicts: | |
kernel/cgroup.c | |
--- | |
Documentation/cgroups/cgroups.txt | 2 +- | |
Documentation/feature-removal-schedule.txt | 21 +- | |
block/blk-cgroup.c | 45 +- | |
fs/xattr.c | 166 +++++++ | |
include/linux/cgroup.h | 107 ++-- | |
include/linux/shmem_fs.h | 3 +- | |
include/linux/uidgid.h | 176 +++++++ | |
include/linux/xattr.h | 48 ++ | |
include/net/sock.h | 12 +- | |
include/net/tcp_memcontrol.h | 4 +- | |
init/Kconfig | 12 +- | |
kernel/cgroup.c | 774 +++++++++++++++++++++-------- | |
kernel/cgroup_freezer.c | 11 +- | |
kernel/cpuset.c | 31 +- | |
kernel/sched/core.c | 16 +- | |
kernel/sys.c | 2 - | |
mm/memcontrol.c | 130 +++-- | |
mm/shmem.c | 181 +------ | |
net/core/netprio_cgroup.c | 30 +- | |
net/core/sock.c | 10 +- | |
net/ipv4/tcp_memcontrol.c | 77 ++- | |
net/sched/cls_cgroup.c | 31 +- | |
security/device_cgroup.c | 10 +- | |
23 files changed, 1241 insertions(+), 658 deletions(-) | |
create mode 100644 include/linux/uidgid.h | |
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt | |
index 594ff17..66ad6bd 100644 | |
--- a/Documentation/cgroups/cgroups.txt | |
+++ b/Documentation/cgroups/cgroups.txt | |
@@ -665,7 +665,7 @@ example in cpusets, no task may attach before 'cpus' and 'mems' are set | |
up. | |
void bind(struct cgroup *root) | |
-(cgroup_mutex and ss->hierarchy_mutex held by caller) | |
+(cgroup_mutex held by caller) | |
Called when a cgroup subsystem is rebound to a different hierarchy | |
and root cgroup. Currently this will only involve movement between | |
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt | |
index e4b5775..0eb7118 100644 | |
--- a/Documentation/feature-removal-schedule.txt | |
+++ b/Documentation/feature-removal-schedule.txt | |
@@ -534,18 +534,9 @@ Who: Kees Cook <[email protected]> | |
---------------------------- | |
-What: setitimer accepts user NULL pointer (value) | |
-When: 3.6 | |
-Why: setitimer is not returning -EFAULT if user pointer is NULL. This | |
- violates the spec. | |
-Who: Sasikantha Babu <[email protected]> | |
- | |
----------------------------- | |
- | |
-What: V4L2_CID_HCENTER, V4L2_CID_VCENTER V4L2 controls | |
-When: 3.7 | |
-Why: The V4L2_CID_VCENTER, V4L2_CID_HCENTER controls have been deprecated | |
- for about 4 years and they are not used by any mainline driver. | |
- There are newer controls (V4L2_CID_PAN*, V4L2_CID_TILT*) that provide | |
- similar functionality. | |
-Who: Sylwester Nawrocki <[email protected]> | |
+What: cgroup option updates via remount | |
+When: March 2013 | |
+Why: Remount currently allows changing bound subsystems and | |
+ release_agent. Rebinding is hardly useful as it only works | |
+ when the hierarchy is empty and release_agent itself should be | |
+ replaced with conventional fsnotify. | |
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c | |
index ea84a23..126c341 100644 | |
--- a/block/blk-cgroup.c | |
+++ b/block/blk-cgroup.c | |
@@ -28,34 +28,12 @@ static LIST_HEAD(blkio_list); | |
struct blkio_cgroup blkio_root_cgroup = { .weight = 2*BLKIO_WEIGHT_DEFAULT }; | |
EXPORT_SYMBOL_GPL(blkio_root_cgroup); | |
-static struct cgroup_subsys_state *blkiocg_create(struct cgroup *); | |
-static int blkiocg_can_attach(struct cgroup *, struct cgroup_taskset *); | |
-static void blkiocg_attach(struct cgroup *, struct cgroup_taskset *); | |
-static void blkiocg_destroy(struct cgroup *); | |
-static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *); | |
- | |
/* for encoding cft->private value on file */ | |
#define BLKIOFILE_PRIVATE(x, val) (((x) << 16) | (val)) | |
/* What policy owns the file, proportional or throttle */ | |
#define BLKIOFILE_POLICY(val) (((val) >> 16) & 0xffff) | |
#define BLKIOFILE_ATTR(val) ((val) & 0xffff) | |
-struct cgroup_subsys blkio_subsys = { | |
- .name = "blkio", | |
- .create = blkiocg_create, | |
- .can_attach = blkiocg_can_attach, | |
- .attach = blkiocg_attach, | |
- .destroy = blkiocg_destroy, | |
- .populate = blkiocg_populate, | |
-#ifdef CONFIG_BLK_CGROUP | |
- /* note: blkio_subsys_id is otherwise defined in blk-cgroup.h */ | |
- .subsys_id = blkio_subsys_id, | |
-#endif | |
- .use_id = 1, | |
- .module = THIS_MODULE, | |
-}; | |
-EXPORT_SYMBOL_GPL(blkio_subsys); | |
- | |
static inline void blkio_policy_insert_node(struct blkio_cgroup *blkcg, | |
struct blkio_policy_node *pn) | |
{ | |
@@ -1537,14 +1515,9 @@ struct cftype blkio_files[] = { | |
.read_map = blkiocg_file_read_map, | |
}, | |
#endif | |
+ { } /* terminate */ | |
}; | |
-static int blkiocg_populate(struct cgroup_subsys *subsys, struct cgroup *cgroup) | |
-{ | |
- return cgroup_add_files(cgroup, subsys, blkio_files, | |
- ARRAY_SIZE(blkio_files)); | |
-} | |
- | |
static void blkiocg_destroy(struct cgroup *cgroup) | |
{ | |
struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgroup); | |
@@ -1658,6 +1631,22 @@ static void blkiocg_attach(struct cgroup *cgrp, struct cgroup_taskset *tset) | |
} | |
} | |
+struct cgroup_subsys blkio_subsys = { | |
+ .name = "blkio", | |
+ .create = blkiocg_create, | |
+ .can_attach = blkiocg_can_attach, | |
+ .attach = blkiocg_attach, | |
+ .destroy = blkiocg_destroy, | |
+#ifdef CONFIG_BLK_CGROUP | |
+ /* note: blkio_subsys_id is otherwise defined in blk-cgroup.h */ | |
+ .subsys_id = blkio_subsys_id, | |
+#endif | |
+ .base_cftypes = blkio_files, | |
+ .use_id = 1, | |
+ .module = THIS_MODULE, | |
+}; | |
+EXPORT_SYMBOL_GPL(blkio_subsys); | |
+ | |
void blkio_policy_register(struct blkio_policy_type *blkiop) | |
{ | |
spin_lock(&blkio_list_lock); | |
diff --git a/fs/xattr.c b/fs/xattr.c | |
index 3c8c1cc..050f63a 100644 | |
--- a/fs/xattr.c | |
+++ b/fs/xattr.c | |
@@ -779,3 +779,169 @@ EXPORT_SYMBOL(generic_getxattr); | |
EXPORT_SYMBOL(generic_listxattr); | |
EXPORT_SYMBOL(generic_setxattr); | |
EXPORT_SYMBOL(generic_removexattr); | |
+ | |
+/* | |
+ * Allocate new xattr and copy in the value; but leave the name to callers. | |
+ */ | |
+struct simple_xattr *simple_xattr_alloc(const void *value, size_t size) | |
+{ | |
+ struct simple_xattr *new_xattr; | |
+ size_t len; | |
+ | |
+ /* wrap around? */ | |
+ len = sizeof(*new_xattr) + size; | |
+ if (len <= sizeof(*new_xattr)) | |
+ return NULL; | |
+ | |
+ new_xattr = kmalloc(len, GFP_KERNEL); | |
+ if (!new_xattr) | |
+ return NULL; | |
+ | |
+ new_xattr->size = size; | |
+ memcpy(new_xattr->value, value, size); | |
+ return new_xattr; | |
+} | |
+ | |
+/* | |
+ * xattr GET operation for in-memory/pseudo filesystems | |
+ */ | |
+int simple_xattr_get(struct simple_xattrs *xattrs, const char *name, | |
+ void *buffer, size_t size) | |
+{ | |
+ struct simple_xattr *xattr; | |
+ int ret = -ENODATA; | |
+ | |
+ spin_lock(&xattrs->lock); | |
+ list_for_each_entry(xattr, &xattrs->head, list) { | |
+ if (strcmp(name, xattr->name)) | |
+ continue; | |
+ | |
+ ret = xattr->size; | |
+ if (buffer) { | |
+ if (size < xattr->size) | |
+ ret = -ERANGE; | |
+ else | |
+ memcpy(buffer, xattr->value, xattr->size); | |
+ } | |
+ break; | |
+ } | |
+ spin_unlock(&xattrs->lock); | |
+ return ret; | |
+} | |
+ | |
+static int __simple_xattr_set(struct simple_xattrs *xattrs, const char *name, | |
+ const void *value, size_t size, int flags) | |
+{ | |
+ struct simple_xattr *xattr; | |
+ struct simple_xattr *new_xattr = NULL; | |
+ int err = 0; | |
+ | |
+ /* value == NULL means remove */ | |
+ if (value) { | |
+ new_xattr = simple_xattr_alloc(value, size); | |
+ if (!new_xattr) | |
+ return -ENOMEM; | |
+ | |
+ new_xattr->name = kstrdup(name, GFP_KERNEL); | |
+ if (!new_xattr->name) { | |
+ kfree(new_xattr); | |
+ return -ENOMEM; | |
+ } | |
+ } | |
+ | |
+ spin_lock(&xattrs->lock); | |
+ list_for_each_entry(xattr, &xattrs->head, list) { | |
+ if (!strcmp(name, xattr->name)) { | |
+ if (flags & XATTR_CREATE) { | |
+ xattr = new_xattr; | |
+ err = -EEXIST; | |
+ } else if (new_xattr) { | |
+ list_replace(&xattr->list, &new_xattr->list); | |
+ } else { | |
+ list_del(&xattr->list); | |
+ } | |
+ goto out; | |
+ } | |
+ } | |
+ if (flags & XATTR_REPLACE) { | |
+ xattr = new_xattr; | |
+ err = -ENODATA; | |
+ } else { | |
+ list_add(&new_xattr->list, &xattrs->head); | |
+ xattr = NULL; | |
+ } | |
+out: | |
+ spin_unlock(&xattrs->lock); | |
+ if (xattr) { | |
+ kfree(xattr->name); | |
+ kfree(xattr); | |
+ } | |
+ return err; | |
+ | |
+} | |
+ | |
+/* | |
+ * xattr SET operation for in-memory/pseudo filesystems | |
+ */ | |
+int simple_xattr_set(struct simple_xattrs *xattrs, const char *name, | |
+ const void *value, size_t size, int flags) | |
+{ | |
+ if (size == 0) | |
+ value = ""; /* empty EA, do not remove */ | |
+ return __simple_xattr_set(xattrs, name, value, size, flags); | |
+} | |
+ | |
+/* | |
+ * xattr REMOVE operation for in-memory/pseudo filesystems | |
+ */ | |
+int simple_xattr_remove(struct simple_xattrs *xattrs, const char *name) | |
+{ | |
+ return __simple_xattr_set(xattrs, name, NULL, 0, XATTR_REPLACE); | |
+} | |
+ | |
+static bool xattr_is_trusted(const char *name) | |
+{ | |
+ return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN); | |
+} | |
+ | |
+/* | |
+ * xattr LIST operation for in-memory/pseudo filesystems | |
+ */ | |
+ssize_t simple_xattr_list(struct simple_xattrs *xattrs, char *buffer, | |
+ size_t size) | |
+{ | |
+ bool trusted = capable(CAP_SYS_ADMIN); | |
+ struct simple_xattr *xattr; | |
+ size_t used = 0; | |
+ | |
+ spin_lock(&xattrs->lock); | |
+ list_for_each_entry(xattr, &xattrs->head, list) { | |
+ size_t len; | |
+ | |
+ /* skip "trusted." attributes for unprivileged callers */ | |
+ if (!trusted && xattr_is_trusted(xattr->name)) | |
+ continue; | |
+ | |
+ len = strlen(xattr->name) + 1; | |
+ used += len; | |
+ if (buffer) { | |
+ if (size < used) { | |
+ used = -ERANGE; | |
+ break; | |
+ } | |
+ memcpy(buffer, xattr->name, len); | |
+ buffer += len; | |
+ } | |
+ } | |
+ spin_unlock(&xattrs->lock); | |
+ | |
+ return used; | |
+} | |
+ | |
+void simple_xattr_list_add(struct simple_xattrs *xattrs, | |
+ struct simple_xattr *new_xattr) | |
+{ | |
+ spin_lock(&xattrs->lock); | |
+ list_add(&new_xattr->list, &xattrs->head); | |
+ spin_unlock(&xattrs->lock); | |
+} | |
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h | |
index 6719089..d752505 100644 | |
--- a/include/linux/cgroup.h | |
+++ b/include/linux/cgroup.h | |
@@ -16,6 +16,8 @@ | |
#include <linux/prio_heap.h> | |
#include <linux/rwsem.h> | |
#include <linux/idr.h> | |
+#include <linux/workqueue.h> | |
+#include <linux/xattr.h> | |
#ifdef CONFIG_CGROUPS | |
@@ -75,14 +77,24 @@ struct cgroup_subsys_state { | |
unsigned long flags; | |
/* ID for this css, if possible */ | |
struct css_id __rcu *id; | |
+ | |
+ /* Used to put @cgroup->dentry on the last css_put() */ | |
+ struct work_struct dput_work; | |
}; | |
/* bits in struct cgroup_subsys_state flags field */ | |
enum { | |
CSS_ROOT, /* This CSS is the root of the subsystem */ | |
CSS_REMOVED, /* This CSS is dead */ | |
+ CSS_CLEAR_CSS_REFS, /* @ss->__DEPRECATED_clear_css_refs */ | |
}; | |
+/* Caller must verify that the css is not for root cgroup */ | |
+static inline void __css_get(struct cgroup_subsys_state *css, int count) | |
+{ | |
+ atomic_add(count, &css->refcnt); | |
+} | |
+ | |
/* | |
* Call css_get() to hold a reference on the css; it can be used | |
* for a reference obtained via: | |
@@ -109,16 +121,12 @@ static inline bool css_is_removed(struct cgroup_subsys_state *css) | |
* the css has been destroyed. | |
*/ | |
+extern bool __css_tryget(struct cgroup_subsys_state *css); | |
static inline bool css_tryget(struct cgroup_subsys_state *css) | |
{ | |
if (test_bit(CSS_ROOT, &css->flags)) | |
return true; | |
- while (!atomic_inc_not_zero(&css->refcnt)) { | |
- if (test_bit(CSS_REMOVED, &css->flags)) | |
- return false; | |
- cpu_relax(); | |
- } | |
- return true; | |
+ return __css_tryget(css); | |
} | |
/* | |
@@ -126,11 +134,11 @@ static inline bool css_tryget(struct cgroup_subsys_state *css) | |
* css_get() or css_tryget() | |
*/ | |
-extern void __css_put(struct cgroup_subsys_state *css, int count); | |
+extern void __css_put(struct cgroup_subsys_state *css); | |
static inline void css_put(struct cgroup_subsys_state *css) | |
{ | |
if (!test_bit(CSS_ROOT, &css->flags)) | |
- __css_put(css, 1); | |
+ __css_put(css); | |
} | |
/* bits in struct cgroup flags field */ | |
@@ -166,6 +174,7 @@ struct cgroup { | |
*/ | |
struct list_head sibling; /* my parent's children */ | |
struct list_head children; /* my children */ | |
+ struct list_head files; /* my files */ | |
struct cgroup *parent; /* my parent */ | |
struct dentry __rcu *dentry; /* cgroup fs entry, RCU protected */ | |
@@ -182,6 +191,9 @@ struct cgroup { | |
*/ | |
struct list_head css_sets; | |
+ struct list_head allcg_node; /* cgroupfs_root->allcg_list */ | |
+ struct list_head cft_q_node; /* used during cftype add/rm */ | |
+ | |
/* | |
* Linked list running through all cgroups that can | |
* potentially be reaped by the release agent. Protected by | |
@@ -202,6 +214,9 @@ struct cgroup { | |
/* List of events which userspace want to receive */ | |
struct list_head event_list; | |
spinlock_t event_list_lock; | |
+ | |
+ /* directory xattrs */ | |
+ struct simple_xattrs xattrs; | |
}; | |
/* | |
@@ -267,11 +282,17 @@ struct cgroup_map_cb { | |
* - the 'cftype' of the file is file->f_dentry->d_fsdata | |
*/ | |
-#define MAX_CFTYPE_NAME 64 | |
+/* cftype->flags */ | |
+#define CFTYPE_ONLY_ON_ROOT (1U << 0) /* only create on root cg */ | |
+#define CFTYPE_NOT_ON_ROOT (1U << 1) /* don't create on root cg */ | |
+ | |
+#define MAX_CFTYPE_NAME 64 | |
+ | |
struct cftype { | |
/* | |
* By convention, the name should begin with the name of the | |
- * subsystem, followed by a period | |
+ * subsystem, followed by a period. Zero length string indicates | |
+ * end of cftype array. | |
*/ | |
char name[MAX_CFTYPE_NAME]; | |
int private; | |
@@ -287,6 +308,12 @@ struct cftype { | |
*/ | |
size_t max_write_len; | |
+ /* CFTYPE_* flags */ | |
+ unsigned int flags; | |
+ | |
+ /* file xattrs */ | |
+ struct simple_xattrs xattrs; | |
+ | |
int (*open)(struct inode *inode, struct file *file); | |
ssize_t (*read)(struct cgroup *cgrp, struct cftype *cft, | |
struct file *file, | |
@@ -365,6 +392,16 @@ struct cftype { | |
struct eventfd_ctx *eventfd); | |
}; | |
+/* | |
+ * cftype_sets describe cftypes belonging to a subsystem and are chained at | |
+ * cgroup_subsys->cftsets. Each cftset points to an array of cftypes | |
+ * terminated by zero length name. | |
+ */ | |
+struct cftype_set { | |
+ struct list_head node; /* chained at subsys->cftsets */ | |
+ struct cftype *cfts; | |
+}; | |
+ | |
struct cgroup_scanner { | |
struct cgroup *cg; | |
int (*test_task)(struct task_struct *p, struct cgroup_scanner *scan); | |
@@ -374,21 +411,8 @@ struct cgroup_scanner { | |
void *data; | |
}; | |
-/* | |
- * Add a new file to the given cgroup directory. Should only be | |
- * called by subsystems from within a populate() method | |
- */ | |
-int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys, | |
- const struct cftype *cft); | |
- | |
-/* | |
- * Add a set of new files to the given cgroup directory. Should | |
- * only be called by subsystems from within a populate() method | |
- */ | |
-int cgroup_add_files(struct cgroup *cgrp, | |
- struct cgroup_subsys *subsys, | |
- const struct cftype cft[], | |
- int count); | |
+int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts); | |
+int cgroup_rm_cftypes(struct cgroup_subsys *ss, struct cftype *cfts); | |
int cgroup_is_removed(const struct cgroup *cgrp); | |
@@ -454,7 +478,6 @@ struct cgroup_subsys { | |
void (*fork)(struct task_struct *task); | |
void (*exit)(struct cgroup *cgrp, struct cgroup *old_cgrp, | |
struct task_struct *task); | |
- int (*populate)(struct cgroup_subsys *ss, struct cgroup *cgrp); | |
void (*post_clone)(struct cgroup *cgrp); | |
void (*bind)(struct cgroup *root); | |
@@ -467,25 +490,24 @@ struct cgroup_subsys { | |
* (not available in early_init time.) | |
*/ | |
bool use_id; | |
-#define MAX_CGROUP_TYPE_NAMELEN 32 | |
- const char *name; | |
/* | |
- * Protects sibling/children links of cgroups in this | |
- * hierarchy, plus protects which hierarchy (or none) the | |
- * subsystem is a part of (i.e. root/sibling). To avoid | |
- * potential deadlocks, the following operations should not be | |
- * undertaken while holding any hierarchy_mutex: | |
+ * If %true, cgroup removal will try to clear css refs by retrying | |
+ * ss->pre_destroy() until there's no css ref left. This behavior | |
+ * is strictly for backward compatibility and will be removed as | |
+ * soon as the current user (memcg) is updated. | |
* | |
- * - allocating memory | |
- * - initiating hotplug events | |
+ * If %false, ss->pre_destroy() can't fail and cgroup removal won't | |
+ * wait for css refs to drop to zero before proceeding. | |
*/ | |
- struct mutex hierarchy_mutex; | |
- struct lock_class_key subsys_key; | |
+ bool __DEPRECATED_clear_css_refs; | |
+ | |
+#define MAX_CGROUP_TYPE_NAMELEN 32 | |
+ const char *name; | |
/* | |
* Link to parent, and list entry in parent's children. | |
- * Protected by this->hierarchy_mutex and cgroup_lock() | |
+ * Protected by cgroup_lock() | |
*/ | |
struct cgroupfs_root *root; | |
struct list_head sibling; | |
@@ -493,6 +515,13 @@ struct cgroup_subsys { | |
struct idr idr; | |
spinlock_t id_lock; | |
+ /* list of cftype_sets */ | |
+ struct list_head cftsets; | |
+ | |
+ /* base cftypes, automatically [de]registered with subsys itself */ | |
+ struct cftype *base_cftypes; | |
+ struct cftype_set base_cftset; | |
+ | |
/* should be defined only by modular subsystems */ | |
struct module *module; | |
}; | |
@@ -604,7 +633,7 @@ int cgroup_attach_task_all(struct task_struct *from, struct task_struct *); | |
* the lifetime of cgroup_subsys_state is subsys's matter. | |
* | |
* Looking up and scanning function should be called under rcu_read_lock(). | |
- * Taking cgroup_mutex()/hierarchy_mutex() is not necessary for following calls. | |
+ * Taking cgroup_mutex is not necessary for following calls. | |
* But the css returned by this routine can be "not populated yet" or "being | |
* destroyed". The caller should check css and cgroup's status. | |
*/ | |
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h | |
index 79ab255..72263bd 100644 | |
--- a/include/linux/shmem_fs.h | |
+++ b/include/linux/shmem_fs.h | |
@@ -5,6 +5,7 @@ | |
#include <linux/mempolicy.h> | |
#include <linux/pagemap.h> | |
#include <linux/percpu_counter.h> | |
+#include <linux/xattr.h> | |
/* inode in-kernel data */ | |
@@ -18,7 +19,7 @@ struct shmem_inode_info { | |
}; | |
struct shared_policy policy; /* NUMA memory alloc policy */ | |
struct list_head swaplist; /* chain of maybes on swap */ | |
- struct list_head xattr_list; /* list of shmem_xattr */ | |
+ struct simple_xattrs xattrs; /* list of xattrs */ | |
struct inode vfs_inode; | |
}; | |
diff --git a/include/linux/uidgid.h b/include/linux/uidgid.h | |
new file mode 100644 | |
index 0000000..5398568 | |
--- /dev/null | |
+++ b/include/linux/uidgid.h | |
@@ -0,0 +1,176 @@ | |
+#ifndef _LINUX_UIDGID_H | |
+#define _LINUX_UIDGID_H | |
+ | |
+/* | |
+ * A set of types for the internal kernel types representing uids and gids. | |
+ * | |
+ * The types defined in this header allow distinguishing which uids and gids in | |
+ * the kernel are values used by userspace and which uid and gid values are | |
+ * the internal kernel values. With the addition of user namespaces the values | |
+ * can be different. Using the type system makes it possible for the compiler | |
+ * to detect when we overlook these differences. | |
+ * | |
+ */ | |
+#include <linux/types.h> | |
+#include <linux/highuid.h> | |
+ | |
+struct user_namespace; | |
+extern struct user_namespace init_user_ns; | |
+ | |
+#ifdef CONFIG_UIDGID_STRICT_TYPE_CHECKS | |
+ | |
+typedef struct { | |
+ uid_t val; | |
+} kuid_t; | |
+ | |
+ | |
+typedef struct { | |
+ gid_t val; | |
+} kgid_t; | |
+ | |
+#define KUIDT_INIT(value) (kuid_t){ value } | |
+#define KGIDT_INIT(value) (kgid_t){ value } | |
+ | |
+static inline uid_t __kuid_val(kuid_t uid) | |
+{ | |
+ return uid.val; | |
+} | |
+ | |
+static inline gid_t __kgid_val(kgid_t gid) | |
+{ | |
+ return gid.val; | |
+} | |
+ | |
+#else | |
+ | |
+typedef uid_t kuid_t; | |
+typedef gid_t kgid_t; | |
+ | |
+static inline uid_t __kuid_val(kuid_t uid) | |
+{ | |
+ return uid; | |
+} | |
+ | |
+static inline gid_t __kgid_val(kgid_t gid) | |
+{ | |
+ return gid; | |
+} | |
+ | |
+#define KUIDT_INIT(value) ((kuid_t) value ) | |
+#define KGIDT_INIT(value) ((kgid_t) value ) | |
+ | |
+#endif | |
+ | |
+#define GLOBAL_ROOT_UID KUIDT_INIT(0) | |
+#define GLOBAL_ROOT_GID KGIDT_INIT(0) | |
+ | |
+#define INVALID_UID KUIDT_INIT(-1) | |
+#define INVALID_GID KGIDT_INIT(-1) | |
+ | |
+static inline bool uid_eq(kuid_t left, kuid_t right) | |
+{ | |
+ return __kuid_val(left) == __kuid_val(right); | |
+} | |
+ | |
+static inline bool gid_eq(kgid_t left, kgid_t right) | |
+{ | |
+ return __kgid_val(left) == __kgid_val(right); | |
+} | |
+ | |
+static inline bool uid_gt(kuid_t left, kuid_t right) | |
+{ | |
+ return __kuid_val(left) > __kuid_val(right); | |
+} | |
+ | |
+static inline bool gid_gt(kgid_t left, kgid_t right) | |
+{ | |
+ return __kgid_val(left) > __kgid_val(right); | |
+} | |
+ | |
+static inline bool uid_gte(kuid_t left, kuid_t right) | |
+{ | |
+ return __kuid_val(left) >= __kuid_val(right); | |
+} | |
+ | |
+static inline bool gid_gte(kgid_t left, kgid_t right) | |
+{ | |
+ return __kgid_val(left) >= __kgid_val(right); | |
+} | |
+ | |
+static inline bool uid_lt(kuid_t left, kuid_t right) | |
+{ | |
+ return __kuid_val(left) < __kuid_val(right); | |
+} | |
+ | |
+static inline bool gid_lt(kgid_t left, kgid_t right) | |
+{ | |
+ return __kgid_val(left) < __kgid_val(right); | |
+} | |
+ | |
+static inline bool uid_lte(kuid_t left, kuid_t right) | |
+{ | |
+ return __kuid_val(left) <= __kuid_val(right); | |
+} | |
+ | |
+static inline bool gid_lte(kgid_t left, kgid_t right) | |
+{ | |
+ return __kgid_val(left) <= __kgid_val(right); | |
+} | |
+ | |
+static inline bool uid_valid(kuid_t uid) | |
+{ | |
+ return !uid_eq(uid, INVALID_UID); | |
+} | |
+ | |
+static inline bool gid_valid(kgid_t gid) | |
+{ | |
+ return !gid_eq(gid, INVALID_GID); | |
+} | |
+ | |
+static inline kuid_t make_kuid(struct user_namespace *from, uid_t uid) | |
+{ | |
+ return KUIDT_INIT(uid); | |
+} | |
+ | |
+static inline kgid_t make_kgid(struct user_namespace *from, gid_t gid) | |
+{ | |
+ return KGIDT_INIT(gid); | |
+} | |
+ | |
+static inline uid_t from_kuid(struct user_namespace *to, kuid_t kuid) | |
+{ | |
+ return __kuid_val(kuid); | |
+} | |
+ | |
+static inline gid_t from_kgid(struct user_namespace *to, kgid_t kgid) | |
+{ | |
+ return __kgid_val(kgid); | |
+} | |
+ | |
+static inline uid_t from_kuid_munged(struct user_namespace *to, kuid_t kuid) | |
+{ | |
+ uid_t uid = from_kuid(to, kuid); | |
+ if (uid == (uid_t)-1) | |
+ uid = overflowuid; | |
+ return uid; | |
+} | |
+ | |
+static inline gid_t from_kgid_munged(struct user_namespace *to, kgid_t kgid) | |
+{ | |
+ gid_t gid = from_kgid(to, kgid); | |
+ if (gid == (gid_t)-1) | |
+ gid = overflowgid; | |
+ return gid; | |
+} | |
+ | |
+static inline bool kuid_has_mapping(struct user_namespace *ns, kuid_t uid) | |
+{ | |
+ return true; | |
+} | |
+ | |
+static inline bool kgid_has_mapping(struct user_namespace *ns, kgid_t gid) | |
+{ | |
+ return true; | |
+} | |
+ | |
+#endif /* _LINUX_UIDGID_H */ | |
diff --git a/include/linux/xattr.h b/include/linux/xattr.h | |
index e5d1220..2ace7a6 100644 | |
--- a/include/linux/xattr.h | |
+++ b/include/linux/xattr.h | |
@@ -59,7 +59,9 @@ | |
#ifdef __KERNEL__ | |
+#include <linux/slab.h> | |
#include <linux/types.h> | |
+#include <linux/spinlock.h> | |
struct inode; | |
struct dentry; | |
@@ -96,6 +98,52 @@ ssize_t vfs_getxattr_alloc(struct dentry *dentry, const char *name, | |
char **xattr_value, size_t size, gfp_t flags); | |
int vfs_xattr_cmp(struct dentry *dentry, const char *xattr_name, | |
const char *value, size_t size, gfp_t flags); | |
+ | |
+struct simple_xattrs { | |
+ struct list_head head; | |
+ spinlock_t lock; | |
+}; | |
+ | |
+struct simple_xattr { | |
+ struct list_head list; | |
+ char *name; | |
+ size_t size; | |
+ char value[0]; | |
+}; | |
+ | |
+/* | |
+ * initialize the simple_xattrs structure | |
+ */ | |
+static inline void simple_xattrs_init(struct simple_xattrs *xattrs) | |
+{ | |
+ INIT_LIST_HEAD(&xattrs->head); | |
+ spin_lock_init(&xattrs->lock); | |
+} | |
+ | |
+/* | |
+ * free all the xattrs | |
+ */ | |
+static inline void simple_xattrs_free(struct simple_xattrs *xattrs) | |
+{ | |
+ struct simple_xattr *xattr, *node; | |
+ | |
+ list_for_each_entry_safe(xattr, node, &xattrs->head, list) { | |
+ kfree(xattr->name); | |
+ kfree(xattr); | |
+ } | |
+} | |
+ | |
+struct simple_xattr *simple_xattr_alloc(const void *value, size_t size); | |
+int simple_xattr_get(struct simple_xattrs *xattrs, const char *name, | |
+ void *buffer, size_t size); | |
+int simple_xattr_set(struct simple_xattrs *xattrs, const char *name, | |
+ const void *value, size_t size, int flags); | |
+int simple_xattr_remove(struct simple_xattrs *xattrs, const char *name); | |
+ssize_t simple_xattr_list(struct simple_xattrs *xattrs, char *buffer, | |
+ size_t size); | |
+void simple_xattr_list_add(struct simple_xattrs *xattrs, | |
+ struct simple_xattr *new_xattr); | |
+ | |
#endif /* __KERNEL__ */ | |
#endif /* _LINUX_XATTR_H */ | |
diff --git a/include/net/sock.h b/include/net/sock.h | |
index f673ba5..e3ab749 100644 | |
--- a/include/net/sock.h | |
+++ b/include/net/sock.h | |
@@ -70,16 +70,16 @@ | |
struct cgroup; | |
struct cgroup_subsys; | |
#ifdef CONFIG_NET | |
-int mem_cgroup_sockets_init(struct cgroup *cgrp, struct cgroup_subsys *ss); | |
-void mem_cgroup_sockets_destroy(struct cgroup *cgrp); | |
+int mem_cgroup_sockets_init(struct mem_cgroup *memcg, struct cgroup_subsys *ss); | |
+void mem_cgroup_sockets_destroy(struct mem_cgroup *memcg); | |
#else | |
static inline | |
-int mem_cgroup_sockets_init(struct cgroup *cgrp, struct cgroup_subsys *ss) | |
+int mem_cgroup_sockets_init(struct mem_cgroup *memcg, struct cgroup_subsys *ss) | |
{ | |
return 0; | |
} | |
static inline | |
-void mem_cgroup_sockets_destroy(struct cgroup *cgrp) | |
+void mem_cgroup_sockets_destroy(struct mem_cgroup *memcg) | |
{ | |
} | |
#endif | |
@@ -915,9 +915,9 @@ struct proto { | |
* This function has to setup any files the protocol want to | |
* appear in the kmem cgroup filesystem. | |
*/ | |
- int (*init_cgroup)(struct cgroup *cgrp, | |
+ int (*init_cgroup)(struct mem_cgroup *memcg, | |
struct cgroup_subsys *ss); | |
- void (*destroy_cgroup)(struct cgroup *cgrp); | |
+ void (*destroy_cgroup)(struct mem_cgroup *memcg); | |
struct cg_proto *(*proto_cgroup)(struct mem_cgroup *memcg); | |
#endif | |
}; | |
diff --git a/include/net/tcp_memcontrol.h b/include/net/tcp_memcontrol.h | |
index 48410ff..7df18bc 100644 | |
--- a/include/net/tcp_memcontrol.h | |
+++ b/include/net/tcp_memcontrol.h | |
@@ -12,8 +12,8 @@ struct tcp_memcontrol { | |
}; | |
struct cg_proto *tcp_proto_cgroup(struct mem_cgroup *memcg); | |
-int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss); | |
-void tcp_destroy_cgroup(struct cgroup *cgrp); | |
+int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss); | |
+void tcp_destroy_cgroup(struct mem_cgroup *memcg); | |
unsigned long long tcp_max_memory(const struct mem_cgroup *memcg); | |
void tcp_prot_mem(struct mem_cgroup *memcg, long val, int idx); | |
#endif /* _TCP_MEMCG_H */ | |
diff --git a/init/Kconfig b/init/Kconfig | |
index a7cffc8..931d5b5 100644 | |
--- a/init/Kconfig | |
+++ b/init/Kconfig | |
@@ -828,7 +828,8 @@ config IPC_NS | |
config USER_NS | |
bool "User namespace (EXPERIMENTAL)" | |
depends on EXPERIMENTAL | |
- default y | |
+ select UIDGID_STRICT_TYPE_CHECKS | |
+ default n | |
help | |
This allows containers, i.e. vservers, to use user namespaces | |
to provide different user info for different servers. | |
@@ -852,6 +853,15 @@ config NET_NS | |
endif # NAMESPACES | |
+config UIDGID_STRICT_TYPE_CHECKS | |
+ bool "Require conversions between uid/gids and their internal representation" | |
+ default n | |
+ help | |
+ While the necessary conversions are being added to all subsystems this option allows | |
+ the code to continue to build for unconverted subsystems. | |
+ | |
+ Say Y here if you want the strict type checking enabled | |
+ | |
config SCHED_AUTOGROUP | |
bool "Automatic process group scheduling" | |
select EVENTFD | |
diff --git a/kernel/cgroup.c b/kernel/cgroup.c | |
index 714ac5d..5bdbcdf 100644 | |
--- a/kernel/cgroup.c | |
+++ b/kernel/cgroup.c | |
@@ -60,9 +60,13 @@ | |
#include <linux/eventfd.h> | |
#include <linux/poll.h> | |
#include <linux/flex_array.h> /* used in cgroup_attach_proc */ | |
+#include <linux/kthread.h> | |
#include <linux/atomic.h> | |
+/* css deactivation bias, makes css->refcnt negative to deny new trygets */ | |
+#define CSS_DEACT_BIAS INT_MIN | |
+ | |
/* | |
* cgroup_mutex is the master lock. Any modification to cgroup or its | |
* hierarchy must be performed while holding it. | |
@@ -127,6 +131,9 @@ struct cgroupfs_root { | |
/* A list running through the active hierarchies */ | |
struct list_head root_list; | |
+ /* All cgroups on this root, cgroup_mutex protected */ | |
+ struct list_head allcg_list; | |
+ | |
/* Hierarchy-specific flags */ | |
unsigned long flags; | |
@@ -145,6 +152,15 @@ struct cgroupfs_root { | |
static struct cgroupfs_root rootnode; | |
/* | |
+ * cgroupfs file entry, pointed to from leaf dentry->d_fsdata. | |
+ */ | |
+struct cfent { | |
+ struct list_head node; | |
+ struct dentry *dentry; | |
+ struct cftype *type; | |
+}; | |
+ | |
+/* | |
* CSS ID -- ID per subsys's Cgroup Subsys State(CSS). used only when | |
* cgroup_subsys->use_id != 0. | |
*/ | |
@@ -239,6 +255,19 @@ int cgroup_lock_is_held(void) | |
EXPORT_SYMBOL_GPL(cgroup_lock_is_held); | |
+static int css_unbias_refcnt(int refcnt) | |
+{ | |
+ return refcnt >= 0 ? refcnt : refcnt - CSS_DEACT_BIAS; | |
+} | |
+ | |
+/* the current nr of refs, always >= 0 whether @css is deactivated or not */ | |
+static int css_refcnt(struct cgroup_subsys_state *css) | |
+{ | |
+ int v = atomic_read(&css->refcnt); | |
+ | |
+ return css_unbias_refcnt(v); | |
+} | |
+ | |
/* convenient tests for these bits */ | |
inline int cgroup_is_removed(const struct cgroup *cgrp) | |
{ | |
@@ -247,7 +276,8 @@ inline int cgroup_is_removed(const struct cgroup *cgrp) | |
/* bits in struct cgroupfs_root flags field */ | |
enum { | |
- ROOT_NOPREFIX, /* mounted subsystems have no named prefix */ | |
+ ROOT_NOPREFIX, /* mounted subsystems have no named prefix */ | |
+ ROOT_XATTR, /* supports extended attributes */ | |
}; | |
static int cgroup_is_releasable(const struct cgroup *cgrp) | |
@@ -279,6 +309,21 @@ list_for_each_entry(_ss, &_root->subsys_list, sibling) | |
#define for_each_active_root(_root) \ | |
list_for_each_entry(_root, &roots, root_list) | |
+static inline struct cgroup *__d_cgrp(struct dentry *dentry) | |
+{ | |
+ return dentry->d_fsdata; | |
+} | |
+ | |
+static inline struct cfent *__d_cfe(struct dentry *dentry) | |
+{ | |
+ return dentry->d_fsdata; | |
+} | |
+ | |
+static inline struct cftype *__d_cft(struct dentry *dentry) | |
+{ | |
+ return __d_cfe(dentry)->type; | |
+} | |
+ | |
/* the list of cgroups eligible for automatic release. Protected by | |
* release_list_lock */ | |
static LIST_HEAD(release_list); | |
@@ -818,7 +863,8 @@ EXPORT_SYMBOL_GPL(cgroup_unlock); | |
static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode); | |
static struct dentry *cgroup_lookup(struct inode *, struct dentry *, struct nameidata *); | |
static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry); | |
-static int cgroup_populate_dir(struct cgroup *cgrp); | |
+static int cgroup_populate_dir(struct cgroup *cgrp, bool base_files, | |
+ unsigned long subsys_mask); | |
static const struct inode_operations cgroup_dir_inode_operations; | |
static const struct file_operations proc_cgroupstats_operations; | |
@@ -854,12 +900,17 @@ static int cgroup_call_pre_destroy(struct cgroup *cgrp) | |
struct cgroup_subsys *ss; | |
int ret = 0; | |
- for_each_subsys(cgrp->root, ss) | |
- if (ss->pre_destroy) { | |
- ret = ss->pre_destroy(cgrp); | |
- if (ret) | |
- break; | |
+ for_each_subsys(cgrp->root, ss) { | |
+ if (!ss->pre_destroy) | |
+ continue; | |
+ | |
+ ret = ss->pre_destroy(cgrp); | |
+ if (ret) { | |
+ /* ->pre_destroy() failure is being deprecated */ | |
+ WARN_ON_ONCE(!ss->__DEPRECATED_clear_css_refs); | |
+ break; | |
} | |
+ } | |
return ret; | |
} | |
@@ -901,7 +952,19 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode) | |
*/ | |
BUG_ON(!list_empty(&cgrp->pidlists)); | |
+ simple_xattrs_free(&cgrp->xattrs); | |
+ | |
kfree_rcu(cgrp, rcu_head); | |
+ } else { | |
+ struct cfent *cfe = __d_cfe(dentry); | |
+ struct cgroup *cgrp = dentry->d_parent->d_fsdata; | |
+ struct cftype *cft = cfe->type; | |
+ | |
+ WARN_ONCE(!list_empty(&cfe->node) && | |
+ cgrp != &cgrp->root->top_cgroup, | |
+ "cfe still linked for %s\n", cfe->type->name); | |
+ kfree(cfe); | |
+ simple_xattrs_free(&cft->xattrs); | |
} | |
iput(inode); | |
} | |
@@ -920,34 +983,53 @@ static void remove_dir(struct dentry *d) | |
dput(parent); | |
} | |
-static void cgroup_clear_directory(struct dentry *dentry) | |
-{ | |
- struct list_head *node; | |
- | |
- BUG_ON(!mutex_is_locked(&dentry->d_inode->i_mutex)); | |
- spin_lock(&dentry->d_lock); | |
- node = dentry->d_subdirs.next; | |
- while (node != &dentry->d_subdirs) { | |
- struct dentry *d = list_entry(node, struct dentry, d_u.d_child); | |
- | |
- spin_lock_nested(&d->d_lock, DENTRY_D_LOCK_NESTED); | |
- list_del_init(node); | |
- if (d->d_inode) { | |
- /* This should never be called on a cgroup | |
- * directory with child cgroups */ | |
- BUG_ON(d->d_inode->i_mode & S_IFDIR); | |
- dget_dlock(d); | |
- spin_unlock(&d->d_lock); | |
- spin_unlock(&dentry->d_lock); | |
- d_delete(d); | |
- simple_unlink(dentry->d_inode, d); | |
- dput(d); | |
- spin_lock(&dentry->d_lock); | |
- } else | |
- spin_unlock(&d->d_lock); | |
- node = dentry->d_subdirs.next; | |
+static int cgroup_rm_file(struct cgroup *cgrp, const struct cftype *cft) | |
+{ | |
+ struct cfent *cfe; | |
+ | |
+ lockdep_assert_held(&cgrp->dentry->d_inode->i_mutex); | |
+ lockdep_assert_held(&cgroup_mutex); | |
+ | |
+ list_for_each_entry(cfe, &cgrp->files, node) { | |
+ struct dentry *d = cfe->dentry; | |
+ | |
+ if (cft && cfe->type != cft) | |
+ continue; | |
+ | |
+ dget(d); | |
+ d_delete(d); | |
+ simple_unlink(cgrp->dentry->d_inode, d); | |
+ list_del_init(&cfe->node); | |
+ dput(d); | |
+ | |
+ return 0; | |
+ } | |
+ return -ENOENT; | |
+} | |
+ | |
+/** | |
+ * cgroup_clear_directory - selective removal of base and subsystem files | |
+ * @dir: directory containing the files | |
+ * @base_files: true if the base files should be removed | |
+ * @subsys_mask: mask of the subsystem ids whose files should be removed | |
+ */ | |
+static void cgroup_clear_directory(struct dentry *dir, bool base_files, | |
+ unsigned long subsys_mask) | |
+{ | |
+ struct cgroup *cgrp = __d_cgrp(dir); | |
+ struct cgroup_subsys *ss; | |
+ | |
+ for_each_subsys(cgrp->root, ss) { | |
+ struct cftype_set *set; | |
+ if (!test_bit(ss->subsys_id, &subsys_mask)) | |
+ continue; | |
+ list_for_each_entry(set, &ss->cftsets, node) | |
+ cgroup_rm_file(cgrp, set->cfts); | |
+ } | |
+ if (base_files) { | |
+ while (!list_empty(&cgrp->files)) | |
+ cgroup_rm_file(cgrp, NULL); | |
} | |
- spin_unlock(&dentry->d_lock); | |
} | |
/* | |
@@ -956,8 +1038,9 @@ static void cgroup_clear_directory(struct dentry *dentry) | |
static void cgroup_d_remove_dir(struct dentry *dentry) | |
{ | |
struct dentry *parent; | |
+ struct cgroupfs_root *root = dentry->d_sb->s_fs_info; | |
- cgroup_clear_directory(dentry); | |
+ cgroup_clear_directory(dentry, true, root->subsys_bits); | |
parent = dentry->d_parent; | |
spin_lock(&parent->d_lock); | |
@@ -1020,28 +1103,24 @@ static int rebind_subsystems(struct cgroupfs_root *root, | |
BUG_ON(cgrp->subsys[i]); | |
BUG_ON(!dummytop->subsys[i]); | |
BUG_ON(dummytop->subsys[i]->cgroup != dummytop); | |
- mutex_lock(&ss->hierarchy_mutex); | |
cgrp->subsys[i] = dummytop->subsys[i]; | |
cgrp->subsys[i]->cgroup = cgrp; | |
list_move(&ss->sibling, &root->subsys_list); | |
ss->root = root; | |
if (ss->bind) | |
ss->bind(cgrp); | |
- mutex_unlock(&ss->hierarchy_mutex); | |
/* refcount was already taken, and we're keeping it */ | |
} else if (bit & removed_bits) { | |
/* We're removing this subsystem */ | |
BUG_ON(ss == NULL); | |
BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]); | |
BUG_ON(cgrp->subsys[i]->cgroup != cgrp); | |
- mutex_lock(&ss->hierarchy_mutex); | |
if (ss->bind) | |
ss->bind(dummytop); | |
dummytop->subsys[i]->cgroup = dummytop; | |
cgrp->subsys[i] = NULL; | |
subsys[i]->root = &rootnode; | |
list_move(&ss->sibling, &rootnode.subsys_list); | |
- mutex_unlock(&ss->hierarchy_mutex); | |
/* subsystem is now free - drop reference on module */ | |
module_put(ss->module); | |
} else if (bit & final_bits) { | |
@@ -1077,6 +1156,8 @@ static int cgroup_show_options(struct seq_file *seq, struct dentry *dentry) | |
seq_printf(seq, ",%s", ss->name); | |
if (test_bit(ROOT_NOPREFIX, &root->flags)) | |
seq_puts(seq, ",noprefix"); | |
+ if (test_bit(ROOT_XATTR, &root->flags)) | |
+ seq_puts(seq, ",xattr"); | |
if (strlen(root->release_agent_path)) | |
seq_printf(seq, ",release_agent=%s", root->release_agent_path); | |
if (clone_children(&root->top_cgroup)) | |
@@ -1145,6 +1226,10 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts) | |
opts->clone_children = true; | |
continue; | |
} | |
+ if (!strcmp(token, "xattr")) { | |
+ set_bit(ROOT_XATTR, &opts->flags); | |
+ continue; | |
+ } | |
if (!strncmp(token, "release_agent=", 14)) { | |
/* Specifying two release agents is forbidden */ | |
if (opts->release_agent) | |
@@ -1295,6 +1380,7 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data) | |
struct cgroupfs_root *root = sb->s_fs_info; | |
struct cgroup *cgrp = &root->top_cgroup; | |
struct cgroup_sb_opts opts; | |
+ unsigned long added_bits, removed_bits; | |
mutex_lock(&cgrp->dentry->d_inode->i_mutex); | |
mutex_lock(&cgroup_mutex); | |
@@ -1305,6 +1391,14 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data) | |
if (ret) | |
goto out_unlock; | |
+ /* See feature-removal-schedule.txt */ | |
+ if (opts.subsys_bits != root->actual_subsys_bits || opts.release_agent) | |
+ pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n", | |
+ task_tgid_nr(current), current->comm); | |
+ | |
+ added_bits = opts.subsys_bits & ~root->subsys_bits; | |
+ removed_bits = root->subsys_bits & ~opts.subsys_bits; | |
+ | |
/* Don't allow flags or name to change at remount */ | |
if (opts.flags != root->flags || | |
(opts.name && strcmp(opts.name, root->name))) { | |
@@ -1319,8 +1413,10 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data) | |
goto out_unlock; | |
} | |
- /* (re)populate subsystem files */ | |
- cgroup_populate_dir(cgrp); | |
+ /* clear out any existing files and repopulate subsystem files */ | |
+ cgroup_clear_directory(cgrp->dentry, false, removed_bits); | |
+ /* re-populate subsystem files */ | |
+ cgroup_populate_dir(cgrp, false, added_bits); | |
if (opts.release_agent) | |
strcpy(root->release_agent_path, opts.release_agent); | |
@@ -1344,22 +1440,27 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp) | |
{ | |
INIT_LIST_HEAD(&cgrp->sibling); | |
INIT_LIST_HEAD(&cgrp->children); | |
+ INIT_LIST_HEAD(&cgrp->files); | |
INIT_LIST_HEAD(&cgrp->css_sets); | |
INIT_LIST_HEAD(&cgrp->release_list); | |
INIT_LIST_HEAD(&cgrp->pidlists); | |
mutex_init(&cgrp->pidlist_mutex); | |
INIT_LIST_HEAD(&cgrp->event_list); | |
spin_lock_init(&cgrp->event_list_lock); | |
+ simple_xattrs_init(&cgrp->xattrs); | |
} | |
static void init_cgroup_root(struct cgroupfs_root *root) | |
{ | |
struct cgroup *cgrp = &root->top_cgroup; | |
+ | |
INIT_LIST_HEAD(&root->subsys_list); | |
INIT_LIST_HEAD(&root->root_list); | |
+ INIT_LIST_HEAD(&root->allcg_list); | |
root->number_of_cgroups = 1; | |
cgrp->root = root; | |
cgrp->top_cgroup = cgrp; | |
+ list_add_tail(&cgrp->allcg_node, &root->allcg_list); | |
init_cgroup_housekeeping(cgrp); | |
} | |
@@ -1615,7 +1716,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, | |
BUG_ON(root->number_of_cgroups != 1); | |
cred = override_creds(&init_cred); | |
- cgroup_populate_dir(root_cgrp); | |
+ cgroup_populate_dir(root_cgrp, true, root->subsys_bits); | |
revert_creds(cred); | |
mutex_unlock(&cgroup_root_mutex); | |
mutex_unlock(&cgroup_mutex); | |
@@ -1691,6 +1792,8 @@ static void cgroup_kill_sb(struct super_block *sb) { | |
mutex_unlock(&cgroup_root_mutex); | |
mutex_unlock(&cgroup_mutex); | |
+ simple_xattrs_free(&cgrp->xattrs); | |
+ | |
kill_litter_super(sb); | |
cgroup_drop_root(root); | |
} | |
@@ -1703,16 +1806,6 @@ static struct file_system_type cgroup_fs_type = { | |
static struct kobject *cgroup_kobj; | |
-static inline struct cgroup *__d_cgrp(struct dentry *dentry) | |
-{ | |
- return dentry->d_fsdata; | |
-} | |
- | |
-static inline struct cftype *__d_cft(struct dentry *dentry) | |
-{ | |
- return dentry->d_fsdata; | |
-} | |
- | |
/** | |
* cgroup_path - generate the path of a cgroup | |
* @cgrp: the cgroup in question | |
@@ -2216,6 +2309,18 @@ retry_find_task: | |
if (threadgroup) | |
tsk = tsk->group_leader; | |
+ | |
+ /* | |
+ * Workqueue threads may acquire PF_THREAD_BOUND and become | |
+ * trapped in a cpuset, or RT worker may be born in a cgroup | |
+ * with no rt_runtime allocated. Just say no. | |
+ */ | |
+ if (tsk == kthreadd_task || (tsk->flags & PF_THREAD_BOUND)) { | |
+ ret = -EINVAL; | |
+ rcu_read_unlock(); | |
+ goto out_unlock_cgroup; | |
+ } | |
+ | |
get_task_struct(tsk); | |
rcu_read_unlock(); | |
@@ -2528,6 +2633,64 @@ static int cgroup_rename(struct inode *old_dir, struct dentry *old_dentry, | |
return simple_rename(old_dir, old_dentry, new_dir, new_dentry); | |
} | |
+static struct simple_xattrs *__d_xattrs(struct dentry *dentry) | |
+{ | |
+ if (S_ISDIR(dentry->d_inode->i_mode)) | |
+ return &__d_cgrp(dentry)->xattrs; | |
+ else | |
+ return &__d_cft(dentry)->xattrs; | |
+} | |
+ | |
+static inline int xattr_enabled(struct dentry *dentry) | |
+{ | |
+ struct cgroupfs_root *root = dentry->d_sb->s_fs_info; | |
+ return test_bit(ROOT_XATTR, &root->flags); | |
+} | |
+ | |
+static bool is_valid_xattr(const char *name) | |
+{ | |
+ if (!strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN) || | |
+ !strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN)) | |
+ return true; | |
+ return false; | |
+} | |
+ | |
+static int cgroup_setxattr(struct dentry *dentry, const char *name, | |
+ const void *val, size_t size, int flags) | |
+{ | |
+ if (!xattr_enabled(dentry)) | |
+ return -EOPNOTSUPP; | |
+ if (!is_valid_xattr(name)) | |
+ return -EINVAL; | |
+ return simple_xattr_set(__d_xattrs(dentry), name, val, size, flags); | |
+} | |
+ | |
+static int cgroup_removexattr(struct dentry *dentry, const char *name) | |
+{ | |
+ if (!xattr_enabled(dentry)) | |
+ return -EOPNOTSUPP; | |
+ if (!is_valid_xattr(name)) | |
+ return -EINVAL; | |
+ return simple_xattr_remove(__d_xattrs(dentry), name); | |
+} | |
+ | |
+static ssize_t cgroup_getxattr(struct dentry *dentry, const char *name, | |
+ void *buf, size_t size) | |
+{ | |
+ if (!xattr_enabled(dentry)) | |
+ return -EOPNOTSUPP; | |
+ if (!is_valid_xattr(name)) | |
+ return -EINVAL; | |
+ return simple_xattr_get(__d_xattrs(dentry), name, buf, size); | |
+} | |
+ | |
+static ssize_t cgroup_listxattr(struct dentry *dentry, char *buf, size_t size) | |
+{ | |
+ if (!xattr_enabled(dentry)) | |
+ return -EOPNOTSUPP; | |
+ return simple_xattr_list(__d_xattrs(dentry), buf, size); | |
+} | |
+ | |
static const struct file_operations cgroup_file_operations = { | |
.read = cgroup_file_read, | |
.write = cgroup_file_write, | |
@@ -2536,11 +2699,22 @@ static const struct file_operations cgroup_file_operations = { | |
.release = cgroup_file_release, | |
}; | |
+static const struct inode_operations cgroup_file_inode_operations = { | |
+ .setxattr = cgroup_setxattr, | |
+ .getxattr = cgroup_getxattr, | |
+ .listxattr = cgroup_listxattr, | |
+ .removexattr = cgroup_removexattr, | |
+}; | |
+ | |
static const struct inode_operations cgroup_dir_inode_operations = { | |
.lookup = cgroup_lookup, | |
.mkdir = cgroup_mkdir, | |
.rmdir = cgroup_rmdir, | |
.rename = cgroup_rename, | |
+ .setxattr = cgroup_setxattr, | |
+ .getxattr = cgroup_getxattr, | |
+ .listxattr = cgroup_listxattr, | |
+ .removexattr = cgroup_removexattr, | |
}; | |
static struct dentry *cgroup_lookup(struct inode *dir, struct dentry *dentry, struct nameidata *nd) | |
@@ -2588,6 +2762,7 @@ static int cgroup_create_file(struct dentry *dentry, umode_t mode, | |
} else if (S_ISREG(mode)) { | |
inode->i_size = 0; | |
inode->i_fop = &cgroup_file_operations; | |
+ inode->i_op = &cgroup_file_inode_operations; | |
} | |
d_instantiate(dentry, inode); | |
dget(dentry); /* Extra count - pin the dentry in core */ | |
@@ -2645,50 +2820,193 @@ static umode_t cgroup_file_mode(const struct cftype *cft) | |
return mode; | |
} | |
-int cgroup_add_file(struct cgroup *cgrp, | |
- struct cgroup_subsys *subsys, | |
- const struct cftype *cft) | |
+static int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys, | |
+ struct cftype *cft) | |
{ | |
struct dentry *dir = cgrp->dentry; | |
+ struct cgroup *parent = __d_cgrp(dir); | |
struct dentry *dentry; | |
+ struct cfent *cfe; | |
int error; | |
umode_t mode; | |
- | |
char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 }; | |
+ | |
+ simple_xattrs_init(&cft->xattrs); | |
+ | |
+ /* does @cft->flags tell us to skip creation on @cgrp? */ | |
+ if ((cft->flags & CFTYPE_NOT_ON_ROOT) && !cgrp->parent) | |
+ return 0; | |
+ if ((cft->flags & CFTYPE_ONLY_ON_ROOT) && cgrp->parent) | |
+ return 0; | |
+ | |
if (subsys && !test_bit(ROOT_NOPREFIX, &cgrp->root->flags)) { | |
strcpy(name, subsys->name); | |
strcat(name, "."); | |
} | |
strcat(name, cft->name); | |
+ | |
BUG_ON(!mutex_is_locked(&dir->d_inode->i_mutex)); | |
+ | |
+ cfe = kzalloc(sizeof(*cfe), GFP_KERNEL); | |
+ if (!cfe) | |
+ return -ENOMEM; | |
+ | |
dentry = lookup_one_len(name, dir, strlen(name)); | |
- if (!IS_ERR(dentry)) { | |
- mode = cgroup_file_mode(cft); | |
- error = cgroup_create_file(dentry, mode | S_IFREG, | |
- cgrp->root->sb); | |
- if (!error) | |
- dentry->d_fsdata = (void *)cft; | |
- dput(dentry); | |
- } else | |
+ if (IS_ERR(dentry)) { | |
error = PTR_ERR(dentry); | |
+ goto out; | |
+ } | |
+ | |
+ mode = cgroup_file_mode(cft); | |
+ error = cgroup_create_file(dentry, mode | S_IFREG, cgrp->root->sb); | |
+ if (!error) { | |
+ cfe->type = (void *)cft; | |
+ cfe->dentry = dentry; | |
+ dentry->d_fsdata = cfe; | |
+ list_add_tail(&cfe->node, &parent->files); | |
+ cfe = NULL; | |
+ } | |
+ dput(dentry); | |
+out: | |
+ kfree(cfe); | |
return error; | |
} | |
-EXPORT_SYMBOL_GPL(cgroup_add_file); | |
-int cgroup_add_files(struct cgroup *cgrp, | |
- struct cgroup_subsys *subsys, | |
- const struct cftype cft[], | |
- int count) | |
+static int cgroup_addrm_files(struct cgroup *cgrp, struct cgroup_subsys *subsys, | |
+ struct cftype cfts[], bool is_add) | |
{ | |
- int i, err; | |
- for (i = 0; i < count; i++) { | |
- err = cgroup_add_file(cgrp, subsys, &cft[i]); | |
- if (err) | |
- return err; | |
+ struct cftype *cft; | |
+ int err, ret = 0; | |
+ | |
+ for (cft = cfts; cft->name[0] != '\0'; cft++) { | |
+ if (is_add) | |
+ err = cgroup_add_file(cgrp, subsys, cft); | |
+ else | |
+ err = cgroup_rm_file(cgrp, cft); | |
+ if (err) { | |
+ pr_warning("cgroup_addrm_files: failed to %s %s, err=%d\n", | |
+ is_add ? "add" : "remove", cft->name, err); | |
+ ret = err; | |
+ } | |
+ } | |
+ return ret; | |
+} | |
+ | |
+static DEFINE_MUTEX(cgroup_cft_mutex); | |
+ | |
+static void cgroup_cfts_prepare(void) | |
+ __acquires(&cgroup_cft_mutex) __acquires(&cgroup_mutex) | |
+{ | |
+ /* | |
+ * Thanks to the entanglement with vfs inode locking, we can't walk | |
+ * the existing cgroups under cgroup_mutex and create files. | |
+ * Instead, we increment reference on all cgroups and build list of | |
+ * them using @cgrp->cft_q_node. Grab cgroup_cft_mutex to ensure | |
+ * exclusive access to the field. | |
+ */ | |
+ mutex_lock(&cgroup_cft_mutex); | |
+ mutex_lock(&cgroup_mutex); | |
+} | |
+ | |
+static void cgroup_cfts_commit(struct cgroup_subsys *ss, | |
+ struct cftype *cfts, bool is_add) | |
+ __releases(&cgroup_mutex) __releases(&cgroup_cft_mutex) | |
+{ | |
+ LIST_HEAD(pending); | |
+ struct cgroup *cgrp, *n; | |
+ | |
+ /* %NULL @cfts indicates abort and don't bother if @ss isn't attached */ | |
+ if (cfts && ss->root != &rootnode) { | |
+ list_for_each_entry(cgrp, &ss->root->allcg_list, allcg_node) { | |
+ dget(cgrp->dentry); | |
+ list_add_tail(&cgrp->cft_q_node, &pending); | |
+ } | |
+ } | |
+ | |
+ mutex_unlock(&cgroup_mutex); | |
+ | |
+ /* | |
+ * All new cgroups will see @cfts update on @ss->cftsets. Add/rm | |
+ * files for all cgroups which were created before. | |
+ */ | |
+ list_for_each_entry_safe(cgrp, n, &pending, cft_q_node) { | |
+ struct inode *inode = cgrp->dentry->d_inode; | |
+ | |
+ mutex_lock(&inode->i_mutex); | |
+ mutex_lock(&cgroup_mutex); | |
+ if (!cgroup_is_removed(cgrp)) | |
+ cgroup_addrm_files(cgrp, ss, cfts, is_add); | |
+ mutex_unlock(&cgroup_mutex); | |
+ mutex_unlock(&inode->i_mutex); | |
+ | |
+ list_del_init(&cgrp->cft_q_node); | |
+ dput(cgrp->dentry); | |
} | |
+ | |
+ mutex_unlock(&cgroup_cft_mutex); | |
+} | |
+ | |
+/** | |
+ * cgroup_add_cftypes - add an array of cftypes to a subsystem | |
+ * @ss: target cgroup subsystem | |
+ * @cfts: zero-length name terminated array of cftypes | |
+ * | |
+ * Register @cfts to @ss. Files described by @cfts are created for all | |
+ * existing cgroups to which @ss is attached and all future cgroups will | |
+ * have them too. This function can be called anytime whether @ss is | |
+ * attached or not. | |
+ * | |
+ * Returns 0 on successful registration, -errno on failure. Note that this | |
+ * function currently returns 0 as long as @cfts registration is successful | |
+ * even if some file creation attempts on existing cgroups fail. | |
+ */ | |
+int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts) | |
+{ | |
+ struct cftype_set *set; | |
+ | |
+ set = kzalloc(sizeof(*set), GFP_KERNEL); | |
+ if (!set) | |
+ return -ENOMEM; | |
+ | |
+ cgroup_cfts_prepare(); | |
+ set->cfts = cfts; | |
+ list_add_tail(&set->node, &ss->cftsets); | |
+ cgroup_cfts_commit(ss, cfts, true); | |
+ | |
return 0; | |
} | |
-EXPORT_SYMBOL_GPL(cgroup_add_files); | |
+EXPORT_SYMBOL_GPL(cgroup_add_cftypes); | |
+ | |
+/** | |
+ * cgroup_rm_cftypes - remove an array of cftypes from a subsystem | |
+ * @ss: target cgroup subsystem | |
+ * @cfts: zero-length name terminated array of cftypes | |
+ * | |
+ * Unregister @cfts from @ss. Files described by @cfts are removed from | |
+ * all existing cgroups to which @ss is attached and all future cgroups | |
+ * won't have them either. This function can be called anytime whether @ss | |
+ * is attached or not. | |
+ * | |
+ * Returns 0 on successful unregistration, -ENOENT if @cfts is not | |
+ * registered with @ss. | |
+ */ | |
+int cgroup_rm_cftypes(struct cgroup_subsys *ss, struct cftype *cfts) | |
+{ | |
+ struct cftype_set *set; | |
+ | |
+ cgroup_cfts_prepare(); | |
+ | |
+ list_for_each_entry(set, &ss->cftsets, node) { | |
+ if (set->cfts == cfts) { | |
+ list_del_init(&set->node); | |
+ cgroup_cfts_commit(ss, cfts, false); | |
+ return 0; | |
+ } | |
+ } | |
+ | |
+ cgroup_cfts_commit(ss, NULL, false); | |
+ return -ENOENT; | |
+} | |
/** | |
* cgroup_task_count - count the number of tasks in a cgroup. | |
@@ -3678,36 +3996,44 @@ static struct cftype files[] = { | |
.read_u64 = cgroup_clone_children_read, | |
.write_u64 = cgroup_clone_children_write, | |
}, | |
+ { | |
+ .name = "release_agent", | |
+ .flags = CFTYPE_ONLY_ON_ROOT, | |
+ .read_seq_string = cgroup_release_agent_show, | |
+ .write_string = cgroup_release_agent_write, | |
+ .max_write_len = PATH_MAX, | |
+ }, | |
+ { } /* terminate */ | |
}; | |
-static struct cftype cft_release_agent = { | |
- .name = "release_agent", | |
- .read_seq_string = cgroup_release_agent_show, | |
- .write_string = cgroup_release_agent_write, | |
- .max_write_len = PATH_MAX, | |
-}; | |
- | |
-static int cgroup_populate_dir(struct cgroup *cgrp) | |
+/** | |
+ * cgroup_populate_dir - selectively creation of files in a directory | |
+ * @cgrp: target cgroup | |
+ * @base_files: true if the base files should be added | |
+ * @subsys_mask: mask of the subsystem ids whose files should be added | |
+ */ | |
+static int cgroup_populate_dir(struct cgroup *cgrp, bool base_files, | |
+ unsigned long subsys_mask) | |
{ | |
int err; | |
struct cgroup_subsys *ss; | |
- /* First clear out any existing files */ | |
- cgroup_clear_directory(cgrp->dentry); | |
- | |
- err = cgroup_add_files(cgrp, NULL, files, ARRAY_SIZE(files)); | |
- if (err < 0) | |
- return err; | |
- | |
- if (cgrp == cgrp->top_cgroup) { | |
- if ((err = cgroup_add_file(cgrp, NULL, &cft_release_agent)) < 0) | |
+ if (base_files) { | |
+ err = cgroup_addrm_files(cgrp, NULL, files, true); | |
+ if (err < 0) | |
return err; | |
} | |
+ /* process cftsets of each subsystem */ | |
for_each_subsys(cgrp->root, ss) { | |
- if (ss->populate && (err = ss->populate(ss, cgrp)) < 0) | |
- return err; | |
+ struct cftype_set *set; | |
+ if (!test_bit(ss->subsys_id, &subsys_mask)) | |
+ continue; | |
+ | |
+ list_for_each_entry(set, &ss->cftsets, node) | |
+ cgroup_addrm_files(cgrp, ss, set->cfts, true); | |
} | |
+ | |
/* This cgroup is ready now */ | |
for_each_subsys(cgrp->root, ss) { | |
struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id]; | |
@@ -3723,6 +4049,18 @@ static int cgroup_populate_dir(struct cgroup *cgrp) | |
return 0; | |
} | |
+static void css_dput_fn(struct work_struct *work) | |
+{ | |
+ struct cgroup_subsys_state *css = | |
+ container_of(work, struct cgroup_subsys_state, dput_work); | |
+ struct dentry *dentry = css->cgroup->dentry; | |
+ struct super_block *sb = dentry->d_sb; | |
+ | |
+ atomic_inc(&sb->s_active); | |
+ dput(dentry); | |
+ deactivate_super(sb); | |
+} | |
+ | |
static void init_cgroup_css(struct cgroup_subsys_state *css, | |
struct cgroup_subsys *ss, | |
struct cgroup *cgrp) | |
@@ -3735,37 +4073,16 @@ static void init_cgroup_css(struct cgroup_subsys_state *css, | |
set_bit(CSS_ROOT, &css->flags); | |
BUG_ON(cgrp->subsys[ss->subsys_id]); | |
cgrp->subsys[ss->subsys_id] = css; | |
-} | |
- | |
-static void cgroup_lock_hierarchy(struct cgroupfs_root *root) | |
-{ | |
- /* We need to take each hierarchy_mutex in a consistent order */ | |
- int i; | |
/* | |
- * No worry about a race with rebind_subsystems that might mess up the | |
- * locking order, since both parties are under cgroup_mutex. | |
+ * If !clear_css_refs, css holds an extra ref to @cgrp->dentry | |
+ * which is put on the last css_put(). dput() requires process | |
+ * context, which css_put() may be called without. @css->dput_work | |
+ * will be used to invoke dput() asynchronously from css_put(). | |
*/ | |
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { | |
- struct cgroup_subsys *ss = subsys[i]; | |
- if (ss == NULL) | |
- continue; | |
- if (ss->root == root) | |
- mutex_lock(&ss->hierarchy_mutex); | |
- } | |
-} | |
- | |
-static void cgroup_unlock_hierarchy(struct cgroupfs_root *root) | |
-{ | |
- int i; | |
- | |
- for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { | |
- struct cgroup_subsys *ss = subsys[i]; | |
- if (ss == NULL) | |
- continue; | |
- if (ss->root == root) | |
- mutex_unlock(&ss->hierarchy_mutex); | |
- } | |
+ INIT_WORK(&css->dput_work, css_dput_fn); | |
+ if (ss->__DEPRECATED_clear_css_refs) | |
+ set_bit(CSS_CLEAR_CSS_REFS, &css->flags); | |
} | |
/* | |
@@ -3828,21 +4145,24 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry, | |
ss->post_clone(cgrp); | |
} | |
- cgroup_lock_hierarchy(root); | |
list_add(&cgrp->sibling, &cgrp->parent->children); | |
- cgroup_unlock_hierarchy(root); | |
root->number_of_cgroups++; | |
err = cgroup_create_dir(cgrp, dentry, mode); | |
if (err < 0) | |
goto err_remove; | |
- set_bit(CGRP_RELEASABLE, &parent->flags); | |
+ /* If !clear_css_refs, each css holds a ref to the cgroup's dentry */ | |
+ for_each_subsys(root, ss) | |
+ if (!ss->__DEPRECATED_clear_css_refs) | |
+ dget(dentry); | |
/* The cgroup directory was pre-locked for us */ | |
BUG_ON(!mutex_is_locked(&cgrp->dentry->d_inode->i_mutex)); | |
- err = cgroup_populate_dir(cgrp); | |
+ list_add_tail(&cgrp->allcg_node, &root->allcg_list); | |
+ | |
+ err = cgroup_populate_dir(cgrp, true, root->subsys_bits); | |
/* If err < 0, we have a half-filled directory - oh well ;) */ | |
mutex_unlock(&cgroup_mutex); | |
@@ -3852,9 +4172,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry, | |
err_remove: | |
- cgroup_lock_hierarchy(root); | |
list_del(&cgrp->sibling); | |
- cgroup_unlock_hierarchy(root); | |
root->number_of_cgroups--; | |
err_destroy: | |
@@ -3881,18 +4199,19 @@ static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) | |
return cgroup_create(c_parent, dentry, mode | S_IFDIR); | |
} | |
+/* | |
+ * Check the reference count on each subsystem. Since we already | |
+ * established that there are no tasks in the cgroup, if the css refcount | |
+ * is also 1, then there should be no outstanding references, so the | |
+ * subsystem is safe to destroy. We scan across all subsystems rather than | |
+ * using the per-hierarchy linked list of mounted subsystems since we can | |
+ * be called via check_for_release() with no synchronization other than | |
+ * RCU, and the subsystem linked list isn't RCU-safe. | |
+ */ | |
static int cgroup_has_css_refs(struct cgroup *cgrp) | |
{ | |
- /* Check the reference count on each subsystem. Since we | |
- * already established that there are no tasks in the | |
- * cgroup, if the css refcount is also 1, then there should | |
- * be no outstanding references, so the subsystem is safe to | |
- * destroy. We scan across all subsystems rather than using | |
- * the per-hierarchy linked list of mounted subsystems since | |
- * we can be called via check_for_release() with no | |
- * synchronization other than RCU, and the subsystem linked | |
- * list isn't RCU-safe */ | |
int i; | |
+ | |
/* | |
* We won't need to lock the subsys array, because the subsystems | |
* we're concerned about aren't going anywhere since our cgroup root | |
@@ -3901,17 +4220,21 @@ static int cgroup_has_css_refs(struct cgroup *cgrp) | |
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) { | |
struct cgroup_subsys *ss = subsys[i]; | |
struct cgroup_subsys_state *css; | |
+ | |
/* Skip subsystems not present or not in this hierarchy */ | |
if (ss == NULL || ss->root != cgrp->root) | |
continue; | |
+ | |
css = cgrp->subsys[ss->subsys_id]; | |
- /* When called from check_for_release() it's possible | |
+ /* | |
+ * When called from check_for_release() it's possible | |
* that by this point the cgroup has been removed | |
* and the css deleted. But a false-positive doesn't | |
* matter, since it can only happen if the cgroup | |
* has been deleted and hence no longer needs the | |
- * release agent to be called anyway. */ | |
- if (css && (atomic_read(&css->refcnt) > 1)) | |
+ * release agent to be called anyway. | |
+ */ | |
+ if (css && css_refcnt(css) > 1) | |
return 1; | |
} | |
return 0; | |
@@ -3921,51 +4244,63 @@ static int cgroup_has_css_refs(struct cgroup *cgrp) | |
* Atomically mark all (or else none) of the cgroup's CSS objects as | |
* CSS_REMOVED. Return true on success, or false if the cgroup has | |
* busy subsystems. Call with cgroup_mutex held | |
+ * | |
+ * Depending on whether a subsys has __DEPRECATED_clear_css_refs set or | |
+ * not, cgroup removal behaves differently. | |
+ * | |
+ * If clear is set, css refcnt for the subsystem should be zero before | |
+ * cgroup removal can be committed. This is implemented by | |
+ * CGRP_WAIT_ON_RMDIR and retry logic around ->pre_destroy(), which may be | |
+ * called multiple times until all css refcnts reach zero and is allowed to | |
+ * veto removal on any invocation. This behavior is deprecated and will be | |
+ * removed as soon as the existing user (memcg) is updated. | |
+ * | |
+ * If clear is not set, each css holds an extra reference to the cgroup's | |
+ * dentry and cgroup removal proceeds regardless of css refs. | |
+ * ->pre_destroy() will be called at least once and is not allowed to fail. | |
+ * On the last put of each css, whenever that may be, the extra dentry ref | |
+ * is put so that dentry destruction happens only after all css's are | |
+ * released. | |
*/ | |
- | |
static int cgroup_clear_css_refs(struct cgroup *cgrp) | |
{ | |
struct cgroup_subsys *ss; | |
unsigned long flags; | |
bool failed = false; | |
+ | |
local_irq_save(flags); | |
+ | |
+ /* | |
+ * Block new css_tryget() by deactivating refcnt. If all refcnts | |
+ * for subsystems w/ clear_css_refs set were 1 at the moment of | |
+ * deactivation, we succeeded. | |
+ */ | |
for_each_subsys(cgrp->root, ss) { | |
struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id]; | |
- int refcnt; | |
- while (1) { | |
- /* We can only remove a CSS with a refcnt==1 */ | |
- refcnt = atomic_read(&css->refcnt); | |
- if (refcnt > 1) { | |
- failed = true; | |
- goto done; | |
- } | |
- BUG_ON(!refcnt); | |
- /* | |
- * Drop the refcnt to 0 while we check other | |
- * subsystems. This will cause any racing | |
- * css_tryget() to spin until we set the | |
- * CSS_REMOVED bits or abort | |
- */ | |
- if (atomic_cmpxchg(&css->refcnt, refcnt, 0) == refcnt) | |
- break; | |
- cpu_relax(); | |
- } | |
+ | |
+ WARN_ON(atomic_read(&css->refcnt) < 0); | |
+ atomic_add(CSS_DEACT_BIAS, &css->refcnt); | |
+ | |
+ if (ss->__DEPRECATED_clear_css_refs) | |
+ failed |= css_refcnt(css) != 1; | |
} | |
- done: | |
+ | |
+ /* | |
+ * If succeeded, set REMOVED and put all the base refs; otherwise, | |
+ * restore refcnts to positive values. Either way, all in-progress | |
+ * css_tryget() will be released. | |
+ */ | |
for_each_subsys(cgrp->root, ss) { | |
struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id]; | |
- if (failed) { | |
- /* | |
- * Restore old refcnt if we previously managed | |
- * to clear it from 1 to 0 | |
- */ | |
- if (!atomic_read(&css->refcnt)) | |
- atomic_set(&css->refcnt, 1); | |
- } else { | |
- /* Commit the fact that the CSS is removed */ | |
+ | |
+ if (!failed) { | |
set_bit(CSS_REMOVED, &css->flags); | |
+ css_put(css); | |
+ } else { | |
+ atomic_sub(CSS_DEACT_BIAS, &css->refcnt); | |
} | |
} | |
+ | |
local_irq_restore(flags); | |
return !failed; | |
} | |
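The rewritten cgroup_clear_css_refs() above replaces the per-css cmpxchg spin with a single deactivation bias: adding a large negative constant makes the raw count negative, which blocks new tryget() callers while the unbiased value still records how many references existed at deactivation time. A minimal userspace model of that idea (names and the bias value are illustrative; the kernel's CSS_DEACT_BIAS is not this constant, and the kernel spreads the restore/commit over a second pass):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-in for CSS_DEACT_BIAS: a value so negative that
 * any biased count is guaranteed to be < 0. */
#define DEACT_BIAS (-(1 << 30))

/* Recover the logical reference count from a possibly-biased raw value. */
static int unbias(int v)
{
	return v >= 0 ? v : v - DEACT_BIAS;
}

/* Try to deactivate: add the bias, then check the unbiased count.
 * A count of 1 (only the base reference) means removal may commit;
 * otherwise the object is busy and the bias is backed out. */
static bool try_deactivate(atomic_int *refcnt)
{
	atomic_fetch_add(refcnt, DEACT_BIAS);
	if (unbias(atomic_load(refcnt)) != 1) {
		atomic_fetch_sub(refcnt, DEACT_BIAS);	/* busy: restore */
		return false;
	}
	return true;
}
```

The win over the old loop is that deactivation is one unconditional atomic add per css instead of a cmpxchg retry loop racing against every tryget() caller.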
@@ -4069,10 +4404,10 @@ again: | |
list_del_init(&cgrp->release_list); | |
raw_spin_unlock(&release_list_lock); | |
- cgroup_lock_hierarchy(cgrp->root); | |
/* delete this cgroup from parent->children */ | |
list_del_init(&cgrp->sibling); | |
- cgroup_unlock_hierarchy(cgrp->root); | |
+ | |
+ list_del_init(&cgrp->allcg_node); | |
d = dget(cgrp->dentry); | |
@@ -4099,12 +4434,29 @@ again: | |
return 0; | |
} | |
+static void __init_or_module cgroup_init_cftsets(struct cgroup_subsys *ss) | |
+{ | |
+ INIT_LIST_HEAD(&ss->cftsets); | |
+ | |
+ /* | |
+ * base_cftset is embedded in subsys itself, no need to worry about | |
+ * deregistration. | |
+ */ | |
+ if (ss->base_cftypes) { | |
+ ss->base_cftset.cfts = ss->base_cftypes; | |
+ list_add_tail(&ss->base_cftset.node, &ss->cftsets); | |
+ } | |
+} | |
+ | |
static void __init cgroup_init_subsys(struct cgroup_subsys *ss) | |
{ | |
struct cgroup_subsys_state *css; | |
printk(KERN_INFO "Initializing cgroup subsys %s\n", ss->name); | |
+ /* init base cftset */ | |
+ cgroup_init_cftsets(ss); | |
+ | |
/* Create the top cgroup state for this subsystem */ | |
list_add(&ss->sibling, &rootnode.subsys_list); | |
ss->root = &rootnode; | |
@@ -4126,8 +4478,6 @@ static void __init cgroup_init_subsys(struct cgroup_subsys *ss) | |
* need to invoke fork callbacks here. */ | |
BUG_ON(!list_empty(&init_task.tasks)); | |
- mutex_init(&ss->hierarchy_mutex); | |
- lockdep_set_class(&ss->hierarchy_mutex, &ss->subsys_key); | |
ss->active = 1; | |
/* this function shouldn't be used with modular subsystems, since they | |
@@ -4174,6 +4524,9 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss) | |
return 0; | |
} | |
+ /* init base cftset */ | |
+ cgroup_init_cftsets(ss); | |
+ | |
/* | |
* need to register a subsys id before anything else - for example, | |
* init_cgroup_css needs it. | |
@@ -4251,8 +4604,6 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss) | |
} | |
write_unlock(&css_set_lock); | |
- mutex_init(&ss->hierarchy_mutex); | |
- lockdep_set_class(&ss->hierarchy_mutex, &ss->subsys_key); | |
ss->active = 1; | |
/* success! */ | |
@@ -4735,26 +5086,43 @@ static void check_for_release(struct cgroup *cgrp) | |
} | |
/* Caller must verify that the css is not for root cgroup */ | |
-void __css_get(struct cgroup_subsys_state *css, int count) | |
+bool __css_tryget(struct cgroup_subsys_state *css) | |
{ | |
- atomic_add(count, &css->refcnt); | |
- set_bit(CGRP_RELEASABLE, &css->cgroup->flags); | |
+ do { | |
+ int v = css_refcnt(css); | |
+ | |
+ if (atomic_cmpxchg(&css->refcnt, v, v + 1) == v) | |
+ return true; | |
+ cpu_relax(); | |
+ } while (!test_bit(CSS_REMOVED, &css->flags)); | |
+ | |
+ return false; | |
} | |
-EXPORT_SYMBOL_GPL(__css_get); | |
+EXPORT_SYMBOL_GPL(__css_tryget); | |
/* Caller must verify that the css is not for root cgroup */ | |
-void __css_put(struct cgroup_subsys_state *css, int count) | |
+void __css_put(struct cgroup_subsys_state *css) | |
{ | |
struct cgroup *cgrp = css->cgroup; | |
- int val; | |
+ int v; | |
+ | |
rcu_read_lock(); | |
- val = atomic_sub_return(count, &css->refcnt); | |
- if (val == 1) { | |
- check_for_release(cgrp); | |
+ v = css_unbias_refcnt(atomic_dec_return(&css->refcnt)); | |
+ | |
+ switch (v) { | |
+ case 1: | |
+ if (notify_on_release(cgrp)) { | |
+ set_bit(CGRP_RELEASABLE, &cgrp->flags); | |
+ check_for_release(cgrp); | |
+ } | |
cgroup_wakeup_rmdir_waiter(cgrp); | |
+ break; | |
+ case 0: | |
+ if (!test_bit(CSS_CLEAR_CSS_REFS, &css->flags)) | |
+ schedule_work(&css->dput_work); | |
+ break; | |
} | |
rcu_read_unlock(); | |
- WARN_ON_ONCE(val < 1); | |
} | |
EXPORT_SYMBOL_GPL(__css_put); | |
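The __css_tryget()/__css_put() pair above implements the reader side of the biased refcount: a tryget only succeeds while the raw count is non-negative, and a put reports the unbiased remainder so the caller can tell "last user reference" (1) apart from "fully gone" (0, when the kernel schedules the deferred dentry put). A simplified userspace sketch (names and the bias constant are made up; the real tryget also re-checks CSS_REMOVED before giving up):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bias: any biased (deactivated) raw count is negative. */
#define DEACT_BIAS (-(1 << 30))

/* Take a reference only while the object is not being deactivated. */
static bool refcnt_tryget(atomic_int *refcnt)
{
	int v = atomic_load(refcnt);

	while (v >= 0) {
		/* succeed only if no deactivation or other update raced
		 * us; a failed CAS reloads v, so we simply loop */
		if (atomic_compare_exchange_weak(refcnt, &v, v + 1))
			return true;
	}
	return false;		/* deactivated: caller must not use it */
}

/* Drop a reference and return the unbiased remaining count. */
static int refcnt_put(atomic_int *refcnt)
{
	int v = atomic_fetch_sub(refcnt, 1) - 1;

	return v >= 0 ? v : v - DEACT_BIAS;
}
```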
@@ -4873,7 +5241,7 @@ unsigned short css_id(struct cgroup_subsys_state *css) | |
* on this or this is under rcu_read_lock(). Once css->id is allocated, | |
* it's unchanged until freed. | |
*/ | |
- cssid = rcu_dereference_check(css->id, atomic_read(&css->refcnt)); | |
+ cssid = rcu_dereference_check(css->id, css_refcnt(css)); | |
if (cssid) | |
return cssid->id; | |
@@ -4885,7 +5253,7 @@ unsigned short css_depth(struct cgroup_subsys_state *css) | |
{ | |
struct css_id *cssid; | |
- cssid = rcu_dereference_check(css->id, atomic_read(&css->refcnt)); | |
+ cssid = rcu_dereference_check(css->id, css_refcnt(css)); | |
if (cssid) | |
return cssid->depth; | |
@@ -4899,7 +5267,7 @@ EXPORT_SYMBOL_GPL(css_depth); | |
* @root: the css supposed to be an ancestor of the child. | |
* | |
* Returns true if "root" is an ancestor of "child" in its hierarchy. Because | |
- * this function reads css->id, this use rcu_dereference() and rcu_read_lock(). | |
+ * this function reads css->id, the caller must hold rcu_read_lock(). | |
* But, considering usual usage, the csses should be valid objects after test. | |
* Assuming that the caller will do some action to the child if this | |
* returns true, the caller must take "child"'s reference count. | |
@@ -4911,18 +5279,18 @@ bool css_is_ancestor(struct cgroup_subsys_state *child, | |
{ | |
struct css_id *child_id; | |
struct css_id *root_id; | |
- bool ret = true; | |
- rcu_read_lock(); | |
child_id = rcu_dereference(child->id); | |
+ if (!child_id) | |
+ return false; | |
root_id = rcu_dereference(root->id); | |
- if (!child_id | |
- || !root_id | |
- || (child_id->depth < root_id->depth) | |
- || (child_id->stack[root_id->depth] != root_id->id)) | |
- ret = false; | |
- rcu_read_unlock(); | |
- return ret; | |
+ if (!root_id) | |
+ return false; | |
+ if (child_id->depth < root_id->depth) | |
+ return false; | |
+ if (child_id->stack[root_id->depth] != root_id->id) | |
+ return false; | |
+ return true; | |
} | |
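The flattened early returns above all feed one check: the css_id depth/stack encoding, where each id records the ids of all its ancestors indexed by depth, so ancestry is a depth comparison plus a single array lookup. A toy illustration (struct and field names are invented for the sketch, not the kernel's css_id layout):

```c
#include <stdbool.h>

/* Toy css_id: stack[d] holds the id of this node's ancestor at depth d,
 * with stack[depth] being the node's own id. */
struct toy_id {
	int id;
	int depth;
	int stack[8];
};

/* root is an ancestor of child iff child is at least as deep and the
 * child's ancestor at root's depth is root itself. */
static bool toy_is_ancestor(const struct toy_id *child,
			    const struct toy_id *root)
{
	if (child->depth < root->depth)
		return false;
	return child->stack[root->depth] == root->id;
}
```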
void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css) | |
@@ -5266,19 +5634,15 @@ static struct cftype debug_files[] = { | |
.name = "releasable", | |
.read_u64 = releasable_read, | |
}, | |
-}; | |
-static int debug_populate(struct cgroup_subsys *ss, struct cgroup *cont) | |
-{ | |
- return cgroup_add_files(cont, ss, debug_files, | |
- ARRAY_SIZE(debug_files)); | |
-} | |
+ { } /* terminate */ | |
+}; | |
struct cgroup_subsys debug_subsys = { | |
.name = "debug", | |
.create = debug_create, | |
.destroy = debug_destroy, | |
- .populate = debug_populate, | |
.subsys_id = debug_subsys_id, | |
+ .base_cftypes = debug_files, | |
}; | |
#endif /* CONFIG_CGROUP_DEBUG */ | |
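The debug, freezer, cpuset, cpu, cpuacct and memcg conversions in this series all follow the same pattern: the per-subsystem ->populate() callback and its cgroup_add_files(..., ARRAY_SIZE(...)) call are replaced by a .base_cftypes array closed with an empty sentinel entry, so registration walks to the terminator instead of taking an explicit count. A miniature illustration of the sentinel convention (a stripped-down cftype with only a name, and a hypothetical counting helper):

```c
#include <stddef.h>

/* Stripped-down cftype: just enough to show the sentinel walk. */
struct cftype {
	const char *name;
};

/* Walk entries until the empty terminator, as the cftype registration
 * code can once arrays are sentinel-terminated. */
static int count_cftypes(const struct cftype *cfts)
{
	int n = 0;

	for (; cfts->name != NULL; cfts++)
		n++;
	return n;
}

static const struct cftype demo_files[] = {
	{ .name = "state" },
	{ .name = "stat" },
	{ NULL }	/* terminate, mirroring the { } sentinel above */
};
```

The practical benefit is visible in the freezer and cpuset hunks: per-file conditions move into CFTYPE_NOT_ON_ROOT/CFTYPE_ONLY_ON_ROOT flags, and the bespoke populate functions disappear entirely.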
diff --git a/kernel/cgroup_freezer.c b/kernel/cgroup_freezer.c | |
index a902e2a..f990569 100644 | |
--- a/kernel/cgroup_freezer.c | |
+++ b/kernel/cgroup_freezer.c | |
@@ -353,24 +353,19 @@ static int freezer_write(struct cgroup *cgroup, | |
static struct cftype files[] = { | |
{ | |
.name = "state", | |
+ .flags = CFTYPE_NOT_ON_ROOT, | |
.read_seq_string = freezer_read, | |
.write_string = freezer_write, | |
}, | |
+ { } /* terminate */ | |
}; | |
-static int freezer_populate(struct cgroup_subsys *ss, struct cgroup *cgroup) | |
-{ | |
- if (!cgroup->parent) | |
- return 0; | |
- return cgroup_add_files(cgroup, ss, files, ARRAY_SIZE(files)); | |
-} | |
- | |
struct cgroup_subsys freezer_subsys = { | |
.name = "freezer", | |
.create = freezer_create, | |
.destroy = freezer_destroy, | |
- .populate = freezer_populate, | |
.subsys_id = freezer_subsys_id, | |
.can_attach = freezer_can_attach, | |
.fork = freezer_fork, | |
+ .base_cftypes = files, | |
}; | |
diff --git a/kernel/cpuset.c b/kernel/cpuset.c | |
index 4b843ac..f1dec4b 100644 | |
--- a/kernel/cpuset.c | |
+++ b/kernel/cpuset.c | |
@@ -1769,28 +1769,17 @@ static struct cftype files[] = { | |
.write_u64 = cpuset_write_u64, | |
.private = FILE_SPREAD_SLAB, | |
}, | |
-}; | |
- | |
-static struct cftype cft_memory_pressure_enabled = { | |
- .name = "memory_pressure_enabled", | |
- .read_u64 = cpuset_read_u64, | |
- .write_u64 = cpuset_write_u64, | |
- .private = FILE_MEMORY_PRESSURE_ENABLED, | |
-}; | |
-static int cpuset_populate(struct cgroup_subsys *ss, struct cgroup *cont) | |
-{ | |
- int err; | |
+ { | |
+ .name = "memory_pressure_enabled", | |
+ .flags = CFTYPE_ONLY_ON_ROOT, | |
+ .read_u64 = cpuset_read_u64, | |
+ .write_u64 = cpuset_write_u64, | |
+ .private = FILE_MEMORY_PRESSURE_ENABLED, | |
+ }, | |
- err = cgroup_add_files(cont, ss, files, ARRAY_SIZE(files)); | |
- if (err) | |
- return err; | |
- /* memory_pressure_enabled is in root cpuset only */ | |
- if (!cont->parent) | |
- err = cgroup_add_file(cont, ss, | |
- &cft_memory_pressure_enabled); | |
- return err; | |
-} | |
+ { } /* terminate */ | |
+}; | |
/* | |
* post_clone() is called during cgroup_create() when the | |
@@ -1891,9 +1880,9 @@ struct cgroup_subsys cpuset_subsys = { | |
.destroy = cpuset_destroy, | |
.can_attach = cpuset_can_attach, | |
.attach = cpuset_attach, | |
- .populate = cpuset_populate, | |
.post_clone = cpuset_post_clone, | |
.subsys_id = cpuset_subsys_id, | |
+ .base_cftypes = files, | |
.early_init = 1, | |
}; | |
diff --git a/kernel/sched/core.c b/kernel/sched/core.c | |
index a2ba28e..dac1d43 100644 | |
--- a/kernel/sched/core.c | |
+++ b/kernel/sched/core.c | |
@@ -8249,13 +8249,9 @@ static struct cftype cpu_files[] = { | |
.write_u64 = cpu_rt_period_write_uint, | |
}, | |
#endif | |
+ { } /* terminate */ | |
}; | |
-static int cpu_cgroup_populate(struct cgroup_subsys *ss, struct cgroup *cont) | |
-{ | |
- return cgroup_add_files(cont, ss, cpu_files, ARRAY_SIZE(cpu_files)); | |
-} | |
- | |
struct cgroup_subsys cpu_cgroup_subsys = { | |
.name = "cpu", | |
.create = cpu_cgroup_create, | |
@@ -8264,8 +8260,8 @@ struct cgroup_subsys cpu_cgroup_subsys = { | |
.attach = cpu_cgroup_attach, | |
.allow_attach = cpu_cgroup_allow_attach, | |
.exit = cpu_cgroup_exit, | |
- .populate = cpu_cgroup_populate, | |
.subsys_id = cpu_cgroup_subsys_id, | |
+ .base_cftypes = cpu_files, | |
.early_init = 1, | |
}; | |
@@ -8450,13 +8446,9 @@ static struct cftype files[] = { | |
.name = "stat", | |
.read_map = cpuacct_stats_show, | |
}, | |
+ { } /* terminate */ | |
}; | |
-static int cpuacct_populate(struct cgroup_subsys *ss, struct cgroup *cgrp) | |
-{ | |
- return cgroup_add_files(cgrp, ss, files, ARRAY_SIZE(files)); | |
-} | |
- | |
/* | |
* charge this task's execution time to its accounting group. | |
* | |
@@ -8488,7 +8480,7 @@ struct cgroup_subsys cpuacct_subsys = { | |
.name = "cpuacct", | |
.create = cpuacct_create, | |
.destroy = cpuacct_destroy, | |
- .populate = cpuacct_populate, | |
.subsys_id = cpuacct_subsys_id, | |
+ .base_cftypes = files, | |
}; | |
#endif /* CONFIG_CGROUP_CPUACCT */ | |
diff --git a/kernel/sys.c b/kernel/sys.c | |
index b3d4f92..7163847 100644 | |
--- a/kernel/sys.c | |
+++ b/kernel/sys.c | |
@@ -96,10 +96,8 @@ | |
int overflowuid = DEFAULT_OVERFLOWUID; | |
int overflowgid = DEFAULT_OVERFLOWGID; | |
-#ifdef CONFIG_UID16 | |
EXPORT_SYMBOL(overflowuid); | |
EXPORT_SYMBOL(overflowgid); | |
-#endif | |
/* | |
* the same as above, but for filesystems which can only store a 16-bit | |
diff --git a/mm/memcontrol.c b/mm/memcontrol.c | |
index 9db1557..08d6852 100644 | |
--- a/mm/memcontrol.c | |
+++ b/mm/memcontrol.c | |
@@ -1175,12 +1175,16 @@ struct lruvec *mem_cgroup_lru_move_lists(struct zone *zone, | |
static bool mem_cgroup_same_or_subtree(const struct mem_cgroup *root_memcg, | |
struct mem_cgroup *memcg) | |
{ | |
- if (root_memcg != memcg) { | |
- return (root_memcg->use_hierarchy && | |
- css_is_ancestor(&memcg->css, &root_memcg->css)); | |
- } | |
+ bool ret; | |
- return true; | |
+ if (root_memcg == memcg) | |
+ return true; | |
+ if (!root_memcg->use_hierarchy) | |
+ return false; | |
+ rcu_read_lock(); | |
+ ret = css_is_ancestor(&memcg->css, &root_memcg->css); | |
+ rcu_read_unlock(); | |
+ return ret; | |
} | |
int task_in_mem_cgroup(struct task_struct *task, const struct mem_cgroup *memcg) | |
@@ -3905,14 +3909,21 @@ static inline u64 mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) | |
return val << PAGE_SHIFT; | |
} | |
-static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft) | |
+static ssize_t mem_cgroup_read(struct cgroup *cont, struct cftype *cft, | |
+ struct file *file, char __user *buf, | |
+ size_t nbytes, loff_t *ppos) | |
{ | |
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); | |
+ char str[64]; | |
u64 val; | |
- int type, name; | |
+ int type, name, len; | |
type = MEMFILE_TYPE(cft->private); | |
name = MEMFILE_ATTR(cft->private); | |
+ | |
+ if (!do_swap_account && type == _MEMSWAP) | |
+ return -EOPNOTSUPP; | |
+ | |
switch (type) { | |
case _MEM: | |
if (name == RES_USAGE) | |
@@ -3929,7 +3940,9 @@ static u64 mem_cgroup_read(struct cgroup *cont, struct cftype *cft) | |
default: | |
BUG(); | |
} | |
- return val; | |
+ | |
+ len = scnprintf(str, sizeof(str), "%llu\n", (unsigned long long)val); | |
+ return simple_read_from_buffer(buf, nbytes, ppos, str, len); | |
} | |
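The mem_cgroup_read() change above swaps the .read_u64 callback for a raw .read: the value is formatted into a small stack buffer and then served with simple_read_from_buffer(), which lets the function return -EOPNOTSUPP for memsw files when swap accounting is off. A userspace model of that serve-from-buffer pattern (the helper below imitates simple_read_from_buffer() without the copy_to_user step; names are ours):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy up to count bytes from from[available] starting at *ppos,
 * advancing *ppos — the shape of simple_read_from_buffer(). */
static long read_from_buffer(char *to, size_t count, long *ppos,
			     const char *from, size_t available)
{
	long pos = *ppos;

	if (pos < 0)
		return -1;
	if ((size_t)pos >= available || count == 0)
		return 0;	/* EOF */
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;
	memcpy(to, from + pos, count);
	*ppos = pos + (long)count;
	return (long)count;
}

/* Format a u64 and serve the read from the formatted string. */
static long format_u64_read(unsigned long long val, char *buf,
			    size_t nbytes, long *ppos)
{
	char str[64];
	int len = snprintf(str, sizeof(str), "%llu\n", val);

	return read_from_buffer(buf, nbytes, ppos, str, (size_t)len);
}
```

A second read at the advanced offset returns 0, which is how short-file semantics fall out of the buffer helper for free.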
/* | |
* The user of this function is... | |
@@ -3945,6 +3958,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft, | |
type = MEMFILE_TYPE(cft->private); | |
name = MEMFILE_ATTR(cft->private); | |
+ | |
+ if (!do_swap_account && type == _MEMSWAP) | |
+ return -EOPNOTSUPP; | |
+ | |
switch (name) { | |
case RES_LIMIT: | |
if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */ | |
@@ -4010,12 +4027,15 @@ out: | |
static int mem_cgroup_reset(struct cgroup *cont, unsigned int event) | |
{ | |
- struct mem_cgroup *memcg; | |
+ struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); | |
int type, name; | |
- memcg = mem_cgroup_from_cont(cont); | |
type = MEMFILE_TYPE(event); | |
name = MEMFILE_ATTR(event); | |
+ | |
+ if (!do_swap_account && type == _MEMSWAP) | |
+ return -EOPNOTSUPP; | |
+ | |
switch (name) { | |
case RES_MAX_USAGE: | |
if (type == _MEM) | |
@@ -4662,29 +4682,22 @@ static int mem_control_numa_stat_open(struct inode *unused, struct file *file) | |
#endif /* CONFIG_NUMA */ | |
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM | |
-static int register_kmem_files(struct cgroup *cont, struct cgroup_subsys *ss) | |
+static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss) | |
{ | |
- /* | |
- * Part of this would be better living in a separate allocation | |
- * function, leaving us with just the cgroup tree population work. | |
- * We, however, depend on state such as network's proto_list that | |
- * is only initialized after cgroup creation. I found the less | |
- * cumbersome way to deal with it to defer it all to populate time | |
- */ | |
- return mem_cgroup_sockets_init(cont, ss); | |
+ return mem_cgroup_sockets_init(memcg, ss); | |
}; | |
-static void kmem_cgroup_destroy(struct cgroup *cont) | |
+static void kmem_cgroup_destroy(struct mem_cgroup *memcg) | |
{ | |
- mem_cgroup_sockets_destroy(cont); | |
+ mem_cgroup_sockets_destroy(memcg); | |
} | |
#else | |
-static int register_kmem_files(struct cgroup *cont, struct cgroup_subsys *ss) | |
+static int memcg_init_kmem(struct mem_cgroup *memcg, struct cgroup_subsys *ss) | |
{ | |
return 0; | |
} | |
-static void kmem_cgroup_destroy(struct cgroup *cont) | |
+static void kmem_cgroup_destroy(struct mem_cgroup *memcg) | |
{ | |
} | |
#endif | |
@@ -4693,7 +4706,7 @@ static struct cftype mem_cgroup_files[] = { | |
{ | |
.name = "usage_in_bytes", | |
.private = MEMFILE_PRIVATE(_MEM, RES_USAGE), | |
- .read_u64 = mem_cgroup_read, | |
+ .read = mem_cgroup_read, | |
.register_event = mem_cgroup_usage_register_event, | |
.unregister_event = mem_cgroup_usage_unregister_event, | |
}, | |
@@ -4701,25 +4714,25 @@ static struct cftype mem_cgroup_files[] = { | |
.name = "max_usage_in_bytes", | |
.private = MEMFILE_PRIVATE(_MEM, RES_MAX_USAGE), | |
.trigger = mem_cgroup_reset, | |
- .read_u64 = mem_cgroup_read, | |
+ .read = mem_cgroup_read, | |
}, | |
{ | |
.name = "limit_in_bytes", | |
.private = MEMFILE_PRIVATE(_MEM, RES_LIMIT), | |
.write_string = mem_cgroup_write, | |
- .read_u64 = mem_cgroup_read, | |
+ .read = mem_cgroup_read, | |
}, | |
{ | |
.name = "soft_limit_in_bytes", | |
.private = MEMFILE_PRIVATE(_MEM, RES_SOFT_LIMIT), | |
.write_string = mem_cgroup_write, | |
- .read_u64 = mem_cgroup_read, | |
+ .read = mem_cgroup_read, | |
}, | |
{ | |
.name = "failcnt", | |
.private = MEMFILE_PRIVATE(_MEM, RES_FAILCNT), | |
.trigger = mem_cgroup_reset, | |
- .read_u64 = mem_cgroup_read, | |
+ .read = mem_cgroup_read, | |
}, | |
{ | |
.name = "stat", | |
@@ -4764,14 +4777,11 @@ static struct cftype mem_cgroup_files[] = { | |
.mode = S_IRUGO, | |
}, | |
#endif | |
-}; | |
- | |
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP | |
-static struct cftype memsw_cgroup_files[] = { | |
{ | |
.name = "memsw.usage_in_bytes", | |
.private = MEMFILE_PRIVATE(_MEMSWAP, RES_USAGE), | |
- .read_u64 = mem_cgroup_read, | |
+ .read = mem_cgroup_read, | |
.register_event = mem_cgroup_usage_register_event, | |
.unregister_event = mem_cgroup_usage_unregister_event, | |
}, | |
@@ -4779,35 +4789,23 @@ static struct cftype memsw_cgroup_files[] = { | |
.name = "memsw.max_usage_in_bytes", | |
.private = MEMFILE_PRIVATE(_MEMSWAP, RES_MAX_USAGE), | |
.trigger = mem_cgroup_reset, | |
- .read_u64 = mem_cgroup_read, | |
+ .read = mem_cgroup_read, | |
}, | |
{ | |
.name = "memsw.limit_in_bytes", | |
.private = MEMFILE_PRIVATE(_MEMSWAP, RES_LIMIT), | |
.write_string = mem_cgroup_write, | |
- .read_u64 = mem_cgroup_read, | |
+ .read = mem_cgroup_read, | |
}, | |
{ | |
.name = "memsw.failcnt", | |
.private = MEMFILE_PRIVATE(_MEMSWAP, RES_FAILCNT), | |
.trigger = mem_cgroup_reset, | |
- .read_u64 = mem_cgroup_read, | |
+ .read = mem_cgroup_read, | |
}, | |
-}; | |
- | |
-static int register_memsw_files(struct cgroup *cont, struct cgroup_subsys *ss) | |
-{ | |
- if (!do_swap_account) | |
- return 0; | |
- return cgroup_add_files(cont, ss, memsw_cgroup_files, | |
- ARRAY_SIZE(memsw_cgroup_files)); | |
-}; | |
-#else | |
-static int register_memsw_files(struct cgroup *cont, struct cgroup_subsys *ss) | |
-{ | |
- return 0; | |
-} | |
#endif | |
+ { }, /* terminate */ | |
+}; | |
static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *memcg, int node) | |
{ | |
@@ -5059,7 +5057,17 @@ mem_cgroup_create(struct cgroup *cont) | |
memcg->move_charge_at_immigrate = 0; | |
mutex_init(&memcg->thresholds_lock); | |
spin_lock_init(&memcg->move_lock); | |
- vmpressure_init(&memcg->vmpressure); | |
+ | |
+ error = memcg_init_kmem(memcg, &mem_cgroup_subsys); | |
+ if (error) { | |
+ /* | |
+ * We call put now because our (and parent's) refcnts | |
+ * are already in place. mem_cgroup_put() will internally | |
+ * call __mem_cgroup_free, so return directly | |
+ */ | |
+ mem_cgroup_put(memcg); | |
+ return ERR_PTR(error); | |
+ } | |
return &memcg->css; | |
free_out: | |
__mem_cgroup_free(memcg); | |
@@ -5077,28 +5085,11 @@ static void mem_cgroup_destroy(struct cgroup *cont) | |
{ | |
struct mem_cgroup *memcg = mem_cgroup_from_cont(cont); | |
- kmem_cgroup_destroy(cont); | |
+ kmem_cgroup_destroy(memcg); | |
mem_cgroup_put(memcg); | |
} | |
-static int mem_cgroup_populate(struct cgroup_subsys *ss, | |
- struct cgroup *cont) | |
-{ | |
- int ret; | |
- | |
- ret = cgroup_add_files(cont, ss, mem_cgroup_files, | |
- ARRAY_SIZE(mem_cgroup_files)); | |
- | |
- if (!ret) | |
- ret = register_memsw_files(cont, ss); | |
- | |
- if (!ret) | |
- ret = register_kmem_files(cont, ss); | |
- | |
- return ret; | |
-} | |
- | |
#ifdef CONFIG_MMU | |
/* Handlers for move charge at task migration. */ | |
#define PRECHARGE_COUNT_AT_ONCE 256 | |
@@ -5682,12 +5673,13 @@ struct cgroup_subsys mem_cgroup_subsys = { | |
.create = mem_cgroup_create, | |
.pre_destroy = mem_cgroup_pre_destroy, | |
.destroy = mem_cgroup_destroy, | |
- .populate = mem_cgroup_populate, | |
.can_attach = mem_cgroup_can_attach, | |
.cancel_attach = mem_cgroup_cancel_attach, | |
.attach = mem_cgroup_move_task, | |
+ .base_cftypes = mem_cgroup_files, | |
.early_init = 0, | |
.use_id = 1, | |
+ .__DEPRECATED_clear_css_refs = true, | |
}; | |
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP | |
diff --git a/mm/shmem.c b/mm/shmem.c | |
index 4489566..0fa474e 100644 | |
--- a/mm/shmem.c | |
+++ b/mm/shmem.c | |
@@ -76,11 +76,16 @@ static struct vfsmount *shm_mnt; | |
/* Symlink up to this size is kmalloc'ed instead of using a swappable page */ | |
#define SHORT_SYMLINK_LEN 128 | |
-struct shmem_xattr { | |
- struct list_head list; /* anchored by shmem_inode_info->xattr_list */ | |
- char *name; /* xattr name */ | |
- size_t size; | |
- char value[0]; | |
+/* | |
+ * shmem_fallocate and shmem_writepage communicate via inode->i_private | |
+ * (with i_mutex making sure that it has only one user at a time): | |
+ * we would prefer not to enlarge the shmem inode just for that. | |
+ */ | |
+struct shmem_falloc { | |
+ pgoff_t start; /* start of range currently being fallocated */ | |
+ pgoff_t next; /* the next page offset to be fallocated */ | |
+ pgoff_t nr_falloced; /* how many new pages have been fallocated */ | |
+ pgoff_t nr_unswapped; /* how often writepage refused to swap out */ | |
}; | |
/* Flag allocation requirements to shmem_getpage */ | |
@@ -577,7 +582,6 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr) | |
static void shmem_evict_inode(struct inode *inode) | |
{ | |
struct shmem_inode_info *info = SHMEM_I(inode); | |
- struct shmem_xattr *xattr, *nxattr; | |
if (inode->i_mapping->a_ops == &shmem_aops) { | |
shmem_unacct_size(info->flags, inode->i_size); | |
@@ -591,11 +595,8 @@ static void shmem_evict_inode(struct inode *inode) | |
} else | |
kfree(info->symlink); | |
- list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) { | |
- kfree(xattr->name); | |
- kfree(xattr); | |
- } | |
- WARN_ON(inode->i_blocks); | |
+ simple_xattrs_free(&info->xattrs); | |
+ BUG_ON(inode->i_blocks); | |
shmem_free_inode(inode->i_sb); | |
end_writeback(inode); | |
} | |
@@ -1145,7 +1146,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode | |
spin_lock_init(&info->lock); | |
info->flags = flags & VM_NORESERVE; | |
INIT_LIST_HEAD(&info->swaplist); | |
- INIT_LIST_HEAD(&info->xattr_list); | |
+ simple_xattrs_init(&info->xattrs); | |
cache_no_acl(inode); | |
switch (mode & S_IFMT) { | |
@@ -1718,28 +1719,6 @@ static void shmem_put_link(struct dentry *dentry, struct nameidata *nd, void *co | |
*/ | |
/* | |
- * Allocate new xattr and copy in the value; but leave the name to callers. | |
- */ | |
-static struct shmem_xattr *shmem_xattr_alloc(const void *value, size_t size) | |
-{ | |
- struct shmem_xattr *new_xattr; | |
- size_t len; | |
- | |
- /* wrap around? */ | |
- len = sizeof(*new_xattr) + size; | |
- if (len <= sizeof(*new_xattr)) | |
- return NULL; | |
- | |
- new_xattr = kmalloc(len, GFP_KERNEL); | |
- if (!new_xattr) | |
- return NULL; | |
- | |
- new_xattr->size = size; | |
- memcpy(new_xattr->value, value, size); | |
- return new_xattr; | |
-} | |
- | |
-/* | |
* Callback for security_inode_init_security() for acquiring xattrs. | |
*/ | |
static int shmem_initxattrs(struct inode *inode, | |
@@ -1748,11 +1727,11 @@ static int shmem_initxattrs(struct inode *inode, | |
{ | |
struct shmem_inode_info *info = SHMEM_I(inode); | |
const struct xattr *xattr; | |
- struct shmem_xattr *new_xattr; | |
+ struct simple_xattr *new_xattr; | |
size_t len; | |
for (xattr = xattr_array; xattr->name != NULL; xattr++) { | |
- new_xattr = shmem_xattr_alloc(xattr->value, xattr->value_len); | |
+ new_xattr = simple_xattr_alloc(xattr->value, xattr->value_len); | |
if (!new_xattr) | |
return -ENOMEM; | |
@@ -1769,91 +1748,12 @@ static int shmem_initxattrs(struct inode *inode, | |
memcpy(new_xattr->name + XATTR_SECURITY_PREFIX_LEN, | |
xattr->name, len); | |
- spin_lock(&info->lock); | |
- list_add(&new_xattr->list, &info->xattr_list); | |
- spin_unlock(&info->lock); | |
+ simple_xattr_list_add(&info->xattrs, new_xattr); | |
} | |
return 0; | |
} | |
-static int shmem_xattr_get(struct dentry *dentry, const char *name, | |
- void *buffer, size_t size) | |
-{ | |
- struct shmem_inode_info *info; | |
- struct shmem_xattr *xattr; | |
- int ret = -ENODATA; | |
- | |
- info = SHMEM_I(dentry->d_inode); | |
- | |
- spin_lock(&info->lock); | |
- list_for_each_entry(xattr, &info->xattr_list, list) { | |
- if (strcmp(name, xattr->name)) | |
- continue; | |
- | |
- ret = xattr->size; | |
- if (buffer) { | |
- if (size < xattr->size) | |
- ret = -ERANGE; | |
- else | |
- memcpy(buffer, xattr->value, xattr->size); | |
- } | |
- break; | |
- } | |
- spin_unlock(&info->lock); | |
- return ret; | |
-} | |
- | |
-static int shmem_xattr_set(struct inode *inode, const char *name, | |
- const void *value, size_t size, int flags) | |
-{ | |
- struct shmem_inode_info *info = SHMEM_I(inode); | |
- struct shmem_xattr *xattr; | |
- struct shmem_xattr *new_xattr = NULL; | |
- int err = 0; | |
- | |
- /* value == NULL means remove */ | |
- if (value) { | |
- new_xattr = shmem_xattr_alloc(value, size); | |
- if (!new_xattr) | |
- return -ENOMEM; | |
- | |
- new_xattr->name = kstrdup(name, GFP_KERNEL); | |
- if (!new_xattr->name) { | |
- kfree(new_xattr); | |
- return -ENOMEM; | |
- } | |
- } | |
- | |
- spin_lock(&info->lock); | |
- list_for_each_entry(xattr, &info->xattr_list, list) { | |
- if (!strcmp(name, xattr->name)) { | |
- if (flags & XATTR_CREATE) { | |
- xattr = new_xattr; | |
- err = -EEXIST; | |
- } else if (new_xattr) { | |
- list_replace(&xattr->list, &new_xattr->list); | |
- } else { | |
- list_del(&xattr->list); | |
- } | |
- goto out; | |
- } | |
- } | |
- if (flags & XATTR_REPLACE) { | |
- xattr = new_xattr; | |
- err = -ENODATA; | |
- } else { | |
- list_add(&new_xattr->list, &info->xattr_list); | |
- xattr = NULL; | |
- } | |
-out: | |
- spin_unlock(&info->lock); | |
- if (xattr) | |
- kfree(xattr->name); | |
- kfree(xattr); | |
- return err; | |
-} | |
- | |
static const struct xattr_handler *shmem_xattr_handlers[] = { | |
#ifdef CONFIG_TMPFS_POSIX_ACL | |
&generic_acl_access_handler, | |
@@ -1884,6 +1784,7 @@ static int shmem_xattr_validate(const char *name) | |
static ssize_t shmem_getxattr(struct dentry *dentry, const char *name, | |
void *buffer, size_t size) | |
{ | |
+ struct shmem_inode_info *info = SHMEM_I(dentry->d_inode); | |
int err; | |
/* | |
@@ -1898,12 +1799,13 @@ static ssize_t shmem_getxattr(struct dentry *dentry, const char *name, | |
if (err) | |
return err; | |
- return shmem_xattr_get(dentry, name, buffer, size); | |
+ return simple_xattr_get(&info->xattrs, name, buffer, size); | |
} | |
static int shmem_setxattr(struct dentry *dentry, const char *name, | |
const void *value, size_t size, int flags) | |
{ | |
+ struct shmem_inode_info *info = SHMEM_I(dentry->d_inode); | |
int err; | |
/* | |
@@ -1918,15 +1820,12 @@ static int shmem_setxattr(struct dentry *dentry, const char *name, | |
if (err) | |
return err; | |
- if (size == 0) | |
- value = ""; /* empty EA, do not remove */ | |
- | |
- return shmem_xattr_set(dentry->d_inode, name, value, size, flags); | |
- | |
+ return simple_xattr_set(&info->xattrs, name, value, size, flags); | |
} | |
static int shmem_removexattr(struct dentry *dentry, const char *name) | |
{ | |
+ struct shmem_inode_info *info = SHMEM_I(dentry->d_inode); | |
int err; | |
/* | |
@@ -1941,45 +1840,13 @@ static int shmem_removexattr(struct dentry *dentry, const char *name) | |
if (err) | |
return err; | |
- return shmem_xattr_set(dentry->d_inode, name, NULL, 0, XATTR_REPLACE); | |
-} | |
- | |
-static bool xattr_is_trusted(const char *name) | |
-{ | |
- return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN); | |
+ return simple_xattr_remove(&info->xattrs, name); | |
} | |
static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size) | |
{ | |
- bool trusted = capable(CAP_SYS_ADMIN); | |
- struct shmem_xattr *xattr; | |
- struct shmem_inode_info *info; | |
- size_t used = 0; | |
- | |
- info = SHMEM_I(dentry->d_inode); | |
- | |
- spin_lock(&info->lock); | |
- list_for_each_entry(xattr, &info->xattr_list, list) { | |
- size_t len; | |
- | |
- /* skip "trusted." attributes for unprivileged callers */ | |
- if (!trusted && xattr_is_trusted(xattr->name)) | |
- continue; | |
- | |
- len = strlen(xattr->name) + 1; | |
- used += len; | |
- if (buffer) { | |
- if (size < used) { | |
- used = -ERANGE; | |
- break; | |
- } | |
- memcpy(buffer, xattr->name, len); | |
- buffer += len; | |
- } | |
- } | |
- spin_unlock(&info->lock); | |
- | |
- return used; | |
+ struct shmem_inode_info *info = SHMEM_I(dentry->d_inode); | |
+ return simple_xattr_list(&info->xattrs, buffer, size); | |
} | |
#endif /* CONFIG_TMPFS_XATTR */ | |
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c | |
index ba6900f..b2a5986 100644 | |
--- a/net/core/netprio_cgroup.c | |
+++ b/net/core/netprio_cgroup.c | |
@@ -23,21 +23,6 @@ | |
#include <net/sock.h> | |
#include <net/netprio_cgroup.h> | |
-static struct cgroup_subsys_state *cgrp_create(struct cgroup *cgrp); | |
-static void cgrp_destroy(struct cgroup *cgrp); | |
-static int cgrp_populate(struct cgroup_subsys *ss, struct cgroup *cgrp); | |
- | |
-struct cgroup_subsys net_prio_subsys = { | |
- .name = "net_prio", | |
- .create = cgrp_create, | |
- .destroy = cgrp_destroy, | |
- .populate = cgrp_populate, | |
-#ifdef CONFIG_NETPRIO_CGROUP | |
- .subsys_id = net_prio_subsys_id, | |
-#endif | |
- .module = THIS_MODULE | |
-}; | |
- | |
#define PRIOIDX_SZ 128 | |
static unsigned long prioidx_map[PRIOIDX_SZ]; | |
@@ -257,12 +242,19 @@ static struct cftype ss_files[] = { | |
.read_map = read_priomap, | |
.write_string = write_priomap, | |
}, | |
+ { } /* terminate */ | |
}; | |
-static int cgrp_populate(struct cgroup_subsys *ss, struct cgroup *cgrp) | |
-{ | |
- return cgroup_add_files(cgrp, ss, ss_files, ARRAY_SIZE(ss_files)); | |
-} | |
+struct cgroup_subsys net_prio_subsys = { | |
+ .name = "net_prio", | |
+ .create = cgrp_create, | |
+ .destroy = cgrp_destroy, | |
+#ifdef CONFIG_NETPRIO_CGROUP | |
+ .subsys_id = net_prio_subsys_id, | |
+#endif | |
+ .base_cftypes = ss_files, | |
+ .module = THIS_MODULE | |
+}; | |
static int netprio_device_event(struct notifier_block *unused, | |
unsigned long event, void *ptr) | |
diff --git a/net/core/sock.c b/net/core/sock.c | |
index 832cf04..f409f8d 100644 | |
--- a/net/core/sock.c | |
+++ b/net/core/sock.c | |
@@ -140,7 +140,7 @@ static DEFINE_MUTEX(proto_list_mutex); | |
static LIST_HEAD(proto_list); | |
#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM | |
-int mem_cgroup_sockets_init(struct cgroup *cgrp, struct cgroup_subsys *ss) | |
+int mem_cgroup_sockets_init(struct mem_cgroup *memcg, struct cgroup_subsys *ss) | |
{ | |
struct proto *proto; | |
int ret = 0; | |
@@ -148,7 +148,7 @@ int mem_cgroup_sockets_init(struct cgroup *cgrp, struct cgroup_subsys *ss) | |
mutex_lock(&proto_list_mutex); | |
list_for_each_entry(proto, &proto_list, node) { | |
if (proto->init_cgroup) { | |
- ret = proto->init_cgroup(cgrp, ss); | |
+ ret = proto->init_cgroup(memcg, ss); | |
if (ret) | |
goto out; | |
} | |
@@ -159,19 +159,19 @@ int mem_cgroup_sockets_init(struct cgroup *cgrp, struct cgroup_subsys *ss) | |
out: | |
list_for_each_entry_continue_reverse(proto, &proto_list, node) | |
if (proto->destroy_cgroup) | |
- proto->destroy_cgroup(cgrp); | |
+ proto->destroy_cgroup(memcg); | |
mutex_unlock(&proto_list_mutex); | |
return ret; | |
} | |
-void mem_cgroup_sockets_destroy(struct cgroup *cgrp) | |
+void mem_cgroup_sockets_destroy(struct mem_cgroup *memcg) | |
{ | |
struct proto *proto; | |
mutex_lock(&proto_list_mutex); | |
list_for_each_entry_reverse(proto, &proto_list, node) | |
if (proto->destroy_cgroup) | |
- proto->destroy_cgroup(cgrp); | |
+ proto->destroy_cgroup(memcg); | |
mutex_unlock(&proto_list_mutex); | |
} | |
#endif | |
diff --git a/net/ipv4/tcp_memcontrol.c b/net/ipv4/tcp_memcontrol.c | |
index e795272..1517037 100644 | |
--- a/net/ipv4/tcp_memcontrol.c | |
+++ b/net/ipv4/tcp_memcontrol.c | |
@@ -6,37 +6,6 @@ | |
#include <linux/memcontrol.h> | |
#include <linux/module.h> | |
-static u64 tcp_cgroup_read(struct cgroup *cont, struct cftype *cft); | |
-static int tcp_cgroup_write(struct cgroup *cont, struct cftype *cft, | |
- const char *buffer); | |
-static int tcp_cgroup_reset(struct cgroup *cont, unsigned int event); | |
- | |
-static struct cftype tcp_files[] = { | |
- { | |
- .name = "kmem.tcp.limit_in_bytes", | |
- .write_string = tcp_cgroup_write, | |
- .read_u64 = tcp_cgroup_read, | |
- .private = RES_LIMIT, | |
- }, | |
- { | |
- .name = "kmem.tcp.usage_in_bytes", | |
- .read_u64 = tcp_cgroup_read, | |
- .private = RES_USAGE, | |
- }, | |
- { | |
- .name = "kmem.tcp.failcnt", | |
- .private = RES_FAILCNT, | |
- .trigger = tcp_cgroup_reset, | |
- .read_u64 = tcp_cgroup_read, | |
- }, | |
- { | |
- .name = "kmem.tcp.max_usage_in_bytes", | |
- .private = RES_MAX_USAGE, | |
- .trigger = tcp_cgroup_reset, | |
- .read_u64 = tcp_cgroup_read, | |
- }, | |
-}; | |
- | |
static inline struct tcp_memcontrol *tcp_from_cgproto(struct cg_proto *cg_proto) | |
{ | |
return container_of(cg_proto, struct tcp_memcontrol, cg_proto); | |
@@ -49,7 +18,7 @@ static void memcg_tcp_enter_memory_pressure(struct sock *sk) | |
} | |
EXPORT_SYMBOL(memcg_tcp_enter_memory_pressure); | |
-int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss) | |
+int tcp_init_cgroup(struct mem_cgroup *memcg, struct cgroup_subsys *ss) | |
{ | |
/* | |
* The root cgroup does not use res_counters, but rather, | |
@@ -59,13 +28,12 @@ int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss) | |
struct res_counter *res_parent = NULL; | |
struct cg_proto *cg_proto, *parent_cg; | |
struct tcp_memcontrol *tcp; | |
- struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp); | |
struct mem_cgroup *parent = parent_mem_cgroup(memcg); | |
struct net *net = current->nsproxy->net_ns; | |
cg_proto = tcp_prot.proto_cgroup(memcg); | |
if (!cg_proto) | |
- goto create_files; | |
+ return 0; | |
tcp = tcp_from_cgproto(cg_proto); | |
@@ -88,15 +56,12 @@ int tcp_init_cgroup(struct cgroup *cgrp, struct cgroup_subsys *ss) | |
cg_proto->sockets_allocated = &tcp->tcp_sockets_allocated; | |
cg_proto->memcg = memcg; | |
-create_files: | |
- return cgroup_add_files(cgrp, ss, tcp_files, | |
- ARRAY_SIZE(tcp_files)); | |
+ return 0; | |
} | |
EXPORT_SYMBOL(tcp_init_cgroup); | |
-void tcp_destroy_cgroup(struct cgroup *cgrp) | |
+void tcp_destroy_cgroup(struct mem_cgroup *memcg) | |
{ | |
- struct mem_cgroup *memcg = mem_cgroup_from_cont(cgrp); | |
struct cg_proto *cg_proto; | |
struct tcp_memcontrol *tcp; | |
u64 val; | |
@@ -270,3 +235,37 @@ void tcp_prot_mem(struct mem_cgroup *memcg, long val, int idx) | |
tcp->tcp_prot_mem[idx] = val; | |
} | |
+ | |
+static struct cftype tcp_files[] = { | |
+ { | |
+ .name = "kmem.tcp.limit_in_bytes", | |
+ .write_string = tcp_cgroup_write, | |
+ .read_u64 = tcp_cgroup_read, | |
+ .private = RES_LIMIT, | |
+ }, | |
+ { | |
+ .name = "kmem.tcp.usage_in_bytes", | |
+ .read_u64 = tcp_cgroup_read, | |
+ .private = RES_USAGE, | |
+ }, | |
+ { | |
+ .name = "kmem.tcp.failcnt", | |
+ .private = RES_FAILCNT, | |
+ .trigger = tcp_cgroup_reset, | |
+ .read_u64 = tcp_cgroup_read, | |
+ }, | |
+ { | |
+ .name = "kmem.tcp.max_usage_in_bytes", | |
+ .private = RES_MAX_USAGE, | |
+ .trigger = tcp_cgroup_reset, | |
+ .read_u64 = tcp_cgroup_read, | |
+ }, | |
+ { } /* terminate */ | |
+}; | |
+ | |
+static int __init tcp_memcontrol_init(void) | |
+{ | |
+ WARN_ON(cgroup_add_cftypes(&mem_cgroup_subsys, tcp_files)); | |
+ return 0; | |
+} | |
+__initcall(tcp_memcontrol_init); | |
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c | |
index 1afaa28..7743ea8 100644 | |
--- a/net/sched/cls_cgroup.c | |
+++ b/net/sched/cls_cgroup.c | |
@@ -22,22 +22,6 @@ | |
#include <net/sock.h> | |
#include <net/cls_cgroup.h> | |
-static struct cgroup_subsys_state *cgrp_create(struct cgroup *cgrp); | |
-static void cgrp_destroy(struct cgroup *cgrp); | |
-static int cgrp_populate(struct cgroup_subsys *ss, struct cgroup *cgrp); | |
- | |
-struct cgroup_subsys net_cls_subsys = { | |
- .name = "net_cls", | |
- .create = cgrp_create, | |
- .destroy = cgrp_destroy, | |
- .populate = cgrp_populate, | |
-#ifdef CONFIG_NET_CLS_CGROUP | |
- .subsys_id = net_cls_subsys_id, | |
-#endif | |
- .module = THIS_MODULE, | |
-}; | |
- | |
- | |
static inline struct cgroup_cls_state *cgrp_cls_state(struct cgroup *cgrp) | |
{ | |
return container_of(cgroup_subsys_state(cgrp, net_cls_subsys_id), | |
@@ -86,12 +70,19 @@ static struct cftype ss_files[] = { | |
.read_u64 = read_classid, | |
.write_u64 = write_classid, | |
}, | |
+ { } /* terminate */ | |
}; | |
-static int cgrp_populate(struct cgroup_subsys *ss, struct cgroup *cgrp) | |
-{ | |
- return cgroup_add_files(cgrp, ss, ss_files, ARRAY_SIZE(ss_files)); | |
-} | |
+struct cgroup_subsys net_cls_subsys = { | |
+ .name = "net_cls", | |
+ .create = cgrp_create, | |
+ .destroy = cgrp_destroy, | |
+#ifdef CONFIG_NET_CLS_CGROUP | |
+ .subsys_id = net_cls_subsys_id, | |
+#endif | |
+ .base_cftypes = ss_files, | |
+ .module = THIS_MODULE, | |
+}; | |
struct cls_cgroup_head { | |
u32 handle; | |
diff --git a/security/device_cgroup.c b/security/device_cgroup.c | |
index c43a332..442204c 100644 | |
--- a/security/device_cgroup.c | |
+++ b/security/device_cgroup.c | |
@@ -447,22 +447,16 @@ static struct cftype dev_cgroup_files[] = { | |
.read_seq_string = devcgroup_seq_read, | |
.private = DEVCG_LIST, | |
}, | |
+ { } /* terminate */ | |
}; | |
-static int devcgroup_populate(struct cgroup_subsys *ss, | |
- struct cgroup *cgroup) | |
-{ | |
- return cgroup_add_files(cgroup, ss, dev_cgroup_files, | |
- ARRAY_SIZE(dev_cgroup_files)); | |
-} | |
- | |
struct cgroup_subsys devices_subsys = { | |
.name = "devices", | |
.can_attach = devcgroup_can_attach, | |
.create = devcgroup_create, | |
.destroy = devcgroup_destroy, | |
- .populate = devcgroup_populate, | |
.subsys_id = devices_subsys_id, | |
+ .base_cftypes = dev_cgroup_files, | |
}; | |
int __devcgroup_inode_permission(struct inode *inode, int mask) | |
-- | |
2.0.0 | |
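The commit message above motivates the patch with systemd's use case: pinning per-service metadata (main PID, icons, no-kill markers) directly on the cgroup directory via extended attributes. The sketch below shows that userspace interface with the standard xattr syscalls on a plain file rather than cgroupfs, since cgroupfs only accepts the `trusted.` namespace, which requires CAP_SYS_ADMIN. The attribute name `user.service.main_pid` is hypothetical, chosen to echo the systemd example quoted in the commit message.

```python
# Sketch of the xattr interface the patch exposes on cgroupfs, shown on a
# regular file. Assumption: the filesystem supports user.* xattrs (ext4 does
# by default; this very patch series is what added xattr support to shmem).
import os
import tempfile

def tag_and_read(path, name, value):
    """Attach an xattr and read it back, mirroring how a service manager
    could pin per-service metadata on a cgroup directory."""
    try:
        os.setxattr(path, name, value)
        return os.getxattr(path, name)
    except OSError:
        # Filesystem without user.* xattr support, or insufficient privilege.
        return None

with tempfile.NamedTemporaryFile() as f:
    got = tag_and_read(f.name, "user.service.main_pid", b"4242")

# Either the round-tripped value, or None where xattrs are unavailable.
print(got)
```

On a kernel with this patch applied, the same calls against a cgroup directory (with the `trusted.` prefix and CAP_SYS_ADMIN) let state survive a restart of the managing daemon, which is exactly the recovery scenario the quoted discussion describes.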