The ovl_get_verity_xattr() function was never added, only its declaration.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Fixes: 184996e92e86 ("ovl: Validate verity xattr when resolving lowerdata")
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
In theory overlayfs could support upper layer directly referring to a data
layer, but there's no current use case for this.
Originally, when data-only layers were introduced, this wasn't allowed,
only introduced by the "datadir+" feature, but without actually handling
this case, resulting in an Oops.
Fix by disallowing datadir without lowerdir.
Reported-by: Giuseppe Scrivano <gscrivan@redhat.com>
Fixes: 24e16e385f22 ("ovl: add support for appending lowerdirs one by one")
Cc: <stable@vger.kernel.org> # v6.7
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Alexander Larsson <alexl@redhat.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rNwAKCRCRxhvAZXjc
onBJAP9Z8Ywmlb5KQ1E3HvDmkwyY6yOSyZ9/CmbzrkCJ8ywYkQD/d9/xt0EP/O/q
N8YtzXArHWt7u0YbcVpy9WK3F72BdwU=
=VJgY
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs async dir updates from Christian Brauner:
"This contains cleanups that fell out of the work from async directory
handling:
- Change kern_path_locked() and user_path_locked_at() to never return
a negative dentry. This simplifies the usability of these helpers
in various places
- Drop d_exact_alias() from the remaining place in NFS where it is
still used. This also allows us to drop the d_exact_alias() helper
completely
- Drop an unnecessary call to fh_update() from nfsd_create_locked()
- Change i_op->mkdir() to return a struct dentry
Change vfs_mkdir() to return a dentry provided by the filesystems
which is hashed and positive. This allows us to reduce the number
of cases where the resulting dentry is not positive to very few
cases. The code in these places becomes simpler and easier to
understand.
- Repack DENTRY_* and LOOKUP_* flags"
* tag 'vfs-6.15-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
doc: fix inline emphasis warning
VFS: Change vfs_mkdir() to return the dentry.
nfs: change mkdir inode_operation to return alternate dentry if needed.
fuse: return correct dentry for ->mkdir
ceph: return the correct dentry on mkdir
hostfs: store inode in dentry after mkdir if possible.
Change inode_operations.mkdir to return struct dentry *
nfsd: drop fh_update() from S_IFDIR branch of nfsd_create_locked()
nfs/vfs: discard d_exact_alias()
VFS: add common error checks to lookup_one_qstr_excl()
VFS: change kern_path_locked() and user_path_locked_at() to never return negative dentry
VFS: repack LOOKUP_ bit flags.
VFS: repack DENTRY_ flags.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZ90rUAAKCRCRxhvAZXjc
opI3AP9ws4S/JXOjxNKoTYmNM2nZ8+r1v8tUxbLIiqdvzx9PygD/V1ZjXtn6lwZr
OK8d5Y8UnlPZTlBF8D61op3AjnXYzws=
=KV4p
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs overlayfs updates from Christian Brauner:
"Currently overlayfs uses the mounter's credentials for its
override_creds() calls. That provides a consistent permission model.
This patches allows a caller to instruct overlayfs to use its
credentials instead. The caller must be located in the same user
namespace hierarchy as the user namespace the overlayfs instance will
be mounted in. This provides a consistent and simple security model.
With this it is possible to e.g., mount an overlayfs instance where
the mounter must have CAP_SYS_ADMIN but the credentials used for
override_creds() have dropped CAP_SYS_ADMIN. It also allows the usage
of custom fs{g,u}id different from the callers and other tweaks"
* tag 'vfs-6.15-rc1.overlayfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests/ovl: add third selftest for "override_creds"
selftests/ovl: add second selftest for "override_creds"
selftests/filesystems: add utils.{c,h}
selftests/ovl: add first selftest for "override_creds"
ovl: allow to specify override credentials
vfs_mkdir() does not guarantee to leave the child dentry hashed or make
it positive on success, and in many such cases the filesystem had to use
a different dentry which it can now return.
This patch changes vfs_mkdir() to return the dentry provided by the
filesystems which is hashed and positive when provided. This reduces
the number of cases where the resulting dentry is not positive to a
handful which don't deserve extra efforts.
The only callers of vfs_mkdir() which are interested in the resulting
inode are in-kernel filesystem clients: cachefiles, nfsd, smb/server.
The only filesystems that don't reliably provide the inode are:
- kernfs, tracefs which these clients are unlikely to be interested in
- cifs in some configurations would need to do a lookup to find the
created inode, but doesn't. cifs cannot be exported via NFS, is
unlikely to be used by cachefiles, and smb/server only has a soft
requirement for the inode, so this is unlikely to be a problem in
practice.
- hostfs, nfs, cifs may need to do a lookup (rarely for NFS) and it is
possible for a race to make that lookup fail. Actual failure
is unlikely and providing callers handle negative dentries graceful
they will fail-safe.
So this patch removes the lookup code in nfsd and smb/server and adjusts
them to fail safe if a negative dentry is provided:
- cache-files already fails safe by restarting the task from the
top - it still does with this change, though it no longer calls
cachefiles_put_directory() as that will crash if the dentry is
negative.
- nfsd reports "Server-fault" which it what it used to do if the lookup
failed. This will never happen on any file-systems that it can actually
export, so this is of no consequence. I removed the fh_update()
call as that is not needed and out-of-place. A subsequent
nfsd_create_setattr() call will call fh_update() when needed.
- smb/server only wants the inode to call ksmbd_smb_inherit_owner()
which updates ->i_uid (without calling notify_change() or similar)
which can be safely skipping on cifs (I hope).
If a different dentry is returned, the first one is put. If necessary
the fact that it is new can be determined by comparing pointers. A new
dentry will certainly have a new pointer (as the old is put after the
new is obtained).
Similarly if an error is returned (via ERR_PTR()) the original dentry is
put.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-7-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
Some filesystems, such as NFS, cifs, ceph, and fuse, do not have
complete control of sequencing on the actual filesystem (e.g. on a
different server) and may find that the inode created for a mkdir
request already exists in the icache and dcache by the time the mkdir
request returns. For example, if the filesystem is mounted twice the
directory could be visible on the other mount before it is on the
original mount, and a pair of name_to_handle_at(), open_by_handle_at()
calls could instantiate the directory inode with an IS_ROOT() dentry
before the first mkdir returns.
This means that the dentry passed to ->mkdir() may not be the one that
is associated with the inode after the ->mkdir() completes. Some
callers need to interact with the inode after the ->mkdir completes and
they currently need to perform a lookup in the (rare) case that the
dentry is no longer hashed.
This lookup-after-mkdir requires that the directory remains locked to
avoid races. Planned future patches to lock the dentry rather than the
directory will mean that this lookup cannot be performed atomically with
the mkdir.
To remove this barrier, this patch changes ->mkdir to return the
resulting dentry if it is different from the one passed in.
Possible returns are:
NULL - the directory was created and no other dentry was used
ERR_PTR() - an error occurred
non-NULL - this other dentry was spliced in
This patch only changes file-systems to return "ERR_PTR(err)" instead of
"err" or equivalent transformations. Subsequent patches will make
further changes to some file-systems to return a correct dentry.
Not all filesystems reliably result in a positive hashed dentry:
- NFS, cifs, hostfs will sometimes need to perform a lookup of
the name to get inode information. Races could result in this
returning something different. Note that this lookup is
non-atomic which is what we are trying to avoid. Placing the
lookup in filesystem code means it only happens when the filesystem
has no other option.
- kernfs and tracefs leave the dentry negative and the ->revalidate
operation ensures that lookup will be called to correctly populate
the dentry. This could be fixed but I don't think it is important
to any of the users of vfs_mkdir() which look at the dentry.
The recommendation to use
d_drop();d_splice_alias()
is ugly but fits with current practice. A planned future patch will
change this.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/20250227013949.536172-2-neilb@suse.de
Signed-off-by: Christian Brauner <brauner@kernel.org>
The issue was caused by dput(upper) being called before
ovl_dentry_update_reval(), while upper->d_flags was still
accessed in ovl_dentry_remote().
Move dput(upper) after its last use to prevent use-after-free.
BUG: KASAN: slab-use-after-free in ovl_dentry_remote fs/overlayfs/util.c:162 [inline]
BUG: KASAN: slab-use-after-free in ovl_dentry_update_reval+0xd2/0xf0 fs/overlayfs/util.c:167
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114
print_address_description mm/kasan/report.c:377 [inline]
print_report+0xc3/0x620 mm/kasan/report.c:488
kasan_report+0xd9/0x110 mm/kasan/report.c:601
ovl_dentry_remote fs/overlayfs/util.c:162 [inline]
ovl_dentry_update_reval+0xd2/0xf0 fs/overlayfs/util.c:167
ovl_link_up fs/overlayfs/copy_up.c:610 [inline]
ovl_copy_up_one+0x2105/0x3490 fs/overlayfs/copy_up.c:1170
ovl_copy_up_flags+0x18d/0x200 fs/overlayfs/copy_up.c:1223
ovl_rename+0x39e/0x18c0 fs/overlayfs/dir.c:1136
vfs_rename+0xf84/0x20a0 fs/namei.c:4893
...
</TASK>
Fixes: b07d5cc93e1b ("ovl: update of dentry revalidate flags after copy up")
Reported-by: syzbot+316db8a1191938280eb6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=316db8a1191938280eb6
Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
Link: https://lore.kernel.org/r/20250214215148.761147-1-kovalev@altlinux.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Currently overlayfs uses the mounter's credentials for it's
override_creds() calls. That provides a consistent permission model.
This patches allows a caller to instruct overlayfs to use its
credentials instead. The caller must be located in the same user
namespace hierarchy as the user namespace the overlayfs instance will be
mounted in. This provides a consistent and simple security model.
With this it is possible to e.g., mount an overlayfs instance where the
mounter must have CAP_SYS_ADMIN but the credentials used for
override_creds() have dropped CAP_SYS_ADMIN. It also allows the usage of
custom fs{g,u}id different from the callers and other tweaks.
Link: https://lore.kernel.org/r/20250219-work-overlayfs-v3-1-46af55e4ceda@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZ5yJdgAKCRBZ7Krx/gZQ
69W4AQDwgxceiQ6icx3rFhCWQigne4jdMO84kd8tNaa+xHGe1AD/WnkeChc5DqjQ
wZWZxAAzml9SS01IcSiHWaF5fgrjlA0=
=rXOq
-----END PGP SIGNATURE-----
Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs cleanups from Al Viro:
"Two unrelated patches - one is a removal of long-obsolete include in
overlayfs (it used to need fs/internal.h, but the extern it wanted has
been moved back to include/linux/namei.h) and another introduces
convenience helper constructing struct qstr by a NUL-terminated
string"
* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
add a string-to-qstr constructor
fs/overlayfs/namei.c: get rid of include ../internal.h
Most of the filesystem methods where we care about dentry name
and parent have their stability guaranteed by the callers;
->d_revalidate() is the major exception.
It's easy enough for callers to supply stable values for
expected name and expected parent of the dentry being
validated. That kills quite a bit of boilerplate in
->d_revalidate() instances, along with a bunch of races
where they used to access ->d_name without sufficient
precautions.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZ5gkoQAKCRBZ7Krx/gZQ
6w9FAP4nyxNNWMjE1TwuWR/DNDMYYuw/qn/miZ88B5BUM8hzqgD/W2SjRvcbSaIm
xSIYpbtKgtqNU34P1PU+dBvL8Utz2AE=
=TWY8
-----END PGP SIGNATURE-----
Merge tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs d_revalidate updates from Al Viro:
"Provide stable parent and name to ->d_revalidate() instances
Most of the filesystem methods where we care about dentry name and
parent have their stability guaranteed by the callers;
->d_revalidate() is the major exception.
It's easy enough for callers to supply stable values for expected name
and expected parent of the dentry being validated. That kills quite a
bit of boilerplate in ->d_revalidate() instances, along with a bunch
of races where they used to access ->d_name without sufficient
precautions"
* tag 'pull-revalidate' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
9p: fix ->rename_sem exclusion
orangefs_d_revalidate(): use stable parent inode and name passed by caller
ocfs2_dentry_revalidate(): use stable parent inode and name passed by caller
nfs: fix ->d_revalidate() UAF on ->d_name accesses
nfs{,4}_lookup_validate(): use stable parent inode passed by caller
gfs2_drevalidate(): use stable parent inode and name passed by caller
fuse_dentry_revalidate(): use stable parent inode and name passed by caller
vfat_revalidate{,_ci}(): use stable parent inode passed by caller
exfat_d_revalidate(): use stable parent inode passed by caller
fscrypt_d_revalidate(): use stable parent inode passed by caller
ceph_d_revalidate(): propagate stable name down into request encoding
ceph_d_revalidate(): use stable parent inode passed by caller
afs_d_revalidate(): use stable name and parent inode passed by caller
Pass parent directory inode and expected name to ->d_revalidate()
generic_ci_d_compare(): use shortname_storage
ext4 fast_commit: make use of name_snapshot primitives
dissolve external_name.u into separate members
make take_dentry_name_snapshot() lockless
dcache: back inline names with a struct-wrapped array of unsigned long
make sure that DNAME_INLINE_LEN is a multiple of word size
->d_revalidate() often needs to access dentry parent and name; that has
to be done carefully, since the locking environment varies from caller
to caller. We are not guaranteed that dentry in question will not be
moved right under us - not unless the filesystem is such that nothing
on it ever gets renamed.
It can be dealt with, but that results in boilerplate code that isn't
even needed - the callers normally have just found the dentry via dcache
lookup and want to verify that it's in the right place; they already
have the values of ->d_parent and ->d_name stable. There is a couple
of exceptions (overlayfs and, to less extent, ecryptfs), but for the
majority of calls that song and dance is not needed at all.
It's easier to make ecryptfs and overlayfs find and pass those values if
there's a ->d_revalidate() instance to be called, rather than doing that
in the instances.
This commit only changes the calling conventions; making use of supplied
values is left to followups.
NOTE: some instances need more than just the parent - things like CIFS
may need to build an entire path from filesystem root, so they need
more precautions than the usual boilerplate. This series doesn't
do anything to that need - these filesystems have to keep their locking
mechanisms (rename_lock loops, use of dentry_path_raw(), private rwsem
a-la v9fs).
One thing to keep in mind when using name is that name->name will normally
point into the pathname being resolved; the filename in question occupies
name->len bytes starting at name->name, and there is NUL somewhere after it,
but it the next byte might very well be '/' rather than '\0'. Do not
ignore name->len.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Gabriel Krisman Bertazi <gabriel@krisman.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Dmitry Safonov reported that a WARN_ON() assertion can be trigered by
userspace when calling inotify_show_fdinfo() for an overlayfs watched
inode, whose dentry aliases were discarded with drop_caches.
The WARN_ON() assertion in inotify_show_fdinfo() was removed, because
it is possible for encoding file handle to fail for other reason, but
the impact of failing to encode an overlayfs file handle goes beyond
this assertion.
As shown in the LTP test case mentioned in the link below, failure to
encode an overlayfs file handle from a non-aliased inode also leads to
failure to report an fid with FAN_DELETE_SELF fanotify events.
As Dmitry notes in his analyzis of the problem, ovl_encode_fh() fails
if it cannot find an alias for the inode, but this failure can be fixed.
ovl_encode_fh() seldom uses the alias and in the case of non-decodable
file handles, as is often the case with fanotify fid info,
ovl_encode_fh() never needs to use the alias to encode a file handle.
Defer finding an alias until it is actually needed so ovl_encode_fh()
will not fail in the common case of FAN_DELETE_SELF fanotify events.
Fixes: 16aac5ad1fa9 ("ovl: support encoding non-decodable file handles")
Reported-by: Dmitry Safonov <dima@arista.com>
Closes: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiie81voLZZi2zXS1BziXZCM24nXqPAxbu8kxXCUWdwOg@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20250105162404.357058-3-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Encoding file handles is usually performed by a filesystem >encode_fh()
method that may fail for various reasons.
The legacy users of exportfs_encode_fh(), namely, nfsd and
name_to_handle_at(2) syscall are ready to cope with the possibility
of failure to encode a file handle.
There are a few other users of exportfs_encode_{fh,fid}() that
currently have a WARN_ON() assertion when ->encode_fh() fails.
Relax those assertions because they are wrong.
The second linked bug report states commit 16aac5ad1fa9 ("ovl: support
encoding non-decodable file handles") in v6.6 as the regressing commit,
but this is not accurate.
The aforementioned commit only increases the chances of the assertion
and allows triggering the assertion with the reproducer using overlayfs,
inotify and drop_caches.
Triggering this assertion was always possible with other filesystems and
other reasons of ->encode_fh() failures and more particularly, it was
also possible with the exact same reproducer using overlayfs that is
mounted with options index=on,nfs_export=on also on kernels < v6.6.
Therefore, I am not listing the aforementioned commit as a Fixes commit.
Backport hint: this patch will have a trivial conflict applying to
v6.6.y, and other trivial conflicts applying to stable kernels < v6.6.
Reported-by: syzbot+ec07f6f5ce62b858579f@syzkaller.appspotmail.com
Tested-by: syzbot+ec07f6f5ce62b858579f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@google.com/
Reported-by: Dmitry Safonov <dima@arista.com>
Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3F+Juqy_o6oP8uw@mail.gmail.com/
Cc: stable@vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
security_inode_copy_up() allocates a set of new credentials and has
taken a reference count.
Link: https://lore.kernel.org/r/20241125-work-cred-v2-18-68b9d38bb5b2@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Convert all calls to revert_creds() over to explicitly dropping
reference counts in preparation for converting revert_creds() to
revert_creds_light() semantics.
Link: https://lore.kernel.org/r/20241125-work-cred-v2-3-68b9d38bb5b2@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Convert all callers from override_creds() to
override_creds_light(get_new_cred()) in preparation of making
override_creds() not take a separate reference at all.
Link: https://lore.kernel.org/r/20241125-work-cred-v2-1-68b9d38bb5b2@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmc90jsACgkQEVvVyTe/
1Wol0A//RhzFCG8geR7Grbptp40CUm9kVISvkr50mPBdvVk3jX9WvH9m/10qapGP
tcGHSdHt+q5qabqutKLmQRiFbwpGEaBMaFOe7JH8na8xWvmSa3p7sJC5kLByS3rm
D2F+cVx3Di7MTscz/Ma724bHdHOUO5RbDuMIcjp7uXRvaNWJ0uZg5xWlBKsNa3h8
DbNSYi5ICihLYpUxI9NglHZ6iqcS2jHsUHSAw52/GJ2Zon1LAAmKoSn6s7hZ27ZJ
f8Rv5fFuYmkRV7nYo/gjLY1gt7KXZFcfUtMT05yd7zcnqDayKEFXEiwI/Bz5fXZL
HmZpOP4RV2M9B8HzhReVR/yG8gZaaUezX+aVQp7plZSc73GhMdFFd1bUyjgJ4Lzf
C2BlBMWafc/Zc7a7r0+X5577i34nED8lGuVMEdYMtjSjstpzIP+1Wlzn2cGi4+5K
VAb+kEravjP9ck7YrmbruRYfVhDaE37BDs4XML4S8gzcZgdaTcEMyGw1ifEhvPjA
vLbRs24a5VO7/cKlks7PWS6i9uExaz7g4re0jUPwUuc+nS+Hv+y8kLSPqLS4CtNY
MxhS2IhKK5gp1Z9XGpLsak+ancTYLSV0OJ15qsAChpqoqSG5Xd9Lt4CWACnF33Ea
ny8z5QpOAHWVb97k6xaEvu/r0dl+PHdG7vfb0MNhXaajNF8SKiU=
=pgoX
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs updates from Amir Goldstein:
- Fix a syzbot reported NULL pointer deref with bfs lower layers
- Fix a copy up failure of large file from lower fuse fs
- Followup cleanup of backing_file API from Miklos
- Introduction and use of revert/override_creds_light() helpers, that
were suggested by Christian as a mitigation to cache line bouncing
and false sharing of fields in overlayfs creator_cred long lived
struct cred copy.
- Store up to two backing file references (upper and lower) in an
ovl_file container instead of storing a single backing file in
file->private_data.
This is used to avoid the practice of opening a short lived backing
file for the duration of some file operations and to avoid the
specialized use of FDPUT_FPUT in such occasions, that was getting in
the way of Al's fd_file() conversions.
* tag 'ovl-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: Filter invalid inodes with missing lookup function
ovl: convert ovl_real_fdget() callers to ovl_real_file()
ovl: convert ovl_real_fdget_path() callers to ovl_real_file_path()
ovl: store upper real file in ovl_file struct
ovl: allocate a container struct ovl_file for ovl private context
ovl: do not open non-data lower file for fsync
ovl: Optimize override/revert creds
ovl: pass an explicit reference of creators creds to callers
ovl: use wrapper ovl_revert_creds()
fs/backing-file: Convert to revert/override_creds_light()
cred: Add a light version of override/revert_creds()
backing-file: clean up the API
ovl: properly handle large files in ovl_security_fileattr
and friends
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZzdpZQAKCRBZ7Krx/gZQ
6whMAQDhlGFV+nGRetwe4t60mVRpxIoc71GLC7b6V8FmyfTI5AEAkAigkJ8KCZDP
mfGsN/3PtzoxnIkIqdk7Y7q4/fowyAw=
=4DWZ
-----END PGP SIGNATURE-----
Merge tag 'pull-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull statx updates from Al Viro:
"Sanitize struct filename and lookup flags handling in statx and
friends"
* tag 'pull-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
libfs: kill empty_dir_getattr()
fs: Simplify getattr interface function checking AT_GETATTR_NOSEC flag
fs/stat.c: switch to CLASS(fd_raw)
kill getname_statx_lookup_flags()
io_statx_prep(): use getname_uflags()
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcXJQAKCRCRxhvAZXjc
opmaAQDuEpUXi3PSNwgXGRYAzeBpsonoNrBtzSRTlaza7KZCiAEAsFUSbQJNGdkN
TqC9KvI1EiX2hVvWmVBVttBNH6CY+wY=
=4le5
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.13.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull overlayfs updates from Christian Brauner:
"Make overlayfs support specifying layers through file descriptors.
Currently overlayfs only allows specifying layers through path names.
This is inconvenient for users that want to assemble an overlayfs
mount purely based on file descriptors:
This enables user to specify both:
fsconfig(fd_overlay, FSCONFIG_SET_FD, "upperdir+", NULL, fd_upper);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "workdir+", NULL, fd_work);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower1);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower2);
in addition to:
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "upperdir+", "/upper", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "workdir+", "/work", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower1", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower2", 0);
There's also a large set of new overlayfs selftests to test new
features and some older properties"
* tag 'vfs-6.13.ovl' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests: add test for specifying 500 lower layers
selftests: add overlayfs fd mounting selftests
selftests: use shared header
Documentation,ovl: document new file descriptor based layers
ovl: specify layers via file descriptors
fs: add helper to use mount option as path or fd
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZzcW4gAKCRCRxhvAZXjc
okF+AP9xTMb2SlnRPBOBd9yFcmVXmQi86TSCUPAEVb+wIldGYwD/RIOdvXYJlp9v
RgJkU1DC3ddkXtONNDY6gFaP+siIWA0=
=gMc7
-----END PGP SIGNATURE-----
Merge tag 'vfs-6.13.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs file updates from Christian Brauner:
"This contains changes the changes for files for this cycle:
- Introduce a new reference counting mechanism for files.
As atomic_inc_not_zero() is implemented with a try_cmpxchg() loop
it has O(N^2) behaviour under contention with N concurrent
operations and it is in a hot path in __fget_files_rcu().
The rcuref infrastructures remedies this problem by using an
unconditional increment relying on safe- and dead zones to make
this work and requiring rcu protection for the data structure in
question. This not just scales better it also introduces overflow
protection.
However, in contrast to generic rcuref, files require a memory
barrier and thus cannot rely on *_relaxed() atomic operations and
also require to be built on atomic_long_t as having massive amounts
of reference isn't unheard of even if it is just an attack.
This adds a file specific variant instead of making this a generic
library.
This has been tested by various people and it gives consistent
improvement up to 3-5% on workloads with loads of threads.
- Add a fastpath for find_next_zero_bit(). Skip 2-levels searching
via find_next_zero_bit() when there is a free slot in the word that
contains the next fd. This improves pts/blogbench-1.1.0 read by 8%
and write by 4% on Intel ICX 160.
- Conditionally clear full_fds_bits since it's very likely that a bit
in full_fds_bits has been cleared during __clear_open_fds(). This
improves pts/blogbench-1.1.0 read up to 13%, and write up to 5% on
Intel ICX 160.
- Get rid of all lookup_*_fdget_rcu() variants. They were used to
lookup files without taking a reference count. That became invalid
once files were switched to SLAB_TYPESAFE_BY_RCU and now we're
always taking a reference count. Switch to an already existing
helper and remove the legacy variants.
- Remove pointless includes of <linux/fdtable.h>.
- Avoid cmpxchg() in close_files() as nobody else has a reference to
the files_struct at that point.
- Move close_range() into fs/file.c and fold __close_range() into it.
- Cleanup calling conventions of alloc_fdtable() and expand_files().
- Merge __{set,clear}_close_on_exec() into one.
- Make __set_open_fd() set cloexec as well instead of doing it in two
separate steps"
* tag 'vfs-6.13.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
selftests: add file SLAB_TYPESAFE_BY_RCU recycling stressor
fs: port files to file_ref
fs: add file_ref
expand_files(): simplify calling conventions
make __set_open_fd() set cloexec state as well
fs: protect backing files with rcu
file.c: merge __{set,clear}_close_on_exec()
alloc_fdtable(): change calling conventions.
fs/file.c: add fast path in find_next_fd()
fs/file.c: conditionally clear full_fds
fs/file.c: remove sanity_check and add likely/unlikely in alloc_fd()
move close_range(2) into fs/file.c, fold __close_range() into it
close_files(): don't bother with xchg()
remove pointless includes of <linux/fdtable.h>
get rid of ...lookup...fdget_rcu() family
Stop using struct fd to return a real file from ovl_real_fdget(),
because we no longer return a temporary file object and the callers
always get a borrowed file reference.
Rename the helper to ovl_real_file(), return a borrowed reference of
the real file that is referenced from the overlayfs file or an error.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Stop using struct fd to return a real file from ovl_real_fdget_path(),
because we no longer return a temporary file object and the callers
always get a borrowed file reference.
Rename the helper to ovl_real_file_path(), return a borrowed reference
of the real file that is referenced from the overlayfs file or an error.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
When an overlayfs file is opened as lower and then the file is copied up,
every operation on the overlayfs open file will open a temporary backing
file to the upper dentry and close it at the end of the operation.
Store the upper real file along side the original (lower) real file in
ovl_file instead of opening a temporary upper file on every operation.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Instead of using ->private_data to point at realfile directly, so
that we can add more context per ovl open file.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
ovl_fsync() with !datasync opens a backing file from the top most dentry
in the stack, checks if this dentry is non-upper and skips the fsync.
In case of an overlay dentry stack with lower data and lower metadata
above it, but without an upper metadata above it, the backing file is
opened from the top most lower metadata dentry and never used.
Refactor the helper ovl_real_fdget_meta() into ovl_real_fdget_path() and
open code the checks for non-upper inode in ovl_fsync(), so in that case
we can avoid the unneeded backing file open.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Use override_creds_light() in ovl_override_creds() and
revert_creds_light() in ovl_revert_creds().
The _light() functions do not change the 'usage' of the credentials in
question, as they refer to the credentials associated with the
mounter, which have a longer lifetime.
In ovl_setup_cred_for_create(), do not need to modify the mounter
credentials (returned by override_creds_light()) 'usage' counter.
Add a warning to verify that we are indeed working with the mounter
credentials (stored in the superblock). Failure in this assumption
means that creds may leak.
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
ovl_setup_cred_for_create() decrements one refcount of new creds and
ovl_revert_creds() in callers decrements the last refcount.
In preparation to revert_creds_light() back to caller creds, pass an
explicit reference of the creators creds to the callers and drop the
refcount explicitly in the callers after ovl_revert_creds().
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Commit 8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface
function")' introduced the AT_GETATTR_NOSEC flag to ensure that the
call paths only call vfs_getattr_nosec if it is set instead of vfs_getattr.
Now, simplify the getattr interface functions of filesystems where the flag
AT_GETATTR_NOSEC is checked.
There is only a single caller of inode_operations getattr function and it
is located in fs/stat.c in vfs_getattr_nosec. The caller there is the only
one from which the AT_GETATTR_NOSEC flag is passed from.
Two filesystems are checking this flag in .getattr and the flag is always
passed to them unconditionally from only vfs_getattr_nosec:
- ecryptfs: Simplify by always calling vfs_getattr_nosec in
ecryptfs_getattr. From there the flag is passed to no other
function and this function is not called otherwise.
- overlayfs: Simplify by always calling vfs_getattr_nosec in
ovl_getattr. From there the flag is passed to no other
function and this function is not called otherwise.
The query_flags in vfs_getattr_nosec will mask-out AT_GETATTR_NOSEC from
any caller using AT_STATX_SYNC_TYPE as mask so that the flag is not
important inside this function. Also, since no filesystem is checking the
flag anymore, remove the flag entirely now, including the BUG_ON check that
never triggered.
The net change of the changes here combined with the original commit is
that ecryptfs and overlayfs do not call vfs_getattr but only
vfs_getattr_nosec.
Fixes: 8a924db2d7b5 ("fs: Pass AT_GETATTR_NOSEC flag to getattr interface function")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Closes: https://lore.kernel.org/linux-fsdevel/20241101011724.GN1350452@ZenIV/T/#u
Cc: Tyler Hicks <code@tyhicks.com>
Cc: ecryptfs@vger.kernel.org
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: linux-unionfs@vger.kernel.org
Cc: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Introduce ovl_revert_creds() wrapper of revert_creds() to
match callers of ovl_override_creds().
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
- Pass iocb to ctx->end_write() instead of file + pos
- Get rid of ctx->user_file, which is redundant most of the time
- Instead pass iocb to backing_file_splice_read and
backing_file_splice_write
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
dentry_open in ovl_security_fileattr fails for any file
larger than 2GB if open method of the underlying filesystem
calls generic_file_open (e.g. fusefs).
The issue can be reproduce using the following script:
(passthrough_ll is an example app from libfuse).
$ D=/opt/test/mnt
$ mkdir -p ${D}/{source,base,top/uppr,top/work,ovlfs}
$ dd if=/dev/zero of=${D}/source/zero.bin bs=1G count=2
$ passthrough_ll -o source=${D}/source ${D}/base
$ mount -t overlay overlay \
-olowerdir=${D}/base,upperdir=${D}/top/uppr,workdir=${D}/top/work \
${D}/ovlfs
$ chmod 0777 ${D}/mnt/ovlfs/zero.bin
Running this script results in "Value too large for defined data type"
error message from chmod.
Signed-off-by: Oleksandr Tymoshenko <ovt@google.com>
Fixes: 72db82115d2b ("ovl: copy up sync/noatime fileattr flags")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Currently overlayfs only allows specifying layers through path names.
This is inconvenient for users such as systemd that want to assemble an
overlayfs mount purely based on file descriptors.
This enables user to specify both:
fsconfig(fd_overlay, FSCONFIG_SET_FD, "upperdir+", NULL, fd_upper);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "workdir+", NULL, fd_work);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower1);
fsconfig(fd_overlay, FSCONFIG_SET_FD, "lowerdir+", NULL, fd_lower2);
in addition to:
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "upperdir+", "/upper", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "workdir+", "/work", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower1", 0);
fsconfig(fd_overlay, FSCONFIG_SET_STRING, "lowerdir+", "/lower2", 0);
Link: https://lore.kernel.org/r/20241014-work-overlayfs-v3-2-32b3fed1286e@kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
some of those used to be needed, some had been cargo-culted for
no reason...
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
ovl_open_realfile() is wrongly called twice after conversion to
new struct fd.
Fixes: 88a2f6468d01 ("struct fd: representation change")
Reported-by: syzbot+d9efec94dcbfa0de1c07@syzkaller.appspotmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZvDNmgAKCRBZ7Krx/gZQ
63zrAP9vI0rf55v27twiabe9LnI7aSx5ckoqXxFIFxyT3dOYpQD/bPmoApnWDD3d
592+iDgLsema/H/0/CqfqlaNtDNY8Q0=
=HUl5
-----END PGP SIGNATURE-----
Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull 'struct fd' updates from Al Viro:
"Just the 'struct fd' layout change, with conversion to accessor
helpers"
* tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
add struct fd constructors, get rid of __to_fd()
struct fd: representation change
introduce fd_file(), convert all accessors to it.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE9zuTYTs0RXF+Ke33EVvVyTe/1WoFAmbodTUACgkQEVvVyTe/
1Wp5Jw/+LN99lGoKXKZ7zB8gta4hMm2O+wd3//IdfTi6u59uff0QvVNbczC6SRlJ
sW6czIgjKpR1xt6Orp0Rof0VZK09PMqq3Iw7Tt3xEkJtjKxppfMCQHeoi03CH9kF
gOHPW+Ad4jlCk+TYFEuHk5lQIgVtJ11JDtGjr5tNa03MbrLIa0pFpTdz65bg9uTn
JdPqA+fiJ/mVoXlFQEZkQPeT8CgMfTbKtQDGfcmKzrNo0ZyQdW30u5lP3MGROptn
X/QX5n77UrE9Du+UHPSaVDRUd8iA/ZcZe/7RAlKXWElHpKvAiqv84Sz0Mc1fczHx
nQHaUfQixGs/vakkPXxVqKn/Xvnj4GDwUqyM9mG8kN1x4QpdyFRqM66zZxtWsSi5
/PL0zwAY950a5fY4VsyRsSf6Zl/SlDdB+PVVajk6BKqMxWgvxyuVbFRUqfesHsfA
V5GHil7F03sPzO12SvAJg8HuresCiafLdAPJgGXuGoeprmeSP9eYx0EqcUzLpHd7
ORUF4qEylYFp8vZBJdBnZpSGRUv/mmjDDowsKeKyfgKyVR99lLKLCtqMJ0e0eGaE
mpGNg+eq//snJldvBUHrMYQvyVm18/bbOzT0ZdGPp2bMUxLjSbiab7D3S9GKRxRt
8GRg/bJfZw5tz2i1bxbe5IWfNij5pDilLeAQUsoO8pUwu/SrCy8=
=wnbj
-----END PGP SIGNATURE-----
Merge tag 'ovl-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs
Pull overlayfs updates from Amir Goldstein:
- Increase robustness of overlayfs to crashes in the case of underlying
filesystems that to not guarantee metadata ordering to persistent
storage (problem was reported with ubifs).
- Deny mount inside container with features that require root
privileges to work properly, instead of failing operations later.
- Some clarifications to overlayfs documentation.
* tag 'ovl-update-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/overlayfs/vfs:
ovl: fail if trusted xattrs are needed but caller lacks permission
overlayfs.rst: update metacopy section in overlayfs documentation
ovl: fsync after metadata copy-up
ovl: don't set the superblock's errseq_t manually
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmbiGGAUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPU8BAA1+A15pmS34I9pq7c8TmRz3rNEs/a
zrW1aWJ0X/+axNS7sW3Pwtt1EKuaOhskKU8gNSieRhljC8rgXIVjZzLw6Atgcr5k
upulGbU9TXyVisYN+PWv9/84ito6/nYsKb7Mg3nUVsdodtIFVnsk1fxYLPHQEBig
Pl3i26U3VqH93Kz0W5vs/QR2uduPB8ZyscdTgcbrY9Vv1Y7IDZ2g9QsJVKLvbQKL
qcPK1JkHa+sBPJxDqS9A40zgbLbdPQgWQzsXX3dz822w1Ga7FIHSqxMBA6HwHZ+L
kV4P58wVfavhwt/cQSKMWI/yiGPMMd0B6yD+m8ojOvGfOfRCWxGMmEMqHNuZ3m7k
Bfll5ZgZTY8phUUhiNf3nxO3F3MM/5bHdhPOj3RReqbAbS6uWr4/fThPDYY/zIo6
NCY3HGxx3Ae64uQ01gC2p/czC50jDsMwlbXiZbrgdBhjBm/CVk5ozb80mLVcGrLB
+6XMzzSbC8IaNAH2fDmUJ2ABdwyNPgsSOTGZVzIanpxu1SU2/yk3SMxkp8fv5s36
wLeODUVcLgsjVV538Mkm6PGTE4TlXaH9yi6apMyJAGp0vPYx5c3Xxk2y5A5cur5p
hcrbDiX2QgeqFbwsz36incmPmbef2NU2c8feR8XLtPJuwNIeRcMSje0pnkaFlRmb
TAUJ1sDQAzZ8Fy0=
=HIAO
-----END PGP SIGNATURE-----
Merge tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Move the LSM framework to static calls
This transitions the vast majority of the LSM callbacks into static
calls. Those callbacks which haven't been converted were left as-is
due to the general ugliness of the changes required to support the
static call conversion; we can revisit those callbacks at a future
date.
- Add the Integrity Policy Enforcement (IPE) LSM
This adds a new LSM, Integrity Policy Enforcement (IPE). There is
plenty of documentation about IPE in this patches, so I'll refrain
from going into too much detail here, but the basic motivation behind
IPE is to provide a mechanism such that administrators can restrict
execution to only those binaries which come from integrity protected
storage, e.g. a dm-verity protected filesystem. You will notice that
IPE requires additional LSM hooks in the initramfs, dm-verity, and
fs-verity code, with the associated patches carrying ACK/review tags
from the associated maintainers. We couldn't find an obvious
maintainer for the initramfs code, but the IPE patchset has been
widely posted over several years.
Both Deven Bowers and Fan Wu have contributed to IPE's development
over the past several years, with Fan Wu agreeing to serve as the IPE
maintainer moving forward. Once IPE is accepted into your tree, I'll
start working with Fan to ensure he has the necessary accounts, keys,
etc. so that he can start submitting IPE pull requests to you
directly during the next merge window.
- Move the lifecycle management of the LSM blobs to the LSM framework
Management of the LSM blobs (the LSM state buffers attached to
various kernel structs, typically via a void pointer named "security"
or similar) has been mixed, some blobs were allocated/managed by
individual LSMs, others were managed by the LSM framework itself.
Starting with this pull we move management of all the LSM blobs,
minus the XFRM blob, into the framework itself, improving consistency
across LSMs, and reducing the amount of duplicated code across LSMs.
Due to some additional work required to migrate the XFRM blob, it has
been left as a todo item for a later date; from a practical
standpoint this omission should have little impact as only SELinux
provides a XFRM LSM implementation.
- Fix problems with the LSM's handling of F_SETOWN
The LSM hook for the fcntl(F_SETOWN) operation had a couple of
problems: it was racy with itself, and it was disconnected from the
associated DAC related logic in such a way that the LSM state could
be updated in cases where the DAC state would not. We fix both of
these problems by moving the security_file_set_fowner() hook into the
same section of code where the DAC attributes are updated. Not only
does this resolve the DAC/LSM synchronization issue, but as that code
block is protected by a lock, it also resolve the race condition.
- Fix potential problems with the security_inode_free() LSM hook
Due to use of RCU to protect inodes and the placement of the LSM hook
associated with freeing the inode, there is a bit of a challenge when
it comes to managing any LSM state associated with an inode. The VFS
folks are not open to relocating the LSM hook so we have to get
creative when it comes to releasing an inode's LSM state.
Traditionally we have used a single LSM callback within the hook that
is triggered when the inode is "marked for death", but not actually
released due to RCU.
Unfortunately, this causes problems for LSMs which want to take an
action when the inode's associated LSM state is actually released; so
we add an additional LSM callback, inode_free_security_rcu(), that is
called when the inode's LSM state is released in the RCU free
callback.
- Refactor two LSM hooks to better fit the LSM return value patterns
The vast majority of the LSM hooks follow the "return 0 on success,
negative values on failure" pattern, however, there are a small
handful that have unique return value behaviors which has caused
confusion in the past and makes it difficult for the BPF verifier to
properly vet BPF LSM programs. This includes patches to
convert two of these"special" LSM hooks to the common 0/-ERRNO pattern.
- Various cleanups and improvements
A handful of patches to remove redundant code, better leverage the
IS_ERR_OR_NULL() helper, add missing "static" markings, and do some
minor style fixups.
* tag 'lsm-pr-20240911' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (40 commits)
security: Update file_set_fowner documentation
fs: Fix file_set_fowner LSM hook inconsistencies
lsm: Use IS_ERR_OR_NULL() helper function
lsm: remove LSM_COUNT and LSM_CONFIG_COUNT
ipe: Remove duplicated include in ipe.c
lsm: replace indirect LSM hook calls with static calls
lsm: count the LSMs enabled at compile time
kernel: Add helper macros for loop unrolling
init/main.c: Initialize early LSMs after arch code, static keys and calls.
MAINTAINERS: add IPE entry with Fan Wu as maintainer
documentation: add IPE documentation
ipe: kunit test for parser
scripts: add boot policy generation program
ipe: enable support for fs-verity as a trust provider
fsverity: expose verified fsverity built-in signatures to LSMs
lsm: add security_inode_setintegrity() hook
ipe: add support for dm-verity as a trust provider
dm-verity: expose root hash digest and signature data to LSMs
block,lsm: add LSM blob and new LSM hooks for block devices
ipe: add permissive toggle
...
Some overlayfs features require permission to read/write trusted.*
xattrs. These include redirect_dir, verity, metacopy, and data-only
layers. This patch adds additional validations at mount time to stop
overlays from mounting in certain cases where the resulting mount would
not function according to the user's expectations because they lack
permission to access trusted.* xattrs (for example, not global root.)
Similar checks in ovl_make_workdir() that disable features instead of
failing are still relevant and used in cases where the resulting mount
can still work "reasonably well." Generally, if the feature was enabled
through kernel config or module option, any mount that worked before
will still work the same; this applies to redirect_dir and metacopy. The
user must explicitly request these features in order to generate a mount
failure. Verity and data-only layers on the other hand must be explictly
requested and have no "reasonable" disabled or degraded alternative, so
mounts attempting either always fail.
"lower data-only dirs require metacopy support" moved down in case
userxattr is set, which disables metacopy.
Cc: stable@vger.kernel.org # v6.6+
Signed-off-by: Mike Baynton <mike@mbaynton.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
For upper filesystems which do not use strict ordering of persisting
metadata changes (e.g. ubifs), when overlayfs file is modified for
the first time, copy up will create a copy of the lower file and
its parent directories in the upper layer. Permission lost of the
new upper parent directory was observed during power-cut stress test.
Fix by moving the fsync call to after metadata copy to make sure that the
metadata copied up directory and files persists to disk before renaming
from tmp to final destination.
With metacopy enabled, this change will hurt performance of workloads
such as chown -R, so we keep the legacy behavior of fsync only on copyup
of data.
Link: https://lore.kernel.org/linux-unionfs/CAOQ4uxj-pOvmw1-uXR3qVdqtLjSkwcR9nVKcNU_vC10Zyf2miQ@mail.gmail.com/
Reported-and-tested-by: Fei Lv <feilv@asrmicro.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Since commit 5679897eb104 ("vfs: make sync_filesystem return errors from
->sync_fs"), the return value from sync_fs callback can be seen in
sync_filesystem(). Thus the errseq_set opreation can be removed here.
Depends-on: commit 5679897eb104 ("vfs: make sync_filesystem return errors from ->sync_fs")
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Add '\n' for pr_err in function ovl_parse_param_lowerdir(), which
ensures that error message is displayed at once.
Fixes: b36a5780cb44 ("ovl: modify layer parameter parsing")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20240705011510.794025-4-chengzhihao1@huawei.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
The max count of lowerdir is OVL_MAX_STACK[500], which is broken by
commit 37f32f526438("ovl: fix memory leak in ovl_parse_param()") for
parameter Opt_lowerdir. Since commit 819829f0319a("ovl: refactor layer
parsing helpers") and commit 24e16e385f22("ovl: add support for
appending lowerdirs one by one") added check ovl_mount_dir_check() in
function ovl_parse_param_lowerdir(), the 'ctx->nr' should be smaller
than OVL_MAX_STACK, after commit 37f32f526438("ovl: fix memory leak in
ovl_parse_param()") is applied, the 'ctx->nr' is updated before the
check ovl_mount_dir_check(), which leads the max count of lowerdir
to become 499 for parameter Opt_lowerdir.
Fix it by replacing lower layers parsing code with the existing helper
function ovl_parse_layer().
Fixes: 37f32f526438 ("ovl: fix memory leak in ovl_parse_param()")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20240705011510.794025-3-chengzhihao1@huawei.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>