timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Highlights include:
Bugfixes:
- 3 Fixes for looping in the NFSv4 state manager delegation code.
- Fix for the NFSv4 state XDR code from Neil Brown.
- Fix a leaked reference in nfs_lock_and_join_requests().
- Fix a use-after-free in the delegation return code.
Features:
- Implemenation of the NFSv4.2 copy offload OFFLOAD_STATUS operation to
allow monitoring of an in-progress copy.
- Add a mount option to force NFSv3/NFSv4 to use READDIRPLUS in a
getdents() call.
- SUNRPC now allows some basic management of an existing RPC client's
connections using sysfs.
- Improvements to the automated teardown of a NFS client when the
container it was initiated from gets killed.
- Improvements to prevent tasks from getting stuck in a killable wait
state after calling exit_signals().
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmftuE0ACgkQZwvnipYK
APIAAhAAqFdJnh88UUT0/R184Qzpd021lR9XhxkwNA3TzhOIzmpuTgBzNE1iMG1j
EHveYqCpTU2orA1aisAyw5c8meJlsCQREPDvUOQ2i4BTCCmsBHOMxg7KDWwwRdNh
SVDCezFWrHYz4An81jpgBe3/x6RJaEyAhKC45ZzQruiBtSMeoOX1TAV/DTWwEo0j
JcLdAUSGVBsfyrj3qT0oJXoj+96o7rbB80loCdNKy8m8PBWHWp0oILwuU00XdXgu
7jYyjZfxW1013It+vfVFsjTYRVfJ92pq3wiz/U9HXYDe3Arc4oPRw509/Jo3xEWW
tdUljc/HepD3459ahiubTCLY39JxILl8/GapWe2Fn0J/JJuOGgZX9lqIMKDn4QCA
6TBOqWK7OEwImj4M7cfPptJQWd+hp91T4AR13xWJeQgp19AR8yOqEW0YX6hVlaBg
UrBwdR+l6ys5lJJBReUW+JMDCYZmbH9RjuwcqzXn71JmlACHNFi6odwLnQ1mInvF
P5pEf7aXaZkF6kEz2kmZ1eUgdkERAaIGCNFQTui6intlCSlQodNurrEU7Vx146os
OvowJYM0HvnVBDOnERrJD04HADKZeDS8jt59ev0uXbP/NFxEJnPRRQgIdiZbfISV
beQrc2fpUgwdjYAURbW1qWO7XNTJzK9LHJzn02SytfCazX0IQO0=
=zPX4
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Bugfixes:
- Three fixes for looping in the NFSv4 state manager delegation code
- Fix for the NFSv4 state XDR code (Neil Brown)
- Fix a leaked reference in nfs_lock_and_join_requests()
- Fix a use-after-free in the delegation return code
Features:
- Implement the NFSv4.2 copy offload OFFLOAD_STATUS operation to
allow monitoring of an in-progress copy
- Add a mount option to force NFSv3/NFSv4 to use READDIRPLUS in a
getdents() call
- SUNRPC now allows some basic management of an existing RPC client's
connections using sysfs
- Improvements to the automated teardown of a NFS client when the
container it was initiated from gets killed
- Improvements to prevent tasks from getting stuck in a killable wait
state after calling exit_signals()"
* tag 'nfs-for-6.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (29 commits)
nfs: Add missing release on error in nfs_lock_and_join_requests()
NFSv4: Check for delegation validity in nfs_start_delegation_return_locked()
NFS: Don't allow waiting for exiting tasks
SUNRPC: Don't allow waiting for exiting tasks
NFSv4: Treat ENETUNREACH errors as fatal for state recovery
NFSv4: clp->cl_cons_state < 0 signifies an invalid nfs_client
NFSv4: Further cleanups to shutdown loops
NFS: Shut down the nfs_client only after all the superblocks
SUNRPC: rpc_clnt_set_transport() must not change the autobind setting
SUNRPC: rpcbind should never reset the port to the value '0'
pNFS/flexfiles: Report ENETDOWN as a connection error
pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers
NFS: Treat ENETUNREACH errors as fatal in containers
NFS: Add a mount option to make ENETUNREACH errors fatal
sunrpc: Add a sysfs file for one-step xprt deletion
sunrpc: Add a sysfs file for adding a new xprt
sunrpc: Add a sysfs files for rpc_clnt information
sunrpc: Add a sysfs attr for xprtsec
NFS: Add implid to sysfs
NFS: Extend rdirplus mount option with "force|none"
...
Neil Brown contributed more scalability improvements to NFSD's
open file cache, and Jeff Layton contributed a menagerie of
repairs to NFSD's NFSv4 callback / backchannel implementation.
Mike Snitzer contributed a change to NFS re-export support that
disables support for file locking on a re-exported NFSv4 mount.
This is because NFSv4 state recovery is currently difficult if
not impossible for re-exported NFS mounts. The change aims to
prevent data integrity exposures after the re-export server
crashes.
Work continues on the evolving NFSD netlink administrative API.
Many thanks to the contributors, reviewers, testers, and bug
reporters who participated during the v6.15 development cycle.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmfmpMIACgkQM2qzM29m
f5f6DA/+P0YqoRg3Zk/4oWwXZWbfEOMhWFltT+D1PE2QjUfOZpiwUSFQfsfYgXO6
OFu0iDQ4g8BxBeP6Umv61qy7Cv6n4fVzIHqzymXQvymh9JzoQiXlE9/fA8nAHuiH
u7kkNPRi7faBz1sMg/WpN9CHctg7STPOhhG/JrZcSFZnh87mU1i4i4bZBNz8tVnK
ZWf483OUuSmJY2/bUTkwvr4GbceTKBlLWFFjiRhfAKvJBWvu4myfC0DI5QzxmsgI
MJ62do7AFJP1ww2Ih9LLi2kFIt/yyInSVAgyts1CPhlJ4BfPnTSOw/i2+CuF3D/M
bZYEAOjH3AqjBZmq58sIQezpD5f9/TOrTSwYwS31zl/THYE413WiW80/MDoWqo0y
9cSNkD3nJlPVLLCfF58vXLoe7wpLoN/ZbTdxoozzUWEFR5A4Jz3XP8F/Cws0cjem
uWWAQMItiQpg1+RYJYfu4dg5+iN6dbgYbvzlr7buISwFNXi3Zo99MkJ4wHj9TJbL
Tpjth1rWGPwwSOMT6ojKiYMq1oUzx5PuAm9Saq9oIzQAbBySmxHF/LSDz3wEuBoO
MK1jzKroEmMk3fJOOAajSDLOdAbL3vfj6H/xi2IHvKnaz9yHCZNu2YGV05BBMprd
hWePf69AO5Ky5Q9KuGClEtwvJ9ZR5pb4DO2dqaYu8ximu3O4vPo=
=e2E2
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"Neil Brown contributed more scalability improvements to NFSD's open
file cache, and Jeff Layton contributed a menagerie of repairs to
NFSD's NFSv4 callback / backchannel implementation.
Mike Snitzer contributed a change to NFS re-export support that
disables support for file locking on a re-exported NFSv4 mount. This
is because NFSv4 state recovery is currently difficult if not
impossible for re-exported NFS mounts. The change aims to prevent data
integrity exposures after the re-export server crashes.
Work continues on the evolving NFSD netlink administrative API.
Many thanks to the contributors, reviewers, testers, and bug reporters
who participated during the v6.15 development cycle"
* tag 'nfsd-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (45 commits)
NFSD: Add a Kconfig setting to enable delegated timestamps
sysctl: Fixes nsm_local_state bounds
nfsd: use a long for the count in nfsd4_state_shrinker_count()
nfsd: remove obsolete comment from nfs4_alloc_stid
nfsd: remove unneeded forward declaration of nfsd4_mark_cb_fault()
nfsd: reorganize struct nfs4_delegation for better packing
nfsd: handle errors from rpc_call_async()
nfsd: move cb_need_restart flag into cb_flags
nfsd: replace CB_GETATTR_BUSY with NFSD4_CALLBACK_RUNNING
nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY
nfsd: prevent callback tasks running concurrently
nfsd: disallow file locking and delegations for NFSv4 reexport
nfsd: filecache: drop the list_lru lock during lock gc scans
nfsd: filecache: don't repeatedly add/remove files on the lru list
nfsd: filecache: introduce NFSD_FILE_RECENT
nfsd: filecache: use list_lru_walk_node() in nfsd_file_gc()
nfsd: filecache: use nfsd_file_dispose_list() in nfsd_file_close_inode_sync()
NFSD: Re-organize nfsd_file_gc_worker()
nfsd: filecache: remove race handling.
fs: nfs: acl: Avoid -Wflex-array-member-not-at-end warning
...
Once a task calls exit_signals() it can no longer be signalled. So do
not allow it to do killable waits.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
* Move vm_table members out of kernel/sysctl.c
All vm_table array members have moved to their respective subsystems leading
to the removal of vm_table from kernel/sysctl.c. This increases modularity by
placing the ctl_tables closer to where they are actually used and at the same
time reducing the chances of merge conflicts in kernel/sysctl.c.
* ctl_table range fixes
Replace the proc_handler function that checks variable ranges in
coredump_sysctls and vdso_table with the one that actually uses the extra{1,2}
pointers as min/max values. This tightens the range of the values that users
can pass into the kernel effectively preventing {under,over}flows.
* Misc fixes
Correct grammar errors and typos in test messages. Update sysctl files in
MAINTAINERS. Constified and removed array size in declaration for
alignment_tbl
* Testing
- These have all been in linux-next for at least 1 month
- They have gone through 0-day
- Ran all these through sysctl selftests in x86_64
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmfhV8EACgkQupfNUreW
QU/udAv/VCXGkndQsJ5biXpXYFnokX0gIEaYzzHiqrFycZqr8ys0/wWzc+ar1LjF
Jvanl2uKB0mUviLKt7Gk0+Hri+PJlYIrbx+5K5eo2wsKUUxFykqLLm59y/orPODl
gyPQjKNpHJb7COsnEc3Lrq/fvol4NPHlcBPXG8NwehccTeBHZ1ninfo+pSnxh3o8
kI3GSLLxD4K9AgBl5QuVWH4gU7o//u7lUkKzy03NW+2jmuRv3dRcYF7IdgMINNee
AeXnygdSBxLzECBvmkfNdyg+AmL8hdsmzbsIh7UuJDvxLlQOInVLZa+sXBotCOIc
TImCrr1Ws1OuGrD0kpH+21tJvc8pNFWt61QlulObQdrLndWHdZEGyGOusLpXTwbn
jIWZmMvzk1foSwdgzwPFzUqPEpW3FrBVDo4Z4kenBDrCp56QTX7hGRvkNYJNKvot
Ue+i8BeHR/Gm/p+UMqgsSTOaNJXTqZhFqwJQVzxU/9LN/vkS0On6fbjgBd5X6Pn+
a5dlc9gy
=0bcX
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Move vm_table members out of kernel/sysctl.c
All vm_table array members have moved to their respective subsystems
leading to the removal of vm_table from kernel/sysctl.c. This
increases modularity by placing the ctl_tables closer to where they
are actually used and at the same time reducing the chances of merge
conflicts in kernel/sysctl.c.
- ctl_table range fixes
Replace the proc_handler function that checks variable ranges in
coredump_sysctls and vdso_table with the one that actually uses the
extra{1,2} pointers as min/max values. This tightens the range of the
values that users can pass into the kernel effectively preventing
{under,over}flows.
- Misc fixes
Correct grammar errors and typos in test messages. Update sysctl
files in MAINTAINERS. Constified and removed array size in
declaration for alignment_tbl
* tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (22 commits)
selftests/sysctl: fix wording of help messages
selftests: fix spelling/grammar errors in sysctl/sysctl.sh
MAINTAINERS: Update sysctl file list in MAINTAINERS
sysctl: Fix underflow value setting risk in vm_table
coredump: Fixes core_pipe_limit sysctl proc_handler
sysctl: remove unneeded include
sysctl: remove the vm_table
sh: vdso: move the sysctl to arch/sh/kernel/vsyscall/vsyscall.c
x86: vdso: move the sysctl to arch/x86/entry/vdso/vdso32-setup.c
fs: dcache: move the sysctl to fs/dcache.c
sunrpc: simplify rpcauth_cache_shrink_count()
fs: drop_caches: move sysctl to fs/drop_caches.c
fs: fs-writeback: move sysctl to fs/fs-writeback.c
mm: nommu: move sysctl to mm/nommu.c
security: min_addr: move sysctl to security/min_addr.c
mm: mmap: move sysctl to mm/mmap.c
mm: util: move sysctls to mm/util.c
mm: vmscan: move vmscan sysctls to mm/vmscan.c
mm: swap: move sysctl to mm/swap.c
mm: filemap: move sysctl to mm/filemap.c
...
The autobind setting was supposed to be determined in rpc_create(),
since commit c2866763b402 ("SUNRPC: use sockaddr + size when creating
remote transport endpoints").
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If we already had a valid port number for the RPC service, then we
should not allow the rpcbind client to set it to the invalid value '0'.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Propagate the NFS_MOUNT_NETUNREACH_FATAL flag to work with the generic
NFS client. If the flag is set, the client will receive ENETDOWN and
ENETUNREACH errors from the RPC layer, and is expected to treat them as
being fatal.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Previously, the admin would need to set the xprt state to "offline"
before attempting to remove. This patch adds a new sysfs attr that does
both these steps in a single call.
Suggested-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Link: https://lore.kernel.org/r/20250207204225.594002-6-anna@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Writing to this file will clone the 'main' xprt of an xprt_switch and
add it to be used as an additional connection.
--
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
v3: Replace call to xprt_iter_get_xprt() with xprt_iter_get_next()
Link: https://lore.kernel.org/r/20250207204225.594002-5-anna@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
These files display useful information about the RPC client, such as the
rpc version number, program name, and maximum number of connections
allowed.
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Link: https://lore.kernel.org/r/20250207204225.594002-4-anna@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
This allows the admin to check the TLS configuration for each xprt.
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Link: https://lore.kernel.org/r/20250207204225.594002-3-anna@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
On an rdma-capable machine, a start/stop/start and then on a stop of
a knfsd server would lead kref underflow warning because svc_rdma_free
would indiscriminately unregister the rdma device but a listening
transport never calls the rdma_rn_register() thus leading to kref
going down to 0 on the 1st stop of the server and on the 2nd stop
it leads to a problem.
Suggested-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: c4de97f7c454 ("svcrdma: Handle device removal outside of the CM event handler")
Signed-off-by: Olga Kornievskaia <okorniev@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Commit ec596aaf9b48 ("SUNRPC: Remove code behind
CONFIG_RPCSEC_GSS_KRB5_SIMPLIFIED") was the last user of the
make_checksum() function.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The last use of krb5_decrypt() was removed in 2023 by
commit 2a9893f796a3 ("SUNRPC:
Remove net/sunrpc/auth_gss/gss_krb5_seqnum.c")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
We will write /proc/net/rpc/xxx/flush if we want to clean cache_detail.
This updates nextcheck to the current time and calls cache_flush -->
cache_clean to clean cache_detail.
If we write this interface again within one second, it will only increase
flush_time and nextcheck without actually cleaning cache_detail.
Therefore, if we keep writing this interface repeatedly within one second,
flush_time and nextcheck will keep increasing, even far exceeding the
current time, making it impossible to clear cache_detail through the flush
interface or cache_cleaner.
If someone frequently calls the flush interface, we should immediately
clean the corresponding cache_detail instead of continuously accumulating
nextcheck.
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
this week, so next week's PR is probably going to be bigger. A healthy
dose of fixes for bugs introduced in the current release nonetheless.
Current release - regressions:
- Bluetooth: always allow SCO packets for user channel
- af_unix: fix memory leak in unix_dgram_sendmsg()
- rxrpc:
- remove redundant peer->mtu_lock causing lockdep splats
- fix spinlock flavor issues with the peer record hash
- eth: iavf: fix circular lock dependency with netdev_lock
- net: use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net()
RDMA driver register notifier after the device
Current release - new code bugs:
- ethtool: fix ioctl confusing drivers about desired HDS user config
- eth: ixgbe: fix media cage present detection for E610 device
Previous releases - regressions:
- loopback: avoid sending IP packets without an Ethernet header
- mptcp: reset connection when MPTCP opts are dropped after join
Previous releases - always broken:
- net: better track kernel sockets lifetime
- ipv6: fix dst ref loop on input in seg6 and rpl lw tunnels
- phy: qca807x: use right value from DTS for DAC_DSP_BIAS_CURRENT
- eth: enetc: number of error handling fixes
- dsa: rtl8366rb: reshuffle the code to fix config / build issue
with LED support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmfAj8MACgkQMUZtbf5S
IrtoTRAAj0XNWXGWZdOuVub0xhtjsPLoZktux4AzsELqaynextkJW6w9pG5qVrWu
UZt3a3bC7u6+JoTgb+GQVhyjuuVjv6NOSuLK3FS+NePW8ijhLP5oTg6eD0MQS60Z
wa9yQx3yL1Kvb6b80Go/3WgRX9V6Rx8zlROAl/gOlZ9NKB0rSVqnueZGPjGZJf1a
ayyXsmzRykshbr5Ic0e+b74hFP3DGxVgHjIob1C4kk/Q+WOfQKnm3C3fnZ/R2QcS
7B7kSk9WokvNwk3hJc7ZtFxJbrQKSSuRI8nCD93hBjTn76yJjlPicJ9b6HJoGhE/
Pwt7fBnDCCA00x6ejD3OrurR+/80PbPtyvNtgMMTD49wSwxQpQ6YpTMInnodCzAV
NvIhkkXBprI0kiTT4dDpNoeFMKD3i07etKpvMfEoDzZR7vgUsj6aClSmuxILeU9a
crFC4Vp5SgyU1/lUPDiG4dfbd8s4hfM4bZ+d0zAtth3/rQA7/EA6dLqbRXXWX7h5
Gl6egKWPsSl+WUgFjpBjYfhqrQsc06hxaCh0SQYH6SnS3i+PlMU2uRJYZMLQ66rX
QsSQOyqCEHwd1qnrLedg9rCniv+DzOJf+qh+H0eY9WhuOay+8T52OHLxpRjSHxBo
SCP+qQxSX0qhH5DtUiOV50Fwg19UhJJyWd0COfv5SIGm/I1dUOY=
=+Ci7
-----END PGP SIGNATURE-----
Merge tag 'net-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth.
We didn't get netfilter or wireless PRs this week, so next week's PR
is probably going to be bigger. A healthy dose of fixes for bugs
introduced in the current release nonetheless.
Current release - regressions:
- Bluetooth: always allow SCO packets for user channel
- af_unix: fix memory leak in unix_dgram_sendmsg()
- rxrpc:
- remove redundant peer->mtu_lock causing lockdep splats
- fix spinlock flavor issues with the peer record hash
- eth: iavf: fix circular lock dependency with netdev_lock
- net: use rtnl_net_dev_lock() in
register_netdevice_notifier_dev_net() RDMA driver register notifier
after the device
Current release - new code bugs:
- ethtool: fix ioctl confusing drivers about desired HDS user config
- eth: ixgbe: fix media cage present detection for E610 device
Previous releases - regressions:
- loopback: avoid sending IP packets without an Ethernet header
- mptcp: reset connection when MPTCP opts are dropped after join
Previous releases - always broken:
- net: better track kernel sockets lifetime
- ipv6: fix dst ref loop on input in seg6 and rpl lw tunnels
- phy: qca807x: use right value from DTS for DAC_DSP_BIAS_CURRENT
- eth: enetc: number of error handling fixes
- dsa: rtl8366rb: reshuffle the code to fix config / build issue with
LED support"
* tag 'net-6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits)
net: ti: icss-iep: Reject perout generation request
idpf: fix checksums set in idpf_rx_rsc()
selftests: drv-net: Check if combined-count exists
net: ipv6: fix dst ref loop on input in rpl lwt
net: ipv6: fix dst ref loop on input in seg6 lwt
usbnet: gl620a: fix endpoint checking in genelink_bind()
net/mlx5: IRQ, Fix null string in debug print
net/mlx5: Restore missing trace event when enabling vport QoS
net/mlx5: Fix vport QoS cleanup on error
net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination.
af_unix: Fix memory leak in unix_dgram_sendmsg()
net: Handle napi_schedule() calls from non-interrupt
net: Clear old fragment checksum value in napi_reuse_skb
gve: unlink old napi when stopping a queue using queue API
net: Use rtnl_net_dev_lock() in register_netdevice_notifier_dev_net().
tcp: Defer ts_recent changes until req is owned
net: enetc: fix the off-by-one issue in enetc_map_tx_tso_buffs()
net: enetc: remove the mm_lock from the ENETC v4 driver
net: enetc: add missing enetc4_link_deinit()
net: enetc: update UDP checksum when updating originTimestamp field
...
There is a warning about unused variables when building with W=1 and no procfs:
net/sunrpc/cache.c:1660:30: error: 'cache_flush_proc_ops' defined but not used [-Werror=unused-const-variable=]
1660 | static const struct proc_ops cache_flush_proc_ops = {
| ^~~~~~~~~~~~~~~~~~~~
net/sunrpc/cache.c:1622:30: error: 'content_proc_ops' defined but not used [-Werror=unused-const-variable=]
1622 | static const struct proc_ops content_proc_ops = {
| ^~~~~~~~~~~~~~~~
net/sunrpc/cache.c:1598:30: error: 'cache_channel_proc_ops' defined but not used [-Werror=unused-const-variable=]
1598 | static const struct proc_ops cache_channel_proc_ops = {
| ^~~~~~~~~~~~~~~~~~~~~~
These are used inside of an #ifdef, so replacing that with an
IS_ENABLED() check lets the compiler see how they are used while
still dropping them during dead code elimination.
Fixes: dbf847ecb631 ("knfsd: allow cache_register to return error on failure")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
If the TLS handshake attempt returns -ETIMEDOUT, we currently translate
that error into -EACCES. This becomes problematic for cases where the RPC
layer is attempting to re-connect in paths that don't resonably handle
-EACCES, for example: writeback. The RPC layer can handle -ETIMEDOUT quite
well, however - so if the handshake returns this error let's just pass it
along.
Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
If rpc_signal_task() is called while a task is in an rpc_call_done()
callback function, and the latter calls rpc_restart_call(), the task can
end up looping due to the RPC_TASK_SIGNALLED flag being set without the
tk_rpc_status being set.
Removing the redundant mechanism for signalling the task fixes the
looping behaviour.
Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
Fixes: 39494194f93b ("SUNRPC: Fix races with rpc_killall_tasks()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
It is inappropriate to use sysctl_vfs_cache_pressure here.
The sysctl is documented as: This percentage value controls
the tendency of the kernel to reclaim the memory which is used
for caching of directory and inode objects.
So, simplify result of rpcauth_cache_shrink_count() to
"return number_cred_unused;".
Signed-off-by: Kaixiong Yu <yukaixiong@huawei.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Joel Granados <joel.granados@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZ5yJdgAKCRBZ7Krx/gZQ
69W4AQDwgxceiQ6icx3rFhCWQigne4jdMO84kd8tNaa+xHGe1AD/WnkeChc5DqjQ
wZWZxAAzml9SS01IcSiHWaF5fgrjlA0=
=rXOq
-----END PGP SIGNATURE-----
Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs cleanups from Al Viro:
"Two unrelated patches - one is a removal of long-obsolete include in
overlayfs (it used to need fs/internal.h, but the extern it wanted has
been moved back to include/linux/namei.h) and another introduces
convenience helper constructing struct qstr by a NUL-terminated
string"
* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
add a string-to-qstr constructor
fs/overlayfs/namei.c: get rid of include ../internal.h
New Features:
* Enable using direct IO with localio
* Added localio related tracepoints
Bugfixes:
* Sunrpc fixes for working with a very large cl_tasks list
* Fix a possible buffer overflow in nfs_sysfs_link_rpc_client()
* Fixes for handling reconnections with localio
* Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT
* Fix COPY_NOTIFY xdr_buf size calculations
* pNFS/Flexfiles fix for retrying requesting a layout segment for reads
* Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is expired
Cleanups:
* Various other nfs & nfsd localio cleanups
* Prepratory patches for async copy improvements that are under development
* Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other xprts
* Add netns inum and srcaddr to debugfs rpc_xprt info
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmeZUzUACgkQ18tUv7Cl
QOvArw/9HltIlcJHbi7tApGJ4dFpuJCa/fHbA1n5bHvKrCR5aElmFZoiFDdsM1JX
kFAlMED9n1dW9VmzLJcepxmrLo/t7KXueiZNharHynWTxcszSl6jS+tOFBW6OflG
Rrrjq/SrsWI2Fu8X4e/7ZV7pqRLGGn5SSMwgbuMbcyzBvVgN8mZM/BneIp1J59AI
5NOsif5KWetVhQc43zlRlbVWR5cvNGcUK4i58LIaPFzPMt0xq/XJI+QWffj6kv4g
cHabCNYTdQYMkhiPQC+LLYkw6sMbw2NatajTTYNMWfR/I+7wz9k5ej6CHKPIFCSr
xjmscypySTLfMFQjrDFZkpX2CwSp/VIbV6go36DJwAlcCRzqz+I7cajlrRK4zvyr
DyrcaZHvClEczP9QqdPj2wqRXbmIOsDMksOu4ACTUImd4o3f2v1K6DcwRj9oUIhV
AGR31OEMt2A+RaVvVZYR4PpixJ01vH9LcmsaOu5KkHX8X4q2osQ7eMy+FV4kV09S
pMnxDMAyszJU8IuzUG1/HfkonNlDMivIbqpgG4ZaVW08Nq4mCxJll1vTAa9FTLz2
z+9eocqKwf724q1RAgOB7vj4AwOwL4Ul6d18UBtyUitZz3ndLRZ8Yy6r/AhrpCsC
3co0Y3znZbKeRjmReNl0GLG4qiKE+E7Xh23Lf3IqXg8GE2Mu+Ls=
=srvH
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New Features:
- Enable using direct IO with localio
- Added localio related tracepoints
Bugfixes:
- Sunrpc fixes for working with a very large cl_tasks list
- Fix a possible buffer overflow in nfs_sysfs_link_rpc_client()
- Fixes for handling reconnections with localio
- Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT
- Fix COPY_NOTIFY xdr_buf size calculations
- pNFS/Flexfiles fix for retrying requesting a layout segment for
reads
- Sunrpc fix for retrying on EKEYEXPIRED error when the TGT is
expired
Cleanups:
- Various other nfs & nfsd localio cleanups
- Prepratory patches for async copy improvements that are under
development
- Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERR moveable to other
xprts
- Add netns inum and srcaddr to debugfs rpc_xprt info"
* tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (28 commits)
SUNRPC: do not retry on EKEYEXPIRED when user TGT ticket expired
sunrpc: add netns inum and srcaddr to debugfs rpc_xprt info
pnfs/flexfiles: retry getting layout segment for reads
NFSv4.2: make LAYOUTSTATS and LAYOUTERROR MOVEABLE
NFSv4.2: mark OFFLOAD_CANCEL MOVEABLE
NFSv4.2: fix COPY_NOTIFY xdr buf size calculation
NFS: Rename struct nfs4_offloadcancel_data
NFS: Fix typo in OFFLOAD_CANCEL comment
NFS: CB_OFFLOAD can return NFS4ERR_DELAY
nfs: Make NFS_FSCACHE select NETFS_SUPPORT instead of depending on it
nfs: fix incorrect error handling in LOCALIO
nfs: probe for LOCALIO when v3 client reconnects to server
nfs: probe for LOCALIO when v4 client reconnects to server
nfs/localio: remove redundant code and simplify LOCALIO enablement
nfs_common: add nfs_localio trace events
nfs_common: track all open nfsd_files per LOCALIO nfs_client
nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock
nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file
nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_
nfsd: update percpu_ref to manage references on nfsd_net
...
Jeff Layton contributed an implementation of NFSv4.2+ attribute
delegation, as described here:
https://www.ietf.org/archive/id/draft-ietf-nfsv4-delstid-08.html
This interoperates with similar functionality introduced into the
Linux NFS client in v6.11. An attribute delegation permits an NFS
client to manage a file's mtime, rather than flushing dirty data to
the NFS server so that the file's mtime reflects the last write,
which is considerably slower.
Neil Brown contributed dynamic NFSv4.1 session slot table resizing.
This facility enables NFSD to increase or decrease the number of
slots per NFS session depending on server memory availability. More
session slots means greater parallelism.
Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND
encoding screws up when crossing a page boundary in the encoding
buffer. This is a zero-day bug, but hitting it is rare and depends
on the NFS client implementation. The Linux NFS client does not
happen to trigger this issue.
A variety of bug fixes and other incremental improvements fill out
the list of commits in this release. Great thanks to all
contributors, reviewers, testers, and bug reporters who participated
during this development cycle.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmeVPBUACgkQM2qzM29m
f5dClxAAmW4O2bOJaR8neJ54fzeFYXtFYEXRF/XbIh2KdnCy2LoywT9ux8ndzE0k
1tsjtv0g4y84IcYfrxXPRTuhh2GO2pHw5L1kGVRezJg2ODSFbpdzGtcVK7SIrs2S
eijTViqdi9xRgfj1jPqRrvxC99RL3/fmCztqorAPLDFsqYAd/6ZRxZ7+IcZ2h4+J
cJ0Z6Wx6eh10roacZPXweH13XJ7xWO/ublYZvQFQpK2BAKyO98aXGgLDraNt8k60
X3DZLSKkGB/eBlNeAlTtcrXec/ot6XGJPKr3b/7zhwfMi8B13RGdSmCR8SxMdRQM
vCQO4G2YadU4YFS6FFIw9Wc1XDYUuYh2YgcveafjzjXbgi7NnY7rYOxtnTgBi0Xv
XGjtGqpvD676gPm+8b3DcwqmWI3c/WUdtQIZ1uYRCFZdFqkVP91bySPT2aHtx2GO
4j3uEyTlypC00kyvu1oL3+tVUG/EFlJCpYvIbOwqDG2m7KWPStzpfJTD5Q9cdlEl
fdgs6l82EVqe1YyjLTqajDuOcRrLYK2hlR/5STc03hQV+GpKSo5UypRejzE9WtRV
zT/tyelqhj4+0EZJz4ay/8q9s2Jp+5JGVxoVvjujSuH7+Ulb3T+IDkldtMamO8Fm
www2y0/fLfU2xIapMJdCoJ+ZKgel2i8RZMPIc0cIfO5ITXm+dOs=
=hwXx
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"Jeff Layton contributed an implementation of NFSv4.2+ attribute
delegation, as described here:
https://www.ietf.org/archive/id/draft-ietf-nfsv4-delstid-08.html
This interoperates with similar functionality introduced into the
Linux NFS client in v6.11. An attribute delegation permits an NFS
client to manage a file's mtime, rather than flushing dirty data to
the NFS server so that the file's mtime reflects the last write, which
is considerably slower.
Neil Brown contributed dynamic NFSv4.1 session slot table resizing.
This facility enables NFSD to increase or decrease the number of slots
per NFS session depending on server memory availability. More session
slots means greater parallelism.
Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND
encoding screws up when crossing a page boundary in the encoding
buffer. This is a zero-day bug, but hitting it is rare and depends on
the NFS client implementation. The Linux NFS client does not happen to
trigger this issue.
A variety of bug fixes and other incremental improvements fill out the
list of commits in this release. Great thanks to all contributors,
reviewers, testers, and bug reporters who participated during this
development cycle"
* tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits)
sunrpc: Remove gss_{de,en}crypt_xdr_buf deadcode
sunrpc: Remove gss_generic_token deadcode
sunrpc: Remove unused xprt_iter_get_xprt
Revert "SUNRPC: Reduce thread wake-up rate when receiving large RPC messages"
nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION
nfsd: handle delegated timestamps in SETATTR
nfsd: add support for delegated timestamps
nfsd: rework NFS4_SHARE_WANT_* flag handling
nfsd: add support for FATTR4_OPEN_ARGUMENTS
nfsd: prepare delegation code for handing out *_ATTRS_DELEG delegations
nfsd: rename NFS4_SHARE_WANT_* constants to OPEN4_SHARE_ACCESS_WANT_*
nfsd: switch to autogenerated definitions for open_delegation_type4
nfs_common: make include/linux/nfs4.h include generated nfs4_1.h
nfsd: fix handling of delegated change attr in CB_GETATTR
SUNRPC: Document validity guarantees of the pointer returned by reserve_space
NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the encode buffer
NFSD: Insulate nfsd4_encode_secinfo() from page boundaries in the encode buffer
NFSD: Refactor nfsd4_do_encode_secinfo() again
NFSD: Insulate nfsd4_encode_readlink() from page boundaries in the encode buffer
NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in the encode buffer
...
Quite a few places want to build a struct qstr by given string;
it would be convenient to have a primitive doing that, rather
than open-coding it via QSTR_INIT().
The closest approximation was in bcachefs, but that expands to
initializer list - {.len = strlen(string), .name = string}.
It would be more useful to have it as compound literal -
(struct qstr){.len = strlen(string), .name = string}.
Unlike initializer list it's a valid expression. What's more,
it's a valid lvalue - it's an equivalent of anonymous local
variable with such initializer, so the things like
path->dentry = d_alloc_pseudo(mnt->mnt_sb, &QSTR(name));
are valid. It can also be used as initializer, with identical
effect -
struct qstr x = (struct qstr){.name = s, .len = strlen(s)};
is equivalent to
struct qstr anon_variable = {.name = s, .len = strlen(s)};
struct qstr x = anon_variable;
// anon_variable is never used after that point
and any even remotely sane compiler will manage to collapse that
into
struct qstr x = {.name = s, .len = strlen(s)};
What compound literals can't be used for is initialization of
global variables, but those are covered by QSTR_INIT().
This commit lifts definition(s) of QSTR() into linux/dcache.h,
converts it to compound literal (all bcachefs users are fine
with that) and converts assorted open-coded instances to using
that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
The previous commit removed the page_list argument from
alloc_pages_bulk_noprof() along with the alloc_pages_bulk_list() function.
Now that only the *_array() flavour of the API remains, we can do the
following renaming (along with the _noprof() ones):
alloc_pages_bulk_array -> alloc_pages_bulk
alloc_pages_bulk_array_mempolicy -> alloc_pages_bulk_mempolicy
alloc_pages_bulk_array_node -> alloc_pages_bulk_node
Link: https://lkml.kernel.org/r/275a3bbc0be20fbe9002297d60045e67ab3d4ada.1734991165.git.luizcap@redhat.com
Signed-off-by: Luiz Capitulino <luizcap@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When a user TGT ticket expired, gssd returns EKEYEXPIRED to the RPC
layer for the upcall to create the security context. The RPC layer
then retries the upcall twice before returning the EKEYEXPIRED to
the NFS layer.
This results in three separate TCP connections to the NFS server being
created by gssd for each RPC request. These connections are not used
and left in TIME_WAIT state.
Note that for RPC call that uses machine credential, gssd automatically
renews the ticket. But for a regular user the ticket needs to be
renewed by the user before access to the krb5 share is allowed.
This patch removes the retries by RPC on EKEYEXPIRED so that these
unused TCP connections are not created.
Reproducer:
$ kinit -l 1m
$ sleep 65
$ cd /mnt/krb5share
$ netstat -na |grep TIME_WAIT
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
The output format should provide a value that matches the one in
the /proc/<pid>/ns/net symlink. This makes it simpler to match the
rpc_xprt and rpc_clnt to a particular container.
Also, when the xprt defines the get_srcaddr operation, use that to
display the source address as well.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Commit ec596aaf9b48 ("SUNRPC: Remove code behind
CONFIG_RPCSEC_GSS_KRB5_SIMPLIFIED") was the last user of the
gss_decrypt_xdr_buf() and gss_encrypt_xdr_buf() functions.
Remove them.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Commit ec596aaf9b48 ("SUNRPC: Remove code behind
CONFIG_RPCSEC_GSS_KRB5_SIMPLIFIED") was the last user of the routines
in gss_generic_token.c.
Remove the routines and associated header.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
xprt_iter_get_xprt() was added by
commit 80b14d5e61ca ("SUNRPC: Add a structure to track multiple
transports") but is unused.
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: Anna Schumaker <anna.schumaker@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
I noticed that a handful of NFSv3 fstests were taking an
unexpectedly long time to run. Troubleshooting showed that the
server's TCP window closed and never re-opened, which caused the
client to trigger an RPC retransmit timeout after 180 seconds.
The client's recovery action was to establish a fresh connection
and retransmit the timed-out requests. This worked, but it adds a
long delay.
I tracked the problem to the commit that attempted to reduce the
rate at which the network layer delivers TCP socket data_ready
callbacks. Under most circumstances this change worked as expected,
but for NFSv3, which has no session or other type of throttling, it
can overwhelm the receiver on occasion.
I'm sure I could tweak the lowat settings, but the small benefit
doesn't seem worth the bother. Just revert it.
Fixes: 2b877fc53e97 ("SUNRPC: Reduce thread wake-up rate when receiving large RPC messages")
Cc: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Display the total number of RPC tasks, including tasks waiting
on workqueue and wait queues, for rpc_clnt.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
Under heavy write load, we've seen the cl_tasks list grows to
millions of entries. Even though the list is extremely long,
the system still runs fine until the user wants to get the
information of all active RPC tasks by doing:
When this happens, tasks_start acquires the cl_lock to walk the
cl_tasks list, returning one entry at a time to the caller. The
cl_lock is held until all tasks on this list have been processed.
While the cl_lock is held, completed RPC tasks have to spin wait
in rpc_task_release_client for the cl_lock. If there are millions
of entries in the cl_tasks list it will take a long time before
tasks_stop is called and the cl_lock is released.
The spin wait tasks can use up all the available CPUs in the system,
preventing other jobs to run, this causes the system to temporarily
lock up.
This patch fixes this problem by delaying inserting the RPC
task on the cl_tasks list until the RPC call slot is reserved.
This limits the length of the cl_tasks to the number of call
slots available in the system.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
A subtlety of this API is that if the @nbytes region traverses a
page boundary, the next __xdr_commit_encode will shift the data item
in the XDR encode buffer. This makes the returned pointer point to
something else, leading to unexpected behavior.
There are a few cases where the caller saves the returned pointer
and then later uses it to insert a computed value into an earlier
part of the stream. This can be safe only if either:
- the data item is guaranteed to be in the XDR buffer's head, and
thus is not ever going to be near a page boundary, or
- the data item is no larger than 4 octets, since XDR alignment
rules require all data items to start on 4-octet boundaries
But that safety is only an artifact of the current implementation.
It would be less brittle if these "safe" uses were eventually
replaced.
Reviewed-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
rcu_read_lock/rcu_read_unlock has already provide protection for the
pointer we will reference when we call c_show. Therefore, there is no
need to obtain a cache reference to help protect cache_head.
Additionally, the .put such as expkey_put/svc_export_put will invoke
dput, which can sleep and break rcu. Stop get cache reference to fix
them all.
Fixes: ae74136b4bb6 ("SUNRPC: Allow cache lookups to use RCU protection rather than the r/w spinlock")
Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This is a prepare patch to add cache_check_rcu, will use it with follow
patch.
Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that the connection limit only apply to unconfirmed connections,
there is no need to configure it. So remove all the configuration and
fix the number of unconfirmed connections as always 64 - which is
now given a name: XPT_MAX_TMP_CONN
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The heuristic for limiting the number of incoming connections to nfsd
currently uses sv_nrthreads - allowing more connections if more threads
were configured.
A future patch will allow number of threads to grow dynamically so that
there will be no need to configure sv_nrthreads. So we need a different
solution for limiting connections.
It isn't clear what problem is solved by limiting connections (as
mentioned in a code comment) but the most likely problem is a connection
storm - many connections that are not doing productive work. These will
be closed after about 6 minutes already but it might help to slow down a
storm.
This patch adds a per-connection flag XPT_PEER_VALID which indicates
that the peer has presented a filehandle for which it has some sort of
access. i.e the peer is known to be trusted in some way. We now only
count connections which have NOT been determined to be valid. There
should be relative few of these at any given time.
If the number of non-validated peer exceed a limit - currently 64 - we
close the oldest non-validated peer to avoid having too many of these
useless connections.
Note that this patch significantly changes the meaning of the various
configuration parameters for "max connections". The next patch will
remove all of these.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Highlights include:
Bugfixes:
- NFSv4.0: Fix a use-after-free problem in open()
- nfs/localio: fix for a memory corruption in nfs_local_read_done
- Revert "nfs: don't reuse partially completed requests in nfs_lock_and_join_requests"
- nfsv4: ignore SB_RDONLY when mounting nfs
- sunrpc: clear XPRT_SOCK_UPD_TIMEOUT when reseting the transport
- SUNRPC: timeout and cancel TLS handshake with -ETIMEDOUT
- sunrpc: fix one UAF issue caused by sunrpc kernel tcp socket
- pNFS/blocklayout: Fix device registration issues
- SUNRPC: Fix a hang in TLS sock_close if sk_write_pending
Features and cleanups:
- localio cleanups from Mike Snitzer
- Clean up refcounting on the nfs version modules
- __counted_by() annotations
- nfs: make processes that are waiting for an I/O lock killable
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmdIrr0ACgkQZwvnipYK
APKQ3w//ZRqyvhwD1MrK8vyQmDbSPNaMMVx710Hz7GYR5+ij+dGf+FNOr9sLqw8h
NkVrOhX7V1JRM/lz5mq3zPYCip5ZHKJQZAzLqOUqcBq7RtCG3G31h53so8S+GIap
j1hXsc2cmADIVm3ztm+HAn5kiT4lcBoeiEmsu/+dL0i5MVhYiEmCIBj3tdnhRtrL
Gql8nN6zyOCPtOBgiOViNje5w+arcJXN/yFHCWQPU7yPDb/dYDnHSB3ScJsuyxZQ
CjFn/AAdOfe8cHXGOmHryiQ0KlplwC6oxn1DoOG67FENk4ujFgLpYqnF0yPY5XxG
bmWuJVV9sFPwQ+n9RBybAK21lvpOMoGN0O+n5fBnALS25FrYEgJBWphqbXwvWdH1
23PZlTeiBqbjZv80PfCBAXByAmzWffp7wPQVd94Ny3Jr774IXcnAFWeMHgnRhDTj
5bY3wOxRzmVChLkyxIM9kYM1Wafb2vnXkL/EL8Kav3RpAdAGNbCH6kWOfJIpSR0j
Is9znfXGNwav6x3kahL7BGKO9WG52YfWCia+vxOcTWYjtgplLPdXMVZZjB6VlWRe
HzzmXTzRNQ/eMHNqESB04Pyn9pttYQAkVLy2R0ynEV1SQyhSM9E57/QLSOEIyTU8
u+rsIkCGz9KdHwltKOKxNJ/Jy5khpyPOQC5zrcp7vtctPnAsGek=
=Ih5w
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Bugfixes:
- nfs/localio: fix for a memory corruption in nfs_local_read_done
- Revert "nfs: don't reuse partially completed requests in
nfs_lock_and_join_requests"
- nfsv4:
- ignore SB_RDONLY when mounting nfs
- Fix a use-after-free problem in open()
- sunrpc:
- clear XPRT_SOCK_UPD_TIMEOUT when reseting the transport
- timeout and cancel TLS handshake with -ETIMEDOUT
- fix one UAF issue caused by sunrpc kernel tcp socket
- Fix a hang in TLS sock_close if sk_write_pending
- pNFS/blocklayout: Fix device registration issues
Features and cleanups:
- localio cleanups from Mike Snitzer
- Clean up refcounting on the nfs version modules
- __counted_by() annotations
- nfs: make processes that are waiting for an I/O lock killable"
* tag 'nfs-for-6.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits)
fs/nfs/io: make nfs_start_io_*() killable
nfs/blocklayout: Limit repeat device registration on failure
nfs/blocklayout: Don't attempt unregister for invalid block device
sunrpc: fix one UAF issue caused by sunrpc kernel tcp socket
SUNRPC: timeout and cancel TLS handshake with -ETIMEDOUT
sunrpc: clear XPRT_SOCK_UPD_TIMEOUT when reset transport
nfs: ignore SB_RDONLY when mounting nfs
Revert "nfs: don't reuse partially completed requests in nfs_lock_and_join_requests"
Revert "fs: nfs: fix missing refcnt by replacing folio_set_private by folio_attach_private"
nfs/localio: must clear res.replen in nfs_local_read_done
NFSv4.0: Fix a use-after-free problem in the asynchronous open()
NFSv4.0: Fix the wake up of the next waiter in nfs_release_seqid()
SUNRPC: Fix a hang in TLS sock_close if sk_write_pending
sunrpc: remove newlines from tracepoints
nfs: Annotate struct pnfs_commit_array with __counted_by()
nfs/localio: eliminate need for nfs_local_fsync_work forward declaration
nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter
nfs/localio: eliminate unnecessary kref in nfs_local_fsync_ctx
nfs/localio: remove redundant suid/sgid handling
NFS: Implement get_nfs_version()
...
BUG: KASAN: slab-use-after-free in tcp_write_timer_handler+0x156/0x3e0
Read of size 1 at addr ffff888111f322cd by task swapper/0/0
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc4-dirty #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1
Call Trace:
<IRQ>
dump_stack_lvl+0x68/0xa0
print_address_description.constprop.0+0x2c/0x3d0
print_report+0xb4/0x270
kasan_report+0xbd/0xf0
tcp_write_timer_handler+0x156/0x3e0
tcp_write_timer+0x66/0x170
call_timer_fn+0xfb/0x1d0
__run_timers+0x3f8/0x480
run_timer_softirq+0x9b/0x100
handle_softirqs+0x153/0x390
__irq_exit_rcu+0x103/0x120
irq_exit_rcu+0xe/0x20
sysvec_apic_timer_interrupt+0x76/0x90
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:default_idle+0xf/0x20
Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90
90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 33 f8 25 00 fb f4 <fa> c3 cc cc cc
cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
RSP: 0018:ffffffffa2007e28 EFLAGS: 00000242
RAX: 00000000000f3b31 RBX: 1ffffffff4400fc7 RCX: ffffffffa09c3196
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9f00590f
RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed102360835d
R10: ffff88811b041aeb R11: 0000000000000001 R12: 0000000000000000
R13: ffffffffa202d7c0 R14: 0000000000000000 R15: 00000000000147d0
default_idle_call+0x6b/0xa0
cpuidle_idle_call+0x1af/0x1f0
do_idle+0xbc/0x130
cpu_startup_entry+0x33/0x40
rest_init+0x11f/0x210
start_kernel+0x39a/0x420
x86_64_start_reservations+0x18/0x30
x86_64_start_kernel+0x97/0xa0
common_startup_64+0x13e/0x141
</TASK>
Allocated by task 595:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
__kasan_slab_alloc+0x87/0x90
kmem_cache_alloc_noprof+0x12b/0x3f0
copy_net_ns+0x94/0x380
create_new_namespaces+0x24c/0x500
unshare_nsproxy_namespaces+0x75/0xf0
ksys_unshare+0x24e/0x4f0
__x64_sys_unshare+0x1f/0x30
do_syscall_64+0x70/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Freed by task 100:
kasan_save_stack+0x24/0x50
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x54/0x70
kmem_cache_free+0x156/0x5d0
cleanup_net+0x5d3/0x670
process_one_work+0x776/0xa90
worker_thread+0x2e2/0x560
kthread+0x1a8/0x1f0
ret_from_fork+0x34/0x60
ret_from_fork_asm+0x1a/0x30
Reproduction script:
mkdir -p /mnt/nfsshare
mkdir -p /mnt/nfs/netns_1
mkfs.ext4 /dev/sdb
mount /dev/sdb /mnt/nfsshare
systemctl restart nfs-server
chmod 777 /mnt/nfsshare
exportfs -i -o rw,no_root_squash *:/mnt/nfsshare
ip netns add netns_1
ip link add name veth_1_peer type veth peer veth_1
ifconfig veth_1_peer 11.11.0.254 up
ip link set veth_1 netns netns_1
ip netns exec netns_1 ifconfig veth_1 11.11.0.1
ip netns exec netns_1 /root/iptables -A OUTPUT -d 11.11.0.254 -p tcp \
--tcp-flags FIN FIN -j DROP
(note: In my environment, a DESTROY_CLIENTID operation is always sent
immediately, breaking the nfs tcp connection.)
ip netns exec netns_1 timeout -s 9 300 mount -t nfs -o proto=tcp,vers=4.1 \
11.11.0.254:/mnt/nfsshare /mnt/nfs/netns_1
ip netns del netns_1
The reason here is that the tcp socket in netns_1 (nfs side) has been
shutdown and closed (done in xs_destroy), but the FIN message (with ack)
is discarded, and the nfsd side keeps sending retransmission messages.
As a result, when the tcp sock in netns_1 processes the received message,
it sends the message (FIN message) in the sending queue, and the tcp timer
is re-established. When the network namespace is deleted, the net structure
accessed by tcp's timer handler function causes problems.
To fix this problem, let's hold netns refcnt for the tcp kernel socket as
done in other modules. This is an ugly hack which can easily be backported
to earlier kernels. A proper fix which cleans up the interfaces will
follow, but may not be so easy to backport.
Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
Signed-off-by: Liu Jian <liujian56@huawei.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We've noticed a situation where an unstable TCP connection can cause the
TLS handshake to timeout waiting for userspace to complete it. When this
happens, we don't want to return from xs_tls_handshake_sync() with zero, as
this will cause the upper xprt to be set CONNECTED, and subsequent attempts
to transmit will be returned with -EPIPE. The sunrpc machine does not
recover from this situation and will spin attempting to transmit.
The return value of tls_handshake_cancel() can be used to detect a race
with completion:
* tls_handshake_cancel - cancel a pending handshake
* Return values:
* %true - Uncompleted handshake request was canceled
* %false - Handshake request already completed or not found
If true, we do not want the upper xprt to be connected, so return
-ETIMEDOUT. If false, its possible the handshake request was lost and
that may be the reason for our timeout. Again we do not want the upper
xprt to be connected, so return -ETIMEDOUT.
Ensure that we alway return an error from xs_tls_handshake_sync() if we
call tls_handshake_cancel().
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Since transport->sock has been set to NULL during reset transport,
XPRT_SOCK_UPD_TIMEOUT also needs to be cleared. Otherwise, the
xs_tcp_set_socket_timeouts() may be triggered in xs_tcp_send_request()
to dereference the transport->sock that has been set to NULL.
Fixes: 7196dbb02ea0 ("SUNRPC: Allow changing of the TCP timeout parameters on the fly")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Jeff Layton contributed a scalability improvement to NFSD's NFSv4
backchannel session implementation. This improvement is intended to
increase the rate at which NFSD can safely recall NFSv4 delegations
from clients, to avoid the need to revoke them. Revoking requires
a slow state recovery process.
A wide variety of bug fixes and other incremental improvements make
up the bulk of commits in this series. As always I am grateful to
the NFSD contributors, reviewers, testers, and bug reporters who
participated during this cycle.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmdEgLQACgkQM2qzM29m
f5cwmg/9HcfG7blepU/2qNHopzSYRO5vZw1YNJQ5/Wi3bmqIea83lf8OcCY1G/aj
6K+jnenzHrwfhaA4u7N2FPXPVl8sPSMuOrJXY5zC4yE5QnIbranjcyEW5l5zlj3n
ukkTYQgjUsKre3pHlvn3JmDHfUhNPEfzirsJeorP7DS3omne+OFA1LNncNP6emRu
h0aEC6EJ43zUkYiz9nZYqPwIAwrUIA0WOrvVnq7vsi6gR4/Muk7nS+X/y4qFjli3
9enVskEv8sFmmOAIMK3CHJq+exEeKtKEKUuYkD23QgPt2R4+IwqS70o9IM/S1ypf
APiv958BIhxm/SwUn1IjoxIckTB5EdksMxU5/4qGr1ZxprPG4/ruKO80BkrxLzW2
n1HmJ4ZNnpWPQvHN7RQ0WOsPNzL8byxJbGr1bpNgU4AGXnTFWPrAnB6juiyX4xb+
YNfgkQGDY79o7r1OJ5UUdCyx0QBSnaLNACTGm2u2FpI/ukMFPdrWIE99QbBgSe1p
MgWaiPwSY+9crFfGPJeQ4t6/siRAec6L3RO9KT9Epcd2S7/Uts3NXYRdJfwZ+Qza
TkPY2bm7T/WCcMhW7DN372hqgfRHPWOf4tacJ1Tob+As1d6p6qXEX2zi6piCCOLj
dmTVDSVPClRXt8YigF9WqosyWv1jUzSnh9ne+eYPBpj93Ag2YBY=
=wBvS
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"Jeff Layton contributed a scalability improvement to NFSD's NFSv4
backchannel session implementation. This improvement is intended to
increase the rate at which NFSD can safely recall NFSv4 delegations
from clients, to avoid the need to revoke them. Revoking requires a
slow state recovery process.
A wide variety of bug fixes and other incremental improvements make up
the bulk of commits in this series. As always I am grateful to the
NFSD contributors, reviewers, testers, and bug reporters who
participated during this cycle"
* tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (72 commits)
nfsd: allow for up to 32 callback session slots
nfs_common: must not hold RCU while calling nfsd_file_put_local
nfsd: get rid of include ../internal.h
nfsd: fix nfs4_openowner leak when concurrent nfsd4_open occur
NFSD: Add nfsd4_copy time-to-live
NFSD: Add a laundromat reaper for async copy state
NFSD: Block DESTROY_CLIENTID only when there are ongoing async COPY operations
NFSD: Handle an NFS4ERR_DELAY response to CB_OFFLOAD
NFSD: Free async copy information in nfsd4_cb_offload_release()
NFSD: Fix nfsd4_shutdown_copy()
NFSD: Add a tracepoint to record canceled async COPY operations
nfsd: make nfsd4_session->se_flags a bool
nfsd: remove nfsd4_session->se_bchannel
nfsd: make use of warning provided by refcount_t
nfsd: Don't fail OP_SETCLIENTID when there are too many clients.
svcrdma: fix miss destroy percpu_counter in svc_rdma_proc_init()
xdrgen: Remove program_stat_to_errno() call sites
xdrgen: Update the files included in client-side source code
xdrgen: Remove check for "nfs_ok" in C templates
xdrgen: Remove tracepoint call site
...
There's issue as follows:
RPC: Registered rdma transport module.
RPC: Registered rdma backchannel transport module.
RPC: Unregistered rdma transport module.
RPC: Unregistered rdma backchannel transport module.
BUG: unable to handle page fault for address: fffffbfff80c609a
PGD 123fee067 P4D 123fee067 PUD 123fea067 PMD 10c624067 PTE 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
RIP: 0010:percpu_counter_destroy_many+0xf7/0x2a0
Call Trace:
<TASK>
__die+0x1f/0x70
page_fault_oops+0x2cd/0x860
spurious_kernel_fault+0x36/0x450
do_kern_addr_fault+0xca/0x100
exc_page_fault+0x128/0x150
asm_exc_page_fault+0x26/0x30
percpu_counter_destroy_many+0xf7/0x2a0
mmdrop+0x209/0x350
finish_task_switch.isra.0+0x481/0x840
schedule_tail+0xe/0xd0
ret_from_fork+0x23/0x80
ret_from_fork_asm+0x1a/0x30
</TASK>
If register_sysctl() return NULL, then svc_rdma_proc_cleanup() will not
destroy the percpu counters which init in svc_rdma_proc_init().
If CONFIG_HOTPLUG_CPU is enabled, residual nodes may be in the
'percpu_counters' list. The above issue may occur once the module is
removed. If the CONFIG_HOTPLUG_CPU configuration is not enabled, memory
leakage occurs.
To solve above issue just destroy all percpu counters when
register_sysctl() return NULL.
Fixes: 1e7e55731628 ("svcrdma: Restore read and write stats")
Fixes: 22df5a22462e ("svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter")
Fixes: df971cd853c0 ("svcrdma: Convert rdma_stat_recv to a per-CPU counter")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The function `c_show` was called with protection from RCU. This only
ensures that `cp` will not be freed. Therefore, the reference count for
`cp` can drop to zero, which will trigger a refcount use-after-free
warning when `cache_get` is called. To resolve this issue, use
`cache_get_rcu` to ensure that `cp` remains active.
------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 7 PID: 822 at lib/refcount.c:25
refcount_warn_saturate+0xb1/0x120
CPU: 7 UID: 0 PID: 822 Comm: cat Not tainted 6.12.0-rc3+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.16.1-2.fc37 04/01/2014
RIP: 0010:refcount_warn_saturate+0xb1/0x120
Call Trace:
<TASK>
c_show+0x2fc/0x380 [sunrpc]
seq_read_iter+0x589/0x770
seq_read+0x1e5/0x270
proc_reg_read+0xe1/0x140
vfs_read+0x125/0x530
ksys_read+0xc1/0x160
do_syscall_64+0x5f/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Use appropriate frag_page API instead of caller accessing
'page_frag_cache' directly.
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Linux-MM <linux-mm@kvack.org>
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Link: https://patch.msgid.link/20241028115343.3405838-5-linyunsheng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>