mirror of
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/
synced 2025-04-19 20:58:31 +09:00
47795 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
fb6d03238e |
tracing: Freeable reserved ring buffer
Make the ring buffer on reserved memory to be freeable. This allows us to free the trace instance on the reserved memory without changing cmdline and rebooting. Even if we can not change the kernel cmdline for security reason, we can release the reserved memory for the ring buffer as free (available) memory. For example, boot kernel with reserved memory; "reserve_mem=20M:2M:trace trace_instance=boot_mapped^traceoff@trace" ~ # free total used free shared buff/cache available Mem: 1995548 50544 1927568 14964 17436 1911480 Swap: 0 0 0 ~ # rmdir /sys/kernel/tracing/instances/boot_mapped/ [ 23.704023] Freeing reserve_mem:trace memory: 20476K ~ # free total used free shared buff/cache available Mem: 2016024 41844 1956740 14968 17440 1940572 Swap: 0 0 0 Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mike Rapoport <rppt@kernel.org> Link: https://lore.kernel.org/173989134814.230693.18199312930337815629.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
5f3719f697 |
tracing: Update modules to persistent instances when loaded
When a module is loaded and a persistent buffer is actively tracing, add it to the list of modules in the persistent memory. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250305164609.469844721@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
1bd25a6f71 |
tracing: Show module names and addresses of last boot
Add the last boot module's names and addresses to the last_boot_info file. This only shows the module information from a previous boot. If the buffer is started and is recording the current boot, this file still will only show "current". ~# cat instances/boot_mapped/last_boot_info 10c00000 [kernel] ffffffffc00ca000 usb_serial_simple ffffffffc00ae000 usbserial ffffffffc008b000 bfq ~# echo function > instances/boot_mapped/current_tracer ~# cat instances/boot_mapped/last_boot_info # Current Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250305164609.299186021@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
fd39e48fe8 |
tracing: Have persistent trace instances save module addresses
For trace instances that are mapped to persistent memory, have them use the scratch area to save the currently loaded modules. This will allow where the modules have been loaded on the next boot so that their addresses can be deciphered by using where they were loaded previously. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250305164609.129741650@goodmis.org Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
966b7d0e52 |
module: Add module_for_each_mod() function
The tracing system needs a way to save all the currently loaded modules and their addresses into persistent memory so that it can evaluate the addresses on a reboot from a crash. When the persistent memory trace starts, it will load the module addresses and names into the persistent memory. To do so, it will call the module_for_each_mod() function and pass it a function and data structure to get called on each loaded module. Then it can record the memory. This only implements that function. Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: linux-modules@vger.kernel.org Link: https://lore.kernel.org/20250305164608.962615966@goodmis.org Acked-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
b65334825f |
tracing: Have persistent trace instances save KASLR offset
There's no reason to save the KASLR offset for the ring buffer itself. That is used by the tracer. Now that the tracer has a way to save data in the persistent memory of the ring buffer, have the tracing infrastructure take care of the saving of the KASLR offset. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250305164608.792722274@goodmis.org Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
4af0a9c518 |
ring-buffer: Add ring_buffer_meta_scratch()
Now that there's one meta data at the start of the persistent memory used by the ring buffer, allow the caller to request some memory right after that data that it can use as its own persistent memory. Also fix some white space issues with ring_buffer_alloc(). Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250305164608.619631731@goodmis.org Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
4009cc31e7 |
ring-buffer: Add buffer meta data for persistent ring buffer
Instead of just having a meta data at the first page of each sub buffer that has duplicate data, add a new meta page to the entire block of memory that holds the duplicate data and remove it from the sub buffer meta data. This will open up the extra memory in this first page to be used by the tracer for its own persistent data. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250305164608.446351513@goodmis.org Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
bcba8d4dbe |
ring-buffer: Use kaslr address instead of text delta
Instead of saving off the text and data pointers and using them to compare with the current boot's text and data pointers, just save off the KASLR offset. Then that can be used to figure out how to read the previous boots buffer. The last_boot_info will now show this offset, but only if it is for a previous boot: ~# cat instances/boot_mapped/last_boot_info 39000000 [kernel] ~# echo function > instances/boot_mapped/current_tracer ~# cat instances/boot_mapped/last_boot_info # Current If the KASLR offset saved is for the current boot, the last_boot_info will show the value of "current". Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250305164608.274956504@goodmis.org Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
c73f0b6964 |
ring-buffer: Fix bytes_dropped calculation issue
The calculation of bytes-dropped and bytes_dropped_nested is reversed. Although it does not affect the final calculation of total_dropped, it should still be modified. Link: https://lore.kernel.org/20250223070106.6781-1-yangfeng59949@163.com Fixes: 6c43e554a2a5 ("ring-buffer: Add ring buffer startup selftest") Signed-off-by: Feng Yang <yangfeng@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
196a062641 |
tracing: Mark binary printing functions with __printf() attribute
Binary printing functions are using printf() type of format, and compiler is not happy about them as is: kernel/trace/trace.c:3292:9: error: function ‘trace_vbprintk’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format] kernel/trace/trace_seq.c:182:9: error: function ‘trace_seq_bprintf’ might be a candidate for ‘gnu_printf’ format attribute [-Werror=suggest-attribute=format] Fix the compilation errors by adding __printf() attribute. While at it, move existing __printf() attributes from the implementations to the declarations. IT also fixes incorrect attribute parameters that are used for trace_array_printk(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Kees Cook <kees@kernel.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20250321144822.324050-4-andriy.shevchenko@linux.intel.com Signed-off-by: Petr Mladek <pmladek@suse.com> |
||
![]() |
7b667acd69 |
powerpc updates for 6.15
- Removal of support for IBM Cell Blades - SMP support for microwatt platform - Support for inline static calls on PPC32 - Enable pmu selftests for power11 platform - Enable hardware trace macro (HTM) hcall support - Support for limited address mode capability - Changes to RMA size from 512 MB to 768 MB to handle fadump - Misc fixes and cleanups Thanks to: Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann, Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom, Gaurav Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook, Mahesh Salgaonkar, Michael Ellerman, Paul Mackerras, Ritesh Harjani (IBM), Sathvika Vasireddy, Segher Boessenkool, Sourabh Jain, Vaibhav Jain, Venkat Rao Bagalkote. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqX2DNAOgU8sBX3pRpnEsdPSHZJQFAmfjZnYACgkQpnEsdPSH ZJTmKg/+NGW7wyFY9d8Iai9ncYY7GSzsMSDTaan7qg0QWOd5gjHbsdbava7TM/DW 8p9XsC+17kSeftRNUjtc52bSN8Ei2gBdsXIagQG1alfB2X2e6wkNauifK+dz3Su6 usMEZZTO5R/jFDotFXNM1nsUj+8dvjnPgOUrji/P8k7PT5295wpza0hz1fy5SrOA hM5cliBP36UgFe5Efvgm4OUX2gQIhbc3stt9MVfymW/k0Mit5f41UIPuVGiTWowY s0cUJGkhxUlGXT3VfOVKuZfn4u9KMha7UCl9afSceJzXOdnUIKIbskui1VEv6cD/ iSIxi839uErAobFHlsLYprgYFciYLII3xe2qNZCA/ZxeIMS/Mm6xokESeWLhBnfa P7ke6l0z3GDtTvgI2eSeU9BdrVveF1NgbP9GYSKgT6gtw/kRRnxgHF8tzmLON5PT KXpQlzz8VuSBRtF2jnLFU89+FFwSA1bRUhDrp89HyYFqw1B5g4N7kFFTUJWHOuKS fwPGy+cveKehmCUBedeTRFqHvvqdwpD/WnPlQzCly3WxqdL8U/eTXYftMiAwuK28 ovLuSs3vRThKRQ8DnUa5oB0UGsjMpRV5LdvYkhw+x8mZKUR59oj4fx2ae4TtPakg dbAYuPPkCORdaSga/nV6vQgsLprFpcGX3dq6E19+BVBAY5D+1PE= =GFcj -----END PGP SIGNATURE----- Merge tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Madhavan Srinivasan: - Remove support for IBM Cell Blades - SMP support for microwatt platform - Support for inline static calls on PPC32 - Enable pmu selftests for power11 platform - Enable hardware trace macro (HTM) hcall support - Support for limited address mode capability - Changes to RMA size from 512 MB to 768 MB to handle fadump - Misc fixes and cleanups Thanks to Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann, Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom, Gaurav Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook, Mahesh Salgaonkar, Michael Ellerman, Paul Mackerras, Ritesh Harjani (IBM), Sathvika Vasireddy, Segher Boessenkool, Sourabh Jain, Vaibhav Jain, and Venkat Rao Bagalkote. * tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (61 commits) powerpc/kexec: fix physical address calculation in clear_utlb_entry() crypto: powerpc: Mark ghashp8-ppc.o as an OBJECT_FILES_NON_STANDARD powerpc: Fix 'intra_function_call not a direct call' warning powerpc/perf: Fix ref-counting on the PMU 'vpa_pmu' KVM: PPC: Enable CAP_SPAPR_TCE_VFIO on pSeries KVM guests powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7 powerpc/microwatt: Add SMP support powerpc: Define config option for processors with broadcast TLBIE powerpc/microwatt: Define an idle power-save function powerpc/microwatt: Device-tree updates powerpc/microwatt: Select COMMON_CLK in order to get the clock framework net: toshiba: Remove reference to PPC_IBM_CELL_BLADE net: spider_net: Remove powerpc Cell driver cpufreq: ppc_cbe: Remove powerpc Cell driver genirq: Remove IRQ_EDGE_EOI_HANDLER docs: Remove reference to removed CBE_CPUFREQ_SPU_GOVERNOR powerpc: Remove UDBG_RTAS_CONSOLE powerpc/io: Use standard barrier macros in io.c powerpc/io: Rename _insw_ns() etc. powerpc/io: Use generic raw accessors ... |
||
![]() |
a7e135fe59 |
Probes updates for v6.15:
- probe-events: Add comments about entry data storing code to clarify where and how the entry data is stored for function return events. - probe-events: Log error for exceeding the number of arguments to help user to identify error reason via tracefs/error_log file. - selftests/ftrace: Improve the ftracetest to add followngs. . Expand the tprobe event test to check if it can correctly find the wrong format tracepoint name. . Add new syntax error test to check whether error_log correctly indicates a wrong character in the tracepoint name. . Add a new dynamic events argument limitation test case which checks max number of probe arguments. -----BEGIN PGP SIGNATURE----- iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmflTlobHG1hc2FtaS5o aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8ba9UIALzzZQxzUhJYO/B/XaCz KTuSrU2594cLr3nCrmCfL3UHUCr5IcjXKCfUdrgzE9mckWF+nVRXWwbp29KpOQky fzU9Ardbr7ksGAYFk4My+P/BeYa7vh9LwofXzWlJibANVxWvq66+GfKnWnh1P8Bl /zjov61DAEQmDfXNGZ2oTmKnYYMoPkJU4voRvAEUgiP02SrF04UUa00uC7hmZJJV aI5b6TatE5FDEmAoHWh0cf04HoyevRyLVOQd5GSswH/oWpyhs90M7a9WENHamBx6 RXEZ3xYyuH9zBSvYVeRn2H1eHnGCR4RMctZiRGXF8EdLWo7RXRQ1Hp9CgvwDrU8Z IL0= =vZ3e -----END PGP SIGNATURE----- Merge tag 'probes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes updates from Masami Hiramatsu: - probe-events: Add comments about entry data storing code to clarify where and how the entry data is stored for function return events. - probe-events: Log error for exceeding the number of arguments to help user to identify error reason via tracefs/error_log file. - Improve the ftracetest selftests: - Expand the tprobe event test to check if it can correctly find the wrong format tracepoint name. - Add new syntax error test to check whether error_log correctly indicates a wrong character in the tracepoint name. - Add a new dynamic events argument limitation test case which checks max number of probe arguments. * tag 'probes-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: probe-events: Add comments about entry data storing code selftests/ftrace: Add dynamic events argument limitation test case selftests/ftrace: Add new syntax error test selftests/ftrace: Expand the tprobe event test to check wrong format tracing: probe-events: Log error for exceeding the number of arguments |
||
![]() |
dcf9f31c62 |
Livepatching changes for 6.15
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmflMGMACgkQUqAMR0iA lPKHhg/8DjxkQ3tvUpW53pfABBdKELJWOl4hH+IMvGcdh2dZrcFf6OkEB4JeXXVn duSODaefalrdU0jV21LChYpEkG40wZqStmamH9im+VbyCc8JwMwvWckrw45RNmjO AC4oVh0K9SJcrTj7XWsFSeS38KmQKhtp2zTHpSqCW9FZkfn8si11E8Xzue2kTtEH 7CESq6N0SMahVfiNkLI1n0x0mUyWiTE43hQ3e+tmmBgtmv3gU4FpQqeTGc/MQUxy B6AaI1vChovLmfNhw4KM8GvyPCTIn4xRQ09WXZRbK2ZzAxgFfm7QXGL3E93ra929 juSoIWnNwzw81RB3apzclOHFTbctHJWL+85KEefqCjrCuEbaoLNGoLkVqNuj0U+D LmLqmm6NfEtPzDVBY232KepjFUoIb890eKRF3Wq1NnHonaVxmUSMEMRSdlKeHpSo uImBhr5XhXDJLHt4+lAa4m0JSm3RH2AK83xIrdJq/YUcKscCzJMHLqqsp8pCa7mb L7iOhHFKs2r6GJCt5kLEw1jkNEg5Lpk08fY5AekjScvNcxXWXF8E8Jl9DnBBoz+5 e15mvqsiQtcAGHyR1Ohmw8UPYnWY2fSZTmru9aClwHCvC0q+cyCxIZgcqLxFYFIr TKIgQ6zSepA+j7SzI1SaDY4Z9cNWMTU+9GqrNO3xO4ddpKrp5C8= =ggzJ -----END PGP SIGNATURE----- Merge tag 'livepatching-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching Pull livepatching updates from Petr Mladek: - Add a selftest for tracing of a livepatched function - Skip a selftest when kprobes are not using ftrace - Some documentation clean up * tag 'livepatching-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching: selftests: livepatch: test if ftrace can trace a livepatched function selftests: livepatch: add new ftrace helpers functions selftest/livepatch: Only run test-kprobe with CONFIG_KPROBES_ON_FTRACE docs: livepatch: move text out of code block livepatch: Add comment to clarify klp_add_nops() |
||
![]() |
96050814a3 |
printk changes for 6.15
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmflJC4ACgkQUqAMR0iA lPIEMBAAsIJmQPHdZUQ49tkMGMoQPU9gnrrDZbRzN9IWwMQQymwEobSEkJkrx7O3 MFQHc5Z29cAxDqrWDCG5hrH0P3rd994pLPgC/XZC9+UbksYIkIrudCqaPaekLVaw BQmBZnZRe7PltOiDpx/TxQ8WQgNlAmZpLpIRTB3ePHZpikq/mf4CGcoYXkrNij1j E7uzurLZssJwXasRkWOVNvaOFyVcxrjn78H0JoT7mVeh2IUkWyMALX4IyDeD6Uvh SCe19/6W3Uyeh/jpJ6+gjuirVy/Vam/2GXeer4yKpgSQBtsqIaGyki4Cc6Xy12gU 71nRPTggysZXNnqZ40qw68Irb9JGAo63qFDwWf3OQI+yQQWHyGZIinxr+2xupwWF ZgmDJtvUsWKHT4+WmB9+MVTGW7Jy8wKq2u/Haf8xCJxs2yhI0J8iUvQi3CPAMdRH Q8G9UJP29Lfj0hdKVT2/wVNGxZKIZ8c6v1vR6Rd1hJ0Zqp4Oj+XGr0btE8Ah5LYv GI/M00LCdTNT2kBbbw8lekovUIs2stiRl6hfKoQFR11oiZwFDUePLDdJH2GiJAZm g9fPs6YuBERfPtzUeo9a9UbSd6oPTLgLNBEklpHUTWilzO8E/Jh+ZXyvQwj7eXyq Hi4OnShn1uxL68o6DJfKsF2qW8S2U5g61JrU3D1CaQ98Derrp0o= =lNTZ -----END PGP SIGNATURE----- Merge tag 'printk-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - New option "printk.debug_non_panic_cpus" allows to store printk messages from non-panic CPUs during panic. It might be useful when panic() fails. It is disabled by default because it increases the chance to see the messages printed before panic() and on the panic-CPU. - New build option "CONFIG_NULL_TTY_DEFAULT_CONSOLE" allows to build kernel without the virtual terminal support which prefers ttynull over serial console. - Do not unblank suspended consoles. - Some code clean up. * tag 'printk-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk/panic: Add option to allow non-panic CPUs to write to the ring buffer. printk: Add an option to allow ttynull to be a default console device printk: Check CON_SUSPEND when unblanking a console printk: Rename console_start to console_resume printk: Rename console_stop to console_suspend printk: Rename resume_console to console_resume_all printk: Rename suspend_console to console_suspend_all |
||
![]() |
744fab2d9f |
tracing updates for v6.15:
- Add option traceoff_after_boot In order to debug kernel boot, it sometimes is helpful to enable tracing via the kernel command line. Unfortunately, by the time the login prompt appears, the trace is overwritten by the init process and other user space start up applications. Adding a "traceoff_after_boot" will disable tracing when the kernel passes control to init which will allow developers to be able to see the traces that occurred during boot. - Clean up the mmflags macros that display the GFP flags in trace events The macros to print the GFP flags for trace events had a bit of duplication. The code was restructured to remove duplication and in the process it also adds some flags that were missed before. - Removed some dead code and scripts/draw_functrace.py draw_functrace.py hasn't worked in years and as nobody complained about it, remove it. - Constify struct event_trigger_ops The event_trigger_ops is just a structure that has function pointers that are assigned when the variables are created. These variables should all be constants. - Other minor clean ups and fixes -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+V9IhQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qr4RAP9JhE3n69pGuOVaJTN/LGLr2Axl59n4 KqZSZS1nUM76/gD6AxYpR7nxyxgJ7VjNkLptS9tSjJVdPDxGAl0v3eO04w4= =SU30 -----END PGP SIGNATURE----- Merge tag 'trace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Add option traceoff_after_boot In order to debug kernel boot, it sometimes is helpful to enable tracing via the kernel command line. Unfortunately, by the time the login prompt appears, the trace is overwritten by the init process and other user space start up applications. Adding a "traceoff_after_boot" will disable tracing when the kernel passes control to init which will allow developers to be able to see the traces that occurred during boot. - Clean up the mmflags macros that display the GFP flags in trace events The macros to print the GFP flags for trace events had a bit of duplication. The code was restructured to remove duplication and in the process it also adds some flags that were missed before. - Removed some dead code and scripts/draw_functrace.py draw_functrace.py hasn't worked in years and as nobody complained about it, remove it. - Constify struct event_trigger_ops The event_trigger_ops is just a structure that has function pointers that are assigned when the variables are created. These variables should all be constants. - Other minor clean ups and fixes * tag 'trace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Replace strncpy with memcpy for fixed-length substring copy tracing: Fix synth event printk format for str fields tracing: Do not use PERF enums when perf is not defined tracing: Ensure module defining synth event cannot be unloaded while tracing tracing: fix return value in __ftrace_event_enable_disable for TRACE_REG_UNREGISTER tracing/osnoise: Fix possible recursive locking for cpus_read_lock() tracing: Align synth event print fmt tracing: gfp: vsprintf: Do not print "none" when using %pGg printf format tracepoint: Print the function symbol when tracepoint_debug is set tracing: Constify struct event_trigger_ops scripts/tracing: Remove scripts/tracing/draw_functrace.py tracing: Update MAINTAINERS file to include tracepoint.c tracing/user_events: Slightly simplify user_seq_show() tracing/user_events: Don't use %pK through printk tracing: gfp: Remove duplication of recording GFP flags tracing: Remove orphaned event_trace_printk ring-buffer: Fix typo in comment about header page pointer tracing: Add traceoff_after_boot option |
||
![]() |
88221ac0d5 |
Latency tracing changes for v6.15:
- Add some trace events to osnoise and timerlat sample generation This adds more information to the osnoise and timerlat tracers as well as allows BPF programs to be attached to these locations to extract even more data. - Fix to DECLARE_TRACE_CONDITION() macro It wasn't used but now will be and it happened to be broken causing the build to fail. - Add scheduler specification monitors to runtime verifier (RV) This is a continuation of Daniel Bristot's work. RV allows monitors to run and react concurrently. Running the cumulative model is equivalent to running single components using the same reactors, with the advantage that it's easier to point out which specification failed in case of error. This update introduces nested monitors to RV, in short, the sysfs monitor folder will contain a monitor named sched, which is nothing but an empty container for other monitors. Controlling the sched monitor (enable, disable, set reactors) controls all nested monitors. The following scheduling monitors are added: * sco: scheduling context operations Monitor to ensure sched_set_state happens only in thread context * tss: task switch while scheduling Monitor to ensure sched_switch happens only in scheduling context * snroc: set non runnable on its own context Monitor to ensure set_state happens only in the respective task's context * scpd: schedule called with preemption disabled Monitor to ensure schedule is called with preemption disabled * snep: schedule does not enable preempt Monitor to ensure schedule does not enable preempt * sncid: schedule not called with interrupt disabled Monitor to ensure schedule is not called with interrupt disabled -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+QhuxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qg62AP9bkeNDbiCuAqjZGddV09Hw26wC3yum kQoNSebD8G52rQEA9GDjK37xGzYwW/fJokhJVTV39qfub6inAJE5dS6WeQY= =8Ikv -----END PGP SIGNATURE----- Merge tag 'trace-latency-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull latency tracing updates from Steven Rostedt: - Add some trace events to osnoise and timerlat sample generation This adds more information to the osnoise and timerlat tracers as well as allows BPF programs to be attached to these locations to extract even more data. - Fix to DECLARE_TRACE_CONDITION() macro It wasn't used but now will be and it happened to be broken causing the build to fail. - Add scheduler specification monitors to runtime verifier (RV) This is a continuation of Daniel Bristot's work. RV allows monitors to run and react concurrently. Running the cumulative model is equivalent to running single components using the same reactors, with the advantage that it's easier to point out which specification failed in case of error. This update introduces nested monitors to RV, in short, the sysfs monitor folder will contain a monitor named sched, which is nothing but an empty container for other monitors. Controlling the sched monitor (enable, disable, set reactors) controls all nested monitors. The following scheduling monitors are added: - sco: scheduling context operations Monitor to ensure sched_set_state happens only in thread context - tss: task switch while scheduling Monitor to ensure sched_switch happens only in scheduling context - snroc: set non runnable on its own context Monitor to ensure set_state happens only in the respective task's context - scpd: schedule called with preemption disabled Monitor to ensure schedule is called with preemption disabled - snep: schedule does not enable preempt Monitor to ensure schedule does not enable preempt - sncid: schedule not called with interrupt disabled Monitor to ensure schedule is not called with interrupt disabled * tag 'trace-latency-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tools/rv: Allow rv list to filter for container Documentation/rv: Add docs for the sched monitors verification/dot2k: Add support for nested monitors tools/rv: Add support for nested monitors rv: Add scpd, snep and sncid per-cpu monitors rv: Add snroc per-task monitor rv: Add sco and tss per-cpu monitors rv: Add option for nested monitors and include sched sched: Add sched tracepoints for RV task model rv: Add license identifiers to monitor files tracing: Fix DECLARE_TRACE_CONDITION trace/osnoise: Add trace events for samples |
||
![]() |
31eb415bf6 |
ftrace changes for v6.15:
- Record function parameters for function and function graph tracers An option has been added to function tracer (func-args) and the function graph tracer (funcgraph-args) that when set, the tracers will record the registers that hold the arguments into each function event. On reading of the trace, it will use BTF to print those arguments. Most archs support up to 6 arguments (depending on the complexity of the arguments) and those are printed. If a function has more arguments then what was recorded, the output will end with " ... )". Example of function graph tracer: 6) | dummy_xmit [dummy](skb = 0x8887c100, dev = 0x872ca000) { 6) | consume_skb(skb = 0x8887c100) { 6) | skb_release_head_state(skb = 0x8887c100) { 6) 0.178 us | sock_wfree(skb = 0x8887c100) 6) 0.627 us | } - The rest of the changes are minor clean ups and fixes -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+M9vRQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qrBGAP9NTn4Lpci69n8FJGfz+UKUEKbzOOeg QrkIZIxbm8N56gEA0RaUionWjt1znZXUdBTsE3h+en/Ik0//VZytQcmJBwM= =fxuG -----END PGP SIGNATURE----- Merge tag 'ftrace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace updates from Steven Rostedt: - Record function parameters for function and function graph tracers An option has been added to function tracer (func-args) and the function graph tracer (funcgraph-args) that when set, the tracers will record the registers that hold the arguments into each function event. On reading of the trace, it will use BTF to print those arguments. Most archs support up to 6 arguments (depending on the complexity of the arguments) and those are printed. If a function has more arguments then what was recorded, the output will end with " ... )". Example of function graph tracer: 6) | dummy_xmit [dummy](skb = 0x8887c100, dev = 0x872ca000) { 6) | consume_skb(skb = 0x8887c100) { 6) | skb_release_head_state(skb = 0x8887c100) { 6) 0.178 us | sock_wfree(skb = 0x8887c100) 6) 0.627 us | } - The rest of the changes are minor clean ups and fixes * tag 'ftrace-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Use hashtable.h for event_hash tracing: Fix use-after-free in print_graph_function_flags during tracer switching function_graph: Remove the unused variable func ftrace: Add arguments to function tracer ftrace: Have funcgraph-args take affect during tracing ftrace: Add support for function argument to graph tracer ftrace: Add print_function_args() ftrace: Have ftrace_free_filter() WARN and exit if ops is active fgraph: Correct typo in ftrace_return_to_handler comment |
||
![]() |
dd161f74f8 |
tracing and sorttable updates for 6.15:
- Implement arm64 build time sorting of the mcount location table When gcc is used to build arm64, the mcount_loc section is all zeros in the vmlinux elf file. The addresses are stored in the Elf_Rela location. To sort at build time, an array is allocated and the addresses are added to it via the content of the mcount_loc section as well as he Elf_Rela data. After sorting, the information is put back into the Elf_Rela which now has the section sorted. - Make sorting of mcount location table for arm64 work with clang as well When clang is used, the mcount_loc section contains the addresses, unlike the gcc build. An array is still created and the sorting works for both methods. - Remove weak functions from the mcount_loc section Have the sorttable code pass in the data of functions defined via nm -S which shows the functions as well as their sizes. Using this information the sorttable code can determine if a function in the mcount_loc section was weak and overridden. If the function is not found, it is set to be zero. On boot, when the mcount_loc section is read and the ftrace table is created, if the address in the mcount_loc is not in the kernel core text then it is removed and not added to the ftrace_filter_functions (the functions that can be attached by ftrace callbacks). - Update and fix the reporting of how much data is used for ftrace functions On boot, a report of how many pages were used by the ftrace table as well as how they were grouped (the table holds a list of sections that are groups of pages that were able to be allocated). The removing of the weak functions required the accounting to be updated. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+MnThQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qivsAQDhPOCaONai7rvHX9T1aOHGjdajZ7SI qoZgBOsc2ZUkoQD/U2M/m7Yof9aR4I+VFKtT5NsAwpfqPSOL/t/1j6UEOQ8= =45AV -----END PGP SIGNATURE----- Merge tag 'trace-sorttable-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing / sorttable updates from Steven Rostedt: - Implement arm64 build time sorting of the mcount location table When gcc is used to build arm64, the mcount_loc section is all zeros in the vmlinux elf file. The addresses are stored in the Elf_Rela location. To sort at build time, an array is allocated and the addresses are added to it via the content of the mcount_loc section as well as he Elf_Rela data. After sorting, the information is put back into the Elf_Rela which now has the section sorted. - Make sorting of mcount location table for arm64 work with clang as well When clang is used, the mcount_loc section contains the addresses, unlike the gcc build. An array is still created and the sorting works for both methods. - Remove weak functions from the mcount_loc section Have the sorttable code pass in the data of functions defined via 'nm -S' which shows the functions as well as their sizes. Using this information the sorttable code can determine if a function in the mcount_loc section was weak and overridden. If the function is not found, it is set to be zero. On boot, when the mcount_loc section is read and the ftrace table is created, if the address in the mcount_loc is not in the kernel core text then it is removed and not added to the ftrace_filter_functions (the functions that can be attached by ftrace callbacks). - Update and fix the reporting of how much data is used for ftrace functions On boot, a report of how many pages were used by the ftrace table as well as how they were grouped (the table holds a list of sections that are groups of pages that were able to be allocated). The removing of the weak functions required the accounting to be updated. * tag 'trace-sorttable-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: scripts/sorttable: Allow matches to functions before function entry scripts/sorttable: Use normal sort if theres no relocs in the mcount section ftrace: Check against is_kernel_text() instead of kaslr_offset() ftrace: Test mcount_loc addr before calling ftrace_call_addr() ftrace: Have ftrace pages output reflect freed pages ftrace: Update the mcount_loc check of skipped entries scripts/sorttable: Zero out weak functions in mcount_loc table scripts/sorttable: Always use an array for the mcount_loc sorting scripts/sorttable: Have mcount rela sort use direct values arm64: scripts/sorttable: Implement sorting mcount_loc at boot for arm64 |
||
![]() |
bb9c6020f4 |
tracing: probe-events: Add comments about entry data storing code
Add comments about entry data storing code to __store_entry_arg() and traceprobe_get_entry_data_size(). These are a bit complicated because of building the entry data storing code and scanning it. This just add comments, no behavior change. Link: https://lore.kernel.org/all/174061715004.501424.333819546601401102.stgit@devnote2/ Reported-by: Steven Rostedt <rostedt@goodmis.org> Closes: https://lore.kernel.org/all/20250226102223.586d7119@gandalf.local.home/ Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> |
||
![]() |
57faaa0480 |
tracing: probe-events: Log error for exceeding the number of arguments
Add error message when the number of arguments exceeds the limitation. Link: https://lore.kernel.org/all/174055075075.4079315.10916648136898316476.stgit@mhiramat.tok.corp.google.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
f49040c7aa | Merge branch 'for-6.15-console-suspend-api-cleanup' into for-linus | ||
![]() |
495f53d5cc |
locking/lockdep: Decrease nr_unused_locks if lock unused in zap_class()
Currently, when a lock class is allocated, nr_unused_locks will be increased by 1, until it gets used: nr_unused_locks will be decreased by 1 in mark_lock(). However, one scenario is missed: a lock class may be zapped without even being used once. This could result into a situation that nr_unused_locks != 0 but no unused lock class is active in the system, and when `cat /proc/lockdep_stats`, a WARN_ON() will be triggered in a CONFIG_DEBUG_LOCKDEP=y kernel: [...] DEBUG_LOCKS_WARN_ON(debug_atomic_read(nr_unused_locks) != nr_unused) [...] WARNING: CPU: 41 PID: 1121 at kernel/locking/lockdep_proc.c:283 lockdep_stats_show+0xba9/0xbd0 And as a result, lockdep will be disabled after this. Therefore, nr_unused_locks needs to be accounted correctly at zap_class() time. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250326180831.510348-1-boqun.feng@gmail.com |
||
![]() |
1a9239bb42 |
Networking changes for 6.15.
Core & protocols ---------------- - Continue Netlink conversions to per-namespace RTNL lock (IPv4 routing, routing rules, routing next hops, ARP ioctls). - Continue extending the use of netdev instance locks. As a driver opt-in protect queue operations and (in due course) ethtool operations with the instance lock and not RTNL lock. - Support collecting TCP timestamps (data submitted, sent, acked) in BPF, allowing for transparent (to the application) and lower overhead tracking of TCP RPC performance. - Tweak existing networking Rx zero-copy infra to support zero-copy Rx via io_uring. - Optimize MPTCP performance in single subflow mode by 29%. - Enable GRO on packets which went thru XDP CPU redirect (were queued for processing on a different CPU). Improving TCP stream performance up to 2x. - Improve performance of contended connect() by 200% by searching for an available 4-tuple under RCU rather than a spin lock. Bring an additional 229% improvement by tweaking hash distribution. - Avoid unconditionally touching sk_tsflags on RX, improving performance under UDP flood by as much as 10%. - Avoid skb_clone() dance in ping_rcv() to improve performance under ping flood. - Avoid FIB lookup in netfilter if socket is available, 20% perf win. - Rework network device creation (in-kernel) API to more clearly identify network namespaces and their roles. There are up to 4 namespace roles but we used to have just 2 netns pointer arguments, interpreted differently based on context. - Use sysfs_break_active_protection() instead of trylock to avoid deadlocks between unregistering objects and sysfs access. - Add a new sysctl and sockopt for capping max retransmit timeout in TCP. - Support masking port and DSCP in routing rule matches. - Support dumping IPv4 multicast addresses with RTM_GETMULTICAST. - Support specifying at what time packet should be sent on AF_XDP sockets. - Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin users. - Add Netlink YAML spec for WiFi (nl80211) and conntrack. - Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols which only need to be exported when IPv6 support is built as a module. - Age FDB entries based on Rx not Tx traffic in VxLAN, similar to normal bridging. - Allow users to specify source port range for GENEVE tunnels. - netconsole: allow attaching kernel release, CPU ID and task name to messages as metadata Driver API ---------- - Continue rework / fixing of Energy Efficient Ethernet (EEE) across the SW layers. Delegate the responsibilities to phylink where possible. Improve its handling in phylib. - Support symmetric OR-XOR RSS hashing algorithm. - Support tracking and preserving IRQ affinity by NAPI itself. - Support loopback mode speed selection for interface selftests. Device drivers -------------- - Remove the IBM LCS driver for s390. - Remove the sb1000 cable modem driver. - Add support for SFP module access over SMBus. - Add MCTP transport driver for MCTP-over-USB. - Enable XDP metadata support in multiple drivers. - Ethernet high-speed NICs: - Broadcom (bnxt): - add PCIe TLP Processing Hints (TPH) support for new AMD platforms - support dumping RoCE queue state for debug - opt into instance locking - Intel (100G, ice, idpf): - ice: rework MSI-X IRQ management and distribution - ice: support for E830 devices - iavf: add support for Rx timestamping - iavf: opt into instance locking - nVidia/Mellanox: - mlx4: use page pool memory allocator for Rx - mlx5: support for one PTP device per hardware clock - mlx5: support for 200Gbps per-lane link modes - mlx5: move IPSec policy check after decryption - AMD/Solarflare: - support FW flashing via devlink - Cisco (enic): - use page pool memory allocator for Rx - enable 32, 64 byte CQEs - get max rx/tx ring size from the device - Meta (fbnic): - support flow steering and RSS configuration - report queue stats - support TCP segmentation - support IRQ coalescing - support ring size configuration - Marvell/Cavium: - support AF_XDP - Wangxun: - support for PTP clock and timestamping - Huawei (hibmcge): - checksum offload - add more statistics - Ethernet virtual: - VirtIO net: - aggressively suppress Tx completions, improve perf by 96% with 1 CPU and 55% with 2 CPUs - expose NAPI to IRQ mapping and persist NAPI settings - Google (gve): - support XDP in DQO RDA Queue Format - opt into instance locking - Microsoft vNIC: - support BIG TCP - Ethernet NICs consumer, and embedded: - Synopsys (stmmac): - cleanup Tx and Tx clock setting and other link-focused cleanups - enable SGMII and 2500BASEX mode switching for Intel platforms - support Sophgo SG2044 - Broadcom switches (b53): - support for BCM53101 - TI: - iep: add perout configuration support - icssg: support XDP - Cadence (macb): - implement BQL - Xilinx (axinet): - support dynamic IRQ moderation and changing coalescing at runtime - implement BQL - report standard stats - MediaTek: - support phylink managed EEE - Intel: - igc: don't restart the interface on every XDP program change - RealTek (r8169): - support reading registers of internal PHYs directly - increase max jumbo packet size on RTL8125/RTL8126 - Airoha: - support for RISC-V NPU packet processing unit - enable scatter-gather and support MTU up to 9kB - Tehuti (tn40xx): - support cards with TN4010 MAC and an Aquantia AQR105 PHY - Ethernet PHYs: - support for TJA1102S, TJA1121 - dp83tg720: add randomized polling intervals for link detection - dp83822: support changing the transmit amplitude voltage - support for LEDs on 88q2xxx - CAN: - canxl: support Remote Request Substitution bit access - flexcan: add S32G2/S32G3 SoC - WiFi: - remove cooked monitor support - strict mode for better AP testing - basic EPCS support - OMI RX bandwidth reduction support - batman-adv: add support for jumbo frames - WiFi drivers: - RealTek (rtw88): - support RTL8814AE and RTL8814AU - RealTek (rtw89): - switch using wiphy_lock and wiphy_work - add BB context to manipulate two PHY as preparation of MLO - improve BT-coexistence mechanism to play A2DP smoothly - Intel (iwlwifi): - add new iwlmld sub-driver for latest HW/FW combinations - MediaTek (mt76): - preparation for mt7996 Multi-Link Operation (MLO) support - Qualcomm/Atheros (ath12k): - continued work on MLO - Silabs (wfx): - Wake-on-WLAN support - Bluetooth: - add support for skb TX SND/COMPLETION timestamping - hci_core: enable buffer flow control for SCO/eSCO - coredump: log devcd dumps into the monitor - Bluetooth drivers: - intel: add support to configure TX power - nxp: handle bootloader error during cmd5 and cmd7 Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmfkLC8ACgkQMUZtbf5S Irsb5g/+L7oKOf0ALbaV9kxFsoz8AymZfAW9i/27F07omGJGpks8oX6j6rQLgIRO OQOFcp7XEdDh1+jh82gHVuPrw2/6lchLtW8ARtzdiQKFr5DRjrsbtua6GRc8iBqA DIRCBFoV2HuMkF39Vr09HMa9AZAT7QR2RLsRGpSq8E8Z8xxKz0X7oujs10PFpMTE IVKhTrVrk+NDot/IU2hzVpnpup+0ld+T2/ZaBklJGcU8uDffImsqNepHRyCG5UC3 xz74Ju23MAj24Gct+og0yFUooF+lUltKyVm0FYCDCY3bASTwgY01NR3kEH/0NQvM cywLzd/ngHm/SMD2ggVAHkjZUieiIVHdaZ53dgjDeBOQoVP6p0dgUK7EumXX8Mx4 8ReR2UiGoYRPaq9c4o+IjG4K027MwVK2p+mF1a6MLa+20XcyMbev8FIRbbHtC/V4 z5/FsOAxcuICWkA1hU9bODrrGzIqemmdRgKG8sGuTJCt/kYGAn72/TCATGNSaCJ0 00n2jN1aepa7wtywHJ5MhVzxN9iQX7+geUHXz0BI+lK4e1Pmk+vjGksymb9ai2fk eQAUV9ekub6q68/J16scD7XeOUM37bTLiMBQeIF8UtZBOJscKiS71zn9QP9Twwxv P2pm01RDZUI+z5ZX3hc12Pm1vjRHaAh9S1JpAw/pTOVlQ+mAJEM= =XY0S -----END PGP SIGNATURE----- Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Continue Netlink conversions to per-namespace RTNL lock (IPv4 routing, routing rules, routing next hops, ARP ioctls) - Continue extending the use of netdev instance locks. As a driver opt-in protect queue operations and (in due course) ethtool operations with the instance lock and not RTNL lock. - Support collecting TCP timestamps (data submitted, sent, acked) in BPF, allowing for transparent (to the application) and lower overhead tracking of TCP RPC performance. - Tweak existing networking Rx zero-copy infra to support zero-copy Rx via io_uring. - Optimize MPTCP performance in single subflow mode by 29%. - Enable GRO on packets which went thru XDP CPU redirect (were queued for processing on a different CPU). Improving TCP stream performance up to 2x. - Improve performance of contended connect() by 200% by searching for an available 4-tuple under RCU rather than a spin lock. Bring an additional 229% improvement by tweaking hash distribution. - Avoid unconditionally touching sk_tsflags on RX, improving performance under UDP flood by as much as 10%. - Avoid skb_clone() dance in ping_rcv() to improve performance under ping flood. - Avoid FIB lookup in netfilter if socket is available, 20% perf win. - Rework network device creation (in-kernel) API to more clearly identify network namespaces and their roles. There are up to 4 namespace roles but we used to have just 2 netns pointer arguments, interpreted differently based on context. - Use sysfs_break_active_protection() instead of trylock to avoid deadlocks between unregistering objects and sysfs access. - Add a new sysctl and sockopt for capping max retransmit timeout in TCP. - Support masking port and DSCP in routing rule matches. - Support dumping IPv4 multicast addresses with RTM_GETMULTICAST. - Support specifying at what time packet should be sent on AF_XDP sockets. - Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin users. - Add Netlink YAML spec for WiFi (nl80211) and conntrack. - Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols which only need to be exported when IPv6 support is built as a module. - Age FDB entries based on Rx not Tx traffic in VxLAN, similar to normal bridging. - Allow users to specify source port range for GENEVE tunnels. - netconsole: allow attaching kernel release, CPU ID and task name to messages as metadata Driver API: - Continue rework / fixing of Energy Efficient Ethernet (EEE) across the SW layers. Delegate the responsibilities to phylink where possible. Improve its handling in phylib. - Support symmetric OR-XOR RSS hashing algorithm. - Support tracking and preserving IRQ affinity by NAPI itself. - Support loopback mode speed selection for interface selftests. Device drivers: - Remove the IBM LCS driver for s390 - Remove the sb1000 cable modem driver - Add support for SFP module access over SMBus - Add MCTP transport driver for MCTP-over-USB - Enable XDP metadata support in multiple drivers - Ethernet high-speed NICs: - Broadcom (bnxt): - add PCIe TLP Processing Hints (TPH) support for new AMD platforms - support dumping RoCE queue state for debug - opt into instance locking - Intel (100G, ice, idpf): - ice: rework MSI-X IRQ management and distribution - ice: support for E830 devices - iavf: add support for Rx timestamping - iavf: opt into instance locking - nVidia/Mellanox: - mlx4: use page pool memory allocator for Rx - mlx5: support for one PTP device per hardware clock - mlx5: support for 200Gbps per-lane link modes - mlx5: move IPSec policy check after decryption - AMD/Solarflare: - support FW flashing via devlink - Cisco (enic): - use page pool memory allocator for Rx - enable 32, 64 byte CQEs - get max rx/tx ring size from the device - Meta (fbnic): - support flow steering and RSS configuration - report queue stats - support TCP segmentation - support IRQ coalescing - support ring size configuration - Marvell/Cavium: - support AF_XDP - Wangxun: - support for PTP clock and timestamping - Huawei (hibmcge): - checksum offload - add more statistics - Ethernet virtual: - VirtIO net: - aggressively suppress Tx completions, improve perf by 96% with 1 CPU and 55% with 2 CPUs - expose NAPI to IRQ mapping and persist NAPI settings - Google (gve): - support XDP in DQO RDA Queue Format - opt into instance locking - Microsoft vNIC: - support BIG TCP - Ethernet NICs consumer, and embedded: - Synopsys (stmmac): - cleanup Tx and Tx clock setting and other link-focused cleanups - enable SGMII and 2500BASEX mode switching for Intel platforms - support Sophgo SG2044 - Broadcom switches (b53): - support for BCM53101 - TI: - iep: add perout configuration support - icssg: support XDP - Cadence (macb): - implement BQL - Xilinx (axinet): - support dynamic IRQ moderation and changing coalescing at runtime - implement BQL - report standard stats - MediaTek: - support phylink managed EEE - Intel: - igc: don't restart the interface on every XDP program change - RealTek (r8169): - support reading registers of internal PHYs directly - increase max jumbo packet size on RTL8125/RTL8126 - Airoha: - support for RISC-V NPU packet processing unit - enable scatter-gather and support MTU up to 9kB - Tehuti (tn40xx): - support cards with TN4010 MAC and an Aquantia AQR105 PHY - Ethernet PHYs: - support for TJA1102S, TJA1121 - dp83tg720: add randomized polling intervals for link detection - dp83822: support changing the transmit amplitude voltage - support for LEDs on 88q2xxx - CAN: - canxl: support Remote Request Substitution bit access - flexcan: add S32G2/S32G3 SoC - WiFi: - remove cooked monitor support - strict mode for better AP testing - basic EPCS support - OMI RX bandwidth reduction support - batman-adv: add support for jumbo frames - WiFi drivers: - RealTek (rtw88): - support RTL8814AE and RTL8814AU - RealTek (rtw89): - switch using wiphy_lock and wiphy_work - add BB context to manipulate two PHY as preparation of MLO - improve BT-coexistence mechanism to play A2DP smoothly - Intel (iwlwifi): - add new iwlmld sub-driver for latest HW/FW combinations - MediaTek (mt76): - preparation for mt7996 Multi-Link Operation (MLO) support - Qualcomm/Atheros (ath12k): - continued work on MLO - Silabs (wfx): - Wake-on-WLAN support - Bluetooth: - add support for skb TX SND/COMPLETION timestamping - hci_core: enable buffer flow control for SCO/eSCO - coredump: log devcd dumps into the monitor - Bluetooth drivers: - intel: add support to configure TX power - nxp: handle bootloader error during cmd5 and cmd7" * tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits) unix: fix up for "apparmor: add fine grained af_unix mediation" mctp: Fix incorrect tx flow invalidation condition in mctp-i2c net: usb: asix: ax88772: Increase phy_name size net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string net: libwx: fix Tx L4 checksum net: libwx: fix Tx descriptor content for some tunnel packets atm: Fix NULL pointer dereference net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus net: phy: aquantia: add essential functions to aqr105 driver net: phy: aquantia: search for firmware-name in fwnode net: phy: aquantia: add probe function to aqr105 for firmware loading net: phy: Add swnode support to mdiobus_scan gve: add XDP DROP and PASS support for DQ gve: update XDP allocation path support RX buffer posting gve: merge packet buffer size fields gve: update GQ RX to use buf_size gve: introduce config-based allocation for XDP gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics ... |
||
![]() |
592329e5e9 |
Summary
* Move vm_table members out of kernel/sysctl.c All vm_table array members have moved to their respective subsystems leading to the removal of vm_table from kernel/sysctl.c. This increases modularity by placing the ctl_tables closer to where they are actually used and at the same time reducing the chances of merge conflicts in kernel/sysctl.c. * ctl_table range fixes Replace the proc_handler function that checks variable ranges in coredump_sysctls and vdso_table with the one that actually uses the extra{1,2} pointers as min/max values. This tightens the range of the values that users can pass into the kernel effectively preventing {under,over}flows. * Misc fixes Correct grammar errors and typos in test messages. Update sysctl files in MAINTAINERS. Constified and removed array size in declaration for alignment_tbl * Testing - These have all been in linux-next for at least 1 month - They have gone through 0-day - Ran all these through sysctl selftests in x86_64 -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmfhV8EACgkQupfNUreW QU/udAv/VCXGkndQsJ5biXpXYFnokX0gIEaYzzHiqrFycZqr8ys0/wWzc+ar1LjF Jvanl2uKB0mUviLKt7Gk0+Hri+PJlYIrbx+5K5eo2wsKUUxFykqLLm59y/orPODl gyPQjKNpHJb7COsnEc3Lrq/fvol4NPHlcBPXG8NwehccTeBHZ1ninfo+pSnxh3o8 kI3GSLLxD4K9AgBl5QuVWH4gU7o//u7lUkKzy03NW+2jmuRv3dRcYF7IdgMINNee AeXnygdSBxLzECBvmkfNdyg+AmL8hdsmzbsIh7UuJDvxLlQOInVLZa+sXBotCOIc TImCrr1Ws1OuGrD0kpH+21tJvc8pNFWt61QlulObQdrLndWHdZEGyGOusLpXTwbn jIWZmMvzk1foSwdgzwPFzUqPEpW3FrBVDo4Z4kenBDrCp56QTX7hGRvkNYJNKvot Ue+i8BeHR/Gm/p+UMqgsSTOaNJXTqZhFqwJQVzxU/9LN/vkS0On6fbjgBd5X6Pn+ a5dlc9gy =0bcX -----END PGP SIGNATURE----- Merge tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Move vm_table members out of kernel/sysctl.c All vm_table array members have moved to their respective subsystems leading to the removal of vm_table from kernel/sysctl.c. This increases modularity by placing the ctl_tables closer to where they are actually used and at the same time reducing the chances of merge conflicts in kernel/sysctl.c. - ctl_table range fixes Replace the proc_handler function that checks variable ranges in coredump_sysctls and vdso_table with the one that actually uses the extra{1,2} pointers as min/max values. This tightens the range of the values that users can pass into the kernel effectively preventing {under,over}flows. - Misc fixes Correct grammar errors and typos in test messages. Update sysctl files in MAINTAINERS. Constified and removed array size in declaration for alignment_tbl * tag 'sysctl-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (22 commits) selftests/sysctl: fix wording of help messages selftests: fix spelling/grammar errors in sysctl/sysctl.sh MAINTAINERS: Update sysctl file list in MAINTAINERS sysctl: Fix underflow value setting risk in vm_table coredump: Fixes core_pipe_limit sysctl proc_handler sysctl: remove unneeded include sysctl: remove the vm_table sh: vdso: move the sysctl to arch/sh/kernel/vsyscall/vsyscall.c x86: vdso: move the sysctl to arch/x86/entry/vdso/vdso32-setup.c fs: dcache: move the sysctl to fs/dcache.c sunrpc: simplify rpcauth_cache_shrink_count() fs: drop_caches: move sysctl to fs/drop_caches.c fs: fs-writeback: move sysctl to fs/fs-writeback.c mm: nommu: move sysctl to mm/nommu.c security: min_addr: move sysctl to security/min_addr.c mm: mmap: move sysctl to mm/mmap.c mm: util: move sysctls to mm/util.c mm: vmscan: move vmscan sysctls to mm/vmscan.c mm: swap: move sysctl to mm/swap.c mm: filemap: move sysctl to mm/filemap.c ... |
||
![]() |
336b4dae6d |
IOMMU Updates for Linux v6.15
Including: - Core: IOMMUFD dependencies from Jason: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API - Core: Fixes for probe-issues from Robin - Intel VT-d changes: - Checking for SVA support in domain allocation and attach paths - Move PCI ATS and PRI configuration into probe paths - Fix a pentential hang on reboot -f - Miscellaneous cleanups - AMD-Vi changes: - Support for up to 2k IRQs per PCI device function - Set of smaller fixes - ARM-SMMU changes: - SMMUv2 devicetree binding updates for Qualcomm implementations (QCS8300 GPU and MSM8937) - Clean up SMMUv2 runtime PM implementation to help with wider rework of pm_runtime_put_autosuspend() - Rockchip driver changes: - Driver adjustments for recent DT probing changes - S390 IOMMU changes: - Support for IOMMU passthrough - Apple Dart changes: - Driver adjustments to meet ISP device requirements - Null-ptr deref fix - Disable subpage protection for DART 1 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmfkFSkACgkQK/BELZcB GuOVUg/9En4QlF0eeg4E1stszGcOOXn9IkKiGP9iqKpL/WqVVitoagn7nwq0rBFF PdqqcGbSPNHctiUursB18DyENOAmRUE8mGG29po9rAfq5wxZ2yA6eLmpFAf9QDCW vY3nz2oen1lUrzbPwVI1IR67L6BrtqEUz/1syf295RMr2yXkdmBXb12lbL1rdjv0 7YfWwlDtMOVsewlZmCuDAAfWHtd7cO/ulDcYwq+GOiSXojVmKw8624EFlFbM9E9f tkceE4mjGQnN/CpZzUc48f1DgIW82PVVTek95Cznbk9ACDJQ4VYYwDMx2WszfMzu D1CXLiQu0vrxQgFdHn02bw3T4yvXQ4HeawIaTF+1J1gEHbrj6MoVQ88zaO3GnKFk yxYI5sfOs5iL6zSMkig4OytXuRmRqDVJhgrF1i4XIM3GXrRKf9EiRZ/1eXb+1gab 3Q5k7+u4+7vWbaKLsxOdIBNzz02hb7LVndPIFlWmSnr1YY0LFBfwmgbfTAsSINZo 22Ni8aE2C42lYOZxJ67m3khLpqTZaTERtQIab/gUd5FcuEjxKuHBIzj6SZeC/0Q9 Y6rzklHxJxxtQLrUJp16IzjpiMJbWH0H2jKstRberOpxk3L5aLrJNd8M1B+CokmP LZeKgqgxV5F+VlR2Bap5AXL9ZEuqjCVDIEy79j8XzAzxobDVnVM= =a8Pd -----END PGP SIGNATURE----- Merge tag 'iommu-updates-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Joerg Roedel: "Core iommufd dependencies from Jason: - Change the iommufd fault handle into an always present hwpt handle in the domain - Give iommufd its own SW_MSI implementation along with some IRQ layer rework - Improvements to the handle attach API Core fixes for probe-issues from Robin Intel VT-d changes: - Checking for SVA support in domain allocation and attach paths - Move PCI ATS and PRI configuration into probe paths - Fix a pentential hang on reboot -f - Miscellaneous cleanups AMD-Vi changes: - Support for up to 2k IRQs per PCI device function - Set of smaller fixes ARM-SMMU changes: - SMMUv2 devicetree binding updates for Qualcomm implementations (QCS8300 GPU and MSM8937) - Clean up SMMUv2 runtime PM implementation to help with wider rework of pm_runtime_put_autosuspend() Rockchip driver changes: - Driver adjustments for recent DT probing changes S390 IOMMU changes: - Support for IOMMU passthrough Apple Dart changes: - Driver adjustments to meet ISP device requirements - Null-ptr deref fix - Disable subpage protection for DART 1" * tag 'iommu-updates-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (54 commits) iommu/vt-d: Fix possible circular locking dependency iommu/vt-d: Don't clobber posted vCPU IRTE when host IRQ affinity changes iommu/vt-d: Put IRTE back into posted MSI mode if vCPU posting is disabled iommu: apple-dart: fix potential null pointer deref iommu/rockchip: Retire global dma_dev workaround iommu/rockchip: Register in a sensible order iommu/rockchip: Allocate per-device data sensibly iommu/mediatek-v1: Support COMPILE_TEST iommu/amd: Enable support for up to 2K interrupts per function iommu/amd: Rename DTE_INTTABLEN* and MAX_IRQS_PER_TABLE macro iommu/amd: Replace slab cache allocator with page allocator iommu/amd: Introduce generic function to set multibit feature value iommu: Don't warn prematurely about dodgy probes iommu/arm-smmu: Set rpm auto_suspend once during probe dt-bindings: arm-smmu: Document QCS8300 GPU SMMU iommu: Get DT/ACPI parsing into the proper probe path iommu: Keep dev->iommu state consistent iommu: Resolve ops in iommu_init_device() iommu: Handle race with default domain setup iommu: Unexport iommu_fwspec_free() ... |
||
![]() |
f0c6eab5e4 |
sched_ext: initialize built-in idle state before ops.init()
A BPF scheduler may want to use the built-in idle cpumasks in ops.init() before the scheduler is fully initialized, either directly or through a BPF timer for example. However, this would result in an error, since the idle state has not been properly initialized yet. This can be easily verified by modifying scx_simple to call scx_bpf_get_idle_cpumask() in ops.init(): $ sudo scx_simple DEBUG DUMP =========================================================================== scx_simple[121] triggered exit kind 1024: runtime error (built-in idle tracking is disabled) ... Fix this by properly initializing the idle state before ops.init() is called. With this change applied: $ sudo scx_simple local=2 global=0 local=19 global=11 local=23 global=11 ... Fixes: d73249f88743d ("sched_ext: idle: Make idle static keys private") Signed-off-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org> |
||
![]() |
a8897ed852 |
sched_ext: create_dsq: Return -EEXIST on duplicate request
create_dsq and therefore the scx_bpf_create_dsq kfunc currently silently ignore duplicate entries. As a sched_ext scheduler is creating each DSQ for a different purpose this is surprising behaviour. Replace rhashtable_insert_fast which ignores duplicates with rhashtable_lookup_insert_fast that reports duplicates (though doesn't return their value). The rest of the code is structured correctly and this now returns -EEXIST. Tested by adding an extra scx_bpf_create_dsq to scx_simple. Previously this was ignored, now init fails with a -17 code. Also ran scx_lavd which continued to work well. Signed-off-by: Jake Hillion <jake@hillion.co.uk> Acked-by: Andrea Righi <arighi@nvidia.com> Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class") Cc: stable@vger.kernel.org # v6.12+ Signed-off-by: Tejun Heo <tj@kernel.org> |
||
![]() |
883cc35e9f |
sched_ext: Remove a meaningless conditional goto in scx_select_cpu_dfl()
scx_select_cpu_dfl() has a meaningless conditional goto at the end. Remove it. No functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrea Righi <arighi@nvidia.com> |
||
![]() |
37477d9eca |
sched_ext: idle: Fix return code of scx_select_cpu_dfl()
Return -EBUSY when using %SCX_PICK_IDLE_CORE with scx_select_cpu_dfl() if a fully idle SMT core cannot be found, instead of falling back to @prev_cpu, which is not a fully idle SMT core in this case. Fixes: c414c2171cd9e ("sched_ext: idle: Honor idle flags in the built-in idle selection policy") Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org> |
||
![]() |
054570267d |
lsm/stable-6.15 PR 20250323
-----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmfgWgMUHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNW5RAAvCDq5gBtY0aTNlULe637EVLSh+t8 PkSzHzu/NlzU6BfjtwSm2fuML8welTGxSwUPxUzMCI91gPdkGeFktefavT3xa+QI BHWROn7fEJ/KmRZvngPeIkgLr5xhF5nBJmc/Jw71qem20zRzNgJnpzMX16d10Phx dxd2xOO1qM3bv6Z9RcIssZRGaN+PHngpWWg+0B69XuaBUso87S6NDyKNn1XPmvoz as96k+Wk/xAZGVEeCbs/+H5rBx6DLg+FfTRa06Oh4BFsqedpkDPxLrTgCJGJkA0H dsK6O/993zvjx0Jn4ZPoJ9n35S82BmkCsz4bGq1xVl6FYUiMcm3/8yO41wllS+w4 j+RlTU/RIdB7n8EKyMMl1hj1stTvt3Bi9F5Cbf7ZEv0snfR00K4KVpi17jnFjUHv kpOiEtXZb/NGQip7UAuUq0PisfqbiO4jJurYHRetDgv1WCy6+C8ufM5t6I+cnvmG VG+dlxcW+rDIn6bLRVuGi9TJRsQ6eox9ipa+qEKNNiOXgftELcgT7m74nAS5m0uv n5rDa221nPXecEB0X7d6YUFk711lly90dbelNeLrmv1w6jl8L1PpS1oBaW+UzGu9 46eGBd6pzu9otvK9WVyDEdotDOCrgH0sd7pTetqDhLJZ7KrGwyyqO2gD/JroUKcC lnxBQwPnat86iI8= =oxfV -----END PGP SIGNATURE----- Merge tag 'lsm-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Various minor updates to the LSM Rust bindings Changes include marking trivial Rust bindings as inlines and comment tweaks to better reflect the LSM hooks. - Add LSM/SELinux access controls to io_uring_allowed() Similar to the io_uring_disabled sysctl, add a LSM hook to io_uring_allowed() to enable LSMs a simple way to enforce security policy on the use of io_uring. This pull request includes SELinux support for this new control using the io_uring/allowed permission. - Remove an unused parameter from the security_perf_event_open() hook The perf_event_attr struct parameter was not used by any currently supported LSMs, remove it from the hook. - Add an explicit MAINTAINERS entry for the credentials code We've seen problems in the past where patches to the credentials code sent by non-maintainers would often languish on the lists for multiple months as there was no one explicitly tasked with the responsibility of reviewing and/or merging credentials related code. Considering that most of the code under security/ has a vested interest in ensuring that the credentials code is well maintained, I'm volunteering to look after the credentials code and Serge Hallyn has also volunteered to step up as an official reviewer. I posted the MAINTAINERS update as a RFC to LKML in hopes that someone else would jump up with an "I'll do it!", but beyond Serge it was all crickets. - Update Stephen Smalley's old email address to prevent confusion This includes a corresponding update to the mailmap file. * tag 'lsm-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: mailmap: map Stephen Smalley's old email addresses lsm: remove old email address for Stephen Smalley MAINTAINERS: add Serge Hallyn as a credentials reviewer MAINTAINERS: add an explicit credentials entry cred,rust: mark Credential methods inline lsm,rust: reword "destroy" -> "release" in SecurityCtx lsm,rust: mark SecurityCtx methods inline perf: Remove unnecessary parameter of security check lsm: fix a missing security_uring_allowed() prototype io_uring,lsm,selinux: add LSM hooks for io_uring_setup() io_uring: refactor io_uring_allowed() |
||
![]() |
7d20aa5c32 |
Power management updates for 6.15-rc1
- Manage sysfs attributes and boost frequencies efficiently from cpufreq core to reduce boilerplate code in drivers (Viresh Kumar). - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider, Dhananjay Ugwekar, Imran Shaik, zuoqian). - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky Bai). - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski). - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng). - Optimize the amd-pstate driver to avoid cases where call paths end up calling the same writes multiple times and needlessly caching variables through code reorganization, locking overhaul and tracing adjustments (Mario Limonciello, Dhananjay Ugwekar). - Make it possible to avoid enabling capacity-aware scheduling (CAS) in the intel_pstate driver and relocate a check for out-of-band (OOB) platform handling in it to make it detect OOB before checking HWP availability (Rafael Wysocki). - Fix dbs_update() to avoid inadvertent conversions of negative integer values to unsigned int which causes CPU frequency selection to be inaccurate in some cases when the "conservative" cpufreq governor is in use (Jie Zhan). - Update the handling of the most recent idle intervals in the menu cpuidle governor to prevent useful information from being discarded by it in some cases and improve the prediction accuracy (Rafael Wysocki). - Make it possible to tell the intel_idle driver to ignore its built-in table of idle states for the given processor, clean up the handling of auto-demotion disabling on Baytrail and Cherrytrail chips in it, and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy, Rafael Wysocki). - Make some cpuidle drivers use for_each_present_cpu() instead of for_each_possible_cpu() during initialization to avoid issues occurring when nosmp or maxcpus=0 are used (Jacky Bai). - Clean up the Energy Model handling code somewhat (Rafael Wysocki). - Use kfree_rcu() to simplify the handling of runtime Energy Model updates (Li RongQing). - Add an entry for the Energy Model framework to MAINTAINERS as properly maintained (Lukasz Luba). - Address RCU-related sparse warnings in the Energy Model code (Rafael Wysocki). - Remove ENERGY_MODEL dependency on SMP and allow it to be selected when DEVFREQ is set without CPUFREQ so it can be used on a wider range of systems (Jeson Gao). - Unify error handling during runtime suspend and runtime resume in the core to help drivers to implement more consistent runtime PM error handling (Rafael Wysocki). - Drop a redundant check from pm_runtime_force_resume() and rearrange documentation related to __pm_runtime_disable() (Rafael Wysocki). - Rework the handling of the "smart suspend" driver flag in the PM core to avoid issues hat may occur when drivers using it depend on some other drivers and clean up the related PM core code (Rafael Wysocki, Colin Ian King). - Fix the handling of devices with the power.direct_complete flag set if device_suspend() returns an error for at least one device to avoid situations in which some of them may not be resumed (Rafael Wysocki). - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a possible deadlock that may occur if the "compressor" hibernation module parameter is accessed during the registration of a new ieee80211 device (Lizhi Xu). - Suppress sleeping parent warning in device_pm_add() in the case when new children are added under a device with the power.direct_complete set after it has been processed by device_resume() (Xu Yang). - Remove needless return in three void functions related to system wakeup (Zijun Hu). - Replace deprecated kmap_atomic() with kmap_local_page() in the hibernation core code (David Reaver). - Remove unused helper functions related to system sleep (David Alan Gilbert). - Clean up s2idle_enter() so it does not lock and unlock CPU offline in vain and update comments in it (Ulf Hansson). - Clean up broken white space in dpm_wait_for_children() (Geert Uytterhoeven). - Update the cpupower utility to fix lib version-ing in it and memory leaks in error legs, remove hard-coded values, and implement CPU physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah Khan, Yiwei Lin, Zhongqiu Han). -----BEGIN PGP SIGNATURE----- iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmfhhTYSHHJqd0Byand5 c29ja2kubmV0AAoJEO5fvZ0v1OO16/gIAKuRiG1fFgUcUSXC1iFu42vrB/1i4wpA 02GICACqM3K6/5jd3ct/WOU28GUgDs+xcmqH7CnMaM6y9nXEWjWarmSfFekAO+0q TPtQ7xTy0hBCB3he1P2uLKBJBin4Wn47U9/rvs4J7mQd5zDxTINKIiVoHg2lEE+s HAeSoNRb2sp5IZDm9+/LfhHNYRP1mJ97cbZlymqctGB3xgDL7qMLid/1+gFPHAQS 4/LXj3IgyU8DpA/j5nhtpaAqjN5g2QxIUfQgADRIcESK99Y/7aAMs1/G0WhJKaay 9yx+4/xmkGvVCZQx1DphksFLISEzltY0SFWLsoppPzBTGVEW2GQQsNI= =LqVy -----END PGP SIGNATURE----- Merge tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These are dominated by cpufreq updates which in turn are dominated by updates related to boost support in the core and drivers and amd-pstate driver optimizations. Apart from the above, there are some cpuidle updates including a rework of the most recent idle intervals handling in the venerable menu governor that leads to significant improvements in some performance benchmarks, as the governor is now more likely to predict a shorter idle duration in some cases, and there are updates of the core device power management code, mostly related to system suspend and resume, that should help to avoid potential issues arising when the drivers of devices depending on one another want to use different optimizations. There is also a usual collection of assorted fixes and cleanups, including removal of some unused code. Specifics: - Manage sysfs attributes and boost frequencies efficiently from cpufreq core to reduce boilerplate code in drivers (Viresh Kumar) - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider, Dhananjay Ugwekar, Imran Shaik, zuoqian) - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky Bai) - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski) - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng) - Optimize the amd-pstate driver to avoid cases where call paths end up calling the same writes multiple times and needlessly caching variables through code reorganization, locking overhaul and tracing adjustments (Mario Limonciello, Dhananjay Ugwekar) - Make it possible to avoid enabling capacity-aware scheduling (CAS) in the intel_pstate driver and relocate a check for out-of-band (OOB) platform handling in it to make it detect OOB before checking HWP availability (Rafael Wysocki) - Fix dbs_update() to avoid inadvertent conversions of negative integer values to unsigned int which causes CPU frequency selection to be inaccurate in some cases when the "conservative" cpufreq governor is in use (Jie Zhan) - Update the handling of the most recent idle intervals in the menu cpuidle governor to prevent useful information from being discarded by it in some cases and improve the prediction accuracy (Rafael Wysocki) - Make it possible to tell the intel_idle driver to ignore its built-in table of idle states for the given processor, clean up the handling of auto-demotion disabling on Baytrail and Cherrytrail chips in it, and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy, Rafael Wysocki) - Make some cpuidle drivers use for_each_present_cpu() instead of for_each_possible_cpu() during initialization to avoid issues occurring when nosmp or maxcpus=0 are used (Jacky Bai) - Clean up the Energy Model handling code somewhat (Rafael Wysocki) - Use kfree_rcu() to simplify the handling of runtime Energy Model updates (Li RongQing) - Add an entry for the Energy Model framework to MAINTAINERS as properly maintained (Lukasz Luba) - Address RCU-related sparse warnings in the Energy Model code (Rafael Wysocki) - Remove ENERGY_MODEL dependency on SMP and allow it to be selected when DEVFREQ is set without CPUFREQ so it can be used on a wider range of systems (Jeson Gao) - Unify error handling during runtime suspend and runtime resume in the core to help drivers to implement more consistent runtime PM error handling (Rafael Wysocki) - Drop a redundant check from pm_runtime_force_resume() and rearrange documentation related to __pm_runtime_disable() (Rafael Wysocki) - Rework the handling of the "smart suspend" driver flag in the PM core to avoid issues hat may occur when drivers using it depend on some other drivers and clean up the related PM core code (Rafael Wysocki, Colin Ian King) - Fix the handling of devices with the power.direct_complete flag set if device_suspend() returns an error for at least one device to avoid situations in which some of them may not be resumed (Rafael Wysocki) - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a possible deadlock that may occur if the "compressor" hibernation module parameter is accessed during the registration of a new ieee80211 device (Lizhi Xu) - Suppress sleeping parent warning in device_pm_add() in the case when new children are added under a device with the power.direct_complete set after it has been processed by device_resume() (Xu Yang) - Remove needless return in three void functions related to system wakeup (Zijun Hu) - Replace deprecated kmap_atomic() with kmap_local_page() in the hibernation core code (David Reaver) - Remove unused helper functions related to system sleep (David Alan Gilbert) - Clean up s2idle_enter() so it does not lock and unlock CPU offline in vain and update comments in it (Ulf Hansson) - Clean up broken white space in dpm_wait_for_children() (Geert Uytterhoeven) - Update the cpupower utility to fix lib version-ing in it and memory leaks in error legs, remove hard-coded values, and implement CPU physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah Khan, Yiwei Lin, Zhongqiu Han)" * tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits) PM: sleep: Fix bit masking operation dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650 dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1 dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible cpufreq: Init cpufreq only for present CPUs PM: sleep: Fix handling devices with direct_complete set on errors cpuidle: Init cpuidle only for present CPUs PM: clk: Remove unused pm_clk_remove() PM: sleep: core: Fix indentation in dpm_wait_for_children() PM: s2idle: Extend comment in s2idle_enter() PM: s2idle: Drop redundant locks when entering s2idle PM: sleep: Remove unused pm_generic_ wrappers cpufreq: tegra186: Share policy per cluster cpupower: Make lib versioning scheme more obvious and fix version link PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL PM: EM: Address RCU-related sparse warnings cpupower: Implement CPU physical core querying pm: cpupower: remove hard-coded topology depth values pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology ... |
||
![]() |
72c774aa9d |
objtool, panic: Disable SMAP in __stack_chk_fail()
__stack_chk_fail() can be called from uaccess-enabled code. Make sure uaccess gets disabled before calling panic(). Fixes the following warning: kernel/trace/trace_branch.o: error: objtool: ftrace_likely_update+0x1ea: call to __stack_chk_fail() with UACCESS enabled Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/a3e97e0119e1b04c725a8aa05f7bc83d98e657eb.1742852847.git.jpoimboe@kernel.org |
||
![]() |
a5b3d8660b |
hyperv-next for 6.15
-----BEGIN PGP SIGNATURE----- iQFHBAABCgAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmfhlLATHHdlaS5saXVA a2VybmVsLm9yZwAKCRB2FHBfkEGgXgchCADOz33rSm4G4w4r0qT05dTDi/lZkEdK 64dQq322XXP/C9FfR66d30243gsAmuM5a0SvzFHLXAOu6yqM270Xehd/Rud+Um2s lSVnc0Ux0AWBgksqFd0t577aN7zmJEukosEYO5lBNop+zOcadrm3S6Th/AoL2h/D yphPkhH13bsCK+Wll/eBOQLIhC9iA0konYbBLuEQ5MqvUbrzc6Rmb5gxsHHZKOqg vLjkrYR/d3s2gIpKxiFp0RwvzGyffZEHxvU/YF3hTenPMlTlnXWbyspBSTVmWggP 13IFLzqxDdW9RgUnGB4xRc424AC1LKqEr42QPQE7zGvl2jdJriA2Q1LT =BXqj -----END PGP SIGNATURE----- Merge tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Add support for running as the root partition in Hyper-V (Microsoft Hypervisor) by exposing /dev/mshv (Nuno and various people) - Add support for CPU offlining in Hyper-V (Hamza Mahfooz) - Misc fixes and cleanups (Roman Kisel, Tianyu Lan, Wei Liu, Michael Kelley, Thorsten Blum) * tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits) x86/hyperv: fix an indentation issue in mshyperv.h x86/hyperv: Add comments about hv_vpset and var size hypercall input args Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs hyperv: Add definitions for root partition driver to hv headers x86: hyperv: Add mshv_handler() irq handler and setup function Drivers: hv: Introduce per-cpu event ring tail Drivers: hv: Export some functions for use by root partition module acpi: numa: Export node_to_pxm() hyperv: Introduce hv_recommend_using_aeoi() arm64/hyperv: Add some missing functions to arm64 x86/mshyperv: Add support for extended Hyper-V features hyperv: Log hypercall status codes as strings x86/hyperv: Fix check of return value from snp_set_vmsa() x86/hyperv: Add VTL mode callback for restarting the system x86/hyperv: Add VTL mode emergency restart callback hyperv: Remove unused union and structs hyperv: Add CONFIG_MSHV_ROOT to gate root partition support hyperv: Change hv_root_partition into a function hyperv: Convert hypercall statuses to linux error codes drivers/hv: add CPU offlining support ... |
||
![]() |
e0344f9564 |
tracing: Replace strncpy with memcpy for fixed-length substring copy
checkpatch.pl reports the following warning: WARNING: Prefer strscpy, strscpy_pad, or __nonstring over strncpy (see: https://github.com/KSPP/linux/issues/90) In synth_field_string_size(), replace strncpy() with memcpy() to copy 'len' characters from 'start' to 'buf'. The code manually adds a NUL terminator after the copy, making memcpy safe here. Link: https://lore.kernel.org/20250325181232.38284-1-siddarthsgml@gmail.com Signed-off-by: Siddarth G <siddarthsgml@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
4d38328eb4 |
tracing: Fix synth event printk format for str fields
The printk format for synth event uses "%.*s" to print string fields, but then only passes the pointer part as var arg. Replace %.*s with %s as the C string is guaranteed to be null-terminated. The output in print fmt should never have been updated as __get_str() handles the string limit because it can access the length of the string in the string meta data that is saved in the ring buffer. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 8db4d6bfbbf92 ("tracing: Change synthetic event string format to limit printed length") Link: https://lore.kernel.org/20250325165202.541088-1-douglas.raillard@arm.com Signed-off-by: Douglas Raillard <douglas.raillard@arm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
![]() |
dc84bc2aba |
x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
If track_pfn_copy() fails, we already added the dst VMA to the maple tree. As fork() fails, we'll cleanup the maple tree, and stumble over the dst VMA for which we neither performed any reservation nor copied any page tables. Consequently untrack_pfn() will see VM_PAT and try obtaining the PAT information from the page table -- which fails because the page table was not copied. The easiest fix would be to simply clear the VM_PAT flag of the dst VMA if track_pfn_copy() fails. However, the whole thing is about "simply" clearing the VM_PAT flag is shaky as well: if we passed track_pfn_copy() and performed a reservation, but copying the page tables fails, we'll simply clear the VM_PAT flag, not properly undoing the reservation ... which is also wrong. So let's fix it properly: set the VM_PAT flag only if the reservation succeeded (leaving it clear initially), and undo the reservation if anything goes wrong while copying the page tables: clearing the VM_PAT flag after undoing the reservation. Note that any copied page table entries will get zapped when the VMA will get removed later, after copy_page_range() succeeded; as VM_PAT is not set then, we won't try cleaning VM_PAT up once more and untrack_pfn() will be happy. Note that leaving these page tables in place without a reservation is not a problem, as we are aborting fork(); this process will never run. A reproducer can trigger this usually at the first try: https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/reproducers/pat_fork.c WARNING: CPU: 26 PID: 11650 at arch/x86/mm/pat/memtype.c:983 get_pat_info+0xf6/0x110 Modules linked in: ... CPU: 26 UID: 0 PID: 11650 Comm: repro3 Not tainted 6.12.0-rc5+ #92 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:get_pat_info+0xf6/0x110 ... Call Trace: <TASK> ... untrack_pfn+0x52/0x110 unmap_single_vma+0xa6/0xe0 unmap_vmas+0x105/0x1f0 exit_mmap+0xf6/0x460 __mmput+0x4b/0x120 copy_process+0x1bf6/0x2aa0 kernel_clone+0xab/0x440 __do_sys_clone+0x66/0x90 do_syscall_64+0x95/0x180 Likely this case was missed in: d155df53f310 ("x86/mm/pat: clear VM_PAT if copy_p4d_range failed") ... and instead of undoing the reservation we simply cleared the VM_PAT flag. Keep the documentation of these functions in include/linux/pgtable.h, one place is more than sufficient -- we should clean that up for the other functions like track_pfn_remap/untrack_pfn separately. Fixes: d155df53f310 ("x86/mm/pat: clear VM_PAT if copy_p4d_range failed") Fixes: 2ab640379a0a ("x86: PAT: hooks in generic vm code to help archs to track pfnmap regions - v3") Reported-by: xingwei lee <xrivendell7@gmail.com> Reported-by: yuxin wang <wang1315768607@163.com> Reported-by: Marius Fleischer <fleischermarius@gmail.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20250321112323.153741-1-david@redhat.com Closes: https://lore.kernel.org/lkml/CABOYnLx_dnqzpCW99G81DmOr+2UzdmZMk=T3uxwNxwz+R1RAwg@mail.gmail.com/ Closes: https://lore.kernel.org/lkml/CAJg=8jwijTP5fre8woS4JVJQ8iUA6v+iNcsOgtj9Zfpc3obDOQ@mail.gmail.com/ |
||
![]() |
dce3ab4c57 |
xen: branch for v6.15-rc1
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZ9/gEwAKCRCAXGG7T9hj vlxhAQCRzSCNI8wwvENnuc2OnRyWKy8gq7C5WAOIOJdJ3U+scQEAwKGhPJLwE4IS /JDh5PRJgZ4rdMYatuDfldEcSAfRRgw= =dF6Z -----END PGP SIGNATURE----- Merge tag 'for-linus-6.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - cleanup: remove an used function - add support for a XenServer specific virtual PCI device - fix the handling of a sparse Xen hypervisor symbol table - avoid warnings when building the kernel with gcc 15 - fix use of devices behind a VMD bridge when running as a Xen PV dom0 * tag 'for-linus-6.15-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flag PCI: vmd: Disable MSI remapping bypass under Xen xen/pci: Do not register devices with segments >= 0x10000 xen/pciback: Remove unused pcistub_get_pci_dev xenfs/xensyms: respect hypervisor's "next" indication xen/mcelog: Add __nonstring annotations for unterminated strings xen: Add support for XenServer 6.1 platform device |
||
![]() |
317a76a996 |
Updates for the VDSO infrastructure:
- Consolidate the VDSO storage The VDSO data storage and data layout has been largely architecture specific for historical reasons. That increases the maintenance effort and causes inconsistencies over and over. There is no real technical reason for architecture specific layouts and implementations. The architecture specific details can easily be integrated into a generic layout, which also reduces the amount of duplicated code for managing the mappings. Convert all architectures over to a unified layout and common mapping infrastructure. This splits the VDSO data layout into subsystem specific blocks, timekeeping, random and architecture parts, which provides a better structure and allows to improve and update the functionalities without conflict and interaction. - Rework the timekeeping data storage The current implementation is designed for exposing system timekeeping accessors, which was good enough at the time when it was designed. PTP and Time Sensitive Networking (TSN) change that as there are requirements to expose independent PTP clocks, which are not related to system timekeeping. Replace the monolithic data storage by a structured layout, which allows to add support for independent PTP clocks on top while reusing both the data structures and the time accessor implementations. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfgSWUTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoYGED/0f/M8YyacAyErDYW4ufW+zh2sUidSf GVlK0Jn5BMljOoye+y2XfTxuvvXxEDjJNYiJm2uKGPdV29tjNXreGK39XyNqXPu5 jwR4f/IN/QVSM2nCO6jyydMz8ympJ2k6M4RewwmxXBL2KsUzzJWSKTgRNqM5Tdjs 1RhJMjkQVTiiSYerBpHXYCeZLM7/VEfZ120uuzVAYPXo0/R6zuyF7IBgIao9hbfO IQeCMLLfpDQHQhwquTA8ZbWqQusiEoSYHT+kTDa3eXDDbE/2UklAUs9gaatI979x 73zs0Yqxyx2iIGaghACWOAbKdcBWBeCYDw5fFwYVKn4VMQi1+wcxbtOYL767jp9o vfkLXGilXcVkvDjv4fH+e1NoJXXBxq1Ug1silKdOeJzenQF8Q1i3tavkWUVCNfwH qyOIM72NiCEWbYBDcz0lwBxEAyO4o0E6NP1bDc4y50VedEYIbXwSh0QGrdev1abn rjY9vsuUR9oznmZ6BRPPxMTY87gOSHoKvqydgSZUACEgLV9346f5qZf341OReYai MXUmXOM4+LdyaM1+Mec8ppvjMbLw+736NZyZtT2InusEBE+Ddp25L3hYiWnklJu8 2uwv0AoyrwaJ8y6ADOX4thcLZq0gND0Z/Ayz/XvpeI30eftsGUCt5KOVlqwfwOkI 4EQKvk2fAixPxg== =rwei -----END PGP SIGNATURE----- Merge tag 'timers-vdso-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull VDSO infrastructure updates from Thomas Gleixner: - Consolidate the VDSO storage The VDSO data storage and data layout has been largely architecture specific for historical reasons. That increases the maintenance effort and causes inconsistencies over and over. There is no real technical reason for architecture specific layouts and implementations. The architecture specific details can easily be integrated into a generic layout, which also reduces the amount of duplicated code for managing the mappings. Convert all architectures over to a unified layout and common mapping infrastructure. This splits the VDSO data layout into subsystem specific blocks, timekeeping, random and architecture parts, which provides a better structure and allows to improve and update the functionalities without conflict and interaction. - Rework the timekeeping data storage The current implementation is designed for exposing system timekeeping accessors, which was good enough at the time when it was designed. PTP and Time Sensitive Networking (TSN) change that as there are requirements to expose independent PTP clocks, which are not related to system timekeeping. Replace the monolithic data storage by a structured layout, which allows to add support for independent PTP clocks on top while reusing both the data structures and the time accessor implementations. * tag 'timers-vdso-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits) sparc/vdso: Always reject undefined references during linking x86/vdso: Always reject undefined references during linking vdso: Rework struct vdso_time_data and introduce struct vdso_clock vdso: Move architecture related data before basetime data powerpc/vdso: Prepare introduction of struct vdso_clock arm64/vdso: Prepare introduction of struct vdso_clock x86/vdso: Prepare introduction of struct vdso_clock time/namespace: Prepare introduction of struct vdso_clock vdso/namespace: Rename timens_setup_vdso_data() to reflect new vdso_clock struct vdso/vsyscall: Prepare introduction of struct vdso_clock vdso/gettimeofday: Prepare helper functions for introduction of struct vdso_clock vdso/gettimeofday: Prepare do_coarse_timens() for introduction of struct vdso_clock vdso/gettimeofday: Prepare do_coarse() for introduction of struct vdso_clock vdso/gettimeofday: Prepare do_hres_timens() for introduction of struct vdso_clock vdso/gettimeofday: Prepare do_hres() for introduction of struct vdso_clock vdso/gettimeofday: Prepare introduction of struct vdso_clock vdso/helpers: Prepare introduction of struct vdso_clock vdso/datapage: Define vdso_clock to prepare for multiple PTP clocks vdso: Make vdso_time_data cacheline aligned arm64: Make asm/cache.h compatible with vDSO ... |
||
![]() |
a50b4fe095 |
A treewide hrtimer timer cleanup
hrtimers are initialized with hrtimer_init() and a subsequent store to the callback pointer. This turned out to be suboptimal for the upcoming Rust integration and is obviously a silly implementation to begin with. This cleanup replaces the hrtimer_init(T); T->function = cb; sequence with hrtimer_setup(T, cb); The conversion was done with Coccinelle and a few manual fixups. Once the conversion has completely landed in mainline, hrtimer_init() will be removed and the hrtimer::function becomes a private member. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmff5jQTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoVvRD/wKtuwmiA66NJFgXC0qVq82A6fO3bY8 GBdbfysDJIbqGu5PTcULTbJ8qkqv3jeLUv6CcXvS4sZ7y/uJQl2lzf8yrD/0bbwc rLI6sHiPSZmK93kNVN4X5H7kvt7cE/DYC9nnEOgK3BY5FgKc4n9887d4aVBhL8Lv ODwVXvZ+xi351YCj7qRyPU24zt/p4tkkT1o2k4a0HBluqLI0D+V20fke9IERUL8r d1uWKlcn0TqYDesE8HXKIhbst3gx52rMJrXBJDHwFmG6v8Pj1fkTXCVpPo8QcBz8 OTVkpomN9f/Tx4+GZwhZOF86LhLL3OhxD6pT7JhFCXdmSGv+Ez8uyk1YZysM/XpV Juy/1yAcBpDIDkmhMFGdAAn48Nn9Fotty0r4je60zSEp1d/4QMXcFme29qr2JTUE iWnQ/HD6DxUjVHqy7CYvvo26Xegg1C7qgyOVt4PYZwAM1VKF5P3kzYTb4SAdxtop Tpji1sfW9QV08jqMNo6XntD32DSP9S2HqjO9LwBw700jnx2jjJ35fcJs6iodMOUn gckIZLMn3L0OoglPdyA5O7SNTbKE7aFiRKdnT/cJtR3Fa39Qu27CwC5gfiyuie9I Q+LG8GLuYSBHXAR+PBK4GWlzJ7Dn8k3eqmbnLeKpRMsU6ZzcttgA64xhaviN2wN0 iJbvLJeisXr3GA== =bYAX -----END PGP SIGNATURE----- Merge tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A treewide hrtimer timer cleanup hrtimers are initialized with hrtimer_init() and a subsequent store to the callback pointer. This turned out to be suboptimal for the upcoming Rust integration and is obviously a silly implementation to begin with. This cleanup replaces the hrtimer_init(T); T->function = cb; sequence with hrtimer_setup(T, cb); The conversion was done with Coccinelle and a few manual fixups. Once the conversion has completely landed in mainline, hrtimer_init() will be removed and the hrtimer::function becomes a private member" * tag 'timers-cleanups-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits) wifi: rt2x00: Switch to use hrtimer_update_function() io_uring: Use helper function hrtimer_update_function() serial: xilinx_uartps: Use helper function hrtimer_update_function() ASoC: fsl: imx-pcm-fiq: Switch to use hrtimer_setup() RDMA: Switch to use hrtimer_setup() virtio: mem: Switch to use hrtimer_setup() drm/vmwgfx: Switch to use hrtimer_setup() drm/xe/oa: Switch to use hrtimer_setup() drm/vkms: Switch to use hrtimer_setup() drm/msm: Switch to use hrtimer_setup() drm/i915/request: Switch to use hrtimer_setup() drm/i915/uncore: Switch to use hrtimer_setup() drm/i915/pmu: Switch to use hrtimer_setup() drm/i915/perf: Switch to use hrtimer_setup() drm/i915/gvt: Switch to use hrtimer_setup() drm/i915/huc: Switch to use hrtimer_setup() drm/amdgpu: Switch to use hrtimer_setup() stm class: heartbeat: Switch to use hrtimer_setup() i2c: Switch to use hrtimer_setup() iio: Switch to use hrtimer_setup() ... |
||
![]() |
d5048d1176 |
Updates for the core time/timer subsystem:
- Fix a memory ordering issue in posix-timers Posix-timer lookup is lockless and reevaluates the timer validity under the timer lock, but the update which validates the timer is not protected by the timer lock. That allows the store to be reordered against the initialization stores, so that the lookup side can observe a partially initialized timer. That's mostly a theoretical problem, but incorrect nevertheless. - Fix a long standing inconsistency of the coarse time getters The coarse time getters read the base time of the current update cycle without reading the actual hardware clock. NTP frequency adjustment can set the base time backwards. The fine grained interfaces compensate this by reading the clock and applying the new conversion factor, but the coarse grained time getters use the base time directly. That allows the user to observe time going backwards. Cure it by always forwarding base time, when NTP changes the frequency with an immediate step. - Rework of posix-timer hashing The posix-timer hash is not scalable and due to the CRIU timer restore mechanism prone to massive contention on the global hash bucket lock. Replace the global hash lock with a fine grained per bucket locking scheme to address that. - Rework the proc/$PID/timers interface. /proc/$PID/timers is provided for CRIU to be able to restore a timer. The printout happens with sighand lock held and interrupts disabled. That's not required as this can be done with RCU protection as well. - Provide a sane mechanism for CRIU to restore a timer ID CRIU restores timers by creating and deleting them until the kernel internal per process ID counter reached the requested ID. That's horribly slow for sparse timer IDs. Provide a prctl() which allows CRIU to restore a timer with a given ID. When enabled the ID pointer is used as input pointer to read the requested ID from user space. When disabled, the normal allocation scheme (next ID) is active as before. This is backwards compatible for both kernel and user space. - Make hrtimer_update_function() less expensive. The sanity checks are valuable, but expensive for high frequency usage in io/uring. Make the debug checks conditional and enable them only when lockdep is enabled. - Small updates, cleanups and improvements -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmfgQ6wTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoeQzD/9p+EuUGrMbSNaLVMCYFULBbR0lersJ hrGGoKUsNt5T+f6hEEbSLBnkjZcMIj0J+mdIEUiRa73ryw1KmwLk/8MBu0c6u6q3 musDvJqt3dLTG98yN0YeWK3tJDxhSjxIpwcAXusPQ04j16I2fVXFzDQ/kGPq6MTI tdMYzsS3wjuWpi+CbgRSP2HEwu08fIDVsQ7Grynh4Kmd31apne4ZgF2UVp6UiZyp 8yJHZgVzJcFs7Y3MS6XTgezHnuADxMY1irzbXmok19941X8mZz2QRIpGQX+oMh6o g7SG2lj9i8YbLqU9/5RbC5ppjRcWfogDpW0Lk+OmdOpr0RiXTmx5Lz8Egxex9wG5 pUJszeTY+bLw7mmYmkGZyBz+PNoGgVM5KFZRe5ENvYM8Gy8LUW5DA9zvxeHqDDz1 FiMmKdYrwr8VCKqx+8hJQdzlzRbepxq9sNzDdMKVOUcFdGUVWekfG6ZFkfLKxwzA XDTKJilzXbAAj4r57vEvOCYLUZH/ZsFK4yyg0O53fEg6fj87EbTDb5+YUGazb3+C yNTEOQIT8LtutzLR9+xeLi92k+6zlJ4c1PfqBx5Kv/TwBrIfV1P8N2c6TCOWDoRM AOvo2SXEA/jEPix2GjT5jalSV1mROEXo2T9/G7kz4H7K+DkI/dGgS9mXyUDO2mMd ouOxYN0GohVqTQ== =XUGH -----END PGP SIGNATURE----- Merge tag 'timers-core-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner: - Fix a memory ordering issue in posix-timers Posix-timer lookup is lockless and reevaluates the timer validity under the timer lock, but the update which validates the timer is not protected by the timer lock. That allows the store to be reordered against the initialization stores, so that the lookup side can observe a partially initialized timer. That's mostly a theoretical problem, but incorrect nevertheless. - Fix a long standing inconsistency of the coarse time getters The coarse time getters read the base time of the current update cycle without reading the actual hardware clock. NTP frequency adjustment can set the base time backwards. The fine grained interfaces compensate this by reading the clock and applying the new conversion factor, but the coarse grained time getters use the base time directly. That allows the user to observe time going backwards. Cure it by always forwarding base time, when NTP changes the frequency with an immediate step. - Rework of posix-timer hashing The posix-timer hash is not scalable and due to the CRIU timer restore mechanism prone to massive contention on the global hash bucket lock. Replace the global hash lock with a fine grained per bucket locking scheme to address that. - Rework the proc/$PID/timers interface. /proc/$PID/timers is provided for CRIU to be able to restore a timer. The printout happens with sighand lock held and interrupts disabled. That's not required as this can be done with RCU protection as well. - Provide a sane mechanism for CRIU to restore a timer ID CRIU restores timers by creating and deleting them until the kernel internal per process ID counter reached the requested ID. That's horribly slow for sparse timer IDs. Provide a prctl() which allows CRIU to restore a timer with a given ID. When enabled the ID pointer is used as input pointer to read the requested ID from user space. When disabled, the normal allocation scheme (next ID) is active as before. This is backwards compatible for both kernel and user space. - Make hrtimer_update_function() less expensive. The sanity checks are valuable, but expensive for high frequency usage in io/uring. Make the debug checks conditional and enable them only when lockdep is enabled. - Small updates, cleanups and improvements * tag 'timers-core-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits) selftests/timers: Improve skew_consistency by testing with other clockids timekeeping: Fix possible inconsistencies in _COARSE clockids posix-timers: Drop redundant memset() invocation selftests/timers/posix-timers: Add a test for exact allocation mode posix-timers: Provide a mechanism to allocate a given timer ID posix-timers: Dont iterate /proc/$PID/timers with sighand:: Siglock held posix-timers: Make per process list RCU safe posix-timers: Avoid false cacheline sharing posix-timers: Switch to jhash32() posix-timers: Improve hash table performance posix-timers: Make signal_struct:: Next_posix_timer_id an atomic_t posix-timers: Make lock_timer() use guard() posix-timers: Rework timer removal posix-timers: Simplify lock/unlock_timer() posix-timers: Use guards in a few places posix-timers: Remove SLAB_PANIC from kmem cache posix-timers: Remove a few paranoid warnings posix-timers: Cleanup includes posix-timers: Add cond_resched() to posix_timer_add() search loop posix-timers: Initialise timer before adding it to the hash table ... |
||
![]() |
0ae2062ee3 |
A single update for futexes:
Use a precomputed mask for the hash computation instead of computing the mask from the size on every invocation. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmff5PETHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoTCJEADEYDoQGkMCwUoTDJ4xVLohb89lH4KN cB38tF0YtHDMoBk2AXzHvDJioqPr5+YRQn3cYVpqQ6ajV0fVay8XDIdbD0GrhSUk KSiz6sfIVT8c18bvgK/YrnQL0B5AytY5X+hYd3A5tS7NHlKNs3nHuZetNFrklDUM RI7dY2dOU0jjzEB1WFMSIU2gIt/B9994fd+uhsSfKhJgVojOO5VpuREzMRb5nTYr sZRvel9JqfvADz1Wr9DpX5JGQ65m1uEXFK8bbdYemJmqqKbdQt0pEWWLC5PJkhNJ z9vih+TPJqx7THNgind+KNNO7Yf3V1aKiks+GKVOn0WplhybmlhqqYdfSgzFZOpR BymsxPQ9AQRf7O+j23f5Ys1Ty/1mc+ZcKYPCmasoG3MngmMt6nS+7nwiK1lnZEYQ hd7EpFKl8pPe3Ga5Cp6iFj64vIna5eMGQ0vFL7DqCY8mnzDnqhZBMk6xrJ/QEgXj rpLYKwDXz34EwOKSuWtFQcds4HUDeOxdJQCSbmfpQo4wh08HmcNK+lr3PXub6a7y ncEFOD4rmRLXfofQyF/m4MKwv5I/nwkSNDQh/hLNfHrmWxGZPE8J9vFJccQcAsf6 2vL5G0cNTKjNA5Afsi8bpi8Khlo9hJTW18Sw7DegpEwJxJjww723X+3Ati8qTjtr clVhRAfjHPp5WQ== =GyIS -----END PGP SIGNATURE----- Merge tag 'locking-futex-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull futex update from Thomas Gleixner: "A single update for futexes: Use a precomputed mask for the hash computation instead of computing the mask from the size on every invocation" * tag 'locking-futex-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Use a hashmask instead of hashsize |
||
![]() |
0f40464674 |
Updates for interrupt chip drivers:
- Support for hard indices on RISC-V. The hart index identifies a hart (core) within a specific interrupt domain in RISC-V's Priviledged Architecture. - Rework of the RISC-V MSI driver. This moves the driver over to the generic MSI library and solves the affinity problem of unmaskable PCI/MSI controllers. Unmaskable PCI/MSI controllers are prone to lose interrupts when the MSI message is updated to change the affinity because the message write consists of three 32-bit subsequent writes, which update address and data. As these writes are non-atomic versus the device raising an interrupt, the device can observe a half written update and issue an interrupt on the wrong vector. This is mitiated by a carefully orchestrated step by step update and the observation of an eventually pending interrupt on the CPU which issues the update. The algorithm follows the well established method of the X86 MSI driver. - A new driver for the RISC-V Sophgo SG2042 MSI controller - Overhaul of the Renesas RZQ2L driver. Simplification of the probe function by using devm_*() mechanisms, which avoid the endless list of error prone gotos in the failure paths. - Expand the Renesas RZV2H driver to support RZ/G3E SoCs - A workaround for Rockchip 3568002 erratum in the GIC-V3 driver to ensure that the addressing is limited to the lower 32-bit of the physical address space. - Add support for the Allwinner AS23 NMI controller - Expand the IMX irqsteer driver to handle up to 960 input interrupts - The usual small updates, cleanups and device tree changes. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmff454THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoZqoD/4kdHzbxfLpf7vC3NnG8NWwTq5FpbSx 6grQC9hWNMAs4n2IFjJRFLrjeX3AcdAQXL/BWuM0LfW9tQDQaVmqlSIlB/bn69KB 7HyAR6ozbOgnHKGAqFUXSLf+4pq+6q3mOgGKIF289dy14HFu4ta0DqKgkPZeQnVs R/J8i7REUnn+YuxzSt5eOqyDPyt2EHJosSUABSWQZBlrM9jy1W7f6NqDFwawiVsa +tv4U/bz91vjzVxwTIgt7nJK+b2HVYdxoZYuKJwPaTsj26ANPp6ltjRTeOmZhb5h uKgw+OyzDnk6q+tjGcRqrqwl291VKxCvnRiqHFfu3CERdmI9qvpN9IRcEJqIbkcN cakekhAyt7OO7sEPcql5vBL97e9hpb7EcH78gYxwHf8Dy0rFZUvSC5v+L6VRFnJS XcKA1L+f9B6u5qxnBtLan9IW08HYNdvmPq6AuVjk+ndKioPUFqB2q6AtXpuA3Rmu Y3XH/wh/q5wk0pgeByxQW6swsfpMN3OYK3mpLx475wFh2NKzcdGlwGhDFhiw8DKX m1AESy3UZatj1a0qGaFS/M+mm9KGrDYIMrje832Wf4Yf1LGmTsDkd3/V99oazSsq Jm4qhDASXChJXd0imQICX9hPw0aHTlLYNs54obUXVULH4HivQKIgWhUXrjG0dBDL +tttjuv5FJxr3A== =jPHa -----END PGP SIGNATURE----- Merge tag 'irq-drivers-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq driver updates from Thomas Gleixner: - Support for hard indices on RISC-V. The hart index identifies a hart (core) within a specific interrupt domain in RISC-V's Priviledged Architecture. - Rework of the RISC-V MSI driver This moves the driver over to the generic MSI library and solves the affinity problem of unmaskable PCI/MSI controllers. Unmaskable PCI/MSI controllers are prone to lose interrupts when the MSI message is updated to change the affinity because the message write consists of three 32-bit subsequent writes, which update address and data. As these writes are non-atomic versus the device raising an interrupt, the device can observe a half written update and issue an interrupt on the wrong vector. This is mitiated by a carefully orchestrated step by step update and the observation of an eventually pending interrupt on the CPU which issues the update. The algorithm follows the well established method of the X86 MSI driver. - A new driver for the RISC-V Sophgo SG2042 MSI controller - Overhaul of the Renesas RZQ2L driver Simplification of the probe function by using devm_*() mechanisms, which avoid the endless list of error prone gotos in the failure paths. - Expand the Renesas RZV2H driver to support RZ/G3E SoCs - A workaround for Rockchip 3568002 erratum in the GIC-V3 driver to ensure that the addressing is limited to the lower 32-bit of the physical address space. - Add support for the Allwinner AS23 NMI controller - Expand the IMX irqsteer driver to handle up to 960 input interrupts - The usual small updates, cleanups and device tree changes * tag 'irq-drivers-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits) irqchip/imx-irqsteer: Support up to 960 input interrupts irqchip/sunxi-nmi: Support Allwinner A523 NMI controller dt-bindings: irq: sun7i-nmi: Document the Allwinner A523 NMI controller irqchip/davinci-cp-intc: Remove public header irqchip/renesas-rzv2h: Add RZ/G3E support irqchip/renesas-rzv2h: Update macros ICU_TSSR_TSSEL_{MASK,PREP} irqchip/renesas-rzv2h: Update TSSR_TIEN macro irqchip/renesas-rzv2h: Add field_width to struct rzv2h_hw_info irqchip/renesas-rzv2h: Add max_tssel to struct rzv2h_hw_info irqchip/renesas-rzv2h: Add struct rzv2h_hw_info with t_offs variable irqchip/renesas-rzv2h: Use devm_pm_runtime_enable() irqchip/renesas-rzv2h: Use devm_reset_control_get_exclusive_deasserted() irqchip/renesas-rzv2h: Simplify rzv2h_icu_init() irqchip/renesas-rzv2h: Drop irqchip from struct rzv2h_icu_priv irqchip/renesas-rzv2h: Fix wrong variable usage in rzv2h_tint_set_type() dt-bindings: interrupt-controller: renesas,rzv2h-icu: Document RZ/G3E SoC riscv: sophgo: dts: Add msi controller for SG2042 irqchip: Add the Sophgo SG2042 MSI interrupt controller dt-bindings: interrupt-controller: Add Sophgo SG2042 MSI arm64: dts: rockchip: rk356x: Move PCIe MSI to use GIC ITS instead of MBI ... |
||
![]() |
36f5f026df |
Updates for MSI interrupts
- Switch the MSI descriptor locking to guards - Replace the broken PCI/TPH implementation, which lacks any form of serialization against concurrent modifications with a properly serialized mechanism in the PCI/MSI core code. - Replace the MSI descriptor abuse in the SCSI/UFS Qualcom driver with dedicated driver internal storage. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmff5JcTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYocp6D/9IUXX/8HsBVZwCBY1/SYAC8fnTUNGG fMxPTX0BSRNOaChvGkyIllEEkNbh+WDMh1qkV94fusJCRvQVaDnwJZ9zoTMJekiM ZEBXdqTiDyqUG5LzFeoYQFu/XBth59HMkaDqCxJEy+YbjnTNamX8QS8A/FZlPNH0 uRs4m7oQxABL6E/JkjOkC0UOqddZ5BDwbLgXoGq+uDSLTicO9yR6EsigVacoreap zmkaUPRmL9NM8onrP2/9LduWwgcxn9u2wvd0uOCoIHBUKrlIpHkuNjieQ4R8QCP0 b7INNTEnwv815O11EoU+AVoL+sTYjUVOAUzfeNejlUspwvjBASm5lUkyB2TS5nq7 TtCxFyIS7/eBXwrKeV1w9ZcGjcso0DRJPvBQvhhL+q00Mtlrbpy7NUIF5S/8I60q QWLN1h1iKISl9aT5p7K1v1lnGyPTb942PNmRc9Cd8Si3QMt6be3RkAJekPWU78Wd Q9fXiMS4vyZhtt1S0vKU1L80Ezg3JVL6tWWpsSdZhMBM8wYiw1PI9NYtYEO2lzP/ hDHOu4EaItPg5A6Z+IDOj13pddtQsnssiWJmcDvarnJKBOfmAbwOcNGpwcwCbm62 bephtPc4h7En6/d4lsi8yzK6dSpf7yHLxTRrp9lomMaydEjfsNWcAs/nN2Y+wbr4 UMuZDSrlTGHP0Q== =Vl8Q -----END PGP SIGNATURE----- Merge tag 'irq-msi-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull MSI irq updates from Thomas Gleixner: - Switch the MSI descriptor locking to guards - Replace the broken PCI/TPH implementation, which lacks any form of serialization against concurrent modifications with a properly serialized mechanism in the PCI/MSI core code - Replace the MSI descriptor abuse in the SCSI/UFS Qualcom driver with dedicated driver internal storage * tag 'irq-msi-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/msi: Rename msi_[un]lock_descs() scsi: ufs: qcom: Remove the MSI descriptor abuse PCI/TPH: Replace the broken MSI-X control word update PCI/MSI: Provide a sane mechanism for TPH PCI: hv: Switch MSI descriptor locking to guard() PCI/MSI: Switch to MSI descriptor locking to guard() NTB/msi: Switch MSI descriptor locking to lock guard() soc: ti: ti_sci_inta_msi: Switch MSI descriptor locking to guard() genirq/msi: Use lock guards for MSI descriptor locking cleanup: Provide retain_ptr() genirq/msi: Make a few functions static |
||
![]() |
43a7eec035 |
A small set of core changes for the interrupt subsystem:
- Expose the MSI message in the existing debug filesystem dump. That's useful for validation and debugging. - Small cleanups -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmff3hwTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoXLsEACHNrH1Pl7J7L7NUu2XW/CW1pPrfw2d A0GrGdAEGnLr8Pt4u45uNwYxmVcEQEdlobJ/nj/x31vg1OKFccirBbFHiLRqf1KK dwNeasSrMUuickBLSF+a7IxtdIYYPWUslrao67bTjxrStOT32EgDVudg1jRe3AB1 O/keSLicj6TEqUlEhEdUTANFY3aInJqPvObveuqcqggVhK6SWqG87DnYeH20hVE4 OPQHp32PD5pzSrYc+aPlc0UpaWmZdxnIt5pHU4QNp1ZaxVVKwUxAJGcJrHiPw0id jVz6Q3e98dWEY1oRl7qKkcWRr+1FrA4NlGqvGxSWoxZEFyPZrTxA2UZ7LNF75akq EpA/my70a92ZJ5VHdyWh6jFQAZHdPlhNu1wQhQQIFfp1nqv8ZMVL+IkVgf3xO1zK IT4Ru60ieR0TXPPO4lzf7mojbywgCrEYMm6QP3uulkcWFKfOEPZlBhspfEIVAHPZ v+bNebNYx7i50W8n6ataRChHVhoHrZsGygsK8K4z/8UWcnqevYojnAOVF/e6rVN+ NPHKbz/gzxIfDVVfoKmq+aHgnQk1K3ZaYvc3OjyHgW5KFqUTUvz5R1Hgdc4cmcPs BejIhhaRd47s5yl90tx9HmXsOYDznQDVxbmR1EWl0y9QO8uCJ0NIdlyXYBrLbm5q 3GgDiQWoaEyg+w== =zPgs -----END PGP SIGNATURE----- Merge tag 'irq-core-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "A small set of core changes for the interrupt subsystem: - Expose the MSI message in the existing debug filesystem dump. That's useful for validation and debugging. - Small cleanups" * tag 'irq-core-2025-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Make a few functions static irqdomain: Remove extern from function declarations genirq/msi: Expose MSI message data in debugfs |
||
![]() |
9133607de3
|
exit: fix the usage of delay_group_leader->exit_code in do_notify_parent() and pidfs_exit()
Consider a process with a group leader L and a sub-thread T. L does sys_exit(1), then T does sys_exit_group(2). In this case wait_task_zombie(L) will notice SIGNAL_GROUP_EXIT and use L->signal->group_exit_code, this is correct. But, before that, do_notify_parent(L) called by release_task(T) will use L->exit_code != L->signal->group_exit_code, and this is not consistent. We don't really care, I think that nobody relies on the info which comes with SIGCHLD, if nothing else SIGCHLD < SIGRTMIN can be queued only once. But pidfs_exit() is more problematic, I think pidfs_exit_info->exit_code should report ->group_exit_code in this case, just like wait_task_zombie(). TODO: with this change we can hopefully cleanup (or may be even kill) the similar SIGNAL_GROUP_EXIT checks, at least in wait_task_zombie(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250324171941.GA13114@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
0b7747a547
|
pidfs: cleanup the usage of do_notify_pidfd()
If a single-threaded process exits do_notify_pidfd() will be called twice, from exit_notify() and right after that from do_notify_parent(). 1. Change exit_notify() to call do_notify_pidfd() if the exiting task is not ptraced and it is not a group leader. 2. Change do_notify_parent() to call do_notify_pidfd() unconditionally. If tsk is not ptraced, do_notify_parent() will only be called when it is a group-leader and thread_group_empty() is true. This means that if tsk is ptraced, do_notify_pidfd() will be called from do_notify_parent() even if tsk is a delay_group_leader(). But this case is less common, and apart from the unnecessary __wake_up() is harmless. Granted, this unnecessary __wake_up() can be avoided, but I don't want to do it in this patch because it's just a consequence of another historical oddity: we notify the tracer even if !thread_group_empty(), but do_wait() from debugger can't work until all other threads exit. With or without this patch we should either eliminate do_notify_parent() in this case, or change do_wait(WEXITED) to untrace the ptraced delay_group_leader() at least when ptrace_reparented(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250323171955.GA834@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> |
||
![]() |
61c39d8c83 |
lockdep: Fix wait context check on softirq for PREEMPT_RT
Since: 0c1d7a2c2d32 ("lockdep: Remove softirq accounting on PREEMPT_RT.") the wait context test for mutex usage within "in softirq context" fails as it references @softirq_context: | wait context tests | -------------------------------------------------------------------------- | rcu | raw | spin |mutex | -------------------------------------------------------------------------- in hardirq context: ok | ok | ok | ok | in hardirq context (not threaded): ok | ok | ok | ok | in softirq context: ok | ok | ok |FAILED| As a fix, add lockdep map for BH disabled section. This fixes the issue by letting us catch cases when local_bh_disable() gets called with preemption disabled where local_lock doesn't get acquired. In the case of "in softirq context" selftest, local_bh_disable() was being called with preemption disable as it's early in the boot. [ boqun: Move the lockdep annotations into __local_bh_*() to avoid false positives because of unpaired local_bh_disable() reported by Borislav Petkov and Peter Zijlstra, and make bh_lock_map only exist for PREEMPT_RT. ] [ mingo: Restored authorship and improved the bh_lock_map definition. ] Signed-off-by: Ryo Takakura <ryotkkr98@gmail.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20250321143322.79651-1-boqun.feng@gmail.com |
||
![]() |
e34c38057a |
[ Merge note: this pull request depends on you having merged
two locking commits in the locking tree, part of the locking-core-2025-03-22 pull request. ] x86 CPU features support: - Generate the <asm/cpufeaturemasks.h> header based on build config (H. Peter Anvin, Xin Li) - x86 CPUID parsing updates and fixes (Ahmed S. Darwish) - Introduce the 'setcpuid=' boot parameter (Brendan Jackman) - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan Jackman) - Utilize CPU-type for CPU matching (Pawan Gupta) - Warn about unmet CPU feature dependencies (Sohil Mehta) - Prepare for new Intel Family numbers (Sohil Mehta) Percpu code: - Standardize & reorganize the x86 percpu layout and related cleanups (Brian Gerst) - Convert the stackprotector canary to a regular percpu variable (Brian Gerst) - Add a percpu subsection for cache hot data (Brian Gerst) - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak) - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak) MM: - Add support for broadcast TLB invalidation using AMD's INVLPGB instruction (Rik van Riel) - Rework ROX cache to avoid writable copy (Mike Rapoport) - PAT: restore large ROX pages after fragmentation (Kirill A. Shutemov, Mike Rapoport) - Make memremap(MEMREMAP_WB) map memory as encrypted by default (Kirill A. Shutemov) - Robustify page table initialization (Kirill A. Shutemov) - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn) - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW (Matthew Wilcox) KASLR: - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir Singh) CPU bugs: - Implement FineIBT-BHI mitigation (Peter Zijlstra) - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta) - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta) - RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta) System calls: - Break up entry/common.c (Brian Gerst) - Move sysctls into arch/x86 (Joel Granados) Intel LAM support updates: (Maciej Wieczor-Retman) - selftests/lam: Move cpu_has_la57() to use cpuinfo flag - selftests/lam: Skip test if LAM is disabled - selftests/lam: Test get_user() LAM pointer handling AMD SMN access updates: - Add SMN offsets to exclusive region access (Mario Limonciello) - Add support for debugfs access to SMN registers (Mario Limonciello) - Have HSMP use SMN through AMD_NODE (Yazen Ghannam) Power management updates: (Patryk Wlazlyn) - Allow calling mwait_play_dead with an arbitrary hint - ACPI/processor_idle: Add FFH state handling - intel_idle: Provide the default enter_dead() handler - Eliminate mwait_play_dead_cpuid_hint() Bootup: Build system: - Raise the minimum GCC version to 8.1 (Brian Gerst) - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor) Kconfig: (Arnd Bergmann) - Add cmpxchg8b support back to Geode CPUs - Drop 32-bit "bigsmp" machine support - Rework CONFIG_GENERIC_CPU compiler flags - Drop configuration options for early 64-bit CPUs - Remove CONFIG_HIGHMEM64G support - Drop CONFIG_SWIOTLB for PAE - Drop support for CONFIG_HIGHPTE - Document CONFIG_X86_INTEL_MID as 64-bit-only - Remove old STA2x11 support - Only allow CONFIG_EISA for 32-bit Headers: - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers (Thomas Huth) Assembly code & machine code patching: - x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf) - x86/alternatives: Simplify callthunk patching (Peter Zijlstra) - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf) - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf) - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra) - x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h> (Uros Bizjak) - Use named operands in inline asm (Uros Bizjak) - Improve performance by using asm_inline() for atomic locking instructions (Uros Bizjak) Earlyprintk: - Harden early_serial (Peter Zijlstra) NMI handler: - Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus() (Waiman Long) Miscellaneous fixes and cleanups: - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner, Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly Kuznetsov, Xin Li, liuye. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfenkQRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1g1FRAAi6OFTSn/5aeLMI0IMNBxJ6ddQiFc3imd 7+C/vU5nul4CyDs8mKyj/+f/DDrbkG9lKz3VG631Yl237lXHjD8XWcVMeC/1z/q0 3zInDIloE9/nBHRPkF6F7fARBLBZ0LFgaBsGrCo7mwpGybiQdqGcqcxllvTbtXaw OHta4q6ok+lBDNlfc0v6H4cRnzhmmlKu6Ng0j6UI3V7uFhi3vtxas32ltDQtzorq 2+jbV6/+kbrrv+xPC+jlzOFhTEKRupNPQXmvyQteoQg6G3kqAKMDvBthGXd1rHuX Qa+BoDIifE/2NiVeRwNrhoqYH/pHCzUzDREW5IW8+ca+4XNKuzAC6EuC8CeCzyK1 q8ZjZjooQW4zEeVFeJYllHONzJYfxfSH5CLsnbcuhq99yfGlrQhF1qL72/Omn1w/ DfPJM8Zt5zyKvLqUg3Md+fkVCO2wyDNhB61QPzRgHF+yD+rvuDpoqvUWir+w7cSn fwEDVZGXlFx6dumtSrqRaTd1nvFt80s8yP2ll09DMvGQ8D/yruS7hndGAmmJVCSW NAfd8pSjq5v2+ux2UR92/Cc3VF3SjaUqHBOp/Nq9rESya18ZVa3cJpHhVYYtPIVf THW0h07RIkGVKs1uq+5ekLCr/8uAZg58UPIqmhTuW0ttymRHCNfohR45FQZzy+0M tJj1oc2TIZw= =Dcb3 -----END PGP SIGNATURE----- Merge tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Ingo Molnar: "x86 CPU features support: - Generate the <asm/cpufeaturemasks.h> header based on build config (H. Peter Anvin, Xin Li) - x86 CPUID parsing updates and fixes (Ahmed S. Darwish) - Introduce the 'setcpuid=' boot parameter (Brendan Jackman) - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan Jackman) - Utilize CPU-type for CPU matching (Pawan Gupta) - Warn about unmet CPU feature dependencies (Sohil Mehta) - Prepare for new Intel Family numbers (Sohil Mehta) Percpu code: - Standardize & reorganize the x86 percpu layout and related cleanups (Brian Gerst) - Convert the stackprotector canary to a regular percpu variable (Brian Gerst) - Add a percpu subsection for cache hot data (Brian Gerst) - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak) - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak) MM: - Add support for broadcast TLB invalidation using AMD's INVLPGB instruction (Rik van Riel) - Rework ROX cache to avoid writable copy (Mike Rapoport) - PAT: restore large ROX pages after fragmentation (Kirill A. Shutemov, Mike Rapoport) - Make memremap(MEMREMAP_WB) map memory as encrypted by default (Kirill A. Shutemov) - Robustify page table initialization (Kirill A. Shutemov) - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn) - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW (Matthew Wilcox) KASLR: - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir Singh) CPU bugs: - Implement FineIBT-BHI mitigation (Peter Zijlstra) - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta) - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta) - RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta) System calls: - Break up entry/common.c (Brian Gerst) - Move sysctls into arch/x86 (Joel Granados) Intel LAM support updates: (Maciej Wieczor-Retman) - selftests/lam: Move cpu_has_la57() to use cpuinfo flag - selftests/lam: Skip test if LAM is disabled - selftests/lam: Test get_user() LAM pointer handling AMD SMN access updates: - Add SMN offsets to exclusive region access (Mario Limonciello) - Add support for debugfs access to SMN registers (Mario Limonciello) - Have HSMP use SMN through AMD_NODE (Yazen Ghannam) Power management updates: (Patryk Wlazlyn) - Allow calling mwait_play_dead with an arbitrary hint - ACPI/processor_idle: Add FFH state handling - intel_idle: Provide the default enter_dead() handler - Eliminate mwait_play_dead_cpuid_hint() Build system: - Raise the minimum GCC version to 8.1 (Brian Gerst) - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor) Kconfig: (Arnd Bergmann) - Add cmpxchg8b support back to Geode CPUs - Drop 32-bit "bigsmp" machine support - Rework CONFIG_GENERIC_CPU compiler flags - Drop configuration options for early 64-bit CPUs - Remove CONFIG_HIGHMEM64G support - Drop CONFIG_SWIOTLB for PAE - Drop support for CONFIG_HIGHPTE - Document CONFIG_X86_INTEL_MID as 64-bit-only - Remove old STA2x11 support - Only allow CONFIG_EISA for 32-bit Headers: - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers (Thomas Huth) Assembly code & machine code patching: - x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf) - x86/alternatives: Simplify callthunk patching (Peter Zijlstra) - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf) - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf) - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra) - x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h> (Uros Bizjak) - Use named operands in inline asm (Uros Bizjak) - Improve performance by using asm_inline() for atomic locking instructions (Uros Bizjak) Earlyprintk: - Harden early_serial (Peter Zijlstra) NMI handler: - Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus() (Waiman Long) Miscellaneous fixes and cleanups: - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner, Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly Kuznetsov, Xin Li, liuye" * tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits) zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault x86/asm: Make asm export of __ref_stack_chk_guard unconditional x86/mm: Only do broadcast flush from reclaim if pages were unmapped perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones perf/x86/intel, x86/cpu: Simplify Intel PMU initialization x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions x86/asm: Use asm_inline() instead of asm() in clwb() x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h> x86/hweight: Use asm_inline() instead of asm() x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm() x86/hweight: Use named operands in inline asm() x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP x86/head/64: Avoid Clang < 17 stack protector in startup code x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h> x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro x86/cpu/intel: Limit the non-architectural constant_tsc model checks x86/mm/pat: Replace Intel x86_model checks with VFM ones x86/cpu/intel: Fix fast string initialization for extended Families ... |
||
![]() |
327ecdbc0f |
Performance events updates for v6.15:
Core: - Move perf_event sysctls into kernel/events/ (Joel Granados) - Use POLLHUP for pinned events in error (Namhyung Kim) - Avoid the read if the count is already updated (Peter Zijlstra) - Allow the EPOLLRDNORM flag for poll (Tao Chen) - locking/percpu-rwsem: Add guard support (Peter Zijlstra) [ NOTE: this got (mis-)merged into the perf tree due to related work. ] perf_pmu_unregister() related improvements: (Peter Zijlstra) - Simplify the perf_event_alloc() error path - Simplify the perf_pmu_register() error path - Simplify perf_pmu_register() - Simplify perf_init_event() - Simplify perf_event_alloc() - Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count - Add this_cpc() helper - Introduce perf_free_addr_filters() - Robustify perf_event_free_bpf_prog() - Simplify the perf_mmap() control flow - Further simplify perf_mmap() - Remove retry loop from perf_mmap() - Lift event->mmap_mutex in perf_mmap() - Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes - Fix perf_mmap() failure path Uprobes: - Harden x86 uretprobe syscall trampoline check (Jiri Olsa) - Remove redundant spinlock in uprobe_deny_signal() (Liao Chang) - Remove the spinlock within handle_singlestep() (Liao Chang) x86 Intel PMU enhancements: - Support PEBS counters snapshotting (Kan Liang) - Fix intel_pmu_read_event() (Kan Liang) - Extend per event callchain limit to branch stack (Kan Liang) - Fix system-wide LBR profiling (Kan Liang) - Allocate bts_ctx only if necessary (Li RongQing) - Apply static call for drain_pebs (Peter Zijlstra) x86 AMD PMU enhancements: (Ravi Bangoria) - Remove pointless sample period check - Fix ->config to sample period calculation for OP PMU - Fix perf_ibs_op.cnt_mask for CurCnt - Don't allow freq mode event creation through ->config interface - Add PMU specific minimum period - Add ->check_period() callback - Ceil sample_period to min_period - Add support for OP Load Latency Filtering - Update DTLB/PageSize decode logic Hardware breakpoints: - Return EOPNOTSUPP for unsupported breakpoint type (Saket Kumar Bhaskar) Hardlockup detector improvements: (Li Huafei) - perf_event memory leak - Warn if watchdog_ev is leaked Fixes and cleanups: - Misc fixes and cleanups (Andy Shevchenko, Kan Liang, Peter Zijlstra, Ravi Bangoria, Thorsten Blum, XieLudan) Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfehRIRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hF3g//TCAQijI6OFNpYiD1xoyMq4m+baIYhYx0 lnxwxhsN58JFcEJeWIEGLACqUePyH68jNKVSr9sIoeV4gnnMX+x2Ny6rh/1H3Ox+ jQyVmPdFmKa8QG7wGNjcDteIzlEKK4zqruXWaG54LX2e6kbQZWwd0I21MyXkrHXb oMIfyZbCAWuPW1wefZm8FPgImT+nvwOosyx90OVagGqk5mYdNb9DFhMjQveStHdQ BnWU6rYdW1c2eXKpeuvxY4uWQoCELC6WntLimvcswy6fb+9LtbglpCYQOGGDrGvp v3RASf/8clFVSau8P/8NEaNgLgjN/e3eN/fAoSut8Z22nAeBC6qv4qjFt1piDpbs AaEXYCYM0/Tfzjp3ctPsFrxbKvB8q2qhxSm37Co0Ix6WyJn3JQbNx48g8GIod2os eGPXSZzoz9O8coeTKKbxWp4fpAjFfyfe/ovWQuVd8JI4bYj7Mi63J+RxQDd2TkJP H+IgxZoamJExgS1YcKJUBtw7QKQm5pHFx03Br7KsNxgmHy7JdoN9bh0h14pkeXjB MnAvWOS5ouuriJgQ+4bqAezS8DSHnDdmFmWgNEEqAlOD9Zy9hDXJ2GiqbHKMyRNC ae35o0PDUFTIX9O5NPIDUyWtJb5uH/S1lQhS7GD+ODlMDIX+ny+REXf9krSCR1H0 GUqq2UmxBGA= =iPmA -----END PGP SIGNATURE----- Merge tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: "Core: - Move perf_event sysctls into kernel/events/ (Joel Granados) - Use POLLHUP for pinned events in error (Namhyung Kim) - Avoid the read if the count is already updated (Peter Zijlstra) - Allow the EPOLLRDNORM flag for poll (Tao Chen) - locking/percpu-rwsem: Add guard support [ NOTE: this got (mis-)merged into the perf tree due to related work ] (Peter Zijlstra) perf_pmu_unregister() related improvements: (Peter Zijlstra) - Simplify the perf_event_alloc() error path - Simplify the perf_pmu_register() error path - Simplify perf_pmu_register() - Simplify perf_init_event() - Simplify perf_event_alloc() - Merge struct pmu::pmu_disable_count into struct perf_cpu_pmu_context::pmu_disable_count - Add this_cpc() helper - Introduce perf_free_addr_filters() - Robustify perf_event_free_bpf_prog() - Simplify the perf_mmap() control flow - Further simplify perf_mmap() - Remove retry loop from perf_mmap() - Lift event->mmap_mutex in perf_mmap() - Detach 'struct perf_cpu_pmu_context' and 'struct pmu' lifetimes - Fix perf_mmap() failure path Uprobes: - Harden x86 uretprobe syscall trampoline check (Jiri Olsa) - Remove redundant spinlock in uprobe_deny_signal() (Liao Chang) - Remove the spinlock within handle_singlestep() (Liao Chang) x86 Intel PMU enhancements: - Support PEBS counters snapshotting (Kan Liang) - Fix intel_pmu_read_event() (Kan Liang) - Extend per event callchain limit to branch stack (Kan Liang) - Fix system-wide LBR profiling (Kan Liang) - Allocate bts_ctx only if necessary (Li RongQing) - Apply static call for drain_pebs (Peter Zijlstra) x86 AMD PMU enhancements: (Ravi Bangoria) - Remove pointless sample period check - Fix ->config to sample period calculation for OP PMU - Fix perf_ibs_op.cnt_mask for CurCnt - Don't allow freq mode event creation through ->config interface - Add PMU specific minimum period - Add ->check_period() callback - Ceil sample_period to min_period - Add support for OP Load Latency Filtering - Update DTLB/PageSize decode logic Hardware breakpoints: - Return EOPNOTSUPP for unsupported breakpoint type (Saket Kumar Bhaskar) Hardlockup detector improvements: (Li Huafei) - perf_event memory leak - Warn if watchdog_ev is leaked Fixes and cleanups: - Misc fixes and cleanups (Andy Shevchenko, Kan Liang, Peter Zijlstra, Ravi Bangoria, Thorsten Blum, XieLudan)" * tag 'perf-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits) perf: Fix __percpu annotation perf: Clean up pmu specific data perf/x86: Remove swap_task_ctx() perf/x86/lbr: Fix shorter LBRs call stacks for the system-wide mode perf: Supply task information to sched_task() perf: attach/detach PMU specific data locking/percpu-rwsem: Add guard support perf: Save PMU specific data in task_struct perf: Extend per event callchain limit to branch stack perf/ring_buffer: Allow the EPOLLRDNORM flag for poll perf/core: Use POLLHUP for pinned events in error perf/core: Use sysfs_emit() instead of scnprintf() perf/core: Remove optional 'size' arguments from strscpy() calls perf/x86/intel/bts: Check if bts_ctx is allocated when calling BTS functions uprobes/x86: Harden uretprobe syscall trampoline check watchdog/hardlockup/perf: Warn if watchdog_ev is leaked watchdog/hardlockup/perf: Fix perf_event memory leak perf/x86: Annotate struct bts_buffer::buf with __counted_by() perf/core: Clean up perf_try_init_event() perf/core: Fix perf_mmap() failure path ... |