296 Commits

Author SHA1 Message Date
Linus Torvalds
7d20aa5c32 Power management updates for 6.15-rc1
- Manage sysfs attributes and boost frequencies efficiently from
    cpufreq core to reduce boilerplate code in drivers (Viresh Kumar).
 
  - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
    Dhananjay Ugwekar, Imran Shaik, zuoqian).
 
  - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
    Bai).
 
  - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski).
 
  - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng).
 
  - Optimize the amd-pstate driver to avoid cases where call paths end
    up calling the same writes multiple times and needlessly caching
    variables through code reorganization, locking overhaul and tracing
    adjustments (Mario Limonciello, Dhananjay Ugwekar).
 
  - Make it possible to avoid enabling capacity-aware scheduling (CAS) in
    the intel_pstate driver and relocate a check for out-of-band (OOB)
    platform handling in it to make it detect OOB before checking HWP
    availability (Rafael Wysocki).
 
  - Fix dbs_update() to avoid inadvertent conversions of negative integer
    values to unsigned int which causes CPU frequency selection to be
    inaccurate in some cases when the "conservative" cpufreq governor is
    in use (Jie Zhan).
 
  - Update the handling of the most recent idle intervals in the menu
    cpuidle governor to prevent useful information from being discarded
    by it in some cases and improve the prediction accuracy (Rafael
    Wysocki).
 
  - Make it possible to tell the intel_idle driver to ignore its built-in
    table of idle states for the given processor, clean up the handling
    of auto-demotion disabling on Baytrail and Cherrytrail chips in it,
    and update its MAINTAINERS entry (David Arcari, Artem Bityutskiy,
    Rafael Wysocki).
 
  - Make some cpuidle drivers use for_each_present_cpu() instead of
    for_each_possible_cpu() during initialization to avoid issues
    occurring when nosmp or maxcpus=0 are used (Jacky Bai).
 
  - Clean up the Energy Model handling code somewhat (Rafael Wysocki).
 
  - Use kfree_rcu() to simplify the handling of runtime Energy Model
    updates (Li RongQing).
 
  - Add an entry for the Energy Model framework to MAINTAINERS as
    properly maintained (Lukasz Luba).
 
  - Address RCU-related sparse warnings in the Energy Model code (Rafael
    Wysocki).
 
  - Remove ENERGY_MODEL dependency on SMP and allow it to be selected
    when DEVFREQ is set without CPUFREQ so it can be used on a wider
    range of systems (Jeson Gao).
 
  - Unify error handling during runtime suspend and runtime resume in the
    core to help drivers to implement more consistent runtime PM error
    handling (Rafael Wysocki).
 
  - Drop a redundant check from pm_runtime_force_resume() and rearrange
    documentation related to __pm_runtime_disable() (Rafael Wysocki).
 
  - Rework the handling of the "smart suspend" driver flag in the PM core
    to avoid issues hat may occur when drivers using it depend on some
    other drivers and clean up the related PM core code (Rafael Wysocki,
    Colin Ian King).
 
  - Fix the handling of devices with the power.direct_complete flag set
    if device_suspend() returns an error for at least one device to avoid
    situations in which some of them may not be resumed (Rafael Wysocki).
 
  - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a
    possible deadlock that may occur if the "compressor" hibernation
    module parameter is accessed during the registration of a new
    ieee80211 device (Lizhi Xu).
 
  - Suppress sleeping parent warning in device_pm_add() in the case when
    new children are added under a device with the power.direct_complete
    set after it has been processed by device_resume() (Xu Yang).
 
  - Remove needless return in three void functions related to system
    wakeup (Zijun Hu).
 
  - Replace deprecated kmap_atomic() with kmap_local_page() in the
    hibernation core code (David Reaver).
 
  - Remove unused helper functions related to system sleep (David Alan
    Gilbert).
 
  - Clean up s2idle_enter() so it does not lock and unlock CPU offline
    in vain and update comments in it (Ulf Hansson).
 
  - Clean up broken white space in dpm_wait_for_children() (Geert
    Uytterhoeven).
 
  - Update the cpupower utility to fix lib version-ing in it and memory
    leaks in error legs, remove hard-coded values, and implement CPU
    physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah
    Khan, Yiwei Lin, Zhongqiu Han).
 -----BEGIN PGP SIGNATURE-----
 
 iQFGBAABCAAwFiEEcM8Aw/RY0dgsiRUR7l+9nS/U47UFAmfhhTYSHHJqd0Byand5
 c29ja2kubmV0AAoJEO5fvZ0v1OO16/gIAKuRiG1fFgUcUSXC1iFu42vrB/1i4wpA
 02GICACqM3K6/5jd3ct/WOU28GUgDs+xcmqH7CnMaM6y9nXEWjWarmSfFekAO+0q
 TPtQ7xTy0hBCB3he1P2uLKBJBin4Wn47U9/rvs4J7mQd5zDxTINKIiVoHg2lEE+s
 HAeSoNRb2sp5IZDm9+/LfhHNYRP1mJ97cbZlymqctGB3xgDL7qMLid/1+gFPHAQS
 4/LXj3IgyU8DpA/j5nhtpaAqjN5g2QxIUfQgADRIcESK99Y/7aAMs1/G0WhJKaay
 9yx+4/xmkGvVCZQx1DphksFLISEzltY0SFWLsoppPzBTGVEW2GQQsNI=
 =LqVy
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These are dominated by cpufreq updates which in turn are dominated by
  updates related to boost support in the core and drivers and
  amd-pstate driver optimizations.

  Apart from the above, there are some cpuidle updates including a
  rework of the most recent idle intervals handling in the venerable
  menu governor that leads to significant improvements in some
  performance benchmarks, as the governor is now more likely to predict
  a shorter idle duration in some cases, and there are updates of the
  core device power management code, mostly related to system suspend
  and resume, that should help to avoid potential issues arising when
  the drivers of devices depending on one another want to use different
  optimizations.

  There is also a usual collection of assorted fixes and cleanups,
  including removal of some unused code.

  Specifics:

   - Manage sysfs attributes and boost frequencies efficiently from
     cpufreq core to reduce boilerplate code in drivers (Viresh Kumar)

   - Minor cleanups to cpufreq drivers (Aaron Kling, Benjamin Schneider,
     Dhananjay Ugwekar, Imran Shaik, zuoqian)

   - Migrate some cpufreq drivers to using for_each_present_cpu() (Jacky
     Bai)

   - cpufreq-qcom-hw DT binding fixes (Krzysztof Kozlowski)

   - Use str_enable_disable() helper in cpufreq_online() (Lifeng Zheng)

   - Optimize the amd-pstate driver to avoid cases where call paths end
     up calling the same writes multiple times and needlessly caching
     variables through code reorganization, locking overhaul and tracing
     adjustments (Mario Limonciello, Dhananjay Ugwekar)

   - Make it possible to avoid enabling capacity-aware scheduling (CAS)
     in the intel_pstate driver and relocate a check for out-of-band
     (OOB) platform handling in it to make it detect OOB before checking
     HWP availability (Rafael Wysocki)

   - Fix dbs_update() to avoid inadvertent conversions of negative
     integer values to unsigned int which causes CPU frequency selection
     to be inaccurate in some cases when the "conservative" cpufreq
     governor is in use (Jie Zhan)

   - Update the handling of the most recent idle intervals in the menu
     cpuidle governor to prevent useful information from being discarded
     by it in some cases and improve the prediction accuracy (Rafael
     Wysocki)

   - Make it possible to tell the intel_idle driver to ignore its
     built-in table of idle states for the given processor, clean up the
     handling of auto-demotion disabling on Baytrail and Cherrytrail
     chips in it, and update its MAINTAINERS entry (David Arcari, Artem
     Bityutskiy, Rafael Wysocki)

   - Make some cpuidle drivers use for_each_present_cpu() instead of
     for_each_possible_cpu() during initialization to avoid issues
     occurring when nosmp or maxcpus=0 are used (Jacky Bai)

   - Clean up the Energy Model handling code somewhat (Rafael Wysocki)

   - Use kfree_rcu() to simplify the handling of runtime Energy Model
     updates (Li RongQing)

   - Add an entry for the Energy Model framework to MAINTAINERS as
     properly maintained (Lukasz Luba)

   - Address RCU-related sparse warnings in the Energy Model code
     (Rafael Wysocki)

   - Remove ENERGY_MODEL dependency on SMP and allow it to be selected
     when DEVFREQ is set without CPUFREQ so it can be used on a wider
     range of systems (Jeson Gao)

   - Unify error handling during runtime suspend and runtime resume in
     the core to help drivers to implement more consistent runtime PM
     error handling (Rafael Wysocki)

   - Drop a redundant check from pm_runtime_force_resume() and rearrange
     documentation related to __pm_runtime_disable() (Rafael Wysocki)

   - Rework the handling of the "smart suspend" driver flag in the PM
     core to avoid issues hat may occur when drivers using it depend on
     some other drivers and clean up the related PM core code (Rafael
     Wysocki, Colin Ian King)

   - Fix the handling of devices with the power.direct_complete flag set
     if device_suspend() returns an error for at least one device to
     avoid situations in which some of them may not be resumed (Rafael
     Wysocki)

   - Use mutex_trylock() in hibernate_compressor_param_set() to avoid a
     possible deadlock that may occur if the "compressor" hibernation
     module parameter is accessed during the registration of a new
     ieee80211 device (Lizhi Xu)

   - Suppress sleeping parent warning in device_pm_add() in the case
     when new children are added under a device with the
     power.direct_complete set after it has been processed by
     device_resume() (Xu Yang)

   - Remove needless return in three void functions related to system
     wakeup (Zijun Hu)

   - Replace deprecated kmap_atomic() with kmap_local_page() in the
     hibernation core code (David Reaver)

   - Remove unused helper functions related to system sleep (David Alan
     Gilbert)

   - Clean up s2idle_enter() so it does not lock and unlock CPU offline
     in vain and update comments in it (Ulf Hansson)

   - Clean up broken white space in dpm_wait_for_children() (Geert
     Uytterhoeven)

   - Update the cpupower utility to fix lib version-ing in it and memory
     leaks in error legs, remove hard-coded values, and implement CPU
     physical core querying (Thomas Renninger, John B. Wyatt IV, Shuah
     Khan, Yiwei Lin, Zhongqiu Han)"

* tag 'pm-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (139 commits)
  PM: sleep: Fix bit masking operation
  dt-bindings: cpufreq: cpufreq-qcom-hw: Narrow properties on SDX75, SA8775p and SM8650
  dt-bindings: cpufreq: cpufreq-qcom-hw: Drop redundant minItems:1
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add missing constraint for interrupt-names
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add QCS8300 compatible
  cpufreq: Init cpufreq only for present CPUs
  PM: sleep: Fix handling devices with direct_complete set on errors
  cpuidle: Init cpuidle only for present CPUs
  PM: clk: Remove unused pm_clk_remove()
  PM: sleep: core: Fix indentation in dpm_wait_for_children()
  PM: s2idle: Extend comment in s2idle_enter()
  PM: s2idle: Drop redundant locks when entering s2idle
  PM: sleep: Remove unused pm_generic_ wrappers
  cpufreq: tegra186: Share policy per cluster
  cpupower: Make lib versioning scheme more obvious and fix version link
  PM: EM: Rework the depends on for CONFIG_ENERGY_MODEL
  PM: EM: Address RCU-related sparse warnings
  cpupower: Implement CPU physical core querying
  pm: cpupower: remove hard-coded topology depth values
  pm: cpupower: Fix cmd_monitor() error legs to free cpu_topology
  ...
2025-03-25 15:00:18 -07:00
Linus Torvalds
e34c38057a [ Merge note: this pull request depends on you having merged
two locking commits in the locking tree,
 	      part of the locking-core-2025-03-22 pull request. ]
 
 x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid='
     (Brendan Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)
 
 Percpu code:
   - Standardize & reorganize the x86 percpu layout and
     related cleanups (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu
     variable (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)
 
 MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB instruction
     (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation
     (Kirill A. Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)
 
 KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems,
     to support PCI BAR space beyond the 10TiB region
     (CONFIG_PCI_P2PDMA=y) (Balbir Singh)
 
 CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan Gupta)
 
 System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)
 
 Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling
 
 AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)
 
 Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()
 
 Bootup:
 
 Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0
     (Nathan Chancellor)
 
 Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit
 
 Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI headers
     (Thomas Huth)
 
 Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
     (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking instructions
     (Uros Bizjak)
 
 Earlyprintk:
   - Harden early_serial (Peter Zijlstra)
 
 NMI handler:
   - Add an emergency handler in nmi_desc & use it in nmi_shootdown_cpus()
     (Waiman Long)
 
 Miscellaneous fixes and cleanups:
 
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel,
     Artem Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst,
     Dan Carpenter, Dr. David Alan Gilbert, H. Peter Anvin,
     Ingo Molnar, Josh Poimboeuf, Kevin Brodsky, Mike Rapoport,
     Lukas Bulwahn, Maciej Wieczor-Retman, Max Grobecker,
     Patryk Wlazlyn, Pawan Gupta, Peter Zijlstra,
     Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak,
     Vitaly Kuznetsov, Xin Li, liuye.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmfenkQRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1g1FRAAi6OFTSn/5aeLMI0IMNBxJ6ddQiFc3imd
 7+C/vU5nul4CyDs8mKyj/+f/DDrbkG9lKz3VG631Yl237lXHjD8XWcVMeC/1z/q0
 3zInDIloE9/nBHRPkF6F7fARBLBZ0LFgaBsGrCo7mwpGybiQdqGcqcxllvTbtXaw
 OHta4q6ok+lBDNlfc0v6H4cRnzhmmlKu6Ng0j6UI3V7uFhi3vtxas32ltDQtzorq
 2+jbV6/+kbrrv+xPC+jlzOFhTEKRupNPQXmvyQteoQg6G3kqAKMDvBthGXd1rHuX
 Qa+BoDIifE/2NiVeRwNrhoqYH/pHCzUzDREW5IW8+ca+4XNKuzAC6EuC8CeCzyK1
 q8ZjZjooQW4zEeVFeJYllHONzJYfxfSH5CLsnbcuhq99yfGlrQhF1qL72/Omn1w/
 DfPJM8Zt5zyKvLqUg3Md+fkVCO2wyDNhB61QPzRgHF+yD+rvuDpoqvUWir+w7cSn
 fwEDVZGXlFx6dumtSrqRaTd1nvFt80s8yP2ll09DMvGQ8D/yruS7hndGAmmJVCSW
 NAfd8pSjq5v2+ux2UR92/Cc3VF3SjaUqHBOp/Nq9rESya18ZVa3cJpHhVYYtPIVf
 THW0h07RIkGVKs1uq+5ekLCr/8uAZg58UPIqmhTuW0ttymRHCNfohR45FQZzy+0M
 tJj1oc2TIZw=
 =Dcb3
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core x86 updates from Ingo Molnar:
 "x86 CPU features support:
   - Generate the <asm/cpufeaturemasks.h> header based on build config
     (H. Peter Anvin, Xin Li)
   - x86 CPUID parsing updates and fixes (Ahmed S. Darwish)
   - Introduce the 'setcpuid=' boot parameter (Brendan Jackman)
   - Enable modifying CPU bug flags with '{clear,set}puid=' (Brendan
     Jackman)
   - Utilize CPU-type for CPU matching (Pawan Gupta)
   - Warn about unmet CPU feature dependencies (Sohil Mehta)
   - Prepare for new Intel Family numbers (Sohil Mehta)

  Percpu code:
   - Standardize & reorganize the x86 percpu layout and related cleanups
     (Brian Gerst)
   - Convert the stackprotector canary to a regular percpu variable
     (Brian Gerst)
   - Add a percpu subsection for cache hot data (Brian Gerst)
   - Unify __pcpu_op{1,2}_N() macros to __pcpu_op_N() (Uros Bizjak)
   - Construct __percpu_seg_override from __percpu_seg (Uros Bizjak)

  MM:
   - Add support for broadcast TLB invalidation using AMD's INVLPGB
     instruction (Rik van Riel)
   - Rework ROX cache to avoid writable copy (Mike Rapoport)
   - PAT: restore large ROX pages after fragmentation (Kirill A.
     Shutemov, Mike Rapoport)
   - Make memremap(MEMREMAP_WB) map memory as encrypted by default
     (Kirill A. Shutemov)
   - Robustify page table initialization (Kirill A. Shutemov)
   - Fix flush_tlb_range() when used for zapping normal PMDs (Jann Horn)
   - Clear _PAGE_DIRTY for kernel mappings when we clear _PAGE_RW
     (Matthew Wilcox)

  KASLR:
   - x86/kaslr: Reduce KASLR entropy on most x86 systems, to support PCI
     BAR space beyond the 10TiB region (CONFIG_PCI_P2PDMA=y) (Balbir
     Singh)

  CPU bugs:
   - Implement FineIBT-BHI mitigation (Peter Zijlstra)
   - speculation: Simplify and make CALL_NOSPEC consistent (Pawan Gupta)
   - speculation: Add a conditional CS prefix to CALL_NOSPEC (Pawan
     Gupta)
   - RFDS: Exclude P-only parts from the RFDS affected list (Pawan
     Gupta)

  System calls:
   - Break up entry/common.c (Brian Gerst)
   - Move sysctls into arch/x86 (Joel Granados)

  Intel LAM support updates: (Maciej Wieczor-Retman)
   - selftests/lam: Move cpu_has_la57() to use cpuinfo flag
   - selftests/lam: Skip test if LAM is disabled
   - selftests/lam: Test get_user() LAM pointer handling

  AMD SMN access updates:
   - Add SMN offsets to exclusive region access (Mario Limonciello)
   - Add support for debugfs access to SMN registers (Mario Limonciello)
   - Have HSMP use SMN through AMD_NODE (Yazen Ghannam)

  Power management updates: (Patryk Wlazlyn)
   - Allow calling mwait_play_dead with an arbitrary hint
   - ACPI/processor_idle: Add FFH state handling
   - intel_idle: Provide the default enter_dead() handler
   - Eliminate mwait_play_dead_cpuid_hint()

  Build system:
   - Raise the minimum GCC version to 8.1 (Brian Gerst)
   - Raise the minimum LLVM version to 15.0.0 (Nathan Chancellor)

  Kconfig: (Arnd Bergmann)
   - Add cmpxchg8b support back to Geode CPUs
   - Drop 32-bit "bigsmp" machine support
   - Rework CONFIG_GENERIC_CPU compiler flags
   - Drop configuration options for early 64-bit CPUs
   - Remove CONFIG_HIGHMEM64G support
   - Drop CONFIG_SWIOTLB for PAE
   - Drop support for CONFIG_HIGHPTE
   - Document CONFIG_X86_INTEL_MID as 64-bit-only
   - Remove old STA2x11 support
   - Only allow CONFIG_EISA for 32-bit

  Headers:
   - Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI and non-UAPI
     headers (Thomas Huth)

  Assembly code & machine code patching:
   - x86/alternatives: Simplify alternative_call() interface (Josh
     Poimboeuf)
   - x86/alternatives: Simplify callthunk patching (Peter Zijlstra)
   - KVM: VMX: Use named operands in inline asm (Josh Poimboeuf)
   - x86/hyperv: Use named operands in inline asm (Josh Poimboeuf)
   - x86/traps: Cleanup and robustify decode_bug() (Peter Zijlstra)
   - x86/kexec: Merge x86_32 and x86_64 code using macros from
     <asm/asm.h> (Uros Bizjak)
   - Use named operands in inline asm (Uros Bizjak)
   - Improve performance by using asm_inline() for atomic locking
     instructions (Uros Bizjak)

  Earlyprintk:
   - Harden early_serial (Peter Zijlstra)

  NMI handler:
   - Add an emergency handler in nmi_desc & use it in
     nmi_shootdown_cpus() (Waiman Long)

  Miscellaneous fixes and cleanups:
   - by Ahmed S. Darwish, Andy Shevchenko, Ard Biesheuvel, Artem
     Bityutskiy, Borislav Petkov, Brendan Jackman, Brian Gerst, Dan
     Carpenter, Dr. David Alan Gilbert, H. Peter Anvin, Ingo Molnar,
     Josh Poimboeuf, Kevin Brodsky, Mike Rapoport, Lukas Bulwahn, Maciej
     Wieczor-Retman, Max Grobecker, Patryk Wlazlyn, Pawan Gupta, Peter
     Zijlstra, Philip Redkin, Qasim Ijaz, Rik van Riel, Thomas Gleixner,
     Thorsten Blum, Tom Lendacky, Tony Luck, Uros Bizjak, Vitaly
     Kuznetsov, Xin Li, liuye"

* tag 'x86-core-2025-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (211 commits)
  zstd: Increase DYNAMIC_BMI2 GCC version cutoff from 4.8 to 11.0 to work around compiler segfault
  x86/asm: Make asm export of __ref_stack_chk_guard unconditional
  x86/mm: Only do broadcast flush from reclaim if pages were unmapped
  perf/x86/intel, x86/cpu: Replace Pentium 4 model checks with VFM ones
  perf/x86/intel, x86/cpu: Simplify Intel PMU initialization
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers
  x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in UAPI headers
  x86/locking/atomic: Improve performance by using asm_inline() for atomic locking instructions
  x86/asm: Use asm_inline() instead of asm() in clwb()
  x86/asm: Use CLFLUSHOPT and CLWB mnemonics in <asm/special_insns.h>
  x86/hweight: Use asm_inline() instead of asm()
  x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()
  x86/hweight: Use named operands in inline asm()
  x86/stackprotector/64: Only export __ref_stack_chk_guard on CONFIG_SMP
  x86/head/64: Avoid Clang < 17 stack protector in startup code
  x86/kexec: Merge x86_32 and x86_64 code using macros from <asm/asm.h>
  x86/runtime-const: Add the RUNTIME_CONST_PTR assembly macro
  x86/cpu/intel: Limit the non-architectural constant_tsc model checks
  x86/mm/pat: Replace Intel x86_model checks with VFM ones
  x86/cpu/intel: Fix fast string initialization for extended Families
  ...
2025-03-24 22:06:11 -07:00
Josh Poimboeuf
2cbb20b008 tracing: Disable branch profiling in noinstr code
CONFIG_TRACE_BRANCH_PROFILING inserts a call to ftrace_likely_update()
for each use of likely() or unlikely().  That breaks noinstr rules if
the affected function is annotated as noinstr.

Disable branch profiling for files with noinstr functions.  In addition
to some individual files, this also includes the entire arch/x86
subtree, as well as the kernel/entry, drivers/cpuidle, and drivers/idle
directories, all of which are noinstr-heavy.

Due to the nature of how sched binaries are built by combining multiple
.c files into one, branch profiling is disabled more broadly across the
sched code than would otherwise be needed.

This fixes many warnings like the following:

  vmlinux.o: warning: objtool: do_syscall_64+0x40: call to ftrace_likely_update() leaves .noinstr.text section
  vmlinux.o: warning: objtool: __rdgsbase_inactive+0x33: call to ftrace_likely_update() leaves .noinstr.text section
  vmlinux.o: warning: objtool: handle_bug.isra.0+0x198: call to ftrace_likely_update() leaves .noinstr.text section
  ...

Reported-by: Ingo Molnar <mingo@kernel.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/fb94fc9303d48a5ed370498f54500cc4c338eb6d.1742586676.git.jpoimboe@kernel.org
2025-03-22 09:49:26 +01:00
Ingo Molnar
1b4c36f9b1 Merge branch 'x86/urgent' into x86/cpu, to pick up dependent commits
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-03-04 11:15:26 +01:00
Rafael J. Wysocki
3332dd1259 Merge back earlier cpuidle material for 6.15 2025-03-03 13:30:38 +01:00
Thomas Gleixner
c157d35146 intel_idle: Handle older CPUs, which stop the TSC in deeper C states, correctly
The Intel idle driver is preferred over the ACPI processor idle driver,
but fails to implement the work around for Core2 generation CPUs, where
the TSC stops in C2 and deeper C-states. This causes stalls and boot
delays, when the clocksource watchdog does not catch the unstable TSC
before the CPU goes deep idle for the first time.

The ACPI driver marks the TSC unstable when it detects that the CPU
supports C2 or deeper and the CPU does not have a non-stop TSC.

Add the equivivalent work around to the Intel idle driver to cure that.

Fixes: 18734958e9bf ("intel_idle: Use ACPI _CST for processor models without C-state tables")
Reported-by: Fab Stz <fabstz-it@yahoo.fr>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Fab Stz <fabstz-it@yahoo.fr>
Cc: All applicable <stable@vger.kernel.org>
Closes: https://lore.kernel.org/all/10cf96aa-1276-4bd4-8966-c890377030c3@yahoo.fr
Link: https://patch.msgid.link/87bjupfy7f.ffs@tglx
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-02-28 22:04:26 +01:00
David Arcari
5e7e39ae15 intel_idle: introduce 'no_native' module parameter
Since commit 18734958e9bf ("intel_idle: Use ACPI _CST for processor models
without C-state tables") the intel_idle driver has had the ability to use
the ACPI _CST to populate C-states when the processor model is not
recognized.

However, even when the processor model is recognized (native mode) there
are cases where it is useful to make the driver ignore the per-CPU idle
states in lieu of ACPI C-states (such as specific application performance).

Add a new 'no_native' module parameter to provide this functionality.

Signed-off-by: David Arcari <darcari@redhat.com>
Link: https://patch.msgid.link/20250220151120.1131122-1-darcari@redhat.com
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Spell CPU in capitals ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-02-25 12:13:40 +01:00
Artem Bityutskiy
c93d13b661 intel_idle: clean up BYT/CHT auto demotion disable
Bay Trail (BYT) and Cherry Trail (CHT) platforms have a very specific way
of disabling auto-demotion via specific MSR bits. Clean up the code so that
BYT/CHT-specifics do not show up in the common 'struct idle_cpu' data
structure.

Remove the 'byt_auto_demotion_disable_flag' flag from 'struct idle_cpu',
because a better coding pattern is to avoid very case-specific fields like
'bool byt_auto_demotion_disable_flag' in a common data structure, which is
used for all platforms, not only BYT/CHT. The code is just more readable
when common data structures contain only commonly used fields.

Instead, match BYT/CHT in the 'intel_idle_init_cstates_icpu()' function,
and introduce a small helper to take care of BYT/CHT auto-demotion. This
is consistent with how platform-specific things are done for other
platforms.

No intended functional changes, compile-tested only.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/20250210071253.2991030-1-dedekind1@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-02-19 13:04:45 +01:00
Patryk Wlazlyn
fc4ca9537b intel_idle: Provide the default enter_dead() handler
Recent Intel platforms require idle driver to provide information about
the MWAIT hint used to enter the deepest idle state in the play_dead
code.

Provide the default enter_dead() handler for all of the platforms and
allow overwriting with a custom handler for each platform if needed.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://lore.kernel.org/all/20250205155211.329780-4-artem.bityutskiy%40linux.intel.com
2025-02-05 10:44:52 -08:00
Linus Torvalds
f4b9d3bf44 Power management updates for 6.14-rc1
- Use str_enable_disable()-like helpers in cpufreq (Krzysztof
    Kozlowski).
 
  - Extend the Apple cpufreq driver to support more SoCs (Hector Martin,
    Nick Chan).
 
  - Add new cpufreq driver for Airoha SoCs (Christian Marangi).
 
  - Fix using cpufreq-dt as module (Andreas Kemnade).
 
  - Minor fixes for Sparc, SCMI, and Qcom cpufreq drivers (Ethan Carter
    Edwards, Sibi Sankar, Manivannan Sadhasivam).
 
  - Fix the maximum supported frequency computation in the ACPI cpufreq
    driver to avoid relying on unfounded assumptions (Gautham Shenoy).
 
  - Fix an amd-pstate driver regression with preferred core rankings not
    being used (Mario Limonciello).
 
  - Fix a precision issue with frequency calculation in the amd-pstate
    driver (Naresh Solanki).
 
  - Add ftrace event to the amd-pstate driver for active mode (Mario
    Limonciello).
 
  - Set default EPP policy on Ryzen processors in amd-pstate (Mario
    Limonciello).
 
  - Clean up the amd-pstate cpufreq driver and optimize it to increase
    code reuse (Mario Limonciello, Dhananjay Ugwekar).
 
  - Use CPPC to get scaling factors between HWP performance levels and
    frequency in the intel_pstate driver and make it stop using a built
    -in scaling factor for Arrow Lake processors (Rafael Wysocki).
 
  - Make intel_pstate initialize epp_policy to CPUFREQ_POLICY_UNKNOWN for
    consistency with CPU offline (Christian Loehle).
 
  - Fix superfluous updates caused by need_freq_update in the schedutil
    cpufreq governor (Sultan Alsawaf).
 
  - Allow configuring the system suspend-resume (DPM) watchdog to warn
    earlier than panic (Douglas Anderson).
 
  - Implement devm_device_init_wakeup() helper and introduce a device-
    managed variant of dev_pm_set_wake_irq() (Joe Hattori, Peng Fan).
 
  - Remove direct inclusions of 'pm_wakeup.h' which should be only
    included via 'device.h' (Wolfram Sang).
 
  - Clean up two comments in the core system-wide PM code (Rafael
    Wysocki, Randy Dunlap).
 
  - Add Clearwater Forest processor support to the intel_idle cpuidle
    driver (Artem Bityutskiy).
 
  - Clean up the Exynos devfreq driver and devfreq core (Markus Elfring,
    Jeongjun Park).
 
  - Minor cleanups and fixes for OPP (Dan Carpenter, Neil Armstrong, Joe
    Hattori).
 
  - Implement dev_pm_opp_get_bw() (Neil Armstrong).
 
  - Expose OPP reference counting helpers for Rust (Viresh Kumar).
 
  - Fix TSC MHz calculation in cpupower (He Rongguang).
 
  - Add install and uninstall options to bindings Makefile and add header
    changes for cpufreq.h to SWIG bindings in cpupower (John B. Wyatt IV).
 
  - Add missing residency header changes in cpuidle.h to SWIG bindings in
    cpupower (John B. Wyatt IV).
 
  - Add output files to .gitignore and clean them up in "make clean" in
    selftests/cpufreq (Li Zhijian).
 
  - Fix cross-compilation in cpupower Makefile (Peng Fan).
 
  - Revise the is_valid flag handling for idle_monitor in the cpupower
    utility (wangfushuai).
 
  - Extend and clean up AMD processors support in cpupower (Mario
    Limonciello).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmeOthsSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxqQsP/ivDt8nqDnxdKB7cKFQIsEK+tl0RnFVD
 o5regvYeRcGWpUXuMaqBtTmCMjsB8bUkcj2yLquM54ubjHAGF6zJuw9ZytMPHVcC
 b2xk3RCFlXSBFXVK8eOh3XRviA9nGhuY97ZnPsQOlvoECrxT2xyeL+mWo7s+t+q9
 2NUH+yfRoi5FM+nqqDhsm0xXxJuPaNg6eAjIASuMjXap48rNk3L5kW6W/6nw7i0I
 xQWd/pKLHaI5e7DRF/QdMKu8+Fm4BbN0jMqLblKPOmTe9KggvBkck5q1Um20sYkJ
 vdKMAT02ClGavIC7DtY092Xik84NZfID4ZUchS6e2hJIQ3Uaw/eDvAo/jlT8gIzq
 fnXPdApRIzQGDvMxFaAsKaGlwxiVlAGHPDSTH6MVWzsp+1DSkbloSwVPAfeYIn44
 Jhov+6Ydux3597sSjo+YmD58acimXl7urVuk8P6m3U5+gb8/jlgbxpIn+vbxH3Ka
 o44Vt7axD63gezOQY134sj5gic5JL0GuZovOlvzrF6+FsjvVqcax6FZ4n3uIXu7P
 C1nwai+Wdzo7wvuz7RfO0g15Y15wYLQLYsRq/osRlf+sOmGVv7nA9tSzZ0LUdD5D
 Pp6PxppF6anM0Kjen8Ppuu+Bcr11JfVvhnVTJqhs6u71XdAy4TnG1JjL4lPWYJ4D
 Gfz2hyPNjiQX
 =AoMC
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "The majority of changes here are cpufreq updates which are dominated
  by amd-pstate driver changes, like in the previous cycle. Moreover,
  changes related to amd-pstate are also the majority of cpupower
  utility updates.

  Included are some pieces of new hardware support, like the addition of
  Clearwater Forest processors support to intel_idle, new cpufreq driver
  for Airoha SoCs, and Apple cpufreq driver extensions to support more
  SoCs. The intel_pstate driver is also extended to be able to support
  new platforms by using ACPI CPPC to compute scaling factors between
  HWP performance states and frequency.

  The rest is mostly fixes and cleanups in assorted pieces of power
  management code.

  Specifics:

   - Use str_enable_disable()-like helpers in cpufreq (Krzysztof
     Kozlowski)

   - Extend the Apple cpufreq driver to support more SoCs (Hector
     Martin, Nick Chan)

   - Add new cpufreq driver for Airoha SoCs (Christian Marangi)

   - Fix using cpufreq-dt as module (Andreas Kemnade)

   - Minor fixes for Sparc, SCMI, and Qcom cpufreq drivers (Ethan Carter
     Edwards, Sibi Sankar, Manivannan Sadhasivam)

   - Fix the maximum supported frequency computation in the ACPI cpufreq
     driver to avoid relying on unfounded assumptions (Gautham Shenoy)

   - Fix an amd-pstate driver regression with preferred core rankings
     not being used (Mario Limonciello)

   - Fix a precision issue with frequency calculation in the amd-pstate
     driver (Naresh Solanki)

   - Add ftrace event to the amd-pstate driver for active mode (Mario
     Limonciello)

   - Set default EPP policy on Ryzen processors in amd-pstate (Mario
     Limonciello)

   - Clean up the amd-pstate cpufreq driver and optimize it to increase
     code reuse (Mario Limonciello, Dhananjay Ugwekar)

   - Use CPPC to get scaling factors between HWP performance levels and
     frequency in the intel_pstate driver and make it stop using a
     built-in scaling factor for Arrow Lake processors (Rafael Wysocki)

   - Make intel_pstate initialize epp_policy to CPUFREQ_POLICY_UNKNOWN
     for consistency with CPU offline (Christian Loehle)

   - Fix superfluous updates caused by need_freq_update in the schedutil
     cpufreq governor (Sultan Alsawaf)

   - Allow configuring the system suspend-resume (DPM) watchdog to warn
     earlier than panic (Douglas Anderson)

   - Implement devm_device_init_wakeup() helper and introduce a device-
     managed variant of dev_pm_set_wake_irq() (Joe Hattori, Peng Fan)

   - Remove direct inclusions of 'pm_wakeup.h' which should be only
     included via 'device.h' (Wolfram Sang)

   - Clean up two comments in the core system-wide PM code (Rafael
     Wysocki, Randy Dunlap)

   - Add Clearwater Forest processor support to the intel_idle cpuidle
     driver (Artem Bityutskiy)

   - Clean up the Exynos devfreq driver and devfreq core (Markus
     Elfring, Jeongjun Park)

   - Minor cleanups and fixes for OPP (Dan Carpenter, Neil Armstrong,
     Joe Hattori)

   - Implement dev_pm_opp_get_bw() (Neil Armstrong)

   - Expose OPP reference counting helpers for Rust (Viresh Kumar)

   - Fix TSC MHz calculation in cpupower (He Rongguang)

   - Add install and uninstall options to bindings Makefile and add
     header changes for cpufreq.h to SWIG bindings in cpupower (John B.
     Wyatt IV)

   - Add missing residency header changes in cpuidle.h to SWIG bindings
     in cpupower (John B. Wyatt IV)

   - Add output files to .gitignore and clean them up in "make clean" in
     selftests/cpufreq (Li Zhijian)

   - Fix cross-compilation in cpupower Makefile (Peng Fan)

   - Revise the is_valid flag handling for idle_monitor in the cpupower
     utility (wangfushuai)

   - Extend and clean up AMD processors support in cpupower (Mario
     Limonciello)"

* tag 'pm-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (67 commits)
  PM / OPP: Add reference counting helpers for Rust implementation
  PM: sleep: wakeirq: Introduce device-managed variant of dev_pm_set_wake_irq()
  cpufreq: Use str_enable_disable()-like helpers
  cpufreq: airoha: Add EN7581 CPUFreq SMCCC driver
  PM: sleep: Allow configuring the DPM watchdog to warn earlier than panic
  PM: sleep: convert comment from kernel-doc to plain comment
  cpufreq: ACPI: Fix max-frequency computation
  pm: cpupower: Add missing residency header changes in cpuidle.h to SWIG
  PM / devfreq: exynos: remove unused function parameter
  OPP: OF: Fix an OF node leak in _opp_add_static_v2()
  cpufreq/amd-pstate: Refactor max frequency calculation
  cpufreq/amd-pstate: Fix prefcore rankings
  pm: cpupower: Add header changes for cpufreq.h to SWIG bindings
  cpufreq: sparc: change kzalloc to kcalloc
  cpufreq: qcom: Implement clk_ops::determine_rate() for qcom_cpufreq* clocks
  cpufreq: qcom: Fix qcom_cpufreq_hw_recalc_rate() to query LUT if LMh IRQ is not available
  cpufreq: apple-soc: Add Apple A7-A8X SoC cpufreq support
  cpufreq: apple-soc: Set fallback transition latency to APPLE_DVFS_TRANSITION_TIMEOUT
  cpufreq: apple-soc: Increase cluster switch timeout to 400us
  cpufreq: apple-soc: Use 32-bit read for status register
  ...
2025-01-22 11:16:14 -08:00
Dave Hansen
e5d3a57891 x86/cpu: Make all all CPUID leaf names consistent
The leaf names are not consistent.  Give them all a CPUID_LEAF_ prefix
for consistency and vertical alignment.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Jiang <dave.jiang@intel.com> # for ioatdma bits
Link: https://lore.kernel.org/all/20241213205040.7B0C3241%40davehans-spike.ostc.intel.com
2024-12-18 06:17:46 -08:00
Dave Hansen
262fba5570 x86/cpu: Remove unnecessary MwAIT leaf checks
The CPUID leaf dependency checker will remove X86_FEATURE_MWAIT if
the CPUID level is below the required level (CPUID_MWAIT_LEAF).
Thus, if you check X86_FEATURE_MWAIT you do not need to also
check the CPUID level.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20241213205030.9B42B458%40davehans-spike.ostc.intel.com
2024-12-18 06:17:30 -08:00
Dave Hansen
497f702846 x86/cpu: Move MWAIT leaf definition to common header
Begin constructing a common place to keep all CPUID leaf definitions.
Move CPUID_MWAIT_LEAF to the CPUID header and include it where
needed.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/all/20241213205028.EE94D02A%40davehans-spike.ostc.intel.com
2024-12-18 06:17:24 -08:00
Artem Bityutskiy
eeed4bfbe9 intel_idle: add Clearwater Forest SoC support
Clearwater Forest (CWF) SoC has the same C-states as Sierra Forest (SRF)
SoC.  Add CWF support by re-using the SRF C-states table.

Note: it is expected that CWF C-states will have same or very similar
characteristics as SRF C-states (latency and target residency).

However, there is a possibility that the characteristics will end up
being different enough when the CWF platform development is finished.
In that case, a separate CWF C-states table will be created and populated
with the CWF-specific characteristics (latency and target residency).

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/20241203130306.1559024-1-artem.bityutskiy@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-12-10 20:05:25 +01:00
Artem Bityutskiy
f557e0d1c2 intel_idle: add Granite Rapids Xeon D support
Add Granite Rapids Xeon D C-states support: C1, C1E, C6, and C6P.

The C-states are basically the same as in Granite Rapids Xeon SP/AP, but
characteristics (latency, target residency) are a bit different.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/20241107115608.52233-1-artem.bityutskiy@linux.intel.com
[ rjw: Changelog edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-11 15:48:50 +01:00
Artem Bityutskiy
4c411cca33 intel_idle: fix ACPI _CST matching for newer Xeon platforms
Background
~~~~~~~~~~

The driver uses 'use_acpi = true' in C-state custom table for all Xeon
platforms. The meaning of this flag is as follows.

 1. If a C-state from the custom table is defined in ACPI _CST (matched
    by the mwait hint), then enable this C-state.

 2. Otherwise, disable this C-state, unless the C-sate definition in the
    custom table has the 'CPUIDLE_FLAG_ALWAYS_ENABLE' flag set, in which
    case enabled it.

The goal is to honor BIOS C6 settings - If BIOS disables C6, disable it
by default in the OS too (but it can be enabled via sysfs).

This works well on Xeons that expose only one flavor of C6. This are all
Xeons except for the newest Granite Rapids (GNR) and Sierra Forest (SRF).

The problem
~~~~~~~~~~~

GNR and SRF have 2 flavors of C6: C6/C6P on GNR, C6S/C6SP on SRF. The
the "P" flavor allows for the package C6, while the "non-P" flavor
allows only for core/module C6.

As far as this patch is concerned, both GNR and SRF platforms are
handled the same way. Therefore, further discussion is focused on GNR,
but it applies to SRF as well.

On Intel Xeon platforms, BIOS exposes only 2 ACPI C-states: C1 and C2.
Well, depending on BIOS settings, C2 may be named as C3. But there still
will be only 2 states - C1 and C3. But this is a non-essential detail,
so further discussion is focused on the ACPI C1 and C2 case.

On pre-GNR/SRF Xeon platforms, ACPI C1 is mapped to C1 or C1E, and ACPI
C2 is mapped to C6. The 'use_acpi' flag works just fine:

 * If ACPI C2 enabled, enable C6.
 * Otherwise, disable C6.

However, on GNR there are 2 flavors of C6, so BIOS maps ACPI C2 to
either C6 or C6P, depending on the user settings. As a result, due to
the 'use_acpi' flag, 'intel_idle' disables least one of the C6 flavors.

BIOS                   | OS                         | Verdict
----------------------------------------------------|---------
ACPI C2 disabled       | C6 disabled, C6P disabled  | OK
ACPI C2 mapped to C6   | C6 enabled,  C6P disabled  | Not OK
ACPI C2 mapped to C6P  | C6 disabled, C6P enabled   | Not OK

The goal of 'use_acpi' is to honor BIOS ACPI C2 disabled case, which
works fine. But if ACPI C2 is enabled, the goal is to enable all flavors
of C6, not just one of the flavors. This was overlooked when enabling
GNR/SRF platforms.

In other words, before GNR/SRF, the ACPI C2 status was binary - enabled
or disabled. But it is not binary on GNR/SRF, however the goal is to
continue treat it as binary.

The fix
~~~~~~~

Notice, that current algorithm matches ACPI and custom table C-states
by the mwait hint. However, mwait hint consists of the 'state' and
'sub-state' parts, and all C6 flavors have the same state value of 0x20,
but different sub-state values.

Introduce new C-state table flag - CPUIDLE_FLAG_PARTIAL_HINT_MATCH and
add it to both C6 flavors of the GNR/SRF platforms.

When matching ACPI _CST and custom table C-states, match only the start
part if the C-state has CPUIDLE_FLAG_PARTIAL_HINT_MATCH, other wise
match both state and sub-state parts (as before).

With this fix, GNR C-states enabled/disabled status looks like this.

BIOS                   | OS
----------------------------------------------------
ACPI C2 disabled       | C6 disabled, C6P disabled
ACPI C2 mapped to C6   | C6 enabled, C6P enabled
ACPI C2 mapped to C6P  | C6 enabled, C6P enabled

Possible alternative
~~~~~~~~~~~~~~~~~~~~

The alternative would be to remove 'use_acpi' flag for GNR and SRF.
This would be a simpler solution, but it would violate the principle of
least surprise - users of Xeon platforms are used to the fact that
intel_idle honors C6 enabled/disabled flag. It is more consistent user
experience if GNR/SRF continue doing so.

How tested
~~~~~~~~~~

Tested on GNR and SRF platform with all the 3 BIOS configurations: ACPI
C2 disabled, mapped to C6/C6S, mapped to C6P/C6SP.

Tested on Ice lake Xeon and Sapphire Rapids Xeon platforms with ACPI C2
enabled and disabled, just to verify that the patch does not break older
Xeons.

Fixes: 92813fd5b156 ("intel_idle: add Sierra Forest SoC support")
Fixes: 370406bf5738 ("intel_idle: add Granite Rapids Xeon support")
Cc: 6.8+ <stable@vger.kernel.org> # 6.8+
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/20240913165143.4140073-1-dedekind1@gmail.com
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-09-25 22:30:33 +02:00
Kai-Heng Feng
5bb33212b5 intel_idle: Disable promotion to C1E on Jasper Lake and Elkhart Lake
PCIe ethernet throughut is sub-optimal on Jasper Lake and Elkhart Lake.

The CPU can take long time to exit to C0 to handle IRQ and perform DMA
when C1E has been entered.

For this reason, adjust intel_idle to disable promotion to C1E and still
use C-states from ACPI _CST on those two platforms.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219023
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Link: https://patch.msgid.link/20240820041128.102452-1-kai.heng.feng@canonical.com
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-20 21:27:43 +02:00
Artem Bityutskiy
370406bf57 intel_idle: add Granite Rapids Xeon support
Add Granite Rapids Xeon C-states, which are C1, C1E, C6, and C6P.

Comparing to previous Xeon Generations (e.g., Emerald Rapids), C6
requests end up only in core C6 state, and no package C-state promotion
takes place even if all cores in the package are in core C6.

C6P requests also end up in core C6, but if all cores have requested
C6P, the SoC will enter the package C6 state.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/20240806160310.3719205-1-artem.bityutskiy@linux.intel.com
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-08-19 15:45:09 +02:00
Tony Luck
17c4fc386b intel_idle: Switch to new Intel CPU model defines
New CPU #defines encode vendor and family as well as model.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-06-07 20:35:54 +02:00
He Rongguang
6b8e288f49 cpuidle: ACPI/intel: fix MWAIT hint target C-state computation
According to x86 spec ([1] and [2]), MWAIT hint_address[7:4] plus 1 is
the corresponding C-state, and 0xF means C0.

ACPI C-state table usually only contains C1+, but nothing prevents ACPI
firmware from presenting a C-state (maybe C1+) but using MWAIT address C0
(i.e., 0xF in ACPI FFH MWAIT hint address). And if this is the case, Linux
erroneously treat this cstate as C16, while actually this should be valid
C0 instead of C16, as per the specifications.

Since ACPI firmware is out of Linux kernel scope, fix the kernel handling
of 0xF ->(to) C0 in this situation. This is found when a tweaked ACPI
C-state table is presented by Qemu to VM.

Also modify the intel_idle case for code consistency.

[1]. Intel SDM Vol 2, Table 4-11. MWAIT Hints
Register (EAX): "Value of 0 means C1; 1 means C2 and so on
Value of 01111B means C0".

[2]. AMD manual Vol 3, MWAIT: "The processor C-state is EAX[7:4]+1, so to
request C0 is to place the value F in EAX[7:4] and to request C1 is to
place the value 0 in EAX[7:4].".

Signed-off-by: He Rongguang <herongguang@linux.alibaba.com>
[ rjw: Subject and changelog edits, whitespace fixups ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-03-05 21:25:18 +01:00
Linus Torvalds
7da71072e1 Power management updates for 6.8-rc1
- Add support for the Sierra Forest, Grand Ridge and Meteorlake SoCs to
    the intel_idle cpuidle driver (Artem Bityutskiy, Zhang Rui).
 
  - Do not enable interrupts when entering idle in the haltpoll cpuidle
    driver (Borislav Petkov).
 
  - Add Emerald Rapids support in no-HWP mode to the intel_pstate cpufreq
    driver (Zhenguo Yao).
 
  - Use EPP values programmed by the platform firmware as balanced
    performance ones by default in intel_pstate (Srinivas Pandruvada).
 
  - Add a missing function return value check to the SCMI cpufreq driver
    to avoid unexpected behavior (Alexandra Diupina).
 
  - Fix parameter type warning in the armada-8k cpufreq driver (Gregory
    CLEMENT).
 
  - Rework trans_stat_show() in the devfreq core code to avoid buffer
    overflows (Christian Marangi).
 
  - Synchronize devfreq_monitor_[start/stop] so as to prevent a timer
    list corruption from occurring when devfreq governors are switched
    frequently (Mukesh Ojha).
 
  - Fix possible deadlocks in the core system-wide PM code that occur if
    device-handling functions cannot be executed asynchronously during
    resume from system-wide suspend (Rafael J. Wysocki).
 
  - Clean up unnecessary local variable initializations in multiple
    places in the hibernation code (Wang chaodong, Li zeming).
 
  - Adjust core hibernation code to avoid missing wakeup events that
    occur after saving an image to persistent storage (Chris Feng).
 
  - Update hibernation code to enforce correct ordering during image
    compression and decompression (Hongchen Zhang).
 
  - Use kmap_local_page() instead of kmap_atomic() in copy_data_page()
    during hibernation and restore (Chen Haonan).
 
  - Adjust documentation and code comments to reflect recent tasks freezer
    changes (Kevin Hao).
 
  - Repair excess function parameter description warning in the
    hibernation image-saving code (Randy Dunlap).
 
  - Fix _set_required_opps when opp is NULL (Bryan O'Donoghue).
 
  - Use device_get_match_data() in the OPP code for TI (Rob Herring).
 
  - Clean up OPP level and other parts and call dev_pm_opp_set_opp()
    recursively for required OPPs (Viresh Kumar).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmWb8o8SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxbOQP/2D+YGyd9R1awYqxQoIQSGbpIr7a9oTR
 BOIcOn2PvWLXH8w9wVbIUJsSL9Nx90D9T4S5QcyHrS2qHmR+1Gb0gX3D4QAmEBcG
 +wFCLt2//5PwqShtPcJEUcGdL274aVEpmnEAmzKnk20MkLQM3twxe6FKSkwWMLYb
 u9OKgdN8Vah0iSBUCpyT52O0x4d65MD/tka0QaGjLg64TtyqhTSKi+XgWtZkSZ7H
 lRgn9qMoMXq/h1aeK4MKp5UtJKRxBWRdMijIFFXAfgO8dwbDbyXTo0d2LMR6DiEM
 VsvRIjEePoRcGf7bwAbrUeSoNb5Ec32RW3v9GSNn2sWutW+vhD//frZq48zAR6lm
 i8Xlf2Ar63Z+qNcFpCZjlNwAbfEuZ1vIr0Pu3oDd0GkOXjxiVMgAwtatTp1nSW7/
 wWFuMA5G+wdzU/Z5KcV1p7S8CP1gC8S05LHGwtKKGm9pLbzhauF8GK6Xpa4711T8
 oI3uDFIgxaxW8B/ymsM5cNa2QbfYUuQbOFTwXvBcy4gizrbZwwXRSpfaKoDIYAXZ
 2kfwmFbu3IbrRypboY58lG3SzbnN94oEMANtsVYuxMimGz2x3ZmHBAFm2l9YPYRz
 dBq/RUM7sMIvM1SwqR4tG8rt206L7KpPyW99pUa2AhEdof4iV2bpyujHFdkm83MK
 nJ0OF/xcc98Z
 =zh4c
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These add support for new processors (Sierra Forest, Grand Ridge and
  Meteor Lake) to the intel_idle driver, make intel_pstate run on
  Emerald Rapids without HWP support and adjust it to utilize EPP values
  supplied by the platform firmware, fix issues, clean up code and
  improve documentation.

  The most significant fix addresses deadlocks in the core system-wide
  resume code that occur if async_schedule_dev() attempts to run its
  argument function synchronously (for example, due to a memory
  allocation failure). It rearranges the code in question which may
  increase the system resume time in some cases, but this basically is a
  removal of a premature optimization. That optimization will be added
  back later, but properly this time.

  Specifics:

   - Add support for the Sierra Forest, Grand Ridge and Meteorlake SoCs
     to the intel_idle cpuidle driver (Artem Bityutskiy, Zhang Rui)

   - Do not enable interrupts when entering idle in the haltpoll cpuidle
     driver (Borislav Petkov)

   - Add Emerald Rapids support in no-HWP mode to the intel_pstate
     cpufreq driver (Zhenguo Yao)

   - Use EPP values programmed by the platform firmware as balanced
     performance ones by default in intel_pstate (Srinivas Pandruvada)

   - Add a missing function return value check to the SCMI cpufreq
     driver to avoid unexpected behavior (Alexandra Diupina)

   - Fix parameter type warning in the armada-8k cpufreq driver (Gregory
     CLEMENT)

   - Rework trans_stat_show() in the devfreq core code to avoid buffer
     overflows (Christian Marangi)

   - Synchronize devfreq_monitor_[start/stop] so as to prevent a timer
     list corruption from occurring when devfreq governors are switched
     frequently (Mukesh Ojha)

   - Fix possible deadlocks in the core system-wide PM code that occur
     if device-handling functions cannot be executed asynchronously
     during resume from system-wide suspend (Rafael J. Wysocki)

   - Clean up unnecessary local variable initializations in multiple
     places in the hibernation code (Wang chaodong, Li zeming)

   - Adjust core hibernation code to avoid missing wakeup events that
     occur after saving an image to persistent storage (Chris Feng)

   - Update hibernation code to enforce correct ordering during image
     compression and decompression (Hongchen Zhang)

   - Use kmap_local_page() instead of kmap_atomic() in copy_data_page()
     during hibernation and restore (Chen Haonan)

   - Adjust documentation and code comments to reflect recent tasks
     freezer changes (Kevin Hao)

   - Repair excess function parameter description warning in the
     hibernation image-saving code (Randy Dunlap)

   - Fix _set_required_opps when opp is NULL (Bryan O'Donoghue)

   - Use device_get_match_data() in the OPP code for TI (Rob Herring)

   - Clean up OPP level and other parts and call dev_pm_opp_set_opp()
     recursively for required OPPs (Viresh Kumar)"

* tag 'pm-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits)
  OPP: Rename 'rate_clk_single'
  OPP: Pass rounded rate to _set_opp()
  OPP: Relocate dev_pm_opp_sync_regulators()
  PM: sleep: Fix possible deadlocks in core system-wide PM code
  OPP: Move dev_pm_opp_icc_bw to internal opp.h
  async: Introduce async_schedule_dev_nocall()
  async: Split async_schedule_node_domain()
  cpuidle: haltpoll: Do not enable interrupts when entering idle
  OPP: Fix _set_required_opps when opp is NULL
  OPP: The level field is always of unsigned int type
  PM: hibernate: Repair excess function parameter description warning
  PM: sleep: Remove obsolete comment from unlock_system_sleep()
  cpufreq: intel_pstate: Add Emerald Rapids support in no-HWP mode
  Documentation: PM: Adjust freezing-of-tasks.rst to the freezer changes
  PM: hibernate: Use kmap_local_page() in copy_data_page()
  intel_idle: add Sierra Forest SoC support
  intel_idle: add Grand Ridge SoC support
  PM / devfreq: Synchronize devfreq_monitor_[start/stop]
  cpufreq: armada-8k: Fix parameter type warning
  PM: hibernate: Enforce ordering during image compression/decompression
  ...
2024-01-09 16:32:11 -08:00
Artem Bityutskiy
92813fd5b1 intel_idle: add Sierra Forest SoC support
Add Sierra Forest SoC C-states, which are C1, C1E, C6S, and C6SP.

Sierra Forest SoC is built with modules, each module includes 4 cores
(Crestmont microarchitecture). There is one L2 cache per module, shared
between the 4 cores.

There is no core C6 state, but there is C6S state, which has module scope:
when all 4 cores request C6S, the entire module (4 cores + L2 cache)
enters the low power state.

C6SP state has package scope - when all modules in the package enter C6S,
the package enters the power state mode.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-19 18:56:45 +01:00
Artem Bityutskiy
ac89d11b93 intel_idle: add Grand Ridge SoC support
Add Intel Grand Ridge SoC C-states, which are C1, C1E, and C6S.

The Grand Ridge SoC is built with modules, each module includes 4 cores
(Crestmont microarchitecture). There is one L2 cache per module, shared
between the 4 cores.

There is no core C6 state, but there is C6S state, which has module
scope: when all 4 cores request C6S, the entire module (4 cores + L2
cache) enters the low power state.

Package C6 is not supported by Grand Ridge SoC.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-19 18:56:45 +01:00
Zhang Rui
eeae55ed9c intel_idle: Add Meteorlake support
Add intel_idle support for MeteorLake.

C1 and C1E states on Meteorlake are mutually exclusive, like Alderlake
and Raptorlake, but they have little latency difference with measureable
power difference, so always enable "C1E promotion" bit and expose C1E
only.

Expose C6 because it has less power compared with C1E, and smaller
latency compared with C8/C10.

Ignore C8 and expose C10, because C8 does not show latency advantage
compared with C10.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-11 20:58:27 +01:00
Peter Zijlstra
edc8fc01f6 x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram
intel_idle_irq() re-enables IRQs very early. As a result, an interrupt
may fire before mwait() is eventually called. If such an interrupt queues
a timer, it may go unnoticed until mwait returns and the idle loop
handles the tick re-evaluation. And monitoring TIF_NEED_RESCHED doesn't
help because a local timer enqueue doesn't set that flag.

The issue is mitigated by the fact that this idle handler is only invoked
for shallow C-states when, presumably, the next tick is supposed to be
close enough. There may still be rare cases though when the next tick
is far away and the selected C-state is shallow, resulting in a timer
getting ignored for a while.

Fix this with using sti_mwait() whose IRQ-reenablement only triggers
upon calling mwait(), dealing with the race while keeping the interrupt
latency within acceptable bounds.

Fixes: c227233ad64c (intel_idle: enable interrupts before C1 on Xeons)
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Link: https://lkml.kernel.org/r/20231115151325.6262-3-frederic@kernel.org
2023-11-29 15:44:01 +01:00
Waiman Long
aa1567a7e6 intel_idle: Add ibrs_off module parameter to force-disable IBRS
Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the cstate is 6 or lower. However, there are
some use cases where a customer may want to use max_cstate=1 to
lower latency. Such use cases will suffer from the performance
degradation caused by the enabling of IBRS in the sibling idle thread.
Add a "ibrs_off" module parameter to force disable IBRS and the
CPUIDLE_FLAG_IRQ_ENABLE flag if set.

In the case of a Skylake server with max_cstate=1, this new ibrs_off
option will likely increase the IRQ response latency as IRQ will now
be disabled.

When running SPECjbb2015 with cstates set to C1 on a Skylake system.

First test when the kernel is booted with: "intel_idle.ibrs_off":

  max-jOPS = 117828, critical-jOPS = 66047

Then retest when the kernel is booted without the "intel_idle.ibrs_off"
added:

  max-jOPS = 116408, critical-jOPS = 58958

That means booting with "intel_idle.ibrs_off" improves performance by:

  max-jOPS:      +1.2%, which could be considered noise range.
  critical-jOPS: +12%,  which is definitely a solid improvement.

The admin-guide/pm/intel_idle.rst file is updated to add a description
about the new "ibrs_off" module parameter.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20230727184600.26768-5-longman@redhat.com
2023-10-07 11:33:28 +02:00
Waiman Long
7506203089 intel_idle: Use __update_spec_ctrl() in intel_idle_ibrs()
When intel_idle_ibrs() is called, it modifies the SPEC_CTRL MSR to 0
in order disable IBRS. However, the new MSR value isn't reflected in
x86_spec_ctrl_current which is at odd with the other code that keep track
of its state in that percpu variable.  Use the new __update_spec_ctrl()
to have the x86_spec_ctrl_current percpu value properly updated.

Since spec-ctrl.h includes both msr.h and nospec-branch.h, we can remove
those from the include file list.

Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20230727184600.26768-4-longman@redhat.com
2023-10-07 11:33:28 +02:00
Linus Torvalds
1a7c611546 Perf events changes for v6.6:
- AMD IBS improvements
 - Intel PMU driver updates
 - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events
 - Micro-optimize software events and the ring-buffer code
 - Misc cleanups & fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTtBscRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1hHoQ/+IBQ8Xi/rcdd40n8OqEB/VBWVuSjNT3uN
 3pHHcTl2Pio9CxBeat42NekNijlRILCKJrZ3Lt3JWBmWyWv5l3KFabelj+lDF2xa
 TVCjTnQNe1+HvrODYnF4ECIs5vaoMVjcJ9jg8+VDgAcOQr1nZs4m5TVAd6TLqPpV
 urBEQVULkkzk7ZRhfrugKhw+wrpWFefgGCx0RV8ijZB7TLMHc2wE+Q/sTxKdKceL
 wNaJaDgV33pZh0aImwR9pKUE532hF1FiBdLuehkh61PZa1L82jzAX1xjw2s1hSa4
 eIWemPHJIYfivRlENbJsDWc4N8gk6ijVHwrxGcr4Axu+NN+zPtQ3ddhaGMAyKdTo
 qUKXH3MZSMIl++jI5Fkc6xM+XLvY1rML62epSzMwu/cc7Z5MeyWdQcri0N9YFuO7
 wUUNnFpU00lwQBLbyyUQ3Zi8E0QV7NuPW4axTkmntiIjMpLagaEvVSf6nf8qLpbE
 WTT16s707t19hUZNazNZ7ONmhly4ALbHFQEH65J2KoYn99fYqy9z68Hwk+xnmykw
 bc3qvfhpw0MImQQ+DqHiBwb4n4UuvY2WlkkZI3FfNeSG63DaM2mZikfpElpXYjn6
 9iOIXvx21Wiq/n0cbLhidI2q/ZzFCzYLCk6ikZ320wb+rhvd7EoSlZil6QSzn3pH
 Qdk+NEZgWQY=
 =ZT6+
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf event updates from Ingo Molnar:

 - AMD IBS improvements

 - Intel PMU driver updates

 - Extend core perf facilities & the ARM PMU driver to better handle ARM big.LITTLE events

 - Micro-optimize software events and the ring-buffer code

 - Misc cleanups & fixes

* tag 'perf-core-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/uncore: Remove unnecessary ?: operator around pcibios_err_to_errno() call
  perf/x86/intel: Add Crestmont PMU
  x86/cpu: Update Hybrids
  x86/cpu: Fix Crestmont uarch
  x86/cpu: Fix Gracemont uarch
  perf: Remove unused extern declaration arch_perf_get_page_size()
  perf: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  perf/x86: Remove unused PERF_PMU_CAP_HETEROGENEOUS_CPUS capability
  arm_pmu: Add PERF_PMU_CAP_EXTENDED_HW_TYPE capability
  perf/x86/ibs: Set mem_lvl_num, mem_remote and mem_hops for data_src
  perf/mem: Add PERF_MEM_LVLNUM_NA to PERF_MEM_NA
  perf/mem: Introduce PERF_MEM_LVLNUM_UNC
  perf/ring_buffer: Use local_try_cmpxchg in __perf_output_begin
  locking/arch: Avoid variable shadowing in local_try_cmpxchg()
  perf/core: Use local64_try_cmpxchg in perf_swevent_set_period
  perf/x86: Use local64_try_cmpxchg
  perf/amd: Prevent grouping of IBS events
2023-08-28 16:35:01 -07:00
Peter Zijlstra
882cdb06b6 x86/cpu: Fix Gracemont uarch
Alderlake N is an E-core only product using Gracemont
micro-architecture. It fits the pre-existing naming scheme perfectly
fine, adhere to it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230807150405.686834933@infradead.org
2023-08-09 21:51:06 +02:00
Rafael J. Wysocki
5534f44627 Revert "intel_idle: Add support for using intel_idle in a VM guest using just hlt"
This reverts commit 2f3d08f074b0 ("intel_idle: Add support for using
intel_idle in a VM guest using just hlt"), because it causes functional
issues to appear and it is not really useful without a related commit
that got reverted previously.

Link: https://lore.kernel.org/linux-pm/5c7de6d5-7706-c4a5-7c41-146db1269aff@intel.com
Reported-by: Xiaoyao Li <xiaoyao.li@intel.com>
Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-07-19 20:10:03 +02:00
Rafael J. Wysocki
a5155c023d Revert "intel_idle: Add a "Long HLT" C1 state for the VM guest mode"
This reverts commit 0fac214bb75e ("intel_idle: Add a "Long HLT" C1 state
for the VM guest mode"), because there is a coding mistake in it and its
validity is questioned.

Link: https://lore.kernel.org/all/20230711132553.GN3062772@hirez.programming.kicks-ass.net
Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-07-19 20:05:43 +02:00
Rafael J. Wysocki
d46b0a05bd Revert "intel_idle: Add __init annotation to matchup_vm_state_with_baremetal()"
This reverts commit b2918089d5cb ("intel_idle: Add __init annotation to
matchup_vm_state_with_baremetal()"), because the commit fixed by it will
be reverted.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-07-19 19:56:08 +02:00
Rafael J. Wysocki
b2918089d5 intel_idle: Add __init annotation to matchup_vm_state_with_baremetal()
The caller of (recently added) matchup_vm_state_with_baremetal() is an
__init function and it uses some __initdata data structures, so add the
__init annotation to it for consistency.

This addresses the following build warnings:

WARNING: modpost: vmlinux: section mismatch in reference: matchup_vm_state_with_baremetal+0x51 (section: .text) -> intel_idle_max_cstate_reached (section: .init.text)
WARNING: modpost: vmlinux: section mismatch in reference: matchup_vm_state_with_baremetal+0x62 (section: .text) -> cpuidle_state_table (section: .init.data)
WARNING: modpost: vmlinux: section mismatch in reference: matchup_vm_state_with_baremetal+0x79 (section: .text) -> icpu (section: .init.data)

Fixes: 0fac214bb75e ("intel_idle: Add a "Long HLT" C1 state for the VM guest mode")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
2023-06-28 19:09:55 +02:00
Arjan van de Ven
0fac214bb7 intel_idle: Add a "Long HLT" C1 state for the VM guest mode
intel_idle will, for the bare metal case, usually have one or more deep
power states that have the CPUIDLE_FLAG_TLB_FLUSHED flag set. When
a state with this flag is selected by the cpuidle framework, it will also
flush the TLBs as part of entering this state. The benefit of doing this is
that the kernel does not need to wake the cpu out of this deep power state
just to flush the TLBs... for which the latency can be very high due to
the exit latency of deep power states.

In a VM guest currently, this benefit of avoiding the wakeup does not exist,
while the problem (long exit latency) is even more severe. Linux will need
to wake up a vCPU (causing the host to either come out of a deep C state,
or the VMM to have to deschedule something else to schedule the vCPU) which
can take a very long time.. adding a lot of latency to tlb flush operations
(including munmap and others).

To solve this, add a "Long HLT" C state to the state table for the VM guest
case that has the CPUIDLE_FLAG_TLB_FLUSHED flag set.  The result of that is
that for long idle periods (where the VMM is likely to do things that cause
large latency) the cpuidle framework will flush the TLBs (and avoid the
wakeups), while for short/quick idle durations, the existing behavior is
retained.

Now, there is still only "hlt" available in the guest, but for long idle,
the host can go to a deeper state (say C6).  There is a reasonable debate
one can have to what to set for the exit_latency and break even point for
this "Long HLT" state.  The good news is that intel_idle has these values
available for the underlying CPU (even when mwait is not exposed).  The
solution thus is to just use the latency and break even of the deepest state
from the bare metal CPU.  This is under the assumption that this is a pretty
reasonable estimate of what the VMM would do to cause latency.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-21 19:46:58 +02:00
Arjan van de Ven
2f3d08f074 intel_idle: Add support for using intel_idle in a VM guest using just hlt
In a typical VM guest, the mwait instruction is not available, leaving
only the 'hlt' instruction (which causes a VMEXIT to the host).

So for this common case, intel_idle will detect the lack of mwait, and
fail to initialize (after which another idle method would step in which
will just use hlt always).

Other (non-common) cases exist; the table below shows the before/after
for these:

+------------+--------------------------+-------------------------+
| Hypervisor | Idle method before patch | Idle method after patch |
| exposes    |                          |                         |
+============+==========================+=========================+
| nothing    | default_idle fallback    | intel_idle VM table     |
| (common)   | (straight "hlt")         |                         |
+------------+--------------------------+-------------------------+
| mwait      | intel_idle mwait table   | intel_idle mwait table  |
+------------+--------------------------+-------------------------+
| ACPI       | ACPI C1 state ("hlt")    | intel_idle VM table     |
+------------+--------------------------+-------------------------+

This is only applicable to CPUs known by intel_idle. For the bare metal
case, unknown CPU models will use the ACPI tables (when available) to
get estimates for latency and break even point for longer idle states.
In guests, the common case is that ACPI tables are not available, but
even when they are available, they can't and don't provide the latency
information for the longer (mwait based) states. For this scenario
(unknown CPU model), the default_idle mode (no ACPI) or ACPI C1 (ACPI
avaible) will be used.

By providing capability to do this with the intel_idle driver, we can
do better than the fallback or ACPI table methods. While this current
change only gets us to the existing behavior, later patches in this
series will add new capabilities such as optimized TLB flushing.

In order to do this, a simplified version of the initialization
function for VM guests is created, and this will be called if the CPU
is recognized, but mwait is not supported, and we're in a VM guest.

One thing to note is that the max latency (and break even) of this C1
state is higher than the typical bare metal C1 state. Because hlt causes
a vmexit, and the cost of vmexit + hypervisor overhead + vmenter is
typically in the order of upto 5 microseconds... even if the hypervisor
does not actually goes into a hardware power saving state.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
[ rjw: Dropped redundant checks from should_verify_mwait() ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-16 19:12:36 +02:00
Arjan van de Ven
7826c069c8 intel_idle: clean up the (new) state_update_enter_method function
Now that the logic for state_update_enter_method() is in its own
function, the long if .. else if .. else if .. else if chain
can be simplified by just returning from the function
at the various places. This does not change functionality,
but it makes the logic much simpler to read or modify later.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12 20:07:22 +02:00
Arjan van de Ven
4622ba923e intel_idle: refactor state->enter manipulation into its own function
Since the 6.4 kernel, the logic for updating a state's enter method
based on "environmental conditions" (command line options, cpu sidechannel
workarounds etc etc) has gotten pretty complex.
This patch refactors this into a seperate small, self contained function
(no behavior changes) for improved readability and to make future
changes to this logic easier to do and understand.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-12 20:07:22 +02:00
Artem Bityutskiy
bd4468295e intel_idle: mark few variables as __read_mostly
The intention is to clean up the code and make it look a bit more
consistent.

Mark all unitialized module parameter variables as __read_mostly,
not just one of them. The other parameters are read-mostly too.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Artem Bityutskiy
4152379a70 intel_idle: do not sprinkle module parameter definitions around
This is a cleanup which improves code consistency. Move the force_irq_on
module parameter variable and definition to the same place where we have
variables and definitions for other module parameters.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Artem Bityutskiy
db1ae0c999 intel_idle: fix confusing message
By default, all non-POLL C-states are entered with interrupts disabled.

There are 2 ways to make 'intel_idle' enter C-states with interrupts
enabled:
  1. Mark the C-state with the CPUIDLE_FLAG_IRQ_ENABLE flag.
  2. Use the force_irq_on module parameter.

The former is the "proper" way of doing it, it is per-C-state and
per-platform. The latter is for debugging purposes only.

The problem is that intel_idle prints the "forced intel_idle_irq"
message in both cases, even though the former case does not needed
this message, because nothing is forced there. This patch addresses the
problem.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Artem Bityutskiy
00433eae17 intel_idle: improve C-state flags handling robustness
The following C-state flags are currently mutually-exclusive and should not
be combined:
  * IRQ_ENABLE
  * IBRS
  * XSTATE

There is a warning for the situation when the IRQ_ENABLE flag
is combined with the IBRS flag, but no warnings for other combinations.
This is inconsistent and prone to errors.

Improve the situation by adding warnings for all the unexpected
combinations. Add a couple of helpful commentaries too.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Artem Bityutskiy
1abffbd827 intel_idle: further intel_idle_init_cstates_icpu() cleanup
Introduce a temporary 'state' variable for referencing the currently
processed C-state in the intel_idle_init_cstates_icpu() function.

This makes code lines shorter and easier to read.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Artem Bityutskiy
a78032e94b intel_idle: clean up intel_idle_init_cstates_icpu()
The intel_idle_init_cstates_icpu() function includes a loop that iterates
over every C-state. Inside the loop, the same C-state data is referenced 2
ways:
 1. as cpuidle_state_table[cstate]
 2. as drv->states[drv->state_count] (but it is a copy of #1, not the same
    object).

Make the code be more consistent and easier to read by using only the 2nd
way. So the code structure would be as follows:

 1. Use cpuidle_state_table[cstate]
 2. Copy cpuidle_state_table[cstate] to drv->states[drv->state_count]
 3. Use only drv->states[drv->state_count] from this point.

Note, this change introduces a checkpatch.pl warning (too long line), but it
will be addressed in the next patch.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Artem Bityutskiy
91048ce422 intel_idle: use pr_info() instead of printk()
Substitute 'printk()' with 'pr_info()', because 'intel_idle' already uses
'pr_debug()', so using 'pr_info()' will be more consistent.

In addition to this, this patch addresses  the following checkpatch.pl
warning:
  WARNING: printk() should include KERN_<LEVEL> facility level

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:37:36 +02:00
Linus Torvalds
2504ba8b01 Power management updates for 6.3-rc1
- Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
    Karny, Arnd Bergmann, Bagas Sanjaya).
 
  - Drop the custom cpufreq driver for loongson1 that is not necessary
    any more and the corresponding cpufreq platform device (Keguang
    Zhang).
 
  - Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
    entries (Paul E. McKenney).
 
  - Enable thermal cooling for Tegra194 (Yi-Wei Wang).
 
  - Register module device table and add missing compatibles for
    cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss).
 
  - Various dt binding updates for qcom-cpufreq-nvmem and opp-v2-kryo-cpu
    (Christian Marangi).
 
  - Make kobj_type structure in the cpufreq core constant (Thomas
    Weißschuh).
 
  - Make cpufreq_unregister_driver() return void (Uwe Kleine-König).
 
  - Make the TEO cpuidle governor check CPU utilization in order to refine
   idle state selection (Kajetan Puchalski).
 
  - Make Kconfig select the haltpoll cpuidle governor when the haltpoll
    cpuidle driver is selected and replace a default_idle() call in that
    driver with arch_cpu_idle() to allow MWAIT to be used (Li RongQing).
 
  - Add Emerald Rapids Xeon support to the intel_idle driver (Artem
    Bityutskiy).
 
  - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
    avoid randconfig build failures (Arnd Bergmann).
 
  - Make kobj_type structures used in the cpuidle sysfs interface
    constant (Thomas Weißschuh).
 
  - Make the cpuidle driver registration code update microsecond values
    of idle state parameters in accordance with their nanosecond values
    if they are provided (Rafael Wysocki).
 
  - Make the PSCI cpuidle driver prevent topology CPUs from being
    suspended on PREEMPT_RT (Krzysztof Kozlowski).
 
  - Document that pm_runtime_force_suspend() cannot be used with
    DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald).
 
  - Add EXPORT macros for exporting PM functions from drivers (Richard
    Fitzgerald).
 
  - Remove /** from non-kernel-doc comments in hibernation code (Randy
    Dunlap).
 
  - Fix possible name leak in powercap_register_zone() (Yang Yingliang).
 
  - Add Meteor Lake and Emerald Rapids support to the intel_rapl power
    capping driver (Zhang Rui).
 
  - Modify the idle_inject power capping facility to support 100% idle
    injection (Srinivas Pandruvada).
 
  - Fix large time windows handling in the intel_rapl power capping
    driver (Zhang Rui).
 
  - Fix memory leaks with using debugfs_lookup() in the generic PM
    domains and Energy Model code (Greg Kroah-Hartman).
 
  - Add missing 'cache-unified' property in the example for kryo OPP
    bindings (Rob Herring).
 
  - Fix error checking in opp_migrate_dentry() (Qi Zheng).
 
  - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
    Dybcio).
 
  - Modify some power management utilities to use the canonical ftrace
    path (Ross Zwisler).
 
  - Correct spelling problems for Documentation/power/ as reported by
    codespell (Randy Dunlap).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmPuJfMSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx/5kQAJNOVImLEPLerLP8xufw30//LuDU5Gi0
 STsyDOMql/I2MpkeqeCcgrSbpy6NlEglOvg16gfpQ3qqTCLF9ypENxs9E5BGGvW0
 aEdCzvaoqmvi9PCr/jmj0EPP70/U+rIX5m/k0QdjLh9x0aLoAEe3uRJTfR9QVqXf
 I7JX0N9kjKi7YxpA5DlkHrS7J7GPPiWlesJ3p4wXuHMo3jf+6fgkoPFt8yRrGWeh
 AHzGT2BLrsy7aAUjGZB65Qx9q3fnSXMmXOjmn0Xh2njQah+zRZDwrNzwoY2HTLL/
 KQ6/Ww16USYRZtCS1fmGwAj9I+ddq6AOvhPCMn0vLXXmKVAMUrVVWnQS/0+vpm9y
 suUMK9Tndkgxd1vjby2246ThJn27uDd/ERFan4ouQo2j22uICY+SDo3osj2hMXka
 wq4zthXkY8KgjZ+MuXnZxPhcOvo8KRvfxAU0fy5efQnSkbtwY9UlMvjPBMBHm/RA
 21/6kjQNtq5vMmI37oC8DH+oPrRQ7sUKuY7HNqwO9P3QNKWVmNe7cF5UtXXxME7Q
 ULvP1d+u+TNNdHFLryPwCSzBO34wQEccdRZBjalZ8tBe6JiDWUFHC3giSURZSuzZ
 GDvzVaNX6PkgToyv4inBTB8lTp6pAuUjaWNvNJzVvUXiEKHB0ihzg5vpJW5NdwlH
 15Tn8cjH7pp0
 =lZLx
 -----END PGP SIGNATURE-----

Merge tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These add EPP support to the AMD P-state cpufreq driver, add support
  for new platforms to the Intel RAPL power capping driver, intel_idle
  and the Qualcomm cpufreq driver, enable thermal cooling for Tegra194,
  drop the custom cpufreq driver for loongson1 that is not necessary any
  more (and the corresponding cpufreq platform device), fix assorted
  issues and clean up code.

  Specifics:

   - Add EPP support to the AMD P-state cpufreq driver (Perry Yuan, Wyes
     Karny, Arnd Bergmann, Bagas Sanjaya)

   - Drop the custom cpufreq driver for loongson1 that is not necessary
     any more and the corresponding cpufreq platform device (Keguang
     Zhang)

   - Remove "select SRCU" from system sleep, cpufreq and OPP Kconfig
     entries (Paul E. McKenney)

   - Enable thermal cooling for Tegra194 (Yi-Wei Wang)

   - Register module device table and add missing compatibles for
     cpufreq-qcom-hw (Nícolas F. R. A. Prado, Abel Vesa and Luca Weiss)

   - Various dt binding updates for qcom-cpufreq-nvmem and
     opp-v2-kryo-cpu (Christian Marangi)

   - Make kobj_type structure in the cpufreq core constant (Thomas
     Weißschuh)

   - Make cpufreq_unregister_driver() return void (Uwe Kleine-König)

   - Make the TEO cpuidle governor check CPU utilization in order to
     refine idle state selection (Kajetan Puchalski)

   - Make Kconfig select the haltpoll cpuidle governor when the haltpoll
     cpuidle driver is selected and replace a default_idle() call in
     that driver with arch_cpu_idle() to allow MWAIT to be used (Li
     RongQing)

   - Add Emerald Rapids Xeon support to the intel_idle driver (Artem
     Bityutskiy)

   - Add ARCH_SUSPEND_POSSIBLE dependencies for ARMv4 cpuidle drivers to
     avoid randconfig build failures (Arnd Bergmann)

   - Make kobj_type structures used in the cpuidle sysfs interface
     constant (Thomas Weißschuh)

   - Make the cpuidle driver registration code update microsecond values
     of idle state parameters in accordance with their nanosecond values
     if they are provided (Rafael Wysocki)

   - Make the PSCI cpuidle driver prevent topology CPUs from being
     suspended on PREEMPT_RT (Krzysztof Kozlowski)

   - Document that pm_runtime_force_suspend() cannot be used with
     DPM_FLAG_SMART_SUSPEND (Richard Fitzgerald)

   - Add EXPORT macros for exporting PM functions from drivers (Richard
     Fitzgerald)

   - Remove /** from non-kernel-doc comments in hibernation code (Randy
     Dunlap)

   - Fix possible name leak in powercap_register_zone() (Yang Yingliang)

   - Add Meteor Lake and Emerald Rapids support to the intel_rapl power
     capping driver (Zhang Rui)

   - Modify the idle_inject power capping facility to support 100% idle
     injection (Srinivas Pandruvada)

   - Fix large time windows handling in the intel_rapl power capping
     driver (Zhang Rui)

   - Fix memory leaks with using debugfs_lookup() in the generic PM
     domains and Energy Model code (Greg Kroah-Hartman)

   - Add missing 'cache-unified' property in the example for kryo OPP
     bindings (Rob Herring)

   - Fix error checking in opp_migrate_dentry() (Qi Zheng)

   - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad
     Dybcio)

   - Modify some power management utilities to use the canonical ftrace
     path (Ross Zwisler)

   - Correct spelling problems for Documentation/power/ as reported by
     codespell (Randy Dunlap)"

* tag 'pm-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (53 commits)
  Documentation: amd-pstate: disambiguate user space sections
  cpufreq: amd-pstate: Fix invalid write to MSR_AMD_CPPC_REQ
  dt-bindings: opp: opp-v2-kryo-cpu: enlarge opp-supported-hw maximum
  dt-bindings: cpufreq: qcom-cpufreq-nvmem: make cpr bindings optional
  dt-bindings: cpufreq: qcom-cpufreq-nvmem: specify supported opp tables
  PM: Add EXPORT macros for exporting PM functions
  cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT
  MIPS: loongson32: Drop obsolete cpufreq platform device
  powercap: intel_rapl: Fix handling for large time window
  cpuidle: driver: Update microsecond values of state parameters as needed
  cpuidle: sysfs: make kobj_type structures constant
  cpuidle: add ARCH_SUSPEND_POSSIBLE dependencies
  PM: EM: fix memory leak with using debugfs_lookup()
  PM: domains: fix memory leak with using debugfs_lookup()
  cpufreq: Make kobj_type structure constant
  cpufreq: davinci: Fix clk use after free
  cpufreq: amd-pstate: avoid uninitialized variable use
  cpufreq: Make cpufreq_unregister_driver() return void
  OPP: fix error checking in opp_migrate_dentry()
  dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM8550 compatible
  ...
2023-02-21 12:13:58 -08:00
Artem Bityutskiy
74528edfbc intel_idle: add Emerald Rapids Xeon support
Emerald Rapids (EMR) is the next Intel Xeon processor after Sapphire
Rapids (SPR).

EMR C-states are the same as SPR C-states, and we expect that EMR
C-state characteristics (latency and target residency) will be the
same as in SPR. Therefore, add EMR support by using SPR C-states table.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-20 18:39:42 +01:00
Peter Zijlstra
365bd03ff6 intel_idle: Add force_irq_on module param
For testing purposes.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195541.967699392@infradead.org
2023-01-13 11:48:17 +01:00
Peter Zijlstra
9b461a6faa cpuidle, intel_idle: Fix CPUIDLE_FLAG_IBRS
objtool to the rescue:

  vmlinux.o: warning: objtool: intel_idle_ibrs+0x17: call to spec_ctrl_current() leaves .noinstr.text section
  vmlinux.o: warning: objtool: intel_idle_ibrs+0x27: call to wrmsrl.constprop.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.556912863@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra
6d9c7f51b1 cpuidle, intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE *again*
So objtool found this bug:

  vmlinux.o: warning: objtool: intel_idle_irq+0x10c: call to trace_hardirqs_off() leaves .noinstr.text section

As per commit 32d4fd5751ea ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE"):

  "must not have tracing in idle functions"

Clearly people can't read and tinker along until splat dissapears.
This straight up reverts commit d295ad34f236 ("intel_idle: Fix false
positive RCU splats due to incorrect hardirqs state").

It doesn't re-introduce the problem because preceding patches fixed it
properly.

Fixes: d295ad34f236 ("intel_idle: Fix false positive RCU splats due to incorrect hardirqs state")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.434302128@infradead.org
2023-01-13 11:48:15 +01:00
Zhang Rui
65c0c2367e intel_idle: Add AlderLake-N support
Similar to the other other AlderLake platforms, the C1 and C1E states on
ADL-N are mutually exclusive. Only one of them can be enabled at a time.

C1E is preferred on ADL-N for better energy efficiency.

C6S is also supported on this platform. Its latency is far bigger than
C6, but really close to C8 (PC8), thus it is not exposed as a separate
state.

Suggested-by: Baieswara Reddy Sagili <baieswara.reddy.sagili@intel.com>
Suggested-by: Vinay Kumar <vinay.kumar@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-21 20:33:49 +02:00