]> git.apps.os.sepia.ceph.com Git - ceph-client.git/log
ceph-client.git
3 years agoarm64: atomics: lse: Dereference matching size
Kees Cook [Wed, 12 Jan 2022 20:22:59 +0000 (12:22 -0800)]
arm64: atomics: lse: Dereference matching size

When building with -Warray-bounds, the following warning is generated:

In file included from ./arch/arm64/include/asm/lse.h:16,
                 from ./arch/arm64/include/asm/cmpxchg.h:14,
                 from ./arch/arm64/include/asm/atomic.h:16,
                 from ./include/linux/atomic.h:7,
                 from ./include/asm-generic/bitops/atomic.h:5,
                 from ./arch/arm64/include/asm/bitops.h:25,
                 from ./include/linux/bitops.h:33,
                 from ./include/linux/kernel.h:22,
                 from kernel/printk/printk.c:22:
./arch/arm64/include/asm/atomic_lse.h:247:9: warning: array subscript 'long unsigned int[0]' is partly outside array bounds of 'atomic_t[1]' [-Warray-bounds]
  247 |         asm volatile(                                                   \
      |         ^~~
./arch/arm64/include/asm/atomic_lse.h:266:1: note: in expansion of macro '__CMPXCHG_CASE'
  266 | __CMPXCHG_CASE(w,  , acq_, 32,  a, "memory")
      | ^~~~~~~~~~~~~~
kernel/printk/printk.c:3606:17: note: while referencing 'printk_cpulock_owner'
 3606 | static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
      |                 ^~~~~~~~~~~~~~~~~~~~

This is due to the compiler seeing an unsigned long * cast against
something (atomic_t) that is int sized. Replace the cast with the
matching size cast. This results in no change in binary output.

Note that __ll_sc__cmpxchg_case_##name##sz already uses the same
constraint:

[v] "+Q" (*(u##sz *)ptr

Which is why only the LSE form needs updating and not the
LL/SC form, so this change is unlikely to be problematic.

Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220112202259.3950286-1-keescook@chromium.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoasm-generic: Add missing brackets for io_stop_wc macro
Xiongfeng Wang [Fri, 14 Jan 2022 10:58:57 +0000 (18:58 +0800)]
asm-generic: Add missing brackets for io_stop_wc macro

After using io_stop_wc(), drivers reports following compile error when
compiled on X86.

  drivers/net/ethernet/hisilicon/hns3/hns3_enet.c: In function ‘hns3_tx_push_bd’:
  drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:2058:12: error: expected ‘;’ before ‘(’ token
    io_stop_wc();
              ^
It is because I missed to add the brackets after io_stop_wc macro. So
let's add the missing brackets.

Fixes: d5624bb29f49 ("asm-generic: introduce io_stop_wc() and add implementation for ARM64")
Reported-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Link: https://lore.kernel.org/r/20220114105857.126300-1-wangxiongfeng2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoMerge branches 'for-next/misc', 'for-next/cache-ops-dzp', 'for-next/stacktrace',...
Catalin Marinas [Wed, 5 Jan 2022 18:14:32 +0000 (18:14 +0000)]
Merge branches 'for-next/misc', 'for-next/cache-ops-dzp', 'for-next/stacktrace', 'for-next/xor-neon', 'for-next/kasan', 'for-next/armv8_7-fp', 'for-next/atomics', 'for-next/bti', 'for-next/sve', 'for-next/kselftest' and 'for-next/kcsan', remote-tracking branch 'arm64/for-next/perf' into for-next/core

* arm64/for-next/perf: (32 commits)
  arm64: perf: Don't register user access sysctl handler multiple times
  drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
  perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
  arm64: perf: Support new DT compatibles
  arm64: perf: Simplify registration boilerplate
  arm64: perf: Support Denver and Carmel PMUs
  drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
  docs: perf: Add description for HiSilicon PCIe PMU driver
  dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings
  drivers: perf: Add LLC-TAD perf counter support
  perf/smmuv3: Synthesize IIDR from CoreSight ID registers
  perf/smmuv3: Add devicetree support
  dt-bindings: Add Arm SMMUv3 PMCG binding
  perf/arm-cmn: Add debugfs topology info
  perf/arm-cmn: Add CI-700 Support
  dt-bindings: perf: arm-cmn: Add CI-700
  perf/arm-cmn: Support new IP features
  perf/arm-cmn: Demarcate CMN-600 specifics
  perf/arm-cmn: Move group validation data off-stack
  perf/arm-cmn: Optimise DTC counter accesses
  ...

* for-next/misc:
  : Miscellaneous patches
  arm64: Use correct method to calculate nomap region boundaries
  arm64: Drop outdated links in comments
  arm64: errata: Fix exec handling in erratum 1418040 workaround
  arm64: Unhash early pointer print plus improve comment
  asm-generic: introduce io_stop_wc() and add implementation for ARM64
  arm64: remove __dma_*_area() aliases
  docs/arm64: delete a space from tagged-address-abi
  arm64/fp: Add comments documenting the usage of state restore functions
  arm64: mm: Use asid feature macro for cheanup
  arm64: mm: Rename asid2idx() to ctxid2asid()
  arm64: kexec: reduce calls to page_address()
  arm64: extable: remove unused ex_handler_t definition
  arm64: entry: Use SDEI event constants
  arm64: Simplify checking for populated DT
  arm64/kvm: Fix bitrotted comment for SVE handling in handle_exit.c

* for-next/cache-ops-dzp:
  : Avoid DC instructions when DCZID_EL0.DZP == 1
  arm64: mte: DC {GVA,GZVA} shouldn't be used when DCZID_EL0.DZP == 1
  arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1

* for-next/stacktrace:
  : Unify the arm64 unwind code
  arm64: Make some stacktrace functions private
  arm64: Make dump_backtrace() use arch_stack_walk()
  arm64: Make profile_pc() use arch_stack_walk()
  arm64: Make return_address() use arch_stack_walk()
  arm64: Make __get_wchan() use arch_stack_walk()
  arm64: Make perf_callchain_kernel() use arch_stack_walk()
  arm64: Mark __switch_to() as __sched
  arm64: Add comment for stack_info::kr_cur
  arch: Make ARCH_STACKWALK independent of STACKTRACE

* for-next/xor-neon:
  : Use SHA3 instructions to speed up XOR
  arm64/xor: use EOR3 instructions when available

* for-next/kasan:
  : Log potential KASAN shadow aliases
  arm64: mm: log potential KASAN shadow alias
  arm64: mm: use die_kernel_fault() in do_mem_abort()

* for-next/armv8_7-fp:
  : Add HWCAPS for ARMv8.7 FEAT_AFP amd FEAT_RPRES
  arm64: cpufeature: add HWCAP for FEAT_RPRES
  arm64: add ID_AA64ISAR2_EL1 sys register
  arm64: cpufeature: add HWCAP for FEAT_AFP

* for-next/atomics:
  : arm64 atomics clean-ups and codegen improvements
  arm64: atomics: lse: define RETURN ops in terms of FETCH ops
  arm64: atomics: lse: improve constraints for simple ops
  arm64: atomics: lse: define ANDs in terms of ANDNOTs
  arm64: atomics lse: define SUBs in terms of ADDs
  arm64: atomics: format whitespace consistently

* for-next/bti:
  : BTI clean-ups
  arm64: Ensure that the 'bti' macro is defined where linkage.h is included
  arm64: Use BTI C directly and unconditionally
  arm64: Unconditionally override SYM_FUNC macros
  arm64: Add macro version of the BTI instruction
  arm64: ftrace: add missing BTIs
  arm64: kexec: use __pa_symbol(empty_zero_page)
  arm64: update PAC description for kernel

* for-next/sve:
  : SVE code clean-ups and refactoring in prepararation of Scalable Matrix Extensions
  arm64/sve: Minor clarification of ABI documentation
  arm64/sve: Generalise vector length configuration prctl() for SME
  arm64/sve: Make sysctl interface for SVE reusable by SME

* for-next/kselftest:
  : arm64 kselftest additions
  kselftest/arm64: Add pidbench for floating point syscall cases
  kselftest/arm64: Add a test program to exercise the syscall ABI
  kselftest/arm64: Allow signal tests to trigger from a function
  kselftest/arm64: Parameterise ptrace vector length information

* for-next/kcsan:
  : Enable KCSAN for arm64
  arm64: Enable KCSAN

3 years agoarm64: Use correct method to calculate nomap region boundaries
Huacai Chen [Fri, 22 Oct 2021 07:06:46 +0000 (15:06 +0800)]
arm64: Use correct method to calculate nomap region boundaries

Nomap regions are treated as "reserved". When region boundaries are not
page aligned, we usually increase the "reserved" regions rather than
decrease them. So, we should use memblock_region_reserved_base_pfn()/
memblock_region_reserved_end_pfn() instead of memblock_region_memory_
base_pfn()/memblock_region_memory_base_pfn() to calculate boundaries.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://lore.kernel.org/r/20211022070646.41923-1-chenhuacai@loongson.cn
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Drop outdated links in comments
Kees Cook [Wed, 15 Dec 2021 19:18:35 +0000 (11:18 -0800)]
arm64: Drop outdated links in comments

As started by commit 05a5f51ca566 ("Documentation: Replace lkml.org links
with lore"), an effort was made to replace lkml.org links with lore to
better use a single source that's more likely to stay available long-term.
However, it seems these links don't offer much value here, so just
remove them entirely.

Cc: Joe Perches <joe@perches.com>
Suggested-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/lkml/20210211100213.GA29813@willie-the-truck/
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20211215191835.1420010-1-keescook@chromium.org
[catalin.marinas@arm.com: removed the arch/arm changes]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: perf: Don't register user access sysctl handler multiple times
Will Deacon [Tue, 4 Jan 2022 14:57:14 +0000 (14:57 +0000)]
arm64: perf: Don't register user access sysctl handler multiple times

Commit e2012600810c ("arm64: perf: Add userspace counter access disable
switch") introduced a new 'perf_user_access' sysctl file to enable and
disable direct userspace access to the PMU counters. Sadly, Geert
reports that on his big.LITTLE SoC ('Renesas Salvator-XS w/ R-Car H3'),
the file is created for each PMU type probed, resulting in a splat
during boot:

  | hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available
  | sysctl duplicate entry: /kernel//perf_user_access
  | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.16.0-rc3-arm64-renesas-00003-ge2012600810c #1420
  | Hardware name: Renesas Salvator-X 2nd version board based on r8a77951 (DT)
  | Call trace:
  |  dump_backtrace+0x0/0x190
  |  show_stack+0x14/0x20
  |  dump_stack_lvl+0x88/0xb0
  |  dump_stack+0x14/0x2c
  |  __register_sysctl_table+0x384/0x818
  |  register_sysctl+0x20/0x28
  |  armv8_pmu_init.constprop.0+0x118/0x150
  |  armv8_a57_pmu_init+0x1c/0x28
  |  arm_pmu_device_probe+0x1b4/0x558
  |  armv8_pmu_device_probe+0x18/0x20
  |  platform_probe+0x64/0xd0
  |  hw perfevents: enabled with armv8_cortex_a57 PMU driver, 7 counters available

Introduce a state variable to track creation of the sysctl file and
ensure that it is only created once.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: e2012600810c ("arm64: perf: Add userspace counter access disable switch")
Link: https://lore.kernel.org/r/CAMuHMdVcDxR9sGzc5pcnORiotonERBgc6dsXZXMd6wTvLGA9iw@mail.gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agodrivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
Dan Carpenter [Fri, 17 Dec 2021 14:59:08 +0000 (17:59 +0300)]
drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check

The devm_ioremap() function does not return error pointers.  It returns
NULL.

Fixes: 036a7584bede ("drivers: perf: Add LLC-TAD perf counter support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20211217145907.GA16611@kili
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/smmuv3: Fix unused variable warning when CONFIG_OF=n
Will Deacon [Tue, 4 Jan 2022 13:34:12 +0000 (13:34 +0000)]
perf/smmuv3: Fix unused variable warning when CONFIG_OF=n

The kbuild robot reports that building the SMMUv3 PMU driver with
CONFIG_OF=n results in a warning for W=1 builds:

>> drivers/perf/arm_smmuv3_pmu.c:889:34: warning: unused variable 'smmu_pmu_of_match' [-Wunused-const-variable]
   static const struct of_device_id smmu_pmu_of_match[] = {
                                    ^

Guard the match table with #ifdef CONFIG_OF.

Link: https://lore.kernel.org/r/202201041700.01KZEzhb-lkp@intel.com
Fixes: 3f7be4356176 ("perf/smmuv3: Add devicetree support")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoarm64: errata: Fix exec handling in erratum 1418040 workaround
D Scott Phillips [Mon, 20 Dec 2021 23:41:14 +0000 (15:41 -0800)]
arm64: errata: Fix exec handling in erratum 1418040 workaround

The erratum 1418040 workaround enables CNTVCT_EL1 access trapping in EL0
when executing compat threads. The workaround is applied when switching
between tasks, but the need for the workaround could also change at an
exec(), when a non-compat task execs a compat binary or vice versa. Apply
the workaround in arch_setup_new_exec().

This leaves a small window of time between SET_PERSONALITY and
arch_setup_new_exec where preemption could occur and confuse the old
workaround logic that compares TIF_32BIT between prev and next. Instead, we
can just read cntkctl to make sure it's in the state that the next task
needs. I measured cntkctl read time to be about the same as a mov from a
general-purpose register on N1. Update the workaround logic to examine the
current value of cntkctl instead of the previous task's compat state.

Fixes: d49f7d7376d0 ("arm64: Move handling of erratum 1418040 into C code")
Cc: <stable@vger.kernel.org> # 5.9.x
Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211220234114.3926-1-scott@os.amperecomputing.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Unhash early pointer print plus improve comment
Guilherme G. Piccoli [Tue, 21 Dec 2021 15:52:30 +0000 (12:52 -0300)]
arm64: Unhash early pointer print plus improve comment

When facing a really early issue on DT parsing we have currently
a message that shows both the physical and virtual address of the FDT.
The printk pointer modifier for the virtual address shows a hashed
address there unless the user provides "no_hash_pointers" parameter in
the command-line. The situation in which this message shows-up is a bit
more serious though: the boot process is broken, nothing can be done
(even an oops is too much for this early stage) so we have this message
as a last resort in order to help debug bootloader issues, for example.
Hence, we hereby change that to "%px" in order to make debugging easy,
there's not much information leak risk in such early boot failure.

Also, we tried to improve a bit the commenting on that function, given
that if kernel fails there, it just hangs forever in a cpu_relax() loop.
The reason we cannot BUG/panic is that is too early to do so; thanks to
Mark Brown for pointing that on IRC and thanks Robin Murphy for the good
pointer hash discussion in the mailing-list.

Cc: Mark Brown <broonie@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20211221155230.1532850-1-gpiccoli@igalia.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoasm-generic: introduce io_stop_wc() and add implementation for ARM64
Xiongfeng Wang [Tue, 21 Dec 2021 03:55:56 +0000 (11:55 +0800)]
asm-generic: introduce io_stop_wc() and add implementation for ARM64

For memory accesses with write-combining attributes (e.g. those returned
by ioremap_wc()), the CPU may wait for prior accesses to be merged with
subsequent ones. But in some situation, such wait is bad for the
performance.

We introduce io_stop_wc() to prevent the merging of write-combining
memory accesses before this macro with those after it.

We add implementation for ARM64 using DGH instruction and provide NOP
implementation for other architectures.

Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Suggested-by: Will Deacon <will@kernel.org>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20211221035556.60346-1-wangxiongfeng2@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Ensure that the 'bti' macro is defined where linkage.h is included
Catalin Marinas [Fri, 17 Dec 2021 16:20:45 +0000 (16:20 +0000)]
arm64: Ensure that the 'bti' macro is defined where linkage.h is included

Not all .S files include asm/assembler.h, however the SYM_FUNC_*
definitions invoke the 'bti' macro. Include asm/assembler.h in
asm/linkage.h.

Fixes: 9be34be87cc8 ("arm64: Add macro version of the BTI instruction")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: remove __dma_*_area() aliases
Mark Rutland [Mon, 6 Dec 2021 12:47:12 +0000 (12:47 +0000)]
arm64: remove __dma_*_area() aliases

The __dma_inv_area() and __dma_clean_area() aliases make cache.S harder
to navigate, but don't gain us anything in practice.

For clarity, let's remove them along with their redundant comments. The
only users are __dma_map_area() and __dma_unmap_area(), which need to be
position independent, and can call __pi_dcache_inval_poc() and
__pi_dcache_clean_poc() directly.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Fuad Tabba <tabba@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20211206124715.4101571-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agodocs/arm64: delete a space from tagged-address-abi
Yanteng Si [Thu, 9 Dec 2021 09:19:22 +0000 (17:19 +0800)]
docs/arm64: delete a space from tagged-address-abi

Since e71e2ace5721("userfaultfd: do not untag user pointers") which
introduced a warning:

linux/Documentation/arm64/tagged-address-abi.rst:52: WARNING: Unexpected indentation.

Let's fix it.

Signed-off-by: Yanteng Si <siyanteng@loongson.cn>
Link: https://lore.kernel.org/r/20211209091922.560979-1-siyanteng@loongson.cn
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Enable KCSAN
Kefeng Wang [Sat, 11 Dec 2021 13:17:34 +0000 (21:17 +0800)]
arm64: Enable KCSAN

This patch enables KCSAN for arm64, with updates to build rules
to not use KCSAN for several incompatible compilation units.

Recent GCC version(at least GCC10) made outline-atomics as the
default option(unlike Clang), which will cause linker errors
for kernel/kcsan/core.o. Disables the out-of-line atomics by
no-outline-atomics to fix the linker errors.

Meanwhile, as Mark said[1], some latent issues are needed to be
fixed which isn't just a KCSAN problem, we make the KCSAN depends
on EXPERT for now.

Tested selftest and kcsan_test(built with GCC11 and Clang 13),
and all passed.

[1] https://lkml.kernel.org/r/YadiUPpJ0gADbiHQ@FVFF77S0Q05N

Acked-by: Marco Elver <elver@google.com> # kernel/kcsan
Tested-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20211211131734.126874-1-wangkefeng.wang@huawei.com
[catalin.marinas@arm.com: added comment to justify EXPERT]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agokselftest/arm64: Add pidbench for floating point syscall cases
Mark Brown [Thu, 2 Dec 2021 16:51:07 +0000 (16:51 +0000)]
kselftest/arm64: Add pidbench for floating point syscall cases

Since it's likely to be useful for performance work with SVE let's have a
pidbench that gives us some numbers for consideration. In order to ensure
that we test exactly the scenario we want this is written in assembly - if
system libraries use SVE this would stop us exercising the case where the
process has never used SVE.

We exercise three cases:

 - Never having used SVE.
 - Having used SVE once.
 - Using SVE after each syscall.

by spinning running getpid() for a fixed number of iterations with the
time measured using CNTVCT_EL0 reported on the console. This is obviously
a totally unrealistic benchmark which will show the extremes of any
performance variation but equally given the potential gotchas with use of
FP instructions by system libraries it's good to have some concrete code
shared to make it easier to compare notes on results.

Testing over multiple SVE vector lengths will need to be done with vlset
currently, the test could be extended to iterate over all of them if
desired.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211202165107.1075259-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64/fp: Add comments documenting the usage of state restore functions
Mark Brown [Tue, 7 Dec 2021 16:32:50 +0000 (16:32 +0000)]
arm64/fp: Add comments documenting the usage of state restore functions

Add comments to help people figure out when fpsimd_bind_state_to_cpu() and
fpsimd_update_current_state() are used.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211207163250.1373542-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agokselftest/arm64: Add a test program to exercise the syscall ABI
Mark Brown [Fri, 10 Dec 2021 18:41:02 +0000 (18:41 +0000)]
kselftest/arm64: Add a test program to exercise the syscall ABI

Currently we don't have any coverage of the syscall ABI so let's add a very
dumb test program which sets up register patterns, does a sysscall and then
checks that the register state after the syscall matches what we expect.
The program is written in an extremely simplistic fashion with the goal of
making it easy to verify that it's doing what it thinks it's doing, it is
not a model of how one should write actual code.

Currently we validate the general purpose, FPSIMD and SVE registers. There
are other thing things that could be covered like FPCR and flags registers,
these can be covered incrementally - my main focus at the minute is
covering the ABI for the SVE registers.

The program repeats the tests for all possible SVE vector lengths in case
some vector length specific optimisation causes issues, as well as testing
FPSIMD only. It tries two syscalls, getpid() and sched_yield(), in an
effort to cover both immediate return to userspace and scheduling another
task though there are no guarantees which cases will be hit.

A new test directory "abi" is added to hold the test, it doesn't seem to
fit well into any of the existing directories.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-7-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agokselftest/arm64: Allow signal tests to trigger from a function
Mark Brown [Fri, 10 Dec 2021 18:41:01 +0000 (18:41 +0000)]
kselftest/arm64: Allow signal tests to trigger from a function

Currently we have the facility to specify custom code to trigger a signal
but none of the tests use it and for some reason the framework requires us
to also specify a signal to send as a trigger in order to make use of a
custom trigger. This doesn't seem to make much sense, instead allow the
use of a custom trigger function without specifying a signal to inject.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-6-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agokselftest/arm64: Parameterise ptrace vector length information
Mark Brown [Fri, 10 Dec 2021 18:41:00 +0000 (18:41 +0000)]
kselftest/arm64: Parameterise ptrace vector length information

SME introduces a new mode called streaming mode in which the SVE registers
have a different vector length. Since the ptrace interface for this is
based on the existing SVE interface prepare for supporting this by moving
the regset specific configuration into  struct and passing that around,
allowing these tests to be reused for streaming mode. As we will also have
to verify the interoperation of the SVE and streaming SVE regsets don't
just iterate over an array.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-5-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64/sve: Minor clarification of ABI documentation
Mark Brown [Fri, 10 Dec 2021 18:40:59 +0000 (18:40 +0000)]
arm64/sve: Minor clarification of ABI documentation

As suggested by Luis for the SME version of this explicitly say that the
vector length should be extracted from the return value of a set vector
length prctl() with a bitwise and rather than just any old and.

Suggested-by: Luis Machado <Luis.Machado@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64/sve: Generalise vector length configuration prctl() for SME
Mark Brown [Fri, 10 Dec 2021 18:40:58 +0000 (18:40 +0000)]
arm64/sve: Generalise vector length configuration prctl() for SME

In preparation for adding SME support update the bulk of the implementation
for the vector length configuration prctl() calls to be independent of
vector type.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-3-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64/sve: Make sysctl interface for SVE reusable by SME
Mark Brown [Fri, 10 Dec 2021 18:40:57 +0000 (18:40 +0000)]
arm64/sve: Make sysctl interface for SVE reusable by SME

The vector length configuration for SME is very similar to that for SVE
so in order to allow reuse refactor the SVE configuration so that it takes
the vector type from the struct ctl_table. Since there's no dedicated space
for this we repurpose the extra1 field to store the vector type, this is
otherwise unused for integer sysctls.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoMerge branch 'for-next/perf-cpu' into for-next/perf
Will Deacon [Tue, 14 Dec 2021 18:13:25 +0000 (18:13 +0000)]
Merge branch 'for-next/perf-cpu' into for-next/perf

* for-next/perf-cpu:
  arm64: perf: Support new DT compatibles
  arm64: perf: Simplify registration boilerplate
  arm64: perf: Support Denver and Carmel PMUs

3 years agoarm64: Use BTI C directly and unconditionally
Mark Brown [Tue, 14 Dec 2021 15:27:14 +0000 (15:27 +0000)]
arm64: Use BTI C directly and unconditionally

Now we have a macro for BTI C that looks like a regular instruction change
all the users of the current BTI_C macro to just emit a BTI C directly and
remove the macro.

This does mean that we now unconditionally BTI annotate all assembly
functions, meaning that they are worse in this respect than code generated
by the compiler. The overhead should be minimal for implementations with a
reasonable HINT implementation.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211214152714.2380849-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Unconditionally override SYM_FUNC macros
Mark Brown [Tue, 14 Dec 2021 15:27:13 +0000 (15:27 +0000)]
arm64: Unconditionally override SYM_FUNC macros

Currently we only override the SYM_FUNC macros when we need to insert
BTI C into them, do this unconditionally to make it more likely that we'll
notice bugs in our override.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211214152714.2380849-3-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Add macro version of the BTI instruction
Mark Brown [Tue, 14 Dec 2021 15:27:12 +0000 (15:27 +0000)]
arm64: Add macro version of the BTI instruction

BTI is only available from v8.5 so we need to encode it using HINT in
generic code and for older toolchains. Add an assembler macro based on
one written by Mark Rutland which lets us use the mnemonic and update
the existing users.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211214152714.2380849-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoMerge 'arm64/for-next/fixes' into for-next/bti
Catalin Marinas [Tue, 14 Dec 2021 18:11:52 +0000 (18:11 +0000)]
Merge 'arm64/for-next/fixes' into for-next/bti

Needed for the arch/arm64/kernel/entry-ftrace.S fix.

* commit 'arm64/for-next/fixes^^':
  arm64: ftrace: add missing BTIs
  arm64: kexec: use __pa_symbol(empty_zero_page)
  arm64: update PAC description for kernel

3 years agoarm64: perf: Support new DT compatibles
Robin Murphy [Tue, 14 Dec 2021 14:16:15 +0000 (14:16 +0000)]
arm64: perf: Support new DT compatibles

Wire up the new DT compatibles so we can present appropriate
PMU names to userspace for the latest and greatest CPUs.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/62d14ba12d847ec7f1fba7cb0b3b881b437e1cc5.1639490264.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoarm64: perf: Simplify registration boilerplate
Robin Murphy [Tue, 14 Dec 2021 14:16:14 +0000 (14:16 +0000)]
arm64: perf: Simplify registration boilerplate

With the trend for per-core events moving to userspace JSON, registering
names for PMUv3 implementations is increasingly a pure boilerplate
exercise. Let's wrap things a step further so we can generate the basic
PMUv3 init function with a macro invocation, and reduce further new
addition to just 2 lines each.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/b79477ea3b97f685d00511d4ecd2f686184dca34.1639490264.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoarm64: perf: Support Denver and Carmel PMUs
Thierry Reding [Tue, 14 Dec 2021 14:16:13 +0000 (14:16 +0000)]
arm64: perf: Support Denver and Carmel PMUs

Add support for the NVIDIA Denver and Carmel PMUs using the generic
PMUv3 event map for now.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
[ rm: reorder entries alphabetically ]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/5f0f69d47acca78a9e479501aa4d8b429e23cf11.1639490264.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoMerge branch 'for-next/perf-user-counter-access' into for-next/perf
Will Deacon [Tue, 14 Dec 2021 13:42:22 +0000 (13:42 +0000)]
Merge branch 'for-next/perf-user-counter-access' into for-next/perf

* for-next/perf-user-counter-access:
  Documentation: arm64: Document PMU counters access from userspace
  arm64: perf: Enable PMU counter userspace access for perf event
  arm64: perf: Add userspace counter access disable switch
  perf: Add a counter for number of user access events in context
  x86: perf: Move RDPMC event flag to a common definition

3 years agoMerge branch 'for-next/perf-smmu' into for-next/perf
Will Deacon [Tue, 14 Dec 2021 13:42:17 +0000 (13:42 +0000)]
Merge branch 'for-next/perf-smmu' into for-next/perf

* for-next/perf-smmu:
  perf/smmuv3: Synthesize IIDR from CoreSight ID registers
  perf/smmuv3: Add devicetree support
  dt-bindings: Add Arm SMMUv3 PMCG binding

3 years agoMerge branch 'for-next/perf-hisi' into for-next/perf
Will Deacon [Tue, 14 Dec 2021 13:42:05 +0000 (13:42 +0000)]
Merge branch 'for-next/perf-hisi' into for-next/perf

* for-next/perf-hisi:
  drivers/perf: hisi: Add driver for HiSilicon PCIe PMU
  docs: perf: Add description for HiSilicon PCIe PMU driver

3 years agoMerge branch 'for-next/perf-cn10k' into for-next/perf
Will Deacon [Tue, 14 Dec 2021 13:41:58 +0000 (13:41 +0000)]
Merge branch 'for-next/perf-cn10k' into for-next/perf

* for-next/perf-cn10k:
  dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings
  drivers: perf: Add LLC-TAD perf counter support

3 years agoMerge branch 'for-next/perf-cmn' into for-next/perf
Will Deacon [Tue, 14 Dec 2021 13:41:42 +0000 (13:41 +0000)]
Merge branch 'for-next/perf-cmn' into for-next/perf

* for-next/perf-cmn:
  perf/arm-cmn: Add debugfs topology info
  perf/arm-cmn: Add CI-700 Support
  dt-bindings: perf: arm-cmn: Add CI-700
  perf/arm-cmn: Support new IP features
  perf/arm-cmn: Demarcate CMN-600 specifics
  perf/arm-cmn: Move group validation data off-stack
  perf/arm-cmn: Optimise DTC counter accesses
  perf/arm-cmn: Optimise DTM counter reads
  perf/arm-cmn: Refactor DTM handling
  perf/arm-cmn: Streamline node iteration
  perf/arm-cmn: Refactor node ID handling
  perf/arm-cmn: Drop compile-test restriction
  perf/arm-cmn: Account for NUMA affinity
  perf/arm-cmn: Fix CPU hotplug unregistration

3 years agoarm64: atomics: lse: define RETURN ops in terms of FETCH ops
Mark Rutland [Fri, 10 Dec 2021 15:14:10 +0000 (15:14 +0000)]
arm64: atomics: lse: define RETURN ops in terms of FETCH ops

The FEAT_LSE atomic instructions include LD* instructions which return
the original value of a memory location can be used to directly
implement FETCH opertations. Each RETURN op is implemented as a copy of
the corresponding FETCH op with a trailing instruction to generate the
new value of the memory location. We only directly implement
*_fetch_add*(), for which we have a trailing `add` instruction.

As the compiler has no visibility of the `add`, this leads to less than
optimal code generation when consuming the result.

For example, the compiler cannot constant-fold the addition into later
operations, and currently GCC 11.1.0 will compile:

       return __lse_atomic_sub_return(1, v) == 0;

As:

mov     w1, #0xffffffff
ldaddal w1, w2, [x0]
add     w1, w1, w2
cmp     w1, #0x0
cset    w0, eq  // eq = none
ret

This patch improves this by replacing the `add` with C addition after
the inline assembly block, e.g.

ret += i;

This allows the compiler to manipulate `i`. This permits the compiler to
merge the `add` and `cmp` for the above, e.g.

mov     w1, #0xffffffff
ldaddal w1, w1, [x0]
cmp     w1, #0x1
cset    w0, eq  // eq = none
ret

With this change the assembly for each RETURN op is identical to the
corresponding FETCH op (including barriers and clobbers) so I've removed
the inline assembly and rewritten each RETURN op in terms of the
corresponding FETCH op, e.g.

| static inline void __lse_atomic_add_return(int i, atomic_t *v)
| {
|       return __lse_atomic_fetch_add(i, v) + i
| }

The new construction does not adversely affect the common case, and
before and after this patch GCC 11.1.0 can compile:

__lse_atomic_add_return(i, v)

As:

ldaddal w0, w2, [x1]
add     w0, w0, w2

... while having the freedom to do better elsewhere.

This is intended as an optimization and cleanup.
There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: atomics: lse: improve constraints for simple ops
Mark Rutland [Fri, 10 Dec 2021 15:14:09 +0000 (15:14 +0000)]
arm64: atomics: lse: improve constraints for simple ops

We have overly conservative assembly constraints for the basic FEAT_LSE
atomic instructions, and using more accurate and permissive constraints
will allow for better code generation.

The FEAT_LSE basic atomic instructions have come in two forms:

LD{op}{order}{size} <Rs>, <Rt>, [<Rn>]
ST{op}{order}{size} <Rs>, [<Rn>]

The ST* forms are aliases of the LD* forms where:

ST{op}{order}{size} <Rs>, [<Rn>]
Is:
LD{op}{order}{size} <Rs>, XZR, [<Rn>]

For either form, both <Rs> and <Rn> are read but not written back to,
and <Rt> is written with the original value of the memory location.
Where (<Rt> == <Rs>) or (<Rt> == <Rn>), <Rt> is written *after* the
other register value(s) are consumed. There are no UNPREDICTABLE or
CONSTRAINED UNPREDICTABLE behaviours when any pair of <Rs>, <Rt>, or
<Rn> are the same register.

Our current inline assembly always uses <Rs> == <Rt>, treating this
register as both an input and an output (using a '+r' constraint). This
forces the compiler to do some unnecessary register shuffling and/or
redundant value generation.

For example, the compiler cannot reuse the <Rs> value, and currently GCC
11.1.0 will compile:

__lse_atomic_add(1, a);
__lse_atomic_add(1, b);
__lse_atomic_add(1, c);

As:

mov     w3, #0x1
mov     w4, w3
stadd   w4, [x0]
mov     w0, w3
stadd   w0, [x1]
stadd   w3, [x2]

We can improve this with more accurate constraints, separating <Rs> and
<Rt>, where <Rs> is an input-only register ('r'), and <Rt> is an
output-only value ('=r'). As <Rt> is written back after <Rs> is
consumed, it does not need to be earlyclobber ('=&r'), leaving the
compiler free to use the same register for both <Rs> and <Rt> where this
is desirable.

At the same time, the redundant 'r' constraint for `v` is removed, as
the `+Q` constraint is sufficient.

With this change, the above example becomes:

mov     w3, #0x1
stadd   w3, [x0]
stadd   w3, [x1]
stadd   w3, [x2]

I've made this change for the non-value-returning and FETCH ops. The
RETURN ops have a multi-instruction sequence for which we cannot use the
same constraints, and a subsequent patch will rewrite hte RETURN ops in
terms of the FETCH ops, relying on the ability for the compiler to reuse
the <Rs> value.

This is intended as an optimization.
There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: atomics: lse: define ANDs in terms of ANDNOTs
Mark Rutland [Fri, 10 Dec 2021 15:14:08 +0000 (15:14 +0000)]
arm64: atomics: lse: define ANDs in terms of ANDNOTs

The FEAT_LSE atomic instructions include atomic bit-clear instructions
(`ldclr*` and `stclr*`) which can be used to directly implement ANDNOT
operations. Each AND op is implemented as a copy of the corresponding
ANDNOT op with a leading `mvn` instruction to apply a bitwise NOT to the
`i` argument.

As the compiler has no visibility of the `mvn`, this leads to less than
optimal code generation when generating `i` into a register. For
example, __lse_atomic_fetch_and(0xf, v) can be compiled to:

mov     w1, #0xf
mvn     w1, w1
ldclral w1, w1, [x2]

This patch improves this by replacing the `mvn` with NOT in C before the
inline assembly block, e.g.

i = ~i;

This allows the compiler to generate `i` into a register more optimally,
e.g.

mov     w1, #0xfffffff0
ldclral w1, w1, [x2]

With this change the assembly for each AND op is identical to the
corresponding ANDNOT op (including barriers and clobbers), so I've
removed the inline assembly and rewritten each AND op in terms of the
corresponding ANDNOT op, e.g.

| static inline void __lse_atomic_and(int i, atomic_t *v)
| {
|  return __lse_atomic_andnot(~i, v);
| }

This is intended as an optimization and cleanup.
There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: atomics lse: define SUBs in terms of ADDs
Mark Rutland [Fri, 10 Dec 2021 15:14:07 +0000 (15:14 +0000)]
arm64: atomics lse: define SUBs in terms of ADDs

The FEAT_LSE atomic instructions include atomic ADD instructions
(`stadd*` and `ldadd*`), but do not include atomic SUB instructions, so
we must build all of the SUB operations using the ADD instructions. We
open-code these today, with each SUB op implemented as a copy of the
corresponding ADD op with a leading `neg` instruction in the inline
assembly to negate the `i` argument.

As the compiler has no visibility of the `neg`, this leads to less than
optimal code generation when generating `i` into a register. For
example, __les_atomic_fetch_sub(1, v) can be compiled to:

mov     w1, #0x1
neg     w1, w1
ldaddal w1, w1, [x2]

This patch improves this by replacing the `neg` with negation in C
before the inline assembly block, e.g.

i = -i;

This allows the compiler to generate `i` into a register more optimally,
e.g.

mov     w1, #0xffffffff
ldaddal w1, w1, [x2]

With this change the assembly for each SUB op is identical to the
corresponding ADD op (including barriers and clobbers), so I've removed
the inline assembly and rewritten each SUB op in terms of the
corresponding ADD op, e.g.

| static inline void __lse_atomic_sub(int i, atomic_t *v)
| {
|  __lse_atomic_add(-i, v);
| }

For clarity I've moved the definition of each SUB op immediately after
the corresponding ADD op, and used a single macro to create the RETURN
forms of both ops.

This is intended as an optimization and cleanup.
There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: atomics: format whitespace consistently
Mark Rutland [Fri, 10 Dec 2021 15:14:06 +0000 (15:14 +0000)]
arm64: atomics: format whitespace consistently

The code for the atomic ops is formatted inconsistently, and while this
is not a functional problem it is rather distracting when working on
them.

Some have ops have consistent indentation, e.g.

| #define ATOMIC_OP_ADD_RETURN(name, mb, cl...)                           \
| static inline int __lse_atomic_add_return##name(int i, atomic_t *v)     \
| {                                                                       \
|         u32 tmp;                                                        \
|                                                                         \
|         asm volatile(                                                   \
|         __LSE_PREAMBLE                                                  \
|         "       ldadd" #mb "    %w[i], %w[tmp], %[v]\n"                 \
|         "       add     %w[i], %w[i], %w[tmp]"                          \
|         : [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)        \
|         : "r" (v)                                                       \
|         : cl);                                                          \
|                                                                         \
|         return i;                                                       \
| }

While others have negative indentation for some lines, and/or have
misaligned trailing backslashes, e.g.

| static inline void __lse_atomic_##op(int i, atomic_t *v)                        \
| {                                                                       \
|         asm volatile(                                                   \
|         __LSE_PREAMBLE                                                  \
| "       " #asm_op "     %w[i], %[v]\n"                                  \
|         : [i] "+r" (i), [v] "+Q" (v->counter)                           \
|         : "r" (v));                                                     \
| }

This patch makes the indentation consistent and also aligns the trailing
backslashes. This makes the code easier to read for those (like myself)
who are easily distracted by these inconsistencies.

This is intended as a cleanup.
There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agodrivers/perf: hisi: Add driver for HiSilicon PCIe PMU
Qi Liu [Thu, 2 Dec 2021 08:06:33 +0000 (16:06 +0800)]
drivers/perf: hisi: Add driver for HiSilicon PCIe PMU

PCIe PMU Root Complex Integrated End Point(RCiEP) device is supported
to sample bandwidth, latency, buffer occupation etc.

Each PMU RCiEP device monitors multiple Root Ports, and each RCiEP is
registered as a PMU in /sys/bus/event_source/devices, so users can
select target PMU, and use filter to do further sets.

Filtering options contains:
event     - select the event.
port      - select target Root Ports. Information of Root Ports are
            shown under sysfs.
bdf       - select requester_id of target EP device.
trig_len  - set trigger condition for starting event statistics.
trig_mode - set trigger mode. 0 means starting to statistic when bigger
            than trigger condition, and 1 means smaller.
thr_len   - set threshold for statistics.
thr_mode  - set threshold mode. 0 means count when bigger than threshold,
            and 1 means smaller.

Acked-by: Krzysztof Wilczyński <kw@linux.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Reviewed-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Link: https://lore.kernel.org/r/20211202080633.2919-3-liuqi115@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agodocs: perf: Add description for HiSilicon PCIe PMU driver
Qi Liu [Thu, 2 Dec 2021 08:06:32 +0000 (16:06 +0800)]
docs: perf: Add description for HiSilicon PCIe PMU driver

PCIe PMU Root Complex Integrated End Point(RCiEP) device is supported on
HiSilicon HIP09 platform. Document it to provide guidance on how to
use it.

Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Reviewed-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Link: https://lore.kernel.org/r/20211202080633.2919-2-liuqi115@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agodt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings
Bhaskara Budiredla [Mon, 15 Nov 2021 04:35:06 +0000 (10:05 +0530)]
dt-bindings: perf: Add YAML schemas for Marvell CN10K LLC-TAD pmu bindings

Add device tree bindings for Last-level-cache Tag-and-data
(LLC-TAD) unit PMU for Marvell CN10K SoCs.

Signed-off-by: Bhaskara Budiredla <bbudiredla@marvell.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211115043506.6679-3-bbudiredla@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agodrivers: perf: Add LLC-TAD perf counter support
Bhaskara Budiredla [Mon, 15 Nov 2021 04:35:05 +0000 (10:05 +0530)]
drivers: perf: Add LLC-TAD perf counter support

This driver adds support for Last-level cache tag-and-data unit
(LLC-TAD) PMU that is featured in some of the Marvell's CN10K
infrastructure silicons.

The LLC is divided into 2N slices distributed across N Mesh tiles
in a single-socket configuration. The driver always configures the
same counter for all of the TADs. The user would end up effectively
reserving one of eight counters in every TAD to look across all TADs.
The occurrences of events are aggregated and presented to the user
at the end of an application run. The driver does not provide a way
for the user to partition TADs so that different TADs are used for
different applications.

The event counters are zeroed to start event counting to avoid any
rollover issues. TAD perf counters are 64-bit, so it's not currently
possible to overflow event counters at current mesh and core
frequencies.

To measure tad pmu events use perf tool stat command. For instance:

perf stat -e tad_dat_msh_in_dss,tad_req_msh_out_any <application>
perf stat -e tad_alloc_any,tad_hit_any,tad_tag_rd <application>

Signed-off-by: Bhaskara Budiredla <bbudiredla@marvell.com>
Link: https://lore.kernel.org/r/20211115043506.6679-2-bbudiredla@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoarm64/xor: use EOR3 instructions when available
Ard Biesheuvel [Mon, 13 Dec 2021 14:02:52 +0000 (15:02 +0100)]
arm64/xor: use EOR3 instructions when available

Use the EOR3 instruction to implement xor_blocks() if the instruction is
available, which is the case if the CPU implements the SHA-3 extension.
This is about 20% faster on Apple M1 when using the 5-way version.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20211213140252.2856053-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoperf/smmuv3: Synthesize IIDR from CoreSight ID registers
Robin Murphy [Wed, 17 Nov 2021 14:48:45 +0000 (14:48 +0000)]
perf/smmuv3: Synthesize IIDR from CoreSight ID registers

The SMMU_PMCG_IIDR register was not present in older revisions of the
Arm SMMUv3 spec. On Arm Ltd. implementations, the IIDR value consists of
fields from several PIDR registers, allowing us to present a
standardized identifier to userspace.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/20211117144844.241072-4-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/smmuv3: Add devicetree support
Jean-Philippe Brucker [Wed, 17 Nov 2021 14:48:44 +0000 (14:48 +0000)]
perf/smmuv3: Add devicetree support

Add device-tree support to the SMMUv3 PMCG driver.

Signed-off-by: Jay Chen <jkchen@linux.alibaba.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20211117144844.241072-3-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
3 years agodt-bindings: Add Arm SMMUv3 PMCG binding
Jean-Philippe Brucker [Wed, 17 Nov 2021 14:48:43 +0000 (14:48 +0000)]
dt-bindings: Add Arm SMMUv3 PMCG binding

Add binding for the Arm SMMUv3 PMU. Each node represents a PMCG, and is
placed as a sibling node of the SMMU. Although the PMCGs registers may
be within the SMMU MMIO region, they are separate devices, and there can
be multiple PMCG devices for each SMMU (for example one for the TCU and
one for each TBU).

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20211117144844.241072-2-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/arm-cmn: Add debugfs topology info
Robin Murphy [Fri, 3 Dec 2021 11:45:03 +0000 (11:45 +0000)]
perf/arm-cmn: Add debugfs topology info

In general, detailed performance analysis will require knoweldge of the
the SoC beyond the CMN itself - e.g. which actual CPUs/peripherals/etc.
are connected to each node. However for certain development and bringup
tasks it can be useful to have a quick overview of the CMN internal
topology to hand too. Add a debugfs file to map this out.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/159fd4d7e19fb3c8801a8cb64ee73ec50f55903c.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/arm-cmn: Add CI-700 Support
Robin Murphy [Fri, 3 Dec 2021 11:45:02 +0000 (11:45 +0000)]
perf/arm-cmn: Add CI-700 Support

Add the identifiers and events for the CI-700 coherent interconnect.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/28f566ab23a83733c6c9ef9414c010b760b4549c.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agodt-bindings: perf: arm-cmn: Add CI-700
Robin Murphy [Fri, 3 Dec 2021 11:45:01 +0000 (11:45 +0000)]
dt-bindings: perf: arm-cmn: Add CI-700

CI-700 is a new client-level coherent interconnect derived from
the enterprise-level CMN family, and shares the same PMU design.

CC: devicetree@vger.kernel.org
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/5f0b372f808f1468e6d9500cedafbecd10254674.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/arm-cmn: Support new IP features
Robin Murphy [Fri, 3 Dec 2021 11:45:00 +0000 (11:45 +0000)]
perf/arm-cmn: Support new IP features

The second generation of CMN IPs add new node types and significantly
expand the configuration space with options for extra device ports on
edge XPs, either plumbed into the regular DTM or with extra dedicated
DTMs to monitor them, plus larger (and smaller) mesh sizes. Add basic
support for pulling this new information out of the hardware, piping
it around as necessary, and handling (most of) the new choices.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/e58b495bcc7deec3882be4bac910ed0bf6979674.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/arm-cmn: Demarcate CMN-600 specifics
Robin Murphy [Fri, 3 Dec 2021 11:44:59 +0000 (11:44 +0000)]
perf/arm-cmn: Demarcate CMN-600 specifics

In preparation for supporting newer CMN products, let's introduce a
means to differentiate the features and events which are specific to a
particular IP from those which remain common to the whole family. The
newer designs have also smoothed off some of the rough edges in terms
of discoverability, so separate out the parts of the flow which have
effectively now become CMN-600 quirks.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/9f6368cdca4c821d801138939508a5bba54ccabb.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/arm-cmn: Move group validation data off-stack
Robin Murphy [Fri, 3 Dec 2021 11:44:58 +0000 (11:44 +0000)]
perf/arm-cmn: Move group validation data off-stack

With the value of CMN_MAX_DTMS increasing significantly, our validation
data structure is set to get quite big. Technically we could pack it at
least twice as densely, since we only need around 19 bits of information
per DTM, but that makes the code even more mind-bogglingly impenetrable,
and even half of "quite big" may still be uncomfortably large for a
stack frame (~1KB). Just move it to an off-stack allocation instead.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/0cabff2e5839ddc0979e757c55515966f65359e4.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/arm-cmn: Optimise DTC counter accesses
Robin Murphy [Fri, 3 Dec 2021 11:44:57 +0000 (11:44 +0000)]
perf/arm-cmn: Optimise DTC counter accesses

In cases where we do know which DTC domain a node belongs to, we can
skip initialising or reading the global count in DTCs where we know
it won't change. The machinery to achieve that is mostly in place
already, so finish hooking it up by converting the vestigial domain
tracking to propagate suitable bitmaps all the way through to events.

Note that this does not allow allocating such an unused counter to a
different event on that DTC, because that is a flippin' nightmare.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/51d930fd945ef51c81f5889ccca055c302b0a1d0.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/arm-cmn: Optimise DTM counter reads
Robin Murphy [Fri, 3 Dec 2021 11:44:56 +0000 (11:44 +0000)]
perf/arm-cmn: Optimise DTM counter reads

When multiple nodes of the same type are connected to the same XP
(particularly in CAL configurations), it seems that they are likely
to be consecutive in logical ID. Therefore, we're likely to gain a
small benefit from an easy tweak to optimise out consecutive reads
of the same set of DTM counters for an aggregated event.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/7777d77c2df17693cd3dabb6e268906e15238d82.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/arm-cmn: Refactor DTM handling
Robin Murphy [Fri, 3 Dec 2021 11:44:55 +0000 (11:44 +0000)]
perf/arm-cmn: Refactor DTM handling

Untangle DTMs from XPs into a dedicated abstraction. This helps make
things a little more obvious and robust, but primarily paves the way
for further development where new IPs can grow extra DTMs per XP.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/9cca18b1b98f482df7f1aaf3d3213e7f39500423.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/arm-cmn: Streamline node iteration
Robin Murphy [Fri, 3 Dec 2021 11:44:54 +0000 (11:44 +0000)]
perf/arm-cmn: Streamline node iteration

Refactor the places where we scan through the set of nodes to switch
from explicit array indexing to pointer-based iteration. This leads to
slightly simpler object code, but also makes the source less dense and
more pleasant for further development. It also unearths an almost-bug
in arm_cmn_event_init() where we've been depending on the "array index"
of NULL relative to cmn->dns being a sufficiently large number, yuck.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/ee0c9eda9a643f46001ac43aadf3f0b1fd5660dd.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/arm-cmn: Refactor node ID handling
Robin Murphy [Fri, 3 Dec 2021 11:44:53 +0000 (11:44 +0000)]
perf/arm-cmn: Refactor node ID handling

Add a bit more abstraction for the places where we decompose node IDs.
This will help keep things nice and manageable when we come to add yet
more variables which affect the node ID format. Also use the opportunity
to move the rest of the low-level node management helpers back up to the
logical place they were meant to be - how they ended up buried right in
the middle of the event-related definitions is somewhat of a mystery...

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/a2242a8c3c96056c13a04ae87bf2047e5e64d2d9.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/arm-cmn: Drop compile-test restriction
Robin Murphy [Fri, 3 Dec 2021 11:44:52 +0000 (11:44 +0000)]
perf/arm-cmn: Drop compile-test restriction

Although CMN is currently (and overwhelmingly likely to remain) deployed
in arm64-only (modulo userspace) systems, the 64-bit "dependency" for
compile-testing was just laziness due to heavy reliance on readq/writeq
accessors. Since we only need one extra include for robustness in that
regard, let's pull that in, widen the compile-test coverage, and fix up
the smattering of type laziness that that brings to light.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/baee9ee0d0bdad8aaeb70f5a4b98d8fd4b1f5786.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/arm-cmn: Account for NUMA affinity
Robin Murphy [Fri, 3 Dec 2021 11:44:51 +0000 (11:44 +0000)]
perf/arm-cmn: Account for NUMA affinity

On a system with multiple CMN meshes, ideally we'd want to access each
PMU from within its own mesh, rather than with a long CML round-trip,
wherever feasible. Since such a system is likely to be presented as
multiple NUMA nodes, let's also hope a proximity domain is specified
for each CMN programming interface, and use that to guide our choice
of IRQ affinity to favour a node-local CPU where possible.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/32438b0d016e0649d882d47d30ac2000484287b9.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf/arm-cmn: Fix CPU hotplug unregistration
Robin Murphy [Fri, 3 Dec 2021 11:44:50 +0000 (11:44 +0000)]
perf/arm-cmn: Fix CPU hotplug unregistration

Attempting to migrate the PMU context after we've unregistered the PMU
device, or especially if we never successfully registered it in the
first place, is a woefully bad idea. It's also fundamentally pointless
anyway. Make sure to unregister an instance from the hotplug handler
*without* invoking the teardown callback.

Fixes: 0ba64770a2f2 ("perf: Add Arm CMN-600 PMU driver")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/2c221d745544774e4b07583b65b5d4d94f7e0fe4.1638530442.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoDocumentation: arm64: Document PMU counters access from userspace
Raphael Gault [Wed, 8 Dec 2021 20:11:24 +0000 (14:11 -0600)]
Documentation: arm64: Document PMU counters access from userspace

Add documentation to describe the access to the pmu hardware counters from
userspace.

Signed-off-by: Raphael Gault <raphael.gault@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211208201124.310740-6-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoarm64: perf: Enable PMU counter userspace access for perf event
Rob Herring [Wed, 8 Dec 2021 20:11:23 +0000 (14:11 -0600)]
arm64: perf: Enable PMU counter userspace access for perf event

Arm PMUs can support direct userspace access of counters which allows for
low overhead (i.e. no syscall) self-monitoring of tasks. The same feature
exists on x86 called 'rdpmc'. Unlike x86, userspace access will only be
enabled for thread bound events. This could be extended if needed, but
simplifies the implementation and reduces the chances for any
information leaks (which the x86 implementation suffers from).

PMU EL0 access will be enabled when an event with userspace access is
part of the thread's context. This includes when the event is not
scheduled on the PMU. There's some additional overhead clearing
dirty counters when access is enabled in order to prevent leaking
disabled counter data from other tasks.

Unlike x86, enabling of userspace access must be requested with a new
attr bit: config1:1. If the user requests userspace access with 64-bit
counters, then the event open will fail if the h/w doesn't support
64-bit counters. Chaining is not supported with userspace access. The
modes for config1 are as follows:

config1 = 0 : user access disabled and always 32-bit
config1 = 1 : user access disabled and always 64-bit (using chaining if needed)
config1 = 2 : user access enabled and always 32-bit
config1 = 3 : user access enabled and always 64-bit

Based on work by Raphael Gault <raphael.gault@arm.com>, but has been
completely re-written.

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211208201124.310740-5-robh@kernel.org
[will: Made armv8pmu_proc_user_access_handler() static]
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoarm64: perf: Add userspace counter access disable switch
Rob Herring [Wed, 8 Dec 2021 20:11:22 +0000 (14:11 -0600)]
arm64: perf: Add userspace counter access disable switch

Like x86, some users may want to disable userspace PMU counter
altogether. Add a sysctl 'perf_user_access' file to control userspace
counter access. The default is '0' which is disabled. Writing '1'
enables access.

Note that x86 supports globally enabling user access by writing '2' to
/sys/bus/event_source/devices/cpu/rdpmc. As there's not existing
userspace support to worry about, this shouldn't be necessary for Arm.
It could be added later if the need arises.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-perf-users@vger.kernel.org
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211208201124.310740-4-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoperf: Add a counter for number of user access events in context
Rob Herring [Wed, 8 Dec 2021 20:11:21 +0000 (14:11 -0600)]
perf: Add a counter for number of user access events in context

On arm64, user space counter access will be controlled differently
compared to x86. On x86, access in the strictest mode is enabled for all
tasks in an MM when any event is mmap'ed. For arm64, access is
explicitly requested for an event and only enabled when the event's
context is active. This avoids hooks into the arch context switch code
and gives better control of when access is enabled.

In order to configure user space access when the PMU is enabled, it is
necessary to know if any event (currently active or not) in the current
context has user space accessed enabled. Add a counter similar to other
counters in the context to avoid walking the event list every time.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211208201124.310740-3-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
3 years agox86: perf: Move RDPMC event flag to a common definition
Rob Herring [Wed, 8 Dec 2021 20:11:20 +0000 (14:11 -0600)]
x86: perf: Move RDPMC event flag to a common definition

In preparation to enable user counter access on arm64 and to move some
of the user access handling to perf core, create a common event flag for
user counter access and convert x86 to use it.

Since the architecture specific flags start at the LSB, starting at the
MSB for common flags.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-perf-users@vger.kernel.org
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211208201124.310740-2-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoarm64: cpufeature: add HWCAP for FEAT_RPRES
Joey Gouly [Fri, 10 Dec 2021 16:54:32 +0000 (16:54 +0000)]
arm64: cpufeature: add HWCAP for FEAT_RPRES

Add a new HWCAP to detect the Increased precision of Reciprocal Estimate
and Reciprocal Square Root Estimate feature (FEAT_RPRES), introduced in Armv8.7.

Also expose this to userspace in the ID_AA64ISAR2_EL1 feature register.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211210165432.8106-4-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: add ID_AA64ISAR2_EL1 sys register
Joey Gouly [Fri, 10 Dec 2021 16:54:31 +0000 (16:54 +0000)]
arm64: add ID_AA64ISAR2_EL1 sys register

This is a new ID register, introduced in 8.7.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Reiji Watanabe <reijiw@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211210165432.8106-3-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: cpufeature: add HWCAP for FEAT_AFP
Joey Gouly [Fri, 10 Dec 2021 16:54:30 +0000 (16:54 +0000)]
arm64: cpufeature: add HWCAP for FEAT_AFP

Add a new HWCAP to detect the Alternate Floating-point Behaviour
feature (FEAT_AFP), introduced in Armv8.7.

Also expose this to userspace in the ID_AA64MMFR1_EL1 feature register.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211210165432.8106-2-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: mm: log potential KASAN shadow alias
Mark Rutland [Tue, 7 Dec 2021 18:32:26 +0000 (18:32 +0000)]
arm64: mm: log potential KASAN shadow alias

When the kernel is built with KASAN_GENERIC or KASAN_SW_TAGS, shadow
memory is allocated and mapped for all legitimate kernel addresses, and
prior to a regular memory access instrumentation will read from the
corresponding shadow address.

Due to the way memory addresses are converted to shadow addresses, bogus
pointers (e.g. NULL) can generate shadow addresses out of the bounds of
allocated shadow memory. For example, with KASAN_GENERIC and 48-bit VAs,
NULL would have a shadow address of dfff800000000000, which falls
between the TTBR ranges.

To make such cases easier to debug, this patch makes die_kernel_fault()
dump the real memory address range for any potential KASAN shadow access
using kasan_non_canonical_hook(), which results in fault information as
below when KASAN is enabled:

| Unable to handle kernel paging request at virtual address dfff800000000017
| KASAN: null-ptr-deref in range [0x00000000000000b8-0x00000000000000bf]
| Mem abort info:
|   ESR = 0x96000004
|   EC = 0x25: DABT (current EL), IL = 32 bits
|   SET = 0, FnV = 0
|   EA = 0, S1PTW = 0
|   FSC = 0x04: level 0 translation fault
| Data abort info:
|   ISV = 0, ISS = 0x00000004
|   CM = 0, WnR = 0
| [dfff800000000017] address between user and kernel address ranges

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211207183226.834557-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: mm: use die_kernel_fault() in do_mem_abort()
Mark Rutland [Tue, 7 Dec 2021 18:32:25 +0000 (18:32 +0000)]
arm64: mm: use die_kernel_fault() in do_mem_abort()

If we take an unhandled fault from EL1, either:

a) The xFSC handler calls die_kernel_fault() directly. In this case,
   die_kernel_fault() calls:

   pr_alert(..., msg, addr);
   mem_abort_decode(esr);
   show_pte(addr);
   die();
   bust_spinlocks(0);
   do_exit(SIGKILL);

b) The xFSC handler returns to do_mem_abort(), indicating failure. In
   this case, do_mem_abort() calls:

   pr_alert(..., addr);
   mem_abort_decode(esr);
   show_pte(addr);
   arm64_notify_die() {
     die();
   }

This inconstency is unfortunatem, and in theory in case (b) registered
notifiers can prevent us from terminating the faulting thread by
returning NOTIFY_STOP, whereupon we'll end up returning from the fault,
replaying, and almost certainly get stuck in a livelock spewing errors
into dmesg. We don't expect notifers to fix things up, since we dump
state to dmesg before invoking them, so it would be more sensible to
consistently terminate the thread in this case.

This patch has do_mem_abort() call die_kernel_fault() for unhandled
faults taken from EL1. Where we would previously have logged a messafe
of the form:

| Unhandled fault at ${ADDR}

... we will now log a message of the form:

| Unable to handle kernel ${FAULT_NAME} at virtual address ${ADDR}

... and we will consistently terminate the thread from which the fault
was taken.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Tested-by: Andrey Konovalov <andreyknvl@gmail.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211207183226.834557-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: mm: Use asid feature macro for cheanup
Yunfeng Ye [Thu, 9 Dec 2021 01:46:03 +0000 (09:46 +0800)]
arm64: mm: Use asid feature macro for cheanup

The commit 95b54c3e4c92 ("KVM: arm64: Add feature register flag
definitions") introduce the ID_AA64MMFR0_ASID_8 and ID_AA64MMFR0_ASID_16
macros.

We can use these macros for cheanup in get_cpu_asid_bits().

No functional change.

Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/f71c75d3-735e-b32a-8414-b3e513c77240@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: mm: Rename asid2idx() to ctxid2asid()
Yunfeng Ye [Thu, 9 Dec 2021 01:42:25 +0000 (09:42 +0800)]
arm64: mm: Rename asid2idx() to ctxid2asid()

The commit 0c8ea531b774 ("arm64: mm: Allocate ASIDs in pairs") introduce
the asid2idx and idx2asid macro, but these macros are not really useful
after the commit f88f42f853a8 ("arm64: context: Free up kernel ASIDs if
KPTI is not in use").

The code "(asid & ~ASID_MASK)" can be instead by a macro, which is the
same code with asid2idx(). So rename it to ctxid2asid() for a better
understanding.

Also we add asid2ctxid() macro, the contextid can be generated based on
the asid and generation through this macro.

Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/c31516eb-6d15-94e0-421c-305fc010ea79@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Make some stacktrace functions private
Mark Rutland [Mon, 29 Nov 2021 14:28:49 +0000 (14:28 +0000)]
arm64: Make some stacktrace functions private

Now that open-coded stack unwinds have been converted to
arch_stack_walk(), we no longer need to expose any of unwind_frame(),
walk_stackframe(), or start_backtrace() outside of stacktrace.c.

Make those functions private to stacktrace.c, removing their prototypes
from <asm/stacktrace.h> and marking them static.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Make dump_backtrace() use arch_stack_walk()
Madhavan T. Venkataraman [Mon, 29 Nov 2021 14:28:48 +0000 (14:28 +0000)]
arm64: Make dump_backtrace() use arch_stack_walk()

To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
substantially rework arm64's unwinding code. As part of this, we want to
minimize the set of unwind interfaces we expose, and avoid open-coding
of unwind logic.

Currently, dump_backtrace() walks the stack of the current task or a
blocked task by calling stact_backtrace() and iterating unwind steps
using unwind_frame(). This can be written more simply in terms of
arch_stack_walk(), considering three distinct cases:

1) When unwinding a blocked task, start_backtrace() is called with the
   blocked task's saved PC and FP, and the unwind proceeds immediately
   from this point without skipping any entries. This is functionally
   equivalent to calling arch_stack_walk() with the blocked task, which
   will start with the task's saved PC and FP.

   There is no functional change to this case.

2) When unwinding the current task without regs, start_backtrace() is
   called with dump_backtrace() as the PC and __builtin_frame_address(0)
   as the next frame, and the unwind proceeds immediately without
   skipping. This is *almost* functionally equivalent to calling
   arch_stack_walk() for the current task, which will start with its
   caller (i.e. an offset into dump_backtrace()) as the PC, and the
   callers frame record as the next frame.

   The only difference being that dump_backtrace() will be reported with
   an offset (which is strictly more correct than currently). Otherwise
   there is no functional cahnge to this case.

3) When unwinding the current task with regs, start_backtrace() is
   called with dump_backtrace() as the PC and __builtin_frame_address(0)
   as the next frame, and the unwind is performed silently until the
   next frame is the frame pointed to by regs->fp. Reporting starts
   from regs->pc and continues from the frame in regs->fp.

   Historically, this pre-unwind was necessary to correctly record
   return addresses rewritten by the ftrace graph calller, but this is
   no longer necessary as these are now recovered using the FP since
   commit:

   c6d3cd32fd0064af ("arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR")

   This pre-unwind is not necessary to recover return addresses
   rewritten by kretprobes, which historically were not recovered, and
   are now recovered using the FP since commit:

   cd9bc2c9258816dc ("arm64: Recover kretprobe modified return address in stacktrace")

   Thus, this is functionally equivalent to calling arch_stack_walk()
   with the current task and regs, which will start with regs->pc as the
   PC and regs->fp as the next frame, without a pre-unwind.

This patch makes dump_backtrace() use arch_stack_walk(). This simplifies
dump_backtrace() and will permit subsequent changes to the unwind code.

Aside from the improved reporting when unwinding current without regs,
there should be no functional change as a result of this patch.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
[Mark: elaborate commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211129142849.3056714-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Make profile_pc() use arch_stack_walk()
Madhavan T. Venkataraman [Mon, 29 Nov 2021 14:28:47 +0000 (14:28 +0000)]
arm64: Make profile_pc() use arch_stack_walk()

To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
substantially rework arm64's unwinding code. As part of this, we want to
minimize the set of unwind interfaces we expose, and avoid open-coding
of unwind logic outside of stacktrace.c.

Currently profile_pc() walks the stack of an interrupted context by
calling start_backtrace() with the context's PC and FP, and iterating
unwind steps using walk_stackframe(). This is functionally equivalent to
calling arch_stack_walk() with the interrupted context's pt_regs, which
will start with the PC and FP from the regs.

Make profile_pc() use arch_stack_walk(). This simplifies profile_pc(),
and in future will alow us to make walk_stackframe() private to
stacktrace.c.

At the same time, we remove the early return for when regs->pc is not in
lock functions, as this will be handled by the first call to the
profile_pc_cb() callback.

There should be no functional change as a result of this patch.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
[Mark: remove early return, elaborate commit message, fix includes]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211129142849.3056714-8-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Make return_address() use arch_stack_walk()
Madhavan T. Venkataraman [Mon, 29 Nov 2021 14:28:46 +0000 (14:28 +0000)]
arm64: Make return_address() use arch_stack_walk()

To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
substantially rework arm64's unwinding code. As part of this, we want to
minimize the set of unwind interfaces we expose, and avoid open-coding
of unwind logic outside of stacktrace.c.

Currently return_address() walks the stack of the current task by
calling start_backtrace() with return_address as the PC and the frame
pointer of return_address() as the next frame, iterating unwind steps
using walk_stackframe(). This is functionally equivalent to calling
arch_stack_walk() for the current stack, which will start from its
caller (i.e. return_address()) as the PC and it's caller's frame record
as the next frame.

Make return_address() use arch_stackwalk(). This simplifies
return_address(), and in future will alow us to make walk_stackframe()
private to stacktrace.c.

There should be no functional change as a result of this patch.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
[Mark: elaborate commit message, fix includes]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Make __get_wchan() use arch_stack_walk()
Madhavan T. Venkataraman [Mon, 29 Nov 2021 14:28:45 +0000 (14:28 +0000)]
arm64: Make __get_wchan() use arch_stack_walk()

To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
substantially rework arm64's unwinding code. As part of this, we want to
minimize the set of unwind interfaces we expose, and avoid open-coding
of unwind logic outside of stacktrace.c.

Currently, __get_wchan() walks the stack of a blocked task by calling
start_backtrace() with the task's saved PC and FP values, and iterating
unwind steps using unwind_frame(). The initialization is functionally
equivalent to calling arch_stack_walk() with the blocked task, which
will start with the task's saved PC and FP values.

Currently __get_wchan() always performs an initial unwind step, which
will stkip __switch_to(), but as this is now marked as a __sched
function, this no longer needs special handling and will be skipped in
the same way as other sched functions.

Make __get_wchan() use arch_stack_walk(). This simplifies __get_wchan(),
and in future will alow us to make unwind_frame() private to
stacktrace.c. At the same time, we can simplify the try_get_task_stack()
check and avoid the unnecessary `stack_page` variable.

The change to the skipping logic means we may terminate one frame
earlier than previously where there are an excessive number of sched
functions in the trace, but this isn't seen in practice, and wchan is
best-effort anyway, so this should not be a problem.

Other than the above, there should be no functional change as a result
of this patch.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
[Mark: rebase atop wchan changes, elaborate commit message, fix includes]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211129142849.3056714-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Make perf_callchain_kernel() use arch_stack_walk()
Madhavan T. Venkataraman [Mon, 29 Nov 2021 14:28:44 +0000 (14:28 +0000)]
arm64: Make perf_callchain_kernel() use arch_stack_walk()

To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
substantially rework arm64's unwinding code. As part of this, we want to
minimize the set of unwind interfaces we expose, and avoid open-coding
of unwind logic outside of stacktrace.c.

Currently perf_callchain_kernel() walks the stack of an interrupted
context by calling start_backtrace() with the context's PC and FP, and
iterating unwind steps using walk_stackframe(). This is functionally
equivalent to calling arch_stack_walk() with the interrupted context's
pt_regs, which will start with the PC and FP from the regs.

Make perf_callchain_kernel() use arch_stack_walk(). This simplifies
perf_callchain_kernel(), and in future will alow us to make
walk_stackframe() private to stacktrace.c.

At the same time, we update the callchain_trace() callback to check the
return value of perf_callchain_store(), which indicates whether there is
space for any further entries. When a non-zero value is returned,
further calls will be ignored, and are redundant, so we can stop the
unwind at this point.

We also remove the stale and confusing comment for callchain_trace.

There should be no functional change as a result of this patch.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
[Mark: elaborate commit message, remove comment, fix includes]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Mark __switch_to() as __sched
Mark Rutland [Mon, 29 Nov 2021 14:28:43 +0000 (14:28 +0000)]
arm64: Mark __switch_to() as __sched

Unlike most architectures (and only in keeping with powerpc), arm64 has
a non __sched() function on the path to our cpu_switch_to() assembly
function.

It is expected that for a blocked task, in_sched_functions() can be used
to skip all functions between the raw context switch assembly and the
scheduler functions that call into __switch_to(). This is the behaviour
expected by stack_trace_consume_entry_nosched(), and the behaviour we'd
like to have such that we an simplify arm64's __get_wchan()
implementation to use arch_stack_walk().

This patch mark's arm64's __switch_to as __sched. This *will not* change
the behaviour of arm64's current __get_wchan() implementation, which
always performs an initial unwind step which skips __switch_to(). This
*will* change the behaviour of stack_trace_consume_entry_nosched() and
stack_trace_save_tsk() to match their expected behaviour on blocked
tasks, skipping all scheduler-internal functions including
__switch_to().

Other than the above, there should be no functional change as a result
of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211129142849.3056714-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Add comment for stack_info::kr_cur
Mark Rutland [Mon, 29 Nov 2021 14:28:42 +0000 (14:28 +0000)]
arm64: Add comment for stack_info::kr_cur

We added stack_info::kr_cur in commit:

  cd9bc2c9258816dc ("arm64: Recover kretprobe modified return address in stacktrace")

... but didn't add anything in the corresponding comment block.

For consistency, add a corresponding comment.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviwed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20211129142849.3056714-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarch: Make ARCH_STACKWALK independent of STACKTRACE
Peter Zijlstra [Mon, 29 Nov 2021 14:28:41 +0000 (14:28 +0000)]
arch: Make ARCH_STACKWALK independent of STACKTRACE

Make arch_stack_walk() available for ARCH_STACKWALK architectures
without it being entangled in STACKTRACE.

Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Mark: rebase, drop unnecessary arm change]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: kexec: reduce calls to page_address()
Rongwei Wang [Thu, 25 Nov 2021 17:06:00 +0000 (01:06 +0800)]
arm64: kexec: reduce calls to page_address()

In kexec_page_alloc(), page_address() is called twice.
This patch add a new variable to help to reduce calls
to page_address().

Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20211125170600.1608-3-rongwei.wang@linux.alibaba.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: mte: DC {GVA,GZVA} shouldn't be used when DCZID_EL0.DZP == 1
Reiji Watanabe [Mon, 6 Dec 2021 00:47:36 +0000 (16:47 -0800)]
arm64: mte: DC {GVA,GZVA} shouldn't be used when DCZID_EL0.DZP == 1

Currently, mte_set_mem_tag_range() and mte_zero_clear_page_tags() use
DC {GVA,GZVA} unconditionally.  But, they should make sure that
DCZID_EL0.DZP, which indicates whether or not use of those instructions
is prohibited, is zero when using those instructions.
Use ST{G,ZG,Z2G} instead when DCZID_EL0.DZP == 1.

Fixes: 013bb59dbb7c ("arm64: mte: handle tags zeroing at page allocation time")
Fixes: 3d0cca0b02ac ("kasan: speed up mte_set_mem_tag_range")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Link: https://lore.kernel.org/r/20211206004736.1520989-3-reijiw@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1
Reiji Watanabe [Mon, 6 Dec 2021 00:47:35 +0000 (16:47 -0800)]
arm64: clear_page() shouldn't use DC ZVA when DCZID_EL0.DZP == 1

Currently, clear_page() uses DC ZVA instruction unconditionally.  But it
should make sure that DCZID_EL0.DZP, which indicates whether or not use
of DC ZVA instruction is prohibited, is zero when using the instruction.
Use STNP instead when DCZID_EL0.DZP == 1.

Fixes: f27bb139c387 ("arm64: Miscellaneous library functions")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/20211206004736.1520989-2-reijiw@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: extable: remove unused ex_handler_t definition
Jisheng Zhang [Fri, 19 Nov 2021 04:46:08 +0000 (12:46 +0800)]
arm64: extable: remove unused ex_handler_t definition

The ex_handler_t type was introduced in commit d6e2cc564775 ("arm64:
extable: add `type` and `data` fields"), but has never been used, and
is unnecessary. Remove it.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211119124608.3f03380b@xhacker
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: entry: Use SDEI event constants
Florian Fainelli [Thu, 18 Nov 2021 20:18:10 +0000 (12:18 -0800)]
arm64: entry: Use SDEI event constants

Use SDEI_EV_FAILED instead of open coding the 1 to make it clearer how
SDEI_EVENT_COMPLETE vs. SDEI_EVENT_COMPLETE_AND_RESUME is selected.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20211118201811.2974922-1-f.fainelli@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: Simplify checking for populated DT
Rob Herring [Fri, 29 Oct 2021 14:40:55 +0000 (09:40 -0500)]
arm64: Simplify checking for populated DT

Use of the of_scan_flat_dt() function predates libfdt and is discouraged
as libfdt provides a nicer set of APIs. Rework dt_scan_depth1_nodes to
use libfdt calls directly, and rename it to dt_is_stub() to reflect
exactly what it checking.

Cc: Will Deacon <will@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211029144055.2365814-1-robh@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64/kvm: Fix bitrotted comment for SVE handling in handle_exit.c
Mark Brown [Mon, 25 Oct 2021 16:32:32 +0000 (17:32 +0100)]
arm64/kvm: Fix bitrotted comment for SVE handling in handle_exit.c

The comment on the SVE trap handler in handle_exit.c says that it is a
placeholder until we support SVE in guests which we now do for both VHE
and nVHE cases so we really shouldn't get here in any sort of standard
case. Update the comment to be less immediately incorrect, the handling
of such a situation is correct.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211025163232.3502052-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
3 years agoarm64: ftrace: add missing BTIs
Mark Rutland [Mon, 29 Nov 2021 13:57:09 +0000 (13:57 +0000)]
arm64: ftrace: add missing BTIs

When branch target identifiers are in use, code reachable via an
indirect branch requires a BTI landing pad at the branch target site.

When building FTRACE_WITH_REGS atop patchable-function-entry, we miss
BTIs at the start start of the `ftrace_caller` and `ftrace_regs_caller`
trampolines, and when these are called from a module via a PLT (which
will use a `BR X16`), we will encounter a BTI failure, e.g.

| # insmod lkdtm.ko
| lkdtm: No crash points registered, enable through debugfs
| # echo function_graph > /sys/kernel/debug/tracing/current_tracer
| # cat /sys/kernel/debug/provoke-crash/DIRECT
| Unhandled 64-bit el1h sync exception on CPU0, ESR 0x34000001 -- BTI
| CPU: 0 PID: 174 Comm: cat Not tainted 5.16.0-rc2-dirty #3
| Hardware name: linux,dummy-virt (DT)
| pstate: 60400405 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=jc)
| pc : ftrace_caller+0x0/0x3c
| lr : lkdtm_debugfs_open+0xc/0x20 [lkdtm]
| sp : ffff800012e43b00
| x29: ffff800012e43b00 x28: 0000000000000000 x27: ffff800012e43c88
| x26: 0000000000000000 x25: 0000000000000000 x24: ffff0000c171f200
| x23: ffff0000c27b1e00 x22: ffff0000c2265240 x21: ffff0000c23c8c30
| x20: ffff8000090ba380 x19: 0000000000000000 x18: 0000000000000000
| x17: 0000000000000000 x16: ffff80001002bb4c x15: 0000000000000000
| x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000900ff0
| x11: ffff0000c4166310 x10: ffff800012e43b00 x9 : ffff8000104f2384
| x8 : 0000000000000001 x7 : 0000000000000000 x6 : 000000000000003f
| x5 : 0000000000000040 x4 : ffff800012e43af0 x3 : 0000000000000001
| x2 : ffff8000090b0000 x1 : ffff0000c171f200 x0 : ffff0000c23c8c30
| Kernel panic - not syncing: Unhandled exception
| CPU: 0 PID: 174 Comm: cat Not tainted 5.16.0-rc2-dirty #3
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace+0x0/0x1a4
|  show_stack+0x24/0x30
|  dump_stack_lvl+0x68/0x84
|  dump_stack+0x1c/0x38
|  panic+0x168/0x360
|  arm64_exit_nmi.isra.0+0x0/0x80
|  el1h_64_sync_handler+0x68/0xd4
|  el1h_64_sync+0x78/0x7c
|  ftrace_caller+0x0/0x3c
|  do_dentry_open+0x134/0x3b0
|  vfs_open+0x38/0x44
|  path_openat+0x89c/0xe40
|  do_filp_open+0x8c/0x13c
|  do_sys_openat2+0xbc/0x174
|  __arm64_sys_openat+0x6c/0xbc
|  invoke_syscall+0x50/0x120
|  el0_svc_common.constprop.0+0xdc/0x100
|  do_el0_svc+0x84/0xa0
|  el0_svc+0x28/0x80
|  el0t_64_sync_handler+0xa8/0x130
|  el0t_64_sync+0x1a0/0x1a4
| SMP: stopping secondary CPUs
| Kernel Offset: disabled
| CPU features: 0x0,00000f42,da660c5f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: Unhandled exception ]---

Fix this by adding the required `BTI C`, as we only require these to be
reachable via BL for direct calls or BR X16/X17 for PLTs. For now, these
are open-coded in the function prologue, matching the style of the
`__hwasan_tag_mismatch` trampoline.

In future we may wish to consider adding a new SYM_CODE_START_*()
variant which has an implicit BTI.

When ftrace is built atop mcount, the trampolines are marked with
SYM_FUNC_START(), and so get an implicit BTI. We may need to change
these over to SYM_CODE_START() in future for RELIABLE_STACKTRACE, in
case we need to apply special care aroud the return address being
rewritten.

Fixes: 97fed779f2a6 ("arm64: bti: Provide Kconfig for kernel mode BTI")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211129135709.2274019-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoarm64: kexec: use __pa_symbol(empty_zero_page)
Mark Rutland [Tue, 30 Nov 2021 12:18:49 +0000 (12:18 +0000)]
arm64: kexec: use __pa_symbol(empty_zero_page)

In machine_kexec_post_load() we use __pa() on `empty_zero_page`, so that
we can use the physical address during arm64_relocate_new_kernel() to
switch TTBR1 to a new set of tables. While `empty_zero_page` is part of
the old kernel, we won't clobber it until after this switch, so using it
is benign.

However, `empty_zero_page` is part of the kernel image rather than a
linear map address, so it is not correct to use __pa(x), and we should
instead use __pa_symbol(x) or __pa(lm_alias(x)). Otherwise, when the
kernel is built with DEBUG_VIRTUAL, we'll encounter splats as below, as
I've seen when fuzzing v5.16-rc3 with Syzkaller:

| ------------[ cut here ]------------
| virt_to_phys used for non-linear address: 000000008492561a (empty_zero_page+0x0/0x1000)
| WARNING: CPU: 3 PID: 11492 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12
| CPU: 3 PID: 11492 Comm: syz-executor.0 Not tainted 5.16.0-rc3-00001-g48bd452a045c #1
| Hardware name: linux,dummy-virt (DT)
| pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12
| lr : __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12
| sp : ffff80001af17bb0
| x29: ffff80001af17bb0 x28: ffff1cc65207b400 x27: ffffb7828730b120
| x26: 0000000000000e11 x25: 0000000000000000 x24: 0000000000000001
| x23: ffffb7828963e000 x22: ffffb78289644000 x21: 0000600000000000
| x20: 000000000000002d x19: 0000b78289644000 x18: 0000000000000000
| x17: 74706d6528206131 x16: 3635323934383030 x15: 303030303030203a
| x14: 1ffff000035e2eb8 x13: ffff6398d53f4f0f x12: 1fffe398d53f4f0e
| x11: 1fffe398d53f4f0e x10: ffff6398d53f4f0e x9 : ffffb7827c6f76dc
| x8 : ffff1cc6a9fa7877 x7 : 0000000000000001 x6 : ffff6398d53f4f0f
| x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff1cc66f2a99c0
| x2 : 0000000000040000 x1 : d7ce7775b09b5d00 x0 : 0000000000000000
| Call trace:
|  __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12
|  machine_kexec_post_load+0x284/0x670 arch/arm64/kernel/machine_kexec.c:150
|  do_kexec_load+0x570/0x670 kernel/kexec.c:155
|  __do_sys_kexec_load kernel/kexec.c:250 [inline]
|  __se_sys_kexec_load kernel/kexec.c:231 [inline]
|  __arm64_sys_kexec_load+0x1d8/0x268 kernel/kexec.c:231
|  __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
|  invoke_syscall+0x90/0x2e0 arch/arm64/kernel/syscall.c:52
|  el0_svc_common.constprop.2+0x1e4/0x2f8 arch/arm64/kernel/syscall.c:142
|  do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:181
|  el0_svc+0x60/0x248 arch/arm64/kernel/entry-common.c:603
|  el0t_64_sync_handler+0x90/0xb8 arch/arm64/kernel/entry-common.c:621
|  el0t_64_sync+0x180/0x184 arch/arm64/kernel/entry.S:572
| irq event stamp: 2428
| hardirqs last  enabled at (2427): [<ffffb7827c6f2308>] __up_console_sem+0xf0/0x118 kernel/printk/printk.c:255
| hardirqs last disabled at (2428): [<ffffb7828223df98>] el1_dbg+0x28/0x80 arch/arm64/kernel/entry-common.c:375
| softirqs last  enabled at (2424): [<ffffb7827c411c00>] softirq_handle_end kernel/softirq.c:401 [inline]
| softirqs last  enabled at (2424): [<ffffb7827c411c00>] __do_softirq+0xa28/0x11e4 kernel/softirq.c:587
| softirqs last disabled at (2417): [<ffffb7827c59015c>] do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline]
| softirqs last disabled at (2417): [<ffffb7827c59015c>] invoke_softirq kernel/softirq.c:439 [inline]
| softirqs last disabled at (2417): [<ffffb7827c59015c>] __irq_exit_rcu kernel/softirq.c:636 [inline]
| softirqs last disabled at (2417): [<ffffb7827c59015c>] irq_exit_rcu+0x53c/0x688 kernel/softirq.c:648
| ---[ end trace 0ca578534e7ca938 ]---

With or without DEBUG_VIRTUAL __pa() will fall back to __kimg_to_phys()
for non-linear addresses, and will happen to do the right thing in this
case, even with the warning. But we should not depend upon this, and to
keep the warning useful we should fix this case.

Fix this issue by using __pa_symbol(), which handles kernel image
addresses (and checks its input is a kernel image address). This matches
what we do elsewhere, e.g. in arch/arm64/include/asm/pgtable.h:

| #define ZERO_PAGE(vaddr)       phys_to_page(__pa_symbol(empty_zero_page))

Fixes: 3744b5280e67 ("arm64: kexec: install a copy of the linear-map")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20211130121849.3319010-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoarm64: update PAC description for kernel
Kuan-Ying Lee [Wed, 1 Dec 2021 03:40:10 +0000 (11:40 +0800)]
arm64: update PAC description for kernel

Remove the paragraph which has nothing to do with the kernel and
add PAC description related to kernel.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Link: https://lore.kernel.org/r/20211201034014.20048-1-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Will Deacon <will@kernel.org>
3 years agoLinux 5.16-rc3
Linus Torvalds [Sun, 28 Nov 2021 22:09:19 +0000 (14:09 -0800)]
Linux 5.16-rc3

3 years agoMerge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Linus Torvalds [Sun, 28 Nov 2021 19:58:52 +0000 (11:58 -0800)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull vhost,virtio,vdpa bugfixes from Michael Tsirkin:
 "Misc fixes all over the place.

  Revert of virtio used length validation series: the approach taken
  does not seem to work, breaking too many guests in the process. We'll
  need to do length validation using some other approach"

[ This merge also ends up reverting commit f7a36b03a732 ("vsock/virtio:
  suppress used length validation"), which came in through the
  networking tree in the meantime, and was part of that whole used
  length validation series   - Linus ]

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpa_sim: avoid putting an uninitialized iova_domain
  vhost-vdpa: clean irqs before reseting vdpa device
  virtio-blk: modify the value type of num in virtio_queue_rq()
  vhost/vsock: cleanup removing `len` variable
  vhost/vsock: fix incorrect used length reported to the guest
  Revert "virtio_ring: validate used buffer length"
  Revert "virtio-net: don't let virtio core to validate used length"
  Revert "virtio-blk: don't let virtio core to validate used length"
  Revert "virtio-scsi: don't let virtio core to validate used buffer length"

3 years agoMerge tag 'x86-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 28 Nov 2021 17:24:50 +0000 (09:24 -0800)]
Merge tag 'x86-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 build fix from Thomas Gleixner:
 "A single fix for a missing __init annotation of prepare_command_line()"

* tag 'x86-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Mark prepare_command_line() __init

3 years agoMerge tag 'sched-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 28 Nov 2021 17:15:34 +0000 (09:15 -0800)]
Merge tag 'sched-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fix from Thomas Gleixner:
 "A single scheduler fix to ensure that there is no stale KASAN shadow
  state left on the idle task's stack when a CPU is brought up after it
  was brought down before"

* tag 'sched-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/scs: Reset task stack state in bringup_cpu()

3 years agoMerge tag 'perf-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 28 Nov 2021 17:10:54 +0000 (09:10 -0800)]
Merge tag 'perf-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf fix from Thomas Gleixner:
 "A single fix for perf to prevent it from sending SIGTRAP to another
  task from a trace point event as it's not possible to deliver a
  synchronous signal to a different task from there"

* tag 'perf-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Ignore sigtrap for tracepoints destined for other tasks

3 years agoMerge tag 'locking-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 28 Nov 2021 17:04:41 +0000 (09:04 -0800)]
Merge tag 'locking-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking fixes from Thomas Gleixner:
 "Two regression fixes for reader writer semaphores:

   - Plug a race in the lock handoff which is caused by inconsistency of
     the reader and writer path and can lead to corruption of the
     underlying counter.

   - down_read_trylock() is suboptimal when the lock is contended and
     multiple readers trylock concurrently. That's due to the initial
     value being read non-atomically which results in at least two
     compare exchange loops. Making the initial readout atomic reduces
     this significantly. Whith 40 readers by 11% in a benchmark which
     enforces contention on mmap_sem"

* tag 'locking-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rwsem: Optimize down_read_trylock() under highly contended case
  locking/rwsem: Make handoff bit handling more consistent