Giovanni Cabiddu [Thu, 14 Sep 2023 09:55:45 +0000 (10:55 +0100)]
crypto: qat - fix state machines cleanup paths
Commit 1bdc85550a2b ("crypto: qat - fix concurrency issue when device
state changes") introduced the function adf_dev_down() which wraps the
functions adf_dev_stop() and adf_dev_shutdown().
In a subsequent change, the sequence adf_dev_stop() followed by
adf_dev_shutdown() was then replaced across the driver with just a call
to the function adf_dev_down().
The functions adf_dev_stop() and adf_dev_shutdown() are called in error
paths to stop the accelerator and free up resources and can be called
even if the counterparts adf_dev_init() and adf_dev_start() did not
complete successfully.
However, the implementation of adf_dev_down() prevents the stop/shutdown
sequence if the device is found already down.
For example, if adf_dev_init() fails, the device status is not set as
started and therefore a call to adf_dev_down() won't be calling
adf_dev_shutdown() to undo what adf_dev_init() did.
Do not check if a device is started in adf_dev_down() but do the
equivalent check in adf_sysfs.c when handling a DEV_DOWN command from
the user.
Fixes: 2b60f79c7b81 ("crypto: qat - replace state machine calls") Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Adam Guerin <adam.guerin@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Yang Shen [Thu, 14 Sep 2023 09:09:08 +0000 (17:09 +0800)]
crypto: hisilicon/zip - remove zlib and gzip
Remove the support of zlib-deflate and gzip.
Signed-off-by: Yang Shen <shenyang39@huawei.com> Reviewed-by: Longfang Liu <liulongfang@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Thu, 14 Sep 2023 08:28:27 +0000 (16:28 +0800)]
crypto: ecb - Convert from skcipher to lskcipher
This patch adds two different implementations of ECB. First of
all an lskcipher wrapper around existing ciphers is introduced as
a temporary transition aid.
Secondly a permanent lskcipher template is also added. It's simply
a wrapper around the underlying lskcipher algorithm.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Thu, 14 Sep 2023 08:28:25 +0000 (16:28 +0800)]
crypto: lskcipher - Add compatibility wrapper around ECB
As an aid to the transition from cipher algorithm implementations
to lskcipher, add a temporary wrapper when creating simple lskcipher
templates by using ecb(X) instead of X if an lskcipher implementation
of X cannot be found.
This can be reverted once all cipher implementations have switched
over to lskcipher.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Thu, 14 Sep 2023 08:28:24 +0000 (16:28 +0800)]
crypto: skcipher - Add lskcipher
Add a new API type lskcipher designed for taking straight kernel
pointers instead of SG lists. Its relationship to skcipher will
be analogous to that between shash and ahash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
`strncpy` is deprecated for use on NUL-terminated destination strings [1].
We know `hw.partname` is supposed to be NUL-terminated by its later use with seq_printf:
| nitrox_debugfs.c +25
| seq_printf(s, " Part Name: %s\n", ndev->hw.partname);
Let's prefer a more robust and less ambiguous string interface.
A suitable replacement is `strscpy` [2] due to the fact that it guarantees
NUL-termination on the destination buffer.
Martin Kaiser [Tue, 12 Sep 2023 14:31:18 +0000 (16:31 +0200)]
hwrng: imx-rngc - reasonable timeout for initial seed
Set a more reasonable timeout for calculating the initial seed.
The reference manuals says that "The initial seed takes approximately
2,000,000 clock cycles." The rngc peripheral clock runs at >= 33.25MHz,
so seeding takes at most 60ms.
A timeout of 200ms is more appropriate than the current value of 3
seconds.
Signed-off-by: Martin Kaiser <martin@kaiser.cx> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Martin Kaiser [Tue, 12 Sep 2023 14:31:17 +0000 (16:31 +0200)]
hwrng: imx-rngc - reasonable timeout for selftest
Set a more reasonable timeout for the rngc selftest.
According to the reference manual, "The self test takes approximately
29,000 cycles to complete." The lowest possible frequency of the rngc
peripheral clock is 33.25MHz, the selftest would then take about 872us.
2.5ms should be enough for the selftest timeout.
Signed-off-by: Martin Kaiser <martin@kaiser.cx> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Jonas Gorski [Sun, 10 Sep 2023 08:34:17 +0000 (10:34 +0200)]
hwrng: geode - fix accessing registers
When the membase and pci_dev pointer were moved to a new struct in priv,
the actual membase users were left untouched, and they started reading
out arbitrary memory behind the struct instead of registers. This
unfortunately turned the RNG into a constant number generator, depending
on the content of what was at that offset.
To fix this, update geode_rng_data_{read,present}() to also get the
membase via amd_geode_priv, and properly read from the right addresses
again.
Fixes: 9f6ec8dc574e ("hwrng: geode - Fix PCI device refcount leak") Reported-by: Timur I. Davletshin <timur.davletshin@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217882 Tested-by: Timur I. Davletshin <timur.davletshin@gmail.com> Suggested-by: Jo-Philipp Wich <jo@mein.io> Signed-off-by: Jonas Gorski <jonas.gorski@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Fri, 8 Sep 2023 09:21:13 +0000 (17:21 +0800)]
hwrng: octeon - Fix warnings on 32-bit platforms
Use unsigned long instead of u64 to silence compile warnings on
32-bit platforms. Also remove the __force bit which seems no
longer needed with a current sparse.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: ccp - Add support for DBC over PSP mailbox
On some SOCs DBC is supported through the PSP mailbox instead of
the platform mailbox. This capability is advertised in the PSP
capabilities register. Allow using this communication path if
supported.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: ccp - Add a communication path abstraction for DBC
DBC is currently accessed only from the platform access mailbox and
a lot of that implementation's communication path is intertwined
with DBC. Add an abstraction layer for pointers into the mailbox.
No intended functional changes.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tom Lendacky [Thu, 7 Sep 2023 18:48:42 +0000 (13:48 -0500)]
crypto: ccp - Move direct access to some PSP registers out of TEE
With the PSP mailbox registers supporting more than just TEE, access to
them must be maintained and serialized by the PSP device support. Remove
TEE support direct access and create an interface in the PSP support
where the register access can be controlled/serialized.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Rijo Thomas <Rijo-john.Thomas@amd.com> Tested-by: Rijo Thomas <Rijo-john.Thomas@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Stefan Wahren [Tue, 5 Sep 2023 23:27:57 +0000 (01:27 +0200)]
hwrng: bcm2835 - Fix hwrng throughput regression
The last RCU stall fix caused a massive throughput regression of the
hwrng on Raspberry Pi 0 - 3. hwrng_msleep doesn't sleep precisely enough
and usleep_range doesn't allow scheduling. So try to restore the
best possible throughput by introducing hwrng_yield which interruptable
sleeps for one jiffy.
Some performance measurements on Raspberry Pi 3B+ (arm64/defconfig):
Lu Jialin [Mon, 4 Sep 2023 13:33:41 +0000 (13:33 +0000)]
crypto: pcrypt - Fix hungtask for PADATA_RESET
We found a hungtask bug in test_aead_vec_cfg as follows:
INFO: task cryptomgr_test:391009 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Call trace:
__switch_to+0x98/0xe0
__schedule+0x6c4/0xf40
schedule+0xd8/0x1b4
schedule_timeout+0x474/0x560
wait_for_common+0x368/0x4e0
wait_for_completion+0x20/0x30
wait_for_completion+0x20/0x30
test_aead_vec_cfg+0xab4/0xd50
test_aead+0x144/0x1f0
alg_test_aead+0xd8/0x1e0
alg_test+0x634/0x890
cryptomgr_test+0x40/0x70
kthread+0x1e0/0x220
ret_from_fork+0x10/0x18
Kernel panic - not syncing: hung_task: blocked tasks
For padata_do_parallel, when the return err is 0 or -EBUSY, it will call
wait_for_completion(&wait->completion) in test_aead_vec_cfg. In normal
case, aead_request_complete() will be called in pcrypt_aead_serial and the
return err is 0 for padata_do_parallel. But, when pinst->flags is
PADATA_RESET, the return err is -EBUSY for padata_do_parallel, and it
won't call aead_request_complete(). Therefore, test_aead_vec_cfg will
hung at wait_for_completion(&wait->completion), which will cause
hungtask.
The problem comes as following:
(padata_do_parallel) |
rcu_read_lock_bh(); |
err = -EINVAL; | (padata_replace)
| pinst->flags |= PADATA_RESET;
err = -EBUSY |
if (pinst->flags & PADATA_RESET) |
rcu_read_unlock_bh() |
return err
In order to resolve the problem, we replace the return err -EBUSY with
-EAGAIN, which means parallel_data is changing, and the caller should call
it again.
v3:
remove retry and just change the return err.
v2:
introduce padata_try_do_parallel() in pcrypt_aead_encrypt and
pcrypt_aead_decrypt to solve the hungtask.
Signed-off-by: Lu Jialin <lujialin4@huawei.com> Signed-off-by: Guo Zihua <guozihua@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Danny Tsen [Wed, 30 Aug 2023 13:49:11 +0000 (09:49 -0400)]
crypto: vmx - Improved AES/XTS performance of 6-way unrolling for ppc
Improve AES/XTS performance of 6-way unrolling for PowerPC up
to 17% with tcrypt. This is done by using one instruction,
vpermxor, to replace xor and vsldoi.
The same changes were applied to OpenSSL code and a pull request was
submitted.
This patch has been tested with the kernel crypto module tcrypt.ko and
has passed the selftest. The patch is also tested with
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS enabled.
Signed-off-by: Danny Tsen <dtsen@linux.ibm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Jinjie Ruan [Wed, 30 Aug 2023 07:54:51 +0000 (15:54 +0800)]
crypto: qat - Use list_for_each_entry() helper
Convert list_for_each() to list_for_each_entry() so that the list_itr
list_head pointer and list_entry() call are no longer needed, which
can reduce a few lines of code. No functional changed.
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Some of the tests for unfused parts referenced a named member parameter,
but when the test suite was switched to call a python ctypes library they
weren't updated. Adjust them to refer to the first argument of the
process_param() call and set the data type of the signature appropriately.
Fixes: 15f8aa7bb3e5 ("crypto: ccp - Add unit tests for dynamic boost control") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
A local environment change was importing ioctl_opt which is required
for ioctl tests to pass. Add the missing import for it.
Fixes: 15f8aa7bb3e5 ("crypto: ccp - Add unit tests for dynamic boost control") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
crypto: ccp - Get a free page to use while fetching initial nonce
dbc_dev_init() gets a free page from `GFP_KERNEL`, but if that page has
any data in it the first nonce request will fail.
This prevents dynamic boost control from probing. To fix this, explicitly
request a zeroed page with `__GFP_ZERO` to ensure first nonce fetch works.
Fixes: c04cf9e14f10 ("crypto: ccp - Add support for fetching a nonce for dynamic boost control") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This function call was found to be unnecessary as there is no equivalent
platform_get_drvdata() call to access the private data of the driver. Also,
the private data is defined in this driver, so there is no risk of it being
accessed outside of this driver file.
Signed-off-by: Andrei Coardos <aboutphysycs@gmail.com> Reviewed-by: Alexandru Ardelean <alex@shruggie.ro> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Andrei Coardos [Mon, 28 Aug 2023 10:23:29 +0000 (13:23 +0300)]
hwrng: xgene - removed unneeded call to platform_set_drvdata()
This function call was found to be unnecessary as there is no equivalent
platform_get_drvdata() call to access the private data of the driver. Also,
the private data is defined in this driver, so there is no risk of it being
accessed outside of this driver file.
Signed-off-by: Andrei Coardos <aboutphysycs@gmail.com> Reviewed-by: Martin Kaiser <martin@kaiser.cx> Reviewed-by: Alexandru Ardelean <alex@shruggie.ro> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Andrei Coardos [Mon, 28 Aug 2023 10:17:57 +0000 (13:17 +0300)]
hwrng: mpfs - removed unneeded call to platform_set_drvdata()
This function call was found to be unnecessary as there is no equivalent
platform_get_drvdata() call to access the private data of the driver. Also,
the private data is defined in this driver, so there is no risk of it being
accessed outside of this driver file.
Signed-off-by: Andrei Coardos <aboutphysycs@gmail.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Alexandru Ardelean <alex@shruggie.ro> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Andrei Coardos [Wed, 23 Aug 2023 11:21:39 +0000 (14:21 +0300)]
hwrng: hisi - removed unneeded call to platform_set_drvdata()
This function call was found to be unnecessary as there is no equivalent
platform_get_drvdata() call to access the private data of the driver. Also,
the private data is defined in this driver, so there is no risk of it being
accessed outside of this driver file.
Signed-off-by: Andrei Coardos <aboutphysycs@gmail.com> Reviewed-by: Martin Kaiser <martin@kaiser.cx> Reviewed-by: Alexandru Ardelean <alex@shruggie.ro> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Andrei Coardos [Wed, 23 Aug 2023 11:15:55 +0000 (14:15 +0300)]
hwrng: bcm2835 - removed call to platform_set_drvdata()
This function call was found to be unnecessary as there is no equivalent
platform_get_drvdata() call to access the private data of the driver. Also,
the private data is defined in this driver, so there is no risk of it being
accessed outside of this driver file.
Signed-off-by: Andrei Coardos <aboutphysycs@gmail.com> Reviewed-by: Martin Kaiser <martin@kaiser.cx> Reviewed-by: Alexandru Ardelean <alex@shruggie.ro> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Merge tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm
Pull drm ci scripts from Dave Airlie:
"This is a bunch of ci integration for the freedesktop gitlab instance
where we currently do upstream userspace testing on diverse sets of
GPU hardware. From my perspective I think it's an experiment worth
going with and seeing how the benefits/noise playout keeping these
files useful.
Ideally I'd like to get this so we can do pre-merge testing on PRs
eventually.
Below is some info from danvet on why we've ended up making the
decision and how we can roll it back if we decide it was a bad plan.
Why in upstream?
- like documentation, testcases, tools CI integration is one of these
things where you can waste endless amounts of time if you
accidentally have a version that doesn't match your source code
- but also like the above, there's a balance, this is the initial cut
of what we think makes sense to keep in sync vs out-of-tree,
probably needs adjustment
- gitlab supports out-of-repo gitlab integration and that's what's
been used for the kernel in drm, but it results in per-driver
fragmentation and lots of duplicated effort. the simple act of
smashing an arbitrary winner into a topic branch already started
surfacing patches on dri-devel and sparking good cross driver team
discussions
Why gitlab?
- it's not any more shit than any of the other CI
- drm userspace uses it extensively for everything in userspace, we
have a lot of people and experience with this, including
integration of hw testing labs
- media userspace like gstreamer is also on gitlab.fd.o, and there's
discussion to extend this to the media subsystem in some fashion
Can this be shared?
- there's definitely a pile of code that could move to scripts/ if
other subsystem adopt ci integration in upstream kernel git. other
bits are more drm/gpu specific like the igt-gpu-tests/tools
integration
- docker images can be run locally or in other CI runners
Will we regret this?
- it's all in one directory, intentionally, for easy deletion
- probably 1-2 years in upstream to see whether this is worth it or a
Big Mistake. that's roughly what it took to _really_ roll out solid
CI in the bigger userspace projects we have on gitlab.fd.o like
mesa3d"
* tag 'topic/drm-ci-2023-08-31-1' of git://anongit.freedesktop.org/drm/drm:
drm: ci: docs: fix build warning - add missing escape
drm: Add initial ci/ subdirectory
Merge tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Fix preemption delays in the SGX code, remove unnecessarily
UAPI-exported code, fix a ld.lld linker (in)compatibility quirk and
make the x86 SMP init code a bit more conservative to fix kexec()
lockups"
* tag 'x86-urgent-2023-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Break up long non-preemptible delays in sgx_vepc_release()
x86: Remove the arch_calc_vm_prot_bits() macro from the UAPI
x86/build: Fix linker fill bytes quirk/incompatibility for ld.lld
x86/smp: Don't send INIT to non-present and non-booted CPUs
Merge tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Arnaldo Carvalho de Melo:
"perf tools maintainership:
- Add git information for perf-tools and perf-tools-next trees and
branches to the MAINTAINERS file. That is where development now
takes place and myself and Namhyung Kim have write access, more
people to come as we emulate other maintainer groups.
perf record:
- Record kernel data maps when 'perf record --data' is used, so that
global variables can be resolved and used in tools that do data
profiling.
perf trace:
- Remove the old, experimental support for BPF events in which a .c
file was passed as an event: "perf trace -e hello.c" to then get
compiled and loaded.
The only known usage for that, that shipped with the kernel as an
example for such events, augmented the raw_syscalls tracepoints and
was converted to a libbpf skeleton, reusing all the user space
components and the BPF code connected to the syscalls.
In the end just the way to glue the BPF part and the user space
type beautifiers changed, now being performed by libbpf skeletons.
The next step is to use BTF to do pretty printing of all syscall
types, as discussed with Alan Maguire and others.
Now, on a perf built with BUILD_BPF_SKEL=1 we get most if not all
path/filenames/strings, some of the networking data structures,
perf_event_attr, etc, i.e. systemwide tracing of nanosleep calls
and perf_event_open syscalls while 'perf stat' runs 'sleep' for 5
seconds:
- Building with binutils' libopcode now is opt-in (BUILD_NONDISTRO=1)
for licensing reasons, and we missed a build test on
tools/perf/tests makefile.
Since we now default to NDEBUG=1, we ended up segfaulting when
building with BUILD_NONDISTRO=1 because a needed initialization
routine was being "error checked" via an assert.
Fix it by explicitly checking the result and aborting instead if it
fails.
We better back propagate the error, but at least 'perf annotate' on
samples collected for a BPF program is back working when perf is
built with BUILD_NONDISTRO=1.
perf report/top:
- Add back TUI hierarchy mode header, that is seen when using 'perf
report/top --hierarchy'.
- Fix the number of entries for 'e' key in the TUI that was
preventing navigation of lines when expanding an entry.
perf report/script:
- Support cross platform register handling, allowing a perf.data file
collected on one architecture to have registers sampled correctly
displayed when analysis tools such as 'perf report' and 'perf
script' are used on a different architecture.
- Fix handling of event attributes in pipe mode, i.e. when one uses:
perf record -o - | perf report -i -
When no perf.data files are used.
- Handle files generated via pipe mode with a version of perf and
then read also via pipe mode with a different version of perf,
where the event attr record may have changed, use the record size
field to properly support this version mismatch.
perf probe:
- Accessing global variables from uprobes isn't supported, make the
error message state that instead of stating that some minimal
kernel version is needed to have that feature. This seems just a
tool limitation, the kernel probably has all that is needed.
perf tests:
- Fix a reference count related leak in the dlfilter v0 API where the
result of a thread__find_symbol_fb() is not matched with an
addr_location__exit() to drop the reference counts of the resolved
components (machine, thread, map, symbol, etc). Add a dlfilter test
to make sure that doesn't regresses.
- Lots of fixes for the 'perf test' written in shell script related
to problems found with the shellcheck utility.
- Fixes for 'perf test' shell scripts testing features enabled when
perf is built with BUILD_BPF_SKEL=1, such as 'perf stat' bpf
counters.
- Add perf record sample filtering test, things like the following
example, that gets implemented as a BPF filter attached to the
event:
- Improve the way the task_analyzer test checks if libtraceevent is
linked, using 'perf version --build-options' instead of the more
expensinve 'perf record -e "sched:sched_switch"'.
- Add support for riscv in the mmap-basic test. (This went as well
via the RiscV tree, same contents).
libperf:
- Implement riscv mmap support (This went as well via the RiscV tree,
same contents).
perf script:
- New tool that converts perf.data files to the firefox profiler
format so that one can use the visualizer at
https://profiler.firefox.com/. Done by Anup Sharma as part of this
year's Google Summer of Code.
One can generate the output and upload it to the web interface but
Anup also automated everything:
perf script gecko -F 99 -a sleep 60
- Support syscall name parsing on arm64.
- Print "cgroup" field on the same line as "comm".
perf bench:
- Add new 'uprobe' benchmark to measure the overhead of uprobes
with/without BPF programs attached to it.
- breakpoints are not available on power9, skip that test.
perf stat:
- Add #num_cpus_online literal to be used in 'perf stat' metrics, and
add this extra 'perf test' check that exemplifies its purpose:
- Improve tool startup time by lazily reading PMU, JSON, sysfs data.
- Improve error reporting in the parsing of events, passing YYLTYPE
to error routines, so that the output can show were the parsing
error was found.
- Add 'perf test' entries to check the parsing of events
improvements.
- Fix various leak for things detected by -fsanitize=address, mostly
things that would be freed at tool exit, including:
- Free evsel->filter on the destructor.
- Allow tools to register a thread->priv destructor and use it in
'perf trace'.
- Free evsel->priv in 'perf trace'.
- Free string returned by synthesize_perf_probe_point() when the
caller fails to do all it needs.
- Adjust various compiler options to not consider errors some
warnings when building with broken headers found in things like
python, flex, bison, as we otherwise build with -Werror. Some for
gcc, some for clang, some for some specific version of those, some
for some specific version of flex or bison, or some specific
combination of these components, bah.
- Allow customization of clang options for BPF target, this helps
building on gentoo where there are other oddities where BPF targets
gets passed some compiler options intended for the native build, so
building with WERROR=0 helps while these oddities are fixed.
- Dont pass ERR_PTR() values to perf_session__delete() in 'perf top'
and 'perf lock', fixing some segfaults when handling some odd
failures.
- Add LTO build option.
- Fix format of unordered lists in the perf docs
(tools/perf/Documentation)
- Overhaul the bison files, using constructs such as YYNOMEM.
- Remove unused tokens from the bison .y files.
- Add more comments to various structs.
- A few LoongArch enablement patches.
Vendor events (JSON):
- Add JSON metrics for Yitian 710 DDR (aarch64). Things like:
EventName, BriefDescription
visible_window_limit_reached_rd, "At least one entry in read queue reaches the visible window limit.",
visible_window_limit_reached_wr, "At least one entry in write queue reaches the visible window limit.",
op_is_dqsosc_mpc , "A DQS Oscillator MPC command to DRAM.",
op_is_dqsosc_mrr , "A DQS Oscillator MRR command to DRAM.",
op_is_tcr_mrr , "A Temperature Compensated Refresh(TCR) MRR command to DRAM.",
- Add AmpereOne metrics (aarch64).
- Update N2 and V2 metrics (aarch64) and events using Arm telemetry
repo.
- Update scale units and descriptions of common topdown metrics on
aarch64. Things like:
- "MetricExpr": "stall_slot_frontend / (#slots * cpu_cycles)",
- "BriefDescription": "Frontend bound L1 topdown metric",
+ "MetricExpr": "100 * (stall_slot_frontend / (#slots * cpu_cycles))",
+ "BriefDescription": "This metric is the percentage of total slots that were stalled due to resource constraints in the frontend of the processor.",
- Update events for intel: meteorlake to 1.04, sapphirerapids to
1.15, Icelake+ metric constraints.
- Update files for the power10 platform"
* tag 'perf-tools-for-v6.6-1-2023-09-05' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (217 commits)
perf parse-events: Fix driver config term
perf parse-events: Fixes relating to no_value terms
perf parse-events: Fix propagation of term's no_value when cloning
perf parse-events: Name the two term enums
perf list: Don't print Unit for "default_core"
perf vendor events intel: Fix modifier in tma_info_system_mem_parallel_reads for skylake
perf dlfilter: Avoid leak in v0 API test use of resolve_address()
perf metric: Add #num_cpus_online literal
perf pmu: Remove str from perf_pmu_alias
perf parse-events: Make common term list to strbuf helper
perf parse-events: Minor help message improvements
perf pmu: Avoid uninitialized use of alias->str
perf jevents: Use "default_core" for events with no Unit
perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test
perf test shell stat_bpf_counters: Fix test on Intel
perf test shell record_bpf_filter: Skip 6.2 kernel
libperf: Get rid of attr.id field
perf tools: Convert to perf_record_header_attr_id()
libperf: Add perf_record_header_attr_id()
perf tools: Handle old data in PERF_RECORD_ATTR
...
Merge tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- six smb3 client fixes including ones to allow controlling smb3
directory caching timeout and limits, and one debugging improvement
- one fix for nls Kconfig (don't need to expose NLS_UCS2_UTILS option)
- one minor spnego registry update
* tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
spnego: add missing OID to oid registry
smb3: fix minor typo in SMB2_GLOBAL_CAP_LARGE_MTU
cifs: update internal module version number for cifs.ko
smb3: allow controlling maximum number of cached directories
smb3: add trace point for queryfs (statfs)
nls: Hide new NLS_UCS2_UTILS
smb3: allow controlling length of time directory entries are cached with dir leases
smb: propagate error code of extract_sharename()
David Howells [Fri, 8 Sep 2023 16:03:22 +0000 (17:03 +0100)]
iov_iter: Kunit tests for page extraction
Add some kunit tests for page extraction for ITER_BVEC, ITER_KVEC and
ITER_XARRAY type iterators. ITER_UBUF and ITER_IOVEC aren't dealt with
as they require userspace VM interaction. ITER_DISCARD isn't dealt with
either as that can't be extracted.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Fri, 8 Sep 2023 16:03:21 +0000 (17:03 +0100)]
iov_iter: Kunit tests for copying to/from an iterator
Add some kunit tests for page extraction for ITER_BVEC, ITER_KVEC and
ITER_XARRAY type iterators. ITER_UBUF and ITER_IOVEC aren't dealt with
as they require userspace VM interaction. ITER_DISCARD isn't dealt with
either as that does nothing.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Fri, 8 Sep 2023 16:03:20 +0000 (17:03 +0100)]
iov_iter: Fix iov_iter_extract_pages() with zero-sized entries
iov_iter_extract_pages() doesn't correctly handle skipping over initial
zero-length entries in ITER_KVEC and ITER_BVEC-type iterators.
The problem is that it accidentally reduces maxsize to 0 when it
skipping and thus runs to the end of the array and returns 0.
Fix this by sticking the calculated size-to-copy in a new variable
rather than back in maxsize.
Fixes: 7d58fe731028 ("iov_iter: Add a function to extract a page list from an iterator") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'riscv-for-linus-6.6-mw2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull more RISC-V updates from Palmer Dabbelt:
- The kernel now dynamically probes for misaligned access speed, as
opposed to relying on a table of known implementations.
- Support for non-coherent devices on systems using the Andes AX45MP
core, including the RZ/Five SoCs.
- Support for the V extension in ptrace(), again.
- Support for KASLR.
- Support for the BPF prog pack allocator in RISC-V.
- A handful of bug fixes and cleanups.
* tag 'riscv-for-linus-6.6-mw2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (25 commits)
soc: renesas: Kconfig: For ARCH_R9A07G043 select the required configs if dependencies are met
riscv: Kconfig.errata: Add dependency for RISCV_SBI in ERRATA_ANDES config
riscv: Kconfig.errata: Drop dependency for MMU in ERRATA_ANDES_CMO config
riscv: Kconfig: Select DMA_DIRECT_REMAP only if MMU is enabled
bpf, riscv: use prog pack allocator in the BPF JIT
riscv: implement a memset like function for text
riscv: extend patch_text_nosync() for multiple pages
bpf: make bpf_prog_pack allocator portable
riscv: libstub: Implement KASLR by using generic functions
libstub: Fix compilation warning for rv32
arm64: libstub: Move KASLR handling functions to kaslr.c
riscv: Dump out kernel offset information on panic
riscv: Introduce virtual kernel mapping KASLR
RISC-V: Add ptrace support for vectors
soc: renesas: Kconfig: Select the required configs for RZ/Five SoC
cache: Add L2 cache management for Andes AX45MP RISC-V core
dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller
riscv: mm: dma-noncoherent: nonstandard cache operations support
riscv: errata: Add Andes alternative ports
riscv: asm: vendorid_list: Add Andes Technology to the vendors list
...
Duoming Zhou [Wed, 2 Aug 2023 03:37:37 +0000 (11:37 +0800)]
sh: push-switch: Reorder cleanup operations to avoid use-after-free bug
The original code puts flush_work() before timer_shutdown_sync()
in switch_drv_remove(). Although we use flush_work() to stop
the worker, it could be rescheduled in switch_timer(). As a result,
a use-after-free bug can occur. The details are shown below:
This patch puts timer_shutdown_sync() before flush_work() to
mitigate the bugs. As a result, the worker and timer will be
stopped safely before the deallocate operations.
Fixes: 9f5e8eee5cfe ("sh: generic push-switch framework.") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20230802033737.9738-1-duoming@zju.edu.cn Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Petr Tesarik [Mon, 24 Jul 2023 12:07:42 +0000 (14:07 +0200)]
sh: boards: Fix CEU buffer size passed to dma_declare_coherent_memory()
In all these cases, the last argument to dma_declare_coherent_memory() is
the buffer end address, but the expected value should be the size of the
reserved region.
Fixes: 39fb993038e1 ("media: arch: sh: ap325rxa: Use new renesas-ceu camera driver") Fixes: c2f9b05fd5c1 ("media: arch: sh: ecovec: Use new renesas-ceu camera driver") Fixes: f3590dc32974 ("media: arch: sh: kfr2r09: Use new renesas-ceu camera driver") Fixes: 186c446f4b84 ("media: arch: sh: migor: Use new renesas-ceu camera driver") Fixes: 1a3c230b4151 ("media: arch: sh: ms7724se: Use new renesas-ceu camera driver") Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Link: https://lore.kernel.org/r/20230724120742.2187-1-petrtesarik@huaweicloud.com Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"Mostly small stragglers that missed the initial merge.
Driver updates are qla2xxx and smartpqi (mp3sas has a high diffstat
due to the volatile qualifier removal, fnic due to unused function
removal and sd.c has a lot of code shuffling to remove forward
declarations)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (38 commits)
scsi: ufs: core: No need to update UPIU.header.flags and lun in advanced RPMB handler
scsi: ufs: core: Add advanced RPMB support where UFSHCI 4.0 does not support EHS length in UTRD
scsi: mpt3sas: Remove volatile qualifier
scsi: mpt3sas: Perform additional retries if doorbell read returns 0
scsi: libsas: Simplify sas_queue_reset() and remove unused code
scsi: ufs: Fix the build for the old ARM OABI
scsi: qla2xxx: Fix unused variable warning in qla2xxx_process_purls_pkt()
scsi: fnic: Remove unused functions fnic_scsi_host_start/end_tag()
scsi: qla2xxx: Fix spelling mistake "tranport" -> "transport"
scsi: fnic: Replace sgreset tag with max_tag_id
scsi: qla2xxx: Remove unused variables in qla24xx_build_scsi_type_6_iocbs()
scsi: qla2xxx: Fix nvme_fc_rcv_ls_req() undefined error
scsi: smartpqi: Change driver version to 2.1.24-046
scsi: smartpqi: Enhance error messages
scsi: smartpqi: Enhance controller offline notification
scsi: smartpqi: Enhance shutdown notification
scsi: smartpqi: Simplify lun_number assignment
scsi: smartpqi: Rename pciinfo to pci_info
scsi: smartpqi: Rename MACRO to clarify purpose
scsi: smartpqi: Add abort handler
...
Merge tag 'driver-core-6.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver symbol lookup fix from Greg KH:
"Here is one last fixup for your tree for 6.6-rc1. It resolves a
problem with the way that symbol_get was changed in the module tree
merge in your tree to fix up the DVB drivers which rely on this old
api to attach new devices.
As the changelog comment says:
In commit 9011e49d54dc ("modules: only allow symbol_get of
EXPORT_SYMBOL_GPL modules") the use of symbol_get is properly
restricted to GPL-only marked symbols. This interacts oddly with the
DVB logic which only uses dvb_attach() to load the dvb driver which
then uses symbol_get().
Fix this up by properly marking all of the dvb_attach attach symbols
as EXPORT_SYMBOL_GPL().
This has been acked by Hans from the V4L driver side, Luis from the
module side, Mauro on the media side, and Christoph said it was the
correct solution, and was tested by the original reporter of the
issue.
It has passed 0-day testing, but has not been in linux-next due to it
only being sent yesterday"
* tag 'driver-core-6.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
media: dvb: symbol fixup for dvb_attach()
Merge tag 'dma-mapping-6.6-2023-09-09' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- move a dma-debug call that prints a message out from a lock that's
causing problems with the lock order in serial drivers (Sergey
Senozhatsky)
- fix the CONFIG_DMA_NUMA_CMA Kconfig entry to have the right
dependency and not default to y (Christoph Hellwig)
- move an ifdef a bit to remove a __maybe_unused that seems to trip up
some sensitivities (Christoph Hellwig)
- revert a bogus check in the CMA allocator (Zhenhua Huang)
* tag 'dma-mapping-6.6-2023-09-09' of git://git.infradead.org/users/hch/dma-mapping:
Revert "dma-contiguous: check for memory region overlap"
dma-pool: remove a __maybe_unused label in atomic_pool_expand
dma-contiguous: fix the Kconfig entry for CONFIG_DMA_NUMA_CMA
dma-debug: don't call __dma_entry_alloc_check_leak() under free_entries_lock
Merge tag 'pci-v6.6-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:
- Add PCI_DYNAMIC_OF_NODES dependency on OF_IRQ to fix sparc64 build
error (Lizhi Hou)
- After coalescing host bridge resources, free any released resources
to avoid a leak (Ross Lagerwall)
- Revert a quirk that prevented NVIDIA T4 GPUs from using Secondary Bus
Reset. The quirk worked around an issue that we now think is related
to the Root Port, not the GPU (Bjorn Helgaas)
* tag 'pci-v6.6-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
Revert "PCI: Mark NVIDIA T4 GPUs to avoid bus reset"
PCI: Free released resource after coalescing
PCI: Fix CONFIG_PCI_DYNAMIC_OF_NODES kconfig dependencies
Merge tag 'ntb-6.6' of https://github.com/jonmason/ntb
Pull NTB updates from Jon Mason:
"Link toggling fixes and debugfs error path fixes"
[ And for everybody like me who always have to remind themselves what
the TLA of the day is, and what NTB stands for - it's a PCIe
"Non-Transparent Bridge" thing - Linus ]
* tag 'ntb-6.6' of https://github.com/jonmason/ntb:
ntb: Check tx descriptors outstanding instead of head/tail for tx queue
ntb: Fix calculation ntb_transport_tx_free_entry()
ntb: Drop packets when qp link is down
ntb: Clean up tx tail index on link down
ntb: amd: Drop unnecessary error check for debugfs_create_dir
NTB: ntb_tool: Switch to memdup_user_nul() helper
dtivers: ntb: fix parameter check in perf_setup_dbgfs()
ntb: Remove error checking for debugfs_create_dir()
In commit 9011e49d54dc ("modules: only allow symbol_get of
EXPORT_SYMBOL_GPL modules") the use of symbol_get is properly restricted
to GPL-only marked symbols. This interacts oddly with the DVB logic
which only uses dvb_attach() to load the dvb driver which then uses
symbol_get().
Fix this up by properly marking all of the dvb_attach attach symbols as
EXPORT_SYMBOL_GPL().
Fixes: 9011e49d54dc ("modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules") Cc: stable <stable@kernel.org> Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: linux-media@vger.kernel.org Cc: linux-modules@vger.kernel.org Acked-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Link: https://lore.kernel.org/r/20230908092035.3815268-2-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'xarray-6.6' of git://git.infradead.org/users/willy/xarray
Pull xarray fixes from Matthew Wilcox:
- Fix a bug encountered by people using bittorrent where they'd get
NULL pointer dereferences on page cache lookups when using XFS
- Two documentation fixes
* tag 'xarray-6.6' of git://git.infradead.org/users/willy/xarray:
idr: fix param name in idr_alloc_cyclic() doc
xarray: Document necessary flag in alloc functions
XArray: Do not return sibling entries from xa_load()
- Fix a regression in partition handling for devices not supporting
partitions (Li)
* tag 'block-6.6-2023-09-08' of git://git.kernel.dk/linux:
drbd: swap bvec_set_page len and offset
block: fix pin count management when merging same-page segments
null_blk: fix poll request timeout handling
s390/dasd: fix string length handling
block: don't add or resize partition on the disk with GENHD_FL_NO_PART
block: remove the call to file_remove_privs in blkdev_write_iter
blk-throttle: consider 'carryover_ios/bytes' in throtl_trim_slice()
blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice()
blk-throttle: fix wrong comparation while 'carryover_ios/bytes' is negative
blk-throttle: print signed value 'carryover_bytes/ios' for user
Merge tag 'io_uring-6.6-2023-09-08' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"A few fixes that should go into the 6.6-rc merge window:
- Fix for a regression this merge window caused by the SQPOLL
affinity patch, where we can race with SQPOLL thread shutdown and
cause an oops when trying to set affinity (Gabriel)
- Fix for a regression this merge window where fdinfo reading with
for a ring setup with IORING_SETUP_NO_SQARRAY will attempt to
deference the non-existing SQ ring array (me)
- Add the patch that allows more finegrained control over who can use
io_uring (Matteo)
- Locking fix for a regression added this merge window for IOPOLL
overflow (Pavel)
- IOPOLL fix for stable, breaking our loop if helper threads are
exiting (Pavel)
Also had a fix for unreaped iopoll requests from io-wq from Ming, but
we found an issue with that and hence it got reverted. Will get this
sorted for a future rc"
* tag 'io_uring-6.6-2023-09-08' of git://git.kernel.dk/linux:
Revert "io_uring: fix IO hang in io_wq_put_and_exit from do_exit()"
io_uring: fix unprotected iopoll overflow
io_uring: break out of iowq iopoll on teardown
io_uring: add a sysctl to disable io_uring system-wide
io_uring/fdinfo: only print ->sq_array[] if it's there
io_uring: fix IO hang in io_wq_put_and_exit from do_exit()
io_uring: Don't set affinity on a dying sqpoll thread
Merge tag 'thermal-6.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more thermal control updates from Rafael Wysocki:
"Eliminate an obsolete thermal zone registration function"
* tag 'thermal-6.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: Drop thermal_zone_device_register()
thermal: Use thermal_tripless_zone_device_register()
thermal: core: Add function for registering tripless thermal zones
thermal: core: Clean up headers of thermal zone registration functions
Merge tag 'pm-6.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Fix an Intel RAPL power capping driver regression introduced during
the 6.5 development cycle (Srinivas Pandruvada)"
* tag 'pm-6.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
powercap: intel_rapl: Fix invalid setting of Power Limit 4
Merge tag 'gpio-fixes-for-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fix from Bartosz Golaszewski:
- fix a regression in irqchip setup in gpio-zynq
* tag 'gpio-fixes-for-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: zynq: restore zynq_gpio_irq_reqres/zynq_gpio_irq_relres callbacks
d5af729dc207 ("PCI: Mark NVIDIA T4 GPUs to avoid bus reset") avoided
Secondary Bus Reset on the T4 because the reset seemed to not work when the
T4 was directly attached to a Root Port.
But NVIDIA thinks the issue is probably related to some issue with the Root
Port, not with the T4. The T4 provides neither PM nor FLR reset, so
masking bus reset compromises this device for assignment scenarios.
Revert d5af729dc207 as requested by Wu Zongyong. This will leave SBR
broken in the specific configuration Wu tested, as it was in v6.5, so Wu
will debug that further.
Merge tag 'sound-fix-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of fixes for 6.6-rc1. All small and easy ones.
- The corrections of the previous PCM iov_iter transitions
- Regression fixes in MIDI 2.0 / USB changes
- Various ASoC codec fixes for Cirrus, Realtek, WCD
- ASoC AMD quirks and ASoC Intel AVS driver workaround"
* tag 'sound-fix-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
ALSA: hda/realtek - ALC287 I2S speaker platform support
ASoC: amd: yc: Fix a non-functional mic on Lenovo 82TL
ASoC: Intel: avs: Provide support for fallback topology
ALSA: seq: Fix snd_seq_expand_var_event() call to user-space
ALSA: usb-audio: Fix potential memory leaks at error path for UMP open
ALSA: hda/cirrus: Fix broken audio on hardware with two CS42L42 codecs.
ASoC: rt5645: NULL pointer access when removing jack
ASoC: amd: yc: Add DMI entries to support Victus by HP Gaming Laptop 15-fb0xxx (8A3E)
MAINTAINERS: Update the MAINTAINERS enties for TEXAS INSTRUMENTS ASoC DRIVERS
ALSA: sb: Fix wrong argument in commented code
ALSA: pcm: Fix error checks of default read/write copy ops
ASoC: Name iov_iter argument as iterator instead of buffer
ASoC: dmaengine: Drop unused iov_iter for process callback
ALSA: hda/tas2781: Use standard clamp() macro
ASoC: cs35l56: Waiting for firmware to boot must be tolerant of I/O errors
ASoC: dt-bindings: fsl_easrc: Add support for imx8mp-easrc
ASoC: cs42l43: Fix missing error code in cs42l43_codec_probe()
ASoC: cs35l45: Rename DACPCM1 Source control
ASoC: cs35l45: Fix "Dead assigment" warning
ASoC: cs35l45: Add support for Chip ID 0x35A460
...
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The main one is a fix for a broken strscpy() conversion that landed in
the merge window and broke early parsing of the kernel command line.
- Fix an incorrect mask in the CXL PMU driver
- Fix a regression in early parsing of the kernel command line
- Fix an IP checksum OoB access reported by syzbot"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: csum: Fix OoB access in IP checksum code for negative lengths
arm64/sysreg: Fix broken strncpy() -> strscpy() conversion
perf: CXL: fix mismatched number of counters mask
Merge tag 'loongarch-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Allow usage of LSX/LASX in the kernel, and use them for
SIMD-optimized RAID5/RAID6 routines
- Add Loongson Binary Translation (LBT) extension support
- Add basic KGDB & KDB support
- Add building with kcov coverage
- Add KFENCE (Kernel Electric-Fence) support
- Add KASAN (Kernel Address Sanitizer) support
- Some bug fixes and other small changes
- Update the default config file
* tag 'loongarch-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (25 commits)
LoongArch: Update Loongson-3 default config file
LoongArch: Add KASAN (Kernel Address Sanitizer) support
LoongArch: Simplify the processing of jumping new kernel for KASLR
kasan: Add (pmd|pud)_init for LoongArch zero_(pud|p4d)_populate process
kasan: Add __HAVE_ARCH_SHADOW_MAP to support arch specific mapping
LoongArch: Add KFENCE (Kernel Electric-Fence) support
LoongArch: Get partial stack information when providing regs parameter
LoongArch: mm: Add page table mapped mode support for virt_to_page()
kfence: Defer the assignment of the local variable addr
LoongArch: Allow building with kcov coverage
LoongArch: Provide kaslr_offset() to get kernel offset
LoongArch: Add basic KGDB & KDB support
LoongArch: Add Loongson Binary Translation (LBT) extension support
raid6: Add LoongArch SIMD recovery implementation
raid6: Add LoongArch SIMD syndrome calculation
LoongArch: Add SIMD-optimized XOR routines
LoongArch: Allow usage of LSX/LASX in the kernel
LoongArch: Define symbol 'fault' as a local label in fpu.S
LoongArch: Adjust {copy, clear}_user exception handler behavior
LoongArch: Use static defined zero page rather than allocated
...
Merge tag 'landlock-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"One test fix and a __counted_by annotation"
* tag 'landlock-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
selftests/landlock: Fix a resource leak
landlock: Annotate struct landlock_rule with __counted_by
soc: renesas: Kconfig: For ARCH_R9A07G043 select the required configs if dependencies are met
To prevent randconfig build issues when enabling the RZ/Five SoC, consider
selecting specific configurations only when their dependencies are
satisfied.
riscv: Kconfig.errata: Add dependency for RISCV_SBI in ERRATA_ANDES config
Andes errata uses sbi_ecalll() which is only available if RISCV_SBI is
enabled. So add an dependency for RISCV_SBI in ERRATA_ANDES config to
avoid any build failures.
riscv: Kconfig.errata: Drop dependency for MMU in ERRATA_ANDES_CMO config
Now that RISCV_DMA_NONCOHERENT conditionally selects DMA_DIRECT_REMAP
ie only if MMU is enabled, we no longer need the MMU dependency in
ERRATA_ANDES_CMO config.
riscv: Kconfig: Select DMA_DIRECT_REMAP only if MMU is enabled
kernel/dma/mapping.c has its use of pgprot_dmacoherent() inside
an #ifdef CONFIG_MMU block. kernel/dma/pool.c has its use of
pgprot_dmacoherent() inside an #ifdef CONFIG_DMA_DIRECT_REMAP block.
So select DMA_DIRECT_REMAP only if MMU is enabled for RISCV_DMA_NONCOHERENT
config.
Merge patch series "bpf, riscv: use BPF prog pack allocator in BPF JIT"
Puranjay Mohan <puranjay12@gmail.com> says:
Here is some data to prove the V2 fixes the problem:
Without this series:
root@rv-selftester:~/src/kselftest/bpf# time ./test_tag
test_tag: OK (40945 tests)
real 7m47.562s
user 0m24.145s
sys 6m37.064s
With this series applied:
root@rv-selftester:~/src/selftest/bpf# time ./test_tag
test_tag: OK (40945 tests)
real 7m29.472s
user 0m25.865s
sys 6m18.401s
BPF programs currently consume a page each on RISCV. For systems with many BPF
programs, this adds significant pressure to instruction TLB. High iTLB pressure
usually causes slow down for the whole system.
Song Liu introduced the BPF prog pack allocator[1] to mitigate the above issue.
It packs multiple BPF programs into a single huge page. It is currently only
enabled for the x86_64 BPF JIT.
I enabled this allocator on the ARM64 BPF JIT[2]. It is being reviewed now.
This patch series enables the BPF prog pack allocator for the RISCV BPF JIT.
======================================================
Performance Analysis of prog pack allocator on RISCV64
======================================================
To test the performance of the BPF prog pack allocator on RV, a stresser
tool[4] linked below was built. This tool loads 8 BPF programs on the system and
triggers 5 of them in an infinite loop by doing system calls.
The runner script starts 20 instances of the above which loads 8*20=160 BPF
programs on the system, 5*20=100 of which are being constantly triggered.
The script is passed a command which would be run in the above environment.
The script was run with following perf command:
./run.sh "perf stat -a \
-e iTLB-load-misses \
-e dTLB-load-misses \
-e dTLB-store-misses \
-e instructions \
--timeout 60000"
The output of the above command is discussed below before and after enabling the
BPF prog pack allocator.
The tests were run on qemu-system-riscv64 with 8 cpus, 16G memory. The rootfs
was created using Bjorn's riscv-cross-builder[5] docker container linked below.
Results
=======
Before enabling prog pack allocator:
------------------------------------
It was expected that the iTLB-load-misses would decrease as now a single huge
page is used to keep all the BPF programs compared to a single page for each
program earlier.
--------------------------------------------
The improvement in iTLB-load-misses: -30.5 %
--------------------------------------------
I repeated this expriment more than 100 times in different setups and the
improvement was always greater than 30%.
This patch series is boot tested on the Starfive VisionFive 2 board[6].
The performance analysis was not done on the board because it doesn't
expose iTLB-load-misses, etc. The stresser program was run on the board to test
the loading and unloading of BPF programs
* b4-shazam-merge:
bpf, riscv: use prog pack allocator in the BPF JIT
riscv: implement a memset like function for text
riscv: extend patch_text_nosync() for multiple pages
bpf: make bpf_prog_pack allocator portable
The following KASLR implementation allows to randomize the kernel mapping:
- virtually: we expect the bootloader to provide a seed in the device-tree
- physically: only implemented in the EFI stub, it relies on the firmware to
provide a seed using EFI_RNG_PROTOCOL. arm64 has a similar implementation
hence the patch 3 factorizes KASLR related functions for riscv to take
advantage.
The new virtual kernel location is limited by the early page table that only
has one PUD and with the PMD alignment constraint, the kernel can only take
< 512 positions.
* b4-shazam-merge:
riscv: libstub: Implement KASLR by using generic functions
libstub: Fix compilation warning for rv32
arm64: libstub: Move KASLR handling functions to kaslr.c
riscv: Dump out kernel offset information on panic
riscv: Introduce virtual kernel mapping KASLR
non-coherent DMA support for AX45MP
====================================
On the Andes AX45MP core, cache coherency is a specification option so it
may not be supported. In this case DMA will fail. To get around with this
issue this patch series does the below:
1] Andes alternative ports is implemented as errata which checks if the
IOCP is missing and only then applies to CMO errata. One vendor specific
SBI EXT (ANDES_SBI_EXT_IOCP_SW_WORKAROUND) is implemented as part of
errata.
Below are the configs which Andes port provides (and are selected by
RZ/Five):
- ERRATA_ANDES
- ERRATA_ANDES_CMO
OpenSBI patch supporting ANDES_SBI_EXT_IOCP_SW_WORKAROUND SBI is now
part v1.3 release.
2] Andes AX45MP core has a Programmable Physical Memory Attributes (PMA)
block that allows dynamic adjustment of memory attributes in the runtime.
It contains a configurable amount of PMA entries implemented as CSR
registers to control the attributes of memory locations in interest.
OpenSBI configures the PMA regions as required and creates a reserve memory
node and propagates it to the higher boot stack.
Currently OpenSBI (upstream) configures the required PMA region and passes
this a shared DMA pool to Linux.
* b4-shazam-merge:
soc: renesas: Kconfig: Select the required configs for RZ/Five SoC
cache: Add L2 cache management for Andes AX45MP RISC-V core
dt-bindings: cache: andestech,ax45mp-cache: Add DT binding documentation for L2 cache controller
riscv: mm: dma-noncoherent: nonstandard cache operations support
riscv: errata: Add Andes alternative ports
riscv: asm: vendorid_list: Add Andes Technology to the vendors list
This patch series is a subset from Arnd's original series [0]. Ive just
picked up the bits required for RISC-V unification of cache flushing.
Remaining patches from the series [0] will be taken care by Arnd soon.
* b4-shazam-merge:
riscv: dma-mapping: switch over to generic implementation
riscv: dma-mapping: skip invalidation before bidirectional DMA
riscv: dma-mapping: only invalidate after DMA, not flush
Merge patch series "RISC-V: Probe for misaligned access speed"
Evan Green <evan@rivosinc.com> says:
The current setting for the hwprobe bit indicating misaligned access
speed is controlled by a vendor-specific feature probe function. This is
essentially a per-SoC table we have to maintain on behalf of each vendor
going forward. Let's convert that instead to something we detect at
runtime.
We have two assembly routines at the heart of our probe: one that
does a bunch of word-sized accesses (without aligning its input buffer),
and the other that does byte accesses. If we can move a larger number of
bytes using misaligned word accesses than we can with the same amount of
time doing byte accesses, then we can declare misaligned accesses as
"fast".
The tradeoff of reducing this maintenance burden is boot time. We spend
4-6 jiffies per core doing this measurement (0-2 on jiffie edge
alignment, and 4 on measurement). The timing loop was based on
raid6_choose_gen(), which uses (16+1)*N jiffies (where N is the number
of algorithms). By taking only the fastest iteration out of all
attempts for use in the comparison, variance between runs is very low.
On my THead C906, it looks like this:
[ 0.047563] cpu0: Ratio of byte access time to unaligned word access is 4.34, unaligned accesses are fast
Several others have chimed in with results on slow machines with the
older algorithm, which took all runs into account, including noise like
interrupts. Even with this variation, results indicate that in all cases
(fast, slow, and emulated) the measured numbers are nowhere near each
other (always multiple factors away).