A runtime monitor can cause a reaction to the detection of an
exception on the model's execution. By default, the monitors have
tracing reactions, printing the monitor output via tracepoints.
But other reactions can be added (on-demand) via this interface.
The user interface resembles the kernel tracing interface and
presents these files:
"available_reactors"
- Reading shows the available reactors, one per line.
For example:
# cat available_reactors
nop
panic
printk
"reacting_on"
- It is an on/off general switch for reactors, disabling
all reactions.
"monitors/MONITOR/reactors"
- List available reactors, with the select reaction for the given
MONITOR inside []. The default one is the nop (no operation)
reactor.
- Writing the name of a reactor enables it to the given
MONITOR.
RV is a lightweight (yet rigorous) method that complements classical
exhaustive verification techniques (such as model checking and
theorem proving) with a more practical approach to complex systems.
RV works by analyzing the trace of the system's actual execution,
comparing it against a formal specification of the system behavior.
RV can give precise information on the runtime behavior of the
monitored system while enabling the reaction for unexpected
events, avoiding, for example, the propagation of a failure on
safety-critical systems.
The development of this interface roots in the development of the
paper:
De Oliveira, Daniel Bristot; Cucinotta, Tommaso; De Oliveira, Romulo
Silva. Efficient formal verification for the Linux kernel. In:
International Conference on Software Engineering and Formal Methods.
Springer, Cham, 2019. p. 315-332.
And:
De Oliveira, Daniel Bristot. Automata-based formal analysis
and verification of the real-time Linux kernel. PhD Thesis, 2020.
The RV interface resembles the tracing/ interface on purpose. The current
path for the RV interface is /sys/kernel/tracing/rv/.
It presents these files:
"available_monitors"
- List the available monitors, one per line.
For example:
# cat available_monitors
wip
wwnr
"enabled_monitors"
- Lists the enabled monitors, one per line;
- Writing to it enables a given monitor;
- Writing a monitor name with a '!' prefix disables it;
- Truncating the file disables all enabled monitors.
Note that more than one monitor can be enabled concurrently.
"monitoring_on"
- It is an on/off general switcher for monitoring. Note
that it does not disable enabled monitors or detach events,
but stop the per-entity monitors of monitoring the events
received from the system. It resembles the "tracing_on" switcher.
"monitors/"
Each monitor will have its one directory inside "monitors/". There
the monitor specific files will be presented.
The "monitors/" directory resembles the "events" directory on
tracefs.
For example:
# cd monitors/wip/
# ls
desc enable
# cat desc
wakeup in preemptive per-cpu testing monitor.
# cat enable
0
For further information, see the comments in the header of
kernel/trace/rv/rv.c from this patch.
Link: https://lkml.kernel.org/r/a4bfe038f50cb047bfb343ad0e12b0e646ab308b.1659052063.git.bristot@kernel.org Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Tao Zhou <tao.zhou@linux.dev> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When a ftrace_bug happens (where ftrace fails to modify a location) it is
helpful to have what was at that location as well as what was expected to
be there.
But with the conversion to text_poke() the variable that assigns the
expected for debugging was dropped. Unfortunately, I noticed this when I
needed it. Add it back.
Link: https://lkml.kernel.org/r/20220726101851.069d2e70@gandalf.local.home Cc: "x86@kernel.org" <x86@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org Fixes: 768ae4406a5c ("x86/ftrace: Use text_poke()") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
tracing: Use a copy of the va_list for __assign_vstr()
If an instance of tracing enables the same trace event as another
instance, or the top level instance, or even perf, then the va_list passed
into some tracepoints can be used more than once.
As va_list can only be traversed once, this can cause issues:
batman-adv: tracing: Use the new __vstring() helper
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.
Link: https://lkml.kernel.org/r/20220724191650.236b1355@rorschach.local.home Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <a@unstable.cc> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: b.a.t.m.a.n@lists.open-mesh.org Cc: netdev@vger.kernel.org Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
USB: mtu3: tracing: Use the new __vstring() helper
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.
selftests/kprobe: Update test for no event name syntax error
The commit 208003254c32 ("selftests/kprobe: Do not test for GRP/
without event failures") removed a syntax which is no more cause
a syntax error (NO_EVENT_NAME error with GRP/).
However, there are another case (NO_EVENT_NAME error without GRP/)
which causes a same error. This adds a test for that case.
selftests/kprobe: Do not test for GRP/ without event failures
A new feature is added where kprobes (and other probes) do not need to
explicitly state the event name when creating a probe. The event name will
come from what is being attached.
That is:
# echo 'p:foo/ vfs_read' > kprobe_events
Will no longer error, but instead create an event:
# cat kprobe_events
p:foo/p_vfs_read_0 vfs_read
This should not be tested as an error case anymore. Remove it from the
selftest as now this feature "breaks" the selftest as it no longer fails
as expected.
Linyu Yuan [Mon, 27 Jun 2022 02:19:07 +0000 (10:19 +0800)]
tracing: Auto generate event name when creating a group of events
Currently when creating a specific group of trace events,
take kprobe event as example, the user must use the following format:
p:GRP/EVENT [MOD:]KSYM[+OFFS]|KADDR [FETCHARGS],
which means user must enter EVENT name, one example is:
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.
Link: https://lkml.kernel.org/r/20220705224751.271015450@goodmis.org Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
scsi: qla2xxx: tracing: Use the new __vstring() helper
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.
Link: https://lkml.kernel.org/r/20220705224750.896553364@goodmis.org Cc: Bart Van Assche <bvanassche@acm.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
scsi: iscsi: tracing: Use the new __vstring() helper
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.
Link: https://lkml.kernel.org/r/20220705224750.715763972@goodmis.org Cc: Fred Herard <fred.herard@oracle.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
usb: musb: tracing: Use the new __vstring() helper
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.
Link: https://lkml.kernel.org/r/20220705224750.532345354@goodmis.org Cc: Bin Liu <b-liu@ti.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-usb@vger.kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.
usb: chipidea: tracing: Use the new __vstring() helper
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.
Link: https://lkml.kernel.org/r/20220705224749.991587733@goodmis.org Cc: Peter Chen <peter.chen@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-usb@vger.kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.
Link: https://lkml.kernel.org/r/20220705224749.806599472@goodmis.org Cc: Gregory Greenman <gregory.greenman@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kalle Valo <kvalo@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.
Link: https://lkml.kernel.org/r/20220705224749.622796175@goodmis.org Cc: Arend van Spriel <aspriel@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Franky Lin <franky.lin@broadcom.com> Cc: Hante Meuleman <hante.meuleman@broadcom.com> Cc: Kalle Valo <kvalo@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: linux-wireless@vger.kernel.org Cc: brcm80211-dev-list.pdl@broadcom.com Cc: SHA-cyfmac-dev-list@infineon.com Cc: netdev@vger.kernel.org Acked-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.
Link: https://lkml.kernel.org/r/20220705224749.430339634@goodmis.org Cc: Kalle Valo <kvalo@kernel.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: ath10k@lists.infradead.org Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org Cc: ath11k@lists.infradead.org Acked-by: Kalle Valo <kvalo@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Instead of open coding a __dynamic_array() with a fixed length (which
defeats the purpose of the dynamic array in the first place). Use the new
__vstring() helper that will use a va_list and only write enough of the
string into the ring buffer that is needed.
Link: https://lkml.kernel.org/r/20220705224749.239494531@goodmis.org Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
To load a string created by variable array va_list.
The main issue with this approach is that "MSG_MAX" usage in the
__dynamic_array() portion. That actually just reserves the MSG_MAX in the
event, and even wastes space because there's dynamic meta data also saved
in the event to denote the offset and size of the dynamic array. It would
have been better to just use a static __array() field.
Instead, create __vstring() and __assign_vstr() that work like __string
and __assign_str() but instead of taking a destination string to copy,
take a format string and a va_list pointer and fill in the values.
To figure out the length to store the string. It may be slightly slower as
it needs to run the vsnprintf() twice, but it now saves space on the ring
buffer.
Link: https://lkml.kernel.org/r/20220705224749.053570613@goodmis.org Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Kalle Valo <kvalo@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Arend van Spriel <aspriel@gmail.com> Cc: Franky Lin <franky.lin@broadcom.com> Cc: Hante Meuleman <hante.meuleman@broadcom.com> Cc: Gregory Greenman <gregory.greenman@intel.com> Cc: Peter Chen <peter.chen@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mathias Nyman <mathias.nyman@intel.com> Cc: Chunfeng Yun <chunfeng.yun@mediatek.com> Cc: Bin Liu <b-liu@ti.com> Cc: Marek Lindner <mareklindner@neomailbox.ch> Cc: Simon Wunderlich <sw@simonwunderlich.de> Cc: Antonio Quartulli <a@unstable.cc> Cc: Sven Eckelmann <sven@narfation.org> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Jim Cromie <jim.cromie@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
neighbor: tracing: Have neigh_create event use __string()
The dev field of the neigh_create event uses __dynamic_array() with a
fixed size, which defeats the purpose of __dynamic_array(). Looking at the
logic, as it already uses __assign_str(), just use the same logic in
__string to create the size needed. It appears that because "dev" can be
NULL, it needs the check. But __string() can have the same checks as
__assign_str() so use them there too.
Link: https://lkml.kernel.org/r/20220705183741.35387e3f@rorschach.local.home Cc: David Ahern <dsahern@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
tracing/ipv4/ipv6: Use static array for name field in fib*_lookup_table event
The fib_lookup_table and fib6_lookup_table events declare name as a
dynamic_array, but also give it a fixed size, which defeats the purpose of
the dynamic array, especially since the dynamic array also includes meta
data in the event to specify its size.
Since the size of the name is at most 16 bytes (defined by IFNAMSIZ),
it is not worth spending the effort to determine the size of the string.
Just use a fixed size array and copy into it. This will save 4 bytes that
are used for the meta data that saves the size and position of a dynamic
array, and even slightly speed up the event processing.
Link: https://lkml.kernel.org/r/20220704091436.3705edbf@rorschach.local.home Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
tracing: devlink: Use static array for string in devlink_trap_report event
The trace event devlink_trap_report uses the __dynamic_array() macro to
determine the size of the input_dev_name field. This is because it needs
to test the dev field for NULL, and will use "NULL" if it is. But it also
has the size of the dynamic array as a fixed IFNAMSIZ bytes. This defeats
the purpose of the dynamic array, as this will reserve that amount of
bytes on the ring buffer, and to make matters worse, it will even save
that size in the event as the event expects it to be dynamic (for which it
is not).
Since IFNAMSIZ is just 16 bytes, just make it a static array and this will
remove the meta data from the event that records the size.
Link: https://lkml.kernel.org/r/20220712185820.002d9fb5@gandalf.local.home Cc: Leon Romanovsky <leon@kernel.org> Cc: Jiri Pirko <jiri@nvidia.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: netdev@vger.kernel.org Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Zheng Yejian [Thu, 30 Jun 2022 01:31:52 +0000 (09:31 +0800)]
tracing/histograms: Simplify create_hist_fields()
When I look into implements of create_hist_fields(), I think there can be
following two simplifications:
1. If something wrong happened in parse_var_defs(), free_var_defs() would
have been called in it, so no need goto free again after calling it;
2. After calling create_key_fields(), regardless of the value of 'ret', it
then always runs into 'out: ', so the judge of 'ret' is redundant.
sunliming [Mon, 6 Jun 2022 07:56:59 +0000 (15:56 +0800)]
fprobe/samples: Make sample_probe static
This symbol is not used outside of fprobe_example.c, so marks it static.
Fixes the following warning:
sparse warnings: (new ones prefixed by >>)
>> samples/fprobe/fprobe_example.c:23:15: sparse: sparse: symbol 'sample_probe'
was not declared. Should it be static?
Link: https://lkml.kernel.org/r/20220606075659.674556-1-sunliming@kylinos.cn Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: sunliming <sunliming@kylinos.cn> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
ftrace: Be more specific about arch impact when function tracer is enabled
It was brought up that on ARMv7, that because the FUNCTION_TRACER does not
use nops to keep function tracing disabled because of the use of a link
register, it does have some performance impact.
The start of functions when -pg is used to compile the kernel is:
Which just puts the stack back to its normal location. But these two
instructions at the start of every function does incur some overhead.
Be more honest in the Kconfig FUNCTION_TRACER description and specify that
the overhead being in the noise was x86 specific, but other architectures
may vary.
If you drop into kdb and type "ftdump" you'll get a sleeping while
atomic warning from memory allocation in trace_find_next_entry().
This appears to have been caused by commit ff895103a84a ("tracing:
Save off entry when peeking at next entry"), which added the
allocation in that path. The problematic commit was already fixed by
commit 8e99cf91b99b ("tracing: Do not allocate buffer in
trace_find_next_entry() in atomic") but that fix missed the kdb case.
The fix here is easy: just move the assignment of the static buffer to
the place where it should have been to begin with:
trace_init_global_iter(). That function is called in two places, once
is right before the assignment of the static buffer added by the
previous fix and once is in kdb.
Note that it appears that there's a second static buffer that we need
to assign that was added in commit efbbdaa22bb7 ("tracing: Show real
address for trace event arguments"), so we'll move that too.
As commit 46bbe5c671e0 ("tracing: fix double free") said, the
"double free" problem reported by clang static analyzer is:
> In parse_var_defs() if there is a problem allocating
> var_defs.expr, the earlier var_defs.name is freed.
> This free is duplicated by free_var_defs() which frees
> the rest of the list.
However, if there is a problem allocating N-th var_defs.expr:
+ in parse_var_defs(), the freed 'earlier var_defs.name' is
actually the N-th var_defs.name;
+ then in free_var_defs(), the names from 0th to (N-1)-th are freed;
IF ALLOCATING PROBLEM HAPPENED HERE!!! -+
\
|
0th 1th (N-1)-th N-th V
+-------------+-------------+-----+-------------+-----------
var_defs: | name | expr | name | expr | ... | name | expr | name | ///
+-------------+-------------+-----+-------------+-----------
These two frees don't act on same name, so there was no "double free"
problem before. Conversely, after that commit, we get a "memory leak"
problem because the above "N-th var_defs.name" is not freed.
If enable CONFIG_DEBUG_KMEMLEAK and inject a fault at where the N-th
var_defs.expr allocated, then execute on shell like:
$ echo 'hist:key=call_site:val=$v1,$v2:v1=bytes_req,v2=bytes_alloc' > \
/sys/kernel/debug/tracing/events/kmem/kmalloc/trigger
lockref: remove unused 'lockref_get_or_lock()' function
Looking at the conditional lock acquire functions in the kernel due to
the new sparse support (see commit 4a557a5d1a61 "sparse: introduce
conditional lock acquire function attribute"), it became obvious that
the lockref code has a couple of them, but they don't match the usual
naming convention for the other ones, and their return value logic is
also reversed.
In the other very similar places, the naming pattern is '*_and_lock()'
(eg 'atomic_put_and_lock()' and 'refcount_dec_and_lock()'), and the
function returns true when the lock is taken.
The lockref code is superficially very similar to the refcount code,
only with the special "atomic wrt the embedded lock" semantics. But
instead of the '*_and_lock()' naming it uses '*_or_lock()'.
And instead of returning true in case it took the lock, it returns true
if it *didn't* take the lock.
Now, arguably the reflock code is quite logical: it really is a "either
decrement _or_ lock" kind of situation - and the return value is about
whether the operation succeeded without any special care needed.
So despite the similarities, the differences do make some sense, and
maybe it's not worth trying to unify the different conditional locking
primitives in this area.
But while looking at this all, it did become obvious that the
'lockref_get_or_lock()' function hasn't actually had any users for
almost a decade.
The only user it ever had was the shortlived 'd_rcu_to_refcount()'
function, and it got removed and replaced with 'lockref_get_not_dead()'
back in 2013 in commits 0d98439ea3c6 ("vfs: use lockred 'dead' flag to
mark unrecoverably dead dentries") and e5c832d55588 ("vfs: fix dentry
RCU to refcounting possibly sleeping dput()")
In fact, that single use was removed less than a week after the whole
function was introduced in commit b3abd80250c1 ("lockref: add
'lockref_get_or_lock() helper") so this function has been around for a
decade, but only had a user for six days.
Let's just put this mis-designed and unused function out of its misery.
We can think about the naming and semantic oddities of the remaining
'lockref_put_or_lock()' later, but at least that function has users.
And while the naming is different and the return value doesn't match,
that function matches the whole '{atomic,refcount}_dec_and_test()'
pattern much better (ie the magic happens when the count goes down to
zero, not when it is incremented from zero).
Linus Torvalds [Thu, 30 Jun 2022 16:34:10 +0000 (09:34 -0700)]
sparse: introduce conditional lock acquire function attribute
The kernel tends to try to avoid conditional locking semantics because
it makes it harder to think about and statically check locking rules,
but we do have a few fundamental locking primitives that take locks
conditionally - most obviously the 'trylock' functions.
That has always been a problem for 'sparse' checking for locking
imbalance, and we've had a special '__cond_lock()' macro that we've used
to let sparse know how the locking works:
so that you can then use this to tell sparse that (for example) the
spinlock trylock macro ends up acquiring the lock when it succeeds, but
not when it fails:
and then sparse can follow along the locking rules when you have code like
if (!spin_trylock(&dentry->d_lock))
return LRU_SKIP;
.. sparse sees that the lock is held here..
spin_unlock(&dentry->d_lock);
and sparse ends up happy about the lock contexts.
However, this '__cond_lock()' use does result in very ugly header files,
and requires you to basically wrap the real function with that macro
that uses '__cond_lock'. Which has made PeterZ NAK things that try to
fix sparse warnings over the years [1].
To solve this, there is now a very experimental patch to sparse that
basically does the exact same thing as '__cond_lock()' did, but using a
function attribute instead. That seems to make PeterZ happy [2].
Note that this does not replace existing use of '__cond_lock()', but
only exposes the new proposed attribute and uses it for the previously
unannotated 'refcount_dec_and_lock()' family of functions.
For existing sparse installations, this will make no difference (a
negative output context was ignored), but if you have the experimental
sparse patch it will make sparse now understand code that uses those
functions, the same way '__cond_lock()' makes sparse understand the very
similar 'atomic_dec_and_lock()' uses that have the old '__cond_lock()'
annotations.
Note that in some cases this will silence existing context imbalance
warnings. But in other cases it may end up exposing new sparse warnings
for code that sparse just didn't see the locking for at all before.
This is a trial, in other words. I'd expect that if it ends up being
successful, and new sparse releases end up having this new attribute,
we'll migrate the old-style '__cond_lock()' users to use the new-style
'__cond_acquires' function attribute.
The actual experimental sparse patch was posted in [3].
Merge tag 'xfs-5.19-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
"This fixes some stalling problems and corrects the last of the
problems (I hope) observed during testing of the new atomic xattr
update feature.
- Fix statfs blocking on background inode gc workers
- Fix some broken inode lock assertion code
- Fix xattr leaf buffer leaks when cancelling a deferred xattr update
operation
- Clean up xattr recovery to make it easier to understand.
- Fix xattr leaf block verifiers tripping over empty blocks.
- Fix a bug where an rt extent crossing EOF was treated as "posteof"
blocks and cleaned unnecessarily.
- Fix a UAF when log shutdown races with unmount"
* tag 'xfs-5.19-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: prevent a UAF when log IO errors race with unmount
xfs: dont treat rt extents beyond EOF as eofblocks to be cleared
xfs: don't hold xattr leaf buffers across transaction rolls
xfs: empty xattr leaf header blocks are not corruption
xfs: clean up the end of xfs_attri_item_recover
xfs: always free xattri_leaf_bp when cancelling a deferred op
xfs: use invalidate_lock to check the state of mmap_lock
xfs: factor out the common lock flags assert
xfs: introduce xfs_inodegc_push()
xfs: bound maximum wait time for inodegc work
Merge tag 'for-5.19/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc architecture fixes from Helge Deller:
"Two important fixes for bugs in code which was added in 5.18:
- Fix userspace signal failures on 32-bit kernel due to a bug in vDSO
- Fix 32-bit load-word unalignment exception handler which returned
wrong values"
* tag 'for-5.19/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Fix vDSO signal breakage on 32-bit kernel
parisc/unaligned: Fix emulate_ldw() breakage
Addition of vDSO support for parisc in kernel v5.18 suddenly broke glibc
signal testcases on a 32-bit kernel.
The trampoline code (sigtramp.S) which is mapped into userspace includes
an offset to the context data on the stack, which is used by gdb and
glibc to get access to registers.
In a 32-bit kernel we used by mistake the offset into the compat context
(which is valid on a 64-bit kernel only) instead of the offset into the
"native" 32-bit context.
Reported-by: John David Anglin <dave.anglin@bell.net> Tested-by: John David Anglin <dave.anglin@bell.net> Fixes: df24e1783e6e ("parisc: Add vDSO support") CC: stable@vger.kernel.org # 5.18 Signed-off-by: Helge Deller <deller@gmx.de>
Merge tag 'perf-tools-fixes-for-v5.19-2022-07-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- BPF program info linear (BPIL) data is accessed assuming 64-bit
alignment resulting in undefined behavior as the data is just byte
aligned. Fix it, Found using -fsanitize=undefined.
- Fix 'perf offcpu' build on old kernels wrt task_struct's
state/__state field.
- Fix perf_event_attr.sample_type setting on the 'offcpu-time' event
synthesized by the 'perf offcpu' tool.
- Don't bail out when synthesizing PERF_RECORD_ events for pre-existing
threads when one goes away while parsing its procfs entries.
- Don't sort the task scan result from /proc, its not needed and
introduces bugs when the main thread isn't the first one to be
processed.
- Fix uninitialized 'offset' variable on aarch64 in the unwind code.
- Sync KVM headers with the kernel sources.
* tag 'perf-tools-fixes-for-v5.19-2022-07-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf synthetic-events: Ignore dead threads during event synthesis
perf synthetic-events: Don't sort the task scan result from /proc
perf unwind: Fix unitialized 'offset' variable on aarch64
tools headers UAPI: Sync linux/kvm.h with the kernel sources
perf bpf: 8 byte align bpil data
tools kvm headers arm64: Update KVM headers from the kernel sources
perf offcpu: Accept allowed sample types only
perf offcpu: Fix build failure on old kernels
Merge tag 'powerpc-5.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Fix BPF uapi confusion about the correct type of bpf_user_pt_regs_t.
- Fix virt_addr_valid() when memory is hotplugged above the boot-time
high_memory value.
- Fix a bug in 64-bit Book3E map_kernel_page() which would incorrectly
allocate a PMD page at PUD level.
- Fix a couple of minor issues found since we enabled KASAN for 64-bit
Book3S.
Thanks to Aneesh Kumar K.V, Cédric Le Goater, Christophe Leroy, Kefeng
Wang, Liam Howlett, Nathan Lynch, and Naveen N. Rao.
* tag 'powerpc-5.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/memhotplug: Add add_pages override for PPC
powerpc/bpf: Fix use of user_pt_regs in uapi
powerpc/prom_init: Fix kernel config grep
powerpc/book3e: Fix PUD allocation size in map_kernel_page()
powerpc/xive/spapr: correct bitmap allocation size
Namhyung Kim [Fri, 1 Jul 2022 20:54:58 +0000 (13:54 -0700)]
perf synthetic-events: Ignore dead threads during event synthesis
When it synthesize various task events, it scans the list of task
first and then accesses later. There's a window threads can die
between the two and proc entries may not be available.
Instead of bailing out, we can ignore that thread and move on.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220701205458.985106-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Fri, 1 Jul 2022 20:54:57 +0000 (13:54 -0700)]
perf synthetic-events: Don't sort the task scan result from /proc
It should not sort the result as procfs already returns a proper
ordering of tasks. Actually sorting the order caused problems that it
doesn't guararantee to process the main thread first.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220701205458.985106-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ivan Babrou [Fri, 1 Jul 2022 18:20:46 +0000 (11:20 -0700)]
perf unwind: Fix unitialized 'offset' variable on aarch64
Commit dc2cf4ca866f5715 ("perf unwind: Fix segbase for ld.lld linked
objects") uncovered the following issue on aarch64:
util/unwind-libunwind-local.c: In function 'find_proc_info':
util/unwind-libunwind-local.c:386:28: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
386 | if (ofs > 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
util/unwind-libunwind-local.c:371:20: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
371 | if (ofs <= 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
util/unwind-libunwind-local.c:363:20: error: 'offset' may be used uninitialized in this function [-Werror=maybe-uninitialized]
363 | if (ofs <= 0) {
| ^
util/unwind-libunwind-local.c:199:22: note: 'offset' was declared here
199 | u64 address, offset;
| ^~~~~~
In file included from util/libunwind/arm64.c:37:
Fixes: dc2cf4ca866f5715 ("perf unwind: Fix segbase for ld.lld linked objects") Signed-off-by: Ivan Babrou <ivan@cloudflare.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Fangrui Song <maskray@google.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: kernel-team@cloudflare.com Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220701182046.12589-1-ivan@cloudflare.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'libnvdimm-fixes-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Vishal Verma:
- Fix a bug in the libnvdimm 'BTT' (Block Translation Table) driver
where accounting for poison blocks to be cleared was off by one,
causing a failure to clear the the last badblock in an nvdimm region.
* tag 'libnvdimm-fixes-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm: Fix badblocks clear off-by-one error
Merge tag 'thermal-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control fix from Rafael Wysocki:
"Add a new CPU ID to the list of supported processors in the
intel_tcc_cooling driver (Sumeet Pawnikar)"
* tag 'thermal-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: intel_tcc_cooling: Add TCC cooling support for RaptorLake
Merge tag 'pm-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix some issues in cpufreq drivers and some issues in devfreq:
- Fix error code path issues related PROBE_DEFER handling in devfreq
(Christian Marangi)
- Revert an editing accident in SPDX-License line in the devfreq
passive governor (Lukas Bulwahn)
- Fix refcount leak in of_get_devfreq_events() in the exynos-ppmu
devfreq driver (Miaoqian Lin)
- Use HZ_PER_KHZ macro in the passive devfreq governor (Yicong Yang)
- Fix missing of_node_put for qoriq and pmac32 driver (Liang He)
- Fix issues around throttle interrupt for qcom driver (Stephen Boyd)
- Add MT8186 to cpufreq-dt-platdev blocklist (AngeloGioacchino Del
Regno)
- Make amd-pstate enable CPPC on resume from S3 (Jinzhou Su)"
* tag 'pm-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / devfreq: passive: revert an editing accident in SPDX-License line
PM / devfreq: Fix kernel warning with cpufreq passive register fail
PM / devfreq: Rework freq_table to be local to devfreq struct
PM / devfreq: exynos-ppmu: Fix refcount leak in of_get_devfreq_events
PM / devfreq: passive: Use HZ_PER_KHZ macro in units.h
PM / devfreq: Fix cpufreq passive unregister erroring on PROBE_DEFER
PM / devfreq: Mute warning on governor PROBE_DEFER
PM / devfreq: Fix kernel panic with cpu based scaling to passive gov
cpufreq: Add MT8186 to cpufreq-dt-platdev blocklist
cpufreq: pmac32-cpufreq: Fix refcount leak bug
cpufreq: qcom-hw: Don't do lmh things without a throttle interrupt
drivers: cpufreq: Add missing of_node_put() in qoriq-cpufreq.c
cpufreq: amd-pstate: Add resume and suspend callbacks
Merge tag 'hwmon-for-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix error handling in ibmaem driver initialization
- Fix bad data reported by occ driver after setting power cap
- Fix typos in pmbus/ucd9200 driver comments
* tag 'hwmon-for-v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ibmaem) don't call platform_device_del() if platform_device_add() fails
hwmon: (pmbus/ucd9200) fix typos in comments
hwmon: (occ) Prevent power cap command overwriting poll response
Yang Yingliang [Fri, 1 Jul 2022 07:41:53 +0000 (15:41 +0800)]
hwmon: (ibmaem) don't call platform_device_del() if platform_device_add() fails
If platform_device_add() fails, it no need to call platform_device_del(), split
platform_device_unregister() into platform_device_del/put(), so platform_device_put()
can be called separately.
Fixes: 8808a793f052 ("ibmaem: new driver for power/energy/temp meters in IBM System X hardware") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20220701074153.4021556-1-yangyingliang@huawei.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Merge tag 's390-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Fix purgatory build process so bin2c tool does not get built
unnecessarily and the Makefile is more consistent with other
architectures.
- Return earlier simple design of arch_get_random_seed_long|int() and
arch_get_random_long|int() callbacks as result of changes in generic
RNG code.
- Fix minor comment typos and spelling mistakes.
* tag 's390-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/qdio: Fix spelling mistake
s390/sclp: Fix typo in comments
s390/archrandom: simplify back to earlier design and initialize earlier
s390/purgatory: remove duplicated build rule of kexec-purgatory.o
s390/purgatory: hard-code obj-y in Makefile
s390: remove unneeded 'select BUILD_BIN2C'
Merge tag 'nfs-for-5.19-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
- Allocate a fattr for _nfs4_discover_trunking()
- Fix module reference count leak in nfs4_run_state_manager()
* tag 'nfs-for-5.19-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFSv4: Add an fattr allocation to _nfs4_discover_trunking()
NFS: restore module put when manager exits.
Merge tag 'for-5.19/dm-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"Three fixes for invalid memory accesses discovered by using KASAN
while running the lvm2 testsuite's dm-raid tests. Includes changes to
MD's raid5.c given the dependency dm-raid has on the MD code"
* tag 'for-5.19/dm-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm raid: fix KASAN warning in raid5_add_disks
dm raid: fix KASAN warning in raid5_remove_disk
dm raid: fix accesses beyond end of raid member array
Merge tag 'io_uring-5.19-2022-07-01' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Two minor tweaks:
- While we still can, adjust the send/recv based flags to be in
->ioprio rather than in ->addr2. This is consistent with eg accept,
and also doesn't waste a full 64-bit field for flags (Pavel)
- 5.18-stable fix for re-importing provided buffers. Not much real
world relevance here as it'll only impact non-pollable files gone
async, which is more of a practical test case rather than something
that is used in the wild (Dylan)"
* tag 'io_uring-5.19-2022-07-01' of git://git.kernel.dk/linux-block:
io_uring: fix provided buffer import
io_uring: keep sendrecv flags in ioprio
Merge tag 'block-5.19-2022-07-01' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Fix for batch getting of tags in sbitmap (wuchi)
- NVMe pull request via Christoph:
- More quirks (Lamarque Vieira Souza, Pablo Greco)
- Fix a fabrics disconnect regression (Ruozhu Li)
- Fix a nvmet-tcp data_digest calculation regression (Sagi
Grimberg)
- Fix nvme-tcp send failure handling (Sagi Grimberg)
- Fix a regression with nvmet-loop and passthrough controllers
(Alan Adamson)
* tag 'block-5.19-2022-07-01' of git://git.kernel.dk/linux-block:
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA IM2P33F8ABR1
nvmet: add a clear_ids attribute for passthru targets
nvme: fix regression when disconnect a recovering ctrl
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG SX6000LNP (AKA SPECTRIX S40G)
nvme-tcp: always fail a request when sending it failed
nvmet-tcp: fix regression in data_digest calculation
lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()
Will Deacon [Wed, 29 Jun 2022 09:53:49 +0000 (10:53 +0100)]
arm64: hugetlb: Restore TLB invalidation for BBM on contiguous ptes
Commit fb396bb459c1 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()")
removed TLB invalidation from get_clear_flush() [now get_clear_contig()]
on the basis that the core TLB invalidation code is aware of hugetlb
mappings backed by contiguous page-table entries and will cover the
correct virtual address range.
However, this change also resulted in the TLB invalidation being removed
from the "break" step in the break-before-make (BBM) sequence used
internally by huge_ptep_set_{access_flags,wrprotect}(), therefore
making the BBM sequence unsafe irrespective of later invalidation.
Although the architecture is desperately unclear about how exactly
contiguous ptes should be updated in a live page-table, restore TLB
invalidation to our BBM sequence under the assumption that BBM is the
right thing to be doing in the first place.
Fixes: fb396bb459c1 ("arm64/hugetlb: Drop TLB flush from get_clear_flush()") Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Steve Capper <steve.capper@arm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Marc Zyngier <maz@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20220629095349.25748-1-will@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"Two small fixes
- Initialize a spinlock in the stm32 reset code
- Add dt bindings to the clk maintainer filepattern"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
MAINTAINERS: add include/dt-bindings/clock to COMMON CLK FRAMEWORK
clk: stm32: rcc_reset: Fix missing spin_lock_init()
This appears to be a race between the unmount process, which frees the
CIL and waits for in-flight iclog IO; and the iclog IO completion. When
generic/475 runs, it starts fsstress in the background, waits a few
seconds, and substitutes a dm-error device to simulate a disk falling
out of a machine. If the fsstress encounters EIO on a pure data write,
it will exit but the filesystem will still be online.
The next thing the test does is unmount the filesystem, which tries to
clean the log, free the CIL, and wait for iclog IO completion. If an
iclog was being written when the dm-error switch occurred, it can race
with log unmounting as follows:
Thread 1 Thread 2
xfs_log_unmount
xfs_log_clean
xfs_log_quiesce
xlog_ioend_work
<observe error>
xlog_force_shutdown
test_and_set_bit(XLOG_IOERROR)
xfs_log_force
<log is shut down, nop>
xfs_log_umount_write
<log is shut down, nop>
xlog_dealloc_log
xlog_cil_destroy
<wait for iclogs>
spin_lock(&log->l_cilp->xc_push_lock)
<KABOOM>
Therefore, free the CIL after waiting for the iclogs to complete. I
/think/ this race has existed for quite a few years now, though I don't
remember the ~2014 era logging code well enough to know if it was a real
threat then or if the actual race was exposed only more recently.
Fixes: ac983517ec59 ("xfs: don't sleep in xlog_cil_force_lsn on shutdown") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
Merge tag 'drm-fixes-2022-07-01' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Bit quieter this week, the main thing is it pulls in the fixes for the
sysfb resource issue you were seeing. these had been queued for next
so should have had some decent testing.
Otherwise amdgpu, i915 and msm each have a few fixes, and vc4 has one.
fbdev:
- sysfb fixes/conflicting fb fixes
amdgpu:
- GPU recovery fix
- Fix integer type usage in fourcc header for AMD modifiers
- Fix setting cache_dirty for dma-buf objects on discrete
msm:
- Fix to increment vsync_cnt before calling drm_crtc_handle_vblank so
that userspace sees the value *after* it is incremented if waiting
for vblank events
- Fix to reset drm_dev to NULL in dp_display_unbind to avoid a crash
in probe/bind error paths
- Fix to resolve the smatch error of de-referencing before NULL check
in dpu_encoder_phys_wb.c
- Fix error return to userspace if fence-id allocation fails in
submit ioctl
vc4:
- NULL ptr dereference fix"
* tag 'drm-fixes-2022-07-01' of git://anongit.freedesktop.org/drm/drm:
Revert "drm/amdgpu/display: set vblank_disable_immediate for DC"
drm/amdgpu: To flush tlb for MMHUB of RAVEN series
drm/fourcc: fix integer type usage in uapi header
drm/amdgpu: fix adev variable used in amdgpu_device_gpu_recover()
fbdev: Disable sysfb device registration when removing conflicting FBs
firmware: sysfb: Add sysfb_disable() helper function
firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointer
drm/msm/gem: Fix error return on fence id alloc fail
drm/i915: tweak the ordering in cpu_write_needs_clflush
drm/i915/dgfx: Disable d3cold at gfx root port
drm/i915/gem: add missing else
drm/vc4: perfmon: Fix variable dereferenced before check
drm/msm/dpu: Fix variable dereferenced before check
drm/msm/dp: reset drm_dev to NULL at dp_display_unbind()
drm/msm/dpu: Increment vsync_cnt before waking up userspace
Amir Goldstein [Thu, 30 Jun 2022 19:58:49 +0000 (22:58 +0300)]
vfs: fix copy_file_range() regression in cross-fs copies
A regression has been reported by Nicolas Boichat, found while using the
copy_file_range syscall to copy a tracefs file.
Before commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
devices") the kernel would return -EXDEV to userspace when trying to
copy a file across different filesystems. After this commit, the
syscall doesn't fail anymore and instead returns zero (zero bytes
copied), as this file's content is generated on-the-fly and thus reports
a size of zero.
Another regression has been reported by He Zhe - the assertion of
WARN_ON_ONCE(ret == -EOPNOTSUPP) can be triggered from userspace when
copying from a sysfs file whose read operation may return -EOPNOTSUPP.
Since we do not have test coverage for copy_file_range() between any two
types of filesystems, the best way to avoid these sort of issues in the
future is for the kernel to be more picky about filesystems that are
allowed to do copy_file_range().
This patch restores some cross-filesystem copy restrictions that existed
prior to commit 5dae222a5ff0 ("vfs: allow copy_file_range to copy across
devices"), namely, cross-sb copy is not allowed for filesystems that do
not implement ->copy_file_range().
Filesystems that do implement ->copy_file_range() have full control of
the result - if this method returns an error, the error is returned to
the user. Before this change this was only true for fs that did not
implement the ->remap_file_range() operation (i.e. nfsv3).
Filesystems that do not implement ->copy_file_range() still fall-back to
the generic_copy_file_range() implementation when the copy is within the
same sb. This helps the kernel can maintain a more consistent story
about which filesystems support copy_file_range().
nfsd and ksmbd servers are modified to fall-back to the
generic_copy_file_range() implementation in case vfs_copy_file_range()
fails with -EOPNOTSUPP or -EXDEV, which preserves behavior of
server-side-copy.
fall-back to generic_copy_file_range() is not implemented for the smb
operation FSCTL_DUPLICATE_EXTENTS_TO_FILE, which is arguably a correct
change of behavior.
Chuck Lever [Thu, 30 Jun 2022 20:48:18 +0000 (16:48 -0400)]
SUNRPC: Fix READ_PLUS crasher
Looks like there are still cases when "space_left - frag1bytes" can
legitimately exceed PAGE_SIZE. Ensure that xdr->end always remains
within the current encode buffer.
Reported-by: Bruce Fields <bfields@fieldses.org> Reported-by: Zorro Lang <zlang@redhat.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216151 Fixes: 6c254bf3b637 ("SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Scott Mayhew [Mon, 27 Jun 2022 21:31:29 +0000 (17:31 -0400)]
NFSv4: Add an fattr allocation to _nfs4_discover_trunking()
This was missed in c3ed222745d9 ("NFSv4: Fix free of uninitialized
nfs4_label on referral lookup.") and causes a panic when mounting
with '-o trunkdiscovery':
Fixes: c3ed222745d9 ("NFSv4: Fix free of uninitialized nfs4_label on referral lookup.") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NeilBrown [Thu, 23 Jun 2022 04:47:34 +0000 (14:47 +1000)]
NFS: restore module put when manager exits.
Commit f49169c97fce ("NFSD: Remove svc_serv_ops::svo_module") removed
calls to module_put_and_kthread_exit() from threads that acted as SUNRPC
servers and had a related svc_serv_ops structure. This was correct.
It ALSO removed the module_put_and_kthread_exit() call from
nfs4_run_state_manager() which is NOT a SUNRPC service.
Consequently every time the NFSv4 state manager runs the module count
increments and won't be decremented. So the nfsv4 module cannot be
unloaded.
So restore the module_put_and_kthread_exit() call.
Jens Axboe [Thu, 30 Jun 2022 20:00:11 +0000 (14:00 -0600)]
Merge tag 'nvme-5.19-2022-06-30' of git://git.infradead.org/nvme into block-5.19
Pull NVMe fixes from Christoph:
"nvme fixes for Linux 5.19
- more quirks (Lamarque Vieira Souza, Pablo Greco)
- fix a fabrics disconnect regression (Ruozhu Li)
- fix a nvmet-tcp data_digest calculation regression (Sagi Grimberg)
- fix nvme-tcp send failure handling (Sagi Grimberg)
- fix a regression with nvmet-loop and passthrough controllers
(Alan Adamson)"
* tag 'nvme-5.19-2022-06-30' of git://git.infradead.org/nvme:
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA IM2P33F8ABR1
nvmet: add a clear_ids attribute for passthru targets
nvme: fix regression when disconnect a recovering ctrl
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA XPG SX6000LNP (AKA SPECTRIX S40G)
nvme-tcp: always fail a request when sending it failed
nvmet-tcp: fix regression in data_digest calculation
Vladimir Oltean [Wed, 29 Jun 2022 18:30:07 +0000 (21:30 +0300)]
net: dsa: felix: fix race between reading PSFP stats and port stats
Both PSFP stats and the port stats read by ocelot_check_stats_work() are
indirectly read through the same mechanism - write to STAT_CFG:STAT_VIEW,
read from SYS:STAT:CNT[n].
It's just that for port stats, we write STAT_VIEW with the index of the
port, and for PSFP stats, we write STAT_VIEW with the filter index.
So if we allow them to run concurrently, ocelot_check_stats_work() may
change the view from vsc9959_psfp_counters_get(), and vice versa.
Jakub Kicinski [Wed, 29 Jun 2022 18:19:10 +0000 (11:19 -0700)]
net: tun: avoid disabling NAPI twice
Eric reports that syzbot made short work out of my speculative
fix. Indeed when queue gets detached its tfile->tun remains,
so we would try to stop NAPI twice with a detach(), close()
sequence.
Alternative fix would be to move tun_napi_disable() to
tun_detach_all() and let the NAPI run after the queue
has been detached.
Fixes: a8fc8cb5692a ("net: tun: stop NAPI when detaching queues") Reported-by: syzbot <syzkaller@googlegroups.com> Reported-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220629181911.372047-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
s390/archrandom: simplify back to earlier design and initialize earlier
s390x appears to present two RNG interfaces:
- a "TRNG" that gathers entropy using some hardware function; and
- a "DRBG" that takes in a seed and expands it.
Previously, the TRNG was wired up to arch_get_random_{long,int}(), but
it was observed that this was being called really frequently, resulting
in high overhead. So it was changed to be wired up to arch_get_random_
seed_{long,int}(), which was a reasonable decision. Later on, the DRBG
was then wired up to arch_get_random_{long,int}(), with a complicated
buffer filling thread, to control overhead and rate.
Fortunately, none of the performance issues matter much now. The RNG
always attempts to use arch_get_random_seed_{long,int}() first, which
means a complicated implementation of arch_get_random_{long,int}() isn't
really valuable or useful to have around. And it's only used when
reseeding, which means it won't hit the high throughput complications
that were faced before.
So this commit returns to an earlier design of just calling the TRNG in
arch_get_random_seed_{long,int}(), and returning false in arch_get_
random_{long,int}().
Part of what makes the simplification possible is that the RNG now seeds
itself using the TRNG at bootup. But this only works if the TRNG is
detected early in boot, before random_init() is called. So this commit
also causes that check to happen in setup_arch().
Cc: stable@vger.kernel.org Cc: Harald Freudenberger <freude@linux.ibm.com> Cc: Ingo Franzki <ifranzki@linux.ibm.com> Cc: Juergen Christ <jchrist@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20220610222023.378448-1-Jason@zx2c4.com Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Linus Torvalds [Thu, 30 Jun 2022 17:03:22 +0000 (10:03 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Three minor bug fixes:
- qedr not setting the QP timeout properly toward userspace
- Memory leak on error path in ib_cm
- Divide by 0 in RDMA interrupt moderation"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
linux/dim: Fix divide by 0 in RDMA DIM
RDMA/cm: Fix memory leak in ib_cm_insert_listen
RDMA/qedr: Fix reporting QP timeout attribute
Linus Torvalds [Thu, 30 Jun 2022 16:57:18 +0000 (09:57 -0700)]
Merge tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fanotify fix from Jan Kara:
"A fix for recently added fanotify API to have stricter checks and
refuse some invalid flag combinations to make our life easier in the
future"
* tag 'fsnotify_for_v5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fanotify: refine the validation checks on non-dir inode mask
Linus Torvalds [Thu, 30 Jun 2022 16:45:42 +0000 (09:45 -0700)]
Merge tag 'v5.19-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"Fix a regression that breaks the ccp driver"
* tag 'v5.19-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ccp - Fix device IRQ counting by using platform_irq_count()
Merge tag 'devfreq-fixes-for-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux
Pull devfreq fixes for 5.19-rc5 from Chanwoo Choi:
"1. Fix devfreq passive governor issue when cpufreq policies are not
ready during kernel boot because some CPUs turn on after kernel
booting or others.
- Re-initialize the vairables of struct devfreq_passive_data when
PROBE_DEFER happens when cpufreq_get() returns NULL.
- Use dev_err_probe to mute warning when PROBE_DEFER.
- Fix cpufreq passive unregister erroring on PROBE_DEFER
by using the allocated parent_cpu_data list to free resouce
instead of for_each_possible_cpu().
- Remove duplicate cpufreq passive unregister and warning when
PROBE_DEFER.
- Use HZ_PER_KZH macro in units.h.
- Fix wrong indentation in SPDX-License line.
2. Fix reference count leak in exynos-ppmu.c by using of_node_put().
3. Rework freq_table to be local to devfreq struct
- struct devfreq_dev_profile includes freq_table array to store
the supported frequencies. If devfreq driver doesn't initialize
the freq_table, devfreq core allocates the memory and initializes
the freq_table.
On a devfreq PROBE_DEFER, the freq_table in the driver profile
struct is never reset and may be left in an undefined state. To fix
this and correctly handle PROBE_DEFER, use a local freq_table and
max_state in the devfreq struct."
* tag 'devfreq-fixes-for-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/linux:
PM / devfreq: passive: revert an editing accident in SPDX-License line
PM / devfreq: Fix kernel warning with cpufreq passive register fail
PM / devfreq: Rework freq_table to be local to devfreq struct
PM / devfreq: exynos-ppmu: Fix refcount leak in of_get_devfreq_events
PM / devfreq: passive: Use HZ_PER_KHZ macro in units.h
PM / devfreq: Fix cpufreq passive unregister erroring on PROBE_DEFER
PM / devfreq: Mute warning on governor PROBE_DEFER
PM / devfreq: Fix kernel panic with cpu based scaling to passive gov
Pavel Begunkov [Thu, 30 Jun 2022 12:25:57 +0000 (13:25 +0100)]
io_uring: keep sendrecv flags in ioprio
We waste a u64 SQE field for flags even though we don't need as many
bits and it can be used for something more useful later. Store io_uring
specific send/recv flags in sqe->ioprio instead of ->addr2.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Fixes: 0455d4ccec54 ("io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg")
[axboe: change comment in io_uring.h as well] Signed-off-by: Jens Axboe <axboe@kernel.dk>
Petr Machata [Wed, 29 Jun 2022 07:02:05 +0000 (10:02 +0300)]
mlxsw: spectrum_router: Fix rollback in tunnel next hop init
In mlxsw_sp_nexthop6_init(), a next hop is always added to the router
linked list, and mlxsw_sp_nexthop_type_init() is invoked afterwards. When
that function results in an error, the next hop will not have been removed
from the linked list. As the error is propagated upwards and the caller
frees the next hop object, the linked list ends up holding an invalid
object.
A similar issue comes up with mlxsw_sp_nexthop4_init(), where rollback
block does exist, however does not include the linked list removal.
Both IPv6 and IPv4 next hops have a similar issue with next-hop counter
rollbacks. As these were introduced in the same patchset as the next hop
linked list, include the cleanup in this patch.
Fixes: dbe4598c1e92 ("mlxsw: spectrum_router: Keep nexthops in a linked list") Fixes: a5390278a5eb ("mlxsw: spectrum: Add support for setting counters on nexthops") Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220629070205.803952-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Duoming Zhou [Wed, 29 Jun 2022 00:26:40 +0000 (08:26 +0800)]
net: rose: fix UAF bugs caused by timer handler
There are UAF bugs in rose_heartbeat_expiry(), rose_timer_expiry()
and rose_idletimer_expiry(). The root cause is that del_timer()
could not stop the timer handler that is running and the refcount
of sock is not managed properly.
This patch adds refcount of sock when we use functions
such as rose_start_heartbeat() and so on to start timer,
and decreases the refcount of sock when timer is finished
or deleted by functions such as rose_stop_heartbeat()
and so on. As a result, the UAF bugs could be mitigated.
Jose Alonso [Tue, 28 Jun 2022 15:13:02 +0000 (12:13 -0300)]
net: usb: ax88179_178a: Fix packet receiving
This patch corrects packet receiving in ax88179_rx_fixup.
- problem observed:
ifconfig shows allways a lot of 'RX Errors' while packets
are received normally.
This occurs because ax88179_rx_fixup does not recognise properly
the usb urb received.
The packets are normally processed and at the end, the code exits
with 'return 0', generating RX Errors.
(pkt_cnt==-2 and ptk_hdr over field rx_hdr trying to identify
another packet there)
The dump shows that pkt_cnt is the number of entrys in the
per-packet metadata. It is "2 * packet count".
Each packet have two entrys. The first have a valid
value (pkt_len and AX_RXHDR_*) and the second have a
dummy-header 0x80000000 (pkt_len=0 with AX_RXHDR_DROP_ERR).
Why exists dummy-header for each packet?!?
My guess is that this was done probably to align the
entry for each packet to 64-bits and maintain compatibility
with old firmware.
There is also a padding (0x00000000) before the rx_hdr to
align the end of rx_hdr to 64-bit.
Note that packets have a alignment of 64-bits (8-bytes).
This patch assumes that the dummy-header and the last
padding are optional. So it preserves semantics and
recognises the same valid packets as the current code.
This patch was made using only the dumpfile information and
tested with only one device:
0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet
Fixes: 57bc3d3ae8c1 ("net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup") Fixes: e2ca90c276e1 ("ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver") Signed-off-by: Jose Alonso <joalonsof@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/d6970bb04bf67598af4d316eaeb1792040b18cfd.camel@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
nvme-pci: add NVME_QUIRK_BOGUS_NID for ADATA IM2P33F8ABR1
ADATA IM2P33F8ABR1 reports bogus eui64 values that appear to be the same
across all drives. Quirk them out so they are not marked as "non globally
unique" duplicates.
Co-developed-by: Felipe de Jesus Araujo da Conceição <felipe.conceicao@petrosoftdesign.com> Signed-off-by: Felipe de Jesus Araujo da Conceição <felipe.conceicao@petrosoftdesign.com> Signed-off-by: Lamarque V. Souza <lamarque.souza@petrosoftdesign.com> Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de>
Alan Adamson [Mon, 27 Jun 2022 23:25:43 +0000 (16:25 -0700)]
nvmet: add a clear_ids attribute for passthru targets
If the clear_ids attribute is set to true, the EUI/GUID/UUID is cleared
for the passthru target. By default, loop targets will set clear_ids to
true.
This resolves an issue where a connect to a passthru target fails when
using a trtype of 'loop' because EUI/GUID/UUID is not unique.
Fixes: 2079f41ec6ff ("nvme: check that EUI/GUID/UUID are globally unique") Signed-off-by: Alan Adamson <alan.adamson@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Yevhen Orlov [Wed, 29 Jun 2022 01:29:14 +0000 (04:29 +0300)]
net: bonding: fix use-after-free after 802.3ad slave unbind
commit 0622cab0341c ("bonding: fix 802.3ad aggregator reselection"),
resolve case, when there is several aggregation groups in the same bond.
bond_3ad_unbind_slave will invalidate (clear) aggregator when
__agg_active_ports return zero. So, ad_clear_agg can be executed even, when
num_of_ports!=0. Than bond_3ad_unbind_slave can be executed again for,
previously cleared aggregator. NOTE: at this time bond_3ad_unbind_slave
will not update slave ports list, because lag_ports==NULL. So, here we
got slave ports, pointing to freed aggregator memory.
Fix with checking actual number of ports in group (as was before
commit 0622cab0341c ("bonding: fix 802.3ad aggregator reselection") ),
before ad_clear_agg().
Oleksij Rempel [Tue, 28 Jun 2022 11:43:49 +0000 (13:43 +0200)]
net: phy: ax88772a: fix lost pause advertisement configuration
In case of asix_ax88772a_link_change_notify() workaround, we run soft
reset which will automatically clear MII_ADVERTISE configuration. The
PHYlib framework do not know about changed configuration state of the
PHY, so we need use phy_init_hw() to reinit PHY configuration.
Fixes: dde258469257 ("net: usb/phy: asix: add support for ax88772A/C PHYs") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220628114349.3929928-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Lukas Wunner [Tue, 28 Jun 2022 10:15:08 +0000 (12:15 +0200)]
net: phy: Don't trigger state machine while in suspend
Upon system sleep, mdio_bus_phy_suspend() stops the phy_state_machine(),
but subsequent interrupts may retrigger it:
They may have been left enabled to facilitate wakeup and are not
quiesced until the ->suspend_noirq() phase. Unwanted interrupts may
hence occur between mdio_bus_phy_suspend() and dpm_suspend_noirq(),
as well as between dpm_resume_noirq() and mdio_bus_phy_resume().
Retriggering the phy_state_machine() through an interrupt is not only
undesirable for the reason given in mdio_bus_phy_suspend() (freezing it
midway with phydev->lock held), but also because the PHY may be
inaccessible after it's suspended: Accesses to USB-attached PHYs are
blocked once usb_suspend_both() clears the can_submit flag and PHYs on
PCI network cards may become inaccessible upon suspend as well.
Amend phy_interrupt() to avoid triggering the state machine if the PHY
is suspended. Signal wakeup instead if the attached net_device or its
parent has been configured as a wakeup source. (Those conditions are
identical to mdio_bus_phy_may_suspend().) Postpone handling of the
interrupt until the PHY has resumed.
Before stopping the phy_state_machine() in mdio_bus_phy_suspend(),
wait for a concurrent phy_interrupt() to run to completion. That is
necessary because phy_interrupt() may have checked the PHY's suspend
status before the system sleep transition commenced and it may thus
retrigger the state machine after it was stopped.
Likewise, after re-enabling interrupt handling in mdio_bus_phy_resume(),
wait for a concurrent phy_interrupt() to complete to ensure that
interrupts which it postponed are properly rerun.
The issue was exposed by commit 1ce8b37241ed ("usbnet: smsc95xx: Forward
PHY interrupts to PHY driver to avoid polling"), but has existed since
forever.