Jakub Kicinski [Wed, 4 Sep 2024 23:57:13 +0000 (16:57 -0700)]
Merge branch 'unmask-upper-dscp-bits-part-3'
Ido Schimmel says:
====================
Unmask upper DSCP bits - part 3
tl;dr - This patchset continues to unmask the upper DSCP bits in the
IPv4 flow key in preparation for allowing IPv4 FIB rules to match on
DSCP. No functional changes are expected.
The TOS field in the IPv4 flow key ('flowi4_tos') is used during FIB
lookup to match against the TOS selector in FIB rules and routes.
It is currently impossible for user space to configure FIB rules that
match on the DSCP value as the upper DSCP bits are either masked in the
various call sites that initialize the IPv4 flow key or along the path
to the FIB core.
In preparation for adding a DSCP selector to IPv4 and IPv6 FIB rules, we
need to make sure the entire DSCP value is present in the IPv4 flow key.
This patchset continues to unmask the upper DSCP bits, but this time in
the output route path, specifically in the callers of
ip_route_output_ports().
The next patchset (last) will handle the callers of
ip_route_output_key(). Split from this patchset to avoid going over the
15 patches limit.
No functional changes are expected as commit 1fa3314c14c6 ("ipv4:
Centralize TOS matching") moved the masking of the upper DSCP bits to
the core where 'flowi4_tos' is matched against the TOS selector.
====================
ipv6: sit: Unmask upper DSCP bits in ipip6_tunnel_bind_dev()
Unmask the upper DSCP bits when calling ip_route_output_ports() so that
in the future it could perform the FIB lookup according to the full DSCP
value.
ip6_tunnel: Unmask upper DSCP bits in ip4ip6_err()
Unmask the upper DSCP bits when calling ip_route_output_ports() so that
in the future it could perform the FIB lookup according to the full DSCP
value.
ipv4: ipmr: Unmask upper DSCP bits in ipmr_queue_xmit()
Unmask the upper DSCP bits when calling ip_route_output_ports() so that
in the future it could perform the FIB lookup according to the full DSCP
value.
The function is passed the full DS field in its 'tos' argument by its
two callers. It then masks the upper DSCP bits using RT_TOS() when
passing it to ip_route_output_ports().
Unmask the upper DSCP bits when passing 'tos' to ip_route_output_ports()
so that in the future it could perform the FIB lookup according to the
full DSCP value.
Chen Ni [Wed, 4 Sep 2024 01:44:41 +0000 (09:44 +0800)]
selftests: net: convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it is seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
ipv4: Fix user space build failure due to header change
RT_TOS() from include/uapi/linux/in_route.h is defined using
IPTOS_TOS_MASK from include/uapi/linux/ip.h. This is problematic for
files such as include/net/ip_fib.h that want to use RT_TOS() as without
including both header files kernel compilation fails:
In file included from ./include/net/ip_fib.h:25,
from ./include/net/route.h:27,
from ./include/net/lwtunnel.h:9,
from net/core/dst.c:24:
./include/net/ip_fib.h: In function ‘fib_dscp_masked_match’:
./include/uapi/linux/in_route.h:31:32: error: ‘IPTOS_TOS_MASK’ undeclared (first use in this function)
31 | #define RT_TOS(tos) ((tos)&IPTOS_TOS_MASK)
| ^~~~~~~~~~~~~~
./include/net/ip_fib.h:440:45: note: in expansion of macro ‘RT_TOS’
440 | return dscp == inet_dsfield_to_dscp(RT_TOS(fl4->flowi4_tos));
Therefore, cited commit changed linux/in_route.h to include linux/ip.h.
However, as reported by David, this breaks iproute2 compilation due
overlapping definitions between linux/ip.h and
/usr/include/netinet/ip.h:
In file included from ../include/uapi/linux/in_route.h:5,
from iproute.c:19:
../include/uapi/linux/ip.h:25:9: warning: "IPTOS_TOS" redefined
25 | #define IPTOS_TOS(tos) ((tos)&IPTOS_TOS_MASK)
| ^~~~~~~~~
In file included from iproute.c:17:
/usr/include/netinet/ip.h:222:9: note: this is the location of the previous definition
222 | #define IPTOS_TOS(tos) ((tos) & IPTOS_TOS_MASK)
Fix by changing include/net/ip_fib.h to include linux/ip.h. Note that
usage of RT_TOS() should not spread further in the kernel due to recent
work in this area.
Fixes: 1fa3314c14c6 ("ipv4: Centralize TOS matching") Reported-by: David Ahern <dsahern@kernel.org> Closes: https://lore.kernel.org/netdev/2f5146ff-507d-4cab-a195-b28c0c9e654e@kernel.org/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20240903133554.2807343-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
James Chapman [Tue, 3 Sep 2024 11:35:47 +0000 (12:35 +0100)]
l2tp: remove unneeded null check in l2tp_v2_session_get_next
Commit aa92c1cec92b ("l2tp: add tunnel/session get_next helpers") uses
idr_get_next APIs to iterate over l2tp session IDR lists. Sessions in
l2tp_v2_session_idr always have a non-null session->tunnel pointer
since l2tp_session_register sets it before inserting the session into
the IDR. Therefore the null check on session->tunnel in
l2tp_v2_session_get_next is redundant and can be removed. Removing the
check avoids a warning from lkp.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202408111407.HtON8jqa-lkp@intel.com/ CC: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: James Chapman <jchapman@katalix.com> Acked-by: Tom Parkin <tparkin@katalix.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240903113547.1261048-1-jchapman@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: mana: Improve mana_set_channels() in low mem conditions
The mana_set_channels() function requires detaching the mana
driver and reattaching it with changed channel values.
During this operation if the system is low on memory, the reattach
might fail, causing the network device being down.
To avoid this we pre-allocate buffers at the beginning of set operation,
to prevent complete network loss
Add support for group stats for mac. The fbnic_set_counter help preserve
the default values for counters which are not touched by the driver.
The 'reset' flag in 'get_eth_mac_stats' allows to choose between
resetting the counter to recent most value or fetching the aggregated
values of the counter.
The 'fbnic_stat_rd64' read 64b stats counters in an atomic fashion using
read-read-read approach. This allows to isolate cases where counter is
moving too fast making accuracy of the counter questionable.
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Add ethtool ops support and enable 'get_drvinfo' for fbnic. The driver
provides firmware version information while the driver name and bus
information is provided by ethtool_get_drvinfo().
Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Xing [Mon, 2 Sep 2024 16:06:10 +0000 (00:06 +0800)]
selftests: add selftest for UDP SO_PEEK_OFF support
Add the SO_PEEK_OFF selftest for UDP. In this patch, I mainly do
three things:
1. rename tcp_so_peek_off.c
2. adjust for UDP protocol
3. add selftests into it
Suggested-by: Jon Maloy <jmaloy@redhat.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Jason Xing <kernelxing@tencent.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 4 Sep 2024 10:53:50 +0000 (11:53 +0100)]
Merge branch 'sparx5-fdma-part-one'
Daniel Machon says:
====================
net: microchip: add FDMA library and use it for Sparx5
This patch series is the first of a 2-part series, that adds a new
common FDMA library for Microchip switch chips Sparx5 and lan966x. These
chips share the same FDMA engine, and as such will benefit from a
common library with a common implementation. This also has the benefit
of removing a lot open-coded bookkeeping and duplicate code for the two
drivers.
Additionally, upstreaming efforts for a third chip, lan969x, will begin
in the near future. This chip will use the new library too.
In this first series, the FDMA library is introduced and used by the
Sparx5 switch driver.
###################
# Example of use: #
###################
- Initialize the rx and tx fdma structs with values for: number of
DCB's, number of DB's, channel ID, DB size (data buffer size), and
total size of the requested memory. Also provide two callbacks:
nextptr_cb() and dataptr_cb() for getting the nextptr and dataptr.
- Allocate memory using fdma_alloc_phys() or fdma_alloc_coherent().
- Initialize the DCB's with fdma_dcb_init().
- Add new DCB's with fdma_dcb_add().
- Free memory with fdma_free_phys() or fdma_free_coherent().
Patch #1: introduces library and selects it for Sparx5.
Patch #2: includes the fdma_api.h header and removes old symbols.
Patch #3: replaces old rx and tx variables with equivalent ones from the
fdma struct. Only the variables that can be changed without
breaking traffic is changed in this patch.
Patch #4: uses the library for allocation of rx buffers. This requires
quite a bit of refactoring in this single patch.
Patch #5: uses the library for adding DCB's in the rx path.
Patch #6: uses the library for freeing rx buffers.
Patch #7: uses the library helpers in the rx path.
Patch #8: uses the library for allocation of tx buffers. This requires
quite a bit of refactoring in this single patch.
Patch #9: uses the library for adding DCB's in the tx path.
Patch #10: uses the library helpers in the tx path.
Patch #11: ditches the existing linked list for storing buffer addresses,
and instead uses offsets into contiguous memory.
Patch #12: modifies existing rx and tx functions to be direction
independent.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These direction specific functions can be ditched in favor of a single
function: sparx5_fdma_reload(), which retrieves the channel id from the
fdma struct instead.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:16 +0000 (16:54 +0200)]
net: sparx5: use contiguous memory for tx buffers
Currently, the driver uses a linked list for storing the tx buffer
addresses. This requires a good amount of extra bookkeeping code. Ditch
the linked list in favor of tx buffers being in the same contiguous
memory space as the DCB's and the DB's. The FDMA library has a helper
for this - so use that.
The tx buffer addresses are now retrieved as an offset into the FDMA
memory space.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:15 +0000 (16:54 +0200)]
net: sparx5: use library helper for freeing tx buffers
The library has the helper fdma_free_phys() for freeing physical FDMA
memory. Use it in the exit path.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:14 +0000 (16:54 +0200)]
net: sparx5: use FDMA library for adding DCB's in the tx path
Use the fdma_dcb_add() function to add DCB's in the tx path. This gets
rid of the open-coding of nextptr and dataptr handling and leaves it to
the library.
Also, make sure the fdma indexes are advanced using: fdma_dcb_advance(),
so that the correct nextptr and dataptr offsets are retrieved.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:13 +0000 (16:54 +0200)]
net: sparx5: use the FDMA library for allocation of tx buffers
Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for tx
buffer allocation and use the new buffers throughout.
In order to replace the old buffers with the new ones, we have to do the
following refactoring:
- use fdma_alloc_phys() and fdma_dcb_init()
- replace the variables: tx->dma, tx->first_entry and tx->curr_entry
with the equivalents from the FDMA struct.
- replace uses of sparx5_db_hw and sparx5_tx_dcb_hw with fdma_db and
fdma_dcb.
- add sparx5_fdma_tx_dataptr_cb callback for obtaining the dataptr.
- Initialize FDMA struct values.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:12 +0000 (16:54 +0200)]
net: sparx5: use a few FDMA helpers in the rx path
The library provides helpers for a number of DCB and DB operations. Use
these in the rx path.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:11 +0000 (16:54 +0200)]
net: sparx5: use library helper for freeing rx buffers
The library has the helper fdma_free_phys() for freeing physical FDMA
memory. Use it in the exit path.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:10 +0000 (16:54 +0200)]
net: sparx5: use FDMA library for adding DCB's in the rx path
Use the fdma_dcb_add() function to add DCB's in the rx path. This gets
rid of the open-coding of nextptr and dataptr handling and leaves it to
the library.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:09 +0000 (16:54 +0200)]
net: sparx5: use the FDMA library for allocation of rx buffers
Use the two functions: fdma_alloc_phys() and fdma_dcb_init() for rx
buffer allocation and use the new buffers throughout.
In order to replace the old buffers with the new ones, we have to do the
following refactoring:
- use fdma_alloc_phys() and fdma_dcb_init()
- replace the variables: rx->dma, rx->dcb_entries and rx->last_entry
with the equivalents from the FDMA struct.
- replace uses of sparx5_db_hw and sparx5_rx_dcb_hw with fdma_db and
fdma_dcb.
- add sparx5_fdma_rx_dataptr_cb callback for obtaining the dataptr.
- Initialize FDMA struct values.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:08 +0000 (16:54 +0200)]
net: sparx5: replace a few variables with new equivalent ones
Replace the old rx and tx variables: channel_id, FDMA_DCB_MAX,
FDMA_RX_DCB_MAX_DBS, FDMA_TX_DCB_MAX_DBS, dcb_index and db_index with
the equivalents from the FDMA rx and tx structs. These variables are not
entangled in any buffer allocation and can therefore be replaced in
advance.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:07 +0000 (16:54 +0200)]
net: sparx5: use FDMA library symbols
Include and use the new FDMA header, which now provides the required
masks and bit offsets for operating on the DCB's and DB's.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Machon [Mon, 2 Sep 2024 14:54:06 +0000 (16:54 +0200)]
net: microchip: add FDMA library
Add new FDMA library for interacting with the FDMA engine on Microchip
Sparx5 and lan966x switch chips, in an effort to reduce duplicate code
and provide a common set of symbols and functions.
Signed-off-by: Daniel Machon <daniel.machon@microchip.com> Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com> Reviewed-by: Jens Emil Schulz Østergaard <jensemil.schulzostergaard@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 6b7c5b947c67 ("net: Add be2net driver.") declared be_pci_fnum_get()
and be_cmd_reset() but never implemented. And commit 9fa465c0ce0d ("be2net:
remove code duplication relating to Lancer reset sequence") removed
lancer_test_and_set_rdy_state() but leave declaration.
Commit 76a9e08e33ce ("be2net: cleanup wake-on-lan code") left behind
be_is_wol_supported() declaration.
Commit baaa08d148ac ("be2net: do not call be_set/get_fw_log_level() on
Skyhawk-R") removed be_get_fw_log_level() but leave declaration.
In general, calling dev_err_probe() after successful probe in case of
handling -EPROBE_DEFER error, will set deferred status on the device
already probed. This is however not a problem here now, because
dev_err_probe() in affected places is used for handling errors from
request_firmware(), which does not return -EPROBE_DEFER. Still usage of
dev_err_probe() in such case is not correct, because request_firmware()
could once return -EPROBE_DEFER.
Fixes: bf4d87f884fe ("net: alacritech: Switch to use dev_err_probe()") Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240902163610.17028-1-krzysztof.kozlowski@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
netlink: specs: nftables: allow decode of default firewalld ruleset
This update allows listing default firewalld ruleset on Fedora 40 via
tools/net/ynl/cli.py --spec \
Documentation/netlink/specs/nftables.yaml --dump getrule
Default ruleset uses fib, reject and objref expressions which were
missing.
Other missing expressions can be added later.
Improve decoding while at it:
- add bitwise, ct and lookup attributes
- wire up the quota expression
- translate raw verdict codes to a human reable name, e.g.
'code': 4294967293 becomes 'code': 'jump'.
v2: forgot fib addrtype in enum list (Donald Hunter)
====================
mptcp: MIB counters for MPJ TX + misc improvements
Recently, a few issues have been discovered around the creation of
additional subflows. Without these counters, it was difficult to point
out the reason why some subflows were not created as expected.
In patch 3, all error paths from __mptcp_subflow_connect() are covered,
except the one related to the 'fully established mode', because it can
only happen with the userspace PM, which will propagate the error to the
userspace in this case (ENOTCONN).
These new counters are also verified in the MPTCP Join selftest in patch
6.
While at it, a few other patches are improving the MPTCP path-manager
code ...
- Patch 1: 'flush' related helpers are renamed to avoid confusions
- Patch 2: directly pass known ID and flags to create a new subflow,
i/o getting them later by iterating over all endpoints again
... and the MPJoin selftests:
- Patch 4: reduce the number of positional parameters
- Patch 5: only one line for the 'join' checks, instead of 3
- Patch 7: more explicit check names, instead of sometimes too cryptic
ones: rtx, ptx, ftx, ctx, fclzrx, sum
- Patch 8: specify client/server instead of 'invert' for some checks
not suggesting one specific direction
- Patch 9: mute errors of mptcp_connect when ran in the background
- Patch 10: simplify checksum_tests by using a for-loop
- Patch 11: remove 'define' re-definitions
====================
'MPTCP_PM_NAME' is defined in 'linux/mptcp_pm.h', included in
'linux/mptcp.h', no need to re-define it.
'MPTCP_PM_EVENTS' is not defined in 'linux/mptcp.h', but
'MPTCP_PM_EV_GRP_NAME' is, with the same value. We can then use the
latter, and drop the other one.
selftests: mptcp: join: mute errors when ran in the background
The test is supposed to be killed before the end, which will likely
cause "Connection reset by peer" errors. It is confusing, especially
because in case of real transfer errors, the test will not be marked as
failed. But that's OK, there are many other tests checking that.
selftests: mptcp: join: specify host being checked
Instead of displaying 'invert' when looking at some events like MP_FAIL,
MP_FASTCLOSE, MP_RESET, RM_ADDR, which is a bit vague because they are
not traditionnaly sent from one side, the host being checked is now
printed.
For the ADD_ADDR, only display the host when it is the client sending
it, which is more unusual.
Also before, the 'invert' message was printed after a few checks, but it
was not clear which ones exactly.
selftests: mptcp: join: validate MPJ SYN TX MIB counters
A few new MPJoinSynTx MIB counters have been added in a previous commit.
They are being validated here in mptcp_join.sh selftest, each time the
number of received MPJ are checked.
Most of the time, the number of sent SYN+MPJ is the same as the received
ones. But sometimes, there are more, because there are dropped, or there
are errors.
While at it, the "no MPC reuse with single endpoint" subtest has been
modified to force a bind() error.
Most tests are checking if the expected number of SYN/SYN+ACK/ACK JOINs
have been received, each of them on one line.
More Join related tests are going to be checked soon, no need to add 5
new lines per test in case of success, just one is enough. In case of
issue, the errors will still be reported like before.
Recently, a few issues have been discovered around the creation of
additional subflows. Without these counters, it was difficult to point
out the reason why some subflows were not created as expected.
These counters should have been added earlier, because there is no other
simple ways to extract such information from the kernel, and understand
why subflows have not been created.
While at it, some pr_debug() have been added, just in case the errno
needs to be printed.
__mptcp_subflow_connect() is currently called from the path-managers,
which have all the required information to create subflows. No need to
call the PM again to re-iterate over the list of entries with RCU lock
to get more info.
Instead, it is possible to pass a mptcp_pm_addr_entry structure, instead
of a mptcp_addr_info one. The former contains the ifindex and the flags
that are required when creating the new subflow.
This is a partial revert of commit ee285257a9c1 ("mptcp: drop flags and
ifindex arguments").
While at it, the local ID can also be set if it is known and 0, to avoid
having to set it in the 'rebuild_header' hook, which will cause a new
iteration of the endpoint entries.
Rename all the helpers specific to the flushing operations to make it
clear that the intention is to flush all created subflows, and remove
all announced addresses, not just a specific selection.
That way, it is easier to understand why the id_avail_bitmap and
local_addr_used are reset at the end.
Jakub Kicinski [Tue, 3 Sep 2024 22:18:45 +0000 (15:18 -0700)]
Merge branch 'rx-software-timestamp-for-all'
Gal Pressman says:
====================
RX software timestamp for all
All devices support SOF_TIMESTAMPING_RX_SOFTWARE by virtue of
net_timestamp_check() being called in the device independent code.
Following Willem's suggestion [1], make it so drivers do not have to
handle SOF_TIMESTAMPING_RX_SOFTWARE and SOF_TIMESTAMPING_SOFTWARE, nor
setting of the PHC index to -1.
All drivers will now report RX software timestamp as supported.
The series is limited to 15 patches, I will submit other drivers in
subsequent submissions.
Gal Pressman [Sun, 1 Sep 2024 11:28:03 +0000 (14:28 +0300)]
net: mvpp2: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Marcin Wojtas <marcin.s.wojtas@gmail.com> Link: https://patch.msgid.link/20240901112803.212753-16-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Sun, 1 Sep 2024 11:28:02 +0000 (14:28 +0300)]
octeontx2-pf: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Gal Pressman [Sun, 1 Sep 2024 11:28:01 +0000 (14:28 +0300)]
gianfar: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20240901112803.212753-14-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Sun, 1 Sep 2024 11:28:00 +0000 (14:28 +0300)]
net: enetc: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Gal Pressman [Sun, 1 Sep 2024 11:27:59 +0000 (14:27 +0300)]
net: fec: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Gal Pressman [Sun, 1 Sep 2024 11:27:58 +0000 (14:27 +0300)]
net: hns3: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Gal Pressman [Sun, 1 Sep 2024 11:27:57 +0000 (14:27 +0300)]
net: ethernet: rtsn: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Gal Pressman [Sun, 1 Sep 2024 11:27:56 +0000 (14:27 +0300)]
net: renesas: rswitch: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Gal Pressman [Sun, 1 Sep 2024 11:27:55 +0000 (14:27 +0300)]
ravb: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Gal Pressman [Sun, 1 Sep 2024 11:27:54 +0000 (14:27 +0300)]
ionic: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Link: https://patch.msgid.link/20240901112803.212753-7-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Sun, 1 Sep 2024 11:27:53 +0000 (14:27 +0300)]
tsnep: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Link: https://patch.msgid.link/20240901112803.212753-6-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Sun, 1 Sep 2024 11:27:52 +0000 (14:27 +0300)]
can: peak_usb: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://patch.msgid.link/20240901112803.212753-5-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Sun, 1 Sep 2024 11:27:51 +0000 (14:27 +0300)]
can: peak_canfd: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://patch.msgid.link/20240901112803.212753-4-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Sun, 1 Sep 2024 11:27:50 +0000 (14:27 +0300)]
can: dev: Remove setting of RX software timestamp
The responsibility for reporting of RX software timestamp has moved to
the core layer (see __ethtool_get_ts_info()), remove usage from the
device drivers.
Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://patch.msgid.link/20240901112803.212753-3-gal@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Gal Pressman [Sun, 1 Sep 2024 11:27:49 +0000 (14:27 +0300)]
ethtool: RX software timestamp for all
All devices support SOF_TIMESTAMPING_RX_SOFTWARE by virtue of
net_timestamp_check() being called in the device independent code.
Move the responsibility of reporting SOF_TIMESTAMPING_RX_SOFTWARE and
SOF_TIMESTAMPING_SOFTWARE, and setting PHC index to -1 to the core.
Device drivers no longer need to use them.
Jakub Kicinski [Tue, 3 Sep 2024 19:50:02 +0000 (12:50 -0700)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2024-08-30 (igc, e1000e, i40e)
This series contains updates to igc, e1000e, and i40 drivers.
Kurt Kanzenbach adds support for MQPRIO offloads and stops unintended,
excess interrupts on igc.
Sasha adds reporting of EEE (Energy Efficient Ethernet) ability and
moves a register define to a better suited file for igc.
Vitaly stops reporting errors on shutdown and suspend as they are not
fatal for e1000e.
Alex adds reporting of EEE to i40e.
* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
i40e: Add Energy Efficient Ethernet ability for X710 Base-T/KR/KX cards
e1000e: avoid failing the system during pm_suspend
igc: Move the MULTI GBT AN Control Register to _regs file
igc: Add Energy Efficient Ethernet ability
igc: Get rid of spurious interrupts
igc: Add MQPRIO offload support
====================
Jakub Kicinski [Tue, 3 Sep 2024 18:59:44 +0000 (11:59 -0700)]
Merge tag 'ieee802154-for-net-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2024-09-01
Simon Horman catched two typos in our headers. No functional change.
* tag 'ieee802154-for-net-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
ieee802154: Correct spelling in nl802154.h
mac802154: Correct spelling in mac802154.h
====================
Justin Iurman [Fri, 30 Aug 2024 19:19:19 +0000 (21:19 +0200)]
ioam6: improve checks on user data
This patch improves two checks on user data.
The first one prevents bit 23 from being set, as specified by RFC 9197
(Sec 4.4.1):
Bit 23 Reserved; MUST be set to zero upon transmission and be
ignored upon receipt. This bit is reserved to allow for
future extensions of the IOAM Trace-Type bit field.
The second one checks that the tunnel destination address !=
IPV6_ADDR_ANY, just like we already do for the tunnel source address.
Florian Westphal [Fri, 30 Aug 2024 09:22:39 +0000 (11:22 +0200)]
selftests: netfilter: nft_queue.sh: fix spurious timeout on debug kernel
The sctp selftest is very slow on debug kernels.
Its possible that the nf_queue listener program exits due to timeout
before first sctp packet is processed.
In this case socat hangs until script times out.
Fix this by removing the -t option where possible and kill the test
program once the file transfer/socat has exited.
-t sets SO_RCVTIMEO, its inteded for the 'ping' part of the selftest
where we want to make sure that packets get reinjected properly without
skipping a second queue request.
While at it, add a helper to compare the (binary) files instead of diff.
The 'diff' part was copied from a another sub-test that compares text.
Let helper dump file sizes on error so we can see the progress made.
Tested on an old 2010-ish box with a debug kernel and 100 iterations.
This is a followup to the earlier filesize reduction change.
Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20240829080109.GB30766@breakpoint.cc/ Fixes: 0a8b08c554da ("selftests: netfilter: nft_queue.sh: reduce test file size for debug build") Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20240830092254.8029-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
====================
net: Simplified with scoped function
Simplify with scoped for each OF child loop, as well as dev_err_probe().
Changes in v4:
- Drop the fix patch and __free() patch.
- Rebased on the fix patch has been stripped out.
- Remove the extra parentheses.
- Ensure Signed-off-by: should always be last.
- Add Reviewed-by.
- Update the cover letter commit message.
Changes in v3:
- Sort the variables, longest first, shortest last.
- Add Reviewed-by.
Changes in v2:
- Subject prefix: next -> net-next.
- Split __free() from scoped for each OF child loop clean.
- Fix use of_node_put() instead of __free() for the 5th patch.
====================
====================
netdev_features: start cleaning netdev_features_t up
NETDEV_FEATURE_COUNT is currently 64, which means we can't add any new
features as netdev_features_t is u64.
As per several discussions, instead of converting netdev_features_t to
a bitmap, which would mean A LOT of changes, we can try cleaning up
netdev feature bits.
There's a bunch of bits which don't really mean features, rather device
attributes/properties that can't be changed via Ethtool in any of the
drivers. Such attributes can be moved to netdev private flags without
losing any functionality.
Start converting some read-only netdev features to private flags from
the ones that are most obvious, like lockless Tx, inability to change
network namespace etc. I was able to reduce NETDEV_FEATURE_COUNT from
64 to 60, which mean 4 free slots for new features. There are obviously
more read-only features to convert, such as highDMA, "challenged VLAN",
HSR (4 bits) - this will be done in subsequent series.
Please note that currently netdev features are not uAPI/ABI by any means.
Ethtool passes their names and bits to the userspace separately and there
are no hardcoded names/bits in the userspace, so that new Ethtool could
work on older kernels and vice versa.
This, however, isn't true for Ethtools < 3.4. I haven't changed the bit
positions of the already existing features and instead replaced the freed
bits with stubs. But it's anyway theoretically possible that Ethtools
older than 2011 will break. I hope no currently supported distros supply
such an ancient version.
Shell scripts also most likely won't break since the removed bits were
always read-only, meaning nobody would try touching them from a script.
====================
NETIF_F_ALL_FCOE is used only in vlan_dev.c, 2 times. Now that it's only
2 bits, open-code it and remove the definition from netdev_features.h.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
netdev_features: convert NETIF_F_FCOE_MTU to dev->fcoe_mtu
Ability to handle maximum FCoE frames of 2158 bytes can never be changed
and thus more of an attribute, not a toggleable feature.
Move it from netdev_features_t to "cold" priv flags (bitfield bool) and
free yet another feature bit.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
netdev_features: convert NETIF_F_NETNS_LOCAL to dev->netns_local
"Interface can't change network namespaces" is rather an attribute,
not a feature, and it can't be changed via Ethtool.
Make it a "cold" private flag instead of a netdev_feature and free
one more bit.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
netdev_features: convert NETIF_F_LLTX to dev->lltx
NETIF_F_LLTX can't be changed via Ethtool and is not a feature,
rather an attribute, very similar to IFF_NO_QUEUE (and hot).
Free one netdev_features_t bit and make it a "hot" private flag.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
netdevice: convert private flags > BIT(31) to bitfields
Make dev->priv_flags `u32` back and define bits higher than 31 as
bitfield booleans as per Jakub's suggestion. This simplifies code
which accesses these bits with no optimization loss (testb both
before/after), allows to not extend &netdev_priv_flags each time,
but also scales better as bits > 63 in the future would only add
a new u64 to the structure with no complications, comparing to
that extending ::priv_flags would require converting it to a bitmap.
Note that I picked `unsigned long :1` to not lose any potential
optimizations comparing to `bool :1` etc.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Pawel Dembicki [Tue, 27 Aug 2024 12:39:38 +0000 (14:39 +0200)]
net: dsa: vsc73xx: implement FDB operations
This commit introduces implementations of three functions:
.port_fdb_dump
.port_fdb_add
.port_fdb_del
The FDB database organization is the same as in other old Vitesse chips:
It has 2048 rows and 4 columns (buckets). The row index is calculated by
the hash function 'vsc73xx_calc_hash' and the FDB entry must be placed
exactly into row[hash]. The chip selects the bucket number by itself.
The first patch is by Duy Nguyen and document the R-Car V4M support in
the rcar-canfd DT bindings.
Frank Li's patch converts the microchip,mcp251x.txt DT bindings
documentation to yaml.
A patch by Zhang Changzhong update a comment in the j1939 CAN
networking stack.
Stefan Mätje's patch updates the CAN configuration netlink code, so
that the bit timing calculation doesn't work on stale
can_priv::ctrlmode data.
Martin Jocic contributes a patch for the kvaser_pciefd driver to
convert some ifdefs into if (IS_ENABLED()).
The last patch is by Yan Zhen and simplifies the probe() function of
the kvaser USB driver by using dev_err_probe().
* tag 'linux-can-next-for-6.12-20240830' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: kvaser_usb: Simplify with dev_err_probe()
can: kvaser_pciefd: Use IS_ENABLED() instead of #ifdef
can: netlink: avoid call to do_set_data_bittiming callback with stale can_priv::ctrlmode
can: j1939: use correct function name in comment
dt-bindings: can: convert microchip,mcp251x.txt to yaml
dt-bindings: can: renesas,rcar-canfd: Document R-Car V4M support
====================
Joe Damato [Sat, 31 Aug 2024 12:17:04 +0000 (12:17 +0000)]
netdev-genl: Set extack and fix error on napi-get
In commit 27f91aaf49b3 ("netdev-genl: Add netlink framework functions
for napi"), when an invalid NAPI ID is specified the return value
-EINVAL is used and no extack is set.
Change the return value to -ENOENT and set the extack.
Andrew Halaney [Thu, 29 Aug 2024 20:48:44 +0000 (15:48 -0500)]
net: stmmac: drop the ethtool begin() callback
This callback doesn't seem to serve much purpose, and prevents things
like:
- systemd.link files from disabling autonegotiation
- carrier detection in NetworkManager
- any ethtool setting
prior to userspace bringing the link up.
The only fear I can think of is accessing unclocked resources due to
pm_runtime, but ethtool ioctls handle that as of commit f32a21376573 ("ethtool: runtime-resume netdev parent before ethtool ioctl ops")
Reviewed-by: Dmitry Dolenko <d.dolenko@metrotek.ru> Tested-by: Dmitry Dolenko <d.dolenko@metrotek.ru> Signed-off-by: Andrew Halaney <ahalaney@redhat.com> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 1 Sep 2024 17:17:34 +0000 (18:17 +0100)]
Merge branch 'octeontx2-af-cpt-update'
Srujana Challa says:
====================
octeontx2-af: update CPT block for CN10KB and CN10KA B0
This commit addresses two key updates for the CN10KB and CN10KA B0:
1. The number of FLT interrupt vectors has been reduced to 2 on CN10KB.
The code is updated to reflect this change across the CN10K series.
2. The maximum CPT credits that RX can use are now configurable through
a hardware CSR on CN10KA B0. This patch sets the default value to optimize
peak performance, aligning it with other chip versions.
v2:
- Addressed the review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Srujana Challa [Thu, 29 Aug 2024 08:09:35 +0000 (13:39 +0530)]
octeontx2-af: configure default CPT credits for CN10KA B0
The maximum CPT credits that RXC can use are now configurable on CN10KA B0
through a hardware CSR. This patch sets the default value to optimize peak
performance, aligning it with other chip versions.
Signed-off-by: Srujana Challa <schalla@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Srujana Challa [Thu, 29 Aug 2024 08:09:33 +0000 (13:39 +0530)]
octeontx2-af: use dynamic interrupt vectors for CN10K
This patch updates the driver to use a dynamic number of vectors instead
of a hard-coded value. This change accommodates the CN10KB, which has 2
vectors, unlike the previously supported chips that have 3 vectors.
Signed-off-by: Srujana Challa <schalla@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 31 Aug 2024 16:44:52 +0000 (17:44 +0100)]
Merge branch 'unmask-dscp-bits'
Ido Schimmel says:
====================
Unmask upper DSCP bits - part 2
tl;dr - This patchset continues to unmask the upper DSCP bits in the
IPv4 flow key in preparation for allowing IPv4 FIB rules to match on
DSCP. No functional changes are expected. Part 1 was merged in commit
("Merge branch 'unmask-upper-dscp-bits-part-1'").
The TOS field in the IPv4 flow key ('flowi4_tos') is used during FIB
lookup to match against the TOS selector in FIB rules and routes.
It is currently impossible for user space to configure FIB rules that
match on the DSCP value as the upper DSCP bits are either masked in the
various call sites that initialize the IPv4 flow key or along the path
to the FIB core.
In preparation for adding a DSCP selector to IPv4 and IPv6 FIB rules, we
need to make sure the entire DSCP value is present in the IPv4 flow key.
This patchset continues to unmask the upper DSCP bits, but this time in
the output route path.
Patches #1-#3 unmask the upper DSCP bits in the various places that
invoke the core output route lookup functions directly.
Patches #4-#6 do the same in three helpers that are widely used in the
output path to initialize the TOS field in the IPv4 flow key.
The rest of the patches continue to unmask these bits in call sites that
invoke the following wrappers around the core lookup functions:
The next patchset will handle the callers of ip_route_output_ports() and
ip_route_output_key().
No functional changes are expected as commit 1fa3314c14c6 ("ipv4:
Centralize TOS matching") moved the masking of the upper DSCP bits to
the core where 'flowi4_tos' is matched against the TOS selector.
Changes since v1 [1]:
* Remove IPTOS_RT_MASK in patch #7 instead of in patch #6
Ido Schimmel [Thu, 29 Aug 2024 06:54:56 +0000 (09:54 +0300)]
ipv6: sit: Unmask upper DSCP bits in ipip6_tunnel_xmit()
The function calls flowi4_init_output() to initialize an IPv4 flow key
with which it then performs a FIB lookup using ip_route_output_flow().
The 'tos' variable with which the TOS value in the IPv4 flow key
(flowi4_tos) is initialized contains the full DS field. Unmask the upper
DSCP bits so that in the future the FIB lookup could be performed
according to the full DSCP value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 29 Aug 2024 06:54:55 +0000 (09:54 +0300)]
ipv4: Unmask upper DSCP bits in ip_send_unicast_reply()
The function calls flowi4_init_output() to initialize an IPv4 flow key
with which it then performs a FIB lookup using ip_route_output_flow().
'arg->tos' with which the TOS value in the IPv4 flow key (flowi4_tos) is
initialized contains the full DS field. Unmask the upper DSCP bits so
that in the future the FIB lookup could be performed according to the
full DSCP value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 29 Aug 2024 06:54:52 +0000 (09:54 +0300)]
ipv4: Unmask upper DSCP bits in get_rttos()
The function is used by a few socket types to retrieve the TOS value
with which to perform the FIB lookup for packets sent through the socket
(flowi4_tos). If a DS field was passed using the IP_TOS control message,
then it is used. Otherwise the one specified via the IP_TOS socket
option.
Unmask the upper DSCP bits so that in the future the lookup could be
performed according to the full DSCP value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 29 Aug 2024 06:54:51 +0000 (09:54 +0300)]
ipv4: Unmask upper DSCP bits in ip_sock_rt_tos()
The function is used to read the DS field that was stored in IPv4
sockets via the IP_TOS socket option so that it could be used to
initialize the flowi4_tos field before resolving an output route.
Unmask the upper DSCP bits so that in the future the output route lookup
could be performed according to the full DSCP value.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>