Jakub Kicinski [Tue, 8 Aug 2023 23:41:30 +0000 (16:41 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-08-07 (ice)
This series contains updates to ice driver only.
Wojciech allows for LAG interfaces to be used for bridge offloads.
Marcin tracks additional metadata for filtering rules to aid in proper
differentiation of similar rules. He also renames some flags whose names
do not accurately describe what they represent.
Karol and Jan add a wait for firmware load on devices that
require it.
Przemek refactors the RSS implementation to clarify and simplify configuration.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: clean up __ice_aq_get_set_rss_lut()
ice: add FW load wait
ice: Add get C827 PHY index function
ice: Rename enum ice_pkt_flags values
ice: Add direction metadata
ice: Accept LAG netdevs in bridge offloads
====================
Jakub Kicinski [Tue, 8 Aug 2023 23:28:28 +0000 (16:28 -0700)]
Merge tag 'mlx5-updates-2023-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2023-08-07
1) A few cleanups
2) Dynamic completion EQs
The driver creates completion EQs for all vectors directly on driver
load, even if those EQs will not be used later on.
To allow more flexibility in managing completion EQs and to reduce the
memory overhead of driver load, this series makes completion EQ
creation dynamic, meaning completion EQs are created only when needed.
Patch #1 introduces a counter for tracking the current number of
completion EQs.
Patches #2-6 refactor the existing infrastructure of managing completion
EQs and completion IRQs to be compatible with per-vector
allocation/release requests.
Patches #7-8 modify the CPU-to-IRQ affinity calculation to be resilient
when the affinity is requested but the completion IRQ is not allocated yet.
Patch #9 renames a function.
Patch #10 handles the corner case of an SF performing an IRQ request when no
SF IRQ pool is found and no PF IRQ exists for the same vector.
Patch #11 modifies the driver to allocate completion EQs dynamically.
* tag 'mlx5-updates-2023-08-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
net/mlx5: Bridge, Only handle registered netdev bridge events
net/mlx5: E-Switch, Remove redundant arg ignore_flow_lvl
net/mlx5: Fix typo reminder -> remainder
net/mlx5: remove many unnecessary NULL values
net/mlx5: Allocate completion EQs dynamically
net/mlx5: Handle SF IRQ request in the absence of SF IRQ pool
net/mlx5: Rename mlx5_comp_vectors_count() to mlx5_comp_vectors_max()
net/mlx5: Add IRQ vector to CPU lookup function
net/mlx5: Introduce mlx5_cpumask_default_spread
net/mlx5: Implement single completion EQ create/destroy methods
net/mlx5: Use xarray to store and manage completion EQs
net/mlx5: Refactor completion IRQ request/release handlers in EQ layer
net/mlx5: Use xarray to store and manage completion IRQs
net/mlx5: Refactor completion IRQ request/release API
net/mlx5: Track the current number of completion EQs
====================
Jakub Kicinski [Mon, 7 Aug 2023 21:00:51 +0000 (14:00 -0700)]
docs: net: page_pool: de-duplicate the intro comment
In commit 82e896d992fa ("docs: net: page_pool: use kdoc to avoid
duplicating the information") I shied away from using the DOC:
comments when moving to kdoc for documenting page_pool API,
because I wasn't sure how familiar people are with it.
Turns out there is already a DOC: comment for the intro, which
is the same in both places, modulo what looks like minor rewording.
Use the version from Documentation/ but keep the contents with
the code.
Hannes Reinecke [Mon, 7 Aug 2023 07:10:22 +0000 (09:10 +0200)]
net/tls: avoid TCP window full during ->read_sock()
When flushing the backlog after decoding a record, we don't really
know how much data the caller wants us to evaluate, so use INT_MAX
and 0 as arguments to tls_read_flush_backlog() to ensure we flush
at 128k of data. Otherwise we might read too much data and
trigger a TCP window full condition.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20230807071022.10091-1-hare@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
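A simplified model of the flush decision may help show why INT_MAX and 0 work as arguments. It assumes the helper first skips flushing when the caller's remaining request is already covered by decrypted data, and otherwise flushes at most once per 128 KiB. `model_should_flush()` and its parameter names are hypothetical, and the real tls_read_flush_backlog() may check further conditions that are omitted here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Flush cadence named in the commit message: 128 KiB. */
#define FLUSH_INTERVAL (128 * 1024)

/*
 * Hypothetical, simplified model (not the kernel helper).
 * len_left:   bytes the caller still wants
 * decrypted:  bytes already decrypted for the caller
 * done:       total bytes processed so far
 * flushed_at: value of 'done' at the last flush, updated on flush
 */
static bool model_should_flush(size_t len_left, size_t decrypted,
			       size_t done, size_t *flushed_at)
{
	assert(done >= *flushed_at);

	/* The caller's request is already covered: skip the flush. */
	if (len_left <= decrypted)
		return false;

	/* Otherwise flush at most once per FLUSH_INTERVAL bytes. */
	if (done - *flushed_at < FLUSH_INTERVAL)
		return false;

	*flushed_at = done;
	return true;
}
```

With len_left = INT_MAX and decrypted = 0, the first early return can never fire, so the only remaining trigger in this model is the 128 KiB cadence.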
xu xin [Mon, 7 Aug 2023 01:54:08 +0000 (01:54 +0000)]
net/ipv4: return the real errno instead of -EINVAL
Currently, no matter what error pointer ip_neigh_for_gw() returns,
ip_finish_output2() always returns -EINVAL, which may mislead upper-layer
users.
For example, when an application uses sendto() to send a UDP packet and the
neighbor table overflows, sendto() gets -EINVAL, which can cause users to
waste a lot of time checking their parameters for errors.
Return the real errno instead of -EINVAL.
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: Si Hao <si.hao@zte.com.cn>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://lore.kernel.org/r/20230807015408.248237-1-xu.xin16@zte.com.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
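The kernel encodes errors in returned pointers with ERR_PTR()/IS_ERR()/PTR_ERR(). The user-space re-creation below (assuming the kernel's MAX_ERRNO of 4095) contrasts collapsing every failure to -EINVAL with propagating PTR_ERR(). `lookup_neigh()` is a hypothetical stand-in for ip_neigh_for_gw(), and the -ENOBUFS it returns on a full table is an assumption for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space copy of the error-pointer convention: the top MAX_ERRNO
 * addresses encode a negative errno value. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct neighbour { int dummy; };

/* Hypothetical lookup: assume it fails with -ENOBUFS on a full table. */
static struct neighbour *lookup_neigh(bool table_full)
{
	static struct neighbour n;

	return table_full ? ERR_PTR(-ENOBUFS) : &n;
}

/* Before: every failure is reported as -EINVAL. */
static int output_old(bool table_full)
{
	struct neighbour *neigh = lookup_neigh(table_full);

	if (IS_ERR(neigh))
		return -EINVAL;
	return 0;
}

/* After: the real errno is propagated to the caller. */
static int output_new(bool table_full)
{
	struct neighbour *neigh = lookup_neigh(table_full);

	if (IS_ERR(neigh))
		return (int)PTR_ERR(neigh);
	return 0;
}
```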
net: renesas: rswitch: Add runtime speed change support
The latest SoC version supports runtime speed change, so detect the
SoC version by using soc_device_match() and then reconfigure the
rswitch hardware and the SerDes if needed.
Lin Ma [Mon, 7 Aug 2023 09:13:47 +0000 (17:13 +0800)]
rtnetlink: remove redundant checks for nlattr IFLA_BRIDGE_MODE
The commit d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks
IFLA_BRIDGE_MODE length") added the nla_len check in rtnl_bridge_setlink,
which is the only caller for ndo_bridge_setlink handlers defined in
low-level driver code. Hence, this patch cleans up the redundant checks in
each ndo_bridge_setlink handler function.
Suggested-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Acked-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807091347.3804523-1-linma@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
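A minimal sketch of the caller/handler split, under the assumption that the attribute carries a 16-bit bridge mode: the single caller rejects short attributes once, so each handler can read the value without repeating the check. `struct attr`, `caller_setlink()` and `handler_setlink()` are hypothetical and are not the rtnetlink or nla_* API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified attribute: payload length plus payload. */
struct attr {
	uint16_t len;
	const void *data;
};

/* Handler: trusts the caller to have validated the length. */
static int handler_setlink(const struct attr *mode_attr, uint16_t *mode_out)
{
	uint16_t mode;

	/* No per-handler length check: the caller guarantees len >= 2. */
	memcpy(&mode, mode_attr->data, sizeof(mode));
	*mode_out = mode;
	return 0;
}

/* Single caller: validates the attribute once for every handler. */
static int caller_setlink(const struct attr *mode_attr, uint16_t *mode_out)
{
	if (mode_attr->len < sizeof(uint16_t))
		return -EINVAL;
	return handler_setlink(mode_attr, mode_out);
}
```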
Michael Chan [Mon, 7 Aug 2023 14:57:20 +0000 (07:57 -0700)]
bnxt_en: Fix W=stringop-overflow warning in bnxt_dcb.c
Fix the following warning:
drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c: In function ‘bnxt_hwrm_queue_cos2bw_cfg’:
cc1: error: writing 12 bytes into a region of size 1 [-Werror=stringop-overflow=]
In file included from drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c:19:
drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h:6045:17: note: destination object ‘unused_0’ of size 1
6045 | u8 unused_0;
Fix it by modifying struct hwrm_queue_cos2bw_cfg_input to use an array
of sub-structs, similar to the previous patch. This eliminates the
pointer arithmetic used to calculate the destination pointer passed to
memcpy().
Michael Chan [Mon, 7 Aug 2023 14:57:19 +0000 (07:57 -0700)]
bnxt_en: Fix W=1 warning in bnxt_dcb.c from fortify memcpy()
Fix the following warning:
inlined from ‘bnxt_hwrm_queue_cos2bw_qcfg’ at drivers/net/ethernet/broadcom/bnxt/bnxt_dcb.c:165:3,
./include/linux/fortify-string.h:592:4: error: call to ‘__read_overflow2_field’ declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror]
__read_overflow2_field(q_size_field, size);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Modify the FW interface definition of struct hwrm_queue_cos2bw_qcfg_output
to use an array of sub-structs for the queue1 to queue7 fields. Note that
the layout of the queue0 fields is different, so they are not part of
the array. This makes the code much cleaner by removing the pointer
arithmetic for memcpy().
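The layout change can be sketched with hypothetical field names (the real bnxt_hsi.h definitions differ): queue 0 keeps its own, differently laid out fields, while queues 1 to 7 become an array of a packed sub-struct, so memcpy() copies a whole array element instead of an address computed from a scalar field. `__attribute__((packed))` is a GCC/Clang extension, used here as a stand-in for the kernel's `__packed`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-queue fields shared by queue1..queue7. */
struct queue_cfg {
	uint8_t queue_id;
	uint32_t min_bw;
	uint32_t max_bw;
	uint8_t tsa_assign;
} __attribute__((packed));

/* Queue 0 is laid out differently, so it stays outside the array. */
struct cos2bw_output {
	uint8_t queue_id0;
	uint8_t queue_id0_flags;	/* hypothetical extra field */
	uint32_t queue_id0_min_bw;
	uint32_t queue_id0_max_bw;
	struct queue_cfg cfg[7];	/* queue1 .. queue7 */
} __attribute__((packed));

/* Copy queue 'i' (1..7). The source is a whole array element, so a
 * fortified memcpy() can see the object is large enough. */
static void get_queue(const struct cos2bw_output *resp, int i,
		      struct queue_cfg *out)
{
	assert(i >= 1 && i <= 7);
	memcpy(out, &resp->cfg[i - 1], sizeof(*out));
}
```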
Dan Carpenter [Mon, 7 Aug 2023 13:01:53 +0000 (16:01 +0300)]
net: bcmasp: Prevent array underflow in bcmasp_netfilt_get_init()
The "loc" value comes from the user and it can be negative, leading to
an array underflow when we check "priv->net_filters[loc].claimed". Fix
this by changing the type to u32.
Fixes: c5d511c49587 ("net: bcmasp: Add support for wake on net filters")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Justin Chen <justin.chen@broadcom.com>
Link: https://lore.kernel.org/r/b3b47b25-01fc-4d9f-a6c3-e037ad4d71d7@moroto.mountain
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
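A sketch of why the type change is enough, assuming a hypothetical 32-entry filter table: with a signed index, a negative value slips past the upper-bound check, while the same bits read as u32 form a huge number that the one check rejects. `filters`, `bound_ok_signed()` and `get_filter()` are hypothetical names, and only the bound check of the buggy variant is modelled (no out-of-bounds access is performed).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_FILTERS 32	/* hypothetical table size */

struct filter { bool claimed; };
static struct filter filters[NUM_FILTERS];

/* Buggy shape: a negative 'loc' passes this upper-bound check and
 * would index before the start of the array. */
static bool bound_ok_signed(int loc)
{
	return !(loc >= NUM_FILTERS);
}

/* Fixed shape: an unsigned index turns a "negative" user value into a
 * large number, so the single upper-bound check rejects it. */
static int get_filter(uint32_t loc, struct filter **out)
{
	if (loc >= NUM_FILTERS)
		return -EINVAL;
	*out = &filters[loc];
	return 0;
}
```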
Jakub Kicinski [Tue, 8 Aug 2023 22:01:41 +0000 (15:01 -0700)]
Merge branch 'net-fs_enet-driver-cleanup'
Christophe Leroy says:
====================
net: fs_enet: Driver cleanup
Over the years, platform and driver initialisation have evolved into
more generic ways, and driver or platform specific stuff has gone
away, leaving stale objects behind.
This series aims at cleaning all that up for fs_enet ethernet driver.
====================
CC drivers/net/ethernet/freescale/fs_enet/fs_enet-main.o
drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c: In function 'fs_enet_interrupt':
drivers/net/ethernet/freescale/fs_enet/fs_enet-main.c:321:40: warning: variable 'fpi' set but not used [-Wunused-but-set-variable]
Yue Haibing [Fri, 4 Aug 2023 12:55:25 +0000 (20:55 +0800)]
i40e: Remove unused function declarations
Commit f62b5060d670 ("i40e: fix mac address checking") left behind
the i40e_validate_mac_addr() declaration.
The other declarations were added in commit 56a62fc86895 ("i40e: init
code and hardware support") but were never implemented.
Li Zetao [Fri, 4 Aug 2023 09:59:46 +0000 (17:59 +0800)]
net: dpaa2-switch: Remove redundant initialization owner in dpaa2_switch_drv
fsl_mc_driver_register() sets driver.owner to THIS_MODULE when
registering an fsl_mc_driver, so initializing driver.owner in the
dpaa2_switch_drv definition is redundant. Remove it to clean up the code.
Li Zetao [Fri, 4 Aug 2023 09:59:45 +0000 (17:59 +0800)]
net: dpaa2-eth: Remove redundant initialization owner in dpaa2_eth_driver
fsl_mc_driver_register() sets driver.owner to THIS_MODULE when
registering an fsl_mc_driver, so initializing driver.owner in the
dpaa2_eth_driver definition is redundant. Remove it to clean up the code.
Suman Ghosh [Fri, 4 Aug 2023 04:59:34 +0000 (10:29 +0530)]
octeontx2-af: Code restructure to handle TC outer VLAN offload
Move the TC outer VLAN offload support to a separate function.
This is done so that all VLAN-related changes are handled cleanly from
a dedicated function.
====================
page_pool: a couple of assorted optimizations
This initially was a spin-off of the IAVF PP series[0], but it has grown
(and shrunk) quite a bit since then. In fact, it consists of three
semi-independent blocks:
* #1-2: Compile-time optimization. Split page_pool.h into 2 headers to
avoid bloating the consumers that do not need the complex inline helpers,
and then stop including it in skbuff.h at all. The first patch is also a
prerequisite for the whole series.
* #3: Improve cacheline locality for users of the Page Pool frag API.
* #4-6: Use direct cache recycling more aggressively, when it is safe,
obviously. In addition, make sure nobody uses the Page Pool API
with interrupts disabled.
Patches #1 and #5 are authored by Yunsheng and Jakub respectively, with
small modifications from my side as per ML discussions.
For the perf numbers for #3-6, please see individual commit messages.
Also available on my GH with many more Page Pool goodies[1].
net: skbuff: always try to recycle PP pages directly when in softirq
Commit 8c48eea3adf3 ("page_pool: allow caching from safely localized
NAPI") allowed direct recycling of skb pages to their PP for some cases,
but unfortunately missed a couple of other major ones.
For example, %XDP_DROP in skb mode. The netstack just calls kfree_skb(),
which unconditionally passes `false` as @napi_safe. Thus, all pages go
through ptr_ring and locks, although most of the time we're actually inside
the NAPI polling this PP is linked with, so it would be perfectly
safe to recycle pages directly.
Let's address this. If @napi_safe is true, we're fine; don't change
anything for this path. But if it's false, check whether we are in
softirq context. It will most likely be so, and then if ->list_owner
is our current CPU, we're good to use direct recycling even though
@napi_safe is false, as concurrent access is excluded. The in_softirq()
check is needed mostly because we can hit this place in process
context (though not in hardirq context).
For the mentioned xdp-drop-skb-mode case, the improvement I got is
3-4% in Mpps. As for page_pool stats, recycle_ring is now 0 and the
alloc_slow counter doesn't change most of the time, which means the
MM layer is not even called to allocate any new pages.
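The decision described above can be sketched as a pure function. The structs are hypothetical stand-ins for the NAPI and page_pool state, and `in_softirq` and `this_cpu` are passed in instead of being queried from the kernel; this illustrates the rule, not the body of the kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the state consulted by the decision. */
struct napi_model { int list_owner; };	/* CPU currently polling */
struct pool_model { struct napi_model *napi; };

/*
 * Recycle directly into the pool's lockless cache if the caller says it
 * is NAPI-safe, or if we run in softirq context on the CPU that owns the
 * pool's NAPI instance. The napi pointer is only read when needed.
 */
static bool can_recycle_direct(const struct pool_model *pp, bool napi_safe,
			       bool in_softirq, int this_cpu)
{
	if (napi_safe)
		return true;
	if (!in_softirq)	/* e.g. process context: use ptr_ring */
		return false;

	const struct napi_model *napi = pp->napi;

	return napi && napi->list_owner == this_cpu;
}
```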
Jakub Kicinski [Fri, 4 Aug 2023 18:05:28 +0000 (20:05 +0200)]
page_pool: add a lockdep check for recycling in hardirq
Page pool use in hardirq context is prohibited; add debug checks
to catch misuse. IIRC we previously discussed using
DEBUG_NET_WARN_ON_ONCE() for this, but there were concerns
that people will have DEBUG_NET enabled in perf testing.
I don't think anyone enables lockdep in perf testing,
so use lockdep to avoid pushback and arguing :)
net: skbuff: avoid accessing page_pool if !napi_safe when returning page
Currently, pp->p.napi is always read, but the variable it gets
assigned to is only used when @napi_safe is true. For the !napi_safe
cases, which are still plenty, it's an unneeded operation.
Moreover, it can lead to premature or even redundant page_pool
cacheline access, for example when page_pool_is_last_frag() returns
false (with the recent frag improvements).
Thus, read it only when @napi_safe is true. This also allows moving
@napi inside the condition block itself. Constify it while we are
here, because why not.
On x86_64, the frag_* fields of struct page_pool are scattered across two
cachelines despite their combined size of 24 bytes. All three fields are
used in pretty much the same places, but the last field, ::frag_users,
is pushed out to the next CL, provoking unwanted false sharing on the
hotpath (frag allocation code).
There are some holes and cold members to move around. Move frag_* one
block up, placing them right after &page_pool_params, perfectly at the
beginning of CL2. This doesn't change anything meaningful for the second
block, as those are cold destroy-path structures, and doesn't affect
::alloc_stats, which still starts at a 200-byte offset, 8 bytes after CL3
(still fitting into one cacheline).
On my setup, this yields 1-2% more Mpps when using PP frags actively.
For 32-bit architectures with 32-byte CLs: &page_pool_params plus
::pad is 44 bytes, and the block taken care of is 16 bytes within one
CL, so there should at least be no regressions from the actual change.
::pages_state_hold_cnt is not directly related to that triple, but it is
currently paired with ::frag_offset, and decoupling them would mean
either two 4-byte holes or more invasive layout changes.
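One way to check such a layout in a test is to compute each field's cacheline index from offsetof(). The struct below is hypothetical (it is not struct page_pool): it assumes a 64-byte cacheline, a params block exactly one cacheline long and a cacheline-aligned struct, and it uses 0-based cacheline indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64	/* assumed cacheline size (typical x86_64) */

/* Hypothetical stand-in for the params block: exactly one cacheline. */
struct demo_params { char opaque[CACHELINE]; };

/* Hypothetical layout: the hot frag_* triple follows the params block,
 * so it starts a fresh cacheline; pages_state_hold_cnt fills what would
 * otherwise be a 4-byte hole after frag_offset on 64-bit. */
struct demo_pool {
	struct demo_params p;
	long frag_users;
	void *frag_page;
	unsigned int frag_offset;
	uint32_t pages_state_hold_cnt;
	/* colder members would follow here */
};

/* 0-based index of the cacheline holding byte 'off' of the struct,
 * assuming the struct starts on a cacheline boundary. */
static size_t cacheline_index(size_t off)
{
	return off / CACHELINE;
}

/* True if the byte range [off, off + size) stays within one cacheline. */
static int fits_in_one_line(size_t off, size_t size)
{
	assert(size > 0);
	return cacheline_index(off) == cacheline_index(off + size - 1);
}
```

For real kernel structures, pahole is the usual tool for inspecting the actual layout.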
net: skbuff: don't include <net/page_pool/types.h> to <linux/skbuff.h>
Currently, touching <net/page_pool/types.h> triggers a rebuild of more
than half of the kernel. That's because it's included in
<linux/skbuff.h>. And each new include in page_pool/types.h adds more
[useless] data for the toolchain to process for each source file in
that pile.
In commit 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB
recycling"), Matteo included it to be able to call a couple of functions
defined there. Then, in commit 57f05bc2ab24 ("page_pool: keep pp info as
long as page pool owns the page") one of the calls was removed, so only
one was left. It's the call to page_pool_return_skb_page() in
napi_frag_unref(). The function is external and doesn't have any
dependencies. Having very niche page_pool_types.h included only for that
looks like an overkill.
As %PP_SIGNATURE is not local to page_pool.c (was only in the
early submissions), nothing holds this function there. Teleport
page_pool_return_skb_page() to skbuff.c, just next to the main consumer,
skb_pp_recycle(), and rename it to napi_pp_put_page(), as it doesn't
work with skbs at all and the former name tells nothing. The #if guards
here are only to not compile and have it in the vmlinux when not needed
-- both call sites are already guarded.
Now, touching page_pool_types.h only triggers rebuilding of the drivers
using it and a couple of core networking files.
Suggested-by: Jakub Kicinski <kuba@kernel.org> # make skbuff.h less heavy
Suggested-by: Alexander Duyck <alexanderduyck@fb.com> # move to skbuff.c
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230804180529.2483231-3-aleksander.lobakin@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Yunsheng Lin [Fri, 4 Aug 2023 18:05:24 +0000 (20:05 +0200)]
page_pool: split types and declarations from page_pool.h
Split types and pure function declarations from page_pool.h
and add them to page_pool/types.h, so that C sources can
include page_pool.h and headers should generally only include
page_pool/types.h, as suggested by Jakub.
Rename page_pool.h to page_pool/helpers.h to have both in
one place.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20230804180529.2483231-2-aleksander.lobakin@intel.com
[Jakub: change microsoft/mana, fix kdoc paths in Documentation]
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Przemek Kitszel [Wed, 7 Jun 2023 13:09:57 +0000 (09:09 -0400)]
ice: clean up __ice_aq_get_set_rss_lut()
Refactor __ice_aq_get_set_rss_lut() to improve reader experience and limit
misuse scenarios (undesired LUT size for given LUT type).
Allow only 3 RSS LUT type+size variants:
PF LUT sized 2048, GLOBAL LUT sized 512, and VSI LUT sized 64, which were
the ones used on default flows prior to this commit.
Prior to the change, the code mixed the meanings of @params->lut_size and
@params->lut_type, the flag-assigning logic was cryptic, and long defines
made everything harder to follow.
Fix that by extracting some code out to separate helpers.
Drop some of the "shift by 0" statements that originated from Intel's
internal HW documentation.
Drop some redundant VSI masks (since ice_is_vsi_valid() gives "valid" for
up to 0x300 VSIs).
After sweeping all the defines out of struct ice_aqc_get_set_rss_lut,
it fits into 7 lines.
Finally, apply some cleanup to the callsite
(use of the new enums, a tmp var for lengthy bit extraction).
Note that the flags for 128- and 64-sized VSI LUTs are the same,
and 64 is used everywhere in the code (updated to the new enum here); it
just happened that the flag name contained 128.
__ice_aq_get_set_rss_key() uses the same VSI valid bit, so make the
constant common to it and __ice_aq_get_set_rss_lut().
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
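One way to express the "only 3 variants" rule is a type-to-size table that rejects every other combination. The enum and function names are hypothetical and are not the ice driver's identifiers; only the three sizes come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical enum mirroring the three LUT types named in the commit. */
enum rss_lut_type { LUT_VSI, LUT_PF, LUT_GLOBAL };

/* Exactly one size is allowed per type (sizes from the commit message). */
static uint16_t rss_lut_size_for(enum rss_lut_type type)
{
	switch (type) {
	case LUT_PF:
		return 2048;
	case LUT_GLOBAL:
		return 512;
	case LUT_VSI:
		return 64;
	}
	return 0;	/* unknown type */
}

/* Reject any type+size combination outside the three allowed variants. */
static bool rss_lut_params_valid(enum rss_lut_type type, uint16_t size)
{
	uint16_t want = rss_lut_size_for(type);

	return want != 0 && size == want;
}
```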
Karol Kolacinski [Thu, 13 Jul 2023 13:21:23 +0000 (15:21 +0200)]
ice: Add get C827 PHY index function
Add a function to find the C827 PHY node handle and return the C827 PHY
index for E810 products.
Some of the helpers needed to make this function fully functional
were written by Michal Michalik.
Co-developed-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Marcin Szycik [Thu, 22 Jun 2023 13:35:13 +0000 (15:35 +0200)]
ice: Rename enum ice_pkt_flags values
enum ice_pkt_flags contains values such as ICE_PKT_FLAGS_VLAN and
ICE_PKT_FLAGS_TUNNEL, but the flags words they refer to actually
contain a range of unrelated values, e.g. word 0 (ICE_PKT_FLAGS_VLAN)
contains fields such as from_network and ucast, which have nothing to do
with VLAN. Rename each enum value to ICE_PKT_FLAGS_MDID<number>, so it's
clear which flags word a given value resides in.
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Marcin Szycik [Thu, 22 Jun 2023 13:35:12 +0000 (15:35 +0200)]
ice: Add direction metadata
Currently it is possible to create a filter which breaks TX traffic, e.g.:
tc filter add dev $PF1 ingress protocol ip prio 1 flower ip_proto udp \
dst_port $PORT action mirred egress redirect dev $VF1_PR
This adds a rule which might match both TX and RX traffic, and in the TX
path the PF will actually receive the traffic, which breaks communication.
To fix this, add a match on the direction metadata flag when adding a tc rule.
Because of the way metadata is currently handled, a duplicate lookup word
would appear if VLAN metadata is also added. The lookup would still work
correctly, but one word would be wasted. To prevent this, lookup 0 now always
contains all metadata. When any metadata needs to be added, it is added to
lookup 0 and the lookup count is not incremented. This way, two flags
residing in the same word take up one word instead of two.
Note: the drop action is also affected, i.e. it will now only work in one
direction.
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
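The "all metadata in lookup 0" scheme can be modelled as per-word masks owned by lookup 0: adding a metadata match ORs a bit into the right word and leaves the lookup count alone, so two flags in the same word cost one word. The word count and all names are hypothetical, not ice driver definitions.

```c
#include <assert.h>
#include <stdint.h>

#define MD_WORDS 8	/* hypothetical number of metadata words */

/* Hypothetical rule model: lookup 0 is reserved for all metadata. */
struct rule_model {
	uint16_t md_mask[MD_WORDS];	/* per-word masks held by lookup 0 */
	unsigned int lkup_cnt;		/* lookups the rule consumes */
};

static void rule_init(struct rule_model *r)
{
	for (unsigned int i = 0; i < MD_WORDS; i++)
		r->md_mask[i] = 0;
	r->lkup_cnt = 1;	/* lookup 0 is always present */
}

/* A metadata match is ORed into lookup 0 instead of appending a new
 * lookup, so lkup_cnt stays unchanged. */
static void rule_add_md(struct rule_model *r, unsigned int word, uint16_t bit)
{
	assert(word < MD_WORDS);
	r->md_mask[word] |= bit;
}

/* Number of distinct metadata words the rule matches on. */
static unsigned int md_words_used(const struct rule_model *r)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < MD_WORDS; i++)
		n += r->md_mask[i] != 0;
	return n;
}
```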
The patch is from me and reverts the addition of the CAN controller
nodes in the Allwinner D1 SoC.
* tag 'linux-can-next-for-6.6-20230807' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
Revert "riscv: dts: allwinner: d1: Add CAN controller nodes"
====================
====================
net: stmmac: correct MAC propagation delay
Changes in v3:
- incorporate Richard's review feedback. Thank you for reviewing my patch:
- as some hardware may have missing or invalid correction value
registers, introduce a feature switch which can be enabled in the glue
code drivers depending on the actual hardware support
- only enable the feature on the i.MX8MP for the time being, as the patch
improves timing accuracy and has been tested on this hardware
- Link to v2: https://lore.kernel.org/r/20230719-stmmac_correct_mac_delay-v2-1-3366f38ee9a6@pengutronix.de
Changes in v2:
- fix builds for 32-bit; this was found by the kernel build bot
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307200225.B8rmKQPN-lkp@intel.com/
- while at it, also fix an overflow when shifting a u32 constant from a
macro by 10 bits, by casting the constant to u64
- Link to v1: https://lore.kernel.org/r/20230719-stmmac_correct_mac_delay-v1-1-768aa4d09334@pengutronix.de
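The overflow mentioned in the v2 changelog is the classic 32-bit shift problem: shifting a u32 left by 10 discards the bits above bit 31 before the result is widened, whereas casting to u64 first keeps them. The constant below is hypothetical (the changelog does not name the macro), and the sketch assumes a 32-bit `unsigned int`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit constant standing in for the macro value. */
#define DEMO_CONST 5000000U

/* Buggy: the shift happens in 32-bit unsigned arithmetic, so bits
 * shifted past bit 31 are lost before widening to 64 bits. */
static uint64_t shift_u32(void)
{
	return DEMO_CONST << 10;
}

/* Fixed: widen first, then shift in 64-bit arithmetic. */
static uint64_t shift_u64(void)
{
	return (uint64_t)DEMO_CONST << 10;
}
```

With these values, the u64 version yields 5120000000, while the u32 version wraps modulo 2^32 to 825032704.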
Tested-by: Kurt Kanzenbach <kurt@linutronix.de> # imx8mp
====================
Johannes Zink [Tue, 1 Aug 2023 15:44:30 +0000 (17:44 +0200)]
net: stmmac: dwmac-imx: enable MAC propagation delay correction for i.MX8MP
As the i.MX8MP supports reading the MAC propagation delay and correcting the
hardware timestamp counter for additional delays [1], enable the feature
for this SoC.
This reduces the phase error of the PPS output from the PTP Hardware Clock
from approx. 150ns to 100ns.
Johannes Zink [Tue, 1 Aug 2023 15:44:29 +0000 (17:44 +0200)]
net: stmmac: correct MAC propagation delay
The IEEE1588 standard specifies that packet timestamps must be
captured when the PTP message timestamp point (leading edge of the first
octet after the start of frame delimiter) crosses the boundary between
the node and the network. As the MAC latches the timestamp at an
internal point, the captured timestamp must be corrected for the
additional data transmission latency, as described in the publicly
available datasheet [1].
This patch only corrects for the MAC-internal delay, which can be read
out from the MAC_Ingress_Timestamp_Latency register on DWMAC version 5,
since the PHY framework currently does not support querying the PHY
ingress and egress latency. The Clock Domain Crossing Circuit errors as
indicated in [1] are already accounted for in the
stmmac_get_tx_hwtstamp() function and are not corrected here.
As the latency varies for different link speeds and MII
modes of operation, the correction value needs to be updated on each
link state change.
As the delay also causes a phase shift in the timestamp counter compared
to the rest of the network, this correction will also reduce phase error
when generating PPS outputs from the timestamp counter.
Since the correction registers may be unavailable on some hardware and
no feature bits are documented for dynamically detecting the MAC
propagation delay readout, introduce a feature bit to explicitly enable
MAC delay correction in the glue code driver.
Maher Sanalla [Mon, 12 Jun 2023 07:13:50 +0000 (10:13 +0300)]
net/mlx5: Allocate completion EQs dynamically
This commit enables the dynamic allocation of EQs at runtime, allowing
for more flexibility in managing completion EQs and reducing the memory
overhead of driver load. Whenever a CQ is created for a given vector
index, the driver looks up whether there is an already mapped
completion EQ for that vector; if so, it uses it. Otherwise, it allocates
a new EQ on demand and then uses it for the CQ completion events.
Add a lock to the EQ table to protect against concurrent EQ
creation attempts.
While at it, replace mlx5_vector2irqn()/mlx5_vector2eqn() with
mlx5_comp_eqn_get() and mlx5_comp_irqn_get(), which allocate an
EQ on demand if no EQ is found for the given vector.
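The lookup-or-allocate pattern under a table lock can be sketched in user space. The sketch uses a fixed array and a pthread mutex where the driver uses an xarray and its own lock; `comp_eq_get()`, `struct eq_table` and MAX_VECTORS are hypothetical and only mirror the behavior described for mlx5_comp_eqn_get().

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_VECTORS 16	/* hypothetical upper bound on vectors */

struct eq { int vecidx; };	/* hypothetical, stripped-down EQ */

struct eq_table {
	pthread_mutex_t lock;		/* serializes concurrent creation */
	struct eq *comp_eqs[MAX_VECTORS];
	int created;			/* number of EQs that exist */
};

/* Return the EQ for 'vecidx', allocating it on first use. */
static struct eq *comp_eq_get(struct eq_table *t, int vecidx)
{
	struct eq *eq;

	if (vecidx < 0 || vecidx >= MAX_VECTORS)
		return NULL;

	pthread_mutex_lock(&t->lock);
	eq = t->comp_eqs[vecidx];
	if (!eq) {
		/* Not mapped yet: create it while holding the lock. */
		eq = malloc(sizeof(*eq));
		if (eq) {
			eq->vecidx = vecidx;
			t->comp_eqs[vecidx] = eq;
			t->created++;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return eq;
}
```

Holding the lock across both the lookup and the allocation is what prevents two concurrent CQ creations on the same vector from each allocating an EQ.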
Maher Sanalla [Thu, 22 Jun 2023 16:05:46 +0000 (19:05 +0300)]
net/mlx5: Handle SF IRQ request in the absence of SF IRQ pool
In case the SF IRQ pool is not available due to setup limitations,
SF currently relies on the already allocated PF IRQs to fulfill
its IRQ vector requests.
However, with the dynamic EQ allocation introduced in the next patch,
it is possible that not all PF IRQs will be allocated after the driver
is loaded. In such a case, if an SF requests a completion IRQ without
having its own independent IRQ pool, the SF will lack a PF IRQ to use.
To address this scenario, allocate an IRQ for the SF from the PF's IRQ pool
on demand. The new IRQ will be shared between the SF and its PF.
Maher Sanalla [Thu, 22 Jun 2023 15:52:44 +0000 (18:52 +0300)]
net/mlx5: Rename mlx5_comp_vectors_count() to mlx5_comp_vectors_max()
To accurately represent its purpose, rename the function that retrieves
the value of maximum vectors from mlx5_comp_vectors_count() to
mlx5_comp_vectors_max().
Maher Sanalla [Mon, 12 Jun 2023 08:58:14 +0000 (11:58 +0300)]
net/mlx5: Add IRQ vector to CPU lookup function
Currently, by the time driver load completes, IRQ requests have been
performed for all vectors. However, as we move to support dynamic creation
of EQs, this will no longer be the case, as some IRQs will not exist at
this stage. In such a case, use the default CPU-to-IRQ mapping, which is
the serial mapping based on IRQ vector index, meaning the n'th vector gets
mapped to the n'th CPU.
Introduce an API function mlx5_comp_vector_cpu() that takes an IRQ index and
provides the corresponding CPU mapping. It uses the existing IRQ
affinity if defined, or falls back to the default serialized CPU mapping
otherwise.
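A sketch of the fallback rule, with hypothetical types: if the vector's IRQ exists, its affinity CPU is returned, otherwise the serial default maps vector n to CPU n. Wrapping modulo the CPU count for vectors beyond the last CPU is an added assumption, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VECTORS 8	/* hypothetical */

/* Hypothetical per-vector IRQ state. */
struct irq_model {
	bool allocated;
	int cpu;	/* affinity CPU, valid only when allocated */
};

/* Use the IRQ's affinity when the IRQ exists, else the serial default. */
static int comp_vector_cpu(const struct irq_model irqs[], int vector,
			   int num_cpus)
{
	assert(vector >= 0 && vector < MAX_VECTORS && num_cpus > 0);

	if (irqs[vector].allocated)
		return irqs[vector].cpu;
	return vector % num_cpus;	/* vector n -> CPU n (wrapping) */
}
```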
Maher Sanalla [Sun, 11 Jun 2023 16:55:02 +0000 (19:55 +0300)]
net/mlx5: Implement single completion EQ create/destroy methods
Currently, the create_comp_eqs() function handles the creation of all
completion EQs for all the vectors on driver load, while on driver
unload, destroy_comp_eqs() performs the equivalent job.
In preparation for dynamic EQ creation, replace create_comp_eqs() /
destroy_comp_eqs() with create_comp_eq() / destroy_comp_eq() functions,
which receive a vector index and allocate/destroy an EQ for that
specific vector, allowing more flexibility in the management
of completion EQs.
Maher Sanalla [Mon, 19 Jun 2023 12:01:43 +0000 (15:01 +0300)]
net/mlx5: Use xarray to store and manage completion EQs
Use xarray to store the completion EQs instead of a linked list.
The xarray offers more scalability, reduced memory overhead, and
facilitates the lookup of a certain EQ given a vector index.
Maher Sanalla [Mon, 12 Jun 2023 12:34:27 +0000 (15:34 +0300)]
net/mlx5: Refactor completion IRQ request/release handlers in EQ layer
Break the completion IRQ request/release functions into per-vector
handlers for both PCI devices and SFs in the EQ layer.
On EQ table creation, loop over all vectors and request an IRQ for each
one using the new per-vector functions. Perform the symmetrical change
when releasing IRQs on EQ table cleanup.
Maher Sanalla [Sun, 18 Jun 2023 16:23:24 +0000 (19:23 +0300)]
net/mlx5: Use xarray to store and manage completion IRQs
Use xarray to store the completion IRQs instead of a fixed-size allocated
array as not all completion IRQs will be requested on driver load, but
rather on demand when an EQ is created. The xarray offers more scalability,
reduced memory overhead, and provides the ability to dynamically resize the
array when needed.
Maher Sanalla [Sun, 11 Jun 2023 11:35:36 +0000 (14:35 +0300)]
net/mlx5: Refactor completion IRQ request/release API
Introduce a per-vector completion IRQ request API that requests a
single IRQ for a given vector index, instead of the multiple-IRQ request API.
On driver load, loop over all completion vectors and request an IRQ for
each one via the newly introduced API.
Symmetrically, introduce a per-vector IRQ release API. On driver
unload, loop over all vectors and release each completion IRQ via
the new per-vector API.
As IRQ vectors will be requested dynamically later in the patchset,
add a cpumask of the bound CPUs to avoid mapping
two IRQs of the same device to the same CPU.
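The bound-CPU mask can be pictured as a bitmap consulted on every per-vector request: a new IRQ is placed on an allowed CPU that no other IRQ of the device occupies, and release clears the bit. The 64-CPU bitmap, the first-free policy and the function names are assumptions for illustration; the driver uses a struct cpumask.

```c
#include <assert.h>
#include <stdint.h>

/* Pick the first allowed CPU not yet bound to one of this device's
 * IRQs and mark it as bound. Returns -1 if none is left. */
static int pick_unused_cpu(uint64_t *used_mask, uint64_t allowed_mask)
{
	uint64_t candidates = allowed_mask & ~*used_mask;

	for (int cpu = 0; cpu < 64; cpu++) {
		if (candidates & (UINT64_C(1) << cpu)) {
			*used_mask |= UINT64_C(1) << cpu;
			return cpu;
		}
	}
	return -1;
}

/* Release path: unbinding an IRQ frees its CPU for later requests. */
static void release_cpu(uint64_t *used_mask, int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	*used_mask &= ~(UINT64_C(1) << cpu);
}
```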
Maher Sanalla [Fri, 9 Jun 2023 12:44:18 +0000 (15:44 +0300)]
net/mlx5: Track the current number of completion EQs
In preparation for allocating completion EQs dynamically, add a counter
to track the number of completion EQs currently allocated. Store the
maximum number of EQs in the max_comp_eqs variable.
Yue Haibing [Sat, 5 Aug 2023 11:00:09 +0000 (19:00 +0800)]
udp/udplite: Remove unused function declarations udp{,lite}_get_port()
Commit 6ba5a3c52da0 ("[UDP]: Make full use of proto.h.udp_hash innovation.")
removed these implementations but left the declarations behind.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Fri, 4 Aug 2023 13:49:39 +0000 (16:49 +0300)]
net: omit ndo_hwtstamp_get() call when possible in dev_set_hwtstamp_phylib()
Setting dev->priv_flags & IFF_SEE_ALL_HWTSTAMP_REQUESTS is only legal
for drivers which were converted to ndo_hwtstamp_get() and
ndo_hwtstamp_set(), and it is only there that we call ndo_hwtstamp_set()
for a request that otherwise goes to phylib (for stuff like packet traps,
which need to be undone if phylib failed, hence the old_cfg logic).
The problem is that we end up calling ndo_hwtstamp_get() when we don't
need to (even if the SIOCSHWTSTAMP wasn't intended for phylib, or if it
was, but the driver didn't set IFF_SEE_ALL_HWTSTAMP_REQUESTS). For those
unnecessary conditions, we share a code path with virtual drivers (vlan,
macvlan, bonding) where ndo_hwtstamp_get() is implemented as
generic_hwtstamp_get_lower(), and may be resolved through
generic_hwtstamp_ioctl_lower() if the lower device is unconverted.
I.e. this situation:
$ ip link add link eno0 name eno0.100 type vlan id 100
$ hwstamp_ctl -i eno0.100 -t 1
We are unprepared to deal with this, because if ndo_hwtstamp_get() is
resolved through a legacy ndo_eth_ioctl(SIOCGHWTSTAMP) lower_dev
implementation, that needs a non-NULL old_cfg.ifr pointer, and we don't
have it.
But we don't need to deal with it at all. In the general case,
drivers may not even implement SIOCGHWTSTAMP handling, only SIOCSHWTSTAMP,
so it makes sense to completely avoid a SIOCGHWTSTAMP call if we can.
The solution is to split the single "if" condition into 3 smaller ones,
thus separating the decision to call ndo_hwtstamp_get() from the
decision to call ndo_hwtstamp_set(). The third "if" condition is
identical to the first one, and both are subsets of the second one.
Thus, the "cfg" argument of kernel_hwtstamp_config_changed() is always
valid.
Reported-by: Eric Dumazet <edumazet@google.com>
Closes: https://lore.kernel.org/netdev/CANn89iLOspJsvjPj+y8jikg7erXDomWe8sqHMdfL_2LQSFrPAg@mail.gmail.com/
Fixes: fd770e856e22 ("net: remove phy_has_hwtstamp() -> phy_mii_ioctl() decision from converted drivers")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
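The three-condition split described above can be modeled as a small truth table. This is a hypothetical userspace model, not the code of dev_set_hwtstamp_phylib(): phy_ts stands for "the request goes to phylib", see_all for "the driver set IFF_SEE_ALL_HWTSTAMP_REQUESTS", and the counters merely record which callbacks would run. It shows that the get call now happens only when both flags hold, which is a subset of the cases that call set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical call counters standing in for the real callbacks. */
struct toy_ts_calls {
	int  get_calls;       /* stands in for ndo_hwtstamp_get() */
	int  set_calls;       /* stands in for ndo_hwtstamp_set() */
	int  phy_calls;       /* stands in for the phylib set path */
	bool changed_checked; /* old config compared with new one */
};

static void toy_set_hwtstamp(struct toy_ts_calls *c, bool phy_ts, bool see_all)
{
	/* 1: fetch the old config only if a rollback may be needed */
	if (phy_ts && see_all)
		c->get_calls++;

	/* 2: the MAC driver sees the request */
	if (!phy_ts || see_all)
		c->set_calls++;

	/* 3: identical to 1, so the old config is always valid here */
	if (phy_ts && see_all)
		c->changed_checked = true;

	if (phy_ts)
		c->phy_calls++;
}
```

Note that conditions 1 and 3 imply see_all, which in turn implies condition 2, matching the "subsets" argument in the commit message.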
Yang Yingliang [Fri, 4 Aug 2023 09:35:31 +0000 (17:35 +0800)]
net: ethernet: adi: adin1110: use eth_broadcast_addr() to assign broadcast address
Use eth_broadcast_addr() to assign the broadcast address instead
of memset().
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
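A minimal userspace sketch of what the replacement amounts to, assuming the kernel helper simply sets all ETH_ALEN octets to 0xff, i.e. the same effect as the memset() it replaces but with the intent spelled out. The names toy_eth_broadcast_addr() and toy_is_broadcast() are hypothetical copies for illustration, not the kernel functions themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6 /* octets in an Ethernet address */

/* Fill the address with the broadcast value ff:ff:ff:ff:ff:ff. */
static void toy_eth_broadcast_addr(uint8_t *addr)
{
	memset(addr, 0xff, ETH_ALEN);
}

/* True only if every octet is 0xff. */
static bool toy_is_broadcast(const uint8_t *addr)
{
	for (int i = 0; i < ETH_ALEN; i++)
		if (addr[i] != 0xff)
			return false;
	return true;
}
```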
Yu Liao [Fri, 4 Aug 2023 09:21:43 +0000 (17:21 +0800)]
ibmvnic: remove unused rc variable
gcc with W=1 reports
drivers/net/ethernet/ibm/ibmvnic.c:194:13: warning: variable 'rc' set but not used [-Wunused-but-set-variable]
This variable is set but never used, so remove it.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308040609.zQsSXWXI-lkp@intel.com/
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Nick Child <nnac123@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Fri, 4 Aug 2023 20:33:53 +0000 (13:33 -0700)]
net: mana: Add page pool for RX buffers
Add a page pool for RX buffers to speed up buffer recycling and reduce
CPU usage.
The standard page pool API is used.
In an iperf test with 128 threads, this patch improved throughput
by 12-15% and reduced the usage of the CPU handling the IRQ from
99-100% to 10-50%.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
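To illustrate why a pool cuts allocator work, here is a toy recycling cache. It is a hypothetical sketch of the general recycling idea only; it does not use or mimic the kernel page_pool API, and POOL_CAP, toy_pool_get() and toy_pool_put() are invented for illustration. Buffers returned to the pool are handed out again without calling the allocator.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_CAP 4 /* hypothetical number of cached buffers */

struct toy_page_pool {
	void  *cache[POOL_CAP]; /* recycled buffers ready for reuse */
	int    count;           /* buffers currently cached */
	size_t buf_size;        /* size of each buffer */
	int    fresh_allocs;    /* how often the allocator was hit */
};

/* Fast path: reuse a cached buffer; slow path: allocate a new one. */
static void *toy_pool_get(struct toy_page_pool *p)
{
	if (p->count > 0)
		return p->cache[--p->count];
	p->fresh_allocs++;
	return malloc(p->buf_size);
}

/* Keep the buffer for reuse if there is room, otherwise free it. */
static void toy_pool_put(struct toy_page_pool *p, void *buf)
{
	if (p->count < POOL_CAP)
		p->cache[p->count++] = buf;
	else
		free(buf);
}

/* Free everything still cached. */
static void toy_pool_destroy(struct toy_page_pool *p)
{
	while (p->count > 0)
		free(p->cache[--p->count]);
}
```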
David S. Miller [Sun, 6 Aug 2023 07:34:37 +0000 (08:34 +0100)]
Merge branch 'gve-desc'
Rushil Gupta says:
====================
gve: Add QPL mode for DQO descriptor format
GVE supports QPL ("queue-page-list") mode, where
all data is communicated through a set of pre-registered
pages. This series adds this mode to the DQO descriptor format.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Rushil Gupta [Fri, 4 Aug 2023 21:34:44 +0000 (21:34 +0000)]
gve: update gve.rst
Add a note about QPL and RDA modes.
Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rushil Gupta [Fri, 4 Aug 2023 21:34:43 +0000 (21:34 +0000)]
gve: RX path for DQO-QPL
The RX path allocates the QPL page pool at queue creation and
tries to reuse these pages through page recycling. This patch
ensures that no non-QPL pages are posted to the device on refill.
When the driver is running low on free buffers, an on-demand
allocation step kicks in that allocates a non-QPL page for
the SKB, freeing up the QPL page in use.
gve_try_recycle_buf() was moved to gve_rx_append_frags() so that the
driver does not attempt to mark a buffer as used if a non-QPL page was
allocated on demand.
Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
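The refill and on-demand copy logic described above can be sketched as follows. This is a hypothetical model, not gve code: the page count, page size and low-watermark threshold are assumed values, and toy_refill()/toy_rx_complete() are invented names. Refill only hands out QPL page ids; when free QPL pages run low, the received data is copied into a freshly allocated non-QPL buffer so the QPL page can go straight back to the free list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_QPL_PAGES     4  /* assumed number of registered pages */
#define TOY_PAGE_BYTES    64 /* assumed (tiny) page size */
#define TOY_LOW_WATERMARK 1  /* assumed "running low" threshold */

struct toy_rx {
	unsigned char qpl[TOY_QPL_PAGES][TOY_PAGE_BYTES]; /* registered pages */
	int free_ids[TOY_QPL_PAGES]; /* QPL pages the driver may post */
	int nfree;
};

/* Refill only ever posts QPL page ids, never a non-QPL page. */
static int toy_refill(struct toy_rx *rx)
{
	return rx->nfree > 0 ? rx->free_ids[--rx->nfree] : -1;
}

/* Data arrived in QPL page "id". If free pages are scarce, copy it into a
 * newly allocated non-QPL buffer for the SKB and recycle the QPL page at
 * once. Otherwise the QPL page itself goes up the stack and must be
 * returned to free_ids later by the caller. */
static unsigned char *toy_rx_complete(struct toy_rx *rx, int id, size_t len,
				      bool *copied)
{
	assert(len <= TOY_PAGE_BYTES);
	if (rx->nfree <= TOY_LOW_WATERMARK) {
		unsigned char *copy = malloc(len);

		if (copy) {
			memcpy(copy, rx->qpl[id], len);
			rx->free_ids[rx->nfree++] = id; /* QPL page is free again */
			*copied = true;
			return copy;
		}
	}
	*copied = false;
	return rx->qpl[id];
}
```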
Rushil Gupta [Fri, 4 Aug 2023 21:34:42 +0000 (21:34 +0000)]
gve: Tx path for DQO-QPL
Each QPL page is divided into GVE_TX_BUFS_PER_PAGE_DQO buffers.
When a packet needs to be transmitted, we break it into chunks of at
most GVE_TX_BUF_SIZE_DQO bytes and transmit each chunk using a TX
descriptor.
We allocate the TX buffers from the free list in dqo_tx.
We store these TX buffer indices in an array in the pending_packet
structure.
The TX buffers are returned to the free list in dqo_compl after
receiving a packet completion or when removing packets from the miss
completions list.
Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
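The TX buffer bookkeeping described above can be sketched with a simple index free list. This is a hypothetical model: TOY_BUF_SIZE and TOY_BUFS_PER_PAGE only stand in for GVE_TX_BUF_SIZE_DQO and GVE_TX_BUFS_PER_PAGE_DQO, and their values here are assumptions. A packet takes one buffer per chunk, the indices are stored in the pending packet, and completion returns them to the free list.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define TOY_BUF_SIZE         2048 /* stands in for GVE_TX_BUF_SIZE_DQO */
#define TOY_BUFS_PER_PAGE    2    /* stands in for GVE_TX_BUFS_PER_PAGE_DQO */
#define TOY_NUM_BUFS         8    /* 4 QPL pages x 2 buffers */
#define TOY_MAX_BUFS_PER_PKT 4

struct toy_tx {
	int free_list[TOY_NUM_BUFS]; /* indices of unused TX buffers */
	int nfree;
};

struct toy_pending_packet {
	int buf_ids[TOY_MAX_BUFS_PER_PKT]; /* buffers holding this packet */
	int num_bufs;
};

/* A buffer index maps to a QPL page and an offset inside it. */
static int toy_buf_page(int id) { return id / TOY_BUFS_PER_PAGE; }
static size_t toy_buf_offset(int id)
{
	return (size_t)(id % TOY_BUFS_PER_PAGE) * TOY_BUF_SIZE;
}

/* One buffer (and one descriptor) per chunk of at most TOY_BUF_SIZE
 * bytes. Returns the chunk count, or -1 if buffers are insufficient. */
static int toy_tx_map(struct toy_tx *tx, struct toy_pending_packet *pkt,
		      size_t len)
{
	int need = (int)((len + TOY_BUF_SIZE - 1) / TOY_BUF_SIZE);

	if (need > TOY_MAX_BUFS_PER_PKT || need > tx->nfree)
		return -1;
	pkt->num_bufs = 0;
	for (int i = 0; i < need; i++)
		pkt->buf_ids[pkt->num_bufs++] = tx->free_list[--tx->nfree];
	return need;
}

/* On completion (or miss-completion cleanup), free the buffers. */
static void toy_tx_complete(struct toy_tx *tx, struct toy_pending_packet *pkt)
{
	for (int i = 0; i < pkt->num_bufs; i++)
		tx->free_list[tx->nfree++] = pkt->buf_ids[i];
	pkt->num_bufs = 0;
}
```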
Rushil Gupta [Fri, 4 Aug 2023 21:34:41 +0000 (21:34 +0000)]
gve: Control path for DQO-QPL
GVE supports QPL ("queue-page-list") mode, where
all data is communicated through a set of pre-registered
pages. This patch adds this mode to the DQO descriptor format.
Add checks, ABI changes and device options to support
QPL mode for DQO in addition to GQI. Also, use the
pages-per-qpl value supplied by a device option to control
the size of the "queue-page-list".
Signed-off-by: Rushil Gupta <rushilg@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com>
Signed-off-by: Bailey Forrest <bcf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 4 Aug 2023 14:46:16 +0000 (14:46 +0000)]
tcp: set TCP_DEFER_ACCEPT locklessly
The rskq_defer_accept field can be read and written without
holding the socket lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 4 Aug 2023 14:46:15 +0000 (14:46 +0000)]
tcp: set TCP_LINGER2 locklessly
tp->linger2 can be set locklessly as long as readers
use READ_ONCE().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
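The pattern shared by these lockless setsockopt commits (a single marked store on the write side, marked loads on every lockless reader) can be sketched in userspace with C11 relaxed atomics, which are roughly the portable analogue of the kernel's WRITE_ONCE()/READ_ONCE() for a word-sized field. The struct and helper names below are hypothetical; the real code operates on tcp_sock fields.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a tcp_sock field such as tp->linger2. */
struct toy_tcp_sock {
	_Atomic int linger2;
};

/* setsockopt side: no socket lock, just one untorn store
 * (plays the role of WRITE_ONCE()). */
static void toy_set_linger2(struct toy_tcp_sock *tp, int val)
{
	atomic_store_explicit(&tp->linger2, val, memory_order_relaxed);
}

/* Reader side: each lockless reader uses a marked load
 * (plays the role of READ_ONCE()). */
static int toy_get_linger2(struct toy_tcp_sock *tp)
{
	return atomic_load_explicit(&tp->linger2, memory_order_relaxed);
}
```

Relaxed ordering is enough here because each field is an independent value; nothing else is published together with it.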
Eric Dumazet [Fri, 4 Aug 2023 14:46:14 +0000 (14:46 +0000)]
tcp: set TCP_KEEPCNT locklessly
tp->keepalive_probes can be set locklessly; readers
already handle this field being potentially
set by other threads.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 4 Aug 2023 14:46:13 +0000 (14:46 +0000)]
tcp: set TCP_KEEPINTVL locklessly
tp->keepalive_intvl can be set locklessly; readers
already handle this field being potentially
set by other threads.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 4 Aug 2023 14:46:12 +0000 (14:46 +0000)]
tcp: set TCP_USER_TIMEOUT locklessly
icsk->icsk_user_timeout can be set locklessly,
if all read sides use READ_ONCE().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 5 Aug 2023 01:34:25 +0000 (18:34 -0700)]
Merge tag 'wireless-next-2023-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.6
The first pull request for v6.6, with only driver patches this time.
Nothing special really stands out; it has been quiet, most likely due
to vacations.
Major changes:
rtl8xxxu
- enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU),
RTL8192EU and RTL8723BU
mwifiex
- allow moving to a different namespace
mt76
- preparation for mt7925 support
- mt7981 support
ath12k
- Extremely High Throughput (EHT) PHY support for Wi-Fi 7
* tag 'wireless-next-2023-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (172 commits)
wifi: rtw89: return failure if needed firmware elements are not recognized
wifi: rtw89: add to parse firmware elements of BB and RF tables
wifi: rtw89: introduce infrastructure of firmware elements
wifi: rtw89: add firmware suit for BB MCU 0/1
wifi: rtw89: add firmware parser for v1 format
wifi: rtw89: introduce v1 format of firmware header
wifi: rtw89: support firmware log with formatted text
wifi: rtw89: recognize log format from firmware file
wifi: ath12k: avoid deadlock by change ieee80211_queue_work for regd_update_work
wifi: ath12k: add handler for scan event WMI_SCAN_EVENT_DEQUEUED
wifi: ath12k: relax list iteration in ath12k_mac_vif_unref()
wifi: ath12k: configure puncturing bitmap
wifi: ath12k: parse WMI service ready ext2 event
wifi: ath12k: add MLO header in peer association
wifi: ath12k: peer assoc for 320 MHz
wifi: ath12k: add WMI support for EHT peer
wifi: ath12k: prepare EHT peer assoc parameters
wifi: ath12k: add EHT PHY modes
wifi: ath12k: propagate EHT capabilities to userspace
wifi: ath12k: WMI support to process EHT capabilities
...
====================