net: Correctly set segment mac_len in skb_segment().
When performing segmentation, the mac_len value is copied right
out of the original skb. However, this value is not always set correctly
(like when the packet is VLAN-tagged) and we'll end up copying a bad
value.
One way to demonstrate this is to configure a VM which tags
packets internally and turn off VLAN acceleration on the forwarding
bridge port. The packets show up corrupt like this:
16:18:24.985548 52:54:00:ab:be:25 > 52:54:00:26:ce:a3, ethertype 802.1Q
(0x8100), length 1518: vlan 100, p 0, ethertype 0x05e0,
0x0000: 8cdb 1c7c 8cdb 0064 4006 b59d 0a00 6402 ...|...d@.....d.
0x0010: 0a00 6401 9e0d b441 0a5e 64ec 0330 14fa ..d....A.^d..0..
0x0020: 29e3 01c9 f871 0000 0101 080a 000a e833)....q.........3
0x0030: 000f 8c75 6e65 7470 6572 6600 6e65 7470 ...unetperf.netp
0x0040: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
0x0050: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
0x0060: 6572 6600 6e65 7470 6572 6600 6e65 7470 erf.netperf.netp
...
This also leads to awful throughput as GSO packets are dropped and
cause retransmissions.
The solution is to set the mac_len using the values already available
in then new skb. We've already adjusted all of the header offset, so we
might as well correctly figure out the mac_len using skb_reset_mac_len().
After this change, packets are segmented correctly and performance
is restored.
CC: Eric Dumazet <edumazet@google.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 1 Aug 2014 05:23:58 +0000 (22:23 -0700)]
Merge branch 'xen-netfront'
David Vrabel says:
====================
xen-netfront: more multiqueue fixes
A few more xen-netfront fixes for the multiqueue support added in
3.16-rc1. It would be great if these could make it into 3.16 but I
suspect it's a little late for that now.
The second patch fixes a significant resource leak that prevents
guests from migrating more than a handful of times.
These have been tested by repeatedly migrating a guest over 250 times
(it would previously fail with this guest after only 8 iterations).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Vrabel [Thu, 31 Jul 2014 16:38:23 +0000 (17:38 +0100)]
xen-netfront: release per-queue Tx and Rx resource when disconnecting
Since netfront may reconnect to a backend with a different number of
queues, all per-queue Rx and Tx resources (skbs and grant references)
should be freed when disconnecting.
Without this fix, the Tx and Rx grant refs are not released and
netfront will exhaust them after only a few reconnections. netfront
will fail to connect when no free grant references are available.
Since all Rx bufs are freed and reallocated instead of reused this
will add some additional delay to the reconnection but this is
expected to be small compared to the time taken by any backend hotplug
scripts etc.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
macvlan: Initialize vlan_features to turn on offload support.
Macvlan devices do not initialize vlan_features. As a result,
any vlan devices configured on top of macvlans perform very poorly.
Initialize vlan_features based on the vlan features of the lower-level
device.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bridge: Don't include NDA_VLAN for FDB entries with vid 0
An FDB entry with vlan_id 0 doesn't mean it is used in vlan 0, but used when
vlan_filtering is disabled.
There is inconsistency around NDA_VLAN whose payload is 0 - even if we add
an entry by RTM_NEWNEIGH without any NDA_VLAN, and even though adding an
entry with NDA_VLAN 0 is prohibited, we get an entry with NDA_VLAN 0 by
RTM_GETNEIGH.
Dumping an FDB entry with vlan_id 0 shouldn't include NDA_VLAN.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: use kobject_put instead of _del after kobject_add
Otherwise the name of the kobject isn't getting freed and other stuff from
kobject_cleanup() isn't getting called. kobject_put() will call
kobject_del() on its own in kobject_cleanup().
CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Tue, 29 Jul 2014 14:29:30 +0000 (16:29 +0200)]
bna: fix performance regression
The recent commit "e29aa33 bna: Enable Multi Buffer RX" is causing
a performance regression. It does not properly update 'cmpl' pointer
at the end of the loop in NAPI handler bnad_cq_process(). The result is
only one packet / per NAPI-schedule is processed.
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Paasch [Tue, 29 Jul 2014 11:40:57 +0000 (13:40 +0200)]
tcp: Fix integer-overflow in TCP vegas
In vegas we do a multiplication of the cwnd and the rtt. This
may overflow and thus their result is stored in a u64. However, we first
need to cast the cwnd so that actually 64-bit arithmetic is done.
Then, we need to do do_div to allow this to be used on 32-bit arches.
Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Doug Leith <doug.leith@nuim.ie> Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Christoph Paasch [Tue, 29 Jul 2014 10:07:27 +0000 (12:07 +0200)]
tcp: Fix integer-overflows in TCP veno
In veno we do a multiplication of the cwnd and the rtt. This
may overflow and thus their result is stored in a u64. However, we first
need to cast the cwnd so that actually 64-bit arithmetic is done.
A first attempt at fixing 76f1017757aa0 ([TCP]: TCP Veno congestion
control) was made by 159131149c2 (tcp: Overflow bug in Vegas), but it
failed to add the required cast in tcp_veno_cong_avoid().
Fixes: 76f1017757aa0 ([TCP]: TCP Veno congestion control) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
ip_tunnel(ipv4): fix tunnels with "local any remote $remote_ip"
Ipv4 tunnels created with "local any remote $ip" didn't work properly since 7d442fab0 (ipv4: Cache dst in tunnels). 99% of packets sent via those tunnels
had src addr = 0.0.0.0. That was because only dst_entry was cached, although
fl4.saddr has to be cached too. Every time ip_tunnel_xmit used cached dst_entry
(tunnel_rtable_get returned non-NULL), fl4.saddr was initialized with
tnl_params->saddr (= 0 in our case), and wasn't changed until iptunnel_xmit().
This patch adds saddr to ip_tunnel->dst_cache, fixing this issue.
Reported-by: Sergey Popov <pinkbyte@gentoo.org> Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Mon, 28 Jul 2014 13:03:52 +0000 (15:03 +0200)]
bna: fill the magic in bnad_get_eeprom() instead of validating
A driver should fill magic field of ethtool_eeprom struct in .get_eeprom
and validate it in .set_eeprom. The bna incorrectly validates it in both
and this makes its .get_eeprom interface unusable.
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux
Pull Exynos platform DT fix from Grant Likely:
"Device tree Exynos bug fix for v3.16-rc7
This bug fix has been brewing for a while. I hate sending it to you
so late, but I only got confirmation that it solves the problem this
past weekend. The diff looks big for a bug fix, but the majority of
it is only executed in the Exynos quirk case. Unfortunately it
required splitting early_init_dt_scan() in two and adding quirk
handling in the middle of it on ARM.
Exynos has buggy firmware that puts bad data into the memory node.
Commit 1c2f87c22566 ("ARM: Get rid of meminfo") exposed the bug by
dropping the artificial upper bound on the number of memory banks that
can be added. Exynos fails to boot after that commit. This branch
fixes it by splitting the early DT parse function and inserting a
fixup hook. Exynos uses the hook to correct the DT before parsing
memory regions"
* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
arm: Add devicetree fixup machine function
of: Add memory limiting function for flattened devicetrees
of: Split early_init_dt_scan into two parts
Merge tag 'stable/for-linus-3.16-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen fix from David Vrabel:
"Fix BUG when trying to expand the grant table. This seems to occur
often during boot with Ubuntu 14.04 PV guests"
* tag 'stable/for-linus-3.16-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: safely map and unmap grant frames when in atomic context
As reported by Stephen Rothwell, it causes compile failures in certain
configurations:
drivers/net/usb/cdc_subset.c:360:15: error: 'dummy_prereset' undeclared here (not in a function)
.pre_reset = dummy_prereset,
^
drivers/net/usb/cdc_subset.c:361:16: error: 'dummy_postreset' undeclared here (not in a function)
.post_reset = dummy_postreset,
^
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: David Miller <davem@davemloft.net> Cc: Oliver Neukum <oneukum@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1) Make fragmentation IDs less predictable, from Eric Dumazet.
2) TSO tunneling can crash in bnx2x driver, fix from Dmitry Kravkov.
3) Don't allow NULL msg->msg_name just because msg->msg_namelen is
non-zero, from Andrey Ryabinin.
4) ndm->ndm_type set using wrong macros, from Jun Zhao.
5) cdc-ether devices can come up with entries in their address filter,
so explicitly clear the filter after the device initializes. From
Oliver Neukum.
6) Forgotten refcount bump in xfrm_lookup(), from Steffen Klassert.
7) Short packets not padded properly, exposing random data, in bcmgenet
driver. Fix from Florian Fainelli.
8) xgbe_probe() doesn't return an error code, but rather zero, when
netif_set_real_num_tx_queues() fails. Fix from Wei Yongjun.
9) USB speed not probed properly in r8152 driver, from Hayes Wang.
10) Transmit logic choosing the outgoing port in the sunvnet driver
needs to consider a) is the port actually up and b) whether it is a
switch port. Fix from David L Stevens.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
net: phy: re-apply PHY fixups during phy_register_device
cdc-ether: clean packet filter upon probe
cdc_subset: deal with a device that needs reset for timeout
net: sendmsg: fix NULL pointer dereference
isdn/bas_gigaset: fix a leak on failure path in gigaset_probe()
ip: make IP identifiers less predictable
neighbour : fix ndm_type type error issue
sunvnet: only use connected ports when sending
can: c_can_platform: Fix raminit, use devm_ioremap() instead of devm_ioremap_resource()
bnx2x: fix crash during TSO tunneling
r8152: fix the checking of the usb speed
net: phy: Ensure the MDIO bus module is held
net: phy: Set the driver when registering an MDIO bus device
bnx2x: fix set_setting for some PHYs
hyperv: Fix error return code in netvsc_init_buf()
amd-xgbe: Fix error return code in xgbe_probe()
ath9k: fix aggregation session lockup
net: bcmgenet: correctly pad short packets
net: sctp: inherit auth_capable on INIT collisions
mac80211: fix crash on getting sta info with uninitialized rate control
...
David Vrabel [Fri, 11 Jul 2014 15:42:34 +0000 (16:42 +0100)]
x86/xen: safely map and unmap grant frames when in atomic context
arch_gnttab_map_frames() and arch_gnttab_unmap_frames() are called in
atomic context but were calling alloc_vm_area() which might sleep.
Also, if a driver attempts to allocate a grant ref from an interrupt
and the table needs expanding, then the CPU may already by in lazy MMU
mode and apply_to_page_range() will BUG when it tries to re-enable
lazy MMU mode.
These two functions are only used in PV guests.
Introduce arch_gnttab_init() to allocates the virtual address space in
advance.
Avoid the use of apply_to_page_range() by using saving and using the
array of PTE addresses from the alloc_vm_area() call (which ensures
that the required page tables are pre-allocated).
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Will Deacon [Fri, 25 Jul 2014 15:29:12 +0000 (16:29 +0100)]
kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform
If the physical address of GICV isn't page-aligned, then we end up
creating a stage-2 mapping of the page containing it, which causes us to
map neighbouring memory locations directly into the guest.
As an example, consider a platform with GICV at physical 0x2c02f000
running a 64k-page host kernel. If qemu maps this into the guest at
0x80010000, then guest physical addresses 0x80010000 - 0x8001efff will
map host physical region 0x2c020000 - 0x2c02efff. Accesses to these
physical regions may cause UNPREDICTABLE behaviour, for example, on the
Juno platform this will cause an SError exception to EL3, which brings
down the entire physical CPU resulting in RCU stalls / HYP panics / host
crashing / wasted weeks of debugging.
SBSA recommends that systems alias the 4k GICV across the bounding 64k
region, in which case GICV physical could be described as 0x2c020000 in
the above scenario.
This patch fixes the problem by failing the vgic probe if the physical
base address or the size of GICV aren't page-aligned. Note that this
generated a warning in dmesg about freeing enabled IRQs, so I had to
move the IRQ enabling later in the probe.
Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Joel Schopp <joel.schopp@amd.com> Cc: Don Dutile <ddutile@redhat.com> Acked-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Joel Schopp <joel.schopp@amd.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Laura Abbott [Tue, 15 Jul 2014 17:03:36 +0000 (10:03 -0700)]
arm: Add devicetree fixup machine function
Commit 1c2f87c22566cd057bc8cde10c37ae9da1a1bb76
(ARM: 8025/1: Get rid of meminfo) dropped the upper bound on
the number of memory banks that can be added as there was no
technical need in the kernel. It turns out though, some bootloaders
(specifically the arndale-octa exynos boards) may pass invalid memory
information and rely on the kernel to not parse this data. This is a
bug in the bootloader but we still need to work around this.
Work around this by introducing a dt_fixup function. This function
gets called before the flattened devicetree is scanned for memory
and the like. In this fixup function for exynos, limit the maximum
number of memory regions in the devicetree.
Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Tested-by: Andreas Färber <afaerber@suse.de>
[glikely: Added a comment and fixed up function name] Signed-off-by: Grant Likely <grant.likely@linaro.org>
Laura Abbott [Tue, 15 Jul 2014 17:03:35 +0000 (10:03 -0700)]
of: Add memory limiting function for flattened devicetrees
Buggy bootloaders may pass bogus memory entries in the devicetree.
Add of_fdt_limit_memory to add an upper bound on the number of
entries that can be present in the devicetree.
Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Tested-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Grant Likely <grant.likely@linaro.org>
Laura Abbott [Tue, 15 Jul 2014 17:03:34 +0000 (10:03 -0700)]
of: Split early_init_dt_scan into two parts
Currently, early_init_dt_scan validates the header, sets the
boot params, and scans for chosen/memory all in one function.
Split this up into two separate functions (validation/setting
boot params in one, scanning in another) to allow for
additional setup between boot params and scanning the memory.
Signed-off-by: Laura Abbott <lauraa@codeaurora.org> Tested-by: Andreas Färber <afaerber@suse.de>
[glikely: s/early_init_dt_scan_all/early_init_dt_scan_nodes/] Signed-off-by: Grant Likely <grant.likely@linaro.org>
net: phy: re-apply PHY fixups during phy_register_device
Commit 87aa9f9c61ad ("net: phy: consolidate PHY reset in phy_init_hw()")
moved the call to phy_scan_fixups() in phy_init_hw() after a software
reset is performed.
By the time phy_init_hw() is called in phy_device_register(), no driver
has been bound to this PHY yet, so all the checks in phy_init_hw()
against the PHY driver and the PHY driver's config_init function will
return 0. We will therefore never call phy_scan_fixups() as we should.
Fix this by calling phy_scan_fixups() and check for its return value to
restore the intended functionality.
This broke PHY drivers which do register an early PHY fixup callback to
intercept the PHY probing and do things like changing the 32-bits unique
PHY identifier when a pseudo-PHY address has been used, as well as
board-specific PHY fixups that need to be applied during driver probe
time.
Reported-by: Hauke Merthens <hauke-m@hauke-m.de> Reported-by: Jonas Gorski <jogo@openwrt.org> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Neukum [Mon, 28 Jul 2014 08:56:36 +0000 (10:56 +0200)]
cdc-ether: clean packet filter upon probe
There are devices that don't do reset all the way. So the packet filter should
be set to a sane initial value. Failure to do so leads to intermittent failures
of DHCP on some systems under some conditions.
Signed-off-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
This reports means that we've come to netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0.
After this report there was no usual "Unable to handle kernel NULL pointer dereference"
and this gave me a clue that address 0 is mapped and contains valid socket address structure in it.
This bug was introduced in f3d3342602f8bcbf37d7c46641cb9bca7618eb1c
(net: rework recvmsg handler msg_name and msg_namelen logic).
Commit message states that:
"Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address."
But in fact this affects sendto when address 0 is mapped and contains
socket address structure in it. In such case copy-in address will succeed,
verify_iovec() function will successfully exit with msg->msg_namelen > 0
and msg->msg_name == NULL.
This patch fixes it by setting msg_namelen to 0 if msg_name == NULL.
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Eric Dumazet <edumazet@google.com> Cc: <stable@vger.kernel.org> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
isdn/bas_gigaset: fix a leak on failure path in gigaset_probe()
There is a lack of usb_put_dev(udev) on failure path in gigaset_probe().
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Acked-by: Tilman Schmidt <tilman@imap.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"A nice small set of bug fixes for arm-soc:
- two incorrect register addresses in DT files on shmobile and hisilicon
- one revert for a regression on omap
- one bug fix for a newly introduced pin controller binding
- one regression fix for the memory controller on omap
- one patch to avoid a harmless WARN_ON"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: Revert enabling of twl configuration for n900
ARM: dts: fix L2 address in Hi3620
ARM: OMAP2+: gpmc: fix gpmc_hwecc_bch_capable()
pinctrl: dra: dt-bindings: Fix pull enable/disable
ARM: shmobile: r8a7791: Fix SD2CKCR register address
ARM: OMAP2+: l2c: squelch warning dump on power control setting
Randy Dunlap [Sun, 27 Jul 2014 21:15:33 +0000 (14:15 -0700)]
mm: fix page_alloc.c kernel-doc warnings
Fix kernel-doc warnings and function name in mm/page_alloc.c:
Warning(..//mm/page_alloc.c:6074): No description found for parameter 'pfn'
Warning(..//mm/page_alloc.c:6074): No description found for parameter 'mask'
Warning(..//mm/page_alloc.c:6074): Excess function parameter 'start_bitidx' description in 'get_pfnblock_flags_mask'
Warning(..//mm/page_alloc.c:6102): No description found for parameter 'pfn'
Warning(..//mm/page_alloc.c:6102): No description found for parameter 'mask'
Warning(..//mm/page_alloc.c:6102): Excess function parameter 'start_bitidx' description in 'set_pfnblock_flags_mask'
Merge tag 'omap-for-v3.16/n900-regression' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
Merge "omap n900 regression fix for v3.16 rc series" from Tony Lindgren:
Minimal regression fix for n900 display that got broken with
enabling of twl4030 PM features. Turns out more work is needed
before we can enable twl4030 PM on n900.
I did not notice this earlier as I have my n900 in a rack
and the display did not get enabled for device tree based booting
until for v3.16.
* tag 'omap-for-v3.16/n900-regression' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: dts: Revert enabling of twl configuration for n900
Tony Lindgren [Fri, 25 Jul 2014 11:41:25 +0000 (04:41 -0700)]
ARM: dts: Revert enabling of twl configuration for n900
Commit 9188883fd66e9 (ARM: dts: Enable twl4030 off-idle configuration
for selected omaps) allowed n900 to cut off core voltages during
off-idle. This however caused a regression where twl regulator
vaux1 was not getting enabled for the LCD panel as we are not
requesting it for the panel.
Turns out quite a few devices on n900 are using vaux1, and we need
to either stop idling it, or add proper regulator_get calls for all
users. But until we have a proper solution implemented and tested,
let's just disable the twl off-idle configuration for now for n900.
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Fixes: 9188883fd66e9 (ARM: dts: Enable twl4030 off-idle configuration for selected omaps) Signed-off-by: Tony Lindgren <tony@atomide.com>
Eric Dumazet [Sat, 26 Jul 2014 06:58:10 +0000 (08:58 +0200)]
ip: make IP identifiers less predictable
In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
Jedidiah describe ways exploiting linux IP identifier generation to
infer whether two machines are exchanging packets.
With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we
changed IP id generation, but this does not really prevent this
side-channel technique.
This patch adds a random amount of perturbation so that IP identifiers
for a given destination [1] are no longer monotonically increasing after
an idle period.
Note that prandom_u32_max(1) returns 0, so if generator is used at most
once per jiffy, this patch inserts no hole in the ID suite and do not
increase collision probability.
This is jiffies based, so in the worst case (HZ=1000), the id can
rollover after ~65 seconds of idle time, which should be fine.
We also change the hash used in __ip_select_ident() to not only hash
on daddr, but also saddr and protocol, so that ICMP probes can not be
used to infer information for other protocols.
For IPv6, adds saddr into the hash as well, but not nexthdr.
If I ping the patched target, we can see ID are now hard to predict.
21:57:11.008086 IP (...)
A > target: ICMP echo request, seq 1, length 64
21:57:11.010752 IP (... id 2081 ...)
target > A: ICMP echo reply, seq 1, length 64
21:57:12.013133 IP (...)
A > target: ICMP echo request, seq 2, length 64
21:57:12.015737 IP (... id 3039 ...)
target > A: ICMP echo reply, seq 2, length 64
21:57:13.016580 IP (...)
A > target: ICMP echo request, seq 3, length 64
21:57:13.019251 IP (... id 3437 ...)
target > A: ICMP echo reply, seq 3, length 64
[1] TCP sessions uses a per flow ID generator not changed by this patch.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jeffrey Knockel <jeffk@cs.unm.edu> Reported-by: Jedidiah R. Crandall <crandall@cs.unm.edu> Cc: Willy Tarreau <w@1wt.eu> Cc: Hannes Frederic Sowa <hannes@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jun Zhao [Fri, 25 Jul 2014 16:38:59 +0000 (00:38 +0800)]
neighbour : fix ndm_type type error issue
ndm_type means L3 address type, in neighbour proxy and vxlan, it's RTN_UNICAST.
NDA_DST is for netlink TLV type, hence it's not right value in this context.
Signed-off-by: Jun Zhao <mypopydev@gmail.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David L Stevens [Fri, 25 Jul 2014 14:30:11 +0000 (10:30 -0400)]
sunvnet: only use connected ports when sending
The sunvnet driver doesn't check whether or not a port is connected when
transmitting packets, which results in failures if a port fails to connect
(e.g., due to a version mismatch). The original code also assumes
unnecessarily that the first port is up and a switch, even though there is
a flag for switch ports.
This patch only matches a port if it is connected, and otherwise uses the
switch_port flag to send the packet to a switch port that is up.
Signed-off-by: David L Stevens <david.stevens@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 29 Jul 2014 00:01:01 +0000 (17:01 -0700)]
Merge tag 'linux-can-fixes-for-3.16-20140725' of git://gitorious.org/linux-can/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2014-07-25
this is a pull request of one patch for the net tree, hoping to get into the
3.16 release.
The patch by George Cherian fixes a regression in the c_can platform driver.
When using two interfaces the regression leads to a non function second
interface.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull ARM AES crypto fixes from Herbert Xu:
"This push fixes a regression on ARM where odd-sized blocks supplied to
AES may cause crashes"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: arm-aes - fix encryption of unaligned data
crypto: arm64-aes - fix encryption of unaligned data
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"Here are 3 more small powerpc fixes that should still go into .16.
One is a recent regression (MMCR2 business), the other is a trivial
endian fix without which FW updates won't work on LE in IBM machines,
and the 3rd one turns a BUG_ON into a WARN_ON which is definitely a
LOT more friendly especially when the whole thing is about retrieving
error logs ..."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix endianness of flash_block_list in rtas_flash
powerpc/powernv: Change BUG_ON to WARN_ON in elog code
powerpc/perf: Fix MMCR2 handling for EBB
crypto: arm64-aes - fix encryption of unaligned data
cryptsetup fails on arm64 when using kernel encryption via AF_ALG socket.
See https://bugzilla.redhat.com/show_bug.cgi?id=1122937
The bug is caused by incorrect handling of unaligned data in
arch/arm64/crypto/aes-glue.c. Cryptsetup creates a buffer that is aligned
on 8 bytes, but not on 16 bytes. It opens AF_ALG socket and uses the
socket to encrypt data in the buffer. The arm64 crypto accelerator causes
data corruption or crashes in the scatterwalk_pagedone.
This patch fixes the bug by passing the residue bytes that were not
processed as the last parameter to blkcipher_walk_done.
Thomas Falcon [Fri, 25 Jul 2014 17:47:42 +0000 (12:47 -0500)]
powerpc: Fix endianness of flash_block_list in rtas_flash
The function rtas_flash_firmware passes the address of a data structure,
flash_block_list, when making the update-flash-64-and-reboot rtas call.
While the endianness of the address is handled correctly, the endianness
of the data is not. This patch ensures that the data in flash_block_list
is big endian when passed to rtas on little endian hosts.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A bunch of fixes for perf and kprobes:
- revert a commit that caused a perf group regression
- silence dmesg spam
- fix kprobe probing errors on ia64 and ppc64
- filter kprobe faults from userspace
- lockdep fix for perf exit path
- prevent perf #GP in KVM guest
- correct perf event and filters"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes: Fix "Failed to find blacklist" probing errors on ia64 and ppc64
kprobes/x86: Don't try to resolve kprobe faults from userspace
perf/x86/intel: Avoid spamming kernel log for BTS buffer failure
perf/x86/intel: Protect LBR and extra_regs against KVM lying
perf: Fix lockdep warning on process exit
perf/x86/intel/uncore: Fix SNB-EP/IVT Cbox filter mappings
perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
perf: Revert ("perf: Always destroy groups on exit")
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"A couple of crash fixes, plus a fix that on 32 bits would cause a
missing -ENOSYS for nonexistent system calls"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, cpu: Fix cache topology for early P4-SMT
x86_32, entry: Store badsys error code in %eax
x86, MCE: Robustify mcheck_init_device
Merge tag 'firewire-fix-vt6315' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire regression fix from Stefan Richter:
"IEEE 1394 (FireWire) subsystem fix: MSI don't work on VIA PCIe
controllers with some isochronous workloads (regression since
v3.16-rc1)"
* tag 'firewire-fix-vt6315' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: ohci: disable MSI for VIA VT6315 again
Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
Michel Dänzer and a couple of other people reported inexplicable random
oopses in the scheduler, and the cause turns out to be gcc mis-compiling
the load_balance() function when debugging is enabled. The gcc bug
apparently goes back to gcc-4.5, but slight optimization changes means
that it now showed up as a problem in 4.9.0 and 4.9.1.
The instruction scheduling problem causes gcc to schedule a spill
operation to before the stack frame has been created, which in turn can
corrupt the spilled value if an interrupt comes in. There may be other
effects of this bug too, but that's the code generation problem seen in
Michel's case.
This is fixed in current gcc HEAD, but the workaround as suggested by
Markus Trippelsdorf is pretty simple: use -fno-var-tracking-assignments
when compiling the kernel, which disables the gcc code that causes the
problem. This can result in slightly worse debug information for
variable accesses, but that is infinitely preferable to actual code
generation problems.
Doing this unconditionally (not just for CONFIG_DEBUG_INFO) also allows
non-debug builds to verify that the debug build would be identical: we
can do
export GCC_COMPARE_DEBUG=1
to make gcc internally verify that the result of the build is
independent of the "-g" flag (it will make the compiler build everything
twice, toggling the debug flag, and compare the results).
Without the "-fno-var-tracking-assignments" option, the build would fail
(even with 4.8.3 that didn't show the actual stack frame bug) with a gcc
compare failure.
970 if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
971 PF_MEMALLOC))
I did not respond at the time, because a glance at the PageDirty block
in shrink_page_list() quickly shows that this is impossible: we don't do
writeback on file pages (other than tmpfs) from direct reclaim nowadays.
Dave was hallucinating, but it would have been disrespectful to say so.
However, my own /var/log/messages now shows similar complaints
WARNING: CPU: 1 PID: 28814 at fs/ext4/inode.c:1881 ext4_writepage+0xa7/0x38b()
WARNING: CPU: 0 PID: 27347 at fs/ext4/inode.c:1764 ext4_writepage+0xa7/0x38b()
from stressing some mmotm trees during July.
Could a dirty xfs or ext4 file page somehow get marked PageSwapBacked,
so fail shrink_page_list()'s page_is_file_cache() test, and so proceed
to mapping->a_ops->writepage()?
Yes, 3.16-rc1's commit 68711a746345 ("mm, migration: add destination
page freeing callback") has provided such a way to compaction: if
migrating a SwapBacked page fails, its newpage may be put back on the
list for later use with PageSwapBacked still set, and nothing will clear
it.
Whether that can do anything worse than issue WARN_ON_ONCEs, and get
some statistics wrong, is unclear: easier to fix than to think through
the consequences.
Fixing it here, before the put_new_page(), addresses the bug directly,
but is probably the worst place to fix it. Page migration is doing too
many parts of the job on too many levels: fixing it in
move_to_new_page() to complement its SetPageSwapBacked would be
preferable, except why is it (and newpage->mapping and newpage->index)
done there, rather than down in migrate_page_move_mapping(), once we are
sure of success? Not a cleanup to get into right now, especially not
with memcg cleanups coming in 3.17.
Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This is radeon and intel fixes, and is a small bit larger than I'm
guessing you'd like it to be.
- i915: fixes 32-bit highmem i915 blank screen, semaphore hang and
runtime pm fix
- radeon: gpuvm stability fix for hangs since 3.15, and hang/reboot
regression on TN/RL devices,
The only slightly controversial one is the change to use GB for the
vm_size, which I'm letting through as its a new interface we defined
in this merge window, and I'd prefer to have the released kernel have
the final interface rather than changing it later"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: fix cut and paste issue for hawaii.
drm/radeon: fix irq ring buffer overflow handling
drm/i915: Simplify i915_gem_release_all_mmaps()
drm/radeon: fix error handling in radeon_vm_bo_set_addr
drm/i915: fix freeze with blank screen booting highmem
drm/i915: Reorder the semaphore deadlock check, again
drm/radeon/TN: only enable bapm on MSI systems
drm/radeon: fix VM IB handling
drm/radeon: fix handling of radeon_vm_bo_rmv v3
drm/radeon: let's use GB for vm_size (v2)
Merge tag 'sound-3.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Here contains only the fixes for the new FireWire bebob driver. All
fairly trivial and local fixes, so safe to apply"
* tag 'sound-3.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: bebob: Correction for return value of special_clk_ctl_put() in error
ALSA: bebob: Correction for return value of .put callback
ALSA: bebob: Use different labels for digital input/output
ALSA: bebob: Fix a missing to unlock mutex in error handling case
Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fix from Guenter Roeck:
"Fixes to temperature limit and vrm write operations in smsc47m192
driver"
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (smsc47m192) Fix temperature limit and vrm write operations
Randy Dunlap [Sat, 26 Jul 2014 00:07:47 +0000 (17:07 -0700)]
parport: fix menu breakage
Do not split the PARPORT-related symbols with the new kconfig
symbol ARCH_MIGHT_HAVE_PC_PARPORT. The split was causing incorrect
display of these symbols -- they were not being displayed together
as they should be.
Fixes: d90c3eb31535 "Kconfig cleanup (PARPORT_PC dependencies)" Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Mark Salter <msalter@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: stable@vger.kernel.org # for 3.13, 3.14, 3.15 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'blackfin-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux
Pull blackfin fixes from Steven Miao:
"smc nor flash PM fix, pinctrl group fix, update defconfig, and build
fixes"
* tag 'blackfin-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/realmz6/blackfin-linux:
blackfin: vmlinux.lds.S: reserve 32 bytes space at the end of data section for XIP kernel
defconfig: BF609: update spi config name
irq: blackfin sec: drop duplicated sec priority set
blackfin: bind different groups of one pinmux function to different state name
blackfin: fix some bf5xx boards build for missing <linux/gpio.h>
pm: bf609: cleanup smc nor flash
Merge branch 'parisc-3.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"We have two trivial patches in here. One removes the SA_RESTORER
#define since on parisc we don't have the sa_restorer field in struct
sigaction, the other patch removes an unnecessary memset().
The SA_RESTORER removal patch is scheduled for stable trees, since
without it some userspace apps don't build"
* 'parisc-3.16-6' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Eliminate memset after alloc_bootmem_pages
parisc: Remove SA_RESTORER define
George Cherian [Fri, 25 Jul 2014 06:19:41 +0000 (11:49 +0530)]
can: c_can_platform: Fix raminit, use devm_ioremap() instead of devm_ioremap_resource()
The raminit register is shared register for both can0 and can1. Since commit:
32766ff net: can: Convert to use devm_ioremap_resource
devm_ioremap_resource() is used to map raminit register. When using both
interfaces the mapping for the can1 interface fails, leading to a non
functional can interface.
Signed-off-by: George Cherian <george.cherian@ti.com> Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.11 Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Fixes: a848ade408b ("bnx2x: add CSUM and TSO support for encapsulation protocols") Reported-by: Yulong Pei <ypei@redhat.com> Cc: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the usb speed of the RTL8152 is not high speed, the USB_DEV_STAT[2:1]
should be equal to [0 1]. That is, the STAT_SPEED_FULL should be equal
to 2.
There is a easy way to check the usb speed by the speed field of the
struct usb_device. Use it to replace the original metheod.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Spotted-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Please pull this batch of fixes intended for the 3.16 stream...
For the mac80211 fixes, Johannes says:
"I have two fixes: one for tracing that fixes a long-standing NULL
pointer dereference, and one for a mac80211 issue that causes iwlmvm to
send invalid frames during authentication/association."
and,
"One more fix - for a bug in the newly introduced code that obtains rate
control information for stations."
For the iwlwifi fixes, Emmanuel says:
"It includes a merge damage fix. This region has been changed in -next
and -fixes quite a few times and apparently, I failed to handle it
properly, so here the fix. Along with that I have a fix from Eliad
to properly handle overlapping BSS in AP mode."
On top of that, Felix provides and ath9k fix for Tx stalls that happen
after an aggregation session failure.
Please let me know if there are problems! There are some changes
here that will cause merge conflicts in -next. Once you merge this
I can pull it into wireless-next and resolve those issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 25 Jul 2014 01:10:02 +0000 (18:10 -0700)]
Merge branch 'mdio_module_unload'
Ezequiel Garcia says:
====================
net: phy: Prevent an MDIO bus from being unloaded while in use
Changes from v1:
* Dropped the unneeded module_put() on phy_attach_direct().
The motivation of this small series is to fix the current lack of relationship
between an ethernet driver and the MDIO bus behind the PHY device. In such
cases, we would find no visible link between the drivers:
$ lsmod
Module Size Used by Not tainted
mvmdio 2941 0
mvneta 22069 0
Which means nothing prevents the MDIO driver from being removed:
$ modprobe -r mvmdio
# Unable to handle kernel NULL pointer dereference at virtual address 00000060
pgd = c0004000
[00000060] *pgd=00000000
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in: mvneta [last unloaded: mvmdio]
CPU: 0 PID: 22 Comm: kworker/0:1 Not tainted 3.16.0-rc5-01127-g62c0816-dirty #608
Workqueue: events_power_efficient phy_state_machine
task: df5ec840 ti: df67a000 task.ti: df67a000
PC is at phy_state_machine+0x1c/0x468
LR is at phy_state_machine+0x18/0x468
[snip]
This patchset fixes this problem by adding a pair of module_{get,put},
so the module reference count is increased when the PHY device is attached
and is decreased when the PHY is detached.
Tested with mvneta and mv643xx_eth drivers, which depend on the mvmdio
driver to handle MDIO. This series applies on both net and net-next branches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds proper module_{get,put} to prevent the MDIO bus module
from being unloaded while the phydev is connected. By doing so, we fix
a kernel panic produced when a MDIO driver is removed, but the phydev
that relies on it is attached and running.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: Set the driver when registering an MDIO bus device
mdiobus_register() registers a device which is already bound to a driver.
Hence, the driver pointer should be set properly in order to track down
the driver associated to the MDIO bus.
This will be used to allow ethernet driver to pin down a MDIO bus driver,
preventing it from being unloaded while the PHY device is running.
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Allow set_settings() to complete succesfully even if link is
not estabilished and port type isn't known yet.
Signed-off-by: Yaniv Rosner <Yaniv.Rosner@qlogic.com> Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
went in. What this patch did was, among others, check the return value
of misc_register and exit early if it encountered an error. Original
code sloppily didn't do that.
However,
cef12ee52b05 ("xen/mce: Add mcelog support for Xen platform")
made it so that xen's init routine xen_late_init_mcelog runs first. This
was needed for the xen mcelog device which is supposed to be independent
from the baremetal one.
Initially it was reported that misc_register() fails often on xen and
that's why it needed fixing. However, it is *supposed* to fail by
design, when running in dom0 so that the xen mcelog device file gets
registered first.
And *then* you need the notifier *not* unregistered on the error path so
that the timer does get deleted properly in the CPU hotplug notifier.
Btw, this fix is needed also on baremetal in the unlikely event that
misc_register(&mce_chrdev_device) fails there too.
I was unsure whether to rush it in now and decided to delay it to 3.17.
However, xen people wanted it promoted as it breaks xen when doing cpu
hotplug there. So, after a bit of simmering in tip/master for initial
smoke testing, let's move it to 3.16. It fixes a semi-regression which
got introduced in 3.16 so no need for stable tagging.
tip/x86/ras contains that exact same commit but we can't remove it
there as it is not the last one. It won't cause any merge issues, as I
confirmed locally but I should state here the special situation of this
one fix explicitly anyway.
Thanks.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Dave Airlie [Thu, 24 Jul 2014 23:16:28 +0000 (09:16 +1000)]
Merge tag 'drm-intel-fixes-2014-07-24' of git://anongit.freedesktop.org/drm-intel into drm-fixes
This time in time! Just 32bit-pae fix from Hugh, semaphores fun from Chris
and a fix for runtime pm cherry-picked from next.
Paulo is still working on a fix for runtime pm when X does cursor fun when
the display is off, but that one isn't ready yet.
* tag 'drm-intel-fixes-2014-07-24' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Simplify i915_gem_release_all_mmaps()
drm/i915: fix freeze with blank screen booting highmem
drm/i915: Reorder the semaphore deadlock check, again
The sa_restorer field in struct sigaction is obsolete and no longer in
the parisc implementation. However, the core code assumes the field is
present if SA_RESTORER is defined. So, the define needs to be removed.
Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: <stable@vger.kernel.org> Signed-off-by: Helge Deller <deller@gmx.de>
hwmon: (smsc47m192) Fix temperature limit and vrm write operations
Temperature limit clamps are applied after converting the temperature
from milli-degrees C to degrees C, so either the clamp limit needs
to be specified in degrees C, not milli-degrees C, or clamping must
happen before converting to degrees C. Use the latter method to avoid
overflows.
vrm is an u8, so the written value needs to be limited to [0, 255].
Cc: Axel Lin <axel.lin@ingics.com> Cc: stable@vger.kernel.org Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Jean Delvare <jdelvare@suse.de>
Merge tag 'renesas-fixes2-for-v3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes
Merge "Second Round of Renesas ARM Based SoC Fixes for v3.16" from Simon Horman
* Fix SD2CKCR register address of r8a7791 (R-Car M2) SoC
This corrects a bug introduced in v3.14 by 59e79895b9589286 ("ARM: shmobile: r8a7791: Add clocks").
However, it does not manifest in mainline code until
SDHI devices were enabled on the Koelsch board in v3.15 by 2c60a7df72711fb8 ("ARM: shmobile: Add SDHI devices for Koelsch DTS").
It also manifests on the Henninger board when
SDHI devices were enabled in v3.16-rc1 by 1299df03d7191ab4 ("ARM: shmobile: henninger: add SDHI0/2 DT support")
* tag 'renesas-fixes2-for-v3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
ARM: shmobile: r8a7791: Fix SD2CKCR register address
In this case mountpoint_last() gets an extra refcount on path->mnt
Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Ian Kent <raven@themaw.net> Acked-by: Jeff Layton <jlayton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de>
direct-io: fix uninitialized warning in do_direct_IO()
The following warnings:
fs/direct-io.c: In function ‘__blockdev_direct_IO’:
fs/direct-io.c:1011:12: warning: ‘to’ may be used uninitialized in this function [-Wmaybe-uninitialized]
fs/direct-io.c:913:16: note: ‘to’ was declared here
fs/direct-io.c:1011:12: warning: ‘from’ may be used uninitialized in this function [-Wmaybe-uninitialized]
fs/direct-io.c:913:10: note: ‘from’ was declared here
are false positive because dio_get_page() either fails, or sets both
'from' and 'to'.
Paul Bolle said ...
Maybe it's better to move initializing "to" and "from" out of
dio_get_page(). That _might_ make it easier for both the the reader and
the compiler to understand what's going on. Something like this:
Christoph Hellwig said ...
The fix of moving the code definitively looks nicer, while I think
uninitialized_var is horrible wart that won't get anywhere near my code.
Boaz Harrosh: I agree with Christoph and Paul
Signed-off-by: Boaz Harrosh <boaz@plexistor.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Catalin Marinas:
"Fix arm64 regression introduced by limiting the CMA buffer to ZONE_DMA
on platforms where RAM starts above 4GB (and ZONE_DMA becoming 0)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Create non-empty ZONE_DMA when DRAM starts above 4GB
Merge tag 'xtensa-next-20140721' of git://github.com/czankel/xtensa-linux
Pull Xtensa fixes from Chris Zankel:
- resolve FIXMEs in double exception handler for window overflow. This
fix makes native building of linux on xtensa host possible;
- fix sysmem region removal issue introduced in 3.15.
* tag 'xtensa-next-20140721' of git://github.com/czankel/xtensa-linux:
xtensa: fix sysmem reservation at the end of existing block
xtensa: add fixup for double exception raised in window overflow
Merge tag 'pinctrl-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Here are three pin control fixes for the v3.16 series. Sorry that
some of these arrive late, the summer heat in Sweden makes me slow.
- an IRQ handling fix for the STi driver, also for stable
- another IRQ fix for the RCAR GPIO driver
- a MAINTAINERS entry"
* tag 'pinctrl-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
gpio: rcar: Add support for DT IRQ flags
MAINTAINERS: Add entry for the Renesas pin controller driver
pinctrl: st: Fix irqmux handler
Merge branch 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata regression fix from Tejun Heo:
"The last libata/for-3.16-fixes pull contained a regression introduced
by 1871ee134b73 ("libata: support the ata host which implements a
queue depth less than 32") which in turn was a fix for a regression
introduced earlier while changing queue tag order to accomodate hard
drives which perform poorly if tags are not allocated in circular
order (ugh...).
The regression happens only for SAS controllers making use of libata
to serve ATA devices. They don't fill an ata_host field which is used
by the new tag allocation function leading to NULL dereference.
This patch adds a new intermediate field ata_host->n_tags which is
initialized for both SAS and !SAS cases to fix the issue"
* 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: introduce ata_host->n_tags to avoid oops on SAS controllers
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"Here is a handful of powerpc fixes for 3.16. They are all pretty
simple and self contained and should still make this release"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: use _GLOBAL_TOC for memmove
powerpc/pseries: dynamically added OF nodes need to call of_node_init
powerpc: subpage_protect: Increase the array size to take care of 64TB
powerpc: Fix bugs in emulate_step()
powerpc: Disable doorbells on Power8 DD1.x
Merge tag 'urgent-slab-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull slab fix from Mike Snitzer:
"This fixes the broken duplicate slab name check in
kmem_cache_sanity_check() that has been repeatedly reported (as
recently as today against Fedora rawhide).
Pekka seemed to have it staged for a late 3.15-rc in his 'slab/urgent'
branch but never sent a pull request, see:
https://lkml.org/lkml/2014/5/23/648"
* tag 'urgent-slab-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
slab_common: fix the check for duplicate slab names
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: hugetlb: fix copy_hugetlb_page_range()
simple_xattr: permit 0-size extended attributes
mm/fs: fix pessimization in hole-punching pagecache
shmem: fix splicing from a hole while it's punched
shmem: fix faulting into a hole, not taking i_mutex
mm: do not call do_fault_around for non-linear fault
sh: also try passing -m4-nofpu for SH2A builds
zram: avoid lockdep splat by revalidate_disk
mm/rmap.c: fix pgoff calculation to handle hugepage correctly
coredump: fix the setting of PF_DUMPCORE
Commit 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry") changed the order of
huge_ptep_set_wrprotect() and huge_ptep_get(), which leads to breakage
in some workloads like hugepage-backed heap allocation via libhugetlbfs.
This patch fixes it.
Fixes 4a705fef9862 ("hugetlb: fix copy_hugetlb_page_range() to handle
migration/hwpoisoned entry"), so is applicable to -stable kernels which
include it.
If a filesystem uses simple_xattr to support user extended attributes,
LTP setxattr01 and xfstests generic/062 fail with "Cannot allocate
memory": simple_xattr_alloc()'s wrap-around test mistakenly excludes
values of zero size. Fix that off-by-one (but apparently no filesystem
needs them yet).
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/fs: fix pessimization in hole-punching pagecache
I wanted to revert my v3.1 commit d0823576bf4b ("mm: pincer in
truncate_inode_pages_range"), to keep truncate_inode_pages_range() in
synch with shmem_undo_range(); but have stepped back - a change to
hole-punching in truncate_inode_pages_range() is a change to
hole-punching in every filesystem (except tmpfs) that supports it.
If there's a logical proof why no filesystem can depend for its own
correctness on the pincer guarantee in truncate_inode_pages_range() - an
instant when the entire hole is removed from pagecache - then let's
revisit later. But the evidence is that only tmpfs suffered from the
livelock, and we have no intention of extending hole-punch to ramfs. So
for now just add a few comments (to match or differ from those in
shmem_undo_range()), and fix one silliness noticed in d0823576bf4b...
Its "index == start" addition to the hole-punch termination test was
incomplete: it opened a way for the end condition to be missed, and the
loop go on looking through the radix_tree, all the way to end of file.
Fix that pessimization by resetting index when detected in inner loop.
Note that it's actually hard to hit this case, without the obsessive
concurrent faulting that trinity does: normally all pages are removed in
the initial trylock_page() pass, and this loop finds nothing to do. I
had to "#if 0" out the initial pass to reproduce bug and test fix.
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Lukas Czerner <lczerner@redhat.com> Cc: Dave Jones <davej@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
shmem: fix splicing from a hole while it's punched
shmem_fault() is the actual culprit in trinity's hole-punch starvation,
and the most significant cause of such problems: since a page faulted is
one that then appears page_mapped(), needing unmap_mapping_range() and
i_mmap_mutex to be unmapped again.
But it is not the only way in which a page can be brought into a hole in
the radix_tree while that hole is being punched; and Vlastimil's testing
implies that if enough other processors are busy filling in the hole,
then shmem_undo_range() can be kept from completing indefinitely.
shmem_file_splice_read() is the main other user of SGP_CACHE, which can
instantiate shmem pagecache pages in the read-only case (without holding
i_mutex, so perhaps concurrently with a hole-punch). Probably it's
silly not to use SGP_READ already (using the ZERO_PAGE for holes): which
ought to be safe, but might bring surprises - not a change to be rushed.
shmem_read_mapping_page_gfp() is an internal interface used by
drivers/gpu/drm GEM (and next by uprobes): it should be okay. And
shmem_file_read_iter() uses the SGP_DIRTY variant of SGP_CACHE, when
called internally by the kernel (perhaps for a stacking filesystem,
which might rely on holes to be reserved): it's unclear whether it could
be provoked to keep hole-punch busy or not.
We could apply the same umbrella as now used in shmem_fault() to
shmem_file_splice_read() and the others; but it looks ugly, and use over
a range raises questions - should it actually be per page? can these get
starved themselves?
The origin of this part of the problem is my v3.1 commit d0823576bf4b
("mm: pincer in truncate_inode_pages_range"), once it was duplicated
into shmem.c. It seemed like a nice idea at the time, to ensure
(barring RCU lookup fuzziness) that there's an instant when the entire
hole is empty; but the indefinitely repeated scans to ensure that make
it vulnerable.
Revert that "enhancement" to hole-punch from shmem_undo_range(), but
retain the unproblematic rescanning when it's truncating; add a couple
of comments there.
Remove the "indices[0] >= end" test: that is now handled satisfactorily
by the inner loop, and mem_cgroup_uncharge_start()/end() are too light
to be worth avoiding here.
But if we do not always loop indefinitely, we do need to handle the case
of swap swizzled back to page before shmem_free_swap() gets it: add a
retry for that case, as suggested by Konstantin Khlebnikov; and for the
case of page swizzled back to swap, as suggested by Johannes Weiner.
Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lukas Czerner <lczerner@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: <stable@vger.kernel.org> [3.1+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
shmem: fix faulting into a hole, not taking i_mutex
Commit f00cdc6df7d7 ("shmem: fix faulting into a hole while it's
punched") was buggy: Sasha sent a lockdep report to remind us that
grabbing i_mutex in the fault path is a no-no (write syscall may already
hold i_mutex while faulting user buffer).
We tried a completely different approach (see following patch) but that
proved inadequate: good enough for a rational workload, but not good
enough against trinity - which forks off so many mappings of the object
that contention on i_mmap_mutex while hole-puncher holds i_mutex builds
into serious starvation when concurrent faults force the puncher to fall
back to single-page unmap_mapping_range() searches of the i_mmap tree.
So return to the original umbrella approach, but keep away from i_mutex
this time. We really don't want to bloat every shmem inode with a new
mutex or completion, just to protect this unlikely case from trinity.
So extend the original with wait_queue_head on stack at the hole-punch
end, and wait_queue item on the stack at the fault end.
This involves further use of i_lock to guard against the races: lockdep
has been happy so far, and I see fs/inode.c:unlock_new_inode() holds
i_lock around wake_up_bit(), which is comparable to what we do here.
i_lock is more convenient, but we could switch to shmem's info->lock.
This issue has been tagged with CVE-2014-4171, which will require commit f00cdc6df7d7 and this and the following patch to be backported: we
suggest to 3.1+, though in fact the trinity forkbomb effect might go
back as far as 2.6.16, when madvise(,,MADV_REMOVE) came in - or might
not, since much has changed, with i_mmap_mutex a spinlock before 3.0.
Anyone running trinity on 3.0 and earlier? I don't think we need care.
Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lukas Czerner <lczerner@redhat.com> Cc: Dave Jones <davej@redhat.com> Cc: <stable@vger.kernel.org> [3.1+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>