Tomasz Figa [Sat, 11 Jan 2014 21:39:02 +0000 (22:39 +0100)]
mmc: sdhci-s3c: Cache bus clock rates
To fix scheduling while atomic happening in sdhci_s3c_set_clock() caused
by calling clk_get_rate() that might sleep, this patch modifies the
driver to cache rates of all bus clocks at probe time and then only use
those cache values.
Tomasz Figa [Sat, 11 Jan 2014 21:39:01 +0000 (22:39 +0100)]
mmc: sdhci-s3c: Use shifts to divide by powers of two
Current implementation of sdhci_s3c_consider_clock() is highly
inefficient due to multiple integer divisions by variable performed in a
loop. Since only divisors that are powers of two are considered, this
patch replaces them with respective shifts, removing all the integer
divisions.
Dinh Nguyen [Tue, 18 Feb 2014 02:31:02 +0000 (20:31 -0600)]
dts: socfpga: Add support for SD/MMC on the SOCFPGA platform
Introduce "altr,socfpga-dw-mshc" to enable Altera's SOCFPGA platform
specific implementation of the dw_mmc driver.
Also add the "syscon" binding to the "altr,sys-mgr" node. The clock
driver can use the syscon driver to toggle the register for the SD/MMC
clock phase shift settings.
Finally, fix an indentation error for the sysmgr node.
Dinh Nguyen [Tue, 18 Feb 2014 02:31:01 +0000 (20:31 -0600)]
mmc: dw_mmc: Add support for SOCFPGA's platform specific implementation
Like the rockchip, Altera's SOCFPGA platform specific implementation of the
dw_mmc driver requires using the HOLD register for SD commands. This patch
renames dw_mci_rockchip_prepare_command to dw_mci_pltfm_prepare_command so
that SOCFPGA and Rockchip can use it.
Dinh Nguyen [Tue, 18 Feb 2014 02:31:00 +0000 (20:31 -0600)]
mmc: dw_mmc-socfpga: Remove the SOCFPGA specific platform for dw_mmc
It turns now that the only really platform specific code that is needed for
SOCFPGA is using the SDMMC_CMD_USE_HOLD_REG in the prepare_command function.
Since the Rockchip already has this functionality, re-use the code that is
already in dw_mmc-pltfm.c.
Sachin Kamat [Tue, 25 Feb 2014 09:48:28 +0000 (15:18 +0530)]
mmc: dw_mmc: Fix NULL pointer dereference
If mrq->sbc is not NULL but data->stop happens to be NULL,
it will lead to NULL pointer dereferencing. Avoid this by
having a NULL check for data->stop.
Sachin Kamat [Tue, 25 Feb 2014 09:48:27 +0000 (15:18 +0530)]
mmc: mvsdio: Cleanup mmc-mvsdio.h header
Commit c02cecb92ed4 ("ARM: orion: move platform_data definitions")
moved the file to the current location but forgot to remove the pointer
to its previous location. Clean it up. While at it also change the header
file protection macros appropriately.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: Chris Ball <chris@printf.net>
Sachin Kamat [Tue, 25 Feb 2014 09:48:26 +0000 (15:18 +0530)]
mmc: msm: Cleanup mmc-msm_sdcc.h header
Commit 1ef21f6343ff ("ARM: msm: move platform_data definitions")
moved the file to the current location but forgot to remove the pointer
to its previous location. Clean it up. While at it also change the header
file protection macros appropriately.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: Chris Ball <chris@printf.net>
Sachin Kamat [Tue, 25 Feb 2014 09:48:25 +0000 (15:18 +0530)]
mmc: dw_mmc: Add missing description
Commit 0976f16d ("mmc: dw_mmc: add support tuning scheme") introduced
the execute_tuning hook but did not add its description for kernel docs.
Update the same.
Tim Kryger [Thu, 5 Dec 2013 19:20:41 +0000 (11:20 -0800)]
mmc: sdhci-bcm-kona: Add basic use of clocks
Enable the external clock needed by the host controller during the
probe and disable it during the remove.
Signed-off-by: Tim Kryger <tim.kryger@linaro.org> Reviewed-by: Markus Mayer <markus.mayer@linaro.org> Reviewed-by: Matt Porter <matt.porter@linaro.org> Signed-off-by: Chris Ball <chris@printf.net>
Ulf Hansson [Mon, 13 Jan 2014 15:49:31 +0000 (16:49 +0100)]
mmc: mmci: Enable support for busy detection for ux500 variant
The ux500 variants have HW busy detection support, which is indicated
by the busy_detect flag. For these variants let's enable the
MMC_CAP_WAIT_WHILE_BUSY flag and add the support for it.
The mmc core will provide the RSP_BUSY command flag for those requests
we should care about busy detection. Regarding the max_busy_timeout,
the HW don't support busy detection timeouts so at this initial step
let's make it simple and set it to zero to indicate we are able to
support any timeout.
Cc: Russell King <linux@arm.linux.org.uk> Cc: Johan Rudholm <jrudholm@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Chris Ball <chris@printf.net>
Ulf Hansson [Fri, 10 Jan 2014 13:51:42 +0000 (14:51 +0100)]
mmc: mmci: Handle CMD irq before DATA irq
In case of a read operation both MCI_CMDRESPEND and MCI_DATAEND can be
set in the status register when entering the interrupt handler. This is
due to that the card start sending data before the host has
acknowledged the command response.
To resolve the issue for this scenario, we must start by handling the
CMD irq instead of the DATA irq. The reason is beacuse the completion
of the DATA irq will not respect the current command and then causing
it to be garbled.
Cc: Russell King <linux@arm.linux.org.uk> Cc: Johan Rudholm <jrudholm@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Chris Ball <chris@printf.net>
Ulf Hansson [Tue, 14 Jan 2014 20:31:35 +0000 (21:31 +0100)]
mmc: block: Fixup busy detection while invoking stop cmd at recovery
When sending a stop command at the recovery path, use a R1B response
when the failing data request are a WRITE. Thus we also care about the
busy detection completion in this case.
For a failing READ request, we use a R1 response for the stop command,
since we don't need to care about busy detection in this case.
To align behavior between hosts supporting MMC_CAP_WAIT_WHILE_BUSY and
those who are not, we add a CMD13 polling method for the card's status.
We also respect whether the host has specified the max_busy_timeout,
which means we may fallback to CMD13 polling if the timeout is greater
than what the host are able to support.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Chris Ball <chris@printf.net>
Ulf Hansson [Wed, 29 Jan 2014 12:11:27 +0000 (13:11 +0100)]
mmc: block: Respect hw busy detection in card_busy_detect()
Currently for write request we don't trust the hw busy detection to be
fully handled by host, thus we also poll the card's status until we see
it's gets out of the busy state.
Still there are scenarios where it will a benefit to trust the hw busy
detection done by the host, since no additional polling is needed.
Let's prepare card_busy_detect() to be able to handle this.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Chris Ball <chris@printf.net>
Ulf Hansson [Wed, 29 Jan 2014 10:01:55 +0000 (11:01 +0100)]
mmc: block: Implement card_busy_detect() for busy detection
To complete a data write request we poll for the card's status register
by sending CMD13. The are other scenarios when this polling method are
needed, which is why we here moves this code to it's own function. No
functional change.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Chris Ball <chris@printf.net>
Ulf Hansson [Tue, 14 Jan 2014 20:24:21 +0000 (21:24 +0100)]
mmc: block: Use R1 responses for stop cmds for read requests
While using open ended transmission and thus ending the transfer by
sending a stop command, we shall use R1B only for writes and R1 shall
be used for reads. Previously R1B were used in both cases.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Chris Ball <chris@printf.net>
Ulf Hansson [Tue, 14 Jan 2014 22:17:36 +0000 (23:17 +0100)]
mmc: core: Respect host's max_busy_timeout when sending sleep cmd
When sending the sleep command for host drivers supporting
MMC_CAP_WAIT_WHILE_BUSY, we need to confirm that max_busy_timeout is
big enough comparing to the sleep timeout specified from card's
EXT_CSD. If this isn't case, we use a R1 response instead of R1B and
fallback to use a delay instead.
Do note that a max_busy_timeout set to zero by the host, is interpreted
as it can cope with whatever timeout the mmc core provides it with.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Chris Ball <chris@printf.net>
Ulf Hansson [Tue, 28 Jan 2014 13:15:34 +0000 (14:15 +0100)]
mmc: core: Fixup busy detection for mmc switch operations
If the host controller supports busy detection in HW, we expect the
MMC_CAP_WAIT_WHILE_BUSY to be set. Likewise the corresponding
host->max_busy_timeout should reflect the maximum busy detection
timeout supported by the host.
Previously we expected a host that supported MMC_CAP_WAIT_WHILE_BUSY to
cope with any timeout, which just isn't feasible due to HW limitations.
For most switch operations, R1B responses are expected and thus we need
to check for busy detection completion. To cope with cases where the
requested busy detection timeout is greater than what the host are able
to support, we fallback to use a R1 response instead. This will prevent
the host from doing HW busy detection.
In those cases, busy detection completion is handled by polling the for
the card's status using CMD13. This is the same mechanism used when the
host doesn't support MMC_CAP_WAIT_WHILE_BUSY.
Do note, a host->max_busy_timeout set to zero, is interpreted by the
mmc core as it don't know what the host supports. It will then provide
the host with whatever timeout the mmc core finds suitable.
For some cases the mmc core has unfurtunate no clue of what timeout to
use. In these cases we provide the host with a timeout value of zero,
which the host may interpret as use whatever timeout it finds suitable.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Chris Ball <chris@printf.net>
Ulf Hansson [Tue, 28 Jan 2014 13:05:39 +0000 (14:05 +0100)]
mmc: core: Minor simplifications to __mmc_switch
Instead of using several references to card->host, let's use a local
variable. That means we can remove the BUG_ON verifications for the
same pointers.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Chris Ball <chris@printf.net>
Ulf Hansson [Wed, 18 Dec 2013 08:57:38 +0000 (09:57 +0100)]
mmc: core: Rename max_discard_to to max_busy_timeout
Rename host->max_discard_to to host->max_busy_timeout, to reflect that
it tells the mmc core layer about the maximum supported busy detection
timeout by the host.
This timeout is at the moment only applicable to erase/trim/discard
commands. By the renaming we provide the option of make use of it for
other commands that cares about busy detection. In other words, those
commands that wants an R1B response, like for example the mmc switch
command.
Do note that the max_busy_timeout is supposed to be specified only by
hosts supporting MMC_CAP_WAIT_WHILE_BUSY.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Chris Ball <chris@printf.net>
Micky Ching [Mon, 17 Feb 2014 08:45:48 +0000 (16:45 +0800)]
mmc: rtsx: add support for pre_req and post_req
Add support for non-blocking request, pre_req() runs dma_map_sg() and
post_req() runs dma_unmap_sg(). This patch can increase card read/write
speed, especially for high speed card and slow CPU(for some embedded
platform).
Users can get a great benefit from this patch. if CPU frequency is 800MHz,
SDR104 or DDR50 card read/write speed may increase more than 15%.
test results:
intel i3(800MHz - 2.3GHz), SD card clock 208MHz
Micky Ching [Mon, 17 Feb 2014 08:45:46 +0000 (16:45 +0800)]
mmc: rtsx: fix card poweroff bug
If the host driver removed while card in the slot, the host will not
power off card power correctly. This bug is produced because host
eject flag set before the last mmc_set_ios callback, we should set the
eject flag after power off.
Signed-off-by: Micky Ching <micky_ching@realsil.com.cn> Signed-off-by: Chris Ball <chris@printf.net>
Jarkko Nikula [Sat, 22 Feb 2014 16:01:43 +0000 (18:01 +0200)]
mmc: omap: Add erase capability
This patch adds the erase capability to OMAP1/OMAP2420 MMC driver. Idea is
the same than in commit 93caf8e ("omap_hsmmc: add erase capability") that we
disable the data timeout interrupt for erases.
Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com> Acked-by: Tony Lindgren <tony@atomide.com> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Chris Ball <chris@printf.net>
Jarkko Nikula [Sat, 22 Feb 2014 16:01:42 +0000 (18:01 +0200)]
mmc: omap: Remove always set use_dma flag from struct mmc_omap_host
Because use_dma is set only in mmc_omap_probe and unset nowhere there is no
need to carry that flag in struct mmc_omap_host for mmc_omap_prepare_data
function.
Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com> Acked-by: Tony Lindgren <tony@atomide.com> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Chris Ball <chris@printf.net>
Jarkko Nikula [Sat, 22 Feb 2014 16:01:41 +0000 (18:01 +0200)]
mmc: omap: Convert to devm_ioremap_resource
Simplify probe and cleanup code by using devm_ioremap_resource. This also
makes probe code to follow more common allocate private struct followed by
other initialization style.
Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com> Acked-by: Tony Lindgren <tony@atomide.com> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Chris Ball <chris@printf.net>
Jarkko Nikula [Sat, 22 Feb 2014 16:01:40 +0000 (18:01 +0200)]
mmc: omap: Remove mem_res field from struct mmc_omap_host
Field mem_res in struct mmc_omap_host is used only once in mmc_omap_probe
when setting the phys_base field so we may just se the phys_base straight
and remove needless mem_res.
Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com> Acked-by: Tony Lindgren <tony@atomide.com> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Chris Ball <chris@printf.net>
Jarkko Nikula [Sat, 22 Feb 2014 16:01:39 +0000 (18:01 +0200)]
mmc: omap: Remove duplicate host->irq assignment
host-irq is set twice so remove needless one.
Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com> Acked-by: Tony Lindgren <tony@atomide.com> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Chris Ball <chris@printf.net>
Jarkko Nikula [Sat, 22 Feb 2014 16:01:38 +0000 (18:01 +0200)]
mmc: omap: Convert to devm_kzalloc
Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com> Acked-by: Tony Lindgren <tony@atomide.com> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Chris Ball <chris@printf.net>
Jarkko Nikula [Sat, 22 Feb 2014 16:01:37 +0000 (18:01 +0200)]
mmc: omap: Fix NULL pointer dereference due uninitialized cover_tasklet
Omap MMC driver initialization can cause a NULL pointer dereference in
tasklet_hi_action on Nokia N810 if its miniSD cover is open during driver
initialization.
[ 1.070000] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 1.080000] pgd = c0004000
[ 1.080000] [00000000] *pgd=00000000
[ 1.080000] Internal error: Oops: 80000005 [#1] PREEMPT ARM
[ 1.080000] Modules linked in:
[ 1.080000] CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 3.13.0-rc2+ #95
[ 1.080000] Workqueue: events menelaus_work
[ 1.080000] task: c7863340 ti: c7878000 task.ti: c7878000
[ 1.080000] PC is at 0x0
[ 1.080000] LR is at tasklet_hi_action+0x68/0xa4
...
[ 1.080000] [<c003543c>] (tasklet_hi_action+0x68/0xa4) from [<c0034dd0>] (__do_softirq+0xbc/0x208)
[ 1.080000] [<c0034dd0>] (__do_softirq+0xbc/0x208) from [<c003521c>] (irq_exit+0x84/0xac)
[ 1.080000] [<c003521c>] (irq_exit+0x84/0xac) from [<c00135cc>] (handle_IRQ+0x64/0x84)
[ 1.080000] [<c00135cc>] (handle_IRQ+0x64/0x84) from [<c000859c>] (omap2_intc_handle_irq+0x54/0x68)
[ 1.080000] [<c000859c>] (omap2_intc_handle_irq+0x54/0x68) from [<c0015be0>] (__irq_svc+0x40/0x74)
[ 1.080000] Exception stack(0xc7879d70 to 0xc7879db8)
[ 1.080000] 9d60: 000003f10000000a000000090000001c
[ 1.080000] 9d80: c7879e70c780bc10c780bc10000000000000000100008603c780bc78c7879e4e
[ 1.080000] 9da0: 00000002c7879db8c00343b0c0160c9c20000113ffffffff
[ 1.080000] [<c0015be0>] (__irq_svc+0x40/0x74) from [<c0160c9c>] (__aeabi_uidiv+0x20/0x9c)
[ 1.080000] [<c0160c9c>] (__aeabi_uidiv+0x20/0x9c) from [<c00343b0>] (msecs_to_jiffies+0x18/0x24)
[ 1.080000] [<c00343b0>] (msecs_to_jiffies+0x18/0x24) from [<c01ec3ec>] (omap_i2c_xfer+0x30c/0x458)
[ 1.080000] [<c01ec3ec>] (omap_i2c_xfer+0x30c/0x458) from [<c01e9724>] (__i2c_transfer+0x3c/0x74)
[ 1.080000] [<c01e9724>] (__i2c_transfer+0x3c/0x74) from [<c01eac4c>] (i2c_transfer+0x78/0x94)
[ 1.080000] [<c01eac4c>] (i2c_transfer+0x78/0x94) from [<c01eb0bc>] (i2c_smbus_xfer+0x3c0/0x4f8)
[ 1.080000] [<c01eb0bc>] (i2c_smbus_xfer+0x3c0/0x4f8) from [<c01eb414>] (i2c_smbus_write_byte_data+0x34/0x3c)
[ 1.080000] [<c01eb414>] (i2c_smbus_write_byte_data+0x34/0x3c) from [<c01bb308>] (menelaus_write_reg+0x1c/0x40)
[ 1.080000] [<c01bb308>] (menelaus_write_reg+0x1c/0x40) from [<c01bb904>] (menelaus_work+0xa0/0xc4)
[ 1.080000] [<c01bb904>] (menelaus_work+0xa0/0xc4) from [<c00439c4>] (process_one_work+0x1fc/0x334)
[ 1.080000] [<c00439c4>] (process_one_work+0x1fc/0x334) from [<c0043d6c>] (worker_thread+0x244/0x380)
[ 1.080000] [<c0043d6c>] (worker_thread+0x244/0x380) from [<c0049d04>] (kthread+0xc0/0xd4)
[ 1.080000] [<c0049d04>] (kthread+0xc0/0xd4) from [<c0012758>] (ret_from_fork+0x14/0x3c)
[ 1.080000] Code: bad PC value
[ 1.090000] ---[ end trace 7bc2fc7cd14f1d95 ]---
[ 1.100000] Kernel panic - not syncing: Fatal exception in interrupt
Reason for this is that omap_notify_cover_event which calls
tasklet_hi_schedule gets called before struct cover_tasklet is initialized.
Call to omap_notify_cover_event on Nokia N810 happens from menelaus.c PMIC
driver via board-n8x0.c during execution of mmc_add_host in case of open
miniSD cover.
Fix this by moving cover_timer and cover_tasklet initialization before
mmc_add_host call in mmc_omap_new_slot.
Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com> Acked-by: Tony Lindgren <tony@atomide.com> Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Chris Ball <chris@printf.net>
Ulf Hansson [Wed, 18 Dec 2013 10:59:17 +0000 (11:59 +0100)]
mmc: core: Enable MMC_CAP2_CACHE_CTRL as default
There are no reason to why the use of a non-volatile internal eMMC
cache should be controlled by a host cap. Instead let's just enable it
if the eMMC card supports it.
Ulf Hansson [Mon, 16 Dec 2013 13:46:00 +0000 (14:46 +0100)]
mmc: core: Remove support for MMC_CAP2_NO_SLEEP_CMD
There are no active users of this host capability. The primary reason
for adding this cap was due to a bug in ux500 boot loader code, which
is not a relevant issue any more. So, let's remove it.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Chris Ball <chris@printf.net>
Linus Torvalds [Mon, 10 Feb 2014 02:14:53 +0000 (18:14 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull SELinux fixes from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
SELinux: Fix kernel BUG on empty security contexts.
selinux: add SOCK_DIAG_BY_FAMILY to the list of netlink message types
Linus Torvalds [Mon, 10 Feb 2014 02:12:07 +0000 (18:12 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"A couple of fixes, both -stable fodder. The O_SYNC bug is fairly
old..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix a kmap leak in virtio_console
fix O_SYNC|O_APPEND syncing the wrong range on write()
Al Viro [Sun, 2 Feb 2014 12:05:05 +0000 (07:05 -0500)]
fix a kmap leak in virtio_console
While we are at it, don't do kmap() under kmap_atomic(), *especially*
for a page we'd allocated with GFP_KERNEL. It's spelled "page_address",
and had that been more than that, we'd have a real trouble - kmap_high()
can block, and doing that while holding kmap_atomic() is a Bad Idea(tm).
Al Viro [Sun, 9 Feb 2014 20:18:09 +0000 (15:18 -0500)]
fix O_SYNC|O_APPEND syncing the wrong range on write()
It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
when sync_page_range() had been introduced; generic_file_write{,v}() correctly
synced
pos_after_write - written .. pos_after_write - 1
but generic_file_aio_write() synced
pos_before_write .. pos_before_write + written - 1
instead. Which is not the same thing with O_APPEND, obviously.
A couple of years later correct variant had been killed off when
everything switched to use of generic_file_aio_write().
All users of generic_file_aio_write() are affected, and the same bug
has been copied into other instances of ->aio_write().
The fix is trivial; the only subtle point is that generic_write_sync()
ought to be inlined to avoid calculations useless for the majority of
calls.
Linus Torvalds [Sun, 9 Feb 2014 19:12:26 +0000 (11:12 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"This is a small collection of fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix data corruption when reading/updating compressed extents
Btrfs: don't loop forever if we can't run because of the tree mod log
btrfs: reserve no transaction units in btrfs_ioctl_set_features
btrfs: commit transaction after setting label and features
Btrfs: fix assert screwup for the pending move stuff
Linus Torvalds [Sun, 9 Feb 2014 18:09:49 +0000 (10:09 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Tooling fixes, mostly related to the KASLR fallout, but also other
fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf buildid-cache: Check relocation when checking for existing kcore
perf tools: Adjust kallsyms for relocated kernel
perf tests: No need to set up ref_reloc_sym
perf symbols: Prevent the use of kcore if the kernel has moved
perf record: Get ref_reloc_sym from kernel map
perf machine: Set up ref_reloc_sym in machine__create_kernel_maps()
perf machine: Add machine__get_kallsyms_filename()
perf tools: Add kallsyms__get_function_start()
perf symbols: Fix symbol annotation for relocated kernel
perf tools: Fix include for non x86 architectures
perf tools: Fix AAAAARGH64 memory barriers
perf tools: Demangle kernel and kernel module symbols too
perf/doc: Remove mention of non-existent set_perf_event_pending() from design.txt
Btrfs: fix data corruption when reading/updating compressed extents
When using a mix of compressed file extents and prealloc extents, it
is possible to fill a page of a file with random, garbage data from
some unrelated previous use of the page, instead of a sequence of zeroes.
A simple sequence of steps to get into such case, taken from the test
case I made for xfstests, is:
This results in the following file items in the fs tree:
item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
inode generation 6 transid 6 size 542872 block group 0 mode 100600
item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16
inode ref index 2 namelen 6 name: foobar
item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
extent data disk byte 0 nr 0 gen 6
extent data offset 0 nr 24576 ram 266240
extent compression 0
item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53
prealloc data disk byte 12849152 nr 241664 gen 6
prealloc data offset 0 nr 241664
item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53
extent data disk byte 12845056 nr 4096 gen 6
extent data offset 0 nr 20480 ram 20480
extent compression 2
item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53
prealloc data disk byte 13090816 nr 405504 gen 6
prealloc data offset 0 nr 258048
The on disk extent at offset 266240 (which corresponds to 1 single disk block),
contains 5 compressed chunks of file data. Each of the first 4 compress 4096
bytes of file data, while the last one only compresses 3024 bytes of file data.
Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 =
1072 bytes) should always return zeroes (our next extent is a prealloc one).
The solution here is the compression code path to zero the remaining (untouched)
bytes of the last page it uncompressed data into, as the information about how
much space the file data consumes in the last page is not known in the upper layer
fs/btrfs/extent_io.c:__do_readpage(). In __do_readpage we were correctly zeroing
the remainder of the page but only if it corresponds to the last page of the inode
and if the inode's size is not a multiple of the page size.
This would cause not only returning random data on reads, but also permanently
storing random data when updating parts of the region that should be zeroed.
For the example above, it means updating a single byte in the region [285648 ; 286720[
would store that byte correctly but also store random data on disk.
A test case for xfstests follows soon.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
Josef Bacik [Fri, 7 Feb 2014 18:57:59 +0000 (13:57 -0500)]
Btrfs: don't loop forever if we can't run because of the tree mod log
A user reported a 100% cpu hang with my new delayed ref code. Turns out I
forgot to increase the count check when we can't run a delayed ref because of
the tree mod log. If we can't run any delayed refs during this there is no
point in continuing to look, and we need to break out. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
David Sterba [Fri, 7 Feb 2014 13:34:04 +0000 (14:34 +0100)]
btrfs: reserve no transaction units in btrfs_ioctl_set_features
Added in patch "btrfs: add ioctls to query/change feature bits online"
modifications to superblock don't need to reserve metadata blocks when
starting a transaction.
Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
Jeff Mahoney [Fri, 7 Feb 2014 13:33:57 +0000 (14:33 +0100)]
btrfs: commit transaction after setting label and features
The set_fslabel ioctl uses btrfs_end_transaction, which means it's
possible that the change will be lost if the system crashes, same for
the newly set features. Let's use btrfs_commit_transaction instead.
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
Josef Bacik [Wed, 5 Feb 2014 21:19:21 +0000 (16:19 -0500)]
Btrfs: fix assert screwup for the pending move stuff
Wang noticed that he was failing btrfs/030 even though me and Filipe couldn't
reproduce. Turns out this is because Wang didn't have CONFIG_BTRFS_ASSERT set,
which meant that a key part of Filipe's original patch was not being built in.
This appears to be a mess up with merging Filipe's patch as it does not exist in
his original patch. Fix this by changing how we make sure del_waiting_dir_move
asserts that it did not error and take the function out of the ifdef check.
This makes btrfs/030 pass with the assert on or off. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Filipe Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
Linus Torvalds [Sat, 8 Feb 2014 22:31:39 +0000 (14:31 -0800)]
Merge tag 'pinctrl-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pinctrl fixes from Linus Walleij:
"First round of pin control fixes for v3.14:
- Protect pinctrl_list_add() with the proper mutex. This was
identified by RedHat. Caused nasty locking warnings was rootcased
by Stanislaw Gruszka.
- Avoid adding dangerous debugfs files when either half of the
subsystem is unused: pinmux or pinconf.
- Various fixes to various drivers: locking, hardware particulars, DT
parsing, error codes"
* tag 'pinctrl-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: tegra: return correct error type
pinctrl: do not init debugfs entries for unimplemented functionalities
pinctrl: protect pinctrl_list add
pinctrl: sirf: correct the pin index of ac97_pins group
pinctrl: imx27: fix offset calculation in imx_read_2bit
pinctrl: vt8500: Change devicetree data parsing
pinctrl: imx27: fix wrong offset to ICONFB
pinctrl: at91: use locked variant of irq_set_handler
Linus Torvalds [Sat, 8 Feb 2014 19:54:43 +0000 (11:54 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"Quite a varied little collection of fixes. Most of them are
relatively small or isolated; the biggest one is Mel Gorman's fixes
for TLB range flushing.
A couple of AMD-related fixes (including not crashing when given an
invalid microcode image) and fix a crash when compiled with gcov"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, microcode, AMD: Unify valid container checks
x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y
x86/efi: Allow mapping BGRT on x86-32
x86: Fix the initialization of physnode_map
x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable()
x86/intel/mid: Fix X86_INTEL_MID dependencies
arch/x86/mm/srat: Skip NUMA_NO_NODE while parsing SLIT
mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge
x86: mm: change tlb_flushall_shift for IvyBridge
x86/mm: Eliminate redundant page table walk during TLB range flushing
x86/mm: Clean up inconsistencies when flushing TLB ranges
mm, x86: Account for TLB flushes only when debugging
x86/AMD/NB: Fix amd_set_subcaches() parameter type
x86/quirks: Add workaround for AMD F16h Erratum792
x86, doc, kconfig: Fix dud URL for Microcode data
Dave Kleikamp [Fri, 7 Feb 2014 20:36:10 +0000 (14:36 -0600)]
jfs: fix generic posix ACL regression
I missed a couple errors in reviewing the patches converting jfs
to use the generic posix ACL function. Setting ACL's currently
fails with -EOPNOTSUPP.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> Reported-by: Michael L. Semon <mlsemon35@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
Linus Torvalds [Fri, 7 Feb 2014 22:17:18 +0000 (14:17 -0800)]
Merge tag 'driver-core-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is a single kernfs fix to resolve a much-reported lockdep issue
with the removal of entries in sysfs"
* tag 'driver-core-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag
Linus Torvalds [Fri, 7 Feb 2014 20:35:56 +0000 (12:35 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
"There is an RBD fix for a crash due to the immutable bio changes, an
error path fix, and a locking fix in the recent redirect support"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
libceph: do not dereference a NULL bio pointer
libceph: take map_sem for read in handle_reply()
libceph: factor out logic from ceph_osdc_start_request()
libceph: fix error handling in ceph_osdc_init()
Linus Torvalds [Fri, 7 Feb 2014 20:19:50 +0000 (12:19 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Relax VDSO alignment requirements so that the kernel-picked one (4K)
does not conflict with the dynamic linker's one (64K)
- VDSO gettimeofday fix
- Barrier fixes for atomic operations and cache flushing
- TLB invalidation when overriding early page mappings during boot
- Wired up new 32-bit arm (compat) syscalls
- LSM_MMAP_MIN_ADDR when COMPAT is enabled
- defconfig update
- Clean-up (comments, pgd_alloc).
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: defconfig: Expand default enabled features
arm64: asm: remove redundant "cc" clobbers
arm64: atomics: fix use of acquire + release for full barrier semantics
arm64: barriers: allow dsb macro to take option parameter
security: select correct default LSM_MMAP_MIN_ADDR on arm on arm64
arm64: compat: Wire up new AArch32 syscalls
arm64: vdso: update wtm fields for CLOCK_MONOTONIC_COARSE
arm64: vdso: fix coarse clock handling
arm64: simplify pgd_alloc
arm64: fix typo: s/SERRROR/SERROR/
arm64: Invalidate the TLB when replacing pmd entries during boot
arm64: Align CMA sizes to PAGE_SIZE
arm64: add DSB after icache flush in __flush_icache_all()
arm64: vdso: prevent ld from aligning PT_LOAD segments to 64k
Linus Torvalds [Fri, 7 Feb 2014 20:19:06 +0000 (12:19 -0800)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
"hree minor patches. All have sat in -next for a few days"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: fpu.h: Fix build when CONFIG_BUG is not set
MIPS: Wire up sched_setattr/sched_getattr syscalls
MIPS: Alchemy: Fix DB1100 GPIO registration
Linus Torvalds [Fri, 7 Feb 2014 20:16:36 +0000 (12:16 -0800)]
Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A series of small fixes. Mostly driver ones. There is one core
regression fix on a patch that was meant to fix some race issues on
vb2, but that actually caused more harm than good. So, we're just
reverting it for now"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] adv7842: Composite free-run platfrom-data fix
[media] v4l2-dv-timings: fix GTF calculation
[media] hdpvr: Fix memory leak in debug
[media] af9035: add ID [2040:f900] Hauppauge WinTV-MiniStick 2
[media] mxl111sf: Fix compile when CONFIG_DVB_USB_MXL111SF is unset
[media] mxl111sf: Fix unintentional garbage stack read
[media] cx24117: use a valid dev pointer for dev_err printout
[media] cx24117: remove dead code in always 'false' if statement
[media] update Michael Krufky's email address
[media] vb2: Check if there are buffers before streamon
[media] Revert "[media] videobuf_vm_{open,close} race fixes"
[media] go7007-loader: fix usb_dev leak
[media] media: bt8xx: add missing put_device call
[media] exynos4-is: Compile in fimc-lite runtime PM callbacks conditionally
[media] exynos4-is: Compile in fimc runtime PM callbacks conditionally
[media] exynos4-is: Fix error paths in probe() for !pm_runtime_enabled()
[media] s5p-jpeg: Fix wrong NV12 format parameters
[media] s5k5baf: allow to handle arbitrary long i2c sequences
Linus Torvalds [Fri, 7 Feb 2014 20:14:24 +0000 (12:14 -0800)]
Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix PMBus driver problem with some multi-page voltage sensors and fix
da9055 interrupt initialization"
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (da9055) Remove use of regmap_irq_get_virq()
hwmon: (pmbus) Support per-page exponent in linear mode
Linus Torvalds [Fri, 7 Feb 2014 20:12:21 +0000 (12:12 -0800)]
Merge tag 'pm+acpi-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These include a fix for a recent ACPI hotplug regression, four
concurrency related fixes and one PCI device removal fix for
ACPI-based PCI hotplug (ACPIPHP), intel_pstate fix that should go into
stable, three simple ACPI cleanups and a new entry for the ACPI video
blacklist.
Specifics:
- Fix for a recent ACPI hotplug regression causing a NULL pointer
dereference to occur while handling ACPI eject notifications for
already ejected devices. From Toshi Kani.
- Four concurrency-related fixes for ACPIPHP. Two of them add
missing locking and the other two fix race conditions related to
reference counting.
- ACPIPHP fix to avoid NULL pointer dereferences during device
removal involving Virtual Funcions.
- intel_pstate fix to make it compute the percentage of time the CPU
is busy properly. From Dirk Brandewie.
- Removal of two unnecessary NULL pointer checks in ACPI code and a
fix for sscanf() format string from Dan Carpenter and Luis G.F.
- New ACPI video blacklist entry for HP EliteBook Revolve 810 from
Mika Westerberg"
* tag 'pm+acpi-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / hotplug: Fix panic on eject to ejected device
ACPI / battery: Fix incorrect sscanf() string in acpi_battery_init_alarm()
ACPI / proc: remove unneeded NULL check
ACPI / utils: remove a pointless NULL check
ACPI / video: Add HP EliteBook Revolve 810 to the blacklist
intel_pstate: Take core C0 time into account for core busy calculation
ACPI / hotplug / PCI: Fix bridge removal race vs dock events
ACPI / hotplug / PCI: Fix bridge removal race in handle_hotplug_event()
ACPI / hotplug / PCI: Scan root bus under the PCI rescan-remove lock
ACPI / hotplug / PCI: Move PCI rescan-remove locking to hotplug_event()
ACPI / hotplug / PCI: Remove entries from bus->devices in reverse order
Ilya Dryomov [Mon, 3 Feb 2014 11:56:33 +0000 (13:56 +0200)]
libceph: take map_sem for read in handle_reply()
Handling redirect replies requires both map_sem and request_mutex.
Taking map_sem unconditionally near the top of handle_reply() avoids
possible race conditions that arise from releasing request_mutex to be
able to acquire map_sem in redirect reply case. (Lock ordering is:
map_sem, request_mutex, crush_mutex.)
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Ilya Dryomov [Fri, 31 Jan 2014 17:33:39 +0000 (19:33 +0200)]
libceph: factor out logic from ceph_osdc_start_request()
Factor out logic from ceph_osdc_start_request() into a new helper,
__ceph_osdc_start_request(). ceph_osdc_start_request() now amounts to
taking locks and calling __ceph_osdc_start_request().
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Mark Rutland [Fri, 7 Feb 2014 17:12:45 +0000 (17:12 +0000)]
arm64: defconfig: Expand default enabled features
FPGA implementations of the Cortex-A57 and Cortex-A53 are now available
in the form of the SMM-A57 and SMM-A53 Soft Macrocell Models (SMMs) for
Versatile Express. As these attach to a Motherboard Express V2M-P1 it
would be useful to have support for some V2M-P1 peripherals enabled by
default.
Additionally a couple of of features have been introduced since the last
defconfig update (CMA, jump labels) that would be good to have enabled
by default to ensure they are build and boot tested.
This patch updates the arm64 defconfig to enable support for these
devices and features. The arm64 Kconfig is modified to select
HAVE_PATA_PLATFORM, which is required to enable support for the
CompactFlash controller on the V2M-P1.
A few options which don't need to appear in defconfig are trimmed:
* BLK_DEV - selected by default
* EXPERIMENTAL - otherwise gone from the kernel
* MII - selected by drivers which require it
* USB_SUPPORT - selected by default
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Will Deacon [Tue, 4 Feb 2014 12:29:13 +0000 (12:29 +0000)]
arm64: asm: remove redundant "cc" clobbers
cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
from inline asm blocks that only use these instructions to implement
conditional branches.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Will Deacon [Tue, 4 Feb 2014 12:29:12 +0000 (12:29 +0000)]
arm64: atomics: fix use of acquire + release for full barrier semantics
Linux requires a number of atomic operations to provide full barrier
semantics, that is no memory accesses after the operation can be
observed before any accesses up to and including the operation in
program order.
On arm64, these operations have been incorrectly implemented as follows:
// A, B, C are independent memory locations
<Access [A]>
// atomic_op (B)
1: ldaxr x0, [B] // Exclusive load with acquire
<op(B)>
stlxr w1, x0, [B] // Exclusive store with release
cbnz w1, 1b
<Access [C]>
The assumption here being that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).
Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.
The simple way to fix this is to implement the same algorithm as ARMv7
using explicit barriers:
<Access [A]>
// atomic_op (B)
dmb ish // Full barrier
1: ldxr x0, [B] // Exclusive load
<op(B)>
stxr w1, x0, [B] // Exclusive store
cbnz w1, 1b
dmb ish // Full barrier
<Access [C]>
but this has the undesirable effect of introducing *two* full barrier
instructions. A better approach is actually the following, non-intuitive
sequence:
<Access [A]>
// atomic_op (B)
1: ldxr x0, [B] // Exclusive load
<op(B)>
stlxr w1, x0, [B] // Exclusive store with release
cbnz w1, 1b
dmb ish // Full barrier
<Access [C]>
The simple observations here are:
- The dmb ensures that no subsequent accesses (e.g. the access to C)
can enter or pass the atomic sequence.
- The dmb also ensures that no prior accesses (e.g. the access to A)
can pass the atomic sequence.
- Therefore, no prior access can pass a subsequent access, or
vice-versa (i.e. A is strictly ordered before C).
- The stlxr ensures that no prior access can pass the store component
of the atomic operation.
The only tricky part remaining is the ordering between the ldxr and the
access to A, since the absence of the first dmb means that we're now
permitting re-ordering between the ldxr and any prior accesses.
From an (arbitrary) observer's point of view, there are two scenarios:
1. We have observed the ldxr. This means that if we perform a store to
[B], the ldxr will still return older data. If we can observe the
ldxr, then we can potentially observe the permitted re-ordering
with the access to A, which is clearly an issue when compared to
the dmb variant of the code. Thankfully, the exclusive monitor will
save us here since it will be cleared as a result of the store and
the ldxr will retry. Notice that any use of a later memory
observation to imply observation of the ldxr will also imply
observation of the access to A, since the stlxr/dmb ensure strict
ordering.
2. We have not observed the ldxr. This means we can perform a store
and influence the later ldxr. However, that doesn't actually tell
us anything about the access to [A], so we've not lost anything
here either when compared to the dmb variant.
This patch implements this solution for our barriered atomic operations,
ensuring that we satisfy the full barrier requirements where they are
needed.
Cc: <stable@vger.kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Adam Thomson [Thu, 6 Feb 2014 18:03:17 +0000 (18:03 +0000)]
hwmon: (da9055) Remove use of regmap_irq_get_virq()
Remove use of regmap_irq_get_virq() in driver probe which was
conflicting with use of platform_get_irq_byname().
platform_get_irq_byname() already returns the VIRQ number due
to MFD core translation so using regmap_irq_get_virq() on that
returned value results in an incorrect IRQ being requested.
The driver probes then fail because of this.
Signed-off-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Linus Torvalds [Thu, 6 Feb 2014 21:49:03 +0000 (13:49 -0800)]
Merge branch 'akpm' (patches from Andrew Morton)
Merge a bunch of fixes from Andrew Morton:
"Commit 579f82901f6f ("swap: add a simple detector for inappropriate
swapin readahead") is a feature. No probs if you decide to defer it
until the next merge window.
It has been sitting in my tree for over a year because of my dislike
of all the magic numbers, but recent discussion with Hugh has made me
give up"
* emailed patches fron Andrew Morton <akpm@linux-foundation.org>:
mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq
arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved.
arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug()
mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq()
mm/swap: fix race on swap_info reuse between swapoff and swapon
swap: add a simple detector for inappropriate swapin readahead
ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters
Documentation/kernel-parameters.txt: fix memmap= language
KOSAKI Motohiro [Thu, 6 Feb 2014 20:04:28 +0000 (12:04 -0800)]
mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq
To use spin_{un}lock_irq is dangerous if caller disabled interrupt.
During aio buffer migration, we have a possibility to see the following
call stack.
Tang Chen [Thu, 6 Feb 2014 20:04:27 +0000 (12:04 -0800)]
arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved.
The following path will cause array out of bound.
memblock_add_region() will always set nid in memblock.reserved to
MAX_NUMNODES. In numa_register_memblks(), after we set all nid to
correct valus in memblock.reserved, we called setup_node_data(), and
used memblock_alloc_nid() to allocate memory, with nid set to
MAX_NUMNODES.
The nodemask_t type can be seen as a bit array. And the index is 0 ~
MAX_NUMNODES-1.
After that, when we call node_set() in numa_clear_kernel_node_hotplug(),
the nodemask_t got an index of value MAX_NUMNODES, which is out of [0 ~
MAX_NUMNODES-1].
See below:
numa_init()
|---> numa_register_memblks()
| |---> memblock_set_node(memory) set correct nid in memblock.memory
| |---> memblock_set_node(reserved) set correct nid in memblock.reserved
| |......
| |---> setup_node_data()
| |---> memblock_alloc_nid() here, nid is set to MAX_NUMNODES (1024)
|......
|---> numa_clear_kernel_node_hotplug()
|---> node_set() here, we have an index 1024, and overflowed
This patch moves nid setting to numa_clear_kernel_node_hotplug() to fix
this problem.
Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reported-by: Dave Jones <davej@redhat.com> Cc: David Rientjes <rientjes@google.com> Tested-by: Dave Jones <davej@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tang Chen [Thu, 6 Feb 2014 20:04:25 +0000 (12:04 -0800)]
arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug()
On-stack variable numa_kernel_nodes in numa_clear_kernel_node_hotplug()
was not initialized. So we need to initialize it.
[akpm@linux-foundation.org: use NODE_MASK_NONE, per David] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reported-by: Dave Jones <davej@redhat.com> Reported-by: David Rientjes <rientjes@google.com> Tested-by: Dave Jones <davej@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>