]> git.apps.os.sepia.ceph.com Git - ceph-client.git/log
ceph-client.git
13 years agowatchdog: coh901327: convert to use watchdog core
Linus Walleij [Fri, 16 Mar 2012 08:14:12 +0000 (09:14 +0100)]
watchdog: coh901327: convert to use watchdog core

This converts the COH901327 watchdog to use the watchdog core.
I followed Wolframs document, looked at some other drivers and
tested it on the U300.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: Add support for WDIOC_GETTIMELEFT IOCTL in watchdog core
Viresh Kumar [Fri, 16 Mar 2012 08:14:00 +0000 (09:14 +0100)]
watchdog: Add support for WDIOC_GETTIMELEFT IOCTL in watchdog core

This patch adds support for WDIOC_GETTIMELEFT IOCTL in watchdog core. So, there
is another function pointer added to struct watchdog_ops, which can be passed by
drivers to support this IOCTL.

Related documentation is updated too.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: ep93xx_wdt: timeout is an unsigned int value.
Wim Van Sebroeck [Thu, 22 Mar 2012 08:37:10 +0000 (09:37 +0100)]
watchdog: ep93xx_wdt: timeout is an unsigned int value.

the timeout is a positive thus unsigned int value.
Also re-add the comment about the actual heartbeat.

Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: ep93xx_wdt: Fix timeout after conversion to watchdog core
Mika Westerberg [Sun, 18 Mar 2012 11:09:52 +0000 (13:09 +0200)]
watchdog: ep93xx_wdt: Fix timeout after conversion to watchdog core

After the conversion of this driver to the watchdog core, I noticed that we
miss setting the initial timeout of the wdt device.
This results in a failure of the WDIOC_GETTIMEOUT ioctl call.

Reviewed-by: Mika Westerberg <mika.westerberg@iki.fi>
Tested-by: Mika Westerberg <mika.westerberg@iki.fi>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: Convert ep93xx driver to watchdog core
H Hartley Sweeten [Wed, 14 Mar 2012 17:31:50 +0000 (10:31 -0700)]
watchdog: Convert ep93xx driver to watchdog core

Convert the ep93xx_wdt driver to the watchdog framework API.

Also, use the dev_<fmt> functions instead of pr_<fmt> for logging.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Ryan Mallon <rmallon@gmail.com>
Cc: Mika Westerberg <mika.westerberg@iki.fi>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: sp805: Use devm routines
Viresh Kumar [Mon, 12 Mar 2012 04:22:15 +0000 (09:52 +0530)]
watchdog: sp805: Use devm routines

sp805 driver currently uses normal kzalloc, ioremap, etc routines. This patch
replaces these routines with devm_kzalloc and devm_request_mem_region etc, so
that we don't need to handle freeing of resources for error cases and module
removal routine.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: sp805: replace readl/writel with lighter _relaxed variants
Viresh Kumar [Mon, 12 Mar 2012 04:22:14 +0000 (09:52 +0530)]
watchdog: sp805: replace readl/writel with lighter _relaxed variants

readl/writel versions for ARM contain memory barrier instruction for
synchronizing DMA buffers.  These are not required at least on this
module.  So use lighter _relaxed variants.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: sp805: Fix documentation style comment
Viresh Kumar [Mon, 12 Mar 2012 04:22:13 +0000 (09:52 +0530)]
watchdog: sp805: Fix documentation style comment

@ was missing before variables names, in their description. Also adev is
mentioned as dev in comment. Fix both these issues.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: mpcore_wdt: Allow platform_get_irq() to fail
Viresh Kumar [Mon, 12 Mar 2012 04:22:00 +0000 (09:52 +0530)]
watchdog: mpcore_wdt: Allow platform_get_irq() to fail

irq is not necessary for mpcore wdt. Don't return error if it is not passed. But
if it is passed, then request_irq must pass.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: mpcore_wdt: Use devm routines
Viresh Kumar [Mon, 12 Mar 2012 04:21:59 +0000 (09:51 +0530)]
watchdog: mpcore_wdt: Use devm routines

mpcore_wdt driver currently uses normal kzalloc, request_irq, ioremap, etc
routines. This patch replaces these routines with devm_kzalloc and
devm_request_mem_region etc, so that we don't need to handle freeing of
resources for error cases and module removal routine.

Also, request_irq is moved before registering misc device, so that we are ready
for irq as soon as device is registered.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: mpcore_wdt: Rename dev to pdev for pointing to struct platform_device
Viresh Kumar [Mon, 12 Mar 2012 04:21:58 +0000 (09:51 +0530)]
watchdog: mpcore_wdt: Rename dev to pdev for pointing to struct platform_device

Pointer to struct platform_device is named as dev, which makes it confusing when
we write statements like dev->dev to access struct device within it.

This patch renames such names to pdev.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: xen: don't clear is_active when xen_wdt_stop() failed
Jan Beulich [Mon, 19 Mar 2012 09:32:28 +0000 (09:32 +0000)]
watchdog: xen: don't clear is_active when xen_wdt_stop() failed

xen_wdt_release() shouldn't clear is_active even when the watchdog
didn't get stopped (which by itself shouldn't happen, but let's return
a proper error in this case rather than adding a BUG() upon hypercall
failure).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: xen: don't unconditionally enable the watchdog during resume
Jan Beulich [Mon, 19 Mar 2012 09:30:33 +0000 (09:30 +0000)]
watchdog: xen: don't unconditionally enable the watchdog during resume

This was found to be a problem particularly after guest migration.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reported-by: Wouter de Geus <benv-xensource.com@junerules.com>
Reported-by: Ian Campbell <Ian.Campbell@citrix.com>
Tested-by: Wouter de Geus <benv-xensource.com@junerules.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: fix compiler error for missing parenthesis
Jaehoon Chung [Fri, 16 Mar 2012 06:27:21 +0000 (15:27 +0900)]
watchdog: fix compiler error for missing parenthesis

Joe's patch(watchdog: Use pr_<fmt> and pr_<level>) missed parenthesis in s3c2410_wdt.c.

Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: ep93xx_wdt.c: fix platform probe
Wim Van Sebroeck [Tue, 13 Mar 2012 08:06:12 +0000 (09:06 +0100)]
watchdog: ep93xx_wdt.c: fix platform probe

Fix the device/driver init so that the misc_register
happens as last (since this opens userspace access to
the device).

Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: ep93xx: Convert the watchdog driver into a platform device.
H Hartley Sweeten [Mon, 12 Mar 2012 22:48:16 +0000 (09:48 +1100)]
watchdog: ep93xx: Convert the watchdog driver into a platform device.

Convert the ep93xx watchdog driver into a platform device and
remove it's dependency on <mach/hardware.h>.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Ryan Mallon <rmallon@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Reviewed-by: Mika Westerberg <mika.westerberg@iki.fi>
Acked-by: Arnd Bergmann <arnd@arndb.de>
13 years agowatchdog: fix set_timeout operations
Wim Van Sebroeck [Wed, 29 Feb 2012 19:20:58 +0000 (20:20 +0100)]
watchdog: fix set_timeout operations

Since we changed the behaviour of the set_timeout operation in the
watchdog API, we need to change the allready converted drivers so
that they update the timeout field at the end of the set_timeout
operation.

Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: watchdog_dev: Let the driver update the timeout field on set_timeout success
Hans de Goede [Mon, 12 Sep 2011 09:56:59 +0000 (11:56 +0200)]
watchdog: watchdog_dev: Let the driver update the timeout field on set_timeout success

When a set_timeout operation succeeds this does not necessarily mean that
the exact timeout requested has been achieved, because the watchdog does not
necessarily have a 1 second resolution. So rather then have the core set
the timeout member of the watchdog_device struct to the exact requested
value, instead the driver should set it to the actually achieved timeout value.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: softdog: convert to watchdog core
Alan Cox [Tue, 28 Feb 2012 22:48:11 +0000 (22:48 +0000)]
watchdog: softdog: convert to watchdog core

Convert softdog.c to the new watchdog API.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: Convert max63xx_wdt driver to watchdog framework
Axel Lin [Wed, 8 Feb 2012 06:24:10 +0000 (14:24 +0800)]
watchdog: Convert max63xx_wdt driver to watchdog framework

This patch converts max63xx_wdt driver to watchdog framework.
Also use devm_* APIs to save a few error handling code.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: pnx4008: convert driver to use the watchdog framework
Wolfram Sang [Thu, 2 Feb 2012 17:48:11 +0000 (18:48 +0100)]
watchdog: pnx4008: convert driver to use the watchdog framework

Make this driver a user of the watchdog framework and remove parts now handled
by the core. Tested on a custom lpc32xx-board.

[wim@iguana.be: Added set_timeout operation]
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: Convert wm8350_wdt driver to watchdog core
Axel Lin [Mon, 23 Jan 2012 07:26:59 +0000 (15:26 +0800)]
watchdog: Convert wm8350_wdt driver to watchdog core

This patch converts wm8350_wdt driver to use watchdog core APIs.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: Convert jz4740_wdt driver to watchdog core
Axel Lin [Thu, 26 Jan 2012 10:10:45 +0000 (18:10 +0800)]
watchdog: Convert jz4740_wdt driver to watchdog core

This patch converts jz4740_wdt driver to use watchdog core APIs.
Also use devm_* APIs to save a few error handling code.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: nowayout is bool
Wim Van Sebroeck [Mon, 5 Mar 2012 15:51:11 +0000 (16:51 +0100)]
watchdog: nowayout is bool

nowayout is actually a boolean value.
So make it bool for all watchdog device drivers.

Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: Use pr_<fmt> and pr_<level>
Joe Perches [Wed, 15 Feb 2012 23:06:19 +0000 (15:06 -0800)]
watchdog: Use pr_<fmt> and pr_<level>

Use the current logging styles.

Make sure all output has a prefix.
Add missing newlines.
Remove now unnecessary PFX, NAME, and miscellaneous other #defines.
Coalesce formats.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: pnx4008: don't use __raw_-accessors
Wolfram Sang [Thu, 2 Feb 2012 17:48:09 +0000 (18:48 +0100)]
watchdog: pnx4008: don't use __raw_-accessors

__raw_readl/__raw_writel are not meant for drivers [1].

[1] http://thread.gmane.org/gmane.linux.ports.arm.kernel/117626

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: pnx4008: cleanup resource handling using managed devices
Wolfram Sang [Thu, 2 Feb 2012 17:48:08 +0000 (18:48 +0100)]
watchdog: pnx4008: cleanup resource handling using managed devices

The resource handling in this driver was flaky: IO_ADDRESS instead of
ioremap (and no unmapping), an unneeded static resource, no central exit
path for error cases. Fix this by converting the driver to use managed
resources. Also use dev_*-messages instead of pr_* while we are here.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: sp805_wdt: add pm callbacks to support standby/S2R/hibernation
Viresh Kumar [Fri, 24 Feb 2012 09:42:37 +0000 (15:12 +0530)]
watchdog: sp805_wdt: add pm callbacks to support standby/S2R/hibernation

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: make imx2_wdt report boot status correctly
Oskar Schirmer [Thu, 16 Feb 2012 12:17:45 +0000 (12:17 +0000)]
watchdog: make imx2_wdt report boot status correctly

Ioctl WDIOC_GETBOOTSTATUS is supposed to return some information
on why the system did (re)boot recently, value WDIOF_CARDRESET
being used to indicate watchdog induced reboot.

Up to now, imx2_wdt did not provide a value here, always returning
zero to indicate normal boot.

Do evaluate the IMX Watchdog Reset Status Register and
produce WDIOF_CARDRESET with WDIOC_GETBOOTSTATUS in case
of a watchdog induced reset.

Signed-off-by: Oskar Schirmer <oskar@scara.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agowatchdog: documentation: remove index-file
Wolfram Sang [Mon, 23 Jan 2012 19:17:38 +0000 (20:17 +0100)]
watchdog: documentation: remove index-file

The 00-index file in the watchdog directory is, like many others,
outdated (conversion-howto is missing) and doesn't contain worthwhile
additional information. As it seems to be a maintenance burden without
much gain, simply remove it.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
13 years agoMerge tag 'stable/for-linus-3.4-tag-two' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 24 Mar 2012 19:20:25 +0000 (12:20 -0700)]
Merge tag 'stable/for-linus-3.4-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull more xen updates from Konrad Rzeszutek Wilk:
 "One tiny feature that accidentally got lost in the initial git pull:
   * Add fast-EOI acking of interrupts (clear a bit instead of
     hypercall)
  And bug-fixes:
   * Fix CPU bring-up code missing a call to notify other subsystems.
   * Fix reading /sys/hypervisor even if PVonHVM drivers are not loaded.
   * In Xen ACPI processor driver: remove too verbose WARN messages, fix
     up the Kconfig dependency to be a module by default, and add
     dependency on CPU_FREQ.
   * Disable CPU frequency drivers from loading when booting under Xen
     (as we want the Xen ACPI processor to be used instead).
   * Cleanups in tmem code."

* tag 'stable/for-linus-3.4-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/acpi: Fix Kconfig dependency on CPU_FREQ
  xen: initialize platform-pci even if xen_emul_unplug=never
  xen/smp: Fix bringup bug in AP code.
  xen/acpi: Remove the WARN's as they just create noise.
  xen/tmem: cleanup
  xen: support pirq_eoi_map
  xen/acpi-processor: Do not depend on CPU frequency scaling drivers.
  xen/cpufreq: Disable the cpu frequency scaling drivers from loading.
  provide disable_cpufreq() function to disable the API.

13 years agoFix potential endless loop in kswapd when compaction is not enabled
Rik van Riel [Sat, 24 Mar 2012 14:26:21 +0000 (10:26 -0400)]
Fix potential endless loop in kswapd when compaction is not enabled

We should only test compaction_suitable if the kernel is built with
CONFIG_COMPACTION, otherwise the stub compaction_suitable function will
always return COMPACT_SKIPPED and send kswapd into an infinite loop.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg...
Linus Torvalds [Sat, 24 Mar 2012 17:41:37 +0000 (10:41 -0700)]
Merge tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull <linux/device.h> avoidance patches from Paul Gortmaker:
 "Nearly every subsystem has some kind of header with a proto like:

void foo(struct device *dev);

  and yet there is no reason for most of these guys to care about the
  sub fields within the device struct.  This allows us to significantly
  reduce the scope of headers including headers.  For this instance, a
  reduction of about 40% is achieved by replacing the include with the
  simple fact that the device is some kind of a struct.

  Unlike the much larger module.h cleanup, this one is simply two
  commits.  One to fix the implicit <linux/device.h> users, and then one
  to delete the device.h includes from the linux/include/ dir wherever
  possible."

* tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  device.h: audit and cleanup users in main include dir
  device.h: cleanup users outside of linux/include (C files)

13 years agoMerge tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg...
Linus Torvalds [Sat, 24 Mar 2012 17:24:31 +0000 (10:24 -0700)]
Merge tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
 "Fix up files in fs/ and lib/ dirs to only use module.h if they really
  need it.

  These are trivial in scope vs the work done previously.  We now have
  things where any few remaining cleanups can be farmed out to arch or
  subsystem maintainers, and I have done so when possible.  What is
  remaining here represents the bits that don't clearly lie within a
  single arch/subsystem boundary, like the fs dir and the lib dir.

  Some duplicate includes arising from overlapping fixes from
  independent subsystem maintainer submissions are also quashed."

Fix up trivial conflicts due to clashes with other include file cleanups
(including some due to the previous bug.h cleanup pull).

* tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  lib: reduce the use of module.h wherever possible
  fs: reduce the use of module.h wherever possible
  includecheck: delete any duplicate instances of module.h

13 years agoMerge tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Linus Torvalds [Sat, 24 Mar 2012 17:08:39 +0000 (10:08 -0700)]
Merge tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull <linux/bug.h> cleanup from Paul Gortmaker:
 "The changes shown here are to unify linux's BUG support under the one
  <linux/bug.h> file.  Due to historical reasons, we have some BUG code
  in bug.h and some in kernel.h -- i.e.  the support for BUILD_BUG in
  linux/kernel.h predates the addition of linux/bug.h, but old code in
  kernel.h wasn't moved to bug.h at that time.  As a band-aid, kernel.h
  was including <asm/bug.h> to pseudo link them.

  This has caused confusion[1] and general yuck/WTF[2] reactions.  Here
  is an example that violates the principle of least surprise:

      CC      lib/string.o
      lib/string.c: In function 'strlcat':
      lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON'
      make[2]: *** [lib/string.o] Error 1
      $
      $ grep linux/bug.h lib/string.c
      #include <linux/bug.h>
      $

  We've included <linux/bug.h> for the BUG infrastructure and yet we
  still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh -
  very confusing for someone who is new to kernel development.

  With the above in mind, the goals of this changeset are:

  1) find and fix any include/*.h files that were relying on the
     implicit presence of BUG code.
  2) find and fix any C files that were consuming kernel.h and hence
     relying on implicitly getting some/all BUG code.
  3) Move the BUG related code living in kernel.h to <linux/bug.h>
  4) remove the asm/bug.h from kernel.h to finally break the chain.

  During development, the order was more like 3-4, build-test, 1-2.  But
  to ensure that git history for bisect doesn't get needless build
  failures introduced, the commits have been reorderd to fix the problem
  areas in advance.

[1]  https://lkml.org/lkml/2012/1/3/90
[2]  https://lkml.org/lkml/2012/1/17/414"

Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul
and linux-next.

* tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  kernel.h: doesn't explicitly use bug.h, so don't include it.
  bug: consolidate BUILD_BUG_ON with other bug code
  BUG: headers with BUG/BUG_ON etc. need linux/bug.h
  bug.h: add include of it to various implicit C users
  lib: fix implicit users of kernel.h for TAINT_WARN
  spinlock: macroize assert_spin_locked to avoid bug.h dependency
  x86: relocate get/set debugreg fcns to include/asm/debugreg.

13 years agoxen/acpi: Fix Kconfig dependency on CPU_FREQ
Konrad Rzeszutek Wilk [Sat, 24 Mar 2012 13:18:57 +0000 (09:18 -0400)]
xen/acpi: Fix Kconfig dependency on CPU_FREQ

The functions: "acpi_processor_*" sound like they depend on CONFIG_ACPI_PROCESSOR
but in reality they are exposed when CONFIG_CPU_FREQ=[y|m]. As such
update the Kconfig to have this dependency and fix compile issues:

ERROR: "acpi_processor_unregister_performance" [drivers/xen/xen-acpi-processor.ko] undefined!
ERROR: "acpi_processor_notify_smm" [drivers/xen/xen-acpi-processor.ko] undefined!
ERROR: "acpi_processor_register_performance" [drivers/xen/xen-acpi-processor.ko] undefined!
ERROR: "acpi_processor_preregister_performance" [drivers/xen/xen-acpi-processor.ko] undefined!

Note: We still need the CONFIG_ACPI
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl
Linus Torvalds [Sat, 24 Mar 2012 01:08:58 +0000 (18:08 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl

Pull sysctl updates from Eric Biederman:

 - Rewrite of sysctl for speed and clarity.

   Insert/remove/Lookup in sysctl are all now O(NlogN) operations, and
   are no longer bottlenecks in the process of adding and removing
   network devices.

   sysctl is now focused on being a filesystem instead of system call
   and the code can all be found in fs/proc/proc_sysctl.c.  Hopefully
   this means the code is now approachable.

   Much thanks is owed to Lucian Grinjincu for keeping at this until
   something was found that was usable.

 - The recent proc_sys_poll oops found by the fuzzer during hibernation
   is fixed.

* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl: (36 commits)
  sysctl: protect poll() in entries that may go away
  sysctl: Don't call sysctl_follow_link unless we are a link.
  sysctl: Comments to make the code clearer.
  sysctl: Correct error return from get_subdir
  sysctl: An easier to read version of find_subdir
  sysctl: fix memset parameters in setup_sysctl_set()
  sysctl: remove an unused variable
  sysctl: Add register_sysctl for normal sysctl users
  sysctl: Index sysctl directories with rbtrees.
  sysctl: Make the header lists per directory.
  sysctl: Move sysctl_check_dups into insert_header
  sysctl: Modify __register_sysctl_paths to take a set instead of a root and an nsproxy
  sysctl: Replace root_list with links between sysctl_table_sets.
  sysctl: Add sysctl_print_dir and use it in get_subdir
  sysctl: Stop requiring explicit management of sysctl directories
  sysctl: Add a root pointer to ctl_table_set
  sysctl: Rewrite proc_sys_readdir in terms of first_entry and next_entry
  sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry.
  sysctl: Normalize the root_table data structure.
  sysctl: Factor out insert_header and erase_header
  ...

13 years agoMerge tag 'amd64-edac-updates-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 24 Mar 2012 00:59:47 +0000 (17:59 -0700)]
Merge tag 'amd64-edac-updates-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull AMD64 EDAC fixes from Borislav Petkov:
 "A bunch of fixes/updates for the AMD side of EDAC including

   * MCE decoding updates
   * tree-wide EDAC sweep making pci_device_ids __devinitconst
   * Scrub rate API correction
   * two amd64_edac corrections for K8 boxes and sysfs csrow nodes"

* tag 'amd64-edac-updates-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  MCE, AMD: Constify error tables
  MCE, AMD: Correct bank 5 error signatures
  MCE, AMD: Rework NB MCE signatures
  MCE, AMD: Correct VB data error description
  MCE, AMD: Correct ucode patch buffer description
  MCE, AMD: Correct some MC0 error types
  EDAC: Make pci_device_id tables __devinitconst.
  EDAC: Correct scrub rate API
  amd64_edac: Fix K8 revD and later chip select sizes
  amd64_edac: Fix missing csrows sysfs nodes

13 years agoMerge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
Linus Torvalds [Sat, 24 Mar 2012 00:56:39 +0000 (17:56 -0700)]
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq

Pull cpufreq updates for 3.4 from Dave Jones: new drivers and some fixes.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  provide disable_cpufreq() function to disable the API.
  EXYNOS5250: Add support cpufreq for EXYNOS5250
  EXYNOS4X12: Add support cpufreq for EXYNOS4X12
  [CPUFREQ] CPUfreq ondemand: update sampling rate without waiting for next sampling
  [CPUFREQ] Add S3C2416/S3C2450 cpufreq driver
  [CPUFREQ] Fix exposure of ARM_EXYNOS4210_CPUFREQ
  [CPUFREQ] EXYNOS4210: update the name of EXYNOS clock register
  [CPUFREQ] EXYNOS: Initialize locking_frequency with initial frequency
  [CPUFREQ] s3c64xx: Fix mis-cherry pick of VDDINT

Fix up trivial conflicts in Kconfig and Makefile due to just changes
next to each other (OMAP2PLUS changes vs some new EXYNOS cpufreq
drivers).

13 years agoMerge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
Linus Torvalds [Sat, 24 Mar 2012 00:51:50 +0000 (17:51 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq

Pull cpufreq fixes from Dave Jones:
 "I meant to get some of these in for 3.3 final, but left things too
  late, so I've got two trees this time."

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  cpufreq: OMAP: specify range for voltage scaling
  cpufreq: OMAP: scale voltage along with frequency
  cpufreq: OMAP driver depends CPUfreq tables

13 years agoMerge branch 'pcmcia' of git://git.linaro.org/people/rmk/linux-arm
Linus Torvalds [Sat, 24 Mar 2012 00:37:40 +0000 (17:37 -0700)]
Merge branch 'pcmcia' of git://git.linaro.org/people/rmk/linux-arm

Pull #3 ARM updates from Russell King:
 "This adds gpio support to soc_common, allowing an amount of code to be
  deleted from each PCMCIA socket driver for the PXA/SA11x0 SoCs."

* 'pcmcia' of git://git.linaro.org/people/rmk/linux-arm:
  PCMCIA: sa1111: rename sa1111 socket drivers to have sa1111_ prefix.
  PCMCIA: make lubbock socket driver part of sa1111_cs
  PCMCIA: add Kconfig control for building sa11xx_base.c
  PCMCIA: sa1111: jornada720: no need to disable IRQs around sa1111_set_io
  PCMCIA: sa1111: pass along sa1111_pcmcia_configure_socket() failure code
  PCMCIA: soc_common: remove explicit wrprot initialization in socket drivers
  PCMCIA: soc_common: remove soc_pcmcia_*_irqs functions
  PCMCIA: sa11x0: h3600: convert to use new irq/gpio management
  PCMCIA: sa11x0: simpad: convert to use new irq/gpio management
  PCMCIA: sa11x0: shannon: convert to use new irq/gpio management
  PCMCIA: sa11x0: nanoengine: convert reset handling to use GPIO subsystem
  PCMCIA: sa11x0: nanoengine: convert to use new irq/gpio management
  PCMCIA: sa11x0: cerf: convert reset handling to use GPIO subsystem
  PCMCIA: sa11x0: cerf: convert to use new irq/gpio management
  PCMCIA: sa11x0: assabet: convert to use new irq/gpio management
  PCMCIA: sa1111: use new per-socket irq/gpio infrastructure
  PCMCIA: pxa: convert PXA socket drivers to use new irq/gpio management
  PCMCIA: soc_common: add GPIO support for card status signals
  PCMCIA: soc_common: move common initialization into soc_common

13 years agoMerge branch 'amba' of git://git.linaro.org/people/rmk/linux-arm
Linus Torvalds [Sat, 24 Mar 2012 00:36:29 +0000 (17:36 -0700)]
Merge branch 'amba' of git://git.linaro.org/people/rmk/linux-arm

Pull #2 ARM updates from Russell King:
 "Further ARM AMBA primecell updates which aren't included directly in
  the previous commit.  I wanted to keep these separate as they're
  touching stuff outside arch/arm/."

* 'amba' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7362/1: AMBA: Add module_amba_driver() helper macro for amba_driver
  ARM: 7335/1: mach-u300: do away with MMC config files
  ARM: 7280/1: mmc: mmci: Cache MMCICLOCK and MMCIPOWER register
  ARM: 7309/1: realview: fix unconnected interrupts on EB11MP
  ARM: 7230/1: mmc: mmci: Fix PIO read for small SDIO packets
  ARM: 7227/1: mmc: mmci: Prepare for SDIO before setting up DMA job
  ARM: 7223/1: mmc: mmci: Fixup use of runtime PM and use autosuspend
  ARM: 7221/1: mmc: mmci: Change from using legacy suspend
  ARM: 7219/1: mmc: mmci: Change vdd_handler to a generic ios_handler
  ARM: 7218/1: mmc: mmci: Provide option to configure bus signal direction
  ARM: 7217/1: mmc: mmci: Put power register deviations in variant data
  ARM: 7216/1: mmc: mmci: Do not release spinlock in request_end
  ARM: 7215/1: mmc: mmci: Increase max_segs from 16 to 128

13 years agoMerge branch 'for-armsoc' of git://git.linaro.org/people/rmk/linux-arm
Linus Torvalds [Sat, 24 Mar 2012 00:30:49 +0000 (17:30 -0700)]
Merge branch 'for-armsoc' of git://git.linaro.org/people/rmk/linux-arm

Pull #1 ARM updates from Russell King:
 "This one covers stuff which Arnd is waiting for me to push, as this is
  shared between both our trees and probably other trees elsewhere.

  Essentially, this contains:
   - AMBA primecell device initializer updates - mostly shrinking the
     size of the device declarations in platform code to something more
     reasonable.
   - Getting rid of the NO_IRQ crap from AMBA primecell stuff.
   - Nicolas' idle cleanups.  This in combination with the restart
     cleanups from the last merge window results in a great many
     mach/system.h files being deleted."

Yay: ~80 files, ~2000 lines deleted.

* 'for-armsoc' of git://git.linaro.org/people/rmk/linux-arm: (60 commits)
  ARM: remove disable_fiq and arch_ret_to_user macros
  ARM: make entry-macro.S depend on !MULTI_IRQ_HANDLER
  ARM: rpc: make default fiq handler run-time installed
  ARM: make arch_ret_to_user macro optional
  ARM: amba: samsung: use common amba device initializers
  ARM: amba: spear: use common amba device initializers
  ARM: amba: nomadik: use common amba device initializers
  ARM: amba: u300: use common amba device initializers
  ARM: amba: lpc32xx: use common amba device initializers
  ARM: amba: netx: use common amba device initializers
  ARM: amba: bcmring: use common amba device initializers
  ARM: amba: ep93xx: use common amba device initializers
  ARM: amba: omap2: use common amba device initializers
  ARM: amba: integrator: use common amba device initializers
  ARM: amba: realview: get rid of private platform amba_device initializer
  ARM: amba: versatile: get rid of private platform amba_device initializer
  ARM: amba: vexpress: get rid of private platform amba_device initializer
  ARM: amba: provide common initializers for static amba devices
  ARM: amba: make use of -1 IRQs warn
  ARM: amba: u300: get rid of NO_IRQ initializers
  ...

13 years agoMerge tag 'for-3.4' of git://openrisc.net/jonas/linux
Linus Torvalds [Sat, 24 Mar 2012 00:24:25 +0000 (17:24 -0700)]
Merge tag 'for-3.4' of git://openrisc.net/jonas/linux

Pull OpenRISC changes for 3.4 from Jonas Bonn:
 "This series for the OpenRISC architecture consists of mostly trivial
  fixups.  The most interesting bits of the series are:

  * A fix to the timer code whereby the shortest trigger period is set
    to 100 cycles; previously, it was possible to set this to 1 cycle,
    but by the time the register was written, that time had already
    passed and the timer interrupt would not go off until the cycle
    counter had gone a full cycle.

  * Allowing a device tree binary to be passed in to the kernel from
    u-boot.  The OpenRISC architecture has been recently merged into
    upstream u-boot, so this change gets OpenRISC Linux into sync with
    that project."

* tag 'for-3.4' of git://openrisc.net/jonas/linux:
  OpenRISC: Remove memory_start/end prototypes
  openrisc: remove semicolon from KSTK_ defs
  openrisc: sanitize use of orig_gpr11
  openrisc: fix virt_addr_valid
  OpenRISC: Export dump_stack()
  OpenRISC: Select GENERIC_ATOMIC64
  openrisc: Set shortest clock event to 100 ticks
  openrisc: included linux/thread_info.h twice
  OpenRISC: Use set_current_blocked() and block_sigmask()
  OpenRISC: Don't mask signals if we fail to setup signal stack
  OpenRISC: No need to reset handler if SA_ONESHOT
  OpenRISC: Don't reimplement force_sigsegv()
  openrisc: enable passing of flattened device tree pointer
  arch/openrisc/mm/init.c: trivial: use BUG_ON

13 years agoMerge tag 'ia64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl...
Linus Torvalds [Sat, 24 Mar 2012 00:19:37 +0000 (17:19 -0700)]
Merge tag 'ia64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux

Pull miscellaneous Itanium patches from Tony Luck.

The conflicts in arch/ia64/hp/sim/simserial.c were due to patches to
simserial that had alredy been included (with lots of further cleanups)
in the serial tree.

* tag 'ia64-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  Documentation/kernel-parameters: remove inttest parameter
  [IA64] Fix ISA IRQ trigger model and polarity setting
  [IA64] Fix a couple of warnings for EXPORT_SYMBOL
  [IA64] Check return from device_register() in cx_device_register()
  [IA64] Fix warning from machine_kexec.c
  [IA64] simserial, bail out when request_irq fails
  [IA64] hpsim, initialize chip for assigned irqs
  [IA64] simserial, include some headers
  [IA64] hpsim, fix SAL handling in fw-emu
  [IA64] genirq fixup for SGI/SN
  [IA64] disable interrupts when exiting from ia64_mca_cmc_int_handler()

13 years agoMerge branch 'mmci' into amba
Russell King [Sat, 24 Mar 2012 00:10:36 +0000 (00:10 +0000)]
Merge branch 'mmci' into amba

13 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 24 Mar 2012 00:10:09 +0000 (17:10 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull additional x86 fixes from Peter Anvin:
 - address a long-standing bug related to when a kernel-spawned process
   gets a signal on an i386 kernel compiled without CONFIG_VM86.

 - fix the newly introduced build warning in arch/x86/boot.

 - fix a typo in the i386 system call table which affects building some
   libcs.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86-32: Fix endless loop when processing signals for kernel tasks
  x86, boot: Correct CFLAGS for hostprogs
  x86-32: Fix typo for mq_getsetattr in syscall table

13 years agoMerge branch 'akpm' (Andrew's patch-bomb)
Linus Torvalds [Fri, 23 Mar 2012 23:59:10 +0000 (16:59 -0700)]
Merge branch 'akpm' (Andrew's patch-bomb)

Merge second batch of patches from Andrew Morton:
 - various misc things
 - core kernel changes to prctl, exit, exec, init, etc.
 - kernel/watchdog.c updates
 - get_maintainer
 - MAINTAINERS
 - the backlight driver queue
 - core bitops code cleanups
 - the led driver queue
 - some core prio_tree work
 - checkpatch udpates
 - largeish crc32 update
 - a new poll() feature for the v4l guys
 - the rtc driver queue
 - fatfs
 - ptrace
 - signals
 - kmod/usermodehelper updates
 - coredump
 - procfs updates

* emailed from Andrew Morton <akpm@linux-foundation.org>: (141 commits)
  seq_file: add seq_set_overflow(), seq_overflow()
  proc-ns: use d_set_d_op() API to set dentry ops in proc_ns_instantiate().
  procfs: speed up /proc/pid/stat, statm
  procfs: add num_to_str() to speed up /proc/stat
  proc: speed up /proc/stat handling
  fs/proc/kcore.c: make get_sparsemem_vmemmap_info() static
  coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP
  coredump: remove VM_ALWAYSDUMP flag
  kmod: make __request_module() killable
  kmod: introduce call_modprobe() helper
  usermodehelper: ____call_usermodehelper() doesn't need do_exit()
  usermodehelper: kill umh_wait, renumber UMH_* constants
  usermodehelper: implement UMH_KILLABLE
  usermodehelper: introduce umh_complete(sub_info)
  usermodehelper: use UMH_WAIT_PROC consistently
  signal: zap_pid_ns_processes: s/SEND_SIG_NOINFO/SEND_SIG_FORCED/
  signal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig()
  signal: cosmetic, s/from_ancestor_ns/force/ in prepare_signal() paths
  signal: give SEND_SIG_FORCED more power to beat SIGNAL_UNKILLABLE
  Hexagon: use set_current_blocked() and block_sigmask()
  ...

13 years agoseq_file: add seq_set_overflow(), seq_overflow()
KAMEZAWA Hiroyuki [Fri, 23 Mar 2012 22:02:55 +0000 (15:02 -0700)]
seq_file: add seq_set_overflow(), seq_overflow()

It is undocumented but a seq_file's overflow state is indicated by
m->count == m->size.  Add seq_set_overflow() and seq_overflow() to
set/check overflow status explicitly.

Based on an idea from Eric Dumazet.

[akpm@linux-foundation.org: tweak code comment]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoproc-ns: use d_set_d_op() API to set dentry ops in proc_ns_instantiate().
Pravin B Shelar [Fri, 23 Mar 2012 22:02:55 +0000 (15:02 -0700)]
proc-ns: use d_set_d_op() API to set dentry ops in proc_ns_instantiate().

The namespace cleanup path leaks a dentry which holds a reference count
on a network namespace.  Keeping that network namespace from being freed
when the last user goes away.  Leaving things like vlan devices in the
leaked network namespace.

If you use ip netns add for much real work this problem becomes apparent
pretty quickly.  It light testing the problem hides because frequently
you simply don't notice the leak.

Use d_set_d_op() so that DCACHE_OP_* flags are set correctly.

This issue exists back to 3.0.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoprocfs: speed up /proc/pid/stat, statm
KAMEZAWA Hiroyuki [Fri, 23 Mar 2012 22:02:54 +0000 (15:02 -0700)]
procfs: speed up /proc/pid/stat, statm

Process accounting applications as top, ps visit some files under
/proc/<pid>.  With seq_put_decimal_ull(), we can optimize /proc/<pid>/stat
and /proc/<pid>/statm files.

This patch adds
  - seq_put_decimal_ll() for signed values.
  - allow delimiter == 0.
  - convert seq_printf() to seq_put_decimal_ull/ll in /proc/stat, statm.

Test result on a system with 2000+ procs.

Before patch:
  [kamezawa@bluextal test]$ top -b -n 1 | wc -l
  2223
  [kamezawa@bluextal test]$ time top -b -n 1 > /dev/null

  real    0m0.675s
  user    0m0.044s
  sys     0m0.121s

  [kamezawa@bluextal test]$ time ps -elf > /dev/null

  real    0m0.236s
  user    0m0.056s
  sys     0m0.176s

After patch:
  kamezawa@bluextal ~]$ time top -b -n 1 > /dev/null

  real    0m0.657s
  user    0m0.052s
  sys     0m0.100s

  [kamezawa@bluextal ~]$ time ps -elf > /dev/null

  real    0m0.198s
  user    0m0.050s
  sys     0m0.145s

Considering top, ps tend to scan /proc periodically, this will reduce cpu
consumption by top/ps to some extent.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoprocfs: add num_to_str() to speed up /proc/stat
KAMEZAWA Hiroyuki [Fri, 23 Mar 2012 22:02:54 +0000 (15:02 -0700)]
procfs: add num_to_str() to speed up /proc/stat

== stat_check.py
num = 0
with open("/proc/stat") as f:
        while num < 1000 :
                data = f.read()
                f.seek(0, 0)
                num = num + 1
==

perf shows

    20.39%  stat_check.py  [kernel.kallsyms]    [k] format_decode
    13.41%  stat_check.py  [kernel.kallsyms]    [k] number
    12.61%  stat_check.py  [kernel.kallsyms]    [k] vsnprintf
    10.85%  stat_check.py  [kernel.kallsyms]    [k] memcpy
     4.85%  stat_check.py  [kernel.kallsyms]    [k] radix_tree_lookup
     4.43%  stat_check.py  [kernel.kallsyms]    [k] seq_printf

This patch removes most of calls to vsnprintf() by adding num_to_str()
and seq_print_decimal_ull(), which prints decimal numbers without rich
functions provided by printf().

On my 8cpu box.
== Before patch ==
[root@bluextal test]# time ./stat_check.py

real    0m0.150s
user    0m0.026s
sys     0m0.121s

== After patch ==
[root@bluextal test]# time ./stat_check.py

real    0m0.055s
user    0m0.022s
sys     0m0.030s

[akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
[andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrea Righi <andrea@betterlinux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Turner <pjt@google.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoproc: speed up /proc/stat handling
Eric Dumazet [Fri, 23 Mar 2012 22:02:53 +0000 (15:02 -0700)]
proc: speed up /proc/stat handling

On a typical 16 cpus machine, "cat /proc/stat" gives more than 4096 bytes,
and is slow :

  # strace -T -o /tmp/STRACE cat /proc/stat | wc -c
  5826
  # grep "cpu " /tmp/STRACE
  read(0, "cpu  1949310 19 2144714 12117253"..., 32768) = 5826 <0.001504>

Thats partly because show_stat() must be called twice since initial
buffer size is too small (4096 bytes for less than 32 possible cpus)

Fix this by :

 1) Taking into account nr_irqs in the initial buffer sizing.

 2) Using ksize() to allow better filling of initial buffer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agofs/proc/kcore.c: make get_sparsemem_vmemmap_info() static
Djalal Harouni [Fri, 23 Mar 2012 22:02:52 +0000 (15:02 -0700)]
fs/proc/kcore.c: make get_sparsemem_vmemmap_info() static

get_sparsemem_vmemmap_info() is only used inside fs/proc/kcore.c

Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agocoredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP
Jason Baron [Fri, 23 Mar 2012 22:02:51 +0000 (15:02 -0700)]
coredump: add VM_NODUMP, MADV_NODUMP, MADV_CLEAR_NODUMP

Since we no longer need the VM_ALWAYSDUMP flag, let's use the freed bit
for 'VM_NODUMP' flag.  The idea is is to add a new madvise() flag:
MADV_DONTDUMP, which can be set by applications to specifically request
memory regions which should not dump core.

The specific application I have in mind is qemu: we can add a flag there
that wouldn't dump all of guest memory when qemu dumps core.  This flag
might also be useful for security sensitive apps that want to absolutely
make sure that parts of memory are not dumped.  To clear the flag use:
MADV_DODUMP.

[akpm@linux-foundation.org: s/MADV_NODUMP/MADV_DONTDUMP/, s/MADV_CLEAR_NODUMP/MADV_DODUMP/, per Roland]
[akpm@linux-foundation.org: fix up the architectures which broke]
Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agocoredump: remove VM_ALWAYSDUMP flag
Jason Baron [Fri, 23 Mar 2012 22:02:51 +0000 (15:02 -0700)]
coredump: remove VM_ALWAYSDUMP flag

The motivation for this patchset was that I was looking at a way for a
qemu-kvm process, to exclude the guest memory from its core dump, which
can be quite large.  There are already a number of filter flags in
/proc/<pid>/coredump_filter, however, these allow one to specify 'types'
of kernel memory, not specific address ranges (which is needed in this
case).

Since there are no more vma flags available, the first patch eliminates
the need for the 'VM_ALWAYSDUMP' flag.  The flag is used internally by
the kernel to mark vdso and vsyscall pages.  However, it is simple
enough to check if a vma covers a vdso or vsyscall page without the need
for this flag.

The second patch then replaces the 'VM_ALWAYSDUMP' flag with a new
'VM_NODUMP' flag, which can be set by userspace using new madvise flags:
'MADV_DONTDUMP', and unset via 'MADV_DODUMP'.  The core dump filters
continue to work the same as before unless 'MADV_DONTDUMP' is set on the
region.

The qemu code which implements this features is at:

  http://people.redhat.com/~jbaron/qemu-dump/qemu-dump.patch

In my testing the qemu core dump shrunk from 383MB -> 13MB with this
patch.

I also believe that the 'MADV_DONTDUMP' flag might be useful for
security sensitive apps, which might want to select which areas are
dumped.

This patch:

The VM_ALWAYSDUMP flag is currently used by the coredump code to
indicate that a vma is part of a vsyscall or vdso section.  However, we
can determine if a vma is in one these sections by checking it against
the gate_vma and checking for a non-NULL return value from
arch_vma_name().  Thus, freeing a valuable vma bit.

Signed-off-by: Jason Baron <jbaron@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agokmod: make __request_module() killable
Oleg Nesterov [Fri, 23 Mar 2012 22:02:50 +0000 (15:02 -0700)]
kmod: make __request_module() killable

As Tetsuo Handa pointed out, request_module() can stress the system
while the oom-killed caller sleeps in TASK_UNINTERRUPTIBLE.

The task T uses "almost all" memory, then it does something which
triggers request_module().  Say, it can simply call sys_socket().  This
in turn needs more memory and leads to OOM.  oom-killer correctly
chooses T and kills it, but this can't help because it sleeps in
TASK_UNINTERRUPTIBLE and after that oom-killer becomes "disabled" by the
TIF_MEMDIE task T.

Make __request_module() killable.  The only necessary change is that
call_modprobe() should kmalloc argv and module_name, they can't live in
the stack if we use UMH_KILLABLE.  This memory is freed via
call_usermodehelper_freeinfo()->cleanup.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agokmod: introduce call_modprobe() helper
Oleg Nesterov [Fri, 23 Mar 2012 22:02:49 +0000 (15:02 -0700)]
kmod: introduce call_modprobe() helper

No functional changes.  Move the call_usermodehelper code from
__request_module() into the new simple helper, call_modprobe().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agousermodehelper: ____call_usermodehelper() doesn't need do_exit()
Oleg Nesterov [Fri, 23 Mar 2012 22:02:49 +0000 (15:02 -0700)]
usermodehelper: ____call_usermodehelper() doesn't need do_exit()

Minor cleanup.  ____call_usermodehelper() can simply return, no need to
call do_exit() explicitely.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agousermodehelper: kill umh_wait, renumber UMH_* constants
Oleg Nesterov [Fri, 23 Mar 2012 22:02:48 +0000 (15:02 -0700)]
usermodehelper: kill umh_wait, renumber UMH_* constants

No functional changes.  It is not sane to use UMH_KILLABLE with enum
umh_wait, but obviously we do not want another argument in
call_usermodehelper_* helpers.  Kill this enum, use the plain int.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agousermodehelper: implement UMH_KILLABLE
Oleg Nesterov [Fri, 23 Mar 2012 22:02:47 +0000 (15:02 -0700)]
usermodehelper: implement UMH_KILLABLE

Implement UMH_KILLABLE, should be used along with UMH_WAIT_EXEC/PROC.
The caller must ensure that subprocess_info->path/etc can not go away
until call_usermodehelper_freeinfo().

call_usermodehelper_exec(UMH_KILLABLE) does
wait_for_completion_killable.  If it fails, it uses
xchg(&sub_info->complete, NULL) to serialize with umh_complete() which
does the same xhcg() to access sub_info->complete.

If call_usermodehelper_exec wins, it can safely return.  umh_complete()
should get NULL and call call_usermodehelper_freeinfo().

Otherwise we know that umh_complete() was already called, in this case
call_usermodehelper_exec() falls back to wait_for_completion() which
should succeed "very soon".

Note: UMH_NO_WAIT == -1 but it obviously should not be used with
UMH_KILLABLE.  We delay the neccessary cleanup to simplify the back
porting.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agousermodehelper: introduce umh_complete(sub_info)
Oleg Nesterov [Fri, 23 Mar 2012 22:02:47 +0000 (15:02 -0700)]
usermodehelper: introduce umh_complete(sub_info)

Preparation.  Add the new trivial helper, umh_complete().  Currently it
simply does complete(sub_info->complete).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agousermodehelper: use UMH_WAIT_PROC consistently
Oleg Nesterov [Fri, 23 Mar 2012 22:02:46 +0000 (15:02 -0700)]
usermodehelper: use UMH_WAIT_PROC consistently

A few call_usermodehelper() callers use the hardcoded constant instead of
the proper UMH_WAIT_PROC, fix them.

Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agosignal: zap_pid_ns_processes: s/SEND_SIG_NOINFO/SEND_SIG_FORCED/
Oleg Nesterov [Fri, 23 Mar 2012 22:02:46 +0000 (15:02 -0700)]
signal: zap_pid_ns_processes: s/SEND_SIG_NOINFO/SEND_SIG_FORCED/

Change zap_pid_ns_processes() to use SEND_SIG_FORCED, it looks more
clear compared to SEND_SIG_NOINFO which relies on from_ancestor_ns logic
send_signal().

It is also more efficient if we need to kill a lot of tasks because it
doesn't alloc sigqueue.

While at it, add the __fatal_signal_pending(task) check as a minor
optimization.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agosignal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig()
Oleg Nesterov [Fri, 23 Mar 2012 22:02:45 +0000 (15:02 -0700)]
signal: oom_kill_task: use SEND_SIG_FORCED instead of force_sig()

Change oom_kill_task() to use do_send_sig_info(SEND_SIG_FORCED) instead
of force_sig(SIGKILL).  With the recent changes we do not need force_ to
kill the CLONE_NEWPID tasks.

And this is more correct.  force_sig() can race with the exiting thread
even if oom_kill_task() checks p->mm != NULL, while
do_send_sig_info(group => true) kille the whole process.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agosignal: cosmetic, s/from_ancestor_ns/force/ in prepare_signal() paths
Oleg Nesterov [Fri, 23 Mar 2012 22:02:45 +0000 (15:02 -0700)]
signal: cosmetic, s/from_ancestor_ns/force/ in prepare_signal() paths

Cosmetic, rename the from_ancestor_ns argument in prepare_signal()
paths.  After the previous change it doesn't match the reality.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agosignal: give SEND_SIG_FORCED more power to beat SIGNAL_UNKILLABLE
Oleg Nesterov [Fri, 23 Mar 2012 22:02:44 +0000 (15:02 -0700)]
signal: give SEND_SIG_FORCED more power to beat SIGNAL_UNKILLABLE

force_sig_info() and friends have the special semantics for synchronous
signals, this interface should not be used if the target is not current.
And it needs the fixes, in particular the clearing of SIGNAL_UNKILLABLE
is not exactly right.

However there are callers which have to use force_ exactly because it
clears SIGNAL_UNKILLABLE and thus it can kill the CLONE_NEWPID tasks,
although this is almost always is wrong by various reasons.

With this patch SEND_SIG_FORCED ignores SIGNAL_UNKILLABLE, like we do if
the signal comes from the ancestor namespace.

This makes the naming in prepare_signal() paths insane, fixed by the
next cleanup.

Note: this only affects SIGKILL/SIGSTOP, but this is enough for
force_sig() abusers.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoHexagon: use set_current_blocked() and block_sigmask()
Matt Fleming [Fri, 23 Mar 2012 22:02:43 +0000 (15:02 -0700)]
Hexagon: use set_current_blocked() and block_sigmask()

As described in e6fa16ab9c1e ("signal: sigprocmask() should do
retarget_shared_pending()") the modification of current->blocked is
incorrect as we need to check whether the signal we're about to block is
pending in the shared queue.

Also, use the new helper function introduced in commit 5e6292c0f28f
("signal: add block_sigmask() for adding sigmask to current->blocked")
which centralises the code for updating current->blocked after
successfully delivering a signal and reduces the amount of duplicate
code across architectures.  In the past some architectures got this code
wrong, so using this helper function should stop that from happening
again.

Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Richard Kuo <rkuo@codeaurora.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoptrace: remove PTRACE_SEIZE_DEVEL bit
Denys Vlasenko [Fri, 23 Mar 2012 22:02:43 +0000 (15:02 -0700)]
ptrace: remove PTRACE_SEIZE_DEVEL bit

PTRACE_SEIZE code is tested and ready for production use, remove the
code which requires special bit in data argument to make PTRACE_SEIZE
work.

Strace team prepares for a new release of strace, and we would like to
ship the code which uses PTRACE_SEIZE, preferably after this change goes
into released kernel.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoptrace: renumber PTRACE_EVENT_STOP so that future new options and events can match
Denys Vlasenko [Fri, 23 Mar 2012 22:02:42 +0000 (15:02 -0700)]
ptrace: renumber PTRACE_EVENT_STOP so that future new options and events can match

PTRACE_EVENT_foo and PTRACE_O_TRACEfoo used to match.

New PTRACE_EVENT_STOP is the first event which has no corresponding
PTRACE_O_TRACE option.  If we will ever want to add another such option,
its PTRACE_EVENT's value will collide with PTRACE_EVENT_STOP's value.

This patch changes PTRACE_EVENT_STOP value to prevent this.

While at it, added a comment - the one atop PTRACE_EVENT block, saying
"Wait extended result codes for the above trace options", is not true
for PTRACE_EVENT_STOP.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoptrace: make PTRACE_SEIZE set ptrace options specified in 'data' parameter
Denys Vlasenko [Fri, 23 Mar 2012 22:02:42 +0000 (15:02 -0700)]
ptrace: make PTRACE_SEIZE set ptrace options specified in 'data' parameter

This can be used to close a few corner cases in strace where we get
unwanted racy behavior after attach, but before we have a chance to set
options (the notorious post-execve SIGTRAP comes to mind), and removes
the need to track "did we set opts for this task" state in strace
internals.

While we are at it:

Make it possible to extend SEIZE in the future with more functionality
by passing non-zero 'addr' parameter.  To that end, error out if 'addr'
is non-zero.  PTRACE_ATTACH did not (and still does not) have such
check, and users (strace) do pass garbage there...  let's avoid
repeating this mistake with SEIZE.

Set all task->ptrace bits in one operation - before this change, we were
adding PT_SEIZED and PT_PTRACE_CAP with task->ptrace |= BIT ops.  This
was probably ok (not a bug), but let's be on a safer side.

Changes since v2: use (unsigned long) casts instead of (long) ones, move
PTRACE_SEIZE_DEVEL-related code to separate lines of code.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Pedro Alves <palves@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoptrace: simplify PTRACE_foo constants and PTRACE_SETOPTIONS code
Denys Vlasenko [Fri, 23 Mar 2012 22:02:41 +0000 (15:02 -0700)]
ptrace: simplify PTRACE_foo constants and PTRACE_SETOPTIONS code

Exchange PT_TRACESYSGOOD and PT_PTRACE_CAP bit positions, which makes
PT_option bits contiguous and therefore makes code in
ptrace_setoptions() much simpler.

Every PTRACE_O_TRACEevent is defined to (1 << PTRACE_EVENT_event)
instead of using explicit numeric constants, to ensure we don't mess up
relationship between bit positions and event ids.

PT_EVENT_FLAG_SHIFT was not particularly useful, PT_OPT_FLAG_SHIFT with
value of PT_EVENT_FLAG_SHIFT-1 is easier to use.

PT_TRACE_MASK constant is nuked, the only its use is replaced by
(PTRACE_O_MASK << PT_OPT_FLAG_SHIFT).

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoptrace: don't modify flags on PTRACE_SETOPTIONS failure
Denys Vlasenko [Fri, 23 Mar 2012 22:02:40 +0000 (15:02 -0700)]
ptrace: don't modify flags on PTRACE_SETOPTIONS failure

On ptrace(PTRACE_SETOPTIONS, pid, 0, <opts>), we used to set those
option bits which are known, and then fail with -EINVAL if there are
some unknown bits in <opts>.

This is inconsistent with typical error handling, which does not change
any state if input is invalid.

This patch changes PTRACE_SETOPTIONS behavior so that in this case, we
return -EINVAL and don't change any bits in task->ptrace.

It's very unlikely that there is userspace code in the wild which will
be affected by this change: it should have the form

    ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_BOGUSOPT)

where PTRACE_O_BOGUSOPT is a constant unknown to the kernel.  But kernel
headers, naturally, don't contain any PTRACE_O_BOGUSOPTs, thus the only
way userspace can use one if it defines one itself.  I can't see why
anyone would do such a thing deliberately.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoptrace: don't send SIGTRAP on exec if SEIZED
Oleg Nesterov [Fri, 23 Mar 2012 22:02:40 +0000 (15:02 -0700)]
ptrace: don't send SIGTRAP on exec if SEIZED

ptrace_event(PTRACE_EVENT_EXEC) sends SIGTRAP if PT_TRACE_EXEC is not
set.  This is because this SIGTRAP predates PTRACE_O_TRACEEXEC option,
we do not need/want this with PT_SEIZED which can set the options during
attach.

Suggested-by: Pedro Alves <palves@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Chris Evans <scarybeasts@gmail.com>
Cc: Indan Zupancic <indan@nul.nu>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoptrace: the killed tracee should not enter the syscall
Oleg Nesterov [Fri, 23 Mar 2012 22:02:39 +0000 (15:02 -0700)]
ptrace: the killed tracee should not enter the syscall

Another old/known problem.  If the tracee is killed after it reports
syscall_entry, it starts the syscall and debugger can't control this.
This confuses the users and this creates the security problems for
ptrace jailers.

Change tracehook_report_syscall_entry() to return non-zero if killed,
this instructs syscall_trace_enter() to abort the syscall.

Reported-by: Chris Evans <scarybeasts@gmail.com>
Tested-by: Indan Zupancic <indan@nul.nu>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Denys Vlasenko <vda.linux@googlemail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pedro Alves <palves@redhat.com>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agofat: fix bug in enforcing Long File Name length
Namjae Jeon [Fri, 23 Mar 2012 22:02:39 +0000 (15:02 -0700)]
fat: fix bug in enforcing Long File Name length

Since '*outlen' is initialized to zero, it is currently possible to
create a filename of length (FAT_LFN_LEN + 1) when utf8 is not enabled.
To enforce the FAT_LFN_LEN limit, we must perform one less iteration.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Ravishankar N <cyberax82@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agofat: clean up xlate_to_uni()
Namjae Jeon [Fri, 23 Mar 2012 22:02:38 +0000 (15:02 -0700)]
fat: clean up xlate_to_uni()

xlate_to_uni() is called by vfat_build_slots() with sbi->nls_io as the
final argument.  nls_io can never be null at this point because the
check is already being done in fat_fill_super() wherein the mount fails
if it is null.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Ravishankar N <cyberax82@gmail.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agortc: ds1307: generalise ram size and offset
Austin Boyle [Fri, 23 Mar 2012 22:02:38 +0000 (15:02 -0700)]
rtc: ds1307: generalise ram size and offset

Generalise NVRAM to support RAM with other size and offset, such as the
64 bytes of SRAM on the mcp7941x.

[rdunlap@xenotime.net: fix printk format warning]
Signed-off-by: Austin Boyle <Austin.Boyle@aviatnet.com>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: David Anders <danders.dev@gmail.com>
Cc: Alessandro Zummo <alessandro.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agortc: ds1307: comment and format cleanup
David Anders [Fri, 23 Mar 2012 22:02:37 +0000 (15:02 -0700)]
rtc: ds1307: comment and format cleanup

Do some cleanup of the comment sections as well as correct some
formatting issues reported by checkpatch.pl.

Signed-off-by: David Anders <x0132446@ti.com>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Austin Boyle <Austin.Boyle@aviatnet.com>
Cc: Alessandro Zummo <alessandro.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agortc: ds1307: simplify irq setup code
Wolfram Sang [Fri, 23 Mar 2012 22:02:37 +0000 (15:02 -0700)]
rtc: ds1307: simplify irq setup code

No need to have two seperate if-blocks for setting up the irq.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: David Anders <danders.dev@gmail.com>
Cc: Austin Boyle <Austin.Boyle@aviatnet.com>
Cc: Alessandro Zummo <alessandro.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agortc: ds1307: refactor chip_desc table
Wolfram Sang [Fri, 23 Mar 2012 22:02:36 +0000 (15:02 -0700)]
rtc: ds1307: refactor chip_desc table

The chip_desc table is suboptimal.  Currently it requires an entry for
every new chip type, even if it is empty.  This has already been
forgotten for the ds1388.  Refactor the code, so new entries are only
needed, when they chip type really needs a (non-empty) description.
Also make the table visually more appealing.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Austin Boyle <Austin.Boyle@aviatnet.com>
Cc: David Anders <danders.dev@gmail.com>
Cc: Alessandro Zummo <alessandro.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agortc: driver for DA9052/53 PMIC v1
Ashish Jangam [Fri, 23 Mar 2012 22:02:36 +0000 (15:02 -0700)]
rtc: driver for DA9052/53 PMIC v1

RTC Driver for Dialog Semiconductor DA9052/53 PMICs.

This patch is functionally tested on Samsung SMDKV6410.

[akpm@linux-foundation.org: clean up file header layout, remove unneeded initialisation of local arrays]
Signed-off-by: David Dajun Chen <dchen@diasemi.com>
Signed-off-by: Ashish Jangam <ashish.jangam@kpitcummins.com>
Cc: Paul Gortmaker <p_gortmaker@yahoo.com>
Cc: David Dajun Chen <dchen@diasemi.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/rtc/rtc-max8925.c: fix alarm->enabled mistake in max8925_rtc_read_alarm/max89...
Kevin Liu [Fri, 23 Mar 2012 22:02:36 +0000 (15:02 -0700)]
drivers/rtc/rtc-max8925.c: fix alarm->enabled mistake in max8925_rtc_read_alarm/max8925_rtc_set_alarm

max8925_rtc_read_alarm() should set alrm->enabled based on both
ALARM_IRQ_MASK and ALARM_CTRL setting.  max8925_rtc_set_alarm() should
enable/disable alarm according to ALARM_CTRL reg setting.

Signed-off-by: Kevin Liu <kliu5@marvell.com>
Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/rtc/rtc-max8925.c: fix max8925_rtc_read_alarm() return value error
Kevin Liu [Fri, 23 Mar 2012 22:02:35 +0000 (15:02 -0700)]
drivers/rtc/rtc-max8925.c: fix max8925_rtc_read_alarm() return value error

max8925_rtc_read_alarm should always return 0 with success

Signed-off-by: Kevin Liu <kliu5@marvell.com>
Signed-off-by: Haojian Zhuang <haojian.zhuang@marvell.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/rtc/rtc-pm8xxx.c: make pm8xxx_rtc_pm_ops static
Navin P [Fri, 23 Mar 2012 22:02:34 +0000 (15:02 -0700)]
drivers/rtc/rtc-pm8xxx.c: make pm8xxx_rtc_pm_ops static

Signed-off-by: Navin P <zicrim@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/rtc: remove IRQF_DISABLED
Yong Zhang [Fri, 23 Mar 2012 22:02:34 +0000 (15:02 -0700)]
drivers/rtc: remove IRQF_DISABLED

Since commit e58aa3d2d0cc ("genirq: run irq handlers with interrupts
disabled") we run all interrupt handlers with interrupts disabled and we
even check and yell when an interrupt handler returns with interrupts
enabled - see commit b738a50a2026 ("genirq: warn when handler enables
interrupts").

So now this flag is a NOOP and can be removed.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Wan ZongShun <mcuos.com@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/rtc/rtc-twl.c: return correct RTC event from ISR
Venu Byravarasu [Fri, 23 Mar 2012 22:02:34 +0000 (15:02 -0700)]
drivers/rtc/rtc-twl.c: return correct RTC event from ISR

Following changes are made as part of this change:

1. As TWL RTC supports periodic interrupt, the correct event should be
   RTC_PF instead of RTC_UF.

2. No need to initialize variable "events" to 0 & then OR it with the
   event values.  Hence fixing it.

Signed-off-by: Venu Byravarasu <vbyravarasu@nvidia.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/rtc/rtc-twl.c: simplify RTC interrupt clearing
Venu Byravarasu [Fri, 23 Mar 2012 22:02:33 +0000 (15:02 -0700)]
drivers/rtc/rtc-twl.c: simplify RTC interrupt clearing

For clearing RTC interrupt, programming ALARM bit only is sufficient, as
all other bits are any way not affected by writing 0 to them.

Hence removed unwanted OR operation.

Signed-off-by: Venu Byravarasu <vbyravarasu@nvidia.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/rtc/rtc-twl.c: enable RTC irrespective of its prior state
Venu Byravarasu [Fri, 23 Mar 2012 22:02:33 +0000 (15:02 -0700)]
drivers/rtc/rtc-twl.c: enable RTC irrespective of its prior state

As part of probe, before enabling RTC, RTC_CTRL register is read to check
if it is already running.  If RTC is used by kernel alone, then this read
is not required.  Even if RTC was enabled already by boot loader, setting
STOP_RTC bit again should not harm.  Hence removed unwanted read
operation.

Signed-off-by: Venu Byravarasu <vbyravarasu@nvidia.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/rtc/rtc-twl.c: optimize IRQ bit access
Venu Byravarasu [Fri, 23 Mar 2012 22:02:32 +0000 (15:02 -0700)]
drivers/rtc/rtc-twl.c: optimize IRQ bit access

As the TWL RTC driver has a cached copy of enabled RTC interrupt bits in
variable rtc_irq_bits, that can be checked before really setting or
masking any of the interrupt bits.

Signed-off-by: Venu Byravarasu <vbyravarasu@nvidia.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMIPS: add RTC support for loongson1B
zhao zhang [Fri, 23 Mar 2012 22:02:32 +0000 (15:02 -0700)]
MIPS: add RTC support for loongson1B

Add RTC support(TOY counter0) for loongson1B SOC

Signed-off-by: zhao zhang <zhzhl555@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agortc: convert rtc i2c drivers to module_i2c_driver
Axel Lin [Fri, 23 Mar 2012 22:02:31 +0000 (15:02 -0700)]
rtc: convert rtc i2c drivers to module_i2c_driver

Factor out some boilerplate code for i2c driver registration into
module_i2c_driver.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Piotr Ziecik <kosmo@semihalf.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Scott Wood <scottwood@freescale.com>
Cc: Srikanth Srinivasan <srikanth.srinivasan@freescale.com>
Cc: Mike Rapoport <mike@compulab.co.il>
Cc: Sergey Lapin <slapin@ossfans.org>
Cc: Roman Fietze <roman.fietze@telemotive.de>
Cc: Herbert Valerio Riedel <hvr@gnu.org>
Cc: Alexander Bigga <ab@mycable.de>
Cc: Dale Farnsworth <dale@farnsworth.org>
Cc: Gregory Hermant <gregory.hermant@calao-systems.com>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: Martyn Welch <martyn.welch@ge.com>
Cc: Byron Bradley <byron.bbradley@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agortc: convert rtc spi drivers to module_spi_driver
Axel Lin [Fri, 23 Mar 2012 22:02:30 +0000 (15:02 -0700)]
rtc: convert rtc spi drivers to module_spi_driver

Factor out some boilerplate code for spi driver registration into
module_spi_driver.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Mark Jackson <mpfj@mimc.co.uk>
Cc: Dennis Aberilla <denzzzhome@yahoo.com>
Cc: Nikolaus Voss <n.voss@weinmann.de>
Cc: "Kim B. Heino" <Kim.Heino@bluegiga.com>
Cc: Raphael Assenat <raph@raphnet.net>
Cc: Chris Verges <chrisv@cyberswitching.com>
Cc: Magnus Damm <damm@opensource.se>
Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agortc/rtc-spear: call platform_set_drvdata() before registering rtc device
Viresh Kumar [Fri, 23 Mar 2012 22:02:30 +0000 (15:02 -0700)]
rtc/rtc-spear: call platform_set_drvdata() before registering rtc device

rtc_device_register() calls rtc-spear routines internally.  These
routines call dev_get_drvdata() to get struct spear_rtc_config.
Currently, platform_set_drvdata is called after rtc device is
registered.  This causes system to crash, as dev_get_drvdata returns
NULL.

For this we need to call platform_set_drvdata() before registering rtc
device.  This requires further cleanup, that leads to removal of
dev_set_drvdata on rtc->dev, which was just not required at all.

Also, we change the parameter to request_irq and pass pointer to config
instead of pointer to rtc struct.

This patch brings all above changes.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Cc: Shiraz Hashim <shiraz.hashim@st.com>
Cc: Deepak Sikri <deepak.sikri@st.com>
Acked-by: Rajeev Kumar <rajeev-dlh.kumar@st.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agortc/spear: fix for RTC_AIE_ON and RTC_AIE_OFF ioctl errors
Shiraz Hashim [Fri, 23 Mar 2012 22:02:29 +0000 (15:02 -0700)]
rtc/spear: fix for RTC_AIE_ON and RTC_AIE_OFF ioctl errors

Define API for '.alarm_irq_enable' to enable and disable alarm irq. This
is required by the framework else RTC_AIE_ON and RTC_AIE_OFF ioctls
return errors.

Signed-off-by: Shiraz Hashim <shiraz.hashim@st.com>
Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Cc: Deepak Sikri <deepak.sikri@st.com>
Acked-by: Rajeev Kumar <rajeev-dlh.kumar@st.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agortc-spear: fix for balancing the enable_irq_wake in Power Mgmt
Deepak Sikri [Fri, 23 Mar 2012 22:02:29 +0000 (15:02 -0700)]
rtc-spear: fix for balancing the enable_irq_wake in Power Mgmt

Handle the fix for unbalanced irq for the cases when enable_irq_wake
fails, and a warning related to same is displayed on the console.  The
workaround is handled at the driver level.

Signed-off-by: Deepak Sikri <deepak.sikri@st.com>
Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Acked-by: Rajeev Kumar <rajeev-dlh.kumar@st.com>
Cc: Shiraz Hashim <shiraz.hashim@st.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoinit/do_mounts.c: print error code on mount failure
Bernhard Walle [Fri, 23 Mar 2012 22:02:28 +0000 (15:02 -0700)]
init/do_mounts.c: print error code on mount failure

Printing the error code makes it easier to debug the cause of a mount
failure.  For example I had the problem that the root file system could
not be mounted read-writeable because my SD card was write-protected.
Without an error code it looks like the SD card was not detected at all.

Signed-off-by: Bernhard Walle <bernhard@bwalle.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoinit: check printed flag to skip printing message
Diwakar Tundlam [Fri, 23 Mar 2012 22:02:28 +0000 (15:02 -0700)]
init: check printed flag to skip printing message

Otherwise the 'Calibration skipped' message gets printed everytime a CPU
is hotplugged in, cluttering console for systems that frequently hotplug
CPUs.

Signed-off-by: Diwakar Tundlam <dtundlam@nvidia.com>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: Sameer Nanda <snanda@chromium.org>
Cc: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoepoll: remove unneeded variable in reverse_path_check()
Dan Carpenter [Fri, 23 Mar 2012 22:02:28 +0000 (15:02 -0700)]
epoll: remove unneeded variable in reverse_path_check()

We never use the length variable.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoepoll: comment the funky #ifdef
Steven Rostedt [Fri, 23 Mar 2012 22:02:27 +0000 (15:02 -0700)]
epoll: comment the funky #ifdef

Looking for a bug in -rt, I stumbled across this code here from: commit
2dfa4eeab0fc ("epoll keyed wakeups: teach epoll about hints coming with
the wakeup key"), specifically:

  #ifdef CONFIG_DEBUG_LOCK_ALLOC
  static inline void ep_wake_up_nested(wait_queue_head_t *wqueue,
                                      unsigned long events, int subclass)
  {
         unsigned long flags;

         spin_lock_irqsave_nested(&wqueue->lock, flags, subclass);
         wake_up_locked_poll(wqueue, events);
         spin_unlock_irqrestore(&wqueue->lock, flags);
  }
  #else
  static inline void ep_wake_up_nested(wait_queue_head_t *wqueue,
                                      unsigned long events, int subclass)
  {
         wake_up_poll(wqueue, events);
  }
  #endif

You change the function of ep_wake_up_nested() depending on whether
CONFIG_DEBUG_LOCK_ALLOC is set or not.  This looks awfully suspicious,
and there's no comment to explain why.  I initially thought that this
was trying to fool lockdep, and hiding a real bug.

Investigating it, I found the creation of wake_up_nested() (which no
longer exists) but was created for the sole purpose of epoll and its
strange wake ups, as explained in commit 0ccf831cbee9 ("lockdep:
annotate epoll")

Although the commit message says "annotate epoll" the change log is much
better at explaining what is happening than what is in the actual code.
Thus a comment is really necessary here.  And to save the time of other
developers from having to go trudging through the git logs trying to
figure out why this code exists.

I took parts of the change log and placed it into a comment above the
affected code.  This will make the description of what is happening more
visible to new developers that have to look at this code for the first
time.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>