]> git.apps.os.sepia.ceph.com Git - ceph-client.git/log
ceph-client.git
17 years agoext4: Invert the locking order of page_lock and transaction start
Jan Kara [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: Invert the locking order of page_lock and transaction start

This changes are needed to support data=ordered mode handling via
inodes.  This enables us to get rid of the journal heads and buffer
heads for data buffers in the ordered mode.  With the changes, during
tranasaction commit we writeout the inode pages using the
writepages()/writepage(). That implies we take page lock during
transaction commit. This can cause a deadlock with the locking order
page_lock -> jbd2_journal_start, since the jbd2_journal_start can wait
for the journal_commit to happen and the journal_commit now needs to
take the page lock. To avoid this dead lock reverse the locking order.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agovfs: Move mark_inode_dirty() from under page lock in generic_write_end()
Jan Kara [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
vfs: Move mark_inode_dirty() from under page lock in generic_write_end()

There's no need to call mark_inode_dirty() under page lock in
generic_write_end(). It unnecessarily makes hold time of page lock longer
and more importantly it forces locking order of page lock and transaction
start for journaling filesystems.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: Use page_mkwrite vma_operations to get mmap write notification.
Aneesh Kumar K.V [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: Use page_mkwrite vma_operations to get mmap write notification.

We would like to get notified when we are doing a write on mmap section.
This is needed with respect to preallocated area. We split the preallocated
area into initialzed extent and uninitialzed extent in the call back. This
let us handle ENOSPC better. Otherwise we get ENOSPC in the writepage and
that would result in data loss. The changes are also needed to handle ENOSPC
when writing to an mmap section of files with holes.

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: Documentation updates.
Jose R. Santos [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: Documentation updates.

Some of the information in Documentation/filesystems/ext4.txt is out
of date and in need of an update.

Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: fix online resize with mballoc
Frederic Bohe [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: fix online resize with mballoc

Update group infos when updating a group's descriptor.
Add group infos when adding a group's descriptor.
Refresh cache pages used by mb_alloc when changes occur.
This will probably need modifications when META_BG resizing will be allowed.

Signed-off-by: Frederic Bohe <frederic.bohe@bull.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
17 years agoext4: use atomic functions to set bh_state
Eric Sandeen [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: use atomic functions to set bh_state

Use the BUFFER_FNS functions (set_buffer_foo) to set buffer
head state atomically instead of nonatomic __set_bit().

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: Set journal pointer to NULL when journal is released
Jan Kara [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: Set journal pointer to NULL when journal is released

Set sbi->s_journal to NULL after we call journal_destroy(). This
will be later needed because after journal_destroy() is called,
ext4_clear_inode() can still be called for some inodes (e.g. root
inode) and we'll need to detect there that journal doesn't exists
anymore.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: mballoc avoid use root reserved blocks for non root allocation
Mingming Cao [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: mballoc avoid use root reserved blocks for non root allocation

mballoc allocation missed check for blocks reserved for root users. Add
ext4_has_free_blocks() check before allocation. Also modified
ext4_has_free_blocks() to support multiple block allocation request.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: call blkdev_issue_flush on fsync
Eric Sandeen [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: call blkdev_issue_flush on fsync

To ensure that bits are truly on-disk after an fsync,
we should call blkdev_issue_flush if barriers are supported.

Inspired by an old thread on barriers, by reiserfs & xfs
which do the same, and by a patch SuSE ships with their kernel

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: cleanup block allocator
Aneesh Kumar K.V [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: cleanup block allocator

Move the code related to block allocation to a single function and add helper
funtions to differient allocation for data and meta data blocks

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: Use inode preallocation with -o noextents
Aneesh Kumar K.V [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: Use inode preallocation with -o noextents

When mballoc is enabled, block allocation for old block-based
files are allocated using mballoc allocator instead of old
block-based allocator. The old ext3 block reservation is turned
off when mballoc is turned on.

However, the in-core preallocation is not enabled for block-based/
non-extent based file block allocation. This result in performance
regression, as now we don't have "reservation" ore in-core preallocation
to prevent interleaved fragmentation in multiple writes workload.

This patch fix this by enable per inode in-core preallocation
for non extent files when mballoc is used.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: fix ext4_init_block_bitmap() for metablock block group
Akinobu Mita [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: fix ext4_init_block_bitmap() for metablock block group

When meta_bg feature is enabled and s_first_meta_bg != 0,
ext4_init_block_bitmap() miscalculates the number of block used by
the group descriptor table (0 or 1 for metablock block group)

This patch fixes this by using ext4_bg_num_gdb()

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Andreas Dilger <adilger@sun.com>
17 years agoext4: Fix sparse warning
Aneesh Kumar K.V [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: Fix sparse warning

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: cleanup never-used magic numbers from htree code
Li Zefan [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: cleanup never-used magic numbers from htree code

dx_root_limit() will had some dead code which forced it to always return
20, and dx_node_limit to always return 22 for debugging purposes.
Remove it.

Acked-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: Fix ext4_ext_journal_restart() to reflect errors up to the caller
Shen Feng [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: Fix ext4_ext_journal_restart() to reflect errors up to the caller

Fix ext4_ext_journal_restart() so it returns any errors reported by
ext4_journal_extend() and ext4_journal_restart().

Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: Make ext4_ext_find_extent fills ext_path completely
Shen Feng [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: Make ext4_ext_find_extent fills ext_path completely

When pos=0 or depth, the fields of ext4_ext_path is are not
completely filled.  This patch also removes some unnecessary code.

Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: return error when calling ext4_ext_split failed
Shen Feng [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: return error when calling ext4_ext_split failed

ext4_ext_create_new_leaf must return error when its
calling to ext4_ext_split failed.

Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: Update i_disksize properly when allocating from fallocate area.
Aneesh Kumar K.V [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: Update i_disksize properly when allocating from fallocate area.

When allocating unitialized space at the end of file which had been
preallocated with the FALLOC_FL_KEEP_SIZE option, the file size is not
updated at that time.  But the later we are not updating the file size
when writing to that preallocated space.

These changes are for code correctness.  This patch allows us to update
the i_disksize at the write_end() callback of filesystem properly.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: remove quota allocation when ext4_mb_new_blocks fails
Shen Feng [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: remove quota allocation when ext4_mb_new_blocks fails

Quota allocation is not removed when ext4_mb_new_blocks calls
kmem_cache_alloc failed.  Also make sure the allocation context is freed
on the error path.

Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: remove redundant code in ext4_fill_super()
Li Zefan [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: remove redundant code in ext4_fill_super()

The previous sb_min_blocksize() has already set the block size.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agojbd2: fix race between jbd2_journal_try_to_free_buffers() and jbd2 commit transaction
Mingming Cao [Mon, 14 Jul 2008 01:06:39 +0000 (21:06 -0400)]
jbd2: fix race between jbd2_journal_try_to_free_buffers() and jbd2 commit transaction

journal_try_to_free_buffers() could race with jbd commit transaction
when the later is holding the buffer reference while waiting for the
data buffer to flush to disk. If the caller of
journal_try_to_free_buffers() request tries hard to release the buffers,
it will treat the failure as error and return back to the caller. We
have seen the directo IO failed due to this race.  Some of the caller of
releasepage() also expecting the buffer to be dropped when passed with
GFP_KERNEL mask to the releasepage()->journal_try_to_free_buffers().

With this patch, if the caller is passing the GFP_KERNEL to indicating
this call could wait, in case of try_to_free_buffers() failed, let's
waiting for journal_commit_transaction() to finish commit the current
committing transaction , then try to free those buffers again with
journal locked.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: New inode allocation for FLEX_BG meta-data groups.
Jose R. Santos [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: New inode allocation for FLEX_BG meta-data groups.

This patch mostly controls the way inode are allocated in order to
make ialloc aware of flex_bg block group grouping.  It achieves this
by bypassing the Orlov allocator when block group meta-data are packed
toghether through mke2fs.  Since the impact on the block allocator is
minimal, this patch should have little or no effect on other block
allocation algorithms. By controlling the inode allocation, it can
basically control where the initial search for new block begins and
thus indirectly manipulate the block allocator.

This allocator favors data and meta-data locality so the disk will
gradually be filled from block group zero upward.  This helps improve
performance by reducing seek time.  Since the group of inode tables
within one flex_bg are treated as one giant inode table, uninitialized
block groups would not need to partially initialize as many inode
table as with Orlov which would help fsck time as the filesystem usage
goes up.

Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: Valerie Clement <valerie.clement@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agojbd2: Add commit time into the commit block
Theodore Ts'o [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
jbd2: Add commit time into the commit block

Carlo Wood has demonstrated that it's possible to recover deleted
files from the journal.  Something that will make this easier is if we
can put the time of the commit into commit block.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: replace __FUNCTION__ occurrences
Stoyan Gaydarov [Mon, 14 Jul 2008 01:03:29 +0000 (21:03 -0400)]
ext4: replace __FUNCTION__ occurrences

__FUNCTION__ is gcc-specific, use __func__ instead

Signed-off-by: Stoyan Gaydarov <stoyboyker@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
17 years agoext4: fix error processing in mb_free_blocks
Shen Feng [Mon, 14 Jul 2008 01:03:31 +0000 (21:03 -0400)]
ext4: fix error processing in mb_free_blocks

The error processing of the return value of mb_free_blocks is meanless
because it only returns 0.  This fix includes

- make mb_free_blocks return void

- remove the error processing part in callers

- unlock group before calling ext4_error in mb_free_blocks

Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
17 years agoext4: error proc entry creation when the fs/ext4 is not correctly created
Shen Feng [Mon, 14 Jul 2008 01:03:31 +0000 (21:03 -0400)]
ext4: error proc entry creation when the fs/ext4 is not correctly created

When the directory fs/ext4 is not correctly created under proc, the entry
under this directory should not be created.

Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
17 years agoext4: fix build failure if DX_DEBUG is enabled
Li Zefan [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: fix build failure if DX_DEBUG is enabled

ext4_next_entry() is used by the debugging function dx_show_leaf(), so
it must be defined before that function.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: Remove unused variable from ext4_show_options
Theodore Ts'o [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: Remove unused variable from ext4_show_options

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: Rename read_block_bitmap() to ext4_read_block_bitmap()
Theodore Ts'o [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: Rename read_block_bitmap() to ext4_read_block_bitmap()

Since this a non-static function, make it be ext4 specific to avoid
conflicts with potentially other filesystems.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: remove double definitions of xattr macros
Shen Feng [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: remove double definitions of xattr macros

remove the definitions of macros XATTR_TRUSTED_PREFIX and XATTR_USER_PREFIX
since they are defined in linux/xattr.h

Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: miscellaneous error checks and coding cleanups for mballoc
Shen Feng [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: miscellaneous error checks and coding cleanups for mballoc

ext4_mb_seq_history_open(): check if sbi->s_mb_history is NULL

ext4_mb_history_init(): replace kmalloc and memset with kzalloc

ext4_mb_init_backend(): remove memset since kzalloc is used

ext4_mb_init(): the return value of ext4_mb_init_backend is int,
but i is unsigned, replace it with a new int variable.

Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: add error processing when calling ext4_mb_init_cache in mballoc
Shen Feng [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: add error processing when calling ext4_mb_init_cache in mballoc

Add error processing for ext4_mb_load_buddy when it calls
ext4_mb_init_cache.

Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: Fix ext4_mb_init_cache return error
Mingming Cao [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: Fix ext4_mb_init_cache return error

ext4_mb_init_cache() incorrectly always return EIO on success. This
causes the caller of ext4_mb_init_cache() fail when it checks the return
value.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: improve some code in rb tree part of dir.c
Shen Feng [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: improve some code in rb tree part of dir.c

* remove unnecessary code in free_rb_tree_fname

* rename free_rb_tree_fname to ext4_htree_create_dir_info
  since it and ext4_htree_free_dir_info are a pair

* replace kmalloc with kzalloc in ext4_htree_free_dir_info

All these make the code more readable and simple.
PS: this patch is also suitable for ext3.

Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: switch to seq_files
Alexey Dobriyan [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: switch to seq_files

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
17 years agoext4: Use BUG_ON() instead of BUG()
Julia Lawall [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: Use BUG_ON() instead of BUG()

if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@ disable unlikely @ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)

@@ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: start searching for the right extent from the goal group.
Aneesh Kumar K.V [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: start searching for the right extent from the goal group.

With mballoc we search for the best extent using different
criteria. We should always use the goal group when we are
starting with a new criteria.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: fix comments to say "ext4"
Shen Feng [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: fix comments to say "ext4"

Change second/third to fourth.

Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: Fix mb_find_next_bit not to return larger than max
Aneesh Kumar K.V [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: Fix mb_find_next_bit not to return larger than max

Some architectures implement ext4_find_next_bit and
ext4_find_next_zero_bit in such a way that they return
greater than max for some input values. Make sure
mb_find_next_bit and mb_find_next_zero_bit return the
right values.

On 2.6.25 we have include/asm-x86/bitops_32.h
static inline unsigned find_first_bit(const unsigned long *addr, unsigned size)
{
unsigned x = 0;

while (x < size) {
unsigned long val = *addr++;
if (val)
return __ffs(val) + x;
x += (sizeof(*addr)<<3);
}
return x;
}

This can return value greater than size.

Reported and fixed here for lustre

https://bugzilla.lustre.org/show_bug.cgi?id=15932
https://bugzilla.lustre.org/attachment.cgi?id=17205

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
17 years agoext4: validate directory entry data before use
Duane Griffin [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: validate directory entry data before use

ext4_dx_find_entry uses ext4_next_entry without verifying that the entry is
valid. If its rec_len == 0 this causes an infinite loop. Refactor the loop
to check the validity of entries before checking whether they match and
moving onto the next one.

There are other uses of ext4_next_entry in this file which also look
problematic. They should be reviewed and fixed if/when we have a test-case
that triggers them.

This patch fixes the first case (image hdb.25.softlockup.gz) reported in
http://bugzilla.kernel.org/show_bug.cgi?id=10882.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
17 years agoext4: handle deleting corrupted indirect blocks
Duane Griffin [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: handle deleting corrupted indirect blocks

While freeing indirect blocks we attach a journal head to the parent buffer
head, free the blocks, then journal the parent. If the indirect block list
is corrupted and points to the parent the journal head will be detached
when the block is cleared, causing an OOPS.

Check for that explicitly and handle it gracefully.

This patch fixes the third case (image hdb.20000057.nullderef.gz)
reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
17 years agoext4: handle corrupted orphan list at mount
Duane Griffin [Fri, 11 Jul 2008 23:27:31 +0000 (19:27 -0400)]
ext4: handle corrupted orphan list at mount

If the orphan node list includes valid, untruncatable nodes with nlink > 0
the ext4_orphan_cleanup loop which attempts to delete them will not do so,
causing it to loop forever. Fix by checking for such nodes in the
ext4_orphan_get function.

This patch fixes the second case (image hdb.20000009.softlockup.gz)
reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
17 years agoLinux 2.6.26 v2.6.26
Linus Torvalds [Sun, 13 Jul 2008 21:51:29 +0000 (14:51 -0700)]
Linux 2.6.26

17 years agodevcgroup: fix permission check when adding entry to child cgroup
Li Zefan [Sun, 13 Jul 2008 19:14:04 +0000 (12:14 -0700)]
devcgroup: fix permission check when adding entry to child cgroup

 # cat devices.list
 c 1:3 r
 # echo 'c 1:3 w' > sub/devices.allow
 # cat sub/devices.list
 c 1:3 w

As illustrated, the parent group has no write permission to /dev/null, so
it's child should not be allowed to add this write permission.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agodevcgroup: always show positive major/minor num
Li Zefan [Sun, 13 Jul 2008 19:14:02 +0000 (12:14 -0700)]
devcgroup: always show positive major/minor num

 # echo "b $((0x7fffffff)):$((0x80000000)) rwm" > devices.allow
 # cat devices.list
 b 214748364:-21474836 rwm

though a major/minor number of 0x800000000 is meaningless, we
should not cast it to a negative value.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoDocumentation/HOWTO: correct wrong kernel bugzilla FAQ URL
Jiri Pirko [Sun, 13 Jul 2008 19:13:59 +0000 (12:13 -0700)]
Documentation/HOWTO: correct wrong kernel bugzilla FAQ URL

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoMerge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 13 Jul 2008 18:03:59 +0000 (11:03 -0700)]
Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  cpusets, hotplug, scheduler: fix scheduler domain breakage

17 years agocpusets, hotplug, scheduler: fix scheduler domain breakage
Dmitry Adamushko [Sun, 13 Jul 2008 00:10:29 +0000 (02:10 +0200)]
cpusets, hotplug, scheduler: fix scheduler domain breakage

Commit f18f982ab ("sched: CPU hotplug events must not destroy scheduler
domains created by the cpusets") introduced a hotplug-related problem as
described below:

Upon CPU_DOWN_PREPARE,

  update_sched_domains() -> detach_destroy_domains(&cpu_online_map)

does the following:

/*
 * Force a reinitialization of the sched domains hierarchy. The domains
 * and groups cannot be updated in place without racing with the balancing
 * code, so we temporarily attach all running cpus to the NULL domain
 * which will prevent rebalancing while the sched domains are recalculated.
 */

The sched-domains should be rebuilt when a CPU_DOWN ops. has been
completed, effectively either upon CPU_DEAD{_FROZEN} (upon success) or
CPU_DOWN_FAILED{_FROZEN} (upon failure -- restore the things to their
initial state). That's what update_sched_domains() also does but only
for !CPUSETS case.

With f18f982ab, sched-domains' reinitialization is delegated to
CPUSETS code:

cpuset_handle_cpuhp() -> common_cpu_mem_hotplug_unplug() ->
rebuild_sched_domains()

Being called for CPU_UP_PREPARE and if its callback is called after
update_sched_domains()), it just negates all the work done by
update_sched_domains() -- i.e. a soon-to-be-offline cpu is included in
the sched-domains and that makes it visible for the load-balancer
while the CPU_DOWN ops. is in progress.

__migrate_live_tasks() moves the tasks off a 'dead' cpu (it's already
"offline" when this function is called).

try_to_wake_up() is called for one of these tasks from another CPU ->
the load-balancer (wake_idle()) picks up a "dead" CPU and places the
task on it. Then e.g. BUG_ON(rq->nr_running) detects this a bit later
-> oops.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Paul Menage <menage@google.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: miaox@cn.fujitsu.com
Cc: rostedt@goodmis.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 12 Jul 2008 21:34:31 +0000 (14:34 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix ldt limit for 64 bit

17 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6
Linus Torvalds [Sat, 12 Jul 2008 21:34:11 +0000 (14:34 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
  [SCSI] bsg: fix oops on remove
  [SCSI] fusion: default MSI to disabled for SPI and FC controllers
  [SCSI] ipr: Fix HDIO_GET_IDENTITY oops for SATA devices
  [SCSI] mptspi: fix oops in mptspi_dv_renegotiate_work()
  [SCSI] erase invalid data returned by device

17 years agocifs: fix wksidarr declaration to be big-endian friendly
Jeff Layton [Sat, 12 Jul 2008 20:48:00 +0000 (13:48 -0700)]
cifs: fix wksidarr declaration to be big-endian friendly

The current definition of wksidarr works fine on little endian arches
(since cpu_to_le32 is a no-op there), but on big-endian arches, it fails
to compile with this error:

error: braced-group within expression allowed only inside a function

The problem is that this static declaration has cpu_to_le32 embedded
within it, and that expands into a function macro.  We need to use
__constant_cpu_to_le32() instead.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agocifs: fix inode leak in cifs_get_inode_info_unix
Jeff Layton [Sat, 12 Jul 2008 20:47:59 +0000 (13:47 -0700)]
cifs: fix inode leak in cifs_get_inode_info_unix

Try this:

    mount a share with unix extensions
    create a file on it
    umount the share

You'll get the following message in the ring buffer:

VFS: Busy inodes after unmount of cifs. Self-destruct in 5 seconds.  Have a
nice day...

...the problem is that cifs_get_inode_info_unix is creating and hashing
a new inode even when it's going to return error anyway. The first
lookup when creating a file returns an error so we end up leaking this
inode before we do the actual create. This appears to be a regression
caused by commit 0e4bbde94fdc33f5b3d793166b21bf768ca3e098.

The following patch seems to fix it for me, and fixes a minor
formatting nit as well.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agofrv: fix irqs_disabled() to return an int, not an unsigned long
David Howells [Sat, 12 Jul 2008 20:47:58 +0000 (13:47 -0700)]
frv: fix irqs_disabled() to return an int, not an unsigned long

Fix FRV irqs_disabled() to return an int, not an unsigned long to avoid
this warning:

kernel/sched.c: In function '__might_sleep':
kernel/sched.c:8198: warning: format '%d' expects type 'int', but argument 3 has type 'long unsigned int'

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoOProfile kernel maintainership changes
Robert Richter [Sat, 12 Jul 2008 20:47:57 +0000 (13:47 -0700)]
OProfile kernel maintainership changes

Cc: Philippe Elie <phil.el@wanadoo.fr>
Cc: John Levon <levon@movementarian.org>
Cc: Maynard Johnson <maynardj@us.ibm.com>
Cc: Richard Purdie <rpurdie@openedhand.com>
Cc: Daniel Hansel <daniel.hansel@linux.vnet.ibm.com>
Cc: Jason Yeh <jason.yeh@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agortc-pcf8563: add chip id
Jon Smirl [Sat, 12 Jul 2008 20:47:56 +0000 (13:47 -0700)]
rtc-pcf8563: add chip id

Add the rtc8564 chip entry

Signed-off-by: Jon Smirl <jonsmirl@gmail.com>
Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agortc-fm3130: fix chip naming
Alessandro Zummo [Sat, 12 Jul 2008 20:47:55 +0000 (13:47 -0700)]
rtc-fm3130: fix chip naming

Fix chip naming from fm3031-rtc to fm3031

Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: Sergey Lapin <slapin@ossfans.org>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoov7670: clean up ov7670_read semantics
Andres Salomon [Sat, 12 Jul 2008 20:47:54 +0000 (13:47 -0700)]
ov7670: clean up ov7670_read semantics

Cortland Setlow pointed out a bug in ov7670.c where the result from
ov7670_read() was just being checked for !0, rather than <0.  This made me
realize that ov7670_read's semantics were rather confusing; it both fills
in 'value' with the result, and returns it.  This is goes against general
kernel convention; so rather than fixing callers, let's fix the function.

This makes ov7670_read return <0 in the case of an error, and 0 upon
success. Thus, code like:

res = ov7670_read(...);
if (!res)
goto error;

..will work properly.

Signed-off-by: Cortland Setlow <csetlow@tower-research.com>
Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoserial8250: sanity check nr_uarts on all paths.
Eric W. Biederman [Sat, 12 Jul 2008 20:47:53 +0000 (13:47 -0700)]
serial8250: sanity check nr_uarts on all paths.

I had 8250.nr_uarts=16 in the boot line of a test kernel and I had a weird
mysterious crash in sysfs.  After taking an in-depth look I realized that
CONFIG_SERIAL_8250_NR_UARTS was set to 4 and I was walking off the end of
the serial8250_ports array.

Ouch!!!

Don't let this happen to someone else.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Alan Cox <alan@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agofbdev: bugfix for multiprocess defio
Jaya Kumar [Sat, 12 Jul 2008 20:47:51 +0000 (13:47 -0700)]
fbdev: bugfix for multiprocess defio

This patch is a bugfix for how defio handles multiple processes manipulating
the same framebuffer.

Thanks to Bernard Blackham for identifying this bug.

It occurs when two applications mmap the same framebuffer and concurrently
write to the same page.  Normally, this doesn't occur since only a single
process mmaps the framebuffer.  The symptom of the bug is that the mapping
applications will hang.  The cause is that defio incorrectly tries to add the
same page twice to the pagelist.  The solution I have is to walk the pagelist
and check for a duplicate before adding.  Since I needed to walk the pagelist,
I now also keep the pagelist in sorted order.

Signed-off-by: Jaya Kumar <jayakumar.lkml@gmail.com>
Cc: Bernard Blackham <bernard@largestprime.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agodrivers/isdn/i4l/isdn_common.c fix small resource leak
Darren Jenkins [Sat, 12 Jul 2008 20:47:50 +0000 (13:47 -0700)]
drivers/isdn/i4l/isdn_common.c fix small resource leak

Coverity CID: 1356 RESOURCE_LEAK

I found a very old patch for this that was Acked but did not get applied
https://lists.linux-foundation.org/pipermail/kernel-janitors/2006-September/016362.html

There looks to be a small leak in isdn_writebuf_stub() in isdn_common.c, when
copy_from_user() returns an un-copied data length (length != 0).  The below
patch should be a minimally invasive fix.

Signed-off-by: Darren Jenkins <darrenrjenkins@gmailcom>
Acked-by: Karsten Keil <kkeil@suse.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agodrivers/char/pcmcia/ipwireless/hardware.c fix resource leak
Darren Jenkins [Sat, 12 Jul 2008 20:47:49 +0000 (13:47 -0700)]
drivers/char/pcmcia/ipwireless/hardware.c fix resource leak

Coverity CID: 2172 RESOURCE_LEAK

When pool_allocate() tries to enlarge a packet, if it can not allocate enough
memory, it returns NULL without first freeing the old packet.

This patch just frees the packet first.

Signed-off-by: Darren Jenkins <darrenrjenkins@gmail.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years ago[SCSI] bsg: fix oops on remove
James Bottomley [Mon, 7 Jul 2008 20:50:01 +0000 (15:50 -0500)]
[SCSI] bsg: fix oops on remove

If you do a modremove of any sas driver, you run into an oops on
shutdown when the host is removed (coming from the host bsg device).
The root cause seems to be that there's a use after free of the
bsg_class_device:  In bsg_kref_release_function, this is used (to do a
put_device(bcg->parent) after bcg->release has been called.  In sas (and
possibly many other things) bcd->release frees the queue which contains
the bsg_class_device, so we get a put_device on unreferenced memory.
Fix this by taking a copy of the pointer to the parent before releasing
bsg.

Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
17 years ago[SCSI] fusion: default MSI to disabled for SPI and FC controllers
James Bottomley [Fri, 11 Jul 2008 03:10:55 +0000 (22:10 -0500)]
[SCSI] fusion: default MSI to disabled for SPI and FC controllers

There's a fault on the FC controllers that makes them not respond
correctly to MSI.  The SPI controllers are fine, but are likely to be
onboard on older motherboards which don't handle MSI correctly, so
default both these cases to disabled.  Enable by setting the module
parameter mpt_msi_enable=1.

For the SAS case, enable MSI by default, but it can be disabled by
setting the module parameter mpt_msi_enable=0.

Cc: "Prakash, Sathya" <sathya.prakash@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
17 years agox86: fix ldt limit for 64 bit
Michael Karcher [Fri, 11 Jul 2008 16:04:46 +0000 (18:04 +0200)]
x86: fix ldt limit for 64 bit

Fix size of LDT entries. On x86-64, ldt_desc is a double-sized descriptor.

Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog
Linus Torvalds [Sat, 12 Jul 2008 00:00:17 +0000 (17:00 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog

* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  [PATCH] IPMI: return correct value from ipmi_write

17 years ago[PATCH] IPMI: return correct value from ipmi_write
Mark Rustad [Thu, 10 Jul 2008 19:27:11 +0000 (14:27 -0500)]
[PATCH] IPMI: return correct value from ipmi_write

This patch corrects the handling of write operations to the IPMI watchdog
to work as intended by returning the number of characters actually
processed. Without this patch, an "echo V >/dev/watchdog" enables the
watchdog if IPMI is providing the watchdog function.

Signed-off-by: Mark Rustad <MRustad@gmail.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
17 years ago[SCSI] ipr: Fix HDIO_GET_IDENTITY oops for SATA devices
Brian King [Fri, 11 Jul 2008 18:37:50 +0000 (13:37 -0500)]
[SCSI] ipr: Fix HDIO_GET_IDENTITY oops for SATA devices

Currently, ipr does not support HDIO_GET_IDENTITY to SATA devices.
An oops occurs if userspace attempts to send the command. Since hald
issues the command, ensure we fail the ioctl in ipr. This is a
temporary solution to the oops. Once the ipr libata EH conversion
is upstream, ipr will fully support HDIO_GET_IDENTITY.

Tested-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
17 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzi...
Linus Torvalds [Fri, 11 Jul 2008 18:37:55 +0000 (11:37 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata-acpi: don't call sleeping function from invalid context
  Added Targa Visionary 1000 IDE adapter to pata_sis.c
  libata-acpi: filter out DIPM enable

17 years agoFix reference counting race on log buffers
Dave Chinner [Fri, 11 Jul 2008 07:43:55 +0000 (17:43 +1000)]
Fix reference counting race on log buffers

When we release the iclog, we do an atomic_dec_and_lock to determine if
we are the last reference and need to trigger update of log headers and
writeout.  However, in xlog_state_get_iclog_space() we also need to
check if we have the last reference count there.  If we do, we release
the log buffer, otherwise we decrement the reference count.

But the compare and decrement in xlog_state_get_iclog_space() is not
atomic, so both places can see a reference count of 2 and neither will
release the iclog.  That leads to a filesystem hang.

Close the race by replacing the atomic_read() and atomic_dec() pair with
atomic_add_unless() to ensure that they are executed atomically.

Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Tim Shimmin <tes@sgi.com>
Tested-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agolibata-acpi: don't call sleeping function from invalid context
Zhang Rui [Fri, 11 Jul 2008 13:42:03 +0000 (09:42 -0400)]
libata-acpi: don't call sleeping function from invalid context

The problem is introduced by commit
664d080c41463570b95717b5ad86e79dc1be0877.

acpi_evaluate_integer is a sleeping function,
and it should not be called with spin_lock_irqsave.
https://bugzilla.redhat.com/show_bug.cgi?id=451399

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
17 years agoAdded Targa Visionary 1000 IDE adapter to pata_sis.c
Kai Krakow [Sun, 6 Jul 2008 12:22:26 +0000 (14:22 +0200)]
Added Targa Visionary 1000 IDE adapter to pata_sis.c

This enables short 40-wire detection for my laptop thus
enabling UDMA/100.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
17 years agolibata-acpi: filter out DIPM enable
Tejun Heo [Sun, 6 Jul 2008 14:15:03 +0000 (23:15 +0900)]
libata-acpi: filter out DIPM enable

Some BIOSen enable DIPM via _GTF which causes command timeouts under
certain configuration.  This didn't occur on 2.6.25 because 2.6.25
defaulted to SRST, so _GTF wasn't executed during boot probe, so ahci
host reset disabled DIPM and as _GTF wasn't executed after SRST, DIPM
wasn't enabled.  On 2.6.26, hardreset is used during probe and after
probe _GTF is executed enabling DIPM and thus the failures.

This patch could theoretically disable DIPM on machines which used to
have it enabled on 2.6.25 but AFAIK ahci is currently the only driver
which uses SATA ACPI hierarchy (_SDD) and as the host reset would have
always disabled DIPM, this shouldn't happen.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
17 years agortc: fix reported IRQ rate for when HPET is enabled
Paul Gortmaker [Fri, 11 Jul 2008 00:30:48 +0000 (17:30 -0700)]
rtc: fix reported IRQ rate for when HPET is enabled

The IRQ rate reported back by the RTC is incorrect when HPET is enabled.

Newer hardware that has HPET to emulate the legacy RTC device gets this value
wrong since after it sets the rate, it returns before setting the variable
used to report the IRQ rate back to users of the device -- so the set rate and
the reported rate get out of sync.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Brownell <david-b@pacbell.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoFix name of Russell King in various comments
Uwe Kleine-König [Fri, 11 Jul 2008 00:30:46 +0000 (17:30 -0700)]
Fix name of Russell King in various comments

This patch was created by

git grep -E -l 'Rus(el|s?e)l King' | xargs -r -t perl -p -i -e 's/Rus(el|s?e)l King/Russell King/g'

Signed-off-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com>
Most-Definitely-Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agorapidio: fix device reference counting
Eugene Surovegin [Fri, 11 Jul 2008 00:30:44 +0000 (17:30 -0700)]
rapidio: fix device reference counting

Fix RapidIO device reference counting.

Signed-of-by: Eugene Surovegin <ebs@ebshome.net>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agotpm: add Intel TPM TIS device HID
Marcin Obara [Fri, 11 Jul 2008 00:30:42 +0000 (17:30 -0700)]
tpm: add Intel TPM TIS device HID

This patch adds Intel TPM TIS device HID:  ICO0102

Signed-off-by: Marcin Obara <marcin_obara@users.sourceforge.net>
Acked-by: Marcel Selhorst <tpm@selhorst.net>
Acked-by: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
Linus Torvalds [Fri, 11 Jul 2008 00:58:47 +0000 (17:58 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
  tun: Persistent devices can get stuck in xoff state
  xfrm: Add a XFRM_STATE_AF_UNSPEC flag to xfrm_usersa_info
  ipv6: missed namespace context in ipv6_rthdr_rcv
  netlabel: netlink_unicast calls kfree_skb on error path by itself
  ipv4: fib_trie: Fix lookup error return
  tcp: correct kcalloc usage
  ip: sysctl documentation cleanup
  Documentation: clarify tcp_{r,w}mem sysctl docs
  netfilter: nf_nat_snmp_basic: fix a range check in NAT for SNMP
  netfilter: nf_conntrack_tcp: fix endless loop
  libertas: fix memory alignment problems on the blackfin
  zd1211rw: stop beacons on remove_interface
  rt2x00: Disable synchronization during initialization
  rc80211_pid: Fix fast_start parameter handling
  sctp: Add documentation for sctp sysctl variable
  ipv6: fix race between ipv6_del_addr and DAD timer
  irda: Fix netlink error path return value
  irda: New device ID for nsc-ircc
  irda: via-ircc proper dma freeing
  sctp: Mark the tsn as received after all allocations finish
  ...

17 years agotun: Persistent devices can get stuck in xoff state
Max Krasnyansky [Thu, 10 Jul 2008 23:59:11 +0000 (16:59 -0700)]
tun: Persistent devices can get stuck in xoff state

The scenario goes like this. App stops reading from tun/tap.
TX queue gets full and driver does netif_stop_queue().
App closes fd and TX queue gets flushed as part of the cleanup.
Next time the app opens tun/tap and starts reading from it but
the xoff state is not cleared. We're stuck.
Normally xoff state is cleared when netdev is brought up. But
in the case of persistent devices this happens only during
initial setup.

The fix is trivial. If device is already up when an app opens
it we clear xoff state and that gets things moving again.

Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agoxfrm: Add a XFRM_STATE_AF_UNSPEC flag to xfrm_usersa_info
Steffen Klassert [Thu, 10 Jul 2008 23:55:37 +0000 (16:55 -0700)]
xfrm: Add a XFRM_STATE_AF_UNSPEC flag to xfrm_usersa_info

Add a XFRM_STATE_AF_UNSPEC flag to handle the AF_UNSPEC behavior for
the selector family. Userspace applications can set this flag to leave
the selector family of the xfrm_state unspecified.  This can be used
to to handle inter family tunnels if the selector is not set from
userspace.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agoipv6: missed namespace context in ipv6_rthdr_rcv
Denis V. Lunev [Thu, 10 Jul 2008 23:54:50 +0000 (16:54 -0700)]
ipv6: missed namespace context in ipv6_rthdr_rcv

Signed-off-by: Denis V. Lunev <den@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agonetlabel: netlink_unicast calls kfree_skb on error path by itself
Denis V. Lunev [Thu, 10 Jul 2008 23:53:39 +0000 (16:53 -0700)]
netlabel: netlink_unicast calls kfree_skb on error path by itself

So, no need to kfree_skb here on the error path. In this case we can
simply return.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agoipv4: fib_trie: Fix lookup error return
Ben Hutchings [Thu, 10 Jul 2008 23:52:52 +0000 (16:52 -0700)]
ipv4: fib_trie: Fix lookup error return

In commit a07f5f508a4d9728c8e57d7f66294bf5b254ff7f "[IPV4] fib_trie: style
cleanup", the changes to check_leaf() and fn_trie_lookup() were wrong - where
fn_trie_lookup() would previously return a negative error value from
check_leaf(), it now returns 0.

Now fn_trie_lookup() doesn't appear to care about plen, so we can revert
check_leaf() to returning the error value.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Tested-by: William Boughton <bill@boughton.de>
Acked-by: Stephen Heminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agotcp: correct kcalloc usage
Milton Miller [Thu, 10 Jul 2008 23:51:32 +0000 (16:51 -0700)]
tcp: correct kcalloc usage

kcalloc is supposed to be called with the count as its first argument and
the element size as the second.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agoip: sysctl documentation cleanup
Stephen Hemminger [Thu, 10 Jul 2008 23:50:26 +0000 (16:50 -0700)]
ip: sysctl documentation cleanup

Reduced version of the spelling cleanup patch.

Take out the confusing language in tcp_frto, and organize the
undocumented values.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agoDocumentation: clarify tcp_{r,w}mem sysctl docs
J. Bruce Fields [Thu, 10 Jul 2008 23:47:41 +0000 (16:47 -0700)]
Documentation: clarify tcp_{r,w}mem sysctl docs

Fix some of the defaults and attempt to clarify some language.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
17 years agoslub: Fix use-after-preempt of per-CPU data structure
Dmitry Adamushko [Thu, 10 Jul 2008 20:21:58 +0000 (22:21 +0200)]
slub: Fix use-after-preempt of per-CPU data structure

Vegard Nossum reported a crash in kmem_cache_alloc():

BUG: unable to handle kernel paging request at da87d000
IP: [<c01991c7>] kmem_cache_alloc+0xc7/0xe0
*pde = 28180163 *pte = 1a87d160
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Pid: 3850, comm: grep Not tainted (2.6.26-rc9-00059-gb190333 #5)
EIP: 0060:[<c01991c7>] EFLAGS: 00210203 CPU: 0
EIP is at kmem_cache_alloc+0xc7/0xe0
EAX: 00000000 EBX: da87c100 ECX: 1adad71a EDX: 6b6b6b6b
ESI: 00200282 EDI: da87d000 EBP: f60bfe74 ESP: f60bfe54
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068

and analyzed it:

  "The register %ecx looks innocent but is very important here. The disassembly:

       mov    %edx,%ecx
       shr    $0x2,%ecx
       rep stos %eax,%es:(%edi) <-- the fault

   So %ecx has been loaded from %edx... which is 0x6b6b6b6b/POISON_FREE.
   (0x6b6b6b6b >> 2 == 0x1adadada.)

   %ecx is the counter for the memset, from here:

       memset(object, 0, c->objsize);

  i.e. %ecx was loaded from c->objsize, so "c" must have been freed.
  Where did "c" come from? Uh-oh...

       c = get_cpu_slab(s, smp_processor_id());

  This looks like it has very much to do with CPU hotplug/unplug. Is
  there a race between SLUB/hotplug since the CPU slab is used after it
  has been freed?"

Good analysis.

Yeah, it's possible that a caller of kmem_cache_alloc() -> slab_alloc()
can be migrated on another CPU right after local_irq_restore() and
before memset().  The inital cpu can become offline in the mean time (or
a migration is a consequence of the CPU going offline) so its
'kmem_cache_cpu' structure gets freed ( slab_cpuup_callback).

At some point of time the caller continues on another CPU having an
obsolete pointer...

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoexec: fix stack excutability without PT_GNU_STACK
Hugh Dickins [Thu, 10 Jul 2008 20:19:20 +0000 (21:19 +0100)]
exec: fix stack excutability without PT_GNU_STACK

Kernel Bugzilla #11063 points out that on some architectures (e.g. x86_32)
exec'ing an ELF without a PT_GNU_STACK program header should default to an
executable stack; but this got broken by the unlimited argv feature because
stack vma is now created before the right personality has been established:
so breaking old binaries using nested function trampolines.

Therefore re-evaluate VM_STACK_FLAGS in setup_arg_pages, where stack
vm_flags used to be set, before the mprotect_fixup.  Checking through
our existing VM_flags, none would have changed since insert_vm_struct:
so this seems safer than finding a way through the personality labyrinth.

Reported-by: pageexec@freemail.hu
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfashe...
Linus Torvalds [Thu, 10 Jul 2008 20:11:01 +0000 (13:11 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
  ocfs2: Fix flags in ocfs2_file_lock

17 years agoMerge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 10 Jul 2008 19:34:55 +0000 (12:34 -0700)]
Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: fix cpu hotplug, cleanup
  sched: fix cpu hotplug

17 years agosched: fix cpu hotplug, cleanup
Linus Torvalds [Thu, 10 Jul 2008 18:25:03 +0000 (11:25 -0700)]
sched: fix cpu hotplug, cleanup

Clean up __migrate_task(): to just have separate "done" and "fail"
cases, instead of that "out" case with random error behavior.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agoMerge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 10 Jul 2008 18:19:53 +0000 (11:19 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix /dev/mem compatibility under PAT

17 years agoFix PREEMPT_RCU without HOTPLUG_CPU
Nick Piggin [Thu, 10 Jul 2008 07:25:35 +0000 (17:25 +1000)]
Fix PREEMPT_RCU without HOTPLUG_CPU

PREEMPT_RCU without HOTPLUG_CPU is broken.  The rcu_online_cpu is called
to initially populate rcu_cpu_online_map with all online CPUs when the
hotplug event handler is installed, and also to populate the map with
CPUs as they come online.  The former case is meant to happen with and
without HOTPLUG_CPU, but without HOTPLUG_CPU, the rcu_offline_cpu
function is no-oped -- while it still gets called, it does not set the
rcu CPU map.

With a blank RCU CPU map, grace periods get to tick by completely
oblivious to active RCU read side critical sections.  This results in
free-before-grace bugs.

Fix is obvious once the problem is known. (Also, change __devinit to
__cpuinit so the function gets thrown away on !HOTPLUG_CPU kernels).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Reported-and-tested-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ Nick is my personal hero of the day - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoftrace: Documentation
Steven Rostedt [Thu, 10 Jul 2008 16:46:01 +0000 (12:46 -0400)]
ftrace: Documentation

This is the long awaited ftrace.txt. It explains in quite detail how to
use ftrace and the various tracers.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoarch/x86/kernel/.gitignore: Added vmlinux.lds to .gitignore file because it shouldn...
Daniel Guilak [Thu, 10 Jul 2008 16:39:32 +0000 (09:39 -0700)]
arch/x86/kernel/.gitignore: Added vmlinux.lds to .gitignore file because it shouldn't be tracked.

Signed-off-by: Daniel Guilak <daniel@danielguilak.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agokernel/kprobes.c: Made kprobe_blacklist static.
Daniel Guilak [Thu, 10 Jul 2008 16:38:19 +0000 (09:38 -0700)]
kernel/kprobes.c: Made kprobe_blacklist static.

Signed-off-by: Daniel Guilak <daniel@danielguilak.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Thu, 10 Jul 2008 17:10:02 +0000 (10:10 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: chainiv - Invoke completion function

17 years agoMerge branch 'for-2.6.26' of git://neil.brown.name/md
Linus Torvalds [Thu, 10 Jul 2008 16:49:46 +0000 (09:49 -0700)]
Merge branch 'for-2.6.26' of git://neil.brown.name/md

* 'for-2.6.26' of git://neil.brown.name/md:
  md: ensure all blocks are uptodate or locked when syncing

17 years agoocfs2: Fix flags in ocfs2_file_lock
Mark Fasheh [Thu, 10 Jul 2008 16:25:39 +0000 (09:25 -0700)]
ocfs2: Fix flags in ocfs2_file_lock

The stack-glue merge changed the way we use flags in dlmglue in that we now
use the fs/dlm equivalents. Unfortunately, a merge error left the new flock
code only partially updated. This took a while to show up though, because
the lock level constants are actually identical between o2dlm and fs/dlm.
The *_CONVERT and *_NOQUEUE flags have different values though, which is
eventually causing a crash in flags_to_o2dlm().

Signed-off-by: Mark Fasheh <mfasheh@suse.com>
17 years agocrypto: chainiv - Invoke completion function
Herbert Xu [Thu, 10 Jul 2008 09:42:36 +0000 (17:42 +0800)]
crypto: chainiv - Invoke completion function

When chainiv postpones requests it never calls their completion functions.
This causes symptoms such as memory leaks when IPsec is in use.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
17 years agox86: fix /dev/mem compatibility under PAT
Venkatesh Pallipadi [Thu, 10 Jul 2008 08:09:59 +0000 (10:09 +0200)]
x86: fix /dev/mem compatibility under PAT

Add ioremap_default(), which gives a sane mapping without worrying about
type conflicts.

Use it in /dev/mem read in place of ioremap(), as with ioremap(),
any mapping of the region (other than UC_MINUS) will cause a conflict
and failure of /dev/mem read.

Should address the vbetest failure reported at:

  http://bugzilla.kernel.org/show_bug.cgi?id=11057

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>