]> git.apps.os.sepia.ceph.com Git - ceph-client.git/log
ceph-client.git
12 years agojbd2: drop checkpoint mutex when waiting in __jbd2_log_wait_for_space()
Paul Gortmaker [Thu, 13 Jun 2013 02:47:35 +0000 (22:47 -0400)]
jbd2: drop checkpoint mutex when waiting in __jbd2_log_wait_for_space()

While trying to debug an an issue under extreme I/O loading
on preempt-rt kernels, the following backtrace was observed
via SysRQ output:

rm              D ffff8802203afbc0  4600  4878   4748 0x00000000
 ffff8802217bfb78 0000000000000082 ffff88021fc2bb80 ffff88021fc2bb80
 ffff88021fc2bb80 ffff8802217bffd8 ffff8802217bffd8 ffff8802217bffd8
 ffff88021f1d4c80 ffff88021fc2bb80 ffff8802217bfb88 ffff88022437b000
Call Trace:
 [<ffffffff8172dc34>] schedule+0x24/0x70
 [<ffffffff81225b5d>] jbd2_log_wait_commit+0xbd/0x140
 [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
 [<ffffffff81223635>] jbd2_log_do_checkpoint+0xf5/0x520
 [<ffffffff81223b09>] __jbd2_log_wait_for_space+0xa9/0x1f0
 [<ffffffff8121dc40>] start_this_handle.isra.10+0x2e0/0x530
 [<ffffffff81060390>] ? __init_waitqueue_head+0x50/0x50
 [<ffffffff8121e0a3>] jbd2__journal_start+0xc3/0x110
 [<ffffffff811de7ce>] ? ext4_rmdir+0x6e/0x230
 [<ffffffff8121e0fe>] jbd2_journal_start+0xe/0x10
 [<ffffffff811f308b>] ext4_journal_start_sb+0x5b/0x160
 [<ffffffff811de7ce>] ext4_rmdir+0x6e/0x230
 [<ffffffff811435c5>] vfs_rmdir+0xd5/0x140
 [<ffffffff8114370f>] do_rmdir+0xdf/0x120
 [<ffffffff8105c6b4>] ? task_work_run+0x44/0x80
 [<ffffffff81002889>] ? do_notify_resume+0x89/0x100
 [<ffffffff817361ae>] ? int_signal+0x12/0x17
 [<ffffffff81145d85>] sys_unlinkat+0x25/0x40
 [<ffffffff81735f22>] system_call_fastpath+0x16/0x1b

What is interesting here, is that we call log_wait_commit, from
within wait_for_space, but we are still holding the checkpoint_mutex
as it surrounds mostly the whole of wait_for_space.  And then, as we
are waiting, journal_commit_transaction can run, and if the JBD2_FLUSHED
bit is set, then we will also try to take the same checkpoint_mutex.

It seems that we need to drop the checkpoint_mutex while sitting in
jbd2_log_wait_commit, if we want to guarantee that progress can be made
by jbd2_journal_commit_transaction().  There does not seem to be
anything preempt-rt specific about this, other then perhaps increasing
the odds of it happening.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agojbd2: relocate assert after state lock in journal_commit_transaction()
Paul Gortmaker [Thu, 13 Jun 2013 02:46:35 +0000 (22:46 -0400)]
jbd2: relocate assert after state lock in journal_commit_transaction()

The state lock is taken after we are doing an assert on the state
value, not before.  So we might in fact be doing an assert on a
transient value.  Ensure the state check is within the scope of
the state lock being taken.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: Fix fsync error handling after filesystem abort
Dmitry Monakhov [Thu, 13 Jun 2013 02:38:04 +0000 (22:38 -0400)]
ext4: Fix fsync error handling after filesystem abort

If filesystem was aborted after inode's write back is complete
but before its metadata was updated we may return success
results in data loss.
In order to handle fs abort correctly we have to check
fs state once we discover that it is in MS_RDONLY state

Test case: http://patchwork.ozlabs.org/patch/244297

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: fix data integrity for ext4_sync_fs
Dmitry Monakhov [Thu, 13 Jun 2013 02:25:07 +0000 (22:25 -0400)]
ext4: fix data integrity for ext4_sync_fs

Inode's data or non journaled quota may be written w/o jounral so we
_must_ send a barrier at the end of ext4_sync_fs. But it can be
skipped if journal commit will do it for us.

Also fix data integrity for nojournal mode.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agojbd2: optimize jbd2_journal_force_commit
Dmitry Monakhov [Thu, 13 Jun 2013 02:24:07 +0000 (22:24 -0400)]
jbd2: optimize jbd2_journal_force_commit

Current implementation of jbd2_journal_force_commit() is suboptimal because
result in empty and useless commits. But callers just want to force and wait
any unfinished commits. We already have jbd2_journal_force_commit_nested()
which does exactly what we want, except we are guaranteed that we do not hold
journal transaction open.

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: don't use EXT4_FREE_BLOCKS_FORGET unnecessarily
Theodore Ts'o [Wed, 12 Jun 2013 15:48:29 +0000 (11:48 -0400)]
ext4: don't use EXT4_FREE_BLOCKS_FORGET unnecessarily

Commit 18888cf0883c: "ext4: speed up truncate/unlink by not using
bforget() unless needed" removed the use of EXT4_FREE_BLOCKS_FORGET in
the most important codepath for file systems using extents, but a
similar optimization also can be done for file systems using indirect
blocks, and for the two special cases in the ext4 extents code.

Cc: Andrey Sidorov <qrxd43@motorola.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: add cond_resched() to ext4_free_blocks() & ext4_mb_regular_allocator()
Theodore Ts'o [Wed, 12 Jun 2013 15:43:02 +0000 (11:43 -0400)]
ext4: add cond_resched() to ext4_free_blocks() & ext4_mb_regular_allocator()

For a file systems with a very large number of block groups, if all of
the block group bitmaps are in memory and the file system is
relatively badly fragmented, it's possible ext4_mb_regular_allocator()
to take a long time trying to find a good match.  This is especially
true if the tuning parameter mb_max_to_scan has been sent to a very
large number.  So add a cond_resched() to avoid soft lockup warnings
and to provide better system responsiveness.

For ext4_free_blocks(), if we are deleting a large range of blocks,
and data=journal is enabled so that EXT4_FREE_BLOCKS_FORGET is passed,
the loop to call sb_find_get_block() and to call ext4_forget() can
take over 10-15 milliseocnds or more.  So it's better to add a
cond_resched() here a well.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: use ext4_da_writepages() for all modes
Theodore Ts'o [Thu, 6 Jun 2013 18:00:46 +0000 (14:00 -0400)]
ext4: use ext4_da_writepages() for all modes

Rename ext4_da_writepages() to ext4_writepages() and use it for all
modes.  We still need to iterate over all the pages in the case of
data=journalling, but in the case of nodelalloc/data=ordered (which is
what file systems mounted using ext3 backwards compatibility will use)
this will allow us to use a much more efficient I/O submission path.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: optimize test_root()
Theodore Ts'o [Thu, 6 Jun 2013 15:40:37 +0000 (11:40 -0400)]
ext4: optimize test_root()

The test_root() function could potentially loop forever due to
overflow issues.  So rewrite test_root() to avoid this issue; as a
bonus, it is 38% faster when benchmarked via a test loop:

int main(int argc, char **argv)
{
int  i;

for (i = 0; i < 1 << 24; i++) {
if (test_root(i, 7))
printf("%d\n", i);
}
}

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: add sanity check to ext4_get_group_info()
Theodore Ts'o [Thu, 6 Jun 2013 15:16:43 +0000 (11:16 -0400)]
ext4: add sanity check to ext4_get_group_info()

The group number passed to ext4_get_group_info() should be valid, but
let's add an assert to check this before we start creating a pointer
based on that group number and dereferencing it.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: verify group number in verify_group_input() before using it
Theodore Ts'o [Thu, 6 Jun 2013 15:14:31 +0000 (11:14 -0400)]
ext4: verify group number in verify_group_input() before using it

Check the group number for sanity earilier, before calling routines
such as ext4_bg_has_super() or ext4_group_overhead_blocks().

Reported-by: Jonathan Salwan <jonathan.salwan@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: add check to io_submit_init_bio
Theodore Ts'o [Thu, 6 Jun 2013 14:18:22 +0000 (10:18 -0400)]
ext4: add check to io_submit_init_bio

The bio_alloc() function can return NULL if the memory allocation
fails.  So we need to check for this.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: remove ext4_ioend_wait()
Jan Kara [Tue, 4 Jun 2013 18:46:12 +0000 (14:46 -0400)]
ext4: remove ext4_ioend_wait()

Now that we clear PageWriteback after extent conversion, there's no
need to wait for io_end processing in ext4_evict_inode().  Running
AIO/DIO keeps file reference until aio_complete() is called so
ext4_evict_inode() cannot be called.  For io_end structures resulting
from buffered IO waiting is happening because we wait for
PageWriteback in truncate_inode_pages().

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: don't wait for extent conversion in ext4_punch_hole()
Jan Kara [Tue, 4 Jun 2013 18:44:36 +0000 (14:44 -0400)]
ext4: don't wait for extent conversion in ext4_punch_hole()

We don't have to wait for extent conversion in ext4_punch_hole() as
buffered IO for the punched range has been flushed and waited upon
(thus all extent conversions for that range have completed).  Also we
wait for all DIO to finish using inode_dio_wait() so there cannot be
any extent conversions pending due to direct IO.

Also remove ext4_flush_unwritten_io() since it's unused now.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: Remove wait for unwritten extents in ext4_ind_direct_IO()
Jan Kara [Tue, 4 Jun 2013 18:41:29 +0000 (14:41 -0400)]
ext4: Remove wait for unwritten extents in ext4_ind_direct_IO()

We don't have to wait for unwritten extent conversion in
ext4_ind_direct_IO() as all writes that happened before DIO are
flushed by the generic code and extent conversion has happened before
we cleared PageWriteback bit.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: remove i_mutex from ext4_file_sync()
Jan Kara [Tue, 4 Jun 2013 18:40:39 +0000 (14:40 -0400)]
ext4: remove i_mutex from ext4_file_sync()

After removal of ext4_flush_unwritten_io() call, ext4_file_sync()
doesn't need i_mutex anymore. Forcing of transaction commits doesn't
need i_mutex as there's nothing inode specific in that code apart from
grabbing transaction ids from the inode. So remove the lock.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: use generic_file_fsync() in ext4_file_fsync() in nojournal mode
Jan Kara [Tue, 4 Jun 2013 18:40:09 +0000 (14:40 -0400)]
ext4: use generic_file_fsync() in ext4_file_fsync() in nojournal mode

Just use the generic function instead of duplicating it.  We only need
to reshuffle the read-only check a bit (which is there to prevent
writing to a filesystem which has been remounted read-only after error
I assume).

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: remove wait for unwritten extent conversion from ext4_truncate()
Jan Kara [Tue, 4 Jun 2013 18:30:00 +0000 (14:30 -0400)]
ext4: remove wait for unwritten extent conversion from ext4_truncate()

Since PageWriteback bit is now cleared after extents are converted
from unwritten to written ones, we have full exclusion of writeback
path from truncate (truncate_inode_pages() waits for PageWriteback
bits to get cleared on all invalidated pages).  Exclusion from DIO
path is achieved by inode_dio_wait() call in ext4_setattr().  So
there's no need to wait for extent convertion in ext4_truncate()
anymore.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: protect extent conversion after DIO with i_dio_count
Jan Kara [Tue, 4 Jun 2013 18:27:38 +0000 (14:27 -0400)]
ext4: protect extent conversion after DIO with i_dio_count

Make sure extent conversion after DIO happens while i_dio_count is
still elevated so that inode_dio_wait() waits until extent conversion
is done.  This removes the need for explicit waiting for extent
conversion in some cases.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: defer clearing of PageWriteback after extent conversion
Jan Kara [Tue, 4 Jun 2013 18:23:41 +0000 (14:23 -0400)]
ext4: defer clearing of PageWriteback after extent conversion

Currently PageWriteback bit gets cleared from put_io_page() called
from ext4_end_bio().  This is somewhat inconvenient as extent tree is
not fully updated at that time (unwritten extents are not marked as
written) so we cannot read the data back yet.  This design was
dictated by lock ordering as we cannot start a transaction while
PageWriteback bit is set (we could easily deadlock with
ext4_da_writepages()).  But now that we use transaction reservation
for extent conversion, locking issues are solved and we can move
PageWriteback bit clearing after extent conversion is done.  As a
result we can remove wait for unwritten extent conversion from
ext4_sync_file() because it already implicitely happens through
wait_on_page_writeback().

We implement deferring of PageWriteback clearing by queueing completed
bios to appropriate io_end and processing all the pages when io_end is
going to be freed instead of at the moment ext4_io_end() is called.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: split extent conversion lists to reserved & unreserved parts
Jan Kara [Tue, 4 Jun 2013 18:21:02 +0000 (14:21 -0400)]
ext4: split extent conversion lists to reserved & unreserved parts

Now that we have extent conversions with reserved transaction, we have
to prevent extent conversions without reserved transaction (from DIO
code) to block these (as that would effectively void any transaction
reservation we did).  So split lists, work items, and work queues to
reserved and unreserved parts.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: use transaction reservation for extent conversion in ext4_end_io
Jan Kara [Tue, 4 Jun 2013 17:21:11 +0000 (13:21 -0400)]
ext4: use transaction reservation for extent conversion in ext4_end_io

Later we would like to clear PageWriteback bit only after extent
conversion from unwritten to written extents is performed.  However it
is not possible to start a transaction after PageWriteback is set
because that violates lock ordering (and is easy to deadlock).  So we
have to reserve a transaction before locking pages and sending them
for IO and later we use the transaction for extent conversion from
ext4_end_io().

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: remove buffer_uninit handling
Jan Kara [Tue, 4 Jun 2013 17:19:34 +0000 (13:19 -0400)]
ext4: remove buffer_uninit handling

There isn't any need for setting BH_Uninit on buffers anymore.  It was
only used to signal we need to mark io_end as needing extent
conversion in add_bh_to_extent() but now we can mark the io_end
directly when mapping extent.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: restructure writeback path
Jan Kara [Tue, 4 Jun 2013 17:17:40 +0000 (13:17 -0400)]
ext4: restructure writeback path

There are two issues with current writeback path in ext4.  For one we
don't necessarily map complete pages when blocksize < pagesize and
thus needn't do any writeback in one iteration.  We always map some
blocks though so we will eventually finish mapping the page.  Just if
writeback races with other operations on the file, forward progress is
not really guaranteed. The second problem is that current code
structure makes it hard to associate all the bios to some range of
pages with one io_end structure so that unwritten extents can be
converted after all the bios are finished.  This will be especially
difficult later when io_end will be associated with reserved
transaction handle.

We restructure the writeback path to a relatively simple loop which
first prepares extent of pages, then maps one or more extents so that
no page is partially mapped, and once page is fully mapped it is
submitted for IO. We keep all the mapping and IO submission
information in mpage_da_data structure to somewhat reduce stack usage.
Resulting code is somewhat shorter than the old one and hopefully also
easier to read.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: better estimate credits needed for ext4_da_writepages()
Jan Kara [Tue, 4 Jun 2013 17:01:11 +0000 (13:01 -0400)]
ext4: better estimate credits needed for ext4_da_writepages()

We limit the number of blocks written in a single loop of
ext4_da_writepages() to 64 when inode uses indirect blocks.  That is
unnecessary as credit estimates for mapping logically continguous run
of blocks is rather low even for inode with indirect blocks.  So just
lift this limitation and properly calculate the number of necessary
credits.

This better credit estimate will also later allow us to always write
at least a single page in one iteration.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: improve writepage credit estimate for files with indirect blocks
Jan Kara [Tue, 4 Jun 2013 16:56:55 +0000 (12:56 -0400)]
ext4: improve writepage credit estimate for files with indirect blocks

ext4_ind_trans_blocks() wrongly used 'chunk' argument to decide whether
blocks mapped are logically contiguous. That is wrong since the argument
informs whether the blocks are physically contiguous. As the blocks
mapped are always logically contiguous and that's all
ext4_ind_trans_blocks() cares about, just remove the 'chunk' argument.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: deprecate max_writeback_mb_bump sysfs attribute
Jan Kara [Tue, 4 Jun 2013 16:51:16 +0000 (12:51 -0400)]
ext4: deprecate max_writeback_mb_bump sysfs attribute

This attribute is now unused so deprecate it.  We still show the old
default value to keep some compatibility but we don't allow writing to
that attribute anymore.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: stop messing with nr_to_write in ext4_da_writepages()
Jan Kara [Tue, 4 Jun 2013 16:50:24 +0000 (12:50 -0400)]
ext4: stop messing with nr_to_write in ext4_da_writepages()

Writeback code got better in how it submits IO and now the number of
pages requested to be written is usually higher than original 1024.
The number is now dynamically computed based on observed throughput
and is set to be about 0.5 s worth of writeback.  E.g. on ordinary
SATA drive this ends up somewhere around 10000 as my testing shows.
So remove the unnecessary smarts from ext4_da_writepages().

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: provide wrappers for transaction reservation calls
Jan Kara [Tue, 4 Jun 2013 16:37:50 +0000 (12:37 -0400)]
ext4: provide wrappers for transaction reservation calls

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agojbd2: transaction reservation support
Jan Kara [Tue, 4 Jun 2013 16:35:11 +0000 (12:35 -0400)]
jbd2: transaction reservation support

In some cases we cannot start a transaction because of locking
constraints and passing started transaction into those places is not
handy either because we could block transaction commit for too long.
Transaction reservation is designed to solve these issues.  It
reserves a handle with given number of credits in the journal and the
handle can be later attached to the running transaction without
blocking on commit or checkpointing.  Reserved handles do not block
transaction commit in any way, they only reduce maximum size of the
running transaction (because we have to always be prepared to
accomodate request for attaching reserved handle).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agojbd2: remove unused waitqueues
Jan Kara [Tue, 4 Jun 2013 16:24:11 +0000 (12:24 -0400)]
jbd2: remove unused waitqueues

j_wait_logspace and j_wait_checkpoint are unused.  Remove them.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agojbd2: fix race in t_outstanding_credits update in jbd2_journal_extend()
Jan Kara [Tue, 4 Jun 2013 16:22:15 +0000 (12:22 -0400)]
jbd2: fix race in t_outstanding_credits update in jbd2_journal_extend()

jbd2_journal_extend() first checked whether transaction can accept
extending handle with more credits and then added credits to
t_outstanding_credits.  This can race with start_this_handle() adding
another handle to a transaction and thus overbooking a transaction.
Make jbd2_journal_extend() use atomic_add_return() to close the race.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agojbd2: cleanup needed free block estimates when starting a transaction
Jan Kara [Tue, 4 Jun 2013 16:12:57 +0000 (12:12 -0400)]
jbd2: cleanup needed free block estimates when starting a transaction

__jbd2_log_space_left() and jbd_space_needed() were kind of odd.
jbd_space_needed() accounted also credits needed for currently
committing transaction while it didn't account for credits needed for
control blocks.  __jbd2_log_space_left() then accounted for control
blocks as a fraction of free space.  Since results of these two
functions are always only compared against each other, this works
correct but is somewhat strange.  Move the estimates so that
jbd_space_needed() returns number of blocks needed for a transaction
including control blocks and __jbd2_log_space_left() returns free
space in the journal (with the committing transaction already
subtracted).  Rename functions to jbd2_log_space_left() and
jbd2_space_needed() while we are changing them.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agojbd2: remove outdated comment
Jan Kara [Tue, 4 Jun 2013 16:10:11 +0000 (12:10 -0400)]
jbd2: remove outdated comment

The comment about credit estimates isn't true anymore. We do what the
comment describes now.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agojbd2: refine waiting for shadow buffers
Jan Kara [Tue, 4 Jun 2013 16:08:56 +0000 (12:08 -0400)]
jbd2: refine waiting for shadow buffers

Currently when we add a buffer to a transaction, we wait until the
buffer is removed from BJ_Shadow list (so that we prevent any changes
to the buffer that is just written to the journal).  This can take
unnecessarily long as a lot happens between the time the buffer is
submitted to the journal and the time when we remove the buffer from
BJ_Shadow list.  (e.g.  We wait for all data buffers in the
transaction, we issue a cache flush, etc.)  Also this creates a
dependency of do_get_write_access() on transaction commit (namely
waiting for data IO to complete) which we want to avoid when
implementing transaction reservation.

So we modify commit code to set new BH_Shadow flag when temporary
shadowing buffer is created and we clear that flag once IO on that
buffer is complete.  This allows do_get_write_access() to wait only
for BH_Shadow bit and thus removes the dependency on data IO
completion.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agojbd2: remove journal_head from descriptor buffers
Jan Kara [Tue, 4 Jun 2013 16:06:01 +0000 (12:06 -0400)]
jbd2: remove journal_head from descriptor buffers

Similarly as for metadata buffers, also log descriptor buffers don't
really need the journal head. So strip it and remove BJ_LogCtl list.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agojbd2: don't create journal_head for temporary journal buffers
Jan Kara [Tue, 4 Jun 2013 16:01:45 +0000 (12:01 -0400)]
jbd2: don't create journal_head for temporary journal buffers

When writing metadata to the journal, we create temporary buffer heads
for that task.  We also attach journal heads to these buffer heads but
the only purpose of the journal heads is to keep buffers linked in
transaction's BJ_IO list.  We remove the need for journal heads by
reusing buffer_head's b_assoc_buffers list for that purpose.  Also
since BJ_IO list is just a temporary list for transaction commit, we
use a private list in jbd2_journal_commit_transaction() for that thus
removing BJ_IO list from transaction completely.

Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: use io_end for multiple bios
Jan Kara [Tue, 4 Jun 2013 15:58:58 +0000 (11:58 -0400)]
ext4: use io_end for multiple bios

Change writeback path to create just one io_end structure for the
extent to which we submit IO and share it among bios writing that
extent. This prevents needless splitting and joining of unwritten
extents when they cannot be submitted as a single bio.

Bugs in ENOMEM handling found by Linux File System Verification project
(linuxtesting.org) and fixed by Alexey Khoroshilov
<khoroshilov@ispras.ru>.

CC: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: fix overflow when counting used blocks on 32-bit architectures
Jan Kara [Fri, 31 May 2013 23:39:56 +0000 (19:39 -0400)]
ext4: fix overflow when counting used blocks on 32-bit architectures

The arithmetics adding delalloc blocks to the number of used blocks in
ext4_getattr() can easily overflow on 32-bit archs as we first multiply
number of blocks by blocksize and then divide back by 512. Make the
arithmetics more clever and also use proper type (unsigned long long
instead of unsigned long).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
12 years agoext4: fix data offset overflow in ext4_xattr_fiemap() on 32-bit archs
Jan Kara [Fri, 31 May 2013 23:38:56 +0000 (19:38 -0400)]
ext4: fix data offset overflow in ext4_xattr_fiemap() on 32-bit archs

On 32-bit architectures with 32-bit sector_t computation of data offset
in ext4_xattr_fiemap() can overflow resulting in reporting bogus data
location. Fix the problem by typing block number to proper type before
shifting.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
12 years agoext4: fix overflows in SEEK_HOLE, SEEK_DATA implementations
Jan Kara [Fri, 31 May 2013 23:37:56 +0000 (19:37 -0400)]
ext4: fix overflows in SEEK_HOLE, SEEK_DATA implementations

ext4_lblk_t is just u32 so multiplying it by blocksize can easily
overflow for files larger than 4 GB. Fix that by properly typing the
block offsets before shifting.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
12 years agoext4: fix data offset overflow on 32-bit archs in ext4_inline_data_fiemap()
Jan Kara [Fri, 31 May 2013 23:33:42 +0000 (19:33 -0400)]
ext4: fix data offset overflow on 32-bit archs in ext4_inline_data_fiemap()

On 32-bit archs when sector_t is defined as 32-bit the logic computing
data offset in ext4_inline_data_fiemap(). Fix that by properly typing
the shifted value.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
12 years agoext4: suppress ext4 orphan messages on mount
Paul Taysom [Tue, 28 May 2013 11:51:21 +0000 (07:51 -0400)]
ext4: suppress ext4 orphan messages on mount

Suppress the messages releating to processing the ext4 orphan list
("truncating inode" and "deleting unreferenced inode") unless the
debug option is on, since otherwise they end up taking up space in the
log that could be used for more useful information.

Tested by opening several files, unlinking them, then
crashing the system, rebooting the system and examining
/var/log/messages.

Addresses the problem described in http://crbug.com/220976

Signed-off-by: Paul Taysom <taysom@chromium.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
12 years agojbd2: fix block tag checksum verification brokenness
Darrick J. Wong [Tue, 28 May 2013 11:31:59 +0000 (07:31 -0400)]
jbd2: fix block tag checksum verification brokenness

Al Viro complained of a ton of bogosity with regards to the jbd2 block
tag header checksum.  This one checksum is 16 bits, so cut off the
upper 16 bits and treat it as a 16-bit value and don't mess around
with be32* conversions.  Fortunately metadata checksumming is still
"experimental" and not in a shipping e2fsprogs, so there should be few
users affected by this.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
12 years agojbd2: use kmem_cache_zalloc for allocating journal head
Zheng Liu [Tue, 28 May 2013 11:27:11 +0000 (07:27 -0400)]
jbd2: use kmem_cache_zalloc for allocating journal head

This commit tries to use kmem_cache_zalloc instead of kmem_cache_alloc/
memset when a new journal head is alloctated.

Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
12 years agoext4: make punch hole code path work with bigalloc
Lukas Czerner [Tue, 28 May 2013 03:33:35 +0000 (23:33 -0400)]
ext4: make punch hole code path work with bigalloc

Currently punch hole is disabled in file systems with bigalloc
feature enabled. However the recent changes in punch hole patch should
make it easier to support punching holes on bigalloc enabled file
systems.

This commit changes partial_cluster handling in ext4_remove_blocks(),
ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
partial_cluster is unsigned long long type and it makes sure that we
will free the partial cluster if all extents has been released from that
cluster. However it has been specifically designed only for truncate.

With punch hole we can be freeing just some extents in the cluster
leaving the rest untouched. So we have to make sure that we will notice
cluster which still has some extents. To do this I've changed
partial_cluster to be signed long long type. The only scenario where
this could be a problem is when cluster_size == block size, however in
that case there would not be any partial clusters so we're safe. For
bigger clusters the signed type is enough. Now we use the negative value
in partial_cluster to mark such cluster used, hence we know that we must
not free it even if all other extents has been freed from such cluster.

This scenario can be described in simple diagram:

|FFF...FF..FF.UUU|
 ^----------^
  punch hole

. - free space
| - cluster boundary
F - freed extent
U - used extent

Also update respective tracepoints to use signed long long type for
partial_cluster.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
12 years agoext4: update ext4_ext_remove_space trace point
Lukas Czerner [Tue, 28 May 2013 03:32:35 +0000 (23:32 -0400)]
ext4: update ext4_ext_remove_space trace point

Add "end" variable.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
12 years agoext4: remove unused code from ext4_remove_blocks()
Lukas Czerner [Tue, 28 May 2013 03:32:35 +0000 (23:32 -0400)]
ext4: remove unused code from ext4_remove_blocks()

The "head removal" branch in the condition is never used in any code
path in ext4 since the function only caller ext4_ext_rm_leaf() will make
sure that the extent is properly split before removing blocks. Note that
there is a bug in this branch anyway.

This commit removes the unused code completely and makes use of
ext4_error() instead of printk if dubious range is provided.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
12 years agoext4: remove unused discard_partial_page_buffers
Lukas Czerner [Tue, 28 May 2013 03:32:35 +0000 (23:32 -0400)]
ext4: remove unused discard_partial_page_buffers

The discard_partial_page_buffers is no longer used anywhere so we can
simply remove it including the *_no_lock variant and
EXT4_DISCARD_PARTIAL_PG_ZERO_UNMAPPED define.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
12 years agoext4: use ext4_zero_partial_blocks in punch_hole
Lukas Czerner [Tue, 28 May 2013 03:32:35 +0000 (23:32 -0400)]
ext4: use ext4_zero_partial_blocks in punch_hole

We're doing to get rid of ext4_discard_partial_page_buffers() since it is
duplicating some code and also partially duplicating work of
truncate_pagecache_range(), moreover the old implementation was much
clearer.

Now when the truncate_inode_pages_range() can handle truncating non page
aligned regions we can use this to invalidate and zero out block aligned
region of the punched out range and then use ext4_block_truncate_page()
to zero the unaligned blocks on the start and end of the range. This
will greatly simplify the punch hole code. Moreover after this commit we
can get rid of the ext4_discard_partial_page_buffers() completely.

We also introduce function ext4_prepare_punch_hole() to do come common
operations before we attempt to do the actual punch hole on
indirect or extent file which saves us some code duplication.

This has been tested on ppc64 with 1k block size with fsx and xfstests
without any problems.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
12 years agoext4: truncate_inode_pages() in orphan cleanup path
Lukas Czerner [Tue, 28 May 2013 03:32:35 +0000 (23:32 -0400)]
ext4: truncate_inode_pages() in orphan cleanup path

Currently we do not tell mm to zero out tail of the page before truncate
in orphan_cleanup(). This is ok, because the page should not be
uptodate, however this may eventually change and I might cause problems.

Call truncate_inode_pages() as precautionary measure. Thanks Jan Kara
for pointing this out.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
12 years agoRevert "ext4: fix fsx truncate failure"
Lukas Czerner [Tue, 28 May 2013 03:32:35 +0000 (23:32 -0400)]
Revert "ext4: fix fsx truncate failure"

This reverts commit 189e868fa8fdca702eb9db9d8afc46b5cb9144c9.

This commit reintroduces the use of ext4_block_truncate_page() in ext4
truncate operation instead of ext4_discard_partial_page_buffers().

The statement in the commit description that the truncate operation only
zero block unaligned portion of the last page is not exactly right,
since truncate_pagecache_range() also zeroes and invalidate the unaligned
portion of the page. Then there is no need to zero and unmap it once more
and ext4_block_truncate_page() was doing the right job, although we
still need to update the buffer head containing the last block, which is
exactly what ext4_block_truncate_page() is doing.

Moreover the problem described in the commit is fixed more properly with
commit

15291164b22a357cb211b618adfef4fa82fc0de3
jbd2: clear BH_Delay & BH_Unwritten in journal_unmap_buffer

This was tested on ppc64 machine with block size of 1024 bytes without
any problems.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
12 years agoext4: Call ext4_jbd2_file_inode() after zeroing block
Lukas Czerner [Tue, 28 May 2013 03:32:35 +0000 (23:32 -0400)]
ext4: Call ext4_jbd2_file_inode() after zeroing block

In data=ordered mode we should call ext4_jbd2_file_inode() so that crash
after the truncate transaction has committed does not expose stall data
in the tail of the block.

Thanks Jan Kara for pointing that out.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
12 years agoRevert "ext4: remove no longer used functions in inode.c"
Lukas Czerner [Tue, 28 May 2013 03:32:35 +0000 (23:32 -0400)]
Revert "ext4: remove no longer used functions in inode.c"

This reverts commit ccb4d7af914e0fe9b2f1022f8ea6c300463fd5e6.

This commit reintroduces functions ext4_block_truncate_page() and
ext4_block_zero_page_range() which has been previously removed in favour
of ext4_discard_partial_page_buffers().

In future commits we want to reintroduce those function and remove
ext4_discard_partial_page_buffers() since it is duplicating some code
and also partially duplicating work of truncate_pagecache_range(),
moreover the old implementation was much clearer.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
12 years agomm: teach truncate_inode_pages_range() to handle non page aligned ranges
Lukas Czerner [Tue, 28 May 2013 03:32:35 +0000 (23:32 -0400)]
mm: teach truncate_inode_pages_range() to handle non page aligned ranges

This commit changes truncate_inode_pages_range() so it can handle non
page aligned regions of the truncate. Currently we can hit BUG_ON when
the end of the range is not page aligned, but we can handle unaligned
start of the range.

Being able to handle non page aligned regions of the page can help file
system punch_hole implementations and save some work, because once we're
holding the page we might as well deal with it right away.

In previous commits we've changed ->invalidatepage() prototype to accept
'length' argument to be able to specify range to invalidate. No we can
use that new ability in truncate_inode_pages_range().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
12 years agoreiserfs: use ->invalidatepage() length argument
Lukas Czerner [Wed, 22 May 2013 03:58:51 +0000 (23:58 -0400)]
reiserfs: use ->invalidatepage() length argument

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in reiserfs_invalidatepage()

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: reiserfs-devel@vger.kernel.org
12 years agogfs2: use ->invalidatepage() length argument
Lukas Czerner [Wed, 22 May 2013 03:58:49 +0000 (23:58 -0400)]
gfs2: use ->invalidatepage() length argument

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in gfs2_invalidatepage().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: cluster-devel@redhat.com
12 years agoceph: use ->invalidatepage() length argument
Lukas Czerner [Wed, 22 May 2013 03:58:48 +0000 (23:58 -0400)]
ceph: use ->invalidatepage() length argument

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in ceph_invalidatepage().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Acked-by: Sage Weil <sage@inktank.com>
Cc: ceph-devel@vger.kernel.org
12 years agoocfs2: use ->invalidatepage() length argument
Lukas Czerner [Wed, 22 May 2013 03:58:46 +0000 (23:58 -0400)]
ocfs2: use ->invalidatepage() length argument

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in ocfs2_invalidatepage().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Joel Becker <jlbec@evilplan.org>
12 years agoxfs: use ->invalidatepage() length argument
Lukas Czerner [Wed, 22 May 2013 03:58:01 +0000 (23:58 -0400)]
xfs: use ->invalidatepage() length argument

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in xfs_vm_invalidatepage()

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
12 years agojbd: change journal_invalidatepage() to accept length
Lukas Czerner [Wed, 22 May 2013 03:26:36 +0000 (23:26 -0400)]
jbd: change journal_invalidatepage() to accept length

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in journal_invalidatepage() and all the users in ext3 file
system. Also update ext3 trace point to print out length argument.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
12 years agoext4: use ->invalidatepage() length argument
Lukas Czerner [Wed, 22 May 2013 03:25:01 +0000 (23:25 -0400)]
ext4: use ->invalidatepage() length argument

->invalidatepage() aop now accepts range to invalidate so we can make
use of it in all ext4 invalidatepage routines.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
12 years agojbd2: change jbd2_journal_invalidatepage to accept length
Lukas Czerner [Wed, 22 May 2013 03:20:03 +0000 (23:20 -0400)]
jbd2: change jbd2_journal_invalidatepage to accept length

invalidatepage now accepts range to invalidate and there are two file
system using jbd2 also implementing punch hole feature which can benefit
from this. We need to implement the same thing for jbd2 layer in order to
allow those file system take benefit of this functionality.

This commit adds length argument to the jbd2_journal_invalidatepage()
and updates all instances in ext4 and ocfs2.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
12 years agomm: change invalidatepage prototype to accept length
Lukas Czerner [Wed, 22 May 2013 03:17:23 +0000 (23:17 -0400)]
mm: change invalidatepage prototype to accept length

Currently there is no way to truncate partial page where the end
truncate point is not at the end of the page. This is because it was not
needed and the functionality was enough for file system truncate
operation to work properly. However more file systems now support punch
hole feature and it can benefit from mm supporting truncating page just
up to the certain point.

Specifically, with this functionality truncate_inode_pages_range() can
be changed so it supports truncating partial page at the end of the
range (currently it will BUG_ON() if 'end' is not at the end of the
page).

This commit changes the invalidatepage() address space operation
prototype to accept range to be invalidated and update all the instances
for it.

We also change the block_invalidatepage() in the same way and actually
make a use of the new length argument implementing range invalidation.

Actual file system implementations will follow except the file systems
where the changes are really simple and should not change the behaviour
in any way .Implementation for truncate_page_range() which will be able
to accept page unaligned ranges will follow as well.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
12 years agoLinux 3.10-rc2 v3.10-rc2
Linus Torvalds [Mon, 20 May 2013 21:37:38 +0000 (14:37 -0700)]
Linux 3.10-rc2

12 years agoMerge tag 'stable/for-linus-3.10-rc1-tag' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Mon, 20 May 2013 21:25:19 +0000 (14:25 -0700)]
Merge tag 'stable/for-linus-3.10-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen fixes from Konrad Rzeszutek Wilk:
 - Regression fix in xen privcmd fixing a memory leak.
 - Add Documentation for tmem driver.
 - Simplify and remove code in the tmem driver.
 - Cleanups.

* tag 'stable/for-linus-3.10-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen: Fixed assignment error in if statement
  xen/xenbus: Fixed over 80 character limit issue
  xen/xenbus: Fixed indentation error in switch case
  xen/tmem: Don't use self[ballooning|shrinking] if frontswap is off.
  xen/tmem: Remove the usage of '[no|]selfballoon' and use 'tmem.selfballooning' bool instead.
  xen/tmem: Remove the usage of 'noselfshrink' and use 'tmem.selfshrink' bool instead.
  xen/tmem: Remove the boot options and fold them in the tmem.X parameters.
  xen/tmem: s/disable_// and change the logic.
  xen/tmem: Fix compile warning.
  xen/tmem: Split out the different module/boot options.
  xen/tmem: Move all of the boot and module parameters to the top of the file.
  xen/tmem: Cleanup. Remove the parts that say temporary.
  xen/privcmd: fix condition in privcmd_close()

12 years agoMerge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck...
Linus Torvalds [Mon, 20 May 2013 18:36:52 +0000 (11:36 -0700)]
Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

 - Error path fixes for abituguru and iio_hwmon drivers.

 - Drop erroneously created attributes from nct6775 driver.

 - Drop redundant safety on cache lifetime for tmp401 driver.

 - Add explicit maintainer for LM95234 and TMP401 drivers.

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  MAINTAINERS: Add myself as maintainer for LM95234 and TMP401 drivers
  hwmon: (tmp401) Drop redundant safety on cache lifetime
  hwmon: fix error return code in abituguru_probe()
  hwmon: (iio_hwmon) Fix null pointer dereference
  hwmon: (nct6775) Do not create non-existing attributes
  hwmon: (iio_hwmon) Fix missing iio_channel_release_all call if devm_kzalloc fail

12 years agox86: Fix bit corruption at CPU resume time
Linus Torvalds [Mon, 20 May 2013 18:36:03 +0000 (11:36 -0700)]
x86: Fix bit corruption at CPU resume time

In commit 78d77df71510 ("x86-64, init: Do not set NX bits on non-NX
capable hardware") we added the early_pmd_flags that gets the NX bit set
when a CPU supports NX. However, the new variable was marked __initdata,
because the main _use_ of this is in an __init routine.

However, the bit setting happens from secondary_startup_64(), which is
called not only at bootup, but on every secondary CPU start.  Including
resuming from STR and at CPU hotplug time.  So the value cannot be
__initdata.

Reported-bisected-and-tested-by: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org # v3.9
Acked-by: Peter Anvin <hpa@linux.intel.com>
Cc: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoxen: Fixed assignment error in if statement
Lisa Nguyen [Thu, 16 May 2013 05:59:40 +0000 (22:59 -0700)]
xen: Fixed assignment error in if statement

Fixed assignment error in if statement in balloon.c

Signed-off-by: Lisa Nguyen <lisa@xenapiadmin.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
12 years agoxen/xenbus: Fixed over 80 character limit issue
Lisa Nguyen [Thu, 16 May 2013 06:48:03 +0000 (23:48 -0700)]
xen/xenbus: Fixed over 80 character limit issue

Fixed the format length of the xenbus_backend_ioctl()
function to meet the 80 character limit in
xenbus_dev_backend.c

Signed-off-by: Lisa Nguyen <lisa@xenapiadmin.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
12 years agoxen/xenbus: Fixed indentation error in switch case
Lisa Nguyen [Thu, 16 May 2013 06:47:11 +0000 (23:47 -0700)]
xen/xenbus: Fixed indentation error in switch case

Fixed the indentation error in the switch case in
xenbus_dev_backend.c

Signed-off-by: Lisa Nguyen <lisa@xenapiadmin.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
12 years agoMerge tag 'pinctrl-fixes-v3.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Mon, 20 May 2013 14:59:46 +0000 (07:59 -0700)]
Merge tag 'pinctrl-fixes-v3.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pinctrl fixes from Linus Walleij:

 - Three fixes to make the boot path for device tree work properly on
   the Nomadik pin controller.

 - Compile warning fix for the vt8500 driver.

 - Fix error path in pinctrl-single.

 - Free mappings in error path of the Lantiq controller.

 - Documentation fixes.

* tag 'pinctrl-fixes-v3.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl/lantiq: Free mapping configs for both pin and groups
  pinctrl: single: fix error return code in pcs_parse_one_pinctrl_entry()
  pinctrl: generic: Fix typos and clarify comments
  pinctrl: vt8500: Fix incorrect data in WM8750 pinctrl table
  pinctrl: abx500: Rejiggle platform data and DT initialisation
  pinctrl: abx500: Specify failed sub-driver by ID instead of driver_data

12 years agoMerge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Linus Torvalds [Mon, 20 May 2013 14:58:51 +0000 (07:58 -0700)]
Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild

Pull kbuild fix from Michal Marek:
 "In an attempt to improve make rpm-pkg, I broke make binrpm-pkg"

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  package: Makefile: unbreak binrpm-pkg target

12 years agoMAINTAINERS: Add myself as maintainer for LM95234 and TMP401 drivers
Guenter Roeck [Mon, 20 May 2013 03:44:27 +0000 (20:44 -0700)]
MAINTAINERS: Add myself as maintainer for LM95234 and TMP401 drivers

I wrote the LM95234 driver and extended the TMP401 driver substantially,
and I have hardware to test both, so it makes sense to explicitly
maintain them.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Jean Delvare <khali@linux-fr.org>
12 years agoMerge tag 'dm-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Linus Torvalds [Sun, 19 May 2013 19:35:30 +0000 (12:35 -0700)]
Merge tag 'dm-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm

Pull device mapper fix from Alasdair Kergon:
 "A patch to fix metadata resizing with device-mapper thin devices."

* tag 'dm-3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm thin: fix metadata dev resize detection

12 years agodm thin: fix metadata dev resize detection
Alasdair G Kergon [Sun, 19 May 2013 17:57:50 +0000 (18:57 +0100)]
dm thin: fix metadata dev resize detection

Fix detection of the need to resize the dm thin metadata device.

The code incorrectly tried to extend the metadata device when it
didn't need to due to a merging error with patch 24347e9 ("dm thin:
detect metadata device resizing").

  device-mapper: transaction manager: couldn't open metadata space map
  device-mapper: thin metadata: tm_open_with_sm failed
  device-mapper: thin: aborting transaction failed
  device-mapper: thin: switching pool to failure mode

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
12 years agohwmon: (tmp401) Drop redundant safety on cache lifetime
Jean Delvare [Sun, 19 May 2013 14:57:30 +0000 (16:57 +0200)]
hwmon: (tmp401) Drop redundant safety on cache lifetime

time_after (as opposed to time_after_equal) already ensures that the
cache lifetime is at least as much as requested. There is no point in
manually adding another jiffy to that value, and this can confuse the
reader into wrong interpretation.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux...
Linus Torvalds [Sat, 18 May 2013 18:35:28 +0000 (11:35 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
 "Miao Xie has been very busy, fixing races and enospc problems and many
  other small but important pieces.

  Alexandre Oliva discovered some problems with how our error handling
  was interacting with the block layer and for now has disabled our
  partial handling of sub-page writes.  The real sub-page work is in a
  series of patches from IBM that we still need to integrate and test.
  The code Alexandre has turned off was really incomplete.

  Josef has more error handling fixes and an important fix for the new
  skinny extent format.

  This also has my fix for the tracepoint crash from late in 3.9.  It's
  the first stage in a larger clean up to get rid of btrfs_bio and make
  a proper bioset for all the items we need to tack into the bio.  For
  now the bioset only holds our mirror_num and stripe_index, but for the
  next merge window I'll shuffle more in."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
  Btrfs: use a btrfs bioset instead of abusing bio internals
  Btrfs: make sure roots are assigned before freeing their nodes
  Btrfs: explicitly use global_block_rsv for quota_tree
  btrfs: do away with non-whole_page extent I/O
  Btrfs: don't invoke btrfs_invalidate_inodes() in the spin lock context
  Btrfs: remove BUG_ON() in btrfs_read_fs_tree_no_radix()
  Btrfs: pause the space balance when remounting to R/O
  Btrfs: fix unprotected root node of the subvolume's inode rb-tree
  Btrfs: fix accessing a freed tree root
  Btrfs: return errno if possible when we fail to allocate memory
  Btrfs: update the global reserve if it is empty
  Btrfs: don't steal the reserved space from the global reserve if their space type is different
  Btrfs: optimize the error handle of use_block_rsv()
  Btrfs: don't use global block reservation for inode cache truncation
  Btrfs: don't abort the current transaction if there is no enough space for inode cache
  Correct allowed raid levels on balance.
  Btrfs: fix possible memory leak in replace_path()
  Btrfs: fix possible memory leak in the find_parent_nodes()
  Btrfs: don't allow device replace on RAID5/RAID6
  Btrfs: handle running extent ops with skinny metadata
  ...

12 years agoMerge branch 'devm_no_resource_check' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 18 May 2013 17:54:54 +0000 (10:54 -0700)]
Merge branch 'devm_no_resource_check' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull devm usage cleanup from Wolfram Sang:
 "Lately, I have been experimenting how to improve the devm interface to
  make writing device drivers easier and less error prone while also
  getting rid of its subtle issues.  I think it has more potential but
  still needs work and definately conistency, especiall in its usage.

  The first thing I come up with is a low hanging fruit regarding
  devm_ioremap_resouce().  This function already checks if the passed
  resource is valid and gives an error message if not.  So, we can
  remove similar checks from the drivers and get rid of a bit of code
  and a number of inconsistent error strings.

  This series only removes the unneeded check iff devm_ioremap_resource
  follows platform_get_resource directly.  The previous version tried to
  shuffle code if needed, too, what lead to an embarrasing bug.  It
  turned out to me that shuffling code for all cases found will make the
  automated script too complex, so I am unsure if an automated cleanup
  is the proper tool for this case.  Removing the easy stuff seems
  worthwhile to me, though.

  Despite various architectures and platform dependencies, I managed to
  compile test 45 out of 57 modified files locally using heuristics and
  defconfigs."

Pulled because: 296 deletions, 0 additions.

* 'devm_no_resource_check' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (33 commits)
  sound/soc/kirkwood: don't check resource with devm_ioremap_resource
  sound/soc/fsl: don't check resource with devm_ioremap_resource
  arch/mips/lantiq/xway: don't check resource with devm_ioremap_resource
  arch/arm/plat-samsung: don't check resource with devm_ioremap_resource
  arch/arm/mach-tegra: don't check resource with devm_ioremap_resource
  drivers/watchdog: don't check resource with devm_ioremap_resource
  drivers/w1/masters: don't check resource with devm_ioremap_resource
  drivers/video/omap2/dss: don't check resource with devm_ioremap_resource
  drivers/video/omap2: don't check resource with devm_ioremap_resource
  drivers/usb/phy: don't check resource with devm_ioremap_resource
  drivers/usb/host: don't check resource with devm_ioremap_resource
  drivers/usb/gadget: don't check resource with devm_ioremap_resource
  drivers/usb/chipidea: don't check resource with devm_ioremap_resource
  drivers/thermal: don't check resource with devm_ioremap_resource
  drivers/staging/nvec: don't check resource with devm_ioremap_resource
  drivers/staging/dwc2: don't check resource with devm_ioremap_resource
  drivers/spi: don't check resource with devm_ioremap_resource
  drivers/rtc: don't check resource with devm_ioremap_resource
  drivers/pwm: don't check resource with devm_ioremap_resource
  drivers/pinctrl: don't check resource with devm_ioremap_resource
  ...

12 years agoMerge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux
Linus Torvalds [Sat, 18 May 2013 17:46:50 +0000 (10:46 -0700)]
Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux

Pull device tree fixes from Grant Likely:
 "Device tree bug fixes and documentation updates for v3.10

  Nothing earth shattering here.  A build failure fix, and fix for
  releasing nodes and some documenation updates."

* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
  Documentation/devicetree: make semantic of initrd-end more explicit
  of/base: release the node correctly in of_parse_phandle_with_args()
  of/documentation: move video device bindings to a common place
  <linux/of_platform.h>: fix compilation warnings with DT disabled

12 years agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Linus Torvalds [Sat, 18 May 2013 17:36:37 +0000 (10:36 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
 "Patching up across the field.  The reversion of the two ASID patches
  is particularly important as it was breaking many platforms."

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: ralink: use the dwc2 driver for the rt305x USB controller
  MIPS: Extract schedule_mfi info from __schedule
  MIPS: Fix sibling call handling in get_frame_info
  MIPS: MSP71xx: remove inline marking of EXPORT_SYMBOL functions
  MIPS: Make virt_to_phys() work for all unmapped addresses.
  MIPS: Fix build error for crash_dump.c in 3.10-rc1
  MIPS: Xway: Fix clk leak
  Revert "MIPS: Allow ASID size to be determined at boot time."
  Revert "MIPS: microMIPS: Support dynamic ASID sizing."

12 years agoMerge tag 'kmemleak-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas...
Linus Torvalds [Sat, 18 May 2013 17:21:32 +0000 (10:21 -0700)]
Merge tag 'kmemleak-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64

Pull kmemleak patches from Catalin Marinas:
 "Kmemleak now scans all the writable and non-executable module sections
  to avoid false positives (previously it was only scanning specific
  sections and missing .ref.data)."

* tag 'kmemleak-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
  kmemleak: No need for scanning specific module sections
  kmemleak: Scan all allocated, writeable and not executable module sections

12 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas...
Linus Torvalds [Sat, 18 May 2013 17:20:46 +0000 (10:20 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64

Pull arm64 fixes from Catalin Marinas:
 "Fixes for duplicate definition of early_console, kernel/time/Kconfig
  include, __flush_dcache_all() set/way computing, debug (locking, bit
  testing).  The of_platform_populate() was moved to an arch_init_call()
  to allow subsys_init_call() drivers to probe the DT."

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
  arm64: debug: fix mdscr.ss check when enabling debug exceptions
  arm64: Do not source kernel/time/Kconfig explicitly
  arm64: mm: Fix operands of clz in __flush_dcache_all
  arm64: Invoke the of_platform_populate() at arch_initcall() level
  arm64: debug: clear mdscr_el1 instead of taking the OS lock
  arm64: Fix duplicate definition of early_console

12 years agosound/soc/kirkwood: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:57 +0000 (15:19 +0200)]
sound/soc/kirkwood: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
12 years agosound/soc/fsl: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:57 +0000 (15:19 +0200)]
sound/soc/fsl: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
12 years agoarch/mips/lantiq/xway: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:56 +0000 (15:19 +0200)]
arch/mips/lantiq/xway: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: John Crispin <blogic@openwrt.org>
12 years agoarch/arm/plat-samsung: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:55 +0000 (15:19 +0200)]
arch/arm/plat-samsung: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
12 years agoarch/arm/mach-tegra: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:55 +0000 (15:19 +0200)]
arch/arm/mach-tegra: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Stephen Warren <swarren@nvidia.com>
12 years agodrivers/watchdog: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:54 +0000 (15:19 +0200)]
drivers/watchdog: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
12 years agodrivers/w1/masters: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:53 +0000 (15:19 +0200)]
drivers/w1/masters: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
12 years agodrivers/video/omap2/dss: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:53 +0000 (15:19 +0200)]
drivers/video/omap2/dss: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
12 years agodrivers/video/omap2: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:53 +0000 (15:19 +0200)]
drivers/video/omap2: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
12 years agodrivers/usb/phy: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:53 +0000 (15:19 +0200)]
drivers/usb/phy: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
12 years agodrivers/usb/host: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:52 +0000 (15:19 +0200)]
drivers/usb/host: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
12 years agodrivers/usb/gadget: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:52 +0000 (15:19 +0200)]
drivers/usb/gadget: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
12 years agodrivers/usb/chipidea: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:52 +0000 (15:19 +0200)]
drivers/usb/chipidea: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
12 years agodrivers/thermal: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:51 +0000 (15:19 +0200)]
drivers/thermal: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
12 years agodrivers/staging/nvec: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:51 +0000 (15:19 +0200)]
drivers/staging/nvec: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
12 years agodrivers/staging/dwc2: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:51 +0000 (15:19 +0200)]
drivers/staging/dwc2: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
12 years agodrivers/spi: don't check resource with devm_ioremap_resource
Wolfram Sang [Sun, 12 May 2013 13:19:51 +0000 (15:19 +0200)]
drivers/spi: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Stephen Warren <swarren@nvidia.com>