Yan, Zheng [Mon, 6 May 2013 01:09:59 +0000 (09:09 +0800)]
mds: defer releasing cap if necessary
When inode is freezing or frozen, we defer processing MClientCaps
messages and cap release embedded in requests. The same deferral
logical should also cover MClientCapRelease messages.
Yan, Zheng [Thu, 16 May 2013 17:44:23 +0000 (01:44 +0800)]
mds: fix Locker::request_inode_file_caps()
After sending cache rejoin message, replica need notify auth MDS when
cap_wanted changes. But it can send MInodeFileCaps message only after
receiving auth MDS' rejoin ack. Locker::request_inode_file_caps() has
correct wait logical, but it skips sending MInodeFileCaps message if
the auth MDS is still in rejoin state.
The fix is defer sending MInodeFileCaps message until the auth MDS
is active. It makes the function's wait logical less tricky.
mds: send slave request after target MDS is active
when failure of peer is detected, MDCache::handle_mds_failure()
checks if there are requests waiting for slave replies from the
failed peer, and adds them to the "wait for active peer" list.
The "retry request" logical only covers slave requests sent before
MDCache::handle_mds_failure() is called. If a slave request was
sent while peer isn't up, we wait for its reply forever.
Yan, Zheng [Tue, 7 May 2013 00:56:11 +0000 (08:56 +0800)]
mds: remove buggy cache rejoin code
I previously added code to handle a corner case of cache rejoin:
entire subtree, together with the inode subtree root belongs to,
were trimmed between sending cache rejoin and receiving rejoin ack.
In this case, we should send cache expire message to the subtree's
auth MDS. But the code is complete broken, remove it temporarily.
Current code uses import state to detect obsolete import discover/prep
message. it does not work for the case: cancel a subtree import, import
the same subtree again, the discover/prep message for the first import
get dispatched.
For unlink/rename request, the target dentry's linkage may change
before all locks are acquired. So we need check if the existing stray
dentry is valid.
Yan, Zheng [Wed, 15 May 2013 08:35:39 +0000 (16:35 +0800)]
mds: fix slave commit tracking
MDS may crash after journalling a slave commit, but before sending
commit ack to the master. Later when the MDS restarts, it will not
send commit ack to the master. So the master waits for the commit
ack forever. The fix is remove failed MDS from requests' uncommitted
slave list. When failed MDS recovers, its resolve message will tell
the master which slave requests are not committed. The master will
re-add the recovering MDS to requests' uncommitted slave list if
necessary.
Yehuda Sadeh [Thu, 23 May 2013 04:34:52 +0000 (21:34 -0700)]
rgw: iterate usage entries from correct entry
Fixes: #5152
When iterating through usage entries, and when user id was
provided, we started at the user's first entry and not from
the entry indexed by the request start time.
This commit fixes the issue.
Sage Weil [Wed, 22 May 2013 21:29:37 +0000 (14:29 -0700)]
messages/MOSDMarkMeDown: fix uninit field
Fixes valgrind warning:
==14803== Use of uninitialised value of size 8
==14803== at 0x12E7614: sctp_crc32c_sb8_64_bit (sctp_crc32.c:567)
==14803== by 0x12E76F8: update_crc32 (sctp_crc32.c:609)
==14803== by 0x12E7720: ceph_crc32c_le (sctp_crc32.c:733)
==14803== by 0x105085F: ceph::buffer::list::crc32c(unsigned int) (buffer.h:427)
==14803== by 0x115D7B2: Message::calc_front_crc() (Message.h:441)
==14803== by 0x1159BB0: Message::encode(unsigned long, bool) (Message.cc:170)
==14803== by 0x1323934: Pipe::writer() (Pipe.cc:1524)
==14803== by 0x13293D9: Pipe::Writer::entry() (Pipe.h:59)
==14803== by 0x120A398: Thread::_entry_func(void*) (Thread.cc:41)
==14803== by 0x503BE99: start_thread (pthread_create.c:308)
==14803== by 0x6C6E4BC: clone (clone.S:112)
Backport: cuttlefish Signed-off-by: Sage Weil <sage@inktank.com>
Samuel Just [Tue, 21 May 2013 22:22:56 +0000 (15:22 -0700)]
OSDMonitor: skip new pools in update_pools_status() and get_pools_health()
New pools won't be full. mon->pgmon()->pg_map.pg_pool_sum[poolid] will
implicitly create an entry for poolid causing register_new_pgs() to assume that
the newly created pgs in the new pool are in fact a result of a split
preventing MOSDPGCreate messages from being sent out.
Fixes: #4813
Backport: cuttlefish Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
mon: Paxos: get rid of the 'prepare_bootstrap()' mechanism
We don't need it after all. If we are in the middle of some proposal,
then we guarantee that said proposal is likely to be retried. If we
haven't yet proposed, then it's forever more likely that a client will
eventually retry the message that triggered this proposal.
Basically, this mechanism attempted at fixing a non-problem, and was in
fact triggering some unforeseen issues that would have required increasing
the code complexity for no good reason.
Fixes: #5102 Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
mon: Paxos: finish queued proposals instead of clearing the list
By finishing these Contexts, we make sure the Contexts they enclose (to be
called once the proposal goes through) will behave as their were initially
planned: for instance, a C_Command() may retry the command if a -EAGAIN
is passed to 'finish_contexts', while a C_Trimmed() will simply set
'going_to_trim' to false.
This aims at fixing at least a bug in which Paxos will stop trimming if an
election is triggered while a trim is queued but not yet finished. Such
happens because it is the C_Trimmed() context that is responsible for
resetting 'going_to_trim' back to false. By clearing all the contexts on
the proposal list instead of finishing them, we stay forever unable to
trim Paxos again as 'going_to_trim' will stay True till the end of time as
we know it.
Fixes: #4895 Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Sage Weil [Fri, 17 May 2013 03:37:05 +0000 (20:37 -0700)]
sysvinit: fix enumeration of local daemons when specifying type only
- prepend $local to the $allconf list at the top
- remove $local special case for all case
- fix the type prefix checks to explicitly check for prefixes
Fugly bash, but works!
Backport: cuttlefish, bobtail Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
Sage Weil [Fri, 17 May 2013 01:40:29 +0000 (18:40 -0700)]
udev: install disk/by-partuuid rules
Wheezy's udev (175-7.2) has broken rules for the /dev/disk/by-partuuid/
symlinks that ceph-disk relies on. Install parallel rules that work. On
new udev, this is harmless; old older udev, this will make life better.
Fixes: #4865
Backport: cuttlefish Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 17 May 2013 00:58:48 +0000 (17:58 -0700)]
mon: clear pg delta after some period
If we have not pg_map updates, the delta doesn't update, and can get stuck
with the velocity right before activity stopped. This is confusing, and
can cause incorrect health warnings about in-progress recovery.
To fix this, zero the delta if there is no activity for
'mon delta reset interval' seconds.
Fixes: #5077 Signed-off-by: Sage Weil <sage@inktank.com>
Samuel Just [Tue, 14 May 2013 23:35:48 +0000 (16:35 -0700)]
FileJournal: adjust write_pos prior to unlocking write_lock
In committed_thru, we use write_pos to reset the header.start value in cases
where seq is past the end of our journalq. It is therefore important that the
journalq be updated atomically with write_pos (that is, under the write_lock).
The call to align_bl() is moved into do_write in order to ensure that write_pos
is adjusted correctly prior to write_bl().
Also, we adjust pos at the end of write_bl() such that pos \in [get_top(),
header.max_size) after write_bl().
Fixes: #5020 Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Loic Dachary [Tue, 14 May 2013 08:52:40 +0000 (10:52 +0200)]
update op added to a waiting queue or discarded
The decision to discard an op happens either in OSD or in PG.
The operation queue goes to a single OpWQ object if waiting_map does not impose a delay op_queue.
The decision to add an op to a waiting queue regardless of its type is updated.
The decision to add a CEPH_MSG_OSD_OP to a waiting queue is described in full.
Danny Al-Gaaf [Tue, 14 May 2013 17:20:29 +0000 (19:20 +0200)]
rgw/rgw_user.cc: fix possible NULL pointer dereference
CID 1019559 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "usr" to function
"RGWUser::get_store()", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 14 May 2013 17:15:23 +0000 (19:15 +0200)]
mds/Server.cc: fix possible NULL pointer dereference
Assert if straydn is NULL.
CID 1019554 (#2 of 2): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "straydn" to function
"MDSCacheObject::is_auth() const", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 14 May 2013 17:07:29 +0000 (19:07 +0200)]
mds/Server.cc: fix possible NULL pointer dereference
Assert of straydn is NULL here.
CID 1019558 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "straydn" to function
"CDentry::get_dir() const", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 14 May 2013 17:02:20 +0000 (19:02 +0200)]
mds/Server.cc: fix possible NULL pointer dereference
Assert if destdn == NULL.
CID 1019557 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "destdn" to function
"CDentry::get_dir() const", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 14 May 2013 15:02:56 +0000 (17:02 +0200)]
src/dupstore.cc: check return value of list_collections()
CID 1019545 (#1 of 1): Unchecked return value (CHECKED_RETURN)
check_return: Calling function "ObjectStore::list_collections
(std::vector<coll_t, std::allocator<coll_t> > &)" without
checking return value (as is done elsewhere 5 out of 6 times).
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Tue, 14 May 2013 14:50:57 +0000 (16:50 +0200)]
mds/Server.cc: fix possible NULL pointer dereference
CID 1019555 (#1 of 1): Dereference after null check (FORWARD_NULL)
var_deref_model: Passing null pointer "in" to function
"Server::_need_force_journal(CInode *, bool)", which dereferences it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Mon, 13 May 2013 14:19:46 +0000 (16:19 +0200)]
src/rbd.cc: use 64-bits to shift 'order'
CID 1019568 (#1 of 1): Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
overflow_before_widen: Potentially overflowing expression "1 << *order" with
type "int" (32 bits, signed) is evaluated using 32-bit arithmetic before being
used in a context which expects an expression of type "uint64_t" (64 bits,
unsigned). To avoid overflow, cast the left operand to "uint64_t" before
performing the left shift.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Mon, 13 May 2013 14:02:04 +0000 (16:02 +0200)]
mon/Monitor.cc: init 'timecheck_acks' with '0' in constructor
CID 1019623 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member "timecheck_acks" is not
initialized in this constructor nor in any functions that it calls.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Mon, 13 May 2013 13:37:24 +0000 (15:37 +0200)]
mon/Monitor.h: init 'crc' in constructor with '0'
CID 1019624 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member "crc" is not initialized
in this constructor nor in any functions that it calls.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
CID 1019626 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member "flags" is not initialized
in this constructor nor in any functions that it calls.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Mon, 13 May 2013 12:36:53 +0000 (14:36 +0200)]
test/test_cors.cc: initialize key_type in constructor
CID 1019635 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
uninit_member: Non-static class member "kt" is not initialized in
this constructor nor in any functions that it calls.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Mon, 13 May 2013 11:52:32 +0000 (13:52 +0200)]
test_filejournal.cc: cleanup memory in destructor
CID 716885 (#1 of 1): Resource leak in object (CTOR_DTOR_LEAK)
alloc_new: Allocating memory by calling "new C_SafeCond(&this->lock,
&this->cond, &this->done, NULL)".
ctor_dtor_leak: The constructor allocates field "c" of "C_Sync" but
the destructor and whatever functions it calls do not free it.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Mon, 13 May 2013 11:40:03 +0000 (13:40 +0200)]
librbd/test_librbd.cc: free memory in test_list_children()
CID 719581 (#7 of 7): Resource leak (RESOURCE_LEAK)
CID 719581 (#6 of 7): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable "pools" going out of scope leaks the
storage it points to.
CID 719582 (#6-7 of 7): Resource leak (RESOURCE_LEAK)
leaked_storage: Variable "children" going out of scope leaks
the storage it points to.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Josh Durgin [Sun, 12 May 2013 21:53:26 +0000 (14:53 -0700)]
librbd: add options to enable balanced or localized reads for snapshots
Since snapshots never change, it's safe to read from replicas for them.
A common use for this would be reading from a parent snapshot shared by
many clones.
Convert LibrbdWriteback and AioRead to use the ObjectOperation api
so we can set flags. Fortunately the external wrapper holds no data,
so its lifecycle doesn't need to be managed.
Include a simple workunit that sets the flags in various combinations
and looks for their presence in the logs from 'rbd export'.