Sage Weil [Wed, 29 May 2013 20:15:41 +0000 (13:15 -0700)]
osd: distinguish between definitely healthy and definitely not unhealthy
is_unhealthy() will assume peers are healthy for some period after we
send our first ping attempt. is_healthy() is now a strict check that we
know they are healthy.
Switch the failure report check to use is_unhealthy(); use is_healthy()
everywhere else, including the waiting-for-healthy pre-boot checks.
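A minimal sketch of the two checks, assuming a simplified per-peer record. PeerInfo, first_tx and last_rx are illustrative names rather than the actual OSD fields, and the cutoff is taken to be "now minus the heartbeat grace":

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for per-peer heartbeat state; names are illustrative.
using Clock = std::chrono::steady_clock;
using TimePoint = Clock::time_point;

struct PeerInfo {
  TimePoint first_tx{};  // when the first ping was sent (epoch if never)
  TimePoint last_rx{};   // when the last reply arrived (epoch if never)

  // Strict check: a reply exists and it is newer than the cutoff.
  bool is_healthy(TimePoint cutoff) const {
    return last_rx != TimePoint{} && last_rx > cutoff;
  }

  // Lenient check: the peer only counts as unhealthy once the first ping
  // has itself been outstanding for longer than the grace period.
  bool is_unhealthy(TimePoint cutoff) const {
    if (is_healthy(cutoff)) return false;
    return first_tx != TimePoint{} && first_tx <= cutoff;
  }
};
```

Between the first ping and the end of the grace period a peer is neither healthy nor unhealthy, which is why the failure report and the pre-boot checks can reasonably use different predicates.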
Sage Weil [Mon, 27 May 2013 22:24:56 +0000 (15:24 -0700)]
osd: simplify is_healthy() check during boot
This has a slight behavior change in that we ask the mon for the latest
osdmap if our internal heartbeat is failing. That isn't useful yet, but
will be shortly.
- check against both front and back cons; either one may have failed.
- close *both* front and back before reopening either. this is
overkill, but slightly simpler code.
- fix leak of con when marking down
- handle race against osdmap update and note_down_osd
Fixes: #5172
Signed-off-by: Sage Weil <sage@inktank.com>
Samuel Just [Tue, 28 May 2013 18:10:05 +0000 (11:10 -0700)]
HashIndex: sync top directory during start_split,merge,col_split
Otherwise, the links might be ordered after the in-progress
operation tag write. We need the in-progress operation tag to
correctly recover from an interrupted merge, split, or col_split.
Fixes: #5180
Backport: cuttlefish, bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
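For illustration, syncing a directory on POSIX systems usually means opening the directory itself and calling fsync() on that descriptor. The helper below is a sketch under that assumption; sync_dir is a hypothetical name and not the HashIndex code:

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Flush a directory's own entries (new links, renames) to stable storage.
// Returns 0 on success, -1 on failure with errno set.
inline int sync_dir(const char* path) {
  int fd = ::open(path, O_RDONLY | O_DIRECTORY);
  if (fd < 0) return -1;
  int r = ::fsync(fd);
  int saved = errno;  // close() may overwrite errno
  ::close(fd);
  errno = saved;
  return r;
}
```

Calling such a helper on the top directory before writing the in-progress tag is one way to keep the links from being ordered after the tag write.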
Yehuda Sadeh [Thu, 23 May 2013 04:34:52 +0000 (21:34 -0700)]
rgw: iterate usage entries from correct entry
Fixes: #5152
When iterating through usage entries, and when user id was
provided, we started at the user's first entry and not from
the entry indexed by the request start time.
This commit fixes the issue.
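A small sketch of the idea, assuming a hypothetical index keyed by (user, epoch). The real rgw usage log uses its own key layout, so UsageKey, UsageIndex and usage_epochs are illustrative only:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index: (user, epoch) -> bytes used.
using UsageKey = std::pair<std::string, std::uint64_t>;
using UsageIndex = std::map<UsageKey, std::uint64_t>;

// Collect the epochs of one user's entries in [start, end).
// Iteration begins at the entry indexed by the start time, not at the
// user's first entry.
inline std::vector<std::uint64_t> usage_epochs(const UsageIndex& idx,
                                               const std::string& user,
                                               std::uint64_t start,
                                               std::uint64_t end) {
  std::vector<std::uint64_t> out;
  for (auto it = idx.lower_bound(UsageKey(user, start));
       it != idx.end() && it->first.first == user && it->first.second < end;
       ++it) {
    out.push_back(it->first.second);
  }
  return out;
}
```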
Sage Weil [Wed, 22 May 2013 15:44:52 +0000 (08:44 -0700)]
osd: ping both front and back interfaces
Send ping requests to both the front and back hb addrs for peer osds. If
the front hb addr is not present, do not send the front ping, and interpret
the reply as coming from both. This handles the transition from old to new
OSDs seamlessly.
Note both the front and back rx times. Both need to be up to date in order
for the peer to be healthy.
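The bookkeeping can be sketched roughly as follows. HeartbeatPeer and its fields are hypothetical names chosen for the example, and times are plain integer seconds for brevity:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-peer record; field names are illustrative only.
struct HeartbeatPeer {
  enum class Iface { Front, Back };

  bool has_front = false;          // false for old peers with only a back addr
  std::int64_t last_rx_front = 0;  // seconds; 0 means no reply yet
  std::int64_t last_rx_back = 0;

  // Record a ping reply that arrived on the given interface at time `now`.
  void note_reply(Iface from, std::int64_t now) {
    if (from == Iface::Back) {
      last_rx_back = now;
      // Old peer without a front addr: count the reply for both.
      if (!has_front) last_rx_front = now;
    } else {
      last_rx_front = now;
    }
  }

  // Healthy only if both interfaces have replied after the cutoff.
  bool is_healthy(std::int64_t cutoff) const {
    return last_rx_front > cutoff && last_rx_back > cutoff;
  }
};
```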
Sage Weil [Wed, 22 May 2013 21:29:37 +0000 (14:29 -0700)]
messages/MOSDMarkMeDown: fix uninit field
Fixes valgrind warning:
==14803== Use of uninitialised value of size 8
==14803== at 0x12E7614: sctp_crc32c_sb8_64_bit (sctp_crc32.c:567)
==14803== by 0x12E76F8: update_crc32 (sctp_crc32.c:609)
==14803== by 0x12E7720: ceph_crc32c_le (sctp_crc32.c:733)
==14803== by 0x105085F: ceph::buffer::list::crc32c(unsigned int) (buffer.h:427)
==14803== by 0x115D7B2: Message::calc_front_crc() (Message.h:441)
==14803== by 0x1159BB0: Message::encode(unsigned long, bool) (Message.cc:170)
==14803== by 0x1323934: Pipe::writer() (Pipe.cc:1524)
==14803== by 0x13293D9: Pipe::Writer::entry() (Pipe.h:59)
==14803== by 0x120A398: Thread::_entry_func(void*) (Thread.cc:41)
==14803== by 0x503BE99: start_thread (pthread_create.c:308)
==14803== by 0x6C6E4BC: clone (clone.S:112)
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Samuel Just [Tue, 21 May 2013 22:22:56 +0000 (15:22 -0700)]
OSDMonitor: skip new pools in update_pools_status() and get_pools_health()
New pools won't be full. mon->pgmon()->pg_map.pg_pool_sum[poolid] will
implicitly create an entry for poolid, causing register_new_pgs() to assume
that the newly created pgs in the new pool are in fact the result of a split,
which prevents MOSDPGCreate messages from being sent out.
Fixes: #4813
Backport: cuttlefish
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
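The implicit insertion comes from std::map::operator[], which default-constructs a value when the key is absent. A minimal illustration, where PoolSum and both helpers are hypothetical stand-ins and not the monitor's types:

```cpp
#include <cassert>
#include <map>

struct PoolSum { long num_bytes = 0; };  // hypothetical stand-in for pool stats

// Problematic pattern: operator[] inserts a default entry for a missing key.
inline long bytes_inserting(std::map<int, PoolSum>& sums, int poolid) {
  return sums[poolid].num_bytes;
}

// Read-only pattern: find() leaves the map untouched, so a new pool
// stays absent until real stats arrive.
inline bool bytes_if_known(const std::map<int, PoolSum>& sums, int poolid,
                           long* out) {
  auto it = sums.find(poolid);
  if (it == sums.end()) return false;
  *out = it->second.num_bytes;
  return true;
}
```

Skipping pools that have no entry yet, as this commit does, avoids the side effect entirely.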
mon: Paxos: get rid of the 'prepare_bootstrap()' mechanism
We don't need it after all. If we are in the middle of some proposal,
then that proposal is likely to be retried. If we haven't yet proposed,
then it is even more likely that a client will eventually retry the
message that triggered this proposal.
Basically, this mechanism attempted to fix a non-problem, and was in
fact triggering some unforeseen issues that would have required increasing
the code complexity for no good reason.
Fixes: #5102
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
mon: Paxos: finish queued proposals instead of clearing the list
By finishing these Contexts, we make sure the Contexts they enclose (to be
called once the proposal goes through) will behave as they were initially
planned: for instance, a C_Command() may retry the command if a -EAGAIN
is passed to 'finish_contexts', while a C_Trimmed() will simply set
'going_to_trim' to false.
This aims at fixing at least one bug, in which Paxos will stop trimming if an
election is triggered while a trim is queued but not yet finished. This
happens because it is the C_Trimmed() context that is responsible for
resetting 'going_to_trim' back to false. By clearing all the contexts on
the proposal list instead of finishing them, we stay forever unable to
trim Paxos again, as 'going_to_trim' will stay true till the end of time as
we know it.
Fixes: #4895
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
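The difference between clearing and finishing can be shown with a toy callback queue. Context here is a cut-down stand-in for the real class, and TrimDone and finish_all are hypothetical names playing the roles of C_Trimmed() and 'finish_contexts':

```cpp
#include <cassert>
#include <cerrno>
#include <list>
#include <memory>

// Cut-down stand-in for a completion callback.
struct Context {
  virtual ~Context() = default;
  virtual void finish(int r) = 0;
};

// Plays the role of C_Trimmed(): resets the flag whatever the result is.
struct TrimDone : Context {
  bool& going_to_trim;
  explicit TrimDone(bool& flag) : going_to_trim(flag) {}
  void finish(int) override { going_to_trim = false; }
};

using ContextList = std::list<std::unique_ptr<Context>>;

// Finish every queued context with result r and leave the queue empty.
inline void finish_all(ContextList& queue, int r) {
  ContextList pending;
  pending.swap(queue);  // detach first so a callback may requeue safely
  for (auto& c : pending) c->finish(r);
}
```

Calling queue.clear() instead would destroy the callbacks without running them, so the flag would never be reset.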
Danny Al-Gaaf [Wed, 22 May 2013 15:28:06 +0000 (17:28 +0200)]
mds/Migrator.cc: fix possible dereference NULL return value
CID 716997 (#1 of 1): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "in" when
calling "MDSCacheObject::is_auth() const".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 22 May 2013 15:25:16 +0000 (17:25 +0200)]
mds/Migrator.cc: fix possible dereference NULL return value
CID 716998 (#1 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "in" when
calling "operator <<(std::ostream &, CInode &)".
CID 716998 (#2 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "in" when
calling "MDCache::add_replica_dir(ceph::buffer::list::iterator &,
CInode *, int, std::list<Context *, std::allocator<Context *> > &)".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 22 May 2013 15:21:59 +0000 (17:21 +0200)]
mds/Migrator.cc: fix possible dereference NULL return value
CID 716999 (#1 of 1): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "in" when
calling "CInode::put_stickydirs()".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 22 May 2013 15:17:01 +0000 (17:17 +0200)]
mds/Migrator.cc: fix dereference NULL return value
CID 717000 (#1 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "dir" when
calling "operator <<(std::ostream &, CDir &)".
CID 717000 (#2 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "dir" when
calling "Migrator::import_reverse_unfreeze(CDir *)".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 22 May 2013 15:06:40 +0000 (17:06 +0200)]
mds/Migrator.cc: fix possible NULL pointer dereference
Move dout() calls behind the related asserts to prevent possible NULL
pointer dereference.
CID 717001 (#1 of 1): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "diri" when calling
"operator <<(std::ostream &, CInode
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
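The reordering amounts to checking the pointer before any statement that dereferences it, including log output. A small sketch with a hypothetical Inode type and describe() helper, where a string stream stands in for dout():

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct Inode { int ino; };  // hypothetical stand-in for CInode

inline std::ostream& operator<<(std::ostream& os, const Inode& in) {
  return os << "inode(" << in.ino << ")";
}

// Assert first, then log: the stream insertion dereferences the pointer,
// so it must come after the check.
inline std::string describe(const Inode* diri) {
  assert(diri);
  std::ostringstream log;
  log << "import " << *diri;
  return log.str();
}
```

Note that assert() compiles away under NDEBUG, so this only guards builds that keep assertions enabled.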
Danny Al-Gaaf [Fri, 17 May 2013 12:38:24 +0000 (14:38 +0200)]
mds/Server.cc: fix possible NULL pointer dereference
Add asserts to solve these CID issues:
CID 717002 (#1 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "dir"
when calling "CDir::lookup(std::string const &, snapid_t)".
CID 717002 (#2 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "dir"
when calling "CDir::lookup(std::string const &, snapid_t)".
CID 717003 (#1 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "dn" when
calling "operator <<(std::ostream &, CDentry &)"
CID 717003 (#2 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "straydn"
when calling "CDentry::push_projected_linkage()".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>