mon: Monitor: backup monmap using all ceph features instead of quorum's
When a monitor is freshly created and for some reason its initial sync is
aborted, it will end up with an incorrect backup monmap: one that does
not contain the monitor names the monitor will expect on its next run.
This results from us using the quorum features to encode the monmap
when backing it up, instead of CEPH_FEATURES_ALL.
Fixes: #5203
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
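A minimal sketch of why the feature mask matters when encoding, assuming a toy encoder: the feature bit, the mask constant, and encode_monmap() below are hypothetical stand-ins, not Ceph's actual MonMap code. With a reduced feature set the names are dropped; with all features they are kept.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical feature bit and mask, standing in for Ceph's real feature flags.
constexpr uint64_t FEATURE_MON_NAMES = 1ULL << 0;
constexpr uint64_t FEATURES_ALL = ~0ULL;

// Toy encoder: emits "name=addr;" when the feature set supports names,
// and falls back to the legacy "addr;" form otherwise.
std::string encode_monmap(const std::map<std::string, std::string> &mons,
                          uint64_t features) {
  std::string out;
  for (const auto &m : mons) {
    if (features & FEATURE_MON_NAMES)
      out += m.first + "=";
    out += m.second + ";";
  }
  return out;
}
```

Encoding the backup copy with FEATURES_ALL keeps the names regardless of what the (possibly empty) quorum supports.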
Sage Weil [Wed, 29 May 2013 16:49:11 +0000 (09:49 -0700)]
osd: do not assume head obc object exists when getting snapdir
For a list-snaps operation on the snapdir, do not assume that the obc for the
head means the object exists. This fixes a race between a head deletion and
a list-snaps that wrongly returns ENOENT, triggered by the DiffIterateStress
test when thrashing OSDs.
Fixes: #5183
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
- check against both front and back cons; either one may have failed.
- close *both* front and back before reopening either. this is
overkill, but slightly simpler code.
- fix leak of con when marking down
- handle race against osdmap update and note_down_osd
Fixes: #5172
Signed-off-by: Sage Weil <sage@inktank.com>
Samuel Just [Tue, 28 May 2013 18:10:05 +0000 (11:10 -0700)]
HashIndex: sync top directory during start_split,merge,col_split
Otherwise, the links might be ordered after the in progress
operation tag write. We need the in progress operation tag to
correctly recover from an interrupted merge, split, or col_split.
Fixes: #5180
Backport: cuttlefish, bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
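The ordering requirement can be illustrated with plain POSIX calls: syncing a directory's file descriptor makes its entries (links) durable before anything written afterwards. This is only a sketch; sync_dir() is a hypothetical helper and not the HashIndex implementation.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not Ceph's HashIndex code): flush a directory's
// entries to stable storage. Returns 0 on success or -errno on failure.
int sync_dir(const char *path) {
  assert(path != nullptr);
  int fd = ::open(path, O_RDONLY | O_DIRECTORY);
  if (fd < 0)
    return -errno;
  int r = ::fsync(fd);
  if (r < 0)
    r = -errno;  // capture errno before close() can overwrite it
  ::close(fd);
  return r;
}
```

A caller would invoke sync_dir() on the top directory first, and only then write the in-progress operation tag.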
Yehuda Sadeh [Thu, 23 May 2013 04:34:52 +0000 (21:34 -0700)]
rgw: iterate usage entries from correct entry
Fixes: #5152
When iterating through usage entries, and when user id was
provided, we started at the user's first entry and not from
the entry indexed by the request start time.
This commit fixes the issue.
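As a rough illustration of the corrected behaviour, assuming an index ordered by (user id, timestamp): the iteration seeks to the entry at the request start time rather than to the user's first entry. UsageIndex and read_usage() are hypothetical names, not the rgw code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index: (user id, entry timestamp) -> usage value.
using UsageIndex = std::map<std::pair<std::string, uint64_t>, int>;

// Collect one user's usage values with timestamps in [start, end).
std::vector<int> read_usage(const UsageIndex &idx, const std::string &user,
                            uint64_t start, uint64_t end) {
  assert(start <= end);
  std::vector<int> out;
  // Seek to the first entry at or after `start` for this user,
  // not to the user's very first entry.
  for (auto it = idx.lower_bound(std::make_pair(user, start));
       it != idx.end() && it->first.first == user && it->first.second < end;
       ++it)
    out.push_back(it->second);
  return out;
}
```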
Sage Weil [Wed, 22 May 2013 15:44:52 +0000 (08:44 -0700)]
osd: ping both front and back interfaces
Send ping requests to both the front and back hb addrs for peer osds. If
the front hb addr is not present, do not send it and interpret a reply
as coming from both. This handles the transition from old to new OSDs
seamlessly.
Note both the front and back rx times. Both need to be up to date in order
for the peer to be healthy.
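The bookkeeping described above might look roughly like this. PeerHeartbeat, Channel, note_reply() and is_healthy() are hypothetical names chosen for the sketch, and timestamps are plain integers for simplicity.

```cpp
#include <cassert>

// Hypothetical per-peer heartbeat state.
struct PeerHeartbeat {
  bool has_front = false;   // false for old OSDs with no front hb addr
  long last_rx_front = 0;
  long last_rx_back = 0;
};

enum class Channel { Front, Back };

// Record a ping reply. If the peer has no front address, a reply on the
// back channel is interpreted as coming from both.
void note_reply(PeerHeartbeat &p, Channel ch, long now) {
  if (ch == Channel::Back) {
    p.last_rx_back = now;
    if (!p.has_front)
      p.last_rx_front = now;
  } else {
    p.last_rx_front = now;
  }
}

// The peer is healthy only if both rx times are up to date.
bool is_healthy(const PeerHeartbeat &p, long cutoff) {
  return p.last_rx_front >= cutoff && p.last_rx_back >= cutoff;
}
```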
Sage Weil [Wed, 22 May 2013 21:29:37 +0000 (14:29 -0700)]
messages/MOSDMarkMeDown: fix uninit field
Fixes valgrind warning:
==14803== Use of uninitialised value of size 8
==14803== at 0x12E7614: sctp_crc32c_sb8_64_bit (sctp_crc32.c:567)
==14803== by 0x12E76F8: update_crc32 (sctp_crc32.c:609)
==14803== by 0x12E7720: ceph_crc32c_le (sctp_crc32.c:733)
==14803== by 0x105085F: ceph::buffer::list::crc32c(unsigned int) (buffer.h:427)
==14803== by 0x115D7B2: Message::calc_front_crc() (Message.h:441)
==14803== by 0x1159BB0: Message::encode(unsigned long, bool) (Message.cc:170)
==14803== by 0x1323934: Pipe::writer() (Pipe.cc:1524)
==14803== by 0x13293D9: Pipe::Writer::entry() (Pipe.h:59)
==14803== by 0x120A398: Thread::_entry_func(void*) (Thread.cc:41)
==14803== by 0x503BE99: start_thread (pthread_create.c:308)
==14803== by 0x6C6E4BC: clone (clone.S:112)
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
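The general shape of such a fix, sketched under assumptions: every field that gets encoded (and therefore checksummed) is initialized in the constructor. The struct, its field names, and the toy checksum are hypothetical and do not reproduce the real MOSDMarkMeDown message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical message body; the field names are illustrative only.
struct MarkMeDownBody {
  uint32_t epoch;
  uint8_t want_ack;
  // Initialize every field so that encoding and checksumming never read
  // indeterminate memory.
  MarkMeDownBody() : epoch(0), want_ack(0) {}
};

// Toy checksum over the encoded fields (a stand-in for crc32c).
uint32_t checksum(const MarkMeDownBody &m) {
  return m.epoch * 31u + m.want_ack;
}
```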
Samuel Just [Tue, 21 May 2013 22:22:56 +0000 (15:22 -0700)]
OSDMonitor: skip new pools in update_pools_status() and get_pools_health()
New pools won't be full. mon->pgmon()->pg_map.pg_pool_sum[poolid] will
implicitly create an entry for poolid causing register_new_pgs() to assume that
the newly created pgs in the new pool are in fact a result of a split
preventing MOSDPGCreate messages from being sent out.
Fixes: #4813
Backport: cuttlefish
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
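The underlying pitfall is standard std::map behaviour: operator[] default-constructs and inserts a value for a missing key, whereas find() does not. A small sketch, with PoolSum and lookup_pool() as hypothetical names rather than the monitor's own types:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical per-pool summary.
struct PoolSum {
  uint64_t num_objects = 0;
};

// Read-only lookup: returns nullptr for an unknown pool and, unlike
// operator[], never inserts an entry as a side effect.
const PoolSum *lookup_pool(const std::map<int64_t, PoolSum> &sums,
                           int64_t poolid) {
  auto it = sums.find(poolid);
  return it == sums.end() ? nullptr : &it->second;
}
```

Skipping pools for which the lookup fails avoids creating the entry that later code would mistake for evidence of a split.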
mon: Paxos: get rid of the 'prepare_bootstrap()' mechanism
We don't need it after all. If we are in the middle of some proposal,
then said proposal is likely to be retried. If we
haven't yet proposed, then it's even more likely that a client will
eventually retry the message that triggered this proposal.
Basically, this mechanism attempted to fix a non-problem, and was in
fact triggering some unforeseen issues that would have required increasing
the code complexity for no good reason.
Fixes: #5102
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
mon: Paxos: finish queued proposals instead of clearing the list
By finishing these Contexts, we make sure the Contexts they enclose (to be
called once the proposal goes through) will behave as they were initially
planned: for instance, a C_Command() may retry the command if a -EAGAIN
is passed to 'finish_contexts', while a C_Trimmed() will simply set
'going_to_trim' to false.
This fixes at least one bug, in which Paxos will stop trimming if an
election is triggered while a trim is queued but not yet finished. This
happens because it is the C_Trimmed() context that is responsible for
resetting 'going_to_trim' back to false. By clearing all the contexts on
the proposal list instead of finishing them, we stay forever unable to
trim Paxos again as 'going_to_trim' will stay true till the end of time as
we know it.
Fixes: #4895
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
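A simplified sketch of the difference between finishing and clearing queued callbacks. The Context, TrimDone and finish_contexts() definitions here are stand-ins written for this example, loosely modelled on the names in the message, and are not the Paxos code itself.

```cpp
#include <cassert>
#include <cerrno>
#include <list>
#include <memory>

// Minimal callback interface for the sketch.
struct Context {
  virtual ~Context() = default;
  virtual void finish(int r) = 0;
};

// Stand-in for C_Trimmed: resets the flag whenever it is finished,
// whether the proposal succeeded or was aborted.
struct TrimDone : Context {
  bool &going_to_trim;
  explicit TrimDone(bool &flag) : going_to_trim(flag) {}
  void finish(int) override { going_to_trim = false; }
};

// Finish every queued context with result r, leaving the queue empty.
// Merely calling pending.clear() would drop the callbacks unrun.
void finish_contexts(std::list<std::unique_ptr<Context>> &pending, int r) {
  std::list<std::unique_ptr<Context>> ls;
  ls.swap(pending);
  for (auto &c : ls) {
    assert(c);
    c->finish(r);
  }
}
```

With clear() the flag would remain set after an election; with finish_contexts(pending, -EAGAIN) it is reset and trimming can resume.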
Danny Al-Gaaf [Wed, 22 May 2013 15:28:06 +0000 (17:28 +0200)]
mds/Migrator.cc: fix possible dereference NULL return value
CID 716997 (#1 of 1): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "in" when
calling "MDSCacheObject::is_auth() const".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 22 May 2013 15:25:16 +0000 (17:25 +0200)]
mds/Migrator.cc: fix possible dereference NULL return value
CID 716998 (#1 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "in" when
calling "operator <<(std::ostream &, CInode &)".
CID 716998 (#2 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "in" when
calling "MDCache::add_replica_dir(ceph::buffer::list::iterator &,
CInode *, int, std::list<Context *, std::allocator<Context *> > &)".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 22 May 2013 15:21:59 +0000 (17:21 +0200)]
mds/Migrator.cc: fix possible dereference NULL return value
CID 716999 (#1 of 1): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "in" when
calling "CInode::put_stickydirs()".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 22 May 2013 15:17:01 +0000 (17:17 +0200)]
mds/Migrator.cc: fix dereference NULL return value
CID 717000 (#1 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "dir" when
calling "operator <<(std::ostream &, CDir &)".
CID 717000 (#2 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "dir" when
calling "Migrator::import_reverse_unfreeze(CDir *)".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 22 May 2013 15:06:40 +0000 (17:06 +0200)]
mds/Migrator.cc: fix possible NULL pointer dereference
Move dout() calls behind the related asserts to prevent possible NULL
pointer dereference.
CID 717001 (#1 of 1): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "diri" when calling
"operator <<(std::ostream &, CInode
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
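The reordering amounts to checking the pointer before the log statement dereferences it. A small sketch, assuming a toy Inode type and a string stream in place of dout(); none of these names come from Migrator.cc.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Toy inode type with a stream operator, standing in for CInode.
struct Inode {
  int ino;
};

std::ostream &operator<<(std::ostream &os, const Inode &in) {
  return os << "inode(" << in.ino << ")";
}

// Build the log line only after the pointer has been asserted non-null.
std::string describe_import(const Inode *diri) {
  assert(diri);                   // check first
  std::ostringstream ss;
  ss << "importing " << *diri;    // dereference only after the assert
  return ss.str();
}
```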
Danny Al-Gaaf [Fri, 17 May 2013 12:38:24 +0000 (14:38 +0200)]
mds/Server.cc: fix possible NULL pointer dereference
Add asserts to solve these CID issues:
CID 717002 (#1 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "dir"
when calling "CDir::lookup(std::string const &, snapid_t)".
CID 717002 (#2 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "dir"
when calling "CDir::lookup(std::string const &, snapid_t)".
CID 717003 (#1 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "dn" when
calling "operator <<(std::ostream &, CDentry &)"
CID 717003 (#2 of 2): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "straydn"
when calling "CDentry::push_projected_linkage()".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Fri, 17 May 2013 12:15:23 +0000 (14:15 +0200)]
src/rbd.cc: silence CID COPY_PASTE_ERROR warning
CID 1021212 (#1 of 1): Copy-paste error (COPY_PASTE_ERROR)
copy_paste_error: "r" in "r = -*__errno_location()" looks like
a copy-paste error. Should it say "fd" instead?
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 22 May 2013 13:42:52 +0000 (15:42 +0200)]
mds/MDS.cc: fix dereference null return value
Fix for:
returned_null: Function "SessionMap::get_session(entity_name_t)" returns
null (checked 12 out of 14 times)
CID 739601 (#1 of 1): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null
"this->sessionmap.get_session(entity_name_t::CLIENT(client.v))" when
calling "MDS::send_message_client_counted(Message *, Session *)"
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Danny Al-Gaaf [Wed, 15 May 2013 16:14:06 +0000 (18:14 +0200)]
mds/MDCache.cc: fix possible NULL pointer dereference
Assert if 'cur' is NULL.
CID 966616 (#1 of 1): Dereference null return value (NULL_RETURNS)
dereference: Dereferencing a pointer that might be null "cur" when
calling "CInode::is_dir()".
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>