Sage Weil [Wed, 3 Jul 2013 22:36:39 +0000 (15:36 -0700)]
osd/OSDMap: handle case where some new osds have hb_front and others don't
Do not assume that because at least one OSD has an hb_front addr that they
all do, or else we will end up assigning garbage here and later thinking
it is a addr (or, more precisely, != entity_addr_t()).
Fixes: #5460 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: David Zafman <david.zafman@inktank.com>
Sage Weil [Tue, 2 Jul 2013 21:43:17 +0000 (14:43 -0700)]
sysvinit, upstart: handle symlinks to dirs in /var/lib/ceph/*
Match a symlink to a dir, not just dirs. This fixes the osd case of e.g.,
creating an osd in /data/osd$id in which ceph-disk makes a symlink from
/var/lib/ceph/osd/ceph-$id.
Fix proposed by Matt Thompson <matt.thompson@mandiant.com>; extended to
include the upstart users too.
Fixes: #5490 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
Sage Weil [Tue, 2 Jul 2013 20:43:29 +0000 (13:43 -0700)]
osdc/Objecter: resend command map version checks on reconnect
We already do this for Ops and LingerOps, but missed this when we added
CommandOps to the mix. The result is that an ill-timed mon disconnect will
leave a command map check (and thus the command) hanging.
Sage Weil [Tue, 2 Jul 2013 00:33:11 +0000 (17:33 -0700)]
rgw: add RGWFormatter_Plain allocation to sidestep cranky strlen()
Valgrind complains about an invalid read when we don't pad the allocation,
and because it is inlined we can't whitelist it for valgrind. Workaround
the warning by just padding our allocations a bit.
Fixes: #5346
Backport: cuttlefish Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 28 Jun 2013 18:50:11 +0000 (11:50 -0700)]
client: fix remaining Inode::put() caller, and make method psuedo-private
Not sure I can make this actually private and make Client::put_inode() a
friend method (making all of Client a friend would defeat the purpose).
This works well enough, though!
Sage Weil [Fri, 28 Jun 2013 04:39:35 +0000 (21:39 -0700)]
client: use put_inode on MetaRequest inode refs
When we drop the request inode refs, we need to use put_inode() to ensure
they get cleaned up properly (removed from inode_map, caps released, etc.).
Do this explicitly here (as we do with all other inode put() paths that
matter).
Fixes: #5381
Backport: cuttlefish Signed-off-by: Sage Weil <sage@inktank.com>
Dan Mick [Wed, 26 Jun 2013 01:23:22 +0000 (18:23 -0700)]
Makefile.am: fix libglobal.la race with ceph_test_cors
ceph_test_cors had libglobal.la in its _LDFLAGS macro definition;
it should have been in _LDADD. Moreover, things using libglobal.la
ought to be using LIBGLOBAL_LDA to add it to _LDADD. Fix them all.
Signed-off-by: Dan Mick <dan.mick@inktank.com> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Loic Dachary [Tue, 25 Jun 2013 14:10:02 +0000 (16:10 +0200)]
get_xattr() can return more than 4KB
Instead of failing if the attribute to be returned is larger than 4KB,
double the buffer size each time librados.rados_getxattr returns
-errno.ERANGE and try again.
Loic Dachary [Tue, 25 Jun 2013 13:04:34 +0000 (15:04 +0200)]
skip TEST(EXT4StoreTest, _detect_fs) if DISK or MOUNTPOINT are undefined
The TEST(EXT4StoreTest, _detect_fs) test is meant to be run from
qa/workunits/filestore/filestore.sh, after the ext4 file system was
created. If the DISK and MOUNTPOINT environment variables are not
defined, display a message explaining the expected environment and
silentely skip the test. The tests in store_test.cc are not unit tests
because they depend on their environment.
Sage Weil [Tue, 25 Jun 2013 00:58:48 +0000 (17:58 -0700)]
mon/AuthMonitor: ensure initial rotating keys get encoded when create_initial called 2x
The create_initial() method may get called multiple times; make sure it
will unconditionally generate new/initial rotating keys. Move the block
up so that we can easily assert as much.
Sage Weil [Mon, 24 Jun 2013 23:37:29 +0000 (16:37 -0700)]
osd: tolerate racing threads starting recovery ops
We sample the (max - active) recovery ops to know how many to start, but
do not hold the lock over the full duration, such that it is possible to
start too many ops. This isn't problematic except that our condition
checks for being == max but not beyond it, and we will continue to start
recovery ops when we shouldn't. Fix this by adjusting the conditional
to be <=.
Reported-by: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: David Zafman <david.zafman@inktank.com>
Sage Weil [Mon, 24 Jun 2013 19:52:44 +0000 (12:52 -0700)]
common/pick_addresses: behave even after internal_safe_to_start_threads
ceph-mon recently started using Preforker to working around forking issues.
As a result, internal_safe_to_start_threads got set sooner and calls to
pick_addresses() which try to set string config values now fail because
there are no config observers for them.
Work around this by observing the change while we adjust the value. We
assume pick_addresses() callers are smart enough to realize that their
result will be reflected by cct->_conf and not magically handled elsewhere.
Fixes: #5195, #5205
Backport: cuttlefish Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
Sage Weil [Mon, 24 Jun 2013 01:09:55 +0000 (18:09 -0700)]
msgr: clear_pipe+queue reset when replacing lossy connections
We already handle the lossless replacement and lossy fault paths, but
not the lossy replacement. This fixes an assert(!cleared) in the
reaper. Adjust comments appropriately.
Sage Weil [Mon, 17 Jun 2013 21:14:02 +0000 (14:14 -0700)]
msg/Pipe: goto fail_unlocked on early failures in accept()
Instead of duplicating an incomplete cleanup sequence (that does not
clear_pipe()), goto fail_unlocked and do the cleanup in a generic way.
s/rc/r/ while we are here.
Sage Weil [Mon, 17 Jun 2013 19:47:11 +0000 (12:47 -0700)]
msgr: clear_pipe inside pipe_lock on mark_down_all
Observed a segfault in rebind -> mark_down_all -> clear_pipe -> put that
may have been due to a racing thread clearing the connection_state pointer.
Do the clear_pipe() call under the protection of pipe_lock, as we do in
all other contexts.
Sage Weil [Sun, 23 Jun 2013 15:52:46 +0000 (08:52 -0700)]
mon: do not leak no_reply messages
I think I assumed no_reply() was releasing the references, but it is
not. Which is better, since send_reply() doesn't either. Fix the leaks
by dropping the message ref explicitly.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Sage Weil [Thu, 20 Jun 2013 18:11:50 +0000 (11:11 -0700)]
mon: make 'log ...' command wait for commit before reply
Previously we would just dump the command argument to our local log client
and reply immediately, which could lose the message if we then restarted.
Instead, commit directly and wait before replying.
Also, log as the actual client, not as the monitor processing the message.
Fixes: #5409 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
Samuel Just [Thu, 20 Jun 2013 01:57:05 +0000 (18:57 -0700)]
FileStore: apply changes after disabling m_filestore_replica_fadvise
Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit ed8b0e65bde14d0a3a08bc233dee6a997e379dcc)
Sage Weil [Thu, 20 Jun 2013 00:27:49 +0000 (17:27 -0700)]
ceph-disk: use unix lock instead of lockfile class
The lockfile class relies on file system trickery to get safe mutual
exclusion. However, the unix syscalls do this for us. More
importantly, the unix locks go away when the owning process dies, which
is behavior that we want here.
Fixes: #5387
Backport: cuttlefish Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
ceph-disk: make list_partition behave with unusual device names
When you get device names like sdaa you do not want to mistakenly conclude that
sdaa is a partition of sda. Use /sys/block/$device/$partition existence
instead.
Loic Dachary [Wed, 19 Jun 2013 20:50:30 +0000 (22:50 +0200)]
PGLog::rewind_divergent_log must not call mark_dirty_from on end()
PGLog::rewind_divergent_log is dereferencing iterator "p" though it is
already past the end of its container. When entering the loop for the
first time, p is log.log.end() and must not be dereferenced.
mark_dirty_from must only be called after p--. It
will not rewind past begin() because of the