Laurent Barbe [Fri, 21 Jun 2013 15:17:09 +0000 (17:17 +0200)]
Add rc script for rbd map/unmap
Init script for mapping/unmapping rbd device on startup and shutdown.
On start, map rbd dev according to /etc/rbdmap, and force mount -a
On stop, umount file system depending on rbd and unmap all rbd
Since some distribution use symlink for /etc/mtab, the user-space attribute _netdev is not enough to umount file system before rbd dev.
(also concern: #1790)
Sage Weil [Tue, 25 Jun 2013 00:58:48 +0000 (17:58 -0700)]
mon/AuthMonitor: ensure initial rotating keys get encoded when create_initial called 2x
The create_initial() method may get called multiple times; make sure it
will unconditionally generate new/initial rotating keys. Move the block
up so that we can easily assert as much.
Sage Weil [Mon, 24 Jun 2013 23:37:29 +0000 (16:37 -0700)]
osd: tolerate racing threads starting recovery ops
We sample the (max - active) recovery ops to know how many to start, but
do not hold the lock over the full duration, such that it is possible to
start too many ops. This isn't problematic except that our condition
checks for being == max but not beyond it, and we will continue to start
recovery ops when we shouldn't. Fix this by adjusting the conditional
to be <=.
Reported-by: Stefan Priebe <s.priebe@profihost.ag> Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: David Zafman <david.zafman@inktank.com>
Sage Weil [Mon, 24 Jun 2013 19:52:44 +0000 (12:52 -0700)]
common/pick_addresses: behave even after internal_safe_to_start_threads
ceph-mon recently started using Preforker to working around forking issues.
As a result, internal_safe_to_start_threads got set sooner and calls to
pick_addresses() which try to set string config values now fail because
there are no config observers for them.
Work around this by observing the change while we adjust the value. We
assume pick_addresses() callers are smart enough to realize that their
result will be reflected by cct->_conf and not magically handled elsewhere.
Fixes: #5195, #5205
Backport: cuttlefish Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
Sage Weil [Mon, 24 Jun 2013 01:09:55 +0000 (18:09 -0700)]
msgr: clear_pipe+queue reset when replacing lossy connections
We already handle the lossless replacement and lossy fault paths, but
not the lossy replacement. This fixes an assert(!cleared) in the
reaper. Adjust comments appropriately.
Sage Weil [Mon, 17 Jun 2013 21:14:02 +0000 (14:14 -0700)]
msg/Pipe: goto fail_unlocked on early failures in accept()
Instead of duplicating an incomplete cleanup sequence (that does not
clear_pipe()), goto fail_unlocked and do the cleanup in a generic way.
s/rc/r/ while we are here.
Sage Weil [Mon, 17 Jun 2013 19:47:11 +0000 (12:47 -0700)]
msgr: clear_pipe inside pipe_lock on mark_down_all
Observed a segfault in rebind -> mark_down_all -> clear_pipe -> put that
may have been due to a racing thread clearing the connection_state pointer.
Do the clear_pipe() call under the protection of pipe_lock, as we do in
all other contexts.
Sage Weil [Sun, 23 Jun 2013 15:52:46 +0000 (08:52 -0700)]
mon: do not leak no_reply messages
I think I assumed no_reply() was releasing the references, but it is
not. Which is better, since send_reply() doesn't either. Fix the leaks
by dropping the message ref explicitly.
Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
Sage Weil [Thu, 20 Jun 2013 18:11:50 +0000 (11:11 -0700)]
mon: make 'log ...' command wait for commit before reply
Previously we would just dump the command argument to our local log client
and reply immediately, which could lose the message if we then restarted.
Instead, commit directly and wait before replying.
Also, log as the actual client, not as the monitor processing the message.
Fixes: #5409 Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
Samuel Just [Thu, 20 Jun 2013 01:57:05 +0000 (18:57 -0700)]
FileStore: apply changes after disabling m_filestore_replica_fadvise
Signed-off-by: Samuel Just <sam.just@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit ed8b0e65bde14d0a3a08bc233dee6a997e379dcc)
Sage Weil [Thu, 20 Jun 2013 00:27:49 +0000 (17:27 -0700)]
ceph-disk: use unix lock instead of lockfile class
The lockfile class relies on file system trickery to get safe mutual
exclusion. However, the unix syscalls do this for us. More
importantly, the unix locks go away when the owning process dies, which
is behavior that we want here.
Fixes: #5387
Backport: cuttlefish Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Dan Mick <dan.mick@inktank.com>
ceph-disk: make list_partition behave with unusual device names
When you get device names like sdaa you do not want to mistakenly conclude that
sdaa is a partition of sda. Use /sys/block/$device/$partition existence
instead.
Loic Dachary [Wed, 19 Jun 2013 20:50:30 +0000 (22:50 +0200)]
PGLog::rewind_divergent_log must not call mark_dirty_from on end()
PGLog::rewind_divergent_log is dereferencing iterator "p" though it is
already past the end of its container. When entering the loop for the
first time, p is log.log.end() and must not be dereferenced.
mark_dirty_from must only be called after p--. It
will not rewind past begin() because of the
Loic Dachary [Mon, 17 Jun 2013 09:45:22 +0000 (11:45 +0200)]
unit tests for PGLog::proc_replica_log
The tests covers 100% of the LOC of proc_replica_log. It is broken down
in 7 cases to enumerate all the situations it must address. Each case
is isolated in a independant code block where the conditions are
reproduced.
All tests are done on omissing and oinfo because they are the only
data structures that can be modified by proc_replica_log.
The first case is a noop and checks that only last_complete gets
updated when there are no logs.
The following case includes entries that are supposed to be ignored (
x7, x8 and xa ), however this is not an actual proof that the code
ignoring them is actually run : it only shows in the code coverage
report.
The log entry (1,3) modifies the object x9 but the olog entry
(2,3) deletes it : log is authoritative and the object is added
to missing. x7 is divergent and ignored. x8 has a more recent
version in the log and the olog entry is ignored. xa is past
last_backfill and ignored.
The other cases are a variation of the first case with minimal changes
to make them easier to understand and adapt. For instance most of them
start with a tail that is the same ( object with hash x5 and both at
version 1,1 ).
Loic Dachary [Mon, 10 Jun 2013 16:39:47 +0000 (18:39 +0200)]
add constness to PGLog::proc_replica_log
The function is made const by replacing a single call to log.objects[]
with log.objects.find. The olog argument is also a const and does not
require any change.
Sage Weil [Fri, 7 Jun 2013 18:14:58 +0000 (11:14 -0700)]
mon/Paxos: not readable when LOCKED
If we are re-proposing a previously accepted value from a previous quorum,
we should not consider it readable, because it is possible it was exposed
to clients as committed (2/3 accepted) but not recored to be committed, and
we do not want to expose old state as readable when new state was
previously readable.
Sage Weil [Fri, 31 May 2013 23:39:37 +0000 (16:39 -0700)]
mon/Paxos: go active *after* refreshing
The update_from_paxos() methods occasionally like to trigger new activity.
As long as they check is_readable() and is_writeable(), they will defer
until we go active and that activity will happen in the normal callbacks.
This fixes the problem where we active but is_writeable() is still false,
triggered by PGMonitor::check_osd_map().
Sage Weil [Sun, 2 Jun 2013 23:57:11 +0000 (16:57 -0700)]
mon/Paxos: do paxos refresh in finish_proposal; and refactor
Do the paxos refresh inside finish_proposal, ordered *after* the leader
assertion so that MonmapMonitor::update_from_paxos() calling bootstrap()
does not kill us.
Also, remove unnecessary finish_queued_proposal() and move the logic inline
where the bad leader assertion is obvious.
Sage Weil [Sun, 2 Jun 2013 23:14:01 +0000 (16:14 -0700)]
mon: explicitly refresh_from_paxos() when leveldb state changes
Instead of opportunistically calling each service's update_from_paxos(),
instead explicitly refresh all in-memory state whenever we know the
paxos state may have changed. This is simpler and less fragile.
Sage Weil [Wed, 19 Jun 2013 17:50:49 +0000 (10:50 -0700)]
client: fix warning
client/Client.cc: In member function 'int Client::_read_sync(Fh*, uint64_t, uint64_t, ceph::bufferlist*)':
warning: client/Client.cc:5831:13: comparison between signed and unsigned integer expressions [-Wsign-compare]
Yan, Zheng [Sat, 15 Jun 2013 23:01:52 +0000 (07:01 +0800)]
mds: fix race between scatter gather and dirfrag export
If we gather dirty scatter lock state while corresponding dirfrag
is been exporting, we may receive different dirfrag states from
two MDS and we need to find which one is the newest. The solution
is adding a new variable "migrate seq" to dirfrag, increase it by
one when dirfrag's auth MDS changes. When gathering dirty scatter
lock state, use "migrate seq" to find the newest dirfrag state.