Josh Durgin [Thu, 6 Oct 2011 00:07:07 +0000 (17:07 -0700)]
osd, pg: ignore responses to obsolete queries
This adds a query_epoch to notify and log messages, which are
sent in response to queries from the primary during peering. To
guarantee we don't try to process old logs and notifies after
restarting peering, query_epoch is set to the epoch at which the
query was sent. If query_epoch is less than last_peering_reset,
the primary discards the message.
Processing such a stale message caused a "bad state machine event"
crash in the following scenario:
1. Primary tells a stray to generate a backlog at epoch 199.
2. The up set changes because a stray goes up.
3. Primary restarts peering at epoch 200.
4. Stray gets new map for epoch 200, sees that acting set did not
change, and sends log to primary.
5. Primary crashes.
Related to #1403, #1449
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
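The discard rule described above can be sketched as follows. This is a minimal illustration, not the actual Ceph code: PeeringReply, PrimaryState, and is_obsolete_reply are hypothetical names, and only the query_epoch < last_peering_reset comparison comes from the commit message.

```cpp
#include <cstdint>

using epoch_t = uint32_t;  // simplified stand-in for the OSD map epoch type

// Hypothetical reply to a primary's query (a notify or a log message).
struct PeeringReply {
  epoch_t query_epoch;  // epoch at which the primary sent the query
};

// Hypothetical slice of the primary's PG state.
struct PrimaryState {
  epoch_t last_peering_reset;  // epoch at which peering last restarted
};

// True if the reply answers a query sent before the last peering reset,
// in which case the primary should drop it instead of feeding it to the
// peering state machine.
inline bool is_obsolete_reply(const PrimaryState& pg, const PeeringReply& m) {
  return m.query_epoch < pg.last_peering_reset;
}
```

In the scenario above, the query went out at epoch 199 and peering restarted at epoch 200, so the stray's log would be treated as obsolete and dropped.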
Greg Farnum [Thu, 6 Oct 2011 16:58:48 +0000 (09:58 -0700)]
cephx: don't leak Authorizers on each request
It's not clear to me why this is a class member -- it's only
written to or read from in this function, which allocates a fresh
one each time it's called.
Greg Farnum [Wed, 5 Oct 2011 16:50:51 +0000 (09:50 -0700)]
monclient: add an initialized bool to guard shutdown.
The addition of a Finisher in 9c56070bc20878e87fcb4715b0a3559dd1aaf9ff
broke shutdown in the case where MonClient::init() was never called, so
add a guard variable to keep track.
I'm not sure this is actually the best solution (Timer guards itself,
for instance; maybe Finisher should too?), but I don't want to change
the Finisher interface without looking at it more carefully than I'm
going to right now.
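A guard variable of this kind might look roughly like the sketch below. SketchFinisher and SketchClient are hypothetical stand-ins rather than the real Finisher and MonClient classes, and clearing the flag again in shutdown() is an assumption of this sketch.

```cpp
#include <cassert>

// Hypothetical helper that must not be stopped unless it was started.
class SketchFinisher {
 public:
  void start() { running_ = true; }
  void stop() {
    assert(running_);  // stopping a never-started helper is a bug
    running_ = false;
  }

 private:
  bool running_ = false;
};

// Hypothetical client that owns the helper and guards its shutdown.
class SketchClient {
 public:
  void init() {
    finisher_.start();
    initialized_ = true;
  }

  // Safe to call even if init() never ran: the guard skips the teardown.
  void shutdown() {
    if (!initialized_)
      return;
    finisher_.stop();
    initialized_ = false;
  }

  bool initialized() const { return initialized_; }

 private:
  SketchFinisher finisher_;
  bool initialized_ = false;
};
```

Calling shutdown() on a SketchClient that was never initialized returns immediately instead of tripping the assertion in stop().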
Greg Farnum [Tue, 4 Oct 2011 18:03:19 +0000 (11:03 -0700)]
rgw: remove select_bucket_placement from RGWAccess interface.
RGWRados::create_bucket is the only user now, so make it private
and make the interface a little tighter.
(We are going to need to handle placement at some point in the future,
but the interface needs to be designed a lot more carefully than this
one [wasn't].)
Greg Farnum [Tue, 4 Oct 2011 17:52:22 +0000 (10:52 -0700)]
rgw: remove rgw_create_bucket.
Push all its extra functionality down into RGWRados::create_bucket. Convert
callers to the different interface (there's no reason to pass in the
bucket name apart from the bucket, and all callers know if they're
using a system bucket or not).
Sage Weil [Tue, 4 Oct 2011 20:33:17 +0000 (13:33 -0700)]
osd: get latest osdmaps before booting
- get the latest osdmaps before adding/marking ourselves up
- behave if there is a discontinuity in the osdmap history
This lets us behave sanely if an osd has been down for a very long time,
or if we replace (wipe) an osd, or otherwise take a fresh new osd and add
it to an aged cluster (with lots of old maps or, more likely, an oldest
map that has a large epoch).
Sage Weil [Tue, 4 Oct 2011 20:43:29 +0000 (13:43 -0700)]
mon: limit maps sent on onetime osdmap subscribe
This throttles the load put on the monitor by making the client request
osdmaps in discrete chunks. Use a feature bit to control this, since the
old kernel clients will only send a single onetime request if they think
they are behind.
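The chunked catch-up can be sketched as below. The function names and the max_maps cap are assumptions made for illustration; the commit message does not state the actual chunk size or how the feature bit is checked.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using epoch_t = uint32_t;  // simplified stand-in for the OSD map epoch type

// Inclusive range [first, last] of map epochs for one reply, capped at
// max_maps entries. Requires start <= latest and max_maps > 0.
inline std::pair<epoch_t, epoch_t> next_chunk(epoch_t start, epoch_t latest,
                                              epoch_t max_maps) {
  assert(max_maps > 0 && start <= latest);
  // Written this way to avoid overflowing start + max_maps.
  epoch_t last = (latest - start < max_maps) ? latest : start + max_maps - 1;
  return {start, last};
}

// Sequence of requests a client at epoch `have` would issue to reach
// `latest`, one onetime subscribe per chunk.
inline std::vector<std::pair<epoch_t, epoch_t>> catch_up(epoch_t have,
                                                         epoch_t latest,
                                                         epoch_t max_maps) {
  std::vector<std::pair<epoch_t, epoch_t>> chunks;
  epoch_t start = have + 1;
  while (start <= latest) {
    auto c = next_chunk(start, latest, max_maps);
    chunks.push_back(c);
    if (c.second == latest)
      break;  // caught up; also avoids wrapping when latest is the max epoch
    start = c.second + 1;
  }
  return chunks;
}
```

A client at epoch 10 with the latest map at 35 and a cap of 10 would ask for epochs 11-20, then 21-30, then 31-35, rather than receiving all 25 maps in one reply.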
Greg Farnum [Mon, 3 Oct 2011 22:40:35 +0000 (15:40 -0700)]
rgw: don't specify create_pool and set_marker in create_bucket.
It's wildly inappropriate for that kind of implementation detail to
leak out of the interface. For the moment, leave in a "system_bucket"
parameter, which provides the necessary information. But we should
probably be able to push all of that information down below the
interface layer eventually, and then get rid of that param too.
Sage Weil [Mon, 3 Oct 2011 22:23:15 +0000 (15:23 -0700)]
radosgw: make stop succeed when not running
This fixes apt-get errors like
No /usr/bin/radosgw found running; none killed.
invoke-rc.d: initscript radosgw, action "stop" failed.
dpkg: warning: subprocess old pre-removal script returned error exit status 1
Greg Farnum [Mon, 3 Oct 2011 22:14:02 +0000 (15:14 -0700)]
rgw: move rgw_bucket_select_host_pool behind RGWAccess as select_bucket_placement
This doesn't really belong in front of the interface. Maybe later we
can hide it completely, but for now we can put it behind the RGWAccess
interface in a way that at least pretends to be backing-store-agnostic.
Greg Farnum [Mon, 3 Oct 2011 19:01:56 +0000 (12:01 -0700)]
rgw: remove preallocation of pools
Rename rgw_bucket_allocate_pool to rgw_bucket_select_host_pool, since
that better describes functionality we might actually use someday (and
what it's actually doing now).
Brandon Seibel [Mon, 3 Oct 2011 18:45:17 +0000 (11:45 -0700)]
mds: fix possible deadlock in multi-mds setup
This should fix the file_excl case on a file_max update when there
is more than one mds.
If we don't issue caps here, it's possible when we eval_gather on
the returning lockack that we don't push the lock forward which
will leave the inode auth pinned. If an export occurs that tries
to freeze a tree this inode is in, we'll have a freezing tree
waiting for an auth_pin that can no longer be removed.
Josh Durgin [Mon, 3 Oct 2011 17:27:19 +0000 (10:27 -0700)]
ReplicatedPG: reset return code after find_object_context
This way the object is actually deleted when it has no snapshots,
since the transaction is not aborted. This makes
test_rbd:TestImage.test_rollback_to_snap_sparse pass.
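The pattern of resetting a return code once the error has been handled can be illustrated as follows. This is a simplified sketch, not the ReplicatedPG code: SketchTxn, find_context, and rollback are hypothetical, and the assumption is that a nonzero return value makes the caller abort the transaction.

```cpp
#include <cerrno>

// Hypothetical transaction that records whether a delete was queued.
struct SketchTxn {
  bool delete_queued = false;
};

// Hypothetical lookup: returns -ENOENT when no object context exists for
// the requested snapshot, 0 otherwise.
inline int find_context(bool snap_exists) {
  return snap_exists ? 0 : -ENOENT;
}

// Returns 0 if the transaction should be applied; a nonzero value is
// assumed to make the caller abort it.
inline int rollback(bool snap_exists, SketchTxn& t) {
  int r = find_context(snap_exists);
  if (r == -ENOENT) {
    // Nothing to roll back to: queue a delete of the object instead.
    t.delete_queued = true;
    // The lookup failure has been handled, so clear the error. Leaving
    // r at -ENOENT would abort the transaction and drop the delete.
    r = 0;
  }
  return r;
}
```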
Sage Weil [Fri, 30 Sep 2011 22:44:06 +0000 (15:44 -0700)]
mds: make journal writeable in MDLog::append()
When restarting a stopped MDS, we need to mark the Journaler read/write
before we use it. Do this in MDLog::append(), when we position the write
pointer at the end of the journal.
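The ordering described here (mark the journal writeable, then position the write pointer) might be sketched like this. SketchJournal and open_for_append are hypothetical and much simpler than the real Journaler and MDLog classes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical journal that starts read-only, as after replay.
class SketchJournal {
 public:
  explicit SketchJournal(uint64_t end) : end_(end) {}

  void set_writeable() { readonly_ = false; }

  void set_write_pos(uint64_t pos) {
    assert(!readonly_);  // writes require a writeable journal
    write_pos_ = pos;
  }

  bool readonly() const { return readonly_; }
  uint64_t end() const { return end_; }
  uint64_t write_pos() const { return write_pos_; }

 private:
  bool readonly_ = true;
  uint64_t end_;
  uint64_t write_pos_ = 0;
};

// Stand-in for the append step: flip to read/write first, then move the
// write pointer to the end of the journal.
inline void open_for_append(SketchJournal& j) {
  j.set_writeable();
  j.set_write_pos(j.end());
}
```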
Sage Weil [Fri, 30 Sep 2011 22:41:50 +0000 (15:41 -0700)]
mdcache: tolerate no subtrees in create_subtree_map()
We don't really need mydir here. It is normally opened up in the
subsequent call to open_root(). That will be identical on journal replay,
so no need to have it beforehand (and in the ESubtreeMap entry).