Noah Watkins [Wed, 2 Nov 2011 04:52:48 +0000 (21:52 -0700)]
hadoop: simplify workingDir handling; add home directory
1. Simplifies the handling of paths by allowing them to be passed
around and manipulated in their fully qualified form. Before
paths are passed into native Ceph calls, the path-only portion
is extracted.
2. Sets the initial working directory to be the default home
directory for a user (e.g. /user/<username>/).
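The two points above can be sketched as follows. This is a minimal, language-neutral illustration written in C++; the actual change lives in the Java Hadoop bindings, and the helper names path_only and home_directory, as well as the example URI, are hypothetical and not taken from the commit.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: strips "scheme://authority" from a fully qualified
// path and returns only the path component. Input without a scheme is
// assumed to be a bare path already and is returned unchanged.
std::string path_only(const std::string& qualified) {
  const std::string sep = "://";
  const std::string::size_type scheme_end = qualified.find(sep);
  if (scheme_end == std::string::npos) {
    return qualified;
  }
  // The path starts at the first '/' after the authority part.
  const std::string::size_type path_start =
      qualified.find('/', scheme_end + sep.size());
  if (path_start == std::string::npos) {
    return "/";  // no path given: treat as the root directory
  }
  return qualified.substr(path_start);
}

// Hypothetical helper: default home directory, following the
// /user/<username> convention mentioned in the commit message.
std::string home_directory(const std::string& username) {
  assert(!username.empty());
  return "/user/" + username;
}
```

With this shape, callers keep the qualified form everywhere and call path_only() only at the boundary to the native layer.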
Noah Watkins [Wed, 2 Nov 2011 00:25:49 +0000 (17:25 -0700)]
hadoop: emulate Ceph file owner as current user
Make CephFileSystem tell Hadoop that the owner
of all files is the current user. This provides
zero security or isolation, but allows Hadoop
to be used with its default security settings.
A future solution will need to be developed that
provides some isolation, and gives a better user
experience.
Noah Watkins [Tue, 1 Nov 2011 23:35:12 +0000 (16:35 -0700)]
hadoop: use standard log4j logging facility
Replace ceph.debug(msg, level) with LOG.level(msg)
provided by the log4j facility used by Hadoop. The
level can now be provided on a class-by-class basis
by modifying conf/log4j.properties.
Josh Durgin [Tue, 1 Nov 2011 17:40:41 +0000 (10:40 -0700)]
monclient: fail fast when our auth protocols aren't supported
This handles the case where the server does not support any of the
authentication protocols that the client does. Previously this error
would never be propagated, and you'd only know something went wrong
when the optional timeout expired. Now, monclient->authenticate()
fails as soon as it gets the first response from the monitor.
Samuel Just [Tue, 1 Nov 2011 18:16:53 +0000 (11:16 -0700)]
PG: set_last_peering_reset in Reset constructor
If an osd in the prior set comes up, we can restart peering without a
new peering interval starting. However, we still want to ignore
anything we previously requested from replicas.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Fri, 28 Oct 2011 21:18:12 +0000 (14:18 -0700)]
PG: Create new snap directories independently on replica
Previously, we shipped over the collection creation as part
of the transaction. However, the snap directory on the
replica might or might not exist already due to recovery
progress.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Josh Durgin [Fri, 28 Oct 2011 01:11:28 +0000 (18:11 -0700)]
auth: return unknown if no supported auth is found
If NONE is supported, it will already be in the list of supported
protocols, so there's no need to default to it here. This prevents
clients that request the NONE protocol from authenticating when the
server only accepts CEPHX. Instead, they get -ENOTSUP from the
AuthMonitor.
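A rough sketch of the selection rule described above, assuming a simplified model: the Proto enum, choose_protocol and start_auth are hypothetical names for illustration and do not correspond to Ceph's real types or functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cerrno>
#include <vector>

// Hypothetical protocol identifiers, for illustration only.
enum class Proto { Unknown, None, CephX };

// Returns the first protocol requested by the client that the server also
// supports. If there is no overlap, returns Unknown instead of silently
// falling back to None.
Proto choose_protocol(const std::vector<Proto>& requested,
                      const std::vector<Proto>& supported) {
  for (Proto p : requested) {
    if (std::find(supported.begin(), supported.end(), p) != supported.end()) {
      return p;
    }
  }
  return Proto::Unknown;
}

// Hypothetical caller: maps "no usable protocol" to -ENOTSUP.
int start_auth(const std::vector<Proto>& requested,
               const std::vector<Proto>& supported) {
  return choose_protocol(requested, supported) == Proto::Unknown ? -ENOTSUP : 0;
}
```

The point of the design is that None is treated like any other protocol: it is selected only when the server lists it as supported.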
Sage Weil [Thu, 27 Oct 2011 16:47:20 +0000 (09:47 -0700)]
filejournal: journal_replay_from
Force journal replay from a point other than the op_seq recorded by the
fs. This is useful if you want to skip bad entries in the journal (e.g.,
because they were non-idempotent and you know they were applied and the fs
operations were fully ordered).
Sage Weil [Mon, 24 Oct 2011 20:55:29 +0000 (13:55 -0700)]
osd: fix last_complete adjustment after recovering an object
After we recover each object, we try to raise the last_complete value
(and matching complete_to iterator). If our log was purely a backlog, this
won't necessarily end up bringing last_complete all the way up to the
last_update value, and we'll fail an assert later.
If complete_to does reach the end of the log, then we fast-forward
last_complete to last_update.
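The adjustment can be illustrated with a heavily simplified model. The PGLogSketch type below is hypothetical: it tracks missing objects by log version only, which is an assumption made for brevity and not how the OSD actually stores them.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Version = unsigned long;

// Hypothetical, simplified stand-in for the PG log state.
struct PGLogSketch {
  std::vector<Version> log;     // versions of log entries, ascending
  std::set<Version> missing;    // versions whose objects still need recovery
  std::size_t complete_to;      // index of first entry not yet known complete
  Version last_complete;
  Version last_update;

  PGLogSketch(std::vector<Version> l, std::set<Version> m, Version lu)
      : log(std::move(l)), missing(std::move(m)),
        complete_to(0), last_complete(0), last_update(lu) {}

  // Called after one object has been recovered.
  void recovered(Version v) {
    missing.erase(v);
    // Raise last_complete past every entry that is no longer missing.
    while (complete_to < log.size() && missing.count(log[complete_to]) == 0) {
      last_complete = log[complete_to];
      ++complete_to;
    }
    // If the whole log is complete, fast-forward to last_update.
    if (complete_to == log.size()) {
      last_complete = last_update;
    }
    assert(last_complete <= last_update);
  }
};

// Replays a sequence of recoveries and returns the resulting last_complete.
Version replay(PGLogSketch pg, const std::vector<Version>& recovered) {
  for (Version v : recovered) {
    pg.recovered(v);
  }
  return pg.last_complete;
}
```

In this model, a log whose newest entry is older than last_update would leave last_complete short of last_update without the final fast-forward step.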
The crash we were hitting was in finish_recovery(), and looked something
like
Sage Weil [Sun, 23 Oct 2011 06:07:10 +0000 (23:07 -0700)]
osd: fix generate_past_intervals maybe_went_rw on oldest interval
We stop working backwards when we hit last_epoch_clean, which means for the
oldest interval first_epoch may not be the _real_ first_epoch. (We can't
continue working backward because we may have thrown out those maps
entirely.)
However, if the last_epoch_clean epoch is contained within that interval,
we know that the OSD did in fact go rw because it had to have completed
recovery (and thus peering) to set last_epoch_clean in the first place.
This fixes cases where two different nodes have slightly different
past intervals, generate different prior probe sets as a result, and
flip/flop on the acting set choice. (It might eventually have resolved
once the wrongly excluded node's notify won the race and arrived in time
to be considered, but that's still clearly no good.)
This does leave the start epoch for that oldest interval incorrect. That
doesn't currently matter except that it's confusing, but I'm not sure how
to mark it properly, or if it's worth the effort.
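The rule for the oldest interval can be sketched as below. IntervalSketch and oldest_interval_went_rw are hypothetical names, and the sketch deliberately ignores whatever other conditions generate_past_intervals applies when computing maybe_went_rw.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a past interval.
struct IntervalSketch {
  unsigned first_epoch;  // may be later than the real first epoch
  unsigned last_epoch;
  bool maybe_went_rw;    // value computed by the usual checks
};

// For the oldest interval only: if last_epoch_clean falls inside the
// interval, the OSD must have completed recovery there, so it went rw.
bool oldest_interval_went_rw(const IntervalSketch& oldest,
                             unsigned last_epoch_clean) {
  assert(oldest.first_epoch <= oldest.last_epoch);
  const bool contains_clean = oldest.first_epoch <= last_epoch_clean &&
                              last_epoch_clean <= oldest.last_epoch;
  return oldest.maybe_went_rw || contains_clean;
}
```

Because both nodes apply the same override, they should agree on maybe_went_rw for that interval even when their recorded first_epoch values differ.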
Sage Weil [Tue, 25 Oct 2011 05:21:43 +0000 (22:21 -0700)]
osd: fix/simplify op discard checks
Use a helper to determine when we should discard an op due to the client
being disconnected. Use this when the op is first received, (re)queued,
and dequeued.
Fix the check to keep ops that are replayed ACKs, as we should make every
effort to reapply those even when the client goes away.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
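A minimal sketch of a shared discard helper, assuming a toy op type. OpSketch, should_discard, enqueue and dequeue are hypothetical names and do not mirror the OSD's real interfaces.

```cpp
#include <cassert>
#include <deque>

// Hypothetical, simplified op: only the two facts the check needs.
struct OpSketch {
  bool client_connected;  // is the originating client still connected?
  bool is_replayed_ack;   // replayed ACKs are kept even if the client left
};

// Single place that decides whether an op should be dropped.
bool should_discard(const OpSketch& op) {
  return !op.client_connected && !op.is_replayed_ack;
}

// Used when an op is first received or requeued.
bool enqueue(std::deque<OpSketch>& queue, const OpSketch& op) {
  if (should_discard(op)) {
    return false;
  }
  queue.push_back(op);
  return true;
}

// Used when dequeuing: skips ops whose client went away while they waited.
// Returns false if the queue drained without finding a usable op.
bool dequeue(std::deque<OpSketch>& queue, OpSketch* out) {
  assert(out != nullptr);
  while (!queue.empty()) {
    OpSketch op = queue.front();
    queue.pop_front();
    if (!should_discard(op)) {
      *out = op;
      return true;
    }
  }
  return false;
}
```

Routing every call site through one predicate keeps the replayed-ACK exception from being forgotten in any of them.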
Sage Weil [Tue, 25 Oct 2011 04:44:36 +0000 (21:44 -0700)]
osd: handle missing/degraded in op thread
The _handle_op() method and its friends are called when an op is initially
queued and when it is requeued. In the requeue case we have to be more
careful because the caller may be in the middle of doing all sorts of
random stuff. That means we need to limit ourselves to queueing or
discarding the op, and refrain from doing anything else with dangerous
side effects.