Sage Weil [Tue, 8 Mar 2011 00:25:30 +0000 (16:25 -0800)]
mds: use projected subtree in rename anchor check
We want to (try to) reanchor the directory on rename when our _projected_
subtree is not a leaf. If we use the normal get_subtree_root() call,
we get NULL if we are unlinked, which makes is_leaf_subtree() crash.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Mon, 7 Mar 2011 19:32:20 +0000 (11:32 -0800)]
osd: include all stray peers in might_have_unfound
We should always consider any OSD that has a copy of the PG as a possible
location for missing objects. There are cases where might_have_unfound is
incomplete. For example:
- objects on [1,2]
- 2 marked down/out
- objects on [1,3]
- recovery completes, last_epoch_clean is set.
- 2 comes back online
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
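A minimal sketch of the idea in the commit above, not the actual Ceph code: whatever set of peers the interval-based calculation produces, every stray peer known to hold a copy of the PG is added to it. The function name, the parameters, and the use of plain int OSD ids are assumptions made for illustration.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper (not a Ceph function): start from the peers found by
// walking past intervals, then add every stray peer that holds a copy of
// the PG. The local OSD is never a remote candidate, so it is removed.
std::set<int> build_might_have_unfound(const std::set<int>& from_past_intervals,
                                       const std::set<int>& stray_peers,
                                       int self) {
  std::set<int> candidates = from_past_intervals;
  candidates.insert(stray_peers.begin(), stray_peers.end());
  candidates.erase(self);
  return candidates;
}
```

In the scenario listed above, with osd 1 as the local OSD, osd 3 found via past intervals, and osd 2 returning as a stray, this sketch yields {2, 3}, so osd 2 is still queried for missing objects.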
Sage Weil [Fri, 4 Mar 2011 21:59:24 +0000 (13:59 -0800)]
osd: include all up peers in might_have_unfound when desperate
If our might_have_unfound calculation was off (it currently can be, see
#865) we could prematurely give up. Try any up OSD at this stage just to
be sure.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 4 Mar 2011 17:39:59 +0000 (09:39 -0800)]
osd: recover_primary if recover_replicas starts no ops
recover_replicas may fail to start anything if we see an unexpected error.
In that case, try recover_primary immediately instead of waiting for the
PG to (hopefully) get requeued for recovery later.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 4 Mar 2011 17:38:47 +0000 (09:38 -0800)]
osd: discover more missing if unfound and do_recovery can't start anything
If we couldn't start any recovery ops and things are still
unfound, see if we can discover more missing object locations.
It may be that our initial locations were bad and we errored
out while trying to pull.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
IoCtx::from_rados_ioctx_t creates an IoCtx out of a rados_ioctx_t.
However, this IoCtx must share ownership of the IoCtxImpl pointer with
the C API user who first called rados_ioctx_create. This must be done
via a reference count inside the IoCtxImpl.
Also add a copy constructor and assignment operator to class IoCtx,
since it's now cheap to have them.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
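The shared-ownership scheme described above can be sketched as an intrusive reference count. This is an illustration under stated assumptions, not the librados code: Impl stands in for IoCtxImpl, Handle for IoCtx, and from_raw for IoCtx::from_rados_ioctx_t; all three names and the non-atomic counter are hypothetical.

```cpp
#include <cassert>

// Hypothetical stand-in for IoCtxImpl: the object carries its own count.
struct Impl {
  int ref = 1;  // the creator (e.g. the C API caller) holds the first ref
  void get() { ++ref; }
  // Drops one reference; returns true if that freed the object.
  bool put() {
    if (--ref == 0) { delete this; return true; }
    return false;
  }
};

// Hypothetical stand-in for IoCtx: copying only bumps the count.
class Handle {
public:
  // Share ownership with whoever already holds a reference to p.
  static Handle from_raw(Impl* p) { p->get(); return Handle(p); }

  Handle(const Handle& o) : impl_(o.impl_) { if (impl_) impl_->get(); }

  Handle& operator=(const Handle& o) {
    if (o.impl_) o.impl_->get();  // take the new ref first: self-assign safe
    if (impl_) impl_->put();
    impl_ = o.impl_;
    return *this;
  }

  ~Handle() { if (impl_) impl_->put(); }

  int refs() const { return impl_ ? impl_->ref : 0; }

private:
  explicit Handle(Impl* p) : impl_(p) {}
  Impl* impl_;
};
```

Because the count lives inside the shared object, the raw pointer held by the C caller and any number of Handle copies all agree on when the object is freed. A real implementation would need an atomic counter if handles are used from several threads.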
Log a version message whenever we open the dout log, not just the first
time. However, only output it to log files and syslog. Spewing versions
to stderr and stdout was determined to be annoying.
Rename dout_emergency_impl to dout_emergency_to_file_and_syslog to
better reflect its function.
Rename ceph_version_to_string to pretty_version_to_string.
Add get_process_name to do just that. Re-arrange some version.h methods.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Sage Weil [Thu, 3 Mar 2011 00:13:54 +0000 (16:13 -0800)]
mds: rip out rename linkmerge support
It turns out POSIX says rename(a,b) is a no-op when a and b link to the
same inode. This is super weird but good news because it means we can
rip out a bunch of poorly tested code.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
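The POSIX behavior mentioned above can be checked from userspace with a small test. This is a sketch that assumes a POSIX system with a writable /tmp on a filesystem that supports hard links; the function name and scratch path are made up for the example.

```cpp
#include <cassert>
#include <cstdio>
#include <stdlib.h>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Creates two hard links to one file in a fresh temp directory, calls
// rename() from one name to the other, and reports whether the call
// succeeded and both names still exist and refer to the same inode.
bool rename_between_hard_links_is_noop() {
  char tmpl[] = "/tmp/rename_demo_XXXXXX";  // hypothetical scratch location
  if (mkdtemp(tmpl) == nullptr) return false;
  const std::string dir(tmpl);
  const std::string a = dir + "/a";
  const std::string b = dir + "/b";

  bool ok = false;
  FILE* f = std::fopen(a.c_str(), "w");
  if (f != nullptr) {
    std::fclose(f);
    if (link(a.c_str(), b.c_str()) == 0) {
      const int r = std::rename(a.c_str(), b.c_str());
      struct stat sa, sb;
      ok = r == 0 &&
           stat(a.c_str(), &sa) == 0 &&
           stat(b.c_str(), &sb) == 0 &&
           sa.st_ino == sb.st_ino;
    }
  }
  // Clean up whatever was created, regardless of the outcome.
  unlink(a.c_str());
  unlink(b.c_str());
  rmdir(dir.c_str());
  return ok;
}
```

On a conforming filesystem the function should return true: the rename succeeds and neither name is removed.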
Alexandre Oliva [Wed, 2 Mar 2011 21:39:09 +0000 (13:39 -0800)]
cmds/cosd: Fix IsHeapProfilerRunning implicit return type cast.
G++ complains about the difference between the return type of tcmalloc's
IsHeapProfilerRunning (int) and the return type of the function that
g_conf.profiler_running is supposed to point to (bool). We could
probably get away with a type-cast, but as a compiler developer and
former C++ language lawyer, I'd rather not take the risk of destroying
the universe by invoking undefined behavior ;-)
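One cast-free way to bridge the two signatures is a thin wrapper with the expected return type. The sketch below is an assumption about the shape of such a fix, not the committed change: fake_is_running stands in for the int-returning library function, and profiler_running for the bool-returning function pointer.

```cpp
#include <cassert>

// Stand-in for a library function that reports status as an int.
int fake_is_running() { return 2; }

// The consumer expects a pointer to a function returning bool.
typedef bool (*bool_fn)();

// Wrapper with the exact expected signature. The int-to-bool conversion
// happens on the returned value, which is well defined, instead of
// casting the function pointer to an incompatible type.
bool is_running_wrapper() { return fake_is_running() != 0; }

bool_fn profiler_running = is_running_wrapper;
```

Calling profiler_running() then goes through a function of the correct type, so no function-pointer cast is needed.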
Sage Weil [Wed, 2 Mar 2011 00:02:48 +0000 (16:02 -0800)]
osd: update missing_loc when inferring an empty missing set
We were inferring an empty missing set, but weren't calculating object locations
based on that. Usually it was okay because we already had another
location, but not always! And especially not when one location turns out
to be bad and we need to go to another.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Tue, 1 Mar 2011 23:11:47 +0000 (15:11 -0800)]
osd: add object to missing if we find it missing on disk
If recovery finds the object missing on disk, add it
to the local missing set so we can (hopefully) recover it from another
replica.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Tue, 1 Mar 2011 00:05:08 +0000 (16:05 -0800)]
osd: trigger discover_all_missing after replay delay
We were calling discover_all_missing only when we went immediately active,
not after we were in the replay state (which triggers from a timer event
that calls OSD::activate_pg()). Move the call into PG::activate() so that
we catch both callers.
This requires passing in a query_map from the caller. While we're at it,
clean up some other instances where we are defining a new query_map
deep within the call tree.
Fixes: #847 (I hope)
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
The headers and ceph_fs.cc are written such that they can be shared
verbatim between the kernel and userspace code. Omitting the headers
was deliberate, because they differ depending on the build environment.
The default file layout seems fine in config.cc, since it is declared
in config.h, and is a bunch of tunables we generally try to keep in
config.cc.
The previous change converted all PoolHandle uses to IoContext. This
change also renames the corresponding variables.
Also fix a few API functions whose names weren't quite right after the
previous change. rados_pool_list really does just list pools; it has
nothing to do with ioctxes.
rados_ioctx_change_auid should be rados_ioctx_pool_set_auid. Although it
takes an ioctx as an argument, it operates on the pool.
rados_ioctx_close should just return void. APIs where the close
operation can fail are broken. What is the user supposed to do if
closing doesn't work?
Also, fix a few test programs that got overlooked earlier.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Samuel Just [Thu, 24 Feb 2011 20:31:58 +0000 (12:31 -0800)]
FileStore.h: reorder queue operations in _journaled_ahead
In writeahead mode, an op could disappear from jq without immediately
reappearing in q. Thus, q can be empty before seq is requeued and
finished. _journaled_ahead will now enqueue the op in q before removing
from jq.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
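The ordering fix above can be sketched as follows. This is a single-threaded illustration with hypothetical names (Sequencer, idle, journaled_ahead, and the observer callback), not the FileStore code, which would also involve locking. The callback marks the point where a concurrent observer could look at both queues.

```cpp
#include <cassert>
#include <deque>

// Hypothetical model of the two queues: jq holds ops waiting on the
// journal, q holds ops queued to be applied.
struct Sequencer {
  std::deque<int> jq;
  std::deque<int> q;
  // An observer treats the sequencer as drained when both queues are empty.
  bool idle() const { return jq.empty() && q.empty(); }
};

// Move the front op from jq to q, enqueueing before dequeueing so that the
// op is visible in at least one queue at every step.
template <typename Observer>
void journaled_ahead(Sequencer& s, Observer between_steps) {
  const int op = s.jq.front();
  s.q.push_back(op);   // 1. make the op visible in q first
  between_steps(s);    // what a concurrent observer could see here
  s.jq.pop_front();    // 2. only then remove it from jq
}
```

With the steps in the opposite order, the observer would find both queues empty while the op is still in flight, which matches the failure described in the commit message.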
This commit introduced an error in parallel journaling mode.
OpSequencer::flush is only meant to ensure that the ops have become
readable, not necessarily journalled.
Sage Weil [Thu, 24 Feb 2011 13:49:29 +0000 (05:49 -0800)]
Makefile: fix libatomic_ops linking
LDADD seems to have no effect on the final link command. Switching this
back to AM_LDFLAGS. This was changed in 1c7d8f1ac2c, although it's not
clear that the change was intentional...
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Thu, 24 Feb 2011 08:34:37 +0000 (00:34 -0800)]
mds: remove "N stopped" from short mdsmap summary
It's confusing because it sounds like we're talking about daemons, when we
really just mean there are some ranks that created some ondisk state but
aren't currently part of the running cluster.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 23 Feb 2011 23:08:58 +0000 (15:08 -0800)]
mds: strengthen assertions in rejoin ack
The ACK only contains items we asked for with a WEAK request. Assert as
much. (The old continue bits were from ~2007, when this was originally
written.)
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>