Sage Weil [Wed, 2 Mar 2011 21:10:37 +0000 (13:10 -0800)]
msgr: fix chdir after daemonize
We don't care if the mkdir succeeds. It has dubious value anyway, though;
if you specify a unique directory for the daemon the caller may as well
create it.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Greg Farnum [Wed, 2 Mar 2011 22:13:38 +0000 (14:13 -0800)]
tcmalloc: switch the interface.
Previously, we used function pointers. Fun for me to learn about, icky
to actually have!
Now we use our own wrapper functions with two implementations -- one
for builds with tcmalloc and one for builds without. Programs that
are tcmalloc-aware build with the appropriate implementation source
at compile time, but the wrapper function stubs stay in
no matter what.
While we're at it, implement two of the "MallocExtension" calls in
the OSD.
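A minimal sketch of the wrapper approach described in the tcmalloc commit above. The macro SKETCH_WITH_TCMALLOC and the function names heap_wrapper_has_tcmalloc and heap_wrapper_release_free_memory are hypothetical; this log does not give the real names. The two implementations would normally live in separate source files chosen by the build; they share one file here only to keep the example self-contained. The tcmalloc branch assumes the gperftools header path.

```cpp
#include <string>

// Hypothetical wrapper interface. Callers always use these two functions,
// whichever implementation was compiled in.
bool heap_wrapper_has_tcmalloc();
std::string heap_wrapper_release_free_memory();

#ifdef SKETCH_WITH_TCMALLOC
// Implementation for tcmalloc-aware programs.
#include <gperftools/malloc_extension.h>

bool heap_wrapper_has_tcmalloc() { return true; }

std::string heap_wrapper_release_free_memory() {
  // One of the MallocExtension calls: hand free pages back to the OS.
  MallocExtension::instance()->ReleaseFreeMemory();
  return "released";
}
#else
// Stub implementation: same signatures, no tcmalloc dependency.
bool heap_wrapper_has_tcmalloc() { return false; }

std::string heap_wrapper_release_free_memory() {
  return "tcmalloc not in use";
}
#endif
```

Because the choice is made when the program is built, callers need neither function pointers nor runtime checks.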
Sage Weil [Wed, 2 Mar 2011 13:51:11 +0000 (05:51 -0800)]
osd: cache map bufferlists until they are flushed to disk
Another thread may share maps with a peer. Make sure they pull bufferlists
out of our cache if this happens prior to the encoded versions being
written to disk.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
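A minimal sketch of the idea in the map-cache commit above: readers check a cache of not-yet-written encoded maps before falling back to disk. MapStoreSketch and its members are hypothetical, and std::string stands in for a bufferlist; this is not the OSD's actual code.

```cpp
#include <map>
#include <string>

struct MapStoreSketch {
  std::map<unsigned, std::string> disk;     // stands in for the on-disk store
  std::map<unsigned, std::string> pending;  // encoded maps not yet flushed

  // Queue an encoded map for writing; it stays readable from the cache.
  void queue_write(unsigned epoch, const std::string& encoded) {
    pending[epoch] = encoded;
  }

  // Look in the cache first, then on disk. Returns false if the epoch
  // is in neither place.
  bool get(unsigned epoch, std::string* out) const {
    auto p = pending.find(epoch);
    if (p != pending.end()) { *out = p->second; return true; }
    auto d = disk.find(epoch);
    if (d != disk.end()) { *out = d->second; return true; }
    return false;
  }

  // Write all pending maps to "disk", then drop them from the cache.
  void flush() {
    for (const auto& kv : pending) disk[kv.first] = kv.second;
    pending.clear();
  }
};
```

A thread sharing maps with a peer calls get() and sees the same bytes whether or not the flush has happened yet.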
Sage Weil [Tue, 1 Mar 2011 00:05:08 +0000 (16:05 -0800)]
osd: trigger discover_all_missing after replay delay
We were calling discover_all_missing only when we went immediately active,
not after we were in the replay state (which triggers from a timer event
that calls OSD::activate_pg()). Move the call into PG::activate() so that
we catch both callers.
This requires passing in a query_map from the caller. While we're at it,
clean up some other instances where we are defining a new query_map
deep within the call tree.
Fixes: #847 (I hope)
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
IoCtx::from_rados_ioctx_t creates an IoCtx out of a rados_ioctx_t.
However, this IoCtx must share ownership of the IoCtxImpl pointer with
the C API user who first called rados_ioctx_create. This must be done
via a reference count inside the IoCtxImpl.
Also add a copy constructor and assignment operator to class IoCtx,
since it's now cheap to have them.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
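A minimal sketch of the shared-ownership scheme described in the IoCtx commit above, using a reference count stored inside the implementation object. IoCtxImplSketch and IoCtxSketch are hypothetical stand-ins, not the librados classes, and the counter is a plain int (a real implementation shared across threads would need an atomic).

```cpp
// Hypothetical stand-in for IoCtxImpl: it carries its own reference count.
struct IoCtxImplSketch {
  int ref = 1;  // the creator (the C API user) holds the first reference
  void get() { ++ref; }
  void put() { if (--ref == 0) delete this; }
};

// Hypothetical stand-in for IoCtx: a handle that shares ownership.
class IoCtxSketch {
 public:
  IoCtxSketch() : impl_(nullptr) {}

  // Wrap a raw handle, taking an extra reference so the original
  // owner and this object can each release independently.
  static IoCtxSketch from_raw(IoCtxImplSketch* p) {
    IoCtxSketch io;
    io.impl_ = p;
    if (p) p->get();
    return io;
  }

  // Copying only bumps the count, so it is cheap.
  IoCtxSketch(const IoCtxSketch& o) : impl_(o.impl_) {
    if (impl_) impl_->get();
  }

  IoCtxSketch& operator=(const IoCtxSketch& o) {
    if (o.impl_) o.impl_->get();  // take the new reference first
    if (impl_) impl_->put();      // then drop the old one
    impl_ = o.impl_;
    return *this;
  }

  ~IoCtxSketch() { if (impl_) impl_->put(); }

 private:
  IoCtxImplSketch* impl_;
};
```

The implementation object is destroyed only when the last holder, C or C++, calls put().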
Samuel Just [Tue, 1 Mar 2011 01:07:57 +0000 (17:07 -0800)]
PG: unify scrub_received_maps and peer_scrub_maps
Previously, incoming maps were placed into peer_scrub_maps and merged
into scrub_received_maps during scrub_gather_replica_maps. Now,
sub_op_scrub_map merges the maps into scrub_received_maps directly.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Fri, 11 Feb 2011 22:46:05 +0000 (14:46 -0800)]
PG: refactor scrubmap comparison and repair logic
The previous version gave erroneous results. This version seems simpler
and can be more easily unit tested as the error detection logic has been
separated from the repair logic.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Thu, 10 Feb 2011 21:49:15 +0000 (13:49 -0800)]
PG: replica_scrub also should not block
As with scrub, replica scrub wait()ed for last_update_complete to catch
up to last_update. Now, it will requeue the message when that condition
is satisfied.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Thu, 10 Feb 2011 20:35:35 +0000 (12:35 -0800)]
PG: make scrub non-blocking
Previously, scrub would block using wait until
1. last_update_applied==last_update and
2. all replica scrub maps are up-to-date
1. is now handled by requeueing scrub once last_update_applied catches
up to last_update. (see op_applied and scrub)
2. is handled in scrub_finalize. scrub_finalize will be scheduled using
the scrub_finalize_wq once scrub_waiting_on hits 0. (see scrub and
sub_op_scrub_map)
scrub_finalize also handles comparing the maps and reporting/repairing
errors.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
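A minimal sketch of the two non-blocking steps described in the scrub commit above. The field and queue names follow the commit message (scrub_waiting_on, scrub_finalize_wq, op_applied, sub_op_scrub_map); PGSketch, OSDSketch, scrub_wq, scrub_pending, and all signatures are hypothetical simplifications, not the actual OSD code.

```cpp
#include <cassert>
#include <deque>

struct PGSketch {
  unsigned last_update = 0;
  unsigned last_update_applied = 0;
  int scrub_waiting_on = 0;    // replica scrub maps still outstanding
  bool scrub_pending = false;  // scrub deferred until applied ops catch up
};

struct OSDSketch {
  std::deque<PGSketch*> scrub_wq;           // PGs queued to (re)run scrub
  std::deque<PGSketch*> scrub_finalize_wq;  // PGs ready for map comparison

  // Returns immediately instead of waiting on either condition.
  void scrub(PGSketch& pg, int num_replicas) {
    if (pg.last_update_applied != pg.last_update) {
      pg.scrub_pending = true;  // op_applied() will requeue us
      return;
    }
    pg.scrub_pending = false;
    pg.scrub_waiting_on = num_replicas;
    if (pg.scrub_waiting_on == 0) scrub_finalize_wq.push_back(&pg);
  }

  // Condition 1: requeue the scrub once applied ops have caught up.
  void op_applied(PGSketch& pg, unsigned version) {
    pg.last_update_applied = version;
    if (pg.scrub_pending && pg.last_update_applied == pg.last_update)
      scrub_wq.push_back(&pg);
  }

  // Condition 2: schedule finalization when the last replica map arrives.
  void sub_op_scrub_map(PGSketch& pg) {
    assert(pg.scrub_waiting_on > 0);
    if (--pg.scrub_waiting_on == 0) scrub_finalize_wq.push_back(&pg);
  }
};
```

No call in this sketch blocks; each event either requeues the PG or decrements the counter and moves on.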
Samuel Just [Thu, 10 Feb 2011 20:27:11 +0000 (12:27 -0800)]
OSD: add scrub_finalize_wq
Scrub currently blocks while waiting on replica maps and for
last_update_applied==last_update. Also, the subsequent checking of the
primary and replica maps is done while occupying a disk_tp slot. The
scrubmap checking will be moved into scrub_finalize and performed in
this wq in the op_tp.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
The headers and ceph_fs.cc are written such that they can be shared
verbatim between the kernel and userspace code. Omitting the headers
was deliberate, because they differ depending on the build environment.
The default file layout seems fine in config.cc, since it is declared
in config.h, and is a bunch of tunables we generally try to keep in
config.cc.
The previous change converted all PoolHandle uses to IoContext. This
change also renames the variables.
Also fix a few API functions whose names weren't quite right after the
previous change. rados_pool_list really does just list pools; it has
nothing to do with ioctxes.
rados_ioctx_change_auid should be rados_ioctx_pool_set_auid. Although it
takes an ioctx as an argument, it operates on the pool.
rados_ioctx_close should just return void. APIs where the close
operation can fail are broken. What is the user supposed to do if
closing doesn't work?
Also, fix a few test programs that got overlooked earlier.
Signed-off-by: Colin McCabe <colin.mccabe@dreamhost.com>
Samuel Just [Thu, 24 Feb 2011 20:31:58 +0000 (12:31 -0800)]
FileStore.h: reorder queue operations in _journaled_ahead
In writeahead mode, an op could disappear from jq without immediately
reappearing in q. Thus, q can be empty before seq is requeued and
finished. _journaled_ahead will now enqueue the op in q before removing
from jq.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
This commit introduced an error in parallel journaling mode.
OpSequencer::flush is only meant to ensure that the ops have become
readable, not necessarily journalled.
Sage Weil [Wed, 23 Feb 2011 22:25:06 +0000 (14:25 -0800)]
mds: fix export cancellation vs nested freezes
Prevent freezes from completing while we are canceling exports. Otherwise
if we are freezing /a/b and /a, and cancel /a/b, we may inadvertently
complete the freeze on /a (synchronously) and confuse ourselves. Pin
all freezes beforehand so that when we cancel each one we do not cause
any others to prematurely complete.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>