git.apps.os.sepia.ceph.com Git - ceph.git/log
Sage Weil [Tue, 30 Nov 2010 15:51:16 +0000 (07:51 -0800)]
filestore: refactor op_queue/journal locking

- Combine journal_lock and lock.
- Move throttling outside of the lock (this fixes a potential deadlock in
  parallel journal mode)
- Make interface nomenclature a bit more helpful

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 30 Nov 2010 15:22:37 +0000 (07:22 -0800)]
filestore: do not throttle op_queue in queue_op()

In parallel mode, queue_op is called while holding the journal lock, so it
is not okay to throttle there.  Instead, throttle in the caller.

The throttling still needs improvement, but this at least fixes the locking
problem.

Signed-off-by: Sage Weil <sage@newdream.net>
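The locking rule in these two commits can be illustrated with a minimal C++ sketch. `Throttle`, `OpQueue`, `submit()` and `complete_one()` are hypothetical names, not FileStore's actual interface, and the single `lock_` stands in for the combined journal_lock/lock. The point is the ordering: the caller may block in the throttle first, and only then takes the lock, so `queue_op()` never sleeps while the lock is held.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical throttle limiting the total size of in-flight ops.
class Throttle {
 public:
  explicit Throttle(uint64_t max) : max_(max) {}

  // Blocks until `n` more units fit under the limit (an oversized op is
  // admitted when nothing else is in flight). Never call this while
  // holding the queue lock.
  void get(uint64_t n) {
    std::unique_lock<std::mutex> l(m_);
    cv_.wait(l, [&] { return cur_ + n <= max_ || cur_ == 0; });
    cur_ += n;
  }

  void put(uint64_t n) {
    std::lock_guard<std::mutex> l(m_);
    cur_ -= n;
    cv_.notify_all();
  }

  uint64_t current() const {
    std::lock_guard<std::mutex> l(m_);
    return cur_;
  }

 private:
  mutable std::mutex m_;
  std::condition_variable cv_;
  uint64_t max_;
  uint64_t cur_ = 0;
};

// Hypothetical op queue guarded by one lock (the combined journal_lock/lock).
class OpQueue {
 public:
  explicit OpQueue(Throttle& t) : throttle_(t) {}

  // Caller side: throttle first (may block, no lock held), then lock and queue.
  void submit(uint64_t op_size) {
    throttle_.get(op_size);
    std::lock_guard<std::mutex> l(lock_);
    queue_op(op_size);
  }

  // Worker side: dequeue under the lock, release throttle budget after unlocking.
  bool complete_one() {
    uint64_t sz = 0;
    {
      std::lock_guard<std::mutex> l(lock_);
      if (q_.empty())
        return false;
      sz = q_.front();
      q_.pop_front();
    }
    throttle_.put(sz);
    return true;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> l(lock_);
    return q_.size();
  }

 private:
  // Called with lock_ held; must never block.
  void queue_op(uint64_t op_size) { q_.push_back(op_size); }

  mutable std::mutex lock_;
  std::deque<uint64_t> q_;
  Throttle& throttle_;
};
```

Because the throttle wait happens with no queue lock held, a completion path that needs that lock before it can release throttle budget cannot get stuck behind a sleeping submitter.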
Sage Weil [Tue, 30 Nov 2010 00:38:55 +0000 (16:38 -0800)]
filestore: simplify apply_transactions

Always use queue_transactions, even in no-journal case.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 29 Nov 2010 19:51:13 +0000 (11:51 -0800)]
vstart.sh: don't specify journaling mode

Let the autodetection kick in, or let the dev specify via -o '...'.

Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Mon, 29 Nov 2010 19:15:45 +0000 (11:15 -0800)]
osd: PG::trim: add assert

Assert that we're not trimming the PG log past last_complete.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Mon, 29 Nov 2010 17:29:17 +0000 (09:29 -0800)]
osd: _process_pg_info: add assert for replicas

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Thu, 25 Nov 2010 07:36:14 +0000 (23:36 -0800)]
osd: dump_missing: also dump missing_loc

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Thu, 25 Nov 2010 07:13:43 +0000 (23:13 -0800)]
osd: discover_all_missing fix

Don't request information from an OSD unless it is up and part of the
might_have_unfound set. Add more logging.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Wed, 24 Nov 2010 00:37:20 +0000 (16:37 -0800)]
gui: some cleanup

Rather than vectors of pointers, use vectors of NodeInfo structures.
This avoids the problem of freeing the NodeInfo structures.

GuiMonitor::gen_node_info_from_icons: initialize status.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 23 Nov 2010 23:39:53 +0000 (15:39 -0800)]
gui: more reindenting

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 23 Nov 2010 23:37:15 +0000 (15:37 -0800)]
gui: reindent a bunch of code

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Sage Weil [Tue, 23 Nov 2010 18:25:39 +0000 (10:25 -0800)]
client: remove inode from flush_caps list when auth_cap changes

Staying on the mds flushing_caps list when we no longer even have an
auth_cap with that mds confuses other code (e.g. kick_flushing_caps).
We'll need to re-flush to a new MDS later.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 23 Nov 2010 18:08:18 +0000 (10:08 -0800)]
mds: fix set_state_rejoin auth_pin check

We carry an auth pin IFF !stable AND auth.

Signed-off-by: Sage Weil <sage@newdream.net>
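As a sketch, the invariant reduces to a single predicate. The function names below are hypothetical and only illustrate the rule; they are not the MDS's lock-state code.

```cpp
#include <cassert>

// Hypothetical predicate: a lock should carry an auth pin
// if and only if it is not stable AND we are auth.
constexpr bool should_hold_auth_pin(bool is_stable, bool is_auth) {
  return !is_stable && is_auth;
}

// Consistency check in the spirit of the set_state_rejoin auth_pin check:
// whether we hold a pin must match the rule above.
constexpr bool auth_pin_consistent(bool is_stable, bool is_auth, bool has_pin) {
  return has_pin == should_hold_auth_pin(is_stable, is_auth);
}
```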
Sage Weil [Tue, 23 Nov 2010 21:39:38 +0000 (13:39 -0800)]
init-ceph: tolerate failure in cleanalllogs

Otherwise /var/log/ceph/stat makes rm -f error out and we fail.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 23 Nov 2010 21:32:49 +0000 (13:32 -0800)]
osd: fix recover_replicas() unfound check

missing_loc.count(soid) == 0 only means unfound if it's not missing on the
primary.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 23 Nov 2010 21:16:52 +0000 (13:16 -0800)]
osd: recover_primary() until primary has all found objects

The logic in that if statement was effectively reversed.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 23 Nov 2010 21:16:20 +0000 (13:16 -0800)]
osd: only discover_all_missing if unfound

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 23 Nov 2010 21:15:48 +0000 (13:15 -0800)]
osd: add get_num_unfound() helper

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 23 Nov 2010 20:46:51 +0000 (12:46 -0800)]
osd: only search_for_missing if there are unfound objects

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 23 Nov 2010 20:33:00 +0000 (12:33 -0800)]
osd: remove unused variable, fix warning

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 23 Nov 2010 20:32:50 +0000 (12:32 -0800)]
osd: fix is_all_uptodate()

This should only return true when recovery is done, i.e., no more missing
objects.  Nothing to do with unfound.

Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Tue, 23 Nov 2010 18:42:32 +0000 (10:42 -0800)]
osd: fix PG::is_all_uptodate

In PG::is_all_uptodate, don't try to look for peer_missing[osd->whoami].
The primary keeps that in PG::missing!

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Mon, 22 Nov 2010 23:55:42 +0000 (15:55 -0800)]
osd: PG::read_log: don't be clever with lost xattr

Formerly, we had a special case in read_log for dealing with objects
whose data was present on disk but whose attributes were not. This
conflicts with our plans to mark objects as lost by setting a bit in the
object attributes: without those attributes, we can never know whether
the objects were previously marked as lost.

This should almost never happen, and if it does, we just handle the
objects as missing in the normal way.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Fri, 19 Nov 2010 23:25:00 +0000 (15:25 -0800)]
Rename peer_summary_requested to peer_backlog_req

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Fri, 19 Nov 2010 23:02:46 +0000 (15:02 -0800)]
Build might_have_unfound set at activation

The might_have_unfound set is used by the primary OSD during recovery.
This set tracks the OSDs which might have unfound objects that the
primary OSD needs. As we receive Missing from each OSD in
might_have_unfound, we will remove the OSD from the set.

When might_have_unfound is empty, we will mark objects as LOST if the
latest version of the object resided on an OSD marked as lost.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
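A simplified model of the bookkeeping described above, assuming hypothetical containers keyed by object name and OSD id. It deliberately ignores how a Missing reply updates object locations and only shows the ordering: nothing is marked LOST until every OSD in might_have_unfound has reported.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified model of the primary's recovery state.
struct RecoveryState {
  std::set<int> might_have_unfound;          // OSDs we still need to hear from
  std::set<std::string> unfound;             // objects with no known location
  std::map<std::string, int> latest_holder;  // object -> OSD with latest version
  std::set<int> lost_osds;                   // OSDs marked lost
  std::set<std::string> lost_objects;        // objects we have marked LOST

  // Called when a Missing reply arrives from `osd`.
  void handle_missing_from(int osd) {
    might_have_unfound.erase(osd);
    if (might_have_unfound.empty())
      mark_lost();
  }

 private:
  // Only after every candidate OSD has reported: an unfound object whose
  // latest version lived on a lost OSD is marked LOST.
  void mark_lost() {
    for (const auto& oid : unfound) {
      auto it = latest_holder.find(oid);
      if (it != latest_holder.end() && lost_osds.count(it->second))
        lost_objects.insert(oid);
    }
  }
};
```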
Samuel Just [Tue, 23 Nov 2010 20:25:11 +0000 (12:25 -0800)]
monmaptool: Return a non-zero error code and print a useful error
message if unable to read the monmap file.

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
Sage Weil [Tue, 23 Nov 2010 17:43:49 +0000 (09:43 -0800)]
mds: allow for old fs's with stray instead of stray0

New fs's get stray0, but we want to still behave with old ones.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 23 Nov 2010 17:37:13 +0000 (09:37 -0800)]
Merge branch 'testing' into unstable

Conflicts:
configure.ac

Sage Weil [Sun, 21 Nov 2010 23:23:29 +0000 (15:23 -0800)]
v0.23.1

Sage Weil [Tue, 23 Nov 2010 06:41:57 +0000 (22:41 -0800)]
mon: always use send_reply for auth replies

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 23 Nov 2010 06:41:42 +0000 (22:41 -0800)]
mon: simplify send_reply code

No need to specify destination in send_reply, as we always have the request
for reference.

Simplify MRoute constructors (keep the ones we use) for tid and bcast
best-effort case.

Do NOT do a best-effort forward of a reply with a tid specified if the tid
is not in the routed-request map.

Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Tue, 23 Nov 2010 01:37:55 +0000 (17:37 -0800)]
osd: add assert to _process_pg_info

When activating an inactive replica, assert that we are doing so based
on a message from the primary.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 23 Nov 2010 01:31:50 +0000 (17:31 -0800)]
osd: re-indent some code in _process_pg_info

Re-indent the code and add a comment.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Sage Weil [Tue, 23 Nov 2010 00:12:10 +0000 (16:12 -0800)]
msgr: tolerate 0 bytes from tcp_read_nonblocking

This can happen, I believe, when we get a signal or something.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 22 Nov 2010 00:24:51 +0000 (16:24 -0800)]
init-ceph: fix (and test!) cleanlogs and cleanalllogs

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 22 Nov 2010 23:43:31 +0000 (15:43 -0800)]
mds: fix rejoin_scour_survivor_replicas inode check

We want to remove replicas that we don't ack, but those don't appear in
the strong_inode map; they're appended to the base_inode bufferlist.  Make
a (temporary) set to track who those are so that we know who to get rid of.

Signed-off-by: Sage Weil <sage@newdream.net>
Greg Farnum [Mon, 22 Nov 2010 23:04:22 +0000 (15:04 -0800)]
types: Allow inodeno_t structs to alias.

This removes a compiler warning, introduced by a gcc upgrade and
apparently spurious, claiming that using the + operator on inodeno_t
violates strict-aliasing rules.
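The message does not show the fix itself. A common way to silence this class of warning in GCC and Clang is the `__may_alias__` type attribute; the sketch below assumes that approach and uses a simplified `inodeno_t` wrapper (with a hypothetical `inode_raw_t` typedef), so treat it as illustrative rather than the actual header.

```cpp
#include <cassert>
#include <cstdint>

using inode_raw_t = uint64_t;  // hypothetical name for the raw value type

// Simplified inode-number wrapper. The GCC/Clang `__may_alias__` type
// attribute exempts this type from strict-aliasing assumptions, which is
// one way to silence a spurious strict-aliasing warning.
struct inodeno_t {
  inode_raw_t val;
  inodeno_t() : val(0) {}
  inodeno_t(inode_raw_t v) : val(v) {}
  // Implicit conversion lets the built-in `+` operate on the raw value.
  operator inode_raw_t() const { return val; }
} __attribute__ ((__may_alias__));
```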

Greg Farnum [Mon, 22 Nov 2010 23:02:54 +0000 (15:02 -0800)]
messenger: init rc to -1, removing compiler warning.

rc is actually initialized before every use, but compilers tend to
have trouble tracking assignments in if-else branches. Since -1 is
treated as invalid, there's no danger of a later refactor breaking anything.

Samuel Just [Tue, 16 Nov 2010 23:29:40 +0000 (15:29 -0800)]
Causes the MDSes to switch among a set of stray directories when
switching to a new journal segment.

MDSCache:
The stray member has been replaced with strays, an array of inodes
representing the set of available stray directories, as well as
stray_index indicating the index of the current stray directory.

get_stray() now returns a pointer to the current stray directory
inode.

advance_stray() advances stray_index to the next stray directory.

migrate_stray no longer takes a source argument; the source mds
is inferred from the parent of the dir entry.

stray dir entries are now stray<index> rather than stray.

scan_stray_dir now scans all stray directories.

MDSLog:
start_new_segment now calls advance_stray() on MDSCache to force a new
stray directory.

mdstypes:
NUM_STRAY indicates the number of stray directories to use per MDS

MDS_INO_STRAY now takes an index argument as well as the mds number

MDS_INO_STRAY_OWNER(i) returns the mds owner of the stray directory i

MDS_INO_STRAY_INDEX(i) returns the index of the stray directory i

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
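The inode-number arithmetic implied by MDS_INO_STRAY, MDS_INO_STRAY_OWNER and the index-extracting macro can be sketched with constexpr functions. The values of `MAX_MDS`, `NUM_STRAY` and the stray offset below are assumptions chosen for illustration, and the lower-case functions are hypothetical stand-ins for the macros; only the encode/decode relationship and the wrap-around in `advance_stray()` are the point.

```cpp
#include <cassert>

// Assumed constants, for illustration only.
constexpr unsigned MAX_MDS = 0x100;
constexpr unsigned NUM_STRAY = 10;
constexpr unsigned STRAY_OFFSET = 6 * MAX_MDS;

// Each MDS owns NUM_STRAY consecutive stray inode numbers.
constexpr unsigned mds_ino_stray(unsigned mds, unsigned index) {
  return STRAY_OFFSET + mds * NUM_STRAY + index;
}
// Inverse mappings: which MDS owns a stray inode, and which index it is.
constexpr unsigned mds_ino_stray_owner(unsigned ino) {
  return (ino - STRAY_OFFSET) / NUM_STRAY;
}
constexpr unsigned mds_ino_stray_index(unsigned ino) {
  return (ino - STRAY_OFFSET) % NUM_STRAY;
}
// advance_stray(): move to the next stray dir, wrapping around.
constexpr unsigned advance_stray(unsigned stray_index) {
  return (stray_index + 1) % NUM_STRAY;
}
```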
Samuel Just [Mon, 22 Nov 2010 18:53:55 +0000 (10:53 -0800)]
Timer must be initialized in Client::init and shutdown in
Client::shutdown.

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
Colin Patrick McCabe [Mon, 22 Nov 2010 18:32:18 +0000 (10:32 -0800)]
generate_past_intervals: generate back to lastclean

PG::generate_past_intervals needs to generate all the intervals back to
history.last_epoch_clean, rather than just to
history.last_epoch_started. This is required by
PG::build_might_have_unfound, which needs to examine these intervals
when building the might_have_unfound set.

Move the check for whether past_intervals is up-to-date into
generate_past_intervals itself. Fix the check.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Sage Weil [Mon, 22 Nov 2010 18:07:40 +0000 (10:07 -0800)]
vstart.sh: 'init-ceph stop' instead of 'stop.sh'

This just makes it easier to run multiple vstart sessions as the same user
on the same host.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 22 Nov 2010 17:55:37 +0000 (09:55 -0800)]
Merge branch 'osd_msgr' into unstable

Sage Weil [Fri, 19 Nov 2010 23:55:24 +0000 (15:55 -0800)]
mds: remove bogus assert

Causes problems during resolve finish.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 19 Nov 2010 23:10:54 +0000 (15:10 -0800)]
mds: do not eval subtree root when replay|resolve

This is nonsensical, and it can lead to scatter_writebehind, which breaks
horribly.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 19 Nov 2010 22:56:19 +0000 (14:56 -0800)]
mds: trim exported subtree _after_ adjusting auth

We need to set the subtree bounds before trimming it away, or else we may
throw out things we're still auth for.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 19 Nov 2010 21:58:31 +0000 (13:58 -0800)]
mds: resolve cleanup

Only track ambiguous imports and such if we get a resolve message while in
the resolve state.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 22 Nov 2010 17:49:43 +0000 (09:49 -0800)]
osd: bind to new cluster address when wrongly marked down

If we come back up on the same address, there is a possible race.  Other
nodes will mark_down when they see us go down.  If we go up first, queue
some messages, and _then_ they see that we're down and mark_down, the
messages we queued will get lost.  Since it's stateful on the cluster
backend, we need to introduce an ordering so that closing out the _old_
session doesn't break the new session.  We do this by binding to a new
address (just a different port, actually) before marking ourselves back
up.

Fixes #592.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 22 Nov 2010 17:45:29 +0000 (09:45 -0800)]
msgr: implement rebind() to pick a new port

Closes out all old connections and binds to a _different_ port.  This
ensures that someone doing mark_down on our old address won't get us.

Signed-off-by: Sage Weil <sage@newdream.net>
Greg Farnum [Mon, 22 Nov 2010 16:50:32 +0000 (08:50 -0800)]
client: only encode_cap_releases once per request.

Accomplish this by making a list of cap releases in the (permanent)
MetaRequest, and then copying that into the (potentially-temporary)
MClientRequest.
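A minimal sketch of the "encode once, copy per message" idea. `MetaRequest`, `MClientRequest` and `CapRelease` here are simplified hypothetical stand-ins, the single placeholder release is invented for the example, and `encode_calls` exists only to make the behaviour observable.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct CapRelease { uint64_t ino; uint32_t cap_id; };

// Stand-in for the on-the-wire message; may be rebuilt (e.g. on resend).
struct MClientRequest {
  std::vector<CapRelease> releases;
};

// Stand-in for the long-lived request state.
struct MetaRequest {
  std::vector<CapRelease> cap_releases;
  bool releases_encoded = false;
  int encode_calls = 0;  // counts how often releases were gathered

  MClientRequest build_request() {
    if (!releases_encoded) {  // gather releases only the first time
      encode_cap_releases();
      releases_encoded = true;
    }
    MClientRequest m;
    m.releases = cap_releases;  // every (re)built message gets a copy
    return m;
  }

 private:
  void encode_cap_releases() {
    ++encode_calls;
    cap_releases.push_back({0x10000000000ULL, 1});  // placeholder release
  }
};
```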

Sage Weil [Mon, 22 Nov 2010 04:52:31 +0000 (20:52 -0800)]
mon: clean up cluster_addr code a bit, better debug output

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 22 Nov 2010 04:48:49 +0000 (20:48 -0800)]
osd: send correct ip addrs to monitor for cluster_, hb_addr

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 22 Nov 2010 04:46:41 +0000 (20:46 -0800)]
osdmap: fix cluster_addr encoding; printing

The cluster addrs were getting lost because we were checking v instead of
ev.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 22 Nov 2010 03:59:43 +0000 (19:59 -0800)]
osd: unconditionally set up separate msgr instance for osd<->osd msgs

Always set up cluster_messenger (before we would only do so if there was
an explicit address configured for it).  The overhead to do so is minimal,
it simplifies the code, and will allow us to fix down->up transitions
(later).

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 22 Nov 2010 00:16:43 +0000 (16:16 -0800)]
filestore: only warn about disk write cache on kernels <2.6.33

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 22 Nov 2010 00:15:25 +0000 (16:15 -0800)]
osd: fix search_for_missing: old last_update implies object not present

For example, if an osd sends an empty PG::Info (last_update = 0'0) and
empty missing, we should not conclude that the object is there.

Signed-off-by: Sage Weil <sage@newdream.net>
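A sketch of the corrected reasoning, assuming a simplified `eversion_t` (epoch, version) pair and a hypothetical `peer_has_object()` helper: an empty missing set only proves something if the peer's last_update is at least the version we need.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <tuple>

// Simplified (epoch, version) pair, ordered lexicographically.
struct eversion_t {
  uint32_t epoch;
  uint64_t version;
  bool operator<(const eversion_t& o) const {
    return std::tie(epoch, version) < std::tie(o.epoch, o.version);
  }
};

// Can `peer` supply object `oid` at version `need`? An empty missing set is
// not enough: if the peer's log does not even reach `need`, it cannot have it.
bool peer_has_object(const std::string& oid, const eversion_t& need,
                     const eversion_t& peer_last_update,
                     const std::set<std::string>& peer_missing) {
  if (peer_last_update < need)
    return false;                       // peer is too old
  return peer_missing.count(oid) == 0;  // caught up and not missing it
}
```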
Sage Weil [Mon, 22 Nov 2010 00:09:13 +0000 (16:09 -0800)]
init-ceph: fix cleanlogs for no log_sym_dir case

Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Sat, 20 Nov 2010 03:15:11 +0000 (19:15 -0800)]
OSDMap: const cleanup

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Sat, 20 Nov 2010 03:14:29 +0000 (19:14 -0800)]
mds-dumper: Define Dumper::~Dumper()

This fixes a compile error.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Fri, 19 Nov 2010 22:21:00 +0000 (14:21 -0800)]
ReplicatedPG::pull: fix test for unfound

The test for unfound objects was reversed, leading us to try to pull
unfound objects and refrain from pulling objects that we knew how to
get. Should fix bug #585.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Sage Weil [Fri, 19 Nov 2010 21:41:58 +0000 (13:41 -0800)]
osdmap: fix printing, again

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 19 Nov 2010 18:17:58 +0000 (10:17 -0800)]
Merge remote branch 'origin/mds' into unstable

Colin Patrick McCabe [Fri, 19 Nov 2010 05:13:02 +0000 (21:13 -0800)]
multi-dump.sh: add diff mode

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Fri, 19 Nov 2010 04:57:15 +0000 (20:57 -0800)]
Add multi-dump.sh

This is a debug tool that can dump out Ceph information at various
epochs. For instance, it can show how the OSDmap changed over time.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Thu, 18 Nov 2010 22:51:18 +0000 (14:51 -0800)]
ReplicatedPG::get_object_context: fix broken calls

ReplicatedPG::get_object_context takes three parameters.  The last two
are "const object_locator_t& oloc" and "bool can_create".
Unfortunately, booleans implicitly convert to ints, and ints can be used to
initialize objects of type object_locator_t.

So when you make a call like:
> ctx->snapset_obc = get_object_context(snapoid, true);

what actually gets called is:
> get_object_context(snapoid, object_locator_t(1), false);

So you pass an invalid and *not* blank object_locator_t, and pass false
for can_create. This is not what the caller wanted. This change gets rid
of the default parameters and fixes the callers.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
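The pitfall is easy to reproduce in isolation. The following sketch uses a simplified `object_locator_t` with a non-explicit converting constructor and two hypothetical functions, one with the old default parameters and one without, to show `true` being silently converted into a locator.

```cpp
#include <cassert>
#include <cstdint>

// Simplified locator: the non-explicit converting constructor is what
// lets a bool (via int64_t) silently become a locator.
struct object_locator_t {
  int64_t pool;
  object_locator_t(int64_t p = -1) : pool(p) {}
};

struct Result { int64_t pool; bool can_create; };

// Old-style signature with default parameters (hypothetical stand-in).
Result get_ctx_with_defaults(int oid,
                             const object_locator_t& oloc = object_locator_t(),
                             bool can_create = false) {
  (void)oid;
  return {oloc.pool, can_create};
}

// Fixed signature: no defaults, so callers must pass both arguments.
Result get_ctx(int oid, const object_locator_t& oloc, bool can_create) {
  (void)oid;
  return {oloc.pool, can_create};
}
```

Removing the defaults, as this commit does, turns the mistaken two-argument call into a compile error. Marking the converting constructor `explicit` would be another way to block the `bool` to locator conversion.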
Colin Patrick McCabe [Thu, 18 Nov 2010 20:37:49 +0000 (12:37 -0800)]
ReplicatedPG: call finish_recovery when needed

Don't loop in ReplicatedPG::start_recovery_ops. Both recover_replicas and
recover_primary already contain a loop that tries to do as many recovery
ops as it can, so there's no need to repeat it. Also, the former loop
provably would never execute more than once because of the way the code
was structured.

If there are no more recovery operations to do, and PG::is_all_uptodate
is true at the end of ReplicatedPG::start_recovery_ops, call
finish_recovery.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Thu, 18 Nov 2010 18:09:14 +0000 (10:09 -0800)]
osd_resurrection_1_impl: turn on recovery at end

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Jim Schutt [Thu, 18 Nov 2010 00:52:19 +0000 (16:52 -0800)]
Makefile: fix builddir weirdness

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Sage Weil [Wed, 17 Nov 2010 22:37:38 +0000 (14:37 -0800)]
osd: rev PG::Info encoding for last_epoch_clean change

This was missed by 184fbf582b27c10b47101735a4495fe8c73ad186, so any fs
created between that commit and this one won't decode properly.  It's more
important, though, to make an fs created before that commit work, so that
the upgrade path from the last stable version works.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 17 Nov 2010 21:06:14 +0000 (13:06 -0800)]
Merge branch 'mds_frags' into unstable

Sage Weil [Wed, 17 Nov 2010 20:31:18 +0000 (12:31 -0800)]
mds: adjust dir_auth_pins on steal_dentry

dir_auth_pins is a counter of dentry auth_pins in the current dir; those
need to be added in when stealing.

Signed-off-by: Sage Weil <sage@newdream.net>
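A minimal sketch of the bookkeeping, with hypothetical simplified `Dentry` and `Dir` types. It only shows the destination side that the commit describes: when a dentry is spliced into a new dirfrag, its auth_pins are added to that dirfrag's dir_auth_pins.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical simplified dentry: a name and its auth_pin count.
struct Dentry {
  std::string name;
  int auth_pins;
};

// Hypothetical simplified dirfrag.
struct Dir {
  std::list<Dentry> items;
  int dir_auth_pins = 0;  // sum of auth_pins held by dentries in this dir

  // Move one dentry from `src` into this dir. Its auth_pins are added to
  // our dir_auth_pins; otherwise the counter undercounts pinned dentries.
  void steal_dentry(Dir& src, std::list<Dentry>::iterator it) {
    dir_auth_pins += it->auth_pins;
    items.splice(items.end(), src.items, it);
  }
};
```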
Sage Weil [Wed, 17 Nov 2010 19:39:24 +0000 (11:39 -0800)]
mds: wrlock scatterlocks to prevent a gather racing with split/merge logging

We have the dirs split in our cache for some time while journaling it to
disk, before the fragment_notify goes out.  Make sure we don't do a
scatterlock gather during that time that will confuse the inode auth (who
has their dirfrags fragmented differently).

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 17 Nov 2010 19:38:00 +0000 (11:38 -0800)]
mds: fix subtree map update on dirfrag merge

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 17 Nov 2010 19:23:15 +0000 (11:23 -0800)]
mds: clear PIN_SUBTREE on split/merge in purge_strays

This makes the helper work for merge as well as split.  Remove the special
fixups in the caller that were making split work before.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 17 Nov 2010 17:20:15 +0000 (09:20 -0800)]
mds: don't complete freeze while parent inode is frozen

This makes the maybe_finish_freeze() conditions match those of is_freezeable()
and avoids an assert.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 17 Nov 2010 17:19:39 +0000 (09:19 -0800)]
mds: move dirty rstat inodes to new dir on refragment

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 17 Nov 2010 16:42:44 +0000 (08:42 -0800)]
mds: flush log on fragment

This makes request lock auth_pins expire, so the fragment moves along.
Otherwise we can end up waiting for the log flush timer to go off.

This isn't a complete solution; in-progress requests won't know to flush.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 17 Nov 2010 16:33:08 +0000 (08:33 -0800)]
mds: initialize PIN_SUBTREE on split

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 15 Nov 2010 22:16:43 +0000 (14:16 -0800)]
mds: fix discover requests, tracking wrt fragments

Track discover requests by tid.  The old system of tracking outstanding
discovers was kludgey and somewhat broken.  Also there is a possibility
of getting dup replies if someone does kick_requests().

There is still room for improvement in the logic determining when a
discover is sent: we may want to discover multiple dirfrags in parallel,
but the current code will only do one at a time.

Signed-off-by: Sage Weil <sage@newdream.net>
comment
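Tracking by tid makes duplicate replies harmless: a reply whose tid is no longer outstanding is simply dropped. A sketch, assuming a hypothetical `DiscoverTracker` class rather than the MDS's real structures:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical discover tracker keyed by transaction id (tid).
class DiscoverTracker {
 public:
  uint64_t send_discover(const std::string& path) {
    uint64_t tid = ++last_tid_;
    outstanding_[tid] = path;
    return tid;
  }

  // Returns true if the reply matched an outstanding request. A duplicate
  // reply (e.g. after a resend) finds no entry and is ignored.
  bool handle_reply(uint64_t tid) {
    auto it = outstanding_.find(tid);
    if (it == outstanding_.end())
      return false;
    outstanding_.erase(it);
    return true;
  }

  std::size_t num_outstanding() const { return outstanding_.size(); }

 private:
  uint64_t last_tid_ = 0;
  std::map<uint64_t, std::string> outstanding_;
};
```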

Sage Weil [Tue, 16 Nov 2010 23:59:48 +0000 (15:59 -0800)]
mds: fix EFragment replay

If the inode already exists in our cache, adjust our (existing) fragments.
But it might not.  In that case, we just replay the metablob.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 16 Nov 2010 19:48:58 +0000 (11:48 -0800)]
mds: don't fragment mdsdir or .ceph

Signed-off-by: Sage Weil <sage@newdream.net>
Jim Schutt [Wed, 17 Nov 2010 20:39:52 +0000 (13:39 -0700)]
Detect broken system linux/fiemap.h

RedHat 5.5 has a /usr/include/linux/fiemap.h, but it is
broken because it does not itself include linux/types.h.
As a result, __u64 and friends are not defined.

We have a Ceph-local copy of fiemap.h, so use it
if the system version is broken.

While we're at it, fix up the configure message to
note we're using a local copy.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 17 Nov 2010 18:24:21 +0000 (10:24 -0800)]
osdmap: don't include blacklist info in summary

It's confusing users and isn't that important.

Signed-off-by: Sage Weil <sage@newdream.net>
Greg Farnum [Wed, 17 Nov 2010 17:58:38 +0000 (09:58 -0800)]
client: Remove the I_COMPLETE flag from the parent directory in relink_inode.

This papers over issues arising from the client's lack of proper support
for hard links, and lets it pass the snaptest-upchildrealms test.

Samuel Just [Wed, 17 Nov 2010 00:07:47 +0000 (16:07 -0800)]
config: added max_mds
MDSMonitor: create_new_fs adapted to use the max_mds parameter

max_mds is now a configurable value and create_new_fs will initialize
max_mds to the specified value.

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
Sage Weil [Mon, 15 Nov 2010 21:43:56 +0000 (13:43 -0800)]
mds: allow frag merge on subtree root

Fix purge_stolen and adjust_dir_fragments.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 15 Nov 2010 21:24:24 +0000 (13:24 -0800)]
mds: make dirfrag thrashing join and split

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 16 Nov 2010 20:07:38 +0000 (12:07 -0800)]
mds: add timestamp to LogEvents

This just gives us a bit of useful info when debugging problems.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 16 Nov 2010 18:32:19 +0000 (10:32 -0800)]
osd: fix trailing + in pg state string rendering

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 16 Nov 2010 18:10:43 +0000 (10:10 -0800)]
Merge remote branch 'origin/unfound' into unstable

Sage Weil [Tue, 16 Nov 2010 18:06:05 +0000 (10:06 -0800)]
mds: be less noisy about cap imports

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 16 Nov 2010 18:01:15 +0000 (10:01 -0800)]
Merge branch 'mds_dir_hash' into unstable

Sage Weil [Tue, 16 Nov 2010 17:50:55 +0000 (09:50 -0800)]
mds/client: pass dir hash over the wire

Add a feature bit DIRLAYOUTHASH.

Also fix client request routing for lookups (we were only hashing when
a Dentry pointer was provided, not when a relative path was).

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 16 Nov 2010 17:47:34 +0000 (09:47 -0800)]
mds: set dir hash on root inode

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 16 Nov 2010 17:42:51 +0000 (09:42 -0800)]
mds: set mode before all the file type dependent inode initialization!

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 26 Oct 2010 03:41:24 +0000 (20:41 -0700)]
mds: add DIRLAYOUTHASH feature bit

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 25 Oct 2010 23:46:01 +0000 (16:46 -0700)]
mds: make dentry hash a dir layout property

Signed-off-by: Sage Weil <sage@newdream.net>
Colin Patrick McCabe [Tue, 16 Nov 2010 02:33:01 +0000 (18:33 -0800)]
RadosClient::shutdown: call monclient::shutdown

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
Colin Patrick McCabe [Tue, 16 Nov 2010 00:19:27 +0000 (16:19 -0800)]
osd: don't stop recovery when there are unfound

There are two phases in recovery: one where we get all the right objects
onto the primary, and another where we push all those objects out to
the replicas. Formerly, we would not start the second phase until there
were no missing objects at all on the primary.

This change modifies that so that we will start the second phase even if
there are unfound objects. However, we will still wait for all findable
missing objects to be brought to us, of course.

Get rid of uptodate_set. We can find the same information by looking at
the missing and missing_loc sets directly. Keeping the uptodate_set...
er... up-to-date would be very difficult in the presence of all the things
that can modify the missing and missing_loc sets.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
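The condition for starting the second phase can be computed directly from missing and missing_loc, which is why uptodate_set is no longer needed. A sketch under simplified assumptions (object ids as strings, OSD ids as ints, hypothetical function names):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using hobject = std::string;  // simplified object id

// missing:     objects the primary itself still lacks
// missing_loc: for each missing object, the OSDs known to hold it
// An object is unfound if it is missing and has no known location.
bool is_unfound(const hobject& oid,
                const std::map<hobject, std::set<int>>& missing_loc) {
  auto it = missing_loc.find(oid);
  return it == missing_loc.end() || it->second.empty();
}

// Phase 2 (pushing to replicas) may start once every findable missing
// object has been pulled, i.e. everything still missing is unfound.
bool can_start_replica_phase(const std::set<hobject>& missing,
                             const std::map<hobject, std::set<int>>& missing_loc) {
  for (const auto& oid : missing)
    if (!is_unfound(oid, missing_loc))
      return false;  // there is still something we know how to pull
  return true;
}
```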
Colin Patrick McCabe [Tue, 16 Nov 2010 01:01:19 +0000 (17:01 -0800)]
dumpjournal.cc: fix compile

dumpjournal needs to create its own SafeTimers and pass them in to some
constructors.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>