git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
14 years agoosd: Rename osd_mon_report_interval
Colin Patrick McCabe [Thu, 6 Jan 2011 02:29:09 +0000 (18:29 -0800)]
osd: Rename osd_mon_report_interval

Rename osd_mon_report_interval to osd_mon_report_interval_min.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agomon: Introduce Monitor::leader_since
Colin Patrick McCabe [Mon, 3 Jan 2011 23:02:15 +0000 (15:02 -0800)]
mon: Introduce Monitor::leader_since

Introduce Monitor::leader_since to keep track of when the current
monitor became the leader.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMerge branch 'standby_replay' into unstable
Greg Farnum [Thu, 6 Jan 2011 23:39:14 +0000 (15:39 -0800)]
Merge branch 'standby_replay' into unstable

14 years agomds: Add is_any_replay() method and fill it in as appropriate.
Greg Farnum [Thu, 6 Jan 2011 23:37:59 +0000 (15:37 -0800)]
mds: Add is_any_replay() method and fill it in as appropriate.

This way we don't need to remember to call all three of is_replay(),
is_standby_replay(), is_oneshot_replay().

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
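
A minimal sketch of the idea in this commit, assuming a simplified state enum. The type names and enum values are hypothetical and not taken from the Ceph source; only the four method names come from the message above.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the MDS state.
enum class ReplayStateSketch { Active, Replay, StandbyReplay, OneshotReplay };

struct MdsStateSketch {
  ReplayStateSketch state = ReplayStateSketch::Active;

  bool is_replay() const { return state == ReplayStateSketch::Replay; }
  bool is_standby_replay() const {
    return state == ReplayStateSketch::StandbyReplay;
  }
  bool is_oneshot_replay() const {
    return state == ReplayStateSketch::OneshotReplay;
  }

  // One predicate, so callers cannot forget one of the three checks.
  bool is_any_replay() const {
    return is_replay() || is_standby_replay() || is_oneshot_replay();
  }
};
```

Callers that only care whether some form of replay is in progress can then test is_any_replay() alone.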
14 years agoMerge remote branch 'origin/unstable' into standby_replay
Greg Farnum [Thu, 6 Jan 2011 22:50:35 +0000 (14:50 -0800)]
Merge remote branch 'origin/unstable' into standby_replay

Conflicts:
src/cmds.cc
src/mds/MDS.cc
src/mds/MDS.h

14 years agolibrados: add library api versioning
Yehuda Sadeh [Thu, 6 Jan 2011 22:43:31 +0000 (14:43 -0800)]
librados: add library api versioning

14 years agojournaler: delete Contexts on finish() in new functions.
Greg Farnum [Mon, 20 Dec 2010 22:35:23 +0000 (14:35 -0800)]
journaler: delete Contexts on finish() in new functions.

Previously we weren't, and leaked memory.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agomdcache: change replay trimming a bit.
Greg Farnum [Mon, 20 Dec 2010 21:32:43 +0000 (13:32 -0800)]
mdcache: change replay trimming a bit.

Previously we were re-inserting dentries on the open list. But if
there weren't any other available dentries to trim, this could
have led to an infinite loop!
Now, we save them in a list and pop them back in once the trim
is done.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
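
The save-and-restore approach described above might look roughly like this. It is a sketch under stated assumptions: DentrySketch, its fields, and the std::deque standing in for the LRU are all hypothetical, not the MDCache code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical cache entry; entries on the open list must not be trimmed.
struct DentrySketch {
  std::string name;
  bool on_open_list;
};

// Trims from the front of `lru` until at most `max_size` entries remain or
// nothing trimmable is left. Returns the number of entries trimmed.
std::size_t trim(std::deque<DentrySketch>& lru, std::size_t max_size) {
  std::vector<DentrySketch> kept;  // untrimmable entries, set aside
  std::size_t trimmed = 0;
  while (lru.size() + kept.size() > max_size && !lru.empty()) {
    DentrySketch d = lru.front();
    lru.pop_front();
    if (d.on_open_list) {
      // Re-inserting into `lru` here is what could loop forever.
      kept.push_back(d);
    } else {
      ++trimmed;
    }
  }
  // Trim is done: pop the saved entries back in, preserving their order.
  lru.insert(lru.begin(), kept.begin(), kept.end());
  return trimmed;
}
```

Because every loop iteration removes one entry from `lru`, the loop terminates even when every remaining entry is on the open list.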
14 years agoMDS: rename replay Contexts -- they were ambiguous at best.
Greg Farnum [Mon, 20 Dec 2010 21:10:44 +0000 (13:10 -0800)]
MDS: rename replay Contexts -- they were ambiguous at best.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoMDS: add gids to the logger file names.
Greg Farnum [Fri, 17 Dec 2010 23:56:44 +0000 (15:56 -0800)]
MDS: add gids to the logger file names.

This just makes it easier to tell the standby's files apart
from the others.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agomdlog: return EAGAIN if replay falls off the tail of the journal.
Greg Farnum [Fri, 17 Dec 2010 21:25:04 +0000 (13:25 -0800)]
mdlog: return EAGAIN if replay falls off the tail of the journal.

This can happen when we're following an active journal, and
would previously cause the MDS to shut down. Now we return EAGAIN,
so the MDS can recover as it likes.
Currently, that recovery is a simple respawn, as when we discover
we've fallen behind via probing.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
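
A rough sketch of the check this message describes, assuming the journal's live range is bounded by an expire position and a write position. The struct and function names are hypothetical; only the EAGAIN return value comes from the commit.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Hypothetical view of the journal bounds seen by a follower.
struct JournalBoundsSketch {
  std::uint64_t expire_pos;  // oldest position still present
  std::uint64_t write_pos;   // current end of the journal
};

// Returns 0 if read_pos is still inside the live journal, or -EAGAIN if the
// journal was trimmed past it and the caller has to recover (e.g. respawn).
int check_replay_position(const JournalBoundsSketch& j,
                          std::uint64_t read_pos) {
  if (read_pos < j.expire_pos)
    return -EAGAIN;
  return 0;
}
```

Returning an error code instead of asserting leaves the recovery policy to the caller, which matches the "recover as it likes" wording above.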
14 years agojournaler: Add init_headers function, call when reading head off disk.
Greg Farnum [Fri, 17 Dec 2010 00:47:30 +0000 (16:47 -0800)]
journaler: Add init_headers function, call when reading head off disk.

Uninitialized headers were causing a failed assert during replay,
and there's no good reason to leave them set at their defaults just
because the *current* incarnation of this MDS has never written to
disk!

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agomds: After probing the journal, reset if we've fallen behind.
Greg Farnum [Thu, 16 Dec 2010 19:53:38 +0000 (11:53 -0800)]
mds: After probing the journal, reset if we've fallen behind.

Previously, if the journal got trimmed and we missed log entries,
we failed out in the journaling step and stopped.
This is still possible and needs to be fixed, but pre-emptively checking
that we're still in the live part of the journal narrows the race range.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoMDS: make standby_trim_segments functional. Hurray, hot standbys work!
Greg Farnum [Wed, 15 Dec 2010 00:45:50 +0000 (16:45 -0800)]
MDS: make standby_trim_segments functional. Hurray, hot standbys work!

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agomdlog: Add some helper functions for accessing segments map data.
Greg Farnum [Wed, 15 Dec 2010 00:45:21 +0000 (16:45 -0800)]
mdlog: Add some helper functions for accessing segments map data.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agomdcache: adjust trim() to handle running during standby-replay.
Greg Farnum [Wed, 15 Dec 2010 00:44:55 +0000 (16:44 -0800)]
mdcache: adjust trim() to handle running during standby-replay.

This just means it needs to handle files on the open list and not
trim them. Add a check for that with an assert, and keep them alive.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoelist: add a clear_list function.
Greg Farnum [Wed, 15 Dec 2010 00:43:45 +0000 (16:43 -0800)]
elist: add a clear_list function.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agolru: change control flow and an assert to keep purpose clearer.
Greg Farnum [Tue, 14 Dec 2010 18:37:13 +0000 (10:37 -0800)]
lru: change control flow and an assert to keep purpose clearer.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoMDSMonitor: Remove STATE_ONESHOT_REPLAY from takeover logic in tick().
Greg Farnum [Thu, 9 Dec 2010 00:30:32 +0000 (16:30 -0800)]
MDSMonitor: Remove STATE_ONESHOT_REPLAY from takeover logic in tick().

If something dies during a journal-check we shouldn't have anybody
doing standby for them, so assert out!

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoMDSMonitor: Do not set the rank of an MDS in standby-replay
Greg Farnum [Wed, 8 Dec 2010 17:42:57 +0000 (09:42 -0800)]
MDSMonitor: Do not set the rank of an MDS in standby-replay
or oneshot-replay modes.

This was causing issues with identification in various circumstances,
and turns out to be unnecessary. The MDS now will set its whoami
variable from the standby_for_rank field if that's appropriate.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoMDS: MDSMonitor: if MDS is in standby-replay and its leader goes down,
Greg Farnum [Wed, 8 Dec 2010 17:39:59 +0000 (09:39 -0800)]
MDS: MDSMonitor: if MDS is in standby-replay and its leader goes down,
take over as the MDS!

This means we can now exit standby-replay.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoMDLog: don't change expire_pos or read_pos on replay.
Greg Farnum [Tue, 7 Dec 2010 20:46:10 +0000 (12:46 -0800)]
MDLog: don't change expire_pos or read_pos on replay.

These are unnecessary or rendered irrelevant by previous commit
removing read_pos from the on-disk Header.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoJournaler: Remove the unused read_pos field.
Greg Farnum [Tue, 7 Dec 2010 19:48:08 +0000 (11:48 -0800)]
Journaler: Remove the unused read_pos field.

Rename it to unused_field, fill the in-memory read_pos
from header.expire_pos, and fill unused_field with the expire_pos
for safety.
(The on-disk header pos was used to fill in read_pos, but it was
always reset to expire_pos before being used and was only ever
set at the end of replay.)

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoMDS: miscellaneous standby-replay fixes and cleanups.
Greg Farnum [Fri, 3 Dec 2010 00:38:00 +0000 (16:38 -0800)]
MDS: miscellaneous standby-replay fixes and cleanups.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoMDS: make use of the hooks to start standby-replay.
Greg Farnum [Fri, 3 Dec 2010 00:36:22 +0000 (16:36 -0800)]
MDS: make use of the hooks to start standby-replay.

This doesn't include trim, and there's no way to exit the replay!

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoosd, rados: pgls filter cleanups
Yehuda Sadeh [Thu, 6 Jan 2011 19:09:01 +0000 (11:09 -0800)]
osd, rados: pgls filter cleanups

14 years agoobjecter: use raw_pg_to_pg when needed
Sage Weil [Thu, 6 Jan 2011 18:38:39 +0000 (10:38 -0800)]
objecter: use raw_pg_to_pg when needed

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMDS: Implement the hooks for standby_replay.
Greg Farnum [Wed, 1 Dec 2010 21:28:44 +0000 (13:28 -0800)]
MDS: Implement the hooks for standby_replay.

This commit adds the necessary state checks and machinery
for the MDS to go through a "looping" replay.
It does not yet implement online trimming, nor is there any
way to get the MDS into or out of a standby_replay state.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agojournaler: add reread_head_and_probe function.
Greg Farnum [Wed, 1 Dec 2010 18:05:16 +0000 (10:05 -0800)]
journaler: add reread_head_and_probe function.

It does both so callers don't need to implement
intermediate bottom-half handlers.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agomds: add expire_pos to the ESubtreeMap.
Greg Farnum [Tue, 30 Nov 2010 22:00:32 +0000 (14:00 -0800)]
mds: add expire_pos to the ESubtreeMap.

This will allow more efficient trimming during standby_replay.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agomds: extend the use of uint64_t instead of (signed) loff_t, et al.
Greg Farnum [Wed, 24 Nov 2010 21:44:37 +0000 (13:44 -0800)]
mds: extend the use of uint64_t instead of (signed) loff_t, et al.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agomds: rename is_standby_replay() to is_oneshot_replay.
Greg Farnum [Wed, 24 Nov 2010 21:28:49 +0000 (13:28 -0800)]
mds: rename is_standby_replay() to is_oneshot_replay.
This better represents its current purpose.

14 years agomds: Create new STATE_ONESHOT_REPLAY for the MDS.
Greg Farnum [Wed, 24 Nov 2010 00:20:05 +0000 (16:20 -0800)]
mds: Create new STATE_ONESHOT_REPLAY for the MDS.

This takes over the previous behavior of STATE_STANDBY_REPLAY,
allowing standby-replay to be used for the upcoming continuous-replay
that will enable hot standbys.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoJournaler: make reprobe() an asynchronous function.
Greg Farnum [Tue, 23 Nov 2010 00:19:06 +0000 (16:19 -0800)]
Journaler: make reprobe() an asynchronous function.

This better fits the spirit of the other functions, and the MDS itself.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoJournaler: make reread_head an asynchronous function.
Greg Farnum [Mon, 22 Nov 2010 20:39:34 +0000 (12:39 -0800)]
Journaler: make reread_head an asynchronous function.

This better fits the spirit of the other functions, and the MDS itself.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoJournaler: redefine states to make them all unique.
Greg Farnum [Mon, 22 Nov 2010 18:54:54 +0000 (10:54 -0800)]
Journaler: redefine states to make them all unique.

Apparently PROBING and ACTIVE being identical was a mistake.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
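
For illustration, a state enum with distinct values could be guarded at compile time as below. Only PROBING, ACTIVE and REPROBING are named in this log; the other names and all numeric values are assumptions.

```cpp
#include <cassert>

// Illustrative state list; values are assumptions, not the Journaler's.
enum JournalerStateSketch {
  STATE_UNDEF = 0,
  STATE_READHEAD = 1,
  STATE_PROBING = 2,
  STATE_ACTIVE = 3,
  STATE_REREADHEAD = 4,
  STATE_REPROBING = 5
};

// Fails the build if the two states are ever made identical again.
static_assert(STATE_PROBING != STATE_ACTIVE, "states must be distinct");

inline bool is_active(JournalerStateSketch s) { return s == STATE_ACTIVE; }
```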
14 years agoJournaler: Set the privacy of new functions correctly.
Greg Farnum [Fri, 19 Nov 2010 18:48:38 +0000 (10:48 -0800)]
Journaler: Set the privacy of new functions correctly.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoJournaler: use uint64_t instead of int64_t.
Greg Farnum [Fri, 19 Nov 2010 18:36:40 +0000 (10:36 -0800)]
Journaler: use uint64_t instead of int64_t.

Since the values can never be negative, this is far more appropriate,
and it results in fewer casts than the other way around.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoJournaler: Add function reprobe, to search for the new end of log.
Greg Farnum [Fri, 19 Nov 2010 18:13:47 +0000 (10:13 -0800)]
Journaler: Add function reprobe, to search for the new end of log.

Add new REPROBING state and split up new function probe() from _finish_read_head.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoJournaler: Add reset() function, which returns it to the immediate post-ctor state
Greg Farnum [Fri, 19 Nov 2010 02:19:30 +0000 (18:19 -0800)]
Journaler: Add reset() function, which returns it to the immediate post-ctor state

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoJournaler: Add a read-only setting, and asserts to make it fail on writes if readonly.
Greg Farnum [Thu, 18 Nov 2010 22:51:38 +0000 (14:51 -0800)]
Journaler: Add a read-only setting, and asserts to make it fail on writes if readonly.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoJournaler: add new reread_head function and state.
Greg Farnum [Thu, 18 Nov 2010 19:56:04 +0000 (11:56 -0800)]
Journaler: add new reread_head function and state.

This is to facilitate the forthcoming up_shadow MDS state.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoJournaler: remove unused vector<snapid_t> snaps from recover().
Greg Farnum [Thu, 18 Nov 2010 00:36:46 +0000 (16:36 -0800)]
Journaler: remove unused vector<snapid_t> snaps from recover().

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoJournaler: set state to STATE_ACTIVE in _finish_probe_end.
Greg Farnum [Thu, 18 Nov 2010 00:33:41 +0000 (16:33 -0800)]
Journaler: set state to STATE_ACTIVE in _finish_probe_end.

This was never actually getting set, although it doesn't matter
since STATE_ACTIVE and STATE_PROBING are defined to be the same.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
14 years agoobjecter, librados: propagate extra pgls info to client
Yehuda Sadeh [Thu, 6 Jan 2011 18:20:37 +0000 (10:20 -0800)]
objecter, librados: propagate extra pgls info to client

14 years agocommon: dout_create_rank_symlink: init if needed
Colin Patrick McCabe [Thu, 6 Jan 2011 02:15:12 +0000 (18:15 -0800)]
common: dout_create_rank_symlink: init if needed

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agomds: remove stray reference& in FragmentMarking context
Sage Weil [Thu, 6 Jan 2011 00:42:33 +0000 (16:42 -0800)]
mds: remove stray reference& in FragmentMarking context

Led to confusing occasional(!) crashes on marking completion.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: change refragment journaling/store strategy
Sage Weil [Wed, 5 Jan 2011 23:31:06 +0000 (15:31 -0800)]
mds: change refragment journaling/store strategy

We had a serious problem before where we were updating the cache and
redivvying up the dentries among fragments, but not immediately
journaling it.  This was okay only if we were lucky and no other update
journaled something (e.g. some random child journaling its ancestors).

Instead, journal (PREPARE) immediately and in parallel with the new
dirfrag stores.  When the stores complete, journal again (COMMIT).  On
journal replay, for any PREPAREs without matching COMMITS we immediately
journal a ROLLBACK.

Other behavior is essentially unchanged.  We don't send the notify until
both the PREPARE and STORES complete.  But that part doesn't really matter:
if we restart and rollback, peers will find out during resolve/rejoin,
as before.

Signed-off-by: Sage Weil <sage@newdream.net>
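
The replay rule described above (a PREPARE with no matching COMMIT gets a ROLLBACK) can be sketched as follows. The event type and the frag_id field are hypothetical; the real journal entries carry much more state.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

enum class FragOpSketch { Prepare, Commit, Rollback };

// Hypothetical journal event; frag_id ties related events together.
struct FragEventSketch {
  FragOpSketch op;
  std::uint64_t frag_id;
};

// Scans a replayed journal and returns the ids of PREPAREs that were never
// resolved by a COMMIT or ROLLBACK. Replay would journal a ROLLBACK for each.
std::vector<std::uint64_t> find_uncommitted(
    const std::vector<FragEventSketch>& journal) {
  std::set<std::uint64_t> pending;
  for (const FragEventSketch& e : journal) {
    if (e.op == FragOpSketch::Prepare)
      pending.insert(e.frag_id);
    else
      pending.erase(e.frag_id);
  }
  return std::vector<std::uint64_t>(pending.begin(), pending.end());
}
```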
14 years agomds: make adjust_dir_fragments always adjust fragtree
Sage Weil [Wed, 5 Jan 2011 23:17:36 +0000 (15:17 -0800)]
mds: make adjust_dir_fragments always adjust fragtree

If we have the inode but no dirfrags, we still need to adjust the
inode dirfragtree.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoMerge remote branch 'origin/testing' into unstable
Sage Weil [Thu, 6 Jan 2011 00:48:11 +0000 (16:48 -0800)]
Merge remote branch 'origin/testing' into unstable

14 years agomds: fix can_authpin assert on post-fragment commit
Sage Weil [Wed, 5 Jan 2011 20:49:58 +0000 (12:49 -0800)]
mds: fix can_authpin assert on post-fragment commit

We want to ignore the authpinnability check here; we already have the
(old) frag frozen, so no worries about starvation and retaking an auth_pin.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: add mds_debug_frag option
Sage Weil [Wed, 5 Jan 2011 19:51:30 +0000 (11:51 -0800)]
mds: add mds_debug_frag option

Verify dirfragtree matches any open dirfrags.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd, rados: pgls filter fixes
Yehuda Sadeh [Thu, 6 Jan 2011 00:50:07 +0000 (16:50 -0800)]
osd, rados: pgls filter fixes

14 years agocommon: make command-line programs log to stderr
Colin Patrick McCabe [Wed, 5 Jan 2011 19:04:49 +0000 (11:04 -0800)]
common: make command-line programs log to stderr

Command-line programs (as opposed to daemons) should send their logs to
stderr rather than to a log file, syslog, etc. This is especially
important because most users want to run the ceph command-line programs
as non-root, and often only root has permissions to add to the ceph
log directory.

Create a new function, set_foreground_logging, that overrides ceph.conf
settings to force all log output to stderr. For daemons, we still only
send the very highest priority messages to stderr, and only before they
daemonize().

Don't ever log to stdout because it interferes with scripts that parse
the output of stdout. Instead, log to stderr if the user gives the
--foreground or --nodaemon argument.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
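
As a sketch of the override described above: the settings struct, its field names, and the function signature here are hypothetical and do not reproduce the real set_foreground_logging or g_conf.

```cpp
#include <cassert>

// Hypothetical subset of logging settings as loaded from ceph.conf.
struct LogSettingsSketch {
  bool log_to_stderr = false;
  bool log_to_file = true;
  bool log_to_syslog = false;
};

// Forces all log output to stderr, regardless of the configured values.
void set_foreground_logging_sketch(LogSettingsSketch& s) {
  s.log_to_stderr = true;
  s.log_to_file = false;
  s.log_to_syslog = false;
}
```

A command-line tool would call this after parsing its configuration, so a non-root user never needs write access to the log directory.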
14 years agorgw_admin: call common_set_defaults as non-daemon
Colin Patrick McCabe [Wed, 5 Jan 2011 23:29:48 +0000 (15:29 -0800)]
rgw_admin: call common_set_defaults as non-daemon

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agodebian: update scripts to build ubuntu (maverick, lucid) packages too
Sage Weil [Wed, 5 Jan 2011 20:40:55 +0000 (12:40 -0800)]
debian: update scripts to build ubuntu (maverick, lucid) packages too

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: move flock types into separate header
Sage Weil [Wed, 5 Jan 2011 17:33:37 +0000 (09:33 -0800)]
mds: move flock types into separate header

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agorados tool: Remove duplicate line in usage
Wido den Hollander [Wed, 5 Jan 2011 13:08:14 +0000 (14:08 +0100)]
rados tool: Remove duplicate line in usage

Signed-off-by: Wido den Hollander <wido@widodh.nl>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agocommon: generic_dout needs to take the dout mutex
Colin Patrick McCabe [Wed, 5 Jan 2011 02:05:11 +0000 (18:05 -0800)]
common: generic_dout needs to take the dout mutex

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosd: add pgls filtering by parent ino
Yehuda Sadeh [Wed, 5 Jan 2011 01:17:51 +0000 (17:17 -0800)]
osd: add pgls filtering by parent ino

14 years agocommon: handle_fatal_signal: print threadid in hex
Colin Patrick McCabe [Wed, 5 Jan 2011 01:02:48 +0000 (17:02 -0800)]
common: handle_fatal_signal: print threadid in hex

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agomds: fix ancestor backtrace encoding
Sage Weil [Wed, 5 Jan 2011 00:14:43 +0000 (16:14 -0800)]
mds: fix ancestor backtrace encoding

Use explicit types to capture the encoding.  Include object ino in the
inode_backtrace_t so that the xattr can stand alone.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: force fragmentation for ambiguous imports as well
Sage Weil [Tue, 4 Jan 2011 22:45:34 +0000 (14:45 -0800)]
mds: force fragmentation for ambiguous imports as well

Handle needed refragmentation for processing ambiguous bounds.  That means
forcing the peers' subtree root fragmentation, and also interpreting the
peer's bounds appropriately, given that the peer's fragmentation may not
match our own.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: make resolve adjust dir fragmentation as needed
Sage Weil [Tue, 4 Jan 2011 22:39:58 +0000 (14:39 -0800)]
mds: make resolve adjust dir fragmentation as needed

During resolve, adjust dir fragmentation as needed based on the subtrees
the sender explicitly claims.  The given fragmentation on the root is
always valid.  Their bounds may not be; only split our frags as needed if
they happen to be partially in and partially out of the sender's bounding
fragset.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: make get_dirfrags_under behave when dirfragtree is not coherent with dirfrag set
Sage Weil [Tue, 4 Jan 2011 22:35:41 +0000 (14:35 -0800)]
mds: make get_dirfrags_under behave when dirfragtree is not coherent with dirfrag set

This is (currently) the case during replay/resolve, although it's not
clear that it should be.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agofrag: const cleanup fragset_t
Sage Weil [Tue, 4 Jan 2011 22:03:11 +0000 (14:03 -0800)]
frag: const cleanup fragset_t

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd, objecter: pgls filtering option
Yehuda Sadeh [Tue, 4 Jan 2011 23:00:34 +0000 (15:00 -0800)]
osd, objecter: pgls filtering option

14 years agoPG: Fixes bug in _scrub with checking clones
Samuel Just [Tue, 4 Jan 2011 22:30:15 +0000 (14:30 -0800)]
PG: Fixes bug in _scrub with checking clones

I introduced this bug in
4a4a1e53c7d380cd0b582c1d0685fd0ef4ef1711.
curclone++ not curclone--.

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
14 years agoosd: set default pg_bits higher; pgp_bits to old value
Sage Weil [Tue, 4 Jan 2011 19:29:03 +0000 (11:29 -0800)]
osd: set default pg_bits higher; pgp_bits to old value

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoassert: print thread id in hex
Sage Weil [Tue, 4 Jan 2011 18:50:06 +0000 (10:50 -0800)]
assert: print thread id in hex

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoPG: Fix bug in scrub when checking clone sizes
Samuel Just [Tue, 4 Jan 2011 00:48:39 +0000 (16:48 -0800)]
PG: Fix bug in scrub when checking clone sizes

Previously, _scrub checked:
assert(p->second.size == snapset.clone_size[curclone])

curclone was, however, an index into snapset.clones rather than a
snapid_t.  For clarity, curclone is now an iterator.

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
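
To make the index-versus-key confusion concrete, here is a simplified sketch. SnapSetSketch and the typedef are hypothetical; the point is only that clone_size is keyed by snap id, so the lookup has to dereference the iterator into clones rather than use a position.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

typedef std::uint64_t snapid_sketch_t;

// Hypothetical, reduced snap set: clones in order, sizes keyed by snap id.
struct SnapSetSketch {
  std::vector<snapid_sketch_t> clones;
  std::map<snapid_sketch_t, std::uint64_t> clone_size;
};

// Looks up the recorded size of the clone that `curclone` points at.
std::uint64_t expected_clone_size(
    const SnapSetSketch& s,
    std::vector<snapid_sketch_t>::const_iterator curclone) {
  assert(curclone != s.clones.end());
  return s.clone_size.at(*curclone);  // key is the snap id, not the index
}
```

With clones {10, 20}, using the position 1 as the key would look up a snap id that does not exist, whereas dereferencing the iterator yields the key 20.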
14 years agoclient: fix frag selection code
Sage Weil [Tue, 4 Jan 2011 18:20:18 +0000 (10:20 -0800)]
client: fix frag selection code

Calling fragtree_t::contains() on a non-frag_t is nonsense and will crash.
And a fragtree is a complete partition of the space.  What we really want
to check is if we know where to find the specific frag_t we need.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agovstart.sh: specify keyring in ceph.conf
Sage Weil [Tue, 4 Jan 2011 18:18:29 +0000 (10:18 -0800)]
vstart.sh: specify keyring in ceph.conf

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoremove ancient uofs.h
Sage Weil [Tue, 4 Jan 2011 17:16:52 +0000 (09:16 -0800)]
remove ancient uofs.h

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomkcephfs: Clarified numosd message
Matthew Roy [Fri, 31 Dec 2010 07:42:14 +0000 (02:42 -0500)]
mkcephfs: Clarified numosd message

Signed-off-by: Matthew Roy <matthew@royhousehold.net>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: assert no submit_entry during replay state
Sage Weil [Fri, 24 Dec 2010 17:00:28 +0000 (09:00 -0800)]
mds: assert no submit_entry during replay state

We should never submit items to the journal during replay.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomds: start new log segment at resolve start, not replay finish
Sage Weil [Fri, 24 Dec 2010 17:00:02 +0000 (09:00 -0800)]
mds: start new log segment at resolve start, not replay finish

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: clean up backlog generation checks a bit
Sage Weil [Fri, 24 Dec 2010 16:36:28 +0000 (08:36 -0800)]
osd: clean up backlog generation checks a bit

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: generate backlog if needed to get last_complete >= log.tail || backlog
Sage Weil [Fri, 24 Dec 2010 16:36:05 +0000 (08:36 -0800)]
osd: generate backlog if needed to get last_complete >= log.tail || backlog

If primary or a replica has a mistrimmed pg log, we need to generate the
backlog during peering.  This sucks, because the PG won't go active for
a long time, but it's what happens when there's a bug in the code that
mis-trims the PG log!

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoosd: send sufficient log to compensate for replicas with last_complete < log.tail
Sage Weil [Fri, 24 Dec 2010 16:27:38 +0000 (08:27 -0800)]
osd: send sufficient log to compensate for replicas with last_complete < log.tail

If a replica has last_complete < log.tail and no backlog, send enough log
for them to get back into a consistent state.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agocommon: Implement max open files
Colin Patrick McCabe [Tue, 4 Jan 2011 01:18:14 +0000 (17:18 -0800)]
common: Implement max open files

In init-ceph, call ulimit -n if the user has set a maximum number of
open files, and the current maximum number of files is different.

Modify sample.ceph.conf to suggest setting a high maximum number of open
files.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
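
The commit does this in the init-ceph shell script with ulimit -n. The same decision, expressed with the POSIX getrlimit/setrlimit calls, might be sketched like this; the function names and the convention that 0 means "not set" are assumptions.

```cpp
#include <cassert>
#include <sys/resource.h>

// True if the user configured a limit (0 means "not set" in this sketch)
// and it differs from the current soft limit.
bool needs_nofile_update(rlim_t configured, rlim_t current) {
  return configured != 0 && configured != current;
}

// Applies the configured limit if needed.
// Returns 0 on success or when nothing had to change, -1 on failure.
int apply_max_open_files(rlim_t configured) {
  struct rlimit lim;
  if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
    return -1;
  if (!needs_nofile_update(configured, lim.rlim_cur))
    return 0;
  lim.rlim_cur = configured;
  if (lim.rlim_max < configured)
    lim.rlim_max = configured;  // raising the hard limit needs privilege
  return setrlimit(RLIMIT_NOFILE, &lim);
}
```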
14 years agoosd: Make g_conf.osd_max_notify_timeout a uint32_t
Colin Patrick McCabe [Tue, 4 Jan 2011 00:11:33 +0000 (16:11 -0800)]
osd: Make g_conf.osd_max_notify_timeout a uint32_t

Make g_conf.osd_max_notify_timeout a uint32_t. Squashes an annoying
compiler warning and avoids the awkward issue of users specifying
negative timeouts.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoMerge branch 'testing' into unstable
Sage Weil [Mon, 3 Jan 2011 23:15:26 +0000 (15:15 -0800)]
Merge branch 'testing' into unstable

14 years agomds: load root inode on replay if auth
Sage Weil [Mon, 3 Jan 2011 22:32:48 +0000 (14:32 -0800)]
mds: load root inode on replay if auth

If we are auth for the root inode, load its initial value off of disk. We
may not see it in the log if it has not been modified.  If it has, this
is useless but fast/harmless.  This only occurs for brand-new filesystems
where the mds is immediately restarted.

Fixes #671.

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agomsgr: Unlock dispatch_queue.lock when short-circuiting queue_received.
Greg Farnum [Mon, 3 Jan 2011 22:14:00 +0000 (14:14 -0800)]
msgr: Unlock dispatch_queue.lock when short-circuiting queue_received.

Previously we left the mutex locked, which is obviously bad bad bad!
I believe this was the cause of #673.

Signed-off-by: Greg Farnum <gregf@hq.newdream.net>
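
One common way to rule out this class of bug is a scoped lock that releases on every return path. The sketch below uses std::mutex and std::lock_guard with hypothetical names; it is not the fix this commit made, which unlocked the existing mutex explicitly.

```cpp
#include <cassert>
#include <mutex>
#include <queue>

// Hypothetical, reduced dispatch queue.
struct DispatchQueueSketch {
  std::mutex lock;
  std::queue<int> messages;
  bool halted = false;

  // Returns false without queueing when halted. The guard unlocks the mutex
  // on the early return as well as on the normal path.
  bool queue_received(int msg) {
    std::lock_guard<std::mutex> guard(lock);
    if (halted)
      return false;  // short-circuit: mutex is still released here
    messages.push(msg);
    return true;
  }
};
```

If the short-circuit path left the mutex held, the next call to queue_received would block forever.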
14 years agofilestore: assert on out of order journal pipeline submissions
Sage Weil [Mon, 3 Jan 2011 21:14:49 +0000 (13:14 -0800)]
filestore: assert on out of order journal pipeline submissions

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agofilestore: fix wake condition when journal submission blocks
Sage Weil [Mon, 3 Jan 2011 21:14:13 +0000 (13:14 -0800)]
filestore: fix wake condition when journal submission blocks

We only want to wake up if we are at the front of the line, in order to
preserve journal submission pipeline ordering.

This fixes, among other things, messages in the log like

2010-12-21 10:38:42.515974 7f0861486700 journal op_submit_finish 5364 expected 5370, OUT OF ORDER

and bug #666.

Signed-off-by: Sage Weil <sage@newdream.net>
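
The wake condition described above can be reduced to a small predicate. This is a sketch with hypothetical names, leaving out the condition variable and locking that the real filestore code would need.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical line of ops blocked on journal submission, oldest first.
struct SubmitLineSketch {
  std::deque<std::uint64_t> waiting;
  bool journal_has_room = false;

  // An op may wake and submit only if there is room and it is at the front
  // of the line; anything else would reorder the submission pipeline.
  bool may_wake(std::uint64_t op) const {
    return journal_has_room && !waiting.empty() && waiting.front() == op;
  }
};
```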
14 years agocommon: print thread ID in sig handlers and assert
Colin Patrick McCabe [Mon, 3 Jan 2011 20:22:56 +0000 (12:22 -0800)]
common: print thread ID in sig handlers and assert

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agomds: fix purge_stray for directories, zeroed layouts
Sage Weil [Mon, 3 Jan 2011 19:50:53 +0000 (11:50 -0800)]
mds: fix purge_stray for directories, zeroed layouts

- We don't want to purge file content on directories
- Don't fall over if a file has a zero period

Reported-by: Paul Komkoff <i@stingr.net>
Signed-off-by: Sage Weil <sage@newdream.net>
14 years agorbd: add watch option for rbd tool
Yehuda Sadeh [Mon, 3 Jan 2011 19:37:32 +0000 (11:37 -0800)]
rbd: add watch option for rbd tool

14 years agoosd: PG::Info::History: init last_epoch_clean
Colin Patrick McCabe [Wed, 29 Dec 2010 01:03:12 +0000 (17:03 -0800)]
osd: PG::Info::History: init last_epoch_clean

It seems that we have not been zeroing
PG::Info::History::last_epoch_clean when the History structure is
created. This led to some very interesting log output (and bugs!)

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
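
A minimal sketch of the fix pattern, assuming a reduced History structure. Only last_epoch_clean is named in the message; the other fields and the type name are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, reduced history record.
struct HistorySketch {
  std::uint32_t epoch_created;
  std::uint32_t last_epoch_started;
  std::uint32_t last_epoch_clean;

  // Every member is zeroed explicitly, so a default-constructed record never
  // carries whatever happened to be in memory.
  HistorySketch()
      : epoch_created(0), last_epoch_started(0), last_epoch_clean(0) {}
};
```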
14 years agoMerge branch 'testing' into unstable
Sage Weil [Mon, 3 Jan 2011 18:24:47 +0000 (10:24 -0800)]
Merge branch 'testing' into unstable

Conflicts:
configure.ac

14 years agoMerge remote branch 'origin/keyring_cleanup' into unstable
Sage Weil [Mon, 3 Jan 2011 18:24:08 +0000 (10:24 -0800)]
Merge remote branch 'origin/keyring_cleanup' into unstable

14 years agodebian: try to update pbuild env as needed
Sage Weil [Mon, 20 Dec 2010 23:59:14 +0000 (15:59 -0800)]
debian: try to update pbuild env as needed

Signed-off-by: Sage Weil <sage@newdream.net>
14 years agoSimpleMessenger.cc: Fixes a dispatch_throttler leak in queue_received
Samuel Just [Wed, 1 Dec 2010 00:52:40 +0000 (16:52 -0800)]
SimpleMessenger.cc: Fixes a dispatch_throttler leak in queue_received
when the pipe has been halted.

Signed-off-by: Samuel Just <samuelj@hq.newdream.net>
14 years agoauth: CEPH_KEYRING overrides g_conf.keyring
Colin Patrick McCabe [Sun, 2 Jan 2011 20:50:53 +0000 (12:50 -0800)]
auth: CEPH_KEYRING overrides g_conf.keyring

Allow users to choose different keyring files by setting an environment
variable, CEPH_KEYRING.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
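
The override could be sketched as below. The function name is hypothetical, and treating an empty CEPH_KEYRING as unset is an assumption rather than something the commit states.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Picks the keyring path: the CEPH_KEYRING environment variable, when set
// and non-empty, wins over the value from the configuration.
std::string choose_keyring(const std::string& conf_keyring) {
  const char* env = std::getenv("CEPH_KEYRING");
  if (env != nullptr && *env != '\0')
    return std::string(env);
  return conf_keyring;
}
```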
14 years agoauth: make g_conf.keyring a plain old string
Colin Patrick McCabe [Sun, 2 Jan 2011 20:19:35 +0000 (12:19 -0800)]
auth: make g_conf.keyring a plain old string

Make g_conf.keyring a plain old string rather than an array of strings.
Don't do substitution using the user's HOME variable-- this could lead
to security holes for setuid processes.

Get rid of AuthMonitor::read_keyfile because there is already a Keyring
member function, Keyring::load, that does the same thing.

qa/rbd/common.sh: we can now use cconf to figure out what the keyring
is.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agoosdmaptool: better error handling
Colin Patrick McCabe [Thu, 30 Dec 2010 23:15:39 +0000 (15:15 -0800)]
osdmaptool: better error handling

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agocommon: bufferlist: handle EINTR, check close rval
Colin Patrick McCabe [Thu, 30 Dec 2010 23:04:49 +0000 (15:04 -0800)]
common: bufferlist: handle EINTR, check close rval

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
14 years agocommon: bufferlist::read_file: return read errors
Colin Patrick McCabe [Thu, 30 Dec 2010 22:41:03 +0000 (14:41 -0800)]
common: bufferlist::read_file: return read errors

Don't ignore errors when reading a file with buffer::list.

Signed-off-by: Colin McCabe <colinm@hq.newdream.net>
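
The two bufferlist commits above concern retrying on EINTR and reporting read errors instead of ignoring them. A sketch of that pattern with the POSIX read call follows; read_retrying is a hypothetical helper, not the buffer::list code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Reads up to len bytes from fd, retrying if a signal interrupts the call.
// Returns the number of bytes read (0 at end of file) or -errno on error.
ssize_t read_retrying(int fd, char* buf, std::size_t len) {
  for (;;) {
    ssize_t r = ::read(fd, buf, len);
    if (r >= 0)
      return r;
    if (errno == EINTR)
      continue;     // interrupted before any data arrived: try again
    return -errno;  // a real error: report it to the caller
  }
}
```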