git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
13 years agomon: parse 0 values properly
Sage Weil [Mon, 24 Oct 2011 18:41:13 +0000 (11:41 -0700)]
mon: parse 0 values properly

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: allow adjustment of per-pool crash_replay_interval
Sage Weil [Mon, 24 Oct 2011 18:27:20 +0000 (11:27 -0700)]
mon: allow adjustment of per-pool crash_replay_interval

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: need to print pool id for output to be useful
Sage Weil [Mon, 24 Oct 2011 04:05:56 +0000 (21:05 -0700)]
mon: need to print pool id for output to be useful

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: make osd dump slightly more concise
Sage Weil [Mon, 24 Oct 2011 02:01:54 +0000 (19:01 -0700)]
osd: make osd dump slightly more concise

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: pg_pool_t: set crash_replay_interval on data pool when decoding old
Sage Weil [Sun, 23 Oct 2011 23:16:03 +0000 (16:16 -0700)]
osd: pg_pool_t: set crash_replay_interval on data pool when decoding old

We want to preserve the crash_replay_interval on old clusters being
upgraded.  Kludge this by setting it to 60 (the old default) if the
crush_ruleset == 0 and owner == 0, which is normally true for just the
data pool.

This may catch other pools they created by hand, but it's still better
than having the replay interval for all pools when it is not needed.

Signed-off-by: Sage Weil <sage@newdream.net>
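A minimal sketch of the upgrade kludge described above, not the actual decoder.  `PoolSketch`, `fixup_after_old_decode`, and `kOldDefaultReplayInterval` are hypothetical names; only the condition (crush_ruleset == 0 and owner == 0) and the value 60 come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the pool record; the real pg_pool_t
// carries many more fields and a binary decoder.
struct PoolSketch {
  uint32_t crush_ruleset = 0;
  uint64_t owner = 0;
  uint32_t crash_replay_interval = 0;  // seconds; 0 means no replay window
};

// "the old default" named in the commit message
constexpr uint32_t kOldDefaultReplayInterval = 60;

// Intended to run after decoding a pool from an encoding that predates the
// per-pool field.  Pools that look like the data pool keep the old window;
// all other pools are left at 0.
inline void fixup_after_old_decode(PoolSketch& p) {
  if (p.crush_ruleset == 0 && p.owner == 0)
    p.crash_replay_interval = kOldDefaultReplayInterval;
}
```

As the message notes, a hand-created pool with the same ruleset and owner would also match this test.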
13 years agoosd: make osd replay interval a per-pool property
Sage Weil [Sun, 23 Oct 2011 22:32:58 +0000 (15:32 -0700)]
osd: make osd replay interval a per-pool property

Change the config value to only control the interval set when the data
pool is first created (presumably during mkfs).  Start replay interval
based on the pool property.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote-tracking branch 'gh/master' into n
Sage Weil [Sun, 23 Oct 2011 23:26:35 +0000 (16:26 -0700)]
Merge remote-tracking branch 'gh/master' into n

Conflicts:
src/osd/OSDMap.h

13 years agoosd: pg_pool_t: introduce flags, crash_replay_interval
Sage Weil [Thu, 20 Oct 2011 04:54:40 +0000 (21:54 -0700)]
osd: pg_pool_t: introduce flags, crash_replay_interval

Introduce a per-pool crash_replay_interval so we can control whether
the OSD waits for replayed ACKed but not COMMITted requests for this
PG.  For the metadata and rbd pools, for instance, the replay window
is useless.

Introduce a generic flags field, while we're modifying the encoding.

No new feature bit; piggyback on POOL3.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: pg_pool_t: normalize encoding
Sage Weil [Thu, 20 Oct 2011 04:47:50 +0000 (21:47 -0700)]
osd: pg_pool_t: normalize encoding

Normalize encoding to be less awkward.  Use a FEATURE bit to indicate
whether the new encoding is supported, and encode appropriately.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoscratchtool[pp]: fix rados_conf_set/get test of log_to_stderr
Sage Weil [Sun, 23 Oct 2011 03:44:05 +0000 (20:44 -0700)]
scratchtool[pp]: fix rados_conf_set/get test of log_to_stderr

Fix this warning

warning: scratchtool.c:142: comparison with string literal results in unspecified behavior

and flip the logic.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
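The warning above comes from comparing a `char*` against a string literal with `==`, which compares addresses rather than contents.  A small sketch of the content comparison follows; `value_matches` is a hypothetical helper, not code from scratchtool.

```cpp
#include <cassert>
#include <cstring>

// Returns true when the value read back into `buf` has the same contents
// as `expected`.
inline bool value_matches(const char* buf, const char* expected) {
  // `buf == expected` would compare pointers; with a string literal the
  // result is unspecified.  strcmp compares contents and returns 0 when
  // the strings are equal.
  return std::strcmp(buf, expected) == 0;
}
```

Because strcmp returns 0 on equality, a bare `if (strcmp(a, b))` tests for a mismatch, which is one way a test's logic can end up inverted.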
13 years agoosd: fix PG::Log::copy_after wrt backlogs (again)
Sage Weil [Sun, 23 Oct 2011 03:41:03 +0000 (20:41 -0700)]
osd: fix PG::Log::copy_after wrt backlogs (again)

Commit 68fe748fc2d703623050e8f2a448a0fd31ca8a0f fixed half of this problem,
but set this->tail incorrectly.  If we read olog.tail, the entry we are
on is a backlog entry, and probably not other.tail.  Do not reset tail in
this case because we already set it to other.tail above.

OTOH if we hit v, we do want to set this->tail to the current record as it
is the one that precedes the first log entry.

This fixes an incorrect log.tail sent to other nodes, which eventually
propagates as a log bound mismatch.  For example,

2011-10-22 17:33:18.654693 7f8a2fefe700 osd.4 2788 pg[1.1f( v 1627'28 (1627'28,1627'28] n=2 ec=1 les/c 2763/2782 2788/2788/2788) [4,0] r=0 mlcod 0'0 !hml peering] merge_log log(578'5,1627'28] from osd.0 into log(1627'28,1627'28]
2011-10-22 17:33:18.654706 7f8a2fefe700 osd.4 2788 pg[1.1f( v 1627'28 (1627'28,1627'28] n=2 ec=1 les/c 2763/2782 2788/2788/2788) [4,0] r=0 mlcod 0'0 !hml peering] merge_log extending tail to 578'5
2011-10-22 17:33:18.654720 7f8a2fefe700 osd.4 2788 pg[1.1f( v 1627'28 (578'5,1627'28] n=2 ec=1 les/c 2763/2782 2788/2788/2788) [4,0] r=0 (log bound mismatch, empty) mlcod 0'0 !hml peering] merge_log result log(578'5,1627'28] missing(0) changed=1

This might fix #1526.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoradosgw: drop useless/broken set_val daemonize
Sage Weil [Fri, 21 Oct 2011 23:36:08 +0000 (16:36 -0700)]
radosgw: drop useless/broken set_val daemonize

Not sure what the intent was here anyway... but it is broken (the func
takes a string, not a bool).

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoconfig: separate --log-to-stderr and --err-to-stderr
Sage Weil [Fri, 21 Oct 2011 23:35:36 +0000 (16:35 -0700)]
config: separate --log-to-stderr and --err-to-stderr

Instead of having magic values (1 == errors only to stderr, 2 ==
everything), have two booleans.

Signed-off-by: Sage Weil <sage@newdream.net>
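One way to picture the change is a mapping from the old magic value to the two booleans.  This is an illustrative sketch only: `StderrOptions` and `from_legacy_value` are hypothetical, and treating "everything" as both flags set is an assumption.

```cpp
#include <cassert>

// Hypothetical pair of options mirroring --log-to-stderr / --err-to-stderr.
struct StderrOptions {
  bool log_to_stderr;
  bool err_to_stderr;
};

// Translate the old magic value (1 = errors only, 2 = everything) into
// the two independent booleans.  Any other value disables both.
inline StderrOptions from_legacy_value(int v) {
  StderrOptions o = {false, false};
  if (v == 1) {
    o.err_to_stderr = true;
  } else if (v == 2) {
    o.log_to_stderr = true;
    o.err_to_stderr = true;  // assumption: "everything" includes errors
  }
  return o;
}
```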
13 years agoosd: trim past intervals when we complete recovery.
Sage Weil [Fri, 21 Oct 2011 22:24:18 +0000 (15:24 -0700)]
osd: trim past intervals when we complete recovery.

We weren't trimming at all, which meant these would just accumulate
indefinitely.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: move may_need_replay calculation out of PriorSet
Sage Weil [Fri, 21 Oct 2011 22:23:51 +0000 (15:23 -0700)]
osd: move may_need_replay calculation out of PriorSet

Although they both depend on past intervals, they are unrelated.  Factor
out the may_need_replay calculation from PriorSet.  Instead, do it right
before we activate when we need to decide whether to do a replay window
or not.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix last_clean interval bounds
Sage Weil [Fri, 21 Oct 2011 22:02:34 +0000 (15:02 -0700)]
osd: fix last_clean interval bounds

It was _first and _last, inclusive, but the epochs are really points in
time, so _last should have been non-inclusive.  Rename the variables
_begin and _end, print them as proper intervals [begin,end), and fix the
PriorSet calculation to interpret the end bound properly.

Also break that check out into separate cases so that it is clear what is
really happening.

Signed-off-by: Sage Weil <sage@newdream.net>
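The half-open convention described above can be sketched as follows.  `Interval` and `format_interval` are hypothetical names for illustration; the real fields live in the OSD info structures.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

using epoch_t = uint32_t;

// Half-open epoch interval [begin,end): begin is included, end is not.
struct Interval {
  epoch_t begin;
  epoch_t end;

  bool contains(epoch_t e) const { return begin <= e && e < end; }
  bool empty() const { return begin >= end; }
};

// Print in the "[begin,end)" form mentioned in the commit message.
inline std::string format_interval(const Interval& i) {
  std::ostringstream ss;
  ss << "[" << i.begin << "," << i.end << ")";
  return ss.str();
}
```

With this convention two adjacent intervals [a,b) and [b,c) share no epoch, which is the property an inclusive _last cannot give.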
13 years agomon: fix last_clean_interval calculation
Sage Weil [Fri, 21 Oct 2011 21:45:59 +0000 (14:45 -0700)]
mon: fix last_clean_interval calculation

This up_from == first check is old and wrong.  It may have been correct at
the time, when the OSD had a defined shutdown procedure, but that is not
currently the case.  And if/when it is, the OSD can simply provide an
accurate clean_thru value.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: eliminate CRASHED state
Sage Weil [Fri, 21 Oct 2011 21:44:56 +0000 (14:44 -0700)]
osd: eliminate CRASHED state

This was an intermediate state that indicated that replay would be needed.
It was poorly named, and not very useful.  Instead, just set the REPLAY
bit if we need replay, and then do it.  No need for a separate CRASHED.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoReplicatedPG: Include pg version in MOSDOpReply on error
Samuel Just [Fri, 21 Oct 2011 22:14:43 +0000 (15:14 -0700)]
ReplicatedPG: Include pg version in MOSDOpReply on error

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoosd: simplify finalizing scrub on replica
Sage Weil [Fri, 21 Oct 2011 16:56:19 +0000 (09:56 -0700)]
osd: simplify finalizing scrub on replica

We can simply call osr.flush() (with pg lock held) to ensure that prior
writes are visible and scrubbable.  This avoids the funky handoff to
op_applied() (which didn't seem to work for me just now, although I didn't
fully debug).

In any case, this is much simpler.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PriorSet: acting/up membership implies still alive
Sage Weil [Fri, 21 Oct 2011 16:14:15 +0000 (09:14 -0700)]
osd: PriorSet: acting/up membership implies still alive

If the osd is in the acting or up sets, we can assume they are still alive,
even though we don't know that for sure, because if they are not, we will
rebuild PriorSet.

Note that we have a dependency here on up_thru that we could/should rebuild
PriorSet based on, IF we think it might change the value of the CRASHED
flag and IF we care enough.  Right now we don't.  Marking CRASHED when we
don't need to is conservative, and not dangerous.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote branch 'gh/master' into wip-prior
Sage Weil [Fri, 21 Oct 2011 15:58:43 +0000 (08:58 -0700)]
Merge remote branch 'gh/master' into wip-prior

Conflicts:
src/osd/PG.cc

13 years agoOSDMonitor: reweight towards average utilization
Josh Durgin [Fri, 21 Oct 2011 00:13:21 +0000 (17:13 -0700)]
OSDMonitor: reweight towards average utilization

The existing reweight-by-utilization calculation did not take into
account the current weight of an OSD, and depended in part on the
threshold given by the user. Also send the user both the old and new
weights.

Fixes: #1636
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
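A rough sketch of reweighting toward the average, under stated assumptions: the commit message does not give the formula, so scaling the current weight by average_util / util is an assumption here, and `ReweightResult` and `reweight_toward_average` are hypothetical names.

```cpp
#include <cassert>

// Both weights are returned so the caller can report old and new values.
struct ReweightResult {
  double old_weight;
  double new_weight;
};

// Scale the OSD's *current* weight by how far its utilization sits above
// the cluster average.  OSDs at or below the average are left unchanged.
inline ReweightResult reweight_toward_average(double current_weight,
                                              double util,
                                              double average_util) {
  ReweightResult r = {current_weight, current_weight};
  if (util > 0.0 && util > average_util)
    r.new_weight = current_weight * (average_util / util);
  return r;
}
```

Starting from the current weight means repeated runs converge instead of resetting earlier adjustments.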
13 years agoosd: PG::PriorSet: make debug_pg arg const
Sage Weil [Thu, 20 Oct 2011 22:56:15 +0000 (15:56 -0700)]
osd: PG::PriorSet: make debug_pg arg const

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet -> PriorSet
Sage Weil [Thu, 20 Oct 2011 22:51:10 +0000 (15:51 -0700)]
osd: PgPriorSet -> PriorSet

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet: rename prior_set_affected -> affected_by_map
Sage Weil [Thu, 20 Oct 2011 22:50:33 +0000 (15:50 -0700)]
osd: PgPriorSet: rename prior_set_affected -> affected_by_map

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet: remove obsolete comment
Sage Weil [Thu, 20 Oct 2011 22:47:54 +0000 (15:47 -0700)]
osd: PgPriorSet: remove obsolete comment

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet: move prior_set_affected into PgPriorSet
Sage Weil [Thu, 20 Oct 2011 22:46:43 +0000 (15:46 -0700)]
osd: PgPriorSet: move prior_set_affected into PgPriorSet

This is really where it belongs.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet: kill whoami; make PG arg strictly optional
Sage Weil [Thu, 20 Oct 2011 22:46:11 +0000 (15:46 -0700)]
osd: PgPriorSet: kill whoami; make PG arg strictly optional

It is only used for the debug output prefix.  Make it so we can leave it
out entirely (e.g. for unit tests).

We don't want to, say, pass in the string prefix itself, or else we are
stuck with generating that string even on low debug levels where it won't
be used.

Signed-off-by: Sage Weil <sage@newdream.net>
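The reasoning about the debug prefix can be illustrated with a small sketch.  `PGSketch` and `debug_line` are hypothetical; the point is only that passing an optional object pointer lets the prefix string be built lazily.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a PG that can describe itself.
struct PGSketch {
  int id;
  mutable int prefix_builds;  // counts how often the prefix was generated

  explicit PGSketch(int i) : id(i), prefix_builds(0) {}

  std::string gen_prefix() const {
    ++prefix_builds;
    return "pg[" + std::to_string(id) + "] ";
  }
};

// Format a debug line.  The prefix is generated only when the message
// will actually be emitted, and `pg` may be null (e.g. in unit tests).
inline std::string debug_line(const PGSketch* pg, int msg_level,
                              int debug_level, const std::string& msg) {
  if (msg_level > debug_level)
    return std::string();  // suppressed: no prefix is built
  return (pg ? pg->gen_prefix() : std::string()) + msg;
}
```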
13 years agoMerge branch 'stable'
Sage Weil [Thu, 20 Oct 2011 21:12:40 +0000 (14:12 -0700)]
Merge branch 'stable'

13 years agoosd: fix requeue_ops
Sage Weil [Thu, 20 Oct 2011 21:11:20 +0000 (14:11 -0700)]
osd: fix requeue_ops

The ls argument passed to requeue_ops() is a reference, and one of the
methods we call (say, _handle_op) might want to requeue the message on the
same list we were passed, leading to an infinite loop.

Set ls contents aside to avoid that.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
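A minimal sketch of "setting the list contents aside", assuming a simplified signature.  `Message` and the `handle` callback are hypothetical; the real requeue_ops() dispatches to OSD handlers.

```cpp
#include <cassert>
#include <functional>
#include <list>

using Message = int;  // hypothetical message type

// Process every message currently in `ls`.  The handler may push messages
// back onto `ls`; because the contents were swapped out first, those new
// entries are not visited by this call.  Returns the number handled.
inline int requeue_ops(
    std::list<Message>& ls,
    const std::function<void(Message, std::list<Message>&)>& handle) {
  std::list<Message> pending;
  pending.swap(ls);  // set contents aside; `ls` is now empty
  int handled = 0;
  for (Message m : pending) {
    handle(m, ls);
    ++handled;
  }
  return handled;
}
```

Iterating `ls` directly while the handler appends to it would never reach the end of the list, which is the loop the commit describes.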
13 years agoperfcounters: remove dout
Sage Weil [Thu, 20 Oct 2011 20:59:12 +0000 (13:59 -0700)]
perfcounters: remove dout

We can't use this because we're part of libglobal and there is no
g_ceph_context.  And I'm too lazy to use cct.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoperfcounters: fix unit test
Sage Weil [Thu, 20 Oct 2011 20:58:14 +0000 (13:58 -0700)]
perfcounters: fix unit test

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge remote branch 'gh/wip-unfound'
Sage Weil [Thu, 20 Oct 2011 20:48:44 +0000 (13:48 -0700)]
Merge remote branch 'gh/wip-unfound'

13 years agofilestore: measure commit interval, latency, journal full count
Sage Weil [Thu, 20 Oct 2011 20:16:28 +0000 (13:16 -0700)]
filestore: measure commit interval, latency, journal full count

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: clean up perfcounter names
Sage Weil [Thu, 20 Oct 2011 19:45:34 +0000 (12:45 -0700)]
osd: clean up perfcounter names

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilestore: simplify, clean up perfcounters
Sage Weil [Thu, 20 Oct 2011 19:43:20 +0000 (12:43 -0700)]
filestore: simplify, clean up perfcounters

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilestore: simplify perfcounter lifecycle
Sage Weil [Thu, 20 Oct 2011 19:00:28 +0000 (12:00 -0700)]
filestore: simplify perfcounter lifecycle

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoperfcounters: fix addition/removal
Sage Weil [Thu, 20 Oct 2011 18:34:23 +0000 (11:34 -0700)]
perfcounters: fix addition/removal

We are not responsible for deleting removed perfcounters.

Add debugging.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilestore: fix perfcounter definition
Sage Weil [Thu, 20 Oct 2011 18:33:38 +0000 (11:33 -0700)]
filestore: fix perfcounter definition

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilestore: fix logger start
Sage Weil [Thu, 20 Oct 2011 17:59:03 +0000 (10:59 -0700)]
filestore: fix logger start

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoperfcounters: use simple names
Sage Weil [Thu, 20 Oct 2011 16:19:45 +0000 (09:19 -0700)]
perfcounters: use simple names

We don't need to uniquely identify ourselves in the global namespace with
the PerfCounter name.. only in the current process.  Collectd will handle
the per-daemon naming part.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoperfcounters: clean up interface a bit
Sage Weil [Thu, 20 Oct 2011 16:17:17 +0000 (09:17 -0700)]
perfcounters: clean up interface a bit

No logger_ prefix necessary.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoencoding: add optional features
Sage Weil [Thu, 20 Oct 2011 04:45:59 +0000 (21:45 -0700)]
encoding: add optional features

Update encode macros to allow a feature bitmask to be passed through
to a class's encode() method.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoassert: no 0x before thread id
Sage Weil [Sun, 16 Oct 2011 17:36:31 +0000 (10:36 -0700)]
assert: no 0x before thread id

There's no 0x prefix in the log lines either.  This makes it easier to
copy/paste word and search.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosdmap: uninline big stuff
Sage Weil [Thu, 20 Oct 2011 03:48:34 +0000 (20:48 -0700)]
osdmap: uninline big stuff

Signed-off-by: Sage Weil <sage@newdream.net>
13 years ago.gitignore: add test_filestore_idempotent
Josh Durgin [Wed, 19 Oct 2011 22:35:56 +0000 (15:35 -0700)]
.gitignore: add test_filestore_idempotent

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agotest_filestore_idempotent: initialize var
Josh Durgin [Wed, 19 Oct 2011 22:35:28 +0000 (15:35 -0700)]
test_filestore_idempotent: initialize var

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoMerge branch 'stable'
Sage Weil [Wed, 19 Oct 2011 16:14:24 +0000 (09:14 -0700)]
Merge branch 'stable'

Conflicts:
src/mon/OSDMonitor.cc
src/osd/OSD.cc

13 years agoosdmap: make encoding based on features
Sage Weil [Wed, 19 Oct 2011 05:33:03 +0000 (22:33 -0700)]
osdmap: make encoding based on features

Instead of relying on the caller to decide whether encode_old_client()
is appropriate, pass in the feature set and encode based on that.

We leave the default at -1 to avoid changing a bazillion callers.

Signed-off-by: Sage Weil <sage@newdream.net>
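A sketch of feature-driven encoding with an all-ones default, as described above.  The feature bit, the function name, and the string "encodings" are all hypothetical; the real code writes binary buffers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical feature bit meaning "peer understands the new encoding".
constexpr uint64_t kFeatureNewEncoding = 1ull << 0;

// Pick the encoding from the peer's feature set.  The default of -1 (all
// bits set) means callers that pass nothing get the newest format, so
// existing call sites need no changes.
inline std::string encode_map(int value,
                              uint64_t features = static_cast<uint64_t>(-1)) {
  if (features & kFeatureNewEncoding)
    return "v2:" + std::to_string(value);
  return "v1:" + std::to_string(value);  // old-client format
}
```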
13 years agoosd: normalize encoding of pg_pool_t
Sage Weil [Tue, 18 Oct 2011 16:01:00 +0000 (09:01 -0700)]
osd: normalize encoding of pg_pool_t

Instead of using a cumbersome C struct, move members into pg_pool_t and
use normal encode/decode methods.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agocrush: clean up encoder/decoder
Sage Weil [Wed, 19 Oct 2011 04:42:25 +0000 (21:42 -0700)]
crush: clean up encoder/decoder

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agouse WRITE_CLASS_ENCODER macro when possible
Sage Weil [Wed, 19 Oct 2011 04:26:46 +0000 (21:26 -0700)]
use WRITE_CLASS_ENCODER macro when possible

13 years agoencoding: WRITE_CLASS_ENCODER_MEMBER -> WRITE_CLASS_MEMBER_ENCODER
Sage Weil [Wed, 19 Oct 2011 05:25:06 +0000 (22:25 -0700)]
encoding: WRITE_CLASS_ENCODER_MEMBER -> WRITE_CLASS_MEMBER_ENCODER

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agotest_filestore_idempotent: simple tool to generate a workload of non-idempotent opera...
Sage Weil [Tue, 18 Oct 2011 21:12:06 +0000 (14:12 -0700)]
test_filestore_idempotent: simple tool to generate a workload of non-idempotent operations

Generate a workload of operations that are non-idempotent.  These are:

 transaction {
   clone A -> A.($n-1)
   write $n to A
 }
 $n++
 loop!

If we apply any transaction to the file system more than once, we will
find that the A.$n object does not contain $n, but instead contains
some larger value.

First run in 'write' mode to generate a workload and fake a crash.
Then run in 'verify' mode to see if the result was bad.

Signed-off-by: Sage Weil <sage@newdream.net>
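The workload above can be modelled in memory to see why a replayed transaction is detectable.  This is an illustrative model, not the tool itself: `Store`, `apply_txn`, and `verify` are hypothetical, and objects are reduced to integers.

```cpp
#include <cassert>
#include <map>
#include <string>

// Object name -> contents.  A missing "A" reads as 0.
using Store = std::map<std::string, int>;

// One transaction of the workload: clone A -> A.(n-1), then write n to A.
inline void apply_txn(Store& s, int n) {
  s["A." + std::to_string(n - 1)] = s["A"];
  s["A"] = n;
}

// Every clone A.k must contain k.  A transaction applied twice leaves a
// clone holding a larger value, which this check catches.
inline bool verify(const Store& s) {
  for (const auto& kv : s) {
    if (kv.first == "A")
      continue;
    int k = std::stoi(kv.first.substr(2));  // strip the "A." prefix
    if (kv.second != k)
      return false;
  }
  return true;
}
```

Applying transaction 2 twice clones A after it already holds 2, so A.1 ends up containing 2 instead of 1.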
13 years agofilestore: tolerate EEXIST on mkcoll when not-btrfs
Sage Weil [Tue, 18 Oct 2011 21:13:11 +0000 (14:13 -0700)]
filestore: tolerate EEXIST on mkcoll when not-btrfs

For non-btrfs file systems we should tolerate EEXIST because we may
replay the event more than once.

Signed-off-by: Sage Weil <sage@newdream.net>
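A sketch of the tolerance rule, assuming errors are carried as negative errno values.  `check_mkcoll_result` is a hypothetical helper; the real logic sits in the filestore transaction code.

```cpp
#include <cassert>
#include <cerrno>

// Filter the result of a collection-create step.  On non-btrfs file
// systems the event may be replayed more than once, so "already exists"
// is treated as success.  All other errors pass through unchanged.
inline int check_mkcoll_result(int r, bool is_btrfs) {
  if (r == -EEXIST && !is_btrfs)
    return 0;
  return r;
}
```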
13 years agomds: handle xattrs on inode creation
Sage Weil [Tue, 18 Oct 2011 20:38:21 +0000 (13:38 -0700)]
mds: handle xattrs on inode creation

Allow mknod, mkdir, symlink, create to provide xattrs for the new
inode.  This will be used by the kclient to set ACLs on new inodes
based on the parent directory.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoradosgw-admin: fix conflict with KeyType in libnss
Sage Weil [Tue, 18 Oct 2011 23:04:22 +0000 (16:04 -0700)]
radosgw-admin: fix conflict with KeyType in libnss

rgw/rgw_admin.cc:459:6: error: using typedef-name 'KeyType' after 'enum'
/usr/include/nss3/keythi.h:69:3: error: 'KeyType' has a previous declaration here

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet: cur -> probe
Sage Weil [Tue, 18 Oct 2011 18:42:16 +0000 (11:42 -0700)]
osd: PgPriorSet: cur -> probe

Rename cur to probe, the set of OSDs we need to probe in order to
successfully peer.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet: restructure lost checks for prior set
Sage Weil [Tue, 18 Oct 2011 18:40:06 +0000 (11:40 -0700)]
osd: PgPriorSet: restructure lost checks for prior set

When we add down osds to the cur set, we block peering because there
are OSDs that may have data we need and they are not currently up.
When that happens, marking those OSDs as lost may allow peering to
proceed.

Keep an explicit map blocked_by for exactly that set of OSDs (a subset
of cur), and compare lost_by values in prior_set_affected() to that.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agorgw: workqueue suicide timeout is infinity
Yehuda Sadeh [Tue, 18 Oct 2011 18:01:43 +0000 (11:01 -0700)]
rgw: workqueue suicide timeout is infinity

13 years agoosd: PgPriorSet: simplify (and change) CRASHED logic
Sage Weil [Tue, 18 Oct 2011 01:59:10 +0000 (18:59 -0700)]
osd: PgPriorSet: simplify (and change) CRASHED logic

Any single OSD from a given interval surviving is sufficient to ensure
that an ACKed write during that interval was committed to disk.

Currently, at least.

In any case, update the prior set calculation to reflect that.  Also
make the survival conditional a bit smarter, to include both last_clean
interval (from the OSD's previous interval of up-ness) as well as the
current interval [up_from..up_thru).

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet: update comment terms a bit
Sage Weil [Tue, 18 Oct 2011 01:57:16 +0000 (18:57 -0700)]
osd: PgPriorSet: update comment terms a bit

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: do not short-cut up_thru update for new PGs
Sage Weil [Tue, 18 Oct 2011 00:51:53 +0000 (17:51 -0700)]
osd: do not short-cut up_thru update for new PGs

Commit e731885d2550ee985bf875ab5bb5faf28f1693eb made it possible for
a new PG to go active without forcing the OSD's up_thru to update.
This was motivated by the desire for PG creation by radosgw to go
faster.  Radosgw no longer creates a pool per bucket, so this is not
useful there, and it is unclear what other application (that is not
abusing rados pools) would need it.

Since it complicates the prior set calculation for dubious reasons,
let's revert it.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet: remove unused PG member
Sage Weil [Tue, 18 Oct 2011 00:43:06 +0000 (17:43 -0700)]
osd: PgPriorSet: remove unused PG member

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet: clean up comments a bit
Sage Weil [Tue, 18 Oct 2011 00:42:48 +0000 (17:42 -0700)]
osd: PgPriorSet: clean up comments a bit

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet: clean up per-interval var names
Sage Weil [Tue, 18 Oct 2011 00:39:47 +0000 (17:39 -0700)]
osd: PgPriorSet: clean up per-interval var names

We don't actually use any_lost_now, but it makes the logic easier
to understand to have it there.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet: revert start_since_joining check
Sage Weil [Tue, 18 Oct 2011 00:44:32 +0000 (17:44 -0700)]
osd: PgPriorSet: revert start_since_joining check

Commit 5b78f5db8c200edcc949033e1badae70fecd2e08 added a check to
prevent some sort of badness when osds were marked lost, but I can't
figure out what it was.  Remove the check for now until we can
reproduce/observe the badness in practice, and then write a test and
better motivated/documented fix.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet: do not include UP osds in prior.cur
Sage Weil [Tue, 18 Oct 2011 00:02:23 +0000 (17:02 -0700)]
osd: PgPriorSet: do not include UP osds in prior.cur

The up osds are not (directly) relevant since they are not necessarily
members of the PG.  We only care about acting OSDs, which may have
committed writes to the PG during this past interval.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet: remove up_thru crap
Sage Weil [Mon, 17 Oct 2011 23:54:03 +0000 (16:54 -0700)]
osd: PgPriorSet: remove up_thru crap

This was added way back in 1cf9bebc8e5063f5f311d33e7735bcc9286e98ce,
but as far as I can tell it didn't make any sense then either.

The issue: we redo the prior set calculation if the up_thru for these
OSDs changes in the current map, but the prior_set result does not
depend on the current map's up_thru values in any way; it only depends
on the up_thru in the last epoch of each past interval, and that is
fixed in the past.

Remove the cruft.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: PgPriorSet: any_survived -> any_is_alive_now
Sage Weil [Mon, 17 Oct 2011 23:48:55 +0000 (16:48 -0700)]
osd: PgPriorSet: any_survived -> any_is_alive_now

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agodoc: Change diagram to have radosgw closer to direct rados access.
Tommi Virtanen [Mon, 17 Oct 2011 23:13:14 +0000 (16:13 -0700)]
doc: Change diagram to have radosgw closer to direct rados access.

Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
13 years agostreamtest: do mkfs
Sage Weil [Mon, 17 Oct 2011 21:13:40 +0000 (14:13 -0700)]
streamtest: do mkfs

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agostreamtest: print to stdout
Sage Weil [Mon, 17 Oct 2011 21:12:48 +0000 (14:12 -0700)]
streamtest: print to stdout

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomkcephfs: copy ceph.conf to /etc/ceph/ceph.conf (when -a)
Sage Weil [Mon, 17 Oct 2011 17:49:46 +0000 (10:49 -0700)]
mkcephfs: copy ceph.conf to /etc/ceph/ceph.conf (when -a)

You can disable this with --no-copy-conf.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoceph.spec: don't chkconfig
Sage Weil [Mon, 17 Oct 2011 15:51:47 +0000 (08:51 -0700)]
ceph.spec: don't chkconfig

This was fighting with suse insserv.  Still needs some cleanup.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoceph.spec: work around build.opensuse.org
Sage Weil [Mon, 17 Oct 2011 15:50:54 +0000 (08:50 -0700)]
ceph.spec: work around build.opensuse.org

The redhat-rpm-config isn't installed on build.opensuse.org, which means
the processor is set to i386 instead of something less ancient.  This
breaks compilation on 32-bit x86.

Kludge around it here.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoceph.spec: capitalize first letter to make rpmlint happy
Sage Weil [Mon, 17 Oct 2011 15:49:04 +0000 (08:49 -0700)]
ceph.spec: capitalize first letter to make rpmlint happy

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agov0.37 v0.37
Sage Weil [Mon, 17 Oct 2011 15:35:57 +0000 (08:35 -0700)]
v0.37

13 years agoosd: fix assemble_backlog
Sage Weil [Sun, 16 Oct 2011 23:07:12 +0000 (16:07 -0700)]
osd: fix assemble_backlog

This was written assuming that le->prior_version wouldn't be the version
that we have locally on disk.  Not always true!

If it is the same, then we can just keep the entry (and clear reqid).  If
it is different, keep the behavior we had (re-add, erase current).

FWIW the last time this was touched was 916b1998.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix add_next_event Missing::item::have
Sage Weil [Mon, 17 Oct 2011 03:37:28 +0000 (20:37 -0700)]
osd: fix add_next_event Missing::item::have

The missing set should be accurate up to the current point in the log.  The
log_tail has no bearing on that, nor does last_update, since we're
processing new events in forward order, and updating missing as we go.

Drops the now unused info argument... :/

This more or less reverts b418896d.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoceph: don't crash when sending message to !up osd
Sage Weil [Sat, 15 Oct 2011 05:56:06 +0000 (22:56 -0700)]
ceph: don't crash when sending message to !up osd

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: pull old version to revert to
Sage Weil [Thu, 13 Oct 2011 23:28:53 +0000 (16:28 -0700)]
osd: pull old version to revert to

If we are the primary, and are doing a LOST_REVERT, pull the old version
of the object and update the version when we get it.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: implement lost_revert
Sage Weil [Thu, 13 Oct 2011 20:03:09 +0000 (13:03 -0700)]
osd: implement lost_revert

Roll back to the last available version of an object.  If there is no
available version, delete it.

Leave the door open for other approaches later.

Currently this only works if the prior version is on the primary.  If it is
on another node, we don't pull it yet.

Signed-off-by: Sage Weil <sage@newdream.net>
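The revert policy described above can be sketched as follows.  Everything here is hypothetical and simplified: objects are strings, versions are integers, and `lost_revert` only models the decision, not recovery or logging.

```cpp
#include <cassert>
#include <map>
#include <string>

// Revert `oid` in `store` to the newest version still available.  If no
// prior version is available, delete the object.  Returns true when the
// object still exists afterwards.
inline bool lost_revert(std::map<std::string, std::string>& store,
                        const std::string& oid,
                        const std::map<unsigned, std::string>& available) {
  if (available.empty()) {
    store.erase(oid);
    return false;
  }
  // std::map is ordered by key, so rbegin() is the highest version.
  store[oid] = available.rbegin()->second;
  return true;
}
```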
13 years agoosd: simplify share_pg_log
Sage Weil [Thu, 13 Oct 2011 19:57:54 +0000 (12:57 -0700)]
osd: simplify share_pg_log

Use Log::copy_after().  Drop the useless argument.  Strip out the broken
LOST logic.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix up PG::Missing methods a bit
Sage Weil [Thu, 13 Oct 2011 19:56:28 +0000 (12:56 -0700)]
osd: fix up PG::Missing methods a bit

Pass in iterators when possible.  Stack methods instead of duplicating
functionality.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: factor out recover_primary_got() helper
Sage Weil [Thu, 13 Oct 2011 19:53:59 +0000 (12:53 -0700)]
osd: factor out recover_primary_got() helper

This handles the missing set and last_complete adjustment when we recover
an object on the primary.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: make C_OSD_CommittedPushedObject::op optional
Sage Weil [Thu, 13 Oct 2011 19:52:13 +0000 (12:52 -0700)]
osd: make C_OSD_CommittedPushedObject::op optional

This lets us reuse this helper for committing recovery ops that aren't a
result of a push.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: pass version explicitly to pull
Sage Weil [Thu, 13 Oct 2011 19:51:08 +0000 (12:51 -0700)]
osd: pass version explicitly to pull

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix share_pg_log()
Sage Weil [Thu, 13 Oct 2011 19:48:41 +0000 (12:48 -0700)]
osd: fix share_pg_log()

We need to handle a log message in the ReplicaActive state.  And set the
epoch properly when we send it.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomessages/MOSDPG*: clean up output a bit
Sage Weil [Thu, 13 Oct 2011 19:42:47 +0000 (12:42 -0700)]
messages/MOSDPG*: clean up output a bit

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: remove superfluous write_info calls
Sage Weil [Thu, 13 Oct 2011 18:51:41 +0000 (11:51 -0700)]
osd: remove superfluous write_info calls

- merge_log() will write_info (and log) as needed
- Activate() will do the same

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: all_unfound_are_queried_or_lost
Sage Weil [Thu, 13 Oct 2011 19:47:57 +0000 (12:47 -0700)]
osd: all_unfound_are_queried_or_lost

The check to make isn't whether all locations are lost, but whether all
locations are either lost or have been queried and don't have the object(s)
we want.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: adjust LOST log entry types; simplify log entry type strings
Sage Weil [Wed, 12 Oct 2011 16:20:54 +0000 (09:20 -0700)]
osd: adjust LOST log entry types; simplify log entry type strings

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix up mark_all_unfound_lost so that it actually works
Sage Weil [Thu, 6 Oct 2011 04:30:47 +0000 (21:30 -0700)]
osd: fix up mark_all_unfound_lost so that it actually works

Well, it works given our weak definition of LOST.

- use ObjectContexts properly
- move into ReplicatedPG
- no need for _as_ in name

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: implement 'flush_pg_stats' command
Sage Weil [Sat, 15 Oct 2011 03:20:11 +0000 (20:20 -0700)]
osd: implement 'flush_pg_stats' command

This flushes the current pg stats to the monitor, and blocks until the
monitor commits it.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: process commands in a workqueue
Sage Weil [Sat, 15 Oct 2011 03:19:38 +0000 (20:19 -0700)]
osd: process commands in a workqueue

This lets us do commands that can potentially block.  For example:

 - flush pg stats to osd
 - request (and wait for) latest osdmap

Currently the threadpool only has 1 thread.  i.e., one concurrent command.
That should be fine, methinks.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: feed MPGStats tids back through the MPGStatsAck
Sage Weil [Fri, 14 Oct 2011 20:55:57 +0000 (13:55 -0700)]
mon: feed MPGStats tids back through the MPGStatsAck

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: remove some pg stats debug cruft
Sage Weil [Fri, 14 Oct 2011 20:44:43 +0000 (13:44 -0700)]
osd: remove some pg stats debug cruft

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: handle (and reply to) direct MCommands
Sage Weil [Wed, 12 Oct 2011 17:44:19 +0000 (10:44 -0700)]
osd: handle (and reply to) direct MCommands

Signed-off-by: Sage Weil <sage@newdream.net>