git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
13 years agoosd: do not dereference ctx->op when NULL
Sage Weil [Thu, 2 Feb 2012 20:36:27 +0000 (12:36 -0800)]
osd: do not dereference ctx->op when NULL

We may not have an OpRequest.  Make the later check do the cast properly
when it is needed.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote-tracking branch 'gh/wip-osd-op-tracking'
Sage Weil [Thu, 2 Feb 2012 18:55:41 +0000 (10:55 -0800)]
Merge remote-tracking branch 'gh/wip-osd-op-tracking'

Reviewed-by: Sage Weil <sage@newdream.net>
13 years agocommon/Throttle: throttle in FIFO order
Jim Schutt [Wed, 1 Feb 2012 15:54:25 +0000 (08:54 -0700)]
common/Throttle: throttle in FIFO order

Under heavy write load from many clients, many reader threads will
be waiting in the policy throttler, all on a single condition variable.
When a wakeup is signalled, any of those threads may receive the
signal.  This increases the variance in the message processing
latency, and in extreme cases can significantly delay a message.

This patch causes threads to exit a throttler in the same order
they entered.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
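
The FIFO behaviour described in this commit can be sketched as follows. This is a minimal illustration, not the actual common/Throttle code: the class name `FifoThrottle` and its members are hypothetical, and it assumes each request is no larger than the limit. Each waiter queues its own condition variable, so a wakeup is always delivered to the oldest waiter rather than to an arbitrary thread.

```cpp
#include <condition_variable>
#include <cstdint>
#include <list>
#include <mutex>

// Hypothetical FIFO throttle (not Ceph's Throttle class). Each waiter queues
// its own condition variable; only the waiter at the front may proceed.
class FifoThrottle {
public:
  explicit FifoThrottle(int64_t max) : max_(max) {}

  // Block until `c` units fit under the limit and every earlier caller
  // has already been admitted. Assumes 0 < c <= max.
  void get(int64_t c) {
    std::unique_lock<std::mutex> l(lock_);
    std::condition_variable cv;
    waiters_.push_back(&cv);
    cv.wait(l, [&] {
      return waiters_.front() == &cv && current_ + c <= max_;
    });
    current_ += c;
    waiters_.pop_front();
    // The next waiter may also fit; wake it so it can re-check.
    if (!waiters_.empty())
      waiters_.front()->notify_one();
  }

  // Return `c` units and wake only the oldest waiter.
  void put(int64_t c) {
    std::lock_guard<std::mutex> l(lock_);
    current_ -= c;
    if (!waiters_.empty())
      waiters_.front()->notify_one();
  }

  int64_t current() {
    std::lock_guard<std::mutex> l(lock_);
    return current_;
  }

private:
  std::mutex lock_;
  std::list<std::condition_variable*> waiters_;
  int64_t max_;
  int64_t current_ = 0;
};
```

With a single shared condition variable, any blocked thread may win the wakeup; with one per waiter, admission order matches arrival order, which is what bounds the latency variance.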
13 years agoosd: fix osd_recover_clone_overlap
Sage Weil [Thu, 2 Feb 2012 18:31:17 +0000 (10:31 -0800)]
osd: fix osd_recover_clone_overlap

- we need to populate data_subset
- add check in calc_head_subsets() too

Fixes 2116f012.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: use obc for size in calc_head_subsets()
Sage Weil [Thu, 2 Feb 2012 18:29:39 +0000 (10:29 -0800)]
osd: use obc for size in calc_head_subsets()

No need to call stat(2) here; the caller has what we need.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agofilestore: remove obsolete fs type check
Sage Weil [Thu, 2 Feb 2012 18:03:28 +0000 (10:03 -0800)]
filestore: remove obsolete fs type check

This isn't a useful check.  xfs and ext4 work too.

Fixes: #1995
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agotest_filejournal: fix warnings
Sage Weil [Thu, 2 Feb 2012 17:01:15 +0000 (09:01 -0800)]
test_filejournal: fix warnings

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: mark_started() osd sub ops
Greg Farnum [Thu, 2 Feb 2012 01:10:41 +0000 (17:10 -0800)]
osd: mark_started() osd sub ops

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoosd: d'oh again! Make this real exponential, not...ever-linear.
Greg Farnum [Thu, 2 Feb 2012 00:28:35 +0000 (16:28 -0800)]
osd: d'oh again! Make this real exponential, not...ever-linear.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoosd: OpRequest currently_* needs to look at latest, not hit.
Greg Farnum [Thu, 2 Feb 2012 00:28:18 +0000 (16:28 -0800)]
osd: OpRequest currently_* needs to look at latest, not hit.

D'oh!

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoMerge remote branch 'origin/master' into wip-osd-op-tracking
Greg Farnum [Thu, 2 Feb 2012 00:05:32 +0000 (16:05 -0800)]
Merge remote branch 'origin/master' into wip-osd-op-tracking

Conflicts:
src/osd/ReplicatedPG.h

13 years agoosd: add check_ops_in_flight()
Greg Farnum [Wed, 1 Feb 2012 21:25:37 +0000 (13:25 -0800)]
osd: add check_ops_in_flight()

By default it warns on requests that are more than 30 seconds old,
using an exponential backoff of that interval.
Also add state name retrieval to OpRequest.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
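
A rough sketch of the warning policy described above, assuming a per-op multiplier that doubles after each warning. The names `TrackedOp`, `warn_multiple` and the free function below are illustrative only and are not taken from the OSD code.

```cpp
#include <vector>

// Hypothetical per-request bookkeeping; field names are assumptions.
struct TrackedOp {
  double received_at = 0.0;    // arrival time, in seconds
  unsigned warn_multiple = 1;  // doubles each time we warn about this op
};

// Returns how many ops triggered a warning on this pass. An op warns when
// its age reaches warn_age * warn_multiple, so with the default interval
// warnings fire at 30s, 60s, 120s, ... instead of every 30s.
int check_ops_in_flight(std::vector<TrackedOp>& ops, double now,
                        double warn_age = 30.0) {
  int warned = 0;
  for (auto& op : ops) {
    if (now - op.received_at >= warn_age * op.warn_multiple) {
      ++warned;
      op.warn_multiple *= 2;  // exponential, not linear, backoff
    }
  }
  return warned;
}
```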
13 years agoosd: "mark" OpRequests as they move through the system.
Greg Farnum [Mon, 30 Jan 2012 22:50:28 +0000 (14:50 -0800)]
osd: "mark" OpRequests as they move through the system.

Right now these are just informational flags which can be read out. Later
they might extend to timing information, separate lists for more precise
control over latency warnings, etc.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoPG: switch op passing interface to use OpRequest
Greg Farnum [Thu, 26 Jan 2012 01:30:07 +0000 (17:30 -0800)]
PG: switch op passing interface to use OpRequest

This is all the PG/ReplicatedPG internals and the few remaining OSD callers.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoosd: switch op passing interface to use OpRequest instead of raw Messages
Greg Farnum [Wed, 25 Jan 2012 23:51:58 +0000 (15:51 -0800)]
osd: switch op passing interface to use OpRequest instead of raw Messages

This doesn't handle the PG internals yet.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoosd: add new OpRequest struct and an xlist to track it
Greg Farnum [Wed, 25 Jan 2012 23:48:44 +0000 (15:48 -0800)]
osd: add new OpRequest struct and an xlist to track it

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agocls_rgw: update bucket index when deleting object (with pending)
Yehuda Sadeh [Wed, 1 Feb 2012 20:55:52 +0000 (12:55 -0800)]
cls_rgw: update bucket index when deleting object (with pending)

Bug #2012. Racing delete with other operations (update or another
delete) failed to update the bucket index.

Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
13 years agoMerge remote branch 'gh/wip-divergent-backfill'
Sage Weil [Wed, 1 Feb 2012 18:55:45 +0000 (10:55 -0800)]
Merge remote branch 'gh/wip-divergent-backfill'

Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoosd: fix assignment in PG::rewind_divergent_log()
Sage Weil [Wed, 1 Feb 2012 04:06:27 +0000 (20:06 -0800)]
osd: fix assignment in PG::rewind_divergent_log()

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge remote-tracking branch 'gh/wip-journal-crc'
Sage Weil [Wed, 1 Feb 2012 00:18:52 +0000 (16:18 -0800)]
Merge remote-tracking branch 'gh/wip-journal-crc'

Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agomsgr: Document recv_stamp and add a dispatch_stamp and throttle_wait.
Greg Farnum [Thu, 12 Jan 2012 20:42:21 +0000 (12:42 -0800)]
msgr: Document recv_stamp and add a dispatch_stamp and throttle_wait.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoqa: test_backfill.sh: take osd.0 down
Sage Weil [Tue, 31 Jan 2012 21:00:45 +0000 (13:00 -0800)]
qa: test_backfill.sh: take osd.0 down

Mark this down to
1- trigger the WaitActingChange vs osd down race, and
2- help trigger a divergent log when osd.2 is blackholed+restarted during
   backfill.  e.g.,

./ceph -- tell osd.1 injectargs '--filestore-blackhole' ; sleep 10 ; ./init-ceph restart osd.1

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: restart peering if requesting acting osd goes down
Sage Weil [Tue, 31 Jan 2012 17:53:32 +0000 (09:53 -0800)]
osd: restart peering if requesting acting osd goes down

If we request an acting set, we need to restart peering if one of the
requested nodes goes down.  This prevents a deadlock where we get stuck
in WaitActingChange because we have [a,b], want [a,b,c], but c is down and
our up and acting don't actually change.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: rename recovery event NeedNewMap -> NeedActingChange
Sage Weil [Tue, 31 Jan 2012 17:40:23 +0000 (09:40 -0800)]
osd: rename recovery event NeedNewMap -> NeedActingChange

This is more precise.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: use RecoveryContext transaction, finishers on recovery completion
Sage Weil [Tue, 31 Jan 2012 15:23:10 +0000 (07:23 -0800)]
osd: use RecoveryContext transaction, finishers on recovery completion

We should use the enclosing transaction and finisher list here.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoqa: test_backfill.sh: limit pg log length so we trigger backfill
Sage Weil [Tue, 31 Jan 2012 15:16:37 +0000 (07:16 -0800)]
qa: test_backfill.sh: limit pg log length so we trigger backfill

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: fix divergent backfill targets
Sage Weil [Tue, 31 Jan 2012 15:25:04 +0000 (07:25 -0800)]
osd: fix divergent backfill targets

During peering, a previous backfill target may have a slightly newer
last_update than the other options, but it will not be chosen because it
is incomplete.  That caused a failed assert during activate() (#1983).

To fix, we remove the bad assert, and then fix merge_log() so that the
replica/backfill target will trim its divergent entries when it gets the
activation MLogRec.  We also fix the handling of MInfoRec, as that can
trigger an analogous condition.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilestore: implement filestore_blackhole hook
Sage Weil [Tue, 31 Jan 2012 01:39:23 +0000 (17:39 -0800)]
filestore: implement filestore_blackhole hook

If true, we'll drop any new transactions on the floor. Useful for
triggering failure conditions (e.g., prior to killing ceph-osd itself, to
ensure some operations don't reach the local disk).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agorgw: should remove bucket dir instead of sending intent
Yehuda Sadeh [Tue, 31 Jan 2012 01:00:37 +0000 (17:00 -0800)]
rgw: should remove bucket dir instead of sending intent

That was really useless, and bucket cleanup was broken anyway.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
13 years agolibrados: fix a leak
Yehuda Sadeh [Tue, 31 Jan 2012 00:48:15 +0000 (16:48 -0800)]
librados: fix a leak

watch notification message was missing a ->put()

Signed-off-by: Yehuda Sadeh <yehuda.sadeh@dreamhost.com>
13 years agoosd: disable clone overlap for push/pull
Sage Weil [Mon, 30 Jan 2012 22:27:24 +0000 (14:27 -0800)]
osd: disable clone overlap for push/pull

There is a bug in the push/pull code.  Disable the recovery smarts by
default until we fix #2002.

There is currently a race (in the callers) where:
 - an adjacent clone is missing
 - we (calculate some clone overlap? and) start pulling
 - we get adjacent clone
 - we get push, calc a different overlap, and then get confused.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote branch 'gh/wip-warnings'
Sage Weil [Mon, 30 Jan 2012 21:42:45 +0000 (13:42 -0800)]
Merge remote branch 'gh/wip-warnings'

13 years agomon: make 'osd [out|in|down]' succeed if already whatever
Sage Weil [Mon, 30 Jan 2012 05:46:53 +0000 (21:46 -0800)]
mon: make 'osd [out|in|down]' succeed if already whatever

If we want something out and it is already out, succeed.  This makes the
client command succeed if there is a transient error and it gets resent.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoqa: encoding: silence warning
Sage Weil [Mon, 30 Jan 2012 05:05:08 +0000 (21:05 -0800)]
qa: encoding: silence warning

This is cheating, but we always use this class with int types, so it makes
this go away:

warning: test/encoding.cc:79:20: ‘*((void*)(& tu)+4).ConstructorCounter::data’ may be used uninitialized in this function [-Wuninitialized]

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoqa: test/gather fix warning
Sage Weil [Mon, 30 Jan 2012 04:56:03 +0000 (20:56 -0800)]
qa: test/gather fix warning

warning: test/gather.cc:29:222: passing NULL to non-pointer argument 3 of ‘static testing::AssertionResult testing::internal::EqHelper::Compare(const char*, const char*, const T1&, T2*) [with T1 = long int, T2 = C_Gather]’ [-Wconversion-null]

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoqa: test/rados-api/list fix warning
Sage Weil [Mon, 30 Jan 2012 04:54:18 +0000 (20:54 -0800)]
qa: test/rados-api/list fix warning

warning: test/rados-api/list.cc:43:156: converting ‘false’ to pointer type for argument 1 of ‘char testing::internal::IsNullLiteralHelper(testing::internal::Secret*)’ [-Wconversion-null]

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agotest_ipaddr: reverse ASSERT_EQ order
Sage Weil [Mon, 30 Jan 2012 04:36:46 +0000 (20:36 -0800)]
test_ipaddr: reverse ASSERT_EQ order

Make these warnings go away:

warning: test/test_ipaddr.cc:217:156: converting ‘false’ to pointer type for argument 1 of ‘char testing::internal::IsNullLiteralHelper(testing::internal::Secret*)’ [-Wconversion-null]

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: remove unused var
Sage Weil [Mon, 30 Jan 2012 01:26:55 +0000 (17:26 -0800)]
osd: remove unused var

warning: osd/PG.cc:1331:20: variable 'plu' set but not used [-Wunused-but-set-variable]

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoadmin_socket: fix uninit warning
Sage Weil [Mon, 30 Jan 2012 01:26:14 +0000 (17:26 -0800)]
admin_socket: fix uninit warning

warning: common/admin_socket_client.cc:166:19: 'socket_fd' may be used uninitialized in this function [-Wuninitialized]

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: trim old auth states
Sage Weil [Sun, 29 Jan 2012 17:26:28 +0000 (09:26 -0800)]
mon: trim old auth states

These aren't exposed outside the monitor, so we really only keep them
around to assist in mon recovery.  Give ourselves a healthy margin over
the max join drift for that.

Fixes: #2000
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilestore: fix rollback when current/ missing entirely
Sage Weil [Sun, 29 Jan 2012 16:48:22 +0000 (08:48 -0800)]
filestore: fix rollback when current/ missing entirely

This can happen when we are starting up, roll back, remove current/, and
then fail before we snapshot a snap_ into place.

Most of the logic was already in place for this; we tried to fix it in
cd2dedd7d190a43a6be50a7f18849fe0123c72bc but missed this piece.

Fixes: #1999
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: reset pgstats timer when we reopen monitor session
Sage Weil [Sat, 28 Jan 2012 01:32:28 +0000 (17:32 -0800)]
osd: reset pgstats timer when we reopen monitor session

Otherwise we'll reopen every second from here on out, without giving the
new session a chance to start up and do its thing.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoclock: ignore clock_offset if cct is NULL
Sage Weil [Sat, 28 Jan 2012 19:40:08 +0000 (11:40 -0800)]
clock: ignore clock_offset if cct is NULL

This is helpful e.g. from assert.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agofilejournal: add corruption test to check crc checking code
Sage Weil [Thu, 26 Jan 2012 01:35:49 +0000 (17:35 -0800)]
filejournal: add corruption test to check crc checking code

Verify that the journal replay rejects a corrupted journal entry.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilejournal: assume gibberish flags imply none
Sage Weil [Thu, 26 Jan 2012 00:37:34 +0000 (16:37 -0800)]
filejournal: assume gibberish flags imply none

Old journals didn't properly initialize the flags (oops).  Assume that
any bits besides the first 2 imply no flags.

Make note that this hack needs to be removed after some time has passed,
but well before these new flags are used.  Or, such use should be
accompanied by a full header format rev and incompatibility.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
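
The rule above ("any bits besides the first 2 imply no flags") might look roughly like this. The mask constant and function name are assumptions made for illustration; the real header field layout is not shown in this log.

```cpp
#include <cstdint>

// Assumption: only the two lowest bits are defined flags.
constexpr uint32_t KNOWN_FLAGS_MASK = 0x3;

// If any undefined bit is set, the field is treated as uninitialized
// garbage from an old journal and read as "no flags".
inline uint32_t sanitize_flags(uint32_t raw) {
  return (raw & ~KNOWN_FLAGS_MASK) ? 0 : raw;
}
```

This also shows why the hack has to go before new flags are introduced: a legitimate third flag would be indistinguishable from gibberish.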
13 years agofilejournal: include crc in entry header/footer
Sage Weil [Thu, 26 Jan 2012 00:36:17 +0000 (16:36 -0800)]
filejournal: include crc in entry header/footer

Use the unused flags field for this.  Previously it was always 0, so this
lets us skip old entries on old journals and only worry about missing one
out of 2^32 corruptions.  New journals get a flag that strictly enforces
the crc check.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
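
One way to read the check described above, as a sketch only: the flag name `FLAG_STRICT_CRC` and its bit value are hypothetical, and the crc itself is assumed to be computed elsewhere.

```cpp
#include <cstdint>

// Hypothetical flag bit; the name and value are not from the Ceph headers.
constexpr uint32_t FLAG_STRICT_CRC = 1u << 0;

// On old journals the field that now holds the crc was always 0, so a
// stored value of 0 is taken to mean "no crc recorded" unless the journal
// header asks for strict enforcement. The cost is that a corruption whose
// stored crc happens to be 0 slips through (roughly one in 2^32).
inline bool entry_crc_ok(uint32_t header_flags, uint32_t stored_crc,
                         uint32_t computed_crc) {
  const bool strict = (header_flags & FLAG_STRICT_CRC) != 0;
  if (!strict && stored_crc == 0)
    return true;  // old entry, nothing to verify
  return stored_crc == computed_crc;
}
```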
13 years agoqa: test_filejournal: test lots of small writes too
Sage Weil [Mon, 23 Jan 2012 20:03:32 +0000 (12:03 -0800)]
qa: test_filejournal: test lots of small writes too

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoqa: add test_filejournal
Sage Weil [Sat, 28 Jan 2012 19:08:52 +0000 (11:08 -0800)]
qa: add test_filejournal

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilejournal: fix header initialization
Sage Weil [Thu, 26 Jan 2012 00:12:42 +0000 (16:12 -0800)]
filejournal: fix header initialization

Make sure it's zeros to start with.  Currently flags might be gibberish!

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilejournal: clean up some errno checks
Sage Weil [Tue, 24 Jan 2012 01:00:28 +0000 (17:00 -0800)]
filejournal: clean up some errno checks

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilejournal: assert submit_entry gets >0 bytes
Sage Weil [Thu, 19 Jan 2012 00:24:52 +0000 (16:24 -0800)]
filejournal: assert submit_entry gets >0 bytes

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilejournal: initialize header before writing
Sage Weil [Thu, 19 Jan 2012 00:24:38 +0000 (16:24 -0800)]
filejournal: initialize header before writing

Avoid writing uninitialized crap.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilejournal: move zero_buf allocation
Sage Weil [Thu, 19 Jan 2012 00:21:35 +0000 (16:21 -0800)]
filejournal: move zero_buf allocation

We need header.alignment to be defined.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoclient: do not send release to down mds
Sage Weil [Sat, 28 Jan 2012 17:38:46 +0000 (09:38 -0800)]
client: do not send release to down mds

We can have a session with state where the mds is not up; don't blindly
send a message or we can get

./mds/MDSMap.h: In function 'const entity_inst_t MDSMap::get_inst(int)', in thread '0x7f092aad1910'
./mds/MDSMap.h: 465: FAILED assert(up.count(m))
 ceph version 0.35-6-g6eb8862 (commit:6eb8862e91d142451e256aaa02b34c81a4f21dea)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x70) [0x71f11a]
 2: (MDSMap::get_inst(int)+0x4b) [0x6dc191]
 3: (Client::flush_cap_releases()+0x94) [0x677e60]
 4: (Client::tick()+0x1f0) [0x690adc]
 5: (C_C_Tick::finish(int)+0x1c) [0x6f3fbe]
 6: (SafeTimer::timer_thread()+0x2c5) [0x6fbfe5]
 7: (SafeTimerThread::entry()+0x19) [0x6fe399]
 8: (Thread::_entry_func(void*)+0x20) [0x72e944]
 9: /lib/libpthread.so.0 [0x7f092dea573a]
 10: (clone()+0x6d) [0x7f092cba169d]

with a map like

$ ./ceph mds dump 85
2012-01-28 09:37:19.251946 mon <- [mds,dump,85]
2012-01-28 09:37:19.252618 mon.1 -> 'dumped mdsmap epoch 85' (0)
epoch   85
flags   0
created 2012-01-28 09:24:42.411202
modified        2012-01-28 09:28:45.093301
tableserver     0
root    0
session_timeout 60
session_autoclose       300
last_failure    0
last_failure_osd_epoch  18
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object}
max_mds 1
in      0
up      {}
failed  0
stopped
data_pools      [0]
metadata_pool   1

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
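
The guard this commit describes amounts to checking the up set before looking up an address. A simplified sketch follows; `MdsMapSketch` and `release_targets` are stand-ins invented for illustration and do not mirror the real MDSMap or Client interfaces.

```cpp
#include <set>
#include <vector>

// Stand-in for the relevant part of an MDS map: the set of ranks that are up.
struct MdsMapSketch {
  std::set<int> up;
  bool is_up(int rank) const { return up.count(rank) > 0; }
};

// Given the ranks we hold sessions with, return only those that are safe to
// send cap releases to. Ranks that are not up are skipped instead of being
// looked up, which is what tripped assert(up.count(m)) in the trace above.
std::vector<int> release_targets(const MdsMapSketch& map,
                                 const std::vector<int>& session_ranks) {
  std::vector<int> targets;
  for (int rank : session_ranks) {
    if (!map.is_up(rank))
      continue;
    targets.push_back(rank);
  }
  return targets;
}
```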
13 years agoMerge branch 'stable'
Sage Weil [Sat, 28 Jan 2012 18:04:45 +0000 (10:04 -0800)]
Merge branch 'stable'

13 years agosignal: use _exit() on SIGTERM
Sage Weil [Sat, 28 Jan 2012 17:26:46 +0000 (09:26 -0800)]
signal: use _exit() on SIGTERM

No need to call onexit handlers, static dtors, whatever.

This may help with #1996 and #1549.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
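
For illustration, a SIGTERM handler that skips atexit handlers and static destructors could be installed like this. The function names and the exit status of 0 are assumptions; the commit does not say which status Ceph uses.

```cpp
#include <signal.h>
#include <string.h>
#include <unistd.h>

// _exit() terminates immediately: no atexit/onexit handlers and no static
// destructors run. It is also async-signal-safe, unlike exit().
extern "C" void sigterm_exit(int) {
  _exit(0);  // exit status is an assumption for this sketch
}

// Installs the handler; returns 0 on success, -1 on failure (see sigaction(2)).
int install_sigterm_handler() {
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = sigterm_exit;
  sigemptyset(&sa.sa_mask);
  return sigaction(SIGTERM, &sa, nullptr);
}
```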
13 years agotest: add script for checking admin socket 'objecter_requests' output
Josh Durgin [Fri, 27 Jan 2012 19:45:26 +0000 (11:45 -0800)]
test: add script for checking admin socket 'objecter_requests' output

Just a couple internal consistency checks for now. More specific ones
would depend on workload.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoobjecter: add an admin socket command to get in-flight requests
Josh Durgin [Mon, 23 Jan 2012 21:04:14 +0000 (13:04 -0800)]
objecter: add an admin socket command to get in-flight requests

Fixes: #1881
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoadmin socket: increase debug level for successful requests
Josh Durgin [Mon, 23 Jan 2012 23:04:17 +0000 (15:04 -0800)]
admin socket: increase debug level for successful requests

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoadmin socket: add include guard
Josh Durgin [Sat, 21 Jan 2012 01:02:52 +0000 (17:02 -0800)]
admin socket: add include guard

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoCephContext: add method for retrieving admin socket
Josh Durgin [Fri, 20 Jan 2012 23:58:37 +0000 (15:58 -0800)]
CephContext: add method for retrieving admin socket

This is needed to allow higher layers in the stack to add admin socket
commands.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoMerge branch 'wip-pg-stale'
Sage Weil [Sat, 28 Jan 2012 00:40:53 +0000 (16:40 -0800)]
Merge branch 'wip-pg-stale'

13 years agomon: stale pgs -> HEALTH_WARN
Sage Weil [Fri, 27 Jan 2012 21:27:27 +0000 (13:27 -0800)]
mon: stale pgs -> HEALTH_WARN

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: mark pgs stale in pg_map if primary osd is down
Sage Weil [Fri, 27 Jan 2012 21:21:39 +0000 (13:21 -0800)]
mon: mark pgs stale in pg_map if primary osd is down

This alerts the administrator when all OSDs for a PG have failed and the
monitor doesn't receive any further updates.  Otherwise we may continue
to think a pg is active+clean when it is in fact offline.

Fixes: #1993
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: add STALE pg state bit
Sage Weil [Fri, 27 Jan 2012 21:02:28 +0000 (13:02 -0800)]
osd: add STALE pg state bit

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agov0.41 v0.41
Sage Weil [Fri, 27 Jan 2012 18:42:21 +0000 (10:42 -0800)]
v0.41

13 years agoobjecter: document Objecter::init_ops()
Sage Weil [Fri, 27 Jan 2012 20:23:33 +0000 (12:23 -0800)]
objecter: document Objecter::init_ops()

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoobjecter: fix out_* initialization
Sage Weil [Fri, 27 Jan 2012 20:23:23 +0000 (12:23 -0800)]
objecter: fix out_* initialization

This looks more like the real cause for #1986.  Op ctor gets a vector of
ops but out_* aren't initialized to match.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoRevert "common/Throttle: Remove unused return type on Throttle::get()"
Greg Farnum [Thu, 12 Jan 2012 19:27:55 +0000 (11:27 -0800)]
Revert "common/Throttle: Remove unused return type on Throttle::get()"

This reverts commit 4549501c9b0968ce4243e06ff7e9ef03b19de667.
We're about to use it to avoid a time lookup if possible.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoosd: remove unused PG::block_if_wrlocked declaration
Greg Farnum [Wed, 25 Jan 2012 23:58:49 +0000 (15:58 -0800)]
osd: remove unused PG::block_if_wrlocked declaration

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agofilestore: dump offending transaction on any error
Sage Weil [Fri, 27 Jan 2012 18:41:50 +0000 (10:41 -0800)]
filestore: dump offending transaction on any error

Clean this code up to explicitly whitelist what is ok so that the flow is
less annoying to follow/maintain, and so that we dump the transaction
contents on whitelisted errors.

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoobjecter: warn when OSD returns mismatched op vector
Sage Weil [Fri, 27 Jan 2012 18:40:14 +0000 (10:40 -0800)]
objecter: warn when OSD returns mismatched op vector

The osd shouldn't do this (even though we should tolerate it).

Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agoobjecter: fix bounds checking on op reply demuxing
Sage Weil [Fri, 27 Jan 2012 18:39:49 +0000 (10:39 -0800)]
objecter: fix bounds checking on op reply demuxing

We can't assume that the size of out_ops (from the reply) matches the
op->out_* vectors from our request state.  In particular, the out_ops might
be shorter than what we sent the OSD if the OSD was sloppy.  Check them.

We can assume that op->ops and op->out_* all match; assert as much in
op_submit().

Fixes: #1986
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
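
The bounds check described above can be sketched as below. Plain strings stand in for bufferlists, and the function name and parameters are hypothetical rather than the Objecter's actual types.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Copy per-op reply payloads into the caller's output slots. The reply may
// carry fewer ops than the request did, so only the common prefix is
// visited. Null slots mean the caller did not ask for that op's output.
std::size_t demux_reply(const std::vector<std::string>& reply_out_ops,
                        const std::vector<std::string*>& out_bl) {
  const std::size_t n = std::min(reply_out_ops.size(), out_bl.size());
  std::size_t copied = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (out_bl[i]) {
      *out_bl[i] = reply_out_ops[i];
      ++copied;
    }
  }
  return copied;
}
```

Indexing by the request's vector length alone is what goes wrong when the OSD returns a shorter reply.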
13 years agomds: remove test assert
Sage Weil [Fri, 27 Jan 2012 18:01:47 +0000 (10:01 -0800)]
mds: remove test assert

Grr!

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoassert: include timestamp
Sage Weil [Fri, 27 Jan 2012 14:32:29 +0000 (06:32 -0800)]
assert: include timestamp

Also drop quotes around thread id.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: remove the unused require_current_map
Greg Farnum [Wed, 25 Jan 2012 23:33:28 +0000 (15:33 -0800)]
osd: remove the unused require_current_map

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agofilestore: fix typo
Sage Weil [Wed, 25 Jan 2012 22:07:06 +0000 (14:07 -0800)]
filestore: fix typo

Grr

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge branch 'wip-kb'
Sage Weil [Wed, 25 Jan 2012 22:03:18 +0000 (14:03 -0800)]
Merge branch 'wip-kb'

Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agofilestore: zero btrfs vol_args prior to ioctl
Sage Weil [Wed, 25 Jan 2012 21:52:32 +0000 (13:52 -0800)]
filestore: zero btrfs vol_args prior to ioctl

Just to be paranoid.  Nothing we haven't set *should* affect the ABI,
but...

Always do this immediately after declaration so that we catch everything.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge remote branch 'upstream/wip-osd-clone-obc'
Samuel Just [Wed, 25 Jan 2012 21:58:58 +0000 (13:58 -0800)]
Merge remote branch 'upstream/wip-osd-clone-obc'

13 years agomon: num_kb -> num_bytes in cluster perfcounters
Sage Weil [Wed, 25 Jan 2012 20:38:59 +0000 (12:38 -0800)]
mon: num_kb -> num_bytes in cluster perfcounters

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: remove num_kb from object_stat_sum_t stats
Sage Weil [Wed, 25 Jan 2012 20:38:06 +0000 (12:38 -0800)]
osd: remove num_kb from object_stat_sum_t stats

This is redundant--we can just use num_bytes.  If we're worried about the
per-object overhead or rounding, we can factor in some overhead based on
num_objects.

And, the kb accounting has a bug (#1988).

Avoid changing the encoding at all for now.  Next time the encoding changes
we'll drop the old field.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: improve object context debug output
Sage Weil [Wed, 25 Jan 2012 17:56:58 +0000 (09:56 -0800)]
osd: improve object context debug output

Include pointer.  This may help with #1979.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: track obc for clone from log replay
Sage Weil [Wed, 25 Jan 2012 06:03:51 +0000 (22:03 -0800)]
osd: track obc for clone from log replay

We need to keep an in-memory obc to track the state of the in-flight io
to disk.  This is analogous to when an object is pushed + written, and we
can share the same completion function.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: set object_info_t::oid properly when recovering clones
Sage Weil [Wed, 25 Jan 2012 05:34:27 +0000 (21:34 -0800)]
osd: set object_info_t::oid properly when recovering clones

I saw a case (#1973) where the clone had the oid set to the head.  That is
clearly wrong.  Not sure what damage this caused.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote branch 'gh/wip-filestore-errors'
Sage Weil [Wed, 25 Jan 2012 05:19:44 +0000 (21:19 -0800)]
Merge remote branch 'gh/wip-filestore-errors'

13 years agopackage *.py* files
Alexandre Oliva [Tue, 17 Jan 2012 19:22:17 +0000 (17:22 -0200)]
package *.py* files

Some post-install rpmbuild defaults byte-compile all packaged python
files, so don't bother removing the .pyc files, and package .py* to
get both .pyo and .pyc.  It wastes a tiny little bit of space, but it
makes the spec file portable across a wider range of rpm and python
configurations.

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicam.br>
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agolibrbd: don't infinite loop when header is too large
Josh Durgin [Wed, 25 Jan 2012 00:52:27 +0000 (16:52 -0800)]
librbd: don't infinite loop when header is too large

Since snapshots are currently stored at the end of the header, having
many snapshots made the header larger than the read size, resulting in
an infinite loop when the offset was not changed.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
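
The failure mode is a read loop whose offset never advances. A minimal sketch of the corrected pattern follows, with an in-memory string standing in for the header object; the function name is hypothetical and `read_size` is assumed to be greater than zero.

```cpp
#include <cstddef>
#include <string>

// Read an object of unknown length in fixed-size chunks. The offset must
// advance by the amount actually read; re-reading from the same offset
// whenever a full chunk comes back is what produced the infinite loop.
std::string read_whole(const std::string& object, std::size_t read_size) {
  std::string out;
  std::size_t off = 0;
  for (;;) {
    std::string chunk = object.substr(off, read_size);
    out += chunk;
    off += chunk.size();
    if (chunk.size() < read_size)
      break;  // short read: reached the end
  }
  return out;
}
```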
13 years agoReplicatedPG: data_subset may be empty during sub_op_push
Samuel Just [Tue, 24 Jan 2012 22:57:07 +0000 (14:57 -0800)]
ReplicatedPG: data_subset may be empty during sub_op_push

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agofilestore: fix non-::-prefixed close
Josh Durgin [Tue, 24 Jan 2012 21:23:21 +0000 (13:23 -0800)]
filestore: fix non-::-prefixed close

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agofilestore: add debugging to each error case in lfn_open
Josh Durgin [Tue, 24 Jan 2012 21:20:20 +0000 (13:20 -0800)]
filestore: add debugging to each error case in lfn_open

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agofilestore: TEMP_FAILURE_RETRY on ::close(2)
Sage Weil [Tue, 24 Jan 2012 21:16:30 +0000 (13:16 -0800)]
filestore: TEMP_FAILURE_RETRY on ::close(2)

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilestore: return -errno from lfn_open
Sage Weil [Tue, 24 Jan 2012 17:31:39 +0000 (09:31 -0800)]
filestore: return -errno from lfn_open

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilestore: audit + clean up error checks
Sage Weil [Tue, 24 Jan 2012 17:31:33 +0000 (09:31 -0800)]
filestore: audit + clean up error checks

- use temp var for errno
- in general return -errno from helpers

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
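
The two conventions listed above look like this in practice. The helper name is made up for the example; only `open(2)` and `errno` are real.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Returns a file descriptor on success or a negative errno value on failure.
int open_readonly(const char* path) {
  int fd = ::open(path, O_RDONLY);
  if (fd < 0) {
    int err = errno;  // save first: later calls (logging, cleanup) may overwrite errno
    // ... logging or cleanup would go here ...
    return -err;
  }
  return fd;
}
```

Callers can then test `r < 0` and pass `-r` to `strerror()` without depending on the global `errno` still being intact.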
13 years agoMerge commit '9dc7b9233b985bf859751fc89a5b02253e829836'
Sage Weil [Mon, 23 Jan 2012 21:50:19 +0000 (13:50 -0800)]
Merge commit '9dc7b9233b985bf859751fc89a5b02253e829836'

Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
13 years agorgw: fix warning
Sage Weil [Mon, 23 Jan 2012 20:48:46 +0000 (12:48 -0800)]
rgw: fix warning

rgw/rgw_rest.cc:258: warning: comparison between signed and unsigned integer expressions

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoceph: bail out on first failing command
Sage Weil [Mon, 23 Jan 2012 20:43:19 +0000 (12:43 -0800)]
ceph: bail out on first failing command

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoceph: don't write output on error
Sage Weil [Mon, 23 Jan 2012 20:43:03 +0000 (12:43 -0800)]
ceph: don't write output on error

Accumulate all output, and write it at the end.  This way we can avoid
writing it if any of the commands fail.

Fixes: #1954
Signed-off-by: Sage Weil <sage@newdream.net>
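
Taken together with the previous commit (bail out on the first failing command), the behaviour can be sketched as follows. The `Command` type and `run_all` are illustrative assumptions, not the ceph tool's actual structure.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// A command writes its output to the given stream and returns 0 on success
// or a negative error code.
using Command = std::function<int(std::ostream&)>;

// Runs commands in order, buffering their output. *out is only assigned if
// every command succeeds; on the first failure we stop and return its code.
int run_all(const std::vector<Command>& cmds, std::string* out) {
  std::ostringstream buf;
  for (const auto& cmd : cmds) {
    int r = cmd(buf);
    if (r < 0)
      return r;
  }
  *out = buf.str();
  return 0;
}
```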
13 years agoosd: ignore MInfoRec, MNotifyRec in WaitActingChange
Sage Weil [Mon, 23 Jan 2012 18:21:04 +0000 (10:21 -0800)]
osd: ignore MInfoRec, MNotifyRec in WaitActingChange

We should ignore logs, infos, and notifies while we are waiting for the
map to change.  Peering has reached a dead-end (we need acting to change)
and we will redo our work when that happens.  That includes the replicas
resending notifies.

Fixes: #1958
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agorgw: fix warning in 32bit arch
Yehuda Sadeh [Mon, 23 Jan 2012 17:50:56 +0000 (09:50 -0800)]
rgw: fix warning in 32bit arch