Sage Weil [Wed, 1 Jul 2009 22:43:46 +0000 (15:43 -0700)]
uclient: fix kickback, reply handler logic
The kickback cond test used to look at the request map, but since the
reply handler removes that, it was never true. Instead, clear the
dispatch_cond pointer.
Also fix up the reply handler logic. Any reply implies at least an unsafe
reply. If it is the first reply, signal the calling thread.
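A minimal sketch of the kickback and reply logic described above, assuming a simplified request struct. The names `Request`, `got_unsafe`, `got_safe`, `handle_reply`, `kick_request` and `wait_for_reply` are hypothetical and are not the uclient's actual code. The waiter tells a kick from a reply by checking whether `dispatch_cond` was cleared, never by consulting the request map.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical, simplified request state; names are illustrative only.
struct Request {
  std::condition_variable *dispatch_cond = nullptr;  // set while a caller waits
  bool got_unsafe = false;  // set by any reply
  bool got_safe = false;    // set only by the safe (committed) reply
};

// Reply handler: any reply implies unsafe; only the first reply signals.
void handle_reply(std::mutex &lock, Request &req, bool is_safe) {
  std::lock_guard<std::mutex> l(lock);
  bool first = !req.got_unsafe;
  req.got_unsafe = true;
  if (is_safe)
    req.got_safe = true;
  if (first && req.dispatch_cond)
    req.dispatch_cond->notify_all();
}

// Kickback: wake the caller and clear dispatch_cond, so the caller can tell
// it was kicked without looking at the request map.
void kick_request(std::mutex &lock, Request &req) {
  std::lock_guard<std::mutex> l(lock);
  if (req.dispatch_cond) {
    req.dispatch_cond->notify_all();
    req.dispatch_cond = nullptr;
  }
}

// Caller: returns true once any reply arrives, false if kicked (resend).
bool wait_for_reply(std::mutex &lock, std::condition_variable &cond,
                    Request &req) {
  std::unique_lock<std::mutex> l(lock);
  req.dispatch_cond = &cond;
  cond.wait(l, [&] { return req.got_unsafe || req.dispatch_cond == nullptr; });
  bool replied = req.got_unsafe;
  req.dispatch_cond = nullptr;
  return replied;
}
```

Because the predicate is checked under the lock, a reply that lands before the caller starts waiting is not lost.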
Sage Weil [Wed, 1 Jul 2009 18:41:38 +0000 (11:41 -0700)]
osd: make write mode per-PG
We can't do it per-object because the access mode determines the order
we append to the log, and that has to be sequential. It has to be per-PG,
unless a whole ton of other stuff is reworked.
This lets us capture the best access mode at least on a per-pool basis,
instead of imposing a global default.
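A rough sketch of where the mode lives after this change, assuming simplified hypothetical types (`WriteMode`, `Pool`, `PG`). This is not Ceph's PG class. Each PG takes its mode from its pool, and every write in the PG goes through that one mode, so appends to the PG log stay sequential.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified types; not Ceph's actual PG or pool structures.
enum class WriteMode { RMW, DELAYED };

struct Pool {
  WriteMode default_mode;  // chosen per pool rather than globally
};

struct PG {
  WriteMode mode;             // one mode for every object in this PG
  std::vector<uint64_t> log;  // PG log, appended strictly in order

  explicit PG(const Pool &pool) : mode(pool.default_mode) {}

  // Every write in the PG takes the same path to the log, so appends
  // must (and can) remain strictly sequential.
  void append(uint64_t version) {
    assert(log.empty() || log.back() < version);
    log.push_back(version);
  }
};
```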
Sage Weil [Fri, 26 Jun 2009 17:13:07 +0000 (10:13 -0700)]
buffer: throw exceptions instead of always asserting.
We still assert when the user is doing something wrong. We throw
exceptions for failed memory allocs and for buffer overruns. I
think that'll align with usage, especially the encoding/decoding.
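A minimal sketch of the assert-versus-throw split, assuming a toy `Buffer` class and hypothetical exception names (`buffer_error`, `end_of_buffer`). These are illustrative and not necessarily the buffer code's real types. Caller misuse still asserts. An overrun, such as decoding a truncated message, throws something the decoder's caller can catch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <exception>
#include <vector>

// Hypothetical exception types; names are assumptions for this sketch.
struct buffer_error : public std::exception {
  const char *what() const noexcept override { return "buffer error"; }
};
struct end_of_buffer : public buffer_error {
  const char *what() const noexcept override { return "end of buffer"; }
};

class Buffer {
 public:
  // std::vector already reports a failed allocation by throwing std::bad_alloc.
  explicit Buffer(size_t len) : data_(len) {}

  void copy_out(size_t off, size_t len, char *dest) const {
    assert(dest != nullptr);  // caller misuse: still an assert
    if (off > data_.size() || len > data_.size() - off)
      throw end_of_buffer();  // overrun: recoverable, so throw
    std::memcpy(dest, data_.data() + off, len);
  }

 private:
  std::vector<char> data_;
};

// Decoding a truncated message raises an exception instead of aborting.
uint32_t decode_u32(const Buffer &b, size_t &off) {
  uint32_t v;
  b.copy_out(off, sizeof(v), reinterpret_cast<char *>(&v));
  off += sizeof(v);  // only advanced if the read succeeded
  return v;
}
```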
Greg Farnum [Thu, 25 Jun 2009 19:05:15 +0000 (12:05 -0700)]
messages/MClass[Ack]: Roll back some unification.
version_t last and PaxosServiceMessage::version shouldn't
be the same in these messages. Remove that unification and add a new
constructor that does set the version (though it isn't needed).
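A sketch of the resulting constructor shape, assuming a stripped-down stand-in for `PaxosServiceMessage` and a hypothetical `MClassAck`-like struct. The real message classes carry more fields and encoding logic. The point is that `last` is its own field and the Paxos `version` is only set by the extra constructor.

```cpp
#include <cassert>
#include <cstdint>

typedef uint64_t version_t;

// Simplified stand-in for PaxosServiceMessage; not the real class.
struct PaxosServiceMessage {
  version_t version;
  explicit PaxosServiceMessage(version_t v) : version(v) {}
};

// Hypothetical MClassAck-like message: 'last' is no longer aliased to
// PaxosServiceMessage::version.
struct MClassAck : public PaxosServiceMessage {
  version_t last;
  // Common case: only 'last' is set; the Paxos version stays 0.
  explicit MClassAck(version_t l) : PaxosServiceMessage(0), last(l) {}
  // Also sets the Paxos version; available, but callers don't need it.
  MClassAck(version_t l, version_t v) : PaxosServiceMessage(v), last(l) {}
};
```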
Sage Weil [Thu, 25 Jun 2009 20:01:16 +0000 (13:01 -0700)]
osd: update primary's notion of peer last_update on activate
We are pushing the peer the log to bring it up to date, so
update our peer_info[peer].last_update to match. Otherwise,
we get confused if we get, say, stray content and peer() is
called later, and we have out-of-date peer stats.
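A minimal sketch of the bookkeeping change, assuming hypothetical `Primary` and `PeerInfo` structs and a simplified `eversion_t`. Actually sending the log is elided. After pushing the missing log to a peer, the primary records that the peer's `last_update` now matches its own, so a later `peer()` does not reason from stale values.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Simplified eversion_t: (epoch, version).
struct eversion_t {
  uint32_t epoch;
  uint64_t version;
  eversion_t(uint32_t e = 0, uint64_t v = 0) : epoch(e), version(v) {}
  bool operator==(const eversion_t &o) const {
    return epoch == o.epoch && version == o.version;
  }
};

struct PeerInfo {
  eversion_t last_update;
};

// Hypothetical primary-side state.
struct Primary {
  eversion_t last_update;
  std::map<int, PeerInfo> peer_info;

  void activate_peer(int peer) {
    assert(peer_info.count(peer));  // activating an unknown peer is a bug
    // (Sending the entries newer than the peer's last_update is elided.)
    // The push brings the peer up to date, so record that now.
    peer_info[peer].last_update = last_update;
  }
};
```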
Sage Weil [Thu, 25 Jun 2009 19:45:35 +0000 (12:45 -0700)]
osd: force RMW ordering globally
We can't mix RMW and DELAYED in the same PG without screwing
up the ordering of writes, the pg log, and so forth.
So force RMW throughout. This won't affect the mds log
appends because the client is constant. It will slow down
concurrent writes to the same object by multiple clients, but
we don't have many (any?) of those yet.
Sage Weil [Thu, 25 Jun 2009 03:43:01 +0000 (20:43 -0700)]
osd: store snapset in _snapdir object if head dne
If the _head doesn't logically exist, we can't keep it around just for
the SnapSet or else an 'ls' will have to stat in order to tell if the
head object logically exists and should be included. That's no good,
so:
- put snapset in SS_ATTR on head if it exists
- otherwise, put it in SS_ATTR on a _snapdir object
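A sketch of the placement rule above over a toy in-memory store. The `Store` type, the `"snapset"` attribute value for `SS_ATTR`, the `/head` and `/snapdir` name suffixes, and removing a stale snapdir copy when the head exists are all assumptions made for illustration. A head object then exists only when the head logically exists, so listing needs no extra stat.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <string>

// Hypothetical in-memory store: object name -> (attr name -> value).
using Store = std::map<std::string, std::map<std::string, std::string>>;

const std::string SS_ATTR = "snapset";  // attr name assumed for this sketch

// Keep the SnapSet on the head only if the head logically exists;
// otherwise drop the head and keep the SnapSet on a snapdir object.
void store_snapset(Store &store, const std::string &oid, bool head_exists,
                   const std::string &snapset) {
  if (head_exists) {
    store[oid + "/head"][SS_ATTR] = snapset;
    store.erase(oid + "/snapdir");  // assumption: no stale snapdir copy
  } else {
    store.erase(oid + "/head");  // no placeholder head for 'ls' to stat
    store[oid + "/snapdir"][SS_ATTR] = snapset;
  }
}

// Readers check the head first, then fall back to the snapdir object.
std::string load_snapset(const Store &store, const std::string &oid) {
  for (const char *suffix : {"/head", "/snapdir"}) {
    auto obj = store.find(oid + suffix);
    if (obj == store.end())
      continue;
    auto attr = obj->second.find(SS_ATTR);
    if (attr != obj->second.end())
      return attr->second;
  }
  return std::string();
}
```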
Sage Weil [Wed, 24 Jun 2009 18:17:55 +0000 (11:17 -0700)]
osd: adjust recovery op accounting; explicitly track set of recovering objects
Use a single {start,finish}_recovery_op() func to start and stop
recovery ops, so that there is a single point for counter adjustments
to occur. On reset, simply call into the OSD multiple times.
Also maintain a set<sobject_t> in each PG and on the OSD to track
the set of objects that are recovering. This can hopefully be
compiled out once all the bugs are identified.
We are chasing this:
osd/OSD.cc:3465: FAILED assert(recovery_ops_active >= 0)
1: ./cosd(_Z18__ceph_assert_failPKcS0_iS0_+0x3a) [0x7a769b]
2: ./cosd(_ZN3OSD18finish_recovery_opEP2PGib+0x148) [0x696bce]
3: ./cosd(_ZN12ReplicatedPG18finish_recovery_opEv+0x77) [0x6359c5]
4: ./cosd(_ZN12ReplicatedPG17sub_op_push_replyEP14MOSDSubOpReply+0x540) [0x63628a]
5: ./cosd(_ZN12ReplicatedPG15do_sub_op_replyEP14MOSDSubOpReply+0x64) [0x6407fe]
6: ./cosd(_ZN3OSD10dequeue_opEP2PG+0x224) [0x6996ee]
7: ./cosd(_ZN3OSD4OpWQ8_processEP2PG+0x21) [0x70d175]
8: ./cosd(_ZN10ThreadPool9WorkQueueI2PGE13_void_processEPv+0x28) [0x6c9f78]
9: ./cosd(_ZN10ThreadPool6workerEv+0x280) [0x7a825c]
10: ./cosd(_ZN10ThreadPool10WorkThread5entryEv+0x19) [0x70cb9f]
11: ./cosd(_ZN6Thread11_entry_funcEPv+0x20) [0x629d48]
12: /lib/libpthread.so.0 [0x7f2f1e3f33f7]
13: /lib/libc.so.6(clone+0x6d) [0x7f2f1d9c294d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
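A sketch of the single-entry-point accounting described above, assuming hypothetical `OSDRecovery` and `PGRecovery` structs and using `std::string` as a stand-in for `sobject_t`. Every counter change goes through `start_recovery_op()` or `finish_recovery_op()`. The tracked sets make a double finish, the kind of bug behind the `recovery_ops_active >= 0` assert, fail at the call that causes it.

```cpp
#include <cassert>
#include <set>
#include <string>

using sobject_t = std::string;  // stand-in for Ceph's sobject_t

// Hypothetical OSD-wide accounting; the set is a debugging aid.
struct OSDRecovery {
  int recovery_ops_active = 0;
  std::set<sobject_t> recovering;

  void start_recovery_op(const sobject_t &soid) {
    bool inserted = recovering.insert(soid).second;
    assert(inserted);  // starting the same object twice is a bug
    (void)inserted;
    ++recovery_ops_active;
  }
  void finish_recovery_op(const sobject_t &soid) {
    auto erased = recovering.erase(soid);
    assert(erased == 1);  // finishing an op that never started is a bug
    (void)erased;
    --recovery_ops_active;
    assert(recovery_ops_active >= 0);
  }
};

// Hypothetical per-PG side: all adjustments funnel through these calls.
struct PGRecovery {
  OSDRecovery *osd = nullptr;
  std::set<sobject_t> recovering;

  void start_recovery_op(const sobject_t &soid) {
    recovering.insert(soid);
    osd->start_recovery_op(soid);
  }
  void finish_recovery_op(const sobject_t &soid) {
    recovering.erase(soid);
    osd->finish_recovery_op(soid);
  }
  // On reset, release each in-flight op through the same entry point.
  void reset() {
    while (!recovering.empty()) {
      sobject_t soid = *recovering.begin();  // copy: erase() frees the node
      finish_recovery_op(soid);
    }
  }
};
```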
Sage Weil [Wed, 24 Jun 2009 05:06:24 +0000 (22:06 -0700)]
osd: fix merge_log when log and olog share bottom
If log has 6'10 and olog has 7'10, on the same object, merge_log
was failing to throw out log's 6'10 entry because the
last_kept iterator was still end(). Use a simple eversion_t
instead, and simplify existing (and otherwise correct)
log.bottom logic, but without the last_kept != end() guard
that threw us off.
09.06.23 16:52:56.032981 1145465168 osd4 485 pg[1.cd( v 469'11021/469'11021 (469'11020,469'11021] n=8151 ec=2 les=476 485/480) r=0 lcod 0'0 mlcod 469'11021 !hml crashed+peering] merge_log log(469'11020,476'11021] from osd0 into log(469'11020,469'11021]
09.06.23 16:52:56.033001 1145465168 osd4 485 pg[1.cd( v 469'11021/469'11021 (469'11020,469'11021] n=8151 ec=2 les=476 485/480) r=0 lcod 0'0 mlcod 469'11021 !hml crashed+peering] merge_log extending top to 476'11021
09.06.23 16:52:56.033033 1145465168 osd4 485 pg[1.cd( v 469'11021/469'11021 (469'11020,469'11021] n=8151 ec=2 les=476 485/480) r=0 lcod 0'0 mlcod 469'11021 !hml crashed+peering] ? 476'11021 (0'0) m 10001641d24.00000000/head by mds0.16:33860 09.06.23 16:50:28.931949
09.06.23 16:52:56.033057 1145465168 osd4 485 pg[1.cd( v 469'11021/469'11021 (469'11020,469'11021] n=8151 ec=2 les=476 485/480) r=0 lcod 0'0 mlcod 469'11021 !hml crashed+peering] merge_log 476'11021 (0'0) m 10001641d24.00000000/head by mds0.16:33860 09.06.23 16:50:28.931949
09.06.23 16:52:56.033090 1145465168 osd4 485 pg[1.cd( v 476'11021/469'11021 (469'11020,476'11021] n=8151 ec=2 les=476 485/480) r=0 lcod 0'0 mlcod 469'11021 !hml crashed+peering m=1 l=1] merge_log result log(469'11020,476'11021] missing(1) changed=1
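A simplified sketch of the fix for logs that share a bottom, assuming a toy `Log` of `eversion_t` values ordered by epoch, then version. This is not Ceph's `merge_log`, which also tracks missing objects and more. Tracking `last_kept` as a plain `eversion_t` that defaults to the shared bottom means "no entry in common" still throws out the divergent local entry.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <list>

// Simplified eversion_t: ordered by epoch, then version.
struct eversion_t {
  uint32_t epoch;
  uint64_t version;
  eversion_t(uint32_t e = 0, uint64_t v = 0) : epoch(e), version(v) {}
  bool operator==(const eversion_t &o) const {
    return epoch == o.epoch && version == o.version;
  }
  bool operator<(const eversion_t &o) const {
    return epoch != o.epoch ? epoch < o.epoch : version < o.version;
  }
};

// Hypothetical log: entries are ascending and all newer than 'bottom'.
struct Log {
  eversion_t bottom;
  std::list<eversion_t> entries;
};

// Simplified merge for logs that share a bottom (not Ceph's merge_log).
void merge_log(Log &log, const Log &olog) {
  assert(log.bottom == olog.bottom);
  // Newest local entry that olog also has; defaults to the shared bottom
  // rather than an end() iterator.
  eversion_t last_kept = log.bottom;
  for (auto it = log.entries.rbegin(); it != log.entries.rend(); ++it) {
    if (std::find(olog.entries.begin(), olog.entries.end(), *it) !=
        olog.entries.end()) {
      last_kept = *it;
      break;
    }
  }
  // Local entries above last_kept are divergent: throw them out.
  while (!log.entries.empty() && last_kept < log.entries.back())
    log.entries.pop_back();
  // Then take olog's entries above last_kept.
  for (const eversion_t &e : olog.entries)
    if (last_kept < e)
      log.entries.push_back(e);
}
```

With log (6'9,6'10] and olog (6'9,7'10], where the shared bottom 6'9 is assumed for the example, `last_kept` stays at the bottom, 6'10 is thrown out, and 7'10 is taken from olog.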
Sage Weil [Tue, 23 Jun 2009 04:32:09 +0000 (21:32 -0700)]
osd: stop rewinding replica log when we reach log.bottom
We stop rewinding a replica log when we reach our own
log.bottom, because we don't know enough to do so in any
meaningful way, and because we can assume it is not
divergent at that point (barring any complete screwupedness).
Also, if we do change last_update, make sure last_complete is
rewound too.
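A minimal sketch of the two rules above, assuming hypothetical `ReplicaLog` and `ReplicaInfo` structs and a simplified `eversion_t`. The rewind target is clamped to the primary's own `log.bottom`. Whenever `last_update` moves back, `last_complete` is pulled back with it so it never exceeds `last_update`.

```cpp
#include <cassert>
#include <cstdint>
#include <list>

// Simplified eversion_t: ordered by epoch, then version.
struct eversion_t {
  uint32_t epoch;
  uint64_t version;
  eversion_t(uint32_t e = 0, uint64_t v = 0) : epoch(e), version(v) {}
  bool operator==(const eversion_t &o) const {
    return epoch == o.epoch && version == o.version;
  }
  bool operator<(const eversion_t &o) const {
    return epoch != o.epoch ? epoch < o.epoch : version < o.version;
  }
};

// Hypothetical view of a replica's log and info as tracked by the primary.
struct ReplicaLog {
  std::list<eversion_t> entries;  // ascending
};
struct ReplicaInfo {
  eversion_t last_update;
  eversion_t last_complete;
};

// Rewind the replica log to newhead, but never below our own log.bottom:
// past that point we can't judge divergence, so assume there is none.
void rewind_replica_log(ReplicaLog &log, ReplicaInfo &info, eversion_t newhead,
                        const eversion_t &our_bottom) {
  if (newhead < our_bottom)
    newhead = our_bottom;
  while (!log.entries.empty() && newhead < log.entries.back())
    log.entries.pop_back();
  if (newhead < info.last_update) {
    info.last_update = newhead;
    if (newhead < info.last_complete)
      info.last_complete = newhead;  // rewind last_complete too
  }
  assert(!(info.last_update < info.last_complete));
}
```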