author	Sage Weil <sage.weil@dreamhost.com>
	Tue, 25 Oct 2011 04:44:36 +0000 (21:44 -0700)
committer	Sage Weil <sage.weil@dreamhost.com>
	Tue, 25 Oct 2011 04:54:10 +0000 (21:54 -0700)
commit	b17c9ca595a140ce54163a2e7e8966552e9c8df1
tree	6c88e0c9738b52c81e91b315c66252d63544bc17
parent	7aa0d89bb9b7612a6b9593223df69aa200a68b2f

osd: handle missing/degraded in op thread

The _handle_op() method and its helpers are called both when an op is
initially queued and when it is requeued. The requeue case requires more
care because the caller may be in the middle of some other operation.
That means we must limit ourselves to queueing or discarding the op, and
refrain from doing anything else with dangerous side effects.
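
The split described above can be illustrated with a minimal sketch. It is not the code from this commit: PGSketch, Op, Disposition, handle_op(), do_op() and every member name are hypothetical stand-ins, and the real OSD and ReplicatedPG logic is considerably more involved. The sketch only shows the idea that the handler reachable from a requeue does nothing but queue or discard, while the check for a missing object (which has side effects) happens later in the op thread.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <string>

// Hypothetical, simplified stand-ins; these are NOT Ceph's real types.
struct Op {
  std::string object;  // name of the object the op targets
};

enum class Disposition { Queued, Waiting, Discarded };

struct PGSketch {
  bool active = false;                 // has this PG been activated?
  std::set<std::string> missing;       // objects not yet recovered locally
  std::deque<Op> op_queue;             // ops ready for the op thread
  std::deque<Op> waiting_for_active;   // ops parked until activation
  std::deque<Op> waiting_for_missing;  // ops parked until recovery finishes
  int recoveries_started = 0;          // counts the side effect we care about

  // Called on initial queue AND on requeue. It only queues or discards,
  // so it is safe even if the caller is in the middle of other work.
  Disposition handle_op(const Op& op, bool valid) {
    if (!valid)
      return Disposition::Discarded;
    if (!active) {
      waiting_for_active.push_back(op);
      return Disposition::Waiting;
    }
    op_queue.push_back(op);
    return Disposition::Queued;
  }

  // Called from the op thread only. The missing-object check and its
  // side effect (starting recovery) live here, not in handle_op().
  // Returns true if the op could run, false if it must wait.
  bool do_op(const Op& op) {
    assert(active);  // ops reach the op thread only after activation
    if (missing.count(op.object)) {
      ++recoveries_started;
      waiting_for_missing.push_back(op);
      return false;
    }
    return true;
  }
};
```

In this sketch, requeueing an op for a missing object through handle_op() never starts recovery; recovery is started only when do_op() later dequeues the op in the op thread.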

This fixes a crash like

osd/ReplicatedPG.cc: In function 'void ReplicatedPG::recover_primary_got(hobject_t, eversion_t)', in thread '7f21d0189700'
osd/ReplicatedPG.cc: 4109: FAILED assert(missing.num_missing() == 0)
ceph version 0.37-105-gc2069eb (commit:c2069eb1e562ba7d753c9b5ce5c904f4f5ef6abe)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0x8ab95a]
2: (ReplicatedPG::recover_primary_got(hobject_t, eversion_t)+0x62e) [0x767eea]
3: (ReplicatedPG::sub_op_push(MOSDSubOp*)+0x2b79) [0x76abeb]
4: (ReplicatedPG::do_sub_op(MOSDSubOp*)+0x1ab) [0x74761b]
5: (OSD::dequeue_op(PG*)+0x47d) [0x820ac3]
6: (OSD::OpWQ::_process(PG*)+0x27) [0x82cc8b]

due to an object being pushed to a replica before it is activated.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>

src/osd/OSD.cc
src/osd/ReplicatedPG.cc