git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/commit
ECBackend: Don't directly use get_recovery_chunk_size() in RecoveryOp::WRITING state. 2184/head
author Ma Jianpeng <jianpeng.ma@intel.com>
Wed, 30 Jul 2014 03:03:17 +0000 (11:03 +0800)
committer Ma Jianpeng <jianpeng.ma@intel.com>
Mon, 4 Aug 2014 03:09:32 +0000 (11:09 +0800)
commit 076f33afb393f1b733d14c1eff99a9818b086d4f
tree 08f48fea146539edea3763d1f3809e59e86d22e3
parent 6e52efabc2244cf254441aca696be31fb8173c46
ECBackend: Don't directly use get_recovery_chunk_size() in RecoveryOp::WRITING state.

We cannot guarantee that conf->osd_recovery_max_chunk doesn't change while
recovering an erasure-coded object.
If it changes between RecoveryOp::READING and RecoveryOp::WRITING, it can
trigger this assertion failure:

2014-07-30 10:12:09.599220 7f7ff26c0700 -1 osd/ECBackend.cc: In function
'void ECBackend::continue_recovery_op(ECBackend::RecoveryOp&,
RecoveryMessages*)' thread 7f7ff26c0700 time 2014-07-30 10:12:09.596837
osd/ECBackend.cc: 529: FAILED assert(pop.data.length() ==
sinfo.aligned_logical_offset_to_chunk_offset(
after_progress.data_recovered_to -
op.recovery_progress.data_recovered_to))

 ceph version 0.83-383-g3cfda57
(3cfda577b15039cb5c678b79bef3e561df826ed1)
 1: (ECBackend::continue_recovery_op(ECBackend::RecoveryOp&,RecoveryMessages*)+0x1a50) [0x928070]
 2: (ECBackend::handle_recovery_read_complete(hobject_t const&,
boost::tuples::tuple<unsigned long, unsigned long, std::map<pg_shard_t,
ceph::buffer::list, std::less<pg_shard_t>,
std::allocator<std::pair<pg_shard_t const, ceph::buffer::list> > >,
boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type, boost::tuples::null_type,
boost::tuples::null_type>&, boost::optional<std::map<std::string,
ceph::buffer::list, std::less<std::string>,
std::allocator<std::pair<std::string const, ceph::buffer::list> > > >,
RecoveryMessages*)+0x90c) [0x92952c]
 3: (OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*,
ECBackend::read_result_t&>&)+0x121) [0x938481]
 4: (GenContext<std::pair<RecoveryMessages*,
ECBackend::read_result_t&>&>::complete(std::pair<RecoveryMessages*,
ECBackend::read_result_t&>&)+0x9) [0x929d69]
 5: (ECBackend::complete_read_op(ECBackend::ReadOp&,RecoveryMessages*)+0x63) [0x91c6e3]
 6: (ECBackend::handle_sub_read_reply(pg_shard_t, ECSubReadReply&,RecoveryMessages*)+0x96d) [0x920b4d]
 7: (ECBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x17e)[0x92884e]
 8: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&,ThreadPool::TPHandle&)+0x23b) [0x7b34db]
 9: (OSD::dequeue_op(boost::intrusive_ptr<PG>,std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x428)
[0x638d58]
 10: (OSD::ShardedOpWQ::_process(unsigned int,ceph::heartbeat_handle_d*)+0x346) [0x6392f6]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ce)[0xa5caae]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xa5ed00]
 13: (()+0x8182) [0x7f800b5d3182]
 14: (clone()+0x6d) [0x7f800997430d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.

So we only call get_recovery_chunk_size() in RecoveryOp::READING and
record the result in RecoveryOp::extent_requested.
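
The idea can be sketched in isolation as follows. This is a simplified
model, not the actual ECBackend code: the Config struct, the reduced
RecoveryOp, and the start_read()/finish_read() helpers are hypothetical
names used only for illustration. The chunk size is sampled once when the
read is issued and stored in extent_requested; the later step advances by
the stored length instead of consulting the config again.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the runtime-tunable OSD configuration.
struct Config {
  uint64_t osd_recovery_max_chunk;
};

// Simplified model of a recovery operation (not the real ECBackend::RecoveryOp).
struct RecoveryOp {
  enum State { IDLE, READING, WRITING };
  State state = IDLE;
  uint64_t data_recovered_to = 0;
  // (offset, length) of the extent requested when the read was issued.
  std::pair<uint64_t, uint64_t> extent_requested{0, 0};
};

// Round the configured chunk size up to a multiple of the stripe width.
uint64_t get_recovery_chunk_size(const Config& conf, uint64_t stripe_width) {
  uint64_t c = conf.osd_recovery_max_chunk;
  return ((c + stripe_width - 1) / stripe_width) * stripe_width;
}

// Issue the read: sample the config exactly once and record the result.
void start_read(RecoveryOp& op, const Config& conf, uint64_t stripe_width) {
  assert(op.state == RecoveryOp::IDLE);
  op.extent_requested = {op.data_recovered_to,
                         get_recovery_chunk_size(conf, stripe_width)};
  op.state = RecoveryOp::READING;
}

// Read finished: advance by the recorded length, never by a fresh config
// lookup, so a config change in between cannot cause a size mismatch.
uint64_t finish_read(RecoveryOp& op, uint64_t bytes_read) {
  assert(op.state == RecoveryOp::READING);
  uint64_t advanced = op.extent_requested.second;
  assert(bytes_read == advanced);  // loosely analogous to the failed assert
  op.data_recovered_to += advanced;
  op.state = RecoveryOp::WRITING;
  return advanced;
}
```

With this shape, changing osd_recovery_max_chunk after start_read() has no
effect on the in-flight operation; the new value only applies to the next
extent that is requested.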

Signed-off-by: Ma Jianpeng <jianpeng.ma@intel.com>
src/osd/ECBackend.cc