The ruleset created for an erasure coded pool has max_size set to a
fixed value of 20. This is wrong when more than 20 chunks are needed
and leads to obscure errors. Set max_size to the number of chunks
instead, i.e. k+m most of the time.

In a cluster with few OSDs (9 for instance), setting max_size to 20
causes performance problems when a new crushmap is injected. The
monitor calls CrushTester::test, which tries 1024 mappings for each
size ranging from min_size to max_size. Each attempt to map more OSDs
than are available exhausts all retries (50 by default), which takes a
significant amount of time. In a cluster with 9 OSDs, testing one such
ruleset can take up to 5 seconds.

Since the test blocks the monitor leader, a handful of erasure coded
rulesets is enough to stall it long enough to exceed the timeouts and
trigger an election.
http://tracker.ceph.com/issues/10363 Fixes: #10363
Signed-off-by: Loic Dachary <ldachary@redhat.com>
ruleno = crush_add_rule(crush, n, ruleno);
return ruleno;
}
+ int set_rule_mask_max_size(unsigned ruleno, int max_size) {
+ crush_rule *r = get_rule(ruleno);
+ if (IS_ERR(r)) return -1;
+ return r->mask.max_size = max_size;
+ }
int set_rule_step(unsigned ruleno, unsigned step, int op, int arg1, int arg2) {
if (!crush) return -ENOENT;
crush_rule *n = get_rule(ruleno);
if (ruleid < 0)
return ruleid;
- else
+ else {
+ crush.set_rule_mask_max_size(ruleid, get_chunk_count());
return crush.get_rule_mask_ruleset(ruleid);
+ }
}
// -----------------------------------------------------------------------------
"indep", pg_pool_t::TYPE_ERASURE, ss);
if (ruleid < 0)
return ruleid;
- else
+ else {
+ crush.set_rule_mask_max_size(ruleid, get_chunk_count());
return crush.get_rule_mask_ruleset(ruleid);
+ }
}
void ErasureCodeJerasure::init(const map<string,string> &parameters)
int steps = 4 + ruleset_steps.size();
int min_rep = 3;
- int max_rep = 30;
+ int max_rep = get_chunk_count();
int ret;
ret = crush.add_rule(steps, ruleset, pg_pool_t::TYPE_ERASURE,
min_rep, max_rep, rno);