]> git.apps.os.sepia.ceph.com Git - ceph-ci.git/commit
crimson/osd: don't assume a pull must happen if there is no push.
authorRadoslaw Zarzynski <rzarzyns@redhat.com>
Mon, 6 Dec 2021 16:54:41 +0000 (16:54 +0000)
committerRadoslaw Zarzynski <rzarzyns@redhat.com>
Mon, 6 Dec 2021 17:34:20 +0000 (17:34 +0000)
commit3f745b9eedb0e59d13212fce701e53c34f4b0d52
tree346650b728c89656b086e8fe084310524caec20e
parent5e56258a349bedb00fd23a2662612d42b8465624
crimson/osd: don't assume a pull must happen if there is no push.

In the classical OSD the `ReplicatedRecoveryBackend::recover_object()`
divides into two main flows: pull and push:

```cpp
int ReplicatedBackend::recover_object(
  const hobject_t &hoid,
  // ...
  )
{
  dout(10) << __func__ << ": " << hoid << dendl;
  RPGHandle *h = static_cast<RPGHandle *>(_h);
  if (get_parent()->get_local_missing().is_missing(hoid)) {
    ceph_assert(!obc);
    // pull
    prepare_pull(
      v,
      hoid,
      head,
      h);
  } else {
    ceph_assert(obc);
    int started = start_pushes(
      hoid,
      obc,
      h);
    // ...
  }
  return 0;
}
```

Pulls may also enter the push path (`C_ReplicatedBackend_OnPullComplete`)
but push handling doesn't draw any assumption on that. What's important,
`recover_object()` may result in no pulls and pushes.

This isn't the case of crimson as its implementation of the push path
asserts that, if no push is scheduled, `PullInfo` must be allocated.
This patch reworks this logic to reflects the classical one and to avoid
crashes like the following one:

```
DEBUG 2021-12-01 18:43:00,220 [shard 0] osd - recover_object: loaded obc: 3:4e058a2e:::smithi13839607-45:head
WARN  2021-12-01 18:43:00,220 [shard 0] none - intrusive_ptr_add_ref(p=0x6190000d7f80, use_count=3)
WARN  2021-12-01 18:43:00,220 [shard 0] none - intrusive_ptr_release(p=0x6190000d7f80, use_count=4)
TRACE 2021-12-01 18:43:00,220 [shard 0] osd - call_with_interruption_impl clearing interrupt_cond: 0x60300012b210,N7crimson3osd20IOInterruptConditionE
TRACE 2021-12-01 18:43:00,220 [shard 0] osd - call_with_interruption_impl: may_interrupt: false, local interrupt_condintion: 0x60300012b210, global interrupt_cond: 0x0,N7crimson3osd20IOInterruptConditionE
TRACE 2021-12-01 18:43:00,220 [shard 0] osd - set: interrupt_cond: 0x60300012b210, ref_count: 1
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-8902-g52fd47fe/rpm/el8/BUILD/ceph-17.0.
0-8902-g52fd47fe/src/crimson/osd/replicated_recovery_backend.cc:84: ReplicatedRecoveryBackend::maybe_push_shards(const hobject_t&, eversion_t)::<lambda()>: Assertion `recovery.pi' failed.
Aborting on shard 0.
```

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
src/crimson/osd/replicated_recovery_backend.cc