git.apps.os.sepia.ceph.com Git - ceph.git/commit
crimson/osd: remote peering requests wait for OSD activation.
author Radoslaw Zarzynski <rzarzyns@redhat.com>
Tue, 13 Jul 2021 12:09:39 +0000 (12:09 +0000)
committer Radoslaw Zarzynski <rzarzyns@redhat.com>
Tue, 13 Jul 2021 15:02:16 +0000 (15:02 +0000)
commit dfe68f21688e49d8f2fa5617a15998c87d5a3e79
tree 4b7edcb2935f25fb6c00683fda99397998a0ac31
parent 507ee678485efe698def4bec00a010fa82fb5c37

Before the patch, `RemotePeeringEvent` instances did not
wait for OSD activation. This bypassed the protection
against handling stale peering events that the `MOSDBoot`
machinery offers. The net result is crashes like the one
below (the `OSDState is booting` line was produced by a
custom debug statement):

```
2021-07-07T18:20:23.293 INFO:journalctl@ceph.osd.2.smithi145.stdout:Jul 07 18:16:30 smithi145 conmon[71083]: DEBUG 2021-07-07 18:16:30,535 [shard 0] ms - [osd.2(cluster) v2:172.21.15.145:6802/2@62336 >> osd.1 v2:172.21.15.145:6809/2] <== #19 === pg_lease(4.9 pg_lease(ru 60.120281219s ub 68.121276855s int 16.000000000s) e86/86) v1 (133)
2021-07-07T18:20:23.293 INFO:journalctl@ceph.osd.2.smithi145.stdout:Jul 07 18:16:30 smithi145 conmon[71083]: DEBUG 2021-07-07 18:16:30,536 [shard 0] osd - handle_peering_op on 4.9 from 1
2021-07-07T18:20:23.293 INFO:journalctl@ceph.osd.2.smithi145.stdout:Jul 07 18:16:30 smithi145 conmon[71083]: DEBUG 2021-07-07 18:16:30,536 [shard 0] osd - peering_event(id=125, detail=PeeringEvent(from=1 pgid=4.9 sent=86 requested=86 evt=epoch_sent: 86 epoch_requested: 86 MLease epoch 86 from osd.1 pg_lease(ru 60.120281219s ub 68.121276855s int 16.000000000s))): start
2021-07-07T18:20:23.293 INFO:journalctl@ceph.osd.2.smithi145.stdout:Jul 07 18:16:30 smithi145 conmon[71083]: DEBUG 2021-07-07 18:16:30,536 [shard 0] osd - peering_event(id=125, detail=PeeringEvent(from=1 pgid=4.9 sent=86 requested=86 evt=epoch_sent: 86 epoch_requested: 86 MLease epoch 86 from osd.1 pg_lease(ru 60.120281219s ub 68.121276855s int 16.000000000s))): got map 93
2021-07-07T18:20:23.294 INFO:journalctl@ceph.osd.2.smithi145.stdout:Jul 07 18:16:30 smithi145 conmon[71083]: DEBUG 2021-07-07 18:16:30,536 [shard 0] osd - peering_event(id=125, detail=PeeringEvent(from=1 pgid=4.9 sent=86 requested=86 evt=epoch_sent: 86 epoch_requested: 86 MLease epoch 86 from osd.1 pg_lease(ru 60.120281219s ub 68.121276855s int 16.000000000s))): OSDState is booting
2021-07-07T18:20:23.294 INFO:journalctl@ceph.osd.2.smithi145.stdout:Jul 07 18:16:30 smithi145 conmon[71083]: ERROR 2021-07-07 18:16:30,536 [shard 0] none - /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-5007-g3a9abb02/rpm/el8/BUILD/ceph-17.0.0-5007-g3a9abb02/src/crimson/osd/osd_operations/peering_event.cc:165 : In function 'crimson::osd::RemotePeeringEvent::get_pg()::<lambda()>', ceph_assert(%s)
2021-07-07T18:20:23.294 INFO:journalctl@ceph.osd.2.smithi145.stdout:Jul 07 18:16:30 smithi145 conmon[71083]: osd.state.is_active()
2021-07-07T18:20:23.294 INFO:journalctl@ceph.osd.2.smithi145.stdout:Jul 07 18:16:30 smithi145 conmon[71083]: Aborting on shard 0.
```
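
The gating idea described above can be sketched in plain C++ as
follows. This is only an illustration, not the code of this patch:
`OsdStateSketch`, `RemotePeeringSketch`, `when_active()` as written
here, and `set_active()` are hypothetical names, and plain callbacks
stand in for the Seastar futures that crimson uses. The point is that
an event arriving while the OSD is still booting is parked instead of
reaching the `is_active()` assertion.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for the OSD state (assumption: not crimson's
// actual class). It only tracks "booting" vs "active".
class OsdStateSketch {
public:
  bool is_active() const { return active; }

  // Run `op` right away if the OSD is already active; otherwise park
  // it until activation.
  void when_active(std::function<void()> op) {
    if (active) {
      op();
    } else {
      waiters.push_back(std::move(op));
    }
  }

  // Mark the OSD active and drain the parked operations in arrival
  // order.
  void set_active() {
    active = true;
    auto pending = std::move(waiters);
    waiters.clear();
    for (auto& op : pending) {
      op();
    }
  }

private:
  bool active = false;
  std::vector<std::function<void()>> waiters;
};

// Hypothetical remote peering handler: the body that requires an
// active OSD only runs behind the activation gate.
struct RemotePeeringSketch {
  int handled = 0;

  void submit(OsdStateSketch& state) {
    state.when_active([this, &state] {
      // Mirrors the assertion from the crash log; with the gate in
      // place it can no longer fire during boot.
      assert(state.is_active());
      ++handled;
    });
  }
};
```

In this sketch an event submitted during boot is handled only once
`set_active()` is called, while events submitted afterwards are
handled immediately.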

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
src/crimson/osd/osd.cc
src/crimson/osd/osd.h
src/crimson/osd/osd_operations/peering_event.cc
src/crimson/osd/osd_operations/peering_event.h