git.apps.os.sepia.ceph.com Git - ceph.git/commit
crimson/osd: prevent premature OSD activation.
author Radoslaw Zarzynski <rzarzyns@redhat.com>
Mon, 12 Jul 2021 14:26:48 +0000 (14:26 +0000)
committer Radoslaw Zarzynski <rzarzyns@redhat.com>
Tue, 13 Jul 2021 14:38:54 +0000 (14:38 +0000)
commit 76e5f5caad61cfe63924eb79f7df1b35f8c8afc1
tree 292b972b26a806741d1a2b2eee97e80a60de560c
parent a5fd875665a62c8fc02b0c9473174ef383656ef0

In contrast to the classical OSD:

```
int OSD::init()
{
  // ...

  {
    epoch_t bind_epoch = osdmap->get_epoch();
    service.set_epochs(NULL, NULL, &bind_epoch);
  }

  // ...

  // load up pgs (as they previously existed)
  load_pgs();
```

crimson doesn't set `bind_epoch` when initializing. As a result
the OSD goes active prematurely, because the third condition
(`bind_epoch < osdmap->get_up_from(whoami)`) is always true.

```
    if (osdmap->is_up(whoami) &&
        osdmap->get_addrs(whoami) == public_msgr->get_myaddrs() &&
        bind_epoch < osdmap->get_up_from(whoami)) {
      if (state.is_booting()) {
        logger().info("osd.{}: activating...", whoami);
```

Leaving `bind_epoch` nullified effectively turns the "is it
activated?" check into an "is it up?" check. This is problematic
in a situation like:

1. The primary has received a new OSDMap but the replica has not.
2. The replica restarts, sends `MOSDBoot` and receives the newer
   map from the previous point.
3. The primary sends a message that the replica does not expect.
4. The monitor publishes a new OSDMap driven by the `MOSDBoot`.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
src/crimson/osd/osd.cc