git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
4 years ago mgr/cephadm/inventory: do not try to resolve current mgr host 41636/head
Sage Weil [Thu, 3 Jun 2021 14:29:00 +0000 (10:29 -0400)]
mgr/cephadm/inventory: do not try to resolve current mgr host

The CNI configuration may set up a private network for the container, which
is mapped to the hostname in /etc/hosts.  For example, my test box sets
up 10.88.0.0/24 because I was using crio + kubeadm on this host earlier
(at least I think that's why):

$ sudo podman run --rm --name test123 --entrypoint /bin/bash -it quay.ceph.io/ceph-ci/ceph:master -c "cat /etc/hosts"
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.88.0.8 f9e91bf2478f test123

In any case, we should never trust a lookup of our own hostname from inside
a container!

This isn't quite sufficient, though: if this is a single-host cluster, then
we fall back to using get_mgr_ip(). That value may be distorted by the
public_network option on the mgr, but we don't have any other good
options here, and single-node clusters are unlikely to have complex
network configs.

Refactor a bit to avoid the try/except nesting.
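
A minimal C++ sketch of the rule described above, for illustration only (the real cephadm code is Python and is not reproduced here). The names `resolve_host_addr`, `Resolver`, `current_mgr_host`, and `mgr_ip` are hypothetical, and the sketch simplifies the single-host fallback to "always use the mgr IP for the local host":

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical resolver: returns an address for a hostname, or nullopt on failure.
using Resolver = std::function<std::optional<std::string>(const std::string&)>;

// Pick an address for `host`. A lookup of our own hostname is never trusted,
// because inside a container /etc/hosts may map it to a private CNI address.
std::string resolve_host_addr(const std::string& host,
                              const std::string& current_mgr_host,
                              const std::string& mgr_ip,
                              const Resolver& resolve)
{
  if (host == current_mgr_host) {
    // Assumed fallback, standing in for the value of get_mgr_ip().
    return mgr_ip;
  }
  if (auto addr = resolve(host)) {
    return *addr;
  }
  // Assumption for this sketch only: signal failure with an empty string.
  return {};
}
```

With a resolver that maps `test123` to `10.88.0.8`, asking for `test123` while running on `test123` yields the supplied mgr IP rather than the container-private address.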

Signed-off-by: Sage Weil <sage@newdream.net>
4 years ago pybind/mgr/mgr_module: make get_mgr_ip() return mgr's IP from mgrmap
Sage Weil [Wed, 2 Jun 2021 02:31:11 +0000 (22:31 -0400)]
pybind/mgr/mgr_module: make get_mgr_ip() return mgr's IP from mgrmap

The previous approach was convoluted: we tried to do a DNS lookup on the
hostname, which would fail if /etc/hosts had an entry.  Which, with podman,
it does.  And the IP it has will vary in all sorts of weird ways.  For
example, CNI on my host means that I get a dynamic address in 10.88.0.0/24.

Avoid all of that nonsense and use the IP that is in the mgrmap.  There
may be multiple IPs (v2 + v1, or maybe even IPv4 + v6 in the future); in
that case, use the first one.
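
The "use the first one" rule can be sketched roughly as follows. This is a C++ illustration, not the Python module code; the address string form (`v2:IP:port/nonce`) is taken from the log excerpts elsewhere in this log, and `ip_from_addr` and `first_mgr_ip` are hypothetical helper names:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Extract the IP part of an address written like "v2:172.21.15.62:6801/34686".
// The parsing is illustrative and assumes this textual form.
std::string ip_from_addr(const std::string& addr)
{
  std::string rest = addr;
  // Drop a leading "v1:" / "v2:" protocol tag if present.
  if (rest.size() > 3 && rest[0] == 'v' && rest[2] == ':') {
    rest = rest.substr(3);
  }
  if (!rest.empty() && rest[0] == '[') {
    // Bracketed IPv6 literal, e.g. "[::1]:6800/0".
    auto end = rest.find(']');
    return end == std::string::npos ? std::string{} : rest.substr(1, end - 1);
  }
  // IPv4: everything before the port separator.
  return rest.substr(0, rest.find(':'));
}

// Several addresses may be listed (v2 + v1, possibly IPv4 + IPv6); take the first.
std::string first_mgr_ip(const std::vector<std::string>& addrvec)
{
  return addrvec.empty() ? std::string{} : ip_from_addr(addrvec.front());
}
```

Returning an empty string for an empty address list is a choice made for this sketch only.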

Signed-off-by: Sage Weil <sage@newdream.net>
4 years ago mgr/restful: use get_mgr_ip() instead of hostname
Sage Weil [Wed, 2 Jun 2021 02:31:47 +0000 (22:31 -0400)]
mgr/restful: use get_mgr_ip() instead of hostname

Now we match dashboard!

Signed-off-by: Sage Weil <sage@newdream.net>
4 years ago Merge PR #41635 into master
Patrick Donnelly [Wed, 2 Jun 2021 15:18:22 +0000 (08:18 -0700)]
Merge PR #41635 into master

* refs/pull/41635/head:
qa: increase fragmentation to improve uniform distribution

Reviewed-by: Ramana Raja <rraja@redhat.com>
4 years ago Merge pull request #41644 from rzarzynski/wip-crimson-fix-blocked-peering
Kefu Chai [Wed, 2 Jun 2021 14:43:40 +0000 (22:43 +0800)]
Merge pull request #41644 from rzarzynski/wip-crimson-fix-blocked-peering

crimson/monc: fix subscription stall that blocked peering.

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years ago Merge PR #41651 into master
Sage Weil [Wed, 2 Jun 2021 14:27:03 +0000 (10:27 -0400)]
Merge PR #41651 into master

* refs/pull/41651/head:
doc/cephadm: s/the the/the

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years ago Merge pull request #41645 from tchaikov/wip-crimson-osd-mkfs
Kefu Chai [Wed, 2 Jun 2021 14:10:12 +0000 (22:10 +0800)]
Merge pull request #41645 from tchaikov/wip-crimson-osd-mkfs

crimson/osd: check existing superblock when mkfs

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years ago doc/cephadm: s/the the/the 41651/head
Zac Dover [Wed, 2 Jun 2021 14:06:06 +0000 (00:06 +1000)]
doc/cephadm: s/the the/the

This removes an extraneous "the" and reworks a
sentence so that it adheres to the grammatical
rules of the English language.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
4 years ago crimson/osd: check existing superblock when mkfs 41645/head
Kefu Chai [Wed, 2 Jun 2021 12:57:14 +0000 (20:57 +0800)]
crimson/osd: check existing superblock when mkfs

in case mkfs is run on an existing store.

this change mirrors the behavior of the classic osd and also addresses the
assert failure that occurs when BlueStore tries to create a collection while
it already contains a collection with the same collection id.
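
As a rough sketch of the idea (toy types only; `ToyStore`, `Superblock`, `MkfsResult`, and the field names are hypothetical and not the crimson or BlueStore classes), mkfs can look for an existing superblock first and create the metadata only when none is found. Comparing `cluster_fsid` and `whoami` is an assumption about what "checking" means here:

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Toy stand-ins; none of these types are the real crimson or BlueStore classes.
struct Superblock { std::string cluster_fsid; int whoami; };

struct ToyStore {
  std::map<std::string, Superblock> meta;  // "meta collection" keyed by object name
  std::optional<Superblock> read_superblock() const {
    auto it = meta.find("osd_superblock");
    if (it == meta.end()) return std::nullopt;
    return it->second;
  }
};

enum class MkfsResult { created, already_exists, mismatch };

MkfsResult mkfs(ToyStore& store, const Superblock& wanted)
{
  if (auto existing = store.read_superblock()) {
    // Store was formatted before: verify instead of re-creating the collection.
    if (existing->cluster_fsid == wanted.cluster_fsid &&
        existing->whoami == wanted.whoami) {
      return MkfsResult::already_exists;
    }
    return MkfsResult::mismatch;
  }
  store.meta.emplace("osd_superblock", wanted);
  return MkfsResult::created;
}
```

Running `mkfs` twice with the same superblock reports `created` and then `already_exists`, so the second run never attempts to create the collection again.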

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years ago crimson/osd: extract OSD::_write_superblock() out
Kefu Chai [Wed, 2 Jun 2021 12:47:03 +0000 (20:47 +0800)]
crimson/osd: extract OSD::_write_superblock() out

prepare for the change to verify existing meta collection and superblock
stored in it.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years ago crimson/monc: fix subscription stall that blocked peering. 41644/head
Radoslaw Zarzynski [Wed, 2 Jun 2021 11:59:37 +0000 (11:59 +0000)]
crimson/monc: fix subscription stall that blocked peering.

There is a scenario in which the `active_con` is properly
chosen but isn't marked as `ready_to_send`.
If `renew_subs()` is called during `on_session_opened()`,
the flag is turned on only after the subscriptions are
renewed, which cannot happen because renewal requires the flag
to be already set. In other words: there is a circular data dependency.

The net result is stalling the subscription machinery,
particularly the `OSDMap` subs. This caused a nasty peering
issue at Sepia [1] where PG 2.7 got stuck in the `GetInfo`
state.
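
The circular dependency can be pictured with a small synchronous toy model. This is an assumption-laden sketch: the real code is asynchronous Seastar code, `ToyMonClient` and its members are hypothetical, and the reordered variant is only one conceivable way to break the cycle, not necessarily what the fix does:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model; the names mirror the commit message, the logic is an assumption.
struct ToyMonClient {
  bool ready_to_send = false;
  bool have_new_subs = true;
  std::vector<std::string> sent;

  // Messages can only go out once the connection is flagged ready.
  bool send_message(const std::string& m) {
    if (!ready_to_send) return false;
    sent.push_back(m);
    return true;
  }

  // Returns true when there is nothing left to renew.
  bool renew_subs() {
    if (!have_new_subs) return true;
    if (!send_message("mon_subscribe")) return false;
    have_new_subs = false;
    return true;
  }

  // Problematic order: the flag is raised only once renewal has completed,
  // but renewal cannot complete while the flag is down.
  void on_session_opened_stalling() {
    if (renew_subs()) ready_to_send = true;
  }

  // One possible reordering (hypothetical): raise the flag first.
  void on_session_opened_reordered() {
    ready_to_send = true;
    renew_subs();
  }
};
```

In the stalling variant nothing is ever sent and the flag stays down, which corresponds to the subscription machinery never making progress.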

```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136908$ less ./remote/smithi039/log/ceph-osd.1.log.gz
...
DEBUG 2021-05-26 20:19:48,134 [shard 0] osd -  pg_epoch 14 pg[2.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0) [] r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown enter Initial
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=0 crt=0'0 mlcod 0'0 unknown enter Reset
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Started
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Start
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Started/Primary
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating enter Started/Primary/Peering
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetInfo
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering build_prior all_probe
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering build_prior final: probe 0,1 down  blocked_by {}
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering up_thru 0 < same_since 14, must notify monitor
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>:  no prior_set down osds, clearing prior_readable_until_ub
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>:  querying info from osd.0
...
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering  got osd.0 2.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0)
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Adding osd: 0 peer features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common peer features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common acting features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common upacting features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering exit Started/Primary/Peering/GetInfo 0.099480 4 2021-05-26T20:19:48.146172+0000
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetLog
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetMissing
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/WaitUpThru
...
DEBUG 2021-05-26 20:19:49,139 [shard 0] osd -  pg_epoch 15 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating enter Started/Primary/Active
...
DEBUG 2021-05-26 20:19:49,142 [shard 0] osd -  pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+activating enter Started/Primary/Active/Activating
...
DEBUG 2021-05-26 20:19:49,204 [shard 0] osd -  pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Started/Primary/Active/Recovered
...
DEBUG 2021-05-26 20:19:49,204 [shard 0] osd -  pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Started/Primary/Active/Clean
...
DEBUG 2021-05-26 20:22:31,223 [shard 0] osd -  pg_epoch 86 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Reset
...
<a lot of flipping>
...
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown activate_map
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown exit Reset 0.035744 1 2021-05-26T20:24:07.817331+0000
INFO  2021-05-26 20:24:07,851 [shard 0] osd - Exiting state: Reset, entered at 1622060647.81581881622060647.8173316 spent on 1 events
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started
INFO  2021-05-26 20:24:07,851 [shard 0] osd - Entering state: Started
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Start
INFO  2021-05-26 20:24:07,851 [shard 0] osd - Entering state: Start
INFO  2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown state<Start>: transitioning to Primary
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown exit Start 0.000041 0 0.000000
INFO  2021-05-26 20:24:07,851 [shard 0] osd - Exiting state: Start, entered at 1622060647.8516333, 0.0 spent on 0 events
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started/Primary
INFO  2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started/Primary/Peering
INFO  2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary/Peering
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering enter Started/Primary/Peering/GetInfo
INFO  2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary/Peering/GetInfo
...
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior all_probe 0,1,4
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior maybe_rw interval:139, acting: 0
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior final: probe 0,1,4 down  blocked_by {}
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering up_thru 125 < same_since 163, must notify monitor
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>:  no prior_set down osds, clearing prior_readable_until_ub
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>:  querying info from osd.0
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>:  querying info from osd.4
...
DEBUG 2021-05-26 20:24:07,924 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] connect to existing
DEBUG 2021-05-26 20:24:07,924 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] --> #62 === pg_query2(2.7 2.7 query(info 0'0 epoch_sent 163) e163/163) v1 (131)
...
DEBUG 2021-05-26 20:24:07,942 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] GOT AckFrame: seq=62
...
<plenty of osd_ping messaging but no reply to the pg_query for 2.7>
...
DEBUG 2021-05-26 20:58:19,829 [shard 0] ms - [osd.1(hb_front) v2:172.21.15.39:6807/34727 >> osd.4 v2:172.21.15.62:6807/34686@54816] <== #772 === osd_ping(ping e17 up_from 10 ping_stamp 2021-05-26T20:58:19.825573+0000/2319.780029297s send_stamp 2319.780029297s) v5 (70)
DEBUG 2021-05-26 20:58:19,829 [shard 0] ms - [osd.1(hb_front) v2:172.21.15.39:6807/34727 >> osd.4 v2:172.21.15.62:6807/34686@54816] --> #772 === osd_ping(ping_reply e249 up_from 10 ping_stamp 2021-05-26T20:58:19.825573+0000/2319.780029297s send_stamp 2320.039062500s) v5 (70
```

The peering request got stuck waiting for an `OSDMap`.

```
DEBUG 2021-05-26 20:24:07,930 [shard 0] ms - [osd.4(cluster) v2:172.21.15.62:6802/34686 >> osd.1 v2:172.21.15.39:6803/34727@61064] <== #62 === pg_query2(2.7 2.7 query(info 0'0 epoch_sent 163) e163/163) v1 (131)
DEBUG 2021-05-26 20:24:07,930 [shard 0] osd - handle_peering_op on 2.7 from 1
DEBUG 2021-05-26 20:24:07,930 [shard 0] osd - peering_event(id=517, detail=PeeringEvent(from=1 pgid=2.7 sent=163 requested=163 evt=epoch_sent: 163 epoch_requested: 163 MQuery 2.7 from 1 query_epoch 163 query: query(info 0'0 epoch_sent 163))): star
```

```
INFO  2021-05-26 20:19:49,127 [shard 0] osd - evt epoch is 15, i have 14, will wait
INFO  2021-05-26 20:19:49,128 [shard 0] osd - osdmap_subscribe(14)
DEBUG 2021-05-26 20:19:49,128 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #9 === mon_subscribe({osdmap=14}) v3 (15)
...
INFO  2021-05-26 20:19:49,131 [shard 0] osd - handle_osd_map osd_map(14..15 src has 1..15) v4
INFO  2021-05-26 20:19:49,131 [shard 0] osd - handle_osd_map epochs [14..15], i have 15, src has [1..15]
...
INFO  2021-05-26 20:19:49,138 [shard 0] osd - handle_osd_map osd_map(14..15 src has 1..15) v4
INFO  2021-05-26 20:19:49,138 [shard 0] osd - handle_osd_map epochs [14..15], i have 15, src has [1..15]
...
INFO  2021-05-26 20:19:49,139 [shard 0] osd - evt epoch is 15, i have 14, will wait
INFO  2021-05-26 20:19:49,141 [shard 0] osd - osdmap_subscribe(14)
WARN  2021-05-26 20:19:49,141 [shard 0] monc - renew_subs - empty
...
INFO  2021-05-26 20:19:50,140 [shard 0] osd - handle_osd_map osd_map(15..16 src has 1..16) v4
INFO  2021-05-26 20:19:50,140 [shard 0] osd - handle_osd_map epochs [15..16], i have 15, src has [1..16]
DEBUG 2021-05-26 20:19:50,141 [shard 0] bluestore - do_transaction
INFO  2021-05-26 20:19:50,145 [shard 0] osd - osd.4: committed_osd_maps(16, 16)
...
INFO  2021-05-26 20:20:42,881 [shard 0] osd - handle_osd_map epochs [16..17], i have 16, src has [1..17]
DEBUG 2021-05-26 20:20:42,882 [shard 0] bluestore - do_transaction
INFO  2021-05-26 20:20:42,886 [shard 0] osd - osd.4: committed_osd_maps(17, 17)
...
INFO  2021-05-26 20:20:43,941 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO  2021-05-26 20:20:43,941 [shard 0] osd - osdmap_subscribe(17)
...
INFO  2021-05-26 20:20:43,957 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO  2021-05-26 20:20:43,957 [shard 0] osd - osdmap_subscribe(17)
...
INFO  2021-05-26 20:20:43,969 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO  2021-05-26 20:20:43,969 [shard 0] osd - osdmap_subscribe(17)
...
DEBUG 2021-05-26 20:20:46,930 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@57288 >> mon.2 v2:172.21.15.39:3301/0] <== #4 === osd_map(20..21 src has 1..21) v4 (41)
INFO  2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map osd_map(20..21 src has 1..21) v4
INFO  2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map epochs [20..21], i have 17, src has [1..21]
INFO  2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map message skips epochs 18..19
INFO  2021-05-26 20:20:46,930 [shard 0] osd - osdmap_subscribe(18)
...
DEBUG 2021-05-26 20:20:47,936 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@57288 >> mon.2 v2:172.21.15.39:3301/0] <== #5 === osd_map(21..22 src has 1..22) v4 (41)
INFO  2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map osd_map(21..22 src has 1..22) v4
INFO  2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map epochs [21..22], i have 17, src has [1..22]
INFO  2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map message skips epochs 18..20
INFO  2021-05-26 20:20:47,936 [shard 0] osd - osdmap_subscribe(18)
...
<osdmap_subscribe(18) over and over>
```

```
2021-05-26T20:19:42.048+0000 7f4712ffd700  1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] <== osd.4 v2:172.21.15.62:6801/34686 4 ==== mon_subscribe({mgrmap=0+,osd_pg_creates=0+,osdmap=0+}) v3 ==== 82+0+0 (secure 0 0 0) 0x7f46fc04e150 con 0x7f470401c480
2021-05-26T20:19:42.048+0000 7f4712ffd700 20 mon.b@1(peon) e1 _ms_dispatch existing session 0x7f46fc02f500 for osd.4
2021-05-26T20:19:42.048+0000 7f4712ffd700 20 mon.b@1(peon) e1  entity_name osd.4 global_id 4168 (new_ok) caps allow *
2021-05-26T20:19:42.048+0000 7f4712ffd700 10 mon.b@1(peon) e1 handle_subscribe mon_subscribe({mgrmap=0+,osd_pg_creates=0+,osdmap=0+}) v3
...
2021-05-26T20:19:49.129+0000 7f4712ffd700  1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] <== osd.4 v2:172.21.15.62:6801/34686 9 ==== mon_subscribe({osdmap=14}) v3 ==== 36+0+0 (secure 0 0 0) 0x7f46e8556210 con 0x7f470401c480
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon) e1 _ms_dispatch existing session 0x7f46fc02f500 for osd.4
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon) e1  entity_name osd.4 global_id 4168 (new_ok) caps allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon) e1 handle_subscribe mon_subscribe({osdmap=14}) v3
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 is_capable service=mon command= read addr v2:172.21.15.62:6801/34686 on cap allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20  allow so far , doing grant allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20  allow all
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 is_capable service=osd command= read addr v2:172.21.15.62:6801/34686 on cap allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20  allow so far , doing grant allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20  allow all
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon).osd e15 check_osdmap_sub 0x7f46e84f0150 next 14 (onetime)
2021-05-26T20:19:49.129+0000 7f4712ffd700  5 mon.b@1(peon).osd e15 send_incremental [14..15] to osd.4
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon).osd e15 build_incremental [14..15] with features 3f01cfbb7ffdffff
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon).osd e15 build_incremental    inc 15 622 bytes
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon).osd e15 build_incremental    inc 14 578 bytes
2021-05-26T20:19:49.129+0000 7f4712ffd700  1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] --> v2:172.21.15.62:6801/34686 -- osd_map(14..15 src has 1..15) v4 -- 0x7f46e856a100 con 0x7f470401c480
```

```
seastar::future<> Client::renew_subs()
{
  if (!sub.have_new()) {
    logger().warn("{} - empty", __func__);
    return seastar::now();
  }
  logger().trace("{}", __func__);

  auto m = crimson::make_message<MMonSubscribe>();
  m->what = sub.get_subs();
  m->hostname = ceph_get_short_hostname();
  return send_message(std::move(m)).then([this] {
    sub.renewed();
  });
}
```

```
INFO  2021-05-26 20:19:42,081 [shard 0] osd - osdmap_subscribe(1)
DEBUG 2021-05-26 20:19:42,081 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #6 === mon_subscribe({osdmap=1}) v3 (15)
...
INFO  2021-05-26 20:19:49,128 [shard 0] osd - osdmap_subscribe(14)
DEBUG 2021-05-26 20:19:49,128 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #9 === mon_subscribe({osdmap=14}) v3 (15)
...
INFO  2021-05-26 20:19:49,141 [shard 0] osd - osdmap_subscribe(14)
WARN  2021-05-26 20:19:49,141 [shard 0] monc - renew_subs - empty
<no MMonSubscribe>
...
INFO  2021-05-26 20:20:43,941 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO  2021-05-26 20:20:43,941 [shard 0] osd - osdmap_subscribe(17)
<no MMonSubscribe>
...
INFO  2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map message skips epochs 18..19
INFO  2021-05-26 20:20:46,930 [shard 0] osd - osdmap_subscribe(18)
<no MMonSubscribe>
```

[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136908

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years ago Merge pull request #41630 from rhcs-dashboard/fix-bucket-calculations
Ernesto Puerta [Wed, 2 Jun 2021 12:12:56 +0000 (14:12 +0200)]
Merge pull request #41630 from rhcs-dashboard/fix-bucket-calculations

mgr/dashboard: fix bucket objects and size calculations

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
4 years ago Merge pull request #41638 from tchaikov/wip-doc-crimson-doc
Kefu Chai [Wed, 2 Jun 2021 10:43:47 +0000 (18:43 +0800)]
Merge pull request #41638 from tchaikov/wip-doc-crimson-doc

doc/dev/crimson: update link to scylladb debugging tips

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years ago doc/dev/crimson: update link to scylladb debugging tips 41638/head
Kefu Chai [Wed, 2 Jun 2021 09:10:25 +0000 (17:10 +0800)]
doc/dev/crimson: update link to scylladb debugging tips

the old one is not reachable anymore.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years ago Merge pull request #41637 from tchaikov/wip-crimson-never-discard-future
Kefu Chai [Wed, 2 Jun 2021 09:00:53 +0000 (17:00 +0800)]
Merge pull request #41637 from tchaikov/wip-crimson-never-discard-future

crimson: always handle returned future

Reviewed-by: Xuehan Xu <xuxuehan@360.cn>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years ago mgr/dashboard: fix bucket objects and size calculations 41630/head
Avan Thakkar [Tue, 1 Jun 2021 14:21:16 +0000 (19:51 +0530)]
mgr/dashboard: fix bucket objects and size calculations

Fixes: https://tracker.ceph.com/issues/51035
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
4 years ago crimson/common/interruptible_future: mark future 'nodiscard' 41637/head
Kefu Chai [Wed, 2 Jun 2021 06:16:25 +0000 (14:16 +0800)]
crimson/common/interruptible_future: mark future 'nodiscard'

so the compiler is able to error out if we discard a future.
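
A minimal sketch of the mechanism, assuming a toy future type (`ToyFuture`, `async_answer`, and `use_answer` are hypothetical names, not the crimson or Seastar classes). Marking the class `[[nodiscard]]` makes compilers diagnose a call whose returned future is dropped; this is usually reported as a warning, which a build can promote to an error:

```cpp
#include <cassert>
#include <utility>

// Minimal stand-in for a future type; not the real seastar or crimson class.
template <typename T>
class [[nodiscard]] ToyFuture {
  T value_;
public:
  explicit ToyFuture(T v) : value_(std::move(v)) {}
  T get() { return std::move(value_); }
};

ToyFuture<int> async_answer() { return ToyFuture<int>(42); }

inline int use_answer()
{
  // async_answer();            // dropping the result would be diagnosed
  return async_answer().get();  // the returned future is consumed
}
```

Uncommenting the bare `async_answer();` call is the quickest way to see the diagnostic with a given compiler.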

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years ago crimson/common/errorator: mark errorator::future 'nodiscard'
Kefu Chai [Wed, 2 Jun 2021 06:15:43 +0000 (14:15 +0800)]
crimson/common/errorator: mark errorator::future 'nodiscard'

so the compiler is able to error out if we discard a future.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years ago crimson: always handle returned future
Kefu Chai [Wed, 2 Jun 2021 06:13:04 +0000 (14:13 +0800)]
crimson: always handle returned future

to ignore a future without good reason could lead to catastrophic
issues. see also b127fa3cdd405c71cf09875f61f107c23af6b8cf

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years ago crimson/os: do not return a future in finally()
Kefu Chai [Wed, 2 Jun 2021 06:11:07 +0000 (14:11 +0800)]
crimson/os: do not return a future in finally()

errorator always discards the returned future.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years ago Merge pull request #41026 from TRYTOBE8TME/wip-rgw-rabbitmq
Yuval Lifshitz [Wed, 2 Jun 2021 04:47:39 +0000 (07:47 +0300)]
Merge pull request #41026 from TRYTOBE8TME/wip-rgw-rabbitmq

qa/tasks: Adding RabbitMQ task for bucket notification tests

4 years ago qa: increase fragmentation to improve uniform distribution 41635/head
Patrick Donnelly [Tue, 1 Jun 2021 21:00:23 +0000 (14:00 -0700)]
qa: increase fragmentation to improve uniform distribution

Fixes: https://tracker.ceph.com/issues/51060
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
4 years ago Merge pull request #41588 from idryomov/wip-rbd-trash-purge
Ilya Dryomov [Tue, 1 Jun 2021 19:56:57 +0000 (21:56 +0200)]
Merge pull request #41588 from idryomov/wip-rbd-trash-purge

librbd: don't stop at the first unremovable image when purging

Reviewed-by: Mykola Golub <mgolub@suse.com>
4 years ago qa/tasks: Adding RabbitMQ task for bucket notification tests 41026/head
Kalpesh [Tue, 20 Apr 2021 09:14:04 +0000 (14:44 +0530)]
qa/tasks: Adding RabbitMQ task for bucket notification tests

This commit mainly adds the RabbitMQ task, which provides a required and supported endpoint for the bucket notification tests,
along with some related changes in the AMQP tests. The major changes are:
1. Addition of RabbitMQ task
2. Documentation update for the steps to execute AMQP tests
3. Addition of attributes to the tests
4. Tox dependency removal from kafka.py

Signed-off-by: Kalpesh Pandya <kapandya@redhat.com>
4 years ago Merge pull request #41421 from s0nea/wip-dashboard-rbd-partially-rm
Ernesto Puerta [Tue, 1 Jun 2021 17:29:20 +0000 (19:29 +0200)]
Merge pull request #41421 from s0nea/wip-dashboard-rbd-partially-rm

mgr/dashboard: show partially deleted RBDs

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Laura Paduano <lpaduano@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
4 years ago Merge pull request #41606 from liu-chunmei/seastore-fix-tracker
Samuel Just [Tue, 1 Jun 2021 15:53:31 +0000 (08:53 -0700)]
Merge pull request #41606 from liu-chunmei/seastore-fix-tracker

crimson/seastore: fix assert in read_extent

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years ago Merge pull request #41616 from idryomov/wip-rbd-qemu-precise-repos
Ilya Dryomov [Tue, 1 Jun 2021 15:48:12 +0000 (17:48 +0200)]
Merge pull request #41616 from idryomov/wip-rbd-qemu-precise-repos

qa/tasks/qemu: precise repos have been archived

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
4 years ago Merge pull request #41605 from t-msn/update-podman-detection
Kefu Chai [Tue, 1 Jun 2021 15:43:59 +0000 (23:43 +0800)]
Merge pull request #41605 from t-msn/update-podman-detection

vstart: detect podman using `command -v`

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years ago Merge pull request #41369 from ifed01/wip-ifed-fix-avl-enospc2
Kefu Chai [Tue, 1 Jun 2021 15:00:57 +0000 (23:00 +0800)]
Merge pull request #41369 from ifed01/wip-ifed-fix-avl-enospc2

os/bluestore: fix unexpected ENOSPC in Avl/Hybrid allocators.

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years ago Merge pull request #41395 from rhcs-dashboard/fix-50855-master
Ernesto Puerta [Tue, 1 Jun 2021 14:28:48 +0000 (16:28 +0200)]
Merge pull request #41395 from rhcs-dashboard/fix-50855-master

mgr/dashboard: API Version changes do not apply to pre-defined methods (list, create etc.)

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
4 years ago Merge pull request #41598 from rhcs-dashboard/fix-51026-master
Ernesto Puerta [Tue, 1 Jun 2021 14:28:07 +0000 (16:28 +0200)]
Merge pull request #41598 from rhcs-dashboard/fix-51026-master

mgr/dashboard: pass Grafana datasource in URL

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Guillaume Abrioux <gabrioux@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
4 years ago Merge pull request #41184 from rhcs-dashboard/fix-base-href
Ernesto Puerta [Tue, 1 Jun 2021 14:26:30 +0000 (16:26 +0200)]
Merge pull request #41184 from rhcs-dashboard/fix-base-href

mgr/dashboard: fix base-href

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
4 years agoMerge PR #41601 into master
Sage Weil [Tue, 1 Jun 2021 13:46:33 +0000 (09:46 -0400)]
Merge PR #41601 into master

* refs/pull/41601/head:
doc/foundation: remove amihan

Reviewed-by: Mike Perez <miperez@redhat.com>
4 years agoos/bluestore: fix unexpected ENOSPC in Avl/Hybrid allocators. 41369/head
Igor Fedotov [Mon, 17 May 2021 19:23:26 +0000 (22:23 +0300)]
os/bluestore: fix unexpected ENOSPC in Avl/Hybrid allocators.

The Avl allocator returned an unexpected ENOSPC in first-fit mode if all
size-matching available extents were unaligned and applying the alignment
made each of them shorter than required. Because the lookup was not retried
with a smaller size, ENOSPC was returned.
Additionally, we should proceed with a lookup in best-fit mode even when the
original size has been truncated to match the available size
(force_range_size_alloc==true).
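
The failure mode can be illustrated with a minimal sketch. It is not the BlueStore allocator code: `Extent`, `align_up`, `first_fit` and `allocate` are hypothetical names, and the retry policy (shrinking the request by one alignment unit at a time) is an assumption chosen only to show why a retry avoids the spurious ENOSPC.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical free extent covering [offset, offset + length).
struct Extent {
  std::uint64_t offset;
  std::uint64_t length;
};

// Round `v` up to the next multiple of `align` (align must be > 0).
inline std::uint64_t align_up(std::uint64_t v, std::uint64_t align) {
  return (v + align - 1) / align * align;
}

// First-fit lookup: aligned start of the first extent that still holds
// `want` bytes after alignment, or nullopt if no extent qualifies.
inline std::optional<std::uint64_t> first_fit(
    const std::vector<Extent>& free_list,
    std::uint64_t want, std::uint64_t align) {
  for (const auto& e : free_list) {
    const std::uint64_t start = align_up(e.offset, align);
    const std::uint64_t end = e.offset + e.length;
    if (start < end && end - start >= want) {
      return start;
    }
  }
  return std::nullopt;
}

// Assumed retry policy: shrink the request instead of failing right away.
// The returned extent may be shorter than `want`.
inline std::optional<Extent> allocate(
    const std::vector<Extent>& free_list,
    std::uint64_t want, std::uint64_t align) {
  for (std::uint64_t size = want; size >= align; size -= align) {
    if (auto start = first_fit(free_list, size, align)) {
      return Extent{*start, size};
    }
  }
  return std::nullopt;  // genuinely out of space
}
```

With a single free extent at offset 1 of length 8 and an alignment of 4, the extent matches a request for 8 bytes by size, but only 5 bytes remain after alignment, so a single lookup fails while the retry grants 4 bytes at offset 4.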

Fixes: https://tracker.ceph.com/issues/50656
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
4 years agoMerge pull request #41470 from a16bitsysop/rgw_string.h
Casey Bodley [Tue, 1 Jun 2021 12:28:04 +0000 (08:28 -0400)]
Merge pull request #41470 from a16bitsysop/rgw_string.h

rgw/rgw_string.h: add missing includes for alpine and boost 1.75

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
4 years agoMerge pull request #41591 from tchaikov/wip-mgr-selftest-repl
Kefu Chai [Tue, 1 Jun 2021 11:43:34 +0000 (19:43 +0800)]
Merge pull request #41591 from tchaikov/wip-mgr-selftest-repl

pybind/mgr/selftest: add "mgr self-test eval" command

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
4 years agoMerge pull request #41603 from rzarzynski/wip-crimson-fix-use-after-free-alienstore...
Kefu Chai [Tue, 1 Jun 2021 11:28:06 +0000 (19:28 +0800)]
Merge pull request #41603 from rzarzynski/wip-crimson-fix-use-after-free-alienstore-get_attr

crimson/os: fix use-after-free in AlienStore::get_attr().

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agoqa/tasks/qemu: precise repos have been archived 41616/head
Ilya Dryomov [Tue, 1 Jun 2021 10:46:32 +0000 (12:46 +0200)]
qa/tasks/qemu: precise repos have been archived

Fixes: https://tracker.ceph.com/issues/51033
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
4 years agoMerge pull request #41595 from zdover23/wip-doc-cephadm-serv-man-daemon-status-2021...
Sebastian Wagner [Tue, 1 Jun 2021 09:30:10 +0000 (11:30 +0200)]
Merge pull request #41595 from zdover23/wip-doc-cephadm-serv-man-daemon-status-2021-05-30

doc/cephadm: enriching "daemon status"

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
4 years agoMerge pull request #41608 from zdover23/wip-doc-cephadm-serv-man-service-spec-2021...
Sebastian Wagner [Tue, 1 Jun 2021 09:29:12 +0000 (11:29 +0200)]
Merge pull request #41608 from zdover23/wip-doc-cephadm-serv-man-service-spec-2021-05-30

doc/cephadm: enriching "Service Specification"

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
4 years agocrimson/os: fix formatting in AlienStore::get_attr(). 41603/head
Radoslaw Zarzynski [Mon, 31 May 2021 23:37:04 +0000 (23:37 +0000)]
crimson/os: fix formatting in AlienStore::get_attr().

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agocrimson/os: fix use-after-free in AlienStore::get_attr().
Radoslaw Zarzynski [Mon, 31 May 2021 22:05:25 +0000 (22:05 +0000)]
crimson/os: fix use-after-free in AlienStore::get_attr().

The `FuturizedStore` interface requires `get_attr()` to take
the `name` parameter as `std::string_view`, and thus burdens
implementations with extending the lifetime of the data the
instance refers to.

Unfortunately, `AlienStore` is unaware that prolonging
the life of a `std::string_view` instance doesn't prolong
the life of the memory it points to. This problem has manifested
in the following use-after-free detected at Sepia:

```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136929$ less ./remote/smithi194/log/ceph-osd.7.log.gz
...
DEBUG 2021-05-26 20:24:54,077 [shard 0] osd - do_osd_ops_execute: object 14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head - handling op call
DEBUG 2021-05-26 20:24:54,077 [shard 0] osd - handling op call on object 14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head
DEBUG 2021-05-26 20:24:54,078 [shard 0] osd - calling method lock.lock, num_read=0, num_write=0
DEBUG 2021-05-26 20:24:54,078 [shard 0] osd - handling op getxattr on object 14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head
DEBUG 2021-05-26 20:24:54,078 [shard 0] osd - getxattr on obj=14:55e1a5b4:test-rados-api-smithi067-38889-2::foo:head for attr=_lock.TestLockPP1
DEBUG 2021-05-26 20:24:54,078 [shard 0] bluestore - get_attr
=================================================================
==34068==ERROR: AddressSanitizer: heap-use-after-free on address 0x6030001851d0 at pc 0x7f824d6a5b27 bp 0x7f822b4201c0 sp 0x7f822b41f968
READ of size 17 at 0x6030001851d0 thread T28 (alien-store-tp)
...
    #0 0x7f824d6a5b26  (/lib64/libasan.so.5+0x40b26)
    #1 0x55e2cbb2e00b  (/usr/bin/ceph-osd+0x2b6dc00b)
    #2 0x55e2d31f086e  (/usr/bin/ceph-osd+0x32d9e86e)
    #3 0x55e2d3467607 in crimson::os::ThreadPool::loop(std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned long) (/usr/bin/ceph-osd+0x33015607)
    #4 0x55e2d346b14a  (/usr/bin/ceph-osd+0x3301914a)
    #5 0x7f8249d32ba2  (/lib64/libstdc++.so.6+0xc2ba2)
    #6 0x7f824a00d149 in start_thread (/lib64/libpthread.so.0+0x8149)
    #7 0x7f82486edf22 in clone (/lib64/libc.so.6+0xfcf22)

0x6030001851d0 is located 0 bytes inside of 31-byte region [0x6030001851d0,0x6030001851ef)
freed by thread T0 here:
    #0 0x7f824d757688 in operator delete(void*) (/lib64/libasan.so.5+0xf2688)

previously allocated by thread T0 here:
    #0 0x7f824d7567b0 in operator new(unsigned long) (/lib64/libasan.so.5+0xf17b0)

Thread T28 (alien-store-tp) created by T0 here:
    #0 0x7f824d6b7ea3 in __interceptor_pthread_create (/lib64/libasan.so.5+0x52ea3)

SUMMARY: AddressSanitizer: heap-use-after-free (/lib64/libasan.so.5+0x40b26)
Shadow bytes around the buggy address:
  0x0c06800289e0: fd fd fd fa fa fa fd fd fd fa fa fa 00 00 00 fa
  0x0c06800289f0: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
  0x0c0680028a00: fd fa fa fa fd fd fd fa fa fa fd fd fd fa fa fa
  0x0c0680028a10: fd fd fd fa fa fa fd fd fd fa fa fa fd fd fd fa
  0x0c0680028a20: fa fa fd fd fd fa fa fa fd fd fd fa fa fa fd fd
=>0x0c0680028a30: fd fd fa fa fd fd fd fd fa fa[fd]fd fd fd fa fa
  0x0c0680028a40: fd fd fd fd fa fa fd fd fd fd fa fa 00 00 00 07
  0x0c0680028a50: fa fa 00 00 00 fa fa fa 00 00 00 fa fa fa fd fd
  0x0c0680028a60: fd fd fa fa fd fd fd fd fa fa fd fd fd fd fa fa
  0x0c0680028a70: 00 00 00 00 fa fa fd fd fd fd fa fa fd fd fd fd
  0x0c0680028a80: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==34068==ABORTING
```
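
The lifetime problem described above can be reduced to a small sketch. This is not the actual patch: `Task`, `make_task_dangling` and `make_task_owning` are hypothetical names, and `std::function` merely stands in for work that is handed to another thread and executed later.

```cpp
#include <functional>
#include <string>
#include <string_view>

// Hypothetical stand-in for work queued to another thread and run later.
using Task = std::function<std::string()>;

// Problematic pattern: the lambda copies the view, not the characters it
// points to. If the caller's buffer is freed before the task runs, the
// task reads freed memory.
inline Task make_task_dangling(std::string_view name) {
  return [name] { return std::string(name); };
}

// Safer pattern: copy the characters into an owning std::string that lives
// inside the task for as long as the task itself does.
inline Task make_task_owning(std::string_view name) {
  return [owned = std::string(name)] { return owned; };
}
```

Only the owning variant is safe to run after the caller's string has gone out of scope; running the dangling variant in that situation is the kind of access AddressSanitizer reports as heap-use-after-free.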

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agomgr/dashboard: show partially deleted RBDs 41421/head
Tatjana Dehler [Thu, 27 May 2021 09:46:50 +0000 (11:46 +0200)]
mgr/dashboard: show partially deleted RBDs

An RBD might be partially deleted if the deletion
process has been started but was interrupted. In
this case return the RBD as part of the RBD list
and mark it as partially deleted.

Fixes: https://tracker.ceph.com/issues/48603
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
4 years agoMerge pull request #41607 from liu-chunmei/seastore-cleanup-lba-get-mapping
Kefu Chai [Tue, 1 Jun 2021 08:18:00 +0000 (16:18 +0800)]
Merge pull request #41607 from liu-chunmei/seastore-cleanup-lba-get-mapping

crimson/seastore: cleanup lba manager get_mappings

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41597 from rhcs-dashboard/remove-promtool-script
Kefu Chai [Tue, 1 Jun 2021 08:17:07 +0000 (16:17 +0800)]
Merge pull request #41597 from rhcs-dashboard/remove-promtool-script

test,cmake: remove run-promtool-unitests.sh script

Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/seastore: cleanup lba manager get_mappings 41607/head
chunmei-liu [Tue, 1 Jun 2021 06:44:57 +0000 (23:44 -0700)]
crimson/seastore: cleanup lba manager get_mappings

Signed-off-by: chunmei-liu <chunmei.liu@intel.com>
4 years agocrimson/seastore: fix assert in read_extent 41606/head
chunmei-liu [Tue, 1 Jun 2021 05:54:55 +0000 (22:54 -0700)]
crimson/seastore: fix assert in read_extent

the lba btree root leaf is empty after an osd reboot because SegmentStateTracker's states are wrong,
which is caused by seastore being closed before tracker->do_write has finished.

as a result, read_extent in the transaction manager can't read the extent and hits:
ceph_assert(0 == "Should be impossible");

Signed-off-by: chunmei-liu <chunmei.liu@intel.com>
4 years agotest,cmake:remove run-promtool-unitests.sh script 41597/head
Aashish Sharma [Wed, 26 May 2021 07:08:33 +0000 (12:38 +0530)]
test,cmake:remove run-promtool-unitests.sh script

This PR removes the run-promtool-unittests.sh script, as CMakeLists.txt handles the promtool execution.
It also adds a description of how to run these tests to Readme.md.

Signed-off-by: Aashish Sharma <aasharma@redhat.com>
4 years agomgr/dashboard: API Version changes do not apply to pre-defined methods (list, create... 41395/head
Aashish Sharma [Tue, 1 Jun 2021 05:09:24 +0000 (10:39 +0530)]
mgr/dashboard: API Version changes do not apply to pre-defined methods (list, create etc.)

Methods like list(), create(), get(), etc. do not get the version applied. Also, for the endpoints whose version is changed, the docs and the request header still have version v1.0+ in them. So with the version reduced, a request fails with a 415 error. This PR fixes this issue.

Fixes: https://tracker.ceph.com/issues/50855
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
4 years agopybind/mgr/selftest: add "mgr self-test eval" command 41591/head
Kefu Chai [Sat, 29 May 2021 17:10:25 +0000 (01:10 +0800)]
pybind/mgr/selftest: add "mgr self-test eval" command

also add a simple REPL client that allows developers to peek and poke the
selftest module. if this turns out to be useful, we can promote this
method into a dedicated mix-in class, so other modules can use it when
a developer wants to test them manually.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41514 from ideepika/wip-49592-upgrade
Mykola Golub [Mon, 31 May 2021 16:34:53 +0000 (19:34 +0300)]
Merge pull request #41514 from ideepika/wip-49592-upgrade

qa/upgrade: conditionally disable update_features tests

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Mykola Golub <mgolub@suse.com>
4 years agodoc/foundation: remove amihan 41601/head
Sage Weil [Mon, 31 May 2021 16:26:01 +0000 (11:26 -0500)]
doc/foundation: remove amihan

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agomgr/dashboard: pass Grafana datasource in URL 41598/head
Ernesto Puerta [Mon, 31 May 2021 11:45:40 +0000 (13:45 +0200)]
mgr/dashboard: pass Grafana datasource in URL

PR https://github.com/ceph/ceph/pull/24314 added support for
specifying the Grafana datasource via $datasource template variable, but
this hadn't been used from the Dashboard side so far.

As per https://grafana.com/docs/grafana/latest/variables/#templates, by
adding `var-datasource=Dashboard1`, Dashboard can specify the
datasource.
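
As a rough illustration of the URL change, the sketch below appends the template variable to an existing Grafana URL. `with_datasource` is a hypothetical helper, not Dashboard code, and it assumes the datasource name needs no URL-encoding.

```cpp
#include <string>

// Append `var-datasource=<name>` to a Grafana URL, using '?' or '&'
// depending on whether the URL already carries a query string.
// Assumption: `datasource` contains no characters that need escaping.
inline std::string with_datasource(const std::string& url,
                                   const std::string& datasource) {
  const char sep = (url.find('?') == std::string::npos) ? '?' : '&';
  return url + sep + "var-datasource=" + datasource;
}
```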

Fixes: https://tracker.ceph.com/issues/51026
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
4 years agoMerge pull request #41589 from tchaikov/wip-crimson-start-up-error
Kefu Chai [Mon, 31 May 2021 12:07:33 +0000 (20:07 +0800)]
Merge pull request #41589 from tchaikov/wip-crimson-start-up-error

crimson: handle startup failures properly

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agocrimson/os/alienstore: do not cleanup if not started 41589/head
Kefu Chai [Sat, 29 May 2021 08:24:59 +0000 (16:24 +0800)]
crimson/os/alienstore: do not cleanup if not started

there is a chance that the stop() and umount() methods get called in the
error handling path even if start() has not been called. in that case, just
make these methods no-ops, to ensure that the OSD behaves properly.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/os/alienstore: create tp in AlienStore::start()
Kefu Chai [Sat, 29 May 2021 08:03:50 +0000 (16:03 +0800)]
crimson/os/alienstore: create tp in AlienStore::start()

thread pool is not needed until AlienStore::start(). with this change,
we are able to tell if the AlienStore is actually started or not in
AlienStore::stop().

seastar::sharded<Service> starts a service in two phases:

1. construct the shard instances
2. actually start them

but it stops a service in a single shot, which both stops the service
and destructs the service instance(s).

so we have to implement a proper stop() method for services whose
start() might not be called after their instances are created by
seastar::sharded<Service>::start(), either in an error handling path or
because we just don't want to call start().

to ensure we can skip the steps that clean up the stuff created by
start(), we need a flag in the sharded service. AlienStore is a member
variable of OSD, and when we do mkfs, AlienStore is not start()'ed. as
explained above, we have to call OSD::stop() to ensure the OSD instance
is destructed properly, but OSD::stop() calls store->umount() and
store->stop() unconditionally, and these methods in AlienStore rely on
a functional thread pool.

fortunately, we don't need to call these methods if the store is never
mounted or started. in the case of a failed "mkfs", the store is not
mounted at all, but the store and osd instances are created.

so, in this change, the thread pool is created in AlienStore::start(). the
following change uses it to tell whether AlienStore is started, and
makes the related methods no-ops if it is not.

also, postpone the creation of `store` until AlienStore::start(), so
we don't need to destroy it in the dtor of AlienStore. otherwise,
BlueStore::~BlueStore() would need to reference resources which are only
available in alien threads, but when OSD::~OSD() is called, we are in
seastar's reactor.
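
the idea of using the thread pool itself as the "started" flag can be sketched as follows. this is a simplified illustration rather than the AlienStore code: `ThreadPool` and `Store` are hypothetical stand-ins, and no seastar types are involved.

```cpp
#include <memory>

// Hypothetical stand-in for a thread pool that must be shut down explicitly.
struct ThreadPool {
  bool running = true;
  void shutdown() { running = false; }
};

class Store {
 public:
  // Phase 1 (construction) creates nothing that needs explicit cleanup.
  Store() = default;

  // Phase 2: the pool is created only here, so `tp != nullptr` doubles
  // as the "was started" flag.
  void start() { tp = std::make_unique<ThreadPool>(); }

  // Safe to call whether or not start() ran.
  // Returns true if there was anything to clean up.
  bool stop() {
    if (!tp) {
      return false;  // never started: no-op
    }
    tp->shutdown();
    tp.reset();
    return true;
  }

  bool started() const { return tp != nullptr; }

 private:
  std::unique_ptr<ThreadPool> tp;
};
```

a caller that always invokes stop(), as OSD::stop() does with the store, then works the same way for an instance that was only constructed and for one that was fully started.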

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/osd/main: always stop osd as long as it started
Kefu Chai [Sat, 29 May 2021 07:08:18 +0000 (15:08 +0800)]
crimson/osd/main: always stop osd as long as it started

otherwise the sharded_service's dtor complains if we destruct it without
stopping it first, like:

FATAL: startup failed: std::system_error (error crimson::net:3, negotiation failure)
crimson-osd: ../src/seastar/include/seastar/core/sharded.hh:523: seastar::sharded<T>::~sharded() [with Service = crimson::osd::OSD]: Assertion `_instances.empty()' failed.
Aborting on shard 0.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/osd/main: do cleanup using defer()
Kefu Chai [Sat, 29 May 2021 07:03:01 +0000 (15:03 +0800)]
crimson/osd/main: do cleanup using defer()

since we do the startup in a seastar thread, we have the luxury of doing
cleanup using the RAII machinery.
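
a minimal sketch of this style of cleanup is shown below. `Deferred` is a hand-rolled scope guard used as a stand-in, not seastar's own helper, and `startup` with its log messages is purely illustrative. it assumes C++17 for class template argument deduction.

```cpp
#include <string>
#include <utility>
#include <vector>

// Minimal scope guard: runs the stored callable when it goes out of scope.
template <typename F>
class Deferred {
 public:
  explicit Deferred(F f) : fn(std::move(f)) {}
  ~Deferred() { fn(); }
  Deferred(const Deferred&) = delete;
  Deferred& operator=(const Deferred&) = delete;

 private:
  F fn;
};

// Fake startup sequence: every started step registers its own cleanup,
// which runs in reverse order on every exit path, including early returns.
inline bool startup(bool fail, std::vector<std::string>& log) {
  log.push_back("start messenger");
  Deferred stop_msgr{[&log] { log.push_back("stop messenger"); }};
  if (fail) {
    return false;  // stop_msgr's destructor still runs here
  }
  log.push_back("start osd");
  Deferred stop_osd{[&log] { log.push_back("stop osd"); }};
  return true;
}
```

because destructors run in reverse order of construction, the step started last is cleaned up first, without any explicit cleanup code at the individual return sites.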

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/osd/main: catch exception thrown in the async() call
Kefu Chai [Sat, 29 May 2021 06:51:09 +0000 (14:51 +0800)]
crimson/osd/main: catch exception thrown in the async() call

* use seastar::app_template::run() instead of
  seastar::app_template::run_deprecated(), so that the continuation
  passed to run() returns an int explicitly instead of `void`. this is
  more readable.
* wrap the whole block in run() in a giant try-catch block,
  so the exceptions thrown by the startup code can be caught
  and handled.
* do not catch the exceptions individually in the try-catch
  block anymore. the outer catch block takes care of them.

this change improves the error handling when crimson-osd launches.
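
the shape of that error handling can be sketched without seastar. `run_guarded` is a hypothetical helper, not code from crimson-osd: it runs a startup callable inside one outer try-catch block and turns any escaped exception into a non-zero exit status.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>

// Run `startup` and map any escaped exception to a non-zero exit status,
// instead of catching each failure at its own call site.
inline int run_guarded(const std::function<void()>& startup) {
  try {
    startup();
    return 0;
  } catch (const std::exception& e) {
    std::cerr << "FATAL: startup failed: " << e.what() << '\n';
    return 1;
  } catch (...) {
    std::cerr << "FATAL: startup failed: unknown exception\n";
    return 1;
  }
}
```

a startup callable that throws, for example `std::runtime_error("negotiation failure")`, then yields exit status 1 and a single error message, while a callable that returns normally yields 0.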

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agovstart: update podman detection 41605/head
Misono Tomohiro [Mon, 31 May 2021 11:58:53 +0000 (20:58 +0900)]
vstart: update podman detection

Since it is possible there is no podman process running when launching
vstart, use 'command -v' instead of 'pgrep -f'.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
4 years agoqa/upgrade: conditionally disable update_features tests 41514/head
Deepika [Mon, 24 May 2021 21:20:39 +0000 (21:20 +0000)]
qa/upgrade: conditionally disable update_features tests

with the recent support for async rbd operations in pacific+, a hang can
occur when an older client (without async support) is being upgraded and
simultaneously interacts with a newer client which expects the requests
to be async: the return code for request completion is taken as the
acknowledgement of the async request, and the newer client then keeps
waiting for another acknowledgement of request completion.

this should be rare, happening only when the lock owner is an old client,
and should be deferred if compatibility issues arise.

see also: 541230475d3b25ab18c4eb9bc5011060462594a6(octopus)

Signed-off-by: Deepika <dupadhya@redhat.com>
4 years agolibrbd: don't stop at the first unremovable image when purging 41588/head
Ilya Dryomov [Wed, 26 May 2021 12:21:22 +0000 (14:21 +0200)]
librbd: don't stop at the first unremovable image when purging

As there is no inherent ordering, there may be multiple removable
images past the unremovable image.  On top of that, removing a clone
may make its parent removable so perform an additional pass if any
image gets removed.
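
The control flow described above can be sketched as follows. This is not the librbd implementation: `purge` and the `try_remove` callback are hypothetical, and images are represented by plain names.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Try to remove every image, continuing past failures, and repeat while a
// pass removed at least one image (a removed clone may unblock its parent).
// `try_remove` is a hypothetical callback returning true on success.
// Returns the images that could not be removed.
inline std::vector<std::string> purge(
    std::vector<std::string> images,
    const std::function<bool(const std::string&)>& try_remove) {
  bool removed_any = true;
  while (removed_any && !images.empty()) {
    removed_any = false;
    std::vector<std::string> remaining;
    for (const auto& img : images) {
      if (try_remove(img)) {
        removed_any = true;
      } else {
        remaining.push_back(img);  // do not stop; try the rest
      }
    }
    images = std::move(remaining);
  }
  return images;
}
```

The loop terminates because each additional pass requires at least one successful removal, so the number of passes is bounded by the number of images.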

Fixes: https://tracker.ceph.com/issues/51021
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
4 years agorbd: combined error message for expected Trash::purge() errors
Ilya Dryomov [Wed, 26 May 2021 12:21:22 +0000 (14:21 +0200)]
rbd: combined error message for expected Trash::purge() errors

Output to stderr instead of the log where regular users wouldn't see
it given the elevated log level.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
4 years agodoc/cephadm: enriching "Service Specification" 41608/head
Zac Dover [Mon, 31 May 2021 04:15:56 +0000 (14:15 +1000)]
doc/cephadm: enriching "Service Specification"

This PR adds parallel construction to the "Service
Specification" section of the "Service Management"
chapter of the cephadm documentation.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
4 years agodoc/cephadm: enriching "daemon status" 41595/head
Zac Dover [Mon, 31 May 2021 03:55:20 +0000 (13:55 +1000)]
doc/cephadm: enriching "daemon status"

This PR creates parallel structure for the
text in the "Daemon Status" section of the
cephadm Service Management chapter.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
4 years agoMerge pull request #41552 from tchaikov/wip-mgr-find-roots
Kefu Chai [Mon, 31 May 2021 01:40:50 +0000 (09:40 +0800)]
Merge pull request #41552 from tchaikov/wip-mgr-find-roots

mgr: expose CRUSHMap.find_roots()

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
4 years agoMerge pull request #41563 from cybozu/rgw-add-the-description-of-blocking-io-during...
J. Eric Ivancich [Sat, 29 May 2021 16:18:45 +0000 (12:18 -0400)]
Merge pull request #41563 from cybozu/rgw-add-the-description-of-blocking-io-during-index-resharding

rgw: add the description of blocking io during index resharding

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
4 years agocrimson/osd/main: handle and rethrow exception in fetch_config()
Kefu Chai [Sat, 29 May 2021 06:48:11 +0000 (14:48 +0800)]
crimson/osd/main: handle and rethrow exception in fetch_config()

print a more verbose error message when monc fails to connect to the
monitor, for a better user experience.

also, unregister all dispatchers by calling msgr->stop() before calling
monc.stop() to ensure the messenger can be shutdown gracefully.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agotest/crimson/test_messenger: add editor variables in header
Kefu Chai [Sat, 29 May 2021 05:45:41 +0000 (13:45 +0800)]
test/crimson/test_messenger: add editor variables in header

to help emacs and vim to format the code better.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/osd/main: do cleanup using defer() in fetch_config()
Kefu Chai [Sat, 29 May 2021 05:44:29 +0000 (13:44 +0800)]
crimson/osd/main: do cleanup using defer() in fetch_config()

so we can stop the started services even if some of the step(s) throw or
fail.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agovstart.sh: remove unused variable
Kefu Chai [Sat, 29 May 2021 03:52:45 +0000 (11:52 +0800)]
vstart.sh: remove unused variable

osdmap_fn is not used after being initialized, so drop it.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agotest/allocator_replay_test: make allocator type configurable
Igor Fedotov [Mon, 17 May 2021 19:21:53 +0000 (22:21 +0300)]
test/allocator_replay_test: make allocator type configurable

Signed-off-by: Igor Fedotov <ifedotov@suse.com>
4 years agoMerge pull request #41278 from sebastian-philipp/mgr-cephadm-set-user-no-hosts
Kefu Chai [Sat, 29 May 2021 02:42:14 +0000 (10:42 +0800)]
Merge pull request #41278 from sebastian-philipp/mgr-cephadm-set-user-no-hosts

mgr/cephadm: Don't call _check_host without hosts

Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
4 years agoMerge pull request #41520 from tchaikov/wip-osd-unique-ptr
Kefu Chai [Sat, 29 May 2021 02:37:31 +0000 (10:37 +0800)]
Merge pull request #41520 from tchaikov/wip-osd-unique-ptr

os: let ObjectStore::create() return unique_ptr<>

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agoMerge pull request #41573 from tchaikov/wip-allocat-ctor
Kefu Chai [Sat, 29 May 2021 02:36:43 +0000 (10:36 +0800)]
Merge pull request #41573 from tchaikov/wip-allocat-ctor

os/bluestore: pass string_view to ctor of Allocator

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
4 years agorbd: propagate Trash::purge() result
Ilya Dryomov [Wed, 26 May 2021 12:21:22 +0000 (14:21 +0200)]
rbd: propagate Trash::purge() result

Exit with respective status like other commands do.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
4 years agoMerge pull request #41582 from cyx1231st/wip-seastore-swap-read-extent
Kefu Chai [Fri, 28 May 2021 07:35:01 +0000 (15:35 +0800)]
Merge pull request #41582 from cyx1231st/wip-seastore-swap-read-extent

crimson/seastore: introduce and adopt LBAManager::get_mapping(t, offset)

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/seastore: adopt get_mapping(t, offset) interface 41582/head
Yingxin Cheng [Thu, 27 May 2021 15:33:25 +0000 (23:33 +0800)]
crimson/seastore: adopt get_mapping(t, offset) interface

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
4 years agocrimson/seastore: implement and test get_mapping(t, laddr)
Yingxin Cheng [Thu, 27 May 2021 08:48:47 +0000 (16:48 +0800)]
crimson/seastore: implement and test get_mapping(t, laddr)

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
4 years agocrimson/seastore: add stub to introduce get_mapping() without length
Yingxin Cheng [Thu, 27 May 2021 07:02:15 +0000 (15:02 +0800)]
crimson/seastore: add stub to introduce get_mapping() without length

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
4 years agoMerge pull request #41578 from rzarzynski/wip-crimson-monc-auth-req
Kefu Chai [Fri, 28 May 2021 00:09:07 +0000 (08:09 +0800)]
Merge pull request #41578 from rzarzynski/wip-crimson-monc-auth-req

crimson/monc: handle_auth_request() doesn't depend on active_con.

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41544 from tchaikov/wip-doc-confval
Kefu Chai [Thu, 27 May 2021 23:59:34 +0000 (07:59 +0800)]
Merge pull request #41544 from tchaikov/wip-doc-confval

doc/mgr: use confval directive to define options

Reviewed-by: Neha Ojha <nojha@redhat.com>
4 years agodoc/mgr: use confval directive to define options 41544/head
Kefu Chai [Wed, 26 May 2021 04:00:57 +0000 (12:00 +0800)]
doc/mgr: use confval directive to define options

less repeating this way

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41540 from ceph/wip-15213
Yuri Weinstein [Thu, 27 May 2021 23:40:41 +0000 (16:40 -0700)]
Merge pull request #41540 from ceph/wip-15213

doc: 15.2.13 Release Notes

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Sebastian Wagner <sebastian.wagner@suse.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
4 years agoMerge PR #41483 into master
Sage Weil [Thu, 27 May 2021 23:14:53 +0000 (19:14 -0400)]
Merge PR #41483 into master

* refs/pull/41483/head:
cephadm: stop passing --no-hosts to podman
mgr/nfs: use host.addr for backend IP where possible
mgr/cephadm: convert host addr if non-IP to IP
mgr/dashboard,prometheus: new method of getting mgr IP
doc/cephadm: remove any reference to the use of DNS or /etc/hosts
mgr/cephadm: use known host addr
mgr/cephadm: resolve IP at 'orch host add' time

Reviewed-by: Sebastian Wagner <swagner@suse.com>
4 years agoMerge pull request #41561 from zdover23/wip-doc-cephadm-s-mgmt-service-status-improve...
zdover23 [Thu, 27 May 2021 21:41:40 +0000 (07:41 +1000)]
Merge pull request #41561 from zdover23/wip-doc-cephadm-s-mgmt-service-status-improvement-2021-05-26

doc/cephadm: enrich "service status"

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
4 years agocephadm: stop passing --no-hosts to podman 41483/head
Sage Weil [Tue, 25 May 2021 17:55:08 +0000 (13:55 -0400)]
cephadm: stop passing --no-hosts to podman

This reverts cfc1f914ce74f1fd1f45e2efd3ba2ddcb2da129a, which is no longer
necessary because (1) we don't use socket.getfqdn(), and (2) we generally
do not rely on DNS or /etc/hosts at all anymore (with the exception of
the upgrade transition).

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agomgr/nfs: use host.addr for backend IP where possible
Sage Weil [Wed, 26 May 2021 22:38:05 +0000 (18:38 -0400)]
mgr/nfs: use host.addr for backend IP where possible

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agomgr/cephadm: convert host addr if non-IP to IP
Sage Weil [Tue, 25 May 2021 20:10:49 +0000 (16:10 -0400)]
mgr/cephadm: convert host addr if non-IP to IP

Previously we allowed the host.addr to be a DNS name (short or fqdn).
This is problematic because of the inconsistent way that docker and podman
handle /etc/hosts, and undesirable because relying on external DNS is
an external source of failure for the cluster without any benefit in
return (simply updating DNS is not sufficient to make ceph behave).

So: update any non-IP to an IP as soon as we start up (presumably on
upgrade).  If we get a loopback address (127.0.0.1 or 127.0.1.1), then
wait and hope that the next instance of the manager has better luck.
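
The decision described above (is the stored addr already an IP, and is a resolved IP usable?) can be sketched as a small classifier. This is an illustration only: the cephadm module is Python, `AddrKind` and `classify_addr` are hypothetical names, the IPv6 branch is an assumption not mentioned in this commit, and the POSIX `inet_pton` call limits the sketch to POSIX systems.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <string>

enum class AddrKind { NotAnIp, Loopback, Usable };

// Classify a host addr string: a DNS name still needs resolving, a
// loopback address should be rejected (retry later), anything else is
// kept as the host's IP.
inline AddrKind classify_addr(const std::string& addr) {
  in_addr v4{};
  if (inet_pton(AF_INET, addr.c_str(), &v4) == 1) {
    // 127.0.0.0/8 covers both 127.0.0.1 and 127.0.1.1.
    const bool loopback = (ntohl(v4.s_addr) >> 24) == 127;
    return loopback ? AddrKind::Loopback : AddrKind::Usable;
  }
  in6_addr v6{};
  if (inet_pton(AF_INET6, addr.c_str(), &v6) == 1) {
    return IN6_IS_ADDR_LOOPBACK(&v6) ? AddrKind::Loopback : AddrKind::Usable;
  }
  return AddrKind::NotAnIp;  // short name or FQDN
}
```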

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agomgr/dashboard,prometheus: new method of getting mgr IP
Sage Weil [Tue, 25 May 2021 17:00:35 +0000 (13:00 -0400)]
mgr/dashboard,prometheus: new method of getting mgr IP

- Use a centralized method get_mgr_ip()
- Look up the hostname via DNS.  This is a bit more reliable than
getfqdn() since it will work even when podman adds the container
name to /etc/hosts.

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agodoc/cephadm: remove any reference to the use of DNS or /etc/hosts
Sage Weil [Tue, 25 May 2021 16:14:39 +0000 (12:14 -0400)]
doc/cephadm: remove any reference to the use of DNS or /etc/hosts

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agomgr/cephadm: use known host addr
Sage Weil [Fri, 21 May 2021 17:31:31 +0000 (13:31 -0400)]
mgr/cephadm: use known host addr

If the host IP/addr is known, use that.  The addr might even be a FQDN
instead of an IP address, in which case we want to look that up instead
of the bare hostname.

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agocrimson/monc: handle_auth_request() doesn't depend on active_con. 41578/head
Radoslaw Zarzynski [Thu, 27 May 2021 14:55:40 +0000 (14:55 +0000)]
crimson/monc: handle_auth_request() doesn't depend on active_con.

The following crash occurred at Sepia [1]:

```
INFO  2021-05-26 20:16:32,872 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] ProtocolV2::start_accept(): target_addr=172.21.15.119:55220/0
DEBUG 2021-05-26 20:16:32,872 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] TRIGGER ACCEPTING, was NONE
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] SEND(26) banner: len_payload=16, supported=1, required=0, banner="ceph v2
"
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] RECV(10) banner: "ceph v2
"
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] GOT banner: payload_len=16
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] RECV(16) banner features: supported=1 required=0
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] WRITE HelloFrame: my_type=osd, peer_addr=172.21.15.119:55220/0
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> unknown.? -@55220] GOT HelloFrame: my_type=client peer_addr=v2:172.21.15.119:6803/31733
INFO  2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> client.? -@55220] UPDATE: peer_type=client, policy(lossy=true server=true standby=false resetcheck=false)
DEBUG 2021-05-26 20:16:32,873 [shard 0] ms - [osd.0(client) v2:172.21.15.119:6803/31733 >> client.? -@55220] GOT AuthRequestFrame: method=2, preferred_modes={1, 2}, payload_len=174
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4622-gaa1dc559/rpm/el8/BUILD/ceph-17.0.0-4622-gaa1dc559/src/crimson/mon/MonClient.cc:399:10: runtime error: member access within null pointer of type 'struct Connection'
Segmentation fault on shard 0.
Backtrace:
 0# 0x000055E84CF44C1F in ceph-osd
 1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
 2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
 3# 0x00007F2BC88C0B20 in /lib64/libpthread.so.0
 4# crimson::mon::Connection::get_conn() in ceph-osd
 5# crimson::mon::Client::handle_auth_request(seastar::shared_ptr<crimson::net::Connection>, seastar::lw_shared_ptr<AuthConnectionMeta>, bool, unsigned int, ceph::buffer::v15_2_0::list const&, ceph::buffer::v15_2_0::list*) in ceph-osd
 6# crimson::net::ProtocolV2::_handle_auth_request(ceph::buffer::v15_2_0::list&, bool) in ceph-osd
 7# 0x000055E84DF67669 in ceph-osd
 8# 0x000055E84DF68775 in ceph-osd
 9# 0x000055E846F47F60 in ceph-osd
10# 0x000055E85296770F in ceph-osd
11# 0x000055E85296CC50 in ceph-osd
12# 0x000055E852B1ECBB in ceph-osd
13# 0x000055E85267C73A in ceph-osd
14# main in ceph-osd
15# __libc_start_main in /lib64/libc.so.6
16# _start in ceph-osd
Fault at location: 0x98
```

[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136907

When `handle_auth_request()` is called, there is no guarantee that
`active_con` is available. This is reflected in the classical
implementation, which works on the `con` passed in by the caller:

```cpp
int MonClient::handle_auth_request(
  Connection *con,
  // ...
  ceph::buffer::list *reply)
{
  // ...
  bool isvalid = ah->verify_authorizer(
    cct,
    *rotating_secrets,
    payload,
    auth_meta->get_connection_secret_length(),
    reply,
    &con->peer_name,
    &con->peer_global_id,
    &con->peer_caps_info,
    &auth_meta->session_key,
    &auth_meta->connection_secret,
    ac);
```

The patch transplants the same logic to crimson.
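
A minimal sketch of the idea, not the actual crimson code: the handler
records the verified identity on the connection that sent the request and
never dereferences `active_con`, so it keeps working while `active_con` is
null. All types and parameter names here (`PeerConnection`, `MonSession`,
`Client`, `verified_name`, `verified_global_id`) are hypothetical stand-ins.

```cpp
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-ins, not the real crimson types.
struct PeerConnection {
  std::string peer_name;
  std::uint64_t peer_global_id = 0;
};

struct MonSession {
  std::shared_ptr<PeerConnection> conn;
};

struct Client {
  // May be null, e.g. before a monitor session has been established.
  std::unique_ptr<MonSession> active_con;

  bool handle_auth_request(PeerConnection& con,
                           const std::string& verified_name,
                           std::uint64_t verified_global_id) {
    if (verified_name.empty()) {
      return false;  // stand-in for a failed verify_authorizer() call
    }
    // Record the identity on the connection that sent the request,
    // without touching active_con.
    con.peer_name = verified_name;
    con.peer_global_id = verified_global_id;
    return true;
  }
};
```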

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years ago os/bluestore: pass string_view to ctor of Allocator 41573/head
Kefu Chai [Thu, 27 May 2021 14:26:05 +0000 (22:26 +0800)]
os/bluestore: pass string_view to ctor of Allocator

just for the sake of correctness: the constructors don't need a
full-blown std::string, only a string-like object, and they always
create a std::string instance as a member variable if they want to keep
a copy of it.
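
A small sketch of the pattern, assuming a hypothetical `NamedAllocator`
class rather than the real BlueStore `Allocator`: the constructor accepts
any string-like argument as a `std::string_view` and makes exactly one
copy into its own member.

```cpp
#include <string>
#include <string_view>

// Hypothetical class for illustration; not the BlueStore Allocator.
class NamedAllocator {
public:
  // Accepts literals, std::string and substrings without forcing the
  // caller to build a temporary std::string first.
  explicit NamedAllocator(std::string_view name) : name_(name) {}

  const std::string& name() const { return name_; }

private:
  std::string name_;  // the single owned copy
};
```

Callers can then pass a literal, a `std::string`, or a view into a larger
buffer, and the object keeps its own copy in all three cases.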

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years ago tools/ceph_objectstore_tool: destruct ObjectStore using unique_ptr<> 41520/head
Kefu Chai [Thu, 27 May 2021 15:14:36 +0000 (23:14 +0800)]
tools/ceph_objectstore_tool: destruct ObjectStore using unique_ptr<>

before this change, cot never destructs the created ObjectStore
instances.

after this change, they are destructed upon returning from main().

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years ago osd: pass unique_ptr<ObjectStore> to ctor of OSD
Kefu Chai [Thu, 27 May 2021 03:08:48 +0000 (11:08 +0800)]
osd: pass unique_ptr<ObjectStore> to ctor of OSD

less error-prone, and it's simpler to manage the resource using RAII
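
As a rough illustration with hypothetical `Store` and `Daemon` types (not
the real `ObjectStore` and `OSD`), a constructor that takes the
`unique_ptr` by value makes the ownership transfer explicit at the call
site:

```cpp
#include <memory>
#include <utility>

struct Store {  // hypothetical stand-in for ObjectStore
  virtual ~Store() = default;
};

class Daemon {  // hypothetical stand-in for OSD
public:
  // Taking the pointer by value forces callers to std::move() it in.
  explicit Daemon(std::unique_ptr<Store> store) : store_(std::move(store)) {}

  bool has_store() const { return store_ != nullptr; }

private:
  std::unique_ptr<Store> store_;  // released when the Daemon is destroyed
};
```

After `Daemon d(std::move(store));` the caller's pointer is null, so the
store cannot be deleted twice or left without an owner.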

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years ago osd/OSD: remove unused include headers
Kefu Chai [Tue, 25 May 2021 07:43:47 +0000 (15:43 +0800)]
osd/OSD: remove unused include headers

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years ago osd/OSD: use scope_guard to umount objectstore
Kefu Chai [Tue, 25 May 2021 07:41:26 +0000 (15:41 +0800)]
osd/OSD: use scope_guard to umount objectstore

RAII can simplify the cleanup logic in OSD::mkfs().

and since `ch` is a smart pointer, it is able to take care of itself,
as long as we ensure that it is destructed before objectstore.
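
The ordering requirement can be sketched as follows. This is not the
actual `OSD::mkfs()` code: `ScopeGuard`, `Handle`, `mkfs_like()` and the
`events` log are hypothetical, and the guard is a hand-rolled stand-in for
whatever scope guard helper the tree provides. Locals are destroyed in
reverse order of declaration, so a handle declared after the guard is
released before the guard runs the unmount step.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

std::vector<std::string> events;  // records the cleanup order

// Minimal scope guard: runs the callback when it goes out of scope.
class ScopeGuard {
public:
  explicit ScopeGuard(std::function<void()> f) : f_(std::move(f)) {}
  ~ScopeGuard() { f_(); }
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
  std::function<void()> f_;
};

struct Handle {  // hypothetical stand-in for the `ch` smart pointer
  ~Handle() { events.push_back("handle released"); }
};

int mkfs_like(bool fail) {
  events.push_back("mount");
  ScopeGuard unmount([] { events.push_back("umount"); });
  Handle ch;  // declared after the guard, so destroyed before it fires
  if (fail) {
    return -1;  // early return still releases ch, then unmounts
  }
  events.push_back("write superblock");
  return 0;
}
```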

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years ago osd: pass unique_ptr<ObjectStore> to OSD::mkfs()
Kefu Chai [Tue, 25 May 2021 07:34:34 +0000 (15:34 +0800)]
osd: pass unique_ptr<ObjectStore> to OSD::mkfs()

less error-prone this way.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years ago os: let ObjectStore::create() return unique_ptr<>
Kefu Chai [Tue, 25 May 2021 07:18:21 +0000 (15:18 +0800)]
os: let ObjectStore::create() return unique_ptr<>

instead of returning a raw pointer of ObjectStore, let
`ObjectStore::create()` return a `std::unique_ptr<ObjectStore>`.

less error-prone this way.
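
A sketch of such a factory, using hypothetical names (`Store`,
`MemStoreLike`, `create_store()`) and a simplified signature rather than
the real `ObjectStore::create()`: the return type tells the caller that it
owns the result, and an unknown backend yields a null pointer.

```cpp
#include <memory>
#include <string>

struct Store {  // hypothetical stand-in for ObjectStore
  virtual ~Store() = default;
  virtual std::string type() const = 0;
};

struct MemStoreLike : Store {  // hypothetical backend
  std::string type() const override { return "memstore"; }
};

// The caller owns the returned object; nullptr means "unknown backend".
std::unique_ptr<Store> create_store(const std::string& type) {
  if (type == "memstore") {
    return std::make_unique<MemStoreLike>();
  }
  return nullptr;
}
```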

Signed-off-by: Kefu Chai <kchai@redhat.com>