git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
4 years agoqa: always format the pgid in hex 41908/head
Xiubo Li [Mon, 24 May 2021 02:49:09 +0000 (10:49 +0800)]
qa: always format the pgid in hex

If the pg number is larger than 9, this won't match the array index,
which was in decimal just before this.

Fixes: https://tracker.ceph.com/issues/50808
Signed-off-by: Xiubo Li <xiubli@redhat.com>
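
The mismatch described above can be sketched as follows. This is only an illustration, assuming a PG id of the form `<pool>.<seed>` with the seed written in hexadecimal; `format_pgid` and `format_pgid_decimal` are hypothetical helpers, not the functions touched by the commit.

```python
def format_pgid(pool: int, seed: int) -> str:
    """Return a PG id string with the seed rendered in hex (e.g. 1.a)."""
    return f"{pool}.{seed:x}"


def format_pgid_decimal(pool: int, seed: int) -> str:
    """Decimal rendering, shown only to illustrate the mismatch."""
    return f"{pool}.{seed}"


# Seeds 0..9 look identical in both renderings; from 10 onward they diverge.
for seed in (9, 10, 26):
    print(format_pgid(1, seed), format_pgid_decimal(1, seed))
```

Any lookup keyed by the hex form fails for seeds of 10 or more when the key is built with the decimal form, which is why a single formatting helper is the safer choice.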
4 years agoMerge pull request #41695 from tchaikov/wip-crimson-net-move
Kefu Chai [Mon, 7 Jun 2021 01:52:36 +0000 (09:52 +0800)]
Merge pull request #41695 from tchaikov/wip-crimson-net-move

crimson/net: move from out_q into sent queue

Reviewed-by: Amnon Hanuhov <ahanukov@redhat.com>
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
4 years agoMerge pull request #41708 from tchaikov/wip-seastore-open-coll
Kefu Chai [Sun, 6 Jun 2021 01:45:16 +0000 (09:45 +0800)]
Merge pull request #41708 from tchaikov/wip-seastore-open-coll

crimson/os/seastore: open_collection() returns nullptr if DNE

Reviewed-by: Samuel Just <sjust@redhat.com>
4 years agoMerge PR #41665 into master
Sage Weil [Sat, 5 Jun 2021 20:43:36 +0000 (16:43 -0400)]
Merge PR #41665 into master

* refs/pull/41665/head:
mgr/cephadm:fix alerts sent to wrong URL

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
4 years agoMerge pull request #40652 from ronen-fr/wip-ronenf-cscrub-class
Kefu Chai [Sat, 5 Jun 2021 16:06:32 +0000 (00:06 +0800)]
Merge pull request #40652 from ronen-fr/wip-ronenf-cscrub-class

osd/scrub: modify "classic" OSD scrub state-machine to support Crimson

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
4 years agoMerge pull request #41154 from rzarzynski/wip-global-backtrace-bug-50647
Kefu Chai [Sat, 5 Jun 2021 13:41:00 +0000 (21:41 +0800)]
Merge pull request #41154 from rzarzynski/wip-global-backtrace-bug-50647

global: fault handlers cope with simultaneous faults now.

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41604 from t-msn/wip-51030
Kefu Chai [Sat, 5 Jun 2021 13:33:00 +0000 (21:33 +0800)]
Merge pull request #41604 from t-msn/wip-51030

osd/ECBackend: Fix null pointer dereference when enabling jaeger tracing

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
4 years agoMerge pull request #41501 from aclamk/wip-bluefs-safer-flush
Kefu Chai [Sat, 5 Jun 2021 13:26:04 +0000 (21:26 +0800)]
Merge pull request #41501 from aclamk/wip-bluefs-safer-flush

os/bluestore: Remove possibility of replay log and file inconsistency

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
4 years agoMerge pull request #41506 from ceph/wip-cv-batch-fixes
Kefu Chai [Sat, 5 Jun 2021 13:23:13 +0000 (21:23 +0800)]
Merge pull request #41506 from ceph/wip-cv-batch-fixes

ceph-volume: fix batch report and respect ceph.conf config values

Reviewed-by: Guillaume Abrioux <gabrioux@redhat.com>
4 years agoMerge pull request #41688 from tchaikov/wip-debian-rook
Kefu Chai [Sat, 5 Jun 2021 13:17:24 +0000 (21:17 +0800)]
Merge pull request #41688 from tchaikov/wip-debian-rook

debian/control: ceph-mgr-modules-core does not Recommend ceph-mgr-roo…

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
4 years agocrimson/os/seastore: open_collection() returns nullptr if DNE 41708/head
Kefu Chai [Sat, 5 Jun 2021 09:39:25 +0000 (17:39 +0800)]
crimson/os/seastore: open_collection() returns nullptr if DNE

we check for the existence of the meta collection by trying to open it.
if it exists, we continue to check for the superblock stored in it. if
the superblock does not exist or is corrupted, we consider it a failure.

before this change, open_collection() always returned a valid Collection
even if the store did not have a collection with the specified cid. this
behavior could be misleading in the use case above.

after this change, open_collection() looks up the collections stored in
the root collection node for the specified cid, and returns nullptr if it
does not exist.

Signed-off-by: Kefu Chai <kchai@redhat.com>
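
A rough sketch of the lookup semantics described above, written in Python rather than C++ and using `None` in place of nullptr. `ToyStore`, `Collection` and `check_store` are hypothetical names for illustration; none of this is the SeaStore API.

```python
from typing import Dict, Optional


class Collection:
    def __init__(self, cid: str) -> None:
        self.cid = cid
        self.objects: Dict[str, bytes] = {}


class ToyStore:
    """Hypothetical in-memory store; not the SeaStore interface."""

    def __init__(self) -> None:
        self._collections: Dict[str, Collection] = {}

    def create_collection(self, cid: str) -> Collection:
        return self._collections.setdefault(cid, Collection(cid))

    def open_collection(self, cid: str) -> Optional[Collection]:
        # Look the cid up instead of fabricating a handle for it.
        return self._collections.get(cid)


def check_store(store: ToyStore) -> str:
    """Classify a store the way the commit message describes."""
    meta = store.open_collection("meta")
    if meta is None:
        return "no meta collection"
    if "superblock" not in meta.objects:
        raise RuntimeError("meta collection exists but superblock is missing")
    return "superblock found"
```

With a lookup that can report absence, the caller can tell a store that was never formatted from one whose meta collection exists but lacks a superblock.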
4 years agocrimson/os/seastore: use structured binding
Kefu Chai [Sat, 5 Jun 2021 09:22:35 +0000 (17:22 +0800)]
crimson/os/seastore: use structured binding

for better readability

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41581 from tchaikov/wip-options-mgr-mon
Kefu Chai [Sat, 5 Jun 2021 02:06:07 +0000 (10:06 +0800)]
Merge pull request #41581 from tchaikov/wip-options-mgr-mon

common/options: extract mgr and mon options out

Reviewed-by: Neha Ojha <nojha@redhat.com>
4 years agoMerge pull request #40073 from jmolmo/delete_service_causes_osd_removal
Kefu Chai [Sat, 5 Jun 2021 00:44:42 +0000 (08:44 +0800)]
Merge pull request #40073 from jmolmo/delete_service_causes_osd_removal

mgr/cephadm: Warn about OSDs to remove manually when deleting an OSD service

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
4 years agoMerge PR #41697 into master
Patrick Donnelly [Fri, 4 Jun 2021 20:07:42 +0000 (13:07 -0700)]
Merge PR #41697 into master

* refs/pull/41697/head:
script: add a few more volume mounts for sepia
script: drop ceph-fuse from docker debugging
script: enable centos debuginfo repo for debugging
script: update repo url for multi-arch builds
script: fetch autobuild.asc key via HTTPS

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41690 from tchaikov/wip-test-alloc_aging
Kefu Chai [Fri, 4 Jun 2021 17:57:03 +0000 (01:57 +0800)]
Merge pull request #41690 from tchaikov/wip-test-alloc_aging

test/objectstore/unittest_alloc_aging: init cct

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
4 years agoMerge pull request #41698 from tchaikov/wip-qa-rook
Kefu Chai [Fri, 4 Jun 2021 17:23:35 +0000 (01:23 +0800)]
Merge pull request #41698 from tchaikov/wip-qa-rook

qa/suites/orch/rook/smoke: stop testing on ubuntu 18.04

Reviewed-by: Sage Weil <sage@redhat.com>
4 years agoqa/suites/orch/rook/smoke: stop testing on ubuntu 18.04 41698/head
Kefu Chai [Fri, 4 Jun 2021 17:11:13 +0000 (01:11 +0800)]
qa/suites/orch/rook/smoke: stop testing on ubuntu 18.04

even though rook does not really install ceph packages on the host
directly (it uses the ceph container image), teuthology insists on checking
the existence of debian packages by querying the shaman server when it sees
a teuthology facet file which includes:

os_type: ubuntu
os_version: "18.04"

but since we've stopped building ubuntu/bionic packages, teuthology
just complains when we are scheduling test suites which are composed
from facets in qa/suites/orch/rook/smoke.

in this change, the ubuntu_18.04.yaml is dropped because ubuntu/bionic
does not really increase the test coverage of ceph. it helps to test
the rook and container runtime though.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoscript: add a few more volume mounts for sepia 41697/head
Patrick Donnelly [Fri, 4 Jun 2021 16:33:54 +0000 (09:33 -0700)]
script: add a few more volume mounts for sepia

We now have a few Ceph file systems with various possible mount points
depending on which lab machine you're using.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
4 years agoscript: drop ceph-fuse from docker debugging
Patrick Donnelly [Fri, 4 Jun 2021 16:33:30 +0000 (09:33 -0700)]
script: drop ceph-fuse from docker debugging

Install this on the fly as necessary...

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
4 years agoscript: enable centos debuginfo repo for debugging
Patrick Donnelly [Fri, 4 Jun 2021 16:32:52 +0000 (09:32 -0700)]
script: enable centos debuginfo repo for debugging

So we can fetch e.g. the sqlite debuginfo packages.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
4 years agoscript: update repo url for multi-arch builds
Patrick Donnelly [Fri, 4 Jun 2021 16:31:19 +0000 (09:31 -0700)]
script: update repo url for multi-arch builds

Brad suggested this change based on his commit [1]. Thank you!

[1] https://github.com/ceph/ceph-ansible/commit/267cce9e8360fc8cb9c192fde2406e5dca724610

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
4 years agoscript: fetch autobuild.asc key via HTTPS
Patrick Donnelly [Fri, 4 Jun 2021 16:30:04 +0000 (09:30 -0700)]
script: fetch autobuild.asc key via HTTPS

Rather than relying on the key being available on the LRC /ceph file system.
(Someone appears to have deleted it recently.)

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
4 years agocrimson/net: move from out_q into sent queue 41695/head
Kefu Chai [Fri, 4 Jun 2021 12:19:30 +0000 (20:19 +0800)]
crimson/net: move from out_q into sent queue

to avoid the refcounting of underlying RefCountedObject.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41679 from AmnonHanuhov/wip-get_rid_of_pending_q
Kefu Chai [Fri, 4 Jun 2021 12:13:54 +0000 (20:13 +0800)]
Merge pull request #41679 from AmnonHanuhov/wip-get_rid_of_pending_q

crimson/net: Use out_q instead of pending_q

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
4 years agocrimson/net: Use out_q instead of pending_q 41679/head
Amnon Hanuhov [Thu, 3 Jun 2021 13:57:41 +0000 (16:57 +0300)]
crimson/net: Use out_q instead of pending_q

pending_q contains the same messages as in out_q and it is only used
for creating a bytestream out of these messages. We can just use out_q for that.

Signed-off-by: Amnon Hanuhov <ahanukov@redhat.com>
4 years agoMerge pull request #41631 from tchaikov/wip-keyring-decode
Kefu Chai [Fri, 4 Jun 2021 09:15:06 +0000 (17:15 +0800)]
Merge pull request #41631 from tchaikov/wip-keyring-decode

auth/KeyRing: always decode keying as plaintext

Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agoMerge pull request #41587 from cfsnyder/bugfix_47738
Kefu Chai [Fri, 4 Jun 2021 09:00:48 +0000 (17:00 +0800)]
Merge pull request #41587 from cfsnyder/bugfix_47738

mgr/DaemonServer.cc: prevent mgr crashes caused by integer underflow that is triggered by large increases to pg_num/pgp_num

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41592 from tchaikov/wip-ceph-default-confffile
Kefu Chai [Fri, 4 Jun 2021 08:59:24 +0000 (16:59 +0800)]
Merge pull request #41592 from tchaikov/wip-ceph-default-confffile

ceph.in: use rados.Rados.DEFAULT_CONF_FILES

Reviewed-by: Neha Ojha <nojha@redhat.com>
4 years agoMerge pull request #41594 from tchaikov/wip/test/librados/list
Kefu Chai [Fri, 4 Jun 2021 08:58:59 +0000 (16:58 +0800)]
Merge pull request #41594 from tchaikov/wip/test/librados/list

test/librados/list: print reason why test fails

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
4 years agoMerge pull request #36941 from hoamer/patch-1
Kefu Chai [Fri, 4 Jun 2021 08:57:41 +0000 (16:57 +0800)]
Merge pull request #36941 from hoamer/patch-1

doc/mgr/administrator: add a more precise description for creating key

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agodoc/mgr/administrator: add a more precise description for creating key 36941/head
hoamer [Wed, 2 Sep 2020 07:13:12 +0000 (09:13 +0200)]
doc/mgr/administrator: add a more precise description for creating key

added a more precise description to handle filename when creating key for mgr

Signed-off-by: hoamer <kontakt@sebastian-neugebauer.de>
4 years agodebian/control: ceph-mgr-modules-core does not Recommend ceph-mgr-rook anymore 41688/head
Kefu Chai [Fri, 4 Jun 2021 03:25:12 +0000 (11:25 +0800)]
debian/control: ceph-mgr-modules-core does not Recommend ceph-mgr-rook anymore

per https://www.debian.org/doc/debian-policy/ch-relationships.html

> Recommends
>   This declares a strong, but not absolute, dependency.
>
> The Recommends field should list packages that would be found together
> with this one in all but unusual installations.

ceph-mgr-modules-core provides a set of ceph-mgr modules which are
always enabled. but the rook module enables ceph-mgr to install and
configure a Ceph cluster using Rook. this module is very useful but
it does not have such a strong connection with ceph-mgr-modules-core.
we can always install it separately for better integration with
Rook.

See-also: https://tracker.ceph.com/issues/45574
Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agotest/objectstore/unittest_alloc_aging: init cct 41690/head
Kefu Chai [Wed, 2 Jun 2021 09:54:18 +0000 (17:54 +0800)]
test/objectstore/unittest_alloc_aging: init cct

* initialize the cct used by the test, otherwise g_ceph_context is
  not set at all.
* instead of using g_ceph_context, use the static member variable cct,
  for less dependency on the global instance.
* set up and tear down the cct for the test suite, because global_init()
  initializes g_ceph_context, which cannot be set multiple times.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agotest/objectstore: s/TearDownTestCase/TearDownTestSuite/
Kefu Chai [Wed, 2 Jun 2021 09:38:49 +0000 (17:38 +0800)]
test/objectstore: s/TearDownTestCase/TearDownTestSuite/

TearDownTestCase is deprecated by GTest. let's use the new API instead.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41652 from tchaikov/wip-qa-asock-or
Kefu Chai [Fri, 4 Jun 2021 05:50:38 +0000 (13:50 +0800)]
Merge pull request #41652 from tchaikov/wip-qa-asock-or

qa/tasks/admin_socket: support "foo || bar" as command

Reviewed-by: Samuel Just <sjust@redhat.com>
4 years agoMerge pull request #41686 from t-msn/update-trace-doc
Kefu Chai [Fri, 4 Jun 2021 04:30:23 +0000 (12:30 +0800)]
Merge pull request #41686 from t-msn/update-trace-doc

doc/dev: update how to use lttng/blkin trace

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agodoc/dev: update how to use lttng/blkin trace 41686/head
Misono Tomohiro [Fri, 4 Jun 2021 02:36:49 +0000 (11:36 +0900)]
doc/dev: update how to use lttng/blkin trace

Update doc to reflect current status.

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
4 years agoMerge PR #41553 into master
Sage Weil [Fri, 4 Jun 2021 02:04:55 +0000 (22:04 -0400)]
Merge PR #41553 into master

* refs/pull/41553/head:
ceph-volume: replace __ with _ in device_id

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge PR #41636 into master
Sage Weil [Fri, 4 Jun 2021 02:04:32 +0000 (22:04 -0400)]
Merge PR #41636 into master

* refs/pull/41636/head:
mgr/cephadm/inventory: do not try to resolve current mgr host
pybind/mgr/mgr_module: make get_mgr_ip() return mgr's IP from mgrmap
mgr/restful: use get_mgr_ip() instead of hostname

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41674 from tchaikov/wip-vstart-without-restful
Kefu Chai [Fri, 4 Jun 2021 01:44:58 +0000 (09:44 +0800)]
Merge pull request #41674 from tchaikov/wip-vstart-without-restful

vstart.sh: add an option named --without-restful

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
4 years agomgr/cephadm:fix alerts sent to wrong URL 41665/head
Paul Cuzner [Wed, 2 Jun 2021 23:34:19 +0000 (11:34 +1200)]
mgr/cephadm:fix alerts sent to wrong URL

The path_prefix in prometheus.yml was specifying an
endpoint prefix, which was invalid. This resulted in 404
errors when trying to send alerts to alertmanager and
blocked alerts being sent on to the ceph-dashboard API
receiver. This fix removes this prefix.

Fixes: https://tracker.ceph.com/issues/51073
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
4 years agoMerge pull request #41670 from tchaikov/wip-op-tracking-spin-off-0
Kefu Chai [Thu, 3 Jun 2021 23:50:44 +0000 (07:50 +0800)]
Merge pull request #41670 from tchaikov/wip-op-tracking-spin-off-0

crimson, common: improve const-correctness of Operation::dump()s.

Reviewed-by: Samuel Just <sjust@redhat.com>
4 years agoMerge pull request #41672 from tchaikov/wip-crimson-test-handle-fut
Kefu Chai [Thu, 3 Jun 2021 23:50:21 +0000 (07:50 +0800)]
Merge pull request #41672 from tchaikov/wip-crimson-test-handle-fut

test/crimson/seastore: always handle returned future<>

Reviewed-by: Samuel Just <sjust@redhat.com>
4 years agoMerge PR #41654 into master
Patrick Donnelly [Thu, 3 Jun 2021 20:34:54 +0000 (13:34 -0700)]
Merge PR #41654 into master

* refs/pull/41654/head:
mds: do not infinitely recursively print a metric

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
4 years agoMerge PR #41639 into master
Patrick Donnelly [Thu, 3 Jun 2021 20:33:58 +0000 (13:33 -0700)]
Merge PR #41639 into master

* refs/pull/41639/head:
mds/scrub: write root inode backtrace at creation

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years agoMerge PR #41499 into master
Patrick Donnelly [Thu, 3 Jun 2021 20:33:27 +0000 (13:33 -0700)]
Merge PR #41499 into master

* refs/pull/41499/head:
qa/tasks/mds_thrash: fix thrash iteration never skip

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years agoMerge PR #41443 into master
Patrick Donnelly [Thu, 3 Jun 2021 20:23:17 +0000 (13:23 -0700)]
Merge PR #41443 into master

* refs/pull/41443/head:
test: update log-ignorelist for fs:mirror test

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years agoMerge PR #39910 into master
Patrick Donnelly [Thu, 3 Jun 2021 20:22:23 +0000 (13:22 -0700)]
Merge PR #39910 into master

* refs/pull/39910/head:
test: Add test for mgr hang when osd is full
mgr: Set client_check_pool_perm to false
mds: Add full caps to avoid osd full check

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years agoMerge pull request #41559 from dmick/wip-grafana-container
Dan Mick [Thu, 3 Jun 2021 18:32:24 +0000 (11:32 -0700)]
Merge pull request #41559 from dmick/wip-grafana-container

monitoring/grafana/build/Makefile: revamp for arm64 builds, pushes to docker and quay, jenkins

4 years agomgr/cephadm/inventory: do not try to resolve current mgr host 41636/head
Sage Weil [Thu, 3 Jun 2021 14:29:00 +0000 (10:29 -0400)]
mgr/cephadm/inventory: do not try to resolve current mgr host

The CNI configuration may set up a private network for the container, which
is mapped to the hostname in /etc/hosts.  For example, my test box sets
up 10.88.0.0/24 because I was using crio + kubeadm on this host earlier
(at least I think that's why):

$ sudo podman run --rm --name test123 --entrypoint /bin/bash -it quay.ceph.io/ceph-ci/ceph:master -c "cat /etc/hosts"
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.88.0.8 f9e91bf2478f test123

In any case, we should never trust a lookup of our own hostname from inside
a container!

This isn't quite sufficient, though: if this is a single-host cluster, then
we fall back to using get_mgr_ip(). That value may be distorted by the
public_network option on the mgr, but we don't have any other good
options here, and single-node clusters are unlikely to have complex
network configs.

Refactor a bit to avoid the try/except nesting.

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agopybind/mgr/mgr_module: make get_mgr_ip() return mgr's IP from mgrmap
Sage Weil [Wed, 2 Jun 2021 02:31:11 +0000 (22:31 -0400)]
pybind/mgr/mgr_module: make get_mgr_ip() return mgr's IP from mgrmap

The previous approach was convoluted: we tried to do a DNS lookup on the
hostname, which would fail if /etc/hosts had an entry.  Which, with podman,
it does.  And the IP it has will vary in all sorts of weird ways.  For
example, CNI on my host means that I get a dynamic address in 10.88.0.0/24.

Avoid all of that nonsense and use the IP that is in the mgrmap.  There
may be multiple IPs (v2 + v1, or maybe even IPv4 + v6 in the future); in
that case, use the first one.

Signed-off-by: Sage Weil <sage@newdream.net>
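
The "use the first address in the mgrmap" rule can be sketched as below. The dictionary layout (`active_addrs` containing an `addrvec` list whose entries carry an `addr` string of the form `ip:port`) is an assumption made for this example, and `first_mgr_ip` is a hypothetical helper rather than the body of get_mgr_ip().

```python
import ipaddress
from typing import Any, Dict


def first_mgr_ip(mgr_map: Dict[str, Any]) -> str:
    """Return the IP of the first address listed for the active mgr."""
    addrvec = mgr_map["active_addrs"]["addrvec"]  # assumed layout
    if not addrvec:
        raise ValueError("mgrmap has no active addresses")
    # Split off the port; IPv6 addresses are assumed to be bracketed.
    host, _, _port = addrvec[0]["addr"].rpartition(":")
    host = host.strip("[]")
    # Validate and normalise, without any DNS or /etc/hosts lookup.
    return str(ipaddress.ip_address(host))
```

The point of the design is that no hostname resolution takes place, so a container-private entry in /etc/hosts cannot influence the result.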
4 years agomgr/restful: use get_mgr_ip() instead of hostname
Sage Weil [Wed, 2 Jun 2021 02:31:47 +0000 (22:31 -0400)]
mgr/restful: use get_mgr_ip() instead of hostname

Now we match dashboard!

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agoMerge pull request #41308 from sseshasa/wip-osd-benchmark-for-mclock
Neha Ojha [Thu, 3 Jun 2021 15:39:22 +0000 (08:39 -0700)]
Merge pull request #41308 from sseshasa/wip-osd-benchmark-for-mclock

osd: Run osd bench test to override default max osd capacity for mclock

Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
4 years agoMerge pull request #41316 from cbodley/wip-50785
Casey Bodley [Thu, 3 Jun 2021 15:05:00 +0000 (11:05 -0400)]
Merge pull request #41316 from cbodley/wip-50785

rgw: parse tenant name out of rgwx-bucket-instance

Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
Reviewed-by: Shilpa Jagannath <smanjara@redhat.com>
4 years agoMerge pull request #41677 from tchaikov/wip-oom
Kefu Chai [Thu, 3 Jun 2021 14:40:26 +0000 (22:40 +0800)]
Merge pull request #41677 from tchaikov/wip-oom

ceph.spec.in: increase the mem_per_job to 3GiB

Reviewed-by: David Galloway <dgallowa@redhat.com>
4 years agoMerge pull request #41668 from pleiadesian/patch-bucket-chown
Casey Bodley [Thu, 3 Jun 2021 14:28:35 +0000 (10:28 -0400)]
Merge pull request #41668 from pleiadesian/patch-bucket-chown

rgw: require bucket name in bucket chown

Reviewed-by: Or Friedmann <ofriedma@redhat.com>
Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
4 years agoMerge pull request #41462 from yehudasa/wip-50920
Casey Bodley [Thu, 3 Jun 2021 14:16:30 +0000 (10:16 -0400)]
Merge pull request #41462 from yehudasa/wip-50920

rgw: auth v4 client: don't convert '+' to space

Reviewed-by: Casey Bodley <cbodley@redhat.com>
4 years agocmake: increase the MAX_{LINK,COMPILE}_MEM 41677/head
Kefu Chai [Thu, 3 Jun 2021 12:48:53 +0000 (20:48 +0800)]
cmake: increase the MAX_{LINK,COMPILE}_MEM

based on recent observation, quite a few C++ source files take
more than 3.0GiB to compile. for instance,
test_mock_HttpClient.cc could take up to 6270MiB of memory to compile.

so increase MAX_{LINK,COMPILE}_MEM accordingly.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoceph.spec.in: increase the mem_per_job to 3GiB
Kefu Chai [Thu, 3 Jun 2021 12:41:36 +0000 (20:41 +0800)]
ceph.spec.in: increase the mem_per_job to 3GiB

to lower the number of jobs. we are experiencing build failures on
a builder with 48c96t, 193 free mem. the failures were caused by the
OOM killer, which kills the c++ compiler:

[498376.128969] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/jenkins.service,task=cc1plus,pid=1387895,uid=1110
[498376.145288] Out of memory: Killed process 1387895 (cc1plus) total-vm:3323312kB, anon-rss:3164568kB, file-rss:0kB, shmem-rss:0kB, UID:1110
[498376.315185] oom_reaper: reaped process 1387895 (cc1plus), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[498377.882072] cc1plus invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0

before this change, we use the total memory to calculate the number
of jobs, and assume that each job takes at most 2.5GiB mem. in the
case above, the # of jobs is 96.

after this change, we use the free memory, and increase the mem per job
to 3.0GiB. in the case above, the # of jobs would be 85.

Signed-off-by: Kefu Chai <kchai@redhat.com>
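
The job-count arithmetic described above can be sketched as a small function. The inputs in the test cases are illustrative values, not the builder's actual figures, and both the `build_jobs` name and the floor of one job are assumptions of this sketch rather than what ceph.spec.in does.

```python
def build_jobs(cpu_threads: int, free_mem_gib: float,
               mem_per_job_gib: float = 3.0) -> int:
    """Cap the parallel job count by CPU threads and by free memory."""
    by_memory = int(free_mem_gib // mem_per_job_gib)
    # Never go below one job, so the build can always make progress.
    return max(1, min(cpu_threads, by_memory))
```

With plenty of memory the thread count is the limit; once free memory drops, the memory bound takes over and fewer compilers run at the same time.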
4 years agoMerge pull request #41669 from tchaikov/wip-crimson-asok-dump-metrics
Kefu Chai [Thu, 3 Jun 2021 11:45:23 +0000 (19:45 +0800)]
Merge pull request #41669 from tchaikov/wip-crimson-asok-dump-metrics

crimson/admin: s/perf dump_seastar/dump_metrics/

Reviewed-by: Amnon Hanuhov <ahanukov@redhat.com>
4 years agovstart.sh: use here document to display multi-line message 41674/head
Kefu Chai [Thu, 3 Jun 2021 10:45:48 +0000 (18:45 +0800)]
vstart.sh: use here document to display multi-line message

for better readability

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agovstart.sh: add an option named --without-restful
Kefu Chai [Thu, 3 Jun 2021 10:42:48 +0000 (18:42 +0800)]
vstart.sh: add an option named --without-restful

so we don't need to wait for restful module to be loaded if not working
on this mgr module.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agovstart.sh: extract create_mgr_restful_secret() out
Kefu Chai [Thu, 3 Jun 2021 10:38:08 +0000 (18:38 +0800)]
vstart.sh: extract create_mgr_restful_secret() out

for better readability, and so it's easier to make this step optional if
developer is not interested in using the restful mgr module.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agodoc: Update mclock-config-ref to reflect automated OSD benchmarking 41308/head
Sridhar Seshasayee [Wed, 12 May 2021 14:50:20 +0000 (20:20 +0530)]
doc: Update mclock-config-ref to reflect automated OSD benchmarking

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
4 years agoMerge pull request #41671 from liu-chunmei/seastore-logger
Kefu Chai [Thu, 3 Jun 2021 07:39:16 +0000 (15:39 +0800)]
Merge pull request #41671 from liu-chunmei/seastore-logger

crimson/seastore: cleanup ceph_subsystem_filestore to seastore

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agotest/crimson/seastore: declare return type explicitly 41672/head
Kefu Chai [Thu, 3 Jun 2021 07:32:20 +0000 (15:32 +0800)]
test/crimson/seastore: declare return type explicitly

for better readability

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agotest/crimson/seastore: always handle returned future<>
Kefu Chai [Thu, 3 Jun 2021 07:28:45 +0000 (15:28 +0800)]
test/crimson/seastore: always handle returned future<>

this change also silences the [-Wunused-result] warning.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agocommon: fix a formatting nit in OpTracker::dump_ops_in_flight(). 41670/head
Radoslaw Zarzynski [Tue, 19 Jan 2021 16:05:47 +0000 (17:05 +0100)]
common: fix a formatting nit in OpTracker::dump_ops_in_flight().

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agocrimson: improve const-correctness of Operation::dump()s.
Radoslaw Zarzynski [Tue, 19 Jan 2021 16:05:12 +0000 (17:05 +0100)]
crimson: improve const-correctness of Operation::dump()s.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agocrimson/seastore: cleanup ceph_subsystem_filestore to seastore 41671/head
chunmei-liu [Thu, 3 Jun 2021 06:41:42 +0000 (23:41 -0700)]
crimson/seastore: cleanup ceph_subsystem_filestore to seastore

Signed-off-by: chunmei-liu <chunmei.liu@intel.com>
4 years agoMerge pull request #41666 from tchaikov/wip-crimson-stop
Kefu Chai [Thu, 3 Jun 2021 06:33:52 +0000 (14:33 +0800)]
Merge pull request #41666 from tchaikov/wip-crimson-stop

crimson/osd: wait for SIGINT and SIGTERM before stopping

Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
4 years agoqa: use dump_metrics as alternative of get_heap_property 41652/head
Radoslaw Zarzynski [Mon, 17 May 2021 14:49:20 +0000 (14:49 +0000)]
qa: use dump_metrics as alternative of get_heap_property

"get_heap_property *" asock commands are exposed to operators
to check the tcmalloc internals for understanding the performance
of the memory subsystem. crimson, however, uses the builtin seastar
allocator, which is not backed by tcmalloc. instead, we can dump the
metrics using the "dump_metrics" asock command, which is only available
from crimson-osd.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoqa/tasks/admin_socket: support "foo || bar" as command
Kefu Chai [Wed, 2 Jun 2021 14:06:22 +0000 (22:06 +0800)]
qa/tasks/admin_socket: support "foo || bar" as command

so we can cater to the needs of different implementations of osd, i.e.,
classic osd and crimson osd. they offer different sets of asock commands.

Signed-off-by: Kefu Chai <kchai@redhat.com>
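
One way to read the "foo || bar" form is "try each alternative in order and keep the first that works". The sketch below illustrates that reading only; `split_alternatives` and `run_first_supported` are hypothetical helpers, the `run` callable stands in for whatever actually sends the asock command, and the use of RuntimeError to signal an unsupported command is an assumption of this example, not the qa task's behaviour.

```python
from typing import Callable, List, Optional


def split_alternatives(command: str) -> List[str]:
    """Split 'foo || bar' into ['foo', 'bar'], dropping empty parts."""
    return [part.strip() for part in command.split("||") if part.strip()]


def run_first_supported(command: str, run: Callable[[str], str]) -> str:
    """Run the alternatives in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for candidate in split_alternatives(command):
        try:
            return run(candidate)
        except RuntimeError as exc:  # assumed failure signal for this sketch
            last_error = exc
    raise RuntimeError(f"no alternative of {command!r} succeeded") from last_error
```

A single test description can then list the classic-osd command first and the crimson-osd command second, and each daemon answers the one it supports.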
4 years agocrimson/admin/osd_admin: sort forward declarations 41669/head
Kefu Chai [Thu, 3 Jun 2021 05:58:40 +0000 (13:58 +0800)]
crimson/admin/osd_admin: sort forward declarations

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/admin: fix the indent
Kefu Chai [Thu, 3 Jun 2021 05:48:27 +0000 (13:48 +0800)]
crimson/admin: fix the indent

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/admin: s/perf dump_seastar/dump_metrics/
Kefu Chai [Thu, 3 Jun 2021 05:45:05 +0000 (13:45 +0800)]
crimson/admin: s/perf dump_seastar/dump_metrics/

as a user-facing interface, no need to expose seastar in the name,
what matters to user is the content not the underlying technology or library.

so rename the command prefix to "dump_metrics"

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/admin: s/SeastarMetricsHook/DumpMetricsHook/
Kefu Chai [Thu, 3 Jun 2021 05:39:28 +0000 (13:39 +0800)]
crimson/admin: s/SeastarMetricsHook/DumpMetricsHook/

seastar is the name of one of the libraries used to implement crimson,
but the asok hook dumps not only builtin metrics in seastar, but also
the ones registered by crimson and seastore, so rename it to a more
general name.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agorgw: require bucket name in bucket chown 41668/head
Zulai Wang [Thu, 3 Jun 2021 05:13:15 +0000 (13:13 +0800)]
rgw: require bucket name in bucket chown

Checking for and reporting the missing mandatory parameter avoids a
clueless error message for bucket chown.

Signed-off-by: Zulai Wang <zl31wang@gmail.com>
4 years agocrimson/osd: wait for SIGINT and SIGTERM before stopping 41666/head
Kefu Chai [Thu, 3 Jun 2021 05:26:17 +0000 (13:26 +0800)]
crimson/osd: wait for SIGINT and SIGTERM before stopping

this change addresses a regression introduced by
37b83f4ed7ca69f105b93bf482cb2289cbaf9a4d. as we should not stop
services without being asked to do so.

in this change, signal handlers for SIGINT and SIGTERM are registered to
handle these signals, and in the seastar thread, we wait until either of
these two signals is caught.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41627 from tchaikov/wip-mgr-repl-doc
Kefu Chai [Thu, 3 Jun 2021 01:36:15 +0000 (09:36 +0800)]
Merge pull request #41627 from tchaikov/wip-mgr-repl-doc

doc/mgr/modules: add a "debugging" section

Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
4 years agoMerge pull request #41138 from kalebskeithley/python39
Kefu Chai [Thu, 3 Jun 2021 01:34:56 +0000 (09:34 +0800)]
Merge pull request #41138 from kalebskeithley/python39

do_cmake: build with python3.9 on RHEL9

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agodo_cmake: build with python3.9 on RHEL9 41138/head
Kefu Chai [Thu, 3 Jun 2021 01:29:19 +0000 (09:29 +0800)]
do_cmake: build with python3.9 on RHEL9

rhel9 has python3.9 as of rhel9beta

Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41496 from Huber-ming/correct_spell
Kefu Chai [Thu, 3 Jun 2021 01:16:42 +0000 (09:16 +0800)]
Merge pull request #41496 from Huber-ming/correct_spell

rgw: correct the spelling of "instace"

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agoosd/scrub: modify "classic" OSD scrub state-machine to support Crimson 40652/head
Ronen Friedman [Tue, 30 Mar 2021 13:39:19 +0000 (16:39 +0300)]
osd/scrub: modify "classic" OSD scrub state-machine to support Crimson

As some scrub-related functions are asynchronous in Crimson,
scrub states that call those functions cannot simply perform a
'post' or state-transition sequentially. The called operations
must arrange for a state-machine event to be sent upon completion.

Specifically, the following are now handled (on the FSM side) as async:
 - building scrub maps
 - comparing the scrub maps (and the rest of "what we
   do after a chunk is handled")

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
4 years agoMerge PR #41635 into master
Patrick Donnelly [Wed, 2 Jun 2021 15:18:22 +0000 (08:18 -0700)]
Merge PR #41635 into master

* refs/pull/41635/head:
qa: increase fragmentation to improve uniform distribution

Reviewed-by: Ramana Raja <rraja@redhat.com>
4 years agoMerge pull request #41644 from rzarzynski/wip-crimson-fix-blocked-peering
Kefu Chai [Wed, 2 Jun 2021 14:43:40 +0000 (22:43 +0800)]
Merge pull request #41644 from rzarzynski/wip-crimson-fix-blocked-peering

crimson/monc: fix subscription stall that blocked peering.

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agomds: do not infinitely recursively print a metric 41654/head
Patrick Donnelly [Wed, 2 Jun 2021 14:28:49 +0000 (07:28 -0700)]
mds: do not infinitely recursively print a metric

Fixes: b1b44d775df3160d937c068d5e1079e24199ed6b
Fixes: https://tracker.ceph.com/issues/51067
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
4 years agoMerge PR #41651 into master
Sage Weil [Wed, 2 Jun 2021 14:27:03 +0000 (10:27 -0400)]
Merge PR #41651 into master

* refs/pull/41651/head:
doc/cephadm: s/the the/the

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41645 from tchaikov/wip-crimson-osd-mkfs
Kefu Chai [Wed, 2 Jun 2021 14:10:12 +0000 (22:10 +0800)]
Merge pull request #41645 from tchaikov/wip-crimson-osd-mkfs

crimson/osd: check existing superblock when mkfs

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agodoc/cephadm: s/the the/the 41651/head
Zac Dover [Wed, 2 Jun 2021 14:06:06 +0000 (00:06 +1000)]
doc/cephadm: s/the the/the

This removes an extraneous "the" and reworks a
sentence so that it adheres to the grammatical
rules of the English language.

Signed-off-by: Zac Dover <zac.dover@gmail.com>
4 years agocrimson/osd: check existing superblock when mkfs 41645/head
Kefu Chai [Wed, 2 Jun 2021 12:57:14 +0000 (20:57 +0800)]
crimson/osd: check existing superblock when mkfs

in case mkfs is run on an existing store.

this change mirrors the behavior of classic osd, and also addresses the
assert failure when BlueStore tries to create a collection when it
already contains a collection with the same collection id.
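
a minimal sketch of the idea, assuming an in-memory store: `Store`, `Superblock`, `MkfsResult` and this `mkfs()` are hypothetical stand-ins, not the crimson or BlueStore interfaces, and the identity check is an assumption about what "check existing superblock" could involve. it only illustrates looking for a superblock first, so that the meta collection is never created twice.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical types for illustration; not the real object store API.
struct Superblock {
  std::string cluster_fsid;
  int whoami;
};

struct Store {
  std::set<std::string> collections;
  std::optional<Superblock> superblock;

  // Mimics a store that refuses to create a collection twice.
  void create_collection(const std::string& cid) {
    if (!collections.insert(cid).second) {
      throw std::logic_error("collection already exists: " + cid);
    }
  }
};

enum class MkfsResult { Created, AlreadyFormatted };

MkfsResult mkfs(Store& store, const std::string& fsid, int whoami) {
  if (store.superblock) {
    // Existing store: verify it belongs to this OSD instead of re-creating it.
    if (store.superblock->cluster_fsid != fsid ||
        store.superblock->whoami != whoami) {
      throw std::runtime_error("existing superblock does not match");
    }
    return MkfsResult::AlreadyFormatted;
  }
  // Fresh store: create the meta collection and write the superblock.
  store.create_collection("meta");
  store.superblock = Superblock{fsid, whoami};
  return MkfsResult::Created;
}
```

with the check in place, a second `mkfs()` on the same store returns `AlreadyFormatted` rather than reaching `create_collection()` again.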

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/osd: extract OSD::_write_superblock() out
Kefu Chai [Wed, 2 Jun 2021 12:47:03 +0000 (20:47 +0800)]
crimson/osd: extract OSD::_write_superblock() out

prepare for the change to verify existing meta collection and superblock
stored in it.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/monc: fix subscription stall that blocked peering. 41644/head
Radoslaw Zarzynski [Wed, 2 Jun 2021 11:59:37 +0000 (11:59 +0000)]
crimson/monc: fix subscription stall that blocked peering.

There is a scenario in which the `active_con` is properly
chosen but isn't marked as `ready_to_send`.
If `renew_subs()` is called during `on_session_opened()`,
the flag is turned on only after the subscriptions are
renewed, which cannot happen because renewal requires the flag
to be already set. In other words: there is a circular data dependency.

The net result is stalling the subscription machinery,
particularly the `OSDMap` subs. This caused a nasty peering
issue at Sepia [1] where PG 2.7 got stuck in the `GetInfo`
state.
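
The circular dependency can be reproduced in a toy model. This is a sketch under stated assumptions: `MonClientModel` and its members are hypothetical and only borrow the names `ready_to_send`, `renew_subs()` and `on_session_opened()` from the description above. The "ready first" variant is one possible ordering that breaks the cycle, not necessarily what this commit does.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model; not the crimson monitor client.
struct MonClientModel {
  bool ready_to_send = false;
  int messages_sent = 0;
  std::vector<std::function<void()>> waiting_for_ready;

  // Sends immediately when ready; otherwise parks the send and its continuation.
  void send_message(std::function<void()> on_sent) {
    auto do_send = [this, on_sent] {
      ++messages_sent;
      on_sent();
    };
    if (ready_to_send) {
      do_send();
    } else {
      waiting_for_ready.push_back(do_send);
    }
  }

  // Sets the flag and flushes everything that was waiting for it.
  void mark_ready() {
    ready_to_send = true;
    auto parked = std::move(waiting_for_ready);
    waiting_for_ready.clear();
    for (auto& f : parked) f();
  }

  void renew_subs(std::function<void()> on_renewed) {
    send_message(std::move(on_renewed));
  }

  // Problematic ordering: the flag is set only once the renewal completes,
  // but the renewal cannot complete until the flag is set.
  void on_session_opened_stalling() {
    renew_subs([this] { mark_ready(); });
  }

  // Assumed alternative: set the flag first, then renew.
  void on_session_opened_ready_first() {
    mark_ready();
    renew_subs([] {});
  }
};
```

With the stalling ordering the renewal stays parked indefinitely and no subscription message is sent; with the flag set first, the message goes out immediately.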

```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136908$ less ./remote/smithi039/log/ceph-osd.1.log.gz
...
DEBUG 2021-05-26 20:19:48,134 [shard 0] osd -  pg_epoch 14 pg[2.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0) [] r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown enter Initial
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=0 crt=0'0 mlcod 0'0 unknown enter Reset
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Started
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Start
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Started/Primary
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating enter Started/Primary/Peering
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetInfo
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering build_prior all_probe
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering build_prior final: probe 0,1 down  blocked_by {}
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering up_thru 0 < same_since 14, must notify monitor
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>:  no prior_set down osds, clearing prior_readable_until_ub
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>:  querying info from osd.0
...
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering  got osd.0 2.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0)
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Adding osd: 0 peer features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common peer features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common acting features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common upacting features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering exit Started/Primary/Peering/GetInfo 0.099480 4 2021-05-26T20:19:48.146172+0000
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetLog
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetMissing
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd -  pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/WaitUpThru
...
DEBUG 2021-05-26 20:19:49,139 [shard 0] osd -  pg_epoch 15 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating enter Started/Primary/Active
...
DEBUG 2021-05-26 20:19:49,142 [shard 0] osd -  pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+activating enter Started/Primary/Active/Activating
...
DEBUG 2021-05-26 20:19:49,204 [shard 0] osd -  pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Started/Primary/Active/Recovered
...
DEBUG 2021-05-26 20:19:49,204 [shard 0] osd -  pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Started/Primary/Active/Clean
...
DEBUG 2021-05-26 20:22:31,223 [shard 0] osd -  pg_epoch 86 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Reset
...
<a lot of flipping>
...
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown activate_map
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown exit Reset 0.035744 1 2021-05-26T20:24:07.817331+0000
INFO  2021-05-26 20:24:07,851 [shard 0] osd - Exiting state: Reset, entered at 1622060647.81581881622060647.8173316 spent on 1 events
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started
INFO  2021-05-26 20:24:07,851 [shard 0] osd - Entering state: Started
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Start
INFO  2021-05-26 20:24:07,851 [shard 0] osd - Entering state: Start
INFO  2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown state<Start>: transitioning to Primary
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown exit Start 0.000041 0 0.000000
INFO  2021-05-26 20:24:07,851 [shard 0] osd - Exiting state: Start, entered at 1622060647.8516333, 0.0 spent on 0 events
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started/Primary
INFO  2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started/Primary/Peering
INFO  2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary/Peering
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering enter Started/Primary/Peering/GetInfo
INFO  2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary/Peering/GetInfo
...
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior all_probe 0,1,4
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior maybe_rw interval:139, acting: 0
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior final: probe 0,1,4 down  blocked_by {}
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering up_thru 125 < same_since 163, must notify monitor
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>:  no prior_set down osds, clearing prior_readable_until_ub
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>:  querying info from osd.0
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd -  pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>:  querying info from osd.4
...
DEBUG 2021-05-26 20:24:07,924 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] connect to existing
DEBUG 2021-05-26 20:24:07,924 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] --> #62 === pg_query2(2.7 2.7 query(info 0'0 epoch_sent 163) e163/163) v1 (131)
...
DEBUG 2021-05-26 20:24:07,942 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] GOT AckFrame: seq=62
...
<plenty of osd_ping messaging but no reply to the pg_query for 2.7>
...
DEBUG 2021-05-26 20:58:19,829 [shard 0] ms - [osd.1(hb_front) v2:172.21.15.39:6807/34727 >> osd.4 v2:172.21.15.62:6807/34686@54816] <== #772 === osd_ping(ping e17 up_from 10 ping_stamp 2021-05-26T20:58:19.825573+0000/2319.780029297s send_stamp 2319.780029297s) v5 (70)
DEBUG 2021-05-26 20:58:19,829 [shard 0] ms - [osd.1(hb_front) v2:172.21.15.39:6807/34727 >> osd.4 v2:172.21.15.62:6807/34686@54816] --> #772 === osd_ping(ping_reply e249 up_from 10 ping_stamp 2021-05-26T20:58:19.825573+0000/2319.780029297s send_stamp 2320.039062500s) v5 (70
```

The peering request got stuck while waiting for an `OSDMap`.

```
DEBUG 2021-05-26 20:24:07,930 [shard 0] ms - [osd.4(cluster) v2:172.21.15.62:6802/34686 >> osd.1 v2:172.21.15.39:6803/34727@61064] <== #62 === pg_query2(2.7 2.7 query(info 0'0 epoch_sent 163) e163/163) v1 (131)
DEBUG 2021-05-26 20:24:07,930 [shard 0] osd - handle_peering_op on 2.7 from 1
DEBUG 2021-05-26 20:24:07,930 [shard 0] osd - peering_event(id=517, detail=PeeringEvent(from=1 pgid=2.7 sent=163 requested=163 evt=epoch_sent: 163 epoch_requested: 163 MQuery 2.7 from 1 query_epoch 163 query: query(info 0'0 epoch_sent 163))): star
```

```
INFO  2021-05-26 20:19:49,127 [shard 0] osd - evt epoch is 15, i have 14, will wait
INFO  2021-05-26 20:19:49,128 [shard 0] osd - osdmap_subscribe(14)
DEBUG 2021-05-26 20:19:49,128 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #9 === mon_subscribe({osdmap=14}) v3 (15)
...
INFO  2021-05-26 20:19:49,131 [shard 0] osd - handle_osd_map osd_map(14..15 src has 1..15) v4
INFO  2021-05-26 20:19:49,131 [shard 0] osd - handle_osd_map epochs [14..15], i have 15, src has [1..15]
...
INFO  2021-05-26 20:19:49,138 [shard 0] osd - handle_osd_map osd_map(14..15 src has 1..15) v4
INFO  2021-05-26 20:19:49,138 [shard 0] osd - handle_osd_map epochs [14..15], i have 15, src has [1..15]
...
INFO  2021-05-26 20:19:49,139 [shard 0] osd - evt epoch is 15, i have 14, will wait
INFO  2021-05-26 20:19:49,141 [shard 0] osd - osdmap_subscribe(14)
WARN  2021-05-26 20:19:49,141 [shard 0] monc - renew_subs - empty
...
INFO  2021-05-26 20:19:50,140 [shard 0] osd - handle_osd_map osd_map(15..16 src has 1..16) v4
INFO  2021-05-26 20:19:50,140 [shard 0] osd - handle_osd_map epochs [15..16], i have 15, src has [1..16]
DEBUG 2021-05-26 20:19:50,141 [shard 0] bluestore - do_transaction
INFO  2021-05-26 20:19:50,145 [shard 0] osd - osd.4: committed_osd_maps(16, 16)
...
INFO  2021-05-26 20:20:42,881 [shard 0] osd - handle_osd_map epochs [16..17], i have 16, src has [1..17]
DEBUG 2021-05-26 20:20:42,882 [shard 0] bluestore - do_transaction
INFO  2021-05-26 20:20:42,886 [shard 0] osd - osd.4: committed_osd_maps(17, 17)
...
INFO  2021-05-26 20:20:43,941 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO  2021-05-26 20:20:43,941 [shard 0] osd - osdmap_subscribe(17)
...
INFO  2021-05-26 20:20:43,957 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO  2021-05-26 20:20:43,957 [shard 0] osd - osdmap_subscribe(17)
...
INFO  2021-05-26 20:20:43,969 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO  2021-05-26 20:20:43,969 [shard 0] osd - osdmap_subscribe(17)
...
DEBUG 2021-05-26 20:20:46,930 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@57288 >> mon.2 v2:172.21.15.39:3301/0] <== #4 === osd_map(20..21 src has 1..21) v4 (41)
INFO  2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map osd_map(20..21 src has 1..21) v4
INFO  2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map epochs [20..21], i have 17, src has [1..21]
INFO  2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map message skips epochs 18..19
INFO  2021-05-26 20:20:46,930 [shard 0] osd - osdmap_subscribe(18)
...
DEBUG 2021-05-26 20:20:47,936 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@57288 >> mon.2 v2:172.21.15.39:3301/0] <== #5 === osd_map(21..22 src has 1..22) v4 (41)
INFO  2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map osd_map(21..22 src has 1..22) v4
INFO  2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map epochs [21..22], i have 17, src has [1..22]
INFO  2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map message skips epochs 18..20
INFO  2021-05-26 20:20:47,936 [shard 0] osd - osdmap_subscribe(18)
...
<osdmap_subscribe(18) over and over>
```

```
2021-05-26T20:19:42.048+0000 7f4712ffd700  1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] <== osd.4 v2:172.21.15.62:6801/34686 4 ==== mon_subscribe({mgrmap=0+,osd_pg_creates=0+,osdmap=0+}) v3 ==== 82+0+0 (secure 0 0 0) 0x7f46fc04e150 con 0x7f470401c480
2021-05-26T20:19:42.048+0000 7f4712ffd700 20 mon.b@1(peon) e1 _ms_dispatch existing session 0x7f46fc02f500 for osd.4
2021-05-26T20:19:42.048+0000 7f4712ffd700 20 mon.b@1(peon) e1  entity_name osd.4 global_id 4168 (new_ok) caps allow *
2021-05-26T20:19:42.048+0000 7f4712ffd700 10 mon.b@1(peon) e1 handle_subscribe mon_subscribe({mgrmap=0+,osd_pg_creates=0+,osdmap=0+}) v3
...
2021-05-26T20:19:49.129+0000 7f4712ffd700  1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] <== osd.4 v2:172.21.15.62:6801/34686 9 ==== mon_subscribe({osdmap=14}) v3 ==== 36+0+0 (secure 0 0 0) 0x7f46e8556210 con 0x7f470401c480
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon) e1 _ms_dispatch existing session 0x7f46fc02f500 for osd.4
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon) e1  entity_name osd.4 global_id 4168 (new_ok) caps allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon) e1 handle_subscribe mon_subscribe({osdmap=14}) v3
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 is_capable service=mon command= read addr v2:172.21.15.62:6801/34686 on cap allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20  allow so far , doing grant allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20  allow all
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 is_capable service=osd command= read addr v2:172.21.15.62:6801/34686 on cap allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20  allow so far , doing grant allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20  allow all
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon).osd e15 check_osdmap_sub 0x7f46e84f0150 next 14 (onetime)
2021-05-26T20:19:49.129+0000 7f4712ffd700  5 mon.b@1(peon).osd e15 send_incremental [14..15] to osd.4
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon).osd e15 build_incremental [14..15] with features 3f01cfbb7ffdffff
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon).osd e15 build_incremental    inc 15 622 bytes
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon).osd e15 build_incremental    inc 14 578 bytes
2021-05-26T20:19:49.129+0000 7f4712ffd700  1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] --> v2:172.21.15.62:6801/34686 -- osd_map(14..15 src has 1..15) v4 -- 0x7f46e856a100 con 0x7f470401c480
```

```
seastar::future<> Client::renew_subs()
{
  if (!sub.have_new()) {
    logger().warn("{} - empty", __func__);
    return seastar::now();
  }
  logger().trace("{}", __func__);

  auto m = crimson::make_message<MMonSubscribe>();
  m->what = sub.get_subs();
  m->hostname = ceph_get_short_hostname();
  return send_message(std::move(m)).then([this] {
    sub.renewed();
  });
}
```

```
INFO  2021-05-26 20:19:42,081 [shard 0] osd - osdmap_subscribe(1)
DEBUG 2021-05-26 20:19:42,081 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #6 === mon_subscribe({osdmap=1}) v3 (15)
...
INFO  2021-05-26 20:19:49,128 [shard 0] osd - osdmap_subscribe(14)
DEBUG 2021-05-26 20:19:49,128 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #9 === mon_subscribe({osdmap=14}) v3 (15)
...
INFO  2021-05-26 20:19:49,141 [shard 0] osd - osdmap_subscribe(14)
WARN  2021-05-26 20:19:49,141 [shard 0] monc - renew_subs - empty
<no MMonSubscribe>
...
INFO  2021-05-26 20:20:43,941 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO  2021-05-26 20:20:43,941 [shard 0] osd - osdmap_subscribe(17)
<no MMonSubscribe>
...
INFO  2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map message skips epochs 18..19
INFO  2021-05-26 20:20:46,930 [shard 0] osd - osdmap_subscribe(18)
<no MMonSubscribe>
```

[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136908

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agoMerge pull request #41630 from rhcs-dashboard/fix-bucket-calculations
Ernesto Puerta [Wed, 2 Jun 2021 12:12:56 +0000 (14:12 +0200)]
Merge pull request #41630 from rhcs-dashboard/fix-bucket-calculations

mgr/dashboard: fix bucket objects and size calculations

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
4 years agoMerge pull request #41638 from tchaikov/wip-doc-crimson-doc
Kefu Chai [Wed, 2 Jun 2021 10:43:47 +0000 (18:43 +0800)]
Merge pull request #41638 from tchaikov/wip-doc-crimson-doc

doc/dev/crimson: update link to scylladb debugging tips

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agomds/scrub: write root inode backtrace at creation 41639/head
Milind Changire [Wed, 2 Jun 2021 09:42:09 +0000 (15:12 +0530)]
mds/scrub: write root inode backtrace at creation

Write the root inode backtrace as soon as the inode is created;
an unwritten backtrace always caused scrub to fail for the root inode.

Fixes: https://tracker.ceph.com/issues/50976
Signed-off-by: Milind Changire <mchangir@redhat.com>
4 years agodoc/dev/crimson: update link to scylladb debugging tips 41638/head
Kefu Chai [Wed, 2 Jun 2021 09:10:25 +0000 (17:10 +0800)]
doc/dev/crimson: update link to scylladb debugging tips

the old one is not reachable anymore.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41637 from tchaikov/wip-crimson-never-discard-future
Kefu Chai [Wed, 2 Jun 2021 09:00:53 +0000 (17:00 +0800)]
Merge pull request #41637 from tchaikov/wip-crimson-never-discard-future

crimson: always handle returned future

Reviewed-by: Xuehan Xu <xuxuehan@360.cn>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agodoc/mgr/modules: add a "debugging" section 41627/head
Kefu Chai [Tue, 1 Jun 2021 11:58:47 +0000 (19:58 +0800)]
doc/mgr/modules: add a "debugging" section

Signed-off-by: Kefu Chai <kchai@redhat.com>