git.apps.os.sepia.ceph.com Git - ceph.git/log
Xiubo Li [Mon, 24 May 2021 02:49:09 +0000 (10:49 +0800)]
qa: always format the pgid in hex
If the pg number is larger than 9, the pgid won't match the array index,
which was formatted in decimal before this change.
Fixes: https://tracker.ceph.com/issues/50808
Signed-off-by: Xiubo Li <xiubli@redhat.com>
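The mismatch described in the commit above can be illustrated with a minimal Python sketch. The helper name `format_pgid` and the sample pool and PG numbers are hypothetical and are not taken from the qa code; the sketch only assumes that the PG part of a pgid is printed in hexadecimal, as the commit title states.

```python
def format_pgid(pool: int, pg: int) -> str:
    """Build a pgid string with the PG number in hex (hypothetical helper)."""
    return f"{pool}.{pg:x}"


def format_pgid_decimal(pool: int, pg: int) -> str:
    """The decimal variant, shown only for comparison."""
    return f"{pool}.{pg}"


# For PG numbers 0-9 both variants agree, so the problem stays hidden.
same = format_pgid(2, 7) == format_pgid_decimal(2, 7)

# From PG number 10 onwards the two strings differ.
differs = format_pgid(2, 10) != format_pgid_decimal(2, 10)
```

With these definitions, a lookup keyed by the decimal string would miss any PG whose number is 10 or larger.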
Kefu Chai [Mon, 7 Jun 2021 01:52:36 +0000 (09:52 +0800)]
Merge pull request #41695 from tchaikov/wip-crimson-net-move
crimson/net: move from out_q into sent queue
Reviewed-by: Amnon Hanuhov <ahanukov@redhat.com>
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Kefu Chai [Sun, 6 Jun 2021 01:45:16 +0000 (09:45 +0800)]
Merge pull request #41708 from tchaikov/wip-seastore-open-coll
crimson/os/seastore: open_collection() returns nullptr if DNE
Reviewed-by: Samuel Just <sjust@redhat.com>
Sage Weil [Sat, 5 Jun 2021 20:43:36 +0000 (16:43 -0400)]
Merge PR #41665 into master
* refs/pull/41665/head:
mgr/cephadm:fix alerts sent to wrong URL
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
Kefu Chai [Sat, 5 Jun 2021 16:06:32 +0000 (00:06 +0800)]
Merge pull request #40652 from ronen-fr/wip-ronenf-cscrub-class
osd/scrub: modify "classic" OSD scrub state-machine to support Crimson
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Kefu Chai [Sat, 5 Jun 2021 13:41:00 +0000 (21:41 +0800)]
Merge pull request #41154 from rzarzynski/wip-global-backtrace-bug-50647
global: fault handlers cope with simultaneous faults now.
Reviewed-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Sat, 5 Jun 2021 13:33:00 +0000 (21:33 +0800)]
Merge pull request #41604 from t-msn/wip-51030
osd/ECBackend: Fix null pointer dereference when enabling jaeger tracing
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
Kefu Chai [Sat, 5 Jun 2021 13:26:04 +0000 (21:26 +0800)]
Merge pull request #41501 from aclamk/wip-bluefs-safer-flush
os/bluestore: Remove possibility of replay log and file inconsistency
Reviewed-by: Igor Fedotov <ifedotov@suse.com>
Kefu Chai [Sat, 5 Jun 2021 13:23:13 +0000 (21:23 +0800)]
Merge pull request #41506 from ceph/wip-cv-batch-fixes
ceph-volume: fix batch report and respect ceph.conf config values
Reviewed-by: Guillaume Abrioux <gabrioux@redhat.com>
Kefu Chai [Sat, 5 Jun 2021 13:17:24 +0000 (21:17 +0800)]
Merge pull request #41688 from tchaikov/wip-debian-rook
debian/control: ceph-mgr-modules-core does not Recommend ceph-mgr-roo…
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Kefu Chai [Sat, 5 Jun 2021 09:39:25 +0000 (17:39 +0800)]
crimson/os/seastore: open_collection() returns nullptr if DNE
we check for the existence of the meta collection by trying to open it.
if it exists, we continue to check for the superblock stored in it; if
the superblock does not exist or is corrupted, we consider it a failure.
before this change, open_collection() always returned a valid Collection
even if the store did not have a collection with the specified cid. this
behavior could be misleading in the use case above.
after this change, open_collection() looks up the collections stored in
the root collection node for the specified cid, and returns nullptr if it
does not exist.
Signed-off-by: Kefu Chai <kchai@redhat.com>
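The contract described in the commit above can be sketched in Python, with `None` standing in for nullptr. The `Store` and `Collection` classes and the dictionary lookup are hypothetical stand-ins for illustration; they do not reproduce the seastore implementation.

```python
from typing import Dict, Optional


class Collection:
    """Hypothetical handle for an opened collection."""

    def __init__(self, cid: str) -> None:
        self.cid = cid


class Store:
    """Hypothetical store that tracks its collections by cid."""

    def __init__(self) -> None:
        self._collections: Dict[str, Collection] = {}

    def create_collection(self, cid: str) -> Collection:
        coll = Collection(cid)
        self._collections[cid] = coll
        return coll

    def open_collection(self, cid: str) -> Optional[Collection]:
        # Return None (the nullptr analogue) when the cid is unknown,
        # instead of handing out a handle for a missing collection.
        return self._collections.get(cid)


def meta_collection_exists(store: Store) -> bool:
    """The caller can now tell a missing collection from an existing one."""
    return store.open_collection("meta") is not None
```

A caller that probes for the meta collection before reading the superblock gets a clear negative answer on a fresh store.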
Kefu Chai [Sat, 5 Jun 2021 09:22:35 +0000 (17:22 +0800)]
crimson/os/seastore: use structured binding
for better readability
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Sat, 5 Jun 2021 02:06:07 +0000 (10:06 +0800)]
Merge pull request #41581 from tchaikov/wip-options-mgr-mon
common/options: extract mgr and mon options out
Reviewed-by: Neha Ojha <nojha@redhat.com>
Kefu Chai [Sat, 5 Jun 2021 00:44:42 +0000 (08:44 +0800)]
Merge pull request #40073 from jmolmo/delete_service_causes_osd_removal
mgr/cephadm: Warn about OSDs to remove manually when deleting an OSD service
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
Patrick Donnelly [Fri, 4 Jun 2021 20:07:42 +0000 (13:07 -0700)]
Merge PR #41697 into master
* refs/pull/41697/head:
script: add a few more volume mounts for sepia
script: drop ceph-fuse from docker debugging
script: enable centos debuginfo repo for debugging
script: update repo url for multi-arch builds
script: fetch autobuild.asc key via HTTPS
Reviewed-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Fri, 4 Jun 2021 17:57:03 +0000 (01:57 +0800)]
Merge pull request #41690 from tchaikov/wip-test-alloc_aging
test/objectstore/unittest_alloc_aging: init cct
Reviewed-by: Igor Fedotov <ifedotov@suse.com>
Kefu Chai [Fri, 4 Jun 2021 17:23:35 +0000 (01:23 +0800)]
Merge pull request #41698 from tchaikov/wip-qa-rook
qa/suites/orch/rook/smoke: stop testing on ubuntu 18.04
Reviewed-by: Sage Weil <sage@redhat.com>
Kefu Chai [Fri, 4 Jun 2021 17:11:13 +0000 (01:11 +0800)]
qa/suites/orch/rook/smoke: stop testing on ubuntu 18.04
rook does not really install ceph packages on the host directly; it
uses the ceph container image. but teuthology insists on checking for the
existence of debian packages by querying the shaman server when it sees a
teuthology facet file which includes:
os_type: ubuntu
os_version: "18.04"
but since we've stopped building ubuntu/bionic packages, teuthology
just complains when we schedule test suites which are composed
from facets in qa/suites/orch/rook/smoke.
in this change, ubuntu_18.04.yaml is dropped because ubuntu/bionic
does not really increase the test coverage of ceph. it helps to test
rook and the container runtime though.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Patrick Donnelly [Fri, 4 Jun 2021 16:33:54 +0000 (09:33 -0700)]
script: add a few more volume mounts for sepia
We now have a few Ceph file systems with various possible mount points
depending on which lab machine you're using.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Fri, 4 Jun 2021 16:33:30 +0000 (09:33 -0700)]
script: drop ceph-fuse from docker debugging
Install this on the fly as necessary...
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Fri, 4 Jun 2021 16:32:52 +0000 (09:32 -0700)]
script: enable centos debuginfo repo for debugging
So we can fetch e.g. the sqlite debuginfo packages.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Fri, 4 Jun 2021 16:31:19 +0000 (09:31 -0700)]
script: update repo url for multi-arch builds
Brad suggested this change based on his commit [1]. Thank you!
[1] https://github.com/ceph/ceph-ansible/commit/267cce9e8360fc8cb9c192fde2406e5dca724610
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Fri, 4 Jun 2021 16:30:04 +0000 (09:30 -0700)]
script: fetch autobuild.asc key via HTTPS
Rather than relying on the key being available on the LRC /ceph file system.
(Someone appears to have deleted it recently.)
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Kefu Chai [Fri, 4 Jun 2021 12:19:30 +0000 (20:19 +0800)]
crimson/net: move from out_q into sent queue
to avoid the refcounting of underlying RefCountedObject.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Fri, 4 Jun 2021 12:13:54 +0000 (20:13 +0800)]
Merge pull request #41679 from AmnonHanuhov/wip-get_rid_of_pending_q
crimson/net: Use out_q instead of pending_q
Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Amnon Hanuhov [Thu, 3 Jun 2021 13:57:41 +0000 (16:57 +0300)]
crimson/net: Use out_q instead of pending_q
pending_q contains the same messages as in out_q and it is only used
for creating a bytestream out of these messages. We can just use out_q for that.
Signed-off-by: Amnon Hanuhov <ahanukov@redhat.com>
Kefu Chai [Fri, 4 Jun 2021 09:15:06 +0000 (17:15 +0800)]
Merge pull request #41631 from tchaikov/wip-keyring-decode
auth/KeyRing: always decode keying as plaintext
Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Kefu Chai [Fri, 4 Jun 2021 09:00:48 +0000 (17:00 +0800)]
Merge pull request #41587 from cfsnyder/bugfix_47738
mgr/DaemonServer.cc: prevent mgr crashes caused by integer underflow that is triggered by large increases to pg_num/pgp_num
Reviewed-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Fri, 4 Jun 2021 08:59:24 +0000 (16:59 +0800)]
Merge pull request #41592 from tchaikov/wip-ceph-default-confffile
ceph.in: use rados.Rados.DEFAULT_CONF_FILES
Reviewed-by: Neha Ojha <nojha@redhat.com>
Kefu Chai [Fri, 4 Jun 2021 08:58:59 +0000 (16:58 +0800)]
Merge pull request #41594 from tchaikov/wip/test/librados/list
test/librados/list: print reason why test fails
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Kefu Chai [Fri, 4 Jun 2021 08:57:41 +0000 (16:57 +0800)]
Merge pull request #36941 from hoamer/patch-1
doc/mgr/administrator: add a more precise description for creating key
Reviewed-by: Kefu Chai <kchai@redhat.com>
hoamer [Wed, 2 Sep 2020 07:13:12 +0000 (09:13 +0200)]
doc/mgr/administrator: add a more precise description for creating key
added a more precise description to handle filename when creating key for mgr
Signed-off-by: hoamer <kontakt@sebastian-neugebauer.de>
Kefu Chai [Fri, 4 Jun 2021 03:25:12 +0000 (11:25 +0800)]
debian/control: ceph-mgr-modules-core does not Recommend ceph-mgr-rook anymore
per https://www.debian.org/doc/debian-policy/ch-relationships.html
> Recommends
> This declares a strong, but not absolute, dependency.
>
> The Recommends field should list packages that would be found together
> with this one in all but unusual installations.
ceph-mgr-modules-core provides a set of ceph-mgr modules which are
always enabled. but the rook module enables ceph-mgr to install and
configure a Ceph cluster using Rook. this module is very useful but
it does not have such a strong connection with ceph-mgr-modules-core.
we can always install it separately for better integration with
Rook.
See-also: https://tracker.ceph.com/issues/45574
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Wed, 2 Jun 2021 09:54:18 +0000 (17:54 +0800)]
test/objectstore/unittest_alloc_aging: init cct
* initialize the cct used by the test, otherwise g_ceph_context is
not set at all.
* instead of using g_ceph_context, use the static member variable cct,
for less dependency on the global instance.
* set up and tear down the cct per test suite, because global_init()
initializes g_ceph_context, which cannot be set multiple times.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Wed, 2 Jun 2021 09:38:49 +0000 (17:38 +0800)]
test/objectstore: s/TearDownTestCase/TearDownTestSuite/
TearDownTestCase is deprecated by GTest. let's use the new API instead.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Fri, 4 Jun 2021 05:50:38 +0000 (13:50 +0800)]
Merge pull request #41652 from tchaikov/wip-qa-asock-or
qa/tasks/admin_socket: support "foo || bar" as command
Reviewed-by: Samuel Just <sjust@redhat.com>
Kefu Chai [Fri, 4 Jun 2021 04:30:23 +0000 (12:30 +0800)]
Merge pull request #41686 from t-msn/update-trace-doc
doc/dev: update how to use lttng/blkin trace
Reviewed-by: Kefu Chai <kchai@redhat.com>
Misono Tomohiro [Fri, 4 Jun 2021 02:36:49 +0000 (11:36 +0900)]
doc/dev: update how to use lttng/blkin trace
Update doc to reflect current status.
Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Sage Weil [Fri, 4 Jun 2021 02:04:55 +0000 (22:04 -0400)]
Merge PR #41553 into master
* refs/pull/41553/head:
ceph-volume: replace __ with _ in device_id
Reviewed-by: Kefu Chai <kchai@redhat.com>
Sage Weil [Fri, 4 Jun 2021 02:04:32 +0000 (22:04 -0400)]
Merge PR #41636 into master
* refs/pull/41636/head:
mgr/cephadm/inventory: do not try to resolve current mgr host
pybind/mgr/mgr_module: make get_mgr_ip() return mgr's IP from mgrmap
mgr/restful: use get_mgr_ip() instead of hostname
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Fri, 4 Jun 2021 01:44:58 +0000 (09:44 +0800)]
Merge pull request #41674 from tchaikov/wip-vstart-without-restful
vstart.sh: add an option named --without-restful
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Paul Cuzner [Wed, 2 Jun 2021 23:34:19 +0000 (11:34 +1200)]
mgr/cephadm:fix alerts sent to wrong URL
The path_prefix in prometheus.yml was specifying an
endpoint prefix, which was invalid. This resulted in 404
errors when trying to send alerts to alertmanager and
blocked alerts being sent on to the ceph-dashboard API
receiver. This fix removes this prefix.
Fixes: https://tracker.ceph.com/issues/51073
Signed-off-by: Paul Cuzner <pcuzner@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 23:50:44 +0000 (07:50 +0800)]
Merge pull request #41670 from tchaikov/wip-op-tracking-spin-off-0
crimson, common: improve const-correctness of Operation::dump()s.
Reviewed-by: Samuel Just <sjust@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 23:50:21 +0000 (07:50 +0800)]
Merge pull request #41672 from tchaikov/wip-crimson-test-handle-fut
test/crimson/seastore: always handle returned future<>
Reviewed-by: Samuel Just <sjust@redhat.com>
Patrick Donnelly [Thu, 3 Jun 2021 20:34:54 +0000 (13:34 -0700)]
Merge PR #41654 into master
* refs/pull/41654/head:
mds: do not infinitely recursively print a metric
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Patrick Donnelly [Thu, 3 Jun 2021 20:33:58 +0000 (13:33 -0700)]
Merge PR #41639 into master
* refs/pull/41639/head:
mds/scrub: write root inode backtrace at creation
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Thu, 3 Jun 2021 20:33:27 +0000 (13:33 -0700)]
Merge PR #41499 into master
* refs/pull/41499/head:
qa/tasks/mds_thrash: fix thrash iteration never skip
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Thu, 3 Jun 2021 20:23:17 +0000 (13:23 -0700)]
Merge PR #41443 into master
* refs/pull/41443/head:
test: update log-ignorelist for fs:mirror test
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Patrick Donnelly [Thu, 3 Jun 2021 20:22:23 +0000 (13:22 -0700)]
Merge PR #39910 into master
* refs/pull/39910/head:
test: Add test for mgr hang when osd is full
mgr: Set client_check_pool_perm to false
mds: Add full caps to avoid osd full check
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Dan Mick [Thu, 3 Jun 2021 18:32:24 +0000 (11:32 -0700)]
Merge pull request #41559 from dmick/wip-grafana-container
monitoring/grafana/build/Makefile: revamp for arm64 builds, pushes to docker and quay, jenkins
Sage Weil [Thu, 3 Jun 2021 14:29:00 +0000 (10:29 -0400)]
mgr/cephadm/inventory: do not try to resolve current mgr host
The CNI configuration may set up a private network for the container, which
is mapped to the hostname in /etc/hosts. For example, my test box sets
up 10.88.0.0/24 because I was using crio + kubeadm on this host earlier
(at least I think that's why):
$ sudo podman run --rm --name test123 --entrypoint /bin/bash -it quay.ceph.io/ceph-ci/ceph:master -c "cat /etc/hosts"
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.88.0.8 f9e91bf2478f test123
In any case, we should never trust a lookup of our own hostname from inside
a container!
This isn't quite sufficient, though: if this is a single-host cluster, then
we fall back to using get_mgr_ip(). That value may be distorted by the
public_network option on the mgr, but we don't have any other good
options here, and single-node clusters are unlikely to have complex
network configs.
Refactor a bit to avoid the try/except nesting.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 2 Jun 2021 02:31:11 +0000 (22:31 -0400)]
pybind/mgr/mgr_module: make get_mgr_ip() return mgr's IP from mgrmap
The previous approach was convoluted: we tried to do a DNS lookup on the
hostname, which would fail if /etc/hosts had an entry. Which, with podman,
it does. And the IP it has will vary in all sorts of weird ways. For
example, CNI on my host means that I get a dynamic address in 10.88.0.0/24.
Avoid all of that nonsense and use the IP that is in the mgrmap. There
may be multiple IPs (v2 + v1, or maybe even IPv4 + v6 in the future); in
that case, use the first one.
Signed-off-by: Sage Weil <sage@newdream.net>
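The "use the first one" rule from the commit above can be sketched as follows. The shape of the address list (a list of dicts with an `addr` field of the form `host:port`) and the helper name `first_mgr_ip` are assumptions made for illustration; the sketch does not reproduce the actual `get_mgr_ip()` code.

```python
from typing import Dict, List


def first_mgr_ip(addrvec: List[Dict[str, str]]) -> str:
    """Return the IP of the first address entry (hypothetical helper).

    Each entry is assumed to look like {"type": "v2", "addr": "10.1.2.3:6800"}.
    """
    if not addrvec:
        raise ValueError("no mgr addresses available")
    # Split off the port at the last colon so IPv6 literals stay intact.
    host, _, _port = addrvec[0]["addr"].rpartition(":")
    # IPv6 literals are assumed to be bracketed, e.g. "[2001:db8::1]:6800".
    return host.strip("[]")


# Illustrative input only: a v2 and a v1 entry for the same daemon.
example_addrs = [
    {"type": "v2", "addr": "10.1.2.3:6800"},
    {"type": "v1", "addr": "10.1.2.3:6801"},
]
example_ip = first_mgr_ip(example_addrs)
```

No DNS lookup is involved, so an /etc/hosts entry created by the container runtime cannot influence the result.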
Sage Weil [Wed, 2 Jun 2021 02:31:47 +0000 (22:31 -0400)]
mgr/restful: use get_mgr_ip() instead of hostname
Now we match dashboard!
Signed-off-by: Sage Weil <sage@newdream.net>
Neha Ojha [Thu, 3 Jun 2021 15:39:22 +0000 (08:39 -0700)]
Merge pull request #41308 from sseshasa/wip-osd-benchmark-for-mclock
osd: Run osd bench test to override default max osd capacity for mclock
Reviewed-by: Neha Ojha <nojha@redhat.com>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Casey Bodley [Thu, 3 Jun 2021 15:05:00 +0000 (11:05 -0400)]
Merge pull request #41316 from cbodley/wip-50785
rgw: parse tenant name out of rgwx-bucket-instance
Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
Reviewed-by: Shilpa Jagannath <smanjara@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 14:40:26 +0000 (22:40 +0800)]
Merge pull request #41677 from tchaikov/wip-oom
ceph.spec.in: increase the mem_per_job to 3GiB
Reviewed-by: David Galloway <dgallowa@redhat.com>
Casey Bodley [Thu, 3 Jun 2021 14:28:35 +0000 (10:28 -0400)]
Merge pull request #41668 from pleiadesian/patch-bucket-chown
rgw: require bucket name in bucket chown
Reviewed-by: Or Friedmann <ofriedma@redhat.com>
Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
Casey Bodley [Thu, 3 Jun 2021 14:16:30 +0000 (10:16 -0400)]
Merge pull request #41462 from yehudasa/wip-50920
rgw: auth v4 client: don't convert '+' to space
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 12:48:53 +0000 (20:48 +0800)]
cmake: increase the MAX_{LINK,COMPILE}_MEM
based on recent observations, quite a few C++ source files take
more than 3.0GiB to compile. for instance,
test_mock_HttpClient.cc could take up to 6270MiB of memory to compile.
so increase MAX_{LINK,COMPILE}_MEM accordingly.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 12:41:36 +0000 (20:41 +0800)]
ceph.spec.in: increase the mem_per_job to 3GiB
to lower the number of jobs. we are experiencing build failures on
a builder with 48c96t, 193 free mem. the failures were caused by
the OOM killer, which kills the c++ compiler:
[498376.128969] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/jenkins.service,task=cc1plus,pid=1387895,uid=1110
[498376.145288] Out of memory: Killed process 1387895 (cc1plus) total-vm:3323312kB, anon-rss:3164568kB, file-rss:0kB, shmem-rss:0kB, UID:1110
[498376.315185] oom_reaper: reaped process 1387895 (cc1plus), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[498377.882072] cc1plus invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
before this change, we used the total memory to calculate the number
of jobs, and assumed that each job takes at most 2.5GiB mem. in the
case above, the # of jobs is 96.
after this change, we use the free memory, and increase the mem per job
to 3.0GiB. in the case above, the # of jobs would be 85.
Signed-off-by: Kefu Chai <kchai@redhat.com>
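The job calculation described in the commit above amounts to capping the parallelism by both the CPU thread count and the memory budget. The sketch below is a minimal Python rendering of that idea; the function name `build_jobs` and the memory figures in the example are hypothetical, and the sketch does not reproduce the macro in ceph.spec.in.

```python
def build_jobs(cpu_threads: int, mem_gib: float, mem_per_job_gib: float) -> int:
    """Number of parallel jobs allowed by both CPU threads and memory.

    mem_gib is whichever memory figure the caller chooses to budget with
    (total memory before the change, free memory after it).
    """
    by_memory = int(mem_gib // mem_per_job_gib)
    # Never exceed the thread count, and always allow at least one job.
    return max(1, min(cpu_threads, by_memory))


# Hypothetical machine: 96 threads. The memory figures are made up.
jobs_cpu_bound = build_jobs(96, 512, 2.5)  # memory allows 204, CPU caps at 96
jobs_mem_bound = build_jobs(96, 256, 3.0)  # memory allows 85, below the CPU cap
```

Budgeting with free rather than total memory only lowers the result when other processes on the builder already hold a significant share of RAM.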
Kefu Chai [Thu, 3 Jun 2021 11:45:23 +0000 (19:45 +0800)]
Merge pull request #41669 from tchaikov/wip-crimson-asok-dump-metrics
crimson/admin: s/perf dump_seastar/dump_metrics/
Reviewed-by: Amnon Hanuhov <ahanukov@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 10:45:48 +0000 (18:45 +0800)]
vstart.sh: use here document to display multi-line message
for better readability
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 10:42:48 +0000 (18:42 +0800)]
vstart.sh: add an option named --without-restful
so we don't need to wait for restful module to be loaded if not working
on this mgr module.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 10:38:08 +0000 (18:38 +0800)]
vstart.sh: extract create_mgr_restful_secret() out
for better readability, and so it's easier to make this step optional if
the developer is not interested in using the restful mgr module.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Sridhar Seshasayee [Wed, 12 May 2021 14:50:20 +0000 (20:20 +0530)]
doc: Update mclock-config-ref to reflect automated OSD benchmarking
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 07:39:16 +0000 (15:39 +0800)]
Merge pull request #41671 from liu-chunmei/seastore-logger
crimson/seastore: cleanup ceph_subsystem_filestore to seastore
Reviewed-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 07:32:20 +0000 (15:32 +0800)]
test/crimson/seastore: declare return type explicitly
for better readability
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 07:28:45 +0000 (15:28 +0800)]
test/crimson/seastore: always handle returned future<>
this change also silences the [-Wunused-result] warning.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Radoslaw Zarzynski [Tue, 19 Jan 2021 16:05:47 +0000 (17:05 +0100)]
common: fix a formatting nit in OpTracker::dump_ops_in_flight().
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Radoslaw Zarzynski [Tue, 19 Jan 2021 16:05:12 +0000 (17:05 +0100)]
crimson: improve const-correctness of Operation::dump()s.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
chunmei-liu [Thu, 3 Jun 2021 06:41:42 +0000 (23:41 -0700)]
crimson/seastore: cleanup ceph_subsystem_filestore to seastore
Signed-off-by: chunmei-liu <chunmei.liu@intel.com>
Kefu Chai [Thu, 3 Jun 2021 06:33:52 +0000 (14:33 +0800)]
Merge pull request #41666 from tchaikov/wip-crimson-stop
crimson/osd: wait for SIGINT and SIGTERM before stopping
Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
Radoslaw Zarzynski [Mon, 17 May 2021 14:49:20 +0000 (14:49 +0000)]
qa: use dump_metrics as alternative of get_heap_property
"get_heap_property *" asock commands are exposed to operators
to check the tcmalloc internals for understanding the performance
of the memory subsystem. crimson, however, uses the builtin seastar
allocator, which is not backed by tcmalloc. instead, we can dump the
metrics using the "dump_metrics" asock command, which is only available
from crimson-osd.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Wed, 2 Jun 2021 14:06:22 +0000 (22:06 +0800)]
qa/tasks/admin_socket: support "foo || bar" as command
so we can cater to the needs of different implementations of osd, i.e.,
classic osd and crimson osd. they offer different sets of asock commands.
Signed-off-by: Kefu Chai <kchai@redhat.com>
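One way to read the "foo || bar" syntax from the commit above is as a list of alternatives that are tried in order until one succeeds. The Python sketch below illustrates that reading only; the function names, the fake runner, and the use of RuntimeError to signal a failed command are assumptions and do not reproduce the qa/tasks/admin_socket code.

```python
from typing import Callable, Optional


def run_with_alternatives(command: str, run: Callable[[str], str]) -> str:
    """Try each "||"-separated alternative in order; return the first success."""
    alternatives = [c.strip() for c in command.split("||") if c.strip()]
    if not alternatives:
        raise ValueError("empty command")
    last_error: Optional[Exception] = None
    for alternative in alternatives:
        try:
            return run(alternative)
        except RuntimeError as exc:  # assumed failure signal of the runner
            last_error = exc
    raise RuntimeError(f"all alternatives failed: {command!r}") from last_error


def fake_crimson_asok(command: str) -> str:
    """Hypothetical OSD stand-in that only understands "dump_metrics"."""
    if command.split()[0] != "dump_metrics":
        raise RuntimeError(f"unknown command: {command}")
    return f"ok: {command}"


# "some.property" is a placeholder argument, not a real tcmalloc property.
result = run_with_alternatives(
    "get_heap_property some.property || dump_metrics", fake_crimson_asok
)
```

The same test definition can then be shared by both OSD flavours, with each one answering the alternative it supports.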
Kefu Chai [Thu, 3 Jun 2021 05:58:40 +0000 (13:58 +0800)]
crimson/admin/osd_admin: sort forward declarations
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 05:48:27 +0000 (13:48 +0800)]
crimson/admin: fix the indent
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 05:45:05 +0000 (13:45 +0800)]
crimson/admin: s/perf dump_seastar/dump_metrics/
as a user-facing interface, no need to expose seastar in the name,
what matters to user is the content not the underlying technology or library.
so rename the command prefix to "dump_metrics"
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 05:39:28 +0000 (13:39 +0800)]
crimson/admin: s/SeastarMetricsHook/DumpMetricsHook/
seastar is the name of one of the libraries used to implement crimson,
but the asok hook dumps not only builtin metrics in seastar, but also
the ones registered by crimson and seastore, so rename it to a more
general name.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Zulai Wang [Thu, 3 Jun 2021 05:13:15 +0000 (13:13 +0800)]
rgw: require bucket name in bucket chown
Checking for and reporting the missing mandatory parameter avoids a clueless
error message for bucket chown.
Signed-off-by: Zulai Wang <zl31wang@gmail.com>
Kefu Chai [Thu, 3 Jun 2021 05:26:17 +0000 (13:26 +0800)]
crimson/osd: wait for SIGINT and SIGTERM before stopping
this change addresses a regression introduced by
37b83f4ed7ca69f105b93bf482cb2289cbaf9a4d, as we should not stop
services without being asked to do so.
in this change, signal handler for SIGINT and SIGTERM is registered to
handle these signals, and in the seastar thread, we wait until any of
these two signals is caught.
Signed-off-by: Kefu Chai <kchai@redhat.com>
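The pattern in the commit above (register handlers for SIGINT and SIGTERM, then block until one of them arrives) can be sketched in Python. This is an analogy only, not the seastar-based code in crimson-osd; the function names are hypothetical, and the sketch assumes a POSIX system and that it runs in the main thread, where Python allows signal handlers to be installed.

```python
import queue
import signal


def install_stop_handlers(signals=(signal.SIGINT, signal.SIGTERM)) -> "queue.SimpleQueue":
    """Register handlers that record which stop signal was caught."""
    caught: "queue.SimpleQueue" = queue.SimpleQueue()

    def _on_signal(signum, _frame):
        # SimpleQueue.put() is reentrant in CPython, so it can be
        # called from a signal handler.
        caught.put(signum)

    for sig in signals:
        signal.signal(sig, _on_signal)
    return caught


def run_until_signalled(start, stop) -> int:
    """Start the service, wait for SIGINT or SIGTERM, then stop it."""
    caught = install_stop_handlers()
    start()
    signum = caught.get()  # blocks until one of the signals is caught
    stop()
    return signum
```

A caller would pass its own start and stop callables to `run_until_signalled`, so that the service is torn down only after an operator or init system asks for it.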
Kefu Chai [Thu, 3 Jun 2021 01:36:15 +0000 (09:36 +0800)]
Merge pull request #41627 from tchaikov/wip-mgr-repl-doc
doc/mgr/modules: add a "debugging" section
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 01:34:56 +0000 (09:34 +0800)]
Merge pull request #41138 from kalebskeithley/python39
do_cmake: build with python3.9 on RHEL9
Reviewed-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 01:29:19 +0000 (09:29 +0800)]
do_cmake: build with python3.9 on RHEL9
rhel9 has python3.9 as of rhel9beta
Signed-off-by: Kaleb S KEITHLEY <kkeithle@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Thu, 3 Jun 2021 01:16:42 +0000 (09:16 +0800)]
Merge pull request #41496 from Huber-ming/correct_spell
rgw: correct the spelling of "instace"
Reviewed-by: Kefu Chai <kchai@redhat.com>
Ronen Friedman [Tue, 30 Mar 2021 13:39:19 +0000 (16:39 +0300)]
osd/scrub: modify "classic" OSD scrub state-machine to support Crimson
As some scrub-related functions are asynchronous in Crimson,
scrub states that call those functions cannot simply perform a
'post' or state-transition sequentially. The called operations
must arrange for a state-machine event to be sent upon completion.
Specifically, the following are now handled (on the FSM side) as async:
- building scrub maps
- comparing the scrub maps (and the rest of "what we
do after a chunk is handled")
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
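The idea in the commit above, that an asynchronous operation must itself arrange for a state-machine event on completion, can be sketched with Python's asyncio. The state names, event names, and functions below are hypothetical and chosen for illustration; the sketch does not mirror the scrubber's actual FSM.

```python
import asyncio
from typing import List


async def build_scrub_map(events: asyncio.Queue) -> None:
    await asyncio.sleep(0)  # stand-in for the real asynchronous work
    await events.put("MapBuilt")  # completion is reported as an FSM event


async def compare_scrub_maps(events: asyncio.Queue) -> None:
    await asyncio.sleep(0)
    await events.put("MapsCompared")


# (current state, event) -> (next state, operation to launch on entry)
TRANSITIONS = {
    ("BuildMap", "MapBuilt"): ("CompareMaps", compare_scrub_maps),
    ("CompareMaps", "MapsCompared"): ("ChunkDone", None),
}


async def scrub_one_chunk() -> List[str]:
    """Drive the toy FSM for one chunk and return the states visited."""
    events: asyncio.Queue = asyncio.Queue()
    state = "BuildMap"
    visited = [state]
    pending = asyncio.create_task(build_scrub_map(events))
    while state != "ChunkDone":
        # The state does not advance on its own; it waits for the event
        # posted by the operation it launched.
        event = await events.get()
        await pending
        state, next_op = TRANSITIONS[(state, event)]
        visited.append(state)
        if next_op is not None:
            pending = asyncio.create_task(next_op(events))
    return visited
```

Running `asyncio.run(scrub_one_chunk())` walks through the three toy states in order, each transition being triggered by a completion event rather than by a sequential call.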
Patrick Donnelly [Wed, 2 Jun 2021 15:18:22 +0000 (08:18 -0700)]
Merge PR #41635 into master
* refs/pull/41635/head:
qa: increase fragmentation to improve uniform distribution
Reviewed-by: Ramana Raja <rraja@redhat.com>
Kefu Chai [Wed, 2 Jun 2021 14:43:40 +0000 (22:43 +0800)]
Merge pull request #41644 from rzarzynski/wip-crimson-fix-blocked-peering
crimson/monc: fix subscription stall that blocked peering.
Reviewed-by: Kefu Chai <kchai@redhat.com>
Patrick Donnelly [Wed, 2 Jun 2021 14:28:49 +0000 (07:28 -0700)]
mds: do not infinitely recursively print a metric
Fixes: b1b44d775df3160d937c068d5e1079e24199ed6b
Fixes: https://tracker.ceph.com/issues/51067
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Wed, 2 Jun 2021 14:27:03 +0000 (10:27 -0400)]
Merge PR #41651 into master
* refs/pull/41651/head:
doc/cephadm: s/the the/the
Reviewed-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Wed, 2 Jun 2021 14:10:12 +0000 (22:10 +0800)]
Merge pull request #41645 from tchaikov/wip-crimson-osd-mkfs
crimson/osd: check existing superblock when mkfs
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Zac Dover [Wed, 2 Jun 2021 14:06:06 +0000 (00:06 +1000)]
doc/cephadm: s/the the/the
This removes an extraneous "the" and reworks a
sentence so that it adheres to the grammatical
rules of the English language.
Signed-off-by: Zac Dover <zac.dover@gmail.com>
Kefu Chai [Wed, 2 Jun 2021 12:57:14 +0000 (20:57 +0800)]
crimson/osd: check existing superblock when mkfs
in case mkfs is run on an existing store.
this change mirrors the behavior of classic osd, and also addresses the
assert failure when BlueStore tries to create a collection when it
already contains a collection with the same collection id.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Wed, 2 Jun 2021 12:47:03 +0000 (20:47 +0800)]
crimson/osd: extract OSD::_write_superblock() out
prepare for the change to verify existing meta collection and superblock
stored in it.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Radoslaw Zarzynski [Wed, 2 Jun 2021 11:59:37 +0000 (11:59 +0000)]
crimson/monc: fix subscription stall that blocked peering.
There is a scenario when the `active_con` is properly
chosen but isn't marked as `ready_to_send`.
If `renew_subs()` is called during the `on_session_opened()`,
the flag will be turned on after the subscriptions are
renewed which cannot happen as it requires the flag to be
already set. In other words: there is a circular data dependency.
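The circular dependency can be reproduced with a toy model. The Python sketch below borrows the names `ready_to_send`, `renew_subs` and `on_session_opened` from the description above, but the class and both handler variants are hypothetical; in particular, the second variant only shows one way to break the cycle and is not claimed to be what this patch does.

```python
from typing import List


class ToyConnection:
    """Hypothetical stand-in for the active monitor connection."""

    def __init__(self) -> None:
        self.ready_to_send = False
        self.sent: List[str] = []

    def renew_subs(self, subs: List[str]) -> bool:
        # Renewal is refused until the connection is marked ready.
        if not self.ready_to_send:
            return False
        self.sent.extend(subs)
        return True


def on_session_opened_stalling(con: ToyConnection, subs: List[str]) -> None:
    # The flag is raised only after a successful renewal, but the
    # renewal itself needs the flag: nothing is ever sent.
    if con.renew_subs(subs):
        con.ready_to_send = True


def on_session_opened_flag_first(con: ToyConnection, subs: List[str]) -> None:
    # Illustrative ordering only: raise the flag, then renew.
    con.ready_to_send = True
    con.renew_subs(subs)


stalled = ToyConnection()
on_session_opened_stalling(stalled, ["osdmap"])

working = ToyConnection()
on_session_opened_flag_first(working, ["osdmap"])
```

In the stalling variant the `osdmap` subscription is never sent, which corresponds to the symptom described next.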
The net result is stalling the subscription machinery,
particularly the `OSDMap` subs. This caused a nasty peering
issue at Sepia [1] where PG 2.7 got stuck in the `GetInfo`
state.
```
rzarzynski@teuthology:/home/teuthworker/archive/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136908$ less ./remote/smithi039/log/ceph-osd.1.log.gz
...
DEBUG 2021-05-26 20:19:48,134 [shard 0] osd - pg_epoch 14 pg[2.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0) [] r=-1 lpr=0 crt=0'0 mlcod 0'0 unknown enter Initial
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=0 crt=0'0 mlcod 0'0 unknown enter Reset
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Started
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Start
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 unknown enter Started/Primary
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating enter Started/Primary/Peering
...
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetInfo
DEBUG 2021-05-26 20:19:48,138 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering build_prior all_probe
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering build_prior final: probe 0,1 down blocked_by {}
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering up_thru 0 < same_since 14, must notify monitor
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: no prior_set down osds, clearing prior_readable_until_ub
DEBUG 2021-05-26 20:19:48,139 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: querying info from osd.0
...
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering got osd.0 2.7( DNE empty local-lis/les=0/0 n=0 ec=0/0 lis/c=0/0 les/c/f=0/0/0 sis=0)
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Adding osd: 0 peer features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common peer features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,237 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common acting features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering state<Started/Primary/Peering/GetInfo>: Common upacting features: 3f01cfbb7ffdffff
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering exit Started/Primary/Peering/GetInfo 0.099480 4 2021-05-26T20:19:48.146172+0000
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetLog
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/GetMissing
...
DEBUG 2021-05-26 20:19:48,238 [shard 0] osd - pg_epoch 14 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+peering enter Started/Primary/Peering/WaitUpThru
...
DEBUG 2021-05-26 20:19:49,139 [shard 0] osd - pg_epoch 15 pg[2.7( empty local-lis/les=0/0 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating enter Started/Primary/Active
...
DEBUG 2021-05-26 20:19:49,142 [shard 0] osd - pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=0/0 les/c/f=0/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 creating+activating enter Started/Primary/Active/Activating
...
DEBUG 2021-05-26 20:19:49,204 [shard 0] osd - pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Started/Primary/Active/Recovered
...
DEBUG 2021-05-26 20:19:49,204 [shard 0] osd - pg_epoch 15 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/0 les/c/f=15/0/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Started/Primary/Active/Clean
...
DEBUG 2021-05-26 20:22:31,223 [shard 0] osd - pg_epoch 86 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=14) [1,0] r=0 lpr=14 crt=0'0 mlcod 0'0 active enter Reset
...
<a lot of flipping>
...
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown activate_map
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown exit Reset 0.035744 1 2021-05-26T20:24:07.817331+0000
INFO 2021-05-26 20:24:07,851 [shard 0] osd - Exiting state: Reset, entered at 1622060647.8158188, 1622060647.8173316 spent on 1 events
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started
INFO 2021-05-26 20:24:07,851 [shard 0] osd - Entering state: Started
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Start
INFO 2021-05-26 20:24:07,851 [shard 0] osd - Entering state: Start
INFO 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown state<Start>: transitioning to Primary
DEBUG 2021-05-26 20:24:07,851 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown exit Start 0.000041 0 0.000000
INFO 2021-05-26 20:24:07,851 [shard 0] osd - Exiting state: Start, entered at 1622060647.8516333, 0.0 spent on 0 events
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started/Primary
INFO 2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 unknown enter Started/Primary/Peering
INFO 2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary/Peering
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering enter Started/Primary/Peering/GetInfo
INFO 2021-05-26 20:24:07,852 [shard 0] osd - Entering state: Started/Primary/Peering/GetInfo
...
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior all_probe 0,1,4
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior maybe_rw interval:139, acting: 0
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering build_prior final: probe 0,1,4 down blocked_by {}
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering up_thru 125 < same_since 163, must notify monitor
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>: no prior_set down osds, clearing prior_readable_until_ub
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>: querying info from osd.0
DEBUG 2021-05-26 20:24:07,852 [shard 0] osd - pg_epoch 163 pg[2.7( empty local-lis/les=14/15 n=0 ec=14/14 lis/c=14/14 les/c/f=15/15/0 sis=163) [1,0] r=0 lpr=163 pi=[14,163)/1 crt=0'0 mlcod 0'0 peering state<Started/Primary/Peering/GetInfo>: querying info from osd.4
...
DEBUG 2021-05-26 20:24:07,924 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] connect to existing
DEBUG 2021-05-26 20:24:07,924 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] --> #62 === pg_query2(2.7 2.7 query(info 0'0 epoch_sent 163) e163/163) v1 (131)
...
DEBUG 2021-05-26 20:24:07,942 [shard 0] ms - [osd.1(cluster) v2:172.21.15.39:6803/34727@61064 >> osd.4 v2:172.21.15.62:6802/34686] GOT AckFrame: seq=62
...
<plenty of osd_ping messaging but no reply to the pg_query for 2.7>
...
DEBUG 2021-05-26 20:58:19,829 [shard 0] ms - [osd.1(hb_front) v2:172.21.15.39:6807/34727 >> osd.4 v2:172.21.15.62:6807/34686@54816] <== #772 === osd_ping(ping e17 up_from 10 ping_stamp 2021-05-26T20:58:19.825573+0000/2319.780029297s send_stamp 2319.780029297s) v5 (70)
DEBUG 2021-05-26 20:58:19,829 [shard 0] ms - [osd.1(hb_front) v2:172.21.15.39:6807/34727 >> osd.4 v2:172.21.15.62:6807/34686@54816] --> #772 === osd_ping(ping_reply e249 up_from 10 ping_stamp 2021-05-26T20:58:19.825573+0000/2319.780029297s send_stamp 2320.039062500s) v5 (70
```
The peering request got stuck while waiting for the `OSDMap`.
```
DEBUG 2021-05-26 20:24:07,930 [shard 0] ms - [osd.4(cluster) v2:172.21.15.62:6802/34686 >> osd.1 v2:172.21.15.39:6803/34727@61064] <== #62 === pg_query2(2.7 2.7 query(info 0'0 epoch_sent 163) e163/163) v1 (131)
DEBUG 2021-05-26 20:24:07,930 [shard 0] osd - handle_peering_op on 2.7 from 1
DEBUG 2021-05-26 20:24:07,930 [shard 0] osd - peering_event(id=517, detail=PeeringEvent(from=1 pgid=2.7 sent=163 requested=163 evt=epoch_sent: 163 epoch_requested: 163 MQuery 2.7 from 1 query_epoch 163 query: query(info 0'0 epoch_sent 163))): star
```
```
INFO 2021-05-26 20:19:49,127 [shard 0] osd - evt epoch is 15, i have 14, will wait
INFO 2021-05-26 20:19:49,128 [shard 0] osd - osdmap_subscribe(14)
DEBUG 2021-05-26 20:19:49,128 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #9 === mon_subscribe({osdmap=14}) v3 (15)
...
INFO 2021-05-26 20:19:49,131 [shard 0] osd - handle_osd_map osd_map(14..15 src has 1..15) v4
INFO 2021-05-26 20:19:49,131 [shard 0] osd - handle_osd_map epochs [14..15], i have 15, src has [1..15]
...
INFO 2021-05-26 20:19:49,138 [shard 0] osd - handle_osd_map osd_map(14..15 src has 1..15) v4
INFO 2021-05-26 20:19:49,138 [shard 0] osd - handle_osd_map epochs [14..15], i have 15, src has [1..15]
...
INFO 2021-05-26 20:19:49,139 [shard 0] osd - evt epoch is 15, i have 14, will wait
INFO 2021-05-26 20:19:49,141 [shard 0] osd - osdmap_subscribe(14)
WARN 2021-05-26 20:19:49,141 [shard 0] monc - renew_subs - empty
...
INFO 2021-05-26 20:19:50,140 [shard 0] osd - handle_osd_map osd_map(15..16 src has 1..16) v4
INFO 2021-05-26 20:19:50,140 [shard 0] osd - handle_osd_map epochs [15..16], i have 15, src has [1..16]
DEBUG 2021-05-26 20:19:50,141 [shard 0] bluestore - do_transaction
INFO 2021-05-26 20:19:50,145 [shard 0] osd - osd.4: committed_osd_maps(16, 16)
...
INFO 2021-05-26 20:20:42,881 [shard 0] osd - handle_osd_map epochs [16..17], i have 16, src has [1..17]
DEBUG 2021-05-26 20:20:42,882 [shard 0] bluestore - do_transaction
INFO 2021-05-26 20:20:42,886 [shard 0] osd - osd.4: committed_osd_maps(17, 17)
...
INFO 2021-05-26 20:20:43,941 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO 2021-05-26 20:20:43,941 [shard 0] osd - osdmap_subscribe(17)
...
INFO 2021-05-26 20:20:43,957 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO 2021-05-26 20:20:43,957 [shard 0] osd - osdmap_subscribe(17)
...
INFO 2021-05-26 20:20:43,969 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO 2021-05-26 20:20:43,969 [shard 0] osd - osdmap_subscribe(17)
...
DEBUG 2021-05-26 20:20:46,930 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@57288 >> mon.2 v2:172.21.15.39:3301/0] <== #4 === osd_map(20..21 src has 1..21) v4 (41)
INFO 2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map osd_map(20..21 src has 1..21) v4
INFO 2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map epochs [20..21], i have 17, src has [1..21]
INFO 2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map message skips epochs 18..19
INFO 2021-05-26 20:20:46,930 [shard 0] osd - osdmap_subscribe(18)
...
DEBUG 2021-05-26 20:20:47,936 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@57288 >> mon.2 v2:172.21.15.39:3301/0] <== #5 === osd_map(21..22 src has 1..22) v4 (41)
INFO 2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map osd_map(21..22 src has 1..22) v4
INFO 2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map epochs [21..22], i have 17, src has [1..22]
INFO 2021-05-26 20:20:47,936 [shard 0] osd - handle_osd_map message skips epochs 18..20
INFO 2021-05-26 20:20:47,936 [shard 0] osd - osdmap_subscribe(18)
...
<osdmap_subscribe(18) over and over>
```
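The skipped ranges in the `handle_osd_map` lines above follow from two numbers in each message: the epoch the OSD already has and the first epoch the message carries. A minimal sketch of that arithmetic, assuming the gap is simply everything between the two; `skipped_epochs` and the local `epoch_t` alias are illustrative names, not the crimson code:

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// Local alias for this sketch only.
using epoch_t = std::uint32_t;

// Returns the inclusive range of epochs missing between the newest epoch
// held locally (`have`) and the first epoch in an incoming map message
// (`first`), or std::nullopt when the message leaves no gap.
std::optional<std::pair<epoch_t, epoch_t>>
skipped_epochs(epoch_t have, epoch_t first)
{
  if (first <= have + 1) {
    return std::nullopt;  // contiguous or overlapping: nothing is skipped
  }
  return std::pair<epoch_t, epoch_t>(have + 1, first - 1);
}
```

With the values from the log, `skipped_epochs(17, 20)` gives 18..19 and `skipped_epochs(17, 21)` gives 18..20, while `skipped_epochs(15, 15)` reports no gap.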
```
2021-05-26T20:19:42.048+0000 7f4712ffd700 1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] <== osd.4 v2:172.21.15.62:6801/34686 4 ==== mon_subscribe({mgrmap=0+,osd_pg_creates=0+,osdmap=0+}) v3 ==== 82+0+0 (secure 0 0 0) 0x7f46fc04e150 con 0x7f470401c480
2021-05-26T20:19:42.048+0000 7f4712ffd700 20 mon.b@1(peon) e1 _ms_dispatch existing session 0x7f46fc02f500 for osd.4
2021-05-26T20:19:42.048+0000 7f4712ffd700 20 mon.b@1(peon) e1 entity_name osd.4 global_id 4168 (new_ok) caps allow *
2021-05-26T20:19:42.048+0000 7f4712ffd700 10 mon.b@1(peon) e1 handle_subscribe mon_subscribe({mgrmap=0+,osd_pg_creates=0+,osdmap=0+}) v3
...
2021-05-26T20:19:49.129+0000 7f4712ffd700 1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] <== osd.4 v2:172.21.15.62:6801/34686 9 ==== mon_subscribe({osdmap=14}) v3 ==== 36+0+0 (secure 0 0 0) 0x7f46e8556210 con 0x7f470401c480
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon) e1 _ms_dispatch existing session 0x7f46fc02f500 for osd.4
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon) e1 entity_name osd.4 global_id 4168 (new_ok) caps allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon) e1 handle_subscribe mon_subscribe({osdmap=14}) v3
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 is_capable service=mon command= read addr v2:172.21.15.62:6801/34686 on cap allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 allow so far , doing grant allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 allow all
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 is_capable service=osd command= read addr v2:172.21.15.62:6801/34686 on cap allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 allow so far , doing grant allow *
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 allow all
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon).osd e15 check_osdmap_sub 0x7f46e84f0150 next 14 (onetime)
2021-05-26T20:19:49.129+0000 7f4712ffd700 5 mon.b@1(peon).osd e15 send_incremental [14..15] to osd.4
2021-05-26T20:19:49.129+0000 7f4712ffd700 10 mon.b@1(peon).osd e15 build_incremental [14..15] with features 3f01cfbb7ffdffff
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon).osd e15 build_incremental inc 15 622 bytes
2021-05-26T20:19:49.129+0000 7f4712ffd700 20 mon.b@1(peon).osd e15 build_incremental inc 14 578 bytes
2021-05-26T20:19:49.129+0000 7f4712ffd700 1 -- [v2:172.21.15.62:3300/0,v1:172.21.15.62:6789/0] --> v2:172.21.15.62:6801/34686 -- osd_map(14..15 src has 1..15) v4 -- 0x7f46e856a100 con 0x7f470401c480
```
```
seastar::future<> Client::renew_subs()
{
  if (!sub.have_new()) {
    logger().warn("{} - empty", __func__);
    return seastar::now();
  }
  logger().trace("{}", __func__);
  auto m = crimson::make_message<MMonSubscribe>();
  m->what = sub.get_subs();
  m->hostname = ceph_get_short_hostname();
  return send_message(std::move(m)).then([this] {
    sub.renewed();
  });
}
```
```
INFO 2021-05-26 20:19:42,081 [shard 0] osd - osdmap_subscribe(1)
DEBUG 2021-05-26 20:19:42,081 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #6 === mon_subscribe({osdmap=1}) v3 (15)
...
INFO 2021-05-26 20:19:49,128 [shard 0] osd - osdmap_subscribe(14)
DEBUG 2021-05-26 20:19:49,128 [shard 0] ms - [osd.4(client) v2:172.21.15.62:6801/34686@63208 >> mon.1 v2:172.21.15.62:3300/0] --> #9 === mon_subscribe({osdmap=14}) v3 (15)
...
INFO 2021-05-26 20:19:49,141 [shard 0] osd - osdmap_subscribe(14)
WARN 2021-05-26 20:19:49,141 [shard 0] monc - renew_subs - empty
<no MMonSubscribe>
...
INFO 2021-05-26 20:20:43,941 [shard 0] osd - evt epoch is 18, i have 17, will wait
INFO 2021-05-26 20:20:43,941 [shard 0] osd - osdmap_subscribe(17)
<no MMonSubscribe>
...
INFO 2021-05-26 20:20:46,930 [shard 0] osd - handle_osd_map message skips epochs 18..19
INFO 2021-05-26 20:20:46,930 [shard 0] osd - osdmap_subscribe(18)
<no MMonSubscribe>
```
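The pattern in the excerpt above, an `osdmap_subscribe(N)` call with no outgoing `mon_subscribe` message after it, can be searched for mechanically in a longer log. The helper below is a hypothetical diagnostic sketch, not part of Ceph. It assumes each log record sits on one unwrapped line and that an outgoing subscription shows up as a line containing both `-->` and `mon_subscribe(`:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical log-scanning helper: returns the epochs of
// osdmap_subscribe(N) calls that were not followed by an outgoing
// mon_subscribe message before the next osdmap_subscribe call
// (or before the end of the input).
std::vector<unsigned long>
unsent_subscriptions(const std::vector<std::string>& lines)
{
  const std::string call = "osdmap_subscribe(";
  std::vector<unsigned long> unsent;
  bool pending = false;            // a call is still waiting for its message
  unsigned long pending_epoch = 0;
  for (const auto& line : lines) {
    if (line.find("-->") != std::string::npos &&
        line.find("mon_subscribe(") != std::string::npos) {
      pending = false;             // the request reached the wire
      continue;
    }
    const auto pos = line.find(call);
    if (pos == std::string::npos) {
      continue;                    // unrelated line
    }
    const auto digits = pos + call.size();
    if (digits >= line.size() ||
        !std::isdigit(static_cast<unsigned char>(line[digits]))) {
      continue;                    // no epoch to parse
    }
    if (pending) {
      unsent.push_back(pending_epoch);
    }
    pending_epoch = std::stoul(line.substr(digits));
    pending = true;
  }
  if (pending) {
    unsent.push_back(pending_epoch);
  }
  return unsent;
}
```

Fed the lines of the excerpt above, it reports epochs 14, 17 and 18, which are the three calls annotated there as producing no subscribe message.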
[1]: http://pulpito.front.sepia.ceph.com/rzarzynski-2021-05-26_12:20:26-rados-master-distro-basic-smithi/6136908
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Ernesto Puerta [Wed, 2 Jun 2021 12:12:56 +0000 (14:12 +0200)]
Merge pull request #41630 from rhcs-dashboard/fix-bucket-calculations
mgr/dashboard: fix bucket objects and size calculations
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Kefu Chai [Wed, 2 Jun 2021 10:43:47 +0000 (18:43 +0800)]
Merge pull request #41638 from tchaikov/wip-doc-crimson-doc
doc/dev/crimson: update link to scylladb debugging tips
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Milind Changire [Wed, 2 Jun 2021 09:42:09 +0000 (15:12 +0530)]
mds/scrub: write root inode backtrace at creation
Write root inode backtrace as soon as it is created;
Unwritten backtrace always caused scrub to fail for root inode.
Fixes: https://tracker.ceph.com/issues/50976
Signed-off-by: Milind Changire <mchangir@redhat.com>
Kefu Chai [Wed, 2 Jun 2021 09:10:25 +0000 (17:10 +0800)]
doc/dev/crimson: update link to scylladb debugging tips
the old one is not reachable anymore.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Kefu Chai [Wed, 2 Jun 2021 09:00:53 +0000 (17:00 +0800)]
Merge pull request #41637 from tchaikov/wip-crimson-never-discard-future
crimson: always handle returned future
Reviewed-by: Xuehan Xu <xuxuehan@360.cn>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Kefu Chai [Tue, 1 Jun 2021 11:58:47 +0000 (19:58 +0800)]
doc/mgr/modules: add a "debugging" section
Signed-off-by: Kefu Chai <kchai@redhat.com>