git.apps.os.sepia.ceph.com Git - ceph.git/log
3 years ago cmake: disable kvs rados cls by default 42571/head
Kefu Chai [Sat, 31 Jul 2021 03:36:37 +0000 (11:36 +0800)]
cmake: disable kvs rados cls by default

libcls_kvs was introduced back in
73d016fdb304ad19bba8aed3f2877b4bdb6ed32e, but we don't have an internal
user so far. to reduce the build time, let's disable the build of it by
default.

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years ago Merge pull request #42601 from idryomov/wip-rbd-qemu-iotests-8stream
Ilya Dryomov [Tue, 3 Aug 2021 11:58:57 +0000 (13:58 +0200)]
Merge pull request #42601 from idryomov/wip-rbd-qemu-iotests-8stream

qa/workunits/rbd: use xenial version of qemu-iotests for centos stream 8

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
3 years ago Merge pull request #42446 from zdover23/wip-doc-cephadm-troubleshooting-logs-2021...
zdover23 [Tue, 3 Aug 2021 11:19:22 +0000 (21:19 +1000)]
Merge pull request #42446 from zdover23/wip-doc-cephadm-troubleshooting-logs-2021-07-20

doc/cephadm: linking to log material

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
3 years ago Merge pull request #42328 from aaryanporwal/visual-tests
Ernesto Puerta [Tue, 3 Aug 2021 10:55:45 +0000 (12:55 +0200)]
Merge pull request #42328 from aaryanporwal/visual-tests

mgr/dashboard: Visual regression tests for ceph dashboard

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: aaryanporwal <NOT@FOUND>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
3 years ago Merge pull request #42514 from liu-chunmei/seastore-onode-interruptible-future
Yingxin [Tue, 3 Aug 2021 08:55:04 +0000 (16:55 +0800)]
Merge pull request #42514 from liu-chunmei/seastore-onode-interruptible-future

crimson/onode-staged-tree: integrate interruptible future

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
3 years ago qa/workunits/rbd: use xenial version of qemu-iotests for centos stream 8 42601/head
Ilya Dryomov [Tue, 3 Aug 2021 07:44:18 +0000 (09:44 +0200)]
qa/workunits/rbd: use xenial version of qemu-iotests for centos stream 8

It is already used for centos 8(.3) and rhel 8(.4).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
3 years ago crimson/onode-staged-tree: misc fixes to integrate interruptive-future 42514/head
Yingxin Cheng [Tue, 3 Aug 2021 03:52:41 +0000 (11:52 +0800)]
crimson/onode-staged-tree: misc fixes to integrate interruptive-future

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years ago doc/cephadm: linking to log material 42446/head
Zac Dover [Tue, 20 Jul 2021 00:08:47 +0000 (10:08 +1000)]
doc/cephadm: linking to log material

This PR rewrites a section in the Troubleshooting
chapter of the Cephadm Guide. The material that this
section discusses has been covered already in the
Cephadm Guide in the Cephadm Operations chapter.
There's no reason to repeat this information twice,
unless adding technical debt to the documentation
is our goal (which of course it is not, and the
opposite of adding technical debt to the documentation
has been the aim that has guided my work these past
six months).

Signed-off-by: Zac Dover <zac.dover@gmail.com>
3 years ago Merge pull request #42471 from Dorthu/fix-pybind-rbd-mirror-image-get-status
Ilya Dryomov [Mon, 2 Aug 2021 20:04:04 +0000 (22:04 +0200)]
Merge pull request #42471 from Dorthu/fix-pybind-rbd-mirror-image-get-status

pybind/rbd: fix mirror_image_get_status

Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
3 years ago Merge pull request #42570 from tchaikov/wip-libcryptsetup
Ilya Dryomov [Mon, 2 Aug 2021 19:35:16 +0000 (21:35 +0200)]
Merge pull request #42570 from tchaikov/wip-libcryptsetup

librbd/crypto/luks: require libcryptsetup v2.0.5

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
3 years ago Merge pull request #42551 from adk3798/maint-message
Sebastian Wagner [Mon, 2 Aug 2021 16:00:48 +0000 (18:00 +0200)]
Merge pull request #42551 from adk3798/maint-message

mgr/cephadm: make return message for entering maintenance mode more explicit

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
3 years ago Merge pull request #42382 from xrmeng8756/master
Harish Munjulur [Mon, 2 Aug 2021 15:28:02 +0000 (08:28 -0700)]
Merge pull request #42382 from xrmeng8756/master

rgw: avoid occuring radosgw daemon crash when access a conditionally …

3 years ago Merge pull request #42387 from wzbxqt327/master
Harish Munjulur [Mon, 2 Aug 2021 15:27:53 +0000 (08:27 -0700)]
Merge pull request #42387 from wzbxqt327/master

rgw:add lock to copy object

3 years ago Merge pull request #42404 from ivancich/wip-broken-list-plain-entries
Harish Munjulur [Mon, 2 Aug 2021 15:27:37 +0000 (08:27 -0700)]
Merge pull request #42404 from ivancich/wip-broken-list-plain-entries

rgw: bucket index list can produce I/O errors

3 years ago Merge pull request #42582 from tchaikov/wip-doc-health-report
Kefu Chai [Mon, 2 Aug 2021 13:40:48 +0000 (21:40 +0800)]
Merge pull request #42582 from tchaikov/wip-doc-health-report

doc/dev: add health-reports.rst

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
3 years ago Merge pull request #42577 from tchaikov/wip-rtd
Kefu Chai [Mon, 2 Aug 2021 12:15:04 +0000 (20:15 +0800)]
Merge pull request #42577 from tchaikov/wip-rtd

.readthedocs.yml: use python3.8 and native ditaa

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
3 years ago doc/dev: add health-reports.rst 42582/head
Kefu Chai [Mon, 2 Aug 2021 10:59:30 +0000 (18:59 +0800)]
doc/dev: add health-reports.rst

to explain the data flow of health metrics.

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years ago Merge pull request #42263 from rhcs-dashboard/51612-cephadm-e2e-improvements
Ernesto Puerta [Mon, 2 Aug 2021 11:09:32 +0000 (13:09 +0200)]
Merge pull request #42263 from rhcs-dashboard/51612-cephadm-e2e-improvements

mgr/dashboard: cephadm-e2e job script: improvements

3 years ago Merge pull request #42562 from tchaikov/wip-bluestore-cleanups
Kefu Chai [Mon, 2 Aug 2021 11:04:23 +0000 (19:04 +0800)]
Merge pull request #42562 from tchaikov/wip-bluestore-cleanups

os/bluestore: use scope_guard do to cleanups

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
3 years ago Merge pull request #41880 from david-caro/fix_cluster_grafana_dashboard
Ernesto Puerta [Mon, 2 Aug 2021 11:03:46 +0000 (13:03 +0200)]
Merge pull request #41880 from david-caro/fix_cluster_grafana_dashboard

monitoring/grafana/cluster: use per-unit max and limit values

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: p-se <NOT@FOUND>
3 years ago admin, doc: introduce sphinxcontrib.seqdiag
Kefu Chai [Mon, 2 Aug 2021 11:00:35 +0000 (19:00 +0800)]
admin, doc: introduce sphinxcontrib.seqdiag

for rendering sequence diagrams. unlike ditaa, seqdiag allows us to
create sequence diagrams without worrying about the layout, and the
syntax is quite like that of dot.

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years ago Merge pull request #42572 from tchaikov/wip-cmake-mgr-cleanup
Kefu Chai [Mon, 2 Aug 2021 03:09:47 +0000 (11:09 +0800)]
Merge pull request #42572 from tchaikov/wip-cmake-mgr-cleanup

cmake: initialize dpdk_LIBRARIES with empty list

Reviewed-by: Xiubo Li <xiubli@redhat.com>
3 years ago doc/conf.py: run ditaa with java 42577/head
Kefu Chai [Sun, 1 Aug 2021 17:41:47 +0000 (01:41 +0800)]
doc/conf.py: run ditaa with java

just in case, otherwise we could have

  File "/home/docs/checkouts/readthedocs.org/user_builds/ceph/envs/42577/lib/python3.8/site-packages/sphinxcontrib/ditaa.py", line 200, in html_visit_ditaa
    render_ditaa_html(self, node, node['code'], node['options'])
  File "/home/docs/checkouts/readthedocs.org/user_builds/ceph/envs/42577/lib/python3.8/site-packages/sphinxcontrib/ditaa.py", line 177, in render_ditaa_html
    fname, outfn = render_ditaa(self, code, options, prefix)
  File "/home/docs/checkouts/readthedocs.org/user_builds/ceph/envs/42577/lib/python3.8/site-packages/sphinxcontrib/ditaa.py", line 141, in render_ditaa
    p = Popen(ditaa_args, stdout=PIPE, stdin=PIPE, stderr=PIPE)
  File "/home/docs/.pyenv/versions/3.8.6/lib/python3.8/subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/home/docs/.pyenv/versions/3.8.6/lib/python3.8/subprocess.py", line 1702, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: '/usr/bin/ditaa'
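
A minimal sketch of the idea, not the actual doc/conf.py change: errno 8 (ENOEXEC) means the kernel could not execute the file directly, so the command line is built to start with `java -jar` instead. The helper name `ditaa_command` is hypothetical, and it is an assumption that the file found on PATH is something `java -jar` can load.

```python
import shutil
from typing import List, Optional


def ditaa_command(ditaa_path: Optional[str]) -> Optional[List[str]]:
    """Return an argv that runs ditaa through the JVM, or None if ditaa is absent.

    Assumption: the file at ditaa_path is a jar that `java -jar` can load.
    Running the path directly depends on the kernel knowing how to execute it,
    which is what fails with "Exec format error" in the traceback.
    """
    if ditaa_path is None:
        return None
    return ["java", "-jar", ditaa_path]


if __name__ == "__main__":
    # shutil.which() returns None when no `ditaa` executable is on PATH.
    print(ditaa_command(shutil.which("ditaa")))
```

The resulting list can be passed to `subprocess.Popen` in place of the bare `/usr/bin/ditaa` path.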

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years ago .readthedocs.yml: use ditaa instead of plantweb
Kefu Chai [Sun, 1 Aug 2021 16:37:19 +0000 (00:37 +0800)]
.readthedocs.yml: use ditaa instead of plantweb

use ditaa to render ditaa images instead of relying on the plantweb
service. this is more stable.

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years ago .readthedocs.yml: use python3.8
Kefu Chai [Sun, 1 Aug 2021 16:34:08 +0000 (00:34 +0800)]
.readthedocs.yml: use python3.8

to prepare for the python3.8 migration

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years ago Merge pull request #42129 from tchaikov/wip-cmake-build-type
Kefu Chai [Sun, 1 Aug 2021 04:24:07 +0000 (12:24 +0800)]
Merge pull request #42129 from tchaikov/wip-cmake-build-type

cmake: set CMAKE_BUILD_TYPE only if .git exists

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago Merge pull request #42573 from tchaikov/wip-crimson-header
Kefu Chai [Sun, 1 Aug 2021 01:58:56 +0000 (09:58 +0800)]
Merge pull request #42573 from tchaikov/wip-crimson-header

cmake: let crimson-admin depend on legacy-option-headers

Reviewed-by: Samuel Just <sjust@redhat.com>
3 years ago Merge pull request #42445 from LiumxNL/ups-eliminate-rollfwd
Kefu Chai [Sun, 1 Aug 2021 01:58:15 +0000 (09:58 +0800)]
Merge pull request #42445 from LiumxNL/ups-eliminate-rollfwd

osd/PGLog: set acceptable rollback_info_trimmed_to for pg of replicated pool

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
3 years ago Merge PR #40511 into master
Patrick Donnelly [Sat, 31 Jul 2021 20:34:56 +0000 (13:34 -0700)]
Merge PR #40511 into master

* refs/pull/40511/head:
qa: update mds_pre_upgrade to no longer stop standbys
qa: update mds_pre_upgrade to disable standby-replay
qa: add tests for compat manipulation and upgrade
doc: remove deprecated compat commands
doc: update MDS upgrade procedure
mon,mds: use per-MDS compat to inform replacement
mon: do not update inline incompat except via mds
mds: add MDSMap method for creating null MDSMap
mds: only update beacon epoch if newer
mds: harden standby_mds lookup
mon/FSCommands: accept generic ostream rather than stringstream
include: add less verbose CompatSet dump
include: add dump operator for Feature
include: add const qualifier to appropriate CompatSet methods

Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Ramana Raja <rraja@redhat.com>
3 years ago cmake: let crimson-admin depend on legacy-option-headers 42573/head
Kefu Chai [Sat, 31 Jul 2021 09:01:38 +0000 (17:01 +0800)]
cmake: let crimson-admin depend on legacy-option-headers

legacy-option-headers provides global_legacy_options.h, which is in turn
required for building crimson-admin.

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years ago Merge pull request #42400 from adk3798/offline-daemon-removal
Kefu Chai [Sat, 31 Jul 2021 08:35:37 +0000 (16:35 +0800)]
Merge pull request #42400 from adk3798/offline-daemon-removal

mgr/cephadm: don't remove daemons from offline hosts

Reviewed-by: Daniel Pivonka <dpivonka@redhat.com>
3 years ago cmake: prefer static library when finding DPDK 42572/head
Kefu Chai [Sat, 31 Jul 2021 08:08:44 +0000 (16:08 +0800)]
cmake: prefer static library when finding DPDK

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years ago cmake: initialize dpdk_LIBRARIES with empty list
Kefu Chai [Sat, 31 Jul 2021 07:49:20 +0000 (15:49 +0800)]
cmake: initialize dpdk_LIBRARIES with empty list

set(dpdk_LIBRARIES) does not reset this variable, it leaves it
unchanged.

if pkg-config manages to find DPDK libraries, dpdk_LIBRARIES would be
set with a string like "rte_node;rte_graph;..." by
pkg_check_modules(dpdk QUIET libdpdk).

but we would want to set this variable to the import paths of the
required libraries. so reset it before appending them to this variable.

this change helps to address the build failure when building Ceph with
DPDK installed into the system along with its .pc file.

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years ago mgr/ServiceMap: do not include unused headers
Kefu Chai [Sat, 31 Jul 2021 04:13:40 +0000 (12:13 +0800)]
mgr/ServiceMap: do not include unused headers

<experimental/iterator> was included for std::experimental::make_ostream_joiner
in a968f65d784b3d6c6a172929aa293f09e6917fa6. but the code using
std::experimental::make_ostream_joiner was later rewritten in
ab0d8f2ae9f551e15a4c7bacbf69161e91263785, in which
std::experimental::make_ostream_joiner is not used anymore.

so let's drop it.

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years ago Merge pull request #42553 from cfsnyder/wip_51961
Kefu Chai [Sat, 31 Jul 2021 05:10:06 +0000 (13:10 +0800)]
Merge pull request #42553 from cfsnyder/wip_51961

mgr/cephadm: fix exceptions causing stuck progress indicators

Reviewed-by: Adam King <adking@redhat.com>
3 years ago Merge pull request #42506 from p-se/mgr-prom-counter-type
Kefu Chai [Sat, 31 Jul 2021 05:08:01 +0000 (13:08 +0800)]
Merge pull request #42506 from p-se/mgr-prom-counter-type

mgr/prometheus: Fix metric types from gauge to counter

Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
3 years ago librbd/crypto/luks: require libcryptsetup v2.0.5 42570/head
Kefu Chai [Sat, 31 Jul 2021 03:19:26 +0000 (11:19 +0800)]
librbd/crypto/luks: require libcryptsetup v2.0.5

- ubuntu focal ships libcryptsetup-dev (2:2.2.2)
- centos 8 app stream comes with cryptsetup-devel-2.3.3
- openSUSE Leap 15.3 packages libcryptsetup-devel-2.3.4
- openSUSE Leap 15.2 packages libcryptsetup-devel-2.0.5

so we can drop the support for libcryptsetup < 2.0.5

see also ea3c1bfb9ef2edcdf572df0cb143c463b7551905

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years ago qa: update mds_pre_upgrade to no longer stop standbys 40511/head
Patrick Donnelly [Tue, 30 Mar 2021 21:55:54 +0000 (14:55 -0700)]
qa: update mds_pre_upgrade to no longer stop standbys

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago qa: update mds_pre_upgrade to disable standby-replay
Patrick Donnelly [Tue, 30 Mar 2021 21:53:04 +0000 (14:53 -0700)]
qa: update mds_pre_upgrade to disable standby-replay

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago qa: add tests for compat manipulation and upgrade
Patrick Donnelly [Tue, 30 Mar 2021 21:06:28 +0000 (14:06 -0700)]
qa: add tests for compat manipulation and upgrade

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago doc: remove deprecated compat commands
Patrick Donnelly [Wed, 31 Mar 2021 13:45:08 +0000 (06:45 -0700)]
doc: remove deprecated compat commands

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago doc: update MDS upgrade procedure
Patrick Donnelly [Tue, 30 Mar 2021 21:46:45 +0000 (14:46 -0700)]
doc: update MDS upgrade procedure

This is possible now that CompatSet changes to the FSMap no longer cause
old MDS to suicide.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago mon,mds: use per-MDS compat to inform replacement
Patrick Donnelly [Tue, 30 Mar 2021 21:26:08 +0000 (14:26 -0700)]
mon,mds: use per-MDS compat to inform replacement

This diff makes the following changes:

- FSMap::compat is now just a "default compat" of currently unknown
  utility. It is used when constructing a new file system but does
  not really have any effect or current use.

- The `mds compat *` CLI commands are deprecated. They manipulate
  the default compat which has no useful effect.

- Each MDS sends its compat to the mons in its beacon. This is from
  MDSMap::get_compat_set_all() at MDS boot. This CompatSet does not
  change for the duration of the MDS lifetime.

- Mons record each MDS compat in the FSMap to inform standby failover.
  An MDS is only promoted if it is compatible with the file system
  compat.

- Mons upgrade (merge) the file system compat when (a) the number of
  *in* MDS is 1 (effected by max_mds=1) and (b) the mons are promoting a
  standby with a new compat. A file system is never upgraded when there
  is more than 1 rank to prevent two MDS with incompatible compat.

- A suite of `fs compat` commands exist to manipulate the file system
  compat. These exist mostly for testing.

The consequence of these changes is that the upgrade procedure for MDS
can be updated to no longer require turning off all MDS but rank 0
before performing any upgrades. Previously, a CompatSet change would
cause every MDS receiving the new MDSMap to suicide if it was
incompatible. Now the monitors no longer assign an incompatible MDS to
a file system, and they enforce an upgrade procedure if
incompatibilities exist.
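
The promotion and upgrade rules listed above can be sketched roughly as follows. This is a simplified model, not the monitor code: a compat is reduced to a set of incompat feature ids, "compatible" is assumed to mean that the MDS knows every feature the file system requires, and the names `Compat`, `is_compatible` and `promote_standby` are hypothetical.

```python
from typing import FrozenSet, Tuple

# Hypothetical simplification: a compat is just the set of incompat feature
# ids that its owner understands (MDS) or requires (file system).
Compat = FrozenSet[int]


def is_compatible(mds_compat: Compat, fs_compat: Compat) -> bool:
    """Assumed rule: the MDS must know every feature the file system requires."""
    return fs_compat <= mds_compat


def promote_standby(mds_compat: Compat, fs_compat: Compat,
                    num_in: int) -> Tuple[bool, Compat]:
    """Return (promoted, resulting file system compat).

    num_in is the number of *in* MDS ranks of the file system.
    """
    if not is_compatible(mds_compat, fs_compat):
        # An incompatible standby is never assigned to the file system.
        return False, fs_compat
    if num_in == 1 and mds_compat != fs_compat:
        # Only with a single rank is the file system compat merged (upgraded).
        return True, fs_compat | mds_compat
    # With more than one rank the file system compat is left untouched.
    return True, fs_compat
```

In this model an old standby is simply skipped, and a file system with more than one rank keeps its compat until max_mds has been reduced to 1.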

Fixes: https://tracker.ceph.com/issues/49720
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago mon: do not update inline incompat except via mds
Patrick Donnelly [Tue, 30 Mar 2021 21:07:46 +0000 (14:07 -0700)]
mon: do not update inline incompat except via mds

The MDS_FEATURE_INCOMPAT_INLINE feature indicates that an MDS knows how
to read/write inline data and that the file system may have it. The
separate setting for inline_data protects this file system feature.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago mds: add MDSMap method for creating null MDSMap
Patrick Donnelly [Wed, 17 Mar 2021 16:55:04 +0000 (09:55 -0700)]
mds: add MDSMap method for creating null MDSMap

It's not necessary to distribute a CompatSet with the null mdsmap. We
only need to communicate that the MDS is not part of any map.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago mds: only update beacon epoch if newer
Patrick Donnelly [Tue, 6 Apr 2021 15:20:54 +0000 (08:20 -0700)]
mds: only update beacon epoch if newer

This is a defensive programming change. We don't want the beacon epoch
to ever go backwards.
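
A minimal sketch of such a guard, assuming a plain integer epoch. The class and method names are hypothetical and do not mirror the MDS beacon code.

```python
class BeaconState:
    """Tracks the newest map epoch seen; the stored epoch never decreases."""

    def __init__(self) -> None:
        self.epoch = 0

    def notify_epoch(self, new_epoch: int) -> None:
        # Ignore stale or repeated epochs instead of overwriting blindly.
        if new_epoch > self.epoch:
            self.epoch = new_epoch
```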

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago mds: harden standby_mds lookup
Patrick Donnelly [Tue, 30 Mar 2021 21:13:42 +0000 (14:13 -0700)]
mds: harden standby_mds lookup

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago mon/FSCommands: accept generic ostream rather than stringstream
Patrick Donnelly [Tue, 30 Mar 2021 19:40:58 +0000 (12:40 -0700)]
mon/FSCommands: accept generic ostream rather than stringstream

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago include: add less verbose CompatSet dump
Patrick Donnelly [Mon, 5 Apr 2021 14:55:20 +0000 (07:55 -0700)]
include: add less verbose CompatSet dump

For printing in `fs dump`.

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago include: add dump operator for Feature
Patrick Donnelly [Tue, 30 Mar 2021 21:10:28 +0000 (14:10 -0700)]
include: add dump operator for Feature

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago include: add const qualifier to appropriate CompatSet methods
Patrick Donnelly [Tue, 30 Mar 2021 19:39:36 +0000 (12:39 -0700)]
include: add const qualifier to appropriate CompatSet methods

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago Merge PR #42513 into master
Patrick Donnelly [Fri, 30 Jul 2021 21:03:36 +0000 (14:03 -0700)]
Merge PR #42513 into master

* refs/pull/42513/head:
qa: multifs already enabled as default

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago Merge PR #42499 into master
Patrick Donnelly [Fri, 30 Jul 2021 21:02:33 +0000 (14:02 -0700)]
Merge PR #42499 into master

* refs/pull/42499/head:
client:make sure only to update dir dist from auth mds

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
3 years ago Merge PR #42201 into master
Patrick Donnelly [Fri, 30 Jul 2021 21:00:19 +0000 (14:00 -0700)]
Merge PR #42201 into master

* refs/pull/42201/head:
qa: fold frag confs into conf/mds.yaml

Reviewed-by: Kefu Chai <kchai@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
3 years ago Merge pull request #42133 from sseshasa/wip-persist-osd-iops-cap-mclock
Neha Ojha [Fri, 30 Jul 2021 20:14:03 +0000 (13:14 -0700)]
Merge pull request #42133 from sseshasa/wip-persist-osd-iops-cap-mclock

osd: Add mechanism to avoid running OSD bench on every OSD init when mclock_scheduler is enabled

Reviewed-by: Neha Ojha <nojha@redhat.com>
3 years ago Merge pull request #42550 from dang/wip-dang-zipper-writer
Daniel Gryniewicz [Fri, 30 Jul 2021 19:12:21 +0000 (15:12 -0400)]
Merge pull request #42550 from dang/wip-dang-zipper-writer

Zipper Writer API

Reviewed-by: cbodley@redhat.com
3 years ago Merge pull request #31454 from soumyakoduri/dbstore
Daniel Gryniewicz [Fri, 30 Jul 2021 18:00:30 +0000 (14:00 -0400)]
Merge pull request #31454 from soumyakoduri/dbstore

rgw/Zipper: DB Backend store

Reviewed-by: dang@redhat.com
Reviewed-by: amaredia@redhat.com
3 years ago RGW - Zipper - Proper Writer API 42550/head
Daniel Gryniewicz [Wed, 21 Jul 2021 14:56:59 +0000 (10:56 -0400)]
RGW - Zipper - Proper Writer API

With the implementation of DBStore, it was determined that the API used
for writing in Zipper was too tied to RADOS.  Implement a clean writing
API named Writer.

Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
3 years ago Merge pull request #42266 from dang/wip-dang-zipper-raw_obj
Daniel Gryniewicz [Fri, 30 Jul 2021 14:06:56 +0000 (10:06 -0400)]
Merge pull request #42266 from dang/wip-dang-zipper-raw_obj

Wip dang zipper raw obj

Reviewed-by: Soumya Koduri <skoduri@redhat.com>
3 years ago qa/standalone/misc: ver-health.sh: Increase wait_for_health_string() timeout 42133/head
Sridhar Seshasayee [Mon, 5 Jul 2021 06:20:04 +0000 (11:50 +0530)]
qa/standalone/misc: ver-health.sh: Increase wait_for_health_string() timeout

Modified test cases:

1. ver-health.sh:
  a. TEST_check_version_health_1():
    To avoid intermittent timeouts observed in wait_for_health_string(),
    increase the wait time to 20 secs.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
3 years ago qa/standalone/scrub: Force a subset of scrub tests to use "wpq" scheduler
Sridhar Seshasayee [Mon, 21 Jun 2021 12:47:32 +0000 (18:17 +0530)]
qa/standalone/scrub: Force a subset of scrub tests to use "wpq" scheduler

The following tests in the test files mentioned below use the
"osd_scrub_sleep" option to introduce delays during scrubbing to help
determine scrubbing states, validate reservations during scrubbing etc.
This works when using the "wpq" scheduler.

But when the "mclock_scheduler" is enabled, the "osd_scrub_sleep" is
disabled and overridden to 0. This is done to delegate the scheduling of
the background scrubs to the "mclock_scheduler" based on the set QoS
parameters. Due to this, the checks to verify the scrub states,
reservations etc. fail since the window to check them is very short
due to scrubs completing very quickly. This affects a small subset of
scrub tests mentioned below:

1. osd-scrub-dump.sh -> TEST_recover_unexpected()
2. osd-scrub-repair.sh -> TEST_auto_repair_bluestore_tag()
3. osd-scrub-test.sh -> TEST_scrub_abort(), TEST_deep_scrub_abort()

Only for the above tests, until there's a reliable way to query scrub
states with "--osd-scrub-sleep" set to 0, the "osd_op_queue" config
option is set to "wpq".

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
3 years ago qa/standalone/erasure-code: Modify erasure-code tests for mclock scheduler
Sridhar Seshasayee [Thu, 17 Jun 2021 11:41:58 +0000 (17:11 +0530)]
qa/standalone/erasure-code: Modify erasure-code tests for mclock scheduler

Modified test cases:

1. test-erasure-eio.sh:
  a. Test_ec_backfill_unfound():
    - Set osd_mclock_profile to high_recovery_ops profile.
    - Increase the wait for backfill_unfound timeout to 240 secs.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
3 years ago qa/standalone/osd-backfill: Modify backfill tests for mclock scheduler
Sridhar Seshasayee [Thu, 17 Jun 2021 12:19:29 +0000 (17:49 +0530)]
qa/standalone/osd-backfill: Modify backfill tests for mclock scheduler

Modified test cases:

1. osd-backfill-prio.sh:
  Set osd_op_queue = wpq for all tests since the mclock doesn't
  consider recovery priority as part of its scheduling algorithm.

2. osd-backfill-space.sh:
  Set osd_mclock_profile to high_recovery_ops and increase the wait
  for backfills timeout to 1200 secs for the following tests:
  - TEST_backfill_test_simple()
  - TEST_backfill_test_multi()
  - TEST_backfill_test_sametarget()
  - TEST_backfill_multi_partial()
  - TEST_ec_backfill_simple()
  - TEST_ec_backfill_multi()
  - SKIP_TEST_ec_backfill_multi_partial()

3. osd-backfill-stats:
  - TEST_backfill_ec_down_all_out():
   Set osd_mclock_profile to high_recovery_ops and increase the wait
   for recovery timeout to 240 secs.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
3 years ago qa/standalone/osd: Modify osd tests for mclock scheduler
Sridhar Seshasayee [Wed, 16 Jun 2021 11:26:54 +0000 (16:56 +0530)]
qa/standalone/osd: Modify osd tests for mclock scheduler

Modified test cases:
1. osd-recovery-prio.sh:
   Set osd_op_queue = wpq for all tests since mclock
   doesn't consider recovery priority as part of its
   scheduling algorithm.

2. osd-recovery-stats.sh:
   a. TEST_recovery_undersized():
     - Set osd_mclock_profile to high_recovery_ops profile.
     - Increase wait for recovery timeout to 300 secs.

3. osd-rep-recov-eio.sh:
   a. TEST_rep_backfill_unfound():
     - Set osd_mclock_profile to high_recovery_ops profile.
     - Increase wait for backfill_unfound to 360 secs.

4. repeer-on-acting-back.sh:
   a. TEST_repeer_on_down_act():
     - Set osd_mclock_profile to high_recovery_ops profile.
       (To improve the test duration)

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
3 years ago qa/standalone: Modify ceph-helpers.sh tests for mclock scheduler.
Sridhar Seshasayee [Mon, 14 Jun 2021 08:06:23 +0000 (13:36 +0530)]
qa/standalone: Modify ceph-helpers.sh tests for mclock scheduler.

List of changes:

1. Remove the enforcement to use osd_op_queue=wpq when an osd is brought
   up in the following functions:
   - run_osd()
   - run_osd_filestore() and
   - activate_osd()

2. New functions:
   - get_op_scheduler() - Get the current osd_op_queue for an osd.

3. Modified test cases:
   - test_run_osd() - Add check for osd_max_backfill count.
     The mclock scheduler overrides the count to 1000.

4. New test cases:
   - test_activate_osd_after_mark_down()
   - test_get_op_scheduler()

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
3 years ago osd: Add a new config option to forcibly run OSD benchmark on init
Sridhar Seshasayee [Thu, 24 Jun 2021 13:15:33 +0000 (18:45 +0530)]
osd: Add a new config option to forcibly run OSD benchmark on init

The new config option "osd_mclock_force_run_benchmark_on_init" is
introduced to allow a user to force run the OSD benchmark test on every
OSD boot-up even if the historical data about the OSD's iops capacity is
available on the MON config store. The 'force_run_benchmark' flag is set
to the value indicated by the new config option.

By default this new config option is set to false.

The utility of this option is to help refresh the OSD iops capacity
when the underlying device's performance characteristics have changed
significantly. In such cases, the OSD can be restarted with this option
enabled temporarily. Once the new iops capacity is updated to the MON
store, this option can be removed from the OSD's start-up config.

Fixes: https://tracker.ceph.com/issues/51464
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
3 years ago osd: Add mechanism to avoid running OSD benchmark on every OSD boot-up
Sridhar Seshasayee [Thu, 24 Jun 2021 07:53:23 +0000 (13:23 +0530)]
osd: Add mechanism to avoid running OSD benchmark on every OSD boot-up

Use "mon_cmd_set_config()" to store the OSD's max iops capacity to
the MON store during the first bring-up. Don't run the OSD benchmark
test on subsequent boot-ups if a previously persisted iops capacity is
available on the MON store and is different from the default iops
capacity.

Add the 'force_run_benchmark' flag to force a run of the benchmark
in case the default iops capacity cannot be determined.
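
The boot-up decision described above can be sketched as a small predicate. This is an illustrative model under stated assumptions, not the OSD code: `None` stands for "no value persisted on the MON store" or "default could not be determined", and the function and parameter names are hypothetical.

```python
from typing import Optional


def should_run_benchmark(persisted_iops: Optional[float],
                         default_iops: Optional[float],
                         force_run_benchmark: bool = False) -> bool:
    """Decide whether the OSD benchmark should run during boot-up."""
    if force_run_benchmark or default_iops is None:
        # Forced explicitly, or the default iops capacity is unknown.
        return True
    if persisted_iops is not None and persisted_iops != default_iops:
        # A previously measured capacity is available; reuse it.
        return False
    # First bring-up, or only the default value is known.
    return True
```

The comparison against the default matters because a stored value equal to the default cannot be told apart from "never measured" in this model.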

Fixes: https://tracker.ceph.com/issues/51464
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
3 years ago common/config: Add methods to return the default value of a config option
Sridhar Seshasayee [Wed, 30 Jun 2021 09:22:50 +0000 (14:52 +0530)]
common/config: Add methods to return the default value of a config option

Add wrapper method "get_val_default()" to the ConfigProxy class that takes
the config option key to look up. This method in turn calls a method of
the same name added to the md_config_t class, which does the actual work of
searching for the config option. If the option is valid, _get_val_default()
is used to get the default value. Otherwise, the wrapper method returns
std::nullopt.
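
The layering can be sketched as below. This is a Python model of the described C++ structure, with `None` standing in for std::nullopt. The class names `MdConfig` and `ConfigProxyModel`, the string-valued schema, and the option used in the usage lines are assumptions made for illustration.

```python
from typing import Dict, Optional


class MdConfig:
    """Stands in for md_config_t: owns the option schema and its defaults."""

    def __init__(self, schema: Dict[str, str]) -> None:
        self._schema = dict(schema)

    def _get_val_default(self, key: str) -> str:
        # Only called for keys already known to be valid.
        return self._schema[key]

    def get_val_default(self, key: str) -> Optional[str]:
        # Search for the option; None models std::nullopt for unknown keys.
        if key not in self._schema:
            return None
        return self._get_val_default(key)


class ConfigProxyModel:
    """Stands in for ConfigProxy: a thin wrapper that forwards the lookup."""

    def __init__(self, config: MdConfig) -> None:
        self._config = config

    def get_val_default(self, key: str) -> Optional[str]:
        return self._config.get_val_default(key)


if __name__ == "__main__":
    proxy = ConfigProxyModel(MdConfig({"example_option": "42"}))
    print(proxy.get_val_default("example_option"))
    print(proxy.get_val_default("no_such_option"))
```

Returning an optional lets the caller distinguish an unknown option from an option whose default happens to be an empty or zero value.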

Fixes: https://tracker.ceph.com/issues/51464
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
3 years ago osd: Add method to store config option key/value on the MON store
Sridhar Seshasayee [Thu, 24 Jun 2021 07:44:28 +0000 (13:14 +0530)]
osd: Add method to store config option key/value on the MON store

Add method mon_cmd_set_config() to save config option key and
value to the MON store. The ConfigMonitor command, 'config set' is
used to achieve this.

A corresponding get method is unnecessary since any config option
found on the MON store is loaded during OSD boot-up and set using
the md_config_t::set_mon_vals() method. Therefore, the existing
versions of ConfigProxy::get_val() method are sufficient to get
the latest value for the config option.

Fixes: https://tracker.ceph.com/issues/51464
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
3 years ago os/bluestore: use scope_guard do to cleanups 42562/head
Kefu Chai [Fri, 30 Jul 2021 10:28:01 +0000 (18:28 +0800)]
os/bluestore: use scope_guard do to cleanups

the combination of goto and labels is difficult to maintain.

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years ago Merge pull request #42308 from jtlayton/wip-51644
Kefu Chai [Fri, 30 Jul 2021 11:03:19 +0000 (19:03 +0800)]
Merge pull request #42308 from jtlayton/wip-51644

osd: don't assert on zero-length OP_ZERO request

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
3 years ago Merge pull request #42523 from mgfritch/cephadm-fsid-validate
Kefu Chai [Fri, 30 Jul 2021 11:01:32 +0000 (19:01 +0800)]
Merge pull request #42523 from mgfritch/cephadm-fsid-validate

cephadm: validate `fsid` command arg

Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
3 years ago Merge pull request #42528 from liewegas/fix-51816
Kefu Chai [Fri, 30 Jul 2021 11:00:30 +0000 (19:00 +0800)]
Merge pull request #42528 from liewegas/fix-51816

mon/LogMonitor: fix crash when cluster log file is not writeable

Reviewed-by: Neha Ojha <nojha@redhat.com>
3 years ago Merge pull request #42538 from dsavineau/issue_51902
Kefu Chai [Fri, 30 Jul 2021 10:59:05 +0000 (18:59 +0800)]
Merge pull request #42538 from dsavineau/issue_51902

cephadm: don't use ctx.fsid for clean_cgroup

Reviewed-by: Adam King <adking@redhat.com>
3 years ago os/bluestore: always check retval of _open_db_and_around()
Kefu Chai [Fri, 30 Jul 2021 09:53:51 +0000 (17:53 +0800)]
os/bluestore: always check retval of _open_db_and_around()

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years ago os/bluestore: always initialize variable
Kefu Chai [Fri, 30 Jul 2021 09:31:46 +0000 (17:31 +0800)]
os/bluestore: always initialize variable

actually, target_size is always initialized, because `id` must be either
`BlueFS::BDEV_NEWWAL` or `BlueFS::BDEV_NEWDB`. this is ensured by

ceph_assert(id == BlueFS::BDEV_NEWWAL || id == BlueFS::BDEV_NEWDB)

at the beginning of `BlueStore::migrate_to_new_bluefs_device()`.

but apparently, GCC is not able to figure this out:

../src/os/bluestore/BlueStore.cc: In member function ‘int BlueStore::migrate_to_new_bluefs_device(const std::set<int>&, int, const string&)’:
../src/os/bluestore/BlueStore.cc:6876:35: warning: ‘target_size’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 6876 |   r = _setup_block_symlink_or_file(
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
 6877 |     target_name,
      |     ~~~~~~~~~~~~
 6878 |     dev_path,
      |     ~~~~~~~~~
 6879 |     target_size,
      |     ~~~~~~~~~~~~
 6880 |     true);
      |     ~~~~~

in this change, target_size is always initialized to a known value to
silence the warning.

Signed-off-by: Kefu Chai <kchai@redhat.com>
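
The pattern described above can be sketched in isolation. This is not the BlueStore code: `DeviceId`, its enumerator values and `pick_target_size` are hypothetical stand-ins. The point is only that a variable assigned in an if / else-if chain, with no final else, can look uninitialized to the compiler even when an assertion rules out every other case, and that giving it a known initial value removes the warning without changing behavior.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for BlueFS::BDEV_NEWWAL and BlueFS::BDEV_NEWDB.
enum DeviceId { DEV_NEWWAL = 3, DEV_NEWDB = 4 };

std::uint64_t pick_target_size(int id, std::uint64_t wal_size,
                               std::uint64_t db_size) {
  // The assertion guarantees one of the two branches below is taken,
  // but the compiler's flow analysis may not carry that fact forward.
  assert(id == DEV_NEWWAL || id == DEV_NEWDB);

  // Initialized to a known value so that -Wmaybe-uninitialized has
  // nothing to complain about.
  std::uint64_t target_size = 0;
  if (id == DEV_NEWWAL) {
    target_size = wal_size;
  } else if (id == DEV_NEWDB) {
    target_size = db_size;
  }
  return target_size;
}
```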
3 years agoMerge pull request #42558 from tchaikov/wip-crimson-cleanup
Kefu Chai [Fri, 30 Jul 2021 08:49:02 +0000 (16:49 +0800)]
Merge pull request #42558 from tchaikov/wip-crimson-cleanup

crimson/os: cleanups for building with Clang

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years agocrimson/os: do not capture unused variable 42558/head
Kefu Chai [Fri, 30 Jul 2021 06:57:47 +0000 (14:57 +0800)]
crimson/os: do not capture unused variable

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years agocrimson/os: reference this explicitly
Kefu Chai [Fri, 30 Jul 2021 06:56:50 +0000 (14:56 +0800)]
crimson/os: reference this explicitly

to silence the false alarm from Clang that `this` is not used.

Signed-off-by: Kefu Chai <kchai@redhat.com>
3 years agocrimson/os: do not capture labels
Kefu Chai [Fri, 30 Jul 2021 05:40:09 +0000 (13:40 +0800)]
crimson/os: do not capture labels

structured bindings do not define variables, so we cannot capture them
without defining variables in the capture list.

in this change, instead of using a map<> for defining labels, just
create labels on the fly.

Signed-off-by: Kefu Chai <kchai@redhat.com>
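
For readers unfamiliar with the restriction, here is a small sketch of the alternative the message mentions: defining variables in the capture list. It is not the crimson code and not the fix this commit chose (which drops the map and creates labels on the fly); `describe` and the sample map are hypothetical. Before C++20, naming a structured binding in a lambda capture is ill-formed, and an init-capture works around that by introducing a real variable.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: renders each entry as "name=value".
std::vector<std::string> describe(const std::map<std::string, int>& counters) {
  std::vector<std::string> out;
  for (const auto& [name, value] : counters) {
    // `[&name, &value]` would capture the bindings themselves, which is
    // not allowed before C++20. The init-captures below define new
    // reference variables that the lambda can legally use.
    auto render = [&name = name, &value = value] {
      return name + "=" + std::to_string(value);
    };
    out.push_back(render());
  }
  return out;
}
```

This needs C++17 or later for the structured binding itself.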
3 years agoMerge pull request #42556 from tchaikov/wip-fair-mutex
Kefu Chai [Fri, 30 Jul 2021 06:30:09 +0000 (14:30 +0800)]
Merge pull request #42556 from tchaikov/wip-fair-mutex

common: add ceph::fair_mutex

Reviewed-by: Xiubo Li <xiubli@redhat.com>
3 years agoMerge pull request #42539 from cyx1231st/wip-seastore-cache-metrics-2
Kefu Chai [Fri, 30 Jul 2021 05:22:13 +0000 (13:22 +0800)]
Merge pull request #42539 from cyx1231st/wip-seastore-cache-metrics-2

crimson/os/seastore/cache: refine metrics

Reviewed-by: Samuel Just <sjust@redhat.com>
3 years agocommon: add ceph::fair_mutex 42556/head
Kefu Chai [Fri, 30 Jul 2021 04:44:52 +0000 (12:44 +0800)]
common: add ceph::fair_mutex

a mutex which enqueues and wakes up the waiters in FIFO order, to
ensure the fairness of the mutex.

Signed-off-by: Kefu Chai <kchai@redhat.com>
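
One common way to get FIFO wake-up order is a ticket scheme, sketched below. This is an assumption about how such a mutex can be built, not a copy of ceph::fair_mutex; the class name `FairMutex` and its members are hypothetical. Each caller of lock() takes the next ticket and waits until that ticket is being served, so waiters acquire the mutex in arrival order.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical ticket-based mutex: waiters are served in the order they arrive.
class FairMutex {
public:
  void lock() {
    std::unique_lock<std::mutex> guard(mutex_);
    const unsigned my_ticket = next_ticket_++;  // join the queue
    cond_.wait(guard, [&] { return my_ticket == now_serving_; });
  }

  bool try_lock() {
    std::lock_guard<std::mutex> guard(mutex_);
    if (next_ticket_ != now_serving_) {
      return false;  // held, or someone is already queued
    }
    ++next_ticket_;  // take the ticket that is currently being served
    return true;
  }

  void unlock() {
    {
      std::lock_guard<std::mutex> guard(mutex_);
      assert(next_ticket_ != now_serving_);  // must be held by someone
      ++now_serving_;  // hand over to the next ticket in line
    }
    // Wake everyone; only the waiter whose ticket matches proceeds.
    cond_.notify_all();
  }

private:
  std::mutex mutex_;
  std::condition_variable cond_;
  unsigned next_ticket_ = 0;
  unsigned now_serving_ = 0;
};
```

Because it provides lock(), unlock() and try_lock(), the sketch can be used with std::lock_guard<FairMutex> or std::unique_lock<FairMutex>. The notify_all() call trades some wasted wake-ups for simplicity.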
3 years agocrimson/os/seastore: reassign extent_types_t values and remove extent_type_to_index() 42539/head
Yingxin Cheng [Thu, 29 Jul 2021 06:52:20 +0000 (14:52 +0800)]
crimson/os/seastore: reassign extent_types_t values and remove extent_type_to_index()

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years agocrimson/os/seastore/cache: misc cleanup to metrics
Yingxin Cheng [Wed, 28 Jul 2021 01:36:20 +0000 (09:36 +0800)]
crimson/os/seastore/cache: misc cleanup to metrics

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years agocrimson/os/seastore/cache: remove derived metrics
Yingxin Cheng [Tue, 27 Jul 2021 08:50:52 +0000 (16:50 +0800)]
crimson/os/seastore/cache: remove derived metrics

Only keep the basic metrics to minimize the total number of metrics.

Derived metrics can be numerous according to different needs and can be
confusing with labels.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years agocrimson/os/seastore/cache: remove counter labels
Yingxin Cheng [Tue, 27 Jul 2021 08:45:03 +0000 (16:45 +0800)]
crimson/os/seastore/cache: remove counter labels

Do not label metrics by counter type which could be confusing.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years agocrimson/os/seastore/cache: cleanup, replace unordered_map by array
Yingxin Cheng [Tue, 27 Jul 2021 08:36:44 +0000 (16:36 +0800)]
crimson/os/seastore/cache: cleanup, replace unordered_map by array

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
3 years agoMerge pull request #40965 from rokj/patch-3
Ilya Dryomov [Thu, 29 Jul 2021 21:54:44 +0000 (23:54 +0200)]
Merge pull request #40965 from rokj/patch-3

doc: mention copying keyrings and adjust node names in manual deployment example

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
3 years agomgr/cephadm: fix exceptions causing stuck progress indicators 42553/head
Cory Snyder [Thu, 29 Jul 2021 20:08:19 +0000 (16:08 -0400)]
mgr/cephadm: fix exceptions causing stuck progress indicators

Added a try block to ensure that the progress of applying a service spec is marked as failed if an exception is raised.

Fixes: https://tracker.ceph.com/issues/51961
Signed-off-by: Cory Snyder <csnyder@iland.com>
3 years agomgr/cephadm: make return message for entering maintenance mode more explicit 42551/head
Adam King [Thu, 29 Jul 2021 18:30:00 +0000 (14:30 -0400)]
mgr/cephadm: make return message for entering maintenance mode more explicit

Signed-off-by: Adam King <adking@redhat.com>
3 years agorgw/dbstore: Fix library link issues 31454/head
Soumya Koduri [Thu, 29 Jul 2021 16:04:31 +0000 (21:34 +0530)]
rgw/dbstore: Fix library link issues

Now that rgw_common is no longer linked with the rgw_a library (commit#7b61667),
dbstore (rgw_sal_dbstore) should be linked directly to rgw_common.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
3 years agoMerge pull request #42432 from tchaikov/wip-mon-crush-cleanup
Kefu Chai [Thu, 29 Jul 2021 15:40:03 +0000 (23:40 +0800)]
Merge pull request #42432 from tchaikov/wip-mon-crush-cleanup

mon: let CrushWrapper::get_validated_type_id() return an optional<>

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
3 years agoMerge pull request #42524 from guits/cv_wait_destroy_tests
Dimitri Savineau [Thu, 29 Jul 2021 13:42:15 +0000 (09:42 -0400)]
Merge pull request #42524 from guits/cv_wait_destroy_tests

ceph-volume/tests: retry when destroying osd

3 years agoMerge pull request #42515 from rhcs-dashboard/decouple-unit-tests-from-build-dir
Ernesto Puerta [Thu, 29 Jul 2021 12:47:24 +0000 (14:47 +0200)]
Merge pull request #42515 from rhcs-dashboard/decouple-unit-tests-from-build-dir

mgr/dashboard: backend unit tests: decouple from build dir

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
3 years agomgr/prometheus: Fix metric types from gauge to counter 42506/head
Patrick Seidensal [Tue, 27 Jul 2021 13:18:46 +0000 (15:18 +0200)]
mgr/prometheus: Fix metric types from gauge to counter

Affected metrics:
- ceph_pool_rd
- ceph_pool_rd_bytes
- ceph_pool_wr
- ceph_pool_wr_bytes

Fixes: https://tracker.ceph.com/issues/51868
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
3 years agoMerge pull request #42516 from tchaikov/wip-win32-snappy
Kefu Chai [Thu, 29 Jul 2021 09:15:16 +0000 (17:15 +0800)]
Merge pull request #42516 from tchaikov/wip-win32-snappy

win32_deps_build.sh: bump snappy version to 1.1.9

Reviewed-by: Nathan Cutler <ncutler@suse.com>
3 years agocrimson/seastore: convert onode unit test to use interruptible future.
chunmei-liu [Wed, 28 Jul 2021 01:23:25 +0000 (18:23 -0700)]
crimson/seastore: convert onode unit test to use interruptible future.

Signed-off-by: chunmei-liu <chunmei.liu@intel.com>
3 years agocrimson/seastore: interruptible future for onode
chunmei-liu [Mon, 26 Jul 2021 07:00:11 +0000 (00:00 -0700)]
crimson/seastore: interruptible future for onode

Signed-off-by: chunmei-liu <chunmei.liu@intel.com>
3 years agodoc: adding missing command. changed node naming. 40965/head
Rok Jaklič [Wed, 21 Apr 2021 14:35:07 +0000 (16:35 +0200)]
doc: adding missing command. changed node naming.

Signed-off-by: Rok Jaklič <rokj@rasca.net>