git.apps.os.sepia.ceph.com Git - ceph.git/log
3 years ago mgr/telemetry: fix waiting for mgr to warm up 45772/head
Yaarit Hatuka [Tue, 9 Nov 2021 18:31:11 +0000 (18:31 +0000)]
mgr/telemetry: fix waiting for mgr to warm up

1. The implementation of config_notify() in telemetry module sets the
flag for event, which is supposed to wake up the 'serve' thread whenever
a config option is changed. The problem is that we call config_notify()
at the beginning of serve(), before we enter its 'run' loop. This call
sets the event which cancels the 10 seconds wait for the mgr to warm up.
To fix this, we extract the logic of updating the config options to a
separate function (config_update_module_option()), and call it on
__init__, instead of calling config_notify() in serve().

2. We should always wait for the mgr to warm up here (10 seconds). In
case of a sporadic event (e.g. a config option change via CLI) the event
will be set, and wait will return immediately. We enforce this wait by
using time.sleep(10) instead of event.wait(10).

Fixes: https://tracker.ceph.com/issues/53204
Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
(cherry picked from commit fa5cc0ca081ca3cce552e0cb21a1e17273cf3482)

 Conflicts:
src/pybind/mgr/telemetry/module.py

- Several options under __init__ not present in Octopus
- No type checking in Octopus
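
The difference described in point 2 can be illustrated with a minimal Python sketch. This is not the telemetry module's code: the helper names and the shortened 0.2-second delay are assumptions made for the example, and only threading.Event and time.sleep() are real standard-library calls. It shows that a wait on an already-set event returns at once, while a sleep lasts the full duration.

```python
import threading
import time


def warm_up_with_event(event, seconds):
    """Hypothetical helper: wait on the event and report the time spent."""
    start = time.monotonic()
    event.wait(seconds)  # returns immediately if the event is already set
    return time.monotonic() - start


def warm_up_with_sleep(seconds):
    """Hypothetical helper: sleep unconditionally and report the time spent."""
    start = time.monotonic()
    time.sleep(seconds)  # not affected by the state of any event
    return time.monotonic() - start


event = threading.Event()
event.set()  # stands in for a config change that sets the event early

waited_event = warm_up_with_event(event, 0.2)
waited_sleep = warm_up_with_sleep(0.2)
print(f"event.wait: {waited_event:.3f}s, time.sleep: {waited_sleep:.3f}s")
```

With the event already set, the first helper reports a duration close to zero, which is the cancelled warm-up the commit describes; the second always takes roughly the requested time.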

3 years ago Merge pull request #44960 from BenoitKnecht/wip-54233-octopus
Yuri Weinstein [Fri, 25 Mar 2022 15:08:40 +0000 (08:08 -0700)]
Merge pull request #44960 from BenoitKnecht/wip-54233-octopus

octopus: mon: Abort device health when device not found

Reviewed-by: Yaarit Hatuka <yaarit@redhat.com>
3 years ago Merge pull request #44546 from cfsnyder/wip-53719-octopus
Yuri Weinstein [Fri, 25 Mar 2022 15:08:03 +0000 (08:08 -0700)]
Merge pull request #44546 from cfsnyder/wip-53719-octopus

octopus: osd/OSDMapMapping: fix spurious threadpool timeout errors

Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
3 years ago Merge pull request #43224 from kotreshhr/wip-52629-octopus
Yuri Weinstein [Fri, 25 Mar 2022 15:07:04 +0000 (08:07 -0700)]
Merge pull request #43224 from kotreshhr/wip-52629-octopus

octopus: mgr/volumes: Fix permission during subvol creation with mode

Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 years ago Merge pull request #45613 from rhcs-dashboard/octopus-null-injection-fix
Ernesto Puerta [Thu, 24 Mar 2022 10:05:19 +0000 (11:05 +0100)]
Merge pull request #45613 from rhcs-dashboard/octopus-null-injection-fix

octopus: mgr/dashboard: fix "NullInjectorError: No provider for I18n

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
3 years ago mgr/dashboard: fix "NullInjectorError: No provider for I18n 45613/head
Nizamudeen A [Thu, 24 Mar 2022 08:01:18 +0000 (13:31 +0530)]
mgr/dashboard: fix "NullInjectorError: No provider for I18n

I am not sure what the root cause is, but this seems to fix the test
failure. I don't know whether it is caused by the difference in Angular
versions between master and octopus, and I still don't understand why it
wasn't caught in the recent PR to this file (https://github.com/ceph/ceph/pull/44763).

Fixes: https://tracker.ceph.com/issues/55011
Signed-off-by: Nizamudeen A <nia@redhat.com>
3 years ago Merge pull request #45334 from idryomov/wip-client-upgrade-octopus-pacific-cleanup
Ilya Dryomov [Fri, 11 Mar 2022 11:47:57 +0000 (12:47 +0100)]
Merge pull request #45334 from idryomov/wip-client-upgrade-octopus-pacific-cleanup

qa/suites: clean up client-upgrade-octopus-pacific test

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
3 years ago Merge pull request #44763 from votdev/wip-53928-octopus
Ernesto Puerta [Thu, 10 Mar 2022 18:08:47 +0000 (19:08 +0100)]
Merge pull request #44763 from votdev/wip-53928-octopus

octopus: mgr/dashboard: Notification banners at the top of the UI have fixed height

Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
3 years ago Merge pull request #44924 from p-se/wip-53883-octopus
Ernesto Puerta [Thu, 10 Mar 2022 18:06:44 +0000 (19:06 +0100)]
Merge pull request #44924 from p-se/wip-53883-octopus

octopus: mgr/dashboard: fix Grafana OSD/host panels

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: p-se <NOT@FOUND>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
3 years ago qa/suites: clean up client-upgrade-octopus-pacific test 45334/head
Ilya Dryomov [Thu, 10 Mar 2022 11:40:34 +0000 (12:40 +0100)]
qa/suites: clean up client-upgrade-octopus-pacific test

- fix .qa symlinks
- rename nautilus-client-x.yaml to octopus-client-x.yaml
- fix typos and remove stale comment
- remove 2-features permutation (it doesn't do anything useful as the
  workunit is run with RBD_FEATURES environment variable set and those
  features are explicitly passed to RBD.create and RBD.clone calls;
  the net effect is that the exact same job is run twice)

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
3 years ago Merge pull request #45282 from ceph/wip-yuri-octopus-clients
Ilya Dryomov [Thu, 10 Mar 2022 10:34:10 +0000 (11:34 +0100)]
Merge pull request #45282 from ceph/wip-yuri-octopus-clients

qa/tests: added upgrade-clients/client-upgrade-octopus-quincy tests

Reviewed-by: Mykola Golub <mgolub@suse.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
3 years ago qa/tests: added upgrade-clients/client-upgrade-octopus-quincy tests 45282/head
Yuri Weinstein [Mon, 7 Mar 2022 16:33:39 +0000 (08:33 -0800)]
qa/tests: added upgrade-clients/client-upgrade-octopus-quincy tests

Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
3 years ago Merge pull request #45261 from idryomov/wip-cmake-disable-dpdk-warnings-octopus
Kefu Chai [Sat, 5 Mar 2022 18:40:56 +0000 (02:40 +0800)]
Merge pull request #45261 from idryomov/wip-cmake-disable-dpdk-warnings-octopus

octopus: cmake: pass RTE_DEVEL_BUILD=n when building dpdk

Reviewed-by: Kefu Chai <tchaikov@gmail.com>
3 years ago cmake: pass RTE_DEVEL_BUILD=n when building dpdk 45261/head
Kefu Chai [Sat, 5 Mar 2022 04:49:57 +0000 (12:49 +0800)]
cmake: pass RTE_DEVEL_BUILD=n when building dpdk

Ceph still uses the Makefile-based build system for building DPDK, and
DPDK enables -Werror if RTE_DEVEL_BUILD is 'y', which is the default
when dpdk is built from a git repo.

Newer GCC is pickier than older versions. To prevent a possible FTBFS
when we switch to a newer GCC for building old branches whose dpdk
submodule might not include the changes addressing those warnings,
just disable this option.

The only effect of this option is to add -Werror to CFLAGS, and build
warnings from DPDK are not our focus when developing Ceph in most
cases, so this should be fine.

see also
https://github.com/ceph/dpdk/blob/eac901ce29be559b1bb5c5da33fe2bf5c0b4bfd6/doc/build-sdk-quick.txt#L18

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 91a616b26e830e85962200d0bac86bee7e022892)

Conflicts:
cmake/modules/BuildDPDK.cmake [ commit d3c315703ae6 ("cmake:
  pass -Wunused-but-set-variable when building dpdk") not in
  octopus ]

3 years ago Merge pull request #45169 from pponnuvel/wip-54382-octopus
Yuri Weinstein [Fri, 4 Mar 2022 16:13:06 +0000 (08:13 -0800)]
Merge pull request #45169 from pponnuvel/wip-54382-octopus

octopus: rbd-mirror: make mirror properly detect pool replayer needs restart

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
Reviewed-by: Sunny Kumar <sunkumar@redhat.com>
3 years ago Merge pull request #45147 from guits/wip-54400-octopus
Guillaume Abrioux [Fri, 4 Mar 2022 12:08:13 +0000 (13:08 +0100)]
Merge pull request #45147 from guits/wip-54400-octopus

octopus: ceph-volume: abort when passed devices have partitions

3 years ago Merge pull request #44800 from kotreshhr/wip-53947-octopus
Yuri Weinstein [Wed, 2 Mar 2022 17:08:39 +0000 (09:08 -0800)]
Merge pull request #44800 from kotreshhr/wip-53947-octopus

octopus: mgr/volumes: A few volumes plugin backport

Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 years ago Merge pull request #44624 from lxbsz/wip-53865
Yuri Weinstein [Wed, 2 Mar 2022 16:42:09 +0000 (08:42 -0800)]
Merge pull request #44624 from lxbsz/wip-53865

octopus: mds: directly return just after responding the link request

Reviewed-by: Jeff Layton <jlayton@redhat.com>
3 years ago Merge pull request #44976 from vshankar/wip-54242
Yuri Weinstein [Wed, 2 Mar 2022 15:50:50 +0000 (07:50 -0800)]
Merge pull request #44976 from vshankar/wip-54242

octopus: mds: ignore unknown client op when tracking op latency

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
3 years ago Merge pull request #45206 from idryomov/wip-rbd-qemu-iotests-8stream-octopus
Ilya Dryomov [Wed, 2 Mar 2022 10:49:16 +0000 (11:49 +0100)]
Merge pull request #45206 from idryomov/wip-rbd-qemu-iotests-8stream-octopus

octopus: backport qemu-iotests fixup for centos stream 8

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
3 years ago Merge pull request #44883 from guits/wip-54126-octopus
Guillaume Abrioux [Wed, 2 Mar 2022 09:18:29 +0000 (10:18 +0100)]
Merge pull request #44883 from guits/wip-54126-octopus

octopus: ceph-volume: fix error 'KeyError' with inventory

3 years ago Merge pull request #44768 from guits/wip-54008-octopus
Guillaume Abrioux [Wed, 2 Mar 2022 08:38:05 +0000 (09:38 +0100)]
Merge pull request #44768 from guits/wip-54008-octopus

octopus: ceph-volume: fix tags dict output in `lvm list`

3 years ago ceph-volume: abort when passed devices have partitions 45147/head
Guillaume Abrioux [Wed, 23 Feb 2022 08:36:29 +0000 (09:36 +0100)]
ceph-volume: abort when passed devices have partitions

ceph-volume doesn't prevent the use of db and/or wal devices
that have existing partitions on them.
This can lead to data loss.

Fixes: https://tracker.ceph.com/issues/54376
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 75c91a8c6f37a38d69d5da8b1e7d49d9c636230b)

3 years ago Merge pull request #44806 from mkogan1/wip-52900-octopus
Yuri Weinstein [Tue, 1 Mar 2022 20:01:22 +0000 (12:01 -0800)]
Merge pull request #44806 from mkogan1/wip-52900-octopus

octopus rgw: under fips, set flag to allow md5 in select rgw ops

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
3 years ago Merge pull request #43656 from poulpreben/backport-object-lock-retain-date-iso8601
Yuri Weinstein [Tue, 1 Mar 2022 19:58:25 +0000 (11:58 -0800)]
Merge pull request #43656 from poulpreben/backport-object-lock-retain-date-iso8601

octopus: rgw: Dump Object Lock Retain Date as ISO 8601

Reviewed-by: Casey Bodley <cbodley@redhat.com>
3 years ago Merge pull request #45110 from ljflores/wip-54351-octopus
Laura Flores [Tue, 1 Mar 2022 19:44:47 +0000 (13:44 -0600)]
Merge pull request #45110 from ljflores/wip-54351-octopus

octopus: mgr/dashboard: dashboard turns telemetry off when configuring report

3 years ago Merge pull request #45076 from chrisphoffman/wip-54297-octopus
Yuri Weinstein [Tue, 1 Mar 2022 19:32:05 +0000 (11:32 -0800)]
Merge pull request #45076 from chrisphoffman/wip-54297-octopus

octopus: cls/rbd: GroupSnapshotNamespace comparator violates ordering rules

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
3 years ago Merge pull request #45071 from idryomov/wip-qemu-task-rbd-package-octopus
Yuri Weinstein [Tue, 1 Mar 2022 19:31:36 +0000 (11:31 -0800)]
Merge pull request #45071 from idryomov/wip-qemu-task-rbd-package-octopus

octopus: qa/tasks/qemu: make sure block-rbd.so is installed

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
3 years ago Merge pull request #45019 from trociny/wip-47427-octopus
Yuri Weinstein [Tue, 1 Mar 2022 19:31:07 +0000 (11:31 -0800)]
Merge pull request #45019 from trociny/wip-47427-octopus

octopus: librbd: track complete async operation requests

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
3 years ago Merge pull request #45006 from sunnyku/wip-54169-octopus
Yuri Weinstein [Tue, 1 Mar 2022 19:30:34 +0000 (11:30 -0800)]
Merge pull request #45006 from sunnyku/wip-54169-octopus

octopus: mgr/rbd_support: fix schedule remove

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
3 years ago Merge pull request #45009 from idryomov/wip-rbd-help-positional-optional-octopus
Yuri Weinstein [Tue, 1 Mar 2022 19:30:06 +0000 (11:30 -0800)]
Merge pull request #45009 from idryomov/wip-rbd-help-positional-optional-octopus

octopus: rbd: mark optional positional arguments as such in help output

Reviewed-by: Sunny Kumar <sunkumar@redhat.com>
Reviewed-by: Mykola Golub <mgolub@mirantis.com>
3 years ago Merge pull request #45004 from idryomov/wip-54128-octopus
Yuri Weinstein [Tue, 1 Mar 2022 19:29:03 +0000 (11:29 -0800)]
Merge pull request #45004 from idryomov/wip-54128-octopus

octopus: krbd: return error when no initial monitor address found

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
3 years ago Merge pull request #45001 from idryomov/wip-krbd-rxbounce-option-octopus
Yuri Weinstein [Tue, 1 Mar 2022 19:28:38 +0000 (11:28 -0800)]
Merge pull request #45001 from idryomov/wip-krbd-rxbounce-option-octopus

octopus: rbd: recognize rxbounce map option

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
3 years ago Merge pull request #45000 from idryomov/wip-52522-octopus
Yuri Weinstein [Tue, 1 Mar 2022 19:28:07 +0000 (11:28 -0800)]
Merge pull request #45000 from idryomov/wip-52522-octopus

octopus: librbd: fix use-after-free on ictx in list_descendants()

Reviewed-by: Mykola Golub <mgolub@mirantis.com>
3 years ago Merge pull request #44992 from idryomov/wip-writesame-fua-octopus
Yuri Weinstein [Tue, 1 Mar 2022 19:22:26 +0000 (11:22 -0800)]
Merge pull request #44992 from idryomov/wip-writesame-fua-octopus

octopus: librbd: honor FUA op flag for write_same() in write-around cache

Reviewed-by: Sunny Kumar <sunkumar@redhat.com>
3 years ago 15.2.16 v15.2.16
Jenkins Build Slave User [Tue, 1 Mar 2022 06:44:29 +0000 (06:44 +0000)]
15.2.16

3 years ago workunits/rbd: remove lsb_release 45206/head
Ken Dreyer [Thu, 12 Aug 2021 14:44:48 +0000 (10:44 -0400)]
workunits/rbd: remove lsb_release

The lsb_release utility brings in a lot of other dependencies. Remove
it from the RBD workunit script.

Signed-off-by: Ken Dreyer <kdreyer@redhat.com>
(cherry picked from commit 65f4d6eb3664a4cc6120031679b8368cbc02a4a5)

3 years ago qa/workunits/rbd: use xenial version of qemu-iotests for centos stream 8
Ilya Dryomov [Tue, 3 Aug 2021 07:44:18 +0000 (09:44 +0200)]
qa/workunits/rbd: use xenial version of qemu-iotests for centos stream 8

It is already used for centos 8(.3) and rhel 8(.4).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit df96b85b58822b0f1a327b9d543eba4170312dc5)

3 years ago mgr/volumes: Fix clone uid/gid mismatch 44800/head
Kotresh HR [Thu, 10 Feb 2022 05:34:41 +0000 (11:04 +0530)]
mgr/volumes: Fix clone uid/gid mismatch

This is a regression caused by commit 18b85c53a.
The 'set_attrs' function sets the uid/gid of the
group on the subvolume if no uid/gid is passed.
The attrs of the clone should match the source
snapshot. Hence, don't use the 'set_attrs'
function for the clone; set only the quota
attrs instead.

Fixes: https://tracker.ceph.com/issues/54066
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit b3c9e6b50cf4264538e4c41d19e7ebb8b2900c3a)

3 years ago qa: Add tests snapshot clone failure with quota
Kotresh HR [Wed, 12 Jan 2022 09:37:13 +0000 (15:07 +0530)]
qa: Add tests snapshot clone failure with quota

Fixes: https://tracker.ceph.com/issues/53848
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 7c0d31e52cea90e65152996024cabfa8a8fd299f)

Conflicts:
  qa/tasks/cephfs/test_volumes.py: The commit 103c7bdc70ca is
   not backported

3 years ago mgr/volumes: Fix subvoume snapshot clone failure
Kotresh HR [Wed, 12 Jan 2022 09:31:53 +0000 (15:01 +0530)]
mgr/volumes: Fix subvoume snapshot clone failure

Problem:
The subvolume snapshot clone fails if the quota on the source
has been exceeded. Since the quota is not strictly enforced at
byte granularity, this is a possibility.

Cause:
The quota on the clone is set prior to copying the data
from the source. Hence the quota is usually enforced before
all of the data has been copied from the source, resulting in
the clone failure.

Solution:
Enforce quota on the clone after the data is copied.

Fixes: https://tracker.ceph.com/issues/53848
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 18b85c53af36d89a8c53b40cfc44fe06816a9733)
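
The ordering problem described in this commit can be sketched with a toy model in Python. Nothing here is the mgr/volumes code or a Ceph API: ToySubvolume, copy_data and both clone functions are hypothetical names, and the strict per-write quota check is a simplifying assumption (real CephFS quota enforcement is approximate). The sketch only shows why applying the quota before the copy fails when the source is already over quota, and why applying it afterwards does not.

```python
class QuotaExceeded(Exception):
    """Raised by the toy model when a write would exceed the quota."""


class ToySubvolume:
    """Hypothetical stand-in for a subvolume; not a Ceph API."""

    def __init__(self):
        self.used = 0
        self.quota = None  # None means no quota is set

    def write(self, nbytes):
        if self.quota is not None and self.used + nbytes > self.quota:
            raise QuotaExceeded(f"{self.used + nbytes} > {self.quota}")
        self.used += nbytes


def copy_data(clone, source_bytes, chunk=4):
    """Copy source_bytes into the clone in fixed-size chunks."""
    remaining = source_bytes
    while remaining > 0:
        n = min(chunk, remaining)
        clone.write(n)
        remaining -= n


def clone_quota_first(source_bytes, quota):
    """Old order: set the quota, then copy the data."""
    clone = ToySubvolume()
    clone.quota = quota
    copy_data(clone, source_bytes)
    return clone


def clone_quota_last(source_bytes, quota):
    """Fixed order: copy the data, then set the quota."""
    clone = ToySubvolume()
    copy_data(clone, source_bytes)
    clone.quota = quota
    return clone


# The source holds 12 bytes although its quota is 10.
try:
    clone_quota_first(12, 10)
    quota_first_ok = True
except QuotaExceeded:
    quota_first_ok = False

fixed = clone_quota_last(12, 10)
print(quota_first_ok, fixed.used, fixed.quota)
```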

3 years ago mgr/volumes: Improve debugging, add clone failure logs
Kotresh HR [Wed, 12 Jan 2022 05:43:20 +0000 (11:13 +0530)]
mgr/volumes: Improve debugging, add clone failure logs

Fixes: https://tracker.ceph.com/issues/53848
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 29ad638773715c92a0c77f10731bd69167e4ce80)

3 years ago mgr/volumes: use dedicated libcephfs handles for subvolume calls and async jobs
Venky Shankar [Fri, 18 Jun 2021 07:13:01 +0000 (03:13 -0400)]
mgr/volumes: use dedicated libcephfs handles for subvolume calls and async jobs

Fixes: http://tracker.ceph.com/issues/51271
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit cb2883feac1a5c141a3d72120c2320f7a8ffdea8)

Conflicts:
  src/pybind/mgr/volumes/fs/async_cloner.py: The commit cf2a1ad65120 is
not backported
  src/pybind/mgr/volumes/fs/async_job.py: The commit cf2a1ad65120 is not
backported

3 years ago mgr/volumes: Add config to insert delay at the beginning of the clone
Kotresh HR [Mon, 28 Feb 2022 10:53:39 +0000 (16:23 +0530)]
mgr/volumes: Add config to insert delay at the beginning of the clone

Added the config 'delay_snapshot_clone' to insert delay at the beginning
of the clone to avoid races in tests. The default value is set to 0.

Fixes: https://tracker.ceph.com/issues/48231
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 7588f985054282d2cff7f3582e995584b1fd20f8)

Conflicts:
 qa/tasks/cephfs/test_volumes.py: Conflicts due to tests ordering
 src/pybind/mgr/volumes/fs/volume.py: The commit e308bf898955 is not
backported
 src/pybind/mgr/volumes/module.py: The commit f002c6ce4033 is not
backported

3 years ago mgr_util: move is_stopping from VolumeClient to CephfsClient
Jan Fajerski [Thu, 12 Mar 2020 12:25:43 +0000 (13:25 +0100)]
mgr_util: move is_stopping from VolumeClient to CephfsClient

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit 8c94c21d098ac31708dda66cd35ccb0da5d1fd75)

3 years ago mgr_util: rename ConnectionPool -> CephfsConnectionPool
Jan Fajerski [Thu, 12 Mar 2020 09:34:56 +0000 (10:34 +0100)]
mgr_util: rename ConnectionPool -> CephfsConnectionPool

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit 51f6f64c123533bd2d33e3ec768af63a757a1d2a)

3 years ago mgr_util: add CephfsClient implementation
Jan Fajerski [Wed, 18 Dec 2019 10:35:40 +0000 (11:35 +0100)]
mgr_util: add CephfsClient implementation

This pulls parts of the VolumeClient implementation into mgr_util to
make the CephFS specific pieces available to other mgr modules. To
reduce code duplication the VolumeClient now extends the CephfsClient
class to add the volume specific methods.

Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit a44de38b61d598fb0512ea48da0de4179d39b804)

src/pybind/mgr/mgr_util.py
src/pybind/mgr/tox.ini
src/pybind/mgr/volumes/fs/operations/volume.py
src/pybind/mgr/volumes/fs/volume.py
  Trivial conflicts because of the order of backports to octopus

3 years ago rbd-mirror: make mirror properly detect pool replayer needs restart 45169/head
Mykola Golub [Fri, 18 Feb 2022 10:42:23 +0000 (10:42 +0000)]
rbd-mirror: make mirror properly detect pool replayer needs restart

When a PoolReplayer detects a remote pool metadata change, it
sets the "stopping" flag, expecting that the Mirror will restart it.

Although setting the "stopping" flag makes the PoolReplayer::run
thread terminate, the thread's is_started function will still
return true until join is called (and resets the thread id).

This made it impossible for the Mirror to detect (by calling
PoolReplayer::is_running) that the PoolReplayer needed a restart.

Fixes: https://tracker.ceph.com/issues/54258
Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit ad4a2990b87834fe4ae8c9111547d071aa6e75e5)

3 years ago mgr/dashboard: Contact Info should be visible only when Ident channel is checked 45110/head
Sarthak0702 [Wed, 16 Feb 2022 12:45:35 +0000 (18:15 +0530)]
mgr/dashboard: Contact Info should be visible only when Ident channel is checked

Fixes: https://tracker.ceph.com/issues/54133
Signed-off-by: Sarthak0702 <sarthak.0702@gmail.com>
(cherry picked from commit 15211a6378a6fee9316f79ba0b27821891527c38)

 Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/cluster/telemetry/telemetry.component.ts
- `this.loading` used in Octopus instead of `this.loadingReady()`

3 years ago mgr/dashboard: telemetry activate: show ident fields when checked
Aaryan Porwal [Sun, 6 Jun 2021 22:08:37 +0000 (03:38 +0530)]
mgr/dashboard: telemetry activate: show ident fields when checked

Signed-off-by: Aaryan Porwal <aaryanporwal2233@gmail.com>
(cherry picked from commit ad5b3f200529fc0bc511ce99eed338afcaef6a62)

3 years ago mgr/dashboard: dashboard turns telemetry off when configuring report
Sarthak0702 [Thu, 10 Feb 2022 19:50:42 +0000 (01:20 +0530)]
mgr/dashboard: dashboard turns telemetry off when configuring report

Signed-off-by: Sarthak0702 <sarthak.0702@gmail.com>
(cherry picked from commit 97c57adf8565756dbf24f3c46ed3916303903fb7)

Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/cluster/telemetry/telemetry.component.ts
- `this.i18n()` was used in Octopus instead of `$localize`

3 years ago cls/rbd: GroupSnapshotNamespace comparator violates ordering rules 45076/head
Ilya Dryomov [Mon, 14 Feb 2022 12:04:00 +0000 (13:04 +0100)]
cls/rbd: GroupSnapshotNamespace comparator violates ordering rules

For

  GroupSnapshotNamespace a(1, "group-1", "snap-2");
  GroupSnapshotNamespace b(1, "group-2", "snap-1");

both a < b and b < a evaluate to true.  This violates STL strict weak
ordering requirements which is a problem because GroupSnapshotNamespace
is used as a key in std::map (ictx->snap_ids at least), etc.

Fixes: https://tracker.ceph.com/issues/49792
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 830e72ab9d66c8f5703ea27da5249b02dd16ccd0)
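
The violation can be reproduced with a small Python sketch. The body of the original C++ operator< is not shown in this log, so buggy_less below is an assumed reconstruction of a field-by-field comparison that would produce the reported behaviour; GroupSnap and both function names are hypothetical. The lexicographic version is one standard way to obtain a strict weak ordering, not necessarily the exact fix that was merged.

```python
from typing import NamedTuple


class GroupSnap(NamedTuple):
    """Hypothetical stand-in for GroupSnapshotNamespace."""
    pool: int
    group_id: str
    snap_id: str


def buggy_less(a, b):
    """Assumed reconstruction: each field is tested on its own, without
    first checking that the more significant fields are equal."""
    if a.pool < b.pool:
        return True
    if a.group_id < b.group_id:
        return True
    return a.snap_id < b.snap_id


def lexicographic_less(a, b):
    """Compare field by field, moving on only when earlier fields tie."""
    return tuple(a) < tuple(b)


a = GroupSnap(1, "group-1", "snap-2")
b = GroupSnap(1, "group-2", "snap-1")

print(buggy_less(a, b), buggy_less(b, a))                  # both True
print(lexicographic_less(a, b), lexicographic_less(b, a))  # True, False
```

An ordered container keyed on such objects relies on at most one of a < b and b < a holding, which is why the first comparison is unsafe as a map key ordering.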

3 years ago qa/suites/rbd: make sure block-rbd.so is installed 45071/head
Ilya Dryomov [Wed, 16 Feb 2022 09:32:26 +0000 (10:32 +0100)]
qa/suites/rbd: make sure block-rbd.so is installed

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 8f0fd0af3da8581c47dc916303615264714a0489)

3 years ago qa/tasks/qemu: make sure block-rbd.so is installed
Ilya Dryomov [Tue, 15 Feb 2022 13:57:51 +0000 (14:57 +0100)]
qa/tasks/qemu: make sure block-rbd.so is installed

Fixes: https://tracker.ceph.com/issues/54286
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 525ff61cfc8516b4d7bed6f819b00a0b6cb7be0a)

3 years ago ceph-volume: fix error 'KeyError' with inventory 44883/head
Guillaume Abrioux [Mon, 6 Dec 2021 09:24:46 +0000 (10:24 +0100)]
ceph-volume: fix error 'KeyError' with inventory

The tag ceph.cluster_name is always set at the end.
The only way it could be absent is if the osd prepare
was interrupted between [1] and [2].

[1] https://github.com/ceph/ceph/blob/v14.2.11/src/ceph-volume/ceph_volume/devices/lvm/strategies/bluestore.py#L355-L387
[2] https://github.com/ceph/ceph/blob/v14.2.11/src/ceph-volume/ceph_volume/devices/lvm/prepare.py

Although the code has changed considerably in the meantime
and this error shouldn't show up again, we need to handle
the case where this tag has not been set.

Fixes: https://tracker.ceph.com/issues/44356
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 691660c42eaa568a754670e093c512aa041d1479)

3 years ago Merge pull request #44986 from badone/wip-octopus-ceph-ansible-move-to-stream
Brad Hubbard [Wed, 16 Feb 2022 03:26:57 +0000 (13:26 +1000)]
Merge pull request #44986 from badone/wip-octopus-ceph-ansible-move-to-stream

octopus: qa/ceph-ansible: Move to Centos Stream

Reviewed-by: Yuri Weinstein <yweinste@redhat.com>
3 years ago librbd: reset complete async request expiration time 45019/head
Mykola Golub [Tue, 29 Sep 2020 09:07:56 +0000 (10:07 +0100)]
librbd: reset complete async request expiration time

Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit 5c3ba95b4d1208868db1d00d1ba06ed086e1b268)

Conflicts:
src/librbd/ImageWatcher.cc (no quiesce requests in octopus,
            AsyncRequestId == operator is not supported, use !=)

3 years ago Merge pull request #44791 from guits/wip-54023-octopus
Yuri Weinstein [Mon, 14 Feb 2022 20:08:50 +0000 (12:08 -0800)]
Merge pull request #44791 from guits/wip-54023-octopus

octopus: ceph-volume: improve mpath devices support

Reviewed-by: Yuri Weinstein <yweinste@redhat.com>
3 years ago ceph-volume: fix typo in tests 44791/head
Guillaume Abrioux [Tue, 14 Dec 2021 10:08:48 +0000 (11:08 +0100)]
ceph-volume: fix typo in tests

This fixes 2 typos in ceph-volume tests.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit b07bd3e0e17021e0cf9773f916fad954f12254ed)

3 years ago doc/ceph-volume: fix a typo
Guillaume Abrioux [Tue, 14 Dec 2021 09:42:09 +0000 (10:42 +0100)]
doc/ceph-volume: fix a typo

This fixes a typo in ceph-volume documentation.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5d0a3cee5d7021dafd1e166e17946689b4bb90b7)

3 years ago ceph-volume: add a test `test_mpath_device_is_device`
Guillaume Abrioux [Tue, 14 Dec 2021 09:40:35 +0000 (10:40 +0100)]
ceph-volume: add a test `test_mpath_device_is_device`

This test checks that Device.is_device() returns True for a mpath device.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 0280ff6df09bc26107bc97446e9d5c18fbc582e9)

3 years ago ceph-volume: improve mpath devices support
Guillaume Abrioux [Tue, 14 Dec 2021 08:57:10 +0000 (09:57 +0100)]
ceph-volume: improve mpath devices support

ee8887f4c0ff4f91117f31b621b95c8d08019130 was intended to add
mpath device support to ceph-volume, but it missed the lvm batch scenario.
This also fixes the zapping of mpath devices prepared with `ceph-volume raw`.

Fixes: https://tracker.ceph.com/issues/52908
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 601ff7ed0a3ba5172b6bd886ca8ba2bd4d9e655a)

3 years ago Merge pull request #44974 from guits/wip-54245-octopus
Guillaume Abrioux [Mon, 14 Feb 2022 15:42:47 +0000 (16:42 +0100)]
Merge pull request #44974 from guits/wip-54245-octopus

octopus: ceph-volume: honour osd_dmcrypt_key_size option

3 years ago librbd: track complete async operation return code
Mykola Golub [Mon, 21 Sep 2020 13:32:17 +0000 (14:32 +0100)]
librbd: track complete async operation return code

Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit 96bc20445afb0b2579a79b54a7854bc4a23f5b62)

Conflicts:
src/librbd/ImageWatcher.cc (no quiesce requests in octopus)

3 years ago librbd: track complete async operation requests
Mykola Golub [Sun, 6 Sep 2020 12:48:53 +0000 (13:48 +0100)]
librbd: track complete async operation requests

to prevent duplicate maintenance operations due to RPC hiccups.

Fixes: https://tracker.ceph.com/issues/46803
Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit 07fbc4b71df450655dec046c10e919dbfde989ba)

Conflicts:
src/librbd/ImageWatcher.cc (no quiesce requests in octopus)

3 years ago rbd: mark optional positional arguments as such in help output 45009/head
Ilya Dryomov [Tue, 8 Feb 2022 09:11:49 +0000 (10:11 +0100)]
rbd: mark optional positional arguments as such in help output

Currently at least five commands have optional positional arguments.

Overloading po::value<std::string>()->default_value("") for this
is a bit sneaky but nothing better fits into the existing Shell.cc
framework.

Note that strictly speaking "[<interval>] [<start-time>]" should be
"[<interval> [<start-time>]]" but we aren't doing that here because
"ceph" command doesn't do it either.

Fixes: https://tracker.ceph.com/issues/54191
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit cb0df397aae552adc80713ca0d59ed1ebfd3b1be)

3 years ago rbd: ensure the help printer doesn't print past the end of the line
Jason Dillaman [Wed, 21 Oct 2020 19:15:09 +0000 (15:15 -0400)]
rbd: ensure the help printer doesn't print past the end of the line

When long command names and long optional names are combined,
it's possible for the help text to be printed beyond the 80
character limit.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 51478deab22178d01db7fa8632ba72071b4dfc38)

3 years ago qa/workunits/rbd: improve schedule add/remove cli test 45006/head
Sunny Kumar [Wed, 19 Jan 2022 13:15:52 +0000 (13:15 +0000)]
qa/workunits/rbd: improve schedule add/remove cli test

This patch adds a few tests to cover schedule add/remove with invalid
inputs.

Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
(cherry picked from commit a9312d4777a82d8f2d8766a011f10952f84d3f27)

3 years ago mgr/rbd_support: fix schedule remove
Sunny Kumar [Fri, 12 Nov 2021 16:27:55 +0000 (16:27 +0000)]
mgr/rbd_support: fix schedule remove

Issue:

If we provide a random string in the schedule remove
command, the entire schedule at the specified level gets
removed.

Fixes: https://tracker.ceph.com/issues/53250
Signed-off-by: Sunny Kumar <sunkumar@redhat.com>
(cherry picked from commit 1b62447071a900b9fa7d856617cb7db9e030f91e)

3 years ago krbd: return error when no initial monitor address found 45004/head
Burt Holzman [Wed, 2 Feb 2022 15:18:52 +0000 (09:18 -0600)]
krbd: return error when no initial monitor address found

Since we filter monitor addresses based on ms_mode, check that at
least one address was found.

Otherwise, we mismatch arguments when calling sysfs/add_single_major
which emits a misleading error message to dmesg:

  libceph: resolve 'name=user1' (ret=-3): failed
  libceph: parse_ips bad ip 'name=user1,key=client.user1'

Fixes: https://tracker.ceph.com/issues/54128
Signed-off-by: Burt Holzman <burt@fnal.gov>
(cherry picked from commit 0076ffc86e043af7aedc127df8661eaf87fc1c58)

3 years ago qa/suites/krbd: add legacy+rxbounce and crc+rxbounce coverage 45001/head
Ilya Dryomov [Mon, 31 Jan 2022 13:08:26 +0000 (14:08 +0100)]
qa/suites/krbd: add legacy+rxbounce and crc+rxbounce coverage

For basic, rbd and rbd-nomount subsuites, replace legacy and crc
facets with "legacy or legacy+rxbounce" and "crc or crc+rxbounce"
facets (chosen at random).

For fsx, singleton and thrash subsuites, add legacy+rxbounce and
crc+rxbounce facets and drop prefer-crc facet.  The expected behaviour
of the latter depends on cluster configuration and should be tested
separately.

The total number of jobs remains the same.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit fbf8c1d68be60ab294719113edbd7f459a755c15)

3 years ago qa: krbd rxbounce test
Ilya Dryomov [Thu, 27 Jan 2022 16:15:01 +0000 (17:15 +0100)]
qa: krbd rxbounce test

Lives in its own directory since ms_mode doesn't need to be permuted
here.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 95d30b534ef65207168397dd25ca7213c8290568)

3 years ago rbd: recognize rxbounce map option
Ilya Dryomov [Wed, 26 Jan 2022 18:36:26 +0000 (19:36 +0100)]
rbd: recognize rxbounce map option

Fixes: https://tracker.ceph.com/issues/54063
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 8d2a456d7055cfb64e6bb9927187e2240b8c4d2a)

3 years ago librbd: report correct error for ictx->state->close() 45000/head
Ilya Dryomov [Tue, 7 Sep 2021 19:01:51 +0000 (21:01 +0200)]
librbd: report correct error for ictx->state->close()

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 22903c3965b15245b464540ccfb3404bf45f0627)

3 years agolibrbd: fix use-after-free on ictx in list_descendants()
Wang ShuaiChao [Tue, 7 Sep 2021 08:43:11 +0000 (16:43 +0800)]
librbd: fix use-after-free on ictx in list_descendants()

ictx is deleted when "ictx->state->open()" or "ictx->state->close()"
fails, so the subsequent "lderr(ictx->cct)" dereferences freed memory
and crashes.

Fixes: https://tracker.ceph.com/issues/52522
Signed-off-by: Wang ShuaiChao <wangshuaich@chinatelecom.cn>
(cherry picked from commit fa5d61ee5144f67cba53d54d36013614183e53a3)

3 years agoMerge pull request #44978 from ifed01/wip-ifed-clist-pend-bug-oct
Yuri Weinstein [Fri, 11 Feb 2022 22:41:42 +0000 (14:41 -0800)]
Merge pull request #44978 from ifed01/wip-ifed-clist-pend-bug-oct

octopus: os/bluestore: list obj which equals to pend

Reviewed-by: Mykola Golub <mgolub@mirantis.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
3 years agolibrbd: honor FUA op flag for write_same() in write-around cache 44992/head
Ilya Dryomov [Fri, 15 Oct 2021 16:13:55 +0000 (18:13 +0200)]
librbd: honor FUA op flag for write_same() in write-around cache

WriteAroundObjectDispatch::write_same() should pass op_flags through
to dispatch_io() so that it can bypass the cache if needed.

Fixes: https://tracker.ceph.com/issues/52956
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 0dcea098cf4d51bed31d8646dc3b533514c08a72)
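The flag pass-through described in this commit can be pictured with a small Python model. It is only a sketch: the class, the `FUA` bit value and the return strings are hypothetical, and the real code is the C++ WriteAroundObjectDispatch.

```python
FUA = 0x1  # hypothetical flag bit standing in for the real FUA op flag


class WriteAroundSketch:
    """Toy model of a write-around cache dispatch layer."""

    def __init__(self):
        self.cached = []  # writes acknowledged early through the cache
        self.direct = []  # writes that bypassed the cache

    def dispatch_io(self, extent, op_flags):
        # A FUA write must reach stable storage, so it bypasses the cache.
        if op_flags & FUA:
            self.direct.append(extent)
            return "direct"
        self.cached.append(extent)
        return "cached"

    def write_same(self, offset, length, op_flags=0):
        # The fix: forward op_flags instead of dropping them, so that
        # dispatch_io() can see the FUA bit.
        return self.dispatch_io((offset, length), op_flags)
```

If `write_same()` dropped `op_flags`, every request would take the "cached" path regardless of what the caller asked for.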

3 years agomgr/prometheus: Fix regression with OSD/host details/overview dashboards 44924/head
Patrick Seidensal [Mon, 25 Oct 2021 13:00:14 +0000 (15:00 +0200)]
mgr/prometheus: Fix regression with OSD/host details/overview dashboards

Fix issues with PromQL expressions and vector matching with the
`ceph_disk_occupation` metric.

As it turns out, `ceph_disk_occupation` cannot simply be used as
expected, as there seem to be some edge cases for users that have
several OSDs on a single disk.  This leads to issues which cannot be
solved by PromQL alone (many-to-many PromQL errors).  In these rare
cases the data simply differs from what we expected.

I have not found a sole PromQL solution to this issue. What we basically
need is the following.

1. Match on labels `host` and `instance` to get one or more OSD names
   from a metadata metric (`ceph_disk_occupation`) to let a user know
   about which OSDs belong to which disk.

2. Match on the `ceph_daemon` label of the `ceph_disk_occupation` metric,
   in which case the value of `ceph_daemon` must not refer to more than
   a single OSD. This is the exact opposite of requirement 1.

As both operations are currently performed on a single metric, and there
is no way to satisfy both requirements on a single metric, the intention
of this commit is to extend the metric by providing a similar metric
that satisfies one of the requirements. This enables the queries to
differentiate between a vector matching operation to show a string to
the user (where `ceph_daemon` could possibly be `osd.1` or
`osd.1+osd.2`) and to match a vector by having a single `ceph_daemon` in
the condition for the matching.

Although the `ceph_daemon` label is used on a variety of daemons, only
OSDs seem to be affected by this issue (only if more than one OSD is run
on a single disk).  This means that only the `ceph_disk_occupation`
metadata metric seems to need to be extended and provided as two
metrics.

`ceph_disk_occupation` is supposed to be used for matching the
`ceph_daemon` label value.

    foo * on(ceph_daemon) group_left ceph_disk_occupation

`ceph_disk_occupation_human` is supposed to be used for anything where
the resulting data is displayed to be consumed by humans (graphs, alert
messages, etc).

    foo * on(device,instance)
    group_left(ceph_daemon) ceph_disk_occupation_human

Fixes: https://tracker.ceph.com/issues/52974
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
(cherry picked from commit 18d3a71618a5e3bc3cbd0bce017fb7b9c18c2ca0)

Conflicts:
        monitoring/grafana/dashboards/host-details.json
        monitoring/grafana/dashboards/hosts-overview.json
        monitoring/grafana/dashboards/jsonnet/grafana_dashboards.jsonnet
        monitoring/grafana/dashboards/osd-device-details.json
        monitoring/grafana/dashboards/tests/features/hosts_overview.feature
        src/pybind/mgr/prometheus/module.py

- Octopus does not generate Grafana dashboards using jsonnet, hence
  grafana_dashboards.jsonnet was removed.
- Octopus does not support features, hence hosts_overview.feature was
  removed.
- Features implemented in prometheus/module.py that never were
  backported to Octopus were removed.
- `tox.ini` file adapted to include mgr/prometheus tests introduced by
  the backport.
- Add `cherrypy` to src/pybind/mgr/requirements.txt to fix Prometheus
  unit testing.
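The split into two metrics described in this commit can be sketched in Python. This is an illustration of the idea only, not the code in mgr/prometheus: the function name and the input tuple layout are assumptions, and only the label names and the `osd.1+osd.2` form come from the commit message.

```python
from collections import defaultdict


def disk_occupation_series(osd_devices):
    """Build label sets for both metrics.

    osd_devices: list of (ceph_daemon, device, instance) tuples.
    Returns (per_daemon, human):
      per_daemon: one series per OSD, safe for on(ceph_daemon) matching.
      human:      one series per disk, OSD names joined with "+".
    """
    per_daemon = [
        {"ceph_daemon": daemon, "device": dev, "instance": inst}
        for daemon, dev, inst in osd_devices
    ]

    by_disk = defaultdict(list)
    for daemon, dev, inst in osd_devices:
        by_disk[(dev, inst)].append(daemon)

    human = [
        {"ceph_daemon": "+".join(sorted(daemons)), "device": dev, "instance": inst}
        for (dev, inst), daemons in sorted(by_disk.items())
    ]
    return per_daemon, human
```

With two OSDs sharing one disk, the first list keeps a single OSD per `ceph_daemon` value, while the second has exactly one series per (device, instance) pair, which is what avoids many-to-many matches in each kind of query.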

3 years agoos/bluestore: list obj which equals to pend 44978/head
Kefu Chai [Fri, 24 Sep 2021 15:33:03 +0000 (23:33 +0800)]
os/bluestore: list obj which equals to pend

Otherwise we could have failures like

scrub : stat mismatch, got 3/4 objects, 1/2 clones, 3/4 dirty, 3/4 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 49/56 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes.

where the numbers of scrubbed objects, clones, dirty objects and omap
objects are always lower than the corresponding totals, if the PG
contains object(s) whose hash happens to be 0xffffffff.

In this change, if the calculated hash of the upper bound is greater
than the maximum value representable by uint32_t, in addition to
setting the hash of the upper bound hobj to 0xffffffff, we also set the
nspace of the upper bound hobj to "\xff", so that the upper bound
is greater than an hobj whose hash happens to be 0xffffffff. Please note,
the nspace of "\xff" is not an ASCII string, so it's not likely to be
less than a real-world nspace of an hobj.

With this new *greater* upper bound, we are able to include the
previously missing hobj when listing the objects in a PG, so scrub no
longer reports a mismatch in the number of objects.

Fixes: https://tracker.ceph.com/issues/52705
Signed-off-by: Mykola Golub <mykola.golub@clyso.com>
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit ffab13bcd9006c1f961a24b8016df9d1fe06ba1d)

 Conflicts:
src/os/bluestore/BlueStore.cc
 - get_coll_range function signature alignment
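The effect of the new upper bound can be reproduced with a simplified ordering model. This is a sketch under stated assumptions: objects are compared as plain (hash, nspace, name) tuples and listed over a half-open range, whereas the real hobject ordering in BlueStore has more fields; `list_range` and `upper_bound` are hypothetical helpers.

```python
MAX_HASH = 0xFFFFFFFF


def list_range(objects, start, end):
    """Return objects whose key satisfies start <= key < end."""
    return sorted(o for o in objects if start <= o < end)


def upper_bound(next_hash, fixed):
    """Compute the exclusive upper bound for a listing.

    When the calculated hash overflows uint32_t it is clamped to
    0xffffffff. With fixed=True the nspace is also set to "\xff" so the
    bound sorts after any object that has hash 0xffffffff.
    """
    if next_hash > MAX_HASH:
        return (MAX_HASH, "\xff" if fixed else "", "")
    return (next_hash, "", "")


objs = [(0x10, "", "a"), (MAX_HASH, "", "b")]
start = (0, "", "")
old = list_range(objs, start, upper_bound(MAX_HASH + 1, fixed=False))
new = list_range(objs, start, upper_bound(MAX_HASH + 1, fixed=True))
```

In this model `old` misses the object whose hash is 0xffffffff, while `new` contains both objects, which mirrors the stat mismatch that scrub reported.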

3 years agoMerge pull request #44614 from ifed01/wip-ifed-fix-ram-gridy-fsck-oct
Yuri Weinstein [Thu, 10 Feb 2022 14:37:21 +0000 (06:37 -0800)]
Merge pull request #44614 from ifed01/wip-ifed-fix-ram-gridy-fsck-oct

octopus: os/bluestore: make shared blob fsck much less RAM-greedy.

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
3 years agoos/bluestore: use scope_guard to log latency
Kefu Chai [Wed, 22 Sep 2021 16:42:33 +0000 (00:42 +0800)]
os/bluestore: use scope_guard to log latency

This is simpler and avoids using `goto`.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 715a83822ebc1a3d102d1ec13323b69db0600719)
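The same pattern can be shown with a Python analogue. The commit itself uses a C++ scope guard; the context manager below is a swapped-in equivalent, and `log_latency` and `collection_list` are hypothetical names for the example.

```python
import time
from contextlib import contextmanager


@contextmanager
def log_latency(name, sink, clock=time.monotonic):
    """Record the elapsed time of the enclosed block on every exit path."""
    start = clock()
    try:
        yield
    finally:
        # Runs on normal return, early return and exceptions alike,
        # which is what removes the need for a shared exit label.
        sink.append((name, clock() - start))


def collection_list(items, limit, sink):
    with log_latency("collection_list", sink):
        if limit <= 0:
            return []  # early exit still records latency
        return items[:limit]
```

Each early return is covered by the guard, so no exit path can forget to log.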

3 years agoceph-volume/activate: load the config from lv tag 44974/head
Guillaume Abrioux [Thu, 10 Feb 2022 01:23:51 +0000 (02:23 +0100)]
ceph-volume/activate: load the config from lv tag

When `ceph-volume lvm trigger` is called with an OSD where the tag
`ceph.cluster_name` is not 'ceph', it fails.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ac1ec65cb2a582b2ae550202cc9911f993943f2)

3 years agoceph-volume/tests: use centos/stream8 images
Guillaume Abrioux [Wed, 9 Feb 2022 17:33:27 +0000 (18:33 +0100)]
ceph-volume/tests: use centos/stream8 images

Following the recent move from CentOS 8 to CentOS Stream 8, do the same here.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2b793952bbac7973b97d245c282165daadeabb51)

3 years agoceph-volume/tests: add tests in util/encryption.py
Guillaume Abrioux [Wed, 9 Feb 2022 16:04:19 +0000 (17:04 +0100)]
ceph-volume/tests: add tests in util/encryption.py

This adds unit tests to cover `luks_format()` and `luks_open()`
in `util/encryption.py`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit db48850745f218e08cf53ae2d8edf3428f2b4010)

3 years agoceph-volume: honour osd_dmcrypt_key_size option
Guillaume Abrioux [Tue, 25 Jan 2022 09:25:53 +0000 (10:25 +0100)]
ceph-volume: honour osd_dmcrypt_key_size option

ceph-volume doesn't honour osd_dmcrypt_key_size, which means the default
size is always applied.

This commit makes ceph-volume honour the option and also changes the
default value in `get_key_size_from_conf()`.

From cryptsetup manpage:

> For XTS mode you can optionally set a key size of 512 bits with the -s option.

Using more than 512 bits will end up with the following error message:

```
Key size in XTS mode must be 256 or 512 bits.
```

Fixes: https://tracker.ceph.com/issues/54006
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 47c33179f9a15ae95cc1579a421be89378602656)
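The constraint quoted from the cryptsetup manpage can be expressed as a small validation helper. This is a sketch only: `resolve_dmcrypt_key_size` is a hypothetical name, it is not the body of `get_key_size_from_conf()`, and the default of 512 is an assumption for the example.

```python
VALID_XTS_KEY_SIZES = (256, 512)


def resolve_dmcrypt_key_size(conf_value=None, default=512):
    """Return the key size to use, honouring a configured value if present.

    conf_value: the configured osd_dmcrypt_key_size, or None when unset.
    default:    fallback size (512 is an assumption of this sketch).
    """
    size = default if conf_value is None else int(conf_value)
    if size not in VALID_XTS_KEY_SIZES:
        # Mirrors the cryptsetup error message quoted above.
        raise ValueError("Key size in XTS mode must be 256 or 512 bits.")
    return size
```

Validating before invoking cryptsetup turns a late tool failure into an immediate, readable error.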

3 years agomds: ignore unknown client op when tracking op latency 44976/head
Venky Shankar [Mon, 13 Dec 2021 06:15:19 +0000 (01:15 -0500)]
mds: ignore unknown client op when tracking op latency

Server::handle_client_request() ignores unknown client operation
by returning -ENOTSUPP, however, Server::perf_gather_op_latency()
aborts on unknown client op, thereby causing -ENOTSUPP to never
reach the client. ceph_abort() seems unnecessary here.

Note, we could have invoked Server::perf_gather_op_latency()
when the return value to client is not -ENOTSUPP, however,
a valid client operation *might* just return -ENOTSUPP in
some cases.

@mchangir ran into this with his getvxattr op changes (PR #42001).

Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit 2f4060b8c41004d10d9a64676ccd847f6e1304dd)
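The behaviour change can be sketched as follows. This is an illustrative Python model, not the MDS code: the op names and counter names in `OP_COUNTERS` are hypothetical, and the real function is the C++ Server::perf_gather_op_latency().

```python
import logging

log = logging.getLogger("mds-sketch")

# Hypothetical mapping from client op to latency counter name.
OP_COUNTERS = {
    "lookup": "req_lookup_latency",
    "getattr": "req_getattr_latency",
}


def perf_gather_op_latency(op, latency, counters):
    """Accumulate latency for known ops and ignore unknown ones.

    Returns True when a counter was updated, False otherwise.
    """
    name = OP_COUNTERS.get(op)
    if name is None:
        # Previously this path aborted; ignoring it lets the error
        # reply for the unsupported op reach the client.
        log.debug("ignoring unknown client op %s", op)
        return False
    counters[name] = counters.get(name, 0.0) + latency
    return True
```

Skipping the counter update for an unknown op keeps latency tracking from interfering with the reply path.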

3 years agooctopus: qa/ceph-ansible: Move to Centos Stream 44986/head
Brad Hubbard [Thu, 10 Feb 2022 03:16:23 +0000 (13:16 +1000)]
octopus: qa/ceph-ansible: Move to Centos Stream

CentOS 8 is EOL and its package repos no longer exist.

Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
3 years agoMerge pull request #44860 from aclamk/wip-53392-octopus
Yuri Weinstein [Thu, 10 Feb 2022 01:10:58 +0000 (17:10 -0800)]
Merge pull request #44860 from aclamk/wip-53392-octopus

octopus: Fix data corruption in bluefs truncate()

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
3 years agosystemd: Set PrivateDevices=false in ceph-mon@.service 44960/head
Benoît Knecht [Mon, 6 Dec 2021 08:29:43 +0000 (09:29 +0100)]
systemd: Set PrivateDevices=false in ceph-mon@.service

The `ceph-mon` daemon needs access to block devices to check the health of the
disk that backs its DB store (#24151).

Fixes: https://tracker.ceph.com/issues/52416
Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit 96de1c97608b81ab80d4be3160ac05d11d4b23c8)

Conflicts:
      systemd/ceph-mon@.service.in

3 years agomon: Abort device health when device not found
Benoît Knecht [Mon, 6 Dec 2021 08:14:56 +0000 (09:14 +0100)]
mon: Abort device health when device not found

If `store->get_devname()` returns an empty device name, it means it couldn't
determine the device that backs the monitor DB store directory.

This can happen if `ceph-mon` runs with `PrivateDevices=yes` in systemd, or
within a container where the host `/dev` isn't exposed.

This commit makes sure we abort trying to get the device health at that point,
and return an appropriate error.

Signed-off-by: Benoît Knecht <bknecht@protonmail.ch>
(cherry picked from commit c05a3b769dccf5fe839a2150e39d899516469164)
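The early abort can be sketched in Python. This is a minimal model of the control flow only: the callables, the return convention and the choice of ENOENT are assumptions for the example, not the monitor's actual code or error value.

```python
import errno


def device_health(get_devname, read_health):
    """Return (retcode, payload) for a device health query.

    get_devname: callable returning the backing device name, or "".
    read_health: callable taking a device name and returning metrics.
    """
    devname = get_devname()
    if not devname:
        # Stop here instead of querying health for an empty device name.
        return -errno.ENOENT, "unable to determine the device backing the store"
    return 0, read_health(devname)
```

Returning an explicit error for the empty name keeps the health query from ever running against a device that was never resolved.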

3 years agoMerge pull request #44929 from adk3798/octopus-cephadm-qa-centos-8-stream
Yuri Weinstein [Tue, 8 Feb 2022 17:26:29 +0000 (09:26 -0800)]
Merge pull request #44929 from adk3798/octopus-cephadm-qa-centos-8-stream

octopus: qa/suites/rados/cephadm: use centos 8.stream

Reviewed-by: Yuri Weinstein <yweinste@redhat.com>
3 years agoMerge pull request #44864 from cbodley/wip-54089
Yuri Weinstein [Tue, 8 Feb 2022 15:21:49 +0000 (07:21 -0800)]
Merge pull request #44864 from cbodley/wip-54089

octopus: qa: remove centos8 from supported distros

Reviewed-by: Neha Ojha <nojha@redhat.com>
3 years agoqa/suites/rados/cephadm: remove centos 8.2, 8.3 44929/head
Adam King [Mon, 7 Feb 2022 18:18:17 +0000 (13:18 -0500)]
qa/suites/rados/cephadm: remove centos 8.2, 8.3

Signed-off-by: Adam King <adking@redhat.com>
3 years agoqa/suites/orch/cephadm: add 8.stream + container_tools
Sage Weil [Mon, 8 Nov 2021 17:01:45 +0000 (11:01 -0600)]
qa/suites/orch/cephadm: add 8.stream + container_tools

Signed-off-by: Sage Weil <sage@newdream.net>
Conflicts:
qa/suites/rados/cephadm/upgrade/1-start-distro/1-start-centos_8.stream_container-tools.yaml

3 years agoosd/OSDMapMapping: fix spurious threadpool timeout errors 44546/head
Sage Weil [Mon, 6 Dec 2021 18:12:50 +0000 (13:12 -0500)]
osd/OSDMapMapping: fix spurious threadpool timeout errors

We were passing a grace of zero seconds to our temporary work queue, which
led to the HeartbeatMap issuing cpu_tp timeout errors to the log.  By using
a non-zero grace period we can avoid these.  Use the same default grace
we use for the workqueue itself when it goes to sleep.

Fixes: https://tracker.ceph.com/issues/53506
Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 30ac5e79352839e285ec2eef6d603226d9071db4)

Conflicts:
src/osd/OSDMapMapping.h

Cherry-pick notes:
- Octopus was passing integer 0 as WorkQueue time_t args vs. ceph::timespan::zero()

3 years agoqa/rgw: rgw/verify no longer pins centos 8.0 44864/head
Casey Bodley [Mon, 31 Jan 2022 22:23:25 +0000 (17:23 -0500)]
qa/rgw: rgw/verify no longer pins centos 8.0

the symlink rgw/verify/centos_latest.yaml already selects centos

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 0fad609d4dca01335abda6c48ae2663a8fd15494)

3 years agoqa/distros: remove duplicate centos_8.stream.yaml from supported
Casey Bodley [Mon, 31 Jan 2022 19:52:04 +0000 (14:52 -0500)]
qa/distros: remove duplicate centos_8.stream.yaml from supported

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 3b71b41190bbb0af5026babc82266541b6398e92)

3 years agoqa/distros: centos_8.yaml is now a symlink to centos_8.stream.yaml
Casey Bodley [Mon, 31 Jan 2022 19:51:00 +0000 (14:51 -0500)]
qa/distros: centos_8.yaml is now a symlink to centos_8.stream.yaml

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 0f4e51f05f9b340fe6128b46ea4601ecf01625d2)

3 years agoqa/distro/supported: add centos 8.stream
Sage Weil [Fri, 18 Jun 2021 23:07:30 +0000 (18:07 -0500)]
qa/distro/supported: add centos 8.stream

Signed-off-by: Sage Weil <sage@newdream.net>
(cherry picked from commit 577cbd162ed63bcee9f027776d72d569d9adf93b)

3 years agoos/bluestore/bluefs: Fix data corruption in truncate() 44860/head
Adam Kupczyk [Tue, 2 Nov 2021 15:57:32 +0000 (16:57 +0100)]
os/bluestore/bluefs: Fix data corruption in truncate()

It is possible to create a condition in which BlueFS contains a corrupted file.
This can happen when the BlueFS replay log is on device A and we have just written to device B and truncated the file.

Scenario:
1) write to file h1 on SLOW device
2) flush h1 (initiate transfer, but no fdatasync yet)
3) truncate h1
4) write to file h2 on DB
5) fsync h2 (forces replay log to be written, after fdatasync to DB)
6) poweroff

Fixes: https://tracker.ceph.com/issues/53129
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 49b7b44b3b5c94ee401562e603999e2b3bd8f9a2)