git.apps.os.sepia.ceph.com Git - ceph.git/log
3 years ago rgw/admin: fix radosgw-admin datalog list max-entries issue 45499/head
Yuval Lifshitz [Wed, 2 Feb 2022 14:53:21 +0000 (16:53 +0200)]
rgw/admin: fix radosgw-admin datalog list max-entries issue

Fixes: https://tracker.ceph.com/issues/54116
Signed-off-by: Yuval Lifshitz <ylifshit@redhat.com>
(cherry picked from commit bd429ed9bec8aa4dc17c61a07e30987f50f7e5f6)

3 years ago Merge pull request #45342 from benhanokh/wip-54523-quincy
Yuri Weinstein [Wed, 16 Mar 2022 20:44:02 +0000 (13:44 -0700)]
Merge pull request #45342 from benhanokh/wip-54523-quincy

quincy: OSD::Modify OSD Fast-Shutdown to work safely i.e. quiesce all activit…

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
3 years ago Merge pull request #45322 from ljflores/wip-54467-quincy
Yuri Weinstein [Wed, 16 Mar 2022 20:42:40 +0000 (13:42 -0700)]
Merge pull request #45322 from ljflores/wip-54467-quincy

quincy: osd: require osd_pg_max_concurrent_snap_trims > 0

Reviewed-by: Neha Ojha <nojha@redhat.com>
3 years ago Merge pull request #45237 from k0ste/wip-54449-quincy
Yuri Weinstein [Wed, 16 Mar 2022 20:41:06 +0000 (13:41 -0700)]
Merge pull request #45237 from k0ste/wip-54449-quincy

quincy: mgr/prometheus: Added `avail_raw` field for Pools DF Prometheus mgr module

Reviewed-by: Neha Ojha <nojha@redhat.com>
3 years ago Merge pull request #45193 from ronen-fr/wip-rf-45068-quincy
Yuri Weinstein [Wed, 16 Mar 2022 20:40:26 +0000 (13:40 -0700)]
Merge pull request #45193 from ronen-fr/wip-rf-45068-quincy

quincy: osd/scrub: stop sending bogus digest-update event messages

Reviewed-by: Neha Ojha <nojha@redhat.com>
3 years ago Merge pull request #45321 from kamoltat/wip-ksirivad-backport-quincy-fix-autoscale-doc
Kamoltat Sirivadhna [Mon, 14 Mar 2022 20:30:43 +0000 (16:30 -0400)]
Merge pull request #45321 from kamoltat/wip-ksirivad-backport-quincy-fix-autoscale-doc

quincy: doc/rados/operations/placement-groups: fix --bulk docs
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
3 years ago os/bluestore: Fix problem with allocation desync 45342/head
Gabriel BenHanokh [Mon, 7 Mar 2022 15:36:34 +0000 (17:36 +0200)]
os/bluestore: Fix problem with allocation desync

Close window for possibility to capture allocator state and bluefs state
that are not in sync.

Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
(cherry picked from commit 8d052558bed4a9761c3b181253568a8686ee2df2)

3 years ago os/bluestore/bluefs: Fix sync compaction
Adam Kupczyk [Thu, 3 Mar 2022 14:39:00 +0000 (15:39 +0100)]
os/bluestore/bluefs: Fix sync compaction

Fixes a problem with sync compaction (_rewrite_log_and_layout_sync):
log_seq was not updated after compacting the log.

This caused _replay to stop reading the log right after the first transaction.

... 20 bluefs _replay 0x0:  op_dir_create sharding
... 20 bluefs _replay 0x0:  op_dir_link  sharding/def to 21
... 20 bluefs _replay 0x0:  op_jump_seq 1025
... 10 bluefs _read h 0x555557c46400 0x1000~1000 from file(ino 1 size 0x1000 mtime 0.000000 allocated 410000 alloc_commit 410000 extents [1:0x1540000~410000])
... 20 bluefs _read left 0xff000 len 0x1000
... 20 bluefs _read got 4096
... 10 bluefs _replay 0x1000: stop: seq 1025 != expected 1026

This is a product of the bluefs fine-grained locks refactor.
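
The failure in the log above can be reproduced with a toy model of the replay loop. This is only a simplified sketch, not the BlueFS code: the `replay` function, the `(seq, ops)` tuple layout, and the way `jump_seq` adjusts the counter are assumptions chosen to match the logged `seq 1025 != expected 1026`.

```python
def replay(transactions):
    """Return how many transactions are applied before replay stops.

    Toy model (assumption): each transaction is (seq, ops), and replay
    accepts a transaction only if its seq equals log_seq + 1.
    """
    log_seq = 0
    applied = 0
    for seq, ops in transactions:
        if seq != log_seq + 1:
            break  # corresponds to "stop: seq X != expected Y" in the log
        for op, arg in ops:
            if op == "jump_seq":
                log_seq = arg - 1  # incremented just below
        log_seq += 1
        applied += 1
    return applied


# Writer that did not bump its own log_seq after compaction: reuses 1025.
stale = [(1, [("jump_seq", 1025)]), (1025, [])]
# Writer that did bump it: the next transaction carries 1026.
fixed = [(1, [("jump_seq", 1025)]), (1026, [])]
```

In this model `replay(stale)` stops after the first transaction, while `replay(fixed)` applies both.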

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 2f8e37064ca079c960929d7bb91e84fbf7f5cd47)

Conflicts:
src/test/objectstore/test_bluefs.cc
(cherry picked from commit 4fd98ce0359d6c3a36f08a3d87a78c3f0b65018d)

3 years ago osd: Modify OSD Fast-Shutdown to work safely
Gabriel BenHanokh [Mon, 7 Mar 2022 15:16:54 +0000 (17:16 +0200)]
osd: Modify OSD Fast-Shutdown to work safely

quiesce all activities and destage allocations to disk before killing the OSD

    1) keep the old (unsafe) fast-shutdown when we are not using NCB (non null-manager())
    2) skip service.prepare_to_stop() which can take as much as 10 seconds
    3) skip debug options in fast-shutdown
    4) set_state(STATE_STOPPING) which will stop accepting new tasks to this OSD
    5) clear op_shardedwq queues, this is safe since we haven't started processing them
    6) stop timer
    7) drain osd_op_tp (no new items will be added)
    8) now we can safely call umount which will close_db/bluefs and will destage allocation to disk
    9) skip _shutdown_cache() when we are in the middle of a fast-shutdown
    10) increase debug level on fast-shutdown
    11) add option for bluestore_qfsck_on_mount to force scan on mount for all tests
    12) disable fsck-on-umount when running fast-shutdown
    13) add an option to increase debug level at fast-shutdown umount()
    14) set a time limit to fast-shutdown

    15) Bug fix: BlueStore::pool_statfs must not access the db after it was removed
    16) Fix error message for qfsck (error was caused by PR https://github.com/ceph/ceph/pull/44563)

    17) make shutdown-timeout configurable

Fixes: https://tracker.ceph.com/issues/53266
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
(cherry picked from commit 9b2a64a5f6ea743b2a4f4c2dbd703248d88b2a96)

3 years ago osd: require osd_pg_max_concurrent_snap_trims > 0 45322/head
Dan van der Ster [Thu, 24 Feb 2022 08:42:00 +0000 (09:42 +0100)]
osd: require osd_pg_max_concurrent_snap_trims > 0

If osd_pg_max_concurrent_snap_trims is zero, we mistakenly clear
the snaptrim queue. Require it to be > 0.

Fixes: https://tracker.ceph.com/issues/54396
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit 29545b617b3b0324f9b0b20e032e3e38557115eb)

3 years ago doc/rados/operations/placement-groups: typo two 'the' 45321/head
Kamoltat [Wed, 9 Mar 2022 15:36:20 +0000 (15:36 +0000)]
doc/rados/operations/placement-groups: typo two 'the'

Fix a typo: there should not be two 'the' next to each
other.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
(cherry picked from commit 7f1c7637e229c6a1bd69a4b204a56ae49c7fec44)

3 years ago doc/rados/operations/placement-groups: fix --bulk commands
Kamoltat [Mon, 7 Mar 2022 14:52:41 +0000 (14:52 +0000)]
doc/rados/operations/placement-groups: fix --bulk commands

Some parts of the documentation regarding
the bulk flag have typos.

Command for creating a pool

was: `ceph osd create test_pool --bulk`

should be: `ceph osd pool create test_pool --bulk`

Command for setting bulk value in a pool

was: `ceph osd pool set test_pool bulk=<true/false/1/0>`

should be: `ceph osd pool set test_pool bulk <true/false/1/0>`

Also removed some trailing whitespace.

Changed `complements` to `complement`.

https://tracker.ceph.com/issues/54485

Signed-off-by: Kamoltat <ksirivad@redhat.com>
(cherry picked from commit 4a01fc77985e5cf919b99eca86c4c7e8aae242f0)

3 years ago Merge pull request #45263 from idryomov/wip-cmake-disable-dpdk-warnings-quincy
Kefu Chai [Sun, 6 Mar 2022 08:05:52 +0000 (16:05 +0800)]
Merge pull request #45263 from idryomov/wip-cmake-disable-dpdk-warnings-quincy

quincy: cmake: pass RTE_DEVEL_BUILD=n when building dpdk

Reviewed-by: Kefu Chai <tchaikov@gmail.com>
3 years ago cmake: pass RTE_DEVEL_BUILD=n when building dpdk 45263/head
Kefu Chai [Sat, 5 Mar 2022 04:49:57 +0000 (12:49 +0800)]
cmake: pass RTE_DEVEL_BUILD=n when building dpdk

Ceph still uses the Makefile-based build system for building
DPDK, and DPDK enables -Werror if RTE_DEVEL_BUILD is 'y', which is
the default when DPDK is built from a git repo.

Newer GCC is pickier than older versions. To prevent a possible
FTBFS when we switch to a newer GCC for building old
branches whose dpdk submodule might not include the changes addressing
those warnings, just disable this option.

The only effect of this option is to add -Werror to CFLAGS, and
build warnings from DPDK are not our focus when developing
Ceph in most cases, so this should be fine.

see also
https://github.com/ceph/dpdk/blob/eac901ce29be559b1bb5c5da33fe2bf5c0b4bfd6/doc/build-sdk-quick.txt#L18

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 91a616b26e830e85962200d0bac86bee7e022892)

3 years ago Merge pull request #45145 from guits/wip-54401-quincy
Guillaume Abrioux [Fri, 4 Mar 2022 12:07:04 +0000 (13:07 +0100)]
Merge pull request #45145 from guits/wip-54401-quincy

quincy: ceph-volume: abort when passed devices have partitions

3 years ago Merge pull request #45232 from guits/wip-54454-quincy
Guillaume Abrioux [Fri, 4 Mar 2022 07:31:11 +0000 (08:31 +0100)]
Merge pull request #45232 from guits/wip-54454-quincy

quincy: ceph-volume: fix generic activate

3 years ago ceph-volume: fix generic activate 45232/head
Guillaume Abrioux [Tue, 1 Mar 2022 23:38:17 +0000 (00:38 +0100)]
ceph-volume: fix generic activate

Commit afd8be7eac5e996c3bd07656601a4534053e2516 broke it.
It dropped `block_wal` and `block_db` from
`ceph_volume.devices.raw.activate.activate_bluestore`, but
`activate.main.Activate.main` still passes those arguments when
calling `RAWActivate([]).activate()`.
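
The kind of breakage described here is easy to reproduce in isolation. The sketch below is not ceph-volume code: `activate_bluestore` is a hypothetical stand-in whose remaining parameters (`meta`, `tmpfs`, `systemd`) are assumptions, and only the dropped `block_wal`/`block_db` names come from the message above.

```python
def activate_bluestore(meta, tmpfs, systemd):
    """Hypothetical callee after `block_wal` and `block_db` were dropped."""
    return {"meta": meta, "tmpfs": tmpfs, "systemd": systemd}


def call_with_stale_arguments():
    """Call the way a not-yet-updated caller would; return the error text."""
    try:
        activate_bluestore({"osd_id": 0}, tmpfs=True, systemd=False,
                           block_wal=None, block_db=None)
    except TypeError as exc:
        # Python rejects keyword arguments the callee no longer accepts.
        return str(exc)
    return None
```

Passing a keyword the callee no longer declares raises `TypeError` at call time, which is why the caller and callee signatures have to be changed together.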

Fixes: https://tracker.ceph.com/issues/54441
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 3337b62e859673cba908bf8e12c7f3f23fddf2c2)

3 years ago Merge pull request #45165 from kotreshhr/quincy-mgr-volumes-backport
Yuri Weinstein [Thu, 3 Mar 2022 15:44:34 +0000 (07:44 -0800)]
Merge pull request #45165 from kotreshhr/quincy-mgr-volumes-backport

quincy: mgr/volumes:  A few mgr/volumes backports

Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 years ago mgr/prometheus: added `avail_raw` field for Pools DF Prometheus mgr module 45237/head
Konstantin Shalygin [Mon, 6 Sep 2021 07:54:23 +0000 (14:54 +0700)]
mgr/prometheus: added `avail_raw` field for Pools DF Prometheus mgr module

Fixes: https://tracker.ceph.com/issues/52512
Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
(cherry picked from commit 3a78b9b6b8d8593ff3838b8efe630a210fd1a142)

3 years ago ceph-volume: abort when passed devices have partitions 45145/head
Guillaume Abrioux [Wed, 23 Feb 2022 08:36:29 +0000 (09:36 +0100)]
ceph-volume: abort when passed devices have partitions

ceph-volume doesn't prevent the use of db and/or wal devices
with existing partitions on them.
This can lead to data loss.
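
A guard of this kind can be sketched as follows. This is an illustration only, not the ceph-volume implementation: the `check_no_partitions` name and the mapping of device path to partition names are assumptions.

```python
def check_no_partitions(devices):
    """Abort if any candidate db/wal device already carries partitions.

    `devices` maps a device path to the list of partition names found
    on it (hypothetical layout used only for this sketch).
    """
    offending = sorted(path for path, parts in devices.items() if parts)
    if offending:
        raise RuntimeError(
            "refusing devices with partitions: " + ", ".join(offending))
    return True
```

Failing before anything is written is the point of the check: the error is raised while the existing partitions are still intact.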

Fixes: https://tracker.ceph.com/issues/54376
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 75c91a8c6f37a38d69d5da8b1e7d49d9c636230b)

3 years ago Merge pull request #45197 from rhcs-dashboard/cephadm-image-quincy
Yuri Weinstein [Tue, 1 Mar 2022 20:20:09 +0000 (12:20 -0800)]
Merge pull request #45197 from rhcs-dashboard/cephadm-image-quincy

quincy: cephadm: change ceph-ci image from master to quincy

Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
3 years ago Merge pull request #45134 from soumyakoduri/quincy
Casey Bodley [Tue, 1 Mar 2022 12:52:21 +0000 (07:52 -0500)]
Merge pull request #45134 from soumyakoduri/quincy

quincy: rgw/qa: Add test suite for lifecycle cases

Reviewed-by: Casey Bodley <cbodley@redhat.com>
3 years ago rgw/qa: Add test suite for lifecycle cases 45134/head
Soumya Koduri [Fri, 17 Dec 2021 11:32:37 +0000 (17:02 +0530)]
rgw/qa: Add test suite for lifecycle cases

Execute lifecycle s3-tests in the teuthology test-suite by configuring
required storage classes and 'rgw lc debug interval' option.

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
3 years ago Merge pull request #45091 from liumiaomiaoIntel/qatenable
Yuri Weinstein [Mon, 28 Feb 2022 15:49:57 +0000 (07:49 -0800)]
Merge pull request #45091 from liumiaomiaoIntel/qatenable

quincy: common: fix compilation and function issues about compressor and crypto to enable latest QAT driver

Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
3 years ago Merge pull request #45097 from vshankar/wip-54218
Yuri Weinstein [Mon, 28 Feb 2022 15:48:21 +0000 (07:48 -0800)]
Merge pull request #45097 from vshankar/wip-54218

quincy: mds: fix seg fault in expire_recursive

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Laura Flores <lflores@redhat.com>
Reviewed-by: Nikhilkumar Shelke <nshelke@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 years ago cephadm: change ceph-ci image from master to quincy 45197/head
Nizamudeen A [Mon, 28 Feb 2022 13:01:36 +0000 (18:31 +0530)]
cephadm: change ceph-ci image from master to quincy

The quincy image is available in the quay.io repo, and we should use it for
the quincy branch, at least until v17 is released.

Signed-off-by: Nizamudeen A <nia@redhat.com>
3 years ago osd/scrub: stop sending bogus digest-update event messages 45193/head
Ronen Friedman [Thu, 17 Feb 2022 09:29:22 +0000 (09:29 +0000)]
osd/scrub: stop sending bogus digest-update event messages

A minimal change extracted from PR#44050, to facilitate
backporting.

The multitudes of bogus events generated fill up the logs.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
(cherry picked from commit e1b5347b81d17c8a5a1f6e1d4d76d18977ec2b0c)
Conflicts: the logic changes were already part of Quincy. What is left is
  the removal of an unneeded log message.

3 years ago Merge remote-tracking branch 'gh/quincy' into quincy
David Galloway [Fri, 25 Feb 2022 16:26:38 +0000 (11:26 -0500)]
Merge remote-tracking branch 'gh/quincy' into quincy

3 years ago Merge pull request #45098 from vshankar/wip-54216
Laura Flores [Fri, 25 Feb 2022 15:26:56 +0000 (09:26 -0600)]
Merge pull request #45098 from vshankar/wip-54216

quincy: mds: kill session state are open when mds do ms_handle_remote_reset

3 years ago Merge pull request #45017 from Vicente-Cheng/wip-54196-quincy
Laura Flores [Fri, 25 Feb 2022 15:25:41 +0000 (09:25 -0600)]
Merge pull request #45017 from Vicente-Cheng/wip-54196-quincy

quincy: mds: mds_oft_prefetch_dirfrags default to false

3 years ago mgr/volumes: Fix subvolumegroup ls 45165/head
Kotresh HR [Tue, 1 Feb 2022 11:06:34 +0000 (16:36 +0530)]
mgr/volumes: Fix subvolumegroup ls

The subvolumegroup ls command listed the '_deleting' directory, which is
internal to 'mgr/volumes' and should not be listed as a
subvolumegroup. This patch fixes this by filtering it out.
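
The filtering can be pictured with a short sketch. It is not the mgr/volumes code: `INTERNAL_DIRS` and `list_subvolumegroups` are hypothetical names, and only the '_deleting' directory name is taken from the message above.

```python
# Directory names internal to the module (only "_deleting" is from the commit).
INTERNAL_DIRS = {"_deleting"}


def list_subvolumegroups(dir_entries):
    """Return the entries that should be shown as subvolumegroups."""
    return [name for name in dir_entries if name not in INTERNAL_DIRS]
```

Given the entries `["group_a", "_deleting", "group_b"]`, only the two real groups remain in the listing, in their original order.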

Fixes: https://tracker.ceph.com/issues/54099
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit a125b0e8a22ba6c8d14f25fe85ce8d0771299c6c)

3 years ago qa: Add test for subvolumegroup ls filter
Kotresh HR [Tue, 1 Feb 2022 11:08:41 +0000 (16:38 +0530)]
qa: Add test for subvolumegroup ls filter

Fixes: https://tracker.ceph.com/issues/54099
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 95a83efcbe7b7faf2a292889e7c7ea7fcc629749)

3 years ago mgr/volumes: Inherit file quota attr to clone
Kotresh HR [Fri, 11 Feb 2022 08:40:16 +0000 (14:10 +0530)]
mgr/volumes: Inherit file quota attr to clone

The file quota attribute 'ceph.quota.max_files'
is not inherited by the cloned subvolume. This
patch fixes that.

Fixes: https://tracker.ceph.com/issues/54121
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 72671c8ead126fdbcb39a2f179c001fb1fe43fe5)

3 years ago qa: Validate file quota attrs on clone subvolume
Kotresh HR [Thu, 3 Feb 2022 06:01:48 +0000 (11:31 +0530)]
qa: Validate file quota attrs on clone subvolume

Fixes: https://tracker.ceph.com/issues/54121
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit 96c79634043b06ab8a2a1fc445998c8f64929aed)

3 years ago mgr/volumes: Fix clone uid/gid mismatch
Kotresh HR [Thu, 10 Feb 2022 05:34:41 +0000 (11:04 +0530)]
mgr/volumes: Fix clone uid/gid mismatch

This is a regression caused by commit 18b85c53a.
The 'set_attrs' function sets the uid/gid of the
group on the subvolume if uid/gid is not passed.
The attrs of the clone should match the source
snapshot. Hence, don't use the 'set_attrs'
function when only the quota attrs need to be
set for the clone.
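
The regression can be illustrated with a toy model. The functions below are hypothetical stand-ins, not the mgr/volumes API: they only mimic the described behaviour, where a general attribute setter falls back to the group's uid/gid when none is passed.

```python
def set_attrs(subvol, attrs, group):
    """Hypothetical setter: defaults uid/gid to the group's when not passed."""
    subvol["uid"] = attrs.get("uid", group["uid"])
    subvol["gid"] = attrs.get("gid", group["gid"])
    if "quota" in attrs:
        subvol["quota"] = attrs["quota"]
    return subvol


def set_quota_only(subvol, quota):
    """Hypothetical narrow setter: touches the quota and nothing else."""
    subvol["quota"] = quota
    return subvol


snapshot = {"uid": 1000, "gid": 1000}
group = {"uid": 0, "gid": 0}

# Using the general setter just for the quota overwrites the clone's owner.
mismatched = set_attrs(dict(snapshot), {"quota": 100}, group)
# Setting only the quota keeps the uid/gid copied from the snapshot.
matching = set_quota_only(dict(snapshot), 100)
```

In this model `mismatched` ends up owned by the group's uid/gid, while `matching` keeps the snapshot's ownership and still gets the quota.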

Fixes: https://tracker.ceph.com/issues/54066
Signed-off-by: Kotresh HR <khiremat@redhat.com>
(cherry picked from commit b3c9e6b50cf4264538e4c41d19e7ebb8b2900c3a)

3 years ago 17.1.0 v17.1.0
Jenkins Build Slave User [Thu, 24 Feb 2022 23:06:35 +0000 (23:06 +0000)]
17.1.0

3 years ago Merge pull request #45141 from sseshasa/wip-45118-45121-quincy
Laura Flores [Thu, 24 Feb 2022 20:19:11 +0000 (14:19 -0600)]
Merge pull request #45141 from sseshasa/wip-45118-45121-quincy

quincy: Combine backport of master PRs 45118 and 45121.

3 years ago Merge pull request #45129 from idryomov/wip-rbd-quincy-batch-4
Ilya Dryomov [Thu, 24 Feb 2022 16:43:49 +0000 (17:43 +0100)]
Merge pull request #45129 from idryomov/wip-rbd-quincy-batch-4

quincy: rbd backports (batch 4)

Reviewed-by: Sunny Kumar <sunkumar@redhat.com>
3 years ago mgr/devicehealth: skip null pages when extracting wear level 45141/head
Yaarit Hatuka [Tue, 22 Feb 2022 19:22:09 +0000 (19:22 +0000)]
mgr/devicehealth: skip null pages when extracting wear level

Some devices have null pages in their ata_device_statistics struct; skip
those pages in order to avoid an AttributeError when extracting the device's
wear level.
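
A minimal sketch of the idea follows. It is not the devicehealth module code: the `extract_wear_level` name, the page layout (`table` entries with `name` and `value` keys), and the entry name used as the default are assumptions made for illustration.

```python
def extract_wear_level(pages, entry_name="Percentage Used Endurance Indicator"):
    """Return the wear-level value from a list of statistics pages, or None.

    Pages may be None for some devices; calling .get() on such a page
    would raise AttributeError, so they are skipped explicitly.
    """
    for page in pages:
        if page is None:
            continue  # null page: nothing to inspect
        for entry in page.get("table", []):
            if entry.get("name") == entry_name:
                return entry.get("value")
    return None
```

With this guard, a list such as `[None, {...}]` is handled the same way as a list without null pages.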

Fixes: https://tracker.ceph.com/issues/51554
Signed-off-by: Yaarit Hatuka <yaarit@redhat.com>
(cherry picked from commit 2864ac30d4170ba7b5f60ae01ecfdeee707e026a)

3 years ago osd: Write non-zero data as part of osd benchmark test.
Sridhar Seshasayee [Tue, 22 Feb 2022 12:25:44 +0000 (17:55 +0530)]
osd: Write non-zero data as part of osd benchmark test.

An optimization (see PR: https://github.com/ceph/ceph/pull/43337) was made
in BlueStore to avoid writing bufferlists made up of zeros. The osd
benchmark used zero-filled bufferlists, and this resulted in inflated osd
benchmark results.

This issue is fixed by using bufferlists filled with non-zero values.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>
Fixes: https://tracker.ceph.com/issues/54364
(cherry picked from commit 09f94ace863255a7dd7075e269f8d7d63a398495)

3 years ago rbd-mirror: make mirror properly detect pool replayer needs restart 45129/head
Mykola Golub [Fri, 18 Feb 2022 10:42:23 +0000 (10:42 +0000)]
rbd-mirror: make mirror properly detect pool replayer needs restart

When a PoolReplayer detects a remote pool metadata change, it
sets the "stopping" flag, expecting the Mirror to restart it.

Although setting the "stopping" flag makes the PoolReplayer::run
thread terminate, the thread's is_started function will still
return true until join is called (which resets the thread id).

This made it impossible for the Mirror to detect (by calling
PoolReplayer::is_running) that the PoolReplayer needed a restart.

Fixes: https://tracker.ceph.com/issues/54258
Signed-off-by: Mykola Golub <mgolub@suse.com>
(cherry picked from commit ad4a2990b87834fe4ae8c9111547d071aa6e75e5)

3 years ago rbd-mirror: synchronize with in-flight stop in ImageReplayer::stop()
Ilya Dryomov [Sun, 20 Feb 2022 16:33:08 +0000 (17:33 +0100)]
rbd-mirror: synchronize with in-flight stop in ImageReplayer::stop()

Complete on_finish right away only if the replayer is stopped (meaning
that it is eligible to be restarted immediately, possibly from on_finish
itself).  This is the behaviour pretty much anyone would assume and
also what ImageReplayer::restart() relies on.

Fixes: https://tracker.ceph.com/issues/54344
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 8965a0f2a6f7bdbe732be94b1ee269cab5be0a2a)

3 years ago rbd-mirror: turn m_on_stop_finish into a list of Contexts
Ilya Dryomov [Sun, 20 Feb 2022 16:11:28 +0000 (17:11 +0100)]
rbd-mirror: turn m_on_stop_finish into a list of Contexts

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 4ad31cd0583ebb695a9d84a35b9fc20ad9ec8585)

3 years ago rbd-mirror: manual stop should take precedence over regular stop
Ilya Dryomov [Sun, 20 Feb 2022 12:11:02 +0000 (13:11 +0100)]
rbd-mirror: manual stop should take precedence over regular stop

Somewhat similar to commit 0a3794e56256 ("rbd-mirror: make stop
properly cancel restart"), make it so that a) if a manual stop is
joined to regular stop, the stop becomes manual and b) if a regular
stop is joined to a manual stop, the stop stays manual.
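
The two rules (a) and (b) amount to a logical OR over the "manual" property of the stops being joined. The function below is a hypothetical illustration of that rule, not the ImageReplayer code.

```python
def join_stop(in_flight_manual, new_manual):
    """Return True if the combined stop must be treated as manual.

    in_flight_manual: whether the stop already in progress is manual.
    new_manual: whether the stop being joined to it is manual.
    """
    return in_flight_manual or new_manual
```

Case (a) is `join_stop(False, True)` and case (b) is `join_stop(True, False)`; both yield a manual stop, and only two regular stops stay regular.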

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit c5b5787349e91a0fd23cd6d5e73b2a383ddd8687)

3 years ago rbd-mirror: straighten ImageReplayer::stop() a bit
Ilya Dryomov [Sat, 19 Feb 2022 15:43:04 +0000 (16:43 +0100)]
rbd-mirror: straighten ImageReplayer::stop() a bit

- don't default on_finish parameter
- m_restart_requested is set in ImageReplayer::restart() which is the
  only restart=true call site, so setting m_restart_requested here is
  redundant
- is_stopped_() can't be true in is_running_() branch
- on_finish->complete(0) in the end is unreachable

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 219c500977bbfbcfe4ccd24beb294edbe0562d35)

3 years ago common: replace BitVector::NoInitAllocator with wrapper struct
Casey Bodley [Tue, 15 Feb 2022 23:27:10 +0000 (18:27 -0500)]
common: replace BitVector::NoInitAllocator with wrapper struct

in c++20, the deprecated `struct std::allocator<T>::rebind` template was
removed, so `BitVector` no longer compiles. without a `rebind` to
inherit, `std::allocator_traits<NoInitAllocator>::rebind_alloc<U>` was
looking for `NoInitAllocator<U>`, but it isn't a template class

further investigation found that in c++17, `vector<__u32, NoInitAllocator>`
was rebinding this `NoInitAllocator` to `std::allocator<__u32>` and
preventing the no-init optimization from taking effect

instead of messing with the allocator to avoid zero-initialization, wrap
each __u32 in a struct whose constructor does not initialize the value

Fixes: https://tracker.ceph.com/issues/54279
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 4f0ad8aab6b21a1fd57a7c1630d298e31b5d9bb6)

3 years ago qa/suites/krbd: add rbd_default_map_options override coverage
Christopher Hoffman [Wed, 9 Feb 2022 20:28:19 +0000 (20:28 +0000)]
qa/suites/krbd: add rbd_default_map_options override coverage

Add coverage to test precedence, override, and option merge on rbd map.

Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit 19d46b918169601afe9eb834a2361fb015048c67)

3 years ago qa/suites/krbd: rename rxbounce subsuite
Ilya Dryomov [Fri, 18 Feb 2022 16:06:42 +0000 (17:06 +0100)]
qa/suites/krbd: rename rxbounce subsuite

A new job that doesn't want ms_mode to be set underneath it is about to
be added.  Rename rxbounce to ms_modeless to make this purpose obvious.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 7f391c5688105e55f7799a9d45721ec49531747d)

3 years ago rbd: support pool and image level overrides for rbd_default_map_options
Christopher Hoffman [Fri, 4 Feb 2022 21:25:53 +0000 (21:25 +0000)]
rbd: support pool and image level overrides for rbd_default_map_options

Fixes: https://tracker.ceph.com/issues/52850
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
(cherry picked from commit 9afc9712824a92fd6bdb2574c5880ab835236ed1)

3 years ago Merge pull request #45079 from aclamk/wip-54318-quincy
Yuri Weinstein [Tue, 22 Feb 2022 22:14:30 +0000 (14:14 -0800)]
Merge pull request #45079 from aclamk/wip-54318-quincy

quincy: os/bluestore/bluefs: Fix improper vselector tracking in _flush_special()

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
3 years ago Merge pull request #45108 from rhcs-dashboard/wip-dashboard-quincy-backports3
Yuri Weinstein [Tue, 22 Feb 2022 22:11:24 +0000 (14:11 -0800)]
Merge pull request #45108 from rhcs-dashboard/wip-dashboard-quincy-backports3

quincy: mgr/dashboard: 3rd (and hopefully last) backport batch

Reviewed-by: Nizamudeen A <nia@redhat.com>
3 years ago Merge pull request #45092 from ljflores/wip-54326-quincy
Laura Flores [Tue, 22 Feb 2022 20:51:11 +0000 (14:51 -0600)]
Merge pull request #45092 from ljflores/wip-54326-quincy

quincy: mgr/telemetry: handle empty device report when "send" is triggered

3 years ago Merge pull request #45074 from cbodley/wip-54162
Yuri Weinstein [Tue, 22 Feb 2022 19:36:28 +0000 (11:36 -0800)]
Merge pull request #45074 from cbodley/wip-54162

quincy: rgw: fix segfault in OpsLogRados::log when realm is reloaded

Reviewed-by: Cory Snyder <csnyder@iland.com>
3 years ago Merge pull request #45061 from soumyakoduri/quincy
Yuri Weinstein [Tue, 22 Feb 2022 19:33:57 +0000 (11:33 -0800)]
Merge pull request #45061 from soumyakoduri/quincy

quincy: rgw/dbstore: Add dbstore-tests to `make check`

Reviewed-by: Casey Bodley <cbodley@redhat.com>
3 years ago mgr/dashboard: add validation for snmp v3 engine id 45108/head
Avan Thakkar [Tue, 15 Feb 2022 13:13:36 +0000 (18:43 +0530)]
mgr/dashboard: add validation for snmp v3 engine id

Fixes: https://tracker.ceph.com/issues/54270
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit 2866db1eac7d726201f5bb34abdb32981c783f0e)

3 years ago mgr/dashboard: change privacy protocol field from required to optional
Avan Thakkar [Mon, 14 Feb 2022 12:18:39 +0000 (17:48 +0530)]
mgr/dashboard: change privacy protocol field from required to optional

Fixes: https://tracker.ceph.com/issues/54270
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Privacy protocol field shouldn't be a required field.

(cherry picked from commit 2d8f2b8195a0f0c7a21d4ec5061b1b51a3aade2c)

3 years ago mgr/dashboard: Contact Info should be visible only when Ident channel is checked
Sarthak0702 [Wed, 16 Feb 2022 12:45:35 +0000 (18:15 +0530)]
mgr/dashboard: Contact Info should be visible only when Ident channel is checked

Fixes: https://tracker.ceph.com/issues/54133
Signed-off-by: Sarthak0702 <sarthak.0702@gmail.com>
(cherry picked from commit 15211a6378a6fee9316f79ba0b27821891527c38)

3 years ago mgr/dashboard: dashboard turns telemetry off when configuring report
Sarthak0702 [Thu, 10 Feb 2022 19:50:42 +0000 (01:20 +0530)]
mgr/dashboard: dashboard turns telemetry off when configuring report

Signed-off-by: Sarthak0702 <sarthak.0702@gmail.com>
(cherry picked from commit 97c57adf8565756dbf24f3c46ed3916303903fb7)

3 years ago mgr/dashboard: "Please expand your cluster first" shouldn't be shown if cluster is...
Volker Theile [Wed, 9 Feb 2022 08:37:48 +0000 (09:37 +0100)]
mgr/dashboard: "Please expand your cluster first" shouldn't be shown if cluster is already meaningfully running

This PR will assume that a cluster is already up and fully running. If this should not be the expected behaviour, deployment tools have to set 'INSTALLED' explicitly. Without this assumption it might happen that upgraded and fully running clusters, e.g. Octopus -> Pacific, will show the 'Expand Cluster' on first log in.

cephadm will ensure that the bootstrap phase writes the necessary key to show the 'Expand cluster' page.

Fixes: https://tracker.ceph.com/issues/54215
Signed-off-by: Volker Theile <vtheile@suse.com>
(cherry picked from commit 48fff60b63785ec07f71d3e59394b0c08357247c)

3 years ago mds: kill session when mds do ms_handle_remote_reset 45098/head
IvanGuan [Tue, 18 Jan 2022 13:01:59 +0000 (21:01 +0800)]
mds: kill session when mds do ms_handle_remote_reset

If the mds decides to reuse the old connection, it will
do reset_session and should also kill any sessions
that are in the open state in MDSDaemon::ms_handle_remote_reset,
to prevent the situation where a client session is stuck in
the opening state and never has a chance to become open.

The root cause is that the client missed the request_open
reply but the mds session has already become open.
So we should kill the session on the mds side and let
the mds recreate the session when it receives the connect
request from the client.

Fixes: http://tracker.ceph.com/issues/53911
Signed-off-by: YunfeiGuan <yunfeiguan@xtaotech.com>
(cherry picked from commit 3651deb4e0b0c102adcaddce79ee4e053f033418)

3 years ago mds: fix seg fault in expire_recursive 45097/head
胡玮文 [Thu, 6 Jan 2022 07:43:29 +0000 (15:43 +0800)]
mds: fix seg fault in expire_recursive

A range-based for loop should not be used when we are altering the container.
Use an iterator explicitly instead.

Fixes: https://tracker.ceph.com/issues/53805
Signed-off-by: 胡玮文 <huww98@outlook.com>
(cherry picked from commit d48a2cf7e2481cf9758f2934464ec6d9c35d898b)

3 years ago mgr/telemetry: handle empty device report when "send" is triggered 45092/head
Laura Flores [Fri, 11 Feb 2022 19:37:26 +0000 (19:37 +0000)]
mgr/telemetry: handle empty device report when "send" is triggered

In certain environments, such as the "ceph-dev-docker" environment
(https://github.com/ricardoasmarques/ceph-dev-docker), the mgr
module is unable to fetch device metrics. As a result, the device
report generated by "gather_device_report()" is an empty dict.
This causes an AssertionError when the "send" function is triggered
(i.e. by running `ceph telemetry status` or `ceph telemetry send`),
and the module crashes.

The fix in this commit checks that the generated device report
contains metrics before trying to send it. If the device report
does not contain metrics (it returns an empty dict), the module
will log an appropriate message in the mgr log and not send the
device report.

If this scenario happens when running the `ceph telemetry send` command,
the user will additionally see this message:
```
Ceph report sent to https://telemetry.ceph.com/report
Unable to send device report: channel is on, but generated report was empty.
```

I also added a few more debug messages in gather_device_report() to make
future debugging easier.
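
The check described above can be sketched roughly as follows. This is an illustration, not the telemetry module code: `send_device_report` and its `post` and `log` callables are hypothetical, and only the "Unable to send device report" message text is taken from this commit message.

```python
EMPTY_REPORT_MSG = ("Unable to send device report: channel is on, "
                    "but generated report was empty.")


def send_device_report(device_report, post, log):
    """Send the device report only if it contains metrics.

    `post` and `log` are caller-supplied callables (assumptions for this
    sketch). Returns the message that would be shown to the user.
    """
    if not device_report:
        # An empty dict means no metrics were gathered: log and skip sending.
        log(EMPTY_REPORT_MSG)
        return EMPTY_REPORT_MSG
    post(device_report)
    return "Device report sent"
```

Checking for emptiness before sending turns a crash into a logged, user-visible message, while non-empty reports are sent as before.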

Fixes: https://tracker.ceph.com/issues/54250
Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 54e0e58f1b3f431281df0e2dd2b258f85cbade19)

3 years ago cmake: replace BuildQatDrv.cmake with FindQatDrv.cmake 45091/head
Miaomiao Liu [Fri, 14 Jan 2022 05:40:11 +0000 (13:40 +0800)]
cmake: replace BuildQatDrv.cmake with FindQatDrv.cmake

Because QAT driver version v1.7.l.4.14.0 or higher cannot be downloaded
directly by URL, use FindQatDrv.cmake, which can find the locally installed QAT package and libraries.

Signed-off-by: Miaomiao Liu <miaomiao.liu@intel.com>
Signed-off-by: Hualong Feng <hualong.feng@intel.com>
(cherry picked from commit 082dcf0a58b47946cc1375a7e8336b261d499f64)

3 years ago crypto/qat: fix issues about QAT based Encryption for RGW
Miaomiao Liu [Fri, 14 Jan 2022 05:44:15 +0000 (13:44 +0800)]
crypto/qat: fix issues about QAT based Encryption for RGW

Update the library usage and add a namespace before ostream.

Fixes: https://tracker.ceph.com/issues/54059
Signed-off-by: Miaomiao Liu <miaomiao.liu@intel.com>
Signed-off-by: Hualong Feng <hualong.feng@intel.com>
(cherry picked from commit 6b879a74782b5ff3a7cba999bf583e1a92578787)

3 years ago compressor: replace snappy and lz4 compressors with zlib for QAT based compression
Miaomiao Liu [Wed, 19 Jan 2022 07:27:07 +0000 (15:27 +0800)]
compressor: replace snappy and lz4 compressors with zlib for QAT based compression

Current QAT hardware only supports the zlib compressor.

Signed-off-by: Miaomiao Liu <miaomiao.liu@intel.com>
Signed-off-by: Hualong Feng <hualong.feng@intel.com>
(cherry picked from commit 47fd35d52e3c755df243d04f30e566b98e793f4b)

3 years ago compressor: fix compilation issues about QATzip
Miaomiao Liu [Wed, 19 Jan 2022 07:27:07 +0000 (15:27 +0800)]
compressor: fix compilation issues about QATzip

Signed-off-by: Miaomiao Liu <miaomiao.liu@intel.com>
Signed-off-by: Hualong Feng <hualong.feng@intel.com>
(cherry picked from commit 9a9001a08fdc05361057e7880dac98210fffe1fc)

3 years ago Merge pull request #45058 from idryomov/wip-rbd-quincy-batch-3
Ilya Dryomov [Fri, 18 Feb 2022 15:31:30 +0000 (16:31 +0100)]
Merge pull request #45058 from idryomov/wip-rbd-quincy-batch-3

quincy: rbd backports (batch 3)

Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
3 years ago os/bluestore/bluefs: Fix vselector 45079/head
Adam Kupczyk [Tue, 15 Feb 2022 22:13:59 +0000 (23:13 +0100)]
os/bluestore/bluefs: Fix vselector

Fix bluefs volume selector in device_migrate_to_existing.
Fix bluefs volume selector in _rewrite_log_and_layout_sync_LNF_LD.

Fixes: https://tracker.ceph.com/issues/54248
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 3813416e6a8d296312271598e823f876a09b2504)

3 years agoos/bluestore/bluefs: Fix improper vselector tracking in _flush_special()
Adam Kupczyk [Wed, 9 Feb 2022 15:19:56 +0000 (16:19 +0100)]
os/bluestore/bluefs: Fix improper vselector tracking in _flush_special()

Moves vselector size tracking outside _flush_special().
Function _compact_log_async...() updated sizes twice.
The problem could not be solved by turning the second size modification into a plain update,
as that could disrupt the vselector consistency check (_vselector_check()).
The vselector consistency tracking relies on either log.lock or nodes.lock
being held when the check is performed, which is not true for _compact_log_async...().

Now _flush_special does not update vselector sizes by itself but leaves the update to
the caller.

Fixes: https://tracker.ceph.com/issues/54248
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 4bc0f61d23299724fad2d8e6f2858734f1db6e5a)

3 years agorgw: fix segfault in OpsLogRados::log when realm is reloaded 45074/head
Cory Snyder [Thu, 3 Feb 2022 19:48:05 +0000 (14:48 -0500)]
rgw: fix segfault in OpsLogRados::log when realm is reloaded

We weren't previously handling the deallocation of the store when
a realm was reloaded. Now passing a const reference to the pointer.

Fixes: https://tracker.ceph.com/issues/54130
Signed-off-by: Cory Snyder <csnyder@iland.com>
(cherry picked from commit 0713f65355586b2f6ceeb6bbce8763158847e5ed)

3 years agoMerge pull request #45038 from guits/bkp-quincy-cephadm-ingress-fix
Yuri Weinstein [Wed, 16 Feb 2022 21:18:37 +0000 (13:18 -0800)]
Merge pull request #45038 from guits/bkp-quincy-cephadm-ingress-fix

quincy: cephadm/ingress: make frontend stat bind on localhost

Reviewed-by: Adam King <adking@redhat.com>
3 years agoMerge pull request #45043 from kamoltat/wip-ksirivad-backport-quincy-44588
Yuri Weinstein [Wed, 16 Feb 2022 20:24:29 +0000 (12:24 -0800)]
Merge pull request #45043 from kamoltat/wip-ksirivad-backport-quincy-44588

quincy: pybind/mgr/progress: disable pg recovery event by default

Reviewed-by: Neha Ojha <nojha@redhat.com>
3 years agoMerge pull request #45030 from ljflores/wip-quincy-basic-channel-additions
Yuri Weinstein [Wed, 16 Feb 2022 20:23:13 +0000 (12:23 -0800)]
Merge pull request #45030 from ljflores/wip-quincy-basic-channel-additions

quincy: mgr/telemetry: add basic_pool_usage and basic_usage_by_class collections to the telemetry module

Reviewed-by: Yaarit Hatuka <yaarit@redhat.com>
3 years agoMerge pull request #45029 from ljflores/wip-54274-quincy
Yuri Weinstein [Wed, 16 Feb 2022 20:22:42 +0000 (12:22 -0800)]
Merge pull request #45029 from ljflores/wip-54274-quincy

quincy: mgr/telemetry: collect what we can from histograms, mempools, and heap stats

Reviewed-by: Yaarit Hatuka <yaarit@redhat.com>
3 years agoMerge pull request #44982 from batrick/i54234
Yuri Weinstein [Wed, 16 Feb 2022 20:22:07 +0000 (12:22 -0800)]
Merge pull request #44982 from batrick/i54234

quincy: qa: use cephadm to provision cephfs for fs:workloads

Reviewed-by: Adam King <adking@redhat.com>
3 years agoMerge pull request #44952 from aclamk/wip-54209-quincy
Yuri Weinstein [Wed, 16 Feb 2022 20:21:23 +0000 (12:21 -0800)]
Merge pull request #44952 from aclamk/wip-54209-quincy

quincy: [BlueStore] Fix problem with volume selector

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
3 years agorgw/dbstore: Add dbstore-tests to `make check` 45061/head
Soumya Koduri [Tue, 8 Feb 2022 10:02:16 +0000 (15:32 +0530)]
rgw/dbstore: Add dbstore-tests to `make check`

Include and run dbstore-tests as part of `make check` target

Signed-off-by: Soumya Koduri <skoduri@redhat.com>
3 years agoqa/suites/rbd: make sure block-rbd.so is installed 45058/head
Ilya Dryomov [Wed, 16 Feb 2022 09:32:26 +0000 (10:32 +0100)]
qa/suites/rbd: make sure block-rbd.so is installed

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 8f0fd0af3da8581c47dc916303615264714a0489)

3 years agoqa/tasks/qemu: make sure block-rbd.so is installed
Ilya Dryomov [Tue, 15 Feb 2022 13:57:51 +0000 (14:57 +0100)]
qa/tasks/qemu: make sure block-rbd.so is installed

Fixes: https://tracker.ceph.com/issues/54286
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 525ff61cfc8516b4d7bed6f819b00a0b6cb7be0a)

3 years agocls/rbd: GroupSnapshotNamespace comparator violates ordering rules
Ilya Dryomov [Mon, 14 Feb 2022 12:04:00 +0000 (13:04 +0100)]
cls/rbd: GroupSnapshotNamespace comparator violates ordering rules

For

  GroupSnapshotNamespace a(1, "group-1", "snap-2");
  GroupSnapshotNamespace b(1, "group-2", "snap-1");

both a < b and b < a evaluate to true.  This violates STL strict weak
ordering requirements which is a problem because GroupSnapshotNamespace
is used as a key in std::map (ictx->snap_ids at least), etc.

Fixes: https://tracker.ceph.com/issues/49792
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 830e72ab9d66c8f5703ea27da5249b02dd16ccd0)
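
The ordering violation described in the commit above can be reproduced with a small Python model. This is only a sketch: the field names and the shape of `broken_less` are assumptions chosen to reproduce the reported symptom, not the actual C++ operator from cls/rbd. `lexicographic_less` shows one conventional way to get a strict weak ordering, where later fields are consulted only when earlier ones tie.

```python
from typing import NamedTuple


class GroupSnapshotNamespace(NamedTuple):
    # Illustrative stand-ins for the C++ members (assumed names).
    group_pool: int
    group_id: str
    snapshot_id: str


def broken_less(a, b):
    """Assumed faulty shape: falls through to later fields even when an
    earlier field of `a` is greater than the one in `b`."""
    if a.group_pool < b.group_pool:
        return True
    if a.group_id < b.group_id:
        return True
    return a.snapshot_id < b.snapshot_id


def lexicographic_less(a, b):
    """Field-by-field comparison; later fields matter only on ties."""
    return (a.group_pool, a.group_id, a.snapshot_id) < (
        b.group_pool, b.group_id, b.snapshot_id)


a = GroupSnapshotNamespace(1, "group-1", "snap-2")
b = GroupSnapshotNamespace(1, "group-2", "snap-1")

# With the broken comparison both directions claim "less than".
print(broken_less(a, b), broken_less(b, a))
# With the lexicographic comparison only one direction does.
print(lexicographic_less(a, b), lexicographic_less(b, a))
```

In C++ the lexicographic form is commonly written by comparing `std::tie(...)` tuples of the members; whether the actual fix does so is not stated in the commit message.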

3 years agorbd: mark optional positional arguments as such in help output
Ilya Dryomov [Tue, 8 Feb 2022 09:11:49 +0000 (10:11 +0100)]
rbd: mark optional positional arguments as such in help output

Currently at least five commands have optional positional arguments.

Overloading po::value<std::string>()->default_value("") for this
is a bit sneaky but nothing better fits into the existing Shell.cc
framework.

Note that strictly speaking "[<interval>] [<start-time>]" should be
"[<interval> [<start-time>]]" but we aren't doing that here because
"ceph" command doesn't do it either.

Fixes: https://tracker.ceph.com/issues/54191
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit cb0df397aae552adc80713ca0d59ed1ebfd3b1be)

3 years agoMerge pull request #44909 from batrick/i54160
Yuri Weinstein [Tue, 15 Feb 2022 16:20:03 +0000 (08:20 -0800)]
Merge pull request #44909 from batrick/i54160

quincy: mon/MDSMonitor: sanity assert when inline data turned on in MDSMap from v16.2.4 -> v16.2.[567]

Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 years agoMerge pull request #44875 from kotreshhr/wip-54123-quincy
Yuri Weinstein [Tue, 15 Feb 2022 16:19:12 +0000 (08:19 -0800)]
Merge pull request #44875 from kotreshhr/wip-54123-quincy

quincy: mgr/volumes: Fix subvolume snapshot clone failure

Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 years agopybind/mgr/progress: disable pg recovery event by default 45043/head
Kamoltat [Fri, 14 Jan 2022 02:44:16 +0000 (02:44 +0000)]
pybind/mgr/progress: disable pg recovery event by default

The progress module disables the pg recovery event by default
since the event is expensive and has interrupted other services
when there are OSDs being marked in/out of the cluster.

To turn the event on manually:

ceph config set mgr mgr/progress/allow_pg_recovery_event true

Updated qa/tasks/mgr/test_progress.py to enable
the pg recovery event when testing the progress module.

Signed-off-by: Kamoltat <ksirivad@redhat.com>
(cherry picked from commit f06da20dff141dc239900f944001d55fb8296014)

3 years agoMerge pull request #44899 from rhcs-dashboard/wip-dashboard-quincy-backports2
Ernesto Puerta [Tue, 15 Feb 2022 11:32:06 +0000 (12:32 +0100)]
Merge pull request #44899 from rhcs-dashboard/wip-dashboard-quincy-backports2

quincy: dashboard: 2nd backport batch

Reviewed-by: Nizamudeen A <nia@redhat.com>
3 years agocephadm/ingress: make frontend stat bind on localhost 45038/head
Guillaume Abrioux [Fri, 11 Feb 2022 16:39:18 +0000 (17:39 +0100)]
cephadm/ingress: make frontend stat bind on localhost

The current configuration of keepalived makes it do
a curl on localhost:9999 in order to check the endpoint is alive.
Given the endpoint only binds on the vip addr, that doesn't work.

Fixes: https://tracker.ceph.com/issues/53807
Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit ff482da6cb3a62b14f3a06e2d558876eabebfe65)

3 years agomgr/telemetry: separate device class usage statistics into their own collection 45030/head
Laura Flores [Tue, 8 Feb 2022 00:45:02 +0000 (00:45 +0000)]
mgr/telemetry: separate device class usage statistics into their own collection

The new collection is called `basic_usage_by_class`. This info should be separate
from `basic_pool_usage` since it doesn't involve pool statistics.

Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit f69cec5b708ce71083d16d9976cf7e6b20f090d2)

3 years agomgr/telemetry: update `basic_pool_usage` collection desc
Laura Flores [Tue, 8 Feb 2022 00:42:37 +0000 (00:42 +0000)]
mgr/telemetry: update `basic_pool_usage` collection desc

- Added the word "default" since we are only collecting
default pool applications

- Removed the word "data" since we are actually collecting
usage *statistics*

Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit c71a54ec1ab804de8408bdf39fe8727192d23492)

3 years agodoc/mgr: update telemetry doc to reflect `basic_pool_usage` collection
Laura Flores [Wed, 2 Feb 2022 23:08:53 +0000 (23:08 +0000)]
doc/mgr: update telemetry doc to reflect `basic_pool_usage` collection

Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 4a2b54c1f2f2b58784d118011c3bd407281123ff)

3 years agomgr/telemetry: fix perf channel to screen out non-default pool applications
Laura Flores [Wed, 2 Feb 2022 23:08:11 +0000 (23:08 +0000)]
mgr/telemetry: fix perf channel to screen out non-default pool applications

Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 7467ed59aceb696e6682081dd03bbe8e9cccf789)

3 years agomgr/telemetry: add `stats_by_class` to the `basic_pool_usage` collection
Laura Flores [Wed, 2 Feb 2022 23:06:43 +0000 (23:06 +0000)]
mgr/telemetry: add `stats_by_class` to the `basic_pool_usage` collection

Any device classes that are not default ('hdd', 'ssd', 'nvme') are screened out.

Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 285d14457c157a3e4dfd12363e0ba02b8add57fa)
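
The screening described in the commit above might look roughly like the following. This is a minimal sketch, not the telemetry module's code: the function name, the input layout, and the `kb_used` field are hypothetical; only the three default class names come from the commit message.

```python
# Default device classes named in the commit message.
DEFAULT_DEVICE_CLASSES = {"hdd", "ssd", "nvme"}


def screen_device_classes(stats_by_class):
    """Keep only entries whose device class is one of the defaults."""
    return {
        device_class: stats
        for device_class, stats in stats_by_class.items()
        if device_class in DEFAULT_DEVICE_CLASSES
    }


# Hypothetical input: per-class stats keyed by device class name.
sample = {
    "hdd": {"kb_used": 1024},
    "ssd": {"kb_used": 512},
    "archive-tier": {"kb_used": 64},  # custom class, screened out
}

screened = screen_device_classes(sample)
print(sorted(screened))
```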

3 years agomgr/telemetry: add df stats to the `basic_pool_usage` collection
Laura Flores [Wed, 2 Feb 2022 23:02:01 +0000 (23:02 +0000)]
mgr/telemetry: add df stats to the `basic_pool_usage` collection

The `df` stats under `pools` indicate data usage for each pool.
The `kb_bytes` field is screened out since it is redundant.

Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit ee63d624ba395dacd9f9c0ff59a989589448eab8)

3 years agomgr/telemetry: create `basic_pool_usage` collection
Laura Flores [Wed, 2 Feb 2022 22:57:38 +0000 (22:57 +0000)]
mgr/telemetry: create `basic_pool_usage` collection

Here, I define the `basic_pool_usage` collection and add
pool application under the basic channel. I screen out
any applications that are not default.

Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 1f571cd4251422f03f40401c3f9163d716d2b6e4)

3 years agomgr/telemetry: compare len(values) to len(categories) 45029/head
Laura Flores [Wed, 2 Feb 2022 23:57:23 +0000 (23:57 +0000)]
mgr/telemetry: compare len(values) to len(categories)

This format will allow us to safely add or remove
categories as needed in the future.

Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 5ac1dd6866287d9e5bc895a4028836d09c836069)
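
One way to read the length comparison in the commit above is sketched below. The helper name, the sample category names, and the choice to raise on a mismatch are all assumptions; the commit message does not say how a mismatch is handled. The point is that the expected count is derived from the category list itself, so adding or removing a category needs no other change.

```python
def label_values(categories, values):
    """Pair each category with its value, but only when the counts agree."""
    if len(values) != len(categories):
        # Assumed behaviour: refuse to label rather than mislabel.
        raise ValueError(
            f"expected {len(categories)} values, got {len(values)}")
    return dict(zip(categories, values))


# Hypothetical category names for parsed statistics.
categories = ["in_use", "page_heap_freelist", "central_cache_freelist"]

labeled = label_values(categories, [10, 20, 30])
print(labeled)
```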

3 years agomgr/telemetry: collect what we can from heap stats, mempools, and osd histograms
Laura Flores [Mon, 24 Jan 2022 03:19:50 +0000 (21:19 -0600)]
mgr/telemetry: collect what we can from heap stats, mempools, and osd histograms

If we run into a problem collecting heap stats, mempools,
or osd histograms from a particular osd (e.g. the osd is down),
we should continue to collect what we can from other osds rather
than exiting and returning an empty JSON object.

Some log messages are also refined.

Fixes: https://tracker.ceph.com/issues/53985
Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit c617b78f7bb589314b3c377496a9bb3914cbb2ba)
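
The "collect what we can" behaviour described in the commit above can be sketched as a loop that skips failing daemons instead of aborting. Everything here is illustrative: `collect_from_osds`, `fake_fetch`, and the `heap_bytes` field are hypothetical names, and the real module's error handling may differ.

```python
import logging

log = logging.getLogger("telemetry_sketch")


def collect_from_osds(osd_ids, fetch):
    """Gather stats per OSD, skipping any OSD whose fetch fails."""
    result = {}
    for osd_id in osd_ids:
        try:
            result[f"osd.{osd_id}"] = fetch(osd_id)
        except Exception as exc:
            # Log and move on rather than returning an empty report.
            log.debug("skipping osd.%s: %s", osd_id, exc)
            continue
    return result


def fake_fetch(osd_id):
    """Stand-in for a per-OSD query; osd.1 is treated as down."""
    if osd_id == 1:
        raise RuntimeError("osd is down")
    return {"heap_bytes": 100 * osd_id}


stats = collect_from_osds([0, 1, 2], fake_fetch)
print(stats)
```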

3 years agoMerge pull request #44972 from guits/wip-54243-quincy
Guillaume Abrioux [Mon, 14 Feb 2022 13:50:18 +0000 (14:50 +0100)]
Merge pull request #44972 from guits/wip-54243-quincy

quincy: ceph-volume: honour osd_dmcrypt_key_size option

3 years agomds: mds_oft_prefetch_dirfrags default to false 45017/head
Dan van der Ster [Wed, 19 Jan 2022 14:17:15 +0000 (15:17 +0100)]
mds: mds_oft_prefetch_dirfrags default to false

The oft dirfrag prefetch is unpredictable and dangerous because it can
delay rejoin by several minutes and balloon memory usage well beyond the
mds_cache_memory_limit. In the worst cases this can cause an OOM loop if
the memory required for prefetch exceeds the physical memory.

The PR which introduced mds_oft_prefetch_dirfrags also optimized the
client behaviour to eliminate the bad effects of disabling dirfrags
prefetch. And commit d3946e36f89203bd5c7f51c3a73d1a17a4d19863 has been
testing setting this option to false.

We therefore should default the option to false. Operators can still
manually enable it if they know it can speed up their use cases.

Related-to: https://tracker.ceph.com/issues/45835
Fixes: https://tracker.ceph.com/issues/53952
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit e19ef199c94406afd220e574e24d52c03d9b67a3)

3 years agoceph-volume/activate: load the config from lv tag 44972/head
Guillaume Abrioux [Thu, 10 Feb 2022 01:23:51 +0000 (02:23 +0100)]
ceph-volume/activate: load the config from lv tag

When `ceph-volume lvm trigger` is called with an OSD where the tag
`ceph.cluster_name` is not 'ceph', it fails.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 5ac1ec65cb2a582b2ae550202cc9911f993943f2)
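
The idea in the commit above, taking the cluster name from the LV tag instead of assuming 'ceph', can be sketched as follows. The helper names are hypothetical and this is not the ceph-volume implementation; only the `ceph.cluster_name` tag comes from the commit message, and the `/etc/ceph/<cluster>.conf` path is assumed from the usual Ceph naming convention.

```python
def cluster_name_from_tags(lv_tags, default="ceph"):
    """Return the cluster name stored in the LV tags, or the default."""
    return lv_tags.get("ceph.cluster_name") or default


def conf_path(lv_tags):
    """Build the config path for the cluster the OSD belongs to
    (path layout is an assumption based on the usual convention)."""
    return f"/etc/ceph/{cluster_name_from_tags(lv_tags)}.conf"


print(conf_path({"ceph.cluster_name": "backup"}))
print(conf_path({}))
```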

3 years agoceph-volume/tests: use centos/stream8 images
Guillaume Abrioux [Wed, 9 Feb 2022 17:33:27 +0000 (18:33 +0100)]
ceph-volume/tests: use centos/stream8 images

Following the recent move from CentOS 8 to CentOS Stream 8, use Stream 8 images here as well.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit 2b793952bbac7973b97d245c282165daadeabb51)

3 years agoceph-volume/tests: add tests in util/encryption.py
Guillaume Abrioux [Wed, 9 Feb 2022 16:04:19 +0000 (17:04 +0100)]
ceph-volume/tests: add tests in util/encryption.py

this adds some unit tests in order to cover `luks_format()` and `luks_open()`
in `util/encryption.py`.

Signed-off-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit db48850745f218e08cf53ae2d8edf3428f2b4010)