John Mulligan [Sun, 22 Oct 2023 12:14:24 +0000 (08:14 -0400)]
cephadm: add deployment test for osd
Add a deployment test case for OSD. OSD has some special properties that
we have extra assertions for.
Part of a series of commits to increase coverage of deployment
path features with regards to container engine options, env vars
and mounts. This will serve future refactoring efforts.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Sat, 21 Oct 2023 20:27:13 +0000 (16:27 -0400)]
cephadm: add test assertions for nvmeof options, mount
Part of a series of commits to increase coverage of deployment
path features with regards to container engine options, env vars
and mounts. This will serve future refactoring efforts.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Sat, 21 Oct 2023 20:29:38 +0000 (16:29 -0400)]
cephadm: add test assertions for iscsi options, mounts
Part of a series of commits to increase coverage of deployment
path features with regards to container engine options, env vars
and mounts. This will serve future refactoring efforts.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Sat, 21 Oct 2023 20:20:10 +0000 (16:20 -0400)]
cephadm: add test assertions for keepalived options, mounts
Part of a series of commits to increase coverage of deployment
path features with regards to container engine options, env vars
and mounts. This will serve future refactoring efforts.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Sat, 21 Oct 2023 20:16:32 +0000 (16:16 -0400)]
cephadm: add test assertion for haproxy options, mounts
Part of a series of commits to increase coverage of deployment
path features with regards to container engine options, env vars
and mounts. This will serve future refactoring efforts.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Sat, 21 Oct 2023 20:12:40 +0000 (16:12 -0400)]
cephadm: add test assertions for nfs env vars, mounts
Part of a series of commits to increase coverage of deployment
path features with regards to container engine options, env vars
and mounts. This will serve future refactoring efforts.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Sat, 21 Oct 2023 20:32:15 +0000 (16:32 -0400)]
cephadm: add assertions for monitoring options, mounts
Part of a series of commits to increase coverage of deployment
path features with regards to container engine options, env vars
and mounts. This will serve future refactoring efforts.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Sat, 21 Oct 2023 20:35:32 +0000 (16:35 -0400)]
cephadm: add test assertion for snmp env file option
Part of a series of commits to increase coverage of deployment
path features with regards to container engine options, env vars
and mounts. This will serve future refactoring efforts.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Sat, 21 Oct 2023 20:10:32 +0000 (16:10 -0400)]
cephadm: add a test assertion for tracing env var
Part of a series of commits to increase coverage of deployment
path features with regards to container engine options, env vars
and mounts. This will serve future refactoring efforts.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Sat, 21 Oct 2023 20:08:25 +0000 (16:08 -0400)]
cephadm: add test assertions for ceph mgr entrypoint, mounts
Part of a series of commits to increase coverage of deployment
path features with regards to container engine options, env vars
and mounts. This will serve future refactoring efforts.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Sat, 21 Oct 2023 20:04:48 +0000 (16:04 -0400)]
cephadm: add test assertions for unlimited pids option
Ensure that future changes continue to set/not set option as
appropriate.
Part of a series of commits to increase coverage of deployment
path features with regards to container engine options, env vars
and mounts. This will serve future refactoring efforts.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Sat, 21 Oct 2023 19:53:34 +0000 (15:53 -0400)]
cephadm: assert that ceph specific env vars get set
Assert that ceph based services set a tcmalloc related env var.
Also assert that a few assorted services do not set the env var.
Part of a series of commits to increase coverage of deployment
path features with regards to container engine options, env vars
and mounts. This will serve future refactoring efforts.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
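A minimal sketch of the kind of assertion described above, assuming the test has the generated container command line available as a list of strings. The helper `has_env_var`, the sample command lines, and the variable name `TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES` are assumptions for illustration; the commit message only says "a tcmalloc related env var".

```python
from typing import List


def has_env_var(args: List[str], name: str) -> bool:
    """Return True if a container command line sets env var `name`.

    Handles two common spellings: `-e NAME=value` and `--env=NAME=value`.
    """
    for i, arg in enumerate(args):
        if arg in ("-e", "--env") and i + 1 < len(args):
            if args[i + 1].split("=", 1)[0] == name:
                return True
        elif arg.startswith("--env="):
            if arg[len("--env="):].split("=", 1)[0] == name:
                return True
    return False


# Hypothetical command lines, not captured from a real deployment.
TCMALLOC_VAR = "TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES"  # assumed name
ceph_cmd = ["podman", "run", "-e", f"{TCMALLOC_VAR}=134217728", "ceph-image"]
other_cmd = ["podman", "run", "-e", "NODE_NAME=host1", "other-image"]

# Ceph-based services should set the variable; other services should not.
assert has_env_var(ceph_cmd, TCMALLOC_VAR)
assert not has_env_var(other_cmd, TCMALLOC_VAR)
```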
Patrick Donnelly [Fri, 20 Oct 2023 13:06:41 +0000 (09:06 -0400)]
Merge PR #54042 into main
* refs/pull/54042/head:
cmake: populate liburing include and library paths down to rocksdb external project
cmake: promote uring package search to top-level
Venky Shankar [Fri, 20 Oct 2023 09:14:27 +0000 (14:44 +0530)]
Merge PR #53839 into main
* refs/pull/53839/head:
qa: enhance test cases
mds: erase clients getting evicted from laggy_clients
mds: report clients laggy due to laggy OSDs only after checking any OSD is laggy
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Co-authored-by: Zac Dover <zac.dover@proton.me>
Signed-off-by: Juan Miguel Olmo Martínez <jolmomar@ibm.com>
tanchangzhi [Tue, 17 Oct 2023 08:48:51 +0000 (16:48 +0800)]
doc: Update mClock QOS documentation to discard osd_mclock_cost_per_*
The cost parameters (osd_mclock_cost_per_*) have been removed.
The cost of an operation is now determined using the random IOPS
and maximum sequential bandwidth capability of the OSD's underlying device.
At some point the debug builds for wip branches no longer had the .git
directory available so the Debug build type was unset. This meant we are
no longer doing numerous checks (like mutex ownership checks) that we
would normally be doing in the qa suite.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Ilya Dryomov [Thu, 12 Oct 2023 17:03:10 +0000 (19:03 +0200)]
pybind/rbd: don't produce info on errors in aio_mirror_image_get_info()
Check completion return value before attempting to decode c_info.
Otherwise we are guaranteed to access invalid memory in decode_cstr()
while trying to compute global_id string length when the client is
blocklisted for example.
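The pattern behind this fix can be sketched as follows: look at the completion's return value first, and decode the result only on success. This is an illustration under stated assumptions, not the actual binding code; `FakeCompletion`, `on_complete`, and the payload layout are hypothetical stand-ins.

```python
from typing import Callable, List, Optional, Tuple


class FakeCompletion:
    """Hypothetical stand-in for an async completion object."""

    def __init__(self, return_value: int, payload: Optional[bytes]) -> None:
        self._return_value = return_value
        self._payload = payload  # only meaningful when return_value == 0

    def get_return_value(self) -> int:
        return self._return_value


def on_complete(completion: FakeCompletion,
                user_callback: Callable[[int, Optional[dict]], None]) -> None:
    """Decode the result only when the operation succeeded."""
    ret = completion.get_return_value()
    info = None
    if ret == 0:
        # Safe to decode: the operation filled in the payload.
        info = {"global_id": completion._payload.decode("utf-8")}
    # On error, pass the return code through and produce no info.
    user_callback(ret, info)


results: List[Tuple[int, Optional[dict]]] = []
on_complete(FakeCompletion(0, b"abc"), lambda r, i: results.append((r, i)))
on_complete(FakeCompletion(-1, None), lambda r, i: results.append((r, i)))
```

With the check in place, the failing call reports its error code and never touches the undecodable payload.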
In the monitors we hold 2 copies of disallowed_leader:
1. MonMap class
2. Elector class
When computing the ConnectivityScore for the monitors during
the election, we use the `disallowed_leader` from Elector
class to determine which monitors we shouldn't allow to lead.
Now, we rely on the function `set_elector_disallowed_leaders`
to set the `disallowed_leader` of the Elector class. The MonMap
class copy of the `disallowed_leader` contains the
`tiebreaker_monitor`, so we inherit that and also add the
monitors that are dead due to a zone failure.
The `adding dead monitors` phase is only allowed if we can
enter stretch_mode. However, there is a problem: when failing over
a stretch cluster zone and then reviving the entire zone, the
revived monitors cannot enter stretch_mode while they are in the
"probing" state, since PaxosServices like osdmon become unreadable
(this is expected).
Solution:
We unconditionally add monitors that are in
`monmap->stretch_marked_down_mons` to the
`disallowed_leaders` list in
`Monitor::set_elector_disallowed_leaders` since
if the monitors are in `monmap->stretch_marked_down_mons`
we know that they probably belong to a marked-down
zone and are not fit to lead.
This will fix the problem of newly revived monitors
having different disallowed_leaders set
and getting stuck in election.
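As a rough model of the solution described above (the real change is in the C++ monitor code), the disallowed set becomes a plain union that no longer depends on whether the monitor can enter stretch mode. The function name and the monitor names below are hypothetical.

```python
from typing import Iterable, Set


def compute_disallowed_leaders(monmap_disallowed: Iterable[str],
                               stretch_marked_down: Iterable[str]) -> Set[str]:
    """Union of the MonMap copy and the marked-down monitors.

    The marked-down monitors are added unconditionally, so a monitor
    that is still probing computes the same set as one in quorum.
    """
    return set(monmap_disallowed) | set(stretch_marked_down)


# Hypothetical cluster: "e" is the tiebreaker, "c" and "d" sit in the
# zone that was marked down.
in_quorum_view = compute_disallowed_leaders({"e"}, {"c", "d"})
probing_view = compute_disallowed_leaders({"e"}, {"c", "d"})
assert in_quorum_view == probing_view
```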
this structure should be created at the frontend and trickle all the way
to the RADOS layer. holding: dout prefix, optional yield and trace.
in this commit, so far it was only added to the "complete()" sal interface,
and to the "write_meta()" rados interface.
in the future, it should be added to more sal interfaces, replacing the
current way where dpp and optional yield are passed as separate
arguments to all functions.
in addition, if more information would be needed, it should be possible
to add that information to the request context struct without changing
many function prototypes
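The idea can be illustrated with a small sketch: one context object carries the log prefix, the optional yield, and the trace, so adding a field later does not change any function signature. The actual change is in C++; the Python names here (`RequestContext`, its fields, and the simplified `complete`/`write_meta` signatures) are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class RequestContext:
    """Hypothetical per-request context created at the frontend."""
    log_prefix: str
    yield_ctx: Optional[Any] = None  # optional yield
    trace: Optional[Any] = None      # tracing handle, if any


def write_meta(ctx: RequestContext, key: str, log: List[str]) -> None:
    # Lowest layer: receives the same context object untouched.
    log.append(f"{ctx.log_prefix} write_meta({key})")


def complete(ctx: RequestContext, log: List[str]) -> None:
    # Middle layer: forwards the context instead of separate arguments.
    log.append(f"{ctx.log_prefix} complete()")
    write_meta(ctx, "etag", log)


log: List[str] = []
complete(RequestContext(log_prefix="req 42:"), log)
```

Adding, say, a deadline field to `RequestContext` would leave both function signatures as they are.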
mgr/volumes: fix `subvolume group rm` error message
Currently, if we try to delete a subvolumegroup using `fs subvolumegroup rm`
when one or more subvolumes are present under that subvolumegroup, we
see an error like:
`Error ENOTEMPTY: error in rmdir /volumes/group1`
which causes confusion. Make it more descriptive.
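A minimal sketch of the approach, assuming the handler knows the errno and the group name. The function `describe_rmdir_error` and the message wording are hypothetical; the commit message does not give the new text.

```python
import errno


def describe_rmdir_error(err: int, group: str) -> str:
    """Turn a raw rmdir errno into a message the user can act on."""
    if err == errno.ENOTEMPTY:
        # Hypothetical wording: say why the removal failed.
        return (f"subvolume group '{group}' contains subvolume(s); "
                "remove them before removing the group")
    name = errno.errorcode.get(err, str(err))
    return f"error in rmdir for subvolume group '{group}': {name}"


message = describe_rmdir_error(errno.ENOTEMPTY, "group1")
```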
Adam King [Tue, 10 Oct 2023 16:42:57 +0000 (12:42 -0400)]
mgr/cephadm: add unit test for upgrade check with --ceph-version
This is actually meant to make sure we don't screw
up the image base. See https://tracker.ceph.com/issues/63150
to see what we're trying to avoid happening again