librbd: make group and group snapshot IDs more random
Image IDs suffered from the same issue -- it was addressed in commit be8373688c1b ("librbd: block_name_prefix is not created randomly").
The code for generating group IDs is duplicated in api/Group.cc and
got missed.
Instead of cut-and-pasting the fix, just call generate_image_id()
directly and rename variables for more explicitness.
Adam King [Tue, 23 Apr 2024 16:04:39 +0000 (12:04 -0400)]
doc/cephadm: remove downgrade reference from upgrade docs
This has been in here for years, but cephadm will block
attempted upgrades to lower versions and we generally
don't want people to think this is supported or safe.
Remove references to dual-stack mode in
doc/rados/configuration/network-config-ref.rst and
doc/rados/configuration/msgr2.rst. This feature seems to have been
planned but never to have been completely implemented.
See the tracker issue listed below for an email exchange detailing the
confusion caused by the presence in the documentation of this
now-removed information.
Nizamudeen A [Tue, 26 Sep 2023 16:08:51 +0000 (21:38 +0530)]
mgr/dashboard: start using alertmanager v2
I was looking into sorting the alerts and saw there is an api v2 for
alertmanager which also has an endpoint like `alerts/groups` which might
be something that is useful for us.
Pierre Riteau [Mon, 22 Apr 2024 09:28:53 +0000 (11:28 +0200)]
doc/rados: fix outdated value for ms_bind_port_max
The highest port number used by OSD or MDS daemons was increased from
7300 to 7568 in [1] but the documentation still refers to 7300 in
multiple locations.
[1] https://github.com/ceph/ceph/pull/42210
Fixes: https://tracker.ceph.com/issues/65609
Signed-off-by: Pierre Riteau <pierre@stackhpc.com>
(cherry picked from commit 23d2740241af2118652fef6e7d6a286f338a18f2)
Incorporate the material in /doc/rados/operations/pg-repair into
/doc/rados/troubleshooting/troubleshooting-pg. Remove
/doc/rados/operations/pg-repair from the documentation. Redirect all
links to the old location to the new location.
Replace the ".. graphviz" directive with an ".. image" directive that
correctly displays an image where previously an unusably zoomed-in image
appeared.
Rishabh Dave [Thu, 18 Apr 2024 08:59:15 +0000 (14:29 +0530)]
qa/vstart_runner: increase timeout for vstart.sh command
Since the timeout bug was fixed (https://tracker.ceph.com/issues/65533),
"Ceph API tests" sometimes fails because the vstart.sh command has to be
aborted due to a timeout.
Currently, "timeout" is set to 300 seconds, which is sometimes not enough
for vstart.sh to run successfully for the "Ceph API tests" CI job. 180
seconds usually suffices for vstart.sh to run successfully when used for
CephFS.
Increase the value of "timeout" to avoid such failures on "Ceph API tests" CI.
luo rixin [Tue, 16 Apr 2024 07:18:06 +0000 (15:18 +0800)]
install-deps: save and restore user's XDG_CACHE_HOME
Since ccache 4.0, ccache uses $XDG_CACHE_HOME/ccache to keep the compile cache
if XDG_CACHE_HOME is set. If $XDG_CACHE_HOME is overwritten,
ccache will use $XDG_CACHE_HOME/ccache (ccache creates the directory if it does not exist) to
store the compile cache, but that $XDG_CACHE_HOME is removed on the next run,
so the ccache contents are always removed. So save and restore the user's XDG_CACHE_HOME.
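The save-and-restore pattern described above can be sketched as follows. The real change is in a shell script; this is only a Python analogue, and the helper name `temporary_env` is hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name, value):
    """Set an environment variable for a block, then restore the old state.

    Hypothetical helper: it only illustrates the save/restore idea.
    """
    saved = os.environ.get(name)  # None means the variable was unset
    os.environ[name] = value
    try:
        yield
    finally:
        if saved is None:
            # The variable did not exist before, so remove it again.
            os.environ.pop(name, None)
        else:
            # Put the user's original value back.
            os.environ[name] = saved
```

Restoring in a `finally` clause means the user's value comes back even if the wrapped step fails.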
Fixes: https://tracker.ceph.com/issues/65175
Signed-off-by: luo rixin <luorixin@huawei.com>
(cherry picked from commit a17342147d4411211ecf646730987d2633dabb6e)
Instruct readers to use "mkdir /mnt/cephfs1" to create a mountpoint
before using "ceph-fuse" to mount a filesystem, if "/mnt/cephfs1"
doesn't already exist. cf.
https://github.com/ceph/ceph/pull/56831#discussion_r1561102227
Matt Benjamin [Wed, 27 Mar 2024 22:33:56 +0000 (18:33 -0400)]
rgwlc: check for no-bucket at bucket_lc_process() preamble
Avoids a trivial segfault when dereferencing the bucket pointer.
Fixes: https://tracker.ceph.com/issues/65188
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit d5f6fe772f83d9e6b1ebaafdb1e8274041b0d684)
Tobias Urdin [Thu, 18 Jan 2024 09:29:05 +0000 (09:29 +0000)]
rgw: invalidate and retry keystone admin token
We validate client tokens against the Keystone API by
sending our own "admin token" that is allowed to look up
client tokens.
This "admin token" is cached and upon checking the cache
we verify the expiration on the token before using it but
we have no logic to invalidate the cache if the response
from the Keystone API says that the "admin token" is invalid.
Since we don't invalidate it and it still has not expired,
it will stay in our cache and continue to cause Swift API
requests for clients to be dropped because of the invalid
admin token, until the service is restarted, the admin token is
expired (which it can already be), the whole cache is dropped,
or TokenCache::invalidate() is called on the admin token.
There are probably multiple places in Keystone where it
invalidates tokens, but one example where the "admin token"
would be invalidated and return an HTTP 401 status code is when
the user that is configured in rgw_keystone_admin_user has
its password changed (even if it's the same password as the
current one). Keystone will then invalidate its cache and
invalidate existing tokens even if they have not expired yet.
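A minimal sketch of the invalidate-and-retry idea, assuming a 401 response means the cached admin token was rejected. This is not the rgw implementation (which is C++); the class and function names below are hypothetical and only model the control flow.

```python
from typing import Callable, Optional


class AdminTokenCache:
    """Hypothetical stand-in for a cache holding a single admin token."""

    def __init__(self) -> None:
        self._token: Optional[str] = None

    def get(self) -> Optional[str]:
        return self._token

    def put(self, token: str) -> None:
        self._token = token

    def invalidate(self) -> None:
        self._token = None


def validate_client_token(
    cache: AdminTokenCache,
    fetch_admin_token: Callable[[], str],
    check: Callable[[str, str], int],
    client_token: str,
) -> int:
    """Return the HTTP status of the validation call, retrying once on 401."""
    status = 401
    for _attempt in range(2):
        admin_token = cache.get()
        if admin_token is None:
            # Nothing cached (or just invalidated): request a new admin token.
            admin_token = fetch_admin_token()
            cache.put(admin_token)
        status = check(admin_token, client_token)
        if status != 401:
            return status
        # The admin token was rejected: drop it so the retry fetches a new one.
        cache.invalidate()
    return status
```

Limiting the loop to two attempts keeps a persistently failing Keystone from causing unbounded retries.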
test_multi.py:test_object_sync is updated to reproduce the issue.
Without the fix, objects "." and ".." are not replicated and the test
fails (times out).
The function is typically invoked on client errors like NoSuchBucket. Logging these errors with level 1 may initially suggest a significant issue, when in fact it's just a client error. Consider raising the logging level to 20 for better clarity.
- Moves the "features" section in the rbd image create form to the "Advanced" section.
- Makes the rbd configuration section expanded by default rather than
collapsed, as it has only a single section. This improves the user experience because it no longer
requires two clicks.
- Updates the e2e test.
Adam King [Thu, 4 Apr 2024 18:11:11 +0000 (14:11 -0400)]
mgr/cephadm: make client-keyring deploying ceph.conf optional
There are cases where users would like to manage their own
ceph.conf but still have cephadm deploy the client keyrings,
so this is being added to facilitate that.
Shachar Sharon [Wed, 13 Mar 2024 14:43:29 +0000 (16:43 +0200)]
qa/suites/orch: add minimal smb non-AD test
Test minimal SMB deployment over CephFS, using local users (non-AD).
Upon successful deployment, run a minimal smbclient command ('ls') to probe
Samba's share liveness.
Co-authored-by: John Mulligan <jmulligan@redhat.com>
Signed-off-by: Shachar Sharon <ssharon@redhat.com>
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 8bb5fb69648f497da80c97011e171dff23c5130d)
John Mulligan [Fri, 15 Mar 2024 17:48:35 +0000 (13:48 -0400)]
qa/tasks: add a cephadm samba container helper func independent of AD DC
To have the standalone (non-AD) server test function similarly to the AD
member server test we need to set a variable for samba client container
command similar to how the AD setup command does it.
John Mulligan [Sat, 24 Feb 2024 15:52:53 +0000 (10:52 -0500)]
qa/suites/orch: add a new smb service cephadm sub-suite and test
Start a new subdir under cephadm suite for the new smb service
that cephadm can deploy. Add one new test that checks that a
smb service with domain membership can be deployed and connect
to it with smbclient from the samba client container image.
John Mulligan [Tue, 27 Feb 2024 14:48:25 +0000 (09:48 -0500)]
qa/tasks: add error condition to exec functions
Looking at the code that expands `all-roles` and `all-hosts` there's no
proper error checking for when these values appear but there are >1
top-level roles in the task config. If a user does this it'll fail
but in a somewhat unclear manner. Add a new condition that raises a
clear exception in this case hopefully saving someone future debugging
time.
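The expansion and the new error condition might look roughly like the sketch below. It assumes host roles are distinguished by a `host.` prefix and that the config maps role names to command lists; the function name `expand_roles` and its signature are hypothetical, not the actual teuthology code.

```python
from typing import Dict, List


def expand_roles(config: Dict[str, list], all_roles: List[str]) -> Dict[str, list]:
    """Expand 'all-roles' / 'all-hosts' into one entry per matching role.

    Assumption: role names starting with 'host.' are host roles.
    """
    for key in ("all-roles", "all-hosts"):
        if key not in config:
            continue
        if len(config) > 1:
            # Fail loudly instead of silently ignoring the special key.
            raise ValueError(
                f"{key!r} may not be combined with other top-level roles"
            )
        commands = config[key]
        want_hosts = key == "all-hosts"
        return {
            role: commands
            for role in all_roles
            if role.startswith("host.") == want_hosts
        }
    # No special key present: the config is used as written.
    return config
```

Putting the check in one helper means every caller reports the same error message for the same mistake.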
John Mulligan [Tue, 27 Feb 2024 14:44:51 +0000 (09:44 -0500)]
qa/tasks: reduce duplicated code
All `exec`-style functions in teuthology appear to have a transformation
block that expands names like `all-roles` and `all-hosts`. With the new
cephadm.exec task that block appeared twice in cephadm.py. This change
removes the duplication by creating an _expand_roles function that
can be called from the command executing functions.
John Mulligan [Mon, 26 Feb 2024 21:17:22 +0000 (16:17 -0500)]
qa/tasks: add a template filter to map a role name to a remote
Add a `role_to_remote` template filter function that has the ability to
map a role name to a remote. Attributes of the remote can then be
used to get the actual node ip or name.
John Mulligan [Mon, 26 Feb 2024 21:16:57 +0000 (16:16 -0500)]
qa/tasks: a new cephadm exec task similar to vip.exec but generalized
Add a new cephadm.exec task that works similarly to the existing
vip.exec but instead of only considering VIP related string replacements
it uses the templating feature that was recently added to the
cephadm module for generalized string templating.
John Mulligan [Mon, 26 Feb 2024 18:47:04 +0000 (13:47 -0500)]
qa/tasks: add a cephadm.exclude role
Add a cephadm.exclude role that excludes a test node from cluster setup
and related commands. I need this as I have a test node that will be set
up as an AD Domain Controller for testing Samba and do not want that
node to have *any* other services running on it.
John Mulligan [Sat, 24 Feb 2024 19:26:36 +0000 (14:26 -0500)]
qa/tasks: allow passing stdin string to cephadm shell commands
There are cases where I want to pass some large-ish strings to ceph
commands executed via cephadm shell. Allow items within the commands
list to be dicts containing a command (as before) and an optional
stdin variable. This change also leaves room for possible future
extensions.
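One way to accept both plain strings and dicts in the commands list is sketched below. The dict keys `cmd` and `stdin` and the helper name `normalize_command` are assumptions for illustration; the commit text does not spell out the exact key names.

```python
from typing import Optional, Tuple, Union

CommandItem = Union[str, dict]


def normalize_command(item: CommandItem) -> Tuple[str, Optional[str]]:
    """Return (command, stdin) for a commands-list entry.

    Hypothetical shape: a str is a bare command; a dict carries the
    command under 'cmd' and optional input text under 'stdin'.
    """
    if isinstance(item, str):
        return item, None
    if isinstance(item, dict):
        return item["cmd"], item.get("stdin")
    raise TypeError(f"unsupported command item: {item!r}")
```

Because the dict form is open-ended, further optional keys could be added later without changing how plain string entries behave.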
John Mulligan [Tue, 20 Feb 2024 23:28:58 +0000 (18:28 -0500)]
qa/tasks: add a new cephadm task for setting up samba ad dc
Add a new task function to cephadm.py that sets up a container running
the Samba based domain controller on a node using podman or docker.
Much of the function actually deals with disabling systemd-resolved
because that service conflicts with the DNS server component of the DC.
John Mulligan [Fri, 5 Jan 2024 15:45:08 +0000 (10:45 -0500)]
mgr/cephadm: simplify _get_container_image a bit
Because the "if-ladder" was only ever assigning a single variable with
a value it can be directly replaced by a dict & dict-lookup which is
much more succinct.
Also take the opportunity to sort the (non-comment) lines as there's
no meaning to the previous order and this makes it easier for a reader
to scan through.
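The if-ladder to dict-lookup transformation can be illustrated with a small sketch. The image names and the function signature here are placeholders, not the values or the signature used in cephadm.

```python
# Placeholder mapping, sorted by key so a reader can scan it quickly.
DEFAULT_IMAGES = {
    "alertmanager": "example.invalid/alertmanager",
    "grafana": "example.invalid/grafana",
    "prometheus": "example.invalid/prometheus",
}


def get_container_image(daemon_type: str,
                        fallback: str = "example.invalid/ceph") -> str:
    """Look up the image for a daemon type, replacing a chain of if/elif.

    Each branch of the old ladder only assigned one variable, so a
    single dict lookup with a default expresses the same thing.
    """
    return DEFAULT_IMAGES.get(daemon_type, fallback)
```

Adding a new daemon type then becomes a one-line change to the mapping rather than a new branch.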
John Mulligan [Thu, 4 Jan 2024 21:38:08 +0000 (16:38 -0500)]
mgr/cephadm: add various touch points to enable smb service
Add the smb service by name or by type to one of the many, many touch
points in the orchestrator and cephadm packages needed to get the
orchestrator aware of smb.
John Mulligan [Thu, 14 Dec 2023 00:20:45 +0000 (19:20 -0500)]
python-common: reformat ServiceSpec class level service type lists
Reformat the ServiceSpec class properties KNOWN_SERVICE_TYPES and
REQUIRES_SERVICE_ID. These were previously strings that were converted
to lists via a call to split. With a string there's very little a human
or a tool can do to validate the content. Changing these into proper
lists in the source code brings clarity of intent and the ability to
analyze the code. Because there's no semantic difference what services
are listed where (this means the type could probably be a set - a quest
for another day) I also took the opportunity to sort the contents of the
lists and add some basic comments for what these lists are for.
It also removes the use of (ugly, IMO) line continuations. The downside
is that it makes more total lines, but if that bugs you - use code
folding :-).
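The difference between the two styles can be sketched as follows. The entries are an abbreviated, illustrative subset and the `_OLD` name is hypothetical; this is not the actual content of the ServiceSpec lists.

```python
# Old style: a single string with a line continuation, split at import time.
# Tools see only an opaque string, so typos and duplicates are hard to catch.
KNOWN_SERVICE_TYPES_OLD = 'mon mgr osd rgw ' \
                          'nfs smb'.split()

# New style: an explicit, sorted list with one entry per line.
# Each entry shows up as its own line in diffs and can be checked by linters.
KNOWN_SERVICE_TYPES = [
    'mgr',
    'mon',
    'nfs',
    'osd',
    'rgw',
    'smb',
]
```

Both forms produce the same set of names; only the order and the readability of the source differ.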
John Mulligan [Fri, 5 Jan 2024 15:24:10 +0000 (10:24 -0500)]
mgr/cephadm: refactor keyring simplification out of get_keyring_with_caps
Refactor get_keyring_with_caps such that the keyring simplification code
is moved into a new function that can be used in other locations.
get_keyring_with_caps will now call the new function to return the
simplified & consistent keyring output.
John Mulligan [Wed, 13 Dec 2023 20:49:12 +0000 (15:49 -0500)]
mgr/cephadm: reformat the _service_classes variable
Reformat the _service_classes variable so that it uses a multi-line list
with a single item on each line in a more black-ish style that is more
readable (especially if you use code-folding wisely).
Sort the list while we're at it.
John Mulligan [Wed, 13 Dec 2023 21:05:27 +0000 (16:05 -0500)]
mgr/orchestrator: fix the sorting of the imports
While ceph doesn't enforce sorted imports, I prefer them when possible. I
had once sorted these imports but then nvmeof came along and ruined
things. Put nvmeof back in its place.