The function is typically invoked on client errors like NoSuchBucket. Logging these errors at level 1 may suggest a significant issue when it is in fact only a client error. Consider raising the logging level to 20 for better clarity.
- Moves the "features" section of the rbd image create form to the "Advanced" section.
- Makes the rbd configuration section expanded by default rather than
collapsed, since it contains only a single section. This improves the user
experience by no longer requiring two clicks.
- Updates the e2e test.
Shachar Sharon [Wed, 13 Mar 2024 14:43:29 +0000 (16:43 +0200)]
qa/suites/orch: add minimal smb non-AD test
Test minimal SMB deployment over CephFS, using local users (non-AD).
Upon successful deployment, run a minimal smbclient command ('ls') to probe
the liveness of Samba's share.
Co-authored-by: John Mulligan <jmulligan@redhat.com>
Signed-off-by: Shachar Sharon <ssharon@redhat.com>
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 8bb5fb69648f497da80c97011e171dff23c5130d)
John Mulligan [Fri, 15 Mar 2024 17:48:35 +0000 (13:48 -0400)]
qa/tasks: add a cephadm samba container helper func independent of AD DC
For the standalone (non-AD) server test to function like the AD member
server test, we need to set a variable for the samba client container
command, similar to how the AD setup command does it.
John Mulligan [Sat, 24 Feb 2024 15:52:53 +0000 (10:52 -0500)]
qa/suites/orch: add a new smb service cephadm sub-suite and test
Start a new subdir under the cephadm suite for the new smb service that
cephadm can deploy. Add one new test that checks that an smb service with
domain membership can be deployed and that smbclient, from the samba client
container image, can connect to it.
John Mulligan [Tue, 27 Feb 2024 14:48:25 +0000 (09:48 -0500)]
qa/tasks: add error condition to exec functions
The code that expands `all-roles` and `all-hosts` has no proper error
checking for when these values appear while there is more than one
top-level role in the task config. If a user does this, the task fails, but
in a somewhat unclear manner. Add a new condition that raises a clear
exception in this case, hopefully saving someone future debugging time.
John Mulligan [Tue, 27 Feb 2024 14:44:51 +0000 (09:44 -0500)]
qa/tasks: reduce duplicated code
All `exec`-style functions in teuthology appear to have a transformation
block that expands names like `all-roles` and `all-hosts`. With the new
cephadm.exec task, that block appeared twice in cephadm.py. This change
removes the duplication by creating an _expand_roles function that can be
called from the command-executing functions.
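A minimal Python sketch of the expansion step described in the two commits above, including the clearer error raised when a special key is mixed with other top-level roles. The function name `expand_roles`, the config shape (role name mapped to a list of commands), and how the per-host roles are chosen are assumptions for illustration; this is not the code in cephadm.py.

```python
from typing import Dict, List, Sequence

# Hypothetical sketch of an _expand_roles-style helper; not the actual cephadm.py code.
# Assumes the task config maps role names (or one special key) to a list of commands.
SPECIAL_KEYS = ("all-roles", "all-hosts")


def expand_roles(
    config: Dict[str, List[str]],
    all_roles: Sequence[str],
    host_roles: Sequence[str],
) -> Dict[str, List[str]]:
    """Replace a special key with one entry per concrete role."""
    targets = {"all-roles": all_roles, "all-hosts": host_roles}
    for key in SPECIAL_KEYS:
        if key not in config:
            continue
        # A special key must be the only top-level entry; fail loudly otherwise.
        if len(config) > 1:
            raise ValueError(
                f"{key!r} cannot be combined with other top-level roles: "
                f"{sorted(config)}"
            )
        return {role: list(config[key]) for role in targets[key]}
    # No special key present: the config already names concrete roles.
    return config
```

Because every `exec`-style caller goes through one helper, the error check only has to be written once.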
John Mulligan [Mon, 26 Feb 2024 21:17:22 +0000 (16:17 -0500)]
qa/tasks: add a template filter to map a role name to a remote
Add a `role_to_remote` template filter function that maps a role name to
a remote. Attributes of the remote can then be used to get the node's
actual IP address or name.
John Mulligan [Mon, 26 Feb 2024 21:16:57 +0000 (16:16 -0500)]
qa/tasks: a new cephadm exec task similar to vip.exec but generalized
Add a new cephadm.exec task that works similarly to the existing
vip.exec, but instead of only considering VIP-related string replacements,
it uses the templating feature recently added to the cephadm module for
generalized string templating.
John Mulligan [Mon, 26 Feb 2024 18:47:04 +0000 (13:47 -0500)]
qa/tasks: add a cephadm.exclude role
Add a cephadm.exclude role that excludes a test node from cluster setup
and related commands. I need this because I have a test node that will be
set up as an AD Domain Controller for testing Samba, and I do not want that
node to have *any* other services running on it.
John Mulligan [Sat, 24 Feb 2024 19:26:36 +0000 (14:26 -0500)]
qa/tasks: allow passing stdin string to cephadm shell commands
There are cases where I want to pass some large-ish strings to ceph
commands executed via cephadm shell. Allow items within the commands
list to be dicts containing a command (as before) and an optional
stdin variable. This change also leaves room for possible future
extensions.
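The mixed commands list described above can be normalized up front so the executor handles one shape. This is a hedged sketch: the `cmd` and `stdin` key names and the `normalize_commands` helper are assumptions, not the keys the task actually uses.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical sketch: each item is either a plain command string (the old form)
# or a dict holding a command plus optional stdin. The "cmd"/"stdin" keys are assumed.
CommandItem = Union[str, dict]


def normalize_commands(items: List[CommandItem]) -> List[Tuple[str, Optional[str]]]:
    """Return (command, stdin) pairs; stdin is None when not supplied."""
    result: List[Tuple[str, Optional[str]]] = []
    for item in items:
        if isinstance(item, str):
            result.append((item, None))
        elif isinstance(item, dict):
            if "cmd" not in item:
                raise ValueError(f"command dict is missing 'cmd': {item!r}")
            result.append((item["cmd"], item.get("stdin")))
        else:
            raise TypeError(f"unsupported command item: {item!r}")
    return result
```

Using a dict rather than a tuple is what leaves room for future keys without breaking existing YAML.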
John Mulligan [Tue, 20 Feb 2024 23:28:58 +0000 (18:28 -0500)]
qa/tasks: add a new cephadm task for setting up samba ad dc
Add a new task function to cephadm.py that sets up a container running
the Samba-based domain controller on a node using podman or docker.
Much of the function actually deals with disabling systemd-resolved
because that service conflicts with the DNS server component of the DC.
John Mulligan [Fri, 5 Jan 2024 15:45:08 +0000 (10:45 -0500)]
mgr/cephadm: simplify _get_container_image a bit
Because the "if-ladder" only ever assigned a value to a single variable,
it can be directly replaced by a dict & dict-lookup, which is much more
succinct.
Also take the opportunity to sort the (non-comment) lines, as the
previous order had no meaning and sorting makes it easier for a reader
to scan through.
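The if-ladder to dict-lookup change can be illustrated with a small before/after pair. The daemon types and option names below are illustrative assumptions and do not reproduce the real `_get_container_image` mapping.

```python
from typing import Dict, Optional

# Illustrative mapping only; sorted so a reader can scan it, as the commit suggests.
_IMAGE_OPTIONS: Dict[str, str] = {
    "alertmanager": "container_image_alertmanager",
    "grafana": "container_image_grafana",
    "prometheus": "container_image_prometheus",
}


def image_option_ladder(daemon_type: str) -> Optional[str]:
    """Before: an if-ladder whose only job is to assign one variable."""
    option = None
    if daemon_type == "grafana":
        option = "container_image_grafana"
    elif daemon_type == "alertmanager":
        option = "container_image_alertmanager"
    elif daemon_type == "prometheus":
        option = "container_image_prometheus"
    return option


def image_option_lookup(daemon_type: str) -> Optional[str]:
    """After: the same mapping as a single dict lookup (None when unknown)."""
    return _IMAGE_OPTIONS.get(daemon_type)
```

The two functions return identical results for every input, which is what makes the refactor safe.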
John Mulligan [Thu, 4 Jan 2024 21:38:08 +0000 (16:38 -0500)]
mgr/cephadm: add various touch points to enable smb service
Add the smb service, by name or by type, to the many, many touch
points in the orchestrator and cephadm packages needed to make the
orchestrator aware of smb.
John Mulligan [Thu, 14 Dec 2023 00:20:45 +0000 (19:20 -0500)]
python-common: reformat ServiceSpec class level service type lists
Reformat the ServiceSpec class properties KNOWN_SERVICE_TYPES and
REQUIRES_SERVICE_ID. These were previously strings that were converted
to lists via a call to split. With a string, there's very little a human
or a tool can do to validate the content. Changing these into proper
lists in the source code brings clarity of intent and the ability to
analyze the code. Because the order in which services are listed has no
semantic meaning (so the type could probably be a set, a quest for
another day), I also took the opportunity to sort the contents of the
lists and add some basic comments explaining what these lists are for.
It also removes the use of (ugly, IMO) line continuations. The downside
is that it makes more total lines, but if that bugs you, use code
folding :-).
John Mulligan [Fri, 5 Jan 2024 15:24:10 +0000 (10:24 -0500)]
mgr/cephadm: refactor keyring simplification out of get_keyring_with_caps
Refactor get_keyring_with_caps such that the keyring simplification code
is moved into a new function that can be used in other locations.
get_keyring_with_caps will now call the new function to return the
simplified & consistent keyring output.
John Mulligan [Wed, 13 Dec 2023 20:49:12 +0000 (15:49 -0500)]
mgr/cephadm: reformat the _service_classes variable
Reformat the _service_classes variable so that it uses a multi-line list
with a single item on each line in a more black-ish style that is more
readable (especially if you use code-folding wisely).
Sort the list while we're at it.
John Mulligan [Wed, 13 Dec 2023 21:05:27 +0000 (16:05 -0500)]
mgr/orchestrator: fix the sorting of the imports
While ceph doesn't enforce sorted imports, I prefer them when possible. I
had once sorted these imports, but then nvmeof came along and ruined
things. Put nvmeof back in its place.
John Mulligan [Wed, 13 Dec 2023 19:33:20 +0000 (14:33 -0500)]
mgr/cephadm: fix test failure on newer python
Tests that touch this enum fail for me locally but pass in the CI. This
seems to be due to new enum-related behavior in Python 3.11.
See: https://blog.pecar.me/python-enum
Instead of fixing it as suggested in the above blog post, add a __str__
method, which works on all Python versions I care to know about.
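For context, Python 3.11 changed `Enum.__format__()` so that f-strings and `format()` follow `Enum.__str__()`; for a `str`-mixin enum that means `f"{member}"` yields `ClassName.MEMBER` instead of the value. A minimal sketch of the `__str__` fix, using a hypothetical enum since the real one in mgr/cephadm is not shown here:

```python
import enum


class ExampleMode(str, enum.Enum):
    """Hypothetical str-mixin enum standing in for the one the tests touch."""

    FAST = "fast"
    SLOW = "slow"

    def __str__(self) -> str:
        # Return the plain value so str() and f-strings agree before and after 3.11.
        return self.value
```

With `__str__` defined on the class, both the pre-3.11 and 3.11+ `Enum.__format__` implementations format the member via `str(self)`, so output no longer depends on the interpreter version.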
John Mulligan [Tue, 16 Jan 2024 20:37:27 +0000 (15:37 -0500)]
cephadm: fix issue joining to ad by using a virtual hostname
The not-a-real-fqdn hostname that the containers got was causing
performance issues when joining AD (and running testjoin and winbind).
Define a virtual hostname that can be passed in from the service or
automatically derived from the system's hostname.
John Mulligan [Wed, 6 Dec 2023 20:14:32 +0000 (15:14 -0500)]
cephadm: import and enable deployment of SMB daemon class
Enable the use of the SMB container daemon form class by importing, and
thus registering, it. Note that the only way to invoke this feature is
by hand-rolling some JSON to feed to the `ceph _orch deploy` command.
Connecting this with the cephadm mgr module is left as a future task.
John Mulligan [Wed, 6 Dec 2023 20:14:31 +0000 (15:14 -0500)]
cephadm: add an SMB daemon module and classes
Add an incomplete but largely viable SMB/Samba container daemon form
implementation to cephadm. Currently unused but it lays out some of the
basics needed to set up smb sharing using samba containers under cephadm
orchestration.
John Mulligan [Sun, 3 Dec 2023 16:01:05 +0000 (11:01 -0500)]
cephadm: add generic methods for sharing namespaces across containers
In the future, some sidecar containers will need to share namespaces
with the primary container (or each other). Make it easy to set this up
by creating an enable_shared_namespaces function and a Namespace enum.
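One plausible shape for such a helper is to append podman/docker run options of the form `--<ns>=container:<name>`, which join another container's namespace. This sketch is an assumption: the enum members, function signature, and the choice to emit CLI arguments are illustrative and not necessarily how cephadm implements it.

```python
import enum
from typing import Iterable, List


class Namespace(enum.Enum):
    """Hypothetical namespace kinds; values double as podman/docker option names."""

    ipc = "ipc"
    network = "network"
    pid = "pid"


def enable_shared_namespaces(
    args: List[str], primary_container: str, namespaces: Iterable[Namespace]
) -> None:
    """Append run options so a sidecar joins the primary container's namespaces."""
    for ns in namespaces:
        args.append(f"--{ns.value}=container:{primary_container}")
```

Keeping the namespace kinds in an enum means callers cannot request an option name the runtime would reject.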
John Mulligan [Thu, 22 Feb 2024 18:49:10 +0000 (13:49 -0500)]
qa/tasks: add templating functions to cephadm module
Add functions to cephadm.py that will later be used to template
strings within the yaml files in the cephadm suites. This will be used
to replace the specific subst_vip call with generic calls that let
tests access "any" variables stored on the test ctx.
John Mulligan [Tue, 20 Feb 2024 15:09:50 +0000 (10:09 -0500)]
qa/tasks: fix VIPs log line
While testing that my previous patches were correct, I noticed that the
string here was logged exactly as written, and was thus pretty useless. This
was probably meant to be an f-string, so make it one. Also get rid of
the unnecessary map call; the list and IP address type can repr
themselves just fine IMO.
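The difference between the plain string and the f-string is easy to see directly. The message wording below is hypothetical; the point is that without the `f` prefix the braces are emitted verbatim, and that a list of `ipaddress` objects formats itself without a `map` call.

```python
import ipaddress
import logging

log = logging.getLogger(__name__)

vips = [ipaddress.ip_address("10.0.0.10"), ipaddress.ip_address("10.0.0.11")]

# Before: a plain literal, so "{vips}" is logged as-is.
before = "VIPs are {vips}"
# After: an f-string; the list's str() uses repr() of each IPv4Address.
after = f"VIPs are {vips}"
log.info(after)
```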
John Mulligan [Tue, 20 Feb 2024 00:14:52 +0000 (19:14 -0500)]
qa/tasks: change map_vips to raise exceptions instead of returning None
None of the callers of map_vips ever checks for a None return, so
instead of handling any error condition, the tests would always just blow
up with a semi-obscure TypeError. Convert the function to always
raise an exception (one that tries to briefly explain the condition)
when something goes wrong. I also take the opportunity to make the
logging clearer and reduce an indentation level.
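The raise-instead-of-None pattern can be sketched with a small lookup. The `find_network` helper and `VIPMappingError` class are hypothetical and much simpler than `map_vips`; they only show how an explicit exception replaces a `None` that callers never checked.

```python
import ipaddress
from typing import Sequence, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class VIPMappingError(Exception):
    """Raised with a short explanation instead of silently returning None."""


def find_network(host_ip: str, networks: Sequence[str]) -> IPNetwork:
    """Hypothetical helper: return the first network that contains host_ip."""
    addr = ipaddress.ip_address(host_ip)
    for net_str in networks:
        net = ipaddress.ip_network(net_str)
        # Membership is simply False when address and network versions differ.
        if addr in net:
            return net
    # Returning None here would push a confusing TypeError onto the caller.
    raise VIPMappingError(
        f"host address {addr} is not in any of the networks {list(networks)}"
    )
```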
Adam King [Mon, 8 Apr 2024 19:11:02 +0000 (15:11 -0400)]
qa/cephadm: only fail on CEPHADM_ error in logs
Rather than failing for any instance of
[ERR], [WRN], or [SEC]. The orch/cephadm suite
does a lot of things that can cause these various
warnings to appear briefly. Trying to catch all
cases has been difficult, and the suite has been
red for some time. This patch makes it so we
instead only match log messages that include
CEPHADM_ in addition to having [ERR], [WRN],
or [SEC], as those warnings are the ones
that have actually led us to cephadm bugs, while
the others are almost always just noise in
these tests. This patch does not apply this
to the mds_upgrade_sequence, nfs, or rbd-iscsi
sections, as those are symlinked from other suites
and I didn't want to directly affect those suites'
tests with this change.
Adam King [Mon, 8 Apr 2024 18:27:26 +0000 (14:27 -0400)]
qa/tasks/cephadm: add option to limit what matches in log error scraping
This is specifically being added with the orch/cephadm suite
in mind, where coming up with a viable ignorelist has proved
difficult. The orch testing performs a lot of actions that can
cause things like an OSD or MON daemon to be down very
briefly, and I've found that the vast majority of the time we
really don't want to fail the test when these pop up, as cephadm
testing really only benefits from catching the CEPHADM_
errors/warnings rather than every possible one. Rather than continuing
to play whack-a-mole with the errors in the logs, this
patch should allow us to limit what we fail on, to at
least get the suite in a good spot again. We can always
phase out the uses of this new "log-only_match" option
later in a more controlled way, and adding it shouldn't
affect log scraping for any of the tests that aren't
facing a similar issue.
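The matching rule from the two commits above can be sketched in Python: a line is a failure candidate if it carries [ERR], [WRN], or [SEC], is not ignorelisted, and, when an only-match list is given, also matches one of its patterns (e.g. `CEPHADM_`). This is not the teuthology scraping code; the function name, parameters, and sample log lines are hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Lines at these severities are candidates for failing the run.
_SEVERITY = re.compile(r"\[(ERR|WRN|SEC)\]")


def scrape_log(
    lines: Iterable[str],
    ignorelist: Iterable[str] = (),
    only_match: Optional[List[str]] = None,
) -> List[str]:
    """Return the log lines that should fail the test."""
    ignore = [re.compile(p) for p in ignorelist]
    only = [re.compile(p) for p in only_match] if only_match else None
    hits: List[str] = []
    for line in lines:
        if not _SEVERITY.search(line):
            continue
        if any(p.search(line) for p in ignore):
            continue
        # With only_match set, a line must also match one of its patterns.
        if only is not None and not any(p.search(line) for p in only):
            continue
        hits.append(line)
    return hits
```

Leaving `only_match` unset keeps the old behavior, which matches the commit's claim that other suites are unaffected.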
qa/rgw/s3tests: remove 'client.0' from bucket prefix
new sns test cases are using this for topic names, but the '.' is not
allowed there:
> api_params = {'Name': 'test-client.0-n3bdgre5el2jk8v-606'}
> botocore.exceptions.ClientError: An error occurred (InvalidArgument) when calling the CreateTopic operation: Name must be made up of only uppercase and lowercase ASCII letters, numbers, underscores, and hyphens
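The character rule quoted in the error above can be checked locally before a test creates a topic. A minimal sketch; it enforces only the character set from that message and deliberately does not check any length limit.

```python
import re

# Only ASCII letters, digits, underscores, and hyphens, per the CreateTopic error text.
_TOPIC_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_topic_name(name: str) -> bool:
    """True if every character is allowed and the name is non-empty."""
    return _TOPIC_NAME.fullmatch(name) is not None
```

The old prefix fails because of the '.' in 'client.0'; dropping it from the bucket prefix yields a valid name.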
rgw/notify: support cross-tenant and cross-account notifications
a bucket's notification configuration may refer to topics from several
different tenants or accounts. when publishing to a given topic, look in
the correct namespace for each topic instead of defaulting to the
requesting user's tenant namespace
Casey Bodley [Tue, 12 Mar 2024 20:26:44 +0000 (16:26 -0400)]
rgw/pubsub: return 404 NotFound instead of NoSuchKey
repurpose the ERR_NOT_FOUND define, which was otherwise unused, to
customize the error response for sns apis, which return the NotFound
error code instead of s3's NoSuchKey:
Casey Bodley [Tue, 12 Mar 2024 23:08:50 +0000 (19:08 -0400)]
rgw/pubsub: notifications can refer to topics in other accounts/tenants
accounts can use topic policy to grant sns:Publish permissions to other
accounts. the PutBucketNotification op should expect TopicArns from
other accounts. the account name from each TopicArn should be used as
the 'tenant' argument for RGWPubSub's constructor so we look for the
topic in the right namespace
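A topic ARN has the general form `arn:partition:sns:region:account:topic`, so the account field can be split out and used as the namespace for the lookup. This Python sketch only illustrates that split; RGW's actual parsing is C++, and the `parse_topic_arn` name and sample account id are hypothetical.

```python
from typing import NamedTuple


class TopicArn(NamedTuple):
    partition: str
    region: str
    account: str
    topic: str


def parse_topic_arn(arn: str) -> TopicArn:
    """Split an SNS topic ARN; the account field selects the topic namespace."""
    # maxsplit=5 keeps the topic field intact even if it contains ':'.
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sns":
        raise ValueError(f"not an SNS topic ARN: {arn!r}")
    _, partition, _, region, account, topic = parts
    return TopicArn(partition, region, account, topic)
```

Usage: the `account` value would be what gets passed as the tenant/namespace argument, rather than the requester's own tenant.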
Casey Bodley [Tue, 12 Mar 2024 20:25:58 +0000 (16:25 -0400)]
rgw/pubsub: when present, use account id instead of tenant
RGWPubSub provides topic namespace isolation for tenants by adding
prefixes to rados object names and topic metadata keys. accounts use
this the same way
refactor verify_topic_owner_or_policy() to share the same interface
as similar functions like verify_user/bucket/object_permission()
from rgw_common.cc
in addition to the topic resource policy, this now also consults iam
identity policies like user, group, or role policy
for account users, this now implements cross-account policy evaluation.
this only comes into play for sns:Publish permissions though, because
the topics themselves are scoped to the account
Casey Bodley [Sat, 9 Mar 2024 16:08:17 +0000 (11:08 -0500)]
rgw/pubsub: do init/validation in init_processing()
verify_permission() should do permission checks and nothing else!
admin/system users ignore errors from verify_permission() and go on to
call execute() regardless. that means that execute() can't rely on any
initialization that happened during verify_permission(), at risk of
crashing on admin/system requests. it also means that any permission
checks in execute() won't get overridden for admin/system users,
breaking their superuser access
by moving all parameter validation and initialization into
init_processing(), we can prepare all the state that verify_permission()
will need to do its thing
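The ordering argument above can be modeled with a toy op. This is a Python illustration of the control flow only: RGW ops are C++, and `ToyTopicOp`, `process`, and `PermissionDenied` are hypothetical names that merely mirror the method names in the commit message.

```python
from typing import Optional


class PermissionDenied(Exception):
    pass


class ToyTopicOp:
    """Toy model of an op lifecycle; not RGW code."""

    def __init__(self, params: dict, is_admin: bool, allowed: bool):
        self.params = params
        self.is_admin = is_admin
        self.allowed = allowed
        self.topic_name: Optional[str] = None

    def init_processing(self) -> None:
        # All validation and state setup happens here, for every caller.
        name = self.params.get("Name")
        if not name:
            raise ValueError("missing required parameter: Name")
        self.topic_name = name

    def verify_permission(self) -> None:
        # Only a permission check; no state is initialized here.
        if not self.allowed:
            raise PermissionDenied(self.topic_name)

    def execute(self) -> str:
        # Relies only on state prepared by init_processing().
        return f"created {self.topic_name}"


def process(op: ToyTopicOp) -> str:
    op.init_processing()
    try:
        op.verify_permission()
    except PermissionDenied:
        if not op.is_admin:
            raise
        # Admin/system users ignore the error and go on to execute().
    return op.execute()
```

Because `execute()` never depends on anything `verify_permission()` sets up, the admin path that skips past a permission error still finds fully initialized state.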