Marcus Watts [Wed, 27 Apr 2022 22:50:56 +0000 (18:50 -0400)]
qa/rgw - run sse-s3 test cases only if configured or requested
This commit adds logic to automatically detect when sse-s3 is
available and if not, disables sse-s3 tests by default.
Configuration options are provided to override the default either way.
Marcus Watts [Fri, 4 Mar 2022 01:37:53 +0000 (20:37 -0500)]
rgw/crypt - fix rest call to fail if insufficient kms args supplied.
In s3-land, it is ok to supply incomplete kms args for bucket encryption
configuration, but not on the rest call. This fix distinguishes
between the two and errors out in the latter case.
The existing logic for bucket encryption was incomplete. This adds the
rest of the changes necessary to support sse-kms with default bucket
encryption.
The new logic has these changes:
On input: SSEAlgorithm is now optional.
On output: emit xmlns attribute at top level.
Also output BucketKeyEnabled and KMSMasterKeyID.
Handle "empty rule" case.
For testing and diagnostics:
support RGWBucketEncryptionConfig in ceph-dencoder.
Marcus Watts [Tue, 15 Feb 2022 01:02:34 +0000 (20:02 -0500)]
rgw/crypt - remove old parts path for sse attributes
crypt_attribute_map is the place where sse attributes
should be found by the rest of the sse logic. There is
no longer any need to feed "parts" down to the crypto
logic; this commit removes the old data path.
Marcus Watts [Fri, 28 Jan 2022 10:34:43 +0000 (05:34 -0500)]
rgw/crypt - generalize putbucketencryption.
The previous logic only supported putbucketencryption to enable
sse-s3. The protocol allows putbucketencryption to be used to
enable sse-kms by default, and the surrounding logic is now ready
to do this as well. This commit removes the checks which stopped
this from working, so that it is now possible to use putbucketencryption
to default either sse-s3 or sse-kms on.
Marcus Watts [Fri, 28 Jan 2022 10:32:14 +0000 (05:32 -0500)]
rgw/crypt - fix sse-s3 logic.
The previous logic path was overly eager to do sse-s3. This version
ensures that the "no-encryption" case does not default to sse-s3.
It also removes some argument sanity checking which is now done before
this code is reached.
Marcus Watts [Sat, 18 Dec 2021 04:16:09 +0000 (23:16 -0500)]
rgw/sse-s3: +get_encryption_defaults, use new crypt_attribute_map
putobj and postobj: get_encryption_defaults
this fetches bucketencryption policy and resolves defaults.
also errors for various conflicts between parameters (& policy).
verify_permisions
fetch encryption attributes from crypt_attribute_map not x_meta_map
for postobj, x_meta_map only gets meta attributes, not sse.
if bucketencryption policy exists, it *may* be correct to
prepopulate this before bucket policy sees it.
map_qs_metadata
for putobj it now also copies sse attributes into crypt_attribute_map.
Marcus Watts [Sat, 18 Dec 2021 04:13:09 +0000 (23:13 -0500)]
rgw/sse-s3: various improvements.
1. sse-s3 should not require bucketencryption policy, work w/ postobj
2. make bucket key name configurable
3. +rgw_remove_sse_s3_bucket_key
1. for sse-s3 should not require bucketencryption policy, work w/ postobj
get_crypt_attribute ->
using s->info.crypt_attribute_map instead of s->env to avoid having
to know about HTTP_X_AMZ_SERVER_SIDE_ENCRYPTION_CUSTOMER_ALGORITHM names,
crypt_attribute_get -> crypt_attributes.get
to consolidate crypt attribute sources
rework sse-s3 logic: sse-s3 can be specified entirely in the rest call,
so remove requirement that bucket has bucket encryption policy.
also avoid term "default encryption", prefer term "test key".
2. for make bucket key name configurable:
With this modification, sse-s3 key names default to being
the bucket id, but can be configured to instead consist
of the owners name, a fixed string, or variations thereof.
3. +rgw_remove_sse_s3_bucket_key
For sse-s3, keys are supposed to be managed entirely by s3.
This means when a bucket is removed, we should be removing its key,
which should no longer be in use for anything. This is only safe
if the key was constructed using "%bucket_id", otherwise it might be
used in another bucket and we can never remove it automatically.
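The key naming and removal rule above can be sketched as follows. This is a minimal illustration, not the rgw implementation: the function names are hypothetical, and only the "%bucket_id" placeholder comes from the commit message; "%owner_id" is an assumed placeholder standing in for the "owners name" variant.

```python
def expand_key_template(template, bucket_id, owner_id):
    """Expand a key-name template into a concrete sse-s3 key name.

    "%bucket_id" is the placeholder named in the commit message;
    "%owner_id" is an assumption made for this sketch.
    """
    return template.replace("%bucket_id", bucket_id).replace("%owner_id", owner_id)


def key_removable_with_bucket(template):
    """A key is only safe to remove with its bucket if the template
    includes "%bucket_id", because only then is the key name unique
    to that one bucket."""
    return "%bucket_id" in template
```

With a template of "%bucket_id" each bucket gets its own key, so the key can be dropped along with the bucket; a fixed string or an owner-based name may be shared by several buckets and has to be kept.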
Marcus Watts [Sat, 18 Dec 2021 04:09:56 +0000 (23:09 -0500)]
rgw/sse-s3: save sse attributes in req_state->crypt_attribute_map
req_state->crypt_attribute_map to save sse-s3 cryptographic attributes
this is not quite a duplicate of x_meta_map because I think some
of its uses conflict with sse-s3. (for instance, bucketencryption vs. signatures)
rgw: Adding SSE-S3 support in GET and PUT paths (using Vault as KMS)
Added the support to generate KEK based on bucket owner UID in
PutBucketEncryption. This is stored in bucket x-attrs. The KEK-ID is
later used in GET and PUT paths.
In the PUT path, we check if BucketEncryption is enabled for the bucket.
If yes, we determine if the encryption type is AES256 (i.e., SSE-S3),
then we fetch the KEK-ID from the bucket x-attrs and use it to wrap the
data key. Thereafter, we call generate-data-key. We store the KEK-ID
and the wrapped data-key in the object x-attrs.
In the GET path, we simply pull out the KEK-ID from the object x-attr
and decrypt the object.
Due to the lack of Windows support in Teuthology, the test case adopts
the following workaround:
* Deploy baremetal machine with `ubuntu_latest.yaml` and
configure it with libvirt KVM.
* Create a libvirt VM and provision it with Windows Server 2019, using
the official ISO from Microsoft.
* Configure SSH in the Windows VM, and run the tests remotely via SSH.
The implementation of the test case consists of workunit scripts.
`qa/workunits/windows/test_rbd_wnbd.py` is the main Python script
to test Ceph on Windows basic functionality. This is executed in the
libvirt VM configured with Windows Server 2019.
rgw/dbstore: Handle prefix/delim in Bucket::List op
Given a prefix, fetch only those objects matching the prefix.
In addition, skip the entries with "delim" and instead include
those entries in common_prefixes
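The prefix/delim behaviour described above can be sketched as follows. This is a minimal in-memory illustration of S3-style listing semantics, not the dbstore code; the function name and the plain list of keys are assumptions made for the example.

```python
def list_bucket(keys, prefix="", delim=""):
    """Return (objects, common_prefixes) for the given keys.

    Keys that do not start with `prefix` are skipped. If the remainder
    of a key (after the prefix) contains `delim`, the key is not listed
    as an object; instead it is rolled up into common_prefixes.
    """
    objects = []
    common_prefixes = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        if delim:
            rest = key[len(prefix):]
            idx = rest.find(delim)
            if idx >= 0:
                # Roll up everything through the first delimiter.
                rolled = prefix + rest[:idx + len(delim)]
                if rolled not in common_prefixes:
                    common_prefixes.append(rolled)
                continue
        objects.append(key)
    return objects, common_prefixes
```

For keys "a/1", "a/2", "a/b/3" and "c", listing with prefix "a/" and delim "/" yields the objects "a/1" and "a/2" plus the common prefix "a/b/".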
cmake/modules: always use the python3 specified in command line
if another python3 with higher version is found by
find_package(Python3), the cmake's install script would just
install the python modules/extensions into that python3's
dist-package directory, and the packaging script would fail
to find these artifacts when trying to package them.
so we need to ensure that the install directories for python
modules/extensions are always "versioned" with WITH_PYTHON3
cmake option.
Adam King [Wed, 6 Apr 2022 14:32:22 +0000 (10:32 -0400)]
mgr/cephadm: allow setting insecure_skip_verify for alertmanager
Add a "secure" parameter to alertmanager spec that will cause it
to deploy alertmanagers with insecure_skip_verify as true or false
depending on the value given for "secure".
NOTE: alertmanager must still be reconfigured after applying a yaml
with this option changed.
Fixes: https://tracker.ceph.com/issues/55272
Fixes: https://tracker.ceph.com/issues/55333
Signed-off-by: Adam King <adking@redhat.com>
Xiubo Li [Tue, 29 Mar 2022 08:45:12 +0000 (16:45 +0800)]
client: stop forwarding the request when exceeding 256 times
The type of 'num_fwd' in ceph 'MClientRequestForward' is 'int32_t',
while in 'ceph_mds_request_head' the type is '__u8'. So in case
the request bounces between MDSes exceeding 256 times, the client
will get stuck.
In this case it's usually a bug in the MDS, and continuing to bounce the
request makes no sense.
Fixes: https://tracker.ceph.com/issues/55129
Signed-off-by: Xiubo Li <xiubli@redhat.com>
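The width mismatch behind this bug can be illustrated with a short sketch. It only models how a 32-bit counter wraps when stored in an 8-bit field; the helper names and the exact form of the guard are assumptions for illustration, not the client code.

```python
import struct

MAX_FWD = 256  # cap taken from the commit subject; the exact comparison is an assumption


def wire_num_fwd(num_fwd):
    """Store an int32-style counter in a one-byte unsigned field,
    keeping only the low 8 bits as a '__u8' would."""
    return struct.pack("B", num_fwd & 0xFF)[0]


def should_stop_forwarding(num_fwd):
    """Hypothetical guard: give up once the forward count no longer
    fits in the one-byte field."""
    return num_fwd >= MAX_FWD
```

After 256 forwards the one-byte field wraps back to 0, so the two sides no longer agree on the forward count; stopping at that point avoids the stuck request.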
Moritz Röhrich [Mon, 21 Mar 2022 16:32:25 +0000 (17:32 +0100)]
cephadm: avoid crashing on expected non-zero exit
- Avoid crashing when a call out to an external program expectedly does
not return exit status zero.
There are programs that communicate other information than error/no
error through exit status. E.g. `systemctl status` will return different
exit codes depending on the actual status of the units in question.
In cases where this is expected crashing with a RuntimeError exception
is inappropriate and should be avoided.
Fixes: https://tracker.ceph.com/issues/55117
Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
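One way to express "expected non-zero exit" is to let the caller pass the set of acceptable exit codes. The sketch below is a hypothetical helper built on the standard subprocess module, not cephadm's actual call wrapper; the name `call` and the `ok_codes` parameter are assumptions.

```python
import subprocess
import sys


def call(cmd, ok_codes=(0,)):
    """Run cmd and return (stdout, stderr, returncode).

    Raise RuntimeError only if the exit status is not one of the
    codes the caller declared as expected.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode not in ok_codes:
        raise RuntimeError(
            "command %r failed with exit status %d" % (cmd, proc.returncode)
        )
    return proc.stdout, proc.stderr, proc.returncode


# A child process that exits with status 3 stands in for a status query.
status_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]
out, err, code = call(status_cmd, ok_codes=(0, 3))
```

Callers that treat the exit status as information pass the codes they can interpret; everything else still raises.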
cmake: resurrect mutex debugging in all Debug builds
Commit 403f1ec2888a ("cmake: make "WITH_CEPH_DEBUG_MUTEX" depend on
CMAKE_BUILD_TYPE") made WITH_CEPH_DEBUG_MUTEX depend on build type
being set to Debug, in CMakeLists.txt. However, if CMAKE_BUILD_TYPE
isn't specified by the user, we may still set it to Debug later, in
src/CMakeLists.txt, and in that case WITH_CEPH_DEBUG_MUTEX doesn't
get enabled. The result is that
$ do_cmake.sh -DCMAKE_BUILD_TYPE=Debug ...
debug builds have mutex debugging enabled, while
$ do_cmake.sh ...
builds, which are supposed to be the same, don't. Jenkins builders
don't pass -DCMAKE_BUILD_TYPE=Debug so that commit effectively turned
off all ceph_mutex_is_locked* asserts in "make check".
test/rbd_mirror: grab timer lock before calling add_event_after()
add_event_after() expects an externally provided mutex to be held
for the call. This was missed in commit 8965a0f2a6f7 ("rbd-mirror:
synchronize with in-flight stop in ImageReplayer::stop()").
Volker Theile [Wed, 30 Mar 2022 11:38:33 +0000 (13:38 +0200)]
mgr/dashboard: Improve error message of '/api/grafana/validation' API endpoint
If validation of the Grafana URL fails, e.g. because of an invalid SSL certificate, an unhelpful default error message is displayed in the UI.
This PR re-raises the exception as a DashboardException, which includes a detailed description of what happened. This makes it much easier to identify SSL certificate issues, for example.
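The re-raise pattern can be sketched as follows. The DashboardException class here is a local stand-in defined for the example (the real class lives in the dashboard module and its constructor may differ), and `validate_grafana_url` with its `check` callable is hypothetical.

```python
class DashboardException(Exception):
    """Stand-in for the dashboard's exception type (assumption)."""

    def __init__(self, msg, component=None):
        super().__init__(msg)
        self.component = component


def validate_grafana_url(check):
    """Run a validation callable and wrap any failure so that the
    original cause is kept in the message shown to the user."""
    try:
        return check()
    except Exception as e:
        raise DashboardException(
            msg="Grafana URL validation failed: %s" % e,
            component="grafana",
        ) from e


def failing_check():
    # Simulates a failed TLS handshake during validation.
    raise ConnectionError("certificate verify failed")
```

Chaining with `from e` keeps the original exception available for logs, while the message carries the detail the UI needs.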
1. Created subvolume
2. Written some I/O on the subvolume
3. Create snapshot of the subvolume
4. Create clone of the snapshot
5. Delete snapshot from back end (don't use subvolume interface) before
clone completes
6. Delete clone with force
7. Delete subvolume
8. Delete fs and associated pools
9. Created new fs
10. Created new subvolume
11. Written some I/O on the subvolume
12. Create snapshot of the subvolume
13. Create clone of the snapshot <---------------THIS OPERATION HANGS -----------------
Root Cause:
Since the snapshot is deleted from the back end, the clone fails. But it
also fails to remove the clone index at '/volumes/_index/clone'. The
cloner thread goes into an infinite loop of starting the clone and failing.
This involves taking 'self.async_job.lock()', reading the clone index
to get the job, and registering that job.
While the cloner thread is in the above loop, the fs is destroyed. The
cloner thread, which lives as long as mgr/volumes is enabled in the mgr, takes
'self.async_job.lock()' and hangs while reading the clone index.
Any further clone operation that also requires the above lock hangs.
Fix:
Remove the clone index even if the snapshot is not present.