Ilya Dryomov [Fri, 19 Feb 2021 15:47:17 +0000 (16:47 +0100)]
krbd: make sure the device node is accessible after the mapping
We have always assumed this to be the case and users' scripts and
orchestration tools have grown to depend on this. Let's add some
enforcement, prompted by [1]:
"I am running my Kubernetes worker node inside of an LXC container
which doesn't benefit from the device node created by the kernel, so
I'm using udev to create the /dev/rbd* device nodes inside of the LXC
container."
which, through the unfortunate interaction with ceph-csi rbd plugin,
results in data loss for "volumeMode: Filesystem" PVs because it ends
up recreating the filesystem every time the PV is attached to the pod:
"When deleting the pod and re-creating it, I can see that the RBD
image is indeed being reformatted. This seems to be because when
blkid is being run to check if the image is formatted, the /dev/rbd*
device has not yet been created by udev. By the time the code gets
down to running mkfs, the device is there and the damage is done."
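A caller-side sketch of the same guarantee, for scripts that cannot rely on the mapping tool to wait: poll until the device node exists and is accessible before running blkid or mkfs. This is an illustration only, not the krbd change itself; the function name, timeout and polling interval are assumptions.

```python
import os
import time


def wait_for_device_node(path, timeout=5.0, interval=0.05):
    """Return True once `path` exists and is readable and writable.

    Returns False if that does not happen within `timeout` seconds.
    Hypothetical helper: the name and defaults are illustrative only.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path) and os.access(path, os.R_OK | os.W_OK):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A script would call this on the path it got back from the map step (e.g. a /dev/rbd* node) and only probe or format the device if the call returns True.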
Kefu Chai [Sat, 6 Mar 2021 16:32:42 +0000 (00:32 +0800)]
.github: correct the regex in milestone workflow
Also use the pull_request_target event so the action is run in the
context of the base of the pull request. This helps us overcome
the "Resource not accessible by integration" issue that occurs when
the action is run in the context of the pull request.
* refs/pull/39906/head:
mgr/volumes: Bump up AuthMetadataManager's version
pybind/ceph_volume_client: Bump up the version and compat_version to 6
pybind/ceph_volume_client: Fix auth-metadata file recovery
pybind/ceph_volume_client: Update the 'volumes' key to 'subvolumes' in auth metadata file
Reviewed-by: Ramana Raja <rraja@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Kotresh HR [Fri, 19 Feb 2021 11:27:23 +0000 (16:57 +0530)]
mgr/volumes: Bump up AuthMetadataManager's version
With ceph_volume_client and mgr-volumes co-existing
for some time, both need to carry the same version.
ceph_volume_client versions <= 5 can't decode the
'subvolumes' key in the auth-metadata file. Hence, to
handle the version incompatibility, the version of
ceph_volume_client is bumped up to 6 and the same
needs to be done in mgr-volumes' AuthMetadataManager.
Kotresh HR [Fri, 19 Feb 2021 11:12:33 +0000 (16:42 +0530)]
pybind/ceph_volume_client: Bump up the version and compat_version to 6
With the 'volumes' key updated to 'subvolumes', ceph_volume_client
versions <= 5 can't decode the auth-metadata file. Hence
bumping up ceph_volume_client version and compat_version to 6.
Kotresh HR [Mon, 15 Feb 2021 16:26:51 +0000 (21:56 +0530)]
pybind/ceph_volume_client: Update the 'volumes' key to 'subvolumes' in auth metadata file
Older auth metadata files, from before the nautilus release, store
the authorized subvolumes under the 'volumes' key. Since the
notion of 'subvolumes' was brought in by mgr/volumes, it makes
sense to use the 'subvolumes' key. This patch transparently
updates the 'volumes' key to 'subvolumes', and newer auth metadata
files store them under the 'subvolumes' key.
Also fail the deauthorize if the auth-id doesn't exist.
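The key rename described here can be sketched as follows. This is a minimal illustration on a plain dict, assuming the auth metadata is already loaded into memory; the function name and the sample layout are hypothetical and not taken from ceph_volume_client.

```python
def migrate_auth_metadata(auth_meta):
    """Return a copy of `auth_meta` with the legacy 'volumes' key renamed.

    Hypothetical helper. If both keys are present, the existing
    'subvolumes' entry wins and the legacy key is dropped.
    """
    updated = dict(auth_meta)
    if "volumes" in updated:
        legacy = updated.pop("volumes")
        updated.setdefault("subvolumes", legacy)
    return updated
```

Files that already use the 'subvolumes' key pass through unchanged, so the update can be applied on every load.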
Kotresh HR [Fri, 5 Feb 2021 18:05:22 +0000 (23:35 +0530)]
qa: Fix a few mgr/volume test cases
Recovering dirty auth metadata file might not retain the order,
fixed the comparison in 'test_recover_auth_metadata_during_authorize'
and 'test_recover_auth_metadata_during_deauthorize'.
Kotresh HR [Sat, 23 Jan 2021 17:03:32 +0000 (22:33 +0530)]
ceph_volume_client: Fix failure of test_idempotency
With the test environment, 'args must be encodeable
as a bytearray' error is seen for 'ceph_mds_command'.
Hence removed tuple and passed the JSON formatted string.
Kotresh HR [Tue, 5 Jan 2021 12:55:54 +0000 (18:25 +0530)]
mgr/volumes: Update the 'volumes' key to 'subvolumes' in auth metadata file
The older auth metadata files created by CephVolumeClient store the
authorized subvolumes under the 'volumes' key, whereas the notion of
'subvolumes' was brought in by mgr/volumes. Hence, the key is transparently
updated to 'subvolumes', and newer auth metadata files store them
under the 'subvolumes' key.
Also fail the deauthorize if the auth-id doesn't exist.
Optionally allow authorizing auth-ids not created by mgr plugin
via the option 'allow_existing_id'. This can help existing deployers
of manila to disallow/allow authorization of pre-created auth IDs
via a manila driver config that sets 'allow_existing_id' to False/True.
Kotresh HR [Tue, 15 Dec 2020 12:01:54 +0000 (17:31 +0530)]
mgr/volumes: Preserve existing caps while authorize/deauthorize auth-id
Authorize/Deauthorize used to overwrite the caps of the auth-id, which would
end up deleting existing caps. This patch fixes that by retaining
the existing caps and only appending or deleting the caps in question as needed.
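A minimal sketch of appending or deleting a single cap without overwriting the rest. It assumes a cap string is a comma-separated list of grants with no commas inside a grant; the helper names are hypothetical and this is not the mgr/volumes code.

```python
def _split_caps(cap_str):
    # Split a comma-separated cap string into a list of trimmed grants.
    return [c.strip() for c in cap_str.split(",") if c.strip()]


def add_cap(existing, wanted):
    """Append `wanted` to `existing` unless it is already present."""
    caps = _split_caps(existing)
    if wanted not in caps:
        caps.append(wanted)
    return ", ".join(caps)


def remove_cap(existing, unwanted):
    """Remove `unwanted` from `existing`, keeping all other grants."""
    return ", ".join(c for c in _split_caps(existing) if c != unwanted)
```

Authorizing would call add_cap on each cap string the auth-id already holds, and deauthorizing would call remove_cap, so unrelated grants survive both operations.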
Kotresh HR [Mon, 4 Jan 2021 13:04:54 +0000 (18:34 +0530)]
mgr/volumes: Disallow authorize existing auth_id
This patch disallows the mgr plugin from authorizing an auth_id
that was not created via the mgr plugin. Such auth_ids could have been
created by other means for other use cases and should not be modified
via the mgr plugin.
Kotresh HR [Wed, 18 Nov 2020 10:13:25 +0000 (15:43 +0530)]
mgr/volumes: Persist auth and subvolume metadata
1. Subvolume create and delete operations create and delete subvolume
metadata file respectively.
2. Subvolume authorize creates the auth metadata file and persists the
required metadata in the subvolume metadata file and the auth metadata file
on disk. Subvolume deauthorize clears the required metadata in
both metadata files.
donggyu_park [Mon, 22 Feb 2021 07:52:50 +0000 (16:52 +0900)]
cephadm: Delete the unnecessary error line in open_ports
In #39020, d9fbd7e was cherry-picked from 70722a2. There is no bug in 70722a2,
but there is one in d9fbd7e. It seems that an unnecessary error line was added during cherry-picking,
so the error only occurs in the octopus branch.
This commit fixes the issue directly in the octopus branch instead of cherry-picking,
since the cherry-pick of 70722a2 has already been applied to the octopus branch.
This commit deletes the unnecessary error line added in d9fbd7e.
In d9fbd7e, the parameter verbose_on_failure was removed from call.
However, an unnecessary line that uses verbose_on_failure was
added in open_ports, and so the error occurs.
Fixes: https://tracker.ceph.com/issues/49467
Signed-off-by: Donggyu Park <donggyu_park@tmax.co.kr>
Matthew Vernon [Thu, 4 Feb 2021 11:41:14 +0000 (11:41 +0000)]
rgw/radosgw-admin clarify error when email address already in use
The error message shown when you try to create an S3 user with an email
address that is already associated with another S3 account is very
confusing; this patch makes it much clearer.
To reproduce:
radosgw-admin user create --uid=foo --display-name="Foo test" --email=bar@domain.invalid
radosgw-admin user create --uid=test --display-name="AN test" --email=bar@domain.invalid
could not create user: unable to parse parameters, user id mismatch, operation id: foo does not match: test
With this patch:
radosgw-admin user create --uid=test --display-name="AN test" --email=bar@domain.invalid
could not create user: unable to create user test because user id foo already exists with email bar@domain.invalid
Fixes: https://tracker.ceph.com/issues/49137
Fixes: https://tracker.ceph.com/issues/19411
Signed-off-by: Matthew Vernon <mv3@sanger.ac.uk>
(cherry picked from commit 05318d6f71e45a42a46518a0ef17047dfab83990)
Jason Dillaman [Mon, 8 Feb 2021 16:53:28 +0000 (11:53 -0500)]
rbd-mirror: don't prune older mirror snapshots when pruning incomplete snapshot
Since we normally prune in order, we need to ensure that we don't prune older
snapshots when we need to delete an incomplete mirror snapshot since the
older snapshot might be the only remaining mirror snapshot.
Jason Dillaman [Tue, 8 Dec 2020 19:16:49 +0000 (14:16 -0500)]
librbd/deep_copy: added new migrating flag to object copy
The migration operation and the copyup state machine will set
this flag when attempting to perform a deep-copy due to a
live-migration.
This flag will prevent a possible race condition between the
start of the object deep-copy when migration was enabled and
the writing portion of the deep-copy when migration might
have completed via external means.
Fixes: https://tracker.ceph.com/issues/45694
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 1baba64e213cb808804796575d3f7969cf37a3c6)
Jason Dillaman [Fri, 5 Feb 2021 15:41:30 +0000 (10:41 -0500)]
librbd/deep-copy: object-copy state machine must update object map
If there was no data to copy, the object-copy state machine was bypassing
the object-map update states and prematurely completing. Since the
object-map is default-initialized to all non-existent objects, this results
in incorrect state for OBJECT_EXISTS_CLEAN objects.
Jason Dillaman [Fri, 25 Sep 2020 14:40:32 +0000 (10:40 -0400)]
librbd: deep-copy should update object-map before writing to object
For the original use-case of RBD mirroring it was (maybe) more
acceptable to write to the object before updating the object map
because an interrupted sync will be retried. However, when using
the deep-copy object copy state machine as part of copyup, it's
more likely that the object-map has the potential to become
out-of-sync with reality if it's updated after the object is
written.
Jason Dillaman [Thu, 28 Jan 2021 23:30:16 +0000 (18:30 -0500)]
librbd/object_map: diff state machine should track object existence
The deep-copy snapshot-create state machine initializes the object-map
state to non-existent for all objects. There was an assumption that the
deep-copy object-copy state machine would always update the object map
but that was being skipped for clean objects as an optimization. This
change will support a future commit to run the object-copy state machine
for existing objects.
Mykola Golub [Wed, 2 Dec 2020 09:41:13 +0000 (09:41 +0000)]
test/librbd: print difference if deep-copy or migration test fails
It may prove useful for tracking down the sporadic test failures
observed on jenkins that are not reproducible locally.
Previously it was disabled because the output could be too
large. But now that the hexdump has been improved to skip repeating
bytes, the output will hopefully be much smaller.
Kefu Chai [Wed, 3 Jun 2020 01:39:26 +0000 (09:39 +0800)]
qa/tasks/vstart_runner: do not teardown test_path if "create-cluster-only"
Otherwise we could end up removing a "None" directory when tearing down the cluster
and hit the following failure:
Exception ignored in: <bound method LocalContext.__del__ of <__main__.LocalContext object at 0x7f99fd4a6cc0>>
Traceback (most recent call last):
  File "../qa/tasks/vstart_runner.py", line 1189, in __del__
    shutil.rmtree(self.teuthology_config['test_path'])
  File "/tmp/tmp.mmM2ugspuR/venv/lib/python3.6/shutil.py", line 477, in rmtree
    onerror(os.lstat, path, sys.exc_info())
  File "/tmp/tmp.mmM2ugspuR/venv/lib/python3.6/shutil.py", line 475, in rmtree
    orig_st = os.lstat(path)
TypeError: lstat: path should be string, bytes or os.PathLike, not NoneType
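The failure mode above comes from passing None to shutil.rmtree. A minimal sketch of a guarded teardown follows; the function name and the config dict are hypothetical, and the actual commit instead skips the teardown when "create-cluster-only" is given.

```python
import shutil


def teardown_test_path(config):
    """Remove config['test_path'] if it is set.

    Returns False when there is nothing to remove, True otherwise.
    Hypothetical helper, not the code in vstart_runner.py.
    """
    test_path = config.get("test_path")
    if test_path is None:
        # Never hand None to shutil.rmtree; that raises TypeError.
        return False
    shutil.rmtree(test_path, ignore_errors=True)
    return True
```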
Aashish Sharma [Mon, 4 Jan 2021 05:09:16 +0000 (10:39 +0530)]
mgr/dashboard: alert badge includes suppressed alerts
On a cluster with alerting enabled, when alerts are triggered, the vertical navigation item (Cluster > Monitoring) displays the total number of alerts, including suppressed ones, even if they are silenced. This PR intends to fix this issue.
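The intended badge count can be sketched as a filter over the alert list. This assumes each alert carries a status.state field with values such as "active" and "suppressed", as in Alertmanager's alert payload; the function name is hypothetical, and the dashboard's own fix lives in its frontend code rather than in Python.

```python
def count_active_alerts(alerts):
    """Count alerts whose state is 'active', skipping suppressed ones.

    Assumes each alert is a dict shaped like {"status": {"state": ...}}.
    """
    return sum(
        1 for alert in alerts
        if alert.get("status", {}).get("state") == "active"
    )
```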