Paul Cuzner [Wed, 3 Feb 2021 19:57:16 +0000 (08:57 +1300)]
mgr/cephadm: toleration fix for API consistency in dashboard
The function signature change to "available" needed a
patch to the dashboard->orchestrator interface. This patch
just tolerates the change - whether or how to consume the
additional data from "available" is yet to be determined.
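A minimal sketch of what tolerating the new shape could look like on the dashboard side (assuming the old call returned a (bool, message) pair and the new one adds a dict; the names below are illustrative, not the actual dashboard code):
```
# Illustrative only: accept either the old 2-tuple or the new 3-tuple from
# the orchestrator's available() call; the extra dict is not consumed yet.
def orch_available(status):
    if len(status) == 3:
        available, err, _extra = status
    else:
        available, err = status
    return available, err
```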
Paul Cuzner [Wed, 3 Feb 2021 01:45:15 +0000 (14:45 +1300)]
mgr/cephadm: updates to support orch status changes
The "available" function definition has been updated in
cephadm/rook and the Orchestrator base class to provide a
module-specific dictionary holding any info that would be
pertinent to share with the user. This in turn changed the
_status method for cephadm, but the UX remains the same
(i.e. the admin must use --detail to see the worker pool size).
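A rough sketch of the new shape (field names here are assumptions, not the actual cephadm code):
```
# Hypothetical sketch: an orchestrator module's available() now returns its
# availability and message plus a module-specific dict of extra detail.
def available(self):
    extra = {"workers": self._worker_pool_size}  # illustrative field name
    return True, "", extra
```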
Paul Cuzner [Mon, 18 Jan 2021 21:57:14 +0000 (10:57 +1300)]
mgr/orchestrator: removed worker_pool method
The worker_pool_size method has been removed and its
functionality replaced by the cephadm 'available' function to
help keep the interface model generic.
Kotresh HR [Fri, 5 Feb 2021 18:05:22 +0000 (23:35 +0530)]
qa: Fix a few mgr/volume test cases
Recovering a dirty auth metadata file might not retain the original order,
so the comparisons in 'test_recover_auth_metadata_during_authorize'
and 'test_recover_auth_metadata_during_deauthorize' have been fixed.
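A minimal illustration of that kind of fix (hypothetical, not the actual test code): compare the parsed metadata rather than the raw bytes, so recovery order no longer matters.
```
import json

# Hypothetical helper: parse both auth metadata blobs and compare the
# resulting dicts, so a different key order after recovery still passes.
def auth_metadata_equal(expected_blob, recovered_blob):
    return json.loads(expected_blob) == json.loads(recovered_blob)
```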
Ilya Dryomov [Mon, 8 Feb 2021 16:01:47 +0000 (17:01 +0100)]
librbd: don't hold owner_lock for validate_image_removal()
handle_exclusive_lock() and handle_shut_down_exclusive_lock() call
validate_image_removal() without owner_lock held, so holding it in
shut_down_exclusive_lock() appears to be redundant.
Ilya Dryomov [Sun, 7 Feb 2021 14:09:24 +0000 (15:09 +0100)]
librbd: treat EROFS as expected in handle_acquire_lock()
If the peer refuses to release the exclusive lock (e.g. when automatic
exclusive lock transitions are disabled), EROFS is returned. Suppress
the rather confusing "Read-only file system" error message -- this case
is no different from EBUSY or EAGAIN.
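Conceptually (the real code is C++ in librbd's exclusive-lock acquire path; the sketch below is illustrative only), EROFS simply joins the set of results that are not reported as errors:
```
import errno

# Illustrative sketch: EBUSY, EAGAIN and now EROFS are "expected" results
# when the peer keeps the exclusive lock; only other errors are logged.
EXPECTED_ACQUIRE_ERRORS = {errno.EBUSY, errno.EAGAIN, errno.EROFS}

def report_acquire_result(ret):
    if ret < 0 and -ret not in EXPECTED_ACQUIRE_ERRORS:
        print("error acquiring exclusive lock:", errno.errorcode.get(-ret, ret))
```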
Ilya Dryomov [Sun, 7 Feb 2021 12:46:15 +0000 (13:46 +0100)]
librbd: refuse to release exclusive lock when removing
Commit 25c2ffe145be ("librbd: acquire exclusive lock from peer when
removing") changed PreRemoveRequest to request exclusive lock from the
peer instead of giving up and proceeding without exclusive lock. This
caused one of the test cases that sometimes runs concurrent "rbd rm"
against the same image to fail intermittently, most often on an assert,
because the exclusive lock is now automatically transitioned to another
"rbd rm" on its request.
The root cause is older and probably goes back to when the synchronous
librbd::remove(), which held owner_lock across all operations including
trim_image(), was converted to a set of state machines. Since then, any
peer that requests exclusive lock (instead of trying once and backing
off) is able to mess with image removal.
Install StandardPolicy to disable automatic exclusive lock transitions
during image removal.
Jason Dillaman [Mon, 8 Feb 2021 16:53:28 +0000 (11:53 -0500)]
rbd-mirror: don't prune older mirror snapshots when pruning incomplete snapshot
Since we normally prune in order, we need to ensure that we don't prune
older snapshots when deleting an incomplete mirror snapshot: the older
snapshot might be the only remaining mirror snapshot.
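A rough sketch of the rule (hypothetical names; the real logic is C++ in rbd-mirror):
```
# Hypothetical sketch: normal pruning removes oldest-first, but when an
# incomplete mirror snapshot must be deleted, remove only that one -- an
# older snapshot may be the only complete mirror snapshot left.
def snapshots_to_remove(mirror_snaps, incomplete=None, keep=1):
    if incomplete is not None:
        return [incomplete]
    return mirror_snaps[:-keep]
```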
Jason Dillaman [Fri, 29 Jan 2021 15:44:38 +0000 (10:44 -0500)]
librbd/deep_copy: skip snap list if object is known to be clean
If the fast-diff indicates that the destination object should exist
and that it hasn't changed, there shouldn't be a need to issue the
snap list operation. Instead, just update the destination object map
to indicate the existence of the object.
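A conceptual sketch of the short-circuit (illustrative names; the actual state machine is C++ in librbd/deep_copy):
```
# Illustrative sketch: if fast-diff says the destination object exists and
# is unchanged, skip the snap list round trip and only record its
# existence in the destination object map.
def copy_object(fast_diff_state, list_snaps, copy_data, update_object_map):
    if fast_diff_state == "OBJECT_EXISTS_CLEAN":
        update_object_map("OBJECT_EXISTS_CLEAN")
        return
    copy_data(list_snaps())
    update_object_map("OBJECT_EXISTS")
```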
Jason Dillaman [Fri, 29 Jan 2021 02:42:09 +0000 (21:42 -0500)]
librbd/deep_copy: object-copy state machine must update object map
If there was no data to copy, the object-copy state machine was bypassing
the object-map update states and prematurely completing. Since the
object-map is default-initialized to all non-existent objects, this results
in incorrect state for OBJECT_EXISTS_CLEAN objects.
Jason Dillaman [Wed, 3 Feb 2021 18:21:34 +0000 (13:21 -0500)]
librbd/io: track object non-existence when computing snapshot deltas
Re-use the existing DNE state to track whether or not the object
already exists when computing snapshot deltas from an arbitrary
set of snapshots. Previously, the non-existence of the object was
only computed for snap id 0 for tracking whiteouts. In a future
commit, the deep-copy object-copy state machine will be able to
properly update the object-map state to indicate exists clean
vs non-existent state.
Jason Dillaman [Wed, 3 Feb 2021 15:13:28 +0000 (10:13 -0500)]
librbd/io: only track initial diff extents if no diffs exist
The purpose of the initial diff extents ({0, 0}) was to help track
whether or not the object exists for read-from-parent / whiteout
tracking. Once we have at least one set of diffs on the object, we
actually have enough information to know the object state.
Jason Dillaman [Thu, 28 Jan 2021 23:30:16 +0000 (18:30 -0500)]
librbd/object_map: diff state machine should track object existence
The deep-copy snapshot-create state machine initializes the object-map
state to non-existent for all objects. There was an assumption that the
deep-copy object-copy state machine would always update the object map
but that was being skipped for clean objects as an optimization. This
change will support a future commit to run the object-copy state machine
for existing objects.
Nizamudeen A [Tue, 19 Jan 2021 12:35:43 +0000 (18:05 +0530)]
mgr/dashboard: Automatically refresh the crush map metadata table
If we make any change to the OSD crush map, like an 'osd crush reweight' from the CLI, the entire page currently needs to be reloaded for that change to be reflected in the metadata table. Instead, this PR takes care of auto-refreshing the tree view.
Fixes: https://tracker.ceph.com/issues/48922
Signed-off-by: Nizamudeen A <nia@redhat.com>
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
(cherry picked from commit bc8562ef2a17b78e80bd4e1272d3fd1a512249bb)
Sebastian Wagner [Fri, 29 Jan 2021 10:10:38 +0000 (11:10 +0100)]
mgr/cephadm: Add strings to assert statements
This helps with: https://tracker.ceph.com/issues/48981
Looks like there is an assert somewhere:
```
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1269, in _handle_command
return self.handle_command(inbuf, cmd)
...snip...
File "/usr/share/ceph/mgr/orchestrator/module.py", line 550, in _list_services
raise_if_exception(completion)
File "/usr/share/ceph/mgr/orchestrator/_interface.py", line 653, in raise_if_exception
raise e
AssertionError
```
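With a message attached, the same traceback would at least say what failed; a hypothetical example of the pattern (not the actual cephadm asserts):
```
# A bare "assert x" only prints "AssertionError"; adding a message makes
# the traceback self-describing. Variable names here are illustrative.
assert result is not None, f"no result for completion {completion_id!r}"
```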
Sebastian Wagner [Fri, 15 Jan 2021 12:13:35 +0000 (13:13 +0100)]
mgr/cephadm: try again calling ceph-volume without --filter-for-batch
Fixes: https://tracker.ceph.com/issues/48870
This deals with a cephadm upgrade issue:
1. user calls `ceph orch upgrade`
2. mgr/cephadm calls `ceph orch config set mgr.x container_image <new-container>`
3. standby mgr gets upgraded
4. mgr failover to new mgr
5. mgr/cephadm calls `_refresh_host_devices`
6. `_refresh_host_devices` calls `ceph orch config get osd container_image`.
But this returns the old image
7. `_refresh_host_devices` calls `ceph-volume ... --filter-for-batch`
with an image that doesn't support `filter-for-batch`
The idea is to simply retry calling ceph-volume inventory without `--filter-for-batch`.
(This also removes a use of `out` that was never declared.)
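A rough sketch of the fallback (hypothetical helper names, not the actual cephadm code):
```
# Hypothetical sketch: older ceph-volume images reject --filter-for-batch,
# so retry the inventory without it.
def inventory(run_ceph_volume):
    out, err, code = run_ceph_volume(["inventory", "--format=json",
                                      "--filter-for-batch"])
    if code != 0:
        out, err, code = run_ceph_volume(["inventory", "--format=json"])
    return out, err, code
```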
Sage Weil [Tue, 26 Jan 2021 22:10:13 +0000 (16:10 -0600)]
mgr/cephadm/upgrade: scale down MDS cluster(s) for major version upgrades
For octopus -> pacific, as with other recent releases, we need to scale
down the MDS cluster(s) to a single daemon before upgrading. (This is
because the MDS intra-cluster protocols aren't fully versioned.)
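In CLI terms this is the usual pre-upgrade step: `ceph fs set <fs_name> max_mds 1` for each filesystem, restoring the previous value afterwards. A rough sketch of how an automated flow might track it (hypothetical names, not the actual cephadm upgrade code):
```
# Hypothetical sketch: remember each filesystem's max_mds, drop it to 1
# before the upgrade, and restore it once the MDS daemons are upgraded.
def scale_down_mds(filesystems, set_max_mds):
    saved = {fs.name: fs.max_mds for fs in filesystems}
    for fs in filesystems:
        if fs.max_mds > 1:
            set_max_mds(fs.name, 1)
    return saved
```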
Sage Weil [Wed, 27 Jan 2021 14:54:00 +0000 (08:54 -0600)]
mgr/cephadm/upgrade: match against any repo_digest, not image_id
The image id can vary across hosts and (most notably) docker vs podman.
Instead, use the repo_digest as an image identifier.
Unfortunately, a single image may have multiple digests, even within the
same registry, so keep a list of the digests for the image we are
upgrading to, and ensure that each container has a digest that matches at
least one of them.
This allows upgrade to proceed in mixed docker+podman clusters. However,
it does not yet address a cluster with mixed CPU architectures, because
the container image will have different digest(s) for each architecture
build.
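A small sketch of the matching rule (illustrative names only):
```
# Hypothetical sketch: image IDs differ between docker and podman, so a
# container is considered upgraded if any of its repo digests matches one
# of the digests recorded for the target image.
def matches_target(container_repo_digests, target_digests):
    return any(d in target_digests for d in container_repo_digests)
```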