Ilya Dryomov [Fri, 21 Jan 2022 12:41:46 +0000 (13:41 +0100)]
rbd-mirror: fix races in snapshot-based mirroring deletion propagation
When remote image is deleted, rbd-mirror can encounter three cases:
1) no remote image id
2) no remote mirror metadata
3) MIRROR_IMAGE_STATE_DISABLING in remote mirror metadata
Commit d4c66ac5c615 ("rbd-mirror: fix issue with snapshot-based
mirroring deletion propagation") fixed case 1. Cases 2 and 3 remained
broken because for both of them finalize_snapshot_state_builder() would
populate not only remote_mirror_peer_uuid but also remote_image_id,
thus disabling ENOLINK logic in handle_prepare_remote_image() and
handle_bootstrap(). Commit ff60aec2d9ef ("rbd-mirror: fix bootstrap
sequence while the image is removed") touched on case 3, but it made
a difference only for journal-based mirroring.
Stop calling finalize_snapshot_state_builder() on errors. Instead,
align with journal-based mirroring by filling remote_mirror_peer_uuid
together with remote_mirror_uuid.
Make it clear that the local image non-primariness is asserted
independent of the mode; avoid the default implementation being
overridden but still relied on by both modes.
Ilya Dryomov [Wed, 19 Jan 2022 11:54:23 +0000 (12:54 +0100)]
rbd: add missing switch arguments for recognition by get_command_spec()
Currently this
$ rbd --all children img
doesn't work, while this
$ rbd children --all img
or this
$ rbd children img --all
does. The issue is that -a/--all isn't on the list of known switch
arguments. The "rbd children" example may seem contrived but for more
complicated commands such as "rbd device map" mixing switches and
positional arguments occurs naturally:
Conflicts:
src/tools/rbd/ArgumentTypes.h [ snapshot quiesce support
not in octopus ]
src/tools/rbd/action/Device.cc [ nbd cookie support not in
octopus ]
src/tools/rbd/action/Migration.cc [ import-only migration
not supported in octopus ]
src/tools/rbd/action/Wnbd.cc [ wnbd support not in octopus ]
Matt Benjamin [Tue, 4 Jan 2022 16:22:00 +0000 (11:22 -0500)]
rgwlc: remove lc entry on bucket delete
Buckets with lifecycle policies installed have a state entry that
must also be deleted when the bucket is removed.
Fixes: https://tracker.ceph.com/issues/46728
N.b., should really be generic, not specific to the RADOS store, but
there doesn't seem to be a clean model for implementing generic side
effects in Zipper, currently.
Igor Fedotov [Tue, 2 Nov 2021 12:03:39 +0000 (15:03 +0300)]
os/bluestore: avoid premature onode release.
This was observed when onode's removal is followed by reading
and the latter causes object release before the removal is finalized.
The root cause is an improper 'pinned' state assessment in Onode::get
More detailed overview is:
At some point Onode::get() might face the case when nref == 2 and pinned = true
which means parallel incomplete put is running on the onode - ref count is
decremented but pinned state is still unmodified (and even lock hasn't been
acquired yet).
This might finally result in two puts racing over the same onode with nref == 2
which finally results in a premature onode release:
// nref =3, pinned = 1
// Thread 1 Thread 2
// o->put() o->get()
// --nref(n = 2, pinned=1)
// nref++ (n=3, pinned = 1)
// return
// ...
// o->put()
// --nref(n = 2)
// pinned = 0,
// --nref(n = 1)
// ocs->_unpin_and_rm(o) -> o->put()
// ...
// --nref(n = 0)
// release o
// o->c->get_onode_cache()
// FAULT!
//
The suggested fix is to introduce additional atomic counter tracking
running put() functions. And permit onode release when both regular
nref and put_nref are both equal to zero.
Fixes: https://tracker.ceph.com/issues/53002 Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 96f0efe6d5307a55bea32f7216ef9511da0c5a47)
The recent changes from PR #43536 introduced a regeression preventing from
running ceph-volume in a containerized context on Ubuntu 18.04.
Given that the path for the binary `lvs` differs between CentOS 8 and Ubuntu 18.04.
(`/usr/sbin/lvs` and `/sbin/lvs` respictively). It means that ceph-volume running
in the container on CentOS 8 sees the `lvs` binary at `/usr/sbin/lvs` and try to
run it with `nsenter` on the host which is running Ubuntu 18.04.
Josh Durgin [Fri, 7 Jan 2022 18:37:13 +0000 (13:37 -0500)]
mon/OSDMonitor: avoid null dereference if stats are not available
Not confirmed yet whether this was the issue in the bug referenced
below, however it's a necessary defensive check for the
'osd pool get-quota' command.
All other uses of get_pool_stats() already handle this case.
Mykola Golub [Fri, 14 Jan 2022 18:21:29 +0000 (18:21 +0000)]
cls/journal: skip disconnected clients when finding min_commit_position
When a new journal client is registered, all already registered
clients are checked, and a client with min position is selected
as a position for the new client. Thus we may expect that
starting from the registered position all journal entries will be
available (not trimmed) for the new client.
But when looking for a min commit position, the client_register
function did not take into account that a registered client might
be in disconnected state, and in that case the journal entries
might be trimmed for this client.
Ilya Dryomov [Fri, 7 Jan 2022 12:31:08 +0000 (13:31 +0100)]
test/librbd: make diff-iterate clone tests exercise fast-diff mode
The fast-diff feature wasn't propagated to the clone so these tests
were exercising the slow list_snaps path no matter what RBD_FEATURES
value was supplied to ceph_test_librbd.
Ilya Dryomov [Wed, 5 Jan 2022 19:24:40 +0000 (20:24 +0100)]
librbd: restore diff-iterate include_parent functionality in fast-diff mode
Commit 4429ed4f3f4c ("librbd: switch diff iterate API to use new snaps
list dispatch methods") removed the recursive execute() call. The new
list_snaps method does indeed handle parent diffs internally but it is
not used in fast-diff mode. Nothing changed there -- we still need to
load the parent object map, calculate parent object_diff_state, etc.
Ilya Dryomov [Wed, 19 Jan 2022 20:08:01 +0000 (21:08 +0100)]
librbd: diff-iterate reports incorrect offsets if whole_object=true
It turns out that in octopus both fast-diff and list-snaps (slow)
modes were broken. As long as whole_object=true, the same incorrect
offset was reported in both modes. The fast-diff mode is fixed in
in previous commit.
This is an octopus-only patch for list-snaps mode. In pacific this
issue was addressed with 4429ed4f3f4c ("librbd: switch diff iterate
API to use new snaps list dispatch methods").
mgr/dashboard: upgrade Cypress to the latest stable version
- Remove unneeded dependency that was causing UI performance issues: zone.js
- Ignore 'ResizeObserver loop limit exceeded' error.
- run-frontend-e2e-tests.sh refactoring: create rgw dashboard user through
'ceph dashboard set-rgw-credentials' and use it on rgw buckets' tests.
Fixes: https://tracker.ceph.com/issues/53357 Signed-off-by: Alfonso MartĂnez <almartin@redhat.com>
(cherry picked from commit 3e4e29590aa1742fc3b44d21389325a13cca8199)
Conflicts:
src/pybind/mgr/dashboard/frontend/cypress/integration/rgw/buckets.e2e-spec.ts
Reject the current changes
src/pybind/mgr/dashboard/frontend/cypress/integration/rgw/buckets.po.ts
Reject the current changes
src/pybind/mgr/dashboard/frontend/cypress/integration/ui/navigation.po.ts
Deleted this file since its not in octopus
src/pybind/mgr/dashboard/frontend/package-lock.json
Generated new file
src/pybind/mgr/dashboard/frontend/package.json
Kept zone.js and changed the cypress version to 9.0.0
src/pybind/mgr/dashboard/run-frontend-e2e-tests.sh
Accept the current change
Ilya Dryomov [Tue, 11 Jan 2022 12:13:01 +0000 (13:13 +0100)]
qa/run_xfstests_qemu.sh: harden against wget failures
If wget fails (e.g. due to a certificate issue), it still creates
an empty file. Then this file is marked executable, ./"${SCRIPT}"
immediately returns 0 and run_xfstests_qemu.sh exits successfully
without running a single xfstest.
This started on Sep 30, 2021 with the expiration of Let's Encrypt
root certificate -- all qemu jobs with "test: qa/run_xfstests_qemu.sh"
just booted the VM for a couple of seconds and reported success.
Sage Weil [Thu, 18 Nov 2021 20:46:06 +0000 (15:46 -0500)]
osd/PeeringState: separate history's pruub from pg's
(pruub = prior_readable_until_ub [upper bound])
During peering, a primary may conclude that it does not need to wait for
the prior interval(s)' read lease because it will query all such osds.
However, it is dangerous to reflect that local inference about future
peering effects in the info.history, which is freely shared with other
OSDs. For example, if the primary cleared the history pruub, shared it,
and then failed, the next primary may conclude that it does not need to
wait for the lease to expire.
Instead, track the pruub in the conventional way. Only at the end of
peering do we clear it (if there are no prior_interval_down_osds) before
ending our final activate infos out, when we have already talked to the
peer osds and they know the prior interval has finished.
rbd-mirror: remove image_map next_state if sets to the same state
In some cases, set_state is called with DISSOCIATING, then ASSOCIATING
and DISSOCIATING again. In this case the state DISSOCIATING is
processed to remove the image and then schedule the next action which is
associating.
To fix this case, this commit removes the next_state if the state is
sets to the same state.
rbd-mirror: fix bootstrap sequence while the image is removed
If the image is being removed the PrepareRemoteImageRequest was
returning the same error if the image was disabled or non primary which
doesn't allow the BootstrapRequest to have the correct error handling.
This commit fix this behavior by considering that the remote image is
already deleted if the image is in disabling state.
rbd-mirror: remove image_mapped condition to remove image_map
In some split-brain scenario the image is removed while the image_mapped
is false. This prevents the removal of image_map in OMAP and thus the
entry will not be removed until the daemon is restarted.
cls/rbd: prevent image_status when mirror image is not created
This prevent image_status_set to succeed when there is no mirror image
yet. This solves some stale entries that were not removed in
rbd-mirror and prevent to add entries that would not be visible from the
rbd cli.
In the LoadRequest in the ImageMap class add initial cleanup to remove
stale entries. To cleanup the LoadRequest will query the mirror image
list and remove all the image_map that are notin the list.
Added a condition to handle the case where m_image_ctx is null on
close_image and handle_close_image in the TrashMoveRequest. This fix is
not needed in newer versions of Ceph as ImageCtx no longer needs to be
destroyed explicitely with a destroy method after Octopus.
rbd-mirror: add mirror status removal on ImageReplayer shutdown
In a scenario where you have rbd-mirror daemons on both clusters. The
rbd-mirror daemon on the primary site will not properly cleanup his
status on image removal.
This commit add a path for direct removal at the shut_down of the
ImageReplayer to properly cleanup the metadata.
Ilya Dryomov [Tue, 4 Jan 2022 19:38:35 +0000 (20:38 +0100)]
librbd: diff-iterate reports incorrect offsets in fast-diff mode
If rbd_diff_iterate2() is called on an image offset that doesn't
correspond to an object boundary, the callback is invoked with an
incorrect image offset. For example, assuming a fully allocated
image, a diff request for 806354944~57344 results in offs=807403520,
len=57344, exists=true invocation, which is ahead by 1048576 bytes.
This occurs only in fast-diff mode, for a diff request on an image
with the fast-diff feature disabled or if whole_object parameter is
set to false the invocation is correct.
This bug goes back to the introduction of fast-diff mode in commit 6d5b969d4206 ("librbd: add diff_iterate2 to API").
Sage Weil [Thu, 16 Dec 2021 15:24:46 +0000 (10:24 -0500)]
mon: prevent new sessions during shutdown
From shutdown() we set STATE_SHUTDOWN and then call remove_all_sessions().
ms_handle_accept() is the only caller of add_session, so verifying that
we aren't shutting down (while under the session_map_lock) is sufficient
to prevent any new sessions from being added.
J. Eric Ivancich [Thu, 11 Nov 2021 22:20:24 +0000 (17:20 -0500)]
rgw: fix `bi put` not using right bucket index shard
When `radosgw-admin bi put` adds an entry for an incomplete multipart
upload, the bucket index shard is not calculated correctly. It should
be based on the name of the ultimate object. However the calculation
was including the added organizational suffixes as well. This corrects
that.
NOTE: When entries are not put in the correct index shard, unordered
listing becomes unreliable, perhaps causing entries to be skipped or
infinite loops to form.
This is redundant and makes nsenter throw messages like following:
```
Failed to find sysfs mount point
dev/block/11:0/holders/: opendir failed: Not a directory
dev/block/252:0/holders/: opendir failed: Not a directory
dev/block/253:0/holders/: opendir failed: Not a directory
dev/block/252:1/holders/: opendir failed: Not a directory
dev/block/253:1/holders/: opendir failed: Not a directory
dev/block/252:2/holders/: opendir failed: Not a directory
dev/block/253:2/holders/: opendir failed: Not a directory
dev/block/252:3/holders/: opendir failed: Not a directory
dev/block/253:3/holders/: opendir failed: Not a directory
dev/block/252:16/holders/: opendir failed: Not a directory
dev/block/252:32/holders/: opendir failed: Not a directory
dev/block/252:48/holders/: opendir failed: Not a directory
dev/block/252:64/holders/: opendir failed: Not a directory
```
ceph-volume should run pv/vg/lv commands in the host namespace rather than
running them inside the container in order to avoid lvm metadata corruption.
Jeff Layton [Wed, 10 Nov 2021 18:10:50 +0000 (13:10 -0500)]
qa: account for split of the kclient "metrics" debugfs file
Recently, Luis posted a patch to turn the metrics debugfs file into a
directory with separate files for the different sections in the old
metrics file.
Account for this change in get_op_read_count().
Fixes: https://tracker.ceph.com/issues/53214 Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit e9f2bff8cd7df1c81ff8bbfa2530f470d9c6af2c)