Ilya Dryomov [Fri, 21 Jan 2022 12:41:46 +0000 (13:41 +0100)]
rbd-mirror: fix races in snapshot-based mirroring deletion propagation
When a remote image is deleted, rbd-mirror can encounter three cases:
1) no remote image id
2) no remote mirror metadata
3) MIRROR_IMAGE_STATE_DISABLING in remote mirror metadata
Commit d4c66ac5c615 ("rbd-mirror: fix issue with snapshot-based
mirroring deletion propagation") fixed case 1. Cases 2 and 3 remained
broken because for both of them finalize_snapshot_state_builder() would
populate not only remote_mirror_peer_uuid but also remote_image_id,
thus disabling ENOLINK logic in handle_prepare_remote_image() and
handle_bootstrap(). Commit ff60aec2d9ef ("rbd-mirror: fix bootstrap
sequence while the image is removed") touched on case 3, but it made
a difference only for journal-based mirroring.
Stop calling finalize_snapshot_state_builder() on errors. Instead,
align with journal-based mirroring by filling remote_mirror_peer_uuid
together with remote_mirror_uuid.
Make it clear that the local image non-primariness is asserted
independent of the mode; avoid the default implementation being
overridden but still relied on by both modes.
Mykola Golub [Fri, 14 Jan 2022 18:21:29 +0000 (18:21 +0000)]
cls/journal: skip disconnected clients when finding min_commit_position
When a new journal client is registered, all already registered
clients are checked, and the minimum commit position among them is
used as the position for the new client. Thus we may expect that
starting from the registered position all journal entries will be
available (not trimmed) for the new client.
But when looking for a min commit position, the client_register
function did not take into account that a registered client might
be in disconnected state, and in that case the journal entries
might be trimmed for this client.
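As a minimal sketch of the selection rule described above (not the actual
cls/journal code, which is C++), the minimum can be taken over connected
clients only. The `Client` class, its integer `commit_position` and the
`connected` flag are simplified assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Client:
    """Hypothetical, simplified stand-in for a registered journal client."""
    id: str
    commit_position: int  # assumption: a single integer instead of real object positions
    connected: bool = True


def min_commit_position(clients: Iterable[Client]) -> Optional[int]:
    """Return the smallest commit position among connected clients, or None."""
    positions = [c.commit_position for c in clients if c.connected]
    return min(positions) if positions else None


clients = [
    Client("primary", 120),
    Client("mirror-a", 95),
    # Disconnected: entries past its position may already have been trimmed,
    # so it must not drag the starting position of a new client down to 10.
    Client("mirror-b", 10, connected=False),
]
start = min_commit_position(clients)
```

Here the new client would start at the position of `mirror-a` rather than at
the stale position of the disconnected `mirror-b`.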
Ilya Dryomov [Fri, 7 Jan 2022 12:31:08 +0000 (13:31 +0100)]
test/librbd: make diff-iterate clone tests exercise fast-diff mode
The fast-diff feature wasn't propagated to the clone so these tests
were exercising the slow list_snaps path no matter what RBD_FEATURES
value was supplied to ceph_test_librbd.
Ilya Dryomov [Wed, 5 Jan 2022 19:24:40 +0000 (20:24 +0100)]
librbd: restore diff-iterate include_parent functionality in fast-diff mode
Commit 4429ed4f3f4c ("librbd: switch diff iterate API to use new snaps
list dispatch methods") removed the recursive execute() call. The new
list_snaps method does indeed handle parent diffs internally but it is
not used in fast-diff mode. Nothing changed there -- we still need to
load the parent object map, calculate parent object_diff_state, etc.
Ilya Dryomov [Wed, 19 Jan 2022 20:08:01 +0000 (21:08 +0100)]
librbd: diff-iterate reports incorrect offsets if whole_object=true
It turns out that in octopus both fast-diff and list-snaps (slow)
modes were broken. As long as whole_object=true, the same incorrect
offset was reported in both modes. The fast-diff mode is fixed in
the previous commit.
This is an octopus-only patch for list-snaps mode. In pacific this
issue was addressed with 4429ed4f3f4c ("librbd: switch diff iterate
API to use new snaps list dispatch methods").
Alfonso Martínez [Tue, 23 Nov 2021 14:17:54 +0000 (15:17 +0100)]
mgr/dashboard: upgrade Cypress to the latest stable version
- Remove unneeded dependency that was causing UI performance issues: zone.js
- Ignore 'ResizeObserver loop limit exceeded' error.
- run-frontend-e2e-tests.sh refactoring: create rgw dashboard user through
'ceph dashboard set-rgw-credentials' and use it on rgw buckets' tests.
Fixes: https://tracker.ceph.com/issues/53357
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 3e4e29590aa1742fc3b44d21389325a13cca8199)
Conflicts:
src/pybind/mgr/dashboard/frontend/cypress/integration/rgw/buckets.e2e-spec.ts
Reject the current changes
src/pybind/mgr/dashboard/frontend/cypress/integration/rgw/buckets.po.ts
Reject the current changes
src/pybind/mgr/dashboard/frontend/cypress/integration/ui/navigation.po.ts
Deleted this file since it's not in octopus
src/pybind/mgr/dashboard/frontend/package-lock.json
Generated new file
src/pybind/mgr/dashboard/frontend/package.json
Kept zone.js and changed the cypress version to 9.0.0
src/pybind/mgr/dashboard/run-frontend-e2e-tests.sh
Accept the current change
Ilya Dryomov [Tue, 11 Jan 2022 12:13:01 +0000 (13:13 +0100)]
qa/run_xfstests_qemu.sh: harden against wget failures
If wget fails (e.g. due to a certificate issue), it still creates
an empty file. Then this file is marked executable, ./"${SCRIPT}"
immediately returns 0 and run_xfstests_qemu.sh exits successfully
without running a single xfstest.
This started on Sep 30, 2021 with the expiration of Let's Encrypt
root certificate -- all qemu jobs with "test: qa/run_xfstests_qemu.sh"
just booted the VM for a couple of seconds and reported success.
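The guard described above can be sketched as follows. The real script is
shell, so this Python version is only an illustration of the check, and the
helper name `make_executable_if_downloaded` is hypothetical: refuse to mark
the downloaded file executable unless it exists and is non-empty.

```python
import os
import stat


def make_executable_if_downloaded(path: str) -> None:
    """Mark path executable only if the download left a non-empty regular file."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        # An empty file is what a failed download leaves behind; running it
        # would return 0 immediately and hide the failure.
        raise RuntimeError(f"download of {path} failed or produced an empty file")
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
```

Failing loudly at this point turns a silent "success" into a visible job
failure.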
rbd-mirror: remove image_map next_state if sets to the same state
In some cases, set_state is called with DISSOCIATING, then ASSOCIATING
and DISSOCIATING again. In this case the state DISSOCIATING is
processed to remove the image and then schedule the next action which is
associating.
To fix this case, this commit removes the next_state if the state is
set to the same state.
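A minimal sketch of the behavior described above, assuming a simplified
state holder. `ImageState`, the string state names and the `in_progress`
flag are hypothetical stand-ins; the actual rbd-mirror policy code is C++
and may differ in detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageState:
    """Hypothetical, simplified image_map state: current and queued state."""
    state: str
    next_state: Optional[str] = None


def set_state(image: ImageState, state: str, in_progress: bool) -> bool:
    """Request a transition; return True if something was changed or queued."""
    if image.state == state:
        # The fix: a repeated request for the current state cancels
        # whatever was queued to run after it.
        image.next_state = None
        return False
    if in_progress:
        image.next_state = state  # queue behind the in-flight transition
    else:
        image.state = state
        image.next_state = None
    return True


image = ImageState("ASSOCIATED")
set_state(image, "DISSOCIATING", in_progress=False)
set_state(image, "ASSOCIATING", in_progress=True)   # queued as next_state
set_state(image, "DISSOCIATING", in_progress=True)  # same state again: queue cleared
```

Without the line that clears `next_state`, the queued ASSOCIATING would
still run after the image had been removed.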
rbd-mirror: fix bootstrap sequence while the image is removed
If the image is being removed, the PrepareRemoteImageRequest was
returning the same error whether the image was disabled or non-primary,
which doesn't allow the BootstrapRequest to have the correct error handling.
This commit fixes this behavior by considering that the remote image is
already deleted if the image is in disabling state.
rbd-mirror: remove image_mapped condition to remove image_map
In some split-brain scenarios the image is removed while the image_mapped
is false. This prevents the removal of image_map in OMAP and thus the
entry will not be removed until the daemon is restarted.
cls/rbd: prevent image_status when mirror image is not created
This prevents image_status_set from succeeding when there is no mirror
image yet. This solves some stale entries that were not removed in
rbd-mirror and prevents adding entries that would not be visible from the
rbd cli.
In the LoadRequest in the ImageMap class add initial cleanup to remove
stale entries. To clean up, the LoadRequest will query the mirror image
list and remove all the image_map entries that are not in the list.
Added a condition to handle the case where m_image_ctx is null on
close_image and handle_close_image in the TrashMoveRequest. This fix is
not needed in newer versions of Ceph as ImageCtx no longer needs to be
destroyed explicitly with a destroy method after Octopus.
rbd-mirror: add mirror status removal on ImageReplayer shutdown
In a scenario with rbd-mirror daemons on both clusters, the
rbd-mirror daemon on the primary site does not properly clean up its
status on image removal.
This commit adds a path for direct removal at the shut_down of the
ImageReplayer to properly clean up the metadata.
Ilya Dryomov [Tue, 4 Jan 2022 19:38:35 +0000 (20:38 +0100)]
librbd: diff-iterate reports incorrect offsets in fast-diff mode
If rbd_diff_iterate2() is called on an image offset that doesn't
correspond to an object boundary, the callback is invoked with an
incorrect image offset. For example, assuming a fully allocated
image, a diff request for 806354944~57344 results in offs=807403520,
len=57344, exists=true invocation, which is ahead by 1048576 bytes.
This occurs only in fast-diff mode. For a diff request on an image
with the fast-diff feature disabled, or if the whole_object parameter
is set to false, the invocation is correct.
This bug goes back to the introduction of fast-diff mode in commit 6d5b969d4206 ("librbd: add diff_iterate2 to API").
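The numbers in the example above can be checked with a short calculation.
It assumes the default 4 MiB RBD object size, which the message does not
state. Under that assumption the reported offset is ahead by exactly the
request's offset within its object. The message does not say why, so this
is only an observation about the numbers.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumption: default 4 MiB object size

request_off = 806354944   # offset of the diff request from the example
reported_off = 807403520  # offset passed to the callback in the example

object_no = request_off // OBJECT_SIZE          # object containing the request
object_start = object_no * OBJECT_SIZE          # image offset where that object begins
intra_object_off = request_off - object_start   # request offset within the object
error = reported_off - request_off              # how far ahead the callback was

print(object_no, object_start, intra_object_off, error)
# prints: 192 805306368 1048576 1048576
```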
This is redundant and makes nsenter throw messages like the following:
```
Failed to find sysfs mount point
dev/block/11:0/holders/: opendir failed: Not a directory
dev/block/252:0/holders/: opendir failed: Not a directory
dev/block/253:0/holders/: opendir failed: Not a directory
dev/block/252:1/holders/: opendir failed: Not a directory
dev/block/253:1/holders/: opendir failed: Not a directory
dev/block/252:2/holders/: opendir failed: Not a directory
dev/block/253:2/holders/: opendir failed: Not a directory
dev/block/252:3/holders/: opendir failed: Not a directory
dev/block/253:3/holders/: opendir failed: Not a directory
dev/block/252:16/holders/: opendir failed: Not a directory
dev/block/252:32/holders/: opendir failed: Not a directory
dev/block/252:48/holders/: opendir failed: Not a directory
dev/block/252:64/holders/: opendir failed: Not a directory
```
ceph-volume should run pv/vg/lv commands in the host namespace rather than
running them inside the container in order to avoid lvm metadata corruption.
Jeff Layton [Wed, 10 Nov 2021 18:10:50 +0000 (13:10 -0500)]
qa: account for split of the kclient "metrics" debugfs file
Recently, Luis posted a patch to turn the metrics debugfs file into a
directory with separate files for the different sections in the old
metrics file.
Account for this change in get_op_read_count().
Fixes: https://tracker.ceph.com/issues/53214
Signed-off-by: Jeff Layton <jlayton@redhat.com>
(cherry picked from commit e9f2bff8cd7df1c81ff8bbfa2530f470d9c6af2c)
Neha Ojha [Mon, 9 Aug 2021 14:35:01 +0000 (14:35 +0000)]
qa/suites/rados/perf/ceph.yaml: remove rgw
This is no longer required because we removed cosbench workloads in fd350fd0150a2d4072f055658c20314a435a19ba. This is also required to prevent
failures like the following or any other changes that break the rgw task:
```
2021-08-06T20:13:25.812 INFO:teuthology.orchestra.run.smithi060.stderr:curl: (7) Failed to connect to smithi060.front.sepia.ceph.com port 80: Connection refused
2021-08-06T20:15:33.813 ERROR:teuthology.contextutil:Saw exception from nested tasks
Traceback (most recent call last):
File "/home/teuthworker/src/git.ceph.com_git_teuthology_04c2febe7099917d97a71271f17abb5710030132/teuthology/contextutil.py", line 31, in nested
vars.append(enter())
File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
return next(self.gen)
File "/home/teuthworker/src/github.com_ceph_ceph-c_3c0f8c8164075af7aac4d1f2805d3f4580709461/qa/tasks/rgw.py", line 191, in start_rgw
wait_for_radosgw(url, remote)
File "/home/teuthworker/src/github.com_ceph_ceph-c_3c0f8c8164075af7aac4d1f2805d3f4580709461/qa/tasks/util/rgw.py", line 94, in wait_for_radosgw
assert exit_status == 0
AssertionError
```
Yaarit Hatuka [Wed, 25 Aug 2021 02:12:08 +0000 (02:12 +0000)]
rpm, debian: move smartmontools and nvme-cli to ceph-base
We wish to be able to scrape SMART and NVMe metrics from OSD and MON
nodes. For this we require / recommend smartmontools and nvme-cli
dependencies for both the ceph-osd and ceph-mon packages. However, the
sudoers file (which is required for invoking `smartctl` by user 'ceph')
was installed only in the ceph-osd package. Since different packages
cannot own the same file, and because we want to be able to scrape from
every daemon, we move the dependencies and the sudoers installation to
ceph-base. For generalization, we rename:
sudoers.d/ceph-osd-smartctl -> sudoers.d/ceph-smartctl
Igor Fedotov [Thu, 27 May 2021 12:49:05 +0000 (15:49 +0300)]
common/PriorityCache: low perf counters priorities for submodules.
Having too many perf counters with nickname priorities >= PRIO_INTERESTING spoils daemonperf output and causes the "osd" section to be missing there, presumably due to too many columns.
Fixes: https://tracker.ceph.com/issues/51002
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 35238d41360a22e22fae7d8ceddf3a2a047e5464)
yanqiang-ux [Mon, 7 Jun 2021 07:54:44 +0000 (15:54 +0800)]
osd: set r only if succeed in FillInVerifyExtent
When a read fails, ret can be taken as the data length in FillInVerifyExtent, which should be avoided.
It may cause errors in CRC repair or read retries because of the wrong data length. In my case, we use FillInVerifyExtent for EC reads:
on -EIO we attempt CRC repair, which needs to read data from other shards according to the data length.
I hit an assert in ECBackend.cc (line 2288: ceph_assert(range.first != range.second)), although the master branch does not seem to support EC CRC repair.
In short, reusing the read op may cause unpredictable errors.
Fixes: https://tracker.ceph.com/issues/51115
Signed-off-by: yanqiang-ux <yanqiang_ux@163.com>
(cherry picked from commit 127745161fbcdee06b2dfa8464270c3934bcd06a)
J. Eric Ivancich [Wed, 28 Jul 2021 17:52:29 +0000 (13:52 -0400)]
rgw: user stats showing 0 value for "size_utilized" and "size_kb_utilized" fields
When accumulating user stats, the "utilized" fields are not looked
at. Updates RGWStorageStats::dump so it only outputs the "utilized"
data if they're updated.
Kajetan Janiak [Wed, 18 Nov 2020 10:42:07 +0000 (11:42 +0100)]
rgw: disable prefetch in rgw_file
Each call to rgw_read (rgw_file.cc) invokes three calls to RGWRados::get_obj_state with s->prefetch_data=true. This results in significant read amplification. If the length argument in the rgw_read call is smaller than rgw_max_chunk_size, then the amplification is threefold.
Duncan Bellamy [Sat, 8 May 2021 10:52:35 +0000 (11:52 +0100)]
mds: PurgeQueue.cc fix for 32bit compilation
files_high_water is defined as uint64_t, but when compiling on 32-bit these max functions
fail to compile because gcc 10 does not treat both arguments as uint64_t, even though they are.
Adam Kupczyk [Sat, 13 Nov 2021 10:28:18 +0000 (11:28 +0100)]
os/bluestore: Fix omap upgrade to per-pg scheme
This fixes a regression introduced by the fix to omap upgrade: https://github.com/ceph/ceph/pull/43687
The problem was that we always skipped the first omap entry.
This worked fine for objects that have an omap header key.
For objects without a header key, we skipped the first actual omap key.
Fixes: https://tracker.ceph.com/issues/53260
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
(cherry picked from commit 65a3f374aa1c57c5bb9401e57dab98a643b4360a)
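The omap regression described above can be illustrated with a small sketch.
The `HEADER` marker and the flat list of keys are hypothetical
simplifications; the real BlueStore key layout and upgrade code are
different and written in C++.

```python
from typing import List

HEADER = "-"  # hypothetical marker standing in for the omap header key


def keys_to_migrate_buggy(entries: List[str]) -> List[str]:
    """Regressed behavior: unconditionally skip the first entry."""
    return entries[1:]


def keys_to_migrate_fixed(entries: List[str]) -> List[str]:
    """Skip the first entry only when it really is the header."""
    if entries and entries[0] == HEADER:
        return entries[1:]
    return list(entries)


with_header = [HEADER, "key1", "key2"]
without_header = ["key1", "key2"]
```

With a header present both versions agree. Without one, the unconditional
skip drops `key1`.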
Those functions should return `None` if zero or more than one item is returned.
The current names of these functions are confusing and can suggest that
we just want the first item returned, even when more than one item is
returned. Let's rename them to `get_single_pv()`, `get_single_vg()` and
`get_single_lv()`.
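The intended contract can be sketched with a generic helper. `get_single`
is a hypothetical name used only for illustration and is not part of
ceph-volume; the real functions query LVM before applying a rule like this.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def get_single(items: List[T]) -> Optional[T]:
    """Return the only item in the list, or None if there are zero or several."""
    return items[0] if len(items) == 1 else None
```

A caller can then treat `None` as "not found or ambiguous" instead of
silently picking the first of several matches.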