Jos Collin [Tue, 28 May 2024 14:57:55 +0000 (20:27 +0530)]
cephfs_mirror: Add ErrorListener to maintain blocklisted/failed timestamp in FSMirror
Have FSMirror register a listener with InstanceWatcher/MirrorWatcher which would get invoked when the mirror daemon is blocklisted or failed.
Thus FSMirror can maintain the last blocklisted/failed timestamp and use that for restarting the mirror daemon.
Fixes: https://tracker.ceph.com/issues/64927 Fixes: https://tracker.ceph.com/issues/51964 Fixes: https://tracker.ceph.com/issues/63931 Fixes: https://tracker.ceph.com/issues/63089 Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit 77ec7bfde7a349b0e06b34ecdf328996c7642d43)
Edit the section called "Is mount helper present?", the title of which
prior to this commit was "Is mount helper is present?". Other small
disambiguating improvements have been made to the text in the section.
An unselectable prompt has been added before a command.
Improve "Principles for format change" in doc/dev/encoding.rst. This
commit started as a response to Anthony D'Atri's suggestion here: https://github.com/ceph/ceph/pull/58299/files#r1656985564
Review of this section suggested to me that certain minor English usage
improvements would be of benefit. The numbered lists in this section
could still be made a bit clearer.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com> Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 570797e5588b67b8c72e5297b61f84d9aa48dc45)
Rishabh Dave [Wed, 8 May 2024 13:09:35 +0000 (18:39 +0530)]
qa/cephfs: improve and move _get_unhealthy_mds_name to TestMDSFail
1. Instead of accepting health report as argument, get one directly.
2. Since it is not being used elsewhere move it to the class where it is
being used.
Rishabh Dave [Wed, 8 May 2024 12:08:43 +0000 (17:38 +0530)]
qa/cephfs: make code for generating health warnings reusable
Code to generate MDS_TRIM and MDS_CACHE_OVERSIZED health warnings is
repeated in test methods of TestMDSFail and TestFSFail. Move this code
to separate helper methods so that it can be reused instead of
duplicating it. And move these helper methods to TestAdminCommands so
to make them conveniently available for reuse.
This test deletes the CephFS already present on the cluster at the very
beginning and unmounts the first client beforehand. But it leaves the
second client mounted on this deleted CephFS that doesn't exist for the
rest of the test. And then at the very end of this test it attempts to
remount the second client (during tearDown()) which hangs and causes
test runner to crash.
Unmount the second client beforehand to prevent the bug and delete
mount_b object to avoid confusion for the readers in future about
whether or not 2nd mountpoint exists.
Fixes: https://tracker.ceph.com/issues/66077 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 2130ec8ebc377364a11be7448ed2773b46b464c0)
Rishabh Dave [Mon, 17 Jun 2024 19:03:28 +0000 (00:33 +0530)]
mgr/vol: handle case where clone index entry goes missing
In `async_cloner.py`, clone index entry is fetched to get next clone job
that needs to be executed. It might happen that the clone job was
cancelled just when it was going to be picked for execution (IOW, when
it was about to move from pending state to in-progress state).
Currently, MGR hangs in such a case because exception `ObjectNotFound`
from CephFS Python bindings is raised and is left uncaught. To prevent
this issue catch the exception, log it and return None to tell
`get_job()` of `async_job.py` to look for next job in the queue.
Increase the scope of try-except in method `get_oldest_clone_entry()` of
`async_cloner.py` so that when exception `cephfs.Error` or any exception
under it is thrown by `self.fs.lstat()` is not left uncaught.
FS object is also passed to the method `list_one_entry_at_a_time()`, so
increasing scope of try-except is useful as it will not allow exceptions
raised in other calls to CephFS Python binding methods to be left
uncaught.
Fixes: https://tracker.ceph.com/issues/66560 Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 3cff7251c86a4670768721f924b11b3de33f807b)
Xiubo Li [Tue, 28 May 2024 04:35:17 +0000 (12:35 +0800)]
mds: set the proper extra bl for the create request
Just in case a create request was totally completed but the client
didn't receive any unsafe and safe responses, and then when it's
retried in the MDS side it will be treated as a open request and
will miss setting the create ino for the reply message.
Finally for client it just sent a create request and then the MDS
just sent back a open reply, which will let the client miss setting
the CREATED flag and then the VFS will fail the create by returnning
a -EEXIST errno.
Xiubo Li [Tue, 28 May 2024 04:23:57 +0000 (12:23 +0800)]
mds: encode the correct extra info depending on the feature bits
In the client side it will decode the extra info depending on the
feature bits, but if we always encode it with the old version then
the client could crash or gets the corrupted data. More detail
please see "parse_reply_info_create()" in kclient code.
mds: relax divergent backtrace scrub failures for replicated ancestor inodes
scrub could be verifying backtrace for an inode for which some of its
ancestors might be replicas, e.g. (from a custom debug build) some
ancestors of an inode with divergent backtrace were replicas:
In such cases, the backpointer version (inode_backpointer_t::version) of the
in-memory (cache) inode can fall behind the on-disk version causing scrub to
consider the inode backtrace as divergent (memory version < on-disk version).
Xiubo Li [Thu, 25 Apr 2024 04:06:25 +0000 (12:06 +0800)]
mds: set the correct WRLOCK flag always in wrlock_force()
The wrlock is not like the xlock, which needs to be acquired in
the CInode's auth always, and it is based on the CDir's auths instead.
When a remote_wrlock is acquired and the local MDS will add a lock
item and marks it as REMOTE_WRLOCK. And later when the local MDS try
to force wrlock in the emplace_lock() will just return the existing
lock item without updating the WRLOCK flag. So when cleaning the
requests later it will just release the remote locks and then removes
lock items directly, which will miss releasing the local wrlock
reference.
Ilya Dryomov [Thu, 20 Jun 2024 19:13:56 +0000 (21:13 +0200)]
librbd: make diff-iterate in fast-diff mode aware of encryption
diff-iterate wasn't updated when librbd was being prepared to support
encryption in commit 8d6a47933269 ("librbd: add crypto image dispatch
layer"). This is even noted in [1]:
> The two places I skipped for now are DiffIterate and TrimRequest.
CryptoImageDispatch has since been removed, but diff-iterate in
fast-diff mode is still unaware of encryption and just assumes that all
offsets are raw. This means that the callback gets invoked with
incorrect image offsets when encryption is loaded. For example, for
a LUKS1-formatted image with some data at offsets 0 and 20971520,
diff-iterate with encryption loaded reports
as "exists". For any piece of code that is using diff-iterate to
optimize block-by-block processing (e.g. copy an encrypted source image
to a differently-encrypted destination image), this is fatal: it would
skip processing block 20971520 which has data and instead process block 25165824 which doesn't have any data and was to be skipped, producing
a corrupted destination image.
Currently we are laying data only at the beginning of an object.
Extend the skeletons to write to three different offsets in the middle
and also at the end of the object.
Separately, make C and C++ API test variants slightly different in
terms of offsets being targeted to not go through exactly the same
scenario twice.
After rollback started being tested in commit b3977c53c930
("test/librbd: make rollback in TestGroup.add_snapshot{,PP}
meaningful"), these tests can fail on comparing post-rollback
data to expected data if run with exclusive lock disabled.
This doesn't occur with exclusive lock enabled because the RBD
cache gets invalidated implicitly before releasing the lock.
While at it, pass LIBRADOS_OP_FLAG_FADVISE_FUA to avoid relying
on any cache settings that happen to be in effect.
Ilya Dryomov [Fri, 14 Jun 2024 12:04:39 +0000 (14:04 +0200)]
librbd: disallow group snap rollback if memberships don't match
Before proceeding with group rollback, ensure that the set of images
that took part in the group snapshot matches the set of images that are
currently part of the group. Otherwise, because we preserve affected
snapshots when an image is removed from the group, data loss can ensue
where an image gets rolled back while part of another group or not part
of any group but long repurposed for something else.
Similarly, ensure that the group snapshot is complete.
Document how to manually pass the search domain to "mon_dns_srv_name" in
doc/rados/configuration/mon-lookup-dns.rst.
This commit is made in response to a request by Lander Duncan that was made on the [ceph-users] mailing list, and can be seen here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/F7V4CWLIYCAJ4JXI2JLNY6QPCFPR4SLA/
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com> Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 98938a0312dd0c8e0b293ed9aa2e0760cc9619fa)