Lucian Petrut [Thu, 1 Feb 2024 14:40:03 +0000 (14:40 +0000)]
msg: update MOSDOp() to use ceph_tid_t instead of long
The MOSDOp constructor receives the the transaction ID as a long
instead of ceph_tid_t.
The issue is that "long" uses 32b on Windows instead of 64 bits,
so it flips after about 2 billion requests. At that point, the OSD
replies are dropped because of transaction ID mismatches.
We'll solve the issue by using the correct type for the transaction
id, specifically ceph_tid_t.
bacport mgr/rook: always recreate kvm default network + fix groups refresh Fixes: https://tracker.ceph.com/issues/64079
This change also includes:
- adding ~/.local/bin to path so behave binary can be found
- adding requirements.txt file for testing dependencies
- increasing timeout used to wait for tools deployment to 90s
- increasing timeout used to wait for kvm network to 20s
Laura Flores [Mon, 29 Jan 2024 00:58:25 +0000 (00:58 +0000)]
mgr: pin pytest to version 7.4.4
On 2024-01-27, pytest updated to 8.0.0,
which broke run-tox-mgr.
https://docs.pytest.org/en/stable/changelog.html
==================================== ERRORS ====================================
_____________________ ERROR collecting alerts/__init__.py ______________________
alerts/__init__.py:2: in <module>
from .module import Alerts
alerts/module.py:6: in <module>
from mgr_module import CLIReadCommand, HandleCommandResult, MgrModule, Option
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
______________________ ERROR collecting alerts/module.py _______________________
alerts/module.py:6: in <module>
from mgr_module import CLIReadCommand, HandleCommandResult, MgrModule, Option
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
____________________ ERROR collecting balancer/__init__.py _____________________
balancer/__init__.py:2: in <module>
from .module import Module
balancer/module.py:12: in <module>
from mgr_module import CLIReadCommand, CLICommand, CommandResult, MgrModule, Option, OSDMap, CephReleases
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
_____________________ ERROR collecting balancer/module.py ______________________
balancer/module.py:12: in <module>
from mgr_module import CLIReadCommand, CLICommand, CommandResult, MgrModule, Option, OSDMap, CephReleases
mgr_module.py:1: in <module>
import ceph_module # noqa
E ModuleNotFoundError: No module named 'ceph_module'
ceph-volume: fix partitions support in disk.get_devices()
The following:
```
is_part = get_file_contents(os.path.join(_sys_dev_block_path, item, 'partition')) == "1"
```
assumes any `/sys/dev/block/x:y/partition` contains '1' which is wrong.
This file actually contains the corresponding partition number.
ceph-volume: use 'no workqueue' options with dmcrypt
CloudFlare engineers made some testing and realized that using
workqueues with encryption on flash devices has a bad effect.
See [1] for details.
With this patch it will make ceph-volume call crypsetup with
`--perf-no_read_workqueue` and `--perf-no_write_workqueue` options
when the device is not a rotational.
Ramana Raja [Tue, 23 Jan 2024 21:07:04 +0000 (16:07 -0500)]
rbd-nbd: log errors during netlink_resize() using derr
When using rbd CLI to map the images to NBD devices via netlink,
any errors that arose during image resizing in netlink_resize()
were not logged. Switching the error logging from using cerr to
derr helps log the errors from netlink_resize().
Ramana Raja [Mon, 22 Jan 2024 22:06:58 +0000 (17:06 -0500)]
rbd_nbd: fix resize of images mapped using netlink
Include device identifier or cookie in the message sent to the kernel
to resize images mapped to NBD devices using netlink. Otherwise,
netlink_resize() fails and the size of the device isn't updated.
When the OSD preboots it sends a MMonGetPurgedSnaps message to
the monitor (`_get_purged_snaps`).
The monitor will reply with all the purged snapshots that their purged_epoch_ is in the
range of superblock.purged_snaps_last + 1 up to the last superblock.current_epoch + 1.
When the OSD will handle the reply from the mon (`handle_get_purged_snaps_reply`)
it will call `record_purged_snaps` to write those purged snapshots in the
OSD store as well (PSN_ keys).
Once purged_snaps_last is reset, in the following OSD reboot, the snapshots that were marked as
purged (purged_snaps_ keys) in the mon's store will be also marked,
correspondingly, in the OSD store.
That way `scrub_purged_snaps` will be able to re-trim the snapshots that weren't
marked as purged in the OSD side (for some reason)
Aashish Sharma [Wed, 23 Aug 2023 09:59:44 +0000 (15:29 +0530)]
mgr/dashboard: Create realm sets to default
In Multisite page, When we create a realm the realm sets to default even if some other realm is already default and default checkbox in unchecked as well while creating.
Ilya Dryomov [Sun, 10 Dec 2023 16:01:24 +0000 (17:01 +0100)]
test/pybind/rbd: don't ignore from_snapshot in check_diff()
Despite the test in test_diff_iterate() being correct, it started
failing:
> check_diff(self.image, 0, IMG_SIZE, 'snap1', [(0, 512, False)])
...
a = [], b = [(0, 512, False)]
...
> assert a == b
E AssertionError
This is because check_diff() drops 'snap1' argument on the floor and
passes None to image.diff_iterate() instead. This goes back to 2013,
see commit e88fe3cbbc8f ("rbd.py: add some missing functions").
- beginning of time -> HEAD, through intermediate snap
- snap -> snap, directly
- snap -> HEAD, directly
But coverage is too weak: none of the weird OBJECT_PENDING cases and
only a single diff-iterate vs deep-copy case is tested, for example.
Coverage is missing completely for:
- beginning of time -> HEAD, directly
- beginning of time -> snap, directly
- beginning of time -> snap, through intermediate snap
- snap -> snap, through intermediate snap
- snap -> HEAD, through intermediate snap
Ilya Dryomov [Fri, 8 Dec 2023 14:19:02 +0000 (15:19 +0100)]
librbd: OBJECT_PENDING should always be treated as dirty
OBJECT_PENDING is a transition state which normally isn't encountered
in (snapshot) object maps. In case it's encountered, for example when
a snapshot is taken after losing power at the time a discard was being
handled, the object should be treated as dirty and produce a diff as
a result.
Assuming an object is marked OBJECT_PENDING, theoretically there are
four cases with respect to object's state in the next snapshot:
Prior to commit b81cd2460de7 ("librbd/object_map: diff state machine
should track object existence"), (3) was handled incorrectly (diff set
to DIFF_STATE_NONE instead of DIFF_STATE_UPDATED).
Post commit 399a45e11332 ("librbd/object_map: rbd diff between two
snapshots lists entire image content"), (4) is handled incorrectly
(diff set to DIFF_STATE_DATA instead of DIFF_STATE_DATA_UPDATED).
Similar to DiffIterateTest.DiffIterateDeterministic, systematically
cover the most common cases involving full-object discards. With this
in place, issue [1] can be reproduced by any of:
(preparatory) before snap3 is taken
(1) beginning of time -> HEAD
(2) snap1 -> HEAD
(5) beginning of time -> snap3
(6) snap1 -> snap3
Sub-object discards aren't covered here because of further issues
[2][3].
Ilya Dryomov [Fri, 10 Nov 2023 10:14:42 +0000 (11:14 +0100)]
librbd: resurrect "exists" assert in simple_diff_cb()
This effectively reverts commit 3ccc3bb4bd35 ("librbd: diff_iterate
needs to handle holes in parent images") which just dropped the assert
instead of addressing the root cause of reported crashes.
Ilya Dryomov [Thu, 9 Nov 2023 19:44:18 +0000 (20:44 +0100)]
librbd: diff-iterate shouldn't ever report "new hole" against a hole
If an object doesn't exist in both start and end versions but there is
an intermediate snapshot which contains it (i.e. the object is written
to and captured at some point but then discarded prior to or in the end
version), diff-iterate reports "new hole" -- callback is invoked with
exists=false. This occurs both on the slow list_snaps path and in
fast-diff mode.
Despite going all the way back to the introduction of diff-iterate in
commit 0296c7cdae91 ("librbd: implement diff_iterate"), this behavior
is wrong and contradicts diff-iterate API documentation added in commit a69532e86450 ("librbd: document diff_iterate in header") in the same
series:
If the source snapshot name is NULL, we interpret that as
the beginning of time and return all allocated regions of the
image.
It also triggered an assert added in commit c680531e070a ("librbd:
change diff_iterate interface to be more C-friendly") in the same
series. Unfortunately, commit f1f6407221a0 ("test_librbd: add
diff_iterate test including discard"), also part of the same series,
added a test which expected the wrong behavior. Very confusing!
A year later, a different manifestation of this bug was fixed in commit 9a1ab95176fe ("rbd: Fix rbd diff for non-existent objects"), but the
fix only covered the case where calc_snap_set_diff() goes past the end
snap ID while processing clones. The case where it runs out of clones
to process before reaching the end snap ID remained mishandled.
A year after that, commit 3ccc3bb4bd35 ("librbd: diff_iterate needs to
handle holes in parent images") dropped the assert mentioned above and
this bug got enshrined in the newly introduced fast-diff mode.
Finally, a few years later, deep-copy actually started relying on this
bug in commit e5a21e904142 ("librbd: deep-copy image copy state machine
skips clean objects"). This necessitates bifurcation in DiffRequest
because deep-copy wants the "has this object been touched" semantics,
which is different from diff-iterate (and also potentially much more
expensive to produce!).
This commit brings a minimal update to TestMockObjectMapDiffRequest
tests and DiffIterateTest.DiffIterateDiscard. Coverage is expanded in
the following commits.
Nizamudeen A [Tue, 16 Jan 2024 05:21:56 +0000 (10:51 +0530)]
admin/doc-requirements: bump Sphinx to 5.0.2
```
Running Sphinx v4.5.0
Sphinx version error:
The sphinxcontrib.applehelp extension used by this project needs at least Sphinx v5.0; it therefore cannot be built with this version.
```
Nizamudeen A [Wed, 18 Oct 2023 17:46:09 +0000 (23:16 +0530)]
mgr/dashboard: cephfs subvolume list snapshots
Added a tab for displaying the subvolume snapshots
- this tab will show an info alert when there are no subvolumes present
- if the subvolume is present, then it'll be auto-selected by default
Implemented a filter to search the groups and subvolumes by its name.
Also added a scrollbar when there are too many items in the nav list
Modified the REST APIs to fetch only the names of the resources and
fetch the info when an API call is requesting for it.
Ronen Friedman [Mon, 22 May 2023 15:09:28 +0000 (18:09 +0300)]
osd/scrub: increasing max_osd_scrubs to 3
Bug reports seem to hint that the current default value of
'1' is too low: the cluster is susceptible to scrub scheduling
delays and issues stemming from local software/networking/hardware
problems, even if affecting a very small number of OSDs.
Squid will include a major overhaul of the way scrubs are counted
in the cluster, providing a better solution to the problem. For
now - modifying the default is an effective stop-gap measure.
Kamoltat [Tue, 2 May 2023 14:17:07 +0000 (14:17 +0000)]
mon/ConnectionTracker.cc: disregard connection scores from mon_rank = -1
There are certain situations where we would
come across rank -1 in our MON connection scores;
- New MON sends probe message to existing MON,
existing MON handle probe message by adding -1
to existing peer_scores.
This is not good because we want to implement
a connection scores check mechanism where we
should not have to take into account the possibility
of having rank -1 in our score.