Let ReplicatedRecoveryBackend::handle_recovery_op route pushes
between handle_push and handle_pull_response instead of
ReplicatedRecoveryBackend::handle_push.
Ilya Dryomov [Thu, 30 Jan 2025 19:30:18 +0000 (20:30 +0100)]
doc/rbd: use https links in live import examples
Even though it's explicitly said that "http" stream can be used to
import via both HTTP and HTTPS, it can still be confusing that "type":
"http" is expected to go with "url": "https://...". Switch example
URLs from HTTP to HTTPS to make it more obvious.
When the NBD server is killed, nbd_pread() can set errno to at least
ENOTCONN, EINVAL and 0 which is supposed to stand for "no additional
errno information is available for this error". Add a test to ensure
that "rbd migration execute" command always fails and that the image
isn't transitioned to MIGRATION_STATE_EXECUTED in this scenario.
Ilya Dryomov [Wed, 29 Jan 2025 11:56:34 +0000 (12:56 +0100)]
librbd: stop filtering async request error codes
The roots of this go back to 2015 when snap create was changed to
filter EEXIST in commit 63f6c9bac9a4 ("librbd: fixed snap create race
conditions") and flatten respectively EINVAL in commit ef7e210c3f74
("librbd: better handling for duplicate flatten requests"). From there
this pattern made it to most other operations that can be proxied
including "rbd migration execute".
The motivation was to suppress generation of an "expected" error in
response to a duplicate async request notification for the operation.
However, doing this at the top of the handler (right before returning
to the caller) and for an error as generic as EINVAL is super fragile.
It's trivial for an error that is being filtered to sneak in with
a lower level change completely unnoticed. For example, live migration
recently added NBD stream which is implemented on top of libnbd and it
turns out that some libnbd APIs return EINVAL on various occasions when
the NBD endpoint disappears and an error like ENOTCONN would make more
sense. If this occurs during "rbd migration execute" operation, the
rest of librbd never learns that migration was disrupted and the image
is transitioned to MIGRATION_STATE_EXECUTED, thus handing a partially
imported (read: corrupted) image to the user.
Luckily, with commits 07fbc4b71df4 ("librbd: track complete async
operation requests") and 96bc20445afb ("librbd: track complete async
operation return code"), the scenario which originally prompted error
code filtering isn't an issue anymore. Despite a few shortcomings
(e.g. when an async request notification is acked with result 0, it's
impossible to tell whether a) a new operation was kicked off, b) there
is an operation that is still in progress or c) it's for an operation
that completed earlier but hasn't "expired" yet), even just commit 07fbc4b71df4 by itself prevents a duplicate notification from kicking
off a second operation that could generate an error for something that
actually succeeded. With that in mind, eradicate error code filtering
from Operations class.
Gil Bregman [Thu, 30 Jan 2025 11:33:51 +0000 (13:33 +0200)]
mgr/cephadm/nvmeof: Add verify_listener_ip field to NVMeOF configuration and remove obsolete enable_key_encryption
Fixes https://tracker.ceph.com/issues/69731
Ronen Friedman [Thu, 30 Jan 2025 09:27:58 +0000 (03:27 -0600)]
osd/scrub: discard repair_oinfo_oid()
repair_oinfo_oid(), called every scrub, has a very specific
functionality: fix the object ID specified in the Object Info
attribute, if different from the ID of the owning object.
This fix was added in 2017, as a response to a unique failure
scenario that was observed in Sepia - probably following a
filesystem bug. See https://tracker.ceph.com/issues/18409 &
https://tracker.ceph.com/issues/20471.
The limited functionality of repair_oinfo_oid() -
only repairing this one specific issue, and only if the OI_ATTR
exists and is decodable - does not justify the overhead of
running it every scrub.
Zac Dover [Wed, 29 Jan 2025 14:05:59 +0000 (00:05 +1000)]
doc/cephadm: simplify confusing math proposition
s/This means that the exact device size is 3.64 * 1000, or 3640GB"/This
means that the exact device size is 3.64TB, or 3640 GB"/
In the original text, the number "3.64" appears to refer to a quantity
(and indeed, it is a quantity of Terabytes), but it is unlabeled. Also,
on repeated recent readings of this sentence I found it more puzzling
than enlightening. So I made this commit.
Redouane Kachach [Tue, 14 Jan 2025 09:38:13 +0000 (10:38 +0100)]
mgr/cephadm: using service registry pattern for cephadm services
This change includes mainly the following enhancements:
- Introduced a centralized `CephadmServiceRegistry` to manage service registration and initialization.
- Added dynamic discovery of service modules in the same directory using `pkgutil` and `importlib`.
- Implemented a decorator `@service_registry_decorator` for automatic registration of service classes.
Ilya Dryomov [Mon, 27 Jan 2025 11:29:54 +0000 (12:29 +0100)]
osd/OSDCap: fix misleading grammar comments
The restrictions on pool name and namespace have been independent of
each other for ages. Specifying namespace[=]<namespace> doesn't require
specifying pool[=]<pool> like is currently suggested -- neither for
regular "allow" grants nor for "profile" grants.
Ilya Dryomov [Fri, 24 Jan 2025 19:47:11 +0000 (20:47 +0100)]
mon/OSDMonitor: relax cap enforcement for unmanaged snapshots
Since commit 4972e054b32c ("mon/OSDMonitor: enforce caps when
creating/deleting unmanaged snapshots"), a) write access to the MON
service, b) write access to the OSD service for a pool or c) permission
for "osd pool op unmanaged-snap" command for a pool is required. For
"profile rbd" we configure read-only access to the MON service and rely
on write access to the OSD service, however the corresponding check in
is_osd_writable() is too strict.
A OSD cap like "profile rbd namespace=myns" or "allow w namespace=myns"
allows write access to myns namespace of any pool, but is_osd_writable()
disallows operations with unmanaged snapshots with such a cap because
its match.pool_namespace.pool_name.empty() is true. This condition
appears to serve as the "doesn't include support for the application
tag" guard, but it should actually be match.pool_tag.is_match_all()
(or match.pool_tag.application.empty() if open-coded) -- no restriction
on the pool name doesn't automatically mean that there is a restriction
on the application tag.
Alex Ainscow [Thu, 23 Jan 2025 16:56:25 +0000 (16:56 +0000)]
interval_set: Fix test_interval_set.cc
Here we duplicate all the original tests which used "insert" to verify they also pass with "union_insert". All new tests are modified to use "union_of" and "union_insert"
Alex Ainscow [Thu, 23 Jan 2025 09:28:09 +0000 (09:28 +0000)]
interval_set: Add back insert()
A reviewer (see github) was concerned that the policing provided by insert() may have been required for some applications of insert_set. As such, I have re-instated the old insert method and instead refactored "union_insert". IU have also enhanced the comments.