Kotresh HR [Wed, 14 Jan 2026 20:06:31 +0000 (01:36 +0530)]
tools/cephfs_mirror: Fix assert while opening handles
When the crawler or a datasync thread encounters an error,
it's possible that the crawler gets notified by a datasync
thread and bails out, resulting in the unregistering of the
particular dir_root. The other datasync threads might
still hold the same syncm object and try to open the
handles, during which the following assert is hit.
ceph_assert(it != m_registered.end());
The above assert is removed and the error is handled.
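The idea of replacing the assert with an error return can be sketched in Python as follows. This is only an illustration: the `registered` dictionary, the `open_handles` function, and the choice of `-ENOENT` as the error code are assumptions and do not come from the cephfs-mirror source.

```python
import errno

# Hypothetical registry: dir_root -> per-directory handles.
registered = {"/d0": {"fd": 3}}


def open_handles(dir_root):
    """Return (0, handles), or (-ENOENT, None) if dir_root was unregistered."""
    handles = registered.get(dir_root)
    if handles is None:
        # Previously this situation would have tripped the assert.
        return -errno.ENOENT, None
    return 0, handles


# The crawler bails out and unregisters /d0 while a datasync thread
# still holds a syncm object for it.
registered.pop("/d0")
rc, handles = open_handles("/d0")
```

The caller can then treat the negative return code like any other sync error instead of crashing the daemon.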
Kotresh HR [Wed, 14 Jan 2026 19:59:36 +0000 (01:29 +0530)]
tools/cephfs_mirror: Fix dequeue of syncm on error
When an error is encountered in the crawler thread or a datasync
thread while processing a syncm object, it's possible
that multiple datasync threads attempt to dequeue the
syncm object. Though it's safe, add a condition to avoid
it.
Kotresh HR [Wed, 14 Jan 2026 19:53:34 +0000 (01:23 +0530)]
tools/cephfs_mirror: Handle errors in crawler thread
Any error encountered in crawler threads should be
communicated to the data sync threads by marking the
crawl error in the corresponding syncm object. The
data sync threads would finish pending jobs, dequeue
the syncm object and notify the crawler to bail out.
Kotresh HR [Wed, 14 Jan 2026 19:35:29 +0000 (01:05 +0530)]
tools/cephfs_mirror: Handle error in datasync thread
On any error encountered in datasync threads while syncing
a particular syncm dataq, mark the datasync error and
communicate the error to the corresponding syncm's crawler
which is waiting to take a snapshot. The crawler will log
the error and bail out.
There is a global queue of SyncMechanism (syncm) objects. Each syncm
object represents a single snapshot being synced, and each syncm
object owns an m_sync_dataq representing the list of files in the snapshot
to be synced.
The data sync threads should consume the next syncm job
if the present syncm has no pending work. This can
happen if the last file being synced in the present syncm
job is a large file from its syncm_dataq. In this case, one
data sync thread is busy syncing the large file while the rest of the
data sync threads just wait for it to finish, to avoid a busy loop.
Instead, the idle data sync threads could start consuming the next
syncm job.
This requires a change to the data structure:
- syncm_q has to be a std::deque instead of a std::queue, because a syncm
in the middle can finish syncing first and needs to be removed before
the front.
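The data-structure change can be illustrated with a minimal Python sketch, using `collections.deque` as a stand-in for `std::deque`. The `SyncM` class and the `pick_next_job` and `dequeue_finished` helpers are hypothetical names chosen for illustration; they are not the cephfs-mirror implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)
class SyncM:
    """Hypothetical stand-in for a SyncMechanism (one snapshot sync job)."""
    name: str
    dataq: deque = field(default_factory=deque)  # files still queued
    in_flight: int = 0                           # files being synced right now

    def has_pending_work(self) -> bool:
        return bool(self.dataq)

    def is_finished(self) -> bool:
        return not self.dataq and self.in_flight == 0


def pick_next_job(syncm_q):
    """Return the first syncm that still has queued files, or None."""
    for syncm in syncm_q:
        if syncm.has_pending_work():
            return syncm
    return None


def dequeue_finished(syncm_q, syncm):
    """Remove a finished syncm from anywhere in the queue, not just the front."""
    if syncm.is_finished() and syncm in syncm_q:
        syncm_q.remove(syncm)


# snap1: one large file is mid-transfer, nothing else is queued.
snap1 = SyncM("snap1", in_flight=1)
snap2 = SyncM("snap2", dataq=deque(["a", "b"]))
syncm_q = deque([snap1, snap2])

# An idle thread skips snap1 and starts on snap2 instead of waiting.
job = pick_next_job(syncm_q)
job.dataq.clear()  # pretend the idle threads synced every file

# snap2 finishes first and is removed while snap1 is still at the front.
dequeue_finished(syncm_q, snap2)
remaining = [s.name for s in syncm_q]
```

A plain FIFO queue only allows popping the front element, which is why removal of a later entry needs a container with access to the middle.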
Kotresh HR [Wed, 14 Jan 2026 12:30:43 +0000 (18:00 +0530)]
tools/cephfs_mirror: Synchronize taking snapshot
The crawler/entry creation thread needs to wait until
all the data is synced by datasync threads to take
the snapshot. This patch adds the necessary conditions
for the same.
It is important for the conditional flag to be part
of SyncMechanism and not part of PeerReplayer class.
The following bug would be hit if it were part of
PeerReplayer class.
When multiple directories are configured for mirroring as below
    /d0               /d1               /d2
    Crawler1          Crawler2          Crawler3
    DoneEntryOps      DoneEntryOps      DoneEntryOps
    WaitForSafeSnap   WaitForSafeSnap   WaitForSafeSnap
When all crawler threads are waiting as above, a data sync thread
that is done processing /d1 would notify, waking up all the crawlers
and causing spurious/unwanted wakeups and half-baked snapshots.
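The reasoning above can be sketched with Python's `threading.Condition`. The sketch assumes a single shared condition variable and a per-object flag; the `SyncM` class, the `data_synced` attribute, and the function names are hypothetical and only model the behaviour described here.

```python
import threading


class SyncM:
    """Hypothetical stand-in for SyncMechanism; owns its own completion flag."""

    def __init__(self, dir_root):
        self.dir_root = dir_root
        self.data_synced = False  # per-syncm flag, not a per-replayer flag


cond = threading.Condition()  # shared by crawlers and data sync threads
snapshots_taken = []


def crawler(syncm):
    # Wait until the data for *this* directory is synced, then "snapshot".
    with cond:
        if cond.wait_for(lambda: syncm.data_synced, timeout=5):
            snapshots_taken.append(syncm.dir_root)


def data_sync_done(syncm):
    # Called by a data sync thread once a syncm has no work left.
    with cond:
        syncm.data_synced = True
        cond.notify_all()  # wakes every crawler; each rechecks its own flag


syncms = [SyncM(d) for d in ("/d0", "/d1", "/d2")]
threads = [threading.Thread(target=crawler, args=(s,)) for s in syncms]
for t in threads:
    t.start()

data_sync_done(syncms[1])  # only /d1 is fully synced
threads[1].join()
with cond:
    after_d1 = list(snapshots_taken)

for s in (syncms[0], syncms[2]):
    data_sync_done(s)
for t in threads:
    t.join()
```

Because each crawler's predicate reads the flag of its own syncm, the wakeup triggered for /d1 does not let the /d0 and /d2 crawlers proceed. With one flag shared by all crawlers, the same notification would have released all three.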
Kotresh HR [Wed, 14 Jan 2026 12:05:33 +0000 (17:35 +0530)]
tools/cephfs_mirror: Fix data sync threads completion logic
We need to know exactly when all data sync threads complete
the processing of a syncm. If a few threads finish the
job, they all need to wait for the threads still processing
that syncm to complete. Otherwise the finished threads
would busy-loop until the in-progress threads finish.
Only after all threads finish processing can the crawler
thread be notified to take the snapshot.
Kotresh HR [Tue, 9 Dec 2025 10:05:08 +0000 (15:35 +0530)]
tools/cephfs_mirror: Mark crawl finished
After entry operations are synced and the stack is empty,
mark the crawl as finished so the data sync threads'
wait logic works correctly and doesn't wait indefinitely.
Kotresh HR [Wed, 14 Jan 2026 08:47:07 +0000 (14:17 +0530)]
tools/cephfs_mirror: Add SyncMechanism Queue
Add a queue of shared_ptr of type SyncMechanism.
Since it's a shared_ptr, the queue can hold
shared_ptrs to both RemoteSync and SnapDiffSync objects.
Each SyncMechanism holds the queue for the SyncEntry
items to be synced using the data sync threads.
The SyncMechanism queue needs to hold shared_ptrs because
all the data sync threads need to access the SyncMechanism
object to process the SyncEntry queue.
This patch sets up the building blocks for the same.
Kotresh HR [Wed, 14 Jan 2026 08:27:34 +0000 (13:57 +0530)]
tools/cephfs_mirror: Use the existing m_lock and m_cond
The entire snapshot is synced outside the lock.
The m_lock and m_cond pair is used for data sync
threads along with crawler threads to work well
with all terminal conditions like shutdown and
existing data structures.
Ville Ojamo [Tue, 3 Feb 2026 06:28:12 +0000 (13:28 +0700)]
doc: unpin pip in admin/doc-read-the-docs.txt
7dd00ca introduced a proper fix for pip 25.3/PEP517 compatibility by
adding pyproject.toml files and the workaround in a65c46c is no longer
necessary. RTD builds with pip 25.3 and later work with the proper fix.
Remove the pinned pip in admin/doc-read-the-docs.txt and let RTD use the
default pip version.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
- updates tearsheet component CSS to match the Carbon component
- adds loading state to submit button
- adds support for step validation when Angular components are used for steps rather than plain HTML templates
- adds step one of nvmeof
Ilya Dryomov [Fri, 30 Jan 2026 15:32:35 +0000 (16:32 +0100)]
qa/tasks/rbd_mirror_thrash: don't use random.randrange() on floats
This stopped working in Python 3.12:
Changed in version 3.12: Automatic conversion of non-integer types
is no longer supported. Calls such as randrange(10.0) and
randrange(Fraction(10, 1)) now raise a TypeError.
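As an illustration of code that avoids the problem, here is a minimal Python sketch with two possible replacements. The helper names are hypothetical, and the sketch does not claim to show what the actual fix in rbd_mirror_thrash does.

```python
import random


def pick_delay(max_delay: float) -> float:
    """Return a random float in [0, max_delay] without calling randrange()."""
    return random.uniform(0.0, max_delay)


def pick_whole_seconds(max_delay: float) -> int:
    """Return a random integer in [0, int(max_delay)) via explicit conversion."""
    return random.randrange(int(max_delay))


delay = pick_delay(2.5)
seconds = pick_whole_seconds(10.0)
```

`random.uniform()` fits when a fractional value is wanted, while the explicit `int()` conversion keeps `randrange()` usable when a whole number is enough.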
Ilya Dryomov [Tue, 11 Nov 2025 15:33:16 +0000 (16:33 +0100)]
qa/tasks/qemu: install genisoimage package
genisoimage is expected to be included in our base images but currently
isn't on Rocky 10. Since it's quite a niche thing, let's install the
package explicitly.
Ilya Dryomov [Thu, 29 Jan 2026 20:41:03 +0000 (21:41 +0100)]
qa/workunits/rbd: reduce randomized sleeps in live import tests
These tests were tuned for slower hardware than what we have now.
Currently "rbd migration execute" always finishes (successfully) before
the NBD server is killed.
Ilya Dryomov [Tue, 11 Nov 2025 20:39:58 +0000 (21:39 +0100)]
qa/valgrind.supp: make gcm_cipher_internal suppression more resilient
gcm_cipher_internal() and ossl_gcm_stream_final() make it to the stack
trace only on CentOS Stream 9. On Ubuntu 22.04 and Rocky 10, it looks
as follows:
Thread 4 msgr-worker-1:
Conditional jump or move depends on uninitialised value(s)
at 0x70A36D4: ??? (in /usr/lib64/libcrypto.so.3.2.2)
by 0x70A39A1: ??? (in /usr/lib64/libcrypto.so.3.2.2)
by 0x6F8A09C: EVP_DecryptFinal_ex (in /usr/lib64/libcrypto.so.3.2.2)
by 0xB498C1F: ceph::crypto::onwire::AES128GCM_OnWireRxHandler::authenticated_decrypt_update_final(ceph::buffer::v15_2_0::list&) (crypto_onwire.cc:271)
by 0xB4992D7: ceph::msgr::v2::FrameAssembler::disassemble_preamble(ceph::buffer::v15_2_0::list&) (frames_v2.cc:281)
by 0xB482D98: ProtocolV2::handle_read_frame_preamble_main(std::unique_ptr<ceph::buffer::v15_2_0::ptr_node, ceph::buffer::v15_2_0::ptr_node::disposer>&&, int) (ProtocolV2.cc:1149)
by 0xB475318: ProtocolV2::run_continuation(Ct<ProtocolV2>&) (ProtocolV2.cc:54)
by 0xB457012: AsyncConnection::process() (AsyncConnection.cc:495)
by 0xB49E61A: EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) (Event.cc:492)
by 0xB49EA9D: UnknownInlinedFun (Stack.cc:50)
by 0xB49EA9D: UnknownInlinedFun (invoke.h:61)
by 0xB49EA9D: UnknownInlinedFun (invoke.h:111)
by 0xB49EA9D: std::_Function_handler<void (), NetworkStack::add_thread(Worker*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (std_function.h:290)
by 0xBB11063: ??? (in /usr/lib64/libstdc++.so.6.0.33)
by 0x4F17119: start_thread (in /usr/lib64/libc.so.6)
The proposal to amend the existing suppression so that it's tied to the
specific callsite rather than libcrypto internals [1] received a thumbs
up from Radoslaw.
Roland Sommer [Fri, 30 Jan 2026 07:54:49 +0000 (08:54 +0100)]
debian: package mgr/smb in ceph-mgr-modules-core
The `BaseController` auto-imports the packaged `mgr/dashboard/controllers/smb.py`
file, which in turn wants to import `smb.enums` etc. which is part of the `smb`
package which is missing from `debian/ceph-mgr-modules-core.install`, thus
missing in the package. The missing module causes an exception
`ModuleNotFoundError: No module named 'smb'` on mgr instances when running a
ceph tentacle cluster installed from debian packages.
See: https://tracker.ceph.com/issues/74268
Signed-off-by: Roland Sommer <rol@ndsommer.de>
Afreen Misbah [Wed, 28 Jan 2026 09:59:08 +0000 (15:29 +0530)]
mgr/dashboard: fetch all namespaces in a gateway group
- adds a new API /api/gateway_group/{group}/namespace
- updates tests
- needed for UI flows and in general to fetch all namespaces; the existing API could not be changed because backward compatibility must be maintained
- server-side pagination will be added in a follow-up PR
Ville Ojamo [Fri, 30 Jan 2026 04:47:40 +0000 (11:47 +0700)]
doc/dev: add sequence diagrams back to health-reports.rst
The sequence diagrams were removed in ce96ddd because they were causing
issues. Add them back as SVG images. Include as comments the source code
used to generate the diagrams.
Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
John Mulligan [Thu, 29 Jan 2026 23:28:44 +0000 (18:28 -0500)]
Merge pull request #65632 from phlogistonjohn/jjm-smb-hosts-allow
smb: support shares equivalent for hosts allow
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Anoop C S <anoopcs@cryptolab.net>
Reviewed-by: Shwetha Acharya <sacharya@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>
John Mulligan [Fri, 9 Jan 2026 16:25:43 +0000 (11:25 -0500)]
qa/workunits/smb: make the runner script easier to use manually
When testing the tests, it can help speed things up to avoid
recreating the virtualenv. Allow an env var SMB_REUSE_VENV=<path>
to supply a specific virtual env dir to (re)use.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 8 Jan 2026 18:42:14 +0000 (13:42 -0500)]
qa/suites/orch/cephadm: enable hosts_access tests
Enable the hosts_access tests when running deploy_smb_mgr_basic.yaml,
deploy_smb_mgr_domain.yaml, deploy_smb_mgr_res_basic.yaml, or
deploy_smb_mgr_res_dom.yaml.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Thu, 8 Jan 2026 18:45:43 +0000 (13:45 -0500)]
qa/workunits/smb: add tests for hosts_access field
The recently added hosts_access field allows a share to be configured
to allow or deny hosts by IP or network. The new module reconfigures
a share to attempt a small set of access scenarios with the hosts_access
field.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Wed, 19 Nov 2025 22:26:27 +0000 (17:26 -0500)]
qa/workunits/smb: add utility module for cephadm shell commands
Add a helper module that makes it a bit cleaner and easier to
find and interact with the cluster's 'admin node', the node where
we can run `cephadm shell` and commands within that shell.
This will allow us to make modifications to smb resources via
the ceph command and JSON in order to test various features.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Fri, 9 Jan 2026 14:32:56 +0000 (09:32 -0500)]
qa/workunits/smb: make the smb_cfg fixture module scoped
This means the file will only be read when pytest changes modules.
This also allows this fixture to be used with other fixtures at the
module scope or at a scope "higher" than the function scope.
John Mulligan [Fri, 9 Jan 2026 16:12:46 +0000 (11:12 -0500)]
qa/tasks: add client node info to smb workunit config dump
When generating the big ball of config JSON that helps define
parameters for the smb tests in the workunit, add client "node"
info as well.
Add a function to avoid repeating the logic of getting node
info from the teuthology remote object.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Wed, 7 Jan 2026 23:02:21 +0000 (18:02 -0500)]
qa/tasks: embed use of ssh_keys task in smb workunit
Automatically use the ssh_keys task in the smb workunit task.
It can be disabled by passing false to the `ssh_keys:` config key.
This allows the node running the tests to ssh into the node where
cephadm is installed in order to execute commands within
the cephadm shell.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Using the Share resource hosts_access parameter, generate
smb.conf-equivalent configuration for the 'hosts allow' and 'hosts deny'
configuration parameters. Note that currently we automatically set hosts deny
to all if *any* hosts allow is set to avoid the possibly surprising
result of explicitly setting hosts to allow and then having the share
continue to allow hosts not explicitly listed.
If needed, in the future we could allow the user to override the
default deny - but I'm trying to keep it real simple for now.
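The default-deny rule described above can be sketched in Python as follows. The function name, the dictionary-based output, and the literal `ALL` value are assumptions made for illustration; the sketch also assumes that explicit deny entries are subsumed by the blanket deny whenever an allow entry exists.

```python
def hosts_access_to_smb_options(hosts_access):
    """Map hosts_access entries to 'hosts allow' / 'hosts deny' values."""
    allow, deny = [], []
    for entry in hosts_access:
        # Each entry carries either an 'address' or a 'network' key.
        target = entry.get("address") or entry.get("network")
        if entry["access"] == "allow":
            allow.append(target)
        else:
            deny.append(target)

    options = {}
    if allow:
        options["hosts allow"] = " ".join(allow)
        # Any allow entry implies denying everything not explicitly allowed.
        options["hosts deny"] = "ALL"
    elif deny:
        options["hosts deny"] = " ".join(deny)
    return options


allow_opts = hosts_access_to_smb_options([
    {"address": "192.168.7.200", "access": "allow"},
    {"network": "10.10.220.0/24", "access": "allow"},
])
deny_opts = hosts_access_to_smb_options([
    {"access": "deny", "network": "10.10.220.0/24"},
])
```

With only deny entries, hosts outside the listed ranges stay allowed; with any allow entry, everything else is denied.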
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Mon, 22 Sep 2025 18:44:30 +0000 (14:44 -0400)]
mgr/smb: add a new hosts_access field to the Share resource
This access list can be used to allow or deny access to hosts by
IP address or network (IP/prefixlen-style). It partially borrows
from the previous work to do IP address binds.
The structure would look something like the following:
```
hosts_access:
- address: 192.168.7.200
access: allow
- address: 192.168.7.202
access: allow
- network: 10.10.220.0/24
access: allow
```
or
```
hosts_access:
- access: deny
network: 10.10.220.0/24
```
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Fri, 26 Sep 2025 18:22:12 +0000 (14:22 -0400)]
python-common/smb: move network conversion validation func to common
Extract the code in service_spec.py that parses, validates, and
converts network or IP address strings into network objects, and move
it into a new file so that it can be reused more widely later.
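A conversion helper of this kind might look like the following sketch, built on Python's standard `ipaddress` module. The function name `to_network` and its error message are hypothetical and are not taken from the python-common code.

```python
import ipaddress


def to_network(value: str):
    """Parse an IP address or IP/prefixlen string into a network object."""
    try:
        # A bare address becomes a single-host network (/32 or /128).
        # strict=True rejects networks that have host bits set.
        return ipaddress.ip_network(value, strict=True)
    except ValueError as err:
        raise ValueError(f"invalid address or network: {value!r}") from err


single_host = to_network("192.168.7.200")
subnet = to_network("10.10.220.0/24")
```

Returning one network type for both inputs lets callers check membership uniformly, for example `ipaddress.ip_address("10.10.220.7") in subnet`.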
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Kefu Chai [Wed, 28 Jan 2026 02:58:31 +0000 (10:58 +0800)]
pybind/rbd: move legacy_implicit_noexcept to rbd.pyx
Move the legacy_implicit_noexcept compiler directive from setup.py to
the top of rbd.pyx, making it consistent with how CephFS handles this
directive. This simplifies the build setup by:
- Removing conditional logic based on Cython version in setup.py
- Eliminating the need for compiler_directives dict and packaging import
- Making RBD's directive handling consistent with other bindings
The directive is needed for building with both Cython 0.x and Cython 3
from the same file while preserving the same behavior. Cython safely
ignores unknown compiler directives when specified at the top of .pyx
files, so this works across all supported Cython versions.
When Cython 0.x support is eventually dropped, this directive can be
replaced with explicit noexcept annotations on rbd_callback_t and
librbd_progress_fn_t type definitions.
Kefu Chai [Fri, 23 Jan 2026 01:36:22 +0000 (09:36 +0800)]
pybind: hardwire language_level to 3
Previously, to maintain backward compatibility with Python 2, we set
'language_level' to sys.version_info.major, so the value would be 2
when building with Python 2, and 3 with Python 3. Now that Python 2
support has been dropped, we can hardwire it to "3".
This change also removes the comment about switching to
`language_level=3str` in the future. According to the Cython 3.1+
documentation,
> language_level=3 is now the default. language_level=3str has become a
> legacy alias.
see https://cython.readthedocs.io/en/3.1.x/src/changes.html.
For context, in Cython < 3.1, language_level=3 and language_level=3str
had different meanings:
- 3 = unprefixed strings are unicode
- 3str = unprefixed strings follow Python version (bytes in Py2, unicode
in Py3)
Since we no longer support Python 2, this distinction is irrelevant and
the comment can be safely removed.
Kefu Chai [Tue, 27 Jan 2026 07:08:28 +0000 (15:08 +0800)]
cmake: migrate Python module installation from setup.py to pip
Replace 'setup.py install' with 'pip install --use-pep517' to fix
Cython compilation failures and eliminate deprecation warnings.
Problem Statement:
The build process for Cython modules involves preprocessing .pyx files
(e.g., generating rbd_processed.pyx from rbd.pyx) and then cythonizing
with specific compiler_directives. The previous approach using separate
'setup.py build' and 'setup.py install' commands caused this failure:
```
rbd_processed.pyx:781:44: Cannot assign type 'int (*)(uint64_t, uint64_t, void *) except? -1' to 'librbd_progress_fn_t'. Exception values are incompatible. Suggest adding 'noexcept' to type 'int (uint64_t, uint64_t, void *) except? -1'.
```
This occurs because:
1. 'setup.py build build_ext' successfully preprocesses and cythonizes
with compiler_directives from setup.py's cythonize() call
2. 'setup.py install' internally triggers a rebuild that:
- Regenerates the preprocessed .pyx files
- Re-runs cythonize() through Cython.Distutils.build_ext
- Does NOT apply the compiler_directives from setup.py
- Fails on the regenerated files missing required directives
New Options Explained:
`--use-pep517`:
Addresses deprecation warning:
```
DEPRECATION: Building 'rados' using the legacy setup.py bdist_wheel
mechanism, which will be removed in a future version. pip 25.3 will
enforce this behaviour change.
```
Uses the modern PEP 517 build backend which:
- Performs a single build pass with all compiler_directives applied
- Prevents the implicit rebuild that caused CompileError
- Future-proofs against pip 25.3+ which will require this
`--no-build-isolation`:
Ensures that environment variables set by CMake are respected:
- CC, LDSHARED (compiler toolchain)
- CPPFLAGS, LDFLAGS (compilation flags)
- CYTHON_BUILD_DIR, CEPH_LIBDIR (build paths)
Without this flag, pip would create an isolated build environment
that ignores these critical build settings.
`--no-deps`:
Prevents pip from attempting to install Python dependencies listed
in setup.py's install_requires. All dependencies are managed by
CMake and the distribution's package manager, not pip.
`--ignore-installed`:
Addresses installation error when DESTDIR is set:
```
ERROR: Could not install packages due to an OSError: [Errno 13]
Permission denied: '/usr/lib/python3/dist-packages/rados-2.0.0.egg-info'
OSError: [Errno 18] Invalid cross-device link:
'/usr/lib/python3/dist-packages/rados-2.0.0.egg-info' -> '/tmp/pip-uninstall-...'
```
This error occurs because pip detects an existing system installation
and tries to uninstall it before installing to DESTDIR. With
--ignore-installed, pip skips the uninstall step and directly installs
to the DESTDIR staging directory, which is the correct behavior for
packaging.
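Put together, the options above produce a pip invocation along the lines of this Python sketch, which only assembles the argument list. The function name and the example paths are hypothetical, and passing the staging directory through pip's `--root` option is an assumption, since how CMake forwards DESTDIR is not shown here.

```python
import sys


def pip_install_command(source_dir, destdir=None):
    """Build the argv for installing one Python binding with pip."""
    cmd = [
        sys.executable, "-m", "pip", "install",
        "--use-pep517",          # single build pass via the PEP 517 backend
        "--no-build-isolation",  # keep CC, LDFLAGS, etc. set by CMake
        "--no-deps",             # dependencies come from the distro, not pip
        "--ignore-installed",    # do not try to uninstall a system copy
    ]
    if destdir:
        # Assumption: the staging directory is handed to pip via --root.
        cmd += ["--root", destdir]
    cmd.append(source_dir)
    return cmd


cmd = pip_install_command("src/pybind/rados", destdir="/tmp/stage")
```

Such a list could be handed to `subprocess.run(cmd, check=True)` from a build script; in the CMake build the equivalent command line is generated by CMake itself.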
Removed Options:
`--install-layout=deb`:
This Debian-specific patch to 'setup.py install' is no longer needed.
Modern pip automatically detects the distribution and uses the correct
layout (dist-packages on Debian, site-packages on RPM distros).
`--single-version-externally-managed`:
This option was specific to 'setup.py install' to prevent egg
installation. With pip, this is handled automatically.
`--record /dev/null`:
No longer needed as pip manages installation records internally.
`egg_info --egg-base`:
Not needed with pip as metadata is generated automatically during
the build process.
Option Not Added:
`--root-user-action=ignore`:
In this change, we install a Python module using pip under
`fakeroot` before packaging it, and pip warns:
```
Error: WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behavior with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
```
We use fakeroot on purpose, so this option could have been added to
silence the warning. However, it is not available in all supported pip
versions. See
https://github.com/pypa/pip/commit/2e1112a8141dbdf767505fded918706e9ad61031
New Environment Variable:
`DEB_PYTHON_INSTALL_LAYOUT=deb` is conditionally applied when packaging
for Debian-derivative distributions, because pip does not support the
`--install-layout` option. Debian patches pip so that it installs Python
modules into /usr/local/lib instead of /usr/lib, where the Debian dh_install
helper looks for the content to be packaged, so we have to enforce the
Debian layout using the environment variable.
Working Directory Change:
Changed from `CMAKE_CURRENT_SOURCE_DIR` to `CMAKE_CURRENT_BINARY_DIR` to
keep pip's temporary files and logs in the build directory rather than
polluting the source tree.
Additional Dependencies:
Since the build process uses pip and creates a wheel distribution,
we need to add `pip` and `wheel` Python modules as build dependencies.
Python Modules Packaging:
- with `--use-pep517`, pip creates .dist-info directories as per PEP-517
instead of .egg-info, so we need to package the new metadata directory.
Future Improvements:
We considered implementing a custom `build_templates` command or using
setuptools' `sub_commands` mechanism to avoid regenerating `*_processed.pyx`
files on every build (tracking dependencies via file modification times or
hash-based checks). However, to keep `setup.py` simple and maintainable,
we've deferred this optimization for future work. The current solution
using `pip install --use-pep517` ensures correct builds without additional
complexity.
This solution works correctly for both Debian and RPM packaging workflows,
both of which use DESTDIR-based staged installations.