In SingletonClient::init(), objecter->start() is called before
monc->authenticate(). If the monitors reply quickly enough, the monc
connections become authenticated before monc->authenticate() is called,
and in that case monc never subscribes to monmap/config.
Signed-off-by: Shaohui Wang <wangshaohui.0512@bytedance.com>
Afreen Misbah [Mon, 17 Nov 2025 05:01:45 +0000 (10:31 +0530)]
mgr/dashboard: Set max subsystem count to 512 rather than 4096
Fixes https://tracker.ceph.com/issues/73867
- regression from https://github.com/ceph/ceph/pull/64477/files
- removing frontend validations, as these values are volatile and require changes every release. nvmeof is setting and validating these as well.
Matan Breizman [Sun, 16 Nov 2025 10:02:23 +0000 (10:02 +0000)]
qa/suites/crimson-rados/thrash: enable seastore
https://github.com/ceph/ceph/pull/64715 added seastore recovery tests
to crimson-rados-experimental. Based on:
https://pulpito.ceph.com/matan-2025-11-11_09:38:51-crimson-rados-experimental-wip-anrao2-testing-2025-11-11-1111-distro-crimson-debug-smithi/
the remaining failures also occur in main (without recovery).
Move the experimental tests to the stable ones as they
do not introduce regressions.
```
=== INSTALLING ===
No match for argument: ceph-crimson-osd
Error: Unable to find a match: ceph-crimson-osd
```
ceph-dev-pipeline builds are not failing with the above error; however,
ceph-dev-builds (used for the main nightly) are failing with it:
https://shaman.ceph.com/builds/ceph/main/8a27bf16140173253ab8f28112bf5deee99cca02/
Anoop C S [Tue, 21 Oct 2025 08:53:50 +0000 (14:23 +0530)]
mgr/smb: Disable posix locking in share definition
The prerequisites for supporting durable handles[1] in Samba include
disabling the mapping of POSIX locks, as well as setting the `kernel
oplocks` and `kernel sharemodes` parameters to disabled. Currently
this configuration is hard‑coded, but in the future it could be made
conditional and combined with other settings to enable persistent
handles on continuously available shares.
Kefu Chai [Sat, 18 Oct 2025 14:23:56 +0000 (22:23 +0800)]
debian/rules: enable WITH_CRIMSON when pkg.ceph.crimson profile is set
Since commit 9b1d524839 ("debian: mark "crimson" specific deps with
"pkg.ceph.crimson""), crimson-specific build dependencies have been
gated by the Build-Profiles: <pkg.ceph.crimson> tag. However,
debian/rules was never updated to pass -DWITH_CRIMSON=ON when this
build profile is active.
This causes builds with the crimson profile enabled to fail during
dh_install, as the crimson-osd binary is never built but the install
file tries to package it:
```
Failed to copy 'usr/bin/crimson-osd': No such file or directory
dh_install: error: debian/ceph-crimson-osd.install returned exit code 127
```
Fix this by checking for pkg.ceph.crimson in DEB_BUILD_PROFILES and
enabling the CMake option accordingly, following the same pattern used
for pkg.ceph.arrow.
Kefu Chai [Fri, 17 Oct 2025 14:09:26 +0000 (22:09 +0800)]
qa: install ceph-osd-classic and ceph-osd-crimson
- qa/packages/packages.yaml: add ceph-osd and ceph-osd-classic to
  packages/packages.yaml, so that the "install" task installs
  ceph-osd-classic by default; this preserves the existing behavior.
- qa/suites/crimson-rados: install ceph-osd-crimson instead of
  ceph-osd-classic. Adding the packages to exclude_packages and
  extra_packages in task.install allows us to customize which packages
  are installed when performing the "install" task.
- qa/suites/crimson-rados-experimental: likewise.
debian,ceph.spec: split ceph-osd into shared base and implementation packages
Previously, ceph-osd packaging had two mutually exclusive flavors that
could only be built one at a time: one with classic OSD and another
with crimson OSD. Both provided /usr/bin/ceph-osd, making them
impossible to coexist and confusing from a user perspective.
This commit restructures the packaging to enable both implementations
to coexist on the same system:
- ceph-osd: Contains shared components (systemd units, sysctl configs,
common executables like ceph-erasure-code-tool) and depends on exactly
one OSD implementation
- ceph-osd-classic: Contains the classic OSD implementation binary and
classic-specific tools
- ceph-osd-crimson: Contains the crimson OSD implementation binary and
crimson-specific tools
The two implementation packages install different sets of files, so they
no longer conflict with each other, and both depend on ceph-osd for
shared resources.
Changes:
Debian packaging:
- Revert e5f00d2f
- Add ceph-osd-crimson package
- Add Recommends: ceph-osd-classic to prefer classic on upgrades
- Add Replaces/Breaks for smooth upgrades from old monolithic package
- Create separate .install files for crimson and classic osd packages
- Enforce exact version matching using ${binary:Version}
RPM packaging:
- Use rich dependencies for OR requirement (classic or crimson)
- Add Recommends: ceph-osd-classic for upgrade preference
Upgrade behavior:
Users upgrading from older versions will automatically get
ceph-osd-classic due to the Recommends directive, maintaining
backward compatibility. Users can explicitly choose crimson by
installing ceph-osd-crimson, which will coexist with classic.
Switching between implementations is supported via standard package
operations, with the alternatives system ensuring /usr/bin/ceph-osd
always points to the active implementation.
ceph-volume: remove exc_info from ceph.conf load warning
This commit removes exc_info=1 from the logger.warning call when failing to
load ceph.conf. According to the preceding comment, this scenario can happen
legitimately, so it is not an unexpected error, and there is no need to
clutter the logs with a full Python traceback in this case.
ceph-volume: migrate namedtuple based config and sysInfo to dataclasses
This commit replaces the previous namedtuple definitions for config and
sys_info with dataclasses.
Namedtuples are meant to be immutable, but the code modifies their
attributes, so using dataclasses fixes this misuse.
UnloadedConfig is preserved as the default for ceph in Config
to maintain the runtime error behavior when accessing ceph configuration before
it is loaded.
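A minimal sketch of the motivation, using hypothetical field names rather than the real ceph-volume definitions:

```python
from collections import namedtuple
from dataclasses import dataclass

# The old shape: a namedtuple, which rejects attribute assignment.
ConfigTuple = namedtuple("ConfigTuple", ["ceph", "loaded"])
cfg_old = ConfigTuple(ceph=None, loaded=False)
try:
    cfg_old.loaded = True      # namedtuples are immutable
    mutated = True
except AttributeError:
    mutated = False

# The new shape: a mutable dataclass with the same fields.
@dataclass
class Config:
    ceph: object = None
    loaded: bool = False

cfg_new = Config()
cfg_new.loaded = True          # plain attribute assignment now works
```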
Imran Imtiaz [Wed, 12 Nov 2025 14:04:44 +0000 (14:04 +0000)]
mgr/dashboard: add API endpoint to create consistency groups
Signed-off-by: Imran Imtiaz <imran.imtiaz@uk.ibm.com>
Fixes: https://tracker.ceph.com/issues/73821
Add the ability to create a consistency group via the Dashboard API.
Nizamudeen A [Thu, 6 Nov 2025 04:53:47 +0000 (10:23 +0530)]
mgr/dashboard: start node virtual-env after starting ceph cluster
In the frontend e2e.sh file, we don't need to start the node venv early
on, before the ceph cluster is started; we only need it for the `npm` and
`npx` commands. Starting the node virtual env and then starting ceph
causes the ceph cluster to assume the node-env python is the Python
environment, which breaks the cryptotools call.
So move the node-env venv start to after the ceph cluster is created.
Fixes: https://tracker.ceph.com/issues/73804
Signed-off-by: Nizamudeen A <nia@redhat.com>
Kefu Chai [Tue, 21 Oct 2025 03:25:00 +0000 (11:25 +0800)]
debian: Use system packages for cephadm bundled dependencies
Configure the Debian build to use CEPHADM_BUNDLED_DEPENDENCIES=deb,
which instructs the cephadm build script to bundle dependencies from
system-installed Debian packages instead of downloading from PyPI.
This change addresses build failures in restricted network environments
where Debian build tools do not permit internet access. By leveraging
the Debian package support added in commit 9378a2988e1, the build now
uses python3-markupsafe, python3-jinja2, and python3-yaml packages
that are already installed as build dependencies.
This approach mirrors the existing RPM packaging workflow, ensuring
consistent behavior across different distribution package formats.
Kefu Chai [Tue, 21 Oct 2025 03:26:25 +0000 (11:26 +0800)]
cephadm/tests: Add tests for deb bundled dependencies
Add container definitions and test cases for building cephadm with
Debian package dependencies. The new test_cephadm_build_from_debs
function mirrors the existing RPM test structure, verifying that:
- Build succeeds when required Debian packages are installed
- Build fails when packages are missing
- Bundled packages are correctly identified as sourced from 'deb'
- All expected packages (Jinja2, MarkupSafe, PyYAML) are included
- The zipapp contains expected package directories
Test environments include Ubuntu 22.04 and 24.04 with and without
the required python3-jinja2, python3-yaml, and python3-markupsafe
packages.
Kefu Chai [Tue, 14 Oct 2025 13:04:42 +0000 (21:04 +0800)]
cephadm/build: Add Debian package support for bundled dependencies
Extends the cephadm build script to support bundling dependencies from
Debian packages in addition to pip and RPM packages. This allows building
cephadm on Debian-based distributions using system packages.
Key changes:
- Add 'deb' to DependencyMode enum to enable Debian package mode
- Implement _setup_deb() to configure Debian dependency requirements
- Add _install_deb_deps() to orchestrate Debian package installation
- Add _gather_deb_package_dirs() to parse Debian package file listings
and locate Python package directories (handles both site-packages and
dist-packages directories used by Debian)
- Add _deps_from_deb() to extract Python dependencies from installed
Debian packages using dpkg/apt-cache tools
- Fix variable reference bug in _install_deps() (deps.mode -> config.deps_mode)
The Debian implementation follows a similar pattern to the existing RPM
support, using dpkg-query and dpkg -L to locate installed packages and
their files, with special handling for Debian naming conventions
(e.g., PyYAML -> python3-yaml).
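A rough Python sketch of the directory-gathering idea, assuming a `dpkg -L`-style file listing as input; the function name and sample paths are invented for illustration and are not the actual cephadm helpers:

```python
# Sample output in the style of `dpkg -L python3-yaml` (abbreviated,
# invented paths for illustration).
SAMPLE_LISTING = """\
/usr/lib/python3
/usr/lib/python3/dist-packages
/usr/lib/python3/dist-packages/yaml
/usr/lib/python3/dist-packages/yaml/__init__.py
/usr/share/doc/python3-yaml
"""

def package_dirs(listing: str) -> list:
    """Return top-level Python package directories found in a file listing."""
    found = []
    for path in listing.splitlines():
        parts = path.strip("/").split("/")
        # Debian installs modules under dist-packages; site-packages is also
        # accepted since virtualenvs and RPM-based distros use it.
        for marker in ("dist-packages", "site-packages"):
            if marker in parts:
                idx = parts.index(marker)
                if len(parts) == idx + 2:  # direct child of the marker dir
                    found.append("/" + "/".join(parts))
    return found
```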
Kefu Chai [Mon, 10 Nov 2025 04:11:08 +0000 (12:11 +0800)]
cephadm: fix zip_root_entries population in version command
The 'cephadm version --verbose' command was returning an empty
zip_root_entries list because it relied on the private '_files'
attribute of zipimport.zipimporter, which is not reliably populated
across Python versions.
This commit fixes the issue by using the zipfile module to properly
read the archive contents via the loader.archive path. This ensures
that zip_root_entries is correctly populated with the root-level
directories in the zipapp.
This fix is necessary for the cephadm build tests to properly validate
that all expected packages and modules are included in the built zipapp.
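The approach can be sketched as follows (a simplified stand-in, not the actual cephadm code; the real command would open the file at `loader.archive` rather than the in-memory buffer used here):

```python
import io
import zipfile

def zip_root_entries(archive) -> list:
    """List root-level entries of a zip archive (path or file-like object)."""
    with zipfile.ZipFile(archive) as zf:
        roots = {name.split("/", 1)[0] for name in zf.namelist()}
    return sorted(roots)

# Build a tiny in-memory stand-in for a cephadm zipapp.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("__main__.py", "")
    zf.writestr("Jinja2/__init__.py", "")
    zf.writestr("yaml/loader.py", "")
buf.seek(0)
```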
Kefu Chai [Mon, 10 Nov 2025 04:10:46 +0000 (12:10 +0800)]
cephadm/tests: fix _dist_info function logic error
The _dist_info helper function had a logic error where it was checking
if 'entry.startswith(entry)' instead of 'entry.startswith(name)'. This
caused the function to always evaluate incorrectly when checking for
.dist-info or .egg-info entries in the zipapp.
This bug was preventing the test assertions from properly validating
that package metadata directories are included in the built cephadm
zipapp.
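An illustrative reconstruction of the bug and the fix (the entry names and helper signatures are assumptions, not the actual test code):

```python
# Entries as they might appear at the root of the built zipapp (invented).
ENTRIES = ["jinja2-3.1.2.dist-info", "yaml", "markupsafe-2.1.egg-info"]

def dist_info_buggy(name, entries):
    # Bug: each entry is compared against itself, which is always True,
    # so the package name is effectively ignored.
    return [e for e in entries
            if e.startswith(e) and (".dist-info" in e or ".egg-info" in e)]

def dist_info_fixed(name, entries):
    # Fix: match metadata directories belonging to the named package only.
    return [e for e in entries
            if e.startswith(name) and (".dist-info" in e or ".egg-info" in e)]
```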
crimson/seastore: adapt _mkfs() to new coroutine::experimental::generator
Update use of experimental_list_directory() to match Seastar’s new
generator.
For more details see: https://github.com/scylladb/seastar/commit/81f2dc9dd976b0019ff84274b8b7fb7507c3e4e7
src/msg/Policy: explicitly include <map> for std::map usage
The Seastar update brought in a newer toolchain, which stopped pulling in <map> indirectly through other headers.
That exposed missing includes in Policy.h, where std::map was used but <map> wasn’t explicitly included.
Samuel Just [Tue, 11 Nov 2025 02:52:22 +0000 (02:52 +0000)]
qa/suites: remove centos restriction from valgrind yaml
http://tracker.ceph.com/issues/20360 and
http://tracker.ceph.com/issues/18126 were quite some time ago. The
restriction is causing trouble now because it only overrides the os_type
bit while leaving os_version alone, causing teuthology to look for
centos 10 (the centos os_type combined with the rocky 10 os_version).
Kefu Chai [Mon, 10 Nov 2025 15:01:32 +0000 (23:01 +0800)]
mgr/dashboard: fix Physical Disks identify test race condition
Fix a regression in the Physical Disks identify device e2e test that
causes intermittent timeouts when attempting to click the "Identify"
button.
Problem:
The test was timing out after 120 seconds while attempting to click
the "Identify" button, which remained in a disabled state. This
manifested as a race condition where the test would try to click the
button before it became enabled.
Root Cause (Regression Analysis):
This regression was introduced in commit 94418d90d2b ("mgr/dashboard:
fix UI modal issues", Sept 9, 2024) which aimed to fix the Physical
Disks Identify modal not opening (tracker.ceph.com/issues/67547).
While that commit successfully:
- Migrated from cd-modal to cds-modal (Carbon Design System)
- Changed button selector to use data-testid="primary-action"
- Added the e2e test to prevent future regressions
It inadvertently introduced a timing issue by not adding proper wait
logic for the button to become enabled. The commit also modified the
table-actions component to conditionally render the primary action
button based on tableActions.length > 0, which can cause the button
to be disabled while table actions are still loading.
Solution:
Add .should('not.be.disabled') before .click() to ensure Cypress waits
for the button to become enabled before attempting to interact with it.
This follows the established pattern used elsewhere in the codebase
(see page-helper.po.ts:319).
Impact:
- Fixes Jenkins build failures in ceph-dashboard-cephadm-e2e job
- Observed in build #18956 as "Regression - Failing for 1 build"
- Jenkins metrics show MTTF of ~2 hours, indicating this race
condition occurs frequently enough to cause CI instability
Ilya Dryomov [Mon, 10 Nov 2025 19:43:59 +0000 (20:43 +0100)]
qa/suites/rbd/valgrind: don't hardcode os_type in memcheck.yaml
The entire subsuite is pinned by the centos_latest.yaml symlink, so the
stanza in memcheck.yaml is redundant. Removing it allows experimenting
with other distros just by varying the symlink target.
common: ModeCollector: locating the value of the mode
The ModeCollector class is used to collect values
of some type 'key', each associated with some object
identified by an 'ID'. The collector reports the 'mode'
value - the value associated with the largest number
of distinct IDs.
The results structure returned by the collector specifies
one of three possible mode_status_t values:
- no_mode_value - No clear victory for any value
- mode_value - we have a winner, but it has less than half of the
samples
- authorative_value - more than half of the samples are of the same
value
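A Python sketch of the reported semantics (the real collector is C++; the names and the tie-breaking details here are assumptions based only on the description above):

```python
from collections import Counter
from enum import Enum

class ModeStatus(Enum):
    NO_MODE_VALUE = 0        # no clear victory for any value
    MODE_VALUE = 1           # unique winner, but no strict majority
    AUTHORITATIVE_VALUE = 2  # winner holds more than half of the samples

def collect_mode(samples: dict) -> tuple:
    """samples maps an ID to the value it reported; returns (status, value)."""
    counts = Counter(samples.values())
    ranked = counts.most_common(2)
    top_value, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return (ModeStatus.NO_MODE_VALUE, None)   # tied for first place
    if top_count * 2 > len(samples):
        return (ModeStatus.AUTHORITATIVE_VALUE, top_value)
    return (ModeStatus.MODE_VALUE, top_value)
```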
Yuval Lifshitz [Mon, 3 Nov 2025 11:20:07 +0000 (11:20 +0000)]
rgw/logging: delete the object holding the temp object name on cleanup
* in the case of a prefix per source, this prevents leaking the object
* in the case of a shared prefix, it prevents data loss when other source
  buckets try to commit an already committed temporary object
* when updating the "last committed" attribute, the object must exist.
  This is so that a commit without rollover (in the case of cleanup) won't
  recreate the deleted object
* some refactoring of the try-catch code to reduce nesting
Kefu Chai [Mon, 10 Nov 2025 13:44:07 +0000 (21:44 +0800)]
rgw/posix: Fix race condition in Inotify causing segfault
Fixed a race condition in the Inotify class where the ev_loop() thread
and caller threads (add_watch/remove_watch) were accessing the
wd_callback_map and wd_remove_map hash maps without synchronization.
This caused a segfault during hash table operations when one thread
was reading from the map while another was modifying it, leading to
iterator invalidation and memory corruption.
Backtrace from the crash:
```
Frame 5: file::listing::Inotify::ev_loop()+0x190
Frame 4: ankerl::unordered_dense::v3_1_0::detail::table::find()
Crash: Memory access violation during WatchRecord lookup
```
The fix adds:
- A mutex (map_mutex) to protect both hash maps
- Lock guards in add_watch() and remove_watch() during map modifications
- Lock guard in ev_loop() with proper copying of watch record data to
avoid holding the lock during callbacks and prevent use-after-free
See https://jenkins.ceph.com/job/ceph-pull-requests/169774/testReport/junit/projectroot.src.test/rgw/unittest_rgw_posix_driver/
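The same locking pattern, sketched in Python for brevity (class and method names are illustrative; the actual fix is in the C++ Inotify class):

```python
import threading

class WatchTable:
    """Toy model of a watch-descriptor -> callback map shared between an
    event-loop thread and caller threads (add_watch/remove_watch)."""

    def __init__(self):
        self._lock = threading.Lock()  # protects the shared map
        self._callbacks = {}           # wd -> callback

    def add_watch(self, wd, cb):
        with self._lock:               # writers modify under the lock
            self._callbacks[wd] = cb

    def remove_watch(self, wd):
        with self._lock:
            self._callbacks.pop(wd, None)

    def dispatch(self, wd, event):
        with self._lock:               # look up (and copy out) under the lock
            cb = self._callbacks.get(wd)
        if cb is not None:             # invoke outside the lock, so callbacks
            cb(event)                  # cannot deadlock against add/remove

events = []
table = WatchTable()
table.add_watch(1, events.append)
table.dispatch(1, "IN_CREATE")
table.remove_watch(1)
table.dispatch(1, "IN_DELETE")  # watch already removed; nothing recorded
```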
galsalomon66 [Mon, 27 Oct 2025 17:25:58 +0000 (17:25 +0000)]
Initialize enable_progress, length_before_processing and
length_post_processing on construction.
These variables are initialized in the s3select/CSV flow, and no local
valgrind run discovered any issue related to them. However, valgrind
reports produced by teuthology sometimes point to run_s3select_on_csv
containing an UninitCondition warning.