git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log

Redouane Kachach [Wed, 11 Feb 2026 13:36:01 +0000 (14:36 +0100)]

mgr/cephadm: cleanup leftover certs/keys after cert_src changes

This PR improves certificate cleanup when a service switches
certificate sources (cephadm-signed <-> inline/reference). It also adds
best-effort post-remove helpers to purge stale cephadm-managed
cert/key pairs. Inline-stored (non-editable) certs/keys are removed,
while referenced/user-managed (editable) credentials are preserved.

Fixes: https://tracker.ceph.com/issues/75009
Signed-off-by: Redouane Kachach <rkachach@ibm.com>


Redouane Kachach [Wed, 11 Feb 2026 11:17:55 +0000 (12:17 +0100)]

mgr/cephadm: adding tls fields as deps for services with TLS support

This is especially important for inline certificates, so the certmgr
store is updated automatically whenever the user changes the values in
the spec and reapplies it.

Fixes: https://tracker.ceph.com/issues/75009
Signed-off-by: Redouane Kachach <rkachach@ibm.com>


Redouane Kachach [Thu, 14 May 2026 08:58:08 +0000 (10:58 +0200)]

Merge pull request #67087 from ShwetaBhosale1/fix_issue_74479_nfs_active_active_support_allow_colo

mgr/cephadm: Allow colocation of NFS daemon to support active-active mode

Reviewed-by: Adam King <adking@redhat.com>


Sagar Gopale [Wed, 13 May 2026 14:13:56 +0000 (19:43 +0530)]

mgr/dashboard: Carbonize cluster-wide OSD flags modal

Fixes: https://tracker.ceph.com/issues/76580
Signed-off-by: Sagar Gopale <sagar.gopale@ibm.com>


Ronen Friedman [Thu, 14 May 2026 04:52:41 +0000 (07:52 +0300)]

Merge pull request #68725 from ronen-fr/wip-rf-cmem-crimson

crimson/osd,qa: support OSD memory size in the OSD and in QA suites

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
Reviewed-by: Jose J Palacios-Perez <perezjos@uk.ibm.com>


Kefu Chai [Thu, 14 May 2026 01:10:47 +0000 (09:10 +0800)]

Merge pull request #68876 from tchaikov/wip-crimson-co-return

crimson/osd: drop redundant trailing co_return in pg_advance_map

Reviewed-by: Matan Breizman <mbreizma@redhat.com>


Dan Mick [Thu, 14 May 2026 00:24:25 +0000 (17:24 -0700)]

Merge pull request #68602 from phlogistonjohn/jjm-bwc-u26

script/build-with-container: add distro references for ubuntu 26.04


Adam Emerson [Wed, 13 May 2026 21:00:17 +0000 (17:00 -0400)]

Merge pull request #68014 from adamemerson/wip-rgw-no-vla

rgw: VLAs are no longer welcome

Reviewed-by: Jesse F. Williamson <jfw@ibm.com>


Ilya Dryomov [Wed, 13 May 2026 20:48:46 +0000 (22:48 +0200)]

Merge pull request #68761 from MaxKellermann/librbd__missing_includes

librbd: add missing includes

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>


Patrick Donnelly [Wed, 13 May 2026 19:48:41 +0000 (15:48 -0400)]

Merge PR #68781 into main

* refs/pull/68781/head:
doc/governance: remove Sam from CSC

Reviewed-by: Joseph Mundackal <jmundackal@bloomberg.net>
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>


Garry Drankovich [Wed, 13 May 2026 18:09:40 +0000 (21:09 +0300)]

os/bluestore: add effective elastic shared blobs mode into OSD metadata

Signed-off-by: Garry Drankovich <garry.drankovich@clyso.com>


Patrick Donnelly [Wed, 13 May 2026 17:53:17 +0000 (13:53 -0400)]

qa/suites/upgrade: ignore undersized PG during stress splits

Fixes: https://tracker.ceph.com/issues/76585
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>


Patrick Donnelly [Wed, 13 May 2026 17:47:54 +0000 (13:47 -0400)]

qa: ignore cephadm failed daemon warnings during thrashing

Fixes: https://tracker.ceph.com/issues/73079
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>


Krunal Chheda [Wed, 13 May 2026 17:19:48 +0000 (22:49 +0530)]

osd/test: Fix build breakage when WITH_EC_ISA_PLUGIN is OFF

Signed-off-by: Krunal Chheda <116836699+kchheda3@users.noreply.github.com>


Adam King [Tue, 17 Mar 2026 18:30:51 +0000 (14:30 -0400)]

mgr/cephadm: serialize OSD class before returning for OSD rm status

Fixes: https://tracker.ceph.com/issues/74862
Signed-off-by: Adam King <adking@redhat.com>


Patrick Donnelly [Wed, 13 May 2026 15:11:50 +0000 (11:11 -0400)]

Merge PR #68780 into main

* refs/pull/68780/head:
doc/governance: remove Ken and Jeff from CSC

Reviewed-by: Dan van der Ster <dan.vanderster@clyso.com>


Patrick Donnelly [Wed, 13 May 2026 15:11:26 +0000 (11:11 -0400)]

Merge PR #68779 into main

* refs/pull/68779/head:
doc/governance: update Ceph Executive Council List

Reviewed-by: Dan van der Ster <dan.vanderster@clyso.com>


Shweta Bhosale [Thu, 22 Jan 2026 10:09:41 +0000 (15:39 +0530)]

doc: Updated the doc for NFS colocating ports

Fixes: https://tracker.ceph.com/issues/74479
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>


Kefu Chai [Wed, 13 May 2026 11:09:37 +0000 (19:09 +0800)]

crimson/osd: fix crash in committed_osd_maps when an OSD is removed

OSDMap::is_down(osd) is defined as !is_up(osd), and is_up() gates on
exists(osd).  This means is_down() returns true for OSDs that have
been *removed* from the map (EXISTS flag cleared), not just marked
down.

committed_osd_maps() iterates over epochs [first, last], and for each
epoch over all OSDs in old_map, calling get_cluster_addrs() for any
OSD that was up in old_map and is_down() in the current epoch.
get_cluster_addrs() asserts exists(osd), so when that OSD has been
removed the assertion fires.

Reproducer (3 crimson OSDs running: osd.0, osd.1, osd.2):

  # Two rapid OSDMap changes; the monitor batches them into one message.
  ceph osd down 2
  ceph osd purge 2 --yes-i-really-mean-it

  # osd.0 and osd.1 call committed_osd_maps(N, N+1).  Before this fix
  # old_map is set once before the loop and never updated, so in
  # iteration N+1 the comparison is still old_map(N-1) vs osdmap(N+1):
  #
  #   old_map->is_up(2)=true     (osd.2 was up at N-1)
  #   osdmap->exists(2)=false    (purged in N+1)
  #   osdmap->is_down(2)=true    (!is_up, since !exists -> true)
  #   -> get_cluster_addrs(2) asserts -> crash
  #
  #   OSDMap.h: ceph_assert(exists(osd))  [in get_cluster_addrs()]
  #   Signal 6 (SIGABRT)

Note: 'ceph osd destroy' does NOT clear the EXISTS flag; it only sets
CEPH_OSD_DESTROYED.  The EXISTS flag is cleared by 'osd rm', which
'osd purge' calls internally after 'osd destroy'.

Fix: advance old_map at the end of each iteration so the comparison
is pairwise (N-1 vs N, then N vs N+1, ...), matching classic
OSD::advance_map at src/osd/OSD.cc:8615.  In the reproducer,
iteration N marks osd.2 down using osdmap(N) (where osd.2 still
exists), then sets old_map = osdmap(N).  Iteration N+1 starts with
old_map(N)->is_up(2)=false (osd.2 was DOWN in N), so the condition
short-circuits and get_cluster_addrs() is never called on the new
map.

No explicit !exists branch is needed.  The monitor produces a
separate epoch for each of 'osd down' / 'osd destroy' / 'osd rm', so
an OSD can only transition UP -> REMOVED through at least one
intermediate DOWN epoch in any batched MOSDMap, and the pairwise
comparison short-circuits before the assert can fire.
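
The difference between the single old_map and the pairwise comparison
can be sketched with a toy model. This is a minimal Python illustration,
not the crimson code: ToyOSDMap, advance() and the three hand-built
epochs are hypothetical stand-ins that only mimic the exists() /
is_up() / is_down() / get_cluster_addrs() semantics described above.

```python
class ToyOSDMap:
    """Hypothetical stand-in for OSDMap; only mimics the semantics above."""

    def __init__(self, epoch, existing, up):
        self.epoch = epoch
        self.existing = set(existing)
        self.up = set(up)

    def exists(self, osd):
        return osd in self.existing

    def is_up(self, osd):
        # is_up() gates on exists()
        return self.exists(osd) and osd in self.up

    def is_down(self, osd):
        # defined as !is_up(), so it is also true for removed OSDs
        return not self.is_up(osd)

    def get_cluster_addrs(self, osd):
        # models ceph_assert(exists(osd))
        if not self.exists(osd):
            raise AssertionError(
                f"osd.{osd} does not exist in epoch {self.epoch}")
        return f"addrs-of-osd.{osd}@e{self.epoch}"


def advance(old_map, new_maps, pairwise):
    """Collect the address lookups done for OSDs that went up -> down."""
    lookups = []
    for osdmap in new_maps:
        for osd in sorted(old_map.existing):
            if old_map.is_up(osd) and osdmap.is_down(osd):
                lookups.append(osdmap.get_cluster_addrs(osd))
        if pairwise:
            old_map = osdmap  # the fix: compare N-1 vs N, then N vs N+1
    return lookups


# Epoch N-1: all up. Epoch N: osd.2 marked down. Epoch N+1: osd.2 removed.
e_prev = ToyOSDMap(9, existing={0, 1, 2}, up={0, 1, 2})
e_n = ToyOSDMap(10, existing={0, 1, 2}, up={0, 1})
e_n1 = ToyOSDMap(11, existing={0, 1}, up={0, 1})

fixed_lookups = advance(e_prev, [e_n, e_n1], pairwise=True)

try:
    advance(e_prev, [e_n, e_n1], pairwise=False)
    old_behaviour_asserts = False
except AssertionError:
    old_behaviour_asserts = True
```

In this model the pairwise run performs a single lookup against epoch N,
where osd.2 still exists, while the non-pairwise run trips the modelled
assertion in epoch N+1.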

Signed-off-by: Kefu Chai <k.chai@proxmox.com>


Afreen Misbah [Wed, 13 May 2026 10:47:18 +0000 (16:17 +0530)]

Merge pull request #68801 from afreen23/custom-image

mgr/dashboard: Allow quick bootstrap script to use custom images

Reviewed-by: Nizamudeen A <nia@redhat.com>


Guillaume Abrioux [Wed, 13 May 2026 10:04:14 +0000 (12:04 +0200)]

Merge pull request #68769 from guits/fix-76433

ceph-volume: fix argparse dmcrypt opts: use str type


Guillaume Abrioux [Wed, 13 May 2026 10:04:07 +0000 (12:04 +0200)]

Merge pull request #68765 from guits/cv-fix-get-file-contents

ceph-volume: fallback to default for empty get_file_contents values


Jacques Heunis [Wed, 13 May 2026 09:14:05 +0000 (09:14 +0000)]

rgw: Fix ops logs sometimes having several entries per line.

Although not explicitly documented, the RGW ops log is generally
formatted with one entry per line. This makes it work well with log
shipping/ingestion services (many of which default to treating each line
as a separate entry) and in particular works well with log servers that
index based on the available JSON fields.

The current implementation separates the log call from the resulting
disk IO: Logs are written to a buffer and a separate thread flushes them
to disk (possibly in batches). The current code appends a newline only
at the end of the batch being flushed to disk and in many cases when
under load this means that several log entries are concatenated onto a
single line, which complicates attempts to process those logs.

This PR separates the addition of a newline from the flush to disk,
appending a newline after every log entry but still only flushing at the
end of the batch to avoid additional IO overhead.
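
The effect of moving the newline from the batch to the entry can be
sketched as follows. This is a minimal Python illustration, not the RGW
implementation: flush_batch(), the newline_per_entry switch and the JSON
fields are hypothetical, and an in-memory stream stands in for the log
file.

```python
import io
import json


def flush_batch(entries, out, newline_per_entry):
    """Write a batch of already-serialized log entries with one write()."""
    buf = io.StringIO()
    for entry in entries:
        buf.write(entry)
        if newline_per_entry:
            buf.write("\n")  # new behaviour: terminate every entry
    if not newline_per_entry:
        buf.write("\n")  # old behaviour: one newline per flushed batch
    out.write(buf.getvalue())  # still a single write per batch


# Hypothetical entries; real ops log records carry many more fields.
batch = [json.dumps({"op": "get_obj", "id": i}) for i in range(3)]

old_out, new_out = io.StringIO(), io.StringIO()
flush_batch(batch, old_out, newline_per_entry=False)
flush_batch(batch, new_out, newline_per_entry=True)

old_lines = old_out.getvalue().splitlines()
new_lines = new_out.getvalue().splitlines()
parsed = [json.loads(line) for line in new_lines]
```

Here the old variant yields one line holding three concatenated JSON
objects, while the new variant yields one parseable object per line with
the same number of writes.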

Fixes: https://tracker.ceph.com/issues/76566
Signed-off-by: Jacques Heunis <jheunis@bloomberg.net>


Matan Breizman [Wed, 13 May 2026 08:25:23 +0000 (11:25 +0300)]

Merge pull request #68844 from Matan-B/wip-matanb-java17-crimson-rgw

qa/suites/crimson-rados/rgw/sts/tasks/1-keycloak: dont install java-1…

Reviewed-by: Shraddha Agrawal <shraddhaag@ibm.com>


Kefu Chai [Tue, 5 May 2026 01:36:01 +0000 (09:36 +0800)]

pybind/mgr/status: drop asserts that fight the defaultdict defaults

The 'assert metadata' checks in the status module were actually fighting
against our own defaults. Since an empty defaultdict is falsy, these
asserts would blow up the whole command if a single daemon was down
after a mgr restart.

This drops those four grumpy asserts. Now, instead of a traceback,
`ceph osd status` and `ceph fs status` will just show a blank hostname
or "unknown" version as intended.

The trigger is common in practice: any mgr restart leaves daemons
that are currently down without metadata in daemon_state, since
they never reconnect via MMgrOpen to repopulate it. After such a
restart, `ceph osd status` and `ceph fs status` blow up:
```
  Error EINVAL: Traceback (most recent call last):
    ...
    File ".../status/module.py", line 340, in handle_osd_status
      assert metadata
  AssertionError
```
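
Why the assert fires can be reproduced in isolation. This is a minimal
sketch, assuming a simplified lookup: the get_metadata() helper and the
daemon_state dict below are hypothetical stand-ins, not the mgr API.

```python
from collections import defaultdict


def get_metadata(store, daemon, default=None):
    """Hypothetical stand-in: return stored metadata or the caller's default."""
    found = store.get(daemon)
    return found if found is not None else default


daemon_state = {}  # after a mgr restart, a down daemon has no entry

metadata = get_metadata(daemon_state, "osd.2", default=defaultdict(str))

is_truthy = bool(metadata)       # an empty defaultdict is falsy
hostname = metadata["hostname"]  # defaultdict(str) yields "" for missing keys
```

Since is_truthy is False, an `assert metadata` on this value raises
AssertionError even though the default already supplies a usable blank
hostname.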

The bug was introduced in 5ac2901f54ff.

Fixes: https://tracker.ceph.com/issues/76416
Reported-by: Maximiliano Sandoval <m.sandoval@proxmox.com>
Signed-off-by: Kefu Chai <tchaikov@gmail.com>


Kefu Chai [Tue, 5 May 2026 01:35:00 +0000 (09:35 +0800)]

mgr: narrow get_metadata return type with @overload

Enable type narrowing for get_metadata() when a non-None default is
provided. Previously, the return type was always `Optional[Dict[str, str]]`,
forcing callers to use defensive `assert metadata` checks even when
a result was guaranteed.

The wrapper returns either the metadata from `_ceph_get_metadata()` or the
caller-supplied default. Providing an `@overload` allows type checkers to
prove the result is non-None, avoiding invalid assertions for falsy
defaults (like an empty defaultdict).

This is a hygienic change with no runtime impact.
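
The narrowing pattern can be sketched as below. This is an illustration
under assumptions, not the mgr_module code: the single `name` parameter
and the _FAKE_METADATA dict are hypothetical simplifications standing in
for the real signature and for `_ceph_get_metadata()`.

```python
from typing import Dict, Optional, overload

# Hypothetical in-memory stand-in for the data behind _ceph_get_metadata().
_FAKE_METADATA: Dict[str, Dict[str, str]] = {"osd.0": {"hostname": "node1"}}


@overload
def get_metadata(name: str) -> Optional[Dict[str, str]]: ...
@overload
def get_metadata(name: str, default: Dict[str, str]) -> Dict[str, str]: ...


def get_metadata(name: str,
                 default: Optional[Dict[str, str]] = None
                 ) -> Optional[Dict[str, str]]:
    found = _FAKE_METADATA.get(name)
    return found if found is not None else default


maybe = get_metadata("osd.9")        # checker sees Optional[Dict[str, str]]
certain = get_metadata("osd.9", {})  # checker sees Dict[str, str]
hostname = certain.get("hostname", "")  # no `assert certain` needed
```

The overloads only affect static analysis; at runtime the single
implementation behaves as before.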

Signed-off-by: Kefu Chai <k.chai@proxmox.com>


Matan Breizman [Wed, 13 May 2026 07:54:37 +0000 (10:54 +0300)]

Merge pull request #68814 from amathuria/wip-amat-fix-76447

crimson/osd: skip PGAdvanceMap on a deleted PG

Reviewed-by: Kefu Chai <tchaikov@gmail.com>


Niklas Hambüchen [Sat, 27 Dec 2025 13:05:19 +0000 (14:05 +0100)]

doc: Document that client_dirsize_rbytes confuses rsync

This is important to document because otherwise the immediate question
one has is "why _wouldn't_ I enable this?".
At the same time, being able to use tools like rsync is a common
motivation for using CephFS.

Unfortunately, the only source for this that I could find is the presentation
"CephFS: Architecture Introduction & New Features" by Greg Farnum:
https://ceph.io/assets/pdfs/events/2025/ceph-days-silicon-valley/10%20-%20Greg%20-%20CephFS.pdf

Signed-off-by: Niklas Hambüchen <mail@nh2.me>


Kefu Chai [Wed, 13 May 2026 07:19:20 +0000 (15:19 +0800)]

Merge pull request #68857 from tchaikov/wip-debian-libprotobuf

debian: drop explicit libprotobuf dependency from ceph-osd-crimson

Reviewed-by: Dan Mick <dmick@ibm.com>


Ashwin M. Joshi [Wed, 25 Feb 2026 05:58:39 +0000 (11:28 +0530)]

mgr: Control PG autoscaler during upgrades with pg_autoscale_during_upgrade

Fixes: https://tracker.ceph.com/issues/69477
Signed-off-by: Ashwin M. Joshi <ashjosh1@in.ibm.com>
Conflicts:
src/pybind/mgr/cephadm/tests/test_upgrade.py
src/pybind/mgr/cephadm/upgrade.py


Kefu Chai [Wed, 13 May 2026 05:02:16 +0000 (13:02 +0800)]

crimson/osd: disable ofstream buffering to fix concurrent logging

seastar::logger::do_log() writes to the shared static _out pointer from
every shard's reactor thread with no lock. std::cerr is safe in this
setting because it is unbuffered: each write maps to a single write(2)
syscall, which POSIX serializes at the kernel level. A buffered
std::ofstream is not safe: multiple shards concurrently advance the
filebuf's put pointer (pptr) past the end-of-buffer marker (epptr),
causing _M_convert_to_external to compute a length longer than the
8192-byte internal buffer and write past it.

The C++ standard does not provide thread-safety guarantees for
std::ofstream. [res.on.data.races] (C++23 §16.4.6.10) specifies that
concurrent non-const access to a standard library object from multiple
threads is a data race with undefined behavior. std::basic_filebuf,
which std::ofstream owns internally, maintains mutable state (pptr,
epptr, _M_buf, and the codec state) that is updated on every write with
no synchronization. std::cerr is an explicit exception: [iostream.objects]
guarantees that concurrent writes to the standard stream objects are
safe, and cerr achieves this by being unbuffered (no pptr/epptr to race
on) and writing through a single atomic write(2) per flush.

This manifested as a heap-buffer-overflow in basic_filebuf::overflow()
via seastar::logger::do_log() during a multi-shard run, reported against
the build that included the dangling-pointer fix (6680e02d041). The
dangling-pointer bug had masked this latent race by crashing before
multiple shards came online.

In this change, we fix this by calling pubsetbuf(nullptr, 0) immediately
after opening the file. This suppresses _M_allocate_internal_buffer and
makes each operator<< call fall through to a single write(2) syscall,
matching std::cerr's thread-safety guarantee.

This fix is necessary for all production deployments, not just
pathological configurations: log_file has a daemon_default of
/var/log/ceph/$cluster-$name.log in global.yaml.in ($cluster is always
"ceph" in modern Ceph, as customized cluster names have been deprecated).
Every crimson-osd process therefore opens a log_file_stream by default,
and every multi-shard run is exposed to this race.

An alternative would be to follow ScyllaDB's approach: its service file
has no StandardOutput or StandardError directives, so systemd connects
the process to the journal, and the logger keeps _out pointing at
std::cerr. This sidesteps the buffered-ofstream problem entirely. For
Crimson to adopt that model it would need to respect log_to_file and
log_to_stderr (which it currently ignores, checking only log_file), and
a dedicated ceph-crimson-osd@.service unit would be needed so that a
StandardError=append:/var/log/ceph/ceph-osd.%i.log directive could be
added without affecting the classic OSD. That is a larger refactor;
pubsetbuf(nullptr, 0) is the minimal correct fix for now.

Fixes: https://tracker.ceph.com/issues/76524
Signed-off-by: Kefu Chai <k.chai@proxmox.com>


Kefu Chai [Tue, 12 May 2026 09:55:06 +0000 (17:55 +0800)]

crimson/osd: inline log file stream setup to fix dangling pointer

maybe_set_logger() called logger().set_ostream() with a reference to a
local ofstream before returning it by value. Correctness relied on NRVO:
without it, _out would point to a moved-from object on a dead stack frame,
causing undefined behaviour that manifests as a heap-buffer-overflow under
ASan with GCC 14 (libasan.so.8).

Note that seastar::logger::_out is a static member shared by all logger
instances, so the dangling pointer affects every logger (seastore, network,
OSD), explaining why the crash appears across subsystems.

Inline the setup so log_file_stream and reset_logger share the same scope.
set_ostream() is now called with an unambiguously live object, with no
dependence on copy elision.

Fixes: 66c923d70354415cd1746c4f57cf31f3d55cc1bd
Signed-off-by: Kefu Chai <k.chai@proxmox.com>


Venky Shankar [Wed, 13 May 2026 04:37:36 +0000 (10:07 +0530)]

Merge pull request #67030 from indirasawant/wip-isawant-volumes-info-log

mgr/volumes: reduce noisy health check logs

Reviewed-by: Venky Shankar <vshankar@redhat.com>


Kefu Chai [Wed, 13 May 2026 04:32:33 +0000 (12:32 +0800)]

crimson/osd: drop redundant trailing co_return in pg_advance_map

check_for_splits() and split_pg() both ended with a bare co_return
that the compiler inserts implicitly for a coroutine returning
seastar::future<>. Remove the redundant statements.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>


Oguzhan Ozmen [Tue, 12 May 2026 19:31:04 +0000 (19:31 +0000)]

test/neocls/log trimming: reproduce log trimming can go into an infinite loop

Add two tests that call the trim() loop function directly (the
use_awaitable_t overloads) rather than the single-op wrapper used by
existing tests. The two test cases for the marker-based overload are:

- trim_loop_all_entries_by_marker: writes 10 entries, trims all, verifies
  the loop terminates and entries are gone.
- trim_loop_empty_log_by_marker: trims an empty log object, verifying
  the loop terminates on immediate ENODATA.

Without the fix in the following commits, both tests hang indefinitely.

- start a vstart cluster
- run the test: [build] $ ./bin/ceph_test_neocls_log
- the test introduced in this commit stalls forever:
    ...
    [ RUN      ] neocls_log.trim_loop_all_entries_by_marker <-- stalls forever

Reproduces: https://tracker.ceph.com/issues/76563
Signed-off-by: Oguzhan Ozmen <oozmen@bloomberg.net>


Krunal Chheda [Tue, 12 May 2026 18:50:59 +0000 (00:20 +0530)]

osd/test: Fix build breakage when WITH_EC_ISA_PLUGIN is OFF

Signed-off-by: Krunal Chheda <116836699+kchheda3@users.noreply.github.com>


Indira Sawant [Wed, 21 Jan 2026 17:46:31 +0000 (11:46 -0600)]

mgr/volumes: reduce noisy health check logs

Previously, the manager logged connection cleanup messages at info level
even when no idle connections existed, adding unnecessary noise to logs.

This change logs cleanup actions at info level only when idle connections
are found, and moves the 'no idle connections' message to debug level.
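
The level split can be sketched with the standard logging module. This
is a minimal illustration, not the mgr/volumes code: the logger name,
report_cleanup() and the message texts are hypothetical.

```python
import logging

log = logging.getLogger("volumes-sketch")  # hypothetical logger name


def report_cleanup(idle_connections):
    """Log at info only when there is something to clean up."""
    if idle_connections:
        log.info("cleaning up %d idle connection(s)", len(idle_connections))
    else:
        log.debug("no idle connections to clean up")


class ListHandler(logging.Handler):
    """Collects emitted records so the effect is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append((record.levelname, record.getMessage()))


handler = ListHandler()
log.addHandler(handler)
log.setLevel(logging.INFO)  # a typical non-debug log level
log.propagate = False

report_cleanup([])                    # filtered out at info level
report_cleanup(["conn-a", "conn-b"])  # logged
```

With the logger at info level, only the call that actually found idle
connections leaves a record behind.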

Fixes: https://tracker.ceph.com/issues/73635
Signed-off-by: Indira Sawant <indira.sawant@ibm.com>


Venky Shankar [Tue, 12 May 2026 15:26:29 +0000 (20:56 +0530)]

Merge PR #68128 into main

* refs/pull/68128/head:
qa: Fix checksum calculation on empty directories
qa: Add mirror test for snapshot with only dir
tools/cephfs_mirror: Fix sync hang

Reviewed-by: Venky Shankar <vshankar@redhat.com>


Venky Shankar [Tue, 12 May 2026 15:25:50 +0000 (20:55 +0530)]

Merge PR #68389 into main

* refs/pull/68389/head:
qa: Handle TypeError in test_filelock

Reviewed-by: Venky Shankar <vshankar@redhat.com>


Venky Shankar [Tue, 12 May 2026 15:25:20 +0000 (20:55 +0530)]

Merge PR #68141 into main

* refs/pull/68141/head:
client: Use correct size for fscrypt dummy key

Reviewed-by: Venky Shankar <vshankar@redhat.com>


dheart [Thu, 16 Apr 2026 12:34:42 +0000 (20:34 +0800)]

tool/ceph-kvstore-tool: add --pretty-binary-key option

Signed-off-by: dheart <dheart_joe@163.com>


Venky Shankar [Tue, 12 May 2026 15:24:09 +0000 (20:54 +0530)]

Merge PR #68446 into main

* refs/pull/68446/head:
mds: remove duplicate context completion calls
mds: add retry request to MDSRank wait queue rather via finisher
mds: adjust scan_stray_dir after fixing up MDSContext class
Revert "mds: move MDSContext completion handling to finish method"

Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>


Igor Fedotov [Fri, 8 May 2026 12:27:56 +0000 (15:27 +0300)]

os/bluestore: avoid iteration over spanning blobs when debug level is
inappropriate.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 3f808ed9515bf65f86dbf1756d0fe7b2dd19ed93)


Igor Fedotov [Fri, 8 May 2026 12:13:04 +0000 (15:13 +0300)]

os/bluestore: enforce blob's tail pruning after splitting.

Failing to do so could leave the "left" blob with invalid pextents at
its end, which we prefer to avoid.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit b9b25eacceac3463266d1339656b1ef1f156d858)


Igor Fedotov [Fri, 8 May 2026 12:08:58 +0000 (15:08 +0300)]

test/unittest_bluestore_types: more tests for blob splitting

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 5ff275a4bf678ece9836c343ca3d428c0ab70134)


Igor Fedotov [Fri, 8 May 2026 10:58:29 +0000 (13:58 +0300)]

test/test_bluestore_types: add prune tail tests

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 9c61bb87ebaa183ffe20e66eba4c73fbe28d39a8)


Raimund Sacherer [Tue, 12 May 2026 09:07:30 +0000 (11:07 +0200)]

python-common/drive_selection: keep existing-OSD devices past limit

assign_devices() breaks out of disk iteration on the first hit from
_limit_reached(). When that happens, any later existing-OSD device
for the current spec is silently dropped from the selection, and
ceph-volume's lvm batch loses sight of it.

Only break when the candidate is not an existing OSD for this
service_id. Existing-for-this-spec devices continue to be added past
the limit; they are already accounted for through existing_daemons.
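
The changed break condition can be sketched as a toy loop. This is a
minimal illustration based only on the description above, not the
drive_selection code: this assign_devices() signature, the
keep_existing switch and the inline limit check are simplified
assumptions.

```python
def assign_devices(disks, limit, existing_for_spec, keep_existing):
    """Toy selection loop; names and the limit check are assumptions."""
    selected = []
    for disk in disks:
        # Existing OSD devices are not counted against the limit.
        new_devices = [d for d in selected if d not in existing_for_spec]
        limit_reached = len(new_devices) >= limit
        if limit_reached:
            if keep_existing and disk in existing_for_spec:
                selected.append(disk)  # already accounted for, keep it
                continue
            break
        selected.append(disk)
    return selected


disks = ["sda", "sdb", "sdc"]
existing = {"sdc"}  # sdc already backs an OSD of this spec

before = assign_devices(disks, limit=2, existing_for_spec=existing,
                        keep_existing=False)
after = assign_devices(disks, limit=2, existing_for_spec=existing,
                       keep_existing=True)
```

In this model the old loop stops at sdc and drops it, while the new
loop keeps sdc in the selection after the limit has been reached.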

Complements d3f1a0e1c0b ("fix limit with existing devices", 2023),
which excluded ceph devices from the limit count. That fix prevents
the break from firing in most cases; this one keeps the iteration
useful when it does fire anyway.

Needs review: interaction across spec shapes (with/without explicit
limit:, with/without existing_daemons) should be looked at.

Fixes: https://tracker.ceph.com/issues/76522
Signed-off-by: Raimund Sacherer <rsachere@redhat.com>


Raimund Sacherer [Tue, 12 May 2026 09:05:53 +0000 (11:05 +0200)]

ceph-volume: tolerate <=1% short-fall on requested db/wal size

When requested_size (e.g. 1 GiB) slightly exceeds abs_size (e.g.
1023.3 MiB lost to PE alignment), get_physical_fast_allocs() called
exit(1) and aborted the whole batch.

If the short-fall is within 1%, scale down to abs_size with an info log
instead of aborting. Anything larger still hits the existing error
path.
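
The tolerance check can be sketched as below. This is a minimal
illustration, not the ceph-volume code: resolve_fast_size() is a
hypothetical helper, sizes are plain byte counts, it raises instead of
calling exit(1), and measuring the short-fall relative to
requested_size is an assumption.

```python
def resolve_fast_size(requested_size, abs_size, tolerance=0.01):
    """Return the size to allocate, in bytes, or raise if the gap is too large.

    Assumption: the short-fall is measured relative to requested_size.
    """
    if requested_size <= abs_size:
        return requested_size
    shortfall = requested_size - abs_size
    if shortfall <= requested_size * tolerance:
        return abs_size  # scale down instead of aborting the batch
    raise ValueError(
        f"{requested_size} bytes requested, only {abs_size} can be fulfilled")


MIB = 1024 * 1024
requested = 1024 * MIB         # 1 GiB
available = int(1023.3 * MIB)  # slightly less after PE alignment

scaled = resolve_fast_size(requested, available)

try:
    resolve_fast_size(requested, 900 * MIB)
    large_gap_rejected = False
except ValueError:
    large_gap_rejected = True
```

With these numbers the short-fall is well under 1%, so the request is
scaled down to the available size, while a 900 MiB slot is still
rejected.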

Needs review: confirm 1% is the right threshold (maybe a lower percentage is
sufficient) and that no caller assumes abs_size == requested_size after this branch.

Signed-off-by: Raimund Sacherer <rsachere@redhat.com>


Raimund Sacherer [Tue, 12 May 2026 09:05:26 +0000 (11:05 +0200)]

ceph-volume: allocate db/wal slot on partial fast-device VG

On single-OSD redeploy where the fast device VG already has
DB LVs for sibling OSDs, get_physical_fast_allocs() returned an empty
list and ceph-volume fell back to a co-located OSD.

Two fixes in get_physical_fast_allocs():

- abs_size = dev_size / slots_for_vg can exceed vg_free when other
  slots are still in use, so the while-loop never enters. Fall back
  to abs_size = free_size / fast_slots_per_device.

- The loop counter was occupied_slots = len(dev.lvs), so on a partial
  VG the loop was aborted prematurely. Count only slots
  allocated in this call (new_slots) instead.
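
The two changes can be sketched with a toy model of the slot loop. This
is an illustration built from the description above, not the
get_physical_fast_allocs() implementation: plan_fast_allocs(), its
parameters, the `fixed` switch and the sample numbers are hypothetical,
and sizes are plain integers.

```python
def plan_fast_allocs(dev_size, free_size, existing_lvs, slots_for_vg,
                     fast_slots_per_device, new_osds, fixed):
    """Toy model of the slot loop; sizes are plain integers (e.g. GiB)."""
    abs_size = dev_size // slots_for_vg
    if fixed and abs_size > free_size:
        # fall back to what is actually free on the partially used VG
        abs_size = free_size // fast_slots_per_device
    # old: every LV on the device counts; new: only slots from this call
    used_slots = 0 if fixed else existing_lvs
    allocs = []
    while (abs_size <= free_size and len(allocs) < new_osds
           and used_slots < fast_slots_per_device):
        free_size -= abs_size
        used_slots += 1
        allocs.append(abs_size)
    return allocs


# 600 GiB fast device, 6 slots, 5 sibling DB LVs in place, 99 GiB left free.
args = dict(dev_size=600, free_size=99, existing_lvs=5, slots_for_vg=6,
            fast_slots_per_device=1, new_osds=1)

before = plan_fast_allocs(**args, fixed=False)
after = plan_fast_allocs(**args, fixed=True)
```

In this model the old logic returns an empty list for the single-OSD
redeploy, and the adjusted logic returns one slot sized to the space
that is actually free.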

The initial issue was the silent creation of OSDs without a DB, which
was fixed in commit 5c700ed7d64. After applying that fix, no OSDs
were deployed at all.

Tested on an RHCS 8 lab cluster (12 HDDs / 4 SSDs across 3 hosts,
db_slots: 6, encrypted).

Needs review: confirm that new_slots matches the original intent
of the per-batch cap when multiple OSDs are deployed in one call.

Fixes: https://tracker.ceph.com/issues/76522
Signed-off-by: Raimund Sacherer <rsachere@redhat.com>


Olivier Chaze [Tue, 12 May 2026 14:32:10 +0000 (16:32 +0200)]

doc/rgw: warn about rgw_usage_max_shards consistency

Add documentation warnings explaining that all RGW daemons and
radosgw-admin commands must use the same rgw_usage_max_shards value.
Mismatched shard counts cause writes and reads/trim to target different
objects, resulting in seemingly empty usage logs or failed cleanup.
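
Why a mismatch looks like an empty log can be sketched with a toy shard
mapping. This is a minimal illustration, not RGW's actual hashing or
object naming: usage_shard_object(), the crc32 hash and the "usage.N"
names are hypothetical stand-ins for a hash-modulo-shard-count scheme.

```python
import zlib


def usage_shard_object(user, max_shards):
    """Toy shard mapping: hash the user, take it modulo the shard count."""
    shard = zlib.crc32(user.encode()) % max_shards
    return f"usage.{shard}"


users = [f"user{i}" for i in range(100)]

# A daemon writing with 32 shards and an admin tool reading with 16.
mismatched = [u for u in users
              if usage_shard_object(u, 32) != usage_shard_object(u, 16)]

# With the same value on both sides every user maps to the same object.
consistent = all(usage_shard_object(u, 32) == usage_shard_object(u, 32)
                 for u in users)
```

Every user in `mismatched` would be written to one object and looked up
in another, which is the kind of apparently empty result the warning
describes.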

Also document the --rgw-usage-max-shards command-line parameter for
radosgw-admin as an alternative to global config.

Fixes: https://tracker.ceph.com/issues/76459
Signed-off-by: Olivier Chaze <olivier.chaze@infomaniak.com>


Aarti [Thu, 19 Mar 2026 18:03:32 +0000 (23:33 +0530)]

dashboard: use metadata = event.get('refs', {}) instead of dict(event.get('refs', {}))

Fixes: https://tracker.ceph.com/issues/75619
Signed-off-by: Aarti Dhikale <aarti.s.dhikale@ibm.com>


Guillaume Abrioux [Tue, 12 May 2026 13:34:52 +0000 (15:34 +0200)]

Merge pull request #68748 from guits/update-cv-doc

doc: warn against default cephadm shell for ceph-volume


Afreen Misbah [Tue, 12 May 2026 12:03:51 +0000 (17:33 +0530)]

Merge pull request #68052 from rhcs-dashboard/enable-overview-page

mgr/dashboard: Enable overview landing page

Reviewed-by: Nizamudeen A <nia@redhat.com>


Kefu Chai [Tue, 12 May 2026 09:28:13 +0000 (17:28 +0800)]

Merge pull request #68084 from tchaikov/osd-odr

osd: fix ASAN ODR violations in denc-mod-osd

Reviewed-by: Matan Breizman <mbreizma@redhat.com>


Redouane Kachach [Tue, 12 May 2026 09:26:59 +0000 (11:26 +0200)]

Merge pull request #66820 from Shubhaj1810/fix-hostname-case-insensitive-v2

python-common/hostspec: normalize hostnames for case-insensitive matc…

Reviewed-by: John Mulligan <jmulligan@redhat.com>


Kefu Chai [Tue, 12 May 2026 09:17:56 +0000 (17:17 +0800)]

debian: drop explicit libprotobuf dependency from ceph-osd-crimson

The ceph-osd-crimson package already lists ${shlibs:Depends} in its
Depends field, which generates the correct libprotobuf dependency for
the target distribution at build time (e.g. libprotobuf32t64 on
Trixie/Noble). The hardcoded libprotobuf23 entry is redundant and
breaks installations on distributions where protobuf ships under a
different package name.

See also ab4c5daead7f26d41028625453d50bb58d3b02be which added this
runtime dep.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>


Redouane Kachach [Tue, 12 May 2026 09:23:52 +0000 (11:23 +0200)]

Merge pull request #67707 from Shubhaj1810/fix-upgrade-order-validation

mgr/cephadm: Fix upgrade order validation when using daemon_types with hosts

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Ashwin M. Joshi <ashjosh1@in.ibm.com>


Redouane Kachach [Tue, 12 May 2026 09:20:19 +0000 (11:20 +0200)]

Merge pull request #66189 from timqn22/mon-public-network-updating

mgr/cephadm: mon public network updating

Reviewed-by: Adam King <adking@redhat.com>


Redouane Kachach [Tue, 12 May 2026 09:19:01 +0000 (11:19 +0200)]

Merge pull request #68398 from ashjosh1git/ceph-tracker-75603-ok-to-upgrade-bucket-params

mgr: Bucket scoped OSD upgrades using ok-to-upgrade

Reviewed-by: Redouane Kachach <rkachach@ibm.com>


Redouane Kachach [Tue, 12 May 2026 09:15:12 +0000 (11:15 +0200)]

Merge pull request #68484 from kginonredhat/issue-75967-ceph-orch-daemon-incorrectly-sets-container_image-to-force

Correct: ceph orch daemon incorrectly setting container image to force

Reviewed-by: Redouane Kachach <rkachach@ibm.com>


Afreen Misbah [Tue, 5 May 2026 21:05:11 +0000 (02:35 +0530)]

mgr/dashboard: Updates to empty state component

- added state for no storage in empty state component
- extended the icon component to take into account the scenario of button with icon
- fix unit tests

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Afreen Misbah [Mon, 27 Apr 2026 19:50:41 +0000 (01:20 +0530)]

mgr/dashboard: Update cypress dashboard e2e tests

- removed dashboard v3 tests
- fixed login, navigation, mirroring, language, osd, page header e2e tests

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Afreen Misbah [Sun, 12 Apr 2026 19:27:44 +0000 (00:57 +0530)]

mgr/dashboard: Allow checks for prometheus disablement

- don't fire Prometheus queries if Prometheus is disabled

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Afreen Misbah [Fri, 27 Mar 2026 21:58:02 +0000 (03:28 +0530)]

mgr/dashboard: Fix cephadm e2e tests

- these tests were failing due to the new onboarding page changes

Fixes: https://tracker.ceph.com/issues/75697

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Afreen Misbah [Fri, 27 Mar 2026 21:15:49 +0000 (02:45 +0530)]

mgr/dashboard: Fix create-cluster welcome tests

Fixes: https://tracker.ceph.com/issues/75697

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Afreen Misbah [Fri, 27 Mar 2026 20:36:54 +0000 (02:06 +0530)]

mgr/dashboard: Fix overview a11y tests

Fixes: https://tracker.ceph.com/issues/75696

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Afreen Misbah [Fri, 27 Mar 2026 20:18:05 +0000 (01:48 +0530)]

mgr/dashboard: Fix a11y tests of navigation

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Afreen Misbah [Fri, 27 Mar 2026 13:40:32 +0000 (19:10 +0530)]

mgr/dashboard: Enable overview landing page

- removes feature toggle
- removed unused dashboard component, dashboard v3 component, and helper pipes and components

Fixes: https://tracker.ceph.com/issues/75749

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Matan Breizman [Tue, 12 May 2026 09:01:50 +0000 (12:01 +0300)]

Merge pull request #68333 from lumir-sliva/crimson/fix-stat-enoent

crimson/os/seastore: handle enoent in SeaStore::Shard::stat

Reviewed-by: Matan Breizman <mbreizma@redhat.com>


Matan Breizman [Tue, 12 May 2026 08:58:50 +0000 (11:58 +0300)]

Merge pull request #68132 from myoungwon/wip-fastpath-logmanager

crimson/os/seastore: make the common write case the fast path in logmanager

Reviewed-by: Josh Durgin <jdurgin@redhat.com>


Matan Breizman [Tue, 12 May 2026 08:56:29 +0000 (11:56 +0300)]

Merge pull request #68630 from xxhdx1985126/wip-76268

crimson/os/seastore: destroy Transaction only when no other reference exists

Reviewed-by: Matan Breizman <mbreizma@redhat.com>

Matan Breizman [Tue, 12 May 2026 08:55:43 +0000 (11:55 +0300)]

Merge pull request #68839 from tchaikov/crimson-get-segment-manager-cleanups

crimson/os/seastore: SegmentManager::get_segment_manager() cleanups

Reviewed-by: Matan Breizman <mbreizma@redhat.com>

Matan Breizman [Tue, 12 May 2026 08:54:39 +0000 (11:54 +0300)]

Merge pull request #68534 from xxhdx1985126/wip-76197

crimson/os/seastore/cache: concurrent read of EXIST_CLEAN extents can

Reviewed-by: Matan Breizman <mbreizma@redhat.com>

Matan Breizman [Tue, 12 May 2026 08:54:00 +0000 (11:54 +0300)]

Merge pull request #68544 from myoungwon/wip-coroutine-cjs

crimson/os/seastore/journal: switch CircularJournalSpace to coroutines

Reviewed-by: Matan Breizman <mbreizma@redhat.com>

Matan Breizman [Tue, 12 May 2026 08:53:23 +0000 (11:53 +0300)]

Merge pull request #68340 from myoungwon/wip-avoid-continuation-delta-overwrite

crimson/os/seastore: remove an extra continuation in delta-overwrite path

Reviewed-by: Matan Breizman <mbreizma@redhat.com>

Kefu Chai [Tue, 12 May 2026 08:21:56 +0000 (16:21 +0800)]

Merge pull request #68085 from tchaikov/mgr-python-cleanup

mgr: replace deprecated PyImport_ImportModuleNoBlock with PyImport_ImportModule

Reviewed-by: Matan Breizman <mbreizma@redhat.com>

Shweta Bhosale [Mon, 11 May 2026 10:02:14 +0000 (15:32 +0530)]

mgr/nfs: reuse CephfsClient for path checks and earmark resolver

cephfs_path_is_dir defined an inner function decorated with lru_cache, so
each call got a new function object with an empty cache, and
CephfsClient(mgr) ran every time. Moved caching to a module-level
cephfs_client_for_mgr(mgr) and called it from cephfs_path_is_dir.
Passed that shared client into CephFSEarmarkResolver from the NFS module so
that export create/apply does not construct a separate CephfsClient for
earmarks.
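
The caching pitfall described above can be reproduced in isolation. This is
a minimal sketch, not the actual mgr/nfs code: FakeClient,
path_check_uncached, client_for_mgr and path_check_cached are hypothetical
names used only for illustration, and FakeClient merely counts how often it
is constructed.

```python
from functools import lru_cache


class FakeClient:
    """Hypothetical stand-in for an expensive client; counts constructions."""
    created = 0

    def __init__(self, mgr):
        FakeClient.created += 1
        self.mgr = mgr


def path_check_uncached(mgr):
    # Anti-pattern: the decorated inner function is re-created on every
    # call, so each call starts with a brand-new, empty cache.
    @lru_cache(maxsize=1)
    def _client(m):
        return FakeClient(m)

    return _client(mgr)


@lru_cache(maxsize=1)
def client_for_mgr(mgr):
    # Module-level function: one function object, hence one shared cache.
    return FakeClient(mgr)


def path_check_cached(mgr):
    return client_for_mgr(mgr)
```

Calling path_check_uncached twice with the same mgr constructs two clients,
while path_check_cached constructs one and returns the same object both
times. Note that lru_cache requires its arguments to be hashable.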

Fixes: https://tracker.ceph.com/issues/76504
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>

Afreen Misbah [Tue, 12 May 2026 06:36:23 +0000 (12:06 +0530)]

Merge pull request #68806 from rhcs-dashboard/db-sso-oauth2-fixes

mgr/dashboard: add oauth2 sso prerequisites and fix missing claims and expired token

Reviewed-by: Afreen Misbah <afreen@ibm.com>

Kefu Chai [Mon, 11 May 2026 05:46:25 +0000 (13:46 +0800)]

crimson: consolidate the return paths of get_segment_manager()

before this change, two branches both return `BlockSegmentManager`,
which is redundant. in this change, consolidate them so that the
`HAVE_ZNS` path becomes an early return. this improves readability.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

Kefu Chai [Mon, 11 May 2026 05:27:42 +0000 (13:27 +0800)]

crimson: abort on ioctl(BLKGETNRZONES) failure

previously, we did not check the return value of ioctl(BLKGETNRZONES).

we query the number of zones of the storage device to determine which
seastore backend to use. the only possible error from this ioctl is
-EFAULT (invalid user pointer), which indicates a programming error
and should never happen in practice. use ceph_assert() to catch this.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

Kefu Chai [Mon, 11 May 2026 05:07:25 +0000 (13:07 +0800)]

crimson: use uint32_t when calling ioctl(BLKGETNRZONES)

before this change, we pass a pointer to a `size_t` to
ioctl(BLKGETNRZONES), but in the Linux kernel,
include/uapi/linux/blkzoned.h:

```c
#define BLKGETNRZONES _IOR(0x12, 133, __u32)
```
this API reads 32 bits of data into the pointer. on 64-bit
architectures, size_t is 64 bits. fortunately, we initialize
nr_zones with 0, so the upper 32 bits remain zero. this works
on little-endian systems, but not on big-endian systems. it is
also semantically wrong. we should pass a pointer to a 32-bit
value when calling ioctl(BLKGETNRZONES).

in this change, we change the type of nr_zones from size_t to
uint32_t to match what the Linux kernel expects.
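
the endianness argument above can be illustrated without a zoned device.
this is only a simulation under stated assumptions: read_nr_zones_as_u64 is
a hypothetical helper, and the 8-byte buffer stands in for a
zero-initialized 64-bit size_t into which the kernel writes 4 bytes.

```python
import struct


def read_nr_zones_as_u64(kernel_value, byteorder):
    """Simulate a 32-bit write into a zero-initialized 64-bit variable."""
    fmt32 = "<I" if byteorder == "little" else ">I"
    fmt64 = "<Q" if byteorder == "little" else ">Q"
    buf = bytearray(8)  # models: size_t nr_zones = 0
    # the kernel stores 4 bytes at the start of the pointed-to memory
    struct.pack_into(fmt32, buf, 0, kernel_value)
    # the caller then reads all 8 bytes back as one 64-bit integer
    return struct.unpack(fmt64, bytes(buf))[0]
```

with a value of 5, the little-endian case reads back 5, while the
big-endian case reads back 5 << 32, because the 4 bytes written land in the
most significant half of the 64-bit value.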

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

Kefu Chai [Mon, 11 May 2026 04:43:47 +0000 (12:43 +0800)]

crimson: coroutinize SegmentManager::get_segment_manager()

this change was inspired by the following warning:

```
[1/3] Building CXX object src/crimson/os/seastore/CMakeFiles/crimson-seastore.dir/segment_manager.cc.o
/home/kefu/dev/ceph/src/crimson/os/seastore/segment_manager.cc:45:15: warning: lambda capture 'FNAME' is not used [-Wunused-lambda-capture]
45 | ).then([FNAME,
| ^
```

but we went further by coroutinizing the whole method. the return value
of ioctl() was not checked before this change, and clang correctly
flagged this with a warning, so we mark it with `[[maybe_unused]]` for
now. we will fix it in a separate change.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

SrinivasaBharathKanta [Mon, 11 May 2026 23:16:29 +0000 (04:46 +0530)]

Merge pull request #68131 from rzarzynski/wip-ec-asserted-isa-prepare

ec: validate tcache retrievals in ErasureCodeIsaDefault::prepare()

SrinivasaBharathKanta [Mon, 11 May 2026 23:11:23 +0000 (04:41 +0530)]

Merge pull request #68609 from aainscow/attr_rollback_fix

osd: Fix incorrect rollback logic for partial write OI

SrinivasaBharathKanta [Mon, 11 May 2026 23:09:57 +0000 (04:39 +0530)]

Merge pull request #67292 from JonBailey1993/stats_fix_part_2

osd: Reduce pg_stats invalidations occurring in fast ec

Redouane Kachach [Mon, 11 May 2026 19:29:37 +0000 (21:29 +0200)]

Merge pull request #66559 from timqn22/crash-dir-permission-setting

src/cephadm: updated crash dir creation

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Redouane Kachach <rkachach@ibm.com>

Redouane Kachach [Mon, 11 May 2026 19:27:53 +0000 (21:27 +0200)]

Merge pull request #68848 from rkachach/fix_issue_76511

qa/cephadm: start upgrade tests from tentacle instead of reef on main

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>

Laura Flores [Mon, 11 May 2026 18:41:49 +0000 (13:41 -0500)]

Merge pull request #68365 from kamoltat/wip-ksirivad-fix-75418

qa/suites/upgrade: ignore PG_DAMAGED

Reviewed-by: Laura Flores <lflores@ibm.com>

Laura Flores [Mon, 11 May 2026 18:35:47 +0000 (13:35 -0500)]

Merge pull request #67915 from falconlee236/fix-osd-df-sorting-main

mon/PGMap: sort 'osd df' and 'osd perf' outputs by OSD ID

Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>

Laura Flores [Mon, 11 May 2026 18:33:33 +0000 (13:33 -0500)]

Merge pull request #68326 from ljflores/wip-tracker-75763

qa/suites/rados/encoder: remove rocky from supported distros

Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>

Laura Flores [Mon, 11 May 2026 18:15:14 +0000 (13:15 -0500)]

Merge pull request #67715 from NitzanMordhai/wip-nitzan-is_pg_clean-hang-after-teardown

test/ceph-helpers: add timeout to ceph pg query

Reviewed-by: Radosław Zarzyński <rzarzyns@redhat.com>

Sage McTaggart [Mon, 11 May 2026 18:10:57 +0000 (14:10 -0400)]

Merge pull request #68847 from ceph/wip-doc-SageMcTSecurityCSC

docs/security: added workinggroup.rst and securitylead.rst

John Mulligan [Mon, 11 May 2026 17:10:50 +0000 (13:10 -0400)]

Merge pull request #68401 from phlogistonjohn/jjm-pypkg

build: Update python packaging for src/python-common

Reviewed-by: Kefu Chai <k.chai@proxmox.com>

Redouane Kachach [Mon, 11 May 2026 16:32:31 +0000 (18:32 +0200)]

Merge pull request #68747 from Kushal-deb/fix-nvmeof-apply-path

mgr/cephadm: allow nvmeof group assignment for NVMe-oF services

Reviewed-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>

Sage McTaggart [Mon, 11 May 2026 14:58:57 +0000 (10:58 -0400)]

docs/security: added workinggroup.rst and securitylead.rst
Signed-off-by: Sage McTaggart <sagemct@ibm.com>

Redouane Kachach [Mon, 11 May 2026 16:27:57 +0000 (18:27 +0200)]

Merge pull request #67344 from dermalikmann/fix-mgmt-gateway-use-vip

mgr/cephadm: mgmt-gateway bind to virtual_ip

Reviewed-by: Redouane Kachach <rkachach@ibm.com>

Pedro Gonzalez Gomez [Thu, 7 May 2026 19:36:32 +0000 (21:36 +0200)]

mgr: add prerequisites check before enabling dashboard oauth2 sso

Assisted-by: Claude:claude-4.6-sonnet
Fixes: https://tracker.ceph.com/issues/76476
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@ibm.com>

Kamoltat (Junior) Sirivadhna [Wed, 22 Apr 2026 23:28:18 +0000 (23:28 +0000)]

src/test/mon: test_monmap_monitor.cc

Added the following test cases:
- Test success when explicitly supplied tiebreaker
- Test success when auto-selecting tiebreaker monitor
- Test success with minimal valid configuration (1 monitor per zone)
- Test success with auto-selection and minimal config (1 monitor per zone)
- Test success when strategy is automatically changed to CONNECTIVITY
- Test failure when auto-selecting and tiebreaker is in a data zone
- Test failure when explicitly specifying tiebreaker in a data zone
- Test failure when multiple potential tiebreakers exist
- Test failure when one data zone has 0 monitors
- Test failure when tiebreaker monitor doesn't exist

Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>

Kamoltat (Junior) Sirivadhna [Wed, 22 Apr 2026 17:55:13 +0000 (17:55 +0000)]

doc: update stretch-mode.rst

1. enable_stretch_mode no longer requires a tiebreaker mon to be supplied
2. enable_stretch_mode will automatically set the monitor election strategy
to Connectivity if not already set.
3. Move away from "sites" and use "zones" instead throughout the doc

Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>

Kamoltat (Junior) Sirivadhna [Wed, 15 Apr 2026 22:34:08 +0000 (22:34 +0000)]

mon: make tiebreaker mon optional in stretch-mode

Motivation:
To support the future EC stretch feature, we need to
simplify how we enable stretch-mode

Solution:
Make tiebreaker argument optional.

old:
```
ceph mon enable_stretch_mode <tiebreaker_mon> <new_crush_rule>
<dividing_bucket>
```
new:
```
ceph mon enable_stretch_mode <tiebreaker_mon (optional)> <new_crush_rule>
<dividing_bucket>
```

Ceph will try to select a tiebreaker mon that resides in
the crush <dividing_bucket> type but doesn't belong
to any of the data sites in which the OSDs reside.

Also created a helper function
`MonmapMonitor::validate_and_enable_stretch_mode`
inside `MonmapMonitor::try_enable_stretch_mode`,
making the logic unit-testable.

Moreover, ceph mon enable_stretch_mode will
automatically set the monitor election strategy to Connectivity.

We now also enforce that at least 1 monitor exists for each data zone.
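
The selection rules described above can be sketched as follows. This is an
illustration only, not the MonmapMonitor implementation: pick_tiebreaker,
mon_zones and data_zones are hypothetical names, and the order of the
checks and the error messages are assumptions.

```python
def pick_tiebreaker(mon_zones, data_zones, tiebreaker=None):
    """Return the tiebreaker mon name or raise ValueError.

    mon_zones:  dict mapping monitor name -> zone it resides in
    data_zones: set of zones that hold OSDs
    tiebreaker: optional explicitly supplied monitor name
    """
    # Every data zone must contain at least one monitor.
    for zone in sorted(data_zones):
        if zone not in mon_zones.values():
            raise ValueError(f"data zone {zone!r} has no monitor")

    if tiebreaker is not None:
        # Explicit tiebreaker: it must exist and sit outside the data zones.
        if tiebreaker not in mon_zones:
            raise ValueError(f"monitor {tiebreaker!r} does not exist")
        if mon_zones[tiebreaker] in data_zones:
            raise ValueError(f"monitor {tiebreaker!r} is in a data zone")
        return tiebreaker

    # Auto-selection: exactly one monitor may sit outside the data zones.
    candidates = sorted(m for m, z in mon_zones.items() if z not in data_zones)
    if len(candidates) != 1:
        raise ValueError(f"expected one tiebreaker candidate, got {candidates}")
    return candidates[0]
```

For example, with monitors a, b and c in zones dc1, dc2 and dc3 and data
zones dc1 and dc2, the sketch selects c whether or not it is named
explicitly, and rejects an explicit a because it sits in a data zone.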

Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
