git-server-git.apps.pok.os.sepia.ceph.com Git - ceph-ci.git/log
ceph-ci.git
2 months agoMerge branch 'execute_command_retry_logic_if_command_fails_with_connection_error... wip-adk-testing-2025-12-10-2318
Adam King [Thu, 11 Dec 2025 04:18:17 +0000 (23:18 -0500)]
Merge branch 'execute_command_retry_logic_if_command_fails_with_connection_error' of https://github.com/ShwetaBhosale1/ceph into wip-adk-testing-2025-12-10-2316

Conflicts:
src/pybind/mgr/cephadm/ssh.py

2 months agoMerge branch 'fix_issue_nvmeof_ssl' of https://github.com/rkachach/ceph into wip...
Adam King [Thu, 11 Dec 2025 04:16:43 +0000 (23:16 -0500)]
Merge branch 'fix_issue_nvmeof_ssl' of https://github.com/rkachach/ceph into wip-adk-testing-2025-12-10-2316

2 months agoMerge branch 'fix_cephadm_agent_volume_gatherer' of https://github.com/xelexin/ceph...
Adam King [Thu, 11 Dec 2025 04:16:42 +0000 (23:16 -0500)]
Merge branch 'fix_cephadm_agent_volume_gatherer' of https://github.com/xelexin/ceph into wip-adk-testing-2025-12-10-2316

2 months agoMerge branch 'fix_issue_74015' of https://github.com/rkachach/ceph into wip-adk-testi...
Adam King [Thu, 11 Dec 2025 04:16:42 +0000 (23:16 -0500)]
Merge branch 'fix_issue_74015' of https://github.com/rkachach/ceph into wip-adk-testing-2025-12-10-2316

2 months agoMerge branch 'fix_issue_73851_cephadm_crashes_when_ganesha-rados-grace_fails' of...
Adam King [Thu, 11 Dec 2025 04:16:42 +0000 (23:16 -0500)]
Merge branch 'fix_issue_73851_cephadm_crashes_when_ganesha-rados-grace_fails' of https://github.com/ShwetaBhosale1/ceph into wip-adk-testing-2025-12-10-2316

2 months agoMerge branch 'fix_issue_73853' of https://github.com/rkachach/ceph into wip-adk-testi...
Adam King [Thu, 11 Dec 2025 04:16:41 +0000 (23:16 -0500)]
Merge branch 'fix_issue_73853' of https://github.com/rkachach/ceph into wip-adk-testing-2025-12-10-2316

2 months agoMerge branch 'rsachere_ceph-crash-daemon' of https://github.com/rsacherer/ceph into...
Adam King [Thu, 11 Dec 2025 04:16:41 +0000 (23:16 -0500)]
Merge branch 'rsachere_ceph-crash-daemon' of https://github.com/rsacherer/ceph into wip-adk-testing-2025-12-10-2316

2 months agoMerge branch 'fix-73814-main' of https://github.com/rhcs-dashboard/ceph into wip...
Adam King [Thu, 11 Dec 2025 04:16:40 +0000 (23:16 -0500)]
Merge branch 'fix-73814-main' of https://github.com/rhcs-dashboard/ceph into wip-adk-testing-2025-12-10-2316

2 months agoMerge branch 'fix_issue_73774_nfs_tls_add_xprtsec' of https://github.com/ShwetaBhosal...
Adam King [Thu, 11 Dec 2025 04:16:40 +0000 (23:16 -0500)]
Merge branch 'fix_issue_73774_nfs_tls_add_xprtsec' of https://github.com/ShwetaBhosale1/ceph into wip-adk-testing-2025-12-10-2316

2 months agomgr/cephadm: Added retry logic for execute command if command fails with connection...
Shweta Bhosale [Wed, 10 Dec 2025 09:43:41 +0000 (15:13 +0530)]
mgr/cephadm: Added retry logic for execute command if command fails with connection error
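
A minimal sketch of the general idea, not the code from this commit: retry a command only when it fails with a connection error, and let every other error propagate immediately. The name `run_with_retry` and the `attempts` and `delay` parameters are hypothetical, and Python's built-in `ConnectionError` stands in for whatever error type the SSH layer actually raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(command: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Call command(), retrying only when it raises ConnectionError (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return command()
        except ConnectionError:
            # Give up once the last allowed attempt has failed.
            if attempt == attempts:
                raise
            time.sleep(delay)
    raise AssertionError("unreachable")
```

Errors that are not connection errors are deliberately not retried, since repeating a command that failed for another reason is unlikely to help.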

Fixes: https://tracker.ceph.com/issues/74179
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
2 months agomgr/cephadm: Handle ganesha-rados-grace tool failure
Shweta Bhosale [Fri, 14 Nov 2025 12:04:25 +0000 (17:34 +0530)]
mgr/cephadm: Handle ganesha-rados-grace tool failure

Fixes: https://tracker.ceph.com/issues/73851
Signed-off-by: Shweta Bhosale <Shweta.Bhosale1@ibm.com>
2 months agoMerge pull request #66249 from thotz/adminops-restore-op
Jiffin Tony Thottan [Wed, 10 Dec 2025 06:41:09 +0000 (12:11 +0530)]
Merge pull request #66249 from thotz/adminops-restore-op

rgw/adminops: support for adding restore operation

2 months agoMerge pull request #54435 from dparmar18/libcephfs-nonblocking-io-testcases
Venky Shankar [Wed, 10 Dec 2025 04:38:45 +0000 (10:08 +0530)]
Merge pull request #54435 from dparmar18/libcephfs-nonblocking-io-testcases

src/test: add libcephfs tests for async(nonblocking) calls

Reviewed-by: Venky Shankar <vshankar@redhat.com>
2 months agoMerge pull request #66404 from tchaikov/wip-bwc-with-more-branch-names
Kefu Chai [Wed, 10 Dec 2025 03:21:54 +0000 (11:21 +0800)]
Merge pull request #66404 from tchaikov/wip-bwc-with-more-branch-names

script: sanitize git branch names for OCI tag compliance

Reviewed-by: John Mulligan <jmulligan@redhat.com>
2 months agoMerge pull request #66533 from imran-imtiaz/dashboard
Imran Imtiaz [Tue, 9 Dec 2025 14:02:18 +0000 (14:02 +0000)]
Merge pull request #66533 from imran-imtiaz/dashboard

mgr/dashboard: add API endpoint for consistency group name update

2 months agoMerge pull request #66506 from Matan-B/wip-matanb-crimson-seastore-extent-race-v3
Matan Breizman [Tue, 9 Dec 2025 12:43:02 +0000 (14:43 +0200)]
Merge pull request #66506 from Matan-B/wip-matanb-crimson-seastore-extent-race-v3

crimson/os/seastore/cache: Verify crc prior to complete_io

Reviewed-by: Xuehan Xu <xuxuehan@qianxin.com>
2 months agorgw/radosgw-admin.cc: use new apis for admin cli command
Jiffin Tony Thottan [Mon, 17 Nov 2025 12:34:50 +0000 (18:04 +0530)]
rgw/radosgw-admin.cc: use new apis for admin cli command

Signed-off-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
2 months agorgw/adminops: support for adding restore operation
Jiffin Tony Thottan [Thu, 9 Oct 2025 13:43:52 +0000 (19:13 +0530)]
rgw/adminops: support for adding restore operation

Add support for the restore CLI in admin ops, and add the existing list and status CLIs.

Fixes: https://tracker.ceph.com/issues/70931
Signed-off-by: Jiffin Tony Thottan <thottanjiffin@gmail.com>
2 months agoMerge pull request #65689 from VallariAg/wip-listener-mon-cmd
Vallari Agrawal [Tue, 9 Dec 2025 07:25:12 +0000 (12:55 +0530)]
Merge pull request #65689 from VallariAg/wip-listener-mon-cmd

mon: Add command "nvme-gw listeners"

2 months agoMerge pull request #66549 from tchaikov/wip-cpp-btree-silence-warnings
Kefu Chai [Tue, 9 Dec 2025 05:09:15 +0000 (13:09 +0800)]
Merge pull request #66549 from tchaikov/wip-cpp-btree-silence-warnings

include/cpp-btree: fix array bounds warning in child() accessors

Reviewed-by: Edwin Rodriguez <edwin.rodriguez1@ibm.com>
2 months agoMerge pull request #66562 from Germano0/manual-deployment.osd
bluikko [Tue, 9 Dec 2025 04:43:28 +0000 (11:43 +0700)]
Merge pull request #66562 from Germano0/manual-deployment.osd

doc: add ceph.conf in osd creation

2 months agodoc: add ceph.conf in osd creation
Germano Massullo (Thetra) [Tue, 9 Dec 2025 00:23:31 +0000 (01:23 +0100)]
doc: add ceph.conf in osd creation

Signed-off-by: Germano Massullo (Thetra) <germano.massullo@thetra.eu>
2 months agoMerge pull request #66304 from NitzanMordhai/wip-nitzan-objecter-osdmap-request-override
SrinivasaBharathKanta [Mon, 8 Dec 2025 22:49:46 +0000 (04:19 +0530)]
Merge pull request #66304 from NitzanMordhai/wip-nitzan-objecter-osdmap-request-override

Objecter: respect higher epoch subscription in tick

2 months agoMerge pull request #66560 from anthonyeleven/arminfo
Anthony D'Atri [Mon, 8 Dec 2025 22:22:30 +0000 (17:22 -0500)]
Merge pull request #66560 from anthonyeleven/arminfo

doc/start: Add ARM support note to hardware-recommendations.rst

2 months agodoc/start: Add ARM support note to hardware-recommendations.rst
Anthony D'Atri [Mon, 8 Dec 2025 19:48:58 +0000 (14:48 -0500)]
doc/start: Add ARM support note to hardware-recommendations.rst

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
2 months agoMerge pull request #66020 from bluikko/doc-conf-auth-config-ref-links-rados
Ilya Dryomov [Mon, 8 Dec 2025 19:10:26 +0000 (20:10 +0100)]
Merge pull request #66020 from bluikko/doc-conf-auth-config-ref-links-rados

doc: Use validated links with ref instead of external links

Reviewed-by: Kefu Chai <tchaikov@gmail.com>
Reviewed-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2 months agocrimson/os/seastore/lba_mapping: Introduce co_refresh wip-matanb-crimson-seastore-extent-race-v3
Matan Breizman [Mon, 8 Dec 2025 12:54:36 +0000 (12:54 +0000)]
crimson/os/seastore/lba_mapping: Introduce co_refresh

See comment:
```
  //TODO: should be changed to return future<> once all calls
  //   to refresh are through co_await. We return LBAMapping
  //   for now to avoid mandating the callers to make sure
  //   the life of the lba mapping survives the refresh.
```

For now, introduce co_refresh and mark the existing refresh as
deprecated. Follow-up work will audit all the existing users of
refresh and move them to the new method. That change is not trivial,
so I prefer to follow up on it in a separate PR.

This should help avoid use-after-return (UAR) at suspension points:
```
==103588==ERROR: AddressSanitizer: stack-use-after-return on address 0xffff80197e90 at pc 0xaaaacb941b24 bp 0xffff7e48dd80 sp 0xffff7e48dd78
READ of size 8 at 0xffff80197e90 thread T1
    #0 0xaaaacb941b20 in boost::intrusive_ptr<crimson::os::seastore::LBACursor>::swap(boost::intrusive_ptr<crimson::os::seastore::LBACursor>&) /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:172:18
    #1 0xaaaacb941998 in boost::intrusive_ptr<crimson::os::seastore::LBACursor>::operator=(boost::intrusive_ptr<crimson::os::seastore::LBACursor>&&) /opt/ceph/include/boost/smart_ptr/intrusive_ptr.hpp:93:61
    #2 0xaaaacb933758 in crimson::os::seastore::LBAMapping::operator=(crimson::os::seastore::LBAMapping&&) /ceph/src/crimson/os/seastore/lba_mapping.h:46:48
    #3 0xaaaacde2fa54 in ... crimson::os::seastore::LBAMapping&&, std::array<crimson::os::seastore::LBAManager::remap_entry_t, 1ul>) (.resume) /ceph/src/crimson/os/seastore/transaction_manager.h:1282:11
```

The deprecation marker is commented out since otherwise make check would fail.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 months agocrimson/os/seastore/cache: Verify crc prior to complete_io
Matan Breizman [Sun, 7 Dec 2025 10:36:28 +0000 (10:36 +0000)]
crimson/os/seastore/cache: Verify crc prior to complete_io

Calling check_full_extent_integrity in pin_to_extent is wrong.
By the time we have partially read the extent, another transaction
might have fully loaded the entire extent, either by a full_extent
read or by filling the missing unloaded extent gap.
This would result in verifying the extent crc since it's fully
loaded. However, since the extent was only partially read, our
data (and crc) might be outdated.
Instead, move the crc checks to read_extent prior to complete_io.
That way we would only check the crc when the extent was intended to
be fully loaded. Other users of read_extent which do not use pin_crc
(CRC_NULL) would skip this check.

Fixes: https://tracker.ceph.com/issues/73790
Fixes: https://tracker.ceph.com/issues/72864
Signed-off-by: Zhang Song <zhangsong02@qianxin.com>
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 months agocrimson/os/seastore/cache: get_absent_extent to accept pin_crc
Matan Breizman [Sun, 7 Dec 2025 10:31:41 +0000 (10:31 +0000)]
crimson/os/seastore/cache: get_absent_extent to accept pin_crc

Allow pin_to_extent_by_type and pin_to_extent to pass the pin CRC.
As get_absent_extent and read_extent have other users which are not
pin_to_extent, use CRC_NULL as the default value.
This change would allow us to move the integrity checks to _read_extent.
See next commit.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 months agocrimson/os/seastore/transaction_manager: add logs pin_to_extent
Matan Breizman [Sun, 7 Dec 2025 10:13:08 +0000 (10:13 +0000)]
crimson/os/seastore/transaction_manager: add logs pin_to_extent

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 months agocrimson/os/seastore/transaction_manager: pin_to_extent_by_type into
Matan Breizman [Sun, 7 Dec 2025 10:07:17 +0000 (10:07 +0000)]
crimson/os/seastore/transaction_manager: pin_to_extent_by_type into coroutines

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 months agocrimson/os/seastore/transaction_manager: pin_to_extent into coroutines
Matan Breizman [Sun, 7 Dec 2025 10:03:58 +0000 (10:03 +0000)]
crimson/os/seastore/transaction_manager: pin_to_extent into coroutines

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 months agocrimson/os/seastore/transaction_manager: add logs to remap_pin
Matan Breizman [Sun, 7 Dec 2025 15:15:29 +0000 (15:15 +0000)]
crimson/os/seastore/transaction_manager: add logs to remap_pin

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 months agocrimson/os/seastore/transaction_manager: remap_pin into coroutines
Matan Breizman [Sun, 7 Dec 2025 14:51:44 +0000 (14:51 +0000)]
crimson/os/seastore/transaction_manager: remap_pin into coroutines

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 months agoMerge pull request #66070 from samarahu/rgw-d4n-unit-test-fix
Samarah Uriarte [Mon, 8 Dec 2025 14:32:35 +0000 (08:32 -0600)]
Merge pull request #66070 from samarahu/rgw-d4n-unit-test-fix

Reviewed-by: Pritha Srivastava <prsrivas@redhat.com>
2 months agoinclude/cpp-btree: fix array bounds warning in child() accessors
Kefu Chai [Mon, 8 Dec 2025 08:29:00 +0000 (16:29 +0800)]
include/cpp-btree: fix array bounds warning in child() accessors

Replace array indexing with pointer arithmetic in child() and
mutable_child() methods to avoid compiler warning when accessing
child nodes beyond the static array bounds.

The original code was functionally correct but triggered
-Warray-bounds when accessing mutable_child(32) during btree
operations. Using pointer arithmetic achieves the same result
without the bounds check warning.

This is a follow-up to commit 8458a19ab which fixed similar
warnings in other btree_node methods.

No functional changes.

Fixes: https://tracker.ceph.com/issues/72477
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
2 months agoMerge pull request #66402 from afreen23/remove-ng
Afreen Misbah [Mon, 8 Dec 2025 10:03:05 +0000 (15:33 +0530)]
Merge pull request #66402 from afreen23/remove-ng

mgr/dashboard: Remove ng script

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: pujaoshahu <pshahu@redhat.com>
2 months agoMerge pull request #62002 from mohit84/crimson_mclock_1
Matan Breizman [Mon, 8 Dec 2025 09:40:59 +0000 (11:40 +0200)]
Merge pull request #62002 from mohit84/crimson_mclock_1

crimson: Support mclock for crimson

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
2 months agocrimson/os/seastore/transaction_manager: Move CRC_NULL checks
Matan Breizman [Sun, 7 Dec 2025 09:53:03 +0000 (09:53 +0000)]
crimson/os/seastore/transaction_manager: Move CRC_NULL checks

check_full_extent_integrity would allow for CRC_NULL (0) checksums
when `full_extent_integrity_check` was false. Instead, move this
check as an assertion into TransactionManager::read_pin.

This would allow us to reuse the CRC_NULL concept for more purposes,
e.g., skipping integrity checks when no CRC is passed (next commits).

Note: With this change check_full_extent_integrity could be called
      only on non CRC_NULL, the check would be moved in next commits.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 months agoscript: sanitize git branch names for OCI tag compliance
Kefu Chai [Tue, 25 Nov 2025 11:02:32 +0000 (19:02 +0800)]
script: sanitize git branch names for OCI tag compliance

Git branch names commonly use forward slashes for hierarchy
(e.g., feature/my-branch), but OCI container image tags cannot
contain slashes. This causes build-with-container.py to fail when
building images from branches with slashes in their names.

Add _sanitize_for_oci_tag() function to convert branch names into
OCI-compliant tags by:
- Replacing '/' with '-'
- Replacing other invalid characters with '_'
- Stripping leading invalid characters
- Truncating to 128 characters (OCI tag max length)

Apply sanitization consistently to both auto-detected branches
and manually specified --current-branch arguments.

Fixes branch name handling for pull request branches and other
hierarchical naming schemes.

References:
- OCI Image Spec: tags must match [a-zA-Z0-9_][a-zA-Z0-9._-]{0,127}
  https://github.com/opencontainers/image-spec/blob/main/descriptor.md
- Git allows '/' in branch names for hierarchical organization
  https://git-scm.com/docs/git-check-ref-format
  https://docs.github.com/en/get-started/using-git/dealing-with-special-characters-in-branch-and-tag-names
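
The sanitization steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the function added to build-with-container.py: the name `sanitize_for_oci_tag_sketch` is hypothetical, and the character classes are taken from the tag pattern quoted in the references.

```python
import re

MAX_TAG_LEN = 128  # tag length limit named in the commit message


def sanitize_for_oci_tag_sketch(branch: str) -> str:
    """Convert a git branch name into a string usable as an OCI tag (hypothetical helper)."""
    tag = branch.replace("/", "-")              # '/' becomes '-'
    tag = re.sub(r"[^a-zA-Z0-9._-]", "_", tag)  # other invalid characters become '_'
    tag = re.sub(r"^[^a-zA-Z0-9_]+", "", tag)   # a tag may not start with '.' or '-'
    return tag[:MAX_TAG_LEN]                    # truncate to the maximum length


print(sanitize_for_oci_tag_sketch("feature/my-branch"))  # feature-my-branch
```

A branch name made up only of characters that get stripped (for example `--`) yields an empty string in this sketch, so a caller would still need to handle that case.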

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
2 months agoMerge pull request #65729 from tchaikov/cmake-link-ceph-common
Kefu Chai [Mon, 8 Dec 2025 07:38:31 +0000 (15:38 +0800)]
Merge pull request #65729 from tchaikov/cmake-link-ceph-common

cmake: convert erasure_code and json_spirit to OBJECT libraries

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months agodoc: Use validated links with ref instead of external links
Ville Ojamo [Wed, 22 Oct 2025 07:19:31 +0000 (14:19 +0700)]
doc: Use validated links with ref instead of external links

Use :ref: for intra-docs links that are validated, instead of external
links.
Only use already existing labels.
Fixes a few anchors that pointed to now-renamed section titles.
Use automatically generated link text where appropriate.
Delete unused link definitions.

Mostly in doc/rados/ but also a few in doc/rbd/.
Try to fix all links in each of the changed documents.

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
2 months agoMerge pull request #66489 from joscollin/wip-drop-double-check
Venky Shankar [Mon, 8 Dec 2025 04:34:42 +0000 (10:04 +0530)]
Merge pull request #66489 from joscollin/wip-drop-double-check

mds: drop checking CEPH_MDS_OP_SETLAYOUT two times

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
2 months agocrimson/os/seastore: Introduce check_full_extent_integrity helper
Matan Breizman [Wed, 3 Dec 2025 13:02:27 +0000 (13:02 +0000)]
crimson/os/seastore: Introduce check_full_extent_integrity helper

This is an intermediate step to move the crc checks to Cache logic.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
2 months agoMerge pull request #66158 from xxhdx1985126/wip-seastore-minor-perf-issue
Matan Breizman [Sun, 7 Dec 2025 08:16:57 +0000 (10:16 +0200)]
Merge pull request #66158 from xxhdx1985126/wip-seastore-minor-perf-issue

crimson/os/seastore/epm: avoid unnecessary container copies

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
2 months agocmake: convert erasure_code and json_spirit to OBJECT libraries wip-pr-65729-kefu
Kefu Chai [Tue, 30 Sep 2025 10:38:31 +0000 (18:38 +0800)]
cmake: convert erasure_code and json_spirit to OBJECT libraries

This resolves a circular dependency issue where ceph-common was linked
against erasure_code and json_spirit static libraries, while these
libraries themselves referenced symbols from ceph-common, creating an
unresolvable circular dependency. The static libraries were incorrectly
marked PUBLIC, causing executables linking against ceph-common to also
link against them directly.

The circular dependency manifested as linker errors in tests like
ceph_test_ino_release_cb, where libjson_spirit.a and liberasure_code.a
contained undefined references to ceph-common symbols (e.g.,
ceph::__ceph_assert_fail and get_str_list) that couldn't be resolved
due to the linking order.

For instance, ceph_test_ino_release_cb failed to link:
```
/usr/bin/ld: ../../../lib/libcephfs.so.2.0.0: undefined reference to symbol '_ZN4ceph18__ceph_assert_failERKNS_11assert_dataE'
/usr/bin/ld: ../../../lib/libceph-common.so.2: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
```

Changes:

- Convert erasure_code and json_spirit from STATIC to OBJECT libraries
- Embed their object files directly into ceph-common during linking,
  breaking the circular dependency and preventing public propagation
  of these dependencies to downstream targets
- Remove direct linkage from targets that already get these symbols
  through ceph-common dependencies:
  * Erasure code unit tests (unittest_erasure_code_isa,
    unittest_erasure_code_plugin_isa, unittest_erasure_code_example)
  * Test libraries (radostest)
  * Plugins (denc-mod-osd, cls_refcount, cls_rgw, cls_lua, ec_lrc)

This also prevents ODR violations that would occur if targets linked
against these libraries directly while also getting them from
ceph-common. Such violations cause undefined behavior including
segmentation faults, as static variables and vtables would exist in
duplicate, leading to crashes during destruction or when accessing
shared state.

For example, before removing direct linkage from plugins, ceph-dencoder
would segfault on certain object types:

  /ceph/src/test/encoding/readable.sh: line 111: Segmentation fault
  $CEPH_DENCODER type ScrubMap import ... decode encode decode dump_json

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
2 months agomgr/dashboard: add API endpoint for consistency group name update
Imran Imtiaz [Fri, 5 Dec 2025 08:46:40 +0000 (08:46 +0000)]
mgr/dashboard: add API endpoint for consistency group name update

Signed-off-by: Imran Imtiaz <imran.imtiaz@uk.ibm.com>
Fixes: https://tracker.ceph.com/issues/74121
Add a dashboard API endpoint to update (rename) consistency groups.

2 months agomgr/cephadm: Fix mgmt-gateway default port in get_port_start()
Redouane Kachach [Fri, 28 Nov 2025 08:38:45 +0000 (09:38 +0100)]
mgr/cephadm: Fix mgmt-gateway default port in get_port_start()

The mgmt-gateway port was already defaulted to 443 in most places, but
get_port_start() did not apply this default. Since the output of
get_port_start() is used both to configure the daemon ports and,
later, to open them in firewalld, this inconsistency meant the
HTTPS port was not opened when the firewalld service was active.

This change makes get_port_start() also default to port 443, ensuring
the daemon is configured correctly and the corresponding firewalld port
is opened as expected.
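
A minimal sketch of the behavior described above, assuming a simplified spec object. `GatewaySpecSketch` and `DEFAULT_MGMT_GATEWAY_PORT` are hypothetical stand-ins and not the cephadm classes; only the port value 443 and the method name get_port_start() come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_MGMT_GATEWAY_PORT = 443  # default stated in the commit message


@dataclass
class GatewaySpecSketch:
    """Hypothetical, simplified stand-in for the mgmt-gateway service spec."""

    port: Optional[int] = None

    def get_port_start(self) -> List[int]:
        # Apply the same default here as elsewhere, so that the daemon
        # configuration and the firewall rule derived from it agree.
        if self.port is not None:
            return [self.port]
        return [DEFAULT_MGMT_GATEWAY_PORT]


print(GatewaySpecSketch().get_port_start())           # [443]
print(GatewaySpecSketch(port=8443).get_port_start())  # [8443]
```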

Fixes: https://tracker.ceph.com/issues/74015
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
2 months agoMerge pull request #65274 from ifed01/wip-ifed-more-max-lat
Igor Fedotov [Wed, 3 Dec 2025 14:46:08 +0000 (17:46 +0300)]
Merge pull request #65274 from ifed01/wip-ifed-more-max-lat

os/bluestore: track max latencies for key bluestore/bluefs perf counters

Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>
2 months agosrc/test/crimson/CMakeLists: include dmclock wip-moagrawa-crimson-mclock-debug
Mohit Agrawal [Thu, 6 Nov 2025 12:52:23 +0000 (18:22 +0530)]
src/test/crimson/CMakeLists: include dmclock

Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
2 months agocrimson/CMakeLists.txt: include dmclock
Mohit Agrawal [Mon, 28 Jul 2025 14:44:47 +0000 (20:14 +0530)]
crimson/CMakeLists.txt: include dmclock

Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
2 months agocrimson/mclock_scheduler: Support mclock for crimson
Mohit Agrawal [Mon, 28 Jul 2025 13:25:41 +0000 (18:55 +0530)]
crimson/mclock_scheduler: Support mclock for crimson

This patch tries to bring the mclock source of crimson in line with
that of classic OSDs. Currently crimson uses the feature only for
background recovery operations, but later we will use it for other
OSD operations as well. To use it, the user needs to configure the
crimson_osd_scheduler_concurrency parameter for the OSD.

Replace item_t with WorkItem variant to maintain similarity
with classic OSD.

Introduce cost and priority as part of item_t.

Fixes: https://tracker.ceph.com/issues/67367
Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
2 months agocrimson/background_recovery: switch to unified SchedulerClass and introduce get_avera...
Mohit Agrawal [Mon, 28 Jul 2025 13:19:14 +0000 (18:49 +0530)]
crimson/background_recovery: switch to unified SchedulerClass and introduce get_average_object_size for pg

1) Replace usage of crimson::osd::scheduler::scheduler_class_t
   with unified SchedulerClass
2) Add priority to scheduler params structure
3) Introduce get_average_object_size for pg

Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
2 months agocrimson/osd: Refactor crimson scheduler wrapper
Mohit Agrawal [Mon, 28 Jul 2025 13:03:38 +0000 (18:33 +0530)]
crimson/osd: Refactor crimson scheduler wrapper

Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
2 months agocrimson/osd,osd_operation: initialize mClock scheduler, detect rotational devices...
Mohit Agrawal [Mon, 28 Jul 2025 12:48:57 +0000 (18:18 +0530)]
crimson/osd,osd_operation: initialize mClock scheduler, detect rotational devices, and run OperationThrottler background task

Initialize the mClock scheduler on all shards when the device class
is non-rotational. If the device is rotational, throw an exception
to prevent unsupported configurations.

In addition, introduce a background task in OperationThrottler that
continuously dequeues and schedules client requests from the mClock
scheduler based on available credits and throttling limits.

Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
2 months agocommon/mclock_common: Declared MonClient for crimson in mclock_common
Mohit Agrawal [Mon, 28 Jul 2025 12:08:07 +0000 (17:38 +0530)]
common/mclock_common: Declared MonClient for crimson in mclock_common

Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>
2 months agoMerge pull request #66453 from aainscow/lrc_fix
NitzanMordhai [Wed, 3 Dec 2025 09:55:59 +0000 (11:55 +0200)]
Merge pull request #66453 from aainscow/lrc_fix

osd: Perform shard look up correctly in partial EC writes

2 months agoMerge pull request #66384 from bill-scales/issue72879
NitzanMordhai [Wed, 3 Dec 2025 09:54:19 +0000 (11:54 +0200)]
Merge pull request #66384 from bill-scales/issue72879

Fix teuthology timeout issues with bluestore software compression and improve thread heartbeat timeout code

2 months agomgr/cephadm: fix nvmeof TLS handling and add coverage for ssl/mTLS
Redouane Kachach [Wed, 3 Dec 2025 09:36:25 +0000 (10:36 +0100)]
mgr/cephadm: fix nvmeof TLS handling and add coverage for ssl/mTLS

This PR fixes the value of the `ssl` field on `NvmeofServiceSpec` (it
was always set to enable_auth) and adds some unit tests to make sure
that specs with ssl only and specs with mTLS enabled (enable_auth)
both generate the expected daemon configuration.

Fixes: https://tracker.ceph.com/issues/74073
Signed-off-by: Redouane Kachach <rkachach@ibm.com>
2 months agoMerge PR #66093 into main
Venky Shankar [Wed, 3 Dec 2025 06:40:18 +0000 (12:10 +0530)]
Merge PR #66093 into main

* refs/pull/66093/head:

Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Anoop C S <anoopcs@cryptolab.net>
Reviewed-by: Shwetha Acharya <sacharya@redhat.com>
2 months agomds: drop checking CEPH_MDS_OP_SETLAYOUT two times
Jos Collin [Tue, 2 Dec 2025 12:58:13 +0000 (18:28 +0530)]
mds: drop checking CEPH_MDS_OP_SETLAYOUT two times

Signed-off-by: Jos Collin <jcollin@redhat.com>
2 months agoorch/cephadm: Fixes an unlimited env append in cephadm agent
Rafal Wadolowski [Tue, 2 Dec 2025 12:06:29 +0000 (13:06 +0100)]
orch/cephadm: Fixes an unlimited env append in cephadm agent

Check whether the environment variable exists before adding it.
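
A minimal sketch of that check, assuming the environment is kept as a list of `NAME=value` strings. The helper name `add_env_once` and the example variable are hypothetical and not taken from the cephadm agent code.

```python
from typing import List


def add_env_once(env: List[str], name: str, value: str) -> List[str]:
    """Append NAME=value only if NAME is not already set in env (hypothetical helper)."""
    prefix = f"{name}="
    if not any(entry.startswith(prefix) for entry in env):
        env.append(f"{prefix}{value}")
    return env


env: List[str] = []
for _ in range(3):
    add_env_once(env, "EXAMPLE_VAR", "1")
print(env)  # ['EXAMPLE_VAR=1']
```

Without the membership check, each call would append another copy and the list would grow without bound.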

Fixes: https://tracker.ceph.com/issues/74053
Signed-off-by: Rafal Wadolowski <rafal.wadolowski@cleura.com>
2 months agoMerge pull request #66339 from rhcs-dashboard/cephfs-mirror-list-endpoint
Pedro Gonzalez Gomez [Tue, 2 Dec 2025 11:28:31 +0000 (12:28 +0100)]
Merge pull request #66339 from rhcs-dashboard/cephfs-mirror-list-endpoint

mgr/dashboard: add GET endpoints for CephFS mirror peers

Reviewed-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>
2 months agoMerge pull request #66461 from imran-imtiaz/dashboard
Imran Imtiaz [Tue, 2 Dec 2025 09:59:59 +0000 (09:59 +0000)]
Merge pull request #66461 from imran-imtiaz/dashboard

mgr/dashboard: add API endpoint to delete images from consistency groups

2 months agoMerge pull request #66440 from tchaikov/wip-avoid-odr
Kefu Chai [Tue, 2 Dec 2025 04:46:55 +0000 (12:46 +0800)]
Merge pull request #66440 from tchaikov/wip-avoid-odr

osd: fix ODR violation in max_prio_map

Reviewed-by: Matan Breizman <mbreizma@ibm.com>
2 months agoMerge PR #66328 into main
Patrick Donnelly [Tue, 2 Dec 2025 02:13:41 +0000 (21:13 -0500)]
Merge PR #66328 into main

* refs/pull/66328/head:
mon/HealthMonitor: avoid MON_DOWN for freshly added Monitor
mon: add time_added to mon_info_t
common/options: add missing runtime flag
mon/MonMap: cleanup initialization

Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
2 months agoMerge pull request #66462 from cbodley/wip-cmake-breakpad-arch2
Casey Bodley [Mon, 1 Dec 2025 21:22:30 +0000 (16:22 -0500)]
Merge pull request #66462 from cbodley/wip-cmake-breakpad-arch2

cmake: fix for -DWITH_BREAKPAD=OFF

Reviewed-by: Joseph Mundackal <jmundackal@bloomberg.net>
2 months agoqa/d4n: Update host checks
Samarah [Tue, 18 Nov 2025 16:21:40 +0000 (16:21 +0000)]
qa/d4n: Update host checks

Signed-off-by: Samarah <samarah.uriarte@ibm.com>
2 months agotest/d4n: Add temporary eviction testing
Samarah [Mon, 24 Nov 2025 20:14:13 +0000 (20:14 +0000)]
test/d4n: Add temporary eviction testing

Signed-off-by: Samarah <samarah.uriarte@ibm.com>
2 months agorgw/d4n: Add yield parameter to get_free_space
Samarah Uriarte [Mon, 27 Oct 2025 21:35:45 +0000 (16:35 -0500)]
rgw/d4n: Add yield parameter to get_free_space

Signed-off-by: Samarah Uriarte <samarah.uriarte@ibm.com>
2 months agoMerge pull request #66431 from cbodley/wip-doc-release-os-recommendations
Ilya Dryomov [Mon, 1 Dec 2025 19:05:43 +0000 (20:05 +0100)]
Merge pull request #66431 from cbodley/wip-doc-release-os-recommendations

doc: add Tentacle to os recommendations

Reviewed-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2 months agodoc: remove redundant note about tested container hosts
Casey Bodley [Mon, 1 Dec 2025 17:53:59 +0000 (12:53 -0500)]
doc: remove redundant note about tested container hosts

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2 months agodoc: remove old distros from os recommendations
Casey Bodley [Mon, 1 Dec 2025 17:53:42 +0000 (12:53 -0500)]
doc: remove old distros from os recommendations

Signed-off-by: Casey Bodley <cbodley@redhat.com>
2 months agocmake: fix for -DWITH_BREAKPAD=OFF wip-cmake-breakpad-arch2
Casey Bodley [Mon, 1 Dec 2025 15:25:16 +0000 (10:25 -0500)]
cmake: fix for -DWITH_BREAKPAD=OFF

in 1ba55a20be1023c585ba96617dc6a9d2aa79a51b, i tried to avoid the NOT
condition by swapping the option's defaults. but when the condition is
false, the option is forced to ON even if the user manually set it OFF

fix this by inverting the condition and swapping the default values

Reported-by: Joseph Mundackal <joseph.j.mundackal@gmail.com>
Signed-off-by: Casey Bodley <cbodley@redhat.com>
2 months agoMerge pull request #63794 from clwluvw/enc-copy
Casey Bodley [Mon, 1 Dec 2025 15:33:11 +0000 (10:33 -0500)]
Merge pull request #63794 from clwluvw/enc-copy

rgw: implement CopyObject for encrypted objects

Reviewed-by: Casey Bodley <cbodley@redhat.com>
2 months agomgr/dashboard: add API endpoint to delete images from consistency groups
Imran Imtiaz [Mon, 1 Dec 2025 14:25:07 +0000 (14:25 +0000)]
mgr/dashboard: add API endpoint to delete images from consistency groups

Signed-off-by: Imran Imtiaz <imran.imtiaz@uk.ibm.com>
Fixes: https://tracker.ceph.com/issues/74033
Create a consistency group dashboard API endpoint that enables removal
of RBD images from the group.

2 months agoMerge pull request #66410 from aclamk/aclamk-encode-fix-debug-macro
Adam Kupczyk [Mon, 1 Dec 2025 14:09:05 +0000 (15:09 +0100)]
Merge pull request #66410 from aclamk/aclamk-encode-fix-debug-macro

encode: Fix bad use of DENC_DUMP_PRE

2 months agoosd: fix ODR violation in max_prio_map
Kefu Chai [Thu, 27 Nov 2025 12:28:31 +0000 (20:28 +0800)]
osd: fix ODR violation in max_prio_map

The static std::map max_prio_map was defined in the osd_types.h header
file, causing every translation unit that included this header to get
its own copy of the variable. This led to One Definition Rule (ODR)
violations where multiple instances of the same variable existed at
runtime.

During program cleanup, destructors for these multiple instances would
attempt to free the same memory regions, resulting in segmentation
faults in tcmalloc/memory allocator as seen with ceph-dencoder.

This issue surfaced after a not-yet-merged change that converts erasure_code
and json_spirit to OBJECT libraries. Before that change, these were
STATIC libraries that were linked via target_link_libraries. The
incorrect linkage meant their object files (and thus their copies of
max_prio_map) were kept separate and didn't conflict at runtime.

After converting to OBJECT libraries and properly incorporating them
into libceph-common.so (commit 8b0e3fb2c23), the multiple copies of
max_prio_map from different translation units all ended up in the same
shared library, exposing the ODR violation. During program exit, the
dynamic linker attempted to run destructors for all instances, leading
to double-free crashes.

Fix by moving the map into a static helper function in PeeringState.cc
(the only file that uses it). The map is now a function-local static
const variable, ensuring a single instance that is properly initialized
and destructed.
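
The pattern described above can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper names `get_max_prio_map` and `max_prio_for` and the map entries are hypothetical, since the real contents of max_prio_map are not shown here.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for the helper described above. The real keys and
// values of max_prio_map are not shown in this log, so these are made up.
static const std::map<int, int>& get_max_prio_map()
{
  // Function-local static: constructed once on the first call (thread-safe
  // since C++11) and destroyed once at program exit.
  static const std::map<int, int> max_prio_map = {
    {1, 10},  // illustrative entries only
    {2, 20},
  };
  return max_prio_map;
}

// Illustrative lookup: returns `fallback` when the key is absent.
static int max_prio_for(int key, int fallback)
{
  const auto& m = get_max_prio_map();
  assert(!m.empty());
  auto it = m.find(key);
  return it == m.end() ? fallback : it->second;
}
```

Because the map lives inside a function defined in a single .cc file, nothing in the header defines an object, so including the header from many translation units no longer creates extra copies.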

Backtrace before fix:
```
    #0  0x00007ffff7dbb1a0 in tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int) () from /lib/x86_64-linux-gnu/libtcmalloc.so.4
    #1  0x00007ffff7dbb57f in tcmalloc::ThreadCache::Scavenge() () from /lib/x86_64-linux-gnu/libtcmalloc.so.4
    #2  0x00007ffff6bc8aa2 in std::__new_allocator<std::_Rb_tree_node<std::pair<int const, int> > >::deallocate (this=0x7ffff7d48f78 <max_prio_map>, __p=0x555555f43890, __n=1)
    #3  0x00007ffff6bc89f9 in std::allocator<std::_Rb_tree_node<std::pair<int const, int> > >::deallocate (this=0x7ffff7d48f78 <max_prio_map>, __p=0x555555f43890, __n=1)
    #4  std::allocator_traits<std::allocator<std::_Rb_tree_node<std::pair<int const, int> > > >::deallocate (__a=..., __p=0x555555f43890, __n=1)
    #5  std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_put_node (this=0x7ffff7d48f78 <max_prio_map>, __p=0x555555f43890)
    #6  0x00007ffff6bc892e in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_drop_node (this=0x7ffff7d48f78 <max_prio_map>, __p=0x555555f43890)
    #7  0x00007ffff6bc886e in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_erase (this=0x7ffff7d48f78 <max_prio_map>, __x=0x555555f43890)
    #8  0x00007ffff6bc8854 in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_erase (this=0x7ffff7d48f78 <max_prio_map>, __x=0x555555f43cb0)
    #9  0x00007ffff6bc8854 in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_erase (this=0x7ffff7d48f78 <max_prio_map>, __x=0x555555f43ad0)
    #10 0x00007ffff6bc8805 in std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::~_Rb_tree (this=0x7ffff7d48f78 <max_prio_map>)
    #11 0x00007ffff6bc7345 in std::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >::~map (this=0x7ffff7d48f78 <max_prio_map>)
    #12 0x00007ffff484bd51 in __cxa_finalize (d=0x7ffff7d3f440) at ./stdlib/cxa_finalize.c:97
    #13 0x00007ffff6af9487 in __do_global_dtors_aux () from /home/kefu/dev/ceph/build/lib/libceph-common.so.2
    #14 0x00007ffff7fbfd20 in ?? ()
    #15 0x00007ffff7fc8fc2 in _dl_call_fini (closure_map=0x7fffffffd0f0, closure_map@entry=0x7ffff7fbfd20) at ./elf/dl-call_fini.c:43
    #16 0x00007ffff7fcbe72 in _dl_fini () at ./elf/dl-fini.c:120
    #17 0x00007ffff484c291 in __run_exit_handlers (status=0, listp=0x7ffff49f1680 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:118
    #18 0x00007ffff484c35a in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:148
    #19 0x00007ffff4833caf in __libc_start_call_main (main=main@entry=0x55555556cd90 <main(int, char const**)>, argc=argc@entry=2, argv=argv@entry=0x7fffffffd488) at ../sysdeps/nptl/libc_start_call_main.h:74
    #20 0x00007ffff4833d65 in __libc_start_main_impl (main=0x55555556cd90 <main(int, char const**)>, argc=2, argv=0x7fffffffd488, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd478) at ../csu/libc-start.c:360
    #21 0x00005555555695e1 in _start ()
```

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
2 months agoMerge pull request #66407 from rhcs-dashboard/redirect-other-propery-name
Nizamudeen A [Mon, 1 Dec 2025 12:35:39 +0000 (18:05 +0530)]
Merge pull request #66407 from rhcs-dashboard/redirect-other-propery-name

mgr/dashboard: support custom prop for table item redirection

Reviewed-by: Afreen Misbah <afreen@ibm.com>
2 months agoMerge pull request #66437 from rhcs-dashboard/fix-74008-main
afreen23 [Mon, 1 Dec 2025 11:56:07 +0000 (17:26 +0530)]
Merge pull request #66437 from rhcs-dashboard/fix-74008-main

mgr/dashboard: fix multi-cluster route reload logic

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
2 months agoMerge pull request #66346 from rhcs-dashboard/fix-service-form
afreen23 [Mon, 1 Dec 2025 11:54:04 +0000 (17:24 +0530)]
Merge pull request #66346 from rhcs-dashboard/fix-service-form

mgr/dashboard: service creation fails if service name is same as service type

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
2 months agoMerge pull request #66436 from rhcs-dashboard/add-sagar--to-mailmap-githubmap-organiz...
afreen23 [Mon, 1 Dec 2025 09:29:57 +0000 (14:59 +0530)]
Merge pull request #66436 from rhcs-dashboard/add-sagar--to-mailmap-githubmap-organizationmap

add Sagar Gopale to githubmap mailmap organizationmap

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Abhishek Desai <abhishek.desai1@ibm.com>
2 months agoMerge pull request #65433 from mohit84/repeer_on_acting
SrinivasaBharathKanta [Mon, 1 Dec 2025 09:27:14 +0000 (14:57 +0530)]
Merge pull request #65433 from mohit84/repeer_on_acting

test: repeer_on_down_acting_member_coming_back is continuously failing

2 months agomgr/dashboard: support custom prop for table item redirection
Nizamudeen A [Tue, 25 Nov 2025 11:31:31 +0000 (17:01 +0530)]
mgr/dashboard: support custom prop for table item redirection

use an extra customTemplateConfig called `customRowProperty` where
you can provide the key of the property you wish to route, instead of
relying on the cell's prop itself

Fixes: https://tracker.ceph.com/issues/73989
Signed-off-by: Nizamudeen A <nia@redhat.com>
2 months agoosd: Perform shard look up correctly in partial EC writes
Alex Ainscow [Fri, 28 Nov 2025 14:33:13 +0000 (14:33 +0000)]
osd: Perform shard look up correctly in partial EC writes

Plugins are permitted to provide a mapping to change the order in which OSDs
are used. In practice only LRC does this and it is not currently enabled
with optimisations, so this is a theoretical bug.

The bug here was that the "first" shard was assumed to be shard_id_t(0).  However,
this is not true for LRC.

Fixes: https://tracker.ceph.com/issues/74016
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
2 months agoMerge pull request #65771 from aainscow/ec_direct_reads_pr_1
Alex Ainscow [Thu, 27 Nov 2025 23:17:37 +0000 (23:17 +0000)]
Merge pull request #65771 from aainscow/ec_direct_reads_pr_1

EC Direct Reads: First PR, background work

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
2 months agoMerge pull request #66377 from baum/rbd_aio_write_with_crc32c_initial_fix
Ilya Dryomov [Thu, 27 Nov 2025 22:58:38 +0000 (23:58 +0100)]
Merge pull request #66377 from baum/rbd_aio_write_with_crc32c_initial_fix

librbd: rbd_aio_write_with_crc32c store CRC32C with initial value -1 to match msgr2 validation

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
2 months agomgr/dashboard: fix multi-cluster route reload logic
Aashish Sharma [Thu, 27 Nov 2025 09:22:47 +0000 (14:52 +0530)]
mgr/dashboard: fix multi-cluster route reload logic

Issue: Route was being force-reloaded using a two-step navigation hack causing unnecessary redirects and side effects.
Fix: Replaced the hack with Angular’s native same-URL reload using onSameUrlNavigation: 'reload' for a clean, stable route refresh.

Fixes: https://tracker.ceph.com/issues/74008
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
2 months agoqa: Reduce number of osd threads when using compression
Bill Scales [Fri, 21 Nov 2025 10:06:22 +0000 (10:06 +0000)]
qa: Reduce number of osd threads when using compression

Smithi nodes used by teuthology tests have 8 CPU cores and typically run
4 OSD processes. When bluestore software compression is enabled the size
of the OSD thread pool needs to be reduced to 2 threads per OSD because
these threads can easily use 100% of a core. This avoids excessive
amounts of context switching, which leads to OSD threads timing out,
which in turn causes the OSD to drop heartbeat pings and the monitor to
temporarily mark it down. In extreme cases this can lead to PGs getting
stuck in repeated loops of peering until the teuthology test times out.

Context switches happen opportunistically at the end of system calls
so functions with lots of logging are some of the worst affected.

Fixes: https://tracker.ceph.com/issues/72879
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
2 months agoosd: Restrict logging in MissingLoc::add_source_info
Bill Scales [Fri, 21 Nov 2025 10:38:44 +0000 (10:38 +0000)]
osd: Restrict logging in MissingLoc::add_source_info

add_source_info can generate an excessive amount of logging
if a PG has thousands of missing objects. When a system is
under load and threads are repeatedly context switching this
can lead to timeouts (tests showed this function taking up
to 10 seconds to execute with 99% of that time being in
logging calls where the thread was being pre-empted).
Stopping logging after the function has been running for
more than 0.5 seconds strikes a balance between providing
sufficient information to debug problems while providing
more stability when a system is heavily loaded.
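
The idea of cutting off logging once a function has run past a time budget can be sketched as below. This is an assumption-laden illustration rather than the actual change: `LogBudget` and `process_items` are hypothetical names, and a plain `std::ostream` stands in for Ceph's logging macros.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical helper, not Ceph's API: permits logging until a time budget
// measured from construction has been used up.
class LogBudget {
public:
  using clock = std::chrono::steady_clock;
  explicit LogBudget(std::chrono::milliseconds budget)
    : deadline_(clock::now() + budget) {}
  bool allowed() const { return clock::now() < deadline_; }
private:
  clock::time_point deadline_;
};

// Visits every item but only logs while the budget lasts.
// Returns the number of items that were logged.
std::size_t process_items(const std::vector<std::string>& items,
                          std::chrono::milliseconds budget,
                          std::ostream& log)
{
  assert(budget.count() >= 0);
  LogBudget lb(budget);
  std::size_t logged = 0;
  for (const auto& item : items) {
    if (lb.allowed()) {
      log << "add_source_info " << item << '\n';
      ++logged;
    }
    // The real per-item work would continue here whether or not we logged.
  }
  return logged;
}
```

Calling `process_items(items, std::chrono::milliseconds(500), std::cout)` would mirror the 0.5 second threshold mentioned above: early items are logged in full, and later ones are processed silently.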

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
2 months agoosd: Increase log level for listing missing list
Bill Scales [Fri, 21 Nov 2025 10:25:48 +0000 (10:25 +0000)]
osd: Increase log level for listing missing list

Logging the entire contents of a missing list can generate a
1M character log line when there are 8000 missing objects in a
PG. Other places in the code logging the missing list use debug
level 25 which is not enabled by default in teuthology tests.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
2 months agoosd: reset_tp_timeout should reset timeout for all shards
Bill Scales [Mon, 24 Nov 2025 09:18:21 +0000 (09:18 +0000)]
osd: reset_tp_timeout should reset timeout for all shards

ShardedThreadPools are only used by the classic OSD process
which can have more than one thread for the same shard. Each
thread has a heartbeat timeout used to detect stalled threads.
Some code that is known to take a long time makes calls to
reset_tp_timeout to reset this timeout. However for sharded
pools this can be ineffective because it is common for threads
for the same shard to use the same locks (e.g. PG Lock) and
therefore if thread A is taking a long time and resetting
its timeout while holding a lock, thread B for the same shard
is liable to be waiting for the same lock, will not be
resetting its timeout and can be timed out.

Debug for issue 72879 showed heartbeat timeouts occurring at
the same time for both shards. An attempt to fix the problem
by calling reset_tp_timeout for the slow thread still showed
the other threads for the shard timing out waiting for the PG
lock that was held by the slow thread. Looking at the OSD code,
in most places where reset_tp_timeout is called the thread is
holding the PG lock.

This commit moves the concept of shard_index from OSD into
ShardedThreadPool and modifies reset_tp_timeout so that it resets
the timeout for all threads for the same shard.

Some code calls reset_tp_timeout from inside loops that can take
a long time without consideration for how long the thread has
actually been running for. There is a risk that this type of
call could repeatedly reset the timeout for another shard which
is genuinely stuck and hence defeat the heartbeat checks. To
prevent this reset_tp_timeout is modified to be a NOP unless
the thread has been processing the current workitem for more
than 0.5 seconds. Therefore threads have to be slow but making
forward progress to be able to reset the timeout.
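
The two rules described above (reset every thread in the caller's shard, and do nothing unless the caller has been busy for more than 0.5 seconds) can be modelled roughly as follows. This is a toy sketch under stated assumptions, not the ShardedThreadPool implementation: `ShardHeartbeats`, `WorkerHeartbeat`, the round-robin thread-to-shard assignment and the explicit `now` parameter are all hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using clock_type = std::chrono::steady_clock;

// Hypothetical per-thread heartbeat record; not Ceph's real data structure.
struct WorkerHeartbeat {
  std::size_t shard_index;
  clock_type::time_point timeout_at;
};

class ShardHeartbeats {
public:
  ShardHeartbeats(std::size_t num_threads, std::size_t num_shards,
                  std::chrono::seconds grace)
    : grace_(grace)
  {
    assert(num_shards > 0);
    const auto now = clock_type::now();
    for (std::size_t i = 0; i < num_threads; ++i) {
      // Assumed round-robin assignment of threads to shards.
      workers_.push_back({i % num_shards, now + grace_});
    }
  }

  // Extends the timeout of every thread in the caller's shard, but only if
  // the caller has spent more than min_busy on its current work item.
  // Returns the number of heartbeats that were reset.
  std::size_t reset_tp_timeout(std::size_t caller,
                               clock_type::time_point item_started,
                               clock_type::time_point now,
                               std::chrono::milliseconds min_busy =
                                   std::chrono::milliseconds(500))
  {
    if (now - item_started <= min_busy) {
      return 0;  // fast callers cannot mask a thread that is truly stuck
    }
    const std::size_t shard = workers_.at(caller).shard_index;
    std::size_t reset = 0;
    for (auto& w : workers_) {
      if (w.shard_index == shard) {
        w.timeout_at = now + grace_;
        ++reset;
      }
    }
    return reset;
  }

  clock_type::time_point timeout_at(std::size_t i) const
  {
    return workers_.at(i).timeout_at;
  }

private:
  std::chrono::seconds grace_;
  std::vector<WorkerHeartbeat> workers_;
};
```

Passing `now` explicitly keeps the sketch deterministic; real code would read the clock itself and would need synchronization between threads, which is omitted here.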

Fixes: https://tracker.ceph.com/issues/72879
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
3 months agoMerge pull request #65739 from tchaikov/rgw-gap-list-manpage
Kefu Chai [Thu, 27 Nov 2025 04:12:08 +0000 (12:12 +0800)]
Merge pull request #65739 from tchaikov/rgw-gap-list-manpage

debian: include rgw-gap-list manpage and rgw-policy-check in ceph-common

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
Reviewed-by: Matan Breizman <mbreizma@ibm.com>
3 months agomgr/dashboard: add GET endpoint for CephFS mirror peers list and daemon status
Pedro Gonzalez Gomez [Thu, 20 Nov 2025 14:09:03 +0000 (15:09 +0100)]
mgr/dashboard: add GET endpoint for CephFS mirror peers list and daemon status

Fixes: https://tracker.ceph.com/issues/74002
Signed-off-by: Pedro Gonzalez Gomez <pegonzal@ibm.com>
3 months agodoc: remove os recommendations for eol releases
Casey Bodley [Wed, 26 Nov 2025 16:06:16 +0000 (11:06 -0500)]
doc: remove os recommendations for eol releases

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 months agodoc/dev: add os-recommendations.rst to release checklist
Casey Bodley [Wed, 26 Nov 2025 15:44:14 +0000 (10:44 -0500)]
doc/dev: add os-recommendations.rst to release checklist

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 months agodoc: add Tentacle to os recommendations
Casey Bodley [Wed, 26 Nov 2025 15:41:31 +0000 (10:41 -0500)]
doc: add Tentacle to os recommendations

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 months agodoc: remove Octopus and Centos7 from os recommendations
Casey Bodley [Wed, 26 Nov 2025 15:36:53 +0000 (10:36 -0500)]
doc: remove Octopus and Centos7 from os recommendations

cleanup to prepare for tentacle

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 months agoMerge pull request #66416 from bluikko/doc-fscrypt-improvements-cephfs
bluikko [Wed, 26 Nov 2025 13:51:41 +0000 (20:51 +0700)]
Merge pull request #66416 from bluikko/doc-fscrypt-improvements-cephfs

doc/cephfs: Small improvements in fscrypt.rst

3 months agoMerge pull request #66420 from bluikko/doc-sphinx-warnings-202511
bluikko [Wed, 26 Nov 2025 13:51:21 +0000 (20:51 +0700)]
Merge pull request #66420 from bluikko/doc-sphinx-warnings-202511

doc: Fix Sphinx warnings