git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log
ceph.git
2 weeks agomgr/dashboard: mgr/dashboard: Carbonize Realm Name and Token block in Multi-site... 68546/head
Sagar Gopale [Fri, 17 Apr 2026 10:49:44 +0000 (16:19 +0530)]
mgr/dashboard: mgr/dashboard: Carbonize Realm Name and Token block in Multi-site Replication Wizard

Fixes: https://tracker.ceph.com/issues/76085
Signed-off-by: Sagar Gopale <sagar.gopale@ibm.com>
(cherry picked from commit 5155b2125db002c21be1b014ccd6389b7c7b386e)

3 weeks agoMerge PR #67511 into tentacle
Patrick Donnelly [Fri, 17 Apr 2026 21:47:14 +0000 (17:47 -0400)]
Merge PR #67511 into tentacle

* refs/pull/67511/head:
doc/_ext: fix ceph_commands.py for new decorator-based command system
pybind/mgr: add per-module CLICommand instances to remaining modules
pybind/mgr/dashboard: create DBCLICommand, use throughout
pybind/mgr/tests/test_object_format: update DecoDemo to use fresh CLICommand
pybind/mgr/smb: adapt SMBCommand to use CLICommandBase
pybind/orchestrator,cephadm: replace CLICommandMeta
pybind/mgr: mechanically fix simple users to not import CLI*Command
pybind/mgr/mgr_module: support per-module CLICommand instances and globals
pybind/.../dashboard: misc automatic linter fixes

Reviewed-by: John Mulligan <jmulligan@redhat.com>
3 weeks agodoc/_ext: fix ceph_commands.py for new decorator-based command system 67511/head
Kefu Chai [Sun, 8 Feb 2026 12:34:15 +0000 (20:34 +0800)]
doc/_ext: fix ceph_commands.py for new decorator-based command system

After commit 4aa9e246f, mgr modules migrated from using a class-level
COMMANDS list to decorator-based command registration using per-module
CLICommand instances (e.g., @BalancerCLICommand.Read('balancer status')).

This broke the ceph_commands.py Sphinx extension which was hardcoded to
expect m.COMMANDS to be a list, causing documentation builds to fail.

But not all modules are using this per-module CLICommand. Some modules are
fully migrated (balancer, hello, etc.) and use decorators, while others
are partially migrated (volumes, progress, stats, influx, k8sevents,
osd_perf_query, osd_support) - they have CLICommand defined but still
use the old COMMANDS list.

This fix updates _collect_module_commands() to handle three scenarios:

1. Fully migrated modules: Check CLICommand.dump_cmd_list() and use it
   if it returns commands
2. Partially migrated modules: Fall back to the old COMMANDS list if
   dump_cmd_list() returns empty
3. Legacy modules: Use COMMANDS list if CLICommand doesn't exist

This ensures the Sphinx extension works with modules in any migration
state, maintaining backwards compatibility while supporting the new
decorator pattern.
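
The three scenarios above can be sketched as a single fallback chain. This is a minimal illustration, not the actual code of _collect_module_commands(): it assumes that CLICommand is reachable as an attribute of the module class and that dump_cmd_list() returns a list, and the stand-in classes (_FakeCLI, FullyMigrated, PartiallyMigrated, Legacy) are hypothetical.

```python
from typing import Any, Dict, List


def collect_module_commands(module: Any) -> List[Dict[str, Any]]:
    """Return command descriptions for a mgr module class (sketch only)."""
    cli = getattr(module, "CLICommand", None)
    if cli is not None:
        # Scenario 1: fully migrated, decorators registered the commands.
        commands = cli.dump_cmd_list()
        if commands:
            return list(commands)
    # Scenario 2 (dump_cmd_list() was empty) and scenario 3 (no CLICommand):
    # fall back to the old class-level COMMANDS list.
    return list(getattr(module, "COMMANDS", []))


class _FakeCLI:
    """Hypothetical stand-in for a per-module CLICommand instance."""

    def __init__(self, commands: List[Dict[str, Any]]) -> None:
        self._commands = commands

    def dump_cmd_list(self) -> List[Dict[str, Any]]:
        return list(self._commands)


class FullyMigrated:
    CLICommand = _FakeCLI([{"cmd": "balancer status"}])
    COMMANDS: List[Dict[str, Any]] = []


class PartiallyMigrated:
    CLICommand = _FakeCLI([])
    COMMANDS = [{"cmd": "fs volume ls"}]


class Legacy:
    COMMANDS = [{"cmd": "legacy status"}]
```

Checking for an empty result rather than only for the presence of CLICommand is what lets partially migrated modules keep working.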

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
(cherry picked from commit 77efb41aec4a3ccece0bbca94e7c5b3fea154298)

3 weeks agopybind/mgr: add per-module CLICommand instances to remaining modules
Samuel Just [Wed, 26 Nov 2025 22:51:54 +0000 (22:51 +0000)]
pybind/mgr: add per-module CLICommand instances to remaining modules

Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 4aa9e246f05663ec334f67d8e7f1ce817c1cbf2d)

Conflicts:
  - src/pybind/mgr/prometheus/module.py
    CLIReadCommand/CLIWriteCommand removed from imports, not yet in tentacle - kept tentacle version
  - src/pybind/mgr/orchestrator/module.py
    _cert_store_key_ls, _cert_store_cert_ls - signature changed in main, kept tentacle version
    _cert_store_entity_ls - function renamed in main, kept tentacle name
    _cert_store_entity_ls - function name changed in main but not in tentacle
  - src/pybind/mgr/smb/cli.py
    typing imports differ between main and tentacle - manually reconciled
    error_wrapper not backported to tentacle - removed from cherry-pick
  - src/pybind/mgr/mirroring/module.py
    import differs between main and tentacle - manually kept tentacle imports
  - src/pybind/mgr/pg_autoscaler/module.py
    get_scaling_threshold not backported to tentacle - removed that change

3 weeks agopybind/mgr/dashboard: create DBCLICommand, use throughout
Samuel Just [Mon, 24 Nov 2025 17:37:41 +0000 (09:37 -0800)]
pybind/mgr/dashboard: create DBCLICommand, use throughout

Also moves Command from mgr_module to DBCommand in dashboard/cli.py
since dashboard is the only user and this way it can directly use
DBCLICommand.

Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 9fe5a643f433127e0e59b6ba04685aa60de5588b)

3 weeks agopybind/mgr/tests/test_object_format: update DecoDemo to use fresh CLICommand
Samuel Just [Mon, 24 Nov 2025 17:36:50 +0000 (09:36 -0800)]
pybind/mgr/tests/test_object_format: update DecoDemo to use fresh CLICommand

Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit d8acddcc0b6af05bed4348a36a1752b679cdd7a0)

3 weeks agopybind/mgr/smb: adapt SMBCommand to use CLICommandBase
Samuel Just [Mon, 24 Nov 2025 17:36:15 +0000 (09:36 -0800)]
pybind/mgr/smb: adapt SMBCommand to use CLICommandBase

Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit a58d20cca388d2339c9b999f6279a1439d31ccbe)

3 weeks agopybind/orchestrator,cephadm: replace CLICommandMeta
Samuel Just [Mon, 24 Nov 2025 17:31:47 +0000 (09:31 -0800)]
pybind/orchestrator,cephadm: replace CLICommandMeta

orchestrator and cephadm relied on CLICommandMeta to bypass the global
behavior of CLICommand.  That is no longer a problem, so replace
CLICommandMeta with OrchestratorCLICommandBase to preserve the magic
error wrapping.

Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 776abe4f87cdad368cccc4c3994ec8e4b5cdaf13)

3 weeks agopybind/mgr: mechanically fix simple users to not import CLI*Command
Samuel Just [Mon, 24 Nov 2025 17:29:55 +0000 (09:29 -0800)]
pybind/mgr: mechanically fix simple users to not import CLI*Command

The next commit will introduce module specific CLICommand classes.

Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 9099c682c2596612df7ab698e5ac3cfa578eb6d3)

3 weeks agopybind/mgr/mgr_module: support per-module CLICommand instances and globals
Samuel Just [Mon, 24 Nov 2025 17:27:24 +0000 (09:27 -0800)]
pybind/mgr/mgr_module: support per-module CLICommand instances and globals

Otherwise, the class members on MgrModule and CLICommand are global to all
modules in the same interpreter.

Following commits will introduce a per-module CLICommand type for each
module.
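
The problem described above is ordinary Python class-attribute sharing. The sketch below illustrates it with two hypothetical registries, GlobalRegistry and PerModuleRegistry. These are not the real MgrModule or CLICommand classes; they only show why state stored on a class is visible to every module in the same interpreter, while state stored on an instance is not.

```python
from typing import Callable, Dict


class GlobalRegistry:
    """Hypothetical registry keeping its commands on the class itself."""

    COMMANDS: Dict[str, Callable[[], str]] = {}  # shared by every user

    @classmethod
    def register(cls, name: str):
        def decorator(func: Callable[[], str]) -> Callable[[], str]:
            cls.COMMANDS[name] = func
            return func
        return decorator


class PerModuleRegistry:
    """Hypothetical registry keeping its commands on each instance."""

    def __init__(self) -> None:
        self.commands: Dict[str, Callable[[], str]] = {}

    def register(self, name: str):
        def decorator(func: Callable[[], str]) -> Callable[[], str]:
            self.commands[name] = func
            return func
        return decorator


# Two "modules" using the class-level registry end up in the same dict.
@GlobalRegistry.register("a status")
def global_a_status() -> str:
    return "a"


@GlobalRegistry.register("b status")
def global_b_status() -> str:
    return "b"


# With one instance per module, the command tables stay separate.
ModuleACommand = PerModuleRegistry()
ModuleBCommand = PerModuleRegistry()


@ModuleACommand.register("a status")
def module_a_status() -> str:
    return "a"


@ModuleBCommand.register("b status")
def module_b_status() -> str:
    return "b"
```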

Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit 2d79ae64795fcd348b2f2c54d58105c941446b50)

3 weeks agopybind/.../dashboard: misc automatic linter fixes
Samuel Just [Tue, 2 Dec 2025 00:57:22 +0000 (16:57 -0800)]
pybind/.../dashboard: misc automatic linter fixes

Signed-off-by: Samuel Just <sjust@redhat.com>
(cherry picked from commit ff8639818447b8079ceacf3b5d8cbe6ce8cee007)

3 weeks agoMerge PR #65861 into tentacle
Patrick Donnelly [Thu, 16 Apr 2026 16:21:05 +0000 (12:21 -0400)]
Merge PR #65861 into tentacle

* refs/pull/65861/head:
mgr/smb: fix error handling for fundamental resource parsing

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: John Mulligan <jmulligan@redhat.com>
3 weeks agoMerge PR #68057 into tentacle
Patrick Donnelly [Tue, 14 Apr 2026 15:56:52 +0000 (11:56 -0400)]
Merge PR #68057 into tentacle

* refs/pull/68057/head:
qa/rgw/upgrade: symlinks are explicit about distro versions

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
3 weeks agoMerge PR #68142 into tentacle
Patrick Donnelly [Tue, 14 Apr 2026 15:55:31 +0000 (11:55 -0400)]
Merge PR #68142 into tentacle

* refs/pull/68142/head:
test/rgw/notification: do not use netstat in the code

Reviewed-by: Casey Bodley <cbodley@redhat.com>
3 weeks agoMerge PR #68254 into tentacle
Patrick Donnelly [Tue, 14 Apr 2026 15:53:19 +0000 (11:53 -0400)]
Merge PR #68254 into tentacle

* refs/pull/68254/head:
This change introduces the shared memory communication (SMC-D) for the cluster network.

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
3 weeks agoMerge PR #67757 into tentacle
Patrick Donnelly [Tue, 14 Apr 2026 15:52:23 +0000 (11:52 -0400)]
Merge PR #67757 into tentacle

* refs/pull/67757/head:
qa/tasks/barbican: add kek for simple_crypto_plugin
qa/suites/rgw: use 'member' instead of 'Member' for roles in barbican
qa/suites/rgw: bump keystone to stable/2025.2

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
3 weeks agoMerge PR #67468 into tentacle
Patrick Donnelly [Tue, 14 Apr 2026 15:51:13 +0000 (11:51 -0400)]
Merge PR #67468 into tentacle

* refs/pull/67468/head:
test/rgw/lua: ignore hours for zero mtime

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Yuval Lifshitz <ylifshit@redhat.com>
3 weeks agoMerge PR #67061 into tentacle
Patrick Donnelly [Tue, 14 Apr 2026 15:50:51 +0000 (11:50 -0400)]
Merge PR #67061 into tentacle

* refs/pull/67061/head:
qa/tasks/dnsmasq: preserve nameserver for future use
qa/suites/rgw/website: enable centos_latest

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
3 weeks agoMerge PR #68350 into tentacle
Patrick Donnelly [Tue, 14 Apr 2026 15:37:54 +0000 (11:37 -0400)]
Merge PR #68350 into tentacle

* refs/pull/68350/head:
mgr/DaemonServer: Limit search for OSDs to upgrade within the crush bucket.

Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
3 weeks agoMerge PR #67888 into tentacle
Patrick Donnelly [Tue, 14 Apr 2026 15:34:05 +0000 (11:34 -0400)]
Merge PR #67888 into tentacle

* refs/pull/67888/head:
os/bluestore: track compression_*blob_size* parameters for online update.

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
3 weeks agoMerge PR #68276 into tentacle
Patrick Donnelly [Mon, 13 Apr 2026 17:27:41 +0000 (13:27 -0400)]
Merge PR #68276 into tentacle

* refs/pull/68276/head:
debian: remove stale distutils override from py3dist-overrides

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
3 weeks agoMerge PR #68118 into tentacle
Patrick Donnelly [Mon, 13 Apr 2026 17:23:48 +0000 (13:23 -0400)]
Merge PR #68118 into tentacle

* refs/pull/68118/head:
qa/tasks/backfill_toofull.py: Fix assert failures with & without compression

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
3 weeks agoos/bluestore: track compression_*blob_size* parameters for online update. 67888/head
Igor Fedotov [Thu, 19 Feb 2026 17:39:56 +0000 (20:39 +0300)]
os/bluestore: track compression_*blob_size* parameters for online update.

Fixes: https://tracker.ceph.com/issues/75032
Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit b14012f3f608a279d8b650793a2f2d48d9c0e40a)

3 weeks agoThis change introduces the shared memory communication (SMC-D) for the cluster network. 68254/head
Aliaksei Makarau [Tue, 31 Mar 2026 06:40:04 +0000 (08:40 +0200)]
This change introduces the shared memory communication (SMC-D) for the cluster network.
SMC-D is faster than ethernet in IBM Z LPARs and/or VMs (zVM or KVM).

Fixes: https://tracker.ceph.com/issues/66702
Signed-off-by: Aliaksei Makarau <aliaksei.makarau@ibm.com>
(cherry picked from commit e65af75a67b445bf7014842e9d9b3cfbae1e464b)

3 weeks agomgr/DaemonServer: Limit search for OSDs to upgrade within the crush bucket. 68350/head
Sridhar Seshasayee [Wed, 25 Mar 2026 08:49:03 +0000 (14:19 +0530)]
mgr/DaemonServer: Limit search for OSDs to upgrade within the crush bucket.

The behavior of the 'ok-to-upgrade' command is now more deterministic with
respect to the parameters passed.

To achieve the above, the commit implements the following changes:

1. The 'ok-to-upgrade' command is modified to operate strictly on the OSDs
   within the CRUSH bucket and, if possible, meet the '--max' criteria when
   specified. When --max <num> is provided, the command returns up to <num>
   OSD IDs from the specified CRUSH bucket that can be safely stopped for
   simultaneous upgrade. This is useful when only a subset of OSDs within
   the bucket needs to be upgraded for performance or other reasons.

2. Modifies the standalone tests to reflect the above change.

3. Modifies the relevant documentation to reflect the change in behavior.
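
The '--max' behavior in item 1 can be pictured as a bounded selection over the OSDs of one bucket. This is only a sketch under stated assumptions: the function pick_osds_for_upgrade and the is_safe_to_stop predicate are hypothetical, and the real command decides safety from PG state inside the mgr, which is not modeled here.

```python
from typing import Callable, List, Optional, Sequence


def pick_osds_for_upgrade(
    bucket_osds: Sequence[int],
    is_safe_to_stop: Callable[[List[int]], bool],
    max_osds: Optional[int] = None,
) -> List[int]:
    """Pick OSD IDs from one CRUSH bucket that can be stopped together.

    Only OSDs listed in bucket_osds are considered. When max_osds is given,
    at most that many IDs are returned.
    """
    selected: List[int] = []
    for osd in bucket_osds:
        if max_osds is not None and len(selected) >= max_osds:
            break
        # Hypothetical safety check on the whole candidate set, because the
        # OSDs would be stopped simultaneously.
        if is_safe_to_stop(selected + [osd]):
            selected.append(osd)
    return selected
```

For example, with a bucket of four OSDs and a predicate that rejects one of them, passing max_osds=2 returns the first two acceptable IDs and never looks outside the bucket.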

Fixes: https://tracker.ceph.com/issues/75681
Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
(cherry picked from commit f18093fc09bfedbb02cbe7967fc85b2dea9ff71f)

4 weeks agoMerge PR #67982 into tentacle
Patrick Donnelly [Fri, 10 Apr 2026 23:58:33 +0000 (19:58 -0400)]
Merge PR #67982 into tentacle

* refs/pull/67982/head:
rgw/tests: add os-specific java 1.7 install commands to keycloak task

Reviewed-by: Casey Bodley <cbodley@redhat.com>
4 weeks agorgw/tests: add os-specific java 1.7 install commands to keycloak task 67982/head
J. Eric Ivancich [Fri, 27 Feb 2026 20:56:29 +0000 (15:56 -0500)]
rgw/tests: add os-specific java 1.7 install commands to keycloak task

Add commands to keycloak task specific for rocky, rhel, centos, and
ubuntu. Also, clean-up installed package(s) after test is run.

This is necessary as rocky can't install the same package(s) that the
other os types currently can.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit ee710390d277784ddac3d70c9e11e427f46f363d)

4 weeks agoMerge PR #67942 into tentacle
Patrick Donnelly [Fri, 10 Apr 2026 19:12:17 +0000 (15:12 -0400)]
Merge PR #67942 into tentacle

* refs/pull/67942/head:
nvmeofgw: fix issue of delete all gws from the pool/group

Reviewed-by: Aviv Caro <Aviv.Caro@ibm.com>
Reviewed-by: Alexander Indenbaum <aindenba@redhat.com>
4 weeks agoMerge PR #67790 into tentacle
Patrick Donnelly [Fri, 10 Apr 2026 19:08:22 +0000 (15:08 -0400)]
Merge PR #67790 into tentacle

* refs/pull/67790/head:
mon [stretch-mode]: Allow a max bucket weight diff threshold

Reviewed-by: Shraddha Agrawal <shraddhaag@ibm.com>
4 weeks agoMerge PR #68226 into tentacle
Patrick Donnelly [Thu, 9 Apr 2026 16:56:25 +0000 (12:56 -0400)]
Merge PR #68226 into tentacle

* refs/pull/68226/head:
rgw: enhanced java s3-tests change setting of JAVA_HOME
rgw: java s3-tests change setting of JAVA_HOME

Reviewed-by: Casey Bodley <cbodley@redhat.com>
4 weeks agoMerge PR #67533 into tentacle
Patrick Donnelly [Thu, 9 Apr 2026 16:53:15 +0000 (12:53 -0400)]
Merge PR #67533 into tentacle

* refs/pull/67533/head:
qa/cephadm: ensure host has been fully saved before considering bootstrap complete
mgr/cephadm: add __getstate__ so OSD class can be pickled

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Guillaume Abrioux <gabrioux@redhat.com>
4 weeks agodebian: remove stale distutils override from py3dist-overrides 68276/head
Kefu Chai [Wed, 8 Apr 2026 07:29:09 +0000 (15:29 +0800)]
debian: remove stale distutils override from py3dist-overrides

distutils was deprecated in Python 3.10 (PEP 632) and removed in
Python 3.12. The `python3-distutils` package no longer exists in
Debian Trixie (Python 3.13) or Ubuntu 24.04+ (Python 3.12).

The only runtime reference was in `debian/ceph-mgr.requires`, already
cleaned up by 3fb3f892aa3. This override is now dead code: no
installed file declares a runtime dependency on `distutils`, so
`dh_python3` never resolves it. Removing it prevents a latent
uninstallable-dependency bug if `distutils` were accidentally
reintroduced in a `.requires` file.

Fixes: https://tracker.ceph.com/issues/75901
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Max R. Carrara <m.carrara@proxmox.com>
Signed-off-by: Kefu Chai <k.chai@proxmox.com>
(cherry picked from commit d1d07a0542228b7c40238a9a78d138ad07130240)

4 weeks agoMerge PR #67907 into tentacle
Patrick Donnelly [Tue, 7 Apr 2026 17:38:24 +0000 (13:38 -0400)]
Merge PR #67907 into tentacle

* refs/pull/67907/head:
doc/start: Add ARM support note to hardware-recommendations.rst
doc: Improve start/hardware-recommendations.rst
doc: Update the old ceph.com/community/ links to ceph.io/en/news/blog/
doc/start: Improve hardware-recommendations.rst
doc/start: fix wording in swap tip
doc: Use ref instead of full URLs for intra-docs links
doc: Use existing labels and ref for hyperlinks in architecture.rst

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
4 weeks agoMerge PR #66948 into tentacle
Patrick Donnelly [Tue, 7 Apr 2026 12:41:48 +0000 (08:41 -0400)]
Merge PR #66948 into tentacle

* refs/pull/66948/head:
mgr/DaemonServer: Re-order OSDs in crush bucket to maximize OSDs for upgrade
mgr/DaemonServer: Implement ok-to-upgrade command
mgr/DaemonServer: Modify offline_pg_report to handle set or vector types

Reviewed-by: Nitzan Mordechai <nmordech@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 weeks agorgw: enhanced java s3-tests change setting of JAVA_HOME 68226/head
J. Eric Ivancich [Tue, 7 Apr 2026 00:53:34 +0000 (20:53 -0400)]
rgw: enhanced java s3-tests change setting of JAVA_HOME

Under CentOS 9 the Java 8 version is recognized by the substring
"java-1.8" rather than "java-8". So the grep has been modified to
accept either.
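
A minimal sketch of the matching rule, assuming the JVM directory names have already been listed from /usr/lib/jvm/. The helper find_java8_home and the JAVA8_PATTERN regular expression are illustrative only; the actual task uses a grep, whose exact pattern is not shown in this log.

```python
import re
from typing import Iterable, Optional

# Accepts either the "java-8" or the "java-1.8" naming convention.
JAVA8_PATTERN = re.compile(r"java-(1\.)?8")


def find_java8_home(
    jvm_names: Iterable[str], jvm_root: str = "/usr/lib/jvm"
) -> Optional[str]:
    """Return the path of the first Java 8 JVM directory, or None."""
    for name in jvm_names:
        if JAVA8_PATTERN.search(name):
            return f"{jvm_root}/{name}"
    return None
```

The returned path would then be exported as JAVA_HOME. Names such as "java-11-openjdk" do not match, because the pattern requires "8" or "1.8" directly after "java-".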

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
4 weeks agoMerge PR #67993 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 19:55:54 +0000 (15:55 -0400)]
Merge PR #67993 into tentacle

* refs/pull/67993/head:
test/rgw/kafka: fix kafka relase to more recent one

Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
4 weeks agoMerge PR #67797 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 19:40:39 +0000 (15:40 -0400)]
Merge PR #67797 into tentacle

* refs/pull/67797/head:
qa/workunits/rbd: fix unbound variable in status()
qa/workunits/rbd: short-circuit status() if "ceph -s" fails
qa: rbd_mirror_fsx_compare.sh doesn't error out as expected

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
4 weeks agoMerge PR #67795 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 19:40:07 +0000 (15:40 -0400)]
Merge PR #67795 into tentacle

* refs/pull/67795/head:
qa/tasks: make rbd_mirror_thrash inherit from ThrasherGreenlet

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
4 weeks agoMerge PR #67705 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 19:39:48 +0000 (15:39 -0400)]
Merge PR #67705 into tentacle

* refs/pull/67705/head:
librbd/cache/pwl: WriteLogOperationSet::cell can be garbage

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
4 weeks agoMerge PR #66837 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 18:25:25 +0000 (14:25 -0400)]
Merge PR #66837 into tentacle

* refs/pull/66837/head:
os/bluestore: rename row names in RocksDBBlueFSVolumeSelector.
test/bluestore: add volume selector tests
os/bluestore:fix bluestore_volume_selection_reserved_factor usage

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
4 weeks agoMerge PR #67354 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 18:24:11 +0000 (14:24 -0400)]
Merge PR #67354 into tentacle

* refs/pull/67354/head:
debian: remove invoke-rc.d calls from postrm scripts

Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
4 weeks agoMerge PR #67407 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 18:22:32 +0000 (14:22 -0400)]
Merge PR #67407 into tentacle

* refs/pull/67407/head:
osd: add pg-upmap-primary to clean_pg_upmaps

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 weeks agoMerge PR #66482 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 18:21:40 +0000 (14:21 -0400)]
Merge PR #66482 into tentacle

* refs/pull/66482/head:
mgr/prometheus/test_module: Adding unit-test for new classes
mgr/prometheus: metrics header for standby module
mgr/prometheus: Use RLock to fix deadlock in HealthHistory
mgr/TTLCache: fix PyObject* lifetime management and cleanup logic
mgr/prometheus: prune stale health checks, compress output

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 weeks agoMerge PR #65913 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 18:17:08 +0000 (14:17 -0400)]
Merge PR #65913 into tentacle

* refs/pull/65913/head:
client: signal waitfor_commit waiters for write delegation enabled inode
test/libcephfs: add test for fsync on a write delegated inode
client: adjust `Fb` cap ref count check during synchronous fsync()

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
4 weeks agoMerge PR #65957 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 18:16:33 +0000 (14:16 -0400)]
Merge PR #65957 into tentacle

* refs/pull/65957/head:
client: crash caused by invalid iterator in _readdir_cache_cb

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
4 weeks agoMerge PR #66469 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 18:16:13 +0000 (14:16 -0400)]
Merge PR #66469 into tentacle

* refs/pull/66469/head:
mds: MDCache: check validity of mdr requests before dispatching
mds: MDCache request cleanup handles potential null mdr

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
4 weeks agoMerge PR #67617 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 18:14:09 +0000 (14:14 -0400)]
Merge PR #67617 into tentacle

* refs/pull/67617/head:
qa: fix TypeError in delay

Reviewed-by: Venky Shankar <vshankar@redhat.com>
4 weeks agoMerge PR #67455 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 18:10:14 +0000 (14:10 -0400)]
Merge PR #67455 into tentacle

* refs/pull/67455/head:
qa: krbd_rxbounce.sh: do more reads to generate more errors

Reviewed-by: Ramana Raja <rraja@redhat.com>
4 weeks agoMerge PR #67581 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 18:09:55 +0000 (14:09 -0400)]
Merge PR #67581 into tentacle

* refs/pull/67581/head:
librbd: don't complete ImageUpdateWatchers::shut_down() prematurely

Reviewed-by: Ramana Raja <rraja@redhat.com>
4 weeks agoMerge PR #67583 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 18:09:34 +0000 (14:09 -0400)]
Merge PR #67583 into tentacle

* refs/pull/67583/head:
librbd/mirror: detect trashed snapshots in UnlinkPeerRequest

Reviewed-by: Ramana Raja <rraja@redhat.com>
4 weeks agoMerge PR #67031 into tentacle
Patrick Donnelly [Mon, 6 Apr 2026 18:03:06 +0000 (14:03 -0400)]
Merge PR #67031 into tentacle

* refs/pull/67031/head:
doc/ceph.rst: scrub-related 'tell pgid' commands
osd/scrub: operator abort: (not) handling in-the-mail scrubs
osd/scrub: added the scrub-abort command
osd/scrub: support an operator-abort command
osd/scrub: removing the unused PgScrubber::m_scrub_reg_stamp

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 weeks agorgw: java s3-tests change setting of JAVA_HOME
J. Eric Ivancich [Wed, 1 Apr 2026 16:29:01 +0000 (12:29 -0400)]
rgw: java s3-tests change setting of JAVA_HOME

Previously s3tests_java.py set JAVA_HOME using the `alternatives`
command. That had issues in that `alternatives` is not present on all
Ubuntu systems, and some installations of Java don't update
alternatives. So instead we look for a "java-8" jvm in /usr/lib/jvm/
and set JAVA_HOME to the first one we find.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
(cherry picked from commit b8e2796270f4558b406411682a9b916109d0c530)

5 weeks agoMerge PR #68185 into tentacle
Patrick Donnelly [Fri, 3 Apr 2026 16:55:31 +0000 (12:55 -0400)]
Merge PR #68185 into tentacle

* refs/pull/68185/head:
20.2.1

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
5 weeks ago20.2.1 tentacle-release 68185/head v20.2.1
Ceph Release Team [Thu, 2 Apr 2026 14:15:15 +0000 (14:15 +0000)]
20.2.1

Signed-off-by: Ceph Release Team <ceph-maintainers@ceph.io>
5 weeks agotest/rgw/notification: do not use netstat in the code 68142/head
Yuval Lifshitz [Fri, 20 Feb 2026 15:41:14 +0000 (15:41 +0000)]
test/rgw/notification: do not use netstat in the code

* net-tools are deprecated in fedora and ubuntu
* using netstat -p (used to verify that the http server is listening on
  a port) requires root privileges, which may fail in some test environments
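
One unprivileged way to verify that a server is listening is to attempt a TCP connection. The sketch below shows that idea using only the Python standard library. It is an assumption about a possible replacement, not the code of this commit, and is_port_listening is a hypothetical helper name.

```python
import socket


def is_port_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError or a
        # timeout) when nothing accepts connections on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Unlike netstat -p, this needs neither root privileges nor the net-tools package, but it cannot tell which process owns the port.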

Fixes: https://tracker.ceph.com/issues/75820
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit 5e204e17684ec6d2ab5b44e114be6cc4dfcf10b9)

5 weeks agoqa/tasks/backfill_toofull.py: Fix assert failures with & without compression 68118/head
Sridhar Seshasayee [Mon, 9 Mar 2026 09:31:54 +0000 (15:01 +0530)]
qa/tasks/backfill_toofull.py: Fix assert failures with & without compression

The following issues with the test are addressed:

1. The test was encountering an assertion failure (assert backfillfull < 0.9) with
   compression enabled. This was because the condition was not factoring in the
   compression ratio. Without it the backfillfull ratio can easily exceed 1. By
   factoring in the compression ratio, the backfillfull ratio will be in the
   range (0 - n), where n can vary depending on the type of compression used.

2. The main contributing factor for (1) above is the amount of data written to
   the pool. The writes were time-bound earlier leading to excess data and
   eventually the assertion failure. By limiting the data written to the OSDs
   to 50% of the OSD capacity in the first phase and only 20% in the re-write
   phase, the outcome of the test is more deterministic regardless of
   compression being enabled or not.

3. A potential false cluster error is avoided by swapping the setting of
   the nearfull-ratio and backfill-ratio after the re-write phase.

4. Fix a couple of typos - s/tartget/target.

Fixes: https://tracker.ceph.com/issues/71005
Signed-off-by: Sridhar Seshasayee <sridhar.seshasayee@ibm.com>
(cherry picked from commit 91de6a0b7b8b8c2531446555c25bf53e23635982)

6 weeks agoqa/rgw/upgrade: symlinks are explicit about distro versions 68057/head
Casey Bodley [Wed, 25 Mar 2026 16:38:59 +0000 (12:38 -0400)]
qa/rgw/upgrade: symlinks are explicit about distro versions

avoid relying on "ubuntu_latest" and "rpm_latest" symlinks, which change
over time on main. be explicit about the distro versions supported by
the initial release

Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 73b1d1e708725a34aa4bcdfaa7fff396a643d7fc)

Conflicts: removed tentacle, added reef
qa/suites/rgw/upgrade/1-install/reef/distro$/ubuntu_latest.yaml
qa/suites/rgw/upgrade/1-install/squid/distro$/centos_9.stream.yaml
qa/suites/rgw/upgrade/1-install/squid/distro$/ubuntu_22.04.yaml
qa/suites/rgw/upgrade/1-install/squid/distro$/ubuntu_latest.yaml

6 weeks agotest/rgw/kafka: fix kafka relase to more recent one 67993/head
Yuval Lifshitz [Wed, 4 Mar 2026 14:53:13 +0000 (14:53 +0000)]
test/rgw/kafka: fix kafka relase to more recent one

Fixes: https://tracker.ceph.com/issues/75323
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
(cherry picked from commit dc412a7e519d037acbcac8a92c7ecf2dbde9875a)

6 weeks agonvmeofgw: fix issue of delete all gws from the pool/group 67942/head
Leonid Chernin [Tue, 26 Aug 2025 12:34:32 +0000 (15:34 +0300)]
nvmeofgw: fix issue of delete all gws from the pool/group
          when gws not removed from the map

Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>
(cherry picked from commit 29174099ac46d19f6dd5dd9375a2a8c606dccd17)

7 weeks agodoc/start: Add ARM support note to hardware-recommendations.rst 67907/head
Anthony D'Atri [Mon, 8 Dec 2025 19:48:58 +0000 (14:48 -0500)]
doc/start: Add ARM support note to hardware-recommendations.rst

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
(cherry picked from commit 33ecb7912f15495668a99cc64d3aa86fe93d20df)

7 weeks agodoc: Improve start/hardware-recommendations.rst
Anthony D'Atri [Mon, 17 Nov 2025 17:57:29 +0000 (12:57 -0500)]
doc: Improve start/hardware-recommendations.rst

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
(cherry picked from commit b4fa87d24fc363ecbc2dafbe5feaf15273e18128)

7 weeks agodoc: Update the old ceph.com/community/ links to ceph.io/en/news/blog/
mrVectorz [Mon, 3 Nov 2025 04:24:59 +0000 (23:24 -0500)]
doc: Update the old ceph.com/community/ links to ceph.io/en/news/blog/

Signed-off-by: Marc Methot <mb.methot@gmail.com>
(cherry picked from commit c32027ba9192b10a10acbbe7683933290dc964b5)

7 weeks agodoc/start: Improve hardware-recommendations.rst
Anthony D'Atri [Thu, 23 Oct 2025 19:29:19 +0000 (15:29 -0400)]
doc/start: Improve hardware-recommendations.rst

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
(cherry picked from commit d24c3ac173c09018cd45d8932cde264d36cee257)

7 weeks agodoc/start: fix wording in swap tip
Pierre Riteau [Thu, 21 Aug 2025 09:44:57 +0000 (11:44 +0200)]
doc/start: fix wording in swap tip

Signed-off-by: Pierre Riteau <pierre@stackhpc.com>
(cherry picked from commit 0d45223dd71f3a06d06d28260ac387bad0bde54d)

7 weeks agodoc: Use ref instead of full URLs for intra-docs links
Ville Ojamo [Sat, 2 Aug 2025 06:26:14 +0000 (13:26 +0700)]
doc: Use ref instead of full URLs for intra-docs links

Labels mostly existed already but add labels in 2 files.

Add missing closing quotation mark in
rados/troubleshooting/log-and-debug.rst.

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
(cherry picked from commit 40e5395db6c7dcc4ecfdc219e3e5e1fb72650558)

7 weeks agodoc: Use existing labels and ref for hyperlinks in architecture.rst
Ville Ojamo [Thu, 15 May 2025 10:32:29 +0000 (17:32 +0700)]
doc: Use existing labels and ref for hyperlinks in architecture.rst

Use validated ":ref:" hyperlinks instead of "external links" in "target
definitions" when linking within the Ceph docs:
- Update to use existing labels when linking from architecture.rst.
- Remove unused "target definitions".

Also use title case for section titles in
doc/start/hardware-recommendations.rst because the link text is now
generated from the section titles.

Other than generated link texts the rendered PR should look the same as
the old docs, only differing in the source RST.

Signed-off-by: Ville Ojamo <14869000+bluikko@users.noreply.github.com>
(cherry picked from commit 15935db5d78360d5ca98c799cf9fff287b6d0a4c)

7 weeks agoMerge PR #67894 into tentacle
Patrick Donnelly [Thu, 19 Mar 2026 17:48:33 +0000 (13:48 -0400)]
Merge PR #67894 into tentacle

* refs/pull/67894/head:
src: Move the decision to build the ISA plugin to the top level make file

Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
7 weeks agoMerge PR #67765 into tentacle
Patrick Donnelly [Thu, 19 Mar 2026 15:23:06 +0000 (11:23 -0400)]
Merge PR #67765 into tentacle

* refs/pull/67765/head:
qa: reduce radosbench runs
qa: use correct upgrade order
qa: preload isa ec module
qa/tests: added inital draft for tentacle-p2p

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
7 weeks agosrc: Move the decision to build the ISA plugin to the top level make file 67894/head
Alex Ainscow [Wed, 18 Mar 2026 14:51:57 +0000 (14:51 +0000)]
src: Move the decision to build the ISA plugin to the top level make file

Previously, the first time you build ceph, common did not see the correct
value of WITH_EC_ISA_PLUGIN.  The consequence is that the global.yaml gets
built with osd_erasure_code_plugins not including isa.  This is not great
given it's our default plugin.

We considered simply removing this parameter from make entirely, but this
may require more discussion about supporting old hardware.

So the slightly ugly fix is to move this erasure-code specific declaration
to the top-level.

Fixes: https://tracker.ceph.com/issues/75537
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit cecce28f16b0867ea8578a8f0c1478e24a40e525)

7 weeks agoqa: reduce radosbench runs wip-yuriw-squid-p2p-tentacle 67765/head
Patrick Donnelly [Wed, 18 Mar 2026 17:42:38 +0000 (13:42 -0400)]
qa: reduce radosbench runs

to avoid running out of space.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
7 weeks agoqa: use correct upgrade order
Patrick Donnelly [Wed, 18 Mar 2026 17:42:22 +0000 (13:42 -0400)]
qa: use correct upgrade order

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
7 weeks agoqa: preload isa ec module
Patrick Donnelly [Wed, 18 Mar 2026 17:41:59 +0000 (13:41 -0400)]
qa: preload isa ec module

So that the mons won't load an upgraded plugin.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
7 weeks agoMerge PR #67750 into tentacle
Patrick Donnelly [Wed, 18 Mar 2026 17:41:18 +0000 (13:41 -0400)]
Merge PR #67750 into tentacle

* refs/pull/67750/head:
Revert "Merge pull request #66958 from Hezko/wip-74413-tentacle"

Reviewed-by: Yuri Weinstein <yweins@redhat.com>
Tested-by: Patrick Donnelly <pdonnell@redhat.com>
7 weeks agomon [stretch-mode]: Allow a max bucket weight diff threshold 67790/head
Kamoltat Sirivadhna [Tue, 9 Dec 2025 21:00:38 +0000 (21:00 +0000)]
mon [stretch-mode]: Allow a max bucket weight diff threshold

Problem:
Users ran into a problem where the crush bucket
weight difference check in stretch mode is too strict, e.g.,
one of the disks added to one of the nodes had a slight variation
in capacity, and this caused ceph to fail to enable the stretch
cluster because the crush weights were not balanced. The difference was very small.

Solution:
- Introducing: mon_stretch_max_bucket_weight_delta in mon.yaml.in
  this config var defaults to 0.1 and is used as a threshold
  to allow the difference between the two crush buckets in stretch mode
  to be no greater than 10%.
- Introducing: STRETCH_MODE_BUCKET_WEIGHT_IMBALANCE as a health warning
  when the weight delta between the two sites exceeds 10%
- Modified documentation
- Modified tests that exercise this code path

Fixes: https://tracker.ceph.com/issues/72994
Signed-off-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
(cherry picked from commit d58de5174d05ad2df1f1d6771abf504b25e62c54)

Conflicts:
doc/rados/operations/health-checks.rst - Trivial Fix
PendingReleaseNotes - Remove this
Signed-off-by: Kamoltat (Junior) Sirivadhna <ksirivad@redhat.com>
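
The threshold described in the stretch-mode commit above can be sketched as follows. This is a minimal illustration, not the monitor's implementation: the helper names are hypothetical, and normalising the difference against the heavier bucket is an assumption, since the commit message only states that the delta must not exceed 10% by default.

```python
# Hypothetical helpers; the normalisation used by the monitor is not
# stated in the commit message and is assumed here.
DEFAULT_MAX_DELTA = 0.1  # default of mon_stretch_max_bucket_weight_delta


def weight_delta(weight_a: float, weight_b: float) -> float:
    """Relative difference between two bucket weights (assumed: vs. the heavier one)."""
    heavier = max(weight_a, weight_b)
    if heavier == 0:
        return 0.0
    return abs(weight_a - weight_b) / heavier


def within_threshold(weight_a: float, weight_b: float,
                     max_delta: float = DEFAULT_MAX_DELTA) -> bool:
    """True when the two sites are balanced closely enough."""
    return weight_delta(weight_a, weight_b) <= max_delta


print(within_threshold(100.0, 99.5))  # True: a slightly smaller disk is tolerated
print(within_threshold(100.0, 80.0))  # False: a 20% imbalance exceeds the default
```

Under this reading, a small capacity variation on one disk no longer blocks stretch mode, while a larger imbalance is still flagged.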
7 weeks agoRevert "Merge pull request #66958 from Hezko/wip-74413-tentacle" 67750/head
Patrick Donnelly [Thu, 12 Mar 2026 06:46:09 +0000 (12:16 +0530)]
Revert "Merge pull request #66958 from Hezko/wip-74413-tentacle"

This reverts commit 6dddf544a44d3944e883c051d886f0049a95a2e5, reversing
changes made to 07ec509cf156d83e38aa6c2a151d4f06009e8dfa.

Backport 6dddf54 introduced a new connection feature bit
NVMEOF_BEACON_DIFF but there are plans (#66624) to make further
enhancements on that feature bit. This would cause the mons to crash
during upgrades.

However, this connection feature bit should not have been added to
begin with. The correct way to do this is to extend
e55ad7bce2fb85096cd31ff9846403f9dbd01e85 by @athanatos to require
`CEPH_MON_FEATURE_INCOMPAT_NVMEOF_BEACON_DIFF` if all mons support it.
This should be done by having mons add/update their supported features
in the MonMap via an update from `MMonJoin` (see for instance `crush_loc`
which was recently added to `mon_info_t`). Once the supported features
indicated for each mon in the `MonMap` show they understand the new
NVMEOF_BEACON_DIFF, then it should be turned on globally in the
`MonMap` as a required feature (added to the incompat set).

Conflicts:
src/mon/NVMeofGwMon.h: conflicts with header change from 19c9be2
                               fix missing header change in #66584

Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
7 weeks agoqa/workunits/rbd: fix unbound variable in status() 67797/head
Ilya Dryomov [Mon, 2 Mar 2026 11:07:48 +0000 (12:07 +0100)]
qa/workunits/rbd: fix unbound variable in status()

It was missed in commit 5fe64fa806f3 ("qa: rbd_mirror.sh: change
parameters to cluster rather than daemon name").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1a280b9a320d51bdc4cb80be9bdd6ae265151132)

7 weeks agoqa/workunits/rbd: short-circuit status() if "ceph -s" fails
Ilya Dryomov [Sun, 1 Mar 2026 21:55:52 +0000 (22:55 +0100)]
qa/workunits/rbd: short-circuit status() if "ceph -s" fails

In mirror-thrash tests, status() can be invoked after one of the
clusters is effectively stopped due to a watchdog bark:

2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T22:27:38.633 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons
...
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ status
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ local cluster daemon image_pool image_ns image
2026-03-01T22:32:46.964 INFO:tasks.workunit.cluster1.client.mirror.trial199.stderr:+ for cluster in ${CLUSTER1} ${CLUSTER2}

In this scenario all commands that are invoked from the loop body
are going to time out anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 82717e43a08a1262987f5e271fd72d4433c4fb3b)

7 weeks agoqa: rbd_mirror_fsx_compare.sh doesn't error out as expected
Ilya Dryomov [Sun, 1 Mar 2026 16:45:51 +0000 (17:45 +0100)]
qa: rbd_mirror_fsx_compare.sh doesn't error out as expected

In mirror-thrash tests, one of the clusters can be effectively stopped
due to a watchdog bark while rbd_mirror_fsx_compare.sh is running and is
in the middle of the "wait for all images" loop:

2026-03-01T12:55:35.059 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ retrying_seconds=1040
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ '[' 1040 -le 7200 ']'
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:++ rbd --cluster cluster2 --pool mirror ls
2026-03-01T12:55:35.060 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:++ wc -l
2026-03-01T12:55:35.084 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ '[' 290 -ge 292 ']'
2026-03-01T12:55:35.084 INFO:tasks.workunit.cluster1.client.mirror.trial055.stderr:+ sleep 10
...
2026-03-01T12:55:49.568 INFO:tasks.daemonwatchdog.daemon_watchdog:thrasher.rbd_mirror.[cluster2] failed
2026-03-01T12:55:49.568 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons

In this scenario "rbd ls" is going to time out repeatedly, turning the
loop into up to a ~60-hour sleep (up to 720 iterations with a 5-minute
timeout + 10-second sleep per iteration).

Fixes: https://tracker.ceph.com/issues/75239
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 81a5906f0d1cc844bb4ef16aae9ace3e7d371ac2)
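
The worst-case figure quoted in the commit above can be checked with a few lines of arithmetic. The constants come from the commit message; the function name is only illustrative.

```python
# Worst-case duration of the "wait for all images" loop when every
# "rbd ls" call times out. Figures are taken from the commit message.
MAX_ITERATIONS = 720        # up to 720 iterations
RBD_LS_TIMEOUT_S = 5 * 60   # 5-minute timeout per "rbd ls"
SLEEP_S = 10                # 10-second sleep per iteration


def worst_case_hours(iterations: int, timeout_s: int, sleep_s: int) -> float:
    """Total time in hours if each iteration waits out the timeout and then sleeps."""
    return iterations * (timeout_s + sleep_s) / 3600


print(worst_case_hours(MAX_ITERATIONS, RBD_LS_TIMEOUT_S, SLEEP_S))  # 62.0
```

That is 720 x 310 s = 223200 s, or 62 hours, consistent with the "~60-hour" estimate.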

7 weeks agoqa/tasks: make rbd_mirror_thrash inherit from ThrasherGreenlet 67795/head
Ilya Dryomov [Fri, 27 Feb 2026 14:18:27 +0000 (15:18 +0100)]
qa/tasks: make rbd_mirror_thrash inherit from ThrasherGreenlet

Commit 21b4b89e5280 ("qa/tasks: watchdog terminate thrasher") made it
required for a thrasher to have a stop_and_join() method, but the
preceding commit a035b5a22fb8 ("thrashers: standardize stop and join
method names") failed to add it to rbd_mirror_thrash (whether as an
ad-hoc implementation or by way of inheriting from ThrasherGreenlet).
Later on, commit 783f0e3a9903 ("qa: Adding a new class for the
daemonwatchdog to monitor") worsened the issue by expanding the use
of stop_and_join() to all watchdog barks rather than just the case of
a thrasher throwing an exception which is something that practically
never happens.

Fixes: https://tracker.ceph.com/issues/75200
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3ebe3a0a43251b0f126497d4100bd1af9ca8afc5)

8 weeks agoqa/tests: added inital draft for tentacle-p2p
Yuri Weinstein [Thu, 12 Mar 2026 21:38:39 +0000 (14:38 -0700)]
qa/tests: added inital draft for tentacle-p2p

Signed-off-by: Yuri Weinstein <yweinste@redhat.com>
8 weeks agoqa/tasks/barbican: add kek for simple_crypto_plugin 67757/head
Kyr Shatskyy [Wed, 4 Mar 2026 22:57:31 +0000 (23:57 +0100)]
qa/tasks/barbican: add kek for simple_crypto_plugin

Since 2025.1 it is mandatory to provide kek in the barbican config file
for the crypto plugin, whether or not it is enabled by default.

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@clyso.com>
(cherry picked from commit f02a9ba1d53b4f11769a1ec4d534ebeda126d70a)

8 weeks agoqa/suites/rgw: use 'member' instead of 'Member' for roles in barbican
Kyr Shatskyy [Wed, 4 Mar 2026 19:11:23 +0000 (20:11 +0100)]
qa/suites/rgw: use 'member' instead of 'Member' for roles in barbican

It appears the openstack client treats role names as case-sensitive
internally, and creates role 'member' instead of 'Member' in the database.

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@clyso.com>
(cherry picked from commit 39b580dba368b2de8b047c407d7e8e8b1f165ab0)

8 weeks agoqa/suites/rgw: bump keystone to stable/2025.2
Kyr Shatskyy [Fri, 20 Feb 2026 09:33:08 +0000 (10:33 +0100)]
qa/suites/rgw: bump keystone to stable/2025.2

keystone-wsgi-public is dropped in the latest versions,
so try to use uwsgi instead.

Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@clyso.com>
(cherry picked from commit de062fad109bcda26d123d85add3dea2a67e9ed2)

2 months agolibrbd/cache/pwl: WriteLogOperationSet::cell can be garbage 67705/head
Ilya Dryomov [Mon, 16 Feb 2026 21:24:47 +0000 (22:24 +0100)]
librbd/cache/pwl: WriteLogOperationSet::cell can be garbage

The pointer is never initialized but gets printed by operator<<.
Luckily outside of that it's unused.

Fixes: https://tracker.ceph.com/issues/74971
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit bffa11487cb7d68c0aa39994f50fbc3b4b00e415)

2 months agoclient: signal waitfor_commit waiters for write delegation enabled inode 65913/head
Venky Shankar [Wed, 15 Oct 2025 13:40:35 +0000 (13:40 +0000)]
client: signal waitfor_commit waiters for write delegation enabled inode

Fixes: http://tracker.ceph.com/issues/73624
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Tested-by: Suhas Athani <sathani@redhat.com>
(cherry picked from commit ee864798373248abda5237ceef2258ed7236f6ee)

2 months agotest/libcephfs: add test for fsync on a write delegated inode
Venky Shankar [Mon, 29 Sep 2025 06:44:28 +0000 (06:44 +0000)]
test/libcephfs: add test for fsync on a write delegated inode

Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit be0c40c89c0556ae7696dfaaf6804684ecfaddeb)

2 months agoclient: adjust `Fb` cap ref count check during synchronous fsync()
Venky Shankar [Mon, 29 Sep 2025 06:41:23 +0000 (06:41 +0000)]
client: adjust `Fb` cap ref count check during synchronous fsync()

The cephfs client holds a ref on Fb caps when handing out a write delegation[0].
As a result, an fsync from a (Ganesha) client holding a write delegation will
block indefinitely[1] waiting for the cap ref for Fb to drop to 0, which will
never happen until the delegation is returned/recalled.

[0]: https://github.com/ceph/ceph/blob/main/src/client/Delegation.cc#L71
[1]: https://github.com/ceph/ceph/blob/main/src/client/Client.cc#L12438

If an inode has been write delegated, adjust for cap reference count
check in fsync().

Note: This only works for synchronous fsync() since `client_lock` is
held for the entire duration of the call (at least till the path leading
up to the reference count check). Asynchronous fsync() needs to be fixed
separately (as that can drop `client_lock`).

Fixes: https://tracker.ceph.com/issues/73298
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit d7eca69a5b887e2b65513411280158d06cdb6b3c)

2 months agomgr/prometheus/test_module: Adding unit-test for new classes 66482/head
NitzanMordhai [Mon, 2 Feb 2026 13:37:34 +0000 (13:37 +0000)]
mgr/prometheus/test_module: Adding unit-test for new classes

Fixes: https://tracker.ceph.com/issues/74149
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit e3de8c9c1f3b83ec41bbdee25a14fe1ff20e239c)

2 months agomgr/prometheus: metrics header for standby module
Nitzan Mordechai [Thu, 11 Dec 2025 06:32:31 +0000 (06:32 +0000)]
mgr/prometheus: metrics header for standby module

PR #65245 dropped the header set for the standby module;
we should still have it.

Fixes: https://tracker.ceph.com/issues/74149
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit 2ef12b2ffa6dd11041e7120febcbd62338ec8cd3)

2 months agomgr/prometheus: Use RLock to fix deadlock in HealthHistory
Nitzan Mordechai [Tue, 9 Dec 2025 12:34:07 +0000 (12:34 +0000)]
mgr/prometheus: Use RLock to fix deadlock in HealthHistory

The HealthHistory.check() method acquires the lock and then calls
HealthHistory.save(), which also tries to acquire the same lock.
With a regular Lock(), the same thread blocks trying to re-acquire it (deadlock).
Switch to RLock to allow nested acquisition by the same thread.
PR #65245 added the locks.

Fixes: https://tracker.ceph.com/issues/74148
Signed-off-by: Nitzan Mordechai <nmordech@ibm.com>
(cherry picked from commit 26394ca06981e87ec7aee75b8467817afa330ba6)
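
The deadlock described in the commit above is a general property of non-reentrant locks. The sketch below uses a hypothetical `History` class as a stand-in for `HealthHistory` (only the `check()` calling `save()` pattern is taken from the commit message) to show why `threading.RLock` resolves it.

```python
import threading


class History:
    """Hypothetical stand-in for HealthHistory; only the locking pattern matters."""

    def __init__(self, lock):
        self.lock = lock
        self.saved = 0

    def save(self):
        with self.lock:      # nested acquisition by the same thread
            self.saved += 1

    def check(self):
        with self.lock:      # outer acquisition
            self.save()


# An RLock may be re-acquired by the thread that already owns it.
h = History(threading.RLock())
h.check()
print(h.saved)  # 1

# A plain Lock cannot be re-acquired while held. A non-blocking attempt
# shows this without hanging; a blocking one would wait forever.
plain = threading.Lock()
with plain:
    print(plain.acquire(blocking=False))  # False
```

An alternative design is to split out an unlocked helper that `check()` calls while already holding the lock; the commit instead chose the smaller change of switching the lock type.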

2 months agomgr/TTLCache: fix PyObject* lifetime management and cleanup logic
Nitzan Mordechai [Tue, 26 Aug 2025 14:30:12 +0000 (14:30 +0000)]
mgr/TTLCache: fix PyObject* lifetime management and cleanup logic

Fix incorrect reference counting and memory retention behavior in TTLCache
when storing PyObject* values.
Previously, TTLCache::insert did not increment the reference count,
and `erase` / `clear` did not correctly decref the values, leading
to use-after-free or leaks depending on usage.

Changes:
- Move Py_INCREF from cacheable_get_python() to TTLCache::insert()
- Add `TTLCache::clear()` method for proper memory cleanup
- Ensure TTLCache::get() returns a new reference
- Fix misuse of std::move on c_str() in PyJSONFormatter

These changes prevent both memory leaks and use-after-free errors when
mgr modules use the cached Python objects logic.

Fixes: https://tracker.ceph.com/issues/68989
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
(cherry picked from commit 7fadf6a8d464456668550e1f85c6c5a86f94bb49)

2 months agomgr/prometheus: prune stale health checks, compress output
Nitzan Mordechai [Wed, 20 Aug 2025 14:50:40 +0000 (14:50 +0000)]
mgr/prometheus: prune stale health checks, compress output

This patch introduces several improvements to the Prometheus module:

 - Introduces `HealthHistory._prune()` to drop stale and inactive health checks.
  Limits the in-memory healthcheck dict to a configurable max_entries (default 1000).
  TTL for stale entries is configurable via `healthcheck_history_stale_ttl` (default 3600s).

 - Refactors HealthHistory.check() to use a unified iteration over known and current checks,
  improving concurrency and minimizing redundant updates.

 - Use cherrypy.tools.gzip instead of manual gzip.compress() for cleaner
  HTTP compression with proper header handling and client negotiation.

 - Introduces new module options:
    - `healthcheck_history_max_entries`

 - Add proper error handling for CherryPy engine startup failures
 - Remove os._exit monkey patch in favor of proper exception handling
 - Remove manual Content-Type header setting (CherryPy handles automatically)

Fixes: https://tracker.ceph.com/issues/68989
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
(cherry picked from commit be28901f361cadcf6bf993276d71f3a79beaae4f)
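
The pruning behaviour described in the commit above could look roughly like the sketch below. It is not the module's code: the entry layout (`active`, `last_seen`), the function name, and the policy of evicting the least recently seen entries when over the cap are all assumptions. Only the defaults (3600 s TTL, 1000 entries) come from the commit message.

```python
import time


def prune(entries, stale_ttl=3600, max_entries=1000, now=None):
    """Return a pruned copy of `entries`.

    `entries` maps a check name to {'active': bool, 'last_seen': float}
    (an assumed layout). Inactive entries older than `stale_ttl` seconds
    are dropped, then the dict is capped at `max_entries`.
    """
    now = time.time() if now is None else now

    # Keep active checks, plus inactive ones seen within the TTL.
    kept = {
        name: info for name, info in entries.items()
        if info["active"] or now - info["last_seen"] <= stale_ttl
    }

    # Enforce the size cap (assumed policy: keep the most recently seen).
    if len(kept) > max_entries:
        newest = sorted(kept.items(),
                        key=lambda item: item[1]["last_seen"],
                        reverse=True)
        kept = dict(newest[:max_entries])
    return kept


checks = {
    "A": {"active": True, "last_seen": 0.0},
    "B": {"active": False, "last_seen": 0.0},
    "C": {"active": False, "last_seen": 9000.0},
}
print(sorted(prune(checks, now=10000.0)))                 # ['A', 'C']
print(sorted(prune(checks, max_entries=1, now=10000.0)))  # ['C']
```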

2 months agoMerge PR #67318 into tentacle
Patrick Donnelly [Tue, 3 Mar 2026 20:53:18 +0000 (15:53 -0500)]
Merge PR #67318 into tentacle

* refs/pull/67318/head:
qa/multisite: use boto3's ClientError in place of assert_raises from tools.py.
qa/multisite: test fixes
qa/multisite: boto3 in tests.py
qa/multisite: zone files use boto3 resource api
qa/multisite: switch to boto3 in multisite test libraries

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months agoMerge PR #67425 into tentacle
Patrick Donnelly [Tue, 3 Mar 2026 20:52:10 +0000 (15:52 -0500)]
Merge PR #67425 into tentacle

* refs/pull/67425/head:
RGW | fix conditional MultiWrite

Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
2 months agoMerge PR #67449 into tentacle
Patrick Donnelly [Tue, 3 Mar 2026 20:51:44 +0000 (15:51 -0500)]
Merge PR #67449 into tentacle

* refs/pull/67449/head:
tentacle: qa/rgw: bucket notifications use pynose

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
2 months agoqa: fix TypeError in delay 67617/head
Jos Collin [Tue, 24 Feb 2026 02:03:13 +0000 (07:33 +0530)]
qa: fix TypeError in delay

random.randrange() raises TypeError for arguments of type 'float'.
So use random.uniform() to fix this.

Fixes: https://tracker.ceph.com/issues/75090
Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit 027400df81aa3bff0def422acfa43eff5f6e08c0)
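
A short illustration of the difference the commit above relies on. The bounds and the `pick_delay` helper are hypothetical; the exact exception raised by `random.randrange()` for non-integer floats depends on the Python version (TypeError on recent releases, ValueError on older ones), so both are caught here.

```python
import random


def pick_delay(low: float, high: float) -> float:
    """Return a random delay in seconds between two float bounds."""
    return random.uniform(low, high)


low, high = 0.5, 2.5  # hypothetical delay bounds

try:
    random.randrange(low, high)
except (TypeError, ValueError) as exc:
    # randrange() only accepts integer arguments.
    print("randrange rejected float bounds:", type(exc).__name__)

delay = pick_delay(low, high)
print(low <= delay <= high)  # True
```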

2 months agolibrbd/mirror: detect trashed snapshots in UnlinkPeerRequest 67583/head
Ilya Dryomov [Tue, 24 Feb 2026 11:46:35 +0000 (12:46 +0100)]
librbd/mirror: detect trashed snapshots in UnlinkPeerRequest

If two instances of UnlinkPeerRequest race with each other (e.g. due
to rbd-mirror daemon unlinking from a previous mirror snapshot and the
user taking another mirror snapshot at the same time), the snapshot that
UnlinkPeerRequest was created for may be in the process of being removed
(which may mean trashed by SnapshotRemoveRequest::trash_snap()) or fully
removed by the time unlink_peer() grabs the image lock.  Because trashed
snapshots weren't handled explicitly, UnlinkPeerRequest could spuriously
fail with EINVAL ("not mirror snapshot" case) instead of the expected
ENOENT ("missing snapshot" case).  This in turn could lead to spurious
ImageReplayer failures with it stopping prematurely.

Fixes: https://tracker.ceph.com/issues/68279
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3596ca077097a4e0ff8e8d05a410c2044332391e)

2 months agolibrbd: don't complete ImageUpdateWatchers::shut_down() prematurely 67581/head
Ilya Dryomov [Wed, 25 Feb 2026 10:37:16 +0000 (11:37 +0100)]
librbd: don't complete ImageUpdateWatchers::shut_down() prematurely

ImageUpdateWatchers::flush() requests aren't tracked with
m_in_flight-like mechanism the way ImageUpdateWatchers::send_notify()
requests are, but in both cases callbacks that represent delayed work
that is very likely to (indirectly) reference ImageCtx are involved.
When the image is getting closed, ImageUpdateWatchers::shut_down() is
called before anything that belongs to ImageCtx is destroyed.  However,
the shutdown can complete prematurely in the face of a pending flush if
one gets sent shortly before CloseRequest is invoked.  The callback for
that flush will then race with CloseRequest and may execute after parts
of or even the entire ImageCtx is destroyed, leading to use-after-free
and various segfaults.

Fixes: https://tracker.ceph.com/issues/75161
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 3ea6ee62aa339d1ad9976fdcc6e207a505f9bf44)

2 months agoos/bluestore: rename row names in RocksDBBlueFSVolumeSelector. 66837/head
Igor Fedotov [Wed, 21 May 2025 08:30:15 +0000 (11:30 +0300)]
os/bluestore: rename row names in RocksDBBlueFSVolumeSelector.

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit a9f591f4e1cb1e364879165250c55cb0f841d64f)

2 months agotest/bluestore: add volume selector tests
Igor Fedotov [Mon, 19 May 2025 19:20:53 +0000 (22:20 +0300)]
test/bluestore: add volume selector tests

Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
(cherry picked from commit 158d1550a021ed60e5ad1c565b247e5b0b6d5946)