git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log

Laura Flores [Thu, 19 Mar 2026 14:45:35 +0000 (09:45 -0500)]

qa/workunits/mgr: account for nvmeof module being "always-on"

Post the merge of this: https://github.com/ceph/ceph/pull/67641

Fixes: https://tracker.ceph.com/issues/71631
Signed-off-by: Laura Flores <lflores@ibm.com>

commit | commitdiff | tree

Laura Flores [Fri, 12 Sep 2025 20:14:30 +0000 (20:14 +0000)]

mgr, qa: clarify module checks in DaemonServer

The current check lumps together modules that are not
enabled and modules that failed to initialize. In this commit,
we reorder the checks:

1. Screen for a module being enabled. If it's not,
   issue an EOPNOTSUPP with instructions on how
   to enable it.

2. Screen for whether a module is active. If a module
   is enabled, then the cluster expects it to
   be active to support commands. If the module
   took too long to initialize though, we will
   catch this and issue an ETIMEDOUT error with
   a link for troubleshooting.

Now, these two separate issues are not grouped
together, and they are checked in the right order.
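
The reordered checks can be illustrated with a minimal Python sketch. It is
not the DaemonServer code (which is C++): the function name `check_module`,
its arguments, and the message strings are hypothetical. Only the order of
the checks and the two error codes come from the description above.

```python
import errno

def check_module(name, enabled_modules, active_modules):
    """Return (errno_code, message) for a command that needs module `name`.

    Hypothetical helper: mirrors only the order of the checks described above.
    """
    # 1. Not enabled: the command is unsupported until the operator enables it.
    if name not in enabled_modules:
        return (errno.EOPNOTSUPP,
                f"Module '{name}' is not enabled; "
                f"use `ceph mgr module enable {name}` to enable it")
    # 2. Enabled but not active: initialization took too long.
    if name not in active_modules:
        return (errno.ETIMEDOUT,
                f"Module '{name}' is enabled but has not finished initializing")
    return (0, "")
```

Checking "enabled" first means a disabled module is never reported as a
timeout, and a slow module is never reported as unsupported.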

Fixes: https://tracker.ceph.com/issues/71631
Signed-off-by: Laura Flores <lflores@ibm.com>

commit | commitdiff | tree

Laura Flores [Thu, 11 Sep 2025 22:13:51 +0000 (22:13 +0000)]

mgr, qa: add `pending_modules` to asock command

Now, the command `ceph tell mgr mgr_status` will show a
"pending_modules" field. This is another way for Ceph operators
to check which modules haven't been initialized yet (in addition
to the health error).

This command was also added to testing scenarios in the workunit.
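
As a rough illustration, a script could read the new field as follows. This
sketch assumes the command prints JSON and that "pending_modules" is a list
of module names; the sample string is made up and real output contains other
fields as well.

```python
import json

def pending_modules(mgr_status_json):
    """Return the modules still pending initialization, or [] if the field is absent."""
    status = json.loads(mgr_status_json)
    return list(status.get("pending_modules", []))

# Made-up sample; not captured from a real cluster.
sample = '{"pending_modules": ["balancer", "telemetry"]}'
print(pending_modules(sample))
```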

Fixes: https://tracker.ceph.com/issues/71631
Signed-off-by: Laura Flores <lflores@ibm.com>

commit | commitdiff | tree

Laura Flores [Tue, 29 Jul 2025 22:46:46 +0000 (22:46 +0000)]

mgr, common, qa, doc: issue health error after max expiration is exceeded

----------------- Enhancement to the Original Fix -----------------

During a mgr failover, the active mgr is marked available if:
  1. The mon has chosen a standby to be active
  2. The chosen active mgr has all of its modules initialized

Now that we've improved the criteria for sending the "active" beacon
by enforcing it to retry initializing mgr modules, we need to account
for extreme cases in which the modules are stuck loading for a very long
time, or even indefinitely. In these extreme cases where the modules might
never initialize, we don't want to delay sending the "active" beacon for
too long. This can result in blocking other important mgr functionality,
such as reporting PG availability in the health status. We want
to avoid sending warnings about PGs being unknown in the health status when
that's not ultimately the problem.

To account for an exceptionally long module loading time, I added a new
configurable `mgr_module_load_expiration`. If we exceed this maximum amount
of time (in ms) allotted for the active mgr to load the mgr modules before declaring
availability, the standby will then proceed to mark itself "available" and
send the "active" beacon to the mon and unblock other critical mgr functionality.

If this happens, a health error will be issued at this time, indicating
which mgr modules got stuck initializing (See src/mgr/PyModuleRegistry.cc). The
idea is to unblock the rest of the mgr's critical functionality while making it
clear to Ceph operators that some modules are unusable.
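
The decision described above can be sketched in Python. This is a simplified
model, not the mgr code: `beacon_decision` and its parameters are
hypothetical names, and `expiration_ms` stands in for the
`mgr_module_load_expiration` value.

```python
def beacon_decision(pending, elapsed_ms, expiration_ms):
    """Return (send_beacon, stuck_modules).

    send_beacon   -- True when the mgr should declare itself available
    stuck_modules -- modules to name in the health error (empty if none)
    """
    if not pending:
        return True, []                 # everything loaded: no health error
    if elapsed_ms > expiration_ms:
        return True, sorted(pending)    # stop waiting, report stuck modules
    return False, []                    # keep waiting and recheck later
```

The three outcomes correspond to normal loading, an expired wait with a
health error, and an acceptable delay that is still being waited out.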

--------------------- Integration Testing --------------------

The workunit was rewritten so it tests for these scenarios:

1. Normal module loading behavior (no health error should be issued)
2. Acceptable delay in module loading behavior (no health error should be
   issued)
3. Unacceptable delay in module loading behavior (a health error should be
   issued)

--------------------- Documentation --------------------

This documentation explains the "Module failed to initialize"
cluster error.

Users are advised to try failing over
the mgr to reboot the module initialization process,
then if the error persists, file a bug report. I decided
to write it this way instead of providing more complex
debugging tips such as advising to disable some mgr modules
since every case will be different depending on which modules
failed to initialize.

In the bug report, developers can ask for the health detail
output to narrow down which module is causing a bottleneck,
and then ask the user to try disabling certain modules until
the mgr is able to fully initialize.

Fixes: https://tracker.ceph.com/issues/71631
Signed-off-by: Laura Flores <lflores@ibm.com>

commit | commitdiff | tree

Laura Flores [Fri, 25 Apr 2025 22:11:19 +0000 (22:11 +0000)]

mgr: ensure that all modules have started before advertising active mgr

----------------- Explanation of Problem ----------------

When the mgr is restarted or failed over via `ceph mgr fail` or during an
upgrade, mgr modules sometimes take longer to start up (this includes
loading their class, commands, and module options, and being removed
from the `pending_modules` map structure). This startup delay can happen
due to a cluster's specific hardware or if a code bottleneck is triggered in
a module’s `serve()` function (each mgr module has a `serve()` function that
performs initialization tasks right when the module is loaded).

When this startup delay occurs, any mgr module command issued against the
cluster around the same time fails with an error saying that the command is not
supported:
```
$ ceph mgr fail; ceph fs volume ls
Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
Module 'volumes' is not enabled/loaded (required by command 'fs volume ls'): use `ceph mgr module enable volumes` to enable it
```

We should try to lighten any bottlenecks in the mgr module `serve()`
functions wherever possible, but the root cause of this failure is that the
mgr sends a beacon to the mon too early, indicating that it is active before
the module loading has completed. Specifically, some of the mgr modules
have loaded their class but have not yet been deleted from the `pending_modules`
structure, indicating that they have not finished starting up.

--------------------- Explanation of Fix  --------------------

This commit improves the criteria for sending the “active” beacon to the mon so
the mgr does not signal that it’s active too early. We do this through the following additions:

1. A new context `ActivePyModules::recheck_modules_start` that will be set if not all modules
   have finished startup.

2. A new function `ActivePyModules::check_all_modules_started()` that checks if modules are
   still pending startup; if all have started up (`pending_modules` is empty), then we send
   the beacon right away. But if some are still pending, we pass the beacon task on to the new
   recheck context `ActivePyModules::recheck_modules_start` so we know to send the beacon later.

3. Logic in ActivePyModules::start_one() that only gets triggered if the modules did not all finish
   startup the first time we checked. We know this is the case if the new recheck context
   `recheck_modules_start` was set from `nullptr`. The beacon is only sent once `pending_modules` is
   confirmed to be empty, which means that all the modules have started up and are ready to support commands.

4. Adjustment of when the booleans `initializing` and `initialized` are set. These booleans come into play in
   MgrStandby::send_beacon() when we check that the active mgr has been initialized (thus, it is available).
   We only send the beacon when this boolean is set. Currently, we set these booleans at the end of Mgr::init(),
   which means that it gets set early before `pending_modules` is clear. With this adjustment, the bools are set
   only after we check that all modules have started up. The send_beacon code is triggered on mgr failover AND on
   every Mgr::tick(), which occurs by default every two seconds. If we don’t adjust when these bools are set, we
   only fix the mgr failover part, but the mgr still sends the beacon too early via Mgr::tick(). Below is the relevant
   code from MgrStandby::send_beacon(), which is triggered in Mgr::background_init() AND in Mgr::tick():
```
  // Whether I think I am available (request MgrMonitor to set me
  // as available in the map)
  bool available = active_mgr != nullptr && active_mgr->is_initialized();

  auto addrs = available ? active_mgr->get_server_addrs() : entity_addrvec_t();
  dout(10) << "sending beacon as gid " << monc.get_global_id() << dendl;

```
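
The deferral logic in points 1-4 can be modelled with a small Python toy. It
is only a sketch under assumptions: the real code is C++, `start_one` here
simply marks a module as finished, and the `send_beacon` callback is a
stand-in for the beacon to the mon. The attribute names reuse those from the
description above.

```python
class ModuleStartTracker:
    """Toy model: defer the 'active' beacon until no modules are pending."""

    def __init__(self, modules, send_beacon):
        self.pending_modules = set(modules)
        self.send_beacon = send_beacon      # callback standing in for the beacon
        self.recheck_modules_start = None   # stays None unless a recheck is needed
        self.initialized = False

    def _finish(self):
        self.initialized = True             # set only once nothing is pending
        self.send_beacon()

    def check_all_modules_started(self):
        if not self.pending_modules:
            self._finish()                  # all started: send right away
        else:
            self.recheck_modules_start = self._finish  # defer the beacon

    def start_one(self, name):
        self.pending_modules.discard(name)
        # Fire the deferred beacon only once nothing is pending.
        if self.recheck_modules_start and not self.pending_modules:
            callback, self.recheck_modules_start = self.recheck_modules_start, None
            callback()
```

Because `initialized` is set in the same step that sends the beacon, a
periodic tick that checks `initialized` cannot advertise the mgr early.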

--------------------- Reproducing the Bug ----------------------

At face value, this issue is not deterministically reproducible since it
can depend on environmental factors or specific cluster workloads.
However, I was able to deterministically reproduce it by injecting a
bottleneck into the balancer module:
```
diff --git a/src/pybind/mgr/balancer/module.py b/src/pybind/mgr/balancer/module.py
index d12d69f..91c83fa8023 100644
--- a/src/pybind/mgr/balancer/module.py
+++ b/src/pybind/mgr/balancer/module.py
@@ -772,10 +772,10 @@ class Module(MgrModule):
                     self.update_pg_upmap_activity(plan)  # update pg activity in `balancer status detail`
                 self.optimizing = False
+                # causing a bottleneck
+                for i in range(0, 1000):
+                    for j in range (0, 1000):
+                        x = i + j
+                        self.log.debug("hitting the bottleneck in the balancer module")
             self.log.debug('Sleeping for %d', sleep_interval)
             self.event.wait(sleep_interval)
             self.event.clear()
```

Then, the error reproduces every time by running:
```
$ ./bin/ceph mgr fail; ./bin/ceph telemetry show
Error ENOTSUP: Warning: due to ceph-mgr restart, some PG states may not be up to date
Module 'telemetry' is not enabled/loaded (required by command 'telemetry show'): use `ceph mgr module enable telemetry` to enable it
```

With this fix, the active mgr is marked as "initialized" only after all
the modules have started up, and this error goes away. The command may
take a bit longer to execute depending on the extent of the delay.

---------------------- Integration Testing ---------------------

This commit adds a dev-only config that can inject a longer
loading time into the mgr module loading sequence so we can
simulate this scenario in a test.

The config is 0 ms by default since we do not add any delay
outside of testing scenarios. The config can be adjusted
with the following command:
  `ceph config set mgr mgr_module_load_delay <ms>`

A second dev-only config also allows you to specify which
module you want to be delayed in loading time. You may change
this with the following command:
  `ceph config set mgr mgr_module_load_delay_name <module name>`

The workunit added here tests a simulated slow loading module
scenario to ensure that this case is properly handled.
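
A minimal Python sketch of how such an injected delay could behave. The
function `load_module` and its parameters are hypothetical; `delay_ms` and
`delay_name` stand in for the two configs above, and the assumption that the
delay applies only to the named module is not confirmed by this message.

```python
import time

def load_module(name, delay_ms=0, delay_name="", sleep=time.sleep):
    """Pretend to load module `name`, sleeping first if it is the delayed one.

    Returns the injected delay in seconds (0.0 when no delay applies).
    """
    delay_s = delay_ms / 1000.0 if (delay_ms > 0 and name == delay_name) else 0.0
    if delay_s:
        sleep(delay_s)   # simulated slow startup, for testing only
    return delay_s
```

Passing `sleep` as a parameter lets a test substitute a fake and avoid
actually waiting.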

--------------------- Documentation --------------------

The new documentation describes the three existing mgr states so Ceph
operators can better interpret their Ceph status output.

Fixes: https://tracker.ceph.com/issues/71631
Signed-off-by: Laura Flores <lflores@ibm.com>

commit | commitdiff | tree

Gil Bregman [Wed, 11 Mar 2026 17:46:14 +0000 (19:46 +0200)]

Merge pull request #67736 from gbregman/main

mgr/dashboard: Remove the clear-alerts parameter from NVMeoF CLI

commit | commitdiff | tree

Redouane Kachach [Wed, 11 Mar 2026 15:58:50 +0000 (16:58 +0100)]

Merge pull request #67431 from adk3798/cephadm-test-iscsi-ignorelist-pg-degraded

qa/rbd/iscsi/cluster: ignore PG_DEGRADED warning

Reviewed-by: Redouane Kachach <rkachach@redhat.com>

commit | commitdiff | tree

Redouane Kachach [Wed, 11 Mar 2026 15:57:55 +0000 (16:57 +0100)]

Merge pull request #67428 from adk3798/test-cephadm-timeout-ignore-timeout

qa/cephadm: ignore CEPHADM_HOST_TIMEOUT_ERROR in timeout test

Reviewed-by: Redouane Kachach <rkachach@redhat.com>

commit | commitdiff | tree

Redouane Kachach [Wed, 11 Mar 2026 15:56:23 +0000 (16:56 +0100)]

Merge pull request #67393 from adk3798/cephadm-grafana-sample-fixup

cephadm/samples: don't specify localhost as grafana addr

Reviewed-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

Gil Bregman [Tue, 10 Mar 2026 16:37:12 +0000 (18:37 +0200)]

mgr/dashboard: Remove the clear-alerts parameter from NVMeoF CLI

Fixes: https://tracker.ceph.com/issues/74969
Signed-off-by: Gil Bregman <gbregman@il.ibm.com>

commit | commitdiff | tree

Shilpa Jagannath [Tue, 10 Mar 2026 16:32:46 +0000 (09:32 -0700)]

Merge pull request #66203 from BBoozmen/wip-oozmen-73799

RGW/multisite: fix bucket-full-sync infinite loop caused by stale bucket_list_result reuse

commit | commitdiff | tree

NitzanMordhai [Tue, 10 Mar 2026 13:19:50 +0000 (15:19 +0200)]

Merge pull request #67640 from NitzanMordhai/wip-nitzan-suite-rados-singleton-bluestore-missing-mds

test: rados singleton-bluestore missing mds for cephtool tests

commit | commitdiff | tree

NitzanMordhai [Tue, 10 Mar 2026 13:19:40 +0000 (15:19 +0200)]

Merge pull request #67649 from NitzanMordhai/wip-nitzan-self-test-influx-set-hostname-plugin-test

qa/tasks/mgr: test_module_selftest set influx hostname to avoid warni…

commit | commitdiff | tree

NitzanMordhai [Tue, 10 Mar 2026 13:17:12 +0000 (15:17 +0200)]

Merge pull request #67097 from anthonyeleven/more-osd-metadata-metrics

src/pybind/mgr/prometheus: Add five OSD metadata metrics to module.py

commit | commitdiff | tree

Matan Breizman [Tue, 10 Mar 2026 08:26:18 +0000 (10:26 +0200)]

Merge pull request #67567 from xxhdx1985126/wip-seastore-objectdatahandler-zero-bug

crimson/os/seastore/object_data_handler: avoid reserving zero-length regions

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>

commit | commitdiff | tree

Matan Breizman [Tue, 10 Mar 2026 08:25:50 +0000 (10:25 +0200)]

Merge pull request #67697 from xxhdx1985126/wip-73820

crimson/osd/pg: drop inappropriate assertions

Reviewed-by: Matan Breizman <mbreizma@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>

commit | commitdiff | tree

Afreen Misbah [Tue, 10 Mar 2026 07:59:40 +0000 (13:29 +0530)]

Merge pull request #67667 from rhcs-dashboard/consumption-chart-fixes

mgr/dashboard: fix consumption chart units

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Aashish Sharma <aasharma@redhat.com>

commit | commitdiff | tree

Venky Shankar [Tue, 10 Mar 2026 06:36:42 +0000 (12:06 +0530)]

Merge pull request #65881 from edwinzrodriguez/ceph-wip-73427

mds: Remove unnecessary std::move in MDSRank

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Dhairya Parmar <dparmar@redhat.com>

commit | commitdiff | tree

bluikko [Tue, 10 Mar 2026 05:31:25 +0000 (12:31 +0700)]

Merge pull request #66639 from bluikko/wip-doc-radosgw-dedup-fixes

doc/radosgw: Improve language, formatting in s3_objects_dedup.rst

commit | commitdiff | tree

Shraddha Agrawal [Tue, 10 Mar 2026 05:21:13 +0000 (10:51 +0530)]

Merge pull request #67726 from shraddhaag/wip-shraddhaag-enable-crimson-basic

qa/suites/crimson-rados: enable cephadm tests

commit | commitdiff | tree

Afreen Misbah [Mon, 9 Mar 2026 20:20:06 +0000 (01:50 +0530)]

Merge pull request #67639 from rhcs-dashboard/empty-state-bugfix

mgr/dashboard: Use illustration image for empty state table

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Abhishek Desai <abhishek.desai1@ibm.com>
Reviewed-by: pujaoshahu <pshahu@redhat.com>

commit | commitdiff | tree

Afreen Misbah [Mon, 9 Mar 2026 20:14:50 +0000 (01:44 +0530)]

Merge pull request #66254 from afreen23/remove-e2e-yaml

qa: Remove cephadm e2e tests from teuthology

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Laura Flores <lflores@ibm.com>

commit | commitdiff | tree

Matan Breizman [Mon, 9 Mar 2026 14:05:36 +0000 (16:05 +0200)]

Merge pull request #67589 from xxhdx1985126/wip-seastore-background-trans-cc-opt-new

crimson/os/seastore/cache: TRIM_DIRTY/CLEANER_* transactions won't invalidate other transactions anymore

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Matan Breizman <mbreizma@redhat.com>

commit | commitdiff | tree

Kefu Chai [Mon, 9 Mar 2026 12:11:11 +0000 (20:11 +0800)]

Merge pull request #67257 from tchaikov/wip-doc-mgr-cli

doc: update mgr module command documentation for per-module registries

Reviewed-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

Xuehan Xu [Wed, 4 Mar 2026 15:12:51 +0000 (23:12 +0800)]

crimson/os/seastore/transaction: should consider non-aligned remapped
extents when updating paddrs for TRIM_DIRTY/CLEANER transactions

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Tue, 24 Feb 2026 07:35:58 +0000 (15:35 +0800)]

crimson/os/seastore/lba: TRIM/CLEANER trans to adjust deltas of
LBALeafNodes when committing them.

This is to deal with the following scenario:
1. A client transaction modifies a field of the LBALeafNode value other
than the pladdr;
2. A TRIM/CLEANER transaction modifies the pladdr for the same laddr_t
concurrently

In the old approach, the client trans may override the pladdr with the
outdated value after the TRIM/CLEANER transaction commits
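
The hazard can be illustrated with a toy Python model that merges per-field
changes instead of overwriting the whole value. This is not the SeaStore
mechanism: the dict layout, the `refcount` field, and the function name are
hypothetical, chosen only to show why a whole-value write would restore a
stale pladdr.

```python
def commit_client_update(stored, client_snapshot, client_value):
    """Apply only the fields the client transaction actually changed.

    stored          -- current mapping value (already updated by TRIM/CLEANER)
    client_snapshot -- the value the client transaction started from
    client_value    -- the value the client transaction wants to commit
    """
    merged = dict(stored)
    for field, new in client_value.items():
        if client_snapshot.get(field) != new:   # a real client-side change
            merged[field] = new
    return merged

stored = {"pladdr": 200, "refcount": 1}      # cleaner moved the extent
snapshot = {"pladdr": 100, "refcount": 1}    # what the client read earlier
wanted = {"pladdr": 100, "refcount": 2}      # client changed refcount only
print(commit_client_update(stored, snapshot, wanted))
```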

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Wed, 11 Feb 2026 06:50:27 +0000 (14:50 +0800)]

crimson/os/seastore: correct the exception condition when merging
rewritten fixed kv nodes

Fixes: https://tracker.ceph.com/issues/74798
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Sun, 1 Mar 2026 04:42:49 +0000 (12:42 +0800)]

crimson/os/seastore/btree: make updates of lba leaf nodes ptrs
synchronous with contents updates

Since we need to merge the content of lba leaf nodes when committing
trim/cleaner transactions, and we rely on the child ptrs to determine
whether to modify mappings of pending leaf nodes, we must make sure
the ptr updates and node content updates are synchronous.

See LBALeafNode::merge_content_to() for detail

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Thu, 11 Dec 2025 08:11:26 +0000 (16:11 +0800)]

crimson/os/seastore/extent_pinboard: reset 2q_state when removing
extents

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Mon, 1 Dec 2025 09:44:45 +0000 (17:44 +0800)]

crimson/os/seastore/transaction_manager: block client transactions if
they conflict with rewriting transactions until the rewriting
transactions finish

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Mon, 1 Dec 2025 09:41:21 +0000 (17:41 +0800)]

crimson/os/seastore/cached_extent: treat extents under rewrite io as
stable too

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Wed, 26 Nov 2025 08:39:37 +0000 (16:39 +0800)]

crimson/os/seastore: disable linked tree node operations when committing
rewriting transactions

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Fri, 21 Nov 2025 13:01:15 +0000 (21:01 +0800)]

crimson/os/seastore/cache: rewrite transactions don't invalidate other
transactions anymore

Fixes: https://tracker.ceph.com/issues/73070
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Fri, 21 Nov 2025 08:49:00 +0000 (16:49 +0800)]

crimson/os/seastore/cache: drop unused last_commit

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Fri, 21 Nov 2025 07:20:24 +0000 (15:20 +0800)]

crimson/os/seastore/cache: since extent committer will also set
CachedExtent::prior_poffset, remove invalid asserts

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Fri, 21 Nov 2025 07:13:11 +0000 (15:13 +0800)]

crimson/os/seastore/cache: unlink mutated extents from the stable
extents' transaction views when committing or invalidating the
transaction

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Thu, 20 Nov 2025 09:16:57 +0000 (17:16 +0800)]

crimson/os/seastore/cache: add facilities to synchronize data and states
between rewriting transactions and others when committing

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Wed, 19 Nov 2025 10:21:52 +0000 (18:21 +0800)]

crimson/os/seastore/seastore_types: define rewriting transactions

These are the transactions that only rewrite extents and mutate lba
nodes, e.g. TRIM_DIRTY and CLEANER transactions

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Tue, 14 Oct 2025 02:48:09 +0000 (10:48 +0800)]

crimson/os/seastore/async_cleaner: renew backref cursors when they are
generated by backref retrieval transactions and used by reclaim
transactions

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Sat, 11 Oct 2025 02:39:26 +0000 (10:39 +0800)]

crimson/os/seastore/lba_manager: make sure alloc_extents return viewable
mappings

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Tue, 14 Oct 2025 03:08:53 +0000 (11:08 +0800)]

crimson/os/seastore/async_cleaner: avoid its header dependence on
backref_manager.h

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Tue, 14 Oct 2025 03:05:19 +0000 (11:05 +0800)]

crimson/os/seastore/btree_types: BtreeCursors don't hold local copies of
lba/backref values

Since lba mapping values might change during the execution of
client transactions once we allow background transactions to be
submitted without invalidating client ones, we want to prevent other
components that use lba/backref mappings from keeping local copies, to
avoid potential problems

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Xuehan Xu [Mon, 6 Oct 2025 04:00:07 +0000 (12:00 +0800)]

crimson/os/seastore/lba_mapping: don't allow classes above
TransactionManager to retrieve lba mappings' paddrs.

Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Aashish Sharma [Thu, 5 Mar 2026 04:30:02 +0000 (10:00 +0530)]

mgr/dashboard: fix consumption chart units

Fixes: https://tracker.ceph.com/issues/75278
Fixes: https://tracker.ceph.com/issues/75319
Signed-off-by: Aashish Sharma <aasharma@redhat.com>

commit | commitdiff | tree

Matan Breizman [Mon, 9 Mar 2026 09:52:36 +0000 (11:52 +0200)]

Merge pull request #67457 from Matan-B/wip-matanb-reactor-type

crimson/admin/osd_admin: introduce reactor_backend command

Reviewed-by: Mohit Agrawal <moagrawa@redhat.com>

commit | commitdiff | tree

Venky Shankar [Mon, 9 Mar 2026 09:33:47 +0000 (15:03 +0530)]

Merge pull request #67689 from kotreshhr/cephfs-mirror-remove-extra-wait

tools/cephfs_mirror: Remove additional wait in pop_dataq_entry

Reviewed-by: Venky Shankar <vshankar@redhat.com>

commit | commitdiff | tree

Venky Shankar [Mon, 9 Mar 2026 07:52:43 +0000 (13:22 +0530)]

Merge pull request #65858 from sajibreadd/wip-71167-scrub-improvement

mds: scrub pins more inodes than the mds_cache_memory_limit

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>

commit | commitdiff | tree

Ville Ojamo [Mon, 15 Dec 2025 17:04:18 +0000 (00:04 +0700)]

doc/radosgw: Improve language, formatting in s3_objects_dedup.rst

Use title case in section titles consistently.

Use the usual section title formatting syntax.

Indent all items in an unordered list consistently.

Wrap lines consistently at column 80.

Try to improve language and grammar.

Signed-off-by: Ville Ojamo <git2233+ceph@ojamo.eu>

commit | commitdiff | tree

Shraddha Agrawal [Mon, 23 Feb 2026 07:21:57 +0000 (12:51 +0530)]

qa/suites/crimson-rados: enable cephadm tests

This commit enables cephadm tests in the crimson suites. To
do the same, we make use of --osd-type flag to deploy crimson
OSDs.

Fixes: https://tracker.ceph.com/issues/71946
Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>

commit | commitdiff | tree

Kefu Chai [Mon, 9 Feb 2026 02:09:14 +0000 (10:09 +0800)]

doc: update mgr module command documentation for per-module registries

Update documentation to reflect the new per-module command registry
pattern introduced in PR #66467. The old global CLICommand decorators
have been replaced with module-specific registries.

Changes:
- doc/mgr/modules.rst: Rewrite CLICommand section with setup guide,
  update all examples to use AntigravityCLICommand pattern
- src/pybind/mgr/object_format.py: Add note explaining per-module
  registries and update all decorator examples
- doc/dev/developer_guide/dash-devel.rst: Update dashboard plugin
  examples to use DBCLICommand

All examples now correctly show:
- Creating registry with CLICommandBase.make_registry_subtype()
- Using module-specific decorator names (e.g., @StatusCLICommand.Read)
- Setting CLICommand class attribute for framework registration
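
The per-module registry pattern can be illustrated with a self-contained toy.
`ToyCLICommandBase` below is a stand-in written for this example, not the
Ceph class: the argument to `make_registry_subtype`, the `Read` signature,
and the "status show" command are all assumptions.

```python
class ToyCLICommandBase:
    """Toy stand-in for a per-module command registry (not the Ceph class)."""
    COMMANDS = {}

    @classmethod
    def make_registry_subtype(cls, name):
        # Each subtype gets its own COMMANDS dict, so modules do not share state.
        return type(name, (cls,), {"COMMANDS": {}})

    @classmethod
    def Read(cls, prefix):
        def decorator(func):
            cls.COMMANDS[prefix] = func     # register in this subtype only
            return func
        return decorator

StatusCLICommand = ToyCLICommandBase.make_registry_subtype("StatusCLICommand")
OtherCLICommand = ToyCLICommandBase.make_registry_subtype("OtherCLICommand")

class StatusModule:
    CLICommand = StatusCLICommand   # tells the (toy) framework which registry to use

    @StatusCLICommand.Read("status show")
    def show(self):
        return "ok"
```

Commands registered through `StatusCLICommand` do not appear in
`OtherCLICommand` or in the base class, which is the point of replacing a
single global registry.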

Signed-off-by: Kefu Chai <k.chai@proxmox.com>

commit | commitdiff | tree

Matan Breizman [Sun, 22 Feb 2026 11:37:36 +0000 (11:37 +0000)]

crimson/admin/osd_admin: introduce reactor_backend command

follow-up to: https://github.com/ceph/ceph/pull/67165

Blocked by: https://github.com/scylladb/seastar/pull/3266

Signed-off-by: Matan Breizman <mbreizma@redhat.com>

commit | commitdiff | tree

Gabriel Benhanokh [Sat, 7 Mar 2026 13:55:17 +0000 (15:55 +0200)]

Merge pull request #65423 from benhanokh/split_head_simple

rgw/dedup: split-head mechanism

commit | commitdiff | tree

Afreen Misbah [Sat, 7 Mar 2026 09:38:26 +0000 (15:08 +0530)]

Merge pull request #67541 from afreen23/resiliency-card

mgr/dashboard: Add resiliency card

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Aashish Sharma <aasharma@redhat.com>

commit | commitdiff | tree

Avan [Sat, 7 Mar 2026 06:23:13 +0000 (11:53 +0530)]

Merge pull request #67435 from avanthakkar/qos-clusterwide

mgr/smb: QoS bandwidth pass-through and burst_mult parameter

Reviewed-by: Anoop C S <anoopcs@cryptolab.net>
Reviewed-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

Zack Cerza [Fri, 6 Mar 2026 23:08:01 +0000 (16:08 -0700)]

Merge pull request #66181 from anshuman-agarwala/ppc64-ci

[run-make] Added flag for Dashboard and WError

commit | commitdiff | tree

Patrick Donnelly [Fri, 6 Mar 2026 17:54:12 +0000 (12:54 -0500)]

Merge PR #67630 into main

* refs/pull/67630/head:
.github: limit what CI checks run for only doc/qa changes

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Fri, 6 Mar 2026 17:43:41 +0000 (12:43 -0500)]

Merge PR #67682 into main

* refs/pull/67682/head:
qa: remove ceph-deploy configs with no effect
qa: remove long retired ceph-deploy

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Yuri Weinstein <yweins@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>

commit | commitdiff | tree

Xuehan Xu [Fri, 6 Mar 2026 16:51:14 +0000 (00:51 +0800)]

crimson/osd/pg: drop inappropriate assertions

The handler of interruptions may be scheduled long after the
interruptions happen, when the world may have changed completely.
So the assertions about temporary states don't seem appropriate
in the handlers of those interruptions.

Fixes: https://tracker.ceph.com/issues/73820
Signed-off-by: Xuehan Xu <xuxuehan@qianxin.com>

commit | commitdiff | tree

Shraddha Agrawal [Tue, 3 Mar 2026 06:47:04 +0000 (12:17 +0530)]

.github: limit what CI checks run for only doc/qa changes

Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>

commit | commitdiff | tree

Yuval Lifshitz [Fri, 6 Mar 2026 15:54:55 +0000 (17:54 +0200)]

Merge pull request #67115 from ShreeJejurikar/wip-74491

rgw: Add bucket logging pytest test suite

commit | commitdiff | tree

Edwin Rodriguez [Wed, 8 Oct 2025 16:13:07 +0000 (12:13 -0400)]

mds: Remove unnecessary std::move on scrub_summary in MDSRank::get_task_status

Fixes: https://tracker.ceph.com/issues/73427
Signed-off-by: Edwin Rodriguez <edwin.rodriguez1@ibm.com>

commit | commitdiff | tree

Igor Fedotov [Fri, 6 Mar 2026 13:57:44 +0000 (16:57 +0300)]

Merge pull request #65513 from gardran/wip-gardran-plogpg-optimize

osd/PrimaryLogPG: avoid redundant container clones and lookups

Reviewed-by: Kefu Chai <tchaikov@gmail.com>

commit | commitdiff | tree

Ilya Dryomov [Fri, 6 Mar 2026 12:41:53 +0000 (13:41 +0100)]

Merge pull request #66368 from adamemerson/wip-neorados-leak

neorados: Fix Neorados CephContext leak and prevent future ones

Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Bill Scales <bill_scales@uk.ibm.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

benhanokh [Mon, 23 Feb 2026 09:26:17 +0000 (11:26 +0200)]

rgw/dedup split-head
Simplified check for shared-tail-objects.
Added test for copy after dedup
Use tail-ioctx when removing newly created tail-head

Signed-off-by: benhanokh <gbenhano@redhat.com>

commit | commitdiff | tree

Gabriel BenHanokh [Mon, 1 Dec 2025 06:48:57 +0000 (06:48 +0000)]

rgw/dedup split-head
Limit Split-Head to RGW-Objects without existing tail-objects (i.e.
obj_size <= 4MB)

Signed-off-by: benhanokh <gbenhano@redhat.com>

commit | commitdiff | tree

Gabriel BenHanokh [Mon, 15 Sep 2025 19:01:02 +0000 (19:01 +0000)]

rgw/dedup: split-head mechanism
Split head object into 2 objects - one with attributes and no data and
a new tail-object with only data.
The new tail object will be deduped (unlike the head objects, which can't
be deduped)
We will split head for objects with size 16MB or less
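
A toy Python sketch of the split described above. Objects are modelled as
plain dicts; the function name, the dict layout, and the byte value chosen
for "16MB" are assumptions, and none of this reflects the actual RGW data
structures.

```python
MAX_OBJ_SIZE_FOR_SPLIT = 16 * 1024 * 1024  # assumed byte value for "16MB"

def split_head(head, max_size=MAX_OBJ_SIZE_FOR_SPLIT):
    """Return (new_head, tail); tail is None when the object is not split."""
    data = head["data"]
    if not data or len(data) > max_size:
        return head, None                       # nothing to split, or too large
    new_head = {"attrs": dict(head["attrs"]), "data": b""}  # attributes only
    tail = {"data": data}                       # data only: the dedup candidate
    return new_head, tail
```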

A few extra improvements:
Skip objects created by server-side-copy
Use reftag for comp-swap instead of manifest
Skip shared-manifest objects after reading attributes
Made max_obj_size_for_split and min_obj_size_for_dedup config values in
rgw.yaml.in

refined test: validate size after dedup
TBD: add rados ls -l to report object size on-bulk to speedup the process
improved tests - verify refcount are working, validate objects, remove
duplicates and then verify the last remaining object making sure it was
not deleted

Signed-off-by: Gabriel BenHanokh <gbenhano@redhat.com>

commit | commitdiff | tree

Naveen Naidu [Fri, 6 Mar 2026 11:39:09 +0000 (17:09 +0530)]

rados/src/common: use /etc/ceph_version to append vendor release version to the version string

commit | commitdiff | tree

Kotresh HR [Fri, 6 Mar 2026 07:28:38 +0000 (12:58 +0530)]

tools/cephfs_mirror: Remove additional wait in pop_dataq_entry

An additional wait has sneaked in while popping a job from
syncm's data_q. When the conditional wait was converted to
timed wait as part of f6a6e781b887b01a640d6321a2c085577d9ba07e,
this should have been removed. The extra wait causes no
harm in most of the workflow but might cause issues when
the mirror daemon is stopped. So it should be removed.

This patch removes the extra cond wait
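
For illustration, a queue pop with a single timed wait might look like the
Python sketch below. The real code is C++ in cephfs_mirror; the class and
method names here are hypothetical, and the message above does not say
exactly how the extra wait misbehaves on shutdown.

```python
import threading
from collections import deque

class DataQueue:
    def __init__(self):
        self._cond = threading.Condition()
        self._q = deque()
        self._stopping = False

    def push(self, job):
        with self._cond:
            self._q.append(job)
            self._cond.notify()

    def stop(self):
        with self._cond:
            self._stopping = True
            self._cond.notify_all()

    def pop(self, timeout=1.0):
        """Return the next job, or None on timeout or shutdown."""
        with self._cond:
            # One timed wait only; no further wait follows it.
            self._cond.wait_for(lambda: self._q or self._stopping, timeout)
            if self._stopping or not self._q:
                return None
            return self._q.popleft()
```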

Introduced-by: f6a6e781b887b01a640d6321a2c085577d9ba07e
Signed-off-by: Kotresh HR <khiremat@redhat.com>

commit | commitdiff | tree

Afreen Misbah [Tue, 3 Mar 2026 16:45:48 +0000 (22:15 +0530)]

mgr/dashboard: Fix snapshot Api firing twice

- two subs being created

Signed-off-by: Afreen Misbah <afreen@ibm.com>

commit | commitdiff | tree

Afreen Misbah [Thu, 26 Feb 2026 01:38:44 +0000 (07:08 +0530)]

mgr/dashboard: Add data resiliency panel

- adds table to show PG states and counts
- adds recovery IO, read/write IO

Signed-off-by: Afreen Misbah <afreen@ibm.com>

commit | commitdiff | tree

Afreen Misbah [Wed, 25 Feb 2026 15:18:57 +0000 (20:48 +0530)]

mgr/dashboard: Add data resiliency card

- shows data resiliency status
- shows active+clean PGs donut chart
- shows reasons for missing active+clean PGs

Fixes https://tracker.ceph.com/issues/75067

Signed-off-by: Afreen Misbah <afreen@ibm.com>

commit | commitdiff | tree

bluikko [Fri, 6 Mar 2026 05:17:08 +0000 (12:17 +0700)]

Merge pull request #67668 from bluikko/wip-doc-crimson-fix-and-improvements

doc: Fix link and improve Crimson doc

commit | commitdiff | tree

Ville Ojamo [Thu, 5 Mar 2026 06:02:55 +0000 (13:02 +0700)]

doc: Fix link and improve Crimson doc

Fix Seastar external link that was not working.
Capitalize consistently as Crimson, SeaStore in text.
Fix typos including in a label and in a ref using it.
Wrap text at column 80.
Remove unused highlight directive.
Fix article and hyphenation.
Try to reduce the number of commas in text and improve language.
Use already existing label and ref instead of section title for link.
Use confval role for configuration keys in text.
Use an autoclass reference instead of hardcoding URL.
Trim spaces at end of lines and convert tabs to spaces.
Use a colon instead of a hyphen pretending to be an em dash.

Signed-off-by: Ville Ojamo <git2233+ceph@ojamo.eu>

commit | commitdiff | tree

Patrick Donnelly [Fri, 6 Mar 2026 02:07:08 +0000 (21:07 -0500)]

qa: remove ceph-deploy configs with no effect

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Patrick Donnelly [Thu, 5 Mar 2026 20:50:16 +0000 (15:50 -0500)]

qa: remove long retired ceph-deploy

Long live cephadm!

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Avan Thakkar [Mon, 2 Mar 2026 09:36:38 +0000 (15:06 +0530)]

qa/workunits/smb: update QoS tests for burst multipliers and bandwidth units

Signed-off-by: Avan Thakkar <athakkar@redhat.com>

commit | commitdiff | tree

Avan Thakkar [Mon, 2 Mar 2026 08:44:27 +0000 (14:14 +0530)]

doc/mgr/smb: update QoS doc with burst multipliers and bandwidth units

- Add read/write_burst_mult parameters (10-100 range, 15 default)
- Document human-readable bandwidth units (K,M,G,T)
- Add burst behavior explanation
- Remove obsolete delay parameters

Signed-off-by: Avan Thakkar <athakkar@redhat.com>
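
The parameters listed in this commit can be illustrated with a small Python sketch. It is not the mgr/smb implementation: the function names `parse_bandwidth` and `check_burst_mult` are hypothetical, and treating K, M, G and T as powers of 1024 is an assumption that the commit message does not confirm. Only the 10-100 range and the default of 15 come from the text above.

```python
# Assumption: binary multipliers. The commit message only names the
# suffixes (K, M, G, T), not their exact meaning.
_UNIT_FACTORS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_bandwidth(text: str) -> int:
    """Convert a value such as '10M' to a plain integer (hypothetical helper)."""
    text = text.strip().upper()
    if not text:
        raise ValueError("empty bandwidth value")
    factor = _UNIT_FACTORS.get(text[-1])
    digits = text[:-1] if factor else text
    if not digits.isdigit():
        raise ValueError(f"invalid bandwidth value: {text!r}")
    return int(digits) * (factor or 1)


def check_burst_mult(value: int = 15) -> int:
    """Validate a burst multiplier against the documented 10-100 range."""
    if not 10 <= value <= 100:
        raise ValueError("burst_mult must be between 10 and 100")
    return value
```

Under these assumptions, `parse_bandwidth("10M")` yields 10485760 and `check_burst_mult()` yields the default of 15.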

commit | commitdiff | tree

Avan Thakkar [Thu, 29 Jan 2026 07:18:59 +0000 (12:48 +0530)]

mgr/smb: QoS bandwidth pass-through and burst_mult parameter

Replace delay_max with burst_mult and add human-readable bandwidth
format support for QoS configuration.

Signed-off-by: Avan Thakkar <athakkar@redhat.com>

commit | commitdiff | tree

ShreeJejurikar [Thu, 5 Mar 2026 15:23:47 +0000 (20:53 +0530)]

rgw: add teuthology integration for bucket logging tests

Signed-off-by: ShreeJejurikar <shreemj8@gmail.com>

commit | commitdiff | tree

ShreeJejurikar [Thu, 26 Feb 2026 07:57:55 +0000 (13:27 +0530)]

rgw: add bucket logging pytest suite

Add a pytest-based test suite for RGW bucket logging that exercises the
radosgw-admin bucket logging CLI commands (list, info, flush) and
verifies the associated S3-level cleanup behavior.

Fixes: https://tracker.ceph.com/issues/74491
Signed-off-by: ShreeJejurikar <shreemj8@gmail.com>

commit | commitdiff | tree

Ilya Dryomov [Thu, 5 Mar 2026 13:42:57 +0000 (14:42 +0100)]

Merge pull request #67672 from bluikko/wip-doc-start-rbd-improve

doc: Improve start/quick-rbd.rst

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>

commit | commitdiff | tree

John Mulligan [Thu, 5 Mar 2026 13:30:37 +0000 (08:30 -0500)]

Merge pull request #67571 from phlogistonjohn/jjm-smb-remotectl-local

smb: add remote-control local mode feature

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Anoop C S <anoopcs@cryptolab.net>
Reviewed-by: Xavi Hernandez <xhernandez@gmail.com>

commit | commitdiff | tree

Ville Ojamo [Thu, 5 Mar 2026 09:02:42 +0000 (16:02 +0700)]

doc: Improve start/quick-rbd.rst

Remove mention of FAQ with a broken link.
Use ref for intra-docs links and add labels in destination documents.
Promptify all CLI example commands.
Use standard angle brackets for mandatory arguments in commands.
Remove an unused external link definition.
Trim spaces at end of lines and convert tabs to spaces.

Signed-off-by: Ville Ojamo <git2233+ceph@ojamo.eu>

commit | commitdiff | tree

anrao19 [Thu, 5 Mar 2026 10:06:18 +0000 (15:36 +0530)]

Merge pull request #67378 from ivancich/wip-add-datalog-error

rgw: only log errors to add_datalog_entry when error

commit | commitdiff | tree

Kefu Chai [Thu, 5 Mar 2026 09:32:49 +0000 (17:32 +0800)]

Merge pull request #67081 from adamemerson/wip-gcc16-clang21

Fixes for GCC 16 and Clang 21

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
Reviewed-by: Kefu Chai <k.chai@proxmox.com>

commit | commitdiff | tree

Ernesto Puerta [Thu, 5 Mar 2026 08:53:21 +0000 (09:53 +0100)]

Merge pull request #67658 from batrick/qa-symlinks-workflow

.github: mitigate possible "hackerbot-claw" exploit

commit | commitdiff | tree

NitzanMordhai [Thu, 5 Mar 2026 06:48:39 +0000 (08:48 +0200)]

Merge pull request #66571 from NitzanMordhai/wip-nitzan-prometheus-HealthHistory-deadlock

mgr/prometheus: Use RLock to fix deadlock in HealthHistory
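
The reason a reentrant lock removes this kind of deadlock can be shown with a minimal Python sketch. It is not the mgr/prometheus code: the class `HealthHistorySketch` and its methods are hypothetical. The sketch assumes a method that holds the lock and then calls another method that takes the same lock. With `threading.Lock` the inner acquisition would block forever; with `threading.RLock` the owning thread may acquire the lock again.

```python
import threading


class HealthHistorySketch:
    """Hypothetical class whose methods share one lock and call each other."""

    def __init__(self):
        # RLock lets the thread that owns the lock acquire it again.
        self._lock = threading.RLock()
        self._checks = {}

    def snapshot(self):
        with self._lock:
            return dict(self._checks)

    def record(self, name, info):
        with self._lock:
            self._checks[name] = info
            # Nested acquisition of the same lock by the same thread:
            # safe with RLock, a self-deadlock with a plain Lock.
            return self.snapshot()


# A plain Lock refuses a second acquisition by the same thread.
plain = threading.Lock()
plain.acquire()
second_plain = plain.acquire(blocking=False)
plain.release()

# An RLock accepts it; each acquire needs a matching release.
reentrant = threading.RLock()
reentrant.acquire()
second_reentrant = reentrant.acquire(blocking=False)
reentrant.release()
reentrant.release()
```

Here `second_plain` is `False` and `second_reentrant` is `True`, which is the difference the fix relies on.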

commit | commitdiff | tree

Patrick Donnelly [Wed, 4 Mar 2026 21:21:31 +0000 (16:21 -0500)]

.github: mitigate possible "hackerbot-claw" exploit

There's no reason to believe this script is actually vulnerable but
now it's best practice to avoid using pull_request_target.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

commit | commitdiff | tree

Yuval Lifshitz [Wed, 4 Mar 2026 19:23:47 +0000 (21:23 +0200)]

Merge pull request #67653 from yuvalif/wip-yuval-75323

test/rgw/kafka: fix kafka release to more recent one

commit | commitdiff | tree

John Mulligan [Fri, 27 Feb 2026 16:04:19 +0000 (11:04 -0500)]

doc: document the new locally_enabled field

Document the new locally_enabled field for the remote_control
subsection of the Cluster resource config.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

John Mulligan [Mon, 23 Feb 2026 17:24:04 +0000 (12:24 -0500)]

mgr/smb: configure smb service for new remote control local feature

Signed-off-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

John Mulligan [Mon, 23 Feb 2026 17:23:57 +0000 (12:23 -0500)]

mgr/smb: add an option to enable the local variation of remotectl

Signed-off-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

John Mulligan [Mon, 23 Feb 2026 17:23:38 +0000 (12:23 -0500)]

python-common/smb: add the remote-control-local feature

See previous commit for the meaning of this feature flag.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

John Mulligan [Mon, 23 Feb 2026 17:23:06 +0000 (12:23 -0500)]

cephadm: add support for a remote control local socket

It's not an oxymoron, it's Remote Control Local Socket (tm)!
This allows processes on the ceph host to use a unix domain socket
without mTLS to communicate with the remote control sidecar server
in the samba service.

At the higher level we treat the 2nd listener as a "feature" even
though it really configures the same sidecar as "remote-control".
This way it's easy to have one of "remote-control",
"remote-control-local" or both in the service spec configuring the
smb service.

NOTE: This service does have the ability to verify that the client has
admin-ish access to ceph services by needing the client to pass
the ceph user name and key over the grpc headers.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

commit | commitdiff | tree

John Mulligan [Wed, 4 Mar 2026 17:45:12 +0000 (12:45 -0500)]

Merge pull request #67615 from phlogistonjohn/jjm-exo-show-fix

smb: fix ceph smb show ceph.smb.ext.cluster

Reviewed-by: Anoop C S <anoopcs@cryptolab.net>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Adam King <adking@redhat.com>

commit | commitdiff | tree

John Mulligan [Wed, 4 Mar 2026 17:44:32 +0000 (12:44 -0500)]

Merge pull request #67534 from phlogistonjohn/jjm-smb-debug-opts

smb: add debug level options to smb cluster resource

Reviewed-by: Xavi Hernandez <xhernandez@gmail.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Anoop C S <anoopcs@cryptolab.net>
Reviewed-by: Adam King <adking@redhat.com>

commit | commitdiff | tree

Yuval Lifshitz [Wed, 4 Mar 2026 14:53:13 +0000 (14:53 +0000)]

test/rgw/kafka: fix kafka release to more recent one

Fixes: https://tracker.ceph.com/issues/75323
Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>

commit | commitdiff | tree

Matan Breizman [Wed, 4 Mar 2026 13:34:40 +0000 (15:34 +0200)]

Merge pull request #67225 from amathuria/wip-amat-fix-74504

crimson/osd: fix PG splitting logic during map gaps

Reviewed-by: Matan Breizman <mbreizma@redhat.com>

commit | commitdiff | tree

NitzanMordhai [Wed, 4 Mar 2026 12:38:59 +0000 (12:38 +0000)]

qa/tasks/mgr: test_module_selftest set influx hostname to avoid warnings during plugin test

This is a follow-up PR to PR #66376 and completes the setup for the
influx server start.

The self-test will hit the MGR_INFLUX_NO_SERVER error since we don't have
a hostname configured. The following command adds a test hostname
so that the error won't appear and fail the test.

Fixes: https://tracker.ceph.com/issues/72747
Signed-off-by: Nitzan Mordechai <nmordec@ibm.com>

commit | commitdiff | tree

Ilya Dryomov [Wed, 4 Mar 2026 12:35:12 +0000 (13:35 +0100)]

Merge pull request #67629 from idryomov/wip-75239

qa: rbd_mirror_fsx_compare.sh doesn't error out as expected

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
