git-server-git.apps.pok.os.sepia.ceph.com Git - ceph.git/log

Matan Breizman [Tue, 24 Feb 2026 10:25:41 +0000 (12:25 +0200)]

Merge pull request #66637 from Matan-B/wip-matanb-coroutine-repeat

test/crimson/test_crimson_coroutine: introduce interruptible repeat example

Reviewed-by: Samuel Just <sjust@redhat.com>


Afreen Misbah [Tue, 24 Feb 2026 09:59:06 +0000 (15:29 +0530)]

Merge pull request #67474 from afreen23/health-card-hardware-tab

mgr/dashboard: Health card hardware tab

Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: pujaoshahu <pshahu@redhat.com>


Afreen Misbah [Tue, 24 Feb 2026 09:51:42 +0000 (15:21 +0530)]

Merge pull request #67159 from rhcs-dashboard/subsystem-host-page

mgr/dashboard: NVMe – Fix host,listeners namespace list display on Subsystem resource page

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Naman Munet <nmunet@redhat.com>


Kautilya Tripathi [Tue, 24 Feb 2026 08:56:34 +0000 (14:26 +0530)]

Merge pull request #67284 from knrt10/crimson-rgw-cls-get-config

cls/rgw_gc: read config via cls_get_config


Gil Bregman [Tue, 24 Feb 2026 06:40:07 +0000 (08:40 +0200)]

Merge pull request #67467 from gbregman/main

nvmeof: Change the NVMEOF image version to 1.7


Dnyaneshwari Talwekar [Tue, 24 Feb 2026 06:05:56 +0000 (11:35 +0530)]

Merge pull request #66857 from rhcs-dashboard/cephfs-mirroring-entity

mgr/dashboard: Cephfs Mirroring - Entity

Reviewed-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>
Reviewed-by: Naman Munet <nmunet@redhat.com>
Reviewed-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>
Reviewed-by: Ankush Behl <cloudbehl@gmail.com>


Afreen Misbah [Mon, 23 Feb 2026 23:51:58 +0000 (05:21 +0530)]

mgr/dashboard: Add hardware tab to health card

Fixes https://tracker.ceph.com/issues/75120

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Afreen Misbah [Mon, 23 Feb 2026 20:13:43 +0000 (01:43 +0530)]

mgr/dashboard: Added variations of alerts card sub total layout

- when the health card's tab is closed, the layout is compact
- when the health card's tab is open, the layout takes more space

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Afreen Misbah [Mon, 23 Feb 2026 19:33:15 +0000 (01:03 +0530)]

mgr/dashboard: CSS fixes for health card and alerts card

Signed-off-by: Afreen Misbah <afreen@ibm.com>


baum [Mon, 23 Feb 2026 19:24:02 +0000 (21:24 +0200)]

Merge pull request #67453 from baum/crimson-ceph-context-leak

common: fix uninitialized nref in crimson CephContext


Ilya Dryomov [Mon, 23 Feb 2026 19:18:02 +0000 (20:18 +0100)]

Merge pull request #67379 from zdover23/wip-doc-2026-02-18-rados-config-mon-lookup-dns

doc: update broken reference

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>


Radoslaw Zarzynski [Mon, 23 Feb 2026 18:46:28 +0000 (19:46 +0100)]

Merge pull request #67295 from kamoltat/wip-ksirivad-fix-74524

qa/standalone: improve reliability of osd-backfill tests

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>


Shilpa Jagannath [Mon, 23 Feb 2026 18:24:34 +0000 (10:24 -0800)]

Merge pull request #66466 from smanjara/wip-fix-datasync-init

rgw/multisite: fix segfault during multisite startup


Afreen Misbah [Mon, 23 Feb 2026 10:23:13 +0000 (15:53 +0530)]

fix for quorum in API

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Afreen Misbah [Fri, 13 Feb 2026 23:14:46 +0000 (04:44 +0530)]

mgr/dashboard: Add systems tab to health card

Fixes https://tracker.ceph.com/issues/75065

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Afreen Misbah [Mon, 23 Feb 2026 15:52:41 +0000 (21:22 +0530)]

Merge pull request #67460 from afreen23/alerts-card

mgr/dashboard: Add alerts card

Reviewed-by: Devika Babrekar <devika.babrekar@ibm.com>


Patrick Donnelly [Mon, 23 Feb 2026 15:29:13 +0000 (10:29 -0500)]

Merge PR #67135 into main

* refs/pull/67135/head:
pybind: remove compile_time_env parameter from setup.py files
pybind/rados,rgw: replace Tempita errno checks with C preprocessor
pybind/cephfs: replace deprecated IF with C preprocessor macro

Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>


Krunal Chheda [Mon, 23 Feb 2026 15:21:21 +0000 (20:51 +0530)]

Merge pull request #65947 from kchheda3/wipi-fix-lc-dm-delete

rgw/lc: Do not delete DM if it's at end of pagination list.

Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>


Krunal Chheda [Mon, 23 Feb 2026 15:20:33 +0000 (20:50 +0530)]

Merge pull request #65607 from kchheda3/wip-lc-skip-bucket

rgw/lc: Increase the timeout value while fetching the lc shard lock and update the logic on expired session

Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>


pujaoshahu [Mon, 2 Feb 2026 08:46:20 +0000 (14:16 +0530)]

mgr/dashboard: NVMe – Fix host,listeners namespace list display on Subsystem resource page

Fixes: https://tracker.ceph.com/issues/74697
Signed-off-by: pujaoshahu <pshahu@redhat.com>
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/block/block.module.ts

Signed-off-by: pujaoshahu <pshahu@redhat.com>


Igor Fedotov [Mon, 23 Feb 2026 14:40:20 +0000 (17:40 +0300)]

Merge pull request #67312 from ifed01/wip-ifed-fix-vselector-in-envmode_index_file

os/bluestore: fix vselector update after enveloped WAL recovery

Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>


Casey Bodley [Mon, 23 Feb 2026 14:05:55 +0000 (09:05 -0500)]

Merge pull request #67445 from cbodley/wip-mailmap-bluikko

mailmap: update email address for Ville Ojamo

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Ville Ojamo <git2233+ceph@ojamo.eu>


Anthony D'Atri [Mon, 23 Feb 2026 14:03:40 +0000 (09:03 -0500)]

Merge pull request #67332 from anthonyeleven/docfix

doc/rados/operations: Improve formatting in crush-map.rst


kyr [Mon, 23 Feb 2026 13:23:46 +0000 (14:23 +0100)]

Merge pull request #67432 from kshtsk/wip-test-lua-ignore-tz

test/rgw/lua: ignore hours for zero mtime


Sridhar Seshasayee [Mon, 23 Feb 2026 12:51:00 +0000 (18:21 +0530)]

Merge pull request #66108 from sseshasa/wip-rfe-implement-ok-to-upgrade-command

mgr/DaemonServer: Implement ok-to-upgrade command

Reviewed-by: Kamoltat Sirivadhna <ksirivad@redhat.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Reviewed-by: Nitzan Mordechai <nmordech@redhat.com>
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>


Yuval Lifshitz [Mon, 23 Feb 2026 12:37:07 +0000 (14:37 +0200)]

Merge pull request #66316 from kchheda3/wip-fix-parse-url-crash

rgw/notification: Fix the crash in parse_url while initializing the regex


Yuval Lifshitz [Mon, 23 Feb 2026 12:35:46 +0000 (14:35 +0200)]

Merge pull request #66065 from mertsunacoglu/wip-lua-abort

rgw: Add Lua functionality for blocking requests


Kautilya Tripathi [Tue, 10 Feb 2026 05:31:26 +0000 (11:01 +0530)]

cls/rgw_gc/cls_rgw_gc: read config via cls_get_config

Commit https://github.com/ceph/ceph/commit/3877c1e37f2fa4e1574b57f05132288f210835a7
added a new way to let CLS gain access to global configuration (`g_ceph_context`).

`cls_rgw_gc_queue_init` method is not using the new CLS call of `cls_get_config`
but instead directly uses `g_ceph_context`.

Crimson OSD implementation does **not** support `g_ceph_context` which results in a (SIGSEGV)
crash due to null access. Switching to `cls_get_config`, similarly to `cls_rgw.cc`, would allow
both OSD implementations to access the conf safely.

The above approach is well-defined due to the two orthogonal implementations of objclass.cc.
The classic OSD uses `src/osd/objclass.cc`, while the Crimson OSD uses `src/crimson/osd/objclass.cc`.

Fixes: https://tracker.ceph.com/issues/74844
Signed-off-by: Kautilya Tripathi <kautilya.tripathi@ibm.com>


Gil Bregman [Mon, 23 Feb 2026 10:56:54 +0000 (12:56 +0200)]

nvmeof: Change the NVMEOF image version to 1.7
Fixes: https://tracker.ceph.com/issues/75097
Signed-off-by: Gil Bregman <gbregman@il.ibm.com>


Venky Shankar [Mon, 23 Feb 2026 10:24:10 +0000 (15:54 +0530)]

Merge PR #65467 into main

* refs/pull/65467/head:

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>


Venky Shankar [Mon, 23 Feb 2026 10:22:58 +0000 (15:52 +0530)]

Merge PR #66475 into main

* refs/pull/66475/head:

Reviewed-by: Matan Breizman <mbreizma@redhat.com>


Imran Imtiaz [Mon, 23 Feb 2026 10:22:31 +0000 (10:22 +0000)]

Merge pull request #67442 from imran-imtiaz/wip-dashboard-schedule-level

mgr/dashboard: add schedule_level to image API for pool/cluster snapshot schedule


Afreen Misbah [Sun, 22 Feb 2026 10:24:41 +0000 (15:54 +0530)]

mgr/dashboard: Add alerts card

Fixes https://tracker.ceph.com/issues/75066

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Dnyaneshwari Talwekar [Fri, 9 Jan 2026 09:59:50 +0000 (15:29 +0530)]

mgr/dashboard: Cephfs mirroring - Entity

Fixes: https://tracker.ceph.com/issues/74366
Signed-off-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>


Afreen Misbah [Mon, 23 Feb 2026 07:16:23 +0000 (12:46 +0530)]

Merge pull request #66981 from rhcs-dashboard/namespace-list-delete

mgr/dashboard: Add nvmeof namespace list and delete modal

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Naman Munet <nmunet@redhat.com>


Afreen Misbah [Mon, 23 Feb 2026 07:14:53 +0000 (12:44 +0530)]

Merge pull request #67360 from rhcs-dashboard/revamp-onboarding-screen

mgr/dashboard:revamp on-boarding screen

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Pedro Gonzalez Gomez <pegonzal@redhat.com>


Sridhar Seshasayee [Thu, 12 Feb 2026 20:03:25 +0000 (01:33 +0530)]

mgr/DaemonServer: Re-order OSDs in crush bucket to maximize OSDs for upgrade

DaemonServer::_maximize_ok_to_upgrade_set() attempts to find which OSDs
from the initial set found as part of _populate_crush_bucket_osds() can be
upgraded as part of the initial phase. If the initial set results in failure,
the convergence logic trims the 'to_upgrade' vector from the end until a safe
set is found.

Therefore, it is advantageous to sort the OSDs by ascending number of hosted
PGs. With the OSDs hosting the fewest (or no) PGs at the beginning of the
vector, the trim logic along with _check_offlines_pgs() has the best chance
of finding OSDs to upgrade as it approaches a grouping of OSDs that have the
fewest or no PGs.

To achieve the above, a temporary vector of struct pgs_per_osd is created and
sorted for a given crush bucket. The sorted OSDs are pushed to the main
crush_bucket_osds that is eventually used to run the _check_offlines_pgs()
logic to find a safe set of OSDs to upgrade.

pgmap is passed to _populate_crush_bucket_osds() to utilize get_num_pg_by_osd()
for the above logic to work.
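
The ordering step can be sketched as follows. This is an illustration in Python rather than the C++ used by the Mgr: the helper name `order_osds_by_pg_count` and the `pgs_per_osd` mapping are assumptions standing in for the temporary vector of struct pgs_per_osd and for the counts that get_num_pg_by_osd() would supply.

```python
def order_osds_by_pg_count(osds, pgs_per_osd):
    """Return OSD ids sorted by ascending PG count (hypothetical helper).

    OSDs missing from pgs_per_osd are treated as hosting no PGs.
    """
    # sorted() is stable, so OSDs with equal counts keep their input order.
    return sorted(osds, key=lambda osd: pgs_per_osd.get(osd, 0))


# Made-up PG counts for four OSDs in one CRUSH bucket.
pg_counts = {0: 12, 1: 0, 2: 7, 3: 0}
ordered = order_osds_by_pg_count([0, 1, 2, 3], pg_counts)
print(ordered)  # [1, 3, 2, 0]
```

Because the trim logic drops entries from the end of the vector, this ordering means the OSDs hosting the most PGs are the first to be removed from the candidate set.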

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>


Sridhar Seshasayee [Mon, 27 Oct 2025 16:34:54 +0000 (22:04 +0530)]

mgr/DaemonServer: Implement ok-to-upgrade command

Implement a new Mgr command called 'ok-to-upgrade' that returns a set of OSDs
within the provided CRUSH bucket that are safe to upgrade without reducing
immediate data availability.

The command accepts the following as input:
- CRUSH bucket name (required)
   - The CRUSH bucket type is limited to 'rack', 'chassis', 'host' and 'osd'.
     This is to prevent users from specifying a bucket type higher up the tree
     which could result in performance issues if the number of OSDs in the
     bucket is very high.
- The new Ceph version to check against. The format accepted is the short
   form of the Ceph version, for example 20.3.0-3803-g63ca1ffb5a2. (required)
- The maximum number of OSDs to consider if specified. (optional)

Implementation Details:

After sanity checks on the provided parameters, the following steps are
performed:

1. The set of OSDs within the CRUSH bucket is first determined.
2. From the main set of OSDs, a filtered set of OSDs not yet running the new
   Ceph version is created.
   - To do so, the OSD's 'ceph_version_short' string is read from the
     metadata using a new method, DaemonServer::get_osd_metadata(). The
     information is determined from the DaemonStatePtr maintained within
     the DaemonServer.
3. If all OSDs are already running the new Ceph version, a success report is
   generated and returned.
4. If OSDs are not running the new Ceph version, a new set (to_upgrade) is
   created.
5. If the current version cannot be determined, an error is logged and the
   output report with 'bad_no_version' field populated with the OSD in question
   is generated.
6. On the new set (to_upgrade), the existing logic in _check_offlines_pgs() is
   executed to see if stopping any or all OSDs in the set as part of the upgrade
   can reduce immediate data availability.
   - If data availability is impacted, then the number of OSDs in the filtered
     set is reduced by a factor defined by a new config option called
     'mgr_osd_upgrade_check_convergence_factor' which is set to 0.8 by default.
   - The logic in _check_offlines_pgs() is repeated for the new set.
   - The above is repeated until a safe subset of OSDs that can be stopped for
     upgrade is found. Each iteration reduces the number of OSDs to check by
     the convergence factor mentioned above.
7. It must be noted that the default value of
   'mgr_osd_upgrade_check_convergence_factor' is on the higher side in order to
   help determine an optimal set of OSDs to upgrade. In other words, a higher
   convergence factor would help maximize the number of OSDs to upgrade. In this
   case, the number of iterations and therefore the time taken to determine the
   OSDs to upgrade is proportional to the number of OSDs in the CRUSH bucket.
   The converse is true if a lower convergence factor is used.
8. If the number of OSDs determined is lower than the 'max' specified, then an
   additional loop is executed to determine if other children of the CRUSH
   bucket can be added to the existing set.
9. Once a viable set is determined, an output report similar to the following is
   generated:

A standalone test is introduced that exercises the logic for both replicated
and erasure-coded pools by manipulating the min_size for a pool and checking
for upgradability. The test also performs other basic sanity checks and
exercises error conditions.

The output shown below is for a cluster running on a single node with 10 OSDs
and with replicated pool configuration:

$ ceph osd ok-to-upgrade incerta06 01.00.00-gversion-test --format=json
{"ok_to_upgrade":true,"all_osds_upgraded":false,\
"osds_in_crush_bucket":[0,1,2,3,4,5,6,7,8,9],\
"osds_ok_to_upgrade":[0],"osds_upgraded":[],"bad_no_version":[]}

The following report is shown if all OSDs are running the desired Ceph version:

$ ceph osd ok-to-upgrade --crush_bucket  localrack \
  --ceph_version 20.3.0-3803-g63ca1ffb5a2
{"ok_to_upgrade":false,"all_osds_upgraded":true,\
"osds_in_crush_bucket":[0,1,2,3,4,5,6,7,8,9],"osds_ok_to_upgrade":[],\
"osds_upgraded":[0,1,2,3,4,5,6,7,8,9],"bad_no_version":[]}
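
The convergence loop described in step 6 can be sketched roughly as below. This is a Python illustration under stated assumptions, not the Mgr's C++ code: `find_safe_subset` and `is_safe` are hypothetical names (`is_safe` stands in for the offline-PG check), and truncating with `int()` is an assumed rounding rule.

```python
def find_safe_subset(to_upgrade, is_safe, factor=0.8):
    """Shrink to_upgrade from the end until is_safe() accepts it.

    is_safe is a hypothetical stand-in for the offline-PG check; factor
    mirrors mgr_osd_upgrade_check_convergence_factor (default 0.8).
    """
    candidate = list(to_upgrade)
    while candidate and not is_safe(candidate):
        # Keep roughly `factor` of the set, always dropping at least one
        # OSD so the loop terminates.
        new_len = min(int(len(candidate) * factor), len(candidate) - 1)
        candidate = candidate[:new_len]
    return candidate


# Toy safety rule (an assumption): at most 3 OSDs may be stopped at once.
safe = find_safe_subset(list(range(10)), lambda osds: len(osds) <= 3)
print(safe)  # [0, 1, 2]
```

With ten OSDs the candidate set shrinks 10, 8, 6, 4, 3, which shows why a higher factor takes more iterations but lands closer to the largest safe set.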

Fixes: https://tracker.ceph.com/issues/73031
Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>


Sridhar Seshasayee [Mon, 2 Feb 2026 08:44:15 +0000 (14:14 +0530)]

mgr/DaemonServer: Modify offline_pg_report to handle set or vector types

The offline_pg_report structure to be used by both the 'ok-to-stop' and
'ok-to-upgrade' commands is modified to handle either std::set or std::vector
type containers. This is necessary because the two commands work
differently. For the 'ok-to-upgrade' command logic to work optimally,
the items in the specified crush bucket, including items found in the subtree,
must keep a strict order. The earlier std::set container sorts the items
upon insertion, which discards that order and causes the offline PG check to
report sub-optimal results.

Therefore, the offline_pg_report struct is modified to use
std::variant<std::vector<int>, std::set<int>> as a ContainerType and handled
accordingly in dump() using std::visit(). This ensures backward compatibility
with the existing 'ok-to-stop' command while catering to the requirements of
the new 'ok-to-upgrade' command.

Signed-off-by: Sridhar Seshasayee <sseshasa@redhat.com>


Afreen Misbah [Mon, 23 Feb 2026 07:09:40 +0000 (12:39 +0530)]

Merge pull request #67386 from afreen23/health-checks

mgr/dashboard: Add health check panel

Reviewed-by: Devika Babrekar <devika.babrekar@ibm.com>


Gadi [Mon, 23 Feb 2026 06:54:28 +0000 (08:54 +0200)]

Merge pull request #67330 from gadididi/nvmeof/add_rados_ns

mgr/dashboard: Adding RADOS namespace option into add_ns_req


pujaoshahu [Tue, 20 Jan 2026 06:14:44 +0000 (11:44 +0530)]

mgr/dashboard: Fix nvmeof namespace list and delete modal

Fixes: https://tracker.ceph.com/issues/74451
Signed-off-by: pujaoshahu <pshahu@redhat.com>
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/shared/api/nvmeof.service.ts

Signed-off-by: pujaoshahu <pshahu@redhat.com>


SrinivasaBharathKanta [Mon, 23 Feb 2026 02:04:43 +0000 (07:34 +0530)]

Merge pull request #66575 from Tom-Sollers/ceph-pg-repeer-test

qa/standalone: Add a test for running repeer on simple ec and rep pools


SrinivasaBharathKanta [Mon, 23 Feb 2026 01:54:03 +0000 (07:24 +0530)]

Merge pull request #53457 from NitzanMordhai/wip-nitzan-crush-rule-delete

mon/OSDMonitor: remove unused crush rules after erasure code pools deleted


Ilya Dryomov [Sun, 22 Feb 2026 15:58:12 +0000 (16:58 +0100)]

Merge pull request #66876 from tchaikov/wip-librbd-pwl-fix-leaks

librbd/pwl: fix memory leaks in discard operations

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>


gadi-didi [Thu, 12 Feb 2026 14:17:38 +0000 (16:17 +0200)]

mgr/dashboard: Adding rados ns option into add_ns_req

Add a RADOS namespace name option to the NVMe add-namespace command.

Signed-off-by: gadi-didi <gadi.didi@ibm.com>


Alexander Indenbaum [Sat, 21 Feb 2026 19:13:50 +0000 (21:13 +0200)]

common: fix uninitialized nref in crimson CephContext

Initialize nref(1) in the constructor so put() correctly releases
the context. Without the initialization, LeakSanitizer reports a leak.

Signed-off-by: Alexander Indenbaum <aindenba@redhat.com>


Vallari Agrawal [Sun, 22 Feb 2026 10:09:27 +0000 (15:39 +0530)]

Merge pull request #67410 from VallariAg/wip-nvmeof-submodule-1.6.6

mgr/dashboard: bump nvmeof submodule to 1.6.7


Afreen Misbah [Mon, 16 Feb 2026 13:57:24 +0000 (19:27 +0530)]

mgr/dashboard: Add health check panel

Fixes https://tracker.ceph.com/issues/74958

- adds health check panel in overview dashboard
- updates tests
- refactors component as per modern Angular convention
- using onPush CDS in Overview component
- using view model pattern to aggregate data for rendering

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Afreen Misbah [Fri, 13 Feb 2026 23:14:46 +0000 (04:44 +0530)]

mgr/dashboard: Add health card

Fixes https://tracker.ceph.com/issues/74958

Signed-off-by: Afreen Misbah <afreen@ibm.com>


Kefu Chai [Sun, 22 Feb 2026 02:56:02 +0000 (10:56 +0800)]

Merge pull request #64500 from tchaikov/wip-os-silence-Wsign-compare

os,common:change osd_target_transaction_size to uint

Reviewed-by: Matan Breizman <mbreizma@redhat.com>


Kyr Shatskyy [Thu, 19 Feb 2026 17:01:44 +0000 (18:01 +0100)]

test/rgw/lua: ignore hours for zero mtime

For a zero mtime, check only the date part: it corresponds
to 1970-01-01 for UTC and time zones ahead of UTC, and to 1969-12-31
for time zones behind UTC.
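
The date rule can be illustrated with a short Python sketch (the test itself is written against the RGW Lua bindings; `zero_mtime_date` is a hypothetical helper and the offsets are arbitrary examples):

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def zero_mtime_date(utc_offset_hours):
    """Local calendar date of timestamp 0 at a fixed UTC offset (illustrative)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return EPOCH_UTC.astimezone(tz).date().isoformat()


print(zero_mtime_date(0))    # 1970-01-01
print(zero_mtime_date(5.5))  # 1970-01-01
print(zero_mtime_date(-5))   # 1969-12-31
```

Only the hour and minute differ between time zones on the same side of UTC, which is why comparing the date alone makes the check independent of the exact offset.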

Fixes: https://tracker.ceph.com/issues/75039
Signed-off-by: Kyr Shatskyy <kyrylo.shatskyy@clyso.com>


Ilya Dryomov [Sat, 21 Feb 2026 16:12:24 +0000 (17:12 +0100)]

Merge pull request #66735 from ajarr/wip-fix-schedule-start-time

mgr/rbd_support: Fix "start-time" arg behavior

Reviewed-by: Mykola Golub <mykola.golub@clyso.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>


Ramana Raja [Wed, 24 Dec 2025 10:24:50 +0000 (05:24 -0500)]

mgr/rbd_support: Fix "start-time" arg behavior

The "start-time" argument, optionally passed when adding or removing a
mirror image snapshot schedule or a trash purge schedule, does not
behave as intended. It is meant to schedule an initial operation at a
specific time of day in a given time zone. Instead, it offsets the
schedule’s anchor time. By default, the scheduler uses the UNIX epoch as
the anchor to calculate recurring schedule times, and "start-time"
simply shifts this anchor away from UTC, which can confuse users. For
example:

```
$ # current time
$ date --universal
Wed Dec 10 05:55:21 PM UTC 2025
$ rbd mirror snapshot schedule add -p data --image img1 15m 19:00Z
$ rbd mirror snapshot schedule ls -p data --image img1
every 15m starting at 19:00:00+00:00
```

A user might assume that the scheduler will run the first snapshot each
day at 19:00 UTC and then run snapshots every 15 minutes. Instead, the
scheduler runs the first snapshot at 18:00 UTC and then continues at the
configured interval:

```
$ rbd mirror snapshot schedule status -p data --image img1
SCHEDULE TIME IMAGE
2025-12-10 18:00:00 data/img1
```

Additionally, the "start-time" argument accepts a full ISO 8601
timestamp but silently ignores everything except hour, minute, and time
zone. Even time zone handling is incorrect: specifying "23:00-01:00"
with an interval of "1d" results in a snapshot taken once per day at
22:00 UTC rather than 00:00 UTC, because only utcoffset.seconds is used
while utcoffset.days is ignored.
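
The pitfall with negative offsets comes from how Python normalizes `timedelta` values. The snippet below only demonstrates that standard-library behavior; it is not the module's code, and the date used is arbitrary.

```python
from datetime import datetime, timedelta, timezone

offset = timedelta(hours=-1)  # the "-01:00" in "23:00-01:00"

# Python normalizes a negative timedelta to days=-1 plus positive seconds,
# so reading .seconds alone loses the sign of the offset.
print(offset.days, offset.seconds)  # -1 82800
print(offset.total_seconds())       # -3600.0

# Converting with the full offset: 23:00 at UTC-1 is 00:00 UTC the next day.
local = datetime(2025, 12, 10, 23, 0, tzinfo=timezone(offset))
as_utc = local.astimezone(timezone.utc)
print(as_utc.isoformat())  # 2025-12-11T00:00:00+00:00
```

Using `total_seconds()` or a timezone-aware conversion keeps both the `days` and `seconds` components in play.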

Fix:
Similar to the handling of the "start" argument in the FS snap-schedule
manager module, require "start-time" to use an ISO 8601 date-time format
with a mandatory date component. Time and time zone are optional and
default to 00:00 and UTC respectively.

The "start-time" now defines the anchor time used to compute recurring
schedule times. The default anchor remains the UNIX epoch. Existing
on-disk schedules with legacy-format "start-time" values are updated to
include the date Jan 1, 1970.

The `snap schedule ls` output now displays "start-time" with date and
time in the format "%Y-%m-%d %H:%M:00". The display time is in UTC.
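
As a rough sketch of anchor-based scheduling, the next run time can be computed as the first point on the grid `anchor + k * interval` that lies after the current time. The helper `next_run` is hypothetical and assumes the anchor is not later than `now`; how the module treats an anchor in the future is not stated above and is not modeled here.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def next_run(now, interval, anchor=EPOCH):
    """First grid point anchor + k * interval strictly after `now`.

    Hypothetical helper; assumes anchor <= now.
    """
    periods = (now - anchor) // interval + 1  # timedelta // timedelta -> int
    return anchor + periods * interval


now = datetime(2025, 12, 10, 17, 55, 21, tzinfo=timezone.utc)

# Default anchor (UNIX epoch) with a 15-minute interval.
print(next_run(now, timedelta(minutes=15)))
# 2025-12-10 18:00:00+00:00

# An explicit anchor with a date component shifts the grid to xx:07.
anchor = datetime(2025, 12, 1, 0, 7, tzinfo=timezone.utc)
print(next_run(now, timedelta(hours=1), anchor))
# 2025-12-10 18:07:00+00:00
```

Requiring a date in "start-time" makes the anchor an unambiguous instant, so the grid no longer depends on how a bare time of day is interpreted.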

Fixes: https://tracker.ceph.com/issues/74192
Signed-off-by: Ramana Raja <rraja@redhat.com>


Kefu Chai [Tue, 15 Jul 2025 06:40:09 +0000 (14:40 +0800)]

common/options: change osd_target_transaction_size from int to uint

Change osd_target_transaction_size from signed int to unsigned int to
match the return type of Transaction::get_num_ops() (ceph_le64).

This change:
- Eliminates compiler warnings when comparing signed/unsigned values
- Enables automatic size conversion (e.g., "4_K" → 4096) via y2c.py
for improved administrator usability
- Maintains type consistency throughout the codebase

Signed-off-by: Kefu Chai <tchaikov@gmail.com>


Kefu Chai [Tue, 17 Feb 2026 11:41:32 +0000 (19:41 +0800)]

librbd/pwl: fix memory leaks in discard operations

Fix memory leak in librbd persistent write log (PWL) cache discard
operations by properly completing request objects.

ASan reported the following leaks in unittest_librbd:

  Direct leak of 240 byte(s) in 1 object(s) allocated from:
    #0 operator new(unsigned long)
    #1 librbd::cache::pwl::AbstractWriteLog<librbd::MockImageCtx>::discard(...)
       /ceph/src/librbd/cache/pwl/AbstractWriteLog.cc:935:5
    #2 TestMockCacheReplicatedWriteLog_discard_Test::TestBody()
       /ceph/src/test/librbd/cache/pwl/test_mock_ReplicatedWriteLog.cc:534:7

  Plus multiple indirect leaks totaling 2,076 bytes through the
  shared_ptr reference chain.

Root cause:

C_DiscardRequest objects were never deleted because their complete()
method was never called. The on_write_persist callback released the
BlockGuard cell but didn't call complete() to trigger self-deletion.

Write requests use WriteLogOperationSet which takes the request as
its on_finish callback, ensuring complete() is eventually called.
Discard requests don't use WriteLogOperationSet and must explicitly
call complete() in their on_write_persist callback.

Solution:

Call discard_req->complete(r) in the on_write_persist callback and
move cell release into finish_req() -- mirroring how C_WriteRequest
handles it. The complete() -> finish() -> finish_req() chain ensures
the cell is released after the user request is completed, preserving
the same ordering as write requests.

Test results:
- Before: 2,316 bytes leaked in 15 allocations
- After: 0 bytes leaked
- unittest_librbd discard tests pass successfully with ASan

Fixes: https://tracker.ceph.com/issues/74972
Signed-off-by: Kefu Chai <k.chai@proxmox.com>


Kefu Chai [Mon, 14 Jul 2025 10:50:48 +0000 (18:50 +0800)]

os/Transaction: change get_num_ops() return type to uint64_t

Change Transaction::get_num_ops() to return uint64_t instead of int
to match the underlying data.ops type (ceph_le<__u64>) and eliminate
compiler warnings about signed/unsigned comparison.

Fixes warning in ECTransaction.cc:

```
/home/kefu/dev/ceph/src/osd/ECTransaction.cc: In constructor ‘ECTransaction::Generate::Generate(PGTransaction&, ceph::ErasureCodeInterfaceRef&, pg_t&, const ECUtil::stripe_info_t&, const std::map<hobject_t, ECUtil::shard_extent_map_t>&, std::map<hobject_t, ECUtil::shard_extent_map_t>*, shard_id_map<ceph::os::Transaction>&, const OSDMapRef&, const hobject_t&, PGTransaction::ObjectOperation&, ECTransaction::WritePlanObj&, DoutPrefixProvider*, pg_log_entry_t*)’:
/home/kefu/dev/ceph/src/osd/ECTransaction.cc:589:25: warning: comparison of integer expressions of different signedness: ‘int’ and ‘__gnu_cxx::__alloc_traits<std::allocator<unsigned int>, unsigned int>::value_type’ {aka ‘unsigned int’} [-Wsign-compare]
589 | if (t.get_num_ops() > old_transaction_counts[int(shard)] &&
```

Signed-off-by: Kefu Chai <tchaikov@gmail.com>


John Mulligan [Fri, 20 Feb 2026 19:53:21 +0000 (14:53 -0500)]

Merge pull request #66557 from phlogistonjohn/jjm-smb-exo-cluster

smb: allow smb clusters to use cephfs from a different ceph cluster

Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Xavi Hernandez <xhernandez@gmail.com>
Reviewed-by: Adam King <adking@redhat.com>


Afreen Misbah [Fri, 20 Feb 2026 16:32:36 +0000 (22:02 +0530)]

Merge pull request #67256 from afreen23/storage-card

mgr/dashboard: Add storage card to overview page

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>


Adam Emerson [Fri, 20 Feb 2026 16:20:39 +0000 (11:20 -0500)]

Merge pull request #67423 from cbodley/wip-74573

qa/rgw: bucket notifications use pynose

Reviewed-by: Adam C. Emerson <aemerson@redhat.com>


Imran Imtiaz [Fri, 20 Feb 2026 10:57:15 +0000 (10:57 +0000)]

mgr/dashboard: add schedule_level to image API for pool/cluster snapshot schedule

Add optional schedule_level param (image|pool|cluster) to
PUT /api/block/image/{image_spec}. Removes more-specific schedules
before setting at the chosen level. Backward compatible when omitted.
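
One way to picture "removes more-specific schedules" is the sketch below. It is an assumption-laden illustration, not the dashboard's implementation: `set_schedule`, the `schedules` dict, and the ordering of levels from most to least specific are all hypothetical.

```python
# Levels ordered from most to least specific (an assumption based on the
# image|pool|cluster values named in the commit message).
LEVELS = ["image", "pool", "cluster"]


def set_schedule(schedules, level, interval):
    """Set `interval` at `level`, dropping schedules at more-specific levels.

    `schedules` is a hypothetical dict mapping level name -> interval string.
    """
    if level not in LEVELS:
        raise ValueError(f"unknown schedule_level: {level}")
    for more_specific in LEVELS[:LEVELS.index(level)]:
        schedules.pop(more_specific, None)
    schedules[level] = interval
    return schedules


result = set_schedule({"image": "15m", "cluster": "1d"}, "pool", "1h")
print(result)  # {'cluster': '1d', 'pool': '1h'}
```

Dropping the image-level entry first keeps it from shadowing the new pool-level schedule, while the broader cluster-level entry is left alone.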

Fixes: https://tracker.ceph.com/issues/75043
Assisted-by: Cursor AI
Signed-off-by: Imran Imtiaz <imran.imtiaz@uk.ibm.com>


Casey Bodley [Fri, 20 Feb 2026 14:12:03 +0000 (09:12 -0500)]

Merge pull request #67397 from cbodley/wip-74047

doc/radosgw: document account-root for PUT and POST /admin/user

Reviewed-by: Ville Ojamo <git2233+ceph@ojamo.eu>


Casey Bodley [Fri, 20 Feb 2026 14:04:58 +0000 (09:04 -0500)]

mailmap: update email address for Ville Ojamo

add correct email address to .mailmap as the preferred address, and move
reference to github id `bluikko` to .githubmap

Signed-off-by: Casey Bodley <cbodley@redhat.com>


Ilya Dryomov [Fri, 20 Feb 2026 11:26:00 +0000 (12:26 +0100)]

Merge pull request #67368 from idryomov/wip-write-log-operation-set-cell

librbd/cache/pwl: WriteLogOperationSet::cell can be garbage

Reviewed-by: Miki Patel <miki.patel132@gmail.com>


Venky Shankar [Fri, 20 Feb 2026 08:49:31 +0000 (14:19 +0530)]

Merge PR #66907 into main

* refs/pull/66907/head:

Reviewed-by: Anoop C S <anoopcs@cryptolab.net>


Shraddha Agrawal [Fri, 20 Feb 2026 06:11:16 +0000 (11:41 +0530)]

Merge pull request #67274 from shraddhaag/wip-shraddhaag-cephadm-seastore-support

cephadm: add support for seastore


Shraddha Agrawal [Fri, 20 Feb 2026 06:10:49 +0000 (11:40 +0530)]

Merge pull request #67374 from shraddhaag/wip-shraddhaag-crimson-cephadm-raw

cephadm: add support for raw bluestore + crimson OSD deployment


Aashish Sharma [Mon, 16 Feb 2026 07:22:27 +0000 (12:52 +0530)]

mgr/dashboard: revamp on-boarding screen

Signed-off-by: Aashish Sharma <aasharma@redhat.com>


Bill Scales [Thu, 19 Feb 2026 16:58:36 +0000 (16:58 +0000)]

Merge pull request #66685 from bill-scales/issue74218

osd: FastEC: always update pwlc epoch when activating

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>


Casey Bodley [Thu, 19 Feb 2026 15:09:44 +0000 (10:09 -0500)]

qa/rgw: bucket notifications use pynose

nose incompatibility in multisite tests was fixed by switching to pynose
in https://github.com/ceph/teuthology/pull/1947, so I'm trying the same
here

Fixes: https://tracker.ceph.com/issues/74573
Signed-off-by: Casey Bodley <cbodley@redhat.com>


Vallari Agrawal [Thu, 19 Feb 2026 14:28:32 +0000 (16:28 +0200)]

mgr/dashboard: bump nvmeof submodule to 1.6.7

update proto files and gateway submodule

Fixes: https://tracker.ceph.com/issues/75015
Signed-off-by: Vallari Agrawal <vallari.agrawal@ibm.com>


Guillaume Abrioux [Thu, 19 Feb 2026 12:00:56 +0000 (13:00 +0100)]

Merge pull request #67221 from guits/node-proxy-various-fixes

node-proxy: major refactor and various fixes


Shraddha Agrawal [Thu, 12 Feb 2026 05:06:31 +0000 (10:36 +0530)]

doc: add docs for seastore support in cephadm

This commit updates the crimson user facing docs to add
instructions on how to deploy a crimson OSD with seastore
objectstore.

Signed-off-by: Shraddha Agrawal <shraddha.agrawal000@gmail.com>


Afreen Misbah [Thu, 19 Feb 2026 10:03:16 +0000 (15:33 +0530)]

Merge pull request #67277 from afreen23/nvmeof-api

mgr/dashboard: Add apis for add/del hosts on namespaces

Reviewed-by: Nizamudeen A <nia@redhat.com>


Matan Breizman [Thu, 19 Feb 2026 09:37:50 +0000 (11:37 +0200)]

Merge pull request #67165 from Matan-B/wip-matanb-io_uring

src/CMakeLists.txt: Allow Seastar to reuse HAVE_LIBURING

Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

Shraddha Agrawal [Thu, 19 Feb 2026 06:33:31 +0000 (12:03 +0530)]

Merge pull request #67375 from shraddhaag/wip-shraddhaag-availability-default-state

doc: update default availability score status

Nitzan Mordechai [Thu, 14 Sep 2023 09:41:13 +0000 (09:41 +0000)]

mon/OSDMonitor: remove unused crush rules after erasure code pools deleted

When erasure code pools are created, a corresponding Crush rule is concurrently added to the Crush map.
However, when these pools are subsequently deleted, the associated rule persists within the Crush map.
In the event that a pool is re-created with the same name, the rule already exists.
However, if any modifications were made to the rule prior to the pool's deletion,
the new pool will inherit these modifications.

The proposed solution involves the automatic deletion of the Crush rule when a pool is deleted,
but only if no other pools are utilizing that particular rule.

Fixes: https://tracker.ceph.com/issues/62826
Signed-off-by: Nitzan Mordechai <nmordech@redhat.com>
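
The rule described above (delete the Crush rule only when no other pool uses it) can be modelled as a small check. This is an illustrative Python sketch, not the OSDMonitor code, which is C++; `Pool` and `rule_to_remove` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pool:
    name: str
    crush_rule: str  # name of the Crush rule the pool uses


def rule_to_remove(pools: List[Pool], deleted_pool: str) -> Optional[str]:
    """Return the deleted pool's rule if no remaining pool uses it, else None."""
    deleted = next(p for p in pools if p.name == deleted_pool)
    remaining = [p for p in pools if p.name != deleted_pool]
    if any(p.crush_rule == deleted.crush_rule for p in remaining):
        return None  # rule is shared with another pool: keep it
    return deleted.crush_rule


pools = [
    Pool("ec_a", "rule_shared"),
    Pool("ec_b", "rule_shared"),
    Pool("ec_c", "rule_c"),
]
```

Deleting `ec_a` leaves `rule_shared` in place because `ec_b` still references it, while deleting `ec_c` makes `rule_c` eligible for removal.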

naman munet [Thu, 19 Feb 2026 05:45:31 +0000 (11:15 +0530)]

Merge pull request #67132 from rhcs-dashboard/delete-gateway-nodes

mgr/dashboard: delete-gateway-nodes

Afreen Misbah [Wed, 18 Feb 2026 02:08:08 +0000 (07:38 +0530)]

mgr/dashboard: Removed Raw capacity toggle

- removed raw capacity toggle
- updated tests
- added polling for Prometheus queries
- added tests for formatter service functions

Signed-off-by: Afreen Misbah <afreen@ibm.com>

Ilya Dryomov [Mon, 16 Feb 2026 21:24:47 +0000 (22:24 +0100)]

librbd/cache/pwl: WriteLogOperationSet::cell can be garbage

The pointer is never initialized but gets printed by operator<<.
Luckily outside of that it's unused.

Fixes: https://tracker.ceph.com/issues/74971
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

Ernesto Puerta [Wed, 18 Feb 2026 18:16:51 +0000 (19:16 +0100)]

Merge pull request #67119 from ceph/copilot/add-copilot-instructions-file

github: define contribution workflows for regular and backport PRs

copilot-swe-agent[bot] [Thu, 29 Jan 2026 10:01:57 +0000 (10:01 +0000)]

copilot: add GitHub Copilot instructions

GitHub allows adding an instructions file to each repo
(.github/copilot-instructions.md) to improve the behavior
of Copilot Reviews and Agent.

These instructions can also be customized per path, filetype, etc.:
https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions

This commit was authored through a GitHub Agent session: https://github.com/ceph/ceph/tasks/edeca07b-eabd-477c-917a-a18e72a0e2c2

Co-authored-by: GitHub Copilot noreply@github.com
Generated-by: Claude Sonnet 4.5
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>

Igor Fedotov [Wed, 18 Feb 2026 16:28:49 +0000 (19:28 +0300)]

Merge pull request #66527 from gardran/wip-gardran-dump-omap

os/bluestore: add omap_bytes perf counter.

Reviewed-by: Adam Kupczyk <akupczyk@ibm.com>

Casey Bodley [Wed, 18 Feb 2026 15:50:25 +0000 (10:50 -0500)]

doc/radosgw: document account-root for PUT and POST /admin/user

like the `--account-root` option for `radosgw-admin user create` and
`user modify`, the admin apis also support the `account-root` query
param

Fixes: https://tracker.ceph.com/issues/74047
Signed-off-by: Casey Bodley <cbodley@redhat.com>

Casey Bodley [Wed, 18 Feb 2026 15:48:46 +0000 (10:48 -0500)]

doc/radosgw: document account-id for `POST /admin/user`

the account-id field applies to both `PUT /admin/user` (Create User)
and `POST /admin/user` (Modify User) apis

Signed-off-by: Casey Bodley <cbodley@redhat.com>

John Mulligan [Fri, 23 Jan 2026 19:56:02 +0000 (14:56 -0500)]

doc/mgr: document new external ceph cluster support for smb

Document the new feature that allows smb clusters on one Ceph cluster to
make use of CephFS running on a different external Ceph cluster.

Signed-off-by: John Mulligan <jmulligan@redhat.com>

Guillaume Abrioux [Mon, 16 Feb 2026 13:24:36 +0000 (14:24 +0100)]

mgr/cephadm: validate hostname in NodeProxyCache

This adds a _resolve_hosts() method to resolve hostname from kwargs
and raise OrchestratorError when the host has no node-proxy data.

Fixes: https://tracker.ceph.com/issues/74749
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>

Guillaume Abrioux [Mon, 16 Feb 2026 12:49:46 +0000 (13:49 +0100)]

node-proxy: improve HTTP error logging in client

This commit makes it log the http error with the code and the reason
in sessionservice_discover() and log the error code along with the
body in query() for 5xx responses.

Fixes: https://tracker.ceph.com/issues/74749
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>

Guillaume Abrioux [Thu, 12 Feb 2026 15:08:39 +0000 (16:08 +0100)]

node-proxy: get serial number instead of SKU

Let's get the serial number instead of SKU.

Fixes: https://tracker.ceph.com/issues/74749
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>

Guillaume Abrioux [Thu, 12 Feb 2026 14:00:11 +0000 (15:00 +0100)]

node-proxy: allow multiple sources per component

COMPONENT_SPECS can now be a single spec or a list of specs per component.
Data from all sources is merged. Unavailable paths are skipped.

Extract get_component_data() from update_component() to support the merge logic.

Fixes: https://tracker.ceph.com/issues/74749
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
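
The merge behaviour described above might look roughly like this. It is a minimal sketch under stated assumptions: `merge_component_sources`, the `"path"` key, and the `fetch` callable are hypothetical and do not reproduce the real `get_component_data()` signature.

```python
from typing import Any, Callable, Dict, List, Optional, Union

Spec = Dict[str, Any]


def merge_component_sources(
    specs: Union[Spec, List[Spec]],
    fetch: Callable[[str], Optional[Dict[str, Any]]],
) -> Dict[str, Any]:
    # Accept either a single spec or a list of specs.
    spec_list = [specs] if isinstance(specs, dict) else list(specs)
    merged: Dict[str, Any] = {}
    for spec in spec_list:
        data = fetch(spec["path"])
        if data is None:
            continue  # unavailable path: skip it
        # Assumption: later sources win when two sources share a key.
        merged.update(data)
    return merged


# Fake endpoint table standing in for the real Redfish client.
fake_endpoints = {
    "/Storage": {"disk0": {"model": "A"}},
    "/Drives": {"disk1": {"model": "B"}},
}
single = merge_component_sources({"path": "/Storage"}, fake_endpoints.get)
multi = merge_component_sources(
    [{"path": "/Storage"}, {"path": "/Missing"}, {"path": "/Drives"}],
    fake_endpoints.get,
)
```

Here `multi` combines both available sources and silently skips `/Missing`.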

Guillaume Abrioux [Wed, 11 Feb 2026 07:41:32 +0000 (08:41 +0100)]

node-proxy: re-auth and retry once on 401

This commit makes node-proxy clear the session, call login() and retry
the request when a 401 http error is caught.

Fixes: https://tracker.ceph.com/issues/74749
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
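
The clear-session, login, retry-once flow can be sketched as follows. `Client`, `FakeTransport` and `Unauthorized` are hypothetical stand-ins written for illustration; they are not the node-proxy classes.

```python
class Unauthorized(Exception):
    """Stands in for an HTTP 401 error."""


class FakeTransport:
    """Hypothetical backend: only the most recently issued token is valid."""

    def __init__(self):
        self.issued = 0
        self.valid_token = None

    def login(self):
        self.issued += 1
        self.valid_token = f"token-{self.issued}"
        return self.valid_token

    def expire(self):
        self.valid_token = None

    def get(self, path, token):
        if token is None or token != self.valid_token:
            raise Unauthorized(path)
        return {"path": path}


class Client:
    def __init__(self, transport):
        self.transport = transport
        self.token = None

    def login(self):
        self.token = self.transport.login()

    def query(self, path):
        if self.token is None:
            self.login()
        try:
            return self.transport.get(path, self.token)
        except Unauthorized:
            # Clear the session, log in again, and retry exactly once.
            # A second 401 propagates to the caller.
            self.token = None
            self.login()
            return self.transport.get(path, self.token)


transport = FakeTransport()
client = Client(transport)
first = client.query("/redfish/v1/Systems")
transport.expire()  # simulate the session expiring on the server
second = client.query("/redfish/v1/Systems")
```

Retrying only once keeps a persistently failing credential from turning into a login loop.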

Guillaume Abrioux [Tue, 10 Feb 2026 15:25:47 +0000 (16:25 +0100)]

node-proxy: fix flake8 E721 in _dict_diff

Use "is not" instead of "!=" for type comparison in _dict_diff()

Fixes: https://tracker.ceph.com/issues/74749
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
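
For reference, the pattern flake8 E721 asks for is an identity comparison between type objects. The helper below is a hypothetical illustration, not the body of `_dict_diff()`.

```python
def types_differ(a, b) -> bool:
    # Compare type objects by identity ("is not") rather than with "!=".
    return type(a) is not type(b)
```

With an exact type check, `bool` and `int` count as different types even though `True == 1`.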

Guillaume Abrioux [Tue, 10 Feb 2026 15:15:42 +0000 (16:15 +0100)]

node-proxy: make the update loop interval configurable

Read system.refresh_interval from config and use it in the update loop
sleep. The default value is 180s when unset.

Fixes: https://tracker.ceph.com/issues/74749
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
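
A minimal sketch of reading the interval with its 180s fallback. The nested-dict config layout and the `get_refresh_interval` helper are assumptions made for illustration; only the option name and the default come from the commit message.

```python
from typing import Any, Dict

DEFAULT_REFRESH_INTERVAL = 180  # seconds, per the commit message


def get_refresh_interval(config: Dict[str, Any]) -> int:
    # Assumption: the config is exposed as nested dicts keyed by section.
    value = config.get("system", {}).get("refresh_interval")
    return DEFAULT_REFRESH_INTERVAL if value is None else int(value)
```

The update loop would then sleep for `get_refresh_interval(config)` seconds between iterations.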

Guillaume Abrioux [Tue, 10 Feb 2026 14:59:55 +0000 (15:59 +0100)]

mgr/node-proxy: fix "ceph orch hardware status --category criticals"

The criticals path was using the wrong data shape:
node-proxy sends status as:

component -> sys_id -> member

but the code assumed:

sys_id -> component -> member

This fixes get_critical_from_host() and _criticals_table() to iterate
in the correct order and build the criticals result with the right
nesting.

Fixes: https://tracker.ceph.com/issues/74749
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
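
Iterating in the corrected order (component, then sys_id, then member) could look like the sketch below. The `status.health` field layout and the `collect_criticals` name are assumptions; this is not the code of `get_critical_from_host()`.

```python
from typing import Any, Dict


def collect_criticals(status: Dict[str, Any]) -> Dict[str, Any]:
    criticals: Dict[str, Any] = {}
    # Outer level is the component, then the system id, then the member.
    for component, systems in status.items():
        for sys_id, members in systems.items():
            for member, info in members.items():
                # Assumption: health is reported under info["status"]["health"].
                health = str(info.get("status", {}).get("health", "")).lower()
                if health == "critical":
                    criticals.setdefault(component, {}).setdefault(sys_id, {})[member] = info
    return criticals


status = {
    "storage": {
        "1": {
            "disk0": {"status": {"health": "Critical"}},
            "disk1": {"status": {"health": "OK"}},
        }
    },
    "memory": {"1": {"dimm0": {"status": {"health": "OK"}}}},
}
```

The result keeps the same component -> sys_id -> member nesting as the input, containing only the critical members.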

Guillaume Abrioux [Tue, 10 Feb 2026 14:46:03 +0000 (15:46 +0100)]

node-proxy: normalize storage data per member

Let's apply normalize_dict() to each member's data only, so the first
level keys (that are redfish member identifiers like "Self") are not
lowercased.

This avoids duplicate entries in hardware status.

Example:

```
[root@node-proxy-1 cephadm]# ./cephadm shell -- ceph orch hardware status --category criticals
Inferring fsid 9d6d6012-067a-11f1-8e61-525400a04a72
Inferring config /var/lib/ceph/9d6d6012-067a-11f1-8e61-525400a04a72/mon.node-proxy-1/config
+--------------+-----------+------+--------+-------+
|     HOST     | COMPONENT | NAME | STATUS | STATE |
+--------------+-----------+------+--------+-------+
| node-proxy-1 |    self   | None |  N/A   |  N/A  |
| node-proxy-1 |    Self   | None |  N/A   |  N/A  |
+--------------+-----------+------+--------+-------+
```

Fixes: https://tracker.ceph.com/issues/74749
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
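
The per-member normalization can be sketched as below. This `normalize_dict()` is a stand-in that only lowercases keys recursively (an assumption about what the real helper does), and `normalize_members()` is a hypothetical wrapper.

```python
from typing import Any, Dict


def normalize_dict(data: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for the real helper: lowercase keys at every nesting level.
    return {
        key.lower(): normalize_dict(value) if isinstance(value, dict) else value
        for key, value in data.items()
    }


def normalize_members(data: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    # Keep first-level keys (member identifiers such as "Self") untouched
    # and normalize only each member's own data.
    return {member_id: normalize_dict(member) for member_id, member in data.items()}


raw = {"Self": {"Name": "ctrl0", "Status": {"Health": "OK"}}}
normalized = normalize_members(raw)
```

Because the identifier "Self" is never lowercased, it cannot end up alongside a second "self" entry.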

Guillaume Abrioux [Thu, 5 Feb 2026 09:01:06 +0000 (10:01 +0100)]

node-proxy: encapsulate send logic in dedicated method

Move the "send data to mgr when inventory changed" logic from main()
into a dedicated method _try_send_update().
This flattens the reporter loop and keeps main() to a single call under
the lock.

Fixes: https://tracker.ceph.com/issues/74749
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>

Guillaume Abrioux [Wed, 4 Feb 2026 14:46:29 +0000 (15:46 +0100)]

node-proxy: log actual data delta in reporter

This adds a _dict_diff() function that computes a recursive dict diff
and uses it in the reporter to log the delta (truncated at 2048 chars).

Fixes: https://tracker.ceph.com/issues/74749
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
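
A recursive diff with a truncated log rendering might be sketched as follows. The output shape (`added`/`removed`/`old`/`new` markers) and the `dict_diff` and `format_delta` names are assumptions, not the actual `_dict_diff()` implementation; only the 2048-character limit comes from the commit message.

```python
import json
from typing import Any, Dict


def dict_diff(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    diff: Dict[str, Any] = {}
    for key in old.keys() | new.keys():
        if key not in new:
            diff[key] = {"removed": old[key]}
        elif key not in old:
            diff[key] = {"added": new[key]}
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            nested = dict_diff(old[key], new[key])  # recurse into sub-dicts
            if nested:
                diff[key] = nested
        elif old[key] != new[key]:
            diff[key] = {"old": old[key], "new": new[key]}
    return diff


def format_delta(diff: Dict[str, Any], limit: int = 2048) -> str:
    # Render the diff for logging and cap its length.
    text = json.dumps(diff, sort_keys=True, default=str)
    return text if len(text) <= limit else text[:limit] + "..."


old = {"a": 1, "b": {"c": 2, "e": 5}, "z": 0}
new = {"a": 1, "b": {"c": 3, "e": 5}, "d": 4}
delta = dict_diff(old, new)
```

Unchanged keys such as `a` and `b.e` are omitted, so the logged delta stays small when most of the inventory is stable.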

Guillaume Abrioux [Wed, 4 Feb 2026 14:15:23 +0000 (15:15 +0100)]

node-proxy: add periodic heartbeats in main and reporter loops

This logs an info message every 5 minutes so that logs show the agent
and reporter are still running when nothing else is logged.

Fixes: https://tracker.ceph.com/issues/74749
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
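
One way to emit such a heartbeat from a loop is a small helper that logs only when the interval has elapsed. The `Heartbeat` class is a hypothetical sketch, not the node-proxy code; the 300-second default mirrors the 5 minutes mentioned above.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("heartbeat-sketch")


class Heartbeat:
    def __init__(
        self,
        name: str,
        interval: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.name = name
        self.interval = interval
        self._clock = clock
        self._last = clock()

    def tick(self) -> bool:
        """Log an info message if the interval has elapsed since the last one."""
        now = self._clock()
        if now - self._last < self.interval:
            return False
        log.info("%s heartbeat: still running", self.name)
        self._last = now
        return True
```

Calling `tick()` once per loop iteration keeps the heartbeat independent of how long each iteration takes.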

Guillaume Abrioux [Wed, 4 Feb 2026 13:16:40 +0000 (14:16 +0100)]

node-proxy: adjust log levels

Let's adjust log levels across the project:

- use warning for bad request in the API,
when thread is not alive and for retry failure,
- use error for OOB load failure,
- use info for backoff interval,
- use debug in send attempts and for member fetch

Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>

Guillaume Abrioux [Tue, 3 Feb 2026 15:26:16 +0000 (16:26 +0100)]

node-proxy: add unit tests

This adds some unit tests.

Fixes: https://tracker.ceph.com/issues/74749
Signed-off-by: Guillaume Abrioux <gabrioux@ibm.com>
