git.apps.os.sepia.ceph.com Git - ceph-ci.git/log
ceph-ci.git
3 months agoosd: Multiple Decode fixes.
Alex Ainscow [Wed, 18 Jun 2025 19:46:49 +0000 (20:46 +0100)]
osd: Multiple Decode fixes.

Fix 1:

These are multiple fixes that affected the same code. To simplify review
and understanding of the changes, they have been merged into a single
commit.

What happened in the defect (k,m = 6,4):

1. State is: fast_reads = true, shards 0,4,5,6,7,8 are available. Shard 1 is missing this object.
2. Shard 5 only needs zeros, so its read is dropped. The other sub read messages are sent.
3. The object on shard 1 completes recovery (so becomes not-missing).
4. The reads complete; the completion code notices that it only has 5 reads, so it calculates what it needs to re-read.
5. It calculates that it needs 0,1,4,5,6,7, and so wants to read shard 1.
6. The code assumes that enough reads should already have been performed, so it refuses to do any more reads and instead generates an EIO.

The problem here is some "lazy" code in step (4).  What it should be doing is working out that it
can use the zero buffers and not calling get_remaining_reads().  Instead, what it attempts to do is
call get_remaining_reads() and, if there is no work to do, assume that it has everything
already and complete the read with success.  This assumption mostly works, but the
combination of fast_reads picking fewer than k shards to read from AND an object completing
recovery in parallel causes an issue.

The solution is to wait for all reads to complete and then assume that any remaining zero buffers
count as completed reads.  This should then cause the plugin to declare "success".
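
The completion rule described above can be sketched in a few lines of Python. This is only a model of the idea, not the Ceph code: `read_is_complete` and its parameters are hypothetical names, and it assumes a decode needs k shards in total, counting shards whose contents are known to be zeros.

```python
def read_is_complete(k, completed, zero_shards, outstanding):
    """Return True when a decode can proceed without issuing more reads.

    completed   -- set of shards whose sub reads have returned data
    zero_shards -- set of shards that only need zero buffers (never read)
    outstanding -- set of shards with sub reads still in flight
    """
    if outstanding:
        # Wait for every issued read before deciding anything.
        return False
    # Once nothing is in flight, zero buffers count as completed reads.
    return len(completed | zero_shards) >= k


# The defect scenario: k = 6, shards 0,4,6,7,8 returned data, shard 5 is zeros.
print(read_is_complete(6, {0, 4, 6, 7, 8}, {5}, set()))
```

With this rule the five real reads plus the zero buffer for shard 5 add up to k, so no re-read of shard 1 is ever requested.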

Fix 2:

There are decodes in new EC which can occur when fewer than k
shards have been read.  These are reads in the last stripe where,
for decoding purposes, the data past the end of the shard can
be considered zeros. EC does not read these, but instead relies
on the decode function inventing the zero buffers.

This was working correctly when fast reads were turned off, but
if such an IO was encountered with fast reads turned on the
logic was disabled and the IO returned an EIO.

This commit fixes that logic, so that if all reads have completed
and send_all_remaining_reads conveys that no new reads were
requested, then decode will still be possible.
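
To illustrate why fewer than k reads can be enough in the last stripe, here is a small Python sketch. It assumes data is striped round-robin across the k data shards in units of `chunk_size`; `split_last_stripe` is a hypothetical helper, not a Ceph function.

```python
def split_last_stripe(object_size, k, chunk_size):
    """Split the data shards of the final stripe into (to_read, zeros)."""
    stripe_size = k * chunk_size
    # Offset at which the final (possibly partial) stripe begins.
    last_stripe_start = (max(object_size, 1) - 1) // stripe_size * stripe_size
    to_read, zeros = [], []
    for shard in range(k):
        chunk_start = last_stripe_start + shard * chunk_size
        # A chunk that starts at or past the end of the object holds no data,
        # so the decoder can treat it as zeros instead of reading it.
        (to_read if chunk_start < object_size else zeros).append(shard)
    return to_read, zeros


# A 10 KiB object with k = 4 and 4 KiB chunks only has data on shards 0..2.
to_read, zeros = split_last_stripe(10 * 1024, 4, 4096)
```

In this example only three shards are read, and the fourth buffer has to be invented as zeros before the decode can run.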

Fix 3:

If reading the end of an object with unequally sized objects,
we pad out the end of the decode with zeros, to provide
the correct data to the plugin.

Previously, the code decided not to add the zeros to "needed"
shards.  This caused a problem where for some parity-only
decodes, an incomplete set of zeros was generated, fooling the
_decode function into thinking that the entire shard was zeros.

In the fix, we need to cope with the case where the only data
needed from the shard is the padding itself.

The comments around the new code describe the logic behind
the change.

This makes the encode-specific use case of padding out the
to-be-decoded shards unnecessary, as this is performed by the
pad_on_shards function below.

Also fix some logic in calculating the need_set passed to the
decode function, which did not contain the extra shards needed
for the decode. This need_set is actually ignored by all the
plugins as far as I know, but being wrong does not seem
helpful if it's needed in the future.

Fix 4: Extend reads when recovering parity

Here is an example use case which was going wrong:
1. Start with 3+2 EC, shards 0,3,4 are 8k, shards 1,2 are 4k
2. Perform a recovery, where we recover 2 and 4.  2 is missing, 4 can be copied from another OSD.
3. Recovery works out that it can do the whole recovery with shards 0,1,3. (note not 4)
4. So the "need" set is 0,1,3, the "want" set is 2,4 and the "have" set is 0,1,3,4,5
5. The logic in get_all_avail_shards then tries to work out the extents it needs - it only looks at 2, because we "have" 4
6. Result is that we end up reading 4k on 0,1,3, then attempt to recover 8k on shard 4 from this... which clearly does not work.
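
The size mismatch in this example can be shown with a short Python sketch. It assumes the read from each source shard must be as long as the largest shard in the "want" set; `read_length` is a hypothetical helper and the sizes are the ones from the example above.

```python
def read_length(shard_sizes, want):
    """Length to read from each source shard so every wanted shard can be rebuilt."""
    return max(shard_sizes[shard] for shard in want)


sizes = {0: 8192, 1: 4096, 2: 4096, 3: 8192, 4: 8192}

# Looking only at the missing shard 2 gives a 4k read, too short for shard 4.
short_read = read_length(sizes, {2})

# Considering the whole want set extends the read to 8k.
extended_read = read_length(sizes, {2, 4})
```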

Fix 5: Round up padding to 4k alignment in EC

The pad_on_shards function was not aligning to 4k.  However, the decode/encode functions were. This meant that
we assigned a new buffer, then added another after. Aligning in pad_on_shards should be faster.
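
Rounding a length up to a 4k boundary is simple integer arithmetic. A minimal Python sketch follows; `round_up` and `ALIGN` are illustrative names and not taken from the Ceph source.

```python
ALIGN = 4096  # 4k alignment, as described in the commit message


def round_up(length, align=ALIGN):
    """Round length up to the next multiple of align (align must be > 0)."""
    return (length + align - 1) // align * align


padded = round_up(5000)
```

Padding to the aligned length in a single step means one buffer of the final size can be allocated, rather than a short buffer followed by a second one.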

Fix 6: Do not invent encode buffers before doing decode.

In this bug, during recovery, we could potentially be creating
unwanted encode buffers and using them to decode data buffers.

This fix simply removes the bad code, as there is new code above
which is already doing the correct action.

Fix 7: Fix miscompare with missing decodes.

In this case, two OSDs failed at once. One was replaced and the other was not.

This caused us to attempt to encode a missing shard while another shard was missing, which
caused a miscompare because the recovery failed to do the decode properly before doing an encode.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
3 months agoosd: Optimized EC pools bug fix when repeating GetLog
Bill Scales [Wed, 18 Jun 2025 11:11:51 +0000 (12:11 +0100)]
osd: Optimized EC pools bug fix when repeating GetLog

When the primary shard of an optimized EC pool does not have
a copy of the log it may need to repeat the GetLog peering
step twice, the first time to get a full copy of a log from
a shard that sees all log entries and then a second time
to get the "best" log from a nonprimary shard which may
have a partial log due to partial writes.

A side effect of repeating GetLog is that the missing
list is collected for both the "best" shard and the
shard that provides a full copy of the log. This latter
missing list confuses later steps in the peering
process and may cause this shard to complete writes
and end up diverging from the primary. Discarding
this missing list causes Peering to behave the same as if
the GetLog step did not need to be repeated.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
3 months agoosd: Fix attribute recover in rare recovery scenario
Alex Ainscow [Wed, 11 Jun 2025 15:30:40 +0000 (16:30 +0100)]
osd: Fix attribute recover in rare recovery scenario

When recovering attributes, we read them from the first potential primary, then
if that read fails, attempt to read from another potential primary.

The problem is that the code which calculates which shards to read for a recovery
only takes into account *data* and not where the attributes are.  As such, if the
second read only required a non-primary, then the attribute read fails and the
OSD panics.

The fix is to detect this scenario and perform an empty read to that shard, which
the attribute-read code can use for attribute reads.

Code was incorrectly interpreting a failed attribute read on recovery as
meaning a "fast_read". Also, no attribute recovery would occur in this case.
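
The idea of adding an empty read can be sketched as follows. This is a simplified Python model under stated assumptions: reads are a dict from shard to a list of extents, attributes live only on potential primaries, and `add_attr_read` is a hypothetical name.

```python
def add_attr_read(reads, potential_primaries, available):
    """Ensure at least one shard that stores attributes is part of the read.

    reads               -- dict mapping shard -> list of (offset, length) extents
    potential_primaries -- shards that hold the object attributes, in order
    available           -- set of shards that can currently be read
    """
    if any(shard in reads for shard in potential_primaries):
        return reads  # the data read already touches an attribute holder
    for shard in potential_primaries:
        if shard in available:
            # Empty extent list: no data wanted, only the attributes.
            return {**reads, shard: []}
    return reads  # nothing suitable is available


# The data read only needs non-primary shard 2; shard 0 is down, shard 3 is up.
plan = add_attr_read({2: [(0, 4096)]}, [0, 3], {2, 3})
```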

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
3 months agoosd: code clean up and debug in optimised EC
Alex Ainscow [Wed, 11 Jun 2025 15:23:08 +0000 (16:23 +0100)]
osd: code clean up and debug in optimised EC

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
3 months agoosd: EC Optimizations fix bugs in applying pwlc to update info and log
Bill Scales [Wed, 11 Jun 2025 14:53:48 +0000 (15:53 +0100)]
osd: EC Optimizations fix bugs in applying pwlc to update info and log

1. Refactor the code that applies pwlc to update info and log so that there
is one function rather than multiple copies of the code.

2. pwlc information is used to track shards that have not been updated by
partial writes. It is used to advance last_complete (and last_update and
the log head) to account for log entries that the shard missed. It was
only being applied if last_complete matched the range of partial writes
recorded in pwlc. When a shard has missing objects, last_complete is
deliberately held before the oldest need, which stops pwlc being applied.
This is wrong: pwlc can still try to update last_update and the log
head even if it cannot update last_complete.

3. When the primary receives info (and pwlc) information from OSD x(y)
it uses the pwlc information to update x(y)'s info. During backfill
there may be other shards z(y) which should also be updated using the
pwlc information.
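
Point 2 can be illustrated with a deliberately simplified Python model. It assumes versions are plain integers and that a pwlc entry is a (start, end) range of partial writes that skipped the shard; `apply_pwlc` and the dict layout are hypothetical and do not mirror the real data structures.

```python
def apply_pwlc(info, pwlc_range):
    """Advance a shard's info past partial writes it did not take part in.

    info       -- dict with integer 'last_update' and 'last_complete'
    pwlc_range -- (start, end) versions of partial writes that skipped this shard
    """
    start, end = pwlc_range
    updated = dict(info)
    if start <= info["last_update"] < end:
        updated["last_update"] = end
        # last_complete is held back while objects are missing, so only move
        # it when it was level with last_update.
        if info["last_complete"] == info["last_update"]:
            updated["last_complete"] = end
    return updated


# A shard with missing objects: last_update advances, last_complete stays put.
result = apply_pwlc({"last_update": 10, "last_complete": 7}, (10, 15))
```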

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
3 months agoMerge pull request #63408 from aainscow/ec_recovery_zero_detect
Alex Ainscow [Fri, 25 Jul 2025 06:17:05 +0000 (07:17 +0100)]
Merge pull request #63408 from aainscow/ec_recovery_zero_detect

OSD: EC recovery zero detect

3 months agoMerge pull request #64545 from ajarr/cleanup-librbd-mirror-enable-req
Ilya Dryomov [Fri, 25 Jul 2025 05:26:58 +0000 (07:26 +0200)]
Merge pull request #64545 from ajarr/cleanup-librbd-mirror-enable-req

librbd/mirror: cleanup EnableRequest::handle_get_mirror_image()

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
3 months agoMerge pull request #64193 from tchaikov/wip-bluestore-test-shutdown
SrinivasaBharathKanta [Fri, 25 Jul 2025 04:55:21 +0000 (10:25 +0530)]
Merge pull request #64193 from tchaikov/wip-bluestore-test-shutdown

os/bluestore: fix memory leak in HybridAllocator destructor

3 months agoMerge pull request #64661 from afreen23/revert-stylus
afreen23 [Thu, 24 Jul 2025 23:04:38 +0000 (04:34 +0530)]
Merge pull request #64661 from afreen23/revert-stylus

Revert "mgr/dashboard: Fix stylus issue"

Reviewed-by: Naman Munet <nmunet@redhat.com>
3 months agoMerge pull request #64352 from aclamk/aclamk-bs-remove-cache_blob
SrinivasaBharathKanta [Thu, 24 Jul 2025 23:03:08 +0000 (04:33 +0530)]
Merge pull request #64352 from aclamk/aclamk-bs-remove-cache_blob

os/bluestore: Get rid of unused CACHE_BLOB_BL

3 months agoMerge pull request #64219 from kamoltat/wip-ksirivad-fix-qa-rados-test
SrinivasaBharathKanta [Thu, 24 Jul 2025 23:02:24 +0000 (04:32 +0530)]
Merge pull request #64219 from kamoltat/wip-ksirivad-fix-qa-rados-test

qa/workunits/rados/test.sh: add timeout mechanism and more info to workloads with parallel tests

3 months agoMerge pull request #63642 from ifed01/wip-ifed-dynamic-vsel-params
SrinivasaBharathKanta [Thu, 24 Jul 2025 23:01:17 +0000 (04:31 +0530)]
Merge pull request #63642 from ifed01/wip-ifed-dynamic-vsel-params

os/bluestore: make vselector reserved* parameters applicable in run-time

3 months agoMerge pull request #64667 from afreen23/fix-localization-load
afreen23 [Thu, 24 Jul 2025 19:57:19 +0000 (01:27 +0530)]
Merge pull request #64667 from afreen23/fix-localization-load

mgr/dashboard: Fix issue with loading localization module

Reviewed-by: Nizamudeen A <nia@redhat.com>
3 months agoMerge pull request #62790 from ceph/rm-ceph-common-postun
Justin Caratzas [Thu, 24 Jul 2025 17:00:07 +0000 (13:00 -0400)]
Merge pull request #62790 from ceph/rm-ceph-common-postun

ceph.spec.in: don't rm ceph conf and logs in ceph-common postun

3 months agoMerge pull request #64535 from yuvalif/wip-yuval-71979
Yuval Lifshitz [Thu, 24 Jul 2025 16:26:35 +0000 (19:26 +0300)]
Merge pull request #64535 from yuvalif/wip-yuval-71979

rgw/test/notification: add more info when retry test fail

3 months agolibrbd/mirror: cleanup EnableRequest::handle_get_mirror_image()
Ramana Raja [Wed, 16 Jul 2025 17:43:18 +0000 (13:43 -0400)]
librbd/mirror: cleanup EnableRequest::handle_get_mirror_image()

In the EnableRequest state machine, clean up the handling of the async
request to fetch the mirror image, particularly when a non-primary image
is being created by the rbd-mirror daemon.

Signed-off-by: Ramana Raja <rraja@redhat.com>
3 months agoMerge pull request #64402 from sam0044/sam0044-bug_72034
Adam King [Thu, 24 Jul 2025 15:20:58 +0000 (11:20 -0400)]
Merge pull request #64402 from sam0044/sam0044-bug_72034

mgr/cephadm: updating maintenance health status in the serve loop

Reviewed-by: Adam King <adking@redhat.com>
3 months agoMerge pull request #63825 from leomylonas/main
Adam King [Thu, 24 Jul 2025 15:14:01 +0000 (11:14 -0400)]
Merge pull request #63825 from leomylonas/main

mgr/cephadm: handle possibly undefined template variable in haproxy.cfg.j2

Reviewed-by: Adam King <adking@redhat.com>
3 months agoMerge pull request #63391 from Kushal-deb/fix-rgw-bootstrap
Adam King [Thu, 24 Jul 2025 15:06:41 +0000 (11:06 -0400)]
Merge pull request #63391 from Kushal-deb/fix-rgw-bootstrap

mgr/rgw: Improve error handling and add --resume flag to 'ceph rgw realm bootstrap'

Reviewed-by: Adam King <adking@redhat.com>
3 months agoMerge pull request #64185 from kshtsk/wip-add-cephadm-file-path
Adam King [Thu, 24 Jul 2025 15:00:21 +0000 (11:00 -0400)]
Merge pull request #64185 from kshtsk/wip-add-cephadm-file-path

qa/tasks/cephadm: allow to select from 'cephadm' and 'cephadm.py'

Reviewed-by: Adam King <adking@redhat.com>
3 months agoMerge pull request #64635 from abitdrag/tracker_72134
Ilya Dryomov [Thu, 24 Jul 2025 14:45:52 +0000 (16:45 +0200)]
Merge pull request #64635 from abitdrag/tracker_72134

krbd: "rbd device map" command should use msgr2 by default

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
3 months agoMerge pull request #62886 from Kushal-deb/fix-issue-2302464-ceph_orch_commands_to_rep...
Adam King [Thu, 24 Jul 2025 14:11:33 +0000 (10:11 -0400)]
Merge pull request #62886 from Kushal-deb/fix-issue-2302464-ceph_orch_commands_to_report_the_exit_code

mgr/cephadm: Provide appropriate exit codes for orch operations

Reviewed-by: John Mulligan <jmulligan@redhat.com>
3 months agoMerge pull request #64454 from anoopcs9/fix-var-lib-samba-perms
Adam King [Thu, 24 Jul 2025 13:26:29 +0000 (09:26 -0400)]
Merge pull request #64454 from anoopcs9/fix-var-lib-samba-perms

cephadm: Bind mount /var/lib/samba with 0755

Reviewed-by: John Mulligan <jmulligan@redhat.com>
Reviewed-by: Sachin Prabhu <sp@spui.uk>
Reviewed-by: Shwetha K Acharya <Shwetha.K.Acharya@ibm.com>
3 months agomgr/dashboard: Fix issue with loading localization module
Afreen Misbah [Thu, 24 Jul 2025 11:29:10 +0000 (16:59 +0530)]
mgr/dashboard: Fix issue with loading localization module

- this module is only required to be imported from polyfill
- nx is now loading this module as well, breaking CI tests and dashboard `@angular/localize`

Signed-off-by: Afreen Misbah <afreen@ibm.com>
3 months agoMerge pull request #64616 from afreen23/fix-smb
afreen23 [Thu, 24 Jul 2025 11:14:54 +0000 (16:44 +0530)]
Merge pull request #64616 from afreen23/fix-smb

mgr/dashboard: Fix redirection of SMB enable module

Reviewed-by: Naman Munet <nmunet@redhat.com>
3 months agoRevert "mgr/dashboard: Fix stylus issue"
Afreen Misbah [Thu, 24 Jul 2025 06:58:33 +0000 (12:28 +0530)]
Revert "mgr/dashboard: Fix stylus issue"

This reverts commit c7053dff52bc1a93af45b8017a6e13a578a2f71e.

Signed-off-by: Afreen Misbah <afreen@ibm.com>
3 months agokrbd: "rbd device map" command should use msgr2 by default
Miki Patel [Wed, 23 Jul 2025 11:41:10 +0000 (17:11 +0530)]
krbd: "rbd device map" command should use msgr2 by default

Making msgr2 and ms_mode=prefer-crc as default option for "rbd device
map" command

Fixes: https://tracker.ceph.com/issues/72134
Signed-off-by: Miki Patel <miki.patel132@gmail.com>
3 months agoMerge PR #62682 into main
Venky Shankar [Thu, 24 Jul 2025 05:07:31 +0000 (10:37 +0530)]
Merge PR #62682 into main

* refs/pull/62682/head:

Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 months agoMerge pull request #64633 from afreen23/fix-stylus
Nizamudeen A [Wed, 23 Jul 2025 16:50:58 +0000 (22:20 +0530)]
Merge pull request #64633 from afreen23/fix-stylus

mgr/dashboard: Fix stylus issue

3 months agoMerge pull request #64568 from cbodley/wip-qa-rgw-s3a-hadont
J. Eric Ivancich [Wed, 23 Jul 2025 16:47:23 +0000 (12:47 -0400)]
Merge pull request #64568 from cbodley/wip-qa-rgw-s3a-hadont

qa/rgw: remove hadoop-s3a subsuite

Reviewed-by: Adam C. Emerson <aemerson@redhat.com>
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
3 months agoMerge pull request #64639 from zdover23/wip-doc-2025-07-23-pr-64532-followup
Anthony D'Atri [Wed, 23 Jul 2025 13:04:49 +0000 (09:04 -0400)]
Merge pull request #64639 from zdover23/wip-doc-2025-07-23-pr-64532-followup

doc/radosgw: edit config-ref.rst

3 months agoMerge pull request #64640 from zdover23/wip-doc-2025-07-23-pr-64604-followup
Anthony D'Atri [Wed, 23 Jul 2025 13:04:06 +0000 (09:04 -0400)]
Merge pull request #64640 from zdover23/wip-doc-2025-07-23-pr-64604-followup

doc/cephfs: edit disaster-recovery.rst

3 months agodoc/cephfs: edit disaster-recovery.rst
Zac Dover [Wed, 23 Jul 2025 12:44:32 +0000 (22:44 +1000)]
doc/cephfs: edit disaster-recovery.rst

Follow up on the suggestions made by Anthony D'Atri in
https://github.com/ceph/ceph/pull/64604.

Signed-off-by: Zac Dover <zac.dover@proton.me>
3 months agodoc/radosgw: edit config-ref.rst
Zac Dover [Wed, 23 Jul 2025 12:36:04 +0000 (22:36 +1000)]
doc/radosgw: edit config-ref.rst

Follow up on the suggestions made by Anthony D'Atri in
https://github.com/ceph/ceph/pull/64532.

Signed-off-by: Zac Dover <zac.dover@proton.me>
3 months agoMerge pull request #63974 from baum/dsa
Adam King [Wed, 23 Jul 2025 12:14:44 +0000 (08:14 -0400)]
Merge pull request #63974 from baum/dsa

mgr/cephadm/nvmeof: idxd/dsa

Reviewed-by: Adam King <adking@redhat.com>
3 months agomgr/dashboard: Fix stylus issue
Afreen Misbah [Wed, 23 Jul 2025 09:33:17 +0000 (15:03 +0530)]
mgr/dashboard: Fix stylus issue

Fixes https://tracker.ceph.com/issues/72248

Signed-off-by: Afreen Misbah <afreen@ibm.com>
3 months agoMerge pull request #64543 from phlogistonjohn/jjm-bib
Dan Mick [Tue, 22 Jul 2025 21:10:13 +0000 (14:10 -0700)]
Merge pull request #64543 from phlogistonjohn/jjm-bib

src/script/build-integration-branch improvements

3 months agoMerge pull request #62767 from rhcs-dashboard/notification-list-ui
afreen23 [Tue, 22 Jul 2025 11:07:08 +0000 (16:37 +0530)]
Merge pull request #62767 from rhcs-dashboard/notification-list-ui

mgr/dashboard: Add RGW bucket notification listing in dashboard

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
3 months agomgr/dashboard: Fix redirection of SMB enable module
Afreen Misbah [Tue, 22 Jul 2025 09:13:14 +0000 (14:43 +0530)]
mgr/dashboard: Fix redirection of SMB enable module
- taking to dashboard page due to remains of `buttonToEnableModule`

Signed-off-by: Afreen Misbah <afreen@ibm.com>
3 months agoMerge pull request #64595 from aainscow/align_storage
Alex Ainscow [Tue, 22 Jul 2025 07:42:53 +0000 (08:42 +0100)]
Merge pull request #64595 from aainscow/align_storage

osd: Replace deprecated std::align_storage_t with alignas

3 months agoMerge pull request #64225 from rhcs-dashboard/inline-tip-notification
afreen23 [Tue, 22 Jul 2025 06:59:54 +0000 (12:29 +0530)]
Merge pull request #64225 from rhcs-dashboard/inline-tip-notification

mgr/dashboard: add support for inline-tip notification

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
3 months agoMerge pull request #64505 from abitdrag/tracker_71961
Ilya Dryomov [Tue, 22 Jul 2025 06:56:13 +0000 (08:56 +0200)]
Merge pull request #64505 from abitdrag/tracker_71961

librbd: images aren't closed in group_snap_*_by_record() on error

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
3 months agoMerge pull request #64527 from rhcs-dashboard/edit-storage-class-mgmt
naman munet [Tue, 22 Jul 2025 06:53:23 +0000 (12:23 +0530)]
Merge pull request #64527 from rhcs-dashboard/edit-storage-class-mgmt

mgr/dashboard: Storage Class - Update

3 months agomgr/dashboard: Add RGW bucket notification listing in dashboard
pujaoshahu [Thu, 10 Apr 2025 17:29:06 +0000 (22:59 +0530)]
mgr/dashboard: Add RGW bucket notification listing in dashboard

Fixes: https://tracker.ceph.com/issues/70880
Signed-off-by: pujaoshahu <pshahu@redhat.com>
Signed-off-by: pujashahu <pshahu@redhat.com>
3 months agoMerge pull request #64591 from tchaikov/wip-auth-remove-unused
Kefu Chai [Tue, 22 Jul 2025 01:07:57 +0000 (09:07 +0800)]
Merge pull request #64591 from tchaikov/wip-auth-remove-unused

auth: remove unused AuthTicket::renew_after member variable

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
3 months agoMerge pull request #64437 from avanthakkar/add-smb-metadata-metric
Avan [Mon, 21 Jul 2025 14:49:29 +0000 (20:19 +0530)]
Merge pull request #64437 from avanthakkar/add-smb-metadata-metric

mgr/prometheus: add smb_metadata metric

3 months agoMerge pull request #64604 from zdover23/wip-doc-2025-07-21-cephfs-disaster-recovery...
Zac Dover [Mon, 21 Jul 2025 13:57:34 +0000 (23:57 +1000)]
Merge pull request #64604 from zdover23/wip-doc-2025-07-21-cephfs-disaster-recovery-data-pool-damage

doc/cephfs: edit disaster-recovery.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
3 months agomgr/dashboard: Storage Class - Update
Dnyaneshwari [Wed, 16 Jul 2025 10:02:22 +0000 (15:32 +0530)]
mgr/dashboard: Storage Class - Update

Fixes: https://tracker.ceph.com/issues/72156
Signed-off-by: Dnyaneshwari Talwekar <dtalwekar@redhat.com>
3 months agodoc/cephfs: edit disaster-recovery.rst
Zac Dover [Mon, 21 Jul 2025 12:50:19 +0000 (22:50 +1000)]
doc/cephfs: edit disaster-recovery.rst

Edit the section "Data Pool Damage" in doc/cephfs/disaster-recovery.rst.
This commit is part of the project of improving the data-recovery parts
of the CephFS documentation, as requested in the Ceph Power Users
Feedback Summary in mid-2025.

Signed-off-by: Zac Dover <zac.dover@proton.me>
3 months agoMerge pull request #61767 from ivoalmeida/mfe-app-shell
afreen23 [Mon, 21 Jul 2025 09:44:36 +0000 (15:14 +0530)]
Merge pull request #61767 from ivoalmeida/mfe-app-shell

mgr/dashboard: set up dashboard as a app shell for plugin framework

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
3 months agoauth: remove unused AuthTicket::renew_after member variable
Kefu Chai [Sun, 20 Jul 2025 23:09:11 +0000 (07:09 +0800)]
auth: remove unused AuthTicket::renew_after member variable

The AuthTicket::renew_after field is only set in init_timestamps() and
read by dump() for debugging purposes. It has no functional use cases
and causes encoding/decoding inconsistencies.

During decoding, this field remains unchanged, creating discrepancies
between original and decoded values. This issue was masked because
check-generated.sh and readable.sh reused struct instances, preserving
stale field values across encode/decode cycles.

An upcoming change will allocate fresh instances for each decode
operation, which would expose these inconsistent values.

Remove the unused field to eliminate the encoding inconsistency and
simplify the codebase.
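
The masking effect described above can be reproduced with a toy Python class. `Ticket` is a hypothetical stand-in with one encoded field and one field that is never encoded; it is not the AuthTicket definition.

```python
class Ticket:
    """Toy stand-in for a struct with one field that is never encoded."""

    def __init__(self, expires=0, renew_after=0):
        self.expires = expires
        self.renew_after = renew_after  # set locally, not part of the encoding

    def encode(self):
        return {"expires": self.expires}

    def decode(self, payload):
        self.expires = payload["expires"]  # renew_after is left untouched

    def dump(self):
        return {"expires": self.expires, "renew_after": self.renew_after}


original = Ticket(expires=100, renew_after=50)

# Reusing the instance: the stale renew_after survives, so the dumps match.
reused = original
reused.decode(original.encode())

# Fresh instance: renew_after keeps its default, so the dumps differ.
fresh = Ticket()
fresh.decode(original.encode())
```

Comparing `fresh.dump()` with `original.dump()` exposes the mismatch that instance reuse was hiding.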

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
3 months agoosd: Replace deprecated std::align_storage_t with alignas
Alex Ainscow [Mon, 21 Jul 2025 07:17:57 +0000 (08:17 +0100)]
osd: Replace deprecated std::align_storage_t with alignas

C++23 has been enabled, causing deprecation warnings. Following the
"possible implementation" in the C++ docs, I have replaced the last
remaining aligned_storage_t.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
3 months agoMerge pull request #63407 from adk3798/cephadm-rbd-iscsi-ignore-mon-down
Ilya Dryomov [Mon, 21 Jul 2025 07:16:51 +0000 (09:16 +0200)]
Merge pull request #63407 from adk3798/cephadm-rbd-iscsi-ignore-mon-down

qa/rbd/iscsi: ignore MON_DOWN warning in logs

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
3 months agomgr/dashboard: add support for inline-tip notification
Naman Munet [Fri, 27 Jun 2025 08:30:42 +0000 (14:00 +0530)]
mgr/dashboard: add support for inline-tip notification

https://tracker.ceph.com/issues/71870

Signed-off-by: Naman Munet <naman.munet@ibm.com>
3 months agoMerge PR #64005 into main
Venky Shankar [Mon, 21 Jul 2025 05:28:42 +0000 (10:58 +0530)]
Merge PR #64005 into main

* refs/pull/64005/head:
qa: Run test_admin with the squid client

Reviewed-by: Dhairya Parmar <dparmar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 months agoMerge PR #63214 into main
Venky Shankar [Mon, 21 Jul 2025 05:26:30 +0000 (10:56 +0530)]
Merge PR #63214 into main

* refs/pull/63214/head:
release note: add a note that "subvolume info" cmd output can also...
doc/cephfs: update docs since "subvolume info" cmd output can also...
qa/cephfs: add test to check clone source info's present in...
mgr/vol: show clone source info in "subvolume info" cmd output
mgr/vol: keep clone source info even after cloning is finished

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
Reviewed-by: Neeraj Pratap Singh <neesingh@redhat.com>
3 months agoMerge PR #57953 into main
Venky Shankar [Mon, 21 Jul 2025 05:23:54 +0000 (10:53 +0530)]
Merge PR #57953 into main

* refs/pull/57953/head:
mds: Mark the scrub passed if dirfrag is dirty

Reviewed-by: Rishabh Dave <ridave@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 months agoMerge PR #64356 into main
Venky Shankar [Mon, 21 Jul 2025 05:22:50 +0000 (10:52 +0530)]
Merge PR #64356 into main

* refs/pull/64356/head:
client: prohibit unprivileged users from setting sgid/suid bits

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
3 months agoMerge PR #58564 into main
Venky Shankar [Mon, 21 Jul 2025 04:09:13 +0000 (09:39 +0530)]
Merge PR #58564 into main

* refs/pull/58564/head:
client: clamp sizes to INT_MAX in sync i/o code paths
client: restrict bufferlist to total write size
src/test: test sync/async i/o code paths with huge (4GiB) buffers

Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Kotresh Hiremath Ravishankar <khiremat@redhat.com>
Reviewed-by: Christopher Hoffman <choffman@redhat.com>
3 months agoMerge pull request #64459 from cbodley/wip-72083
Kefu Chai [Sun, 20 Jul 2025 10:52:19 +0000 (18:52 +0800)]
Merge pull request #64459 from cbodley/wip-72083

deb/cephadm: add explicit --home for cephadm user

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Kefu Chai <tchaikov@gmail.com>
3 months agoMerge pull request #63261 from NitzanMordhai/wip-nitzan-msg-shutdown-hang-drain
SrinivasaBharathKanta [Sun, 20 Jul 2025 10:37:41 +0000 (16:07 +0530)]
Merge pull request #63261 from NitzanMordhai/wip-nitzan-msg-shutdown-hang-drain

msg: drain stack before stopping processors to avoid shutdown hang

3 months agoMerge pull request #63239 from mohit84/upgrade_health_warning
SrinivasaBharathKanta [Sun, 20 Jul 2025 10:37:17 +0000 (16:07 +0530)]
Merge pull request #63239 from mohit84/upgrade_health_warning

qa: Add "osds down" in log-ignorelist to avoid the test case failure during upgrade

3 months ago mgr/dashboard: add rollup as optional deps
Afreen Misbah [Sat, 19 Jul 2025 15:35:31 +0000 (21:05 +0530)]
 mgr/dashboard: add rollup as optional deps

    - on arm64, hitting:
      (Use `node --trace-warnings ...` to show where the warning was created)
      NX   Cannot find module @rollup/rollup-linux-arm64-gnu. npm has a bug related to optional dependencies (https://github.com/npm/cli/issues/4828). Please try `npm i` again after removing both package-lock.json and node_modules directory.
      Pass --verbose to see the stacktrace.
    - due to this, make check on arm64 is failing
    - added the fix as per https://github.com/vitejs/vite/discussions/15532#discussioncomment-13369584
    - it then fails with:
      NX  Falling back to ts-node for local typescript execution. This may be a little slower.
      NX   Cannot find module '@rspack/binding-linux-arm64-gnu'
    - the above fix failed, asking for more deps from rollup, so added the whole rollup package

Signed-off-by: Afreen Misbah <afreen@ibm.com>
3 months agoMerge pull request #64567 from ronen-fr/wip-rf-72178auto
Ronen Friedman [Sat, 19 Jul 2025 14:37:18 +0000 (17:37 +0300)]
Merge pull request #64567 from ronen-fr/wip-rf-72178auto

osd/scrub: allow auto-repair on operator-initiated scrubs

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
3 months agoMerge pull request #63672 from ArbitCode/wip-raja-get-caller-identity
Raja [Sat, 19 Jul 2025 04:01:32 +0000 (09:31 +0530)]
Merge pull request #63672 from ArbitCode/wip-raja-get-caller-identity

rgw/sts: GetCallerIdentity API

3 months agoMerge pull request #64549 from joscollin/wip-B65770-imported-exported-counters-failed...
Jos Collin [Fri, 18 Jul 2025 13:01:02 +0000 (18:31 +0530)]
Merge pull request #64549 from joscollin/wip-B65770-imported-exported-counters-failed-to-set

qa: increase the randomness to trigger the directory import/export

Reviewed-by: Venky Shankar <vshankar@redhat.com>
3 months agoMerge pull request #63128 from kshtsk/wip-backport-create-issue-syntax
kyr [Fri, 18 Jul 2025 11:09:01 +0000 (13:09 +0200)]
Merge pull request #63128 from kshtsk/wip-backport-create-issue-syntax

script/backport-create-issue: fix the syntax warning

3 months agoMerge pull request #64363 from Hezko/nvmeof-cli-aviv-feedback
afreen23 [Fri, 18 Jul 2025 01:04:34 +0000 (06:34 +0530)]
Merge pull request #64363 from Hezko/nvmeof-cli-aviv-feedback

mgr/dashboard: nvmeof cli feedback fixes

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
3 months agoMerge pull request #64186 from tinchee/bug_71262
SrinivasaBharathKanta [Thu, 17 Jul 2025 23:53:18 +0000 (05:23 +0530)]
Merge pull request #64186 from tinchee/bug_71262

mon/MgrMonitor: add a space before "is already disabled"

3 months agoMerge pull request #64016 from bill-scales/issue70818
SrinivasaBharathKanta [Thu, 17 Jul 2025 23:52:38 +0000 (05:22 +0530)]
Merge pull request #64016 from bill-scales/issue70818

qa: get_rand_pg_acting_set needs to wait for pool to create PGs

3 months agoMerge pull request #64003 from NitzanMordhai/wip-nitzan-perfcount-latency-overflow
SrinivasaBharathKanta [Thu, 17 Jul 2025 23:51:58 +0000 (05:21 +0530)]
Merge pull request #64003 from NitzanMordhai/wip-nitzan-perfcount-latency-overflow

Paxos: use mono clock for latency calculate in latency perfcount

3 months agoMerge pull request #62951 from phlogistonjohn/jjm-pe-crypto-plus
Adam King [Thu, 17 Jul 2025 20:36:26 +0000 (16:36 -0400)]
Merge pull request #62951 from phlogistonjohn/jjm-pe-crypto-plus

mgr:python: avoid pyo3 errors by running certain cryptographic functions in a child process

Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Paulo E. Castro <pecastro@wormholenet.com>
3 months agoMerge pull request #63890 from shreya-subramanian/bug_fix
Samuel Just [Thu, 17 Jul 2025 20:05:26 +0000 (13:05 -0700)]
Merge pull request #63890 from shreya-subramanian/bug_fix

mon: paxos_service_min_trim bug fix, ceph tracker issue: 71610

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@ibm.com>
3 months agoMerge pull request #64560 from gbregman/main
Gil Bregman [Thu, 17 Jul 2025 18:51:53 +0000 (21:51 +0300)]
Merge pull request #64560 from gbregman/main

imgr/cephadm/nvmeof: Add "force TLS" flag to NVMeOF spec file.

3 months agoqa/rgw: remove hadoop-s3a subsuite
Casey Bodley [Thu, 17 Jul 2025 17:06:01 +0000 (13:06 -0400)]
qa/rgw: remove hadoop-s3a subsuite

this suite hasn't provided much benefit since it was added, and is
becoming more of a maintenance burden recently:
* https://tracker.ceph.com/issues/71584
* https://tracker.ceph.com/issues/72179

remove the subsuite and its s3a_hadoop.py task

Signed-off-by: Casey Bodley <cbodley@redhat.com>
3 months agoosd/scrub: allow auto-repair on operator-initiated scrubs
Ronen Friedman [Thu, 17 Jul 2025 16:59:00 +0000 (11:59 -0500)]
osd/scrub: allow auto-repair on operator-initiated scrubs

Previously, operator-initiated scrubs would not auto-repair, regardless
of the value of the 'osd_scrub_auto_repair' config option.  This was
less confusing to the operator than it could have been, as most
operator commands would in fact cause a regular periodic scrub
to be initiated. However, that quirk is now fixed: operator commands
now trigger 'op-initiated' scrubs. Thus the need for this patch.

The original bug was fixed in https://github.com/ceph/ceph/pull/54615,
but was unfortunately re-introduced later on.
Fixes: https://tracker.ceph.com/issues/72178
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
3 months agoMerge branch 'ceph:main' into main
Gil Bregman [Thu, 17 Jul 2025 15:39:41 +0000 (18:39 +0300)]
Merge branch 'ceph:main' into main

3 months agoMerge pull request #63731 from matt-akamai/lc_ordered_listing
Casey Bodley [Thu, 17 Jul 2025 15:12:01 +0000 (11:12 -0400)]
Merge pull request #63731 from matt-akamai/lc_ordered_listing

rgw: allow lc listing order to be configurable

Reviewed-by: Casey Bodley <cbodley@redhat.com>
3 months agoMerge pull request #64272 from linuxbox2/wip-matt-70853
Matt Benjamin [Thu, 17 Jul 2025 12:57:18 +0000 (08:57 -0400)]
Merge pull request #64272 from linuxbox2/wip-matt-70853

rgwlc: fix removal of delete markers (SAL)

3 months agoimprove error handling and add --resume flag to 'ceph rgw realm bootstrap' for partia...
Kushal Deb [Wed, 21 May 2025 10:01:06 +0000 (15:31 +0530)]
improve error handling and add --resume flag to 'ceph rgw realm bootstrap' for partial recovery

This patch enhances the `ceph rgw realm bootstrap` command by improving error
messaging and introducing a `--resume` flag to support recovery from partial
bootstrap failures.

Signed-off-by: Kushal Deb <Kushal.Deb@ibm.com>
3 months agolibrbd: Clean up usage of IoCtx
Miki Patel [Thu, 17 Jul 2025 09:44:53 +0000 (15:14 +0530)]
librbd: Clean up usage of IoCtx

Replace uses of librbd::IoCtx with librados::IoCtx in Group.cc

Signed-off-by: Miki Patel <miki.patel132@gmail.com>
3 months agoimgr/cephadm/nvmeof: Add "force TLS" flag to NVMeOF spec file.
Gil Bregman [Thu, 17 Jul 2025 11:29:34 +0000 (14:29 +0300)]
imgr/cephadm/nvmeof: Add "force TLS" flag to NVMeOF spec file.

Fixes: https://tracker.ceph.com/issues/72172
Signed-off-by: Gil Bregman <gbregman@il.ibm.com>
3 months agoMerge pull request #64516 from Hezko/nvmeof-cli-size-convert3
afreen23 [Thu, 17 Jul 2025 11:13:08 +0000 (16:43 +0530)]
Merge pull request #64516 from Hezko/nvmeof-cli-size-convert3

mgr/dashboard: support human-friendly size parameter; split commands into separate API and CLI functions

Reviewed-by: Afreen Misbah <afreen@ibm.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
3 months agoMerge pull request #64346 from nbalacha/wip-nbalacha-options-typos
nbalacha [Thu, 17 Jul 2025 10:38:24 +0000 (16:08 +0530)]
Merge pull request #64346 from nbalacha/wip-nbalacha-options-typos

options: fix typos

3 months agomgr/dashboard: nvmeof cli rename ns to namespace, fixes for text responses, subsys...
Tomer Haskalovitch [Sun, 6 Jul 2025 20:15:50 +0000 (23:15 +0300)]
mgr/dashboard: nvmeof cli rename ns to namespace, fixes for text responses, subsys add params

Signed-off-by: Tomer Haskalovitch <tomer.haska@ibm.com>
3 months agolibrbd: images aren't closed in group_snap_*_by_record() on error
Miki Patel [Tue, 15 Jul 2025 11:07:16 +0000 (16:37 +0530)]
librbd: images aren't closed in group_snap_*_by_record() on error

Fixes a memory leak and handles the resource-leak scenario in which at least
one IoCtx is not created successfully. This is done by returning an error
before opening any image. Changes are made in group_snap_remove_by_record
and group_snap_rollback_by_record.
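
The "fail before opening anything" ordering described above can be sketched
as follows. This is a minimal Python illustration of the pattern only, not
the librbd C++ code: `open_group_images`, `create_ioctx`, and `open_image`
are hypothetical names, and the use of ENOENT as the error is an assumption.

```python
import errno


def open_group_images(specs, create_ioctx, open_image):
    """Open every image in a group, or open none at all.

    create_ioctx and open_image are hypothetical callables standing in for
    the real I/O context and image-open steps.
    """
    # Phase 1: create every I/O context first. If any creation fails,
    # return an error while no image has been opened yet, so there is
    # nothing to close and nothing can leak.
    ioctxs = []
    for spec in specs:
        ioctx = create_ioctx(spec)
        if ioctx is None:
            return -errno.ENOENT, []
        ioctxs.append(ioctx)

    # Phase 2: only now open the images, one per validated context.
    images = [open_image(ioctx, spec) for ioctx, spec in zip(ioctxs, specs)]
    return 0, images
```

The design choice is simply to separate validation from acquisition, so the
error path never has to unwind partially opened resources.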

Fixes: https://tracker.ceph.com/issues/71961
Signed-off-by: Miki Patel <miki.patel132@gmail.com>
3 months agoMerge pull request #62985 from connorfawcett/wip-exerciser-consistency-2604
Connor Fawcett [Thu, 17 Jul 2025 09:03:56 +0000 (10:03 +0100)]
Merge pull request #62985 from connorfawcett/wip-exerciser-consistency-2604

common/io_exerciser: Add consistency checking functionality to IO exerciser

3 months agoMerge pull request #64448 from nbalacha/wip-nbalacha-typo-1
nbalacha [Thu, 17 Jul 2025 08:41:58 +0000 (14:11 +0530)]
Merge pull request #64448 from nbalacha/wip-nbalacha-typo-1

rgw: fix typos in log messages

3 months agorgw/test/notification: add more info when retry test fail
Yuval Lifshitz [Wed, 16 Jul 2025 14:51:22 +0000 (14:51 +0000)]
rgw/test/notification: add more info when retry test fail

Signed-off-by: Yuval Lifshitz <ylifshit@ibm.com>
3 months agoMerge pull request #63421 from rhcs-dashboard/storageClass-LocalGlacier
Nizamudeen A [Thu, 17 Jul 2025 06:22:35 +0000 (11:52 +0530)]
Merge pull request #63421 from rhcs-dashboard/storageClass-LocalGlacier

mgr/dashboard: Glacier Storage Class - create and list

3 months agoqa: increase the randomness to trigger the directory import/export
Jos Collin [Wed, 16 Jul 2025 10:02:26 +0000 (15:32 +0530)]
qa: increase the randomness to trigger the directory import/export

Fixes: https://tracker.ceph.com/issues/65770
Signed-off-by: Jos Collin <jcollin@redhat.com>
3 months agoMerge pull request #64537 from bluikko/doc-bucket-logging-formatting-radosgw
Anthony D'Atri [Thu, 17 Jul 2025 04:06:03 +0000 (00:06 -0400)]
Merge pull request #64537 from bluikko/doc-bucket-logging-formatting-radosgw

doc/radosgw: Improve formatting consistency and language in bucket_logging.rst

3 months agoMerge pull request #64532 from zdover23/wip-doc-2025-07-16-radosgw-config-ref
Zac Dover [Thu, 17 Jul 2025 03:32:05 +0000 (13:32 +1000)]
Merge pull request #64532 from zdover23/wip-doc-2025-07-16-radosgw-config-ref

doc/radosgw: edit "Lifecycle Settings"

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
3 months agoAdd a new flag to the IO exerciser which enables live consistency
Connor Fawcett [Wed, 23 Apr 2025 14:45:24 +0000 (15:45 +0100)]
Add a new flag to the IO exerciser which enables live consistency
checking during IO sequences (disabled by default).

Signed-off-by: Connor Fawcett <connorfa@uk.ibm.com>
3 months agoGet chunk size for consistency checking from EC profile instead of command line arg
Connor Fawcett [Thu, 17 Jul 2025 01:19:16 +0000 (02:19 +0100)]
Get chunk size for consistency checking from EC profile instead of command line arg

Signed-off-by: Connor Fawcett <connorfa@uk.ibm.com>
3 months agoMerge pull request #64524 from tchaikov/wip-function2-alignas
Kefu Chai [Wed, 16 Jul 2025 22:51:56 +0000 (06:51 +0800)]
Merge pull request #64524 from tchaikov/wip-function2-alignas

include/function2.hpp: avoid using std::aligned_storage_t

Reviewed-by: Adam Emerson <aemerson@redhat.com>
3 months agobuild-integration-branch: allow setting git trailer on final commit
John Mulligan [Wed, 16 Jul 2025 18:07:33 +0000 (14:07 -0400)]
build-integration-branch: allow setting git trailer on final commit

After the last commit is made, provide a simple mechanism for adding
git trailers to the commit message. Git trailers [1] are metadata
that tools may make use of. In particular, we add a few of the
trailers documented by ceph-build here [2], and also allow
arbitrary trailers for future changes (before this code can
be updated), advanced trailers, or other unrelated purposes.

[1] https://www.alchemists.io/articles/git_trailers
[2] https://github.com/ceph/ceph-build/tree/main/ceph-trigger-build
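
As a rough illustration of what adding trailers to a commit message involves,
the following Python sketch appends "Key: value" lines to the end of a
message. It is not the code from build-integration-branch: `add_trailers`
and the `Example-Key` trailer name are hypothetical, and the trailer
detection is deliberately simplified. Git itself provides
`git interpret-trailers --trailer` for the same job.

```python
import re

# Simplified assumption: a trailer line looks like "Token: value".
_TRAILER_RE = re.compile(r"^[A-Za-z0-9-]+: .+$")


def add_trailers(message, trailers):
    """Return message with (key, value) trailers appended at the end."""
    if not trailers:
        return message
    body = message.rstrip("\n")
    last_paragraph = body.split("\n\n")[-1]
    # If the message already ends with a block of trailer lines, extend
    # that block; otherwise start a new paragraph for the trailers.
    has_block = "\n\n" in body and all(
        _TRAILER_RE.match(line) for line in last_paragraph.splitlines()
    )
    separator = "\n" if has_block else "\n\n"
    new_lines = [f"{key}: {value}" for key, value in trailers]
    return body + separator + "\n".join(new_lines) + "\n"


msg = "subject\n\nbody text\n\nSigned-off-by: A <a@example.com>\n"
print(add_trailers(msg, [("Example-Key", "value")]))
```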

Signed-off-by: John Mulligan <jmulligan@redhat.com>
3 months agobuild-integration-branch: convert to argparse
John Mulligan [Wed, 16 Jul 2025 17:39:27 +0000 (13:39 -0400)]
build-integration-branch: convert to argparse

Convert build-integration-branch to use the stdlib argparse module.

Argparse is:
* Part of the Python standard library and available since 3.2
* Well documented as a stdlib component
* Widely used
* Fairly simple and direct

docopt is:
* Clever
* Not documented as a dependency of this script (so I bet most users
  are relying on the fallback behavior)
* Of questionable maintenance status with:
  - No releases since 2014
  - Only four PRs merged since 2019
  - Last merged PR was merged, recommending an alternate repo, and then
    disappeared from the commit history of the master branch, indicating
    a possible maintainership/status discrepancy
  - Only a couple of commits merged since 2018 (visible on github)
* In my opinion: not particularly ergonomic esp. wrt dictionary based
  key access

I feel pretty comfortable making this conversion as I think it will
make the script easier to maintain and extend.
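
To make the ergonomic point concrete, here is a minimal argparse sketch. The
argument names (`label`, `--no-date`, `--trailer`) are hypothetical and are
not claimed to match the script's real interface; the point is only that
parsed values are read as attributes rather than through dictionary keys
such as `args["--no-date"]`.

```python
import argparse


def build_parser():
    """Build a parser with hypothetical arguments, for illustration only."""
    parser = argparse.ArgumentParser(
        prog="build-integration-branch",
        description="Illustrative sketch, not the real command line.",
    )
    parser.add_argument("label", help="hypothetical positional argument")
    parser.add_argument(
        "--no-date",
        action="store_true",
        help="hypothetical boolean flag",
    )
    parser.add_argument(
        "--trailer",
        action="append",
        default=[],
        help="hypothetical repeatable option; each use appends one value",
    )
    return parser


args = build_parser().parse_args(
    ["wip-example", "--no-date", "--trailer", "Example-Key: value"]
)
# Values are plain attributes, with dashes mapped to underscores.
print(args.label, args.no_date, args.trailer)
```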

Signed-off-by: John Mulligan <jmulligan@redhat.com>
3 months agoMerge pull request #64157 from concubidated/wip-doc-balancer
Josh Durgin [Wed, 16 Jul 2025 18:33:11 +0000 (11:33 -0700)]
Merge pull request #64157 from concubidated/wip-doc-balancer

doc: Fixes a typo in balancer operations

Reviewed-by: Josh Durgin <jdurgin@redhat.com>
3 months agomgr/dashboard: split ns add to separate api and cli functions
Tomer Haskalovitch [Mon, 14 Jul 2025 18:53:30 +0000 (21:53 +0300)]
mgr/dashboard: split ns add to separate api and cli functions

Signed-off-by: Tomer Haskalovitch <tomer.haska@ibm.com>