git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
2 weeks ago osd: Fix issue where not all shards are receiving setattr when it's sent to an object...
Jon Bailey [Thu, 3 Jul 2025 13:24:41 +0000 (14:24 +0100)]
osd: Fix issue where not all shards are receiving setattr when it's sent to an object with the whiteout flag set.

Signed-off-by: Jon Bailey <jonathan.bailey1@ibm.com>
(cherry picked from commit 89fef784aa46e74dd05ef8f1bff16f357016dfc3)

2 weeks ago osd: Relax assertion that all recoveries require a read.
Alex Ainscow [Tue, 1 Jul 2025 14:51:58 +0000 (15:51 +0100)]
osd: Relax assertion that all recoveries require a read.

If multiple objects are being read as part of the same recovery (this happens
when recovering some snapshots) and a read fails, then some reads from other
shards will be necessary.  However, some objects may not need a read. In this
case it is only important that at least one read message is sent, rather than
one read message per object.
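
The relaxed condition can be sketched as follows. This is a minimal Python illustration, not the Ceph code: the function name and the per-object map are hypothetical.

```python
def check_recovery_reads(reads_per_object):
    """reads_per_object maps an object name to the number of shard reads
    planned for it (hypothetical structure, for illustration only)."""
    counts = list(reads_per_object.values())
    # Old, overly strict condition: every object must issue a read.
    strict_ok = all(n > 0 for n in counts)
    # Relaxed condition: at least one read message is sent overall.
    relaxed_ok = any(n > 0 for n in counts)
    return strict_ok, relaxed_ok
```

With one snapshot that needs no read, for example `{"head": 2, "snap1": 0}`, the strict condition fails while the relaxed one holds.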

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 9f9ea6ddd38ebf6ae7159855267e61858bb2b7fc)

2 weeks ago osd: Recovery of zero length reads when we add a new OSD without an interval.
Alex Ainscow [Tue, 1 Jul 2025 14:49:20 +0000 (15:49 +0100)]
osd: Recovery of zero length reads when we add a new OSD without an interval.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 3493d13d733454bb75616c628e25b2fa94dcb400)

2 weeks ago osd: Relax PGLog assert when ec optimisations are enabled on a pool.
Alex Ainscow [Mon, 30 Jun 2025 13:31:21 +0000 (14:31 +0100)]
osd: Relax PGLog assert when ec optimisations are enabled on a pool.

The versions on partial shards are permitted to be behind, so we need
to relax several asserts; this is another example.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 0c89e7ef2ab48199ee3f7296cf1cb44c9aeec667)

2 weeks ago osd: Truncate coding shards to minimal size
Alex Ainscow [Sun, 29 Jun 2025 21:54:51 +0000 (22:54 +0100)]
osd: Truncate coding shards to minimal size

Scrub detected a bug where if an object was truncated to a size where the first
shard is smaller than the chunk size (only possible for >4k chunk sizes), then
the coding shards were being aligned to the chunk size, rather than to 4k.

This fix changes how the write plan is calculated so that the correct size is written.
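
As a rough illustration of the alignment difference, here is a Python sketch. It assumes the object fits inside the first chunk, so that shard 0 holds all of the data; the function names are hypothetical.

```python
PAGE = 4096  # the 4k alignment named in the commit message


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return -(-value // alignment) * alignment


def coding_shard_size(object_size, chunk_size, buggy=False):
    # Only models the case described above: the truncated object is
    # no larger than one chunk.
    assert object_size <= chunk_size
    # The bug aligned coding shards to the chunk size; the fix aligns to 4k.
    alignment = chunk_size if buggy else PAGE
    return align_up(object_size, alignment)
```

For a 16k chunk size and an object truncated to 5000 bytes, the buggy alignment gives a 16384-byte coding shard, whereas 4k alignment gives 8192 bytes.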

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit a39a309631482b0caa071d586f192cd19a7ae470)

2 weeks ago osd: EC Optimizations fix peering bug causing unfound objects
Bill Scales [Fri, 27 Jun 2025 12:35:58 +0000 (13:35 +0100)]
osd: EC Optimizations fix peering bug causing unfound objects

Fix some unusual scenarios where peering was incorrectly
declaring that objects were missing on stray shards. When
proc_master_log rolls forward partial writes it needs to
update pwlc exactly the same way as if the write had been
completed. This ensures that stray shards that were not
updated because of partial writes do not cause objects
to be incorrectly marked as missing.

The fix also means some code in GetMissing which was trying
to do a similar thing for shards that were acting,
recovering or backfilling (but not stray) can be deleted.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 83a9c0a9f8e9ed4f514adb32f1ae2df1602c3f88)

2 weeks ago osdc: Optimized EC pools routing bug
Bill Scales [Wed, 25 Jun 2025 09:26:17 +0000 (10:26 +0100)]
osdc: Optimized EC pools routing bug

Fix bug with routing to an acting set like [None,Y,X,X]p(X)
for a 3+1 optimized pool where osd X is representing more
than one shard. For an optimized EC pool we want it to
choose shard 3 because shard 2 is a non-primary. If we
just search the acting set for the first OSD that matches
X this will pick shard 2, so we have to convert the order
to primaries first, then find the matching OSD and then
convert this back to the normal ordering to get shard 3.
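
The search order can be sketched in Python as below. The helper names are hypothetical, and the set of non-primary shards is passed in as an assumption rather than derived from the pool, since the message only states that shard 2 is a non-primary.

```python
def primaries_first(num_shards, nonprimary_shards):
    """Return shard ids with primary-capable shards ahead of non-primaries."""
    primary = [s for s in range(num_shards) if s not in nonprimary_shards]
    nonprimary = [s for s in range(num_shards) if s in nonprimary_shards]
    return primary + nonprimary


def pick_shard(acting, osd, nonprimary_shards):
    """acting[i] is the OSD holding shard i, or None if the shard is absent."""
    for shard in primaries_first(len(acting), nonprimary_shards):
        if acting[shard] == osd:
            # The id is already in the normal shard numbering.
            return shard
    return None
```

With `acting = [None, "Y", "X", "X"]` and shards 1 and 2 assumed to be non-primary, a plain `acting.index("X")` returns 2, while `pick_shard(acting, "X", {1, 2})` returns 3.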

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 3310f97859109090706b84824cac2f8a6cfe6928)

2 weeks ago mon: Optimized EC clean_temps needs to permit primary change
Bill Scales [Mon, 23 Jun 2025 10:36:37 +0000 (11:36 +0100)]
mon: Optimized EC clean_temps needs to permit primary change

Optimized EC pools were blocking clean_temps from clearing pg_temp
when up == acting but up_primary != acting_primary because optimized
pools sometimes use pg_temp to force a change of primary shard.

However this can block merges which require the two PGs being
merged to have the same primary. Relax clean_temps to permit
pg_temp to be cleared so long as the new primary is not a
non-primary shard.
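
A simplified Python sketch of the relaxed decision for an optimized EC pool follows. The function and parameter names are hypothetical, and real clean_temps logic considers more than is shown here.

```python
def may_clear_pg_temp(up, acting, up_primary, acting_primary,
                      new_primary_is_nonprimary_shard):
    """Decide whether pg_temp may be cleared for an optimized EC pool.

    Hypothetical helper; only the conditions named in the commit
    message are modelled.
    """
    if up != acting:
        return False
    if up_primary == acting_primary:
        return True
    # Before the fix this case always blocked the clean-up, which could
    # in turn block PG merges. Now it is allowed unless the new primary
    # would be a non-primary shard.
    return not new_primary_is_nonprimary_shard
```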

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit ce53276191e60375486f75d93508690f780bee21)

2 weeks ago osd: Optimized EC pools - fix overaggressive assert in read_log_and_missing
Bill Scales [Mon, 23 Jun 2025 09:24:17 +0000 (10:24 +0100)]
osd: Optimized EC pools - fix overaggressive assert in read_log_and_missing

Non-primary shards may not be updated because of partial writes. This means
that the OI version for an object on these shards may be stale. An assert
in read_log_and_missing was checking that the OI version matched the have
version in a missing entry. The missing entry calculates the have version
using the prior_version from a log entry; this does not take into account
partial writes, so it can be ahead of the stale OI version.

Relax the assert for optimized pools to require have >= oi.version
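
The two forms of the check can be sketched in Python, modelling versions as (epoch, version) tuples. The function name and the tuple representation are assumptions made for this illustration.

```python
def oi_version_consistent(have, oi_version, optimized_pool):
    """Return True if the missing entry's have version is acceptable.

    Versions are (epoch, version) tuples, compared lexicographically.
    """
    if optimized_pool:
        # A stale OI on a non-primary shard may lag behind have.
        return have >= oi_version
    # Other pools keep the strict equality check.
    return have == oi_version
```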

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 74e138a7c1f8b7e375568c6811a60f6bdad181b3)

2 weeks ago osd: rewind_divergent_log needs to dirty log if crt changes or ...
Bill Scales [Mon, 23 Jun 2025 09:12:10 +0000 (10:12 +0100)]
osd: rewind_divergent_log needs to dirty log if crt changes or ...
rollback_info_trimmed_to changes

PGLog::rewind_divergent_log was only causing the log to be marked
dirty and checkpointed if there were divergent entries. However
after a PG split it is possible that the log can be rewound
modifying crt and/or rollback_info_trimmed_to without creating
divergent entries because the entries being rolled back were
all split into the other PG.

Failing to checkpoint the log generates a window where if the OSD
is reset you can end up with crt (and rollback_info_trimmed_to) > head.
One consequence of this is asserts like
ceph_assert(rollback_info_trimmed_to == head); firing.
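
The widened dirty condition can be sketched in Python as below. The function is a hypothetical stand-in for the decision inside PGLog::rewind_divergent_log, with versions modelled as plain integers.

```python
def rewind_needs_checkpoint(divergent_entries,
                            crt_before, crt_after,
                            ritt_before, ritt_after):
    """Return True if the log must be marked dirty and checkpointed.

    crt_* and ritt_* are the can-rollback-to and
    rollback_info_trimmed_to values before and after the rewind.
    """
    # Old behaviour: only divergent entries dirtied the log.
    if divergent_entries:
        return True
    # New behaviour: a change to either marker also dirties the log,
    # which covers the post-split case with no divergent entries.
    return crt_before != crt_after or ritt_before != ritt_after
```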

Fixes: https://tracker.ceph.com/issues/55141
Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit d8f78adf85f8cb11deeae3683a28db92046779b5)

2 weeks ago osd: Correct truncate logic for new EC
Alex Ainscow [Fri, 20 Jun 2025 20:47:32 +0000 (21:47 +0100)]
osd: Correct truncate logic for new EC

The clone logic in the truncate was only cloning from the truncate
to the end of the pre-truncate object. If the next shard was being
truncated to a shorter length (which is common), then this shard
has a larger clone.

The rollback, however, can only be given a single range, so it was
given a range which covers all clones.  The problem is that if shard
0 is rolled back, then some empty space from the clone was copied
to shard 0.

The fix is simple: calculate the full clone length and apply it to all shards,
so that it matches the rollback.
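
One way to picture the mismatch is the Python sketch below. It is an interpretation of the description above, with hypothetical names: each shard has a post-truncate size and a pre-truncate size, and ranges are (start, end) byte offsets within a shard.

```python
def clone_ranges(new_sizes, old_sizes, fixed=True):
    """Return (per-shard clone ranges, single rollback range)."""
    # Before the fix each shard cloned only its own truncated tail.
    per_shard = list(zip(new_sizes, old_sizes))
    # The rollback takes one range, so it has to cover every clone.
    full = (min(new_sizes), max(old_sizes))
    clones = [full] * len(per_shard) if fixed else per_shard
    return clones, full
```

With `fixed=False`, a shard whose own clone is smaller than the rollback range gets rolled back over bytes that were never cloned. With `fixed=True` every clone equals the rollback range.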

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 5d7588c051b31098c9970877ab6a784967ff94c8)

2 weeks ago osd: Fix incorrect invalidate_crc during slice iterate
Alex Ainscow [Fri, 20 Jun 2025 10:48:59 +0000 (11:48 +0100)]
osd: Fix incorrect invalidate_crc during slice iterate

The CRCs were being invalidated at the wrong point, so the last CRC was
not being invalidated.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 564b53c446201ed33b5345c936c3c4b5d32bdaab)

2 weeks ago osd: Do not apply log entry if shard not written
Bill Scales [Thu, 19 Jun 2025 13:26:04 +0000 (14:26 +0100)]
osd: Do not apply log entry if shard not written

This was a failed test, where the primary concluded that all objects were present
despite one missing object on the non-primary shard.

The problem was caused because the log entries are sent to the unwritten shards if that
shard is missing in order to update the version number in the missing object. However,
the log entry should not actually be added to the log.

Further testing showed there are other scenarios where log entries are sent to
unwritten shards (for example a clone + partial_write in the same transaction);
these scenarios do not want to add the log entry either.
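
The intended behaviour can be sketched in Python as follows. All names are hypothetical; the log is a list and the missing set is a dict from object name to needed version.

```python
def apply_sub_write(shard, written_shards, log, missing, entry):
    """Process a log entry received by a shard.

    entry is an (object_name, version) pair.
    """
    obj, version = entry
    if obj in missing:
        # The entry still refreshes the version recorded for a missing object.
        missing[obj] = version
    if shard in written_shards:
        # Only shards that were actually written append the entry to their log.
        log.append(entry)
    return log, missing
```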

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 24cd772f2099aa5f7dfeb7609522f770d0ae1115)

2 weeks ago osd: EC Optimizations proc_replica_log needs to apply pwlc
Bill Scales [Thu, 19 Jun 2025 12:41:17 +0000 (13:41 +0100)]
osd: EC Optimizations proc_replica_log needs to apply pwlc

PeeringState::proc_replica_log needs to apply pwlc before
calling PGLog so that any partial writes that have occurred
are taken into account when working out where a replica/stray
has diverged from the primary.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit 6c3c0a88b68e2548df670dbe9797d54f89259398)

2 weeks ago osd: Multiple Decode fixes.
Alex Ainscow [Wed, 18 Jun 2025 19:46:49 +0000 (20:46 +0100)]
osd: Multiple Decode fixes.

These are multiple fixes that affected the same code. To simplify review
and understanding of the changes, they have been merged into a single
commit.

Fix 1:

What happened in the defect is (k,m = 6,4):

1. State is: fast_reads = true, shards 0,4,5,6,7,8 are available. Shard 1 is missing this object.
2. Shard 5 only needs zeros, so its read is dropped. The other sub-read messages are sent.
3. Object on shard 1 completes recovery (so becomes not-missing).
4. Read completes; the completion code notices that it only has 5 reads, so calculates what it needs to re-read.
5. Calculates it needs 0,1,4,5,6,7 - and so wants to read shard 1.
6. Code assumes that enough reads should have been performed, so refuses to do any more reads and instead generates an EIO.

The problem here is some "lazy" code in step (4).  What it should be doing is working out that it
can use the zero buffers and not calling get_remaining_reads().  Instead, what it attempts to do is
call get_remaining_reads() and if there is no work to do, then it assumes it has everything
already and completes the read with success.  This assumption mostly works - but this
combination of fast_reads picking fewer than k shards to read from AND an object completing
recovery in parallel causes an issue.

The solution is to wait for all reads to complete and then assume that any remaining zero buffers
count as completed reads.  This should then cause the plugin to declare "success".
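
The counting rule in the solution can be sketched in Python. The function name and the use of sets of shard ids are assumptions for illustration.

```python
def can_decode(k, completed_reads, zero_shards):
    """Return True if a decode is possible once all reads have completed.

    Shards that only need zeros were never read, but their zero buffers
    count as completed reads.
    """
    available = set(completed_reads) | set(zero_shards)
    return len(available) >= k
```

In the scenario above (k = 6), reads from shards 0,4,6,7,8 plus the zero buffer for shard 5 give six usable shards, so no further read of shard 1 is needed.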

Fix 2:

There are decodes in new EC which can occur when fewer than k
shards have been read.  These are reads in the last stripe, where
for decoding purposes, the data past the end of the shard can
be considered zeros. EC does not read these, but instead relies
on the decode function inventing the zero buffers.

This was working correctly when fast reads were turned off, but
if such an IO was encountered with fast reads turned on the
logic was disabled and the IO returned an EIO.

This commit fixes that logic, so that if all reads have completed
and send_all_remaining_reads conveys that no new reads were
requested, then decode will still be possible.

Fix 3:

If reading the end of an object with unequally sized objects,
we pad out the end of the decode with zeros, to provide
the correct data to the plugin.

Previously, the code decided not to add the zeros to "needed"
shards.  This caused a problem where for some parity-only
decodes, an incomplete set of zeros was generated, fooling the
_decode function into thinking that the entire shard was zeros.

In the fix, we need to cope with the case where the only data
needed from the shard is the padding itself.

The comments around the new code describe the logic behind
the change.

This makes the encode-specific use case of padding out the
to-be-decoded shards unnecessary, as this is performed by the
pad_on_shards function below.

Also fix some logic in calculating the need_set passed to the
decode function, which did not contain the extra shards needed
for the decode. This need_set is actually ignored by all the
plugins as far as I know, but being wrong does not seem
helpful if it's needed in the future.

Fix 4: Extend reads when recovering parity

Here is an example use case which was going wrong:
1. Start with 3+2 EC, shards 0,3,4 are 8k, shards 1,2 are 4k.
2. Perform a recovery, where we recover 2 and 4.  2 is missing, 4 can be copied from another OSD.
3. Recovery works out that it can do the whole recovery with shards 0,1,3. (note not 4)
4. So the "need" set is 0,1,3, the "want" set is 2,4 and the "have" set is 0,1,3,4,5
5. The logic in get_all_avail_shards then tries to work out the extents it needs - it only looks at 2, because we "have" 4
6. Result is that we end up reading 4k on 0,1,3, then attempt to recover 8k on shard 4 from this... which clearly does not work.
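
The extent calculation in this example can be sketched in Python. This is an interpretation of the steps above with hypothetical names; it only models the read length, not full extent sets.

```python
def read_length(shard_sizes, want, have, fixed=True):
    """Return how many bytes to read from each shard in the need set."""
    if fixed:
        # Size the read for every wanted shard, including ones we "have".
        considered = list(want)
    else:
        # The bug skipped wanted shards that were also in the have set.
        considered = [s for s in want if s not in have]
    return max(shard_sizes[s] for s in considered)
```

Using the sizes from step 1, the buggy path sizes the read for shard 2 only (4k), while extending the read to cover shard 4 gives 8k.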

Fix 5: Round up padding to 4k alignment in EC

The pad_on_shards was not aligning to 4k.  However, the decode/encode functions were. This meant that
we assigned a new buffer, then added another after - this should be faster.

Fix 6: Do not invent encode buffers before doing decode.

In this bug, during recovery, we could potentially be creating
unwanted encode buffers and using them to decode data buffers.

This fix simply removes the bad code, as there is new code above
which is already doing the correct action.

Fix 7: Fix miscompare with missing decodes.

In this case, two OSDs failed at once. One was replaced and the other was not.

This caused us to attempt to encode a missing shard while another shard was missing, which
caused a miscompare because the recovery failed to do the decode properly before doing an encode.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit c116b8615d68a3926dc78a4965cc0a28ff85d4f2)

2 weeks ago osd: Optimized EC pools bug fix when repeating GetLog
Bill Scales [Wed, 18 Jun 2025 11:11:51 +0000 (12:11 +0100)]
osd: Optimized EC pools bug fix when repeating GetLog

When the primary shard of an optimized EC pool does not have
a copy of the log it may need to repeat the GetLog peering
step twice, the first time to get a full copy of a log from
a shard that sees all log entries and then a second time
to get the "best" log from a nonprimary shard which may
have a partial log due to partial writes.

A side effect of repeating GetLog is that the missing
list is collected for both the "best" shard and the
shard that provides a full copy of the log. This later
missing list confuses later steps in the peering
process and may cause this shard to complete writes
and end up diverging from the primary. Discarding
this missing list causes Peering to behave the same as if
the GetLog step did not need to be repeated.

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit d2ba0f932c61746b51d0d427056a53e24db6ea0f)

2 weeks ago osd: Fix attribute recover in rare recovery scenario
Alex Ainscow [Wed, 11 Jun 2025 15:30:40 +0000 (16:30 +0100)]
osd: Fix attribute recover in rare recovery scenario

When recovering attributes, we read them from the first potential primary, then
if that read fails, attempt to read from another potential primary.

The problem is that the code which calculates which shards to read for a recovery
only takes into account *data* and not where the attributes are.  As such, if the
second read only required a non-primary, then the attribute read fails and the
OSD panics.

The fix is to detect this scenario and perform an empty read to that shard, which
the attribute-read code can use for attribute reads.
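
The empty read can be sketched in Python as below. The names are hypothetical; the read plan is a dict from shard id to a list of (offset, length) extents.

```python
def add_attr_read(shard_reads, attr_shard):
    """Return a read plan that always includes the attribute shard."""
    plan = {shard: list(extents) for shard, extents in shard_reads.items()}
    # If no data is needed from the shard holding the attributes, add an
    # empty read so the attribute-read code still has a message to use.
    plan.setdefault(attr_shard, [])
    return plan
```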

Code was incorrectly interpreting a failed attribute read on recovery as
meaning a "fast_read". Also, no attribute recovery would occur in this case.

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 98eae78f7629295800cb7dbb252cac7d0feff680)

2 weeks ago osd: code clean up and debug in optimised EC
Alex Ainscow [Wed, 11 Jun 2025 15:23:08 +0000 (16:23 +0100)]
osd: code clean up and debug in optimised EC

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 1352868cec8d644e5fff68df7050c52bb4ed7e65)

2 weeks ago osd: EC Optimizations fix bugs in applying pwlc to update info and log
Bill Scales [Wed, 11 Jun 2025 14:53:48 +0000 (15:53 +0100)]
osd: EC Optimizations fix bugs in applying pwlc to update info and log

1. Refactor the code that applies pwlc to update info and log so that there
is one function rather than multiple copies of the code.

2. pwlc information is used to track shards that have not been updated by
partial writes. It is used to advance last_complete (and last_update and
the log head) to account for log entries that the shard missed. It was
only being applied if last_complete matched the range of partial writes
recorded in pwlc. When a shard has missing objects last_complete is
deliberately held before the oldest need; this stops pwlc being applied.
This is wrong - pwlc can still try and update last_update and the log
head even if it cannot update last_complete.

3. When the primary receives info (and pwlc) information from OSD x(y)
it uses the pwlc information to update x(y)'s info. During backfill
there may be other shards z(y) which should also be updated using the
pwlc information.
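
Point 2 can be sketched in Python as follows. This is a simplified model, not the Ceph implementation: versions are plain integers, the range bounds are treated as inclusive, and the dict keys are hypothetical.

```python
def apply_pwlc(info, pwlc_range):
    """Advance a shard's info past partial writes it did not take part in.

    info has integer 'last_update', 'last_complete' and 'log_head';
    pwlc_range is the (first, last) version range recorded in pwlc.
    """
    first, last = pwlc_range
    updated = dict(info)
    if first <= info["last_update"] <= last:
        # last_update and the log head advance regardless of last_complete.
        updated["last_update"] = last
        updated["log_head"] = last
    if first <= info["last_complete"] <= last:
        # last_complete only advances when it is itself inside the range.
        updated["last_complete"] = last
    return updated
```

A shard with missing objects (last_complete held at 7) and last_update at 10 still moves last_update and the log head to 15 for a range of (10, 15), while last_complete stays at 7.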

Signed-off-by: Bill Scales <bill_scales@uk.ibm.com>
(cherry picked from commit c6d2bb9f6479767927ef01f6e2871dc1bb6d0f54)

2 weeks ago Merge pull request #64892 from aainscow/wip-72369-tentacle
Alex Ainscow [Sun, 7 Sep 2025 22:57:39 +0000 (23:57 +0100)]
Merge pull request #64892 from aainscow/wip-72369-tentacle

tentacle: OSD: EC recovery zero detect

Reviewed-by: Laura Flores <lajefl@gmail.com>
2 weeks ago Merge pull request #65416 from ceph/fix-api-tests-tentacle
David Galloway [Fri, 5 Sep 2025 22:11:41 +0000 (18:11 -0400)]
Merge pull request #65416 from ceph/fix-api-tests-tentacle

tentacle: Use teuthology's actual requirements

2 weeks ago pybind/mgr/dashboard: Use teuthology's actual requirements 65416/head
David Galloway [Fri, 5 Sep 2025 17:58:43 +0000 (13:58 -0400)]
pybind/mgr/dashboard: Use teuthology's actual requirements

Signed-off-by: David Galloway <david.galloway@ibm.com>
(cherry picked from commit 22a87d959bca74478de1e2d9f86859676385491d)

3 weeks ago Merge pull request #65397 from ceph/wip-uadk-tentacle-fix
David Galloway [Thu, 4 Sep 2025 21:13:04 +0000 (17:13 -0400)]
Merge pull request #65397 from ceph/wip-uadk-tentacle-fix

tentacle: uadk: Build with ceph fork (for FORTIFY_SOURCE fix)

3 weeks ago osd: Deduplicate zeros in EC slice iterator 64892/head
Alex Ainscow [Fri, 27 Jun 2025 15:00:56 +0000 (16:00 +0100)]
osd: Deduplicate zeros in EC slice iterator

Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit 06658fdac16dde95d20a8907511afb7fde7313da)
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
3 weeks ago buffer: raw_zeros mprotects the zeros
Radoslaw Zarzynski [Wed, 4 Sep 2024 09:04:47 +0000 (09:04 +0000)]
buffer: raw_zeros mprotects the zeros

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
(cherry picked from commit e928779faa4157cb5f93d87f38c0c188c5ba4257)

3 weeks ago buffer, test: bl::append_zero2() deduplicates zeros, introduce bptr::is_zero_fast()
Radoslaw Zarzynski [Tue, 3 Sep 2024 14:38:54 +0000 (14:38 +0000)]
buffer, test: bl::append_zero2() deduplicates zeros, introduce bptr::is_zero_fast()

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Signed-off-by: Alex Ainscow <aainscow@uk.ibm.com>
(cherry picked from commit efcac634185dd82b31755a06c7fcc26d9baa1e2c)

3 weeks ago Merge pull request #64814 from ifed01/wip-ifed-compress-use-isal-ten
Yuri Weinstein [Thu, 4 Sep 2025 19:42:02 +0000 (12:42 -0700)]
Merge pull request #64814 from ifed01/wip-ifed-compress-use-isal-ten

tentacle: compressor: compressor_zlib_isal did not take effect in compression

Reviewed-by: Md Mahamudur Rahaman Sajib <mahamudur.sajib@croit.io>
3 weeks ago uadk: Build with ceph fork (for FORTIFY_SOURCE fix) 65397/head
David Galloway [Thu, 4 Sep 2025 15:17:55 +0000 (11:17 -0400)]
uadk: Build with ceph fork (for FORTIFY_SOURCE fix)

See https://github.com/ceph/ceph/pull/65371

Signed-off-by: David Galloway <david.galloway@ibm.com>
(cherry picked from commit c52e361fdd311b54b34a83c4ca0e826e2d360087)

3 weeks ago Merge pull request #65342 from dmick/wip-72821-tentacle
Yuri Weinstein [Thu, 4 Sep 2025 15:24:33 +0000 (08:24 -0700)]
Merge pull request #65342 from dmick/wip-72821-tentacle

tentacle: Fix uadk build (arm64 only) on debian (conflict with DESTDIR)

Reviewed-by: David Galloway <dgallowa@redhat.com>
Reviewed-by: Yuri Weinstein <yweinste@redhat.com>
3 weeks ago Merge pull request #65378 from zdover23/wip-doc-2025-09-04-backport-65325-to-tentacle
Zac Dover [Thu, 4 Sep 2025 03:51:58 +0000 (13:51 +1000)]
Merge pull request #65378 from zdover23/wip-doc-2025-09-04-backport-65325-to-tentacle

tentacle: doc/cephfs: edit troubleshooting.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
3 weeks ago cmake/modules/Builduadk.cmake: fix build 65342/head
Dan Mick [Sat, 30 Aug 2025 07:40:24 +0000 (00:40 -0700)]
cmake/modules/Builduadk.cmake: fix build

DESTDIR should not be set when building.

Fixes: https://tracker.ceph.com/issues/72722
Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 1c14d3cd5319f219d72c05a5e6ff745a564b7235)

3 weeks ago cmake/modules/Builduadk.cmake: fix tabs/spaces
Dan Mick [Sat, 30 Aug 2025 07:34:55 +0000 (00:34 -0700)]
cmake/modules/Builduadk.cmake: fix tabs/spaces

Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 1cd5aa4e03088576f69410cc77f1772990fdfbb1)

3 weeks ago doc/cephfs: edit troubleshooting.rst 65378/head
Zac Dover [Tue, 2 Sep 2025 00:31:41 +0000 (10:31 +1000)]
doc/cephfs: edit troubleshooting.rst

Update the "Disconnected+Remounted FS" section in
doc/cephfs/troubleshooting.rst, as suggested by Venky Shankar in https://github.com/ceph/ceph/pull/65129/files#r2312903062

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit f4b40422fefaa993441396a5c31fbfd3d8714595)

3 weeks ago Merge pull request #65377 from ceph/wip-tentacle-65371
David Galloway [Thu, 4 Sep 2025 02:12:34 +0000 (22:12 -0400)]
Merge pull request #65377 from ceph/wip-tentacle-65371

tentacle: cmake: remove _FORTIFY_SOURCE define

3 weeks ago cmake: remove _FORTIFY_SOURCE define 65377/head
Casey Bodley [Wed, 3 Sep 2025 17:22:30 +0000 (13:22 -0400)]
cmake: remove _FORTIFY_SOURCE define

according to `dpkg-buildflags`, ubuntu 24 raised this value to
`-D_FORTIFY_SOURCE=3` which causes `error: "_FORTIFY_SOURCE" redefined`
compilation failures because Ceph itself adds `-D_FORTIFY_SOURCE=2`

`_FORTIFY_SOURCE` is a hardening option. both our rpm and debian builds
already specify that via environment variables, so Ceph's cmake should
leave it alone

Fixes: https://tracker.ceph.com/issues/72361
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 66bec97b0dc90b91f5be586351f52082beb6374a)

3 weeks ago Merge pull request #65248 from ceph/tentacle-pipeline-backports
David Galloway [Wed, 3 Sep 2025 13:14:39 +0000 (09:14 -0400)]
Merge pull request #65248 from ceph/tentacle-pipeline-backports

tentacle: Recent pipeline backports

3 weeks ago Merge pull request #64925 from NitzanMordhai/wip-72213-tentacle
SrinivasaBharathKanta [Mon, 1 Sep 2025 13:10:56 +0000 (18:40 +0530)]
Merge pull request #64925 from NitzanMordhai/wip-72213-tentacle

tentacle: msg: drain stack before stopping processors to avoid shutdown hang

3 weeks ago Merge pull request #65220 from Hezko/wip-72719-tentacle
afreen23 [Mon, 1 Sep 2025 07:52:49 +0000 (13:22 +0530)]
Merge pull request #65220 from Hezko/wip-72719-tentacle

tentacle: mgr/dashboard: catch broader exception to show relevant cli output

Reviewed-by: Afreen Misbah <afreen@ibm.com>
3 weeks ago Merge pull request #65221 from Hezko/wip-72720-tentacle
afreen23 [Mon, 1 Sep 2025 07:52:27 +0000 (13:22 +0530)]
Merge pull request #65221 from Hezko/wip-72720-tentacle

tentacle: mgr/dashboard: fix missing gw group error

Reviewed-by: Afreen Misbah <afreen@ibm.com>
3 weeks ago Merge pull request #65205 from zdover23/wip-doc-2025-08-26-backport-64074-to-tentacle
Zac Dover [Mon, 1 Sep 2025 04:29:00 +0000 (14:29 +1000)]
Merge pull request #65205 from zdover23/wip-doc-2025-08-26-backport-64074-to-tentacle

tentacle: doc/rados/configuration: Mention show-with-defaults and ceph-conf

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
3 weeks ago Merge pull request #65210 from zdover23/wip-doc-2025-08-26-backport-65180-to-tentacle
Zac Dover [Mon, 1 Sep 2025 04:28:26 +0000 (14:28 +1000)]
Merge pull request #65210 from zdover23/wip-doc-2025-08-26-backport-65180-to-tentacle

tentacle: doc/dev:update blkin.rst doc for lttng trace

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
3 weeks ago Merge pull request #65237 from zdover23/wip-doc-2025-08-26-backport-65230-to-tentacle
Zac Dover [Mon, 1 Sep 2025 04:27:52 +0000 (14:27 +1000)]
Merge pull request #65237 from zdover23/wip-doc-2025-08-26-backport-65230-to-tentacle

tentacle: doc/rados/operations: Improve health-checks.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
3 weeks ago Merge pull request #65310 from zdover23/wip-doc-2025-08-30-backport-8ff129c89-to...
Zac Dover [Mon, 1 Sep 2025 04:27:16 +0000 (14:27 +1000)]
Merge pull request #65310 from zdover23/wip-doc-2025-08-30-backport-8ff129c89-to-tentacle

tentacle: doc/dev/crimson: Update docs

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
4 weeks ago doc/dev/crimson: Update docs 65310/head
Matan Breizman [Wed, 13 Aug 2025 08:11:30 +0000 (08:11 +0000)]
doc/dev/crimson: Update docs

* CPU allocation missed some information and was confusing.
* Drop alienized term when possible.
* Introduce release/debug builds.

Signed-off-by: Matan Breizman <mbreizma@redhat.com>
(cherry picked from commit 8ff129c89ffcd4831dcd9d8b8f0d49687cc57183)

4 weeks ago Merge pull request #65085 from rhcs-dashboard/wip-72614-tentacle
afreen23 [Fri, 29 Aug 2025 13:27:46 +0000 (18:57 +0530)]
Merge pull request #65085 from rhcs-dashboard/wip-72614-tentacle

tentacle : mgr/dashboard: Made expandable table row height consistent across the cluster

Reviewed-by: Afreen Misbah <afreen@ibm.com>
4 weeks ago Merge pull request #64906 from ronen-fr/wip-rf-64859-tentacle
Ronen Friedman [Fri, 29 Aug 2025 13:03:20 +0000 (16:03 +0300)]
Merge pull request #64906 from ronen-fr/wip-rf-64859-tentacle

tentacle: osd/scrub: avoid using moved-from auth_n_errs

Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 weeks ago Merge pull request #65282 from afreen23/wip-72758-tentacle
afreen23 [Fri, 29 Aug 2025 13:00:48 +0000 (18:30 +0530)]
Merge pull request #65282 from afreen23/wip-72758-tentacle

tentacle: mgr/dashboard: Optimize css styles.css bundle

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
4 weeks ago Merge pull request #64898 from adamemerson/wip-70882-tentacle
SrinivasaBharathKanta [Thu, 28 Aug 2025 10:59:46 +0000 (16:29 +0530)]
Merge pull request #64898 from adamemerson/wip-70882-tentacle

tentacle: rgw/admin: Fix assert on datalog list of invalid shard

4 weeks ago mgr/dashboard: Optimize css styles.css bundle 65282/head
Afreen Misbah [Tue, 26 Aug 2025 18:58:17 +0000 (00:28 +0530)]
mgr/dashboard: Optimize css styles.css bundle

-  compresses the css bundle before sending to browser
-  this improves LCP value as well

Fixes https://tracker.ceph.com/issues/72742

Signed-off-by: Afreen Misbah <afreen@ibm.com>
(cherry picked from commit 164c1cf1cfc40607960a9b04eeb0d2644a1584aa)

4 weeks ago mgr/dashboard: Made grafana iframe scrollable and height consistent 65085/head
Abhishek Desai [Fri, 25 Jul 2025 18:32:44 +0000 (00:02 +0530)]
mgr/dashboard: Made grafana iframe scrollable and height consistent
Fixes : https://tracker.ceph.com/issues/72044

Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
(cherry picked from commit 64361708338a8cf97de7f7d4f2768c109a50c82f)

 Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/cephfs/cephfs-list/cephfs-list.component.html
src/pybind/mgr/dashboard/frontend/src/app/ceph/cephfs/cephfs-tabs/cephfs-tabs.component.html
src/pybind/mgr/dashboard/frontend/src/styles.scss

4 weeks ago Merge pull request #65200 from zdover23/wip-doc-2025-08-25-backport-65185-to-tentacle
Zac Dover [Wed, 27 Aug 2025 19:57:27 +0000 (05:57 +1000)]
Merge pull request #65200 from zdover23/wip-doc-2025-08-25-backport-65185-to-tentacle

tentacle: doc/cephfs: edit troubleshooting.rst (Slow MDS)

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
4 weeks ago Merge pull request #65182 from zdover23/wip-doc-2025-08-22-backport-64726-to-tentacle
Zac Dover [Wed, 27 Aug 2025 19:56:41 +0000 (05:56 +1000)]
Merge pull request #65182 from zdover23/wip-doc-2025-08-22-backport-64726-to-tentacle

tentacle: doc/man/8: Improve mount.ceph.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
4 weeks ago Merge pull request #65136 from zdover23/wip-doc-2025-08-20-backport-65128-to-tentacle
Zac Dover [Wed, 27 Aug 2025 19:56:17 +0000 (05:56 +1000)]
Merge pull request #65136 from zdover23/wip-doc-2025-08-20-backport-65128-to-tentacle

tentacle: doc/rados: repair short underline

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
4 weeks ago Merge pull request #65235 from afreen23/backport-status-api
afreen23 [Wed, 27 Aug 2025 11:40:57 +0000 (17:10 +0530)]
Merge pull request #65235 from afreen23/backport-status-api

tentacle: Backport of /health/snapshot api and its dependencies

Reviewed-by: Nizamudeen A <nia@redhat.com>
4 weeks ago Merge pull request #65241 from afreen23/wip-72733-tentacle
afreen23 [Wed, 27 Aug 2025 10:08:38 +0000 (15:38 +0530)]
Merge pull request #65241 from afreen23/wip-72733-tentacle

tentacle: mgr/dashboard: Dashboard nfs export editor rejects ipv6 addresses

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
4 weeks ago mgr/dashboard: Remove .nx folder 65235/head
Afreen Misbah [Tue, 26 Aug 2025 13:59:10 +0000 (19:29 +0530)]
mgr/dashboard: Remove .nx folder
The nx folder was committed by mistake in PR #65107

Fixes https://tracker.ceph.com/issues/72730

Signed-off-by: Afreen Misbah <afreen@ibm.com>
4 weeks ago Remove git clean -fdx 65248/head
Dan Mick [Tue, 26 Aug 2025 00:45:21 +0000 (17:45 -0700)]
Remove git clean -fdx

either
1) a source tarball is supplied, in which case the local dir is
   irrelevant, or
2) make-debs calls make-dist, which doesn't care about a dirty cwd

so it just punishes the unaware by removing things that they may
have wanted to keep.

Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit ad529cce49c466daa02bb3b90804ff6a6ec548e8)

4 weeks ago make-debs.sh: invoke tar with --no-same-owner
Dan Mick [Sat, 23 Aug 2025 00:43:24 +0000 (17:43 -0700)]
make-debs.sh: invoke tar with --no-same-owner

When running as a normal user, tar does not attempt to preserve
owners set on the tar content files.  When running as root, it does.
Containerized builds are running as root.  Stop make-debs.sh from
trying to set other owners for files, and leaving files in the
host system with mapped UIDs other than the user running the container
(which causes jenkins to be unable to clear the workspace).

Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 46c540444dd0dc4b4572e71ef452436a3b580d51)

4 weeks ago make-debs.sh: make "skip debug packages" conditional
Dan Mick [Thu, 21 Aug 2025 20:00:43 +0000 (13:00 -0700)]
make-debs.sh: make "skip debug packages" conditional

Now that we're using make-debs.sh as a builder inside containers,
the default should be to build all the packages, including debug.
(Also, fix a typo.)

Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit 86d6e931bd10bc15252d76aa58e4835a72742fcd)

4 weeks agouadk: build with ceph fork (fix for __DATE__ usage)
Dan Mick [Thu, 14 Aug 2025 19:04:39 +0000 (12:04 -0700)]
uadk: build with ceph fork (fix for __DATE__ usage)

Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit f5e160edde0334e17edf01d7171ffa74449d5614)

4 weeks agoMerge pull request #65203 from rhcs-dashboard/wip-72625-tentacle
afreen23 [Tue, 26 Aug 2025 17:04:28 +0000 (22:34 +0530)]
Merge pull request #65203 from rhcs-dashboard/wip-72625-tentacle

tentacle: mgr/dashboard: Target Storage Class in s3 tiering Config

Reviewed-by: Afreen Misbah <afreen@ibm.com>
4 weeks agoMerge pull request #65224 from rhcs-dashboard/wip-72711-tentacle
afreen23 [Tue, 26 Aug 2025 15:18:59 +0000 (20:48 +0530)]
Merge pull request #65224 from rhcs-dashboard/wip-72711-tentacle

tentacle: mgr/dashboard : Optimized /host API to minimum resp

Reviewed-by: Afreen Misbah <afreen@ibm.com>
4 weeks agoMerge pull request #65225 from rhcs-dashboard/wip-72712-tentacle
afreen23 [Tue, 26 Aug 2025 14:30:21 +0000 (20:00 +0530)]
Merge pull request #65225 from rhcs-dashboard/wip-72712-tentacle

tentacle: mgr/dashboard : 72522 - Remove service instances column to improve API perf

Reviewed-by: Afreen Misbah <afreen@ibm.com>
4 weeks agomgr/dashboard: Dashboard nfs export editor rejects ipv6 addresses 65241/head
Afreen Misbah [Thu, 21 Aug 2025 09:41:43 +0000 (15:11 +0530)]
mgr/dashboard: Dashboard nfs export editor rejects ipv6 addresses

Fixes https://tracker.ceph.com/issues/72660

Signed-off-by: Afreen Misbah <afreen@ibm.com>
(cherry picked from commit 6c9a5536f3459cb3e676e240730eb5f7c45352ff)

4 weeks agomgr/dashboard: Add /health/snapshot api
Afreen Misbah [Wed, 13 Aug 2025 06:49:02 +0000 (12:19 +0530)]
mgr/dashboard: Add /health/snapshot api

Fixes https://tracker.ceph.com/issues/72609

- The current minimal API relies on fetching data from osdmap and pgmap.
- These commands produce large, detailed payloads that become a performance bottleneck and impact scalability, especially in large clusters.
- To address this, we propose switching to the ceph snapshot API using ceph status command, which retrieves essential information directly from the cluster map.
- ceph status is significantly more lightweight compared to osdmap/pgmap, reducing payload sizes and processing overhead.
- This change ensures faster response times, improves system efficiency in large deployments, and minimizes unnecessary data transfer.
- update tests

Signed-off-by: Afreen Misbah <afreen@ibm.com>
(cherry picked from commit 2609d4f62e9e3906cf3e3fcc042bfdf0bcc633bf)

 Conflicts:
src/pybind/mgr/dashboard/frontend/package-lock.json
src/pybind/mgr/dashboard/frontend/package.json
src/pybind/mgr/dashboard/frontend/src/app/ceph/dashboard-v3/dashboard/dashboard-v3.component.html

4 weeks agomgr/dashboard:Fixed issue with clipboard icon
Afreen Misbah [Thu, 10 Jul 2025 20:29:12 +0000 (01:59 +0530)]
mgr/dashboard:Fixed issue with clipboard icon

- The clipboard icon was not displayed, which broke several places.
- On click, the clipboard icon was filled with the primary green color and lost its visibility. The icon now remains visible on click.
- On mouseover, the clipboard buttons for path and copy in tables showed `cursor` instead of `hand`, which was not ideal from a usability standpoint. They now use the hand cursor, making the interaction semantically correct and more intuitive for users.

Signed-off-by: Afreen Misbah <afreen@ibm.com>
(cherry picked from commit 1dd42e1a339ef4cf1f3253534646aa6c9342233d)

Conflicts:
       src/pybind/mgr/dashboard/frontend/src/app/shared/components/copy2clipboard-button/copy2clipboard-button.component.html

4 weeks agomgr/dashboard: Add generic component for icons
Afreen Misbah [Thu, 3 Jul 2025 11:09:25 +0000 (16:39 +0530)]
mgr/dashboard: Add generic component for icons

Fixes https://tracker.ceph.com/issues/71947
Fixes https://tracker.ceph.com/issues/71933

Signed-off-by: Afreen Misbah <afreen@ibm.com>
(cherry picked from commit fa5160444e71b09e2e845c040206e34c775c27ff)

 Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/shared/components/card-row/card-row.component.html
src/pybind/mgr/dashboard/frontend/src/app/shared/components/components.module.ts
src/pybind/mgr/dashboard/frontend/src/app/shared/enum/icons.enum.ts
src/pybind/mgr/dashboard/frontend/src/styles/_carbon-defaults.scss

4 weeks agoMerge pull request #65106 from afreen23/wip-72525-tentacle
afreen23 [Tue, 26 Aug 2025 12:57:15 +0000 (18:27 +0530)]
Merge pull request #65106 from afreen23/wip-72525-tentacle

tentacle: mgr/dashboard: Stop rules api being polled on every page

Reviewed-by: Nizamudeen A <nia@redhat.com>
4 weeks agodoc/rados/operations: Improve health-checks.rst 65237/head
Anthony D'Atri [Tue, 26 Aug 2025 11:38:58 +0000 (07:38 -0400)]
doc/rados/operations: Improve health-checks.rst

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
(cherry picked from commit ba5cb7b8d63040730934a06d13baf2968952e813)

4 weeks agoMerge pull request #65107 from afreen23/wip-72607-tentacle
afreen23 [Tue, 26 Aug 2025 09:03:51 +0000 (14:33 +0530)]
Merge pull request #65107 from afreen23/wip-72607-tentacle

tentacle: Replace capacity threshold data with prometheus metrics

Reviewed-by: Nizamudeen A <nia@redhat.com>
4 weeks agoMerge pull request #65148 from cloudbehl/wip-72639-tentacle
Aashish Sharma [Tue, 26 Aug 2025 08:19:43 +0000 (13:49 +0530)]
Merge pull request #65148 from cloudbehl/wip-72639-tentacle

tentacle: prometheus: Add RBD image metadata to prometheus

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
4 weeks agoMerge pull request #65226 from aaSharma14/wip-72688-tentacle
Aashish Sharma [Tue, 26 Aug 2025 08:02:37 +0000 (13:32 +0530)]
Merge pull request #65226 from aaSharma14/wip-72688-tentacle

tentacle: Handle failures in metric parsing

Reviewed-by: Aashish Sharma <aasharma@redhat.com>
4 weeks agoMerge pull request #65223 from rhcs-dashboard/wip-72716-tentacle
afreen23 [Tue, 26 Aug 2025 07:22:33 +0000 (12:52 +0530)]
Merge pull request #65223 from rhcs-dashboard/wip-72716-tentacle

tentacle: mgr/dashboard: rgw_crypt_kmip_addr in SSE-KMS kmip address input field validation updated

Reviewed-by: Afreen Misbah <afreen@ibm.com>
4 weeks agomgr/dashboard: Fix test_host.py test case 65225/head
Abhishek Desai [Mon, 25 Aug 2025 14:11:45 +0000 (19:41 +0530)]
mgr/dashboard: Fix test_host.py test case
fixes: https://tracker.ceph.com/issues/72717

Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
4 weeks agoHandle failures in metric parsing 65226/head
Anmol Babu [Thu, 3 Jul 2025 13:25:39 +0000 (18:55 +0530)]
Handle failures in metric parsing

fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2345460
Signed-off-by: Anmol Babu <anmolbabu@Anmols-MacBook-Pro.local>
(cherry picked from commit f29e3f307c46401328e920204cbe893fbd837c65)

4 weeks agomgr/dashboard : 72522 - Remove service instances column to improve API perf
Abhishek Desai [Mon, 11 Aug 2025 11:53:52 +0000 (17:23 +0530)]
mgr/dashboard : 72522 - Remove service instances column to improve API perf
fixes : https://tracker.ceph.com/issues/72522

Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
(cherry picked from commit 1dbebbd86c434137829fb9fce88cdc368f9d4993)

4 weeks agomgr/dashboard : Optimized /host API to minimum resp 65224/head
Abhishek Desai [Sun, 17 Aug 2025 18:37:47 +0000 (00:07 +0530)]
mgr/dashboard : Optimized /host API to minimum resp
fixes : https://tracker.ceph.com/issues/72608

Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
(cherry picked from commit cd2d91a6e78af339c86a24e42bdb705d854262f5)

4 weeks agomgr/dashboard: rgw_crypt_kmip_addr in SSE-KMS kmip address input field validation... 65223/head
Abhishek Desai [Mon, 4 Aug 2025 19:30:52 +0000 (01:00 +0530)]
mgr/dashboard: rgw_crypt_kmip_addr in SSE-KMS kmip address input field validation updated
fixes : https://tracker.ceph.com/issues/72408

Signed-off-by: Abhishek Desai <abhishek.desai1@ibm.com>
(cherry picked from commit 4aa462b091780dcfc7a297e57b314ba5a3a479fe)

4 weeks agoMerge pull request #65072 from baum/72508_backport
Yuri Weinstein [Mon, 25 Aug 2025 21:06:46 +0000 (14:06 -0700)]
Merge pull request #65072 from baum/72508_backport

tentacle: nvmeof: create /dev/dsa if DSA acceleration is enabled and the device…

Reviewed-by: Adam King <adking@redhat.com>
4 weeks agomgr/dashboard: Replace capacity threshold data with prometheus metrics 65107/head
Afreen Misbah [Mon, 11 Aug 2025 09:03:32 +0000 (14:33 +0530)]
mgr/dashboard: Replace capacity threshold data with prometheus metrics

- Fixes https://tracker.ceph.com/issues/72519
- the osd dump metrics are used in /api/osd/settings
- these metrics create a perf bottleneck when there are thousands of OSDs
- replace them with similar prometheus metrics
- minor refactors, including renaming and comments

Signed-off-by: Afreen Misbah <afreen@ibm.com>
(cherry picked from commit 3281ccfe3542e22e141681cba77cc7970ba10e7b)

Conflicts:
    src/pybind/mgr/dashboard/frontend/src/app/ceph/dashboard-v3/dashboard/dashboard-v3.component.ts
    src/pybind/mgr/dashboard/frontend/src/app/shared/api/prometheus.service.ts

4 weeks agomgr/dashboard: fix missing gw group error 65221/head
Tomer Haskalovitch [Mon, 11 Aug 2025 21:53:07 +0000 (00:53 +0300)]
mgr/dashboard: fix missing gw group error

Signed-off-by: Tomer Haskalovitch <tomer.haska@ibm.com>
(cherry picked from commit f83a3d93ddd9d2f153338f266f243225183b5934)

4 weeks agomgr/dashboard: catch more exception to show relevant cli output 65220/head
Tomer Haskalovitch [Tue, 12 Aug 2025 05:49:09 +0000 (08:49 +0300)]
mgr/dashboard: catch more exception to show relevant cli output

Signed-off-by: Tomer Haskalovitch <tomer.haska@ibm.com>
(cherry picked from commit 677c2c2f3275f6669dbff2e75b5a2d871635a6f2)

4 weeks agomgr/dashboard: Stop rules api being polled on every page 65106/head
Afreen Misbah [Wed, 6 Aug 2025 07:37:16 +0000 (13:07 +0530)]
mgr/dashboard: Stop rules api being polled on every page

- the /rules API is polled every 5 seconds on every page
- it is only required for alerts page where full rules list is shown in `Alerts` tab
- also added observable for getting rules instead of plain array

Signed-off-by: Afreen Misbah <afreen@ibm.com>
(cherry picked from commit df984d72df1370181328dc3ec30a22619841c185)

4 weeks agodoc/dev:update blkin.rst doc for lttng trace 65210/head
lizhipeng [Fri, 22 Aug 2025 03:53:52 +0000 (11:53 +0800)]
doc/dev:update blkin.rst doc for lttng trace
fixes:https://tracker.ceph.com/issues/72059

Signed-off-by: lizhipeng <qiuxinyidian@gmail.com>
(cherry picked from commit 3029cc9afdee352fb22db0895c5d3ec4a35277d3)

4 weeks agodoc/rados/configuration: Mention show-with-defaults and ceph-conf 65205/head
Niklas Hambüchen [Sat, 21 Jun 2025 17:46:13 +0000 (19:46 +0200)]
doc/rados/configuration: Mention show-with-defaults and ceph-conf

A small improvement based on
"Why is it still so difficult to just dump all config and where it comes from?"
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EZSLRYBYEWDA6YIARQVMUKQUWHAE3PGR/

`show-with-defaults` is very useful, and `ceph-conf` is mentioned
so that it's clear that it's legacy, and the user doesn't have to
wonder if it's actually useful but was forgotten in the list.

Signed-off-by: Niklas Hambüchen <mail@nh2.me>
(cherry picked from commit 978ab834c464b993ec77c914cb36da47211a1cd4)

4 weeks agoMerge pull request #65100 from rhcs-dashboard/wip-72622-tentacle
afreen23 [Mon, 25 Aug 2025 13:47:44 +0000 (19:17 +0530)]
Merge pull request #65100 from rhcs-dashboard/wip-72622-tentacle

tentacle: mgr/dashboard: add tab structure to File Systems page and embed Ceph-Filesystem Overview dashboard

Reviewed-by: Afreen Misbah <afreen@ibm.com>
4 weeks agomgr/dashboard: [RGW] - Target Storage Class in s3 tiering config 65203/head
Dnyaneshwari [Thu, 14 Aug 2025 05:15:52 +0000 (10:45 +0530)]
mgr/dashboard: [RGW] - Target Storage Class in s3 tiering config

Fixes: https://tracker.ceph.com/issues/72583
Signed-off-by: Dnyaneshwari Talwekar <dtalweka@redhat.com>
(cherry picked from commit 0478bf227e77a986217adc7b1b28d1568661fb32)

4 weeks agodoc/cephfs: edit troubleshooting.rst (Slow MDS) 65200/head
Zac Dover [Fri, 22 Aug 2025 08:39:29 +0000 (18:39 +1000)]
doc/cephfs: edit troubleshooting.rst (Slow MDS)

Move the "Slow requests (MDS)" section immediately after the first
section in this document ("Slow/Stuck Operations"), because the first
procedure on the page directs the reader to undertake the operation in
"Slow requests (MDS)" before trying anything else.

Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 55af6643c9a119afc4e22e2591774e1d68ef5580)

4 weeks agoMerge pull request #65110 from rhcs-dashboard/wip-72492-tentacle
afreen23 [Mon, 25 Aug 2025 09:22:10 +0000 (14:52 +0530)]
Merge pull request #65110 from rhcs-dashboard/wip-72492-tentacle

tentacle: mgr/dashboard: fix table loading while fetching data

Reviewed-by: Naman Munet <nmunet@redhat.com>
4 weeks agomon/MonClient: post version request completions outside of monc_lock 64898/head
Ilya Dryomov [Thu, 21 Aug 2025 19:39:29 +0000 (21:39 +0200)]
mon/MonClient: post version request completions outside of monc_lock

dispatch() is allowed to invoke the completion object in the current
thread, before control returns from dispatch().  This isn't desirable
when it comes to discarding version requests in MonClient::shutdown()
and MonClient::_reopen_session() because completion objects could then
be invoked under monc_lock.  In case of MonClient::_reopen_session() in
particular, this leads to an attempt to acquire monc_lock once again in
MonClient::get_version() on a retry due to monc_errc::session_reset
that is converted to errc::resource_unavailable_try_again:

  MonClient::ms_handle_reset
    < takes monc_lock >
    MonClient::_reopen_session
      < invokes the completion object via dispatch() with ec == monc_errc::session_reset >
      Objecter::CB_Objecter_GetVersion::operator() [ ec == errc::resource_unavailable_try_again ]
        Objecter::_wait_for_latest_osdmap
          MonClient::get_version
            < attempts to take monc_lock in the body of the lambda >

The end result is either a lockup or some form of undefined behavior.
The best possible outcome here is an exception (std::system_error with
"Resource deadlock avoided" error) and a successive call to
std::terminate().

This is a regression introduced in commit e81d4eae4e76 ("common/async:
Update `use_blocked` for newer asio").  Revert to posting version
request completions for the error cases in a way that is uniform with
the success case in MonClient::handle_get_version_reply().

Fixes: https://tracker.ceph.com/issues/72692
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit bd449f0ac823413a55069e3df9e163a4b4adbebd)
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
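The locking hazard described in this commit message can be illustrated outside of Ceph. What follows is a minimal Python sketch, not the MonClient code: `Client`, `get_version`, `reopen_inline`, `reopen_posted` and `demo` are hypothetical names, a plain non-reentrant `threading.Lock` stands in for monc_lock, and "posting" is approximated by running the collected completions after the lock is released rather than on a real executor.

```python
import threading
from collections import deque


class Client:
    """Toy stand-in for a client guarded by one non-reentrant lock."""

    def __init__(self):
        self.lock = threading.Lock()   # plays the role of monc_lock
        self.pending = []              # completions awaiting a reply
        self.posted = deque()          # completions deferred until the lock is free
        self.log = []

    def get_version(self, on_done):
        # A retry re-enters here and needs the lock again.
        if not self.lock.acquire(blocking=False):
            self.log.append("would deadlock")
            return
        try:
            self.pending.append(on_done)
            self.log.append("queued")
        finally:
            self.lock.release()

    def reopen_inline(self):
        # Hazard: completions are invoked while the lock is still held.
        with self.lock:
            pending, self.pending = self.pending, []
            for completion in pending:
                completion("session_reset")

    def reopen_posted(self):
        # Only collect the completions under the lock ...
        with self.lock:
            self.posted.extend(self.pending)
            self.pending = []
        # ... and invoke them once it has been released.
        while self.posted:
            self.posted.popleft()("session_reset")


def demo(reopen_name):
    """Queue one request whose completion retries, then reset the session."""
    client = Client()

    def retry(_error):
        client.get_version(lambda _e: None)

    client.get_version(retry)
    getattr(client, reopen_name)()
    return client.log
```

In this sketch `demo("reopen_inline")` records the retry as one that would have blocked on the lock, while `demo("reopen_posted")` lets the retry queue normally. The non-blocking `acquire` is used only so the example reports the problem instead of hanging.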
5 weeks agoMerge pull request #65179 from phlogistonjohn/jjm-tentacle-bwc
David Galloway [Fri, 22 Aug 2025 15:22:17 +0000 (11:22 -0400)]
Merge pull request #65179 from phlogistonjohn/jjm-tentacle-bwc

tentacle: backport build-with-container patches from main

5 weeks agoMerge pull request #65124 from zdover23/wip-doc-2025-08-19-backport-64929-to-tentacle
Zac Dover [Fri, 22 Aug 2025 08:46:33 +0000 (18:46 +1000)]
Merge pull request #65124 from zdover23/wip-doc-2025-08-19-backport-64929-to-tentacle

tentacle: doc/cephfs: edit troubleshooting.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
5 weeks agoMerge pull request #65121 from zdover23/wip-doc-2025-08-19-backport-65021-to-tentacle
Zac Dover [Fri, 22 Aug 2025 08:46:08 +0000 (18:46 +1000)]
Merge pull request #65121 from zdover23/wip-doc-2025-08-19-backport-65021-to-tentacle

tentacle: doc/cephfs: edit troubleshooting.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
5 weeks agoMerge pull request #65092 from zdover23/wip-doc-2025-08-18-backport-64931-to-tentacle
Zac Dover [Fri, 22 Aug 2025 08:45:46 +0000 (18:45 +1000)]
Merge pull request #65092 from zdover23/wip-doc-2025-08-18-backport-64931-to-tentacle

tentacle: doc/cephfs: edit troubleshooting.rst

Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
5 weeks agodoc/man/8: Improve mount.ceph.rst 65182/head
Anthony D'Atri [Tue, 29 Jul 2025 00:38:37 +0000 (20:38 -0400)]
doc/man/8: Improve mount.ceph.rst

Signed-off-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
(cherry picked from commit 230d91c2e28f3df27dd5007b937b477922cb7655)

5 weeks agobuild-with-container: improve source rpm detection 65179/head
John Mulligan [Tue, 19 Aug 2025 23:12:07 +0000 (19:12 -0400)]
build-with-container: improve source rpm detection

Improve source rpm detection by adding a new detection method that
executes an rpm command in a container to get exactly the version of
the source rpm that the ceph.spec file would have generated.  For
backwards compatibility, and because I don't entirely trust myself to
have tested this, the old methods are still available.

The old `--rpm-no-match-sha` is now an alias for `--srpm-match=any` to
cause it to build any (unique) ceph srpm it finds.
`--srpm-match=versionglob` retains the previous default behavior of
using a glob matching on the git id or ceph version value.  The new
default of `--srpm-match=auto` implements the rpm command based behavior
described above.

All of this is wrapped in a new step `find-rpm` but that's mostly an
implementation detail and for testing.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 916088a4e7380cd7ac1403fb4416ef91ab07aa52)
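The three `--srpm-match` modes named above can be pictured as three ways of choosing one file from a list. The Python sketch below is an illustration only and is not taken from build-with-container.py: `pick_srpm`, its parameters and the glob pattern are hypothetical, and the `auto` mode simply compares against a name that the caller is assumed to have obtained from the rpm command.

```python
import fnmatch


def pick_srpm(candidates, mode, version=None, expected=None):
    """Pick exactly one source rpm filename from candidates.

    mode "any":         accept the single srpm present, whatever its version
    mode "versionglob": glob-match on a version or git id string
    mode "auto":        require the exact name computed elsewhere (expected)
    """
    srpms = [c for c in candidates if c.endswith(".src.rpm")]
    if mode == "any":
        matches = srpms
    elif mode == "versionglob":
        # Hypothetical pattern; the real script's glob is not shown in the log.
        matches = fnmatch.filter(srpms, f"ceph-*{version}*.src.rpm")
    elif mode == "auto":
        matches = [s for s in srpms if s == expected]
    else:
        raise ValueError(f"unknown mode: {mode}")
    if len(matches) != 1:
        raise LookupError(f"expected exactly one srpm, found {len(matches)}")
    return matches[0]
```

Requiring exactly one match mirrors the "any (unique)" wording in the commit message: an ambiguous result is treated as an error rather than resolved by guessing.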

5 weeks agomake-srpm.sh: don't shell out redundantly to pwd
John Mulligan [Tue, 19 Aug 2025 19:03:51 +0000 (15:03 -0400)]
make-srpm.sh: don't shell out redundantly to pwd

Just something that annoyed me while reading the script.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 544d8ab5eb81fc5b8b950c2c1c116fad6b1a40c4)

5 weeks agopybind/mgr/dashboard/frontend: add NPM_CACHEDIR envvar, use in bwc
Dan Mick [Wed, 13 Aug 2025 19:16:45 +0000 (12:16 -0700)]
pybind/mgr/dashboard/frontend: add NPM_CACHEDIR envvar, use in bwc

Add an optional NPM_CACHEDIR environment variable to serve as the
cache parameter for npm in the dashboard frontend build.  The idea
is to allow it to persist across builds so that we decrease the load
on registry.npmjs.org, which has been throttling our requests when
using build-with-container.py, and also hopefully improve the time
of the frontend npm operations.

build-with-container.py also grows a --npm-cache-path option to allow
setting it for container builds and passing the envvar to the build.

Fixes: https://tracker.ceph.com/issues/72298
Signed-off-by: Dan Mick <dan.mick@redhat.com>
(cherry picked from commit ad7e6117a9e99061a3ad7e03709dd31e34832966)
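As a rough picture of how an optional environment variable can feed npm's cache setting, here is a small Python sketch. It is an assumption-laden illustration, not the dashboard build logic: `npm_ci_command` is a hypothetical helper, and the use of `npm ci` is a guess, since the commit message does not say which npm subcommands the frontend build runs.

```python
import os


def npm_ci_command(env=None):
    """Build an npm argv, adding --cache only when NPM_CACHEDIR is set."""
    env = os.environ if env is None else env
    cmd = ["npm", "ci"]  # assumed subcommand, for illustration only
    cache_dir = env.get("NPM_CACHEDIR")
    if cache_dir:
        # npm's `cache` config option selects the cache directory.
        cmd += ["--cache", cache_dir]
    return cmd
```

When the variable is unset the command is unchanged, so builds that do not opt in keep npm's default cache location.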

5 weeks agodashboard: fix the workaround for unpacking node sources
John Mulligan [Wed, 21 May 2025 21:46:40 +0000 (17:46 -0400)]
dashboard: fix the workaround for unpacking node sources

My previous workaround in the dashboard for unpacking a non-root-owned
tarball as the fake root of a container did not work because of the
strange quoting/escaping behavior of cmake (it tried to run `id -u` as a
single command, not a command and an argument).
Use a single-quoted string and old-school backticks to work around this issue.

Fixes: 24dbfb5da4813c6588f9cd199b9f527bb67f1e88
Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 3a36180a373d91adcf9726660204f0cc1dcecba3)
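The failure mode mentioned above (a command and its argument being treated as one program name) is not specific to cmake. The Python sketch below reproduces the same class of mistake with `subprocess`; it says nothing about how cmake quotes arguments, and `run`, `fused` and `split` are names made up for the example.

```python
import os
import shlex
import subprocess


def run(argv):
    """Run argv without a shell; return stripped stdout, or None if the
    program named by argv[0] does not exist."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, check=True)
    except FileNotFoundError:
        return None
    return done.stdout.strip()


fused = ["id -u"]              # one token: looks for a program called "id -u"
split = shlex.split("id -u")   # two tokens: program "id", argument "-u"
```

Calling `run(fused)` finds no such program, whereas `run(split)` runs `id` with the `-u` argument and returns the effective user ID as text.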

5 weeks agodashboard: ensure nodeenv downloaded content is owned by current user
John Mulligan [Fri, 2 May 2025 15:17:53 +0000 (11:17 -0400)]
dashboard: ensure nodeenv downloaded content is owned by current user

When testing ceph builds in a container we discovered that certain files
could not be deleted by jenkins after a build. This was due to the way
the container maps IDs - files owned by the root user in the container
become owned by the "real" user/jenkins user on the "host".
However, the node tarball that is fetched and unpacked by nodeenv has
a different owner name/uid that is preserved in the tree and this id
gets mapped to something that can be managed by the "fake root" of the
container but not by the "regular" user outside the container.

The simplest workaround I can think of is to chown the tree back
to the current user and avoid leaving files on disk with uncleanly
mapped uids.

Signed-off-by: John Mulligan <jmulligan@redhat.com>
(cherry picked from commit 24dbfb5da4813c6588f9cd199b9f527bb67f1e88)
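The "chown the tree back to the current user" idea can be sketched in a few lines of Python. This is an illustration of the approach under stated assumptions, not the change made to the dashboard build: `chown_tree` is a hypothetical helper, and it defaults to the effective user and group of the calling process.

```python
import os


def chown_tree(root, uid=None, gid=None):
    """Recursively give root and everything under it to uid/gid.

    Defaults to the current effective user and group. Symlinks are
    re-owned themselves rather than followed. Returns the entry count.
    """
    uid = os.geteuid() if uid is None else uid
    gid = os.getegid() if gid is None else gid
    os.chown(root, uid, gid, follow_symlinks=False)
    changed = 1
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            os.chown(path, uid, gid, follow_symlinks=False)
            changed += 1
    return changed
```

Inside a container this would be run by the container's root user, so the files left on disk end up with an owner that maps cleanly to the user outside the container.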