The test was intended to validate that the 'mds fail'
cmd fails on any active MDS when one of them has a warning.
Commit 221700273a82658c642a282c5761c0cbb00ec5b6
(PR 61554) changed this behavior and allows 'mds fail'
on an MDS without the warning. The test should always have
failed with this commit, but it did not fail until it was
tested extensively, because it mostly generated
warnings on both active MDSs. Occasionally, the test
generated a warning on a single MDS and failed. So it's a
race. This patch fixes it with the following changes:
a. Changed mds_cache_memory_limit from '1K' to '50K',
as '1K' was too low and generated warnings on both MDSs.
b. Create a directory, pin it to a single MDS, and open 400 files
in the backend to create cache pressure on one MDS.
Also, there are two tests with the same name,
'test_with_health_warn_with_2_active_MDSs', though in different
classes. So the test was renamed to
'test_with_health_warn_on_1_mds_with_2_active_MDSs' to avoid
confusion and indicate what the test actually does.
Merge pull request #62713 from soumyakoduri/wip-skoduri-restore-glacier
rgw/cloud-restore [PART2] : Add Restore support from Glacier/Tape cloud endpoints
Reviewed-by: Adam Emerson <aemerson@redhat.com> Reviewed-by: Jiffin Tony Thottan <thottanjiffin@gmail.com> Reviewed-by: Matt Benjamin <mbenjamin@redhat.com> Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
This should fix the chmod 777 /var/log/ceph failures.
We were missing the install task which resulted in no /var/log/ceph:
```
2025-07-07T08:55:44.586 INFO:teuthology.run_tasks:Running task ceph...
2025-07-07T08:55:44.679 INFO:tasks.ceph:Making ceph log dir writeable by
non-root...
2025-07-07T08:55:44.679 DEBUG:teuthology.orchestra.run.smithi144:> sudo
chmod 777 /var/log/ceph
2025-07-07T08:55:44.711
INFO:teuthology.orchestra.run.smithi144.stderr:chmod: cannot access
'/var/log/ceph': No such file or directory
```
Matan Breizman [Sun, 8 Jun 2025 10:20:25 +0000 (10:20 +0000)]
crimson/CMakeLists: simplify crimson-common deps
instead of appending conditional dependencies to crimson-common with
crimson_common_deps and crimson_common_public_deps, use
target_link_libraries directly.
Connor Fawcett [Tue, 24 Jun 2025 11:45:06 +0000 (12:45 +0100)]
Adds a new command-line utility which can check the consistency of objects within an erasure coded pool.
A new test-only inject tells the EC backend to return both data and parity shards to the client so that they can
be checked for consistency by the new tool.
Soumya Koduri [Fri, 23 May 2025 20:25:30 +0000 (01:55 +0530)]
rgw/cloud-restore: Handle failure with adding restore entry
If adding the restore entry to the FIFO fails, reset the `restore_status`
of that object to "RestoreFailed" so that the restore can be
retried by the end S3 user.
Reviewed-by: Adam Emerson <aemerson@redhat.com> Reviewed-by: Jiffin Tony Thottan <thottanjiffin@gmail.com> Signed-off-by: Soumya Koduri <skoduri@redhat.com>
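The failure handling described above can be sketched as follows. This is a minimal illustration, not the RGW code: `request_restore`, `FifoError`, the dict-based object and the "InProgress" placeholder status are all hypothetical names; only `restore_status` and "RestoreFailed" come from the commit message.
```python
class FifoError(Exception):
    """Hypothetical error raised when a restore entry cannot be queued."""


def request_restore(obj, fifo_push):
    """Queue a restore entry for obj; mark the object failed if queuing fails.

    obj is a plain dict standing in for the object's metadata, and fifo_push
    is any callable that adds an entry to the restore queue (assumptions).
    """
    obj["restore_status"] = "InProgress"  # placeholder status, not from the source
    try:
        fifo_push(obj["key"])
    except FifoError:
        # Reset the status so that the S3 user can retry the restore later.
        obj["restore_status"] = "RestoreFailed"
        return False
    return True
```
With this shape, an object never stays in an in-progress state without a matching queue entry, which is what makes a retry from the client side possible.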
rgw/cloud-restore: Support restoration of objects transitioned to Glacier/Tape endpoint
Restoration of objects from certain cloud services (like Glacier/Tape) could
take a significant amount of time (even days). Hence, store the state of such restore requests
and periodically process them.
Brief summary of changes
* Refactored existing restore code to consolidate and move all restore processing into rgw_restore* file/class
* RGWRestore class is defined to manage the restoration of objects.
* Lastly, for SAL_RADOS, FIFO is used to store and read restore entries.
Currently, this PR handles storing state of restore requests sent to cloud-glacier tier-type which need async processing.
The changes are tested with AWS Glacier Flexible Retrieval with tier_type Expedited and Standard.
Reviewed-by: Matt Benjamin <mbenjamin@redhat.com> Reviewed-by: Adam Emerson <aemerson@redhat.com> Reviewed-by: Jiffin Tony Thottan <thottanjiffin@gmail.com> Reviewed-by: Daniel Gryniewicz <dang@redhat.com> Signed-off-by: Soumya Koduri <skoduri@redhat.com>
Dnyaneshwari [Thu, 22 May 2025 07:08:25 +0000 (12:38 +0530)]
mgr/dashboard: Local Storage Class - create and list Fixes: https://tracker.ceph.com/issues/71460 Signed-off-by: Dnyaneshwari Talwekar <dtalwekar@redhat.com>
alienstore FTBFS [1] due to the non-virtual-dtor warning when compiling seastar [2].
Instead of using alien::cflags, which defines INTERFACE_COMPILE_OPTIONS of
-Wno-non-virtual-dtor, let's directly add this compile option to
targets using seastar.
Crimson non-alien targets solve that with crimson::cflags, which
defines the relevant compile flag. However, we don't reuse it here since
it also carries WITH_CRIMSON.
As both crimson::cflags and crimson-alienstore, which use seastar,
have to set no-non-virtual-dtor, the compile option is moved to the common
cmake file instead of being set in both targets.
[1]
```
crimson/os/alienstore/alien_log.cc:21:28: required from here
seastar/include/seastar/core/future.hh:666:7:
warning: ‘class seastar::continuation_base<void>’ has virtual functions
and accessible non-virtual destructor [-Wnon-virtual-dtor]
```
mon: Integrate discard queue overflow into pg health warnings
Added a health warning mechanism to monitor the discard queue for potential overload
Emits a warning if the accumulated discarded bytes in the queue exceed the configured threshold
Introduced a debugging tool to simulate slow discard operations by adding a configurable delay
common/options: Added bdev_discard_max_bytes and bdev_debug_discard_sleep options
Added a health warning mechanism to monitor the discard queue for potential overload
Emits a warning if the accumulated discarded bytes in the queue exceed the configured threshold
Introduced a debugging tool to simulate slow discard operations by adding a configurable delay
Jaya Prakash [Thu, 9 Jan 2025 16:14:05 +0000 (21:44 +0530)]
blk:Warning added for discard queue overflow
Added a health warning mechanism to monitor the discard queue for potential overload
Emits a warning if the accumulated discarded bytes in the queue exceed the configured threshold
Introduced a debugging tool to simulate slow discard operations by adding a configurable delay
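The threshold check described above can be sketched roughly as follows. This is an illustrative model only, not the BlueStore/blk code: the `DiscardQueue` class and its methods are hypothetical, and `max_bytes` merely plays the role that the commit message assigns to the configured threshold.
```python
from collections import deque


class DiscardQueue:
    """Hypothetical model of a discard queue with an overflow warning."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes  # configured threshold (assumed semantics)
        self.pending = deque()      # lengths of queued discard requests
        self.queued_bytes = 0       # accumulated bytes waiting to be discarded

    def enqueue(self, length):
        self.pending.append(length)
        self.queued_bytes += length

    def complete_one(self):
        # Called when the oldest discard finishes; frees its byte count.
        if self.pending:
            self.queued_bytes -= self.pending.popleft()

    def health_warning(self):
        # Warn only while the accumulated bytes exceed the threshold.
        if self.queued_bytes > self.max_bytes:
            return (f"discard queue holds {self.queued_bytes} bytes, "
                    f"over the {self.max_bytes} byte threshold")
        return None
```
A debug delay such as the one the commit mentions would slow down `complete_one`, letting `queued_bytes` build up so the warning path can be exercised.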
Matan Breizman [Mon, 30 Jun 2025 09:44:24 +0000 (09:44 +0000)]
crimson: switch to ceph_abort_msg
ceph_abort doesn't print a message. Use ceph_abort_msg instead.
Most of the instances are not printing useful information but some are:
ceph_abort_msg("seastore device size setting is too small");
Leonid Chernin [Tue, 24 Jun 2025 13:00:49 +0000 (16:00 +0300)]
nvmeofgw: fixing GW delete issues
1. fix the issue where a gw is deleted based on invalid subsystem info
2. in function track_deleting_gws: break from the loop only if
the delete was really done
3. fix published rebalance index - publish ana-group instead of
index
4. do not dump gw-id string after gw was removed
Fixes: https://tracker.ceph.com/issues/71896 Signed-off-by: Leonid Chernin <leonidc@il.ibm.com>
.github/workflows/scripts/config-diff-post-comment.js: fix config check ok logic
Currently, whenever a "config diff tool output" comment is created, it
also contains the string `/config check ok`. The next time the
test runs, it sees this text and assumes that the user has commented it.
We fix the logic to make sure that we ignore such cases.
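The idea behind the fix can be sketched as below. The real script is JavaScript and may identify the tool's own comments differently (for example by author); this Python sketch assumes they can be recognized by the "config diff tool output" text, and `user_acknowledged` is a hypothetical helper name.
```python
ACK_MARKER = "/config check ok"
TOOL_HEADER = "config diff tool output"  # assumed way to spot the tool's own comment


def user_acknowledged(comment_bodies):
    """Return True only if a comment not written by the tool has the marker."""
    for body in comment_bodies:
        if TOOL_HEADER in body.lower():
            # The tool's own comment also mentions the marker; ignore it.
            continue
        if ACK_MARKER in body:
            return True
    return False
```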