qa: Add Teuthology test for BlueStore ESB assertion failure
Adds a test to reproduce the !ito->is_valid() assertion in BlueStore
with bluestore_elastic_shared_blobs=true on a 2+1 EC pool using a
FIO randwrite workload (512 concurrent ops, 50G, 12,500 objects).
The test deploys a 6-OSD cluster and runs FIO for 1 hour via workunit,
failing if an OSD crashes.
Zac Dover [Mon, 2 Jun 2025 02:32:36 +0000 (12:32 +1000)]
doc/start: edit documenting-ceph.rst
Edit the section "Build the Source" in doc/start/documenting-ceph.rst.
Also correct a misuse of the word "presently", which means "in a little
while", not "now".
Zac Dover [Mon, 2 Jun 2025 02:16:47 +0000 (12:16 +1000)]
doc/dev/cephfs-mirroring: edit file 4 of x
Add prompts (and perform necessary corrections to glaring grammatical
errors) to doc/dev/cephfs-mirroring.rst, as requested by Jos Collin in
https://github.com/ceph/ceph/pull/63237/files#r2085886075.
This commit edits the fourth (and final) quarter of the
doc/dev/cephfs-mirroring.rst file.
Further refinements to the English in this file are possible.
Zac Dover [Sun, 1 Jun 2025 23:45:42 +0000 (09:45 +1000)]
doc/mgr: edit nfs.rst
Edit the "Updating an NFS Cluster" section of doc/mgr/nfs.rst. This
commit includes changes requested by Anthony D'Atri in
https://github.com/ceph/ceph/pull/63452.
Previously, we had memory leaks in the test_bluestore_types.cc tests where
`BufferCacheShard` and `OnodeCacheShard` objects were allocated with
raw pointers but never freed, causing leaks detected by AddressSanitizer.
ASan rightly pointed this out:
```
Direct leak of 224 byte(s) in 1 object(s) allocated from:
#0 0x55a7432a079d in operator new(unsigned long) (/home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_bluestore_types+0xf2e79d) (BuildId: c3bec647afa97df6bb147bc82eac937531fc6272)
#1 0x55a743523340 in BlueStore::BufferCacheShard::create(BlueStore*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, ceph::common::PerfCounters*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/os/bluestore/BlueStore.cc:1678:9
#2 0x55a74330b617 in ExtentMap_seek_lextent_Test::TestBody() /home/jenkins-build/build/workspace/ceph-pull-requests/src/test/objectstore/test_bluestore_types.cc:1077:7
#3 0x55a7434f2b2d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2653:10
#4 0x55a7434b5775 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2689:14
#5 0x55a74347005d in testing::Test::Run() /home/jenkins-build/build/workspace/ceph-pull-requests/src/googletest/googletest/src/gtest.cc:2728:5
```
```
Direct leak of 9928 byte(s) in 1 object(s) allocated from:
#0 0x7ff249d21a2d in operator new(unsigned long) /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_new_delete.cpp:86
#1 0x6048ed878b76 in BlueStore::OnodeCacheShard::create(ceph::common::CephContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::common::PerfCounters*) /home/kefu/dev/ceph/src/os/bluestore/BlueStore.cc:1219
#2 0x6048ed66d4f9 in GarbageCollector_BasicTest_Test::TestBody() /home/kefu/dev/ceph/src/test/objectstore/test_bluestore_types.cc:2662
#3 0x6048ed820555 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2653
#4 0x6048ed80c78a in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2689
#5 0x6048ed7b8bfa in testing::Test::Run() /home/kefu/dev/ceph/src/googletest/googletest/src/gtest.cc:2728
```
In this change, we replace raw pointer allocation with unique_ptr to
ensure automatic cleanup when the objects go out of scope.
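As a minimal sketch of the pattern (assuming call sites shaped like the ones
in the stack traces above; the "lru" cache-type string and the surrounding
test scaffolding are illustrative, not the exact test code):
```
#include <memory>
#include "os/bluestore/BlueStore.h"

void test_body_sketch(BlueStore* store, CephContext* cct) {
  // Before: the raw pointers returned by the create() factories were never
  // deleted, which ASan reported as direct leaks.
  //
  // After: wrap the returned pointers in std::unique_ptr so the shards are
  // destroyed automatically when the test body returns, even if an
  // assertion fails part-way through.
  std::unique_ptr<BlueStore::BufferCacheShard> buffer_shard(
      BlueStore::BufferCacheShard::create(store, "lru", nullptr));
  std::unique_ptr<BlueStore::OnodeCacheShard> onode_shard(
      BlueStore::OnodeCacheShard::create(cct, "lru", nullptr));
  // ... exercise the shards as before; no explicit delete is needed.
}
```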
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Casey Bodley [Wed, 28 May 2025 20:33:44 +0000 (16:33 -0400)]
script: ceph-backport.sh adds redmine key to api requests
the ceph-backport.sh script recently started failing with:
> ceph-backport.sh: DEBUG: Considering Redmine issue: https://tracker.ceph.com/issues/70374 - is it in the Backport tracker?
> ceph-backport.sh: DEBUG:
> ceph-backport.sh: ERROR: Issue https://tracker.ceph.com/issues/70374 is not a Backport
because the command `curl --silent https://tracker.ceph.com/issues/70374.json`
now fails with `HTTP/2 401` (Unauthorized) and returns an empty string
the command succeeds after adding my redmine key as a query param like
some of the other redmine requests
Fixed calculation of the effective blob size.
When fully non-compressible data is passed,
it could cause the loss of a few bytes at the end.
Example:
-107> 2025-05-17T20:40:50.468+0000 7f267a42f640 15 bluestore(/var/lib/ceph/osd/ceph-4) _do_write_v2_compressed 200000~78002 -> 200000~78002
-106> 2025-05-17T20:40:50.468+0000 7f267a42f640 20 blobs to put: 200000~f000(4d61) 20f000~f000(b51) 21e000~f000(b51) 22d000~f000(b51) 23c000~f000(b51) 24b000~f000(b51) 25a000~f000(b51) 269000~f000(b51)
As a result, we split 0x78002 into 8 * 0xf000, losing 0x2 bytes in the process.
Calculations for the original code:
>>> size = 0x78002
>>> blobs = (size + 0xffff) // 0x10000
>>> blob_size = size // blobs
>>> print(hex(size), blobs, hex(blob_size))
0x78002 8 0xf000 <- blob_size rounds down to 0xf000
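For illustration only, a small standalone program contrasting the round-down
that loses the tail with a round-up that covers the full write (whether the
actual fix also aligns the result to an allocation unit is not shown here):
```
#include <cassert>
#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t size  = 0x78002;                    // write length from the log above
  const uint64_t blobs = (size + 0xffff) / 0x10000;  // 8 blobs of at most 64 KiB

  const uint64_t down = size / blobs;                // 0xf000: 8 * 0xf000 = 0x78000, 2 bytes short
  const uint64_t up   = (size + blobs - 1) / blobs;  // 0xf001: 8 * 0xf001 >= 0x78002, nothing lost

  assert(blobs * down <  size);
  assert(blobs * up   >= size);
  std::printf("down=0x%llx up=0x%llx\n",
              (unsigned long long)down, (unsigned long long)up);
  return 0;
}
```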
Laura Flores [Tue, 27 May 2025 17:09:04 +0000 (12:09 -0500)]
qa/crontab: update priority for tentacle upgrade command
The current prio (100) results in this error:
```
teuthology.exceptions.ScheduleFailError: Scheduling failed: Unable to schedule 244 jobs with priority 100.
```
I tested a priority of 150 on my teuthology setup, and it passes with this number of jobs.
Connor Fawcett [Mon, 9 Dec 2024 17:02:11 +0000 (17:02 +0000)]
qa/tasks: Add a task which performs an offline check of the consistency of parity shards
Add a Python script which can be used to scan a Ceph cluster, find any erasure-coded data objects, and
check them for consistency. This is achieved by reading the data shards for a given object, running them
through the existing EC tool, and verifying that the output matches the parity shards stored on the OSDs.
This commit adds a new teuthology task but does not add it to any YAMLs yet; this work will be
expanded on in future commits.
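As a conceptual sketch only (the real task is the Python script described
above, which drives the existing EC tool; the single XOR parity below is just
a stand-in for the pool's erasure-code plugin):
```
#include <string>
#include <vector>

// Stand-in encoder: one XOR parity shard computed from equal-sized data
// shards (a real pool would use its erasure-code plugin instead).
static std::string xor_parity(const std::vector<std::string>& data_shards) {
  std::string parity(data_shards.at(0).size(), '\0');
  for (const auto& shard : data_shards)
    for (size_t i = 0; i < parity.size(); ++i)
      parity[i] ^= shard[i];
  return parity;
}

// Offline consistency check: re-encode the data shards and require the
// result to match the parity shard read from the OSD byte-for-byte.
bool parity_consistent(const std::vector<std::string>& data_shards,
                       const std::string& stored_parity) {
  return xor_parity(data_shards) == stored_parity;
}
```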
crimson/osd/osd_operations/pg_advance_map: Add splitting as a function
Since PG splitting is initiated as part of the PGAdvanceMap workflow, there is no need
to maintain it as a separate osd_operation.
A new function in PGAdvanceMap, split_pg(), now takes care of the splitting workflow
when split children are detected in an OSD map.
Since crimson does not follow the same queuing system as the classical OSD, we do not
need to maintain pg_num_history. This makes the splitting check simpler.
With most of the splitting code being part of PGAdvanceMap, it makes sense to have the
splitting check there as well and leave broadcast_map_to_pgs untouched.
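An illustrative-only sketch of that check, with simplified names and a
hypothetical split_pg() helper rather than the actual crimson interfaces:
```
#include <set>

// As PGAdvanceMap advances a PG from one OSD map to the next, ask whether
// the pg_num increase in the newer map splits this PG; if so, handle the
// children inline instead of queueing a separate osd_operation.
void maybe_split(PG& pg, const OSDMapRef& old_map, const OSDMapRef& new_map) {
  const int64_t pool = pg.get_pgid().pool();
  std::set<spg_t> children;
  if (pg.get_pgid().is_split(old_map->get_pg_num(pool),
                             new_map->get_pg_num(pool),
                             &children)) {
    split_pg(pg, children, new_map);  // hypothetical call into the new helper
  }
}
```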