edef [Thu, 16 Mar 2023 09:43:58 +0000 (09:43 +0000)]
common: use close_range on Linux
Fix rook/rook#10110, which occurs when _SC_OPEN_MAX/RLIMIT_NOFILE is
set to very large values (2^30), leaving fork_function busy-looping and
pegging a core.
The glibc wrappers closefrom(3)/close_range(3) are not available before
glibc 2.34, so we invoke the syscall directly. When glibc 2.34 is old
enough to be a reasonable hard minimum dependency, we should switch to
using closefrom.
If we're not running on (recent enough) Linux, we fall back to the
existing approach.
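A minimal sketch of the approach described above, assuming a Linux build where <sys/syscall.h> defines SYS_close_range (kernel headers for Linux 5.9 or later). The helper names close_fds_from() and fd_is_open() are hypothetical and are not Ceph's actual functions; the fallback loop stands in for "the existing approach".

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: true if fd refers to an open file descriptor.
bool fd_is_open(int fd) {
  return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

// Hypothetical helper: close every file descriptor >= first_fd.
void close_fds_from(int first_fd) {
  assert(first_fd >= 0);
#if defined(__linux__) && defined(SYS_close_range)
  // Raw syscall, because the glibc close_range(3) wrapper needs glibc 2.34.
  // An upper bound of ~0U means "up to the highest possible fd".
  if (syscall(SYS_close_range, static_cast<unsigned>(first_fd), ~0U, 0U) == 0) {
    return;
  }
  // Typically ENOSYS (kernel too old): fall through to the slow path.
#endif
  // Fallback: one close() per possible descriptor. This is the loop that
  // spins for a very long time when _SC_OPEN_MAX is huge (e.g. 2^30).
  long max_fd = sysconf(_SC_OPEN_MAX);
  if (max_fd < 0) max_fd = 1024;  // assumption: arbitrary default if unknown
  for (long fd = first_fd; fd < max_fd; ++fd) {
    close(static_cast<int>(fd));
  }
}
```

Passing ~0U as the upper bound asks the kernel to close everything from first_fd upward in one call, so the cost no longer scales with RLIMIT_NOFILE.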
Ilya Dryomov [Thu, 30 Jan 2025 19:30:18 +0000 (20:30 +0100)]
doc/rbd: use https links in live import examples
Even though it's explicitly said that the "http" stream can be used to
import via both HTTP and HTTPS, it can still be confusing that "type":
"http" is expected to go with "url": "https://...". Switch the example
URLs from HTTP to HTTPS to make this more obvious.
Ilya Dryomov [Wed, 22 Jan 2025 19:34:11 +0000 (20:34 +0100)]
librbd: clear ctx before initiating close in Image::{aio_,}close()
Image::aio_close() must clear ctx before initiating close. Otherwise
the provided callback may see a non-NULL ctx and attempt to close the
image again from Image destructor, leading to an invalid memory access
as ImageCtx and ImageState are both freed immediately after the image
is closed (i.e. before AioCompletion is completed and the callback is
executed).
The same adjustment is made to Image::close() just for consistency.
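The ordering problem can be illustrated with a small model. This is a hypothetical sketch, not librbd code: ImageSketch and ImageCtxSketch stand in for Image and ImageCtx, and close() frees the context before running the completion, as the message describes. Because aio_close() nulls ctx before starting the close, a callback that destroys the image sees ctx == nullptr and does not try to close it a second time.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <utility>

// Hypothetical stand-in for ImageCtx: close() frees the context first and
// only then runs the completion callback.
struct ImageCtxSketch {
  static int live;                       // contexts currently allocated
  ImageCtxSketch() { ++live; }
  ~ImageCtxSketch() { --live; }

  void close(std::function<void()> on_finish) {
    assert(on_finish);
    delete this;                         // the context is gone from here on
    on_finish();
  }
};
int ImageCtxSketch::live = 0;

// Hypothetical stand-in for librbd::Image.
struct ImageSketch {
  ImageCtxSketch* ctx = nullptr;

  int aio_close(std::function<void()> callback) {
    if (ctx == nullptr) return -EINVAL;
    ImageCtxSketch* ictx = ctx;
    ctx = nullptr;                       // clear BEFORE initiating close
    ictx->close(std::move(callback));
    return 0;                            // no member access after close()
  }

  ~ImageSketch() {
    if (ctx != nullptr) {                // a stale ctx here = use-after-free
      ctx->close([] {});
    }
  }
};
```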
Ilya Dryomov [Sat, 25 Jan 2025 10:11:14 +0000 (11:11 +0100)]
doc/rados: pool and namespace are independent osdcap restrictions
For the "profile {name}" syntax, pool and namespace restrictions are
independent of each other (i.e. specifying a namespace doesn't also
require specifying a pool, as is currently suggested). A cap can look
like "profile rbd namespace=myns", signifying that the RBD profile is
to be allowed in the myns namespace of any pool.
For the "allow {access-spec}" syntax, the pool restriction is optional.
A cap can look like "allow r namespace=myns", "allow w object_prefix
myprefix" or "allow rw namespace=myns object_prefix myprefix", for
example.
Zac Dover [Fri, 24 Jan 2025 13:46:19 +0000 (23:46 +1000)]
doc/cephfs: edit disaster-recovery-experts (6 of x)
In doc/cephfs/disaster-recovery-experts.rst, incorporate Anthony's
suggestions in
https://github.com/ceph/ceph/pull/61462#discussion_r1923917812
and
https://github.com/ceph/ceph/pull/61462#discussion_r1923920724
and reword the sentences in the section "Using an alternate metadata
pool for recovery" to be in the imperative mood, which better suits the
ordered list format that was introduced in
https://github.com/ceph/ceph/pull/61493.
Follows https://github.com/ceph/ceph/pull/61493.
https://tracker.ceph.com/issues/69557
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 5670054bd0533c8f2507d0596797214da8ba489a)
msg: insert PriorityDispatchers in sorted position
Avoid calling stable_sort() after every insertion by inserting directly
into the sorted position. Use lower_bound() to insert at the head and
upper_bound() to insert at the tail.
This generally only happens during startup, so it isn't a performance
problem, but std::stable_sort() was triggering strange valgrind warnings
for "Mismatched free() / delete / delete []" when it allocated a
temporary buffer.
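A sketch of sorted insertion, assuming (hypothetically) that dispatchers are kept in a std::vector ordered by descending priority; PriorityDispatcherSketch, add_head() and add_tail() are illustrative names, not the messenger's real API. lower_bound() finds the first entry whose priority is not higher than the new one, so the new entry lands ahead of its equals; upper_bound() finds the first entry with a strictly lower priority, so it lands behind them.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry: a priority plus a name standing in for Dispatcher*.
struct PriorityDispatcherSketch {
  unsigned priority;
  std::string name;
};

// Assumed ordering: higher priority sorts first.
inline bool higher_priority(const PriorityDispatcherSketch& a,
                            const PriorityDispatcherSketch& b) {
  return a.priority > b.priority;
}

// Insert ahead of existing entries with the same priority.
void add_head(std::vector<PriorityDispatcherSketch>& v, PriorityDispatcherSketch d) {
  auto pos = std::lower_bound(v.begin(), v.end(), d, higher_priority);
  v.insert(pos, std::move(d));
  assert(std::is_sorted(v.begin(), v.end(), higher_priority));
}

// Insert behind existing entries with the same priority.
void add_tail(std::vector<PriorityDispatcherSketch>& v, PriorityDispatcherSketch d) {
  auto pos = std::upper_bound(v.begin(), v.end(), d, higher_priority);
  v.insert(pos, std::move(d));
  assert(std::is_sorted(v.begin(), v.end(), higher_priority));
}
```

The vector stays sorted after every call, so no re-sort (and no temporary buffer of the kind std::stable_sort() may allocate) is needed.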
Zac Dover [Thu, 23 Jan 2025 09:49:26 +0000 (19:49 +1000)]
doc/cephfs: edit disaster-recovery-experts (5 of x)
Put the procedure in the section called "Using an alternate metadata
pool for recovery" into an ordered list, so that it is in a proper
procedure format.
This commit is meant only to break the procedure into steps. The English
language in each of these steps could be improved, but that improvement
will be done after this formatting has been merged and backported.
pg-split-merge uses the ceph daemon command to check the merge,
but it doesn't use the asok path, which causes the check not to
return the correct output. Change the command to use the asok path.
Zac Dover [Tue, 21 Jan 2025 05:53:19 +0000 (15:53 +1000)]
doc/cephfs: edit disaster-recovery-experts (4 of x)
Edit the seventh and final section of
doc/cephfs/disaster-recovery-experts.rst in preparation for adding
deeper explanations of the contexts in which one should use the various
commands listed on that page.
The section edited in this commit is:
* Using an alternate metadata pool for recovery
A future commit might beneficially put this section into the format of
an ordered list. If so, such a commit should only reformat the
content and should not make any changes to the English. It's enough to
verify content or format. Let's not overload our editorial faculties by
forcing ourselves to walk and chew gum at the same time.
Zac Dover [Tue, 7 Jan 2025 06:42:52 +0000 (16:42 +1000)]
doc/cephfs: edit grammar in snapshots.rst
This commit improves the grammar in doc/cephfs/snapshots.rst. The PR
associated with this commit follows from
https://github.com/ceph/ceph/pull/61240, the PR raised by Neeraj Pratap
Singh to introduce information about snapshots into the CephFS
documentation.
Aashish Sharma [Mon, 28 Oct 2024 10:09:52 +0000 (15:39 +0530)]
mgr/dashboard: fix total objects/Avg object size in RGW Overview Page
Until now, we calculated the total number of objects and the average
object size on the RGW Overview Page from the output of the `ceph df`
command. As discussed with the RGW team, this data is not correct,
because an S3 object in RGW can occupy more than one RADOS object. This
PR aims to keep the Overview Page's information in sync with the RGW
bucket page's information.
Zac Dover [Sat, 18 Jan 2025 04:04:14 +0000 (14:04 +1000)]
doc/cephfs: edit disaster-recovery-experts (3 of x)
Edit the fifth and sixth sections of
doc/cephfs/disaster-recovery-experts.rst in preparation for adding
deeper explanations of the contexts in which one should use the various
commands listed on that page.
The sections edited in this commit are:
- MDS Map Reset
- Recovery From Missing Metadata Objects
Zac Dover [Sun, 19 Jan 2025 12:49:52 +0000 (22:49 +1000)]
doc/cephfs: disaster-recovery-experts cleanup
Properly wrap a poorly-formatted paragraph that looks just awful in an
80-column viewport and change MDS to "MDS daemons" where the latter
makes the sentence a lot clearer.
Zac Dover [Fri, 17 Jan 2025 12:33:49 +0000 (22:33 +1000)]
doc/cephfs: edit disaster-recovery-experts (2 of x)
Edit the third and fourth sections of
doc/cephfs/disaster-recovery-experts.rst in preparation for adding
deeper explanations of the contexts in which one should use the various
commands listed on that page.
Follows https://github.com/ceph/ceph/pull/61426
https://tracker.ceph.com/issues/69557
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 4f3a69eb919fc0d99cdf943f095ca3a951c82897)
This commit updates the Lifecycle Settings section of the RGW Config Reference. In particular, it addresses an incorrect suggestion to decrease the number of parallel threads in the workers pool for more aggressive (accelerated) per-bucket lifecycle processing. More aggressive lifecycle processing of a bucket that contains a large number of objects is achieved by increasing, not decreasing, the number of parallel threads.
The current suggestion is misleading.
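To illustrate why more threads speed up per-bucket processing, here is a generic worker-pool sketch. It is not RGW's lifecycle implementation; process_bucket() and its parameters are hypothetical. Workers pull the next entry from a shared atomic cursor, so with more workers a large bucket is drained in fewer rounds, provided the per-object work is not bottlenecked elsewhere.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical: process one bucket's entries with num_workers threads.
// handle() must be safe to call concurrently.
void process_bucket(const std::vector<std::string>& entries,
                    unsigned num_workers,
                    const std::function<void(const std::string&)>& handle) {
  assert(num_workers > 0 && "need at least one worker");
  std::atomic<std::size_t> next{0};   // index of the next unclaimed entry
  std::vector<std::thread> pool;
  pool.reserve(num_workers);
  for (unsigned w = 0; w < num_workers; ++w) {
    pool.emplace_back([&] {
      // Each worker claims entries one at a time until the list is exhausted.
      for (std::size_t i = next.fetch_add(1); i < entries.size();
           i = next.fetch_add(1)) {
        handle(entries[i]);
      }
    });
  }
  for (auto& t : pool) t.join();
}
```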
Zac Dover [Thu, 16 Jan 2025 11:51:46 +0000 (21:51 +1000)]
doc/cephfs: edit disaster-recovery-experts
Edit the first two sections of doc/cephfs/disaster-recovery-experts.rst
in preparation for adding deeper explanations of the contexts in which
one should use the various commands listed on that page.
Adam Kupczyk [Fri, 10 Jan 2025 08:26:54 +0000 (08:26 +0000)]
os/bluestore: Fix BlueFS::truncate()
`struct bluefs_fnode_t` contains a vector `extents` and a vector
`extents_index`, which is a log2 seek cache.
Until truncate() was modified, we never removed extents from files.
The modified truncate() did not update extents_index.
For example, a file with 10 extents, when truncated to 0, will have
0 extents and 10 extents_index entries.
After some data is written to the file, it will have
1 extent and 11 extents_index entries.
Now, `bluefs_fnode_t::seek` will binary search extents_index;
let's say it locates the seek position at item #3.
It will then jump from extent #0 (which exists) to extent #3, which
does not exist at all.
The worst part is that the code then carries on with an invalid
iterator, as #3 != extents.end().
There are three parts to the fix:
1) an assert in `bluefs_fnode_t::seek` to protect against
jumping outside extents
2) code in BlueFS::truncate to sync up `extents_index` with `extents`
3) relaxing an assert in _replay to give a way out of cases
where an incorrect "offset 12345" (12345 is the file size) instead of
"offset 20000" (the space occupied by allocations) was written to the log.
Fixes: https://tracker.ceph.com/issues/69481
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit 7f3601089d41bfc23f530c7bf3fb7efad2d055ec)
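The invariant the truncate fix restores can be shown with a simplified model. FnodeSketch, Extent and their members are hypothetical simplifications of bluefs_fnode_t (here extents_index[i] is assumed to hold the logical offset at which extents[i] starts); this is not the BlueFS code. truncate() resizes extents_index together with extents, and seek() asserts that the index it computes stays inside extents.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified file node.
struct Extent { std::uint64_t length; };

struct FnodeSketch {
  std::vector<Extent> extents;
  std::vector<std::uint64_t> extents_index;  // start offset of each extent
  std::uint64_t size = 0;                    // sum of extent lengths

  void append_extent(std::uint64_t len) {
    extents_index.push_back(size);
    extents.push_back({len});
    size += len;
  }

  // Binary search the seek cache; return the extent holding `off` and the
  // offset inside that extent via *x_off.
  std::size_t seek(std::uint64_t off, std::uint64_t* x_off) const {
    assert(off < size);
    auto it = std::upper_bound(extents_index.begin(), extents_index.end(), off);
    std::size_t i = static_cast<std::size_t>(it - extents_index.begin()) - 1;
    assert(i < extents.size());  // never jump past the real extents
    *x_off = off - extents_index[i];
    return i;
  }

  // Shrink to new_size, trimming/dropping extents and keeping the cache in sync.
  void truncate(std::uint64_t new_size) {
    assert(new_size <= size);
    std::uint64_t pos = 0;
    std::size_t keep = 0;
    while (keep < extents.size() && pos < new_size) {
      if (pos + extents[keep].length > new_size) {
        extents[keep].length = new_size - pos;  // trim the last kept extent
      }
      pos += extents[keep].length;
      ++keep;
    }
    extents.resize(keep);
    extents_index.resize(keep);  // the sync step a stale cache would lack
    size = new_size;
  }
};
```

Without the extents_index.resize() line, truncating a 10-extent file to 0 and appending one extent would leave 11 index entries against 1 extent, the state the commit message describes.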
Adam Kupczyk [Fri, 10 Jan 2025 10:07:18 +0000 (10:07 +0000)]
os/bluestore: bluefs unittest for truncate bug
Unit test showing two different flavours of problems:
1) bluefs log corruption
2) bluefs sigsegv
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
(cherry picked from commit f2b5e2fa0a9274c1667fccafa597fff9be7a74b1)
+ fixes for add_block_device
+ fix for bad usage of std::string's fill constructor
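The message does not say which call misused the constructor. As a hypothetical illustration: std::string's fill constructor takes (count, ch), and swapping the arguments still compiles, because the char converts to a count and the count narrows to a char. make_padding_wrong() and make_padding_right() are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// BUG (illustrative): 'x' (value 120) becomes the count, and n is narrowed
// to a char, so this yields 120 copies of char(n) instead of n 'x's.
std::string make_padding_wrong(std::size_t n) {
  return std::string('x', n);
}

// Correct: fill constructor is string(size_type count, char ch).
std::string make_padding_right(std::size_t n) {
  return std::string(n, 'x');
}
```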