Expose rbd_default_clone_format option which has a fairly comprehensive
description (much more verbose than most other options, anyway). This
should help with understanding the difference between clone v1 and v2.
test/librbd: Remove crimson skip from TestDeepCopy
The TestDeepCopy.Stress and TestDeepCopy.Stress_SmallerDstObjSize tests
were previously skipped for the crimson store. This commit removes the
SKIP_IF_CRIMSON() calls, indicating that the tests should now pass with
the crimson osd.
chungfengz [Thu, 6 Nov 2025 09:46:51 +0000 (09:46 +0000)]
bluestore/BlueFS: fix bytes_written_slow counter with aio_write
The bytes_written_slow performance counter was incorrectly reporting
0 when using async I/O.
When aio_write() is called with a bufferlist, it uses claim_append()
to transfer ownership of the buffer to the aio structure, leaving the
source bufferlist empty. Using t.length() after aio_write() returns 0
instead of the actual bytes written.
Fix by using the pre-calculated x_len value which contains the actual
write size and is not affected by the buffer ownership transfer.
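
A minimal sketch of the pitfall described above, assuming toy stand-ins
rather than Ceph's real types: ToyBuffer, ToyAio and submit_and_count are
hypothetical names, and only the claim_append() ownership transfer and the
x_len idea come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a buffer list; this is not Ceph's bufferlist.
struct ToyBuffer {
  std::string data;
  std::size_t length() const { return data.size(); }
  // Move the contents of `other` onto the end of this buffer,
  // leaving `other` empty (the ownership transfer described above).
  void claim_append(ToyBuffer& other) {
    data += other.data;
    other.data.clear();
  }
};

// Hypothetical stand-in for an aio request that takes over the payload.
struct ToyAio {
  ToyBuffer payload;
  void aio_write(ToyBuffer& src) { payload.claim_append(src); }
};

// Returns the byte count that should be added to the counter.
std::size_t submit_and_count(ToyAio& aio, ToyBuffer& t) {
  const std::size_t x_len = t.length();  // captured before ownership moves
  aio.aio_write(t);
  assert(t.length() == 0);  // reading t.length() here would report 0
  return x_len;
}
```

The length is captured before the write is submitted, so the counter no
longer depends on the state of the source buffer afterwards.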
The with_obc() function acquires a lock before invoking the
lambda it wraps. Previously the lambda itself called send_to_osd(),
which returns a future to with_obc(). If that future is not resolved
immediately, a response can arrive and trigger
handle_pull_response(), which attempts to acquire an exclusive lock.
Because the future has not yet been returned to with_obc(), the original
lock is still held by with_obc(), so handle_pull_response() hits
an assertion failure and the OSD crashes.
Solution: Move send_to_osd() call outside with_obc lambda so that
the lock is released before handle_pull_response() is triggered.
test/client: When testing large io, consider fscrypt
When testing large io sizes and clamping that io, consider
fscrypt max io size. This max io size should be a multiple
of 4K (fscrypt block size), but not to exceed INT_MAX.
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
client: Use nearest fscrypt block when clamping max io size
A max io size can currently be up to INT_MAX. If it is greater,
the size is clamped to INT_MAX. This conflicts with fscrypt io
operations: an fscrypt op needs to read a whole fscrypt block.
The fscrypt block size is 4K, and INT_MAX % 4K is not equal
to 0. Therefore, get the nearest multiple of 4K to INT_MAX that
does not go over. In the fscrypt case, this value will be used
for clamping max io size.
Fixes: https://tracker.ceph.com/issues/73346
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
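
The clamping arithmetic from the commit above can be sketched as follows.
This is an illustration only: kFscryptBlockSize, round_down and
clamp_io_size are hypothetical names and not the client's actual code. The
4K block size and the INT_MAX limit are taken from the commit message.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// fscrypt block size of 4 KiB, as stated in the commit message.
const std::uint64_t kFscryptBlockSize = 4096;

// Largest multiple of `block` that does not exceed `limit`.
std::uint64_t round_down(std::uint64_t limit, std::uint64_t block) {
  assert(block > 0);
  return limit - (limit % block);
}

// Hypothetical helper: clamp an io size to INT_MAX, or to the nearest
// block multiple below INT_MAX when fscrypt is in use.
std::uint64_t clamp_io_size(std::uint64_t size, bool fscrypt) {
  const std::uint64_t int_max = static_cast<std::uint64_t>(INT_MAX);
  const std::uint64_t max =
      fscrypt ? round_down(int_max, kFscryptBlockSize) : int_max;
  return size > max ? max : size;
}
```

With a 32-bit int, INT_MAX is 2147483647 and INT_MAX % 4096 is 4095, so the
fscrypt limit in this sketch is 2147479552.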
client: Do not expose ceph_fscrypt_key_identifier in api
The libcephfs API call add_fscrypt_key exposes an internal fscrypt
data structure. This is because a hash keyid (of the master key) is used
for calls such as remove_fscrypt_key. Instead of using this structure,
use a char array to obtain keyid.
Fixes: https://tracker.ceph.com/issues/63293
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
Fix warnings/errors in ceph API tests, present in various files,
that were introduced by the fscrypt feature
src/client/FSCrypt.cc:90:6: error: variable 'olen' set but not used [-Werror,-Wunused-but-set-variable]
90 | int olen = 0;
| ^
src/client/FSCrypt.cc:91:6: error: variable 'line' set but not used [-Werror,-Wunused-but-set-variable]
91 | int line = 0;
| ^
src/client/FSCrypt.cc:945:2: error: is this the way to do it? [-Werror,-W#warnings]
945 | #warning is this the way to do it?
src/client/Client.cc:11850:2: error: read holes [-Werror,-W#warnings]
11850 | #warning read holes
| ^
src/client/Client.cc:11855:2: error: implement file read here [-Werror,-W#warnings]
11855 | #warning implement file read here
| ^
src/client/Inode.cc:847:2: error: need to make sure that we do not skip entire subtree somehow [-Werror,-W#warnings]
847 | #warning need to make sure that we do not skip entire subtree somehow
| ^
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
Fix warnings/errors in ceph API tests that are present in FSCrypt.cc
src/client/FSCrypt.cc:90:6: error: variable 'olen' set but not used [-Werror,-Wunused-but-set-variable]
90 | int olen = 0;
| ^
src/client/FSCrypt.cc:91:6: error: variable 'line' set but not used [-Werror,-Wunused-but-set-variable]
91 | int line = 0;
| ^
src/client/FSCrypt.cc:945:2: error: is this the way to do it? [-Werror,-W#warnings]
945 | #warning is this the way to do it?
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
Add fscrypt dummy encryption to client. This will allow
for mounting a cephfs volume without providing any fscrypt
information. This will allow for more straightforward setup
for development and test suites.
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
Marcus Watts [Sat, 28 Jun 2025 00:56:05 +0000 (20:56 -0400)]
libcephfs: ll_set_fscrypt_policy_v2 - use in->dirstat
Better check for empty directory.
It turns out in->dirstat contains a count of files and subdirectories
from a directory, so all we have to do is make sure that's valid.
Rishabh Dave [Wed, 16 Jul 2025 16:04:18 +0000 (21:34 +0530)]
client: in fcopyfile(), update len to read only leftover fragment
fcopyfile() reads 1 MiB of data every time, but when a fragment smaller
than 1 MiB is left it still reads 1 MiB of data, so the condition
"off == size" is never met. This leads to an infinite loop which
continues to write until CephFS becomes full.
Resolves: rhbz#2379716
Fixes: https://tracker.ceph.com/issues/72238
Signed-off-by: Rishabh Dave <ridave@redhat.com>
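
A minimal sketch of the loop shape the fcopyfile() commit above describes.
It uses in-memory vectors instead of file I/O, and copy_in_chunks and
kChunk are hypothetical names. The point shown is only that the read
length is limited to the leftover fragment so the offset lands exactly on
the size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// 1 MiB per iteration, as in the commit message.
constexpr std::size_t kChunk = 1 << 20;

// Copies `src` into `dst` chunk by chunk and returns the iteration count.
std::size_t copy_in_chunks(const std::vector<char>& src,
                           std::vector<char>& dst) {
  const std::size_t size = src.size();
  std::size_t off = 0;
  std::size_t iterations = 0;
  dst.clear();
  while (off < size) {
    // Read only what is left when less than a full chunk remains.
    const std::size_t len = std::min(kChunk, size - off);
    const auto first = src.begin() + static_cast<std::ptrdiff_t>(off);
    dst.insert(dst.end(), first, first + static_cast<std::ptrdiff_t>(len));
    off += len;
    ++iterations;
  }
  assert(off == size);  // the termination condition is met exactly
  return iterations;
}
```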
During an fscrypt write, a read may be needed to ensure the changed
portion of the file is merged with an existing data block. There is no
need to read when the write is aligned to the fscrypt block size and
spans one or more whole blocks.
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
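
The alignment check described in the commit above could look roughly like
the sketch below. needs_read_before_write and kBlock are hypothetical
names, the 4 KiB block size is an assumption, and the sketch ignores file
size and holes, which real code would also have to consider.

```cpp
#include <cstdint>

// Assumed fscrypt block size of 4 KiB.
const std::uint64_t kBlock = 4096;

// Returns true when a write of `len` bytes at offset `off` only partially
// covers its first or last block, so existing data must be read and merged.
bool needs_read_before_write(std::uint64_t off, std::uint64_t len) {
  if (len == 0) {
    return false;  // nothing is written, nothing to merge
  }
  const bool start_aligned = (off % kBlock) == 0;
  const bool end_aligned = ((off + len) % kBlock) == 0;
  return !(start_aligned && end_aligned);
}
```

For example, a write of 8192 bytes at offset 0 covers two whole blocks and
needs no read, while a write of 100 bytes at offset 4096 does.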
Removed ifdef for a failure we encountered during rebase against
case sensitive feature
-https://github.com/ceph/ceph/pull/61137#discussion_r2006324762w
Add debug dout when entering WriteEncMgr::read
-https://github.com/ceph/ceph/pull/61137#discussion_r2008140457
Add comment to various lines
-https://github.com/ceph/ceph/pull/61137#discussion_r2006301120
-https://github.com/ceph/ceph/pull/61137#discussion_r2006247613
-https://github.com/ceph/ceph/pull/61137#discussion_r2006251232
During write_success mark FILE_WR as dirty
-https://github.com/ceph/ceph/pull/61137#discussion_r2008210365
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
client: During fscrypt rmw (write) use correct read type
During fscrypt rmw, use the internal Client::_read to utilize
correct buffered or non-buffered reads based on client-wide
options. For example, if client_oc = false, use only
non-buffered reads in rmw.
Fixes: https://tracker.ceph.com/issues/72143
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
In the fscrypt decryption code path, ensure that when a data block
is hit while holes are present in adjacent blocks,
we exit hole traversal and continue on to decrypt the block.
Fixes: https://tracker.ceph.com/issues/71602
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
client, test: Remove FS_IOC_GETFLAGS and STATX_ATTR_ENCRYPTED
Remove the previous work done to support reporting fscrypt encryption
via FS_IOC_GETFLAGS, which changes the structure of the statx ABI.
The removal is due to backward compatibility issues.
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
client: Skip fscrypt_last_block if in non-fscrypt mode
Skip reading and sending fscrypt_last_block if client_fscrypt_as
is false during do_setattr. Without the key, fscrypt truncate is
not possible on fscrypt block boundary.
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
When looking up the effective_size while the client_fscrypt_as
option is false, show the inode size value. This will allow for
reading raw encrypted data when no key is provided.
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
Snapshot names are visible within the .snap directory
as dir entries. They can be created by a client that
has an fscrypt key present and also by the manager, which
does not have any key. While the client with the key
can create an encrypted name, the manager cannot.
Standardize these semantics on the behavior common
to both.
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
During unwrap name, get_decrypted_fname accepts dname/b64 name
and altname as parameters. If altname holds a value, the
plaintext name is built from altname, and dname/b64 name is
irrelevant. If altname is empty, the name is built from the
b64 name.
Fixes: https://tracker.ceph.com/issues/70995
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
client: Add additional case for fscrypt enabled setattr
During setattr in the fscrypt case, one of two cases applies:
1. A logical size is provided and then a vector must be populated.
2. A request from setxattr is received and fscrypt_file vector
is already set.
Also rework tests when setting fscrypt_file, to use logical sizes.
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
client: When creating WriteEncMgr take into account client_oc
When determining if a write is buffered or not, take into account
the client_oc config. This option allows non-buffered writes when
caps normally used in buffered writes are present.
Fixes: https://tracker.ceph.com/issues/70568
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
client: use path walk and on-the-fly enc/dec for fscrypt
The code before would encrypt/decrypt the dentry and store the result as the
dentry name. This would cause the client to have a different view of the dentry
names compared to the MDS. This created an unnecessary and complex divergence
that requires fixing the name in any code path involving the MDS.
Instead, maintain the same view as before with the MDS. The client uses the new
`Client::path_walk`, `Client::_wrap_name`, and `Client::_unwrap_name`
mechanisms to correctly change from the application's namespace (unencrypted /
case-insensitive names) to the Client/MDS namespace.
The complication here is that the Client now needs to recompute the
encrypted/decrypted name for any path walk. This can and should be mitigated by
memoizing the results of the decryption/encryption. This is particularly
important as we can keep the decrypted names in a separate memory region that
is protected from core dump / trace inspection.
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
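
The memoization suggested in the commit above might be sketched as below.
NameMemo is a hypothetical class, not part of the client, and the
transform callback stands in for whatever encryption or decryption routine
is used. The sketch does not model the protected memory region mentioned
in the message.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache that remembers the result of a name transform so a
// repeated path walk does not recompute it.
class NameMemo {
 public:
  using Transform = std::function<std::string(const std::string&)>;

  explicit NameMemo(Transform transform) : transform_(std::move(transform)) {}

  // Returns the transformed name, computing it only on the first lookup.
  const std::string& get(const std::string& name) {
    auto it = cache_.find(name);
    if (it == cache_.end()) {
      ++misses_;
      it = cache_.emplace(name, transform_(name)).first;
    }
    return it->second;
  }

  // Number of lookups that had to run the transform.
  std::size_t misses() const { return misses_; }

 private:
  Transform transform_;
  std::unordered_map<std::string, std::string> cache_;
  std::size_t misses_ = 0;
};
```

A real cache would also need invalidation, for example when a dentry is
renamed or a key is removed, which this sketch leaves out.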