Zac Dover [Fri, 12 Aug 2022 21:53:21 +0000 (07:53 +1000)]
doc/rados: add prompts to pools.rst
This commit adds ".. prompt:: bash $"-style prompts to pools.rst.
This brings this file up to the standard established in 2020 when
Kefu added support for the ".. prompt::" directive.
This commit is a part of an initiative to modernize the presentation
of all BASH commands in the RADOS documentation.
The progress of this project can be tracked here:
https://tracker.ceph.com/issues/57108
test/{librbd, rgw}: increase delay between and number of bind attempts
Commit aa7885f7cc41 ("test/{librbd, rgw}: retry when bind fail with
port 0") reduced the frequency of sporadic unit test failures caused
by EADDRINUSE a lot, but not entirely.
Currently, it yields a cumulative sleep of ~9 seconds. Let's increase
that to 1 minute.
test/{librbd, rgw}: retry when bind fail with port 0
there is a chance that the bind() call may fail if another test
happens to pick the same free port that the operating system picked
for us. in this case, we just retry up to 42 times.
in theory, this change does not fully eliminate the race, but it should
help to alleviate the issue.
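A minimal sketch of the retry idea described in the two commits above, not the code used in the tests. The helper name `retry_bind`, its signature, and the fixed delay between attempts are assumptions made for illustration; the actual delay schedule is not given in the commit messages.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: call try_bind() until it reports success or the
// attempts run out. Returns the 1-based attempt number that succeeded,
// or 0 if every attempt failed.
inline int retry_bind(const std::function<bool()>& try_bind,
                      int max_attempts,
                      std::chrono::milliseconds delay) {
  assert(max_attempts > 0);
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (try_bind()) {
      return attempt;
    }
    if (attempt < max_attempts) {
      // Sleep before the next attempt so that a competing test has time
      // to release or move off the contended port.
      std::this_thread::sleep_for(delay);
    }
  }
  return 0;
}
```

A caller would wrap the "pick a free port with port 0, then bind()" sequence in the callable and treat a return value of 0 as a test failure. Raising either the attempt count or the delay raises the cumulative sleep budget.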
qa: filter internal directories in 'subvolumegroup ls' command
Internal directories: '_nogroup', '_index', '_legacy', '_deleting'
1. Internal directories should be filtered in 'subvolumegroup ls' command.
2. Internal directories should not be accepted as a group name.
mgr/volumes: filter internal directories in 'subvolumegroup ls' command
Internal directories: '_nogroup', '_index', '_legacy', '_deleting'
1. Internal directories should be filtered in 'subvolumegroup ls' command.
2. Internal directories should not be accepted as a group name.
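The two rules above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the real change lives in the mgr/volumes module rather than in C++.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

// Internal directory names, as listed in the commit message.
inline constexpr std::array<std::string_view, 4> kInternalDirs = {
    "_nogroup", "_index", "_legacy", "_deleting"};

inline bool is_internal_dir(std::string_view name) {
  return std::find(kInternalDirs.begin(), kInternalDirs.end(), name) !=
         kInternalDirs.end();
}

// Rule 1 (hypothetical helper): drop internal directories from a listing.
inline std::vector<std::string> list_groups(
    const std::vector<std::string>& entries) {
  std::vector<std::string> out;
  std::copy_if(entries.begin(), entries.end(), std::back_inserter(out),
               [](const std::string& e) { return !is_internal_dir(e); });
  return out;
}

// Rule 2 (hypothetical helper): refuse internal names as group names.
inline bool is_valid_group_name(std::string_view name) {
  return !is_internal_dir(name);
}
```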
Tim Serong [Thu, 21 Jul 2022 05:55:19 +0000 (15:55 +1000)]
cephfs-shell: move source to separate subdirectory
This ensures the package discovery done by python setuptools >= 61
doesn't get confused when building cephfs-shell and cephfs-top.
This commit moves cephfs-shell to a separate "shell" subdirectory,
which is the same approach we've already got with the cephfs-top
tool being in a separate "top" subdirectory.
Tamar Shacked [Sun, 15 May 2022 08:39:22 +0000 (11:39 +0300)]
client: allow overwrites to files with size greater than the max_file_size cfg
Before this change, overwriting from file-offset >= max_file_size config
returns "File too large" (even though the data is being written).
This change allows overwrites, as the file size is not further increasing.
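A sketch of the size check as described above, assuming a write should be rejected only when it would grow the file beyond the configured limit. The function name and parameters are hypothetical and do not mirror the client code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Hypothetical check: returns 0 if the write is allowed, -EFBIG otherwise.
// A write that ends past max_file_size is still allowed when it ends
// within the current file size, because the file does not grow.
// (Overflow of offset + len is ignored in this sketch.)
inline int check_write_size(std::uint64_t offset, std::uint64_t len,
                            std::uint64_t cur_size,
                            std::uint64_t max_file_size) {
  const std::uint64_t end = offset + len;
  if (end > max_file_size && end > cur_size) {
    return -EFBIG;  // "File too large": the write would extend the file
  }
  return 0;
}
```

For example, with a limit of 100 and an existing file of size 300, a 10-byte write at offset 200 is an overwrite and passes, while a 10-byte write at offset 295 would extend the file and is refused.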
Zac Dover [Tue, 30 Aug 2022 11:48:08 +0000 (21:48 +1000)]
doc/start: update documenting-ceph branch names
This PR updates the branch names in the
documenting-ceph.rst file. It gets rid of all references
to the "master" branch, and updates the language to
reflect the state of play in 2022.
inb4: This PR merely removes the most egregious inaccuracies,
the ones that were most readily evident on a cursory perusal.
The full text remains to be read carefully and fitted together.
Lucian Petrut [Fri, 26 Aug 2022 12:54:10 +0000 (12:54 +0000)]
include: fix IS_ERR on Windows
The "long" type uses 32b on x64 Windows platforms, which means
it's not large enough to store a pointer. intptr_t or uintptr_t
should be used instead.
This change fixes include/err.h, using the right types. There was
a previous patch on this topic but unfortunately it didn't address
all the type casts.
This issue was brought up by the unittest_crush test, which recently
started to fail as the CrushWrapper methods use IS_ERR.
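To illustrate why the integer type matters, here is a minimal sketch of error-pointer helpers built on intptr_t/uintptr_t. The names `err_ptr`, `ptr_err`, `is_err`, and the limit of 4095 are assumptions for this example, not a copy of include/err.h.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Assumed upper bound on errno values encoded into a pointer.
constexpr std::intptr_t kMaxErrno = 4095;

// intptr_t is pointer-sized on every platform that provides it, including
// x64 Windows, where "long" is only 32 bits wide.
static_assert(sizeof(std::intptr_t) >= sizeof(void*),
              "intptr_t must be able to hold a pointer");

// Encode a negative errno value as a pointer.
inline void* err_ptr(std::intptr_t err) {
  return reinterpret_cast<void*>(err);
}

// Decode the errno value from an error pointer.
inline std::intptr_t ptr_err(const void* p) {
  return reinterpret_cast<std::intptr_t>(p);
}

// A pointer is an error if it falls in the top kMaxErrno addresses.
inline bool is_err(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) >=
         static_cast<std::uintptr_t>(-kMaxErrno);
}
```

Casting through a 32-bit `long` instead would truncate the upper half of a 64-bit pointer, so the range comparison could misclassify valid pointers.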
cmake: link denc-mod-rgw against Boost::filesystem
to address the runtime link failure.
this change is not cherry-picked from the main branch, because in main
the Boost::filesystem linkage is pulled in by rgw_common, which was
changed to a static library in 43d10b9e44ca50700e9076a47f2c38b360d1d632.
that change is not included in pacific, so rgw added the linkage
via the rgw_libs CMake variable. unfortunately, the scope of this
variable does not include tools/ceph-dencoder/CMakeLists.txt, so
we have to add this linkage manually here.
Signed-off-by: Tim Serong <tserong@suse.com>
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Ilya Dryomov [Tue, 30 Aug 2022 09:45:44 +0000 (11:45 +0200)]
rbd-mirror: skip setting error code on snapshot replayer shutdown
This is regarding failures in unregister_remote_update_watcher() and
unregister_local_update_watcher(). handle_replay_complete() can't be
called in these cases anymore as it would blindly attempt to unregister
watchers from scratch again. Dropping handle_replay_complete() calls
there means that these failures would only be logged and would not be
surfaced by snapshot replayer. But the only caller ignores them
anyway:
void ImageReplayer<I>::shut_down(int r) {
...
// close the replayer
if (m_replayer != nullptr) {
ctx = new LambdaContext([this, ctx](int r) {
m_replayer->destroy();
m_replayer = nullptr;
ctx->complete(0); <------
});
ctx = new LambdaContext([this, ctx](int r) {
m_replayer->shut_down(ctx);
});
}
Ilya Dryomov [Wed, 24 Aug 2022 10:56:31 +0000 (12:56 +0200)]
rbd-mirror: resume pending shutdown on error in snapshot replayer
If a shutdown is requested, e.g. by update_pool_replayers() because
remote RADOS instance got blocklisted, and Replayer::shut_down() pends
it on completion of current snapshot sync, it gets stuck if replayer
encounters an error in the interim. This is particularly likely in the
blocklist case: a higher layer may detect that client got blocklisted
and request a shutdown first, and then when replayer sees EBLOCKLISTED
in turn, it calls handle_replay_complete() -- which does not resume
a pending shutdown. Because update_pool_replayers() blocks on shutdown
with Mirror::m_lock held, eventually the entire daemon hangs in
perpetuity.
Ilya Dryomov [Sat, 27 Aug 2022 09:09:00 +0000 (11:09 +0200)]
librbd: use actual monitor addresses when creating a peer bootstrap token
Relying on mon_host config option is fragile, as the user may confuse
v1 and v2 addresses, group them incorrectly, etc. Get mon_host value
only as a fallback.
tools/ceph-dencoder: register dencoders in "lib" in dev env
if "CMakeCache.txt" is found in the current directory, try to load
dencoder shared libraries from ./lib. this heuristic is also used by
`ceph.in` for relaunching itself to get access to python
bindings.
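The heuristic can be sketched with std::filesystem as below. The function name and the `installed_dir` parameter are hypothetical; the sketch only shows the "CMakeCache.txt present means build tree" decision, not how ceph-dencoder actually loads plugins.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: choose where to look for dencoder plugins.
// If cwd looks like a CMake build directory, use its "lib" subdirectory;
// otherwise fall back to the installed plugin directory.
inline fs::path dencoder_plugin_dir(const fs::path& cwd,
                                    const fs::path& installed_dir) {
  assert(!installed_dir.empty());
  if (fs::exists(cwd / "CMakeCache.txt")) {
    return cwd / "lib";
  }
  return installed_dir;
}
```

A caller would typically pass `fs::current_path()` as `cwd`.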
Kefu Chai [Tue, 3 Aug 2021 12:44:01 +0000 (20:44 +0800)]
tools/ceph-dencoder: register dencoders in plugin
so we can allocate and deallocate dencoders in the shared library,
instead of allocating them in the shared library, while deallocating
them in the executable.
after this change
- the plugin holds the strong references of the dencoders
- the registry holds the plugins and weak references of dencoders
- the dencoder shared libraries call the method exposed by plugin
to alloc/dealloc the dencoders
this change should address the segfault when compiling with Clang.
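The ownership split listed above can be illustrated with std::shared_ptr and std::weak_ptr. This is a sketch under assumptions: the class layout, method names, and the use of smart pointers are chosen for illustration and are not taken from the ceph-dencoder sources.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Dencoder {  // stand-in for a real dencoder object
  std::string name;
};

// The plugin owns its dencoders (strong references), so allocation and
// deallocation both happen on the plugin side.
class Plugin {
 public:
  std::weak_ptr<Dencoder> emplace(const std::string& name) {
    auto d = std::make_shared<Dencoder>(Dencoder{name});
    dencoders_.push_back(d);
    return d;
  }

 private:
  std::vector<std::shared_ptr<Dencoder>> dencoders_;
};

// The registry holds the plugins plus weak references to their dencoders.
class Registry {
 public:
  void load(const std::string& plugin_name,
            const std::vector<std::string>& types) {
    auto plugin = std::make_unique<Plugin>();
    for (const auto& t : types) {
      dencoders_[t] = plugin->emplace(t);
    }
    plugins_[plugin_name] = std::move(plugin);
  }

  // Dropping the plugin destroys its dencoders; weak references expire.
  void unload(const std::string& plugin_name) { plugins_.erase(plugin_name); }

  std::shared_ptr<Dencoder> find(const std::string& type) const {
    auto it = dencoders_.find(type);
    if (it == dencoders_.end()) {
      return nullptr;
    }
    return it->second.lock();  // empty if the owning plugin is gone
  }

 private:
  std::map<std::string, std::unique_ptr<Plugin>> plugins_;
  std::map<std::string, std::weak_ptr<Dencoder>> dencoders_;
};
```

With this split, the registry can never free a dencoder itself; it can only observe whether the owning plugin still keeps it alive.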
FreeBSD ceph-dencoder crashes in the exit() calls, due to
invalid pointer references during the release process of
the loaded libraries.
Often this is signaled by libc reporting:
__cxa_thread_call_dtors: dtr 0x47efc0 from unloaded dso, skipping
The cause for this is different behaviour between FreeBSD and Linux:
https://groups.google.com/g/bsdmailinglist/c/22ncTZAbDp4/m/Dii_pII5AwAJ
The FreeBSD implementation here looks racy. If one thread dlcloses an
object while another thread is exiting, we can end up calling a
function at an invalid memory address. It also looks as if it may
be possible to unload one library, load another at the same address,
and end up executing entirely the wrong code, which would have some
serious security implications.
The GNU/Linux equivalent of this function locks the DSO in memory
until all references to it have gone away. A call to dlclose() on
GNU/Linux will not actually unload the library until all threads
with destructors in that library have been unloaded. I believe
that this reuses the same reference counting mechanism that
allows the same library to be dlopened and dlclosed multiple times.
de6c8250a6d91403e6d334aeb901bf9720ba40eb added an explicit %dir directive for
a new directory added to the ceph-common package, but -- due to a typo --
neglected to include the "%". As a result, RPM builds started to fail with:
Processing files: ceph-common-17.0.0-2787.gde6c8250.el8.x86_64
error: File must begin with "/": {_libdir}/ceph/denc/
RPM build errors:
File must begin with "/": {_libdir}/ceph/denc/
2d3c6561b4ac1473a728e81c232d7dfe6fc0188c introduced a new library directory
"%{_libdir}/ceph/denc/" in ceph-common but did not explicitly state that it
should be owned by the package. This caused OBS builds to fail as follows:
[ 5515s] ceph-common-17.0.0-2786.1.x86_64.rpm: directories not owned by a package:
[ 5515s] - /usr/lib64/ceph/denc
Kefu Chai [Sat, 27 Mar 2021 16:56:39 +0000 (00:56 +0800)]
tools/ceph-dencoder: build dencoders as plugins
to reduce the memory footprint when linking ceph-dencoder.
* src/tools/ceph-dencoder:
* build dencoders as shared libraries named with the prefix of
"denc-mod-", so ceph-dencoder can find them
* install dencoders into $prefix/lib/ceph/denc, so ceph-dencoder
can find them
* only expose "register_dencoders()" function from plugins.
* load plugins in specified directory
* ceph.spec.in: package plugins
* debian: package plugins
qa: Update the qa tests to be compatible with the new structure of 'perf stats' o/p.
test_client_metrics_and_metadata and other tests have been
updated, as they previously checked against the old structure
of the perf stats o/p, which has been changed in this PR.
Conflicts:
qa/tasks/cephfs/test_mds_metrics.py
Resolved cherry-pick conflicts.
Edited the cherry-picked commit to drop the redundant version of 'test_client_metrics_and_metadata'.
Kefu Chai [Fri, 6 Aug 2021 09:26:16 +0000 (17:26 +0800)]
cmake: fail on unknown attribute
on Clang, the option for detecting unknown attribute is
-Wunknown-attributes, so "-Wattributes -Werror" does not fail the test
when the C compiler is Clang.
in this change, we just turn all warnings into errors.
this should fail the test if the compiler does not understand
`__attribute__((__symver__ ...))`
librados/librados_c: check .symver support using cmake
the __asm__(".symver ..") directive is a feature provided by the compiler, so
it would be better to detect it by either checking the compiler identifier
or just trying it out.
in this change, instead of checking the build platform, we check this
feature using check_c_source_compiles().
in future, we could support versioned symbols using function attribute
or symbol tables or version-script.
on platform where symbol versioning is not supported, we might need to
go with a different approach.
Boris Ranto [Tue, 3 Aug 2021 08:11:58 +0000 (10:11 +0200)]
rpm: Re-enable LTO on supported systems
We can now use LTO when building ceph. The symver issue was fixed by
using the gcc __symver__ attribute. The systems that support it can now
re-enable LTO.
Fixes: https://tracker.ceph.com/issues/40060
Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit 381507a31c4740bfc75dff8f13026df89e0ccdf8)
Kefu Chai [Thu, 4 Aug 2022 13:52:43 +0000 (21:52 +0800)]
cmake,debian: install pure python module to deb_system path
in ubuntu 22.04 and debian unstable, the layout (scheme) for system
python module is named "deb_system", the default one is 'posix_local'.
and 'posix_local' installs python modules into paths like
usr/local/lib/python3.10/dist-packages/. hence dh_install fails
when it tries to find the files to be packaged under directory of
usr/lib/python3*/site-packages/.
in this change, the "deb_system" scheme is used if it is available,
falling back to "posix_prefix" for backward compatibility with older
debian (derivative) distros.
also, update the source directories of pure python's installation
from `site-packages` to `*-packages`, to be compatible with ubuntu focal
and ubuntu jammy, as we are now using the specified scheme instead of
the default one.
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 04967404ed682835f81c1a5e51f94d09805d38b3)
Conflicts:
apply the same change to debian/python3-cephfs.install, which
was removed in main branch, but we need to preserve it in pacific.
The addition of unselectable prompts to these three files
completes the work begun in PR#47810 (d8064b4), which sought
to bring dashboard.rst into line with the unselectable prompt
standard introduced by Kefu Chai in 2020.
*Adding fs_name as a field option in perf_stats o/p
*perf stats command incorrect output with non-existing mds_rank filter
If `ceph fs perf stats` runs with a non-existing mds_rank filter,
it still shows `client_metadata` and `global_metrics` for all clients.
Xiubo Li [Wed, 1 Jun 2022 02:32:58 +0000 (10:32 +0800)]
mds: notify the xattr_version to replica MDSes
When a client changes an xattr's value on the auth MDS, the MDS may
drop the increased xattr_version and the new value from the reply
message if no 'Xs' caps are issued to the client along with it.
When the client later wants to get this xattr's value and sends
the request to a replica MDS, the replica MDS still has the old
xattr_version, so the client drops the xattr value because the
xattr_version has not changed.
We need to notify the replica MDSes of the xattr_version together
with the xattrs when notifying the lock state.
Xiubo Li [Thu, 24 Feb 2022 07:56:54 +0000 (15:56 +0800)]
mds/MDLog: use committed seq instead of committing seq
Since commit 9242ce90130, multiple open file table commits can no
longer be submitted at once; they are processed sequentially. So whenever
is_any_committing() is false, the committing seq always equals
the committed seq.