Tim Serong [Thu, 21 Jul 2022 05:55:19 +0000 (15:55 +1000)]
cephfs-shell: move source to separate subdirectory
This ensures the package discovery done by python setuptools >= 61
doesn't get confused when building cephfs-shell and cephfs-top.
This commit moves cephfs-shell to a separate "shell" subdirectory,
which is the same approach we've already got with the cephfs-top
tool being in a separate "top" subdirectory.
Zac Dover [Tue, 30 Aug 2022 11:48:08 +0000 (21:48 +1000)]
doc/start: update documenting-ceph branch names
This PR updates the branch names in the
documenting-ceph.rst file. It gets rid of all references
to the "master" branch, and updates the language to
reflect the state of play in 2022.
inb4: This PR merely removes the most egregious inaccuracies,
the ones that were most readily evident on a cursory perusal.
The full text remains to be carefully read and fitted together
with care.
cmake: link denc-mod-rgw against Boost::filesystem
to address the runtime link failure.
this change is not cherry-picked from main branch. as, in main branch,
the Boost::filesystem linkage is pulled in by rgw_common, which was
changed to a static library in 43d10b9e44ca50700e9076a47f2c38b360d1d632.
but this change is not included in pacific. so rgw added the linkage
via rgw_libs CMake variable. unfortunately, the lexical scope of this
variable does not not include tools/ceph-dencoder/CMakeLists.txt, so
we have to add this linkage manually here.
Signed-off-by: Tim Serong <tserong@suse.com> Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Ilya Dryomov [Tue, 30 Aug 2022 09:45:44 +0000 (11:45 +0200)]
rbd-mirror: skip setting error code on snapshot replayer shutdown
This is regarding failures in unregister_remote_update_watcher() and
unregister_local_update_watcher(). handle_replay_complete() can't be
called in these cases anymore as it would blindly attempt to unregister
watchers from scratch again. Dropping handle_replay_complete() calls
there means that these failures would only be logged and would not be
surfaced by snapshot replayer. But the only caller ignores them
anyway:
void ImageReplayer<I>::shut_down(int r) {
...
// close the replayer
if (m_replayer != nullptr) {
ctx = new LambdaContext([this, ctx](int r) {
m_replayer->destroy();
m_replayer = nullptr;
ctx->complete(0); <------
});
ctx = new LambdaContext([this, ctx](int r) {
m_replayer->shut_down(ctx);
});
}
Ilya Dryomov [Wed, 24 Aug 2022 10:56:31 +0000 (12:56 +0200)]
rbd-mirror: resume pending shutdown on error in snapshot replayer
If a shutdown is requested, e.g. by update_pool_replayers() because
remote RADOS instance got blocklisted, and Replayer::shut_down() pends
it on completion of current snapshot sync, it gets stuck if replayer
encounters an error in the interim. This is particularly likely in the
blocklist case: a higher layer may detect that client got blocklisted
and request a shutdown first, and then when replayer sees EBLOCKLISTED
in turn, it calls handle_replay_complete() -- which does not resume
a pending shutdown. Because update_pool_replayers() blocks on shutdown
with Mirror::m_lock held, eventually the entire daemon hangs in
perpetuity.
Ilya Dryomov [Sat, 27 Aug 2022 09:09:00 +0000 (11:09 +0200)]
librbd: use actual monitor addresses when creating a peer bootstrap token
Relying on mon_host config option is fragile, as the user may confuse
v1 and v2 addresses, group them incorrectly, etc. Get mon_host value
only as a fallback.
tools/ceph-dencoder: register dencoders in "lib" in dev env
if "CMakeCache.txt" is found in current directory, try to load
dencoder shared libraries from ./lib. this heuristics is used by
`ceph.in` also for relaunching itself to get access to python
bindings.
Kefu Chai [Tue, 3 Aug 2021 12:44:01 +0000 (20:44 +0800)]
tools/ceph-dencoder: register dencoders in plugin
so we can allocate and deallocate dencoders in the shared library,
instead of allocating them in the shared library, while deallocating
them in the executable.
after this change
- the plugin holds the strong references of the dencoders
- the registry holds the plugins and weak references of dencoders
- the dencoder shared libraries calls the method exposed by plugin
to alloc/dealloc the dencoders
this change should address the segfault when compiling with Clang.
FreeBSD ceph-dencoder crashes in the exit() calls, due to
invalid pointer references during the release process of
the loaded libraries.
Often this is signaled by libc reporting:
__cxa_thread_call_dtors: dtr 0x47efc0 from unloaded dso, skipping
The cause for this is different behaviour between FreeBSD and Linux:
https://groups.google.com/g/bsdmailinglist/c/22ncTZAbDp4/m/Dii_pII5AwAJ
_The FreeBSD implementation here looks racy. If one thread dlcloses an
object while another thread is exiting, we can end up calling a
function at an invalid memory address. It also looks as if it may
be possible to unload one library, load another at the same address,
and end up executing entirely the wrong code, which would have some
serious security implications.
The GNU/Linux equivalent of this function locks the DSO in memory
until all references to it have gone away. A call to dlclose() on
GNU/Linux will not actually unload the library until all threads
with destructors in that library have been unloaded. I believe
that this reuses the same reference counting mechanism that
allows the same library to be dlopened and dlclosed multiple times.
de6c8250a6d91403e6d334aeb901bf9720ba40eb added an explicit %dir directive for
a new directory added to the ceph-common package, but -- due to a typo --
neglected to include the "%". As a result, RPM builds started to fail with:
Processing files: ceph-common-17.0.0-2787.gde6c8250.el8.x86_64
error: File must begin with "/": {_libdir}/ceph/denc/
RPM build errors:
File must begin with "/": {_libdir}/ceph/denc/
2d3c6561b4ac1473a728e81c232d7dfe6fc0188c introduced a new library directory
"%{_libdir}/ceph/denc/" in ceph-common but did not explicitly state that it
should be owned by the package. This caused OBS builds to fail as follows:
[ 5515s] ceph-common-17.0.0-2786.1.x86_64.rpm: directories not owned by a package:
[ 5515s] - /usr/lib64/ceph/denc
Kefu Chai [Sat, 27 Mar 2021 16:56:39 +0000 (00:56 +0800)]
tools/ceph-dencoder: build dencoders as plugins
to reduce the memory footprint when linking ceph-dencoder.
* src/tools/ceph-dencoder:
* build dencoders as shared libraries named with the prefix of
"den-mod-". so ceph-dencoder can find them
* install dencoders into $prefix/lib/ceph/denc, so ceph-dencoder
can find them
* only expose "register_dencoders()" function from plugins.
* load plugins in specified directory
* ceph.spec.in: package plugins
* debian: package plugins
Kefu Chai [Fri, 6 Aug 2021 09:26:16 +0000 (17:26 +0800)]
cmake: fail on unknown attribute
on Clang, the option for detecting unknown attribute is
-Wunknown-attributes, so "-Wattributes -Werror" does not fail the test
when the C compiler is Clang.
in this change, we just turn all warnings into errors.
this should fail the test if the compiler does not understand
`__attribute__((__symver__ ...))`
librados/librados_c: check .symver support using cmake
the __asm__(".asmver ..") is a support provided by the compiler, so
would be better to detect it by either checking the compiler identifer
or just try it out.
in this change, instead of checking the building platform, we check this
feature using check_c_source_compiles().
in future, we could support versioned symbols using function attriubte
or symbol tables or version-script.
on platform where symbol versioning is not supported, we might need to
go with a different approach.
Boris Ranto [Tue, 3 Aug 2021 08:11:58 +0000 (10:11 +0200)]
rpm: Re-enable LTO on supported systems
We can now use LTO when building ceph. The symver issue was fixed by
using the gcc __symver__ attribute. The systems that support it can now
re-enable LTO.
Fixes: https://tracker.ceph.com/issues/40060 Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit 381507a31c4740bfc75dff8f13026df89e0ccdf8)
Kefu Chai [Thu, 4 Aug 2022 13:52:43 +0000 (21:52 +0800)]
cmake,debian: install pure python module to deb_system path
in ubuntu 22.04 and debian unstable, the layout (scheme) for system
python module is named "deb_system", the default one is 'posix_local'.
and 'posix_local' installs python modules into paths like
usr/local/lib/python3.10/dist-packages/. hence dh_install fails
when it tries to find the files to be packaged under directory of
usr/lib/python3*/site-packages/.
in this change, the "deb_system" scheme is used if it is available,
and fall back to "posix_prefix" to be backward compatible with older
debian (derivative) distros.
also, update the source directories of pure python's installation
from `site-packages` to `*-packages`, to be compatible with ubuntu focal
and ubuntu jammy. as we are now using the specified scheme instead of
the default one.
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 04967404ed682835f81c1a5e51f94d09805d38b3)
Conflicts:
apply the same change to debian/python3-cephfs.install, which
was remove in main branch, but we need to preserve it in pacific.
The addition of unselectable prompts to these three files
completes the work begun in PR#47810 (d8064b4), which sought
to bring dashboard.rst into line with the unselectable prompt
standard introduced by Kefu Chai in 2020.
Xiubo Li [Wed, 6 Apr 2022 00:12:26 +0000 (08:12 +0800)]
ceph-fuse: add dedicated snap stag map for each directory
This will fix the fino colliding bug, which is caused when the
snapid is later than 0xffff.
From mds 'mds_max_snaps_per_dir' option, we can see that the max
snapshots for each directory is 4_K, and in ceph-fuse we have
around 64_K, which is from 0xffff - 2, stags could be used to make
the fake fuse inode numbers for each directory.
Xiubo Li [Thu, 24 Mar 2022 02:01:57 +0000 (10:01 +0800)]
ceph-fuse: return EINVAL if get invalid fino instead of assert
All the snap ids of the finos returned to libfuse from libcephfs
will be recorded in the map of 'stag_snap_map', and will never be
erased before unmounting. So if libfuse passes invalid fino the
ceph-fuse should return EINVAL errno instead of crash itself.
Xiubo Li [Wed, 23 Mar 2022 02:05:32 +0000 (10:05 +0800)]
mds-client: make the fake inos option unchangeable in runtime
If the flags is empty then in option.h in can_update_at_runtime()
it will return true. That means this opetion could be changed in
runtime, which is buggy. Because if this is false, ceph-fuse will
use its own fake inos instead of libcephfs'. If this is changed
during runtime, we will hit inos dosn't exist assert bugs.
Adam King [Wed, 24 Aug 2022 14:36:53 +0000 (10:36 -0400)]
doc/cephadm: fix example for specifying networks for rgw
count_per_host must be used with underscores rather
than dashes to work, you need to pass service_id not
service_name and the option for the port is called
rgw_frontend_port not just "port"
Zac Dover [Tue, 23 Aug 2022 06:59:04 +0000 (16:59 +1000)]
doc/mgr: edit orchestrator.rst
This PR improves the English language in the "Orchestrator CLI"
section of the MGR documentation. It adds a couple of section
headers in order to signpost the information in the document
a bit more than had already been done, but it makes no major
structural changes to the presentation of the information here.
This PR was motivated by feedback from the 2022 Ceph User Survey
in which one of the respondents wrote "better ceph orch documen-
tation".
The final section on this page, "Current Implementation Status",
must be verified by someone who is familiar with the current state
of "ceph orch" and a date stamp should be applied to the top of
the section so that the word "current" has a meaningful referent.
Kefu Chai [Mon, 8 Aug 2022 14:41:17 +0000 (22:41 +0800)]
pybind/mgr/dashboard: do not use distutils.version.StrictVersion
replace `distutils.version.StrictVersion` with
`pkg_resources.parse_version()`
as the former is deprecated, see https://peps.python.org/pep-0632/.
let's use `pkg_resources` instead. this change also addresses
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1010894.
we have this issue when testing with an ubuntu jammy test node.
see https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1967139
Conflicts:
debian/ceph-mgr-dashboard.requires: add the runtime requirement
to debian/control instead.
src/mypy-constrains.txt: not backported (too new for pacific)
src/pybind/mgr/dashboard/requirements.txt: trivial resolution
src/pybind/mgr/tox.ini: not backported (too new for pacific)
as we don't have "jsonpatch" installed for running "make check", "rook"
module always fails to load, and the error message when mgr is
unable to load is misleading and distracting. let's just disabled it.
Kefu Chai [Fri, 12 Aug 2022 05:06:25 +0000 (13:06 +0800)]
mgr/dashboard: bump up more-itertools
before this change, more-itertools tries to import Sequence from
collections, this leads us to failures like:
```
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
__import__(pkg_name)
File
"/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/dashboard/__init__.py",
line 9, in <module>
import cherrypy
File
"/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/dashboard/.tox/openapi-check/lib/python3.10/site-packages/cherrypy/__init__.py",
line 76, in <module>
from . import _cprequest, _cpserver, _cptree, _cplogging, _cpconfig
File
"/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/dashboard/.tox/openapi-check/lib/python3.10/site-packages/cherrypy/_cprequest.py",
line 11, in <module>
from cherrypy import _cpreqbody
File
"/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/dashboard/.tox/openapi-check/lib/python3.10/site-packages/cherrypy/_cpreqbody.py",
line 135, in <module>
import cheroot.server
File
"/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/dashboard/.tox/openapi-check/lib/python3.10/site-packages/cheroot/server.py",
line 96, in <module>
from .workers import threadpool
File
"/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/dashboard/.tox/openapi-check/lib/python3.10/site-packages/cheroot/workers/threadpool.py",
line 20, in <module>
from jaraco.functools import pass_none
File
"/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/dashboard/.tox/openapi-check/lib/python3.10/site-packages/jaraco/functools.py",
line 8, in <module>
import more_itertools
File
"/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/dashboard/.tox/openapi-check/lib/python3.10/site-packages/more_itertools/__init__.py",
line 1, in <module>
from more_itertools.more import * # noqa
File
"/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/dashboard/.tox/openapi-check/lib/python3.10/site-packages/more_itertools/more.py",
line 3, in <module>
from collections import Counter, defaultdict, deque, Sequence
ImportError: cannot import name 'Sequence' from 'collections'
(/usr/lib/python3.10/collections/__init__.py)
ERROR: InvocationError for command
/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/dashboard/.tox/openapi-check/bin/python3
-m dashboard.controllers.docs
/home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr/dashboard/.tox/openapi-check/tmp/openapi.yaml
(exited with code 1)
```
after this change, more-itertools is pin'ed at the latest stable
at the time of writing, which includes the fixes including
https://github.com/more-itertools/more-itertools/commit/30a861bc5a4f53a9ba73923c9048a3632a0f9d18
.
please note, more-itertools dropped python3.3 support. but neither
do us support this python version, so we should be safe.
Patrick Donnelly [Thu, 21 Jul 2022 20:34:05 +0000 (16:34 -0400)]
mon/MDSMonitor: fix check for standby reversion
A standby-replay daemon always has a rank, so this check is completely
wrong. If a beacon from a standby-replay daemon reaches
MDSMonitor::prepare_beacon, it will always be evicted/removed by the
mons. This is rare (usually a reply occurs directly from
MDSMonitor::preprocess_beacon) but can happen in certain circumstances,
like a health warning included in the beacon.
Fixes: https://tracker.ceph.com/issues/56666 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit b2c40cc07faa3751e28e946bc10d9700621db140)
Kefu Chai [Fri, 25 Mar 2022 15:06:46 +0000 (23:06 +0800)]
cmake/modules: avoid using distutils
to address following warning from python 3.9:
<string>:1: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
<string>:1: DeprecationWarning: The distutils.sysconfig module is deprecated, use sysconfig instead
Kefu Chai [Thu, 26 Aug 2021 15:57:00 +0000 (23:57 +0800)]
cmake: use upstream repo for fio
this change partially reverts 10baab3fc8293b8c30ca90a4acd76f70d011f1b5,
but since the fix for C++ build is not included by any tag or branche so
far. let's just use the sha1 for now.
Kefu Chai [Sat, 14 Aug 2021 01:47:36 +0000 (09:47 +0800)]
cmake/modules/BuildFIO: use ceph fork
to pick up the clang build fix. we should use the upstream repo, once the
clang build fix gets merged.
bumping up the fio helps to address following error, as this part was rewritten:
src/test/fio/CMakeFiles/fio_ceph_objectstore.dir/fio_ceph_objectstore.cc.o.d -o src/test/fio/CMakeFiles/fio_ceph_objectstore.dir/fio_ceph_objectstore.cc.o -c ../src/test/fio/fio_ceph_objectstore.cc
In file included from ../src/test/fio/fio_ceph_objectstore.cc:26:
In file included from src/fio/fio.h:18:
In file included from src/fio/thread_options.h:6:
In file included from src/fio/options.h:8:
src/fio/parse.h:128:13: error: arithmetic on a pointer to void
return ret + offset;
~~~ ^
1 error generated.