Kotresh HR [Tue, 16 Aug 2022 11:41:33 +0000 (17:11 +0530)]
mgr/volumes: Remove stale snapshot user metadata
This patch adds the capability to remove the stale snapshot user
metadata while loading the subvolume if it is present. It can't
be done in 'SubvolumeBase.discover' since v1 and v2 snapshot paths
are different. This is done just after the discover before returning
the specific version object.
mgr/volumes: Allow forceful snapshot removal on osd full
When the osd is full, if the snapshot has metadata set, it
can't be removed as user metadata can't be removed when osd
is full. This patch provides a way to remove the snapshot
with 'force' option while keeping the corresponding metadata
which gets removed on subvolume discover when it finds space.
Kotresh HR [Tue, 16 Aug 2022 11:38:16 +0000 (17:08 +0530)]
mgr/volumes: Better handle config file on osd full scenario
The 'metadata_mgr.flush()' used to truncate the config file
before flushing the new config data. This could lead to an
empty config file when there is no space to write new config
data. This patch handles this scenario by writing it to
temporary file and rename it to config file. This would
retain the config file without truncating it.
Also, there are bunch of places which wasn't handling
'MetadataMgrException' because of this. Fixed those.
John Mulligan [Tue, 2 Aug 2022 13:45:59 +0000 (09:45 -0400)]
mgr/volumes: drop pre-python 3.2 version checks
Based on other conversations we believe that there is no need to support
python versions lower than Python 3.6 for pacific and later. This means
it is safe to drop the remaining version checks for python
3.2.
John Mulligan [Mon, 11 Jul 2022 20:44:00 +0000 (16:44 -0400)]
mgr/volumes: a lock to guard against races reading/writing config
Fixes: https://tracker.ceph.com/issues/55583
Use a python threading lock to avoid race conditions where the
config file is being both read and written to at the same time.
Before this change, the content of the config file being parsed could be
'corrupted' by the MetadataManager racing with itself. Along with the
previous two patches, additional logging was added to the mgr code to
produce the simplified version of the mgr log below:
```
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b'[GLOBAL]\nversion = 2\ntype = clone\npath = /volumes/Park/babydino2/c9f773af-5221-49c6-846c-d65c0920ae3f\nstate = pending\n\n[source]\nvolume = cephfs\ngroup = Park\nsubvolume = Jurrasic\nsnapshot = dinodna0\n\n'
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b''
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b'[GLOBAL]\nversion = 2\ntype = clone\npath = /volumes/Park/babydino2/c9f773af-5221-49c6-846c-d65c0920ae3f\nstate = pending\n\n[source]\nvolume = cephfs\ngroup = Park\nsubvolume = Jurrasic\nsnapshot = dinodna0\n\n'
[volumes INFO volumes.fs.operations.versions.metadata_manager] wrote 203 bytes to config b'/volumes/Park/babydino2/.meta'
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b'a0\n\n'
[volumes INFO volumes.fs.operations.versions.metadata_manager] READ: b''
[volumes ERROR volumes.module] Failed _cmd_fs_clone_cancel(clone_name:babydino2, format:json, group_name:Park, prefix:fs clone cancel, vol_name:cephfs) < "":
Traceback (most recent call last):
...
File "/usr/lib64/python3.6/configparser.py", line 1111, in _read
raise e
configparser.ParsingError: Source contains parsing errors: b'/volumes/Park/babydino2/.meta'
[line 13]: 'a0\n'
```
Looking at the above you can see that the log indicates a write to the
config file (of 203 bytes). This happens before the file has finished
reading and thus instead of getting an empty string indicating EOF, it
gets that last four bytes of the new content of the file. The lock
prevents the MetadataManager from both reading and writing the config
file at the same time.
John Mulligan [Tue, 12 Jul 2022 22:33:07 +0000 (18:33 -0400)]
mgr/volumes: write volume metadata with shim class
Add a class that works a bit like a python file object so that we
can simplify the flush function. Providing a file-like object to
the ConfigParser's write function avoids unnecessary copies to
a StringIO object and makes the code easier to read.
With no more uses of StringIO, the StringIO imports are removed.
John Mulligan [Tue, 12 Jul 2022 22:32:54 +0000 (18:32 -0400)]
mgr/volumes: read volume metadata file using read_string
The read_string method, available in Python 3.2 (we assume Python 3.6 as
our current minimum python versino), supports parsing a provided string
for ini-style configuration parameters. Refactoring the reading of the
config file from cephfs into a simple iterator function and then
providing it to the ConfigParser as a single string, allows us to avoid
using StringIO and simplifies the refresh function.
If any clone is in pending or in-progress state then
show these clones in 'fs subvolume snapshot info'
command output. This field only exists if clones are
in pending or in progress state.
Zac Dover [Fri, 12 Aug 2022 21:53:21 +0000 (07:53 +1000)]
doc/rados: add prompts to pools.rst
This commit adds ".. prompt:: bash $"-style prompts to pools.rst.
This brings this file up to the standard established in 2020 when
Kefu added support for the ".. prompt::" directive.
This commit is a part of an initiative to modernize the presentation
of all BASH commands in the RADOS documentation.
The progress of this project can be tracked here:
https://tracker.ceph.com/issues/57108
test/{librbd, rgw}: increase delay between and number of bind attempts
Commit aa7885f7cc41 ("test/{librbd, rgw}: retry when bind fail with
port 0") reduced the frequency of sporadic unit test failures caused
by EADDRINUSE a lot, but not entirely.
Currently, it yields a cumulative sleep of ~9 seconds. Let's increase
that to 1 minute.
test/{librbd, rgw}: retry when bind fail with port 0
there is chance that the bind() call may fail if we have another test
happen to pick the free port picked by operating system. in this case,
we just retry up to 42 times.
in theory, this change does not fully address the racing, but it should
help to alleviate this issue.
qa: filter internal directories in 'subvolumegroup ls' command
Internal directories: '_nogroup', '_index', '_legacy', '_deleting'
1. Internal directories should be filtered in 'subvolmegroup ls' command.
2. Internal directories should not be accepted as a group name.
mgr/volumes: filter internal directories in 'subvolumegroup ls' command
Internal directories: '_nogroup', '_index', '_legacy', '_deleting'
1. Internal directories should be filtered in 'subvolmegroup ls' command.
2. Internal directories should not be accepted as a group name.
Tim Serong [Thu, 21 Jul 2022 05:55:19 +0000 (15:55 +1000)]
cephfs-shell: move source to separate subdirectory
This ensures the package discovery done by python setuptools >= 61
doesn't get confused when building cephfs-shell and cephfs-top.
This commit moves cephfs-shell to a separate "shell" subdirectory,
which is the same approach we've already got with the cephfs-top
tool being in a separate "top" subdirectory.
Tamar Shacked [Sun, 15 May 2022 08:39:22 +0000 (11:39 +0300)]
client: allow overwrites to files with size greater than the max_file_size cfg
Before this change, overwriting from file-offset >= max_file_size config
returns "File too large" (even though the data is being written)
This change allow overwrites as the file size is not further increasing.
Zac Dover [Tue, 30 Aug 2022 11:48:08 +0000 (21:48 +1000)]
doc/start: update documenting-ceph branch names
This PR updates the branch names in the
documenting-ceph.rst file. It gets rid of all references
to the "master" branch, and updates the language to
reflect the state of play in 2022.
inb4: This PR merely removes the most egregious inaccuracies,
the ones that were most readily evident on a cursory perusal.
The full text remains to be carefully read and fitted together
with care.
Lucian Petrut [Fri, 26 Aug 2022 12:54:10 +0000 (12:54 +0000)]
include: fix IS_ERR on Windows
The "long" type uses 32b on x64 Windows platforms, which means
it's not large enough to store a pointer. intptr_t or uintptr_t
should be used instead.
This change fixes include/err.h, using the right types. There was
a previous patch on this topic but unfortunately it didn't address
all the type casts.
This issue was brought up by the unittest_crush test, which recently
started to fail as the CrushWrapper methods use IS_ERR.
cmake: link denc-mod-rgw against Boost::filesystem
to address the runtime link failure.
this change is not cherry-picked from main branch. as, in main branch,
the Boost::filesystem linkage is pulled in by rgw_common, which was
changed to a static library in 43d10b9e44ca50700e9076a47f2c38b360d1d632.
but this change is not included in pacific. so rgw added the linkage
via rgw_libs CMake variable. unfortunately, the lexical scope of this
variable does not not include tools/ceph-dencoder/CMakeLists.txt, so
we have to add this linkage manually here.
Signed-off-by: Tim Serong <tserong@suse.com> Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Ilya Dryomov [Tue, 30 Aug 2022 09:45:44 +0000 (11:45 +0200)]
rbd-mirror: skip setting error code on snapshot replayer shutdown
This is regarding failures in unregister_remote_update_watcher() and
unregister_local_update_watcher(). handle_replay_complete() can't be
called in these cases anymore as it would blindly attempt to unregister
watchers from scratch again. Dropping handle_replay_complete() calls
there means that these failures would only be logged and would not be
surfaced by snapshot replayer. But the only caller ignores them
anyway:
void ImageReplayer<I>::shut_down(int r) {
...
// close the replayer
if (m_replayer != nullptr) {
ctx = new LambdaContext([this, ctx](int r) {
m_replayer->destroy();
m_replayer = nullptr;
ctx->complete(0); <------
});
ctx = new LambdaContext([this, ctx](int r) {
m_replayer->shut_down(ctx);
});
}
Ilya Dryomov [Wed, 24 Aug 2022 10:56:31 +0000 (12:56 +0200)]
rbd-mirror: resume pending shutdown on error in snapshot replayer
If a shutdown is requested, e.g. by update_pool_replayers() because
remote RADOS instance got blocklisted, and Replayer::shut_down() pends
it on completion of current snapshot sync, it gets stuck if replayer
encounters an error in the interim. This is particularly likely in the
blocklist case: a higher layer may detect that client got blocklisted
and request a shutdown first, and then when replayer sees EBLOCKLISTED
in turn, it calls handle_replay_complete() -- which does not resume
a pending shutdown. Because update_pool_replayers() blocks on shutdown
with Mirror::m_lock held, eventually the entire daemon hangs in
perpetuity.
Ilya Dryomov [Sat, 27 Aug 2022 09:09:00 +0000 (11:09 +0200)]
librbd: use actual monitor addresses when creating a peer bootstrap token
Relying on mon_host config option is fragile, as the user may confuse
v1 and v2 addresses, group them incorrectly, etc. Get mon_host value
only as a fallback.
tools/ceph-dencoder: register dencoders in "lib" in dev env
if "CMakeCache.txt" is found in current directory, try to load
dencoder shared libraries from ./lib. this heuristics is used by
`ceph.in` also for relaunching itself to get access to python
bindings.
Kefu Chai [Tue, 3 Aug 2021 12:44:01 +0000 (20:44 +0800)]
tools/ceph-dencoder: register dencoders in plugin
so we can allocate and deallocate dencoders in the shared library,
instead of allocating them in the shared library, while deallocating
them in the executable.
after this change
- the plugin holds the strong references of the dencoders
- the registry holds the plugins and weak references of dencoders
- the dencoder shared libraries calls the method exposed by plugin
to alloc/dealloc the dencoders
this change should address the segfault when compiling with Clang.
FreeBSD ceph-dencoder crashes in the exit() calls, due to
invalid pointer references during the release process of
the loaded libraries.
Often this is signaled by libc reporting:
__cxa_thread_call_dtors: dtr 0x47efc0 from unloaded dso, skipping
The cause for this is different behaviour between FreeBSD and Linux:
https://groups.google.com/g/bsdmailinglist/c/22ncTZAbDp4/m/Dii_pII5AwAJ
_The FreeBSD implementation here looks racy. If one thread dlcloses an
object while another thread is exiting, we can end up calling a
function at an invalid memory address. It also looks as if it may
be possible to unload one library, load another at the same address,
and end up executing entirely the wrong code, which would have some
serious security implications.
The GNU/Linux equivalent of this function locks the DSO in memory
until all references to it have gone away. A call to dlclose() on
GNU/Linux will not actually unload the library until all threads
with destructors in that library have been unloaded. I believe
that this reuses the same reference counting mechanism that
allows the same library to be dlopened and dlclosed multiple times.
de6c8250a6d91403e6d334aeb901bf9720ba40eb added an explicit %dir directive for
a new directory added to the ceph-common package, but -- due to a typo --
neglected to include the "%". As a result, RPM builds started to fail with:
Processing files: ceph-common-17.0.0-2787.gde6c8250.el8.x86_64
error: File must begin with "/": {_libdir}/ceph/denc/
RPM build errors:
File must begin with "/": {_libdir}/ceph/denc/
2d3c6561b4ac1473a728e81c232d7dfe6fc0188c introduced a new library directory
"%{_libdir}/ceph/denc/" in ceph-common but did not explicitly state that it
should be owned by the package. This caused OBS builds to fail as follows:
[ 5515s] ceph-common-17.0.0-2786.1.x86_64.rpm: directories not owned by a package:
[ 5515s] - /usr/lib64/ceph/denc
Kefu Chai [Sat, 27 Mar 2021 16:56:39 +0000 (00:56 +0800)]
tools/ceph-dencoder: build dencoders as plugins
to reduce the memory footprint when linking ceph-dencoder.
* src/tools/ceph-dencoder:
* build dencoders as shared libraries named with the prefix of
"den-mod-". so ceph-dencoder can find them
* install dencoders into $prefix/lib/ceph/denc, so ceph-dencoder
can find them
* only expose "register_dencoders()" function from plugins.
* load plugins in specified directory
* ceph.spec.in: package plugins
* debian: package plugins
qa: Update the qa tests to be compatible with the new structure of 'perf stats' o/p.
test_client_metrics_and_metadataand other tests has been
updated as earlier it was checking according to the old structure
of perf stats o/p, which has been changed in this PR.
Conflicts:
qa/tasks/cephfs/test_mds_metrics.py
Resolved cherry-pick conflicts.
Edited the cherry-picked commit to drop the redundant version of 'test_client_metrics_and_metadata'.
Kefu Chai [Fri, 6 Aug 2021 09:26:16 +0000 (17:26 +0800)]
cmake: fail on unknown attribute
on Clang, the option for detecting unknown attribute is
-Wunknown-attributes, so "-Wattributes -Werror" does not fail the test
when the C compiler is Clang.
in this change, we just turn all warnings into errors.
this should fail the test if the compiler does not understand
`__attribute__((__symver__ ...))`
librados/librados_c: check .symver support using cmake
the __asm__(".asmver ..") is a support provided by the compiler, so
would be better to detect it by either checking the compiler identifer
or just try it out.
in this change, instead of checking the building platform, we check this
feature using check_c_source_compiles().
in future, we could support versioned symbols using function attriubte
or symbol tables or version-script.
on platform where symbol versioning is not supported, we might need to
go with a different approach.
Boris Ranto [Tue, 3 Aug 2021 08:11:58 +0000 (10:11 +0200)]
rpm: Re-enable LTO on supported systems
We can now use LTO when building ceph. The symver issue was fixed by
using the gcc __symver__ attribute. The systems that support it can now
re-enable LTO.
Fixes: https://tracker.ceph.com/issues/40060 Signed-off-by: Boris Ranto <branto@redhat.com>
(cherry picked from commit 381507a31c4740bfc75dff8f13026df89e0ccdf8)