Ilya Dryomov [Fri, 20 May 2022 12:05:03 +0000 (14:05 +0200)]
qa/suites/rbd: disable workunit timeout for dynamic_features_no_cache
The I/O workload in this test is xfstests (qa/run_xfstests_qemu.sh),
which in every other subsuite (e.g. qemu/workloads/qemu_xfstests.yaml)
isn't subject to any timeout other than the global max_job_time limit.
But here, there is a parallel "op" workload defined as a workunit.
The workunit task has a default timeout of 3 hours which is effectively
imposed on the entire job. In the "rbd cache = false" configuration,
it's sometimes exceeded.
It doesn't really thrash anything, just repeatedly restarts the
workload on top of a dirty cache file. rbd_pwl_cache_recovery is
more on point and gets covered by existing CODEOWNERS.
Currently, the following transaction exec sequence would lead to
loss of backref:
1. Trans `A` merges an alloc backref for extent `X`
2. Trans `B` adds a release backref for extent `X` to the backref cache,
during which it finds an in-cache alloc backref for extent `X` and
decides not to add the release backref to the cache
3. Trans `A` commits
In the above sequence, the release backref for extent `X` is lost.
This is a regression introduced when we tried to optimize the backref cache.
This commit fixes the issue by caching inflight backrefs in a multiset:
alloc/release ops that happen on the same paddr are queued in the order
in which they happen. When doing gc, all those backrefs are merged.
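The queue-then-merge idea can be sketched roughly as follows. This is a
minimal Python illustration, not the seastore code: the class
`InflightBackrefs` and its methods `add` and `gc_merge` are hypothetical
names, and a paddr is modelled as a plain dictionary key.

```python
from collections import defaultdict


class InflightBackrefs:
    """Hypothetical stand-in for the in-flight part of a backref cache."""

    def __init__(self):
        # paddr -> ops in arrival order (a multiset keyed by paddr)
        self._ops = defaultdict(list)

    def add(self, paddr, op):
        if op not in ("alloc", "release"):
            raise ValueError(op)
        # Always queue the op; a release is never dropped just because
        # an alloc for the same paddr is already cached.
        self._ops[paddr].append(op)

    def gc_merge(self):
        """Replay the queued ops per paddr and return the live paddrs."""
        live = set()
        for paddr, ops in self._ops.items():
            for op in ops:
                if op == "alloc":
                    live.add(paddr)
                else:
                    live.discard(paddr)
        self._ops.clear()
        return live


cache = InflightBackrefs()
cache.add("X", "alloc")    # trans A
cache.add("X", "release")  # trans B, queued behind the alloc
cache.add("Y", "alloc")
live = cache.gc_merge()    # X is released, Y stays live
```

Because both ops on `X` are kept in order, the merge sees the release
after the alloc and the backref for `X` is not left dangling.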
Tatjana Dehler [Thu, 7 Jul 2022 15:21:14 +0000 (17:21 +0200)]
mgr/dashboard: prevent alert redirect
Prevent Alertmanager alerts from being redirected to the active mgr
dashboard instance. There are two reasons for it:
1. It doesn't bring any additional benefit. The Alertmanager config
includes all available mgr instances - active and passive ones. In
case of an alert, it will be sent to all of them. It ensures that
the active mgr dashboard will receive the alert in any case.
2. The redirect URL includes the mgr IP and NOT the FQDN. This leads
to issues in environments where an SSL certificate is configured and
matches the FQDNs, only.
Fixes: https://tracker.ceph.com/issues/56401
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
Yin Congmin [Fri, 7 Jan 2022 07:03:44 +0000 (15:03 +0800)]
qa/tasks: add thrash test for persistent write log cache
Add a thrash test for the persistent write log cache. It runs rbd bench
on the persistent write log cache, thrashes rbd bench, and tests the
recovery function of the persistent write log cache.
crimson/os/seastore/cache: fine-grained lru cache control with GC
GC transactions are not driven by user behavior, so extent reads
issued by GC transactions don't satisfy the temporal locality
principle. These extents should not be added to the LRU cache.
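A rough sketch of the policy, in Python rather than the actual crimson
C++: the class `ExtentLRU` and the `from_gc` flag are hypothetical names
used only for illustration, and extents are reduced to opaque values
keyed by paddr.

```python
from collections import OrderedDict


class ExtentLRU:
    """Hypothetical LRU that ignores reads issued by GC transactions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._lru = OrderedDict()  # oldest entry first

    def on_read(self, paddr, extent, from_gc=False):
        if from_gc:
            # GC reads say nothing about future user access, so they
            # neither insert into the LRU nor evict anything from it.
            return
        self._lru[paddr] = extent
        self._lru.move_to_end(paddr)
        while len(self._lru) > self.capacity:
            self._lru.popitem(last=False)

    def cached(self):
        return list(self._lru)


lru = ExtentLRU(capacity=2)
lru.on_read(1, "a")
lru.on_read(2, "b")
lru.on_read(3, "c", from_gc=True)  # does not evict paddr 1
```

The design point is that a GC sweep touching many cold extents cannot
push user-hot extents out of the cache.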
Image contexts are reopened even though we pass the context as an
argument. This commit changes that so you can forget about reopening
an rbd image context again.
Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
mgr/dashboard: add rbd list search and disable sorting
- Disable sorting in each column because it will not be possible to
sort with this pagination implementation.
- Add search capabilities to the rbd list pagination endpoint.
Signed-off-by: Pere Diaz Bou <pdiazbou@redhat.com>
Yin Congmin [Fri, 24 Jun 2022 16:11:18 +0000 (00:11 +0800)]
cmake: add ndctl and daxctl to build pmdk
To enable the pmem character device, pass ndctl=y when building the
pmdk library if WITH_BLUESTORE_PMEM is ON. With find_ndctl and
find_daxctl added, the dependency packages required by
WITH_BLUESTORE_PMEM and WITH_RBD_RWL diverge further, so the two are
now handled separately. libpmem has no version requirement; libpmemobj
requires version >=1.8.
Yin Congmin [Fri, 13 May 2022 12:47:07 +0000 (20:47 +0800)]
cmake: add findndctl and finddaxctl function
To support pmem character devices in bluestore via the libpmem built by
Ceph itself, we need to enable the daxctl and ndctl dependencies. Add
the installation of ndctl and the logic to find it. The ndctl and daxctl
libraries must be version >63; "apt-get install" satisfies this on
Ubuntu Focal.
The installation of ndctl-devel in ceph.spec.in has not been verified.
This PR adds a section to the Developer Guide chapter
"Essentials" that explains what Dependabot is. This
section is adapted from an email from Ernesto Puerta
to the CLT that was sent on 08 Jul 2022.
luo rixin [Mon, 11 Jul 2022 02:28:58 +0000 (10:28 +0800)]
tools/crimson/perf_crimson_msgr: init ConfigProxy when perf_crimson_msgr starts running
While constructing socket connections, some config items are needed
(for example, `ceph::msgr::v2::FrameAssembler` needs the value of
`common::local_conf()->ms_crc_data`). Accessing
`common::local_conf()->ms_crc_data` without an initialized ConfigProxy
causes a SIGSEGV, so initialize ConfigProxy before perf_crimson_msgr
starts running.
Fixes: https://tracker.ceph.com/issues/56520
Signed-off-by: luo rixin <luorixin@huawei.com>
TransactionManager::get_extents_if_live should return a list of
extents that are located in range paddr~len. When SegmentCleaner
invokes get_extents_if_live, the target extent may have been split into
multiple pieces by another transaction, so searching with only the
paddr as the key misses the other pieces that need to be rewritten.
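The difference between the two lookups can be illustrated with a small
Python sketch. It is not the TransactionManager code: the helpers
`extents_by_key` and `extents_in_range` are hypothetical, extents are
modelled as `(start, length)` tuples, and "located in range" is assumed
here to mean "overlaps the half-open range".

```python
def extents_by_key(extents, paddr):
    """Old behaviour (simplified): match only the extent starting at paddr."""
    return [e for e in extents if e[0] == paddr]


def extents_in_range(extents, paddr, length):
    """Return every extent overlapping [paddr, paddr + length)."""
    end = paddr + length
    return [(start, ln) for (start, ln) in extents
            if start < end and start + ln > paddr]


# An extent originally at 100~30 that another transaction split in three,
# plus an unrelated neighbour at 130~10.
extents = [(100, 10), (110, 10), (120, 10), (130, 10)]

only_first = extents_by_key(extents, 100)
all_pieces = extents_in_range(extents, 100, 30)
```

With the key lookup only `(100, 10)` is found, while the range lookup
returns all three pieces and leaves the neighbour at 130 out.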
Signed-off-by: Zhang Song <zhangsong325@gmail.com>
This check was added in commit ecd3778a6f9a ("rbd-mirror: ensure that
the last non-primary snapshot cannot be pruned") as an additional
safeguard against pruning an incomplete non-primary snapshot in case
there is no predecessor mirror snapshot. However, it still fires if the
predecessor is there but happens to be a primary demotion snapshot.
A bogus "incomplete local non-primary snapshot" error is reported and
the replayer gets stuck.
Remove completed_non_primary_snapshots_exist tracking as the presence
of the predecessor in the incomplete non-primary snapshot pruning arm
is already ensured by "m_local_snap_id_start > 0" condition.
Incomplete non-primary snapshot handling is bifurcated depending
on whether any data objects have been copied. If no data objects
have been copied, an incomplete non-primary snapshot is assumed to
be malformed and gets pruned; the sync is restarted from scratch.
cmake: link librados applications against ceph-common
to address link failures like:
[100%] Linking CXX executable ../../../bin/unittest_global_doublefree
/opt/rh/gcc-toolset-12/root/usr/bin/ld: /opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/libstdc++_nonshared.a(sstream-inst80.o): undefined reference to symbol '_ZTVNSt7__cxx1119basic_ostringstreamIwSt11char_traitsIwESaIwEEE@@GLIBCXX_3.4.21'
/opt/rh/gcc-toolset-12/root/usr/bin/ld: /usr/lib64/libstdc++.so.6: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
this happens when using gcc-toolset to build the tree,
because neither librados.so nor libcephfs exposes libstdc++ symbols
to executables linking against it, while CMake uses "c++" to link
C++ executables. the "c++" executable that comes from GTS links the C++
executables against
/opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/libstdc++.so,
which in turn is a ld script:
```
$ cat /opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/libstdc++.so
/* GNU ld script
Use the shared library, but some functions are only in
the static library, so try that secondarily. */
OUTPUT_FORMAT(elf64-x86-64)
INPUT ( /usr/lib64/libstdc++.so.6 -lstdc++_nonshared )
```
but the thing is, stdc++_nonshared references some symbols
provided by libstdc++.so.6, and it is listed before it. that's
why "ld" is not able to resolve the referenced symbols used by
the executable, despite that they are provided by libstdc++ in
this case.
in this change, ceph-common is added to the linkage of executables
linked against librados and/or libcephfs, even if the executables
in question do not reference ceph-common symbols. unlike librados,
libcephfs and librgw, ceph-common is an internal library, which does
not hide *any* symbols from its consumers, so it is also able to provide
symbols from the C++ standard library it links against. so, in our case,
we can link the C++ executables against ceph-common for accessing
the C++ standard library. the reason why we don't link against libstdc++
explicitly is that we should leave this to the C++ compiler instead of
referencing a specific C++ standard library explicitly by its name.
what if the user wants to link against libc++ instead of libstdc++?
another fix could be to remove '-Wl,--as-needed' linker options
from the command line linking the librados applications, so the linker
does not ignore the symbols from libstdc++ when resolving the ones
referenced by stdc++_nonshared, but that would be complicated.
please note, linking against ceph-common does not change the linkage
of
* Ceph executables compiled using non-gcc-toolset toolchain, because we
always pass '-Wl,--as-needed' to "c++" when linking executables,
so "ld" should be able to drop ceph-common even if we instruct it
to link against ceph-common. so it would be a no-op in this case.
* 3rd party librados executables compiled using non-gcc-toolset toolchain,
but linked against librados compiled using gcc-toolset toolchain.
because they still link against the /usr/lib64/libstdc++.so.6, when
these executables are compiled and linked. and librados is always
able to access libceph-common. so librados is safe.