librbd: bail from schedule_request_lock() if already lock owner
A race condition may be hit if there are multiple pending lock requests
for the same image and pending callbacks. Abort the exclusive lock
process if this client is already the exclusive lock owner.
Fixes: https://tracker.ceph.com/issues/56549
Signed-off-by: Christopher Hoffman <choffman@redhat.com>
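The guard described above can be illustrated with a minimal sketch. It assumes a simplified model: the class `ExclusiveLockSketch`, `handle_lock_acquired()` and the request counter are hypothetical and are not librbd's actual ImageWatcher/ExclusiveLock code. The only behavior taken from the commit is that `schedule_request_lock()` bails out once this client already owns the lock.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, simplified model of the fix (not librbd's real classes).
// A stale pending callback that calls schedule_request_lock() after the
// lock was acquired must not restart the acquire process.
class ExclusiveLockSketch {
public:
  // Returns true if a new lock request was scheduled, false if we bailed.
  bool schedule_request_lock() {
    std::lock_guard<std::mutex> guard(lock_);
    if (is_lock_owner_) {
      // Already the exclusive lock owner: nothing to request.
      return false;
    }
    ++requests_scheduled_;
    return true;
  }

  // Called when the lock has been acquired (hypothetical hook).
  void handle_lock_acquired() {
    std::lock_guard<std::mutex> guard(lock_);
    is_lock_owner_ = true;
  }

  int requests_scheduled() const {
    std::lock_guard<std::mutex> guard(lock_);
    return requests_scheduled_;
  }

private:
  mutable std::mutex lock_;
  bool is_lock_owner_ = false;
  int requests_scheduled_ = 0;
};
```

A first call schedules a request. After `handle_lock_acquired()`, any late duplicate call is a no-op. Checking ownership under the same mutex that protects the ownership flag is what closes the race window in this model.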
Matt Benjamin [Tue, 5 Jul 2022 22:33:09 +0000 (18:33 -0400)]
rgwlc: activate lifecycle processing on non-master zones
The basic idea of this change is the same as the proposal by
Ilsoo Byun <ilsoobyun@linecorp.com>, but some details have changed.
The main differences are that it uses the existing
RGWLC::set(remove)_bucket_config methods, and that it uses the
RGWBucketInstanceMetadataHandler infrastructure to dispatch
the corresponding calls. Thank you!
Fixes: https://tracker.ceph.com/issues/44268
Related PR: #33524
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Matt Benjamin [Sat, 20 Nov 2021 18:45:51 +0000 (13:45 -0500)]
rgwlc: introduce lifecycle config flags extension
rgwlc: add uint32_t flags bitmap to LCFilter
This is intended to support a concise set of extensions to S3
LifecycleConfiguration: initially, just a flag indicating that a
rule is intended for execution on an RGW ArchiveZone.
rgwlc: add machinery to define and recognize LCFilter flags
Add a concept of filter flags to lifecycle filter rules, an RGW
extension. The initial purpose of flags is to permit marking
specific lifecycle rules as applying only to an RGW archive zone, but
other flags could be added in the future.
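A uint32_t flags bitmap on a filter can be sketched as follows. This is a minimal illustration under stated assumptions. The commit only says that LCFilter gains a uint32_t flags bitmap whose first flag marks ArchiveZone rules. The struct `LCFilterSketch`, the constant names and `flag_from_name()` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch of a lifecycle filter carrying a flags bitmap.
// Each flag occupies one bit, so several flags can be combined later.
struct LCFilterSketch {
  static constexpr uint32_t FLAG_NONE = 0;
  static constexpr uint32_t FLAG_ARCHIVE_ZONE = uint32_t{1} << 0;

  uint32_t flags = FLAG_NONE;

  void set_flag(uint32_t flag) { flags |= flag; }
  bool has_flag(uint32_t flag) const { return (flags & flag) == flag; }
};

// Map a textual flag name (e.g. from an XML extension element) to its
// bit; unknown names yield FLAG_NONE so a parser can reject them.
inline uint32_t flag_from_name(const std::string& name) {
  if (name == "ArchiveZone") {
    return LCFilterSketch::FLAG_ARCHIVE_ZONE;
  }
  return LCFilterSketch::FLAG_NONE;
}
```

Keeping the flags in a bitmap rather than in separate booleans keeps the encoded form to one fixed-width field, no matter how many flags are defined later.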
rgwlc: add a new unittest_rgw_lc to run internal checks; for now, it
parses a few valid and invalid lifecycle configuration XML documents.
Fixes: https://tracker.ceph.com/issues/53361
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Ilya Dryomov [Fri, 20 May 2022 12:05:03 +0000 (14:05 +0200)]
qa/suites/rbd: disable workunit timeout for dynamic_features_no_cache
The I/O workload in this test is xfstests (qa/run_xfstests_qemu.sh)
which isn't subject to any timeout other than the global max_job_time
limit in any other subsuite (e.g. qemu/workloads/qemu_xfstests.yaml).
But here, there is a parallel "op" workload defined as a workunit.
The workunit task has a default timeout of 3 hours, which is effectively
imposed on the entire job. In the "rbd cache = false" configuration,
this timeout is sometimes exceeded.
rbd: don't default empty pool name unless namespace is specified
Commit 96f05a7956b3 ("rbd: delay determination of default pool name")
broke "rbd perf image iostat" and "rbd perf image iotop" GLOBAL_POOL_KEY
support (the ability to blend all rbd pools together into a single
view).
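The rule in the subject line can be sketched as a small helper. This assumes a hypothetical `resolve_pool_name()` function that is not the rbd CLI's actual code. It also assumes the default pool is "rbd" (the usual `rbd_default_pool` value). An empty pool name is replaced only when a namespace was given. Otherwise it stays empty, so callers such as "rbd perf image iostat" can keep treating it as the all-pools (GLOBAL_POOL_KEY) view.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper modeling "don't default empty pool name unless
// namespace is specified". The default pool value is an assumption.
inline std::string resolve_pool_name(const std::string& pool_name,
                                     const std::string& namespace_name,
                                     const std::string& default_pool = "rbd") {
  if (pool_name.empty() && !namespace_name.empty()) {
    // A namespace only exists inside a concrete pool, so pick the default.
    return default_pool;
  }
  // May be empty: the caller interprets that as "all pools".
  return pool_name;
}
```

With no pool and no namespace, the result stays empty. With only a namespace, the default pool is used. An explicit pool name always wins.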
It doesn't really thrash anything; it just repeatedly restarts the
workload on top of a dirty cache file. rbd_pwl_cache_recovery is
more on point and is already covered by existing CODEOWNERS.
* make paxos_size() unsigned: it returns the size of
MonMap::mon_info, so it is always non-negative and, more
importantly, it represents a size.
* change the type of MonMap::removed_ranks from std::set<int>
to std::set<unsigned>, for two reasons:
- removed_ranks only tracks ranks that are greater than or equal to 0
- it helps silence the warnings listed below.
MonMap::removed_ranks is persisted using encode()/decode(), but this
change is backward compatible: we use the raw encoder for both signed
and unsigned integers, and the difference between the two only
matters when the most significant bit (MSB) of the number is set.
That is unlikely to happen, as we have neither a negative rank in
removed_ranks nor a rank greater than `(unsigned)-1`,
i.e., 0xffffffff.
/home/kefu/dev/ceph/src/mon/ElectionLogic.cc: In member function ‘void ElectionLogic::end_election_period()’:
/home/kefu/dev/ceph/src/mon/ElectionLogic.cc:173:23: error: comparison of integer expressions of different signedness: ‘std::set<int>::size_type’ {aka ‘long unsigned int’} and ‘int’ [-Werror=sign-compare]
173 | acked_me.size() > (elector->paxos_size() / 2)) {
| ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/kefu/dev/ceph/src/mon/ElectionLogic.cc: In member function ‘void ElectionLogic::propose_connectivity_handler(int, epoch_t, const ConnectionTracker*)’:
/home/kefu/dev/ceph/src/mon/ElectionLogic.cc:338:28: error: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Werror=sign-compare]
338 | for (unsigned i = 0; i < elector->paxos_size(); ++i) {
| ~~^~~~~~~~~~~~~~~~~~~~~~~
/home/kefu/dev/ceph/src/mon/ElectionLogic.cc: In member function ‘void ElectionLogic::receive_ack(int, epoch_t)’:
/home/kefu/dev/ceph/src/mon/ElectionLogic.cc:469:25: error: comparison of integer expressions of different signedness: ‘std::set<int>::size_type’ {aka ‘long unsigned int’} and ‘int’ [-Werror=sign-compare]
469 | if (acked_me.size() == elector->paxos_size()) {
| ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
make[3]: *** [src/mon/CMakeFiles/mon.dir/build.make:328: src/mon/CMakeFiles/mon.dir/ElectionLogic.cc.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[3]: Leaving directory '/home/kefu/dev/ceph/build'
[ 48%] Built target libglobal_objs
/home/kefu/dev/ceph/src/mon/Elector.cc: In member function ‘void Elector::notify_rank_removed(int)’:
/home/kefu/dev/ceph/src/mon/Elector.cc:734:43: error: comparison of integer expressions of different signedness: ‘unsigned int’ and ‘int’ [-Werror=sign-compare]
734 | for (unsigned i = rank_removed + 1; i <= paxos_size() ; ++i) {
| ~~^~~~~~~~~~~~~~~
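The backward-compatibility argument for MonMap::removed_ranks can be checked with a small sketch. It copies the raw bytes of 32-bit integers with `memcpy` in host byte order. This is an assumption for illustration: Ceph's own encode()/decode() fix a byte order, which does not change the point. A non-negative rank has the same bytes whether it is stored as int32_t or uint32_t. Only a value with the MSB set decodes differently.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Raw bytes of a fixed-width integer, as a raw (memcpy-style) encoder
// would write them. Host byte order; illustration only.
template <typename T>
std::array<unsigned char, sizeof(T)> raw_bytes(T v) {
  std::array<unsigned char, sizeof(T)> out{};
  std::memcpy(out.data(), &v, sizeof(T));
  return out;
}

// Reinterpret a value encoded as int32_t (old std::set<int>) as
// uint32_t (new std::set<unsigned>), i.e. an old encoding read by new code.
inline uint32_t decode_as_unsigned(int32_t old_value) {
  uint32_t v;
  std::memcpy(&v, &old_value, sizeof(v));
  return v;
}
```

For any rank from 0 up to INT32_MAX, the two encodings are byte-identical, so old and new decoders agree. A negative value such as -1 would come back as 0xffffffff, which is why the argument relies on removed_ranks never holding negative ranks.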
myoungwon oh [Tue, 28 Jun 2022 04:42:21 +0000 (13:42 +0900)]
osd: return ENOENT if pool information is invalid during tier-flush
During tier-flush, the OSD sends a reference-increase message to the
target OSD. Sending this message with invalid pool information
(e.g., a deleted pool) causes unexpected behavior.
Therefore, this commit returns ENOENT early, before sending the message.
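The early-return pattern can be sketched as follows. This is a minimal sketch, not the OSD's real tier-flush code. The `pools` map stands in for the OSDMap's pool table. `PoolInfoSketch`, `flush_chunk_ref()` and the `send_ref_inc` callback are hypothetical names. It follows the usual Ceph convention of returning a negative errno.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for per-pool information.
struct PoolInfoSketch {
  std::string name;
};

// Validate the target pool before sending anything; send_ref_inc stands
// in for sending the reference-increase message to the target OSD.
template <typename SendFn>
int flush_chunk_ref(const std::map<int64_t, PoolInfoSketch>& pools,
                    int64_t target_pool, SendFn send_ref_inc) {
  auto it = pools.find(target_pool);
  if (it == pools.end()) {
    // Pool was deleted (or never existed): fail before any message is sent.
    return -ENOENT;
  }
  send_ref_inc(it->second);
  return 0;
}
```

The check happens before any side effect, so a deleted pool produces a clean error instead of a message the target OSD cannot interpret.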
Currently, the following transaction execution sequence leads to the
loss of a backref:
1. Transaction `A` merges an alloc backref for extent `X`
2. Transaction `B` adds a release backref for extent `X` to the backref
cache, during which it finds an in-cache alloc backref for extent `X`
and decides not to add the release backref to the cache
3. Transaction `A` commits
In the above sequence, the release backref for extent `X` is lost.
This is a regression introduced when we tried to optimize the backref cache.
This commit fixes the issue by caching inflight backrefs in a multiset:
alloc/release ops that happen on the same paddr are queued in the order
in which they happen. When doing GC, all those backrefs are merged.
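The multiset approach can be sketched in a simplified model, not seastore's actual backref cache. Here paddr is a plain integer. The types `BackrefEntry`, `ByPaddr` and `merge_backrefs()` are hypothetical. The merge step is assumed to be "last op per paddr wins". Ordering only by paddr lets `std::multiset` keep equivalent entries in insertion order. Both the alloc and the release for extent `X` are therefore kept instead of one being dropped.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

enum class BackrefOp { Alloc, Release };

struct BackrefEntry {
  uint64_t paddr;
  BackrefOp op;
};

// Compare by paddr only: entries for the same paddr are "equivalent",
// and std::multiset inserts each new one after the existing equivalents.
struct ByPaddr {
  bool operator()(const BackrefEntry& a, const BackrefEntry& b) const {
    return a.paddr < b.paddr;
  }
};

using InflightBackrefs = std::multiset<BackrefEntry, ByPaddr>;

// At GC time, replay the queued ops per paddr in order and keep the final
// state (true = allocated, false = released). Nothing is dropped on insert.
inline std::map<uint64_t, bool> merge_backrefs(const InflightBackrefs& inflight) {
  std::map<uint64_t, bool> live;
  for (const auto& e : inflight) {
    live[e.paddr] = (e.op == BackrefOp::Alloc);
  }
  return live;
}
```

Replaying the sequence from the commit message (an alloc from transaction `A`, then a release from transaction `B` on the same paddr) leaves both entries queued in order. The merge then reports the extent as released.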
Samuel Just [Tue, 12 Jul 2022 22:35:44 +0000 (22:35 +0000)]
crimson/osd: introduce pg_shard_manager to clarify shard-local vs osd-wide state
This commit begins to change ShardServices to be the interface by which
PGs access shard local and osd wide state. Future work will further
clarify this interface boundary and introduce machinery to mediate cold
path access to state on remote shards.
Tatjana Dehler [Thu, 7 Jul 2022 15:21:14 +0000 (17:21 +0200)]
mgr/dashboard: prevent alert redirect
Prevent Alertmanager alerts from being redirected to the active mgr
dashboard instance. There are two reasons for this:
1. It doesn't bring any additional benefit. The Alertmanager config
includes all available mgr instances, both active and passive. When
an alert fires, it is sent to all of them, which ensures that the
active mgr dashboard receives the alert in any case.
2. The redirect URL includes the mgr IP and NOT the FQDN. This leads
to issues in environments where an SSL certificate is configured that
matches only the FQDNs.
Fixes: https://tracker.ceph.com/issues/56401
Signed-off-by: Tatjana Dehler <tdehler@suse.com>
Yin Congmin [Fri, 7 Jan 2022 07:03:44 +0000 (15:03 +0800)]
qa/tasks: add thrash test for persistent write log cache
Add a thrash test for the persistent write log cache: run rbd bench
on the persistent write log cache, thrash rbd bench, and test the
recovery function of the persistent write log cache.
crimson/os/seastore/cache: fine-grained lru cache control with GC
GC transactions do not originate from user behavior, so the extent
reads performed by GC transactions don't follow the temporal locality
principle. These extents should not be added to the LRU cache.
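The idea can be sketched as a tiny LRU over extent addresses that ignores reads coming from GC. This is a sketch under assumptions, not seastore's Cache. `ExtentLRU` and the `TransSrc` enum are hypothetical stand-ins for the cache and for the transaction-source tag. A GC read neither inserts a new extent nor promotes a cached one, so user-driven entries are not evicted by cleaner traffic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical transaction-source tag.
enum class TransSrc { Mutate, Read, CleanerGC };

class ExtentLRU {
public:
  explicit ExtentLRU(std::size_t capacity) : capacity_(capacity) {}

  void on_read(uint64_t paddr, TransSrc src) {
    if (src == TransSrc::CleanerGC) {
      return;  // GC reads carry no locality signal: leave the LRU untouched
    }
    auto it = index_.find(paddr);
    if (it != index_.end()) {
      lru_.splice(lru_.begin(), lru_, it->second);  // promote to MRU
      return;
    }
    lru_.push_front(paddr);
    index_[paddr] = lru_.begin();
    if (lru_.size() > capacity_) {
      index_.erase(lru_.back());  // evict the least recently used extent
      lru_.pop_back();
    }
  }

  bool contains(uint64_t paddr) const { return index_.count(paddr) != 0; }

private:
  std::size_t capacity_;
  std::list<uint64_t> lru_;  // front = most recently used
  std::unordered_map<uint64_t, std::list<uint64_t>::iterator> index_;
};
```

With a capacity of two, a GC read of extent 2 leaves the cache holding only extent 1. Ordinary reads of extents 2 and 3 then evict extent 1 as usual.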