Yingxin Cheng [Fri, 24 Jun 2022 03:04:50 +0000 (11:04 +0800)]
crimson/os/seastore/segment_cleaner: increase available ratio limit
Journal trimming may consume an unexpected number of segments when the
available ratio limit is reached while user transactions are blocked,
causing ceph_abort(). Increase the limit as a simple workaround.
Yingxin Cheng [Fri, 24 Jun 2022 05:25:51 +0000 (13:25 +0800)]
crimson/os/seastore: improve GC policies with modify-time
* Extend record_header_t to store the average modify time of dirty extents.
* Drop tracking rewrite-time.
* Drop the last-modify field in extent_info_t.
* Maintain modify-time during rewriting.
* Introduce three GC policies: greedy, benefit, and cost-benefit (see the sketch below).
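The cost-benefit policy follows the classic LFS-style formulation. Below is a
minimal, hypothetical sketch of how the three policies could score a reclaim
candidate; segment_view_t, gc_score and pick_segment are illustrative names,
not the actual seastore types:

    #include <algorithm>
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical per-segment view for illustration only; the real seastore
    // structures (segment info, space tracker, etc.) are more involved.
    struct segment_view_t {
      double utilization;    // live-data ratio in [0, 1]
      uint64_t modify_time;  // average modify time of contained extents (seconds)
    };

    enum class gc_policy_t { greedy, benefit, cost_benefit };

    // Score a candidate segment for reclaim; a higher score means reclaim first.
    //  greedy:       prefer the emptiest segment.
    //  benefit:      weigh free space by data age (older data is more stable).
    //  cost_benefit: LFS-style (1 - u) * age / (1 + u), which also accounts
    //                for the cost of copying the live data out.
    inline double gc_score(const segment_view_t& s, gc_policy_t policy,
                           uint64_t now) {
      const double u = s.utilization;
      const double age = static_cast<double>(now - s.modify_time);
      switch (policy) {
      case gc_policy_t::greedy:       return 1.0 - u;
      case gc_policy_t::benefit:      return (1.0 - u) * age;
      case gc_policy_t::cost_benefit: return (1.0 - u) * age / (1.0 + u);
      }
      return 0.0;
    }

    // Pick the best reclaim candidate under the given policy.
    inline std::size_t pick_segment(const std::vector<segment_view_t>& segs,
                                    gc_policy_t policy, uint64_t now) {
      assert(!segs.empty());
      auto best = std::max_element(
          segs.begin(), segs.end(),
          [&](const segment_view_t& a, const segment_view_t& b) {
            return gc_score(a, policy, now) < gc_score(b, policy, now);
          });
      return static_cast<std::size_t>(best - segs.begin());
    }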
Yingxin Cheng [Fri, 27 May 2022 09:13:06 +0000 (17:13 +0800)]
crimson/os/seastore: implement generational GC
Place extents into dedicated RecordSubmitters according to their
data-category and reclaimed-count. Segments of different data-category or
reclaimed-count should have different access-pattern locality, which is the
foundation for forming the desired bimodal distribution of segment
utilizations and thus more efficient GC.
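A minimal sketch of the routing idea under hypothetical names
(generational_router_t and writer_key_t stand in for the actual seastore
types), keying writers by data-category and reclaim count so that extents
with similar lifetimes land in the same segments:

    #include <cstddef>
    #include <map>
    #include <utility>

    // Hypothetical categories for illustration only; seastore's actual
    // data-category and rewrite-generation handling is more involved.
    enum class data_category_t { metadata, data };

    struct RecordSubmitter;  // stands in for the real seastore submitter

    // Key writers by data category and by how many times the contained
    // extents have already been reclaimed/rewritten. Extents sharing a key
    // are expected to have similar lifetimes, so their segments tend to end
    // up either mostly live or mostly dead (the bimodal distribution).
    using writer_key_t = std::pair<data_category_t, std::size_t /*reclaims*/>;

    class generational_router_t {
      std::map<writer_key_t, RecordSubmitter*> writers;
    public:
      void register_writer(data_category_t cat, std::size_t reclaims,
                           RecordSubmitter* w) {
        writers[{cat, reclaims}] = w;
      }
      // Route an extent to the submitter dedicated to its category/generation.
      RecordSubmitter* pick_writer(data_category_t cat,
                                   std::size_t reclaims) const {
        auto it = writers.find({cat, reclaims});
        return it == writers.end() ? nullptr : it->second;
      }
    };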
Yin Congmin [Sat, 25 Jun 2022 09:43:52 +0000 (17:43 +0800)]
cmake: rename a series of pmem libraries to pmdk
At first, libpmem was the only library. Later, pmem-related libraries
such as libpmemobj and libpmem2 were gradually added, and these libraries
were consolidated into a single project named pmdk. So rename accordingly.
Kefu Chai [Sat, 25 Jun 2022 14:27:02 +0000 (22:27 +0800)]
cmake: use CMAKE_<LANG>_COMPILER_LAUNCHER for configuring ccache
ccache only works for C and C++, so instead of setting the universal
`RULE_LAUNCH_COMPILE` property, use `CMAKE_<LANG>_COMPILER_LAUNCHER` so that
ccache is configured only for C and C++ compilation. This is a better way to
integrate ccache into our build system.
Yin Congmin [Fri, 13 May 2022 12:44:53 +0000 (20:44 +0800)]
install-deps: install pmdk libraries
Install libpmem and libpmemobj on Ubuntu Focal. The versions available from
apt meet the current requirements: libpmemobj requires >= 1.8, and libpmem
has no version requirement.
Yin Congmin [Sat, 25 Jun 2022 09:04:44 +0000 (17:04 +0800)]
cmake: lower the required version of libpmem to 1.8
The pmemobj upgrade in https://github.com/ceph/ceph/pull/40493 was done to
introduce a new API, whose minimum version requirement is 1.8. Therefore,
the version required by find_package can be lowered.
David Galloway [Fri, 24 Jun 2022 16:27:43 +0000 (12:27 -0400)]
.github: Add labels while PR is open
I think https://github.com/tibdex/backport will only create backport PRs if our doc/releases PRs are labelled *and then* closed. This action currently applies labels only after the PR is closed.
Signed-off-by: David Galloway <dgallowa@redhat.com>
Ronen Friedman [Mon, 20 Jun 2022 12:47:57 +0000 (12:47 +0000)]
scrub/osd: disable blocked-scrub warnings during some tests
As some Teuthology tests seem to block objects for many minutes,
we must not issue the "scrub is blocked for too long" warning
(that warning causes the tests to fail).
A new configuration parameter now controls the grace period before
the warning is issued. Some tests were modified to set this
configuration parameter to a large value.
we do not need to maintain our own patched version. Also, we were using the
git fetch only as a temporary measure; now that tracing is always on, it is
the right time to adopt opentelemetry-cpp as a submodule. This removes the
compile-time fetch of opentelemetry-cpp and instead uses the official
opentelemetry-cpp library as a submodule.
Kefu Chai [Tue, 21 Jun 2022 15:28:23 +0000 (23:28 +0800)]
install-deps.sh: do not install libpmem from chacra
This change reverts 17d2bc3707bb0078e2fa1b4eef31b39804e45135. Until we
recreate a chacra repo hosting libpmem packages, we are not able to query
the repo from shaman or pull the dependencies from chacra. In the future, we
should be able to get the libpmem dependencies from the official Ubuntu
package repo and from the Fedora, CentOS Stream and RHEL repos.
Zac Dover [Tue, 21 Jun 2022 14:09:05 +0000 (00:09 +1000)]
doc/dev: add context note to dev guide config
This PR adds a note directing first-time cloners of their Ceph git forks
to cd into the ceph/ directory before running the "git config" commands.
Prashant D [Tue, 21 Jun 2022 06:53:41 +0000 (02:53 -0400)]
pybind/mgr/autoscaler: Do not show NEW PG_NUM value if autoscaler is not on
When the noautoscale flag is set, autoscale-status shows a NEW PG_NUM
value if the pool's pg_num is more than 96. If the autoscaler is in
off or warn mode for the pool, then do not adjust the final
pg count for the pool.
Fixes: https://tracker.ceph.com/issues/56136
Signed-off-by: Prashant D <pdhange@redhat.com>
Ilya Dryomov [Sun, 19 Jun 2022 10:12:01 +0000 (12:12 +0200)]
mgr/rbd_support: always rescan image mirror snapshots on refresh
Establishing a watch on rbd_mirroring object and skipping rescanning
image mirror snapshots on periodic refresh unless rbd_mirroring object
gets notified in the interim is flawed. rbd_mirroring object is
notified when mirroring is enabled or disabled on some image (including
when the image is removed), but it is not notified when images are
promoted or demoted. However, load_pool_images() discards images that
are not primary at the time of the scan. If the image is promoted
later, no snapshots are created even if the schedule is in place. This
happens regardless of whether the schedule is added before or after the
promotion.
This effectively reverts commit 69259c8d3722 ("mgr/rbd_support: make
mirror_snapshot_schedule rescan only updated pools"). An alternative
fix could be to stop discarding non-primary images (i.e. drop
if not info['primary']:
continue
check added in commit d39eb283c5ce ("mgr/rbd_support: mirror snapshot
schedule should skip non-primary images")), but that would clutter the
queue and therefore "rbd mirror snapshot schedule status" output with
bogus entries. Performing a rescan roughly every 60 seconds should be
manageable: currently it amounts to a single mirror_image_status_list
request, followed by mirror_image_get, get_snapcontext and snapshot_get
requests for each snapshot-based mirroring enabled image and concluded
by a single dir_list request. Among these, per-image get_snapcontext
and snapshot_get requests are necessary for determining primaryness.
Ilya Dryomov [Sat, 18 Jun 2022 13:25:49 +0000 (15:25 +0200)]
rbd-mirror: spell out "remote image is not primary" status correctly
There is a difference: non-primary means NON_PRIMARY promotion state,
while "not primary" can refer to any of NON_PRIMARY, ORPHAN or UNKNOWN
promotion states.
Ilya Dryomov [Sat, 18 Jun 2022 11:00:34 +0000 (13:00 +0200)]
rbd-mirror: fix up PrepareReplayDisconnected test case
It was botched in commit 2bca9ee96c65 ("rbd-mirror: consolidate
prepare local/remote image steps to bootstrap") and went unnoticed
because currently no special handling is needed for disconnected
clients -- is_disconnected() check happens to be the last step
and it doesn't generate an error.
Ilya Dryomov [Mon, 20 Jun 2022 12:19:41 +0000 (14:19 +0200)]
rbd-mirror: generally skip replay/resync if remote image is not primary
Replay and resync should generally be skipped if the remote image is
not primary.
If this is not done for replay, snapshot-based mirroring can run into
a livelock if the primary image is demoted while a mirror snapshot is
being synced. On the demote site, rbd-mirror would pick up the just
demoted image, grab the exclusive lock on it and idle waiting for a new
mirror snapshot to be created. On the (still) non-primary site,
rbd-mirror would eventually finish syncing that mirror snapshot and
attempt to unlink from it on the demote site. These attempts would
fail with EROFS due to exclusive lock being held in the "refuse proxied
maintenance operations" mode, blocking forward progress (syncing of the
demotion snapshot so that the non-primary image can be orderly promoted
to primary, etc).
If this is not done for resync, data loss can ensue as the just demoted
image would be immediately trashed, underneath the non-primary site that
is still syncing.
Currently this is done in PrepareReplayRequest only for journal-based
mirroring. Note that it is conditional: if the local image is linked
to the remote image, proceeding is desirable.
Generalize this check, consolidate it with a related check in
PrepareRemoteImageRequest and move the result to BootstrapRequest to
cover both "local image does not exist" and "local image is unlinked"
cases for both modes.
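A condensed, hypothetical sketch of the resulting decision
(bootstrap_view_t and should_replay_or_resync are illustrative stand-ins;
the real checks live in BootstrapRequest and use librbd's mirror metadata):

    struct bootstrap_view_t {
      bool remote_image_primary = false;          // remote promotion state is PRIMARY
      bool local_image_exists = false;            // local image has been created
      bool local_image_linked_to_remote = false;  // local image is linked to the remote
    };

    // Replay/resync generally proceeds only if the remote image is primary;
    // the exception is a local image that is already linked to the remote
    // image (e.g. so the demotion snapshot can still be synced and the
    // non-primary image promoted in an orderly fashion).
    inline bool should_replay_or_resync(const bootstrap_view_t& v) {
      if (v.remote_image_primary) {
        return true;
      }
      return v.local_image_exists && v.local_image_linked_to_remote;
    }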
Ilya Dryomov [Sat, 18 Jun 2022 10:35:51 +0000 (12:35 +0200)]
rbd-mirror: strengthen is_local_primary() and is_linked()
Initialize local_promotion_state and remote_promotion_state to UNKNOWN
instead of counterintuitive PRIMARY and NON_PRIMARY -- half the time the
final values are flipped. Then is_local_primary() and is_linked() can
be strengthened as a non-existent image should stay in UNKNOWN.
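A hypothetical sketch of the strengthened predicates (promotion_state_t and
image_pair_state_t are illustrative; the actual code uses librbd's promotion
state types and additional bookkeeping):

    enum class promotion_state_t { UNKNOWN, PRIMARY, NON_PRIMARY, ORPHAN };

    struct image_pair_state_t {
      // Start from UNKNOWN so that a non-existent image can never masquerade
      // as PRIMARY or NON_PRIMARY before the real state has been fetched.
      promotion_state_t local_promotion_state = promotion_state_t::UNKNOWN;
      promotion_state_t remote_promotion_state = promotion_state_t::UNKNOWN;
      bool local_linked_to_remote = false;

      // True only when the local state has actually been observed as PRIMARY.
      bool is_local_primary() const {
        return local_promotion_state == promotion_state_t::PRIMARY;
      }
      // True only when the local image exists, is NON_PRIMARY and is linked
      // to the remote image.
      bool is_linked() const {
        return local_promotion_state == promotion_state_t::NON_PRIMARY &&
               local_linked_to_remote;
      }
    };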