Raf Lopez [Fri, 3 Jun 2022 04:28:16 +0000 (04:28 +0000)]
msg: add new async event driver based on poll()
Driver to replace select() where useful; currently this means Windows
clients, since select() is the only driver available there and Windows
is limited by the FD_SETSIZE hard limit of 64 descriptors. This driver
uses poll() (or WSAPoll()) and maintains pollfd structures to overcome
select()'s limitations.
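A minimal sketch of the technique in Python (the real driver is C++;
select.poll wraps poll(2) and is POSIX-only in Python, with WSAPoll()
filling the same role on Windows; class and method names here are
illustrative, not the driver's API):

    import select

    class PollDriver:
        # Sketch: keep one persistent registration per fd instead of
        # rebuilding fd_sets on every call, with no FD_SETSIZE-style cap.
        def __init__(self):
            self.poller = select.poll()
            self.masks = {}              # fd -> registered event mask

        def add_event(self, fd, mask):
            self.masks[fd] = self.masks.get(fd, 0) | mask
            self.poller.register(fd, self.masks[fd])

        def del_event(self, fd, mask):
            if fd not in self.masks:
                return
            new_mask = self.masks[fd] & ~mask
            if new_mask:
                self.masks[fd] = new_mask
                self.poller.modify(fd, new_mask)
            else:
                del self.masks[fd]
                self.poller.unregister(fd)

        def event_wait(self, timeout_ms):
            # returns [(fd, revents), ...] for ready descriptors
            return self.poller.poll(timeout_ms)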
Fixes: https://tracker.ceph.com/issues/55840
Signed-off-by: Rafael Lopez <rafael.lopez@softiron.com>
qa/tasks/rgw_multisite.py uses 'zonegroup set' to create zonegroups from
their JSON format. This doesn't enable any of the supported zonegroup
features by default, so this adds the 'enabled_features' field to the
JSON representations.
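For illustration, the kind of round-trip involved; 'resharding' is
assumed here as an example feature name, and the zonegroup name is
arbitrary:

    import json
    import subprocess
    import tempfile

    # fetch the zonegroup json, add the field, and write it back
    zg = json.loads(subprocess.check_output(
        ['radosgw-admin', 'zonegroup', 'get', '--rgw-zonegroup=default']))
    zg['enabled_features'] = ['resharding']   # assumed feature name
    with tempfile.NamedTemporaryFile('w', suffix='.json') as f:
        json.dump(zg, f)
        f.flush()
        subprocess.run(['radosgw-admin', 'zonegroup', 'set',
                        '--rgw-zonegroup=default', '--infile', f.name],
                       check=True)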
Jeff Layton [Wed, 1 Jun 2022 17:57:29 +0000 (13:57 -0400)]
qa: fix .teuthology_branch file in qa/
According to teuthology-suite:
-t <branch>, --teuthology-branch <branch>
The teuthology branch to run against.
Default value is determined in the next order.
There is TEUTH_BRANCH environment variable set.
There is `qa/.teuthology_branch` present in
the suite repo and contains non-empty string.
There is `teuthology_branch` present in one of
the user or system `teuthology.yaml` configuration
files respectively, otherwise use `main`.
The .teuthology_branch file in the qa/ dir currently points at "master".
Change it to point to "main".
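The documented lookup order, as a sketch (this mirrors the help text
above, not teuthology's actual code; `config` stands in for the merged
teuthology.yaml settings):

    import os

    def resolve_teuthology_branch(suite_repo, config):
        # 1. TEUTH_BRANCH environment variable
        if os.environ.get('TEUTH_BRANCH'):
            return os.environ['TEUTH_BRANCH']
        # 2. qa/.teuthology_branch in the suite repo, if non-empty
        path = os.path.join(suite_repo, 'qa', '.teuthology_branch')
        if os.path.exists(path):
            with open(path) as f:
                branch = f.read().strip()
            if branch:
                return branch
        # 3. teuthology_branch from user/system teuthology.yaml
        if config.get('teuthology_branch'):
            return config['teuthology_branch']
        # 4. otherwise use main
        return 'main'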
cephadm: fix osd adoption with custom cluster name
When adopting Ceph OSD containers from a Ceph cluster with a custom
name, the adoption fails because the name isn't propagated into
unit.run.
The idea here is to change the LVM metadata and enforce
'ceph.cluster_name=ceph', given that cephadm doesn't support custom
cluster names anyway.
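A sketch of the enforcement step, assuming the usual ceph-volume layout
where the cluster name is stored as an LVM tag on the OSD's logical
volume (the LV path and function name are illustrative):

    import subprocess

    def force_default_cluster_name(lv_path, custom_name):
        # swap the custom cluster-name tag for the default one so the
        # adopted unit.run resolves to the standard 'ceph' cluster
        subprocess.run(['lvchange',
                        '--deltag', f'ceph.cluster_name={custom_name}',
                        '--addtag', 'ceph.cluster_name=ceph',
                        lv_path], check=True)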
Fixes: https://tracker.ceph.com/issues/55654
Signed-off-by: Adam King <adking@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
Redouane Kachach [Tue, 31 May 2022 10:59:26 +0000 (12:59 +0200)]
mgr/cephadm: capture exception when not able to list upgrade tags
Fixes: https://tracker.ceph.com/issues/55801
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
Redouane Kachach [Tue, 31 May 2022 10:11:03 +0000 (12:11 +0200)]
mgr/cephadm: check if a service exists before trying to restart it
Fixes: https://tracker.ceph.com/issues/55800
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
Zac Dover [Mon, 30 May 2022 13:32:06 +0000 (23:32 +1000)]
doc/start: update "memory" in hardware-recs.rst
This PR corrects some usage errors in the "Memory" section
of the hardware-recommendations.rst file. It also closes
some parentheses that were opened but never closed.
Yingxin Cheng [Mon, 30 May 2022 10:35:33 +0000 (18:35 +0800)]
crimson/os/seastore/transaction_manager: set to test mode under debug build
* force test mode under debug builds.
* make reclaim happen and be validated as early as possible.
* do not block user transactions when the reclaim ratio
  (unalive/unavailable) is high, especially in the beginning.
Yingxin Cheng [Mon, 30 May 2022 05:27:30 +0000 (13:27 +0800)]
crimson/os/seastore/segment_cleaner: delay reclaim until near full
It is generally better to delay reclaim as much as possible, so
that:
* unalive/unavailable can grow higher, reducing reclaim effort;
* there are fewer conflicts between mutate and reclaim transactions.
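The gist of the policy as a sketch (the threshold value and names are
illustrative, not seastore's):

    def should_start_reclaim(used_ratio, near_full=0.85):
        # Delaying until near-full lets unalive/unavailable climb, so
        # each reclaimed segment carries less live data to rewrite and
        # reclaim transactions conflict less with mutate transactions.
        return used_ratio >= near_full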
Ronen Friedman [Tue, 17 May 2022 16:13:59 +0000 (16:13 +0000)]
osd/scrub: restart snap trimming after a failed scrub
A follow-up to PR#45640.
In PR#45640 snap trimming was restarted (if blocked) after all
successful scrubs, and after most scrub failures. Still, a few
failure scenarios did not handle snaptrim restart correctly.
The current PR cleans up and fixes the interaction between
scrub initiation/termination (for whatever cause) and snap
trimming.
Adam C. Emerson [Fri, 13 May 2022 19:56:28 +0000 (15:56 -0400)]
test/rgw: bucket sync run recovery case
1. Write several generations' worth of objects. Ensure that everything
has synced and that at least some generations have been trimmed.
2. Turn off the secondary `radosgw`.
3. Use `radosgw-admin object rm` to delete all objects in the bucket
on the secondary.
4. Invoke `radosgw-admin bucket sync init` on the secondary.
5. Invoke `radosgw-admin bucket sync run` on the secondary.
6. Verify that all objects on the primary are also present on the
secondary.
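Steps 3-5 as a sketch against the secondary (the bucket name, zone
name, and object-listing helper are illustrative):

    import subprocess

    def admin(*args):
        # thin wrapper over radosgw-admin on the secondary
        subprocess.run(('radosgw-admin',) + args, check=True)

    for obj in list_objects('test-bucket'):   # hypothetical helper
        admin('object', 'rm', '--bucket=test-bucket', f'--object={obj}')
    admin('bucket', 'sync', 'init', '--bucket=test-bucket',
          '--source-zone=primary')
    admin('bucket', 'sync', 'run', '--bucket=test-bucket',
          '--source-zone=primary')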
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Wed, 11 May 2022 22:39:08 +0000 (18:39 -0400)]
test/rgw: Add incremental test of bucket sync run
This tests that we iterate properly over the generations.
1. Create a bucket and write some objects to it. Wait for sync to
complete. This ensures we are in Incremental.
2. Turn off the secondary `radosgw`.
3. Manually reshard. Then continue writing objects and resharding.
4. Choose objects so that each generation has objects in many but not
all shards.
5. After building up several generations, run `bucket sync run` on the
secondary.
6. Verify that all objects on the primary are on the secondary.
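Steps 3-5 as a sketch (shard counts, bucket name, and the write helper
are illustrative):

    import subprocess

    def admin(*args):
        subprocess.run(('radosgw-admin',) + args, check=True)

    # with the secondary down, build several generations, each with
    # objects landing in many but not all shards
    for num_shards in (13, 17, 23):
        admin('bucket', 'reshard', '--bucket=test-bucket',
              f'--num-shards={num_shards}')
        write_some_objects('test-bucket')     # hypothetical helper
    # then let the secondary walk the generations
    admin('bucket', 'sync', 'run', '--bucket=test-bucket',
          '--source-zone=primary')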
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Adam C. Emerson [Wed, 13 Apr 2022 02:10:31 +0000 (22:10 -0400)]
rgw: Disentangle init_sync_status from RemoteBucketManager
RGWRemoteBucketManager's current design isn't really compatible with
what we need for bucket sync run to work as the number of shards
changes from run to run.
We can make a smaller 'hold information common to all three
operations' class and simplify things a bit.
We also need to fetch `rgw_bucket_index_marker_info` and supply it to
`InitBucketFullSyncStatusCR` to ensure we have the correct generation
and shard count.
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
Casey Bodley [Fri, 25 Mar 2022 21:14:05 +0000 (17:14 -0400)]
rgw multisite: resharding scales up shard counts 4x faster
in multisite reshard, we need to keep the old index shards around until
other zones finish syncing from them. we don't want to allow a bunch of
reshards in a row, because we'd have to duplicate that many sets of
index objects. so we impose a limit of 4 bilog generations (or 3
reshards), and refuse to reshard again until bilog trimming catches up
and trims the oldest generation
under a sustained write workload, a bucket can fill quickly and need
successive reshards. if we have a limit of 3, we should make them count!
so instead of doubling the shard count at each step, multiply by 8
instead when we're in a multisite configuration
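the arithmetic, for illustration (the starting shard count is
arbitrary):

    start = 11   # arbitrary initial shard count
    # three reshards before bilog trimming must catch up:
    doubling = [start * 2 ** i for i in range(1, 4)]   # [22, 44, 88]
    by_eight = [start * 8 ** i for i in range(1, 4)]   # [88, 704, 5632]
    # doubling caps total growth at 8x; multiplying by 8 reaches 512x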
Yuval Lifshitz [Wed, 23 Feb 2022 15:21:10 +0000 (17:21 +0200)]
rgw: prevent spurious/lost notifications in the index completion thread
this was happening when async completions happened during reshard.
more information about testing:
https://gist.github.com/yuvalif/d526c0a3a4c5b245b9e951a6c5a10517
we also add more logs to the completion manager, which should allow
finding unhandled completions due to reshards.
Yuval Lifshitz [Thu, 10 Feb 2022 16:12:55 +0000 (18:12 +0200)]
rgw: refresh the generation of the bucket shard when fetching info
when RGWRados::block_while_resharding() fails because a reshard is in
progress, the next iteration should re-fetch the bucket shard
generation, in case the generation changed in the middle.
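The shape of the retry loop, sketched (helper and exception names are
illustrative):

    def retry_after_reshard(bucket, op):
        # re-fetch the bucket shard generation on every attempt, so a
        # reshard that completed mid-operation is picked up
        while True:
            gen, num_shards = fetch_bucket_generation(bucket)  # hypothetical
            try:
                return op(gen, num_shards)
            except ReshardInProgress:      # hypothetical exception type
                continue   # next iteration sees the new generation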
Casey Bodley [Wed, 9 Feb 2022 21:55:38 +0000 (16:55 -0500)]
rgw: prevent 'radosgw-admin bucket reshard' if zonegroup reshard is disabled
dynamic reshard was gated behind the zonegroup resharding flag with
RGWSI_Zone::can_reshard(), but manual reshard was only calling
RGWBucketReshard::can_reshard()
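The missing gate, sketched (object names are illustrative; the method
names follow the message):

    def manual_reshard_allowed(zone_svc, bucket_reshard):
        # both gates must pass, as dynamic reshard already requires:
        # the zonegroup-level flag and the per-bucket check
        return zone_svc.can_reshard() and bucket_reshard.can_reshard()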
Casey Bodley [Wed, 19 Jan 2022 01:39:37 +0000 (20:39 -0500)]
rgw: RGWBucket::sync() no longer duplicates datalog/bilog entries
RGWSI_BucketIndex_RADOS::handle_overwrite() is already writing the
datalog/bilog entries related to BUCKET_DATASYNC_DISABLED
RGWBucket::sync() calls handle_overwrite() indirectly from
bucket->put_info() when it writes the bucket instance with this new
BUCKET_DATASYNC_DISABLED flag, so RGWBucket::sync() shouldn't
duplicate those writes here
Casey Bodley [Tue, 18 Jan 2022 21:43:42 +0000 (16:43 -0500)]
rgw: use get_current_index() instead of log_to_index_layout()
several places were getting the current index layout indirectly
with layout.logs.back() and rgw::log_to_index_layout(). use
get_current_index() instead so we don't rely on layout.logs, which may
be empty for indexless buckets
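The difference, sketched (attribute and function names follow the
message; treat them as assumptions):

    def get_current_index(layout):
        # preferred: the current index layout is stored directly, so
        # this works even when layout.logs is empty (indexless buckets)
        return layout.current_index

    def current_index_via_logs(layout):
        # the pattern being replaced: derive the index from the newest
        # log entry, which breaks when layout.logs is empty
        return log_to_index_layout(layout.logs[-1])  # IndexError if empty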