With auto-deletion of trashed snapshots, it is relatively easy to lose
a race to "rbd flatten" as follows:
- when V2_GET_PARENT runs, the image is technically still a clone
- when V2_REFRESH_PARENT runs, the image is fully flattened and the
  snapshot in the parent image is deleted
This results in a spurious ENOENT error, mainly when trying to open the
image (e.g. for "rbd info"). This race condition has always been there
but auto-deletion of trashed snapshots makes it much worse.
Retry ENOENT in V2_REFRESH_PARENT the same way as in V2_GET_SNAPSHOTS.
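The race and the retry can be pictured with a minimal Python sketch. This is not the librbd code: `FakeImage`, `get_parent`, `refresh_parent`, `refresh`, the `between_steps` hook and the `max_attempts` bound are all hypothetical names chosen for illustration. The only assumption taken from the text above is that the refresh starts over from the parent lookup when the later step sees ENOENT.

```python
import errno


class FakeImage:
    """Hypothetical stand-in for image state; not a librbd type."""

    def __init__(self):
        self.parent = ("parent-image", "snap1")  # image is still a clone
        self.parent_snap_exists = True

    def flatten(self):
        # After a flatten the image has no parent and, with auto-deletion,
        # the trashed snapshot in the parent image is gone as well.
        self.parent = None
        self.parent_snap_exists = False


def get_parent(image):
    # Plays the role of the V2_GET_PARENT step.
    return image.parent


def refresh_parent(image, parent):
    # Plays the role of the V2_REFRESH_PARENT step: opening a parent
    # snapshot that no longer exists fails with ENOENT.
    if parent is not None and not image.parent_snap_exists:
        return -errno.ENOENT
    return 0


def refresh(image, between_steps=None, max_attempts=5):
    # max_attempts is an assumption of this sketch, not taken from the source.
    for attempt in range(1, max_attempts + 1):
        parent = get_parent(image)
        if between_steps is not None:
            between_steps(image, attempt)  # hook to simulate a concurrent flatten
        r = refresh_parent(image, parent)
        if r == -errno.ENOENT and attempt < max_attempts:
            continue  # stale parent info: start over from the parent lookup
        return r, attempt
```

If a flatten lands between the two steps on the first attempt, the second attempt sees an image without a parent and succeeds instead of surfacing ENOENT to the caller.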
librbd: fix a bunch of issues with restarting RefreshRequest
Make RefreshRequest properly restartable, at least up to and including
the V2_REFRESH_PARENT step:
- clear m_migration_spec when skipping GET_MIGRATION_HEADER
- don't rely on potentially stale m_incomplete_update on retry
- reset m_legacy_parent when retrying more than just V2_GET_PARENT
- don't rely on potentially stale m_parent_md.overlap and
  m_head_parent_overlap on retry
- clear m_metadata before fetching image metadata (but not before
  fetching pool metadata)
- clear m_op_features when skipping V2_GET_OP_FEATURES
- clear m_group_spec on EOPNOTSUPP error in V2_GET_GROUP
- reset m_legacy_snapshot when retrying more than just V2_GET_SNAPSHOTS
- don't rely on potentially stale m_snap_parents on retry
Soumya Koduri [Thu, 26 May 2022 16:55:06 +0000 (22:25 +0530)]
rgw: Avoid dereferencing nullptr while configuring bucket sync policy
While configuring bucket sync policy, in "rgw_sync_bucket_entities::set_bucket()",
there can be a case where bucket doesn't contain any value but is still
dereferenced. This commit fixes that.
test/{librbd, rgw}: retry when bind fails with port 0
There is a chance that the bind() call fails if another test happens to
pick the same free port that the operating system handed out. In this case,
we just retry up to 42 times.
In theory, this change does not fully eliminate the race, but it should
help to alleviate the issue.
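A minimal Python sketch of the retry loop follows. It is not the test code from the commit: `pick_free_port`, `bind_with_retry` and the `pick_port` parameter are hypothetical names, and the sketch assumes the failure shows up as EADDRINUSE. Only the budget of 42 attempts comes from the message above.

```python
import errno
import socket

MAX_ATTEMPTS = 42  # retry budget named in the commit message


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for a free port by binding to port 0, then release it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))
        return s.getsockname()[1]


def bind_with_retry(host="127.0.0.1", pick_port=pick_free_port,
                    max_attempts=MAX_ATTEMPTS):
    """Bind to a freshly picked port, retrying if someone else grabbed it."""
    last_error = OSError(errno.EADDRINUSE, "no bind attempt was made")
    for attempt in range(1, max_attempts + 1):
        port = pick_port(host)
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError as e:
            sock.close()
            if e.errno != errno.EADDRINUSE:
                raise  # unrelated failure: do not hide it behind retries
            last_error = e
            continue  # the port was taken in the meantime: pick another one
        return sock, attempt
    raise last_error
```

The window between picking the port and binding to it is still there, which is why the retry only alleviates the race rather than removing it.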
Zac Dover [Tue, 30 Aug 2022 11:48:08 +0000 (21:48 +1000)]
doc/start: update documenting-ceph branch names
This PR updates the branch names in the
documenting-ceph.rst file. It gets rid of all references
to the "master" branch, and updates the language to
reflect the state of play in 2022.
Note: This PR merely removes the most egregious inaccuracies,
the ones that were most readily evident on a cursory reading.
The full text still needs to be read carefully and fitted
together.
Adam King [Wed, 17 Aug 2022 20:54:54 +0000 (16:54 -0400)]
cephadm: return nonzero exit code when applying spec fails in bootstrap
This is mostly useful for test automation. Right now, if applying the
spec provided with --apply-spec fails, the return code remains zero. We don't
want to error out immediately in that case, as we still want to print the remaining
output (e.g. the dashboard password). Continuing onward and then returning a
nonzero code strikes a balance: we still give all the output, but those
writing automation around bootstrap get something to check.
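The control flow described above can be sketched in a few lines of Python. This is not the cephadm implementation: `bootstrap`, `apply_spec` and `print_remaining_output` are hypothetical names, and the exit code value of 1 is an assumption (the message only says "nonzero").

```python
def bootstrap(apply_spec, print_remaining_output):
    """Remember a spec failure, finish the output, then report the failure."""
    exit_code = 0
    try:
        apply_spec()
    except Exception as e:
        # Do not abort here: the remaining output is still wanted.
        print(f"Applying spec failed: {e}")
        exit_code = 1  # assumed value; any nonzero code serves automation
    print_remaining_output()  # e.g. the dashboard password
    return exit_code
```

A caller would typically pass the returned value to `sys.exit()`, so that scripts wrapping bootstrap can detect the failed spec while a human still sees the full output.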
mgr/cephadm: Adding logic to store grafana cert/key per node
Fixes: https://tracker.ceph.com/issues/56508
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 3c990f974e3beac0fc03f58c4c47f26f9d5afe56)
Redouane Kachach [Wed, 31 Aug 2022 11:49:37 +0000 (13:49 +0200)]
mgr/cephadm: Fix how we check if a host belongs to public network
Fixes: https://tracker.ceph.com/issues/57060
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 1c8833feaf42fd518e19c9a347c6c5781943862a)
Ilya Dryomov [Thu, 18 Aug 2022 16:48:39 +0000 (18:48 +0200)]
librbd/cache/pwl: generate image cache state json under m_lock
The previous commit moved the entirety of write_image_cache_state()
from under m_lock. This was a step too far because the generated image
cache state json is no longer guaranteed to be consistent.
Arrange for m_lock to still be held during image cache json generation
but released before owner_lock is grabbed.
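The resulting lock discipline can be sketched in Python as follows. This is an illustration only, not the pwl cache code: `CacheSketch`, its `state` dictionary and the `written` list are hypothetical, and `threading.Lock` stands in for the Ceph mutex types. The point is that the JSON is generated while m_lock is held, and m_lock is dropped before owner_lock is taken, so the two locks are never held together in this path.

```python
import json
import threading


class CacheSketch:
    """Hypothetical stand-in for the cache object; not a librbd class."""

    def __init__(self):
        self.m_lock = threading.Lock()
        self.owner_lock = threading.Lock()
        self.state = {"hits": 0, "misses": 0}
        self.written = []  # stands in for the state persisted in the OSDs

    def periodic_stats(self):
        with self.m_lock:
            # Update and serialize under m_lock so the JSON is consistent.
            self.state["hits"] += 1
            snapshot = json.dumps(self.state, sort_keys=True)
        # m_lock is released here, before owner_lock is acquired.
        with self.owner_lock:
            self.written.append(snapshot)
        return snapshot
```

Serializing inside the critical section costs a little lock hold time, but it guarantees that what gets written later is a consistent snapshot of the state.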
librbd/cache/pwl: move write_image_cache_state() out of m_lock
periodic_stats() takes m_lock and then owner_lock. This is the opposite of
the lock acquisition order in SnapshotCreateRequest::handle_notify_quiesce().
Move write_image_cache_state() out of the m_lock scope: after
update_image_cache_state() has been called and m_lock has been automatically
released, call write_image_cache_state() to update the state in the OSDs.
Nizamudeen A [Fri, 2 Sep 2022 10:19:31 +0000 (15:49 +0530)]
Merge pull request #47867 from MrFreezeex/quincy-ceph-mixin-backports
quincy: monitoring: ceph mixin backports
Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Ilya Dryomov [Tue, 30 Aug 2022 09:45:44 +0000 (11:45 +0200)]
rbd-mirror: skip setting error code on snapshot replayer shutdown
This is regarding failures in unregister_remote_update_watcher() and
unregister_local_update_watcher(). handle_replay_complete() can't be
called in these cases anymore as it would blindly attempt to unregister
watchers from scratch again. Dropping handle_replay_complete() calls
there means that these failures would only be logged and would not be
surfaced by snapshot replayer. But the only caller ignores them
anyway:
  void ImageReplayer<I>::shut_down(int r) {
    ...
    // close the replayer
    if (m_replayer != nullptr) {
      ctx = new LambdaContext([this, ctx](int r) {
        m_replayer->destroy();
        m_replayer = nullptr;
        ctx->complete(0);  <------
      });
      ctx = new LambdaContext([this, ctx](int r) {
        m_replayer->shut_down(ctx);
      });
    }
Ilya Dryomov [Wed, 24 Aug 2022 10:56:31 +0000 (12:56 +0200)]
rbd-mirror: resume pending shutdown on error in snapshot replayer
If a shutdown is requested, e.g. by update_pool_replayers() because
remote RADOS instance got blocklisted, and Replayer::shut_down() pends
it on completion of current snapshot sync, it gets stuck if replayer
encounters an error in the interim. This is particularly likely in the
blocklist case: a higher layer may detect that client got blocklisted
and request a shutdown first, and then when replayer sees EBLOCKLISTED
in turn, it calls handle_replay_complete() -- which does not resume
a pending shutdown. Because update_pool_replayers() blocks on shutdown
with Mirror::m_lock held, eventually the entire daemon hangs in
perpetuity.
Ilya Dryomov [Sat, 27 Aug 2022 09:09:00 +0000 (11:09 +0200)]
librbd: use actual monitor addresses when creating a peer bootstrap token
Relying on mon_host config option is fragile, as the user may confuse
v1 and v2 addresses, group them incorrectly, etc. Get mon_host value
only as a fallback.
Adam King [Thu, 18 Aug 2022 12:49:57 +0000 (08:49 -0400)]
qa/cephadm: specify using container host distros for workunits
Right now, the OS Type and OS Version for these workunit
tests are left blank on pulpito, and the tests currently appear to be
trying to run on ubuntu jammy, which is causing failures. We should
specify which distros the tests should run on, and then very explicitly
tell it to start trying new distros once we can get the tests to
pass.
胡玮文 [Sun, 9 Jan 2022 15:17:38 +0000 (23:17 +0800)]
mon/MDSMonitor: remove redundant state change check
There are two sets of state change checks in prepare_beacon.
Since the last commit, many of these checks are covered by
`MDSMap::state_transition_valid`, so merge them.
This fixes a bug where standby-replay is evicted unexpectedly.
The bug was introduced in 794d13c9ff4 (mon/MDSMonitor: reject illegal want_states from MDS)
but only revealed itself after 20509bb6c82 (MDSMonitor: handle damaged from standby-replay).
胡玮文 [Fri, 7 Jan 2022 17:14:00 +0000 (01:14 +0800)]
osd/PeeringState: fix missed `recheck_readable` from laggy
Previously, the first `pg_lease_ack_t` after becoming laggy would not
trigger `recheck_readable`. However, every other ack would trigger it.
The logic is inverted, causing unnecessarily long laggy PG state.
Fixes: 3bb8a7210a6 (osd: requeue ops when PG is no longer laggy)
Fixes: https://tracker.ceph.com/issues/53806
Signed-off-by: 胡玮文 <huww98@outlook.com>
(cherry picked from commit caeca396e8b149cfa09ed99eda4f7a7186b005b4)
Kotresh HR [Tue, 16 Aug 2022 11:41:33 +0000 (17:11 +0530)]
mgr/volumes: Remove stale snapshot user metadata
This patch adds the capability to remove the stale snapshot user
metadata while loading the subvolume if it is present. It can't
be done in 'SubvolumeBase.discover' since v1 and v2 snapshot paths
are different. This is done just after the discover before returning
the specific version object.
mgr/volumes: Allow forceful snapshot removal on osd full
When the osd is full, if the snapshot has metadata set, it
can't be removed as user metadata can't be removed when osd
is full. This patch provides a way to remove the snapshot
with 'force' option while keeping the corresponding metadata
which gets removed on subvolume discover when it finds space.
Kotresh HR [Tue, 16 Aug 2022 11:38:16 +0000 (17:08 +0530)]
mgr/volumes: Better handle config file on osd full scenario
The 'metadata_mgr.flush()' used to truncate the config file
before flushing the new config data. This could lead to an
empty config file when there is no space to write new config
data. This patch handles this scenario by writing the data to a
temporary file and renaming it to the config file. This
retains the existing config file instead of truncating it.
Also, there were a bunch of places that weren't handling
'MetadataMgrException' because of this. Fixed those.
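The write-to-temporary-file-then-rename approach described above can be sketched with the Python standard library. This is not the mgr/volumes code, which works against CephFS rather than the local filesystem: `atomic_write` and the `.tmp-` prefix are hypothetical, and `os.replace` is used here as the rename step.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` without ever truncating the old file."""
    dirname = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so that the final
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)  # a full disk fails here, before the old file is touched
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # the old config is swapped out in one step
    except BaseException:
        # Clean up the partial temporary file; the old config is untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the write fails for lack of space, the exception propagates to the caller and the previous config file is still intact, which is the property the commit is after.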
If any clone is in the pending or in-progress state, show
these clones in the 'fs subvolume snapshot info'
command output. This field only exists if clones are
in the pending or in-progress state.
The addition of unselectable prompts to these three files
completes the work begun in PR#47810 (d8064b4), which sought
to bring dashboard.rst into line with the unselectable prompt
standard introduced by Kefu Chai in 2020.
In 4a3afcf, the $PATH is set for the test, but we cannot set multiple
properties with a single `set_property()` cmake command. We fix that by
adding the installation path of jsonnet-bundler
(CMAKE_CURRENT_BINARY_DIR) to the $PATH used for every tox test.
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
Co-Authored-By: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit d46e14c71bffda1381dac7da244ab8347d035769)