Venky Shankar [Wed, 28 Jun 2023 04:53:54 +0000 (10:23 +0530)]
Merge PR #49971 into main
* refs/pull/49971/head:
doc/cephfs: document MDS_CLIENTS_LAGGY health warning
qa: ignore warnings
qa: add test cases to check client eviction if an OSD is laggy
mds,messages: enable beacon to report clients lagginess
mds: do not evict client on laggy osds
common: add new config option to defer client eviction
osd: add method to check for laggy osds
Venky Shankar [Thu, 22 Jun 2023 10:08:44 +0000 (06:08 -0400)]
qa: assign file system affinity for replaced MDS
Otherwise, the MDS that just got replaced can transition to a rank
for another file system and the test cannot deterministically infer
which MDS needs to be checked.
cephfs-journal-tool: disambiguate usage of all keyword (in tool help).
The fs:all for rank option description was confusing. It seemed
like the fs was optional, but it is mandatory. This change modifies the
help message to reflect the correct way to use all in the --rank option.
Fixes: https://tracker.ceph.com/issues/61753
Signed-off-by: Manish M Yathnalli <myathnal@redhat.com>
Note to the documentation team: This is not a line-edit. This commit
includes nothing but the removal of pipes added to the left of much of
the text in this file. Several future commits will line-edit this file
and correct its formatting.
Xinyu Huang [Fri, 9 Jun 2023 07:25:48 +0000 (15:25 +0800)]
crimson/os/seastore: fix bug in check_node
EXIST_CLEAN and EXIST_MUTATION_PENDING should not be treated as
CLEAN in check_node because they are transaction private and the
leafnode has been duplicated for write.
Venky Shankar [Fri, 23 Jun 2023 05:10:51 +0000 (10:40 +0530)]
Merge PR #51974 into main
* refs/pull/51974/head:
doc: fix grammar in cephfs/standby
Reviewed-by: Anthony D Atri <anthony.datri@gmail.com>
Reviewed-by: Jos Collin <jcollin@redhat.com>
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Laura Flores [Thu, 22 Jun 2023 22:09:33 +0000 (17:09 -0500)]
doc/rados/operations: change file back to original name
The name of this file was changed in 4fcab2e7fc9f7ac170ede21cb07912b79926ccb9,
but on second thought, this could cause 404 situations.
Reverting the file name back to the original name.
Casey Bodley [Thu, 22 Jun 2023 12:47:06 +0000 (08:47 -0400)]
qa/s3tests: make extra_attrs additive
The s3tests.py task filters out several attrs by default, but
when dbstore uses `extra_attrs` to add 'not fails_on_dbstore', it
overwrites those other filters.
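The additive behavior described above can be sketched roughly as follows. This is only an illustration: the function name `build_attr_filters`, the placeholder default attrs, and the config shape are assumptions, not the actual s3tests.py task code.

```python
# Hypothetical placeholder defaults; the real task's default filters differ.
DEFAULT_ATTRS = ["not example_default_a", "not example_default_b"]


def build_attr_filters(client_config):
    """Return the default attr filters plus any extra_attrs (additive)."""
    attrs = list(DEFAULT_ATTRS)  # copy so the defaults are never mutated
    # Assigning extra_attrs directly would discard the defaults;
    # extending the list keeps both sets of filters.
    attrs.extend(client_config.get("extra_attrs", []))
    return attrs


filters = build_attr_filters({"extra_attrs": ["not fails_on_dbstore"]})
print(" and ".join(filters))
```

With this shape, a suite that sets `extra_attrs` narrows the test selection further instead of replacing the default exclusions.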
John Mulligan [Mon, 19 Jun 2023 16:54:24 +0000 (12:54 -0400)]
cephadm: don't set ctx.image if json deploy image is unset
If no image has been provided in the "deploy from" json do not set
ctx.image to it (empty-string or None) as we may have had a valid
value passed on the --image CLI option.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
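A minimal sketch of the guard this message describes, assuming a context object with an `image` attribute and a deploy JSON dict with an optional "image" key. The helper name `apply_deploy_image` and the example image string are hypothetical and are not taken from cephadm.

```python
from types import SimpleNamespace


def apply_deploy_image(ctx, deploy_json):
    """Override ctx.image only when the JSON carries a non-empty image."""
    image = deploy_json.get("image")
    if image:  # skips both None and the empty string
        ctx.image = image
    return ctx


# Stand-in for a context whose image came from the --image CLI option.
ctx = SimpleNamespace(image="example.invalid/ceph:from-cli")
apply_deploy_image(ctx, {"image": ""})
print(ctx.image)  # the CLI value is preserved
```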
Yuval Lifshitz [Fri, 16 Jun 2023 15:10:19 +0000 (15:10 +0000)]
rgw/amqp: remove possible race conditions with the amqp connections
* simplify memory management of the connection by not using a unique_ptr
* simplify the logic by handling all issues inside the amqp manager
* fix iterator invalidation issue with multiple n/acks
* allow different connections with different exchanges
* modify the unit tests according to the new behavior
Ilya Dryomov [Fri, 16 Jun 2023 12:01:52 +0000 (14:01 +0200)]
qa/workunits/rbd: make continuous export-diff test actually work
The current version is pretty useless:
- "rbd bench" writes the same byte (0xff) over and over again, so
almost all checksumming is in vain
- snapshots are taken in a steady state (i.e. not under I/O), so no
race conditions can get exposed
- even with these caveats, it's not wired up into the suite
Redo this workunit to be a reliable reproducer for the issue fixed
in the previous commit and wire it up for both krbd and rbd-nbd.
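To illustrate the first point in the list above (why checksumming uniform data catches very little), here is a small standalone Python sketch. It does not use rbd at all; the block size, helper names, and the "lost overwrite" scenario are assumptions chosen only to show the effect.

```python
import hashlib

BLOCK = 4096  # illustrative block size


def checksum(image):
    return hashlib.sha256(bytes(image)).hexdigest()


def write_block(image, index, payload):
    image[index * BLOCK:(index + 1) * BLOCK] = payload


def lost_overwrite_goes_unnoticed(payload_for):
    """Return True if a missing overwrite leaves the checksums equal."""
    source = bytearray(2 * BLOCK)
    backup = bytearray(2 * BLOCK)
    # The first write reaches both the source and the backup.
    write_block(source, 0, payload_for(0))
    write_block(backup, 0, payload_for(0))
    # The second write overwrites the same block but never reaches the
    # backup, standing in for an incomplete delta.
    write_block(source, 0, payload_for(1))
    return checksum(source) == checksum(backup)


def uniform(n):
    return b"\xff" * BLOCK  # same byte every time


def varying(n):
    return bytes([n + 1]) * BLOCK  # payload depends on the write number


print(lost_overwrite_goes_unnoticed(uniform))
print(lost_overwrite_goes_unnoticed(varying))
```

With uniform payloads the lost overwrite is invisible to the checksum comparison; with payloads that vary per write it is detected.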
Venky Shankar [Tue, 20 Jun 2023 10:04:30 +0000 (15:34 +0530)]
Merge PR #51500 into main
* refs/pull/51500/head:
test/libcephfs: add test case for revoking caps
client: issue a cap release immediately if no cap exists
mds: add the revoking caps back to _revokes list
mds: move confirm_receipt() to Capability.cc
Rishabh Dave [Wed, 9 Jun 2021 07:55:02 +0000 (13:25 +0530)]
AuthMonitor: make code for updating caps reusable
Move the code for "ceph auth caps" command to a separate function so
that the code can be reused to update caps.
Most of the code here is the same as the code for creating a new entity, so
let's modify this method to also create an entity. Also, create helper
methods to update and create an entity.
Jos Collin [Mon, 22 May 2023 04:31:39 +0000 (10:01 +0530)]
mds: display sane hex value (0x0) for empty feature bit
Print a valid hex value (0x0) when the feature bits are empty, so that clients can parse it.
When the _vec size becomes 0, print() function creates an invalid hex (0x) and 'perf stats'
crashes with the below error:
"
File "/opt/ceph/src/pybind/mgr/stats/fs/perf_stats.py", line 177, in notify_cmd
metric_features = int(metadata[CLIENT_METADATA_KEY]["metric_spec"]["metric_flags"]["feature_bits"], 16)
ValueError: invalid literal for int() with base 16: '0x'
"
This patch creates a valid hex (0x0), when _vec size is 0.
Fixes: https://tracker.ceph.com/issues/59551
Signed-off-by: Jos Collin <jcollin@redhat.com>
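The failure mode in the traceback above can be reproduced in plain Python. The sketch below is not the MDS or perf_stats.py code: `format_feature_bits` is a hypothetical Python analogue of the formatting fix, and `parse_feature_bits` simply mirrors the `int(..., 16)` call shown in the traceback.

```python
def format_feature_bits(byte_values):
    """Hypothetical formatter: emit '0x0' instead of a bare '0x' when empty."""
    digits = "".join(f"{b:02x}" for b in byte_values)
    return "0x" + (digits or "0")


def parse_feature_bits(text):
    """Parse a hex string the same way the traceback's int(..., 16) call does."""
    return int(text, 16)


print(parse_feature_bits(format_feature_bits([])))  # empty set parses cleanly

try:
    parse_feature_bits("0x")  # a bare prefix is not a valid literal
except ValueError as exc:
    print("rejected:", exc)
```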
Ilya Dryomov [Tue, 13 Jun 2023 11:36:02 +0000 (13:36 +0200)]
librbd: stop passing IOContext to image dispatch write methods
This is a major footgun since any value passed e.g. at the API layer
may be stale by the time we get to object dispatch. All callers are
passing the IOContext returned by get_data_io_context() for their
ImageCtx anyway, highlighting that the parameter is fictitious.
Only the read method can meaningfully take IOContext.
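The staleness hazard described above can be shown with a toy model. This is a Python stand-in, not librbd: the `ImageCtx` class, its `snap_seq` field, and both write helpers are hypothetical, and only the method name `get_data_io_context` is borrowed from the message.

```python
class ImageCtx:
    """Toy image context; snap_seq stands in for the snapshot state."""

    def __init__(self):
        self.snap_seq = 0

    def get_data_io_context(self):
        # Returns a copy of the current state, so it can go stale later.
        return {"snap_seq": self.snap_seq}


def write_with_passed_context(image, io_context):
    # Uses whatever the caller captured, however old it is.
    return io_context["snap_seq"]


def write_with_fresh_context(image):
    # Looks the context up at dispatch time instead of accepting a parameter.
    return image.get_data_io_context()["snap_seq"]


image = ImageCtx()
captured = image.get_data_io_context()  # captured early, e.g. at the API layer
image.snap_seq = 1                      # a snapshot is created before dispatch

print(write_with_passed_context(image, captured))  # sees the old state
print(write_with_fresh_context(image))             # sees the current state
```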
Ilya Dryomov [Mon, 12 Jun 2023 19:45:03 +0000 (21:45 +0200)]
librbd: use an up-to-date snap context when owning the exclusive lock
By effectively moving capturing of the snap context to the API layer,
commit 1d0a3b17f590 ("librbd: pass IOContext to image-extent IO
dispatch methods") introduced a nasty regression. The snap context can
be captured only after exclusive lock is safely held for the duration
of dealing with the image request and even then must be refreshed if
a snapshot creation request is accepted from a peer. This is needed to
ensure correctness of the object map in general and fast-diff states in
particular (OBJECT_EXISTS vs OBJECT_EXISTS_CLEAN) and object deltas
computed based off of them. Otherwise the object map that is forked
for the snapshot isn't guaranteed to accurately reflect the contents of
the snapshot when the snapshot is taken under I/O (as in disabling the
object map may lead to different results being returned for reads).
The regression affects mainly differential backup and snapshot-based
mirroring use cases with object-map and/or fast-diff enabled: since
some object deltas may be incomplete, the destination image may get
corrupted.
This commit represents a reasonable minimal fix: IOContext passed
through to ImageDispatch is effective only for reads and just gets
ignored for writes. The next commit cleans up further by undoing the
passing of IOContext through the image dispatch layers for writes.
Rishabh Dave [Fri, 9 Jun 2023 11:10:35 +0000 (16:40 +0530)]
AuthMonitor: use map, not vector, for caps
Although the variable "caps_vec" is defined as a vector (with the following
as its members: {"mon", <moncap>, "osd", <osdcap>, "mgr", <mgrcap>,
"mds", <mdscap>}) it actually is used and iterated as a map. Simplify
the code by using a map instead of vector for storing caps.
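As a rough Python analogue of this simplification (the actual change is in the monitor's C++ code), a flat list of alternating entity types and cap strings can be turned into a map directly. The helper name `caps_list_to_map` and the example cap strings are illustrative assumptions.

```python
def caps_list_to_map(caps_vec):
    """Convert ['mon', <moncap>, 'osd', <osdcap>, ...] into a dict."""
    if len(caps_vec) % 2 != 0:
        raise ValueError("expected alternating entity type and cap string")
    # Even positions are the keys, odd positions are the matching values.
    return dict(zip(caps_vec[0::2], caps_vec[1::2]))


caps_vec = ["mon", "allow r", "osd", "allow rw", "mgr", "allow r", "mds", "allow rw"]
caps = caps_list_to_map(caps_vec)
for entity_type, cap in caps.items():
    print(entity_type, "->", cap)
```

Storing the pairs as a map from the start removes the need for this kind of index arithmetic when iterating.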