Xiubo Li [Thu, 11 Apr 2024 01:53:04 +0000 (09:53 +0800)]
mds: do remove the cap when seqs equal or larger than last issue
There is a race in the following sequence of events:
- MDS: issues the 'Asx' caps to the rw client.
- rw client: adds the cap, then removes it later by queuing it to the
  cap release list. But the cap->seq may have been updated by previous
  cap grant requests, and a cap grant request won't increase the
  'last_issue' seq in the MDS.
- MDS: the ro client's lookup request arrives and the MDS sends an 'Ax'
  caps revoke request to the rw client, increasing the 'seq'.
- rw client: the revoke request finds that the cap doesn't exist, so it
  immediately queues a new cap release with the new 'seq'. It then
  triggers a flush of the pending cap releases to the MDS.
- MDS: receives the cap release request, but because the 'seq' is
  greater than the cap's 'last_issue', the MDS skips removing the cap.
  _do_cap_release() then issues the 'Ax' caps back to the rw client.
  The ro client's lookup request is then woken up and tries to revoke
  the 'Ax' caps from the rw client again.
This makes the MDS spin in an infinite loop.
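The sequence-number comparison behind this race can be illustrated with a toy model. This is only a sketch, not the MDS code: the function names are hypothetical, the sequence numbers are made up, and the pre-fix rule (remove only on an exact match) is an assumption inferred from the description above.

```python
def removes_cap_before_fix(release_seq, last_issue):
    # Assumed pre-fix rule: the cap is removed only when the release
    # carries exactly the seq the MDS recorded as 'last_issue'.
    return release_seq == last_issue


def removes_cap_after_fix(release_seq, last_issue):
    # Rule named in the commit subject: remove the cap when the seq is
    # equal to or larger than 'last_issue'.
    return release_seq >= last_issue


# Illustrative numbers: the cap was issued with last_issue = 1, the
# revoke request bumped the seq to 2, and the release carries seq 2.
last_issue = 1
release_seq = 2

skipped_before = not removes_cap_before_fix(release_seq, last_issue)
removed_after = removes_cap_after_fix(release_seq, last_issue)
```

Under these assumptions the release is skipped by the old comparison and honoured by the new one, which is what breaks the revoke/re-issue cycle.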
doc/rados: edit "Placement Groups Never Get Clean"
Make grammar improvements (and correct a verb disagreement) in the
section "Placement Groups Never Get Clean" in
doc/rados/troubleshooting/troubleshooting-pg.rst.
* the steps performed by the Windows CI job
* artifact structure
* frequently asked questions
The document is meant to assist the Ceph developers in investigating
CI failures. This is especially important as the Windows CI job runs
integration tests that would otherwise only be executed by
Teuthology, thus helping catch potential regressions quickly.
Note that the identified regressions are not necessarily
Windows-specific; they usually affect Linux builds as well.
doc/rados: add confval directives to health-checks
Add confval directives to doc/rados/operations/health-checks.rst, as
requested by Anthony D'Atri here: https://github.com/ceph/ceph/pull/59635#pullrequestreview-2286205705
An indentation of five spaces relative to the previous line creates a
command that is copyable with a single mouse click. This commit adds
those copyable commands to the procedure in the section "Building
Ceph".
Add a second method of changing the value of osd_deep_scrub_interval to
remedy the condition indicated by the "PGs not deep-scrubbed in time"
warning.
This procedure was developed by Eugen Block, and is at the time of this
commit available on his blog at
https://heiterbiswolkig.blogs.nde.ag/2024/09/06/pgs-not-deep-scrubbed-in-time/
N Balachandran [Thu, 22 Aug 2024 08:15:36 +0000 (13:45 +0530)]
rbd-mirror: use correct ioctx for namespace
The PoolReplayer uses the ioctx for the default namespace
to check if other namespaces are enabled for mirroring, causing
it to incorrectly conclude that they are all enabled.
Fixes: https://tracker.ceph.com/issues/67676
Signed-off-by: N Balachandran <nibalach@redhat.com>
(cherry picked from commit 2346cd912ee2c5aefe5b203cc872e0528fc96a49)
doc/install: Keep the name field of the created user consistent with the node name in the Start RADOSGW service command
If the user name does not match the name of the node that started the RADOSGW service, those who are new to Ceph will be confused, because they cannot start the radosgw service as shown in the tutorial.
doc/rados: add "pgs not deep scrubbed in time" info
Add a procedure to doc/rados/operations/health-warnings.rst that
explains how to remedy the "X PGs not deep-scrubbed in time" health
warning.
This procedure was developed by Eugen Block, and is at the time of this
commit available on his blog at
https://heiterbiswolkig.blogs.nde.ag/2024/09/06/pgs-not-deep-scrubbed-in-time/
`set_dmcrypt_no_workqueue()` from `ceph_volume.util.encryption`
The function `set_dmcrypt_no_workqueue` in `encryption.py` now
dynamically retrieves the installed cryptsetup version using the
`cryptsetup --version` command. It then parses the version string using
a regular expression that accommodates varying digit counts. If the
retrieved version is greater than or equal to the specified target
version, `conf.dmcrypt_no_workqueue` is set to True, allowing for
flexible version handling.
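The parsing and comparison described above might look roughly like the following. This is a minimal sketch, not the ceph-volume implementation: the helper names, the regular expression, and the sample version strings are all assumptions made for illustration.

```python
import re
import subprocess


def parse_cryptsetup_version(output):
    """Extract (major, minor, patch) from `cryptsetup --version` output."""
    # Each component may have any number of digits; patch is optional.
    match = re.search(r'(\d+)\.(\d+)(?:\.(\d+))?', output)
    if match is None:
        raise ValueError('no version found in: %r' % output)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def version_at_least(output, target):
    # Tuples of ints compare numerically, so 2.10.0 > 2.3.4 as expected
    # (a plain string comparison would get this wrong).
    return parse_cryptsetup_version(output) >= target


def installed_cryptsetup_output():
    # Hypothetical helper: runs the real command; requires cryptsetup.
    result = subprocess.run(['cryptsetup', '--version'],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

A caller would pass the command output and its own target tuple, for example `version_at_least(installed_cryptsetup_output(), (2, 3, 4))`, where the target shown here is only a placeholder.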
ceph-volume: fix partitions support in disk.get_devices()
The following:
```
is_part = get_file_contents(os.path.join(_sys_dev_block_path, item, 'partition')) == "1"
```
assumes that any `/sys/dev/block/x:y/partition` contains '1', which is wrong.
This file actually contains the corresponding partition number.
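One way to detect partitions without assuming the file content is "1" is sketched below. This is an illustration only, not the fix that was merged: the helper names are hypothetical, and the demo runs against a throwaway directory tree that mimics `/sys/dev/block` rather than the real sysfs.

```python
import os
import tempfile


def read_partition_number(sys_dev_block_path, item):
    """Return the partition number for a block device entry, or None."""
    path = os.path.join(sys_dev_block_path, item, 'partition')
    try:
        with open(path) as f:
            content = f.read().strip()
    except OSError:
        # No 'partition' file: the entry is not a partition.
        return None
    return int(content) if content.isdigit() else None


def is_partition(sys_dev_block_path, item):
    # Any partition number counts, not just "1".
    return read_partition_number(sys_dev_block_path, item) is not None


# Build a fake tree: 8:0 is a whole disk, 8:1 and 8:2 are partitions.
fake_sysfs = tempfile.mkdtemp()
for item, number in (('8:0', None), ('8:1', '1'), ('8:2', '2')):
    os.makedirs(os.path.join(fake_sysfs, item))
    if number is not None:
        with open(os.path.join(fake_sysfs, item, 'partition'), 'w') as f:
            f.write(number + '\n')

results = {item: is_partition(fake_sysfs, item)
           for item in ('8:0', '8:1', '8:2')}
```

In this demo the entry `8:2` is recognised as a partition, whereas a comparison against the literal string "1" would have missed it.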
ceph-volume: use 'no workqueue' options with dmcrypt
Cloudflare engineers ran tests and found that using workqueues with
encryption on flash devices has a negative effect on performance.
See [1] for details.
With this patch, ceph-volume calls cryptsetup with the
`--perf-no_read_workqueue` and `--perf-no_write_workqueue` options
when the device is not rotational.
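The option selection can be sketched as follows. This is a hedged illustration rather than the ceph-volume code: the function name is hypothetical, and the base command in the usage lines (device path and mapping name) is a placeholder. Only the two option strings come from the commit message.

```python
NO_WORKQUEUE_OPTIONS = [
    '--perf-no_read_workqueue',
    '--perf-no_write_workqueue',
]


def add_workqueue_options(command, rotational):
    """Return a copy of command, extended for non-rotational devices."""
    if rotational:
        # Spinning disks keep the default workqueue behaviour.
        return list(command)
    return list(command) + NO_WORKQUEUE_OPTIONS


# Placeholder base command, for illustration only.
base = ['cryptsetup', 'open', '/dev/example', 'example-mapping']
flash_command = add_workqueue_options(base, rotational=False)
hdd_command = add_workqueue_options(base, rotational=True)
```

Returning a new list instead of mutating the caller's command keeps the helper safe to call more than once with the same base command.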
Edit the section "bluefs-bdev-migrate" in
doc/man/8/ceph-bluestore-tool.rst to add the information that this
operation expands the target storage by updating its size label, making
"bluefs-bdev-expand" unnecessary.
Improve the subject-verb agreement in this section, and supply some
absent definite articles.
Co-authored-by: Peter Gervai <grin@drop.grin.hu>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 6b34707f827b2b197f53fe2e430d173b30b81401)
Ilya Dryomov [Sat, 31 Aug 2024 10:33:55 +0000 (12:33 +0200)]
librbd/migration: prune snapshot extents in RawFormat::list_snaps()
list-snaps is exempt from clipping in ImageDispatcher::PreprocessVisitor
because it's considered to be an internal API. Further, reads issued
by ObjectCopyRequest based on list-snaps results may also be exempt
because of READ_FLAG_DISABLE_CLIPPING.
Since RawFormat allows specifying a set of snapshots (possibly of
varying size!) to be imported, it needs to compensate for that in its
list-snaps implementation. Otherwise, an out-of-bounds read will
eventually be submitted to the stream.
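The idea of pruning extents against a snapshot's size can be shown with a toy model. This is a sketch under stated assumptions, not the librbd implementation: extents are modelled as hypothetical `(offset, length)` byte pairs and the sizes are made up.

```python
def prune_extents(extents, size):
    """Clip (offset, length) extents so none reaches past size bytes."""
    pruned = []
    for offset, length in extents:
        if offset >= size:
            # Entirely beyond the end of this snapshot: drop it.
            continue
        # Shorten an extent that straddles the end of the snapshot.
        pruned.append((offset, min(length, size - offset)))
    return pruned


# A 6144-byte snapshot reported alongside extents sized for a larger one.
extents = [(0, 4096), (4096, 4096), (8192, 4096)]
pruned = prune_extents(extents, 6144)
```

After pruning, no extent in the result would lead to a read past the end of the smaller snapshot.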
Ilya Dryomov [Fri, 30 Aug 2024 12:00:44 +0000 (14:00 +0200)]
rbd: mention namespace in "rbd mirror pool" command descriptions
Commit 5e64748927d0 ("doc/rbd: add namespace information for mirror
commands") did this for the man page, update the built-in help as well.
The "by default" bit in the description of "rbd mirror pool enable" and
"rbd mirror pool disable" commands is specific to pool mode which is in
turn specific to journal-based mirroring, so it's removed.
Make it clearer that, despite a full image or group spec being taken
for source and destination, an image or a group can be renamed only
within its pool or namespace.
Renaming across pools, or across namespaces within the same pool, is unsupported.
Zac Dover [Fri, 30 Aug 2024 11:16:57 +0000 (21:16 +1000)]
doc/ceph-volume: add spillover fix procedure
Add a procedure that explains how, after an upgrade, to move bytes that
have spilled over to a relatively slow device back to the faster device.
This procedure was developed by Chris Dunlop on the [ceph-users] mailing
list, here: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/POPUFSZGXR3P2RPYPJ4WJ4HGHZ3QESF6/
Eugen Block requested the addition of this procedure to the
documentation on 30 Aug 2024.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
(cherry picked from commit 98618aaa1c8b786c7d240a210b62cc737fdb048d)
Ilya Dryomov [Fri, 23 Aug 2024 21:00:24 +0000 (23:00 +0200)]
rbd: "rbd bench" always writes the same byte
It's expected that the buffer is filled with the same byte, but the
byte should differ from run to run:
  memset(bp.c_str(), rand() & 0xff, io_size);
This was broken in commit c7f71d14a5d3 ("rbd: migrated existing command
logic to new namespaces") which inadvertently moved the call to srand(),
leaving rand() unseeded for the above memset().
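The effect of an unseeded generator can be mimicked in Python. This is an analogy, not the rbd code: in C, calling rand() before srand() behaves as if srand(1) had been called, so a `random.Random(1)` instance stands in for the unseeded case here, and the function name and buffer size are hypothetical.

```python
import random


def fill_buffer(rng, io_size):
    # Python analog of memset(bp.c_str(), rand() & 0xff, io_size):
    # pick one byte value and repeat it across the whole buffer.
    return bytes([rng.getrandbits(8)]) * io_size


# Two "runs" that both start from the same fixed generator state,
# as happens when rand() is never seeded.
run_a = fill_buffer(random.Random(1), 16)
run_b = fill_buffer(random.Random(1), 16)

# A generator seeded from OS entropy, the analog of calling srand()
# with a seed that varies between runs.
run_c = fill_buffer(random.Random(), 16)
```

Here `run_a` and `run_b` are always identical, while `run_c` will usually, though not always, use a different byte.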