Adam King [Mon, 23 May 2022 19:57:14 +0000 (15:57 -0400)]
mgr/cephadm: store device info separately from rest of host cache
Device info tends to take up the most space of anything
in the host cache, so the hope is that by giving it its own
location in the config-key store we can avoid hitting
the case where the host cache value we attempt to
place in the config-key store exceeds the size limit.
Fixes: https://tracker.ceph.com/issues/54251
Fixes: https://tracker.ceph.com/issues/53624
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit e35d4144d380cef190a04517b4d7b30d520d5b4f)
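For illustration, the per-host device information stored this way can be inspected directly through the config-key store; the exact key names below are an assumption and may differ between releases:
$ ceph config-key ls | grep 'mgr/cephadm/host'
$ ceph config-key get mgr/cephadm/host.host1.devices.0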
Redouane Kachach [Tue, 31 May 2022 10:59:26 +0000 (12:59 +0200)]
mgr/cephadm: capture exception when not able to list upgrade tags
Fixes: https://tracker.ceph.com/issues/55801
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 0e7a4366c0c1edd74d52acad5ed4dc3df0ef7679)
Zac Dover [Tue, 14 Jun 2022 22:15:33 +0000 (08:15 +1000)]
doc/dev: s/master/main/ in basic workflow
This PR changes "master" to "main" in the
basic_workflow.rst file. I have even changed
"master" to "main" in some terminal output from
several years ago. This isn't historically accurate,
of course, but my hope is that this change will
prevent someone in the future from being confused
about why an antiquated branch name is referred to.
Zac Dover [Mon, 13 Jun 2022 21:48:46 +0000 (07:48 +1000)]
doc/dev: s/master/main/ essentials.rst dev guide
This PR changes all references to the "master" branch
to references to the "main" branch (because we renamed
"master" to "main", and the docs now need to reflect that).
Zac Dover [Sun, 12 Jun 2022 23:41:28 +0000 (09:41 +1000)]
doc/start: rewrite CRUSH para
This PR supersedes https://github.com/ceph/ceph/pull/46584
and makes changes suggested by Anthony D'Atri that improve
the coherence and consistency of the paragraph that explains
the basics of the CRUSH algorithm.
Zac Dover [Wed, 8 Jun 2022 19:19:16 +0000 (05:19 +1000)]
doc/start: make OSD and MDS structures parallel
This PR makes the "Ceph OSDs" and "MDSs" bullet points
parallel by naming "object storage daemon" before referring
to the (admittedly more common and colloquial, but surely
unknown to people who genuinely require a document called
'Intro') acronym "OSD".
Zac Dover [Mon, 13 Jun 2022 04:34:36 +0000 (14:34 +1000)]
doc/start: rewrite hardware-recs networks section
This rewrites the first two-thirds of the "Networks"
section of the Hardware Recommendations page in the
Intro to Ceph document. I have tried to divide the
technical content in this section into subsections
that foreground the various subjects covered.
We really want the ability to know how many entries
`PGLog::IndexedLog::dups` holds.
The current ways are either invasive (stopping an OSD)
or indirect (examining `dump_mempools`).
Although the chunking in off-line `dups` trimming (via COT) seems
fine, `ceph-objectstore-tool` is a client of `trim()` in
`PGLog::IndexedLog`, which means that a partial revert is not
possible without extensive changes.
The backport ticket is: https://tracker.ceph.com/issues/55981
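For reference, the indirect route mentioned above (no OSD restart required) looks roughly like this; the osd id is a placeholder, and `osd_pglog` is the mempool that accounts for PG log entries, including dups:
$ ceph daemon osd.0 dump_mempools | grep -A 3 osd_pglog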
Revert "osd/PGLog.cc: Trim duplicates by number of entries"
This reverts commit 3ff0df6a28a1d9e197bdba40be7126fed8a14ae9
which is the in-OSD part of the fix for accumulation of `dup`
entries in a PG log. Brainstorming has raised questions about
the OSD's behaviour during an upgrade when there are tons of
dups in the log. What must be double-checked before bringing
it back is that we chunk the deletions properly so as not to
cause OOMs or stalls in, for example, RocksDB.
The backport ticket is: https://tracker.ceph.com/issues/55981
cephadm: fix osd adoption with custom cluster name
When adopting Ceph OSD containers from a Ceph cluster with a custom name,
the adoption fails because the custom name isn't propagated into unit.run.
The idea here is to change the lvm metadata and enforce 'ceph.cluster_name=ceph'
given that cephadm doesn't support custom names anyway.
Fixes: https://tracker.ceph.com/issues/55654
Signed-off-by: Adam King <adking@redhat.com>
Co-authored-by: Guillaume Abrioux <gabrioux@redhat.com>
(cherry picked from commit e720a658d6a1582c0497bdf709ef4bd26bb5bb73)
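Conceptually, the adopted OSD's LV tag is rewritten to the default cluster name; a manual equivalent would look something like the following (the cluster, VG and LV names are placeholders):
$ lvchange --deltag 'ceph.cluster_name=mycluster' --addtag 'ceph.cluster_name=ceph' vg_name/lv_name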
qa: set, get, list and remove custom metadata for snapshot
The following tests are added:
1. Set custom metadata for a subvolume snapshot.
2. Set custom metadata for a subvolume snapshot (idempotency).
3. Get custom metadata for a specified key.
4. Get custom metadata when the specified key does not exist (expecting error ENOENT).
5. Get custom metadata when no key-value pair has been added, i.e. the metadata section does not exist (expecting error ENOENT).
6. Update the value of an existing key in custom metadata.
7. List custom metadata of a subvolume snapshot.
8. List custom metadata of a subvolume snapshot when no key-value pair has been added (expecting an empty JSON dictionary).
9. Remove custom metadata for a specified key.
10. Remove custom metadata when the specified key does not exist (expecting error ENOENT).
11. Remove custom metadata when no key-value pair has been added, i.e. the metadata section does not exist (expecting error ENOENT).
12. Remove custom metadata with the --force option.
13. Remove custom metadata with the --force option when the specified key does not exist (expecting the command to succeed because of '--force').
14. Remove a subvolume snapshot and verify whether its metadata is removed.
docs: set, get, list and remove custom metadata for snapshot
Set custom metadata on the snapshot as a key-value pair using
$ ceph fs subvolume snapshot metadata set <vol_name> <subvol_name> <snap_name> <key_name> <value> [--group_name <subvol_group_name>]
note: If the key_name already exists then the old value will get replaced by the new value.
note: The key_name and value should be a string of ASCII characters (as specified in python's string.printable). The key_name is case-insensitive and always stored in lower case.
note: Custom metadata on a snapshot is not preserved when snapshotting the subvolume, and hence is also not preserved when cloning the subvolume snapshot.
Get custom metadata set on the snapshot using the metadata key::
$ ceph fs subvolume snapshot metadata get <vol_name> <subvol_name> <snap_name> <key_name> [--group_name <subvol_group_name>]
List custom metadata (key-value pairs) set on the snapshot using::
$ ceph fs subvolume snapshot metadata ls <vol_name> <subvol_name> <snap_name> [--group_name <subvol_group_name>]
Remove custom metadata set on the snapshot using the metadata key::
$ ceph fs subvolume snapshot metadata rm <vol_name> <subvol_name> <snap_name> <key_name> [--group_name <subvol_group_name>] [--force]
Using the '--force' flag allows the command to succeed even when it would otherwise fail because the metadata key does not exist.
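A complete round trip with these commands might look like the following (the volume, subvolume, snapshot and key names are placeholders):
$ ceph fs subvolume snapshot metadata set cephfs sub0 snap0 owner my-namespace
$ ceph fs subvolume snapshot metadata get cephfs sub0 snap0 owner
$ ceph fs subvolume snapshot metadata ls cephfs sub0 snap0
$ ceph fs subvolume snapshot metadata rm cephfs sub0 snap0 owner --force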
mgr/volumes: set, get, list and remove custom metadata for snapshot
When CephFS in ODF is configured in external mode, users would like to
use subvolume snapshot metadata to store some OpenShift-specific
information, such as the PVC/PV/namespace that the subvolumes/snapshots
come from. For RBD volumes, it is possible to add metadata
to the images using the 'rbd image-meta' command.
However, this feature is not available for CephFS volumes.
We'd like to request this capability.
Xiubo Li [Tue, 19 Apr 2022 06:21:49 +0000 (14:21 +0800)]
mds: trigger to flush the mdlog in handle_find_ino()
If the CInode was just created via openc on the current
auth MDS, but the client sends a getattr request to another
(replica) MDS, then the path built here will be just '#INODE-NUMBER'
because the CInode hasn't been linked yet. The replica
MDS will keep retrying until the auth MDS flushes the mdlog and
C_MDS_openc_finish and link_primary_inode are called, up to
5 seconds later.
Xiubo Li [Tue, 12 Apr 2022 11:40:02 +0000 (19:40 +0800)]
qa: add file sync stuck test support
This tests file sync on a directory, which may be stuck for
up to 5 seconds. This is because the related code waits for
all the unsafe requests to get a safe reply from the MDSes, but the
MDSes consider it unnecessary to flush the mdlog immediately
after an early reply, so the mdlog is only flushed every 5 seconds
by the tick thread.
This has been fixed in kclient and libcephfs by triggering an
mdlog flush before waiting for the requests' safe replies.
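A minimal sketch of the idea behind such a test, outside the qa framework (the path, helper name and time limit are assumptions; the real test uses the fs qa harness):

    import os
    import time

    def assert_dir_fsync_is_fast(dirpath, limit=4.0):
        # fsync(2) on a directory waits for the unsafe MDS requests issued
        # under it to become safe; without an explicit mdlog flush this could
        # stall for up to the 5-second mdlog tick interval.
        fd = os.open(dirpath, os.O_RDONLY)
        try:
            start = time.monotonic()
            os.fsync(fd)
            elapsed = time.monotonic() - start
            assert elapsed < limit, 'directory fsync stalled for %.1fs' % elapsed
        finally:
            os.close(fd)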
Xiubo Li [Tue, 12 Apr 2022 04:37:13 +0000 (12:37 +0800)]
qa: add filesystem sync stuck test support
This tests sync of the whole filesystem, which may be stuck for
up to 5 seconds. This is because the related code waits
for all the unsafe requests to get a safe reply from the MDSes, but the
MDSes consider it unnecessary to flush the mdlog immediately
after an early reply, so the mdlog is only flushed every 5 seconds
by the tick thread.
This has been fixed in kclient and libcephfs by triggering an
mdlog flush before waiting for the requests' safe replies.
Ronen Friedman [Tue, 17 May 2022 16:13:59 +0000 (16:13 +0000)]
osd/scrub: restart snap trimming after a failed scrub
A follow-up to PR#45640.
In PR#45640 snap trimming was restarted (if blocked) after all
successful scrubs, and after most scrub failures. Still, a few
failure scenarios did not handle snaptrim restart correctly.
The current PR cleans up and fixes the interaction between
scrub initiation/termination (for whatever cause) and snap
trimming.
Jos Collin [Wed, 4 May 2022 13:03:12 +0000 (18:33 +0530)]
qa: fix is_addr_blocklisted() to get blocklisted clients from 'osd dump'
With the introduction of the range blocklist, the 'blocklist ls' command outputs
two lists. It is also straightforward to get the blocklisted clients directly
from 'osd dump' to avoid regressions.
Fixes: https://tracker.ceph.com/issues/55516
Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit 47de5d79b8190458847072aae1c29db7d6a9b66b)
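As an illustration, the same information can be read straight from the OSD map; the JSON field names shown here are assumptions about the output layout:
$ ceph osd dump --format=json-pretty | jq '.blocklist, .range_blocklist'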
Xiubo Li [Wed, 16 Mar 2022 09:15:57 +0000 (17:15 +0800)]
mds, client: only send the metrics supported by the MDSes
For old Ceph clusters the clients won't send any metrics to
them by default unless this commit has been backported, but the
option 'client_collect_and_send_global_metrics' can still
be used to enable it manually.
This fixes a crash when upgrading from old Ceph clusters,
where the MDSes would crash once they received unknown metrics.
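The option mentioned above can be toggled at runtime, for example:
$ ceph config set client client_collect_and_send_global_metrics true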
Greg Farnum [Tue, 30 Nov 2021 18:29:46 +0000 (18:29 +0000)]
osd: Check range_blocklist in is_blocklisted(): we actually blocklist ranges
Carry a parallel map from cidr addresses to a new
range_bits class (stored entirely as ephemeral state) so that we
don't need to re-compute masks and bit mappings too often, and to
separate out the unpleasant ipv6 bit mapping logic. Then check
against those with range_bits::matches() the same way we check
for equality on specific-entity matches. Nice and simple loops!
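For context, range blocklist entries are the ones created with the range form of the blocklist command, e.g. (the CIDR and expiry values are placeholders, and the exact syntax may vary by release):
$ ceph osd blocklist range add 192.168.1.0/24 600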