Xiubo Li [Mon, 26 Apr 2021 08:24:31 +0000 (16:24 +0800)]
qa: use the pg autoscale mode to calculate the pg_num
Setting pg_num to 8 is too small: some OSDs may not be covered by the
pools, while others may be overloaded. Remove the hardcoded pg_num here and
let the pg autoscale mode calculate it as needed, and at the same time set
pg_num_min to 64 to keep pg_num from becoming too small.
If an EC pool is used, most of the data in the test cases will go to the EC
pool and the primary replicated pool will store only a small amount of
metadata for all the files, so setting the target size ratio to 0.05 should
be enough.
Greg Farnum [Thu, 17 Jun 2021 19:56:20 +0000 (19:56 +0000)]
mon: Sanely set the default CRUSH rule when creating pools in stretch mode
If we get a pool create request while in stretch mode that does not explicitly
specify a crush rule, look at the stretch-mode pools and their rules, and
select the most common one.
Also update set_up_stretch_mode.sh to add a few more rules that let me test
this locally.
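The rule selection described above can be sketched as follows. This is a minimal illustration, not the monitor's C++ code: the function name and the pool-name-to-rule-id mapping are hypothetical, and how ties between equally common rules are broken is an assumption of this sketch.

```python
from collections import Counter


def pick_default_crush_rule(stretch_pool_rules):
    """Return the most common CRUSH rule id among stretch-mode pools.

    `stretch_pool_rules` is a hypothetical mapping of pool name -> rule id.
    Returns None when there are no stretch-mode pools to consult.
    """
    if not stretch_pool_rules:
        return None
    counts = Counter(stretch_pool_rules.values())
    # most_common(1) yields a list with one (rule_id, count) pair.
    rule_id, _count = counts.most_common(1)[0]
    return rule_id
```

For example, with three stretch pools using rules 1, 2 and 1, the sketch selects rule 1.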
Casey Bodley [Tue, 10 Aug 2021 19:40:25 +0000 (15:40 -0400)]
cls/cmpomap: empty values are 0 in U64 comparisons
Previously, when trying to use cmpomap interfaces on an omap key with
an empty value, U64 comparisons would fail to decode with -EIO, so
cmp_set_vals() and cmp_rm_keys() were unable to update or remove such
keys.
For backward compatibility with rgw's data sync error repo, where the
keys used to have empty values, enable these comparisons by treating an
empty value as 0.
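The behavior change can be illustrated with a small sketch. It is not the cls/cmpomap C++ code: the helper names are hypothetical, the 8-byte little-endian layout is an assumption about the encoding, and a Python exception stands in for the -EIO error.

```python
import operator
import struct

# Hypothetical comparison table; the names of the real ops may differ.
OPS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}


def decode_u64(value: bytes) -> int:
    """Decode a stored value as an unsigned 64-bit integer.

    An empty value is treated as 0 instead of being a decode error.
    """
    if len(value) == 0:
        return 0
    if len(value) != 8:
        # Stands in for the -EIO decode failure on malformed values.
        raise ValueError("cannot decode value as u64")
    return struct.unpack("<Q", value)[0]


def u64_compare(op: str, input_value: bytes, stored_value: bytes) -> bool:
    """Evaluate `input op stored` after decoding both sides as u64."""
    return OPS[op](decode_u64(input_value), decode_u64(stored_value))
```

With this handling, a key whose stored value is empty compares as 0, so a comparison such as "input 5 is greater than stored" succeeds rather than failing to decode.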
Adam Kupczyk [Wed, 14 Jul 2021 21:35:12 +0000 (23:35 +0200)]
kv/RocksDBStore: Add handling of block_cache option for resharding
Synchronized all the places where we initialize the DB so that they handle the block_cache option.
The missing handling made it impossible to reshard into the specification that we use by default.
Conflicts:
src/kv/RocksDBStore.cc
Trivial conflict, related to gist of the change. No logic involved in resolving.
Deepika Upadhyay [Wed, 23 Jun 2021 05:12:38 +0000 (10:42 +0530)]
mon/PGMap: DIRTY field as N/A in `df detail` when cache tier not in use
'ceph df detail' reports a column for DIRTY objects under POOLS even
when cache tiers are not being used. In a replicated or EC pool, all objects
in the pool are reported as logically DIRTY because they have never been
flushed.
Display N/A for DIRTY objects if the pool is not a cache tier.
Kefu Chai [Tue, 17 Aug 2021 07:53:51 +0000 (15:53 +0800)]
mgr/dashboard/api: set a UTF-8 locale when running pip
ansible-core started to include files whose names contain
non-ASCII characters, so we have to use a more capable encoding for the
locale in order to install this package. Otherwise we get the following
error:
Collecting ansible-core<2.12,>=2.11.3
Using cached ansible-core-2.11.4.tar.gz (6.8 MB)
ERROR: Exception:
Traceback (most recent call last):
File "/tmp/tmp.fX76ASIrch/venv/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 173, in _main
status = self.run(options, args)
...
File "/tmp/tmp.fX76ASIrch/venv/lib/python3.8/site-packages/pip/_internal/utils/unpacking.py", line 226, in untar_file
with open(path, "wb") as destfp:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 137-140: ordinal not in range(256)
rgw/sts: correcting the evaluation of session policies
passed in with AssumeRoleWithWebIdentity.
Session policies are used to restrict the permissions
granted by the identity-based policy (the role's permission policy)
and, in some cases, the resource policy (bucket policy).
Nizamudeen A [Mon, 9 Aug 2021 07:52:51 +0000 (13:22 +0530)]
mgr/dashboard: Refresh button on the iscsi targets page
Added a refresh button on the iSCSI targets page. I noticed that the
auto reload causes some load on the backend, so I disabled the auto
reload and went with the same approach we use for rgw: a yellow
warning color on the refresh button lets the user know to refresh
manually whenever needed.
Patrick Donnelly [Tue, 30 Mar 2021 21:26:08 +0000 (14:26 -0700)]
mon,mds: use per-MDS compat to inform replacement
This diff makes the following changes:
- FSMap::compat is now just a "default compat" of currently unknown
utility. It is used when constructing a new file system but does
not really have any effect or current use.
- The `mds compat *` CLI commands are deprecated. They manipulate
the default compat which has no useful effect.
- Each MDS sends its compat to the mons in its beacon. This is from
MDSMap::get_compat_set_all() at MDS boot. This CompatSet does not
change for the duration of the MDS lifetime.
- Mons record each MDS compat in the FSMap to inform standby failover.
An MDS is only promoted if it is compatible with the file system
compat.
- Mons upgrade (merge) the file system compat when (a) the number of
*in* MDS is 1 (effected by max_mds=1) and (b) the mons are promoting a
standby with a new compat. A file system is never upgraded when there
is more than 1 rank to prevent two MDS with incompatible compat.
- A suite of `fs compat` commands exist to manipulate the file system
compat. These exist mostly for testing.
The consequence of these changes is that the upgrade procedure for MDS
can be updated to no longer require turning off all MDS but rank 0
before performing any upgrades. Previously, a CompatSet change would cause
any MDS receiving the new MDSMap to suicide if it was incompatible.
Instead, the monitors will no longer assign an incompatible MDS to a
file system, and will enforce an upgrade procedure if incompatibilities exist.
Fixes: https://tracker.ceph.com/issues/49720
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 58eaa237b0a16d3c934ded77ed4dc53137d9b4a1)
Conflicts:
src/mds/FSMap.h: trivial
src/mon/MDSMonitor.cc: trivial
src/mon/MonCommands.h: work around removed OBSOLETE commands
Patrick Donnelly [Tue, 30 Mar 2021 21:07:46 +0000 (14:07 -0700)]
mon: do not update inline incompat except via mds
The MDS_FEATURE_INCOMPAT_INLINE feature indicates that an MDS knows how
to read/write inline data and that the file system may have it. The
separate setting for inline_data protects this file system feature.
Adam Kupczyk [Mon, 9 Aug 2021 13:59:46 +0000 (15:59 +0200)]
os/bluestore: Better handling of deferred write trigger
The deferred write decision in _do_alloc_write no longer depends on the blob
size, but on the size of the extent allocated on disk.
It is now possible to set bluestore_prefer_deferred_size much larger than
bluestore_max_blob_size and still get the desired behavior.
Example: with deferred=256K and blob=64K, a 128K write op writes both blobs
as deferred, while a 256K write op goes entirely as a regular write.
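The example above can be restated as a small sketch. This is an illustration only, not the BlueStore code: the function name is hypothetical, the write is assumed to be backed by a single allocated extent, and the strict less-than comparison is inferred from the example (a 256K write with deferred=256K is not deferred).

```python
KIB = 1024


def write_mode(allocated_extent_len: int, prefer_deferred_size: int) -> str:
    """Classify a write by the size of its allocated extent.

    The blob size plays no part in the decision; only the extent length
    is compared against the preferred deferred size.
    """
    if allocated_extent_len < prefer_deferred_size:
        return "deferred"
    return "regular"
```

Under these assumptions, a 128K extent with a 256K threshold is classified as deferred, and a 256K extent as a regular write, regardless of the 64K blob size.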
See Rook issue https://github.com/rook/rook/issues/7940 for full
information.
Ceph bluestore disks can sometimes appear as though they have "phantom"
Atari (AHDI) partitions created on them when they don't in reality. This
is due to a series of bugs in the Linux kernel when it is built with
Atari support enabled. This behavior does not appear for raw mode OSDs on
partitions, only on disks.
Changing the on-disk format of BlueStore OSDs comes with
backwards-compatibility challenges, and it could be years before a fix
in the kernel reaches users. Working around the kernel issue
in ceph-volume is therefore the best way to fix the issue for Ceph.
To work around the issue in ceph-volume, two behaviors need to be
adjusted:
1. `ceph-volume inventory` should not report that a partition is
available if the parent device is a BlueStore OSD.
2. `ceph-volume raw list` should report parent disks if the disk is a
BlueStore OSD and not report the disk's children, BUT it should still
report children if the parent disk is not a BlueStore OSD.
Using only the exit status of `ceph-bluestore-tool show-label` to
determine if a device is a bluestore OSD could report a false negative
if there is a system error when `ceph-bluestore-tool` opens the device.
A better check is to open the device and read the bluestore device
label (the first 22 bytes of the device) to look for the bluestore
device signature ("bluestore block device"). If ceph-volume fails to
open the device due to a system error, it is safest to assume the device
is BlueStore so that an existing OSD isn't overwritten.
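The label check described above can be sketched as follows. This is a minimal illustration rather than the ceph-volume implementation: the function and constant names are hypothetical, and treating any OSError as "assume BlueStore" is this sketch's reading of the fail-safe rule.

```python
# The 22-byte signature at the start of a BlueStore device label.
BLUESTORE_SIGNATURE = b"bluestore block device"


def is_bluestore_device(path: str) -> bool:
    """Return True if the device starts with the BlueStore signature.

    If the device cannot be opened or read, assume it is BlueStore so
    that an existing OSD is never mistaken for an available device.
    """
    try:
        with open(path, "rb") as dev:
            label = dev.read(len(BLUESTORE_SIGNATURE))
    except OSError:
        return True
    return label == BLUESTORE_SIGNATURE
```

Returning True on errors trades a possible false positive (a device wrongly reported as in use) for the guarantee that a read failure never leads to overwriting an OSD.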
Sage Weil [Mon, 9 Aug 2021 18:15:28 +0000 (14:15 -0400)]
cephadm: fix container name detection
'enter' was broken because we weren't correctly identifying the container
name. Strip the newline from the inspect result so that we can reliably
match against the 'running' state.
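The fix amounts to normalizing the command output before comparing it. A minimal sketch, assuming the inspect call prints the container state followed by a newline; the function name is hypothetical and no container engine is invoked here:

```python
def is_running(inspect_output: str) -> bool:
    """Compare the inspected container state against 'running'.

    The raw output ends with a newline, so it is stripped first;
    comparing the unstripped string would never match.
    """
    return inspect_output.strip() == "running"
```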