Kamoltat [Fri, 25 Jun 2021 22:40:43 +0000 (22:40 +0000)]
pybind/mgr/autoscaler: don't scale pools with overlapping roots
In the previous version of get_subtree_resource_status() in
src/pybind/mgr/pg_autoscaler/module.py we ignored overlapping
pools, which in some cases, when combined with the new `scale-down`
algorithm in https://github.com/ceph/ceph/pull/38805, can cause
some pools to scale up/down to an inappropriate number of pgs.
Therefore, this PR identifies the overlapping roots and prevents the pools
with such roots from scaling. This only happens with the `scale-down` profile,
as we see no problem with the default `scale-up` profile.
Removed the variable `pool_root` since it is not used anywhere in
the code; it only gets assigned and reassigned.
Also included a unit test, test_overlapping_roots.py, that tests the function
identify_subtrees_and_overlaps(), and edited test_cal_final_pg_target.py
to account for pools that contain overlapping roots; those pools
are expected not to scale.
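The idea of identifying overlapping roots can be illustrated with a minimal sketch. This is not the code of identify_subtrees_and_overlaps(); the function names and the root-to-OSD mapping below are hypothetical, and the sketch assumes two roots overlap when they share at least one OSD.

```python
from itertools import combinations
from typing import Dict, List, Set


def find_overlapping_roots(root_osds: Dict[int, Set[int]]) -> Set[int]:
    """Return the ids of all roots that share at least one OSD with another root.

    root_osds is a hypothetical mapping: CRUSH root id -> set of OSD ids under it.
    """
    overlapped: Set[int] = set()
    for (root_a, osds_a), (root_b, osds_b) in combinations(root_osds.items(), 2):
        if osds_a & osds_b:  # non-empty intersection means the roots overlap
            overlapped.update((root_a, root_b))
    return overlapped


def pools_allowed_to_scale(pool_root: Dict[str, int], overlapped: Set[int]) -> List[str]:
    """Keep only pools whose root is not overlapped; the others are left alone."""
    return [pool for pool, root in pool_root.items() if root not in overlapped]


roots = {-1: {0, 1, 2}, -2: {2, 3}, -3: {4, 5}}
overlapped = find_overlapping_roots(roots)
scalable = pools_allowed_to_scale({"a": -1, "b": -2, "c": -3}, overlapped)
```

Here roots -1 and -2 share OSD 2, so only the pool on root -3 remains eligible for scaling.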
Kefu Chai [Mon, 28 Jun 2021 04:28:17 +0000 (12:28 +0800)]
pybind/mgr/pg_autoscaler: extract CrushSubtreeResourceStatus out
as it also serves as part of interface of get_subtree_resource_status(),
not only its internals. to ease adding the type annotations, this class
is promoted out of the class.
The autoscaler by default will start out each pool with minimal
pgs and `scale-up` the pgs when there is more usage in each pool.
Users can now use the commands:
`osd pool set autoscale-profile scale-down` to make the pools
start out with a full complement of pgs and only `scale-down`
when the usage ratio across the pools is not even.
`osd pool set autoscale-profile scale-up` (by default) to make the pools
start out with minimal pgs and `scale-up` the pgs when there
is more usage in each pool.
Edited KVMonitor.cc file to make the `autoscale_profile` variable
persistent.
Edited tests/test_cal_final_pg_target.py so that it takes into account
the new `profile` argument when calling cal_final_pg_target(). Also,
added some new test cases for when profile is `scale-up`.
Renamed tests/test_autoscaler.py to a more appropriate name:
tests/test_cal_ratio.py
Kamoltat [Thu, 7 Jan 2021 15:39:19 +0000 (15:39 +0000)]
mgr/pg_autoscaler: avoid scale-down until there is pressure
The autoscaler will start out by scaling each
pool to have a full complement of pgs from the start
and will only decrease it when other pools need more due to
increased usage.
Introduced a unit test that tests only the
function get_final_pg_target_and_ratio(), which
deals with the distribution of pgs amongst the
pools.
Edited workunit script to reflect the change
in how pgs are calculated and distributed.
Greg Farnum [Thu, 17 Jun 2021 19:56:20 +0000 (19:56 +0000)]
mon: Sanely set the default CRUSH rule when creating pools in stretch mode
If we get a pool create request while in stretch mode that does not explicitly
specify a crush rule, look at the stretch-mode pools and their rules, and
select the most common one.
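A minimal sketch of the "most common rule" selection, in Python for illustration only (the monitor itself is C++). The function name and the input list are hypothetical; how ties are broken is not stated in the commit message and is left to Counter here.

```python
from collections import Counter
from typing import Iterable, Optional


def pick_default_crush_rule(stretch_pool_rules: Iterable[int]) -> Optional[int]:
    """Return the CRUSH rule id used by the most stretch-mode pools.

    stretch_pool_rules holds one rule id per existing stretch-mode pool.
    Returns None when there are no stretch-mode pools to learn from.
    """
    counts = Counter(stretch_pool_rules)
    if not counts:
        return None
    rule_id, _count = counts.most_common(1)[0]
    return rule_id


chosen = pick_default_crush_rule([1, 2, 2, 3])
```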
Also update set_up_stretch_mode.sh to add a few more rules that let me test
this locally.
Casey Bodley [Tue, 10 Aug 2021 19:40:25 +0000 (15:40 -0400)]
cls/cmpomap: empty values are 0 in U64 comparisons
previously, when trying to use cmpomap interfaces on an omap key with
an empty value, U64 comparisons would fail to decode with -EIO. so
cmp_set_vals() and cmp_rm_keys() are unable to update or remove such
keys
for backward-compatibility with rgw's data sync error repo, where the
keys used to have empty values, enable these comparisons by treating an
empty value as 0
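The decoding rule can be sketched as follows, in Python for illustration only (cls/cmpomap is C++). The sketch assumes the stored value is an 8-byte little-endian unsigned integer, and the ValueError merely stands in for the -EIO decode failure; the function name is hypothetical.

```python
import struct


def decode_u64(value: bytes) -> int:
    """Decode an omap value for a U64 comparison.

    An empty value is treated as 0 instead of being a decode error.
    Any other length than 8 bytes is still rejected.
    """
    if not value:
        return 0
    if len(value) != 8:
        raise ValueError("cannot decode value as u64")  # stands in for -EIO
    return struct.unpack("<Q", value)[0]


empty_as_zero = decode_u64(b"")
stored = decode_u64(struct.pack("<Q", 42))
```

With this rule, a comparison against a key whose value is empty behaves as a comparison against 0, so such keys can be updated or removed.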
Ramana Raja [Mon, 28 Jun 2021 23:39:10 +0000 (19:39 -0400)]
mds: create file system with specific ID
A file system needs to be recreated when the monitor databases are lost
and rebuilt. Some applications (e.g., CSI) expect that the recovered
file system has the same ID as before. Allow creating a file system
with a specific ID to help in such scenarios. This can now be done by
the `fs new` command using the 'fscid' argument and the 'force' flag.
As a corollary, newer file systems will no longer necessarily have increasing IDs.
Adam Kupczyk [Wed, 14 Jul 2021 21:35:12 +0000 (23:35 +0200)]
kv/RocksDBStore: Add handling of block_cache option for resharding
Synchronized all situations in which we initialize the DB to include handling of the block_cache option.
The lack of it prevented resharding into the specification that we have as default.
Conflicts:
src/kv/RocksDBStore.cc
Trivial conflict, related to gist of the change. No logic involved in resolving.
Deepika Upadhyay [Wed, 23 Jun 2021 05:12:38 +0000 (10:42 +0530)]
mon/PGMap: DIRTY field as N/A in `df detail` when cache tier not in use
'ceph df detail' reports a column for DIRTY objects under POOLS even
though cache tiers are not being used. In a replicated or EC pool, all objects
in the pool are reported as logically DIRTY as they have never been
flushed.
We now display N/A for DIRTY objects if the pool is not a cache tier.
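The display rule amounts to the following sketch, in Python for illustration only (mon/PGMap is C++); the function name and parameters are hypothetical.

```python
def format_dirty(num_dirty: int, is_cache_tier: bool) -> str:
    """Return the DIRTY column text: the count for cache tiers, N/A otherwise."""
    return str(num_dirty) if is_cache_tier else "N/A"


cache_tier_cell = format_dirty(12, is_cache_tier=True)
regular_pool_cell = format_dirty(12, is_cache_tier=False)
```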
Kefu Chai [Tue, 17 Aug 2021 07:53:51 +0000 (15:53 +0800)]
mgr/dashboard/api: set a UTF-8 locale when running pip
ansible-core started to include files whose filenames are encoded in
non-ascii characters, so we have to use a more capable encoding for the
locale in order to install this package. otherwise we'd have the following
error:
Collecting ansible-core<2.12,>=2.11.3
  Using cached ansible-core-2.11.4.tar.gz (6.8 MB)
ERROR: Exception:
Traceback (most recent call last):
  File "/tmp/tmp.fX76ASIrch/venv/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 173, in _main
    status = self.run(options, args)
  ...
  File "/tmp/tmp.fX76ASIrch/venv/lib/python3.8/site-packages/pip/_internal/utils/unpacking.py", line 226, in untar_file
    with open(path, "wb") as destfp:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 137-140: ordinal not in range(256)
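The failure mode in the traceback can be reproduced in isolation with a small sketch. The file name below is hypothetical (it is not taken from ansible-core); the point is only that characters outside the latin-1 range cannot be encoded with that codec, while UTF-8 handles them.

```python
def can_encode(name: str, encoding: str) -> bool:
    """Report whether a file name can be represented in the given encoding."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical path containing characters above U+00FF.
filename = "pkg/data/\u65e5\u672c\u8a9e.txt"

latin1_ok = can_encode(filename, "latin-1")
utf8_ok = can_encode(filename, "utf-8")
```

Running pip under a UTF-8 locale makes Python use an encoding that can represent such names when creating files.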
rgw/sts: correcting the evaluation of session policies
passed in with AssumeRoleWithWebIdentity.
Session Policies are used to restrict the permissions
granted by identity-based (Role's permission policy)
and resource-based (bucket policy) policies in some cases.
Nizamudeen A [Mon, 9 Aug 2021 07:52:51 +0000 (13:22 +0530)]
mgr/dashboard: Refresh button on the iscsi targets page
Added a refresh button on the iscsi targets page. I've noticed that the
auto reload causes some load on the backend, so I disabled the auto
reload and went for the same approach as we have on rgw: a yellow
warning color on the refresh button lets the user know to refresh
manually whenever needed.
Patrick Donnelly [Tue, 30 Mar 2021 21:26:08 +0000 (14:26 -0700)]
mon,mds: use per-MDS compat to inform replacement
This diff makes the following changes:
- FSMap::compat is now just a "default compat" of currently unknown
utility. It is used when constructing a new file system but does
not really have any effect or current use.
- The `mds compat *` CLI commands are deprecated. They manipulate
the default compat which has no useful effect.
- Each MDS sends its compat to the mons in its beacon. This is from
MDSMap::get_compat_set_all() at MDS boot. This CompatSet does not
change for the duration of the MDS lifetime.
- Mons record each MDS compat in the FSMap to inform standby failover.
An MDS is only promoted if it is compatible with the file system
compat.
- Mons upgrade (merge) the file system compat when (a) the number of
*in* MDS is 1 (effected by max_mds=1) and (b) the mons are promoting a
standby with a new compat. A file system is never upgraded when there
is more than 1 rank to prevent two MDS with incompatible compat.
- A suite of `fs compat` commands exist to manipulate the file system
compat. These exist mostly for testing.
The consequence of these changes is that the upgrade procedure for MDS
can be updated to no longer require turning off all MDS but rank 0
before performing any upgrades. Previously, a CompatSet change would cause all MDS
receiving the new MDSMap to suicide if they were incompatible with it.
Instead, the monitors will no longer assign an incompatible MDS to a
file system and enforce an upgrade procedure if incompatibilities exist.
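The promotion and upgrade rules described above can be sketched roughly as follows, in Python for illustration only (the monitor is C++). This is a simplification under stated assumptions: a compat is modeled as a plain set of incompat feature ids, an MDS counts as compatible when it supports every feature in the file system compat, and all function names are hypothetical.

```python
from typing import AbstractSet, Optional, Sequence, Set


def is_compatible(mds_incompat: AbstractSet[int], fs_incompat: AbstractSet[int]) -> bool:
    """Assumed rule: the MDS must support every feature the file system requires."""
    return fs_incompat <= mds_incompat


def pick_standby(standbys: Sequence[AbstractSet[int]],
                 fs_incompat: AbstractSet[int]) -> Optional[int]:
    """Return the index of the first compatible standby, or None if there is none."""
    for index, mds_incompat in enumerate(standbys):
        if is_compatible(mds_incompat, fs_incompat):
            return index
    return None


def maybe_upgrade_fs_compat(fs_incompat: AbstractSet[int],
                            mds_incompat: AbstractSet[int],
                            num_in: int) -> Set[int]:
    """Merge the promoted standby's compat into the file system compat only
    when exactly one MDS is in; otherwise leave the file system compat as is."""
    if num_in == 1:
        return set(fs_incompat) | set(mds_incompat)
    return set(fs_incompat)


fs = {1, 2, 3}
chosen = pick_standby([{1, 2}, {1, 2, 3, 4}], fs)
upgraded = maybe_upgrade_fs_compat({1, 2}, {1, 2, 3}, num_in=1)
unchanged = maybe_upgrade_fs_compat({1, 2}, {1, 2, 3}, num_in=2)
```

In the sketch, the first standby lacks feature 3 and is skipped, and the file system compat is merged only in the single-rank case.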
Fixes: https://tracker.ceph.com/issues/49720
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
(cherry picked from commit 58eaa237b0a16d3c934ded77ed4dc53137d9b4a1)
Conflicts:
src/mds/FSMap.h: trivial
src/mon/MDSMonitor.cc: trivial
src/mon/MonCommands.h: work around removed OBSOLETE commands
Patrick Donnelly [Tue, 30 Mar 2021 21:07:46 +0000 (14:07 -0700)]
mon: do not update inline incompat except via mds
The MDS_FEATURE_INCOMPAT_INLINE feature indicates that an MDS knows how
to read/write inline data and that the file system may have it. The
separate setting for inline_data protects this file system feature.
Adam Kupczyk [Mon, 9 Aug 2021 13:59:46 +0000 (15:59 +0200)]
os/bluestore: Better handling of deferred write trigger
The deferred write decision in _do_alloc_write now depends on the size of
the extent allocated on disk rather than on the blob size.
It is now possible to set bluestore_prefer_deferred_size way larger than
bluestore_max_blob_size and still get the desired behavior.
Example: for deferred=256K and blob=64K, a 128K write op has both blobs
written as deferred, while a 256K write op goes entirely as a regular write.
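The example can be checked with a small sketch, in Python for illustration only (BlueStore is C++). It assumes the trigger is a strict "allocated extent smaller than bluestore_prefer_deferred_size" comparison, which is consistent with the two cases in the example but is not confirmed by the commit message; the function name is hypothetical.

```python
KIB = 1024


def is_deferred(extent_len: int, prefer_deferred_size: int) -> bool:
    """Assumed rule: defer the write when the allocated extent is smaller
    than the preferred deferred size, regardless of the blob size."""
    return extent_len < prefer_deferred_size


prefer_deferred = 256 * KIB

small_write_deferred = is_deferred(128 * KIB, prefer_deferred)  # 128K write op
large_write_deferred = is_deferred(256 * KIB, prefer_deferred)  # 256K write op
```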