Kefu Chai [Sat, 28 May 2022 09:03:34 +0000 (17:03 +0800)]
debian: add .requires for specifying python3 deps
we use dh_python3 to define subvar of ${python3:Depends} as a part
of the runtime dependencies of python3 packages, like,
ceph-mgr modules named "ceph-mgr-*", python3 bindings named "python3-*".
but unlike python3 bindings of Ceph APIs, the ceph-mgr modules are
not packaged in a typical python way. in other words, they do not
ship a "dist-info" or an "egg-info" directory. instead, we just
install the python scripts into a directory which can be found by
ceph-mgr, by default it is /usr/share/ceph/mgr/dashboard/plugins.
this does not follow the convention of python packaging or
debian packaging policies related to python packages. but it
still makes sense to put these files in this unconventional place, as
they are not supposed to be python packages consumed by the
outer world -- they are but plugins, and should always work
with the same version of ceph-mgr.
the problem is, despite that we have ${python3:Depends} in
the "Depends" field of packages like ceph-mgr-dashboard, dh_python3
is not able to figure out the dependencies by looking at the
installed files. for instance, we have following "Depends" of
ceph-mgr-dashboard:
apparently, none of the subvar is materialized to
a non-empty string.
to improve the packaging, in this change:
* drop all subvars from ceph-mgr-*, as they
are all implemented in pure python.
* add debian/ceph-mgr-*.requires, its content
is replicated from the corresponding requirements.txt
files.
* add python3-distutils for distutils, as debian
and its derivatives package the non-essential part of
distutils into a separate package, see
https://packages.debian.org/stable/python3-distutils
* add ${python3:Depends} so dh_python3
can extract the deps from debian/ceph-mgr-*.pydist
* update the rule for "override_dh_python3" target,
so dh_python3 can pick up the dependencies specified
in .requires file.
* remove the python3 dependencies not used by
ceph-mgr from ceph-mgr's "Depends"
Kefu Chai [Sat, 28 May 2022 10:37:46 +0000 (18:37 +0800)]
debian: s/${python:Depends}/${python3:Depends}/
${python:Depends} is added by dh_python2. but we've migrated to
python3 and Ceph is not compatible with python2 anymore. let's
replace all references of python2 with python3.
Xuehan Xu [Fri, 20 May 2022 09:23:03 +0000 (17:23 +0800)]
crimson/os/seastore/segment_cleaner: add dedicated backref trimming process
Space reclamation needs to merge backrefs up to the point where the latest
release of extents within the scope of the reclamation process happened.
When the journal size is large, that merge may generate a transaction
record whose size exceeds the max record size threshold. So we need to have a
backref trimming process that merges most of the backrefs before the space
reclamation happens.
This commit also fixes issue: https://tracker.ceph.com/issues/55692, by
repeating the inflight backrefs trimming transaction when it's
invalidated by other trans on the ROOT block
Xiubo Li [Thu, 31 Mar 2022 07:16:49 +0000 (15:16 +0800)]
client: stop retrying the request when exceeding 256 times
The type of 'retry_attempt' in 'MetaRequest' is 'int', while in
'ceph_mds_request_head' the type of 'num_retry' is '__u8'. So if
a request is retried more than 256 times, the MDS will receive
an incorrect retry seq.
In this case it's usually a bug in the MDS, and continuing to retry the
request makes no sense. For now let's limit it to 256. In the future
this could be fixed in ceph code, so avoid hardcoding the value here.
Fixes: https://tracker.ceph.com/issues/55144
Signed-off-by: Xiubo Li <xiubli@redhat.com>
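The truncation described in this commit can be illustrated with a small sketch. It is not Ceph code: `to_u8`, `should_retry` and `MAX_RETRY_ATTEMPTS` are hypothetical names, and the masking only mimics what storing an 'int' into a '__u8' field does. The limit is derived from the field width rather than written as a literal.

```python
U8_BITS = 8  # width of a __u8 field such as num_retry
MAX_RETRY_ATTEMPTS = 1 << U8_BITS  # hypothetical limit, derived from the field width


def to_u8(value: int) -> int:
    """Mimic storing an int into an unsigned 8-bit field."""
    return value & 0xFF


def should_retry(retry_attempt: int) -> bool:
    """Hypothetical guard: stop once the counter no longer fits in a __u8."""
    return retry_attempt < MAX_RETRY_ATTEMPTS


# Show how the value seen by the receiver diverges from the sender's counter.
for attempt in (1, 255, 256, 257):
    print(attempt, to_u8(attempt), should_retry(attempt))
```

Once the counter reaches 256, the truncated value wraps back to 0, so the receiver can no longer tell a fresh request from a heavily retried one.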
as per https://www.json.org/json-en.html, JSON encodes bool as
"true" or "false", without the quotes. before this change, the quotes
are always added when encoding boolean values.
but this change is not backward compatible.
encode_json()'s bool overload is used by rgw. it uses JSONObj
defined in common/ceph_json.h to decode JSON-encoded structs.
and it does not differentiate bool from str when decoding a boolean
value, even though it could have checked the "quoted" member variable
of JSONObj to validate the type of the value. so we should be fine.
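A minimal sketch of the compatibility argument above, using Python's standard `json` module instead of the C++ encoder. The three function names are hypothetical; the lenient decoder only imitates a reader that does not distinguish a quoted boolean from a bare one.

```python
import json


def encode_bool_quoted(value: bool) -> str:
    """Behaviour before the change, as described: the bool becomes a JSON string."""
    return json.dumps("true" if value else "false")


def encode_bool_plain(value: bool) -> str:
    """Behaviour after the change: a bare JSON true/false literal."""
    return json.dumps(value)


def decode_bool_lenient(text: str) -> bool:
    """Hypothetical decoder that accepts both the quoted and the bare form."""
    parsed = json.loads(text)
    if isinstance(parsed, bool):
        return parsed
    if isinstance(parsed, str):
        return parsed.lower() == "true"
    raise ValueError(f"not a boolean: {text!r}")


for encoded in (encode_bool_quoted(True), encode_bool_plain(True)):
    print(encoded, decode_bool_lenient(encoded))
```

A decoder of this kind reads both encodings the same way, which is why the change in output format does not break such a consumer. A strict decoder that insisted on one form would not be covered by this argument.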
but gcc-toolset-8-annobin provides this file. upgrading to
gcc-toolset-11 does not help. see https://centos.pkgs.org/8-stream/centos-appstream-x86_64/gcc-toolset-11-annobin-plugin-gcc-10.23-1.el8.x86_64.rpm.html
so, the intermediate solution would be to disable the plugin, if
we want to use gcc-toolset to build rpm packages.
in this change, _annotated_build is undefined to prevent the compiler
from adding extra information to the binary. in general this change
should be safe, although without this information it'd be hard to tell if
the binary is hardened or what ABI version it expects. see
also https://fedoraproject.org/wiki/Changes/Annobin
Rishabh Dave [Thu, 19 May 2022 18:29:25 +0000 (23:59 +0530)]
qa/cephfs: remove temporary files
These temporary files don't matter for test execution with teuthology
but they do matter for execution with vstart_runner.py since the test
fails if these files exist already. And tests are often run repeatedly
with vstart_runner.py, unlike with teuthology.
Fixes: https://tracker.ceph.com/issues/55719
Signed-off-by: Rishabh Dave <ridave@redhat.com>
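The failure mode described in this commit can be sketched as follows. This is not the actual test code: `run_test_once` and `run_test_with_cleanup` are hypothetical helpers, and a local temporary directory stands in for the test mount point.

```python
import os
import tempfile


def run_test_once(path: str) -> None:
    """Hypothetical test body: create a file that must not exist beforehand."""
    # O_EXCL makes creation fail if a previous run left the file behind.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, b"test data")
    finally:
        os.close(fd)


def run_test_with_cleanup(path: str) -> None:
    """Same test, but the temporary file is removed afterwards."""
    try:
        run_test_once(path)
    finally:
        if os.path.exists(path):
            os.remove(path)


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "testfile")
run_test_with_cleanup(target)
run_test_with_cleanup(target)  # the repeated run succeeds because the file was removed
os.rmdir(workdir)
```

Without the cleanup step, the second call would raise `FileExistsError`, which corresponds to the repeated-run failure described above.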
Laura Flores [Mon, 16 May 2022 22:59:42 +0000 (17:59 -0500)]
qa/suites/rados/thrash-erasure-code-big/thrashers: add `osd max backfills` setting to mapgap and pggrow
All `rados/thrash-erasure-code-big` tests that die due to the “wait_for_recovery” timeout have one thing in common: They contain either `thrashers/pggrow` or `thrashers/mapgap`.
The difference between pggrow and mapgap vs. all other non-offending thrashers (default, careful, fastread, and morepggrow) is that they lack an override setting for `osd max backfills`. `osd max backfills` is the max number of backfill operations allowed to/from an OSD. The higher the number, the quicker the recovery. By default, this value is 1. On all of the non-offending thrashers (default, careful, fastread, and morepggrow), the default 1 value gets overridden in their .yaml files with a value > 1. This is not the case for pggrow and mapgap, however, as they lack an `osd max backfills` override setting.
The mclock op scheduler is known to override `osd max backfills` with a high value, but all of the thrash-erasure-code-big thrashers have their op queue set to “debug_random”, which chooses randomly between op queues (the debug_random op queue is set to override the default mclock_scheduler in qa/config/rados.yaml). So, coupled with the “debug_random” op queue, the low `osd max backfills` setting is causing some tests to time out in recovery.
WITHOUT `osd max backfills`, as they are now, “mapgap” and “pggrow” tests die due to timed-out recovery about 17/100 times, as seen here with a pggrow test: http://pulpito.front.sepia.ceph.com/lflores-2022-05-18_14:24:29-rados:thrash-erasure-code-big-master-distro-default-smithi/
WITH `osd max backfills` specified, as I have suggested in this PR, 99/100 tests passed, with one test failing for a different reason:
http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_22:40:27-rados:thrash-erasure-code-big-master-distro-default-smithi/
I also scheduled 145 tests WITH `osd max backfills` that are a mix of pggrow and mapgap thrashers. 144/145 tests passed, with one test failing for a different reason. http://pulpito.front.sepia.ceph.com/lflores-2022-05-17_15:27:54-rados:thrash-erasure-code-big-master-distro-default-smithi/
Fixes: https://tracker.ceph.com/issues/51076
Signed-off-by: Laura Flores <lflores@redhat.com>
Adam King [Fri, 1 Apr 2022 12:20:28 +0000 (08:20 -0400)]
mgr/cephadm: make UpgradeState from_json a bit safer
This way, when downgrading to any version that includes
this change, new parameters added to UpgradeState later
shouldn't break anything. Can't do much
about downgrades to older versions from this one
but this should help in the future.
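One way to make a `from_json` constructor tolerant of unknown keys is to drop everything the constructor does not accept. The sketch below is an assumption about the general approach, not the code in this commit: `UpgradeStateSketch` and its fields are hypothetical stand-ins.

```python
import inspect
from typing import Any, Dict, Optional


class UpgradeStateSketch:
    """Hypothetical stand-in for UpgradeState; the field names are illustrative only."""

    def __init__(self, target_name: str, progress_id: str, paused: bool = False) -> None:
        self.target_name = target_name
        self.progress_id = progress_id
        self.paused = paused

    @classmethod
    def from_json(cls, data: Optional[Dict[str, Any]]) -> Optional["UpgradeStateSketch"]:
        if not data:
            return None
        # Keep only the keys that __init__ accepts, so fields written by a
        # newer version are ignored instead of raising TypeError.
        known = set(inspect.signature(cls.__init__).parameters) - {"self"}
        return cls(**{k: v for k, v in data.items() if k in known})


stored = {"target_name": "img", "progress_id": "abc", "field_from_newer_version": 1}
state = UpgradeStateSketch.from_json(stored)
print(state.target_name, state.progress_id, state.paused)
```

Passing `stored` directly as `cls(**stored)` would fail on the unknown key, which is the breakage the commit message refers to.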
Adam King [Mon, 28 Mar 2022 16:10:15 +0000 (12:10 -0400)]
mgr/cephadm: split _do_upgrade into sub functions
This function was around 500 lines and difficult to work
with. Splitting it into sub functions should hopefully make
it a bit easier to understand and make changes to.
Rishabh Dave [Thu, 19 May 2022 15:33:54 +0000 (21:03 +0530)]
cephfs-shell: check version before importing Cmd2ArgparseError
Cmd2ArgparseError is available only from cmd2 version 1.0.1 onwards. Before
that, SystemExit(2) is raised. This commit creates an empty class
Cmd2ArgparseError for earlier versions so that a similar error won't creep
up again.
Fixes: https://tracker.ceph.com/issues/55716
Signed-off-by: Rishabh Dave <ridave@redhat.com>
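The version check can be sketched roughly as below. This is an illustration under assumptions, not the code from cephfs-shell: `parse_version` and `supports_argparse_error` are hypothetical helpers, and the sketch also falls back to the empty class when cmd2 is not installed at all.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Hypothetical helper: take up to the first three numeric components."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def supports_argparse_error(cmd2_version: str) -> bool:
    """Cmd2ArgparseError is available from cmd2 1.0.1 onwards, per the commit message."""
    return parse_version(cmd2_version) >= (1, 0, 1)


try:
    import cmd2

    if not supports_argparse_error(getattr(cmd2, "__version__", "0")):
        raise ImportError("cmd2 is too old to provide Cmd2ArgparseError")
    from cmd2.exceptions import Cmd2ArgparseError
except ImportError:
    # Older cmd2 raises SystemExit(2) instead; define an empty placeholder so
    # that "except Cmd2ArgparseError" clauses elsewhere remain valid.
    class Cmd2ArgparseError(Exception):
        pass
```

With the placeholder in place, calling code can catch `Cmd2ArgparseError` unconditionally, whichever cmd2 version is installed.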
Soumya Koduri [Fri, 6 May 2022 17:10:12 +0000 (22:40 +0530)]
rgw/qa: Run tests on multiple cloudtier config
Run cloudtier tests with parameter 'retain_head_object'
set to true and false.
However, having multiple cloudtier storage classes in the same task
increases the transition time and results in spurious failures.
Hence, until there is a consistent way of running the tests without
having to depend on lc_debug_interval, one of the configs is disabled for
now.