Kefu Chai [Tue, 30 Mar 2021 18:32:38 +0000 (02:32 +0800)]
mgr/PyModule: put mgr_module_path before Py_GetPath()
pip comes with _vendor/progress. so there is chance to import the vendored
version of "progress" module instead of the "progress" mgr module, and
fail to import the latter.
in this change, the order of paths are rearranged so the configured
`mgr_module_path` is put before the return value of `Py_GetPath()`.
Kefu Chai [Thu, 25 Feb 2021 09:56:02 +0000 (17:56 +0800)]
pybind/mgr/dashboard: bump up requests to 2.25.1
request 2.20 is not compatible with urllib3 v1.25.2 and up. this causes
trouble of incompatibility with other python modules. for instance, we
now have following error:
ERROR: pip's dependency resolver does not currently take into account
all the packages that are installed. This behaviour is the source of the
following dependency conflicts.
botocore 1.20.14 requires urllib3<1.27,>=1.25.4, but you have urllib3
1.24.3 which is incompatible.
see also https://github.com/psf/requests/pull/5092
Kefu Chai [Sat, 12 Dec 2020 07:19:40 +0000 (15:19 +0800)]
admin/build-doc: stop passing --use-feature=2020-resolver to pip
to silence the warning of
WARNING: --use-feature=2020-resolver no longer has any effect, since it is now the default dependency resolver in pip. This will become an error in pip 21.0.
Kefu Chai [Fri, 19 Mar 2021 04:05:45 +0000 (12:05 +0800)]
pybind/mgr/dashboard: bump flake8 to 3.9.0
to address the failure of
ERROR: Cannot install -r requirements-lint.txt (line 2) and -r requirements-lint.txt (line 8) because these package versions have conflicting dependencies.
The conflict is caused by:
flake8 3.8.4 depends on pycodestyle<2.7.0 and >=2.6.0a1
autopep8 1.5.6 depends on pycodestyle>=2.7.0
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict
also, loosen the version of pytest:
The conflict is caused by:
The user requested pytest<4
The user requested pytest<4
pytest-cov 2.11.1 depends on pytest>=4.6
To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency
conflict
Kefu Chai [Sat, 16 Jan 2021 06:33:17 +0000 (14:33 +0800)]
mgr: update mon metadata when monmap is updated
there is chance that some monitor(s) is updated / upgraded in a single
monmap update without being removed from cluster state's metata first,
so, without this change, we will not update the metadata associated with
that monitor, hence the mgr modules which consumes the metadata is not
updated accordingly and keep reporting the stale information.
in this change, we always update the metadata associated with all monitor
included by the latest monmap. multiple "mon metadata" commands are sent
to monitor for retrieving their updated metadata, instead of sending a
single one, so that we can reuse "MetadataUpdate" to update the metadata
of a given daemon. as the number of monitors in a typical cluster is
relatively small, and the frequency of monmap update is low, so this
overhead should be fine.
unlike other places where we ask mon for metadata in Mgr class, the code
sending the mon command for updated monitor metata is located outside of
`cluster_state.with_monmap()` block, the reason is that `with_monmap()`
is guraded by the monc_lock under the hood, while `start_mon_command()`
also need to acquire the monc_lock, which is not a recursive lock. so we
have to do this out of the `with_monmap()` block.
Kefu Chai [Thu, 25 Mar 2021 09:08:48 +0000 (17:08 +0800)]
run-make-check.sh: let ctest generate XML output
to enable XUnit plugin of jenkins to consume the ctest output and
publish it in the dashboard, we need to
* let ctest generate XML output instead of plain text output
* do not fail the test if any test case fails. this allows the publisher
to do its job by checking the XML output.
* prevent ctest from compressing the output. see
https://issues.jenkins.io/browse/JENKINS-21737
Kamoltat [Mon, 8 Feb 2021 15:45:06 +0000 (15:45 +0000)]
qa/tasks/mgr/test_progress: fix wait_until_equal
Octopus ceph_test_case doesn't have period arg
so remove that in wait_until_equal. Also increase
time to wait for complete events by using RECOVERY_PERIOD
instead of EVENT_CREATION_PERIOD
Not needed in masters because only octopus and nautilus
doesn't have a period argument in qa/tasks/mgr/test_progress.py
wait_until_equals() function
The osd_fast_shutdown option may cause the cluster log to receive
too many entries of 'osd.X reported immediately failed by osd.Y',
depending on cluster scale.
This might be an issue for LMA stacks/tools that check ceph logs
for failed lines, and then require additional logic to filter on
an intended OSD (fast) shutdown; might not be an option/possible,
and require an admin to analyze.
So, add osd_fast_shutdown_notify_mon option for OSD to also tell
the monitor it is shutting down (done in slow/non-fast shutdown)
under osd_fast_shutdown.
This introduces minimal delay (the ack from the mon is required
to prevent the messages), and addresses the cluster log issue.
Note: the osd_mon_shutdown_timeout option can be used to control
the maximum amount of time waiting for the monitor ack to arrive.
Fixes: http://tracker.ceph.com/issues/46978 Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
(cherry picked from commit c75734729764868c5c501722fc8de08dac9ebd4a)
Kefu Chai [Sat, 20 Mar 2021 05:00:01 +0000 (13:00 +0800)]
install-deps.sh: remove existing ceph-libboost of different version
we install different versions of precompiled ceph-libboost packages
for different branches when building and testing them on ubuntu test
nodes. for instance,
- nautilus, octopus: v1.72
- pacific: v1.73
they share the same set of test nodes. and these ceph-libboost packages
conflict with each other, because they install files to the same places.
in order to avoid the confliction, we should uninstall existing packages
before installing a different version of ceph-libboost packages.
ceph-libboost${version}-dev is a package providing the shared headers of
boost library, so, in this change we check if it is installed before
returning or removing the existing packages.
Sage Weil [Wed, 24 Feb 2021 20:59:57 +0000 (14:59 -0600)]
mon/OSDMonitor: fix safety/idempotency of {set,rm}-device-class
If the command is resent (e.g., due to network reconnect), the second
instance may find that the pending crush map already has the changes
and not wait for it to commit.
Note that the stderr message will be misleading in this case; that is a
problem with most of our mon commands. :(
Sage Weil [Sat, 13 Mar 2021 16:34:43 +0000 (11:34 -0500)]
osd: propagate base pool application_metadata to tiers
If there is application metadata on the base pool, it should be mirrored
to any other tiers in the set. This aligns with the fact that the
'ceph osd pool application ...' commands refuse to operate on a non-base
pool.
This fixes problems with accessing tiers (e.g., cache tiers) when the
cephx cap is written in terms of application metadata.
delete the part where _osd_in_out_completed_events_count()
was called in test_osd_cannot_recover() and revert to initial
state of the function since we don't need to use this function
in octopus. Also delete a duplicate of _osd_in_out_events_count().
This must be added by mistake in #39289 as well.
No need to fix for the backport in Nautilus: #38173
since the bugs are occured by adding additional code to
the cherry-pick specifically for Octopus.
Ilya Dryomov [Wed, 17 Mar 2021 10:00:33 +0000 (11:00 +0100)]
qa: krbd_blkroset.t: update for separate hw and user read-only flags
Since kernel 5.12, hardware read-only state and user read-only
policy (BLKROGET/SET ioctls) are tracked separately in the block
layer. As the purpose of our ->set_read_only() method was exactly
that, it was removed.
As a side effect, BLKROSET no longer returns EROFS on an attempt
to make a read-only mapping read-write with "blockdev --setrw".
The policy gets updated, but the device remains read-only as before
because the hardware (== mapping) state is controlled by the driver.
Neha Ojha [Tue, 9 Mar 2021 00:48:58 +0000 (00:48 +0000)]
pybind/mgr/balancer/module.py: assign weight-sets to all buckets before balancing
Add an additional check to make sure that the choose_args section has the same
number of buckets as the crushmap. If not, ensure that
get_compat_weight_set_weights assigns weight-sets to all buckets.
Without this change, if we end up with an orig_ws, which has fewer buckets
than the crushmap, the mgr will crash due a KeyError in do_crush_compat().
Yuval Lifshitz [Tue, 17 Nov 2020 11:31:59 +0000 (13:31 +0200)]
rgw/notification: trigger notifications on changes from any user
any user authorized to make changes to a bucket may trigger
notifications defined on that bucket.
manual test procedure of the fix is described here:
https://gist.github.com/yuvalif/39c183aa0f74d286ecef7844268817df
Xuehan Xu [Sun, 7 Feb 2021 04:40:36 +0000 (12:40 +0800)]
mgr: relax osd ok-to-stop condition on degraded pgs
Right now, the "ok-to-stop" condition is relatively rigorous, it allows
stopping an osd only if no PG on it is non-active or degraded. But there
are situations in which an OSD is part of a degraded pg and the pg still
still have > min_size complete replicas after the OSD is stopped.
In 9750061d5d4236aaba156d60790e0b8bcd7cfb64, we changed from considering
just acting to using avail_no_missing (OSDs that have no missing objects).
When the projected pg_acting is constructed this way, we can safely compare
to min_size... even for a PG marked degraded.