Xiubo Li [Wed, 23 Nov 2022 05:24:38 +0000 (13:24 +0800)]
qa: switch to https protocol for repos' server
The git:// protocol is no longer reachable, so switch to
https://.
`git archive` does not support the https protocol, so we can no longer
use it to retrieve the tarball. Instead, split this into 3 steps:
1. clone the whole ceph repo
2. check out the commit/tag/branch
3. then change directory to qa/workunits/.
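The three steps above can be sketched in Python with `subprocess`. This is a minimal illustration, not the qa task code: `fetch_workunits` is a hypothetical helper, the repository URL and ref used below are placeholders, and `dry_run=True` only returns the commands instead of running them.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple


def fetch_workunits(repo_url: str, ref: str, dest: str,
                    dry_run: bool = False) -> Tuple[List[List[str]], Path]:
    """Hypothetical helper: clone, check out `ref`, return the workunits dir.

    Replaces a `git archive --remote=...` call, which cannot be used over https.
    """
    commands = [
        # 1. clone the whole repository over https
        ["git", "clone", repo_url, dest],
        # 2. check out the requested commit/tag/branch inside the clone
        ["git", "-C", dest, "checkout", ref],
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    # 3. the caller then changes directory to qa/workunits/
    return commands, Path(dest) / "qa" / "workunits"
```

`git -C <dir>` runs git as if it had been started in `<dir>`, so the sketch never changes the working directory of the calling process; step 3 is left to the caller.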
Signed-off-by: Xiubo Li <xiubli@redhat.com>
(cherry picked from commit 89177d65988c56324916de8394089b6e4b38aab7)
Conflicts:
- qa/workunits/fs/snaps/snaptest-git-ceph.sh: minor conflicts
- qa/machine_types/schedule_subset.sh: no need to fix this
- qa/tasks/cephfs/xfstests_dev.py: minor conflicts
Kamoltat [Wed, 14 Dec 2022 19:54:00 +0000 (19:54 +0000)]
mon/Monitor.cc: notify_new_monmap() skips removal of non-exist rank
Problem:
In RHCS, the user can choose to manually remove a monitor rank
before shutting the monitor down, causing an inconsistency in the monmap.
For example, if we remove mon.a from the monmap, there is a short period
where mon.a is still operational and will try to remove itself from the
monmap, but we run into an assertion in
ConnectionTracker::notify_rank_removed().
Solution:
In Monitor::notify_new_monmap(), we prevent the function
from removing our own rank, or
ranks that don't exist in the monmap.
FYI: this is an RHCS-only problem; in ODF,
we never remove a monitor from the monmap
before shutting it down.
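A minimal model of the guard described above. `ranks_to_remove` and its `known_size` parameter are hypothetical; "ranks that don't exist" is interpreted here as ranks outside `0 .. known_size - 1`, which is an assumption of this sketch rather than the exact check in Monitor.cc.

```python
from typing import Iterable, List


def ranks_to_remove(removed_ranks: Iterable[int], my_rank: int,
                    known_size: int) -> List[int]:
    """Hypothetical model: filter removed_ranks before notifying the elector.

    known_size: number of ranks the tracker currently knows about (assumption).
    """
    safe: List[int] = []
    # handle the highest rank first so earlier removals do not shift
    # the indexes of ranks still to be processed (a choice of this sketch)
    for rank in sorted(set(removed_ranks), reverse=True):
        if rank == my_rank:
            continue  # never remove our own rank
        if not 0 <= rank < known_size:
            continue  # rank does not exist: nothing to remove
        safe.append(rank)
    return safe
```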
--mon-initial-members does nothing but cause the monmap
to populate ``removed_ranks``, because of the way we start
monitors in standalone tests: ``run_mon $dir $id ..``
on each mon. Regardless of --mon-initial-members=a,b,c, if
we set --mon-host=$MONA,$MONB,$MONC (which we do in every single test),
every time we run a monitor (e.g., run mon.b) it will pre-build
our monmap with the entries noname-a, b and noname-c.
Now, with --mon-initial-members=a,b,c, we are telling the
monmap that the initial members should be named
a, b and c, of which only `b` matches. So
``MonMap::set_initial_members`` removes
noname-a and noname-c, which
populates `removed_ranks`.
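A simplified model of the removal step, to show why only `b` survives and two ranks end up in `removed_ranks`. This is not the real `MonMap::set_initial_members`: here a rank is just the entry's position in the list at the moment it is removed, and the later step that adds missing initial members is not modeled.

```python
from typing import List, Tuple


def set_initial_members(names: List[str],
                        initial_members: List[str]) -> Tuple[List[str], List[int]]:
    """Simplified model: drop monmap entries whose name is not an initial member.

    Returns (remaining names, removed ranks).
    """
    if not initial_members:
        # assumption: with no initial members configured, nothing is removed
        return list(names), []
    names = list(names)          # do not mutate the caller's list
    removed_ranks: List[int] = []
    i = 0
    while i < len(names):
        if names[i] in initial_members:
            i += 1               # keep this entry
            continue
        removed_ranks.append(i)  # this is what populates removed_ranks
        names.pop(i)             # later entries shift down one position
    return names, removed_ranks
```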
Solution:
remove all instances of --mon-initial-members
in the standalone tests, as it has no impact on
the nature of the tests themselves.
When upgrading the monitors (including booting up),
we check whether `peer_tracker` is dirty. If
so, we clear it. Add some functions to the `Elector` and
`ConnectionTracker` classes to
check for a clean `peer_tracker`.
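A sketch of what a "clean" check could look like, modeled only on the description above. `PeerTracker` is a hypothetical stand-in for the tracker's state, and the exact checks in the real `ConnectionTracker` may differ.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PeerTracker:
    """Hypothetical stand-in for ConnectionTracker's state."""
    rank: int
    my_reports_rank: int
    peer_reports: Dict[int, dict] = field(default_factory=dict)

    def is_clean(self, mon_rank: int, monmap_size: int) -> bool:
        # the tracker's rank and our own report must agree with the monmap
        if self.rank != mon_rank or self.my_reports_rank != mon_rank:
            return False
        # no peer report may refer to a rank outside the monmap
        return all(0 <= r < monmap_size for r in self.peer_reports)
```

On startup, a monitor would clear the tracker whenever `is_clean(...)` returns False.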
Moreover, there could be some cases where, due
to startup weirdness or abnormal circumstances,
we might get a report from our own rank. Therefore,
it doesn't hurt to add a sanity check in
`ConnectionTracker::report_live_connection` and
`ConnectionTracker::report_dead_connection`.
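A minimal model of that sanity check. `LiveTracker` is hypothetical and only records alive/dead flags; the real tracker maintains scores, which are not modeled here.

```python
from typing import Dict


class LiveTracker:
    """Hypothetical model of the self-report sanity check."""

    def __init__(self, rank: int):
        self.rank = rank
        self.alive: Dict[int, bool] = {}

    def _record(self, peer_rank: int, alive: bool) -> bool:
        if peer_rank == self.rank:
            # a report about our own rank can only come from startup
            # weirdness or an abnormal state: ignore it
            return False
        self.alive[peer_rank] = alive
        return True

    def report_live_connection(self, peer_rank: int) -> bool:
        return self._record(peer_rank, True)

    def report_dead_connection(self, peer_rank: int) -> bool:
        return self._record(peer_rank, False)
```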
In `notify_clear_peer_state()`, we add another
mechanism for resetting our `peer_tracker.rank`
to match our own monitor.rank.
This is added so that there is a way for us
to recover from a scenario where `peer_tracker.rank`
is messed up by adjusting or removing
ranks.
`notify_clear_peer_state()` can be triggered
by using the command:
`ceph connection scores reset`
Also, in `clear_peer_reports`, besides
reassigning `my_reports` to an empty object,
we also have to set `my_reports.rank` to the `rank`
from `peer_tracker`, so that we don't get
-1 as the rank in `my_reports`.
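A sketch of the reset path described in the last two paragraphs. `Report` and `Tracker` are hypothetical stand-ins, and placing `notify_clear_peer_state` on the tracker is a simplification; the point is that a freshly constructed report carries rank -1 unless the rank is restored explicitly.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Report:
    rank: int = -1  # a default-constructed report has rank -1
    history: Dict[int, float] = field(default_factory=dict)


@dataclass
class Tracker:
    rank: int
    my_reports: Report = field(default_factory=Report)
    peer_reports: Dict[int, Report] = field(default_factory=dict)

    def clear_peer_reports(self) -> None:
        self.peer_reports.clear()
        self.my_reports = Report()        # fresh, empty report (rank -1) ...
        self.my_reports.rank = self.rank  # ... so restore our rank explicitly

    def notify_clear_peer_state(self, monitor_rank: int) -> None:
        # re-sync the tracker's rank with the monitor's real rank, then clear
        self.rank = monitor_rank
        self.clear_peer_reports()
```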
Kamoltat [Wed, 2 Nov 2022 01:59:52 +0000 (01:59 +0000)]
mon: change how we handle removed_ranks
When a new monitor joins, there is a chance that
it will receive a monmap that recently removed
a monitor, so ``removed_ranks`` will have some
content in it. A new monitor that joins
should never remove ranks from peer_tracker, but
should rather call ``notify_clear_peer_state()``
to reset the `peer_report`.
In the case where the monitor has joined quorum before and is only 1
epoch behind the newest monmap provided
by the monitor that replied to the probe, we can
actually remove and adjust ranks in `peer_report`,
since we always clear removed_ranks on every epoch
update: if there is any content in removed_ranks,
it must be because a rank is being removed in the
next epoch.
There is no point in keeping the content
of ``removed_ranks`` after the monmap gets updated
to that epoch.
Therefore, clear ``removed_ranks`` on every update.
When there is a discontinuity of more than 1 epoch between
monmaps, or the new monitor has never joined quorum before,
we always reset `peer_tracker`.
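The rule in the two paragraphs above reduces to a small decision. `peer_tracker_action` is a hypothetical function that only captures that rule, assuming it is called when a newer monmap (`new_epoch > my_epoch`) arrives.

```python
def peer_tracker_action(ever_joined_quorum: bool, my_epoch: int,
                        new_epoch: int) -> str:
    """Hypothetical model: how to treat removed_ranks from a newer monmap.

    Returns "adjust" when it is safe to remove/adjust ranks in the peer
    reports, or "reset" when peer_tracker must be cleared instead.
    Assumes new_epoch > my_epoch.
    """
    # removed_ranks is cleared on every epoch update, so its contents only
    # describe the transition from (new_epoch - 1) to new_epoch
    if ever_joined_quorum and new_epoch - my_epoch == 1:
        return "adjust"
    # brand-new monitor, or a gap of more than one epoch
    return "reset"
```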
Moreover, it is beneficial for the monitor log to also record
which ranks have been removed in the current epoch
of the monmap. So add removed_ranks to `print_summary`
and `dump` in MonMap.cc.
In `ConnectionTracker::receive_peer_report`,
we loop through ranks, which is bad when
`notify_rank_removed` has been called before this and
the ranks have not been adjusted yet. When we rely
on the rank in certain scenarios, we end up
with an extra copy of a peer_report, which we don't
want.
SOLUTION:
In `ConnectionTracker::receive_peer_report`,
instead of passing `report.rank` to the function
`ConnectionTracker::reports`, we pass `i.first`,
so that old ranks are trimmed properly.
We also added an assert in notify_rank_removed()
that, as a sanity check, compares the expected rank
provided by the monmap against the rank that we
adjust ourselves to.
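A minimal model of rank adjustment with that sanity check. `RankTracker` is hypothetical, and its peer reports are plain strings; it only shows the shift-down of higher ranks and the final comparison against the rank the monmap expects.

```python
from typing import Dict


class RankTracker:
    """Hypothetical model of rank adjustment in notify_rank_removed()."""

    def __init__(self, rank: int, peer_reports: Dict[int, str]):
        self.rank = rank
        self.peer_reports = dict(peer_reports)

    def notify_rank_removed(self, rank_removed: int, new_rank: int) -> None:
        adjusted: Dict[int, str] = {}
        for peer, report in self.peer_reports.items():
            if peer == rank_removed:
                continue  # drop the removed rank's report
            # ranks above the removed one shift down by one
            adjusted[peer - 1 if peer > rank_removed else peer] = report
        self.peer_reports = adjusted
        if rank_removed < self.rank:
            self.rank -= 1
        # sanity check: our adjusted rank must match what the monmap says
        assert self.rank == new_rank, (self.rank, new_rank)
```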
We edited test/mon/test_election.cc
to reflect the changes made in notify_rank_removed().
Ilya Dryomov [Thu, 22 Dec 2022 15:32:44 +0000 (16:32 +0100)]
qa: switch to curl for qemu-xfstests
This is a follow-up for commit 631899ffeb84 ("qa: switch back to git
protocol for qemu-xfstests"), needed for the same "ancient execution
environment" reason.
Kefu Chai [Sun, 18 Dec 2022 12:18:44 +0000 (20:18 +0800)]
pybind/mgr/tox.ini: add commas in "modules" variable
Since tox v4.0.13, variables are parsed differently, so the newlines
in a variable are passed straight to the command referencing it. So we now
get failures like:
```
flake8: commands[0] /home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr> flake8 --config=tox.ini alerts
flake8: commands[1] /home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr> balancer
flake8: exit 2 (0.00 seconds) /home/jenkins-build/build/workspace/ceph-pull-requests/src/pybind/mgr> balancer
flake8: FAIL ✖ in 3.33 seconds
```
So we have to add commas as line-continuation separators to address
this problem.
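The failure can be reproduced with a simplified model that uses the standard library's `configparser` as a stand-in for tox's own parser (an assumption; this is not tox's implementation). A multi-line value keeps its embedded newline, and once it is substituted into `commands`, treating each line as a separate command yields exactly the `commands[0]`/`commands[1]` split from the log above. How tox handles the added commas is not modeled here.

```python
import configparser
from typing import List

TOX_INI = """
[testenv:flake8]
modules = alerts
    balancer
commands =
    flake8 --config=tox.ini {modules}
"""


def expand_commands(ini_text: str) -> List[str]:
    """Simplified model: substitute `modules`, then one command per line."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["testenv:flake8"]
    # section["modules"] is "alerts\nbalancer": the newline is preserved
    expanded = section["commands"].replace("{modules}", section["modules"])
    return [line.strip() for line in expanded.splitlines() if line.strip()]
```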
Ilya Dryomov [Mon, 19 Dec 2022 17:54:08 +0000 (18:54 +0100)]
qa: switch back to git protocol for qemu-xfstests
As noted in commit 89177d65988c ("qa: switch to https protocol for
repos' server"), the git.ceph.com mirror doesn't make git:// available
anymore. However, run_xfstests-obsolete.sh has "obsolete" in its
name for a reason -- due to an ancient execution environment, git://
is the only viable option:
```
$ git clone https://git.ceph.com/xfstests-dev.git
Cloning into 'xfstests-dev'...
error: gnutls_handshake() failed: A TLS fatal alert has been received. while accessing https://git.ceph.com/xfstests-dev.git/info/refs
fatal: HTTP request failed
```
Kefu Chai [Sun, 18 Dec 2022 12:16:02 +0000 (20:16 +0800)]
pybind/mgr: s/setup(self)/setup_method(self)/
Avoid pytest warnings like:
```
4: pg_autoscaler/tests/test_cal_final_pg_target.py::TestPgAutoscaler::test_even_pools_one_meta_three_bulk
4: /home/kefu/dev/ceph/src/pybind/mgr/.tox/py3/lib/python3.10/site-packages/_pytest/fixtures.py:900: PytestRemovedIn8Warning: Support for nose tests is deprecated and will be removed in a future release.
4: pg_autoscaler/tests/test_cal_final_pg_target.py::TestPgAutoscaler::test_even_pools_one_meta_three_bulk is using nose-specific method: `setup(self)`
4: To remove this warning, rename it to `setup_method(self)`
4: See docs: https://docs.pytest.org/en/stable/deprecations.html#support-for-tests-written-for-nose
4: fixture_result = next(generator)
```
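For reference, pytest's xunit-style per-test hook looks like this. The class and its data are hypothetical, not the autoscaler tests; pytest calls `setup_method` before each test method and passes the test method as an argument, which is optional since pytest 3.0.

```python
class TestPoolTargets:
    """Hypothetical test class showing pytest's xunit-style per-test setup."""

    def setup_method(self, method=None):
        # called by pytest before every test method in this class;
        # unlike `setup(self)`, it does not rely on the deprecated nose support
        self.pools = {"meta": 1, "bulk": 3}

    def test_pool_count(self):
        assert len(self.pools) == 2
```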
Kefu Chai [Sun, 18 Dec 2022 12:15:06 +0000 (20:15 +0800)]
pybind/mgr/prometheus: avoid using distutils
To silence warnings like:
```
4: prometheus/module.py:35
4: /var/ssd/ceph/src/pybind/mgr/prometheus/module.py:35: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
4: v = StrictVersion(cherrypy.__version__)
```
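The warning points to `packaging.version`, which is a third-party package. One stdlib-only alternative for purely numeric versions is comparing integer tuples; this is a sketch of that option, not necessarily what this commit does, and `parse_version` is a hypothetical helper.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '3.2.2' into (3, 2, 2).

    Assumes every component is an integer; pre-release strings such as
    '18.0.0rc1' raise ValueError (packaging.version.Version handles those).
    """
    return tuple(int(part) for part in text.split("."))
```

Tuples compare element by element, so `(3, 2, 2) < (3, 2, 10)` holds, whereas comparing the raw strings would wrongly put `"3.2.10"` first.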
Adam Kupczyk [Wed, 14 Dec 2022 16:43:47 +0000 (16:43 +0000)]
os/bluestore: BlueFS: harmonize log read and writes modes
The BlueFS log has always been written in non-buffered mode, while
reading it depends on the bluefs_buffered_io option.
This mismatch is strongly suspected to cause some weird problems.
This fix is targeted directly at pacific.
Ultimately, the same fix will go to all versions.
The problem is severe, but happens very infrequently, mostly in containerized
environments. There are many tracker issues that we suspect are caused by this.
To find them, we have the "problem-detection" PR #49198 in main.
Then we will apply an equivalent solution there too.
Patrick Donnelly [Mon, 12 Dec 2022 20:52:00 +0000 (15:52 -0500)]
Merge PR #47891 into pacific
* refs/pull/47891/head:
qa: add a upgrade test suite from nautilus and test the new getvxattr op
qa: make filesystem to be compatible with nautilus for blocklist
qa: make filesystem to be compatible with nautilus when creating pools
test/libcephfs: add newops test case
client: fail the request if the peer MDS doesn't support getvxattr op
mds: add CEPHFS_FEATURE_OP_GETVXATTR feature bit support
mds: notify clients if the session has already opened
Rename/re-symlink whitelist_*.yaml
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Adam King [Fri, 9 Dec 2022 16:10:36 +0000 (11:10 -0500)]
mgr/pybind: fix mypy arg parsing
On the new tox version, each line is treated as a new command,
so it runs something like "mypy --config-file=../../mypy.ini"
as one command and then "-m balancer" as a totally separate command.
The first one immediately fails, as it doesn't include any modules
to test. Adding backslashes to the ends of the lines makes tox
handle the lines as one long command.
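A simplified model of the behavior described above: each line of `commands` is its own command unless it ends with a backslash, in which case it is joined with the next line. `split_commands` is hypothetical and ignores quoting, substitutions and every other tox parsing rule.

```python
from typing import List


def split_commands(commands_value: str) -> List[str]:
    """Simplified model: one command per line, trailing backslash joins lines."""
    result: List[str] = []
    pending = ""
    for raw in commands_value.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith("\\"):
            # drop the backslash and continue the command on the next line
            pending += line[:-1].rstrip() + " "
            continue
        result.append(pending + line)
        pending = ""
    if pending:
        result.append(pending.rstrip())
    return result
```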
Kefu Chai [Thu, 8 Dec 2022 06:42:42 +0000 (14:42 +0800)]
qa: set locale to C.UTF-8 in tox.ini
Ansible uses UTF-8 encoded characters in its file names, so,
to avoid failures like:
```
  File "/home/jenkins-build/build/workspace/ceph-pull-requests/qa/.tox/py3/lib/python3.10/site-packages/pip/_internal/utils/unpacking.py", line 217, in untar_file
    with open(path, "wb") as destfp:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 137-140: ordinal not in range(256)
```
we have to set a locale which is able to handle UTF-8.
See also https://github.com/ceph/teuthology/pull/1671
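The traceback above comes down to a file name that the active encoding cannot represent. A small illustration with a made-up file name (it does not show how the locale is set in tox.ini): the same name fails under latin-1 but encodes fine as UTF-8.

```python
def can_encode(name: str, codec: str) -> bool:
    """Return True if `name` can be represented in `codec`."""
    try:
        name.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# hypothetical file name containing non-Latin-1 characters
NAME = "roles/模块/tasks.yml"
```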
Kefu Chai [Thu, 8 Dec 2022 10:25:20 +0000 (18:25 +0800)]
pybind/mgr: drop cython from requires
Cython is not required for running tox commands.
This should address test failures like:
ROOT: will run in automatically provisioned tox, host /home/jenkins-build/build/workspace/ceph-pull-requests/build/mgr-virtualenv/bin/python3.10 is missing [requires (has)]: cython
Kefu Chai [Thu, 8 Dec 2022 06:53:33 +0000 (14:53 +0800)]
*: s/whitelist_externals/allowlist_externals/
allowlist_externals was introduced in
tox v4.0 (see
https://github.com/tox-dev/tox/commit/5e33fda1a40ffb4973de3d607a572891eb3cb2d2), but
this option was backported to 3.18 as an alias of whitelist_externals, so we don't need
to set minversion to 4.0 in this change.
Since we started using tox 4.0 and up (v4.0.2 specifically), tox complains
and fails like:
```
alerts-lint: failed with promtool is not allowed, use allowlist_externals to allow it
alerts-lint: FAIL code 1 (9.25 seconds)
```
See https://tox.wiki/en/latest/faq.html#tox-4-removed-tox-ini-keys
and https://tox.wiki/en/latest/config.html#allowlist_externals
It'd also be nice to use more inclusive language. So, in this change,
s/whitelist_externals/allowlist_externals/ in all tox.ini files in this
project.
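The rename across every tox.ini can be scripted. This is a minimal sketch with a hypothetical helper `rename_externals`; it only rewrites the key when it appears at the start of a line, optionally indented, followed by `=`.

```python
import re
from pathlib import Path
from typing import List

# the key at the start of a line (optionally indented), followed by "="
_KEY = re.compile(r"^(\s*)whitelist_externals(\s*=)", re.MULTILINE)


def rename_externals(root: str) -> List[Path]:
    """Hypothetical helper: rewrite whitelist_externals in every tox.ini under root.

    Returns the files that were changed.
    """
    changed: List[Path] = []
    for path in sorted(Path(root).rglob("tox.ini")):
        text = path.read_text()
        new_text = _KEY.sub(r"\1allowlist_externals\2", text)
        if new_text != text:
            path.write_text(new_text)
            changed.append(path)
    return changed
```

Running it a second time changes nothing, since no `whitelist_externals` keys remain.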