Yingxin Cheng [Sun, 7 Aug 2022 08:05:42 +0000 (16:05 +0800)]
crimson/os/seastore: construct TransactionManager classes after device mount
Construct TransactionManager after all the devices have been discovered.
Also, it makes the following cleanups possible:
* Clean up SeaStore and TransactionManager factory methods.
* Decouple TransactionManager from SegmentManagerGroup.
* Drop the unnecessary tm_make_config_t.
* Drop the unnecessary add_device() methods.
Casey Bodley [Wed, 10 Aug 2022 22:23:34 +0000 (18:23 -0400)]
ceph.spec.in: install gcc-toolset-11-libatomic-devel in x86_64 also
otherwise, after enabling gcc-toolset-11, cmake fails with:
- Performing Test HAVE_LIBATOMIC - Failed
CMake Error at cmake/modules/CheckCxxAtomic.cmake:66 (message):
Host compiler /opt/rh/gcc-toolset-11/root/usr/bin/g++ requires libatomic,
but it is not found
Adam King [Wed, 10 Aug 2022 15:44:55 +0000 (11:44 -0400)]
Merge pull request #46400 from rkachach/fix_issue_55733
mgr/cephadm: adding dynamic prometheus configuration based on http_sd_config
Reviewed-by: Adam King <adking@redhat.com> Reviewed-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Kefu Chai [Wed, 10 Aug 2022 14:35:14 +0000 (22:35 +0800)]
ceph.spec.in: %enable_devtoolset11 only if the macro is defined
there is a chance that we are using `yum-builddep` to prepare the
build dependencies. in that case, gcc-toolset-11-build is not yet
installed, so the %enable_devtoolset11 rpm macro it provides is not
defined either. it's a chicken-and-egg dilemma, but the point is that
`yum-builddep` is able to pull in gcc-toolset-11-build itself, as long
as the spec does not fail on the undefined macro. once
gcc-toolset-11-build is installed, we will have the %enable_devtoolset11
rpm macro.
Ernesto Puerta [Fri, 5 Aug 2022 08:56:36 +0000 (10:56 +0200)]
.github/workflows: add create-backport action
Currently there's a cron job in a teuthology VM running a script to find all trackers in needs-backports and to create their corresponding backport trackers. This is done through the [Backport Bot](https://tracker.ceph.com/users/12172) Redmine account.
This PR runs that task as a periodic GitHub Action instead.
pybind/mgr: maximum recursion depth exceeded in comparison
Original issue from https://github.com/python/typing_extensions/issues/10
If typing-extensions is upgraded to 4.1.1
(https://pypi.org/project/typing-extensions/4.1.1/, the last version to
support Python 3.6), the Ceph MGR pg_autoscaler module fails as shown below:
[root@cp-nightsky ~]# pip list | grep typing-extensions
typing-extensions 4.1.1
[root@cp-nightsky ~]# journalctl -xef -u ceph-mgr@cp-nightsky.service
May 12 06:16:39 cp-nightsky.novalocal systemd[1]: Started Ceph cluster manager daemon.
-- Subject: Unit ceph-mgr@cp-nightsky.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-mgr@cp-nightsky.service has finished starting up.
--
-- The start-up result is done.
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.181+0000 7fe3eb521700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'pg_autoscaler' while running on mgr.cp-nightsky: maximum recursion depth exceeded in comparison
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.182+0000 7fe3eb521700 -1 pg_autoscaler.serve:
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.182+0000 7fe3eb521700 -1 Traceback (most recent call last):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 206, in serve
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: self._maybe_adjust()
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 423, in _maybe_adjust
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: ps, root_map, pool_root = self._get_pool_status(osdmap, pools)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 325, in _get_pool_status
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: self.log.debug('skipping empty subtree %s', cr_name)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/logging/__init__.py", line 1296, in debug
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: self._log(DEBUG, msg, args, **kwargs)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/logging/__init__.py", line 1443, in _log
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: exc_info, func, extra, sinfo)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/logging/__init__.py", line 1413, in makeRecord
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: sinfo)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/logging/__init__.py", line 277, in __init__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if (args and len(args) == 1 and isinstance(args[0], collections.Mapping)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/abc.py", line 193, in __instancecheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: return cls.__subclasscheck__(subclass)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/abc.py", line 228, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/abc.py", line 228, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/typing.py", line 1154, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: return super().__subclasscheck__(cls)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/abc.py", line 228, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib/python3.6/site-packages/typing_extensions.py", line 426, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib/python3.6/site-packages/typing_extensions.py", line 426, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib/python3.6/site-packages/typing_extensions.py", line 426, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: [Previous line repeated 239 more times]
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib/python3.6/site-packages/typing_extensions.py", line 421, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if self.__extra__ in subclass.__mro__:
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: RecursionError: maximum recursion depth exceeded in comparison
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
Solution provided by
https://github.com/python/typing_extensions/issues/10#issuecomment-1131767191
not working:
Reviewed-by: Kefu Chai <tchaikov@gmail.com> Reviewed-by: Daniel Gryniewicz <dang@redhat.com> Reviewed-by: Matt Benjamin <mbenjamin@redhat.com> Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Kefu Chai [Mon, 8 Aug 2022 14:41:17 +0000 (22:41 +0800)]
pybind/mgr/dashboard: do not use distutils.version.StrictVersion
replace `distutils.version.StrictVersion` with
`pkg_resources.parse_version()`,
as the former is deprecated, see https://peps.python.org/pep-0632/.
this change also addresses
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1010894,
which we hit when testing on an ubuntu jammy test node.
see https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1967139
When a network mount is listed in `/proc/mounts` but the corresponding
server is down for any reason, `get_mounts()` hangs forever.
In a cluster deployed with cephadm, the consequence is that the
`ceph-volume inventory` commands cephadm triggers hang and stay in D
state (uninterruptible sleep).
The idea here is to use a thread with a timeout to abort the call if the
timeout is reached.
`get_mounts()` is now a method of a class, so a path that timed out can be
excluded for the rest of the `inventory` execution (otherwise,
ceph-volume would try to access it once for every device on the
host, which could slow down the inventory execution).
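A minimal Python sketch of that approach, assuming the blocking operation is a per-mount-point probe such as `os.stat()`. `call_with_timeout()`, `MountLister` and `fake_probe` are hypothetical names for illustration, not ceph-volume's actual code. Since Python cannot kill a thread, a timed-out call is abandoned rather than truly aborted.

```python
import os
import threading
import time
from typing import Any, Callable, Dict, Iterable, Set


def call_with_timeout(func: Callable[[Any], Any], arg: Any, timeout: float) -> Any:
    """Run func(arg) in a worker thread and wait at most `timeout` seconds.

    On timeout the worker is abandoned, not killed; daemon=True means
    interpreter shutdown will not wait for it.
    """
    outcome: Dict[str, Any] = {}

    def worker() -> None:
        try:
            outcome["result"] = func(arg)
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        raise TimeoutError(f"call on {arg!r} did not return within {timeout}s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["result"]


class MountLister:
    """Hypothetical stand-in for a class owning get_mounts().

    Mount points whose probe times out are remembered in `excluded`, so the
    rest of the inventory run skips them instead of hanging once per device.
    """

    def __init__(self, timeout: float = 3.0,
                 probe: Callable[[str], Any] = os.stat) -> None:
        self.timeout = timeout
        self.probe = probe  # assumption: the call that blocks is a stat() of the mount point
        self.excluded: Set[str] = set()

    def get_mounts(self, lines: Iterable[str]) -> Dict[str, str]:
        """Map mount point -> device from /proc/mounts-style lines."""
        mounts: Dict[str, str] = {}
        for line in lines:
            fields = line.split()
            if len(fields) < 2:
                continue
            device, path = fields[0], fields[1]
            if path in self.excluded:
                continue  # timed out earlier in this run: don't touch it again
            try:
                call_with_timeout(self.probe, path, self.timeout)
            except TimeoutError:  # must precede OSError: TimeoutError subclasses it
                self.excluded.add(path)
                continue
            except OSError:
                continue  # path failed fast (e.g. it vanished): skip it
            mounts[path] = device
        return mounts


if __name__ == "__main__":
    def fake_probe(path: str) -> str:
        if path == "/mnt/nfs":
            time.sleep(5)  # simulate a mount whose server is down
        return path

    lister = MountLister(timeout=0.2, probe=fake_probe)
    lines = ["/dev/sda1 / ext4 rw 0 0", "server:/export /mnt/nfs nfs rw 0 0"]
    print(lister.get_mounts(lines))  # {'/': '/dev/sda1'}
    print(lister.excluded)           # {'/mnt/nfs'}
```

A plain `threading.Thread` with `join(timeout)` is used here rather than `concurrent.futures.ThreadPoolExecutor`, because executor worker threads are joined at interpreter exit, so a permanently hung call could block shutdown.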
Michaela Lang [Thu, 4 Aug 2022 17:14:49 +0000 (19:14 +0200)]
mgr/cephadm: provide an additional hint when running into the "I/O closed" exception from execnet, seen as:
Error EINVAL: Can't communicate with remote host `127.0.0.1`, possibly because python3 is not installed there: cannot send (already closed?)
Kefu Chai [Mon, 8 Aug 2022 12:40:52 +0000 (20:40 +0800)]
ceph.spec.in: add libatomic to BuildRequires on fedora
otherwise we'd have failures like
/opt/compiler-explorer/gcc-trunk-20220808/bin/../lib/gcc/x86_64-linux-gnu/13.0.0/../../../../x86_64-linux-gnu/bin/ld:
/tmp/ccVlMbVh.o: in function `std::atomic<tagged_ptr>::store(tagged_ptr,
std::memory_order)':
/opt/compiler-explorer/gcc-trunk-20220808/include/c++/13.0.0/atomic:273:
undefined reference to `__atomic_store_16'
when generating the build system using CMake on fedora 36.
Kefu Chai [Sat, 6 Aug 2022 10:26:44 +0000 (18:26 +0800)]
crimson/os: rewrite ordering using std::strong_ordering
the goals are
1. to use the std::strong_ordering machinery for better readability
and maintainability
2. to remove unnecessary abstraction
3. to use concepts for better error messages and readability.
changes:
* replace MatchKindCMP with std::strong_ordering
* replace compare_to() with operator<=>
* introduce a concept `IsFullKey` so we can use it when developing
generic facilities that operate on either a materialized key or a view.
* use `IsFullKey` in place of `KeyT` when appropriate
Kefu Chai [Sat, 6 Aug 2022 00:24:12 +0000 (08:24 +0800)]
mgr/dashboard: bump up teuthology
to include the fix in e7c5d67e10fe29da22180f9e09b8973ae166c8fc,
see https://github.com/ceph/teuthology/pull/1746.
this addresses the test failure on ubuntu jammy, where we have python3.10.
Laura Flores [Fri, 5 Aug 2022 16:34:20 +0000 (11:34 -0500)]
.github/workflows: increase operations-per-run to 100 in stale bot
The stale bot's `operations-per-run`
(https://github.com/actions/stale#operations-per-run) corresponds to the max
number of API calls it is allowed to make per hour. Currently,
`operations-per-run` is set to 30, which means that the stale bot
can make up to 30 API calls per hour.
With this limit in place, the stale bot is only able to process 400 PRs at a time.
Since there are 900+ PRs in the Ceph repository, we should increase the number of
operations to cover them all. This needs to be done with care, though, since GitHub
has a rate limit
(https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting)
that depends on the account's plan.
According to GitHub's documentation on requests from GitHub Actions
(https://docs.github.com/en/rest/overview/resources-in-the-rest-api#requests-from-github-actions),
the rate limit is 1,000 requests per hour per repository when using `GITHUB_TOKEN` (which we are).
For enterprise accounts, GitHub Enterprise Cloud's rate limit applies, and the limit is 15,000
requests per hour per repository.
Based on this information, it should be safe to increase the max `operations-per-run`
to 100. This would cover a little over 1000 PRs, which should be enough
to process the 900-some-odd PRs in the Ceph repository.