Xiubo Li [Wed, 18 May 2022 04:59:38 +0000 (12:59 +0800)]
mds: wait unlink to finish to avoid conflict when creating same dentries
If the previous unlink request has been delayed for some reason,
a new create for the same dentry may fail, or a new open may
succeed but the new contents written to it will be lost.
The kernel client makes sure that it won't send subsequent create
requests for the same dentry before the unlink gets its first reply.
Here we need to make sure that the dentry is marked as unlinking
before the first reply is sent out.
Fixes: https://tracker.ceph.com/issues/55332
Signed-off-by: Xiubo Li <xiubli@redhat.com>
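The ordering described in the unlink commit above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the MDS implementation: `Dentry`, `ToyMDS` and all of their methods are hypothetical names invented for illustration. It shows the dentry being marked as unlinking before the first reply goes out, and a create for the same dentry being deferred until the unlink finishes.

```python
class Dentry:
    """Hypothetical stand-in for a directory entry."""

    def __init__(self, name):
        self.name = name
        self.linked = True
        self.unlinking = False
        self.waiters = []  # callbacks to run once the unlink finishes


class ToyMDS:
    """Hypothetical model of the request ordering, not real MDS code."""

    def __init__(self):
        self.log = []

    def handle_unlink(self, dn):
        # Mark the dentry first, then send the first reply, so that any
        # create arriving after the reply already sees the marker.
        dn.unlinking = True
        self.log.append(("unlink_first_reply", dn.name))

    def finish_unlink(self, dn):
        dn.linked = False
        dn.unlinking = False
        self.log.append(("unlink_done", dn.name))
        # Retry every request that was waiting on this unlink.
        waiters, dn.waiters = dn.waiters, []
        for retry in waiters:
            retry()

    def handle_create(self, dn):
        if dn.unlinking:
            # Defer instead of racing with the in-flight unlink.
            dn.waiters.append(lambda: self.handle_create(dn))
            self.log.append(("create_deferred", dn.name))
            return
        dn.linked = True
        self.log.append(("create_done", dn.name))
```

In this model, calling `handle_unlink()`, then `handle_create()`, then `finish_unlink()` on the same dentry logs the create as deferred and completes it only after the unlink is done.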
Yingxin Cheng [Sun, 7 Aug 2022 08:05:42 +0000 (16:05 +0800)]
crimson/os/seastore: construct TransactionManager classes after device mount
Construct TransactionManager after all the devices are discovered.
Also, it makes the following cleanups possible:
* Cleanup SeaStore and TransactionManager factory methods.
* Decouple TransactionManager from SegmentManagerGroup.
* Drop the unnecessary tm_make_config_t.
* Drop the unnecessary add_device() methods.
Casey Bodley [Wed, 10 Aug 2022 22:23:34 +0000 (18:23 -0400)]
ceph.spec.in: install gcc-toolset-11-libatomic-devel in x86_64 also
otherwise after enabling gcc-toolset-11, cmake fails with:
- Performing Test HAVE_LIBATOMIC - Failed
CMake Error at cmake/modules/CheckCxxAtomic.cmake:66 (message):
Host compiler /opt/rh/gcc-toolset-11/root/usr/bin/g++ requires libatomic,
but it is not found
Adam King [Wed, 10 Aug 2022 15:44:55 +0000 (11:44 -0400)]
Merge pull request #46400 from rkachach/fix_issue_55733
mgr/cephadm: adding dynamic prometheus configuration based on http_sd_config
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Kefu Chai [Wed, 10 Aug 2022 14:35:14 +0000 (22:35 +0800)]
ceph.spec.in: %enable_devtoolset11 only if the macro is defined
there is a chance that we are using `yum-builddep` to prepare the
build dependencies. in that case, gcc-toolset-11-build is not
installed. it's like a chicken-and-egg dilemma, but the point is
`yum-builddep` is able to pull in the gcc-toolset-11-build. once
gcc-toolset-11-build is installed, we will have the %enable_devtoolset11
rpm macro.
Ernesto Puerta [Fri, 5 Aug 2022 08:56:36 +0000 (10:56 +0200)]
.github/workflows: add create-backport action
Currently there's a cron job in a teuthology VM running a script to find all trackers in needs-backports and to create their corresponding backport trackers. This is done through the [Backport Bot](https://tracker.ceph.com/users/12172) Redmine account.
This PR intends to run this cron job as a periodic GitHub Action.
pybind/mgr: maximum recursion depth exceeded in comparison
Original issue from https://github.com/python/typing_extensions/issues/10
If typing-extensions is upgraded to 4.1.1
(https://pypi.org/project/typing-extensions/4.1.1/, the last version
that supports Python 3.6), the Ceph MGR fails in pg_autoscaler as shown below:
[root@cp-nightsky ~]# pip list | grep typing-extensions
typing-extensions 4.1.1
[root@cp-nightsky ~]# journalctl -xef -u ceph-mgr@cp-nightsky.service
May 12 06:16:39 cp-nightsky.novalocal systemd[1]: Started Ceph cluster manager daemon.
-- Subject: Unit ceph-mgr@cp-nightsky.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-mgr@cp-nightsky.service has finished starting up.
--
-- The start-up result is done.
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.181+0000 7fe3eb521700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'pg_autoscaler' while running on mgr.cp-nightsky: maximum recursion depth exceeded in comparison
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.182+0000 7fe3eb521700 -1 pg_autoscaler.serve:
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.182+0000 7fe3eb521700 -1 Traceback (most recent call last):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 206, in serve
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: self._maybe_adjust()
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 423, in _maybe_adjust
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: ps, root_map, pool_root = self._get_pool_status(osdmap, pools)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 325, in _get_pool_status
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: self.log.debug('skipping empty subtree %s', cr_name)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/logging/__init__.py", line 1296, in debug
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: self._log(DEBUG, msg, args, **kwargs)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/logging/__init__.py", line 1443, in _log
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: exc_info, func, extra, sinfo)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/logging/__init__.py", line 1413, in makeRecord
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: sinfo)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/logging/__init__.py", line 277, in __init__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if (args and len(args) == 1 and isinstance(args[0], collections.Mapping)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/abc.py", line 193, in __instancecheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: return cls.__subclasscheck__(subclass)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/abc.py", line 228, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/abc.py", line 228, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/typing.py", line 1154, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: return super().__subclasscheck__(cls)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/abc.py", line 228, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib/python3.6/site-packages/typing_extensions.py", line 426, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib/python3.6/site-packages/typing_extensions.py", line 426, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib/python3.6/site-packages/typing_extensions.py", line 426, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: [Previous line repeated 239 more times]
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib/python3.6/site-packages/typing_extensions.py", line 421, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if self.__extra__ in subclass.__mro__:
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: RecursionError: maximum recursion depth exceeded in comparison
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
The solution provided in
https://github.com/python/typing_extensions/issues/10#issuecomment-1131767191
does not work:
Reviewed-by: Kefu Chai <tchaikov@gmail.com>
Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
Reviewed-by: Matt Benjamin <mbenjamin@redhat.com>
Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Kefu Chai [Mon, 8 Aug 2022 14:41:17 +0000 (22:41 +0800)]
pybind/mgr/dashboard: do not use distutils.version.StrictVersion
replace `distutils.version.StrictVersion` with
`pkg_resources.parse_version()`
as the former is deprecated, see https://peps.python.org/pep-0632/.
let's use `pkg_resources` instead. this change also addresses
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1010894.
we have this issue when testing with an ubuntu jammy test node.
see https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1967139
When a network mount is present in `/proc/mounts` but for any reason
the corresponding server is down, this function hangs forever.
In a cluster deployed with cephadm, the consequence is that
it triggers `ceph-volume inventory` commands that hang and stay in D
state.
The idea here is to use a thread with a timeout to abort the call if the
timeout is reached.
`get_mounts()` is now a method of a class so that a path can be excluded
altogether for the whole `inventory` execution (otherwise,
ceph-volume would try to access it once for every device on the
host, which could slow down the inventory execution).
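The thread-with-timeout idea and the per-run exclusion of a path can be sketched as follows. This is a minimal illustration, not the ceph-volume code: `call_with_timeout`, `MountProbe` and the `probe` callable are hypothetical names, and the only assumption is that the blocking call can be wrapped in a plain Python function.

```python
import threading


def call_with_timeout(func, args=(), timeout=1.0):
    """Run func(*args) in a daemon thread.

    Returns (finished, value). If the call does not return within
    `timeout` seconds, the thread is abandoned and (False, None) is returned.
    """
    result = {}

    def worker():
        try:
            result["value"] = func(*args)
        except Exception as exc:  # hand the error back to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return False, None
    if "error" in result:
        raise result["error"]
    return True, result["value"]


class MountProbe:
    """Remembers paths that timed out so they are probed only once."""

    def __init__(self, timeout=1.0):
        self.timeout = timeout
        self.excluded = set()

    def is_reachable(self, path, probe):
        # A path that already timed out is skipped for the rest of the run.
        if path in self.excluded:
            return False
        finished, _ = call_with_timeout(probe, (path,), self.timeout)
        if not finished:
            self.excluded.add(path)
        return finished
```

Note that a Python thread cannot be killed from outside: the caller stops waiting, but the worker thread stays blocked in the background. Marking it as a daemon thread only ensures that it does not keep the process alive at exit.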
Michaela Lang [Thu, 4 Aug 2022 17:14:49 +0000 (19:14 +0200)]
- mgr/cephadm: provide an additional hint when running into I/O closed exception from execnet seen as
Error EINVAL: Can't communicate with remote host `127.0.0.1`, possibly because python3 is not installed there: cannot send (already closed?)
Kefu Chai [Mon, 8 Aug 2022 12:40:52 +0000 (20:40 +0800)]
ceph.spec.in: add libatomic to BuildRequires on fedora
otherwise we'd have failures like
/opt/compiler-explorer/gcc-trunk-20220808/bin/../lib/gcc/x86_64-linux-gnu/13.0.0/../../../../x86_64-linux-gnu/bin/ld:
/tmp/ccVlMbVh.o: in function `std::atomic<tagged_ptr>::store(tagged_ptr,
std::memory_order)':
/opt/compiler-explorer/gcc-trunk-20220808/include/c++/13.0.0/atomic:273:
undefined reference to `__atomic_store_16'
when generating the build system using CMake on fedora 36.
Kefu Chai [Sat, 6 Aug 2022 10:26:44 +0000 (18:26 +0800)]
crimson/os: rewrite ordering using std::strong_ordering
the goals are
1. to use std::strong_ordering machinery for better readability
and maintainability
2. to remove unnecessary abstraction
3. to use concepts for better error messages and readability.
changes:
* replace MatchKindCMP with std::strong_ordering
* replace compare_to() with operator<=>
* introduce a concept `IsFullKey` so we can use it when developing
generic facilities that operate on either a materialized key or a view.
* use `IsFullKey` in place of `KeyT` when appropriate
Kefu Chai [Sat, 6 Aug 2022 00:24:12 +0000 (08:24 +0800)]
mgr/dashboard: bump up teuthology
to include the fix of e7c5d67e10fe29da22180f9e09b8973ae166c8fc,
see https://github.com/ceph/teuthology/pull/1746.
this addresses the test failure on ubuntu jammy, where we have python3.10.