Sungmin Lee [Tue, 23 Aug 2022 04:51:31 +0000 (13:51 +0900)]
qa: add validation stage for deduplication.py
To validate sample-dedup actually works, validate() runs
separated thread from sample-dedup and verifies
two following things.
1. check sample-dedup starts properly.
2. check references of all the chunk objects' in chunk tier
exists in designated base pool.
This routune repeats for max_valication_cnt times while
sample-dedup is running. If it doesn't raise any fail while the loop,
we can pretend sample-dedup works accurately.
If not, assert() will stop this test.
In case that a reference of chunk object doesn't exist in base pool,
validate() gives a second chance after repairing it (chunk-repair op)
to deal with false-positive reference inconsistency.
Signed-off-by: Sungmin Lee <sung_min.lee@samsung.com>
JinyongHa [Mon, 21 Feb 2022 02:09:17 +0000 (02:09 +0000)]
dedup-tool: add sampling ratio to crawling
By using lower sampling ratio, runtime deduplication with lower overhead.
It tries deduplication with sampled few objects in base-pool, and
only deduplicates objects or chunks which are highly duplicated among
samples.
The option "sampling-ratio" is used for controling the ratio of the objects
to be picked.
JinyongHa [Fri, 25 Feb 2022 06:48:36 +0000 (06:48 +0000)]
dedup-tool: add basic crawling
Create crawling threads which crawl objects in base pool and deduplicate
based on their deduplication efficiency. Crawler samples objects and finds
duplicated chunks within the samples. It regards an object which has
duplicated chunks higher than object_dedup_threshold value as an efficient
object to be deduplicated. Besides the chunk which is duplicated more than
chunk_dedup_threshold times is also deduplicated.
The commit contains basic crawling which crawls all objects in base pool
instead of sampling among the objects.
[usage]
ceph_dedup_tool --op sample-dedup --pool POOL --chunk-pool POOL \
--chunk-altorithm ALGO --fingerprint-algorithm FP \
--object-dedup-threshold <percentile> --chunk-dedup-threshold <number>
Yingxin Cheng [Sun, 7 Aug 2022 08:05:42 +0000 (16:05 +0800)]
crimson/os/seastore: construct TransactionManager classes after device mount
To construct TransactionManager after all the devices are discoverred.
Also, it makes the following cleanups possible:
* Cleanup SeaStore and TransactionManager factory methods.
* Decouple TransactionManager from SegmentManagerGroup.
* Drop the unnecessary tm_make_config_t.
* Drop the unnecessary add_device() methods.
Casey Bodley [Wed, 10 Aug 2022 22:23:34 +0000 (18:23 -0400)]
ceph.spec.in: install gcc-toolset-11-libatomic-devel in x86_64 also
otherwise after enabling gcc-toolset-11, cmake fails with:
- Performing Test HAVE_LIBATOMIC - Failed
CMake Error at cmake/modules/CheckCxxAtomic.cmake:66 (message):
Host compiler /opt/rh/gcc-toolset-11/root/usr/bin/g++ requires libatomic,
but it is not found
Adam King [Wed, 10 Aug 2022 15:44:55 +0000 (11:44 -0400)]
Merge pull request #46400 from rkachach/fix_issue_55733
mgr/cephadm: adding dynamic prometheus configuration based on http_sd_config
Reviewed-by: Adam King <adking@redhat.com> Reviewed-by: Anthony D'Atri <anthonyeleven@users.noreply.github.com> Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Kefu Chai [Wed, 10 Aug 2022 14:35:14 +0000 (22:35 +0800)]
ceph.spec.in: %enable_devtoolset11 only if the macro is defined
there is chance that we are using `yum-builddep` to prepare the
build dependencies. in that case, gcc-toolset-11-build is not
installed. it's like a chicken-egg dilemma, but the point is
`yum-builddep` is able to pull in the gcc-toolset-11-build. once
gcc-toolset-11-build is installed, we will have the %enable_devtoolset11
rpm macro.
Ernesto Puerta [Fri, 5 Aug 2022 08:56:36 +0000 (10:56 +0200)]
.github/workflows: add create-backport action
Currently there's a cron job in a teuthology VM running a script to find all trackers in needs-backports and to create their corresponding backport trackers. This is done through the [Backport Bot](https://tracker.ceph.com/users/12172) Redmine account.
This PR intends to run this cron job task as a periodic Github Action.
pybind/mgr: maximum recursion depth exceeded in comparison
Original issue from https://github.com/python/typing_extensions/issues/10
In case typing-extensions is upgrade to 4.1.1
(https://pypi.org/project/typing-extensions/4.1.1/, the final Python 3.6
supported version), Ceph MGR will failed with pg_autoscaler as below:
[root@cp-nightsky ~]# pip list | grep typing-extensions
typing-extensions 4.1.1
[root@cp-nightsky ~]# journalctl -xef -u ceph-mgr@cp-nightsky.service
May 12 06:16:39 cp-nightsky.novalocal systemd[1]: Started Ceph cluster manager daemon.
-- Subject: Unit ceph-mgr@cp-nightsky.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-mgr@cp-nightsky.service has finished starting up.
--
-- The start-up result is done.
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.181+0000 7fe3eb521700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'pg_autoscaler' while running on mgr.cp-nightsky: maximum recursion depth exceeded in comparison
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.182+0000 7fe3eb521700 -1 pg_autoscaler.serve:
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.182+0000 7fe3eb521700 -1 Traceback (most recent call last):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 206, in serve
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: self._maybe_adjust()
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 423, in _maybe_adjust
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: ps, root_map, pool_root = self._get_pool_status(osdmap, pools)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/usr/share/ceph/mgr/pg_autoscaler/module.py", line 325, in _get_pool_status
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: self.log.debug('skipping empty subtree %s', cr_name)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/logging/__init__.py", line 1296, in debug
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: self._log(DEBUG, msg, args, **kwargs)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/logging/__init__.py", line 1443, in _log
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: exc_info, func, extra, sinfo)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/logging/__init__.py", line 1413, in makeRecord
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: sinfo)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/logging/__init__.py", line 277, in __init__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if (args and len(args) == 1 and isinstance(args[0], collections.Mapping)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/abc.py", line 193, in __instancecheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: return cls.__subclasscheck__(subclass)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/abc.py", line 228, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/abc.py", line 228, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/typing.py", line 1154, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: return super().__subclasscheck__(cls)
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib64/python3.6/abc.py", line 228, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib/python3.6/site-packages/typing_extensions.py", line 426, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib/python3.6/site-packages/typing_extensions.py", line 426, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib/python3.6/site-packages/typing_extensions.py", line 426, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if issubclass(subclass, scls):
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: [Previous line repeated 239 more times]
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: File "/lib/python3.6/site-packages/typing_extensions.py", line 421, in __subclasscheck__
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: if self.__extra__ in subclass.__mro__:
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: RecursionError: maximum recursion depth exceeded in comparison
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
May 12 06:16:51 cp-nightsky.novalocal ceph-mgr[1719]: 2022-05-12T06:16:51.291+0000 7fe3de3c7700 -1 client.0 error registering admin socket command: (17) File exists
Solution provided by
https://github.com/python/typing_extensions/issues/10#issuecomment-1131767191
not working:
Reviewed-by: Kefu Chai <tchaikov@gmail.com> Reviewed-by: Daniel Gryniewicz <dang@redhat.com> Reviewed-by: Matt Benjamin <mbenjamin@redhat.com> Reviewed-by: Willem Jan Withagen <wjw@digiware.nl>
Kefu Chai [Mon, 8 Aug 2022 14:41:17 +0000 (22:41 +0800)]
pybind/mgr/dashboard: do not use distutils.version.StrictVersion
replace `distutils.version.StrictVersion` with
`pkg_resources.parse_version()`
as the former is deprecated, see https://peps.python.org/pep-0632/.
let's use `pkg_resources` instead. this change also addresses
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1010894.
we have this issue when testing with an ubuntu jammy test node.
see https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1967139
When a network mount is present in `/proc/mounts` but for any reason
the corresponding server is down, this function hangs forever.
In a cluster deployed with cephadm, the consequence is that
it triggers `ceph-volume inventory` commands that hang and stay in D
state.
The idea here is to use a thread with a timeout to abort the call if the
timeout is reached.
`get_mounts()` is now a method of a class so we can exclude a path
altogether during the whole `inventory` execution (otherwise,
ceph-volume would try to access it as many devices there is on the
host which could slow down the inventory execution)
Michaela Lang [Thu, 4 Aug 2022 17:14:49 +0000 (19:14 +0200)]
- mgr/cephadm: provide an additional hint when running into I/O closed exception from execnet seen as
Error EINVAL: Can't communicate with remote host `127.0.0.1`, possibly because python3 is not installed there: cannot send (already closed?)
Kefu Chai [Mon, 8 Aug 2022 12:40:52 +0000 (20:40 +0800)]
ceph.spec.in: add libatomic to BuildRequires on fedora
otherwise we'd have failures like
/opt/compiler-explorer/gcc-trunk-20220808/bin/../lib/gcc/x86_64-linux-gnu/13.0.0/../../../../x86_64-linux-gnu/bin/ld:
/tmp/ccVlMbVh.o: in function `std::atomic<tagged_ptr>::store(tagged_ptr,
std::memory_order)':
/opt/compiler-explorer/gcc-trunk-20220808/include/c++/13.0.0/atomic:273:
undefined reference to `__atomic_store_16'
when generating the building system using CMake on fedora 36.