Igor Fedotov [Tue, 31 Aug 2021 12:54:23 +0000 (15:54 +0300)]
os/bluestore: fix bluefs migrate command
After migrating the DB volume to the slow device, RocksDB still
needs to be provided with the slow.db path to properly access the relevant files under the db.slow subfolder.
Without that specification it tries to access them under the 'db' subfolder, which
results in a "not-found" error.
Fixes: https://tracker.ceph.com/issues/40434 Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 90852d9b6f0da7967121200c9a1c56bed1929d2d)
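A rough Python sketch of the idea behind the bluefs migrate fix above (the real change is in BlueStore's C++ code; the helper name and sizes here are made up): once the DB volume has been migrated onto the slow device, the path list handed to RocksDB must include db.slow as well, otherwise the migrated SSTs appear missing.
```
# Hypothetical helper, not BlueStore's actual code: build the list of
# directories RocksDB may look in for SST files.
def rocksdb_db_paths(base_dir, db_size, slow_size, db_shares_slow_device):
    paths = [(base_dir + "/db", db_size)]
    if db_shares_slow_device:
        # Without this entry RocksDB only looks under 'db' and reports the
        # files that were migrated to the slow device as not found.
        paths.append((base_dir + "/db.slow", slow_size))
    return paths

print(rocksdb_db_paths("/var/lib/ceph/osd/ceph-0", 10 << 30, 100 << 30, True))
```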
This is a regression introduced by 9212420: when the host is using a
logical partition, lsblk reports that partition as a child of the
physical device.
That logical partition is prefixed by the `└─` characters.
This leads the `raw list` subcommand to show the lsblk error on stderr.
```
$ ceph-volume raw list
{}
stderr: lsblk: `-/dev/sda1: not a block device
```
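A minimal Python sketch of the workaround (helper name assumed, not ceph-volume's actual code): strip the tree-drawing prefix that lsblk prepends to child entries such as logical partitions before treating the field as a device name.
```
def lsblk_device_name(name_field):
    # lsblk prints children as "└─sda1" (or "`-sda1" in ASCII mode); drop the
    # tree prefix so we don't probe a bogus block device path.
    return name_field.lstrip("└─├`|- ")

assert lsblk_device_name("└─sda1") == "sda1"
assert lsblk_device_name("`-sda1") == "sda1"
assert lsblk_device_name("sda") == "sda"
```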
Paul Cuzner [Wed, 18 Aug 2021 05:02:32 +0000 (17:02 +1200)]
cephadm: Add listening ports to gather-facts output
This patch adds TCP and UDP listening ports to the data
returned by gather-facts. This can be used to check port
availability prior to deploying daemons, to
catch port conflicts earlier. IPv4 and IPv6 are supported.
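A rough sketch of one way to gather this (cephadm's actual implementation may differ): parse the procfs socket tables for both IPv4 and IPv6 and collect the ports in LISTEN state; UDP can be collected similarly from /proc/net/udp and /proc/net/udp6.
```
def listening_tcp_ports():
    ports = set()
    for table in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(table) as f:
                next(f)                      # skip the header line
                for line in f:
                    fields = line.split()
                    local_addr, state = fields[1], fields[3]
                    if state == "0A":        # 0A == TCP LISTEN
                        ports.add(int(local_addr.rsplit(":", 1)[1], 16))
        except FileNotFoundError:
            pass                             # table absent on this host
    return sorted(ports)

print(listening_tcp_ports())
```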
Oleander Reis [Wed, 18 Aug 2021 13:45:42 +0000 (15:45 +0200)]
cephadm: check for openntpd.service as time sync service
openntpd is an alternative time synchronization implementation
by the OpenBSD project. It has been packaged for Debian and Ubuntu
since at least jessie / 18.04, with the service named openntpd.service.
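An illustrative sketch (the unit list and check are assumptions, not cephadm's exact code): include openntpd.service among the units that count as a working time sync daemon.
```
import subprocess

TIME_SYNC_UNITS = [
    "chronyd.service",
    "ntpd.service",
    "systemd-timesyncd.service",
    "openntpd.service",          # the newly recognized alternative
]

def enabled_time_sync_unit():
    for unit in TIME_SYNC_UNITS:
        result = subprocess.run(["systemctl", "is-enabled", unit],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return unit
    return None

print(enabled_time_sync_unit())
```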
mgr/dashboard: rgw service creation form: add realm and zone to service spec.
Align rgw service id pattern with cephadm: https://github.com/ceph/ceph/pull/39877
- Update rgw pattern to allow service id for non-multisite config.
- Extract realm and zone from service id (when detected) and add them to the service spec.
Fixes: https://tracker.ceph.com/issues/44605 Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 0575844192502ded32962b75a91cf51de22e97e6)
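A sketch of the detection logic described above (the regex and field names are assumptions, not the exact dashboard/cephadm pattern): a service id of the form `<realm>.<zone>` is treated as multisite and its parts are copied into the service spec, while a plain id stays non-multisite.
```
import re

def parse_rgw_service_id(service_id):
    m = re.match(r"^(?P<realm>[^.]+)\.(?P<zone>[^.]+)$", service_id)
    spec = {"service_type": "rgw", "service_id": service_id}
    if m:
        spec["rgw_realm"] = m.group("realm")
        spec["rgw_zone"] = m.group("zone")
    return spec

print(parse_rgw_service_id("myrealm.myzone"))   # multisite-style id
print(parse_rgw_service_id("default"))          # plain service id
```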
mgr/dashboard: connect-rgw: rename to set-rgw-credentials; refactoring
- Rename the dashboard command to better reflect its behavior.
- Rename the '_radosgw_admin' method to 'send_rgwadmin_command' for consistency with
'send_mon_command' and move it to mgr_module.py.
- Cleanup: remove unneeded rgw settings.
- Better error handling and test coverage.
Fixes: https://tracker.ceph.com/issues/44605 Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 6e20ef1dd35f3681d14cd4e08ca63eb20edc2c88)
Alfonso Martínez [Wed, 28 Jul 2021 07:48:18 +0000 (09:48 +0200)]
mgr/dashboard: connect-rgw: adaptation and test coverage
- Align Dashboard with cephadm: configure credentials using the same logic.
- Fix: create a 'dashboard' user per realm (before: only on 1st realm).
- Lint fixes, test coverage, method renaming to better reflect behavior and method visibility.
Fixes: https://tracker.ceph.com/issues/44605 Signed-off-by: Alfonso Martínez <almartin@redhat.com>
(cherry picked from commit 0fcf0a7827cf4e8748a382613f9c8d1715c4a1e8)
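A schematic sketch of the per-realm fix mentioned above (function and argument names are made up): instead of creating the dashboard credentials only in the first realm, iterate over all realms.
```
def ensure_dashboard_users(realms, create_system_user):
    """create_system_user(realm, uid) -> (access_key, secret_key)."""
    credentials = {}
    for realm in realms:             # previously only realms[0] was handled
        credentials[realm] = create_system_user(realm, uid="dashboard")
    return credentials

# Example with a stub in place of radosgw-admin:
print(ensure_dashboard_users(["eu", "us"],
                             lambda realm, uid: (realm + "-ak", realm + "-sk")))
```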
Sage Weil [Thu, 5 Aug 2021 14:24:13 +0000 (10:24 -0400)]
mgr/cephadm: enable prometheus module before deploying prometheus
The mon will restart the mgr when the module is enabled, so we don't
really have to do anything here. The raise is there just in case the
mgr doesn't immediately get the new mgrmap and respawn, although there is
likely no harm done if we continue to deploy prometheus in the meantime,
even if we're interrupted partway through.
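A simplified sketch of the ordering described above (exception type and helper names assumed): enable the prometheus mgr module first, then bail out of the current step, since the mon will respawn the mgr and the deployment will be retried.
```
def ensure_prometheus_module_enabled(enabled_modules, enable_module):
    if "prometheus" in enabled_modules:
        return
    enable_module("prometheus")
    # The mon restarts the mgr once the module is enabled; raising here just
    # makes sure this deployment step is retried by the respawned mgr.
    raise RuntimeError("prometheus mgr module enabled; retry deployment")
```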
Igor Fedotov [Wed, 18 Aug 2021 10:39:02 +0000 (13:39 +0300)]
os/bluestore: accept undecodable multi-block bluefs transactions on log
replay.
We should proceed with OSD startup when detecting an undecodable bluefs
transaction spanning multiple disk blocks during log replay.
The rationale is that such a transaction might appear after an unexpected
power down, when not every disk block made it to disk. Hence we can
consider this a normal log replay stop condition.
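A Python-flavoured sketch of the replay rule (BlueFS is C++; the transaction attributes here are invented): a transaction that fails to decode and spans more than one disk block is taken as the normal end of the log, while a corrupt single-block transaction is still an error.
```
from collections import namedtuple

Txn = namedtuple("Txn", "decodes_ok block_count payload")

def replay(transactions, apply):
    for txn in transactions:
        if not txn.decodes_ok:
            if txn.block_count > 1:
                break    # torn multi-block write after power loss: stop cleanly
            raise IOError("corrupted single-block bluefs transaction")
        apply(txn.payload)

replay([Txn(True, 1, "op1"), Txn(False, 3, None)], apply=print)
```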
Igor Fedotov [Tue, 9 Feb 2021 15:29:01 +0000 (18:29 +0300)]
os/bluestore: cap omap naming scheme upgrade transaction.
We shouldn't use a single per-onode transaction for such an upgrade when the onode's omap list is huge. That results in similarly sized WAL/SST files, which are inefficient, may cause high memory usage, and are sometimes error-prone.
Fixes: https://tracker.ceph.com/issues/49170 Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit e897fa243c1dd38329733b452872616023f14ac8)
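A minimal sketch of the capping idea (names assumed; the real change is in BlueStore's C++): process a huge omap list in bounded batches so that no single transaction grows with the omap size.
```
def upgrade_omap_keys(old_keys, rename_op, commit_txn, max_txn_ops=1024):
    batch = []
    for key in old_keys:
        batch.append(rename_op(key))
        if len(batch) >= max_txn_ops:
            commit_txn(batch)        # cap transaction (and WAL/SST) size
            batch = []
    if batch:
        commit_txn(batch)

# Toy usage: 2500 keys end up in three transactions of at most 1024 ops.
upgrade_omap_keys(("k%d" % i for i in range(2500)),
                  rename_op=lambda k: ("rename", k),
                  commit_txn=lambda ops: print(len(ops), "ops committed"))
```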
Sunny Kumar [Fri, 14 May 2021 11:09:17 +0000 (12:09 +0100)]
qa/workunits/rbd-nbd: add new test for map/unmap
This patch includes two new test cases:
a. map/unmap test with only the image name, and
b. map/unmap test after changing the default pool, which expects the image
to come from the new default pool.
rbd: fix default pool handling for nbd/ggate map/unmap
The default pool is not picked up when only the image name is provided
during map/unmap. However, if the pool name is provided explicitly,
map/unmap works as expected.
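A minimal sketch of the expected resolution (the config option rbd_default_pool is real; the helper is made up): when the spec has no pool component, fall back to the configured default pool rather than an empty name.
```
def resolve_image_spec(spec, conf_get):
    pool, _, image = spec.rpartition("/")
    if not pool:
        pool = conf_get("rbd_default_pool") or "rbd"
    return pool, image

assert resolve_image_spec("foo", lambda key: "") == ("rbd", "foo")
assert resolve_image_spec("foo", lambda key: "mypool") == ("mypool", "foo")
assert resolve_image_spec("other/foo", lambda key: "") == ("other", "foo")
```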
pybind/rbd: explain why "primary" isn't exposed in mirror_image_status_list()
"primary" is part of mirror image info (rbd_mirror_image_info_t) and
is exposed in mirror_image_get_status(). mirror_image_status_list(),
even though it is often thought of as an equivalent of repeated calls
to mirror_image_get_status(), doesn't actually fetch the mirror image
info.
pybind/rbd: actually append site_status dict to remote_statuses
Using the += operator is wrong -- only the site_status keys get appended
(and repeatedly at that when there is more than one remote site,
as the keys are added one by one).
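A plain-Python illustration of the bug (variable names follow the description above): `+=` on a list extends it with the dict's keys, whereas append() adds the whole status dict as one element per remote site.
```
site_status = {"site_name": "siteB", "state": "up"}

remote_statuses = []
remote_statuses += site_status          # wrong: ['site_name', 'state']
print(remote_statuses)

remote_statuses = []
remote_statuses.append(site_status)     # right: one dict per remote site
print(remote_statuses)
```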
Will Smith [Fri, 23 Jul 2021 19:18:12 +0000 (15:18 -0400)]
rbd: Fix mirror_image_get_status in rbd python bindings
When retrieving the status of a mirrored image from the Python rbd
library, a TypeError is raised.
*To Reproduce:*
Set up two Ceph clusters for block storage, and configure image
mirroring between their pools. Create at least one image with mirroring
enabled, then run the following script on either cluster (once the image
exists everywhere):
```
import rados
import rbd

with rados.Rados(conffile=CONF_PATH) as cluster:
    with cluster.open_ioctx(POOL_NAME) as ioctx:
        with rbd.Image(ioctx, IMAGE_LABEL) as image:
            image.mirror_image_get_status()
```
This will result in the following stack trace:
```
Traceback (most recent call last):
File "repo-bug.py", line 10, in <module>
image.mirror_image_get_status()
File "rbd.pyx", line 3363, in rbd.requires_not_closed.wrapper
File "rbd.pyx", line 5209, in rbd.Image.mirror_image_get_status
TypeError: list indices must be integers or slices, not str
```
libudev uses fnmatch(3) for matching attributes, meaning that shell
glob pattern matching is employed instead of literal string matching.
Escape glob metacharacters to suppress pattern matching.
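A plain-Python demonstration of the problem (the actual fix is in C and escapes the pattern for fnmatch(3)): an attribute value containing glob metacharacters does not even match itself until those characters are neutralized.
```
import fnmatch

serial = "DISK[1]*X"
print(fnmatch.fnmatch(serial, serial))    # False: '[1]' and '*' act as globs

# Neutralize the metacharacters (here via bracket classes, which Python's
# fnmatch understands; the C fix uses backslash escaping instead).
escaped = serial.replace("[", "[[]").replace("*", "[*]").replace("?", "[?]")
print(fnmatch.fnmatch(serial, escaped))   # True: matched literally
```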
Kamoltat [Fri, 25 Jun 2021 22:40:43 +0000 (22:40 +0000)]
pybind/mgr/autoscaler: don't scale pools with overlapping roots
In the previous version of get_subtree_resource_status() in
src/pybind/mgr/pg_autoscaler/module.py we ignored overlapping
pools, which, when combined with the new `scale-down`
algorithm in https://github.com/ceph/ceph/pull/38805, could in some cases
cause pools to scale up/down to an inappropriate number of pgs.
Therefore, this PR identifies the overlapping roots and prevents the pools
with such roots from scaling. This only happens with the `scale-down` profile,
as we see no problem with the default `scale-up` profile.
Removed the variable `pool_root` since it is not used anywhere in
the code; it only gets assigned and reassigned.
Also included a unit test, test_overlapping_roots.py, that tests the function
identify_subtrees_and_overlaps(), and edited test_cal_final_pg_target.py
to account for pools that contain overlapping roots; those pools
are expected not to scale.
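A simplified sketch of the detection (data layout assumed, not the module's actual structures): two CRUSH roots overlap when they share OSDs, and pools mapped to such roots are excluded from scaling because their capacity cannot be attributed to a single subtree.
```
def find_overlapping_roots(root_to_osds):
    """root_to_osds: mapping of CRUSH root id -> set of OSD ids."""
    overlapping = set()
    roots = list(root_to_osds)
    for i, a in enumerate(roots):
        for b in roots[i + 1:]:
            if root_to_osds[a] & root_to_osds[b]:
                overlapping.update({a, b})
    return overlapping

# Root -1 and root -40 share OSD 2, so pools on either root would not scale.
print(find_overlapping_roots({-1: {0, 1, 2}, -40: {2, 3}, -50: {4, 5}}))
```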
Kefu Chai [Mon, 28 Jun 2021 04:28:17 +0000 (12:28 +0800)]
pybind/mgr/pg_autoscaler: extract CrushSubtreeResourceStatus out
as it also serves as part of the interface of get_subtree_resource_status(),
not only its internals. To ease adding type annotations, this class
is promoted out of the enclosing class.
The autoscaler by default will start each pool out with minimal
pgs and `scale-up` the pgs when there is more usage in each pool.
Users can now use the commands:
`osd pool set autoscale-profile scale-down` to make the pools
start out with a full complement of pgs and only `scale-down`
when the usage ratio across the pools is not even.
`osd pool set autoscale-profile scale-up` (the default) to make the pools
start out with minimal pgs and `scale-up` the pgs when there
is more usage in each pool.
Edited the KVMonitor.cc file to make the `autoscale_profile` variable
persistent.
Edited tests/test_cal_final_pg_target.py so that it takes into account
the new `profile` argument when calling cal_final_pg_target(). Also
added some new test cases for when the profile is `scale-up`.
Renamed tests/test_autoscaler.py to a more appropriate name:
tests/test_cal_ratio.py.
Kamoltat [Thu, 7 Jan 2021 15:39:19 +0000 (15:39 +0000)]
mgr/pg_autoscaler: avoid scale-down until there is pressure
The autoscaler will start out scaling each
pool to have a full complement of pgs from the start
and will only decrease that number when pools need more due to
increased usage.
Introduced a unit test that tests only the
function get_final_pg_target_and_ratio(), which
deals with the distribution of pgs amongst the
pools.
Edited the workunit script to reflect the change
in how pgs are calculated and distributed.
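A toy sketch of the kind of computation get_final_pg_target_and_ratio() performs (not the autoscaler's actual math, and the rounding here is deliberately naive): each pool receives a share of the PG budget proportional to its ratio, rounded up to a power of two.
```
def distribute_pgs(pool_ratios, pg_budget):
    total = sum(pool_ratios.values()) or 1.0
    targets = {}
    for pool, ratio in pool_ratios.items():
        share = max(1, round(pg_budget * ratio / total))
        targets[pool] = 1 << (share - 1).bit_length()  # round up to power of two
    return targets

print(distribute_pgs({"rbd": 0.7, "cephfs_data": 0.3}, 256))
```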