Deepika Upadhyay [Wed, 23 Jun 2021 05:12:38 +0000 (10:42 +0530)]
mon/PGMap: DIRTY field as N/A in `df detail` when cache tier not in use
'ceph df detail' reports a column for DIRTY objects under POOLS even
though cache tiers are not being used. In a replicated or EC pool, all
objects in the pool are reported as logically DIRTY because they have
never been flushed.
We now display N/A for DIRTY objects if the pool is not a cache tier.
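The display rule can be sketched in a few lines of Python. The real change lives in the C++ `PGMap` code. This sketch assumes a hypothetical dict-based pool record whose `tier_of` field is `-1` when the pool is not a tier and a base pool id otherwise.

```python
from typing import Any, Dict


def format_dirty(pool: Dict[str, Any], dirty_objects: int) -> str:
    """Return the DIRTY column value for one pool in `ceph df detail`.

    Hypothetical helper: a pool counts as a cache tier when its 'tier_of'
    field refers to a base pool (>= 0); -1 (or a missing field) means
    "not a tier".
    """
    if pool.get('tier_of', -1) >= 0:
        # Cache tier: the dirty count is meaningful (objects not yet flushed).
        return str(dirty_objects)
    # Plain replicated/EC pool: every object is logically "dirty" because it
    # was never flushed, so the number carries no information.
    return 'N/A'


print(format_dirty({'name': 'rbd', 'tier_of': -1}, 1024))     # N/A
print(format_dirty({'name': 'rbd-cache', 'tier_of': 1}, 17))  # 17
```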
Kefu Chai [Tue, 17 Aug 2021 07:53:51 +0000 (15:53 +0800)]
mgr/dashboard/api: set a UTF-8 locale when running pip
ansible-core started to include files whose filenames contain non-ASCII
characters, so we have to use a more capable encoding for the locale in
order to install this package. Otherwise we'd get the following
error:
```
Collecting ansible-core<2.12,>=2.11.3
  Using cached ansible-core-2.11.4.tar.gz (6.8 MB)
ERROR: Exception:
Traceback (most recent call last):
  File "/tmp/tmp.fX76ASIrch/venv/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 173, in _main
    status = self.run(options, args)
  ...
  File "/tmp/tmp.fX76ASIrch/venv/lib/python3.8/site-packages/pip/_internal/utils/unpacking.py", line 226, in untar_file
    with open(path, "wb") as destfp:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 137-140: ordinal not in range(256)
```
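The log does not show the exact change. The sketch below illustrates the same idea in Python: it shows why a latin-1 locale breaks on such filenames, and it runs pip with a UTF-8 locale forced in the child environment. It assumes the `C.UTF-8` locale is available on the host. `utf8_env` and `pip_install` are hypothetical helper names.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def utf8_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and force a UTF-8 locale for child processes."""
    env = dict(os.environ if base is None else base)
    # LC_ALL overrides LANG/LC_CTYPE, which may otherwise select latin-1.
    # Assumes the C.UTF-8 locale is installed on the host.
    env['LC_ALL'] = 'C.UTF-8'
    env['LANG'] = 'C.UTF-8'
    return env


def pip_install(requirements: List[str]) -> None:
    """Run pip from the current interpreter under a UTF-8 locale."""
    cmd = [sys.executable, '-m', 'pip', 'install', *requirements]
    subprocess.run(cmd, env=utf8_env(), check=True)


# Why latin-1 fails: CJK characters are outside latin-1's 256 code points.
name = 'docs/\u4e2d\u6587.txt'
try:
    name.encode('latin-1')
except UnicodeEncodeError as e:
    print('latin-1 fails:', e.reason)  # ordinal not in range(256)
print(name.encode('utf-8'))  # encodes fine
```

Setting the locale in the child's environment, rather than in the calling process, keeps the workaround scoped to the pip invocation.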
Nizamudeen A [Mon, 9 Aug 2021 07:52:51 +0000 (13:22 +0530)]
mgr/dashboard: Refresh button on the iscsi targets page
Added a refresh button on the iSCSI targets page. I noticed that the
auto-reload puts some load on the backend, so I disabled it and used the
same approach as we have on rgw. The refresh button uses a yellow
warning color to remind users to refresh manually whenever needed.
Adam Kupczyk [Mon, 9 Aug 2021 13:59:46 +0000 (15:59 +0200)]
os/bluestore: Better handling of deferred write trigger
Deferred writes in _do_alloc_write no longer depend on the blob size,
but on the size of the extent allocated on disk.
It is now possible to set bluestore_prefer_deferred_size much larger than
bluestore_max_blob_size and still get the desired behavior.
Example: with deferred=256K and blob=64K, a 128K write op is split into
two blobs and both are written as deferred. A 256K write op is written
entirely as a regular write.
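The example above can be reproduced with a simplified model. This is not BlueStore's code. The sketch assumes the whole op is allocated as one extent of `op_len` bytes, and that a write is deferred when that size is below `prefer_deferred_size`. `plan_write` is a hypothetical name.

```python
from typing import List, Tuple

KiB = 1024


def plan_write(op_len: int, max_blob_size: int,
               prefer_deferred_size: int) -> List[Tuple[int, bool]]:
    """Split a write into blobs and mark each one as deferred or regular.

    Simplified model: the deferred decision is made once, from the size of
    the allocated extent (op_len here), not from each blob's size.
    """
    deferred = op_len < prefer_deferred_size
    blobs = []
    offset = 0
    while offset < op_len:
        length = min(max_blob_size, op_len - offset)
        blobs.append((length, deferred))
        offset += length
    return blobs


print(plan_write(128 * KiB, 64 * KiB, 256 * KiB))
# [(65536, True), (65536, True)]
print(plan_write(256 * KiB, 64 * KiB, 256 * KiB))
# [(65536, False), (65536, False), (65536, False), (65536, False)]
```

Under a per-blob rule, every 64K blob would fall below 256K, so even the 256K write would be deferred. Deciding from the allocated extent size avoids that.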
See Rook issue https://github.com/rook/rook/issues/7940 for full
information.
Ceph bluestore disks can sometimes appear as though they have "phantom"
Atari (AHDI) partitions created on them when they don't in reality. This
is due to a series of bugs in the Linux kernel when it is built with
Atari support enabled. This behavior does not appear for raw mode OSDs on
partitions, only on disks.
Changing the on-disk format of BlueStore OSDs comes with
backwards-compatibility challenges, and a fix in the kernel could take
years to reach users. Working around the kernel issue in ceph-volume is
therefore the best way to fix the issue for Ceph.
To work around the issue in ceph-volume, two behaviors need to be
adjusted:
1. `ceph-volume inventory` should not report that a partition is
available if the parent device is a BlueStore OSD.
2. `ceph-volume raw list` should report parent disks if the disk is a
BlueStore OSD and not report the disk's children, BUT it should still
report children if the parent disk is not a BlueStore OSD.
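The two rules can be modeled as follows. This is a minimal sketch under a hypothetical device model (`Disk` with an `is_bluestore` flag). Real inventory availability involves many more checks than "not BlueStore".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Disk:
    """Hypothetical device model: a whole disk and its (possibly phantom) partitions."""
    path: str
    is_bluestore: bool
    children: List['Disk'] = field(default_factory=list)


def available_partitions(disk: Disk) -> List[str]:
    """Rule 1: no partition is available if the parent disk is a BlueStore OSD."""
    if disk.is_bluestore:
        return []
    # Simplification: treat any non-BlueStore partition as available.
    return [p.path for p in disk.children if not p.is_bluestore]


def raw_list(disk: Disk) -> List[str]:
    """Rule 2: report a BlueStore parent disk and skip its children;
    otherwise report the children that are BlueStore OSDs."""
    if disk.is_bluestore:
        return [disk.path]
    return [p.path for p in disk.children if p.is_bluestore]


# A BlueStore disk on which the kernel sees phantom Atari partitions.
osd_disk = Disk('/dev/sdb', True, [Disk('/dev/sdb1', False), Disk('/dev/sdb2', False)])
# A plain disk whose second partition holds a raw-mode OSD.
plain_disk = Disk('/dev/sdc', False, [Disk('/dev/sdc1', False), Disk('/dev/sdc2', True)])

print(available_partitions(osd_disk), raw_list(osd_disk))      # [] ['/dev/sdb']
print(available_partitions(plain_disk), raw_list(plain_disk))  # ['/dev/sdc1'] ['/dev/sdc2']
```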
Using only the exit status of `ceph-bluestore-tool show-label` to
determine if a device is a bluestore OSD could report a false negative
if there is a system error when `ceph-bluestore-tool` opens the device.
A better check is to open the device and read the bluestore device
label (the first 22 bytes of the device) to look for the bluestore
device signature ("bluestore block device"). If ceph-volume fails to
open the device due to a system error, it is safest to assume the device
is BlueStore so that an existing OSD isn't overwritten.
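A Python sketch of the label check described above. `has_bluestore_label` is used here as an illustrative name. On any OS-level error, the function errs on the side of reporting BlueStore, as the text recommends.

```python
import os
import tempfile

BLUESTORE_SIGNATURE = b'bluestore block device'  # 22 bytes


def has_bluestore_label(device_path: str) -> bool:
    """Return True if the device starts with the BlueStore label signature.

    If the device cannot be opened or read (any OSError), assume it *is*
    BlueStore so that an existing OSD is never reported as free space.
    """
    try:
        with open(device_path, 'rb') as dev:
            return dev.read(len(BLUESTORE_SIGNATURE)) == BLUESTORE_SIGNATURE
    except OSError:
        return True


# Demonstrate with regular files standing in for block devices.
with tempfile.TemporaryDirectory() as tmp:
    osd = os.path.join(tmp, 'osd')
    with open(osd, 'wb') as f:
        f.write(BLUESTORE_SIGNATURE + b'\n')
    empty = os.path.join(tmp, 'empty')
    with open(empty, 'wb') as f:
        f.write(b'\0' * 4096)
    print(has_bluestore_label(osd))                           # True
    print(has_bluestore_label(empty))                         # False
    print(has_bluestore_label(os.path.join(tmp, 'missing')))  # True (error: assume BlueStore)
```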
Sage Weil [Mon, 9 Aug 2021 18:15:28 +0000 (14:15 -0400)]
cephadm: fix container name detection
'enter' was broken because we weren't correctly identifying the container
name. Strip the newline from the inspect result so that we can reliably
match against the 'running' state.
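A tiny sketch of the fix. It assumes the raw stdout of the container inspect call is a state string ending in a newline. `is_running` is a hypothetical helper name.

```python
def is_running(inspect_output: str) -> bool:
    """Interpret the container state printed by an inspect command.

    CLI output ends with a newline, so compare the stripped value;
    a plain equality check against 'running' would never match.
    """
    return inspect_output.strip() == 'running'


print('running\n' == 'running')  # False: the original bug
print(is_running('running\n'))   # True
print(is_running('exited\n'))    # False
```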
Sebastian Wagner [Wed, 21 Jul 2021 12:13:52 +0000 (14:13 +0200)]
cephadm: Introduce unit.stop
The reason is that we now have to stop two containers with different names.
This is possible with `bash -c ... echo %i | tr . -`, but that gains nothing
in readability compared to putting it into a unit.stop script.
Since not all daemons have this stop script, we still have to call podman for old daemons.
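A sketch of what such a stop script could contain. It assumes container names of the form `ceph-<fsid>-<instance>`, where the new form replaces the dots in the systemd instance string (`%i`, e.g. `osd.0`) with dashes, as `tr . -` does. `container_names`, `unit_stop_script` and the exact script layout are hypothetical, not cephadm's actual output.

```python
from typing import List


def container_names(fsid: str, instance: str) -> List[str]:
    """Return the new (dashes) and old (dots) container names for a daemon.

    `instance` is the systemd instance string (%i), e.g. 'osd.0'.
    Replacing '.' with '-' mirrors `echo %i | tr . -`.
    """
    new = f'ceph-{fsid}-{instance.replace(".", "-")}'
    old = f'ceph-{fsid}-{instance}'
    return [new, old]


def unit_stop_script(fsid: str, instance: str, engine: str = '/usr/bin/podman') -> str:
    """Build a hypothetical unit.stop script that stops both possible names."""
    lines = ['#!/bin/sh']
    for name in container_names(fsid, instance):
        # Only one of the two containers exists; ignore "no such container".
        lines.append(f'{engine} stop {name} || true')
    return '\n'.join(lines) + '\n'


print(unit_stop_script('fsid', 'osd.0'), end='')
```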
Podman adds the current container name to the /etc/hosts
file. It turns out that Python's `socket.getfqdn()` returns
something different from `hostname -f` when the container
name contains dots.
This was partially done in b94c8de, but only for haproxy in the cephadm
mgr module, not in the cephadm binary.
This adds the same change for the keepalived container image.
Now both the haproxy and keepalived container images are fully qualified
(registry + namespace + image).
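A sketch of a check for the "registry + namespace + image" form. It follows the Docker reference convention that the first path component is a registry host only if it contains `.` or `:` or equals `localhost`. `is_fully_qualified` is a hypothetical helper, and the image references are only examples.

```python
def is_fully_qualified(image: str) -> bool:
    """True if the image reference has registry, namespace and image parts."""
    parts = image.split('/')
    if len(parts) < 3:
        # Missing the registry, the namespace, or both.
        return False
    host = parts[0]
    return '.' in host or ':' in host or host == 'localhost'


for ref in ['haproxy:2.3', 'library/haproxy:2.3', 'docker.io/library/haproxy:2.3']:
    print(ref, is_fully_qualified(ref))
```

Unqualified references leave registry selection to the container engine's configuration. A fully qualified name pins the image to one registry.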
This PR rewrites a section in the Troubleshooting
chapter of the Cephadm Guide. The material that this
section discusses is already covered in the
Cephadm Operations chapter of the Cephadm Guide.
There's no reason to repeat this information,
unless adding technical debt to the documentation
is our goal (which of course it is not: reducing
technical debt in the documentation has been the
aim that has guided my work these past
six months).
Adam King [Mon, 19 Jul 2021 16:07:39 +0000 (12:07 -0400)]
mgr/cephadm: stop removal of daemons from offline hosts
This check only looked at the status of the
host and not at the offline_hosts set, so
it wasn't actually stopping daemons from being removed
from offline hosts.
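A minimal sketch of consulting the offline set directly. It uses a hypothetical data model: a dict of host to daemon names, plus a set of hosts known to be offline. `removable_daemons` is not the mgr/cephadm code.

```python
from typing import Dict, List, Set


def removable_daemons(daemons_by_host: Dict[str, List[str]],
                      offline_hosts: Set[str]) -> List[str]:
    """Return daemons that may be removed, skipping those on offline hosts.

    A per-host status field alone can miss hosts recorded in the
    offline_hosts set, so the set is checked explicitly.
    """
    removable = []
    for host, daemons in daemons_by_host.items():
        if host in offline_hosts:
            continue  # host is unreachable: leave its daemons alone
        removable.extend(daemons)
    return removable


print(removable_daemons({'host1': ['mon.a'], 'host2': ['osd.1']}, {'host2'}))  # ['mon.a']
```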
```
    cmd = ['ceph-volume', '--fsid', fsid] + cv_cmd
    with with_cephadm_ctx(cmd) as ctx:
        cd.command_ceph_volume(ctx)
        assert ctx.fsid == fsid

    s = get_ceph_conf(fsid=fsid)
    f = cephadm_fs.create_file('ceph.conf', contents=s)

    cmd = ['ceph-volume', '--fsid', fsid, '--config', f.path] + cv_cmd
    with with_cephadm_ctx(cmd) as ctx:
        cd.command_ceph_volume(ctx)
        assert ctx.fsid == fsid

    cmd = ['ceph-volume', '--fsid', '10000000-0000-0000-0000-0000deadbeef', '--config', f.path] + cv_cmd
    with with_cephadm_ctx(cmd) as ctx:
        err = 'fsid does not match ceph.conf'
        with pytest.raises(cd.Error, match=err):
            cd.command_ceph_volume(ctx)
>       assert ctx.fsid == None
E       AssertionError: assert '10000000-0000-0000-0000-0000deadbeef' == None
E        +  where '10000000-0000-0000-0000-0000deadbeef' = <cephadm.CephadmContext object at 0x7f1c7121c1c0>.fsid
```
The clean_cgroup method assumes that ctx.fsid is set. While this is true
for the bootstrap command, it isn't set for the adopt or deploy commands
(and possibly others).
This causes the adopt command to fail:
```
Traceback (most recent call last):
  File "/sbin/cephadm", line 8301, in <module>
    main()
  File "/sbin/cephadm", line 8289, in main
    r = ctx.func(ctx)
  File "/sbin/cephadm", line 1764, in _default_image
    return func(ctx)
  File "/sbin/cephadm", line 5091, in command_adopt
    command_adopt_ceph(ctx, daemon_type, daemon_id, fsid)
  File "/sbin/cephadm", line 5299, in command_adopt_ceph
    osd_fsid=osd_fsid)
  File "/sbin/cephadm", line 2884, in deploy_daemon_units
    clean_cgroup(ctx, unit_name)
  File "/sbin/cephadm", line 2724, in clean_cgroup
    if not ctx.fsid:
  File "/sbin/cephadm", line 155, in __getattr__
    return super().__getattribute__(name)
AttributeError: 'CephadmContext' object has no attribute 'fsid'
```
Since deploy_daemon_units (which calls clean_cgroup) already has the
fsid value, we can pass it directly.
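A sketch of the change in shape. `Ctx` is a hypothetical stand-in for CephadmContext: like the real one, it raises AttributeError for options that were never set. The string returned by `clean_cgroup` is purely illustrative and does not reflect what cephadm actually cleans.

```python
from typing import Any, Dict


class Ctx:
    """Stand-in context: attributes come from parsed args; others raise."""

    def __init__(self, **args: Any) -> None:
        self._args: Dict[str, Any] = args

    def __getattr__(self, name: str) -> Any:
        if name in self.__dict__.get('_args', {}):
            return self._args[name]
        raise AttributeError(f"'Ctx' object has no attribute '{name}'")


def clean_cgroup(ctx: Ctx, fsid: str, unit_name: str) -> str:
    """Fixed shape: fsid is an explicit parameter instead of ctx.fsid.

    Returns an illustrative description of what would be cleaned up.
    """
    return f'cgroup for {unit_name} (cluster {fsid})'


def deploy_daemon_units(ctx: Ctx, fsid: str, unit_name: str) -> str:
    # The caller already has the fsid, so pass it down explicitly.
    return clean_cgroup(ctx, fsid, unit_name)


ctx = Ctx(name='mon.a')  # adopt-style context: no fsid option was given
try:
    ctx.fsid  # what the old clean_cgroup effectively did
except AttributeError as e:
    print(e)  # 'Ctx' object has no attribute 'fsid'
print(deploy_daemon_units(ctx, 'fsid-1234', 'mon.a.service'))
```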
mgr/cephadm: ingress: fix typo in spec.virtual_interface_networks reference
When using virtual_interface_networks to identify the interface on which
to place the virtual IP, the code referenced spec.networks instead of
spec.virtual_interface_networks.
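A sketch of selecting the interface from the spec's `virtual_interface_networks` list. It assumes per-host network info shaped as subnet → interface → addresses. `pick_vip_interface` and that shape are hypothetical.

```python
import ipaddress
from typing import Dict, List, Optional

# Assumed shape of per-host network info: subnet -> interface -> addresses.
HostNetworks = Dict[str, Dict[str, List[str]]]


def pick_vip_interface(host_networks: HostNetworks,
                       virtual_interface_networks: List[str]) -> Optional[str]:
    """Return the first interface whose subnet lies inside one of the
    spec's virtual_interface_networks (not spec.networks)."""
    wanted = [ipaddress.ip_network(n) for n in virtual_interface_networks]
    for subnet, ifaces in host_networks.items():
        net = ipaddress.ip_network(subnet)
        # subnet_of() requires matching IP versions, so check that first.
        if any(net.version == w.version and net.subnet_of(w) for w in wanted):
            return next(iter(ifaces), None)
    return None


nets = {'10.0.0.0/24': {'eth0': ['10.0.0.5']},
        '192.168.1.0/24': {'eth1': ['192.168.1.7']}}
print(pick_vip_interface(nets, ['192.168.0.0/16']))  # eth1
```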
We don't need to run an extra command (mgr module ls) to obtain the list of
mgr modules, since this information is already in the mgr_map.
The same approach is already used for the monitoring stack and for
configuring the iscsi integration within the dashboard (during creation)
via the config_dashboard method.
The mgr_map is mocked in the tests with the dashboard module enabled, so we
no longer need _mon_command_mock_mgr_module_ls.
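A sketch of reading the enabled modules from the map that is already at hand. It assumes the map is a dict that lists enabled modules under a `modules` key, as the JSON from `ceph mgr dump` does. `dashboard_enabled` is a hypothetical helper name.

```python
from typing import Any, Dict


def dashboard_enabled(mgr_map: Dict[str, Any]) -> bool:
    """Check the enabled-module list carried by the mgr map instead of
    issuing a separate `mgr module ls` command."""
    return 'dashboard' in mgr_map.get('modules', [])


# In tests, a mocked map with the dashboard enabled is enough.
print(dashboard_enabled({'modules': ['dashboard', 'prometheus']}))  # True
print(dashboard_enabled({'modules': ['prometheus']}))               # False
```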
Sebastian Wagner [Tue, 20 Jul 2021 14:09:57 +0000 (16:09 +0200)]
cephadm: haproxy 2.4 defaults to a different container user.
Another alternative would be to investigate a different setup
leveraging `--sysctl net.ipv4.ip_unprivileged_port_start=0`,
but that would be a larger PR.
Fixes: https://tracker.ceph.com/issues/51355
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
(cherry picked from commit 250064bdcbe778b3cc245df843d14dd19cbb8772)
Michael Fritch [Fri, 9 Jul 2021 01:35:42 +0000 (19:35 -0600)]
cephadm: use CephadmContext rather than MagicMock
MagicMock hides attribute errors:
```
self = <cephadm.CephadmContext object at 0x7f1121e62370>, name = 'config_json'

    def __getattr__(self, name: str) -> Any:
        if '_conf' in self.__dict__ and hasattr(self._conf, name):
            return getattr(self._conf, name)
        elif '_args' in self.__dict__ and hasattr(self._args, name):
            return getattr(self._args, name)
        else:
>           return super().__getattribute__(name)
E           AttributeError: 'CephadmContext' object has no attribute 'config_json'
```
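Here is why MagicMock hides such errors: it creates any attribute on first access, whereas a real context object raises. `Context` below is a hypothetical stand-in that holds only the attributes it was given.

```python
from unittest.mock import MagicMock


class Context:
    """Tiny stand-in for CephadmContext: only the attributes it was given exist."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


mock_ctx = MagicMock()
# A missing option silently "works": MagicMock invents the attribute.
print(type(mock_ctx.config_json).__name__)  # MagicMock

real_ctx = Context(fsid='00000000-0000-0000-0000-000000000000')
try:
    real_ctx.config_json
except AttributeError as e:
    print('AttributeError:', e)  # the bug surfaces instead of being hidden
```

If a mock is still needed, `MagicMock(spec=SomeClass)` restricts attribute access to names that exist on the spec and raises AttributeError for the rest.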