Patrick Donnelly [Wed, 16 Sep 2020 19:28:55 +0000 (12:28 -0700)]
mon: allow overriding the initial mon_host
This overrides what the CephContext believes to be the current quorum of
monitors (retrieved from other instances of the MonClient), introduced
by [1]. Tests need to be able to target a specific monitor for
exercising forwarding and other things.
mon: store mon updates in ceph context for future MonMap instantiation
MonMap builds initial mon list using provided sources, like
mon-host or monmap.
For future instantiations of MonClient, if mon addresses are
updated, stale information from the provided sources is used.
This commit retains mon updates that are processed by the
MonClient in CephContext, for use in MonMap instantiations
and hence uses updated information as required.
This is helpful in cases where librados or libcephfs
instantiate MonClient in the ceph-mgr daemon as required.
RTD does not support installing system packages; the only ways to install
dependencies are setuptools and pip. ditaa, however, is a tool written in
Java, so we need a native Python tool that can render ditaa
images. plantweb can use a web service to render the ditaa
diagram, so let's use it as a fallback if "ditaa" is not around.
Also start a new line after the directive; otherwise the plantweb server will
return 500 on seeing the diagram.
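The fallback described above could be sketched in a Sphinx conf.py roughly as follows. This is only an illustration: the helper name pick_ditaa_extension and the two extension module names are assumptions, not taken from the commit.

```python
import shutil


def pick_ditaa_extension(which=shutil.which):
    """Return the Sphinx extension to use for ditaa diagrams.

    `which` is injectable so the choice can be exercised on a machine
    that does not have ditaa installed.
    """
    if which("ditaa") is not None:
        # A local ditaa binary exists: render diagrams locally.
        return "sphinxcontrib.ditaa"  # assumed extension name
    # No binary (for example on Read the Docs): fall back to plantweb,
    # which renders through a web service.
    return "plantweb.directive"  # assumed extension name


extensions = [pick_ditaa_extension()]
```

Probing for the binary at configuration time keeps local builds unchanged while letting hosted builds work without system packages.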
doc/conf.py: exclude pybindings docs from build for RTD
because it'd be difficult to prepare (dummy) librados, libcephfs and librbd for
their python bindings in the building environment offered by Read the Docs.
Ilya Dryomov [Sat, 29 Aug 2020 10:02:30 +0000 (12:02 +0200)]
msg/async/ProtocolV2: allow rxbuf/txbuf get bigger in testing
We have a kernel client test case that constructs huge auth tickets
to exercise the three related code paths in the kernel. One of the
tickets is bigger than 1000000 bytes, as required for triggering the
third code path.
We haven't bumped into this assert earlier because the kernel client
is still on msgr v1. However, "rbd map" and "rbd unmap" commands
started connecting to the cluster in commit 96f05a7956b3 ("rbd: delay
determination of default pool name") and that happens via msgr v2.
mgr/cephadm: Add comments to secondary containers
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
Co-authored-by: Michael Fritch <mfritch@suse.com>
(cherry picked from commit 98c0119833b9fc9648d667f002d84b1ccb75f334)
Matthew Oliver [Mon, 17 Aug 2020 01:08:56 +0000 (11:08 +1000)]
cephadm: auto wrap and unwrap ipv6 addresses
This patch attempts to simplify IPv6 support in cephadm by automatically
wrapping and unwrapping IPv6 addresses when required.
There are some assumptions, though: if you are supplying an IPv6 addrv,
then it needs to be wrapped. But because you are specifying it yourself, you should
know what you're doing.
In general, it means that in bootstrap you should be able to supply IPv6
addresses wrapped or not, so long as there isn't a port appended.
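A minimal sketch of what wrapping and unwrapping could look like, using only Python's standard ipaddress module. The function names are illustrative and the actual cephadm helpers may differ in behaviour.

```python
import ipaddress


def unwrap_ipv6(address: str) -> str:
    """Strip surrounding square brackets, if present."""
    if address.startswith("[") and address.endswith("]"):
        return address[1:-1]
    return address


def wrap_ipv6(address: str) -> str:
    """Wrap a bare IPv6 address in square brackets; leave anything else alone."""
    try:
        if ipaddress.ip_address(address).version == 6:
            return f"[{address}]"
    except ValueError:
        # Already wrapped, a hostname, or an address with a port appended.
        pass
    return address


def is_ipv6(address: str) -> bool:
    """True if the (optionally wrapped) string parses as an IPv6 address."""
    try:
        return ipaddress.ip_address(unwrap_ipv6(address)).version == 6
    except ValueError:
        return False
```

In this sketch an address with a port appended, such as "[::1]:3300", is not recognised as a plain IPv6 address, which matches the caveat in the commit message.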
Fixes: https://tracker.ceph.com/issues/46922
Signed-off-by: Matthew Oliver <moliver@suse.com>
(cherry picked from commit 09eac4bef0f04f5db7118f94dd9679f3295bddf8)
Adam King [Thu, 27 Aug 2020 16:22:49 +0000 (12:22 -0400)]
mgr/cephadm: Verify non-empty list in get_active_daemon functions
The get_active_daemon functions for monitoring stack daemons
were just returning the first or last daemon in the given list
without checking that the list actually contained any daemons.
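The kind of guard described above could look like the following sketch. The signature is simplified (plain strings instead of daemon descriptions), and returning None for an empty list is an assumption, not necessarily what the commit does.

```python
from typing import List, Optional


def get_active_daemon(daemons: List[str]) -> Optional[str]:
    """Return the first daemon, or None when the list is empty.

    Without the emptiness check, daemons[0] raises IndexError on an
    empty list.
    """
    if not daemons:
        return None
    return daemons[0]
```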
Matthew Oliver [Thu, 27 Aug 2020 02:44:40 +0000 (12:44 +1000)]
cephadm: Give better access to the /dev in the iscsi container
In testing, it seems that limiting the main iscsi container's /dev-related volume mount to just
/dev/log is too narrow. In certain circumstances the container needs to be able
to see /dev/rbd* devices, for example when using krbd.
This patch volume-mounts /dev rather than /dev/log in the main
container. Since this aligns with what we need in the tcmu-runner
container, it actually ends up simplifying the code as well.
cephadm: deploying of monitoring images partially broken
Deployment of monitoring images has been broken in the context of
ceph-salt. Due to the removal of the registries in
/etc/containers/registries.conf, all images need to be provided
qualified.
Fixes: https://tracker.ceph.com/issues/46726
Signed-off-by: Patrick Seidensal <pseidensal@suse.com>
(cherry picked from commit 658230048bff5907d6dad8e216a540014a0898f1)
Paul Cuzner [Tue, 11 Aug 2020 21:22:27 +0000 (09:22 +1200)]
cephadm: remove py2 from tox tests
cephadm is dependent on py3, and the presence of py2 in
tox is just going to throw more errors over time as py3
idioms are adopted. py2 is EOL, and older hosts should
have py3 available as an installation option anyway.
Paul Cuzner [Wed, 19 Aug 2020 23:12:09 +0000 (11:12 +1200)]
cephadm: make devices lists more granular in gather-facts
The gather-facts command returns a list of hdd and flash
devices. This patch changes the list content from being
simple strings representing the disks to a dict, making it
easier to extend in the future.
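To illustrate why dicts are easier to extend than plain strings, here is a small sketch. The key names "name" and "size_bytes" are hypothetical and not taken from the gather-facts output.

```python
from typing import Dict, List


def to_device_dicts(names: List[str], sizes: Dict[str, int]) -> List[dict]:
    """Turn plain device names into dicts that can carry extra fields."""
    return [
        # Further keys can be added later without changing the list shape.
        {"name": name, "size_bytes": sizes.get(name)}
        for name in names
    ]
```

A consumer that only reads the "name" key keeps working when more keys are added, which is not true for a list of bare strings.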
Paul Cuzner [Mon, 10 Aug 2020 06:03:41 +0000 (18:03 +1200)]
cephadm: fixes to address PR requirements
Minor changes to address issues raised in the PR:
- formatting of ipv6 addresses
- missing docstrings
- NIC mtu and speed now int instead of string
- added NIC driver name
- removed discrete JSON method making gather-facts JSON only
- added upper/lower device lists to show NIC relationships
- added hostname to the JSON!
- added selinux/apparmor status
- added timestamp (epoch) for the gather-facts run
- added system uptime (secs)
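A few of the items above (integer NIC values, hostname, epoch timestamp, uptime in seconds) could be sketched as below. All function and key names are hypothetical, and the uptime text is assumed to follow the Linux /proc/uptime layout.

```python
import socket
import time


def parse_int(raw: str, default: int = -1) -> int:
    """Convert a raw text value (such as an MTU) to int, with a fallback."""
    try:
        return int(raw)
    except ValueError:
        return default


def build_facts(uptime_text: str, now=None) -> dict:
    """Assemble a JSON-serialisable dict of host-level facts."""
    return {
        "hostname": socket.gethostname(),
        # Epoch timestamp of this gather-facts run.
        "timestamp": now if now is not None else time.time(),
        # First field of the uptime text is the uptime in seconds.
        "system_uptime": float(uptime_text.split()[0]),
    }
```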
Paul Cuzner [Fri, 7 Aug 2020 04:28:30 +0000 (16:28 +1200)]
cephadm: add gather_facts command
The gather_facts command is intended to provide host
level metadata to the caller, which could then be used
in a number of places: orchestrator inventory, displayed
in the UI etc
Shraddha Agrawal [Wed, 19 Aug 2020 10:54:18 +0000 (16:24 +0530)]
qa/tasks/cephadm.py: add ceph logs directory in job's info.yaml
This commit adds the file paths of the ceph log directories to the job's
info.yaml log file. The motivation is that, in case of a job
timeout, the logs can still be transferred to the teuthology host
before the test machines are nuked, using the ceph log directory paths in
the job's info.yaml log file.
Andrew Schoen [Fri, 4 Sep 2020 14:44:49 +0000 (09:44 -0500)]
ceph-volume: simple scan should ignore tmpfs
When simple scan is run against a ceph-volume
OSD, util.encryption.legacy_encrypted returns
tmpfs. We want to avoid creating a Device
object with tmpfs and ignore the OSD as it's
not a ceph-disk created OSD.
mon/OSDMonitor: only take in osd into consideration when trimming osdmaps
We should not take down OSDs into consideration when trimming osdmaps.
In e62269c892, we decreased the upper bound of the range of osdmaps to be trimmed
if the given OSD is out, but we should decrease it only if the
OSD in question is still *in*.
So, in this change, min_lec is decreased only if the OSD in question
is *in*.
Kept to keep upgrades from older point releases working.
This module can be removed as soon as we no longer
support upgrades from old octopus point releases.
Sébastien Han [Tue, 18 Aug 2020 13:41:31 +0000 (15:41 +0200)]
ceph-volume: retry when acquiring lock fails
When preparing the OSD device with --mkfs, the ceph-osd binary tries to
acquire an exclusive lock on the device (soon to become an OSD).
Unfortunately, when running in containers, we have seen cases where
there is a race between ceph-osd and systemd-udevd to acquire a lock on
the device. Sometimes systemd-udevd gets the lock and releases it soon
enough for ceph-osd to acquire it; at other times the lock is still held and, because
ceph-osd uses LOCK_NB, the command fails.
This commit retries if the lock cannot be acquired, up to 5 times for 5
seconds. This should be more than enough to acquire the lock and
proceed with the OSD mkfs.
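The retry behaviour could be sketched as follows. This is an illustration under stated assumptions: the helper name run_with_retry is hypothetical, the command is assumed to exit with errno.EWOULDBLOCK when the lock is busy, and a one-second pause between attempts is assumed.

```python
import errno
import time


def run_with_retry(run, attempts=5, delay=1.0, sleep=time.sleep):
    """Call run() until it stops reporting a busy lock, at most `attempts` times.

    `run` is any callable returning an exit code. The last exit code is
    returned, so callers can still see a persistent failure.
    """
    returncode = None
    for attempt in range(attempts):
        returncode = run()
        if returncode != errno.EWOULDBLOCK:
            # Success or an unrelated error: retrying would not help.
            break
        if attempt < attempts - 1:
            # Lock still held elsewhere: wait briefly before trying again.
            sleep(delay)
    return returncode
```

Only the busy-lock exit code triggers a retry here, so genuine mkfs failures surface immediately instead of being repeated five times.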
Unfortunately, this is so transient that we cannot lock earlier from c-v;
that would not achieve anything.
Satoru Takeuchi [Fri, 22 May 2020 01:45:32 +0000 (01:45 +0000)]
ceph-volume: show correct rejected reason in inventory if device type is not acceptable
If device type is not acceptable in `c-v inventory`, its rejected reason
becomes "Insufficient space (<5GB)" by mistake. It's because sys_api is
empty due to skipping devices that are neither `disk` nor `device`. We
should report the target device is not acceptable in this case.
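The corrected ordering of checks could be sketched like this. The function name, the "size" key and the wording of the new reason are assumptions; only the "Insufficient space (<5GB)" string comes from the commit message.

```python
MIN_SIZE_BYTES = 5 * 1024 ** 3  # the 5GB threshold from the existing message


def rejected_reasons(sys_api: dict) -> list:
    """Return the reasons a device is rejected, checking its type first."""
    if not sys_api:
        # Empty sys_api means the device was skipped because of its type,
        # so a size-based reason would be misleading.
        return ["Device type is not acceptable"]  # hypothetical wording
    reasons = []
    if sys_api.get("size", 0) < MIN_SIZE_BYTES:
        reasons.append("Insufficient space (<5GB)")
    return reasons
```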