Sebastian Wagner [Thu, 16 Jul 2020 14:21:46 +0000 (16:21 +0200)]
Merge pull request #36109 from sebastian-philipp/octopus-backport-35890-35913-35908-35927-35813-35717-35990-35915-35747-36013
octopus: cephadm batch backport July (2)
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Reviewed-by: Kiefer Chang <kiefer.chang@suse.com>
Stephan Müller [Wed, 1 Jul 2020 14:27:50 +0000 (16:27 +0200)]
cephadm: Make Vagrantfile more flexible
Now you can use a JSON file or pass multiple variables to vagrant in order
to configure the set of VMs you get. Similar to vstart.sh, you can pass
OSDS, MGRS and MONS as arguments. As OSDs behave a bit differently in this
scenario, you can also specify the number of extra disks each OSD VM has.
Fixes: https://tracker.ceph.com/issues/46376
Signed-off-by: Stephan Müller <smueller@suse.com>
(cherry picked from commit c767a0c0e8ffed4448e4d2cacef72674e7ada883)
This fixes the error that can be seen in:
https://jenkins.rook.io/blue/rest/organizations/jenkins/pipelines/rook/pipelines/rook/branches/master/runs/2046/nodes/63/steps/121/log/?start=0
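The Vagrantfile itself is Ruby; purely to illustrate the OSDS/MGRS/MONS
handling described in the Vagrantfile commit above, here is a Python
sketch (the file name, variable names and the JSON-vs-environment
precedence are assumptions, not the actual implementation):

```
import json
import os
from pathlib import Path

def vm_settings(json_path='vagrant.config.json'):
    # defaults, overridden first by an optional JSON file, then by
    # OSDS/MGRS/MONS/DISKS environment variables
    settings = {'osds': 1, 'mgrs': 1, 'mons': 1, 'disks': 2}
    p = Path(json_path)
    if p.exists():
        settings.update(json.loads(p.read_text()))
    for key in settings:
        env = os.environ.get(key.upper())
        if env is not None:
            settings[key] = int(env)
    return settings
```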
Patrick Donnelly [Sat, 27 Jun 2020 17:49:08 +0000 (10:49 -0700)]
vstart.sh: use output of hostname for cephadm
Otherwise I get this error on a dev machine:
/home/pdonnell/ceph/build/bin/ceph -c /home/pdonnell/ceph/build/ceph.conf -k /home/pdonnell/ceph/build/keyring orch host add senta03
Error ENOENT: New host senta03 (senta03) failed check: ['INFO:cephadm:podman|docker (/bin/podman) is present', 'INFO:cephadm:systemctl is present', 'INFO:cephadm:lvcreate is present', 'INFO:cephadm:Unit chronyd.service is enabled and running', 'INFO:cephadm:Hostname "senta03" matches what is expected.', 'ERROR: hostname "senta03.front.sepia.ceph.com" does not match expected hostname "senta03"']
If `hostname` is configured to give the fqdn, we get the above error
from cephadm.
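A minimal Python sketch of the kind of hostname check that produces this
error (the message format follows the log above; the function name is
hypothetical, not cephadm's actual code):

```
import socket

def check_expected_hostname(expected: str) -> None:
    # cephadm compares the host's actual hostname with the name it was given
    actual = socket.gethostname()  # may be an FQDN, e.g. senta03.front.sepia.ceph.com
    if actual != expected:
        raise RuntimeError(
            'hostname "%s" does not match expected hostname "%s"'
            % (actual, expected))

# vstart.sh now passes the output of `hostname` as the host to add, so the
# expected and actual names agree even when `hostname` returns an FQDN.
check_expected_hostname(socket.gethostname())
```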
Check container_image_name only if the ceph cluster image is not pre-defined
in the config.
We shouldn't care about container_image_name if cephadm or ceph already has
an image defined.
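A hedged sketch of the described precedence (all names below are
hypothetical, not cephadm's actual fields):

```
def resolve_container_image(configured_image, container_image_name):
    # If cephadm or ceph already define an image for the cluster, use it
    # and never look at container_image_name.
    if configured_image:
        return configured_image
    # Only now does container_image_name need to be checked at all.
    if not container_image_name:
        raise ValueError('no container image configured and no image name given')
    return container_image_name
```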
Matthew Oliver [Thu, 2 Jul 2020 08:21:53 +0000 (18:21 +1000)]
cephadm: Make list_networks ipv6 enabled
Currently the list_networks command and its internal methods in cephadm only
run and parse ipv4 output from `ip route`.
This patch extends the list_networks command and internal methods to be
ipv6 enabled. It now also checks `ip -6 route` and `ip -6 addr` to
gather all networks from both protocol families.
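A simplified Python sketch of gathering networks from both address families
(cephadm's real parser handles more output variants and also `ip -6 addr`;
this only shows the idea of running the `ip -6` commands as well):

```
import subprocess
from typing import Dict, List

def list_networks() -> Dict[str, List[str]]:
    nets: Dict[str, List[str]] = {}
    for cmd in (['ip', 'route', 'ls'], ['ip', '-6', 'route', 'ls']):
        out = subprocess.run(cmd, capture_output=True, text=True).stdout
        for line in out.splitlines():
            fields = line.split()
            # e.g. "10.1.2.0/24 dev eth0 ..." or "fd00::/64 dev eth0 ..."
            if len(fields) >= 3 and '/' in fields[0] and fields[1] == 'dev':
                nets.setdefault(fields[0], []).append(fields[2])
    return nets
```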
Matthew Oliver [Fri, 26 Jun 2020 00:15:12 +0000 (00:15 +0000)]
cephadm: ceph-iscsi remove pool from cap
When we create a ceph-iscsi daemon/container in cephadm we create a user
and set some caps. It turns out we were a little too restrictive.
We were locking access down to only the pool that was given in the spec,
which happens to be the pool where the iscsi config is stored. But in
reality we need to be able to attach any rbd images, which could exist in
other pools.
So this patch removes the `pool=` from the osd cap, so from:
osd = allow rwx pool={spec.pool}
To:
osd = allow rwx
Fixes: https://tracker.ceph.com/issues/46138
Signed-off-by: Matthew Oliver <moliver@suse.com>
(cherry picked from commit 8cf51251a3299bf5a65ea338f9fb06c4f3052ad1)
Matthew Oliver [Thu, 18 Jun 2020 01:39:39 +0000 (01:39 +0000)]
cephadm: Set ms bind ipv6 when mon-ip is ipv6
If you use cephadm bootstrap with an ipv6 mon ip, then currently you'll
get into an address-family split-brain state, where the mon's messenger
connects and binds to ipv6 but the mgr's binds to ipv4 (usually
0.0.0.0). In this state the bootstrap process hangs as it attempts to
talk to the mgr and fetch its state.
A workaround is to have `ms bind ipv6 = true` in a ceph conf that you
then pass to bootstrap, which gets pulled in and set in the mon's
config store.
This patch sets `ms bind ipv6 = true` in the global section of the
mon config store when the mon-ip argument is an ipv6 address.
Fixes: https://tracker.ceph.com/issues/45016
Signed-off-by: Matthew Oliver <moliver@suse.com>
(cherry picked from commit 08ba08f7bb5b577ad3c3895e2c7f9f4d4555f185)
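A hedged sketch of the bootstrap-time check (the real cephadm code is
structured differently; `cli` here stands in for whatever helper runs
`ceph` commands against the new cluster):

```
import ipaddress

def maybe_enable_ipv6(mon_ip: str, cli) -> None:
    # strip any zone id (e.g. fe80::1%eth0) before parsing
    addr = ipaddress.ip_address(mon_ip.split('%')[0])
    if addr.version == 6:
        # equivalent of putting `ms bind ipv6 = true` in a bootstrap conf
        cli(['config', 'set', 'global', 'ms_bind_ipv6', 'true'])
```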
Matthew Oliver [Wed, 20 May 2020 00:22:45 +0000 (10:22 +1000)]
cephadm: Give iscsi a RO /lib/modules bind mount
The ceph iscsi container needs to be able to insert the iscsi_target_mod
module, but it doesn't exist in the container. For security reasons, bind
mounting /lib/modules seems a little dangerous unless we can mount it
RO.
Unfortunately the docker volume mount (-v) doesn't allow you to mount
read-only; adding `--read-only` actually does the opposite: it makes the
root of the container RO and expects you to write to the mounted volumes
(-v).
However, we get more granular control over bind mount options if we use
`--mount`[0]. Here we can still bind mount the volume into the container,
but can also add additional options, like bind mounting RO.
This patch adds an additional `bind_mounts` option to CephContainer
alongside `volume_mounts`; `bind_mounts` takes a List[List[str]] and is
plumbed through into cephadm. `bind_mounts` only needs to be used if you
need a little more control over the mounting; otherwise `volume_mounts`
is easier to use.
Fixes: https://tracker.ceph.com/issues/45252
Signed-off-by: Matthew Oliver <moliver@suse.com>
(cherry picked from commit d9b5371478b744920cf14e1b34b7d63226c71050)
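A rough sketch of how a List[List[str]] of bind mounts can be turned into
`--mount` arguments (simplified relative to CephContainer; the RO
/lib/modules entry matches the use case above):

```
from typing import List

def mount_args(bind_mounts: List[List[str]]) -> List[str]:
    args: List[str] = []
    for mount in bind_mounts:
        # each entry is joined into a single --mount option string
        args.extend(['--mount', ','.join(mount)])
    return args

# a read-only /lib/modules for the iscsi container
print(mount_args([['type=bind', 'source=/lib/modules',
                   'destination=/lib/modules', 'ro=true']]))
```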
Kiefer Chang [Thu, 18 Jun 2020 07:42:50 +0000 (15:42 +0800)]
stop.sh: do not block script when there is no running cluster
A query for the current fsid is made inside `do_killcephadm`. This blocks
the script when there is no running cluster. The fix avoids entering the
function if the cephadm command fails or returns no daemons.
The change also hides the following output for non-cephadm environments:
```
Unable to locate any of ['podman', 'docker']
```
Michael Fritch [Mon, 15 Jun 2020 21:22:08 +0000 (15:22 -0600)]
cephadm: sort the list of inferred fsids
$ cephadm shell
ERROR: Cannot infer an fsid, one must be specified: ['1d5df33f-eb94-4a4f-b192-1d5e770ed0e7', 'unknown']
$ cephadm shell
ERROR: Cannot infer an fsid, one must be specified: ['unknown', '1d5df33f-eb94-4a4f-b192-1d5e770ed0e7']
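A minimal sketch of why sorting helps: the error message above becomes
deterministic regardless of the order in which the daemon directories were
scanned (the function below is illustrative, not cephadm's actual code):

```
def infer_fsid(candidate_fsids):
    fsids = sorted(set(candidate_fsids))
    if len(fsids) == 1:
        return fsids[0]
    raise RuntimeError(
        'Cannot infer an fsid, one must be specified: %s' % fsids)
```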
Instead of printing out a traceback if adding the host fails during the
bootstrapping process, cephadm should now print an error message telling
the user that the host failed to be added.
Sebastian Wagner [Tue, 12 May 2020 12:07:32 +0000 (14:07 +0200)]
cephadm: Manually remove containers
This fixes:
```
Error: error creating container storage: the container name "ceph-<fsid>-mon.b" is already in use by "<container-id>". You have to remove that container to be able to reuse that name.: that name is already in use
```
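A hedged sketch of the "remove any stale container before starting a new
one" idea (the container name format follows the error above; how cephadm
hooks this into the daemon start is not shown here):

```
import subprocess

def remove_stale_container(fsid: str, daemon: str) -> None:
    name = 'ceph-%s-%s' % (fsid, daemon)  # e.g. ceph-<fsid>-mon.b
    # ignore failures: the container may simply not exist yet
    subprocess.run(['podman', 'rm', '-f', name],
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
```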
Incorrect conflict resolution during the backporting of
https://github.com/ceph/ceph/pull/34606 to Octopus
(https://github.com/ceph/ceph/pull/35926) led to the deletion of some
chunks in the OSD list page.
Venky Shankar [Wed, 18 Mar 2020 07:25:47 +0000 (03:25 -0400)]
mds: do not defer incoming mgrmap when mds is laggy
When the mds is laggy, the incoming mgrmap is queued to be processed
at a later stage. But the mds does not handle the mgrmap message directly.
So, later, when the mds is no longer laggy, the mgrmap message is not
handled and is dropped. However, when the mgrmap message was queued up, the
mds acknowledged that it had handled the message. This causes the mgr
client instance to never process the mgrmap and never connect to the
manager (the receipt of the mgrmap drives the connection to the manager).
The fix is to not acknowledge messages that the mds cannot handle. In
normal cases, the mds does not ack the message, but when it's laggy, it
just blindly queues up the message -- so, check whether the message can be
handled (later) even when the mds is laggy.
Also, a minor change in a function name -- handle_deferrable_message()
is kind of a misnomer since the function is called to process messages
that are not deferred. That's changed to handle_message() now.
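The MDS is C++; this Python sketch only illustrates the control-flow change
described above, with illustrative names:

```
def dispatch(msg, is_laggy, can_handle, defer, handle):
    if is_laggy:
        if not can_handle(msg):
            # do not ack: the message (e.g. an mgrmap) is left for another
            # dispatcher, such as the mgr client, to process
            return False
        defer(msg)   # will be handled once the mds is no longer laggy
        return True
    return handle(msg)  # returns whether the message was consumed
```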
xie xingguo [Sat, 13 Jun 2020 07:28:31 +0000 (15:28 +0800)]
osd/PeeringState: fix history.same_interval_since of merge target again
The symptom looks much like we see in
https://tracker.ceph.com/issues/37654.
The root cause is that both the merge source and target could be
fabricated PGs (aka placeholders), hence the merge target's
same_interval_since could remain 0 after the merge.
Fix by adjusting history.same_interval_since to the last_epoch_clean
reported when these PGs were found to be ready for merge.
This peer is going to be ignored/purged by the primary anyway later,
when peering is done.
Jianpeng Ma [Tue, 21 Apr 2020 00:44:53 +0000 (08:44 +0800)]
osd/OSD: wakeup all threads of shard.
In our test (4 NVMe), we found the following for 4K randread
(8_2 means 8 shards with 2 threads per shard; 16_1 means 16 shards with
1 thread per shard):

QD   8_2 (IOPS(k))   16_1 (IOPS(k))   8_2 with patch (IOPS(k))
32   191             263              263.5