Zac Dover [Mon, 3 Oct 2022 12:51:35 +0000 (22:51 +1000)]
doc/glossary.rst: remove duplicates
This commit removes similar but distinct entries for the following:
* CephFS
* Ceph Client
Removal of a glossary term that is referred to in the body of the
documentation suite requires the alteration of the text string
that refers to the glossary term. Alterations of this kind have
been made to doc/architecture.rst and doc/rados/api/index.rst.
This PR rewrites the front matter in the "Erasure Code"
section of the RADOS documentation. Previously, the information
in this section was syntactically confused. I have also fleshed
out the distinction between erasure coding and replication.
Laura Flores [Wed, 28 Sep 2022 17:43:40 +0000 (17:43 +0000)]
mgr/telemetry: handle daemons with complex ids
Assuming that every daemon name has the form `<daemon_type>.x` caused a crash
in the Telemetry module, because the current method does not handle daemon
ids that themselves contain dots, e.g. `<daemon_type>.x.y`.
When we parse the daemon type and daemon id, we should
split the name into at most two pieces rather than splitting
it on every `.` character. Passing `1` as the `maxsplit` argument of
Python's `str.split()` limits the result to at most two items.
Fixes: https://tracker.ceph.com/issues/57700
Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 97833a6a81fed7f868e1d544816cfbdf254fdb43)
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Fixes an issue where the pid file setting comes back empty from the config dump, which prevents metrics from being added. Also, collect process metrics only if
pid_path isn't empty.
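The guard described above can be sketched as follows. The function `read_pid` is hypothetical and only illustrates the rule "treat an empty pid_path as no pid file"; it is not the actual metrics code.

```python
from typing import Optional


def read_pid(pid_path: str) -> Optional[int]:
    """Return the pid stored in pid_path, or None when no pid file is configured."""
    # An empty pid_path (e.g. an empty value from the config dump) means there
    # is nothing to read, so the caller skips process metrics instead of failing.
    if not pid_path:
        return None
    with open(pid_path) as f:
        return int(f.read().strip())
```

A caller would collect process metrics only when `read_pid(...)` returns a pid.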
Nizamudeen A [Fri, 16 Sep 2022 07:20:26 +0000 (12:50 +0530)]
mgr/dashboard: use service call instead of form component
When creating a silence from the notification sidebar, instead of using
the silence form, which requires initializing the whole component on
the landing page, we can just call the Prometheus service and pass
the required data to the service call. This fixes the
`Prometheus not configured` error that appears every time we visit the landing page
when Prometheus is not configured.
bea9f4b643c introduced a regression that makes the activate process
take a very long time to complete.
`_get_bluestore_info()`, which calls the `ceph-bluestore-tool` binary via
a subprocess, is called an exponential number of times even though this is not needed.
mgr/dashboard: Add details to the modal which displays the `safe-to-destroy` result
- Add warning-type information when the OSDs are not safe to destroy
- Add info-type information when the OSDs are safe to destroy
Fixes: https://tracker.ceph.com/issues/37327
Signed-off-by: Francesco Torchia <francesco.torchia@suse.com>
(cherry picked from commit 0d6100bbf99ffa8da0e099343ede050f1cca509c)
Adam King [Mon, 22 Aug 2022 15:14:12 +0000 (11:14 -0400)]
mgr/cephadm: allow setting prometheus retention time
When we deploy the Prometheus server, we don't provide any
way to define the TSDB retention time, so it defaults to 15d.
This change adds a field to the prometheus service spec whose value
is passed as the argument of the --storage.tsdb.retention.time
parameter of the prometheus daemon.
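A minimal sketch of how a spec field could be translated into the daemon flag. The field name `retention_time` and the helper `prometheus_args` are assumptions made for illustration; only `--storage.tsdb.retention.time` and its 15d default come from Prometheus itself.

```python
from typing import Any, Dict, List

# Prometheus' own default when --storage.tsdb.retention.time is omitted.
DEFAULT_RETENTION = "15d"


def prometheus_args(service_spec: Dict[str, Any]) -> List[str]:
    """Build the retention argument from a (hypothetical) prometheus service spec."""
    # "retention_time" is an assumed field name; fall back to the default.
    retention = service_spec.get("spec", {}).get("retention_time") or DEFAULT_RETENTION
    return [f"--storage.tsdb.retention.time={retention}"]


spec = {"service_type": "prometheus", "spec": {"retention_time": "1y"}}
print(prometheus_args(spec))  # ['--storage.tsdb.retention.time=1y']
```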
/dev/vdc1 can't be zapped if it still holds an LV mapper.
Let's use --destroy in the lvm zap command in order to remove
the held LV mapper before zapping the partition, and recreate the partition afterwards.
Nizamudeen A [Fri, 12 Aug 2022 15:34:23 +0000 (21:04 +0530)]
mgr/dashboard: osd form preselect db/wal device filters
If a hostname is selected for the primary devices, then we can
preselect the hostname filter for the DB/WAL devices, because OSDs will
be deployed only on the host of the primary device. Preselecting it
makes it clear that only devices on that host will be used for deployment.
In addition, SSD devices are usually used for DB/WAL, so I
am preselecting the SSD type in the filters too.
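The dashboard form itself is written in TypeScript; the following Python sketch only illustrates the preselection rule described above. The filter keys `hostname` and `type` and the value `ssd` are hypothetical.

```python
from typing import Dict


def preselect_db_wal_filters(primary_filters: Dict[str, str]) -> Dict[str, str]:
    """Default DB/WAL filters derived from the primary-device filters (sketch)."""
    filters: Dict[str, str] = {}
    # OSDs are deployed only on the primary device's host, so reuse it.
    if "hostname" in primary_filters:
        filters["hostname"] = primary_filters["hostname"]
    # DB/WAL devices are usually SSDs, so preselect that type as well.
    filters["type"] = "ssd"
    return filters


print(preselect_db_wal_filters({"hostname": "node1"}))
# {'hostname': 'node1', 'type': 'ssd'}
```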
This commit caused a regression in the rados suite, as evidenced by:
- with the commit:
http://pulpito.front.sepia.ceph.com/lflores-2022-09-14_15:11:39-rados-quincy-release-distro-default-smithi/
- with the commit reverted:
http://pulpito.front.sepia.ceph.com/lflores-2022-09-14_17:02:02-rados-wip-lflores-testing-quincy-release-distro-default-smithi/
Fixes: https://tracker.ceph.com/issues/57546
Signed-off-by: Laura Flores <lflores@redhat.com>
John Mulligan [Mon, 29 Aug 2022 14:03:01 +0000 (10:03 -0400)]
qa/tasks/kubeadm: set up tigera resources via kubectl create
Fixes: https://tracker.ceph.com/issues/57268
The tigera operator for the calico CNI has some pretty large resource
definitions. Their size can cause "client-side
apply", the default mode for `kubectl apply ...`, to fail because of the
size of the annotation it would need to create:
```
2022-08-22T20:24:55.636 INFO:teuthology.orchestra.run.smithi087.stdout:clusterrolebinding.rbac.authorization.k8s.io/tigera-operator created
2022-08-22T20:24:55.670 INFO:teuthology.orchestra.run.smithi087.stdout:deployment.apps/tigera-operator created
2022-08-22T20:24:55.671 INFO:teuthology.orchestra.run.smithi087.stderr:The CustomResourceDefinition "installations.operator.tigera.io" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
2022-08-22T20:24:55.674 DEBUG:teuthology.orchestra.run:got remote process result: 1
```
There are two simple options for avoiding this error. One is to use
`kubectl create`. The create command will not make this lengthy
annotation. It will fail if any of the resources already exist. The
other option is to use server-side apply, via the `kubectl apply
--server-side ...` command. It is new in k8s 1.18. It will not create
the annotation either.
The block of code setting up the CNI already uses `kubectl create` to
create the custom resources that configure the tigera operator.
Therefore it should be safe to assume the block of code in question
doesn't need to be idempotent and we can also use `kubectl create`
elsewhere in the same block.
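To see why large definitions trip this limit: client-side apply stores a JSON copy of each object in the `kubectl.kubernetes.io/last-applied-configuration` annotation, and the error above reports a 262144-byte cap on `metadata.annotations`. The check below is only an approximation (compact JSON of the manifest), and the helper name is hypothetical.

```python
import json

# Limit quoted in the error above for the total size of metadata.annotations.
MAX_ANNOTATIONS_BYTES = 262144


def last_applied_too_large(manifest: dict) -> bool:
    """Roughly estimate whether client-side apply would exceed the limit."""
    # Client-side apply serializes the object into the
    # kubectl.kubernetes.io/last-applied-configuration annotation; compact
    # JSON of the manifest approximates that annotation's size.
    encoded = json.dumps(manifest, separators=(",", ":")).encode("utf-8")
    return len(encoded) > MAX_ANNOTATIONS_BYTES
```

`kubectl create` and `kubectl apply --server-side` both avoid writing this annotation, which is why either one sidesteps the error.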
Adam King [Thu, 1 Sep 2022 12:37:39 +0000 (08:37 -0400)]
mgr/cephadm: don't use "sudo" in commands if user is root
We had an earlier patch that stops us from using sudo for our
other commands when the user is root, but this specific
one, which just runs "true" with a timeout to check whether the host
is online, was missed.
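A sketch of the rule this commit applies to the host-online check. `build_check_host_cmd` and the 10-second default timeout are hypothetical; only the idea of prefixing `sudo` for non-root users comes from the commit message.

```python
from typing import List


def build_check_host_cmd(ssh_user: str, timeout: int = 10) -> List[str]:
    """Command used to check that a host is online (illustrative)."""
    # The check just runs "true" under a timeout.
    cmd = ["timeout", str(timeout), "true"]
    # Only non-root users need sudo; root runs the command directly.
    if ssh_user != "root":
        cmd = ["sudo"] + cmd
    return cmd


print(build_check_host_cmd("root"))     # ['timeout', '10', 'true']
print(build_check_host_cmd("cephadm"))  # ['sudo', 'timeout', '10', 'true']
```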
Adam King [Thu, 1 Sep 2022 17:08:13 +0000 (13:08 -0400)]
mgr/cephadm: fix tuned profiles getting removed if name has dashes
Previously, because the code extracted the profile name by simply splitting
the file name on dashes, any profile name containing dashes
was not matched properly and was therefore marked
stray and removed. The new strategy instead matches against the actual
expected file name.
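The difference between the two strategies can be illustrated with a simplified sketch. The file-name pattern `90-ceph-<profile>-sysctl.conf` and both helper functions are assumptions made for illustration, not cephadm's exact code.

```python
from typing import Set


def expected_filename(profile: str) -> str:
    # Assumed naming pattern for a profile's file on the host.
    return f"90-ceph-{profile}-sysctl.conf"


def is_stray_by_split(filename: str, known: Set[str]) -> bool:
    # Old approach (simplified): take the piece after "90-ceph-" up to the
    # next dash, which truncates any profile name that contains dashes.
    return filename.split("-")[2] not in known


def is_stray_by_match(filename: str, known: Set[str]) -> bool:
    # New approach: compare against the file names we expect to exist.
    return filename not in {expected_filename(p) for p in known}


profiles = {"my-profile"}
name = expected_filename("my-profile")
print(is_stray_by_split(name, profiles))  # True (wrongly treated as stray)
print(is_stray_by_match(name, profiles))  # False
```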
Adam King [Tue, 16 Aug 2022 14:43:39 +0000 (10:43 -0400)]
mgr/cephadm: make setting --cgroups=split configurable
Previously, we always set this as long
as users were running a high enough version of podman,
but a user has run into an issue where their
daemons fail to deploy with
Error: could not find cgroup mount in "/proc/self/cgroup"
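A sketch of making the flag optional. The minimum podman version `(2, 1, 0)`, the `cgroups_split` switch, and `podman_run_args` are assumptions for illustration; `--cgroups=split` is podman's own `run` option.

```python
from typing import List, Tuple

# Assumed minimum podman version that supports --cgroups=split.
CGROUPS_SPLIT_MIN_PODMAN = (2, 1, 0)


def podman_run_args(podman_version: Tuple[int, ...], cgroups_split: bool = True) -> List[str]:
    """Base podman arguments (illustrative)."""
    args = ["podman", "run"]
    # Previously only the version was checked; the extra switch lets users
    # whose hosts reject --cgroups=split turn it off.
    if cgroups_split and podman_version >= CGROUPS_SPLIT_MIN_PODMAN:
        args.append("--cgroups=split")
    return args
```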
Zac Dover [Fri, 12 Aug 2022 21:53:21 +0000 (07:53 +1000)]
doc/rados: add prompts to pools.rst
This commit adds ".. prompt:: bash $"-style prompts to pools.rst.
This brings this file up to the standard established in 2020 when
Kefu added support for the ".. prompt::" directive.
This commit is a part of an initiative to modernize the presentation
of all BASH commands in the RADOS documentation.
The progress of this project can be tracked here:
https://tracker.ceph.com/issues/57108
When generating tags, the order of endpoints wasn't taken into account.
Two endpoints with the same URL prefix, for example `/api/cluster/` and
`/api/cluster/user`, have different docs, and the tag is generated from
the doc of one of the two. Since the order of these endpoints can
vary, they must be sorted to get a deterministic output.
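A sketch of why sorting gives a deterministic result. `tags_by_prefix` and its "first two path components" rule are hypothetical simplifications of the dashboard's tag generation.

```python
from typing import Dict, List, Tuple


def tags_by_prefix(endpoints: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map each URL prefix to the doc of the first endpoint that uses it."""
    tags: Dict[str, str] = {}
    # Sorting by URL makes "first" independent of registration order.
    for url, doc in sorted(endpoints):
        prefix = "/".join(url.strip("/").split("/")[:2])  # e.g. "api/cluster"
        tags.setdefault(prefix, doc)
    return tags


endpoints = [("/api/cluster/user", "Cluster user"), ("/api/cluster/", "Cluster")]
print(tags_by_prefix(endpoints))  # {'api/cluster': 'Cluster'}
```

Without `sorted`, the reversed list would produce `{'api/cluster': 'Cluster user'}` instead.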
Nizamudeen A [Wed, 24 Aug 2022 07:47:50 +0000 (13:17 +0530)]
mgr/dashboard: fix unable to create ingress unmanaged
The following snippet is the error from the backend:
```
File "/lib/python3.6/site-packages/ceph/deployment/service_spec.py", line 698, in _from_json_impl
_cls.validate()
File "/lib/python3.6/site-packages/ceph/deployment/service_spec.py", line 1058, in validate
'Cannot add ingress: No frontend_port specified')
ceph.deployment.hostspec.SpecValidationError: Cannot add ingress: No frontend_port specified
```
It looks like even if we set the unmanaged flag, we still need to provide
backend_service, frontend_port, monitor_port and virtual_ip, because the
backend validates them.
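The backend behaviour can be mirrored in a few lines so it is clear which fields the form must send even for an unmanaged service. `missing_ingress_fields` is a hypothetical, simplified stand-in for the spec validation, not the `service_spec.py` code itself.

```python
from typing import Any, Dict, List

REQUIRED_INGRESS_FIELDS = ("backend_service", "frontend_port", "monitor_port", "virtual_ip")


def missing_ingress_fields(spec: Dict[str, Any]) -> List[str]:
    """Return the required ingress fields that are absent or empty."""
    # The unmanaged flag is not consulted: validation applies either way.
    return [field for field in REQUIRED_INGRESS_FIELDS if not spec.get(field)]


print(missing_ingress_fields({"unmanaged": True}))
# ['backend_service', 'frontend_port', 'monitor_port', 'virtual_ip']
```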
test/{librbd, rgw}: increase delay between and number of bind attempts
Commit aa7885f7cc41 ("test/{librbd, rgw}: retry when bind fail with
port 0") greatly reduced the frequency of sporadic unit test failures caused
by EADDRINUSE, but did not eliminate them.
Currently, the retries yield a cumulative sleep of ~9 seconds. Let's increase
that to 1 minute.
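The retry scheme can be sketched generically. `bind_with_retry` is hypothetical and does not reproduce the C++ test code; the point is that the worst-case cumulative sleep is `(attempts - 1) * delay`, so raising the delay and the number of attempts is what stretches it from about 9 seconds toward a minute.

```python
import errno
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def bind_with_retry(bind: Callable[[], T], attempts: int, delay: float) -> T:
    """Call bind() until it stops failing with EADDRINUSE.

    Worst-case cumulative sleep is (attempts - 1) * delay seconds.
    """
    for attempt in range(attempts):
        try:
            return bind()
        except OSError as e:
            # Give up on other errors, or after the last attempt.
            if e.errno != errno.EADDRINUSE or attempt == attempts - 1:
                raise
            time.sleep(delay)
    raise ValueError("attempts must be >= 1")
```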
Lucian Petrut [Fri, 26 Aug 2022 12:54:10 +0000 (12:54 +0000)]
include: fix IS_ERR on Windows
The "long" type is 32 bits wide on x64 Windows platforms, which means
it's not large enough to store a pointer. intptr_t or uintptr_t
should be used instead.
This change fixes include/err.h to use the right types. There was
a previous patch on this topic, but unfortunately it didn't address
all the type casts.
This issue was surfaced by the unittest_crush test, which recently
started to fail because the CrushWrapper methods use IS_ERR.
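The failure mode can be simulated with integer arithmetic. x64 Windows uses the LLP64 model (32-bit `long`, 64-bit pointers), while x64 Linux uses LP64 (64-bit `long`). The sketch assumes an `IS_ERR` that checks whether the pointer, cast to `unsigned long`, falls in the top `MAX_ERRNO` values, with `MAX_ERRNO` assumed to be 4095 as in Linux's `err.h`. With a 32-bit `long`, the cast truncates the pointer, so a valid address whose low 32 bits happen to be large is misreported as an error.

```python
MAX_ERRNO = 4095  # assumed, as in Linux's include/linux/err.h


def is_err(ptr: int, long_bits: int) -> bool:
    """Simulate IS_ERR with an unsigned long of the given width."""
    mask = (1 << long_bits) - 1
    value = ptr & mask                     # the cast truncates when long is too narrow
    return value >= ((-MAX_ERRNO) & mask)  # (unsigned long)-MAX_ERRNO


valid_ptr = 0x00007FF6FFFFF800             # a plausible 64-bit user-space address
err_ptr = (-2) & ((1 << 64) - 1)           # ERR_PTR(-2) as a 64-bit pointer value

print(is_err(valid_ptr, 64))  # False: correct with a pointer-sized type
print(is_err(valid_ptr, 32))  # True: misreported after truncation to 32 bits
print(is_err(err_ptr, 64))    # True
```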