Zac Dover [Sun, 9 Oct 2022 07:09:30 +0000 (17:09 +1000)]
doc/various: update link to CRUSH pdf
This commit updates the link to the research paper that announces and
explains the CRUSH algorithm. This link was broken in the migration from
the old Ceph website to ceph.io.
This commit makes several refinements to the English in
rados/operations/crush-map-edits.rst, which refinements were suggested
by Cole Mitchell and Anthony D'Atri in the discussion of PR#48085.
This PR updates the prompts in crush-map-edits.rst
to make them unselectable.
There remains no good known way to render parts of
a file if the first line of that file begins with a
hash (#). Sphinx italicizes such a first line, which
is not what I want. Two examples of this are present
in the file crush-maps-rst under the section called
"CRUSH Map Bucket Types". I set this down here for
my own records, in case it is helpful in finding a
way to make these sections render as I would prefer.
Zac Dover [Mon, 3 Oct 2022 12:51:35 +0000 (22:51 +1000)]
doc/glossary.rst: remove duplicates
This commit removes similar but distinct entries for the following:
* CephFS
* Ceph Client
Removal of a glossary term that is referred to in the body of the
documentation suite requires the alteration of the text string
that refers to the glossary term. Alterations of this kind have
been made to doc/architecture.rst and doc/rados/api/index.rst.
This PR rewrites the front matter in the "Erasure Code"
section of the RADOS documentation. Previously, the information
in this section was syntactically confused. I have also fleshed
out the distinction between erasure coding and replication.
Laura Flores [Wed, 28 Sep 2022 17:43:40 +0000 (17:43 +0000)]
mgr/telemetry: handle daemons with complex ids
Treating daemons as `<daemon_type>.x` caused a crash
in the Telemetry module since the current method does not cover a case
where a daemon id is more complex, i.e. `<daemon_type>.x.y`.
When we parse the daemon type and daemon id, we should
split it into a maximum of two pieces rather than splitting
it by every `.` character. Specifying `1` in the Python
.split() function will limit the split to a maximum of two items.
Fixes: https://tracker.ceph.com/issues/57700
Signed-off-by: Laura Flores <lflores@redhat.com>
(cherry picked from commit 97833a6a81fed7f868e1d544816cfbdf254fdb43)
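The difference between an unbounded split and a split limited by `1` can be sketched as follows. The helper name `parse_daemon_name` is hypothetical and is not taken from the Telemetry module; only the behaviour of Python's `str.split()` is being illustrated.

```python
from typing import Tuple


def parse_daemon_name(name: str) -> Tuple[str, str]:
    # Hypothetical helper: split on the first '.' only, so that
    # everything after it is kept together as the daemon id.
    daemon_type, daemon_id = name.split('.', 1)
    return daemon_type, daemon_id


# An unbounded split yields three pieces for a complex id, which breaks
# any code that unpacks exactly two values.
assert len("mds.x.y".split('.')) == 3

print(parse_daemon_name("osd.x"))    # ('osd', 'x')
print(parse_daemon_name("mds.x.y"))  # ('mds', 'x.y')
```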
Signed-off-by: Avan Thakkar <athakkar@redhat.com>
Fixes an issue where the pid file config comes back empty from the config dump, which prevents metrics from being added. Also get process metrics only if
pid_path isn't empty.
Nizamudeen A [Fri, 16 Sep 2022 07:20:26 +0000 (12:50 +0530)]
mgr/dashboard: use service call instead of form component
For creating the silence from the notification sidebar, instead of using
the silence form which will require initializing the whole component on
the landing page, we can just call the prometheus service and pass on
the required data to the service call. This will fix showing the
`Prometheus not configured` error every time we visit the landing page when
Prometheus is not configured.
bea9f4b643c introduced a regression that makes the activate process
take a very long time to complete.
`_get_bluestore_info()`, which calls the `ceph-bluestore-tool` binary via
subprocess, is called an exponential number of times, which is not needed.
mgr/dashboard: Add details to the modal which displays the `safe-to-destroy` result
- Add warning-type information when the OSDs are not safe to destroy
- Add info-type information when the OSDs are safe to destroy
Fixes: https://tracker.ceph.com/issues/37327
Signed-off-by: Francesco Torchia <francesco.torchia@suse.com>
(cherry picked from commit 0d6100bbf99ffa8da0e099343ede050f1cca509c)
Adam King [Mon, 22 Aug 2022 15:14:12 +0000 (11:14 -0400)]
mgr/cephadm: allow setting prometheus retention time
When we deploy Prometheus server, we don't provide any
ability to define the tsdb retention time - so it defaults to 15d.
This change adds a field that can be passed in a prometheus service
spec that will be passed as an arg to the --storage.tsdb.retention.time
parameter for the prometheus daemon.
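As a rough sketch of how such a spec field could be turned into a daemon argument: the function name `prometheus_retention_args` and the parameter name `retention_time` are assumptions made for illustration, not cephadm's actual code. Only the `--storage.tsdb.retention.time` flag and the 15d default come from the text above.

```python
from typing import List, Optional

# Default mentioned in the commit message above.
DEFAULT_RETENTION = '15d'


def prometheus_retention_args(retention_time: Optional[str] = None) -> List[str]:
    # Hypothetical helper: fall back to the default when the service
    # spec does not set a retention time.
    retention = retention_time or DEFAULT_RETENTION
    return [f'--storage.tsdb.retention.time={retention}']


print(prometheus_retention_args())       # no value in the spec
print(prometheus_retention_args('30d'))  # value taken from the spec
```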
/dev/vdc1 can't be zapped if it still holds an lv mapper.
Let's use --destroy in the lvm zap command in order to remove
the held lv mapper before zapping the partition, and recreate the partition afterwards.
Nizamudeen A [Fri, 12 Aug 2022 15:34:23 +0000 (21:04 +0530)]
mgr/dashboard: osd form preselect db/wal device filters
If the hostname is selected for the primary devices, then we can
preselect the hostname filter for the db/wal devices because osds will
be deployed only on the hostname of the primary device. If preselected
it'll be clear that only these devices will be used to deploy.
In addition to this, SSD devices are usually used for db/wal devices, so I
am preselecting these too in the filters.
NitzanMordhai [Thu, 18 Aug 2022 11:33:15 +0000 (11:33 +0000)]
pybind/rados: notify callback reconnect
When testing with socket injection, a reconnect won't call the error callback;
for each reconnect, the callback will be called.
Change the notify callback count to depend on the data and increase it only
when the data has changed. If the data is the same, we are probably reconnecting
due to socket injection.
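The counting rule described above can be sketched like this. The class `NotifyCounter` and its method names are hypothetical and are not part of the rados Python bindings or their tests; the sketch only shows a counter that ignores callbacks carrying unchanged data.

```python
from typing import Optional


class NotifyCounter:
    """Hypothetical counter that only counts notifies with new data."""

    def __init__(self) -> None:
        self.count = 0
        self.last_data: Optional[bytes] = None

    def on_notify(self, data: bytes) -> None:
        # A repeated payload is assumed to come from a reconnect
        # (for example, one caused by socket injection), so it is skipped.
        if data != self.last_data:
            self.count += 1
            self.last_data = data


counter = NotifyCounter()
for payload in (b"a", b"a", b"b"):
    counter.on_notify(payload)
print(counter.count)  # 2
```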
This commit caused a regression in the rados suite, as evidenced by:
- with the commit:
http://pulpito.front.sepia.ceph.com/lflores-2022-09-14_15:11:39-rados-quincy-release-distro-default-smithi/
- with the commit reverted:
http://pulpito.front.sepia.ceph.com/lflores-2022-09-14_17:02:02-rados-wip-lflores-testing-quincy-release-distro-default-smithi/
Fixes: https://tracker.ceph.com/issues/57546
Signed-off-by: Laura Flores <lflores@redhat.com>
John Mulligan [Mon, 29 Aug 2022 14:03:01 +0000 (10:03 -0400)]
qa/tasks/kubeadm: set up tigera resources via kubectl create
Fixes: https://tracker.ceph.com/issues/57268
The tigera operator for the calico CNI has some pretty large resource
definitions. The length of the definitions can cause the "client side
apply", the default mode for `kubectl apply ....`, to fail due to the
length of the needed annotation that would result:
```
2022-08-22T20:24:55.636 INFO:teuthology.orchestra.run.smithi087.stdout:clusterrolebinding.rbac.authorization.k8s.io/tigera-operator created
2022-08-22T20:24:55.670 INFO:teuthology.orchestra.run.smithi087.stdout:deployment.apps/tigera-operator created
2022-08-22T20:24:55.671 INFO:teuthology.orchestra.run.smithi087.stderr:The CustomResourceDefinition "installations.operator.tigera.io" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
2022-08-22T20:24:55.674 DEBUG:teuthology.orchestra.run:got remote process result: 1
```
There are two simple options for avoiding this error. One is to use
`kubectl create`. The create command will not make this lengthy
annotation. It will fail if any of the resources already exist. The
other option is to use server-side apply, via the `kubectl apply
--server-side ...` command. It is new in k8s 1.18. It will not create
the annotation either.
The block of code setting up the CNI already uses `kubectl create` to
create the custom resources that configure the tigera operator.
Therefore it should be safe to assume the block of code in question
doesn't need to be idempotent and we can also use `kubectl create`
elsewhere in the same block.
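The two options can be written down as argument lists, as in the sketch below. The function `kubectl_argv` and the manifest name `tigera-operator.yaml` are hypothetical and do not come from the kubeadm task; the sketch only builds the commands and does not run them.

```python
from typing import List


def kubectl_argv(manifest: str, server_side: bool = False) -> List[str]:
    # Hypothetical helper: choose between the two ways of avoiding the
    # oversized client-side-apply annotation.
    if server_side:
        # Server-side apply does not write the annotation.
        return ['kubectl', 'apply', '--server-side', '-f', manifest]
    # `kubectl create` does not write it either, but it fails if the
    # resource already exists, so it is not idempotent.
    return ['kubectl', 'create', '-f', manifest]


print(' '.join(kubectl_argv('tigera-operator.yaml')))
print(' '.join(kubectl_argv('tigera-operator.yaml', server_side=True)))
```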
Adam King [Thu, 1 Sep 2022 12:37:39 +0000 (08:37 -0400)]
mgr/cephadm: don't use "sudo" in commands if user is root
An earlier patch made our other commands use sudo only when the
user is not root, but this specific command, which just runs
"true" with a timeout to check whether the host is online,
was missed.
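A minimal sketch of the rule, assuming a hypothetical helper named `maybe_sudo` (cephadm's real code is not shown here):

```python
from typing import List


def maybe_sudo(user: str, cmd: List[str]) -> List[str]:
    # Hypothetical helper: root needs no privilege escalation, so the
    # "sudo" prefix is added only for other users.
    if user == 'root':
        return list(cmd)
    return ['sudo'] + list(cmd)


print(maybe_sudo('root', ['true']))   # ['true']
print(maybe_sudo('alice', ['true']))  # ['sudo', 'true']
```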
Adam King [Thu, 1 Sep 2022 17:08:13 +0000 (13:08 -0400)]
mgr/cephadm: fix tuned profiles getting removed if name has dashes
Previously, due to the way it was just splitting the file name on
dashes to try and get the profile name, any profile name with dashes
was not getting properly matched and would therefore get marked
stray and removed. This new strategy instead tries to match the actual
expected file name.
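The failure mode and the fix can be sketched as follows. The file-name suffix and both function names are assumptions made for illustration; they are not confirmed by this commit message.

```python
from typing import Iterable

# Assumed suffix for files written for tuned profiles (illustrative only).
SUFFIX = '-cephadm-tuned-profile.conf'


def name_by_dash_split(file_name: str) -> str:
    # Old approach as described above: take the text before the first
    # dash, which truncates any profile name that itself contains dashes.
    return file_name.split('-')[0]


def is_stray(file_name: str, known_profiles: Iterable[str]) -> bool:
    # New approach as described above: compare against the full file
    # name expected for each known profile.
    expected = {f'{name}{SUFFIX}' for name in known_profiles}
    return file_name not in expected


file_name = 'my-profile' + SUFFIX
print(name_by_dash_split(file_name))        # 'my', not 'my-profile'
print(is_stray(file_name, ['my-profile']))  # False
```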
Adam King [Tue, 16 Aug 2022 14:43:39 +0000 (10:43 -0400)]
mgr/cephadm: make setting --cgroups=split configurable
Previously, we were just always setting this as long
as users were using podman with a high enough version,
but it seems a user has run into an issue where their
daemons are failing to deploy with
Error: could not find cgroup mount in "/proc/self/cgroup"
Zac Dover [Fri, 12 Aug 2022 21:53:21 +0000 (07:53 +1000)]
doc/rados: add prompts to pools.rst
This commit adds ".. prompt:: bash $"-style prompts to pools.rst.
This brings this file up to the standard established in 2020 when
Kefu added support for the ".. prompt::" directive.
This commit is a part of an initiative to modernize the presentation
of all Bash commands in the RADOS documentation.
The progress of this project can be tracked here:
https://tracker.ceph.com/issues/57108