Zac Dover [Tue, 14 Nov 2023 13:40:42 +0000 (23:40 +1000)]
doc/glossary: add "Quorum" to glossary
Add the term "Quorum" to the glossary and link to the part of
architecture.rst concerning Monitors. The sticky header at the top of
the docs.ceph.com website gets in the way of the location linked to in
this commit, but fatigue and disgust prevent me from spending time today
trial-and-erroring my way through the hostile and ill-documented
wilderness of scroll-margin so that the link goes where it should.
Co-authored-by: Anthony D'Atri <anthony.datri@gmail.com>
Signed-off-by: Zac Dover <zac.dover@proton.me>
Ramana Raja [Thu, 2 Nov 2023 21:44:10 +0000 (17:44 -0400)]
pybind/mgr: remove __del__() of mgr modules
It's strongly recommended for objects that have references to
external resources (e.g., files) to explicitly release them.
Python doesn't guarantee garbage collection of objects and hence
doesn't guarantee that external resources released only on
garbage collection are ever freed.
The __del__() methods in the python mgr modules may not even be
called since garbage collection of objects is not guaranteed in python.
Also, some of the __del__() methods perform cleanup that appears redundant.
- In volumes/module.py, vc.shutdown() is called in Module.shutdown().
No need to call it again in Module.__del__()
- In telegraf/basesocket.py, BaseSocket.close() is called in
BaseSocket.__exit__(). No need to call it again in
BaseSocket.__del__().
- In mgr_module.py, MgrModuleLoggingMixin._unconfigure_logging() is
called in MgrModule.__init__() and MgrStandbyModule.__init__(). No
need to call it in MgrModule.__del__() and
MgrStandbyModule.__del__().
- In dashboard/services/cephfs.py, the libcephfs mount is not
shut down explicitly by the mgr module. However, the cython libcephfs
bindings have a LibCephFS.__dealloc__() finalizer method that calls
LibCephFS.shutdown(). This should unmount and clean up the ceph mount
handle.
Remove the __del__() of the python mgr modules.
Fixes: https://tracker.ceph.com/issues/63421
Signed-off-by: Ramana Raja <rraja@redhat.com>
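The reasoning above (release external resources explicitly instead of
relying on __del__()) can be illustrated with a minimal sketch. The
Resource and Module classes below are hypothetical stand-ins, not the
actual mgr module code; they only show the general pattern of an
explicit shutdown() combined with a context manager.

```python
class Resource:
    """Hypothetical stand-in for an external resource (file, socket, mount)."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        # Idempotent: calling close() a second time is a harmless no-op.
        self.closed = True


class Module:
    """Hypothetical module that owns a Resource and defines no __del__()."""

    def __init__(self) -> None:
        self.resource = Resource()

    def shutdown(self) -> None:
        # Explicit release; does not depend on garbage collection running.
        self.resource.close()

    def __enter__(self) -> "Module":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Leaving the "with" block always releases the resource.
        self.shutdown()


def run() -> bool:
    """Use the module in a with-block and report whether it was released."""
    with Module() as mod:
        res = mod.resource
    return res.closed
```

Because shutdown() is called deterministically when the with-block exits,
the resource is released whether or not the object is ever collected.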
Zac Dover [Mon, 13 Nov 2023 10:57:07 +0000 (20:57 +1000)]
doc/rados: format "initial troubleshooting"
Format the steps in the "Initial Troubleshooting" section of
doc/rados/troubleshooting/troubleshooting-mon.rst. A near-future PR (not
this one) will add context to this section and explain that the steps
described here are the first steps that you should undertake when you
determine that you have an unresponsive or down Monitor. This PR is
merely for formatting.
The operation's id and future returned when starting SnapTrimObjSubEvent
is emplaced into subop_blocker.
Later on, we await the completion of all the started operations' futures.
Before this patch, we only stored the op id in the subop_blocker vector
which allowed `op` to go out of scope and lose all its references
(and get deleted) before exiting.
Storing the operation as a reference instead of the id
will maintain the SnapTrimObjSubEvent operation lifetime.
Zac Dover [Sun, 12 Nov 2023 10:52:09 +0000 (20:52 +1000)]
doc/rados: parallelize t-mon headings
Give parallel structure to the questions in the Q&A section of the "The
Cluster Has Quorum But At Least One Monitor Is Down" subsection of the
"Most Common Monitor Issues" section of
doc/rados/troubleshooting/troubleshooting-mon.rst.
Zac Dover [Sun, 12 Nov 2023 10:21:41 +0000 (20:21 +1000)]
doc/config: edit "ceph-conf.rst"
Edit the first section of doc/rados/configuration/ceph-conf.rst.
Initially I just wanted to change "series" to "set", but once I got my
hands dirty I ended up simplifying some sentences.
John Mulligan [Sat, 4 Nov 2023 23:41:25 +0000 (19:41 -0400)]
cephadm: change ceph & exporter to customize_container_mounts method
Unlike the other types, Ceph and CephExporter share the underlying
method. There was no other use of get_container_mounts on the class
so it could be converted to be customize_container_mounts.
Because there's an extra arg that passes from get_container_mounts
top-level function to Ceph.get_ceph_mounts, that function was not
changed.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Sat, 4 Nov 2023 22:39:07 +0000 (18:39 -0400)]
cephadm: move volume mounts assignment to a variable
Move the call to get_container_mounts out of the function call block.
This will aid with the next refactoring steps, so that the uses
of get_container_mounts can be brought into the get_container call
directly.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
As part of the Zipper project, generic back-end code is being teased
apart from rados-specific back-end code. This is a work in progress,
so currently generic code and other subclasses of StoreDriver (and
related high-level classes) depend on the rados-specific declarations.
Some of these dependencies are not always obvious since
src/rgw/driver/rados was put on the include path. That is now removed,
so any includes needing files from that subclass have to give a more
fully specified path.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
John Mulligan [Fri, 20 Oct 2023 18:22:40 +0000 (14:22 -0400)]
cephadm: convert get_container_binds to use class based approach
Since all types affected by get_container_binds now have the common
customize_container_binds, use a generic class-based approach by
creating an instance of ContainerDaemonForm and calling the method.
All other classes have a customize_container_binds that is a no-op.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
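The class-based approach described above can be sketched roughly as
follows. ContainerDaemonForm and customize_container_binds are named in
the commit message, but the signature, the two subclasses, the bind
strings, and the collect_binds helper are all assumptions made for
illustration and are not taken from cephadm itself.

```python
from typing import List


class ContainerDaemonForm:
    """Sketch of a base class whose default hook is a no-op."""

    def customize_container_binds(self, binds: List[List[str]]) -> None:
        # Most daemon types add no bind mounts, so the default does nothing.
        pass


class DaemonWithoutBinds(ContainerDaemonForm):
    """Hypothetical daemon type that inherits the no-op hook."""


class DaemonWithBinds(ContainerDaemonForm):
    """Hypothetical daemon type that contributes one bind mount."""

    def customize_container_binds(self, binds: List[List[str]]) -> None:
        # Illustrative bind specification; the paths are made up.
        binds.append(
            ['type=bind', 'source=/dev/example', 'destination=/dev/example']
        )


def collect_binds(daemon: ContainerDaemonForm) -> List[List[str]]:
    """Hypothetical caller: no per-type branching, just call the hook."""
    binds: List[List[str]] = []
    daemon.customize_container_binds(binds)
    return binds
```

With a no-op default on every class, the caller no longer needs to check
the daemon type before asking for binds.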
John Mulligan [Fri, 20 Oct 2023 17:56:31 +0000 (13:56 -0400)]
cephadm: only call get_container_binds on types that have binds
This is a step towards not calling get_container_binds in get_container.
A future commit will replace uses of get_container_binds with direct
uses of common class methods.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Fri, 20 Oct 2023 17:49:50 +0000 (13:49 -0400)]
cephadm: move bind mounts assignment to a variable
Move the call to get_container_binds out of the function call.
This will aid with the next refactoring steps, so that the uses
of get_container_binds can be brought into the get_container call
directly.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
John Mulligan [Sat, 4 Nov 2023 20:59:33 +0000 (16:59 -0400)]
cephadm: always pass ctx to customize_container_{binds,mounts}
These functions often derive the binds and/or mounts from the context
variable. Thus we should have the base class method accept the context.
Not all subclasses will use it but it will be there for those that do.
Also, fix the type for customize_container_mounts - it should be a dict
not a list.
Signed-off-by: John Mulligan <jmulligan@redhat.com>
Adam King [Mon, 6 Nov 2023 16:19:09 +0000 (11:19 -0500)]
qa/cephadm: adjust host drain test to handle explicit placement warning
Since we're adding a warning if any host is listed explicitly
in the placement of any service when removing the host,
we need to adjust the host drain test that removes a host
without the --force flag to not have the explicit hostname
in the placement for the mon service.
and then you run `ceph orch host drain host3`, cephadm will remove
the daemon from that host and the placement would now match nothing.
This issue should definitely be bypassable, as it generally isn't
serious, but it would be good to let users
know they have the host listed explicitly in placements like this
when they want to drain it.
Fixes: https://tracker.ceph.com/issues/63220
Signed-off-by: Adam King <adking@redhat.com>
Adam King [Tue, 7 Nov 2023 20:49:57 +0000 (15:49 -0500)]
mgr/cephadm: fix reweighting of OSD when OSD removal is stopped
Previously, when you ran "ceph orch osd rm stop <osd-id>"
cephadm would pass in a new OSD object to the removal
queue that would not have any of the fields set previously
for the OSD. This was mostly fine when removing it from
the queue as those fields were no longer needed, but an
exception was the initial weight, which you need if
you want to set the weight back when you stop removal.
This patch changes it so it will now remove the actual
OSD object the removal queue stores so that we will
get to use the previously set original weight. It also
changes when we grab the original weight to make it
happen earlier and adds it to the to_json so it survives
any potential mgr failovers.
Fixes: https://tracker.ceph.com/issues/63481
Signed-off-by: Adam King <adking@redhat.com>
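The idea behind this fix can be sketched as below. This is a simplified
model, not the cephadm implementation: QueuedOSD, RemovalQueue, and all
of their methods other than the to_json name are hypothetical. It shows
the three points from the message: capture the original weight early,
remove the stored object rather than a freshly built one, and include
the weight in the serialized form so it survives a reload.

```python
import json
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class QueuedOSD:
    """Hypothetical queue entry for an OSD scheduled for removal."""
    osd_id: int
    original_weight: Optional[float] = None

    def to_json(self) -> dict:
        # The weight is serialized so it is not lost across a failover.
        return {'osd_id': self.osd_id, 'original_weight': self.original_weight}

    @classmethod
    def from_json(cls, data: dict) -> "QueuedOSD":
        return cls(data['osd_id'], data.get('original_weight'))


class RemovalQueue:
    """Hypothetical removal queue keyed by OSD id."""

    def __init__(self) -> None:
        self._osds: Dict[int, QueuedOSD] = {}

    def enqueue(self, osd_id: int, current_weight: float) -> None:
        # Record the weight up front, before any reweighting happens.
        self._osds[osd_id] = QueuedOSD(osd_id, current_weight)

    def stop_removal(self, osd_id: int) -> Optional[float]:
        # Pop the stored entry (not a new, empty one) to recover the weight.
        osd = self._osds.pop(osd_id, None)
        return osd.original_weight if osd is not None else None

    def dump(self) -> str:
        return json.dumps([o.to_json() for o in self._osds.values()])

    @classmethod
    def load(cls, raw: str) -> "RemovalQueue":
        queue = cls()
        for item in json.loads(raw):
            osd = QueuedOSD.from_json(item)
            queue._osds[osd.osd_id] = osd
        return queue
```

A caller that stops a removal can then use the returned weight to restore
the OSD, even if the queue was rebuilt from its serialized form.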
John Mulligan [Tue, 7 Nov 2023 17:32:45 +0000 (12:32 -0500)]
cephadm: work around pip failure on some envs
Work around an encoding/locale issue when the dashboard tests are run
(ubuntu 20.04).
The build.py changes introduced in a9d1c62ca86 were validated for package
builds, teuthology, and other CI jobs but a different error was masking
this failure in the dashboard ci job.
Signed-off-by: John Mulligan <jmulligan@redhat.com>