mgr/cephadm: Adding support to store ceph conf per cluster fsid Fixes: https://tracker.ceph.com/issues/55185 Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 2ea76173a163a93bbfbf69d0faa732d46eaf05ba)
mgr/cephadm: do not add _admin label when no-minimize-config is provided Fixes: https://tracker.ceph.com/issues/52727 Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 01c8999d0354a71a7ef8526aab9b39e30d67c1bb)
Redouane Kachach [Wed, 23 Mar 2022 17:24:01 +0000 (18:24 +0100)]
mgr/cephadm: Adding image tag and date to cephadm startup messages Fixes: https://tracker.ceph.com/issues/55008 Fixes: https://tracker.ceph.com/issues/54373 Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 92ecb58d46b6f75265a664f3165f4b3a0dd4993a)
Adam King [Wed, 6 Apr 2022 14:32:22 +0000 (10:32 -0400)]
mgr/cephadm: allow setting insecure_skip_verify for alertmanager
Add a "secure" parameter to alertmanager spec that will cause it
to deploy alertmanagers with insecure_skip_verify as true or false
depending on the value given for "secure".
NOTE: alertmanager must still be reconfigured after applying a yaml
with this option changed.
Fixes: https://tracker.ceph.com/issues/55272 Fixes: https://tracker.ceph.com/issues/55333 Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit e583d4ef1ac23a7473d50d253e0edf70580542ae)
Moritz Röhrich [Mon, 21 Mar 2022 16:32:25 +0000 (17:32 +0100)]
cephadm: avoid crashing on expected non-zero exit
- Avoid crashing when a call out to an external program expectedly does
not return exit status zero.
There are programs that communicate other information than error/no
error through exit status. E.g. `systemctl status` will return different
exit codes depending on the actual status of the units in question.
In cases where this is expected crashing with a RuntimeError exception
is inappropriate and should be avoided.
Fixes: https://tracker.ceph.com/issues/55117 Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
(cherry picked from commit a02be6f22fa18094cd8758700ab74581b6ce1701)
Melissa Li [Wed, 23 Mar 2022 15:38:37 +0000 (11:38 -0400)]
cephadm: show error message if private registry credentials not provided
Raise UnauthorizedRegistryError in `_pull_image` if user tries to pull from a private registry without authentication, handle error in `command_bootstrap`, `commond_adopt`, `command_pull`
Fixes: https://tracker.ceph.com/issues/55015 Signed-off-by: Melissa Li <melissali@redhat.com>
(cherry picked from commit 4de0803ba893abf341ab634d1382208370de7c98)
mgr/cephadm: support non-root ssh-user w permissions
Restructured code, so that in case of non-root, the resulting file will
be created with permissions set to the ssh-user. This allows the
subsequent scp to be able to write the file. The remaining code kept the
same, so that file permissions are restored to the expected ones, but
just runs after the scp.
Fixes: https://tracker.ceph.com/issues/54620 Signed-off-by: Christoph Glaubitz <c.glaubitz@syseleven.de>
(cherry picked from commit 452e52a7e39409e3409d59940133333416b830bc)
ceph-volume/tests: reject loop devices in lvm.conf
The current task doesn't works (typo?).
Otherwise api/lvm.py can't work properly, functions such as
`get_single_lv()` and many other don't return the expected results.
Indeed, lvm is confused because of the nvme_loop setup.
This adds the support of complex OSD creation with command
`orch daemon add osd`.
Any argument supported by `DriveGroupSpec()` can be passed on the command line.
Cephadm shouldn't try to deploy a disk reported as unavailable by ceph-volume.
The idea here is to check the rejection reason so we can still use DB devices
in case of OSD replacement.
Redouane Kachach [Fri, 11 Mar 2022 11:41:18 +0000 (12:41 +0100)]
mgr/cephadm: Show warning when user provides --fsid option Fixes: https://tracker.ceph.com/issues/50804 Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 8780aa04651fa2cddeec1d9d2dfcf4e08412d4ce)
mgr/cephadm: checking service name before removal Fixes: https://tracker.ceph.com/issues/54503 Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit b26c114c8456941d6cccf7d4355445f21cb373a7)
Joseph Sawaya [Fri, 11 Mar 2022 20:45:16 +0000 (15:45 -0500)]
doc: Add note to osds_per_device description about dual-actuator devices
This commit adds information about using dual-actuator devices with the
osds_per_device drive group option, letting users know they can create
an OSD for each actuator by setting this value to 2 in the drive group
they're using to apply OSDs to the device.
Redouane Kachach [Mon, 14 Feb 2022 14:17:31 +0000 (15:17 +0100)]
mgr/cephadm: Show an error when invalid format Fixes: https://tracker.ceph.com/issues/54198 Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit d2fa22fd77d4a20f5db0d9315e5efebb016de481)
mgr/cephadm: Adding AGE field to device ls cmd Fixes: https://tracker.ceph.com/issues/53540 Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 1c5b3e86f9b8ae0ca3ae41798dfa18e9ffe9fcb7)
Redouane Kachach [Tue, 25 Jan 2022 17:27:56 +0000 (18:27 +0100)]
mgr/cephadm: Adding logic to cleanup several dirs after an rm-cluster Fixes: https://tracker.ceph.com/issues/53010
https://tracker.ceph.com/issues/53815 Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 0df6c04d8f99692a542b49c22bddbf12510801a5)
cmake/modules: use exact version of python3 when finding cython
* CMakeLists.txt:
always pass "EXACT" to find_package(Python3).
because per cmake document, "EXACT" only takes effect when
<Package>_FIND_VERSION_COUNT is greater than 1, where <Package>
is "Python3". see also cmake/modules/FindPython/Support.cmake
* cmake/modules/AddCephTest.cmake:
drop redundant find_package(Python3) calls. since Python3 is
a mandatory requirement for building Ceph, we only need a
single call of find_package(Python3..) in the top of the source
tree. the only possible case to repeat it is to ensure that we
have the correct version of Python3 used in following CMake
script. but there is no need to repeat it if we just want to
ensure that we have a python3 interpretor in place.
* cmake/modules/Distutils.cmake:
always pass "EXACT" to find_package(Python3).
we should always pass EXACT to find_package() when finding python3,
this is a follow-up of e2babdfae8c99f39f99a7c8a8f966299b2e62b19
This was attempted in commit 69a7ed4eab36 ("run-make-check: enable
WITH_RBD_RWL when WITH_PMEM is true") but never completed. We soon
bumped the requirement on libpmem, so WITH_SYSTEM_PMDK=ON wouldn't
have worked anyway.
Enable the RWL mode conditionally based on WITH_RBD_RWL variable.
Enable the SSD mode unconditionally as it has no special dependencies
and can be built on any architecture.
test/encoding/check-generated.sh: show diff if binary reencode check fails
Take bf0b161115aa ("test/encoding/check-generated.sh: show diff if cmp
fails") a bit further. Suggesting "cmp $tmp1 $tmp2" isn't very helpful
since cmp would report just the mismatch offset.
librbd/cache/pwl: WriteLogCacheEntry constructor must initialize flags
Initializing the individual bit field members leaves the remaining two
bits uninitialized and that garbage state gets persisted.
In general, using bit fields in a structure where the layout actually
matters is not desirable. Even with a few single bits, such as here,
their order, strictly speaking, is not guaranteed:
An implementation may allocate any addressable storage unit large
enough to hold a bit-field. If enough space remains, a bit-field
that immediately follows another bit-field in a structure shall be
packed into adjacent bits of the same unit. If insufficient space
remains, whether a bit-field that does not fit is put into the next
unit or overlaps adjacent units is implementation-defined. The
order of allocation of bit-fields within a unit (high-order to
low-order or low-order to high-order) is implementation-defined.
The alignment of the addressable storage unit is unspecified.
cmake/modules: always use the python3 specified in command line
if another python3 with higher version is found by
find_package(Python3), the cmake's install script would just
install the python modules/extensions into that python3's
dist-package directory, and the packaging script would fail
to find these artifacts when trying to package them.
so we need to ensure that the install directories for python
modeules/extensions are always "versioned" with WITH_PYTHON3
cmake option.
Jos Collin [Thu, 24 Mar 2022 04:57:58 +0000 (10:27 +0530)]
qa: make test_perf_stats_stale_metrics check only the clients created for the tests
Uses the client's global id to get the metrics, instead of using the index.
This ensures that test_perf_stats_stale_metrics checks only the clients mounted for
the tests.
Fixes: https://tracker.ceph.com/issues/54971 Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit 1621308214cd18750c8be803fc014bdf73e2218a)
Jos Collin [Tue, 29 Jun 2021 09:59:01 +0000 (15:29 +0530)]
mgr, pybind/mgr, mgr/stats: be resilient to offline rank0 MDS
Reregister the user queries during the rank0 MDS failover event
by calling listener.handle_query_updated(). This enables
`ceph fs perf stats` to receive the updated metrics after the
failover.
Fixes: https://tracker.ceph.com/issues/50033 Signed-off-by: Jos Collin <jcollin@redhat.com>
(cherry picked from commit c2470f271cce4d512f2cf00552c9b753e4c69f71)
1. Created subvolume
2. Written some I/O on the subvolume
3. Create snapshot of the subvolume
4. Create clone of the snapshot
5. Delete snapshot from back end (don't use subvolume interface) before
clone completes
6. Delete clone with force
7. Delete subvolume
8. Delete fs and associated pools
9. Created new fs
10 Created new subvolume,
11. Written some I/O on the subvolume
12. Create snapshot of the subvolume
13. Create clone of the snapshot <---------------THIS OPERATION HANGS -----------------
Root Cause:
Since the snapshot is deleted from the back end, the clone fails. But it
also fails to remove the clone index at '/volumes/_index/clone'. The
cloner thread goes to infinite loop of starting the clone and failing.
This involves taking 'self.async_job.lock()' and reads the clone index
to get the job and registers the above job.
While the 'cloner thread' is in above loop, the fs is destroyed. The
cloner threads which lives till the mgr/volumes is enabled in mgr, takes
the 'self.async_job.lock()' and hangs while reading the clone index.
Any further clone operations which also requires above lock hangs.
Fix:
Remove the clone index even though snapshot is not present.
cmake: resurrect mutex debugging in all Debug builds
Commit 403f1ec2888a ("cmake: make "WITH_CEPH_DEBUG_MUTEX" depend on
CMAKE_BUILD_TYPE") made WITH_CEPH_DEBUG_MUTEX depend on build type
being set to Debug, in CMakeLists.txt. However, if CMAKE_BUILD_TYPE
isn't specified by the user, we may still set it to Debug later, in
src/CMakeLists.txt, and in that case WITH_CEPH_DEBUG_MUTEX doesn't
get enabled. The result is that
$ do_cmake.sh -DCMAKE_BUILD_TYPE=Debug ...
debug builds have mutex debugging enabled, while
$ do_cmake.sh ...
builds, which are supposed to be the same, don't. Jenkins builders
don't pass -DCMAKE_BUILD_TYPE=Debug so that commit effectively turned
off all ceph_mutex_is_locked* asserts in "make check".
test/rbd_mirror: grab timer lock before calling add_event_after()
add_event_after() expects an externally provided mutex to be held
for the call. This was missed in commit 8965a0f2a6f7 ("rbd-mirror:
synchronize with in-flight stop in ImageReplayer::stop()").