Adam King [Fri, 1 Apr 2022 12:20:28 +0000 (08:20 -0400)]
mgr/cephadm: make UpgradeState from_json a bit safer
This way, downgrades to any version from the one this lands
in onward should not break just because new parameters were
added to UpgradeState. Nothing can be done about downgrades
from this version to older ones, but this should help in
the future.
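A minimal sketch of the idea, assuming the safer from_json simply ignores keys that the running version's constructor does not accept. The field names below are illustrative and are not taken from the real UpgradeState class:

```python
import inspect
from typing import Any, Dict, Optional


class UpgradeState:
    # Field names are illustrative assumptions, not the real class layout.
    def __init__(self, target_name: str, progress_id: str,
                 error: Optional[str] = None) -> None:
        self.target_name = target_name
        self.progress_id = progress_id
        self.error = error

    @classmethod
    def from_json(cls, data: Dict[str, Any]) -> Optional["UpgradeState"]:
        if not data:
            return None
        # Keep only the keys this version's constructor understands, so
        # state written by a newer release does not raise a TypeError.
        known = set(inspect.signature(cls.__init__).parameters) - {"self"}
        return cls(**{k: v for k, v in data.items() if k in known})
```

With this shape, a stored state that carries a parameter added in a later release still loads after a downgrade; the unknown key is dropped.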
Adam King [Mon, 28 Mar 2022 16:10:15 +0000 (12:10 -0400)]
mgr/cephadm: split _do_upgrade into sub functions
This function was around 500 lines and difficult to work
with. Splitting it into sub functions should hopefully make
it a bit easier to understand and make changes to.
Zac Dover [Wed, 18 May 2022 10:36:53 +0000 (20:36 +1000)]
doc/start: s/3/three/ in intro.rst
I'm changing "3" to "three" for two reasons:
1. It's correct.
2. This allows me to test backports into Octopus, Pacific, and Quincy.
I am particularly interested to see what happens when I attempt
the backport into Octopus, because backports into Octopus have
failed. This will provide me with another unit of data.
Adam King [Thu, 18 Nov 2021 20:22:39 +0000 (15:22 -0500)]
mgr/cephadm: re-use old ip when re-adding hosts if necessary
When a host is re-added without an explicit ip we can default to the old
ip we had stored for the host rather than either keeping the loopback
address or throwing an exception. We only want to actually error when
the only options left are error or use a resolved loopback address
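The fallback order described above can be sketched roughly as follows. The function name and its parameters are hypothetical and only illustrate the decision, not the actual cephadm code path:

```python
import ipaddress
from typing import Optional


def pick_host_addr(explicit: Optional[str], resolved: str,
                   stored: Optional[str]) -> str:
    """Choose the address to record for a host that is being re-added."""
    if explicit:
        return explicit
    if not ipaddress.ip_address(resolved).is_loopback:
        return resolved
    if stored:
        # Resolution only produced a loopback address: fall back to the
        # address previously stored for this host.
        return stored
    # The only remaining choices are a loopback address or an error.
    raise ValueError(f"cannot determine a usable address (resolved {resolved})")
```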
Redouane Kachach [Tue, 17 May 2022 15:26:39 +0000 (17:26 +0200)]
mgr/cephadm: stripping out / from the end of the url
Fixes: https://tracker.ceph.com/issues/55638
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 17032f6be22e9efc3e199d7e35091025bfaae965)
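As a small illustration of the URL fix above, assuming the change amounts to dropping trailing slashes before the URL is reused (normalize_url is a hypothetical helper name, not a cephadm function):

```python
def normalize_url(url: str) -> str:
    """Drop trailing slashes so later path joins do not produce '//'."""
    return url.rstrip("/")


def join_path(base_url: str, path: str) -> str:
    """Join a base URL and a relative path with exactly one slash."""
    return f"{normalize_url(base_url)}/{path.lstrip('/')}"
```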
mgr/cephadm: do not add _admin label when no-minimize-config is provided
Fixes: https://tracker.ceph.com/issues/52727
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 01c8999d0354a71a7ef8526aab9b39e30d67c1bb)
Moritz Röhrich [Mon, 21 Mar 2022 16:32:25 +0000 (17:32 +0100)]
cephadm: avoid crashing on expected non-zero exit
- Avoid crashing when a call out to an external program expectedly does
not return exit status zero.
There are programs that communicate other information than error/no
error through exit status. E.g. `systemctl status` will return different
exit codes depending on the actual status of the units in question.
In cases where this is expected, crashing with a RuntimeError exception
is inappropriate and should be avoided.
Fixes: https://tracker.ceph.com/issues/55117
Signed-off-by: Moritz Röhrich <moritz.rohrich@suse.com>
(cherry picked from commit a02be6f22fa18094cd8758700ab74581b6ce1701)
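One way to picture the behaviour described above is a wrapper that lets the caller declare which exit codes are expected. This is a sketch under that assumption; the `call` signature and the `ok_codes` parameter are hypothetical and do not mirror cephadm's own helper:

```python
import subprocess
import sys
from typing import Iterable, List, Tuple


def call(cmd: List[str], ok_codes: Iterable[int] = (0,)) -> Tuple[str, str, int]:
    """Run cmd; raise only if the exit status is not one the caller expects."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode not in set(ok_codes):
        raise RuntimeError(f"{cmd[0]} failed with exit status {proc.returncode}")
    return proc.stdout, proc.stderr, proc.returncode


# Demo: a child Python process that deliberately exits with status 3.
_, _, status = call([sys.executable, "-c", "raise SystemExit(3)"], ok_codes=(0, 3))
print(status)
```

A caller querying unit state could pass the set of codes it knows how to interpret and then branch on the returned status instead of catching an exception.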
dparmar18 [Fri, 25 Mar 2022 08:18:54 +0000 (13:48 +0530)]
doc/cephfs/add-remove-mds: added cephadm note, refined "Adding an MDS"
Description: 1) Add a note about using cephadm for setting up the
cluster and mds(s), and mention the use of the ceph
orchestrator if one needs to set up mds(s) manually.
2) Change the term `data point` to `directory` in
point 1 under the "Adding an MDS" section for better
clarity.
Cory Snyder [Tue, 17 May 2022 09:24:53 +0000 (05:24 -0400)]
mgr/ActivePyModules.cc: fix cases where GIL is held while attempting to lock mutex
The mgr process can deadlock if the GIL is held while attempting to lock a mutex.
Relevant regressions were introduced in commit a356bac. This fixes those regressions
and also cleans up some unnecessary yielding of the GIL.
Adam Kupczyk [Fri, 29 Apr 2022 21:32:43 +0000 (23:32 +0200)]
kv/RocksDBStore: Remove feature to make WholeSpaceIterator based on bounded iterator
The iterator-bounding feature was introduced to limit RocksDB iterators, so that
they are less likely to traverse tombstones.
It is used when listing keys in a fixed range, for example the OMAP keys of a specific object.
Extending this logic to WholeSpaceIterator is problematic,
since the prefix must be taken into account.
Fixes: https://tracker.ceph.com/issues/55444
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Adds a precondition to RocksDBStore::get_cf_handle(string, IteratorBounds)
to avoid duplicating logic of the only caller (RocksDBStore::get_iterator).
Assertions will fail if preconditions are not met.
bluestore: add config option to allow rocksdb iterator bounds to be disabled
Add osd_rocksdb_iterator_bounds_enabled config option to allow rocksdb iterator bounds to be disabled.
Also includes minor refactoring to shorten code associated with IteratorBounds initialization in bluestore.
Signed-off-by: Cory Snyder <csnyder@iland.com>
(cherry picked from commit ca3ccd9)
Conflicts:
src/common/options/osd.yaml.in
Cherry-pick notes:
- Conflicts due to option definition in common/options.cc in Pacific vs. common/options/osd.yaml.in in later releases
bluestore: set upper and lower bounds on rocksdb omap iterators
Limits RocksDB omap Seek operations to the relevant key range of the object's omap.
This prevents RocksDB from unnecessarily iterating over delete range tombstones in
irrelevant omap CF shards. Avoids extreme performance degradation commonly caused
by tombstones generated from RGW bucket resharding cleanup. Also prefer CFIteratorImpl
over ShardMergeIteratorImpl when we can determine that all keys within specified
IteratorBounds must be in a single CF.
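To make the effect of bounds concrete, here is a toy model in Python rather than the RocksDB or BlueStore API. It assumes a sorted list of keys and treats the bounds as a half-open range [lower, upper); the key naming is invented for the example:

```python
from bisect import bisect_left
from typing import Iterator, List, Optional


def iterate(keys: List[str], lower: Optional[str] = None,
            upper: Optional[str] = None) -> Iterator[str]:
    """Yield keys in [lower, upper) from a sorted key list."""
    # Jump straight to the first key >= lower instead of scanning from the start.
    start = bisect_left(keys, lower) if lower is not None else 0
    # Stop before the first key >= upper; nothing past it is ever visited.
    end = bisect_left(keys, upper) if upper is not None else len(keys)
    yield from keys[start:end]


keys = ["obj1.a", "obj1.b", "obj2.a", "obj3.a"]
print(list(iterate(keys, lower="obj2.", upper="obj2/")))
```

Entries outside the range, which in the real store may be long runs of tombstones, are never touched by the bounded scan.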
Nizamudeen A [Mon, 7 Feb 2022 10:53:29 +0000 (16:23 +0530)]
cephadm: change shared_folder directory for prometheus and grafana
After https://github.com/ceph/ceph/pull/44059 the monitoring/prometheus
and monitoring/grafana/dashboards directories are changed to
monitoring/ceph-mixins. That broke the shared_folders in the cephadm
bootstrap script.
Changed all the instances of monitoring/prometheus and
monitoring/grafana/dashboards to monitoring/ceph-mixins
Also, renaming all the instances of prometheus_alerts.yaml to
prometheus_alerts.yml.
Build jsonnet and jb in the test so that we can build ceph without
internet access and still be able to run the test needed for monitoring
using jsonnet tools.
spec: debian: monitoring: build jsonnet from source to use 0.18.0
As this new version was released recently, it is not yet in every distro
we use. We now build jsonnet from source so that we can use this new
version of jsonnet. This commit could be reverted later on, once the new
version is available everywhere.
mgr/dashboard: monitoring: refactor into ceph-mixin
Mixin is a way to bundle dashboards, prometheus rules and alerts into
a jsonnet package. Shifting to mixin will allow easier integration with
monitoring automation that some users may use.
This commit moves `/monitoring/grafana/dashboards` and
`/monitoring/prometheus` to `/monitoring/ceph-mixin`. Prometheus alerts
were also converted to Jsonnet in an automated way (from yaml to json
to jsonnet). This commit minimises any change made to the generated files
and should change neither the dashboards nor the Prometheus alerts.
In the future some configuration will also be added to jsonnet to add
more functionality to the dashboards or alerts (i.e.: multi cluster).
Fixes: https://tracker.ceph.com/issues/53374
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
(cherry picked from commit 98236e3a1d2855c95d86640645c2984efa83791f)
cmake/modules: use exact version of python3 when finding cython
* CMakeLists.txt:
always pass "EXACT" to find_package(Python3).
because per cmake document, "EXACT" only takes effect when
<Package>_FIND_VERSION_COUNT is greater than 1, where <Package>
is "Python3". see also cmake/modules/FindPython/Support.cmake
* cmake/modules/AddCephTest.cmake:
drop redundant find_package(Python3) calls. since Python3 is
a mandatory requirement for building Ceph, we only need a
single call of find_package(Python3..) in the top of the source
tree. the only possible case to repeat it is to ensure that we
have the correct version of Python3 used in following CMake
script. but there is no need to repeat it if we just want to
ensure that we have a python3 interpreter in place.
* cmake/modules/Distutils.cmake:
always pass "EXACT" to find_package(Python3).
we should always pass EXACT to find_package() when finding python3,
this is a follow-up of e2babdfae8c99f39f99a7c8a8f966299b2e62b19
cmake/modules: always use the python3 specified in command line
if another python3 with higher version is found by
find_package(Python3), the cmake's install script would just
install the python modules/extensions into that python3's
dist-package directory, and the packaging script would fail
to find these artifacts when trying to package them.
so we need to ensure that the install directories for python
modules/extensions are always "versioned" with the WITH_PYTHON3
cmake option.
Volker Theile [Wed, 30 Mar 2022 11:38:33 +0000 (13:38 +0200)]
mgr/dashboard: Improve error message of '/api/grafana/validation' API endpoint
If validation of the Grafana URL fails, e.g. because of an invalid SSL certificate, an unhelpful default error message is displayed in the UI.
This PR re-raises the exception as a DashboardException that includes a detailed description of what happened. This makes it much easier to identify SSL certificate issues, for example.
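The re-raise pattern described above might look roughly like this. The DashboardException class here is a stand-in with assumed fields, and validate_grafana_url with its `fetch` callable is a hypothetical shape, not the dashboard's actual controller code:

```python
from typing import Callable


class DashboardException(Exception):
    """Stand-in for the dashboard's exception type (fields are assumed)."""

    def __init__(self, msg: str, component: str) -> None:
        super().__init__(msg)
        self.component = component


def validate_grafana_url(fetch: Callable[[str], int], url: str) -> int:
    """Call fetch(url) and surface any failure with its original detail."""
    try:
        return fetch(url)
    except Exception as e:
        # Keep the underlying reason (e.g. a certificate error) in the message
        # instead of letting a generic default error reach the UI.
        raise DashboardException(f"Connection to {url} failed: {e}",
                                 "grafana") from e
```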
qa: adjust for old snapshot counts during comparison
This is a Pacific-only commit since in master, the snap-schedule module
uses vfs-ceph backed libcephsqlite, which seems to preserve the
snapshot stats (created_count, etc.) on ceph-mgr restarts. Pacific
uses an in-memory db (serialized to a RADOS object), which seems to
reset these stats when ceph-mgr is restarted.
Also, remove the `db_count` assert check as it doesn't make sense.
Adam King [Wed, 6 Apr 2022 14:32:22 +0000 (10:32 -0400)]
mgr/cephadm: allow setting insecure_skip_verify for alertmanager
Add a "secure" parameter to alertmanager spec that will cause it
to deploy alertmanagers with insecure_skip_verify as true or false
depending on the value given for "secure".
NOTE: alertmanager must still be reconfigured after applying a yaml
with this option changed.
Fixes: https://tracker.ceph.com/issues/55272
Fixes: https://tracker.ceph.com/issues/55333
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit e583d4ef1ac23a7473d50d253e0edf70580542ae)
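A sketch of how the "secure" spec parameter could translate into the generated setting. It assumes that "secure" is the logical inverse of insecure_skip_verify and that it defaults to False; the function name and the dictionary layout are illustrative only:

```python
from typing import Any, Dict


def alertmanager_tls_config(secure: bool = False) -> Dict[str, Any]:
    """Build the TLS fragment for a generated alertmanager config."""
    # Assumption: secure=True means certificates are verified, so
    # insecure_skip_verify is the negation of "secure".
    return {"tls_config": {"insecure_skip_verify": not secure}}
```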
windgmbh [Fri, 12 Nov 2021 15:51:03 +0000 (16:51 +0100)]
Apply sysctl.d migration from /usr/lib to /etc
A fix regarding the SYSCTL_DIR location (#53130) requires migrating
sysctl.d/*.conf files from /usr/lib to /etc.
Signed-off-by: Lukas Mayer <lmayer@wind.gmbh>
(cherry picked from commit a167a27f30536958e0f2c513d351642e81ba06d5)
windgmbh [Wed, 3 Nov 2021 17:16:53 +0000 (18:16 +0100)]
Fix sysctl.d location FHS compliance
This fixes #53130
Containers should not write to '/usr/lib'.
That location could be read-only or overwritten.
Signed-off-by: Lukas Mayer <lmayer@wind.gmbh>
(cherry picked from commit 77afa812ea8b7e1e802246e4aa3a31e7b644a502)
Melissa Li [Wed, 23 Mar 2022 15:38:37 +0000 (11:38 -0400)]
cephadm: show error message if private registry credentials not provided
Raise UnauthorizedRegistryError in `_pull_image` if the user tries to pull from a private registry without authentication; handle the error in `command_bootstrap`, `command_adopt`, and `command_pull`.
Fixes: https://tracker.ceph.com/issues/55015
Signed-off-by: Melissa Li <melissali@redhat.com>
(cherry picked from commit 4de0803ba893abf341ab634d1382208370de7c98)
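The raise-and-handle flow above can be sketched as follows. How the real code detects a missing login is not stated in the commit message, so matching "unauthorized" in stderr is an assumption, as are the `run_pull` callable and the simplified function signatures:

```python
from typing import Callable, Tuple


class UnauthorizedRegistryError(Exception):
    """Raised when a pull fails because registry credentials are missing."""


def pull_image(run_pull: Callable[[str], Tuple[str, int]], image: str) -> None:
    """run_pull(image) is assumed to return (stderr, exit_code)."""
    err, code = run_pull(image)
    if code == 0:
        return
    # Assumption: the container engine reports "unauthorized" on stderr.
    if "unauthorized" in err.lower():
        raise UnauthorizedRegistryError(image)
    raise RuntimeError(f"failed to pull {image}: {err}")


def command_pull(run_pull: Callable[[str], Tuple[str, int]], image: str) -> int:
    try:
        pull_image(run_pull, image)
    except UnauthorizedRegistryError:
        # Show a targeted message instead of a traceback.
        print(f"Error: registry credentials are required to pull {image}")
        return 1
    return 0
```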
Adam King [Thu, 24 Mar 2022 13:59:10 +0000 (09:59 -0400)]
cephadm: pass "--security-opt label=disable" to node-exporter container
in order to support setting '--path.procfs=/host/proc','--path.sysfs=/host/sys',
'--path.rootfs=/rootfs' for node-exporter we need to disable selinux separation
between the node-exporter container and the host to avoid selinux denials
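For illustration, assuming the extra option is appended only for the node-exporter daemon type when the container command line is assembled (container_args is a hypothetical helper, not the cephadm function that builds the command):

```python
from typing import List


def container_args(daemon_type: str) -> List[str]:
    """Return extra container-engine arguments for a daemon type."""
    args: List[str] = []
    if daemon_type == "node-exporter":
        # Disable SELinux label separation so the container can read the
        # host paths passed via --path.procfs, --path.sysfs and --path.rootfs.
        args.extend(["--security-opt", "label=disable"])
    return args
```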