bluestore: set upper and lower bounds on rocksdb omap iterators
Limits RocksDB omap Seek operations to the relevant key range of the object's omap.
This prevents RocksDB from unnecessarily iterating over delete range tombstones in
irrelevant omap CF shards. Avoids extreme performance degradation commonly caused
by tombstones generated from RGW bucket resharding cleanup. Also prefer CFIteratorImpl
over ShardMergeIteratorImpl when we can determine that all keys within specified
IteratorBounds must be in a single CF.
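As a rough illustration of why bounds help, here is a toy model in Python. It is not BlueStore or RocksDB code: the sorted key list, the `bounded_scan` helper and the key names are all assumptions made for this sketch. The point is only that a scan restricted to [lower, upper) never visits entries outside the object's own key range.

```python
import bisect


def bounded_scan(sorted_keys, lower, upper):
    """Return the keys k with lower <= k < upper from a sorted list."""
    start = bisect.bisect_left(sorted_keys, lower)
    end = bisect.bisect_left(sorted_keys, upper)
    # Only the slice inside the bounds is touched; everything else,
    # including unrelated tombstone-like entries, is skipped.
    return sorted_keys[start:end]


# Hypothetical omap-like keys: "<object>.<omap key>".
keys = sorted([
    "obj1.a", "obj1.z",
    "obj2.a", "obj2.b",
    "obj3.a", "obj3.tombstone",
])

# '/' sorts immediately after '.', so "obj2/" is an exclusive upper bound
# for every key that starts with "obj2.".
obj2_keys = bounded_scan(keys, "obj2.", "obj2/")
print(obj2_keys)
```

In RocksDB itself the equivalent knobs are the `iterate_lower_bound` and `iterate_upper_bound` fields of `ReadOptions`; how BlueStore derives the bounds for an object is not shown here.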
qa: adjust for old snapshot counts during comparison
This is a pacific-only commit: in master, the snap-schedule module
uses vfs-ceph backed libcephsqlite, which seems to preserve the
snapshot stats (created_count, etc.) across ceph-mgr restarts. Pacific
uses an in-memory db (serialized to a RADOS object), which seems to
reset these stats when ceph-mgr is restarted.
Also, remove the `db_count` assert check as it doesn't make sense.
Adam King [Wed, 6 Apr 2022 14:32:22 +0000 (10:32 -0400)]
mgr/cephadm: allow setting insecure_skip_verify for alertmanager
Add a "secure" parameter to alertmanager spec that will cause it
to deploy alertmanagers with insecure_skip_verify as true or false
depending on the value given for "secure".
NOTE: alertmanager must still be reconfigured after applying a yaml
with this option changed.
Fixes: https://tracker.ceph.com/issues/55272
Fixes: https://tracker.ceph.com/issues/55333
Signed-off-by: Adam King <adking@redhat.com>
(cherry picked from commit e583d4ef1ac23a7473d50d253e0edf70580542ae)
windgmbh [Fri, 12 Nov 2021 15:51:03 +0000 (16:51 +0100)]
Apply sysctl.d migration from /usr/lib to /etc
A fix regarding the SYSCTL_DIR location (#53130) requires migrating
sysctl.d/*.conf files from /usr/lib to /etc.
Signed-off-by: Lukas Mayer <lmayer@wind.gmbh>
(cherry picked from commit a167a27f30536958e0f2c513d351642e81ba06d5)
windgmbh [Wed, 3 Nov 2021 17:16:53 +0000 (18:16 +0100)]
Fix sysctl.d location FHS compliance
This fixes #53130
Containers should not write to '/usr/lib'.
That location could be read-only or overwritten.
Signed-off-by: Lukas Mayer <lmayer@wind.gmbh>
(cherry picked from commit 77afa812ea8b7e1e802246e4aa3a31e7b644a502)
Melissa Li [Wed, 23 Mar 2022 15:38:37 +0000 (11:38 -0400)]
cephadm: show error message if private registry credentials not provided
Raise UnauthorizedRegistryError in `_pull_image` if the user tries to pull
from a private registry without authentication, and handle the error in
`command_bootstrap`, `command_adopt` and `command_pull`.
Fixes: https://tracker.ceph.com/issues/55015
Signed-off-by: Melissa Li <melissali@redhat.com>
(cherry picked from commit 4de0803ba893abf341ab634d1382208370de7c98)
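The raise-then-handle pattern described above could look roughly like the sketch below. This is not the cephadm implementation: `pull_image`, the `run_pull` callable and the "unauthorized" string check are hypothetical stand-ins chosen for illustration. Only the exception name and the idea of handling it in the command functions come from the commit message.

```python
from typing import Callable, Tuple


class UnauthorizedRegistryError(Exception):
    """Raised when a registry rejects an unauthenticated pull."""


def pull_image(image: str, run_pull: Callable[[str], Tuple[int, str]]) -> None:
    # run_pull is a hypothetical stand-in for invoking the container engine;
    # it returns (exit_code, stderr_text).
    code, err = run_pull(image)
    if code == 0:
        return
    # Assumed heuristic: treat "unauthorized" in stderr as missing credentials.
    if "unauthorized" in err.lower():
        raise UnauthorizedRegistryError(
            f"pulling {image} requires registry credentials")
    raise RuntimeError(f"failed to pull {image}: {err}")


def command_pull(image: str, run_pull: Callable[[str], Tuple[int, str]]) -> int:
    # Handle the specific error and turn it into a readable message
    # instead of an unhandled traceback.
    try:
        pull_image(image, run_pull)
    except UnauthorizedRegistryError as e:
        print(f"Error: {e}")
        return 1
    return 0
```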
Adam King [Thu, 24 Mar 2022 13:59:10 +0000 (09:59 -0400)]
cephadm: pass "--security-opt label=disable" to node-exporter container
In order to support setting '--path.procfs=/host/proc', '--path.sysfs=/host/sys' and
'--path.rootfs=/rootfs' for node-exporter, we need to disable SELinux separation
between the node-exporter container and the host to avoid SELinux denials.
mgr/cephadm: Adding AGE field to device ls cmd
Fixes: https://tracker.ceph.com/issues/53540
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit 1c5b3e86f9b8ae0ca3ae41798dfa18e9ffe9fcb7)
Milind Changire [Wed, 24 Nov 2021 08:06:30 +0000 (13:36 +0530)]
qa: add test for concurrent snap creates
Test if the number of snaps on the file-system and the stats on created
snaps in the DB match.
NOTE:
Since it is difficult to get the snapshot created on the exact second,
the timestamp comparison has been limited up to the last 'minute' as the
comparison granularity.
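A minimal sketch of comparing timestamps at minute granularity, using only the Python standard library. The `same_minute` helper is a name made up for this illustration and is not taken from the test itself.

```python
from datetime import datetime


def same_minute(a: datetime, b: datetime) -> bool:
    """Compare two timestamps after dropping seconds and microseconds."""
    return (a.replace(second=0, microsecond=0)
            == b.replace(second=0, microsecond=0))


scheduled = datetime(2021, 11, 24, 8, 6, 0)
created = datetime(2021, 11, 24, 8, 6, 42)   # created 42 seconds later
late = datetime(2021, 11, 24, 8, 7, 1)       # slipped into the next minute

print(same_minute(scheduled, created))
print(same_minute(scheduled, late))
```

Note that truncation still treats two timestamps on either side of a minute boundary as different, even if they are only a second apart.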
Conflicts:
src/pybind/mgr/snap_schedule/fs/schedule.py
src/pybind/mgr/snap_schedule/fs/schedule_client.py
- changes related to DBConnectionManager to serialize
db interactions
test/rbd_mirror: grab timer lock before calling add_event_after()
add_event_after() expects an externally provided mutex to be held
for the call. This was missed in commit 8965a0f2a6f7 ("rbd-mirror:
synchronize with in-flight stop in ImageReplayer::stop()").
librbd/cache/pwl: remove RBD_FEATURE_DIRTY_CACHE check in DiscardRequest
"m_image_ctx.features && RBD_FEATURE_DIRTY_CACHE" is obviously wrong
because it would pretty much always be true. However, even if bitwise
AND was used, this check would still be dead because DiscardRequest is
only invoked if RBD_FEATURE_DIRTY_CACHE is enabled:
  int invalidate_cache(ImageCtx *ictx)
  {
    ...
    // Delete writeback cache if it is not initialized
    if ((!ictx->exclusive_lock ||
         !ictx->exclusive_lock->is_lock_owner()) &&
        ictx->test_features(RBD_FEATURE_DIRTY_CACHE)) {
      C_SaferCond ctx3;
      ictx->plugin_registry->discard(&ctx3);
      r = ctx3.wait();
    }
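The difference between the logical and the bitwise check can be shown with a small Python analogue, where `and` plays the role of C++ `&&`. The bit position used for RBD_FEATURE_DIRTY_CACHE and the sample feature mask are assumptions made for illustration only.

```python
# Assumed bit position, for illustration only.
RBD_FEATURE_DIRTY_CACHE = 1 << 11

# A feature mask with some other bits set, but not the dirty cache bit.
features = 0b101

# Logical AND: true whenever both operands are non-zero, so the check
# passes for almost any image regardless of the dirty cache bit.
logical_check = bool(features and RBD_FEATURE_DIRTY_CACHE)

# Bitwise AND: true only if the dirty cache bit is actually set.
bitwise_check = bool(features & RBD_FEATURE_DIRTY_CACHE)

print(logical_check, bitwise_check)
```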
librbd/cache/pwl: don't crash if cache file removal fails
The non-ec overload will throw fs::filesystem_error on any error
(e.g. EPERM due to unprivileged "rbd persistent-cache invalidate"
being brought up against a privileged workload).
Yin Congmin [Wed, 22 Dec 2021 07:07:11 +0000 (15:07 +0800)]
librbd/cache/pwl: rename persistent cache key
librbd "internal" metadata keys were changed to use the ".rbd" prefix. Change
the persistent cache key to ".rbd" too.
The persistent cache key is currently named IMAGE_CACHE_STATE. Since
this key is planned to be used outside the pwl directory, it seems
more appropriate to give it the clearer name PERSISTENT_CACHE_STATE.
librbd/cache/pwl: avoid inconsistencies in ImageCacheState
When empty and/or clean bools are updated in I/O handling code paths,
ImageCacheState becomes inconsistent for a short while: e.g. with clean
transitioned to true, dirty_bytes counter could still be positive
because the counters are updated only in periodic_stats(). Move to
updating the counters in update_image_cache_state(Context*) to avoid
this.
update_image_cache_state(Context*) now requires m_lock -- most call
sites already hold it anyway. The only problematic call site was
AbstractWriteLog::shut_down() callback chain: perf_stop() needed to
be moved to the very end since perf counters must be alive now for
update_image_cache_state() to work.
Don't override expect_op_work_queue() in unit tests: completing
context in the same thread now results in a deadlock on m_lock in
all test cases that call AbstractWriteLog::init().
get_json_format() and create_image_cache_state() attempt to get
particular keys which could result in an unhandled std::runtime_error
exception. Conversely, ImageCacheState constructor just swallows that
exception which could leave the newly constructed object incorrectly
initialized. Avoid doing parsing in the constructor and introduce
init_from_config() and init_from_metadata() methods instead.
While at it, move everything out from under "persistent_cache" key.
Also fix init_state_json_write test case which stopped working now
that types are enforced by json_spirit.
Yin Congmin [Tue, 29 Mar 2022 08:59:05 +0000 (16:59 +0800)]
librbd/cache/pwl: add basic metrics to ImageCacheState
Add basic metrics to ImageCacheState and persist them, including
allocated_bytes, cached_bytes, dirty_bytes, free_bytes and hit/miss
info.
Leverage periodic_stats() timer to call update_image_cache_state.
In order to avoid outputting too much debug information, the original
statistics output log level is changed to 5.
Switch to json_spirit for encoding because encode_json encodes bool as
"true"/"false" string.
Remove rbd_persistent_cache_log_periodic_stats option because we need
to always update cache state.
[ idryomov: add cached_bytes and hits_partial; report misses and
miss_bytes instead of respective totals; naming ]
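To illustrate why encoding a bool as a "true"/"false" string is a problem, here is a small sketch using Python's standard `json` module. It stands in for neither encode_json nor json_spirit; the field names are borrowed from the message above and the values are made up.

```python
import json

# A real JSON boolean versus the string form.
typed = json.dumps({"clean": True, "dirty_bytes": 0})
stringly = json.dumps({"clean": "false", "dirty_bytes": 0})

decoded_typed = json.loads(typed)
decoded_stringly = json.loads(stringly)

# The typed value round-trips as a bool.
print(type(decoded_typed["clean"]).__name__)

# The string "false" is a non-empty string, so a naive truthiness check
# on the decoded value reports the opposite of what was meant.
print(bool(decoded_stringly["clean"]))
```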
Ernesto Puerta [Fri, 25 Mar 2022 15:26:48 +0000 (16:26 +0100)]
mgr/dashboard: fix api test issue with pip
Fix
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
apache-libcloud 3.5.0 requires requests>=2.26.0, but you have requests 2.25.1 which is incompatible.
Successfully installed CherryPy-13.1.0 PyJWT-2.0.1 Routes-2.4.1 bcrypt-3.1.4 ceph-1.0.0 chardet-4.0.0 cheroot-8.6.0 idna-2.10 jaraco.functools-3.5.0 more-itertools-4.1.0 natsort-8.1.0 portend-3.1.0 pyopenssl-22.0.0 pytz-2022.1 repoze.lru-0.7 requests-2.25.1 tempora-5.0.1
```
Kefu Chai [Sat, 5 Mar 2022 17:44:30 +0000 (01:44 +0800)]
admin/doc-requirements: bump sphinx to 4.4.0
bump sphinx to latest stable, to address the following build failure:
ERROR: sphinx-autodoc-typehints 1.17.0 has requirement Sphinx>=4, but you'll have sphinx 3.5.4 which is incompatible.
ERROR: sphinx-substitution-extensions 2022.2.16 has requirement sphinx>=4.0.0, but you'll have sphinx 3.5.4 which is incompatible.
also bump sphinx-rtd-theme, otherwise we'd have the following
build failure:
ERROR: sphinx-rtd-theme 0.5.2 has requirement docutils<0.17, but you'll have docutils 0.17.1 which is incompatible.
Kefu Chai [Sun, 6 Mar 2022 06:23:42 +0000 (14:23 +0800)]
mgr/cephadm: add empty line after param list in docstring
this helps to silence the warning from sphinx, like
src/pybind/mgr/orchestrator/_interface.py:docstring of orchestrator._interface.Orchestrator.remove_osds:9: WARNING: Field list ends without a blank line; unexpected unindent.
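A sketch of the docstring shape this refers to: the field list (the `:param` lines) is followed by a blank line before any further text. The function below is a made-up stub, not the real `remove_osds`, and the small check at the end only confirms the blank line is present.

```python
import inspect


def remove_things(ids, force=False):
    """Remove the given things (illustrative stub).

    :param ids: list of identifiers to remove
    :param force: skip safety checks

    Without the blank line above, docutils treats this paragraph as an
    unexpected unindent at the end of the field list.
    """
    return list(ids)


doc_lines = inspect.cleandoc(remove_things.__doc__).splitlines()
last_param = max(i for i, line in enumerate(doc_lines)
                 if line.startswith(":param"))
# The line right after the last field must be empty.
print(repr(doc_lines[last_param + 1]))
```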
Kefu Chai [Sun, 6 Mar 2022 06:27:50 +0000 (14:27 +0800)]
doc/conf.py: silence warnings from breathe
breathe calls doxygen for extracting/generating docs from code,
and doxygen complains when it sees undocumented fields/functions. these
warnings could fail the sphinx-build command if it treats warnings
as errors.
Nizamudeen A [Wed, 6 Apr 2022 07:39:26 +0000 (13:09 +0530)]
build: install-deps failing in docker build
install-deps.sh was failing in our docker build due to the recent change in
the script. Failure can be seen here: https://github.com/rhcs-dashboard/ceph-dev/runs/5844502455?check_suite_focus=true#step:3:2586
This seems to fix the issue.
Signed-off-by: Nizamudeen A <nia@redhat.com>
(cherry picked from commit 72841fdcbe5445b5f5ada5d244d497f0b3f04e4f)
Signed-off-by: Ernesto Puerta <epuertat@redhat.com>
Zac Dover [Tue, 8 Jun 2021 15:57:13 +0000 (01:57 +1000)]
doc/dev: s/reposotory/repository/ (really)
This corrects the heinous misspelling described in the
substitution expression in the title. This misspelling is
all the more egregious because it appears in a title, and
therefore would be used to create links if it had not been
caught.
Adam King [Fri, 4 Mar 2022 02:47:47 +0000 (21:47 -0500)]
mgr/cephadm: offline host watcher
To be able to detect more quickly when certain hosts go
offline. Could be useful for the NFS
HA feature, as this requires moving nfs daemons off
offline hosts within 90 seconds.
Adam King [Tue, 22 Mar 2022 22:57:21 +0000 (18:57 -0400)]
mgr/cephadm: Reschedule nfs daemons from offline hosts
In order to improve nfs availability, if there are other
hosts we can place an nfs daemon on, or if there is a host
with a lower rank nfs daemon when a higher rank one is on
an offline host, we should reschedule the nfs daemons.
mgr/cephadm: checking service name before removal
Fixes: https://tracker.ceph.com/issues/54503
Signed-off-by: Redouane Kachach <rkachach@redhat.com>
(cherry picked from commit b26c114c8456941d6cccf7d4355445f21cb373a7)
Adam King [Thu, 10 Feb 2022 01:42:42 +0000 (20:42 -0500)]
qa/tasks/cephadm_cases: increase timeouts in test_cli.py
These tests seem to fail sometimes, but in my testing
the events they wait for sometimes happen just a few seconds after
we hit the timeout. Trying to see if this makes the tests
more consistent. There is no need to mark a test as failed
if we report something up in 34 seconds vs 25, especially
since cephadm works on a cyclic daemon refresh.
Ronen Friedman [Fri, 25 Mar 2022 10:45:47 +0000 (10:45 +0000)]
pacific: osd/scrub: restart snap trimming only after scrubbing is done
Snap trimming that was postponed as the target PG was scrubbing
must be restarted at scrub completion.
PR #38111 moved trimming restart to just before the scrub fully
terminated. The current PR fixes that.
Trimming is also restarted in those cases where scrub was
queued but aborted immediately.
Yaarit Hatuka [Tue, 9 Nov 2021 18:31:11 +0000 (18:31 +0000)]
mgr/telemetry: fix waiting for mgr to warm up
1. The implementation of config_notify() in telemetry module sets the
flag for event, which is supposed to wake up the 'serve' thread whenever
a config option is changed. The problem is that we call config_notify()
at the beginning of serve(), before we enter its 'run' loop. This call
sets the event which cancels the 10 seconds wait for the mgr to warm up.
To fix this, we extract the logic of updating the config options to a
separate function (config_update_module_option()), and call it on
__init__, instead of calling config_notify() in serve().
2. We should always wait for the mgr to warm up here (10 seconds). In
case of a sporadic event (e.g. a config option change via CLI) the event
will be set, and wait will return immediately. We enforce this wait by
using time.sleep(10) instead of event.wait(10).
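The behaviour described in both points can be reproduced with a few lines of standard-library Python. This is a standalone sketch, not the telemetry module: the 0.2 second delay stands in for the 10 second warm-up so the demo finishes quickly.

```python
import threading
import time

WARM_UP = 0.2  # stands in for the 10 second warm-up

event = threading.Event()
# Simulate an early notification having already set the event.
event.set()

start = time.monotonic()
event.wait(WARM_UP)          # returns immediately because the event is set
waited_with_event = time.monotonic() - start

start = time.monotonic()
time.sleep(WARM_UP)          # always sleeps, regardless of the event
waited_with_sleep = time.monotonic() - start

print(f"event.wait: {waited_with_event:.3f}s, sleep: {waited_with_sleep:.3f}s")
```

The trade-off is that `time.sleep()` cannot be interrupted by the event, which is the point here: the warm-up delay is enforced unconditionally.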
Conflict resolution:
- manually removing some scrub scheduling changes from
PR #40984
- pg_scrubber.h: removing some irrelevant lines that were dragged
in.
- PG.h: restoring lines removed by the merge.