Dan van der Ster [Thu, 12 Nov 2020 16:14:37 +0000 (17:14 +0100)]
common/options: bluefs_buffered_io=true by default
Enable bluefs_buffered_io again because it makes a huge user-visible
improvement in metadata intensive scenarios, such as but not limited to
PG deletion.
In our environment, deleting PGs from 4 hybrid OSDs (sharing one SATA SSD block.db) saturates
the block.db at 350MB/s reads and causes slow reqs and flapping on the OSDs.
Those OSDs have a 3GB osd_memory_target.
Enabling bluefs_buffered_io drops the SSD IO down to <1MB/s and the OSDs
are performant again. (The underlying PG deletion inefficiency is being
solved separately, but the page cache is so much more effective than
the bluestore cache in this scenario).
Lastly, remove the comment about swap. We should separately advise
operators to disable swap on OSD machines, as it is much better in
our experience to OOM and restart than to chug along swapping.
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
Related-to: https://tracker.ceph.com/issues/45765
Related-to: https://tracker.ceph.com/issues/47044
(cherry picked from commit 5ec8e8e63d409860c35e24a192090ac2b70af8f6)
Conflicts:
cmake/modules/CephChecks.cmake
src/test/fio/CMakeLists.txt: check gettid() in /CMakeLists.txt
instead, as nautilus does not have cmake/modules/CephChecks.cmake by
then.
Kefu Chai [Sat, 20 Mar 2021 05:00:01 +0000 (13:00 +0800)]
install-deps.sh: remove existing ceph-libboost of different version
we install different versions of precompiled ceph-libboost packages
for different branches when building and testing them on ubuntu test
nodes. for instance,
- nautilus, octopus: v1.72
- pacific: v1.73
they share the same set of test nodes, and these ceph-libboost packages
conflict with each other because they install files to the same places.
to avoid the conflict, we should uninstall existing packages
before installing a different version of ceph-libboost packages.
ceph-libboost${version}-dev is a package providing the shared headers of
boost library, so, in this change we check if it is installed before
returning or removing the existing packages.
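The check described above can be sketched as follows. This is a minimal illustration in Python rather than the shell code in install-deps.sh; the function name plan_boost_cleanup and the idea of passing in a set of installed package names are assumptions made for the example.

```python
def plan_boost_cleanup(installed, wanted_version):
    """Return the ceph-libboost packages to remove before installing wanted_version.

    `installed` is a set of installed package names (hypothetical input;
    the real script would query the package manager).
    """
    dev_pkg = f"ceph-libboost{wanted_version}-dev"
    if dev_pkg in installed:
        # The wanted version is already there: nothing to remove.
        return []
    # A different version (or none) is installed: remove every existing
    # ceph-libboost package so the new one does not conflict on file paths.
    return sorted(p for p in installed if p.startswith("ceph-libboost"))
```

With {"ceph-libboost1.72-dev"} installed, asking for "1.72" removes nothing, while asking for "1.73" schedules the 1.72 package for removal.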
Ilya Dryomov [Wed, 17 Mar 2021 10:00:33 +0000 (11:00 +0100)]
qa: krbd_blkroset.t: update for separate hw and user read-only flags
Since kernel 5.12, hardware read-only state and user read-only
policy (BLKROGET/SET ioctls) are tracked separately in the block
layer. As the purpose of our ->set_read_only() method was exactly
that, it was removed.
As a side effect, BLKROSET no longer returns EROFS on an attempt
to make a read-only mapping read-write with "blockdev --setrw".
The policy gets updated, but the device remains read-only as before
because the hardware (== mapping) state is controlled by the driver.
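The behaviour described above can be modelled roughly as two independent flags. This is a simplified Python model for illustration only, not kernel code; the class and method names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class BlockDevice:
    hw_read_only: bool            # controlled by the driver (for krbd: the mapping)
    user_read_only: bool = False  # policy set through BLKROSET

    def blkroset(self, read_only: bool) -> None:
        # Only the user policy is updated; in this model the call never
        # fails because of the hardware state (no EROFS).
        self.user_read_only = read_only

    def is_read_only(self) -> bool:
        # The device is effectively read-only if either flag is set.
        return self.hw_read_only or self.user_read_only
```

For a read-only mapping, blkroset(False) clears the policy flag, yet is_read_only() still returns True, which matches what "blockdev --setrw" now does according to the commit message.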
Kefu Chai [Sat, 6 Mar 2021 16:32:42 +0000 (00:32 +0800)]
.github: correct the regex in milestone workflow
also use pull_request_target event so the action is run in the
context of the base of the pull request. this helps us to overcome
the "Resource not accessible by integration" issue where the action
is run in the context of the pull request.
BuildBoost.cmake (used when we're building the submodule) doesn't
provide parity with FindBoost.cmake (used with system Boost).
Specifically, it doesn't set the _FOUND variables for the various
components, making it hard to depend on finding those features.
Set Boost_<component>_FOUND for all the components we're building in
BuildBoost.cmake to make using these variables possible.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
(cherry picked from commit 0f4cb207bb4a9905619894286edd41a89379a747)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
(cherry picked from commit 4ca4201b7fe3e0ca172548204b4b888a0908d162)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Conflicts:
src/cls/CMakeLists.txt
src/test/rgw/CMakeLists.txt
- Add spawn headers to includes to fix the two build errors below.
No linking is needed since the files don't use 'spawn::' at all.
In file included from /git/ceph/src/rgw/rgw_common.h:31:0,
from /git/ceph/src/cls/otp/cls_otp_client.cc:25:
/git/ceph/src/common/async/yield_context.h:31:10: fatal error: spawn/spawn.hpp: No such file or directory
#include <spawn/spawn.hpp>
^~~~~~~~~~~~~~~~~
compilation terminated.
src/cls/CMakeFiles/cls_otp_client.dir/build.make:62: recipe for target 'src/cls/CMakeFiles/cls_otp_client.dir/otp/cls_otp_client.cc.o' failed
In file included from /git/ceph/src/rgw/rgw_dmclock_scheduler.h:21:0,
from /git/ceph/src/rgw/rgw_dmclock_sync_scheduler.h:18,
from /git/ceph/src/test/rgw/test_rgw_dmclock_scheduler.cc:17:
/git/ceph/src/common/async/yield_context.h:31:10: fatal error: spawn/spawn.hpp: No such file or directory
#include <spawn/spawn.hpp>
^~~~~~~~~~~~~~~~~
compilation terminated.
src/test/rgw/CMakeFiles/unittest_rgw_dmclock_scheduler.dir/build.make:62: recipe for target 'src/test/rgw/CMakeFiles/unittest_rgw_dmclock_scheduler.dir/test_rgw_dmclock_scheduler.cc.o' failed
Casey Bodley [Wed, 6 Nov 2019 20:57:01 +0000 (15:57 -0500)]
rgw: use new spawn() implementation
Signed-off-by: Casey Bodley <cbodley@redhat.com>
(cherry picked from commit 769841a08c3e79985d9634f06c9ff4d62647dcda)
Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
Conflicts:
src/rgw/CMakeLists.txt
- Remove changes for 'rgw_schedulers' cmake target, not in Nautilus.
- Link 'radosgw_a' against 'spawn'; transitivity from 'rgw_schedulers'
(which is public) is lost, and 'rgw_a'/'rgw_libs' (which is private
to 'radosgw_a') isn't enough to build 'rgw_main.cc' (error below).
src/rgw/rgw_aio.cc
- This file doesn't exist in Nautilus; similar changes are done in
other files.
src/rgw/rgw_aio_throttle.h
- No changes required; the base for the changes (e.g., class, variables)
is not in Nautilus.
src/rgw/rgw_asio_frontend.cc
- Fewer changes required, similarly; commit dd4350b not in Nautilus.
Build error:
In file included from /git/ceph/src/rgw/rgw_common.h:31:0,
from /git/ceph/src/rgw/rgw_main.cc:15:
/git/ceph/src/common/async/yield_context.h:31:10: fatal error: spawn/spawn.hpp: No such file or directory
#include <spawn/spawn.hpp>
^~~~~~~~~~~~~~~~~
compilation terminated.
src/rgw/CMakeFiles/radosgw.dir/build.make:62: recipe for target 'src/rgw/CMakeFiles/radosgw.dir/rgw_main.cc.o' failed
Igor Fedotov [Fri, 26 Feb 2021 14:16:11 +0000 (17:16 +0300)]
os/bluestore: go beyond pinned onodes while trimming the cache.
One might face a lack of cache trimming when there is a bunch of pinned entries at the top of the Onode cache LRU list. If these pinned entries stay in that state for a long time, the cache might start using too much memory, causing the OSD to exceed the osd-memory-target limit. The pinned state tends to occur for osdmap onodes.
The proposed patch preserves the last trim position in the LRU list (if it pointed to a pinned entry) and resumes trimming from that position if it wasn't invalidated. The LRU nature of the list makes this safe, since no new entries appear above the previously present entry while it is not touched.
Fixes: https://tracker.ceph.com/issues/48729
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
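The trimming idea in this commit can be sketched as below. This is a toy Python model, not the BlueStore code: the class OnodeLRU, its list-based storage, and the integer resume_idx standing in for the saved list position are all assumptions made for illustration.

```python
class OnodeLRU:
    """Toy LRU: index 0 is the coldest entry and is trimmed first."""

    def __init__(self):
        self.entries = []    # new and touched entries go to the end (hot side)
        self.resume_idx = 0  # every entry before this index was seen pinned

    def add(self, name, pinned=False):
        self.entries.append({"name": name, "pinned": pinned})

    def touch(self, name):
        for i, entry in enumerate(self.entries):
            if entry["name"] == name:
                self.entries.append(self.entries.pop(i))
                if i < self.resume_idx:
                    # An entry before the saved position moved, so the
                    # saved position is invalidated.
                    self.resume_idx = 0
                return

    def trim(self, max_to_evict):
        evicted = []
        i = self.resume_idx  # resume where the previous trim left off
        while i < len(self.entries) and len(evicted) < max_to_evict:
            if self.entries[i]["pinned"]:
                i += 1  # go beyond the pinned entry instead of stopping
            else:
                evicted.append(self.entries.pop(i)["name"])
        self.resume_idx = i
        return evicted
```

Because new entries are only appended on the hot side, the indices of the pinned entries at the cold side stay valid between trims, so the next trim does not need to rescan them. A complete model would also reset the saved position when an entry is unpinned.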
Adam Kupczyk [Sat, 30 Jan 2021 11:57:05 +0000 (12:57 +0100)]
os/bluestore: Add option to check BlueFS reads
Add option "bluefs_check_for_zeros" to check whether a read returned any zero-filled pages.
If so, reread the data. It is known that BlueStore sometimes gets such pages.
See "bluestore_retry_disk_reads".
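A rough sketch of the check-and-reread idea, in Python rather than the C++ of BlueFS. The function names, the 4096-byte page size, and the single retry are assumptions for illustration; the commit message does not state how the option is implemented.

```python
def has_zero_page(data, page_size=4096):
    """Return True if any full, aligned page in `data` is entirely zero."""
    zero_page = bytes(page_size)
    return any(
        data[i:i + page_size] == zero_page
        for i in range(0, len(data) - page_size + 1, page_size)
    )


def read_with_zero_check(read_fn, offset, length, page_size=4096, retries=1):
    """Read via read_fn(offset, length); reread if a zero-filled page shows up."""
    data = read_fn(offset, length)
    for _ in range(retries):
        if not has_zero_page(data, page_size):
            break
        # Suspicious result: read the same range again.
        data = read_fn(offset, length)
    return data
```

Legitimately zero-filled data triggers a reread too, which costs an extra read but does not change the result.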
- docstring added to describe the link to mgr/prometheus conflicted with the
const fmt definition for the message. resolved by adding doc under the const
definition.
Paul Cuzner [Thu, 8 Oct 2020 03:30:56 +0000 (16:30 +1300)]
mgr/prometheus: Add healthcheck metric for SLOW_OPS
SLOW_OPS is triggered by the op tracker and generates a health
alert, but health checks do not create metrics for prometheus to
use as alert triggers. This change adds a SLOW_OPS metric and
provides a simple means to extend to other relevant health
checks in the future.
If extracting the value from the health check message fails,
we log an error and remove the metric from the metric set. In
addition, the metric description has changed to better reflect
the scenarios where SLOW_OPS can be triggered.
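The extract-or-remove behaviour might look roughly like this. It is a sketch under assumptions: the metric key, the plain dict used as the metric set, and the "<N> slow ops" message prefix are illustrative and not taken from the module source.

```python
import logging
import re

log = logging.getLogger(__name__)

# Assumed metric name, for illustration only.
SLOW_OPS_METRIC = "healthcheck_slow_ops"


def update_slow_ops_metric(metrics, summary):
    """Set the SLOW_OPS metric from a health check summary message."""
    match = re.match(r"(\d+) slow ops", summary)
    if match is None:
        # Extraction failed: log an error and drop the metric so that
        # no stale value is exported.
        log.error("unable to extract slow ops count from: %s", summary)
        metrics.pop(SLOW_OPS_METRIC, None)
        return
    metrics[SLOW_OPS_METRIC] = int(match.group(1))
```

Removing the metric on failure means an alert rule sees "no data" rather than a stale count.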
Nathan Cutler [Thu, 25 Feb 2021 20:50:20 +0000 (21:50 +0100)]
common/mempool: include standard thread library
Attempt to address FTBFS:
/home/jenkins-build/build/workspace/ceph-pull-requests/src/test/test_mempool.cc:399:11: error: request for member 'clear' in 'workers', which is of non-class type 'int'
399 | workers.clear();
| ^~~~~
Igor Fedotov [Fri, 5 Feb 2021 11:03:48 +0000 (14:03 +0300)]
os/bluestore: fix huge(>4GB) writes from RocksDB to BlueFS.
Fixes: https://tracker.ceph.com/issues/49168
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 5f94883ec8d64c02b2bb499caad8eaf91dd715f7)
Conflicts:
(lack of bufferlist refactor from https://github.com/ceph/ceph/pull/36754)
(lack of single allocator support from https://github.com/ceph/ceph/pull/30838)
src/os/bluestore/BlueFS.h
src/test/objectstore/test_bluefs.cc
Jianpeng Ma [Mon, 10 Aug 2020 07:56:13 +0000 (15:56 +0800)]
os/bluestore/BlueRocksEnv: Avoid flushing too much data at once.
The _flush function already checks the length: if the length of dirty
data is less than bluefs_min_flush_size, the flush is skipped.
But in fact, we found that rocksdb can call Append() many times and then
call Flush(). This makes the amount of data to flush much larger than
bluefs_min_flush_size.
In my tests, this patch reduces the 99.99th percentile latency (from
145.753ms to 20.474ms) for 4k randwrite with bluefs_buffered_io=true.
Because BlueFS::flush acquires a lock, we add a new API, try_flush,
to avoid lock contention.
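One plausible reading of this change, sketched in Python: each append checks the buffered size cheaply and only takes the lock when there is enough data to flush. The class FileWriterSketch and its fields are hypothetical, and the actual try_flush in BlueFS may differ.

```python
import threading


class FileWriterSketch:
    def __init__(self, min_flush_size):
        self.min_flush_size = min_flush_size  # stands in for bluefs_min_flush_size
        self.buffer = bytearray()
        self.flush_sizes = []                 # size of each flush, for inspection
        self._lock = threading.Lock()

    def append(self, data):
        self.buffer += data
        # Flush as soon as enough data has accumulated, instead of letting
        # many appends pile up until an explicit flush.
        self.try_flush()

    def try_flush(self):
        # Size check first; the lock is taken only when a flush is due.
        if len(self.buffer) >= self.min_flush_size:
            self.flush()

    def flush(self):
        with self._lock:
            if self.buffer:
                self.flush_sizes.append(len(self.buffer))
                self.buffer.clear()
```

With min_flush_size=8, five 4-byte appends produce two 8-byte flushes and leave 4 bytes for the final explicit flush, rather than one 20-byte flush.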
Kotresh HR [Fri, 19 Feb 2021 11:27:23 +0000 (16:57 +0530)]
mgr/volumes: Bump up AuthMetadataManager's version
With ceph_volume_client and mgr-volumes co-existing
for some time, the versions of both need to be the same.
ceph_volume_client version <=5 can't decode the
'subvolumes' key in the auth-metadata file. Hence, to
handle the version incompatibility, the version of
ceph_volume_client is bumped up to 6 and the same
needs to be done in mgr-volume's AuthMetadataManager