Matt Benjamin [Thu, 17 Feb 2022 15:55:14 +0000 (10:55 -0500)]
rgwlc: remove bucket_lc_prepare, add backoff
Remove now-unused RGWLC::bucket_lc_prepare. Wrap serializer calls
in RGWLC::process(int index...) with simple backoff, limited to 5
retries.
In RGWLC::process(int index...), also open-coded the behavior of
RGWLC::bucket_lc_prepare(...), as the lock sharing between these
methods is error-prone. For now, that method still exists, so that it
can be called from the single-bucket process.
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
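As a rough illustration of the retry pattern described above (not RGW's actual code; the lock type, function names, and timings are placeholders), wrapping a serializer/lock acquisition in a bounded retry loop with backoff might look like this:
```
// Hedged sketch of a bounded retry-with-backoff around a serializer/lock
// acquisition; ShardLock, try_lock(), and the sleep interval are
// illustrative placeholders, not RGWLC's actual API.
#include <chrono>
#include <thread>

struct ShardLock {
  bool try_lock() { return true; }  // stand-in for the cls lock/serializer call
};

bool lock_with_backoff(ShardLock& lock, int max_retries = 5) {
  using namespace std::chrono_literals;
  auto delay = 100ms;
  for (int attempt = 0; attempt < max_retries; ++attempt) {
    if (lock.try_lock()) {
      return true;              // acquired the shard lock
    }
    std::this_thread::sleep_for(delay);
    delay *= 2;                 // simple exponential backoff
  }
  return false;                 // give up after max_retries attempts
}

int main() {
  ShardLock lock;
  return lock_with_backoff(lock) ? 0 : 1;
}
```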
Matt Benjamin [Mon, 14 Feb 2022 21:39:27 +0000 (16:39 -0500)]
rgwlc: remove explicit lc shard resets at start-of-run
This is an alternative solution to the (newly exposed) lifecycle
shard starvation problem reported by Jeegen Chen.
There was always a starvation condition implied by the
reset of the lc shard head at the start of processing. The introduction
of "stale sessions" in parallel lifecycle changes made it more
visible, in particular when rgw_lc_debug_interval was set to a small
value and many buckets had lifecycle policy.
My hypothesis in this change is that lifecycle processing for each
lc shard should /always/ continue through the full set of eligible
buckets for the shard, regardless of how many processing cycles might
be required to do so. In general, restarting at the first eligible
bucket on each reschedule invites starvation when processing "gets
behind", so just avoid it.
Fixes: https://tracker.ceph.com/issues/49446
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 6e2ae13adced6b3dbb2fe16b547a30e9d68dfa06)
rgwlc: add a wraparound to continued shard processing
If the full set of buckets for a given lc shard couldn't be
processed in the prior cycle, processing will start with a
non-empty marker. Note the initial marker position, then
when the end of the shard is reached, allow processing to wrap
around to the logical beginning of the shard and proceed
through the initial marker.
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 0b8f683d3cf444cc68fd30c3f179b9aa0ea08e7c)
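A simplified sketch of the wraparound idea described above, assuming a shard is just an ordered list of bucket entries plus a persisted marker (the names here are illustrative, not RGWLC's):
```
// Hedged sketch: process a shard starting at a saved (possibly non-empty)
// marker, and when the end is reached, wrap around and continue up to the
// initial marker so every eligible bucket is eventually visited.
#include <iostream>
#include <string>
#include <vector>

void process_bucket(const std::string& bucket) {
  std::cout << "processing " << bucket << "\n";
}

void process_shard(const std::vector<std::string>& entries,
                   std::size_t initial_marker /* index saved from last run */) {
  const std::size_t n = entries.size();
  for (std::size_t i = 0; i < n; ++i) {
    // start at the saved marker and wrap modulo the shard size
    process_bucket(entries[(initial_marker + i) % n]);
  }
}

int main() {
  process_shard({"bucket-a", "bucket-b", "bucket-c", "bucket-d"},
                /*initial_marker=*/2);  // resumes at bucket-c, wraps around to bucket-b
}
```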
don't report clearing incorrectly
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Matt Benjamin [Mon, 14 Feb 2022 23:26:22 +0000 (18:26 -0500)]
rgwlc: permit disabling of (default) auto-clearing of stale sessions
Provide an option to disable the automatic clearing of stale sessions,
which, unless disabled, happens after 2 lifecycle scheduling cycles.
The default behavior is most likely not desired when debugging or
testing lifecycle processing with rgw_lc_debug_interval set, since
re-entering a still-running session after 2 scheduling cycles is then
fairly likely.
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
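As a hedged sketch only (the option and field names below are placeholders, not necessarily the option this change introduces), the gating amounts to checking a boolean before auto-clearing a stale session:
```
// Hedged sketch of gating the stale-session auto-clear behind a config flag.
// auto_clear_stale_sessions and stale_after_cycles are placeholder names.
#include <cstdint>

struct LCConfig {
  bool auto_clear_stale_sessions = true;  // default behavior stays enabled
  uint32_t stale_after_cycles = 2;        // cleared after 2 scheduling cycles
};

bool should_clear_stale_session(const LCConfig& conf, uint32_t idle_cycles) {
  if (!conf.auto_clear_stale_sessions) {
    return false;  // operator disabled auto-clearing (e.g. while debugging
                   // with a small rgw_lc_debug_interval)
  }
  return idle_cycles >= conf.stale_after_cycles;
}
```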
Ramana Raja [Tue, 25 Jan 2022 01:06:11 +0000 (20:06 -0500)]
mgr/nfs: allow dynamic update of cephfs nfs export
The mgr/nfs module's apply_export() method is used to update an existing
CephFS NFS export. The method always restarted the ganesha service
(the ganesha server cluster) after updating the export object and
notifying the ganesha servers to reload their exports. The restart
temporarily affected the client connections of all the exports served
by the ganesha servers.
It is not always necessary to restart the ganesha servers. Only
updating the export ID, path, or FSAL block of a CephFS NFS export
requires a restart. So modify apply_export() to only restart the
ganesha servers for such export updates.
The mgr/nfs module creates a FSAL ceph user with read-only or
read-write path restricted MDS caps for each export. To change the
access type of the CephFS NFS export, the MDS caps of the export's FSAL
ceph user must also be changed. Ganesha can dynamically enforce an
export's access type changes, but Ceph server daemons can't dynamically
enforce changes in caps of the Ceph clients. To allow dynamic updates
of CephFS NFS exports, always create a FSAL Ceph user with read-write
path restricted MDS caps per export. Rely on the ganesha servers to
enforce the export access type changes for the NFS clients.
Fixes: https://tracker.ceph.com/issues/54025
Signed-off-by: Ramana Raja <rraja@redhat.com>
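The restart decision described above can be sketched generically; note this is a C++ illustration only, not the mgr/nfs module's actual Python code, and the field names are assumptions:
```
// Hedged illustration of the apply_export() decision described above:
// restart the ganesha servers only when identity-level fields change;
// everything else can be applied with a reload.
#include <string>

struct Export {
  int export_id;
  std::string path;        // pseudo path / CephFS path
  std::string fsal_block;  // serialized FSAL section
  std::string access_type; // "RO" or "RW"
  // ... other fields that can change without a restart
};

bool needs_restart(const Export& old_ex, const Export& new_ex) {
  // Only changes to the export ID, path, or FSAL block require restarting
  // the ganesha servers.
  return old_ex.export_id  != new_ex.export_id ||
         old_ex.path       != new_ex.path      ||
         old_ex.fsal_block != new_ex.fsal_block;
}
```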
Xuehan Xu [Fri, 28 Jan 2022 05:04:03 +0000 (13:04 +0800)]
crimson/os/seastore: extract fixed kv btree implementation out of lba manager
Basically, this PR moves the current LBABtree and lba_range_pin out of the lba manager,
and renames LBABtree to FixedKVBtree. This is preparation for implementing backrefs.
Joseph Sawaya [Fri, 11 Mar 2022 20:45:16 +0000 (15:45 -0500)]
doc: Add note to osds_per_device description about dual-actuator devices
This commit adds information about using dual-actuator devices with the
osds_per_device drive group option, letting users know they can create
an OSD for each actuator by setting this value to 2 in the drive group
they're using to apply OSDs to the device.
Kefu Chai [Wed, 2 Mar 2022 16:41:19 +0000 (00:41 +0800)]
librbd: s/boost::variant/std::variant/
boost::variant explicitly prevents itself from being compared with other
types by marking the return type of, for instance, operator==(const T&) as
"void" if T is not identical to the variant type. To address this, let's
use std::variant<> instead.
It is more standards-compliant and simpler this way.
Because of https://cplusplus.github.io/LWG/issue3052, libstdc++
only allows std::variant<> specializations to be passed to std::visit(),
while libc++ also allows types derived from std::variant<> to be "visited";
hence we add an adaptor for SnapshotNamespace.
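A minimal, self-contained sketch of that pattern (the type names below are placeholders, not the librbd types): a class derived from std::variant<> cannot portably be handed to std::visit() (see LWG 3052), so visitation goes through an adaptor that exposes the base variant.
```
#include <iostream>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

struct UserSnapshot  { std::string name; };
struct TrashSnapshot { std::string original_name; };

using SnapshotNamespaceVariant = std::variant<UserSnapshot, TrashSnapshot>;

// Convenience type derived from the variant, similar in spirit to
// cls::rbd::SnapshotNamespace (names here are illustrative).
struct SnapshotNamespace : SnapshotNamespaceVariant {
  using SnapshotNamespaceVariant::SnapshotNamespaceVariant;  // inherit ctors
};

// Adaptor: up-cast to the base std::variant so std::visit() works with both
// libstdc++ and libc++.
template <typename Visitor>
decltype(auto) visit_snapshot(Visitor&& v, const SnapshotNamespace& ns) {
  return std::visit(std::forward<Visitor>(v),
                    static_cast<const SnapshotNamespaceVariant&>(ns));
}

int main() {
  SnapshotNamespace ns{UserSnapshot{"snap1"}};
  visit_snapshot([](const auto& s) {
    using T = std::decay_t<decltype(s)>;
    if constexpr (std::is_same_v<T, UserSnapshot>) {
      std::cout << "user snapshot: " << s.name << "\n";
    } else {
      std::cout << "trash snapshot: " << s.original_name << "\n";
    }
  }, ns);
}
```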
Addition of a SCRUB_DURATION field that shows how long the scrub/deep-scrub of a pg took.
This field will be displayed in the output of the "ceph pg dump --format=json" and "ceph pg ls-by-pool --format=json" commands.
Bucket head OPs should have quota in the output. However, we were only
fetching quota on OPs that also had an object. The object itself is not
necessary for quota (although a bucket is). Change it so that we get
quota on bucket OPs as well.
Fixes: https://tracker.ceph.com/issues/54488
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
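A hedged sketch of the change described above (the types and function names are placeholders, not RGW's actual code): bucket quota is fetched whenever the request has a bucket, rather than only when it also has an object.
```
// Illustrative stand-ins only; not RGW's request/quota types.
struct Quota {
  long max_size    = -1;
  long max_objects = -1;
};

struct Request {
  bool has_bucket = false;
  bool has_object = false;
  Quota bucket_quota;
};

Quota load_bucket_quota() {
  return Quota{1000000, 500};  // stand-in for the store/bucket-info lookup
}

void prepare_quota(Request& req) {
  // Before: quota was only loaded when both a bucket and an object were
  // present; the object is not actually needed for bucket quota.
  if (req.has_bucket) {
    req.bucket_quota = load_bucket_quota();
  }
}
```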
Neha Ojha [Wed, 9 Mar 2022 23:19:35 +0000 (15:19 -0800)]
Merge pull request #45305 from Thingee/update-foundation-mem-202203
docs: Updating Foundation member list for 202203
Reviewed-by: Dan van der Ster <daniel.vanderster@cern.ch>
Reviewed-by: Anthony D'Atri <anthony.datri@gmail.com>
Reviewed-by: Neha Ojha <nojha@redhat.com>
Kefu Chai [Mon, 7 Mar 2022 16:00:28 +0000 (00:00 +0800)]
cls/rbd: define SnapshotNamespace's ctor using its parent
Simpler this way. This also prevents the compiler from trying to
convert a random value into SnapshotNamespace, just because
SnapshotNamespace has a templated constructor, when it looks up a
candidate for operator<<(ostream&, Random value).
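A small illustration of that overload-resolution pitfall, with hypothetical type names (not the cls/rbd types): a greedy templated constructor makes the wrapper implicitly constructible from anything, so an unrelated operator<< overload becomes viable; inheriting the parent's constructors avoids that.
```
#include <iostream>
#include <utility>
#include <variant>

using Base = std::variant<int, double>;

// A greedy templated constructor: as far as overload resolution is
// concerned, Greedy is constructible from "anything".
struct Greedy : Base {
  template <typename T>
  Greedy(T&& t) : Base(std::forward<T>(t)) {}
};

// Inheriting the parent's constructors only accepts what Base accepts.
struct Inherited : Base {
  using Base::Base;
};

std::ostream& operator<<(std::ostream& os, const Greedy&)    { return os << "Greedy"; }
std::ostream& operator<<(std::ostream& os, const Inherited&) { return os << "Inherited"; }

int main() {
  std::cout << Inherited{3.14} << "\n";  // fine: Base is constructible from double

  struct Unrelated {} u;
  // With Greedy in scope, "std::cout << u" finds operator<<(ostream&, const
  // Greedy&) as a viable candidate via the templated constructor and only
  // fails later, inside Base's converting constructor; with Inherited alone
  // that overload is simply not viable. (Left commented out: it does not
  // compile.)
  // std::cout << u;
  (void)u;
  return 0;
}
```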
crimson/osd: fix missing ENOENT when removing an already-removed object
This patch deals with the following problem:
```
[rzarzynski@o06 build]$ RBD_FEATURES="21" ./bin/ceph_test_cls_rbd --gtest_filter=TestClsRbd.create
Running main() from gmock_main.cc
Note: Google Test filter = TestClsRbd.create
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestClsRbd
[ RUN ] TestClsRbd.create
../src/test/cls_rbd/test_cls_rbd.cc:467: Failure
Expected equality of these values:
-2
ioctx.remove(oid)
Which is: 0
[ FAILED ] TestClsRbd.create (10 ms)
[----------] 1 test from TestClsRbd (10 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (2805 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] TestClsRbd.create
```
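A hedged sketch of the behavior the test expects (the store type is an illustration, not crimson's actual code): removing an object that does not exist should fail with -ENOENT instead of silently succeeding.
```
#include <cerrno>
#include <iostream>
#include <set>
#include <string>

// Illustrative stand-in for an object store.
struct FakeStore {
  std::set<std::string> objects;

  int remove(const std::string& oid) {
    auto it = objects.find(oid);
    if (it == objects.end()) {
      return -ENOENT;   // already removed (or never existed): report it
    }
    objects.erase(it);
    return 0;
  }
};

int main() {
  FakeStore store;
  store.objects.insert("oid");
  std::cout << store.remove("oid") << "\n";  // 0: first removal succeeds
  std::cout << store.remove("oid") << "\n";  // -2 (-ENOENT): second removal fails
  return 0;
}
```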
Ilya Dryomov [Tue, 8 Mar 2022 12:56:15 +0000 (13:56 +0100)]
test/librbd/test_notify.py: effect post object map rebuild assert
Instead of just optionally skipping update_features test, commit 9c0b239d70cd ("qa/upgrade: conditionally disable update_features
tests") moved it after rebuild_object_map test. This isn't right
because update_features test invalidates the object map as a side
effect and rebuild_object_map test is what makes it valid again:
crimson, cls: fix the inability to print logs from plugins.
`cls_log()` of the interface between an OSD and a plugin
(a Ceph Class) is implemented on top of the `dout` macros
and shared between crimson and the classical OSD.
Unfortunately, when a plugin is hosted inside crimson,
this makes it impossible to send any in-plugin generated
message to the `seastar::logger` for the `objclass` subsystem.
This patch differentiates the implementation of `cls_log()`,
and thus allows the `seastar::logger`-based implementation
of `dout` to be used.
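A minimal sketch of that split, assuming a WITH_SEASTAR build macro (the macro, function name, and the fprintf stand-in for the dout path are placeholders, not the exact Ceph sources):
```
// Hedged sketch: the plugin-facing logging hook has two implementations, one
// forwarding to a seastar::logger for the "objclass" subsystem (crimson) and
// one keeping the classic dout-based path (here replaced by fprintf).
#include <cstdarg>
#include <cstdio>

#ifdef WITH_SEASTAR
#include <seastar/util/log.hh>
static seastar::logger objclass_logger("objclass");
#endif

void cls_log_sketch(int level, const char* format, ...) {
  char buf[4096];
  va_list ap;
  va_start(ap, format);
  vsnprintf(buf, sizeof(buf), format, ap);
  va_end(ap);
#ifdef WITH_SEASTAR
  objclass_logger.debug("<cls> [{}] {}", level, buf);   // crimson: seastar logging
#else
  fprintf(stderr, "<cls> [%d] %s\n", level, buf);       // stand-in for the dout path
#endif
}
```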
quiesce all activities and destage allocations to disk before killing the OSD
1) keep the old (unsafe) fast-shutdown when we are not using NCB (non null-manager())
2) skip service.prepare_to_stop() which can take as much as 10 seconds
3) skip debug options in fast-shutdown
4) set_state(STATE_STOPPING) which will stop accepting new tasks to this OSD
5) clear op_shardedwq queues; this is safe since we didn't start processing them
6) stop timer
7) drain osd_op_tp (no new items will be added)
8) now we can safely call umount which will close_db/bluefs and will destage allocation to disk
9) skip _shutdown_cache() when we are in the middle of a fast-shutdown
10) increase debug level on fast-shutdown
11) add option for bluestore_qfsck_on_mount to force scan on mount for all tests
12) disable fsck-on-umount when running fast-shutdown
13) add an option to increase debug level at fast-shutdown umount()
14) set a time limit to fast-shutdown
15) Bug fix: BlueStore::pool_statfs must not access db after it was removed
16) Fix error message for qfsck (error was caused by PR https://github.com/ceph/ceph/pull/44563)
17) make shutdown-timeout configurable
Fixes: https://tracker.ceph.com/issues/53266
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
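A rough, hedged sketch of the shutdown ordering described in the list above; the method names are illustrative placeholders, not the OSD's actual methods:
```
// Stop intake first, then drain, then unmount, so allocations are destaged
// to disk before the process exits. (Illustration only.)
#include <iostream>

struct FakeOSD {
  void set_state_stopping() { std::cout << "stop accepting new tasks\n"; }
  void clear_op_queues()    { std::cout << "clear op_shardedwq queues\n"; }
  void stop_timer()         { std::cout << "stop timer\n"; }
  void drain_op_tp()        { std::cout << "drain osd_op_tp\n"; }
  void umount_store()       { std::cout << "umount: close_db/bluefs, destage allocations\n"; }

  void fast_shutdown() {
    set_state_stopping();   // (4)
    clear_op_queues();      // (5) safe: queued ops were never started
    stop_timer();           // (6)
    drain_op_tp();          // (7)
    umount_store();         // (8) _shutdown_cache()/fsck-on-umount are skipped
  }
};

int main() { FakeOSD{}.fast_shutdown(); }
```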
crimson/osd: fix buffer overflow due to a missing debug parameter
The problem is:
```
DEBUG 2022-03-07 13:50:40,027 [shard 0] osd - calling method rbd.create, num_read=0, num_write=0
DEBUG 2022-03-07 13:50:40,027 [shard 0] objclass - <cls> ../src/cls/rbd/cls_rbd.cc:787: create object_prefix=parent_id size=2097152 order=0 features=1
DEBUG 2022-03-07 13:50:40,027 [shard 0] osd - handling op omap-get-vals-by-keys on object 1:144d5af5:::parent_id:head
=================================================================
==2109764==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7f6de5176e70 at pc 0x7f6dfd2a7157 bp 0x7f6de5176e30 sp 0x7f6de51765d8
WRITE of size 24 at 0x7f6de5176e70 thread T0
#0 0x7f6dfd2a7156 in __interceptor_sigaltstack.part.0 (/lib64/libasan.so.6+0x54156)
#1 0x7f6dfd30d5b3 in __asan::PlatformUnpoisonStacks() (/lib64/libasan.so.6+0xba5b3)
#2 0x7f6dfd31314c in __asan_handle_no_return (/lib64/libasan.so.6+0xc014c)
Reactor stalled for 275 ms on shard 0. Backtrace: 0x45d9d 0xda72bd3 0xd801f73 0xd81f6f9 0xd81fb9c 0xd81fe2c 0xd8200f7 0x12b2f 0x7f6dfd3383c1 0x7f6dfd339b18 0x7f6dfd339bd4 0x7f6dfd339bd4 0x7f6dfd339bd4 0x7f6dfd339bd4 0x7f6dfd33b089 0x7f6dfd33bb36 0x7f6dfd32e0b5 0x7f6dfd32ff3a 0xd61d0 0x32412 0xbd8a7 0xbd134 0x54178 0xba5b3 0xc014c 0x1881f22 0x188344a 0xe8b439d 0xe8b58f2 0x2521d5a 0x2a2ee12 0x2c76349 0x2e04ce9 0x3c70c55 0x3cb8aa8 0x7f6de558de39
#3 0x1881f22 in fmt::v6::internal::arg_map<fmt::v6::basic_format_context<seastar::internal::log_buf::inserter_iterator, char> >::~arg_map() /usr/include/fmt/core.h:1170
#4 0x1881f22 in fmt::v6::basic_format_context<seastar::internal::log_buf::inserter_iterator, char>::~basic_format_context() /usr/include/fmt/core.h:1265
#5 0x1881f22 in fmt::v6::format_handler<fmt::v6::arg_formatter<fmt::v6::internal::output_range<seastar::internal::log_buf::inserter_iterator, char> >, char, fmt::v6::basic_format_context<seastar::internal::log_buf::inserter_iterator, char> >::~format_handler() /usr/include/fmt/format.h:3143
#6 0x1881f22 in fmt::v6::basic_format_context<seastar::internal::log_buf::inserter_iterator, char>::iterator fmt::v6::vformat_to<fmt::v6::arg_formatter<fmt::v6::internal::output_range<seastar::internal::log_buf::inserter_iterator, char> >, char, fmt::v6::basic_format_context<seastar::internal::log_buf::inserter_iterator, char> >(fmt::v6::arg_formatter<fmt::v6::internal::output_range<seastar::internal::log_buf::inserter_iterator, char> >::range, fmt::v6::basic_string_view<char>, fmt::v6::basic_format_args<fmt::v6::basic_format_context<seastar::internal::log_buf::inserter_iterator, char> >, fmt::v6::internal::locale_ref) /usr/include/fmt/format.h:3206
#7 0x188344a in seastar::internal::log_buf::inserter_iterator fmt::v6::vformat_to<fmt::v6::basic_string_view<char>, seastar::internal::log_buf::inserter_iterator, , 0>(seastar::internal::log_buf::inserter_iterator, fmt::v6::basic_string_view<char> const&, fmt::v6::basic_format_args<fmt::v6::basic_format_context<fmt::v6::type_identity<seastar::internal::log_buf::inserter_iterator>::type, fmt::v6::internal::char_t_impl<fmt::v6::basic_string_view<char>, void>::type> >) /usr/include/fmt/format.h:3395
#8 0x188344a in seastar::internal::log_buf::inserter_iterator fmt::v6::format_to<seastar::internal::log_buf::inserter_iterator, std::basic_string_view<char, std::char_traits<char> >, hobject_t const&, 0>(seastar::internal::log_buf::inserter_iterator, std::basic_string_view<char, std::char_traits<char> > const&, hobject_t const&) /usr/include/fmt/format.h:3418
#9 0x188344a in seastar::logger::log<hobject_t const&>(seastar::log_level, seastar::logger::format_info, hobject_t const&)::{lambda(seastar::internal::log_buf::inserter_iterator)#1}::operator()(seastar::internal::log_buf::inserter_iterator) const ../src/seastar/include/seastar/util/log.hh:227
#10 0x188344a in seastar::logger::lambda_log_writer<seastar::logger::log<hobject_t const&>(seastar::log_level, seastar::logger::format_info, hobject_t const&)::{lambda(seastar::internal::log_buf::inserter_iterator)#1}>::operator()(seastar::internal::log_buf::inserter_iterator) ../src/seastar/include/seastar/util/log.hh:106
#11 0xe8b439d in operator() ../src/seastar/src/util/log.cc:268
#12 0xe8b58f2 in seastar::logger::do_log(seastar::log_level, seastar::logger::log_writer&) ../src/seastar/src/util/log.cc:280
#13 0x2521d5a in void seastar::logger::log<hobject_t const&>(seastar::log_level, seastar::logger::format_info, hobject_t const&) ../src/seastar/include/seastar/util/log.hh:230
#14 0x2a2ee12 in void seastar::logger::debug<hobject_t const&>(seastar::logger::format_info, hobject_t const&) ../src/seastar/include/seastar/util/log.hh:373
#15 0x2a2ee12 in PGBackend::omap_get_vals_by_keys(ObjectState const&, OSDOp&, object_stat_sum_t&) const ../src/crimson/osd/pg_backend.cc:1220
#16 0x2c76349 in operator()<PGBackend, ObjectState> ../src/crimson/osd/ops_executer.cc:577
#17 0x2c76349 in do_const_op<crimson::osd::OpsExecuter::execute_op(OSDOp&)::<lambda(auto:167&, const auto:168&)> > ../src/crimson/osd/ops_executer.cc:449
#18 0x2e04ce9 in do_read_op<crimson::osd::OpsExecuter::execute_op(OSDOp&)::<lambda(auto:167&, const auto:168&)> > ../src/crimson/osd/ops_executer.h:216
#19 0x2e04ce9 in crimson::osd::OpsExecuter::execute_op(OSDOp&) ../src/crimson/osd/ops_executer.cc:576
Reactor stalled for 762 ms on shard 0. Backtrace: 0x45d9d 0xda72bd3 0xd801f73 0xd81f6f9 0xd81fb9c 0xd81fe2c 0xd8200f7 0x12b2f 0x7f6dfd33ae85 0x7f6dfd33bb36 0x7f6dfd32e0b5 0x7f6dfd32ff3a 0xd61d0 0x32412 0xbd8a7 0xbd134 0x54178 0xba5b3 0xc014c 0x1881f22 0x188344a 0xe8b439d 0xe8b58f2 0x2521d5a 0x2a2ee12 0x2c76349 0x2e04ce9 0x3c70c55 0x3cb8aa8 0x7f6de558de39
#20 0x3c70c55 in execute_osd_op ../src/crimson/osd/objclass.cc:35
#21 0x3cb8aa8 in cls_cxx_map_get_val(void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list*) ../src/crimson/osd/objclass.cc:372
#22 0x7f6de558de39 (/home/rzarzynski/ceph1/build/lib/libcls_rbd.so.1.0.0+0x28e39)
0x7f6de5176e70 is located 249456 bytes inside of 262144-byte region [0x7f6de513a000,0x7f6de517a000)
allocated by thread T0 here:
#0 0x7f6dfd3084a7 in aligned_alloc (/lib64/libasan.so.6+0xb54a7)
#1 0xdd414fc in seastar::thread_context::make_stack(unsigned long) ../src/seastar/src/core/thread.cc:196
#2 0x7fff3214bc4f ([stack]+0xa5c4f)
```
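To illustrate the general class of bug being fixed (this is a generic {fmt} example, not the crimson code itself): a format string whose placeholder count does not match the arguments actually passed. Compile-time checked format strings turn such a mismatch into a build error instead of a failure at runtime.
```
#include <fmt/format.h>
#include <string>

int main() {
  std::string oid = "1:144d5af5:::parent_id:head";

  // OK: two placeholders, two arguments.
  fmt::print(FMT_STRING("handling op {} on object {}\n"),
             "omap-get-vals-by-keys", oid);

  // Would not compile: two placeholders, only one argument.
  // fmt::print(FMT_STRING("handling op {} on object {}\n"), oid);
  return 0;
}
```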