Yingxin Cheng [Tue, 8 Jun 2021 01:55:15 +0000 (09:55 +0800)]
crimson/onode-staged-tree: use extent_len_t and node_offset_t correctly
extent_len_t can represent a value up to and including the node size,
but node_offset_t cannot and may overflow. Also add validations when
casting a larger type to node_offset_t.
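A minimal sketch of the kind of validation described above, not the actual patch. The type widths (32-bit extent_len_t, 16-bit node_offset_t) and the helper name to_node_offset() are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Assumed widths, for illustration only.
using extent_len_t = std::uint32_t;
using node_offset_t = std::uint16_t;

// Hypothetical helper: narrow a length to node_offset_t, or return
// std::nullopt when the value does not fit and would silently wrap.
inline std::optional<node_offset_t> to_node_offset(extent_len_t len) {
  if (len > std::numeric_limits<node_offset_t>::max()) {
    return std::nullopt;
  }
  return static_cast<node_offset_t>(len);
}
```

Under these assumptions a 64 KiB node size (65536) fits in extent_len_t but is rejected by to_node_offset(), which is the overflow case the commit message refers to.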
Kefu Chai [Mon, 7 Jun 2021 08:31:11 +0000 (16:31 +0800)]
*: always include <filesystem>
since we no longer need to be compatible with GCC older than GCC-8,
there is no need to use <experimental/filesystem> as an alternative to
<filesystem> anymore.
since we no longer need to be compatible with GCC older than GCC-8,
there is no need to use boost::filesystem as an alternative to
std::filesystem anymore.
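For illustration, code that includes <filesystem> unconditionally might look like the sketch below. The helper join_path() and the example path are hypothetical and do not come from the tree:

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: join a directory and a file name with
// std::filesystem, with no <experimental/filesystem> or
// boost::filesystem fallback.
inline std::string join_path(const std::string& dir, const std::string& name) {
  return (fs::path(dir) / name).generic_string();
}
```

Note that with GCC 8 the std::filesystem implementation lives in a separate library, so linking with -lstdc++fs is still needed there.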
Kefu Chai [Mon, 7 Jun 2021 08:17:09 +0000 (16:17 +0800)]
cmake: require GCC-8.1 and up
for better C++17 support, for instance better std::filesystem
support.
8.1 is the required minimum because ubuntu focal provides GCC-8.1
and RHEL/CentOS8 provides GCC-8.4.1, so we have only tested the build
on GCC-8.1 and up so far.
crimson/monc: don't serve auth requests without active mon connection.
This is yet another race, which happens when auth request
handling is performed during the `active_con` reset sequence.
It caused the following `nullptr` dereference at Sepia:
```
DEBUG 2021-06-09 10:27:24,059 [shard 0] ms - [osd.6(client) v2:172.21.15.170:6809/33397 >> client.? -@39840] GOT AuthRequestFrame: method=2, preferred_modes={2, 1}, payload_len=174
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4977-g65cb255e/rpm/el8/BUILD/ceph-17.0.0-4977-g65cb255e/src/crimson/mon/MonClient.cc:595:26: runtime error: member call on null pointer of type 'struct Connection'
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4977-g65cb255e/rpm/el8/BUILD/ceph-17.0.0-4977-g65cb255e/src/crimson/mon/MonClient.cc:178:11: runtime error: member access within null pointer of type 'struct Connection'
Segmentation fault on shard 0.
Backtrace:
0# 0x0000563F9C00395F in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007F4A064D0B20 in /lib64/libpthread.so.0
4# crimson::mon::Connection::get_keys() in ceph-osd
5# crimson::mon::Client::handle_auth_request(seastar::shared_ptr<crimson::net::Connection>, seastar::lw_shared_ptr<AuthConnectionMeta>, bool, unsigned int, ceph::buffer::v15_2_0::list const&, ceph::buffer::v15_2_0::list*) in ceph-osd
6# crimson::net::ProtocolV2::_handle_auth_request(ceph::buffer::v15_2_0::list&, bool) in ceph-osd
7# 0x0000563F9D007B39 in ceph-osd
8# 0x0000563F9D008C45 in ceph-osd
9# 0x0000563F95FF8D70 in ceph-osd
10# 0x0000563FA1A560BF in ceph-osd
11# 0x0000563FA1A5B600 in ceph-osd
12# 0x0000563FA1C0D66B in ceph-osd
13# 0x0000563FA176B0EA in ceph-osd
14# 0x0000563FA177520E in ceph-osd
15# main in ceph-osd
16# __libc_start_main in /lib64/libc.so.6
17# _start in ceph-osd
Fault at location: 0xb0
```
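The guard this commit describes can be sketched roughly as follows. The Connection and Client types here are simplified, hypothetical stand-ins for the crimson classes, and the -ENOTCONN return value is an assumption, not necessarily what the patch returns:

```cpp
#include <cassert>
#include <cerrno>
#include <memory>
#include <string>

// Hypothetical stand-in for crimson::mon::Connection.
struct Connection {
  std::string keys;
  const std::string& get_keys() const { return keys; }
};

// Hypothetical stand-in for crimson::mon::Client.
struct Client {
  // Null while the reset sequence is in progress.
  std::unique_ptr<Connection> active_con;

  // Returns 0 on success, or a negative errno when there is no active
  // mon connection (the error code is an assumption for this sketch).
  int handle_auth_request(std::string* out) const {
    if (!active_con) {
      return -ENOTCONN;  // refuse instead of dereferencing nullptr
    }
    *out = active_con->get_keys();
    return 0;
  }
};
```

The point is only that the request is rejected before `active_con` is touched, so the `get_keys()` call seen in the backtrace is never reached with a null pointer.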
qa/standalone: Use osd op queue = wpq in activate_osd()
This change is a follow-up to commit b6e9c0903d5ad9a699b675f9fa7739e9cce9a5f3, which set the scheduler to wpq in
run_osd() and run_osd_filestore(). activate_osd() also has to
set the scheduler type to 'wpq' in order to be consistent and avoid test
failures.
The above is a temporary measure until all the standalone tests are
modified to run well with the mclock_scheduler.
yanqiang-ux [Mon, 7 Jun 2021 07:54:44 +0000 (15:54 +0800)]
osd: set r only if succeed in FillInVerifyExtent
When a read fails, ret can be taken as the data length in FillInVerifyExtent, which should be avoided.
This may cause errors in CRC repair or in a retried read because of the wrong data length. In my case, we use FillInVerifyExtent for EC reads;
when we hit -EIO, we try CRC repair, which needs to read data from other shards according to the data length.
I then hit an assert in ECBackend.cc (line 2288, ceph_assert(range.first != range.second)), although it seems the master branch does not support EC CRC repair.
In short, reusing the read op may cause unpredictable errors.
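A rough sketch of the behaviour described above, using a simplified, hypothetical callback rather than the real FillInVerifyExtent (the member types and names are assumptions):

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>

// Simplified, hypothetical stand-in for the read completion callback.
struct FillInExtent {
  std::uint64_t* r;    // data length, consumed by later stages
  std::int32_t* rval;  // result code of the read

  void finish(int len) {
    if (len < 0) {
      // On failure, report the error but leave *r untouched so a
      // negative errno is never mistaken for a data length.
      *rval = len;
      return;
    }
    *r = static_cast<std::uint64_t>(len);
    *rval = 0;
  }
};
```

With this ordering, a later repair or retry that reuses the read op sees the previous length in `*r` rather than an error code reinterpreted as a length.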
Merge pull request #41065 from pponnuvel/tracker_50554
rgw: Improve error message on email id reuse
Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
Reviewed-by: Shilpa Jagannath <smanjara@redhat.com>
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
Moving the attrs into s->bucket_attrs before setting them results in
setting empty attrs into the bucket. This means that reading them back
later gets empty attrs, which can cause a segfault.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
Zac Dover [Tue, 8 Jun 2021 15:57:13 +0000 (01:57 +1000)]
doc/dev: s/reposotory/repository/ (really)
This corrects the heinous misspelling described in the
substitution expression in the title. This misspelling is
all the more egregious because it appears in a title, and
therefore would be used to create links if it had not been
caught.
crimson/monc: fix races between on_session_opened() and the reset sequence.
`active_con` can be invalidated at any preemption point. This includes
even the middle of the connection open sequence, as it is spread across
multiple continuations!
Unfortunately, we don't check for `active_con` in the lambdas inside
the `on_session_opened()` method. That was the cause of the following
crash at Sepia [1]:
```
INFO 2021-06-08 09:36:23,992 [shard 0] monc - do_auth_single: connection closed
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4967-g96cdf983/rpm/el8/BUILD/ceph-17.0.0-4967-g96cdf983/src/crimson/mon/MonClient.cc:399:10: runtime error: member access within null pointer of type 'struct Connection'
Segmentation fault on shard 0.
Backtrace:
0# 0x000055C3C1CA860F in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007FAAE1713B20 in /lib64/libpthread.so.0
4# crimson::mon::Connection::get_conn() in ceph-osd
5# 0x000055C3C2532DA8 in ceph-osd
6# 0x000055C3C2535CB5 in ceph-osd
7# 0x000055C3BBC9FC70 in ceph-osd
8# 0x000055C3C76FAE5F in ceph-osd
9# 0x000055C3C77003A0 in ceph-osd
10# 0x000055C3C78B240B in ceph-osd
11# 0x000055C3C740FE8A in ceph-osd
12# 0x000055C3C7419FAE in ceph-osd
13# main in ceph-osd
14# __libc_start_main in /lib64/libc.so.6
15# _start in ceph-osd
Fault at location: 0x98
```
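The pattern of re-checking `active_con` inside each continuation can be sketched as below. This is a toy model, not the crimson code: the `pending` queue stands in for the reactor, and the Connection, Client, and `skipped` names are hypothetical:

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for crimson::mon::Connection.
struct Connection {
  bool opened = false;
};

// Hypothetical stand-in for crimson::mon::Client.
struct Client {
  std::unique_ptr<Connection> active_con;
  std::deque<std::function<void()>> pending;  // toy continuation queue
  int skipped = 0;  // continuations that found no active connection

  void on_session_opened() {
    // The lambda runs later, so active_con may have been reset in
    // between; it must be checked again before use.
    pending.push_back([this] {
      if (!active_con) {
        ++skipped;
        return;
      }
      active_con->opened = true;
    });
  }

  void reset() { active_con.reset(); }

  void run_pending() {
    while (!pending.empty()) {
      auto task = std::move(pending.front());
      pending.pop_front();
      task();
    }
  }
};
```

If reset() runs between on_session_opened() and run_pending(), the continuation bails out instead of dereferencing a null `active_con`.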
Aaryan Porwal [Wed, 26 May 2021 08:58:15 +0000 (14:28 +0530)]
mgr/dashboard: fix for right sidebar nav icon not clickable
Fixed the responsive sidebar not opening on a click event, and close the sidebar when a tasks or notifications list item is clicked, because the item would otherwise be overshadowed by the sidebar.
Signed-off-by: Aaryan Porwal <aaryanporwal2233@gmail.com>
dengchl01 [Tue, 8 Jun 2021 07:42:27 +0000 (15:42 +0800)]
mgr/mgr_module: add docstring of MgrModule.get() - mgr_map, modified_config_options, service_map, mds_metadata, pg_status, osd_pool_stats, mgr_ips, have_local_config_map
Kefu Chai [Tue, 8 Jun 2021 04:44:21 +0000 (12:44 +0800)]
rgw/rgw_lua*: return unknown field error using luaL_error()
it was found that on aarch64 the exception is not caught even though we
catch exactly the type that is thrown, and the uncaught exception ends
up in a std::terminate() call. this could be due to an ABI mismatch on
aarch64 that prevents the C++ runtime from finding the catch block.
in this change, luaL_error() is used to propagate the error to the
caller instead, to work around this issue.
Kefu Chai [Tue, 8 Jun 2021 04:30:37 +0000 (12:30 +0800)]
rgw/rgw_lua_utils: return error using luaL_error()
it was found that on aarch64 the exception is not caught even though we
catch exactly the type that is thrown, and the uncaught exception ends
up in a std::terminate() call. this could be due to an ABI mismatch on
aarch64 that prevents the C++ runtime from finding the catch block.
in this change, luaL_error() is used to propagate the error to the
caller instead, to work around this issue.