This is mostly for testing: many tests assume that there are no
existing pools. These tests relied on a config to turn off creation of
the "device_health_metrics" pool, which generally exists in any new Ceph
cluster. It would be better to make these tests tolerant of the new .mgr
pool, but there are clearly a lot of them. So just convert the config to
make it work.
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
This creates a new '.mgr' pool for storing a default sqlite3 database
for each mgr module. Each module's database is stored in:
file:///.mgr:<mgr module name>/main.db?vfs=ceph
The "main.db" is the only one used presently but perhaps a module may
want extra databases for some reason. The module name is used for the
RADOS namespace.
Databases are versioned in a common table called MgrModuleKV using the
"__version" key. A mechanism is in place (SCHEMA_VERSIONED) to allow
modules to upgrade their databases over time in a consistent way.
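A minimal sketch of that versioning pattern, written against the plain
sqlite3 C API (the real mechanism lives in the mgr module framework;
the DDL, the target version, and the helper below are illustrative only):
```
#include <sqlite3.h>

static const int MY_SCHEMA_VERSION = 2;  // hypothetical target version

int ensure_schema(sqlite3* db) {
  // Every module database carries a common KV table with a "__version" key.
  sqlite3_exec(db,
    "CREATE TABLE IF NOT EXISTS MgrModuleKV (key TEXT PRIMARY KEY, value NOT NULL)",
    nullptr, nullptr, nullptr);

  // Read the stored schema version; treat a missing row as version 0.
  sqlite3_stmt* stmt = nullptr;
  sqlite3_prepare_v2(db,
    "SELECT value FROM MgrModuleKV WHERE key = '__version'",
    -1, &stmt, nullptr);
  int version = (sqlite3_step(stmt) == SQLITE_ROW)
                  ? sqlite3_column_int(stmt, 0) : 0;
  sqlite3_finalize(stmt);

  // Apply one upgrade step per missing version, then record the result.
  for (; version < MY_SCHEMA_VERSION; ++version) {
    // ... run the DDL that migrates `version` to `version + 1` ...
  }
  sqlite3_prepare_v2(db,
    "REPLACE INTO MgrModuleKV (key, value) VALUES ('__version', ?)",
    -1, &stmt, nullptr);
  sqlite3_bind_int(stmt, 1, MY_SCHEMA_VERSION);
  int rc = (sqlite3_step(stmt) == SQLITE_DONE) ? SQLITE_OK : SQLITE_ERROR;
  sqlite3_finalize(stmt);
  return rc;
}
```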
Fixes: https://tracker.ceph.com/issues/50278
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Yingxin Cheng [Tue, 8 Jun 2021 01:55:15 +0000 (09:55 +0800)]
crimson/onode-staged-tree: use extent_len_t and node_offset_t correctly
extent_len_t represents a value that may be as large as the node size,
but node_offset_t cannot and may overflow. Also add validation when
trying to cast a larger type to node_offset_t.
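A minimal sketch of such a checked narrowing cast, assuming crimson's
uint32_t/uint16_t widths for these aliases (the helper name is made up):
```
#include <cassert>
#include <cstdint>
#include <limits>

using extent_len_t  = uint32_t;  // large enough to hold a full node size
using node_offset_t = uint16_t;  // offsets within a node only

inline node_offset_t to_node_offset(extent_len_t len) {
  // Assert before the cast, so a full-node-sized value cannot silently
  // wrap around when truncated to 16 bits.
  assert(len <= std::numeric_limits<node_offset_t>::max());
  return static_cast<node_offset_t>(len);
}
```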
Kefu Chai [Mon, 7 Jun 2021 08:31:11 +0000 (16:31 +0800)]
*: always include <filesystem>
Since there is no need to be compatible with GCC older than GCC-8,
there is no need to use <experimental/filesystem> as an alternative
to <filesystem> anymore.
Likewise, there is no need to use boost::filesystem as an alternative
to std::filesystem anymore.
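Concretely, the build-time shims reduce to using the standard header
unconditionally; a simplified sketch of the resulting pattern:
```
// With GCC >= 8 guaranteed, no <experimental/filesystem> or
// boost::filesystem fallback is needed anymore:
#include <filesystem>
namespace fs = std::filesystem;

int main() {
  // std::filesystem works with GCC-8 (plus -lstdc++fs at link time;
  // GCC-9 and later need no extra library).
  return fs::exists(fs::current_path()) ? 0 : 1;
}
```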
Kefu Chai [Mon, 7 Jun 2021 08:17:09 +0000 (16:17 +0800)]
cmake: require GCC-8.1 and up
for better C++17 support, for instance better std::filesystem support.
The reason why 8.1 is required is that Ubuntu Focal provides GCC-8.1
and RHEL/CentOS 8 provides GCC-8.4.1, so we only test the build with
GCC-8.1 and up so far.
Kamoltat [Thu, 13 May 2021 17:38:10 +0000 (17:38 +0000)]
pybind/mgr/progress: Disregard unreported pgs
The global recovery event progress calculation only takes into account
PGs with `reported_epoch < start_epoch_of_event`, but sometimes the PGs
don't get moved before or after the creation of the global recovery
event. This can result in a bug where the global event gets stuck
forever, unless another event specifically makes the stuck PGs move and
update their `reported_epoch`.
Therefore, we decided to disregard PGs that are in the active+clean
state but have `reported_epoch < start_epoch_of_event`.
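A minimal sketch of the resulting rule (the real change is in the Python
progress module; the struct and function names here are illustrative):
```
#include <cstdint>

struct PgInfo {
  uint32_t reported_epoch;
  bool active_clean;  // the pg is in the active+clean state
};

// A pg no longer blocks the global recovery event once it is
// active+clean, even if its reported_epoch still predates the event.
bool disregard_for_recovery_event(const PgInfo& pg, uint32_t start_epoch) {
  return pg.active_clean && pg.reported_epoch < start_epoch;
}
```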
crimson/monc: don't serve auth requests without active mon connection.
This is yet another race, which happens when auth request
handling is performed during the `active_con` reset sequence.
It caused the following `nullptr` dereference at Sepia:
```
DEBUG 2021-06-09 10:27:24,059 [shard 0] ms - [osd.6(client) v2:172.21.15.170:6809/33397 >> client.? -@39840] GOT AuthRequestFrame: method=2, preferred_modes={2, 1}, payload_len=174
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4977-g65cb255e/rpm/el8/BUILD/ceph-17.0.0-4977-g65cb255e/src/crimson/mon/MonClient.cc:595:26: runtime error: member call on null pointer of type 'struct Connection'
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4977-g65cb255e/rpm/el8/BUILD/ceph-17.0.0-4977-g65cb255e/src/crimson/mon/MonClient.cc:178:11: runtime error: member access within null pointer of type 'struct Connection'
Segmentation fault on shard 0.
Backtrace:
0# 0x0000563F9C00395F in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007F4A064D0B20 in /lib64/libpthread.so.0
4# crimson::mon::Connection::get_keys() in ceph-osd
5# crimson::mon::Client::handle_auth_request(seastar::shared_ptr<crimson::net::Connection>, seastar::lw_shared_ptr<AuthConnectionMeta>, bool, unsigned int, ceph::buffer::v15_2_0::list const&, ceph::buffer::v15_2_0::list*) in ceph-osd
6# crimson::net::ProtocolV2::_handle_auth_request(ceph::buffer::v15_2_0::list&, bool) in ceph-osd
7# 0x0000563F9D007B39 in ceph-osd
8# 0x0000563F9D008C45 in ceph-osd
9# 0x0000563F95FF8D70 in ceph-osd
10# 0x0000563FA1A560BF in ceph-osd
11# 0x0000563FA1A5B600 in ceph-osd
12# 0x0000563FA1C0D66B in ceph-osd
13# 0x0000563FA176B0EA in ceph-osd
14# 0x0000563FA177520E in ceph-osd
15# main in ceph-osd
16# __libc_start_main in /lib64/libc.so.6
17# _start in ceph-osd
Fault at location: 0xb0
```
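The fix amounts to bailing out when `active_con` is gone. A minimal
sketch of the guard, with types and names simplified from the crimson
code:
```
#include <memory>

struct Connection {
  int get_keys() { return 0; }  // stand-in for the real key exchange
};

struct Client {
  std::shared_ptr<Connection> active_con;  // reset during reconnects

  int handle_auth_request() {
    if (!active_con) {
      // The reset sequence raced with this request: refuse to serve it
      // instead of dereferencing a null Connection (the crash above).
      return -13;  // -EACCES
    }
    return active_con->get_keys();
  }
};
```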
qa/standalone: Use osd op queue = wpq in activate_osd()
This change is a follow-up to commit b6e9c0903d5ad9a699b675f9fa7739e9cce9a5f3,
which set the scheduler to wpq in run_osd() and run_osd_filestore().
In addition, activate_osd() has to set the scheduler type to 'wpq' in
order to be consistent and avoid test failures.
The above is a temporary measure until all the standalone tests are
modified to run well with the mclock_scheduler.
yanqiang-ux [Mon, 7 Jun 2021 07:54:44 +0000 (15:54 +0800)]
osd: set r only on success in FillInVerifyExtent
When a read fails, ret can be taken as the data length in FillInVerifyExtent, which should be avoided.
It may cause errors in crc repair or retried reads because of the bogus data length. In my case, we use FillInVerifyExtent for EC reads;
when we meet -EIO, we try crc repair, which needs to read data from other shards according to the data length.
I hit an assert in ECBackend.cc (loc: line 2288, ceph_assert(range.first != range.second)), but it seems the master branch does not support EC crc repair.
In short, reusing the ReadOp may cause unpredictable errors.
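A minimal sketch of the described fix (names simplified from the OSD
read path): only a non-negative return value is treated as a data
length.
```
#include <cstdint>

struct FillInVerifyExtentSketch {
  uint64_t* r_out;  // where the caller expects the data length

  void finish(int r) {
    if (r >= 0) {
      *r_out = r;  // success: r really is the number of bytes read
    }
    // On failure (e.g. -EIO), leave *r_out untouched so that crc repair
    // or a retried read never interprets a negative errno as a length.
  }
};
```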
Merge pull request #41065 from pponnuvel/tracker_50554
rgw: Improve error message on email id reuse
Reviewed-by: Daniel Gryniewicz <dang@redhat.com>
Reviewed-by: Shilpa Jagannath <smanjara@redhat.com>
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
Moving the attrs into s->bucket_attrs before setting them results in
setting empty attrs on the bucket. This means that reading them back
later yields empty attrs, which can cause a segfault.
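A minimal sketch of this move-ordering bug and its fix (names are made
up; the real code is in rgw's bucket-attr handling):
```
#include <map>
#include <string>
#include <utility>

using Attrs = std::map<std::string, std::string>;

static void persist(const Attrs&) { /* write attrs to the backing store */ }

void set_attrs_buggy(Attrs& bucket_attrs, Attrs attrs) {
  bucket_attrs = std::move(attrs);  // attrs is now empty...
  persist(attrs);                   // ...so empty attrs get persisted
}

void set_attrs_fixed(Attrs& bucket_attrs, Attrs attrs) {
  persist(attrs);                   // persist the real attrs first
  bucket_attrs = std::move(attrs);  // then cache them in the request state
}
```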
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
Zac Dover [Tue, 8 Jun 2021 15:57:13 +0000 (01:57 +1000)]
doc/dev: s/reposotory/repository/ (really)
This corrects the heinous misspelling described in the
substitution expression in the title. This misspelling is
all the more egregious because it appears in a title, and
therefore would be used to create links if it had not been
caught.
crimson/monc: fix races between on_session_opened() and the reset sequence.
The `active_con` can get invalidated at every preemption point. This
includes even the middle of the connection-open sequence, as it's
spread across multiple continuations!
Unfortunately, we don't check for `active_con` in the lambdas inside
the `on_session_opened()` method. That was the reason for the following
crash at Sepia [1]:
```
INFO 2021-06-08 09:36:23,992 [shard 0] monc - do_auth_single: connection closed
/home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-4967-g96cdf983/rpm/el8/BUILD/ceph-17.0.0-4967-g96cdf983/src/crimson/mon/MonClient.cc:399:10: runtime error: member access within null pointer of type 'struct Connection'
Segmentation fault on shard 0.
Backtrace:
0# 0x000055C3C1CA860F in ceph-osd
1# FatalSignal::signaled(int, siginfo_t const*) in ceph-osd
2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in ceph-osd
3# 0x00007FAAE1713B20 in /lib64/libpthread.so.0
4# crimson::mon::Connection::get_conn() in ceph-osd
5# 0x000055C3C2532DA8 in ceph-osd
6# 0x000055C3C2535CB5 in ceph-osd
7# 0x000055C3BBC9FC70 in ceph-osd
8# 0x000055C3C76FAE5F in ceph-osd
9# 0x000055C3C77003A0 in ceph-osd
10# 0x000055C3C78B240B in ceph-osd
11# 0x000055C3C740FE8A in ceph-osd
12# 0x000055C3C7419FAE in ceph-osd
13# main in ceph-osd
14# __libc_start_main in /lib64/libc.so.6
15# _start in ceph-osd
Fault at location: 0x98
```
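A minimal sketch of the fix, with each continuation re-checking
`active_con` (types and names are simplified stand-ins for the
crimson/seastar code):
```
#include <functional>
#include <memory>

struct Connection { void renew_tickets() {} };

struct Client {
  std::shared_ptr<Connection> active_con;  // may be reset at any time

  std::function<void()> make_continuation() {
    return [this] {
      if (!active_con) {
        return;  // the reset raced with the open sequence; bail out
      }
      active_con->renew_tickets();  // safe: checked in this continuation
    };
  }
};
```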