git.apps.os.sepia.ceph.com Git - ceph.git/log
Sage Weil [Mon, 29 Nov 2021 19:18:33 +0000 (14:18 -0500)]
Merge PR #44018 into master
* refs/pull/44018/head:
mon: fix quorum_age() regression
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Mon, 29 Nov 2021 18:56:43 +0000 (13:56 -0500)]
Merge PR #44030 into master
* refs/pull/44030/head:
mgr/cephadm: add some debug output for serve loop
ceph-volume: adjust arguments for 'ceph-volume raw activate'
ceph-volume: add raw support for db/wal for list and activate
Reviewed-by: Sébastien Han <seb@redhat.com>
Sage Weil [Mon, 29 Nov 2021 18:56:28 +0000 (13:56 -0500)]
Merge PR #44107 into master
* refs/pull/44107/head:
qa/tasks/cephadm_cases/test_cli: fix test_daemon_restart
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Sage Weil [Wed, 24 Nov 2021 14:17:03 +0000 (09:17 -0500)]
mgr/cephadm: add some debug output for serve loop
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 20 Nov 2021 15:19:36 +0000 (10:19 -0500)]
ceph-volume: adjust arguments for 'ceph-volume raw activate'
Take a list of devices, so that we can selectively activate a raw osd
with db/wal.
Remove the argument type kludge introduced in
2c228a9a409176c0f1679f176443fd3ead219c7a
since it is no longer needed.
Note that we're making this change because (1) it allows db/wal and (2)
there are no known direct users of 'raw activate'. The only known user
is 'ceph-volume activate', and we've fixed that caller in this commit.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 19 Nov 2021 20:15:18 +0000 (15:15 -0500)]
ceph-volume: add raw support for db/wal for list and activate
Currently 'prepare' doesn't support db/wal, but we want it in list and
activate because 'ceph-volume activate ...' tries raw before lvm.
Note that I'm not sure we really want to accept --block.db and --block.wal
here at all.
Fixes: 3d7ceec684b0ac5b83fae4c397b134236fac485e
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 25 Nov 2021 14:10:28 +0000 (08:10 -0600)]
qa/tasks/cephadm_cases/test_cli: fix test_daemon_restart
We cannot schedule a daemon start if there is another daemon action
with a higher priority (including stop) scheduled. However,
that state isn't cleared until *after* the osd goes down, the
systemctl command returns, and mgr/cephadm gets around to updating
the inventory scheduled_daemon_action state.
Semi-fix: (1) wait for the orch status to change, and then (2)
wait a few more seconds after that.
Signed-off-by: Sage Weil <sage@newdream.net>
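The two-step semi-fix described above (wait for the status to change, then wait a little longer) can be sketched as a small polling helper. This is only an illustration, not the code from the test: `wait_for_change`, `get_status`, and the default timings are hypothetical names and values chosen for the sketch.

```python
import time
from typing import Callable


def wait_for_change(get_status: Callable[[], str],
                    old_status: str,
                    timeout: float = 60.0,
                    interval: float = 1.0,
                    settle: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until get_status() differs from old_status, then settle.

    All names and defaults here are hypothetical; the real test talks to
    the orchestrator rather than to a plain callable.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status != old_status:
            # Step (2): wait a few more seconds so that any bookkeeping
            # that lags behind the visible status change can catch up.
            sleep(settle)
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status stayed {old_status!r} for {timeout}s")
        # Step (1): keep waiting for the status to change.
        sleep(interval)
```

The `sleep` parameter is injectable only so that the helper can be exercised without real delays.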
Sebastian Wagner [Mon, 29 Nov 2021 16:03:22 +0000 (17:03 +0100)]
Merge pull request #44100 from adk3798/infer-config-fix
cephadm: only infer conf from mon if fsid matches
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Sebastian Wagner [Mon, 29 Nov 2021 14:35:32 +0000 (15:35 +0100)]
Merge pull request #44101 from adk3798/agent-down-multiplier
mgr/cephadm: agent: allow agent down multiplier to be configured
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Sebastian Wagner [Mon, 29 Nov 2021 13:08:22 +0000 (14:08 +0100)]
Merge pull request #42378 from sebastian-philipp/no-grafana-admin
mgr/cephadm: Add GrafanaSpec.initial_admin_password
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Juan Miguel Olmo Martínez <jolmomar@redhat.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
Reviewed-by: Patrick Seidensal <pseidensal@suse.com>
Sebastian Wagner [Mon, 29 Nov 2021 08:50:28 +0000 (09:50 +0100)]
Merge pull request #44011 from adk3798/repr-device
python-common: add string representation for Device and DeviceSelection classes
Reviewed-by: Michael Fritch <mfritch@suse.com>
Mykola Golub [Mon, 29 Nov 2021 07:36:08 +0000 (09:36 +0200)]
Merge pull request #44114 from orozery/librbd-memory-leaks
librbd: fix various memory leaks
Reviewed-by: Mykola Golub <mgolub@suse.com>
Samuel Just [Mon, 29 Nov 2021 04:21:39 +0000 (20:21 -0800)]
Merge pull request #43530 from myoungwon/wip-seastore-nvme-device
seastore: add nvme commands to nvme device class
Reviewed-by: Samuel Just <sjust@redhat.com>
Samuel Just [Mon, 29 Nov 2021 01:59:54 +0000 (17:59 -0800)]
Merge pull request #44068 from rzarzynski/wip-crimson-weakref-in-sharedlru
crimson/common: don't assume pointer-from-SharedLRU can't outlive it.
Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
Reviewed-by: Samuel Just <sjust@redhat.com>
Samuel Just [Mon, 29 Nov 2021 00:36:47 +0000 (16:36 -0800)]
Merge pull request #44110 from rzarzynski/wip-crimson-alienstore-syncumountread
crimson/os: fix a shutdown-related race condition in AlienStore.
Reviewed-by: Samuel Just <sjust@redhat.com>
Samuel Just [Mon, 29 Nov 2021 00:10:46 +0000 (16:10 -0800)]
Merge pull request #43481 from myoungwon/wip-dedup-tool-repair
tool: add repair command to ceph-dedup-tool
Reviewed-by: Samuel Just <sjust@redhat.com>
Or Ozeri [Thu, 25 Nov 2021 18:17:26 +0000 (20:17 +0200)]
librbd/crypto: remove unused member from ShutDownCryptoRequest
m_crypto is not used - remove it.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Or Ozeri [Thu, 25 Nov 2021 17:53:52 +0000 (19:53 +0200)]
test/librbd: fix memory leak in TestMockShutDownCryptoRequest
fix memory leak in TestMockShutDownCryptoRequest.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Or Ozeri [Thu, 25 Nov 2021 17:52:48 +0000 (19:52 +0200)]
test/librbd: fix memory leak in TestMockCryptoLoadRequest
fix memory leak in TestMockCryptoLoadRequest.CryptoAlreadyLoaded
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Or Ozeri [Thu, 25 Nov 2021 17:51:45 +0000 (19:51 +0200)]
test/librbd: fix memory leak in TestMockCryptoCryptoObjectDispatch
fix memory leak in TestMockCryptoCryptoObjectDispatch.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Or Ozeri [Thu, 25 Nov 2021 13:49:33 +0000 (15:49 +0200)]
librbd/crypto: fix memory leak in openssl/DataCryptor
Re-initializing the same DataCryptor causes a memory leak of the old encryption key.
This commit fixes this issue.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Or Ozeri [Thu, 25 Nov 2021 13:47:54 +0000 (15:47 +0200)]
librbd/crypto: fix memory leak in ShutDownCryptoRequest
If crypto object dispatch does not exist, a context pointer is leaked.
This commit fixes this issue.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Or Ozeri [Thu, 25 Nov 2021 13:45:00 +0000 (15:45 +0200)]
test/librbd: fix memory leak in TestMockParentCacheObjectDispatch
fix memory leak in TestMockParentCacheObjectDispatch.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Or Ozeri [Thu, 25 Nov 2021 13:44:09 +0000 (15:44 +0200)]
test/librbd: fix memory leak in TestMockCryptoLuksFormatRequest
fix memory leak in TestMockCryptoLuksFormatRequest.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Or Ozeri [Thu, 25 Nov 2021 13:42:22 +0000 (15:42 +0200)]
test/librbd: fix memory leak in TestMockCryptoLuksLoadRequest
fix memory leak in TestMockCryptoLuksLoadRequest.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Or Ozeri [Thu, 25 Nov 2021 13:11:36 +0000 (15:11 +0200)]
test/librbd: fix bad TearDown in TestCryptoOpensslDataCryptor
Fix the TearDown function in TestCryptoOpensslDataCryptor
to call the right class parent function.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Or Ozeri [Thu, 25 Nov 2021 13:10:14 +0000 (15:10 +0200)]
test/librbd: fix memory leak in TestCryptoOpensslDataCryptor
One of the tests leaks an encryption context.
This commit fixes this issue.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Or Ozeri [Thu, 25 Nov 2021 13:08:47 +0000 (15:08 +0200)]
librbd/crypto: fix memory leak when DataCryptor fails
If DataCryptor fails, either in init_context or update_context,
the encryption context is not returned, which causes a memory leak.
This commit fixes this issue.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
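The leak pattern described above can be sketched in a few lines, reading "returned" as "handed back to whatever owns the contexts", which is an assumption on my part. `ContextPool` and `get_initialized_context` are hypothetical stand-ins, not librbd names; the actual fix is C++ and is not reproduced here.

```python
class ContextPool:
    """Hypothetical stand-in for an owner of reusable encryption contexts."""

    def __init__(self, size):
        self._free = [object() for _ in range(size)]

    def get(self):
        return self._free.pop()

    def put(self, ctx):
        self._free.append(ctx)

    def available(self):
        return len(self._free)


def get_initialized_context(pool, init):
    """Take a context from the pool and initialize it.

    If initialization fails, hand the context back before re-raising,
    so the failure path does not leak it.
    """
    ctx = pool.get()
    try:
        init(ctx)
    except Exception:
        pool.put(ctx)
        raise
    return ctx
```

Without the `except` branch, every failed initialization would permanently remove one context from the pool.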
Or Ozeri [Thu, 25 Nov 2021 13:05:55 +0000 (15:05 +0200)]
test/librbd: fix memory leak in TestMockCryptoBlockCrypto
fix memory leak in TestMockCryptoBlockCrypto.
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Sage Weil [Fri, 26 Nov 2021 20:15:51 +0000 (15:15 -0500)]
Merge PR #43997 into master
* refs/pull/43997/head:
mgr/cephadm: make logging about agent less verbose
Reviewed-by: Adam King <adking@redhat.com>
Sage Weil [Fri, 26 Nov 2021 20:15:42 +0000 (15:15 -0500)]
Merge PR #44079 into master
* refs/pull/44079/head:
mgr/cephadm: skip osd_stats check if osd removal queue is empty
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Sage Weil [Fri, 26 Nov 2021 20:15:27 +0000 (15:15 -0500)]
Merge PR #44075 into master
* refs/pull/44075/head:
mgr/cephadm: drop osdspec_affinity tracking
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Sage Weil [Fri, 26 Nov 2021 20:15:12 +0000 (15:15 -0500)]
Merge PR #44073 into master
* refs/pull/44073/head:
pybind/mgr/mgr_module: cache mgr_ip
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Fri, 26 Nov 2021 15:38:58 +0000 (10:38 -0500)]
Merge PR #43936 into master
* refs/pull/43936/head:
qa/tasks/cephadm: pull image to all hosts in parallel
qa/tasks/cephadm: add hosts via mon remote
qa/tasks/cephadm: use shortname for remote directory
qa/tasks/cephadm: deploy no more than 5 mons in roleless mode
qa/tasks/radosbench: default clients to all clients (not client.0)
qa/tasks/ceph_manager: parallelize flush_pg_stats()
qa/suites/big: remove thrasher
qa/suites/big: update for cephadm
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Sage Weil [Fri, 26 Nov 2021 15:37:27 +0000 (10:37 -0500)]
Merge PR #44080 into master
* refs/pull/44080/head:
mgr/cephadm: record when finished with scheduled daemon action
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Sebastian Wagner [Fri, 26 Nov 2021 10:15:51 +0000 (11:15 +0100)]
mgr/cephadm: grafana.ini: Set `cookie_secure = true`
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
Sebastian Wagner [Fri, 16 Jul 2021 14:20:32 +0000 (16:20 +0200)]
mgr/cephadm: Add GrafanaSpec.initial_admin_password
By default, we're not creating any admin account for Grafana now,
but we're adding an option to set the Grafana password manually using:
```yaml
service_type: grafana
spec:
initial_admin_password: mypassword
```
Users can then easily log into Grafana with the given password.
Fixes: https://tracker.ceph.com/issues/48291
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
Sebastian Wagner [Fri, 23 Jul 2021 01:20:43 +0000 (03:20 +0200)]
python-common: Reparent AlertManagerSpec to MonitoringSpec
And remove duplicated members
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
Sebastian Wagner [Fri, 23 Jul 2021 01:15:53 +0000 (03:15 +0200)]
python-common: Move AlertManagerSpec below MonitoringSpec
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
Sebastian Wagner [Fri, 23 Jul 2021 07:05:59 +0000 (09:05 +0200)]
python-common: test_yaml(): add a few tests
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
Sebastian Wagner [Fri, 23 Jul 2021 00:59:59 +0000 (02:59 +0200)]
python-common: prettify `yaml.dump(MonitoringSpec())`
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
Sebastian Wagner [Fri, 23 Jul 2021 00:36:27 +0000 (02:36 +0200)]
python-common: move some tests from cephadm/test_spec.py
Because they don't have any dependencies on cephadm
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
Sebastian Wagner [Thu, 25 Nov 2021 16:54:26 +0000 (17:54 +0100)]
Merge pull request #44106 from sebastian-philipp/mgr-tox-37
mgr/tox.ini: Add python 3.7 environment
Reviewed-by: Adam King <adking@redhat.com>
Sebastian Wagner [Thu, 25 Nov 2021 16:27:26 +0000 (17:27 +0100)]
Merge pull request #43943 from sebastian-philipp/osd-memeory-hyperconverged
doc/cephadm: OSD memory autotuning for hyperconverged
Reviewed-by: Adam King <adking@redhat.com>
Radoslaw Zarzynski [Wed, 24 Nov 2021 15:41:22 +0000 (15:41 +0000)]
crimson/os: fix a shutdown-related race condition in AlienStore.
This is supposed to tackle crashes like the following one:
```
INFO 2021-11-17 16:33:12,048 [shard 0] alienstore - stat
...
DEBUG 2021-11-17 16:33:12,789 [shard 0] ms - [osd.2(hb_front) v2:0.0.0.0:6813/34383 >> osd.0 v2:127.0.0.1:6809/34293@56992] closed!
DEBUG 2021-11-17 16:33:12,791 [shard 0] ms - [osd.2(hb_front) v2:0.0.0.0:6813/34383@53359 >> osd.7 v2:0.0.0.0:6815/34448] closed!
INFO 2021-11-17 16:33:12,795 [shard 0] alienstore - umount
INFO 2021-11-17 16:33:12,804 [shard 0] osd - osd.2: committed_osd_maps(23, 62)
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-8896-gf35358f1/rpm/el8/BUILD/ceph-17.0.0-8896-gf35358f1/src/rocksdb/db/db_impl/db_impl.cc:1615: rocksdb::Status rocksdb::DBImpl::GetImpl(const rocksdb::ReadOptions&, const rocksdb::Slice&, rocksdb::DBImpl::GetImplOptions&): Assertion `get_impl_options.column_family' failed.
Aborting.
Backtrace:
INFO 2021-11-17 16:33:13,542 [shard 0] ms - [osd.2(cluster) v2:172.21.15.17:6804/34383 >> osd.3 v2:172.21.15.17:6806/34387@50001] execute_ready(): fault at READY with nothing to send, going to STANDBY -- std::system_error (error crimson::net:4, read eof)
DEBUG 2021-11-17 16:33:13,542 [shard 0] ms - [osd.2(cluster) v2:172.21.15.17:6804/34383 >> osd.3 v2:172.21.15.17:6806/34387@50001] TRIGGER STANDBY, was READY
0# gsignal in /lib64/libc.so.6
1# abort in /lib64/libc.so.6
2# 0x00007F12FA13FC89 in /lib64/libc.so.6
3# 0x00007F12FA14DA76 in /lib64/libc.so.6
4# rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::DBImpl::GetImplOptions&) in ceph-osd
5# rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*) in ceph-osd
6# rocksdb::DBImpl::Get(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*) in ceph-osd
7# RocksDBStore::get(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned long, ceph::buffer::v15_2_0::list*) in ceph-osd
8# BlueStore::Collection::get_onode(ghobject_t const&, bool, bool) in ceph-osd
9# BlueStore::read(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::v15_2_0::list&, unsigned int) in ceph-osd
10# 0x00005584E516577F in ceph-osd
11# crimson::os::ThreadPool::loop(std::chrono::duration<long, std::ratio<1l, 1000l> >, unsigned long) in ceph-osd
12# 0x00005584E54E71E9 in ceph-osd
13# 0x00007F12FB861BA3 in /lib64/libstdc++.so.6
14# 0x00007F12FBB3C14A in /lib64/libpthread.so.0
15# clone in /lib64/libc.so.6
Content of /proc/self/maps:
7fff7000-8fff7000 rw-p 00000000 00:00 0
```
The problem happened in RocksDB:
```cpp
Status DBImpl::GetImpl(const ReadOptions& read_options, const Slice& key,
                       GetImplOptions& get_impl_options) {
  assert(get_impl_options.value != nullptr ||
         get_impl_options.merge_operands != nullptr);
  assert(get_impl_options.column_family);
  // ...
```
```cpp
Status DBImpl::Get(const ReadOptions& read_options,
                   ColumnFamilyHandle* column_family, const Slice& key,
                   PinnableSlice* value, std::string* timestamp) {
  GetImplOptions get_impl_options;
  get_impl_options.column_family = column_family;
  get_impl_options.value = value;
  get_impl_options.timestamp = timestamp;
  Status s = GetImpl(read_options, key, get_impl_options);
  return s;
}
```
```cpp
int RocksDBStore::get(
  const string& prefix,
  const char *key,
  size_t keylen,
  bufferlist *out)
{
  ceph_assert(out && (out->length() == 0));
  utime_t start = ceph_clock_now();
  int r = 0;
  rocksdb::PinnableSlice value;
  rocksdb::Status s;
  auto cf = get_cf_handle(prefix, key, keylen);
  if (cf) {
    s = db->Get(rocksdb::ReadOptions(),
                cf,
                rocksdb::Slice(key, keylen),
                &value);
  } else {
    string k;
    combine_strings(prefix, key, keylen, &k);
    s = db->Get(rocksdb::ReadOptions(),
                default_cf,
                rocksdb::Slice(k),
                &value);
  }
  // ...
```
It may be explained by a race condition between `AlienStore::stat()`
and `AlienStore::umount()`. Umounting a BlueStore means nullifying
`default_cf`:
```cpp
void RocksDBStore::close()
{
  // ...
  default_cf = nullptr;
  delete db;
  db = nullptr;
}
```
```
INFO 2021-11-17 16:33:12,048 [shard 0] alienstore - stat
...
INFO 2021-11-17 16:33:12,795 [shard 0] alienstore - umount
INFO 2021-11-17 16:33:12,804 [shard 0] osd - osd.2: committed_osd_maps(23, 62)
```
Although `AlienStore` synchronizes `umount()` and `do_transaction()`
with a `seastar::gate`, it lacks a similar mechanism for read-like operations.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
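The gate idea mentioned above can be illustrated outside Seastar. The sketch below is a plain-Python analogy built on `threading`, not the actual patch and not the `seastar::gate` API: `Gate`, `GateClosed`, and `Store` are hypothetical names. It shows why guarding read-like operations with the same gate as `umount()` prevents a read from touching state that teardown has already cleared.

```python
import threading


class GateClosed(Exception):
    """Raised when an operation tries to enter a closed gate."""


class Gate:
    """Counts in-flight operations; close() waits for them to drain."""

    def __init__(self):
        self._cond = threading.Condition()
        self._count = 0
        self._closed = False

    def enter(self):
        with self._cond:
            if self._closed:
                raise GateClosed("gate is closed")
            self._count += 1

    def leave(self):
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def close(self):
        with self._cond:
            # Refuse new entries, then wait for in-flight ones to finish.
            self._closed = True
            while self._count > 0:
                self._cond.wait()


class Store:
    """Hypothetical store whose reads and umount share one gate."""

    def __init__(self):
        self._gate = Gate()
        self._db = {"key": "value"}

    def stat(self, key):
        self._gate.enter()          # fails fast once umount has begun
        try:
            return self._db.get(key)
        finally:
            self._gate.leave()

    def umount(self):
        self._gate.close()          # no reader is inside past this point
        self._db = None             # now safe to tear down
```

With this arrangement a late `stat()` raises `GateClosed` instead of dereferencing the torn-down database.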
Sage Weil [Mon, 15 Nov 2021 18:00:52 +0000 (12:00 -0600)]
qa/tasks/cephadm: pull image to all hosts in parallel
This doesn't affect bootstrap, but it does mean we avoid any delay
the first time we run cephadm.shell on some non-bootstrap host.
Signed-off-by: Sage Weil <sage@newdream.net>
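Pulling on all hosts at once rather than one after another can be sketched with the standard library. This is an assumption-laden illustration, not the qa task's code: `pull_on_all_hosts` and the `pull` callable are hypothetical, and the real task may use a different parallelism helper.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def pull_on_all_hosts(hosts: Iterable[str],
                      pull: Callable[[str], str]) -> Dict[str, str]:
    """Run pull(host) for every host concurrently and collect the results.

    `pull` is a hypothetical callable standing in for "pull the container
    image on this host". Any exception it raises propagates to the caller.
    """
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        # map() preserves input order, so results line up with hosts.
        results = pool.map(pull, hosts)
        return dict(zip(hosts, results))
```

The total wait is then roughly the slowest single pull instead of the sum of all pulls.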
Sage Weil [Mon, 15 Nov 2021 17:55:52 +0000 (11:55 -0600)]
qa/tasks/cephadm: add hosts via mon remote
If we use a new remote for each shell command, we end up waiting
for the image to pull on every host in sequence.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Fri, 12 Nov 2021 20:52:46 +0000 (14:52 -0600)]
qa/tasks/cephadm: use shortname for remote directory
This aligns with what the ceph and syslog tasks do.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 10 Nov 2021 20:48:13 +0000 (14:48 -0600)]
qa/tasks/cephadm: deploy no more than 5 mons in roleless mode
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 10 Nov 2021 17:27:53 +0000 (11:27 -0600)]
qa/tasks/radosbench: default clients to all clients (not client.0)
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 10 Nov 2021 17:23:51 +0000 (11:23 -0600)]
qa/tasks/ceph_manager: parallelize flush_pg_stats()
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 10 Nov 2021 16:35:39 +0000 (10:35 -0600)]
qa/suites/big: remove thrasher
This doesn't work with roleless (yet)
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Mon, 8 Nov 2021 15:29:41 +0000 (09:29 -0600)]
qa/suites/big: update for cephadm
Signed-off-by: Sage Weil <sage@newdream.net>
Sebastian Wagner [Thu, 25 Nov 2021 12:29:01 +0000 (13:29 +0100)]
mgr/cephadm/tests: remove `_deploy_cephadm_binary`
(not needed)
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
Sebastian Wagner [Thu, 25 Nov 2021 12:22:06 +0000 (13:22 +0100)]
mgr/tox.ini: Add python 3.7 environment
Plus fixes.
Signed-off-by: Sebastian Wagner <sewagner@redhat.com>
Adam King [Wed, 24 Nov 2021 23:52:10 +0000 (18:52 -0500)]
mgr/cephadm: agent: allow agent down multiplier to be configured
Signed-off-by: Adam King <adking@redhat.com>
Adam King [Wed, 24 Nov 2021 22:23:01 +0000 (17:23 -0500)]
cephadm: only infer conf from mon if fsid matches
fixes: https://tracker.ceph.com/issues/53394
Signed-off-by: Adam King <adking@redhat.com>
Neha Ojha [Wed, 24 Nov 2021 17:18:09 +0000 (09:18 -0800)]
Merge pull request #43774 from aclamk/fix-bluefs-truncate
Fix data corruption in bluefs truncate()
Reviewed-by: Igor Fedotov <igor.fedotov@croit.io>
Neha Ojha [Wed, 24 Nov 2021 17:17:11 +0000 (09:17 -0800)]
Merge pull request #43875 from liewegas/ceph-cli-better-help
ceph: make -h/--help show match when some args are supplied
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Tue, 23 Nov 2021 17:32:40 +0000 (12:32 -0500)]
pybind/mgr/mgr_module: cache mgr_ip
This does not change for the lifetime of an active mgr module. No need to
keep calling back into Mgr to re-fetch it.
Signed-off-by: Sage Weil <sage@newdream.net>
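The caching described above amounts to fetching a value once and reusing it afterwards. The sketch below is only an illustration under stated assumptions: `ModuleSketch` and `fetch_ip` are hypothetical and do not reproduce the real `MgrModule` code.

```python
class ModuleSketch:
    """Hypothetical module that caches a value which never changes
    during its lifetime."""

    def __init__(self, fetch_ip):
        self._fetch_ip = fetch_ip   # stands in for the call back into the mgr
        self._mgr_ip = None

    def get_mgr_ip(self):
        # Fetch on first use only; later calls reuse the cached value.
        if self._mgr_ip is None:
            self._mgr_ip = self._fetch_ip()
        return self._mgr_ip
```

This is safe only because, as the message notes, the value does not change while the module is active.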
Sebastian Wagner [Wed, 24 Nov 2021 15:42:40 +0000 (16:42 +0100)]
Merge pull request #44092 from sebastian-philipp/cephadm-docs-deployment-scenarios
doc/cephadm: Cephadm docs deployment scenarios
Reviewed-by: Adam King <adking@redhat.com>
Melissa [Tue, 26 Oct 2021 06:46:37 +0000 (02:46 -0400)]
doc/cephadm: deployment scenarios single host and isolated environment
This PR adds a deployment scenarios section to the cephadm docs to document the single-host-defaults flag, and explain how to deploy in an isolated environment.
Signed-off-by: Melissa Li <melissali@redhat.com>
Melissa [Tue, 26 Oct 2021 06:46:37 +0000 (02:46 -0400)]
doc/cephadm: isolated environment and other deployment scenarios
This PR adds a section to the cephadm docs to describe how to install cephadm in different deployment scenarios (setting up a cluster on a single host, and deploying in an isolated environment or private network).
Signed-off-by: Melissa Li <melissali@redhat.com>
Ernesto Puerta [Wed, 24 Nov 2021 11:47:11 +0000 (12:47 +0100)]
Merge pull request #43905 from rhcs-dashboard/fix-53242-master
mgr/dashboard: dashboard does not show degraded objects if they are less than 0.5% under "Dashboard->Capacity->Objects" block
Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Alfonso Martínez [Wed, 24 Nov 2021 10:30:22 +0000 (11:30 +0100)]
Merge pull request #44023 from rhcs-dashboard/kcli-expanded-monitoring
mgr/dashboard: cephadm e2e start script: --expanded: deploy monitoring stack
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Mykola Golub [Wed, 24 Nov 2021 09:58:36 +0000 (11:58 +0200)]
Merge pull request #44064 from MrFreezeex/fix-statusupdater-utest
rbd-mirror: make RemoveImmediateUpdate test synchronous
Reviewed-by: Mykola Golub <mgolub@suse.com>
Alfonso Martínez [Wed, 24 Nov 2021 07:34:10 +0000 (08:34 +0100)]
Merge pull request #44045 from rhcs-dashboard/upgrade-cypress
mgr/dashboard: upgrade Cypress to the latest stable version
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Liu-Chunmei [Wed, 24 Nov 2021 00:46:16 +0000 (16:46 -0800)]
Merge pull request #44019 from liu-chunmei/crimson-background-recovery
crimson/osd: add delay for background_recovery
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Xie Xingguo [Wed, 24 Nov 2021 00:41:22 +0000 (08:41 +0800)]
Merge pull request #43989 from cfsnyder/wip-53308
osd/OSDMap.cc: clean up pg_temp for nonexistent pgs
Reviewed-by: xie xingguo <xie.xingguo@zte.com.cn>
Sage Weil [Wed, 24 Nov 2021 00:40:35 +0000 (19:40 -0500)]
Merge PR #44036 into master
* refs/pull/44036/head:
.github/pull_request_template: drop teuthology reference
.github/pull_request_template: add cleanup option
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
Sage Weil [Wed, 24 Nov 2021 00:32:26 +0000 (19:32 -0500)]
mgr/cephadm: record when finished with scheduled daemon action
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 23 Nov 2021 23:59:28 +0000 (18:59 -0500)]
mgr/cephadm: skip osd_stats check if osd removal queue is empty
Signed-off-by: Sage Weil <sage@newdream.net>
Neha Ojha [Tue, 23 Nov 2021 20:07:41 +0000 (12:07 -0800)]
Merge pull request #44061 from Matan-B/wip-matanb-doc-teuthology
doc/dev: adding Teuthology suggested resources
Reviewed-by: Neha Ojha <nojha@redhat.com>
Ernesto Puerta [Tue, 23 Nov 2021 19:03:22 +0000 (20:03 +0100)]
Merge pull request #43996 from rhcs-dashboard/predefined-labels
mgr/dashboard: Predefine labels in create host form
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
Sage Weil [Tue, 23 Nov 2021 18:38:50 +0000 (13:38 -0500)]
mgr/cephadm: drop osdspec_affinity tracking
We identify which drivespec legacy OSDs belong(ed) to by metadata they
report to the mgr. Modern cephadm does this instead by looking at the
'service' property in the unit.meta file. Having cephadm query the osd
metadata is expensive for large clusters, so let's avoid this and rely
entirely on unit.meta.
Worst case, some upgraded clusters will show OSDs as service 'osd' instead
of service 'osd.whatever' for whatever drivespec created them.
Signed-off-by: Sage Weil <sage@newdream.net>
Radoslaw Zarzynski [Tue, 23 Nov 2021 11:38:31 +0000 (11:38 +0000)]
crimson/common: don't assume pointer-from-SharedLRU can't outlive it.
Initially, we were assuming that no pointer obtained from SharedLRU
can outlive the lru itself. However, since going with the interruption
concept for handling shutdowns, this is no longer valid.
The patch is supposed to deal with crashes like the following one:
```
ceph-osd: /home/jenkins-build/build/workspace/ceph-dev-new-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/17.0.0-8898-ge57ad63c/rpm/el8/BUILD/ceph-17.0.0-8898-ge57ad63c/src/crimson/common/shared_lru.h:46: SharedLRU<K, V>::~SharedLRU() [with K = unsigned int; V = OSDMap]: Assertion `weak_refs.empty()' failed.
Aborting on shard 0.
Backtrace:
Reactor stalled for 1162 ms on shard 0. Backtrace: 0xb14ab 0x46e57428 0x46bc450d 0x46be03bd 0x46be0782 0x46be0946 0x46be0bf6 0x12b1f 0xc8e3b 0x3fdd77e2 0x3fddccdb 0x3fdde1ee 0x3fdde8b3 0x3fdd3f2b 0x3fdd4442 0x3fdd4c3a 0x12b1f 0x3737e 0x21db4 0x21c88 0x2fa75 0x3a5ae1b9 0x3a38c5e2 0x3a0c823d 0x3a1771f1 0x3a1796f5 0x46ff92c9 0x46ff9525 0x46ff9e93 0x46ff8eae 0x46ff8bd9 0x3a160e67 0x39f50c83 0x39f51cd0 0x46b96271 0x46bde51a 0x46d6891b 0x46d6a8f0 0x4681a7d2 0x4681f03b 0x39fd50f2 0x23492 0x39b7a7dd
0# gsignal in /lib64/libc.so.6
1# abort in /lib64/libc.so.6
2# 0x00007F9535E04C89 in /lib64/libc.so.6
3# 0x00007F9535E12A76 in /lib64/libc.so.6
4# crimson::osd::OSD::~OSD() in ceph-osd
5# seastar::shared_ptr_count_for<crimson::osd::OSD>::~shared_ptr_count_for() in ceph-osd
6# seastar::shared_ptr<crimson::osd::OSD>::~shared_ptr() in ceph-osd
7# seastar::futurize<std::result_of<seastar::sharded<crimson::osd::OSD>::stop()::{lambda(seastar::future<void>)#2}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1} ()>::type>::type seastar::smp::submit_to<seastar::sharded<crimson::osd::OSD>::stop()::{lambda(seastar::future<void>)#2}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}>(unsigned int, seastar::smp_submit_to_options, seastar::sharded<crimson::osd::OSD>::stop()::{lambda(seastar::future<void>)#2}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}&&) in ceph-osd
8# std::_Function_handler<seastar::future<void> (unsigned int), seastar::sharded<crimson::osd::OSD>::stop()::{lambda(seastar::future<void>)#2}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}>::_M_invoke(std::_Any_data const&, unsigned int&&) in ceph-osd
9# 0x0000562DA18162CA in ceph-osd
10# 0x0000562DA1816526 in ceph-osd
11# 0x0000562DA1816E94 in ceph-osd
12# 0x0000562DA1815EAF in ceph-osd
13# 0x0000562DA1815BDA in ceph-osd
14# seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)>::direct_vtable_for<seastar::future<void>::then_wrapped_maybe_erase<true, seastar::future<void>, seastar::sharded<crimson::osd::OSD>::stop()::{lambda(seastar::future<void>)#2}>(seastar::sharded<crimson::osd::OSD>::stop()::{lambda(seastar::future<void>)#2}&&)::{lambda(seastar::future<void>&&)#1}>::call(seastar::noncopyable_function<seastar::future<void> (seastar::future<void>&&)> const*, seastar::future<void>&&) in ceph-osd
15# 0x0000562D9476DC84 in ceph-osd
16# 0x0000562D9476ECD1 in ceph-osd
17# 0x0000562DA13B3272 in ceph-osd
18# 0x0000562DA13FB51B in ceph-osd
19# 0x0000562DA158591C in ceph-osd
20# 0x0000562DA15878F1 in ceph-osd
21# 0x0000562DA10377D3 in ceph-osd
22# 0x0000562DA103C03C in ceph-osd
23# main in ceph-osd
24# __libc_start_main in /lib64/libc.so.6
25# _start in ceph-osd
```
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
Arthur Outhenin-Chalandre [Tue, 23 Nov 2021 14:25:46 +0000 (15:25 +0100)]
rbd-mirror: make RemoveImmediateUpdate test synchronous
Try to fix the sporadic failure (linked in the tracker) in
TestMockMirrorStatusUpdater.RemoveImmediateUpdate by making the test
synchronous.
Fixes: https://tracker.ceph.com/issues/53375
Signed-off-by: Arthur Outhenin-Chalandre <arthur.outhenin-chalandre@cern.ch>
Alfonso Martínez [Tue, 23 Nov 2021 14:17:54 +0000 (15:17 +0100)]
mgr/dashboard: upgrade Cypress to the latest stable version
- Remove unneeded dependency that was causing UI performance issues: zone.js
- Ignore 'ResizeObserver loop limit exceeded' error.
- run-frontend-e2e-tests.sh refactoring: create rgw dashboard user through
'ceph dashboard set-rgw-credentials' and use it on rgw buckets' tests.
Fixes: https://tracker.ceph.com/issues/53357
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
Ronen Friedman [Tue, 23 Nov 2021 13:19:33 +0000 (15:19 +0200)]
Merge pull request #43244 from ronen-fr/wip-rf-scrub-command
osd: make 'pg deep-scrub' command initiate a scrub
Reviewed-by: Neha Ojha <nojha@redhat.com>
Venky Shankar [Tue, 23 Nov 2021 12:47:22 +0000 (18:17 +0530)]
Merge pull request #43722 from lxbsz/caps_doc
doc: update the capabilities doc for cephfs
Reviewed-by: Venky Shankar <vshankar@redhat.com>
Matan Breizman [Tue, 23 Nov 2021 11:06:57 +0000 (11:06 +0000)]
doc/dev: adding Teuthology suggested resources
Signed-off-by: Matan Breizman <mbreizma@redhat.com>
Guillaume Abrioux [Tue, 23 Nov 2021 05:12:22 +0000 (06:12 +0100)]
Merge pull request #43982 from guits/refactor_cv_human_readable_func
ceph-volume: human_readable_size() refactor
Kefu Chai [Tue, 23 Nov 2021 02:59:54 +0000 (10:59 +0800)]
Merge pull request #44007 from tchaikov/wip-cmake-python3.10
cmake: check for python(\d)\.(\d+) when building boost
Reviewed-by: Casey Bodley <cbodley@redhat.com>
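Judging by the branch name, the point of the `python(\d)\.(\d+)` pattern is to accept a two-digit minor version such as 3.10. The snippet below checks that behaviour with Python's `re` module purely for illustration; the actual change is in CMake, whose regex dialect differs, and `parse_python_version` is a hypothetical helper.

```python
import re

# Same pattern as in the commit subject, compiled with Python's re module.
PATTERN = re.compile(r"python(\d)\.(\d+)")


def parse_python_version(name):
    """Return (major, minor) if name looks like 'python<M>.<m>', else None."""
    match = PATTERN.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Because the minor group is `\d+`, both `python3.9` and `python3.10` parse, while a name without the dot does not match.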
Ernesto Puerta [Mon, 22 Nov 2021 19:39:50 +0000 (20:39 +0100)]
Merge pull request #43992 from rhcs-dashboard/flaky-inventory-test-fix
mgr/dashboard: fix flaky inventory e2e test
Reviewed-by: Waad Alkhoury <walkhour@redhat.com>
Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
Ernesto Puerta [Mon, 22 Nov 2021 18:43:11 +0000 (19:43 +0100)]
Merge pull request #43958 from rhcs-dashboard/daemon-event-padding
mgr/dashboard: Daemon Events listing using bootstrap class
Reviewed-by: Aashish Sharma <aasharma@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Ernesto Puerta [Mon, 22 Nov 2021 18:41:04 +0000 (19:41 +0100)]
Merge pull request #43866 from rhcs-dashboard/add-hint-provisioned-images
mgr/dashboard: provisioned values is misleading in RBD image table
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Casey Bodley [Mon, 22 Nov 2021 15:59:39 +0000 (10:59 -0500)]
Merge pull request #43843 from cbodley/wip-test-cls-rgw-stats
test/cls/rgw: add index transaction simulator to model bucket stats
Reviewed-by: J. Eric Ivancich <ivancich@redhat.com>
Sebastian Wagner [Mon, 22 Nov 2021 11:00:20 +0000 (12:00 +0100)]
Merge pull request #43888 from mgfritch/cephadm-expect-hostname
cephadm: fixup expect-hostname message
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Sebastian Wagner [Mon, 22 Nov 2021 10:59:19 +0000 (11:59 +0100)]
Merge pull request #43873 from guits/add_shared_folder_shell_cmd
cephadm: add --shared_ceph_folder to shell cmd
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Sebastian Wagner [Mon, 22 Nov 2021 10:27:13 +0000 (11:27 +0100)]
Merge pull request #43876 from sebastian-philipp/all-osd-at-once
mgr/cephadm: create osds at all hosts at once
Reviewed-by: Adam King <adking@redhat.com>
Reviewed-by: Michael Fritch <mfritch@suse.com>
Aashish Sharma [Fri, 12 Nov 2021 10:05:38 +0000 (15:35 +0530)]
mgr/dashboard: dashboard does not show degraded objects if they are less than 0.5% under "Dashboard->Capacity->Objects" block
This PR is intended to fix this issue
Fixes: https://tracker.ceph.com/issues/53242
Signed-off-by: Aashish Sharma <aasharma@redhat.com>
Ernesto Puerta [Mon, 22 Nov 2021 08:14:07 +0000 (09:14 +0100)]
Merge pull request #44033 from ljflores/wip-update-email-id
mailmap: add Laura Flores
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: ljflores <NOT@FOUND>
Reviewed-by: neha-ojha <NOT@FOUND>
Samuel Just [Mon, 22 Nov 2021 06:17:04 +0000 (22:17 -0800)]
Merge pull request #43795 from myoungwon/wip-paddr-split
seastore: generalize paddr_t
Reviewed-by: Samuel Just <sjust@redhat.com>
Nizamudeen A [Thu, 18 Nov 2021 11:09:30 +0000 (16:39 +0530)]
mgr/dashboard: Predefine labels in create host form
Also retains the labels previously created by the user in the form.
Fixes: https://tracker.ceph.com/issues/53315
Signed-off-by: Nizamudeen A <nia@redhat.com>
Nizamudeen A [Thu, 18 Nov 2021 07:13:39 +0000 (12:43 +0530)]
mgr/dashboard: fix flaky inventory e2e test
When the line `inventory.getTableCount('total').should('be.eq', totalDiskCount);`
is executed, the table has not loaded yet, so getTableCount returns 0
on the first try; the second try passes because the table has loaded by
then. But in the orch e2e tests the retries are set to 0, and I am not
sure it makes sense to set them to 1. Instead, I am adapting the test to
expect the count to equal totalDiskCount, so that the test waits a bit
for the table to load.
Fixes: https://tracker.ceph.com/issues/53353
Signed-off-by: Nizamudeen A <nia@redhat.com>
Deepika Upadhyay [Mon, 22 Nov 2021 01:22:31 +0000 (06:52 +0530)]
Merge pull request #43524 from Rethan/feat-expiration-time
rbd: when trash mv, show expiration time if it's not now
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
Reviewed-by: Sunny Kumar <sunkumar@redhat.com>
Deepika Upadhyay [Mon, 22 Nov 2021 01:21:51 +0000 (06:51 +0530)]
Merge pull request #43852 from hualongfeng/show_feature
tools/rbd: make rbd info display dirty-cache feature
Reviewed-by: Deepika Upadhyay <dupadhya@redhat.com>
Deepika Upadhyay [Sun, 21 Nov 2021 17:03:41 +0000 (22:33 +0530)]
Merge pull request #43907 from cybozu/rbd-correct-encoding-of-snap-protection-record-in-exporting
rbd: correct encoding of snap protection record in exporting image
Reviewed-by: Mykola Golub <mykola.golub@clyso.com>
Sage Weil [Sat, 20 Nov 2021 14:55:50 +0000 (08:55 -0600)]
.github/pull_request_template: drop teuthology reference
It is not clear what role this has relative to the needs-qa label.
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Sat, 20 Nov 2021 14:55:20 +0000 (08:55 -0600)]
.github/pull_request_template: add cleanup option
Signed-off-by: Sage Weil <sage@newdream.net>