git.apps.os.sepia.ceph.com Git - ceph-ci.git/log
ceph-ci.git
4 years agosrc/ceph-crash.in: remove unused frame in handler()
Sébastien Han [Mon, 28 Jun 2021 16:36:08 +0000 (18:36 +0200)]
src/ceph-crash.in: remove unused frame in handler()

frame was unused so let's remove it.

Signed-off-by: Sébastien Han <seb@redhat.com>
4 years agosrc/ceph-crash.in: remove unused variable
Sébastien Han [Mon, 28 Jun 2021 16:35:14 +0000 (18:35 +0200)]
src/ceph-crash.in: remove unused variable

stdout was never used so let's remove it.

Signed-off-by: Sébastien Han <seb@redhat.com>
4 years agoMerge pull request #42050 from rzarzynski/wip-crimson-alienstore-fix-attrs-conv
Kefu Chai [Mon, 28 Jun 2021 12:33:38 +0000 (20:33 +0800)]
Merge pull request #42050 from rzarzynski/wip-crimson-alienstore-fix-attrs-conv

crimson/os: fix memory corruption in AlienStore::get_attrs().

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/os: fix memory corruption in AlienStore::get_attrs().
Radoslaw Zarzynski [Sun, 27 Jun 2021 21:50:37 +0000 (21:50 +0000)]
crimson/os: fix memory corruption in AlienStore::get_attrs().

`FuturizedStore` and `ObjectStore` use different memory layouts for
conveying object attributes: map of `bufferlists` and map of `bptrs`
respectively. Unfortunately, `AlienStore` was trying to solve this
mismatch with just a `reinterpret_cast`.

Very likely this problem was the root cause behind the observed
crashes in `PGBackend::load_metadata` like the following one:

```
2021-06-15T09:25:07.511 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]: DEBUG 2021-06-15 09:24:19,199 [shard 0] osd - peering_event(id=412, detail=PeeringEvent(from=7 pgid=5.14 sent=49 requested=49 evt=epoch_sent: 49 epoch_requested: 49 MInfoRec from 7 info: 5.14( v 45'2 (0'0,45'2] local-lis/les=48/49 n=0 ec=44/44 lis/c=48/44 les/c/f=49/45/0 sis=48) pg_lease_ack(ruub 19.176788330s))): complete
2021-06-15T09:25:07.511 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]: Segmentation fault on shard 0.
2021-06-15T09:25:07.511 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]: Backtrace:
2021-06-15T09:25:07.511 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]:  0# 0x000055C99757FFBF in /usr/bin/ceph-osd
2021-06-15T09:25:07.511 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]:  1# FatalSignal::signaled(int, siginfo_t const*) in /usr/bin/ceph-osd
2021-06-15T09:25:07.511 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]:  2# FatalSignal::install_oneshot_signal_handler<11>()::{lambda(int, siginfo_t*, void*)#1}::_FUN(int, siginfo_t*, void*) in /usr/bin/ceph-osd
2021-06-15T09:25:07.512 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]:  3# 0x00007F34BB632B20 in /lib64/libpthread.so.0
2021-06-15T09:25:07.512 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]:  4# 0x000055C99263D4D2 in /usr/bin/ceph-osd
2021-06-15T09:25:07.512 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]:  5# 0x000055C992740E47 in /usr/bin/ceph-osd
2021-06-15T09:25:07.512 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]:  6# seastar::continuation<seastar::internal::promise_base_with_type<std::unique_ptr<PGBackend::loaded_object_md_t, std::default_delete<PGBackend::loaded_object_md_t> > >, seastar::noncopyable_function<crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)84> > >::_future<crimson::errorated_future_marker<std::unique_ptr<PGBackend::loaded_object_md_t, std::default_delete<PGBackend::loaded_object_md_t> > > > (seastar::future<std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > > >&&)>, seastar::future<std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > > >::then_wrapped_nrvo<crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)84> > >::_future<crimson::errorated_future_marker<std::unique_ptr<PGBackend::loaded_object_md_t, std::default_delete<PGBackend::loaded_object_md_t> > > >, seastar::noncopyable_function<crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)84> > >::_future<crimson::errorated_future_marker<std::unique_ptr<PGBackend::loaded_object_md_t, std::default_delete<PGBackend::loaded_object_md_t> > > > (seastar::future<std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, 
ceph::buffer::v15_2_0::list> > > >&&)> >(seastar::noncopyable_function<crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)84> > >::_future<crimson::errorated_future_marker<std::unique_ptr<PGBackend::loaded_object_md_t, std::default_delete<PGBackend::loaded_object_md_t> > > > (seastar::future<std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > > >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<std::unique_ptr<PGBackend::loaded_object_md_t, std::default_delete<PGBackend::loaded_object_md_t> > >&&, seastar::noncopyable_function<crimson::errorator<crimson::unthrowable_wrapper<std::error_code const&, crimson::ec<(std::errc)84> > >::_future<crimson::errorated_future_marker<std::unique_ptr<PGBackend::loaded_object_md_t, std::default_delete<PGBackend::loaded_object_md_t> > > > (seastar::future<std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > > >&&)>&, seastar::future_state<std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > > >&&)#1}, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list, std::less<void>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ceph::buffer::v15_2_0::list> > > >::run_and_dispose() in 
/usr/bin/ceph-osd
2021-06-15T09:25:07.512 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]:  7# 0x000055C99CFD195F in /usr/bin/ceph-osd
2021-06-15T09:25:07.513 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]:  8# 0x000055C99CFD6EA0 in /usr/bin/ceph-osd
2021-06-15T09:25:07.513 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]:  9# 0x000055C99D188F0B in /usr/bin/ceph-osd
2021-06-15T09:25:07.513 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]: 10# 0x000055C99CCE698A in /usr/bin/ceph-osd
2021-06-15T09:25:07.513 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]: 11# 0x000055C99CCF0AAE in /usr/bin/ceph-osd
2021-06-15T09:25:07.513 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]: 12# main in /usr/bin/ceph-osd
2021-06-15T09:25:07.513 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]: 13# __libc_start_main in /lib64/libc.so.6
2021-06-15T09:25:07.514 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]: 14# _start in /usr/bin/ceph-osd
2021-06-15T09:25:07.514 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:19 smithi100 conmon[54917]: Fault at location: 0x31dfff8000
2021-06-15T09:25:07.514 INFO:journalctl@ceph.osd.3.smithi100.stdout:Jun 15 09:24:20 smithi100 podman[55356]: 2021-06-15 09:24:20.230341885 +0000 UTC m=+0.072958807 container died a3ea2a1d0a176286b93b8f5b94458982b9038e70d09128fb55f53b92976f0c42 (image=quay.ceph.io/ceph-ci/ceph@sha256:13ae953e3f83ee011d784d6eb9126fdc692f5bb688fe7d918be61ca7a7282b3c, name=ceph-43579b90-cdba-11eb-8c13-001a4aab830c-osd.3)
```

The fix deals with the issue by wrapping the `bptrs` in `bufferlists`.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agoMerge pull request #41989 from zdover23/wip-doc-cephadm-serve-man-deploy-of-daemons...
Sebastian Wagner [Mon, 28 Jun 2021 09:46:34 +0000 (11:46 +0200)]
Merge pull request #41989 from zdover23/wip-doc-cephadm-serve-man-deploy-of-daemons-2021-06-24

doc/cephadm: enrich "deployment of daemons"

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
4 years agoMerge pull request #41998 from kevinzs2048/arm64-rwl-cache-optional
Kefu Chai [Sun, 27 Jun 2021 14:31:23 +0000 (22:31 +0800)]
Merge pull request #41998 from kevinzs2048/arm64-rwl-cache-optional

ceph.spec.in, debian/rules: enable rbd-rwl-cache by default only on x86_64

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #42021 from tchaikov/wip-rpm-memory-constraint
Kefu Chai [Sun, 27 Jun 2021 11:20:31 +0000 (19:20 +0800)]
Merge pull request #42021 from tchaikov/wip-rpm-memory-constraint

ceph.spec.in: increase memory per core to 3000MB on SUSE distros

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Nathan Cutler <ncutler@suse.com>
4 years agoMerge PR #41574 into master
Sage Weil [Sat, 26 Jun 2021 14:41:27 +0000 (10:41 -0400)]
Merge PR #41574 into master

* refs/pull/41574/head:
qa/tasks/vstart_runner: add LocalCluster.run
qa/tasks/cephfs/test_nfs: fiddle with sudo
mgr/nfs/export: some cleanup, minor refactoring
mgr/nfs/cluster: remove unused @cluster_setter
nfs/mgr: fix help message case
doc/cephfs/fs-nfs-export: add note about export update behavior
mgr/nfs: move user create/delete into helper
mgr/nfs: refactor _delete_user helper
mgr/nfs: refactor create_export_from_dict() helper
mgr/nfs: keep 'nfs export get' around for backward-compat
mgr/nfs: rename method
qa/tasks/cephfs/test_nfs: test new export via apply
doc/cephfs/fs-nfs-export: be consistent with cluster_id and _ vs -
mgr/nfs: addr -> client_addr for 'nfs export create ...'
mgr/nfs: fix tests
mgr/nfs: 'nfs export get' -> 'nfs export info'
mgr/nfs: binding -> pseudo_path
mgr/nfs: more revisions based on review
mgr/nfs: adjust NFSException errno arg
doc/cephfs: update 'nfs export {get,apply}' docs
mgr/nfs: merge FSExport back into ExportMgr
doc/radosgw/nfs: document mgr/nfs way to add/remove rgw exports
mgr/nfs: merge 'nfs export {update,import}' -> 'nfs export apply'
mgr/nfs: test export creation and list
mgr/nfs: test export_update (+ fixes)
mgr/nfs: test Export.validate(); several fixes
mgr/nfs: test that export <-> block+dict conversions go both ways
mgr/nfs: clean up test a bit
mgr/nfs/export: fix export validation
mgr/nfs/export: fix tests
mgr/nfs: handle option addr/client block in create_export()
mgr/nfs: allow multiple addrs for new exports
mgr/nfs: fix/finish rgw export
mgr/nfs/module: clusterid -> cluster_id
mgr/nfs/export: fix export_update_1 to type check
mgr/nfs/cluster: fix type error
mgr/nfs/export: wrap long lines
mgr/nfs: ExportMgr._delete_export only works for cephfs for now
mgr/nfs: Remove pool_ns from NFSCluster
mgr/nfs: Remove ExportMgr.rados_namespace
mgr/nfs: flake8
mgr/nfs: Add type checking
mgr/nfs: Add __eq__ method to Export
mgr/nfs: Add some compatibility to mgr/dashboard
mgr/nfs: Fix whitespace handling
mgr/nfs: Copy unit tests from mgr/dashboard
mgr/nfs: partially implement rgw export support
mgr/nfs: abstract FSAL; add RGWFSAL
mgr/nfs: refactor to merge 'update' and 'import' code
mgr/nfs: add 'nfs export import' command
mgr/nfs: refactor 'nfs export update' and export validation
mgr/nfs: fix _fetch_export to distinguish between clusters
mgr/nfs: move export ganesha conf translation into caller
mgr/nfs: name nfs cephfs client key 'nfs.{cluster_id}.{export_id}'
mgr/nfs: add --addr to 'nfs export create'
mgr/nfs: add --squash to 'nfs export create'
mgr/nfs/export_utils: include false but non-None items in config
vstart.sh: enable nfs module
mgr/cephadm: nfs: drop attr_expiration_time from top-level config
mgr/cephadm: remove Dir_Chunk = 0

Reviewed-by: Michael Fritch <mfritch@suse.com>
4 years agoMerge pull request #41937 from liewegas/mgr-crash
Kefu Chai [Sat, 26 Jun 2021 14:18:14 +0000 (22:18 +0800)]
Merge pull request #41937 from liewegas/mgr-crash

mgr: generate crash dumps for Python exceptions in mgr modules

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41946 from liewegas/fix-51294
Kefu Chai [Sat, 26 Jun 2021 14:17:30 +0000 (22:17 +0800)]
Merge pull request #41946 from liewegas/fix-51294

mgr/devicehealth: fix _get_device_metrics ValueError

Reviewed-by: Patrick Donnelly <pdonnell@redhat.com>
4 years agoqa/tasks/vstart_runner: add LocalCluster.run
Sage Weil [Fri, 25 Jun 2021 23:16:19 +0000 (19:16 -0400)]
qa/tasks/vstart_runner: add LocalCluster.run

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agoqa/tasks/cephfs/test_nfs: fiddle with sudo
Sage Weil [Fri, 25 Jun 2021 19:08:03 +0000 (15:08 -0400)]
qa/tasks/cephfs/test_nfs: fiddle with sudo

- no sudo for 'ceph' commands
- explicit sudo for _sys_cmd (things like 'rados' don't need sudo!)

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agomgr/nfs/export: some cleanup, minor refactoring
Sage Weil [Wed, 23 Jun 2021 16:42:17 +0000 (12:42 -0400)]
mgr/nfs/export: some cleanup, minor refactoring

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agomgr/nfs/cluster: remove unused @cluster_setter
Sage Weil [Thu, 24 Jun 2021 20:05:14 +0000 (16:05 -0400)]
mgr/nfs/cluster: remove unused @cluster_setter

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agoMerge pull request #41977 from rzarzynski/wip-crimson-common-print-more-on-crash
Kefu Chai [Sat, 26 Jun 2021 01:08:34 +0000 (09:08 +0800)]
Merge pull request #41977 from rzarzynski/wip-crimson-common-print-more-on-crash

crimson/common: dump more on faults

Reviewed-by: Samuel Just <sjust@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agonfs/mgr: fix help message case
Sage Weil [Thu, 24 Jun 2021 16:41:18 +0000 (12:41 -0400)]
nfs/mgr: fix help message case

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agodoc/cephfs/fs-nfs-export: add note about export update behavior
Sage Weil [Wed, 23 Jun 2021 16:46:07 +0000 (12:46 -0400)]
doc/cephfs/fs-nfs-export: add note about export update behavior

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agomgr/nfs: move user create/delete into helper
Sage Weil [Tue, 22 Jun 2021 16:25:44 +0000 (12:25 -0400)]
mgr/nfs: move user create/delete into helper

- Do user create or delete via a helper
- Defer until after we have validated the Export (on create or update)
- Support updates to user_id, which is needed to keep the naming consistent
and to also support changing the bucket, since the user_id is derived
from that.

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agoMerge pull request #41838 from p-se/grafana-clean-up
Ernesto Puerta [Fri, 25 Jun 2021 18:45:28 +0000 (20:45 +0200)]
Merge pull request #41838 from p-se/grafana-clean-up

monitoring: Clean up Grafana dashboards

Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: jan--f <NOT@FOUND>
Reviewed-by: p-se <NOT@FOUND>
Reviewed-by: Paul Cuzner <pcuzner@redhat.com>
4 years agoqa/suites/rados/mgr: whitelist module crash during selftest
Sage Weil [Fri, 25 Jun 2021 17:48:45 +0000 (13:48 -0400)]
qa/suites/rados/mgr: whitelist module crash during selftest

One of the selftests triggers an exception from serve().

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agoMerge pull request #41721 from aaryanporwal/telemetry-ident-fix
Ernesto Puerta [Fri, 25 Jun 2021 16:48:34 +0000 (18:48 +0200)]
Merge pull request #41721 from aaryanporwal/telemetry-ident-fix

mgr/dashboard: telemetry activate: show ident fields when checked

Reviewed-by: aaryanporwal <NOT@FOUND>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Avan Thakkar <athakkar@redhat.com>
Reviewed-by: Nizamudeen A <nia@redhat.com>
Reviewed-by: Pere Diaz Bou <pdiazbou@redhat.com>
4 years agoMerge pull request #41991 from dang/wip-dang-bucket-delete
Daniel Gryniewicz [Fri, 25 Jun 2021 16:00:37 +0000 (12:00 -0400)]
Merge pull request #41991 from dang/wip-dang-bucket-delete

RGW - Bucket Remove Op: Pass in user

Reviewed-by: Casey Bodley <cbodley@redhat.com>
4 years agoMerge pull request #41993 from ronen-fr/wip-ronenf-50346
Neha Ojha [Fri, 25 Jun 2021 15:48:45 +0000 (08:48 -0700)]
Merge pull request #41993 from ronen-fr/wip-ronenf-50346

osd/scrub: replace a ceph_assert() with a test

Reviewed-by: Neha Ojha <nojha@redhat.com>
4 years agoMerge pull request #42024 from rzarzynski/wip-crimson-load_obc_nocpy
Kefu Chai [Fri, 25 Jun 2021 13:02:47 +0000 (21:02 +0800)]
Merge pull request #42024 from rzarzynski/wip-crimson-load_obc_nocpy

crimson/osd: don't extra copy hobject in PG::load_head_obc().

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/osd: don't extra copy hobject in PG::load_head_obc().
Radoslaw Zarzynski [Wed, 23 Jun 2021 09:25:41 +0000 (09:25 +0000)]
crimson/osd: don't extra copy hobject in PG::load_head_obc().

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agoceph.spec.in: increase memory per core to 3000MB on SUSE distros
Kefu Chai [Fri, 25 Jun 2021 05:29:23 +0000 (13:29 +0800)]
ceph.spec.in: increase memory per core to 3000MB on SUSE distros

in the KVM instance offered by OBS, we have

[  346s] + cat /proc/meminfo
[  347s] MemTotal:       10167736 kB
[  347s] MemFree:         4983964 kB
[  347s] MemAvailable:    9826800 kB
[  347s] Buffers:           85856 kB
[  347s] Cached:          4615192 kB
[  347s] SwapCached:            0 kB
...
[  347s] SwapTotal:       2097148 kB

and its number of hardware threads is

[  346s] ++ /usr/bin/getconf _NPROCESSORS_ONLN
[  346s] + _threads=8

so ($MemTotal+$SwapTotal)/1024/2600 = 4.6, which is less
than the # of threads, so "4" was used for the number of jobs.

but per our recent observation in
38be14bc0fa32be6877dea08ebd35495d39e464f, some compiling jobs could
take up to 3GB. in the OOM failure in OBS, we had

[24915s] [24848.843594] Out of memory: Killed process 16894 (cc1plus) total-vm:4293756kB, anon-rss:2970012kB, file-rss:0kB, shmem-rss:0kB, UID:399 pgtables:8324kB oom_score_adj:0

where about 4GiB of virtual memory was allocated, of which about 3GiB
was resident. this matches our findings.

in this change, the memory per core is bumped up to 3000MB
in the hope of addressing the OOM. the downside of this change is
that it would take even longer to finish the build if the
building host is limited in memory.
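
The arithmetic above can be reproduced with a short sketch. It is not the
spec file's macro: the function name `build_jobs` and the "at least one job"
floor are assumptions made for illustration, and the figures are the ones
quoted from /proc/meminfo and getconf above.

```python
# Figures copied from the /proc/meminfo and getconf output quoted above.
MEM_TOTAL_KB = 10167736
SWAP_TOTAL_KB = 2097148
HW_THREADS = 8


def build_jobs(mem_per_job_mb: int) -> int:
    """Hypothetical helper: cap the job count by threads and by memory."""
    total_mb = (MEM_TOTAL_KB + SWAP_TOTAL_KB) / 1024
    by_memory = int(total_mb // mem_per_job_mb)
    # Assumption: never drop below one job, even on very small hosts.
    return max(1, min(HW_THREADS, by_memory))


print(build_jobs(2600))  # memory allows 4 jobs, fewer than the 8 threads
print(build_jobs(3000))  # raising the per-job budget lowers this to 3
```

Under these assumptions the same host would run 3 jobs instead of 4, which
is the longer build time the message mentions.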

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #41615 from tchaikov/wip-avl-alloc-ff
Kefu Chai [Fri, 25 Jun 2021 09:01:11 +0000 (17:01 +0800)]
Merge pull request #41615 from tchaikov/wip-avl-alloc-ff

os/bluestore/AvlAllocator: introduce bluestore_avl_alloc_ff_max_* options

Reviewed-by: Igor Fedotov <ifedotov@suse.com>
Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
4 years agoMerge pull request #38939 from ronen-fr/wip-ronenf-scrub-blocked
Kefu Chai [Fri, 25 Jun 2021 06:57:31 +0000 (14:57 +0800)]
Merge pull request #38939 from ronen-fr/wip-ronenf-scrub-blocked

osd: issue a warning if the scrubber blocks for too long on an object

Reviewed-by: David Zafman <dzafman@redhat.com>
4 years agoMerge pull request #40850 from varshar16/wip-vstart-support-cephadm-rgw
Kefu Chai [Fri, 25 Jun 2021 06:51:25 +0000 (14:51 +0800)]
Merge pull request #40850 from varshar16/wip-vstart-support-cephadm-rgw

src/vstart: deploy rgw service with cephadm and create rgw user with system flag

Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge pull request #42020 from athanatos/sjust/wip-cache-assert
Samuel Just [Fri, 25 Jun 2021 06:08:02 +0000 (23:08 -0700)]
Merge pull request #42020 from athanatos/sjust/wip-cache-assert

crimson/os/seastore: transaction conflict handling improvements

Reviewed-by: Yingxin Cheng <yingxin.cheng@intel.com>
4 years agoMerge pull request #42003 from cyx1231st/wip-seastore-fix-onode-tree
Kefu Chai [Fri, 25 Jun 2021 04:52:23 +0000 (12:52 +0800)]
Merge pull request #42003 from cyx1231st/wip-seastore-fix-onode-tree

crimson/onode-staged-tree: fix ref-counter assert failures

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/common: dump entire siginfo on segmentation fault.
Radoslaw Zarzynski [Tue, 22 Jun 2021 14:24:22 +0000 (14:24 +0000)]
crimson/common: dump entire siginfo on segmentation fault.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agocrimson/common: FatalSignal::signaled() takes siginfo by a reference.
Radoslaw Zarzynski [Tue, 22 Jun 2021 14:23:02 +0000 (14:23 +0000)]
crimson/common: FatalSignal::signaled() takes siginfo by a reference.

There is no point in having a distinct `nullptr` value.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agocrimson/common: dump /proc/self/maps on crash.
Radoslaw Zarzynski [Tue, 22 Jun 2021 14:15:40 +0000 (14:15 +0000)]
crimson/common: dump /proc/self/maps on crash.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agoceph.spec.in, debian/rules: Set rbd-rwl-cache optional on arm64 and ppc64le
Kevin Zhao [Thu, 24 Jun 2021 00:00:03 +0000 (08:00 +0800)]
ceph.spec.in, debian/rules: Set rbd-rwl-cache optional on arm64 and ppc64le

make the rwl cache optional on arm64 and ppc64le, as PMDK is not well supported there.
Currently, only 64-bit Linux* and Windows* on x86 are supported by PMDK.

Reference:
1. Experimental support on Arm64, but lacking librpmem:
See: https://github.com/pmem/pmdk#experimental-support-for-64-bit-arm
2. No RPM for PMDK on Arm64:
See: https://bugzilla.redhat.com/show_bug.cgi?id=1340635
3. > Does PMDK support ARM64*?
   > Currently only 64-bit Linux* and Windows* on x86 are supported.
See: https://software.intel.com/content/www/us/en/develop/articles/persistent-memory-faq.html
4. make check fails on Arm64
See: https://github.com/pmem/pmdk/issues/5255

Fixes: https://tracker.ceph.com/issues/51339
Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
4 years agoMerge pull request #41889 from ChenFanTony/mkfs_wait_complete
Kefu Chai [Fri, 25 Jun 2021 02:59:55 +0000 (10:59 +0800)]
Merge pull request #41889 from ChenFanTony/mkfs_wait_complete

osd/OSD: mkfs needs to wait for the transaction to completely finish

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/onode-staged-tree: reset root node after lookup
Yingxin Cheng [Thu, 24 Jun 2021 07:50:18 +0000 (15:50 +0800)]
crimson/onode-staged-tree: reset root node after lookup

Otherwise there could be unexpected references that will break the
asserts when removing nodes during insert/delete.

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
4 years agocrimson/onode-staged-tree: add missing mutable keyword
Yingxin Cheng [Thu, 24 Jun 2021 07:49:23 +0000 (15:49 +0800)]
crimson/onode-staged-tree: add missing mutable keyword

Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
4 years agoMerge pull request #42004 from tchaikov/wip-crimson-osd-fsm
Kefu Chai [Fri, 25 Jun 2021 00:27:41 +0000 (08:27 +0800)]
Merge pull request #42004 from tchaikov/wip-crimson-osd-fsm

crimson/osd: shutdown if osdmap forces us to do so

Reviewed-by: Chunmei Liu <chunmei.liu@intel.com>
Reviewed-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
4 years agoseastore/.../staged_fltree/node: check for conflict in Node::load
Samuel Just [Thu, 24 Jun 2021 23:25:54 +0000 (16:25 -0700)]
seastore/.../staged_fltree/node: check for conflict in Node::load

This will be unnecessary once converted to interruptible_future.

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/os/seastore/lba_manager/btree/lba_btree_node_impl: add debugging
Samuel Just [Thu, 24 Jun 2021 23:22:43 +0000 (16:22 -0700)]
crimson/os/seastore/lba_manager/btree/lba_btree_node_impl: add debugging

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agoseastore/.../node_extent_manager/seastore: detect transaction conflicts in read_extent
Samuel Just [Thu, 24 Jun 2021 23:28:10 +0000 (16:28 -0700)]
seastore/.../node_extent_manager/seastore: detect transaction conflicts in read_extent

This won't be necessary once converted to interruptible_future.

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/os/seastore/cache: mark conflict in get_extent
Samuel Just [Thu, 24 Jun 2021 22:24:09 +0000 (15:24 -0700)]
crimson/os/seastore/cache: mark conflict in get_extent

After wait_io, the extent may have been mutated again, so it may be
invalid.  Check in the caller and mark the transaction conflicted as
needed.

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/os/seastore/transaction: expose is_conflicted
Samuel Just [Thu, 24 Jun 2021 23:27:34 +0000 (16:27 -0700)]
crimson/os/seastore/transaction: expose is_conflicted

Useful for components not yet converted to use interruptible_future.

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agoMerge pull request #41963 from athanatos/sjust/wip-interruptible-tm
Samuel Just [Thu, 24 Jun 2021 20:19:47 +0000 (13:19 -0700)]
Merge pull request #41963 from athanatos/sjust/wip-interruptible-tm

crimson/os/seastore: refactor transaction_manager and below to use interruptible_future

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agomgr/devicehealth: fix _get_device_metrics ValueError
Sage Weil [Sun, 20 Jun 2021 22:49:27 +0000 (17:49 -0500)]
mgr/devicehealth: fix _get_device_metrics ValueError

This appears to have broken with abd35d47696c208990355395d48c1c1e261de95c

The SQL OR doesn't work because in the case that sample is passed,
_t2epoch(min_sample) is 0 and the 0 <= time portion of the expression
is always true.
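
The failure mode described above can be shown with a small, self-contained
sketch. The table, column names and both helper functions are hypothetical
and are not taken from the devicehealth module; the sketch only illustrates
why a lower bound of 0 makes an OR'ed condition match every row, and one
possible way to avoid that by choosing the condition up front.

```python
import sqlite3

# Hypothetical in-memory table standing in for the stored device metrics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (time INTEGER, raw_smart TEXT)")
conn.executemany("INSERT INTO metrics VALUES (?, ?)",
                 [(100, "a"), (200, "b"), (300, "c")])


def fetch_combined(min_epoch, sample_epoch):
    # Single query with OR: if min_epoch is 0, "0 <= time" is always true,
    # so the exact-sample condition never narrows the result.
    rows = conn.execute(
        "SELECT time FROM metrics WHERE ? <= time OR time = ? ORDER BY time",
        (min_epoch, sample_epoch)).fetchall()
    return [r[0] for r in rows]


def fetch_split(min_epoch, sample_epoch=None):
    # One possible alternative: pick exactly one condition per call.
    if sample_epoch is not None:
        sql = "SELECT time FROM metrics WHERE time = ? ORDER BY time"
        args = (sample_epoch,)
    else:
        sql = "SELECT time FROM metrics WHERE ? <= time ORDER BY time"
        args = (min_epoch,)
    return [r[0] for r in conn.execute(sql, args)]


print(fetch_combined(0, 200))  # all three rows, not just the sample
print(fetch_split(0, 200))     # only the requested sample
```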

Fixes: https://tracker.ceph.com/issues/51294
Signed-off-by: Sage Weil <sage@newdream.net>
4 years agotest/crimson/test_interruptible_future: disable handle_error
Samuel Just [Thu, 24 Jun 2021 17:08:34 +0000 (17:08 +0000)]
test/crimson/test_interruptible_future: disable handle_error

Seems to cause a linker hang with gcc-9 in bionic.

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/os/seastore/transaction_manager: pass t by ref to submit_transaction
Samuel Just [Sat, 19 Jun 2021 07:43:27 +0000 (00:43 -0700)]
crimson/os/seastore/transaction_manager: pass t by ref to submit_transaction

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agoMerge pull request #39934 from Jeegn-Chen/wip-tracker-49128
Casey Bodley [Thu, 24 Jun 2021 16:17:53 +0000 (12:17 -0400)]
Merge pull request #39934 from Jeegn-Chen/wip-tracker-49128

rgw: write meta of a MP part to a correct pool

Reviewed-by: Casey Bodley <cbodley@redhat.com>
4 years agoMerge pull request #41739 from liewegas/rgw-realm-metadata
Casey Bodley [Thu, 24 Jun 2021 16:16:19 +0000 (12:16 -0400)]
Merge pull request #41739 from liewegas/rgw-realm-metadata

radosgw: include realm_{id,name} in service map

Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Alfonso Martínez <almartin@redhat.com>
Reviewed-by: Casey Bodley <cbodley@redhat.com>
4 years agoRGW - Bucket Remove Op: Pass in user
Daniel Gryniewicz [Wed, 23 Jun 2021 15:31:22 +0000 (11:31 -0400)]
RGW - Bucket Remove Op: Pass in user

When a bucket remove op is called on the non-master zone, the op is
forwarded to the master zone, but this needs a user, so pass the user
in.

Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
4 years agoMerge pull request #41994 from anthonyeleven/anthonyeleven/adjust-rados-operations...
zdover23 [Thu, 24 Jun 2021 13:51:30 +0000 (23:51 +1000)]
Merge pull request #41994 from anthonyeleven/anthonyeleven/adjust-rados-operations-pools

doc/rados/operations: Update pools.rst

Reviewed-by: Zac Dover <zac.dover@gmail.com>
4 years agoMerge pull request #42005 from trociny/wip-51342
Ilya Dryomov [Thu, 24 Jun 2021 12:48:13 +0000 (14:48 +0200)]
Merge pull request #42005 from trociny/wip-51342

test/librbd: use really invalid domain

Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
4 years agoceph.spec.in: enable --with-rbd_ssd_cache by default
Kefu Chai [Thu, 24 Jun 2021 11:52:51 +0000 (19:52 +0800)]
ceph.spec.in: enable --with-rbd_ssd_cache by default

unlike rbd_rwl_cache, rbd_ssd_cache does not depend on pmdk (libpmem),
so let's enable it on all supported architectures and rpm-based distros.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agotest/librbd: use really invalid domain
Mykola Golub [Thu, 24 Jun 2021 10:23:21 +0000 (11:23 +0100)]
test/librbd: use really invalid domain

in TestMockMigrationHttpClient.OpenResolveFail

Fixes: https://tracker.ceph.com/issues/51342
Signed-off-by: Mykola Golub <mgolub@suse.com>
4 years agoMerge pull request #41828 from tchaikov/wip-btree-alloc
Kefu Chai [Thu, 24 Jun 2021 11:10:22 +0000 (19:10 +0800)]
Merge pull request #41828 from tchaikov/wip-btree-alloc

os/bluestore: add BtreeAllocator

Reviewed-by: Adam Kupczyk <akupczyk@redhat.com>
4 years agocrimson/osd: document fsm of crimson osd
Kefu Chai [Thu, 24 Jun 2021 07:57:53 +0000 (15:57 +0800)]
crimson/osd: document fsm of crimson osd

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/osd: mark more OSD methods private
Kefu Chai [Thu, 24 Jun 2021 06:34:41 +0000 (14:34 +0800)]
crimson/osd: mark more OSD methods private

they are internal helpers, not part of the public interface of the OSD
class.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/osd: shutdown if osdmap forces us to do so
Kefu Chai [Thu, 24 Jun 2021 06:26:07 +0000 (14:26 +0800)]
crimson/osd: shutdown if osdmap forces us to do so

mirror the change introduced by 5dbae13ce0f5b0104ab43e0ccfe94f832d0e1268
in classic osd.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agocrimson/osd: use discard_result() in stop()
Kefu Chai [Thu, 24 Jun 2021 06:16:07 +0000 (14:16 +0800)]
crimson/osd: use discard_result() in stop()

we don't care about the result of the messengers' shutdown() when
shutting down the daemon, and we don't handle its failures.

Signed-off-by: Kefu Chai <kchai@redhat.com>
4 years agoMerge PR #41935 into master
Patrick Donnelly [Wed, 23 Jun 2021 20:24:58 +0000 (13:24 -0700)]
Merge PR #41935 into master

* refs/pull/41935/head:
mds: avoid journaling overhead for ceph.dir.subvolume for no-op case

Reviewed-by: Venky Shankar <vshankar@redhat.com>
4 years agocrimson/os/seastore: convert transaction_manager internally to use interruptible_future
Samuel Just [Sat, 19 Jun 2021 07:40:34 +0000 (00:40 -0700)]
crimson/os/seastore: convert transaction_manager internally to use interruptible_future

Consumers of TransactionManager use wrapper classes InterruptedTransactionManager
and InterruptedTMRef for now until we convert them.

Also converts the users of InterruptedCache etc. and removes those wrappers.

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agotest/crimson/seastore/test_seastore_cache: use cache directly
Samuel Just [Mon, 14 Jun 2021 23:25:14 +0000 (23:25 +0000)]
test/crimson/seastore/test_seastore_cache: use cache directly

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/os/seastore/lba_manager/btree: convert to use interruptible_future
Samuel Just [Fri, 11 Jun 2021 00:21:26 +0000 (17:21 -0700)]
crimson/os/seastore/lba_manager/btree: convert to use interruptible_future

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/os/seastore/cache: convert to use interruptible future
Samuel Just [Tue, 22 Jun 2021 00:10:29 +0000 (17:10 -0700)]
crimson/os/seastore/cache: convert to use interruptible future

Introduces InterruptedCache wrapper for now for components not yet
converted.

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/os/seastore/transaction: introduce TransactionConflictCondition interruptor
Samuel Just [Thu, 3 Jun 2021 21:51:03 +0000 (14:51 -0700)]
crimson/os/seastore/transaction: introduce TransactionConflictCondition interruptor

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/os/seastore/cache.h: remove unused get_extents
Samuel Just [Thu, 3 Jun 2021 21:43:37 +0000 (14:43 -0700)]
crimson/os/seastore/cache.h: remove unused get_extents

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/os/seastore: invalidate transaction referencing invalid extents
Samuel Just [Wed, 12 May 2021 09:04:16 +0000 (09:04 +0000)]
crimson/os/seastore: invalidate transaction referencing invalid extents

Modify read_set to retain a reverse mapping from extents back to
transactions and use it to update Transaction::conflicted upon
invalidation.

Signed-off-by: Samuel Just <sjust@redhat.com>
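
The reverse mapping described above can be pictured with a small model. This is only a sketch in Python, not the seastore C++ code: the `Extent` and `Transaction` classes, the `readers` attribute, and the method names are hypothetical stand-ins chosen for illustration.

```python
class Transaction:
    """Toy transaction: tracks what it read and whether it has conflicted."""

    def __init__(self, name):
        self.name = name
        self.conflicted = False
        self.read_set = set()

    def add_to_read_set(self, extent):
        self.read_set.add(extent)
        # Reverse mapping: the extent remembers which transactions read it.
        extent.readers.add(self)


class Extent:
    """Toy extent: knows the transactions that currently reference it."""

    def __init__(self, addr):
        self.addr = addr
        self.valid = True
        self.readers = set()  # hypothetical name for the reverse mapping

    def invalidate(self):
        self.valid = False
        # Walk the reverse mapping and flag every reader as conflicted.
        for transaction in self.readers:
            transaction.conflicted = True
        self.readers.clear()


a, b = Extent(0x1000), Extent(0x2000)
t1, t2 = Transaction("t1"), Transaction("t2")
t1.add_to_read_set(a)
t2.add_to_read_set(b)
a.invalidate()  # only t1 read extent a, so only t1 becomes conflicted
```

With the back-pointer in place, invalidation can mark the affected transactions directly instead of scanning every open transaction's read set.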
4 years agotest/crimson/test_interruptible_future: add tests for errorated behavior
Samuel Just [Mon, 21 Jun 2021 23:57:48 +0000 (16:57 -0700)]
test/crimson/test_interruptible_future: add tests for errorated behavior

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/common/interruptible_future: add interruptor::base_ertr
Samuel Just [Tue, 15 Jun 2021 00:24:41 +0000 (17:24 -0700)]
crimson/common/interruptible_future: add interruptor::base_ertr

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/common/interruptible_future: add safe_then_interruptible for multiple error...
Samuel Just [Fri, 11 Jun 2021 00:03:37 +0000 (17:03 -0700)]
crimson/common/interruptible_future: add safe_then_interruptible for multiple error handlers

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/common/interruptible_future: refactor handle_interruption
Samuel Just [Fri, 18 Jun 2021 06:19:16 +0000 (23:19 -0700)]
crimson/common/interruptible_future: refactor handle_interruption

handle_interruption can't really be validly used outside of
with_interruption_cond.  Make private, and adjust is_interruption
to not require an instance.

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/common/interruptible_future: remove enable/disable_interruption
Samuel Just [Thu, 17 Jun 2021 21:06:41 +0000 (14:06 -0700)]
crimson/common/interruptible_future: remove enable/disable_interruption

with_interruption_cond needs to check the condition on the way in.
call_with_interruption_impl already has the required machinery, so
let's just use it and dispense with the other helpers.

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/common/interruptible_future: remove unnecessary make_ready_future template
Samuel Just [Wed, 9 Jun 2021 23:09:09 +0000 (16:09 -0700)]
crimson/common/interruptible_future: remove unnecessary make_ready_future template

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/common/interruptible_future: add ready|exception_future_marker constructors
Samuel Just [Mon, 7 Jun 2021 20:20:46 +0000 (13:20 -0700)]
crimson/common/interruptible_future: add ready|exception_future_marker constructors

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/common/interruptible_future: introduce future<> helper to interruptor
Samuel Just [Thu, 3 Jun 2021 21:50:22 +0000 (14:50 -0700)]
crimson/common/interruptible_future: introduce future<> helper to interruptor

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/common/interruptible_future: introduce si_then as shorthand for safe_then_int...
Samuel Just [Thu, 3 Jun 2021 21:49:56 +0000 (14:49 -0700)]
crimson/common/interruptible_future: introduce si_then as shorthand for safe_then_interruptible

safe_then_interruptible is too long for common use within seastore.

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/common/interruptible_future: introduce with_interruption_to_error
Samuel Just [Wed, 2 Jun 2021 03:06:44 +0000 (20:06 -0700)]
crimson/common/interruptible_future: introduce with_interruption_to_error

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/common/interruptible_future: add handle_interruption
Samuel Just [Thu, 10 Jun 2021 00:34:21 +0000 (17:34 -0700)]
crimson/common/interruptible_future: add handle_interruption

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/common/interruptible_future: add futurize::invoke
Samuel Just [Thu, 10 Jun 2021 00:32:44 +0000 (17:32 -0700)]
crimson/common/interruptible_future: add futurize::invoke

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/common/interruptible_future: add common errorator forwards
Samuel Just [Wed, 2 Jun 2021 03:05:12 +0000 (20:05 -0700)]
crimson/common/interruptible_future: add common errorator forwards

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/common/errorator: add futurize::apply
Samuel Just [Wed, 2 Jun 2021 02:52:27 +0000 (19:52 -0700)]
crimson/common/errorator: add futurize::apply

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocommon/interruptible_future: use errorated future as core_type, fix constructor
Samuel Just [Thu, 10 Jun 2021 00:07:33 +0000 (17:07 -0700)]
common/interruptible_future: use errorated future as core_type, fix constructor

No reason really to remember the underlying seastar::future type; we should
only be interacting with the errorated future wrapped type.

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agotest/crimson/test_interruptible_future: using namespace crimson
Samuel Just [Wed, 2 Jun 2021 03:03:35 +0000 (20:03 -0700)]
test/crimson/test_interruptible_future: using namespace crimson

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/common/fixed_kv_node_layout: add reference type for do_for_each implementations
Samuel Just [Mon, 7 Jun 2021 20:14:45 +0000 (13:14 -0700)]
crimson/common/fixed_kv_node_layout: add reference type for do_for_each implementations

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/os/seastore/cache: fix typo in comment
Samuel Just [Tue, 22 Jun 2021 00:10:23 +0000 (17:10 -0700)]
crimson/os/seastore/cache: fix typo in comment

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/os/seastore/cache: rename retire_extent_addr for addr overload
Samuel Just [Thu, 3 Jun 2021 21:51:43 +0000 (14:51 -0700)]
crimson/os/seastore/cache: rename retire_extent_addr for addr overload

Makes the InterruptibleCache bit in the later patch simpler, and is somewhat
clearer.

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agocrimson/os/seastore/lba_manager: make complete_transaction void
Samuel Just [Sat, 19 Jun 2021 09:05:44 +0000 (02:05 -0700)]
crimson/os/seastore/lba_manager: make complete_transaction void

This really can't result in mutations (the transaction already committed!)
and presently doesn't require any IO at all.  Just make it void.

Signed-off-by: Samuel Just <sjust@redhat.com>
4 years agodoc/rados/operations: Updates pools.rst
Anthony D'Atri [Wed, 23 Jun 2021 17:25:13 +0000 (10:25 -0700)]
doc/rados/operations: Updates pools.rst

Add clarity, change example PG counts to a power of two.

Signed-off-by: Anthony D'Atri <anthony.datri@gmail.com>
4 years agoosd/scrub: replace a ceph_assert() with a test
Ronen Friedman [Wed, 23 Jun 2021 17:02:28 +0000 (20:02 +0300)]
osd/scrub: replace a ceph_assert() with a test

We use two distinct conditions to decide whether a candidate PG is already being scrubbed: the OSD checks pgs_scrub_active(), while the PG asserts on the value of PG_STATE_FLAG.
There is a time window in which PG_STATE_FLAG is set but is_scrub_active() is not yet true. is_reserving() covers most of that period, but the ceph_assert comes just before the is_reserving check.

Fixes: https://tracker.ceph.com/issues/50346
Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
4 years agomgr/telemetry: redact python crash dump in telemetry
Sage Weil [Sat, 19 Jun 2021 16:56:18 +0000 (12:56 -0400)]
mgr/telemetry: redact python crash dump in telemetry

Include the exception value in the crash dump, but redact it in telemetry.
That way the operator can see it (it's useful info!) but we don't risk
sharing identifying data via telemetry.

Signed-off-by: Sage Weil <sage@newdream.net>
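
A minimal sketch of the split between the local crash dump and the telemetry report, assuming the exception value lives under a single key. The key name `exception_value`, the `<redacted>` placeholder, and the function name are hypothetical; the commit message does not name them.

```python
import copy

# Hypothetical key; the commit message does not say which field carries the value.
EXCEPTION_VALUE_KEY = "exception_value"


def redact_for_telemetry(crash):
    """Return a copy of a crash record with the exception value masked."""
    report = copy.deepcopy(crash)  # never modify the operator's local record
    if EXCEPTION_VALUE_KEY in report:
        report[EXCEPTION_VALUE_KEY] = "<redacted>"
    return report


local_crash = {
    "mgr_module": "balancer",
    "mgr_python_exception": "RuntimeError",
    EXCEPTION_VALUE_KEY: "cannot reach host node01.example.com",
}
shared = redact_for_telemetry(local_crash)
```

The operator still sees the full message in `local_crash`, while `shared` keeps only the fields that do not identify the system.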
4 years agomgr/telemetry: fix telemetry crash integration
Sage Weil [Sat, 19 Jun 2021 16:54:48 +0000 (12:54 -0400)]
mgr/telemetry: fix telemetry crash integration

Broken by 8c009e278aec83ea6e12f28bf0c3351204d2efa5

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agomgr/crash: separate RECENT_MGR_MODULE_CRASH error for mgr module crashes
Sage Weil [Sat, 19 Jun 2021 16:21:47 +0000 (12:21 -0400)]
mgr/crash: separate RECENT_MGR_MODULE_CRASH error for mgr module crashes

Generate a different warning for crashes in mgr module python code, as
they do not mean that the entire mgr daemon crashed.  Document.

Signed-off-by: Sage Weil <sage@newdream.net>
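
One way to picture the separation is a function that sorts recent crash records into two health checks. This is a sketch, not the crash module's code: it assumes a module crash can be recognized by a `mgr_module` field in the record, and the function name and sample records are made up for illustration.

```python
def classify_recent_crashes(crashes):
    """Group recent crash ids by the health check they would raise."""
    checks = {"RECENT_CRASH": [], "RECENT_MGR_MODULE_CRASH": []}
    for crash in crashes:
        # Assumption: only mgr module crashes carry a "mgr_module" field.
        if "mgr_module" in crash:
            checks["RECENT_MGR_MODULE_CRASH"].append(crash["crash_id"])
        else:
            checks["RECENT_CRASH"].append(crash["crash_id"])
    # Report only the checks that have at least one crash behind them.
    return {name: ids for name, ids in checks.items() if ids}


result = classify_recent_crashes([
    {"crash_id": "c1", "entity_name": "osd.3"},
    {"crash_id": "c2", "entity_name": "mgr.x", "mgr_module": "balancer"},
])
```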
4 years agomgr: generate crash dump for python exceptions
Sage Weil [Fri, 18 Jun 2021 21:02:40 +0000 (17:02 -0400)]
mgr: generate crash dump for python exceptions

Extend handle_pyerror() to generate a crash dump.  Pass some additional
context through from the callers (including the ability to not generate
a crash dump in the CLI handler case).

Extra crash dump fields look like so:

    "backtrace": [
        "  File \"/home/sage/src/ceph/src/pybind/mgr/balancer/module.py\", line 652, in serve\n    self.ifail()",
        "  File \"/home/sage/src/ceph/src/pybind/mgr/balancer/module.py\", line 648, in ifail\n    raise RuntimeError('test')",
    ],
    "mgr_module": "balancer",
    "mgr_module_caller": "PyModuleRunner::serve",
    "mgr_python_exception": "RuntimeError",

Notably, the backtrace deliberately excludes the 'value' of the exception,
as that may leak identifying information about the system.  Instead, we
only include the exception *type* and the portion of the traceback that
identifies the call path (where in the code we crashed).

Also note: a side-effect of this change is that module exceptions will
trigger cluster health warnings about daemon crashes.

Signed-off-by: Sage Weil <sage@newdream.net>
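
The idea of recording the exception type and call path but not the exception value can be sketched in plain Python. The real change lives in the C++ handle_pyerror() path; the helper below is a hypothetical illustration that only uses the standard `traceback` module, and the sample module name and caller are copied from the example fields above.

```python
import traceback


def crash_fields_from_exception(exc, module_name, caller):
    """Build crash-dump fields that show the call path but not str(exc)."""
    # format_list() renders each frame as 'File "...", line N, in func'
    # plus the source line when it is available.
    frames = traceback.format_list(traceback.extract_tb(exc.__traceback__))
    return {
        "backtrace": [frame.rstrip("\n") for frame in frames],
        "mgr_module": module_name,
        "mgr_module_caller": caller,
        # Only the exception type is recorded; the value is left out on purpose.
        "mgr_python_exception": type(exc).__name__,
    }


def ifail(detail):
    raise RuntimeError(detail)


SECRET = "host-1234"
try:
    ifail(SECRET)
except RuntimeError as e:
    fields = crash_fields_from_exception(e, "balancer", "PyModuleRunner::serve")
```

Note that the frames still quote source lines, so a literal written directly in a `raise` statement would appear in the backtrace; only the runtime value of the exception is kept out.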
4 years agocommon/BackTrace: refactor into Clib and Py implementations
Sage Weil [Sat, 19 Jun 2021 16:00:54 +0000 (12:00 -0400)]
common/BackTrace: refactor into Clib and Py implementations

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agocommon/BackTrace: accept list of strings to ctor
Sage Weil [Fri, 18 Jun 2021 20:58:32 +0000 (16:58 -0400)]
common/BackTrace: accept list of strings to ctor

This may seem a bit backwards: we take a nice C++ list<string> and do the
C dance.  It's a bit defensive: this class is used in the segv handler
(in the backtrace() and backtrace_symbol() path), so we want to minimize
the work we do on the heap in that case.  (For the list<string> path,
we can do whatever we like.)

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agoglobal/signal_handler: expose function to generate crash dump
Sage Weil [Fri, 18 Jun 2021 20:58:58 +0000 (16:58 -0400)]
global/signal_handler: expose function to generate crash dump

Signed-off-by: Sage Weil <sage@newdream.net>
4 years agoMerge pull request #41990 from liewegas/fix-51292
Kefu Chai [Wed, 23 Jun 2021 16:58:52 +0000 (00:58 +0800)]
Merge pull request #41990 from liewegas/fix-51292

qa/suites/rados/dashboard: fix e2e test

Reviewed-by: Kefu Chai <kchai@redhat.com>
4 years agoqa/suites/rados/dashboard: fix e2e test
Sage Weil [Wed, 23 Jun 2021 14:49:28 +0000 (09:49 -0500)]
qa/suites/rados/dashboard: fix e2e test

Move roles into task yaml.  Rename e2e.

Fixes: https://tracker.ceph.com/issues/51292
Signed-off-by: Sage Weil <sage@newdream.net>
4 years agodoc/cephadm: enrich "deployment of daemons"
Zac Dover [Wed, 23 Jun 2021 14:32:06 +0000 (00:32 +1000)]
doc/cephadm: enrich "deployment of daemons"

service-management.rst contained a section called
"Deployment of Daemons" that needed a rewrite by
a native English speaker.

Signed-off-by: Zac Dover <zac.dover@gmail.com>