Alfonso Martínez [Wed, 19 Feb 2020 07:39:55 +0000 (08:39 +0100)]
mgr/dashboard: coverage venv python version same as mgr
As https://github.com/ceph/ceph/pull/31525 is merged,
coverage dep. in run-backend-api-tests.sh has to be installed in venv
with the same python version as ceph-mgr.
Signed-off-by: Alfonso Martínez <almartin@redhat.com>
Sebastian Wagner [Wed, 12 Feb 2020 15:21:05 +0000 (16:21 +0100)]
mgr/orchestrator: Use CLICommand, except its global variable
`CLICommand.COMMANDS` is a global variable that prevents
anyone from importing other modules, as the `COMMANDS` are then
merged together. Let's use a metaclass instead of a global variable.
Signed-off-by: Sebastian Wagner <sebastian.wagner@suse.com>
Yingxin Cheng [Mon, 10 Feb 2020 09:00:31 +0000 (17:00 +0800)]
crimson/net: remove duplicated error codes and conditions
The duplicated error codes and conditions were originally introduced to
match connection errors with both system category (thrown by seastar)
and generic category (thrown by standard library). Since error_code
with system category can be matched by error_condition with generic
category (see std::errc and
system_error_category::default_error_condition(int)), our duplicated
counterparts are not needed actually.
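A minimal sketch of the matching rule described above, assuming a POSIX platform and a standard library whose system category maps errno values to generic conditions. The helper names `make_system_error` and `is_connection_reset` are hypothetical and are not part of crimson:

```cpp
#include <cerrno>
#include <system_error>

// Hypothetical helper: build an error_code the way a system-level library
// would report it, i.e. an errno value tagged with the system category.
std::error_code make_system_error(int err) {
  return std::error_code(err, std::system_category());
}

// Hypothetical helper: std::errc::connection_reset is an error_condition
// enumerator in the generic category. operator== asks the code's category
// for its default_error_condition(), so a system-category code carrying
// ECONNRESET is expected to compare equal without a duplicated enum.
bool is_connection_reset(const std::error_code& ec) {
  return ec == std::errc::connection_reset;
}
```

With this in place, a single comparison against `std::errc` should cover errors raised with either category, which is why the duplicated counterparts become redundant.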
Matthew Oliver [Fri, 10 Jan 2020 03:17:11 +0000 (03:17 +0000)]
rgw: make radosgw-admin user create and modify distinct
Currently if you run 'radosgw-admin user create ..' when the user
already exists and you happen to specify, at least, '--uid' and
'--display-name' that match the existing user, radosgw-admin will
actually go modify the existing user.
This behaviour is a little confusing, hence the bug this patch is
fixing. This patch instead simplifies the tool to make
'create' create and 'modify' modify.
Meaning when you go 'create' a user that already exists, you'll get an
error, as expected. If you want to modify a user, you actually have to
use 'modify'.
For example, now:
$ radosgw-admin user create --uid="test-user" --display-name="test user"
could not create user: unable to create user, user: test-user exists
Signed-off-by: Matthew Oliver <moliver@suse.com>
Fixes: https://tracker.ceph.com/issues/38619
Kefu Chai [Sun, 16 Feb 2020 11:05:09 +0000 (19:05 +0800)]
crimson/admin: no need to check for '\n'
as we don't need to mimic the behavior of classic OSD, what we need is
to fulfill the needs of ceph cli. see `admin_socket()` in
`src/pybind/ceph_daemon.py`, which sends a `\0` to indicate the end of a
command.
Kefu Chai [Sun, 16 Feb 2020 10:03:24 +0000 (18:03 +0800)]
crimson/asok: disconnect client when shutdown
track the established connection as well. please note, the current asok
implementation only allows a single connection at a time, even
though unix domain sockets allow multiple concurrent clients. so there
is no need to track multiple clients at this moment.
Kefu Chai [Sun, 16 Feb 2020 08:40:04 +0000 (16:40 +0800)]
crimson/asok: do not assume the order of param eval
* do not assume the order of parameter evaluation. before this change,
we had `do_with(cn.input(), cn.output(), std::move(cn) ...)`, see
https://en.cppreference.com/w/cpp/language/eval_order,
> side effects of the initialization of every parameter are
> indeterminately sequenced with respect to value computations and side
> effects of any other parameter.
we cannot move `cn` out and then call its member functions. so
introduce a struct for capturing its input and output.
* move `do_until_gate()` into `start()`, no need to check if
gate is stopped in `safe_action`, as `seastar::do_until()` will do
this for us.
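The evaluation-order hazard from the first item can be sketched as follows. This is an illustration only: `Conn`, `Streams`, `serve` and `dispatch_safely` are hypothetical stand-ins, not the types used in crimson, and the by-value parameters are an assumption made so that the move really happens during parameter initialization.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical connection: input() dereferences buf, so calling it on a
// moved-from Conn (buf == nullptr) would be undefined behaviour.
struct Conn {
  std::unique_ptr<std::string> buf = std::make_unique<std::string>("status");
  const std::string& input() const { return *buf; }
};

// Hypothetical struct capturing what we need from the connection.
struct Streams {
  std::string in;
};

// Takes both arguments by value, so `cn` is move-constructed while the
// parameters are being initialized.
std::size_t serve(Streams s, Conn cn) {
  return s.in.size() + cn.buf->size();
}

std::size_t dispatch_safely(Conn cn) {
  // Risky: serve(Streams{cn.input()}, std::move(cn));
  // Parameter initializations are indeterminately sequenced, so `cn` may
  // already be moved-from when cn.input() runs.
  Streams s{cn.input()};  // fully evaluated before the next statement
  return serve(std::move(s), std::move(cn));
}
```

Capturing the needed pieces in a separate statement puts a sequence point between reading from `cn` and moving it, which is the property the original one-liner lacked.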
Kefu Chai [Sun, 16 Feb 2020 02:03:36 +0000 (10:03 +0800)]
crimson: refactor asok command
* do not define another iterator type, use `map::const_iterator`
directly
* do not register hooks/commands with server block, register them
one by one, much simpler this way.
* encapsulate the hook metadata in `AdminSocketHook`, so each
`AdminSocketHook` instance is self-contained in the sense that
we don't need to use an extra type for keeping track of them.
Sage Weil [Sat, 15 Feb 2020 17:40:08 +0000 (11:40 -0600)]
cephadm: separate out required files in config-json
- Put files in a subsection of the config-json.
- Also, consolidate the sanity checks into one place (command_deploy)
instead of duplicating them in create_daemon_dirs.
Paul Cuzner [Wed, 29 Jan 2020 03:10:37 +0000 (16:10 +1300)]
cephadm: add alertmanager deployment feature
Deploy now accepts a daemon_type of alertmanager. Since alertmanager
is a cluster-aware service, the monitoring metadata has been updated to
allow a daemon to use multiple ports. In addition, when config_json is
received, any 'key' prefixed by '_' is skipped when creating files in the
daemon's etc directory. Keys that use the '_' prefix hold config data that
can be used elsewhere. In the case of alertmanager, a _peers parameter
is required, which is used to add --cluster.peer=<ip>:<port> to the
container command to form the alertmanager cluster.
Samuel Just [Fri, 17 Jan 2020 21:04:30 +0000 (13:04 -0800)]
crimson/common/errorator: restrict all_same_way to valid types
As with pass_further/discard_all, we don't want the returned handler
to work on types outside of the errorator at all. Otherwise, the
handler will transparently apply to any error.
Samuel Just [Fri, 17 Jan 2020 20:33:42 +0000 (12:33 -0800)]
crimson/common/errorator: fix errorator::pass_further and discard_all
Previously, both of these were invocable on all errors, but would
static_assert on invalid ones. What we actually want is for them
to only be invocable on valid errors. That way, we can do, for
instance:
}).handle_error(
roll_journal_segment_ertr::pass_further{},
SegmentManager::open_ertr::all_same_way([this](auto &&e) {
logger().error(
"error {} in close segment {}",
e,
current_journal_segment_id);
ceph_assert(0 == "error in close");
return;
})
to explicitly propagate any errors in roll_journal_segment_ertr
while asserting on anything else.
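The difference between "invocable on everything but static_asserts" and "only invocable on valid errors" can be sketched with plain SFINAE. This is not the errorator implementation: `only_for`, `catch_all`, `dispatch` and the error types below are hypothetical, and how errorator actually selects a handler is an assumption here.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical error types.
struct io_error {};
struct no_space {};
struct other_error {};

// Handler whose call operator exists only for the listed error types.
// Because the constraint is part of the signature, std::is_invocable
// reports false for anything else instead of tripping a static_assert.
template <class... Allowed>
struct only_for {
  template <class E>
  static constexpr bool allowed_v =
      (std::is_same_v<std::decay_t<E>, Allowed> || ...);

  template <class E, class = std::enable_if_t<allowed_v<E>>>
  int operator()(E&&) const { return 1; }
};

// Unconstrained handler: accepts any error.
struct catch_all {
  template <class E>
  int operator()(E&&) const { return 2; }
};

// Try the first handler and fall through to the second when the first is
// not invocable on this error type.
template <class E, class First, class Fallback>
int dispatch(E&& e, const First& first, const Fallback& fallback) {
  if constexpr (std::is_invocable_v<const First&, E>) {
    return first(std::forward<E>(e));
  } else {
    return fallback(std::forward<E>(e));
  }
}

using journal_errors = only_for<io_error, no_space>;
static_assert(std::is_invocable_v<journal_errors, io_error>);
static_assert(!std::is_invocable_v<journal_errors, other_error>);
```

Had `only_for` accepted every type and used a `static_assert` in its body, `std::is_invocable_v` would be true for `other_error` too, and the fall-through to the second handler could never be selected.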
"If multiple threads of execution access the same std::shared_ptr
object without synchronization and any of those accesses uses
a non-const member function of shared_ptr then a data race will
occur (...)"
One of the coredumps showed the `shared_ptr`-typed `OSD::osdmap`
with healthy looking content but damaged control block:
```
[Current thread is 1 (Thread 0x7f7dcaf73700 (LWP 205295))]
(gdb) bt
#0 0x0000559cb81c3ea0 in ?? ()
#1 0x0000559c97675b27 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
#3 0x0000559c975ef8aa in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#4 std::__shared_ptr<OSDMap const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1167
#5 std::shared_ptr<OSDMap const>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/8/bits/shared_ptr.h:103
#6 OSD::create_context (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9053
#7 0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...)
at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
#8 0x0000559c97886db6 in ceph::osd::scheduler::PGPeeringItem::run (this=<optimized out>, osd=<optimized out>, sdata=<optimized out>, pg=..., handle=...) at /usr/include/c++/8/ext/atomicity.h:96
#9 0x0000559c9764862f in ceph::osd::scheduler::OpSchedulerItem::run (handle=..., pg=..., sdata=<optimized out>, osd=<optimized out>, this=0x7f7dcaf703f0) at /usr/include/c++/8/bits/unique_ptr.h:342
#10 OSD::ShardedOpWQ::_process (this=<optimized out>, thread_index=<optimized out>, hb=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:10677
#11 0x0000559c97c76094 in ShardedThreadPool::shardedthreadpool_worker (this=0x559ca22aca28, thread_index=14) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.cc:311
#12 0x0000559c97c78cf4 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/common/WorkQueue.h:706
#13 0x00007f7df17852de in start_thread () from /lib64/libpthread.so.0
#14 0x00007f7df052f133 in __libc_ifunc_impl_list () from /lib64/libc.so.6
#15 0x0000000000000000 in ?? ()
(gdb) frame 7
#7 0x0000559c97655571 in OSD::dequeue_peering_evt (this=0x559ca22ac000, sdata=0x559ca2ef2900, pg=0x559cb4aa3400, evt=std::shared_ptr<PGPeeringEvent> (use count 2, weak count 0) = {...}, handle=...)
at /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc:9665
9665 in /usr/src/debug/ceph-15.0.0-10071.g5b5a3a3.el8.x86_64/src/osd/OSD.cc
(gdb) print osdmap
$24 = std::shared_ptr<const OSDMap> (expired, weak count 0) = {get() = 0x559cba028000}
(gdb) print *osdmap
# pretty sane OSDMap
(gdb) print sizeof(osdmap)
$26 = 16
(gdb) x/2a &osdmap
0x559ca22acef0: 0x559cba028000 0x559cba0ec900
(gdb) frame 2
#2 std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x559cba0ec900) at /usr/include/c++/8/bits/shared_ptr_base.h:148
148 /usr/include/c++/8/bits/shared_ptr_base.h: No such file or directory.
(gdb) disassemble
Dump of assembler code for function std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release():
...
0x0000559c97675b1e <+62>: mov (%rdi),%rax
0x0000559c97675b21 <+65>: mov %rdi,%rbx
0x0000559c97675b24 <+68>: callq *0x10(%rax)
=> 0x0000559c97675b27 <+71>: test %rbp,%rbp
...
End of assembler dump.
(gdb) info registers rdi rbx rax
rdi 0x559cba0ec900 94131624790272
rbx 0x559cba0ec900 94131624790272
rax 0x559cba0ec8a0 94131624790176
(gdb) x/a 0x559cba0ec8a0 + 0x10
0x559cba0ec8b0: 0x559cb81c3ea0
(gdb) bt
#0 0x0000559cb81c3ea0 in ?? ()
...
(gdb) p $_siginfo._sifields._sigfault.si_addr
$27 = (void *) 0x559cb81c3ea0
```
Helgrind seems to agree:
```
==00:00:02:54.519 510301== Possible data race during write of size 8 at 0xF123930 by thread #90
==00:00:02:54.519 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
==00:00:02:54.519 510301== at 0x7218DD: operator= (shared_ptr_base.h:1078)
==00:00:02:54.519 510301== by 0x7218DD: operator= (shared_ptr.h:103)
==00:00:02:54.519 510301== by 0x7218DD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
==00:00:02:54.519 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:02:54.519 510301== by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:02:54.519 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:02:54.519 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:02:54.519 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:02:54.519 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
==00:00:02:54.519 510301==
==00:00:02:54.519 510301== This conflicts with a previous read of size 8 by thread #117
==00:00:02:54.519 510301== Locks held: 1, at address 0x2123E9A0
==00:00:02:54.519 510301== at 0x6B5842: __shared_ptr (shared_ptr_base.h:1165)
==00:00:02:54.519 510301== by 0x6B5842: shared_ptr (shared_ptr.h:129)
==00:00:02:54.519 510301== by 0x6B5842: get_osdmap (OSD.h:1700)
==00:00:02:54.519 510301== by 0x6B5842: OSD::create_context() (OSD.cc:9053)
==00:00:02:54.519 510301== by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
==00:00:02:54.519 510301== by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
==00:00:02:54.519 510301== by 0x70E62E: run (OpSchedulerItem.h:148)
==00:00:02:54.519 510301== by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
==00:00:02:54.519 510301== by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
==00:00:02:54.519 510301== by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
==00:00:02:54.519 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:02:54.519 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:02:54.519 510301== Address 0xf123930 is 3,824 bytes inside a block of size 10,296 alloc'd
==00:00:02:54.519 510301== at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
==00:00:02:54.519 510301== by 0x66F766: main (ceph_osd.cc:688)
==00:00:02:54.519 510301== Block was alloc'd by thread #1
```
Actually there are plenty of similar issues reported, like:
```
==00:00:05:04.903 510301== Possible data race during read of size 8 at 0x1E3E0588 by thread #119
==00:00:05:04.903 510301== Locks held: 1, at address 0x1EAD41D0
==00:00:05:04.903 510301== at 0x753165: clear (hashtable.h:2051)
==00:00:05:04.903 510301== by 0x753165: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
==00:00:05:04.903 510301== by 0x75331C: ~unordered_map (unordered_map.h:102)
==00:00:05:04.903 510301== by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
==00:00:05:04.903 510301== by 0x753606: operator() (shared_cache.hpp:100)
==00:00:05:04.903 510301== by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
==00:00:05:04.903 510301== by 0x73BB26: _M_release (shared_ptr_base.h:155)
==00:00:05:04.903 510301== by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
==00:00:05:04.903 510301== by 0x6B58A9: ~__shared_count (shared_ptr_base.h:728)
==00:00:05:04.903 510301== by 0x6B58A9: ~__shared_ptr (shared_ptr_base.h:1167)
==00:00:05:04.903 510301== by 0x6B58A9: ~shared_ptr (shared_ptr.h:103)
==00:00:05:04.903 510301== by 0x6B58A9: OSD::create_context() (OSD.cc:9053)
==00:00:05:04.903 510301== by 0x71B570: OSD::dequeue_peering_evt(OSDShard*, PG*, std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&) (OSD.cc:9665)
==00:00:05:04.903 510301== by 0x71B997: OSD::dequeue_delete(OSDShard*, PG*, unsigned int, ThreadPool::TPHandle&) (OSD.cc:9701)
==00:00:05:04.903 510301== by 0x70E62E: run (OpSchedulerItem.h:148)
==00:00:05:04.903 510301== by 0x70E62E: OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*) (OSD.cc:10677)
==00:00:05:04.903 510301== by 0xD3C093: ShardedThreadPool::shardedthreadpool_worker(unsigned int) (WorkQueue.cc:311)
==00:00:05:04.903 510301== by 0xD3ECF3: ShardedThreadPool::WorkThreadSharded::entry() (WorkQueue.h:706)
==00:00:05:04.903 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:05:04.903 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:05:04.903 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
==00:00:05:04.903 510301==
==00:00:05:04.903 510301== This conflicts with a previous write of size 8 by thread #90
==00:00:05:04.903 510301== Locks held: 2, at addresses 0xF122A58 0xF1239A8
==00:00:05:04.903 510301== at 0x7531E1: clear (hashtable.h:2054)
==00:00:05:04.903 510301== by 0x7531E1: std::_Hashtable<entity_addr_t, std::pair<entity_addr_t const, utime_t>, mempool::pool_allocator<(mempool::pool_index_t)15, std::pair<entity_addr_t const, utime_t> >, std::__detail::_Select1st, std::equal_to<entity_addr_t>, std::hash<entity_addr_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::~_Hashtable() (hashtable.h:1369)
==00:00:05:04.903 510301== by 0x75331C: ~unordered_map (unordered_map.h:102)
==00:00:05:04.903 510301== by 0x75331C: OSDMap::~OSDMap() (OSDMap.h:350)
==00:00:05:04.903 510301== by 0x753606: operator() (shared_cache.hpp:100)
==00:00:05:04.903 510301== by 0x753606: std::_Sp_counted_deleter<OSDMap const*, SharedLRU<unsigned int, OSDMap const>::Cleanup, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (shared_ptr_base.h:471)
==00:00:05:04.903 510301== by 0x73BB26: _M_release (shared_ptr_base.h:155)
==00:00:05:04.903 510301== by 0x73BB26: std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (shared_ptr_base.h:148)
==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr_base.h:747)
==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr_base.h:1078)
==00:00:05:04.903 510301== by 0x72191E: operator= (shared_ptr.h:103)
==00:00:05:04.903 510301== by 0x72191E: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8116)
==00:00:05:04.903 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:05:04.903 510301== by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:05:04.903 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:05:04.903 510301== Address 0x1e3e0588 is 872 bytes inside a block of size 1,208 alloc'd
==00:00:05:04.903 510301== at 0xA7DC0C3: operator new[](unsigned long) (vg_replace_malloc.c:433)
==00:00:05:04.903 510301== by 0x6C7C0C: OSDService::try_get_map(unsigned int) (OSD.cc:1606)
==00:00:05:04.903 510301== by 0x7213BD: get_map (OSD.h:699)
==00:00:05:04.903 510301== by 0x7213BD: get_map (OSD.h:1732)
==00:00:05:04.903 510301== by 0x7213BD: OSD::_committed_osd_maps(unsigned int, unsigned int, MOSDMap*) (OSD.cc:8076)
==00:00:05:04.903 510301== by 0x7752CA: C_OnMapCommit::finish(int) (OSD.cc:7678)
==00:00:05:04.903 510301== by 0x72A06C: Context::complete(int) (Context.h:77)
==00:00:05:04.903 510301== by 0xD07F14: Finisher::finisher_thread_entry() (Finisher.cc:66)
==00:00:05:04.903 510301== by 0xA7E1203: mythread_wrapper (hg_intercepts.c:389)
==00:00:05:04.903 510301== by 0xC6182DD: start_thread (in /usr/lib64/libpthread-2.28.so)
==00:00:05:04.903 510301== by 0xD8B34B2: clone (in /usr/lib64/libc-2.28.so)
```