Sage Weil [Mon, 2 Mar 2020 16:30:03 +0000 (10:30 -0600)]
Merge PR #33523 into master
* refs/pull/33523/head:
mgr/orch: ServiceSpec: drop 'count'
mgr/rook: use spec.placement.count (instead of spec.count)
mgr/cephadm: make HostAssignment make sense
mgr/orch: PlacementSpec: do not combine all_hosts with anything else
mgr/orch: use PlacementSpec.from_strings() for all CLI commands
Kefu Chai [Sat, 29 Feb 2020 06:51:28 +0000 (14:51 +0800)]
include/cpp-btree: use the same type when allocating/deallocating
btree_set<> by default uses `std::allocator<Key>`, and btree_map by
default uses `std::allocator<std::pair<Key, Value>>`.
before this change, btree used the allocator directly to allocate n
elements, where an element is `Key` or `std::pair<Key, Value>` respectively,
while "n" is actually supposed to be the number of bytes used by the
node being allocated.
but what we need to allocate is actually a "node_type" holding
multiple slots, where each slot holds an element. in addition to the
slots, a node also keeps metadata for the btree itself. in short,
what we allocate now is (in bytes):
alignof(sizeof(node_type)) * sizeof(element)
but what we should allocate is (in bytes):
alignof(sizeof(node_type))
in this change:
* always rebind the allocator to the correct aligned type with given
alignment
* extract the allocator related helpers into a template class
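A worked version of the size arithmetic above, written in Python purely as a calculator rather than as the cpp-btree code. The sizes are made-up example values, and `aligned_units` is a hypothetical helper that assumes the rebound aligned type has a size equal to its alignment, so a node needs ceil(node_bytes / alignment) of them.

```python
# Made-up sizes for illustration only; real values depend on the key/value
# types and on the btree's node layout.
NODE_BYTES = 256      # stands in for sizeof(node_type)
ALIGNMENT = 8         # stands in for the node's required alignment
ELEMENT_BYTES = 16    # stands in for sizeof(std::pair<Key, Value>)


def bytes_before_fix(node_bytes: int, element_bytes: int) -> int:
    """Old behaviour: the element allocator is asked for `node_bytes`
    objects, so every requested "byte" is really a whole element."""
    return node_bytes * element_bytes


def aligned_units(node_bytes: int, alignment: int) -> int:
    """Assumed scheme: after rebinding to an aligned type whose size equals
    `alignment`, request just enough units to cover the node (round up)."""
    return (node_bytes + alignment - 1) // alignment


def bytes_after_fix(node_bytes: int, alignment: int) -> int:
    return aligned_units(node_bytes, alignment) * alignment


if __name__ == "__main__":
    print("before:", bytes_before_fix(NODE_BYTES, ELEMENT_BYTES))  # 4096
    print("after: ", bytes_after_fix(NODE_BYTES, ALIGNMENT))       # 256
```

With these numbers the old code over-allocates by a factor of `sizeof(element)`, which is exactly the difference between the two formulas in the message.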
Sage Weil [Tue, 25 Feb 2020 00:29:12 +0000 (18:29 -0600)]
mgr/cephadm: make HostAssignment make sense
- if hosts are passed, use those.
- if all_hosts=true, use all hosts.
Otherwise, build a set of hosts based on the labels--either explicit or
implied. If there's no label, use all hosts.
If there is a count, use a subset of candidate hosts. If there was no
label and there is no count, fail.
If count and hosts are both provided, then we either (1) use the hosts
as the candidate list and place among them (if len(hosts) >= count), or
(2) ensure that any result includes *at least* the provided hosts.
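The placement rules above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not cephadm's actual HostAssignment: `assign_hosts` and `PlacementError` are hypothetical names, only explicit labels are modeled (implied labels are left out), and when a subset is needed the sketch simply takes candidates in order.

```python
from typing import Dict, List, Optional, Sequence, Set


class PlacementError(Exception):
    """Hypothetical error raised when no placement can be derived."""


def assign_hosts(known_hosts: Sequence[str],
                 host_labels: Dict[str, Set[str]],
                 hosts: Optional[List[str]] = None,
                 all_hosts: bool = False,
                 label: Optional[str] = None,
                 count: Optional[int] = None) -> List[str]:
    if all_hosts:
        # all_hosts is never combined with other placement fields.
        return list(known_hosts)

    # Candidates: hosts carrying the label, or every host if there is no label.
    if label is not None:
        candidates = [h for h in known_hosts
                      if label in host_labels.get(h, set())]
    else:
        candidates = list(known_hosts)

    if hosts:
        if count is None:
            return list(hosts)            # explicit hosts: use those
        if len(hosts) >= count:
            return list(hosts)[:count]    # place among the given hosts
        # Too few hosts for count: keep all of them, fill up from candidates.
        extra = [h for h in candidates if h not in hosts]
        return list(hosts) + extra[:count - len(hosts)]

    if count is not None:
        return candidates[:count]         # subset of the candidate hosts
    if label is None:
        raise PlacementError("no hosts, label, count or all_hosts given")
    return candidates
```

For example, with hosts `["d"]` and `count=3` on a four-host cluster, the result keeps `d` and adds two more candidates, matching case (2) above.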
Sage Weil [Sun, 1 Mar 2020 20:18:39 +0000 (14:18 -0600)]
Merge PR #33553 into master
* refs/pull/33553/head:
mgr/cephadm: orch ls: include specs with no daemons
mgr/cephadm: orch ls: show spec size
mgr/orch: remove unused fields in RGWSpec
mgr/orch: fix ServiceSpec fields
mgr/cephadm: simplify spec apply
pybind/mgr/mgr_module: revert PersistentStoreDict separator
mgr/cephadm: apply services after refreshing inventory
mgr/cephadm: catch and log exceptions from apply
mgr/orch: no extra whitespace in stored json specs
mgr/cephadm: drop daemon_type arg to _apply_service
mgr/cephadm: use _apply() helper for all apply_ methods
mgr/cephadm: replace PersistentStoreDict with SpecStore
mgr/cephadm: do not remove service spec when removing a daemon
mgr/cephadm: rename completion variables & cleanup
mgr/cephadm: leverage service specs
Sage Weil [Sun, 1 Mar 2020 13:48:17 +0000 (07:48 -0600)]
mgr/orch: fix ServiceSpec fields
- service_type is required. Make it the first positional arg to the ctor.
- service_id is the id *only* and optional.
- service_name() is the full service name (no change)
The old 'name' field was previously used as the id only (so it was poorly
named) and was optional, but in this series it was changed to include the
type, breaking naming for a bunch of things (e.g., daemons called mds.mds.fsname.xyz).
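A minimal sketch of the field layout described above, assuming a plain Python class (the real ServiceSpec has many more fields and validation). `daemon_name` is a hypothetical helper showing why keeping the type out of service_id matters: the type is added exactly once.

```python
from typing import Optional


class ServiceSpec:
    """Sketch of the field layout only; not the real mgr/orch class."""

    def __init__(self, service_type: str,
                 service_id: Optional[str] = None) -> None:
        self.service_type = service_type  # required, first positional arg
        self.service_id = service_id      # the id only, e.g. "fsname"; optional

    def service_name(self) -> str:
        # Full service name: "<type>" or "<type>.<id>".
        if self.service_id:
            return f"{self.service_type}.{self.service_id}"
        return self.service_type


def daemon_name(spec: ServiceSpec, suffix: str) -> str:
    """Hypothetical helper: a daemon is named after its service plus a suffix."""
    return f"{spec.service_name()}.{suffix}"


print(daemon_name(ServiceSpec("mds", "fsname"), "xyz"))  # mds.fsname.xyz
```

If service_id had instead held "mds.fsname", the same helper would produce the doubled "mds.mds.fsname.xyz" name mentioned above.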
Sage Weil [Sun, 1 Mar 2020 03:09:57 +0000 (21:09 -0600)]
mgr/cephadm: simplify spec apply
- Teach _apply_service how to pick the create (and config) functions, so
that we don't need any weird wrappers in the callers.
- Replace trigger_deploy() and _apply_services() with a simpler
_apply_all_services()
- Drop all of the per-type _apply_foo() methods.
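The shape of that simplification, one generic apply path that looks up the per-type create function, might look like the following sketch. `Orchestrator`, `create_fns` and the `_create_*` methods are hypothetical stand-ins; only the `_apply_service` / `_apply_all_services` names come from the commit message, and their signatures here are assumed.

```python
from typing import Callable, Dict, List


class Orchestrator:
    """Hypothetical sketch: a single apply path picks the create function
    by service type instead of relying on per-type _apply_foo() wrappers."""

    def __init__(self) -> None:
        self.created: List[str] = []
        # Table mapping service type -> create function (assumed design).
        self.create_fns: Dict[str, Callable[[str], None]] = {
            "mon": self._create_mon,
            "mgr": self._create_mgr,
        }

    def _create_mon(self, host: str) -> None:
        self.created.append(f"mon on {host}")

    def _create_mgr(self, host: str) -> None:
        self.created.append(f"mgr on {host}")

    def _apply_service(self, service_type: str, hosts: List[str]) -> None:
        create = self.create_fns[service_type]  # KeyError for unknown types
        for host in hosts:
            create(host)

    def _apply_all_services(self, specs: Dict[str, List[str]]) -> None:
        # One loop replaces trigger_deploy() plus the per-type helpers.
        for service_type, hosts in specs.items():
            self._apply_service(service_type, hosts)
```

Adding a new service type then only means adding an entry to the table, with no new wrapper in the callers.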
Joshua Schmid [Wed, 26 Feb 2020 13:26:42 +0000 (14:26 +0100)]
mgr/cephadm: leverage service specs
Fixes: https://tracker.ceph.com/issues/44205
This does a couple of things:
* Change the way apply_$service() works:
Instead of triggering the deployment mechanism, it now
transforms the passed ServiceSpec into a JSON representation
and saves it in a persistent mon_store section.
These locations are periodically checked in the serve() thread.
This works because all the apply_$service_type functions are idempotent.
* Allow saving a config-like specification in the mon_store.
`ceph orch apply -i <service_spec_file.yaml>`
reads the specified services and saves them in the mon store
section mentioned above. Deployment then goes through the same
serve() mechanism.
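A minimal sketch of this store-then-reconcile pattern, with a Python dict standing in for the persistent mon_store section. `SpecStore` mirrors a name from this merge, but its methods, the `spec.` key prefix and `serve_once` are assumptions made for illustration.

```python
import json
from typing import Callable, Dict


class SpecStore:
    """Hypothetical sketch: a dict stands in for the persistent mon_store
    section; specs are stored as JSON, keyed by service name."""

    PREFIX = "spec."  # assumed key prefix, for illustration only

    def __init__(self) -> None:
        self.kv: Dict[str, str] = {}

    def save(self, service_name: str, spec: dict) -> None:
        # Compact separators: no extra whitespace in the stored JSON.
        self.kv[self.PREFIX + service_name] = json.dumps(
            spec, sort_keys=True, separators=(",", ":"))

    def load_all(self) -> Dict[str, dict]:
        return {k[len(self.PREFIX):]: json.loads(v)
                for k, v in self.kv.items() if k.startswith(self.PREFIX)}


def serve_once(store: SpecStore,
               apply_fn: Callable[[str, dict], None]) -> None:
    """One pass of the periodic check done in the serve() thread: re-apply
    every stored spec. Only safe because applying is idempotent."""
    for name, spec in store.load_all().items():
        apply_fn(name, spec)
```

Running `serve_once` on every loop iteration converges on the stored specs; if applying were not idempotent, each pass would deploy duplicates.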
Sage Weil [Fri, 28 Feb 2020 21:11:37 +0000 (15:11 -0600)]
msg: add get_{pid,random}_nonce() helpers
In cases where we normally use a pid for a nonce, fall back to a random
value when the pid == 1 (i.e., we're in a container). For the cases where
we use a random value, use the helper.
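A Python sketch of the fallback logic, assuming nothing about the real C++ helpers beyond what the message says. The 32-bit nonce width and the optional `pid` parameter (added only so the container branch can be exercised) are assumptions.

```python
import os
import random
from typing import Optional


def get_random_nonce() -> int:
    # Assumed 32-bit nonce width, for illustration.
    return random.getrandbits(32)


def get_pid_nonce(pid: Optional[int] = None) -> int:
    """Prefer the pid as a nonce, but fall back to a random value when the
    pid is 1, i.e. we are most likely the init process of a container and
    every other containerized daemon has the same pid."""
    if pid is None:
        pid = os.getpid()
    if pid == 1:
        return get_random_nonce()
    return pid
```

On a normal host the nonce stays the pid, so existing behavior is unchanged; only pid 1 switches to a random value.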
Sage Weil [Fri, 28 Feb 2020 20:52:02 +0000 (14:52 -0600)]
msg/Policy: make stateless_server default to anon (again)
Midway through the octopus cycle, we made stateless server more stateless
in the sense that it would not register incoming client connections. And,
in so doing, it would not enforce that client connections came from
unique addresses, by closing an existing connection from the same addr
when a new connection was accepted.
This turned out to cause out-of-order OSD ops because the OSD needed that
behavior. See https://tracker.ceph.com/issues/42328. We fixed that by
reverting to the old behavior for all but monitor connections, where we
needed it, in 507d213cc453ed86ab38619590f710f33245c652.
This, in turn, breaks most OSD <-> OSD communication (and probably lots
of other things) with cephadm, because we make entity_addr_t unique with
a nonce that is populated by getpid()... and the containerized daemons
all have pid 1. When we finally merged the follow-on fixes for the change
above, cephadm OSDs could no longer ping each other.
In my view, the 'anon' connection handling is a good idea in the general
case. So, let's adjust our fix for #42328 so that it is only the OSD
client-side interface that registers client connections and makes them
unique.
Fixes: https://tracker.ceph.com/issues/44358
Signed-off-by: Sage Weil <sage@redhat.com>
liupengs [Sun, 17 Nov 2019 15:03:07 +0000 (23:03 +0800)]
msg/async/rdma: fix event center being blocked while constructing the connection used to transport IB sync msgs
We construct a TCP connection to transport IB sync messages. If the
remote node has shut down (e.g., by accident), net.connect blocks until the
timeout is reached, which blocks the event center.
This bug may cause mon probe timeouts, OSDs not replying, and so on.
Casey Bodley [Fri, 28 Feb 2020 17:49:58 +0000 (12:49 -0500)]
rgw: bucket_list_ordered loops until it gets a unique candidate
when we detect a duplicate common prefix, we need to loop until we get
the next unique candidate. we must add a new candidate for each shard,
or we won't visit that shard again and will miss later entries.
Casey Bodley [Fri, 28 Feb 2020 16:06:32 +0000 (11:06 -0500)]
rgw: bucket_list_ordered advances past duplicate common prefixes
we may see the same common prefix from more than one shard. when we
detect a duplicate, we need to advance past it. otherwise, we may make
the wrong decision about is_truncated because the shards with
duplicates won't be at_end()
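The two rgw fixes above can be illustrated with a simplified k-way merge over sorted shard listings. This is a hypothetical Python sketch, not RGW's bucket_list_ordered: entries are plain strings, a common prefix is simply an entry that appears in several shards, and `list_ordered` is an invented name. It shows both points: every popped shard gets its next candidate pushed (so no shard is forgotten), and duplicates of the last returned entry are consumed so those shards can reach their end before is_truncated is computed.

```python
import heapq
from typing import List, Sequence, Tuple


def list_ordered(shards: Sequence[Sequence[str]],
                 max_entries: int) -> Tuple[List[str], bool]:
    """Merge already-sorted shard listings; returns (entries, is_truncated)."""
    heap: List[Tuple[str, int]] = []
    pos = [0] * len(shards)
    for i, shard in enumerate(shards):
        if shard:
            heapq.heappush(heap, (shard[0], i))

    def advance(i: int) -> None:
        # Always push this shard's next candidate; otherwise it would never
        # be visited again and its later entries would be missed.
        pos[i] += 1
        if pos[i] < len(shards[i]):
            heapq.heappush(heap, (shards[i][pos[i]], i))

    result: List[str] = []
    while heap and len(result) < max_entries:
        entry, i = heapq.heappop(heap)
        advance(i)
        if result and result[-1] == entry:
            continue  # duplicate common prefix: loop to the next candidate
        result.append(entry)

    # Advance past duplicates of the last returned entry so that shards
    # holding only such duplicates count as at end for is_truncated.
    while heap and result and heap[0][0] == result[-1]:
        _, i = heapq.heappop(heap)
        advance(i)
    return result, bool(heap)
```

Without the final loop, `list_ordered([["a", "photos/"], ["photos/"]], 2)` would report truncation even though nothing new remains.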