Sage Weil [Sat, 30 Mar 2019 13:35:23 +0000 (08:35 -0500)]
Merge PR #27139 into nautilus
* refs/pull/27139/head:
os/bluestore: unconditionally cap chunks returned by allocator to 2^31
os/bluestore: start using 64-bit intervals for bitmap allocator
os/bluestore: make bluestore interval base template.
tests/fastbmap_alloc: UT to reproduce 4G allocation bug
os/bluestore: os/bluestore: implement dump for bitmap allocator
os/bluestore be more tolerant to lack of space for bluefs.
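The first two titles in this merge concern extent lengths that do not fit in 32 bits. The following is a minimal sketch of that arithmetic, not the BlueStore allocator code: a 4 GiB length wraps to zero when truncated to 32 bits, and capping each returned chunk at 2^31 keeps every length representable. The names `truncate_u32` and `cap_chunks` and the `(offset, length)` tuple layout are hypothetical.

```python
MAX_CHUNK = 1 << 31  # cap named in the commit title: 2^31 bytes


def truncate_u32(length):
    """Model storing a length in an unsigned 32-bit field."""
    return length & 0xFFFFFFFF


def cap_chunks(offset, length, max_chunk=MAX_CHUNK):
    """Split one (offset, length) extent into chunks no longer than max_chunk.

    Hypothetical helper for illustration only.
    """
    chunks = []
    while length > 0:
        step = min(length, max_chunk)
        chunks.append((offset, step))
        offset += step
        length -= step
    return chunks


four_gib = 4 << 30
print(truncate_u32(four_gib))   # the 4 GiB length after 32-bit truncation
print(cap_chunks(0, four_gib))  # the same extent as capped chunks
```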
Sage Weil [Sun, 24 Mar 2019 15:28:42 +0000 (10:28 -0500)]
Merge PR #27119 into nautilus
* refs/pull/27119/head:
crush/CrushWrapper: make update_choose_args less chatty
qa/standalone/crush/crush-choose-args: add weight-set tests
qa/standalone/crush/crush-choose-args: fix test
crush/CrushWrapper: move_item: do not clobber weight-set weights
crush/CrushWrapper: create_or_move: make weight-set update optional
mon/OSDMonitor: apply osd_crush_update_weight_set for reweight, create-or-move
crush/CrushWrapper: insert_item: make weight-set update optional (for leaves only)
crush/CrushWrapper: use adjust_item_weight_in_bucket for subtree reweight
crush/CrushWrapper: fix detach_bucket, remove_item[_under] vs weight-sets
crush/CrushWrapper: add update_weight_sets arg to adjust_item_weight_*
crush/CrushWrapper: refactor adjust_weight_* into per-bucket helper
crush/CrushWrapper: pass cct down into more places
Igor Fedotov [Mon, 11 Mar 2019 16:13:19 +0000 (19:13 +0300)]
os/bluestore be more tolerant to lack of space for bluefs.
'Gift' space is only advisory for allocation; only the part of it actually
requested from BlueFS is mandatory. Hence do not fail when unable to allocate
the whole space.
Fixes: https://tracker.ceph.com/issues/38760
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit dbc1a78787baacd7bbc98ff8bbb72e609def2ad6)
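One way to read the message above, as a sketch under stated assumptions rather than the BlueStore implementation: the allocator is asked for the full advisory amount, and the call fails only if it obtained less than the mandatory part. The names `gift_space` and `allocate`, and the choice of exception types, are hypothetical.

```python
def gift_space(allocate, advisory, mandatory):
    """Try to allocate `advisory` bytes; fail only below `mandatory`.

    `allocate(n)` is a hypothetical callback that returns the number of
    bytes it actually managed to allocate (0 <= result <= n).
    """
    if mandatory > advisory:
        raise ValueError("mandatory part cannot exceed the advisory amount")
    got = allocate(advisory)
    if got < mandatory:
        # Only a shortfall in the mandatory part is treated as an error.
        raise MemoryError(f"needed {mandatory} bytes, allocated only {got}")
    return got


# An allocator with 6 units free still satisfies a request whose
# mandatory part is 4, even though the advisory amount was 10.
print(gift_space(lambda n: min(n, 6), advisory=10, mandatory=4))
```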
Verify we have the expected behavior for creates and moves that
maintain bucket summation, both with and without the
osd_crush_update_weight_set option enabled.
Sage Weil [Thu, 14 Mar 2019 16:29:10 +0000 (11:29 -0500)]
mon/OSDMonitor: apply osd_crush_update_weight_set for reweight, create-or-move
Since CrushWrapper no longer applies this setting at a low level,
where it can't tell what the real intention is, we instead apply
it at the top command level where we do.
Specifically, we use it to control whether the weight-set weights
are set for the reweight and create-or-move commands.
Note that this (indirectly) affects the way weight-set weights
are initialized for newly created OSDs, since those are added to
the crush map via the 'osd crush create-or-move' command.
Sage Weil [Thu, 14 Mar 2019 17:40:23 +0000 (12:40 -0500)]
crush/CrushWrapper: insert_item: make weight-set update optional (for leaves only)
If it is a bucket, we should sum the weight-set values to weight
the bucket in the subtrees. It only makes sense to reset the
weight-set weights for leaf items.
Sage Weil [Thu, 14 Mar 2019 16:29:10 +0000 (11:29 -0500)]
crush/CrushWrapper: add update_weight_sets arg to adjust_item_weight_*
- Make it optional whether the weight-set weights are adjusted to
match the weight.
- Fix the adjustment of the parent bucket(s) so that the
summations in weight-sets are correctly maintained. Prior to
this change, adjusting any weight reset all parent buckets'
weight-set weights to the bucket's primary weight.
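The two points above can be illustrated with a toy model. This is a sketch, not CrushWrapper: each node carries a primary weight and a single weight-set weight, and an adjustment propagates deltas to the parents so that each bucket stays the sum of its children in both. The `Node` class, the single-position weight-set, and the Python signature of `adjust_item_weight` are assumptions; only the `update_weight_sets` argument name comes from the commit title.

```python
class Node:
    """Toy CRUSH item: primary weight plus one weight-set weight."""

    def __init__(self, name, weight=0.0, ws_weight=None, parent=None):
        self.name = name
        self.weight = weight
        self.ws_weight = weight if ws_weight is None else ws_weight
        self.parent = parent


def adjust_item_weight(item, new_weight, update_weight_sets=True):
    """Set an item's weight and keep every parent bucket's sums consistent."""
    weight_delta = new_weight - item.weight
    item.weight = new_weight

    ws_delta = 0.0
    if update_weight_sets:
        # Optionally make the weight-set weight match the new weight.
        ws_delta = new_weight - item.ws_weight
        item.ws_weight = new_weight

    # Propagate deltas upward instead of resetting parents to their
    # primary weight, so weight-set summations are preserved.
    bucket = item.parent
    while bucket is not None:
        bucket.weight += weight_delta
        bucket.ws_weight += ws_delta
        bucket = bucket.parent


host = Node("host", weight=2.0, ws_weight=1.5)
osd0 = Node("osd.0", weight=1.0, ws_weight=0.5, parent=host)
osd1 = Node("osd.1", weight=1.0, ws_weight=1.0, parent=host)

adjust_item_weight(osd1, 2.0, update_weight_sets=False)
print(host.weight, host.ws_weight)  # primary sum changes, weight-set sum does not
```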
Sebastian Wagner [Wed, 13 Feb 2019 14:01:25 +0000 (15:01 +0100)]
mgr/orchestrator: Add error handling to interface
Also:
* Small test_orchestrator refactoring
* Improved Docstring in MgrModule.remote
* Added `raise_if_exception` that raises Exceptions
* Added `OrchestratorError` and `OrchestratorValidationError`
* `_orchestrator_wait` no longer raises anything
* `volumes` model also calls `raise_if_exception`
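A rough sketch of the pattern these bullets describe: waiting stores any failure on the completion object instead of raising, and the caller surfaces it explicitly. The names `OrchestratorError`, `OrchestratorValidationError`, and `raise_if_exception` come from the list above; their bodies, the subclass relationship, and the `Completion` class and `wait_then_check` helper are assumptions made for illustration, not the mgr/orchestrator code.

```python
class OrchestratorError(Exception):
    """General orchestrator failure (body assumed)."""


class OrchestratorValidationError(OrchestratorError):
    """Invalid input; the subclass relationship is an assumption."""


class Completion:
    """Hypothetical stand-in for an orchestrator completion object."""

    def __init__(self, result=None, exception=None):
        self.result = result
        self.exception = exception


def raise_if_exception(completion):
    """Re-raise the exception stored on a completion, if there is one."""
    if completion.exception is not None:
        raise completion.exception


def wait_then_check(completion):
    # Waiting itself raises nothing in this model; errors are surfaced
    # by the explicit raise_if_exception() call afterwards.
    raise_if_exception(completion)
    return completion.result


try:
    wait_then_check(Completion(exception=OrchestratorValidationError("bad spec")))
except OrchestratorError as e:
    print("orchestrator failed:", e)
```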
Jason Dillaman [Wed, 20 Mar 2019 18:40:50 +0000 (14:40 -0400)]
librbd: ignore -EOPNOTSUPP errors when retrieving image group membership
The Luminous release did not support adding images to a group (it only
included the bare-minimum support for creating groups). Commit f76df32666b
incorrectly dropped support for ignoring this possible failure. This
prevents Nautilus-release clients from opening images contained within
a Luminous-release cluster.
Fixes: http://tracker.ceph.com/issues/38834
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
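The behavior restored by this commit can be sketched as follows. This is an illustration of the error-handling pattern only, not librbd code: `get_group_membership` and the `fetch_group` callback, which is assumed to return `(ret, group)` with a negative errno on failure, are hypothetical.

```python
import errno


def get_group_membership(fetch_group):
    """Return the image's group, or None when the cluster cannot report one.

    `fetch_group` is a hypothetical callable returning (ret, group), where
    ret is 0 on success or a negative errno value.
    """
    ret, group = fetch_group()
    if ret == -errno.EOPNOTSUPP:
        # Older cluster without image-group support: treat as "no group"
        # instead of failing the image open.
        return None
    if ret < 0:
        raise OSError(-ret, "failed to retrieve group membership")
    return group


print(get_group_membership(lambda: (-errno.EOPNOTSUPP, None)))
```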
Sage Weil [Sat, 16 Mar 2019 20:06:00 +0000 (15:06 -0500)]
mon/OSDMonitor: allow 'osd pool set pgp_num_actual'
Normally we let the mgr control pgp_num_actual for us in a nice, safe, controlled
way. However, it is very conservative, and only makes changes if all PGs are healthy.
There are situations where the user wants to be more aggressive than this.
For example, if you have a pool with many PGs (say, 4096) and set pg_num_target to a
small number like 4, the mgr will adjust pgp_num way down. This can lead to an OSD
hitting max_pgs_per_osd. That prevents the PGs from being active+clean, however,
which prevents the mgr adjusting pgp_num back up even if the user sets the target to
a larger value.
This patch lets the user directly adjust pgp_num_actual. Note that we still do
not expose access to pg_num_actual, since there are much stricter conditions that
must be true in order to safely make downward adjustments.
The stress-split thrasher already had this off, but the ec variant did
not. We don't support ceph-objectstore-tool exports/imports between major
versions.
Fixes: http://tracker.ceph.com/issues/38294
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Fri, 15 Mar 2019 17:24:52 +0000 (12:24 -0500)]
osd/PG: fix pg merge check for rc clusters
If a cluster had a pg merge pending before last_pg_merge_meta was
introduced then the source_pgid will be pg_t(). If that's the case,
skip these new checks.
Likewise, if we decode a legacy pg_pool_t, put the old merge les/lec
values into the correct location.
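The guard described in the first paragraph can be modelled roughly as below. This is a sketch, not the osd/PG code: `PgId` stands in for `pg_t`, its default field values are an assumption about what a default-constructed `pg_t()` looks like, and `MergeMeta` and `merge_checks_apply` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PgId:
    """Stand-in for pg_t; the default values are assumed."""
    pool: int = 0
    seed: int = 0


@dataclass
class MergeMeta:
    """Stand-in for the merge metadata carrying source_pgid."""
    source_pgid: PgId = field(default_factory=PgId)


def merge_checks_apply(meta):
    """Run the newer merge checks only when source_pgid was recorded.

    A default-constructed source_pgid means the merge was pending before
    the metadata existed, so the checks are skipped.
    """
    return meta.source_pgid != PgId()


print(merge_checks_apply(MergeMeta()))
print(merge_checks_apply(MergeMeta(source_pgid=PgId(pool=3, seed=7))))
```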
Sage Weil [Fri, 15 Mar 2019 17:08:34 +0000 (12:08 -0500)]
Merge PR #26965 into nautilus
* refs/pull/26965/head:
ms/async/ProtocolV2: add ms_die_on_bug and assert rxbuf/txbuf don't get big
msg/async/ProtocolV2: do not reenable pre_auth buffering on from reset_recv_state
Sage Weil [Fri, 15 Mar 2019 03:50:29 +0000 (22:50 -0500)]
msg/async/ProtocolV2: do not reenable pre_auth buffering on from reset_recv_state
This is specifically bad because we call reset_recv_state from
reuse_connection, which turns buffering back on on an already-authenticated
session.
Instead, reenable it only when we set the state to START_CONNECT. (On
the accepting side, it is a fresh connection, so it starts out true.)
Also, we want to *disable* it on the connection we are reusing, which
might be in a pre-auth state, while we are in a post-auth state.
Fixes: http://tracker.ceph.com/issues/38746
Signed-off-by: Sage Weil <sage@redhat.com>
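The flag handling described in this message can be sketched as a toy state model. It is not the ProtocolV2 class: the `Connection` class, the `pre_auth_enabled` attribute, the `READY` state, and the `start_connect` and `finish_auth` method shapes are assumptions. Only `reset_recv_state`, `reuse_connection`, and `START_CONNECT` are names taken from the text.

```python
class Connection:
    """Toy model of the pre-auth buffering flag."""

    def __init__(self):
        # A fresh connection starts out with pre-auth buffering on.
        self.pre_auth_enabled = True
        self.state = "NONE"
        self.rx_buffer = b""

    def start_connect(self):
        # The only place buffering is re-enabled in this model.
        self.state = "START_CONNECT"
        self.pre_auth_enabled = True

    def finish_auth(self):
        # Assumed: buffering is turned off once authentication completes.
        self.state = "READY"
        self.pre_auth_enabled = False

    def reset_recv_state(self):
        # Clears receive-side state but leaves pre_auth_enabled untouched.
        self.rx_buffer = b""

    def reuse_connection(self, existing):
        # `self` is post-auth; `existing` may still be pre-auth, so
        # disable buffering on it before resetting its receive state.
        existing.pre_auth_enabled = False
        existing.reset_recv_state()


authed = Connection()
authed.finish_auth()
stale = Connection()           # still pre-auth
authed.reuse_connection(stale)
print(stale.pre_auth_enabled)  # buffering stays off on the reused connection
```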
Lenz Grimmer [Fri, 15 Mar 2019 09:38:00 +0000 (10:38 +0100)]
Merge pull request #26738 from votdev/fix_docs
mgr/dashboard: Fix issues in controllers/docs
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Patrick Nawracay <pnawracay@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Reviewed-by: Tina Kallio <tina.kallio@gmail.com>
Matt Benjamin [Tue, 12 Mar 2019 12:58:53 +0000 (08:58 -0400)]
rgw: ldap: fix LDAPAuthEngine::init() when uri !empty()
Fixes: https://tracker.ceph.com/issues/38699
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 6ef98c6e0fcf4f9b6e431b3409975e0966c5c21a)
Sage Weil [Fri, 15 Mar 2019 03:37:18 +0000 (22:37 -0500)]
Merge PR #26898 into nautilus
* refs/pull/26898/head:
osd/PG: invalidate PG if merging with unexpected version
osd,mon: include more pg merge metadata in pg_pool_t
qa/standalone/osd/pg-split-merge.sh: reproduce pg merge problem with empty pgs
osd: add osd_debug_no_{acting_change,purge_strays}
Sage Weil [Thu, 14 Mar 2019 15:04:14 +0000 (10:04 -0500)]
Merge PR #26875 into nautilus
* refs/pull/26875/head:
common: implement HMACs on top of OpenSSL.
msg/async, v2: switch the pre-auth mechanism to HMAC-SHA256.
include/types: beef sha_digest_t up with encode and compare.
auth: add hmac_sha256() to CryptoKey.
msg/async, v2: introduce pre_auth exchanges with CRC32.
msg/async, v2: introduce pre_auth buffers.
msg/async, v2: rectify the encapsulation of rx_segments_{desc,data}.
msg/async, v2: rework decoding of MessageFrame.
msg/async, v2: limit the num_segments to non-empty segments.
msg/async, v2: drop the bl onwire space optimization in ControlFrames.
msg/async, v2: clean up ret handling in ProtocolV2::write().
msg/async, v2: drop next_payload_len as we don't need anymore.
msg/async, v2: drop temp_buffer and limitations driven by it.
msg/async, v2: switch to rx_buffer_t entirely.
msg/async, v2: rx continuations use buffer::ptr_node.
msg/async, v2: use bptr continuation for segment reading.
msg/async: introduce bptr-carrying continuations.
msg/async: replace CONTINUATION_PARAM() with specialized types.
msg/async, v2: ::_banner_exchange() takes CtRef instead of CtPtr.
msg/async: avoid extra pointers in continuation definitions.
msg/async, v2: dissect setting stream handlers into ::finish_auth().
msg/async, v2: drop ceph_msg_header2 handling from ControlFrames.
msg/async, v2: drop the SignedEncryptedFrame entirely.
msg/async, v2: reintroduce segment aligment. It's compile-time now.
msg/async, v2: generalize Frame about number of segments.
msg/async, v2: rework and generalize Frame encryption.
msg/async, v2: rework the class hierarchy - introduce MessageFrame.
msg/async, v2: rework the class hierarchy - introduce ControlFrame.
msg/async/ProtocolV2: remove obsolete AuthFlags