Sage Weil [Sat, 30 Mar 2019 13:35:23 +0000 (08:35 -0500)]
Merge PR #27139 into nautilus
* refs/pull/27139/head:
os/bluestore: unconditionally cap chunks returned by allocator to 2^31
os/bluestore: start using 64-bit intervals for bitmap allocator
os/bluestore: make bluestore interval base template.
tests/fastbmap_alloc: UT to reproduce 4G allocation bug
os/bluestore: implement dump for bitmap allocator
os/bluestore be more tolerant to lack of space for bluefs.
xie xingguo [Sat, 23 Mar 2019 01:50:27 +0000 (09:50 +0800)]
osd/OSDMap: calc_pg_upmaps - restrict optimization to origin pools only
The current implementation will try to cancel any pg_upmaps that
would otherwise re-map a PG out from an underfull osd, which is wrong,
e.g., because it could reliably fire the following assert:
huangjun [Wed, 20 Mar 2019 08:44:02 +0000 (16:44 +0800)]
crush: add root_bucket to identify underfull buckets
All underfull buckets under root_buckets will be taken as targets.
For the CRUSH rule:
step take datacenter0
step chooseleaf firstn 2 type host
step emit
step take datacenter1
step chooseleaf firstn 2 type host
step emit
If one host contains an overfull osd but no underfull osd,
it will use other underfull buckets as targets, which
may not be in the same datacenter; that would
break the rule.
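The restriction described above can be pictured roughly as follows. This is only a sketch under stated assumptions, not the CrushWrapper code: the `ROOT_OF` table and the `underfull_targets` helper are hypothetical names invented for illustration.

```python
# Hypothetical layout: the root bucket (the 'step take' target) each OSD sits under.
ROOT_OF = {0: "datacenter0", 1: "datacenter0", 2: "datacenter1", 3: "datacenter1"}


def underfull_targets(overfull_osd, underfull_osds, root_of=ROOT_OF):
    """Return only the underfull OSDs that share a root bucket with overfull_osd."""
    root = root_of[overfull_osd]
    return [osd for osd in underfull_osds if root_of[osd] == root]


# osd.0 is overfull; only osd.1 lives under the same root, so only it is a candidate.
candidates = underfull_targets(0, [1, 2, 3])
```

Filtering candidates by root bucket keeps a remapped PG inside the datacenter that the rule's 'step take' selected.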
Sage Weil [Mon, 25 Mar 2019 18:40:19 +0000 (13:40 -0500)]
common/config: parse --default-$option as a default value
Sometimes it is useful to specify an alternative default value for an
option via the command line such that it has a lower priority than the
mon config database, config file, the rest of the command line, or the
environment.
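A minimal sketch of the intended precedence, assuming a space-separated `--default-<option> <value>` form. The helpers `split_default_args` and `resolve` and the option name `some_option` are hypothetical and are not the common/config implementation.

```python
def split_default_args(argv):
    """Separate '--default-<option> <value>' pairs from the remaining arguments."""
    defaults, rest = {}, []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("--default-") and i + 1 < len(argv):
            name = arg[len("--default-"):].replace("-", "_")
            defaults[name] = argv[i + 1]
            i += 2
        else:
            rest.append(arg)
            i += 1
    return defaults, rest


def resolve(option, higher_priority_sources, cli_defaults, compiled_defaults):
    """Look the option up in every higher-priority source before the CLI default."""
    for source in higher_priority_sources:  # e.g. mon config db, config file, env
        if option in source:
            return source[option]
    if option in cli_defaults:
        return cli_defaults[option]
    return compiled_defaults[option]


cli_defaults, remaining = split_default_args(["--default-some-option", "b", "--id", "x"])
```

The value given with `--default-` only wins when no higher-priority source sets the option, but it still overrides the compiled-in default.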
Sage Weil [Sun, 24 Mar 2019 15:28:42 +0000 (10:28 -0500)]
Merge PR #27119 into nautilus
* refs/pull/27119/head:
crush/CrushWrapper: make update_choose_args less chatty
qa/standalone/crush/crush-choose-args: add weight-set tests
qa/standalone/crush/crush-choose-args: fix test
crush/CrushWrapper: move_item: do not clobber weight-set weights
crush/CrushWrapper: create_or_move: make weight-set update optional
mon/OSDMonitor: apply osd_crush_update_weight_set for reweight, create-or-move
crush/CrushWrapper: insert_item: make weight-set update optional (for leaves only)
crush/CrushWrapper: use adjust_item_weight_in_bucket for subtree reweight
crush/CrushWrapper: fix detach_bucket, remove_item[_under] vs weight-sets
crush/CrushWrapper: add update_weight_sets arg to adjust_item_weight_*
crush/CrushWrapper: refactor adjust_weight_* into per-bucket helper
crush/CrushWrapper: pass cct down into more places
Igor Fedotov [Mon, 11 Mar 2019 16:13:19 +0000 (19:13 +0300)]
os/bluestore be more tolerant to lack of space for bluefs.
The 'gift' space is only advisory for allocation; only the part of it
actually requested from BlueFS is mandatory. Hence do not fail when unable
to allocate the whole space.
Fixes: https://tracker.ceph.com/issues/38760
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit dbc1a78787baacd7bbc98ff8bbb72e609def2ad6)
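The tolerance described in this commit could be sketched as below. This is an illustration under assumptions, not the BlueStore code: `allocate_gift` and its parameters are hypothetical names, and sizes are plain byte counts.

```python
def allocate_gift(free_bytes, required, gift):
    """Return how many bytes to hand over.

    required: the mandatory part of the request
    gift:     the advisory total; it may be only partially satisfied
    """
    want = max(required, gift)
    got = min(free_bytes, want)
    if got < required:
        # Only a shortfall in the mandatory part is treated as an error.
        raise RuntimeError("cannot satisfy the mandatory part of the request")
    return got


partial = allocate_gift(free_bytes=100, required=40, gift=200)
```

A partial gift is accepted as long as the mandatory part is covered.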
Verify we have the expected behavior for creates and moves that
maintain bucket summation, both with and without the
osd_crush_update_weight_set option enabled.
Sage Weil [Thu, 14 Mar 2019 16:29:10 +0000 (11:29 -0500)]
mon/OSDMonitor: apply osd_crush_update_weight_set for reweight, create-or-move
Since CrushWrapper no longer applies this setting at a low level,
where it can't tell what the real intention is, we instead apply
it at the top command level where we do.
Specifically, we use it to control whether the weight-set weights
are set for the commands
Note that this (indirectly) affects the way weight-set weights
are initialized for newly created OSDs, since those are added to
the crush map via the 'osd crush create-or-move' command.
Sage Weil [Thu, 14 Mar 2019 17:40:23 +0000 (12:40 -0500)]
crush/CrushWrapper: insert_item: make weight-set update optional (for leaves only)
If it is a bucket, we should sum the weight-set values to weight
the bucket in the subtrees. It only makes sense to reset the
weight-set weights for leaf items.
Sage Weil [Thu, 14 Mar 2019 16:29:10 +0000 (11:29 -0500)]
crush/CrushWrapper: add update_weight_sets arg to adjust_item_weight_*
- Make it optional whether the weight-set weights are adjusted to
match the weight.
- Fix the adjustment of the parent bucket(s) so that the
summations in weight-sets are correctly maintained. Prior to
this change, if I adjust any weight, all parent buckets'
weight-set weights are reset to the bucket's primary weight.
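A small sketch of the summation behavior described above, using a hypothetical in-memory `Bucket` class and `adjust_item_weight` helper rather than the real CrushWrapper structures. Each entry stores a primary weight and a single weight-set weight.

```python
class Bucket:
    """Hypothetical bucket: items maps name -> [weight, weight_set_weight]."""

    def __init__(self, name):
        self.name = name
        self.items = {}

    def weight(self):
        return sum(w for w, _ in self.items.values())

    def weight_set_weight(self):
        return sum(ws for _, ws in self.items.values())


def adjust_item_weight(path, item, new_weight, update_weight_sets=True):
    """path lists the buckets from the root down to the one holding item."""
    entry = path[-1].items[item]
    entry[0] = new_weight
    if update_weight_sets:
        entry[1] = new_weight  # optional: make the weight-set weight match
    # Walk upward, re-summing each parent's entry instead of resetting its
    # weight-set weight to the primary weight.
    for parent, child in zip(reversed(path[:-1]), reversed(path[1:])):
        parent.items[child.name] = [child.weight(), child.weight_set_weight()]


root, host = Bucket("root"), Bucket("host0")
host.items = {"osd.0": [1.0, 0.5], "osd.1": [1.0, 1.0]}
root.items = {"host0": [2.0, 1.5]}
adjust_item_weight([root, host], "osd.1", 2.0, update_weight_sets=False)
```

With `update_weight_sets=False` the parent's weight-set weight stays at the sum of its children's weight-set weights, while its primary weight follows the new value.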
Sebastian Wagner [Wed, 13 Feb 2019 14:01:25 +0000 (15:01 +0100)]
mgr/orchestrator: Add error handling to interface
Also:
* Small test_orchestrator refactoring
* Improved Docstring in MgrModule.remote
* Added `raise_if_exception` that raises Exceptions
* Added `OrchestratorError` and `OrchestratorValidationError`
* `_orchestrator_wait` no longer raises anything
* `volumes` model also calls `raise_if_exception`
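The error-handling pattern listed above might look roughly like this. It is a sketch only: the `Completion` class is a hypothetical stand-in, the exception hierarchy is an assumption, and the real signatures in mgr/orchestrator may differ.

```python
class OrchestratorError(Exception):
    """Base error reported by an orchestrator backend."""


class OrchestratorValidationError(OrchestratorError):
    """Raised for invalid input (assumed here to subclass OrchestratorError)."""


class Completion:
    """Hypothetical completion object carrying a result or a stored exception."""

    def __init__(self, result=None, exception=None):
        self.result = result
        self.exception = exception


def raise_if_exception(completion):
    """Re-raise the stored exception, if any, once waiting has finished."""
    if completion.exception is not None:
        raise completion.exception


ok = Completion(result=["osd.0"])
raise_if_exception(ok)  # nothing stored, so nothing is raised
```

Since the wait step itself no longer raises, a caller would wait first and then call `raise_if_exception` to surface any failure.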
Jason Dillaman [Wed, 20 Mar 2019 18:40:50 +0000 (14:40 -0400)]
librbd: ignore -EOPNOTSUPP errors when retrieving image group membership
The Luminous release did not support adding images to a group (it only
included the bare-minimum support for creating groups). Commit f76df32666b
incorrectly dropped support for ignoring this possible failure. This
prevents Nautilus-release clients from opening images contained within
a Luminous-release cluster.
Fixes: http://tracker.ceph.com/issues/38834
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
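The idea of tolerating the missing operation can be sketched as follows. This is an illustration, not the librbd code: `handle_group_result` is a hypothetical helper that takes a negative-errno return value.

```python
import errno


def handle_group_result(r):
    """Map the result of a hypothetical group-membership lookup.

    -EOPNOTSUPP means the cluster cannot answer the query, so the image is
    treated as belonging to no group; every other error is passed through.
    """
    if r == -errno.EOPNOTSUPP:
        return 0
    return r


result = handle_group_result(-errno.EOPNOTSUPP)
```

Only this one error code is swallowed, so genuine failures still stop the image open.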
Sage Weil [Sat, 16 Mar 2019 20:06:00 +0000 (15:06 -0500)]
mon/OSDMonitor: allow 'osd pool set pgp_num_actual'
Normally we let the mgr control pgp_num_actual for us in a nice, safe, controlled
way. However, it is very conservative, and only makes changes if all PGs are healthy.
There are situations where the user wants to be more aggressive than this.
For example, if you have a pool with many PGs (say, 4096) and set pg_num_target to a
small number like 4, the mgr will adjust pgp_num way down. This can lead to an OSD
hitting max_pgs_per_osd. That prevents the PGs from being active+clean, however,
which prevents the mgr adjusting pgp_num back up even if the user sets the target to
a larger value.
This patch lets the user directly adjust pgp_num_actual. Note that we still do
not expose access to pg_num_actual, since there are much stricter conditions that
must be true in order to safely make downward adjustments.
The stress-split thrasher already had this off, but the ec variant did
not. We don't support ceph-objectstore-tool exports/imports between major
versions.
Fixes: http://tracker.ceph.com/issues/38294
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Fri, 15 Mar 2019 17:24:52 +0000 (12:24 -0500)]
osd/PG: fix pg merge check for rc clusters
If a cluster had a pg merge pending before last_pg_merge_meta was
introduced then the source_pgid will be pg_t(). If that's the case,
skip these new checks.
Likewise, if we decode a legacy pg_pool_t, put the old merge les/lec
values into the correct location.
Sage Weil [Fri, 15 Mar 2019 17:08:34 +0000 (12:08 -0500)]
Merge PR #26965 into nautilus
* refs/pull/26965/head:
ms/async/ProtocolV2: add ms_die_on_bug and assert rxbuf/txbuf don't get big
msg/async/ProtocolV2: do not reenable pre_auth buffering on from reset_recv_state
Sage Weil [Fri, 15 Mar 2019 03:50:29 +0000 (22:50 -0500)]
msg/async/ProtocolV2: do not reenable pre_auth buffering on from reset_recv_state
This is specifically bad because we call reset_recv_state from
reuse_connection, which turns buffering back on on an already-authenticated
session.
Instead, reenable it only when we set the state to START_CONNECT. (On
the accepting side, it is a fresh connection, so it starts out true.)
Also, we want to *disable* it on the connection we are reusing, which
might be in a pre-auth state, while we are in a post-auth state.
Fixes: http://tracker.ceph.com/issues/38746
Signed-off-by: Sage Weil <sage@redhat.com>
Lenz Grimmer [Fri, 15 Mar 2019 09:38:00 +0000 (10:38 +0100)]
Merge pull request #26738 from votdev/fix_docs
mgr/dashboard: Fix issues in controllers/docs
Reviewed-by: Ernesto Puerta <epuertat@redhat.com>
Reviewed-by: Patrick Nawracay <pnawracay@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Reviewed-by: Tina Kallio <tina.kallio@gmail.com>