Sage Weil [Wed, 11 Jul 2018 12:10:28 +0000 (07:10 -0500)]
qa/standalone/osd/repro_long_log.sh: fix test
The log trimming case wasn't quite right. Before HEAD^ we were
rolling forward too aggressively and miscalculating the can_rollforward_to,
which affected the trim_to calculation.
Sage Weil [Wed, 11 Jul 2018 01:22:49 +0000 (20:22 -0500)]
osd/PG: do not blindly roll forward to log.head
If we are told we can roll forward by the primary, we should only roll
forward as far as the primary says we can.
This probably came from the similar case in append_log(). Notably, though,
the roll_forward() there only happens if !transaction_applied (i.e., on a
backfill target), and that condition is not checked here.
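A minimal sketch of the bound described in this commit, written in Python rather than the OSD's C++. Versions are modelled as (epoch, version) tuples, and the helper name roll_forward_target is hypothetical, not a Ceph API. Capping at the local log head is an assumption of the sketch.

```python
from typing import Tuple

# (epoch, version); Python compares tuples lexicographically.
Version = Tuple[int, int]


def roll_forward_target(log_head: Version, primary_roll_forward_to: Version) -> Version:
    """Return how far a replica may roll forward (hypothetical helper).

    The replica goes no further than the primary allows. Capping at the
    local log head is an assumption made for this sketch.
    """
    return min(log_head, primary_roll_forward_to)
```

With a log head of (5, 10) and a primary bound of (5, 7), the sketch stops at (5, 7) instead of jumping to the head.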
Matt Benjamin [Wed, 16 May 2018 17:04:55 +0000 (13:04 -0400)]
rgw: require --yes-i-really-mean-it to run radosgw-admin orphans find
Incorrect use of orphans find can lead to data loss. Warn users to be
extra cautious.
Fixes: http://tracker.ceph.com/issues/24146
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 3ff47c7f3eb5964464c8cd49144546ce532ed7f7)
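A minimal sketch of the confirmation-flag pattern this commit describes, using Python's argparse. It is not radosgw-admin's actual implementation: the program name, the single-word command name, the exit codes, and the messages are all hypothetical. Only the flag name comes from the commit.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for an admin CLI with one dangerous command.
    parser = argparse.ArgumentParser(prog="admin-sketch")
    parser.add_argument("command", choices=["orphans-find"])
    parser.add_argument("--yes-i-really-mean-it", action="store_true",
                        dest="confirmed")
    return parser


def run(argv: List[str]) -> Tuple[int, str]:
    """Return (exit_code, message); refuse the command unless confirmed."""
    args = build_parser().parse_args(argv)
    if args.command == "orphans-find" and not args.confirmed:
        return 1, "refusing to run; pass --yes-i-really-mean-it to proceed"
    return 0, "running orphans-find"
```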
Mykola Golub [Tue, 19 Jun 2018 12:05:27 +0000 (15:05 +0300)]
librbd: deep_copy: update end_size only if zero interval caused truncate
The problem showed up when striping was used and a nonexistent
destination object was assembled from source objects, one of which
existed while snap diff returned a zero interval. The non-zero
end_size in that case triggered an invalid object map update.
Sage Weil [Wed, 4 Jul 2018 19:19:04 +0000 (14:19 -0500)]
osd/PrimaryLogPG: rebuild attrs from clients
Ensure that buffers coming in via client ops are rebuilt before populating
the PGTransaction. This ensures that we don't pin the raw buffers for
the entire incoming message in memory.
In the past we've addressed this issue at the ObjectStore layer, but we
did not consider the attr_cache in ObjectContext. Rebuilding the buffers
at this point will sanitize any incoming attribute before it reaches
either PGBackend or ObjectContext object_cache or the ObjectStore
implementation.
Igor Fedotov [Thu, 5 Jul 2018 11:27:12 +0000 (14:27 +0300)]
os/bluestore: fix incomplete faulty range marking when doing compression GC.
Under some scenarios GC might process an extent range where some inner extents are left untouched by GC (as there is no need to touch them). Hence GC doesn't invalidate these inner extents with a fault_range call. If the untouched extents are mapped to unloaded shards, this results in a subsequent assertion in the o->extent_map.dirty_range() call.
The solution is to invalidate the whole extent range when doing GC.
Fixes: https://tracker.ceph.com/issues/23540
Fixes: https://tracker.ceph.com/issues/24799
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 0f04d4484c8663767bdb60f743a8835897013b5a)
John Spray [Fri, 29 Jun 2018 10:36:39 +0000 (11:36 +0100)]
mon: exception for dashboard in config-key warning
This warning went in with the expectation that nobody
would be using config-key commands for modules any more,
but the dashboard does use these in order to get the
"-i" functionality on the CLI for loading certs/keys.
In Nautilus they can switch to using "-i" on real module
commands, but for Mimic let's silence the warning for
the dashboard module.
Fixes: https://tracker.ceph.com/issues/24689
Signed-off-by: John Spray <john.spray@redhat.com>
Venky Shankar [Mon, 4 Jun 2018 09:34:58 +0000 (05:34 -0400)]
rbd-mirror: fix state transition table for disassociation
The final state transition when disassociating (removing) images
does not purge the image state map for a given image. This can
also result in uneven balance of images across instances as the
policy implementation relies on this structure to figure out
total number of images tracked.
Venky Shankar [Tue, 29 May 2018 05:45:40 +0000 (01:45 -0400)]
rbd-mirror: schedule rebalancer to level-load instances
The policy implementation takes care of evenly balancing images
across rbd mirror instances. This is done when images are
added to the map and/or instances are added or removed.
Image removal is the exception: removing images does not
reshuffle other (mapped) images, which can leave some of
the instances underloaded (in the worst case, if one removes
images which all map to a particular instance, that instance
would remain idle until more images are added or a shuffle is
triggered).
We could possibly trigger map shuffle when images are removed,
but that would change the interface between Policy and ImageMap
class (in the form of changes to Policy::remove_images()). Also,
policy (and its implementations) would have to do more work when
the above class method is invoked.
Therefore, an interval based rebalancer is added to ImageMap for
periodic rebalancing of images only if the following conditions
are met:
- policy has been idle for a configured time duration
- no scheduled or in-transit operations
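A minimal sketch of the two conditions listed above, in Python rather than rbd-mirror's C++. The PolicyState fields and the should_rebalance name are hypothetical stand-ins for whatever ImageMap actually tracks. The idle threshold is passed in because the commit only says it is configured.

```python
from dataclasses import dataclass


@dataclass
class PolicyState:
    # Hypothetical fields; not the real ImageMap members.
    idle_seconds: float   # time since the policy last did any work
    scheduled_ops: int    # operations queued but not yet started
    in_transit_ops: int   # operations currently in flight


def should_rebalance(state: PolicyState, idle_threshold_seconds: float) -> bool:
    """True only when the policy has been idle long enough and nothing is pending."""
    return (state.idle_seconds >= idle_threshold_seconds
            and state.scheduled_ops == 0
            and state.in_transit_ops == 0)
```

A periodic timer would evaluate this predicate on each tick and trigger a shuffle only when it returns True.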
Neha Ojha [Wed, 20 Jun 2018 17:20:58 +0000 (13:20 -0400)]
osd/PG: restrict async_recovery_targets to up osds
When an osd that is part of the acting set but not the up set gets chosen
as an async_recovery_target, it gets removed from the acting set. Since this
osd is no longer in the up or acting set, it is classified as a stray in
the next peering cycle. This results in choose_acting() looping between two
proposed acting sets.
To avoid this, we will only choose up osds as async_recovery_targets.
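A minimal sketch of the restriction, in Python rather than the OSD's C++. OSDs are modelled as plain integer ids, and pick_async_recovery_targets is a hypothetical name. The real selection in choose_acting() weighs more than membership in the up set.

```python
from typing import Iterable, List, Set


def pick_async_recovery_targets(candidates: Iterable[int], up: Set[int]) -> List[int]:
    """Keep only candidates that are in the up set (hypothetical helper).

    An osd outside the up set would be dropped from acting and then treated
    as a stray on the next peering cycle, so it is never selected here.
    """
    return [osd for osd in candidates if osd in up]
```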
Neha Ojha [Wed, 30 May 2018 18:33:41 +0000 (11:33 -0700)]
PG: do not choose stray osds as async_recovery_targets
Without this change, we might accept stray osds as async_recovery_targets,
and would then need to ensure that they get a chance to become part of the
acting set after recovery is over.
However, when choose_acting() is called in the Recovered state, we set
restrict_to_up_acting=true, which does not allow them to get back to the
acting set.
Therefore, similar to backfill, do not allow stray osds to become
async_recovery_targets.
Stephan Müller [Mon, 14 May 2018 13:11:27 +0000 (15:11 +0200)]
mgr/dashboard: Format small numbers correctly
The issue was triggered by numbers that are lower than 1.
Taking the logarithm of a number lower than 1 yields a
negative value that is not handled anywhere in the formatter service. As
a result, the final value will be quirky.
The negative number will also be used as an index into the units array,
where it will return "undefined".
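A minimal sketch of the failure and one possible guard, in Python rather than the dashboard's TypeScript. The unit list, the function names, and the choice to clamp the index to 0 are assumptions of the sketch, not the actual fix. Note that in Python a negative index wraps around to the end of the list instead of returning "undefined", which would be just as wrong.

```python
import math

UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]  # illustrative unit list


def naive_unit_index(n: float, divisor: int = 1024) -> int:
    """Unit index without any guard; negative for 0 < n < 1."""
    return math.floor(math.log(n) / math.log(divisor))


def unit_index(n: float, divisor: int = 1024) -> int:
    """Unit index clamped to the valid range of UNITS (assumed guard)."""
    if n < 1:
        return 0  # also avoids math.log(0), which raises ValueError
    return min(naive_unit_index(n, divisor), len(UNITS) - 1)


def format_number(n: float, divisor: int = 1024, decimals: int = 1) -> str:
    """Scale n by the chosen unit and append the unit label."""
    idx = unit_index(n, divisor)
    value = n / divisor ** idx
    return f"{round(value, decimals):g} {UNITS[idx]}"
```

For 0.5 the unguarded index is -1, while the guarded formatter keeps the value in the base unit.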
Sage Weil [Tue, 26 Jun 2018 02:18:01 +0000 (21:18 -0500)]
osd/PG: do not send notify to empty peer
This is mostly paranoia to avoid doing something clearly silly if the
add_source_info() implementation incorrectly decided we found something
new (as it did in http://tracker.ceph.com/issues/24588).
Sage Weil [Tue, 26 Jun 2018 02:08:48 +0000 (21:08 -0500)]
osd/PG: do not assume delete event means found_missing
This condition was introduced in 3a9d056d843bcafd26d78950b84e2844f8a3a9a1
as part of the missing deletes series, without a clear motivation. The
best guess is that it was either compensating for some other unfound bug
or simply being a bit overaggressive.
The problem is that it triggers a notify being sent to the sender, or
restarts recovery, both of which are overreactions.
Jianpeng Ma [Tue, 3 Jul 2018 07:11:07 +0000 (15:11 +0800)]
os/bluestore: set correctly shard for existed Collection.
For an existing Collection, the Collection constructor is called in _open_collections.
But m_finisher_num has not been set up at that point when bluestore_shard_finishers is enabled.
So move the m_finisher_num setup before _open_collections and _kv_start.
Fixes: http://tracker.ceph.com/issues/24761
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
(cherry picked from commit 42cd25b794d2a2c04e96a24abea7f773bb7a3c2e)