Sage Weil [Mon, 9 Jul 2018 18:26:39 +0000 (13:26 -0500)]
global/global_init: fix stdout/stderr/stdin closing for daemonization
The global_init_postfork/prefork helpers close stdout/stdin/stderr on
fork and reopen /dev/null in their place. This ensures that if later
code writes to those descriptors (e.g., a stray cout or cerr usage) the
output/input will go nowhere instead of interfering with some other open
fd.
However, with the use of preforker, there are other threads running when
these helpers are run, which means we can race with, say, filestore
opening an object file and end up sending log output there.
Fix by atomically replacing the fds with the dup2(2) syscall, which
will implicitly close and reopen the target fd in an atomic fashion. This
behavior is present on both Linux and FreeBSD.
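The dup2(2) approach described above can be sketched roughly as follows. This is a minimal Python illustration, not the Ceph C++ code; the helper name replace_fd_with_devnull is hypothetical.

```python
import os


def replace_fd_with_devnull(target_fd):
    """Point target_fd at /dev/null using a single dup2() call.

    Hypothetical helper for illustration only.
    """
    devnull_fd = os.open(os.devnull, os.O_RDWR)
    if devnull_fd == target_fd:
        # target_fd was already closed and open() reused its number.
        return
    try:
        # dup2 closes target_fd (if open) and rebinds the same number in
        # one step, so another thread cannot be handed that number by a
        # concurrent open() in between.
        os.dup2(devnull_fd, target_fd)
    finally:
        # The temporary descriptor is no longer needed.
        os.close(devnull_fd)
```

With a separate close() followed by open(), another thread could receive the freed descriptor number in the gap, which is the race the commit message describes.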
Matt Benjamin [Wed, 16 May 2018 17:04:55 +0000 (13:04 -0400)]
rgw: require --yes-i-really-mean-it to run radosgw-admin orphans find
Incorrect use of orphans find can lead to data loss. Warn users to be
extra cautious.
Fixes: http://tracker.ceph.com/issues/24146
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 3ff47c7f3eb5964464c8cd49144546ce532ed7f7)
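The confirmation-flag pattern used for orphans find in the commit above can be sketched as below. This is an illustration with Python's argparse, not radosgw-admin's actual argument handling (which is C++); the function name run_admin, the program name, and the messages are assumptions.

```python
import argparse


def run_admin(argv):
    """Return (exit_code, message) for a hypothetical admin command line."""
    parser = argparse.ArgumentParser(prog="admin-sketch")
    parser.add_argument("command", nargs="+")
    parser.add_argument("--yes-i-really-mean-it", action="store_true")
    args = parser.parse_args(argv)

    dangerous = args.command == ["orphans", "find"]
    if dangerous and not args.yes_i_really_mean_it:
        # Refuse to run the risky command without explicit confirmation.
        return 1, "refusing to run without --yes-i-really-mean-it"
    return 0, "ok"
```

For example, run_admin(["orphans", "find"]) returns a non-zero code, while adding "--yes-i-really-mean-it" to the list lets the command proceed.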
Sage Weil [Wed, 4 Jul 2018 19:19:04 +0000 (14:19 -0500)]
osd/PrimaryLogPG: rebuild attrs from clients
Ensure that buffers coming in via client ops are rebuilt before populating
the PGTransaction. This ensures that we don't pin the raw buffers for
the entire incoming message in memory.
In the past we've addressed this issue at the ObjectStore layer, but we
did not consider the attr_cache in ObjectContext. Rebuilding the buffers
at this point will sanitize any incoming attribute before it reaches
either PGBackend or ObjectContext object_cache or the ObjectStore
implementation.
Igor Fedotov [Thu, 5 Jul 2018 11:27:12 +0000 (14:27 +0300)]
os/bluestore: fix incomplete faulty range marking when doing compression GC.
Under some scenarios GC might process an extent range where some inner
extents are left untouched by GC (as there is no need to touch them). Hence
GC doesn't invalidate these inner extents with a fault_range call. If the
untouched extents are mapped to unloaded shards, this results in a
subsequent assertion failure in the o->extent_map.dirty_range() call.
The solution is to invalidate the whole extent range when doing GC.
Fixes: https://tracker.ceph.com/issues/23540
Fixes: https://tracker.ceph.com/issues/24799
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
(cherry picked from commit 0f04d4484c8663767bdb60f743a8835897013b5a)
John Spray [Fri, 29 Jun 2018 10:36:39 +0000 (11:36 +0100)]
mon: exception for dashboard in config-key warning
This warning went in with the expectation that nobody
would be using config-key commands for modules any more,
but the dashboard does use these in order to get the
"-i" functionality on the CLI for loading certs/keys.
In Nautilus they can switch to using "-i" on real module
commands, but for Mimic let's silence the warning for
the dashboard module.
Fixes: https://tracker.ceph.com/issues/24689
Signed-off-by: John Spray <john.spray@redhat.com>
Neha Ojha [Wed, 20 Jun 2018 17:20:58 +0000 (13:20 -0400)]
osd/PG: restrict async_recovery_targets to up osds
When an osd that is part of the acting set but not the up set gets chosen
as an async_recovery_target, it gets removed from the acting set. Since this
osd is no longer in the up or acting set, it is classified as a stray in
the next peering cycle. This results in choose_acting() looping between two
proposed acting sets.
To avoid this, we will only choose up osds as async_recovery_targets.
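The restriction described above amounts to filtering the candidate set by membership in the up set. A minimal sketch, assuming OSDs are plain integer ids; the function name is hypothetical, and the real selection in choose_acting() considers more than membership.

```python
def restrict_async_recovery_targets(candidates, up):
    """Keep only candidate OSD ids that are members of the up set."""
    up_set = set(up)
    # An OSD that is only in the acting set is dropped here, so it is never
    # removed from acting and later classified as a stray.
    return {osd for osd in candidates if osd in up_set}
```

For instance, with up = [0, 1, 2] and candidates {2, 3}, only OSD 2 remains eligible.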
Neha Ojha [Wed, 30 May 2018 18:33:41 +0000 (11:33 -0700)]
PG: do not choose stray osds as async_recovery_targets
Without this change, we might accept stray osds as async_recovery_targets,
and need to ensure that they get a chance to become part of the acting set
after recovery is over.
However, when choose_acting() is called in the Recovered state, we set
restrict_to_up_acting=true, which does not allow them to get back to the
acting set.
Therefore, similar to backfill, do not allow stray osds to become
async_recovery_targets.
Stephan Müller [Mon, 14 May 2018 13:11:27 +0000 (15:11 +0200)]
mgr/dashboard: Format small numbers correctly
The issue was triggered by numbers that are lower than 1.
Taking the logarithm of a number lower than 1 yields a negative value
that is not handled anywhere in the formatter service; as a result, the
final value will be formatted incorrectly.
The negative number will also be used as an index into the units array,
where it will return "undefined".
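The failure mode above, and one way to guard against it, can be sketched as follows. This is a Python illustration, not the dashboard's TypeScript formatter; the unit table, base, and function names are assumptions.

```python
import math

UNITS = ["", "k", "M", "G", "T"]  # hypothetical unit table


def unit_index(value, base=1000, units=UNITS):
    """Pick the unit index, clamping so values below 1 use the first unit."""
    if value < 1:
        # log(value) is negative (or undefined) here, so skip it entirely.
        return 0
    index = int(math.floor(math.log(value, base)))
    return min(index, len(units) - 1)


def format_number(value, base=1000, units=UNITS):
    """Scale value by the chosen unit and render it with two decimals."""
    index = unit_index(value, base, units)
    scaled = value / base ** index
    return f"{scaled:.2f} {units[index]}".strip()
```

Without the guard, math.floor(math.log(0.5, 1000)) is -1. In Python a negative index wraps around to the last unit rather than yielding "undefined", but the result is wrong in either language.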
Sage Weil [Tue, 26 Jun 2018 02:18:01 +0000 (21:18 -0500)]
osd/PG: do not send notify to empty peer
This is mostly paranoia to avoid doing something clearly silly if the
add_source_info() implementation incorrectly decided we found something
new (as it did in http://tracker.ceph.com/issues/24588).
Sage Weil [Tue, 26 Jun 2018 02:08:48 +0000 (21:08 -0500)]
osd/PG: do not assume delete event means found_missing
This condition was introduced in 3a9d056d843bcafd26d78950b84e2844f8a3a9a1
as part of the missing deletes series, without a clear motivation. The
best guess is that it was either compensating for some other unfound bug
or simply being a bit overaggressive.
The problem is that it triggers a notify being sent to the sender, or
restarts recovery, both of which are overreactions.
Jianpeng Ma [Tue, 3 Jul 2018 07:11:07 +0000 (15:11 +0800)]
os/bluestore: set correctly shard for existed Collection.
For an existing Collection, the Collection constructor is called in _open_collections.
But m_finisher_num is not yet set up at that point when bluestore_shard_finishers is enabled.
So move the m_finisher_num setup before _open_collections and _kv_start.
Fixes: http://tracker.ceph.com/issues/24761
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
(cherry picked from commit 42cd25b794d2a2c04e96a24abea7f773bb7a3c2e)
Conflicts:
src/pybind/mgr/dashboard/frontend/src/app/ceph/rgw/rgw-user-form/rgw-user-form.component.ts
(retain mimic "Create an observable to add the S3 key when the form is submitted" logic
but add whitespace so it blends in with the rest of the refactor)
This implements a configurable authentication order, currently used only for s3
authentication and only supporting external & local authentication, though there
is potential for more fine-grained control by allowing for a map of various
engines and the control strategy (required vs sufficient vs fallback).
The current implementation just sets the control to fallback if the
engine is the last in the order (and hence the stack) and sets sufficient for
every other element, so that errors from the last sufficient engine are returned.
The configuration option is rgw_s3_auth_order, which takes a comma/space separated
list of authentication engines where currently we support the keywords `external`
and `local`.
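The ordering rule described above (last engine is fallback, every other engine is sufficient) can be sketched like this. It is a Python illustration of the rule, not the rgw C++ implementation; the function name and the rejection of unknown keywords are assumptions.

```python
import re

KNOWN_ENGINES = ("external", "local")  # keywords named in the text above


def parse_auth_order(value):
    """Turn a comma/space separated list into (engine, control) pairs."""
    names = [name for name in re.split(r"[,\s]+", value.strip()) if name]
    for name in names:
        if name not in KNOWN_ENGINES:
            # Assumption: unknown keywords are rejected in this sketch.
            raise ValueError(f"unknown auth engine: {name}")
    last = len(names) - 1
    # The last engine in the order gets "fallback"; all others "sufficient".
    return [
        (name, "fallback" if i == last else "sufficient")
        for i, name in enumerate(names)
    ]
```

For example, parse_auth_order("external, local") yields [("external", "sufficient"), ("local", "fallback")].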
Andrew Schoen [Tue, 3 Jul 2018 11:45:24 +0000 (06:45 -0500)]
ceph-volume: always ignore a missing ceph conf in main.py
Now that we have a nice error message when a ceph.conf is missing
and we try to use values from it, maintaining a list of commands that
don't need ceph.conf isn't as helpful. We had actually missed 'simple
trigger' when we first implemented this, causing all our luminous tests
for simple to fail when we backported.
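The idea of tolerating a missing ceph.conf until a value is actually needed can be sketched as below. This is not ceph-volume's real code; the MissingConf class, load_conf function, and error text are hypothetical.

```python
import configparser
import os


class MissingConf:
    """Placeholder returned when no configuration file exists."""

    def __init__(self, path):
        self.path = path

    def get(self, section, option):
        # Fail only when a value is actually requested.
        raise RuntimeError(
            f"no configuration loaded from {self.path}; "
            f"cannot read [{section}] {option}"
        )


def load_conf(path):
    """Load the file if present, otherwise defer the error to first use."""
    if not os.path.exists(path):
        return MissingConf(path)
    parser = configparser.ConfigParser()
    parser.read(path)
    return parser
```

With this shape, no per-command allowlist is needed: commands that never read a value run fine, and those that do get a clear error at the point of use.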
Sage Weil [Mon, 4 Jun 2018 17:51:11 +0000 (12:51 -0500)]
osd/PrimaryLogPG: fix on_local_recover crash on stray clone
If there is a stray clone (one that does not appear in the SnapSet) and
we do any sort of recovery on it, the OSD will crash. Log an error instead
and continue.
This addresses a problem where a cluster has (1) an unexpected clone
that (2) is not present on all replicas. Doing repair on that
PG will both not fix the unexpected clone and also cause the remaining
OSDs to crash trying to recover it.