This was only needed for luminous and mimic. We are keeping these commits in master (and reverting them) only so that the cherry-pick -x references work.
Sage Weil [Thu, 12 Jul 2018 03:01:42 +0000 (22:01 -0500)]
Merge PR #22974 into master
* refs/pull/22974/head:
qa/standalone/osd/ec-error-rollforward: reproduce bug 24597
qa/suites/rados/thrash-erasure-code: add many deletes workload
qa/standalone/osd/repro_long_log.sh: fix test
osd/PG: do not blindly roll forward to log.head
Sage Weil [Wed, 11 Jul 2018 13:18:34 +0000 (08:18 -0500)]
qa/suites/rados/thrash-erasure-code: add many deletes workload
Having lots of deletes will mean deletes on objects that don't exist,
which will in turn mean error log entries and more coverage of the
append_log_entries_update_missing code. Hopefully this will trigger
http://tracker.ceph.com/issues/24597
Sage Weil [Wed, 11 Jul 2018 12:10:28 +0000 (07:10 -0500)]
qa/standalone/osd/repro_long_log.sh: fix test
The log trimming case wasn't quite right. Before HEAD^ we were
rolling forward too aggressively and miscalculating the can_rollforward_to,
which affected the trim_to calculation.
Sage Weil [Wed, 11 Jul 2018 01:22:49 +0000 (20:22 -0500)]
osd/PG: do not blindly roll forward to log.head
If we are told we can roll forward by the primary, we should only roll
forward as far as the primary says we can.
This probably came out of the similar case in append_log(), but notably
that roll_forward() only happens if !transaction_applied (i.e., backfill
target), and that condition is not checked here.
Fixes: http://tracker.ceph.com/issues/24597
Signed-off-by: Sage Weil <sage@redhat.com>
Merge pull request #22593 from wido/mgr-dashboard-ssl
mgr/dashboard: Add option to disable SSL
Reviewed-by: John Spray <john.spray@redhat.com>
Reviewed-by: Ricardo Dias <rdias@suse.com>
Reviewed-by: Sebastian Wagner <swagner@suse.com>
Reviewed-by: Tatjana Dehler <tdehler@suse.com>
Reviewed-by: Volker Theile <vtheile@suse.com>
Although SSL is preferred and should be enabled by default, users might
want to disable it, as the dashboard might be running behind a proxy
which terminates the SSL.
Fixes: https://tracker.ceph.com/issues/24674
Signed-off-by: Wido den Hollander <wido@42on.com>
Before this commit, the following race can happen:
```
A : OpTracker::visit_ops_in_flight(..., callable leaking TrackedOpRef outside)
A : Mutex::Locker::Locker(sdata->ops_in_flight_lock_sharded)
A with lock : (nref > 0) == true
B : TrackedOp::put(), nref := 0 // updating the counter is done without the lock
B : OpTracker::unregister_inflight_op()
B : Mutex::Locker::Locker(sdata->ops_in_flight_lock_sharded)
A with lock : visit() -> TrackedOp::get(), nref := 1
A with lock : Mutex::Locker::~Locker()
B with lock : boost::intrusive::list::iterator_to(op)
B with lock : boost::intrusive::list::erase(iter)
B with lock : Mutex::Locker::~Locker()
A : TrackedOp::put(), nref := 0
A : OpTracker::unregister_inflight_op()
A : Mutex::Locker::Locker(sdata->ops_in_flight_lock_sharded)
A with lock : boost::intrusive::list::iterator_to(op) // oops as op doesn't belong to the list anymore
```
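One common way to close a window like the one in the trace is to refuse to take a new reference once the count has already reached zero, so a visitor cannot resurrect an op that another thread is about to unregister. The sketch below illustrates only that idea. `RefCounted`, `try_get()` and this `put()` are hypothetical names, not Ceph's `TrackedOp` API, and this is not necessarily the change made by the commit.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a reference-counted op; not Ceph's TrackedOp.
struct RefCounted {
  std::atomic<int> nref{1};

  // Take a reference only if the object is still alive (nref > 0).
  // Returns false if the count already hit zero, meaning another thread
  // is tearing the object down and the caller must not touch it.
  bool try_get() {
    int seen = nref.load();
    while (seen > 0) {
      // On failure compare_exchange_weak reloads `seen`, and the loop
      // condition re-checks that the count is still positive.
      if (nref.compare_exchange_weak(seen, seen + 1)) {
        return true;
      }
    }
    return false;
  }

  // Drop a reference; returns true when this call released the last one.
  bool put() {
    int prev = nref.fetch_sub(1);
    assert(prev > 0);
    return prev == 1;
  }
};
```

With this shape, step "A with lock : visit() -> TrackedOp::get(), nref := 1" in the trace would fail instead of succeeding, and only thread B would go on to unregister the op.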
common,rbd,rgw,osd: extract config values into ConfigValues
this change introduces three classes: ConfigValues, ConfigProxy and
ConfigReader. in seastar port of OSD, each CPU shard will hold its own
reference of configuration, and upon changes of settings, each
shard will be updated with the new setting asynchronously. so this forces
us to be able to keep two sets of configuration at the same time. so we
need to extract the changeable part of md_config_t out. so we can
replace the old one with new one on demand, and let different shards
share the same unchanged part, among other things, the Options map
and the lookup tables. that's why we need ConfigValues. we will add
a policy template for this class, so we can specialize for Seastar
implementation to allow different ConfigProxy instances to point
md_config_impl<> to different ConfigValues.
because the observer interface is still using md_config_t, to minimise
the impact of this change, handle_conf_change() and
handle_subsys_change() are not changed. but as they accept a `const
md_config_t`, which cannot be used to create/reference the ConfigProxy
holding it, we need to introduce ConfigReader for reading the updated
setting from md_config_t in a simpler way, without exposing the
internal "values" member variable.
John Spray [Tue, 10 Jul 2018 09:41:52 +0000 (10:41 +0100)]
doc/cephfs: make scary DR bits less prominent
I'm sure people will still find them, but let's at least
force people to click through one more time to get to the
commands that can damage your cluster.
Also, the ".. danger" directive at the top of the page
wasn't actually getting special formatting, so I changed
it to a ".. warning" which is red.
Sage Weil [Sun, 8 Jul 2018 16:00:12 +0000 (11:00 -0500)]
mon/OSDMonitor: add 'osd repeer <pgid>' command
Selectively force peering on a single PG. In reality this probably induces
*2* interval changes.
Note that in the case of a single OSD cluster we can't actually force a
repeer on a single PG because the pg_temp code is pretty robust about
filtering out redundant or meaningless changes, so we can't pg_temp our
way into a new interval if there are no other OSDs to switch to and the
code also prevents an empty pg_temp.
Sage Weil [Mon, 9 Jul 2018 18:26:39 +0000 (13:26 -0500)]
global/global_init: fix stdout/stderr/stdin closing for daemonization
The global_init_postfork/prefork helpers close stdout/stdin/stderr on
fork and reopen /dev/null in their place. This ensures that if later
code writes to those descriptors (e.g., a stray cout or cerr usage) the
output/input will go nowhere instead of interfering with some other open
fd.
However, with the use of preforker, there are other threads running when
these helpers are run, which means we can race with, say, filestore
opening an object file and end up sending log output there.
Fix by atomically replacing the fds with the dup2(2) syscall, which
will implicitly close and reopen the target fd in an atomic fashion. This
behavior is present on both Linux and FreeBSD.
Fixes: http://tracker.ceph.com/issues/23492
Signed-off-by: Sage Weil <sage@redhat.com>
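The dup2(2) approach described in this message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `redirect_to_dev_null()` is a hypothetical helper name, and the real helpers may differ in open flags and error handling.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not a Ceph function): make `target_fd` refer to
// /dev/null without a window in which `target_fd` is closed and could be
// reused by an open() in another thread.
// Returns 0 on success, -errno on failure.
int redirect_to_dev_null(int target_fd)
{
  assert(target_fd >= 0);
  int null_fd = ::open("/dev/null", O_RDWR);
  if (null_fd < 0) {
    return -errno;
  }
  if (null_fd == target_fd) {
    // target_fd was already closed and open() reused its number.
    return 0;
  }
  // dup2() closes target_fd (if open) and repoints it in one atomic step.
  int r = ::dup2(null_fd, target_fd);
  int err = errno;
  ::close(null_fd);  // the temporary descriptor is no longer needed
  return r < 0 ? -err : 0;
}
```

Calling it for STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO in turn gives the behavior described above. A separate close() followed by open() would leave a gap in which another thread could be handed the same descriptor number.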
cmake: should link against libatomic if libcxx/libstdc++ does not offer atomic ops
for instance, GCC-8 on riscv64 does not offer atomic ops like
__atomic_fetch_or_1, so we need to link against libatomic to get access
to these symbols.
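A configure-time probe for this situation usually tries to compile and link a small piece of code that uses sub-word atomics. The sketch below shows what such code might exercise. `exercise_small_atomics()` is a hypothetical name, and this is not necessarily the test source used by the commit.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical probe body. On targets without native 1-byte atomic
// instructions, the compiler may lower fetch_or on a std::atomic<uint8_t>
// to a call to __atomic_fetch_or_1, which only links if libatomic is on
// the link line.
std::uint8_t exercise_small_atomics()
{
  std::atomic<std::uint8_t> flags{0x01};
  std::uint8_t previous = flags.fetch_or(0x10);  // returns the old value
  assert(previous == 0x01);

  std::atomic<std::uint64_t> wide{0};
  wide.fetch_add(1);  // also exercise an 8-byte atomic

  return flags.load();
}
```

If code like this links without extra libraries, nothing more is needed; if the link fails, retrying with libatomic added tells the build whether that library supplies the missing symbols.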
Since run-make-check.sh already ensures that ccache is installed,
it makes sense to let everyone benefit from the ccache
tweaks introduced by 4cb5a590537a9caaf61db42ce8ea123d2ab961f3
Note 1: The previous solution using "date" would cause build tools to reset
their timestamps after 24 hours, on subsequent runs of run-make-check.sh.
In order to maximize ccache effectiveness, this commit sets SOURCE_DATE_EPOCH
to a fixed value: the number of seconds elapsed between the Unix epoch and
January 1, 2000 (chosen to commemorate Y2K armageddon).
Note 2: this commit introduces "set -e". This was actually in effect
before, via "source install-deps.sh". Better to make it explicit.
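The fixed SOURCE_DATE_EPOCH value mentioned in Note 1 can be worked out by counting days from 1970 to 2000. The sketch below does that arithmetic, assuming the intended instant is 00:00:00 UTC on January 1, 2000; the commit message does not state the time zone or the numeric value, and the function names are hypothetical.

```cpp
#include <cstdint>

// Gregorian leap-year rule.
constexpr bool is_leap(int year)
{
  return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

// Seconds from 1970-01-01 00:00:00 UTC to January 1 of `year`, 00:00:00 UTC.
// Unix time ignores leap seconds, so every day counts as 86400 seconds.
constexpr std::int64_t seconds_to_jan1(int year)
{
  std::int64_t days = 0;
  for (int y = 1970; y < year; ++y) {
    days += is_leap(y) ? 366 : 365;
  }
  return days * 86400;
}

// 30 years with 7 leap days (1972 through 1996): 10957 days.
static_assert(seconds_to_jan1(2000) == 946684800,
              "January 1, 2000 UTC as a Unix timestamp");
```

Because the value never changes between runs, compiler output that embeds the build date stays byte-identical, which is what lets ccache keep hitting.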
9b80b14783ef895390b4153320078661627f373d extended that check to also
include RECOVERY_DELETES. The Mimic release notes do not mention these
flags as prerequisites for an upgrade beyond Luminous. That creates an
obvious issue for users who skipped Luminous in production, and now
want to upgrade from Jewel to Mimic in, say, a weekend.
Update the release notes to include those flags as prerequisites for a
Luminous to Mimic upgrade, explain how users can make sure that they
are set, and also give users a one-liner to fix up their PGs in a
pinch, if they need to.