Owen Synge [Wed, 11 Feb 2015 17:18:05 +0000 (18:18 +0100)]
Support ceph cluster names with systemd
A systemd unit file can only ever support one cluster name, because unit
files do not support dynamically gathered variables. For this reason
we provide multiple systemd units for different cluster names.
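One way to get per-cluster units out of a single file is a systemd template unit, where the instance name after the '@' carries the cluster name. A minimal hypothetical sketch (not necessarily the exact units added by this commit):

    # /usr/lib/systemd/system/ceph-mon@.service -- hypothetical template;
    # the systemd instance specifier %i stands in for the cluster name.
    [Unit]
    Description=Ceph monitor daemon for cluster %i
    After=network-online.target

    [Service]
    ExecStart=/usr/bin/ceph-mon --cluster %i -f --id %H
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Enabling ceph-mon@prod.service and ceph-mon@test.service would then run monitors for two differently named clusters from one unit file.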
Owen Synge [Mon, 26 Jan 2015 15:20:20 +0000 (16:20 +0100)]
New rich init system detection.
Uses both a database and detection of management commands to find the init system.
Logs an error if one of these two methods fails.
Raises an error if the two methods disagree.
Testing notes:
- works on SLE12
- works on openSUSE 13.1
- works on Scientific 6.4
- works on Debian 7.7 (wheezy)
- works on Debian 8 (jessie)
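The cross-check is the interesting part; a minimal C++ sketch of the idea (the real implementation lives in Ceph's tooling, and all names here are hypothetical):

    // Detect the init system two independent ways and cross-check them.
    #include <cstdlib>
    #include <iostream>
    #include <map>
    #include <stdexcept>
    #include <string>

    // Method 1: a static database keyed by distro release.
    std::string detect_from_database(const std::string& distro) {
        static const std::map<std::string, std::string> db = {
            {"sle12", "systemd"}, {"wheezy", "sysvinit"}, {"jessie", "systemd"},
        };
        auto it = db.find(distro);
        return it == db.end() ? "" : it->second;
    }

    // Method 2: probe for the management commands themselves.
    std::string detect_from_commands() {
        if (std::system("command -v systemctl >/dev/null 2>&1") == 0)
            return "systemd";
        if (std::system("command -v initctl >/dev/null 2>&1") == 0)
            return "upstart";
        if (std::system("command -v service >/dev/null 2>&1") == 0)
            return "sysvinit";
        return "";
    }

    std::string detect_init(const std::string& distro) {
        const std::string a = detect_from_database(distro);
        const std::string b = detect_from_commands();
        if (a.empty() || b.empty())
            std::cerr << "error: one detection method failed\n";  // log, go on
        if (!a.empty() && !b.empty() && a != b)
            throw std::runtime_error("init system detection methods disagree");
        return a.empty() ? b : a;
    }

    int main() { std::cout << detect_init("jessie") << "\n"; }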
Owen Synge [Fri, 23 Jan 2015 11:04:37 +0000 (12:04 +0100)]
Changed prestart script path
Ceph was the only application I could find on SUSE which used the path
/usr/libexec, so I changed this to the /usr/lib/ path:
/usr/lib/ceph/ceph-osd-prestart.sh
Owen Synge [Mon, 12 Jan 2015 13:35:41 +0000 (14:35 +0100)]
Fixes to rcceph script
- only starts OSDs if mon daemons are also present
- adds support for mask and unmask
- removes support for clusters with a non-default cluster name,
as this was very limited and inconsistent
Owen Synge [Wed, 7 Jan 2015 10:36:24 +0000 (11:36 +0100)]
radosgw systemd support
Added radosgw systemd support and an associated prestart script.
- With improved checking over the first revision.
- ceph-radosgw-prestart.sh is now installed in /usr/lib/ceph-radosgw
Owen Synge [Wed, 3 Dec 2014 11:32:34 +0000 (12:32 +0100)]
Fix overflowing journal partitions.
This fixes bnc#896406. When using ceph-disk to create a journal
partition in the next available partition and there is not enough
space, ceph-disk did not provide a clear error message.
ceph-disk detects the OS and version and from this decides
whether to use SysV, systemd, or upstart. This code needs a bigger
rewrite, so for now just explicitly tell ceph-disk the
init system.
Owen Synge [Thu, 7 Aug 2014 09:23:09 +0000 (11:23 +0200)]
Fix "disk zap" sgdisk invocation
If the metadata on the disk is truly invalid, sgdisk would fail to zero
it in one go, because --mbrtogpt apparently tried to operate on the
metadata it read before executing --zap-all.
Splitting this up into two separate invocations to first zap everything
and then clear it properly fixes this issue.
Based on patch by Lars Marowsky-Bree <lmb@suse.com> in ceph-deploy.
Created by Vincent Untz <vuntz@suse.com>
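In outline, the fixed zap is a two-step sequence; a minimal C++ sketch (the real change lives in ceph-disk, and error handling here is reduced to the essentials):

    // Zap a disk in two separate sgdisk invocations: first destroy all
    // GPT/MBR metadata, then lay down a fresh GPT once nothing stale can
    // be read back.
    #include <cstdlib>
    #include <stdexcept>
    #include <string>

    void zap_disk(const std::string& dev) {
        // Step 1: wipe GPT and MBR data structures outright.
        if (std::system(("sgdisk --zap-all -- " + dev).c_str()) != 0)
            throw std::runtime_error("sgdisk --zap-all failed on " + dev);
        // Step 2: in a separate invocation, clear and rebuild a clean GPT
        // (converting any MBR remnants) now that --zap-all has already run.
        if (std::system(("sgdisk --clear --mbrtogpt -- " + dev).c_str()) != 0)
            throw std::runtime_error("sgdisk --clear failed on " + dev);
    }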
Sage Weil [Fri, 10 Apr 2015 15:43:45 +0000 (08:43 -0700)]
crush: fix has_v4_buckets()
alg, not type!
This bug made us incorrectly think we were using v4 features when user type
5 was being used. That's currently 'rack' with recent crush maps, but
was other types for clusters that were created with older versions. This
is clearly problematic as it will lock out non-hammer clients incorrectly,
breaking deployments on upgrade.
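The one-line nature of the fix is worth showing; a minimal self-contained sketch of the corrected check, with the crush structures simplified:

    // v4-ness is a property of the bucket's algorithm (alg), not of its
    // user-defined placement type. CRUSH_BUCKET_STRAW2 is 5, which collided
    // with user type 5 ('rack' on recent maps) when type was tested instead.
    #include <vector>

    constexpr int CRUSH_BUCKET_STRAW2 = 5;  // straw2 bucket algorithm id

    struct crush_bucket {
        int type;  // user-defined hierarchy type (host, rack, ...)
        int alg;   // bucket algorithm (uniform, list, tree, straw, straw2)
    };

    bool has_v4_buckets(const std::vector<crush_bucket*>& buckets) {
        for (const crush_bucket* b : buckets) {
            if (!b)
                continue;
            if (b->alg == CRUSH_BUCKET_STRAW2)  // was: b->type -- the bug
                return true;
        }
        return false;
    }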
Guang Yang [Thu, 26 Feb 2015 08:13:12 +0000 (08:13 +0000)]
osd: fix negative degraded objects during backfilling
When there are delete requests during backfilling, the reported number of degraded
objects can be negative, as the primary's num_objects is the latest (locally) but
the numbers for replicas might not reflect the deletes. A simple fix is to ignore
the negative subtracted value.
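A minimal sketch of the clamp (hypothetical helper, not the actual ReplicatedPG code):

    // When a replica's stats lag the primary's because deletes have not yet
    // been applied there, the subtraction can go negative; report zero
    // degraded objects instead of a negative count.
    #include <algorithm>
    #include <cstdint>

    int64_t degraded_objects(int64_t primary_num_objects,
                             int64_t replica_num_objects) {
        // was: primary_num_objects - replica_num_objects (could be negative)
        return std::max<int64_t>(0, primary_num_objects - replica_num_objects);
    }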
This can be done better in a separate script, which puts these in
CEPH_EXTRA_CONFIGURE_ARGS. In particular, this lets us enable
lttng for gitbuilder builds, but not release builds.
Jason Dillaman [Mon, 16 Mar 2015 22:40:49 +0000 (18:40 -0400)]
librbd: snap_remove should ignore -ENOENT errors
If the attempt to deregister the snapshot from the parent
image fails with -ENOENT, ignore the error as it is safe
to assume that the child is not associated with the parent.
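A minimal sketch of the tolerance (the deregistration call is a hypothetical stand-in for the real librbd/cls call):

    #include <cerrno>

    // Hypothetical stand-in for deregistering the cloned child from its
    // parent image; returns 0 on success or a negative errno on failure.
    static int deregister_child_from_parent() { return -ENOENT; }

    int snap_remove_deregister() {
        int r = deregister_child_from_parent();
        if (r == -ENOENT)
            r = 0;  // already gone: child is not associated with the parent
        return r;
    }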
Samuel Just [Thu, 26 Mar 2015 17:26:48 +0000 (10:26 -0700)]
ReplicatedPG::cancel_pull: requeue waiters as well
If we are in recovery_wait, we might not recover that object as part of
recover_primary for some time. Worse, if we are waiting on a backfill
which is blocked waiting on a copy_from on the missing object in
question, it can become a deadlock.
Fixes: #11244
Backport: firefly
Signed-off-by: Samuel Just <sjust@redhat.com>
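A minimal sketch of the requeue (container and type names are hypothetical stand-ins for the ReplicatedPG structures):

    #include <deque>
    #include <map>
    #include <string>

    struct OpRequest {};

    struct PGSketch {
        // ops parked until a given missing object is recovered
        std::map<std::string, std::deque<OpRequest>> waiting_for_missing;
        std::deque<OpRequest> requeued;  // ops to push through the queue again

        void cancel_pull(const std::string& soid) {
            // ... drop the pull bookkeeping for soid ...
            auto it = waiting_for_missing.find(soid);
            if (it == waiting_for_missing.end())
                return;
            // The fix: hand any waiters back rather than leaving them parked,
            // since nothing will recover soid (and wake them) any time soon.
            for (OpRequest& op : it->second)
                requeued.push_back(op);
            waiting_for_missing.erase(it);
        }
    };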
Sage Weil [Fri, 27 Mar 2015 22:35:21 +0000 (15:35 -0700)]
common: send cluster log messages to 'cluster' channel by default
The CLOG_CHANNEL_DEFAULT constant was being abused for two purposes:
- the default channel to log messages to
- the name of the config option key in the key/value pair string that is
used for the default option, e.g. "default=true foo=false bar=false"
Fix this by making the config option key CLOG_CONFIG_DEFAULT_KEY and
replacing throughout, and changing CLOG_CHANNEL_DEFAULT to "cluster" (as
it should be and has been historically).
Fixes: #11177
Signed-off-by: Sage Weil <sage@redhat.com>
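In sketch form, the constants end up as two distinct names (the key string is inferred from the example above):

    // One constant for the channel messages go to by default, a separate one
    // for the 'default' key parsed out of channel config strings such as
    // "default=true foo=false bar=false". Previously one constant did both.
    static const char* const CLOG_CHANNEL_DEFAULT = "cluster";
    static const char* const CLOG_CONFIG_DEFAULT_KEY = "default";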
Samuel Just [Tue, 24 Mar 2015 17:48:02 +0000 (10:48 -0700)]
PG: set/clear CREATING in Primary state entry/exit
Previously, we did not actually set it when we got a pg creation message from
the mon. It would actually get set on the first start_peering_interval after
that point. If we don't get that far, but do send a stat update to the mon, we
can end up with bug #11197. Instead, let's just set it and clear it upon entry into
and exit from the Primary state.
Fixes: #11197
Signed-off-by: Samuel Just <sjust@redhat.com>
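A minimal sketch of the entry/exit pattern, with RAII standing in for the boost::statechart state and names simplified:

    // Tie PG_STATE_CREATING to the lifetime of the Primary state itself, so
    // it is set exactly on entry and cleared exactly on exit, instead of
    // depending on start_peering_interval having run first.
    struct PGSketch {
        static constexpr unsigned PG_STATE_CREATING = 1u << 0;
        unsigned state = 0;
        void state_set(unsigned s)   { state |= s; }
        void state_clear(unsigned s) { state &= ~s; }
    };

    struct Primary {
        PGSketch* pg;
        explicit Primary(PGSketch* p) : pg(p) {
            pg->state_set(PGSketch::PG_STATE_CREATING);   // state entry
        }
        ~Primary() {
            pg->state_clear(PGSketch::PG_STATE_CREATING); // state exit
        }
    };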
Samuel Just [Tue, 24 Mar 2015 22:14:34 +0000 (15:14 -0700)]
ReplicatedPG: trim backfill intervals based on peer's last_backfill_started
Otherwise, we fail to trim the peer's last_backfill_started and get bug #11199.
1) osd.4 backfills up to 31bccdb2/mira01213209-286/head (henceforth: foo)
2) An interval change happens.
3) osd.0 now finds itself backfilling to osd.4 (lb=foo) and osd.5
(lb=b6670ba2/mira01213209-160/snapdir//1, henceforth: bar)
4) recover_backfill causes both osd.4 and osd.5 to scan forward, so osd.4 has an
interval starting at foo, and osd.5 has an interval starting at bar.
5) Once those have come back, recover_backfill attempts to trim off the
last_backfill_started, but osd.4's interval starts after that, so foo remains in
osd.4's interval (this is the bug).
6) We serve a copyfrom on foo (sent to osd.4 as well).
7) We eventually get to foo in the backfilling. Normally, they would have the
same version, but of course we don't update osd.4's interval from the log since
it should not have received writes in that interval. Thus, we end up trying to
recover foo on osd.4 anyway.
8) But an interval change happens between removing foo from osd.4 and
completing the recovery, leaving osd.4 without foo, but with lb >= foo.
Fixes: #11199
Backport: firefly
Signed-off-by: Samuel Just <sjust@redhat.com>
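In outline, the trim drops everything a peer has already backfilled; a minimal sketch with the ordered-object machinery simplified to strings:

    #include <map>
    #include <string>

    using hobject_t = std::string;  // stand-in for the real ordered object id

    struct BackfillInterval {
        std::map<hobject_t, int> objects;  // object -> version (simplified)

        // Drop every entry at or before 'last'; those objects are already
        // backfilled and must not linger in the interval (the bug above).
        void trim_to(const hobject_t& last) {
            objects.erase(objects.begin(), objects.upper_bound(last));
        }
    };

recover_backfill would then trim each peer's interval against that peer's own last_backfill_started rather than only the primary's.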
Samuel Just [Fri, 20 Mar 2015 22:28:15 +0000 (15:28 -0700)]
ReplicatedPG::promote_object: check scrubber and block if necessary
Otherwise, we might attempt to promote into an in-progress scrub
interval, causing bug #11156. I would have added a return value to
promote_object(), but could not find an existing user which
cared to distinguish the cases, even with a null op passed.
All existing users are in maybe_handle_cache. The ones which
pass a null op are for promoting the object in parallel
with a proxy -- a case where not actually performing the promote
does not really matter.
Fixes: #11156
Signed-off-by: Samuel Just <sjust@redhat.com>
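A minimal sketch of the guard (member names are hypothetical, modeled on the description above):

    #include <set>
    #include <string>

    struct Scrubber {
        std::string begin, end;  // object range covered by the active scrub
        bool write_blocked_by_scrub(const std::string& soid) const {
            return soid >= begin && soid < end;
        }
    };

    struct PGSketch {
        Scrubber scrubber;
        std::set<std::string> waiting_for_scrub;  // retried when scrub ends

        void promote_object(const std::string& soid) {
            if (scrubber.write_blocked_by_scrub(soid)) {
                // Block rather than promote into the in-progress scrub
                // interval; no return value needed since no caller cares.
                waiting_for_scrub.insert(soid);
                return;
            }
            // ... start the actual promotion ...
        }
    };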
Currently, this method also returns true if the object is backfilling.
This commit was reverted earlier in the branch in order to make the
other reverts clean. It's actually a nice rename though, so I'm
re-cherry-picking it.
Signed-off-by: Samuel Just <sjust@redhat.com>
Conflicts:
src/osd/ReplicatedPG.cc
Sage Weil [Thu, 19 Mar 2015 23:27:17 +0000 (16:27 -0700)]
osd: only complain about stored vs actual digest if all peers support it
If we have a mixed cluster of hammer and pre-hammer OSDs, we will fall back
to using 0 as the initial crc32c value. However, if the primary has a
stored digest, it currently compares its value to the reported value (w/
the wrong initial value) and complains.
There are two possible fixes:
- avoid storing a digest unless all peers support it, or
- avoid complaining on scrub unless all peers support it.
The latter is easier (see the sketch below), and this fix also has the benefit
of fixing the bug even for clusters where this has already happened.
Fixes: #11102
Signed-off-by: Sage Weil <sage@redhat.com>
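A minimal sketch of the chosen fix (hypothetical names; the real check sits in the OSD scrub path):

    #include <vector>

    struct PeerFeatures {
        bool supports_new_digest;  // hammer-style crc32c initial value
    };

    // Only compare the stored digest against the scrub-computed one when
    // every peer supports the new digest; otherwise the computed value was
    // produced with the wrong initial value and a mismatch means nothing.
    bool digest_mismatch_worth_reporting(const std::vector<PeerFeatures>& peers,
                                         bool has_stored_digest,
                                         unsigned stored, unsigned computed) {
        for (const auto& p : peers)
            if (!p.supports_new_digest)
                return false;  // mixed cluster: stay quiet on scrub
        return has_stored_digest && stored != computed;
    }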