Dan Mick [Tue, 12 Aug 2014 21:09:43 +0000 (14:09 -0700)]
ceph.spec.in: No version on ceph-libs Obsoletes.
If we are installing with the new package structure we don't ever want the
new package to co-exist with the old one; this includes the mistakenly-
released v0.81 on Fedora, which should be removed in favor of this
version.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 8f95daf66b5fdb2a8141988480f984c1249599c5)
Erik Logtenberg [Thu, 31 Jul 2014 22:13:50 +0000 (00:13 +0200)]
ceph.spec.in, init-ceph.in: Don't autostart ceph service on Fedora.
This patch is taken from the current Fedora package and makes the upstream
ceph.spec compliant with Fedora policy. The goal is to be fully compliant
upstream so that we can replace the current Fedora package with the
upstream package and fix many bugs in Fedora.
Addition from Dan Mick <dan.mick@inktank.com>:
Do this for RHEL and CentOS as well, since they will surely benefit
from the same policy. Note: this requires changes to
autobuild-ceph and ceph-build scripts, which currently copy
only the dist tarball to the rpmbuild/SOURCES dir.
Signed-off-by: Erik Logtenberg <erik@logtenberg.eu>
Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 461523b06cdf93e32f1d8b354ac3799e73162d33)
Erik Logtenberg [Thu, 31 Jul 2014 21:49:56 +0000 (23:49 +0200)]
ceph.spec.in: add ceph-libs-compat
Added a ceph-libs-compat package in accordance with Fedora packaging
guidelines [1], to handle the recent package split more gracefully.
In Fedora this is necessary because there are already other packages
depending on ceph-libs that need to be adjusted to depend on the new
split packages instead. In the meantime, ceph-libs-compat prevents
breakage.
Jason Dillaman [Sun, 7 Sep 2014 02:59:40 +0000 (22:59 -0400)]
Enforce cache size on read requests
In-flight cache reads were not previously counted against
new cache read requests, which could result in very large
cache usage. This effect is most noticeable when writing
small chunks to a cloned image since each write requires
a full object read from the parent.
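A minimal sketch of the kind of accounting involved, using hypothetical names rather than the actual librbd cache code: bytes belonging to in-flight reads are charged against the cache budget before a new read is admitted, so concurrent misses cannot push usage far past the limit.

    #include <condition_variable>
    #include <cstdint>
    #include <mutex>

    class ReadBudget {
      std::mutex m;
      std::condition_variable cv;
      const uint64_t max_bytes;
      uint64_t cached_bytes = 0;    // bytes already resident in the cache
      uint64_t inflight_bytes = 0;  // bytes currently being fetched

    public:
      explicit ReadBudget(uint64_t max) : max_bytes(max) {}

      void start_read(uint64_t len) {
        std::unique_lock<std::mutex> l(m);
        // admit the read only once it fits alongside cached + in-flight data
        cv.wait(l, [&] { return cached_bytes + inflight_bytes + len <= max_bytes; });
        inflight_bytes += len;
      }

      void finish_read(uint64_t len) {
        std::lock_guard<std::mutex> l(m);
        inflight_bytes -= len;
        cached_bytes += len;        // the data now lives in the cache proper
        cv.notify_all();
      }

      void evict(uint64_t len) {
        std::lock_guard<std::mutex> l(m);
        cached_bytes -= len;
        cv.notify_all();
      }
    };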
Sage Weil [Thu, 25 Sep 2014 20:16:52 +0000 (13:16 -0700)]
osdc/Objecter: only post_rx_buffer if no op timeout
If we post an rx buffer and there is a timeout, the revocation can happen
while the reader has consumed the buffers but before it has decoded and
constructed the message. In particular, we calculate a crc32c over the
data portion of the message after we've taken the buffers and dropped the
lock.
Instead of fixing this race (for example, by reverifying rx_buffers under
the lock while calculating the crc.. bleh), just skip the rx buffer
optimization entirely when a timeout is present.
Note that this doesn't cover the op_cancel() paths, but none of those users
provide static buffers to read into.
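A rough sketch of the resulting rule, with hypothetical types standing in for the Objecter/messenger interfaces: the zero-copy receive buffer is only registered when the op has no timeout, since a timeout can revoke the buffer while the reader is still working on the message.

    #include <cstdint>
    #include <vector>

    struct Op {
      uint64_t tid = 0;
      double timeout_sec = 0;              // 0 means no timeout configured
      std::vector<char>* outbl = nullptr;  // caller-supplied receive buffer
    };

    // stand-in for the messenger hook that reads the reply payload
    // directly into a caller-owned buffer
    void post_rx_buffer(uint64_t tid, std::vector<char>& buf) {
      (void)tid; (void)buf;  // messenger-specific in the real code
    }

    void maybe_post_rx_buffer(const Op& op) {
      // skip the rx buffer optimization entirely when a timeout is present:
      // the revocation could race with the crc32c over the data portion
      if (op.outbl && op.timeout_sec == 0)
        post_rx_buffer(op.tid, *op.outbl);
    }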
Danny Al-Gaaf [Fri, 19 Sep 2014 10:25:07 +0000 (12:25 +0200)]
rgw_main.cc: add missing virtual destructor for RGWRequest
CID 1160858 (#1 of 1): Non-virtual destructor (VIRTUAL_DTOR)
nonvirtual_dtor: Class RGWLoadGenRequest has a destructor
and a pointer to it is upcast to class RGWRequest which doesn't
have a virtual destructor.
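For illustration only, with simplified stand-ins rather than the actual RGW classes: deleting a derived object through a base pointer is undefined behaviour unless the base declares a virtual destructor, which is exactly what the fix adds.

    #include <string>

    struct Request {                    // plays the role of RGWRequest
      virtual ~Request() = default;     // the added virtual destructor
    };

    struct LoadGenRequest : Request {   // plays the role of RGWLoadGenRequest
      std::string payload;
    };

    int main() {
      Request* r = new LoadGenRequest();
      delete r;   // safe only because ~Request() is virtual
      return 0;
    }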
Locker: accept ctime updates from clients without dirty write caps
The ctime changes any time the inode does. That can happen even without
the file itself having changed, so we'd better accept the update whenever
the auth caps have dirtied, without worrying about the file caps!
Fixes: #9514
Backport: firefly
Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: John Spray <john.spray@redhat.com>
(cherry picked from commit 0ea20a668cf859881c49b33d1b6db4e636eda18a)
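A schematic of the relaxed condition, using hypothetical field names rather than the real MDS Locker code: the client-supplied ctime is honoured whenever auth caps are dirty, not only when write/file caps are.

    #include <ctime>

    struct CapUpdate {
      bool dirty_auth_caps = false;
      bool dirty_write_caps = false;
      time_t ctime = 0;
    };

    void maybe_apply_ctime(time_t& inode_ctime, const CapUpdate& u) {
      // previously only dirty write caps allowed the update; ctime can
      // change even when the file contents have not
      if ((u.dirty_auth_caps || u.dirty_write_caps) && u.ctime > inode_ctime)
        inode_ctime = u.ctime;
    }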
qa/workunits/cephtool/test.sh: fix thrash (ultimate)
Keep the osd thrash test to ensure it is a valid command but make it a
noop by giving it a zero argument (meaning thrash 0 OSD maps).
Remove the loops that were added after the command in an attempt to wait
for the cluster to recover and not pollute the rest of the tests. Actual
testing of osd thrash would require a dedicated cluster because the
side effects are random and it is unnecessarily difficult to ensure they
are finished.
This might have been the culprit for #9307. Previously we were calculating
the hash after the call to processor->handle_data(); however, that
method might have spliced the bufferlist, so we can't be sure that the
pointer we were holding originally is still valid. Instead, push
the hash calculation down. Added a new explicit complete_hash() call to
the processor, since when we're at complete() it's too late (we need to
have the hash at that point already).
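A small sketch of the ordering, with hypothetical interfaces rather than the actual RGW processor API: the running hash is updated before the chunk is handed to the processor, since handle_data() may splice the buffer and invalidate any pointers kept for a later pass; a final step then closes the digest before completion.

    #include <cstddef>
    #include <functional>
    #include <string>
    #include <utility>
    #include <vector>

    struct Hasher {
      std::size_t acc = 0;
      void update(const std::vector<char>& chunk) {
        acc ^= std::hash<std::string>{}(std::string(chunk.begin(), chunk.end()));
      }
      std::size_t complete_hash() const { return acc; }  // finalize before complete()
    };

    struct Processor {
      void handle_data(std::vector<char>&& chunk) {
        // may splice or consume the buffer; pointers into it become unreliable
        (void)chunk;
      }
    };

    void put_chunk(Processor& p, Hasher& h, std::vector<char> chunk) {
      h.update(chunk);                   // hash first, while the data is intact
      p.handle_data(std::move(chunk));   // the processor may now mangle the buffer
    }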
We need to identify whether an object is just composed of a head, or
also has a tail. The test for pre-firefly objects ("explicit objs") was
broken, as it was just looking at the number of explicit objs in the
manifest. However, this is insufficient: we might have an empty head,
in which case it wouldn't appear, so we need to check whether the
sole object is actually pointing at the head.
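A hedged sketch of the corrected check, with simplified types in place of the RGW manifest structures: a single explicit entry only means "head only" if that entry actually refers to the head object, since an empty head may be missing from the map altogether.

    #include <cstdint>
    #include <map>
    #include <string>

    struct ManifestPart { std::string loc; };   // where this piece lives

    bool has_tail(const std::map<uint64_t, ManifestPart>& explicit_objs,
                  const std::string& head_loc) {
      if (explicit_objs.size() > 1)
        return true;                            // clearly more than a head
      if (explicit_objs.empty())
        return false;                           // nothing but an (empty) head
      // one entry: it is a tail piece unless it points at the head itself
      return explicit_objs.begin()->second.loc != head_loc;
    }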
Yehuda Sadeh [Fri, 22 Aug 2014 04:53:38 +0000 (21:53 -0700)]
rgw: clear bufferlist if write_data() successful
Fixes: #9201
Backport: firefly
We sometimes need to call RGWPutObjProcessor::handle_data() again,
so that we send the pending data. However, we failed to clear the buffer
that was already sent, so it was resent. This triggers when using
non-default pool alignments.
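A minimal sketch of the fix, with hypothetical names standing in for the RGW writer: once a chunk has been written successfully it is dropped from the pending buffer, so a second handle_data() round does not resend it.

    #include <vector>

    struct Writer {
      // stand-in for the call that actually sends the bytes
      bool write_data(const std::vector<char>& data) { (void)data; return true; }
    };

    void flush_pending(Writer& w, std::vector<char>& pending) {
      if (!pending.empty() && w.write_data(pending))
        pending.clear();   // the fix: never keep bytes that were already sent
    }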
Then, we begin to flush 15 with a delete with snapc 4:[4] leaving the
backing pool with:
4:[4]:[4(4)]
Then, we finish flushing 15 with snapc 9:[4], leaving the backing
pool with:
9:[4]:[4(4)]+head
Next, snaps 10 and 15 are removed causing clone 10 to be removed leaving
the cache with:
30:[29,21,20,4]:[22(21),4(4)]+head
We next begin to flush 22 by sending a delete with snapc 4(4) since
prev_snapc is 4 <---------- here is the bug
The backing pool ignores this request since 4 < 9 (ORDERSNAP) leaving it
with:
9:[4]:[4(4)]
Then, we complete flushing 22 with snapc 19:[4] leaving the backing pool
with:
19:[4]:[4(4)]+head
Then, we begin to flush head by deleting with snapc 22:[21,20,4] leaving
the backing pool with:
22:[21,20,4]:[22(21,20),4(4)]
Finally, we flush head leaving the backing pool with:
30:[29,21,20,4]:[22(21*,20*),4(4)]+head
When we go to flush clone 22, all we know is that 22 is dirty, has snaps
[21], and 4 is clean. As part of flushing 22, we need to do two things:
1) Ensure that the current head is cloned as cloneid 4 with snaps [4] by
sending a delete at snapc 4:[4].
2) Flush the data at snap sequence < 21 by sending a copyfrom with snapc
20:[20,4].
Unfortunately, it is possible that 1, 1&2, or 1 and part of the flush
process for some other now non-existent clone have already been
performed. Because of that, between 1) and 2), we need to send
a second delete ensuring that the object does not exist at 20.
We have been setting it to the old head value. This is usually
harmless since the new head will virtually always be ahead of the
old head for claim_log_and_clear_rollback_info, but can cause trouble
in some edge cases.
Samuel Just [Mon, 15 Sep 2014 23:53:21 +0000 (16:53 -0700)]
PG::find_best_info: let history.last_epoch_started provide a lower bound
If we find an info.history.last_epoch_started above any
info.last_epoch_started, we must be missing updates and
min_last_update_acceptable should provisionally be max().
Fixes: #9482
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
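A simplified sketch of the bound, using hypothetical field types rather than pg_info_t: if any peer's history.last_epoch_started exceeds every peer's last_epoch_started, updates must be missing, so the floor on acceptable last_update is provisionally max().

    #include <algorithm>
    #include <cstdint>
    #include <limits>
    #include <vector>

    struct PeerInfo {
      uint32_t last_epoch_started = 0;
      uint32_t history_last_epoch_started = 0;
      uint64_t last_update = 0;
    };

    uint64_t min_last_update_acceptable(const std::vector<PeerInfo>& infos) {
      uint32_t max_les = 0, max_hles = 0;
      for (const auto& i : infos) {
        max_les  = std::max(max_les, i.last_epoch_started);
        max_hles = std::max(max_hles, i.history_last_epoch_started);
      }
      uint64_t bound = std::numeric_limits<uint64_t>::max();
      if (max_hles > max_les)
        return bound;                     // missing updates: accept nothing yet
      for (const auto& i : infos)
        if (i.last_epoch_started == max_les)
          bound = std::min(bound, i.last_update);
      return bound;
    }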
The crash occurs due to ImageCtx->parent->parent being uninitialized,
since the initial open_parent() -> open_image(parent) ->
ictx_refresh(parent) occurs before ImageCtx->parent->snap_id is set,
so refresh_parent() is not called to open an ImageCtx for the parent
of the parent. This leaves the ImageCtx->parent->parent NULL, but the
rest of ImageCtx->parent updated to point at the correct parent snapshot.
Setting the parent->snap_id earlier has some unintended side effects
currently, so for now just call refresh_parent() during
open_parent(). This is the easily backportable version of the
fix. Further patches can clean up this whole initialization process.
init-radosgw.sysv: Support systemd for starting the gateway
When using RHEL7 the radosgw daemon needs to start under systemd.
Check whether systemd is running as PID 1; if it is, start the
daemon using: systemd-run -r <cmd>. pidof returns nothing because
it is executed too quickly, so one second of sleep is added and the
script then reports startup correctly.
Sage Weil [Mon, 8 Sep 2014 20:44:57 +0000 (13:44 -0700)]
osdc/Objecter: revoke rx_buffer on op_cancel
If we cancel a read, revoke the rx buffers to avoid a use-after-free and/or
other undefined badness from using user buffers that may no longer be
present.
Fixes: #9362
Backport: firefly, dumpling
Reported-by: Matthias Kiefer <matthias.kiefer@1und1.de>
Signed-off-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 2305b2897acba38384358c33ca3bbfcae6f1c74e)
Samuel Just [Wed, 27 Aug 2014 23:21:41 +0000 (16:21 -0700)]
PG::can_discard_op: do discard old subopreplies
Otherwise, a sub_op_reply from a previous interval can stick around
until we either one day go active again and get rid of it or delete the
pg which is holding it on its waiting_for_active list. While it sticks
around futily waiting for the pg to once more go active, it will cause
harmless slow request warnings.
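An illustrative check with simplified fields (not the actual PG code): a sub_op_reply whose map epoch predates the start of the current interval is simply discarded rather than parked on waiting_for_active.

    #include <cstdint>

    struct SubOpReply {
      uint32_t map_epoch = 0;           // epoch the sender was at
    };

    struct PGView {
      uint32_t same_interval_since = 0; // epoch the current interval began
    };

    bool can_discard_reply(const PGView& pg, const SubOpReply& r) {
      // a reply from a previous interval can never be useful again
      return r.map_epoch < pg.same_interval_since;
    }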
Sage Weil [Wed, 27 Aug 2014 13:19:12 +0000 (06:19 -0700)]
osd/PG: fix crash from second backfill reservation rejection
If we get more than one reservation rejection we should ignore them; when
we got the first we already sent out cancellations. More importantly, we
should not crash.
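A simplified guard in the spirit of the fix (not the actual PG state machine): only the first rejection triggers cancellation; any later ones are ignored instead of asserting.

    struct BackfillReservations {
      bool cancelled = false;

      void handle_rejection() {
        if (cancelled)
          return;         // cancellations already sent; ignore the duplicate
        cancelled = true;
        // ... send reservation cancellations to the remaining peers ...
      }
    };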
osd: OSDMap: ordered blacklist on non-classic encode function
Fixes: #9211
Backport: firefly
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>
(cherry picked from commit 81102044f417bd99ca570d9234b1df5195e9a8c9)
Sage Weil [Tue, 26 Aug 2014 15:16:29 +0000 (08:16 -0700)]
osd/OSDMap: encode blacklist in deterministic order
When we use an unordered_map the encoding order is non-deterministic,
which is problematic for OSDMap. Construct an ordered map<> on encode
and use that. This lets us keep the hash table for lookups in the general
case.
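A sketch of the idea with simplified types (not the real OSDMap encoder): lookups keep using the unordered_map, but at encode time the entries are copied into an ordered std::map so every OSD produces the same byte stream.

    #include <cstdint>
    #include <map>
    #include <string>
    #include <unordered_map>
    #include <vector>

    using Blacklist = std::unordered_map<std::string, uint32_t>;  // addr -> expiry

    void encode_blacklist(const Blacklist& bl, std::vector<char>& out) {
      // deterministic iteration order, built only for the encode
      std::map<std::string, uint32_t> ordered(bl.begin(), bl.end());
      for (const auto& entry : ordered) {
        out.insert(out.end(), entry.first.begin(), entry.first.end());
        out.push_back('\0');
        const char* p = reinterpret_cast<const char*>(&entry.second);
        out.insert(out.end(), p, p + sizeof(entry.second));
      }
    }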
Sage Weil [Wed, 20 Aug 2014 15:59:46 +0000 (08:59 -0700)]
mon: add a cluster fingerprint
Generate it on cluster creation with the initial monmap. Include it in
the report. Provide no way for this uuid to be fed in to the cluster
(intentionally or not) so that it can be assumed to be a truly unique
identifier for the cluster.
Sage Weil [Sat, 16 Aug 2014 19:42:33 +0000 (12:42 -0700)]
os/FileStore: fix mount/remount force_sync race
Consider:
- mount
- sync_entry is doing some work
- umount
- set force_sync = true
- set done = true
- sync_entry exits (due to done)
- ..but does not set force_sync = false
- mount
- journal replay starts
- sync_entry sees force_sync and does a commit while op_seq == 0
...crash...
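One way to close the race, sketched with hypothetical member names rather than the actual FileStore fields: the sync thread clears force_sync on its way out, so a later mount's journal replay never sees a stale flag and commits while op_seq is still 0.

    #include <mutex>

    struct SyncThreadState {
      std::mutex lock;
      bool force_sync = false;   // set by umount to request a final commit
      bool done = false;         // set by umount to stop the sync thread

      void sync_entry_exit() {
        std::lock_guard<std::mutex> l(lock);
        force_sync = false;      // don't leak the flag into the next mount
      }
    };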
Loic Dachary [Mon, 25 Aug 2014 15:05:04 +0000 (17:05 +0200)]
common: ROUND_UP_TO accepts any rounding factor
The ROUND_UP_TO function was limited to rounding factors that are powers
of two. This saves a modulo but it is not used where it would make a
difference. The implementation is changed so it is generic.
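An illustrative generic version (not the exact Ceph macro): it handles any positive factor, not just powers of two, at the cost of a modulo.

    #include <cstdint>

    inline uint64_t round_up_to(uint64_t n, uint64_t factor) {
      uint64_t rem = n % factor;
      return rem ? n + (factor - rem) : n;
    }

    // e.g. round_up_to(10, 3) == 12, round_up_to(16, 8) == 16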
Haomai Wang [Thu, 20 Mar 2014 06:09:49 +0000 (14:09 +0800)]
Remove exclusive lock on GenericObjectMap
Now most GenericObjectMap interfaces take a header as the argument rather
than the combination of coll_t and ghobject_t, so the caller is responsible
for maintaining exclusive access to the header.