prev_snapc will be 0, oi.snaps will be [a], p will end up at end(), get
assigned to dnewest, and we'll dereference. It's only sometimes harmful
though because we may still take the right (else) branch...
Fixes: #9294
Signed-off-by: Sage Weil <sage@redhat.com>
Sage Weil [Thu, 28 Aug 2014 17:59:18 +0000 (10:59 -0700)]
test/mon/*: prime mon with initial command before injection
The osdmonitor_prepare_command is very fragile. Send an initial command
to the mon beforehand. This seems to prevent the initial command from
getting combined into an early mon proposal with some other stuff.
Alternatively, we could remove these tests and this mechanism entirely as
it is likely to break in the future when the next set of mon changes are
made, but they have shown themselves to be useful in catching other
regressions, so we'll patch them up for a bit longer.
Loic Dachary [Sat, 23 Aug 2014 09:07:29 +0000 (11:07 +0200)]
erasure-code: assert the PluginRegistry lock is held when it must
Add lock to the preload method and assert that it is held by methods
requiring it. Although preload is called at bootstrap and does not
require the lock, adding it does not hurt and makes the lock policy
clearer to understand.
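
The lock policy described above can be sketched as follows. This is a
minimal illustration, not the actual ErasureCodePluginRegistry code:
TrackedMutex, PluginRegistrySketch and their members are hypothetical
names, and the registry is reduced to a set of plugin names.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Hypothetical mutex wrapper that remembers whether it is currently held.
class TrackedMutex {
public:
  void lock() { m_.lock(); locked_ = true; }
  void unlock() { locked_ = false; m_.unlock(); }
  bool is_locked() const { return locked_; }
private:
  std::mutex m_;
  std::atomic<bool> locked_{false};
};

// Hypothetical stand-in for the plugin registry.
class PluginRegistrySketch {
public:
  // Entry point: takes the lock even though bootstrap does not need it.
  void preload(const std::string &name) {
    std::lock_guard<TrackedMutex> guard(lock);
    load(name);
  }
  // Requires the lock; the assert documents and enforces that policy.
  void load(const std::string &name) {
    assert(lock.is_locked());
    loaded_.insert(name);
  }
  bool is_loaded(const std::string &name) {
    std::lock_guard<TrackedMutex> guard(lock);
    return loaded_.count(name) > 0;
  }
  TrackedMutex lock;
private:
  std::set<std::string> loaded_;
};
```

Calling load() directly without holding the lock would trip the
assertion, which is the point of making the policy explicit.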
Loic Dachary [Thu, 21 Aug 2014 16:38:52 +0000 (18:38 +0200)]
erasure-code: add Ceph version check to plugins
Add the __erasure_code_version function to all plugins, to return the
Ceph version against which they have been compiled. When a plugin is
loaded, an error is thrown if the version of the plugin does not match
the version of the daemon loading it.
If the symbol does not exist, which will be true of older plugins, set
the version to "an older version" so it never matches.
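
A minimal sketch of the version check, assuming the symbol lookup is
modelled by a plain map instead of a real dlsym() call. SymbolTable,
plugin_version(), version_matches() and sample_plugin_version() are
hypothetical names, and the version strings are made up for the example.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumed signature of the version entry point described above.
using VersionFn = const char *(*)();

// Stand-in for dlsym(): symbol name -> function exported by a plugin.
using SymbolTable = std::map<std::string, VersionFn>;

// Hypothetical plugin entry point returning the version it was built against.
inline const char *sample_plugin_version() { return "0.85"; }

// Version reported by the plugin, or a placeholder that never matches
// when the symbol is missing (older plugins).
std::string plugin_version(const SymbolTable &plugin) {
  auto it = plugin.find("__erasure_code_version");
  if (it == plugin.end())
    return "an older version";
  return it->second();
}

// True only when the plugin was built against the daemon's version.
bool version_matches(const SymbolTable &plugin,
                     const std::string &daemon_version) {
  return plugin_version(plugin) == daemon_version;
}
```

Because the placeholder string is not a valid version, a plugin without
the symbol is always rejected without needing a separate code path.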
Loic Dachary [Thu, 21 Aug 2014 16:31:02 +0000 (18:31 +0200)]
erasure-code: jerasure preloads the plugin variant
The variant selection depending on the available CPU features is
encapsulated in a helper. The helper is used in the factory() method and
in the load() method.
The factory() method may load a variant that is not the default, for
benchmark purposes. Such a variant is not preloaded by the load() method
and upgrading while running may be problematic. However, running with a
non-standard variant is used for benchmarking and upgrades in this
context are not a concern.
Loic Dachary [Thu, 21 Aug 2014 16:22:18 +0000 (18:22 +0200)]
erasure-code: add directory to plugin init functions
The prototype of the init functions of erasure coded plugins is changed
from
int __erasure_code_init(char *plugin_name)
to
int __erasure_code_init(char *plugin_name, char *directory)
The jerasure plugin will find optimized variants in this directory and
load them. The load() and preload() functions of
ErasureCodePluginRegistry only use a directory instead of a more generic
parameters map. The parameters map was only used for the directory entry
anyway.
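
For illustration, a plugin entry point with the new two-argument
prototype might look like the sketch below. Only the prototype comes
from the text above; variant_directory, variant_path(), the error value
and the "libec_<plugin>_<variant>.so" naming are assumptions made for
the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical plugin-side state: where optimized variants are looked up.
static std::string variant_directory;

// Entry point using the two-argument prototype described above.
extern "C" int __erasure_code_init(char *plugin_name, char *directory) {
  if (plugin_name == nullptr || directory == nullptr)
    return -1;  // illustrative error value, not taken from the source
  variant_directory = directory;
  return 0;
}

// Hypothetical helper: path of a variant library inside the directory.
// The file naming scheme is an assumption for illustration only.
std::string variant_path(const std::string &plugin,
                         const std::string &variant) {
  return variant_directory + "/libec_" + plugin + "_" + variant + ".so";
}
```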
Samuel Just [Wed, 27 Aug 2014 23:21:41 +0000 (16:21 -0700)]
PG::can_discard_op: do discard old subopreplies
Otherwise, a sub_op_reply from a previous interval can stick around
until we either one day go active again and get rid of it or delete the
pg which is holding it on its waiting_for_active list. While it sticks
around futilely waiting for the pg to once more go active, it will cause
harmless slow request warnings.
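
The idea can be sketched as an epoch comparison. This is a simplified
model under stated assumptions, not the real PG::can_discard_op: the
MsgSketch type, the MsgType values and the use of a single "last reset"
epoch to mark the start of the current interval are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using epoch_t = uint32_t;

enum class MsgType { SubOp, SubOpReply, Other };

// Hypothetical message: its type and the map epoch it was sent in.
struct MsgSketch {
  MsgType type;
  epoch_t map_epoch;
};

// True when the message predates the start of the current interval.
bool from_previous_interval(epoch_t interval_start, const MsgSketch &m) {
  return m.map_epoch < interval_start;
}

// Sub-op replies are now treated like sub-ops: stale ones are dropped
// instead of being parked until the pg goes active again.
bool can_discard(epoch_t interval_start, const MsgSketch &m) {
  switch (m.type) {
  case MsgType::SubOp:
  case MsgType::SubOpReply:
    return from_previous_interval(interval_start, m);
  default:
    return false;
  }
}
```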
Fixes: #9259
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Tue, 19 Aug 2014 23:48:34 +0000 (16:48 -0700)]
mon/Paxos: make backend write async
Move into the WRITING state and do the write to leveldb (or whatever the
backend is) asynchronously.
A few tricks here:
- we can't do the is_updating() state check because we will always be in
REFRESH. Instead, make commit_proposal() tolerate the case where it is
called but the top proposal isn't the one we just did (or the list is
empty). This makes the callers simpler.
- do_refresh() may call bootstrap. If we do bootstrap while in REFRESH,
don't do a sync/flush on the backend store because *we* are the async
completion thread and we'll deadlock. All other callers need to wait
for this, though!
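
The first point can be sketched as below, assuming proposals are
identified by plain integers kept in a queue. This commit_proposal() is
a hypothetical free function written for illustration, not the Paxos
method itself.

```cpp
#include <cassert>
#include <deque>

// Retire the front proposal only if it is the one that was just committed.
// An empty list or a different front proposal is tolerated, not an error.
// Returns true when a proposal was actually retired.
bool commit_proposal(std::deque<int> &proposals, int just_committed) {
  if (proposals.empty() || proposals.front() != just_committed)
    return false;  // nothing to do; callers need no state check
  proposals.pop_front();
  return true;
}
```

With this shape, callers can invoke it unconditionally after every
commit instead of checking the state first.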
Sage Weil [Tue, 19 Aug 2014 23:45:46 +0000 (16:45 -0700)]
mon/Paxos[Service]: allow reads during WRITING state
The REFRESH state is not readable; that's when we are re-reading our state
out of leveldb, and we hold the mon_lock during the period. So, strictly
speaking, it doesn't matter whether we include it here since none of these
call sites would be visited while in that state.
Sage Weil [Sun, 17 Aug 2014 05:29:04 +0000 (22:29 -0700)]
mon/Paxos: move post-commit finish work into commit_finish()
The main change here is that we are merging the singleton and clustered
finish code together. This is mostly a code shuffle, except for one
semantic change: we now trigger the commit waiters before finish_round()
in the singleton case, whereas before we did not. I don't think there
was a specific reason why it differed from the clustered case.
Dan Mick [Tue, 12 Aug 2014 23:31:22 +0000 (16:31 -0700)]
ceph.spec.in: tests for rhel or centos need to not include _version
rhel_version and centos_version are apparently the OpenSUSE Build
names; the native macros are just "rhel" and "centos" (and contain
a version number, should it be necessary).
Dan Mick [Tue, 12 Aug 2014 21:09:43 +0000 (14:09 -0700)]
ceph.spec.in: No version on ceph-libs Obsoletes.
If we are installing with the new package structure we don't ever want the
new package to co-exist with the old one; this includes the mistakenly-
released v0.81 on Fedora, which should be removed in favor of this
version.
Signed-off-by: Sandon Van Ness <sandon@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
Erik Logtenberg [Thu, 31 Jul 2014 22:13:50 +0000 (00:13 +0200)]
ceph.spec.in, init-ceph.in: Don't autostart ceph service on Fedora.
This patch is taken from the current Fedora package and makes the upstream
ceph.spec compliant with Fedora policy. The goal is to be fully compliant
upstream so that we can replace current Fedora package with upstream
package to fix many bugs in Fedora.
Addition from Dan Mick <dan.mick@inktank.com>:
Do this for RHEL and CentOS as well, since they surely will benefit
from the same policy. Note: this requires changes to
autobuild-ceph and ceph-build scripts, which currently copy
only the dist tarball to the rpmbuild/SOURCES dir.
Signed-off-by: Erik Logtenberg <erik@logtenberg.eu>
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Erik Logtenberg [Thu, 31 Jul 2014 21:49:56 +0000 (23:49 +0200)]
ceph.spec.in: add ceph-libs-compat
Added a ceph-libs-compat package in accordance with Fedora packaging
guidelines [1], to handle the recent package split more gracefully.
In Fedora this is necessary because there are already other packages
depending on ceph-libs, that need to be adjusted to depend on the new
split packages instead. In the meantime, ceph-libs-compat prevents
breakage.
Yehuda Sadeh [Fri, 22 Aug 2014 04:53:38 +0000 (21:53 -0700)]
rgw: clear bufferlist if write_data() successful
Fixes: #9201
Backport: firefly
We sometimes need to call RGWPutObjProcessor::handle_data() again,
so that we send the pending data. However, we failed to clear the buffer
that was already sent, thus it was resent. This triggers when using
non-default pool alignments.
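
A simplified model of the fix, assuming buffers are plain strings and
the caller loops while the processor asks to be called again.
PutProcessorSketch and its members are hypothetical and do not mirror
the real RGWPutObjProcessor interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical processor: records what reaches the backing store.
struct PutProcessorSketch {
  std::string pending;               // data still waiting to be sent
  std::vector<std::string> written;  // what the store actually received

  int write_data(const std::string &data) {
    written.push_back(data);
    return 0;
  }

  // Sends bl; sets *again when the caller must call once more so the
  // pending data goes out too.
  int handle_data(std::string &bl, bool *again) {
    *again = false;
    if (!bl.empty()) {
      int r = write_data(bl);
      if (r < 0)
        return r;
      bl.clear();                 // the fix: do not resend on the next call
      *again = !pending.empty();
      return 0;
    }
    if (!pending.empty()) {
      int r = write_data(pending);
      if (r < 0)
        return r;
      pending.clear();
    }
    return 0;
  }
};
```

Without the bl.clear() line, the second call in the loop would write
the same buffer again instead of moving on to the pending data.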
Sage Weil [Wed, 27 Aug 2014 13:19:12 +0000 (06:19 -0700)]
osd/PG: fix crash from second backfill reservation rejection
If we get more than one reservation rejection we should ignore them; when
we got the first we already sent out cancellations. More importantly, we
should not crash.
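
The intended behaviour can be sketched as an idempotent handler. The
BackfillSketch type and its members are hypothetical; the real fix lives
in the PG state machine, which this does not attempt to reproduce.

```cpp
#include <cassert>

// Hypothetical reservation state for one backfill attempt.
struct BackfillSketch {
  bool reserving = true;       // still waiting on remote reservations
  int cancellations_sent = 0;  // how many times we cancelled the others

  // First rejection cancels outstanding reservations; later ones are ignored.
  void on_reservation_rejected() {
    if (!reserving)
      return;  // already handled: ignore instead of crashing
    reserving = false;
    ++cancellations_sent;
  }
};
```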
Fixes: #8863
Backport: firefly, dumpling
Signed-off-by: Sage Weil <sage@redhat.com>
common: config: let us obtain a diff between current and default config
It's mildly annoying to figure out which config options have been changed
on a running system when you have to rely on whatever is set in ceph.conf
and the admin's memory of what has been injected.
With this we can simply ask the daemon for the diff between what would be
its default and what is its current config.
Current form will output extraneous information that was not directly
supplied by the user though, such as 'host', 'fsid' and 'daemonize', as
well as defaults we may rewrite ourselves (leveldb tunables on the monitor
for instance). Nonetheless, it's way better than the alternative and
considering it should be used solely for debug purposes I think we can
get away with it.
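
Conceptually, the diff is a comparison of two option maps. The sketch
below assumes options are stored as string key/value pairs; Config,
ConfigDiff and config_diff() are hypothetical names and do not reflect
how the daemon stores or reports its configuration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Config = std::map<std::string, std::string>;
// option name -> (default value, current value)
using ConfigDiff = std::map<std::string, std::pair<std::string, std::string>>;

// Report every option whose current value differs from its default.
// Options with no known default are reported with an empty default.
ConfigDiff config_diff(const Config &defaults, const Config &current) {
  ConfigDiff diff;
  for (const auto &kv : current) {
    auto it = defaults.find(kv.first);
    std::string def = (it == defaults.end()) ? std::string() : it->second;
    if (it == defaults.end() || def != kv.second)
      diff[kv.first] = std::make_pair(def, kv.second);
  }
  return diff;
}
```

Note that a value such as 'host', which the daemon fills in itself, also
shows up in such a diff, which matches the extraneous output mentioned
above.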
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Then, we begin to flush 15 with a delete with snapc 4:[4] leaving the
backing pool with:
4:[4]:[4(4)]
Then, we finish flushing 15 with snapc 9:[4] leaving the backing
pool with:
9:[4]:[4(4)]+head
Next, snaps 10 and 15 are removed causing clone 10 to be removed leaving
the cache with:
30:[29,21,20,4]:[22(21),4(4)]+head
We next begin to flush 22 by sending a delete with snapc 4(4) since
prev_snapc is 4 <---------- here is the bug
The backing pool ignores this request since 4 < 9 (ORDERSNAP) leaving it
with:
9:[4]:[4(4)]
Then, we complete flushing 22 with snapc 19:[4] leaving the backing pool
with:
19:[4]:[4(4)]+head
Then, we begin to flush head by deleting with snapc 22:[21,20,4] leaving
the backing pool with:
22:[21,20,4]:[22(21,20),4(4)]
Finally, we flush head leaving the backing pool with:
30:[29,21,20,4]:[22(21*,20*),4(4)]+head
When we go to flush clone 22, all we know is that 22 is dirty, has snaps
[21], and 4 is clean. As part of flushing 22, we need to do two things:
1) Ensure that the current head is cloned as cloneid 4 with snaps [4] by
sending a delete at snapc 4:[4].
2) Flush the data at snap sequence < 21 by sending a copyfrom with snapc
20:[20,4].
Unfortunately, it is possible that 1, 1&2, or 1 and part of the flush
process for some other now non-existent clone have already been
performed. Because of that, between 1) and 2), we need to send
a second delete ensuring that the object does not exist at 20.
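
The ORDERSNAP behaviour that makes the stale delete a no-op in the
walkthrough above can be modelled as a sequence comparison. This is a
sketch under the assumption that the backing pool only compares the
snap context sequence; SnapContextSketch and BackingObjectSketch are
hypothetical types.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Snap context written as seq:[snaps] in the walkthrough.
struct SnapContextSketch {
  uint64_t seq;
  std::vector<uint64_t> snaps;
};

// Hypothetical backing-pool object tracking the newest sequence applied.
struct BackingObjectSketch {
  uint64_t snap_seq;

  // Apply an ordered write only if its snap context is not older than
  // what the object has already seen. Returns whether it was applied.
  bool apply_ordered(const SnapContextSketch &snapc) {
    if (snapc.seq < snap_seq)
      return false;  // e.g. 4 < 9: the request is ignored
    snap_seq = snapc.seq;
    return true;
  }
};
```

With the object at sequence 9, a delete carrying snapc 4:[4] is dropped
while the later write with snapc 19:[4] is applied, which is why the
first delete alone cannot be relied on.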
Fixes: #9054
Backport: firefly
Signed-off-by: Samuel Just <sam.just@inktank.com>