Samuel Just [Thu, 3 Nov 2016 00:38:13 +0000 (17:38 -0700)]
osd/: use PGBackend::call_write_ordered to submit log entries in commit order
Without this change, we might submit new log entries for marking objects
unfound in a way that causes replicas to process them out of order with
respect to pending writes with lower version numbers. That would be bad.
Instead, add an interface that allows an arbitrary callback to be called
after any previously submitted transactions commit, but before any
subsequently submitted operations commit.
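To make the ordering guarantee concrete, here is a minimal sketch of the idea; the structure and names (WriteOrderer, submit_write, on_commit) are illustrative, not the actual PGBackend::call_write_ordered interface.

    #include <deque>
    #include <functional>

    // Illustrative only: a queue that interleaves writes and callbacks so
    // that a callback registered via call_write_ordered() runs only after
    // every write submitted before it has committed, and before any write
    // submitted after it is processed.
    struct WriteOrderer {
      struct Item {
        bool is_callback;
        std::function<void()> cb;  // set only for callbacks
        bool committed;            // set when the write's commit arrives
      };
      std::deque<Item> queue;

      void submit_write() { queue.push_back({false, {}, false}); }
      void call_write_ordered(std::function<void()> cb) {
        queue.push_back({true, std::move(cb), false});
      }
      // Commits arrive in submission order.
      void on_commit() {
        for (auto &i : queue) {
          if (!i.is_callback && !i.committed) { i.committed = true; break; }
        }
        // Drain the front: a callback fires only once everything ahead of
        // it has committed; nothing behind it has been handled yet.
        while (!queue.empty()) {
          Item &front = queue.front();
          if (front.is_callback) front.cb();
          else if (!front.committed) break;
          queue.pop_front();
        }
      }
    };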
Samuel Just [Fri, 21 Oct 2016 21:33:08 +0000 (14:33 -0700)]
osd/: Update PGBackend users to project last_update and submit stat deltas
The RMW pipeline means that we don't start committing an update
immediately, so we can't update the log synchronously with
submit_transaction. Thus, in order to pipeline writes, PG/ReplicatedPG
will need to project last_update and abstain from updating info
directly (updating info.stats was the only offender).
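A minimal sketch of what "projecting" last_update means here; eversion_t is a hypothetical stand-in and this is not the real PG code, just the bookkeeping idea: new entries take versions from a projected last_update that already counts in-flight entries, while the authoritative value only advances as entries commit.

    #include <cstdint>
    #include <list>

    // Hypothetical stand-in for Ceph's eversion_t (epoch + version pair).
    struct eversion_t {
      uint64_t epoch;
      uint64_t version;
    };

    struct ProjectedLog {
      eversion_t last_update{0, 0};            // what info reflects
      eversion_t projected_last_update{0, 0};  // includes unapplied entries
      std::list<eversion_t> in_flight;

      eversion_t next_version(uint64_t epoch) {
        projected_last_update = {epoch, projected_last_update.version + 1};
        in_flight.push_back(projected_last_update);
        return projected_last_update;
      }
      void entry_applied() {  // commits arrive in submission order
        last_update = in_flight.front();
        in_flight.pop_front();
      }
    };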
Samuel Just [Tue, 15 Nov 2016 23:47:37 +0000 (15:47 -0800)]
osd/: refactor PGLog a bit and add support for rolling back extents
It was hard to reason about the validity of the IndexedLog internal
pointers and iterators during updates, so this patch cleans that up
a bunch. It also moves responsibility for doing rollbacks into
PGBackend. Finally, it adds support for the new log entry format.
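As a hedged illustration of "rolling back extents" (the types and names here are hypothetical, not the new log entry format itself): rather than stashing a whole pre-write clone, a log entry can record which byte extents were preserved, and PGBackend either restores them on rollback or discards them on roll-forward.

    #include <cstdint>
    #include <utility>
    #include <vector>

    // Hypothetical rollback record carried by a log entry: the extents that
    // were preserved before an overwrite, keyed by a generation so the
    // backend can locate the stashed data.
    struct rollback_extents_sketch_t {
      uint64_t gen;                                        // stash generation
      std::vector<std::pair<uint64_t, uint64_t>> extents;  // (offset, length)
    };

    // PGBackend owns what happens to the record:
    //   rollback(entry)     -> copy the stashed extents back over the object
    //   roll_forward(entry) -> drop the stashed extents; the write is final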
Samuel Just [Sat, 27 Aug 2016 18:33:02 +0000 (11:33 -0700)]
osd/: 's/trim_rollback_to/roll_forward_to/g'
trim_rollback_to was not a terrible name before, in that all it
ever did was (possibly) trim the stashed version of the object.
Now, however, it is going to encompass the roll_forward part of a
two-phase commit in general (which will still mean deleting the
stashed object in cases where that is appropriate).
Samuel Just [Fri, 12 Aug 2016 15:42:12 +0000 (08:42 -0700)]
osd/: switch all users of PGTransaction to use the new structure
This patch removes ReplicatedBackend::PGTransaction and its implementations
and switches over all users. Happily, do_osd_ops loses the mod_desc
cruft and OpContext::pending_attrs. PGTransaction doesn't really
have a natural way to implement append, however. In reality, I think
this is probably an improvement, but it does mean that copy_from's
final transaction is now filled in by a lambda rather than by
appending a transaction fragment.
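To illustrate the lambda point (the names here are hypothetical, not the actual OpContext/copy_from code): instead of building a transaction fragment up front and appending it, copy_from stashes a closure that fills in the one transaction that will actually be submitted.

    #include <functional>

    struct PGTransactionSketch;  // stand-in for the new PGTransaction

    // Hypothetical: the copy results carry a filler rather than a
    // ready-made transaction fragment...
    struct CopyResultsSketch {
      std::function<void(PGTransactionSketch &)> fill_in_final_tx;
    };

    // ...and the op completing the copy invokes it against the final
    // transaction just before submission.
    inline void finish_copy(CopyResultsSketch &results,
                            PGTransactionSketch &t) {
      if (results.fill_in_final_tx)
        results.fill_in_final_tx(t);
    }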
Samuel Just [Wed, 10 Aug 2016 22:45:32 +0000 (15:45 -0700)]
osd/: introduce PGTransaction
ECBackend is going to need a transaction representation which reduces
the operational representation from the OSDOp to a descriptive one
which makes questions like "what is the largest offset written" and
"does this transaction delete the object?" simple to answer. At the
same time, we're going to eliminate the PGBackend::PGTransaction
interface since I don't think writing directly to an
ObjectStore::Transaction is buying us enough to offset the irritation
of having to update both implementations.
A happy consequence of this design will be that we can fill in the
pg_log_entry_t::mod_desc member after submission in the backend
rather than inline in do_osd_ops. We can also dispense with having
to maintain OpContext::pending_attrs separately from the ongoing
PGTransaction.
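A minimal sketch of what "descriptive" means here (simplified and hypothetical, not the real PGTransaction API): the transaction keeps a small per-object summary, so the two questions above become lookups rather than a replay of ObjectStore ops.

    #include <algorithm>
    #include <cstdint>
    #include <map>
    #include <string>

    // Simplified, hypothetical per-object description of a transaction.
    struct ObjectOperationSketch {
      bool deletes = false;            // object removed by this transaction?
      uint64_t largest_write_end = 0;  // max(offset + length) over writes

      void note_write(uint64_t off, uint64_t len) {
        largest_write_end = std::max(largest_write_end, off + len);
      }
      void note_delete() { deletes = true; }
    };

    struct PGTransactionSketch {
      std::map<std::string, ObjectOperationSketch> ops;  // keyed by object

      bool deletes_object(const std::string &oid) const {
        auto i = ops.find(oid);
        return i != ops.end() && i->second.deletes;
      }
      uint64_t largest_offset_written(const std::string &oid) const {
        auto i = ops.find(oid);
        return i == ops.end() ? 0 : i->second.largest_write_end;
      }
    };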
Samuel Just [Thu, 25 Aug 2016 23:42:17 +0000 (16:42 -0700)]
inline_variant: simplify it a lot, enable perfect forwarding
The previous implementation was a bit more baroque than it
needed to be. Also, it made copies of the lambdas in a
few places. Finally, it caused segfaults, though I'm not actually
sure why.
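The shape of the simplification, sketched with the standard library rather than Ceph's boost-based helper (visit_inline and overloaded are illustrative names, requiring C++17): build one visitor from the lambdas and perfectly forward both the lambdas and the variant, so the held value is not copied.

    #include <string>
    #include <utility>
    #include <variant>

    // Illustrative stand-in for inline_variant.
    template <typename... Fs>
    struct overloaded : Fs... { using Fs::operator()...; };
    template <typename... Fs> overloaded(Fs...) -> overloaded<Fs...>;

    template <typename Variant, typename... Fs>
    decltype(auto) visit_inline(Variant &&v, Fs &&...fs) {
      // The lambdas are forwarded into a single visitor and the variant is
      // perfectly forwarded, so its stored alternative is not copied.
      return std::visit(overloaded{std::forward<Fs>(fs)...},
                        std::forward<Variant>(v));
    }

    // Usage:
    //   std::variant<int, std::string> v = std::string("abc");
    //   visit_inline(v,
    //     [](int i) { /* handle int */ },
    //     [](const std::string &s) { /* handle string */ });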
If this sequence is interrupted after 8 and replayed from 1, by the time
it gets to 3 the object will only have size 10 and no replay guard
(since 1 was skipped and 2 recreated the object with size 10), resulting
in a short read. This should only happen if the replay guard is
missing, which should only happen if the object gets deleted later
in the sequence.
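A hedged sketch of the tolerance being described (illustrative structure, not the actual FileStore replay code): a short read during replay is only acceptable when the object carries no replay guard.

    #include <cstdint>
    #include <optional>

    // Illustrative only: what a replayed read might check.
    struct ReplayedObjectSketch {
      std::optional<uint64_t> replay_guard_seq;  // absent if no guard set
    };

    // A read that comes up short during replay is tolerated only when the
    // replay guard is missing: that means an earlier guarded op in the
    // sequence was skipped (and the object is deleted later anyway), so a
    // smaller recreated object is expected.
    inline bool short_read_ok_on_replay(const ReplayedObjectSketch &o,
                                        uint64_t want_len,
                                        uint64_t got_len) {
      if (got_len >= want_len)
        return true;  // not actually a short read
      return !o.replay_guard_seq.has_value();
    }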
As of the Mitaka release, show_image_direct_url is no longer needed;
show_multiple_locations should be used instead.
Add the necessary guidance for the Mitaka release.
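The guidance amounts to a one-line setting in glance-api.conf; a sketch follows, but check the Mitaka documentation for the authoritative form.

    [DEFAULT]
    # Mitaka and later: expose image locations so clients (e.g. Ceph/RBD)
    # can clone from them; show_image_direct_url is no longer required.
    show_multiple_locations = True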
In file included from /home/pdonnell/ceph/src/mds/MDSRank.h:18:0,
from /home/pdonnell/ceph/src/mds/MDBalancer.cc:18:
/home/pdonnell/ceph/src/common/TrackedOp.h:153:16: warning: ‘virtual void TrackedOp::_dump(ceph::Formatter*) const’ was hidden [-Woverloaded-virtual]
virtual void _dump(Formatter *f) const {}
^
In file included from /home/pdonnell/ceph/src/mon/mon_types.h:23:0,
from /home/pdonnell/ceph/src/mon/MonMap.h:22,
from /home/pdonnell/ceph/src/mon/MonClient.h:20,
from /home/pdonnell/ceph/src/mds/MDBalancer.cc:19:
/home/pdonnell/ceph/src/mon/MonOpRequest.h:106:8: warning: by ‘void MonOpRequest::_dump(utime_t, ceph::Formatter*) const’ [-Woverloaded-virtual]
void _dump(utime_t now, Formatter *f) const {
^
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
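The warning is ordinary C++ name hiding, nothing Ceph-specific: declaring a _dump overload with a different signature in a derived class hides the base class's virtual _dump. A self-contained illustration (Base/Hides/Fixed are stand-ins, not the actual TrackedOp/MonOpRequest classes):

    struct Formatter;   // stand-in for ceph::Formatter
    struct utime_t {};  // stand-in for Ceph's utime_t

    struct Base {
      virtual ~Base() = default;
      virtual void _dump(Formatter *f) const {}
    };

    // This overload hides Base::_dump(Formatter*) entirely, which is what
    // -Woverloaded-virtual warns about:
    struct Hides : Base {
      void _dump(utime_t now, Formatter *f) const {}
    };

    // Re-exposing the base name (or renaming the overload) silences it:
    struct Fixed : Base {
      using Base::_dump;
      void _dump(utime_t now, Formatter *f) const {}
    };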
Loic Dachary [Tue, 15 Nov 2016 16:16:37 +0000 (17:16 +0100)]
mon,ceph-disk: add lockbox permissions to bootstrap-osd
ceph-disk --dmcrypt needs to put a config-key and authorize
the OSD to get it back. The corresponding permissions are
added to the bootstrap-osd profile in the monitor.
When preparing the OSD lockbox, use the bootstrap-osd profile instead of
implicitly requiring admin permissions to perform the initial config-key
and auth get-or-create operations.
David Zafman [Tue, 30 Aug 2016 19:22:29 +0000 (12:22 -0700)]
rados, osd: Improve attrs output of list-inconsistent-obj
Persist the user_version and shard id of scrubbed obj
Rados command dump inconsistent obj's version and shard-id
  so they can be passed to the repair command
Rados list-inconsistent-obj output of attrs:
  Make attrs an array since there is more than one
  Use base64 encoding for values with non-printable chars
  Add an indication if base64 encoding is used
Add checking for ss_attr_missing and ss_attr_corrupted
Rename attr errors to attr_key_mismatch and attr_value_mismatch
Add missing size_mismatch_oi scrub checking
For erasure coded pools add ec_size_error and ec_hash_error, not just read_error
Use oi_attr_missing and oi_attr_corrupted just like list-inconsistent-snap does
Pick an object info based on version and use that to find specific shards in error
Check for object info inconsistency, which should be rare
Make all object errors based on comparing shards to each other;
  we don't want to give the impression that we've picked the correct one
Signed-off-by: Kefu Chai <kchai@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
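To make the base64 behaviour concrete, a small sketch (the field names are illustrative, not the exact list-inconsistent-obj schema): printable values are emitted as-is, anything else is base64-encoded along with a flag saying so.

    #include <cctype>
    #include <string>

    // Tiny base64 encoder so the sketch is self-contained.
    inline std::string base64_encode(const std::string &raw) {
      static const char tbl[] =
          "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
      std::string out;
      unsigned val = 0;
      int bits = 0;
      for (unsigned char c : raw) {
        val = (val << 8) | c;
        bits += 8;
        while (bits >= 6) {
          out += tbl[(val >> (bits - 6)) & 0x3F];
          bits -= 6;
        }
      }
      if (bits)
        out += tbl[(val << (6 - bits)) & 0x3F];
      while (out.size() % 4)
        out += '=';
      return out;
    }

    struct AttrEntrySketch {
      std::string name;
      std::string value;
      bool base64_encoded;  // tells the consumer how to read `value`
    };

    // Emit the raw value when it is printable; otherwise base64-encode it
    // and flag that fact, roughly as described above.
    inline AttrEntrySketch make_attr_entry(const std::string &name,
                                           const std::string &raw) {
      for (unsigned char c : raw) {
        if (!std::isprint(c))
          return {name, base64_encode(raw), true};
      }
      return {name, raw, false};
    }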
xie xingguo [Wed, 16 Nov 2016 06:33:30 +0000 (14:33 +0800)]
os/bluestore: avoid unnecessary call to init_csum()
We have to start CSumType from 1, which represents CSUM_NONE, to stay
aligned with OSDMonitor's pool_opts_t handling.
So we have to explicitly check against CSUM_NONE to skip init_csum(),
which would set FLAG_CSUM and allocate memory for csum_data and thus
should be avoided whenever possible.
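A sketch of the check being described (names approximate the BlueStore ones; this is not the actual code): init_csum() is only invoked when the configured type is something other than CSUM_NONE, since init_csum() sets FLAG_CSUM and allocates csum_data.

    #include <cstdint>

    // Illustrative only; names approximate the BlueStore ones.
    enum CSumType : uint8_t {
      CSUM_NONE = 1,  // starts from 1, per the pool_opts_t alignment above
      CSUM_XXHASH32,
      CSUM_CRC32C,
    };

    struct BlobSketch {
      bool has_csum = false;  // stands in for FLAG_CSUM
      void init_csum(CSumType) {
        has_csum = true;
        // ...would also allocate csum_data here...
      }
    };

    inline void maybe_init_csum(BlobSketch &b, CSumType configured) {
      // Explicitly skip CSUM_NONE so we never set FLAG_CSUM or allocate
      // csum_data for blobs that will not carry checksums.
      if (configured != CSUM_NONE)
        b.init_csum(configured);
    }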
Dan Mick [Wed, 16 Nov 2016 03:42:06 +0000 (19:42 -0800)]
git-archive-all.sh: use an actually unique tmp dir
git archive into $TMPDIR/$(basename "$(pwd)").$FORMAT is not unique;
if two runs are running simultaneously, this will collide. Make
TMPDIR actually unique, and then the cleanup can just remove the whole
directory as well.
John Spray [Wed, 21 Sep 2016 10:45:38 +0000 (11:45 +0100)]
mon: make MDSMonitor tolerant of slow mon elections
Previously, MDS daemons would get failed incorrectly when they
appeared to have timed out due to delays in calling into
MDSMonitor that were actually caused by, e.g., slow leveldb
writes leading to slow mon elections.
Fixes: http://tracker.ceph.com/issues/17308
Signed-off-by: John Spray <john.spray@redhat.com>
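A rough sketch of the idea (hypothetical structure, not the actual MDSMonitor change): before failing an MDS for missing beacons, discount any interval during which the monitor itself was not processing, e.g. while an election or a slow store commit was in progress.

    #include <chrono>

    // Illustrative only: extend the effective beacon grace by however long
    // the monitor itself was stalled, instead of blaming the MDS for it.
    struct BeaconGraceSketch {
      using Clock = std::chrono::steady_clock;

      Clock::time_point last_tick = Clock::now();
      std::chrono::seconds grace{15};

      bool should_mark_laggy(Clock::time_point last_beacon) {
        auto now = Clock::now();
        auto mon_stall = now - last_tick;  // time since we last ran a tick
        last_tick = now;
        // Only fail the MDS if its beacon is older than the grace period
        // plus the time the monitor itself was unresponsive.
        return (now - last_beacon) > (grace + mon_stall);
      }
    };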