David Disseldorp [Thu, 17 Nov 2016 16:55:26 +0000 (17:55 +0100)]
doc/cephfs: add note about deletion from OSD restricted pool
As described in http://tracker.ceph.com/issues/17937, a client with
restricted pool access can still delete files unless a corresponding
MDS path restriction is also in place.
Matt Benjamin [Tue, 15 Nov 2016 22:43:16 +0000 (17:43 -0500)]
cmake: produce civetweb.h, again
The recent change to do this with a file copy (and to do it in src/rgw)
resolved the build problem, but updates to the civetweb submodule
were then no longer reflected in the build.
Move the copy into a custom target which will always source the
current submodule version at build time.
Avoid using the BYPRODUCTS option, as it is not supported in many
older CMake versions (e.g., the one shipped with CentOS 7).
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Samuel Just [Fri, 21 Oct 2016 21:29:09 +0000 (14:29 -0700)]
osd/: cleanup the snap trimmer and deal with delayed repops
With the PGBackend changes, it's not necessarily the case that
calling simple_opc_submit synchronously updates the SnapMapper.
Thus, we can't rely on being able to just ask the snap mapper
for the next object immediately (we could well loop on the same
one if ECBackend is flushing the pipeline). Instead, update
SnapMapper and the SnapTrimmer to grab N objects at a time.
Additionally, we need to make sure we don't try this again until
all of the previously submitted repops are flushed (a good idea
anyway). To that end, this patch also refactors the SnapTrimmer
machine to be fully explicit about why it's blocked so we can be
sure that we don't queue an async work item unless we really
want to.
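A minimal sketch of the batching idea, using toy stand-ins: `ToySnapMapper`, `next_batch`, `ToySnapTrimmer`, `try_trim` and `on_repop_flushed` are hypothetical names, not Ceph's SnapMapper or SnapTrimmer API. The trimmer takes up to N objects at once and refuses to ask for more until every repop from the previous batch has flushed, which keeps it from re-reading the same object while the mapper is still stale.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for SnapMapper: objects that still reference a snap.
class ToySnapMapper {
 public:
  void add(const std::string& oid) { objs_.insert(oid); }
  void remove(const std::string& oid) { objs_.erase(oid); }

  // Return up to `max` objects still mapped. Removal happens only when a
  // repop is flushed, so repeated calls may return the same objects.
  std::vector<std::string> next_batch(std::size_t max) const {
    std::vector<std::string> out;
    for (const auto& oid : objs_) {
      if (out.size() == max) break;
      out.push_back(oid);
    }
    return out;
  }

 private:
  std::set<std::string> objs_;
};

// Hypothetical trimmer that is explicitly blocked while repops are in flight.
class ToySnapTrimmer {
 public:
  explicit ToySnapTrimmer(ToySnapMapper& m) : mapper_(m) {}

  // Queue a batch of trims; returns 0 (and queues nothing) while blocked.
  std::size_t try_trim(std::size_t batch) {
    if (blocked()) return 0;
    pending_ = mapper_.next_batch(batch);
    in_flight_ = pending_.size();
    return in_flight_;
  }

  // A repop from the current batch completed; only now does the mapper change.
  void on_repop_flushed() {
    if (in_flight_ == 0) return;
    mapper_.remove(pending_[pending_.size() - in_flight_]);
    --in_flight_;
  }

  bool blocked() const { return in_flight_ > 0; }

 private:
  ToySnapMapper& mapper_;
  std::vector<std::string> pending_;
  std::size_t in_flight_ = 0;
};
```

For example, with objects a, b and c and a batch size of 2, the first call queues a and b; further calls return 0 until both repops flush, after which the next call picks up c.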
Samuel Just [Wed, 19 Oct 2016 16:56:46 +0000 (09:56 -0700)]
osd/ECBackend: use an explicit backfill field on ECSubWrite
Previously, we used an empty transaction to indicate when we
were sending the op to a backfill peer which needs the logs,
but can't run the transaction. I'd like to be able to send
an empty transaction for the rollforward side effect without
it causing the peer to think it missed a backfill op, so
instead, use an explicit flag. Compatibility is handled by
interpreting an old-version encoding with an empty transaction
as having the backfill field set.
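The compatibility rule can be sketched as a decode step. This is a hypothetical, simplified struct (`ToySubWrite`, `kBackfillFieldVersion` and its value are assumptions), not the real ECSubWrite encoding: newer encodings carry the flag explicitly, while for older ones it is inferred from an empty transaction.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for ECSubWrite (not the real encoding).
struct ToySubWrite {
  // First (assumed) struct version that carries an explicit backfill flag.
  static constexpr std::uint8_t kBackfillFieldVersion = 2;

  std::vector<std::string> txn_ops;  // stand-in for the transaction contents
  bool backfill = false;             // explicit flag in newer encodings

  // Build the decoded message from the fields read off the wire.
  // `encoded_backfill` is ignored for old versions, which never sent it.
  static ToySubWrite decode(std::uint8_t struct_v,
                            std::vector<std::string> ops,
                            bool encoded_backfill) {
    ToySubWrite w;
    w.txn_ops = std::move(ops);
    if (struct_v >= kBackfillFieldVersion) {
      w.backfill = encoded_backfill;
    } else {
      // Old senders signalled "backfill peer" with an empty transaction.
      w.backfill = w.txn_ops.empty();
    }
    return w;
  }
};
```

With this rule, a new-version message may carry an empty transaction (e.g. purely for the roll-forward side effect) without being mistaken for a backfill op.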
Samuel Just [Tue, 18 Oct 2016 21:46:53 +0000 (14:46 -0700)]
ReplicatedPG::OpContext::start_async_reads: tolerate case sync callback call
If the read can be completed immediately, objects_read_async will call
the callback synchronously, which will result in ctx being cleaned up.
Clear pending_async_reads before the call.
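A toy illustration of why the order matters, assuming a backend that may run the completion inline. `ToyBackend`, `ToyOpContext` and `on_reads_complete` are hypothetical; the point is only that the pending list is moved out before the call, so nothing in the context is touched after a callback that may already have finished it.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical backend whose completion may run synchronously.
struct ToyBackend {
  bool complete_immediately = true;
  std::size_t reads_issued = 0;
  std::function<void()> deferred;  // completion saved for the async path

  void objects_read_async(std::vector<int> reads,
                          std::function<void()> on_complete) {
    reads_issued += reads.size();
    if (complete_immediately) {
      on_complete();  // sync path: caller's context may be finished here
    } else {
      deferred = std::move(on_complete);
    }
  }
};

// Hypothetical op context with reads queued by the op.
struct ToyOpContext {
  std::vector<int> pending_async_reads;
  bool finished_cleanly = false;

  // Completion handler; in real code this is where cleanup could happen,
  // so it must observe no leftover pending reads.
  void on_reads_complete() { finished_cleanly = pending_async_reads.empty(); }

  void start_async_reads(ToyBackend& backend) {
    std::vector<int> reads;
    reads.swap(pending_async_reads);  // clear *before* calling the backend
    backend.objects_read_async(std::move(reads),
                               [this] { on_reads_complete(); });
    // Do not touch members after this point: the callback may have run.
  }
};
```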
Samuel Just [Thu, 3 Nov 2016 00:38:13 +0000 (17:38 -0700)]
osd/: use PGBackend::call_write_ordered to submit log entries in commit order
Without this change, we might submit new log entries for marking objects
unfound in a way that causes replicas to process them out of order with
respect to pending writes with lower version numbers. That would be bad.
Instead, add an interface that allows an arbitrary callback to be called
after all previously submitted transactions commit, but before any
subsequently submitted operations commit.
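One way to get the ordering described above is to put callbacks into the same in-order retirement queue as transactions. This is a sketch with hypothetical names (`ToyOrderedPipeline`, `submit`, `commit`); it mimics the contract of PGBackend::call_write_ordered as described in this message, not its implementation.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical commit-ordered pipeline; entries retire strictly in order.
class ToyOrderedPipeline {
 public:
  std::vector<std::string> log;  // order in which commits are reported

  // Submit a transaction; it may later commit out of order via commit().
  int submit(std::string name) {
    int id = next_id_++;
    queue_.push_back(Entry{id, std::move(name), nullptr, false});
    return id;
  }

  // Run cb once every earlier submission has committed, and before any
  // later submission is reported as committed.
  void call_write_ordered(std::function<void()> cb) {
    if (queue_.empty()) {
      cb();
      return;
    }
    queue_.push_back(Entry{-1, "", std::move(cb), true});  // ready marker
  }

  // Mark `id` committed, then retire every ready entry at the head in order.
  void commit(int id) {
    for (auto& e : queue_) {
      if (e.id == id) e.committed = true;
    }
    while (!queue_.empty() && queue_.front().committed) {
      Entry e = std::move(queue_.front());
      queue_.pop_front();
      if (e.cb) {
        e.cb();
      } else {
        log.push_back(e.name);
      }
    }
  }

 private:
  struct Entry {
    int id;
    std::string name;
    std::function<void()> cb;
    bool committed;
  };
  std::deque<Entry> queue_;
  int next_id_ = 0;
};
```

Even if a later write commits first, its completion is held back until the earlier write and the ordered callback have been retired.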
Samuel Just [Fri, 21 Oct 2016 21:33:08 +0000 (14:33 -0700)]
osd/: Update PGBackend users to project last_update and submit stat deltas
The RMW pipeline means that we don't start committing an update
immediately, so we can't update the log synchronously with
submit_transaction. Thus, in order to pipeline writes, PG/ReplicatedPG
will need to project last_update and abstain from updating info
directly (updating info.stats was the only offender).
Samuel Just [Tue, 15 Nov 2016 23:47:37 +0000 (15:47 -0800)]
osd/: refactor PGLog a bit and add support for rolling back extents
It was hard to reason about the validity of the IndexedLog internal
pointers and iterators during updates, so this patch cleans that up
a bunch. It also moves responsibility for doing rollbacks into
PGBackend. Finally, it adds support for the new log entry format.
Samuel Just [Sat, 27 Aug 2016 18:33:02 +0000 (11:33 -0700)]
osd/: 's/trim_rollback_to/roll_forward_to/g'
trim_rollback_to was not a terrible name before, in that all
it ever did was (possibly) trim the stashed version of the
object. Now, however, it's going to encompass, in general,
the roll_forward part of a tpc (which will still mean
deleting the stashed object in cases where that is
appropriate).
Samuel Just [Fri, 12 Aug 2016 15:42:12 +0000 (08:42 -0700)]
osd/: switch all users of PGTransaction to use the new structure
This patch removes ReplicatedBackend::PGTransaction and implementations
and switches over all users. Happily, do_osd_ops loses the mod_desc
cruft and OpContext::pending_attrs. PGTransaction doesn't really
have a natural way to implement append, however. In reality, I think
this is probably an improvement, but it does mean that copy_from's
final transaction is now filled in by a lambda rather than by
appending a transaction fragment.
Samuel Just [Wed, 10 Aug 2016 22:45:32 +0000 (15:45 -0700)]
osd/: introduce PGTransaction
ECBackend is going to need a transaction representation that reduces
the operational representation from the OSDOp to a descriptive one,
which makes questions like "what is the largest offset written?" and
"does this transaction delete the object?" simple to answer. At the
same time, we're going to eliminate the PGBackend::PGTransaction
interface since I don't think writing directly to an
ObjectStore::Transaction is buying us enough to offset the irritation
of having to update both implementations.
A happy consequence of this design will be that we can fill in the
pg_log_entry_t::mod_desc member after submission in the backend
rather than inline in do_osd_ops. We can also dispense with having
to maintain OpContext::pending_attrs separately from the ongoing
PGTransaction.
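A sketch of what a descriptive transaction might look like, assuming hypothetical types `ToyObjectOperation` and `ToyPGTransaction` rather than the real PGTransaction: per-object state is recorded instead of an op sequence, so "how far does this write reach?" and "is the object deleted?" become simple lookups, and attribute updates live in the transaction itself.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical per-object description of what a transaction does.
struct ToyObjectOperation {
  bool deleted = false;                           // object is removed
  std::map<std::uint64_t, std::uint64_t> writes;  // offset -> length
  // Attribute updates kept in the transaction; nullopt means "remove".
  std::map<std::string, std::optional<std::string>> attr_updates;

  // One past the last byte written, or nullopt if nothing is written.
  std::optional<std::uint64_t> max_write_end() const {
    if (writes.empty()) return std::nullopt;
    std::uint64_t end = 0;
    for (const auto& [off, len] : writes) end = std::max(end, off + len);
    return end;
  }
};

// Hypothetical transaction: object name -> description of its changes.
struct ToyPGTransaction {
  std::map<std::string, ToyObjectOperation> op_map;

  ToyObjectOperation& get(const std::string& oid) { return op_map[oid]; }

  bool deletes(const std::string& oid) const {
    auto it = op_map.find(oid);
    return it != op_map.end() && it->second.deleted;
  }
};
```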
Samuel Just [Thu, 25 Aug 2016 23:42:17 +0000 (16:42 -0700)]
inline_variant: simplify it a lot, enable perfect forwarding
The previous implementation was a bit more baroque than it
needed to be. Also, it made copies of the lambdas in a
few places. Finally, it caused segfaults. Not actually
sure why.
If this sequence is interrupted after 8 and replayed from 1, by the time
it gets to 3 the object will only have size 10 and no replay guard
(since 1 was skipped and 2 recreated the object with size 10),
resulting in a short read. This should only happen if the replay guard
is missing, which should only happen if the object gets deleted later
in the sequence.
As of the Mitaka release, show_image_direct_url is no longer needed;
show_multiple_locations should be used instead.
Add the necessary guidance for the Mitaka release.
In file included from /home/pdonnell/ceph/src/mds/MDSRank.h:18:0,
                 from /home/pdonnell/ceph/src/mds/MDBalancer.cc:18:
/home/pdonnell/ceph/src/common/TrackedOp.h:153:16: warning: ‘virtual void TrackedOp::_dump(ceph::Formatter*) const’ was hidden [-Woverloaded-virtual]
   virtual void _dump(Formatter *f) const {}
                ^
In file included from /home/pdonnell/ceph/src/mon/mon_types.h:23:0,
                 from /home/pdonnell/ceph/src/mon/MonMap.h:22,
                 from /home/pdonnell/ceph/src/mon/MonClient.h:20,
                 from /home/pdonnell/ceph/src/mds/MDBalancer.cc:19:
/home/pdonnell/ceph/src/mon/MonOpRequest.h:106:8: warning: by ‘void MonOpRequest::_dump(utime_t, ceph::Formatter*) const’ [-Woverloaded-virtual]
   void _dump(utime_t now, Formatter *f) const {
        ^
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
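The warning above fires when a derived class declares a member with the same name as a base-class virtual but a different signature, hiding the base overload. A minimal reproduction with stand-in types (`TrackedOpLike`, `MonOpRequestLike` and a toy `Formatter`, all hypothetical) and one common remedy that GCC's documentation suggests, a using-declaration, looks like this; the fix actually applied in this commit is not shown in the message.

```cpp
#include <string>

// Toy stand-in for ceph::Formatter.
struct Formatter {
  std::string out;
};

// Stand-in for a base class with a virtual _dump(Formatter*).
struct TrackedOpLike {
  virtual ~TrackedOpLike() = default;
  virtual void _dump(Formatter* f) const { f->out += "base;"; }
};

// Declaring _dump with a different signature here would, on its own, hide the
// base virtual (what -Woverloaded-virtual reports). The using-declaration
// brings the base overload back into scope so both forms resolve.
struct MonOpRequestLike : TrackedOpLike {
  using TrackedOpLike::_dump;
  void _dump(long now, Formatter* f) const {
    f->out += "derived@" + std::to_string(now) + ";";
  }
};
```

Without the using-declaration, a call like `r._dump(&f)` on the derived type would not even compile, which is why the hiding is worth a warning.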
Loic Dachary [Tue, 15 Nov 2016 16:16:37 +0000 (17:16 +0100)]
mon,ceph-disk: add lockbox permissions to bootstrap-osd
ceph-disk --dmcrypt needs to put a config-key and authorize
the OSD to get it back. The corresponding permissions are
added to the bootstrap-osd profile in the monitor.
When preparing the OSD lockbox, use the bootstrap-osd profile instead of
implicitly requiring admin permissions to perform the initial config-key
and auth get-or-create operations.
David Zafman [Tue, 30 Aug 2016 19:22:29 +0000 (12:22 -0700)]
rados, osd: Improve attrs output of list-inconsistent-obj
Persist the user_version and shard id of scrubbed objects
Have the rados command dump the inconsistent object's version and shard id
so they can be passed to the repair command
Rados list-inconsistent-obj output of attrs
Make attrs an array, since there can be more than one
Use base64 encoding for values with non-printable characters
Indicate whether base64 encoding was used
Add checking for ss_attr_missing and ss_attr_corrupted
Rename attr errors to attr_key_mismatch and attr_value_mismatch
Add missing size_mismatch_oi scrub checking
For erasure coded pools, add ec_size_error and ec_hash_error, not just read_error
Use oi_attr_missing and oi_attr_corrupted just like list-inconsistent-snap does
Pick an object info based on version and use it to find the specific shards in error
Check for object info inconsistency, which should be rare
Make all errors based on comparing shards to each other into object errors
We don't want to give the impression that we've picked the correct one
Signed-off-by: Kefu Chai <kchai@redhat.com>
Signed-off-by: David Zafman <dzafman@redhat.com>
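The base64 rule for attribute values can be sketched as follows, assuming that "non-printable" means failing std::isprint in the C locale. `base64_encode` (standard RFC 4648 alphabet with padding), `AttrValue` and `format_attr_value` are hypothetical helpers, not the rados tool's code or output format.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Standard base64 (RFC 4648) encoding with '=' padding.
std::string base64_encode(const std::string& in) {
  static const char kAlphabet[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  std::size_t i = 0;
  for (; i + 2 < in.size(); i += 3) {  // full 3-byte groups -> 4 chars
    unsigned v = (static_cast<unsigned char>(in[i]) << 16) |
                 (static_cast<unsigned char>(in[i + 1]) << 8) |
                 static_cast<unsigned char>(in[i + 2]);
    out += kAlphabet[(v >> 18) & 0x3F];
    out += kAlphabet[(v >> 12) & 0x3F];
    out += kAlphabet[(v >> 6) & 0x3F];
    out += kAlphabet[v & 0x3F];
  }
  std::size_t rest = in.size() - i;
  if (rest == 1) {
    unsigned v = static_cast<unsigned char>(in[i]) << 16;
    out += kAlphabet[(v >> 18) & 0x3F];
    out += kAlphabet[(v >> 12) & 0x3F];
    out += "==";
  } else if (rest == 2) {
    unsigned v = (static_cast<unsigned char>(in[i]) << 16) |
                 (static_cast<unsigned char>(in[i + 1]) << 8);
    out += kAlphabet[(v >> 18) & 0x3F];
    out += kAlphabet[(v >> 12) & 0x3F];
    out += kAlphabet[(v >> 6) & 0x3F];
    out += '=';
  }
  return out;
}

// Hypothetical output record: the value plus an indication of its encoding.
struct AttrValue {
  std::string value;
  bool base64;
};

// Emit printable values as-is; base64-encode anything else and say so.
AttrValue format_attr_value(const std::string& raw) {
  for (unsigned char c : raw) {
    if (!std::isprint(c)) return {base64_encode(raw), true};
  }
  return {raw, false};
}
```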