Sage Weil [Mon, 20 Feb 2017 16:49:08 +0000 (11:49 -0500)]
osd: restructure op_shardedwq
This is difficult to break into pieces, so one big fat commit it is.
A few trivial bits:
- include epoch in PGQueueable.
- PGQueueable operator<<
- remove op_wq ref from OSDService; use simple set of queue methods instead
The big stuff:
- Fast dispatch now passes messages directly to the queue based on an
spg_t. The exception is MOSDOp's from legacy clients. We add a
waiting_for_map mechanism on the front-side that is similar to but simpler
than the previous one so that we can map those legacy requests to an
accurate spg_t.
- The dequeue path now has a waiting_for_pg mechanism. It also uses a
much simpler set of data structures that should make it much faster than
the previous incarnation.
- Shutdown works a bit differently; we drain the queue instead of trying
to remove work for individual PGs. This lets us remove the dequeue_pg
machinery.
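As a rough illustration of the waiting_for_pg idea described above, the Python sketch below parks queue items whose PG is not yet known to the shard and requeues them, in order, once the PG shows up. The names ShardSketch, enqueue, register_pg and dequeue are hypothetical and chosen for this sketch only; the real OSD code is C++ and its data structures differ.

```python
from collections import deque, defaultdict


class ShardSketch:
    """Toy model of one queue shard; not the real OSD data structures."""

    def __init__(self):
        self.queue = deque()                     # pending (pgid, item) pairs
        self.pgs = set()                         # PGs this shard currently knows about
        self.waiting_for_pg = defaultdict(list)  # items parked until their PG appears

    def enqueue(self, pgid, item):
        self.queue.append((pgid, item))

    def register_pg(self, pgid):
        # The PG became available: put parked items back at the front of the
        # queue, keeping their original relative order.
        self.pgs.add(pgid)
        parked = self.waiting_for_pg.pop(pgid, [])
        for item in reversed(parked):
            self.queue.appendleft((pgid, item))

    def dequeue(self):
        # Return the next runnable (pgid, item); park anything whose PG is unknown.
        while self.queue:
            pgid, item = self.queue.popleft()
            if pgid in self.pgs:
                return pgid, item
            self.waiting_for_pg[pgid].append(item)
        return None
```

In this sketch the dequeue side never searches the queue for a particular PG; it only pops from the front and either runs or parks the item, which is one way a simpler structure can stay cheap.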
Samuel Just [Thu, 16 Feb 2017 21:22:07 +0000 (13:22 -0800)]
osd,osdc: eliminate FLAG_ONDISK and helpers
The objecter actually always needs to get a response in order to
be able to not continually resend ops (even if the caller didn't
provide a callback). Thus, it makes no sense for an MOSDOp to
ever not have FLAG_ONDISK set. Therefore, we'll just remove the
helper and assume it's always there (it's safe to send a response
the client didn't ask for, the error paths already do that). On
the Objecter side, we'll just unconditionally fill in ONDISK for
the benefit of pre-luminous OSDs.
Fixes: http://tracker.ceph.com/issues/18961
Signed-off-by: Samuel Just <sjust@redhat.com>
Samuel Just [Wed, 15 Feb 2017 00:50:11 +0000 (16:50 -0800)]
PrimaryLogPG::start_flush: don't use ORDERSNAP, eliminate the second delete
I think that whole thing was a misguided attempt to avoid deleting head
if it exists in the base tier (in reality it doesn't matter since head
would have to be logically dirty and anything we actually care about
would be preserved by sending a new enough seq to cause a clone).
Kefu Chai [Wed, 22 Feb 2017 05:16:05 +0000 (13:16 +0800)]
script/sepia_bt.sh: get sha1,release from t.log if it's not in core
* sometimes, the coredump comes from python, so we should get the sha1 and
release in a different and more fragile way.
* also, on Centos7 the distro name reported by python is "Centos Linux", so
we should normalize the distro name and distro version.
* add "-v" option to be more chatty.
* normalize the $prog if $prog is */python*
* fix the pkg_path if the distro is centos7
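The normalization step above could look roughly like the following Python sketch. The function name normalize_distro and the chosen normalized form (lowercase first word plus major version) are assumptions made for illustration; the actual script is shell and may use a different convention.

```python
def normalize_distro(name, version):
    """Return a (distro, major_version) pair in a normalized form.

    Assumption: the normalized form is the lowercase first word of the
    distro name plus the major version number.
    """
    distro = name.strip().split()[0].lower()   # "Centos Linux" -> "centos"
    major = version.strip().split(".")[0]      # "7.3.1611" -> "7"
    return distro, major
```

For example, normalize_distro("Centos Linux", "7.3.1611") yields ("centos", "7") under these assumptions.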
Matt Benjamin [Thu, 23 Feb 2017 21:02:07 +0000 (16:02 -0500)]
rgw_file: ensure valid_s3_object_name for directories, too
The logic in RGWLibFS::mkdir() validated bucket names, but not
object names (though RGWLibFS::create() did so).
The negative side effect of this was not creating illegal objects
(we won't), but a) failing with -EIO and b) more importantly,
not removing the proposed object from FHCache, which produced a
boost assert when recycled.
Fixes: http://tracker.ceph.com/issues/19066
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Matt Benjamin [Thu, 23 Feb 2017 15:21:38 +0000 (10:21 -0500)]
rgw_file: return of RGWFileHandle::FLAG_EXACT_MATCH
Allow callers of rgw_lookup() on objects attested in an
rgw_readdir() callback the ability to bypass exact match in
RGWLibFS::stat_leaf() case 2, but restore exact match enforcement
for general lookups.
This preserves required common_prefix namespace behavior, but
prevents clients from eerily permitting things like "cd sara0" via
partial name match on "sara01."
Fixes: http://tracker.ceph.com/issues/19059
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
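To make the "sara0" versus "sara01" case concrete, here is a minimal Python sketch of exact versus prefix matching. The function lookup and its exact_match parameter are hypothetical stand-ins, not the rgw_file API; the sketch only shows why a prefix match accepts the shorter name while an exact match rejects it.

```python
def lookup(names, wanted, exact_match=True):
    """Return the first entry matching `wanted`, or None.

    With exact_match=False a prefix match is accepted (the permissive
    behavior); with exact_match=True only an identical name matches.
    """
    for name in names:
        if name == wanted:
            return name
        if not exact_match and name.startswith(wanted):
            return name
    return None
```

With names = ["sara01"], a general lookup of "sara0" finds nothing, while the same lookup with exact_match=False resolves to "sara01".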
Sage Weil [Mon, 20 Feb 2017 19:26:42 +0000 (14:26 -0500)]
osdc/Objecter: _calc_target on all ops so that we notice splits
We need to make sure we update the mapping and get an accurate actual_pgid
value by recalculating the mapping on every map change. Otherwise, we may
not notice a split (and subsequent actual_pgid change) and resend the same
op with a stale spg_t. To fix this,
- _calc_target on need_resend
- update target regardless of current con
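The effect of a split on the mapping can be sketched in Python as follows. This uses a stable-modulo placement function as a simplified stand-in for the real mapping; calc_pg and the example hash values are assumptions for illustration and this is not the Objecter's _calc_target logic.

```python
def stable_mod(x, b, bmask):
    """Map hash x onto b buckets; bmask is (next power of two >= b) - 1."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def calc_pg(obj_hash, pg_num):
    # The mask is derived from pg_num on every call, so a changed pg_num
    # (for example after a split) is always taken into account.
    bmask = (1 << (pg_num - 1).bit_length()) - 1
    return stable_mod(obj_hash, pg_num, bmask)
```

With pg_num going from 8 to 16, an object whose hash is 29 moves from PG 5 to PG 13, while an object whose hash is 5 stays in PG 5. An op targeting the first object that is not recalculated after the split would be resent to the stale PG.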
Sage Weil [Mon, 20 Feb 2017 12:45:32 +0000 (07:45 -0500)]
osd: warn on ops directed to the wrong pg_t
Check whether the request hobj maps to the current pg_t. If we have the
osd_debug_misdirected_ops setting enabled (as teuthology does), assert out
as well so that the error is easy to spot. This catches bugs in the
Objecter (especially the new code that explicitly names the spg_t for the
request).
Kefu Chai [Wed, 22 Feb 2017 18:36:22 +0000 (02:36 +0800)]
ceph-dencoder: s/WITH_LIBAIO/HAVE_LIBAIO/
* s/WITH_LIBAIO/HAVE_LIBAIO/: as HAVE_LIBAIO is used to detect if libaio
is installed and is exposed in the acconfig.h.
* do not test bluestore_blob_t with ceph-dencoder, as it repurposes the
"feature" parameter for struct_v.
Signed-off-by: Willem Jan Withagen <wjw@digiware.nl>
Signed-off-by: Kefu Chai <kchai@redhat.com>
add extents before marking unused ranges, otherwise add_unused()
asserts if (offset + len < blob_len)
this method is supposed to be used by ceph-dencoder, but
bluestore_blob_t's codec is quite different. we are not testing its
encoding in ceph-dencoder at this moment.
Matt Benjamin [Wed, 22 Feb 2017 15:24:29 +0000 (10:24 -0500)]
rgw_file: avoid stranding invalid-name bucket handles in fhcache
To avoid a string copy in the common mkdir path, handles for
proposed buckets currently are staged in the handle table, before
being rejected. They need to be destaged, not just marked deleted
(because deleted objects are now assumed not to be linked, as of beaeff059375b44188160dbde8a81dd4f4f8c6eb).
This triggered an unhandled Boost assert when deleting staged
handles, as current safe_link mode requires first removing from
the FHCache.
Fixes: http://tracker.ceph.com/issues/19036
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
Loic Dachary [Wed, 22 Feb 2017 00:49:12 +0000 (01:49 +0100)]
ceph-disk: dmcrypt activate must use the same cluster as prepare
When dmcrypt is used, the fsid cannot be retrieved from the data
partition because it is encrypted. Store the fsid in the lockbox to
enable dmcrypt activation using the same logic as regular activation.
The fsid is used to retrieve the cluster name that was used during
prepare, which is why activation does not and must not have a --cluster
argument.
Jason Dillaman [Tue, 21 Feb 2017 18:09:39 +0000 (13:09 -0500)]
rbd-mirror: object copy should always reference valid snapshots
If a remote snapshot is deleted while an image sync is in-progress,
associate the read request against the most recent, valid remote
snapshot for a given snapshot object clone.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
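One way to picture the snapshot selection described above is the Python sketch below. The function pick_read_snap and its arguments are hypothetical, and the sketch assumes that larger snapshot ids are more recent; it is not the rbd-mirror implementation.

```python
def pick_read_snap(clone_snap_ids, valid_remote_snap_ids):
    """Return the newest snapshot id of the clone that still exists remotely.

    Assumption: larger snapshot ids are more recent. Returns None when
    none of the clone's snapshots is still valid.
    """
    candidates = [s for s in clone_snap_ids if s in valid_remote_snap_ids]
    return max(candidates) if candidates else None
```

If a clone covers snapshots 4, 5 and 6 and snapshot 6 was deleted remotely during the sync, the read would be issued against snapshot 5 in this sketch.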