Sage Weil [Thu, 26 Oct 2017 21:23:41 +0000 (16:23 -0500)]
osd/PG: fix recovery op leak due to recovery preemption
This was fixed in master in a different patch, but we are not yet ready to
backport the bits there that came before this. For now, fix it
specifically for luminous. We can either sort out the conflicts later
or revert this and backport the master parts conflict-free.
chenliuzhong [Tue, 24 Oct 2017 02:54:33 +0000 (10:54 +0800)]
ceph.spec.in,debian/rules: change aio-max-nr to 1048576
When there are more than 14 OSDs on one host, they report an error that there are not enough aios.
The default aio-max-nr is 65536, each OSD needs 4096 aios, and other programs may use aios as well.
This patch changes aio-max-nr to 1048576 when the ceph-osd rpm or debian package is installed.
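The arithmetic behind the limit can be sketched as follows. The figures 65536, 4096 and 1048576 come from the message above; the helper name and the `reserved_for_others` parameter are hypothetical and only illustrate the calculation.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Ceph): how many OSDs fit under a given
// aio-max-nr, assuming each OSD needs `per_osd` aios and other programs on
// the host already hold `reserved_for_others`.
constexpr std::uint64_t max_osds(std::uint64_t aio_max_nr,
                                 std::uint64_t per_osd,
                                 std::uint64_t reserved_for_others) {
  return aio_max_nr <= reserved_for_others
             ? 0
             : (aio_max_nr - reserved_for_others) / per_osd;
}

// With the old default, 16 OSDs is the ceiling even if nothing else uses aio.
static_assert(max_osds(65536, 4096, 0) == 16, "old default");
// With the new value there is room for 256 OSDs.
static_assert(max_osds(1048576, 4096, 0) == 256, "new value");
```

Any aio usage by other programs lowers the first ceiling further, which is consistent with errors appearing once a host has more than 14 OSDs.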
Jason Dillaman [Fri, 20 Oct 2017 02:13:36 +0000 (22:13 -0400)]
common/common_init: disable ms subsystem log gathering for clients
The log gathering causes large performance degradation to clients
with high message throughputs. This is hopefully a short-term
workaround until per-message logging can be replaced with an
efficient data recording system for post-incident analysis
use-cases.
Fixes: http://tracker.ceph.com/issues/21860
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit a3a40413f7908b08c40dec4020034cca4a0c4798)
Sage Weil [Mon, 23 Oct 2017 22:11:59 +0000 (17:11 -0500)]
osd/PrimaryLogPG: clear DEGRADED at recovery completion even if more backfill
We may have log recovery *and* backfill to do, but cease to be degraded
as soon as the log recovery portion is done. If that's the case, clear
the DEGRADED bit so that the PG state is not misleading.
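A minimal sketch of the state change, assuming a simple bitmask for PG state. The flag names and the function below are hypothetical stand-ins, not the actual PrimaryLogPG code.

```cpp
#include <cstdint>

// Hypothetical state bits, for illustration only.
constexpr std::uint32_t STATE_DEGRADED = 1u << 0;
constexpr std::uint32_t STATE_BACKFILL_PENDING = 1u << 1;

// Called when log recovery finishes. If the PG is no longer degraded,
// drop the DEGRADED bit and leave every other bit (such as pending
// backfill) untouched.
constexpr std::uint32_t on_log_recovery_done(std::uint32_t state,
                                             bool still_degraded) {
  return still_degraded ? state : (state & ~STATE_DEGRADED);
}
```

The point is that remaining backfill work alone does not keep the DEGRADED bit set.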
Sage Weil [Fri, 20 Oct 2017 13:51:17 +0000 (08:51 -0500)]
os/bluestore/BlueFS: fix race with log flush during async log compaction
During async log compaction we rely on _flush_and_sync_log to update the
log_writer to jump_to. However, if racing threads are also trying to flush
the log and manage to flush our new log events for us, then our flush will
turn into a no-op, and we won't update jump_to correctly at all. This
results in a corrupted log size a bit later on.
Fix by ensuring that there are no in-progress flushes before we add our
log entries. Also, add asserts to _flush_and_sync_log to make sure we
never bail out early if jump_to is set (which would indicate this or
another similar bug is still present).
Sage Weil [Mon, 23 Oct 2017 03:46:00 +0000 (22:46 -0500)]
osd/PG: on recovery done, requeue for backfill
We were keeping our existing recovery reservation slot (with a high
priority) and going straight to waiting for backfill reservations on
the peers. This is a problem because the reserver thinks we're doing
high priority work when we're actually doing lower-priority backfill.
Fix by closing out our recovery reservation and going to the
WaitLocalBackfillReserved state, where we'll re-request backfill at the
appropriate priority.
Sage Weil [Thu, 19 Oct 2017 21:19:35 +0000 (16:19 -0500)]
buffer: remove list _mempool member
This broke the C++ ABI by changing the list structure size. Also, it's
not necessary as we can infer the mempool by looking at the other list
contents. We don't (currently) have a need to map an empty list to a
particular mempool and have that state stick.
Sage Weil [Wed, 6 Sep 2017 02:46:48 +0000 (22:46 -0400)]
mon/OSDMonitor: improve crush map validation
- move into OSDMap method
- ensure that rules exist for each pool
- ensure pool type matches rule type
- ensure rule mask min/max size cover the pool size
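The three checks listed above can be sketched roughly as below. The `Rule` and `Pool` structs, the `Check` enum and `validate_pool_rule` are simplified, hypothetical stand-ins for the real OSDMap and CRUSH types.

```cpp
// Hypothetical, simplified views of a CRUSH rule and a pool.
struct Rule {
  int type;      // rule type (e.g. replicated vs. erasure)
  int min_size;  // rule mask minimum size
  int max_size;  // rule mask maximum size
};

struct Pool {
  int type;  // pool type, expected to match the rule type
  int size;  // pool size (number of replicas or shards)
};

enum class Check { Ok, MissingRule, TypeMismatch, SizeOutOfRange };

// `rule` is nullptr when the pool references a rule that does not exist.
constexpr Check validate_pool_rule(const Pool& pool, const Rule* rule) {
  if (rule == nullptr)
    return Check::MissingRule;
  if (rule->type != pool.type)
    return Check::TypeMismatch;
  if (pool.size < rule->min_size || pool.size > rule->max_size)
    return Check::SizeOutOfRange;
  return Check::Ok;
}
```

A crush map would be rejected if any pool yields a result other than `Check::Ok`.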
John Spray [Fri, 25 Aug 2017 10:06:21 +0000 (11:06 +0100)]
mon: more forceful renumbering of legacy ruleset IDs
Previously, the rules were only modified in the trivial case,
so we continued to potentially have CRUSH maps with the
legacy ruleset functionality in use.
In order to ultimately remove rulesets entirely, we need
to do this more aggressively, renumbering all the rules
and then updating any pools as needed.
Conflicts:
src/mon/OSDMonitor.cc: the check for multiple rules was removed
in master, but not in luminous. Once we renumber the legacy ruleset IDs,
there is no need to check for, or warn the user about, a 1-to-n
mapping from ruleset to rule IDs.
Sage Weil [Sat, 21 Oct 2017 03:32:33 +0000 (22:32 -0500)]
messages/MOSDMap: do compat reencode of crush map, too
If we are reencoding an incremental, and it embeds a crush map, we need
to reencode that in a compatible way too. This is especially true now
because we have the compat crush weight-sets. Otherwise, a client may
learn the crush map through an incremental but not understand choose_args,
and not see/understand the alternate weight set. It will send requests
to the wrong OSDs where they will just get dropped.
Sage Weil [Tue, 19 Sep 2017 22:25:56 +0000 (18:25 -0400)]
osd/OSDMap: ignore xinfo if features == 0
Some old bug (e.g., http://tracker.ceph.com/issues/20751) could
result in an UP+EXISTS osd having features==0. If that happens,
we shouldn't crash the mon, which (reasonably) does:
    if (osdmap.get_epoch()) {
      if (osdmap.get_num_up_osds() > 0) {
        assert(osdmap.get_up_osd_features() & CEPH_FEATURE_MON_STATEFUL_SUB);
        check_subs();
      }
    }
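A minimal sketch of the idea, not the actual OSDMap code: when computing the feature bits common to all up OSDs, entries whose features are 0 are skipped so that a single bogus xinfo cannot zero out the result. The function name and the flat array input are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: intersect the feature bits of all up OSDs,
// ignoring entries with features == 0 (treated as bogus xinfo).
// Returns 0 only when no OSD reported usable features.
constexpr std::uint64_t common_up_features(const std::uint64_t* features,
                                           std::size_t count) {
  std::uint64_t common = 0;
  bool first = true;
  for (std::size_t i = 0; i < count; ++i) {
    if (features[i] == 0)
      continue;  // skip bogus entry instead of clearing every bit
    if (first) {
      common = features[i];
      first = false;
    } else {
      common &= features[i];
    }
  }
  return common;
}
```

Without the skip, one features==0 entry would make the intersection 0 and trip the assert quoted above.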
Adam C. Emerson [Thu, 28 Sep 2017 17:54:32 +0000 (13:54 -0400)]
rgw: Check bucket Website operations in policy
Add code to check s3:GetBucketWebsite and s3:PutBucketWebsite
operations against bucket policy.
Fixes: http://tracker.ceph.com/issues/21597
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1493896
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit ceed535957ac186e241fcff26b103cf7efa959b1)
Adam C. Emerson [Wed, 27 Sep 2017 19:42:27 +0000 (15:42 -0400)]
rgw: Check bucket CORS operations in policy
Add code to check s3:GetCORS and s3:PutCORS operations against bucket
policy.
Fixes: http://tracker.ceph.com/issues/21578
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1494140
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 27eb13fe568cc802feaf69131a21db076bcb6746)
Adam C. Emerson [Wed, 27 Sep 2017 20:08:56 +0000 (16:08 -0400)]
rgw: Check bucket GetBucketLocation in policy
Add code to check s3:GetBucketLocation against bucket policy.
Fixes: http://tracker.ceph.com/issues/21582
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1493934
Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 79188d679edeb6e2f7ca852fdc4224368412cb72)
rgw: defer constructing keystone engine unless url is configured
Currently we create a keystone revocation thread even when the keystone url
is empty. Let's defer the construction of the keystone engine until the urls are
configured.
Casey Bodley [Thu, 5 Oct 2017 20:39:30 +0000 (16:39 -0400)]
rgw: RGWUser::init no longer overwrites user_id
If an admin op specifies a user_id and does not find a user with that
id, but does find a user based on a later field (email, access key,
etc.), RGWUser::user_id will be overwritten with the existing user's id.
When this happens on 'radosgw-admin user create', RGWUser::execute_add()
will modify that existing user, instead of trying to create a new user
with the given user_id (and failing due to the conflicting email,
access key, etc.).
By preserving the original user_id (when specified), this uid conflict
is detected in RGWUser::check_op() and a "user id mismatch" error is
returned.
Matt Benjamin [Tue, 3 Oct 2017 21:48:29 +0000 (17:48 -0400)]
rgw: release cls lock if taken in RGWCompleteMultipart
Follows Casey's proposal to conditionally release the lock in
::complete(), in order to avoid duplicated code in various early
return cases.
Fixes: http://tracker.ceph.com/issues/21596
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 704f793f08a02760d23eb5778b738bb07be0e7cf)
Enming Zhang [Fri, 25 Aug 2017 11:48:53 +0000 (19:48 +0800)]
rgw: encryption fix the issue when not provide encryption mode
Currently, in RGW, if someone wants to upload an object using server-side
encryption and provides a customer key or a KMS key id, but does not
specify the encryption mode in
"x-amz-server-side-encryption-customer-algorithm" or
"x-amz-server-side-encryption", the object will be uploaded
successfully without encryption.
This is not the correct way to handle it. It is better to
return an error.
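The intended behaviour could be sketched as a small validation step. The struct, enum and function names below are hypothetical and reduce the request to four booleans; this is not the actual RGW code.

```cpp
// Hypothetical summary of which encryption-related inputs a request carries.
struct SseRequest {
  bool has_customer_key;        // a customer-provided key was sent
  bool has_customer_algorithm;  // x-amz-server-side-encryption-customer-algorithm
  bool has_kms_key_id;          // a KMS key id was sent
  bool has_sse_mode;            // x-amz-server-side-encryption
};

enum class SseCheck { Ok, MissingCustomerAlgorithm, MissingSseMode };

// Reject requests that supply key material but omit the matching mode
// header, instead of silently storing the object unencrypted.
constexpr SseCheck check_sse_request(const SseRequest& req) {
  if (req.has_customer_key && !req.has_customer_algorithm)
    return SseCheck::MissingCustomerAlgorithm;
  if (req.has_kms_key_id && !req.has_sse_mode)
    return SseCheck::MissingSseMode;
  return SseCheck::Ok;
}
```

A request with no key material and no mode header still passes, since it simply asks for an unencrypted upload.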