Piotr Dałek [Fri, 16 Jun 2017 11:34:19 +0000 (13:34 +0200)]
messages/MOSDPing: optimize encode and decode of dummy payload
The dummy payload doesn't need to be processed: we can just skip over
it when decoding, and we can use a single bufferptr instead of an entire
bufferlist to encode it.
Piotr Dałek [Fri, 16 Jun 2017 11:10:36 +0000 (13:10 +0200)]
messages/MOSDPing: fix the inflation amount calculation
If the user specifies a min_message_size small enough (or zero to disable
it altogether), OSDs will crash and burn while trying to allocate
almost 4GB of payload (both min_message_size and payload.length() are
unsigned, so the subtraction wraps around to almost 4GB, and
MAX(4GB, 0) then picks 4GB).
If the size of dummy payload is 0, don't bother constructing bufferptr
and bufferlist, then encoding that.
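
The wrap-around described above can be reproduced in isolation. The following is a minimal sketch, not the actual MOSDPing code: the helper names padding_buggy and padding_fixed are hypothetical, and 32-bit unsigned sizes are assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not from the Ceph tree): compute how many bytes of
// dummy payload to append so that a message reaches min_size bytes.

// Buggy form: the unsigned subtraction wraps when payload_len > min_size,
// and taking the maximum with 0 cannot undo that.
inline uint32_t padding_buggy(uint32_t min_size, uint32_t payload_len) {
  return std::max<uint32_t>(min_size - payload_len, 0u);
}

// Guarded form: compare first, subtract only when the result cannot wrap.
inline uint32_t padding_fixed(uint32_t min_size, uint32_t payload_len) {
  return min_size > payload_len ? min_size - payload_len : 0u;
}

// Small self-check illustrating both behaviours.
inline void check_padding() {
  // min_size = 0 (feature disabled), existing payload of 100 bytes.
  assert(padding_buggy(0u, 100u) == 4294967196u);  // 2^32 - 100: almost 4GB
  assert(padding_fixed(0u, 100u) == 0u);           // nothing to add
  assert(padding_fixed(2000u, 100u) == 1900u);     // normal inflation
}
```

With the guarded form, a result of 0 also makes it easy to skip constructing the bufferptr and bufferlist entirely.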
Piotr Dałek [Fri, 16 Jun 2017 11:05:10 +0000 (13:05 +0200)]
OSD: mark two heartbeat config opts as observed
"osd heartbeat min size" and "osd heartbeat interval" can be changed
at runtime, because their values, when used, are always taken from
global Ceph configuration. Mark them as observed, so the message
the user sees once they're changed doesn't confuse them.
Greg Farnum [Mon, 5 Jun 2017 20:33:14 +0000 (13:33 -0700)]
osd: heartbeat with packets large enough to require working jumbo frames.
We get periodic reports that users somehow misconfigure one of their switches
so that it drops jumbo frames, yet the servers are still passing them along. In
that case, MOSDOp messages generally don't get through because they are much
larger than the 1500-byte non-jumbo limit, but the MOSDPing messages have kept
going (as they are very small and dispatched independently, even when the
server is willing to make jumbo frames). This means peer OSDs won't mark down
the ones behind the broken switch, despite all IO hanging.
Push the MOSDPing message size over the 1500-byte limit so that anybody in
this scenario will see the OSDs stuck behind a bad switch get marked down.
Conflicts:
src/messages/MOSDPing.h
- Changed HEAD_VERSION to 3 and kept COMPAT_VERSION to 1.
- In class MOSDPing, removed the following line:
if (header.version >= 2)
- This keeps ::decode(stamp, p) unconditional, because HEAD_VERSION
is already 3 now and this condition is removed in the backport commit.
Nathan Cutler [Sun, 25 Jun 2017 08:32:16 +0000 (10:32 +0200)]
tests: upgrade/hammer-x/v0-94-6-mon-overload: tweak packages list
Include some hammer dependencies that aren't in the jewel default packages
list, and exclude some java packages that may not be in the hammer repo and are
not needed for the upgrade test in any case.
N.B.: This cannot be cherry-picked from master because upgrade/hammer-x was
dropped in master.
Nathan Cutler [Tue, 27 Jun 2017 00:27:22 +0000 (02:27 +0200)]
tests: drop upgrade/hammer-jewel-x
This suite doesn't have any test logic in it. Its existence in the jewel branch
appears to be an oversight.
This cannot be cherry-picked from master because the upgrade/hammer-jewel-x
suite is present (and justified) in master and is not currently being dropped
there.
Nathan Cutler [Sun, 25 Jun 2017 08:27:58 +0000 (10:27 +0200)]
tests: upgrade/hammer-x/stress-split: tweak packages list
Include some hammer dependencies that aren't in the jewel default packages
list, and exclude some java packages that may not be in the hammer repo and are
not needed for the upgrade test in any case.
N.B.: This cannot be cherry-picked from master because upgrade/hammer-x was
dropped in master.
Nathan Cutler [Fri, 23 Jun 2017 06:27:42 +0000 (08:27 +0200)]
tests: move swift.py task to qa/tasks
In preparation for moving this task from ceph/teuthology.git into ceph/ceph.git.
The move is necessary because jewel-specific changes are needed, yet teuthology
does not maintain a separate branch for jewel. Also, swift.py is a
Ceph-specific task so it makes more sense to have it in Ceph.
Kefu Chai [Thu, 22 Jun 2017 00:06:43 +0000 (08:06 +0800)]
qa/workunits/rados/test-upgrade-*: whitelist tests the right way
The syntax is --gtest_filter=POSITIVE_PATTERNS[-NEGATIVE_PATTERNS], so we
cannot add multiple exclusive patterns using -pattern:-pattern. Instead,
we should use: -pattern:pattern
Signed-off-by: Kefu Chai <kchai@redhat.com>
Conflicts:
qa/workunits/rados/test-upgrade-v11.0.0.sh: this change is not
cherry-picked from master, because the clone-range op was removed
from master and is only supported in pre-luminous releases.
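
The filter syntax this commit relies on can be sketched as a small string builder. This is an illustration only: make_gtest_filter and join_patterns are hypothetical helpers, and the test names used below are placeholders rather than entries from the actual workunit scripts.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Joins patterns with ':' as googletest expects within one section.
inline std::string join_patterns(const std::vector<std::string>& v) {
  std::string out;
  for (size_t i = 0; i < v.size(); ++i) {
    if (i > 0) out += ":";
    out += v[i];
  }
  return out;
}

// Hypothetical helper: builds "POS1:POS2-NEG1:NEG2". Only one '-' is
// emitted; everything after it is treated as a negative pattern.
inline std::string make_gtest_filter(const std::vector<std::string>& positive,
                                     const std::vector<std::string>& negative) {
  std::string f = positive.empty() ? "*" : join_patterns(positive);
  if (!negative.empty()) {
    f += "-";
    f += join_patterns(negative);
  }
  return f;
}

inline void check_gtest_filter() {
  // Placeholder test names, for illustration only.
  std::vector<std::string> neg;
  neg.push_back("SuiteA.Test1");
  neg.push_back("SuiteB.Test2");
  assert(make_gtest_filter(std::vector<std::string>(), neg) ==
         "*-SuiteA.Test1:SuiteB.Test2");
}
```

The resulting string would be passed as the value of --gtest_filter; note that the second excluded pattern carries no leading '-'.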
Kefu Chai [Tue, 20 Jun 2017 11:49:14 +0000 (19:49 +0800)]
qa/workunits/rados/test-upgrade-*: whitelist tests for master
The jewel-x upgrade test now runs this script against a mixed cluster on
a machine with code from master installed. That means we have to
skip any new tests that will fail on a mixed cluster. CloneRange was
removed in 0d7b0b7.
Signed-off-by: Kefu Chai <kchai@redhat.com>
Conflicts:
qa/workunits/rados/test-upgrade-v11.0.0.sh: this change is not
cherry-picked from master, because the clone-range op was removed from
master and is only supported in pre-luminous releases.
After Locker::issue_new_caps() adds a new Capability data struct,
do not issue caps immediately. Let CInode::encode_inodestate()
do the job instead. This avoids various races, such as an early reply
not being allowed, or caps that haven't been sent to the client getting
revoked.
tests: rados: sleep before ceph tell osd.0 flush_pg_stats after restart
Even though we wait for HEALTH_OK after restarting the daemons, they are not
ready to respond to flush_pg_stats.
The reason the osd is not ready for the "tell" command after "ceph health"
shows that the cluster is "HEALTH_OK" is that the monitor is not notified
within "heartbeat_interval" that the osd in question is not up. Because
infernalis does not have the osd_fast_fail_on_connection_refused support,
the monitor needs longer to detect that an osd is down, and
osd_heartbeat_grace is used to determine if an osd is down.
Yan, Zheng [Fri, 19 May 2017 01:37:15 +0000 (09:37 +0800)]
client: update the 'approaching max_size' code
The old 'approaching max_size' code expects the MDS to set max_size to
'2 x reported_size'. This is no longer true. The new code reports the
file size when half of the previous max_size increment has been used.
Yan, Zheng [Wed, 17 May 2017 11:08:37 +0000 (19:08 +0800)]
mds: limit client writable range increment
For a very large file, setting the writable range to '2 * file_size'
causes file recovery to run for a long time. To recover a 1T file, Filer
needs to probe the 2T~1T range.
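
To make the arithmetic concrete, here is a minimal sketch that compares the doubling policy with a capped increment. The function names and the max_inc parameter are assumptions for illustration; the commit message does not state how the limit is configured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Policy described above: the writable range is twice the file size.
inline uint64_t max_size_doubling(uint64_t size) {
  return size * 2;
}

// Capped policy (hypothetical): never extend more than max_inc bytes
// beyond the current size, even if doubling would allow more.
inline uint64_t max_size_capped(uint64_t size, uint64_t max_inc) {
  return std::min(size * 2, size + max_inc);
}

inline void check_max_size() {
  const uint64_t GiB = 1ull << 30;
  const uint64_t TiB = 1ull << 40;
  // A 1T file gets a 2T writable range under plain doubling.
  assert(max_size_doubling(TiB) == 2 * TiB);
  // With an assumed 4G cap, the range past the file end shrinks to 4G.
  assert(max_size_capped(TiB, 4 * GiB) == TiB + 4 * GiB);
  // Small files are unaffected: doubling stays below the cap.
  assert(max_size_capped(GiB, 4 * GiB) == 2 * GiB);
}
```

Under these assumptions, the range that recovery has to probe beyond the end of a 1T file drops from 1T to 4G.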
Journaler::_do_flush() can skip flushing some data when the prezeroed
journal space isn't enough. Before updating Journaler::next_safe_pos,
we need to check whether Journaler::_do_flush() has flushed enough data.
Conflicts:
src/osdc/Journaler.cc - 8d4f6b92cba is not being backported to jewel
src/osdc/Journaler.h - Journaler::Journaler initializer list is different in jewel, compared to master
Brad Hubbard [Mon, 22 May 2017 03:21:25 +0000 (13:21 +1000)]
osd: Move scrub sleep timer to osdservice
PR 14886 erroneously creates a scrub sleep timer for every pg resulting
in a proliferation of threads. Move the timer to the osd service so
there can be only one.
Conflicts:
qa/tasks/cephfs/test_data_scan.py: difference in the
self._mount.run_shell() call in NonDefaultLayout::write (which is
being dropped by this commit) - in jewel it has "sudo", and in
master it doesn't
John Spray [Wed, 15 Mar 2017 17:51:44 +0000 (17:51 +0000)]
client: _getattr on quota_root before using in statfs
...so that after someone adjusts the quota settings
on an inode that another client is using as its mount root,
the change is visible immediately on the other client.
Casey Bodley [Fri, 5 May 2017 18:56:40 +0000 (14:56 -0400)]
cls/rgw: list_plain_entries() stops before bi_log entries
list_plain_entries() was using encode_obj_versioned_data_key() to set
its end_key, which gives a prefix of BI_BUCKET_OBJ_INSTANCE_INDEX[=2].
That range between start_key and end_key would span not only the
BI_BUCKET_OBJS_INDEX[=0] prefixes, but the BI_BUCKET_LOG_INDEX[=1]
prefixes as well. This can result in list_plain_entries() trying and
failing to decode a rgw_bi_log_entry as a rgw_bucket_dir_entry.
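
The range problem can be illustrated with an ordered map. This is a sketch under a simplified, hypothetical key layout (the first character stands for the index category: '0' plain objects, '1' log entries, '2' object instances); it does not reproduce the real cls/rgw key encoding.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> index_t;

// Returns all keys in the half-open range [start, end).
inline std::vector<std::string> scan(const index_t& idx,
                                     const std::string& start,
                                     const std::string& end) {
  std::vector<std::string> out;
  for (index_t::const_iterator it = idx.lower_bound(start);
       it != idx.end() && it->first < end; ++it) {
    out.push_back(it->first);
  }
  return out;
}

// Hypothetical sample index with one log entry between the two categories.
inline index_t sample_index() {
  index_t idx;
  idx["0_obj_a"] = "dir entry";
  idx["0_obj_b"] = "dir entry";
  idx["1_log_0001"] = "log entry";
  idx["2_obj_a_v1"] = "instance entry";
  return idx;
}

inline void check_scan() {
  index_t idx = sample_index();
  // End key at category 2: the log entry is swept into the listing.
  assert(scan(idx, "0", "2").size() == 3);
  // End key at category 1: the listing stops before the log entries.
  assert(scan(idx, "0", "1").size() == 2);
}
```

In the first scan the third key is a log entry, which is the kind of record a caller expecting directory entries would fail to decode.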
Sage Weil [Tue, 30 May 2017 13:58:09 +0000 (09:58 -0400)]
qa/workunits/rados/test-upgrade-*: whitelist tests for master
The jewel-x upgrade test now runs this script against a mixed cluster on
a machine with code from master installed. That means we have to skip
any new tests that will fail on a mixed cluster.
Karol Mroz [Thu, 17 Mar 2016 09:32:14 +0000 (10:32 +0100)]
rgw: rest and http client code to use param vectors
Replaces param/header lists with vectors. In these cases, we're only ever
adding to the back of the list, so a vector should be more efficient.
Also moves param_pair_t/param_vec_t higher up the include chain for
cleaner function signatures.
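
A minimal sketch of the append-only pattern follows. It assumes param_pair_t is a pair of strings and param_vec_t a vector of such pairs; add_param is a hypothetical helper, not a function from the rgw code.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Assumed definitions, based only on the type names in the commit message.
typedef std::pair<std::string, std::string> param_pair_t;
typedef std::vector<param_pair_t> param_vec_t;

// Hypothetical helper: parameters are only ever appended at the back,
// which a vector handles in amortized constant time with contiguous storage.
inline void add_param(param_vec_t& params,
                      const std::string& key,
                      const std::string& value) {
  params.push_back(param_pair_t(key, value));
}

inline void check_params() {
  param_vec_t params;
  add_param(params, "format", "json");
  add_param(params, "marker", "abc");
  assert(params.size() == 2);
  assert(params.back().first == "marker");
}
```

Iteration order matches insertion order, so callers that walked the old list front to back see the same sequence.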