Kefu Chai [Tue, 11 Jun 2019 15:17:46 +0000 (23:17 +0800)]
qa: install python3-{cephfs,rados} instead of python34-*
We install the latest python-rpm-macros on all builders since
https://github.com/ceph/ceph-build/pull/1283 . Now that we build
python36-* after that change, we need to install python36-* instead of
python34-* for testing the python3 packages on CentOS/RHEL 7.
And after the change of 8ae1947, python36-* now "Provides" python3-*, so we
can just install python3-* to fulfill the requirement for testing the
python3 cephfs bindings.
Fixes: http://tracker.ceph.com/issues/39164
Signed-off-by: Kefu Chai <kchai@redhat.com>
Conflicts: this change is not cherry-picked from master because, in
master, we don't install python3 packages after 7e5c85b604.
(cherry picked from commit 6790821afc749b14b1ddac68a0889059419eebb3)
qa/tasks/ceph_deploy: install python3.6 instead of python3.4 for py3 tests
EPEL7 has switched over to python3.6 as the main python3, and we started
packaging python bindings for python3.6 with
https://github.com/ceph/ceph-build/pull/1283
rpm: add "Provides: python3-*" for python packages
So users can install python3-rados instead of python36-rados, without
specifying the minor version of python. Also, we should not break our
teuthology tests with this naming-scheme change; for instance, our
cephfs qa suite installs `python3-cephfs` for testing `cephfs-shell`.
Some of our centos7 jenkins builders are failing to build the ceph master
and nautilus branches, because EPEL7 recently switched from python3.4 to
python3.6 as the native python3. See
https://lists.fedoraproject.org/archives/list/epel-announce@lists.fedoraproject.org/message/EGUMKAIMPK2UD5VSHXM53BH2MBDGDWMO/
One of our BuildRequires, cmake3, is offered by EPEL7; it also followed
the python3.6 switch-over and was rebuilt against python3.6. As a result,
cmake3-data-3.13.4-2.el7 started to depend on /usr/bin/python3.6, which is
in turn offered by the python36 package, so python36 is installed as a
dependency of the updated cmake3. But our cmake originally checks for the
latest python3 interpreter when WITH_PYTHON3 is enabled, which is why
builders that happened to install these updated packages started to fail:
they detected python3.6 without the matching python3.6 related build
dependencies.
As a fix, in d1e83082,
python%{python3_pkgversion}-{devel,setuptools,Cython} are listed as
BuildRequires to reflect this change in EPEL7; before d1e83082, they were
hardwired to python34-*.
But as the following analysis shows, there are cases where `yum-builddep`
is inconsistent with `rpmbuild`, because `yum-builddep` changes how the
`python3_pkgversion` and `python3_version` macros are expanded:
- none of the packages installed by `yum-builddep` installs the python3
related rpm macros, so the system stays with whatever python3 it was
using. in this case, `rpmbuild` won't complain, as `python3_pkgversion`
and `python3_version` are consistent before and after `yum-builddep`.
- the system has python3.4 installed before `yum-builddep`, but
`yum-builddep` installed python3.6 and also the updated
`python-rpm-macros` packages, which point `python3_version` and
`python3_pkgversion` to 3.6 and 36 respectively. in this case,
`rpmbuild` will complain, because when we ran `yum-builddep`,
`python3_version` was still "3.4".
- the system does not have python3 installed before `yum-builddep`, so
it was using python34 for preparing the "BuildRequires". but some
of the packages installed by `yum-builddep` install python36, and
also the updated `python-rpm-macros` packages, which point
`python3_version` and `python3_pkgversion` to 3.6 and 36 respectively.
in this case, `rpmbuild` will complain, because the python36 related
dependencies are missing; what the system has are the python34
dependencies.
- the system does not have python3 installed before `yum-builddep`, so
it was using python34 for preparing the "BuildRequires". but some
of the packages installed by `yum-builddep` install python34, and
also the updated `python-rpm-macros` packages, which point
`python3_version` and `python3_pkgversion` to 3.4 and 34 respectively.
in this case, `rpmbuild` won't complain, as `python3_pkgversion`
and `python3_version` are again consistent before and after
`yum-builddep`.
Since we cannot tell whether the system has python3, or which python3
version it has, before `yum-builddep` runs, what we can do is ensure that
`rpmbuild` has what it needs to build Ceph. So let's just stick with
python3.6, and force cmake to use that python3 and its python3 modules
for building the python3 bindings.
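As a rough illustration of the four cases above, here is a small C++
sketch (not Ceph code; treating 3.4 as the pre-update macro default on
EPEL7 is an assumption of this sketch). It models when `rpmbuild`
complains: it compares the python3 version the "BuildRequires" were
resolved against before `yum-builddep` with the version the macros
expand to afterwards.

    #include <iostream>
    #include <optional>
    #include <string>

    // python3 version the rpm macros expand to; nullopt = no python3 installed
    using Py3 = std::optional<std::string>;

    // yum-builddep resolves BuildRequires against `before`; rpmbuild later
    // expands the macros against `after`. A mismatch makes rpmbuild complain.
    bool rpmbuild_complains(const Py3& before, const Py3& after) {
      const std::string b = before.value_or("3.4");  // stock macro default
      const std::string a = after.value_or(b);       // nothing new installed
      return b != a;
    }

    int main() {
      std::cout << rpmbuild_complains({}, {})            // case 1: ok       -> 0
                << rpmbuild_complains({"3.4"}, {"3.6"})  // case 2: complain -> 1
                << rpmbuild_complains({}, {"3.6"})       // case 3: complain -> 1
                << rpmbuild_complains({}, {"3.4"})       // case 4: ok       -> 0
                << '\n';
    }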
On the debian side, it's okay to continue using "-DWITH_PYTHON3=ON", as
- cmake does normalize "ON" to 3
- debian's cmake extension lives in /usr/lib/python3/dist-packages/,
not in a specific /usr/lib/python3.x/dist-packages directory
A user might have multiple python3 versions installed; some of them may
have all the dependencies installed and be good enough for building Ceph.
We should not always use the latest python3 installed in the system and
then complain about missing dependencies, even though the user has
installed all the python3 dependencies for an older python3.
Put another way: if a user only installs the Cython module for python3.4,
but she has both python3.6 and python3.4 on her system, we should not
force her to uninstall python3.6 in order to install Ceph.
This change also aligns with MGR_PYTHON_VERSION. I am not applying the
same change to WITH_PYTHON2, because python2 is already stabilized, and
distros are not likely to ship new python2 releases.
Conflicts:
src/CMakeLists.txt: in luminous, WITH_PYTHON3 was "CHECK" by
default. As it's complicated to support this behavior, it is changed to
"ON" in this change to be consistent with mimic and up. Since we always
specify -DWITH_PYTHON3=ON explicitly when building rpm and deb packages,
this change is not visible to our CI or package builders.
rgw: orphans find: don't process stale bucket instances
As a large bucket might have been resharded multiple times, check the
current bucket info and ensure that no reshard is in progress before we
attempt to log bucket index entries. Since a large bucket would have
undergone resharding multiple times, this avoids wasteful processing of
stale bucket instance entries.
Conflicts:
src/rgw/rgw_orphan.cc
sys_objctx is dropped, as the obj_ctx changes are part of Nautilus.
Similarly with the includes: only `rgw_bucket.h` is included.
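A minimal sketch of the reshard/stale check described above (hypothetical
names, not the actual rgw_orphan.cc code): before logging index entries
for a bucket instance, re-read the current bucket info, wait out any
in-progress reshard, and skip instances that are no longer current.

    #include <string>

    struct BucketInfo {
      std::string bucket_instance_id;   // id of the *current* instance
      bool reshard_in_progress;
    };

    // hypothetical lookup of the current bucket info by bucket name
    BucketInfo get_current_bucket_info(const std::string& bucket_name);

    bool should_log_index_entries(const std::string& bucket_name,
                                  const std::string& instance_id) {
      const BucketInfo cur = get_current_bucket_info(bucket_name);
      if (cur.reshard_in_progress)
        return false;  // don't process while a reshard is still running
      // a stale instance left over from an earlier reshard is skipped
      return cur.bucket_instance_id == instance_id;
    }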
rgw: orphan: introduce a detailed mode (off by default)
We currently stat objects that fit in a head as well, and also log them.
Since we skip head objects anyway in the rados list output, this commit
avoids logging these objects if the object size itself is less than the
manifest head size. Additionally, we avoid the stat call itself on the
listed object when the object fits within the chunk size.
The old behaviour in both cases can be turned back on by setting the
detailed flag, which can be passed in from rgw-admin; this can help on
older clusters where the head used to have a different size. Avoiding
stat calls and not logging the head objects significantly reduces the IO
activity on clusters which have a huge percentage of objects that fit in
a head.
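A sketch of the skip logic (hypothetical names and structure, not the
actual orphans find code): in non-detailed mode, an object whose size
fits within the manifest head is neither stat'ed nor logged.

    #include <cstdint>
    #include <string>

    struct ListedObject {
      std::string name;
      uint64_t size;   // size reported by the rados list output
    };

    void stat_and_log(const ListedObject& obj);  // hypothetical: old full path

    void process_listed_object(const ListedObject& obj,
                               uint64_t manifest_head_size,
                               bool detailed) {
      if (!detailed && obj.size <= manifest_head_size) {
        // fits in the head: head objects are skipped later anyway,
        // so avoid both the stat call and the log entry
        return;
      }
      stat_and_log(obj);  // detailed mode keeps the old behaviour
    }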
rgw: orphans tool: align with rgw list bucket min readahead
At the rgw::rados layer we read up to `min readahead` entries anyway and
then pass on only the requested amount to the caller. Since this
translates down to a cls call requesting 1000 omap keys by default, it
makes sense not to waste those entries, and to process them.
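The idea in a few lines (a sketch; the constant name is made up and the
1000-entry default is taken from the message above): request at least
the readahead window that the lower layer fetches from cls anyway, so
none of those entries are wasted.

    #include <algorithm>

    // assumed default matching rgw's list-bucket min readahead of 1000
    constexpr int kMinReadahead = 1000;

    int entries_to_request(int requested) {
      // asking for fewer than kMinReadahead saves nothing: the cls call
      // fetches that many omap keys anyway, so process them all
      return std::max(requested, kMinReadahead);
    }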
Robin H. Johnson [Thu, 23 Aug 2018 17:57:24 +0000 (10:57 -0700)]
rgw: use chunked encoding to get partial results out faster
Some operations can take a long time to produce their complete result.
If an RGW operation does not set a content-length header, the RGW
frontends (CivetWeb, Beast) buffer the entire response so that a
Content-Length header can be sent.
If the RGW operation takes long enough, the buffering time may exceed
keepalive values, and because no bytes have been sent in the connection,
the connection will be reset.
If an HTTP response header contains neither Content-Length nor chunked
Transfer-Encoding, HTTP keep-alive is not possible.
To fix the issue within these requirements, use chunked
Transfer-Encoding for the following operations:
RGWCopyObj & RGWDeleteMultiObj specifically use send_partial_response
for long-running operations, and are the most impacted by this issue,
esp. for large inputs. RGWCopyObj attempts to send a Progress header
during the copy, but it's not actually passed on to the client until the
end of the copy, because it's buffered by the RGW frontends!
The HTTP/1.1 specification REQUIRES chunked encoding to be supported,
and the specification does NOT require "chunked" to be included in the
"TE" request header.
This patch has one side-effect: it causes many more small IP packets.
When combined with high-latency links, this can increase the apparent
deletion time due to round trips and TCP slow start. Future improvements
to the RGW frontends are possible in two separate but related ways:
- The FE could continue to push more chunks without waiting for the ACK
on the previous chunk, esp. while under the TCP window size.
- The FE could be patched for different buffer flushing behaviors, as
that behavior is presently unclear (packets of 200-500 bytes seen).
Performance results:
- Bucket with 5M objects, index sharded 32 ways.
- Index on SSD 3x replicas, Data on spinning disk, 5:2
- Multi-delete of 1000 keys, with a common prefix.
- Cache of index primed by listing the common prefix immediately before
deletion.
- Timing data captured at the RGW.
- Timing t0 is the TCP ACK sent by the RGW at the end of the response
body.
- Client is ~75ms away from RGW.
BEFORE:
Time to first byte of response header: 11.3 seconds.
Entire operation: 11.5 seconds.
Response packets: 17
AFTER:
Time to first byte of response header: 3.5ms
Entire operation: 16.36 seconds
Response packets: 206
Backport: mimic, luminous
Issue: http://tracker.ceph.com/issues/12713
Signed-off-by: Robin H. Johnson <rjohnson@digitalocean.com>
(cherry picked from commit d22c1f96707ba9ae84578932bd4d741f6c101a54)
mimic: rgw: civetweb: use poll instead of select while waiting on sockets
Non cherry-picked backport of 4d0035830e5783a828a275245b8bc3ae88edc417: the
commit hashes of the submodules are different in different upstream release
branches of ceph/civetweb, so we create this new commit, which directly
references the tip of the ceph-mimic branch instead of picking from
ceph-master, as ceph-master and ceph-mimic may diverge at a later stage.
The new civetweb 1.10 version in mimic and later is strict about control
characters being url encoded; we make civetweb's url validation more
relaxed and pass these through to rgw, where the requisite url validation
is done.
Introduces the following additions in rgw:
- allow_unicode_in_urls, introduced with a corresponding downstream commit
in civetweb: the newer version of civetweb validates that urls are url
encoded, which the swift tests do not follow, so we introduce this as a
configurable which we set to true
- mg header struct changes in the civetweb update; use auto here
- drop info->uri and use local_uri instead, as the former is deprecated
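Why poll instead of select, in a minimal POSIX sketch (generic code, not
the civetweb patch): select(2) is limited to FD_SETSIZE descriptors
(commonly 1024), so a frontend handling many sockets can overrun the
fd_set; poll(2) has no such limit.

    #include <poll.h>

    // wait until fd is readable, or timeout_ms elapses
    bool wait_readable(int fd, int timeout_ms) {
      struct pollfd pfd;
      pfd.fd = fd;
      pfd.events = POLLIN;
      int r = poll(&pfd, 1, timeout_ms);  // 0 = timeout, -1 = error
      return r > 0 && (pfd.revents & POLLIN);
    }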
Xinying Song [Wed, 15 Nov 2017 06:10:58 +0000 (14:10 +0800)]
rgw: send x-amz-version-id header when uploading files
To be compatible with aws s3, an x-amz-version-id header should be
returned. For atomic upload, RGWPutObj::version_id stores the version-id,
either generated randomly by rgw or read from the user. For multipart
upload, RGWCompleteMultipart::version_id stores the version-id, either
generated randomly by rgw or read from the user.
The function send_response() sends the 'x-amz-version-id' header when
version_id is not empty.
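A sketch of the response-side behaviour (the helper name below is
hypothetical, not rgw's actual API): the header is only emitted when a
version-id exists, whether rgw generated it or the user supplied it.

    #include <string>

    // hypothetical stand-in for rgw's header-dumping helper
    void dump_header(const std::string& name, const std::string& value);

    void maybe_send_version_id(const std::string& version_id) {
      if (!version_id.empty()) {
        dump_header("x-amz-version-id", version_id);
      }
    }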
mon/OSDMonitor: further improve prepare_command_pool_set E2BIG error message
d2c0fe9b5319a4404965c40ec92e291802ef30f6 improved this error message,
but it can be improved further by suggesting that the pg_num be increased in
smaller increments.
xie xingguo [Tue, 26 Mar 2019 07:02:02 +0000 (15:02 +0800)]
osd/PG: move down peers out from peer_purged
In purge_strays(), we aggressively clear stray_set and
add all related peers into peer_purged.
However, if the corresponding peer is down and comes
up again, having (unconditionally) added it to peer_purged
will prevent the primary from re-purging it.
(See Active::react(const MNotifyRec& notevt).)
On consuming a new osdmap, let's move any down peers out of
peer_purged at the same time. This way we lower the risk
of leaving leftover PGs behind.
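The gist of the fix, sketched with simplified types (the real code uses
pg_shard_t and the OSDMap): on each new osdmap, drop down peers from
peer_purged so they can be re-purged once they come back up.

    #include <set>

    struct OSDMapView {          // stand-in for the relevant OSDMap query
      bool is_up(int osd) const;
    };

    void prune_down_peers(std::set<int>& peer_purged, const OSDMapView& map) {
      for (auto it = peer_purged.begin(); it != peer_purged.end(); ) {
        if (!map.is_up(*it))
          it = peer_purged.erase(it);  // may need re-purging when it returns
        else
          ++it;
      }
    }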
xie xingguo [Tue, 26 Mar 2019 12:04:15 +0000 (20:04 +0800)]
osd/PG: introduce all_missing_unfound helper
We use pg_log.missing to track each peer's missing objects separately,
whereas missing_loc records the locations of all (probably existing) good
copies for both the primary's and the replicas' missing objects. Hence an
item from pg_log.missing and one from missing_loc have different meanings
and are not comparable.
During recovery, we can skip recovering the primary only if
- the primary is good, i.e., has no missing objects at all
- or all of the primary's missing objects exist in missing_loc and are
currently unfound
Obviously, the current "all missing objects are unfound" check is broken.
Fix this by introducing an independent all_missing_unfound helper that
correctly counts the missing objects that are currently unfound.
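The helper's intent, sketched with plain containers (the real code walks
pg_log.missing and missing_loc; string ids here stand in for hobject_t):
every one of the primary's missing objects must itself be unfound.

    #include <set>
    #include <string>

    bool all_missing_unfound(const std::set<std::string>& primary_missing,
                             const std::set<std::string>& unfound) {
      if (primary_missing.empty())
        return false;  // nothing missing: the "primary is good" case applies
      for (const auto& oid : primary_missing) {
        if (unfound.count(oid) == 0)
          return false;  // this object has a known good copy somewhere
      }
      return true;       // only now is skipping primary recovery safe
    }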
Neha Ojha [Wed, 6 Feb 2019 03:23:21 +0000 (19:23 -0800)]
osd/PGLog: should not rollback further than deleted object version
When a deleted object becomes a divergent entry in the pg log,
we should not be able to roll back to a version of the deleted
object that doesn't exist.
To avoid this, we need to preserve the original crt of the pg log
before we update it in rewind_from_head(), and use that to decide whether
we can roll back or not in _merge_object_divergent_entries().
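A sketch of the decision (simplified and assumed from the message above;
eversion stands in for Ceph's eversion_t): rollback information is only
retained for entries newer than the log's original crt, so a divergent
entry at or below that bound, such as one for a deleted object, must not
be rolled back.

    struct eversion {            // stand-in for eversion_t (epoch, version)
      unsigned epoch = 0;
      unsigned version = 0;
    };

    bool operator<(const eversion& a, const eversion& b) {
      return a.epoch != b.epoch ? a.epoch < b.epoch : a.version < b.version;
    }

    // original_crt is captured *before* rewind_from_head() advances it
    bool can_rollback(const eversion& original_crt, const eversion& entry) {
      return original_crt < entry;  // rollback data retained above crt only
    }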
Zengran Zhang [Wed, 27 Mar 2019 01:39:31 +0000 (09:39 +0800)]
osd: shutdown recovery_request_timer earlier
recovery_request_timer may hold some QueuePeeringEvts, which hold PGRefs;
if we don't shut it down earlier, it can potentially cause a PGRef leak
when kicking the pg.
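The hazard in miniature (generic C++, not the OSD code): queued timer
events keep strong references alive, so the timer must be shut down
before the reference counts are expected to drop.

    #include <functional>
    #include <memory>
    #include <vector>

    struct PG {};  // stand-in; the real events hold PGRef

    struct Timer {
      std::vector<std::function<void()>> pending;   // queued events
      void shutdown() { pending.clear(); }          // drops captured refs
    };

    int main() {
      auto pg = std::make_shared<PG>();
      Timer recovery_request_timer;
      recovery_request_timer.pending.push_back([pg] { /* peering event */ });
      recovery_request_timer.shutdown();  // without this, the lambda's copy
                                          // of pg would outlive the kick
      return pg.use_count() == 1 ? 0 : 1; // 0: no leaked reference
    }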
Jonas Jelten [Mon, 1 Apr 2019 10:28:09 +0000 (12:28 +0200)]
osd/PG: discover missing objects when an OSD peers and PG is degraded
When a PG is remapped from OSD `a` to OSD `b`, the objects are
backfilled. When OSD `a` is restarted, objects become degraded
as `a` is no longer queried or considered as a backfill source.
As the PG is degraded, `PG::discover_all_missing` is not called
when a candidate OSD peers with the primary: The PG is already
active, thus `PG::activate` (and in turn missing object discovery)
is not called. Discovery is also not initiated from
`PG::RecoveryState::Active::react(const MNotifyRec& notevt)`
as there are no unfound objects.
This patch adds a call to `discover_all_missing` when
an OSD sends its `MNotifyRec` message and the PG is degraded.
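The added condition, sketched with a simplified state (hypothetical
names; the real call site is Active::react(const MNotifyRec&)): degraded
PGs now trigger discovery on notify too, not just PGs with unfound
objects.

    struct PGStateView {
      bool is_degraded() const;
      bool have_unfound() const;
      void discover_all_missing();   // probe peers for missing objects
    };

    void on_notify(PGStateView& pg) {
      // previously only the unfound case initiated discovery here
      if (pg.have_unfound() || pg.is_degraded()) {
        pg.discover_all_missing();
      }
    }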
rgw: s3: awsv4 drop special handling for x-amz-credential
While the s3 docs mention that every byte must be urlencoded, aws is
relaxed in its implementation; when testing this behaviour against aws s3
itself, it seems to be relaxed in handling aws credentials of the form
Casey Bodley [Fri, 15 Mar 2019 20:47:19 +0000 (16:47 -0400)]
rgw: don't crash on missing /etc/mime.types
Lack of mime types is not a fatal error. When a Content-Type header
is not provided in swift's PutObj, rgw uses this mime type mapping
to guess a content type based on the object's suffix.
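A sketch of the non-fatal fallback (a plain map standing in for the
parsed /etc/mime.types, not the actual rgw code): if the file could not
be loaded, the map is simply empty and no content type is guessed,
rather than crashing.

    #include <map>
    #include <string>

    using MimeMap = std::map<std::string, std::string>;  // suffix -> type

    std::string guess_content_type(const MimeMap& mime_map,
                                   const std::string& object_name) {
      auto dot = object_name.rfind('.');
      if (dot == std::string::npos)
        return "";                  // no suffix: nothing to guess
      auto it = mime_map.find(object_name.substr(dot + 1));
      return it == mime_map.end() ? "" : it->second;  // empty map: no guess
    }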