Kefu Chai [Wed, 7 Aug 2019 09:46:13 +0000 (17:46 +0800)]
admin/build-doc: use python3
to address https://github.com/sphinx-doc/sphinx/issues/3620, we need to
use a sphinx release that includes its fix at
https://github.com/sphinx-doc/sphinx/commit/e049f86b2de1cfdf8a74c88dc9593d047c85d5cb.
in other words, we need sphinx v2.0.0 or later. but sphinx 2.0
requires python >= 3.5, so we have to use python3 to build the
documents.
in this change:
* doc-requirements.txt: install python3 packages on debian derivatives
* build-doc: install python3.6 packages from EPEL7, and use a python3
venv to run sphinx2
* doc-requirements.txt: bump up all python packages to the latest
stable versions.
ceph-volume: do not fail when trying to remove crypt mapper
In a containerized context, we may at some point need to run `simple scan` on a device
from a separate container (not the running container that corresponds
to that device). This fails because `simple scan` tries to remove the crypt
mapper, which is still in use by the corresponding running OSD container.
This can be a bit more permissive and simply log a warning instead.
Jan Fajerski [Thu, 15 Aug 2019 10:20:00 +0000 (12:20 +0200)]
ceph-volume: don't keep device lists as sets
This was introduced by #27754. The explicit device lists were cast to
sets, but other parts of the code were not updated accordingly. To avoid
touching every call site, only cast to sets for the disjointness test and
keep lists otherwise.
Fixes: https://tracker.ceph.com/issues/41292
Signed-off-by: Jan Fajerski <jfajerski@suse.com>
(cherry picked from commit 0534cf188a671096d5ddb9d48cdae3dccc6c0b18)
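A minimal C++ sketch of the approach in the ceph-volume commit above, assuming device lists are plain vectors of path strings. ceph-volume itself is written in Python; `DeviceList` and `devices_disjoint` are hypothetical names. The lists keep their type and order everywhere, and sets are built only locally for the disjointness test:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical type: callers keep ordered lists of device paths.
using DeviceList = std::vector<std::string>;

// Build temporary sets only for the overlap check; the caller's lists
// are never converted or modified.
bool devices_disjoint(const DeviceList& a, const DeviceList& b) {
  const std::set<std::string> sa(a.begin(), a.end());
  const std::set<std::string> sb(b.begin(), b.end());
  std::vector<std::string> common;
  std::set_intersection(sa.begin(), sa.end(), sb.begin(), sb.end(),
                        std::back_inserter(common));
  return common.empty();
}
```

Keeping the conversion inside the one function that needs set semantics means no other code has to change its expectations about the list type.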
This caused problems in environments where stderr was redirected, since
a redirected stderr has its encoding set to None. Getting it back allows
everything to work correctly and keeps all the current unit tests
passing.
Kefu Chai [Thu, 30 May 2019 15:44:37 +0000 (23:44 +0800)]
qa/standalone/ceph-helpers: resurrect all OSD before waiting for health
address the regression introduced by e62cfceb.
in e62cfceb, we wanted to test the newly introduced TOO_FEW_OSDS
warning, so we increased the number of OSDs to the size of the pool;
if the number of OSDs is less than the pool size, the monitor sends a
warning message.
but we need to bring all OSDs back if we are expecting a healthy
cluster. in this change, all OSDs are resurrected before
`wait_for_health_ok`.
map<hobject_t, snapid_t>::iterator i = objects_blocked_on_degraded_snap.find(
oid.get_head()); // Access
...
}
Fixes: https://tracker.ceph.com/issues/41250
Signed-off-by: Tao Ning <ningtao@sangfor.com.cn>
(cherry picked from commit 86d55c1a0ddb48efc0c1934728d27f22cf49dfa1)
Conflicts: src/osd/PGBackend.h: `PrimaryLogPG` derives from
`PGBackend::Listener` in mimic, and it is the only class derived from
`PGBackend::Listener`, so we need to update `PGBackend::Listener` accordingly.
osd/PrimaryLogPG: update oi.size on write op implicitly truncating object up
See `BlueStore::_do_truncate`: BlueStore unconditionally resets the on-disk
object size to track the truncated size. Hence we must adjust the logical
size (and usage) accordingly so that the metrics match.
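A simplified C++ model of the size bookkeeping described above. `ObjectInfo`, `UsageStats`, `set_size_and_usage`, and `apply_truncating_write` are hypothetical stand-ins, not PrimaryLogPG code, and the sketch ignores details such as when the truncate is actually applied. It assumes, as the commit states, that the store resets the on-disk size to the truncate size unconditionally, so the logical size must follow even when that grows the object:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins for the object info and PG stats.
struct ObjectInfo {
  uint64_t size = 0;  // logical object size
};

struct UsageStats {
  int64_t num_bytes = 0;  // logical usage accounted to the PG
};

// Move the logical size to new_size and adjust usage by the difference.
void set_size_and_usage(UsageStats& stats, ObjectInfo& oi, uint64_t new_size) {
  stats.num_bytes += static_cast<int64_t>(new_size) - static_cast<int64_t>(oi.size);
  oi.size = new_size;
}

// A write that carries a truncate. The store resets the on-disk size to
// truncate_size even when that grows the object, so the logical size must
// follow before the write extent itself is applied.
void apply_truncating_write(UsageStats& stats, ObjectInfo& oi,
                            uint64_t truncate_size,
                            uint64_t offset, uint64_t length) {
  if (truncate_size != oi.size) {
    set_size_and_usage(stats, oi, truncate_size);  // also covers truncating up
  }
  const uint64_t end = offset + length;
  if (end > oi.size) {
    set_size_and_usage(stats, oi, end);  // write extends past the new size
  }
  assert(oi.size >= end);
}
```

For example, a 4096-byte object hit by a 100-byte write at offset 0 carrying a truncate size of 8192 ends up with a logical size (and usage) of 8192, matching what the store now holds on disk.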
install-deps.sh: install `python*-devel` for python*rpm-macros
in 087ea813, we installed '*rpm-macros' for the macros, so we can have
access to the latest python packaging related macros for preparing the
build dependencies.
but we could run into https://bugs.centos.org/view.php?id=16379 if
we already have an old version of python-devel installed, as the newer
version of python-rpm-macros conflicts with it.
it was a chicken-and-egg problem, as we don't know the exact names of
the *rpm-macros packages. that's why we chose to install all of them. but
we have to upgrade the existing python-devel package to resolve the
conflict, and since there is no python3-devel in RHEL7/CentOS7 (what
they have is python36-devel), we have to hardwire
`%{python3_pkgversion}` to "36" even before we have access to this
macro, and upgrade the python36-devel package beforehand. but this
renders installing the rpm-macros package less useful: we intend to
use the macro offered by the package to figure out "36".
as a workaround, we pretend that we know the "main" version of python3
in current RHEL/CentOS, and always install python36-devel along with
python-rpm-macros, as the former requires the latter.
once python3*-devel is upgraded on all builders, it will be safe
to install '*rpm-macros' again without installing python36-devel first.
by then, we could revert this change, or continue installing
python36-devel until the distro bumps up the "main" python version to 3.7.
Zengran Zhang [Tue, 20 Aug 2019 07:06:09 +0000 (15:06 +0800)]
osd: clear PG_STATE_CLEAN when repair object
there is a race: when we repair an object while the PG is clean,
we queue a DoRecovery peering event, but before that peering event is
dequeued, a snaptrim event on the missing object's snap may be dequeued.
it then gets past `context< SnapTrimmer >().can_trim()`
and tries to get the context of the missing object (snapdir).
we can avoid this by clearing the clean state when we find the object missing.
Conflicts:
src/osd/PrimaryLogPG.cc
- assert() instead of ceph_assert(), and the feature PR
https://github.com/ceph/ceph/pull/26942 ("Improvements to auto repair") is
not being backported
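A toy C++ model of the ordering fix in the commit above. `PGModel`, `STATE_ACTIVE`, `STATE_CLEAN`, and `on_missing_found_during_repair` are hypothetical, and `can_trim()` is reduced to a single clean-state check, which is an assumption rather than the real SnapTrimmer guard. The point is that clearing CLEAN at the moment the missing object is found closes the window before the queued DoRecovery event runs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical state bits; Ceph's real PG_STATE_* values differ.
constexpr uint64_t STATE_ACTIVE = 1ull << 0;
constexpr uint64_t STATE_CLEAN = 1ull << 1;

struct PGModel {
  uint64_t state = STATE_ACTIVE | STATE_CLEAN;
  bool recovery_queued = false;

  // Simplified guard: snap trimming only proceeds on a clean PG.
  bool can_trim() const { return (state & STATE_CLEAN) != 0; }

  // Repair found a missing object: clear CLEAN immediately rather than
  // waiting for the queued DoRecovery event to be dequeued, so a snaptrim
  // event dequeued in between is rejected by can_trim().
  void on_missing_found_during_repair() {
    state &= ~STATE_CLEAN;
    recovery_queued = true;  // stands in for queueing DoRecovery
  }
};
```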
Sage Weil [Mon, 19 Aug 2019 21:32:22 +0000 (16:32 -0500)]
osd/PeeringState: do not complain about past_intervals constrained by oldest epoch
The start of the required interval has a floor set by the oldest osdmap
epoch we have. That can lead to an invalid/empty required interval
(because the start is >= the end), but the PG may still have past
intervals. That can be caused by a slow PG deletion.
No need to complain about this harmless condition.
Conflicts:
src/osd/PeeringState.cc
- file does not exist in mimic; made the changes manually in src/osd/PG.cc
- mimic has a different way of getting the oldest osdmap
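The past_intervals condition above can be sketched in C++ as follows, under stated assumptions: `RequiredBounds`, `required_bounds`, and `should_complain` are hypothetical names, `raw_start` stands for whatever start epoch is computed before the floor, and the exact rule for when a complaint is warranted is an illustration rather than PeeringState's logic. The start is floored at the oldest osdmap epoch; if that floor alone makes the range empty, existing past intervals are treated as harmless:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

using epoch_t = uint32_t;

// Hypothetical bounds of the past-interval range that must be tracked.
struct RequiredBounds {
  epoch_t raw_start;  // start before the floor is applied
  epoch_t start;      // max(raw_start, oldest_map)
  epoch_t end;        // exclusive end of the range
  bool empty() const { return start >= end; }
};

RequiredBounds required_bounds(epoch_t raw_start, epoch_t end, epoch_t oldest_map) {
  return {raw_start, std::max(raw_start, oldest_map), end};
}

// Complain only if past intervals exist although the range was already
// empty before flooring; emptiness caused solely by the oldest-map floor
// (e.g. after a slow PG deletion) is harmless.
bool should_complain(const RequiredBounds& b, bool have_past_intervals) {
  if (!have_past_intervals || !b.empty()) return false;
  return b.raw_start >= b.end;
}
```

For instance, with `raw_start = 10`, `end = 20`, and the oldest map at epoch 25, the floored range [25, 20) is empty only because of the floor, so no complaint is raised.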
Kefu Chai [Mon, 19 Aug 2019 07:21:06 +0000 (15:21 +0800)]
cmake,run-make-check.sh,deb,rpm: disable SPDK by default
but we still enable it in `run-make-check.sh`
* cmake: disable SPDK by default
* run-make-check.sh: enable WITH_SPDK so at least we can ensure it
builds
* deb,rpm: add uuid-dev / libuuid-devel as a "make check" dependency
Conflicts:
CMakeLists.txt
ceph.spec.in
debian/control
run-make-check.sh
- disable SPDK for mimic, even in run-make-check.sh, since the feature is not
being used in production
Sage Weil [Tue, 2 Jul 2019 23:04:09 +0000 (18:04 -0500)]
mon/AuthMonitor: clear_secrets() in create_initial()
If we are creating the initial state and initial proposal, start with an
empty keyring. Specifically, we want to clear out any rotating secrets
from a previously failed paxos round so that the subsequent call to
check_rotate() will correctly populate the initial proposal with new
rotating keys. (When we don't do this, the leader mon will have the
keys from an earlier round in memory but no other mons will.)
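A minimal C++ sketch of why `clear_secrets()` must run before `check_rotate()` in the AuthMonitor commit above. `RotatingKeys` and `create_initial` here are hypothetical models, not Ceph's KeyServer or AuthMonitor API, and the assumption is that `check_rotate()` only adds keys for services that have none in memory. A stale key left over from a failed paxos round would therefore keep fresh keys out of the initial proposal:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical rotating-key store; not Ceph's KeyServer API.
struct RotatingKeys {
  std::map<std::string, uint64_t> keys;  // service name -> key generation

  void clear_secrets() { keys.clear(); }

  // Create a key for a service that has none. Returns true if a new key
  // was added (so the pending proposal must include it). A stale key
  // already in memory makes this a no-op.
  bool check_rotate(const std::string& service, uint64_t next_gen) {
    if (keys.count(service) != 0) return false;
    keys[service] = next_gen;
    return true;
  }
};

// Building the initial proposal: start from an empty keyring so that
// leftovers from a failed paxos round are replaced by fresh keys.
bool create_initial(RotatingKeys& rk, uint64_t next_gen) {
  rk.clear_secrets();
  return rk.check_rotate("osd", next_gen);
}
```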
Casey Bodley [Thu, 9 May 2019 19:57:36 +0000 (15:57 -0400)]
doc/rgw: document use of 'realm pull' instead of 'period pull'
'radosgw-admin period pull' fetches a period configuration, but does not
update the realm's current_period to use it. the 'realm pull' command
does both, and the difference is especially important in the failover
case.
Brad Hubbard [Thu, 22 Nov 2018 00:07:22 +0000 (10:07 +1000)]
install-deps.sh: Remove CR repo
Remove the continuous release (CR) repos for CentOS and Virtuozzo 7. They
should no longer be needed, since http://tracker.ceph.com/issues/13997 is
no longer relevant, and the newer versions of the selinux packages pulled in
by the build system cause problems on systems without the CR repos
enabled.
Dan van der Ster [Tue, 12 Mar 2019 15:42:25 +0000 (16:42 +0100)]
doc: describe metadata_heap cleanup
Fixes: http://tracker.ceph.com/issues/18174
Signed-off-by: Dan van der Ster <daniel.vanderster@cern.ch>
(cherry picked from commit e76604224c0e74bbb3350743910d263c6591fd26)
A correction to this commit was required because it was separated from a very
large pull request: it needed to include part of 4bc01379bbf946d2f5963dcca6b071914117ce4a,
which changed `endmap` from `const OSDMap&` to `const OSDMapRef&`.
Ilya Dryomov [Wed, 20 Feb 2019 21:30:29 +0000 (22:30 +0100)]
osdc/Objecter: invalidate crcs on preallocated rx buffers
Both simple and async messengers use c_str() when copying the data from
the socket into the receive buffer, going behind bufferlist's back. If
the receive buffer is preallocated, we need to invalidate its crc cache
by hand to avoid possible data crc mismatches on the client side.
Conflicts:
src/test/librados/io_cxx.cc
- In master, 3730d10623650ce8569be96b28cbba599a9a0db6 renamed this file from
src/test/librados/io.cc but that commit is not being backported to mimic.
Manually cherry-picked the test mods into src/test/librados/io.cc.
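The crc issue in the Objecter commit above can be illustrated with a small C++ model. `CachedCrcBuffer` and `receive_into` are hypothetical and are not Ceph's bufferlist API, and the checksum is a stand-in (FNV-1a), not the crc32c Ceph uses. Writing through a raw pointer, as the messengers do via `c_str()`, bypasses the cache, so the cached value must be invalidated by hand:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical buffer with a lazily cached checksum; not Ceph's bufferlist.
class CachedCrcBuffer {
 public:
  explicit CachedCrcBuffer(std::size_t n) : data_(n, 0) {}

  // Raw writable pointer, analogous to c_str(): writes through it bypass
  // the cache, so the cached value can silently go stale.
  char* raw() { return data_.data(); }

  uint32_t crc() {
    if (!cached_) cached_ = compute();
    return *cached_;
  }

  void invalidate_crc() { cached_.reset(); }

 private:
  // Stand-in checksum (FNV-1a), not the crc32c Ceph actually uses.
  uint32_t compute() const {
    uint32_t h = 2166136261u;
    for (char c : data_) {
      h ^= static_cast<unsigned char>(c);
      h *= 16777619u;
    }
    return h;
  }

  std::vector<char> data_;
  std::optional<uint32_t> cached_;
};

// Receive path into a preallocated buffer: copy behind the cache's back,
// then invalidate it by hand.
void receive_into(CachedCrcBuffer& buf, const char* src, std::size_t n) {
  std::memcpy(buf.raw(), src, n);
  buf.invalidate_crc();
}
```

Without the `invalidate_crc()` call, a crc computed on the preallocated buffer before the copy would be returned afterwards, which is the kind of client-side data crc mismatch the commit describes.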