Ali Maredia [Mon, 25 Nov 2019 02:30:03 +0000 (21:30 -0500)]
luminous: update s3-test download code for s3-test tasks
- Ensure the download code for all tasks running
s3-tests is consistent.
- Simplify download code to only use the config
variable 'force-branch' for the branch being
cloned.
- Make ceph-luminous the force-branch for all
suites using s3-tests.
- Add force-branch to suites running s3readwrite
& s3roundtrip tasks
osd/MissingLoc.cc: do not rely on missing_loc_sources only
In 624ade487ea4aeaf988cc1767e0b293f76addd5b, we relied on missing_loc_sources
to check for strays and remove an OSD from missing_loc. However, it is
possible that missing_loc_sources is empty while there are still OSDs
present in missing_loc. Since the aim is to just remove a stray OSD from
missing_loc, we do not need to rely on missing_loc_sources. We still
clean missing_loc_sources if any stray is present in it.
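A minimal sketch of the idea, not the actual C++ in src/osd: the stray is removed by walking missing_loc directly, and missing_loc_sources is cleaned as a side step rather than used as the gate. The function name, the dict-of-sets layout, and the string object ids are assumptions made for illustration.

```python
from typing import Dict, Set


def remove_stray_osd(missing_loc: Dict[str, Set[int]],
                     missing_loc_sources: Set[int],
                     stray: int) -> bool:
    """Drop `stray` from every location set; return True if anything changed.

    Hypothetical model: missing_loc maps an object id to the set of OSDs
    that hold a copy; missing_loc_sources is the set of OSDs seen as sources.
    """
    changed = False
    # Do not gate on missing_loc_sources: it may be empty while
    # missing_loc still references the stray OSD.
    for locations in missing_loc.values():
        if stray in locations:
            locations.discard(stray)
            changed = True
    # Still clean missing_loc_sources if the stray is present in it.
    if stray in missing_loc_sources:
        missing_loc_sources.discard(stray)
        changed = True
    return changed
```

With an empty missing_loc_sources, the call still removes the stray from missing_loc, which is the case the earlier change missed.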
xie xingguo [Sat, 31 Aug 2019 02:17:57 +0000 (10:17 +0800)]
osd/PG: fix _finish_recovery vs repair race
On detecting a corrupted object, primary may automatically
repair that object by leveraging the existing recovery procedure,
which turned out to be racy with a previous unfinished _finish_recovery
callback - the problem would then be that _finish_recovery might
continue to purge some strays that we still want to pull data from.
Fix by re-checking if there are any newly added missing objects when
executing _finish_recovery.
Note that before https://github.com/ceph/ceph/pull/29756 we might
instead have to call needs_recovery to catch the race condition
since we did not evict pg from clean state when triggering an auto-repair.
Conflicts:
src/osd/PG.cc
- adjusted if conditional for luminous
- did not add the comment nor state_clear(PG_STATE_REPAIR);. Those lines were
moved but don't exist in luminous.
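The race described in this commit can be pictured with a toy model, assuming a hypothetical ToyPG class that is not part of Ceph: the delayed callback re-checks for newly added missing objects before it purges strays.

```python
class ToyPG:
    """Hypothetical stand-in for a PG; only models missing objects and strays."""

    def __init__(self, strays):
        self.missing = set()       # objects still to be recovered
        self.strays = set(strays)  # OSDs we may still need to pull from

    def auto_repair(self, oid):
        # A corrupted object is queued for recovery again.
        self.missing.add(oid)

    def finish_recovery(self):
        # Re-check: if objects became missing after this callback was
        # queued, the strays may still be needed, so do not purge them.
        if self.missing:
            return False
        self.strays.clear()
        return True
```

If auto_repair() runs between queuing and executing finish_recovery(), the callback returns without purging, so the strays remain available as recovery sources.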
Neha Ojha [Sat, 31 Aug 2019 01:15:58 +0000 (18:15 -0700)]
osd/MissingLoc, PeeringState: remove osd from missing loc in purge_strays()
We should always try to keep osds in missing_loc consistent with peer_missing
and peer_info. When we remove an osd from peer_missing and peer_info, we
should also remove it from missing_loc during purging strays.
Conflicts:
src/osd/MissingLoc.cc
src/osd/MissingLoc.h
src/osd/PeeringState.cc
- these files do not exist in luminous; made the changes manually to
src/osd/PG.cc and src/osd/PG.h
- ldout(cct, ...) -> ldout(pg->cct, ...)
We should have done this while cherry-picking from master, but we
didn't. And here we are now. It's simpler to apply this one-off patch
than going back to the cherry-picking maze to adjust this one thing.
Conflicts:
src/pybind/mgr/telemetry/module.py
Due to missing context resulting from missing patches.
PendingReleaseNotes
Dropped to prevent conflicts in the future
Note:
This commit was heavily modified. We wanted to provide the number of
ipv4 and ipv6 monitors in the report, so we rewrote that part so we
can report on it; but we had to drop everything else (msgr1 and
msgr2), as well as 'min_mon_release'. Those do not exist in
luminous. In the end, the commit message itself is misleading, but
we are somehow (*shrug*) opting for leaving the commit as the original.
Additionally, we removed PendingReleaseNotes changes to prevent
conflicts in the future.
Matt Benjamin [Wed, 11 Dec 2019 22:52:57 +0000 (17:52 -0500)]
rgw: lc: continue past get_obj_state() failure
The get_obj_state() failure in particular could indicate a race with
an object being deleted, so likely is non-fatal. By returning, lifecycle
processing for the current bi-shard would not resume until re-scheduled,
likely in 24 hours.
Fixes: https://tracker.ceph.com/issues/43269
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
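The change in control flow can be sketched as follows. This is an illustration only: process_shard, the get_obj_state callable, and the use of LookupError to signal a failed lookup are assumptions, not the RGW lifecycle code.

```python
def process_shard(names, get_obj_state):
    """Process every object in a shard, skipping those whose state lookup fails."""
    processed, skipped = [], []
    for name in names:
        try:
            get_obj_state(name)
        except LookupError:
            # Likely a race with a concurrent delete: treat it as non-fatal
            # and continue instead of returning from the whole shard.
            skipped.append(name)
            continue
        processed.append(name)
    return processed, skipped
```

Returning at the first failure would leave the remaining objects in the shard unprocessed until the next scheduled run; continuing limits the damage to the one object that raced.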
Conflicts:
src/osd/OSD.cc
src/osd/OSD.h
src/osd/PrimaryLogPG.h
- no OSD::get_osd_delete_sleep() in luminous, no OSD::get_recovery_max_active()
in luminous
- use cct->_conf->get_val instead of cct->_conf.get_val
Sage Weil [Mon, 28 Jan 2019 20:58:26 +0000 (14:58 -0600)]
osd: refuse to start if release > recorded min_osd_release + 2
If we try to start up the objectstore, we may make writeable changes to
(say) rocksdb that are not backwards compatible. This happens, for
example, if you start a mimic osd. Even if the compatset checks fail,
rocksdb may have written something that is not backwards compatible.
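The gate in the subject line can be expressed as a single comparison. The sketch below assumes releases are compared by their major version numbers (luminous 12, mimic 13, nautilus 14, octopus 15); the function and constant names are hypothetical.

```python
# Major version numbers of some Ceph releases.
RELEASES = {"luminous": 12, "mimic": 13, "nautilus": 14, "octopus": 15}

MAX_RELEASE_SKIP = 2  # hypothetical name for the "+ 2" in the subject line


def may_start_osd(running: str, recorded_min_osd_release: str) -> bool:
    """Return False when the running release is more than two ahead of the recorded one."""
    return RELEASES[running] <= RELEASES[recorded_min_osd_release] + MAX_RELEASE_SKIP
```

Under these assumptions, a store that recorded luminous may be opened by a nautilus OSD but not by an octopus one. The check has to run before the objectstore is opened, since opening it may already write incompatible data.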
Sage Weil [Mon, 28 Jan 2019 21:05:53 +0000 (15:05 -0600)]
osd: record require_osd_release in objectstore meta
Record the require_osd_release value from the OSDMap in the 'meta' portion
of the osd's metadata that can be accessed without actually mounting the
OSD. This will be useful as a safety gate to prevent you from mounting
an osd that is too new and may make incompatible changes to the store.
Conflicts:
doc/rados/operations/health-checks.rst
We don't have the crash module, hence neither its docs.
src/pybind/mgr/telemetry/module.py
Issues due to context
Conflicts:
src/pybind/mgr/telemetry/module.py
Slight conflicts due to past cherry-picks (or lack thereof)
Using set_config() instead of set_module_option()
Conflicts:
src/pybind/mgr/telemetry/module.py
Due to lack of 'crash' and 'devicehealth' modules, and a bit
on how we keep options (self.config[] vs class attributes)
Conflicts:
src/pybind/mgr/telemetry/module.py
Don't backport code related to the 'crash' module, and adjust
how we read option variables (luminous goes through a config
map, instead of master's that goes through class attributes)
Sage Weil [Mon, 29 Apr 2019 19:32:44 +0000 (14:32 -0500)]
mgr/telemetry: use cluster-provided timestamp unmolested
The cluster stamp is now ISO 8601; just use that.
(The isoformat() puts a : in the +hh:mm timezone offset, which is slightly
different than what Ceph does; just pass Ceph's value through for
consistency.)
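The difference mentioned above is easy to reproduce with the standard library. The sample timestamp and the helper names are made up for illustration; the colon-free form is produced here with strftime and %z only to show the contrast, and is not taken from the telemetry module.

```python
from datetime import datetime, timedelta, timezone


def python_iso(stamp: datetime) -> str:
    # datetime.isoformat() writes the UTC offset as +hh:mm (with a colon).
    return stamp.isoformat()


def offset_without_colon(stamp: datetime) -> str:
    # strftime's %z writes the UTC offset as +hhmm (no colon).
    return stamp.strftime("%Y-%m-%dT%H:%M:%S%z")


sample = datetime(2019, 4, 29, 14, 32, 44, tzinfo=timezone(timedelta(hours=-5)))
print(python_iso(sample))            # 2019-04-29T14:32:44-05:00
print(offset_without_colon(sample))  # 2019-04-29T14:32:44-0500
```

Passing the cluster-provided string through untouched avoids having two slightly different spellings of the same instant in one report.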
Conflicts:
src/pybind/mgr/telemetry/module.py
Past commit in master had introduced field types and a
'minimum' value for the interval. We concluded that the field
types commit does not affect the telemetry module in a
significant way to force us to backport it, and the minimum
value commit is introduced for the benefit of the dashboard
(which, in luminous, does not have control over telemetry)
Conflicts:
src/pybind/mgr/telemetry/module.py
mostly due to store_get/store_set not existing in luminous,
so we rely on config_get/config_set instead.
Conflicts:
src/pybind/mgr/telemetry/module.py
master no longer has 'telemetry selftest' due to some other
major changes that we did not backport, as they would require
too many changes that were not, in an obvious manner, relevant
for us.
Conflicts:
src/pybind/mgr/telemetry/module.py
We don't have some other scaffolding that exists on master,
and we are not cherry-picking it because it significantly
changes the module's code in a way that is not a clear
advantage for the telemetry module (in 'luminous' context)
mgr/telemetry: Add Ceph Telemetry module to send reports back to project
This Manager Module will send statistics and version information from
a Ceph cluster back to telemetry.ceph.com if the user has opted-in on sending
this information.
Additionally a user can tell that the information is allowed to be made
public which then allows other users to see this information.
Nathan Cutler [Tue, 3 Dec 2019 09:34:45 +0000 (10:34 +0100)]
ceph-detect-init: run tox tests on Python 2 only
Luminous is EOL upstream and 12.2.13 will be the last point release, so it's
pretty much a given that luminous will not be migrated to py3.
Luminous runs on OS versions where py3 is not a "first-class citizen": compiling
the mgr with a py3 subinterpreter on such systems would be "interesting"...
David Zafman [Tue, 3 Dec 2019 18:13:46 +0000 (10:13 -0800)]
test balancer: Backport specific fixes
Add "ceph balancer sleep" command to set balancer sleep_interval for testing
Remove unavailable "ceph balancer pool" part of testing
Remove setting of nonexistent osd_pool_default_pg_autoscale_mode
Improve balancer module log message
Fix log message test (no pg merging)
Pool balancing isn't grouped by rule, so results are different here
callers of get_python_path were not passing in a $1 parameter, so
ceph_lib was an empty string resulting in an invalid path to the built
cython modules. assume this is called from the `lib` parent directory.
pass path to the manager modules when starting ceph-mgr.
"make check" needs python-tox to run, but until 71ac2163831ffd764bf3da6a3efa76ef02e5e884
("ceph.spec: added dashboard_v2 development and runtime dependencies")
this was not explicitly declared in the spec file. It does not make sense to
backport that commit to luminous, though, because the dashboard itself was never
backported to luminous.
Matt Benjamin [Thu, 5 Sep 2019 15:38:56 +0000 (11:38 -0400)]
rgw: crypt: permit RGW-AUTO/default with SSE-S3 headers
Permit the existing logic for encryption by a global master key
to take effect when a client has requested AES256 server-side encryption
with S3 managed keys, as well as SSE-KMS.
Fixes: https://tracker.ceph.com/issues/41670
Signed-off-by: Matt Benjamin <mbenjamin@redhat.com>
(cherry picked from commit 80bffd9ae12f6b5846cf8efbffda71e9f921e18f)