Kefu Chai [Sun, 27 Jun 2021 01:32:10 +0000 (09:32 +0800)]
common/options: convert a millisecs opt to a chrono::milliseconds when parsing it
Option always parses a new string value and converts it to a value_t
before validating it. value_t is an alias of boost::variant<...>,
and we use "new_value < min" to tell whether the new_value is out of
bounds, where both "new_value" and "min" are instances of
value_t. so it is critical that these two values contain the same type
of value, otherwise boost::variant::operator< compares their type
indexes instead of their contents:
> Returns:
> If which() == rhs.which() then: content_this < content_rhs, where content_this is the content of *this and content_rhs is the content of rhs. Otherwise: which() < rhs.which().
where which() indicates which type of value is contained in the value_t.
before this change, instead of converting a new value of milliseconds to
std::chrono::milliseconds, we converted it to a uint64_t, whose index in
the value_t is 2, while the milliseconds value's index is 9. because
2 < 9, "new_value < min" evaluated to true even if "new_value" was 100
and "min" was 30.
after this change, the new value of a milliseconds option is converted
to std::chrono::milliseconds, so it is comparable with its min value and
max value.
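
The ordering rule quoted above can be illustrated outside of C++. The following is a minimal Python sketch, not the Ceph code: `Variant` is a hypothetical stand-in for value_t whose `which` field plays the role of boost::variant::which(), and only the indexes 2 and 9 are taken from the text above.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Any

# Indexes quoted in the commit message; everything else here is illustrative.
UINT64_INDEX = 2
MILLISECS_INDEX = 9


@dataclass(frozen=True)
class Variant:
    """Hypothetical stand-in for value_t (a boost::variant<...>)."""
    which: int      # index of the alternative currently held
    content: Any    # the held value

    def __lt__(self, rhs: "Variant") -> bool:
        # Same rule as the quoted documentation: compare contents only when
        # both sides hold the same alternative, otherwise compare indexes.
        if self.which == rhs.which:
            return self.content < rhs.content
        return self.which < rhs.which


def millisecs(n: int) -> Variant:
    return Variant(MILLISECS_INDEX, timedelta(milliseconds=n))


min_value = millisecs(30)
parsed_before = Variant(UINT64_INDEX, 100)  # old behavior: held as uint64_t
parsed_after = millisecs(100)               # new behavior: held as milliseconds

# True: the indexes differ, so 2 < 9 decides and the contents are ignored.
rejected_before = parsed_before < min_value
# False: same alternative on both sides, so 100ms is compared with 30ms.
rejected_after = parsed_after < min_value
```

With mismatched alternatives the bound check rejects 100 against a minimum of 30; once both sides hold milliseconds the contents are what gets compared.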
a minimal test is added to reproduce this issue.
the change that added millisecs support to Option was 29690a338ba4482d187e6036903e138437ae3bb4, which is not included in any
LTS branch, so there is no need to backport this fix.
Sage Weil [Sat, 26 Jun 2021 14:41:27 +0000 (10:41 -0400)]
Merge PR #41574 into master
* refs/pull/41574/head:
qa/tasks/vstart_runner: add LocalCluster.run
qa/tasks/cephfs/test_nfs: fiddle with sudo
mgr/nfs/export: some cleanup, minor refactoring
mgr/nfs/cluster: remove unused @cluster_setter
nfs/mgr: fix help message case
doc/cephfs/fs-nfs-export: add note about export update behavior
mgr/nfs: move user create/delete into helper
mgr/nfs: refactor _delete_user helper
mgr/nfs: refactor create_export_from_dict() helper
mgr/nfs: keep 'nfs export get' around for backward-compat
mgr/nfs: rename method
qa/tasks/cephfs/test_nfs: test new export via apply
doc/cephfs/fs-nfs-export: be consistent with cluster_id and _ vs -
mgr/nfs: addr -> client_addr for 'nfs export create ...'
mgr/nfs: fix tests
mgr/nfs: 'nfs export get' -> 'nfs export info'
mgr/nfs: binding -> pseudo_path
mgr/nfs: more revisions based on review
mgr/nfs: adjust NFSException errno arg
doc/cephfs: update 'nfs export {get,apply}' docs
mgr/nfs: merge FSExport back into ExportMgr
doc/radosgw/nfs: document mgr/nfs way to add/remove rgw exports
mgr/nfs: merge 'nfs export {update,import}' -> 'nfs export apply'
mgr/nfs: test export creation and list
mgr/nfs: test export_update (+ fixes)
mgr/nfs: test Export.validate(); several fixes
mgr/nfs: test that export <-> block+dict conversions go both ways
mgr/nfs: clean up test a bit
mgr/nfs/export: fix export validation
mgr/nfs/export: fix tests
mgr/nfs: handle option addr/client block in create_export()
mgr/nfs: allow multiple addrs for new exports
mgr/nfs: fix/finish rgw export
mgr/nfs/module: clusterid -> cluster_id
mgr/nfs/export: fix export_update_1 to type check
mgr/nfs/cluster: fix type error
mgr/nfs/export: wrap long lines
mgr/nfs: ExportMgr._delete_export only works for cephfs for now
mgr/nfs: Remove pool_ns from NFSCluster
mgr/nfs: Remove ExportMgr.rados_namespace
mgr/nfs: flake8
mgr/nfs: Add type checking
mgr/nfs: Add __eq__ method to Export
mgr/nfs: Add some compatibility to mgr/dashboard
mgr/nfs: Fix whitespace handling
mgr/nfs: Copy unit tests from mgr/dashboard
mgr/nfs: partially implement rgw export support
mgr/nfs: abstract FSAL; add RGWFSAL
mgr/nfs: refactor to merge 'update' and 'import' code
mgr/nfs: add 'nfs export import' command
mgr/nfs: refactor 'nfs export update' and export validation
mgr/nfs: fix _fetch_export to distinguish between clusters
mgr/nfs: move export ganesha conf translation into caller
mgr/nfs: name nfs cephfs client key 'nfs.{cluster_id}.{export_id}'
mgr/nfs: add --addr to 'nfs export create'
mgr/nfs: add --squash to 'nfs export create'
mgr/nfs/export_utils: include false but non-None items in config
vstart.sh: enable nfs module
mgr/cephadm: nfs: drop attr_expiration_time from top-level config
mgr/cephadm: remove Dir_Chunk = 0
Sage Weil [Tue, 22 Jun 2021 16:25:44 +0000 (12:25 -0400)]
mgr/nfs: move user create/delete into helper
- Do user create or delete via a helper
- Defer until after we have validated the Export (on create or update)
- Support updates to user_id, which is needed to keep the naming consistent
and to also support changing the bucket, since the user_id is derived
from that.
The SQL OR doesn't work because in the case that sample is passed,
_t2epoch(min_sample) is 0 and the 0 <= time portion of the expression
is always true.
Fixes: https://tracker.ceph.com/issues/51294
Signed-off-by: Sage Weil <sage@newdream.net>
handle_interruption can't really be validly used outside of
with_interruption_cond. Make private, and adjust is_interruption
to not require an instance.
with_interruption_cond needs to check the condition on the way in.
call_with_interruption_impl already has the required machinery, so
let's just use it and dispense with the other helpers.
Ronen Friedman [Wed, 23 Jun 2021 17:02:28 +0000 (20:02 +0300)]
osd/scrub: replace a ceph_assert() with a test
We are using two distinct conditions to decide whether a candidate PG is already being scrubbed. The OSD checks pgs_scrub_active(), while the PG asserts on the value of PG_STATE_FLAG.
There is a time window when PG_STATE_FLAG is set but is_scrub_active() is not yet set. is_reserving() covers most of that period, but the ceph_assert comes just before the is_reserving check.
Sage Weil [Sat, 19 Jun 2021 16:56:18 +0000 (12:56 -0400)]
mgr/telemetry: redact python crash dump in telemetry
Include the exception value in the crash dump, but redact it in telemetry.
That way the operator can see it (it's useful info!) but we don't risk
sharing identifying data via telemetry.
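
A minimal sketch of that split, assuming the crash dump is a plain dict. The key name "exception_value" and the helper redact_for_telemetry() are hypothetical, chosen only for illustration; the commit message does not name the field or the function that does the redaction.

```python
import copy

# Hypothetical key name: the commit message does not say which crash dump
# field carries the exception value.
SENSITIVE_KEYS = ("exception_value",)


def redact_for_telemetry(crash: dict) -> dict:
    """Return a copy of the crash dump that is safe to share."""
    report = copy.deepcopy(crash)  # the operator's local copy stays intact
    for key in SENSITIVE_KEYS:
        if key in report:
            report[key] = "<redacted>"
    return report


local_dump = {
    "mgr_module": "balancer",
    "mgr_python_exception": "RuntimeError",
    "exception_value": "test",
}
telemetry_dump = redact_for_telemetry(local_dump)
```

The local dump keeps the value for the operator, while the copy handed to telemetry carries only a placeholder.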
Sage Weil [Fri, 18 Jun 2021 21:02:40 +0000 (17:02 -0400)]
mgr: generate crash dump for python exceptions
Extend handle_pyerror() to generate a crash dump. Pass some additional
context through from the callers (including the ability to not generate
a crash dump in the CLI handler case).
Extra crash dump fields look like so:
"backtrace": [
" File \"/home/sage/src/ceph/src/pybind/mgr/balancer/module.py\", line 652, in serve\n self.ifail()",
" File \"/home/sage/src/ceph/src/pybind/mgr/balancer/module.py\", line 648, in ifail\n raise RuntimeError('test')",
],
"mgr_module": "balancer",
"mgr_module_caller": "PyModuleRunner::serve",
"mgr_python_exception": "RuntimeError",
Notably, the backtrace deliberately excludes the 'value' of the exception,
as that may leak identifying information about the system. Instead, we
only include the exception *type* and the portion of the traceback that
identifies the call path (where in the code we crashed).
Also note: a side-effect of this change is that module exceptions will
trigger cluster health warnings about daemon crashes.
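
The real change lives in handle_pyerror(); the Python sketch below only illustrates the idea of keeping the exception type and the call path while dropping the value. The helper crash_fields() is hypothetical. The field names are the ones shown in the example above.

```python
import traceback


def crash_fields(exc: BaseException, module: str, caller: str) -> dict:
    """Build the extra crash dump fields without the exception value."""
    # format_tb() renders only the frames (file, line, function, source
    # line); it never calls str(exc), so the value is not included.
    frames = traceback.format_tb(exc.__traceback__)
    return {
        "backtrace": [frame.rstrip("\n") for frame in frames],
        "mgr_module": module,
        "mgr_module_caller": caller,
        "mgr_python_exception": type(exc).__name__,
    }


def ifail(detail: str) -> None:
    raise RuntimeError(detail)


# A value that could identify the system if it were leaked.
detail = "host-a.example"

try:
    ifail(detail)
except RuntimeError as e:
    fields = crash_fields(e, "balancer", "PyModuleRunner::serve")
```

Note that a traceback still shows the source line of each frame, so a value written as a literal in the raise statement (as in the 'test' example above) remains visible; only values computed at run time are kept out.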
Sage Weil [Fri, 18 Jun 2021 20:58:32 +0000 (16:58 -0400)]
common/BackTrace: accept list of strings to ctor
This may seem a bit backwards: we take nice C++ list<string> and do the
C dance. It's a bit defensive: this class is used in the segv handler
(in the backtrace() and backtrace_symbols() path), so we want to minimize
the work we do on the heap in that case. (For the list<string> path,
we can do whatever we like.)