Yin Congmin [Wed, 30 Jun 2021 08:56:23 +0000 (16:56 +0800)]
common/buffer: fix SIGABRT in rebuild_aligned_size_and_memory
There is such a bl, which needs to satisfy two conditions:
1)all ptrs' length sum except last ptr is aligned with 4K;
2)the length of last ptr is 0.
This bl will cause stack corruption when calling
bufferlist::rebuild_aligned_size_and_memory().
Deal with this special scenario in rebuild_aligned_size_and_memory() to
solve the bug. And added a specialtest-case to reproduce this scenario.
Kefu Chai [Tue, 29 Jun 2021 12:06:49 +0000 (20:06 +0800)]
crimson/os/alienstore: use semaphore to manage tasks in thread pool
* implement std::counting_semaphore in C++17
* use the homebrew counting_semaphore as the synchronization primitive
to access the pending tasks, for better performance than the
implementation using mutex and condition_variable.
In C++ `std::moving` a `const`-qualified value yields a constant
r-value reference (`const T&&`) which won't be matched with a callable
taking non-constant r-value reference (like move constructors) but
can play with one taking a constant l-value reference (like copy
constructors do). This behaviour is surprising especially in lambas
where adding or removing the `mutable` specifier may lead to different
behaviour. The problem isn't obvious and it's easy to wrongly drop
the `mutable` druing a clean-up. Therefore introducing a tool for
developers to fail at compile-time if that happens seems desired.
Zac Dover [Tue, 29 Jun 2021 12:42:29 +0000 (22:42 +1000)]
doc/cephadm: improving "Monitoring the Upgrade"
This PR improves the section "Monitoring the Upgrade"
in the "Upgrading Ceph" chapter of the cephadm documentation.
This PR introduces a couple of section breaks with signposting
information in their titles, and rewrites some sentences in order
to reduce the cognitive load of the reader.
Kefu Chai [Tue, 22 Jun 2021 13:27:32 +0000 (21:27 +0800)]
crimson/os: use lockfree queue for sharded queue
each sharded queue has multiple producers and a single consumer queue,
but the producer side is in seastar world, so ideally, we should not
guard the queue using a POSIX lock.
in this change, the lockfree multiple-writer/multiple-reader queue is
used to replace the std::queue to drop the lock guarding it.
chunmei-liu [Tue, 29 Jun 2021 05:20:55 +0000 (22:20 -0700)]
crimson: fix pgnls exception
has_pg_op is always false, since m->ops is empty at that time.
so pgnls operation will go to process_op and report unknown operations.
move m->finish_decode ahead to fill m->ops.
Zac Dover [Tue, 29 Jun 2021 01:29:33 +0000 (11:29 +1000)]
doc/cephadm: improve "Upgrading Ceph" (main)
This PR makes a couple of minor improvements to the text under the
top-level section "Upgrading Ceph" in the "Upgrading Ceph" chapter of
the cephadm documentation.
This one, mercifully, contains only a couple of changes.
crimson/os: fix memory corruption in AlienStore::get_attrs().
`FuturizedStore` and `ObjectStore` use different memory layout for
conveying object attributes: map of `bufferlists` and map of `bptrs`
respectively. Unfortunately, `AlienStore` was trying to solve this
mismatch with just a `reinterpret_cast`.
Very likely this problem was the root cause behind the observed
crashes in `PGBackend::load_matadata` like the following one:
Kefu Chai [Sun, 27 Jun 2021 01:32:10 +0000 (09:32 +0800)]
common/options: convert a millisecs opt to a chrono::milliseconds when paring it
Option always parses a new string value and convert it to a value_t
before validating it. and value_t is an alias of boost::variant<...>.
and we use "new_value < min" to tell if the new_value is out of the
bound or not, where both "new_value" and "min" are instances of
value_t. so it is critcal that these two values contain the same type
of value, otherwise boost::variant::operator< would
> Returns:
> If which() == rhs.which() then: content_this < content_rhs, where content_this is the content of *this and content_rhs is the content of rhs. Otherwise: which() < rhs.which().
where which() indicates which type of value is contained in the value_t.
before this change, instead of converting a new value of milliseconds to
std::chrono::milliseconds, we convert it to an uint64_t, whose index in
the value_t is 2, while the milliseconds value's index is 9, so
"new_value < min" evaluates to true even if "new_value" is 100 and "min"
is 30.
after this change, the new value of a milliseconds option is converted
to std::chrono::milliseconds, so it is comparable with its min value and
max value.
a minimal test is added to reproduce this issue.
the change which added the support of millisec to option was 29690a338ba4482d187e6036903e138437ae3bb4 which is not included by any
LTS branches, so no need to backport this fix.
Patrick Donnelly [Sat, 26 Jun 2021 18:26:41 +0000 (11:26 -0700)]
libcephsqlite: register atexit handler for cleanup
We need to tear down the ceph vfs when sqlite3 (or other binaries) call
exit(). Otherwise global state gets destructed which can cause library
threads to segfault or raise exceptions.
Also pull vfs struct out of appdata. We need to be able to detect double
calls to the atexit handler which, sadly, can happen.
Fixes: https://tracker.ceph.com/issues/50503 Fixes: https://tracker.ceph.com/issues/51372 Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
Sage Weil [Sat, 26 Jun 2021 14:41:27 +0000 (10:41 -0400)]
Merge PR #41574 into master
* refs/pull/41574/head:
qa/tasks/vstart_runner: add LocalCluster.run
qa/tasks/cephfs/test_nfs: fiddle with sudo
mgr/nfs/export: some cleanup, minor refactoring
mgr/nfs/cluster: remove unused @cluster_setter
nfs/mgr: fix help message case
doc/cephfs/fs-nfs-export: add note about export update behavior
mgr/nfs: move user create/delete into helper
mgr/nfs: refactor _delete_user helper
mgr/nfs: refactor create_export_from_dict() helper
mgr/nfs: keep 'nfs export get' around for backward-compat
mgr/nfs: rename method
qa/tasks/cephfs/test_nfs: test new export via apply
doc/cephfs/fs-nfs-export: be consistent with cluster_id and _ vs -
mgr/nfs: addr -> client_addr for 'nfs export create ...'
mgr/nfs: fix tests
mgr/nfs: 'nfs export get' -> 'nfs export info'
mgr/nfs: binding -> pseudo_path
mgr/nfs: more revisions based on review
mgr/nfs: adjust NFSExceptoin errno arg
doc/cephfs: update 'nfs export {get,apply}' docs
mgr/nfs: merge FSExport back into ExportMgr
doc/radosgw/nfs: document mgr/nfs way to add/remove rgw exports
mgr/nfs: merge 'nfs export {update,import}' -> 'nfs export apply'
mgr/nfs: test export creation and list
mgr/nfs: test export_update (+ fixes)
mgr/nfs: test Export.validate(); several fixes
mgr/nfs: test that export <-> block+dict conversions go both ways
mgr/nfs: clean up test a bit
mgr/nfs/export: fix export validation
mgr/nfs/export: fix tests
mgr/nfs: handle option addr/client block in create_export()
mgr/nfs: allow multiple addrs for new exports
mgr/nfs: fix/finish rgw export
mgr/nfs/module: clusterid -> cluster_id
mgr/nfs/export: fix export_update_1 to type check
mgr/nfs/cluster: fix type error
mgr/nfs/export: wrap long lines
mgr/nfs: ExportMgr._delete_export only works for cephfs for now
mgr/nfs: Remove pool_ns from NFSCluster
mgr/nfs: Remove ExportMgr.rados_namespace
mgr/nfs: flake8
mgr/nfs: Add type checking
mgr/nfs: Add __eq__ method to Export
mgr/nfs: Add some compatibility to mgr/dashboard
mgr/nfs: Fix whitespace handling
mgr/nfs: Copy unit tests from mgr/dashboard
mgr/nfs: partially implement rgw export support
mgr/nfs: abstract FSAL; add RGWFSAL
mgr/nfs: refactor to merge 'update' and 'import' code
mgr/nfs: add 'nfs export import' command
mgr/nfs: refactor 'nfs export update' and export validation
mgr/nfs: fix _fetch_export to distinguish between clusters
mgr/nfs: move export ganesha conf translation into caller
mgr/nfs: name nfs cephfs client key 'nfs.{cluster_id}.{export_id}'
mgr/nfs: add --addr to 'nfs export create'
mgr/nfs: add --squash to 'nfs export create'
mgr/nfs/export_utils: include false but non-None items in config
vstart.sh: enable nfs module
mgr/cephadm: nfs: drop attr_expiration_time from top-level config
mgr/cephadm: remove Dir_Chunk = 0
Sage Weil [Tue, 22 Jun 2021 16:25:44 +0000 (12:25 -0400)]
mgr/nfs: move user create/delete into helper
- Do user create or delete via a helper
- Defer until after we have validated the Export (on create or update)
- Support updates to user_id, which is needed to keep the naming consistent
and to also support changing the bucket, since the user_id is derived
from that.
[24915s] [24848.843594] Out of memory: Killed process 16894 (cc1plus) total-vm:4293756kB, anon-rss:2970012kB, file-rss:0kB, shmem-rss:0kB, UID:399 pgtables:8324kB oom_score_adj:0
where 4GiB memory was allocated, in which 3GiB was mapped into
memory. this matches with our findings.
in this change, the memory per core is bumped up to 3000MB
in hope to address the OOB. the downside of this change is
that it would take even longer to finish the build if the
building host is limited in memory.
Kevin Zhao [Thu, 24 Jun 2021 00:00:03 +0000 (08:00 +0800)]
ceph.spec.in, debian/rules: Set rbd-rwl-cache optional on arm64 and ppc64le
set rwl cache option on arm64 and ppc64le as PMDK is not well supported.
Currently, only 64-bit Linux* and Windows* on x86 are supported PMDK
Reference:
1. Experimental support on Arm64, but lacking of librpmem:
See: https://github.com/pmem/pmdk#experimental-support-for-64-bit-arm
2. No RPM for PMDK on Arm64:
See: https://bugzilla.redhat.com/show_bug.cgi?id=1340635
3. > Does PMDK support ARM64*?
> Currently only 64-bit Linux* and Windows* on x86 are supported.
See: https://software.intel.com/content/www/us/en/develop/articles/persistent-memory-faq.html
4. Make check fail on Arm64
See: https://github.com/pmem/pmdk/issues/5255
Fixes: https://tracker.ceph.com/issues/51339 Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
The SQL OR doesn't work because in the case that sample is passed,
_t2epoch(min_sample) is 0 and the 0 <= time portion of the expression
is always true.
Fixes: https://tracker.ceph.com/issues/51294 Signed-off-by: Sage Weil <sage@newdream.net>