add `event_loop` and `tkey` object to with_cephadm_module, and create MockEventLoopThread in fixtures.py to test async functions of ssh.py.
rewrite test_offline to be compatible with asyncssh
Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
mgr/cephadm: use _remote_connection (ssh.py), _execute_command, _check_execute_command in _run_cephadm
remove _get_connection from module.py and _remote_connection in serve.py, replacing with _remote_connection in ssh.py.
also, replace remoto.process.check with _execute_command and _check_execute_command in ssh.py
Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
mgr/cephadm: remove remotes.py, replace old _write_remote_file in serve.py with write_remote_file in ssh.py
remove remotes.py because it is specific to execnet/remoto.
_write_remote_file in ssh.py now fulfills the function of write_file in remotes.py and the old _write_remote_file in serve.py
Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
mgr/cephadm: create thread to start event loop for ssh.py, and return results of the async functions with get_result
The EventLoopThread class starts a thread and an event loop which runs forever. Coroutines are scheduled on the event loop by the `get_result` method which uses `run_coroutine_threadsafe` to return a concurrent.futures.Future, and ultimately the result with .result()
Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
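The event-loop thread described in the commit above can be sketched roughly as follows. This is a minimal illustration rather than the code from ssh.py: only the class name `EventLoopThread`, the method name `get_result`, and the use of `run_coroutine_threadsafe` come from the commit message. The `add` coroutine and the `_loop` attribute are hypothetical.

```python
import asyncio
import concurrent.futures
import threading
from typing import Any, Coroutine


class EventLoopThread(threading.Thread):
    """Runs an asyncio event loop forever on a background thread."""

    def __init__(self) -> None:
        super().__init__(daemon=True)
        self._loop = asyncio.new_event_loop()  # hypothetical attribute name
        self.start()

    def run(self) -> None:
        # Executed on the new thread: bind the loop and keep it running.
        asyncio.set_event_loop(self._loop)
        self._loop.run_forever()

    def get_result(self, coro: Coroutine[Any, Any, Any]) -> Any:
        # Schedule the coroutine on the loop thread; this returns a
        # concurrent.futures.Future, and .result() blocks the caller
        # until the coroutine has finished.
        future: concurrent.futures.Future = asyncio.run_coroutine_threadsafe(
            coro, self._loop)
        return future.result()


async def add(a: int, b: int) -> int:
    # Hypothetical coroutine standing in for an async ssh.py function.
    await asyncio.sleep(0)
    return a + b
```

Synchronous callers can then run `EventLoopThread().get_result(add(2, 3))` without owning an event loop themselves.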
mgr/cephadm: create async function _write_remote_file to write files on remote host
_write_remote_file uses _check_execute_command in ssh.py, which calls _execute_command, which uses shlex quote. Thus, any int argument in a command must be converted to a str, because shlex quote does not accept int objects
Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
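The int-to-str requirement from the commit above can be illustrated with a small sketch. The helper name `build_remote_command` is hypothetical, and the way the command string is assembled (a `sudo` prefix followed by the quoted arguments) is an assumption based on the commit descriptions in this log, not a copy of ssh.py.

```python
import shlex
from typing import List, Union


def build_remote_command(cmd: List[Union[str, int]]) -> str:
    # shlex.quote() only accepts str, so every argument is converted
    # with str() before it is shell-escaped.
    return 'sudo ' + ' '.join(shlex.quote(str(arg)) for arg in cmd)
```

For example, `build_remote_command(['mkdir', '-p', '/tmp/a b'])` quotes the path that contains a space, and `build_remote_command(['sleep', 5])` accepts the int because it is converted first.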
mgr/cephadm: execute commands run over ssh via asyncssh
_execute_command will run commands over ssh using the asyncssh `run` method: https://asyncssh.readthedocs.io/en/latest/api.html#asyncssh.SSHClientConnection.run
_check_execute_command will check the output of _execute_command and raise OrchestratorError if command fails on the remote host.
All commands run over ssh are prepended with sudo in `_execute_command` and shell-escaped with shlex quote.
If the cached ssh connection is closed or broken, the connection object is removed from the cache, the host is added to `offline_hosts`, and an OrchestratorError is raised. On the next call, an attempt is made to recreate the connection object.
Exceptions involving asyncssh methods should be handled; otherwise errors like `TypeError: __init__() missing 1 required positional argument: 'reason'` could occur due to the asyncssh error interacting with `raise_if_exception`
Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
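The cache handling described in the commit above can be sketched synchronously as follows. Only the names `cons`, `offline_hosts`, and `OrchestratorError` come from this log. The `ConnectionCache` class, the `connect` factory, the `FlakyConnection` test double, and the use of `ConnectionError` as the failure signal are hypothetical stand-ins, and clearing a host from `offline_hosts` after a successful command is an assumption. The real code is async and uses asyncssh.

```python
from typing import Any, Callable, Dict, Set


class OrchestratorError(Exception):
    """Stand-in for the orchestrator error type (redefined for this sketch)."""


class ConnectionCache:
    def __init__(self, connect: Callable[[str], Any]) -> None:
        self.cons: Dict[str, Any] = {}
        self.offline_hosts: Set[str] = set()
        self._connect = connect  # hypothetical connection factory

    def run(self, host: str, cmd: str) -> str:
        conn = self.cons.get(host)
        if conn is None:
            # Nothing cached (first use or evicted earlier): recreate it.
            conn = self.cons[host] = self._connect(host)
        try:
            out = conn.run(cmd)
        except ConnectionError as e:
            # Broken connection: evict it, mark the host offline, and raise.
            self.cons.pop(host, None)
            self.offline_hosts.add(host)
            raise OrchestratorError(f'Unable to reach host {host}: {e}') from e
        # Assumption: a successful command clears the offline marker.
        self.offline_hosts.discard(host)
        return out


class FlakyConnection:
    """Test double: the first run() call fails, later calls succeed."""
    failures_left = 1

    def run(self, cmd: str) -> str:
        if FlakyConnection.failures_left > 0:
            FlakyConnection.failures_left -= 1
            raise ConnectionError('connection lost')
        return f'ran: {cmd}'
```

The first call against a broken connection raises and evicts the cached object. The next call goes through the factory again, which matches the "recreated on the next call" behaviour described above.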
mgr/cephadm: create and cache asyncssh connection objects, and handle asyncssh connection errors
Create asyncssh connection object in async `_remote_connection` function and cache in `self.cons`
Create a handler for asyncssh log redirection and output ssh log if a connection error occurs
Disable asyncssh logger from propagating because the asyncssh info messages are verbose
Fixes: https://tracker.ceph.com/issues/44676 Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
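The log redirection described in the commit above can be approximated with the standard `logging` module. This sketch assumes the library logs under the logger name `asyncssh`. The buffer, the handler setup, and the helper `captured_ssh_log` are hypothetical and only show the idea of capturing the verbose messages so they can be printed when a connection error occurs.

```python
import logging
from io import StringIO

asyncssh_logger = logging.getLogger('asyncssh')
# Stop the verbose messages from reaching the root logger's handlers.
asyncssh_logger.propagate = False
asyncssh_logger.setLevel(logging.DEBUG)

# Redirect everything the logger emits into an in-memory buffer.
log_string = StringIO()
handler = logging.StreamHandler(log_string)
handler.setLevel(logging.DEBUG)
asyncssh_logger.addHandler(handler)


def captured_ssh_log() -> str:
    """Return the buffered ssh log and reset the buffer (hypothetical helper)."""
    handler.flush()
    text = log_string.getvalue()
    log_string.seek(0)
    log_string.truncate()
    return text
```

When a connection error is caught, the text returned by `captured_ssh_log()` can be appended to the error message, while normal operation stays quiet.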
Sage Weil [Wed, 11 Aug 2021 14:58:39 +0000 (10:58 -0400)]
Merge PR #42682 into master
* refs/pull/42682/head:
cephadm: no need to explicitly enable prometheus module
mgr/cephadm: enable prometheus module before deploying prometheus
mgr/cephadm: drop daemon_id arg to CephadmService.config()
doc/cephadm: no need to manually enable the prometheus module
Reviewed-by: Sebastian Wagner <sewagner@redhat.com>
Currently BlueStore keeps its allocation info inside RocksDB.
BlueStore is committing all allocation information (alloc/release) into RocksDB (column-family B) before the client Write is performed causing a delay in write path and adding significant load to the CPU/Memory/Disk.
Committing all state into RocksDB allows Ceph to survive failures without losing the allocation state.
The new code skips the RocksDB updates at allocation time and instead performs a full destage of the allocator object, with all the OSD allocation state, in a single step during umount().
This results in a 25% increase in IOPS and reduced latency in small random-write workloads, but exposes the system to losing allocation info in failure cases where we don't call umount.
We added code to perform a full allocation-map rebuild from information stored inside the ONode which is used in failure cases.
When we perform a graceful shutdown there is no need for recovery and we simply read the allocation-map from a flat file where the allocation-map was stored during umount() (in fact this mode is faster and shaves a few seconds from boot time, since reading a flat file is faster than iterating over RocksDB)
Open Issues:
There is a bug in the src/stop.sh script killing ceph without invoking umount() which means anyone using it will always invoke the recovery path.
Adam Kupczyk is fixing this issue in a separate PR.
A simple workaround is to add a call to 'killall -15 ceph-osd' before calling src/stop.sh
Fast-Shutdown and Ceph Suicide (done when the system underperforms) stop the system without a proper drain and a call to umount.
This will trigger a full recovery, which can be long (3 minutes in my testing, but your mileage may vary).
We plan on adding a follow up PR doing the following in Fast-Shutdown and Ceph Suicide:
Block the OSD queues from accepting any new request
Delete all items in queue which we didn't start yet
Drain all in-flight tasks
call umount (and destage the allocation-map)
If drain didn't complete within a predefined time-limit (say 3 minutes) -> kill the OSD Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
create allocator from on-disk onodes and BlueFS inodes
change allocator + add stat counters + report illegal physical-extents
compare allocator after rebuild from ONodes
prevent collection from being open twice
removed FSCK repo check for null-fm
Bug-Fix: don't add BlueFS allocation to shared allocator
add configuration option to commit to No-Column-B
Only invalidate allocation file after opening rocksdb in read-write mode
fix tests not to expect failure in cases inapplicable to null-allocator
accept non-existing allocation file and don't fail the invalidation as it could happen legally
don't commit to null-fm when db is opened in repair-mode
add a reverse mechanism from null_fm to real_fm (using RocksDB)
Using Ceph encode/decode, adding more info to header/trailer, add crc protection
Code cleanup
some changes requested by Adam (cleanup and style changes)
Signed-off-by: Gabriel Benhanokh <gbenhano@redhat.com>
Sage Weil [Thu, 5 Aug 2021 14:24:13 +0000 (10:24 -0400)]
mgr/cephadm: enable prometheus module before deploying prometheus
The mon will restart the mgr when the module is enabled, so we don't
really have to do anything here. The raise is there just in case the
mgr doesn't immediately get the new mgrmap and respawn, although there is
likely no harm done if we continue to deploy prometheus in the meantime,
even if we're interrupted partway through.
Kefu Chai [Wed, 11 Aug 2021 08:29:10 +0000 (16:29 +0800)]
script/run-make.sh: retry if dpkg was interrupted
there is a chance that apt-get is interrupted in the middle when a new PR
cancels the running jenkins job. the next job running apt-get or dpkg
would then run into issues like:
E: dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem.
Build step 'Execute shell' marked build as failure
Kefu Chai [Wed, 11 Aug 2021 09:32:20 +0000 (17:32 +0800)]
do_cmake.sh: do not set BOOST_J
do_cmake.sh is called by the configure() function in src/script/run-make.sh,
and src/script/run-make.sh also sets BOOST_J if it is not set. so we
can drop the code setting BOOST_J in do_cmake.sh.
this helps to silence the cmake warning like:
CMake Warning:
Manually-specified variables were not used by the project:
Sage Weil [Tue, 10 Aug 2021 20:47:34 +0000 (16:47 -0400)]
Merge PR #42318 into master
* refs/pull/42318/head:
mgr/rook: update DefaultFetcher device path to look at local and fix bug
mgr/rook: add node and PV name information to Device in DefaultFetcher
mgr/rook: fix typing errors in Fetcher classes
mgr/rook: create and use DefaultFetcher and LSOFetcher classes
mgr/rook: create KubernetesCustomResource class to fetch CRs
mgr/rook: fix device ls error handling
mgr/rook: change storage class module option name and default value
mgr/rook: fix typing errors related to storage_class_name and device ls
mgr/rook: make `device ls` only display pvs in specified storage class
mgr/rook: add StorageV1Api and storage_class_name to RookCluster
mgr/rook: add StorageV1Api to RookOrchestrator
mgr/rook: add mgr/rook/storage_class_name to ceph config
mgr/rook: ceph orch device ls fetch and display info about PVs
mgr/rook: add CustomObjectsApi to RookCluster
Reviewed-by: Juan Miguel Olmo <jolmomar@redhat.com>
Sage Weil [Tue, 10 Aug 2021 20:37:38 +0000 (16:37 -0400)]
Merge PR #42691 into master
* refs/pull/42691/head:
mgr/nfs: add --port to 'nfs cluster create' and port to 'nfs cluster info'
qa/suites/orch/cephadm/smoke-roleless: test taking ganeshas offline
qa/tasks/vip: exec with bash -ex
qa/suites/orch/cephadm: separate test_nfs from test_orch_cli
Sage Weil [Tue, 10 Aug 2021 14:36:37 +0000 (10:36 -0400)]
Merge PR #42680 into master
* refs/pull/42680/head:
src/pybind/mgr/nfs/tests: pass cluster_id to from_export_block()
src/pybind/mgr/nfs: remove `tag` option
src/pybind/mgr/nfs: remove per daemon config test
src/pybind/mgr/nfs: directly use cluster_id and remove daemon related stuff
script: run-cbt.sh tests crimson with CyanStore instead of MemStore.
These tests were always supposed to run against CyanStore. However,
commit e6ed65db8b4e0a2f8026c2e35a12dd292c5f2b8c (PR #42437) changed
the meaning of `--memstore` and introduced `--cyanstore` to be used
instead. This commit makes `run-cbt.sh` aware of the new switch.
This PR updates the text in the RADOS Guide
(the Ceph Storage Cluster Guide) that appears
at the beginning of the "Storage Devices"
chapter. I did the following:
- rewrote some of the sentences so that
they read more like written text than like
spoken language
- added "Ceph Manager" to the list of daemons
that a Ceph cluster comprises
- that's about it.
mgr/dashboard: rgw service creation form: add realm and zone to service spec.
Align rgw service id pattern with cephadm: https://github.com/ceph/ceph/pull/39877
- Update rgw pattern to allow service id for non-multisite config.
- Extract realm and zone from service id (when detected) and add them to the service spec.
Fixes: https://tracker.ceph.com/issues/44605 Signed-off-by: Alfonso Martínez <almartin@redhat.com>
mgr/dashboard: connect-rgw: rename to set-rgw-credentials; refactoring
- Rename the dashboard command to better reflect its behavior.
- Rename '_radosgw_admin' method to 'send_rgwadmin_command' for consistency with
'send_mon_command' and move it to the mgr_module.py .
- Cleanup: remove unneeded rgw settings.
- Better error handling and test coverage.
Fixes: https://tracker.ceph.com/issues/44605 Signed-off-by: Alfonso Martínez <almartin@redhat.com>
Alfonso Martínez [Wed, 28 Jul 2021 07:48:18 +0000 (09:48 +0200)]
mgr/dashboard: connect-rgw: adaptation and test coverage
- Align Dashboard with cephadm: configure credentials using the same logic.
- Fix: create a 'dashboard' user per realm (before: only on 1st realm).
- Lint fixes, test coverage, method renaming to better reflect behavior and method visibility.
Fixes: https://tracker.ceph.com/issues/44605 Signed-off-by: Alfonso Martínez <almartin@redhat.com>
Adam Kupczyk [Mon, 9 Aug 2021 13:59:46 +0000 (15:59 +0200)]
os/bluestore: Better handling of deferred write trigger
Now deferred write in _do_alloc_write does not depend on blob size,
but on size of extent allocated on disk.
It is now possible to set bluestore_prefer_deferred_size way larger than
bluestore_max_blob_size and still get desired behavior.
Example: for deferred=256K, blob=64K : when op write is 128K both blobs will be
written as deferred. When op write is 256K then all will go as regular write.
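The example in the commit above can be worked through with a small sketch. It assumes that the trigger is a strict less-than comparison between the size of the extent allocated on disk and bluestore_prefer_deferred_size, and that each op write is allocated as one contiguous extent of the op's size. The function name and both assumptions are illustrative and are not taken from _do_alloc_write.

```python
KIB = 1024


def is_deferred(allocated_extent_size: int, prefer_deferred_size: int) -> bool:
    # Assumption: the write is deferred only when the allocated extent is
    # strictly smaller than the preferred deferred size. Blob size plays
    # no part in the decision.
    return allocated_extent_size < prefer_deferred_size


PREFER_DEFERRED = 256 * KIB  # deferred=256K from the example
# blob=64K from the example is deliberately unused by the rule.

small_op = is_deferred(128 * KIB, PREFER_DEFERRED)  # 128K op write
large_op = is_deferred(256 * KIB, PREFER_DEFERRED)  # 256K op write
```

Under these assumptions the 128K op is deferred (both of its 64K blobs), while the 256K op goes out as a regular write, matching the example in the commit message.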
Sage Weil [Mon, 9 Aug 2021 18:15:28 +0000 (14:15 -0400)]
cephadm: fix container name detection
'enter' was broken because we weren't correctly identifying the container
name. Strip the newline from the inspect result so that we can reliably
match against the 'running' state.
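The newline problem described in the commit above can be shown in isolation. The helper name `is_running` is hypothetical, and the input strings only stand in for the output of the container inspect call; the actual cephadm code is not reproduced here.

```python
def is_running(inspect_output: str) -> bool:
    # The inspect result ends with a newline, so compare the stripped
    # value against the expected state.
    return inspect_output.strip() == 'running'


raw = 'running\n'                     # stand-in for the raw inspect result
naive_match = (raw == 'running')      # fails because of the trailing newline
fixed_match = is_running(raw)         # succeeds after stripping
```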