Igor Fedotov [Tue, 31 Aug 2021 12:54:23 +0000 (15:54 +0300)]
os/bluestore: fix bluefs migrate command
After migrating the DB volume to the slow one, RocksDB still
needs to be given the db.slow path to properly access the relevant files under the db.slow subfolder.
Without that specification it tries to access them under the 'db' one, which
results in a "not-found" error.
Fixes: https://tracker.ceph.com/issues/40434
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
Kefu Chai [Thu, 19 Aug 2021 09:24:32 +0000 (17:24 +0800)]
common/options: validate see-also
y2c.py is like a compiler which translates .yaml to .cc and .h files;
it does not have access to all .yaml files. To detect dangling
see-also references, we need to do this with a "linker".
In this change, validate-options.py is introduced to check that every
option name referenced by a see-also property is valid.
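A minimal sketch of the "linker" idea, assuming the option definitions have already been parsed into plain dictionaries. The function name, the `see_also` key spelling, the file names, and the option names below are illustrative assumptions, not the contents of validate-options.py:

```python
from typing import Dict, List


def find_dangling_see_also(option_files: Dict[str, List[dict]]) -> List[str]:
    """Return one message per see_also entry that names no known option."""
    # Pass 1: collect every option name across all files (the "link" step).
    known = {opt["name"] for opts in option_files.values() for opt in opts}
    # Pass 2: check each reference against the combined set of names.
    errors = []
    for filename, opts in option_files.items():
        for opt in opts:
            for ref in opt.get("see_also", []):
                if ref not in known:
                    errors.append(f"{filename}: {opt['name']}: unknown see_also '{ref}'")
    return errors


# Hypothetical data standing in for two parsed .yaml files.
files = {
    "a.yaml": [{"name": "opt_a", "see_also": ["opt_b"]}],
    "b.yaml": [{"name": "opt_b", "see_also": ["opt_missing"]}],
}
dangling = find_dangling_see_also(files)
print(dangling)
```

The reference from opt_a to opt_b only resolves because both files are examined together, which is why a per-file translator cannot perform this check.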
haoyixing [Mon, 16 Aug 2021 10:55:24 +0000 (18:55 +0800)]
rbd: avoid overflow of ios and clarify io-size limit for bench
When doing rbd bench, we record the number of completed ios to print progress; currently that counter is unsigned.
Suppose we run a bench with io-size 512B and io-total 4T: that means a total of
8G ios, which overflows the counter.
We also don't support an io-size greater than 4G, so the help message is changed to say so.
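The arithmetic above can be checked with a short sketch. It assumes the unsigned counter is 32 bits wide, which the message does not state explicitly, and the constant names are illustrative:

```python
IO_SIZE = 512                  # bytes per I/O
IO_TOTAL = 4 * 2**40           # 4T of I/O in total
UINT32_MAX = 2**32 - 1         # assumed width of the unsigned counter

total_ios = IO_TOTAL // IO_SIZE        # 2**33, i.e. 8G I/Os
wrapped = total_ios & UINT32_MAX       # value a 32-bit counter would end up holding

print(total_ios, total_ios > UINT32_MAX, wrapped)
```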
Xuehan Xu [Thu, 19 Aug 2021 05:33:36 +0000 (13:33 +0800)]
crimson/common: keep ref count of crimson::interruptible::interrupt_cond
Currently, interrupt conditions are transferred between inner and outer continuation
chains via a tls interrupt_cond variable. This simple strategy leads to problems when
mixing normal future/continuation procedures with seastar::thread. When seastar::async()
is called, the reactor can directly invoke the passed functor, which leads to two different
scenarios:
1. if seastar::get/yield() is called inside the passed lambda, the interrupt_cond should be erased at the
end of the continuation execution when it is yielded back;
2. otherwise, the interrupt_cond should not be erased.
There can be so many possible sequences of yielding among several different fibers that we can hardly
judge at the end of the continuation execution whether there was a yield during the current
execution, which means we cannot know whether the tls interrupt_cond should be erased.
There could be other scenarios where the current strategy fails. To end this kind of issue
once and for all, we introduce ref-counting machinery.
J. Eric Ivancich [Wed, 28 Jul 2021 18:07:09 +0000 (14:07 -0400)]
rgw-multisite: metadata conflict not computed correctly
The former logic with a conditional based on `++i == 0` would never
execute. So this uses a boolean to differentiate the first from other
iterations and tries to clarify the code through commenting and an
explicit declaration. Additionally a warning is eliminated by
initializing a variable.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
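An illustrative analogue of the bug and the fix, written in Python rather than the C++ of the actual change. It assumes the counter started at zero, so the pre-incremented value can never equal zero; the function names are hypothetical:

```python
def first_item_broken(items):
    """Mirror of the old logic: the branch is unreachable."""
    i = 0
    hits = []
    for item in items:
        i += 1              # stands in for the pre-increment in `++i == 0`
        if i == 0:          # i is already >= 1 here, so this never holds
            hits.append(item)
    return hits


def first_item_fixed(items):
    """Use an explicit boolean to single out the first iteration."""
    first = True
    hits = []
    for item in items:
        if first:
            hits.append(item)
            first = False
    return hits


print(first_item_broken(["a", "b", "c"]), first_item_fixed(["a", "b", "c"]))
```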
add `event_loop` and `tkey` object to with_cephadm_module, and create MockEventLoopThread in fixtures.py to test async functions of ssh.py.
rewrite test_offline to be compatible with asyncssh
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
mgr/cephadm: use _remote_connection (ssh.py), _execute_command, _check_execute_command in _run_cephadm
remove _get_connection from module.py and _remote_connection in serve.py, replacing with _remote_connection in ssh.py.
also, replace remoto.process.check with _execute_command and _check_execute_command in ssh.py
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
mgr/cephadm: remove remotes.py, replace old _write_remote_file in serve.py with write_remote_file in ssh.py
remove remotes.py because it is specific to execnet/remoto.
_write_remote_file in ssh.py now fulfills the function of write_file in remotes.py and the old _write_remote_file in serve.py
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
mgr/cephadm: create thread to start event loop for ssh.py, and return results of the async functions with get_result
The EventLoopThread class starts a thread and an event loop which runs forever. Coroutines are scheduled on the event loop by the `get_result` method which uses `run_coroutine_threadsafe` to return a concurrent.futures.Future, and ultimately the result with .result()
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
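A minimal sketch of the pattern described in this commit, using only the standard library. The class and method names follow the message, but the body is an assumption and not the code in ssh.py; the `add` coroutine is a hypothetical stand-in for an async ssh call:

```python
import asyncio
from threading import Thread
from typing import Any, Coroutine


class EventLoopThread(Thread):
    """Own an event loop and run it forever on a background thread."""

    def __init__(self) -> None:
        super().__init__(daemon=True)
        self._loop = asyncio.new_event_loop()
        self.start()

    def run(self) -> None:
        asyncio.set_event_loop(self._loop)
        self._loop.run_forever()

    def get_result(self, coro: Coroutine) -> Any:
        # Schedule the coroutine on the loop thread; this returns a
        # concurrent.futures.Future that the calling thread can block on.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result()


async def add(a: int, b: int) -> int:
    await asyncio.sleep(0)   # yield to the loop once, like real async I/O would
    return a + b


loop_thread = EventLoopThread()
result = loop_thread.get_result(add(2, 3))
print(result)
```

This lets synchronous callers use async functions without running an event loop of their own, at the cost of blocking the calling thread until the coroutine finishes.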
mgr/cephadm: create async function _write_remote_file to write files on remote host
_write_remote_file uses _check_execute_command in ssh.py, which calls _execute_command, which uses `shlex.quote`. Thus, any int in a command must be converted to a str, because `shlex.quote` does not accept int objects
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
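A short demonstration of the int-versus-str point, using only the standard library. The uid value, the path, and the chown command are illustrative assumptions, not taken from _write_remote_file:

```python
import shlex

uid = 1000                       # hypothetical numeric argument

try:
    shlex.quote(uid)             # a non-zero int is rejected
    raised = False
except TypeError:
    raised = True

# Converting to str first makes every argument quotable.
args = ["chown", str(uid), "/tmp/example file"]
command = " ".join(shlex.quote(arg) for arg in args)
print(raised, command)
```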
mgr/cephadm: execute commands run over ssh via asyncssh
_execute_command will run commands over ssh using the asyncssh `run` method: https://asyncssh.readthedocs.io/en/latest/api.html#asyncssh.SSHClientConnection.run
_check_execute_command will check the output of _execute_command and raise OrchestratorError if command fails on the remote host.
All commands run over ssh are prepended with sudo in `_execute_command` and shell-escaped with `shlex.quote`.
If the cached ssh connection is closed or broken, the connection object will be removed from the cache, the host will be added to `offline_hosts`, and an OrchestratorError will be raised. On the next call, an attempt will be made to recreate the connection object.
Exceptions raised by asyncssh methods should be handled; otherwise errors like `TypeError: __init__() missing 1 required positional argument: 'reason'` could occur due to the asyncssh error interacting with `raise_if_exception`
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
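The cache handling described above could be sketched as follows. This is an illustration only: `ConnectionCache`, `fake_connect`, and the local `OrchestratorError` class are hypothetical stand-ins, `ConnectionError` stands in for the asyncssh failures, and clearing the host from `offline_hosts` on reconnect is an assumption:

```python
from typing import Callable, Dict, List, Set


class OrchestratorError(Exception):
    """Stand-in for the orchestrator's exception type (assumption)."""


class ConnectionCache:
    def __init__(self, connect: Callable[[str], object]) -> None:
        self._connect = connect
        self.cons: Dict[str, object] = {}
        self.offline_hosts: Set[str] = set()

    def get(self, host: str) -> object:
        if host not in self.cons:
            self.cons[host] = self._connect(host)   # (re)create on demand
            self.offline_hosts.discard(host)
        return self.cons[host]

    def run(self, host: str, action: Callable[[object], object]) -> object:
        conn = self.get(host)
        try:
            return action(conn)
        except ConnectionError as e:
            # Evict the broken connection so the next call reconnects.
            self.cons.pop(host, None)
            self.offline_hosts.add(host)
            raise OrchestratorError(f"lost connection to {host}: {e}") from e


attempts: List[str] = []


def fake_connect(host: str) -> str:
    attempts.append(host)
    return f"conn-{len(attempts)}"


def broken_action(conn: object) -> object:
    raise ConnectionError("closed")


cache = ConnectionCache(fake_connect)
try:
    cache.run("host1", broken_action)
except OrchestratorError:
    offline_after_failure = set(cache.offline_hosts)

second = cache.run("host1", lambda conn: conn)   # triggers a fresh connection
print(offline_after_failure, second)
```

Converting the low-level failure into a single exception type at this boundary also keeps library-specific exceptions from reaching code that does not expect them.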
mgr/cephadm: create and cache asyncssh connection objects, and handle asyncssh connection errors
Create asyncssh connection object in async `_remote_connection` function and cache in `self.cons`
Create a handler for asyncssh log redirection and output ssh log if a connection error occurs
Disable asyncssh logger from propagating because the asyncssh info messages are verbose
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
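A sketch of the log redirection idea using the standard logging module. It assumes the library logs under the logger name "asyncssh"; the buffer, the helper function, and the sample message are hypothetical, and asyncssh itself is not imported here:

```python
import logging
from io import StringIO

# Capture the library's log output in memory instead of the main log.
log_buffer = StringIO()
handler = logging.StreamHandler(log_buffer)
handler.setLevel(logging.DEBUG)

ssh_logger = logging.getLogger("asyncssh")   # assumed logger name
ssh_logger.setLevel(logging.DEBUG)
ssh_logger.addHandler(handler)
ssh_logger.propagate = False                 # keep verbose messages out of parent loggers

ssh_logger.info("opening connection")        # simulated library message


def connection_error_details() -> str:
    """Return the captured ssh log, e.g. to attach to a connection error."""
    return log_buffer.getvalue()


print(connection_error_details())
```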
Kefu Chai [Fri, 20 Aug 2021 14:50:40 +0000 (22:50 +0800)]
cmake: exclude "grafonnet-lib" target from "all"
so we don't build this target when running "make", and hence avoid
accessing the internet in a build environment where internet
access is not allowed.
As part of removing RGWObjManifest from the Zipper API, we need to remove
WriteOp. Fortunately, with the multipart upload changes, it's no longer
needed outside the RadosStore.
Signed-off-by: Daniel Gryniewicz <dang@redhat.com>
Deepika Upadhyay [Thu, 19 Aug 2021 09:00:33 +0000 (14:30 +0530)]
run-make-check: fix do_cmake not consuming run-make-check opts
run-make-check.sh uses run-make.sh to `prepare` (install dependencies)
and `configure` cmake options. Without quotes, options containing
special characters (mostly hyphens) are skipped, so not all
options are supplied at the cmake configure step.
Resolves (focused on issues in the jenkins build env):
- missing cmake options:
cmake_opts+=" -DCMAKE_CXX_COMPILER=$cxx_compiler -DCMAKE_C_COMPILER=$c_compiler"
cmake_opts+=" -DCMAKE_CXX_FLAGS_DEBUG=\-Werror"
- Ninja not being used as cmake generator