Kefu Chai [Wed, 25 Aug 2021 14:25:54 +0000 (22:25 +0800)]
crimson/os: use structured binding in loop
also avoid using `map[key] = val` for setting an item in map, as, if
the key does not exist in map, `map[key]` would have to create a value
using its default ctor, and then call the `operator=(bufferlist&&)` to
set it.
crimson/osd: implicitly append '--smp 1' when invoked without it
This commit is basically a hack meant to honor the promise, which we
already made by packaging crimson under `/usr/bin/ceph-osd`, that it is
a drop-in replacement for the classical OSD.
Whether this interface exactness should be maintained is out of scope
for this commit; it is only meant to handle
the issue unveiled by the Rook integration effort: `crimson-osd`
is unable to `--mkfs` because Seastar, if not restricted by passing
`--smp N`, considers all CPU cores available in the system when
allocating resources. This leads to the following error:
```
ERROR 2021-08-24 14:17:32,105 [shard 5] seastar - Could not setup Async I/O: Resource temporarily unavailable. The most common cause is not enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that number or reducing the amount of logical CPUs available for your application
```
This hack will need to be dropped when integrating multi-reactor
support in crimson.
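A minimal sketch of the idea in Python, for illustration only. The actual change is in crimson-osd's C++ startup code, and the helper name `ensure_smp` is hypothetical. It assumes only the `--smp N` and `--smp=N` spellings need to be recognized.

```python
def ensure_smp(argv):
    """Return a copy of argv with '--smp 1' appended if --smp is absent.

    Hypothetical helper: it only illustrates the "append when missing" rule.
    """
    has_smp = any(a == "--smp" or a.startswith("--smp=") for a in argv)
    if has_smp:
        return list(argv)
    return [*argv, "--smp", "1"]


# Invoked without --smp: the restriction is added implicitly.
print(ensure_smp(["crimson-osd", "--mkfs"]))
# Invoked with an explicit --smp: the arguments are left alone.
print(ensure_smp(["crimson-osd", "--smp", "4"]))
```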
Kefu Chai [Thu, 19 Aug 2021 09:24:32 +0000 (17:24 +0800)]
common/options: validate see-also
y2c.py works like a compiler: it translates .yaml to .cc and .h files,
and it does not have access to all .yaml files. Catching dangling
see-also references therefore has to be done by a "linker".
This change introduces validate-options.py, which checks that every
option name listed in a see-also property names a valid option.
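The "linker" step can be sketched as follows. This is not the actual validate-options.py: the function name, the in-memory data layout (file name mapped to a list of option dicts) and the `see_also` key spelling are assumptions made for illustration.

```python
def find_dangling_see_also(option_files):
    """Return (file, option, reference) triples for unknown see-also names.

    option_files maps a file name to a list of option dicts, each with a
    "name" key and an optional "see_also" list (assumed layout).
    """
    # Pass 1: collect every option name across all files.
    known = {opt["name"] for opts in option_files.values() for opt in opts}
    # Pass 2: report references that resolve to no known option.
    dangling = []
    for fname, opts in option_files.items():
        for opt in opts:
            for ref in opt.get("see_also", []):
                if ref not in known:
                    dangling.append((fname, opt["name"], ref))
    return dangling


# Made-up option names, spread over two files.
example = {
    "a.yaml": [{"name": "alpha", "see_also": ["beta", "gamma"]}],
    "b.yaml": [{"name": "beta", "see_also": ["alpha"]}],
}
print(find_dangling_see_also(example))
```

Only the cross-file reference to `gamma` is reported, which is the case a per-file translator cannot detect.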
behave_test: Implemented basic behave test scenarios
Fixes: https://tracker.ceph.com/issues/52371
This commit includes the basic implementation of behave test scenarios
(for cephadm, ceph shell and OSD commands) and Python implementations for
interacting with kcli and behave test cases. The test scenarios can be executed
using the behave command. The files are created under the src/test/behave_tests directory.
haoyixing [Mon, 16 Aug 2021 10:55:24 +0000 (18:55 +0800)]
rbd: avoid overflow of ios and clarify io-size limit for bench
When running rbd bench, we record the number of completed ios to print progress; currently the counter is unsigned.
Suppose we run a bench with io-size 512B and io-total 4T: that means a total of
8G ios, which overflows the counter.
Also, we don't support io-size greater than 4G, so change the help message to say so.
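The arithmetic can be checked with a short Python sketch. It assumes binary units (4T = 4 * 2^40 bytes) and a 32-bit `unsigned`, neither of which is stated in the commit message.

```python
IO_SIZE = 512              # bytes per io
IO_TOTAL = 4 * 2**40       # 4T, assuming binary units

total_ios = IO_TOTAL // IO_SIZE   # 2**33, i.e. 8G ios
UINT32_MAX = 2**32 - 1            # largest value of a 32-bit unsigned

# A 32-bit unsigned counter keeps only the low 32 bits.
wrapped = total_ios & UINT32_MAX

print(total_ios, total_ios > UINT32_MAX, wrapped)
```

Under these assumptions the io count is twice 2^32, so a 32-bit counter cannot represent it.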
Xuehan Xu [Thu, 19 Aug 2021 05:33:36 +0000 (13:33 +0800)]
crimson/common: keep ref count of crimson::interruptible::interrupt_cond
Currently, interrupt conditions are transferred between inner and outer continuation
chains via a tls interrupt_cond variable. This simple strategy leads to problems when
mixing normal future/continuation procedures with seastar::thread. When seastar::async()
is called, the reactor can directly invoke the passed functor, which leads to two different
scenarios:
1. if there is a seastar::get/yield() inside the passed lambda, the interrupt_cond should be erased at the
   end of the continuation execution when it yields back;
2. otherwise, the interrupt_cond should not be erased.
There are so many possible yielding sequences among several different fibers that we can hardly
judge, at the end of the continuation execution, whether a yield happened during the current
execution, which means we cannot know whether the tls interrupt_cond should be erased.
There could be other scenarios where the current strategy fails. To end this kind of issue
once and for all, we introduce ref-counting machinery.
J. Eric Ivancich [Wed, 28 Jul 2021 18:07:09 +0000 (14:07 -0400)]
rgw-multisite: metadata conflict not computed correctly
The former logic with a conditional based on `++i == 0` would never
execute. So this uses a boolean to differentiate the first from other
iterations and tries to clarify the code through commenting and an
explicit declaration. Additionally a warning is eliminated by
initializing a variable.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
add `event_loop` and `tkey` object to with_cephadm_module, and create MockEventLoopThread in fixtures.py to test async functions of ssh.py.
rewrite test_offline to be compatible with asyncssh
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
mgr/cephadm: use _remote_connection (ssh.py), _execute_command, _check_execute_command in _run_cephadm
remove _get_connection from module.py and _remote_connection in serve.py, replacing with _remote_connection in ssh.py.
also, replace remoto.process.check with _execute_command and _check_execute_command in ssh.py
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
mgr/cephadm: remove remotes.py, replace old _write_remote_file in serve.py with write_remote_file in ssh.py
remove remotes.py because it is specific to execnet/remoto.
_write_remote_file in ssh.py now fulfills the function of write_file in remotes.py and the old _write_remote_file in serve.py
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
mgr/cephadm: create thread to start event loop for ssh.py, and return results of the async functions with get_result
The EventLoopThread class starts a thread and an event loop which runs forever. Coroutines are scheduled on the event loop by the `get_result` method which uses `run_coroutine_threadsafe` to return a concurrent.futures.Future, and ultimately the result with .result()
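A minimal sketch of this pattern, not the cephadm implementation itself: the class and method names follow the description above, while the `stop` method, the daemon flag and the example coroutine `add` are assumptions added for illustration.

```python
import asyncio
import threading


class EventLoopThread(threading.Thread):
    """Run an asyncio event loop forever in a dedicated thread."""

    def __init__(self):
        super().__init__(daemon=True)
        self.loop = asyncio.new_event_loop()
        self.start()

    def run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def get_result(self, coro):
        # Schedule the coroutine on the loop thread; this returns a
        # concurrent.futures.Future, and .result() blocks the caller
        # until the coroutine has finished.
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return future.result()

    def stop(self):
        # Hypothetical helper: ask the loop to stop from another thread.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.join()


async def add(a, b):
    await asyncio.sleep(0)
    return a + b


event_loop = EventLoopThread()
print(event_loop.get_result(add(1, 2)))
event_loop.stop()
```

Synchronous callers can use `get_result` to call async functions without running an event loop themselves.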
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
mgr/cephadm: create async function _write_remote_file to write files on remote host
_write_remote_file uses _check_execute_command in ssh.py which calls _execute_command which uses shlex quote. Thus, any commands with an int will need to be transformed into a str because shlex quote does not take int objects
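A short sketch of the conversion, using only the standard library. The helper name `quote_args` and the example command are hypothetical and do not come from ssh.py.

```python
import shlex


def quote_args(args):
    """Shell-escape each argument, converting non-str values to str first."""
    # shlex.quote expects a string, so ints such as a file mode or uid
    # are passed through str() before quoting.
    return [shlex.quote(str(a)) for a in args]


cmd = quote_args(["chmod", 600, "/tmp/my file"])
print(" ".join(cmd))
```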
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
mgr/cephadm: execute commands run over ssh via asyncssh
_execute_command will run commands over ssh using the asyncssh `run` method: https://asyncssh.readthedocs.io/en/latest/api.html#asyncssh.SSHClientConnection.run
_check_execute_command will check the output of _execute_command and raise OrchestratorError if command fails on the remote host.
All commands run over ssh are prepended with sudo in `_execute_command` and shell-escaped with shlex quote.
If the cached ssh connection is closed or broken, the connection object will be removed from the cache, added to the `offline_hosts`, and an OrchestratorError will be raised. On the next call, an attempt is made to recreate the connection object.
Exceptions involving asyncssh methods should be handled; otherwise, errors like `TypeError: __init__() missing 1 required positional argument: 'reason'` could occur due to the asyncssh error interacting with `raise_if_exception`.
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
mgr/cephadm: create and cache asyncssh connection objects, and handle asyncssh connection errors
Create asyncssh connection object in async `_remote_connection` function and cache in `self.cons`
Create a handler for asyncssh log redirection and output ssh log if a connection error occurs
Disable asyncssh logger from propagating because the asyncssh info messages are verbose
Fixes: https://tracker.ceph.com/issues/44676
Signed-off-by: Melissa Li <li.melissa.kun@gmail.com>
Kefu Chai [Fri, 20 Aug 2021 14:50:40 +0000 (22:50 +0800)]
cmake: exclude "grafonnet-lib" target from "all"
so we don't build this target when running "make", and hence avoid
accessing the internet in a build environment where internet
access is not allowed.