Loic Dachary [Wed, 13 May 2015 08:39:37 +0000 (10:39 +0200)]
tests: tiering health report reworked
Instead of
* setting the limits
* populating the cache
* checking the health warnings
do the following
* populate the cache
* set limits below the content of the cache
* check the health warnings
The problem with the former approach is that the limits stored by the
OSD internally do not exactly match the ones set by the user: they are
converted to ratios and there may be rounding errors.
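The rounding described above can be illustrated with a minimal C++
sketch. It assumes, purely for illustration, that a limit is stored as
parts per million of some capacity; the helper names and the figures are
hypothetical and are not the OSD's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical representation: a limit is kept as parts per million of
// a capacity. Large inputs could overflow; this sketch ignores that.
constexpr std::uint64_t kMicro = 1000000;

// Convert an absolute limit to a micro ratio (integer division truncates).
std::uint64_t to_micro_ratio(std::uint64_t limit, std::uint64_t capacity) {
  assert(capacity > 0);
  return limit * kMicro / capacity;
}

// Convert a micro ratio back to an absolute limit (truncates again).
std::uint64_t from_micro_ratio(std::uint64_t ratio, std::uint64_t capacity) {
  return ratio * capacity / kMicro;
}

// The limit that is effectively enforced after the round trip.
std::uint64_t effective_limit(std::uint64_t limit, std::uint64_t capacity) {
  return from_micro_ratio(to_micro_ratio(limit, capacity), capacity);
}
```

With a capacity of 3000 and a requested limit of 1000, effective_limit
returns 999, so a test that fills the cache up to exactly the requested
limit depends on the rounding. Setting the limit well below the existing
content does not.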
Also replace the busy loop waiting for pg stats to flush with
Loic Dachary [Wed, 13 May 2015 07:26:46 +0000 (09:26 +0200)]
tests: no agent when testing tiering agent border case
On a sufficiently slow machine, the tiering agent can be activated while
testing border cases where the cache is almost full. Prevent that
by deactivating the tiering agent.
Kefu Chai [Fri, 1 May 2015 11:52:25 +0000 (04:52 -0700)]
common: fix the macros for malformed_input::what()
The exception thrown for malformed_input should carry the name of the
function in which it was thrown, but what we have now is something like:
"buffer::malformed_input: __PRETTY_FUNCTION__ unknown encoding version >
100"
because __PRETTY_FUNCTION__ is not a macro any more. See
- https://gcc.gnu.org/onlinedocs/gcc-3.1/gcc/Function-Names.html
- https://gcc.gnu.org/onlinedocs/gcc/Function-Names.html
It is not a string literal either, so we cannot concatenate it with
the literal error message.
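One way to get the function name into the message is to build the string
at run time instead of relying on literal concatenation. The following
is a minimal sketch for GCC or Clang, not the actual change:
malformed_input_sketch, make_what, THROW_MALFORMED and decode_version
are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the real exception type.
struct malformed_input_sketch : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Builds "<function> <message>" at run time.
std::string make_what(const char* func, const std::string& msg) {
  assert(func != nullptr);
  return std::string(func) + " " + msg;
}

// __PRETTY_FUNCTION__ (a GCC/Clang extension) is a variable, so it is
// passed as a value instead of being pasted next to a string literal.
#define THROW_MALFORMED(msg) \
  throw malformed_input_sketch(make_what(__PRETTY_FUNCTION__, (msg)))

// Hypothetical decoder that rejects unknown versions.
void decode_version(int version) {
  if (version > 100) {
    THROW_MALFORMED("unknown encoding version > 100");
  }
}
```

Calling decode_version(101) throws an exception whose what() contains
the signature of decode_version rather than the text
"__PRETTY_FUNCTION__".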
John Spray [Mon, 11 May 2015 11:53:52 +0000 (12:53 +0100)]
tools: fix tabletool reset of nonexistent sessionmap
If the object didn't exist, the omap clear was failing
and preventing the subsequent omap set header from
executing. Set the FAILOK flag on the omap clear
sub-operation.
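The effect of a fail-ok flag on a compound operation can be sketched
generically in C++. This models the behaviour described above and is not
the RADOS implementation: SubOp, fail_ok and run_compound are
hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical sub-operation: a callable returning 0 or a negative
// error code, plus a flag saying whether its failure may be ignored.
struct SubOp {
  std::function<int()> run;
  bool fail_ok;
};

// Runs the sub-operations in order. A failure aborts the remaining
// sub-operations unless the failing one is marked fail_ok.
int run_compound(const std::vector<SubOp>& ops) {
  for (const auto& op : ops) {
    assert(static_cast<bool>(op.run));
    const int r = op.run();
    if (r < 0 && !op.fail_ok) {
      return r;
    }
  }
  return 0;
}
```

If the first sub-operation (the clear) fails and is not marked fail_ok,
the second one (the header update) never runs. Marking the clear as
fail_ok lets the header update proceed.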
John Spray [Wed, 29 Apr 2015 19:44:12 +0000 (20:44 +0100)]
tools: fix tabletool reset snap
SnapServer has an encode method defined that
is different to encode_state, whereas in InoTable
the two were synonymous. This code was working
previously for inotable but not for snapserver.
Raju Kurunkad [Tue, 5 May 2015 14:10:38 +0000 (19:40 +0530)]
Update XIO client connection IP and nonce
Obtain the local IP of the client and save the nonce provided when the
messenger was created. This is required for RBD lock/unlock.
Fix script error in RBD concurrent test
Reset did_bind during messenger shutdown
Jon Bernard [Fri, 8 May 2015 15:54:06 +0000 (11:54 -0400)]
common/admin_socket: close socket descriptor in destructor
Long-running processes that do not reuse a single client connection will
see accumulating file descriptors as a result of not closing the
listening socket. In this case, eventually the system will reach
file-max and subsequent connections will fail.
Fixes: #11535
Signed-off-by: Jon Bernard <jbernard@tuxion.com>
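The general pattern of releasing a descriptor in the destructor can be
sketched as follows. This is a POSIX-only illustration, not the
admin_socket code: SocketOwner and fd_is_open are hypothetical names,
and an unbound AF_UNIX socket stands in for the listening socket.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical owner of a socket descriptor.
class SocketOwner {
 public:
  SocketOwner() : fd_(::socket(AF_UNIX, SOCK_STREAM, 0)) {}

  // Closing here means every destroyed owner gives its descriptor back.
  ~SocketOwner() {
    if (fd_ >= 0) {
      ::close(fd_);
    }
  }

  // Copying would lead to a double close, so it is disabled.
  SocketOwner(const SocketOwner&) = delete;
  SocketOwner& operator=(const SocketOwner&) = delete;

  int fd() const { return fd_; }

 private:
  int fd_;
};

// Returns true if fd refers to an open descriptor in this process.
bool fd_is_open(int fd) {
  assert(fd >= 0);
  return ::fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

A process that creates and destroys many such owners keeps a flat
descriptor count. Without the close in the destructor, each one would
leave a descriptor behind.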
Loic Dachary [Wed, 6 May 2015 18:14:37 +0000 (20:14 +0200)]
tests: ceph-helpers kill_daemons fails when kill fails
Instead of silently leaving the daemons running, kill_daemons now returns
failure so the caller can decide how to handle the situation. The timeout
is also extended to minutes instead of seconds to gracefully handle the
rare situations when a machine is extra slow for some reason.
Loic Dachary [Fri, 8 May 2015 07:19:44 +0000 (09:19 +0200)]
install-deps.sh: exit on error if dependencies cannot be installed
Now that pre-installing pip dependencies is done at the end of the
script, the last command to run is no longer the installation
command. Therefore the status of the script is no longer the status of
the install command and no longer reflects success or failure to install
the dependencies. Add explicit || exit 1 to commands that are to be
treated as fatal errors.
Also set -e so that other errors have a better chance of being caught.
Loic Dachary [Fri, 8 May 2015 06:57:24 +0000 (08:57 +0200)]
tests: pip must not log in $HOME/.pip
pip may not have permission to write there when running in a container,
and scripts run from source are not expected to modify anything outside
of the source tree anyway.
Jason Dillaman [Thu, 30 Apr 2015 19:32:38 +0000 (15:32 -0400)]
librbd: ObjectMap::aio_update can acquire snap_lock out-of-order
Detected during an fsx run where a refresh and a copy-on-read (CoR)
were occurring concurrently. The refresh held the snap_lock and was
waiting on the object_map_lock, while the CoR held object_map_lock
and was waiting for snap_lock.
Fixes: #11577
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
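The usual remedy for this kind of deadlock is to make every code path
take the two locks in the same order. The sketch below shows that idea
with plain mutexes. It is not the librbd change itself: ImageSketch and
its members are hypothetical, and the order chosen (snap_lock first) is
an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for an image with two locks. The agreed order
// in this sketch is: snap_lock first, object_map_lock second.
struct ImageSketch {
  std::mutex snap_lock;
  std::mutex object_map_lock;
  std::vector<bool> object_exists = std::vector<bool>(8, false);
  int snap_seq = 0;

  // Refresh path: takes both locks in the agreed order.
  void refresh() {
    std::lock_guard<std::mutex> snap(snap_lock);
    std::lock_guard<std::mutex> map(object_map_lock);
    ++snap_seq;
  }

  // Copy-on-read path: uses the same order, even though it mainly
  // needs the object map, so it can never hold object_map_lock while
  // waiting for snap_lock.
  void copy_on_read(std::size_t object_no) {
    std::lock_guard<std::mutex> snap(snap_lock);
    std::lock_guard<std::mutex> map(object_map_lock);
    assert(object_no < object_exists.size());
    object_exists[object_no] = true;
  }
};
```

With both paths agreeing on the order, two threads calling refresh() and
copy_on_read() concurrently can block each other briefly but cannot end
up each holding the lock the other is waiting for.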
Loic Dachary [Thu, 7 May 2015 21:12:33 +0000 (23:12 +0200)]
tests: fail make check if nproc is too low
When running tests in parallel with make -jX, the ulimit -u (number of
processes / threads per user) needs to be at least X * 1024. If not, it
will fail in mysterious ways. Since there is no convenient way to figure
out the value of X (see
http://blog.jgc.org/2015/03/gnu-make-insanity-finding-value-of-j.html
for a nontrivial and entertaining solution), add a very conservative
check that assumes the user will run make -jX where X is nproc / 2.
It will be annoying for users who want to run make check, not use -j,
and have a low ulimit -u. But the error suggests a way to override this
with
make CHECK_ULIMIT=false check
This is a minor irritation compared to the puzzling behavior of make
check when ulimit is exceeded.
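The arithmetic behind the check can be written down as a small C++
function. This is only a sketch of the rule stated above, not the actual
make check logic: nproc_limit_ok is a hypothetical name, and clamping X
to at least 1 on a single-CPU machine is an assumption.

```cpp
#include <algorithm>
#include <cassert>

// Processes / threads assumed per parallel job (figure from the text).
constexpr unsigned long kProcsPerJob = 1024;

// Returns true when ulimit_u is high enough for make -jX with
// X = cpus / 2. Clamping X to at least 1 is an assumption of this sketch.
bool nproc_limit_ok(unsigned cpus, unsigned long ulimit_u) {
  assert(cpus > 0);
  const unsigned long jobs = std::max(1u, cpus / 2);
  return ulimit_u >= jobs * kProcsPerJob;
}
```

For example, on a machine where nproc reports 8, the assumed X is 4 and
ulimit -u has to be at least 4096 for the check to pass.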
John Spray [Thu, 7 May 2015 17:42:01 +0000 (18:42 +0100)]
client: fix error handling in check_pool_perm
Previously, on an error such as a pool not existing,
the caller doing the check would error out, but
anyone waiting on waiting_for_pool_perm would
block indefinitely (symptom was that reads on a
file with a bogus layout would block forever).
Fix by triggering the wait list on errors and
clearing the CHECKING state so that the other callers
also perform the check and find the error.
Additionally, don't return the RADOS error code
up to filesystem users, because it can be
misleading. For example, a nonexistent pool is
ENOENT, but we shouldn't give ENOENT on IO
to a file which does exist; we should give EIO.
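The two parts of the fix (always waking the waiters, and reporting EIO
instead of the underlying error) can be sketched in C++ as follows. This
is a simplified model, not the client code: PermState, PoolPermCheck and
finish are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <vector>

enum class PermState { Unknown, Checking, Ok };

// Hypothetical per-pool bookkeeping for a permission check in flight.
struct PoolPermCheck {
  PermState state = PermState::Unknown;
  std::vector<std::function<void()>> waiters;

  // Called when the check completes with a result from the object
  // store (0 or a negative errno). Returns the code for the caller.
  int finish(int rados_result) {
    assert(state == PermState::Checking);
    // On error, drop back to Unknown so that woken callers redo the
    // check and see the error themselves.
    state = (rados_result < 0) ? PermState::Unknown : PermState::Ok;

    // Wake the waiters on success and on failure alike.
    std::vector<std::function<void()>> pending;
    pending.swap(waiters);
    for (auto& wake : pending) {
      wake();
    }

    // Hide the underlying error from filesystem users: IO on an
    // existing file reports EIO, not for example ENOENT.
    return rados_result < 0 ? -EIO : 0;
  }
};
```

A failed check with -ENOENT therefore wakes every waiter, leaves the
state at Unknown, and returns -EIO to the caller.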