mds: Adjust mydir auth when starting MDS that was stopped cleanly
When starting a MDS that was stopped cleanly, we need manually
adjust mydir's auth. This is because MDS log is empty in this case,
mydir's auth can not be adjusted during log replay.
librados: fix use without NULL check in rados_pool_list
CID 716911: Dereference after null check (FORWARD_NULL)
At (5): Passing null pointer "b" to function "strncat(char *, char
const *, size_t)", which dereferences it. (The dereference is assumed
on the basis of the 'nonnull' parameter attribute.)
librados: init everything in default IoCtxImpl ctor
CID 717219: Uninitialized pointer field (UNINIT_CTOR)
At (14): Non-static class member "objecter" is not initialized in this
constructor nor in any functions that it calls.
rbd: make sure we have a device before trying to unmap
CID 717444: Explicit null dereferenced (FORWARD_NULL)
At (48): Passing null pointer "devpath" to function
"do_kernel_rm(char const *)", which dereferences it.
Order is never actually this high currently, but it be via librbd.
CID 716937: Overflowed return value (INTEGER_OVERFLOW)
At (3): Overflowed or truncated value (or a value computed from an
overflowed or truncated value) "offset" used as return value.
CID 717012: Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
At (1): Potentially overflowing expression "1 << obj_order" with type
"int" (32 bits, signed) is evaluated using 32-bit arithmetic before
being used in a context which expects an expression of type "uint64_t"
(64 bits, unsigned). To avoid overflow, cast the left operand to
"uint64_t" before performing the left shift.
CID 717011: Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
At (1): Potentially overflowing expression "1 << order" with type
"int" (32 bits, signed) is evaluated using 32-bit arithmetic before
being used in a context which expects an expression of type "uint64_t"
(64 bits, unsigned). To avoid overflow, cast the left operand to
"uint64_t" before performing the left shift.
CID 717013: Unintentional integer overflow (OVERFLOW_BEFORE_WIDEN)
At (1): Potentially overflowing expression "1 << order" with type
"int" (32 bits, signed) is evaluated using 32-bit arithmetic before
being used in a context which expects an expression of type "uint64_t"
(64 bits, unsigned). To avoid overflow, cast the left operand to
"uint64_t" before performing the left shift.
CID 717226: Uninitialized scalar field (UNINIT_CTOR)
At (2): Non-static class member "cookie" is not initialized in this
constructor nor in any functions that it calls.
librbd: init m_req in LibrbdWriteback::C_Read ctor
CID 717225: Uninitialized pointer field (UNINIT_CTOR)
At (2): Non-static class member "m_req" is not initialized in this
constructor nor in any functions that it calls.
librbd: initialize on-disk header in ImageCtx ctor
CID 717224: Uninitialized scalar field (UNINIT_CTOR)
At (26): Non-static class member field "header.snaps" is not
initialized in this constructor nor in any functions that it calls.
librbd: init everything in default AioRequest constructors
CID 717222: Uninitialized pointer field (UNINIT_CTOR)
At (16): Non-static class member "m_hide_enoent" is not initialized
in this constructor nor in any functions that it calls.
CID 717223: Uninitialized scalar field (UNINIT_CTOR)
At (4): Non-static class member "m_has_parent" is not initialized in
this constructor nor in any functions that it calls.
CID 717220: Uninitialized pointer field (UNINIT_CTOR)
At (4): Non-static class member "aio_type" is not initialized in this
constructor nor in any functions that it calls.
CID 717221: Uninitialized pointer field (UNINIT_CTOR)
At (2): Non-static class member "m_req" is not initialized in this
constructor nor in any functions that it calls.
Sage Weil [Thu, 20 Sep 2012 17:14:24 +0000 (10:14 -0700)]
msg/Accepter: fix race in accepter shutdown
We want to avoid a race like:
- entry() starts, populates pfd with listen_sd, gets past !done check
- stop() does shutdown + close on listen_sd
- someone else opens a new fd
- entry() thread calls poll(2) on wrong sd
- stop() calls join, waits forever for entry thread
rgw: prepare_update_index should not error on system bucket
Should just return true. This way we don't need higher level
functions to be aware of system buckets. Also, don't use
marker.empty() to test for system bucket, use bucket_is_system().
Sam Lang [Thu, 20 Sep 2012 15:54:45 +0000 (08:54 -0700)]
vstart.sh: Alternative fix for vstart.sh -n
The previous fix (0f7c516f3e) breaks osd startup with -k. This one
from dmick just tells the ceph-mon which keyring to use through the
command line rather than moving the keyring path to the [global]
section of the config file.
When handling master request with slaves, the mds could crash
after receiving all slaves' commit acknowledgement, but before
journalling the ECommitted. Current MDS recovery code does not
handle this case correctly, the request will be left in
LogSegment's uncommitted_masters after recovery is finished.
It prevents LogSegment from being trimmed. The fix is find and
clean up request of this kind when recovery enters rejoin stage.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com>
Sam Lang [Thu, 20 Sep 2012 17:24:35 +0000 (10:24 -0700)]
cfuse: Define CEPH_INO_DOTDOT (3) for top-level parentdir
Defines the macro CEPH_INO_DOTDOT (to 3) and uses it as the top-level
parent directory (..) inode number. The value of 2 is already taken
by the .ceph hidden directory.
Sam Lang [Thu, 20 Sep 2012 00:47:21 +0000 (17:47 -0700)]
cfuse: Add the parent entry (..) for a top-level readdir
In the lowlevel fuse api, the current (.) and parent (..) entries
must be added manually in a readdir call. For the root directory
the parent is not a ceph inode, so we give it a fake inode value
(2) and intercept that inode on a getattr.
Fixes: #1957 Signed-off-by: Sam Lang <sam.lang@inktank.com>
Sam Lang [Wed, 19 Sep 2012 20:22:59 +0000 (13:22 -0700)]
Move keyring option to global section
Using vstart.sh -n uses ceph-authtool to generate the keyring file
in ./keyring. The vstart.sh script then writes out the ceph.conf
with a keyring option in the [client] section, so when the monitors
start, they can't find a keyring file. This commit puts the keyring in
the [global] section.
Sage Weil [Tue, 18 Sep 2012 21:48:14 +0000 (14:48 -0700)]
mon: make MRoute encoding backwards-compatible
If the target as the NULLROUTE feature, use a new encoding that explicitly
indicates whether a message follows. If the feature is absent, use the
old encoding. The mon is responsible for not trying to send a null reply
if the target does not have the feature.
Alex Elder [Wed, 19 Sep 2012 03:51:10 +0000 (22:51 -0500)]
rbd/copy.sh: fix typo
Or maybe it was a spello, or a thinko, or something. In any case
I'm pretty sure Josh intended to call the function he added in
commit 78d6a60ca, and not the non-existent "test_import_args".
Alex Elder [Wed, 19 Sep 2012 03:51:10 +0000 (22:51 -0500)]
rbd/copy.sh: fix typo
Or maybe it was a spello, or a thinko, or something. In any case
I'm pretty sure Josh intended to call the function he added in
commit 78d6a60ca, and not the non-existent "test_import_args".
librbd: use generic cls_lock instead of cls_rbd's locking
Update the librbd locking api to make more sense:
* Add an optional tag to shared locking
* only make shared vs exclusive different functions in the user-visible api
* return a list of structs instead of a set of pairs
* fix incorrect range checking in the C api
* rename locks to lockers to be consistent with the generic locking class
* rename other_locker parameter to client, to match the list_lockers usage
rbd: make --pool/--image args easier to understand for import
There's no need to set the default pool in set_pool_image_name - this
is done later, in a way that doesn't ignore --pool if --dest-pool
is not specified.
This means --pool and --image can be used with import, just like
the rest of the commands. Without this change, --dest and --dest-pool
had to be used, and --pool would be silently ignored for rbd import.
librbd, cls_rbd: close snapshot creation race with old format
If two clients created a snapshot at the same time, the one with the
higher snapshot id might be created first, so the lower snapshot id
would be added to the snapshot context and the snaphot seq would be
set to the lower one.
Instead of allowing this to happen, return -ESTALE if the snapshot id
is lower than the currently stored snapshot sequence number. On the
client side, get a new id and retry if this error is encountered.
Backport: argonaut Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
Josh Durgin [Thu, 30 Aug 2012 00:30:17 +0000 (17:30 -0700)]
doc: clarify rbd man page (esp. layering)
* a clone's size can't be overridden
* note which commands require format 2
* clarify details of copy
* add examples for cloning
* add pool to map example for consistency
* fix a couple warnings and re-sync man page with rst
Josh Durgin [Wed, 29 Aug 2012 00:24:47 +0000 (17:24 -0700)]
librbd: prevent racing clone and snap unprotect
If the following sequence of events occured,
a clone could be created of an unprotected snapshot:
1. A: begin clone - check that snap foo is protected
2. B: rbd unprotect snap foo
3. B: check that all pools have no clones of foo
4. B: unprotect snap foo
5. A: finish creating clone of foo, add it as a child
To stop this from happening, check at the beginning and end of
cloning that the parent snapshot is protected. If it is not,
or checking protection status fails (possibly because the parent
snapshot was removed), remove the clone and return an error.
Sage Weil [Tue, 4 Sep 2012 23:55:08 +0000 (16:55 -0700)]
mon: decay laggy calculations over time
Add a configurable halflife for the laggy probability and duration and
apply it at the time those values are used to adjust the heartbeat grace
period. Both are multiplied together, so it doesn't matter which you
think is being decayed (the probability or the interval).
Sage Weil [Tue, 4 Sep 2012 20:39:23 +0000 (13:39 -0700)]
mon: scale heartbeat grace based on laggy probability, interval
If, based on historical behavior, an observed osd failure is likely to be
due to unresponsiveness and not the daemon stopping, scale the heartbeat
grace period accordingly:
This will avoid fruitlessly marking OSDs down and generating additional
map update overhead when the cluster is overloaded and potentially
struggling to keep up with map updates. See #3045.
Sage Weil [Tue, 4 Sep 2012 20:20:32 +0000 (13:20 -0700)]
mon: check failures in tick
Currently we only trigger a failure on receipt of a failure report. Move
the checks into a helper and check during tick() too, so that we will
trigger failures even when the thresholds are not met at failure report
time. This is rarely true now, but will be true once we locally scale the
grace period.