Sage Weil [Thu, 13 Mar 2014 18:22:34 +0000 (11:22 -0700)]
rbd-fuse: fix signed/unsigned warning
rbd_fuse/rbd-fuse.c: In function 'enumerate_images':
rbd_fuse/rbd-fuse.c:113:2: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
Samuel Just [Tue, 11 Mar 2014 21:23:10 +0000 (14:23 -0700)]
PG: do not wait for flushed before activation
This should reduce the sting of the previous commit somewhat. We wait
for the activation transactions to clear prior to accepting IO anyway,
so we can go ahead and get that process started without waiting for the
flush.
Samuel Just [Tue, 11 Mar 2014 17:31:55 +0000 (10:31 -0700)]
PG: do not serve requests until replicas have activated
There are two problems:
1) We choose the min last_update among peers with the max local-les
value as an upper bound on requests which could have been reported to
the client as committed. We then, for ec pools, roll back to that point
to ensure that we don't inadvertently commit to an update which fewer
than K replicas actually saw. If the primary sets local-les, accepts an
update from a client, and there is a new interval before any of the
replicas have been activated, we will end up being forced to use that
update which no other replica has seen as the new last_update. This
will cause the object to become unfound. We don't have this problem as
long as all active replicas agree on last_update before we accept IO.
2) Even for replicated pools, we would then immediately respond to the
request which created the primary-only update with a commit since it is
in the log and we have no outstanding repops. If we then lose that
primary before any of the replicas in the new interval record the new
log, we will not only lose the object, but also the log entry recording
it, which will result in a lost write.
For these reasons, we need to wait for the replicas to activate before
we can process new requests, because whatever update we select as
last_update is effectively regarded as committed as soon as we accept IO.
Fixes: #7649
Signed-off-by: Samuel Just <sam.just@inktank.com>
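A minimal sketch of the bound described in (1), assuming each peer reports a (local_les, last_update) pair simplified to plain integers. The names PeerInfo and committed_upper_bound are hypothetical and are not Ceph's peering code; the sketch only shows why a primary that alone holds the max local-les forces its unseen update to become last_update.

```python
from dataclasses import dataclass

@dataclass
class PeerInfo:
    name: str
    local_les: int    # last epoch started, as recorded locally
    last_update: int  # newest log entry version (simplified to an int)

def committed_upper_bound(peers):
    """Min last_update among the peers that share the max local_les."""
    max_les = max(p.local_les for p in peers)
    return min(p.last_update for p in peers if p.local_les == max_les)

# The primary set local_les and accepted update 101 before any replica activated.
before_fix = [
    PeerInfo("primary", 20, 101),
    PeerInfo("replica1", 10, 100),
    PeerInfo("replica2", 10, 100),
]

# With the fix, replicas activate (and agree on last_update) before IO is accepted.
after_fix = [
    PeerInfo("primary", 20, 100),
    PeerInfo("replica1", 20, 100),
    PeerInfo("replica2", 20, 100),
]
```

In the first list the bound is the primary-only update; in the second, every peer with the max local-les agrees on last_update.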
Yehuda Sadeh [Wed, 12 Mar 2014 01:19:44 +0000 (18:19 -0700)]
rgw: don't overwrite bucket entry data when syncing user stats
Fixes: #7687
When syncing user bucket stats we overwrote the entire entry with the
passed in entry. We should only look at the stats portion, and not
overwrite the rest (which contains bucket creation time).
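A minimal sketch of the intended behaviour, assuming a bucket entry that carries a creation time alongside its stats. BucketEntry, its field names, and sync_stats are hypothetical and do not reflect rgw's actual structures.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class BucketEntry:
    bucket: str
    creation_time: int  # must survive a stats sync
    size: int           # stats portion
    count: int          # stats portion

def sync_stats(stored, incoming):
    """Copy only the stats fields; keep everything else from the stored entry."""
    return replace(stored, size=incoming.size, count=incoming.count)

stored = BucketEntry("photos", creation_time=1394500000, size=10, count=1)
incoming = BucketEntry("photos", creation_time=0, size=42, count=3)
merged = sync_stats(stored, incoming)
```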
Image names buffer is fixed at 1024. This turns out to be not enough:
there are at least two "rbd-fuse rbd_list: error %d Numerical result
out of range" reports on the ML. Fix it by calling rbd_list() twice to
first get the expected buffer size. Also, get rid of the memory leak
and tweak the error message while at it.
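The two-call pattern can be sketched as follows. This is a model only: fake_list is a hypothetical stand-in for a list call that reports the required size when the buffer is too small, and it is not the librbd API.

```python
import errno

IMAGES = ["img-%04d" % i for i in range(200)]  # stand-in pool contents

def fake_list(buf_size):
    """Hypothetical size-probing list call.

    Returns (rc, data, needed). rc is -ERANGE when buf_size is too small;
    needed is always the number of bytes required.
    """
    packed = b"".join(name.encode() + b"\0" for name in IMAGES)
    if buf_size < len(packed):
        return -errno.ERANGE, b"", len(packed)
    return len(packed), packed, len(packed)

def enumerate_images():
    # First call with an empty buffer only learns the required size.
    rc, _, needed = fake_list(0)
    if rc < 0 and rc != -errno.ERANGE:
        raise OSError(-rc, "list failed")
    # Second call uses a buffer of exactly that size.
    rc, data, _ = fake_list(needed)
    if rc < 0:
        raise OSError(-rc, "list failed")
    return [n.decode() for n in data.split(b"\0") if n]
```

Here the packed names need more than 1024 bytes, so a fixed 1024-byte buffer fails with ERANGE while the two-call version succeeds.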
Warren Usui [Fri, 21 Feb 2014 05:11:45 +0000 (21:11 -0800)]
Fix get_status() to find client.rados text inside of ps command results.
Added port (fixed value for right now in teuthology) to hostname.
Fixes: 7374
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
Signed-off-by: Warren Usui <warren.usui@inktank.com>
(cherry picked from commit 8200b8a02511e367370d33cb74c3d45ef85fca31)
Yan, Zheng [Sun, 9 Mar 2014 23:36:14 +0000 (07:36 +0800)]
mds: fix owner check of file lock
flock and posix lock do not use process ID as owner identifier.
The process ID of who holds the lock is just for F_GETLK fcntl(2).
In the Linux kernel, a file lock's owner identifier is the file pointer
through which the lock is requested.
The fix is to not take the 'pid_namespace' into consideration when
checking for conflicting locks. Also rename the 'pid' fields of struct
ceph_mds_request_args and struct ceph_filelock to 'owner', rename
'pid_namespace' fields to 'pid'.
The kclient counterpart of this patch modifies the flock code to
assign the file pointer to the 'owner' field of lock message. It
also sets the most significant bit of the 'owner' field. We can use
that bit to distinguish between old and new clients.
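A minimal sketch of an owner comparison that uses the most significant bit to tell clients apart. The FileLock layout and same_owner are hypothetical, and the fallback of also comparing pid for old clients is an assumption made for illustration, not something this commit message states.

```python
from dataclasses import dataclass

NEW_CLIENT_BIT = 1 << 63  # most significant bit of a 64-bit owner field

@dataclass
class FileLock:
    client: int
    owner: int
    pid: int

def same_owner(a, b):
    """True if two locks belong to the same owner."""
    if a.client != b.client or a.owner != b.owner:
        return False
    if a.owner & NEW_CLIENT_BIT:
        return True        # new client: owner alone identifies the holder
    return a.pid == b.pid  # assumed fallback for old clients

new_a = FileLock(client=1, owner=NEW_CLIENT_BIT | 0x1234, pid=100)
new_b = FileLock(client=1, owner=NEW_CLIENT_BIT | 0x1234, pid=200)
old_a = FileLock(client=1, owner=0x1234, pid=100)
old_b = FileLock(client=1, owner=0x1234, pid=200)
```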
Stephan Renatus [Mon, 10 Mar 2014 14:17:41 +0000 (15:17 +0100)]
rbdmap: bugfix upstart script
It seems like the upstart script is lagging a little behind [the initscript](https://github.com/ceph/ceph/blob/master/src/init-rbdmap#L44-L49); however, this bugfix makes it actually do what it should do.
Before, the bug made the job just ignore all parameters, with the following error in /var/log/upstart/rbdmap.log:
Ilya Dryomov [Mon, 10 Mar 2014 08:36:48 +0000 (10:36 +0200)]
FileStore: support compiling without libxfs
When configured with --without-libxfs, use GenericFileStoreBackend
instead of XfsFileStoreBackend for XFS. At this point this would only
impact the allocation hint op. The default is to compile with
--with-libxfs. (Previously it was unconditionally enabled on linux and
disabled for non-linux arches.)
Samuel Just [Fri, 7 Mar 2014 23:54:23 +0000 (15:54 -0800)]
ReplicatedPG::finish_ctx: clear object_info if !obs.exists
Otherwise, we see a different object_info_t depending on whether the
transaction deleting the object clears before another op recreating it appears.
In particular, we use oi.version to set the prior_version on the log entries in
finish_ctx. If the oi is allowed to stick around the recreation log event will
have a prior version of the deletion event when it should have a prior version
of eversion_t().
Fixes: #7655
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Babu Shanmugam [Sat, 8 Mar 2014 05:17:13 +0000 (05:17 +0000)]
Broke down sysinfo's format into a histogram with a value and count
so that we just see how many of each version/distro/kernel/os/arch/cpu/etc are running
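A minimal sketch of the value/count histogram shape, assuming each node reports one string per category. The function name and the sample values are made up for illustration.

```python
from collections import Counter

def histogram(values):
    """Collapse raw per-node values into sorted {value, count} records."""
    return [{"value": v, "count": c} for v, c in sorted(Counter(values).items())]

# Made-up per-node distro reports
distros = ["Ubuntu 12.04", "CentOS 6.5", "Ubuntu 12.04"]
```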
Looking for an entry in olog which matches one of ours might add
extra divergent entries. Instead, do what merge_log does and
walk back through the auth log looking for an entry in olog.
Fixes: 7657
Signed-off-by: Samuel Just <sam.just@inktank.com>
Yan, Zheng [Thu, 6 Mar 2014 23:12:39 +0000 (07:12 +0800)]
client: fix Client::getcwd()
A recent commit made the MDS not include the dentry trace in the
LOOKUPPARENT reply. It broke Client::getcwd. The fix is to change
getcwd() to use the LOOKUPNAME MDS request.
Yan, Zheng [Thu, 6 Mar 2014 07:24:02 +0000 (15:24 +0800)]
mds: introduce LOOKUPNAME MDS request
The new MDS request is used for connecting a given inode to its
parent inode. It allows the client to have an efficient implementation
of the get_name() NFS export callback.
Sage Weil [Fri, 7 Mar 2014 22:02:26 +0000 (14:02 -0800)]
mon/PGMap: send pg create messages to primary, not acting[0]
For erasure pools, these may not match.
In the case of #7652, this caused pg_create messages to be sent
indefinitely. register_pg() added it to the list for acting_primary, and
when we got the (non-creating) pg stat update we removed it from the list
for acting[0].
Fixes: #7652
Signed-off-by: Sage Weil <sage@inktank.com>
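A minimal sketch of the mismatch described above, assuming a per-OSD set of pgs that still need a create message. CreatingPGs and its methods are hypothetical names, and NONE is a stand-in for an unfilled acting slot, not the real constant.

```python
from collections import defaultdict

NONE = -1  # stand-in for an unfilled slot in an erasure-coded acting set

class CreatingPGs:
    def __init__(self, use_primary):
        self.use_primary = use_primary
        self.by_osd = defaultdict(set)  # osd id -> pgids still needing a create message

    def register_pg(self, pgid, acting, acting_primary):
        self.by_osd[acting_primary].add(pgid)

    def on_created(self, pgid, acting, acting_primary):
        # The old behaviour removed the pg from the list for acting[0] instead.
        key = acting_primary if self.use_primary else acting[0]
        self.by_osd[key].discard(pgid)

    def still_pending(self):
        return sorted(p for pgs in self.by_osd.values() for p in pgs)

acting, primary = [NONE, 2, 7], 2  # acting[0] differs from the primary

old = CreatingPGs(use_primary=False)
old.register_pg("1.0", acting, primary)
old.on_created("1.0", acting, primary)

new = CreatingPGs(use_primary=True)
new.register_pg("1.0", acting, primary)
new.on_created("1.0", acting, primary)
```

With the old key the pg is never removed, so create messages would keep being sent; keying both operations on the primary clears it.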
Sage Weil [Fri, 7 Mar 2014 21:29:03 +0000 (13:29 -0800)]
mon/OSDMonitor: make osdmap feature checks non-racy
The check for OSD features may race with the boot of an OSD that does not
have the necessary features. Check the pending info too, and if there is
a missing feature, return -EAGAIN. In the callers, wait on -EAGAIN.
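A minimal sketch of the check-and-wait pattern, assuming feature sets are bitmasks keyed by OSD id. Both function names are hypothetical, and the -EINVAL returned for an already-up OSD is an assumed placeholder rather than the monitor's real behaviour.

```python
import errno

def check_osd_features(up, pending, required):
    """up / pending map an OSD id to its feature bitmask (hypothetical layout)."""
    if any((f & required) != required for f in up.values()):
        return -errno.EINVAL  # assumed error for an already-up OSD; not from the commit
    if any((f & required) != required for f in pending.values()):
        return -errno.EAGAIN  # a booting OSD lacks the feature: caller should wait
    return 0

def run_when_ready(check, wait, max_tries=5):
    """Call check() until it stops returning -EAGAIN, waiting between tries."""
    for _ in range(max_tries):
        rc = check()
        if rc != -errno.EAGAIN:
            return rc
        wait()
    return -errno.EAGAIN
```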
qa: workunits/mon/rbd_snaps_ops.sh: ENOTSUP on snap rm from copied pool
'rados cppool' copies the contents but that doesn't make the destination
pool an unmanaged snaps pool. Therefore, we must get an ENOTSUP when
we try to remove an unmanaged snap from a not-unmanaged pool.
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
mon: OSDMonitor: don't remove unmanaged snaps from not-unmanaged pools
Although we should allow creating unmanaged snaps on not-unmanaged pools,
as long as those pools don't have any managed snapshots in them, we cannot
allow removal -- because the pool will not have any unmanaged snapshots.
Fixes: 7210
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Sage Weil [Fri, 7 Mar 2014 00:12:30 +0000 (16:12 -0800)]
osd: fix agent thread shutdown
We had an old invariant that agent_queue would have at least 1 entry in
it to simplify some other code paths, but it turns out that it is simpler
not to do that.
In particular, this was triggering a failed assertion on shutdown when we
assert that the queue is empty.
Dump offending items on shutdown if they are there, though, to catch any
future bugs.
Fixes: #7637
Signed-off-by: Sage Weil <sage@inktank.com>