Sage Weil [Thu, 20 Oct 2011 16:19:45 +0000 (09:19 -0700)]
perfcounters: use simple names
We don't need to uniquely identify ourselves in the global namespace with
the PerfCounter name, only within the current process. Collectd will handle
the per-daemon naming part.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Tue, 18 Oct 2011 21:12:06 +0000 (14:12 -0700)]
test_filestore_idempotent: simple tool to generate a workload of non-idempotent operations
Generate a workload of operations that are non-idempotent. These are:
transaction {
clone A -> A.($n-1)
write $n to A
}
$n++
loop!
If we apply any transaction to the file system more than once, we will
find that the A.$n object does not contain $n, but instead contains
some larger value.
First run in 'write' mode to generate a workload and fake a crash.
Then run in 'verify' mode to see if the result was bad.
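The effect described above can be sketched with a toy in-memory store. This
is only an illustration, not the tool's code: Store, apply_txn and
first_bad_clone are hypothetical names, and object contents are reduced to a
single integer.

```cpp
#include <map>
#include <string>

// Toy "file system": object name -> integer contents (hypothetical model).
using Store = std::map<std::string, int>;

// Apply transaction n: clone A -> A.(n-1), then write n to A.
void apply_txn(Store& fs, int n) {
    const int current = fs["A"];                 // A starts out as 0
    fs["A." + std::to_string(n - 1)] = current;  // clone A -> A.(n-1)
    fs["A"] = n;                                 // write n to A
}

// Check that every clone A.k holds k for k in [0, last_n).
// Returns the first k that is missing or wrong, or -1 if all are good.
int first_bad_clone(const Store& fs, int last_n) {
    for (int k = 0; k < last_n; ++k) {
        auto it = fs.find("A." + std::to_string(k));
        if (it == fs.end() || it->second != k)
            return k;
    }
    return -1;
}
```

Applying transactions 1, 2, 3 once each leaves every clone A.k holding k.
Replaying transaction 2 before transaction 3 re-clones the already
overwritten A, so A.1 ends up holding 2 and the check reports it.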
Sage Weil [Tue, 18 Oct 2011 20:38:21 +0000 (13:38 -0700)]
mds: handle xattrs on inode creation
Allow mknod, mkdir, symlink, create to provide xattrs for the new
inode. This will be used by the kclient to set ACLs on new inodes
based on the parent directory.
Sage Weil [Tue, 18 Oct 2011 23:04:22 +0000 (16:04 -0700)]
radosgw-admin: fix conflict with KeyType in libnss
rgw/rgw_admin.cc:459:6: error: using typedef-name 'KeyType' after 'enum'
/usr/include/nss3/keythi.h:69:3: error: 'KeyType' has a previous declaration here
Sage Weil [Mon, 17 Oct 2011 15:50:54 +0000 (08:50 -0700)]
ceph.spec: work around build.opensuse.org
The redhat-rpm-config isn't installed on build.opensuse.org, which means
the processor is set to i386 instead of something less ancient. This
breaks compilation on 32-bit x86.
Sage Weil [Wed, 12 Oct 2011 17:38:23 +0000 (10:38 -0700)]
cephtool: ability to send commands directly to osds
This makes commands beginning with 'tell <target>' magic in that they go
to the given target instead of to the monitor. This is slightly odd, but
I think it gives the most natural interface for the user, with the tool
Doing The Right Thing for you. E.g.,
ceph tell <someone> something (direct to some daemon)
ceph do something (goes to monitor to do X)
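A minimal sketch of the routing rule, assuming the command has already been
split into words. Route and route_command are hypothetical names and "mon" is
just a placeholder label for the monitor; the real tool's parsing may differ.

```cpp
#include <string>
#include <vector>

// Where a command should go and what should be delivered there.
struct Route {
    std::string target;             // "mon" or the daemon named after 'tell'
    std::vector<std::string> args;  // remaining command words
};

// Commands of the form 'tell <target> ...' go straight to <target>;
// everything else goes to the monitor unchanged.
Route route_command(const std::vector<std::string>& words) {
    Route r;
    if (words.size() >= 2 && words[0] == "tell") {
        r.target = words[1];
        r.args = std::vector<std::string>(words.begin() + 2, words.end());
    } else {
        r.target = "mon";
        r.args = words;
    }
    return r;
}
```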
Sage Weil [Wed, 12 Oct 2011 00:51:07 +0000 (17:51 -0700)]
msg: add MCommand, MCommandReply message types
These are similar to MMonCommand[Ack], but aren't PaxosServiceMessage
children, don't include the command in the reply (useless), and have a more
generic name.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Fri, 14 Oct 2011 23:49:28 +0000 (16:49 -0700)]
filestore: assert on any unexpected error
Right now, the only errors we expect out of the underlying filesystem are
-ENOENT, -ENODATA, or (as a workaround for extN xattr suckage) -ENOSPC
for certain setxattr operations.
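The policy can be sketched as a small predicate. This is an illustration
only: is_expected_error is a hypothetical helper, and the is_setxattr flag
stands in for "certain setxattr operations", which the message does not
spell out.

```cpp
#include <cerrno>

// Returns true if result code r (0 or a negative errno) is one the
// filestore is prepared to see; anything else would trip an assert.
bool is_expected_error(int r, bool is_setxattr) {
    if (r >= 0)
        return true;                    // success
    if (r == -ENOENT || r == -ENODATA)
        return true;                    // always tolerated
    if (r == -ENOSPC && is_setxattr)
        return true;                    // extN xattr workaround
    return false;                       // unexpected: caller should assert
}
```

A caller would then write something like assert(is_expected_error(r, false))
after each operation on the underlying filesystem.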
Samuel Just [Thu, 13 Oct 2011 21:35:13 +0000 (14:35 -0700)]
PG: Fix log.empty confusion
Previously, log.empty meant that the log.head was eversion_t(). However,
it was in a few places used to mean that log.head == log.tail. Now,
log.empty means log.head == log.tail and log.null() indicates that
log.head is eversion_t().
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
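The two predicates can be sketched as follows. Version is a simplified
stand-in for eversion_t and Log for the PG log; both are assumptions made
for illustration, not the actual class definitions.

```cpp
#include <cstdint>

// Simplified stand-in for eversion_t; default-constructed value is (0, 0).
struct Version {
    std::uint32_t epoch = 0;
    std::uint64_t version = 0;
    bool operator==(const Version& o) const {
        return epoch == o.epoch && version == o.version;
    }
};

// Simplified stand-in for the PG log bounds.
struct Log {
    Version head, tail;
    // No entries between tail and head.
    bool empty() const { return head == tail; }
    // head has never been set.
    bool null() const { return head == Version(); }
};
```

A log whose head and tail have both advanced to the same version is empty
but not null; a freshly constructed log is both.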
An interval tree is an optimized data structure for representing and
querying intervals. Elementary intervals are represented as nodes of an
AVL tree, and the corresponding data is stored on these nodes based on a
concept of span. This representation allows log(n) storage (where n is the
number of data items). The balanced AVL tree allows a log(n) query.
The implementation is a template class that is instantiated based on
two parameters:
- Interval type
- Data type
Signed-off-by: Jojy George Varghese <jvarghese@scalecomputing.com>
Sage Weil [Thu, 13 Oct 2011 16:53:41 +0000 (09:53 -0700)]
osd: bound generate_past_intervals() by oldest map
The oldest osdmap we maintain is a lower bound on last_epoch_clean for the
entire system (assuming the monitor is doing its job right). We can stop
generating past intervals when we hit it.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Greg Farnum [Wed, 12 Oct 2011 23:37:55 +0000 (16:37 -0700)]
cls_rgw: rewrite rgw_bucket_complete_op to use update.
Unfortunately we can't do multiple writes via the interface -- the
second one will clobber the first one. So use the update functionality
and go through that pain instead.
Greg Farnum [Wed, 5 Oct 2011 20:30:59 +0000 (13:30 -0700)]
objclass: add map interfaces.
Right now, they implement the TMAP functions, plus a few obvious
extras to read/write select keys and the header. In the future it
should be easy to switch them to better mapping implementations.
Sage Weil [Tue, 11 Oct 2011 18:16:20 +0000 (11:16 -0700)]
osd: fix race between op requeueing and _dispatch
If a message is working its way through _dispatch, and another thread
requeues waiting messages under pg->lock (e.g.
osd->take_waiting(waiting_for_active)), the requeued ops are processed
after the one _dispatch() is chewing on, breaking client ordering.
Instead, add a new OSD::requeue_ops() that reinjects ops back into the
op queue by feeding them to the _handle_*() helpers. Those do last minute
checks before enqueuing the ops.
Fixes: #1490 (again)
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Samuel Just [Mon, 3 Oct 2011 20:29:47 +0000 (13:29 -0700)]
OSD,ReplicatedPG: expire and cleanup unconnected watchers
During handle_notify_timeout or ms_handle_reset, watchers are now marked
unconnected via pg->register_unconnected_watcher. A safe timer event has
been added to trigger OSD::handle_watch_timeout.
remove_watchers_and_notifies (called on role change) cleans up these
events before peering.
Signed-off-by: Samuel Just <samuel.just@dreamhost.com>