Sage Weil [Fri, 27 Jan 2012 18:41:50 +0000 (10:41 -0800)]
filestore: dump offending transaction on any error
Clean this code up to explicitly whitelist what is ok so that the flow is
less annoying to follow/maintain, and so that we dump the transaction
contents on whitelisted errors.
Signed-off-by: Sage Weil <sage@newdream.net> Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
Sage Weil [Fri, 27 Jan 2012 18:39:49 +0000 (10:39 -0800)]
objecter: fix bounds checking on op reply demuxing
We can't assume that the size of out_ops (from the reply) matches the
op->out_* vectors from our request state. In particular, the out_ops might
be shorter than what we sent the OSD if the OSD was sloppy. Check them.
We can assume that op->ops and op->out_* all match; assert as much in
op_submit().
Fixes: #1986 Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Reviewed-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Wed, 25 Jan 2012 20:38:06 +0000 (12:38 -0800)]
osd: remove num_kb from object_stat_sum_t stats
This is redundant--we can just use num_bytes. If we're worried about the
per-object overhead or rounding, we can factor in some overhead based on
num_objects.
And, the kb accounting has a bug (#1988).
Avoid changing the encoding at all for now. Next time the encoding changes
we'll drop the old field.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 25 Jan 2012 06:03:51 +0000 (22:03 -0800)]
osd: track obc for clone from log replay
We need to keep an in-memory obc to track the state of the in-flight io
to disk. This is analogous to when an object is pushed + written, and we
can share the same completion function.
Alexandre Oliva [Tue, 17 Jan 2012 19:22:17 +0000 (17:22 -0200)]
package *.py* files
Some post-install rpmbuild defaults byte-compile all packaged python
files, so don't bother removing the .pyc files, and package .py* to
get both .pyo and .pyc. It wastes a tiny little bit of space, but it
makes the spec file portable across a wider range of rpm and python
configurations.
Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicam.br> Signed-off-by: Sage Weil <sage@newdream.net>
Josh Durgin [Wed, 25 Jan 2012 00:52:27 +0000 (16:52 -0800)]
librbd: don't infinite loop when header is too large
Since snapshots are currently stored at the end of the header, having
many snapshots made the header larger than the read size, resulting in
an infinite loop when the offset was not changed.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com> Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
Sage Weil [Mon, 23 Jan 2012 18:21:04 +0000 (10:21 -0800)]
osd: ignore MInfoRec, MNotifyRec in WaitActingChange
We should ignore logs, infos, and notifies while we are waiting for the
map to change. Peering has reached a dead-end (we need acting to change)
and we will redo our work when that happens. That includes the replicas
resending notifies.
Fixes: #1958 Signed-off-by: Sage Weil <sage.weil@dreamhost.com> Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
Sage Weil [Thu, 19 Jan 2012 02:01:09 +0000 (18:01 -0800)]
osd: do not clobber log on backfill progress update
This is unnecessary and counterproductive, since the log is used to detect
dup ops. It's an artifact of an earlier backfill iteration that didn't
preserve the log on the backfill target.
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Yehuda Sadeh [Fri, 20 Jan 2012 20:54:14 +0000 (12:54 -0800)]
rgw: read_user_buckets() fix redone
The problem with the original fix is that it wasn't atomic. Going back
to the original inefficient (though atomic) method. We should limit
the number of buckets per user anyway, and shouldn't get into a point
where this code is actually execised.
Neil Horman [Wed, 18 Jan 2012 17:00:14 +0000 (12:00 -0500)]
Convert mount.ceph to use KEY_SPEC_PROCESS_KEYRING
having mount.ceph use KEY_SPEC_USER_KEYRING to pass keys to the kernel has
several disadvantages:
1) It leaves the key setting in the uid_keyring, which is reachable from the
session keyring via a link (see keyctl list <root session keyring ref>). This
means its accessible to other processes in the same session that don't need
access to it, even after the kernel is done with it.
2) The user keyring has some very counter-intuitive semantics as far as keyring
permissions goes. The user keyring is access via a link from the session
keyring, which a process may not have permission to access in some situations.
For instance if mount.ceph is executed via su without having started a new
session, mount.ceph will not have access to the uid keyring unless the calling
proces (in this case su) has granted access permission. The result is a -EPERM
error when executing mount.ceph to a cephx enabled server. If the same command
is attempted in a new root session (e.g. su - or su -l), the mount command will
work fine
Switching the mount.ceph command to use the KEY_SPEC_PROCESS_KEYRING solves both
of these problems. By using this keyring, accessibility is guaranteed because
its added and accessed in the same process context both in user space and the
kernel, assuring aceesability, despite the session specifics. It also ensures
that the key will get cleaned up after the mount.ceph process exits
automatically, since there is no longer a need for it (the kernel clones the key
during the mount process and releases it on unmount).
I've tested this here on my local ceph cluster, and it works properly under both
su and su -l .
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Josh Durgin <josh.durgin@dreamhost.com>
Yehuda Sadeh [Wed, 18 Jan 2012 07:42:08 +0000 (23:42 -0800)]
rgw: fix intent log processing
Intent log processing was completely broken. First, it wasn't
parsing the date correctly (due to failure to initalize strptime).
Second, it was trying to load the entire log to memory in one
piece (and in a racy way). This fixed bug #1948.
Sage Weil [Wed, 18 Jan 2012 05:10:05 +0000 (21:10 -0800)]
objecter: gift reply data to outbl _after_ demuxing
Divvy up the result bl first, then gift the whole shebang to outbl. If
we gift it first, there's nothing to demux (since we move intead of copy
the bufferlist ptrs).
Sage Weil [Tue, 17 Jan 2012 19:41:15 +0000 (11:41 -0800)]
filestore: overwrite fsid during --mkfs
This mainly matters because read_fsid() now looks at the file size to
determine if it's an old- or new-style fsid, and not overwriting mean a
downgrade confuses things. Not that anyone would do that, but...