git.apps.os.sepia.ceph.com Git - ceph.git/log
12 years agoleveldbstore: handle old versions of leveldb
Greg Farnum [Wed, 17 Apr 2013 20:21:04 +0000 (13:21 -0700)]
leveldbstore: handle old versions of leveldb

The filter_policy (bloom filter) stuff is fairly new in LevelDB's life,
and it turns out that precise's version is too old for it. Add conditional
compilation for those members in order to build and work properly.

Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip-4521-fix' into next
Sage Weil [Wed, 17 Apr 2013 22:03:03 +0000 (15:03 -0700)]
Merge remote-tracking branch 'gh/wip-4521-fix' into next

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agomds: change XLOCK/XLOCKDONE's next state to LOCK
Yan, Zheng [Fri, 12 Apr 2013 08:11:11 +0000 (16:11 +0800)]
mds: change XLOCK/XLOCKDONE's next state to LOCK

For simplelock and filelock, XLOCK/XLOCKDONE's next state is SYNC.
But a filelock in the XLOCK/XLOCKDONE state allows Fb caps, while a
filelock in the SYNC state does not. So a filelock can be stuck in the
XLOCK/XLOCKDONE state forever if there are Fb caps issued.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agomds: pass proper mask to CInode::get_caps_issued
Yan, Zheng [Fri, 12 Apr 2013 08:11:09 +0000 (16:11 +0800)]
mds: pass proper mask to CInode::get_caps_issued

There is a total of 22 cap bits and file lock uses 8 cap bits.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agomon: Monitor: convert osdmap_full as well 220/head 221/head
Joao Eduardo Luis [Thu, 4 Apr 2013 17:19:02 +0000 (18:19 +0100)]
mon: Monitor: convert osdmap_full as well

Store conversion wasn't converting the osdmap_full/ versions, only the
incrementals under osdmap/ and the latest full version stashed.  This
would lead to some serious problems during OSDMonitor's update_from_paxos
when the latest stashed didn't correspond to the first available
incremental.

Fixes: #4521
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agomon: PaxosService: add helper function to check if a given version exists
Joao Eduardo Luis [Thu, 4 Apr 2013 17:17:21 +0000 (18:17 +0100)]
mon: PaxosService: add helper function to check if a given version exists

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agoosd/PG.cc: initialize PG::flushed in constructor
Danny Al-Gaaf [Tue, 16 Apr 2013 16:14:49 +0000 (18:14 +0200)]
osd/PG.cc: initialize PG::flushed in constructor

Initialize PG::flushed in constructor with false as
described in doc/dev/osd_internals/pg.rst .

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit fb840c8ff75b0c66dfeed48e8558542fe3da4c24)

12 years agoMerge pull request #215 from ceph/wip-leveldb-config
Sage Weil [Wed, 17 Apr 2013 16:49:11 +0000 (09:49 -0700)]
Merge pull request #215 from ceph/wip-leveldb-config

os: bring leveldbstore options up to date

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoFix policy handling for RESTful admin api.
caleb miles [Wed, 17 Apr 2013 15:11:21 +0000 (11:11 -0400)]
Fix policy handling for RESTful admin api.

Signed-off-by: caleb miles <caleb.miles@inktank.com>

12 years agoqa: pull qemu-iotests from ceph.com mirror
Sage Weil [Tue, 16 Apr 2013 23:39:17 +0000 (16:39 -0700)]
qa: pull qemu-iotests from ceph.com mirror

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #214 from ceph/wip-objectcacher-handler-ordered
Sage Weil [Tue, 16 Apr 2013 22:48:15 +0000 (15:48 -0700)]
Merge pull request #214 from ceph/wip-objectcacher-handler-ordered

keep write responses to clones in order

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agolibrbd: flush on diff_iterate
Sage Weil [Tue, 16 Apr 2013 22:45:41 +0000 (15:45 -0700)]
librbd: flush on diff_iterate

The diff_iterate() tests fail when caching is enabled because recent writes
aren't visible to listsnaps.  Flush from diff_iterate to ensure that they
are.  Someday, maybe, we might make diff_iterate() inspect the cache
contents to make this more efficient, but for now that is not necessary.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge branch 'next' of https://github.com/ceph/ceph into next
John Wilkins [Tue, 16 Apr 2013 20:29:15 +0000 (13:29 -0700)]
Merge branch 'next' of https://github.com/ceph/ceph into next

12 years agodoc: Cherry-picked from master to next. Uses ceph-mds package during upgrade.
John Wilkins [Tue, 16 Apr 2013 20:28:18 +0000 (13:28 -0700)]
doc: Cherry-picked from master to next. Uses ceph-mds package during upgrade.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Cherry-picked from master to next. Rewrite of CloudStack document.
John Wilkins [Tue, 16 Apr 2013 20:26:32 +0000 (13:26 -0700)]
doc: Cherry-picked from master to next. Rewrite of CloudStack document.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Cherry-picked from master to next. Updates config to use virtio.
John Wilkins [Tue, 16 Apr 2013 20:24:47 +0000 (13:24 -0700)]
doc: Cherry-picked from master to next. Updates config to use virtio.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Cherry-picked from master to next. Reorders ceph osd create.
John Wilkins [Tue, 16 Apr 2013 20:23:56 +0000 (13:23 -0700)]
doc: Cherry-picked from master to next. Reorders ceph osd create.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agodoc: Cherry picked from master to next. Adds comments on naming OSDs.
John Wilkins [Tue, 16 Apr 2013 20:22:13 +0000 (13:22 -0700)]
doc: Cherry picked from master to next. Adds comments on naming OSDs.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agoos/FileJournal: fix journal completion plug removal
Sage Weil [Tue, 16 Apr 2013 15:26:47 +0000 (08:26 -0700)]
os/FileJournal: fix journal completion plug removal

We plug completions when transitioning from a full to non-full journal
to ensure that we do not complete items before we have a stable journal
starting point that is past the committed_thru marker.  However, the order
of the header update and completion queueing means that we never remove
the plug if the journalq is empty--the seq test is always false.  The
result is very slow osd requests that only commit when we do a full sync.

This bug was masked until recently by another issue, fixed in
170d4a3d794260476ecde1e5e2ee719b7cb3ffd1.

The simple fix is to reorder the completion queuing before we update the
new header.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoconfig: provide settings for the LevelDB stores we use 215/head
Greg Farnum [Tue, 16 Apr 2013 17:59:21 +0000 (10:59 -0700)]
config: provide settings for the LevelDB stores we use

Now that we can set up the LevelDB options internally, provide
config options on the OSD and the Monitor. We leave the OSD values
at the defaults for now as they're performance-sensitive, but we
set new values on the Monitor so that it can scale to large PGMaps.
(Previously there were issues with large PGMaps taking forever to write;
these changes to the use of compression and the default block and
write buffers counteract them.)

Since we pass these variables through, users who are interested in
doing so now can test and tune them more appropriately.

Reported-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agoclient: Fix inode remove from snaprealm race
Sam Lang [Fri, 12 Apr 2013 16:08:35 +0000 (11:08 -0500)]
client: Fix inode remove from snaprealm race

This is a follow on fix to b5ce4d0.  Always remove the inode from the
snaprealm's list of inodes_with_caps before the snaprealm ref is
decremented (and the snaprealm potentially gets freed).

Fixes #4694.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agolibrbd: use initialized data for DiffIterateDiscard test
Sage Weil [Tue, 16 Apr 2013 04:49:38 +0000 (21:49 -0700)]
librbd: use initialized data for DiffIterateDiscard test

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agolibrbd: print seed for all DiffIterate tests
Sage Weil [Tue, 16 Apr 2013 04:32:03 +0000 (21:32 -0700)]
librbd: print seed for all DiffIterate tests

This will aid debugging on failures, and give better coverage.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #217 from alram/master
Sage Weil [Tue, 16 Apr 2013 03:32:46 +0000 (20:32 -0700)]
Merge pull request #217 from alram/master

Fix: use absolute path with udev

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoFix: use absolute path with udev 217/head
Alexandre Marangone [Mon, 15 Apr 2013 22:57:00 +0000 (15:57 -0700)]
Fix: use absolute path with udev

Avoids the following: udevd[61613]: failed to execute '/lib/udev/bash'
'bash -c 'while [ ! -e /dev/mapper/....

Signed-off-by: Alexandre Marangone <alexandre.marangone@inktank.com>
12 years agoqa: add workunit for running qemu-iotests
Josh Durgin [Sat, 13 Apr 2013 00:33:45 +0000 (17:33 -0700)]
qa: add workunit for running qemu-iotests

This uses the old stand-alone qemu-iotests repo so it works with the
version of qemu in Ubuntu 12.04. The tests depend tightly on qemu
version, so to use later tests we'd need to install corresponding
versions of qemu.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoos: bring leveldbstore options up to date
Greg Farnum [Wed, 10 Apr 2013 22:58:42 +0000 (15:58 -0700)]
os: bring leveldbstore options up to date

LevelDB has a lot of options which we don't implement right now. Add
an options struct to the LevelDBStore which users can access as they
wish in order to set values different from the defaults.
This will let us set various size values, as well as turning on
caching or bloom filter read optimizations.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agomds: output error number when failing to load an MDSTable
Greg Farnum [Fri, 12 Apr 2013 20:12:03 +0000 (13:12 -0700)]
mds: output error number when failing to load an MDSTable

Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agoinit-radosgw.sysv: New radosgw init file for rpm based systems
Gary Lowell [Wed, 20 Feb 2013 01:25:27 +0000 (17:25 -0800)]
init-radosgw.sysv:  New radosgw init file for rpm based systems

Added the init-radosgw.sysv file for rpm based systems, added it to
the tarball list in the makefile, and updated the specfile to
install it.  Also added a dependency on ceph since it uses
utility routines from that package (on Debian systems these are
packaged in ceph-common).  Incorporated review comments from
Alex. (Bug #4571)

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Reviewed-by: Alexandre Marangone <alexandre.marangone@inktank.com>
12 years agoMerge pull request #213 from ceph/wip-sessionmap-4644
Sam Lang [Thu, 11 Apr 2013 16:08:04 +0000 (09:08 -0700)]
Merge pull request #213 from ceph/wip-sessionmap-4644

mds: fix session_info_t decoding

Reviewed-by: Sam Lang <sam.lang@inktank.com>
12 years agoMerge pull request #212 from ceph/wip-4451
Gregory Farnum [Thu, 11 Apr 2013 15:45:06 +0000 (08:45 -0700)]
Merge pull request #212 from ceph/wip-4451

12 years agomds: Delay export on missing inodes for reconnect 212/head
Sam Lang [Tue, 9 Apr 2013 15:35:19 +0000 (10:35 -0500)]
mds: Delay export on missing inodes for reconnect

The reconnect caps sent by the client on reconnect may not have
inodes found in the inode cache until after clientreplay (when
the client creates a new file, for example). Currently, we send an
export for that cap to the client if we don't see an inode in the cache
and path_is_mine() returns false (for example, if the client didn't
send a path because the file was already unlinked).
Instead, we want to delay handling of the reconnect cap until
clientreplay completes.

This patch modifies handle_client_reconnect() so that we don't assume
the cap isn't ours if we don't have an inode for it, but instead delay
recovery for later. An export cap message is only sent if the inode exists
and the cap isn't ours (non-auth) during reconnect. If any remaining
recovered caps exist in the recovered list once the mds goes active, we
send export messages at that point.

Also, after removing the path_is_mine check,
MDCache::parallel_fetch_traverse_dir() needs to skip non-auth dirfrags.

Fixes #4451.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoclient: Unify session close handling
Sam Lang [Thu, 4 Apr 2013 20:59:56 +0000 (15:59 -0500)]
client:  Unify session close handling

If mds failure causes client reconnect while the
client is unmounting, the client will send a session
close request to the mds even if there are outstanding
inodes in the cache waiting to receive flush_acks.   This
causes the mds to send back a session close message and
the client closes the connection, so that when the mds tries
to send flush acks back to the client, they get dropped, resulting
in the client hanging on unmount.  The pattern for this bug is:

1. mds restart
2. client sends session open request
3. client unmount sets unmounting flag and waits for flush_acks
4. mds sends session open reply
5. client sends session close request (because it's unmounting)
6. mds sends session close, client closes connection
7. mds tries to send flush_acks, but drops them because the connection
is gone

This patch unifies the session close handling so that the client
only sends a session close in unmount once all flush acks have been
received.  If the mds restarts during session close, the reconnect
logic will kick the session close waiter so that session close requests
are re-sent for session close replies not yet received.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
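
The rule described in the commit above (send the session close only once
every flush ack has arrived, and re-send it after a reconnect if no reply
came back) can be illustrated with a small Python sketch. This is not the
Ceph client code, which is C++; the class name SessionCloser and all of its
methods and fields are hypothetical and exist only for this illustration.

```python
class SessionCloser:
    """Hypothetical model of the unified session-close rule."""

    def __init__(self, inodes_awaiting_flush_ack):
        self.awaiting = set(inodes_awaiting_flush_ack)
        self.unmounting = False
        self.close_requests_sent = 0
        self.close_reply_received = False

    def _maybe_send_close(self):
        # Send the first close only when unmounting AND no flush acks remain.
        if self.unmounting and not self.awaiting and self.close_requests_sent == 0:
            self.close_requests_sent += 1

    def unmount(self):
        self.unmounting = True
        self._maybe_send_close()

    def handle_flush_ack(self, ino):
        self.awaiting.discard(ino)
        self._maybe_send_close()

    def handle_close_reply(self):
        self.close_reply_received = True

    def handle_reconnect(self):
        # After an mds restart, re-send a close that was never answered.
        if self.close_requests_sent > 0 and not self.close_reply_received:
            self.close_requests_sent += 1


session = SessionCloser({101, 102})
session.unmount()              # flush acks outstanding: no close is sent yet
session.handle_flush_ack(101)
session.handle_flush_ack(102)  # last ack arrives: the close goes out now
```

With this ordering the close request can no longer overtake the flush acks,
which is the hang the seven-step pattern above describes.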
12 years agoLibrbdWriteback: complete writes strictly in order 214/head
Josh Durgin [Wed, 10 Apr 2013 21:16:56 +0000 (14:16 -0700)]
LibrbdWriteback:  complete writes strictly in order

RADOS returns writes to the same object in the same order. The
ObjectCacher relies on this assumption to make sure previous writes
are complete and maintain consistency. Reads, however, may be
reordered with respect to each other. When writing to an rbd clone,
reads to the parent must be performed when the object does not exist
in the child yet. These reads may be reordered, resulting in the
original writes being reordered. This breaks the assumptions of the
ObjectCacher, causing an assert to fail.

To fix this, keep a per-object queue of outstanding writes to an
object in the LibrbdWriteback handler, and finish them in the order in
which they were sent.

Fixes: #4531
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
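
The per-object queue described in the commit above can be sketched in a few
lines of Python. This is an illustration of the idea only, not the
LibrbdWriteback implementation (which is C++); the class OrderedWriteTracker
and its method names are made up for this example.

```python
from collections import defaultdict, deque


class OrderedWriteTracker:
    """Hypothetical per-object queue that finishes writes in send order."""

    def __init__(self):
        # object name -> deque of [tid, done, callback], in the order sent
        self._queues = defaultdict(deque)

    def start_write(self, oid, tid, callback):
        self._queues[oid].append([tid, False, callback])

    def complete_write(self, oid, tid):
        queue = self._queues[oid]
        for entry in queue:
            if entry[0] == tid:
                entry[1] = True
                break
        # Fire callbacks only from the front of the queue, so a write that
        # finishes early waits for every write sent before it.
        while queue and queue[0][1]:
            _, _, callback = queue.popleft()
            callback()
        if not queue:
            del self._queues[oid]


finished = []
tracker = OrderedWriteTracker()
tracker.start_write("obj1", 1, lambda: finished.append(1))
tracker.start_write("obj1", 2, lambda: finished.append(2))
tracker.complete_write("obj1", 2)  # arrives early and is held back
tracker.complete_write("obj1", 1)  # releases write 1, then write 2
```

Writes to different objects use separate queues, so one slow parent read
only delays completions for the object it belongs to.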
12 years agoOSD: make pg upgrade logging quiet
Samuel Just [Wed, 10 Apr 2013 21:13:12 +0000 (14:13 -0700)]
OSD: make pg upgrade logging quiet

Fixes: #4701
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoMerge branch 'wip_4654' into next
Samuel Just [Wed, 10 Apr 2013 21:00:13 +0000 (14:00 -0700)]
Merge branch 'wip_4654' into next

Fixes: #wip_4654
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agorbd qa/workunits: add rbd read data test
Alex Elder [Wed, 10 Apr 2013 20:44:01 +0000 (15:44 -0500)]
rbd qa/workunits: add rbd read data test

This adds a new test script that validates that data read from a mapped
rbd image is what it is expected to be.

See the content of the file for a bit more explanation.

Signed-off-by: Alex Elder <elder@inktank.com>
12 years agorgw_admin: Create keys for a new user by default.
caleb miles [Wed, 10 Apr 2013 19:00:06 +0000 (15:00 -0400)]
rgw_admin: Create keys for a new user by default.

Create a new key pair for new users or when --gen-access-key is specified.

Signed-off-by: caleb miles <caleb.miles@inktank.com>
12 years agoFileJournal: start_seq is seq+1 if journalq.empty()
Samuel Just [Tue, 9 Apr 2013 22:14:19 +0000 (15:14 -0700)]
FileJournal: start_seq is seq+1 if journalq.empty()

This is also the same as journaled_seq + 1 for writeahead
journaling, but not for parallel journaling.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoFileJournal: fix off by one error in committed_thru
Samuel Just [Tue, 9 Apr 2013 22:13:38 +0000 (15:13 -0700)]
FileJournal: fix off by one error in committed_thru

journalq.front().first is the sequence number of the entry
at journalq.front().second.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoJournal: commits may not include all journaled seqs
Samuel Just [Tue, 9 Apr 2013 21:53:52 +0000 (14:53 -0700)]
Journal: commits may not include all journaled seqs

At one point, a commit had to drain the FileStore op
queue.  This is no longer the case.  Consequently, the
journal may have to wait more than one commit for the
filestore to create a stable commit point at a particular
sequence.  Handling this requires two changes:

1) We cannot transition to FULL_WAIT until we receive
a commit_start on a seq >= journaled_seq.
2) We cannot remove the journal completion plug until we get
a committed_thru on a seq >= header.start_seq at least as
new as the oldest committed item in the journal.  If on
replay, the journal does not include fs_op_seq, we ignore
it, which is fine since we won't have reported those
entries committed!

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agoJournal: pass the sequence number to commit_start
Samuel Just [Tue, 9 Apr 2013 21:18:51 +0000 (14:18 -0700)]
Journal: pass the sequence number to commit_start

A subsequent patch will need to see the committing seq.

Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years agomds: fix session_info_t decoding 213/head
Yan, Zheng [Fri, 5 Apr 2013 05:58:36 +0000 (13:58 +0800)]
mds: fix session_info_t decoding

commit 0bcf2ac081 changes session_info_t's format, but there is
a typo in the code that decodes the old format. We also need to
handle struct_v == 1, which had the same encoding but without
the size guards (which is all handled by DECODE_START_LEGACY_COMPAT).

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agoLibrbdWriteback: removed unused and undefined method
Josh Durgin [Wed, 10 Apr 2013 19:22:02 +0000 (12:22 -0700)]
LibrbdWriteback: removed unused and undefined method

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoLibrbdWriteback: use a tid_t for tids
Josh Durgin [Wed, 10 Apr 2013 19:06:36 +0000 (12:06 -0700)]
LibrbdWriteback: use a tid_t for tids

An int could be much smaller, leading to overflow and bad behavior.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoWritebackHandler: make read return nothing
Josh Durgin [Wed, 10 Apr 2013 19:03:04 +0000 (12:03 -0700)]
WritebackHandler: make read return nothing

The tid returned by reads is ignored, and would make tracking writes
internally more difficult by using the same id-space as them. Make read
void and update all implementations.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoObjectCacher: deduplicate final part of flush_set()
Josh Durgin [Mon, 1 Apr 2013 21:51:46 +0000 (14:51 -0700)]
ObjectCacher: deduplicate final part of flush_set()

Both versions of flush_set() did the same thing. Move it into a
helper called from both.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agotest_stress_watch: remove bogus asserts
Josh Durgin [Wed, 10 Apr 2013 18:35:46 +0000 (11:35 -0700)]
test_stress_watch: remove bogus asserts

There's no reason to check the duration of a watch. The notify will
time out after 30s on the OSD, but there's no guarantee the client will
see that in any bounded time. This test is really meant as a stress
test of the OSDs anyway, not of the clients, so just remove asserts
about operation duration.

Fixes: #4591
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
12 years agotest: update rbd formatted-output for progress changes
Josh Durgin [Wed, 10 Apr 2013 17:43:13 +0000 (10:43 -0700)]
test: update rbd formatted-output for progress changes

Progress output now goes to stderr instead of stdout.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoMerge branch 'wip-journaler-4618' into next
Greg Farnum [Tue, 9 Apr 2013 23:00:41 +0000 (16:00 -0700)]
Merge branch 'wip-journaler-4618' into next

Reviewed-by: Sam Lang <sam.lang@inktank.com>
12 years agoconfig: fix osd_client_message_cap comment
Greg Farnum [Tue, 9 Apr 2013 19:11:27 +0000 (12:11 -0700)]
config: fix osd_client_message_cap comment

Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agoMerge remote-tracking branch 'origin/wip-osd-throttle2' into next
Greg Farnum [Tue, 9 Apr 2013 19:11:15 +0000 (12:11 -0700)]
Merge remote-tracking branch 'origin/wip-osd-throttle2' into next

Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoFileJournal: clarify meaning of start_seq and fix initialization
Samuel Just [Tue, 9 Apr 2013 17:27:50 +0000 (10:27 -0700)]
FileJournal: clarify meaning of start_seq and fix initialization

Second guessing the first sequence number from the FileStore
was silly and broke tests which had the temerity to start at
1 instead of 2...

Fixes: #4687
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoRevert "global: call config observers on global_init (and start logging!)"
Greg Farnum [Tue, 9 Apr 2013 01:20:53 +0000 (18:20 -0700)]
Revert "global: call config observers on global_init (and start logging!)"

This reverts commit a30917746614275baeb718e902133f06ef44fba6. This commit
includes calls that involve Mutexes, Lockers, and lockdep -- which isn't
yet set up, so things break horribly. A more subtle approach is required.

Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agomon: Use _daemon version of argparse functions
Dan Mick [Mon, 8 Apr 2013 20:52:32 +0000 (13:52 -0700)]
mon: Use _daemon version of argparse functions

Allow argparse functions to fail if no argument given by using
special versions that avoid the default CLI behavior of "cerr/exit"

Fixes: #4678
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agoceph_argparse: add _daemon versions of argparse calls
Dan Mick [Mon, 8 Apr 2013 20:49:22 +0000 (13:49 -0700)]
ceph_argparse: add _daemon versions of argparse calls

mon needs to call argparse for a couple of -- options, and the
argparse_witharg routines were attempting to cerr/exit on missing
arguments.  This is appropriate for the CLI usage, but not the daemon
usage.  Add a 'cli' flag that can be set false for the daemon usage
(and cause the parsing routine to return false instead of exit).

The daemon's parsing code is due for a rewrite soon.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
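
The 'cli' flag described in the commit above can be illustrated with a short
Python sketch. It is not the ceph_argparse code (which is C++); the function
name parse_option_with_value, its return shape, and the error text are
assumptions made for this example.

```python
import sys


def parse_option_with_value(args, i, option, cli=True):
    """Hypothetical parser for `option VALUE` at position i.

    Returns (True, value) on a match, (False, None) otherwise. When the
    value is missing, CLI callers get an error and an exit; daemon callers
    (cli=False) just get (False, None) and decide for themselves.
    """
    if args[i] != option:
        return (False, None)
    if i + 1 >= len(args):
        if cli:
            print(f"Option {option} requires an argument.", file=sys.stderr)
            sys.exit(1)
        return (False, None)
    return (True, args[i + 1])


# Daemon-style call: a missing argument is reported, not fatal.
ok, value = parse_option_with_value(["--name"], 0, "--name", cli=False)
```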
12 years agoPipe: call discard_requeued_up_to under pipe_lock
Samuel Just [Mon, 8 Apr 2013 22:43:53 +0000 (15:43 -0700)]
Pipe: call discard_requeued_up_to under pipe_lock

Fixes: #4627
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoMerge pull request #202 from ceph/wip-log-boot
Gregory Farnum [Mon, 8 Apr 2013 22:53:30 +0000 (15:53 -0700)]
Merge pull request #202 from ceph/wip-log-boot

Fixes #4676.

Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agojournaler: remove the unused prefetch_from member variable
Greg Farnum [Mon, 8 Apr 2013 21:09:23 +0000 (14:09 -0700)]
journaler: remove the unused prefetch_from member variable

Signed-off-by: Greg Farnum <greg@inktank.com>
12 years agoMerge pull request #206 from ceph/wip-4660
Gregory Farnum [Mon, 8 Apr 2013 18:18:53 +0000 (11:18 -0700)]
Merge pull request #206 from ceph/wip-4660

mds: Keep LogSegment ref for openc backtrace

Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agomds: Keep LogSegment ref for openc backtrace 206/head
Sam Lang [Mon, 8 Apr 2013 14:09:41 +0000 (09:09 -0500)]
mds: Keep LogSegment ref for openc backtrace

The MDRequest is destroyed once the client reply is sent, but
we need the reference to the LogSegment for updating the backtrace, so
store a temporary ref to the LogSegment for later.

Fixes #4660.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
12 years agomds: fix journaler to set temp_fetch_len appropriately and read the requested amount
Greg Farnum [Mon, 8 Apr 2013 16:10:35 +0000 (09:10 -0700)]
mds: fix journaler to set temp_fetch_len appropriately and read the requested amount

The _prefetch() function interprets temp_fetch_len as
the amount of data we need from read_pos, which is the beginning
of read_buf. So by setting it to the amount *more* we needed, we were
getting stuck forever if we actually hit this condition. Fix it by
setting temp_fetch_len based on the amount of data we need in aggregate.

Furthermore, we were previously rounding *down* the requested amount in
order to read only full log segments. Round up instead!

Fixes #4618

Signed-off-by: Greg Farnum <greg@inktank.com>
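
The difference between rounding the requested amount down and rounding it up
is plain integer arithmetic. The Python sketch below shows it in isolation;
it is not the Journaler code, and the function names and the segment_size
parameter are hypothetical.

```python
def round_down(needed_bytes, segment_size):
    """Largest multiple of segment_size that is <= needed_bytes."""
    return (needed_bytes // segment_size) * segment_size


def round_up(needed_bytes, segment_size):
    """Smallest multiple of segment_size that is >= needed_bytes."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return ((needed_bytes + segment_size - 1) // segment_size) * segment_size


# Needing 5 bytes with 4-byte segments: rounding down fetches only 4 bytes,
# which can never satisfy the request; rounding up fetches 8.
short_read = round_down(5, 4)
full_read = round_up(5, 4)
```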
12 years agoglobal: call config observers on global_init (and start logging!) 202/head
Sage Weil [Sun, 7 Apr 2013 16:06:23 +0000 (09:06 -0700)]
global: call config observers on global_init (and start logging!)

Currently we don't start logging on daemon startup unless the log_file
parameter was adjusted by ceph.conf.  Instead, we should call all config
observers so that the logging subsystem is fully configured and we log
even prior to the daemonize and common_init_finish (when we call observers
again).  This fixes logging for the initial period before we daemonize.
For some of the daemons (osd, mon), this includes significant work.  It
also fixes the problem where users don't see the 'ceph version ...' banner
on daemon start.

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoglobal: flush log before stopping/starting on daemonize
Sage Weil [Sun, 7 Apr 2013 16:04:37 +0000 (09:04 -0700)]
global: flush log before stopping/starting on daemonize

Ensure that we push log data out before we restart logging.  This may not
be strictly necessary, but it avoids a whole class of possible pitfalls.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: make 'osd crush move ...' idempotent
Sage Weil [Sat, 6 Apr 2013 20:54:10 +0000 (13:54 -0700)]
mon: make 'osd crush move ...' idempotent

If we don't need to move the item, return success.

Signed-off-by: Sage Weil <sage@inktank.com>
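
Idempotent here means that repeating the command is harmless. A minimal
Python sketch of that behaviour follows; it is not the monitor code, and the
function crush_move, the flat locations dictionary, and the error value are
assumptions made for illustration.

```python
import errno


def crush_move(locations, item, new_parent):
    """Hypothetical move: returns 0 on success, a negative errno on failure."""
    if item not in locations:
        return -errno.ENOENT
    if locations[item] == new_parent:
        # Already where the caller asked: nothing to do, but still success.
        return 0
    locations[item] = new_parent
    return 0


locations = {"osd.0": "host-a"}
first = crush_move(locations, "osd.0", "host-b")   # performs the move
second = crush_move(locations, "osd.0", "host-b")  # no-op, same result
```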
12 years agolibrbd: fix DiffIterateStress again
Sage Weil [Sat, 6 Apr 2013 16:37:52 +0000 (09:37 -0700)]
librbd: fix DiffIterateStress again

- fix seed
- the array indices are points in time; no need to subtract one from i!
- pick a random seed and print it to stdout

I ran this with several different seeds without failure, so I am confident
we are in good shape.  And if we ever get a future failure, we'll have the
seed to reproduce.

Signed-off-by: Sage Weil <sage@inktank.com>
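
Printing the seed is what makes a randomized test reproducible. The Python
sketch below shows the pattern; the actual test is C++, and the helper name
make_rng and its output format are hypothetical.

```python
import random


def make_rng(seed=None):
    """Pick a seed if none is given, print it, and return (seed, rng)."""
    if seed is None:
        seed = random.SystemRandom().randrange(2**32)
    print(f"seed {seed}")
    return seed, random.Random(seed)


# Normal run: a fresh seed is chosen and logged.
seed, rng = make_rng()
first_run = [rng.randrange(1000) for _ in range(5)]

# Reproducing a failure: pass the logged seed back in.
_, replay = make_rng(seed)
second_run = [replay.randrange(1000) for _ in range(5)]
```

Both runs draw the same sequence, so a failure seen once can be replayed
from the seed in the log.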
12 years agoosd: throttle client messages by count, not just by bytes 201/head
Sage Weil [Thu, 4 Apr 2013 04:59:16 +0000 (21:59 -0700)]
osd: throttle client messages by count, not just by bytes

This lets us put a cap on outstanding client IOs.  This is particularly
important for clients issuing lots of small IOs.

Fixes: #4579
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomsgr: add second per-message throttler to message policy
Sage Weil [Thu, 4 Apr 2013 04:30:51 +0000 (21:30 -0700)]
msgr: add second per-message throttler to message policy

We already have a throttler that lets us limit the amount of memory
consumed by messages from a given source.  Currently this is based only
on the size of the message payload.  Add a second throttler that limits
the number of messages so that we can effectively throttle small requests
as well.

Signed-off-by: Sage Weil <sage@inktank.com>
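
The effect of pairing a byte throttler with a message-count throttler can be
sketched in Python. This is a simplified illustration, not the messenger
code: the real throttlers are C++, and this sketch rejects a message where
a real throttler may wait instead. The names Throttle, MessagePolicy,
try_admit, and release are all hypothetical.

```python
class Throttle:
    """Hypothetical counter with an upper limit."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def try_get(self, amount):
        if self.current + amount > self.limit:
            return False
        self.current += amount
        return True

    def put(self, amount):
        self.current -= amount


class MessagePolicy:
    """Admits a message only if both the count and the byte budget allow it."""

    def __init__(self, max_bytes, max_messages):
        self.byte_throttle = Throttle(max_bytes)
        self.msg_throttle = Throttle(max_messages)

    def try_admit(self, payload_len):
        if not self.msg_throttle.try_get(1):
            return False
        if not self.byte_throttle.try_get(payload_len):
            self.msg_throttle.put(1)  # undo the count we just took
            return False
        return True

    def release(self, payload_len):
        self.byte_throttle.put(payload_len)
        self.msg_throttle.put(1)


# Tiny messages never approach the byte limit, but the count limit stops them.
policy = MessagePolicy(max_bytes=1_000_000, max_messages=2)
results = [policy.try_admit(10) for _ in range(3)]
```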
12 years agolibrbd: fix DiffIterateStress test
Sage Weil [Sat, 6 Apr 2013 05:28:38 +0000 (22:28 -0700)]
librbd: fix DiffIterateStress test

If we write to an interval that didn't previously exist and then discard
it so that it again doesn't exist, all during the same interval, then we
should not include it in the 'written' set (or exists set, obviously).

Similarly, when we go to look at a merged diff, we can ignore extents
that were written (and possibly zeroed) if they neither existed before nor
after.

Bump up the iteration count to get more confidence that this is actually
correct.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agorgw: translate object marker to raw format
Yehuda Sadeh [Sun, 31 Mar 2013 07:02:15 +0000 (00:02 -0700)]
rgw: translate object marker to raw format

Fixes: #4600
Object marker should be treated as an object, so that name is formatted
correctly when getting the raw oid.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agoAllow creation of buckets starting with underscore in RGW
caleb miles [Fri, 5 Apr 2013 16:31:56 +0000 (09:31 -0700)]
Allow creation of buckets starting with underscore in RGW

Signed-off-by: caleb miles <caleb.miles@inktank.com>

12 years agoMerge pull request #198 from dalgaaf/wip-da-spec
Gary Lowell [Fri, 5 Apr 2013 16:46:27 +0000 (09:46 -0700)]
Merge pull request #198 from dalgaaf/wip-da-spec

Fix some install and rpm SPEC issues

Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
12 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Fri, 5 Apr 2013 05:22:43 +0000 (22:22 -0700)]
Merge remote-tracking branch 'gh/next'

12 years agoFileJournal: introduce start_seq header entry
Samuel Just [Wed, 3 Apr 2013 22:44:39 +0000 (15:44 -0700)]
FileJournal: introduce start_seq header entry

FileStore::header_t::start_seq now encodes the op seq which may be
written at FileStore::header_t::start.  This way, FileStore::open()
can pass a valid sequence number to read_entry for validation.
Otherwise, read_entry has no way of knowing whether a failure of a
read at header.start indicates that the journal was empty, or that
the entry is corrupt.  With start_seq, read_entry can assume
corruption if start_seq <= committed_up_to.

Fixes: #4527
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
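
The decision that start_seq makes possible can be written as a single
comparison. The Python sketch below is an illustration only: the function
name and return strings are hypothetical, and treating the remaining case
as an empty journal is an inference from the commit message, not something
it states outright.

```python
def classify_failed_read_at_start(start_seq, committed_up_to):
    """Interpret a failed read of the entry at header.start (hypothetical)."""
    if start_seq <= committed_up_to:
        # An entry that was already committed should be readable, so a
        # failed read here points at corruption.
        return "corrupt"
    # Assumption: nothing at or after start_seq was committed, so the
    # journal may simply be empty.
    return "possibly empty"


verdict = classify_failed_read_at_start(start_seq=7, committed_up_to=9)
```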
12 years agoFileJournal: fill in committed_up_to for old headers
Samuel Just [Wed, 3 Apr 2013 22:19:35 +0000 (15:19 -0700)]
FileJournal: fill in committed_up_to for old headers

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agodebian/ceph-test.install: add installed but not packaged files 198/head
Danny Al-Gaaf [Thu, 4 Apr 2013 16:38:11 +0000 (18:38 +0200)]
debian/ceph-test.install: add installed but not packaged files

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph.spec.in: add installed but not packaged files to ceph-test
Danny Al-Gaaf [Thu, 4 Apr 2013 16:30:40 +0000 (18:30 +0200)]
ceph.spec.in: add installed but not packaged files to ceph-test

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph.spec.in: remove some twice created directories
Danny Al-Gaaf [Thu, 4 Apr 2013 16:27:13 +0000 (18:27 +0200)]
ceph.spec.in: remove some twice created directories

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph.spec.in: fix udev rules.d files handling
Danny Al-Gaaf [Thu, 4 Apr 2013 16:23:40 +0000 (18:23 +0200)]
ceph.spec.in: fix udev rules.d files handling

Move 50-rbd.rules into the ceph base package since the related
ceph-rbdnamer binary is part of this package. Use correct install
pattern.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph.spec.in: use macros for standard directories
Danny Al-Gaaf [Thu, 4 Apr 2013 16:21:31 +0000 (18:21 +0200)]
ceph.spec.in: use macros for standard directories

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph.spec.in: reorder and fix ceph file list
Danny Al-Gaaf [Thu, 4 Apr 2013 16:18:30 +0000 (18:18 +0200)]
ceph.spec.in: reorder and fix ceph file list

Reorder the file list of the ceph package. Fix handling of placeholder
directories and use directory macros like %{_localstatedir}
for /var.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoMerge pull request #176 from dachary/wip-4597
Sage Weil [Thu, 4 Apr 2013 15:52:56 +0000 (08:52 -0700)]
Merge pull request #176 from dachary/wip-4597

fix nspace assignment in LFNIndex::lfn_parse_object_name

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agodebian/ceph.install: no need to move files to usr/sbin/
Danny Al-Gaaf [Thu, 4 Apr 2013 14:00:18 +0000 (16:00 +0200)]
debian/ceph.install: no need to move files to usr/sbin/

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph.spec.in: don't move ceph-disk* and ceph-create-keys around
Danny Al-Gaaf [Thu, 4 Apr 2013 13:58:12 +0000 (15:58 +0200)]
ceph.spec.in: don't move ceph-disk* and ceph-create-keys around

Don't move these files around; they are now installed directly to
%{_sbindir}.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoMakefile.am: install ceph-* python scripts to /usr/bin directly
Danny Al-Gaaf [Thu, 4 Apr 2013 13:54:31 +0000 (15:54 +0200)]
Makefile.am: install ceph-* python scripts to /usr/bin directly

Install ceph-* scripts directly to $(prefix)$(sbindir) (which
would normally be /usr/sbin) instead of moving them around after
installation in the SPEC file or debian files.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph.spec.in: use %{_sbindir} instead of /usr/sbin
Danny Al-Gaaf [Thu, 4 Apr 2013 10:52:55 +0000 (12:52 +0200)]
ceph.spec.in: use %{_sbindir} instead of /usr/sbin

Use the %{_sbindir} macro, which points to /usr/sbin, instead of
the hard-coded path.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoMerge pull request #196 from ceph/wip-mon-crush2
Sage Weil [Wed, 3 Apr 2013 22:46:11 +0000 (15:46 -0700)]
Merge pull request #196 from ceph/wip-mon-crush2

Wip mon crush2

Reviewed-by: Dan Mick <dan.mick@inktank.com>
12 years agomon: fix crush unit tests for idempotency 196/head
Sage Weil [Wed, 3 Apr 2013 22:45:34 +0000 (15:45 -0700)]
mon: fix crush unit tests for idempotency

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #197 from ceph/wip-3266
Gregory Farnum [Wed, 3 Apr 2013 22:42:22 +0000 (15:42 -0700)]
Merge pull request #197 from ceph/wip-3266

mds: verify mds tell 'dumpcache <filename>' target does not exist

Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agomds: verify mds tell 'dumpcache <filename>' target does not exist 197/head
Sage Weil [Wed, 3 Apr 2013 22:32:51 +0000 (15:32 -0700)]
mds: verify mds tell 'dumpcache <filename>' target does not exist

Open target with O_CREAT|O_EXCL to ensure we don't overwrite some other
important file (like, say, /etc/passwd).  This is irritating because there
is no C++ ofstream equivalent for O_EXCL; kludge around it using
ostringstream instead.
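
A rough sketch of this approach, assuming a POSIX system. The helper name
write_new_file is hypothetical and this is not the MDS code itself: the
output is built in memory (for example with an ostringstream) and then
written through a file descriptor opened with O_CREAT|O_EXCL, so an
existing file is never overwritten.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: create `path` and write `contents` to it.
// Returns 0 on success or a negative errno value on failure;
// fails with -EEXIST if the target already exists.
int write_new_file(const std::string& path, const std::string& contents) {
  int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_EXCL, 0600);
  if (fd < 0)
    return -errno;
  std::size_t off = 0;
  while (off < contents.size()) {
    ssize_t r = ::write(fd, contents.data() + off, contents.size() - off);
    if (r < 0) {
      if (errno == EINTR)
        continue;  // interrupted by a signal, retry the write
      int err = errno;
      ::close(fd);
      return -err;
    }
    off += static_cast<std::size_t>(r);
  }
  if (::close(fd) < 0)
    return -errno;
  return 0;
}
```

A caller would fill a std::ostringstream with the dump and pass its str()
to the helper; a second call with the same path returns -EEXIST instead of
clobbering the file.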

Fixes: #3266
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: make 'osd crush unlink ..' idempotent
Sage Weil [Wed, 3 Apr 2013 22:04:00 +0000 (15:04 -0700)]
mon: make 'osd crush unlink ..' idempotent

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge pull request #191 from ceph/wip-4582b
Gregory Farnum [Wed, 3 Apr 2013 21:32:48 +0000 (14:32 -0700)]
Merge pull request #191 from ceph/wip-4582b

Fixes #4582.

Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agomds: do not go through handle_mds_failure for oneself
Greg Farnum [Wed, 3 Apr 2013 19:43:49 +0000 (12:43 -0700)]
mds: do not go through handle_mds_failure for oneself

A standby MDS can attempt the handle_mds_failure paths for itself, if
it sees the transition from up to down. This leads it to insert itself
into the resolve_gather set, which is bad. So check if the failed MDS
is the same as whoami, and abort if so. This fixes #4637.
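
The guard can be pictured roughly as follows. This is a simplified sketch
with a hypothetical MdsSketch type, not the real MDS classes; the only
parts taken from the message above are the whoami comparison and the
resolve_gather set.

```cpp
#include <set>

// Hypothetical, heavily simplified stand-in for the MDS state.
struct MdsSketch {
  int whoami;                    // rank of this MDS
  std::set<int> resolve_gather;  // peers we are waiting on

  // Returns true if the failure was processed, false if it was
  // ignored because the "failed" MDS is this daemon itself.
  bool handle_mds_failure(int who) {
    if (who == whoami)
      return false;  // never add ourselves to resolve_gather
    resolve_gather.insert(who);
    return true;
  }
};
```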

Signed-off-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
12 years agoMerge pull request #194 from ceph/wip-rbd-diff
Josh Durgin [Wed, 3 Apr 2013 19:16:50 +0000 (12:16 -0700)]
Merge pull request #194 from ceph/wip-rbd-diff

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoMerge pull request #195 from dalgaaf/wip-da-fix-make
Sage Weil [Wed, 3 Apr 2013 18:34:05 +0000 (11:34 -0700)]
Merge pull request #195 from dalgaaf/wip-da-fix-make

Makefile.am: fix build of ceph_test_cors

Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoMakefile.am: fix build of ceph_test_cors 195/head
Danny Al-Gaaf [Wed, 3 Apr 2013 18:04:02 +0000 (20:04 +0200)]
Makefile.am: fix build of ceph_test_cors

Fix build of ceph_test_cors: use $(CRYPTO_LIBS) instead of -lcryptopp.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoMerge pull request #192 from ceph/wip-mon-disk-warn
João Eduardo Luís [Wed, 3 Apr 2013 17:59:00 +0000 (10:59 -0700)]
Merge pull request #192 from ceph/wip-mon-disk-warn

mon: limit warnings about low mon disk space

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agoMerge pull request #193 from ceph/wip-mon-crush
Dan Mick [Wed, 3 Apr 2013 17:56:43 +0000 (10:56 -0700)]
Merge pull request #193 from ceph/wip-mon-crush

mon: make 'osd crush rm|unlink ...' idempotent

Reviewed-by: Dan Mick <dan.mick@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip-4623'
Sage Weil [Wed, 3 Apr 2013 17:26:30 +0000 (10:26 -0700)]
Merge remote-tracking branch 'gh/wip-4623'

12 years agoclient: Kick waiters for max size 191/head
Sam Lang [Mon, 1 Apr 2013 14:06:59 +0000 (09:06 -0500)]
client: Kick waiters for max size

If the mds restarts without successfully logging a max size
cap update, the client waits indefinitely in Client::get_caps
on the waitfor_caps list.  So when the client gets an mds map
indicating a new active mds has replaced a down mds, we need to
kick the caps update request.  This patch mimics the behavior
in the kernel by setting the wanted_max_size
and requested_max_size to 0 and waking up the waiters.
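
As an illustration only, the kick could look something like the sketch
below. InodeSketch and kick_maxsize_request are hypothetical names and the
waiter list is modelled as plain callbacks; the real client uses its own
inode and waiter types.

```cpp
#include <cstdint>
#include <functional>
#include <list>

// Hypothetical stand-in for the client's per-inode state.
struct InodeSketch {
  uint64_t wanted_max_size = 0;
  uint64_t requested_max_size = 0;
  std::list<std::function<void()>> waitfor_caps;  // blocked callers
};

// Reset the max size bookkeeping so the request is re-sent to the
// new active mds, then wake everyone waiting on waitfor_caps.
inline void kick_maxsize_request(InodeSketch& in) {
  in.wanted_max_size = 0;
  in.requested_max_size = 0;
  std::list<std::function<void()>> ready;
  ready.swap(in.waitfor_caps);  // detach the list before waking waiters
  for (auto& wake : ready)
    wake();
}
```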

Fixes #4582.
Signed-off-by: Sam Lang <sam.lang@inktank.com>