git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years agolibrbd: fix delete[]
Sage Weil [Sat, 25 Aug 2012 02:36:44 +0000 (19:36 -0700)]
librbd: fix delete[]

CID 716902: Non-array delete for scalars (DELETE_ARRAY)
At (15): Deleting array variable "buf" with non-array delete in "delete buf".

Signed-off-by: Sage Weil <sage@inktank.com>
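
For illustration only, a minimal sketch of the rule behind this Coverity finding: memory obtained with `new[]` has to be released with `delete[]`. The helper name `count_zero_bytes` is hypothetical and is not taken from librbd.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: allocates a zero-filled scratch buffer with new[],
// counts its zero bytes, and releases it with the matching delete[].
std::size_t count_zero_bytes(std::size_t len) {
  char *buf = new char[len];   // array form of new
  std::memset(buf, 0, len);
  std::size_t zeros = 0;
  for (std::size_t i = 0; i < len; ++i)
    if (buf[i] == 0)
      ++zeros;
  assert(zeros == len);
  delete[] buf;                // array delete; plain "delete buf" is the bug
  return zeros;
}
```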
12 years agodoc: clarify rbd man page (esp. layering)
Josh Durgin [Thu, 30 Aug 2012 00:30:17 +0000 (17:30 -0700)]
doc: clarify rbd man page (esp. layering)

* a clone's size can't be overridden
* note which commands require format 2
* clarify details of copy
* add examples for cloning
* add pool to map example for consistency
* fix a couple warnings and re-sync man page with rst

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agorbd: add --format option
Josh Durgin [Wed, 29 Aug 2012 17:58:30 +0000 (10:58 -0700)]
rbd: add --format option

This chooses whether to use the original (supported by krbd)
or the new (supports layering) format.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
12 years agolibrbd: prevent racing clone and snap unprotect
Josh Durgin [Wed, 29 Aug 2012 00:24:47 +0000 (17:24 -0700)]
librbd: prevent racing clone and snap unprotect

If the following sequence of events occurred,
a clone could be created of an unprotected snapshot:

1. A: begin clone - check that snap foo is protected
2. B: rbd unprotect snap foo
3. B: check that all pools have no clones of foo
4. B: unprotect snap foo
5. A: finish creating clone of foo, add it as a child

To stop this from happening, check at the beginning and end of
cloning that the parent snapshot is protected. If it is not,
or checking protection status fails (possibly because the parent
snapshot was removed), remove the clone and return an error.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
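
The check-at-both-ends idea described above can be sketched as follows. This is an in-memory model, not the librbd implementation: `ParentSnap`, `check_protected`, and `clone_snap` are hypothetical names, and the `between` callback stands in for whatever a concurrent client does during the race window. The `-EINVAL` result for an unprotected parent follows the clone return codes mentioned elsewhere in this log.

```cpp
#include <cassert>
#include <cerrno>
#include <set>
#include <string>

// Hypothetical in-memory stand-in for a parent snapshot.
struct ParentSnap {
  bool exists = true;
  bool is_protected = false;
  std::set<std::string> children;
};

// Returns 0 if the snap is protected, -EINVAL if it is not,
// and -ENOENT if the snapshot has been removed.
int check_protected(const ParentSnap &snap) {
  if (!snap.exists)
    return -ENOENT;
  return snap.is_protected ? 0 : -EINVAL;
}

// Clone with a protection check at the beginning and at the end.
template <typename F>
int clone_snap(ParentSnap &parent, const std::string &child, F between) {
  assert(!child.empty());
  int r = check_protected(parent);   // first check
  if (r < 0)
    return r;
  between();                         // a racing unprotect may run here
  parent.children.insert(child);     // finish the clone, register the child
  r = check_protected(parent);       // second check
  if (r < 0) {
    parent.children.erase(child);    // remove the clone again
    return r;
  }
  return 0;
}
```

With a no-op callback the clone succeeds; if the callback clears `is_protected`, the second check fails and the child is rolled back.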
12 years agorbd: add "children" command, update cli test files
Dan Mick [Tue, 21 Aug 2012 23:07:25 +0000 (16:07 -0700)]
rbd: add "children" command, update cli test files

Fixes: #2720
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agolibrbd: add {rbd_}list_children() methods
Dan Mick [Tue, 21 Aug 2012 22:58:21 +0000 (15:58 -0700)]
librbd: add {rbd_}list_children() methods

These iterate over all pools and check for children of a
particular snapshot.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
12 years agoReplicatedPG: do not start_recovery_op if we are already pushing
Samuel Just [Tue, 11 Sep 2012 18:05:40 +0000 (11:05 -0700)]
ReplicatedPG: do not start_recovery_op if we are already pushing

Should fix bug #2761.

If we are already pushing soid, recovery_ops will only be decremented once for
all current pushes, so only increment recovery_ops if we are not currently
pushing it.

This bug causes us to leak a recovery op and get stuck in backfill.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
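
A rough model of the accounting rule described above, assuming a single counter and a set of objects with pushes in flight. `RecoveryState`, `start_push`, and `finish_pushes` are hypothetical names, not the ReplicatedPG interface.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of the recovery-op counter.
struct RecoveryState {
  int recovery_ops = 0;
  std::set<std::string> pushing;  // objects that already have pushes in flight

  // Count a recovery op only for the first push of a given object.
  void start_push(const std::string &soid) {
    if (pushing.insert(soid).second)  // true only if soid was not present
      ++recovery_ops;
  }

  // All pushes for soid finish together and decrement the counter once.
  void finish_pushes(const std::string &soid) {
    if (pushing.erase(soid) > 0) {
      --recovery_ops;
      assert(recovery_ops >= 0);
    }
  }
};
```

Incrementing on every push while decrementing once per object would leave the counter above zero, which is the leak the commit message describes.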
12 years agoosd: fill in user log entry last after snapdir tran
Sage Weil [Tue, 4 Sep 2012 22:25:20 +0000 (15:25 -0700)]
osd: fill in user log entry last after snapdir tran

Reorder the snapdir logic and ctx->at_version adjustments prior to filling
in the object_info_t and user_versions and all that stuff.  Adjust
at_version after appending the log entry (so that it points to the next
position/version we will write at.. culminating in the actual user
event).

The user log entry contains the request id, which will be used
by replay ops to put themselves in the correct place in the
waiting_for_commit/ack maps.  Thus, the repop needs to be tagged
with the same version as the log entry with the request id.
Thus, the request id bearing log entry should be the last in
the log entry vector.

This should fix #3072, wherein a replay which should wait on
the repop tagged as version '36 will instead wait on '35.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoosd: fix waiting_for_disk assertion
Sage Weil [Tue, 28 Aug 2012 22:14:41 +0000 (15:14 -0700)]
osd: fix waiting_for_disk assertion

If requeue is false, we won't have cleared out waiting_for_ondisk; adjust
assert placement as appropriate.  Also, make sure we handle the requeue
and !op case properly (although I'm not sure offhand if/when it would
come up).

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agorados_bench: wait for completion callbacks before returning
Mike Ryan [Tue, 28 Aug 2012 18:57:03 +0000 (11:57 -0700)]
rados_bench: wait for completion callbacks before returning

If we don't wait for the callback, the finisher may cleanup the callback
context before the callback is actually invoked, causing a
use-after-free error.

This fixes #3048.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
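
One common way to wait for outstanding callbacks is a counter guarded by a mutex and a condition variable. The sketch below shows that general pattern only; `InFlight` and its methods are hypothetical and do not reflect how rados_bench is actually written.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical tracker for operations whose callbacks have not run yet.
class InFlight {
  std::mutex lock;
  std::condition_variable cond;
  int outstanding = 0;

public:
  // Call before submitting an asynchronous operation.
  void start() {
    std::lock_guard<std::mutex> l(lock);
    ++outstanding;
  }

  // Call from the completion callback.
  void complete() {
    std::lock_guard<std::mutex> l(lock);
    --outstanding;
    cond.notify_all();
  }

  // Block until every started operation has completed.
  void wait_all() {
    std::unique_lock<std::mutex> l(lock);
    cond.wait(l, [this] { return outstanding == 0; });
  }

  int pending() {
    std::lock_guard<std::mutex> l(lock);
    return outstanding;
  }
};
```

Calling `wait_all()` before returning keeps the callback context alive until the last callback has run.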
12 years agoMerge branch 'wip-objecter' into next
Sage Weil [Tue, 28 Aug 2012 00:26:13 +0000 (17:26 -0700)]
Merge branch 'wip-objecter' into next

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years agoobjecter: fix skipped map handling
Sage Weil [Mon, 27 Aug 2012 15:24:08 +0000 (08:24 -0700)]
objecter: fix skipped map handling

If we skip a map, we want to translate NO_ACTION to NEED_RESEND, but leave
POOL_DNE alone.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoobjecter: send queued requests when we get first osdmap
Sage Weil [Mon, 27 Aug 2012 14:38:34 +0000 (07:38 -0700)]
objecter: send queued requests when we get first osdmap

If we get our first osdmap and already have requests queued, send them.

Fixes: #3050
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoobjecter: fix is_latest_map() retry on mon session restart
Sage Weil [Mon, 27 Aug 2012 04:21:44 +0000 (21:21 -0700)]
objecter: fix is_latest_map() retry on mon session restart

If the mon session drops, we get an EAGAIN callback, which we already
correctly ignored.  (Clean this up and comment so it's clearer what is
going on.)

Fix ms_handle_connect() to resubmit those requests.

Noticed while fixing #3049.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomonclient: pass EAGAIN to is_latest_map() callers
Sage Weil [Mon, 27 Aug 2012 04:17:05 +0000 (21:17 -0700)]
monclient: pass EAGAIN to is_latest_map() callers

If our map get_version check needs to be retried, tell the
is_latest_map() callers instead of returning 0 ("no").

Fixes: #3049
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomonclient: document get_version(), and fix return value
Sage Weil [Tue, 28 Aug 2012 00:25:54 +0000 (17:25 -0700)]
monclient: document get_version(), and fix return value

Return -EAGAIN instead of -1, since that's more meaningful, and
document it.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: requeue dup ops inline with in-progress ops
Sage Weil [Mon, 27 Aug 2012 21:31:32 +0000 (14:31 -0700)]
osd: requeue dup ops inline with in-progress ops

We should requeue the dups along with the originals.  This avoids
situations where, after requeue, the dups are reordered with respect to
each other.  For example:

 - client sends A, B, C
 - osd receives A
 - connection drops
 - client sends A', B', C'
 - osd puts A' in waiting_for_ondisk, starts B' and C'
 - on_change() requeues everything

Final queue order (before this patch) is
    A, B', C', A'

After this patch, the resulting queue order is
    A, A', B', C'

Or somewhat more generally, it might be:

    A, A', B, B', B'', C', C'', D'', ....

Fixes (another source of): #2947
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
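
The ordering in the example above can be reproduced with a small model: each in-progress op is requeued followed immediately by its duplicates. The container layout and the name `requeue_inline` are assumptions made for illustration, not the OSD's data structures.

```cpp
#include <deque>
#include <map>
#include <string>
#include <vector>

using OpName = std::string;

// in_progress: ops being executed, oldest first.
// dups: for each in-progress op, the duplicates waiting on it, in arrival order.
std::deque<OpName> requeue_inline(
    const std::vector<OpName> &in_progress,
    const std::map<OpName, std::vector<OpName>> &dups) {
  std::deque<OpName> queue;
  for (const OpName &op : in_progress) {
    queue.push_back(op);                 // the original op first
    auto it = dups.find(op);
    if (it != dups.end())
      for (const OpName &dup : it->second)
        queue.push_back(dup);            // then its dups, right behind it
  }
  return queue;
}
```

With in-progress ops A, B', C' and A' recorded as a dup of A, the resulting queue is A, A', B', C'.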
12 years agoosd: turn off lockdep during shutdown signal handler
Sage Weil [Sun, 26 Aug 2012 15:42:06 +0000 (08:42 -0700)]
osd: turn off lockdep during shutdown signal handler

We don't shut down all threads, and the surviving ones fight with
exit()'s teardown.  Kludge until we have a clean shutdown process.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge tag 'v0.51'
Sage Weil [Sun, 26 Aug 2012 15:18:45 +0000 (08:18 -0700)]
Merge tag 'v0.51'

v0.51

12 years agov0.51 v0.51
Sage Weil [Sat, 25 Aug 2012 22:58:39 +0000 (15:58 -0700)]
v0.51

12 years agomon: require --id
Sage Weil [Mon, 20 Aug 2012 20:12:26 +0000 (13:12 -0700)]
mon: require --id

Fixes: #2997
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agointerval_set: predeclare const_iterator
Sage Weil [Fri, 24 Aug 2012 21:55:12 +0000 (14:55 -0700)]
interval_set: predeclare const_iterator

This makes the coverity build happier.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
12 years agoMakefile: update coverity rules
Sage Weil [Fri, 24 Aug 2012 21:54:51 +0000 (14:54 -0700)]
Makefile: update coverity rules

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
12 years agolibrbd-dev.install: package new rbd/features.h header file.
Gary Lowell [Fri, 24 Aug 2012 22:16:05 +0000 (15:16 -0700)]
librbd-dev.install: package new rbd/features.h header file.

12 years agoMerge branch 'next'
Sage Weil [Fri, 24 Aug 2012 21:38:58 +0000 (14:38 -0700)]
Merge branch 'next'

12 years agomon: describe how pgs are stuck in 'health detail'
Sage Weil [Fri, 24 Aug 2012 21:43:56 +0000 (14:43 -0700)]
mon: describe how pgs are stuck in 'health detail'

Showing the current state and saying it is stuck doesn't tell you how it
is stuck (e.g. stuck unclean, stuck inactive, etc.).  Also include the
stuck duration.

Fixes: #2876
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: fix use-after-free in handle_notify_timeout
Sage Weil [Fri, 24 Aug 2012 18:16:01 +0000 (11:16 -0700)]
osd: fix use-after-free in handle_notify_timeout

Valgrind turned this up.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoceph.spec.in: package new rados library.
Gary Lowell [Fri, 24 Aug 2012 04:35:21 +0000 (21:35 -0700)]
ceph.spec.in: package new rados library.

12 years agoMerge remote-tracking branch 'gh/wip-mon-report'
Sage Weil [Thu, 23 Aug 2012 23:11:58 +0000 (16:11 -0700)]
Merge remote-tracking branch 'gh/wip-mon-report'

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip_rados_bench_really_final'
Sage Weil [Thu, 23 Aug 2012 23:07:32 +0000 (16:07 -0700)]
Merge remote-tracking branch 'gh/wip_rados_bench_really_final'

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoobj_bencher: use async remove during slow remove-by-prefix
Mike Ryan [Thu, 21 Jun 2012 18:03:15 +0000 (11:03 -0700)]
obj_bencher: use async remove during slow remove-by-prefix

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
12 years agoobj_bencher: remove all benchmark files matching a prefix
Mike Ryan [Tue, 24 Jul 2012 03:45:31 +0000 (20:45 -0700)]
obj_bencher: remove all benchmark files matching a prefix

This is a fallback for when a user wishes to delete ALL benchmark files
matching a particular prefix. In the fast case, a metadata file tells us
enough to quickly delete the files in parallel. This is the slow case,
where each file's name must be checked against the prefix.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
12 years agoobj_bencher: cleanup files in parallel using aio
Mike Ryan [Thu, 23 Aug 2012 18:52:51 +0000 (11:52 -0700)]
obj_bencher: cleanup files in parallel using aio

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
12 years agoobj_bencher: remove benchmark objects by prefix
Mike Ryan [Thu, 21 Jun 2012 17:08:53 +0000 (10:08 -0700)]
obj_bencher: remove benchmark objects by prefix

This intelligently removes objects from a rados or rest benchmark run by
using parameters from the metadata file.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
12 years agoobj_bencher: store per-benchmark metadata
Mike Ryan [Wed, 20 Jun 2012 21:50:04 +0000 (14:50 -0700)]
obj_bencher: store per-benchmark metadata

Store metadata for each benchmark run so that the objects can be
efficiently removed at a later point.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
12 years agoobj_bencher: clean up objects after a write benchmark
Mike Ryan [Wed, 20 Jun 2012 21:47:46 +0000 (14:47 -0700)]
obj_bencher: clean up objects after a write benchmark

Per #2477, objects created during rados or rest write benchmark are
automatically cleaned up after the test. They can optionally be left in
place.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
12 years agoobj_bencher: announce prefix during write benchmark
Mike Ryan [Tue, 19 Jun 2012 20:54:40 +0000 (13:54 -0700)]
obj_bencher: announce prefix during write benchmark

Per #2477 this can be used during a post-benchmark cleanup in rest and
rados bench.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
12 years agoDon't package crush header files.
Gary Lowell [Thu, 23 Aug 2012 18:48:50 +0000 (11:48 -0700)]
Don't package crush header files.

12 years agoceph.spec.in: package new rbd header and rados library.
Gary Lowell [Thu, 23 Aug 2012 20:40:18 +0000 (13:40 -0700)]
ceph.spec.in:  package new rbd header and rados library.

12 years agoMerge branch 'wip-msgr'
Sage Weil [Thu, 23 Aug 2012 20:29:10 +0000 (13:29 -0700)]
Merge branch 'wip-msgr'

12 years agomsg/Pipe: conditionally detect session reset
Sage Weil [Thu, 23 Aug 2012 20:26:32 +0000 (13:26 -0700)]
msg/Pipe: conditionally detect session reset

Lossless peers (osd<->osd, mds<->mds, mon<->mon) never reset sessions
to each other.  In the osd and mds cases, there is no need to check for
session resets.  More significantly, these checks can trigger with an
unfortunate sequence of socket failures.  In particular,

 - A sends connect request to B
 - B accepts, increments connect_seq, then has a socket failure
   before telling A
 - A reconnects, still with connect_seq == 0
 - B sees connect_seq == 0 and thinks there was a reset

This warrants a closer look in the fs client <-> mds case, but for now,
in the cluster-internal communications, it is moot, since reset
detection is unnecessary.

In the monitor case: we do need to check for resets because the peers
reuse the same entity_addr_t's (nonce==0), which means that a daemon
restart is effectively a reset.  In that case, use a different policy
that continues to check for resets.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
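
As a rough sketch of "conditionally detect", the reset test can be gated on a per-connection policy flag. The names `SessionPolicy`, `resetcheck`, and `looks_like_reset` are assumptions for this example, and the condition is simplified to the connect_seq == 0 case described above.

```cpp
// Hypothetical per-connection policy: whether to look for session resets.
struct SessionPolicy {
  bool resetcheck;
};

// A peer connecting with connect_seq == 0 while we remember a higher value
// looks like a reset, but only if the policy asks us to check at all.
bool looks_like_reset(const SessionPolicy &policy,
                      unsigned peer_connect_seq,
                      unsigned our_connect_seq) {
  return policy.resetcheck && peer_connect_seq == 0 && our_connect_seq > 0;
}
```

A policy with the flag off (the osd and mds case) never reports a reset; one with the flag on (the monitor case) still does.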
12 years agoosd: prefer acting osds in calc_acting()
Sage Weil [Thu, 23 Aug 2012 20:27:26 +0000 (13:27 -0700)]
osd: prefer acting osds in calc_acting()

We currently prefer up osds, and then pull sequentially from peer_info
(strays we know about at the time).  This adds an additional preference
for the current acting, which means we can avoid changes to acting when
they are largely useless.

In particular, I observed that we chose [5,3] and later (when recovery
completed) chose [5,1] because we had since heard about an eligible stray
on 1.  That switch was basically a waste...

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agolibrados: implement aio_remove
Mike Ryan [Tue, 19 Jun 2012 23:56:40 +0000 (16:56 -0700)]
librados: implement aio_remove

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
12 years agorbd: force all exiting paths through main()/return
Dan Mick [Mon, 20 Aug 2012 22:02:57 +0000 (15:02 -0700)]
rbd: force all exiting paths through main()/return
This properly destroys objects.  In the process, remove usage_exit();
also kill error-handling in set_conf_param (never relevant for rbd.cc,
and if you call it with both pointers NULL, well...)
Also switch to EXIT_FAILURE for consistency.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Fixes: #2948
12 years agoMerge branch 'wip-mon-mkfs'
Sage Weil [Thu, 23 Aug 2012 19:59:28 +0000 (12:59 -0700)]
Merge branch 'wip-mon-mkfs'

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years agomon: name cluster uuid file 'cluster_uuid'
Sage Weil [Thu, 23 Aug 2012 19:46:40 +0000 (12:46 -0700)]
mon: name cluster uuid file 'cluster_uuid'

Begin the transition.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoobjecter: use ordered map<> for tracking tids to preserve order on resend
Sage Weil [Wed, 22 Aug 2012 04:12:33 +0000 (21:12 -0700)]
objecter: use ordered map<> for tracking tids to preserve order on resend

We are using a hash_map<> to map tids to Op*'s.  In handle_osd_map(),
we will recalc_op_target() on each Op in a random (hash) order.  These
will get put in a temp map<tid,Op*> to ensure they are resent in the
correct order, but their order on the session->ops list will be random.

Then later, if we reset an OSD connection, we will resend everything for
that session in ops order, which is incorrect.

Fix this by explicitly reordering the requests to resend in
kick_requests(), much like we do in handle_osd_map().  This lets us
continue to use a hash_map<>, which is faster for reasonable numbers of
requests.  A simpler but slower fix would be to just use map<> instead.

This is one of many bugs contributing to #2947.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
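
The reordering step can be illustrated by copying the hashed index into a temporary ordered map before resending. Here `std::unordered_map` stands in for `hash_map<>`, and `PendingOp` and `resend_order` are hypothetical names, so this is a sketch of the idea rather than the Objecter code.

```cpp
#include <cstdint>
#include <map>
#include <unordered_map>
#include <vector>

using tid_t = std::uint64_t;

struct PendingOp {
  tid_t tid;
};

// The hashed index iterates in no useful order, so sort by tid first.
std::vector<tid_t> resend_order(
    const std::unordered_map<tid_t, PendingOp *> &ops) {
  std::map<tid_t, PendingOp *> ordered(ops.begin(), ops.end());
  std::vector<tid_t> out;
  for (const auto &entry : ordered)
    out.push_back(entry.first);  // ascending tid
  return out;
}
```

Lookups by tid keep using the hashed container; the ordered copy is only built when a batch has to be resent.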
12 years agoDon't package crush header files.
Gary Lowell [Thu, 23 Aug 2012 18:48:50 +0000 (11:48 -0700)]
Don't package crush header files.

12 years agomon: create cluster_fsid on startup if not present
Sage Weil [Mon, 20 Aug 2012 17:56:37 +0000 (10:56 -0700)]
mon: create cluster_fsid on startup if not present

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: create, verify cluster_fsid file in mon_data dir on mkfs
Sage Weil [Mon, 20 Aug 2012 17:56:14 +0000 (10:56 -0700)]
mon: create, verify cluster_fsid file in mon_data dir on mkfs

Having this present is convenient for external tools.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Thu, 23 Aug 2012 03:23:02 +0000 (20:23 -0700)]
Merge remote-tracking branch 'gh/next'

12 years agocephfs: add 'map' command to dump file mapping onto objects, osds
Sage Weil [Tue, 21 Aug 2012 16:18:53 +0000 (09:18 -0700)]
cephfs: add 'map' command to dump file mapping onto objects, osds

Closes: #3010
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoperf-watch: initial version
Sage Weil [Thu, 23 Aug 2012 00:22:05 +0000 (17:22 -0700)]
perf-watch: initial version

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoobjecter: use ordered map<> for tracking tids to preserve order on resend
Sage Weil [Wed, 22 Aug 2012 04:12:33 +0000 (21:12 -0700)]
objecter: use ordered map<> for tracking tids to preserve order on resend

We are using a hash_map<> to map tids to Op*'s.  In handle_osd_map(),
we will recalc_op_target() on each Op in a random (hash) order.  These
will get put in a temp map<tid,Op*> to ensure they are resent in the
correct order, but their order on the session->ops list will be random.

Then later, if we reset an OSD connection, we will resend everything for
that session in ops order, which is incorrect.

Fix this by explicitly reordering the requests to resend in
kick_requests(), much like we do in handle_osd_map().  This lets us
continue to use a hash_map<>, which is faster for reasonable numbers of
requests.  A simpler but slower fix would be to just use map<> instead.

This is one of many bugs contributing to #2947.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agodoc: Either use a backslash and a newline, or neither.
Tommi Virtanen [Wed, 22 Aug 2012 17:50:22 +0000 (10:50 -0700)]
doc: Either use a backslash and a newline, or neither.

Signed-off-by: Tommi Virtanen <tv@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip-crypto'
Sage Weil [Tue, 21 Aug 2012 22:47:57 +0000 (15:47 -0700)]
Merge remote-tracking branch 'gh/wip-crypto'

12 years agomon: implement 'ceph report <tag ...>' command
Sage Weil [Tue, 21 Aug 2012 21:22:20 +0000 (14:22 -0700)]
mon: implement 'ceph report <tag ...>' command

Generate a simple "signed" report of the current cluster status.  Include
a simple crc so that the report is vaguely verifiable.

This is part of #2829.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoconfig: remove dead osd options
Sage Weil [Tue, 21 Aug 2012 20:24:55 +0000 (13:24 -0700)]
config: remove dead osd options

The read balancing/shedding stuff is old.  Same goes for class timeouts and
the raid options.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoFix compilation warnings on squeeze; can't printf() snapid_t directly
Dan Mick [Tue, 21 Aug 2012 18:32:45 +0000 (11:32 -0700)]
Fix compilation warnings on squeeze; can't printf() snapid_t directly

12 years agorgw: use sizeof() for snprintf
Sage Weil [Tue, 21 Aug 2012 18:01:11 +0000 (11:01 -0700)]
rgw: use sizeof() for snprintf

Signed-off-by: Sage Weil <sage@inktank.com>
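
A minimal example of the pattern named in the subject line: passing `sizeof(buf)` keeps the size argument tied to the array declaration. The function `format_part` and its format string are made up for this sketch and are not taken from rgw.

```cpp
#include <cstdio>
#include <string>

// Hypothetical formatter: the bound given to snprintf is always the real
// size of the destination array, even if that size is changed later.
std::string format_part(int part_number) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "part.%d", part_number);
  return std::string(buf);
}
```

Note that `sizeof` only gives the buffer size when applied to an array, not to a pointer that refers to one.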
12 years agoMerge branch 'next'
Sage Weil [Tue, 21 Aug 2012 17:51:54 +0000 (10:51 -0700)]
Merge branch 'next'

12 years agoosd: fix requeue order for waiting_for_ondisk
Sage Weil [Tue, 21 Aug 2012 17:35:37 +0000 (10:35 -0700)]
osd: fix requeue order for waiting_for_ondisk

We are calling requeue_ops() on each individual op, which means we need
to requeue in reverse order (newest first, oldest last).

Fixes: #2947
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
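
Why the order has to be reversed can be seen in a small model, assuming that requeueing a single op places it at the front of the queue. `requeue_op` and `requeue_all` are hypothetical stand-ins, with ops represented as plain integers.

```cpp
#include <deque>
#include <vector>

// Hypothetical stand-in: requeueing one op puts it at the front of the queue.
void requeue_op(std::deque<int> &queue, int op) {
  queue.push_front(op);
}

// waiting is ordered oldest first. Walking it newest first means the oldest
// op is pushed last and therefore ends up at the very front.
void requeue_all(std::deque<int> &queue, const std::vector<int> &waiting) {
  for (auto it = waiting.rbegin(); it != waiting.rend(); ++it)
    requeue_op(queue, *it);
}
```

Requeueing 1, 2, 3 onto a queue holding 10 gives 1, 2, 3, 10; walking the list forwards would give 3, 2, 1, 10 instead.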
12 years agorgw: dump content_range using 64 bit formatters
Yehuda Sadeh [Sat, 18 Aug 2012 00:34:23 +0000 (17:34 -0700)]
rgw: dump content_range using 64 bit formatters

Fixes: #2961
Also make sure that size is 64 bit.

backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agoRevert "rgw: dump content_range using 64 bit formatters"
Sage Weil [Tue, 21 Aug 2012 17:48:12 +0000 (10:48 -0700)]
Revert "rgw: dump content_range using 64 bit formatters"

This reverts commit cc435e99802f77b3d4b21abe022665ac9df259cf.

Wrong fix; fcgi doesn't do %lld

12 years agomon: fix monitor cluster contraction race
Sage Weil [Tue, 21 Aug 2012 00:04:58 +0000 (17:04 -0700)]
mon: fix monitor cluster contraction race

If we contract to 1 monitor, we win_standalone_election() without bumping
the election epoch.  Racing paxos updates can then reach us without being
ignored and trigger an assert:

mon/Paxos.cc: In function 'void Paxos::handle_accept(MMonPaxos*)' thread 7f85eae05700 time 2012-08-20 16:01:00.843937
mon/Paxos.cc: 468: FAILED assert(state == STATE_UPDATING)

Fixes: #3003
Reported-by: John Wilkins <john.wilkins@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoAdd manpage sections for flatten, snap {un}protect
Dan Mick [Tue, 21 Aug 2012 01:00:46 +0000 (18:00 -0700)]
Add manpage sections for flatten, snap {un}protect

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: John Wilkins <john.wilkins@inktank.com>
12 years agomkcephfs, init-ceph: Warn if hostname "localhost" is seen in ceph.conf.
Tommi Virtanen [Tue, 21 Aug 2012 00:06:09 +0000 (17:06 -0700)]
mkcephfs, init-ceph: Warn if hostname "localhost" is seen in ceph.conf.

Given a ceph.conf that looks like

  [osd.42]
  host = localhost

mkcephfs used to exit with an obscure error message:

  cat: /tmp/mkcephfs.MCBIHvn4Ru/key.*: No such file or directory

"localhost" was never intended to be a valid hostname to use there.
Warn if we see it, and skip the entry. You should use the proper short
hostname of the box.

As init-ceph and mkcephfs share this library, this change affects the
sysvinit scripts too. The behavior *shouldn't* change there (localhost
entries were ignored earlier, too), but you may see this extra
warning. Which is good.

Closes: #3001
Signed-off-by: Tommi Virtanen <tv@inktank.com>
12 years ago"Removed 274 from xfstests"
tamil [Mon, 20 Aug 2012 23:53:18 +0000 (16:53 -0700)]
"Removed 274 from xfstests"

Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
12 years agotest_rbd.py: remove clone before image it depends on
Dan Mick [Mon, 20 Aug 2012 22:59:33 +0000 (15:59 -0700)]
test_rbd.py: remove clone before image it depends on

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years agorgw: dump content_range using 64 bit formatters
Yehuda Sadeh [Sat, 18 Aug 2012 00:34:23 +0000 (17:34 -0700)]
rgw: dump content_range using 64 bit formatters

Fixes: #2961
Also make sure that size is 64 bit.

backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agoMerge branch 'next'
Sage Weil [Mon, 20 Aug 2012 22:04:45 +0000 (15:04 -0700)]
Merge branch 'next'

12 years agoosd: fix requeue order of dup ops
Sage Weil [Mon, 20 Aug 2012 19:33:08 +0000 (12:33 -0700)]
osd: fix requeue order of dup ops

The waiting_for_ondisk (and ack) maps get dups of ops that are in progress.
If we have a peering change in which the role does not change, we will
requeue the in-progress ops but leave these in the waiting_for_ondisk
maps, which will then trigger an assert the next time we examine that map
and find it didn't match up with what we expected.

Fix this by requeuing these on any peering reset in on_change().  This
keeps the two queues in sync.

Fixes: #2956
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: fix warning
Sage Weil [Mon, 20 Aug 2012 20:30:50 +0000 (13:30 -0700)]
osd: fix warning

signed/unsigned comp

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoinit-ceph: use SSH in "service ceph status -a" to get version
Travis Rhoden [Mon, 20 Aug 2012 20:29:11 +0000 (13:29 -0700)]
init-ceph: use SSH in "service ceph status -a" to get version

When running "service ceph status -a", a version number was never
returned for remote hosts, only for the local.  This was because
the command to query the version number didn't use the do_cmd
function, which is responsible for running the command over SSH
when needed.

Modify the ceph init.d script to use do_cmd for querying the
Ceph version.

Signed-off-by: Travis Rhoden <trhoden@gmail.com>
12 years agodoc: mkcephfs man page, -c ceph.conf is not optional
Travis Rhoden [Fri, 17 Aug 2012 20:45:09 +0000 (16:45 -0400)]
doc: mkcephfs man page, -c ceph.conf is not optional

The man page for mkcephfs and the output of mkcephfs --help
do not agree with each other.  The man page says -c ceph.conf
is optional, while mkcephfs --help says it is required.

Through empirical evidence, I believe it is required.  Update
the man page to make it so.

Signed-off-by: Travis Rhoden <trhoden@gmail.com>
12 years agoosd: make notify debug output less noisy
Sage Weil [Mon, 20 Aug 2012 20:23:21 +0000 (13:23 -0700)]
osd: make notify debug output less noisy

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomds: do not return null dentry lease on getattr
Sage Weil [Fri, 17 Aug 2012 16:02:10 +0000 (09:02 -0700)]
mds: do not return null dentry lease on getattr

Specifically, /foo may exist and client may try to mount /foo/bar.  That
GETATTR request is on #1/foo/bar, but we cannot return a null dentry on bar
because the client is not prepared to handle it and will crash in
fill_trace().

Fixes: #2959
Reported-by: Yan Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomon: add MonitorStore::sync()
Sage Weil [Mon, 20 Aug 2012 17:56:29 +0000 (10:56 -0700)]
mon: add MonitorStore::sync()

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip-rbd-layer'
Sage Weil [Mon, 20 Aug 2012 17:49:31 +0000 (10:49 -0700)]
Merge remote-tracking branch 'gh/wip-rbd-layer'

12 years agocrypto: cache CryptoHandler in CryptoKey
Sage Weil [Mon, 20 Aug 2012 17:19:18 +0000 (10:19 -0700)]
crypto: cache CryptoHandler in CryptoKey

This avoids a call into cct and a switch to get the handler every time.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agodoc: fix key export syntax
Sage Weil [Mon, 20 Aug 2012 16:14:33 +0000 (09:14 -0700)]
doc: fix key export syntax

'ceph auth export mon.' no longer works as a side-effect of switching
around the mon. key handling.  'get' works, though; use that for now.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge branch 'wip-rbd-layering'
Dan Mick [Sat, 18 Aug 2012 02:29:13 +0000 (19:29 -0700)]
Merge branch 'wip-rbd-layering'

Conflicts:
src/librbd/internal.cc

12 years agoRoll up loose ends from a marathon merge/rebase session
Dan Mick [Sat, 18 Aug 2012 01:56:37 +0000 (18:56 -0700)]
Roll up loose ends from a marathon merge/rebase session

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years agoReview:
Dan Mick [Fri, 17 Aug 2012 17:40:05 +0000 (10:40 -0700)]
Review:

standardize on "*_id" form of variable names
log errors in parameter decode in rbd_children methods
whitespace, assert -> comment

12 years agolibrbd: snap_protect: verify layering is supported
Dan Mick [Fri, 17 Aug 2012 23:01:15 +0000 (16:01 -0700)]
librbd: snap_protect: verify layering is supported

12 years agolibrbd: review: don't call to the OSD to get current protection status
Dan Mick [Fri, 17 Aug 2012 22:55:00 +0000 (15:55 -0700)]
librbd: review: don't call to the OSD to get current protection status

12 years agotest_rbd.py: actually make unprotect_with_children work, and clean up
Dan Mick [Fri, 17 Aug 2012 22:44:37 +0000 (15:44 -0700)]
test_rbd.py: actually make unprotect_with_children work, and clean up

12 years agolibrbd: change EINVAL to EBUSY on "can't unprotect because children exist"
Dan Mick [Fri, 17 Aug 2012 20:03:21 +0000 (13:03 -0700)]
librbd: change EINVAL to EBUSY on "can't unprotect because children exist"
Add pool, number of children in this pool that caused failure to log

12 years agoreview: librbd, test_librbd: make "protect protected snap" fail
Dan Mick [Fri, 17 Aug 2012 20:00:48 +0000 (13:00 -0700)]
review: librbd, test_librbd: make "protect protected snap" fail

12 years agoUpdate protection methods to use parent_spec, parent_types.h, etc.
Dan Mick [Sat, 18 Aug 2012 00:58:31 +0000 (17:58 -0700)]
Update protection methods to use parent_spec, parent_types.h, etc.

12 years agotest_rbd: add test for denying removal of protected parent
Dan Mick [Tue, 14 Aug 2012 22:32:30 +0000 (15:32 -0700)]
test_rbd: add test for denying removal of protected parent

12 years agoget_features requires md_lock and snap_lock to be held
Dan Mick [Tue, 14 Aug 2012 22:30:48 +0000 (15:30 -0700)]
get_features requires md_lock and snap_lock to be held

12 years agolibrbd: clone return codes: ENOSYS for no layering, EINVAL for no prot
Dan Mick [Tue, 14 Aug 2012 18:42:54 +0000 (11:42 -0700)]
librbd: clone return codes: ENOSYS for no layering, EINVAL for no prot

12 years agolibrbd, test_librbd: snap_unprotect: refuse if children still exist
Dan Mick [Tue, 14 Aug 2012 18:51:55 +0000 (11:51 -0700)]
librbd, test_librbd: snap_unprotect: refuse if children still exist

12 years agoMerge branch 'wip-rbd-protect' into more-rebasing
Dan Mick [Sat, 18 Aug 2012 01:42:04 +0000 (18:42 -0700)]
Merge branch 'wip-rbd-protect' into more-rebasing

Conflicts:
src/librbd/ImageCtx.cc
src/librbd/SnapInfo.h
src/librbd/internal.cc
src/test/rbd/test_cls_rbd.cc

12 years agotest_cls_rbd: get_parent with no parent: should fail and return null-pspec
Dan Mick [Fri, 17 Aug 2012 20:07:04 +0000 (13:07 -0700)]
test_cls_rbd: get_parent with no parent: should fail and return null-pspec

12 years agolibrbd: cause add_child/remove_child to treat duplicate ops as errors
Dan Mick [Fri, 17 Aug 2012 19:50:00 +0000 (12:50 -0700)]
librbd: cause add_child/remove_child to treat duplicate ops as errors

12 years agolibrbd: review: add helper for 'scanning snapshots for this parent'
Dan Mick [Fri, 17 Aug 2012 22:44:10 +0000 (15:44 -0700)]
librbd: review: add helper for 'scanning snapshots for this parent'

12 years agolibrbd: review: change get_snapinfo to get_parent_spec
Dan Mick [Fri, 17 Aug 2012 19:59:26 +0000 (12:59 -0700)]
librbd: review: change get_snapinfo to get_parent_spec

12 years agolibrbd, cls_rbd: move parent_info and parent_spec to parent_types.h
Dan Mick [Sat, 18 Aug 2012 00:54:27 +0000 (17:54 -0700)]
librbd, cls_rbd: move parent_info and parent_spec to parent_types.h

parent_types.h is a new librbd-scope header containing info
related to parents and children (clones)

Signed-off-by: Dan Mick <dan.mick@inktank.com>