git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
11 years agoReplicatedPG: create ObjectContext with SharedPtrRegistry 414/head
Loic Dachary [Tue, 13 Aug 2013 15:28:31 +0000 (17:28 +0200)]
ReplicatedPG: create ObjectContext with SharedPtrRegistry

All new ObjectContext are replaced with calls to
SharedPtrRegistry::lookup_or_create to ensure that they are all
registered. Because the constructor is invoked with no argument, care
is taken to always initialize the destructor_callback data member
immediately afterwards.

ReplicatedPG::get_object_context contains a redundant call to
get_snapset_context that is removed.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoReplicatedPG: replace object_contexts.find with object_contexts.lookup
Loic Dachary [Tue, 13 Aug 2013 15:02:40 +0000 (17:02 +0200)]
ReplicatedPG: replace object_contexts.find with object_contexts.lookup

The std::map equivalent of find is SharedPtrRegistry::lookup

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoReplicatedPG: add Context to cleanup the PG after an ObjectContext deletion
Loic Dachary [Tue, 13 Aug 2013 14:52:18 +0000 (16:52 +0200)]
ReplicatedPG: add Context to cleanup the PG after an ObjectContext deletion

ReplicatedPG::C_PG_ObjectContext is added to encapsulate a
call to ReplicatedPG::object_context_destructor_callback method
which is responsible for

  * manually de-allocating the SnapSetContext of the ObjectContext if
    any. It will eventually be managed by a SharedPtrRegistry.

ReplicatedPG::C_PG_ObjectContext must be added to the destructor_callback
member of ObjectContext immediately after it is created.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoReplicatedPG: replace map iterators with SharedPtrRegistry::get_next
Loic Dachary [Tue, 13 Aug 2013 14:40:06 +0000 (16:40 +0200)]
ReplicatedPG: replace map iterators with SharedPtrRegistry::get_next

SharedPtrRegistry does not provide an iterator equivalent to

    map<hobject_t, ObjectContext*>::iterator i

It is replaced with a thread safe get_next method roughly used
as follows:

    pair<hobject_t, ObjectContextRef> i;
    while (object_contexts.get_next(i.first, &i))

All occurrences of the iterator are replaced with get_next style
traversal.
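
The get_next style traversal described above can be sketched as follows. This is a minimal illustration under stated assumptions: MiniRegistry, its members and collect_keys are hypothetical stand-ins, not the SharedPtrRegistry code in the tree. The point is that the lock is taken for one lookup at a time, so the caller never holds an iterator into the map.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified registry offering a get_next() style traversal.
// It is NOT the real SharedPtrRegistry; it only illustrates the idiom.
template <class K, class V>
class MiniRegistry {
public:
  typedef std::shared_ptr<V> VPtr;

  void insert(const K& key, VPtr value) {
    std::lock_guard<std::mutex> l(lock_);
    contents_[key] = value;
  }

  // Copy the first entry whose key is strictly greater than `key` into
  // *next. Returns false when no such entry exists. The lock is held only
  // for the duration of this call.
  bool get_next(const K& key, std::pair<K, VPtr>* next) {
    assert(next != nullptr);
    std::lock_guard<std::mutex> l(lock_);
    typename std::map<K, VPtr>::iterator i = contents_.upper_bound(key);
    if (i == contents_.end())
      return false;
    *next = std::make_pair(i->first, i->second);
    return true;
  }

private:
  std::mutex lock_;
  std::map<K, VPtr> contents_;
};

// Walk the whole registry with the loop shape from the commit message.
std::vector<int> collect_keys(MiniRegistry<int, std::string>& reg) {
  std::vector<int> keys;
  std::pair<int, MiniRegistry<int, std::string>::VPtr> i;
  i.first = -1;  // assumption: -1 sorts before every real key
  while (reg.get_next(i.first, &i))
    keys.push_back(i.first);
  return keys;
}
```

Because each step restarts from the last key seen, entries inserted or removed between two calls do not invalidate the traversal; they are simply seen or not seen depending on where they sort.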

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoReplicatedPG: remove lookup_object_context method
Loic Dachary [Tue, 13 Aug 2013 14:13:19 +0000 (16:13 +0200)]
ReplicatedPG: remove lookup_object_context method

Both ReplicatedPG::lookup_object_context and
ReplicatedPG::_lookup_object_context methods are provided by
SharedPtrRegistry.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoReplicatedPG: remove reference counting logic
Loic Dachary [Mon, 12 Aug 2013 16:19:06 +0000 (18:19 +0200)]
ReplicatedPG: remove reference counting logic

ObjectContext manual reference counting and managing the
object_contexts object involves calls to

* obc->ref++ and obc->get()
* put_object_context and put_object_contexts
* register_object_context
* assertions on obc->registered

They are all removed because SharedPtrRegistry provides the
same service.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoReplicatedPG: ObjectContext * becomes ObjectContextRef
Loic Dachary [Mon, 12 Aug 2013 15:45:44 +0000 (17:45 +0200)]
ReplicatedPG: ObjectContext * becomes ObjectContextRef

The map of hobject_t to ObjectContext is made a
SharedPtrRegistry owned by ReplicatedPG

    -  map<hobject_t, ObjectContext*> object_contexts;
    +  SharedPtrRegistry<hobject_t, ObjectContext> object_contexts;

All ObjectContext pointers are changed into ObjectContextRef, i.e.
shared_ptr.

In Watch.h std::tr1::shared_ptr<ObjectContext> is used instead
of ObjectContextRef because Watch.h is included before it is
defined.

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoReplicatedPG: ObjectContext is made compatible with SharedPtrRegistry
Loic Dachary [Mon, 12 Aug 2013 14:47:42 +0000 (16:47 +0200)]
ReplicatedPG: ObjectContext is made compatible with SharedPtrRegistry

When creating a new object SharedPtrRegistry::lookup_or_create uses
the default ObjectContext constructor with no argument. The existing
ObjectContext constructor is modified to have no argument and the
initialization that was previously done within the constructor is done
by the caller (that only happens three times).

The ObjectContext::get method is removed: its only purpose is to
increment the ref.

The ObjectContext::registered data member is removed as well as all
the associated assert()

The ObjectContext::destructor_callback data member Context is added
and called by the destructor. It will allow the caller to perform
additional cleanup, if necessary.

All ObjectContext * data members are replaced with shared_ptr.
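
A minimal sketch of the pattern described above, assuming a simplified registry that holds weak references. Ctx, CtxRegistry and get_ctx are hypothetical names, and std::function stands in for the Context member. The object is built with the no-argument constructor and the caller completes the initialization, including the destructor callback, right after creation.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical object: default-constructible, with a callback that the
// destructor invokes (the role played by destructor_callback).
struct Ctx {
  std::string name;                           // filled in by the caller
  std::function<void()> destructor_callback;  // optional extra cleanup
  Ctx() {}
  ~Ctx() {
    if (destructor_callback)
      destructor_callback();
  }
};

// Hypothetical registry: it keeps weak references only, so an object is
// destroyed as soon as the last shared_ptr handed out is released.
class CtxRegistry {
  std::map<std::string, std::weak_ptr<Ctx> > contents_;
public:
  std::shared_ptr<Ctx> lookup_or_create(const std::string& key) {
    std::map<std::string, std::weak_ptr<Ctx> >::iterator i =
      contents_.find(key);
    if (i != contents_.end()) {
      std::shared_ptr<Ctx> existing = i->second.lock();
      if (existing)
        return existing;
    }
    // Created with the no-argument constructor, as the commit describes.
    std::shared_ptr<Ctx> created = std::make_shared<Ctx>();
    contents_[key] = created;
    return created;
  }
};

// Caller-side initialization, done immediately after creation.
std::shared_ptr<Ctx> get_ctx(CtxRegistry& reg, const std::string& key,
                             int* destroyed) {
  assert(destroyed != nullptr);
  std::shared_ptr<Ctx> c = reg.lookup_or_create(key);
  if (!c->destructor_callback) {  // freshly created: finish initialization
    c->name = key;
    c->destructor_callback = [destroyed]() { ++*destroyed; };
  }
  return c;
}
```

Two lookups of the same key return the same object, and the callback runs exactly once, when the last reference goes away.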

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoReplicatedPG: add Mutex to protect snapset_contexts
Loic Dachary [Thu, 15 Aug 2013 18:15:03 +0000 (20:15 +0200)]
ReplicatedPG: add Mutex to protect snapset_contexts

snapset_contexts_locks is added and locked in each function where
snapset_contexts or the SnapSetContext::ref data member needs to be
accessed or modified.
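
As a rough illustration of the locking rule above (a sketch with hypothetical names, using std::mutex rather than the Mutex class in the tree), every function that touches the map or a ref counter takes the lock first:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical manually reference-counted context.
struct SnapCtx {
  int ref;
  SnapCtx() : ref(0) {}
};

// Hypothetical table: lock_ guards both the map and every ref counter.
class SnapCtxTable {
  std::mutex lock_;
  std::map<std::string, SnapCtx*> contexts_;
public:
  ~SnapCtxTable() {
    for (std::map<std::string, SnapCtx*>::iterator i = contexts_.begin();
         i != contexts_.end(); ++i)
      delete i->second;
  }

  // Look up or create the context and take one reference on it.
  SnapCtx* get(const std::string& oid) {
    std::lock_guard<std::mutex> l(lock_);
    SnapCtx*& slot = contexts_[oid];
    if (!slot)
      slot = new SnapCtx;
    ++slot->ref;
    return slot;
  }

  // Drop one reference. Returns true when the entry was freed.
  bool put(const std::string& oid) {
    std::lock_guard<std::mutex> l(lock_);
    std::map<std::string, SnapCtx*>::iterator i = contexts_.find(oid);
    assert(i != contexts_.end());
    assert(i->second->ref > 0);
    if (--i->second->ref > 0)
      return false;
    delete i->second;
    contexts_.erase(i);
    return true;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> l(lock_);
    return contexts_.size();
  }
};
```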

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoPG: remove unused PG::_cond
Loic Dachary [Thu, 15 Aug 2013 17:42:13 +0000 (19:42 +0200)]
PG: remove unused PG::_cond

http://tracker.ceph.com/issues/5510 refs #5510

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agosharedptr_registry: add a variant of get_next() and the empty() method
Loic Dachary [Mon, 12 Aug 2013 12:05:38 +0000 (14:05 +0200)]
sharedptr_registry: add a variant of get_next() and the empty() method

The SharedPtrRegistry::get_next() method with a value of type VPtr
instead of V is added because it is sometimes more convenient to not
copy the value when walking the registry. The
SharedPtrRegistry::empty() predicate method is added.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge branch 'next'
Josh Durgin [Wed, 21 Aug 2013 23:29:29 +0000 (16:29 -0700)]
Merge branch 'next'

11 years agoobjecter: fix keys of dump_linger_ops
Josh Durgin [Wed, 21 Aug 2013 22:56:20 +0000 (15:56 -0700)]
objecter: fix keys of dump_linger_ops

The registering flag no longer exists, and registered was using the
wrong property due to a copy-paste error.

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
11 years agoobjecter: resend unfinished lingers when osdmap is no longer paused
Josh Durgin [Wed, 21 Aug 2013 21:28:49 +0000 (14:28 -0700)]
objecter: resend unfinished lingers when osdmap is no longer paused

Plain Ops that haven't finished yet need to be resent if the osdmap
transitions from full or paused to unpaused.  If these Ops are
triggered by LingerOps, they will be cancelled instead (since
should_resend = false), but the LingerOps that triggered them will not
be resent.

Fix this by checking the registered flag for all linger ops, and
resending any of them that aren't paused anymore.

Fixes: #6070
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage.weil@inktank.com>
11 years agorgw: change cache / watch-notify init sequence
Yehuda Sadeh [Mon, 19 Aug 2013 15:40:16 +0000 (08:40 -0700)]
rgw: change cache / watch-notify init sequence

Fixes: #6046
We were initializing the watch-notify (through the cache
init) before reading the zone info which was much too
early, as we didn't have the control pool name yet. Now
simplifying init/cleanup a bit, cache doesn't call watch/notify
init and cleanup directly, but rather states its need
through a virtual callback.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge branch 'master' of https://github.com/ceph/ceph
John Wilkins [Wed, 21 Aug 2013 18:02:26 +0000 (11:02 -0700)]
Merge branch 'master' of https://github.com/ceph/ceph

11 years agodoc: Clarified quorum requirements.
John Wilkins [Wed, 21 Aug 2013 18:01:48 +0000 (11:01 -0700)]
doc: Clarified quorum requirements.

fixes: #5412

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agoMerge pull request #524 from ceph/wip-mon-delta
Sage Weil [Wed, 21 Aug 2013 18:00:45 +0000 (11:00 -0700)]
Merge pull request #524 from ceph/wip-mon-delta

mon: add 'pg dump delta' to get just the rate info

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agodoc: Fixed typo.
John Wilkins [Wed, 21 Aug 2013 17:56:23 +0000 (10:56 -0700)]
doc: Fixed typo.

fixes: #5968

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agoMerge pull request #523 from dachary/master
Sage Weil [Wed, 21 Aug 2013 17:36:54 +0000 (10:36 -0700)]
Merge pull request #523 from dachary/master

doc: fix erasure code formatting warnings and errors

11 years agodoc: fix erasure code formatting warnings and errors 523/head
Loic Dachary [Wed, 21 Aug 2013 16:09:03 +0000 (18:09 +0200)]
doc: fix erasure code formatting warnings and errors

http://tracker.ceph.com/issues/4929 refs #4929

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Wed, 21 Aug 2013 05:40:13 +0000 (22:40 -0700)]
Merge remote-tracking branch 'gh/next'

11 years agoceph-disk: partprobe after creating journal partition
Sage Weil [Wed, 21 Aug 2013 05:39:09 +0000 (22:39 -0700)]
ceph-disk: partprobe after creating journal partition

At least one user reports that a partprobe is needed after creating the
journal partition.  It is not clear why sgdisk is not doing it, but this
fixes ceph-disk for them, and should be harmless for other users.

Fixes: #5599
Tested-by: lurbs in #ceph
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge remote-tracking branch 'gh/wip-6004' into next
Sage Weil [Tue, 20 Aug 2013 23:57:46 +0000 (16:57 -0700)]
Merge remote-tracking branch 'gh/wip-6004' into next

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years ago.gitignore: ignore test-driver
Sage Weil [Fri, 9 Aug 2013 19:49:57 +0000 (12:49 -0700)]
.gitignore: ignore test-driver

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agofuse: fix warning when compiled against old fuse versions
Sage Weil [Fri, 9 Aug 2013 19:42:49 +0000 (12:42 -0700)]
fuse: fix warning when compiled against old fuse versions

client/fuse_ll.cc: In function 'void invalidate_cb(void*, vinodeno_t, int64_t, int64_t)':
warning: client/fuse_ll.cc:540: unused variable 'fino'

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agojson_spirit: remove unused typedef
Sage Weil [Fri, 9 Aug 2013 19:40:34 +0000 (12:40 -0700)]
json_spirit: remove unused typedef

In file included from json_spirit/json_spirit_writer.cpp:7:0:
json_spirit/json_spirit_writer_template.h: In function 'String_type json_spirit::non_printable_to_string(unsigned int)':
json_spirit/json_spirit_writer_template.h:37:50: warning: typedef 'Char_type' locally defined but not used [-Wunused-local-typedefs]
         typedef typename String_type::value_type Char_type;

(Also, ha ha, this file uses \r\n.)

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agogtest: add build-aux/test-driver to .gitignore
Sage Weil [Fri, 9 Aug 2013 19:31:41 +0000 (12:31 -0700)]
gtest: add build-aux/test-driver to .gitignore

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #517 from dmick/wip-6049
Dan Mick [Tue, 20 Aug 2013 19:18:43 +0000 (12:18 -0700)]
Merge pull request #517 from dmick/wip-6049

mon/PGMap: OSD byte counts 4x too large (conversion to bytes overzealous)

Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years agomon/Paxos: always refresh after any store_state
Sage Weil [Tue, 20 Aug 2013 18:27:23 +0000 (11:27 -0700)]
mon/Paxos: always refresh after any store_state

If we store any new state, we need to refresh the services, even if we
are still in the midst of Paxos recovery.  This is because the
subscription path will share any committed state even when paxos is
still recovering.  This prevents a race like:

 - we have maps 10..20
 - we drop out of quorum
 - we are elected leader, paxos recovery starts
 - we get one LAST with committed states that trim maps 10..15
 - we get a subscribe for map 10..20
   - we crash because 10 is no longer on disk because the PaxosService
     is out of sync with the on-disk state.
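
The race above can be modelled with a toy sketch. Everything here is an assumption made for illustration: MiniPaxos and its members are hypothetical and much simpler than the monitor code. The idea is that store_state reports whether it changed anything, and the caller refreshes the in-memory view whenever it did, even during recovery.

```cpp
#include <cassert>

// Toy model with hypothetical names; not the real mon/Paxos classes.
struct MiniPaxos {
  int first_committed;  // oldest map still on disk
  int last_committed;   // newest map on disk
  int service_first;    // oldest map the in-memory service believes exists
  bool recovering;

  MiniPaxos(int first, int last)
    : first_committed(first), last_committed(last),
      service_first(first), recovering(false) {}

  // Store incoming committed state. Returns true if anything changed.
  bool store_state(int new_first, int new_last) {
    bool changed = false;
    if (new_last > last_committed) {
      last_committed = new_last;
      changed = true;
    }
    if (new_first > first_committed) {  // older maps were trimmed
      first_committed = new_first;
      changed = true;
    }
    return changed;
  }

  void do_refresh() { service_first = first_committed; }

  // Refresh after any store, whether or not recovery is still running.
  void handle_last(int new_first, int new_last) {
    if (store_state(new_first, new_last))
      do_refresh();
  }

  // First epoch that can be sent to a subscriber asking for `requested`.
  int first_sendable(int requested) const {
    // Without the refresh this invariant would not hold after a trim.
    assert(service_first == first_committed);
    return requested > service_first ? requested : service_first;
  }
};
```

With maps 10..20 and a LAST message that trims 10..15, a subscriber asking for map 10 is served from 16 instead of hitting a map that is no longer on disk.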

Fixes: #6045
Backport: dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years agomon/Paxos: return whether store_state stored anything
Sage Weil [Tue, 20 Aug 2013 18:27:09 +0000 (11:27 -0700)]
mon/Paxos: return whether store_state stored anything

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years agomon/Paxos: cleanup: use do_refresh from handle_commit
Sage Weil [Tue, 20 Aug 2013 18:26:57 +0000 (11:26 -0700)]
mon/Paxos: cleanup: use do_refresh from handle_commit

This avoids duplicated code by using the helper created exactly for this
purpose.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
11 years agopybind: fix Rados.conf_parse_env test
Sage Weil [Tue, 20 Aug 2013 18:23:46 +0000 (11:23 -0700)]
pybind: fix Rados.conf_parse_env test

This happens after we connect, which means we get ENOSYS always.
Instead, call parse_env inside the normal setup method, which has the added
benefit of letting us debug these tests.

Backport: dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
11 years agomon/PGMap: OSD byte counts 4x too large (conversion to bytes overzealous) 517/head
Dan Mick [Tue, 20 Aug 2013 18:10:42 +0000 (11:10 -0700)]
mon/PGMap: OSD byte counts 4x too large (conversion to bytes overzealous)

Fixes: #6049
Signed-off-by: Dan Mick <dan.mick@inktank.com>
11 years agoMerge pull request #516 from dachary/master
athanatos [Tue, 20 Aug 2013 17:34:32 +0000 (10:34 -0700)]
Merge pull request #516 from dachary/master

erasure code : plugin, interface and glossary documentation updates

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoerasure code : plugin, interface and glossary documentation updates 516/head
Loic Dachary [Tue, 20 Aug 2013 14:17:10 +0000 (16:17 +0200)]
erasure code : plugin, interface and glossary documentation updates

* replace the erasure code plugin abstract interface with a doxygen link
  that will be populated when the header shows in master
* update the plugin documentation to reflect the current draft implementation
* fix broken link to PGBackend-h
* add a glossary to define chunk, stripe, shard and strip with a drawing

http://tracker.ceph.com/issues/4929 refs #4929

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Tue, 20 Aug 2013 05:53:28 +0000 (22:53 -0700)]
Merge remote-tracking branch 'gh/next'

11 years agoPG: remove old log when we upgrade log version
Samuel Just [Tue, 20 Aug 2013 00:23:44 +0000 (17:23 -0700)]
PG: remove old log when we upgrade log version

Otherwise the log_oid will be non-empty and the next
boot will cause us to try to upgrade again.

Fixes: #6057
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge branch 'wip-fallocate'
Sage Weil [Tue, 20 Aug 2013 05:50:11 +0000 (22:50 -0700)]
Merge branch 'wip-fallocate'

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoceph-fuse: fallocate appears in fuse 2.9.1, not 2.9
Sage Weil [Tue, 20 Aug 2013 04:46:29 +0000 (21:46 -0700)]
ceph-fuse: fallocate appears in fuse 2.9.1, not 2.9

There is no macro to differentiate 2.9 from 2.9.1, so we have to wait
to use this until 3.0.  :(

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoclient: do not mark_caps_dirty for generic fallocate
Sage Weil [Fri, 16 Aug 2013 06:05:17 +0000 (23:05 -0700)]
client: do not mark_caps_dirty for generic fallocate

A normal fallocate in which the size is not changed is still a no-op.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoclient: guard fallocate with #ifdefs
Sage Weil [Fri, 16 Aug 2013 06:01:59 +0000 (23:01 -0700)]
client: guard fallocate with #ifdefs

Only include linux header if it's linux.  Only implement the fallocate
method if FALLOC_FL_PUNCH_HOLE is defined.
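
A sketch of this kind of guard, under assumptions: punch_hole_supported and check_fallocate_mode are hypothetical helpers, and unlike the commit (which leaves the method out entirely) this version keeps a stub that reports the operation as unsupported when the macro is missing.

```cpp
#include <cassert>
#include <cerrno>

#if defined(__linux__)
#include <linux/falloc.h>  // may define FALLOC_FL_PUNCH_HOLE
#endif

// Hypothetical helper: true when this build knows about hole punching.
bool punch_hole_supported() {
#ifdef FALLOC_FL_PUNCH_HOLE
  return true;
#else
  return false;
#endif
}

// Hypothetical validation step: returns 0 if the request would be
// accepted, or a negative errno value otherwise.
int check_fallocate_mode(int mode, long long offset, long long length) {
  if (offset < 0 || length <= 0)
    return -EINVAL;
#ifdef FALLOC_FL_PUNCH_HOLE
  // Per fallocate(2), punching a hole must be combined with KEEP_SIZE.
  if ((mode & FALLOC_FL_PUNCH_HOLE) && !(mode & FALLOC_FL_KEEP_SIZE))
    return -EOPNOTSUPP;
  return 0;
#else
  (void)mode;  // nothing is supported without the macro
  return -EOPNOTSUPP;
#endif
}
```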

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoCeph-fuse: Fallocate and punch hole support
Li Wang [Thu, 15 Aug 2013 04:04:03 +0000 (12:04 +0800)]
Ceph-fuse: Fallocate and punch hole support

This patch implements fallocate and punch hole support for Ceph fuse client.

Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agomon: add 'pg dump delta' to get just the rate info 524/head
Sage Weil [Tue, 20 Aug 2013 04:37:00 +0000 (21:37 -0700)]
mon: add 'pg dump delta' to get just the rate info

Still include it in the basic 'pg dump summary' info.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoPGLog: add a config to disable PGLog::check()
Samuel Just [Mon, 19 Aug 2013 07:02:24 +0000 (00:02 -0700)]
PGLog: add a config to disable PGLog::check()

This is a debug check which may be causing excessive
cpu usage.

Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Samuel Just <sam.just@inktank.com>
11 years agodoc: Title change.
John Wilkins [Tue, 20 Aug 2013 00:27:10 +0000 (17:27 -0700)]
doc: Title change.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agoosd/ReplicatedPG: remove broken AccessMode logic
Sage Weil [Mon, 19 Aug 2013 05:34:24 +0000 (22:34 -0700)]
osd/ReplicatedPG: remove broken AccessMode logic

The original intent here was to handle reads in two modes.  For
workloads with read/modify/write ops, the RMW mode would:

 - queue writes for local store and replicas immediately
 - block reads until the write commits to all replicas

For mixed read/write workloads without read/modify/write ops, the
DELAYED mode would:

 - queue writes for replicas
 - allow local reads
 - once replicas commit, queue write locally
 - block local reads until local write completes

In reality, we never use the DELAYED mode.  It's untested and possibly
broken, and it is unlikely we will see a workload where it is important
in the near to mid term.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agoMerge pull request #508 from ceph/wip-5905
Gregory Farnum [Mon, 19 Aug 2013 22:14:40 +0000 (15:14 -0700)]
Merge pull request #508 from ceph/wip-5905

examples: add a librados/hello_world program

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Loic Dachary <loic@dachary.org>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
11 years agoexamples: add a librados/hello_world program 508/head
Greg Farnum [Thu, 15 Aug 2013 23:16:37 +0000 (16:16 -0700)]
examples: add a librados/hello_world program

This is a simple program with lots of explanatory comments people
can use as a model for using librados.

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoceph: parse CEPH_ARGS environment variable
Sage Weil [Mon, 19 Aug 2013 19:48:50 +0000 (12:48 -0700)]
ceph: parse CEPH_ARGS environment variable

Fixes: #6052
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
11 years agorados pybind: add conf_parse_env()
Sage Weil [Mon, 19 Aug 2013 19:48:40 +0000 (12:48 -0700)]
rados pybind: add conf_parse_env()

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
11 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Mon, 19 Aug 2013 19:41:54 +0000 (12:41 -0700)]
Merge remote-tracking branch 'gh/next'

11 years agodoc/release-notes: v0.61.8
Sage Weil [Mon, 19 Aug 2013 19:41:26 +0000 (12:41 -0700)]
doc/release-notes: v0.61.8

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #513 from dalgaaf/fix/wip-da-documentation
Sage Weil [Mon, 19 Aug 2013 19:32:30 +0000 (12:32 -0700)]
Merge pull request #513 from dalgaaf/fix/wip-da-documentation

Fix documentation issues

11 years agofilestore-config-ref.rst: mark some filestore keys as deprecated 513/head
Danny Al-Gaaf [Mon, 19 Aug 2013 18:56:48 +0000 (20:56 +0200)]
filestore-config-ref.rst: mark some filestore keys as deprecated

Marked the following keys as deprecated since v0.65:
- filestore flusher
- filestore flusher max fds
- filestore sync flush

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
11 years agoMerge pull request #512 from ceph/wip-5988
Sage Weil [Mon, 19 Aug 2013 18:16:57 +0000 (11:16 -0700)]
Merge pull request #512 from ceph/wip-5988

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge branch 'wip-erasure-coded-doc'
Samuel Just [Mon, 19 Aug 2013 18:02:45 +0000 (11:02 -0700)]
Merge branch 'wip-erasure-coded-doc'

11 years agolibrados: synchronous commands should return on commit instead of ack 512/head
Greg Farnum [Mon, 19 Aug 2013 17:29:49 +0000 (10:29 -0700)]
librados: synchronous commands should return on commit instead of ack

This is unlikely to be noticed by anybody, but it is a big change. Document
in the PendingReleaseNotes and bump up the librados minor version number
to 68.

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoMerge pull request #493 from dachary/wip-erasure-coding-doc
athanatos [Mon, 19 Aug 2013 17:28:48 +0000 (10:28 -0700)]
Merge pull request #493 from dachary/wip-erasure-coding-doc

rearrange erasure code documents

Reviewed-by: Samuel Just <sam.just@inktank.com>
11 years agomon: make MonMap error message about unspecified monitors less specific.
Greg Farnum [Mon, 19 Aug 2013 17:21:16 +0000 (10:21 -0700)]
mon: make MonMap error message about unspecified monitors less specific.

The error message helpfully references the -m and -c CLI options for
specifying monitors, but this code can be invoked from non-core librados
client applications so that's unfortunately not kosher. Remove the
reference.

Fixes #5979.

Signed-off-by: Greg Farnum <greg@inktank.com>
11 years agoauth-config-ref.rst: fix signature keys
Danny Al-Gaaf [Mon, 19 Aug 2013 08:33:37 +0000 (10:33 +0200)]
auth-config-ref.rst: fix signature keys

Fix names of cephx signature keys.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
11 years agoobjclass: move cls_log into class_api.cc
Sage Weil [Sat, 17 Aug 2013 21:30:37 +0000 (14:30 -0700)]
objclass: move cls_log into class_api.cc

Not sure why but this seems to resolve a linking problem when loading
classes:

2013-08-17 13:28:19.015776 7fb2bcffa700  0 _load_class could not open class /usr/lib/rados-classes/libcls_hello.so (dlopen failed): /usr/lib/rados-classes/libcls_hello.so: undefined symbol: cls_log
2013-08-17 13:28:19.015786 7fb2bcffa700 -1 osd.4 12 class hello open got (5) Input/output error

In any case, it's simpler.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agodoc/dev/filestore-filesystem-compatibliity: remove outdated xattr notes
Sage Weil [Sat, 17 Aug 2013 18:04:47 +0000 (11:04 -0700)]
doc/dev/filestore-filesystem-compatibliity: remove outdated xattr notes

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #494 from kri5/wip-s3-compliance-doc
Sage Weil [Sat, 17 Aug 2013 18:00:59 +0000 (11:00 -0700)]
Merge pull request #494 from kri5/wip-s3-compliance-doc

doc: complete S3 features status from existing doc page

11 years agoMerge pull request #491 from kri5/wip-clang-compilation
Sage Weil [Sat, 17 Aug 2013 17:59:01 +0000 (10:59 -0700)]
Merge pull request #491 from kri5/wip-clang-compilation

Fix compilation -Wmismatched-tags warnings

Reviewed-by: Loic Dachary <loic@dachary.org>
11 years agodoc: Updated upgrade doc to include dumpling and incorporate ceph-deploy.
John Wilkins [Sat, 17 Aug 2013 17:35:32 +0000 (10:35 -0700)]
doc: Updated upgrade doc to include dumpling and incorporate ceph-deploy.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agoMerge pull request #479 from devoid/fix-5797
Sage Weil [Sat, 17 Aug 2013 17:09:01 +0000 (10:09 -0700)]
Merge pull request #479 from devoid/fix-5797

Document unstable nature of CephFS

11 years agoMakefile: move objclass/*.cc to libosd.la
Sage Weil [Sat, 17 Aug 2013 16:40:44 +0000 (09:40 -0700)]
Makefile: move objclass/*.cc to libosd.la

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agodoc/changelog: add missing file
Sage Weil [Sat, 17 Aug 2013 15:38:55 +0000 (08:38 -0700)]
doc/changelog: add missing file

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoos/FileStore: initialize blk_size on _detect_fs()
Sage Weil [Sat, 17 Aug 2013 15:30:26 +0000 (08:30 -0700)]
os/FileStore: initialize blk_size on _detect_fs()

This was missed by a25d73effb38118602bc73da0aa258c639f69c2c.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agodoc/release-notes: v0.67.1
Sage Weil [Sat, 17 Aug 2013 15:20:00 +0000 (08:20 -0700)]
doc/release-notes: v0.67.1

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #505 from ceph/wip-post-file
Sage Weil [Sat, 17 Aug 2013 06:41:38 +0000 (23:41 -0700)]
Merge pull request #505 from ceph/wip-post-file

ceph-post-file: single command to upload a file to cephdrop

11 years agomds: create only one ESubtreeMap during fs creation
Sage Weil [Sat, 17 Aug 2013 05:08:00 +0000 (22:08 -0700)]
mds: create only one ESubtreeMap during fs creation

Previously we would create an empty ESubtreeMap when we opened the log
segment and then immediately journal a second one that created the root
and mdsdir.  More importantly, for the second ESubtreeMap, we would not
wait for it to commit before requesting the ACTIVE state, leading to
#4894.

Instead, break start_new_segment() into two steps: one that creates the
in-memory LogSegment tracking structure, and one that journals the
ESubtreeMap.  Open things early and write the (one) ESubtreeMap at the
end of boot_create().. and then wait for it.

Fixes: #4894
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
11 years agodoc: quickstart: be more explicit that node == mon node
Sage Weil [Sat, 17 Aug 2013 04:18:21 +0000 (21:18 -0700)]
doc: quickstart: be more explicit that node == mon node

This appears to be one source of confusion for new users that leads to
a failure to form an initial mon quorum.  See comments on

 http://tracker.ceph.com/issues/4924

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agorgw: drain requests before exiting
Yehuda Sadeh [Tue, 13 Aug 2013 20:16:07 +0000 (13:16 -0700)]
rgw: drain requests before exiting

Fixes: #5953
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoceph-post-file: single command to upload a file to cephdrop 505/head
Sage Weil [Sat, 17 Aug 2013 00:59:11 +0000 (17:59 -0700)]
ceph-post-file: single command to upload a file to cephdrop

Use sftp to upload to a directory that only this user and ceph devs can
access.

Distribute an ssh key to connect to the account.  This will let us revoke
the key in the future if we feel the need.  Also distribute a known_hosts
file so that users have some confidence that they are connecting to the
real ceph drop account and not some third party.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
11 years agodoc: Removed old mkcephfs references.
John Wilkins [Sat, 17 Aug 2013 00:31:43 +0000 (17:31 -0700)]
doc: Removed old mkcephfs references.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agodoc: Removed mkcephfs references.
John Wilkins [Sat, 17 Aug 2013 00:28:15 +0000 (17:28 -0700)]
doc: Removed mkcephfs references.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agodoc: Updated script for dumpling.
John Wilkins [Sat, 17 Aug 2013 00:27:53 +0000 (17:27 -0700)]
doc: Updated script for dumpling.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agodoc: Updated APT script for dumpling.
John Wilkins [Sat, 17 Aug 2013 00:27:16 +0000 (17:27 -0700)]
doc: Updated APT script for dumpling.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agodoc: Removed mkcephfs references. Did a bit of clean-up work.
John Wilkins [Sat, 17 Aug 2013 00:26:25 +0000 (17:26 -0700)]
doc: Removed mkcephfs references. Did a bit of clean-up work.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
11 years agoMerge pull request #509 from dmick/wip-rest-conf
Dan Mick [Sat, 17 Aug 2013 00:05:52 +0000 (17:05 -0700)]
Merge pull request #509 from dmick/wip-rest-conf

config_opts: add two ceph-rest-api-only variables for convenience

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoReplicatedPG: add osd_recover_clone_overlap_limit to limit clones
Samuel Just [Thu, 15 Aug 2013 22:35:26 +0000 (15:35 -0700)]
ReplicatedPG: add osd_recover_clone_overlap_limit to limit clones

We don't want to clone_range from clones too many times.
For now, just skip the cloning if there are too many holes.
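
The fallback can be sketched like this. Extent and plan_clone are hypothetical, and the limit is passed in as a plain parameter rather than read from osd_recover_clone_overlap_limit: when the overlap is split into more pieces than the limit allows, nothing is cloned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical extent type: one contiguous overlapping range.
struct Extent {
  uint64_t off;
  uint64_t len;
};

// Decide which ranges to clone from an existing clone. If the overlap is
// too fragmented, return nothing so the caller copies the data instead.
std::vector<Extent> plan_clone(const std::vector<Extent>& overlap,
                               std::size_t limit) {
  if (overlap.size() > limit)
    return std::vector<Extent>();
  return overlap;
}
```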

Fixes: #5985
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoconfig_opts: add two ceph-rest-api-only variables for convenience 509/head
Dan Mick [Fri, 16 Aug 2013 20:15:34 +0000 (13:15 -0700)]
config_opts: add two ceph-rest-api-only variables for convenience

These aren't used by the C++ code at all, but in order for
rados_conf_get to find them, they need to be listed.  They're
consumed by ceph_rest_api.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
11 years agoMerge remote-tracking branch 'upstream/wip-zfs'
Samuel Just [Fri, 16 Aug 2013 23:35:21 +0000 (16:35 -0700)]
Merge remote-tracking branch 'upstream/wip-zfs'

Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #504 from ceph/wip-cls-hello
Sage Weil [Fri, 16 Aug 2013 18:07:02 +0000 (11:07 -0700)]
Merge pull request #504 from ceph/wip-cls-hello

cls/hello: hello, world rados class

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Loic Dachary <loic@dachary.com>
11 years agoosdc/ObjectCacher: do not merge rx buffers
Sage Weil [Fri, 16 Aug 2013 04:48:06 +0000 (21:48 -0700)]
osdc/ObjectCacher: do not merge rx buffers

We do not try to merge rx buffers currently.  Make that explicit and
documented in the code that it is not supported.  (Otherwise the
last_read_tid values will get lost and read results won't get applied
to the cache properly.)

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosdc/ObjectCacher: match reads with their original rx buffers
Sage Weil [Fri, 16 Aug 2013 04:47:18 +0000 (21:47 -0700)]
osdc/ObjectCacher: match reads with their original rx buffers

Consider a sequence like:

 1- start read on 100~200
       100~200 state rx
 2- truncate to 200
       100~100 state rx
 3- start read on 200~200
       100~100 state rx
       200~200 state rx
 4- get 100~200 read result

Currently this makes us crash on

osdc/ObjectCacher.cc: 738: FAILED assert(bh->length() <= start+(loff_t)length-opos)

when processing the second 200~200 bufferhead (it is too big).  The
larger issue, though, is that we should not be looking at this data at
all; it has been truncated away.

Fix this by marking each rx buffer with the read request that is sent to
fill it, and only fill it from that read request.  Then the first reply
will fill the first 100~100 extent but not touch the other extent; the
second read will do that.
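
The fix can be illustrated with a small model of the sequence above. This is a sketch under assumptions: MiniCache and BufferHead here are hypothetical and far simpler than ObjectCacher. Each rx buffer remembers the id of the read sent to fill it, and a reply only fills buffers carrying that id.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified buffer head.
struct BufferHead {
  uint64_t start;
  uint64_t length;
  bool rx;                 // true while a read is in flight
  uint64_t last_read_tid;  // id of the read sent to fill this buffer
};

struct MiniCache {
  std::vector<BufferHead> bhs;
  uint64_t next_tid;
  MiniCache() : next_tid(1) {}

  // Start a read on start~length and tag the new rx buffer with its tid.
  uint64_t start_read(uint64_t start, uint64_t length) {
    assert(length > 0);
    BufferHead bh = {start, length, true, next_tid};
    bhs.push_back(bh);
    return next_tid++;
  }

  // Drop or trim every buffer that extends past new_size.
  void truncate(uint64_t new_size) {
    std::vector<BufferHead> kept;
    for (std::size_t i = 0; i < bhs.size(); ++i) {
      BufferHead bh = bhs[i];
      if (bh.start >= new_size)
        continue;
      if (bh.start + bh.length > new_size)
        bh.length = new_size - bh.start;
      kept.push_back(bh);
    }
    bhs.swap(kept);
  }

  // Apply a read reply. Only rx buffers tagged with this tid and lying
  // inside the reply are filled. Returns the number of buffers filled.
  int handle_read_reply(uint64_t tid, uint64_t start, uint64_t length) {
    int filled = 0;
    for (std::size_t i = 0; i < bhs.size(); ++i) {
      BufferHead& bh = bhs[i];
      if (!bh.rx || bh.last_read_tid != tid)
        continue;  // this buffer is waiting for a different read
      if (bh.start < start || bh.start + bh.length > start + length)
        continue;  // reply does not cover this buffer
      bh.rx = false;
      ++filled;
    }
    return filled;
  }
};
```

Replaying the sequence, the reply to the first read fills only the trimmed 100~100 buffer; the 200~200 buffer stays in rx until its own reply arrives.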

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge branch 'wip-5848-coll'
David Zafman [Fri, 16 Aug 2013 01:32:30 +0000 (18:32 -0700)]
Merge branch 'wip-5848-coll'

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoosd: Add perf tracking for all states in RecoveryState
David Zafman [Thu, 15 Aug 2013 19:28:06 +0000 (12:28 -0700)]
osd: Add perf tracking for all states in RecoveryState

Fixes: #5848
Signed-off-by: David Zafman <david.zafman@inktank.com>
11 years agocls/hello: hello, world rados class 504/head
Sage Weil [Fri, 16 Aug 2013 00:20:43 +0000 (17:20 -0700)]
cls/hello: hello, world rados class

Simple example of a rados class doing read, write, and read/modify/write
methods.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd: enforce RD, WR flags for class methods
Sage Weil [Thu, 15 Aug 2013 23:19:21 +0000 (16:19 -0700)]
osd: enforce RD, WR flags for class methods

Class methods are marked with RD and WR to help the OSD decide when we need
to flush objects or require certain permissions.  Ensure that methods do
not step outside their advertised capabilities by keeping a counter of rd
and wr ops we perform in do_osd_ops() and making sure that class methods,
and any ops they indirectly call, do not break the rules.
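
The counting approach can be sketched as follows. The flag names, OpCounters and check_method are hypothetical, and the choice of -EIO as the error is an assumption, not something the commit states. Counters are sampled before and after the method runs and compared with what the method advertised.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical capability flags advertised by a class method.
enum {
  METHOD_RD = 0x1,
  METHOD_WR = 0x2
};

// Hypothetical counters of read and write ops performed so far.
struct OpCounters {
  int num_read;
  int num_write;
};

// Compare the counters sampled before and after a method call with the
// flags the method advertised. Returns 0 if it stayed within them.
int check_method(int flags, const OpCounters& before,
                 const OpCounters& after) {
  assert(after.num_read >= before.num_read);
  assert(after.num_write >= before.num_write);
  bool did_read = after.num_read > before.num_read;
  bool did_write = after.num_write > before.num_write;
  if (did_read && !(flags & METHOD_RD))
    return -EIO;  // assumed error code: read without advertising RD
  if (did_write && !(flags & METHOD_WR))
    return -EIO;  // assumed error code: write without advertising WR
  return 0;
}
```

Because the counters cover every op performed while the method runs, ops it triggers indirectly are checked as well.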

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agocls_rbd: remove old assign_bid method
Sage Weil [Thu, 15 Aug 2013 22:22:41 +0000 (15:22 -0700)]
cls_rbd: remove old assign_bid method

This method is problematic because it both writes/mutates and returns data,
which means that an untimely client disconnect or peering event will result
in a success to the client with no payload.

It has not been used since v0.52 (18054ba46fe2779d8df8b1a0d69ec93ca6a66c34)
which is pre-bobtail; so this change breaks compatibility with pre-bobtail
librbd clients (at least for image creation).

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agolibrbd: remove mostly-useless assign_bid helper
Sage Weil [Thu, 15 Aug 2013 22:18:51 +0000 (15:18 -0700)]
librbd: remove mostly-useless assign_bid helper

Do it inline.

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoosd: do not return data payload for successful writes
Sage Weil [Thu, 15 Aug 2013 22:06:38 +0000 (15:06 -0700)]
osd: do not return data payload for successful writes

We were somewhat inadvertently returning a data payload for write
operations.  This was a side-effect of the OpContext::ops field being a
reference to MOSDOp::ops: the return data would end up there, and then
the MOSDOpReply ctor would copy it.

Fix this by breaking the ref, and making the do_op() logic also claim
return result data for error values (so that errors can return data to the
caller).

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agocommon/Preforker: shut up warning
Sage Weil [Thu, 15 Aug 2013 21:35:28 +0000 (14:35 -0700)]
common/Preforker: shut up warning

common/Preforker.h: In member function 'void Preforker::daemonize()':
common/Preforker.h:97:40: warning: ignoring return value of 'ssize_t write(int, const void*, size_t)', declared with attribute warn_unused_result [-Wunused-result]
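
The commit message does not say how the warning was silenced. One common way to satisfy warn_unused_result is to consume the return value of write(); the helper below is a hypothetical sketch of that approach, not the change made in Preforker.h.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: write the whole buffer, retrying on EINTR and on
// short writes. Returns 0 on success or a negative errno value.
int write_all(int fd, const char* buf, std::size_t len) {
  assert(buf != nullptr || len == 0);
  while (len > 0) {
    ssize_t r = ::write(fd, buf, len);
    if (r < 0) {
      if (errno == EINTR)
        continue;  // interrupted before writing anything: retry
      return -errno;
    }
    buf += r;
    len -= static_cast<std::size_t>(r);
  }
  return 0;
}
```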

Signed-off-by: Sage Weil <sage@inktank.com>
11 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Fri, 16 Aug 2013 00:21:00 +0000 (17:21 -0700)]
Merge remote-tracking branch 'gh/next'

11 years agoMerge pull request #506 from dmick/wip-admin-daemon
Sage Weil [Fri, 16 Aug 2013 00:14:23 +0000 (17:14 -0700)]
Merge pull request #506 from dmick/wip-admin-daemon

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoceph.in: --admin-daemon was not returning EINVAL on bad command 506/head
Dan Mick [Fri, 16 Aug 2013 00:10:56 +0000 (17:10 -0700)]
ceph.in: --admin-daemon was not returning EINVAL on bad command

Fix by restructuring code to hoist common code and have only one
place where admin_socket is actually called.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
11 years agoMerge pull request #507 from ceph/wip-4635.master
João Eduardo Luís [Thu, 15 Aug 2013 22:54:10 +0000 (15:54 -0700)]
Merge pull request #507 from ceph/wip-4635.master

Bunch of tidying up on monitor services & fix #4635

Reviewed-by: Sage Weil <sage@inktank.com>