git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years ago mon: Single-paxos and key/value store support
Joao Eduardo Luis [Mon, 11 Jun 2012 13:55:21 +0000 (14:55 +0100)]
mon: Single-paxos and key/value store support

We are converting the monitor subsystem to a Single-Paxos architecture,
backed by a key/value store. The previous architecture used a Paxos
instance for each Paxos Service, backed by a nasty Monitor Store that
provided few to no consistency guarantees whatsoever, which led to a fair
amount of workarounds.

Changes:

* Paxos:
  - Add k/v store support
  - Add documentation describing the new Paxos storage layout and behavior
  - Get rid of the stashing code, which was used as a consistency point
    mechanism (we no longer need it, because of our k/v store)
  - Debug level of 30 will output json-formatted transaction dumps
  - Allows for proposal queueing, to be proposed in the same order as
    they were queued.
  - No more 'is_leader()' function, using instead the Monitor's for
    enhanced simplicity.
  - Add 'is_lease_valid()' function.
  - Disregard 'stashed versions'
  - Make the paxos 'state' variable a bit-map, so we lock the proposal
    mechanism while maintaining the state [5].
  - Related notes: [3]

* PaxosService:
  - Add k/v store support, creating wrappers to be used by the services
  - Add documentation
  - Support single-paxos behavior, creating wrappers to be used by the
    services and service-specific version
  - Rearrange variables so they are neatly organized in the beginning of
    the class
  - Add a trim_to() function to be used by the services, instead of letting
    them rely on Paxos::trim_to(), which is no longer adequate to the job
    at hand
  - Debug level of 30 will output json-formatted transaction dumps
  - Support proposal queueing, taking it into consideration when
    assessing the current state of the service (active, writeable,
    readable, ...)
  - Redefine the conditions for 'is_{active,readable,writeable}()' given
    the new single-paxos approach, with proposal queueing [1].
  - Use our own waiting_for_* callback lists, which now must be
    dissociated from their Paxos counterparts [2].
  - Related notes: [3], [4]

* Monitor:
  - Add k/v store support
  - Use only one Paxos instance and pass it down to each service instance
  - Crank up CEPH_MON_PROTOCOL to 10

* {Auth,Log,MDS,Monmap,OSD,PG}Monitor:
  - Add k/v store support
  - Add single-paxos support

* AuthMonitor:
  - Don't always propose full versions: if the KeyServer doesn't have
    keys, we cannot propose a full version. This should only happen when
    we start with a brand new store and we are creating the first
    pending proposal, and if we were to commit a full version filled
    with nothing but a big void of nothingness, we could eventually end
    up with a corrupted version.

* Elector:
  - Add k/v store support
  - Add single-paxos support

* ceph-mon:
  - Use the monitor's k/v store instead of MonitorStore

* MMonPaxos:
  - remove the machine_id field: This field was used to identify from/to
    which paxos service a given message belonged. We no longer have a Paxos
    for each service, so this field became obsolete.

Notes:

[1] Redefine the conditions for 'is_{active,readable,writeable}()' on
    the PaxosService class, to be used with single-paxos and proposal
    queueing:

  We should not rely on the Paxos::is_*() functions, since they do not apply
  directly to the PaxosService.

  All the PaxosService classes share the same Paxos class, but they do not
  rely on its values. Each service relies on, uses, and updates only its own
  values on the k/v store. Thus, we may have a given service (e.g., the
  OSDMonitor) proposing a new value, hence updating or waiting to update its
  store, and we may still consider the LogMonitor as being able to read and
  write its own values on the k/v store. In a nutshell, different services
  do not overlap on their access to their own store when it comes to reading,
  and since the Paxos will queue their updates and deal with them in a FIFO
  order, their updates won't overlap either.

  Therefore, the conditions for the PaxosService::is_{active,readable,
  writeable} differ from those on the Paxos::is_{active,readable,writeable}.

  * PaxosService::is_active() - the PaxosService will be considered as
  active iff it is not proposing and the Paxos is not recovering. This
  means that a given PaxosService (e.g., the OSDMonitor) may be considered
  as being active even though some other service (e.g., the LogMonitor) is
  proposing a new value and the Paxos is in the UPDATING state. This means
  that the OSDMonitor will be able to read its own versions and queue any
  changes on to the Paxos. However, if the Paxos is in the RECOVERING state,
  we cannot be considered as active.

  * PaxosService::is_writeable() - We will be able to propose new values
  iff we are the Leader, we have a valid lease, and we are not already
  proposing. If we are proposing, we must wait for our proposal to finish
  before writing to our k/v store; otherwise we could assume that our
  last committed version was, say, 10; then assign map epochs/versions
  based on that, make changes to the store based on those values, and end
  up clobbering previously proposed values on the store. We really don't
  want that. To be fair, we could probably assume we are always writeable,
  but there may be unforeseen consequences to this; so we take the
  conservative approach here for now, and we will relax it in the future
  if we believe it to be fruitful.

  * PaxosService::is_readable() - We will be readable iff we are not
  proposing and the Paxos is not recovering; if our last committed version
  exists; and if we are either a cluster of one or we have a valid lease.
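
  The three conditions above can be sketched as plain predicates. This is
  only an illustration: PaxosView, ServiceView and all of their fields are
  hypothetical stand-ins, not the real Paxos/PaxosService members.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; these are not the real Ceph classes.
struct PaxosView {
  bool recovering = false;   // Paxos is in the RECOVERING state
  bool lease_valid = false;  // what an is_lease_valid() check would report
};

struct ServiceView {
  bool proposing = false;              // this service has a proposal in flight
  bool is_leader = false;              // asked of the Monitor, not of Paxos
  bool last_committed_exists = false;  // last committed version is in the store
  unsigned cluster_size = 1;           // number of monitors (assumed field)

  // Active iff we are not proposing and the Paxos is not recovering.
  bool is_active(const PaxosView& p) const {
    return !proposing && !p.recovering;
  }

  // Writeable iff we are the Leader, hold a valid lease, and are not proposing.
  bool is_writeable(const PaxosView& p) const {
    return is_leader && p.lease_valid && !proposing;
  }

  // Readable iff active, our last committed version exists, and we are
  // either a cluster of one or hold a valid lease.
  bool is_readable(const PaxosView& p) const {
    assert(cluster_size >= 1);
    return !proposing && !p.recovering && last_committed_exists &&
           (cluster_size == 1 || p.lease_valid);
  }
};
```

  Note that another service's proposal never appears in these predicates;
  only our own 'proposing' flag and the shared Paxos recovery/lease state do.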

[2] Use own waiting_for_* callback lists on PaxosService, which now must
    be dissociated from their Paxos counterparts:

  We were relying on Paxos to wait for state changes, but since our state
  became somewhat independent from the Paxos state, we have to deal with
  callbacks waiting for 'readable', 'writable' or 'active' on different
  terms than those that Paxos provides.

  So, basically, we will take one of two approaches when it comes to waiting:

  * If we are proposing, queue ourselves on our own list, waiting for the
  proposal to finish;
  * Otherwise, the cause for the need to wait comes from Paxos, so queue
  the callback directly on Paxos.

  This approach means that we must make sure to check our desired state
  whenever the callback is fired up, and re-queue ourselves if the state
  didn't quite change (or if it changed but our waiting condition result
  didn't). For instance, if we were waiting for a proposal to finish due to
  a failed 'is_active()', we will need to recheck if we are active before
  continuing once the callback is fired. This is mainly because we may have
  finished our proposal, but a new Election may have been called and the
  Paxos may not be active.
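
  A minimal sketch of this wait-and-recheck pattern, assuming a much
  simplified model: WaitModel, its two waiter lists and the
  proposal_finished()/paxos_became_active() hooks are hypothetical names
  used only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <list>

// Hypothetical, simplified model; names are illustrative only.
struct WaitModel {
  bool proposing = false;    // our own proposal is in flight
  bool paxos_active = true;  // false, e.g., while an election is running
  std::list<std::function<void()>> service_waiters;  // our own list
  std::list<std::function<void()>> paxos_waiters;    // queued on Paxos

  bool is_active() const { return !proposing && paxos_active; }

  // Run `work` once we are active. The queued callback re-checks the
  // condition when fired and re-queues itself if it still does not hold.
  void wait_for_active(std::function<void()> work) {
    if (is_active()) {
      work();
      return;
    }
    auto retry = [this, work]() { wait_for_active(work); };
    if (proposing)
      service_waiters.push_back(retry);  // wait for our proposal to finish
    else
      paxos_waiters.push_back(retry);    // the cause lies with Paxos
  }

  // Fire every callback currently on a list; callbacks may re-queue.
  static void fire(std::list<std::function<void()>>& waiters) {
    std::list<std::function<void()>> pending;
    pending.swap(waiters);
    for (auto& cb : pending) cb();
  }

  void proposal_finished() {
    assert(proposing);
    proposing = false;
    fire(service_waiters);
  }

  void paxos_became_active() {
    paxos_active = true;
    fire(paxos_waiters);
  }
};
```

  In this model, a callback queued while proposing moves to the Paxos list
  if an election is still in progress when the proposal finishes, and only
  runs once Paxos is active again.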

[3] Propose everything in the queue before bootstrapping, but don't
    allow new proposals:

  The MonmapMonitor may issue bootstraps once it is updated. We must ensure
  that we propose every single pending proposal before we actually do it.

  However, we don't want to accept new proposals once we are going to
  bootstrap; otherwise, we may end up losing proposals.
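
  The intended ordering could look roughly like the sketch below.
  ProposalQueue and its members are hypothetical, and "proposing" is
  reduced to moving an entry from one container to another.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model of the proposal queue; not the real Paxos code.
struct ProposalQueue {
  std::deque<std::string> pending;    // queued proposals, FIFO
  std::vector<std::string> proposed;  // stand-in for what Paxos proposed
  bool going_to_bootstrap = false;

  // Refuse new proposals once a bootstrap has been decided.
  bool queue_proposal(const std::string& p) {
    if (going_to_bootstrap) return false;
    pending.push_back(p);
    return true;
  }

  // Close the queue first, then propose everything already queued, in order.
  void prepare_bootstrap() {
    going_to_bootstrap = true;
    while (!pending.empty()) {
      proposed.push_back(pending.front());
      pending.pop_front();
    }
    assert(pending.empty());
  }
};
```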

[4] Handle the case when first_committed_version equals 0 on a
    PaxosService

  In a nutshell, the services do not set the first committed version, as
  they consider it as a SEP (Somebody Else's Problem). They do rely on it
  though, and we, the PaxosService, must ensure that it contains a valid
  value (that is, higher than zero) at all times.

  Since we will only have a first_committed version equal to zero once,
  and that is before the service's first proposal, we are safe to simply
  read the variable from the store and assign the first_committed the same
  value as the last_committed iff the first_committed version is zero.

  This also affects trimming, since trimming relies on the first_committed
  version as the lower bound for version trimming. Even though the k/v store
  will gracefully ignore any problem from trying to remove non-existent
  versions, the main issue would still stand: we'd be removing a non-existent
  version and that just doesn't make any sense.
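
  As a rough illustration of the fix-up described above, assuming a
  std::map stands in for the k/v store and that the key names
  "first_committed" and "last_committed" are placeholders:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using version_t = std::uint64_t;                   // assumed alias
using KVStore = std::map<std::string, version_t>;  // hypothetical store

// Missing keys read as version 0.
version_t get_or_zero(const KVStore& store, const std::string& key) {
  auto it = store.find(key);
  return it == store.end() ? 0 : it->second;
}

struct Versions {
  version_t first_committed;
  version_t last_committed;
};

// Read both versions; if first_committed is still zero, give it the same
// value as last_committed so that it is a valid lower bound for trimming.
Versions load_versions(const KVStore& store) {
  Versions v{get_or_zero(store, "first_committed"),
             get_or_zero(store, "last_committed")};
  if (v.first_committed == 0)
    v.first_committed = v.last_committed;
  assert(v.first_committed <= v.last_committed);
  return v;
}
```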

[5] 'lock' paxos when we are running some internal proposals

  Force the paxos services to wait for us to complete whatever we are
  doing before they can proceed.  This is required because on certain
  occasions we might need to run internal proposals that are not associated
  with any of the paxos services (for instance, when learning an old value),
  and we need the services to stay put, or they might end up in an erroneous
  state and crash the monitor.

  This could have been done with an extra bool, but there was no point
  in creating a new variable when we can just as easily reuse the
  'state' variable for our twisted interests.
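
  A small sketch of the idea, with the caveat that the flag names and bit
  values below are assumptions and do not claim to match the actual
  Paxos 'state' encoding:

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit assignments, for illustration only.
enum : std::uint32_t {
  STATE_RECOVERING = 1u << 0,
  STATE_ACTIVE     = 1u << 1,
  STATE_UPDATING   = 1u << 2,
  STATE_LOCKED     = 1u << 3,  // hypothetical 'lock' bit
};

struct PaxosState {
  std::uint32_t state = STATE_ACTIVE;

  // Setting the lock bit leaves the underlying state bits untouched.
  void lock() { state |= STATE_LOCKED; }

  void unlock() {
    assert(is_locked());
    state &= ~static_cast<std::uint32_t>(STATE_LOCKED);
  }

  bool is_locked() const { return (state & STATE_LOCKED) != 0; }
  bool is_active() const { return (state & STATE_ACTIVE) != 0; }

  // Services may only propose while Paxos is active and not locked.
  bool may_propose() const { return is_active() && !is_locked(); }
};
```

  Because the lock is just one more bit, the previous state survives a
  lock()/unlock() pair without needing a separate bool.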

Fixes: #4175
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years ago mon: Monitor: keyring always on mon_data/keyring by default
Joao Eduardo Luis [Mon, 11 Jun 2012 14:13:05 +0000 (15:13 +0100)]
mon: Monitor: keyring always on mon_data/keyring by default

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years ago mon: MonitorDBStore: Add a key/value store to be used in the monitor
Joao Eduardo Luis [Tue, 12 Jun 2012 22:51:10 +0000 (23:51 +0100)]
mon: MonitorDBStore: Add a key/value store to be used in the monitor

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years ago auth: cephx: KeyServer: add 'has_secrets()' function
Joao Eduardo Luis [Thu, 21 Jun 2012 00:23:03 +0000 (01:23 +0100)]
auth: cephx: KeyServer: add 'has_secrets()' function

We need this in the AuthMonitor to assess if we should encode a full version
of the KeyServer and submit it to the Paxos along with the incrementals.

Not performing such a check will lead us to an erroneous version, from which
we won't be able to recover.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years ago mon: Remove global version code introduced around bobtail's release
Joao Eduardo Luis [Mon, 21 Jan 2013 18:45:03 +0000 (18:45 +0000)]
mon: Remove global version code introduced around bobtail's release

This patch reverts most of the global version (gv) related patches that
were introduced around bobtail's release as a prelude to the single-paxos
patches.

The gv infrastructure allowed us to gather version information on the
monitors, essential to the move to a single-paxos implementation on
existing clusters -- this means that for an existing cluster to upgrade
to a single-paxos monitor, it will first have to be upgraded to a
version prior to this patch.  This patch strips the monitor subsystem of
all the gv-related code that is of no use for upcoming versions.

Furthermore, from this patch onwards until all single-paxos patches
are merged, ceph-mon won't work as expected, and may not compile at some
point in the git history.

These patches are not retro-compatible, and the monitors are not expected
to work with earlier versions.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
12 years ago osd: update snap collections for sub_op_modify log records conditionally
Sage Weil [Mon, 11 Feb 2013 14:23:54 +0000 (06:23 -0800)]
osd: update snap collections for sub_op_modify log records conditionally

The only remaining caller is sub_op_modify().  If we do have a non-empty
op transaction, we want to do this update, regardless of what we think
last_backfill is (our notion may not be completely in sync with the
primary).  In particular, our last_backfill may be the same object but
a different snapid, but the primary disagrees and is pushing an op
transaction through.

Instead, update the collections if we have a non-empty transaction.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago osd: include snaps in pg_log_entry_t::dump()
Sage Weil [Mon, 11 Feb 2013 01:02:45 +0000 (17:02 -0800)]
osd: include snaps in pg_log_entry_t::dump()

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago osd: unconditionally encode snaps buffer
Sage Weil [Mon, 11 Feb 2013 00:59:48 +0000 (16:59 -0800)]
osd: unconditionally encode snaps buffer

Previously we would only encode the updated snaps vector for CLONE ops.
This doesn't work for MODIFY ops generated by the snap trimmer, which
may also adjust the clone collections.  It is also possible that other
operations may need to populate this field in the future (e.g.,
LOST_REVERT may, although it currently does not).

Fixes: #4071, and possibly #4051.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago osd: improve debug output on snap collections
Sage Weil [Sun, 10 Feb 2013 18:57:12 +0000 (10:57 -0800)]
osd: improve debug output on snap collections

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago Revert "rgw: plain format always appends eol to data"
Yehuda Sadeh [Mon, 11 Feb 2013 19:30:02 +0000 (11:30 -0800)]
Revert "rgw: plain format always appends eol to data"

This commit breaks the swift unit test. The reason is that it
makes it so that returned error status ends with eol, which
is not as expected.

This reverts commit c31aff5f9f0b9fe4ada6b259dd1f424627b3e875.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years ago qa/rbd: +x on map-snapshot-io.sh
Sage Weil [Mon, 11 Feb 2013 16:48:44 +0000 (08:48 -0800)]
qa/rbd: +x on map-snapshot-io.sh

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago qa: fix iogen script
Sage Weil [Thu, 7 Feb 2013 06:01:24 +0000 (22:01 -0800)]
qa: fix iogen script

Wait 10 minutes and then stop.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 75c40fac603a3d21407d326e9faa8883166ad035)

12 years ago osd: do not spam system log on successful read_log
Sage Weil [Wed, 6 Feb 2013 17:02:54 +0000 (09:02 -0800)]
osd: do not spam system log on successful read_log

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 1948a02bc888fadafc29cf2e6f0a92129c68fd4c)

12 years ago rgw: plain format always appends eol to data
Yehuda Sadeh [Fri, 8 Feb 2013 21:16:36 +0000 (13:16 -0800)]
rgw: plain format always appends eol to data

Previously we only prepended the eol to the next line. Always append it
instead, so that the last line also gets an eol.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years ago rgw: change json formatting for swift list container
Yehuda Sadeh [Fri, 8 Feb 2013 21:14:49 +0000 (13:14 -0800)]
rgw: change json formatting for swift list container

Fixes: #4048
There is some difference in the way swift formats the
xml output and the json output for list container. In
xml the entity is named 'name' and in json it is named
'subdir'.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
12 years ago osd: fix load_pgs collection handling
Sage Weil [Sat, 9 Feb 2013 08:05:33 +0000 (00:05 -0800)]
osd: fix load_pgs collection handling

On a _TEMP pg, is_pg() would succeed, which meant we weren't actually
hitting the cleanup checks.  Instead, restructure this loop as positive
checks and handle each type of collection we understand.

This fixes _TEMP cleanup.
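
The "positive checks" structure can be sketched as a classifier that names
each collection type it understands and treats everything else as unknown.
This is an illustration only: the "<pgid>_head", "<pgid>_TEMP" and
"<pgid>_<hex snapid>" name layout is an assumption, and CollKind and
classify() are hypothetical names, not the OSD's actual code.

```cpp
#include <cassert>
#include <cctype>
#include <string>

enum class CollKind { Head, Temp, Snap, Unknown };

// Classify a collection name by its suffix (assumed naming scheme).
CollKind classify(const std::string& name) {
  const std::string::size_type pos = name.rfind('_');
  if (pos == std::string::npos || pos == 0 || pos + 1 == name.size())
    return CollKind::Unknown;

  const std::string suffix = name.substr(pos + 1);
  assert(!suffix.empty());

  // Check _TEMP explicitly so it is never mistaken for a regular pg.
  if (suffix == "TEMP") return CollKind::Temp;
  if (suffix == "head") return CollKind::Head;

  // Anything else must look like a snap id to count as a snap collection.
  for (unsigned char c : suffix)
    if (!std::isxdigit(c)) return CollKind::Unknown;
  return CollKind::Snap;
}
```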

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago osd: fix load_pgs handling of pg dirs without a head
Sage Weil [Sat, 9 Feb 2013 08:04:29 +0000 (00:04 -0800)]
osd: fix load_pgs handling of pg dirs without a head

If there is a pgid that passes coll_t::is_pg() but there is no head, we
will populate the pgs map but then fail later when we try to do
read_state.  This is a side-effect of 55f8579.

Take explicit note of _head collections we see, and then warn when we
find stray snap collections.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago OSD::load_pgs: first scan colls before initing PGs
Samuel Just [Thu, 7 Feb 2013 21:34:47 +0000 (13:34 -0800)]
OSD::load_pgs: first scan colls before initing PGs

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years ago mon: fix typo in C_Stats
Sage Weil [Fri, 8 Feb 2013 17:59:25 +0000 (09:59 -0800)]
mon: fix typo in C_Stats

Broken by previous commit.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago mon: retry PGStats message on EAGAIN
Sage Weil [Fri, 8 Feb 2013 07:13:11 +0000 (23:13 -0800)]
mon: retry PGStats message on EAGAIN

If we get EAGAIN from a paxos restart/election/whatever, we should
restart the message instead of just blindly acking it.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Luis <joao.luis@inktank.com>
12 years ago mon: handle -EAGAIN in completion contexts
Sage Weil [Fri, 8 Feb 2013 06:06:14 +0000 (22:06 -0800)]
mon: handle -EAGAIN in completion contexts

We can get ECANCELED, EAGAIN, or success out of the completion contexts,
but in the EAGAIN case (meaning there was an election) we were sending
a success to the client.  This resulted in client hangs and all-around
confusion when the monitor cluster was thrashing.
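
A minimal sketch of the distinction, assuming a completion context receives
a single int result. The Action enum and on_completion() are hypothetical,
and dropping the request silently on ECANCELED is an assumption rather than
something this message states.

```cpp
#include <cassert>
#include <cerrno>

// What a completion context might decide to do with its result code.
enum class Action { ReplySuccess, Retry, Drop };

Action on_completion(int r) {
  if (r >= 0)
    return Action::ReplySuccess;  // committed: tell the client it worked
  if (r == -EAGAIN)
    return Action::Retry;         // there was an election: do not ack, retry
  assert(r == -ECANCELED);        // the only other result mentioned above
  return Action::Drop;            // assumed handling for cancelled requests
}
```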

Backport: bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Luis <joao.luis@inktank.com>
12 years ago radosgw-admin: fix cli test
Sage Weil [Fri, 8 Feb 2013 06:51:29 +0000 (22:51 -0800)]
radosgw-admin: fix cli test

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago ReplicatedPG: check store for temp collection in have_temp_coll
Samuel Just [Thu, 7 Feb 2013 19:53:28 +0000 (11:53 -0800)]
ReplicatedPG: check store for temp collection in have_temp_coll

We may not have "created" the temp collection since OSD restart
before removing the PG.  have_temp_coll must also look at the
OSD store.  Currently, the only user is pg removal, so the
extra work is acceptable.

Backport: bobtail
Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago rgw: a tool to fix clobbered bucket info in user's bucket list
Yehuda Sadeh [Thu, 7 Feb 2013 01:10:00 +0000 (17:10 -0800)]
rgw: a tool to fix clobbered bucket info in user's bucket list

This fixes bad entries in user's bucket list that may have occurred
due to issue #4039. Syntax:

 $ radosgw-admin user check --uid=<uid> [--fix]

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 9cb6c33f0e2281b66cc690a28e08459f2e62ca13)

Conflicts:
src/rgw/rgw_admin.cc

12 years ago rgw: bucket recreation should not clobber bucket info
Yehuda Sadeh [Thu, 7 Feb 2013 00:43:48 +0000 (16:43 -0800)]
rgw: bucket recreation should not clobber bucket info

Fixes: #4039
User's list of buckets is getting modified even if bucket already
exists. This fix removes the newly created directory object, and
makes sure that user info's data points at the correct bucket.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 9d006ec40ced9d97b590ee07ca9171f0c9bec6e9)

Conflicts:
src/rgw/rgw_op.cc
src/rgw/rgw_rados.cc

12 years ago Merge branch 'wip-cephtool' into next
Dan Mick [Thu, 7 Feb 2013 21:09:28 +0000 (13:09 -0800)]
Merge branch 'wip-cephtool' into next

Usage/errmsg fixups for the ceph CLI tool

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
12 years ago ceph: fix 'pg' error message to direct user toward better input
Dan Mick [Thu, 7 Feb 2013 00:27:39 +0000 (16:27 -0800)]
ceph: fix 'pg' error message to direct user toward better input

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years ago mds: error messages for export_dir said 'migrate_dir'
Dan Mick [Wed, 6 Feb 2013 06:17:00 +0000 (22:17 -0800)]
mds: error messages for export_dir said 'migrate_dir'

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years ago ceph: ceph mon delete doesn't exist; ceph mon remove is the command
Dan Mick [Tue, 5 Feb 2013 04:40:12 +0000 (20:40 -0800)]
ceph: ceph mon delete doesn't exist; ceph mon remove is the command
Fix up cli test as well (doc is already correct)

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years ago osd: fix name of setomapval admin-daemon command
Dan Mick [Thu, 31 Jan 2013 23:07:14 +0000 (15:07 -0800)]
osd: fix name of setomapval admin-daemon command

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years ago ceph: use "config set" consistently in help/error msgs
Dan Mick [Thu, 31 Jan 2013 22:26:10 +0000 (14:26 -0800)]
ceph: use "config set" consistently in help/error msgs

Apparently it was once known as set_config.  Fix up everything to
refer to the new name.  Also, fix up the help message.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years ago PG: dirty_info on handle_activate_map
Samuel Just [Thu, 7 Feb 2013 18:38:00 +0000 (10:38 -0800)]
PG: dirty_info on handle_activate_map

We need to make sure the pg epoch is persisted during
activate_map.

Backport: bobtail
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Samuel Just <sam.just@inktank.com>
12 years ago osd: flush peering queue (consume maps) prior to boot
Sage Weil [Thu, 7 Feb 2013 18:21:49 +0000 (10:21 -0800)]
osd: flush peering queue (consume maps) prior to boot

If the osd itself is behind on many maps during boot, it will fetch more and
(as part of that) flush the peering wq to ensure the pgs consume them.
However, it is possible for the OSD to have the latest/recent maps while the
pgs are behind, and to jump directly to boot and join.  The OSD is then
laggy and unresponsive because the peering wq is way behind.

To avoid this, call consume_map() (kick the peering wq) at the end of
init and flush it to ensure we are *internally* all caught up before we
consider joining the cluster.

I'm pretty sure this is the root cause of #3905 and possibly #3995.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago rgw: a tool to fix buckets with leaked multipart references
Yehuda Sadeh [Tue, 5 Feb 2013 22:50:54 +0000 (14:50 -0800)]
rgw: a tool to fix buckets with leaked multipart references

Checks the specified bucket for the #4011 symptoms, and optionally fixes
the issue.

Syntax:
  radosgw-admin bucket check --bucket=<bucket> [--fix]

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 2d8faf8e5f15e833e6b556b0f3c4ac92e4a4151e)

Conflicts:
src/rgw/rgw_admin.cc
src/rgw/rgw_rados.h

12 years ago rgw: radosgw-admin object unlink
Yehuda Sadeh [Tue, 5 Feb 2013 21:54:11 +0000 (13:54 -0800)]
rgw: radosgw-admin object unlink

Add a radosgw-admin option to remove object from bucket index

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 16235a7acb9543d60470170bb2a09956364626cd)

Conflicts:
src/rgw/rgw_admin.cc
src/rgw/rgw_rados.h
src/test/cli/radosgw-admin/help.t

12 years ago mon: check correct length of command
Dan Mick [Tue, 5 Feb 2013 21:27:40 +0000 (13:27 -0800)]
mon: check correct length of command

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years ago os: default to 'journal aio = true'
Sage Weil [Tue, 5 Feb 2013 18:29:11 +0000 (10:29 -0800)]
os: default to 'journal aio = true'

Hooray, testing indicates this is a win!

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago Merge pull request #36 from cmello/master
Gregory Farnum [Tue, 5 Feb 2013 18:20:18 +0000 (10:20 -0800)]
Merge pull request #36 from cmello/master

libexpat dependency

12 years ago Merge pull request #38 from alram/master
John Wilkins [Tue, 5 Feb 2013 17:57:32 +0000 (09:57 -0800)]
Merge pull request #38 from alram/master

Fixes in ./docs/radosgw/config.rst

12 years ago Edit endpoint-create in ./doc/radosgw/config.rst 38/head
Alexandre Marangone [Tue, 5 Feb 2013 05:20:07 +0000 (21:20 -0800)]
Edit endpoint-create in ./doc/radosgw/config.rst

internalurl and adminurl are mandatory. Typo in publicurl.

12 years ago Edit rgw keystone url in ./doc/radosgw/config.rst
Alexandre Marangone [Tue, 5 Feb 2013 05:14:54 +0000 (21:14 -0800)]
Edit rgw keystone url in ./doc/radosgw/config.rst

Won't work with the public port, it needs to be the admin port.

12 years ago Note on host in ./doc/radosgw/config.rst
Alexandre Marangone [Tue, 5 Feb 2013 05:09:37 +0000 (21:09 -0800)]
Note on host in ./doc/radosgw/config.rst

Some people have configured host with an FQDN or an IP, which
prevents '/etc/init.d/radosgw start' from launching the daemon.

12 years ago doc: Updated to note bobtail supports RGW + Keystone.
John Wilkins [Tue, 5 Feb 2013 00:42:03 +0000 (16:42 -0800)]
doc: Updated to note bobtail supports RGW + Keystone.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago cli test: add pg deep-scrub option to test
Gary Lowell [Mon, 4 Feb 2013 22:14:45 +0000 (14:14 -0800)]
cli test: add pg deep-scrub option to test

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
12 years ago Add "pg deep-scrub..." missing from ceph usage output
David Zafman [Mon, 4 Feb 2013 19:45:49 +0000 (11:45 -0800)]
Add "pg deep-scrub..." missing from ceph usage output

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years ago rgw: fix setting of NULL to string
Yehuda Sadeh [Fri, 1 Feb 2013 18:56:11 +0000 (10:56 -0800)]
rgw: fix setting of NULL to string

Fixes: #3777
s->env->get() returns char * and not string and can return NULL.
Also, remove some old unused code.
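
For context, constructing a std::string from a null char pointer is
undefined behavior, so a possibly-NULL result has to be checked before it is
assigned. A minimal sketch, in which Env, its get() and get_or_default()
are hypothetical stand-ins and not the rgw types:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical environment lookup that, like the get() described above,
// returns a C string and may return a null pointer.
struct Env {
  std::map<std::string, std::string> vars;

  const char* get(const std::string& name) const {
    auto it = vars.find(name);
    return it == vars.end() ? nullptr : it->second.c_str();
  }
};

// Only build a std::string from the pointer after checking it for null.
std::string get_or_default(const Env& env, const std::string& name,
                           const std::string& def = "") {
  assert(!name.empty());
  const char* value = env.get(name);
  return value ? std::string(value) : def;
}
```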

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years ago xattr_bench.cc: remove twice included <time.h>
Danny Al-Gaaf [Mon, 4 Feb 2013 16:54:09 +0000 (17:54 +0100)]
xattr_bench.cc: remove twice included <time.h>

Cleanup includes, remove twice included <time.h>.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago ceph-filestore-dump.cc: remove twice included <iostream>
Danny Al-Gaaf [Mon, 4 Feb 2013 16:54:08 +0000 (17:54 +0100)]
ceph-filestore-dump.cc: remove twice included <iostream>

Cleanup includes, remove twice included <iostream>.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago testmsgr.cc: remove twice included <sys/stat.h>
Danny Al-Gaaf [Mon, 4 Feb 2013 16:54:07 +0000 (17:54 +0100)]
testmsgr.cc: remove twice included <sys/stat.h>

Cleanup includes, remove twice included <sys/stat.h>.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago perf_counters.cc: remove twice included header files
Danny Al-Gaaf [Mon, 4 Feb 2013 16:54:06 +0000 (17:54 +0100)]
perf_counters.cc: remove twice included header files

Cleanup includes, remove twice included "global/global_init.h" and
"common/ceph_context.h".

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago testxattr.cc: remove twice included <iostream>
Danny Al-Gaaf [Mon, 4 Feb 2013 16:54:05 +0000 (17:54 +0100)]
testxattr.cc: remove twice included <iostream>

Cleanup includes, remove twice included <iostream>.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago workload_generator.cc: remove twice included "common/debug.h"
Danny Al-Gaaf [Mon, 4 Feb 2013 16:54:04 +0000 (17:54 +0100)]
workload_generator.cc: remove twice included "common/debug.h"

Cleanup includes, remove twice included "common/debug.h"

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago test_idempotent.cc: remove twice included "os/FileStore.h"
Danny Al-Gaaf [Mon, 4 Feb 2013 16:54:03 +0000 (17:54 +0100)]
test_idempotent.cc: remove twice included "os/FileStore.h"

Cleanup includes, remove twice included "os/FileStore.h".

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago tp_bench.cc: remove twice included <iostream>
Danny Al-Gaaf [Mon, 4 Feb 2013 16:54:02 +0000 (17:54 +0100)]
tp_bench.cc: remove twice included <iostream>

Cleanup includes, remove twice included <iostream>.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago small_io_bench*.cc: remove twice included <iostream>
Danny Al-Gaaf [Mon, 4 Feb 2013 16:54:01 +0000 (17:54 +0100)]
small_io_bench*.cc: remove twice included <iostream>

Cleanup includes, remove twice included <iostream>.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago MDS.cc: remove twice included common/errno.h
Danny Al-Gaaf [Mon, 4 Feb 2013 16:54:00 +0000 (17:54 +0100)]
MDS.cc: remove twice included common/errno.h

Cleanup includes, remove twice included common/errno.h.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years ago mon: enforce reweight be between 0..1
Sage Weil [Mon, 4 Feb 2013 17:14:39 +0000 (09:14 -0800)]
mon: enforce reweight be between 0..1
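
A range check of this kind might look like the sketch below. set_reweight()
is a hypothetical helper; treating both bounds as inclusive and returning
-EINVAL on rejection are assumptions.

```cpp
#include <cassert>
#include <cerrno>

// Accept a reweight value only if it lies within [0, 1].
int set_reweight(double requested, double* weight) {
  assert(weight != nullptr);
  if (!(requested >= 0.0 && requested <= 1.0))
    return -EINVAL;  // written this way so that NaN is rejected as well
  *weight = requested;
  return 0;
}
```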

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Luis <joao.luis@inktank.com>
12 years ago qa: smalliobenchrbd workunit
Sage Weil [Sun, 3 Feb 2013 17:28:22 +0000 (09:28 -0800)]
qa: smalliobenchrbd workunit

Run a bunch of parallel smalliobenchrbd processes.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years ago Merge remote-tracking branch 'gh/wip-rbd-bench'
Sage Weil [Sun, 3 Feb 2013 16:59:48 +0000 (08:59 -0800)]
Merge remote-tracking branch 'gh/wip-rbd-bench'

Conflicts:
ceph.spec.in
debian/ceph-test.install
src/.gitignore

12 years ago Merge branch 'wip-rpm-update3'
Gary Lowell [Sat, 2 Feb 2013 07:26:21 +0000 (23:26 -0800)]
Merge branch 'wip-rpm-update3'

Patches to ceph.spec.in and addition of rbd-fuse package.

12 years ago Merge branch 'master' of https://github.com/ceph/ceph
John Wilkins [Fri, 1 Feb 2013 19:31:10 +0000 (11:31 -0800)]
Merge branch 'master' of https://github.com/ceph/ceph

12 years ago doc: Minor edits.
John Wilkins [Fri, 1 Feb 2013 19:30:30 +0000 (11:30 -0800)]
doc: Minor edits.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years ago rgw: key indexes are only links to user info
Yehuda Sadeh [Thu, 13 Dec 2012 23:52:34 +0000 (15:52 -0800)]
rgw: key indexes are only links to user info

Instead of keeping multiple copies of the user info,
we just treat the key index as a pointer to the actual
user info (indexed by uid). This helps with two issues:
first, it scales better as we don't need to update the
entire set of keys whenever we make any change. Second,
it helps with the uid index atomicity.
One point to keep in mind is that both the links and the
info can be cached, so effect on performance is minimal.
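
The layout described above can be sketched with two maps: one holding the
single copy of the user info, indexed by uid, and one holding key-to-uid
links. UserInfo, UserStore and their members are hypothetical and say
nothing about how rgw actually stores these objects.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct UserInfo {
  std::string uid;
  std::string email;
};

struct UserStore {
  std::map<std::string, UserInfo> info_by_uid;    // the only copy of the info
  std::map<std::string, std::string> uid_by_key;  // key index: key -> uid

  void add_user(const UserInfo& info, const std::vector<std::string>& keys) {
    assert(!info.uid.empty());
    info_by_uid[info.uid] = info;
    for (const std::string& k : keys) uid_by_key[k] = info.uid;
  }

  // Changing the info touches one entry, however many keys link to it.
  void update_info(const UserInfo& info) { info_by_uid[info.uid] = info; }

  // Follow the link from the key to the uid, then load the info.
  const UserInfo* find_by_key(const std::string& key) const {
    auto link = uid_by_key.find(key);
    if (link == uid_by_key.end()) return nullptr;
    auto user = info_by_uid.find(link->second);
    return user == info_by_uid.end() ? nullptr : &user->second;
  }
};
```

The lookup costs one extra indirection, which is the cost the message above
expects caching of both the links and the info to absorb.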

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Reviewed-by: caleb miles <caleb.miles@inktank.com>
12 years ago Build: Add -n to files and description for rbd-fuse in ceph.spec.in
Gary Lowell [Fri, 1 Feb 2013 05:51:44 +0000 (21:51 -0800)]
Build:  Add -n to files and description for rbd-fuse in ceph.spec.in

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
12 years ago Makefile: Install new rbd-fuse.8 man page
Gary Lowell [Fri, 1 Feb 2013 05:04:49 +0000 (21:04 -0800)]
Makefile:  Install new rbd-fuse.8 man page

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
12 years ago build: Add new rbd-fuse package
Gary Lowell [Fri, 1 Feb 2013 04:35:26 +0000 (20:35 -0800)]
build:  Add new rbd-fuse package

rbd-fuse is a new facility to map ceph rbd images to files.

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
12 years ago Revert "Don't install rbd-fuse binary"
Danny Al-Gaaf [Wed, 30 Jan 2013 18:00:40 +0000 (19:00 +0100)]
Revert "Don't install rbd-fuse binary"

This reverts commit 35e5d74e5c5786bc91df5dc10b5c08c77305df4e.

-> fix build instead

12 years ago rbd-fuse: quick and dirty manpage
Dan Mick [Fri, 1 Feb 2013 02:43:29 +0000 (18:43 -0800)]
rbd-fuse: quick and dirty manpage

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years ago rbd-fuse: quick and dirty manpage
Dan Mick [Fri, 1 Feb 2013 02:43:29 +0000 (18:43 -0800)]
rbd-fuse: quick and dirty manpage

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years ago ceph-filestore-dump.cc: don't use po::value<string>()->required()
Danny Al-Gaaf [Thu, 31 Jan 2013 14:41:19 +0000 (15:41 +0100)]
ceph-filestore-dump.cc: don't use po::value<string>()->required()

Don't use po::value<string>()->required(), since this breaks the build on
RHEL/CentOS 6. Instead, check whether the options are set, as other parts
of ceph do.

Move some checks up in the code to validate options as soon
as possible. Remove printing 'help' twice, and check it first.

Fix type description.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Signed-off-by: David Zafman <david.zafman@inktank.com>
12 years agodoc: Added more detail to SSD section. Links to performance blogs.
John Wilkins [Fri, 1 Feb 2013 00:34:02 +0000 (16:34 -0800)]
doc: Added more detail to SSD section. Links to performance blogs.

fixes: #3960

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
12 years agoMerge pull request #37 from alram/master
Yehuda Sadeh [Fri, 1 Feb 2013 00:19:28 +0000 (16:19 -0800)]
Merge pull request #37 from alram/master

Add important note in doc/radosgw/config.rst

12 years agoAdd important note in doc/radosgw/config.rst 37/head
Alexandre Marangone [Thu, 31 Jan 2013 23:58:15 +0000 (15:58 -0800)]
Add important note in doc/radosgw/config.rst

For CentOS and similar, FastCgiWrapper is turned on by default.
This causes Apache to spawn radosgw processes.

12 years agoceph-filestore-dump.cc: don't use po::value<string>()->required()
Danny Al-Gaaf [Thu, 31 Jan 2013 14:41:19 +0000 (15:41 +0100)]
ceph-filestore-dump.cc: don't use po::value<string>()->required()

Don't use po::value<string>()->required(), since this breaks the build on
RHEL/CentOS 6. Instead, check whether the options are set, the same way
other parts of Ceph do.

Move some checks up in the code to validate options as soon
as possible. Remove printing 'help' twice, and check it first.

Fix type description.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Signed-off-by: David Zafman <david.zafman@inktank.com>
12 years agoceph.spec.in: fix file section for ceph-resource-agents
Danny Al-Gaaf [Wed, 30 Jan 2013 18:00:45 +0000 (19:00 +0100)]
ceph.spec.in: fix file section for ceph-resource-agents

Create needed dirs (/usr/lib/ocf/resource.d/ceph) for the ceph-resource-agents
subpackage.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph.spec.in: extend fix for libedit-devel on special SUSE versions
Danny Al-Gaaf [Wed, 30 Jan 2013 18:00:44 +0000 (19:00 +0100)]
ceph.spec.in: extend fix for libedit-devel on special SUSE versions

Extend the fix for libedit-devel on special SUSE versions: also use
ncurses in src/ocf/Makefile and src/java/Makefile.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph.spec.in: don't move libcephfs_jni files around
Danny Al-Gaaf [Wed, 30 Jan 2013 18:00:43 +0000 (19:00 +0100)]
ceph.spec.in: don't move libcephfs_jni files around

Don't move libcephfs_jni files around from %{_libdir} to /usr/lib/jni/
in the buildroot. They should be placed in %{_libdir} like all other libs.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoceph.spec.in: move libcephfs_jni.so to ceph-devel
Danny Al-Gaaf [Wed, 30 Jan 2013 18:00:42 +0000 (19:00 +0100)]
ceph.spec.in: move libcephfs_jni.so to ceph-devel

Move libcephfs_jni.so to the ceph-devel package, since .so files
shouldn't be part of the library package.

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agoValidate format strings for CLS_ERR/CLS_LOG
Dan Mick [Thu, 31 Jan 2013 01:33:09 +0000 (17:33 -0800)]
Validate format strings for CLS_ERR/CLS_LOG

cls_log needed __attribute__((format(printf..))) to allow the compiler
to cross-check format strings against their arguments.  After adding
that, a bunch of fixups were needed for %ll, along with a few changes
for missing arguments, etc. uncovered by the checking.
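
As an illustration of the mechanism described above, here is a minimal
sketch of a printf-style helper annotated with the GCC/Clang format
attribute. The name demo_log and the argument positions (1, 2) are
assumptions made for this example; this is not the actual cls_log()
declaration.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a logging helper; NOT the real cls_log().
// format(printf, 1, 2) tells the compiler that argument 1 is a
// printf-style format string and that the variadic arguments start at
// position 2, so -Wformat can cross-check them at compile time.
std::string demo_log(const char *fmt, ...)
    __attribute__((format(printf, 1, 2)));

std::string demo_log(const char *fmt, ...)
{
  assert(fmt != nullptr);
  char buf[256];
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(buf, sizeof(buf), fmt, ap);  // truncates if the output is too long
  va_end(ap);
  return std::string(buf);
}
```

With the attribute in place, a call such as demo_log("%s", 42) or a call
with a missing argument should produce a -Wformat warning instead of
compiling silently.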

Fixes: #3970
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
12 years agoqa: update the rbd/concurrent.sh workunit
Alex Elder [Thu, 31 Jan 2013 12:47:59 +0000 (06:47 -0600)]
qa: update the rbd/concurrent.sh workunit

A few changes, now that a few rbd problems have been fixed.
First, the more substantive changes:
    - Generate a source file, and compare what's read back from rbd
      devices with the content of that file.
    - Write to the rbd device such that the written data spans
      an (assumed 4 MB) rbd object boundary, as well as starting
      and ending on non-page-aligned offsets.
    - Perform multiple reads on rbd devices: entirely within a range
      before any written data; beginning before but ending within
      written data; the exact written data (and validating what's
      read); beginning within written data but ending after it;
      reading after written data but within a written rbd object;
      and reading from an unwritten rbd object.
    - Have the sleep between iterations provide a non-integer value
      to avoid zero (or quantized) delays.
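
The write placement described in the list above can be sketched as a
small offset check. The helper names, the 4096-byte page size, and the
sample offsets are assumptions made for this illustration; the workunit
itself is a shell script and is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, for illustration only.

// True if the byte range [off, off + len) touches more than one object
// of size obj_size, i.e. it crosses an object boundary.
bool spans_object_boundary(uint64_t off, uint64_t len, uint64_t obj_size)
{
  assert(len > 0 && obj_size > 0);
  return off / obj_size != (off + len - 1) / obj_size;
}

// True if off is a multiple of the page size.
bool page_aligned(uint64_t off, uint64_t page_size)
{
  assert(page_size > 0);
  return off % page_size == 0;
}
```

For example, with an assumed 4 MB object size, a 3000-byte write that
starts 1000 bytes before the first boundary crosses it, and neither its
start nor its end falls on a 4096-byte page boundary.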

Also, some slightly less substantive (but possibly informative) changes:
    - Don't run with "set -x".  It produces a ton of noise that is
      not useful for this test.  This is an exerciser, looking
      really for system crashes during concurrent activity, and
      knowing which commands were (concurrently) active isn't going
      to help much in diagnosis.
    - Create two more directories, used to track the degree of
      concurrency (more or less) and the highest rbd id consumed.
      Files whose names are numbers are touched in each, and the
      highest at the end is the highest during the run.  This gets
      around issues passing environment info from sub-shells to the
      top-level shell.  As a bonus, it offers a better chance of
      avoiding problems due to concurrent update.
    - NAMESDIR is renamed NAMES_DIR, and it (and the others) is
      set up in the setup() function.
    - Increase the concurrency and iteration counts.
    - Move the default definitions before the ceph secrets stuff

Signed-off-by: Alex Elder <elder@inktank.com>
12 years agoAdd ceph-filestore-dump to the packaging
David Zafman [Thu, 31 Jan 2013 02:50:07 +0000 (18:50 -0800)]
Add ceph-filestore-dump to the packaging

Feature: #3890

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
12 years agodoc: v0.56.2 release notes
Sage Weil [Wed, 30 Jan 2013 23:41:39 +0000 (15:41 -0800)]
doc: v0.56.2 release notes

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoosd: create tool to extract pg info and pg log from filestore
David Zafman [Wed, 30 Jan 2013 02:21:51 +0000 (18:21 -0800)]
osd: create tool to extract pg info and pg log from filestore

New application ceph-filestore-dump created that mounts the filestore
and can dump info or log in JSON when an OSD is not running.

Feature: #3890

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoMove read_log() function to prep for next commit
David Zafman [Wed, 30 Jan 2013 01:59:45 +0000 (17:59 -0800)]
Move read_log() function to prep for next commit

Signed-off-by: David Zafman <david.zafman@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoPGMap: fix -Wsign-compare warning
Danny Al-Gaaf [Wed, 30 Jan 2013 17:52:24 +0000 (18:52 +0100)]
PGMap: fix -Wsign-compare warning

Fix -Wsign-compare compiler warning:

mon/PGMap.cc: In member function 'void PGMap::apply_incremental
 (CephContext*, const PGMap::Incremental&)':
mon/PGMap.cc:247:30: warning: comparison between signed and
 unsigned integer expressions [-Wsign-compare]
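
For readers unfamiliar with this warning, a minimal sketch of the usual
pattern follows. It is not the code from mon/PGMap.cc; the helper name
and the use of std::vector are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper; NOT the code from mon/PGMap.cc.
bool index_in_range(int idx, const std::vector<int> &v)
{
  // Writing 'idx < v.size()' directly would convert a negative idx to a
  // very large unsigned value and trigger -Wsign-compare. Reject
  // negative values first, then compare in the unsigned type.
  if (idx < 0)
    return false;
  return static_cast<std::size_t>(idx) < v.size();
}
```

The explicit check keeps the comparison well defined for negative
inputs, which a bare cast alone would not.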

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
12 years agotest_libcephfs: fix xattr test
Sage Weil [Wed, 30 Jan 2013 19:32:23 +0000 (11:32 -0800)]
test_libcephfs: fix xattr test

Ignore the ceph.*.layout xattrs.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoqa: add test for rbd map and snapshots
Sage Weil [Wed, 30 Jan 2013 09:06:03 +0000 (01:06 -0800)]
qa: add test for rbd map and snapshots

This tests for the behavior reported in #3964.  It passes on the current
code, but fails on 3.2 in squeeze (and 32-bit?).

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'gh/next'
Sage Weil [Wed, 30 Jan 2013 09:05:07 +0000 (01:05 -0800)]
Merge remote-tracking branch 'gh/next'

12 years agocls_rbd, cls_rgw: use PRI*64 when printing/logging 64-bit values
Dan Mick [Wed, 30 Jan 2013 07:05:49 +0000 (23:05 -0800)]
cls_rbd, cls_rgw: use PRI*64 when printing/logging 64-bit values

caused segfaults in 32-bit build
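
A minimal sketch of the PRI*64 pattern named in the subject line. The
helper below is hypothetical and is not taken from cls_rbd or cls_rgw.
The general hazard, stated here as background rather than as the exact
failure in those classes, is that a length modifier such as %lu or %ld
does not match a 64-bit argument on 32-bit platforms, so the variadic
arguments are misread.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdio>
#include <string>

// Hypothetical helper, for illustration only.
std::string format_size(uint64_t size)
{
  char buf[64];
  // PRIu64 expands to the correct conversion for uint64_t on the
  // current platform, so the same source is correct on 32-bit and
  // 64-bit builds.
  int n = snprintf(buf, sizeof(buf), "size=%" PRIu64, size);
  assert(n > 0 && static_cast<size_t>(n) < sizeof(buf));
  (void)n;  // avoid an unused-variable warning when NDEBUG is defined
  return std::string(buf);
}
```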

Fixes: #3961
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
12 years agomds: move lexical_cast and assert re-#include to the top
Sage Weil [Wed, 30 Jan 2013 03:48:25 +0000 (19:48 -0800)]
mds: move lexical_cast and assert re-#include to the top

We should keep the re-#includes immediately following the offender, and
documented.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoDon't install rbd-fuse binary
Dan Mick [Wed, 30 Jan 2013 03:00:27 +0000 (19:00 -0800)]
Don't install rbd-fuse binary

fixes packaging warnings

Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years agomds/Server.cc: fix warring assert.h's
Dan Mick [Wed, 30 Jan 2013 02:41:20 +0000 (18:41 -0800)]
mds/Server.cc: fix warring assert.h's

New include boost/lexical_cast.hpp apparently drags in the system
assert.h on quantal and squeeze at least, breaking our careful
assert.h; re-include our file to fix it back

Fixes: #3957
Signed-off-by: Dan Mick <dan.mick@inktank.com>
12 years agomon: require name for 'auth add ...' command
Sage Weil [Wed, 30 Jan 2013 02:41:52 +0000 (18:41 -0800)]
mon: require name for 'auth add ...' command

Otherwise we interpret the empty string as 'unknown.'.

Fixes: #3956
Signed-off-by: Sage Weil <sage@inktank.com>
12 years agoMerge remote-tracking branch 'origin/wip-fuse-create-fix'
Greg Farnum [Wed, 30 Jan 2013 01:07:49 +0000 (17:07 -0800)]
Merge remote-tracking branch 'origin/wip-fuse-create-fix'

Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoinit-ceph: make ulimit -n be part of daemon command
Dan Mick [Tue, 29 Jan 2013 23:18:53 +0000 (15:18 -0800)]
init-ceph: make ulimit -n be part of daemon command

ulimit -n from 'max open files' was being set only on the machine
running /etc/init.d/ceph.  It needs to be added to the commands to
start the daemons, and run both locally and remotely.

Verified by examining /proc/<pid>/limits on local and remote hosts

Fixes: #3900
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Loïc Dachary <loic@dachary.org>
Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
12 years agoMerge remote-tracking branch 'gh/wip-recovery-stats-b'
Sage Weil [Wed, 30 Jan 2013 00:34:21 +0000 (16:34 -0800)]
Merge remote-tracking branch 'gh/wip-recovery-stats-b'

Reviewed-by: Samuel Just <sam.just@inktank.com>
12 years agoMerge branch 'wip-vxattr'
Sage Weil [Wed, 30 Jan 2013 00:26:57 +0000 (16:26 -0800)]
Merge branch 'wip-vxattr'

Reviewed-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
12 years agoqa: add layout_vxattrs.sh test script
Sage Weil [Sat, 19 Jan 2013 19:33:04 +0000 (11:33 -0800)]
qa: add layout_vxattrs.sh test script

Test virtual xattrs for file and directory layouts.

TODO: create a data pool, add it to the fs, and make sure we can use it.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomds: allow dir layout/policy to be removed via removexattr on ceph.dir.layout
Sage Weil [Sat, 19 Jan 2013 18:11:18 +0000 (10:11 -0800)]
mds: allow dir layout/policy to be removed via removexattr on ceph.dir.layout

This lets a user remove a policy that was previously set on a dir.

Signed-off-by: Sage Weil <sage@inktank.com>
12 years agomds: handle ceph.*.layout.* setxattr
Sage Weil [Sat, 19 Jan 2013 18:09:39 +0000 (10:09 -0800)]
mds: handle ceph.*.layout.* setxattr

Allow individual fields of file or dir layouts to be set via setxattr.

Signed-off-by: Sage Weil <sage@inktank.com>