git.apps.os.sepia.ceph.com Git - ceph.git/log
13 years agomds: fix race in connection accept; fix con replacement
Sage Weil [Tue, 10 Jul 2012 20:24:51 +0000 (13:24 -0700)]
mds: fix race in connection accept; fix con replacement

We solve two problems with this patch.  The first is that the messenger
will now reuse an existing session's Connection for a new connection,
which means that we don't want to change session->connection when we
are validating an authorizer.  Instead, set it if it is unset, but do not
change it.

We also want to avoid a race where:

 - mds recovers, replays Sessions with no cons
 - multiple connection attempts for the same session race in the msgr
 - both are authorized, but out of order
 - Session->connection gets set to the losing attempt's Connection*

Instead, we take advantage of an accept event that is called only for
accepted winners.

Signed-off-by: Sage Weil <sage@inktank.com>
fixup

13 years agomsgr: queue accept event when pipe is accepted
Sage Weil [Tue, 10 Jul 2012 20:33:38 +0000 (13:33 -0700)]
msgr: queue accept event when pipe is accepted

Queue an event when an incoming connection is accepted.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomsg/DispatchQueue: queue and deliver accept events
Sage Weil [Tue, 10 Jul 2012 20:32:10 +0000 (13:32 -0700)]
msg/DispatchQueue: queue and deliver accept events

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agodispatcher: new 'accept' event type
Sage Weil [Tue, 10 Jul 2012 20:20:30 +0000 (13:20 -0700)]
dispatcher: new 'accept' event type

Create a new event type when we successfully accept a connection.  This is
distinct from the authorizer verification, which may happen for multiple
racing connection attempts.  In contrast, this will only happen on those
that win the race(s).  I don't think this is that important for stateless
servers (OSD, MON), but it is important for the MDS to ensure that it keeps
its Session con reference pointing to the most recently-successful
connection attempt.

Signed-off-by: Sage Weil <sage@inktank.com>
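
A minimal model of why the MDS wants this event, assuming simplified stand-in types: `Connection`, `Session`, `on_verify_authorizer()` and `on_accept()` are hypothetical, not Ceph's actual classes or Dispatcher signatures. Authorizer verification may run for every racing attempt in any order, so it only fills in an empty reference; the accept handler runs only for the winner, so it is the safe place to repoint the session.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins; not Ceph's real Connection/Session types.
struct Connection {
  std::string label;
};
using ConnectionRef = std::shared_ptr<Connection>;

struct Session {
  ConnectionRef connection;  // should end up on the attempt that won
};

// May run for several racing attempts, in any order: set, but never change.
void on_verify_authorizer(Session& s, const ConnectionRef& con) {
  if (!s.connection)
    s.connection = con;
}

// Runs only for the attempt that won the race, so it may overwrite.
void on_accept(Session& s, const ConnectionRef& con) {
  s.connection = con;
}
```

In this model, relying on authorizer verification alone leaves the session pointing at whichever attempt was authorized first, even if that attempt later loses.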
13 years agomsgr: drop unnecessary (un)locking on queuing connection events
Sage Weil [Mon, 9 Jul 2012 17:05:12 +0000 (10:05 -0700)]
msgr: drop unnecessary (un)locking on queuing connection events

This used to be necessary because the pipe_lock was used when queueing
the pipe in the dispatch queue.  Now that is handled by IncomingQueue's
own lock, so these can be removed.

By no longer dropping the lock, we eliminate a whole category of potential
hard-to-debug races.  (Not that any were observed, but now we don't need to
worry about them.)

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomsgr: move dispatch thread into DispatchQueue
Sage Weil [Thu, 5 Jul 2012 03:47:54 +0000 (20:47 -0700)]
msgr: move dispatch thread into DispatchQueue

The DispatchQueue class now completely owns message delivery.  This is
cleaner and lets us drop the redundant destination_stopped flag from
msgr (DQ has its own stop flag).

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomsgr: simplify checks for queueing connection events
Sage Weil [Mon, 9 Jul 2012 17:06:55 +0000 (10:06 -0700)]
msgr: simplify checks for queueing connection events

Looking through git history it is not clear exactly how these checks
came to be.  They seem to have grown during the multiple-entity-per-rank
transition a few years back.  I'm not fully convinced they are necessary,
but we will keep them regardless.

Push checks into DispatchQueue and look at the local stop flag to
determine whether these events should be queued.  This moves us away from
the kludgey SimpleMessenger::destination_stopped flag (which will soon
be removed).

Also move the refcount futzing into the DispatchQueue methods.  This makes
the callers much simpler.

Signed-off-by: Sage Weil <sage@inktank.com>
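
A sketch of the shape this gives the callers, assuming a hypothetical `DispatchQueue::queue_connect()` and using `std::shared_ptr` in place of Ceph's own refcounting: the queue consults its own stop flag and takes the Connection reference itself, so a caller only makes one call.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>

struct Connection {};  // hypothetical stand-in
using ConnectionRef = std::shared_ptr<Connection>;

// Hypothetical sketch: the queue owns both the "should we queue this?"
// decision (its own stop flag) and the reference taken on the Connection.
class DispatchQueue {
 public:
  // Returns false, and takes no reference, once the queue is stopped.
  bool queue_connect(const ConnectionRef& con) {
    std::lock_guard<std::mutex> l(lock_);
    if (stop_)
      return false;
    events_.push_back(con);  // copying the shared_ptr takes the reference
    return true;
  }

  void shutdown() {
    std::lock_guard<std::mutex> l(lock_);
    stop_ = true;
  }

  std::size_t pending() {
    std::lock_guard<std::mutex> l(lock_);
    return events_.size();
  }

 private:
  std::mutex lock_;
  bool stop_ = false;
  std::deque<ConnectionRef> events_;
};
```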
13 years agomsgr: remove unnecessary accept check
Sage Weil [Tue, 3 Jul 2012 04:54:58 +0000 (21:54 -0700)]
msgr: remove unnecessary accept check

We don't need to worry about racing with shutdown here; the cleanup
procedure will stop the accepter thread before cleaning up all the
pipes.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomsgr: remove obsolete dead path
Sage Weil [Tue, 3 Jul 2012 04:49:32 +0000 (21:49 -0700)]
msgr: remove obsolete dead path

This hasn't triggered in years.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomsgr: uninline ctor and dtor
Sage Weil [Tue, 3 Jul 2012 04:34:11 +0000 (21:34 -0700)]
msgr: uninline ctor and dtor

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomsgr: move Pipe, DispatchQueue into separate files
Sage Weil [Tue, 3 Jul 2012 01:23:46 +0000 (18:23 -0700)]
msgr: move Pipe, DispatchQueue into separate files

These don't need to be subclasses of SimpleMessenger.  Separate!

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomsgr: simplify IncomingQueue creation, pointers
Sage Weil [Sun, 1 Jul 2012 04:19:05 +0000 (21:19 -0700)]
msgr: simplify IncomingQueue creation, pointers

 * create it via DispatchQueue
 * keep pointer to parent DispatchQueue
 * drop now-useless contextual arguments to most methods

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomsgr: use local IncomingQueue instead of Pipe
Sage Weil [Sun, 1 Jul 2012 04:11:10 +0000 (21:11 -0700)]
msgr: use local IncomingQueue instead of Pipe

Simpler, cleaner.  No need for the rest of the Pipe crap.  We just need to
queue messages for ourselves.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomsgr: use explicit Connection for messages sent to ourself
Sage Weil [Sun, 1 Jul 2012 03:52:42 +0000 (20:52 -0700)]
msgr: use explicit Connection for messages sent to ourself

Move to an explicit Connection for messages sent to ourselves, instead of
using the one on the local_pipe (which we'll remove shortly).

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomsgr: take over existing Connection on Pipe replacement
Sage Weil [Tue, 10 Jul 2012 20:18:27 +0000 (13:18 -0700)]
msgr: take over existing Connection on Pipe replacement

If a new pipe/socket is taking over an existing session, it should also
take over the Connection* associated with the existing session.  Because
we cannot clear existing->connection_state, we just take another reference.

Clean up the comments a bit while we're here.

This affects MDS<->client sessions when reconnecting after a socket fault.
It probably affects intra-cluster (osd/osd, mds/mds, mon/mon) sessions
as well, but I did not confirm that.

Backport: argonaut
Signed-off-by: Sage Weil <sage@inktank.com>
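
A sketch of the "take another reference" idea, assuming a hypothetical intrusively refcounted `Connection` with `get()`/`put()` and a `Pipe` that holds only `connection_state`; these are illustrative, not the real msgr classes. Because the existing pipe keeps its pointer, the replacement takes its own reference instead of stealing one.

```cpp
#include <cassert>

// Hypothetical intrusively refcounted Connection; not the real class.
struct Connection {
  int nref = 1;
  Connection* get() { ++nref; return this; }
  void put() { if (--nref == 0) delete this; }
};

struct Pipe {
  Connection* connection_state = nullptr;
};

// The replacing pipe adopts the existing session's Connection. The existing
// pipe's pointer cannot be cleared, so both pipes now hold a reference.
void take_over_connection(Pipe& existing, Pipe& replacement) {
  if (replacement.connection_state)
    replacement.connection_state->put();  // drop the one it was created with
  replacement.connection_state = existing.connection_state->get();
}
```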
13 years agomsgr: restart_queue when replacing existing pipe and taking over the queue
Sage Weil [Mon, 2 Jul 2012 00:23:28 +0000 (17:23 -0700)]
msgr: restart_queue when replacing existing pipe and taking over the queue

The queue may have been previously stopped (by discard_queue()), and needs
to be restarted.

Fixes consistent failures from the mon_recovery.py integration tests.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomsgr: choose incoming connection if ours is STANDBY
Sage Weil [Sun, 1 Jul 2012 22:37:31 +0000 (15:37 -0700)]
msgr: choose incoming connection if ours is STANDBY

If the connect_seq matches, but our existing connection is in STANDBY, take
the incoming one.  Otherwise, the other end will wait indefinitely for us
to connect but we won't.

Alternatively, we could "win" the race and trigger a connection by sending
a keepalive (or similar), but that is more work; we may as well accept the
incoming connection we have now.

This removes STANDBY from the acceptable WAIT case states.  It also keeps
responsibility squarely on the shoulders of the peer with something to
deliver.

Without this patch, a 3-osd vstart cluster with
'ms inject socket failures = 100' and rados bench write -b 4096 would start
generating slow request warnings after a few minutes due to the osds
failing to connect to each other.  With the patch, I can complete a
10-minute run without problems.

Signed-off-by: Sage Weil <sage@inktank.com>
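
The decision rule for the connect_seq tie can be sketched as below. `PipeState`, `Reply` and `resolve_connect_race()` are hypothetical names, and `peer_wins_tiebreak` stands in for whatever tiebreak the messenger actually applies; only the STANDBY special case comes from the commit.

```cpp
#include <cassert>

// Hypothetical states and outcomes; not the real Pipe state machine.
enum class PipeState { CONNECTING, OPEN, STANDBY, CLOSED };
enum class Reply { ACCEPT_INCOMING, TELL_PEER_WAIT };

// Sketch of the connect_seq-tie case in accept().
Reply resolve_connect_race(PipeState ours, bool peer_wins_tiebreak) {
  // A STANDBY pipe will not reconnect by itself; telling the peer to WAIT
  // would leave both sides waiting for each other indefinitely.
  if (ours == PipeState::STANDBY)
    return Reply::ACCEPT_INCOMING;
  return peer_wins_tiebreak ? Reply::ACCEPT_INCOMING : Reply::TELL_PEER_WAIT;
}
```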
13 years agomsgr: preserve incoming message queue when replacing pipes
Sage Weil [Fri, 29 Jun 2012 00:50:47 +0000 (17:50 -0700)]
msgr: preserve incoming message queue when replacing pipes

If we replace an existing pipe with a new one, move the incoming queue
of messages that have not yet been dispatched over to the new Pipe so that
they are not lost.

Alternatively, we could set in_seq = existing->in_seq - existing->in_qlen,
but that would make the other end resend those messages, which is a waste
of bandwidth.

Very easy to reproduce the original bug with 'ms inject socket failures'.

Signed-off-by: Sage Weil <sage@inktank.com>
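
A minimal sketch of the takeover, assuming a hypothetical `Pipe` reduced to `in_seq` and a queue of undispatched messages (strings here). The old pipe's messages go to the front of the new queue so delivery order is preserved and the peer is never asked to resend them.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <iterator>
#include <string>

// Hypothetical Pipe with just the fields this example needs.
struct Pipe {
  std::uint64_t in_seq = 0;      // last sequence number received
  std::deque<std::string> in_q;  // received but not yet dispatched
};

// Move undispatched messages to the replacement pipe, ahead of anything it
// may already hold, so they are delivered in order and never re-requested.
void take_over_incoming(Pipe& existing, Pipe& replacement) {
  replacement.in_seq = existing.in_seq;
  replacement.in_q.insert(replacement.in_q.begin(),
                          std::make_move_iterator(existing.in_q.begin()),
                          std::make_move_iterator(existing.in_q.end()));
  existing.in_q.clear();
}
```

The rejected alternative would instead set `in_seq` to `existing.in_seq` minus the queue length, which makes the peer resend those messages.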
13 years agomsgr: move dispatch_entry into DispatchQueue class
Sage Weil [Fri, 29 Jun 2012 00:45:24 +0000 (17:45 -0700)]
msgr: move dispatch_entry into DispatchQueue class

A bit cleaner.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomsgr: move incoming queue to separate class
Sage Weil [Fri, 29 Jun 2012 00:38:34 +0000 (17:38 -0700)]
msgr: move incoming queue to separate class

This extricates the incoming queue and its funky relationship with
DispatchQueue from Pipe and moves it into IncomingQueue.  There is now a
single IncomingQueue attached to each Pipe.  DispatchQueue is now no
longer tied to Pipe.

This modularizes the code a bit better (though that is still a work in
progress) and (more importantly) will make it possible to move the
incoming messages from one pipe to another in accept().

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomsgr: make D_CONNECT constant non-zero, fix ms_handle_connect() callback
Sage Weil [Thu, 28 Jun 2012 00:06:40 +0000 (17:06 -0700)]
msgr: make D_CONNECT constant non-zero, fix ms_handle_connect() callback

A while ago we inadvertently broke ms_handle_connect() callbacks because
of a check for m being non-zero in the dispatch_entry() thread.  Adjust the
enums so that they get delivered again.

This fixes hangs when, for example, the ceph tool sends a command, gets a
connection reset, and doesn't get the connect callback to resend after
reconnecting to a new monitor.

Signed-off-by: Sage Weil <sage@inktank.com>
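
The bug class is easy to reproduce in isolation. This sketch assumes a hypothetical dispatch loop whose queue items are plain integers, with zero treated as "nothing to deliver"; the enum values are illustrative, not Ceph's actual numbering.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical event numbering, illustrating the bug class only.
enum OldCode { OLD_D_CONNECT = 0, OLD_D_BAD_RESET };
enum NewCode { D_CONNECT = 1, D_BAD_RESET };  // start at 1: never "empty"

// Dispatch loop sketch: a zero item is skipped, mirroring the `m`
// non-zero check described in the commit.
int count_delivered(const std::vector<std::intptr_t>& queue) {
  int delivered = 0;
  for (std::intptr_t m : queue) {
    if (m)
      ++delivered;  // a real loop would call ms_handle_connect() etc. here
  }
  return delivered;
}
```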
13 years agomsgr: fix pipe replacement assert
Sage Weil [Wed, 27 Jun 2012 00:10:40 +0000 (17:10 -0700)]
msgr: fix pipe replacement assert

We may replace an existing pipe in the STANDBY state if the previous
attempt failed during accept() (see previous patches).

This might fix #1378.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomsgr: do not try to reconnect con with CLOSED pipe
Sage Weil [Wed, 27 Jun 2012 00:07:31 +0000 (17:07 -0700)]
msgr: do not try to reconnect con with CLOSED pipe

If we have a con with a closed pipe, drop the message.  For lossless
sessions, the state will be STANDBY if we should reconnect.  For lossy
sessions, we will end up with CLOSED and we *should* drop the message.

Signed-off-by: Sage Weil <sage@inktank.com>
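
The send-path rule reduces to a small mapping from pipe state to action. `PipeState`, `SendAction` and `route_message()` are hypothetical names for this sketch; the STANDBY and CLOSED cases follow the commit text.

```cpp
#include <cassert>

enum class PipeState { OPEN, STANDBY, CLOSED };  // hypothetical subset
enum class SendAction { QUEUE, RECONNECT_AND_QUEUE, DROP };

// Sketch of the send path for a Connection whose pipe already exists.
SendAction route_message(PipeState s) {
  switch (s) {
    case PipeState::OPEN:
      return SendAction::QUEUE;
    case PipeState::STANDBY:  // lossless session: trigger a reconnect
      return SendAction::RECONNECT_AND_QUEUE;
    case PipeState::CLOSED:   // lossy session that failed: drop it
      return SendAction::DROP;
  }
  return SendAction::DROP;  // unreachable; keeps compilers quiet
}
```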
13 years agomsgr: move to STANDBY if we replace during accept and then fail
Sage Weil [Wed, 27 Jun 2012 00:06:41 +0000 (17:06 -0700)]
msgr: move to STANDBY if we replace during accept and then fail

If we replace an existing pipe during accept() and then fail, move to
STANDBY so that our connection state (connect_seq, etc.) is preserved.
Otherwise, we will throw out that information and falsely trigger a
RESETSESSION on the next connection attempt.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomon: initialize quorum_features
Sage Weil [Mon, 2 Jul 2012 23:05:16 +0000 (16:05 -0700)]
mon: initialize quorum_features

This could cause us to incorrectly encode new features into the monstore
that an old mon won't understand.

This is overly conservative; we probably need to persist the set of quorum
features that are supported and use those.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoOSD::do_command: unlock pg only if we had it
Samuel Just [Mon, 2 Jul 2012 16:51:37 +0000 (09:51 -0700)]
OSD::do_command: unlock pg only if we had it

Signed-off-by: Samuel Just <sam.just@inktank.com>
13 years agoMOSDSubOp: set hobject_incorrect_pool in decode_payload
Samuel Just [Mon, 2 Jul 2012 16:49:52 +0000 (09:49 -0700)]
MOSDSubOp: set hobject_incorrect_pool in decode_payload

Signed-off-by: Samuel Just <sam.just@inktank.com>
13 years agofilestore: initialize m_filestore_do_dump
Sage Weil [Mon, 2 Jul 2012 14:10:33 +0000 (07:10 -0700)]
filestore: initialize m_filestore_do_dump

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoosdmap: check new pool name on rename
Sage Weil [Sat, 30 Jun 2012 02:56:07 +0000 (19:56 -0700)]
osdmap: check new pool name on rename

Ensure the new pool name doesn't already exist in either the current or
the projected map.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoosd: handle pool name changes properly
Sage Weil [Sat, 30 Jun 2012 02:54:35 +0000 (19:54 -0700)]
osd: handle pool name changes properly

 * Remove the old name from the name->id map.

Fixes: #2676
Signed-off-by: Sage Weil <sage@inktank.com>
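
The two rename fixes above can be sketched together, assuming a hypothetical `PoolNames` index with bidirectional maps (not OSDMap's real layout) and standard errno values as return codes. `check_rename()` models the monitor-side check against both maps; `apply_rename()` models dropping the stale name, the piece #2676 was about.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical pool-name index; not OSDMap's real layout.
struct PoolNames {
  std::map<std::string, std::int64_t> name_to_id;
  std::map<std::int64_t, std::string> id_to_name;
};

// The new name must be unused in both the current and the projected map.
int check_rename(const PoolNames& current, const PoolNames& projected,
                 const std::string& oldname, const std::string& newname) {
  if (!current.name_to_id.count(oldname))
    return -ENOENT;
  if (current.name_to_id.count(newname) || projected.name_to_id.count(newname))
    return -EEXIST;
  return 0;
}

// Applying the rename: erase the old name first, then record the new one.
void apply_rename(PoolNames& p, std::int64_t id, const std::string& newname) {
  auto it = p.id_to_name.find(id);
  if (it == p.id_to_name.end())
    return;
  p.name_to_id.erase(it->second);
  it->second = newname;
  p.name_to_id[newname] = id;
}
```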
13 years agomon: 'osd pool rename <oldname> <newname>'
Sage Weil [Fri, 29 Jun 2012 21:51:32 +0000 (14:51 -0700)]
mon: 'osd pool rename <oldname> <newname>'

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agorest-bench: mark request as complete later
Yehuda Sadeh [Wed, 27 Jun 2012 00:16:11 +0000 (17:16 -0700)]
rest-bench: mark request as complete later

We marked a request as complete in the callback; however,
we might still be inside S3_runall_request_context(),
which means that the request is not really complete yet.
Possibly fixes bug #2652.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
13 years agoDBObjectMap: clones must inherit spos from parent
Samuel Just [Thu, 28 Jun 2012 01:09:37 +0000 (18:09 -0700)]
DBObjectMap: clones must inherit spos from parent

Signed-off-by: Samuel Just <sam.just@inktank.com>
13 years agofilestore: sync object_map object in lfn_remove when nlink > 1
Samuel Just [Wed, 27 Jun 2012 22:16:42 +0000 (15:16 -0700)]
filestore: sync object_map object in lfn_remove when nlink > 1

In the following sequence:

1) create (a, 1)
2) setattr (a, 1)
3) link (a, 1), (b, 1)
4) remove (a, 1)

If we play 1-4 and then replay 1-4 again, we will end up removing
(b, 1)'s attributes, since nlink for (a, 1) the second time through
is 1.  We fix this by marking spos on the object_map header for
(a, 1) when we remove (a, 1) but not the attributes.

Signed-off-by: Samuel Just <sam.just@inktank.com>
13 years agodebian: move metadata server into ceph-mds
Sage Weil [Mon, 18 Jun 2012 16:29:48 +0000 (09:29 -0700)]
debian: move metadata server into ceph-mds

Also adjust the recommends and depends, so that libcephfs1 and ceph-fuse
hang off of ceph-mds instead of ceph.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agodebian: move mount.ceph and cephfs into ceph-fs-common
Sage Weil [Mon, 18 Jun 2012 16:20:40 +0000 (09:20 -0700)]
debian: move mount.ceph and cephfs into ceph-fs-common

Based on patches from Laszlo Boszormenyi (GCS) <gcs@debian.hu>.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agodebian: arch linux-any
Sage Weil [Mon, 18 Jun 2012 16:15:56 +0000 (09:15 -0700)]
debian: arch linux-any

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agodebian: build with libnss instead of crypto++
Laszlo Boszormenyi (GCS) [Sat, 16 Jun 2012 20:39:56 +0000 (13:39 -0700)]
debian: build with libnss instead of crypto++

Signed-off-by: Laszlo Boszormenyi (GCS) <gcs@debian.hu>
13 years agodoc/config-cluster/authentication: keyring default locations, simplify key management
Sage Weil [Tue, 12 Jun 2012 19:47:57 +0000 (12:47 -0700)]
doc/config-cluster/authentication: keyring default locations, simplify key management

- keyrings have new default locations that everyone should use.
- the user key setup is vastly simplified if you use the
  'ceph auth get-or-create' command.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomon: MonmapMonitor: Use default port when the specified on 'add' is zero
Joao Eduardo Luis [Wed, 27 Jun 2012 23:29:24 +0000 (00:29 +0100)]
mon: MonmapMonitor: Use default port when the port specified on 'add' is zero

Fixes a bug triggered by using the ceph tool to 'mon add' with a port set
to zero. We now default to the monitor's default port (6789) instead, and
we will fail if that port is already assigned to some other monitor.

Fixes: bug #2661
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
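
A sketch of the port rule, assuming a hypothetical `pick_mon_port()` helper and a set of `(ip, port)` pairs standing in for the monmap's addresses. Only the 6789 default and the "fail if already assigned" behavior come from the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <utility>

constexpr std::uint16_t kDefaultMonPort = 6789;  // default named in the commit

using Addr = std::pair<std::string, std::uint16_t>;  // hypothetical ip:port

// A zero port means "use the default", and the resulting address must not
// already belong to another monitor.
std::optional<std::uint16_t> pick_mon_port(const std::string& ip,
                                           std::uint16_t requested,
                                           const std::set<Addr>& in_use) {
  std::uint16_t port = requested ? requested : kDefaultMonPort;
  if (in_use.count(Addr{ip, port}))
    return std::nullopt;
  return port;
}
```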
13 years agoOSD: disconnect_session_watches: handle race with watch disconnect
Samuel Just [Tue, 26 Jun 2012 17:38:20 +0000 (10:38 -0700)]
OSD: disconnect_session_watches: handle race with watch disconnect

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Tested-by: Stefan Priebe <s.priebe@profihost.ag>
13 years agomon: don't tick the PaxosServices if we are currently slurping.
Greg Farnum [Mon, 25 Jun 2012 20:04:15 +0000 (13:04 -0700)]
mon: don't tick the PaxosServices if we are currently slurping.

They aren't prepared to deal with the on-disk state being inconsistent.

Signed-off-by: Greg Farnum <greg@inktank.com>
13 years agoobjecter: do not feed session to op_submit()
Sage Weil [Wed, 20 Jun 2012 18:07:29 +0000 (11:07 -0700)]
objecter: do not feed session to op_submit()

The linger_send() method was doing this, but it is problematic because the
new Op doesn't get its pgid or acting vector set correctly.  The result is
that the request goes to the right OSD, but has the wrong pgid, and makes
the OSD complain about misdirected requests and drop it on the floor.  It
didn't affect the test results because we weren't testing whether the
watch was working in that case.

Instead, we'll just recalculate and get the same value the parent linger
op did, which is fine and goes through all the usual code paths, so
nothing is missed.

Also, increment num_homeless_ops before we recalc_op_target(), so that we
don't (harmlessly, but confusingly) underflow.

Fixes: #2022
Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoObjectStore::Transaction: initialize pool_override in all constructors
Samuel Just [Sun, 24 Jun 2012 20:30:53 +0000 (13:30 -0700)]
ObjectStore::Transaction: initialize pool_override in all constructors

use_pool_override and pool_override weren't initialized in these two
constructors.

Signed-off-by: Samuel Just <sam.just@inktank.com>
13 years agoosd_types.cc: remove hobject_t decode asserts
Samuel Just [Fri, 22 Jun 2012 00:08:20 +0000 (17:08 -0700)]
osd_types.cc: remove hobject_t decode asserts

These asserts were useful for ensuring that pool is passed
in in the correct places, but they prevent the encoder
testing from working.

Signed-off-by: Samuel Just <sam.just@inktank.com>
13 years agomon: note that monmap may be reencoded later
Sage Weil [Thu, 21 Jun 2012 14:33:47 +0000 (07:33 -0700)]
mon: note that monmap may be reencoded later

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomon: encoding new monmap using quorum feature set
Sage Weil [Thu, 21 Jun 2012 14:31:47 +0000 (07:31 -0700)]
mon: encode new monmap using quorum feature set

It is probably unlikely that someone will expand the mon cluster with a
mixed feature set, but we know the quorum features here, so we should use
them.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomon: conditionally encode mon features for remote mon
Sage Weil [Thu, 21 Jun 2012 14:27:49 +0000 (07:27 -0700)]
mon: conditionally encode mon features for remote mon

The only time we encode these is when forwarding messages.  Encode using
the destination's feature set.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomon: conditionally encode PGMap[::Incremental] with quorum features
Sage Weil [Thu, 21 Jun 2012 14:23:56 +0000 (07:23 -0700)]
mon: conditionally encode PGMap[::Incremental] with quorum features

This allows a mon cluster to transition to the new encoding during a
rolling upgrade.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomon: conditionally encode auth incremental with quorum feature bits
Sage Weil [Thu, 21 Jun 2012 03:41:17 +0000 (20:41 -0700)]
mon: conditionally encode auth incremental with quorum feature bits

If the quorum does not yet all have the MONENC feature, stick to the old
encoding.

It might be more polite to require a super-quorum before switching over,
and take note so that thereafter we can stick to the new encoding, but
that has more moving parts and I'm not sure it's worth the complexity.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomon: track intersection of quorum member features
Sage Weil [Thu, 21 Jun 2012 03:33:41 +0000 (20:33 -0700)]
mon: track intersection of quorum member features

When we form a quorum, also note the intersection of the quorum members'
feature bits.  This will inform decisions about what encodings we use.

This is an imperfect strategy because the quorum may change, and we may
have a mon with old code join in and not understand what is going on.
However, it does ensure that a majority of the members run new code, so in
the absence of other failures we can make progress.

Signed-off-by: Sage Weil <sage@inktank.com>
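
The quorum feature intersection and the encoding decision it drives can be sketched as a bitwise AND across members. `FEATURE_MONENC` here is a placeholder bit chosen for illustration (the real flag value is not given in these commits), and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Placeholder feature bit; not the real MONENC value.
constexpr std::uint64_t FEATURE_MONENC = 1ull << 5;

// A bit survives only if every quorum member advertises it.
std::uint64_t quorum_features(const std::vector<std::uint64_t>& members) {
  if (members.empty())
    return 0;
  std::uint64_t f = ~0ull;
  for (std::uint64_t m : members)
    f &= m;
  return f;
}

// Use the new encoding only if the whole quorum understands it.
bool use_new_encoding(std::uint64_t qf) {
  return (qf & FEATURE_MONENC) != 0;
}
```

As the commit notes, this only guarantees that the current quorum agrees; a mon running old code that joins later can still see the new encoding.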
13 years agomon: conditionally encode old monmap when peer lacks feature
Sage Weil [Thu, 21 Jun 2012 02:08:34 +0000 (19:08 -0700)]
mon: conditionally encode old monmap when peer lacks feature

This allows a rolling upgrade from 0.47.2 to 0.48.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoOSD,PG,ObjectStore: handle messages with old hobject_t encoding
Samuel Just [Wed, 20 Jun 2012 19:55:38 +0000 (12:55 -0700)]
OSD,PG,ObjectStore: handle messages with old hobject_t encoding

Messages that embed an hobject_t need to have the pool field fixed
on messages from old peers.

Signed-off-by: Samuel Just <sam.just@inktank.com>
13 years agologrotate: reload all upstart instances
Sage Weil [Thu, 21 Jun 2012 19:42:53 +0000 (12:42 -0700)]
logrotate: reload all upstart instances

upstart doesn't let you wildcard all instances of a given job, so we
slog through initctl list output, and reload any running daemons.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Tommi Virtanen <tv@inktank.com>
13 years agoMerge remote-tracking branch 'gh/stable' into next
Sage Weil [Thu, 21 Jun 2012 15:20:17 +0000 (08:20 -0700)]
Merge remote-tracking branch 'gh/stable' into next

13 years agov0.47.3 v0.47.3
Sage Weil [Wed, 20 Jun 2012 17:57:41 +0000 (10:57 -0700)]
v0.47.3

13 years agofilestore: disable 'filestore fiemap' by default
Sage Weil [Fri, 15 Jun 2012 17:00:54 +0000 (10:00 -0700)]
filestore: disable 'filestore fiemap' by default

We've seen this failing on both btrfs (Guido) and XFS (Oliver).  This works
around #2535.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoOSD: clear_temp: split delete into many transactions
Samuel Just [Tue, 19 Jun 2012 21:29:48 +0000 (14:29 -0700)]
OSD: clear_temp: split delete into many transactions

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
13 years agorgw: set s->header_ended before flushing formatter
Yehuda Sadeh [Mon, 18 Jun 2012 21:44:38 +0000 (14:44 -0700)]
rgw: set s->header_ended before flushing formatter

Otherwise, we don't account for the formatter output in s->bytes_sent.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
13 years agorgw: log user and not bucket owner for service operations
Yehuda Sadeh [Mon, 18 Jun 2012 21:28:25 +0000 (14:28 -0700)]
rgw: log user and not bucket owner for service operations

For operations that are done on the service (e.g., list buckets)
we need to log the user that did the operation, and not the bucket
owner.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
13 years agorgw: initalize s->enable_usage_log
Yehuda Sadeh [Mon, 18 Jun 2012 21:27:51 +0000 (14:27 -0700)]
rgw: initialize s->enable_usage_log

Because of the missing initialization, we ended up not logging every operation.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
13 years agoosd: use derr (instead of cerr) for convertfs
Sage Weil [Tue, 19 Jun 2012 17:12:40 +0000 (10:12 -0700)]
osd: use derr (instead of cerr) for convertfs

This will appear in the log *and* stderr (if we're running in the
foreground).

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoosd: close stderr on daemonize
Sage Weil [Tue, 19 Jun 2012 17:11:01 +0000 (10:11 -0700)]
osd: close stderr on daemonize

This spams stderr in an ugly way.  Users should look at the logs.

In particular, filestore upgrades spam the console, which is unpleasant.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoPG: improve find_best_info
Samuel Just [Tue, 19 Jun 2012 16:11:57 +0000 (09:11 -0700)]
PG: improve find_best_info

07f853db3982e68b952a337cf91cbf7ec0709de9 is actually too conservative;
it suffices to find any info with a last_update of at least the least
last_update from the last period to go active.  An info from a previous
interval is acceptable if the last interval never reported a committed
operation and thus still has the same last_update.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
13 years agoPG: reg_last_pg_scrub on pg resurrection
Samuel Just [Mon, 18 Jun 2012 16:26:12 +0000 (09:26 -0700)]
PG: reg_last_pg_scrub on pg resurrection

This may solve the unreg_last_pg_scrub assert.

See #2453.

Signed-off-by: Samuel Just <sam.just@inktank.com>
13 years agoceph_osd: move auto-upgrade to after fork
Samuel Just [Mon, 18 Jun 2012 21:02:28 +0000 (14:02 -0700)]
ceph_osd: move auto-upgrade to after fork

Signed-off-by: Samuel Just <sam.just@inktank.com>
13 years agofilestore: make disk format upgrade warning less scary, more informative
Sage Weil [Mon, 18 Jun 2012 21:07:20 +0000 (14:07 -0700)]
filestore: make disk format upgrade warning less scary, more informative

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Sam Just <sam.just@inktank.com>
13 years agomon: include quorum in ceph status
Sage Weil [Mon, 18 Jun 2012 21:02:29 +0000 (14:02 -0700)]
mon: include quorum in ceph status

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agomon: gracefully handle slow 'ceph -w' clients
Sage Weil [Mon, 18 Jun 2012 21:00:06 +0000 (14:00 -0700)]
mon: gracefully handle slow 'ceph -w' clients

If we are sending log updates to a client (ceph -w), and they are far
enough behind to drop behind first_committed, include a friendly message
in their stream but continue.

Drop useless return value from _create_sub_incremental().  Assert that we
can read the state file.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoPG: best_info must have a last_epoch_started as high as any other info
Samuel Just [Sat, 16 Jun 2012 00:09:42 +0000 (17:09 -0700)]
PG: best_info must have a last_epoch_started as high as any other info

We disregard incomplete infos during find_best_info, but we can't choose an
info with a last_epoch_started less than that of the incomplete info.

This should avoid cases like #2462.  In that case, it appears that
a peer with empty info/log was chosen as authoritative even though
there was a non-empty incomplete peer.

Signed-off-by: Samuel Just <sam.just@inktank.com>
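
A heavily simplified sketch of the last_epoch_started constraint only, assuming a hypothetical `PeerInfo` with scalar versions (the real pg_info_t and eversion_t carry much more, and the real find_best_info applies other rules as well). Incomplete infos are never chosen, but they still raise the bar every candidate must meet.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, heavily simplified peer info.
struct PeerInfo {
  int osd;
  std::uint32_t last_epoch_started;
  std::uint64_t last_update;
  bool incomplete;
};

// Candidate must be complete and have a last_epoch_started at least as high
// as that of ANY info, incomplete ones included; prefer newest last_update.
std::optional<PeerInfo> find_best_info(const std::vector<PeerInfo>& infos) {
  std::uint32_t max_les = 0;
  for (const auto& i : infos)
    if (i.last_epoch_started > max_les)
      max_les = i.last_epoch_started;

  std::optional<PeerInfo> best;
  for (const auto& i : infos) {
    if (i.incomplete || i.last_epoch_started < max_les)
      continue;
    if (!best || i.last_update > best->last_update)
      best = i;
  }
  return best;
}
```

In this model, an empty peer with an older last_epoch_started is no longer picked over a non-empty incomplete peer, which is the situation described for #2462.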
13 years agodebian: fix python-ceph depends
Laszlo Boszormenyi (GCS) [Sat, 16 Jun 2012 20:49:41 +0000 (13:49 -0700)]
debian: fix python-ceph depends

Signed-off-by: Laszlo Boszormenyi (GCS) <gcs@debian.hu>
13 years agodebian: update homepage url
Laszlo Boszormenyi (GCS) [Sat, 16 Jun 2012 20:39:20 +0000 (13:39 -0700)]
debian: update homepage url

Signed-off-by: Laszlo Boszormenyi (GCS) <gcs@debian.hu>
13 years agofilestore: fix 'omap' collection skipping
Sage Weil [Sun, 17 Jun 2012 20:20:59 +0000 (13:20 -0700)]
filestore: fix 'omap' collection skipping

The if/else if/... structure was skipping this test if the file system
didn't support d_type.

Fixes: #2598
Signed-off-by: Sage Weil <sage@inktank.com>
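
The commit does not show the code, so this is only a hypothetical illustration of how an if/else-if chain can skip a test when d_type is unavailable. `DType`, `Entry` and both functions are invented for the sketch; the fix moves the name test out of the chain.

```cpp
#include <cassert>
#include <string>

// Hypothetical directory entry; Unknown models a file system without d_type.
enum class DType { Unknown, Dir, Reg };
struct Entry {
  std::string name;
  DType type;
  bool stat_says_dir;  // what a fallback stat() would report
};

// Buggy shape: the Unknown branch returns before the "omap" test runs.
bool is_collection_buggy(const Entry& e) {
  if (e.type == DType::Unknown)
    return e.stat_says_dir;
  else if (e.type != DType::Dir)
    return false;
  else if (e.name == "omap")
    return false;
  return true;
}

// Fixed shape: the name test no longer depends on how the type was learned.
bool is_collection_fixed(const Entry& e) {
  if (e.name == "omap")
    return false;
  if (e.type == DType::Unknown)
    return e.stat_says_dir;
  return e.type == DType::Dir;
}
```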
13 years agorun-cli-test: use new pip incantation
Sage Weil [Fri, 15 Jun 2012 21:48:22 +0000 (14:48 -0700)]
run-cli-test: use new pip incantation

http://www.pip-installer.org/en/latest/news.html#id1

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agocls_rbd: do not pass snapid_t to vargs
Sage Weil [Sun, 17 Jun 2012 16:07:41 +0000 (09:07 -0700)]
cls_rbd: do not pass snapid_t to vargs

On squeeze,

warning: cls_rbd.cc:534: cannot pass objects of non-POD type ‘struct snapid_t’ through ‘...’; call will abort at runtime

Signed-off-by: Sage Weil <sage@inktank.com>
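
The general fix for this warning is to pass a plain integer through `...` rather than the class object. This sketch assumes a simplified `snapid_t` and a hypothetical `format_snap()` helper; the actual cls_rbd call site is not shown in the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Simplified stand-in for Ceph's snapid_t: a class wrapping a uint64_t.
struct snapid_t {
  std::uint64_t val;
  snapid_t(std::uint64_t v = 0) : val(v) {}
  operator std::uint64_t() const { return val; }
};

std::string format_snap(snapid_t id) {
  char buf[32];
  // Passing `id` itself through "..." is what the compiler warned about;
  // hand the varargs function a plain integer of a matching type instead.
  std::snprintf(buf, sizeof(buf), "snap %llu",
                static_cast<unsigned long long>(id.val));
  return std::string(buf);
}
```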
13 years agomsg: fix buffer overflow in ipv6 addr parsing
Sage Weil [Sun, 17 Jun 2012 03:09:04 +0000 (20:09 -0700)]
msg: fix buffer overflow in ipv6 addr parsing

Noticed because of failing i386 unit tests for long addrs; x86_64 passed
fine.  Sigh.  FTR, the failing address was

2001:0db8:85a3:0000:0000:8a2e:0370:7334

Sadly, even the full-length addrs don't trigger it on x86_64, nor does
valgrind notice.  But this fixes it on i386.

Signed-off-by: Sage Weil <sage@inktank.com>
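
The commit does not show the parser, so this is a hypothetical sketch of a bounded approach using POSIX `inet_pton()`: the candidate string is copied into a buffer of `INET6_ADDRSTRLEN` bytes (which includes the terminator), and anything too long is rejected instead of overflowing. `parse_ipv6()` is an invented name.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cassert>
#include <cstring>
#include <string>

// Copy a candidate IPv6 literal into a NUL-terminated buffer sized for the
// longest textual form, rejecting anything that would not fit.
bool parse_ipv6(const std::string& s, in6_addr* out) {
  char buf[INET6_ADDRSTRLEN];
  if (s.size() >= sizeof(buf))
    return false;
  std::memcpy(buf, s.data(), s.size());
  buf[s.size()] = '\0';
  return inet_pton(AF_INET6, buf, out) == 1;
}
```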
13 years agocls_rbd: drop useless snapshot metadata helpers
Sage Weil [Sat, 16 Jun 2012 14:33:19 +0000 (07:33 -0700)]
cls_rbd: drop useless snapshot metadata helpers

Now that cls_rbd_snap is encodable, we don't need these helpers; get_key()
will suffice.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agocls_rbd: use encode macros for on-disk snap metadata
Sage Weil [Thu, 14 Jun 2012 23:12:49 +0000 (16:12 -0700)]
cls_rbd: use encode macros for on-disk snap metadata

This will let us version this encoding later when we add new information
and features, like a per-snap parent.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agofilejournal: make less noise about open failures
Sage Weil [Fri, 15 Jun 2012 21:48:22 +0000 (14:48 -0700)]
filejournal: make less noise about open failures

The callers report errors and pass up errors, so do not spam stderr with
this.  Fixes the confusion that sparked #2595.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agorgw: obj copy respects -metadata-directive
Yehuda Sadeh [Tue, 12 Jun 2012 21:42:03 +0000 (14:42 -0700)]
rgw: obj copy respects -metadata-directive

Fixes #2542. The old behavior just merged src object attrs
and provided attributes. The new (and correct) behavior looks
at the x-[amz|rgw|...]-metadata-directive and either copies
the source attrs, or replaces them with the provided attrs.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
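
The S3 semantics of the directive fit in a few lines. `attrs_for_copy()` and the `Attrs` alias are hypothetical, and this sketch treats any value other than `REPLACE` as the default `COPY` rather than rejecting invalid values.

```cpp
#include <cassert>
#include <map>
#include <string>

using Attrs = std::map<std::string, std::string>;

// COPY (the default) keeps the source object's metadata; REPLACE uses only
// the metadata supplied with the copy request. No merging either way.
Attrs attrs_for_copy(const Attrs& src, const Attrs& provided,
                     const std::string& directive) {
  if (directive == "REPLACE")
    return provided;
  return src;  // "COPY" or header absent
}
```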
13 years agoosd: optional verify that sparse_read holes are zero-filled
Sage Weil [Thu, 14 Jun 2012 19:51:07 +0000 (12:51 -0700)]
osd: optional verify that sparse_read holes are zero-filled

This should help us track down/verify #2535.  It seems to happen on several
different systems, but we haven't figured out which ones yet.

This detects the bug, but does not attempt to correct it.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agobuffer: add list and ptr is_zero() method
Sage Weil [Thu, 14 Jun 2012 19:34:46 +0000 (12:34 -0700)]
buffer: add list and ptr is_zero() method

Simple helper to check if a buffer is all zeros.

Signed-off-by: Sage Weil <sage@inktank.com>
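
A sketch of the helper and of the sparse_read check it supports, operating on a plain byte vector rather than Ceph's bufferlist. `is_zero()` mirrors the helper's purpose; `holes_are_zero()` and the offset-to-length extent map are hypothetical. The check detects non-zero holes but does not correct them, as in the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// True if every byte in [off, off + len) (clamped to the buffer) is zero.
bool is_zero(const std::vector<char>& buf, std::size_t off, std::size_t len) {
  std::size_t end = std::min(buf.size(), off + len);
  for (std::size_t i = off; i < end; ++i)
    if (buf[i] != 0)
      return false;
  return true;
}

// Given the data extents reported for a read (offset -> length), confirm
// that every byte in the gaps between them is zero.
bool holes_are_zero(const std::vector<char>& buf,
                    const std::map<std::size_t, std::size_t>& extents) {
  std::size_t pos = 0;
  for (const auto& e : extents) {
    if (e.first > pos && !is_zero(buf, pos, e.first - pos))
      return false;
    pos = std::max(pos, e.first + e.second);
  }
  return pos >= buf.size() || is_zero(buf, pos, buf.size() - pos);
}
```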
13 years agofilestore: disable 'filestore fiemap' by default
Sage Weil [Fri, 15 Jun 2012 17:00:54 +0000 (10:00 -0700)]
filestore: disable 'filestore fiemap' by default

We've seen this failing on both btrfs (Guido) and XFS (Oliver).  This works
around #2535.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoMerge branch 'wip-radosgw-upstart'
Sage Weil [Thu, 14 Jun 2012 22:17:03 +0000 (15:17 -0700)]
Merge branch 'wip-radosgw-upstart'

13 years agoradosgw: stop startup timer on failed start
Sage Weil [Thu, 14 Jun 2012 22:09:16 +0000 (15:09 -0700)]
radosgw: stop startup timer on failed start

This fixes crashes like

    -1> 2012-06-14 15:04:31.733009 7f544e18c780 -1 Couldn't init storage provider (RADOS)
     0> 2012-06-14 15:04:31.734110 7f544e18c780 -1 common/Timer.cc: In function 'SafeTimer::~SafeTimer()' thread 7f544e18c780 time 2012-06-14 15:04:31.733020
common/Timer.cc: 57: FAILED assert(thread == __null)

 ceph version 0.47.2-481-g6f30f1f (commit:6f30f1fcdecd6c9390d4678c754dadd305165e3e)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x5399a0]
 2: (SafeTimer::~SafeTimer()+0x39) [0x533e77]
 3: (main()+0x6f5) [0x51bc9d]
 4: (__libc_start_main()+0xfd) [0x7f544b38eead]
 5: /home/sage/src/ceph/src/.libs/lt-radosgw() [0x4f9e09]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
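The assert fires because the timer is destroyed while its thread is still running. The fix pattern is to shut the timer down on every exit path, including the failed-init one. The sketch below shows that pattern with a toy `ToyTimer` class. It is hypothetical and far simpler than Ceph's `SafeTimer`, using a flag instead of a real thread, but its destructor asserts in the same spirit. The startup function `run_gateway` is likewise hypothetical.

```cpp
#include <cassert>

// Toy stand-in for a timer that owns a background thread (a flag here).
class ToyTimer {
 public:
  void init() { started_ = true; }
  void shutdown() { started_ = false; }
  // Like SafeTimer's destructor, insist that shutdown() ran first.
  ~ToyTimer() { assert(!started_); }

 private:
  bool started_ = false;
};

// Hypothetical startup sequence: the timer is started before storage init.
int run_gateway(bool storage_ok) {
  ToyTimer init_timer;
  init_timer.init();
  if (!storage_ok) {
    init_timer.shutdown();  // the fix: stop the timer on the failure path too
    return -1;
  }
  // ... normal operation would go here ...
  init_timer.shutdown();
  return 0;
}
```

Without the `shutdown()` call in the error branch, `run_gateway(false)` would trip the destructor's assert. This mirrors the backtrace above.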

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoupstart: make id charset include - _ and .
Sage Weil [Thu, 14 Jun 2012 22:04:07 +0000 (15:04 -0700)]
upstart: make id charset include - _ and .

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoradosgw: takes --id, but not -i
Sage Weil [Thu, 14 Jun 2012 22:03:46 +0000 (15:03 -0700)]
radosgw: takes --id, but not -i

The -i short version doesn't work for 'client' code, which tends to use it for
input files.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agodoc: added qemu-img documentation for rbd.
John Wilkins [Thu, 14 Jun 2012 21:18:53 +0000 (14:18 -0700)]
doc: added qemu-img documentation for rbd.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
13 years agorgw: limit number of buckets per user
Yehuda Sadeh [Tue, 12 Jun 2012 06:31:09 +0000 (23:31 -0700)]
rgw: limit number of buckets per user

Adding a configurable max_buckets per user. Bucket creation
verifies that max_buckets has not been reached.
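A minimal sketch of the creation-time check. It assumes the user's current bucket count is known when the request arrives. It also assumes a configured limit of 0 means "no limit"; that convention is illustrative, not confirmed by this log. `may_create_bucket` is a hypothetical name, not rgw's actual code.

```cpp
#include <cstdint>

// Returns true if a user who owns `current_buckets` may create one more.
// Assumption: max_buckets == 0 disables the limit.
bool may_create_bucket(uint64_t current_buckets, uint64_t max_buckets) {
  if (max_buckets == 0) return true;
  return current_buckets < max_buckets;  // reject once the limit is reached
}
```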

Backport: dho
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
13 years agodoc: Added steps for OpenStack install with DevStack
John Wilkins [Thu, 14 Jun 2012 16:46:32 +0000 (09:46 -0700)]
doc: Added steps for OpenStack install with DevStack

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
13 years agodoc: fixed bash syntax error.
John Wilkins [Thu, 14 Jun 2012 16:35:17 +0000 (09:35 -0700)]
doc: fixed bash syntax error.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
13 years agoqa: disable xfstest 68 for now
Sage Weil [Thu, 14 Jun 2012 16:07:26 +0000 (09:07 -0700)]
qa: disable xfstest 68 for now

Stop the qa noise until we fix #2410.  Looks like a freeze/thaw thing.

Maybe Jan's new freeze/thaw code will address this?  That's probably
wishful thinking.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoqa: disable xfstest 219 for now
Sage Weil [Thu, 14 Jun 2012 16:01:42 +0000 (09:01 -0700)]
qa: disable xfstest 219 for now

The cause of 219 failing is non-obvious.  Disable it for now.  :(

Avoids #2522.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoMakefile: fix leveldb dep for system library case
Sage Weil [Wed, 13 Jun 2012 04:17:11 +0000 (21:17 -0700)]
Makefile: fix leveldb dep for system library case

We conditionally add this below only if using the bundled version.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoMakefile: fix leveldb includes for system library case
Sage Weil [Wed, 13 Jun 2012 04:16:45 +0000 (21:16 -0700)]
Makefile: fix leveldb includes for system library case

Use the installed headers, not ours.

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agodoc: Multi-monitor support means ceph::single_mon was renamed.
Tommi Virtanen [Wed, 13 Jun 2012 23:23:14 +0000 (16:23 -0700)]
doc: Multi-monitor support means ceph::single_mon was renamed.

This changed in ceph-cookbooks.git commit
8e56551b11fe28cc4f29f4fcdcf6c38516bdc833.

Signed-off-by: Tommi Virtanen <tv@inktank.com>
13 years agorbd: fix usage test
Sage Weil [Wed, 13 Jun 2012 18:24:19 +0000 (11:24 -0700)]
rbd: fix usage test

Fixes: #2347
Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoMakefile: link gtest statically
Sage Weil [Wed, 13 Jun 2012 18:05:43 +0000 (11:05 -0700)]
Makefile: link gtest statically

The problem:

 - the unittests link against gtest, and gtest is not installed.  that's
   normally fine, but...
 - rbd and rados api unit tests link against gtest, and are installed
   by 'make install'.  they are needed for teuthology runs, etc.
 - if we build gtest as an .la library, we can only control whether *all*
   or *no* .la libraries are linked statically.
 - we want librados to be linked dynamically.

The solution:

 - build gtest as .a instead of a libtool library
 - link it statically, always.

Unit test binaries are bigger now.  Oh well...

Fixes: #2331
Signed-off-by: Sage Weil <sage@inktank.com>
13 years agodebian: install radosgw upstart configs, daemon dir
Sage Weil [Tue, 12 Jun 2012 20:40:43 +0000 (13:40 -0700)]
debian: install radosgw upstart configs, daemon dir

Signed-off-by: Sage Weil <sage@inktank.com>
13 years agoradosgw: upstart support
Sage Weil [Tue, 12 Jun 2012 20:39:57 +0000 (13:39 -0700)]
radosgw: upstart support

Like the other upstart configs, these assume the default value for
'rgw data'.  Same pattern as ceph-mon and ceph-mds.

Fixes: #2415
Signed-off-by: Sage Weil <sage@inktank.com>