git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
13 years agoosd: refuse to return data payload if request wrote anything
Sage Weil [Tue, 21 Feb 2012 05:11:46 +0000 (21:11 -0800)]
osd: refuse to return data payload if request wrote anything

Write operations aren't allowed to return a data payload because
we can't do so reliably. If the client has to resend the request
and it has already been applied, we will return 0 with no
payload.  Non-deterministic behavior is no good.

See #1765.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
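
The rule in this commit can be illustrated with a minimal sketch. It is not the OSD code: `OpResult` and `build_reply` are hypothetical names, and the sketch assumes the payload is simply dropped (the commit message does not say whether the real code drops the payload or rejects the request).

```python
from dataclasses import dataclass


@dataclass
class OpResult:
    """Outcome of one sub-op in a request (hypothetical type)."""
    wrote: bool            # True if this sub-op modified the object
    payload: bytes = b""   # data the sub-op would like to return


def build_reply(results):
    """Return (return_code, payload) for a request made of several sub-ops."""
    payload = b"".join(r.payload for r in results)
    if any(r.wrote for r in results):
        # A resent request that was already applied would come back as 0
        # with no payload, so a request that wrote anything never returns one.
        payload = b""
    return 0, payload
```

With this rule a first attempt and a resend of the same write give the client the same answer.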
13 years agotest/encoding/readable.sh: sh, not dash
Sage Weil [Mon, 20 Feb 2012 14:27:47 +0000 (06:27 -0800)]
test/encoding/readable.sh: sh, not dash

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge branch 'stable'
Sage Weil [Mon, 20 Feb 2012 03:36:00 +0000 (19:36 -0800)]
Merge branch 'stable'

13 years agomsgr: fix shutdown race again
Sage Weil [Mon, 20 Feb 2012 03:37:13 +0000 (19:37 -0800)]
msgr: fix shutdown race again

Only unlock once.  Sigh.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agov0.42 v0.42
Sage Weil [Sun, 19 Feb 2012 23:30:37 +0000 (15:30 -0800)]
v0.42

13 years agomsgr: fix accept shutdown race fault
Sage Weil [Sun, 19 Feb 2012 22:52:41 +0000 (14:52 -0800)]
msgr: fix accept shutdown race fault

Need to hold pipe_lock.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomon: fix message discard on shutdown
Sage Weil [Sun, 19 Feb 2012 20:44:58 +0000 (12:44 -0800)]
mon: fix message discard on shutdown

Return true, so the messenger is happy, and drop the message reference.

Avoids an assert like

2012-02-19T12:36:05.102 INFO:teuthology.task.ceph.mon.2.err:ms_deliver_dispatch: fatal error: unhandled message 0x1b7b280 paxos(auth lease_ack lc 8 fc 1 pn 0 opn 0) v1 from mon.2 10.3.14.197:6789/0
msg/Messenger.h: In function 'void Messenger::ms_deliver_dispatch(Message*)' thread 7fd7fe360700 time 2012-02-19 12:36:05.094713
2012-02-19T12:36:05.102 INFO:teuthology.task.ceph.mon.2.err:msg/Messenger.h: 143: FAILED assert(0)

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomds: use want_state to indicate shutdown
Sage Weil [Sun, 19 Feb 2012 15:41:47 +0000 (07:41 -0800)]
mds: use want_state to indicate shutdown

State gets DNE when we receive the first map.  And want_ makes more sense
anyway.  Fixes MDS startup.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: fix up argument to PG::init()
Sage Weil [Sun, 19 Feb 2012 06:17:35 +0000 (22:17 -0800)]
osd: fix up argument to PG::init()

Commit cefa55b288b40e17ade9875493dd94de52ac22bf moved PG initialization
into init(), but passed acting for both up and acting args.  This led to
confusion between primary and replica.

Also fix debug print so that the output is useful.

Fixes: #2075, #2070
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoSimpleMessenger: drop unused sigint()
Sage Weil [Sun, 19 Feb 2012 05:49:35 +0000 (21:49 -0800)]
SimpleMessenger: drop unused sigint()

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomsgr: promote SimpleMessenger::Policy to Messenger::Policy
Sage Weil [Sun, 19 Feb 2012 05:48:50 +0000 (21:48 -0800)]
msgr: promote SimpleMessenger::Policy to Messenger::Policy

This is part of the generic interface, not specific to the implementation.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomds: ignore all msgr callbacks on shutdown, not just dispatch
Sage Weil [Sun, 19 Feb 2012 05:43:18 +0000 (21:43 -0800)]
mds: ignore all msgr callbacks on shutdown, not just dispatch

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: discard messages while shutting down
Sage Weil [Sun, 19 Feb 2012 05:37:09 +0000 (21:37 -0800)]
mon: discard messages while shutting down

Add SHUTDOWN state.  Ignore any msgr callbacks if set.

Fixes crash like

2012-02-18T21:57:58.912 INFO:teuthology.task.ceph:Shutting down mon daemons...
2012-02-18T21:57:58.912 DEBUG:teuthology.task.ceph.mon.a:waiting for process to exit
2012-02-18T21:57:58.913 INFO:teuthology.task.ceph.mon.a.err:2012-02-18 21:57:58.927759 7fe98dfa1700 mon.a@1(peon) e1 *** Got Signal Terminated ***
2012-02-18T21:57:59.014 INFO:teuthology.task.ceph.mon.a.err:*** Caught signal (Segmentation fault) **
2012-02-18T21:57:59.014 INFO:teuthology.task.ceph.mon.a.err: in thread 7fe98d7a0700
2012-02-18T21:57:59.014 INFO:teuthology.task.ceph.mon.a.err: ceph version 0.41-382-gc1db900 (commit:c1db9009c2cde9dc7ab8857b0d28a1b6d931e98a)
2012-02-18T21:57:59.015 INFO:teuthology.task.ceph.mon.a.err: 1: /tmp/cephtest/binary/usr/local/bin/ceph-mon() [0x5b0871]
2012-02-18T21:57:59.015 INFO:teuthology.task.ceph.mon.a.err: 2: (()+0xfb40) [0x7fe991a1eb40]
2012-02-18T21:57:59.015 INFO:teuthology.task.ceph.mon.a.err: 3: (PerfCounters::set(int, unsigned long)+0x1a) [0x52008a]
2012-02-18T21:57:59.015 INFO:teuthology.task.ceph.mon.a.err: 4: (PGMonitor::update_logger()+0x96) [0x4d4bf6]
2012-02-18T21:57:59.015 INFO:teuthology.task.ceph.mon.a.err: 5: (PGMonitor::update_from_paxos()+0xa70) [0x4e0980]
2012-02-18T21:57:59.016 INFO:teuthology.task.ceph.mon.a.err: 6: (Monitor::_ms_dispatch(Message*)+0x143b) [0x47bd6b]
2012-02-18T21:57:59.016 INFO:teuthology.task.ceph.mon.a.err: 7: (Monitor::ms_dispatch(Message*)+0x90) [0x489210]
2012-02-18T21:57:59.016 INFO:teuthology.task.ceph.mon.a.err: 8: (SimpleMessenger::dispatch_entry()+0x89a) [0x53959a]
2012-02-18T21:57:59.016 INFO:teuthology.task.ceph.mon.a.err: 9: (SimpleMessenger::DispatchThread::entry()+0x1c) [0x46358c]
2012-02-18T21:57:59.016 INFO:teuthology.task.ceph.mon.a.err: 10: (()+0x7971) [0x7fe991a16971]
2012-02-18T21:57:59.017 INFO:teuthology.task.ceph.mon.a.err: 11: (clone()+0x6d) [0x7fe9902a592d]

which is analogous to #2014.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomsgr: fix shutdown vs accept race
Sage Weil [Sat, 18 Feb 2012 21:45:37 +0000 (13:45 -0800)]
msgr: fix shutdown vs accept race

This is a kludge.  The real fix is to rewrite SimpleMessenger as a state
machine.

Fixes: #2073
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomds: drop all messages during suicide
Sage Weil [Sat, 18 Feb 2012 21:36:24 +0000 (13:36 -0800)]
mds: drop all messages during suicide

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge remote branch 'gh/wip-pg-states'
Sage Weil [Sat, 18 Feb 2012 22:00:50 +0000 (14:00 -0800)]
Merge remote branch 'gh/wip-pg-states'

13 years agoosd: update_stats() in GetInfo state start
Sage Weil [Fri, 17 Feb 2012 23:26:37 +0000 (15:26 -0800)]
osd: update_stats() in GetInfo state start

This is the first stage of peering.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: don't update_stats() on prec_replica_info
Sage Weil [Fri, 17 Feb 2012 23:26:06 +0000 (15:26 -0800)]
osd: don't update_stats() on prec_replica_info

Nothing changes here...

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agofilestore: hold journal_lock during
Sage Weil [Fri, 17 Feb 2012 21:59:08 +0000 (13:59 -0800)]
filestore: hold journal_lock during

Hold journal_lock during replay so that we don't stomp on variables like
op_seq and open_ops that the commit thread cares about.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: only complete/deregister repop once
Sage Weil [Sat, 18 Feb 2012 00:23:50 +0000 (16:23 -0800)]
osd: only complete/deregister repop once

It's now possible to send the ack and deregister the repop before the
op_applied() happens.  And when that happens, we'll call eval_repop() once
more.  Don't do anything in that case.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
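
A minimal sketch of the "complete only once" guard described above. `RepOp`, `RepOpTracker`, and the `done` flag are hypothetical stand-ins, not the actual OSD types; the point is only that a second `eval_repop()` call after completion does nothing.

```python
class RepOp:
    """A replicated operation in flight (hypothetical, heavily simplified)."""

    def __init__(self, tid):
        self.tid = tid
        self.done = False  # set once the repop has been completed


class RepOpTracker:
    def __init__(self):
        self.registered = {}   # tid -> RepOp
        self.completions = 0   # how many times a repop was completed

    def register(self, repop):
        self.registered[repop.tid] = repop

    def eval_repop(self, repop, all_acked):
        """Complete and deregister the repop at most once."""
        if repop.done:
            # A late op_applied() can trigger another evaluation after the
            # ack was already sent; there is nothing left to do.
            return False
        if all_acked:
            repop.done = True
            del self.registered[repop.tid]
            self.completions += 1
            return True
        return False
```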
13 years agoMerge branch 'next'
Josh Durgin [Fri, 17 Feb 2012 22:31:44 +0000 (14:31 -0800)]
Merge branch 'next'

13 years agoman: regenerate man pages
Josh Durgin [Fri, 17 Feb 2012 22:11:18 +0000 (14:11 -0800)]
man: regenerate man pages

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoman: move man page fixes to rst
Josh Durgin [Fri, 17 Feb 2012 22:09:55 +0000 (14:09 -0800)]
man: move man page fixes to rst

83cf1b62fde525d068bc292c4a1ccc42199657ae and
e5f49104ab62ba7bc42cf6ecf41c9257b46585f7 updated the nroff output
but not the rst source.

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agodoc: fix snapshot creation/deletion syntax in rbd man page (trivial)
Florian Haas [Fri, 17 Feb 2012 20:15:15 +0000 (21:15 +0100)]
doc: fix snapshot creation/deletion syntax in rbd man page (trivial)

Creating a snapshot requires using "rbd snap create",
as opposed to just "rbd create". Also for purposes of
clarification, add note that removing a snapshot similarly
requires "rbd snap rm".

Thanks to Josh Durgin for the explanation on IRC.

Signed-off-by: Florian Haas <florian@hastexo.com>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoosd: make op_commit imply op_applied for purposes of repop completion
Sage Weil [Fri, 17 Feb 2012 21:48:02 +0000 (13:48 -0800)]
osd: make op_commit imply op_applied for purposes of repop completion

For repop completion, we want waitfor_ack and _commit to be empty.  For
replicas, a commit reply implies ack, so ack is always a subset of commit.
But for the local write, we wait for applied separately, so we can have
repops open where we sent the reply to the client but still have it open
and consuming memory.  And generating 'old request' warnings in the logs
(when the filestore is taking a long time to apply to the fs).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: add REMAPPED state
Sage Weil [Fri, 17 Feb 2012 21:46:11 +0000 (13:46 -0800)]
osd: add REMAPPED state

Set this bit whenever up != acting.  This tells you that the OSDMap is
explicitly remapping the PG to different nodes (than what CRUSH specified).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
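
As a rough illustration of the rule above, the flag can be derived from the two OSD lists. The bit values and the function name are assumptions made for this sketch, and clearing the bit when up == acting is inferred rather than stated in the commit message.

```python
# Hypothetical bit values; the real PG state constants may differ.
PG_STATE_ACTIVE = 1 << 0
PG_STATE_REMAPPED = 1 << 1


def update_remapped(state, up, acting):
    """Set REMAPPED whenever up != acting, clear it otherwise."""
    if up != acting:
        return state | PG_STATE_REMAPPED
    return state & ~PG_STATE_REMAPPED
```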
13 years agoosd: refactor recovery completion
Sage Weil [Fri, 17 Feb 2012 21:19:57 +0000 (13:19 -0800)]
osd: refactor recovery completion

- rename is_all_update() -> needs_recovery(), reverse logic.
- drop up != acting check; that has nothing to do with
  recovery itself
- drop trigger in Active::react(const ActMap&)... it's nonsensical
- CompleteRecovery always leads to finish_recovery (or acting set change)

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: introduce RECOVERING pg state
Sage Weil [Fri, 17 Feb 2012 18:55:12 +0000 (10:55 -0800)]
osd: introduce RECOVERING pg state

Since clean now means not degraded, we need some other indication that
recovery has completed and we are "done" (given the current up/down state
of the OSDs).

Adding a 'recovering' state also makes it clearer to users that work is
being done, as opposed to the current situation, where they look for the
absence of 'clean'.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agopaxos: fix is_consistent() check
Sage Weil [Fri, 17 Feb 2012 18:23:12 +0000 (10:23 -0800)]
paxos: fix is_consistent() check

If our last_committed == 1, we don't need a separate stash.  This is the
logic that slurp() follows, so fix is_consistent() to match.

Fixes: #2077
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: change nested iterator name
Tom Callaway [Fri, 17 Feb 2012 17:14:16 +0000 (09:14 -0800)]
osd: change nested iterator name

Don't shadow the iterator variable.

Signed-off-by: Tom Callaway <spot@redhat.com>
Signed-off-by: David Nalley <david@gnsa.us>
13 years agoadd missing #includes to build on gcc 4.7
Tom Callaway [Fri, 17 Feb 2012 17:14:56 +0000 (09:14 -0800)]
add missing #includes to build on gcc 4.7

Signed-off-by: Tom Callaway <spot@redhat.com>
Signed-off-by: David Nalley <david@gnsa.us>
13 years agomds: comment out unused code in mds dump_pop_map
Tom Callaway [Fri, 17 Feb 2012 16:58:40 +0000 (08:58 -0800)]
mds: comment out unused code in mds dump_pop_map

Signed-off-by: Tom Callaway <spot@redhat.com>
Signed-off-by: David Nalley <david@gnsa.us>
13 years agoMerge branch 'next'
Sage Weil [Fri, 17 Feb 2012 05:00:49 +0000 (21:00 -0800)]
Merge branch 'next'

13 years agoosd: fix _activate_committed replica->primary message
Sage Weil [Wed, 15 Feb 2012 21:16:25 +0000 (13:16 -0800)]
osd: fix _activate_committed replica->primary message

Normally we take a fresh map reference in PG::lock().  However,
_activate_committed needs to make sure the map hasn't changed significantly
before acting.  In the case of #2068, the OSD map has moved forward and
the mapping has changed, but the PG hasn't processed that yet, and thus
mis-tags the MOSDPGInfo message.

Tag the message with the e epoch, and also pass down the primary's address
to send the message to the right location.

Fixes: #2068
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: skip threadpool pause on shutdown when blackholed
Sage Weil [Thu, 16 Feb 2012 23:18:58 +0000 (15:18 -0800)]
osd: skip threadpool pause on shutdown when blackholed

We can't pause the threadpools if they're blocked on a blackholed
filestore.  Instead, just call _exit().

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix _activate_committed replica->primary message
Sage Weil [Wed, 15 Feb 2012 21:16:25 +0000 (13:16 -0800)]
osd: fix _activate_committed replica->primary message

Normally we take a fresh map reference in PG::lock().  However,
_activate_committed needs to make sure the map hasn't changed significantly
before acting.  In the case of #2068, the OSD map has moved forward and
the mapping has changed, but the PG hasn't processed that yet, and thus
mis-tags the MOSDPGInfo message.

Tag the message with the e epoch, and also pass down the primary's address
to send the message to the right location.

Fixes: #2068
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: fix do not always clear DEGRADED/set CLEAN on recovery finish
Sage Weil [Wed, 15 Feb 2012 23:20:35 +0000 (15:20 -0800)]
osd: fix do not always clear DEGRADED/set CLEAN on recovery finish

Clean means we have exactly the right number of replicas and recovery is
complete.  Degraded means we do not have enough replicas, either because
recovery is in progress, or because acting is too small.

A consequence is that if we have a PG with len(up) == 1 but a pg_temp
mapping so that len(acting) == 2, it will be active and not clean.

Fixes: #2060
Signed-off-by: Sage Weil <sage@newdream.net>
Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
13 years agoinit: Only check if auto start is disabled when the issued command is "start"
Wido den Hollander [Wed, 15 Feb 2012 15:20:16 +0000 (16:20 +0100)]
init: Only check if auto start is disabled when the issued command is "start"

This still makes sure daemons don't start on boot.

When auto start was disabled it would also prevent logrotate from doing its job.

Signed-off-by: Wido den Hollander <wido@widodh.nl>
Signed-off-by: Sage Weil <sage@newdream.net>
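
The decision this commit describes can be sketched as a single predicate. The real init script is a shell script; the function name and the "reload" command used in the usage note below are assumptions for illustration only.

```python
def should_skip_daemon(command, auto_start):
    """Skip a daemon only for "start" when auto start is disabled.

    Every other command still reaches the daemon even when auto start
    is off, which is what keeps log rotation working.
    """
    return command == "start" and not auto_start
```

For example, `should_skip_daemon("start", False)` is true, while a hypothetical `should_skip_daemon("reload", False)` is false.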
13 years agoceph.spec.in: Move libcls_*.so from -devel to base package
Holger Macht [Wed, 15 Feb 2012 16:29:09 +0000 (17:29 +0100)]
ceph.spec.in: Move libcls_*.so from -devel to base package

OSDs (src/osd/ClassHandler.cc) specifically look for libcls_*.so in
/usr/$libdir/rados-classes, so libcls_rbd.so and libcls_rgw.so need to
be shipped along with the base package.

Signed-off-by: Holger Macht <hmacht@suse.de>
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoobjclass: add debug_objclass knob, default to off
Sage Weil [Wed, 15 Feb 2012 17:04:22 +0000 (09:04 -0800)]
objclass: add debug_objclass knob, default to off

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: reduce watch/notify debug noise
Sage Weil [Wed, 15 Feb 2012 17:03:28 +0000 (09:03 -0800)]
osd: reduce watch/notify debug noise

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agomsgr: mark_all_down on shutdown
Sage Weil [Wed, 15 Feb 2012 16:09:32 +0000 (08:09 -0800)]
msgr: mark_all_down on shutdown

This ensures we destroy all the Pipes and discard their messages.  Among
other things, this can avoid

2012-02-15 03:16:46.385242 7fe712b9a700 mon.f@5(peon) e1 *** Got Signal Terminated ***
2012-02-15 03:16:46.470227 7fe712b9a700 mon.f@5(peon) e1 shutdown
msg/SimpleMessenger.h: In function 'virtual SimpleMessenger::Pipe::~Pipe()' thread 7fe716a37780 time 2012-02-15 03:16:46.471005
msg/SimpleMessenger.h: 234: FAILED assert(!i->second->is_on_list())
 ceph version 0.41-362-g40802ae (commit:40802ae883a94d205a8716065b80ad5d7ff57d12)
 1: (SimpleMessenger::Pipe::~Pipe()+0x199) [0x4669d9]
 2: (SimpleMessenger::~SimpleMessenger()+0x31) [0x552231]
 3: (main()+0x3026) [0x4614a6]
 4: (__libc_start_main()+0xfe) [0x7fe714dd6d8e]
 5: /tmp/cephtest/binary/usr/local/bin/ceph-mon() [0x45e219]

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: do not sync_and_flush if blackholed
Sage Weil [Wed, 15 Feb 2012 16:21:02 +0000 (08:21 -0800)]
osd: do not sync_and_flush if blackholed

If we have blackholed this will block forever.  In that case don't bother.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoworkqueue: make pause/unpause count
Sage Weil [Wed, 15 Feb 2012 16:20:32 +0000 (08:20 -0800)]
workqueue: make pause/unpause count

We can pause() multiple times, and we need as many unpause()s to actually
resume work.

This resolves problems where we have two actors interested in pausing a
queue, both want to stop work, and they aren't interacting/coordinating.

Signed-off-by: Sage Weil <sage@newdream.net>
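
A minimal sketch of counted pause/unpause, assuming a simple counter guarded by a lock. `PausableQueue` is a hypothetical name, and the sketch leaves out the part where a real pause() would wait for in-flight work to drain.

```python
import threading


class PausableQueue:
    """Work queue gate that stays paused until every pause() is undone."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pause_count = 0  # number of outstanding pause() calls

    def pause(self):
        with self._lock:
            self._pause_count += 1

    def unpause(self):
        with self._lock:
            if self._pause_count == 0:
                raise RuntimeError("unpause() without matching pause()")
            self._pause_count -= 1

    def is_paused(self):
        with self._lock:
            return self._pause_count > 0
```

Two independent actors can each call pause() and unpause() without knowing about the other; work resumes only after both have unpaused.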
13 years agoosd: exit code 0 on SIGINT/SIGTERM
Sage Weil [Wed, 15 Feb 2012 06:05:36 +0000 (22:05 -0800)]
osd: exit code 0 on SIGINT/SIGTERM

This makes daemon-handler happy...

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agosignals: check write(2) return values
Sage Weil [Tue, 14 Feb 2012 17:09:39 +0000 (09:09 -0800)]
signals: check write(2) return values

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: semi-clean shutdown on signal
Sage Weil [Sun, 12 Feb 2012 22:35:03 +0000 (14:35 -0800)]
osd: semi-clean shutdown on signal

Make some effort to stop work in progress, remove pid file, and exit with
informative error code.

Note that this is much simpler than the shutdown() exit path; I'm not sure
whether a complete teardown is useful.  It's also difficult to maintain
and get right with everything else going on, and it's not clear that it's
worth the effort right now.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomds: remove some cruft
Sage Weil [Sun, 12 Feb 2012 22:12:44 +0000 (14:12 -0800)]
mds: remove some cruft

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomds: remove pidfile
Sage Weil [Sun, 12 Feb 2012 00:39:27 +0000 (16:39 -0800)]
mds: remove pidfile

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: do a clean shutdown on SIGINT/SIGTERM
Sage Weil [Sun, 12 Feb 2012 22:43:13 +0000 (14:43 -0800)]
mon: do a clean shutdown on SIGINT/SIGTERM

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomon: install async signal handlers for SIG{HUP,INT,TERM}
Sage Weil [Sun, 12 Feb 2012 00:38:06 +0000 (16:38 -0800)]
mon: install async signal handlers for SIG{HUP,INT,TERM}

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: install async signal handlers for SIG{HUP,INT,TERM}
Sage Weil [Sun, 12 Feb 2012 00:36:33 +0000 (16:36 -0800)]
osd: install async signal handlers for SIG{HUP,INT,TERM}

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agomds: install async signal handlers for SIG{HUP,INT,TERM}
Sage Weil [Sun, 12 Feb 2012 00:33:51 +0000 (16:33 -0800)]
mds: install async signal handlers for SIG{HUP,INT,TERM}

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agosignal: remove unused/obsolete handle_shutdown_signal
Sage Weil [Sun, 12 Feb 2012 00:39:48 +0000 (16:39 -0800)]
signal: remove unused/obsolete handle_shutdown_signal

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agosignals: do not install default SIGHUP, SIGINT, SIGTERM handlers
Sage Weil [Sun, 12 Feb 2012 00:30:26 +0000 (16:30 -0800)]
signals: do not install default SIGHUP, SIGINT, SIGTERM handlers

These should be app specific and async.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agosignals: implement safe async signal handler framework
Sage Weil [Sat, 11 Feb 2012 17:45:06 +0000 (09:45 -0800)]
signals: implement safe async signal handler framework

Based on http://evbergen.home.xs4all.nl/unix-signals.html.

Instead of his design, though, we write single bytes, and create a pipe per
signal we have handlers registered for.

Signed-off-by: Sage Weil <sage@newdream.net>
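
The design described above (one pipe per registered signal, one byte written per delivery) can be sketched as follows. This is a Unix-only Python illustration, not the Ceph implementation; `AsyncSignalHandler`, `register`, and `poll` are hypothetical names, and the callbacks here run from an explicit `poll()` call rather than from a dedicated thread.

```python
import os
import select
import signal


class AsyncSignalHandler:
    """Defer signal handling to normal context via one pipe per signal."""

    def __init__(self):
        self._pipes = {}      # signum -> (read_fd, write_fd)
        self._callbacks = {}  # signum -> callable(signum)

    def register(self, signum, callback):
        read_fd, write_fd = os.pipe()
        os.set_blocking(write_fd, False)  # never block inside the handler
        self._pipes[signum] = (read_fd, write_fd)
        self._callbacks[signum] = callback
        signal.signal(signum, self._on_signal)

    def _on_signal(self, signum, frame):
        # The handler itself only writes a single byte to that signal's pipe.
        try:
            os.write(self._pipes[signum][1], b"\0")
        except BlockingIOError:
            pass  # pipe already full, so a wakeup is pending anyway

    def poll(self, timeout=1.0):
        """Run callbacks for signals that arrived; return their numbers."""
        by_fd = {fds[0]: signum for signum, fds in self._pipes.items()}
        ready, _, _ = select.select(list(by_fd), [], [], timeout)
        handled = []
        for fd in ready:
            os.read(fd, 1)  # consume one byte per handled delivery
            signum = by_fd[fd]
            self._callbacks[signum](signum)
            handled.append(signum)
        return handled
```

The real work happens in `poll()`, outside signal context, so the callbacks may take locks or allocate memory safely.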
13 years agodoc: Balance backticks.
Tommi Virtanen [Tue, 14 Feb 2012 23:52:55 +0000 (15:52 -0800)]
doc: Balance backticks.

Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com>
13 years agoMerge branch 'wip-osd-hb'
Sage Weil [Tue, 14 Feb 2012 22:01:22 +0000 (14:01 -0800)]
Merge branch 'wip-osd-hb'

Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agomds: use new tmap_get pbl argument
Sage Weil [Tue, 14 Feb 2012 21:41:29 +0000 (13:41 -0800)]
mds: use new tmap_get pbl argument

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agolibrados: need prval for tmap_get
Sage Weil [Tue, 14 Feb 2012 21:39:46 +0000 (13:39 -0800)]
librados: need prval for tmap_get

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agolibrados: add aio_operate for reads and tmap_get for ObjectWriteOp
Samuel Just [Tue, 7 Feb 2012 16:51:01 +0000 (08:51 -0800)]
librados: add aio_operate for reads and tmap_get for ObjectWriteOp

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoosd: remove unused need_size
Sage Weil [Tue, 14 Feb 2012 21:35:04 +0000 (13:35 -0800)]
osd: remove unused need_size

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoMerge branch 'wip_push_refactor'
Samuel Just [Tue, 14 Feb 2012 21:02:44 +0000 (13:02 -0800)]
Merge branch 'wip_push_refactor'

Reviewed-by: Sage Weil <sage@newdream.net>
13 years agoReplicatedPG: pull() should return PULL_NONE, not false
Samuel Just [Tue, 14 Feb 2012 20:56:32 +0000 (12:56 -0800)]
ReplicatedPG: pull() should return PULL_NONE, not false

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoReplicatedPG: clean up push/pull
Samuel Just [Tue, 14 Feb 2012 20:55:43 +0000 (12:55 -0800)]
ReplicatedPG: clean up push/pull

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoosd_types.h: Add constructors for ObjectRecovery*
Samuel Just [Tue, 14 Feb 2012 20:52:59 +0000 (12:52 -0800)]
osd_types.h: Add constructors for ObjectRecovery*

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agotest_filestore_idempotent: fix test to create initial object
Sage Weil [Tue, 14 Feb 2012 19:53:05 +0000 (11:53 -0800)]
test_filestore_idempotent: fix test to create initial object

Filestore now properly fails to clone a non-existent object, which means
we should create one.

Fixes: #2062
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agolibcephfs: define CEPH_SETATTR_*
Sage Weil [Tue, 14 Feb 2012 17:06:21 +0000 (09:06 -0800)]
libcephfs: define CEPH_SETATTR_*

These are also defined internally in ceph_fs.h, so use a guard.  Annoying,
but gives us consistent naming (ceph_*/CEPH_*, not LIBCEPHFS_SETATTR_*).

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agotest/encoding/readable.sh: drop bashisms
Sage Weil [Mon, 13 Feb 2012 22:43:18 +0000 (14:43 -0800)]
test/encoding/readable.sh: drop bashisms

=, not ==!

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agofilejournal: drop unused variable
Sage Weil [Mon, 13 Feb 2012 22:35:01 +0000 (14:35 -0800)]
filejournal: drop unused variable

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agofilejournal: aio off by default
Sage Weil [Mon, 13 Feb 2012 22:32:07 +0000 (14:32 -0800)]
filejournal: aio off by default

For now, until we have a better handle on the ext4 bug, and demonstrate
that it is a clear performance win with the full stack.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote-tracking branch 'gh/wip-journal-aio-rebased'
Sage Weil [Mon, 13 Feb 2012 22:31:17 +0000 (14:31 -0800)]
Merge remote-tracking branch 'gh/wip-journal-aio-rebased'

13 years agoMerge remote-tracking branch 'gh/wip-osd'
Sage Weil [Mon, 13 Feb 2012 22:09:04 +0000 (14:09 -0800)]
Merge remote-tracking branch 'gh/wip-osd'

Reviewed-by: Samuel Just <samuel.just@dreamhost.com>
13 years agotest/encoding/readable.sh: skip old version with known incompatibilities
Sage Weil [Mon, 13 Feb 2012 20:40:33 +0000 (12:40 -0800)]
test/encoding/readable.sh: skip old version with known incompatibilities

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoceph-dencoder: add osd_peer_stat_t
Sage Weil [Mon, 13 Feb 2012 20:13:18 +0000 (12:13 -0800)]
ceph-dencoder: add osd_peer_stat_t

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agorgw: remove extra useless info in bucket entry encoding
Yehuda Sadeh [Mon, 13 Feb 2012 20:07:17 +0000 (12:07 -0800)]
rgw: remove extra useless info in bucket entry encoding

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
13 years agoReplicatedPG: refactor push and pull
Samuel Just [Mon, 13 Feb 2012 19:49:42 +0000 (11:49 -0800)]
ReplicatedPG: refactor push and pull

Now, push progress is represented by ObjectRecoveryProgress.  In
particular, rather than tracking data_subset_*ing, we track the furthest
offset before which the data will be consistent once cloning is complete.
sub_op_push now separates the pull response implementation from the
replica push implementation.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoadd CEPH_FEATURE_OSDENC
Sage Weil [Mon, 13 Feb 2012 19:27:11 +0000 (11:27 -0800)]
add CEPH_FEATURE_OSDENC

Require it for osd <-> osd and osd <-> mon communication.

This covers all the new encoding changes, except hobject_t, which is used
between the rados command line tool and the OSD for an object listing
position marker.  We can't distinguish between specific types of clients,
though, and we don't want to introduce any incompatibility with other
clients, so we'll just have to make do here.  :(

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoReplicatedPG: consider backfill_pos to be degraded
Samuel Just [Sun, 12 Feb 2012 01:50:49 +0000 (17:50 -0800)]
ReplicatedPG: consider backfill_pos to be degraded

A write may trigger via make_writeable the creation of a clone which
sorts before the object being written.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoReplicatedPG: add debugging for in flight backfill ops
Samuel Just [Sun, 12 Feb 2012 01:52:13 +0000 (17:52 -0800)]
ReplicatedPG: add debugging for in flight backfill ops

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoReplicatedPG: is_degraded may return true for backfill
Samuel Just [Sun, 12 Feb 2012 01:53:47 +0000 (17:53 -0800)]
ReplicatedPG: is_degraded may return true for backfill

If is_degraded returns true for backfill, the object may not be
in any replica's missing set.  Only call start_recovery_op if
we actually started an op.  This bug could cause a stuck
in backfill error.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMOSDSubOp: Add new object recovery state
Samuel Just [Wed, 8 Feb 2012 00:35:04 +0000 (16:35 -0800)]
MOSDSubOp: Add new object recovery state

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoReplicatedPG: consider backfill_pos to be degraded
Samuel Just [Sun, 12 Feb 2012 01:50:49 +0000 (17:50 -0800)]
ReplicatedPG: consider backfill_pos to be degraded

A write may trigger via make_writeable the creation of a clone which
sorts before the object being written.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoReplicatedPG: add debugging for in flight backfill ops
Samuel Just [Sun, 12 Feb 2012 01:52:13 +0000 (17:52 -0800)]
ReplicatedPG: add debugging for in flight backfill ops

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoReplicatedPG: is_degraded may return true for backfill
Samuel Just [Sun, 12 Feb 2012 01:53:47 +0000 (17:53 -0800)]
ReplicatedPG: is_degraded may return true for backfill

If is_degraded returns true for backfill, the object may not be
in any replica's missing set.  Only call start_recovery_op if
we actually started an op.  This bug could cause a stuck
in backfill error.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoosd: remove peer_stat from MOSDOp entirely
Sage Weil [Mon, 13 Feb 2012 19:06:34 +0000 (11:06 -0800)]
osd: remove peer_stat from MOSDOp entirely

We haven't used this feature for years and years, and don't plan to.  It
was there to facilitate "read shedding", where the primary OSD would
forward a read request to a replica.  However, replicas can't reply back
to the client in that case because OSDs don't initiate connections (they
used to).

Rip this out for now, especially since osd_peer_stat_t just changed.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoMerge remote-tracking branch 'gh/wip-mon-lag'
Sage Weil [Mon, 13 Feb 2012 18:01:32 +0000 (10:01 -0800)]
Merge remote-tracking branch 'gh/wip-mon-lag'

Reviewed-by: Sage Weil <sage@newdream.net>
13 years agoosd: new osd_peer_stat_t shell type
Sage Weil [Mon, 13 Feb 2012 17:42:37 +0000 (09:42 -0800)]
osd: new osd_peer_stat_t shell type

We weren't using this, and it had broken (raw) encoding.  The constructor
also didn't initialize fields properly.

Clear out the struct and use the new encoding scheme, so we can cleanly
add fields moving forward.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoqa/btrfs/.gitignore: ignore targets
Sage Weil [Mon, 13 Feb 2012 17:35:17 +0000 (09:35 -0800)]
qa/btrfs/.gitignore: ignore targets

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosd: use single helper for pg creation
Sage Weil [Mon, 13 Feb 2012 04:47:02 +0000 (20:47 -0800)]
osd: use single helper for pg creation

Take a bool so that we initialize the last_epoch_started properly on
newly created PGs.  This gives us a single code path for all new PGs.

We drop the clear_primary_state(), which has no effect, given that this is
a newly constructed PG.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: use PG::init() for newly local (but not created) PGs
Sage Weil [Mon, 13 Feb 2012 04:35:14 +0000 (20:35 -0800)]
osd: use PG::init() for newly local (but not created) PGs

Use the helper for PGs that are newly instantiated on the local OSD.

This fixes the initialization of pg->info.stats.{up,acting,mapping_epoch}.

We also get rid of a premature (and useless) write_info/log, which has
bad information (and is soon after followed by the real/good one).

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: move new pg initialization into PG::init()
Sage Weil [Mon, 13 Feb 2012 04:32:25 +0000 (20:32 -0800)]
osd: move new pg initialization into PG::init()

Move initialization of misc elements of the new pg from OSD.cc to a PG
method.  No change in functionality.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
13 years agoosd: protect per-pg heartbeat peers with inner lock
Sage Weil [Mon, 13 Feb 2012 02:08:34 +0000 (18:08 -0800)]
osd: protect per-pg heartbeat peers with inner lock

Currently we update the overall heartbeat peers by looking directly at
per-pg state.  This is potentially problematic now (#2033), and definitely
so in the future when we push more peering operations into the work queues.

Create a per-pg set of peers, protected by an inner lock, and update it
using PG::update_heartbeat_peers() when appropriate under pg->lock.  Then
aggregate it into the osd peer list in OSD::update_heartbeat_peers() under
osd_lock and the inner lock.

We could probably have re-used osd->heartbeat_lock instead of adding a
new pg->heartbeat_peer_lock, but the finer locking can't hurt.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
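
The locking arrangement described above can be sketched roughly like this. The class and attribute names mirror the commit message but the code is a simplified illustration, not the OSD source; in particular, nothing here enforces that the caller really holds `pg.lock`.

```python
import threading


class PG:
    def __init__(self):
        self.lock = threading.Lock()                 # outer pg->lock
        self.heartbeat_peer_lock = threading.Lock()  # new inner lock
        self._heartbeat_peers = set()

    def update_heartbeat_peers(self, peers):
        """Replace the peer set; the caller is assumed to hold self.lock."""
        with self.heartbeat_peer_lock:
            self._heartbeat_peers = set(peers)

    def snapshot_heartbeat_peers(self):
        """Copy the peer set under the inner lock only."""
        with self.heartbeat_peer_lock:
            return set(self._heartbeat_peers)


class OSD:
    def __init__(self, pgs):
        self.osd_lock = threading.Lock()
        self.pgs = pgs
        self.heartbeat_peers = set()

    def update_heartbeat_peers(self):
        """Aggregate per-PG peers without touching any pg.lock."""
        with self.osd_lock:
            merged = set()
            for pg in self.pgs:
                merged |= pg.snapshot_heartbeat_peers()
            self.heartbeat_peers = merged
            return merged
```

Because the OSD only takes the inner lock, it never reads PG state that a peering operation may be changing under `pg.lock`.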
13 years agorgw: don't use SCRIPT_NAME and QUERY_STRING vars
Yehuda Sadeh [Sun, 12 Feb 2012 06:43:35 +0000 (22:43 -0800)]
rgw: don't use SCRIPT_NAME and QUERY_STRING vars

REQUEST_URI holds everything we need, and it's encoded correctly.

Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
13 years agoosd: flush pg on activate _after_ we queue our transaction
Sage Weil [Sun, 12 Feb 2012 05:47:42 +0000 (21:47 -0800)]
osd: flush pg on activate _after_ we queue our transaction

We recently added a flush on activate, but we are still building the
transaction (the caller queues it), so calling osr.flush() here is totally
useless.

Instead, set a flag 'need_flush', and do the flush the next time we receive
some work.

This has the added benefit of doing the flush in the worker thread, outside
of osd_lock.

Signed-off-by: Sage Weil <sage@newdream.net>
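
A minimal sketch of the deferred flush, assuming a sequencer object with a flush() method. `FakeSequencer` and `PGFlushSketch` are hypothetical names; only the `need_flush` flag comes from the commit message.

```python
class FakeSequencer:
    """Stand-in for the PG's op sequencer; it only counts flush() calls."""

    def __init__(self):
        self.flush_calls = 0

    def flush(self):
        self.flush_calls += 1


class PGFlushSketch:
    def __init__(self, osr):
        self.osr = osr
        self.need_flush = False

    def activate(self):
        # The transaction is still being built and the caller queues it
        # later, so flushing here would be useless. Just remember to flush.
        self.need_flush = True

    def do_request(self, op):
        # First piece of work after activation: flush once, in the worker.
        if self.need_flush:
            self.osr.flush()
            self.need_flush = False
        return op
```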
13 years agoosd: do OpRequest dispatch into PG::do_request
Sage Weil [Sun, 12 Feb 2012 05:46:17 +0000 (21:46 -0800)]
osd: do OpRequest dispatch into PG::do_request

This simplifies the external PG interface, and gives us a single path into
the PG...

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agofilestore: make flush() block forever if blackholed
Sage Weil [Sun, 12 Feb 2012 05:24:54 +0000 (21:24 -0800)]
filestore: make flush() block forever if blackholed

If we are blackholing the disk, we need to make flush() wait forever, or
else the flush() logic will return (the IO wasn't queued!) and higher
layers will continue and (eventually) misbehave.

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoRevert "rgw: don't treat plus as a space in url decode"
Yehuda Sadeh [Sun, 12 Feb 2012 05:16:50 +0000 (21:16 -0800)]
Revert "rgw: don't treat plus as a space in url decode"

This reverts commit a6d7629c177fbab722a7a0c7f861caf91ff92deb.

13 years agoosd: emit useful scrub error on missing clone
Sage Weil [Sun, 12 Feb 2012 05:15:11 +0000 (21:15 -0800)]
osd: emit useful scrub error on missing clone

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agofilestore: return error from CLONE
Sage Weil [Sun, 12 Feb 2012 05:14:53 +0000 (21:14 -0800)]
filestore: return error from CLONE

Aie!

Signed-off-by: Sage Weil <sage@newdream.net>