git.apps.os.sepia.ceph.com Git - ceph.git/log
ceph.git
12 years ago rgw: fix radosgw-admin buckets list
Yehuda Sadeh [Wed, 26 Jun 2013 18:28:57 +0000 (11:28 -0700)]
rgw: fix radosgw-admin buckets list

Fixes: #5455
Backport: cuttlefish
This commit fixes a regression in which the radosgw-admin buckets list
operation wasn't returning any data.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e1f9fe58d2860fcbb18c92d3eb3946236b49a6ce)

12 years ago ceph-disk: use unix lock instead of lockfile class
Sage Weil [Thu, 20 Jun 2013 00:27:49 +0000 (17:27 -0700)]
ceph-disk: use unix lock instead of lockfile class

The lockfile class relies on file system trickery to get safe mutual
exclusion.  However, the unix syscalls do this for us.  More
importantly, the unix locks go away when the owning process dies, which
is behavior that we want here.

Fixes: #5387
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit 2a4953b697a3464862fd3913336edfd7eede2487)
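
The locking behavior described in this commit can be sketched roughly as follows. This is a minimal illustration, assuming POSIX advisory locks taken through Python's fcntl.lockf; the helper name file_lock and the lock path are hypothetical and are not taken from ceph-disk itself.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def file_lock(path):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    # Hypothetical helper: the kernel drops the lock when the descriptor is
    # closed or the owning process dies, so no stale lock file has to be
    # cleaned up by hand.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
        yield fd
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)
```

A caller would wrap the critical section in `with file_lock(some_path):`. Because the lock lives in the kernel rather than in the file system contents, a crashed holder cannot leave it behind.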

12 years ago ceph-disk: do not mount over an osd directly in /var/lib/ceph/osd/$cluster-$id
Sage Weil [Thu, 27 Jun 2013 01:27:49 +0000 (18:27 -0700)]
ceph-disk: do not mount over an osd directly in /var/lib/ceph/osd/$cluster-$id

If we see a 'ready' file in the target OSD dir, do not mount our device
on top of it.

Among other things, this prevents ceph-disk activate on stray disks from
stepping on teuthology osds.

Fixes: #5445
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 8a17f33b14d858235dfeaa42be1f4842dcfd66d2)
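
A minimal sketch of the guard described above, assuming the check is a plain file-existence test on 'ready' inside the target directory. The function name safe_to_mount is hypothetical.

```python
import os


def safe_to_mount(osd_dir):
    """Return False if `osd_dir` already holds an OSD marked 'ready'."""
    # Assumption: the marker is a file literally named 'ready' at the top
    # level of the target OSD directory, as the commit message describes.
    return not os.path.exists(os.path.join(osd_dir, "ready"))
```

The caller would skip the mount, rather than fail, when this returns False.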

12 years ago mds: fix underwater dentry cleanup
Yan, Zheng [Tue, 2 Apr 2013 07:46:51 +0000 (15:46 +0800)]
mds: fix underwater dentry cleanup

If the underwater dentry is a remote link, we shouldn't mark the
inode clean.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
(cherry picked from commit 81d073fecb58e2294df12b71351321e6d2e69652)

12 years ago mon/Elector: cancel election timer if we bootstrap
Sage Weil [Tue, 25 Jun 2013 01:51:07 +0000 (18:51 -0700)]
mon/Elector: cancel election timer if we bootstrap

If we short-circuit and bootstrap, cancel our timer.  Otherwise it will
go off some time later when we are in who knows what state.

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 9ae0ec83dabe37ac15e5165559debdfef7a5f91d)

12 years ago mon: cancel probe timeout on reset
Sage Weil [Tue, 25 Jun 2013 01:12:11 +0000 (18:12 -0700)]
mon: cancel probe timeout on reset

If we are probing and get (say) an election timeout that calls reset(),
cancel the timer.  Otherwise, we assert later with a splat like

2013-06-24 01:09:33.675882 7fb9627e7700  4 mon.b@0(leader) e1 probe_timeout 0x307a520
2013-06-24 01:09:33.676956 7fb9627e7700 -1 mon/Monitor.cc: In function 'void Monitor::probe_timeout(int)' thread 7fb9627e7700 time 2013-06-24 01:09:43.675904
mon/Monitor.cc: 1888: FAILED assert(is_probing() || is_synchronizing())

 ceph version 0.64-613-g134d08a (134d08a9654f66634b893d493e4a92f38acc63cf)
 1: (Monitor::probe_timeout(int)+0x161) [0x56f5c1]
 2: (Context::complete(int)+0xa) [0x574a2a]
 3: (SafeTimer::timer_thread()+0x425) [0x7059a5]
 4: (SafeTimerThread::entry()+0xd) [0x7065dd]
 5: (()+0x7e9a) [0x7fb966f62e9a]
 6: (clone()+0x6d) [0x7fb9652f9ccd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Fixes: #5438
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 03d3be3eaa96a8e72754c36abd6f355c68d52d59)
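
The failure mode and the fix can be illustrated with a toy model. This is only a sketch in Python using threading.Timer; the class Prober and its methods are hypothetical stand-ins, it ignores locking, and it is not the monitor's actual C++ code.

```python
import threading


class Prober:
    """Toy stand-in for a monitor that probes with a timeout (hypothetical)."""

    def __init__(self):
        self.probing = False
        self.timer = None

    def start_probe(self, timeout):
        self.probing = True
        self.timer = threading.Timer(timeout, self.probe_timeout)
        self.timer.daemon = True
        self.timer.start()

    def probe_timeout(self):
        # In the spirit of the failed assertion above: the callback is only
        # valid while we are still probing.
        assert self.probing
        self.probing = False
        self.timer = None

    def reset(self):
        # Cancel the pending timer so it cannot fire later in some other state.
        if self.timer is not None:
            self.timer.cancel()
            self.timer = None
        self.probing = False
```

Without the cancel in reset(), the timer would still fire after the state had changed and trip the assertion in probe_timeout().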

12 years ago ceph-disk: make list_partition behave with unusual device names
Alexandre Maragone [Tue, 18 Jun 2013 23:18:01 +0000 (16:18 -0700)]
ceph-disk: make list_partition behave with unusual device names

When you get device names like sdaa you do not want to mistakenly conclude that
sdaa is a partition of sda.  Use /sys/block/$device/$partition existence
instead.

Fixes: #5211
Backport: cuttlefish
Signed-off-by: Alexandre Maragone <alexandre.maragone@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 8c0daafe003935881c5192e0b6b59b949269e5ae)
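
The sysfs-based check can be sketched as below. This is an illustration under stated assumptions, not the ceph-disk implementation: the function name list_partitions and the sys_block parameter (added so the lookup can be pointed at a test directory) are hypothetical.

```python
import os


def list_partitions(device, sys_block="/sys/block"):
    """Return partitions of `device` by checking <sys_block>/<device>/<partition>."""
    dev_dir = os.path.join(sys_block, device)
    # A partition shows up as a subdirectory of its parent device, so 'sdaa'
    # (a separate disk with its own top-level entry) is never listed under 'sda'.
    return sorted(
        name
        for name in os.listdir(dev_dir)
        if name.startswith(device) and os.path.isdir(os.path.join(dev_dir, name))
    )
```

The point of the design is that the kernel's directory layout, rather than string matching on device names alone, decides what counts as a partition.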

12 years ago client: fix warning
Sage Weil [Tue, 18 Jun 2013 03:28:24 +0000 (20:28 -0700)]
client: fix warning

client/Client.cc: In member function 'virtual void Client::ms_handle_remote_reset(Connection*)':
warning: client/Client.cc:7892:9: enumeration value 'STATE_NEW' not handled in switch [-Wswitch]
warning: client/Client.cc:7892:9: enumeration value 'STATE_OPEN' not handled in switch [-Wswitch]
warning: client/Client.cc:7892:9: enumeration value 'STATE_CLOSED' not handled in switch [-Wswitch]

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
(cherry picked from commit 8bd936f077530dfeb2e699164e4492b1c0973088)

12 years ago mon/AuthMonitor: ensure initial rotating keys get encoded when create_initial called 2x
Sage Weil [Tue, 25 Jun 2013 00:58:48 +0000 (17:58 -0700)]
mon/AuthMonitor: ensure initial rotating keys get encoded when create_initial called 2x

The create_initial() method may get called multiple times; make sure it
will unconditionally generate new/initial rotating keys.  Move the block
up so that we can easily assert as much.

Broken by commit cd98eb0c651d9ee62e19c2cc92eadae9bed678cd.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit 521fdc2a4e65559b3da83283e6ca607b6e55406f)

12 years ago init-radosgw.sysv: remove -x debug mode
Sage Weil [Tue, 25 Jun 2013 00:42:04 +0000 (17:42 -0700)]
init-radosgw.sysv: remove -x debug mode

Fixes: #5443
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 31d6062076fdbcd2691c07a23b381b26abc59f65)

12 years ago common/pick_addresses: behave even after internal_safe_to_start_threads
Sage Weil [Mon, 24 Jun 2013 19:52:44 +0000 (12:52 -0700)]
common/pick_addresses: behave even after internal_safe_to_start_threads

ceph-mon recently started using Preforker to work around forking issues.
As a result, internal_safe_to_start_threads got set sooner and calls to
pick_addresses() which try to set string config values now fail because
there are no config observers for them.

Work around this by observing the change while we adjust the value.  We
assume pick_addresses() callers are smart enough to realize that their
result will be reflected by cct->_conf and not magically handled elsewhere.

Fixes: #5195, #5205
Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit eb86eebe1ba42f04b46f7c3e3419b83eb6fe7f9a)

12 years ago mon/PaxosService: allow paxos service writes while paxos is updating
Sage Weil [Thu, 20 Jun 2013 22:39:23 +0000 (15:39 -0700)]
mon/PaxosService: allow paxos service writes while paxos is updating

In commit f985de28f86675e974ac7842a49922a35fe24c6c I mistakenly made
is_writeable() false while paxos was updating due to a misread of
Paxos::propose_new_value() (I didn't see that it would queue).
This is problematic because it narrows the window during which each service
is writeable for no reason.

Allow service to be writeable both when paxos is active and updating.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 11169693d086e67dcf168ce65ef6e13eebd1a1ab)

12 years ago mon/PaxosService: not active during paxos UPDATING_PREVIOUS
Sage Weil [Fri, 7 Jun 2013 18:41:21 +0000 (11:41 -0700)]
mon/PaxosService: not active during paxos UPDATING_PREVIOUS

Treat this as an extension of the recovery process, e.g.

 RECOVERING -> ACTIVE
or
 RECOVERING -> UPDATING_PREVIOUS -> ACTIVE

and we are not active until we get to "the end" in both cases.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 392a8e21f8571b410c85be2129ef62dd6fc52b54)
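
The two recovery paths can be written down as a small state model. This is a sketch only, in Python rather than the monitor's C++; the enum State and the helper is_active are hypothetical names chosen for illustration.

```python
from enum import Enum


class State(Enum):
    RECOVERING = 1
    UPDATING_PREVIOUS = 2
    ACTIVE = 3


# The two recovery paths named in the commit message.
RECOVERY_PATHS = [
    [State.RECOVERING, State.ACTIVE],
    [State.RECOVERING, State.UPDATING_PREVIOUS, State.ACTIVE],
]


def is_active(state):
    """Only the final state counts as active; UPDATING_PREVIOUS does not."""
    return state is State.ACTIVE
```

Treating UPDATING_PREVIOUS as part of recovery means every intermediate step on either path reports not active.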

12 years ago mon: simplify states
Sage Weil [Fri, 7 Jun 2013 18:40:22 +0000 (11:40 -0700)]
mon: simplify states

- make states mutually exclusive (an enum)
- rename locked -> updating_previous
- set state prior to begin() to simplify things a bit

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ee34a219605d1943740fdae0d84cfb9020302dd6)

12 years ago mon/Paxos: not readable when LOCKED
Sage Weil [Fri, 7 Jun 2013 18:14:58 +0000 (11:14 -0700)]
mon/Paxos: not readable when LOCKED

If we are re-proposing a previously accepted value from a previous quorum,
we should not consider it readable, because it is possible it was exposed
to clients as committed (2/3 accepted) but not recorded to be committed, and
we do not want to expose old state as readable when new state was
previously readable.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ec2ea86ed55e00265c2cc5ad0c94460b4c92865c)

12 years ago mon/Paxos: cleanup: drop unused PREPARING state bit
Sage Weil [Fri, 7 Jun 2013 18:07:38 +0000 (11:07 -0700)]
mon/Paxos: cleanup: drop unused PREPARING state bit

This is never set when we block, and nobody looks at it.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 7b7ea8e30e20704caad9a841332ecb2e39819a41)

12 years ago mon/PaxosService: simplify is_writeable
Sage Weil [Thu, 6 Jun 2013 22:20:05 +0000 (15:20 -0700)]
mon/PaxosService: simplify is_writeable

Recast this in terms of paxos check + our conditions, and make it
match wait_for_writeable().

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit f985de28f86675e974ac7842a49922a35fe24c6c)

12 years ago mon/PaxosService: simplify readable check
Sage Weil [Wed, 5 Jun 2013 00:03:15 +0000 (17:03 -0700)]
mon/PaxosService: simplify readable check

Recast this in terms of the paxos check and our additional conditions,
which match wait_for_readable().

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 3aa61a0beb540e48bf61ceded766d6ff52c95eb2)

12 years ago mon: simplify Monitor::init_paxos()
Sage Weil [Fri, 31 May 2013 23:45:08 +0000 (16:45 -0700)]
mon: simplify Monitor::init_paxos()

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e832e76a4af04b091c806ad412bcfd0326d75a2d)

12 years ago mon/Paxos: go active *after* refreshing
Sage Weil [Fri, 31 May 2013 23:39:37 +0000 (16:39 -0700)]
mon/Paxos: go active *after* refreshing

The update_from_paxos() methods occasionally like to trigger new activity.
As long as they check is_readable() and is_writeable(), they will defer
until we go active and that activity will happen in the normal callbacks.

This fixes the problem where we go active but is_writeable() is still false,
triggered by PGMonitor::check_osd_map().

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e68b1bd36ed285e38a558899f83cf224d3aa60ed)

12 years ago mon: safely signal bootstrap from MonmapMonitor::update_from_paxos()
Sage Weil [Fri, 31 May 2013 22:32:06 +0000 (15:32 -0700)]
mon: safely signal bootstrap from MonmapMonitor::update_from_paxos()

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit dc83430124a5fd37573202a4cc0986c3c03739ef)

12 years ago mon/Paxos: do paxos refresh in finish_proposal; and refactor
Sage Weil [Sun, 2 Jun 2013 23:57:11 +0000 (16:57 -0700)]
mon/Paxos: do paxos refresh in finish_proposal; and refactor

Do the paxos refresh inside finish_proposal, ordered *after* the leader
assertion so that MonmapMonitor::update_from_paxos() calling bootstrap()
does not kill us.

Also, remove unnecessary finish_queued_proposal() and move the logic inline
where the bad leader assertion is obvious.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit a42d7582f816b45f5d19c393fd45447555e78fdd)

12 years ago mon/PaxosService: cache {first,last}_committed
Joao Eduardo Luis [Sun, 2 Jun 2013 23:15:02 +0000 (16:15 -0700)]
mon/PaxosService: cache {first,last}_committed

Refresh the in-memory values when we are told the on-disk paxos state
may have changed.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 2fccb300bdf6ffd44db3462eb05115da11322ed4)

12 years ago mon: no need to refresh from _active
Sage Weil [Fri, 31 May 2013 21:30:48 +0000 (14:30 -0700)]
mon: no need to refresh from _active

The refresh is done explicitly by the monitor, independent of the more
fragile PaxosService callbacks.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit d941363d6e4249e97b64faff0e573f75e918ac0c)

12 years ago mon: remove unnecessary update_from_paxos calls
Sage Weil [Sun, 2 Jun 2013 23:10:57 +0000 (16:10 -0700)]
mon: remove unnecessary update_from_paxos calls

The refresh() will do this when the state changes; no need to
opportunistically call this method all of the time.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 03014a4ecc06cde420fad0c6c2a0177ebd7b839d)

12 years ago mon: explicitly refresh_from_paxos() when leveldb state changes
Sage Weil [Sun, 2 Jun 2013 23:14:01 +0000 (16:14 -0700)]
mon: explicitly refresh_from_paxos() when leveldb state changes

Instead of opportunistically calling each service's update_from_paxos(),
explicitly refresh all in-memory state whenever we know the
paxos state may have changed.  This is simpler and less fragile.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit cc339c07312006e65854207523f50542d00ecf87)

12 years ago mon/AuthMonitor: make initial auth include rotating keys
Sage Weil [Sun, 23 Jun 2013 16:25:55 +0000 (09:25 -0700)]
mon/AuthMonitor: make initial auth include rotating keys

This closes a very narrow race during mon creation where there are no
service keys.

Fixes: #5427
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit cd98eb0c651d9ee62e19c2cc92eadae9bed678cd)

12 years ago mds: fix iterator invalidation for backtrace removal
Sage Weil [Fri, 21 Jun 2013 18:53:29 +0000 (11:53 -0700)]
mds: fix iterator invalidation for backtrace removal

- Don't increment before we dereference!
- We need to update the iterator before we delete the item.

This code is changed in master, so this fix is for cuttlefish only.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>

12 years ago osd: init test_ops_hook
Sage Weil [Thu, 9 May 2013 16:44:20 +0000 (09:44 -0700)]
osd: init test_ops_hook

CID 1019628 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
2. uninit_member: Non-static class member "test_ops_hook" is not initialized in this constructor nor in any functions that it calls.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e30a03210c3efb768b1653df5ae58917ef26e579)

12 years ago osd: initialize OSDService::next_notif_id
Sage Weil [Thu, 9 May 2013 16:45:51 +0000 (09:45 -0700)]
osd: initialize OSDService::next_notif_id

CID 1019627 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member "next_notif_id" is not initialized in this constructor nor in any functions that it calls.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 499edd8bfc355c2d590f5fa1ef197d1ea5680351)

12 years ago mon: more fix dout use in sync_requester_abort()
Sage Weil [Thu, 20 Jun 2013 16:46:42 +0000 (09:46 -0700)]
mon: more fix dout use in sync_requester_abort()

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit d60534b8f59798feaeeaa17adba2a417d7777cbf)

12 years ago mon: fix raw use of *_dout in sync_requester_abort()
Sage Weil [Mon, 10 Jun 2013 18:48:25 +0000 (11:48 -0700)]
mon: fix raw use of *_dout in sync_requester_abort()

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 8a4ed58e39b287fd8667c62b45848487515bdc80)

12 years ago v0.61.4 v0.61.4
Gary Lowell [Wed, 19 Jun 2013 20:51:38 +0000 (13:51 -0700)]
v0.61.4

12 years ago messages/MOSDMarkMeDown: fix uninit field
Sage Weil [Wed, 22 May 2013 21:29:37 +0000 (14:29 -0700)]
messages/MOSDMarkMeDown: fix uninit field

Fixes valgrind warning:
==14803== Use of uninitialised value of size 8
==14803==    at 0x12E7614: sctp_crc32c_sb8_64_bit (sctp_crc32.c:567)
==14803==    by 0x12E76F8: update_crc32 (sctp_crc32.c:609)
==14803==    by 0x12E7720: ceph_crc32c_le (sctp_crc32.c:733)
==14803==    by 0x105085F: ceph::buffer::list::crc32c(unsigned int) (buffer.h:427)
==14803==    by 0x115D7B2: Message::calc_front_crc() (Message.h:441)
==14803==    by 0x1159BB0: Message::encode(unsigned long, bool) (Message.cc:170)
==14803==    by 0x1323934: Pipe::writer() (Pipe.cc:1524)
==14803==    by 0x13293D9: Pipe::Writer::entry() (Pipe.h:59)
==14803==    by 0x120A398: Thread::_entry_func(void*) (Thread.cc:41)
==14803==    by 0x503BE99: start_thread (pthread_create.c:308)
==14803==    by 0x6C6E4BC: clone (clone.S:112)

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit eb91f41042fa31df2bef9140affa6eac726f6187)

12 years ago Merge remote-tracking branch 'gh/wip-4976-cuttlefish' into cuttlefish
Sage Weil [Wed, 19 Jun 2013 17:56:51 +0000 (10:56 -0700)]
Merge remote-tracking branch 'gh/wip-4976-cuttlefish' into cuttlefish

Reviewed-by: Samuel Just <sam.just@inktank.com>

12 years ago common/Preforker: fix warning
Sage Weil [Tue, 18 Jun 2013 03:32:15 +0000 (20:32 -0700)]
common/Preforker: fix warning

common/Preforker.h: In member function ‘int Preforker::signal_exit(int)’:
warning: common/Preforker.h:82:45: ignoring return value of ‘ssize_t safe_write(int, const void*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]

This is harder than it should be to fix.  :(
  http://stackoverflow.com/questions/3614691/casting-to-void-doesnt-remove-warn-unused-result-error

Whatever, I guess we can do something useful with this return value.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: David Zafman <david.zafman@inktank.com>
(cherry picked from commit ce7b5ea7d5c30be32e4448ab0e7e6bb6147af548)

12 years ago mon: Monitor: make sure we backup a monmap during sync start
Joao Eduardo Luis [Wed, 19 Jun 2013 01:50:45 +0000 (02:50 +0100)]
mon: Monitor: make sure we backup a monmap during sync start

First of all, we must find a monmap to back up: the newest version.

Secondly, we must make sure we back it up before clearing the store.

Finally, we must make sure that we don't remove said backup while
clearing the store; otherwise, we would be out of a backup monmap if the
sync happened to fail (and if the monitor happened to be killed before a
new sync had finished).

This patch makes sure these conditions are met.

Fixes: #5256 (partially)
Backport: cuttlefish

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 5e6dc4ea21b452e34599678792cd36ce1ba3edb3)
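
The ordering constraints listed above can be sketched with a dictionary standing in for the monitor's key/value store. This is a hypothetical illustration: the function start_sync, the key name "mon_sync/latest_monmap", and the dict-based store are assumptions made for the example, not the monitor's actual layout.

```python
def start_sync(store, newest_monmap, backup_key="mon_sync/latest_monmap"):
    """Back up the newest monmap, then clear everything except the backup."""
    # 1. Back up before clearing, so a failed sync still leaves a usable monmap.
    store[backup_key] = newest_monmap
    # 2. Clear the store, but keep every key under the backup prefix.
    prefix = backup_key.split("/", 1)[0] + "/"
    for key in list(store):
        if not key.startswith(prefix):
            del store[key]
    return store
```

Doing the backup first and excluding its prefix from the clear is what keeps a monmap available if the process is killed between a failed sync and the next completed one.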

12 years ago mon: Monitor: obtain latest monmap on sync store init
Joao Eduardo Luis [Wed, 19 Jun 2013 01:36:44 +0000 (02:36 +0100)]
mon: Monitor: obtain latest monmap on sync store init

Always use the highest version amongst all the typically available
monmaps: whatever we have in memory, whatever we have under the
MonmapMonitor's store, and whatever we have backed up from a previous
sync.  This ensures we always use the newest version we have come
across.

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 6284fdce794b73adcc757fee910e975b6b4bd054)
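
Picking the newest of the three candidates might look like the sketch below. It assumes each monmap is represented as a dict with an "epoch" field and that a missing source is passed as None; both the representation and the function name latest_monmap are hypothetical.

```python
def latest_monmap(in_memory, from_store, from_backup):
    """Return the candidate with the highest epoch, ignoring missing ones."""
    candidates = [m for m in (in_memory, from_store, from_backup) if m is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m["epoch"])
```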

12 years ago mon: Monitor: don't remove 'mon_sync' when clearing the store during abort
Joao Eduardo Luis [Wed, 19 Jun 2013 01:21:58 +0000 (02:21 +0100)]
mon: Monitor: don't remove 'mon_sync' when clearing the store during abort

Otherwise, we will end up losing the monmap we backed up when we started
the sync, and the monitor may be unable to start if it is killed or
crashes in-between the sync abort and finishing a new sync.

Fixes: #5256 (partially)
Backport: cuttlefish

Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit af5a9861d7c6b4527b0d2312d0efa792910bafd9)

12 years ago os/FileStore: drop posix_fadvise(...DONTNEED)
Sage Weil [Wed, 19 Jun 2013 04:31:23 +0000 (21:31 -0700)]
os/FileStore: drop posix_fadvise(...DONTNEED)

On XFS this call is problematic because it directly calls the filemap
writeback without vectoring through xfs.  This can break the delicate
ordering of writeback and range zeroing; see #4976 and this thread

      http://oss.sgi.com/archives/xfs/2013-06/msg00066.html

Drop this behavior for now to avoid subtle data corruption.

Signed-off-by: Sage Weil <sage@inktank.com>

12 years ago os/FileStore: use fdatasync(2) instead of sync_file_range(2)
Sage Weil [Wed, 19 Jun 2013 04:24:16 +0000 (21:24 -0700)]
os/FileStore: use fdatasync(2) instead of sync_file_range(2)

The use of sync_file_range(2) on XFS screws up XFS' delicate ordering
of writeback and range zeroing; see #4976 and this thread:

  http://oss.sgi.com/archives/xfs/2013-06/msg00066.html

Instead, replace all sync_file_range(2) calls with fdatasync(2), which
*does* do ordered writeback and should not leak unzeroed blocks.

Signed-off-by: Sage Weil <sage@inktank.com>

12 years ago config: fix run_dir typo
Sage Weil [Thu, 13 Jun 2013 04:47:09 +0000 (21:47 -0700)]
config: fix run_dir typo

From 654299108bfb11e7dce45f54946d1505f71d2de8.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e9689ac6f5f50b077a6ac874f811d204ef996c96)

12 years ago ceph.spec: create /var/run on package install
Sage Weil [Tue, 18 Jun 2013 21:51:08 +0000 (14:51 -0700)]
ceph.spec: create /var/run on package install

The %ghost %dir ... line will make this get cleaned up but won't install
it.

Reported-by: Derek Yarnell <derek@umiacs.umd.edu>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Gary Lowell <gary.lowell@inktank.com>
(cherry picked from commit 64ee0148a5b7324c7df7de2d5f869b880529d452)

12 years ago global: create /var/run/ceph on daemon startup
Sage Weil [Sat, 8 Jun 2013 00:03:41 +0000 (17:03 -0700)]
global: create /var/run/ceph on daemon startup

This handles cases where the daemon is started without the benefit of
sysvinit or upstart (as with teuthology or ceph-fuse).

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 654299108bfb11e7dce45f54946d1505f71d2de8)

12 years ago PG: don't dirty log unconditionally in activate()
Samuel Just [Wed, 5 Jun 2013 18:10:34 +0000 (11:10 -0700)]
PG: don't dirty log unconditionally in activate()

merge_log and friends all take care of dirtying the log
as necessary.

Fixes: #5238
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 5deece1d034749bf72b7bd04e4e9c5d97e5ad6ce)

12 years ago mon: OSDMonitor: don't ignore apply_incremental()'s return on UfP [1]
Joao Eduardo Luis [Fri, 14 Jun 2013 16:11:43 +0000 (17:11 +0100)]
mon: OSDMonitor: don't ignore apply_incremental()'s return on UfP [1]

apply_incremental() may return -EINVAL.  Don't ignore it.

[1] UfP = Update from Paxos

Fixes: #5343
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit e3c33f4315cbf8718f61eb79e15dd6d44fc908b7)

12 years ago client: handle reset during initial mds session open
Sage Weil [Mon, 17 Jun 2013 23:38:26 +0000 (16:38 -0700)]
client: handle reset during initial mds session open

If we get a reset during our attempt to open an MDS session, close out the
Connection* and retry to open the session, moving the waiters over.

Fixes: #5379
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit df8a3e5591948dfd94de2e06640cfe54d2de4322)

12 years ago ceph-disk: add some notes on wth we are up to
Sage Weil [Mon, 17 Jun 2013 22:43:40 +0000 (15:43 -0700)]
ceph-disk: add some notes on wth we are up to

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 8c6b24e9039079e897108f28d6af58cbc703a15a)

12 years ago ceph-disk: clear TERM to avoid libreadline hijinx
Sage Weil [Fri, 14 Jun 2013 23:29:10 +0000 (16:29 -0700)]
ceph-disk: clear TERM to avoid libreadline hijinx

The weird output from libreadline users is related to the TERM variable.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e538829f16ce19d57d63229921afa01cc687eb86)
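
One way to keep TERM away from child processes is sketched below. This is an assumption about the general approach, not a copy of the ceph-disk change: the helper run_without_term is hypothetical, and it removes the variable from the child's environment only.

```python
import os
import subprocess


def run_without_term(cmd):
    """Run `cmd` with TERM removed from its environment and return its stdout."""
    env = dict(os.environ)
    # The commit ties the odd output to this variable, so drop it for the child.
    env.pop("TERM", None)
    return subprocess.check_output(cmd, env=env)
```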

12 years ago ceph-disk-udev: set up by-partuuid, -typeuuid symlinks on ancient udev
Sage Weil [Mon, 17 Jun 2013 16:49:46 +0000 (09:49 -0700)]
ceph-disk-udev: set up by-partuuid, -typeuuid symlinks on ancient udev

Make the ancient-udev/blkid workaround script for RHEL/CentOS create the
symlinks for us too.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit d7f7d613512fe39ec883e11d201793c75ee05db1)

12 years ago ceph-disk: do not stop activate-all on first failure
Sage Weil [Sun, 16 Jun 2013 03:06:33 +0000 (20:06 -0700)]
ceph-disk: do not stop activate-all on first failure

Keep going even if we hit one activation error.  This avoids failing to
start some disks when only one of them won't start (e.g., because it
doesn't belong to the current cluster).

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c9074375bfbe1e3757b9c423a5ff60e8013afbce)
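
The keep-going behavior can be sketched as a loop that records failures instead of aborting. The function activate_all and its activate callback are hypothetical names for illustration; how errors are finally reported is left to the caller.

```python
def activate_all(devices, activate):
    """Try every device; collect failures instead of stopping at the first one."""
    failed = []
    for dev in devices:
        try:
            activate(dev)
        except Exception as exc:  # keep going past a single bad disk
            failed.append((dev, exc))
    return failed
```

Returning the list of failures lets the caller still exit non-zero at the end while every healthy disk has been given a chance to start.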

12 years ago ceph.spec: include partuuid rules in package
Sage Weil [Fri, 14 Jun 2013 23:30:24 +0000 (16:30 -0700)]
ceph.spec: include partuuid rules in package

Commit f3234c147e083f2904178994bc85de3d082e2836 missed this.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 253069e04707c5bf46869f4ff5a47ea6bb0fde3e)

12 years ago ceph.spec: install/uninstall init script
Sage Weil [Fri, 14 Jun 2013 22:01:14 +0000 (15:01 -0700)]
ceph.spec: install/uninstall init script

This was commented out almost years ago in commit 9baf5ef4 but it is not
clear to me that it was correct to do so.  In any case, we are not
installing the rc.d links for ceph, which means it does not start up after
a reboot.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit cc9b83a80262d014cc37f0c974963cf7402a577a)

12 years ago sysvinit, upstart: ceph-disk activate-all on start
Sage Weil [Fri, 14 Jun 2013 20:39:03 +0000 (13:39 -0700)]
sysvinit, upstart: ceph-disk activate-all on start

On 'service ceph start' or 'service ceph start osd' or start ceph-osd-all
we should activate any osd GPT partitions.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 13680976ef6899cb33109f6f841e99d4d37bb168)

12 years ago ceph-disk: add 'activate-all'
Sage Weil [Fri, 14 Jun 2013 20:34:40 +0000 (13:34 -0700)]
ceph-disk: add 'activate-all'

Scan /dev/disk/by-parttypeuuid for ceph OSDs and activate them all.  This
is useful when the event didn't trigger on the initial udev event for
some reason.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 5c7a23687a1a21bec5cca7b302ac4ba47c78e041)
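
The scan can be sketched as a directory listing filtered by partition type. This is a rough illustration under assumptions: entries are taken to be named "<type uuid>-<partition uuid>", the function find_partitions_of_type is hypothetical, and the caller supplies the type UUID to match.

```python
import os


def find_partitions_of_type(type_uuid, base="/dev/disk/by-parttypeuuid"):
    """Return paths under `base` whose name starts with '<type_uuid>-'."""
    if not os.path.isdir(base):
        return []  # no udev-created directory, nothing to activate
    prefix = type_uuid.lower() + "-"
    return sorted(
        os.path.join(base, name)
        for name in os.listdir(base)
        if name.lower().startswith(prefix)
    )
```

Each returned path would then be handed to the normal activation step.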

12 years ago udev: /dev/disk/by-parttypeuuid/$type-$uuid
Sage Weil [Fri, 14 Jun 2013 20:23:52 +0000 (13:23 -0700)]
udev: /dev/disk/by-parttypeuuid/$type-$uuid

We need this to help trigger OSD activations.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit d512dc9eddef3299167d4bf44e2018b3b6031a22)

12 years ago rgw: escape prefix correctly when listing objects
Yehuda Sadeh [Fri, 14 Jun 2013 21:53:54 +0000 (14:53 -0700)]
rgw: escape prefix correctly when listing objects

Fixes: #5362
When listing objects, the prefix needs to be escaped correctly (the
same as with the marker). Otherwise, listing objects with a prefix
that starts with an underscore doesn't work.
Backport: bobtail, cuttlefish

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit d582ee2438a3bd307324c5f44491f26fd6a56704)

12 years ago messages/MMonSync: initialize crc in ctor
Sage Weil [Tue, 11 Jun 2013 00:28:22 +0000 (17:28 -0700)]
messages/MMonSync: initialize crc in ctor

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit cd1c289b96a874ff99a83a44955d05efc9f2765a)

12 years ago client: fix ancient typo in caps revocation path
Sage Weil [Sat, 15 Jun 2013 15:48:37 +0000 (08:48 -0700)]
client: fix ancient typo in caps revocation path

If we have dropped all references to a revoked capability, send the ack
to the MDS.  This typo has been there since v0.7 (early 2009)!

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit b7143c2f84daafbe2c27d5b2a2d5dc40c3a68d15)

12 years ago messages/MMonHealth: remove unused flag field
Sage Weil [Wed, 5 Jun 2013 15:42:25 +0000 (08:42 -0700)]
messages/MMonHealth: remove unused flag field

This was initialized in (one of) the ctor(s), but not encoded/decoded,
and not used.  Remove it.  This makes valgrind happy.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 08bb8d510b5abd64f5b9f8db150bfc8bccaf9ce8)

12 years ago messages/MMonProbe: fix uninitialized variables
Sage Weil [Wed, 5 Jun 2013 15:34:20 +0000 (08:34 -0700)]
messages/MMonProbe: fix uninitialized variables

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4974b29e251d433101b69955091e22393172bcd8)

12 years ago common/Preforker: fix broken recursion on exit(3)
Sage Weil [Sat, 15 Jun 2013 15:14:40 +0000 (08:14 -0700)]
common/Preforker: fix broken recursion on exit(3)

If we exit via preforker, call exit(3) and not recursively back into
Preforker::exit(r).  Otherwise you get a hang with the child blocked
at:

Thread 1 (Thread 0x7fa08962e7c0 (LWP 5419)):
#0  0x000000309860e0cd in write () from /lib64/libpthread.so.0
#1  0x00000000005cc906 in Preforker::exit(int) ()
#2  0x00000000005c8dfb in main ()

and the parent at

#0  0x000000309860eba7 in waitpid () from /lib64/libpthread.so.0
#1  0x00000000005cc87a in Preforker::parent_wait() ()
#2  0x00000000005c75ae in main ()

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 7e7ff7532d343c473178799e37f4b83cf29c4eee)

12 years ago rules: Don't disable tcmalloc on ARM (and other non-intel)
Gary Lowell [Thu, 13 Jun 2013 23:38:26 +0000 (16:38 -0700)]
rules:  Don't disable tcmalloc on ARM (and other non-intel)

Fixes #5342

Signed-off-by: Gary Lowell <gary.lowell@inktank.com>

12 years ago Remove mon socket in post-stop
Guilhem Lettron [Mon, 27 May 2013 10:41:53 +0000 (12:41 +0200)]
Remove mon socket in post-stop

If ceph-mon segfaults, the socket file isn't removed.

Adding a remove step in post-stop lets upstart clean the run directory properly.

Signed-off-by: Guilhem Lettron <guilhem@lettron.fr>
(cherry picked from commit 554b41b171eab997038e83928c462027246c24f4)

12 years ago Remove stop on from upstart tasks
James Page [Mon, 20 May 2013 09:26:06 +0000 (10:26 +0100)]
Remove stop on from upstart tasks

Upstart tasks don't have the concept of 'stop on' as they
are not long running.
(cherry picked from commit 17f6fccabc262b9a6d59455c524b550e77cd0fe3)

12 years ago ceph-disk: extra dash in error message
Dan Mick [Thu, 13 Jun 2013 05:22:42 +0000 (22:22 -0700)]
ceph-disk: extra dash in error message

Signed-off-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit f86b4e7a4831c684033363ddd335d2f3fb9a189a)

12 years ago ceph-disk: cast output of _check_output()
Danny Al-Gaaf [Fri, 24 May 2013 10:41:11 +0000 (12:41 +0200)]
ceph-disk: cast output of _check_output()

Cast output of _check_output() to str() to be able to use
str.split().

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit 16ecae153d260407085aaafbad1c1c51f4486c9a)

12 years ago ceph-disk: remove unnecessary semicolons
Danny Al-Gaaf [Fri, 24 May 2013 10:46:15 +0000 (12:46 +0200)]
ceph-disk: remove unnecessary semicolons

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit 9785478a2aae7bf5234fbfe443603ba22b5a50d2)

12 years ago ceph-disk: fix undefined variable
Danny Al-Gaaf [Fri, 24 May 2013 10:33:16 +0000 (12:33 +0200)]
ceph-disk: fix undefined variable

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit 9429ff90a06368fc98d146e065a7b9d1b68e9822)

12 years agoceph-disk: add missing spaces around operator
Danny Al-Gaaf [Fri, 24 May 2013 10:29:07 +0000 (12:29 +0200)]
ceph-disk: add missing spaces around operator

Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
(cherry picked from commit c127745cc021c8b244d721fa940319158ef9e9d4)

12 years agoudev: drop useless --mount argument to ceph-disk
Sage Weil [Fri, 14 Jun 2013 05:02:03 +0000 (22:02 -0700)]
udev: drop useless --mount argument to ceph-disk

It doesn't mean anything anymore; drop it.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit bcfd2f31a50d27038bc02e645795f0ec99dd3b32)

12 years agoceph-disk-udev: activate-journal
Sage Weil [Fri, 14 Jun 2013 05:01:34 +0000 (22:01 -0700)]
ceph-disk-udev: activate-journal

Trigger 'ceph-disk activate-journal' from the alt udev rules.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit b139152039bfc0d190f855910d44347c9e79b22a)

12 years agoceph-disk: do not use mount --move (or --bind)
Sage Weil [Fri, 14 Jun 2013 04:56:23 +0000 (21:56 -0700)]
ceph-disk: do not use mount --move (or --bind)

The kernel does not let you mount --move when the parent mount is
shared (see, e.g., https://bugzilla.redhat.com/show_bug.cgi?id=917008
for another person this also confused).  We can't use --bind either
since that (on RHEL at least) screws up /etc/mtab so that the final
result looks like

 /var/lib/ceph/tmp/mnt.HNHoXU /var/lib/ceph/osd/ceph-0 none rw,bind 0 0

Instead, mount the original dev in the final location and then umount
from the old location.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit e5ffe0d2484eb6cbcefcaeb5d52020b1130871a5)
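The mount-then-umount order described in the commit above can be sketched as a pair of command lines. This is a minimal illustration, not ceph-disk's actual code: the helper name `remount_commands`, the device `/dev/sdb1`, and the `xfs` filesystem type are assumptions; only the two paths come from the commit message.

```python
def remount_commands(dev, tmp_mount, final_mount, fstype):
    """Return the commands that replace 'mount --move' (hypothetical helper).

    The device is mounted a second time at its final location, and only
    then is the temporary mount point released.
    """
    return [
        ["mount", "-t", fstype, dev, final_mount],  # mount original dev in final place
        ["umount", tmp_mount],                      # then drop the old location
    ]


# Assumed device and filesystem type; paths taken from the mtab line above.
cmds = remount_commands(
    dev="/dev/sdb1",
    tmp_mount="/var/lib/ceph/tmp/mnt.HNHoXU",
    final_mount="/var/lib/ceph/osd/ceph-0",
    fstype="xfs",
)
for cmd in cmds:
    print(" ".join(cmd))
```

Each list could be handed to subprocess.check_call(); the sketch only prints them so it can be run without root.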

12 years agoceph.spec: include by-partuuid udev workaround rules
Sage Weil [Fri, 14 Jun 2013 04:22:53 +0000 (21:22 -0700)]
ceph.spec: include by-partuuid udev workaround rules

These are needed for old or buggy udev.  Having them for new and unbroken
udev is harmless.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit f3234c147e083f2904178994bc85de3d082e2836)

12 years agoceph-disk: work around buggy rhel/centos parted
Sage Weil [Fri, 14 Jun 2013 19:10:49 +0000 (12:10 -0700)]
ceph-disk: work around buggy rhel/centos parted

parted on RHEL/Centos prefixes the *machine readable output* with

 1b 5b 3f 31 30 33 34 68

Note that the same thing happens when you 'import readline' in python.

Work around it!

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 82ff72f827b9bd7f91d30a09d35e42b25d2a7344)
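The eight bytes quoted in the commit above decode to the terminal escape sequence ESC [ ? 1 0 3 4 h. One way to work around it is to strip that prefix before parsing. The sketch below is an assumption about the shape of such a workaround, not the code from the commit; `strip_parted_prefix` is a hypothetical name and the sample output is illustrative.

```python
# The bytes from the commit message: 1b 5b 3f 31 30 33 34 68
PARTED_JUNK = bytes.fromhex("1b5b3f3130333468")


def strip_parted_prefix(out: bytes) -> bytes:
    """Drop the stray escape sequence if it leads the output (hypothetical helper)."""
    if out.startswith(PARTED_JUNK):
        return out[len(PARTED_JUNK):]
    return out


# Illustrative machine-readable output, prefixed the way buggy parted does it.
sample = PARTED_JUNK + b"BYT;\n/dev/sdb:1000GB:scsi:512:512:gpt:ATA DISK;\n"
cleaned = strip_parted_prefix(sample)
print(cleaned.decode())
```

Output that does not carry the prefix passes through unchanged, so the same helper is safe on distros with a well-behaved parted.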

12 years agoceph-disk: implement 'activate-journal'
Sage Weil [Thu, 13 Jun 2013 22:54:58 +0000 (15:54 -0700)]
ceph-disk: implement 'activate-journal'

Activate an osd via its journal device.  udev populates its symlinks and
triggers events in an order that is not related to whether the device is
an osd data partition or a journal.  That means that triggering
'ceph-disk activate' can happen before the journal (or journal symlink)
is present and then fail.

Similarly, it may be that they are on different disks that are hotplugged
with the journal second.

This can be wired up to the journal partition type to ensure that osds are
started when the journal appears second.

Include the udev rules to trigger this.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit a2a78e8d16db0a71b13fc15457abc5fe0091c84c)

12 years agoceph-disk: call partprobe outside of the prepare lock; drop udevadm settle
Sage Weil [Wed, 12 Jun 2013 01:35:01 +0000 (18:35 -0700)]
ceph-disk: call partprobe outside of the prepare lock; drop udevadm settle

After we change the final partition type, sgdisk may or may not trigger a
udev event, depending on how well udev is behaving (it varies between
distros, it seems).  The old code would often settle and wait for udev to
activate the device, and then partprobe would uselessly fail because it
was already mounted.

Call partprobe only at the very end, after prepare is done.  This ensures
that if partprobe calls udevadm settle (which it sometimes does) we do not
get stuck.

Drop the udevadm settle.  I'm not sure what this accomplishes; take it out,
at least until we determine we need it.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 8b3b59e01432090f7ae774e971862316203ade68)

12 years agoceph-disk: add 'zap' command
Sage Weil [Thu, 13 Jun 2013 18:03:37 +0000 (11:03 -0700)]
ceph-disk: add 'zap' command

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 10ba60cd088c15d4b4ea0b86ad681aa57f1051b6)

12 years agoceph-disk: fix stat errors with new suppress code
Sage Weil [Tue, 21 May 2013 19:52:03 +0000 (12:52 -0700)]
ceph-disk: fix stat errors with new suppress code

Broken by 225fefe5e7c997b365f481b6c4f66312ea28ed61.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit bcc8bfdb672654c6a6b48a2aa08267a894debc32)

12 years agoceph-disk: add '[un]suppress-activate <dev>' command
Sage Weil [Mon, 13 May 2013 19:35:32 +0000 (12:35 -0700)]
ceph-disk: add '[un]suppress-activate <dev>' command

It is often useful to prepare but not activate a device, for example when
preparing a bunch of spare disks.  This marks a device as 'do not
activate' so that it can be prepared without activating.

Fixes: #3255
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 225fefe5e7c997b365f481b6c4f66312ea28ed61)

12 years agoupstart: start ceph-all on runlevel [2345]
Sage Weil [Fri, 14 Jun 2013 18:21:25 +0000 (11:21 -0700)]
upstart: start ceph-all on runlevel [2345]

Starting when only one network interface has started breaks machines with
multiple nics in very problematic ways.

There may be an earlier trigger that we can use for cases where other
services on the local machine depend on ceph, but for now this is better
than the existing behavior.

See #5248

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 7e08ed1bf154f5556b3c4e49f937c1575bf992b8)

12 years agoclient: set issue_seq (not seq) in cap release
Sage Weil [Sun, 9 Jun 2013 00:38:07 +0000 (17:38 -0700)]
client: set issue_seq (not seq) in cap release

We regularly have been observing a stall where the MDS is blocked waiting
for a cap revocation (Ls, in our case) and never gets a reply.  We finally
tracked down the sequence:

 - mds issues cap seq 1 to client
 - mds does revocation (seq 2)
 - client replies
 - much time goes by
 - client trims inode from cache, sends release with seq == 2
 - mds ignores release because its issue_seq is 1
 - mds later tries to revoke other caps
 - client discards message because it doesn't have the inode in cache

The problem is simply that we are using seq instead of issue_seq in the
cap release message.  Note that the other release call site in
encode_inode_release() is correct.  That one is much more commonly
triggered by short tests, as compared to this case where the inode needs to
get pushed out of the client cache.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 9b012e234a924efd718826ab6a53b9aeb7cd6649)
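The sequence in the commit above can be replayed with a toy model. This is only a sketch under stated assumptions: `MdsCap` is a hypothetical class, not Ceph's Capability code, and the exact comparison the MDS performs is assumed to be an equality check against issue_seq.

```python
class MdsCap:
    """Toy MDS-side cap record (hypothetical model)."""

    def __init__(self):
        self.issue_seq = 0  # bumped only when caps are issued
        self.seq = 0        # bumped on every cap message, revokes included
        self.held = False

    def issue(self):
        self.issue_seq += 1
        self.seq += 1
        self.held = True

    def revoke(self):
        self.seq += 1

    def handle_release(self, release_seq):
        # Assumed rule: a release is honoured only if it names the last issue.
        if release_seq != self.issue_seq:
            return False
        self.held = False
        return True


cap = MdsCap()
cap.issue()    # mds issues cap: seq 1, issue_seq 1
cap.revoke()   # mds does revocation: seq 2, issue_seq still 1

ignored = not cap.handle_release(cap.seq)    # old client behavior: sends seq == 2
accepted = cap.handle_release(cap.issue_seq)  # fixed client behavior: sends issue_seq
print(ignored, accepted, cap.held)
```

With the old value the MDS keeps believing the client holds the cap, which is the state that later leads to the stalled revocation.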

12 years agoosd: skip mark-me-down message if osd is not up
Sage Weil [Wed, 22 May 2013 22:03:50 +0000 (15:03 -0700)]
osd: skip mark-me-down message if osd is not up

Fixes crash when the OSD has not successfully booted and gets a
SIGINT or SIGTERM.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit c2e262fc9493b4bb22c2b7b4990aa1ee7846940e)

12 years agoceph-fuse: create finisher threads after fork()
Sage Weil [Mon, 3 Jun 2013 04:21:51 +0000 (21:21 -0700)]
ceph-fuse: create finisher threads after fork()

The ObjectCacher and MonClient classes both instantiate Finisher
threads.  We need to make sure they are created *after* the fork(2)
or else the process will fail to join() them on shutdown, and the
threads will not exist while fuse is doing useful work.

Put CephFuse on the heap and move all this initialization into the child
block, and make sure errors are passed back to the parent.

Fix-proposed-by: Alexandre Marangone <alexandre.maragone@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 4fa5f99a40792341d247e51488c37301da3c4e4f)

12 years agoosd: do not include logbl in scrub map
Sage Weil [Thu, 6 Jun 2013 23:35:54 +0000 (16:35 -0700)]
osd: do not include logbl in scrub map

This is a potentially huge object/file, usually prefixed by a zeroed region
on disk, that is not used by scrub at all.  It dates back to
f51348dc8bdd5071b7baaf3f0e4d2e0496618f08 (2008) and the original version of
scrub.

This *might* fix #4179.  It is not a leak per se, but I observed 1GB
scrub messages going over the wire.  Maybe the allocations are causing
fragmentation, or the sub_op queues are growing.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit 0b036ecddbfd82e651666326d6f16b3c000ade18)

12 years agorgw: handle deep uri resources
Yehuda Sadeh [Fri, 7 Jun 2013 04:53:00 +0000 (21:53 -0700)]
rgw: handle deep uri resources

In case of deep uri resources (ones created beyond a single level
of hierarchy, e.g. auth/v1.0) we want to create a new empty
handler for the path if no handler exists. E.g., for
auth/v1.0 we need to have a handler for 'auth', otherwise
the default S3 handler will be used, which we don't want.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit ad3934e335399f7844e45fcfd17f7802800d2cb3)
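The idea of creating empty intermediate handlers for a deep resource such as auth/v1.0 can be sketched with a small tree. This is an illustration only: `ResourceMgr`, `register`, and the handler strings are hypothetical and do not mirror the rgw C++ classes.

```python
class ResourceMgr:
    """Toy resource tree node (hypothetical; not the rgw implementation)."""

    def __init__(self):
        self.children = {}
        self.handler = None  # None means "empty handler"

    def register(self, path, handler):
        node = self
        for part in path.strip("/").split("/"):
            # Create an empty node for every missing level, so 'auth'
            # exists even though only 'auth/v1.0' was registered.
            node = node.children.setdefault(part, ResourceMgr())
        node.handler = handler

    def lookup(self, path, default):
        node = self
        for part in path.strip("/").split("/"):
            if part not in node.children:
                return default  # fall back to the default handler
            node = node.children[part]
        return node.handler


root = ResourceMgr()
root.register("auth/v1.0", "swift-auth")

print(root.lookup("auth/v1.0", "s3"))  # the registered handler
print(root.lookup("auth", "s3"))       # empty handler, not the S3 default
print(root.lookup("bucket", "s3"))     # unknown resource falls back to S3
```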

12 years agorgw: fix get_resource_mgr() to correctly identify resource
Yehuda Sadeh [Fri, 7 Jun 2013 04:47:21 +0000 (21:47 -0700)]
rgw: fix get_resource_mgr() to correctly identify resource

Fixes: #5262
The original test was not comparing the correct string, so it ended up
just checking whether a substring of the uri matched the resource.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 8d55b87f95d59dbfcfd0799c4601ca37ebb025f5)
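The difference between a substring-style match and a match on the whole resource name can be shown in a few lines. Both functions are hypothetical and assume the faulty test behaved like a prefix comparison; the real get_resource_mgr() logic is not reproduced here.

```python
def matches_prefix_only(uri, resource):
    """Loose test: any uri that merely starts with the resource name matches."""
    return uri.startswith(resource)


def matches_resource(uri, resource):
    """Stricter test: the resource must be a whole leading path component."""
    return uri == resource or uri.startswith(resource + "/")


# 'authx/v1.0' is a made-up uri that shares a prefix with the 'auth' resource.
loose = matches_prefix_only("authx/v1.0", "auth")
strict = matches_resource("authx/v1.0", "auth")
exact = matches_resource("auth/v1.0", "auth")
print(loose, strict, exact)
```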

12 years agorgw: add 'cors' to the list of sub-resources
Yehuda Sadeh [Thu, 6 Jun 2013 18:22:38 +0000 (11:22 -0700)]
rgw: add 'cors' to the list of sub-resources

Fixes: #5261
Backport: cuttlefish
Add 'cors' to the list of sub-resources, otherwise auth signing
is wrong.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 9a0a9c205b8c24ca9c1e05b0cf9875768e867a9e)
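Why a missing sub-resource breaks signing: in S3-style request signing, recognized sub-resources in the query string become part of the canonicalized resource that is signed. The sketch below assumes that scheme and uses a deliberately short sub-resource list; `canonical_resource` is a hypothetical helper, not rgw code.

```python
def canonical_resource(path, query, sub_resources):
    """Build the resource string that gets signed (simplified sketch)."""
    # Only recognized sub-resources are kept, in sorted order.
    subs = sorted(k for k in query if k in sub_resources)
    if not subs:
        return path
    parts = [k if not query[k] else "%s=%s" % (k, query[k]) for k in subs]
    return path + "?" + "&".join(parts)


query = {"cors": "", "max-keys": "10"}  # illustrative request parameters

without_cors = canonical_resource("/bucket", query, {"acl", "location"})
with_cors = canonical_resource("/bucket", query, {"acl", "cors", "location"})
print(without_cors)
print(with_cors)
```

A client that signs the second string while the server computes the first will never agree on the signature, which matches the symptom described in the commit.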

12 years agomon: fix preforker exit behavior
Sage Weil [Sat, 1 Jun 2013 04:23:45 +0000 (21:23 -0700)]
mon: fix preforker exit behavior

In 3c5706163b72245768958155d767abf561e6d96d we made exit() not actually
exit so that the leak checking would behave for a non-forking case.
That is only needed for the normal exit case; every other case expects
exit() to actually terminate and not continue execution.

Instead, make a signal_exit() method that signals the parent (if any)
and then lets you return.  exit() goes back to its usual behavior,
fixing the many other calls in main().

Backport: cuttlefish
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit 92d085f7fd6224ffe5b7651c1f83b093f964b5cd)

12 years agorados.py: correct some C types
Josh Durgin [Tue, 4 Jun 2013 20:23:36 +0000 (13:23 -0700)]
rados.py: correct some C types

trunc was getting size_t instead of uint64_t, leading to bad results
in 32-bit environments. Explicitly cast to the desired type
everywhere, so it's clear the correct type is being used.

Fixes: #5233
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 6dd7d469000144b499af84bda9b735710bb5cec3)
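The effect of passing a 64-bit size through a 32-bit type can be reproduced with ctypes. In this sketch c_uint32 stands in for size_t on a 32-bit build (an assumption made so the example behaves the same on any machine); the 5 GiB value is illustrative.

```python
import ctypes

size = 5 * 2**30  # 5 GiB, too large for 32 bits

# Explicit uint64_t keeps the full value.
as_uint64 = ctypes.c_uint64(size).value

# c_uint32 stands in for size_t on a 32-bit platform: ctypes does no
# overflow checking, so the high bits are silently dropped.
as_32bit_size_t = ctypes.c_uint32(size).value

print(as_uint64)
print(as_32bit_size_t)
```

The truncated value is size modulo 2**32, which is how a trunc call could end up with a bad length on 32-bit systems.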

12 years agov0.61.3 v0.61.3
Gary Lowell [Wed, 5 Jun 2013 18:10:05 +0000 (11:10 -0700)]
v0.61.3

12 years agoos/LevelDBStore: only remove logger if non-null
Sage Weil [Tue, 4 Jun 2013 17:42:13 +0000 (10:42 -0700)]
os/LevelDBStore: only remove logger if non-null

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit ce67c58db7d3e259ef5a8222ef2ebb1febbf7362)
Fixes: #5255

12 years agotest_librbd: use correct type for varargs snap test
Josh Durgin [Mon, 3 Jun 2013 22:57:23 +0000 (15:57 -0700)]
test_librbd: use correct type for varargs snap test

uint64_t is passed in, but int was extracted. This fails on 32-bit builds.

Fixes: #5220
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit 17029b270dee386e12e5f42c2494a5feffd49b08)

12 years agoos/LevelDBStore: fix merge loop
Sage Weil [Mon, 3 Jun 2013 01:07:34 +0000 (18:07 -0700)]
os/LevelDBStore: fix merge loop

We were double-incrementing p, both in the for statement and in the
body.  While we are here, drop the unnecessary else's.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit eb6d5fcf994d2a25304827d7384eee58f40939af)
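The double-increment pattern named in the commit above can be illustrated with an index-based loop. This is a generic sketch, not the LevelDBStore merge code: the function names are hypothetical and the real loop works on iterators rather than list indexes.

```python
def visit_buggy(items):
    """Advance twice per pass, once in the body and once as the loop step."""
    seen = []
    i = 0
    while i < len(items):
        seen.append(items[i])
        i += 1  # increment in the body ...
        i += 1  # ... plus the loop's own step: every other item is skipped
    return seen


def visit_fixed(items):
    """Advance exactly once per pass."""
    seen = []
    i = 0
    while i < len(items):
        seen.append(items[i])
        i += 1
    return seen


data = ["a", "b", "c", "d"]
print(visit_buggy(data))
print(visit_fixed(data))
```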

12 years agomsgr: add get_messenger() to Connection
Sage Weil [Mon, 3 Jun 2013 00:27:10 +0000 (17:27 -0700)]
msgr: add get_messenger() to Connection

This was part of commit 27381c0c6259ac89f5f9c592b4bfb585937a1cfc.

Signed-off-by: Sage Weil <sage@inktank.com>

12 years agomon: start lease timer from peon_init()
Sage Weil [Sat, 1 Jun 2013 00:09:19 +0000 (17:09 -0700)]
mon: start lease timer from peon_init()

In the scenario:

 - leader wins, peons lose
 - leader sees it is too far behind on paxos and bootstraps
 - leader tries to sync with someone, waits for a quorum of the others
 - peons sit around forever waiting

The problem is that they never time out because paxos never issues a lease,
which is the normal timeout that lets them detect a leader failure.

Avoid this by starting the lease timeout as soon as we lose the election.
The timeout callback just does a bootstrap and does not rely on any other
state.

I see one possible danger here: there may be some "normal" cases where the
leader takes a long time to issue its first lease that we currently
tolerate, but won't with this new check in place.  I hope that raising
the lease interval/timeout or reducing the allowed paxos drift will make
that a non-issue.  If it is problematic, we will need a separate explicit
"i am alive" from the leader while it is getting ready to issue the lease
to prevent a live-lock.

Backport: cuttlefish, bobtail
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit f1ccb2d808453ad7ef619c2faa41a8f6e0077bd9)

12 years agomon: discard messages from disconnected clients
Sage Weil [Fri, 31 May 2013 05:52:21 +0000 (22:52 -0700)]
mon: discard messages from disconnected clients

If the client is not connected, discard the message.  They will
reconnect and resend anyway, so there is no point in processing it
twice (now and later).

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit fb3cd0c2a8f27a1c8d601a478fd896cc0b609011)

12 years agomsgr: add Messenger reference to Connection
Sage Weil [Wed, 22 May 2013 15:13:21 +0000 (08:13 -0700)]
msgr: add Messenger reference to Connection

This allows us to get the messenger associated with a connection.

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit 92a558bf0e5fee6d5250e1085427bff22fe4bbe4)

12 years agomon/Paxos: adjust trimming defaults up; rename options
Sage Weil [Thu, 30 May 2013 22:59:49 +0000 (15:59 -0700)]
mon/Paxos: adjust trimming defaults up; rename options

- trim more at a time (by an order of magnitude)
- rename fields to paxos_trim_{min,max}; only trim when there are min items
  that are trimmable, and trim at most max items at a time.
- adjust the paxos_service_trim_{min,max} values up by a factor of 2.

Since we are compacting every time we trim, adjusting these up means less
frequent compactions and less overall work for the monitor.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
(cherry picked from commit 6b8e74f0646a7e0d31db24eb29f3663fafed4ecc)
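The min/max rule from the commit above ("only trim when there are min items that are trimmable, and trim at most max items at a time") reduces to a small function. The function name and the numbers used below are illustrative assumptions, not the actual defaults.

```python
def versions_to_trim(trimmable, trim_min, trim_max):
    """Return how many versions to trim in one pass (hypothetical helper)."""
    if trimmable < trim_min:
        return 0                     # not enough to be worth a compaction
    return min(trimmable, trim_max)  # never trim more than max at once


# Illustrative thresholds only.
TRIM_MIN, TRIM_MAX = 500, 1000

print(versions_to_trim(100, TRIM_MIN, TRIM_MAX))   # below min: skip
print(versions_to_trim(750, TRIM_MIN, TRIM_MAX))   # between min and max: trim all
print(versions_to_trim(5000, TRIM_MIN, TRIM_MAX))  # above max: capped
```

Raising both thresholds makes each pass larger and less frequent, which is the reasoning given for the lower compaction cost.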

12 years agocommon/Preforker: fix warnings
Sage Weil [Wed, 8 May 2013 23:42:24 +0000 (16:42 -0700)]
common/Preforker: fix warnings

Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit a284c9ece85f11d020d492120be66a9f4c997416)