git.apps.os.sepia.ceph.com Git - ceph.git/log
Sage Weil [Thu, 10 Nov 2011 22:24:18 +0000 (14:24 -0800)]
mon: overwrite in put_bl

This fixes a situation where we accept a large value, there is some failure
and recovery, and then we commit a smaller value with the same version.

E.g.,

INFO:teuthology.task.ceph.mon.b.err:terminate called after throwing an instance of 'ceph::buffer::end_of_buffer'
INFO:teuthology.task.ceph.mon.b.err:  what():  buffer::end_of_buffer
INFO:teuthology.task.ceph.mon.b.err:*** Caught signal (Aborted) **
INFO:teuthology.task.ceph.mon.b.err: in thread 7f0a6037c700
INFO:teuthology.task.ceph.mon.b.err: ceph version 0.37-365-g5b20830 (commit:5b208302e1ad134f56933dfdbccb074e03c88be3)
INFO:teuthology.task.ceph.mon.b.err: 1: (ceph::BackTrace::BackTrace(int)+0x2d) [0x6f4d1b]
INFO:teuthology.task.ceph.mon.b.err: 2: /tmp/cephtest/binary/usr/local/bin/ceph-mon() [0x7e9492]
INFO:teuthology.task.ceph.mon.b.err: 3: (()+0xfb40) [0x7f0a63bf4b40]
INFO:teuthology.task.ceph.mon.b.err: 4: (gsignal()+0x35) [0x7f0a625cdba5]
INFO:teuthology.task.ceph.mon.b.err: 5: (abort()+0x180) [0x7f0a625d16b0]
INFO:teuthology.task.ceph.mon.b.err: 6: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f0a62e716bd]
INFO:teuthology.task.ceph.mon.b.err: 7: (()+0xb9906) [0x7f0a62e6f906]
INFO:teuthology.task.ceph.mon.b.err: 8: (()+0xb9933) [0x7f0a62e6f933]
INFO:teuthology.task.ceph.mon.b.err: 9: (()+0xb9a3e) [0x7f0a62e6fa3e]
INFO:teuthology.task.ceph.mon.b.err: 10: (ceph::buffer::list::iterator::copy(unsigned int, std::string&)+0xcb) [0x7d73a7]
INFO:teuthology.task.ceph.mon.b.err: 11: (decode(std::string&, ceph::buffer::list::iterator&)+0x44) [0x5fa2e8]
INFO:teuthology.task.ceph.mon.b.err: 12: (LogEntry::decode(ceph::buffer::list::iterator&)+0xa8) [0x6ceee8]
INFO:teuthology.task.ceph.mon.b.err: 13: (LogMonitor::update_from_paxos()+0x346) [0x6cce9a]
INFO:teuthology.task.ceph.mon.b.err: 14: (PaxosService::_active()+0x13b) [0x647ab5]
INFO:teuthology.task.ceph.mon.b.err: 15: (PaxosService::C_Active::finish(int)+0x25) [0x647cb9]
INFO:teuthology.task.ceph.mon.b.err: 16: (Context::complete(int)+0x2b) [0x61a5a9]
INFO:teuthology.task.ceph.mon.b.err: 17: (finish_contexts(CephContext*, std::list<Context*, std::allocator<Context*> >&, int)+0x20b) [0x61a7ef]
INFO:teuthology.task.ceph.mon.b.err: 18: (Paxos::handle_last(MMonPaxos*)+0xea7) [0x63d081]
INFO:teuthology.task.ceph.mon.b.err: 19: (Paxos::dispatch(PaxosServiceMessage*)+0x29c) [0x642046]
INFO:teuthology.task.ceph.mon.b.err: 20: (Monitor::_ms_dispatch(Message*)+0xd78) [0x61636e]
INFO:teuthology.task.ceph.mon.b.err: 21: (Monitor::ms_dispatch(Message*)+0x3a) [0x61de84]
INFO:teuthology.task.ceph.mon.b.err: 22: (Messenger::ms_deliver_dispatch(Message*)+0x63) [0x7c690f]
INFO:teuthology.task.ceph.mon.b.err: 23: (SimpleMessenger::dispatch_entry()+0x7c2) [0x7b0156]
INFO:teuthology.task.ceph.mon.b.err: 24: (SimpleMessenger::DispatchThread::entry()+0x2c) [0x5fd6ac]
INFO:teuthology.task.ceph.mon.b.err: 25: (Thread::_entry_func(void*)+0x23) [0x6e9261]
INFO:teuthology.task.ceph.mon.b.err: 26: (()+0x7971) [0x7f0a63bec971]
INFO:teuthology.task.ceph.mon.b.err: 27: (clone()+0x6d) [0x7f0a6268092d]
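
This is a minimal sketch of why the store has to overwrite rather than write in place. It assumes the stored value behaves like a file written from offset 0. put_in_place and put_overwrite are hypothetical names, not the MonitorStore API. Writing a smaller value over a larger one without truncating leaves stale tail bytes that a later decode may misread. That is consistent with the end_of_buffer abort above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the stored value: writes land at offset 0.
// Writing in place without truncating keeps any old bytes past the end
// of the new value.
void put_in_place(std::string& file, const std::string& value) {
  if (file.size() < value.size())
    file.resize(value.size());
  file.replace(0, value.size(), value);  // stale tail bytes survive
}

// Overwriting (truncate, then write) leaves exactly the new value.
void put_overwrite(std::string& file, const std::string& value) {
  file.assign(value);
}
```

For a stored "LARGE-VALUE-v5", put_in_place(file, "small") leaves "small-VALUE-v5", while put_overwrite(file, "small") leaves "small".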

Signed-off-by: Sage Weil <sage@newdream.net>
Yehuda Sadeh [Thu, 10 Nov 2011 22:56:10 +0000 (14:56 -0800)]
rgw: implement swift copy, fix copy auth

Samuel Just [Thu, 10 Nov 2011 22:08:36 +0000 (14:08 -0800)]
PG: gen_prefix: use osdmap_ref rather than osd->osdmap

Otherwise, the debug output might not match the map used by
the pg logic.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Thu, 10 Nov 2011 22:07:12 +0000 (14:07 -0800)]
OSD: sync_and_flush after mkfs to create first snap

Previously, if the OSD process was killed before the filestore
did its first sync, we would end up replaying the journal on top
of current and potentially hitting -EEXIST.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Thu, 10 Nov 2011 01:16:57 +0000 (17:16 -0800)]
PG: update info.history even if lastmap is absent

Previously, we did not update same_interval_since etc. if
we did not have the previous map.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Sage Weil [Thu, 10 Nov 2011 00:36:48 +0000 (16:36 -0800)]
Makefile: add MMonProbe.h

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 9 Nov 2011 23:47:35 +0000 (15:47 -0800)]
osd: remove useless proc_replica_log() side-effect

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Greg Farnum [Wed, 9 Nov 2011 23:23:38 +0000 (15:23 -0800)]
hadoop: update patch and Readme.

Patch generated by Noah Watkins <noahwatkins@gmail.com>

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Yehuda Sadeh [Wed, 9 Nov 2011 23:29:41 +0000 (15:29 -0800)]
rgw: swift guesses mime type if not specified

Sage Weil [Wed, 9 Nov 2011 22:50:09 +0000 (14:50 -0800)]
osd: comment PG::lock*(), whitespace

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 9 Nov 2011 22:46:58 +0000 (14:46 -0800)]
Merge branch 'master' of github.com:NewDreamNetwork/ceph

Conflicts:
src/osd/PG.cc

Sage Weil [Mon, 31 Oct 2011 18:57:14 +0000 (11:57 -0700)]
osd: improve last_peering_reset debugging

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 9 Nov 2011 22:34:30 +0000 (14:34 -0800)]
crypto: make crypto handlers non-static

These were static in auth/Crypto.cc, which was mostly fine, except when
a signal made us call exit() and shut everything down for gcov, like so:

Thread 21 (Thread 2164):
#0  0x00007f31a800b3cd in open64 () from /lib/libpthread.so.0
#1  0x000000000081dee0 in __gcov_open ()
#2  0x000000000081e3fd in gcov_exit ()
#3  0x00007f31a67e64f2 in exit () from /lib/libc.so.6
#4  0x000000000054e1ca in handle_signal (signal=<value optimized out>) at osd/OSD.cc:600
#5  <signal handler called>
#6  0x00007f31a8007a9a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#7  0x0000000000636d7b in Wait (this=0x2241000) at ./common/Cond.h:48
#8  SimpleMessenger::wait (this=0x2241000) at msg/SimpleMessenger.cc:2637
#9  0x00000000004a4e35 in main (argc=<value optimized out>, argv=<value optimized out>) at ceph_osd.cc:343

and a racing thread would, say, accept a connection and then crash, like
so:

#0  0x00007f31a800ba0b in raise () from /lib/libpthread.so.0
#1  0x0000000000696eeb in reraise_fatal (signum=2164) at global/signal_handler.cc:59
#2  0x00000000006976cc in handle_fatal_signal (signum=<value optimized out>) at global/signal_handler.cc:106
#3  <signal handler called>
#4  0x00007f31a67e0ba5 in raise () from /lib/libc.so.6
#5  0x00007f31a67e46b0 in abort () from /lib/libc.so.6
#6  0x00007f31a70846bd in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#7  0x00007f31a7082906 in ?? () from /usr/lib/libstdc++.so.6
#8  0x00007f31a7082933 in std::terminate() () from /usr/lib/libstdc++.so.6
#9  0x00007f31a708328f in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#10 0x0000000000690e5b in CryptoKey::decrypt (this=0x7f3195a67510, in=..., out=..., error=...) at auth/Crypto.cc:404
#11 0x000000000079ccee in void decode_decrypt_enc_bl<CephXServiceTicketInfo>(CephXServiceTicketInfo&, CryptoKey, ceph::buffer::list&, std::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#12 0x0000000000795ca3 in cephx_verify_authorizer (cct=0x2232000, keys=<value optimized out>, indata=...,
    ticket_info=<value optimized out>, reply_bl=<value optimized out>) at auth/cephx/CephxProtocol.cc:438
#13 0x00000000007a17cf in CephxAuthorizeHandler::verify_authorizer (this=<value optimized out>, cct=0x2232000, keys=0x2256000,
    authorizer_data=<value optimized out>, authorizer_reply=..., entity_name=..., global_id=@0x7f3195a67848, caps_info=...,
    auid=0x7f3195a67840) at auth/cephx/CephxAuthorizeHandler.cc:21
#14 0x00000000005577ff in OSD::ms_verify_authorizer (this=0x2267000, con=0x230da00, peer_type=<value optimized out>,
    protocol=<value optimized out>, authorizer_data=<value optimized out>, authorizer_reply=<value optimized out>,
    isvalid=@0x7f3195a67c0f) at osd/OSD.cc:2723
#15 0x0000000000611ce1 in ms_deliver_verify_authorizer (this=<value optimized out>, con=0x230da00, peer_type=4, protocol=2,
    authorizer=<value optimized out>, authorizer_reply=<value optimized out>, isvalid=@0x7f3195a67c0f) at msg/Messenger.h:145
#16 SimpleMessenger::verify_authorizer (this=<value optimized out>, con=0x230da00, peer_type=4, protocol=2,
    authorizer=<value optimized out>, authorizer_reply=<value optimized out>, isvalid=@0x7f3195a67c0f)
    at msg/SimpleMessenger.cc:2419
#17 0x00000000006309ab in SimpleMessenger::Pipe::accept (this=0x22ce280) at msg/SimpleMessenger.cc:756
#18 0x0000000000634711 in SimpleMessenger::Pipe::reader (this=0x22ce280) at msg/SimpleMessenger.cc:1546
#19 0x00000000004a7085 in SimpleMessenger::Pipe::Reader::entry (this=<value optimized out>) at msg/SimpleMessenger.h:208
#20 0x000000000060f252 in Thread::_entry_func (arg=0x874) at common/Thread.cc:42
#21 0x00007f31a8003971 in start_thread () from /lib/libpthread.so.0
#22 0x00007f31a689392d in clone () from /lib/libc.so.6
#23 0x0000000000000000 in ?? ()

Instead, put these on the heap.  Set them up in the ceph::crypto::init()
method, and tear them down in ceph::crypto::shutdown().
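
This is a minimal sketch of the heap-allocation pattern described above. It uses hypothetical types (CryptoHandlerSketch, crypto_sketch::init/shutdown) rather than the real auth/Crypto.cc classes. A static handler object is destroyed when exit() runs static destructors, even while other threads may still call through it. Calling a virtual method on such a destroyed object typically ends in __cxa_pure_virtual, as in the second backtrace. A heap object reached through a plain pointer is left alone by exit() and is deleted only by an explicit shutdown().

```cpp
#include <cassert>

// Hypothetical handler interface standing in for Ceph's crypto handlers.
struct CryptoHandlerSketch {
  virtual ~CryptoHandlerSketch() = default;
  virtual int decrypt() const = 0;
};

struct AesHandlerSketch : CryptoHandlerSketch {
  int decrypt() const override { return 0; }
};

namespace crypto_sketch {
// A plain pointer has no destructor, so exit() running static
// destructors cannot tear the handler down under a racing thread.
CryptoHandlerSketch* g_aes = nullptr;

// Create the handler explicitly at startup...
void init() {
  if (!g_aes)
    g_aes = new AesHandlerSketch;
}

// ...and destroy it explicitly at shutdown, once no thread uses it.
void shutdown() {
  delete g_aes;
  g_aes = nullptr;
}

CryptoHandlerSketch* get_aes() { return g_aes; }
}  // namespace crypto_sketch
```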

Fixes: #1633
Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Samuel Just [Tue, 8 Nov 2011 18:54:57 +0000 (10:54 -0800)]
PG: cache read-only reference to the current osdmap on pg lock

Previously, we needed to grab an osd_map read lock to send messages,
among other things.  Now, we grab a reference to the osd_map on pg lock
and refer to that.
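
This is a minimal sketch of caching a map reference when the PG lock is taken. It assumes a simplified OSD and PG; OSDSketch, PGSketch and OSDMapStub are all hypothetical. The PG copies a shared_ptr to the current map once, in lock(). Later map updates on the OSD do not change what the PG sees until it is locked again.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical, heavily simplified map type.
struct OSDMapStub {
  unsigned epoch = 0;
};
using OSDMapRef = std::shared_ptr<const OSDMapStub>;

// Hypothetical OSD: owns the latest map, guarded by its own lock.
struct OSDSketch {
  std::mutex map_lock;
  OSDMapRef osdmap = std::make_shared<OSDMapStub>();

  OSDMapRef get_map() {
    std::lock_guard<std::mutex> l(map_lock);
    return osdmap;
  }
  void publish(unsigned e) {
    auto m = std::make_shared<OSDMapStub>();
    m->epoch = e;
    std::lock_guard<std::mutex> l(map_lock);
    osdmap = m;
  }
};

// Hypothetical PG: grabs a map reference once, when its lock is taken,
// and uses that snapshot until unlock() without the OSD's map lock.
struct PGSketch {
  explicit PGSketch(OSDSketch* o) : osd(o) {}
  void lock() {
    pg_lock.lock();
    osdmap_ref = osd->get_map();
  }
  void unlock() {
    osdmap_ref.reset();
    pg_lock.unlock();
  }
  unsigned epoch() const { return osdmap_ref->epoch; }

  OSDSketch* osd;
  std::mutex pg_lock;
  OSDMapRef osdmap_ref;
};
```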

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Tue, 8 Nov 2011 17:45:44 +0000 (09:45 -0800)]
OSDMap,CrushWrapper: const cleanup on OSDMap

The osd's cached maps are not actually modified once cached.  Marking
these methods const (as they should be) allows us to make OSDMapRef a
shared_ptr<const OSDMap>.
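
This small illustration shows why the const cleanup matters. It uses a hypothetical, heavily simplified OSDMapSketch, not the real OSDMap. A shared_ptr<const T> only allows const member functions to be called. Every read accessor must therefore be marked const before cached maps can be handed out as immutable snapshots.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

class OSDMapSketch {
 public:
  explicit OSDMapSketch(std::vector<bool> up) : up_(std::move(up)) {}
  // Read-only accessors are const, so they can be called through a
  // pointer (or shared_ptr) to const.
  bool is_up(int osd) const {
    return osd >= 0 && osd < static_cast<int>(up_.size()) && up_[osd];
  }
  int get_max_osd() const { return static_cast<int>(up_.size()); }

 private:
  std::vector<bool> up_;
};

// With const accessors, cached maps can be shared as immutable snapshots.
using OSDMapRef = std::shared_ptr<const OSDMapSketch>;

int count_up(const OSDMapRef& m) {
  int n = 0;
  for (int i = 0; i < m->get_max_osd(); ++i)
    if (m->is_up(i))  // would not compile if is_up() were non-const
      ++n;
  return n;
}
```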

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Tue, 8 Nov 2011 01:51:21 +0000 (17:51 -0800)]
osd/: change type of osd::osdmap to a shared_ptr

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Wed, 2 Nov 2011 21:32:17 +0000 (14:32 -0700)]
PG: always add backlog entry

Previously, we did not add a backlog entry if the object already had an
entry in the log along with an entry for that entry's prior_version.
However, when scanning the log, an OSD will incorrectly conclude that it
has the prior_version's prior_version if the object is not already in
the missing set.  If there happens to be a clone entry with that version
as its prior_version, the osd will attempt to recover the clone via a
clone operation on the non-existent object.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Stratos Psomadakis [Wed, 9 Nov 2011 22:05:35 +0000 (00:05 +0200)]
rbd: Fix the showmapped cmd usage

If the rbd showmapped cmd is given any extra arguments, rbd will fail
with "assert(0)". Fix it by exiting with "usage_exit()", if any
arguments are present, instead of failing.
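
This is a minimal sketch of the fix's shape. It uses a hypothetical showmapped_args_ok() helper rather than rbd's actual argument parser. Unexpected arguments are reported as a usage error the caller can act on (in rbd, by calling usage_exit()) instead of reaching assert(0).

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical check for a command that takes no arguments: report a
// usage error instead of asserting, so the caller can print usage and exit.
bool showmapped_args_ok(const std::vector<std::string>& extra_args) {
  if (!extra_args.empty()) {
    std::fprintf(stderr, "rbd showmapped takes no arguments\n");
    return false;
  }
  return true;
}
```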

Signed-off-by: Stratos Psomadakis <psomas@grnet.gr>
Noah Watkins [Wed, 9 Nov 2011 02:39:20 +0000 (18:39 -0800)]
hadoop: return all replica hostnames

Updates CephFileSystem to return all replica locations,
and in addition attempts to use reverse DNS to convert
the OSD IPs into hostnames. Hadoop does not match IPs
against hostnames well, and without hostnames locality is lost.
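
The change itself is in the Java CephFileSystem. This is only a C++ sketch of the same lookup using POSIX getnameinfo(), with a hypothetical ip_to_hostname() helper. Without the NI_NAMEREQD flag, getnameinfo() returns the numeric address when no name is found. The helper also falls back to the input on any error.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

// Resolve an IPv4 address string to a hostname via reverse DNS,
// falling back to the numeric form if no name is found.
std::string ip_to_hostname(const std::string& ip) {
  sockaddr_in sa;
  std::memset(&sa, 0, sizeof(sa));
  sa.sin_family = AF_INET;
  if (inet_pton(AF_INET, ip.c_str(), &sa.sin_addr) != 1)
    return ip;  // not a valid IPv4 literal; hand it back unchanged
  char host[NI_MAXHOST];
  // No NI_NAMEREQD: a failed reverse lookup yields the numeric address.
  int r = getnameinfo(reinterpret_cast<sockaddr*>(&sa), sizeof(sa),
                      host, sizeof(host), nullptr, 0, 0);
  return r == 0 ? std::string(host) : ip;
}
```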

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Wed, 9 Nov 2011 02:39:21 +0000 (18:39 -0800)]
hadoop: make listStatus quiet

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Wed, 9 Nov 2011 02:39:19 +0000 (18:39 -0800)]
hadoop: handle new ceph_get_file_stripe_address

Updates the Hadoop JNI/CephFileSystem to handle
the new version of ceph_get_file_stripe_address
which returns the locations of replicas in addition
to the primary.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Wed, 9 Nov 2011 02:39:18 +0000 (18:39 -0800)]
client: return stripe address replicas

Changes ceph_get_file_stripe_address to return a
vector of entity_addr_t's for the primary and the
replicas. libcephfs is updated to return the
associated sockaddr_storage for each address.
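
This is a minimal sketch of returning the primary and replica addresses through sockaddr_storage. It assumes a hypothetical OsdAddrSketch in place of entity_addr_t and a hypothetical fill_stripe_addresses() helper. It does not reproduce the real ceph_get_file_stripe_address() signature or its error codes. The primary is written first, followed by the replicas.

```cpp
#include <cassert>
#include <cstring>
#include <vector>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical stand-in for an OSD address (entity_addr_t in Ceph).
struct OsdAddrSketch {
  sockaddr_in in;
};

// Hypothetical helper: write the primary (index 0) followed by the
// replicas into a caller-supplied sockaddr_storage array. Returns the
// number of addresses written, or -1 if the array is too small.
int fill_stripe_addresses(const std::vector<OsdAddrSketch>& acting,
                          sockaddr_storage* out, int max) {
  if (static_cast<int>(acting.size()) > max)
    return -1;
  for (size_t i = 0; i < acting.size(); ++i) {
    std::memset(&out[i], 0, sizeof(out[i]));
    std::memcpy(&out[i], &acting[i].in, sizeof(acting[i].in));
  }
  return static_cast<int>(acting.size());
}
```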

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Sage Weil [Wed, 9 Nov 2011 21:15:55 +0000 (13:15 -0800)]
client: fix bad perfcounter fset callers

Signed-off-by: Sage Weil <sage@newdream.net>
Alexandre Oliva [Wed, 9 Nov 2011 17:51:26 +0000 (15:51 -0200)]
Improve use of syncfs.

Test the syncfs return value, and fall back to btrfs sync and then to sync.
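
This is a minimal sketch of the fallback order described above. It assumes Linux with glibc 2.14 or later for syncfs(2). BTRFS_IOC_SYNC is the ioctl from linux/btrfs.h. It is spelled out here as _IO(0x94, 8) so the sketch builds without kernel headers. sync_filesystem() is a hypothetical name.

```cpp
#include <cassert>
#include <fcntl.h>      // open(2), used by callers to obtain fd
#include <sys/ioctl.h>
#include <unistd.h>

// Same value as in linux/btrfs.h: _IO(BTRFS_IOCTL_MAGIC, 8), magic 0x94.
#ifndef BTRFS_IOC_SYNC
#define BTRFS_IOC_SYNC _IO(0x94, 8)
#endif

// Flush the filesystem containing fd: try syncfs(2), then the btrfs sync
// ioctl, and finally a global sync(2).
// Returns which method was used: 0 = syncfs, 1 = btrfs ioctl, 2 = sync.
int sync_filesystem(int fd) {
  if (::syncfs(fd) == 0)
    return 0;
  if (::ioctl(fd, BTRFS_IOC_SYNC) == 0)
    return 1;
  ::sync();  // cannot fail, but flushes every filesystem
  return 2;
}
```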

Signed-off-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
Sage Weil [Wed, 9 Nov 2011 18:46:18 +0000 (10:46 -0800)]
osd: fix perfcounter typo

Signed-off-by: Sage Weil <sage@newdream.net>
Greg Farnum [Wed, 9 Nov 2011 19:43:21 +0000 (11:43 -0800)]
os: rename and make use of the split_threshold parameter.

This was accidentally left out of the must_split calculation. Put it
in, and rename it to split_multiplier (as that is a much better name
for how it's used).

In the default case this won't actually change behavior, but it
makes the behavior configurable as it's supposed to be.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 9 Nov 2011 19:03:48 +0000 (11:03 -0800)]
perfcounters: fix users of fset on averages

I forgot to audit these before merging the assert and they popped up
in teuthology and stuff. :(

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Wed, 9 Nov 2011 06:11:04 +0000 (22:11 -0800)]
Merge branch 'wip-mon'

Sage Weil [Wed, 9 Nov 2011 05:58:11 +0000 (21:58 -0800)]
osd: don't open deleted map from generate_past_intervals

The first get_map() call needs to be avoided when stop < last_epoch.  This
fixes a crash like

2011-11-08 21:51:09.046739 7fcf6e035700 osd.0 5 pg[1.1p0( empty n=0 ec=1 les/c 3/3 2/2/2) [0,1] r=0 mlcod 0'0 !hml peering] enter Started/Primary/Peering/GetInfo
2011-11-08 21:51:09.046767 7fcf6e035700 osd.0 5 pg[1.1p0( empty n=0 ec=1 les/c 3/3 2/2/2) [0,1] r=0 mlcod 0'0 !hml peering] generate_past_intervals over epochs 4-1
2011-11-08 21:51:09.046796 7fcf6e035700 osd.0 5 get_map 1 - loading and decoding 0x183b000
*** Caught signal (Aborted) **
 in thread 7fcf6e035700
 ceph version 0.37-327-g1bc1a24 (commit:1bc1a244dbf7342662322d002d0c9d41ad5fee6f)

...

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 9 Nov 2011 05:13:07 +0000 (21:13 -0800)]
automake: enable 'make V=0'

Enables silent mode for automake-generated Makefiles;
silent mode is _off_ by default. With V=0 the output
is much easier to read when looking for warnings:

nwatkins@piha:~/Projects/ceph/ceph$ make -j8 V=0
make[3]: Entering directory `/users/nwatkins/Projects/ceph/ceph/src'
  CC     locks.o
  CXX    journal.o
  CXX    Server.o
  CXX    Mutation.o
  CXX    MDCache.o
  CXX    Locker.o
  CXX    Migrator.o
  CXX    MDBalancer.o
  CXX    CDentry.o
  CXX    CDir.o
  CXX    CInode.o
  CXX    LogEvent.o
  CXX    MDSTable.o

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Sage Weil [Wed, 9 Nov 2011 00:24:01 +0000 (16:24 -0800)]
mon: handle active -> electing transition properly

If we are already active, make sure we reset things properly before going
into an election.

Signed-off-by: Sage Weil <sage@newdream.net>
Yehuda Sadeh [Tue, 8 Nov 2011 22:02:47 +0000 (14:02 -0800)]
rgw: don't return partial content response with bad header

Yehuda Sadeh [Tue, 8 Nov 2011 21:42:55 +0000 (13:42 -0800)]
rgw: swift bucket report returns both bytes size and actual size

Yehuda Sadeh [Tue, 8 Nov 2011 21:42:32 +0000 (13:42 -0800)]
rgw: abort early on incorrect method

Sage Weil [Tue, 8 Nov 2011 21:09:13 +0000 (13:09 -0800)]
paxos: fix race between active and commit

If paxos reproposes an old learned value, we have a C_Active waiter, and
also a commit in progress.

When we reach quorum, paxos goes active, and _active() creates a new
pending.  A bit later, the _commit callback fires, and we already have
that pending value ready.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 8 Nov 2011 20:56:20 +0000 (12:56 -0800)]
mon: add 'quorum_status' command

Show status of the current quorum.  Block until there is one.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 8 Nov 2011 20:52:51 +0000 (12:52 -0800)]
mon: do not participate in the election unless we are in electing state

If we participate, we may be included in the quorum, even though we are
probing, slurping, or in some other non-electing state.
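
This is a minimal sketch of the rule. It uses a hypothetical state enum limited to the states named in this log. A monitor acknowledges an election proposal only while it is itself electing.

```cpp
#include <cassert>

// Hypothetical subset of monitor states named in this log.
enum class MonStateSketch { Probing, Slurping, Electing };

// Only a monitor that is itself electing acknowledges an election
// proposal; one still probing or slurping ignores it, so it cannot
// end up in the resulting quorum prematurely.
bool should_ack_proposal(MonStateSketch s) {
  return s == MonStateSketch::Electing;
}
```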

Signed-off-by: Sage Weil <sage@newdream.net>
Greg Farnum [Tue, 8 Nov 2011 19:50:42 +0000 (11:50 -0800)]
rgw: guard perfcounter accesses in rgw_cache.

This gets called by radosgw-admin, so it needs to handle
perfcounter being a null pointer.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Tue, 8 Nov 2011 19:28:02 +0000 (11:28 -0800)]
rgw: initialize all the perfcounters, in order

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Tue, 8 Nov 2011 18:42:30 +0000 (10:42 -0800)]
ReplicatedPG: use finc, not fset, on average counters

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Tue, 8 Nov 2011 18:42:20 +0000 (10:42 -0800)]
mon: 'mon_status' command to dump individual mon state

Signed-off-by: Sage Weil <sage@newdream.net>
Greg Farnum [Tue, 8 Nov 2011 18:04:59 +0000 (10:04 -0800)]
rgw: use l_rgw_qactive perfcounter

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Tue, 8 Nov 2011 17:58:12 +0000 (09:58 -0800)]
mon: add probe+slurp timeouts

A short timeout on probe, so we can form new quorums quickly.

A longer timeout on slurp, so we will tolerate a slow response sucking
data off a loaded monitor.

Signed-off-by: Sage Weil <sage@newdream.net>
Greg Farnum [Tue, 8 Nov 2011 17:49:22 +0000 (09:49 -0800)]
rgw: implement perfcounters

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 26 Oct 2011 23:05:45 +0000 (16:05 -0700)]
perfcounter: add some minimal documentation.

The data model is a bit obtuse if you're just looking at the code.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Wed, 26 Oct 2011 23:04:45 +0000 (16:04 -0700)]
perfcounter: assert when you try and set an average.

If you're trying to set an average, you're probably doing it wrong.
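
This is a minimal sketch of the data model these perfcounter commits enforce. It uses a hypothetical AvgCounterSketch rather than the real PerfCounters class. An average keeps a running sum and a sample count. finc() adds a sample, and fset() is treated as a bug.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical average counter: a running sum plus a sample count, so
// "setting" it to a single value has no sensible meaning.
class AvgCounterSketch {
 public:
  // Record one sample (the finc-style operation).
  void finc(double v) {
    sum_ += v;
    ++count_;
  }
  // Setting an average is almost certainly a caller bug; fail loudly.
  void fset(double) { assert(!"fset on an average counter"); }
  double avg() const { return count_ ? sum_ / count_ : 0.0; }
  uint64_t count() const { return count_; }

 private:
  double sum_ = 0.0;
  uint64_t count_ = 0;
};
```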

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Tue, 11 Oct 2011 21:00:42 +0000 (14:00 -0700)]
rgw: create and tear down a radosgw perfcounter

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Sage Weil [Tue, 8 Nov 2011 06:46:09 +0000 (22:46 -0800)]
mon: slurp latest state from active monitors before joining quorum

If a monitor has been down and is behind, and joins the quorum, the
other nodes will try to send it all of the needed state, which can
bring the cluster to a halt.

Instead, implement a new bootstrap() procedure:

 - probe the cluster nodes
 - if there is an existing quorum,
   - and it is not too far ahead of me, join it (call an election)
   - otherwise, slurp down all the newer state and then restart (bootstrap)
 - if we see enough online nodes that are not part of the quorum, call
   an election.

We still need to add some timeouts.
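
This sketch walks through one pass of the decision steps listed above. It assumes a hypothetical ProbeResult summary of the probe replies. The max_lag and majority thresholds are assumptions: the commit does not say how far ahead counts as "too far" or how many nodes are "enough".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of the replies gathered while probing.
struct ProbeResult {
  bool quorum_exists;         // some monitors already form a quorum
  uint64_t quorum_version;    // latest version that quorum has committed
  int online_outside_quorum;  // reachable monitors (including us) not in it
};

enum class BootstrapAction { CallElection, Slurp, KeepProbing };

BootstrapAction decide(const ProbeResult& p, uint64_t my_version,
                       uint64_t max_lag, int majority) {
  if (p.quorum_exists) {
    // Close enough: join the quorum by calling an election.
    if (p.quorum_version <= my_version + max_lag)
      return BootstrapAction::CallElection;
    // Too far behind: slurp the newer state, then bootstrap again.
    return BootstrapAction::Slurp;
  }
  // No quorum, but enough nodes are up to form one.
  if (p.online_outside_quorum >= majority)
    return BootstrapAction::CallElection;
  return BootstrapAction::KeepProbing;
}
```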

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Tue, 8 Nov 2011 06:40:39 +0000 (22:40 -0800)]
mon: fix osdmap trim

We can raise the floor even when min_last_epoch_clean is very close to
the current version, as long as it is still above the oldest.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Tue, 8 Nov 2011 06:09:56 +0000 (22:09 -0800)]
paxos: last_consumed == latest_stashed; behave accordingly

Initialize on startup.
Don't re-read off of disk on every trim_to() call.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Tue, 8 Nov 2011 04:52:50 +0000 (20:52 -0800)]
monmap: simplify constructor

Explicitly set created, last_changed where appropriate.

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Sage Weil [Wed, 2 Nov 2011 16:07:10 +0000 (09:07 -0700)]
mon: revamp monitor states

starting -> probing, electing
some cleanup

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 2 Nov 2011 02:45:17 +0000 (19:45 -0700)]
mon: rename election_starting -> restart

These callbacks reset monitor/paxos/paxosservice state, which used to
happen when an election started, but will no longer necessarily be
immediately followed by an election.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Wed, 2 Nov 2011 02:32:41 +0000 (19:32 -0700)]
mon: don't call out to mon->call_election for internal election restarts

This lets us drop the is_new kludge.

Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 8 Nov 2011 04:40:18 +0000 (20:40 -0800)]
rgw: fix warning

Signed-off-by: Sage Weil <sage.weil@dreamhost.com>
Yehuda Sadeh [Tue, 8 Nov 2011 01:07:40 +0000 (17:07 -0800)]
rgw: fix accept-range for suffix format, other related issues

Samuel Just [Mon, 7 Nov 2011 23:04:45 +0000 (15:04 -0800)]
Timer.cc: remove global thread variable

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Samuel Just [Mon, 7 Nov 2011 23:04:02 +0000 (15:04 -0800)]
common: return null if mc.init() unsuccessful

Prevents ceph.cc from segfaulting on missing keyring.

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Josh Durgin [Mon, 7 Nov 2011 17:27:55 +0000 (09:27 -0800)]
rbd: add showmapped to clitests and rst man page

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Stratos Psomadakis [Mon, 7 Nov 2011 09:24:35 +0000 (11:24 +0200)]
rbd: Document the rbd showmapped cmd

Document the rbd showmapped cmd in rbd.usage(), and rbd's man page,
and add it to the bash completion script.

Signed-off-by: Stratos Psomadakis <psomas@grnet.gr>
Josh Durgin [Mon, 7 Nov 2011 17:08:00 +0000 (09:08 -0800)]
rbd.py: fix list when there are no images

It should return [], not [''].
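
rbd.py is Python, so this C++ sketch only illustrates the bug class. It assumes image names arrive as NUL-terminated strings packed into one buffer; split_names() is hypothetical. Naively splitting an empty buffer on the separator yields one empty name. The empty case therefore has to produce an empty list.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a buffer of NUL-terminated names ("a\0bc\0") into a list.
// An empty buffer yields an empty list, not a list with one "" entry.
std::vector<std::string> split_names(const std::string& buf) {
  std::vector<std::string> names;
  size_t start = 0;
  while (start < buf.size()) {
    size_t end = buf.find('\0', start);
    if (end == std::string::npos)
      end = buf.size();
    names.push_back(buf.substr(start, end - start));
    start = end + 1;
  }
  return names;
}
```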

Reported-by: Eric Chen <Eric_YH_Chen@wistron.com>
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Samuel Just [Sat, 5 Nov 2011 00:36:21 +0000 (17:36 -0700)]
OSD: write_info/log before dropping lock in generate_backlog

Bug #1530

This should fix the following race:
1) osd->generate_backlog does pg->assemble_backlog
2) osd->generate_backlog drops the pg lock to grab the osd_map read lock
3) ...which is held by osd->handle_osd_map
4) at the end of osd->handle_osd_map, we call write_info on the pg since
it has progressed to a new peering interval
5) osd->generate_backlog gets the read_lock and the pg lock and promptly
bails since the backlog generation has been cancelled
6) osd dies, but not before the write_info transaction is durable

The result of this is that the in-memory backlog generated in
assemble_backlog doesn't make it to disk, but the updated info does,
resulting in an ondisklog that is inconsistent with the pg info on osd
restart.

This should prevent the info from being written without the log.
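
This is a minimal sketch of the ordering the fix establishes. It uses hypothetical StoreSketch and PGBacklogSketch types. The assembled backlog and the info are queued as one transaction while the PG lock is still held. Only then is the lock dropped to wait for other locks.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical store: each apply() is one atomic transaction that
// persists the PG info and log together.
struct StoreSketch {
  std::vector<std::pair<std::string, std::string>> txns;  // (info, log)
  void apply(const std::string& info, const std::string& log) {
    txns.emplace_back(info, log);
  }
};

// Hypothetical PG: the backlog is written out with the info while the
// PG lock is still held, so the info is never durable without the log.
struct PGBacklogSketch {
  std::mutex lock;
  std::string info;
  std::string log;

  void generate_backlog(StoreSketch& store, const std::string& backlog) {
    std::unique_lock<std::mutex> l(lock);
    log += backlog;          // assemble the backlog in memory
    store.apply(info, log);  // write info + log before unlocking
    l.unlock();              // only now go wait on other locks
  }
};
```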

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Christoph Hellwig [Thu, 3 Nov 2011 21:45:08 +0000 (17:45 -0400)]
FileJournal: stop using sync_file_range

Using sync_file_range means that neither is any required metadata committed,
nor is the disk cache flushed.  Stop using it for the journal, and add
a comment on why a fsync_range system call would be helpful here.

Btw, why does the code use O_SYNC (and not even O_DSYNC!) if using direct
I/O, but fdatasync/fsync for buffered I/O?  Avoiding cache flushes and
metadata updates for every write is just as important for direct I/O
as it is for buffered I/O.
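
This is a minimal sketch of a durable journal append using fdatasync(2). It assumes a plain file descriptor; write_durably() is hypothetical and is not FileJournal's code. Per its man page, sync_file_range(2) writes out neither file metadata nor the disk's volatile write cache. fdatasync(), by contrast, also commits the metadata needed to read the data back.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Append a journal entry and make it durable. fdatasync() commits the
// data plus the metadata needed to retrieve it (and, on Linux filesystems
// with write barriers enabled, flushes the drive's cache).
bool write_durably(int fd, const std::string& entry) {
  size_t off = 0;
  while (off < entry.size()) {
    ssize_t r = ::write(fd, entry.data() + off, entry.size() - off);
    if (r < 0) {
      if (errno == EINTR)
        continue;  // interrupted before writing anything; retry
      return false;
    }
    off += static_cast<size_t>(r);
  }
  return ::fdatasync(fd) == 0;
}
```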

Signed-off-by: Christoph Hellwig <hch@lst.de>
Sage Weil [Wed, 2 Nov 2011 18:06:59 +0000 (11:06 -0700)]
monclient: simplify auth_supported set

Use AuthSupported class instead of repopulating it ourselves.

Signed-off-by: Sage Weil <sage@newdream.net>
Greg Farnum [Sat, 5 Nov 2011 00:19:34 +0000 (17:19 -0700)]
Makefile: use static add for test_libcephfs_readdir.

Otherwise it doesn't seem to play nicely with teuthology/sepia
due to requiring the host to have gtest installed.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Samuel Just [Fri, 4 Nov 2011 21:55:36 +0000 (14:55 -0700)]
RadosModel: add DeleteOp to test object deletions

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
Yehuda Sadeh [Fri, 4 Nov 2011 20:40:07 +0000 (13:40 -0700)]
rgw: fix tmp objects leakage

Yehuda Sadeh [Fri, 4 Nov 2011 20:11:19 +0000 (13:11 -0700)]
rgw: don't purge pools in any case

Yehuda Sadeh [Fri, 4 Nov 2011 18:41:43 +0000 (11:41 -0700)]
rgw: list system buckets through rados api

Josh Durgin [Thu, 3 Nov 2011 22:45:43 +0000 (15:45 -0700)]
rbd: document --order and list required args where they're necessary

Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Yehuda Sadeh [Thu, 3 Nov 2011 23:00:37 +0000 (16:00 -0700)]
rgw: fix PUT without content length (non chunked)

Greg Farnum [Thu, 3 Nov 2011 21:43:50 +0000 (14:43 -0700)]
Merge remote branch 'nwatkins/for-master'

Greg Farnum [Thu, 3 Nov 2011 21:11:21 +0000 (14:11 -0700)]
Merge branch 'wip-getdir'

Greg Farnum [Thu, 3 Nov 2011 20:59:25 +0000 (13:59 -0700)]
gitignore: just ignore all test_ files

We don't want to add a new ignore for each test!

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 3 Nov 2011 20:55:53 +0000 (13:55 -0700)]
qa: workunit to run test_libcephfs_readdir

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Greg Farnum [Thu, 3 Nov 2011 20:49:56 +0000 (13:49 -0700)]
test: write a test to try and check on Client::readdir_r_cb.

It's made difficult by having to go through libcephfs, but it's better
than nothing and should catch most of the errors which were detected
while using it in Hadoop.

Signed-off-by: Greg Farnum <gregory.farnum@dreamhost.com>
Yehuda Sadeh [Thu, 3 Nov 2011 18:45:44 +0000 (11:45 -0700)]
rgw: fix null deref, cleanups

Yehuda Sadeh [Thu, 3 Nov 2011 18:24:57 +0000 (11:24 -0700)]
rgw: add support for chunked upload

Yehuda Sadeh [Wed, 2 Nov 2011 19:40:51 +0000 (12:40 -0700)]
rgw: fix crash when accessing swift auth without user

Noah Watkins [Wed, 2 Nov 2011 22:49:09 +0000 (15:49 -0700)]
hadoop: remove unused fs_default_name

The variable fs_default_name is effectively unused
and the same effect is achieved by treating paths
in a standard way (they contain the scheme:/).

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Wed, 2 Nov 2011 22:25:31 +0000 (15:25 -0700)]
hadoop: FileSystem.rename should not return FileNotFound

This fixes several unit test failure cases.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Wed, 2 Nov 2011 22:24:54 +0000 (15:24 -0700)]
hadoop: ENOTDIR should be negative

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Wed, 2 Nov 2011 20:43:37 +0000 (13:43 -0700)]
hadoop: fix unit test: testWorkingDirectory

The working directory should be set in initialize() and
is expected by the unit tests to be fully qualified (i.e.
with ceph://...).

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Wed, 2 Nov 2011 19:28:03 +0000 (12:28 -0700)]
hadoop: remove deprecation warning

The routine cannot be fully removed yet because it
still exists as an abstract function in the FileSystem class.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Wed, 2 Nov 2011 19:25:15 +0000 (12:25 -0700)]
hadoop: remove deprecated isDirectory()

Uses the suggested getFileStatus() method to
replace the deprecated isDirectory(). This is
only marginally slower: the extra cost is a call to
get_replication to fill in the FileStatus. If performance
ever becomes an issue for the paths that use isDirectory(),
getFileStatus can be made faster by pushing more work
down into JNI.
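A sketch of the replacement pattern, `isDirectory(path)` becoming `getFileStatus(path).isDir()`, using hypothetical stand-ins rather than Hadoop's real classes. The counter models the extra get_replication work mentioned above, and the replication value 3 is a placeholder.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: derive isDirectory() from a full status lookup.
public class IsDirectoryDemo {
    // Minimal stand-in for Hadoop's FileStatus.
    static final class FileStatus {
        final boolean dir;
        final short replication;
        FileStatus(boolean dir, short replication) { this.dir = dir; this.replication = replication; }
        boolean isDir() { return dir; }
    }

    // Stand-in namespace: path -> is-directory flag.
    private final Map<String, Boolean> namespace = new HashMap<>();
    int replicationLookups = 0; // counts the extra work per status call

    void add(String path, boolean dir) { namespace.put(path, dir); }

    // Stand-in for get_replication(); 3 is a placeholder value.
    private short getReplication(String path) {
        replicationLookups++;
        return 3;
    }

    FileStatus getFileStatus(String path) throws FileNotFoundException {
        Boolean dir = namespace.get(path);
        if (dir == null) throw new FileNotFoundException(path);
        return new FileStatus(dir, getReplication(path));
    }

    // Replacement for the deprecated isDirectory().
    boolean isDirectory(String path) {
        try {
            return getFileStatus(path).isDir();
        } catch (FileNotFoundException e) {
            return false; // a missing path is not a directory
        }
    }

    public static void main(String[] args) {
        IsDirectoryDemo fs = new IsDirectoryDemo();
        fs.add("/user/alice", true);
        System.out.println(fs.isDirectory("/user/alice")); // true
    }
}
```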

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
13 years agohadoop: remove statistics initialization
Noah Watkins [Wed, 2 Nov 2011 19:04:52 +0000 (12:04 -0700)]
hadoop: remove statistics initialization

This is already handled by super.initialize().

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
13 years agohadoop: remove unused variable
Noah Watkins [Wed, 2 Nov 2011 19:01:15 +0000 (12:01 -0700)]
hadoop: remove unused variable

Remove CephFileSystem.debug as log4j is now
used for debug level control.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
13 years agohadoop: remove initialization check
Noah Watkins [Wed, 2 Nov 2011 18:58:43 +0000 (11:58 -0700)]
hadoop: remove initialization check

The initialization check is removed because Hadoop's
treatment of file systems guarantees that initialize()
is called before any other file system routine. This
makes the code cleaner, but a future version of
libcephfs-java should still perform its own internal
initialization checks.
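A sketch of the kind of internal check the commit anticipates for a future libcephfs-java: each call verifies that the mount was set up first and fails fast otherwise. All class and method names are hypothetical; this is not the libcephfs-java API.

```java
// Hypothetical sketch of an internal initialization guard in a binding layer.
public class MountGuardDemo {
    private boolean mounted = false;

    void mount() {
        mounted = true; // stand-in for the real mount/initialize step
    }

    // Every entry point calls this before touching native state.
    private void checkMounted() {
        if (!mounted) {
            throw new IllegalStateException("not mounted");
        }
    }

    // Stand-in for a native call such as mkdir; returns 0 on success.
    int mkdir(String path) {
        checkMounted();
        return 0;
    }

    public static void main(String[] args) {
        MountGuardDemo m = new MountGuardDemo();
        try {
            m.mkdir("/user/alice");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        m.mount();
        System.out.println(m.mkdir("/user/alice"));
    }
}
```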

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
13 years agohadoop: simplify workingDir handling; add home directory
Noah Watkins [Wed, 2 Nov 2011 04:52:48 +0000 (21:52 -0700)]
hadoop: simplify workingDir handling; add home directory

1. Simplifies the handling of paths by allowing them to be passed
around and manipulated in their fully qualified form. Before
paths are passed into native Ceph calls, the path-only portion
is extracted.

2. Sets the initial working directory to be the default home
directory for a user (e.g. /user/<username>/).
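A minimal sketch of both points, assuming `java.net.URI` as a stand-in for Hadoop's `Path`: qualified paths keep their scheme and authority until the native boundary, where only the path component is passed on, and the initial working directory follows the `/user/<username>` convention. Helper names are hypothetical, and `System.getProperty("user.name")` is an assumed source for the current user.

```java
import java.net.URI;

// Hypothetical sketch of qualified-path handling and the default home directory.
public class PathHandlingDemo {
    // Strip scheme and authority just before calling into native Ceph code.
    static String pathOnly(URI qualified) {
        return qualified.getPath();
    }

    // Default home directory for a user: /user/<username>.
    static String homeDirectory(String user) {
        return "/user/" + user;
    }

    public static void main(String[] args) {
        URI p = URI.create("ceph://mon.example:6789/user/alice/part-00000");
        System.out.println(pathOnly(p)); // /user/alice/part-00000
        System.out.println(homeDirectory(System.getProperty("user.name")));
    }
}
```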

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
13 years agohadoop: emulate Ceph file owner as current user
Noah Watkins [Wed, 2 Nov 2011 00:25:49 +0000 (17:25 -0700)]
hadoop: emulate Ceph file owner as current user

Make CephFileSystem tell Hadoop that the owner
of all files is the current user. This provides
zero security or isolation, but allows Hadoop
to be used with its default security settings.
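A minimal sketch of the emulation, assuming the status object simply ignores any stored owner and reports the current user. `FileStatus` here is a stand-in, not Hadoop's class, and `System.getProperty("user.name")` is an assumed source for the current user.

```java
// Hypothetical sketch: every file reports the current user as its owner.
// This lets Hadoop run with its default security settings but provides no isolation.
public class OwnerEmulationDemo {
    // Minimal stand-in for Hadoop's FileStatus.
    static final class FileStatus {
        final String path;
        final String owner;
        FileStatus(String path, String owner) { this.path = path; this.owner = owner; }
    }

    static FileStatus getFileStatus(String path) {
        // Substitute the current user regardless of what Ceph stores.
        return new FileStatus(path, System.getProperty("user.name"));
    }

    public static void main(String[] args) {
        FileStatus st = getFileStatus("/user/alice/data");
        System.out.println(st.path + " owned by " + st.owner);
    }
}
```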

A future solution will need to be developed that
provides some isolation, and gives a better user
experience.

This fixes tracker issue #1663.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
13 years agohadoop: use standard log4j logging facility
Noah Watkins [Tue, 1 Nov 2011 23:35:12 +0000 (16:35 -0700)]
hadoop: use standard log4j logging facility

Replace ceph.debug(msg, level) with LOG.level(msg)
provided by the log4j facility used by Hadoop. The
level can now be provided on a class-by-class basis
by modifying conf/log4j.properties.
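A dependency-free analogue of per-class log levels. It swaps in `java.util.logging` for log4j so the sketch runs without extra jars; with log4j the same levels would come from `conf/log4j.properties`. The logger names are illustrative.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Analogue using java.util.logging (not log4j): one named logger per class,
// each with its own level, which is the control log4j.properties provides.
public class PerClassLoggingDemo {
    static final Logger FS_LOG = Logger.getLogger("CephFileSystem");
    static final Logger IO_LOG = Logger.getLogger("CephInputStream");

    static {
        FS_LOG.setLevel(Level.FINE);    // verbose for one class
        IO_LOG.setLevel(Level.WARNING); // quiet for another
    }

    public static void main(String[] args) {
        System.out.println(FS_LOG.isLoggable(Level.FINE)); // true
        System.out.println(IO_LOG.isLoggable(Level.FINE)); // false
    }
}
```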

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
13 years agoPG: mark scrubmap entry as not absent when we see an update
Samuel Just [Wed, 2 Nov 2011 18:50:29 +0000 (11:50 -0700)]
PG: mark scrubmap entry as not absent when we see an update

Previously, there would be an assert failure in _scan_list if we see an
object deleted and then recreated.
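A simplified model of the fix. The scrub map keeps one entry per object with an "absent" flag that is set when a delete is seen. If the object is recreated, the update must clear that flag, or later processing will see a live object still marked absent. The names are illustrative and are not Ceph's C++ ScrubMap code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: an update clears a stale "absent" flag left by a delete.
public class ScrubMapDemo {
    static final class Entry {
        boolean absent;
        long size;
    }

    final Map<String, Entry> objects = new HashMap<>();

    void onDelete(String oid) {
        objects.computeIfAbsent(oid, k -> new Entry()).absent = true;
    }

    void onUpdate(String oid, long size) {
        Entry e = objects.computeIfAbsent(oid, k -> new Entry());
        e.size = size;
        e.absent = false; // the fix: a write means the object exists again
    }

    public static void main(String[] args) {
        ScrubMapDemo m = new ScrubMapDemo();
        m.onDelete("obj");
        m.onUpdate("obj", 4096); // recreated after deletion
        System.out.println(m.objects.get("obj").absent); // false
    }
}
```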

Signed-off-by: Samuel Just <samuel.just@dreamhost.com>
13 years agoMerge branch 'wip-freebsd'
Sage Weil [Wed, 2 Nov 2011 15:45:36 +0000 (08:45 -0700)]
Merge branch 'wip-freebsd'

Conflicts:
src/osd/OSD.cc

13 years agodebian: empty dependency_libs in *.la files
Laszlo Boszormenyi [Tue, 1 Nov 2011 19:57:11 +0000 (12:57 -0700)]
debian: empty dependency_libs in *.la files

Per policy and multiarch support.

Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu>
13 years agoadd missingok to logrotate
Laszlo Boszormenyi [Tue, 1 Nov 2011 19:56:34 +0000 (12:56 -0700)]
add missingok to logrotate

When ceph is not running, it has no logs. Thus logrotate has nothing to
rotate. The missingok directive handles this situation.

Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu>
13 years agodebian: update VCS sources
Laszlo Boszormenyi [Tue, 1 Nov 2011 19:55:47 +0000 (12:55 -0700)]
debian: update VCS sources

Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu>
13 years agodebian: fix libceph1 -> libcephfs1 rename
Laszlo Boszormenyi [Tue, 1 Nov 2011 19:55:17 +0000 (12:55 -0700)]
debian: fix libceph1 -> libcephfs1 rename

Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu>
13 years agodebian: add watch
Laszlo Boszormenyi [Tue, 1 Nov 2011 19:54:27 +0000 (12:54 -0700)]
debian: add watch

Signed-off-by: Laszlo Boszormenyi <gcs@debian.hu>
13 years agoosdmaptool: test --create-with-conf with racks
Sage Weil [Wed, 2 Nov 2011 04:20:56 +0000 (21:20 -0700)]
osdmaptool: test --create-with-conf with racks

Make sure we generate a map that can actually map (rather than
asserting on a bad max_osd/max_device mismatch).

Signed-off-by: Sage Weil <sage@newdream.net>
13 years agoosdmap: assert that osdmap max_osds >= crushmap max_devices
Sage Weil [Wed, 2 Nov 2011 04:14:19 +0000 (21:14 -0700)]
osdmap: assert that osdmap max_osds >= crushmap max_devices

This will catch potential array overruns before they happen.
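A sketch of why the check catches overruns early. It assumes per-OSD state is stored in arrays sized by the OSD map's max_osd, while CRUSH may return any device id below its own max_devices. If max_devices exceeded max_osd, a CRUSH result could index past the end of those arrays. The method name is hypothetical, and an exception stands in for Ceph's assert.

```java
// Hypothetical sketch: validate sizes up front instead of overrunning later.
public class MaxOsdCheckDemo {
    static void checkMaxOsd(int maxOsd, int crushMaxDevices) {
        if (maxOsd < crushMaxDevices) {
            throw new IllegalStateException(
                "osdmap max_osd " + maxOsd + " < crush max_devices " + crushMaxDevices);
        }
    }

    public static void main(String[] args) {
        checkMaxOsd(10, 10); // fine: every CRUSH device id fits in the arrays
        try {
            checkMaxOsd(8, 10); // device ids 8 and 9 would overrun 8-entry arrays
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```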

Signed-off-by: Sage Weil <sage@newdream.net>