git.apps.os.sepia.ceph.com Git

osd: pool cleanups

missed this before:

- no need to initialize in create_pending(), constructor does that
- int32_t, not int
- pool_max while we're at it
- initialize pool_max in OSDMap constructor

Sage Weil [Tue, 16 Feb 2010 22:33:02 +0000 (14:33 -0800)]

todo

Sage Weil [Tue, 16 Feb 2010 22:32:43 +0000 (14:32 -0800)]

mds: ignore session RENEWCAPS if state not open|stale

This avoids breakage where a renewcaps races with a session
being purged, for example.

Greg Farnum [Tue, 16 Feb 2010 22:15:12 +0000 (14:15 -0800)]

osdmap/mon: Be more defensive about highest_pool_num usage

Greg Farnum [Tue, 16 Feb 2010 20:39:46 +0000 (12:39 -0800)]

rados tool: mkpool/rmpool commands now available

Greg Farnum [Tue, 16 Feb 2010 17:22:32 +0000 (09:22 -0800)]

mon: can now delete pools via 'ceph osd pool delete foo'

Greg Farnum [Fri, 12 Feb 2010 22:54:56 +0000 (14:54 -0800)]

rgw: actually delete pools when using rados!

Greg Farnum [Fri, 12 Feb 2010 22:54:37 +0000 (14:54 -0800)]

rados/objecter: can now delete pools!

Greg Farnum [Fri, 12 Feb 2010 22:25:57 +0000 (14:25 -0800)]

mon/msg: MPoolOp can carry POOL_OP_DELETE; OSDMon puts pool in the incremental's old_pools

Greg Farnum [Fri, 12 Feb 2010 22:12:22 +0000 (14:12 -0800)]

librados: init PoolCtx properly -- was always setting snap_seq to CEPH_NOSNAP

Greg Farnum [Fri, 12 Feb 2010 21:21:22 +0000 (13:21 -0800)]

osd: Deal with pools being removed from OSDMap.

This potentially has issues, since pools are not removed from the map
until after all the PGs are removed (which is threaded, not inline with
map delivery). But Sage thinks it's okay and the system keeps working
even if you delete a pool while benchmarking on it with rados.

Greg Farnum [Fri, 12 Feb 2010 00:57:23 +0000 (16:57 -0800)]

OSDMap: get_pg_pool now returns a pointer

This lets us return NULL if the pool isn't in the map, which is
needed functionality for pool deletion. Meanwhile, code which
expects the pool to exist will continue to cause a crash if it doesn't.
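
A minimal sketch of the pointer-returning lookup described in this commit message. `PoolInfo` and `MiniOSDMap` are hypothetical stand-ins rather than the real Ceph types, and `nullptr` is used where the 2010 code would have used NULL.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for per-pool metadata (not the real Ceph type).
struct PoolInfo {
  int32_t size = 2;
};

// Hypothetical, simplified map of pool id -> pool metadata.
class MiniOSDMap {
  std::map<int32_t, PoolInfo> pools;

public:
  void add_pool(int32_t id) { pools[id] = PoolInfo(); }
  void remove_pool(int32_t id) { pools.erase(id); }

  // Returns a pointer so that a missing pool can be reported as nullptr
  // instead of being an error at lookup time.
  const PoolInfo *get_pg_pool(int32_t id) const {
    auto it = pools.find(id);
    return it == pools.end() ? nullptr : &it->second;
  }
};
```

Callers that can tolerate a deleted pool check the result; callers that dereference it unconditionally still crash, which matches the behavior the message describes.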

Greg Farnum [Tue, 16 Feb 2010 17:21:32 +0000 (09:21 -0800)]

rados: fix seg fault on cleanup of a failed pool open

Sage Weil [Mon, 15 Feb 2010 21:47:41 +0000 (13:47 -0800)]

mds: infer 'follows' in journal_dirty_inode on non-head inodes

There are lots of callers to journal_dirty_inode that may
unwittingly be dealing with a non-head inode (e.g.
check_file_max). If the provided inode is snapped, infer an
appropriate follows value so as not to cow_inode() again.

Sage Weil [Mon, 15 Feb 2010 21:27:01 +0000 (13:27 -0800)]

mds: clear cap->issued on flushsnap

This allows _do_cap_update to clear out the client_range.

Kill (now) unused/unnecessary 'wanted' arg to _do_cap_update.

Also delay cap removal until after _do_cap_update (which takes
a Capability*). This probably needs further cleanup.

Sage Weil [Mon, 15 Feb 2010 19:40:20 +0000 (11:40 -0800)]

mds: don't croak on null dentries in cache during reconnect/rejoin

They're created when we replay unlink events from the log.

Yehuda Sadeh [Fri, 12 Feb 2010 22:32:11 +0000 (14:32 -0800)]

objectcacher: use trimtrunc read/write ops

Yehuda Sadeh [Fri, 12 Feb 2010 22:23:57 +0000 (14:23 -0800)]

osdc: clean up some mess

Yehuda Sadeh [Fri, 12 Feb 2010 22:05:42 +0000 (14:05 -0800)]

objecter: add read_trunc, write_trunc

Sage Weil [Fri, 12 Feb 2010 22:54:01 +0000 (14:54 -0800)]

mkmonfs: rm -rf, so that we kill 0600 admin_keyring.bin

Sage Weil [Fri, 12 Feb 2010 22:45:02 +0000 (14:45 -0800)]

osd: fix recovery requeue race

If a recovery op finished right as another recovery op was
being started, we could get into start_recovery_ops() and get
max = 0 and not start anything. Since the PG wasn't being
requeued for later, it would never recover. So, requeue if we
race and get max == 0.
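
The requeue-on-zero idea can be sketched as follows. `RecoveryQueue` and its fields are hypothetical and much simpler than the real OSD code; only the name `start_recovery_ops()` comes from the message above.

```cpp
#include <deque>

// Hypothetical, simplified recovery scheduler (names are illustrative).
struct RecoveryQueue {
  int max_active = 1;       // cap on concurrent recovery ops
  int active = 0;           // ops currently in flight
  std::deque<int> pending;  // ids of PGs waiting for recovery

  // Try to start recovery for a PG. Returns the number of ops started.
  int start_recovery_ops(int pg) {
    int max = max_active - active;
    if (max <= 0) {
      // Lost the race: no slots are free right now. Requeue the PG so it
      // is retried later rather than silently dropped.
      pending.push_back(pg);
      return 0;
    }
    ++active;
    return 1;
  }

  // Called when an op completes; frees a slot and retries one queued PG.
  void finish_op() {
    --active;
    if (!pending.empty()) {
      int pg = pending.front();
      pending.pop_front();
      start_recovery_ops(pg);
    }
  }
};
```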

Sage Weil [Fri, 12 Feb 2010 22:20:02 +0000 (14:20 -0800)]

init-ceph: print 'already started' instead of failing to start

Sage Weil [Fri, 12 Feb 2010 21:38:38 +0000 (13:38 -0800)]

msgr: more conservative locking, thread join asserts

We caught a bunch of crashes like this:

10.02.11 17:01:01.600660 7f87070c3950 -- 10.3.14.134:6800/8203 >> 10.3.14.130:6800/18914 pipe(0x7fc2be2cebe0 sd=36 pgs=2409 cs=1 l=0).do_sendmsg error Broken pipe
10.02.11 17:01:01.600700 7f87070c3950 -- 10.3.14.134:6800/8203 >> 10.3.14.130:6800/18914 pipe(0x7fc2be2cebe0 sd=36 pgs=2409 cs=1 l=0).writer error sending 0x7fc27da1c570, 32: Broken pipe
10.02.11 17:01:01.600796 7f87070c3950 -- 10.3.14.134:6800/8203 >> 10.3.14.130:6800/18914 pipe(0x7fc2be2cebe0 sd=-1 pgs=2409 cs=1 l=0).fault initiating reconnect
...
./common/Thread.h: In function 'int Thread::join(void**)':
./common/Thread.h:66: FAILED assert(0)
1: (Thread::join(void**)+0x73) [0x64fcd3]
2: (SimpleMessenger::Pipe::join_reader()+0x68) [0x6555a2]
3: (SimpleMessenger::Pipe::connect()+0xf5) [0x645be9]
4: (SimpleMessenger::Pipe::writer()+0x157) [0x64793d]
5: (SimpleMessenger::Pipe::Writer::entry()+0x19) [0x63e107]
6: (Thread::_entry_func(void*)+0x20) [0x64e816]
7: /lib/libpthread.so.0 [0x7fc2c3bbdfc7]
8: (clone()+0x6d) [0x7fc2c2e005ad]

that look a bit like multiple procs were racing into
join_reader(). Add an assert to catch that if it happens again,
and also wrap thread starts in pipe_lock to ensure we keep the
_running flags in sync with reality. Add in a few other
sanity checks too.
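
A rough sketch of the two safeguards mentioned above: starting the thread while holding the lock that guards the running flag, and asserting that only one caller joins at a time. `MiniPipe` and its members are hypothetical, and `std::thread` stands in for the project's own Thread class.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical, simplified pipe with one reader thread (illustrative only).
class MiniPipe {
  std::mutex pipe_lock;
  std::thread reader;
  bool reader_running = false;
  bool reader_joining = false;

public:
  // Start the thread and set the flag under the same lock, so the flag
  // cannot disagree with whether a thread actually exists.
  void start_reader() {
    std::lock_guard<std::mutex> l(pipe_lock);
    assert(!reader_running);
    reader = std::thread([] { /* reader loop would go here */ });
    reader_running = true;
  }

  // Join at most once; the assert trips if two callers race in here.
  void join_reader() {
    {
      std::lock_guard<std::mutex> l(pipe_lock);
      if (!reader_running)
        return;
      assert(!reader_joining);
      reader_joining = true;
    }
    reader.join();  // join outside the lock so the reader could take it
    std::lock_guard<std::mutex> l(pipe_lock);
    reader_joining = false;
    reader_running = false;
  }

  bool is_running() {
    std::lock_guard<std::mutex> l(pipe_lock);
    return reader_running;
  }
};
```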

Sage Weil [Fri, 12 Feb 2010 21:35:57 +0000 (13:35 -0800)]

mon: note mds beacon times more carefully

We need to update the beacon timestamp even when we are updating
the mds state. Otherwise we can get caught in a busy loop
between marking an mds laggy and !laggy because the beacon stamp
never updates.

So even if we are updating, and the reply will be slow, update
our timestamp, so we don't mark the mds laggy.
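
As an illustration of the fix, the sketch below records the beacon time before deciding whether the reply has to wait. `BeaconTracker`, the grace value, and the use of plain seconds are all assumptions made for the example.

```cpp
#include <map>

// Hypothetical, simplified beacon tracker; times are plain seconds.
struct BeaconTracker {
  double grace = 15.0;                // assumed grace period
  std::map<int, double> last_beacon;  // mds id -> time of last beacon

  // Record the beacon first, unconditionally, even if a state update is
  // still pending and the reply to the mds will be delayed.
  void handle_beacon(int mds, double now, bool update_pending) {
    last_beacon[mds] = now;
    if (update_pending)
      return;  // reply later; the stamp is already fresh
    // ... an immediate reply would be sent here ...
  }

  // An mds is laggy if it was never heard from or its stamp is too old.
  bool is_laggy(int mds, double now) const {
    auto it = last_beacon.find(mds);
    return it == last_beacon.end() || now - it->second > grace;
  }
};
```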

Sage Weil [Fri, 12 Feb 2010 21:27:49 +0000 (13:27 -0800)]

osd: bail out of interval loop completely

We're going backwards, so once this test fails, it always fails,
and we can break instead of continue. Any skipped intervals will
be pruned shortly anyway.
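
The reasoning can be shown with a small loop over a sorted list. `Interval` and `count_relevant` are hypothetical; the only assumption carried over from the message is that the walk goes from newest to oldest, so the test is monotonic.

```cpp
#include <vector>

// Hypothetical past interval: the epoch range [first, last].
struct Interval {
  int first;
  int last;
};

// Walk intervals from newest to oldest and count those that end at or
// after `cutoff`. The input is assumed sorted by ascending epoch, so once
// one interval falls before the cutoff every older one does too, and the
// loop can stop instead of skipping them one at a time.
int count_relevant(const std::vector<Interval> &intervals, int cutoff) {
  int n = 0;
  for (auto it = intervals.rbegin(); it != intervals.rend(); ++it) {
    if (it->last < cutoff)
      break;  // every remaining (older) interval fails as well
    ++n;
  }
  return n;
}
```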

Sage Weil [Fri, 12 Feb 2010 21:26:19 +0000 (13:26 -0800)]

osd: always update up_thru if pg changes before going active

We already required this if prior PG members were down, so this
affected the 'failure' case.  We now also require it for
non-failure PG changes (expansion, migration).

This fixes our maybe_went_rw calculation for prior PG intervals,
which is based on up_thru.  If maybe_went_rw is false when the
pg actually went rw, we can lose (and have lost) data.  But it is
not practical to calculate without up_thru being consistently
updated, because determining whether a pg would have been able to
go active depends on knowing last_epoch_started at a previous
point in time, which then determines how many prior intervals
may have been considered, which in turn determines whether
up_thru would have been updated, etc.  Much simpler to update it
all the time.

This should not impose a significantly greater cost, since we
already need it for the failure case.  And in general the
migration/expansion/whatever case is no more common or critical
than the failure case.

Sage Weil [Fri, 12 Feb 2010 20:52:18 +0000 (12:52 -0800)]

osd: simplify, and version, pg attrs

Sage Weil [Fri, 12 Feb 2010 20:45:15 +0000 (12:45 -0800)]

osd: remove some dead code from build_prior

Not sure what any_up_now used to be for, but it's not used now.

Sage Weil [Fri, 12 Feb 2010 19:20:30 +0000 (11:20 -0800)]

osd: fail startup if store is in use (before we fork)

Sage Weil [Fri, 12 Feb 2010 19:07:20 +0000 (11:07 -0800)]

osd: set heartbeat addr properly

This was broken by the osd startup change in 8538efc

Sage Weil [Thu, 11 Feb 2010 23:41:09 +0000 (15:41 -0800)]

osd: fix memset transposed params
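
For context, a short example of the argument order involved. The commit itself is not shown here, so `Header` and `clear_header` are hypothetical; only the `memset` signature is standard.

```cpp
#include <cstring>

// Hypothetical record type, used only to have something to clear.
struct Header {
  unsigned char bytes[16];
};

// memset takes (destination, fill value, byte count), in that order.
// Swapping the last two, as in memset(&h, sizeof(h), 0), compiles but
// writes zero bytes, leaving the buffer untouched.
void clear_header(Header &h) {
  std::memset(&h, 0, sizeof(h));
}
```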

Sage Weil [Fri, 12 Feb 2010 00:18:54 +0000 (16:18 -0800)]

osd: don't block on mon negotiation on startup

That means we don't perform the monmap vs ondisk fsid checks and
such. They're mostly useless anyway.

Sage Weil [Fri, 12 Feb 2010 00:10:26 +0000 (16:10 -0800)]

mkcephfs: fix up permissions, ownership on temp keyrings

Sage Weil [Thu, 11 Feb 2010 23:37:27 +0000 (15:37 -0800)]

ceph_common: sudo su, not su

Sage Weil [Thu, 11 Feb 2010 23:32:45 +0000 (15:32 -0800)]

mkcephfs: always clobber, since we don't support not clobbering anyway

Sage Weil [Thu, 11 Feb 2010 23:31:40 +0000 (15:31 -0800)]

mkmonfs: require '-c conf' to avoid accidents; stash admin keyring

Sage Weil [Thu, 11 Feb 2010 23:31:14 +0000 (15:31 -0800)]

cauthtool: mode 0600 for keyrings

Sage Weil [Thu, 11 Feb 2010 23:21:27 +0000 (15:21 -0800)]

mkcephfs: put admin keyring in mon_data, for safe keeping

Sage Weil [Thu, 11 Feb 2010 23:21:14 +0000 (15:21 -0800)]

mkcephfs: --clobber, not --clobber_old_data

Sage Weil [Thu, 11 Feb 2010 21:32:42 +0000 (13:32 -0800)]

qa: +x snaptest1.sh

Sage Weil [Thu, 11 Feb 2010 23:11:23 +0000 (15:11 -0800)]

objectcacher: use ObjectSet container instead of inodeno_t hash_maps

Caller provides an ObjectSet* to group objects into.
Later we can put other info here, like truncate_seq and
truncate_size.

Sage Weil [Thu, 11 Feb 2010 19:39:06 +0000 (11:39 -0800)]

cephx: adjust auth ticket renewal encoding a bit

This simplifies the code slightly, esp in the kclient.

Sage Weil [Thu, 11 Feb 2010 18:36:45 +0000 (10:36 -0800)]

qa: fix up runallonce.sh

Sage Weil [Thu, 11 Feb 2010 18:36:38 +0000 (10:36 -0800)]

debian: fix init script hackery

Copy src/init-ceph to debian/ceph.init _after_ we make, so that
the autoconf paths are substituted in properly.

Sage Weil [Thu, 11 Feb 2010 18:03:06 +0000 (10:03 -0800)]

todo

Sage Weil [Thu, 11 Feb 2010 17:25:59 +0000 (09:25 -0800)]

mon: print caps to debug log

Sage Weil [Thu, 11 Feb 2010 17:25:15 +0000 (09:25 -0800)]

cephx: nicer debug output in service handler

Sage Weil [Thu, 11 Feb 2010 17:24:42 +0000 (09:24 -0800)]

cephx: use 'next' key for tickets when 'current' is expired

When generating tickets for clients, use next key if the current
is expired. That ensures they will renew before their ticket
times out.
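
The selection rule can be sketched in a few lines. `ServiceKey` and `key_for_new_ticket` are hypothetical names, and expiry is modeled as plain seconds.

```cpp
#include <cstdint>

// Hypothetical service key with an expiry time in seconds.
struct ServiceKey {
  uint64_t id;
  double expires;
};

// Pick the key used to issue a new ticket: the current one while it is
// still valid, otherwise the next one.
const ServiceKey &key_for_new_ticket(const ServiceKey &current,
                                     const ServiceKey &next, double now) {
  return now >= current.expires ? next : current;
}
```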

Sage Weil [Thu, 11 Feb 2010 17:06:18 +0000 (09:06 -0800)]

csyn: print something on mount failure

Sage Weil [Thu, 11 Feb 2010 17:04:25 +0000 (09:04 -0800)]

cephx: return expired service keys from rotatingkeyring

Otherwise there's no point in keeping around old service tickets.

To prevent really old tickets from working, we need to rotate
keys. We want slightly old ones to still work, though.. that's
why we keep 3.
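
A sketch of a three-slot rotating key window, assuming the three keys are the previous, current, and next one. `RotatingKeys` is a hypothetical class, not the actual keyring implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical rotating keyring holding the previous, current and next key.
class RotatingKeys {
  std::deque<uint64_t> ids;  // oldest first
  uint64_t next_id = 1;
  static const std::size_t max_keys = 3;

public:
  RotatingKeys() {
    while (ids.size() < max_keys)
      rotate();
  }

  // Add a new key and drop the oldest once more than three are held.
  void rotate() {
    ids.push_back(next_id++);
    if (ids.size() > max_keys)
      ids.pop_front();
  }

  // A ticket validates if its key is still in the window, even an expired
  // (previous) one; keys that have rotated out are rejected.
  bool accepts(uint64_t key_id) const {
    for (uint64_t id : ids)
      if (id == key_id)
        return true;
    return false;
  }
};
```

Slightly old tickets keep working because their key stays in the window for one more rotation, while really old ones stop working once their key is dropped.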

Sage Weil [Thu, 11 Feb 2010 17:00:10 +0000 (09:00 -0800)]

cephx: fix negotiation on reconnect

Don't send another request after initial handshake if we don't
need an auth ticket.

Sage Weil [Thu, 11 Feb 2010 15:54:12 +0000 (07:54 -0800)]

monclient: renew service tickets a bit after the current expires

This ensures the monitor will actually have generated a newer
one to give us, avoiding a busy loop.

Yehuda Sadeh [Thu, 11 Feb 2010 01:17:45 +0000 (17:17 -0800)]

osd: don't update object size if didn't write anything

Sage Weil [Wed, 10 Feb 2010 22:51:18 +0000 (14:51 -0800)]

cmon: suggest mkcephfs when 'whoami' not in monfs

Sage Weil [Wed, 10 Feb 2010 22:51:08 +0000 (14:51 -0800)]

cephx: fix up key rotation

Sage Weil [Wed, 10 Feb 2010 19:57:04 +0000 (11:57 -0800)]

cephx: nicer keyserver debug output

Yehuda Sadeh [Wed, 10 Feb 2010 22:34:09 +0000 (14:34 -0800)]

osd: write op updates truncation sequence if not already set

Greg Farnum [Wed, 10 Feb 2010 20:02:42 +0000 (12:02 -0800)]

msgr: Update 'documentation'.

Sage Weil [Wed, 10 Feb 2010 19:33:23 +0000 (11:33 -0800)]

init-ceph, mkcephfs: fix ETCDIR

Sage Weil [Tue, 9 Feb 2010 18:03:12 +0000 (10:03 -0800)]

mds: behave when we pipeline session updates to journal

Greg Farnum [Wed, 10 Feb 2010 00:26:18 +0000 (16:26 -0800)]

msg: union sockaddr_storage to hush strict aliasing warnings and clean up code
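
A sketch of the general technique named in this commit title, assuming a POSIX system. `MiniAddr` and its members are hypothetical; the real address type is not shown in this log.

```cpp
#include <arpa/inet.h>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>

// Hypothetical address wrapper. The union lets each view of the storage be
// accessed as a member rather than through a cast of a sockaddr_storage*,
// which is the kind of cast that tends to trigger strict aliasing warnings.
struct MiniAddr {
  union {
    sockaddr_storage storage;
    sockaddr_in v4;
    sockaddr_in6 v6;
  };

  MiniAddr() { std::memset(&storage, 0, sizeof(storage)); }

  void set_v4_port(unsigned short port) {
    v4.sin_family = AF_INET;
    v4.sin_port = htons(port);  // stored in network byte order
  }

  unsigned short v4_port() const { return ntohs(v4.sin_port); }
};
```

This example writes and reads the same member. Reading a different member than the one last written relies on compiler-specific guarantees in C++, so that part is left out of the sketch.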

Sage Weil [Tue, 9 Feb 2010 18:27:08 +0000 (10:27 -0800)]

init-ceph: Required-start: $remote_fs

This ensures /usr is mounted before ceph daemons start. It seems like
this may be problematic for hosts that act as both servers and clients,
but nfs-kernel-server does the same, so whatev!

Sage Weil [Tue, 9 Feb 2010 18:20:20 +0000 (10:20 -0800)]

debian: do not include var/run/ceph in package; mkdir -p dirname unconditionally

Sage Weil [Tue, 9 Feb 2010 18:19:57 +0000 (10:19 -0800)]

rados man page; include rados in ceph package

Sage Weil [Tue, 9 Feb 2010 18:14:13 +0000 (10:14 -0800)]

debian: include cauthtool

Sage Weil [Tue, 9 Feb 2010 18:14:04 +0000 (10:14 -0800)]

rename authtool -> cauthtool

Sage Weil [Tue, 9 Feb 2010 17:58:54 +0000 (09:58 -0800)]

mutex: fix file mode

Sage Weil [Tue, 9 Feb 2010 16:24:57 +0000 (08:24 -0800)]

debian: fixups to build inside pbuilder

Josef Bacik [Tue, 9 Feb 2010 16:24:23 +0000 (08:24 -0800)]

ceph: fix manpages so they are only installed once

While creating a spec file for CEPH, rpmbuild was complaining because make
install was copying the manpages in, and then copying them in again.  This is
because man_MANS and dist_man_MANS are supposed to be two separate lists that do
not overlap.  So make install would install all the man pages in the man_MANS
list and the dist_man_MANS list.  This patch kills the dist_man_MANS thing to
keep this from happening.  This made rpmbuild happy, which makes me happy :).
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

Sage Weil [Tue, 9 Feb 2010 16:08:12 +0000 (08:08 -0800)]

osd: count objects degraded due to degraded pg

Sage Weil [Tue, 9 Feb 2010 16:06:35 +0000 (08:06 -0800)]

osd: prevent do_waiters() while _dispatch() is running

Fixes race between handle_osd_map and tick() requeuing ops.

Sage Weil [Mon, 8 Feb 2010 19:15:03 +0000 (11:15 -0800)]

ceph: wait for monmap

Sage Weil [Tue, 9 Feb 2010 04:29:09 +0000 (20:29 -0800)]

osd: store local osd magic, whoami, and other static bits outside of ObjectStore

These values are immutable, and we also want to look at them prior to
forking and 'mounting' the ObjectStore. Just keep them in separate files
for simplicity.

This avoids the double filestore startup cost paid on cosd startup.

Sage Weil [Mon, 8 Feb 2010 20:47:17 +0000 (12:47 -0800)]

osd: print truncate_size signed, and only print at all if _seq > 0

Sage Weil [Mon, 8 Feb 2010 18:27:27 +0000 (10:27 -0800)]

debian: updated debian build scripts, changelog

Sage Weil [Mon, 8 Feb 2010 17:57:17 +0000 (09:57 -0800)]

mkcephfs: warn on missing keyring for mds, osd

Sage Weil [Mon, 8 Feb 2010 17:57:05 +0000 (09:57 -0800)]

authtool: add -a/--add-key command

Sage Weil [Mon, 8 Feb 2010 17:56:41 +0000 (09:56 -0800)]

buffer: add decode_base64 method

Sage Weil [Mon, 8 Feb 2010 17:44:33 +0000 (09:44 -0800)]

cephx: pipe down about ticket renewals

Sage Weil [Sat, 6 Feb 2010 19:29:39 +0000 (11:29 -0800)]

osd, mds: don't time out authenticate()

Still need to fix wait_auth_rotating....

Sage Weil [Sat, 6 Feb 2010 19:19:39 +0000 (11:19 -0800)]

filejournal: make io contiguous in write_bl() for directio

Previously we were splitting the io for writing the header plus first
segment following a wrap.

Sage Weil [Sat, 6 Feb 2010 19:18:51 +0000 (11:18 -0800)]

ceph: error out on authentication failure

asdf

Sage Weil [Sat, 6 Feb 2010 19:27:17 +0000 (11:27 -0800)]

monc: fix authentication timeout

Sage Weil [Sat, 6 Feb 2010 19:18:38 +0000 (11:18 -0800)]

monclient: kill unused wait_authenticate()

Sage Weil [Sat, 6 Feb 2010 18:39:21 +0000 (10:39 -0800)]

objectstore: include struct_v for Transaction