git.apps.os.sepia.ceph.com Git - ceph.git/log
Sage Weil [Thu, 4 Jun 2009 20:18:45 +0000 (13:18 -0700)]
mds: allow cap updates to root inode
Sage Weil [Thu, 4 Jun 2009 19:28:51 +0000 (12:28 -0700)]
kclient: fix snap_rwsem vs s_mutex deadlock in ceph_inode_set_size
Call ceph_check_caps outside of the snap_rwsem. Fixes this lockdep
warning:
[84919.839574] cp/2724 is trying to acquire lock:
[84919.839574]  (&s->s_mutex){--..}, at: [<ffffffffa00fa0f3>] ceph_check_caps+0x968/0xc04 [ceph]
[84919.839574]
[84919.839574] but task is already holding lock:
[84919.839574]  (&mdsc->snap_rwsem){----}, at: [<ffffffffa00f3803>] ceph_write_begin+0x1d9/0x696 [ceph]
[84919.839574]
[84919.839574] which lock already depends on the new lock.
[84919.839574]
[84919.839574]
[84919.839574] the existing dependency chain (in reverse order) is:
[84919.839574]
[84919.839574] -> #1 (&mdsc->snap_rwsem){----}:
[84919.839574]        [<ffffffff8025e4bd>] validate_chain+0x9ef/0xce8
[84919.839574]        [<ffffffff8025eee6>] __lock_acquire+0x730/0x7b9
[84919.839574]        [<ffffffff8025eff4>] lock_acquire+0x85/0xa9
[84919.839574]        [<ffffffff8061577d>] down_write+0x43/0x77
[84919.839574]        [<ffffffffa010ab3c>] ceph_mdsc_handle_reply+0x960/0xa72 [ceph]
[84919.839574]        [<ffffffffa00e7520>] ceph_dispatch+0x4eb/0x5f6 [ceph]
[84919.839574]        [<ffffffffa0103906>] try_read+0x15cf/0x17da [ceph]
[84919.839574]        [<ffffffffa0103da0>] con_work+0x28f/0x18a9 [ceph]
[84919.839574]        [<ffffffff8024cb7c>] run_workqueue+0xf5/0x209
[84919.839574]        [<ffffffff8024d88b>] worker_thread+0xdb/0xe8
[84919.839574]        [<ffffffff8025078c>] kthread+0x49/0x79
[84919.839574]        [<ffffffff8020d17a>] child_rip+0xa/0x20
[84919.839574]        [<ffffffffffffffff>] 0xffffffffffffffff
[84919.839574]
[84919.839574] -> #0 (&s->s_mutex){--..}:
[84919.839574]        [<ffffffff8025e14f>] validate_chain+0x681/0xce8
[84919.839574]        [<ffffffff8025eee6>] __lock_acquire+0x730/0x7b9
[84919.839574]        [<ffffffff8025eff4>] lock_acquire+0x85/0xa9
[84919.839574]        [<ffffffff80614fd2>] mutex_lock_nested+0x116/0x2f4
[84919.839574]        [<ffffffffa00fa0f3>] ceph_check_caps+0x968/0xc04 [ceph]
[84919.839574]        [<ffffffffa00e9c19>] ceph_inode_set_size+0xe1/0xf4 [ceph]
[84919.839574]        [<ffffffffa00f2e3c>] ceph_write_end+0x16f/0x1b4 [ceph]
[84919.839574]        [<ffffffff8027e33c>] generic_file_buffered_write+0x1a2/0x2d4
[84919.839574]        [<ffffffff8027e958>] __generic_file_aio_write_nolock+0x354/0x3be
[84919.839574]        [<ffffffff8027ebe4>] generic_file_aio_write+0x66/0xc2
[84919.839574]        [<ffffffffa00f1737>] ceph_aio_write+0x969/0xb52 [ceph]
[84919.839574]        [<ffffffff802aa3eb>] do_sync_write+0xe2/0x126
[84919.839574]        [<ffffffff802aabbc>] vfs_write+0xae/0x137
[84919.839574]        [<ffffffff802ab072>] sys_write+0x47/0x6f
[84919.839574]        [<ffffffff8020c0db>] system_call_fastpath+0x16/0x1b
[84919.839574]        [<ffffffffffffffff>] 0xffffffffffffffff
[84919.839574]
[84919.839574] other info that might help us debug this:
[84919.839574]
[84919.839574] 2 locks held by cp/2724:
[84919.839574]  #0:  (&sb->s_type->i_mutex_key#12){--..}, at: [<ffffffff8027ebcd>] generic_file_aio_write+0x4f/0xc2
[84919.839574]  #1:  (&mdsc->snap_rwsem){----}, at: [<ffffffffa00f3803>] ceph_write_begin+0x1d9/0x696 [ceph]
[84919.839574]
[84919.839574] stack backtrace:
[84919.839574] Pid: 2724, comm: cp Not tainted 2.6.29 #16
[84919.839574] Call Trace:
[84919.839574]  [<ffffffff8025dac3>] print_circular_bug_tail+0xc7/0xd2
[84919.839574]  [<ffffffff8025e14f>] validate_chain+0x681/0xce8
[84919.839574]  [<ffffffff8028352a>] ? get_page_from_freelist+0x3c6/0x6b5
[84919.839574]  [<ffffffff8025eee6>] __lock_acquire+0x730/0x7b9
[84919.839574]  [<ffffffff8025eff4>] lock_acquire+0x85/0xa9
[84919.839574]  [<ffffffffa00fa0f3>] ? ceph_check_caps+0x968/0xc04 [ceph]
[84919.839574]  [<ffffffff80614fd2>] mutex_lock_nested+0x116/0x2f4
[84919.839574]  [<ffffffffa00fa0f3>] ? ceph_check_caps+0x968/0xc04 [ceph]
[84919.839574]  [<ffffffffa00fa0f3>] ? ceph_check_caps+0x968/0xc04 [ceph]
[84919.839574]  [<ffffffffa00fa0f3>] ceph_check_caps+0x968/0xc04 [ceph]
[84919.839574]  [<ffffffff8025a70e>] ? get_lock_stats+0x34/0x5e
[84919.839574]  [<ffffffffa00e9c19>] ceph_inode_set_size+0xe1/0xf4 [ceph]
[84919.839574]  [<ffffffffa00f2e3c>] ceph_write_end+0x16f/0x1b4 [ceph]
[84919.839574]  [<ffffffff8027e33c>] generic_file_buffered_write+0x1a2/0x2d4
[84919.839574]  [<ffffffff8027e958>] __generic_file_aio_write_nolock+0x354/0x3be
[84919.839574]  [<ffffffff8027ebe4>] generic_file_aio_write+0x66/0xc2
[84919.839574]  [<ffffffffa00f1737>] ceph_aio_write+0x969/0xb52 [ceph]
[84919.839574]  [<ffffffff802bd728>] ? touch_atime+0xee/0x110
[84919.839574]  [<ffffffff80329414>] ? nfs_file_read+0x107/0x116
[84919.839574]  [<ffffffff802aa3eb>] do_sync_write+0xe2/0x126
[84919.839574]  [<ffffffff8025ef04>] ? __lock_acquire+0x74e/0x7b9
[84919.839574]  [<ffffffff8025a746>] ? put_lock_stats+0xe/0x27
[84919.839574]  [<ffffffff802508a3>] ? autoremove_wake_function+0x0/0x38
[84919.839574]  [<ffffffff802d1d5b>] ? dnotify_parent+0x6c/0x74
[84919.839574]  [<ffffffff803d5f70>] ? security_file_permission+0x11/0x13
[84919.839574]  [<ffffffff802aabbc>] vfs_write+0xae/0x137
[84919.839574]  [<ffffffff802ab072>] sys_write+0x47/0x6f
[84919.839574]  [<ffffffff8020c0db>] system_call_fastpath+0x16/0x1b
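The lock-ordering fix described in this commit can be illustrated with a small user-space model. This is only a sketch in Python, not the kernel code: the two locks stand in for `mdsc->snap_rwsem` and `s->s_mutex`, and `write_end`, `set_size`, and `check_caps` are simplified, hypothetical stand-ins for the functions named in the trace. The size update only records that a cap check is needed; the check itself runs after the outer lock has been dropped.

```python
import threading

# Stand-ins for the two kernel locks named in the trace (assumption:
# plain mutexes are enough to show the ordering problem).
snap_rwsem = threading.Lock()   # models mdsc->snap_rwsem
s_mutex = threading.Lock()      # models s->s_mutex
events = []                     # records the order of operations


def check_caps():
    # Takes s_mutex, so in this model it must not run under snap_rwsem.
    # (In this single-threaded demo, locked() means "held by us".)
    assert not snap_rwsem.locked(), "check_caps called under snap_rwsem"
    with s_mutex:
        events.append("check_caps")


def set_size(inode, new_size):
    # Only updates state and reports whether a cap check is needed.
    grew = new_size > inode["size"]
    if grew:
        inode["size"] = new_size
    events.append("set_size")
    return grew


def write_end(inode, new_size):
    with snap_rwsem:
        need_check = set_size(inode, new_size)
    # snap_rwsem has been released; taking s_mutex is now safe.
    if need_check:
        check_caps()
```

Calling `write_end({"size": 0}, 4096)` records `set_size` followed by `check_caps`, with the second lock never taken while the first is held.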
Sage Weil [Thu, 4 Jun 2009 17:30:34 +0000 (10:30 -0700)]
mon: set seq properly on mds takeover
Sage Weil [Thu, 4 Jun 2009 18:00:47 +0000 (11:00 -0700)]
osd: fix sub_op_push wakeup on pulled object
soid was a ref to op->.. move delete op to the end.
Sage Weil [Thu, 4 Jun 2009 17:49:45 +0000 (10:49 -0700)]
object: pass const refs to comparators
Sage Weil [Thu, 4 Jun 2009 16:51:31 +0000 (09:51 -0700)]
Merge commit 'origin/rados' into rados
Need to fix rados_list call, and ReplicatedPG PGLS code (translate
sobject_t -> object_t).
Conflicts:
src/librados.cc
src/os/FileStore.cc
src/osd/ReplicatedPG.cc
src/testradospp.cc
Yehuda Sadeh [Wed, 3 Jun 2009 23:59:40 +0000 (16:59 -0700)]
rados: pgls small cleanup
Yehuda Sadeh [Wed, 3 Jun 2009 23:56:58 +0000 (16:56 -0700)]
rados: pgls fixes, uses context
Sage Weil [Wed, 3 Jun 2009 23:50:14 +0000 (16:50 -0700)]
filestore: nicer filenames for CEPH_NOSNAP objects
Sage Weil [Wed, 3 Jun 2009 23:49:14 +0000 (16:49 -0700)]
osd: clear reqid in backlog log entries
This avoids confusion where the version and reqid don't match, and
a single reqid appears multiple times in the pg log.
Sage Weil [Wed, 3 Jun 2009 23:45:47 +0000 (16:45 -0700)]
osd: don't index BACKLOG caller_ops
Sage Weil [Wed, 3 Jun 2009 23:10:28 +0000 (16:10 -0700)]
mds: keep only one dir commit in flight
Ordering is not guaranteed, so putting multiple commits in flight
is not safe.. the osd may commit the older data on top of the new.
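A rough sketch of the "one commit in flight" rule, assuming a hypothetical `send` callback that issues the write and an `on_commit_ack` hook invoked when the OSD acknowledges it (neither name comes from the Ceph source). Requests that arrive while a commit is outstanding are coalesced into a single follow-up commit, so an older write can never land on top of a newer one.

```python
class DirCommitter:
    """Model of a directory that allows at most one commit in flight."""

    def __init__(self, send):
        self._send = send        # hypothetical callback that issues the write
        self._in_flight = False  # True while waiting for an ack
        self._dirty = False      # a newer version is waiting to be committed

    def commit(self):
        if self._in_flight:
            # Do not issue a second write; remember to commit again later.
            self._dirty = True
            return
        self._in_flight = True
        self._send()

    def on_commit_ack(self):
        self._in_flight = False
        if self._dirty:
            # Coalesce everything queued meanwhile into one new commit.
            self._dirty = False
            self.commit()
```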
Sage Weil [Wed, 3 Jun 2009 22:43:01 +0000 (15:43 -0700)]
osd: rev osd protocol, disk format
Sage Weil [Wed, 3 Jun 2009 22:42:35 +0000 (15:42 -0700)]
kclient: specify object id as string
Yehuda Sadeh [Wed, 3 Jun 2009 21:43:18 +0000 (14:43 -0700)]
rados: minor and merge fixes
Yehuda Sadeh [Wed, 3 Jun 2009 21:39:27 +0000 (14:39 -0700)]
rados: list pg, first take, partially works
Yehuda Sadeh [Tue, 2 Jun 2009 00:17:49 +0000 (17:17 -0700)]
osd: add PGLS op
Sage Weil [Wed, 3 Jun 2009 21:05:47 +0000 (14:05 -0700)]
osd: nicer pg log object names
Sage Weil [Wed, 3 Jun 2009 20:53:43 +0000 (13:53 -0700)]
osd: clean up object_t args
Sage Weil [Wed, 3 Jun 2009 20:53:35 +0000 (13:53 -0700)]
osdc: clean up object_t args
Sage Weil [Wed, 3 Jun 2009 20:51:25 +0000 (13:51 -0700)]
rados: clean up object_t args
Sage Weil [Wed, 3 Jun 2009 20:48:58 +0000 (13:48 -0700)]
osd: kill pobject_t (use sobject_t throughout)
Sage Weil [Wed, 3 Jun 2009 20:45:56 +0000 (13:45 -0700)]
os: const ref (p)object_t arguments where possible
Sage Weil [Wed, 3 Jun 2009 20:39:07 +0000 (13:39 -0700)]
osd: make object_t a string
Sage Weil [Wed, 3 Jun 2009 18:37:45 +0000 (11:37 -0700)]
osd: fix obc ref counting
Sage Weil [Wed, 3 Jun 2009 18:29:30 +0000 (11:29 -0700)]
osd: fix object context registration
The projected object state wasn't working because the obc
wasn't making it into the map.
Sage Weil [Wed, 3 Jun 2009 18:21:38 +0000 (11:21 -0700)]
osd: fix replication
The shipped transaction wasn't getting applied to disk.
Sage Weil [Wed, 3 Jun 2009 18:47:46 +0000 (11:47 -0700)]
filestore: fix build error on 32-bit
Sage Weil [Wed, 3 Jun 2009 17:15:50 +0000 (10:15 -0700)]
objecter, rados: constify write bufferlist& refs
Sage Weil [Wed, 3 Jun 2009 17:12:34 +0000 (10:12 -0700)]
rados: c++ aio methods
Sage Weil [Wed, 3 Jun 2009 00:14:40 +0000 (17:14 -0700)]
mds todo
Sage Weil [Wed, 3 Jun 2009 00:14:29 +0000 (17:14 -0700)]
uclient: fix snaprealm update in add_update_cap
Make conditional consistent with the in->snaprealm clearing in the
cap removal path.
This was throwing a bad assertion during mds restart.
Sage Weil [Tue, 2 Jun 2009 21:21:36 +0000 (14:21 -0700)]
mds: purge preallocated inos/files when client disconnected
When a client is kicked out of the cluster, purge any data written to
preallocated inos. This should be the first object in the file sequence.
Sage Weil [Tue, 2 Jun 2009 20:33:40 +0000 (13:33 -0700)]
kclient: correctly set REPLAY flag in requests on replay
Sage Weil [Tue, 2 Jun 2009 20:33:07 +0000 (13:33 -0700)]
mds: replay all old client requests before handling new requests
Adds a new CLIENTREPLAY state between REJOIN and ACTIVE. It isn't strictly
necessary for anyone to know the MDS is handling its backlog first, but
it doesn't hurt.
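The ordering this state enforces can be modeled as follows. This is a simplified sketch, not the MDS implementation: the class and method names are hypothetical, and only the three states mentioned above are modeled. New requests that arrive before the MDS is ACTIVE are parked and handled only after every replayed request.

```python
from collections import deque
from enum import Enum


class State(Enum):
    REJOIN = 1
    CLIENTREPLAY = 2
    ACTIVE = 3


class Mds:
    def __init__(self, replay_queue):
        self.state = State.REJOIN
        self.replay_queue = deque(replay_queue)  # old requests to replay
        self.waiting = deque()                   # new requests parked meanwhile
        self.handled = []                        # order in which requests ran

    def handle_request(self, req):
        if self.state is State.ACTIVE:
            self.handled.append(req)
        else:
            self.waiting.append(req)

    def rejoin_done(self):
        # Replay the backlog first, then go active and drain parked requests.
        self.state = State.CLIENTREPLAY
        while self.replay_queue:
            self.handled.append(self.replay_queue.popleft())
        self.state = State.ACTIVE
        while self.waiting:
            self.handled.append(self.waiting.popleft())
```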
Sage Weil [Tue, 2 Jun 2009 19:20:49 +0000 (12:20 -0700)]
crushtool: fix warning
Sage Weil [Tue, 2 Jun 2009 18:44:37 +0000 (11:44 -0700)]
mds: no early reply when request has slaves
Doing an early reply when there are slaves means we need to be able to
reliably replay that op as well, and the complexity needed to do that
when we're locking stuff on multiple MDS's is so not worth it. These
ops are pretty rare anyway.
Sage Weil [Tue, 2 Jun 2009 17:52:38 +0000 (10:52 -0700)]
rados: add pg (non-object) ops
Some ops run against the whole pg, not individual objects. Setting the
PGOP flag selects a different write path that skips missing object checks
(ignores oid).
Sage Weil [Mon, 1 Jun 2009 23:45:00 +0000 (16:45 -0700)]
rados: fix warning
Sage Weil [Tue, 2 Jun 2009 04:21:07 +0000 (21:21 -0700)]
rados: test aio write
Sage Weil [Tue, 2 Jun 2009 04:20:54 +0000 (21:20 -0700)]
rados: rename aio_free to aio_release, fix bug
Yehuda Sadeh [Mon, 1 Jun 2009 23:51:42 +0000 (16:51 -0700)]
filestore: add collection_list_partial
Yehuda Sadeh [Mon, 1 Jun 2009 22:12:39 +0000 (15:12 -0700)]
class: add cls_read api function
Sage Weil [Mon, 1 Jun 2009 21:36:33 +0000 (14:36 -0700)]
todo
Sage Weil [Mon, 1 Jun 2009 21:36:14 +0000 (14:36 -0700)]
rados: first pass at aio interface
Yehuda Sadeh [Mon, 1 Jun 2009 20:24:47 +0000 (13:24 -0700)]
class: able to update classes in runtime
Yehuda Sadeh [Fri, 29 May 2009 22:10:13 +0000 (15:10 -0700)]
class: classhandler locking cleanup
Sage Weil [Sat, 30 May 2009 00:40:15 +0000 (17:40 -0700)]
mds todos
Sage Weil [Sat, 30 May 2009 00:32:42 +0000 (17:32 -0700)]
osd: cleanup types
Sage Weil [Sat, 30 May 2009 00:28:13 +0000 (17:28 -0700)]
logger: fix warning
Sage Weil [Sat, 30 May 2009 00:28:07 +0000 (17:28 -0700)]
osd: include client ticket in MOSDOp
One step closer to stabilizing the client <-> osd protocol.
Sage Weil [Fri, 29 May 2009 22:33:00 +0000 (15:33 -0700)]
logger: limit precision of averages
Sage Weil [Sat, 30 May 2009 00:07:29 +0000 (17:07 -0700)]
initscript: fix instance check, again
Sage Weil [Fri, 29 May 2009 22:35:43 +0000 (15:35 -0700)]
cosd: bad disks in cosd3
Sage Weil [Fri, 29 May 2009 21:43:53 +0000 (14:43 -0700)]
cosd: fix typo
Sage Weil [Fri, 29 May 2009 21:39:12 +0000 (14:39 -0700)]
autoconf: check for openssl dev files
Yehuda Sadeh [Fri, 29 May 2009 21:38:40 +0000 (14:38 -0700)]
class: added file
Sage Weil [Fri, 29 May 2009 21:29:15 +0000 (14:29 -0700)]
kclient: make trim_caps actually work
Sage Weil [Fri, 29 May 2009 21:11:23 +0000 (14:11 -0700)]
mds: trim client state based on # caps, not real memory utilization
Sage Weil [Mon, 20 Apr 2009 17:22:25 +0000 (10:22 -0700)]
mem: gather meminfo stats
Sage Weil [Mon, 20 Apr 2009 15:45:26 +0000 (08:45 -0700)]
mds: encapsulate /proc examination in MemoryModel; calc heap size
Sage Weil [Thu, 16 Apr 2009 04:17:47 +0000 (21:17 -0700)]
mds: some infrastructure to recall state from clients when under memory pressure
Also some logging to monitor mds inode and cap stats.
Sage Weil [Fri, 29 May 2009 18:58:35 +0000 (11:58 -0700)]
vstart: pipe down
Sage Weil [Fri, 29 May 2009 18:58:22 +0000 (11:58 -0700)]
kclient: trim caps on demand
Sage Weil [Fri, 29 May 2009 18:56:27 +0000 (11:56 -0700)]
kclient: rework iterate_session_caps
Do not drop s_cap_lock if we failed to get an inode ref. Otherwise we
risk the next *n pointer being invalid.
Sage Weil [Fri, 29 May 2009 17:43:31 +0000 (10:43 -0700)]
kclient: make session cap list an LRU; combine touch with check for issued caps
This both makes the issued cap check better (only traverses caps until we
get what we want), and combines the touch_cap step to bump it to the front
of the LRU.
So far all callers do the touch, so the touch arg is of dubious value..
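A minimal sketch of the combined check-and-touch idea, assuming caps are modeled as a mapping from a hypothetical cap id to an issued bitmask and the session LRU as an `OrderedDict` whose end is the most recently used side. The function name and signature are illustrative, not the kernel's.

```python
from collections import OrderedDict


def caps_issued_mask(caps, lru, mask, touch=True):
    """Return True if the caps together cover `mask`.

    Stops traversing as soon as the mask is satisfied, and (optionally)
    bumps each contributing cap to the most-recently-used end of `lru`.
    """
    have = 0
    for cap_id, issued in caps.items():
        if issued & mask:
            have |= issued & mask
            if touch:
                lru.move_to_end(cap_id)
        if have == mask:
            return True
    return False
```

For example, with caps `{"a": 0b001, "b": 0b010, "c": 0b100}` and a mask of `0b011`, the traversal stops after `"b"` and never looks at `"c"`.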
Yehuda Sadeh [Fri, 29 May 2009 18:19:37 +0000 (11:19 -0700)]
class: added a crypto class that does md5 and sha1
Yehuda Sadeh [Fri, 29 May 2009 18:06:50 +0000 (11:06 -0700)]
class: dlopen uses RTLD_NOW instead of lazy loading
Yehuda Sadeh [Thu, 28 May 2009 18:42:08 +0000 (11:42 -0700)]
Merge branch 'rados' of ssh://ceph.newdream.net/git/ceph into rados
Yehuda Sadeh [Thu, 28 May 2009 18:30:51 +0000 (11:30 -0700)]
class: rdcall implemented
Sage Weil [Thu, 28 May 2009 18:36:50 +0000 (11:36 -0700)]
vstart: fewer pgs
Sage Weil [Thu, 28 May 2009 18:23:58 +0000 (11:23 -0700)]
osdmaptool: fix usage
Sage Weil [Thu, 28 May 2009 18:23:52 +0000 (11:23 -0700)]
buffer: fix hexdump formatting
Sage Weil [Thu, 28 May 2009 18:23:43 +0000 (11:23 -0700)]
osd: ship transaction (not op) to replicas
This simplifies the code path on the replica. It avoids duplicating
computation, which can be a win even for trivial ops when snaps are
involved. For active objects, it will avoid the computation as well.
Currently parallel execution will still happen if you specify a flag.
However, an object method should probably also be able to specify that
parallel (||) execution is better, for cases where the operation is
deterministic and || exec will mean less data over the network.
Disable unused vector<T*> template, since it confuses the
vector<const char *> encoder.
Sage Weil [Wed, 27 May 2009 23:54:15 +0000 (16:54 -0700)]
kclient: avoid null resv ctx dereference
Sage Weil [Thu, 28 May 2009 04:43:41 +0000 (21:43 -0700)]
osd: move object_info_t, exists, size into ObjectState
The main goal is to capture everything needed by the op context
(the object_info_t, exists, size) in something that doesn't include all
of ObjectContext (such as the current access mode, num writers, and other
things that only the primary needs) so that the sub_op_modify setup is
as simple as possible.
This kills the annoying exists, size args in all the helpers.
Yehuda Sadeh [Wed, 27 May 2009 22:53:49 +0000 (15:53 -0700)]
class: dependent classes are loaded automatically
Yehuda Sadeh [Wed, 27 May 2009 22:53:15 +0000 (15:53 -0700)]
class: utility to load classes
Sage Weil [Wed, 27 May 2009 21:40:45 +0000 (14:40 -0700)]
kclient: fix cap resv BUG
Sage Weil [Wed, 27 May 2009 21:06:25 +0000 (14:06 -0700)]
filer: fix probing when recovered size is 0
Fixes a bug where it wraps around to a negative (large)
file size.
Sage Weil [Wed, 27 May 2009 21:06:13 +0000 (14:06 -0700)]
kclient: put readdir max entries in mount options
Sage Weil [Wed, 27 May 2009 20:58:20 +0000 (13:58 -0700)]
kclient: rework cap reservation accounting a bit
Invariants:
total = used + reserved + avail
len(caps_list) = reserved + avail
Previously, reserved was part of avail, which was confusing. Also
fixed up the reserve func such that the above invariants are always
true.
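The two invariants can be checked mechanically. The following is an illustrative model only: the class and method names are hypothetical and the real reservation code differs. It keeps `reserved` and `avail` as disjoint counters and asserts both invariants after every operation.

```python
class CapPool:
    """Toy model of cap reservation accounting."""

    def __init__(self):
        self.total = self.used = self.reserved = self.avail = 0
        self.caps_list = []  # preallocated caps that are not in use

    def check(self):
        assert self.total == self.used + self.reserved + self.avail
        assert len(self.caps_list) == self.reserved + self.avail

    def reserve(self, need):
        # Satisfy the reservation from avail first, then allocate the rest.
        take = min(need, self.avail)
        self.avail -= take
        self.reserved += take
        for _ in range(need - take):
            self.caps_list.append(object())
            self.total += 1
            self.reserved += 1
        self.check()

    def unreserve(self, n):
        # Return unused reservations to the available pool.
        assert n <= self.reserved
        self.reserved -= n
        self.avail += n
        self.check()

    def get(self):
        # Consume one reserved cap.
        assert self.reserved > 0
        cap = self.caps_list.pop()
        self.reserved -= 1
        self.used += 1
        self.check()
        return cap

    def put(self, cap):
        # Release a used cap back to the available pool.
        self.used -= 1
        self.caps_list.append(cap)
        self.avail += 1
        self.check()
```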
Sage Weil [Wed, 27 May 2009 18:49:02 +0000 (11:49 -0700)]
initscript: fix instance check
Sage Weil [Wed, 27 May 2009 19:07:41 +0000 (12:07 -0700)]
cosd: journal to a dedicated disk
Sage Weil [Wed, 27 May 2009 19:07:07 +0000 (12:07 -0700)]
osd: fix PG::IndexLog unindex()
The REMOVE entries won't be in the objects map.
Sage Weil [Wed, 27 May 2009 19:06:38 +0000 (12:06 -0700)]
initscript: remove notreelog by default
This screws us if we put the journal file on the same btrfs volume. Not
that it works that well anyway, but it works even worse this way. And
since FileStore never calls fsync(), it's fine.
Sage Weil [Wed, 27 May 2009 17:34:41 +0000 (10:34 -0700)]
script: fix req format in osd latency check
Sage Weil [Wed, 27 May 2009 17:43:25 +0000 (10:43 -0700)]
filestore: stop sync thread before journal
Otherwise we get sync thread interaction with freed journal.
Sage Weil [Wed, 27 May 2009 17:03:26 +0000 (10:03 -0700)]
osd: create journal of specified size during mkfs
Sage Weil [Tue, 26 May 2009 23:09:34 +0000 (16:09 -0700)]
kclient: avoid d_time reuse s.t. leases and dir rdcache can coexist
We want to be able to remember a lease even when we are issued
RDCACHE on the entire dir. (Otherwise, we fail to release the
dentry when doing an op, and have to wait for the revoke round
trip.)
So avoid reusing d_time for the rdcache_gen. Unconditionally set
it in update_dentry_lease.
Also, force inclusion of a no-op inode update if we need to include
the dentry lease release, even when the dir caps aren't changing.
Sage Weil [Tue, 26 May 2009 22:51:01 +0000 (15:51 -0700)]
mds: look at loner issued|wanted in file_eval
Sage Weil [Wed, 27 May 2009 03:19:56 +0000 (20:19 -0700)]
rados: aio prototypes
Yehuda Sadeh [Tue, 26 May 2009 20:41:35 +0000 (13:41 -0700)]
librados: display result _after_ wait
Yehuda Sadeh [Tue, 26 May 2009 18:40:50 +0000 (11:40 -0700)]
class: don't add a class when no valid binary is supplied
Yehuda Sadeh [Tue, 26 May 2009 18:28:05 +0000 (11:28 -0700)]
osd: don't send reply on message that got EAGAIN
Sage Weil [Tue, 26 May 2009 17:33:42 +0000 (10:33 -0700)]
osd: show error string in reply msg printout
Sage Weil [Tue, 26 May 2009 17:28:42 +0000 (10:28 -0700)]
mds: fix EXCL -> * check in file_eval to use loner_wanted, not issued
We should leave EXCL if the loner doesn't want the EXCL bits
(WR, EXCL, BUFFER), not if it's not issued (which is transitory).
Sage Weil [Tue, 26 May 2009 17:27:53 +0000 (10:27 -0700)]
vstart: set debug levels in conf, not cmd line
This lets you restart things more easily.
Sage Weil [Tue, 26 May 2009 16:39:30 +0000 (09:39 -0700)]
librados: remove length args from C++ interface
The bufferlists remove any need for an additional length arg.
Return ERANGE when buffers are too small.
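As a sketch of the error convention only (the helper below is hypothetical and is not a librados function), a call that copies a result into a caller-supplied buffer can report a too-small buffer as a negative `ERANGE` rather than truncating silently:

```python
import errno


def copy_result(buf, data):
    """Copy `data` into the caller's bytearray `buf`.

    Returns the number of bytes copied, or -ERANGE if `buf` is too small.
    """
    if len(buf) < len(data):
        return -errno.ERANGE
    buf[:len(data)] = data
    return len(data)
```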
Sage Weil [Tue, 26 May 2009 16:10:10 +0000 (09:10 -0700)]
makefile: don't build fakesyn