git.apps.os.sepia.ceph.com Git - ceph.git/log
Mike Ryan [Wed, 7 Nov 2012 23:35:56 +0000 (15:35 -0800)]
doc/: document recovery reservation process

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Mike Ryan [Fri, 5 Oct 2012 21:37:34 +0000 (14:37 -0700)]
pg: recovery reservations

This extends the backfill reservation system to work with log-based
recovery. The Active and RepActive states of the PG state machine are
greatly expanded to deal with the increased complexity of handling both
recovery and backfill reservations.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Mike Ryan [Wed, 3 Oct 2012 22:25:11 +0000 (15:25 -0700)]
osd: add PG state recovery_wait

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Mike Ryan [Tue, 2 Oct 2012 22:19:04 +0000 (15:19 -0700)]
message: add MRecoveryReserve

This message will be used to reserve and release recovery slots on
replica PGs.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
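A minimal sketch of the replica-side bookkeeping that a reserve/release message such as MRecoveryReserve drives, assuming a fixed number of slots and an immediate GRANT or REJECT reply. SlotReserver, ReserveReply and the integer PG ids are hypothetical illustrations, not Ceph's reservation code.

```cpp
#include <cassert>
#include <set>

// Hypothetical replica-side slot reserver (not Ceph's implementation).
enum class ReserveReply { GRANT, REJECT };

class SlotReserver {
public:
  explicit SlotReserver(unsigned max_slots) : max_slots_(max_slots) {}

  // A primary asks for one recovery slot on behalf of a PG.
  ReserveReply request(int pgid) {
    if (held_.count(pgid))
      return ReserveReply::GRANT;   // already reserved; answer idempotently
    if (held_.size() >= max_slots_)
      return ReserveReply::REJECT;  // all slots busy; primary retries later
    held_.insert(pgid);
    return ReserveReply::GRANT;
  }

  // The primary releases the slot once recovery for the PG is done.
  void release(int pgid) {
    assert(held_.count(pgid) == 1 && "release without a reservation");
    held_.erase(pgid);
  }

  unsigned in_use() const { return static_cast<unsigned>(held_.size()); }

private:
  unsigned max_slots_;
  std::set<int> held_;  // PGs currently holding a slot
};
```

With one slot, a second PG is rejected until the first releases its reservation.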
Mike Ryan [Tue, 2 Oct 2012 22:07:08 +0000 (15:07 -0700)]
message: add missing print statement for REJECT message

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Mike Ryan [Tue, 2 Oct 2012 21:18:54 +0000 (14:18 -0700)]
PG: correct sub-state names in ReplicaActive

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
Mike Ryan [Wed, 31 Oct 2012 18:36:49 +0000 (11:36 -0700)]
PG: requeue snap_trimmer after scrub finishes

Previously the snap_trimmer would continuously requeue itself until the
end of scrub. This degrades performance and fills up logs for No Good
Reason.

Signed-off-by: Mike Ryan <mike.ryan@inktank.com>
John Wilkins [Wed, 31 Oct 2012 21:12:21 +0000 (14:12 -0700)]
doc: tiny syntax fix.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
John Wilkins [Wed, 31 Oct 2012 21:11:50 +0000 (14:11 -0700)]
doc: Added internal anchor references.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
John Wilkins [Wed, 31 Oct 2012 21:11:12 +0000 (14:11 -0700)]
doc: using remote copy

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
Samuel Just [Wed, 31 Oct 2012 18:37:06 +0000 (11:37 -0700)]
Merge remote-tracking branch 'upstream/wip_dep_fix'

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Samuel Just [Wed, 31 Oct 2012 18:34:13 +0000 (11:34 -0700)]
README: add libboost-program-options-dev

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Wed, 31 Oct 2012 17:27:33 +0000 (10:27 -0700)]
configure.ac: add program_options header check

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Tue, 30 Oct 2012 20:31:45 +0000 (13:31 -0700)]
Merge branch 'wip_journal_perf'

Samuel Just [Mon, 22 Oct 2012 21:25:27 +0000 (14:25 -0700)]
ReplicatedPG: actually delay op for backfill_pos

3f952afe5da644b30015fead8e3d42a129b59989 neglected to
actually delay the op in ReplicatedPG::do_op.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Mon, 22 Oct 2012 18:09:18 +0000 (11:09 -0700)]
Finisher: add perf counter for queue len

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Tue, 16 Oct 2012 16:33:01 +0000 (09:33 -0700)]
FileJournal: rename queue_lock to finisher_lock

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Tue, 16 Oct 2012 16:25:07 +0000 (09:25 -0700)]
FileJournal: write_cond is not used

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Mon, 15 Oct 2012 22:39:55 +0000 (15:39 -0700)]
FileJournal: break writeq locking from queue_lock

This keeps the relatively long process of queueing finishers from
blocking op submission.

In submit_entry, we no longer check whether the journal is full before
placing the write in the writeq: committed_thru should work anyway,
and we don't want to grab the required lock.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Tue, 16 Oct 2012 19:32:20 +0000 (12:32 -0700)]
Throttle: reduce lock hold periods

Previously, we tended to dump a lot of log output under
the Throttle lock.  The log level for most log statements
has been reduced to 10.

Additionally, count and max are now atomic_t and can be
read without the Throttle lock.

Finally, most of the perf counter manipulations have been
moved outside of the lock.

Signed-off-by: Samuel Just <sam.just@inktank.com>
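A minimal sketch of the locking pattern described in the Throttle change: the counters are atomics, so they can be sampled without the lock, and waiters are woken after the lock is dropped. SimpleThrottle is a hypothetical stand-in written with standard C++ (std::atomic rather than Ceph's atomic_t); it is not the actual Throttle class.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical throttle whose counters are readable without the lock.
class SimpleThrottle {
public:
  explicit SimpleThrottle(int64_t max) : max_(max) {}

  // Lock-free reads, e.g. for stats or perf counters.
  int64_t get_current() const { return count_.load(); }
  int64_t get_max() const { return max_.load(); }

  // Block until c units fit under max (or the throttle is empty, so an
  // oversized request cannot wait forever), then take them.
  void get(int64_t c) {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [&] {
      return count_.load() + c <= max_.load() || count_.load() == 0;
    });
    count_ += c;
    // Logging and perf-counter updates would happen after l is released.
  }

  // Return c units; notify outside the critical section.
  void put(int64_t c) {
    {
      std::lock_guard<std::mutex> l(lock_);
      assert(count_.load() >= c);
      count_ -= c;
    }
    cond_.notify_all();
  }

private:
  std::mutex lock_;
  std::condition_variable cond_;
  std::atomic<int64_t> count_{0};
  std::atomic<int64_t> max_;
};
```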
Samuel Just [Thu, 11 Oct 2012 01:21:13 +0000 (18:21 -0700)]
os: instrument submit lock, apply lock, queue_lock, write_lock

Adds Mutex perfcounter tracking to mutexes of interest.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Wed, 10 Oct 2012 16:44:32 +0000 (09:44 -0700)]
FileStore: add op_throttle_lock

Avoid using op_tp lock for the op throttle.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Wed, 10 Oct 2012 16:43:57 +0000 (09:43 -0700)]
FileStore: don't lock op_tp in queue_op

Neither caller of queue_op can race.
1) in queue_transactions, already under submit lock
2) in _journaled_ahead, journal finisher is single threaded

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Mon, 22 Oct 2012 17:46:57 +0000 (10:46 -0700)]
perf_counters: add dec()

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Sat, 6 Oct 2012 00:33:36 +0000 (17:33 -0700)]
JournalingFileStore: move apply/commit sequencing to apply_manager

Syncing the filestore requires a stable commit point (i.e., all ops
up to applied_seq must have been applied).  Previously, we used
journal_lock to atomically block new applies while waiting for
the remaining ones to finish.  This creates unnecessary contention.
We now use apply_manager to manage that state atomically with its
own lock.

Signed-off-by: Samuel Just <sam.just@inktank.com>
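A minimal sketch of how an apply manager with its own lock can provide a stable commit point, assuming ops start applying in sequence order. The class and method names (ApplyManager, op_apply_start, op_apply_finish, commit_start, commit_started) are hypothetical illustrations, not a copy of Ceph's code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical apply/commit sequencer with its own lock.
class ApplyManager {
public:
  // Before applying op `seq`: wait if a commit is collecting a stable point.
  void op_apply_start(uint64_t seq) {
    std::unique_lock<std::mutex> l(lock_);
    cond_.wait(l, [&] { return !blocked_; });
    assert(seq > last_started_);  // ops are started in submission order
    last_started_ = seq;
    ++open_ops_;
  }

  // After op `seq` has been applied to the filesystem.
  void op_apply_finish(uint64_t seq) {
    {
      std::lock_guard<std::mutex> l(lock_);
      assert(open_ops_ > 0);
      --open_ops_;
      if (seq > applied_seq_)
        applied_seq_ = seq;
    }
    cond_.notify_all();
  }

  // Block new applies and drain in-flight ones. Because ops start in seq
  // order, once nothing is open every op up to applied_seq_ is applied.
  uint64_t commit_start() {
    std::unique_lock<std::mutex> l(lock_);
    blocked_ = true;
    cond_.wait(l, [&] { return open_ops_ == 0; });
    return applied_seq_;
  }

  // The sync has captured its stable point; let applies resume.
  void commit_started() {
    {
      std::lock_guard<std::mutex> l(lock_);
      blocked_ = false;
    }
    cond_.notify_all();
  }

private:
  std::mutex lock_;
  std::condition_variable cond_;
  bool blocked_ = false;
  int open_ops_ = 0;
  uint64_t last_started_ = 0;
  uint64_t applied_seq_ = 0;
};
```

Only apply bookkeeping contends on this lock; journal submission never has to take it.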
Samuel Just [Fri, 5 Oct 2012 20:46:13 +0000 (13:46 -0700)]
JournalingFileStore: create submit_manager to order op submission

Previously, we ensured op ordering by queueing for journal and
the op queue under the journal lock.  All that is required is
that obtaining an op sequence, queueing for journal, and
(in parallel journal mode) queueing for application to the fs are done
atomically.  To that end, submit_manager now handles op submission.

Signed-off-by: Samuel Just <sam.just@inktank.com>
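A minimal sketch of the atomic submission step: one small lock covers "take a sequence number, queue for the journal, and (in parallel mode) queue for apply", so both queues see ops in the same order. SubmitManager, QueuedOp and the string payload are hypothetical illustrations.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>

// Hypothetical op, tagged with its submission sequence number.
struct QueuedOp {
  uint64_t seq;
  std::string payload;
};

class SubmitManager {
public:
  explicit SubmitManager(bool parallel) : parallel_(parallel) {}

  uint64_t submit(const std::string& payload) {
    std::lock_guard<std::mutex> l(lock_);  // the whole step is atomic
    uint64_t seq = ++op_seq_;
    assert(journal_q_.empty() || journal_q_.back().seq < seq);
    journal_q_.push_back({seq, payload});
    if (parallel_)  // parallel mode also queues the fs apply immediately
      apply_q_.push_back({seq, payload});
    return seq;
  }

  const std::deque<QueuedOp>& journal_queue() const { return journal_q_; }
  const std::deque<QueuedOp>& apply_queue() const { return apply_q_; }

private:
  std::mutex lock_;
  bool parallel_;
  uint64_t op_seq_ = 0;
  std::deque<QueuedOp> journal_q_;
  std::deque<QueuedOp> apply_q_;
};
```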
Samuel Just [Fri, 5 Oct 2012 23:26:35 +0000 (16:26 -0700)]
JournalingObjectStore: remove force_commit, no longer needed

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Fri, 5 Oct 2012 23:12:36 +0000 (16:12 -0700)]
JournalingObjectStore: whitespace fix

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Thu, 2 Aug 2012 16:39:08 +0000 (09:39 -0700)]
FileStore: remove trigger_commit

This is no longer used.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Tue, 31 Jul 2012 16:04:40 +0000 (09:04 -0700)]
JournalingFileStore: pass -1 as the alignment if unimportant

Previously, data_align began at 0 and remained that way if no
transaction contained a large data segment.  This 0 was propagated
to prepare_single_write, which padded out most of a page to ensure
that the bl started with 0 alignment.  Passing -1 will ensure that
we don't prepad these small segments.

Signed-off-by: Samuel Just <sam.just@inktank.com>
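To make the padding arithmetic concrete, here is a simplified model assuming 4096-byte pages: with an alignment of 0, a small entry written at buffer offset 100 would be prepadded by 3996 bytes, while -1 skips prepadding entirely. prepad_for and kPage are hypothetical; this is not prepare_single_write itself.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of the prepadding decision (hypothetical helper).
constexpr int64_t kPage = 4096;

// Bytes to insert before an entry written at buffer offset `pos` so that
// its data lands at `align` modulo the page size. align < 0 means the
// caller doesn't care about alignment, so nothing is prepadded.
int64_t prepad_for(int64_t pos, int64_t align) {
  assert(pos >= 0);
  if (align < 0)
    return 0;
  int64_t want = align % kPage;
  int64_t have = pos % kPage;
  return (want - have + kPage) % kPage;
}
```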
Samuel Just [Wed, 17 Oct 2012 20:06:51 +0000 (13:06 -0700)]
FileStore: next_finish is not used

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Tue, 23 Oct 2012 05:04:40 +0000 (22:04 -0700)]
test/bench: add tp bench

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Sat, 6 Oct 2012 20:58:37 +0000 (13:58 -0700)]
test/bench: small io benchmarker

Precreates objects and does writes to random offsets within
random objects.

Includes rados, filestore, and vanilla fs variants.

Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Thu, 11 Oct 2012 01:20:31 +0000 (18:20 -0700)]
Mutex: Instrument Mutex with perfcounter for Lock() wait

Signed-off-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Tue, 30 Oct 2012 20:19:30 +0000 (13:19 -0700)]
msg/SimpleMessenger: start accepter in ready()

Start the accepter thread when the first dispatcher is ready.  This ensures
that there will be someone around to verify authorizers for incoming
connections, and means we have a bit less failure noise on the monitors
as a result.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 30 Oct 2012 20:16:57 +0000 (13:16 -0700)]
mon: separate pre- and post-fork init

Do most init pre-fork, then do the last little bit (start up messenger,
bootstrap) post-fork.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 30 Oct 2012 20:08:57 +0000 (13:08 -0700)]
msg/Pipe: fix seq # fix

02f6262f47f72178a78d410f4facab7bbc97b098 got this all wrong (though it
worked by accident).

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 30 Oct 2012 19:49:53 +0000 (12:49 -0700)]
osd: verify authorizers for heartbeat dispatcher

This broke when 100fcca3cb54c97c4332328aad67d4b796f33ec2 fixed the
messenger's behavior for dispatchers with missing verify_authorizer methods.

Signed-off-by: Sage Weil <sage@inktank.com>
Josh Durgin [Tue, 30 Oct 2012 19:34:19 +0000 (12:34 -0700)]
doc: fix typo in cinder upstart config name

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
John Wilkins [Tue, 30 Oct 2012 18:20:51 +0000 (11:20 -0700)]
doc: Added syntax fixes to Peter's session authentication doc.

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
Sage Weil [Tue, 30 Oct 2012 17:00:54 +0000 (10:00 -0700)]
msg/Pipe: whitespace cleanup

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 30 Oct 2012 17:00:42 +0000 (10:00 -0700)]
msg/Pipe: only randomize start seq #'s if MSG_AUTH feature is present

The kernel client expects sequence numbers to start at 1 and misbehaves otherwise.
So only randomize these values if the MSG_AUTH feature is present; that is
the only time it matters anyway.

Signed-off-by: Sage Weil <sage@inktank.com>
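A minimal sketch of the feature-gated choice of starting sequence number. The feature bit value and the 0x7fffffff bound are placeholders chosen for illustration, not Ceph's real CEPH_FEATURE_* definition or range; the point is only that peers without MSG_AUTH get a counter starting at 0, so the first message carries seq 1.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Placeholder bit; NOT Ceph's actual MSG_AUTH feature bit.
constexpr uint64_t kFeatureMsgAuthPlaceholder = 1ULL << 40;

// Choose the initial out_seq for a new connection (hypothetical helper).
uint64_t initial_out_seq(uint64_t peer_features, std::mt19937_64& rng) {
  if (peer_features & kFeatureMsgAuthPlaceholder) {
    // Randomize, bounded so that incrementing cannot come near wrapping.
    std::uniform_int_distribution<uint64_t> dist(0, 0x7fffffff);
    uint64_t s = dist(rng);
    assert(s <= 0x7fffffff);
    return s;
  }
  return 0;  // legacy peers (e.g. the kernel client): first message is seq 1
}
```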
Sage Weil [Mon, 29 Oct 2012 20:01:06 +0000 (13:01 -0700)]
doc: update fs recommendations

Recommend XFS more forcefully, and warn more strongly against using btrfs in
production deployments.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 29 Oct 2012 22:48:15 +0000 (15:48 -0700)]
cephx: don't check signature if MSG_AUTH feature isn't present

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Mon, 29 Oct 2012 22:47:45 +0000 (15:47 -0700)]
auth: include features in cephx SessionHandler

Signed-off-by: Sage Weil <sage@inktank.com>
Peter Reiher [Mon, 29 Oct 2012 21:36:08 +0000 (14:36 -0700)]
Fixed problem with checking authorizer in accept().

Signed-off-by: Peter Reiher <reiher@inktank.com>
Dan Mick [Mon, 29 Oct 2012 18:03:15 +0000 (11:03 -0700)]
librbd: Fix 32-bit compilation errors

Switch size_t in clip_io to uint64_t; it's just easier, and the
alternative would be to limit 32-bit builds to sizes <= 4GB.

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Peter Reiher [Mon, 29 Oct 2012 19:47:18 +0000 (12:47 -0700)]
Merge branch 'master' of github.com:ceph/ceph

Peter Reiher [Mon, 29 Oct 2012 19:42:29 +0000 (12:42 -0700)]
Temporary patch for a problem in Pipe related to monitor initialization.

Signed-off-by: Peter Reiher <reiher@inktank.com>
Sage Weil [Mon, 29 Oct 2012 19:37:08 +0000 (12:37 -0700)]
Merge branch 'wip-oc-neg'

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Gary Lowell [Mon, 29 Oct 2012 16:55:33 +0000 (09:55 -0700)]
dep-report.sh: ceph package dependency report.

This script searches the ceph build area for dependent header files and
libraries to attempt to identify ceph package dependencies.

Sam Lang [Mon, 29 Oct 2012 15:30:01 +0000 (10:30 -0500)]
client: Fix ref counting double free with hardlink

Performing a hard link through the libcephfs interface causes
a double free on shutdown, because Client::link drops a reference on
the inode of the target's parent directory.  This fix removes the
put_inode(dir) call, to match the behavior of Client::ll_link.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sam Lang [Fri, 19 Oct 2012 16:38:33 +0000 (11:38 -0500)]
test: Functional test for hardlink/unmount pattern

This test currently breaks on libcephfs as reported
in #3367.

Signed-off-by: Sam Lang <sam.lang@inktank.com>
Sage Weil [Sat, 27 Oct 2012 20:56:24 +0000 (13:56 -0700)]
osdc/ObjectCacher: remove dead locking code

This is unused, and mostly broken in that there is no cleanup when there
is a failure.  Also, the support in the OSD has been largely removed.

Signed-off-by: Sage Weil <sage@inktank.com>
Dan Mick [Tue, 23 Oct 2012 04:15:51 +0000 (21:15 -0700)]
librbd: clip requests past end-of-image.

Rename check_io to clip_io, which can modify the passed-in length
to clamp it to the device size.  This is expected behavior for
block-device emulation.

Call clip_io in rbd_write(); we need to return the clipped length there,
even though aio_write() also calls clip_io() (for the direct path).

Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
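A minimal sketch of clamping an I/O to the end of an image, using uint64_t throughout so 32-bit builds still handle images larger than 4GB. The signature and the convention of returning false for an offset past the end are assumptions for illustration; librbd's own clip_io may report that case differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical clamp: shorten *len so [off, off + *len) stays inside the
// image. Returns false if the request starts beyond the end of the image.
bool clip_io(uint64_t image_size, uint64_t off, uint64_t* len) {
  assert(len != nullptr);
  if (off > image_size)
    return false;
  if (*len > image_size - off)  // written this way to avoid off + *len overflow
    *len = image_size - off;
  return true;
}
```

A write of 100 bytes at offset 950 of a 1000-byte image is clipped to 50 bytes, which is the length rbd_write() would then report back.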
Sage Weil [Sat, 27 Oct 2012 00:12:44 +0000 (17:12 -0700)]
librbd: size max objects based on actual image object order size

This has to happen after we open the image.

Signed-off-by: Sage Weil <sage@inktank.com>
caleb miles [Fri, 26 Oct 2012 19:17:05 +0000 (15:17 -0400)]
rgw_cache: change call signature to override rgw_rados put_obj_meta()

Signed-off-by: caleb miles <caleb.miles@inktank.com>
Peter Reiher [Fri, 26 Oct 2012 22:32:48 +0000 (15:32 -0700)]
Merge branch 'master' of github.com:ceph/ceph

John Wilkins [Fri, 26 Oct 2012 21:49:00 +0000 (14:49 -0700)]
Merge branch 'master' of https://github.com/ceph/ceph

John Wilkins [Fri, 26 Oct 2012 21:45:08 +0000 (14:45 -0700)]
doc: updated front page graphic.

fixes: #3412

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
Noah Watkins [Fri, 26 Oct 2012 21:37:25 +0000 (14:37 -0700)]
Merge branch 'wip-java-cephfs'

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Reviewed-by: Joe Buck <joe.buck@inktank.com>
Jim Schutt [Thu, 27 Sep 2012 21:56:15 +0000 (15:56 -0600)]
PG: Do not discard op data too early

Under a sustained cephfs write load where the offered load is higher
than the storage cluster write throughput, a backlog of replication ops
that arrive via the cluster messenger builds up.  The client message
policy throttler, which should be limiting the total write workload
accepted by the storage cluster, is unable to prevent it, for any
value of osd_client_message_size_cap, under such an overload condition.

The root cause is that op data is released too early, in op_applied().

If instead the op data is released at op deletion, then the limit
imposed by the client policy throttler applies over the entire
lifetime of the op, including commits of replication ops.  That
makes the policy throttler an effective means for an OSD to
protect itself from a sustained high offered load, because it can
effectively limit the total, cluster-wide resources needed to process
in-progress write ops.

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
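A minimal sketch of why releasing op data at op deletion makes the throttle effective: the op owns its share of the byte budget until its destructor runs, so a locally applied op that replicas still need keeps counting against the cap. Budget, Op and admit are hypothetical toys standing in for the client message policy throttler, not Ceph types.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Toy byte budget standing in for the client message throttler.
struct Budget {
  int64_t max;
  int64_t in_use = 0;
  explicit Budget(int64_t m) : max(m) {}
  bool try_take(int64_t n) {
    if (in_use + n > max)
      return false;
    in_use += n;
    return true;
  }
  void give_back(int64_t n) {
    in_use -= n;
    assert(in_use >= 0);
  }
};

// An op owns its share of the budget until it is destroyed.
class Op {
public:
  Op(Budget& b, int64_t bytes) : budget_(b), bytes_(bytes) {}
  Op(const Op&) = delete;             // copying would release twice
  Op& operator=(const Op&) = delete;
  ~Op() { budget_.give_back(bytes_); }  // released at op deletion
  void on_applied() {}  // local apply done; bytes deliberately still held
private:
  Budget& budget_;
  int64_t bytes_;
};

// Admit an op only if its bytes fit under the cap.
std::unique_ptr<Op> admit(Budget& b, int64_t bytes) {
  if (!b.try_take(bytes))
    return nullptr;  // caller would wait or back off
  return std::unique_ptr<Op>(new Op(b, bytes));
}
```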
Noah Watkins [Fri, 26 Oct 2012 20:28:52 +0000 (13:28 -0700)]
java: use unique directory in test

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Thu, 25 Oct 2012 22:10:17 +0000 (15:10 -0700)]
java: add tests for double mounting

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Thu, 25 Oct 2012 22:09:54 +0000 (15:09 -0700)]
java: add AlreadyMounted exception

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Thu, 25 Oct 2012 21:42:27 +0000 (14:42 -0700)]
java: remove deprecated ceph_shutdown

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Thu, 25 Oct 2012 21:43:09 +0000 (14:43 -0700)]
java: clean-up in finalize()

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Thu, 25 Oct 2012 21:23:18 +0000 (14:23 -0700)]
java: enable ceph_release

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Thu, 25 Oct 2012 21:10:24 +0000 (14:10 -0700)]
java: enable ceph_unmount

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Sat, 20 Oct 2012 17:58:23 +0000 (10:58 -0700)]
java: mkdirs returns IOException

For example, CephFileAlreadyExistsException may be thrown if mkdirs is
called to create a directory that is already present.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Thu, 25 Oct 2012 15:51:33 +0000 (08:51 -0700)]
java: log listdir contents in java client

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Fri, 19 Oct 2012 19:22:05 +0000 (12:22 -0700)]
java: remove tabs to fix formatting

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Fri, 19 Oct 2012 19:20:40 +0000 (12:20 -0700)]
java: add O_WRONLY open flag

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Noah Watkins [Fri, 19 Oct 2012 19:10:25 +0000 (12:10 -0700)]
java: add FileAlreadyExists exception

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Sage Weil [Fri, 26 Oct 2012 18:55:34 +0000 (11:55 -0700)]
osdc/ObjectCacher: handle zero bufferheads on read

Interpret a zero bufferhead as zeros in _readx().

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 26 Oct 2012 18:54:50 +0000 (11:54 -0700)]
osdc/ObjectCacher: add ZERO bufferheads from map_read()

When we add a bufferhead with zeros to the Object data map, use the new
zero type instead of allocating actual zeros.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 26 Oct 2012 18:48:51 +0000 (11:48 -0700)]
osdc/ObjectCacher: add zero bufferhead state

Wired up, but not yet used.

Treat these as clean.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 26 Oct 2012 18:33:31 +0000 (11:33 -0700)]
test_librbd_fsx: sleep before exit

This gives the log time to flush to disk.  Kludgey!

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 26 Oct 2012 18:32:44 +0000 (11:32 -0700)]
osdc/ObjectCacher: some extra debugging

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 24 Oct 2012 21:42:50 +0000 (14:42 -0700)]
osdc/ObjectCacher: fill in zero buffers in map_read() on miss if complete

If we know we have the complete object in cache, fill in zero buffers
when we miss.

Signed-off-by: Sage Weil <sage@inktank.com>
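A minimal sketch of the "complete" idea: once every byte of an object is known to be cached (for example after an ENOENT), a miss must be a hole and can be served as zeros instead of going to the OSD; discarding buffers clears the flag. CachedObject and its byte-granular map are hypothetical toys, far simpler than ObjectCacher's bufferheads.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy per-object cache (hypothetical; not ObjectCacher::Object).
struct CachedObject {
  std::map<uint64_t, char> bytes;  // offset -> cached byte (toy granularity)
  bool complete = false;           // true: we hold *all* data for the object

  // Fills *out and returns true if the read can be served locally.
  bool read(uint64_t off, uint64_t len, std::string* out) const {
    assert(out != nullptr);
    out->clear();
    for (uint64_t i = off; i < off + len; ++i) {
      auto it = bytes.find(i);
      if (it != bytes.end())
        out->push_back(it->second);
      else if (complete)
        out->push_back('\0');  // known hole: fill with zeros
      else
        return false;          // genuine miss: must read from the OSD
    }
    return true;
  }

  void on_enoent() { complete = true; }  // object absent: everything is a hole
  void on_trim() {                       // buffers discarded: no longer complete
    bytes.clear();
    complete = false;
  }
};
```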
Sage Weil [Wed, 24 Oct 2012 21:43:03 +0000 (14:43 -0700)]
osdc/ObjectCacher: improve debug output for readx()

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 24 Oct 2012 21:41:38 +0000 (14:41 -0700)]
osdc/ObjectCacher: set complete flag when we observe ENOENT

If we observe an ENOENT on a read, set the complete flag.  Any dirty
buffers we have will still be in memory, even if the writes are in flight,
because the TX state remains pinned until the writes commit.  Writes cannot
proceed faster than reads, even though reads may proceed faster than
writes.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 24 Oct 2012 21:36:05 +0000 (14:36 -0700)]
osdc/ObjectCacher: clear complete on trim, release

Clear the complete flag when we are discarding buffers.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 24 Oct 2012 21:35:24 +0000 (14:35 -0700)]
osdc/ObjectCacher: add complete flag

This is set when we know we have *all* the data for this object.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 24 Oct 2012 19:48:02 +0000 (12:48 -0700)]
osdc/ObjectCacher: refresh iterator in read apply loop

The p iterator points to the next bh, but try_merge_bh() at the end of the
loop might merge that into our result and invalidate the iterator.  Fix
this by repeating the lookup on each pass through the loop.

Signed-off-by: Sage Weil <sage@inktank.com>
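A minimal sketch of the iterator hazard and the fix, using a std::map of extents (start to length) as a stand-in for the bufferhead map. try_merge() erases the following entry when it merges, which would leave a saved "next" iterator dangling, so the loop repeats the lookup with lower_bound() on every pass. The names are hypothetical, not ObjectCacher's.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

using ExtentMap = std::map<uint64_t, uint64_t>;  // start -> length

// Fold the extent right after `start` into it if they touch.
bool try_merge(ExtentMap& m, uint64_t start) {
  auto it = m.find(start);
  assert(it != m.end());
  auto next = std::next(it);
  if (next == m.end() || it->first + it->second != next->first)
    return false;
  it->second += next->second;
  m.erase(next);  // any saved iterator to `next` is now dangling
  return true;
}

// Walk extents from `from`, merging neighbours, without keeping an
// iterator alive across try_merge().
void merge_from(ExtentMap& m, uint64_t from) {
  uint64_t pos = from;
  for (;;) {
    auto p = m.lower_bound(pos);  // fresh lookup on every pass
    if (p == m.end())
      break;
    uint64_t start = p->first;
    while (try_merge(m, start)) {
      // keep absorbing adjacent extents
    }
    pos = start + 1;  // resume after this (possibly grown) extent
  }
}
```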
Sage Weil [Wed, 24 Oct 2012 19:44:25 +0000 (12:44 -0700)]
osdc/ObjectCacher: do read completions after assimilating read result

Wait until we have applied the entire read result to the cache before we
trigger any read completion events.  This is a cleaner and safer approach
since we can be sure that the callback won't get blocked again on data we
have but haven't applied yet.  It also fixes a crash I just observed where
the completion did a read, called trim(), and invalidated/destroyed the
iterator/bh p was referencing.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 23 Oct 2012 16:20:53 +0000 (09:20 -0700)]
osdc/ObjectCacher: do not close objects explicitly

Let the trimmer do that.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 23 Oct 2012 16:20:35 +0000 (09:20 -0700)]
osdc/ObjectCacher: make trim() trim Objects

Pull unpinned objects off the LRU in trim().  This never happens currently
due to all the explicit calls to close_object()...

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 23 Oct 2012 16:18:04 +0000 (09:18 -0700)]
osdc/ObjectCacher: check lru_is_expireable() in can_close()

We assert that if can_close(), the Object isn't pinned in the LRU.  This
assumes we did our get/put refcounting properly, such that the pins are
at least as restrictive as can_close().

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 23 Oct 2012 12:58:27 +0000 (05:58 -0700)]
osdc/ObjectCacher: add LRU for Object

Incomplete; we aren't trimming yet.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 23 Oct 2012 13:04:08 +0000 (06:04 -0700)]
osdc/ObjectCacher: take Object ref for bh writes

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 23 Oct 2012 13:03:09 +0000 (06:03 -0700)]
osdc/ObjectCacher: take refs for inflight lock ops

These are all dead/unused; should probably just rip out this code!

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 23 Oct 2012 12:55:50 +0000 (05:55 -0700)]
osdc/ObjectCacher: take Object ref when there are buffers

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 23 Oct 2012 12:55:23 +0000 (05:55 -0700)]
osdc/ObjectCacher: add ref count to Object

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 23 Oct 2012 12:42:37 +0000 (05:42 -0700)]
osdc/ObjectCacher: rename lru_* -> bh_lru_*

We'll be adding LRUs for objects, too.

Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 26 Oct 2012 18:30:06 +0000 (11:30 -0700)]
librbd: fix race in AioCompletions that are still being built

When caching is enabled, it is possible for the io completion to happen
faster than we call ->finish_adding_requests() (e.g., on cache read).
When that happens, the final read request completion doesn't see a
pending_count == 0 and thus doesn't do all the final buffer construction
that is necessary to return correct data.  In particular, users will see
zeroed buffers.  test_librbd_fsx is turning this up consistently after
several thousand ops with an image size of ~100MB and cloning disabled.

This was introduced by the extra logic added here for striping.

Fix this by making a separate flag to indicate the completion is under
construction, and make sure we call complete() when both pending_count==0
and building==false.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
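A minimal sketch of the fix's rule: the completion fires only when no requests are pending and the submitter has finished adding requests, so a fast (e.g. cached) request completion cannot trigger it early. Completion and its methods are hypothetical, loosely named after the commit message; this is not librbd's AioCompletion.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical completion that may see request completions while it is
// still being built.
class Completion {
public:
  void add_request() {
    std::lock_guard<std::mutex> l(lock_);
    assert(building_);
    ++pending_count_;
  }
  void finish_adding_requests() {
    std::lock_guard<std::mutex> l(lock_);
    building_ = false;
    maybe_complete();
  }
  void complete_request() {
    std::lock_guard<std::mutex> l(lock_);
    assert(pending_count_ > 0);
    --pending_count_;
    maybe_complete();
  }
  bool done() const {
    std::lock_guard<std::mutex> l(lock_);
    return done_;
  }

private:
  // Caller holds lock_. Final buffer assembly would happen here.
  void maybe_complete() {
    if (pending_count_ == 0 && !building_ && !done_)
      done_ = true;
  }

  mutable std::mutex lock_;
  int pending_count_ = 0;
  bool building_ = true;
  bool done_ = false;
};
```

Without the building flag, the first complete_request() below would already see pending_count == 0 and finish before the completion was fully constructed.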
Peter Reiher [Fri, 26 Oct 2012 16:25:15 +0000 (09:25 -0700)]
Merge branch 'wip-msgauth4'

Conflicts:
src/common/config_opts.h
Added a couple of options related to session authentication, and accepted the new option values from master.

Noah Watkins [Fri, 26 Oct 2012 16:07:19 +0000 (09:07 -0700)]
Merge branch 'wip-client-unmount'

Signed-off-by: Noah Watkins <noah.watkins@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Peter Reiher [Fri, 26 Oct 2012 15:57:29 +0000 (08:57 -0700)]
Various cleanup changes to session authentication code.

Signed-off-by: Peter Reiher <reiher@inktank.com>
Noah Watkins [Thu, 25 Oct 2012 20:00:05 +0000 (13:00 -0700)]
client: add ceph_release, ceph_shutdown

Notes that ceph_shutdown() is now deprecated.

Signed-off-by: Noah Watkins <noahwatkins@gmail.com>