git.apps.os.sepia.ceph.com Git - ceph.git/log
pybind/rados.py: improve error output (1163/head)
Yehuda Sadeh [Thu, 30 Jan 2014 00:05:17 +0000 (16:05 -0800)]
pybind/rados.py: improve error output

Fixes: 7264
When failing to load librados, output the exception.

Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Merge pull request #1156 from ceph/wip-ceph-disk
Alfredo Deza [Wed, 29 Jan 2014 15:25:07 +0000 (07:25 -0800)]
Merge pull request #1156 from ceph/wip-ceph-disk

ceph-disk: run the right executables from udev

ceph-disk: run the right executables from udev (1156/head)
Josh Durgin [Wed, 29 Jan 2014 01:26:58 +0000 (17:26 -0800)]
ceph-disk: run the right executables from udev

When run by the udev rules, PATH is not defined. Thus,
ceph-disk-activate relies on its which() function to locate the
correct executable.  The which() function used os.defpath if none was
set, and this worked for anything using it.

ad6b4b4b08b6ef7ae8086f2be3a9ef521adaa88c added a new default value to
PATH, so only /usr/bin was checked by callers that did not use
which(). This resulted in the mount command not being found when
ceph-disk-activate was run by udev, and thus osds failing to start
after being prepared by ceph-deploy.

Make ceph-disk consistently use the existing helpers (command() and
command_check_call()) that use which(), so lack of PATH does not
matter. Simplify _check_output() to use command(),
another wrapper around subprocess.Popen.
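
A minimal C++ sketch of the lookup strategy described above (ceph-disk itself is Python, so this is not its code): resolve a bare command name against PATH, and fall back to a default search path when PATH is unset, as it is under udev. The name `find_executable` and the `/bin:/usr/bin` fallback (roughly modelled on Python's `os.defpath` on POSIX) are assumptions for illustration.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <unistd.h>  // access(), X_OK

// Hypothetical which()-style helper: resolve `name` to an executable path.
// If path_env is null (PATH unset, e.g. when run from udev), search a
// default path instead of failing.
std::optional<std::string> find_executable(const std::string& name,
                                           const char* path_env) {
  const std::string search = path_env ? path_env : "/bin:/usr/bin";
  std::istringstream dirs(search);
  std::string dir;
  while (std::getline(dirs, dir, ':')) {
    if (dir.empty()) dir = ".";  // an empty PATH entry means the current dir
    const std::string candidate = dir + "/" + name;
    if (access(candidate.c_str(), X_OK) == 0) return candidate;
  }
  return std::nullopt;  // not found anywhere on the search path
}
```

A caller that instead execs a bare name depends on the process's own PATH, which is the inconsistency removed by routing every command through which().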

Fixes: #7258
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
osd: OSDMonitor: ignore pgtemps from removed pool
Joao Eduardo Luis [Tue, 28 Jan 2014 15:54:33 +0000 (15:54 +0000)]
osd: OSDMonitor: ignore pgtemps from removed pool

There's a window between receiving an MOSDPGTemp message from an OSD and
actually handling it during which the pool the pg temps refer to may cease
to exist.  This may happen if the MOSDPGTemp message is queued pending
dispatch due to an ongoing proposal (maybe even the pool removal).

This patch fixes such behavior in two phases, preprocess (steps 1-2) and
prepare (steps 3-4):

1. Check if the pool exists in the osdmap upon preprocessing
 - if pool does not exist in the osdmap, then the pool must have been
   removed prior to handling the message, but after the osd sent it.
 - safe to ignore the pg update
2. If all pg updates in the message have been ignored, ignore the whole
   message.  Otherwise, let prepare handle the rest.

3. Recheck if pool exists in the osdmap upon prepare
 - We may have ignored this pg back in preprocess, but other pgs in the
   message may have led the message to be passed on to prepare; ignore
   pg update once more.
4. Check if pool is pending removal and ignore pg update if so.

We delegate checking the pending value to prepare_pgtemp() because in this
case we should only ignore the update IFF the pending value is in fact
committed.  Otherwise we should retry the message.  prepare_pgtemp() is
the appropriate place to do so.
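
The preprocess/prepare split above can be sketched as two filters over the message's updates. This is a simplified, hypothetical model (`PgTempUpdate`, `preprocess_can_ignore` and `prepare_filter` are illustrative names, not OSDMonitor's API), and it leaves out the retry that happens if the pending removal is not committed.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified pg temp update: which pool, which PG, new acting set.
struct PgTempUpdate {
  int64_t pool;
  uint32_t seed;
  std::vector<int> acting;
};

// Preprocess (steps 1-2): true means every update refers to a pool already
// gone from the committed osdmap, so the whole message can be ignored.
bool preprocess_can_ignore(const std::vector<PgTempUpdate>& updates,
                           const std::set<int64_t>& committed_pools) {
  for (const auto& u : updates)
    if (committed_pools.count(u.pool)) return false;  // let prepare handle it
  return true;
}

// Prepare (steps 3-4): keep only updates whose pool still exists and is not
// pending removal in the proposal being built.
std::vector<PgTempUpdate> prepare_filter(
    const std::vector<PgTempUpdate>& updates,
    const std::set<int64_t>& committed_pools,
    const std::set<int64_t>& pending_removed_pools) {
  std::vector<PgTempUpdate> kept;
  for (const auto& u : updates) {
    if (!committed_pools.count(u.pool)) continue;       // step 3: pool gone
    if (pending_removed_pools.count(u.pool)) continue;  // step 4: being removed
    kept.push_back(u);
  }
  return kept;
}
```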

Fixes: 7116
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit f513f66f48383a07c70ca18a4dba6c2449ea9860)

buffer: make 0-length splice() a no-op
Sage Weil [Tue, 28 Jan 2014 18:26:12 +0000 (10:26 -0800)]
buffer: make 0-length splice() a no-op

This was causing a problem in the Striper, but fixing it here will avoid
corner cases all over the tree.  Note that we have to bail out before
the end-of-buffer check to avoid hitting that check when the bufferlist is
also empty.
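
A minimal sketch of the ordering that matters here, using a hypothetical `ByteList` backed by `std::string` rather than Ceph's bufferlist: the zero-length check must come before the bounds check, otherwise `splice(0, 0)` on an empty list (where the offset equals the size) would be rejected.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a bufferlist: just a byte string.
struct ByteList {
  std::string data;

  // Remove `len` bytes starting at `off`, optionally moving them to `claim`.
  void splice(std::size_t off, std::size_t len, ByteList* claim = nullptr) {
    // Zero-length splice is a no-op; bail out before the end-of-buffer
    // check so an empty list with off == 0 does not trip it.
    if (len == 0) return;
    if (off >= data.size() || len > data.size() - off)
      throw std::out_of_range("splice past end of buffer");
    if (claim) claim->data.append(data, off, len);
    data.erase(off, len);
  }
};
```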

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
osdc/Striper: test zero-length add_partial_result
Sage Weil [Tue, 28 Jan 2014 18:09:17 +0000 (10:09 -0800)]
osdc/Striper: test zero-length add_partial_result

If we add a partial result that is 0-length, we used to hit an assert in
buffer::list::splice().  Add a unit test to verify the fix.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
packaging: apply udev hack rule to RHEL
Derek Yarnell [Mon, 27 Jan 2014 19:27:51 +0000 (12:27 -0700)]
packaging: apply udev hack rule to RHEL

In the RPM spec file there is a test to deploy the uuid hack udev rules
on operating systems with an older udev. This includes CentOS and RHEL,
but the check currently covers only CentOS, causing RHEL clients to get a
bogus osd rules file.

Adjust the conditional to apply to RHEL as well as CentOS. (The %{rhel}
macro is defined in both platforms' redhat-rpm-config package.)

Fixes http://tracker.ceph.com/issues/7245

Signed-off-by: Ken Dreyer <ken.dreyer@inktank.com>
(cherry picked from commit 64a0b4fa563795bc22753940aa3a4a2946113109)

Merge pull request #1132 from ceph/wip-erasure-rule
Loic Dachary [Thu, 23 Jan 2014 18:22:49 +0000 (10:22 -0800)]
Merge pull request #1132 from ceph/wip-erasure-rule

osd/OSDMap: do not create erasure rule by default

Reviewed-by: Greg Farnum <greg@inktank.com>
Reviewed-by: Loic Dachary <loic@dachary.org>
osd/OSDMap: do not create erasure rule by default (1132/head)
Sage Weil [Thu, 23 Jan 2014 17:16:54 +0000 (09:16 -0800)]
osd/OSDMap: do not create erasure rule by default

If we do, we will require the v2 feature bit from clients.

We could only include feature bits for rules that are actually referenced
by pools, but for now making the user create the rule is simpler.  There is
no need to create this rule ahead of time.

Signed-off-by: Sage Weil <sage@inktank.com>
added perl script for rgw bucket quota tests
tamil [Wed, 22 Jan 2014 02:51:49 +0000 (18:51 -0800)]
added perl script for rgw bucket quota tests

Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
removing rgw_tests.sh
tamil [Wed, 22 Jan 2014 02:44:57 +0000 (18:44 -0800)]
removing rgw_tests.sh

Signed-off-by: tamil <tamil.muthamizhan@inktank.com>
(cherry picked from commit 54caa0192b02b03549fe4ca5d062495e6e429f97)

Merge pull request #1123 from ceph/wip-stray-mdsmaps
Gregory Farnum [Tue, 21 Jan 2014 20:04:42 +0000 (12:04 -0800)]
Merge pull request #1123 from ceph/wip-stray-mdsmaps

mon/MDSMonitor: do not generate mdsmaps from already-laggy mds
Reviewed-by: Greg Farnum <greg@inktank.com>
mon/MDSMonitor: do not generate mdsmaps from already-laggy mds (1123/head)
Sage Weil [Tue, 21 Jan 2014 19:29:56 +0000 (11:29 -0800)]
mon/MDSMonitor: do not generate mdsmaps from already-laggy mds

There is one path where a mds that is not sending its beacon (e.g.,
because it is not running at all) will lead to proposal of new mdsmaps.
Fix it.

Backport: emperor, dumpling
Signed-off-by: Sage Weil <sage@inktank.com>
Merge pull request #1119 from ceph/wip-7184
Josh Durgin [Tue, 21 Jan 2014 18:34:20 +0000 (10:34 -0800)]
Merge pull request #1119 from ceph/wip-7184

osd: ignore num_objects_dirty for old pools

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
packaging: ship libdir/ceph
Ken Dreyer [Wed, 15 Jan 2014 00:56:32 +0000 (17:56 -0700)]
packaging: ship libdir/ceph

Automake puts ceph_common.sh into libdir/ceph, but the Red Hat packaging
was not capturing this file.

Add the libdir/ceph location to the RPM packaging.

Fixes #7117

(cherry picked from commit 2d0d48b829bd5721b7058ec43f61481fe8542b12)

common: fix bufferlist::append(istream) test
Loic Dachary [Tue, 14 Jan 2014 17:25:55 +0000 (18:25 +0100)]
common: fix bufferlist::append(istream) test

bufferlist::append(istream) now filters out empty lines; reflect this in
the test.
the test

Signed-off-by: Loic Dachary <loic@dachary.org>
(cherry picked from commit 4b5f2570e9208bc9311267e93401d6078370807a)

Merge branch 'bclibcoop/next-cors' of https://github.com/BCLibCoop/ceph into next
Yehuda Sadeh [Sun, 19 Jan 2014 05:48:30 +0000 (21:48 -0800)]
Merge branch 'bclibcoop/next-cors' of https://github.com/BCLibCoop/ceph into next

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
rgw: Use correct secret key for POST authn (1114/head)
Robin H. Johnson [Sun, 19 Jan 2014 02:01:20 +0000 (18:01 -0800)]
rgw: Use correct secret key for POST authn

The POST authentication by signature validation looked up a user based
on the access key, then used the first secret key for the user. If the
access key used was not the first access key, then the expected
signature would be wrong, and the POST would be rejected.
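
The fix amounts to selecting the secret that belongs to the access key presented with the request. A hypothetical sketch (the `User`/`AccessKey` types and `secret_for` are illustrative, not RGW's structures):

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified user record with several key pairs.
struct AccessKey {
  std::string id;      // access key (public identifier sent by the client)
  std::string secret;  // secret key used to compute the expected signature
};

struct User {
  std::vector<AccessKey> keys;
};

// Return the secret paired with the access key actually used, rather than
// taking the user's first secret key.
std::optional<std::string> secret_for(const User& user,
                                      const std::string& access_key) {
  for (const auto& k : user.keys)
    if (k.id == access_key) return k.secret;
  return std::nullopt;  // unknown access key: authentication must fail
}
```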

Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
rgw: Fix signature variable naming/failure print
Robin H. Johnson [Sun, 19 Jan 2014 01:52:01 +0000 (17:52 -0800)]
rgw: Fix signature variable naming/failure print

The signature variables for expected vs got are poorly named, and this
led to them being swapped in the signature validation failure print.
Change them to 'expected' and 'received' and make the related temporary
variables consistent to match.

Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
rgw: Document fields for access/secret key
Robin H. Johnson [Sun, 19 Jan 2014 01:49:06 +0000 (17:49 -0800)]
rgw: Document fields for access/secret key

The field name mapping for access vs secret key is not clear; documenting
it helped in debugging.

Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
osd: ignore num_objects_dirty for old pools (1119/head)
Sage Weil [Sun, 19 Jan 2014 05:19:58 +0000 (21:19 -0800)]
osd: ignore num_objects_dirty for old pools

Way back in a0ed9c20048750fd4b2c7ce0339fa8b20ef08ca3 we introduced the
dirty flag, but we did not track it in the stats until much later in
c561d5ea22c335e4c05fdc16ca8fb41f75f89d81.  Unfortunately this interval
spans the emperor release.  To avoid making scrub error out and require
repair on *any* of those old pools, flag stats that were encoded before
now such that the dirty stats are ignored.  Clear the flag if we *do*
do a repair so that it will be tracked properly thereafter.
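
A hypothetical sketch of the flag's lifecycle as described above: set when stats come from an older encoding, consulted by scrub, and cleared by repair. The names and the version comparison are assumptions, not Ceph's actual stats fields.

```cpp
#include <cstdint>

// Hypothetical stats record with a "dirty count is untrustworthy" flag.
struct PoolStats {
  int64_t num_objects_dirty = 0;
  bool dirty_stats_invalid = false;
};

// Stats encoded before dirty tracking existed: keep the value but mark it.
PoolStats decode_stats(int64_t stored_dirty, int encoding_version,
                       int first_version_tracking_dirty) {
  PoolStats s;
  s.num_objects_dirty = stored_dirty;
  s.dirty_stats_invalid = encoding_version < first_version_tracking_dirty;
  return s;
}

// Scrub only reports a dirty-count mismatch when the count is meaningful.
bool scrub_dirty_mismatch(const PoolStats& s, int64_t actual_dirty) {
  return !s.dirty_stats_invalid && s.num_objects_dirty != actual_dirty;
}

// Repair rewrites the count from the scrub result and resumes tracking.
void repair_dirty(PoolStats& s, int64_t actual_dirty) {
  s.num_objects_dirty = actual_dirty;
  s.dirty_stats_invalid = false;
}
```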

Signed-off-by: Sage Weil <sage@inktank.com>
Merge branch 'next'
Ken Dreyer [Tue, 14 Jan 2014 16:16:41 +0000 (16:16 +0000)]
Merge branch 'next'

Merge pull request #1076 from dachary/wip-vector-op
Loic Dachary [Tue, 14 Jan 2014 16:10:59 +0000 (08:10 -0800)]
Merge pull request #1076 from dachary/wip-vector-op

erasure-code: use uintptr_t instead of long long

Reviewed-by: Andreas Peters <andreas.joachim.peters@cern.ch>
Merge pull request #1078 from ceph/wip-mon-pgmap
Loic Dachary [Tue, 14 Jan 2014 06:38:09 +0000 (22:38 -0800)]
Merge pull request #1078 from ceph/wip-mon-pgmap

mon: make 'pg getmap' not include a trailing newline

Reviewed-by: Loic Dachary <loic@dachary.org>
Merge pull request #1071 from ceph/wip-max-file-size
Sage Weil [Tue, 14 Jan 2014 01:43:49 +0000 (17:43 -0800)]
Merge pull request #1071 from ceph/wip-max-file-size

allow mds max file size to be adjusted

Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Merge pull request #1058 from ceph/wip-cache-snap
Sage Weil [Tue, 14 Jan 2014 00:50:17 +0000 (16:50 -0800)]
Merge pull request #1058 from ceph/wip-cache-snap

snap/clone promotion, flush, and other goodies

This is now passing the thrashing with both cache and snap ops:
  sage-2014-01-13_15:45:26-rados:thrash-wip-cache-snap-testing-basic-plana

Reviewed-by: Samuel Just <sam.just@inktank.com>
osd/ReplicatedPG: use get_object_context in trim_object (1058/head)
Sage Weil [Mon, 13 Jan 2014 23:09:27 +0000 (15:09 -0800)]
osd/ReplicatedPG: use get_object_context in trim_object

find_object_context() has all the logic to choose a particular clone given
a logical snap.  In the trim case, we want none of that: we just need to
pull the obc for a specific clone instance.  Note that this changes
none of the failure cases (previously we asserted r == 0).

Signed-off-by: Sage Weil <sage@inktank.com>
ceph_test_rados: do not delete in-use snaps
Sage Weil [Fri, 10 Jan 2014 19:12:48 +0000 (11:12 -0800)]
ceph_test_rados: do not delete in-use snaps

There are a bunch of ops that read from snaps.  Do not delete a snap
while it is in use.

Signed-off-by: Sage Weil <sage@inktank.com>
osd/OSDMonitor: fix 'osd tier add ...' pool mangling
Sage Weil [Fri, 10 Jan 2014 04:59:36 +0000 (20:59 -0800)]
osd/OSDMonitor: fix 'osd tier add ...' pool mangling

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: update ObjectContext's object_info_t for new hit_set objects
Sage Weil [Fri, 10 Jan 2014 00:04:21 +0000 (16:04 -0800)]
osd/ReplicatedPG: update ObjectContext's object_info_t for new hit_set objects

We were fabricating an object_info_t correctly and writing it to disk, but
it was not reflected by the in-memory ObjectContext.  If something came
along quickly (like backfill) and tried to use it, the info would be
invalid.

Fix this by fabricating it in the obc and copying it to the new_obs for
the update.

Fixes: #7122
Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: always return ENOENT on deleted snap
Sage Weil [Thu, 9 Jan 2014 22:49:52 +0000 (14:49 -0800)]
osd/ReplicatedPG: always return ENOENT on deleted snap

Previously, if a snap was deleted but the clone was there and we hadn't
trimmed it yet, we would still return the data.  Instead, return ENOENT
unconditionally (even if it's not removed yet).  This makes the behavior from
the client perspective more predictable and consistent.
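
A minimal sketch of the new rule, assuming the OSD has the pool's set of removed snaps at hand: the removed-snap check comes first and is unconditional, so an untrimmed clone still on disk no longer changes the answer. `check_snap_read` and the `snapid_t` alias are illustrative.

```cpp
#include <cerrno>
#include <cstdint>
#include <set>

using snapid_t = uint64_t;  // illustrative alias

// Hypothetical read-path check: 0 if the snap read may proceed, else -ENOENT.
int check_snap_read(snapid_t snap, const std::set<snapid_t>& removed_snaps,
                    bool object_exists_for_snap) {
  if (removed_snaps.count(snap)) return -ENOENT;  // deleted: never serve data
  if (!object_exists_for_snap) return -ENOENT;    // nothing defined for it
  return 0;
}
```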

Signed-off-by: Sage Weil <sage@inktank.com>
ceph_test_rados_api_tier: partial test for promote vs snap trim race
Sage Weil [Thu, 9 Jan 2014 10:01:48 +0000 (02:01 -0800)]
ceph_test_rados_api_tier: partial test for promote vs snap trim race

This reliably returns ENODEV due to the test at the finish of flush.  Not
because we are actually racing with trim, though: the trimmer doesn't run
at all.  I believe it captures the important property, though.  Namely:
we should not write a promoted object that is "behind" the snap trimmer's
progress.  The fact that we are in front of it (the trimmer hasn't started
yet) should not matter since the object is logically deleted anyway.

We probably want to make the OSD return ENODEV on read in the normal case
when you try to access a clone that is pending trimming.

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: cleanly abort flush if the object no longer exists
Sage Weil [Mon, 6 Jan 2014 01:44:49 +0000 (17:44 -0800)]
osd/ReplicatedPG: cleanly abort flush if the object no longer exists

If the object no longer exists (for example, because the snap trimmer just
killed it) clean up the flush state without trying to mark the object
clean.

Signed-off-by: Sage Weil <sage@inktank.com>
osd/Replicated: mark obc !exists on snap trim
Sage Weil [Mon, 6 Jan 2014 01:43:57 +0000 (17:43 -0800)]
osd/Replicated: mark obc !exists on snap trim

Signed-off-by: Sage Weil <sage@inktank.com>
mon: debug propagate_snaps_to_tiers
Sage Weil [Mon, 6 Jan 2014 01:43:23 +0000 (17:43 -0800)]
mon: debug propagate_snaps_to_tiers

Signed-off-by: Sage Weil <sage@inktank.com>
osd: fix propagation of removed snaps to other tiers
Sage Weil [Mon, 6 Jan 2014 01:43:05 +0000 (17:43 -0800)]
osd: fix propagation of removed snaps to other tiers

When we update removed_snaps we do not update snap_seq.  Drop this broken
optimization.

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: handle promote that races with snap deletion
Sage Weil [Mon, 6 Jan 2014 00:02:19 +0000 (16:02 -0800)]
osd/ReplicatedPG: handle promote that races with snap deletion

If we are promoting a clone and realize that the object is no longer
defined for any snaps, abort the copy and delete any temp object.

If the defined snaps have changed, make sure they are updated in memory
so that on promote completion the snapshot metadata is correct.

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: simplify copy-from temp object handling
Sage Weil [Sun, 5 Jan 2014 19:36:55 +0000 (11:36 -0800)]
osd/ReplicatedPG: simplify copy-from temp object handling

Previously the caller was generating a temp object name and passing it
down in several different ways.  Instead, generate one when we realize
that we need it, and store it in *one* place (CopyResults), where
the completions can get at the information.

Signed-off-by: Sage Weil <sage@inktank.com>
ceph_test_rados_misc: test bad version for copy-from
Sage Weil [Sun, 5 Jan 2014 20:26:48 +0000 (12:26 -0800)]
ceph_test_rados_misc: test bad version for copy-from

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: adjust flow in process_copy_chunk
Sage Weil [Sun, 5 Jan 2014 09:04:16 +0000 (01:04 -0800)]
osd/ReplicatedPG: adjust flow in process_copy_chunk

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: make CopyResults inline in CopyOp
Sage Weil [Fri, 3 Jan 2014 23:26:00 +0000 (15:26 -0800)]
osd/ReplicatedPG: make CopyResults inline in CopyOp

No reason to put this on the heap.  Make the lifetime match that of the
CopyOp.

Signed-off-by: Sage Weil <sage@inktank.com>
ceph_test_rados: flush can also fail due to snap trimming
Sage Weil [Thu, 2 Jan 2014 18:48:57 +0000 (10:48 -0800)]
ceph_test_rados: flush can also fail due to snap trimming

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: handle promotion of rollback, src_oids, etc.
Sage Weil [Mon, 30 Dec 2013 22:56:54 +0000 (14:56 -0800)]
osd/ReplicatedPG: handle promotion of rollback, src_oids, etc.

Make other find_object_context() callers handle the case where the object
in question needs to be promoted.  We add a flag here that forces a promote
for these secondary objects so that the entire operation happens in the
same pool.  Forwarding is not allowed in this case.

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: preserve clean/dirty state on clone
Sage Weil [Mon, 30 Dec 2013 20:54:03 +0000 (12:54 -0800)]
osd/ReplicatedPG: preserve clean/dirty state on clone

If we have a clean object and clone it in make_writeable(), the clone
should also be clean (it does not need to be written back to the base
pool).  If the object was dirty, the clone should be dirty.

Signed-off-by: Sage Weil <sage@inktank.com>
ceph_test_rados: improve read debug output
Sage Weil [Mon, 30 Dec 2013 20:52:39 +0000 (12:52 -0800)]
ceph_test_rados: improve read debug output

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: infer snaps from head when promoting oldest clean clone
Sage Weil [Mon, 30 Dec 2013 20:57:28 +0000 (12:57 -0800)]
osd/ReplicatedPG: infer snaps from head when promoting oldest clean clone

Consider:

 - base and cache have same object foo; marked clean in cache pool
 - modify + clone foo in cache pool.  foo clone is clean.
 - foo clone is evicted
 - foo clone is read, and promoted
 - we read foo@something from base pool, and get the head's content

copy-get does not provide us with a snaps list.  Instead, we use the
snap_seq from the head to infer what the snaps vector was in the cache
pool and will be in the base pool when we flush the updates to the object.

Signed-off-by: Sage Weil <sage@inktank.com>
osd: include snap_seq in copy-get results
Sage Weil [Mon, 30 Dec 2013 19:47:33 +0000 (11:47 -0800)]
osd: include snap_seq in copy-get results

This is needed by the cache layer when reading a logical snap from a head
object on the backend in order to correctly recreate the clone in the
cache layer.

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: always set obc->ssc SnapSetContext for clones
Sage Weil [Mon, 30 Dec 2013 20:52:20 +0000 (12:52 -0800)]
osd/ReplicatedPG: always set obc->ssc SnapSetContext for clones

This can be useful!

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: do not promote nonexistent clones
Sage Weil [Mon, 30 Dec 2013 19:10:46 +0000 (11:10 -0800)]
osd/ReplicatedPG: do not promote nonexistent clones

Do not promote a clone for a snap that we know doesn't exist.  If
find_object_context() didn't give us a missing_oid, there is nothing to
promote.

Signed-off-by: Sage Weil <sage@inktank.com>
ceph_test_rados: is_dirty on non-flushing objects only
Sage Weil [Mon, 30 Dec 2013 17:04:40 +0000 (09:04 -0800)]
ceph_test_rados: is_dirty on non-flushing objects only

This makes its results reliable.  Otherwise, we can't mix the is_dirty
test with flush, which eliminates much of its value.

Signed-off-by: Sage Weil <sage@inktank.com>
ceph_test_rados: assert on read error
Sage Weil [Mon, 30 Dec 2013 17:04:02 +0000 (09:04 -0800)]
ceph_test_rados: assert on read error

Signed-off-by: Sage Weil <sage@inktank.com>
ceph_test_rados: make flush clean correct snap in model
Sage Weil [Sat, 28 Dec 2013 01:17:19 +0000 (17:17 -0800)]
ceph_test_rados: make flush clean correct snap in model

ceph_test_rados: IsDirty on random snaps
Sage Weil [Sat, 28 Dec 2013 01:12:54 +0000 (17:12 -0800)]
ceph_test_rados: IsDirty on random snaps

Signed-off-by: Sage Weil <sage@inktank.com>
ceph_test_rados: test flush/evict on snaps
Sage Weil [Sat, 28 Dec 2013 00:55:15 +0000 (16:55 -0800)]
ceph_test_rados: test flush/evict on snaps

Signed-off-by: Sage Weil <sage@inktank.com>
ceph_test_rados: don't update any state on successful cache-evict
Sage Weil [Sat, 28 Dec 2013 00:52:48 +0000 (16:52 -0800)]
ceph_test_rados: don't update any state on successful cache-evict

- we didn't touch the user_version
- we didn't change the clean/dirty state

Signed-off-by: Sage Weil <sage@inktank.com>
ceph_test_rados_api_tier: test flush on snaps/clones
Sage Weil [Fri, 27 Dec 2013 23:42:09 +0000 (15:42 -0800)]
ceph_test_rados_api_tier: test flush on snaps/clones

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: construct appropriate snapc for flush/writeback
Sage Weil [Fri, 27 Dec 2013 23:41:47 +0000 (15:41 -0800)]
osd/ReplicatedPG: construct appropriate snapc for flush/writeback

Construct a snap context that will trigger the appropriate cloning (if any)
on the base pool.

Signed-off-by: Sage Weil <sage@inktank.com>
osd: add pg_log_entry_t event type CLEAN
Sage Weil [Fri, 27 Dec 2013 23:14:42 +0000 (15:14 -0800)]
osd: add pg_log_entry_t event type CLEAN

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: refuse to flush when older dirty clones are present
Sage Weil [Fri, 27 Dec 2013 21:42:54 +0000 (13:42 -0800)]
osd/ReplicatedPG: refuse to flush when older dirty clones are present

If the next oldest clone is dirty, we cannot flush.  That is, we must
always flush starting with the oldest dirty clone.

Note that we can never have a sequence like dirty -> clean -> dirty,
because clones are only dirty on creation, are created in order, and cannot
be flushed (cleaned) out of order.  Thus checking the previous clone is
sufficient (and thankfully cheap).
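
The argument above can be captured in a few lines. In this hypothetical model the clones of one object are listed oldest to newest with a dirty flag each; given the invariant, dirty clones always form a suffix of the list, so looking only at the immediately older clone is enough. Both function names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// The invariant itself: no clean clone ever follows a dirty one.
bool dirty_clones_form_suffix(const std::vector<bool>& clone_dirty) {
  bool seen_dirty = false;
  for (bool d : clone_dirty) {
    if (seen_dirty && !d) return false;  // dirty -> clean: impossible
    seen_dirty = seen_dirty || d;
  }
  return true;
}

// Refuse to flush clone i if the next older clone is still dirty; it has to
// be flushed first. Thanks to the invariant, this O(1) check is sufficient.
bool can_flush_clone(const std::vector<bool>& clone_dirty, std::size_t i) {
  return i == 0 || !clone_dirty[i - 1];
}
```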

Signed-off-by: Sage Weil <sage@inktank.com>
vstart.sh: allow MDS=0
Sage Weil [Fri, 27 Dec 2013 21:31:07 +0000 (13:31 -0800)]
vstart.sh: allow MDS=0

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: make cache-[try-]flush CACHE instead of WR ops
Sage Weil [Fri, 27 Dec 2013 20:53:59 +0000 (12:53 -0800)]
osd/ReplicatedPG: make cache-[try-]flush CACHE instead of WR ops

This will allow us to send a flush op on a snap.

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: allow cache-evict on snaps
Sage Weil [Sat, 28 Dec 2013 00:11:27 +0000 (16:11 -0800)]
osd/ReplicatedPG: allow cache-evict on snaps

We do three things here:

 - make cache-evict a CACHE instead of WR op, allowing us to submit it
   on snaps (not just head)
 - allow eviction of a snap
 - verify that all snaps are missing before evicting a head

Signed-off-by: Sage Weil <sage@inktank.com>
osd: add rados CACHE mode (different from RD and WR)
Sage Weil [Fri, 27 Dec 2013 19:15:19 +0000 (11:15 -0800)]
osd: add rados CACHE mode (different from RD and WR)

It is useful to distinguish cache operations from read and modify
operations.  Specifically, we will allow cache ops to be sent for
snaps and also allow those ops to result in a write.
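
A sketch of how a separate mode lets an admission check treat cache ops differently from plain reads and writes. The mask and bit values below are illustrative assumptions standing in for Ceph's real op-mode definitions, and `allowed_on_snap`/`may_write` are hypothetical helpers.

```cpp
#include <cstdint>

// Illustrative op-mode bits (assumed values, not taken from Ceph headers).
constexpr uint32_t OP_MODE_MASK  = 0xf000;
constexpr uint32_t OP_MODE_RD    = 0x1000;
constexpr uint32_t OP_MODE_WR    = 0x2000;
constexpr uint32_t OP_MODE_CACHE = 0x8000;

constexpr bool is_cache_op(uint32_t op) {
  return (op & OP_MODE_MASK) == OP_MODE_CACHE;
}
constexpr bool is_write_op(uint32_t op) { return (op & OP_MODE_WR) != 0; }

// Client writes must target the head, but cache ops (flush/evict) may be
// sent to a snap.
bool allowed_on_snap(uint32_t op) { return is_cache_op(op) || !is_write_op(op); }

// Servicing a cache op can still modify the object, so treat it as a
// potential write when deciding what the op may do.
bool may_write(uint32_t op) { return is_write_op(op) || is_cache_op(op); }
```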

Signed-off-by: Sage Weil <sage@inktank.com>
ceph_test_rados_api_tier: test promotion of clones
Sage Weil [Fri, 27 Dec 2013 02:06:13 +0000 (18:06 -0800)]
ceph_test_rados_api_tier: test promotion of clones

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: update snap_mapper for promoted clones
Sage Weil [Fri, 27 Dec 2013 01:32:43 +0000 (17:32 -0800)]
osd/ReplicatedPG: update snap_mapper for promoted clones

A clone that comes into existence via promotion takes an entirely
different path than a typical clone (which comes into existence via a
CLONE op in make_writeable()).  Make sure snap_mapper is updated
accordingly.

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: only encode SnapSet on head objects in finish_ctx
Sage Weil [Fri, 27 Dec 2013 23:43:40 +0000 (15:43 -0800)]
osd/ReplicatedPG: only encode SnapSet on head objects in finish_ctx

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: always encode snaps in finish_ctx
Sage Weil [Thu, 26 Dec 2013 17:19:08 +0000 (09:19 -0800)]
osd/ReplicatedPG: always encode snaps in finish_ctx

On promote we use finish_ctx to build the final log entries, and need to
encode the snaps vector in that case.  (Normally this is done by
make_writeable or explicitly by the snap trimmer.)

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: mirror SnapSet info when promoting head
Sage Weil [Fri, 27 Dec 2013 02:05:22 +0000 (18:05 -0800)]
osd/ReplicatedPG: mirror SnapSet info when promoting head

When we promote the head for an object, get the list of snaps from the
backend pool and construct an appropriate SnapSet.  Note that this is
always placed on the head in the cache pool, since we will have a
whiteout object in this case.

Also note that the SnapSet's list of snapids will not include any snaps
for which there were no clones.  This is fine, since it is only used for
creating clones, and we've already done that.

Signed-off-by: Sage Weil <sage@inktank.com>
osd/osd_types: SnapSet::from_snap_set
Sage Weil [Fri, 27 Dec 2013 01:51:21 +0000 (17:51 -0800)]
osd/osd_types: SnapSet::from_snap_set

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: add PROMOTE log entry type
Sage Weil [Fri, 27 Dec 2013 01:31:01 +0000 (17:31 -0800)]
osd/ReplicatedPG: add PROMOTE log entry type

This is an alternative to MODIFY that indicates the object was just
promoted from another tier.  Thankfully, is_modify() is used in very
few places!

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: adjust clone stats when promoting clones
Sage Weil [Fri, 27 Dec 2013 02:02:16 +0000 (18:02 -0800)]
osd/ReplicatedPG: adjust clone stats when promoting clones

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: include snaps in copy-get results
Sage Weil [Tue, 24 Dec 2013 16:50:38 +0000 (08:50 -0800)]
osd/ReplicatedPG: include snaps in copy-get results

When promoting a snapped object, we need to also get the set of snaps over
which the clone is defined.  This is not strictly available except via the
list-snaps rados call, but that is only used on the snapdir object much
earlier when the head (whiteout) is promoted, and is not conveniently
available now.  Adding it to the internal copy-get op does not expose it
via librados (copy-get is not exposed at all), so I don't think this is a
problem.

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: using missing_oid to decide which object to promote
Sage Weil [Tue, 24 Dec 2013 01:26:39 +0000 (17:26 -0800)]
osd/ReplicatedPG: using missing_oid to decide which object to promote

find_object_context() now tells us which object it could use if it
doesn't find it on disk.  Promote that one.

Signed-off-by: Sage Weil <sage@inktank.com>
osd/ReplicatedPG: make find_object_context() pass missing_oid
Sage Weil [Tue, 24 Dec 2013 01:25:07 +0000 (17:25 -0800)]
osd/ReplicatedPG: make find_object_context() pass missing_oid

Previously we would return a snapid that we are blocked on if it is
missing.  This is necessary because the missing clone does not always
match the logical snap we are trying to read.

Extend this to return a full hobject_t that is the missing object we want.
For the missing clone case, this cleans things up slightly.  More
importantly, it lets find_object_context also tell us which on-disk
object is missing that, if it could be promoted, would help.

Signed-off-by: Sage Weil <sage@inktank.com>
mon/PGMap: make decode version match encode version (1078/head)
Sage Weil [Mon, 13 Jan 2014 23:51:41 +0000 (15:51 -0800)]
mon/PGMap: make decode version match encode version

These should have been bumped way back in 091809b8.

Signed-off-by: Sage Weil <sage@inktank.com>
ceph-dencoder: include offset in 'stray data' error message
Sage Weil [Mon, 13 Jan 2014 23:50:51 +0000 (15:50 -0800)]
ceph-dencoder: include offset in 'stray data' error message

Signed-off-by: Sage Weil <sage@inktank.com>
buffer: do not append trailing newline when appending empty istream
Sage Weil [Mon, 13 Jan 2014 23:50:29 +0000 (15:50 -0800)]
buffer: do not append trailing newline when appending empty istream

If we call

 bl.append(some_istream);

do not include a \n if the istream is empty (which apparently is not
the same thing as eof).  This was causing 'ceph pg getmap' to include a
trailing newline.

Probably we don't want this newline at all!  But all callers need to be
fixed for that change.
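
A minimal sketch of the behavior described here and in the later test fix (bufferlist::append(istream) filtering out empty lines), with `std::string` standing in for a bufferlist; `append_istream` is a hypothetical name. Looping on `std::getline` means an empty stream never yields a line, so no newline is appended.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Append `in` to `out` line by line. A newline is emitted only after a
// non-empty line, so an empty stream appends nothing and blank lines drop.
void append_istream(std::string& out, std::istream& in) {
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty()) continue;
    out += line;
    out += '\n';
  }
}
```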

Signed-off-by: Sage Weil <sage@inktank.com>
Merge pull request #931 from ceph/wip-5858-rebase
athanatos [Mon, 13 Jan 2014 22:25:51 +0000 (14:25 -0800)]
Merge pull request #931 from ceph/wip-5858-rebase

Wip 5858 rebase

Reviewed-by: Samuel Just <sam.just@inktank.com>
v0.75 (v0.75)
Ken Dreyer [Mon, 13 Jan 2014 21:07:01 +0000 (21:07 +0000)]
v0.75

doc: Added comment and example for SSL enablement in rgw.conf
John Wilkins [Mon, 13 Jan 2014 20:57:02 +0000 (12:57 -0800)]
doc: Added comment and example for SSL enablement in rgw.conf

Signed-off-by: John Wilkins <john.wilkins@inktank.com>
osd: Implement multiple backfill target handling (931/head)
David Zafman [Mon, 13 Jan 2014 19:38:48 +0000 (11:38 -0800)]
osd: Implement multiple backfill target handling

Fixes: #5858
Signed-off-by: David Zafman <david.zafman@inktank.com>
osd: Interim backfill changes
David Zafman [Thu, 21 Nov 2013 23:21:53 +0000 (15:21 -0800)]
osd: Interim backfill changes

Make peer_backfill_info a map which holds a
BackfillInterval for all backfill targets.
Initially see if recover_backfill() can just backfill
the first one and mark them all finished.

Signed-off-by: David Zafman <david.zafman@inktank.com>
Merge pull request #1077 from ceph/wip-7141
Sage Weil [Mon, 13 Jan 2014 19:22:49 +0000 (11:22 -0800)]
Merge pull request #1077 from ceph/wip-7141

DBObjectMap::clear_keys_header: use generate_new_header, not _generate_n...

Reviewed-by: Sage Weil <sage@inktank.com>
DBObjectMap::clear_keys_header: use generate_new_header, not _generate_new_header (1077/head)
Samuel Just [Mon, 13 Jan 2014 19:02:45 +0000 (11:02 -0800)]
DBObjectMap::clear_keys_header: use generate_new_header, not _generate_new_header

We aren't holding the header_lock here, so we need the locked version.
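
The convention behind this fix, sketched with a hypothetical class rather than DBObjectMap's real code: the underscore-prefixed method assumes the header lock is already held, and the public wrapper acquires it. Calling the underscore version without holding the lock is the bug being fixed.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical illustration of the locked/unlocked naming convention.
class HeaderAllocator {
 public:
  // Public entry point: takes the lock, then delegates.
  uint64_t generate_new_header() {
    std::lock_guard<std::mutex> l(header_lock_);
    return _generate_new_header();
  }

 private:
  // Caller must already hold header_lock_.
  uint64_t _generate_new_header() { return ++next_seq_; }

  std::mutex header_lock_;
  uint64_t next_seq_ = 0;
};
```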

Signed-off-by: Samuel Just <sam.just@inktank.com>
erasure-code: use uintptr_t instead of long long (1076/head)
Loic Dachary [Mon, 13 Jan 2014 17:16:09 +0000 (18:16 +0100)]
erasure-code: use uintptr_t instead of long long

Checking the pointer alignment using a cast to long long raises a
warning when -Wpointer-to-int-cast is given.
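
A minimal sketch of the portable alignment check: `std::uintptr_t` is an unsigned integer type able to hold a pointer value, so the cast neither truncates nor widens it. `is_aligned` is a hypothetical helper and assumes a power-of-two alignment.

```cpp
#include <cstddef>
#include <cstdint>

// True if p is a multiple of `alignment` (which must be a power of two).
bool is_aligned(const void* p, std::size_t alignment) {
  return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```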

Signed-off-by: Loic Dachary <loic@dachary.org>
Merge pull request #1075 from dachary/wip-crush
Sage Weil [Mon, 13 Jan 2014 16:46:04 +0000 (08:46 -0800)]
Merge pull request #1075 from dachary/wip-crush

improve crushtool --build usability and documentation

Reviewed-by: Sage Weil <sage@inktank.com>
11 years agoMerge pull request #1072 from ceph/wip-tier-snap
Gregory Farnum [Mon, 13 Jan 2014 16:33:52 +0000 (08:33 -0800)]
Merge pull request #1072 from ceph/wip-tier-snap

Reviewed-by: Greg Farnum <greg@inktank.com>
11 years agodoc: format man pages with s/2013/2014/ 1075/head
Loic Dachary [Sun, 12 Jan 2014 16:48:00 +0000 (17:48 +0100)]
doc: format man pages with s/2013/2014/

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agodoc: copyright s/2013/2014/
Loic Dachary [Sun, 12 Jan 2014 16:46:18 +0000 (17:46 +0100)]
doc: copyright s/2013/2014/

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agodoc: update the crushtool manual page
Loic Dachary [Sun, 12 Jan 2014 16:34:52 +0000 (17:34 +0100)]
doc: update the crushtool manual page

* add information about CEPH_ARGS
* rework the --build documentation and example
* add an Author section
* replace vi with emacs for no good reason
* clean up whitespace

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agodoc: crushtool man page nroff format
Loic Dachary [Sun, 12 Jan 2014 16:42:43 +0000 (17:42 +0100)]
doc: crushtool man page nroff format

Also includes a modification from a prior patch series that was not
formatted as nroff.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocrush: tests for crushtool --build
Loic Dachary [Sun, 12 Jan 2014 16:51:10 +0000 (17:51 +0100)]
crush: tests for crushtool --build

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocrush: crushtool copyright notice update
Loic Dachary [Sun, 12 Jan 2014 16:28:19 +0000 (17:28 +0100)]
crush: crushtool copyright notice update

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocrush: crushtool emacs compile helper
Loic Dachary [Sun, 12 Jan 2014 16:27:26 +0000 (17:27 +0100)]
crush: crushtool emacs compile helper

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocrush: crushtool --build informative messages
Loic Dachary [Sun, 12 Jan 2014 16:24:39 +0000 (17:24 +0100)]
crush: crushtool --build informative messages

* dump the crush tree created by --build at debug level 1.

* display a warning at debug level 1 if there is more than one root. In
  most cases this is not what the user wants, and it may be confusing
  because the ruleset will only apply to the first root and will have
  fewer devices under it than expected.
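A sketch of the multiple-root check, assuming a root is a bucket that no other bucket contains. The parent_of map, the bucket ids and both function names are illustrative, not crushtool's internals:

```cpp
#include <cassert>
#include <map>
#include <vector>

// `parent_of` maps a bucket id to its parent bucket id; buckets absent
// from the map have no parent and are therefore roots.
std::vector<int> find_roots(const std::vector<int>& buckets,
                            const std::map<int, int>& parent_of) {
  std::vector<int> roots;
  for (int b : buckets)
    if (parent_of.find(b) == parent_of.end())
      roots.push_back(b);
  return roots;
}

// Warn when there is more than one root: per the commit message, a
// ruleset built from the first root would miss the other roots' devices.
bool should_warn_multiple_roots(const std::vector<int>& roots) {
  return roots.size() > 1;
}
```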

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocrush: crushtool --build uses OSDMap helpers for rulesets
Loic Dachary [Sun, 12 Jan 2014 16:22:36 +0000 (17:22 +0100)]
crush: crushtool --build uses OSDMap helpers for rulesets

Instead of creating a ruleset from scratch, use the
OSDMap::build_simple_crush_rulesets helper. It is more likely to match
the user's expectations.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocrush: print --build debug information when verbose 2
Loic Dachary [Sun, 12 Jan 2014 16:18:29 +0000 (17:18 +0100)]
crush: print --build debug information when verbose 2

Print it at verbose level 2 instead of verbose level 0.

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocrush: display args on crushtool failure
Loic Dachary [Sat, 11 Jan 2014 10:19:51 +0000 (11:19 +0100)]
crush: display args on crushtool failure

When the number of args provided to --build is not a multiple of 3,
display the arguments which do not comply.

For instance, the --debug_crush 0 option is not consumed by global_init
in crushtool because, unlike in most ceph tools, the arguments are not
passed to global_init. As a result, --debug_crush 0 becomes part of the
arguments and triggers the failure.

   crushtool --debug_crush 0 --build --num_osds 320 node straw 4
   remaining args: [--debug_crush,0,node,straw,4]
   layers must be specified with 3-tuples of (name, buckettype, size)
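A sketch of the 3-tuple validation described above, echoing the unconsumed arguments in the same format as the sample output. Layer and parse_layers() are hypothetical names, not crushtool's code, and the sketch assumes --num_osds has already been consumed:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One (name, buckettype, size) layer as given to --build.
struct Layer {
  std::string name;
  std::string bucket_type;
  int size;
};

// On success fills `layers` and returns true. Otherwise returns false and
// sets `error` to "remaining args: [a,b,...]". std::stoi throws if a size
// is not a number; this sketch does not catch that.
bool parse_layers(const std::vector<std::string>& args,
                  std::vector<Layer>& layers, std::string& error) {
  if (args.size() % 3 != 0) {
    error = "remaining args: [";
    for (std::size_t i = 0; i < args.size(); ++i) {
      if (i > 0)
        error += ",";
      error += args[i];
    }
    error += "]";
    return false;
  }
  for (std::size_t i = 0; i < args.size(); i += 3)
    layers.push_back(Layer{args[i], args[i + 1], std::stoi(args[i + 2])});
  return true;
}
```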

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agocrush: parse CEPH_ARGS in crushtool
Loic Dachary [Sat, 11 Jan 2014 10:46:57 +0000 (11:46 +0100)]
crush: parse CEPH_ARGS in crushtool

The arguments are not given to global_init because its -c option (the
config file) would conflict with crushtool's own -c option. Reading
arguments from CEPH_ARGS, the way other ceph tools do, is the only way
to control verbosity (via --debug_crush 0, for instance).
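A sketch of turning CEPH_ARGS into an argument list, assuming plain whitespace splitting with no quoting support. split_args() and ceph_args_from_env() are hypothetical helpers, not the functions crushtool actually uses:

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Split a whitespace-separated value into arguments; nullptr yields none.
std::vector<std::string> split_args(const char* value) {
  std::vector<std::string> out;
  if (value == nullptr)
    return out;
  std::istringstream in(value);
  std::string arg;
  while (in >> arg)
    out.push_back(arg);
  return out;
}

// Read CEPH_ARGS from the environment, if set, so options such as
// --debug_crush can be honoured without handing argv to global_init.
std::vector<std::string> ceph_args_from_env() {
  return split_args(std::getenv("CEPH_ARGS"));
}
```

With this approach, an invocation along the lines of `CEPH_ARGS="--debug_crush 0" crushtool --build --num_osds 320 node straw 4` would presumably keep the debug option out of the --build layer arguments.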

Signed-off-by: Loic Dachary <loic@dachary.org>
11 years agoosd: factorize build_simple_crush_map* rulesets creation
Loic Dachary [Sun, 12 Jan 2014 12:50:07 +0000 (13:50 +0100)]
osd: factorize build_simple_crush_map* rulesets creation

Group the rulesets created by build_simple_crush_map* into a single
helper, build_simple_crush_rulesets().

Signed-off-by: Loic Dachary <loic@dachary.org>