Sam Lang [Tue, 12 Feb 2013 17:32:29 +0000 (11:32 -0600)]
deb: Add ceph-coverage to ceph-test deb package
Teuthology uses the ceph-coverage script extensively
and expects it to be installed by the ceph task. Add
the script to the ceph-test debian package so that it
gets installed for that use case.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
(cherry picked from commit
376cca2d4d4f548ce6b00b4fc2928d2e6d41038f)
(cherry picked from commit
b70e2c270b9eb3fce673b7e51b527ebf88214f14)
Gary Lowell [Wed, 7 Nov 2012 00:23:18 +0000 (16:23 -0800)]
packaging: Add ceph-test debian package
The ceph-test package includes optional test and benchmarking programs.
Conflicts:
debian/control
debian/rules
Yehuda Sadeh [Thu, 7 Mar 2013 03:32:21 +0000 (19:32 -0800)]
rgw: don't iterate through all objects when in namespace
Fixes: #4363
Backport: argonaut, bobtail
When listing objects in a namespace, don't iterate through all the
objects; only go through the ones that start with the namespace
prefix.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit
6669e73fa50e3908ec825ee030c31a6dbede6ac0)
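A minimal sketch of the prefix-bounded listing described above, assuming the index is available as a sorted list of keys and that namespaced entries share a common key prefix. The function name `list_namespace` and the example prefixes are hypothetical and are not taken from the radosgw source.

```python
import bisect

def list_namespace(sorted_keys, ns_prefix):
    """Return the keys that start with ns_prefix, given keys in sorted order."""
    # Jump to the first key >= prefix instead of walking from the beginning.
    start = bisect.bisect_left(sorted_keys, ns_prefix)
    result = []
    for i in range(start, len(sorted_keys)):
        key = sorted_keys[i]
        if not key.startswith(ns_prefix):
            break  # keys are sorted, so no later key can match
        result.append(key)
    return result
```

Because the keys are sorted, the loop touches only the matching entries plus at most one more, rather than every object in the bucket.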
Samuel Just [Thu, 28 Feb 2013 00:58:45 +0000 (16:58 -0800)]
FileJournal::wrap_read_bl: adjust pos before returning
Otherwise, we may feed an offset past the end of the journal to
check_header in read_entry and incorrectly determine that the entry is
corrupt.
Fixes: 4296
Backport: bobtail
Backport: argonaut
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit
5d54ab154ca790688a6a1a2ad5f869c17a23980a)
Yehuda Sadeh [Thu, 7 Feb 2013 01:10:00 +0000 (17:10 -0800)]
rgw: a tool to fix clobbered bucket info in user's bucket list
This fixes bad entries in user's bucket list that may have occurred
due to issue #4039. Syntax:
$ radosgw-admin user check --uid=<uid> [--fix]
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Yehuda Sadeh [Thu, 7 Feb 2013 00:43:48 +0000 (16:43 -0800)]
rgw: bucket recreation should not clobber bucket info
Fixes: #4039
User's list of buckets is getting modified even if bucket already
exists. This fix removes the newly created directory object, and
makes sure that user info's data points at the correct bucket.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
Yehuda Sadeh [Tue, 5 Feb 2013 22:59:51 +0000 (14:59 -0800)]
rgw: unlink multipart upload parts when completing upload
Fixes: #4011
When completing the multipart upload, we also need to unlink the
parts from the bucket index. Originally we removed the parts;
nowadays the parts live on, as we just point the object
manifest at them. So we don't remove the objects, but we still need
to remove them from the bucket index.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Yehuda Sadeh [Tue, 5 Feb 2013 22:50:54 +0000 (14:50 -0800)]
rgw: a tool to fix buckets with leaked multipart references
Checks the specified bucket for the #4011 symptoms, and optionally fixes
the issue.
Syntax:
radosgw-admin bucket check --bucket=<bucket> [--fix]
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Yehuda Sadeh [Tue, 5 Feb 2013 21:54:11 +0000 (13:54 -0800)]
rgw: radosgw-admin object unlink
Add a radosgw-admin option to remove an object from the bucket index
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Sage Weil [Fri, 18 Jan 2013 21:02:54 +0000 (13:02 -0800)]
Merge remote-tracking branch 'gh/wip-scrub-argonaut' into argonaut
Samuel Just [Sun, 18 Nov 2012 02:18:23 +0000 (18:18 -0800)]
os/: Add CollectionIndex::prep_delete
If an unlink is interrupted between removing the file
and updating the subdir attribute, the attribute will
overestimate the number of files in the directory. This
is by design, at worst we will merge the collection later
than intended, but closing the gap would require a second
subdir xattr update. However, this can in extreme cases
result in a collection with subdirectories but no objects.
FileStore::_destroy_collection would therefore see an
erroneous -ENOTEMPTY.
prep_delete allows the CollectionIndex implementation to
clean up state prior to removal.
Signed-off-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit
fdc5e5d1877d7d7ed3851b9ec01f884559748249)
Conflicts:
src/os/HashIndex.cc
src/os/HashIndex.h
Yehuda Sadeh [Wed, 16 Jan 2013 23:01:47 +0000 (15:01 -0800)]
rgw: copy object should not copy source acls
Fixes: #3802
Backport: argonaut, bobtail
When using the S3 api and x-amz-metadata-directive is
set to COPY we used to copy complete metadata of source
object. However, this shouldn't include the source ACLs.
Conflicts:
src/rgw/rgw_rados.cc
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit
ccfefe3097a51b49885f2ed5d9334e85b497d963)
Sage Weil [Wed, 16 Jan 2013 03:27:13 +0000 (19:27 -0800)]
osd: send forced scrub/repair through scrub scheduling
This marks a PG for immediate scrub or repair. Adjust the sched_scrub()
code so that we handle these PGs even when should_schedule_scrub is
false (e.g., because the load is high). When we explicitly request a
scrub or repair, we then go through the normal scrub reservation process
to avoid unduly impacting cluster performance.
This is particularly helpful on argonaut, where the final scrub
finalization step blocks writes to the PG, and overlapping scrubs can
exacerbate the problem.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 11 Jan 2013 17:03:07 +0000 (09:03 -0800)]
osd: use helpers to queue a PG in the scrub LRU
Move the duplicated reach into info.history.last_scrub_stamp into a helper
so we can control when we queue the PG for scrub.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Thu, 10 Jan 2013 06:34:12 +0000 (22:34 -0800)]
osd/ReplicatedPG: validate ino when scrubbing snap collections
Signed-off-by: Sage Weil <sage@inktank.com>
Samuel Just [Thu, 10 Jan 2013 00:41:40 +0000 (16:41 -0800)]
ReplicatedPG: compare nlinks to snapcolls
nlinks gives us the number of hardlinks to the object.
nlinks should be 1 + snapcolls.size(). This will allow
us to detect links which remain in an erroneous snap
collection.
Signed-off-by: Samuel Just <sam.just@inktank.com>
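The invariant above can be sketched as a small check. This is an illustration only: `check_nlinks` is a hypothetical helper, and `snapcolls` is assumed to be the set of snap collections the object is expected to be linked into.

```python
def check_nlinks(nlinks, snapcolls):
    """Return None if the link count is consistent, else a description of the mismatch."""
    # One link for the object itself plus one per snap collection.
    expected = 1 + len(snapcolls)
    if nlinks != expected:
        return "nlinks %d != expected %d" % (nlinks, expected)
    return None
```

A higher link count than expected would point at a link left behind in a snap collection the object should no longer be in.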
Samuel Just [Wed, 9 Jan 2013 19:56:23 +0000 (11:56 -0800)]
ReplicatedPG/PG: check snap collections during _scan_list
During _scan_list check the snapcollections corresponding to the
object_info attr on the object. Report inconsistencies during
scrub_finalize.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Sage Weil [Thu, 16 Aug 2012 18:38:46 +0000 (11:38 -0700)]
byteorder: fix gcc 4.7 warnings
./include/encoding.h: In function 'void encode(int64_t, ceph::bufferlist&, uint64_t)':
./include/encoding.h:101:1: warning: narrowing conversion of 'v' from 'int64_t {aka long int}' to '__le64 {aka long long unsigned int}' inside { } is ill-formed in C++11 [-Wnarrowing]
Signed-off-by: Sage Weil <sage@inktank.com>
Samuel Just [Wed, 9 Jan 2013 19:53:52 +0000 (11:53 -0800)]
osd_types: add nlink and snapcolls fields to ScrubMap::object
Signed-off-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Wed, 9 Jan 2013 19:56:16 +0000 (11:56 -0800)]
osd_types: bring ScrubMap::object up to the 0.56.1 encoding
We need to introduce some new fields here, so to maintain compatibility
we'll need to first bring the 48.* series up to the current encoding.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Sage Weil [Wed, 2 Jan 2013 17:39:26 +0000 (09:39 -0800)]
osd: make missing head non-fatal during scrub
If we encounter a scrub without a preceding head, warn instead of
crashing. Note that this is still something we can't repair.
See #3705.
Signed-off-by: Sage Weil <sage@inktank.com>
Samuel Just [Thu, 10 Jan 2013 03:17:23 +0000 (19:17 -0800)]
ReplicatedPG: fix snapdir trimming
The previous logic was both complicated and not correct. Consequently,
we have been tending to drop snapcollection links in some cases. This
has resulted in clones incorrectly not being trimmed. This patch
replaces the logic with something less efficient but hopefully a bit
clearer.
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
(cherry picked from commit
0f42c37359d976d1fe90f2d3b877b9b0268adc0b)
Gary Lowell [Tue, 8 Jan 2013 05:08:08 +0000 (21:08 -0800)]
v0.48.3argonaut
Sage Weil [Mon, 7 Jan 2013 04:43:21 +0000 (20:43 -0800)]
osd: fix race in do_recovery()
Verify that the PG is still RECOVERING or BACKFILL when we take the pg
lock in the recovery thread. This prevents a crash from an invalid
state machine event when the recovery queue races with a PG state change
(e.g., due to peering).
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
Samuel Just [Sat, 5 Jan 2013 05:19:45 +0000 (21:19 -0800)]
ReplicatedPG: requeue waiting_for_ondisk in apply_and_flush_repops
Fixes: #3722
Signed-off-by: Samuel Just <sam.just@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Josh Durgin [Fri, 16 Nov 2012 00:20:33 +0000 (16:20 -0800)]
ObjectCacher: fix off-by-one error in split
This error left a completion that should have been attached
to the right BufferHead on the left BufferHead, which would
result in the completion never being called unless the buffers
were merged before its original read completed. This would cause
a hang in any higher level waiting for a read to complete.
The existing loop went backwards (using a forward iterator),
but stopped when the iterator reached the beginning of the map,
or when a waiter belonged to the left BufferHead.
If the first list of waiters should have been moved to the right
BufferHead, it was skipped because at that point the iterator
was at the beginning of the map, which was the main condition
of the loop.
Restructure the waiters-moving loop to go forward in the map instead,
so it's harder to make an off-by-one error.
Possibly-fixes: #3286
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit
2e862f4d183d8b57b43b0777737886f18f68bf00)
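A rough sketch of a forward-walking version of the waiter move, assuming waiters are kept in a map from offset to a list of completions and that everything at or past the split offset belongs to the right BufferHead. The name `split_waiters` and the dict representation are hypothetical and do not mirror the ObjectCacher code.

```python
def split_waiters(left_waiters, split_off):
    """Move waiters at or past split_off out of left_waiters; return the right-hand map."""
    right_waiters = {}
    # Walk the offsets forward over a sorted snapshot of the keys, so the
    # first entry is examined like any other and popping is safe.
    for off in sorted(left_waiters):
        if off >= split_off:
            right_waiters[off] = left_waiters.pop(off)
    return right_waiters
```

With a forward walk there is no special case for the beginning of the map, which is where the backward loop described above went wrong.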
Sage Weil [Fri, 4 Jan 2013 19:07:48 +0000 (11:07 -0800)]
config: change default log_max_recent to 10,000
Commit
c34e38bcdc0460219d19b21ca7a0554adf7f7f84 meant to do this but got
the wrong number of zeros.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Fri, 4 Jan 2013 01:15:07 +0000 (17:15 -0800)]
os/FileStore: fix non-btrfs op_seq commit order
The op_seq file is the starting point for journal replay. For stable btrfs
commit mode, which is using a snapshot as a reference, we should write this
file before we take the snap. We normally ignore current/ contents anyway.
On non-btrfs file systems, however, we should only write this file *after*
we do a full sync, and we should then fsync(2) it before we continue
(and potentially trim anything from the journal).
This fixes a serious bug that could cause data loss and corruption after
a power loss event. For a 'kill -9' or crash, however, there was little
risk, since the writes were still captured by the host's cache.
Fixes: #3721
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit
28d59d374b28629a230d36b93e60a8474c902aa5)
Sage Weil [Fri, 28 Dec 2012 21:07:18 +0000 (13:07 -0800)]
log: broadcast cond signals
We were using a single cond, and only signalling one waiter. That means
that if the flusher and several logging threads are waiting, and we hit
a limit, the logger could signal another logger instead of the flusher,
and we could deadlock.
Similarly, if the flusher empties the queue, it might signal only a single
logger, and that logger could re-signal the flusher, and the other logger
could wait forever.
Instead, break the single cond into two: one for loggers, and one for the
flusher. Always signal the (one) flusher, and always broadcast to all
loggers.
Backport: bobtail, argonaut
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit
813787af3dbb99e42f481af670c4bb0e254e4432)
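The two-condition scheme can be sketched as follows. This is a simplified illustration under stated assumptions, not the Ceph Log class: `BoundedLogQueue`, `max_new`, and the `stop` method are hypothetical names, and both conditions share one lock.

```python
import threading

class BoundedLogQueue:
    def __init__(self, max_new):
        self._lock = threading.Lock()
        # Two conditions on the same lock: one for loggers, one for the flusher.
        self._cond_loggers = threading.Condition(self._lock)
        self._cond_flusher = threading.Condition(self._lock)
        self._queue = []
        self._max_new = max_new
        self._stop = False
        self.flushed = []

    def submit_entry(self, entry):
        with self._lock:
            while len(self._queue) >= self._max_new:
                self._cond_loggers.wait()
            self._queue.append(entry)
            # Signal while holding the lock and after modifying the queue;
            # there is only one flusher, so a single notify is enough.
            self._cond_flusher.notify()

    def flush_loop(self):
        while True:
            with self._lock:
                while not self._queue and not self._stop:
                    self._cond_flusher.wait()
                if not self._queue and self._stop:
                    return
                batch, self._queue = self._queue, []
                # Wake every waiting logger, not just one of them.
                self._cond_loggers.notify_all()
            self.flushed.extend(batch)

    def stop(self):
        with self._lock:
            self._stop = True
            self._cond_flusher.notify()
```

Keeping the wakeups separate means a logger can never consume a signal meant for the flusher, and broadcasting to loggers avoids leaving one of them waiting after the queue has been emptied.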
Sage Weil [Wed, 2 Jan 2013 21:58:44 +0000 (13:58 -0800)]
log: fix locking typo/stupid for dump_recent()
We weren't locking m_flush_mutex properly, which in turn was leading to
racing threads calling dump_recent() and garbling the crash dump output.
Backport: bobtail, argonaut
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit
43cba617aa0247d714632bddf31b9271ef3a1b50)
Sage Weil [Fri, 28 Dec 2012 00:06:24 +0000 (16:06 -0800)]
init-ceph: fix status version check across machines
The local state isn't propagated into the backtick shell, resulting in
'unknown' for all remote daemons. Avoid backticks altogether.
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit
6c7b667badc5e7608b69c533a119a2afc062e257)
Travis Rhoden [Mon, 20 Aug 2012 20:29:11 +0000 (13:29 -0700)]
init-ceph: use SSH in "service ceph status -a" to get version
When running "service ceph status -a", a version number was never
returned for remote hosts, only for the local. This was because
the command to query the version number didn't use the do_cmd
function, which is responsible for running the command over SSH
when needed.
Modify the ceph init.d script to use do_cmd for querying the
Ceph version.
Signed-off-by: Travis Rhoden <trhoden@gmail.com>
(cherry picked from commit
60fdb6fda6233b01dae4ed8a34427d5960840b84)
Sage Weil [Wed, 28 Nov 2012 21:00:36 +0000 (13:00 -0800)]
log: 10,000 recent log entries
This is what we were (wrongly) doing before, so there are no memory
utilization surprises.
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit
78286b1403a5e0f14f95fe6b92f2fdb163e909f1)
Sage Weil [Wed, 28 Nov 2012 20:59:43 +0000 (12:59 -0800)]
log: fix log_max_recent config
<facepalm>
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit
4de7748b72d4f90eb1197a70015c199c15203354)
Sage Weil [Thu, 20 Dec 2012 21:48:06 +0000 (13:48 -0800)]
log: fix flush/signal race
We need to signal the cond in the same interval where we hold the lock
*and* modify the queue. Otherwise, we can have a race like:
queue has 1 item, max is 1.
A: enter submit_entry, signal cond, wait on condition
B: enter submit_entry, signal cond, wait on condition
C: flush wakes up, flushes 1 previous item
A: retakes lock, enqueues something, exits
B: retakes lock, condition fails, waits
-> C is never woken up as there are 2 items waiting
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Dan Mick <dan.mick@inktank.com>
(cherry picked from commit
50914e7a429acddb981bc3344f51a793280704e6)
Gary Lowell [Sat, 22 Dec 2012 01:12:07 +0000 (17:12 -0800)]
.gitignore: Add ar-lib to ignore list
Gary Lowell [Sat, 22 Dec 2012 00:55:27 +0000 (16:55 -0800)]
autogen.sh: Create m4 directory for leveldb
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Gary Lowell [Sat, 22 Dec 2012 00:17:33 +0000 (16:17 -0800)]
leveldb: Update submodule
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Gary Lowell [Fri, 21 Dec 2012 00:49:32 +0000 (16:49 -0800)]
ceph.spec.in: Fedora builds debuginfo by default.
Signed-off-by: Gary Lowell <gary.lowell@inktank.com>
Yehuda Sadeh [Thu, 20 Dec 2012 01:07:18 +0000 (17:07 -0800)]
rgw: fix error handling with swift
Fixes: #3649
verify_swift_token returns a bool and not an int.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Sam Lang [Mon, 24 Sep 2012 16:55:25 +0000 (09:55 -0700)]
client: Fix for #3184 cfuse segv with no keyring
Fixes bug #3184 where the ceph-fuse client segfaults if authx is
enabled but no keyring file is present. This was due to the
client->init() return value not getting checked.
Signed-off-by: Sam Lang <sam.lang@inktank.com>
(cherry picked from commit
47983df4cbd31f299eef896b4612d3837bd7c7bd)
Joao Eduardo Luis [Tue, 9 Oct 2012 20:25:54 +0000 (21:25 +0100)]
mon: Monitor: resolve keyring option to a file before loading keyring
Otherwise our keyring default location, or any other similarly formatted
location, will be taken as the actual location for the keyring and fail.
Reported-by: tziOm (at) #ceph
Fixes: 3276
Backport: argonaut
Signed-off-by: Joao Eduardo Luis <joao.luis@inktank.com>
(cherry picked from commit
7ef0df25e001bfae303feb3ae36514608767b1f2)
Gary Lowell [Thu, 6 Dec 2012 03:39:11 +0000 (19:39 -0800)]
.gitignore: Add m4 macro directories to ignore list
Gary Lowell [Thu, 8 Nov 2012 20:43:24 +0000 (12:43 -0800)]
build: Add RPM release string generated from git describe.
Fix for bug 3451. Use the commit count and sha1 from git describe to
construct a release string for rpm packages.
Conflicts:
configure.ac
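As an illustration of deriving a release string from `git describe`, the sketch below parses the usual `TAG-COUNT-gSHA` output. The output format `COUNT.gSHA`, the fallback value `0`, and the function name are assumptions made for this example; the actual string built in configure.ac may differ.

```python
import re

def rpm_release_from_describe(describe):
    """Turn 'TAG-COUNT-gSHA' from `git describe` into 'COUNT.gSHA' (hypothetical format)."""
    m = re.match(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$", describe)
    if m is None:
        # Exactly on a tag: git describe prints only the tag name.
        return "0"
    # RPM release fields cannot contain '-', so join with '.' instead.
    return "%s.g%s" % (m.group("count"), m.group("sha"))
```

For example, a describe string such as `v0.48.2argonaut-12-gabc1234` would yield `12.gabc1234` under these assumptions.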
Gary Lowell [Fri, 9 Nov 2012 21:28:13 +0000 (13:28 -0800)]
ceph.spec.in: Build debuginfo subpackage.
This is a partial fix for bug 3471. Enable building of debuginfo package.
Some distributions enable this automatically by installing additional rpm
macros, on others it needs to be explicitly added to the spec file.
Yehuda Sadeh [Mon, 3 Dec 2012 22:32:28 +0000 (14:32 -0800)]
rgw: fix swift auth concurrency issue
Fixes: #3565
Originally ops were using static structures, but that
has since changed. Switching swift auth handler to do
the same.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Yehuda Sadeh [Thu, 29 Nov 2012 21:39:22 +0000 (13:39 -0800)]
rgw: fix rgw_tools get_obj()
The original implementation broke whenever data exceeded
the chunk size. Also don't keep cache for objects that
exceed the chunk size as cache is not designed for
it. Increased chunk size to 512k.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Yehuda Sadeh [Thu, 29 Nov 2012 20:47:59 +0000 (12:47 -0800)]
rgw: fix PUT acls
This fixes a regression introduced at
17e4c0df44781f5ff1d74f3800722452b6a0fc58. The original
patch fixed error leak, however it also removed the
operation's send_response() call.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Yehuda Sadeh [Tue, 20 Nov 2012 01:10:11 +0000 (17:10 -0800)]
rgw: fix xml parser leak
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit
f86522cdfcd81b2d28c581ac8b8de6226bc8d1a4)
Yehuda Sadeh [Tue, 20 Nov 2012 00:52:38 +0000 (16:52 -0800)]
rgw: fix memory leaks
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
(cherry picked from commit
98a04d76ebffa61c3ba4b033cdd57ac57b2f29f3)
Conflicts:
src/rgw/rgw_op.cc
src/rgw/rgw_op.h
Yehuda Sadeh [Wed, 7 Nov 2012 21:21:15 +0000 (13:21 -0800)]
rgw: don't convert object mtime to UTC
Fixes: #3452
When we read object info, don't try to convert mtime to
UTC, it's already in UTC.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Yehuda Sadeh [Wed, 14 Nov 2012 19:30:34 +0000 (11:30 -0800)]
rgw: relax date format check
Don't try to parse beyond the GMT or UTC. Some clients use
special date formatting. If we end up misparsing the date
it'll fail in the authorization, so don't need to be too
restrictive.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Sage Weil [Tue, 30 Oct 2012 21:17:56 +0000 (14:17 -0700)]
ceph-disk-activate: avoid duplicating mounts if already activated
If the given device is already mounted at the target location, do not
mount --move it again and create a bunch of dup entries in the /etc/mtab
and kernel mount table.
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit
c435d314caeb5424c1f4482ad02f8a085317ad5b)
Sage Weil [Fri, 26 Oct 2012 04:21:18 +0000 (21:21 -0700)]
ceph-disk-prepare: poke kernel into refreshing partition tables
Prod the kernel to refresh the partition table after we create one. The
partprobe program is packaged with parted, which we already use, so this
introduces no new dependency.
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit
402e1f5319a52c309eca936081fddede1f107268)
Sage Weil [Fri, 26 Oct 2012 04:20:21 +0000 (21:20 -0700)]
ceph-disk-prepare: fix journal partition creation
The end value needs to have + to indicate it is relative to wherever the
start is.
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit
2e32a0ee2d9e2a3bf5b138f50efc5fba8d5b8660)
Sage Weil [Fri, 26 Oct 2012 01:14:47 +0000 (18:14 -0700)]
ceph-disk-prepare: assume parted failure means no partition table
If the disk has no valid label we get an error like
Error: /dev/sdi: unrecognised disk label
Assume any error we get is that and go with an id label of 1.
Signed-off-by: Sage Weil <sage@inktank.com>
(cherry picked from commit
8921fc7c7bc28fb98334c06f1f0c10af58085085)
Sage Weil [Mon, 12 Nov 2012 19:24:00 +0000 (11:24 -0800)]
Merge remote-tracking branch 'gh/wip-mds-stable' into stable
Sage Weil [Fri, 9 Nov 2012 13:28:12 +0000 (05:28 -0800)]
mds: re-try_set_loner() after doing evals in eval(CInode*, int mask)
Consider a case where current loner is A and wanted loner is B.
At the top of the function we try to set the loner, but that may fail
because we haven't processed the gathered caps yet for the previous
loner. In the body we do that and potentially drop the old loner, but we
do not try_set_loner() again on the desired loner.
Try after our drop. If it succeeds, loop through the eval's one more time
so that we can issue caps appropriately.
This fixes a hang induced by a simple loop like:
while true ; do echo asdf >> mnt.a/foo ; tail mnt.b/foo ; done &
while true ; do ls mnt.a mnt.b ; done
(The second loop may not be necessary.)
Signed-off-by: Sage Weil <sage@inktank.com>
Samuel Just [Fri, 13 Jul 2012 21:23:27 +0000 (14:23 -0700)]
CompatSet: users pass bit indices rather than masks
CompatSet users number the Feature objects rather than
providing masks. Thus, we should do
mask |= (1 << f.id) rather than mask |= f.id.
In order to detect old, broken encodings, the lowest
bit will be set in memory but not set in the encoding.
We can reconstruct the correct mask from the names map.
This bug can cause an incompat bit to not be detected
since 1|2 == 1|2|3.
fixes: #2748
Signed-off-by: Samuel Just <sam.just@inktank.com>
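The difference between OR-ing in an index and OR-ing in a bit can be seen in a short sketch. The function names here are hypothetical; only the two expressions `mask |= f.id` and `mask |= (1 << f.id)` come from the commit message.

```python
def mask_from_ids_buggy(feature_ids):
    mask = 0
    for fid in feature_ids:
        mask |= fid  # treats the bit index as if it were already a mask
    return mask

def mask_from_ids_fixed(feature_ids):
    mask = 0
    for fid in feature_ids:
        mask |= 1 << fid  # set the bit that the index refers to
    return mask
```

With the buggy form, feature sets {1, 2} and {1, 2, 3} both collapse to the same value, so the extra feature 3 goes unnoticed; with the shift, each feature occupies its own bit.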
Gary Lowell [Wed, 7 Nov 2012 20:41:10 +0000 (12:41 -0800)]
ceph.spec.in: Remove ceph version requirement from ceph-fuse package.
The ceph-fuse rpm package now only requires ceph as a pre-req, not a specific
version.
Yehuda Sadeh [Wed, 24 Oct 2012 20:15:46 +0000 (13:15 -0700)]
rgw: fix multipart overwrite
Fixes: #3400
Removed a few lines of code that prematurely created the head
part of the final object (before creating the manifest).
backport:argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Sage Weil [Tue, 6 Nov 2012 07:27:13 +0000 (23:27 -0800)]
mds: move from loner -> mix if *anyone* wants rd|wr
We were either going to MIX or SYNC depending on whether non-loners wanted
to read/write, but it may be that the loner wants to if our logic for
choosing loner vs not loner is based on anything other than just rd|wr
wanted.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 6 Nov 2012 07:26:09 +0000 (23:26 -0800)]
mds: base loner decision on wanted RD|WR|EXCL, not CACHE|BUFFER
Observed instance where one client wanted the Fc cap and prevented the
loner from getting RD|WR caps.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 30 Oct 2012 16:00:11 +0000 (09:00 -0700)]
osd: make pool_snap_info_t encoding backward compatible
Way back in
fc869dee1e8a1c90c93cb7e678563772fb1c51fb (v0.42) when we redid
the osd type encoding we forgot to make this conditionally encode the old
format for old clients. In particular, this means that kernel clients
will fail to decode the osdmap if there is a rados pool with a pool-level
snapshot defined.
Fixes: #3290
Signed-off-by: Sage Weil <sage@inktank.com>
Conflicts:
src/test/encoding/types.h
Yan, Zheng [Fri, 7 Sep 2012 05:49:27 +0000 (13:49 +0800)]
osd/OSD.cc: Fix typo in OSD::heartbeat_check()
The check 'p->second.last_tx > cutoff' should always be false
since last_tx is periodically updated by OSD::heartbeat()
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Yehuda Sadeh [Mon, 22 Oct 2012 23:52:11 +0000 (16:52 -0700)]
rgw: dump an error message if FCGX_Accept fails
Adding missing debug info.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Sage Weil [Mon, 22 Oct 2012 22:38:30 +0000 (15:38 -0700)]
workqueue: make debug output include active threads
Include active thread count in threadpool debug output.
Signed-off-by: Sage Weil <sage@inktank.com>
Yehuda Sadeh [Mon, 22 Oct 2012 20:16:59 +0000 (13:16 -0700)]
rgw: don't continue processing of GET request on error
Fixes #3381
We continued processing requests long after the client
has died. This fix applies to both s3 and swift.
backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Sage Weil [Fri, 19 Oct 2012 15:46:19 +0000 (08:46 -0700)]
osd: be quiet about watches
Useless log noise.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Thu, 18 Oct 2012 00:44:12 +0000 (17:44 -0700)]
addr_parsing: make , and ; and ' ' all delimiters
Instead of just ,. Currently "foo.com, bar.com" will fail because of the
space after the comma. This patch fixes that, and makes all delim
chars interchangeable.
Signed-off-by: Sage Weil <sage@inktank.com>
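A small sketch of treating ',', ';' and ' ' as interchangeable delimiters. It only illustrates the splitting behaviour; `split_addrs` is a hypothetical name and the real parser is written in C++.

```python
import re

def split_addrs(s):
    """Split an address list on any run of commas, semicolons, or spaces."""
    return [tok for tok in re.split(r"[,; ]+", s) if tok]
```

Collapsing runs of delimiters is what lets "foo.com, bar.com" parse as two names rather than failing on the space after the comma.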
Tommi Virtanen [Fri, 5 Oct 2012 17:57:42 +0000 (10:57 -0700)]
ceph-disk-prepare, debian/control: Support external journals.
Previously, ceph-disk-* would only let you use a journal that was a
file inside the OSD data directory. With this, you can do:
ceph-disk-prepare /dev/sdb /dev/sdb
to put the journal as a second partition on the same disk as the OSD
data (might save some file system overhead), or, more interestingly:
ceph-disk-prepare /dev/sdb /dev/sdc
which makes it create a new partition on /dev/sdc to use as the
journal. Size of the partition is decided by $osd_journal_size.
/dev/sdc must be a GPT-format disk. Multiple OSDs may share the same
journal disk (using separate partitions); this way, a single fast SSD
can serve as journal for multiple spinning disks.
The second use case currently requires parted, so a Recommends: for
parted has been added to Debian packaging.
Closes: #3078
Closes: #3079
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Yehuda Sadeh [Mon, 15 Oct 2012 16:43:47 +0000 (09:43 -0700)]
rgw: don't add port to url if already has one
Fixes: #3296
Specifically, if the host name string already has ':', then
don't try to append the port (swift auth).
backport: argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
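The check described above amounts to the following sketch. `with_default_port` is a hypothetical helper, and the test for ':' follows the commit message literally, so a bare IPv6 literal would also be left untouched.

```python
def with_default_port(host, port):
    """Append ':port' to host unless the host string already contains ':'."""
    if ":" in host:
        return host  # already carries a port; do not append a second one
    return "%s:%d" % (host, port)
```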
Sage Weil [Mon, 15 Oct 2012 23:37:05 +0000 (16:37 -0700)]
admin_socket: fix '0' protocol version
Broken by
895e24d198ced83ab7fed3725f12f75e3bc97b0b.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 9 Oct 2012 00:14:22 +0000 (17:14 -0700)]
mon: drop command replies on paxos reset
If paxos resets, do not send the reply for the commit we were waiting for;
let the command be reprocessed and re-proposed.
Among other things, this could lead to nondeterministic results for
'ceph osd create <uuid>'.
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Tue, 9 Oct 2012 04:02:51 +0000 (21:02 -0700)]
Merge remote-tracking branch 'gh/for-stable-fstypes-and-ext-journal' into stable
Tommi Virtanen [Thu, 2 Aug 2012 20:02:04 +0000 (13:02 -0700)]
ceph-authtool: Fix usage, it's --print-key not --print.
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Tommi Virtanen [Fri, 5 Oct 2012 16:22:34 +0000 (09:22 -0700)]
upstart: OSD journal can be a symlink; if it's dangling, don't start.
This lets a $osd_data/journal symlink point to
/dev/disk/by-partuuid/UUID and the osd will not attempt to start until
that disk is available.
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Sage Weil [Fri, 5 Oct 2012 16:10:31 +0000 (09:10 -0700)]
osd: Make --get-journal-fsid not really start the osd.
This way, it won't need -i ID and it won't access the osd_data_dir.
That makes it useful for locating the right osd to use with an
external journal partition.
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Tommi Virtanen [Fri, 5 Oct 2012 16:08:56 +0000 (09:08 -0700)]
osd: Make --get-journal-fsid not attempt aio or direct_io.
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Tommi Virtanen [Thu, 4 Oct 2012 23:03:40 +0000 (16:03 -0700)]
ceph-disk-prepare: Use the OSD uuid as the partition GUID.
This will make locating the right data partition for a given journal
partition a lot easier.
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Tommi Virtanen [Wed, 3 Oct 2012 19:38:38 +0000 (12:38 -0700)]
debian/control, ceph-disk-prepare: Depend on xfsprogs, use xfs by default.
Ext4 as a default is a bad choice, as we don't perform enough QA with
it. To use XFS as the default for ceph-disk-prepare, we need to depend
on xfsprogs.
btrfs-tools is already recommended, so no change there. If you set
osd_fs_type=btrfs, and don't have the package installed, you'll just
get an error message.
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Tommi Virtanen [Wed, 3 Oct 2012 17:13:17 +0000 (10:13 -0700)]
ceph-disk-{prepare,activate}: Default mkfs arguments and mount options.
The values for the settings were copied from teuthology task "ceph".
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Tommi Virtanen [Wed, 3 Oct 2012 15:47:20 +0000 (08:47 -0700)]
ceph-disk-prepare: Avoid triggering activate before prepare is done.
Earlier testing never saw this, but now a mount of a disk triggers a
udev blockdev-added event, causing ceph-disk-activate to run even
before ceph-disk-prepare has had a chance to write the files and
unmount the disk.
Avoid this by using a temporary partition type uuid ("ceph 2 be"), and
only setting it to the permanent one ("ceph osd") once prepare is done. The hotplug event won't
match the type uuid, and thus won't trigger ceph-disk-activate.
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Tommi Virtanen [Wed, 3 Oct 2012 00:06:11 +0000 (17:06 -0700)]
ceph-disk-activate: Add a comment about user_xattr being default now.
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Tommi Virtanen [Tue, 2 Oct 2012 23:53:35 +0000 (16:53 -0700)]
ceph-disk-activate: Use mount options from ceph.conf
Always uses default cluster name ("ceph") for now, see
http://tracker.newdream.net/issues/3253
Closes: #2548
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Tommi Virtanen [Tue, 2 Oct 2012 23:43:08 +0000 (16:43 -0700)]
ceph-disk-activate: Refactor to extract detect_fstype call.
This allows us to use the fstype for a config lookup.
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Tommi Virtanen [Tue, 2 Oct 2012 23:37:07 +0000 (16:37 -0700)]
ceph-disk-activate: Unmount on errors (if it did the mount).
This cleans up the error handling to not leave disks mounted
in /var/lib/ceph/tmp/mnt.* when something fails, e.g. when
the ceph command line tool can't talk to mons.
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Tommi Virtanen [Tue, 2 Oct 2012 23:23:55 +0000 (16:23 -0700)]
ceph-disk-prepare: Allow setting mkfs arguments and mount options in ceph.conf
Tested with meaningless but easy-to-verify values:
[global]
osd_fs_type = xfs
osd_fs_mkfs_arguments_xfs = -i size=512
osd_fs_mount_options_xfs = noikeep
ceph-disk-activate does not respect the mount options yet.
Closes: #2549
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Tommi Virtanen [Tue, 2 Oct 2012 23:04:15 +0000 (16:04 -0700)]
ceph-disk-prepare: Allow specifying fs type to use.
Either use ceph.conf variable osd_fs_type or command line option
--fs-type=
Default is still ext4, as currently nothing guarantees xfsprogs
or btrfs-tools are installed.
Currently both btrfs and xfs seem to trigger a disk hotplug event at
mount time, thus triggering a useless and unwanted ceph-disk-activate
run. This will be worked around in a later commit.
Currently mkfs and mount options cannot be configured.
Bug: #2549
Signed-off-by: Tommi Virtanen <tv@inktank.com>
Yehuda Sadeh [Wed, 26 Sep 2012 22:43:56 +0000 (15:43 -0700)]
rgw: copy_object should not override ETAG implicitly
When copying an object with new attrs, we still need to
maintain the ETAG.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Yehuda Sadeh [Tue, 25 Sep 2012 01:10:24 +0000 (18:10 -0700)]
rgw: url_decode should allocate extra byte for dest
Was missing extra byte for null termination
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Sage Weil [Tue, 11 Sep 2012 20:04:50 +0000 (13:04 -0700)]
v0.48.2argonaut
Yehuda Sadeh [Tue, 18 Sep 2012 20:45:27 +0000 (13:45 -0700)]
cls_rgw: if stats drop below zero, set them to zero
This complements the fix for #3127. This is only a band-aid
solution for argonaut, the real solution fixes the original
issue that made this possible.
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
Yehuda Sadeh [Wed, 12 Sep 2012 23:41:17 +0000 (16:41 -0700)]
cls_rgw: change scoping of suggested changes vars
Fixes: #3127
Bad variable scoping made it so that specific variables
weren't initialized between suggested changes iterations.
This specifically affected a case where in a specific
change we had an update followed by a remove, and the
remove was on a non-existent key (e.g., was already
removed earlier). We ended up re-subtracting the
object stats, as the entry wasn't reset between
the iterations (and we didn't read it because the
key didn't exist).
backport:argonaut
Signed-off-by: Yehuda Sadeh <yehuda@inktank.com>
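The scoping problem described in the commit above can be illustrated with a small Python stand-in. This is a sketch under stated assumptions, not the cls_rgw code: the function names, the "size" field, and the (op, key) change tuples are all hypothetical.

```python
def apply_changes_buggy(index, stats, changes):
    """Buggy variant: `entry` lives outside the loop and can go stale."""
    entry = None
    for op, key in changes:
        if key in index:
            entry = index[key]  # only refreshed when the key exists
        if op == "remove" and entry is not None:
            # For a missing key this reuses the entry of an earlier change.
            stats["size"] -= entry["size"]
            index.pop(key, None)
    return stats


def apply_changes_fixed(index, stats, changes):
    """Fixed variant: `entry` is re-initialized on every iteration."""
    for op, key in changes:
        entry = index.get(key)  # None when the key does not exist
        if op == "remove" and entry is not None:
            stats["size"] -= entry["size"]
            del index[key]
    return stats
```

With an update on an existing key followed by a remove on a missing key, the buggy variant subtracts the first key's stats even though nothing was removed, while the fixed variant leaves the stats unchanged.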
Sage Weil [Tue, 4 Sep 2012 18:29:21 +0000 (11:29 -0700)]
objecter: fix osdmap wait
When we get a pool_op_reply, we find out which osdmap we need to wait for.
The wait_for_new_map() code was feeding that epoch into
maybe_request_map(), which was feeding it to the monitor with the subscribe
request. However, that epoch is the *start* epoch, not what we want. Fix
this code to always subscribe to what we have (+1), and ensure we keep
asking for more until we catch up to what we know we should eventually
get.
Bug: #3075
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
(cherry picked from commit
e09b26555c6132ffce08b565780a39e4177cbc1c)
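A simplified, synchronous sketch of the "subscribe to what we have (+1) and keep asking until we catch up" idea from the commit above. The real Objecter is asynchronous and talks to the monitor through subscriptions; here wait_for_map and the fetch_maps callback are hypothetical names used only for illustration.

```python
def wait_for_map(have_epoch, target_epoch, fetch_maps):
    """Catch up to target_epoch, always requesting from have_epoch + 1.

    fetch_maps(start) is a hypothetical callback that returns the list of
    epochs delivered for a request starting at `start` (possibly fewer
    than needed to reach the target).
    Returns the final epoch and the list of start epochs requested.
    """
    requests = []
    while have_epoch < target_epoch:
        start = have_epoch + 1  # never the target epoch itself
        requests.append(start)
        delivered = fetch_maps(start)
        if not delivered:
            break  # nothing new arrived; a real client would retry later
        have_epoch = max(have_epoch, delivered[-1])
    return have_epoch, requests
```

For example, if each request delivers at most three epochs, a client at epoch 4 that needs epoch 10 issues two requests, starting at 5 and then at 8.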
Sage Weil [Mon, 27 Aug 2012 14:38:34 +0000 (07:38 -0700)]
objecter: send queued requests when we get first osdmap
If we get our first osdmap and already have requests queued, send them.
Backported from
8d1efd1b829ae50eab7f7f4c07da04e03fce7c45.
Fixes: #3050
Signed-off-by: Sage Weil <sage@inktank.com>
Sage Weil [Wed, 22 Aug 2012 04:12:33 +0000 (21:12 -0700)]
objecter: use ordered map<> for tracking tids to preserve order on resend
We are using a hash_map<> to map tids to Op*'s. In handle_osd_map(),
we will recalc_op_target() on each Op in a random (hash) order. These
will get put in a temp map<tid,Op*> to ensure they are resent in the
correct order, but their order on the session->ops list will be random.
Then later, if we reset an OSD connection, we will resend everything for
that session in ops order, which is incorrect.
Fix this by explicitly reordering the requests to resend in
kick_requests(), much like we do in handle_osd_map(). This lets us
continue to use a hash_map<>, which is faster for reasonable numbers of
requests. A simpler but slower fix would be to just use map<> instead.
This is one of many bugs contributing to #2947.
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Samuel Just <sam.just@inktank.com>
(cherry picked from commit
1113a6c56739a56871f01fa13da881dab36a32c4)
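The reordering step described in the commit above can be sketched as follows. This is an illustration only: the real code is C++ and works on Op pointers, while this Python version assumes a session's ops are plain (tid, payload) tuples in arbitrary order. The function name mirrors kick_requests() from the commit message, but its signature here is hypothetical.

```python
def kick_requests(session_ops):
    """Return one session's ops in the order they should be resent.

    session_ops: list of (tid, payload) tuples in arbitrary order, standing
    in for a session->ops list populated in hash order.
    """
    # Collect into a tid-keyed mapping, then walk it in ascending tid order,
    # which is the original submission order.
    resend = {}
    for tid, payload in session_ops:
        resend[tid] = payload
    return [(tid, resend[tid]) for tid in sorted(resend)]
```

Sorting only at resend time keeps the fast unordered lookup structure for the common path and pays the ordering cost only when a connection is reset.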
Dan Mick [Mon, 20 Aug 2012 22:02:57 +0000 (15:02 -0700)]
rbd: force all exiting paths through main()/return
This properly destroys objects. In the process, remove usage_exit();
also kill error-handling in set_conf_param (never relevant for rbd.cc,
and if you call it with both pointers NULL, well...)
Also switch to EXIT_FAILURE for consistency.
Backported from
fed8aea662bf919f35a5a72e4e2a2a685af2b2ed.
Signed-off-by: Dan Mick <dan.mick@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Fixes: #2948
Josh Durgin [Tue, 18 Sep 2012 16:37:44 +0000 (09:37 -0700)]
rbd: only open the destination pool for import
Otherwise importing into another pool when the default pool, rbd,
doesn't exist results in an error trying to open the rbd pool.
Reported-by: Sébastien Han <han.sebastien@gmail.com>
Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Tommi Virtanen [Mon, 17 Sep 2012 15:55:14 +0000 (08:55 -0700)]
ceph-disk-activate, upstart: Use "initctl emit" to start OSDs.
This avoids an error if the daemon was running already, and is
already being done with the other services.
Signed-off-by: Tommi Virtanen <tv@inktank.com>